Google Introduces AI RAG Engine
PLUS: Largest open-source model with 4M context, AI agent recipes
Today’s top AI Highlights:
Build multi-agent systems with state and custom routing
Google’s fully-managed RAG engine that balances ease of use and customization
Largest open-source model with 456B parameters, a 4-million-token context, and Lightning Attention
Automate identifying and fixing architectural tech debt with AI
& so much more!
Read time: 3 mins
AI Tutorials
LLMs are great at generating educational content and learning roadmaps, but they struggle with complex, multi-step workflows. You could ask an LLM to create a curriculum, then separately ask it to design exercises, then manually compile resources, but this process is tedious and requires constant human coordination.
In this tutorial, we'll solve this by building an AI Teaching Agent Team. Instead of isolated tasks, our AI agents work together like a real teaching faculty: one creates comprehensive knowledge bases, another designs learning paths, a third curates resources, and a fourth develops practice materials.
The user just needs to provide a topic. Everything is automatically saved and organized in Google Docs, creating a seamless learning experience without manual overhead. We are using Phidata and Composio to build our AI agents.
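As a rough sketch of how such a team wires together, here is a minimal two-agent version. It assumes Phidata's Agent API and Composio's Google Docs action; the action name, model id, and prompts are illustrative, not the tutorial's exact code.

```python
# Hedged sketch: two of the four teaching agents, each saving output to Google Docs.
# Assumes composio_phidata exposes a GOOGLEDOCS_CREATE_DOCUMENT_MARKDOWN action.
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from composio_phidata import Action, ComposioToolSet

toolset = ComposioToolSet()
docs_tools = toolset.get_tools(actions=[Action.GOOGLEDOCS_CREATE_DOCUMENT_MARKDOWN])

professor = Agent(
    name="Professor",
    model=OpenAIChat(id="gpt-4o"),
    tools=docs_tools,
    instructions=["Create a comprehensive knowledge base for the topic.",
                  "Save the result to a new Google Doc."],
)
advisor = Agent(
    name="Academic Advisor",
    model=OpenAIChat(id="gpt-4o"),
    tools=docs_tools,
    instructions=["Design a step-by-step learning roadmap for the topic.",
                  "Save the result to a new Google Doc."],
)

topic = "Reinforcement Learning"
professor.print_response(f"Build a knowledge base for: {topic}")
advisor.print_response(f"Create a learning path for: {topic}")
```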
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
AgentKit by Inngest offers a robust framework for building and orchestrating AI agents. It lets you create anything from single-agent systems to complex multi-agent networks that share state and work together.
Built with TypeScript, AgentKit provides a clean API for composing agents, managing state persistence, and controlling routing logic between agents. What makes it particularly appealing is its flexible architecture, which supports both supervised and autonomous routing, letting you decide exactly how much control you want over your agent workflows; a concept sketch follows the highlights below.
Key Highlights:
Streamlined Workflow - Build agents with a straightforward API that handles model inference, tool integration, and state management. Each agent can use different models (OpenAI, Anthropic, etc.) and maintain its own set of tools, while sharing context through a network-wide state system that persists between calls.
State Management - The framework implements a dual-storage approach with an append-only message history and a key-value store for sharing data between agents. This makes it easy to build agents that can maintain context, share information, and make decisions based on previous interactions without writing complex state management code.
Flexible Routing - Choose between code-based routing for predictable workflows, agent-based routing for autonomous decision-making, or a hybrid approach. The routing system gives you granular control over agent interactions while providing built-in safety mechanisms like maximum iteration caps and error handling.
Testing and Deployment - Run agents locally during development, then deploy to production with built-in support for retries, caching, and durability. The framework handles all the complexity of managing concurrent agent calls and maintaining system stability.
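AgentKit's API is TypeScript, so as a language-neutral illustration of the pattern described above, here's a minimal Python sketch of two agents sharing an append-only history and a key-value store, coordinated by code-based routing with an iteration cap. The names are ours, not AgentKit's.

```python
# Concept sketch of AgentKit-style networks (not AgentKit's actual TypeScript API).
from dataclasses import dataclass, field

@dataclass
class NetworkState:
    history: list = field(default_factory=list)  # append-only message log
    kv: dict = field(default_factory=dict)       # key-value store shared by agents

def researcher(state: NetworkState) -> None:
    # In a real system this would be an LLM call with tools.
    state.kv["sources"] = ["paper A", "blog B"]
    state.history.append(("researcher", "collected 2 sources"))

def writer(state: NetworkState) -> None:
    sources = state.kv["sources"]
    state.kv["draft"] = f"summary of {len(sources)} sources"
    state.history.append(("writer", "produced draft"))

def route(state: NetworkState, max_iter: int = 10) -> None:
    # Code-based routing: deterministic hand-offs plus a safety cap on iterations.
    for _ in range(max_iter):
        if "sources" not in state.kv:
            researcher(state)
        elif "draft" not in state.kv:
            writer(state)
        else:
            break  # network is done

state = NetworkState()
route(state)
print(state.kv["draft"], state.history)
```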
Writer RAG tool: build production-ready RAG apps in minutes
RAG in just a few lines of code? We've launched a predefined RAG tool on our developer platform, making it easy to bring your data into a Knowledge Graph and interact with it using AI. With a single API call, Writer LLMs will intelligently call the RAG tool to chat with your data.
Integrated into Writer’s full-stack platform, it eliminates the need for complex vendor RAG setups, making it quick to build scalable, highly accurate AI workflows just by passing a graph ID of your data as a parameter to your RAG tool.
Google just announced the general availability of Vertex AI RAG Engine, a fully managed service to build RAG applications. It hits a sweet spot for developers seeking a balance between ease of use and customization.
The service manages vector storage, chunking, and retrieval strategies so you don't spend time wrestling with infrastructure setup. Unlike most managed services that limit your choices, Vertex AI RAG Engine lets you mix and match components based on what works best for your stack. It integrates seamlessly with Vertex AI Search, multiple vector databases, multiple generative and embedding models, and supports connectors to data sources including Cloud Storage, Google Drive, Jira, and Slack. A minimal setup sketch follows the highlights below.
Key Highlights:
Adaptable Architecture & Tooling - Vertex AI RAG Engine doesn't lock you into one vendor; it integrates with what you're already using. You can choose your preferred vector database, like Pinecone or Weaviate, or use Google's Vertex AI Vector Search. It supports multiple models, including Gemini, Llama, and Mistral.
Data Ingestion & Management - RAG Engine has built-in connectors for a range of data sources, including Cloud Storage, Google Drive, Jira, Slack, and even local files. It provides simple configurations to adjust data chunking, parsing, and other preprocessing steps, and integrates with the Document AI Layout Parser for parsing and chunking complex documents.
Retrieval and Ranking - Improve result relevance using built-in reranking capabilities powered by LLMs or Vertex AI Rank service. The system automatically tunes retrieval parameters based on usage patterns and provides controls for adjusting similarity thresholds and top-k results, helping you deliver more accurate responses to user queries.
Native Gemini Integration & Production Readiness - The RAG Engine is natively integrated with Gemini models as a tool. Deploy with confidence using features like error handling, rate limiting, and comprehensive logging. The service automatically handles scaling and provides monitoring tools to track performance and debug issues in production.
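Here's a minimal sketch of the setup flow using the Vertex AI Python SDK's preview `rag` module. Parameter names and module paths reflect the preview SDK at the time of writing and may differ from the GA surface, so treat this as a starting point rather than exact code.

```python
# Hedged sketch: corpus creation, ingestion, and Gemini grounding with RAG Engine.
import vertexai
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool

vertexai.init(project="my-project", location="us-central1")  # placeholder project

# 1. Create a corpus and import documents; chunking runs server-side.
corpus = rag.create_corpus(display_name="docs-corpus")
rag.import_files(corpus.name, ["gs://my-bucket/docs/"], chunk_size=512, chunk_overlap=100)

# 2. Wrap retrieval as a tool so Gemini grounds answers in the corpus.
rag_tool = Tool.from_retrieval(
    rag.Retrieval(source=rag.VertexRagStore(rag_corpora=[corpus.name], similarity_top_k=5))
)
model = GenerativeModel("gemini-1.5-pro", tools=[rag_tool])
print(model.generate_content("Summarize our refund policy.").text)
```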
Quick Bites
Jina AI has released ReaderLM v2, a 1.5B-parameter model that converts HTML content into Markdown and JSON, supporting up to 512K tokens and 29 languages. The model outperforms much larger models like GPT-4o and Gemini 2.0 on HTML-to-Markdown conversion and matches their performance on HTML-to-JSON. It is available via Jina's Reader API and on major cloud platforms including AWS SageMaker, Azure, and GCP Marketplace.
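If you'd rather run it locally than call the Reader API, the weights are on Hugging Face; here's a minimal transformers sketch (the repo id and prompt phrasing are our assumptions, so check the model card for the recommended prompt):

```python
# Hedged sketch: HTML-to-Markdown with ReaderLM-v2 via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/ReaderLM-v2"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

html = "<html><body><h1>Hello</h1><p>World</p></body></html>"
messages = [{"role": "user", "content": f"Convert this HTML to Markdown:\n{html}"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```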
Singapore-based tech company MiniMax has open-sourced its new MiniMax-01 series of models - a text model and a vision model - featuring a Lightning Attention architecture. The text model is an MoE model with a staggering 456B parameters, 45.9B of which are activated per token, while the vision model pairs it with a 303M-parameter vision transformer. API pricing is $0.2 per million input tokens and $1.1 per million output tokens.
The architecture uses Lightning Attention, with linear attention in 7 out of every 8 layers, making this the first commercial-scale implementation of a linear attention mechanism (a toy sketch of the idea follows below).
The text model can handle context lengths of up to 4 million tokens, 20-32x longer than other leading models.
The models compete strongly with GPT-4o, Gemini 2.0, Claude 3.5 Sonnet, and Llama 3.1 405B across a wide range of tasks.
Both models are available with complete weights on GitHub and Hugging Face, as well as via API.
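To see why linear attention matters for context length, here's a toy non-causal, single-head sketch of the generic kernel trick; it illustrates the idea, not MiniMax's Lightning Attention kernels:

```python
# Toy sketch: linear attention avoids the n x n score matrix entirely.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    # With a positive feature map phi, attention factorizes so the cost is
    # O(n * d^2) instead of O(n^2 * d) -- the key to very long contexts.
    phi = lambda x: F.elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                 # (d, d) summary: sum_j phi(k_j) v_j^T
    z = q @ k.sum(dim=0)         # (n,) per-query normalizer
    return (q @ kv) / z.unsqueeze(-1)

n, d = 4096, 64
q, k, v = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
print(linear_attention(q, k, v).shape)  # torch.Size([4096, 64])
```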
Together AI has launched "Agent Recipes," a resource with pre-built agent and workflow code examples in Python and TypeScript. The initial release includes recipes for prompt chaining, routing, parallelization, orchestrator-worker setups, and evaluator-optimizer loops, with autonomous agent recipes coming soon.
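For a flavor of the simplest recipe, here's a generic prompt-chaining sketch against an OpenAI-compatible endpoint (Together's API is OpenAI-compatible; the model id is a placeholder, and this is our illustration, not Together's recipe code):

```python
# Hedged sketch of the prompt-chaining pattern: each step feeds the next.
from openai import OpenAI

client = OpenAI()  # or OpenAI(base_url="https://api.together.xyz/v1", api_key=...)

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def prompt_chain(topic: str) -> str:
    outline = llm(f"Write a 3-point outline about {topic}.")           # step 1
    draft = llm(f"Expand this outline into a short post:\n{outline}")  # step 2
    return llm(f"Tighten the prose of this draft:\n{draft}")           # step 3

print(prompt_chain("vector databases"))
```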
Tools of the Trade
Axal: Automatically identifies architectural tech debt, like circular dependencies and dead code, and prioritizes it based on business value. After issues are identified and prioritized, the tool uses AI to autonomously fix them, without breaking existing code.
GitInsight: Analyzes your GitHub activity, showing event counts per repo, and roasts your work by ingesting your recent commits, PRs, issues, and comments along with their messages. The roasts come from different personas, from a nitpicky software architect to a potential Tinder match.
DocWrangler: Open-source IDE for creating and refining AI-powered data processing pipelines through interactive feedback and visualization tools. It has a spreadsheet interface with automatic summary overlays, in-context feedback mechanisms for prompt refinement, and an AI assistant for understanding pipeline concepts and patterns.
Lexoid: A Python document parsing library that combines traditional static parsers with LLM-based parsing capabilities through a unified API. It intelligently routes document pages between fast static parsers for simple content and LLM processing for complex data.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
A million-dollar question is how to make an LLM say "I don't know." ~ Andriy Burkov

After spending the weekend trying to glue LLM calls together (aka AI Agents), I regret to admit that LLMs are pretty stupid. ~ Shayan
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉