
Build Context-Aware RAG Pipelines

PLUS: ChatGPT AI Agent 🤝 VS Code, Xcode, Terminal

Today's top AI Highlights:

  1. Build smarter RAG pipelines with this new retrieval technique that requires minimal code changes

  2. Train your own real-time voice-to-voice AI with any LLM using this open-source training code

  3. ChatGPT can now use VS Code, Xcode, and Terminal on your macOS desktop

  4. All-in-one embedding model for content-rich multimodal RAG

  5. An intelligent gateway to protect, observe, and personalize LLM apps (agents, assistants, co-pilots) with your APIs

& so much more!

Read time: 3 mins

Quick Update: Tutorial Week Ahead! 🚀

Just a quick heads-up – next week's going to be a bit different. Instead of our usual daily AI newsletter, we're dedicating the entire week to in-depth AI tutorials. We are taking a short break to recharge (our first vacation since starting Unwind AI!), but we've handpicked 5 technical tutorials that we're really excited to share with you.

We'll be back to our regular newsletters the week after, hopefully with fresh energy and updates to serve you better. Can't wait to dive back in and tell you about all the latest AI dev updates! Until then, enjoy exploring these hands-on tutorials – we'd love to hear what you build with them.

AI Tutorials

Running a fully local RAG (Retrieval-Augmented Generation) agent without internet access is a powerful setup: it gives you complete control over your data, low-latency responses, and full privacy.

Building a local RAG system opens up possibilities for secure applications where online connections are not an option. In this tutorial, you'll learn to create a local RAG agent using Llama 3.2 3B via Ollama for text generation, combined with Qdrant as the vector database for fast document retrieval.
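If you want a feel for the moving parts before the full tutorial, here's a minimal sketch of this kind of setup, assuming the `ollama` and `qdrant-client` Python packages (the latter with its fastembed extra for local embeddings); the collection name and documents are illustrative:

```python
# Minimal local RAG sketch: Qdrant for retrieval, Llama 3.2 3B via Ollama for generation.
# Assumes `pip install ollama "qdrant-client[fastembed]"`, a running Ollama daemon,
# and the llama3.2:3b model already pulled. Names below are illustrative.
import ollama
from qdrant_client import QdrantClient

client = QdrantClient(":memory:")  # in-memory store; use a path or URL to persist

# Index a few documents; fastembed computes the embeddings locally.
docs = [
    "Qdrant is a vector database optimized for similarity search.",
    "Ollama runs open-weight LLMs like Llama 3.2 locally.",
]
client.add(collection_name="local_docs", documents=docs)

# Retrieve the most relevant chunks for a question.
question = "What does Qdrant do?"
hits = client.query(collection_name="local_docs", query_text=question, limit=2)
context = "\n".join(hit.document for hit in hits)

# Generate an answer grounded in the retrieved context.
response = ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(response["message"]["content"])
```

Everything here runs on-device: embeddings, retrieval, and generation, so no data ever leaves the machine.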

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don't forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

LlamaIndex has introduced a new RAG technique called dynamic section retrieval that solves the problem of context fragmentation in document Q&A systems. Instead of breaking documents into isolated chunks, it keeps complete sections intact using a two-pass retrieval system. The approach is particularly useful for technical documentation, research papers, and long-form content where preserving structural context is crucial.

The team has provided an example implementation in a Jupyter notebook that demonstrates the technique using ChromaDB, showing how to integrate it into existing RAG pipelines while significantly improving the context quality fed into LLMs.

Key Highlights:

  1. Two-Pass Retrieval - The method uses a two-pass approach: the first pass performs a standard similarity search to find relevant chunks; the second pass uses the metadata attached to those chunks to retrieve every chunk belonging to the corresponding sections. This ensures the retrieved context is complete and keeps its original structure.

  2. Drop-in Integration - It works with existing vector store setups by adding a metadata layer on top. The example shows ChromaDB integration, requiring only metadata filtering on top of standard similarity search, with no changes to existing embeddings or indexes (see the sketch after this list).

  3. LlamaParse for Structured Document Parsing - Using LlamaParse, the system quickly converts documents into a structured format suitable for analysis and metadata extraction. This allows for granular control over the retrieval process.

  4. Context Window Optimization - The implementation includes built-in handling of context window limitations through tree summarization, making it practical for use with different LLMs without manual prompt engineering.

  5. Get Started - Check out the example implementation showing how to process academic papers, extract section metadata, and build a section-aware retrieval system. It includes code for both the core retrieval logic and a basic query engine setup.
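To make the two-pass idea concrete, here's a minimal, hedged sketch against ChromaDB. This is not LlamaIndex's notebook code; the `section_id` metadata key, collection name, and documents are illustrative assumptions:

```python
# Two-pass "dynamic section retrieval" sketch with ChromaDB.
# Pass 1: similarity search finds the most relevant chunks.
# Pass 2: a metadata filter pulls back EVERY chunk of the matched sections.
import chromadb

client = chromadb.Client()
collection = client.create_collection("papers")

# Each chunk carries a section_id so full sections can be reassembled later.
collection.add(
    ids=["c1", "c2", "c3"],
    documents=[
        "Transformers use self-attention to relate tokens.",
        "Multi-head attention runs several attention functions in parallel.",
        "Positional encodings inject order information.",
    ],
    metadatas=[
        {"section_id": "3.1"},
        {"section_id": "3.1"},
        {"section_id": "3.2"},
    ],
)

# Pass 1: standard similarity search for the top chunks.
first = collection.query(query_texts=["How does attention work?"], n_results=2)
section_ids = {m["section_id"] for m in first["metadatas"][0]}

# Pass 2: fetch all chunks belonging to those sections, preserving structure.
full_sections = collection.get(where={"section_id": {"$in": list(section_ids)}})
print(full_sections["documents"])
```

The only addition to a standard pipeline is the metadata filter in the second pass, which is why the technique can drop into existing vector store setups without re-embedding anything.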

The fastest way to build AI apps

Writer is the full-stack generative AI platform for enterprises. Quickly and easily build and deploy AI apps with Writer AI Studio, a suite of developer tools fully integrated with our LLMs, graph-based RAG, AI guardrails, and more.

Use Writer Framework to build Python AI apps with drag-and-drop UI creation, our API and SDKs to integrate AI into your existing codebase, or intuitive no-code tools for business users.

Conversational AI platform Ultravox has released Ultravox v0.4.1, a family of multi-modal, open-weight models trained specifically for real-time AI conversation agents. The open-source model family is built on Llama 3.1 8B & 70B and Mistral NeMo. The training code is available on GitHub and the weights are on Hugging Face, so you can train your own version of Ultravox on a different base model or dataset.

Accompanying this is Ultravox's managed API service, Ultravox Realtime, which comes with 30 minutes of free credit and includes built-in support for multiple voices, tool calling, and telephony integration.

Key Highlights:

  1. Technical Architecture - Ultravox does not rely on a separate automatic speech recognition (ASR) stage. Rather, the model consumes speech directly as embeddings, avoiding the loss of speech traits like emotion and intonation that a transcription step discards.

  2. Model Performance - Ultravox shows speech understanding comparable to OpenAI's Realtime API and better than other open-source options.

  3. Realtime API - Ultravox Realtime comes with complete SDK support for major platforms, with ready-to-use components for echo cancellation, reconnection, and sound isolation. The API includes WebSocket support for real-time streaming and handles background noise and multi-speaker scenarios better than pipeline systems.

  4. Integration & Deployment - Built on vLLM for efficient scaling, the system can be self-hosted or used via the managed API. You can access pre-trained weights on Hugging Face (a minimal loading sketch follows this list), train your own version with any other LLM using the training code on GitHub, or integrate with existing API infrastructure.
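For a rough sense of how to try the released weights, here's a hedged loading sketch using Hugging Face's transformers pipeline; the exact repo id and the input-dict format are assumptions to verify against the model card:

```python
# Hedged sketch: loading an Ultravox checkpoint through the transformers pipeline.
# The repo id and the {'audio', 'turns', 'sampling_rate'} input format are assumptions
# to check against the model card; trust_remote_code pulls in Ultravox's custom code.
import librosa
import transformers

pipe = transformers.pipeline(
    model="fixie-ai/ultravox-v0_4_1-llama-3_1-8b",  # assumed repo id
    trust_remote_code=True,
)

# Ultravox consumes raw speech directly (as embeddings internally); no ASR step.
audio, sr = librosa.load("question.wav", sr=16000)
turns = [{"role": "system", "content": "You are a friendly voice assistant."}]

out = pipe({"audio": audio, "turns": turns, "sampling_rate": sr}, max_new_tokens=64)
print(out)
```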

Quick Bites

ChatGPT for macOS now integrates directly with desktop apps - VS Code, Xcode, Terminal, and iTerm2 - for hands-on coding assistance. In this early beta, Plus and Team users can use ChatGPT to write code or even make git commits, all right from the desktop, with permissions firmly in your hands. Just update to the latest app version and you're set; Enterprise and Education users, your access is coming in just a few weeks.

Bloomberg also reported that OpenAI is planning to launch an AI agent called "Operator" in January that'll autonomously perform tasks by controlling computer interfaces. This is probably the first step in that direction!

If you've been struggling with multimodal RAG, this is for you! Voyage AI has released voyage-multimodal-3, a new state-of-the-art multimodal embedding model that can vectorize text + images and capture key visual features from PDFs, slide decks, tables, figures, and more. It outperforms Cohere's latest multimodal model by up to 43% in document retrieval quality. Available now via API, you can try voyage-multimodal-3 with 200 million free tokens and explore setup details in their sample notebook and documentation.
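Here's a minimal sketch of calling the model through Voyage's Python client; treat the exact input shape (lists interleaving strings and PIL images) and field names as something to confirm in their documentation:

```python
# Minimal sketch: embedding interleaved text + image content with voyage-multimodal-3.
# Assumes `pip install voyageai pillow` and VOYAGE_API_KEY set in the environment;
# confirm the exact input shape against Voyage AI's docs. File names are illustrative.
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

# Each input interleaves strings and PIL images (e.g., a slide and its caption).
inputs = [
    ["Q3 revenue chart from the slide deck:", Image.open("slide_chart.png")],
    ["Architecture diagram for the RAG pipeline.", Image.open("diagram.png")],
]

result = vo.multimodal_embed(
    inputs=inputs,
    model="voyage-multimodal-3",
    input_type="document",  # use "query" when embedding search queries
)
print(len(result.embeddings), "vectors of dim", len(result.embeddings[0]))
```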

Quick Updates from Google:

  • New Gemini model - Google has released a new Gemini-exp-1114 model in Google AI Studio for testing. It currently has a 32K-token context window, and it's unclear whether it belongs to the 1.5 Pro or Flash family. The model reached the top rank in the LMSYS Chatbot Arena, matching GPT-4o-latest and surpassing o1-preview. The API will be available soon.

  • Gemini iOS App - The Gemini app is now live on iPhone, offering free-flowing, multi-language AI interactions, image generation, and study assistance right from a dedicated app.

Anthropic's Console now has a "Claude-powered prompt improver" that optimizes your prompts in under a minute. Here's how it works: enter your prompt > specify improvement areas > Claude drafts an improvement plan using chain-of-thought reasoning > Claude writes the initial draft > the draft goes through Claude again to find additional points for revision > Claude writes the final draft of the optimized prompt. Try it out yourself in the Anthropic Console.

Tools of the Trade

  1. Arch: A powerful prompt gateway to manage prompt-based interactions with APIs for AI agents, RAG applications, and co-pilots. It centralizes functions like intelligent routing, jailbreak prevention, function calling, and observability to streamline prompt handling and personalization for AI apps.

  2. Goose: A command-line dev agent that automates repetitive coding tasks, from debugging to dependency updates, and other boring stuff. It integrates with tools like GitHub, Jira, and Slack, and can be extended with Python toolkits.

  3. Next.js AI Chatbot: Open-source template by Vercel for building AI chatbots with multiple LLM options from OpenAI, Anthropic, Cohere, and more, seamless routing, and built-in data storage. Easily deploy it on Vercel or run it locally.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos with simple text prompts. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. I'm constantly befuddled by how many otherwise tech-savvy people still consider what is going on with AI right now as just another technological breakthrough, albeit a major one. Folks, we are in a midst of historical change none of us have ever seen. ~
    Bojan Tunguz

  2. Every serious AI researcher I've met believes we need something beyond LLMs to reach AGI. The field splits on whether missing ideas are small (LLM++) or fundamental (Deep Learning + Program Synthesis). In public debates, LLM++ gets collapsed into just LLM. ~
    Mike Knoop

That's all for today! See you tomorrow with more such AI-filled content.

Don't forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE; your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
