RAG Without Vector Database
PLUS: Multi-agent systems on distributed networks, 1.5B model outperforms O1-Preview
Today’s top AI Highlights:
Roaming RAG - RAG without chunking, vectorization, or a vector database
Build and scale multi-agent systems with mixed architectures
A 1.5B parameter model outperforms OpenAI's o1-preview
Major LLM APIs shared user prompts due to prompt caching
Run DeepSeek R1 locally on an iPhone
& so much more!
Read time: 3 mins
AI Tutorials
When solving coding problems, developers often encounter them in different formats - whether as text descriptions, screenshots from documentation, or images from whiteboards. Having a tool that can understand these different formats and help generate optimal solutions can significantly speed up the development process.
In this tutorial, we'll build a powerful multimodal coding assistant that combines three specialized AI agents working together:
Vision Agent (using Gemini 2.0 Pro): Handles image processing, extracting coding problems and requirements from uploaded screenshots or pictures
Coding Agent (using o3-mini): Generates optimized code solutions with proper documentation and type hints
Execution Agent (using o3-mini + E2B): Runs the generated code in a secure sandbox environment and provides execution results and error analysis
Users can submit problems either as text descriptions or images, and the appropriate agent takes charge based on the input type.
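The routing step described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's actual code; the agent names are stand-ins for the Gemini-based vision agent and the o3-mini coding agent:

```python
def route(problem_text=None, image_bytes=None):
    """Dispatch to the vision agent for image input, the coding agent for text.

    The returned agent names are illustrative placeholders; in the real app
    each would invoke the corresponding model-backed agent.
    """
    if image_bytes is not None:
        return "vision_agent", image_bytes
    if problem_text is not None:
        return "coding_agent", problem_text
    raise ValueError("Provide either a text description or an image")
```

In practice the vision agent's extracted problem statement would then be handed to the coding agent, chaining the two paths back together.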
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.
Latest Developments

Roaming RAG introduces a new approach to building RAG systems that skips vector databases entirely. This implementation lets LLMs navigate (roam) documentation directly through a hierarchical structure, similar to how humans use a table of contents to find information.
By parsing documents into sections with unique identifiers and allowing the LLM to expand relevant sections on demand, Roaming RAG reduces infrastructure complexity while maintaining context richness. The approach works particularly well with structured documentation like technical manuals, legal codes, and the emerging llms.txt standard.
Key Highlights:
Simplified RAG Implementation - Skip the traditional pipeline of document chunking, vectorization, and database setup. A 300-line codebase handles document parsing and section management, making it quick to implement and maintain. The system assigns unique identifiers to sections and provides tools for the LLM to navigate through them programmatically.
Better Context Preservation - Unlike traditional RAG systems that retrieve isolated chunks of text, Roaming RAG presents information within its document hierarchy. The LLM can see section headings, understand document structure, and navigate to related content, leading to more informed and accurate responses.
Smart Document Navigation - The LLM can explore documentation strategically by expanding sections at different levels, diving deeper into subsections, or exploring multiple sections in parallel. This mimics how humans naturally navigate documentation, improving the relevance of retrieved information.
Practical Requirements - Works best with well-structured documentation where titles and headings are clear, sections have descriptive opening text, and content follows a logical hierarchy. Perfect for technical documentation, product manuals, and especially the new llms.txt standard, which provides machine-readable site documentation.
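To make the section-identifier idea concrete, here is a minimal sketch of the parsing and expansion steps, assuming markdown-style headings. Function names and the `sec-N` identifier scheme are illustrative, not the article's actual 300-line implementation:

```python
import re

def parse_sections(markdown: str):
    """Split a markdown document into sections keyed by unique identifiers.

    Returns {section_id: {"title", "level", "body"}} so that a tool call
    like expand(section_id) can hand the LLM just that section's text.
    """
    sections, current_id, counter = {}, None, 0
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            counter += 1
            current_id = f"sec-{counter}"
            sections[current_id] = {
                "title": m.group(2),
                "level": len(m.group(1)),  # heading depth preserves hierarchy
                "body": [],
            }
        elif current_id:
            sections[current_id]["body"].append(line)
    return sections

def expand(sections, section_id):
    """Tool exposed to the LLM: return the full text of one section."""
    s = sections[section_id]
    return f"{'#' * s['level']} {s['title']}\n" + "\n".join(s["body"])
```

The LLM would initially see only the headings (a table of contents) and call `expand` on whichever section identifiers look relevant, drilling down instead of retrieving pre-embedded chunks.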

Naptha takes multi-agent systems to the next level with its framework for developing and running distributed agent networks at scale. It lets you build and run multi-agent systems with heterogeneous models, architectures, and data.
Agents and other modules can run on separate devices, while still interacting over the network. The platform handles everything from local inference and communication to storage and orchestration, allowing you to focus on building your agent applications.
Key Highlights:
Flexible Module System - Build your agent system using modular components that handle specific functions like knowledge bases, memory, tools, and environments. Each module can run on separate devices while maintaining communication through APIs. This means you can start small and expand your architecture as needed without major refactoring.
Framework-Agnostic Integration - Connect agents built with different frameworks like CrewAI, Autogen, or LangChain using simple decorators. No need to rewrite existing code - just add a few lines to make your agents Naptha-compatible. The SDK handles all the cross-framework communication behind the scenes.
Production-Ready Tooling - Ships with built-in support for common agent needs like database storage, file handling, and API management. The included evaluation tools help measure agent performance, while detailed logging makes it easy to monitor and debug multi-agent interactions.
Local LLM Integration - Run various open source models using vLLM or Ollama with a fully OpenAI-compatible API server. Naptha nodes support structured outputs, tool calling, and optimized throughput for multi-agent simulations - letting developers achieve complete privacy while using LLMs on their data.
Quick Bites
In a very interesting project, PhD students at UC Berkeley have built DeepScaleR-1.5B-Preview, a language model that surpasses OpenAI's o1-preview on math reasoning tasks while using just 1.5B parameters. The model, fine-tuned from Deepseek-R1-Distilled-Qwen-1.5B using reinforcement learning, achieved 43.1% accuracy on AIME 2024 (a 14.3% improvement over the base model) and was trained using only 3,800 A100 GPU hours ($4,500). The team has open-sourced their dataset, code, and training logs.
Your prompts to popular LLM APIs might not have been as private as you thought. A new audit by Stanford reveals that several major language model API providers were using "global prompt caching," a speed-boosting technique that inadvertently shared cached prompts across all users. This meant an attacker could potentially infer what others were asking the AI, simply by measuring response times. The good news is that following responsible disclosure, many providers have already patched the issue, but it is genuinely concerning and highlights the need for rigorous security audits.
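To make the timing side channel concrete, here is a minimal sketch. The threshold and the timed functions are purely illustrative — a real attack would measure actual API response times and calibrate the cache-hit threshold statistically:

```python
import time

def classify_by_latency(latency_s: float, threshold_s: float = 0.05) -> str:
    """A cached prompt returns noticeably faster than an uncached one, so
    latency alone can hint at whether someone else already sent this prompt."""
    return "likely cached" if latency_s < threshold_s else "likely uncached"

def timed_call(fn):
    """Measure how long a call takes, as an attacker would time an API request."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start
```

Per-user (rather than global) cache keys remove the signal, which is essentially the fix the providers shipped.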
AI2 has released OLMoE, a fully open-source on-device language model, now available as an iOS app for iPhone 15 Pro and newer devices or M-series iPads. The app, running a quantized 7B parameter model at 41 tokens/second, enables private, offline AI interactions - the complete source code is available to build upon or integrate into your applications.
LlamaIndex has open-sourced CrossPoster, an AI agent that automatically cross-posts any draft to Twitter, LinkedIn, and BlueSky. Built using LlamaIndex workflows and powered by Claude 3.5 Sonnet, this agent handles platform-specific nuances like @-mentions and character limits, using AI to identify entities and adapt content while maintaining human-in-the-loop verification. It's a great project to take inspiration from when building multi-platform content-management agents.
Tools of the Trade
Scrapling: Python library for high-performance web scraping that automatically adapts to website structure changes, maintaining functionality where traditional scrapers might fail. It offers features like smart element tracking, content-based selection, and multiple fetching options (including stealthy and browser-based approaches).
PgAssistant: Open source tool to help you understand and optimize your PostgreSQL database performance. It provides insights into database behavior, identifies schema-related issues, and assists in correcting them.
Pocket: iOS app for running AI language and image models completely offline on your phone, including DeepSeek R1 alongside other models like GPT-4o, Mistral, and Claude. You can switch between different models and customize your experience through themes, fonts, and system prompts.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes
google’s core problem is that it was built to organize a web that no longer exists. the open web has been replaced by walled gardens, discord servers, newsletters, private forums, & algorithmic feeds that are never exposed to search. worse, the visible parts of the web that google still indexes have been overrun by seo-optimized sludge, ai-generated spam, & paywalls.
their dna is fundamentally extractive. they never built a creator ecosystem because their whole game was to scrape, index, & serve ads against other people’s content.
the entire ecosystem slowly but surely shifted drastically—with llm’s anyone can organize anything so the mission breaks down. ~
signull

Things I'm looking forward to AI replacing:
• Jira
• Scrum
• Non-technical software managers
• Stack Overflow
• Waiting in line for the next available representative
• Legalese, terms and conditions, and any stupid fine print
• Every writer of "10 insane demos. Number 7 will shock you" posts ~
Santiago
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉