Don't Do RAG; Do CAG

PLUS: Open source computer use agent, Local LLM with function calling

In partnership with

Today’s top AI Highlights:

  1. Build lightweight AI agents with this extensible open-source framework

  2. Don't do RAG: when Cache-Augmented Generation is all you need

  3. 100% open-source Computer Use agent

  4. Run LLMs locally with function calling in LM Studio

  5. Use multiple LLM backends in a single project with a unified API

& so much more!

Read time: 3 mins

AI Tutorials

The demand for AI-powered data visualization tools is surging as businesses seek faster, more intuitive ways to understand their data. We can tap into this growing market by building our own AI-powered visualization tools that integrate seamlessly with existing data workflows.

In this tutorial, we'll build an AI Data Visualization Agent using Together AI's powerful language models and E2B's secure code execution environment. This agent will understand natural language queries about your data and automatically generate appropriate visualizations, making data exploration intuitive and efficient.

E2B is an open-source infrastructure that provides secure sandboxed environments for running AI-generated code. Using E2B's Python SDK, we can safely execute code generated by language models, making it perfect for creating an AI-powered data visualization tool.
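A minimal sketch of that loop, assuming the `e2b_code_interpreter` package and an `E2B_API_KEY` in the environment (the `extract_code` helper and the sample reply are our own illustration, not the tutorial's exact code):

```python
import re

def extract_code(llm_reply: str) -> str:
    """Pull the first fenced Python block out of an LLM reply."""
    match = re.search(r"```(?:python)?\n(.*?)```", llm_reply, re.DOTALL)
    return match.group(1).strip() if match else llm_reply.strip()

def run_in_sandbox(code: str):
    """Execute LLM-generated code in an isolated E2B sandbox.

    Requires E2B_API_KEY and network access, so the import is kept
    local to this function; the helper above stays pure and testable.
    """
    from e2b_code_interpreter import Sandbox  # pip install e2b-code-interpreter
    with Sandbox() as sandbox:
        execution = sandbox.run_code(code)
        return execution.results  # charts come back as serialized images

if __name__ == "__main__":
    reply = "Here is your chart:\n```python\nprint('hello')\n```"
    print(extract_code(reply))
```

The language model never touches your machine directly: whatever code it writes is extracted, shipped to the sandbox, and only the results (text, tables, rendered charts) come back.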

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

AutoChain is a lightweight, extensible framework for building and evaluating AI agents that reduces complexity compared to LangChain and AutoGPT. With just two layers of abstraction, it helps you build and iterate on agent implementations quickly.

The framework introduces automated multi-turn conversation evaluation, where AI-simulated users test agents under different scenarios, eliminating the need for manual verification. AutoChain also integrates with various memory systems.

Key Highlights:

  1. Simplified Architecture - Built around two core components: Chain for orchestration and Agent for decision-making. This reduced abstraction makes the codebase easier to follow and modify while still maintaining essential agent functionality.

  2. Automated Testing Framework - Evaluates agents through AI-simulated conversations, where one model acts as a test user with specific contexts and goals, while another assesses if the conversation achieved the intended outcome. This reduces the manual effort needed for agent evaluation.

  3. Multi-Framework Support - Compatible with evaluating agents built on different frameworks including LangChain and those using OpenAI's function calling.

  4. Memory System - Supports multiple memory implementations including buffer memory for experiments, long-term memory with vector databases, and Redis for distributed setups.
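To make the two-layer split concrete, here is a conceptual sketch in plain Python (not AutoChain's actual API): the Chain owns the loop, memory, and tool dispatch, while the Agent only maps conversation state to the next action. A real agent would call an LLM inside `decide`; we key off the text instead.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Decision layer: maps conversation state to the next action."""
    def decide(self, messages: list) -> dict:
        last = messages[-1]["content"] if messages else ""
        # Stand-in for an LLM call: route weather questions to a tool.
        if "weather" in last.lower():
            return {"type": "tool", "name": "get_weather"}
        return {"type": "reply", "content": f"You said: {last}"}

@dataclass
class Chain:
    """Orchestration layer: runs the loop, memory, and tool dispatch."""
    agent: Agent
    memory: list = field(default_factory=list)

    def run(self, user_input: str) -> str:
        self.memory.append({"role": "user", "content": user_input})
        action = self.agent.decide(self.memory)
        if action["type"] == "tool":
            result = f"[ran tool {action['name']}]"
        else:
            result = action["content"]
        self.memory.append({"role": "assistant", "content": result})
        return result
```

With only these two abstractions, swapping the decision logic or the memory backend means touching one class, which is the maintainability argument AutoChain makes against deeper framework stacks.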

Discover 100 Game-Changing Side Hustles for 2024

In today's economy, relying on a single income stream isn't enough. Our expertly curated database gives you everything you need to launch your perfect side hustle.

  • Explore vetted opportunities requiring minimal startup costs

  • Get detailed breakdowns of required skills and time investment

  • Compare potential earnings across different industries

  • Access step-by-step launch guides for each opportunity

  • Find side hustles that match your current skills

Ready to transform your income?

Finding the right balance between performance and complexity in RAG systems remains a persistent challenge. Traditional RAG pipelines, while powerful, come with significant latency during retrieval, can sometimes fetch irrelevant documents, and require careful maintenance of multiple components - all of which impact the end-user experience.

Enter Cache-Augmented Generation (CAG), which uses long-context LLMs by preloading documents and precomputing the key-value cache. Unlike simply dumping documents into a large-context LLM's prompt, CAG preprocesses the information and stores it in an optimized format that the model can access much more efficiently. You can implement this using this Python framework that supports popular LLMs like Llama 3.1.

Key Highlights:

  1. Performance Gains - Eliminates retrieval latency by preloading documents and precomputing KV cache, resulting in significantly faster inference times compared to traditional RAG. Experiments show superior BERTScore metrics in most test scenarios.

  2. Simple Architecture - Removes the complexity of managing separate retrieval pipelines and vector stores. The entire knowledge base is loaded directly into the LLM's context, reducing system overhead and maintenance requirements.

  3. Open-Source Framework - Includes Python scripts for both RAG and CAG experiments, with support for different datasets (SQuAD, HotpotQA) and models. You can easily configure parameters like knowledge size, paragraph limits, and similarity metrics.

  4. Practical Use Cases - Best suited for applications with limited, manageable knowledge bases that fit within the LLM's context window. Particularly effective when working with datasets up to 100,000 tokens, making it ideal for domain-specific applications and targeted knowledge tasks.
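The preload-once, query-many pattern can be illustrated with a toy stand-in (this is not the paper's framework; a real implementation stores the transformer's past key/value tensors, e.g. via `use_cache=True` in Hugging Face transformers, rather than a word list). The point the counter makes is that the expensive pass over the knowledge base happens exactly once, no matter how many queries follow:

```python
class CachedContext:
    """Toy illustration of Cache-Augmented Generation's structure.

    `preload` stands in for one forward pass over the whole knowledge
    base (the KV-cache precomputation); `answer` reuses that cache with
    no retrieval step and no re-encoding of the documents.
    """
    def __init__(self):
        self.cache = None
        self.preprocess_calls = 0  # counts the expensive step

    def preload(self, documents: list) -> None:
        self.preprocess_calls += 1
        # Stand-in for computing and storing the KV cache.
        self.cache = " ".join(documents).split()

    def answer(self, query: str) -> str:
        assert self.cache is not None, "call preload() first"
        # Every query reads the precomputed cache directly.
        hits = [w for w in self.cache if w.lower() in query.lower()]
        return hits[0] if hits else "no match"
```

Contrast with RAG, where every query pays for embedding, vector search, and context assembly; here those per-query costs collapse into a single upfront preload, which is why the approach only pays off when the knowledge base fits in the context window.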

Quick Bites

E2B has released "Open Computer Use," an open-source AI agent that can use a virtual desktop environment. The agent uses a secure cloud Linux computer, powered by E2B's Desktop Sandbox. It employs Llama 3.2, Llama 3.3, and OS-Atlas (a foundation-action model), and can execute tasks via keyboard, mouse, and shell commands, while live-streaming the sandbox display. The project, fully open-source, is available on GitHub.

Microsoft has fully open-sourced the Phi-4 model, announced last month. The model is available under the MIT license, and its weights can be downloaded from Hugging Face. You can run the model locally using the Hugging Face Transformers library, as well as through platforms like LM Studio and Ollama.

LM Studio's latest version introduces function calling through its OpenAI-compatibility API, letting you use local models for tool use with existing OpenAI-compatible frameworks. The update also adds vision support for new vision-input models like the Qwen2VL family and Qwen/QVQ, available in both the MLX and llama.cpp engines.
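A short sketch of what that looks like from Python, assuming the `openai` package and LM Studio's default local server address (`http://localhost:1234/v1`); the model id and the weather tool are placeholders, not part of LM Studio's API:

```python
# Standard OpenAI-style tool schema; the local model fills in the arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def ask(prompt: str, base_url: str = "http://localhost:1234/v1"):
    """Send a prompt to the local model with the tool available.

    Requires a running LM Studio server; the import stays local so the
    tool schema above can be used and inspected without it.
    """
    from openai import OpenAI  # pip install openai
    client = OpenAI(base_url=base_url, api_key="lm-studio")  # key is unused locally
    return client.chat.completions.create(
        model="local-model",  # placeholder: use the model id loaded in LM Studio
        messages=[{"role": "user", "content": prompt}],
        tools=tools,
    )
```

Because the request shape is identical to OpenAI's, agent frameworks that already speak the OpenAI API should only need the `base_url` swapped to target the local model.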

Tools of the Trade

  1. RLLM: Rust library that lets you use multiple LLM backends in a single project: OpenAI, Anthropic, Ollama, DeepSeek, xAI, and Phind. With a unified API and builder style - similar to the Stripe experience - you can easily create chat or text completion requests without multiplying structures and crates.

  2. GPT Crawler: Crawls websites and generates knowledge files to create custom GPTs or OpenAI assistants from URLs. It scans web pages according to specified configuration settings and compiles the extracted content into a JSON file that can be uploaded to OpenAI's platform.

  3. Keet: Creates APIs for websites that don't have them, allowing you to programmatically interact with those sites on behalf of your users. It handles authentication and session management so you can perform actions (like form submissions or data extraction) through API calls instead of browser automation.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. agent frameworks are useless (why are there so many) ~
    anton

  2. year is 2025. NVIDIA is worth 5 trillion dollars. agi is real. elon musk is named first citizen, imperator of Britannia. wyd in this situation ~
    roon

That’s all for today! See you tomorrow with more such AI-filled content.


Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
