Memory Recall in AI Agents

PLUS: Rewrite-Retrieve-Read RAG framework, LLM compression with 100%+ accuracy recovery

Today’s top AI Highlights:

  1. Open-source framework that brings temporal memory recall to AI agents

  2. Improve RAG systems by rewriting search queries before retrieval

  3. Llama.cpp-compatible model compression with 100%+ accuracy recovery

  4. Google’s whitepaper on AI agents

  5. One serverless API to call 41 LLM APIs out of the box

& so much more!

Read time: 3 mins

AI Tutorials

Data analysis often requires complex SQL queries and deep technical knowledge, creating a barrier for many who need quick insights from their data. What if we could make data analysis as simple as having a conversation?

In this tutorial, we'll build an AI Data Analysis Agent that lets users analyze CSV and Excel files using natural language queries. Powered by GPT-4o and DuckDB, this tool translates plain English questions into SQL queries, making data analysis accessible to everyone – no SQL expertise required.

We're using Phidata, a framework specifically designed for building and orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.
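
If you just want the shape of the code before diving into the tutorial, here's a minimal sketch of the idea using Phidata's DuckDbAgent. Treat it as a sketch, not the tutorial's exact code: the file path, table description, and question are placeholders, and the precise arguments may differ from the version in the tutorial.

```python
import json

from phi.agent.duckdb import DuckDbAgent
from phi.model.openai import OpenAIChat

# Describe the dataset so the agent knows what it can query.
# The path and column semantics here are placeholders for your own data.
semantic_model = {
    "tables": [
        {
            "name": "sales",
            "description": "Monthly sales records loaded from a CSV file.",
            "path": "data/sales.csv",
        }
    ]
}

# DuckDbAgent translates natural-language questions into SQL,
# runs them against DuckDB, and summarizes the results.
agent = DuckDbAgent(
    model=OpenAIChat(id="gpt-4o"),
    semantic_model=json.dumps(semantic_model),
    markdown=True,
)

# Ask a plain-English question; the agent writes and executes the SQL.
agent.print_response("What were the top 3 months by total revenue?")
```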

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Memora brings human-like memory capabilities to AI agents, enabling them to remember and learn from past interactions. Built with Neo4j and Qdrant databases at its core, it automatically extracts, stores, and recalls relevant information during conversations.

The framework handles everything from temporal memory tracking to name placeholders, and integrates with popular LLM providers through a clean API. You can start with SQLite for local testing and seamlessly move to production using the same codebase.

Key Highlights:

  1. Memory Management Made Simple - Build memory-aware AI assistants with just a few lines of code. The system automatically extracts relevant information from conversations, maintains context history, and intelligently recalls memories when needed. The async-first design ensures optimal performance even at scale (a toy sketch of this recall-respond-store loop follows the list below).

  2. Database Flexibility and Integration - Native support for both Neo4j (graph database) and Qdrant (vector database) with built-in multi-tenancy. Uses hybrid search combining dense and sparse embeddings for more accurate memory retrieval. The modular architecture lets you add custom database implementations when needed.

  3. Framework-Friendly Development - Works seamlessly with any LLM provider through a unified interface. Includes pre-built integrations for OpenAI, Azure, Groq, and Together AI. Compatible with popular frameworks like LangChain and AutoGen, making it easy to incorporate into existing projects.

  4. Production-Ready Architecture - Built for scale with strategic indexes and constraints. Handles millions of users and interactions while maintaining fast query performance. Includes comprehensive memory management features like version tracking, memory updates, and flexible search scopes to support real-world applications.
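
To make the first highlight concrete, here's a toy sketch of the loop a memory-aware agent runs on every turn: recall relevant memories, respond with them in context, then store the new exchange. Important caveat: MemoryStore below is a minimal in-memory stand-in we wrote for illustration, not Memora's actual API; the real client is backed by Neo4j and Qdrant and does LLM-driven extraction and hybrid retrieval.

```python
import asyncio

class MemoryStore:
    """Toy stand-in for Memora's store; the real one uses Neo4j + Qdrant."""

    def __init__(self):
        self._memories: dict[str, list[str]] = {}

    async def recall(self, user_id: str, message: str) -> list[str]:
        # Real system: hybrid dense + sparse vector search, scoped per tenant.
        return self._memories.get(user_id, [])

    async def remember(self, user_id: str, fact: str) -> None:
        # Real system: LLM-driven extraction, temporal tracking, versioning.
        self._memories.setdefault(user_id, []).append(fact)

async def chat_turn(store: MemoryStore, user_id: str, message: str) -> str:
    memories = await store.recall(user_id, message)         # 1. recall first
    context = "\n".join(memories) or "(no memories yet)"
    # 2. respond; a real agent would call an LLM with the recalled context.
    reply = f"[LLM reply to {message!r} given context: {context}]"
    await store.remember(user_id, f"user said: {message}")  # 3. store the turn
    return reply

async def main():
    store = MemoryStore()
    print(await chat_turn(store, "alice", "My cat is named Miso."))
    print(await chat_turn(store, "alice", "What is my cat's name?"))

asyncio.run(main())
```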

RAG systems often struggle when retrieving information because they don't fully understand the user’s intended query. Standard methods typically use a user's input verbatim, which is not always the best way to fetch the needed context.

The "Rewrite-Retrieve-Read" (RRR) framework addresses this head-on. It intelligently rewrites a user's input into a more effective search query before the retrieval process. This improvement directly makes the overall system more accurate and responsive. There is an open-source implementation of this, which provides a working pipeline you can modify and use right away.

Key Highlights:

  1. Query Optimization - RRR explicitly uses an LLM to rewrite input queries into more effective versions before retrieval, which leads to better search results from the get-go and reduces the need for complex prompts or fine-tuning. You can use either a static LLM or a trainable model like T5 as the query rewriter.

  2. Adaptable to Black-Box LLMs - RRR is designed with black-box LLMs in mind: you don't need access to the model's weights or architecture. When all you have is API access, a static LLM like ChatGPT can serve as the rewriter, integrated as a modular step in front of your existing retrieval system.

  3. Reinforcement-Learning-Tunable Rewriter - The RRR framework also lets you make the query rewriter a trainable component. Using PPO, you can fine-tune it on feedback from the LLM reader, optimizing for your specific retrieval task with minimal resources.

  4. Open-source Implementation - Athina AI's fully functional open-source implementation gives you hands-on code to test the framework quickly; you can modify it by swapping in different models and vector databases. It also integrates with Athina's easy-to-use evaluation platform.
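
To make the three steps concrete, here's a minimal sketch of the pipeline using the OpenAI API. The prompts and the toy retriever are our own assumptions, not Athina's code; their implementation plugs in a real vector database and evaluation hooks.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rewrite(query: str) -> str:
    """Step 1: ask a frozen LLM to turn raw input into a better search query."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's question as a concise search query "
                        "that will retrieve the passages needed to answer it. "
                        "Return only the query."},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content.strip()

def retrieve(search_query: str) -> list[str]:
    """Step 2: stand-in retriever; swap in your vector DB or web search here."""
    return ["<passage fetched for: " + search_query + ">"]

def read(query: str, passages: list[str]) -> str:
    """Step 3: the reader LLM answers using only the retrieved context."""
    context = "\n\n".join(passages)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the context provided."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content

question = "Who won the cup the year the iPhone came out?"
print(read(question, retrieve(rewrite(question))))
```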

Quick Bites

Nexa AI has introduced NexaQuant, a new compression technique for LLMs that reduces model size by up to 73% while recovering 100%+ of the original FP16 model's accuracy. It is compatible with llama.cpp. Benchmarks show a Llama 3.2 1B model compressed to 730MB with 1.38GB RAM usage, all while exceeding the original model's performance. The technique supports multimodal models and runs efficiently on devices from phones to desktops, with sub-second startup times and fast token generation.
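
Since the compressed models ship as llama.cpp-compatible files, they should load like any other GGUF. A minimal sketch with llama-cpp-python, where the model filename is a placeholder for whatever you actually download from Nexa AI:

```python
from llama_cpp import Llama

# Load the compressed GGUF; the path below is a placeholder, not a real file.
llm = Llama(model_path="llama-3.2-1b-nexaquant.gguf", n_ctx=2048)

# Run a quick completion to sanity-check the model.
out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])
```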

Google has released a whitepaper on AI agents - a deep dive into how agents extend LLMs' capabilities through tools, reasoning frameworks, and external data access. The paper emphasizes practical architecture patterns and implementation approaches, drawing parallels between agent operation and human problem-solving to illustrate key concepts:

  • The orchestration layer is crucial: Google details how frameworks like ReAct, Chain-of-Thought, and Tree-of-Thoughts can be implemented to structure agent reasoning and decision-making processes.

  • Tools come in three flavors: Extensions (agent-side API execution), Functions (client-side execution with more developer control), and Data Stores (for RAG and external knowledge access) - each serving distinct architectural needs (see the sketch after this list for the Functions pattern).

  • Production readiness: The paper concludes with practical examples using LangChain and shows how to scale agent architectures using Vertex AI's managed environment, providing a clear path from prototype to production.
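
To illustrate the "Functions" flavor, here's a sketch using OpenAI-style tool calling as a stand-in for the Vertex AI equivalent the paper describes. The point of the pattern: the model only selects the function and proposes arguments; execution stays on the client, under the developer's control. The weather function and schema are our own toy example.

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Runs on our side; the model never touches the network or our keys.
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

# Declare the function's schema so the model knows what it can ask for.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# Assumes the model chose to call the tool (a robust agent would check first).
call = resp.choices[0].message.tool_calls[0]  # model picked a function
args = json.loads(call.function.arguments)    # and proposed arguments
print(get_weather(**args))                    # we decide whether to run it
```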

Tools of the Trade

  1. Devpilot: Connects companies with developers by evaluating their technical skills through automated tests rather than traditional résumés. It includes features like AI-assisted recruitment, project management, and automated skill assessments.

  2. Gitee AI: Provides access to 41 different AI models (for text, image, and speech) through simple API calls, without managing servers. Users purchase resource packages to use the models and access them with authentication tokens.

  3. Micro Agent: A command-line AI agent that writes and fixes code by generating test cases and iterating on the code until the tests pass. It can also match visual designs through screenshot comparison using models like Claude.

  4. Awesome LLM Apps: A curated collection of LLM apps built with RAG and AI agents that interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. The biggest issue for LLM coding assistants is that by default, they try to solve problems by adding 20 new lines of code; works great for a while, but after 50+ edits your codebase becomes a superfund site ~
    Tom Dorr

  2. AI won’t replace programmers, but rather make it easier for programmers to replace everyone else. ~
    Naval Ravikant

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
