RAG with Long-term Memory

PLUS: Open-source AI layer on Playwright, Cohere's AI agents for enterprise

Shubham Saboo & Gargi Gupta
January 10, 2025

Today’s top AI Highlights:

Open-source AI layer on Playwright to build AI web agents easily
RAG framework that uses long-term memory for better knowledge discovery
Cohere’s platform to build and deploy AI agents for enterprise data
xAI releases Grok iOS app; free access to Grok 2 model
Run Python in your browser to turn messy documents into Markdown

& so much more!

Read time: 3 mins

AI Tutorials

The demand for AI-powered data visualization tools is surging as businesses seek faster, more intuitive ways to understand their data. We can tap into this growing market by building our own AI-powered visualization tools that integrate seamlessly with existing data workflows.

In this tutorial, we'll build an AI Data Visualization Agent using Together AI's powerful language models and E2B's secure code execution environment. This agent will understand natural language queries about your data and automatically generate appropriate visualizations, making data exploration intuitive and efficient.

E2B is open-source infrastructure that provides secure, sandboxed environments for running AI-generated code. With E2B's Python SDK, we can safely execute code generated by language models, which makes it a good fit for an AI-powered data visualization tool.
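
To make the agent loop a little more concrete, here is a minimal sketch of the step that sits between the model and the sandbox: building a prompt that describes the data, then pulling the generated Python out of the model's reply. The function names, the prompt wording, and the sample reply are all assumptions for illustration; the model call and the sandbox execution are deliberately left out.

```python
import re
from typing import List, Optional

# Built programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Matches the first fenced Python block in a model reply.
CODE_BLOCK = re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(columns: List[str], question: str) -> str:
    """Describe the dataset to the model and ask for plotting code (hypothetical wording)."""
    return (
        "You are a data visualization assistant.\n"
        f"The data is in a pandas DataFrame named df with columns: {', '.join(columns)}.\n"
        f"Reply with one fenced python block that answers: {question}"
    )


def extract_python_code(reply: str) -> Optional[str]:
    """Return the code inside the first fenced Python block, or None if there is none."""
    match = CODE_BLOCK.search(reply)
    return match.group(1).strip() if match else None


# Hand-written stand-in for a model reply; a real agent would get this from the LLM.
sample_reply = (
    "Here is a bar chart of revenue by region:\n"
    + FENCE + "python\n"
    + "df.groupby('region')['revenue'].sum().plot(kind='bar')\n"
    + FENCE + "\nLet me know if you want another view."
)

prompt = build_prompt(["region", "revenue"], "Which region earns the most?")
code = extract_python_code(sample_reply)
print(code)
```

In a full agent, the extracted string is what would be handed to the sandbox rather than run on your own machine, which is the safety point the tutorial makes about E2B.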

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build an AI Data Visualization Agent

Fully functional AI agent app (step-by-step instructions)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Next-Gen RAG with Long-Term Memory 🧠

MemoRAG, an open-source RAG framework, introduces a memory-driven approach to knowledge retrieval and generation. Built on a long-memory model, it moves beyond traditional RAG systems by developing a comprehensive global understanding of entire databases, handling up to 1 million tokens in a single context.

Unlike standard RAG that relies on explicit information matching, MemoRAG generates precise retrieval clues from its memory to enhance evidence discovery and response accuracy. The framework offers both a lightweight mode that runs on a single T4 GPU and a full version with expanded capabilities.
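
The difference between matching on the question alone and retrieving with memory-generated clues can be shown with a toy sketch. This is not MemoRAG's API: the word-overlap scoring, the function names, and the hand-written clue are all assumptions, and the clue generator stands in for what a long-memory model would produce.

```python
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


def retrieve(queries: List[str], chunks: List[str], top_k: int = 1) -> List[str]:
    """Rank chunks by how many words they share with the combined queries."""
    query_words = set().union(*(tokenize(q) for q in queries))
    ranked = sorted(chunks, key=lambda c: len(tokenize(c) & query_words), reverse=True)
    return ranked[:top_k]


def memory_guided_retrieve(
    question: str,
    chunks: List[str],
    generate_clues: Callable[[str], List[str]],
    top_k: int = 1,
) -> List[str]:
    """Ask the memory for clues first, then retrieve with the question plus clues."""
    clues = generate_clues(question)
    return retrieve([question] + clues, chunks, top_k)


chunks = [
    "Alice leads the platform team.",
    "The platform team works from the Lisbon office.",
    "Revenue grew 12 percent last year.",
]
question = "Where is Alice based?"


def toy_memory(q: str) -> List[str]:
    """Hypothetical stand-in for a memory model that has seen the whole corpus."""
    return ["platform team Lisbon office"]


plain = retrieve([question], chunks)
guided = memory_guided_retrieve(question, chunks, toy_memory)
print(plain)
print(guided)
```

The question shares only the word "Alice" with the corpus, so plain matching surfaces the chunk about her role, while the clue steers retrieval to the chunk that actually holds the location.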

Key Highlights:

Development Options - Start with MemoRAGLite using just a few lines of code on a 16GB GPU, or scale up to the full version handling million-token contexts. The framework supports quick prototyping with cached weights, reducing context processing time from 35 seconds to 1.5 seconds for a 200K-token context.
Optimized Memory Management - Implements caching for chunking, indexing, and encoding with up to 30x speedup in context pre-filling. The system supports context reuse, letting you encode long contexts once and reuse them across multiple queries, significantly reducing computational overhead.
Model Support - Seamlessly integrates with OpenAI and Deepseek APIs as generators, while supporting various memory models including Meta-Llama-3.1-8B and custom LLMs. The framework provides built-in compatibility with HuggingFace models and flexible generator options to match your needs.
Developer-Friendly Architecture - Offers independent usage of the Memory model for storing, recalling, and interacting with context, along with memory-augmented retrieval capabilities. The modular design lets you customize each component.
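
The context-reuse idea in the highlights above (encode a long context once, then answer many queries against it) can be sketched as a small cache keyed by a hash of the context. This is a generic illustration, not MemoRAG's implementation: the class, the sentence splitter standing in for the expensive encoding step, and the sample context are assumptions.

```python
import hashlib
from typing import Callable, Dict, List


class ContextCache:
    """Encode each distinct context once and reuse the result across queries."""

    def __init__(self, encode: Callable[[str], List[str]]) -> None:
        self._encode = encode
        self._store: Dict[str, List[str]] = {}
        self.encode_calls = 0  # counts how often the expensive step actually ran

    def get(self, context: str) -> List[str]:
        """Return the encoded context, computing it only on the first request."""
        key = hashlib.sha256(context.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.encode_calls += 1
            self._store[key] = self._encode(context)
        return self._store[key]


def chunk_by_sentence(context: str) -> List[str]:
    """Cheap stand-in for the costly chunking, indexing, and encoding work."""
    return [s.strip() for s in context.split(".") if s.strip()]


cache = ContextCache(chunk_by_sentence)
context = "A memory model reads the corpus. Clues guide retrieval. Caching avoids re-encoding."

for query in ["What reads the corpus?", "What guides retrieval?", "Why cache?"]:
    memory = cache.get(context)  # the same context is encoded only once

print(cache.encode_calls)  # 1
```

Hashing the context keeps the cache key small even when the context itself runs to hundreds of thousands of tokens.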

The future of presentations, powered by AI

Gamma is a modern alternative to slides, powered by AI. Create beautiful and engaging presentations in minutes. Try it free today.

Three Simple APIs to Build Web Agents 🌐 🔄

Stagehand brings AI-powered web automation to Playwright, offering three straightforward APIs - act, extract, and observe - to control browser actions using natural language commands. This open-source framework provides the fundamental tools and infrastructure for building web automations and web agents. You write simple natural language commands like "click the login button" or "extract the product price," and the AI handles the underlying Playwright implementation, adapting to UI changes and bypassing bot detection.

Stagehand makes it simpler to build automations and web agents, handling tasks like form filling, data extraction, and multi-step workflows. The framework maintains full interoperability with Playwright, allowing you to mix traditional selectors with AI-powered automation where needed, and providing the tools to build sophisticated agentic workflows.

Key Highlights:

Natural Language - Stagehand lets you use natural language instructions like act("click submit") or extract("product name") to control the browser, which reduces script complexity and relies on AI for element detection and action.
Modular Foundation - Stagehand's act, extract, and observe APIs provide the core of web interaction for higher-level agents or automations. They allow for self-healing by dynamically finding actionable elements and can bypass bot detection, making Stagehand a suitable base for web-based agents.
Model Integration - You can use OpenAI or Anthropic models, or even custom LLMs via an LLMClient. This flexibility, combined with several customization options in the constructor, lets you tailor the agent to your needs.
Smart Fallback - When traditional Playwright selectors fail, Stagehand's act() method serves as an intelligent fallback that can identify and interact with elements using natural language. The framework also includes vision-based processing that kicks in automatically if text-based element identification fails, allowing your agent to continue working.
Production-Ready Features - Built-in support for prompt caching reduces API costs, while the chunking system keeps context windows manageable for better reliability. Full compatibility with Browserbase provides capabilities like persistent sessions, custom contexts, and automated captcha-solving to deploy and run reliable agents.
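
The "Smart Fallback" pattern from the highlights above boils down to: try the fast, deterministic selector first, and hand over to a natural-language instruction only when it fails. Stagehand itself is a TypeScript library, so the Python sketch below is not its API. The exception type, both callables, and the fake page are hypothetical stand-ins that only show the control flow.

```python
from typing import Callable, List, Tuple


class ElementNotFound(Exception):
    """Hypothetical error raised when a selector matches nothing."""


def click_with_fallback(
    click_selector: Callable[[str], None],
    act: Callable[[str], None],
    selector: str,
    instruction: str,
) -> str:
    """Try the selector first; on failure, fall back to the natural-language action."""
    try:
        click_selector(selector)
        return "selector"
    except ElementNotFound:
        act(instruction)
        return "natural-language"


# Fake page for demonstration: it only knows one selector and records every call.
log: List[Tuple[str, str]] = []


def fake_click(selector: str) -> None:
    if selector != "#submit":
        raise ElementNotFound(selector)
    log.append(("click", selector))


def fake_act(instruction: str) -> None:
    log.append(("act", instruction))


first = click_with_fallback(fake_click, fake_act, "#submit", "click the submit button")
second = click_with_fallback(fake_click, fake_act, "#submit-v2", "click the submit button")
print(first, second)
print(log)
```

Keeping the selector path first means the model is only called when the page has changed, which is also where the cost savings from a fallback design would come from.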

Quick Bites

xAI has launched a standalone iOS beta app for Grok, offering free access to the Grok 2 model. The app features image generation and real-time information access via X and the web.

Cohere has launched North, a secure AI workspace platform that combines LLMs, multimodal search, and AI agents in a vertically integrated stack. The platform lets you create and deploy custom AI agents that can access enterprise data across various formats and applications, with Cohere's RAG engine (Compass) and Command R LLM powering the core functionality.

You can quickly deploy AI agents across different business functions, with the platform handling data indexing, search, and generation through its integrated technology stack.

Qwen has launched Qwen Chat (chat.qwenlm.ai), a web UI for interacting with its models. You can chat with various Qwen models, including the flagship Qwen2.5-Plus, and explore vision-language and reasoning capabilities. Key features include model comparison, document uploads, and HTML preview, with web search and image generation coming soon.

Tools of the Trade

Flows: Build AI workflows in a Colab notebook-style UI by connecting modular blocks like LLM calls, API calls, code execution, and RAG systems. It provides both testing capabilities for large datasets and production deployment features. You can get started quickly with community-made templates.
Weco AI Functions: Converts natural language descriptions into API endpoints with structured outputs, allowing you to use AI capabilities as regular Python functions or REST API calls. It provides features like schema definition, version control, A/B testing, and monitoring while handling the underlying AI implementation.
Office File to Markdown: Browser-based tool that converts files (Office documents, PDFs, images, and audio) into Markdown format. It runs entirely in the browser using WebAssembly technology, specifically using Pyodide to execute Python code to handle file conversion.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

Microsoft has never made a good product. Windows is absolutely dogshit - its only use case is to play video games but even then just get a playstation. Github is kinda cool but they didn't make that. HoloLens sucked. At one point, Microsoft did sponsor me and sent me their best laptop + headphones. They fucking sucked. Also who uses Azure? ~ Avi Schiffmann

funny thing that low code tools have revealed is that there are very few people outside the engineering team who are capable of automating or connecting business flows even when you remove the 'learn to code' hurdle ~ Gabriel

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads | Facebook

Awesome LLM Apps | Sponsor Us

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
