GraphRAG with Chain-of-Thoughts

PLUS: User profile-based memory layer, OpenAI Operator-like automations locally

Today’s top AI Highlights:

  1. Chain-of-thought graph-based memory with a single LLM loop

  2. First open-source user profile-based memory layer for AI agents & apps

  3. Run OpenAI Operator-like automation workflows locally

  4. Open-source DeepSearch that searches, reads, and reasons until the best answer is found

  5. Run secure, isolated environments on Apple Silicon for Computer Use AI agents

& so much more!

Read time: 3 mins

AI Tutorials

Building powerful AI applications that can reason over documents while maintaining data privacy is a critical need for many organizations. However, most solutions require cloud connectivity and can't operate in air-gapped environments.

In this tutorial, we'll create a powerful reasoning agent that combines local DeepSeek models with RAG capabilities. It operates in two modes: a simple local chat mode and an advanced RAG mode powered by DeepSeek R1.

  1. Local Chat Mode - Direct interaction with DeepSeek models running locally, perfect for general queries and conversations (see the sketch after this list).

  2. RAG Mode - Enhanced reasoning with document processing, vector search, and optional web search integration for comprehensive information retrieval.
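
Here's a minimal sketch of the local chat mode, assuming DeepSeek R1 has been pulled through Ollama (`ollama pull deepseek-r1`) and the `ollama` Python package is installed:

```python
# Minimal sketch of the local chat mode, assuming DeepSeek R1 is served
# locally via Ollama (run `ollama pull deepseek-r1` first).
import ollama

def local_chat(prompt: str) -> str:
    # Single-turn query against the locally running DeepSeek model.
    response = ollama.chat(
        model="deepseek-r1",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

print(local_chat("Explain retrieval-augmented generation in two sentences."))
```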

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Yohei Nakajima introduces Graphista, a new open-source chain-of-thought GraphRAG prototype built around two simple loops, one for ingesting data and one for querying it. Each is a single LLM loop with tools, so one architecture handles both data ingestion and query operations.

The system identifies existing nodes and updates them instead of creating duplicates, with the reasoning process handled by the LLM loop rather than hardcoded logic. Graphista currently supports a local JSON backend, with experimental support for Neo4j and FalkorDB. Its lightweight design makes it particularly appealing for prototyping graph-based memory systems.

Key Highlights:

  1. Data Management - The ingest() function uses specialized LLM tools to automatically process incoming text, handling deduplication and updates to existing nodes. Rather than creating new nodes for every piece of information, it intelligently identifies and updates existing ones, keeping your knowledge graph clean and efficient.

  2. Natural Language Querying - The ask() function enables natural language questions with detailed chain-of-thought reasoning. It leverages the SmartRetrievalTool to traverse the graph, find relevant information, and provide comprehensive answers based on the stored knowledge, making it easier to extract insights from your data (a usage sketch follows these highlights).

  3. Integrations - Quick setup with a unified Memory class that brings together your graph database, ontology, and LLM integration. The system supports multiple backend options (local JSON, Neo4j, FalkorDB) and includes pre-built components for tasks like entity extraction and relationship mapping, reducing development overhead.

  4. Query Capabilities - Beyond basic natural language queries, you can perform hybrid vector and property searches, execute custom queries via Query objects, and use keyword retrieval. The system supports both simple lookups and complex graph traversals, giving you full control over how you access your data.
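
For a feel of the API, here is an illustrative sketch. `Memory`, `ingest()`, and `ask()` are the names Graphista describes, but the import path and constructor arguments below are assumptions, so check the repo before copying:

```python
# Illustrative sketch only: Memory, ingest() and ask() are named in
# Graphista's description; the import path and constructor arguments
# here are assumptions.
from graphista import Memory  # assumed import path

# Unified entry point bundling the graph backend, ontology, and LLM.
memory = Memory(backend="local_json", path="./graph.json")  # assumed args

# Ingestion loop: the LLM deduplicates against existing nodes instead
# of creating a new node for every piece of information.
memory.ingest("Yohei Nakajima released Graphista, a GraphRAG prototype.")
memory.ingest("Graphista supports local JSON, Neo4j, and FalkorDB backends.")

# Query loop: chain-of-thought traversal over the graph via the LLM.
print(memory.ask("Which storage backends does Graphista support?"))
```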

Profile-Based Long-Term Memory for AI Agents 🧑‍🧑‍🧒‍🧒💡

Memobase introduces a user profile-based memory system that helps AI agents and applications remember and understand user interactions over time. Built to handle millions of users, it maintains dynamic profiles that update automatically as users interact with your AI.

You can integrate this memory layer into your AI agents with just a few lines of code to deliver personalized experiences without managing complex memory architectures. The system works by extracting meaningful insights from conversations while maintaining structured profiles, making it particularly valuable for applications where user context and history matter, such as education and personal companionship.

Key Highlights:

  1. Integration with Popular Tools - Add memory capabilities to OpenAI API calls by simply passing a user_id parameter. Compatible with Python, Node.js, and Go SDKs, plus direct API access. Integration requires minimal code changes to your existing LLM stack, making it easy to enhance current applications (see the sketch after this list).

  2. Cost Efficiency - Uses a buffer zone to aggregate recent messages before processing them together, reducing LLM API costs. The buffer automatically flushes when full or after a set idle time, with manual flush options for specific use cases like end-of-session processing.

  3. Customizable Profile Structure - Define exactly what user information your AI captures through configurable profile slots. Each profile can track specific attributes like preferences, behaviors, or domain-specific information, giving you control over memory organization while maintaining predictable structure.

  4. Production-Ready Performance - Handles profile updates asynchronously to minimize latency in your application's main flow. Offers both cloud hosting and self-hosted options (open-source) with Docker. Includes built-in support for scaling to millions of users.
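
Here's a hedged sketch of the OpenAI-style integration in Python; treat the `openai_memory` helper and the import paths as assumptions drawn from Memobase's docs and verify them against the repo:

```python
# Sketch of Memobase's OpenAI patch; the openai_memory helper and the
# import paths are assumptions based on Memobase's docs.
from openai import OpenAI
from memobase import MemoBaseClient
from memobase.patch.openai import openai_memory  # assumed module path

openai_client = OpenAI()
mb_client = MemoBaseClient(
    project_url="http://localhost:8019",  # self-hosted endpoint (assumed)
    api_key="secret",
)

# Wrap the OpenAI client so completions read from and write to profiles.
client = openai_memory(openai_client, mb_client)

# The only change versus a plain call: a user_id keying the memory profile.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Remember that I teach high-school math."}],
    user_id="user-123",
)
print(response.choices[0].message.content)
```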

Quick Bites

Mistral AI has released Saba, a 24B parameter model designed for Middle Eastern and South Asian languages, with robust capabilities in Arabic and Indian languages, particularly excelling in Tamil and Malayalam. The model, which runs at speeds exceeding 150 tokens per second on single-GPU systems, delivers more accurate and culturally nuanced responses than models 5x its size. It is available both as an API and for local deployment on customers' own premises.
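
To try Saba over the API, a call through Mistral's Python SDK might look like the sketch below; the model id `mistral-saba-latest` is an assumption based on Mistral's usual naming, so confirm it against their model list:

```python
# Hedged sketch of calling Saba via Mistral's Python SDK (mistralai v1).
# The model id "mistral-saba-latest" is an assumption.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-saba-latest",  # assumed model id; check Mistral's docs
    messages=[{"role": "user", "content": "Introduce yourself in Tamil."}],
)
print(response.choices[0].message.content)
```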

Aident AI has released Open-CUAK, an open-source alternative to OpenAI's Operator, to build and run browser automation workflows locally using any vision-compatible frontier model. The name stands for Computer Use Agent Kit (pronounced "quack"). It eliminates the $200 monthly fee while offering features like vision-based automation, dedicated remote browsers, and bot-detection bypass.

Jina AI has introduced DeepSearch (jina-deepsearch-v1), an open-source agentic search system that iteratively searches, reads, and reasons until finding optimal answers, with full compatibility with the OpenAI Chat API schema. Unlike OpenAI and Google’s Deep Research agents focused on long-form research reports, DeepSearch specializes in delivering concise, accurate answers through its iterative process. It is now available for integration into local chat clients with 1M free tokens for new API keys.
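
Because DeepSearch follows the OpenAI Chat API schema, the standard OpenAI client can point at Jina's endpoint; the base URL below is taken from Jina's docs, but confirm it before relying on it:

```python
# DeepSearch speaks the OpenAI Chat API schema, so the stock OpenAI
# client works once pointed at Jina's endpoint (URL per Jina's docs).
from openai import OpenAI

client = OpenAI(
    base_url="https://deepsearch.jina.ai/v1",
    api_key="jina_...",  # your Jina API key; new keys include 1M free tokens
)

response = client.chat.completions.create(
    model="jina-deepsearch-v1",
    messages=[{"role": "user", "content": "What changed in the latest Node.js LTS release?"}],
)
print(response.choices[0].message.content)
```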

Google's Gemini app now has an infinite memory, letting it recall details from all your past conversations. You can seamlessly reference previous chats within Gemini, and it will factor that context into its responses. You have full control to view, edit, or delete the chat history that the model used. Available for Gemini Advanced users.

Nous Research has released DeepHermes-3 Preview, an 8B parameter LLM that combines traditional language model responses with a toggleable long-chain reasoning "deep thinking" mode. Activated by a specific system prompt, the model demonstrates significant improvements in mathematical reasoning when deep thinking is enabled, showing up to 50% gains on MATH benchmarks compared to its predecessor, while maintaining strong performance in general language tasks.
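
A quick sketch of toggling the deep-thinking mode through Transformers follows; the system prompt wording here is a placeholder (Nous publishes the exact prompt on the model card), and the model id should be double-checked there too:

```python
# Sketch of DeepHermes-3's toggleable "deep thinking" mode. The system
# prompt below is a placeholder; the exact wording is on Nous's model card.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="NousResearch/DeepHermes-3-Llama-3-8B-Preview",  # check model card
    device_map="auto",
)

DEEP_THINKING = (
    "You are a deep thinking AI. Reason step by step inside <think> tags "
    "before giving your final answer."  # placeholder prompt
)

messages = [
    {"role": "system", "content": DEEP_THINKING},
    {"role": "user", "content": "What is the sum of the first 50 odd numbers?"},
]

out = chat(messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])
```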

Tools of the Trade

  1. Cua: Create and run secure, isolated environments on Apple Silicon where AI agents can interact with desktop applications through a computer-use interface. It provides the infrastructure to run and automate multi-app workflows in sandboxed environments, supporting both local and cloud AI models to control these applications.

  2. Distributed Llama: Open-source tool to distribute LLM inference by connecting multiple home devices into a computing cluster, dividing the model's memory and processing requirements across nodes. At its core, it uses tensor parallelism and network synchronization to split LLMs (like Llama 3.x) across multiple devices to run models that would be too large for a single machine (a toy illustration of the idea follows this list).

  3. assistant-ui: Open-source TypeScript/React library for building AI chat interfaces. It provides customizable, composable components that handle features like streaming, tool calls, and accessibility, making it easy to connect AI chat functionality to your backend.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
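
As a toy illustration of the tensor parallelism behind Distributed Llama (item 2 above), the NumPy sketch below splits one layer's weight matrix column-wise across simulated devices; it is conceptual only, not the project's actual code:

```python
# Conceptual NumPy sketch of column-wise tensor parallelism: each "device"
# holds a shard of the weight matrix, computes its partial output, and a
# sync step concatenates the results. Not Distributed Llama's actual code.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))      # activations for one token
W = rng.standard_normal((512, 2048))   # one dense layer's weights

# Split the weights across 4 simulated devices.
shards = np.split(W, 4, axis=1)

# Each device multiplies the same input by its own shard...
partials = [x @ shard for shard in shards]

# ...and network synchronization stitches the partial outputs together.
y_parallel = np.concatenate(partials, axis=1)

assert np.allclose(y_parallel, x @ W)  # matches the single-device result
```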

Hot Takes

  1. ChatGPT has voice and video. Able to hold a conversation with you about what you are showing it on your phone.
    Tomorrow a new Grok is coming.
    A smarter model isn’t enough to get attention in the market anymore.
    So I bet we see new voices and video features too. That leads to Tesla integration and Optimus.
    Your robots need real world understanding.
    A smarter model is table stakes.
    It is the integration into the real world that matters.
    Which is what I will be watching for tomorrow night. ~ Robert Scoble


  2. at OpenAI, the whistleblowers warn about the models being too powerful/dangerous
    at xAI, the whistleblowers warn about the models being too dumb ~ James Campbell

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
