Memory Layer for AI Agents

PLUS: Opensource implementation of OpenAI o1, OCR with Llama 3.2 vision in 4 lines of code

Today’s top AI Highlights:

  1. Build AI agents with memory that learn continuously from user interaction

  2. Opensource implementation of OpenAI o1 reasoning architecture

  3. PhD-level o1-preview goes rookie in this new Math benchmark

  4. Download LLMs from Hugging Face with one simple command using LM Studio CLI tool

  5. Set up OCR with Llama 3.2 vision in just 4 lines of code

& so much more!

Read time: 3 mins

AI Tutorials

The xAI API is finally here with the new grok-beta model. The model comes with a 128k-token context window and function-calling support. And until the end of 2024, you even get $25 in free credits per month!

We just couldn’t resist building something with this model, so here it is! We are building an AI Finance Agent that provides current stock prices, fundamental data, and analyst recommendations for stocks, and searches for the latest news about a company to give a holistic picture. A quick sketch of the setup follows the list below.

It uses:

  • xAI's Grok for analyzing and summarizing information

  • Phidata for agent orchestration

  • YFinance for real-time stock data

  • DuckDuckGo for web search
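
Here’s a minimal sketch of how the pieces fit together. It assumes Phidata’s Agent API with its xAI model class and the YFinance/DuckDuckGo tool wrappers; exact import paths and parameter names may differ across Phidata versions.

```python
# Minimal sketch of the finance agent (assumed Phidata APIs; verify
# import paths and parameters against your installed Phidata version).
from phi.agent import Agent
from phi.model.xai import xAI
from phi.tools.yfinance import YFinanceTools
from phi.tools.duckduckgo import DuckDuckGo

agent = Agent(
    model=xAI(id="grok-beta"),  # expects XAI_API_KEY in your environment
    tools=[
        # Stock prices, fundamentals, and analyst recommendations
        YFinanceTools(stock_price=True, analyst_recommendations=True,
                      company_info=True, company_news=True),
        DuckDuckGo(),  # web search for the latest company news
    ],
    show_tool_calls=True,
    markdown=True,
)

agent.print_response(
    "Give me a holistic picture of NVDA: current price, fundamentals, "
    "analyst recommendations, and recent news."
)
```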

We share hands-on tutorials like this 2-3 times a week to help you build practical AI skills. If you're serious about levelling up and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Zep is a memory layer for AI assistants and agents. With Zep, you can build AI agents that continually learn from user interactions. The opensource framework stores chat history, extracts important facts, and keeps track of how information changes - all through a knowledge graph that you can query using simple API calls.

Zep automatically builds and updates its knowledge base as users interact with the AI, making it easier for you to create agents that genuinely remember and learn from past interactions.

Key Highlights:

  1. Knowledge Management - Drop in your conversation data or JSON and Zep does the heavy lifting - it automatically extracts facts, tracks relationships, and maintains context history. No need to write complex graph queries or manually curate what information to store and update.

  2. Fast Retrieval - Zep's asynchronous precomputation of facts and summaries ensures quick retrieval times, minimizing latency in agent responses. This speed is primarily linked to the performance of your embedding service, giving you direct control over a critical performance factor.

  3. Flexible Integration - Zep is framework-agnostic, readily integrating with LangChain, LangGraph, Autogen, Chainlit, and more. This adaptability reduces integration friction and allows you to incorporate Zep into your existing projects without major overhauls.

  4. Simple Memory Management - Zep provides high-level APIs for streamlined memory management, as well as lower-level APIs for more granular control over search and CRUD operations. This caters to both quick prototyping and complex memory manipulation needs.

  5. Get Started Quickly - Install with pip or npm, configure with your API key, and start adding memory to your agents with just a few lines of code (see the sketch below). The docs include ready-to-use examples for various frameworks and custom integrations.
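
Here’s a hedged sketch of what those few lines can look like with Zep’s Python SDK (the zep-cloud package); the client and method names are assumptions that may vary between SDK versions.

```python
# Hedged sketch of adding and retrieving agent memory with Zep's
# Python SDK; verify class/method names against the zep-cloud docs.
from zep_cloud.client import Zep
from zep_cloud.types import Message

client = Zep(api_key="YOUR_ZEP_API_KEY")  # placeholder key
session_id = "user-42-session-1"

# Persist a conversation turn; Zep extracts facts into its knowledge
# graph asynchronously, so writes don't block your agent loop.
client.memory.add(
    session_id=session_id,
    messages=[
        Message(role_type="user", content="I moved to Berlin last month."),
        Message(role_type="assistant", content="Congrats! How is the move going?"),
    ],
)

# Later, pull back the precomputed facts and summary for the next prompt.
memory = client.memory.get(session_id=session_id)
print(memory.relevant_facts)
```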

Since OpenAI released the o1 models, the opensource community has been making efforts to reproduce the model’s advanced reasoning capabilities. Here’s another attempt at it.

Steiner is an opensource model built on Qwen2.5-32B that can explore multiple reasoning paths during inference and autonomously verify or backtrack when needed. Through a three-stage training process involving synthetic data and reinforcement learning, Steiner achieves notable improvements without requiring Chain-of-Thought prompting, though it hasn't yet matched o1's inference-time scaling behavior. The models are available to download now on Hugging Face.

Key Highlights:

  1. Dynamic Reasoning - Steiner uses a unique "linear traversal of an implicit search tree" during inference. This method helps maintain the full context of the reasoning process and allows for exploring more possibilities than traditional methods, dynamically adjusting its approach based on previous steps.

  2. Smart Training Data - The model is trained on synthetically generated data that includes backtracking steps. This data is structured as directed acyclic graphs (DAGs) representing reasoning steps and their relationships. This approach differs from standard chain-of-thought datasets and enables the model to learn how to self-correct and explore alternate reasoning paths.

  3. Optimized with Reinforcement Learning - Steiner is trained with reinforcement learning using a reward function that encourages the model to find an effective balance between the breadth and depth of its reasoning process, leading to more efficient problem-solving.

  4. Opensource - Steiner is available on Hugging Face and can be deployed using standard inference services like vLLM, as shown in the sketch below.
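
Deployment could look something like this, using vLLM’s offline inference API. The model ID is an assumption for illustration; check the Hugging Face repo for the exact name.

```python
# Hedged sketch of running Steiner locally with vLLM; the repo ID is
# illustrative, and a 32B model needs substantial GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(model="peakji/steiner-32b-preview")  # assumed Hugging Face repo ID
params = SamplingParams(temperature=0.7, max_tokens=4096)

# No Chain-of-Thought prompt needed: the model explores and backtracks
# across reasoning paths on its own during inference.
outputs = llm.generate(["What is the sum of all primes below 50?"], params)
print(outputs[0].outputs[0].text)
```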

Quick Bites

Conversational AI platform Play AI has released a new end-to-end AI speech model, PlayDialog, that uses a conversation’s historical context to control prosody, intonation, emotion, and pacing to deliver natural-sounding speech. The model is ideal for voice dubbing, podcasts, and immersive customer interactions.

They have also released PlayNote, their enhanced take on NotebookLM, which lets users create podcasts, briefings, narrations, and even children’s stories from files like PDFs, text, videos, and other media. Both PlayDialog and PlayNote are available via API.

Homebrew Research, the AI research lab giving Llama 3.1 a native "listening" capability, has released a new model checkpoint, Ichigo v0.4. The new model features a higher MMLU score of 64.66, improved voice detection in noisy environments, and enhanced multi-turn conversation tracking, making it more responsive and robust. It is available on GitHub and Hugging Face. You can try the live demo here.

Epoch AI just released FrontierMath, a new benchmark of high-level math problems to test advanced mathematical reasoning in AI. This set of unpublished, research-grade problems covers areas like number theory and algebraic geometry, and is designed specifically to stump even the best models. Surprisingly, while models like OpenAI’s o1-preview, Claude 3.5 Sonnet, and Gemini 1.5 Pro excel on traditional math benchmarks like GSM8K, they managed to solve less than 2% of FrontierMath problems!

LM Studio CLI now lets you download any local LLM from Hugging Face using the lms get command directly from the terminal. Try specifying models, quantizations, or even using keywords to streamline your model downloads.
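
For example, the basic invocation below is all it takes; the quantization suffix is an assumption based on LM Studio’s docs, so run lms get --help to confirm the syntax your CLI version supports.

```
lms get llama-3.2-1b          # search Hugging Face by keyword and download
lms get llama-3.1-8b@q4_k_m   # request a specific quantization (assumed syntax)
```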

Tools of the Trade

  1. llama-ocr: Together AI’s OCR tool powered by Llama 3.2 vision. It takes documents & outputs markdown, and does really well for complex receipts, PDFs with tables/charts, etc. Set it up in just 4 lines of code.

  2. Inferit: UI for LLM inference that allows you to compare outputs from multiple models, prompts, and sampler settings side-by-side. It works with OpenAI-compatible backends and can run instantly online, as a browser extension, or locally.

  3. LLMariner: Opensource platform on Kubernetes for managing genAI workloads with OpenAI-compatible APIs. It simplifies training and inference management across clusters, allowing integration with tools like WebUI, VS Code, and Weights & Biases for seamless AI application deployment.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos with simple text prompts. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. the funny thing about many of the ppl who say LLMs aren’t reasoning is that they’re not reasoning either, they’re usually regurgitating a cached argument that they heard somewhere (aka their training data) ~
    James Campbell


  2. I dont think its clear from any of the rebuttals from OAI xAI etc just what kind of scaling we're talking about getting continuing gains from. Most of them hand wave back to post training/o1, are we getting the gains from continuing to scale pretraining (raw text data and parameters) or not ~
    Teknium (e/λ)

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
