Multiple AI Agents, One Memory
PLUS: Open-source AI web agent beats OpenAI o1, AI full-stack engineer
Today’s top AI Highlights:
Memory layer for AI agents that works across LLMs
Open-source AI agent beats OpenAI GPT-4o and o1 at web navigation while being dramatically smaller in size
AI Godmother Fei-Fei Li’s AI startup is generating 3D worlds from a single image
GPTs-like store for Anthropic’s Model Context Protocol servers
Build full-stack real web apps with natural language prompts
& so much more!
Read time: 3 mins
AI Tutorials
In this tutorial, we'll build a Personal Health & Fitness AI Agent that demonstrates how to create task-specific AI agents that collaborate effectively. Using Google Gemini and Phidata, we'll create a system where two specialized agents - one for diet and one for fitness - work together to generate personalized recommendations.
This app generates tailored dietary and fitness plans based on user inputs such as age, weight, height, activity level, dietary preferences, and fitness goals.
Phidata makes this multi-agent approach straightforward by providing a framework designed for building and coordinating AI agents. It handles the complexity of agent communication, memory management, and response generation, letting us focus on defining our agents' roles and behaviors.
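To make the setup concrete, here is a minimal sketch of the two-agent pattern using Phidata's Agent class with a Gemini model. The agent names, instructions, and model id are illustrative choices, not the exact code from the tutorial:

```python
# A minimal sketch of the diet + fitness agent team with Phidata and Gemini.
# Assumes phidata and google-generativeai are installed and GOOGLE_API_KEY is set;
# names, instructions, and the model id are illustrative.
from phi.agent import Agent
from phi.model.google import Gemini

gemini = Gemini(id="gemini-1.5-flash")  # model id is an assumption

dietary_agent = Agent(
    name="Dietary Expert",
    model=gemini,
    instructions=[
        "Create a personalized diet plan from the user's age, weight, height,",
        "activity level, dietary preferences, and fitness goals.",
    ],
)

fitness_agent = Agent(
    name="Fitness Expert",
    model=gemini,
    instructions=[
        "Design a workout routine aligned with the user's goals and the diet plan.",
    ],
)

# A coordinating agent that delegates to the two specialists and merges their output.
coach = Agent(
    name="Health & Fitness Coach",
    model=gemini,
    team=[dietary_agent, fitness_agent],
    instructions=["Combine both experts' recommendations into one coherent plan."],
)

coach.print_response(
    "I'm 30, 75 kg, 178 cm, moderately active, vegetarian, aiming to build muscle."
)
```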
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Swastik AI is a model-agnostic memory system for AI agents that works across different LLMs and agent frameworks. It enables you to create and retrieve memories for your AI agents with just a few lines of code, reducing token usage by eliminating the need to send extensive chat histories as context. The system automatically manages memory creation, preventing duplicates and maintaining relevant context through cached and working memory mechanisms.
You can integrate SwastikAI's memory layer into your applications using a straightforward API that handles the complex memory management logic behind the scenes.
Key Highlights:
Model-agnostic - The memory system functions independently of the underlying LLM, allowing you to switch between different models without impacting stored memories. This maintains consistent agent memory across changes.
Token Optimization - Instead of sending entire chat histories, the system intelligently manages cached and working memories, retrieving only context-relevant information. This results in significant token savings and reduced API costs when working with LLMs.
Structured Memory Management - The API supports hierarchical organization through company, department, team, and agent IDs, keeping memory storage organized even in complex multi-agent systems. The system automatically moves memories between cached and working states based on context.
Developer-Friendly Integration - Implement memory capabilities with minimal code using the Python SDK. The API requires only essential parameters like agent_id and user_id, with optional fields for more granular control, and a simple pip install gets you started quickly (see the sketch below).
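The exact SDK surface isn't shown here, so the snippet below is only a hypothetical sketch of what this kind of memory-layer integration typically looks like. The base URL, endpoints, and field names are placeholders inferred from the description above (agent_id, user_id, hierarchy IDs), not SwastikAI's real API:

```python
# Hypothetical sketch only: the real SwastikAI SDK and endpoints may differ.
# It illustrates the pattern described above: store a memory for an agent,
# then retrieve only context-relevant memories instead of sending full chat history.
import requests

BASE_URL = "https://api.example-memory-layer.com/v1"  # placeholder, not the real endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}


def add_memory(agent_id: str, user_id: str, content: str) -> dict:
    """Store a new memory; the service handles deduplication and cached/working state."""
    payload = {"agent_id": agent_id, "user_id": user_id, "content": content}
    return requests.post(f"{BASE_URL}/memories", json=payload, headers=HEADERS).json()


def search_memories(agent_id: str, user_id: str, query: str, limit: int = 5) -> list:
    """Retrieve only the memories relevant to the current query, saving tokens."""
    params = {"agent_id": agent_id, "user_id": user_id, "query": query, "limit": limit}
    return requests.get(f"{BASE_URL}/memories/search", params=params, headers=HEADERS).json()


# Usage: fetch relevant context and prepend it to the LLM prompt instead of the full history.
add_memory("support-bot", "user-42", "User prefers concise answers and works in Python.")
context = search_memories("support-bot", "user-42", "How do I install the SDK?")
```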
Meet ScribeAgent - an AI that learned to navigate the web by learning from real users using real SaaS applications, not by memorizing synthetic examples. While other AI web agents based on GPT-4o and o1 models fumble while navigating websites, ScribeAgent has mastered the actual patterns of how humans click, type, and navigate through interfaces. Drawing from over 250 live websites and 6 billion training examples, it makes AI truly understand and interact with web UIs.
The most impressive part? A lightweight 7B parameter version of ScribeAgent outperforms massive models like OpenAI o1 at completing real web tasks like filling complex forms and navigating e-commerce, productivity tools, and enterprise platforms.
Key Highlights:
Practical Web Navigation - ScribeAgent learns from actual user sessions - how people find buttons, fill forms, and chain actions together. This means it understands real UI patterns, whether it's hunting down that submit button that's styled differently or knowing when to click before typing in a form.
Robust Real-World Performance - The model directly processes HTML and generates precise actions without getting confused by dynamic content or losing context between steps. In testing, it successfully automated complex workflows that required chaining multiple interactions across different pages and forms.
Implementation - Built on open-source Qwen models, ScribeAgent runs efficiently on standard hardware, with a 51.3% success rate that beats previous solutions by 14.1%. No need for expensive GPUs or complex deployment setups - it delivers high-performance web automation without the infrastructure headaches.
Production-Ready Capabilities - Beyond just clicking buttons, ScribeAgent demonstrates a sophisticated understanding of web interfaces - it knows when to scroll, how to handle dropdown menus, and can maintain context across multi-page workflows.
Next Steps - The research team used Qwen2-7B-Instruct (for ScribeAgent-Small) and Qwen2.5-32B-Instruct (for ScribeAgent-Large) as base models, fine-tuned on proprietary data with LoRA. The code is in this GitHub repo. Their upcoming research will focus on integrating advanced reasoning and planning modules, exploring multimodality, and scaling further.
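The team's training code isn't reproduced here, but as a rough illustration of the recipe described (LoRA fine-tuning a Qwen2 base model on HTML-observation-to-action pairs), here is a minimal sketch using Hugging Face transformers and peft. The dataset file, prompt format, and hyperparameters are placeholders:

```python
# Minimal sketch of LoRA fine-tuning a Qwen2 base model with Hugging Face peft.
# Not ScribeAgent's actual pipeline; dataset path, prompt format, and hyperparameters
# are placeholders chosen for illustration.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")

# Attach low-rank adapters to the attention projections (rank/alpha are illustrative).
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Each example pairs a pruned HTML observation + task description with the next action.
dataset = load_dataset("json", data_files="web_trajectories.jsonl")["train"]  # placeholder file

def tokenize(example):
    text = f"{example['observation']}\n### Action:\n{example['action']}"
    return tokenizer(text, truncation=True, max_length=4096)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="scribeagent-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=dataset,
    # Causal-LM collator builds labels from input_ids so the Trainer can compute loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```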
Want to build the next generation of AI agents that millions of users will use in their daily workflows? Join the team behind ScribeAgent, where you'll work with production-scale data from over 3 million users, including 30%+ of Fortune 500 companies.
Quick Bites
Google’s much-anticipated AI video generation model Veo is now available on Vertex AI in private preview. It can transform both text and image prompts into high-quality videos. Alongside this, Google’s latest text-to-image model Imagen 3 will be generally available next week, including new editing and customization features for select customers.
Tencent has open-sourced HunyuanVideo, an AI video generation foundation model with over 13 billion parameters, making it the largest open-source video model. It outperforms state-of-the-art models like Runway Gen-3, Luma 1.6, and three top-performing Chinese models. The inference code and model weights are available on Hugging Face and GitHub.
Hugging Face has launched a free, hands-on course for aligning small LMs (SmolLM2 series) with minimal hardware requirements. Modules cover topics like instruction tuning, preference alignment, efficient fine-tuning, and evaluation. The course is open for community contributions and peer review.
AI Godmother Fei-Fei Li shared what her AI startup World Labs has been cooking. They have developed an AI system that generates 3D worlds from a single image: beyond the input image, the entire world is generated. The system also lets users adjust camera effects like depth of field and zoom, and add interactive 3D elements such as sonar and ripple effects. They are now focused on enhancing the fidelity and interactivity of these generated spaces.
Supabase just rolled out AI Assistant v2 in its Dashboard - a single AI interface for schema design, querying, debugging, and managing Postgres elements like RLS Policies and Functions. It now also converts SQL to Supabase-js code and automatically detects context from your current dashboard view. Access it with a quick shortcut or click.
Tools of the Trade
lovable.dev: AI tool to generate full-stack web apps from natural language prompts. It also integrates with GitHub for version control and Supabase for backend functionality like data persistence and authentication. You can quickly prototype and iterate on web projects, then export the generated code for further customization.
mcpservers.ai: Discover and share pre-built Model Context Protocol (MCP) servers. It’s like a centralized repository where developers can find ready-to-use MCP servers for various resources and contribute their own implementations. 56 servers are currently live on the platform.
lm.rs: Provides Rust code for running LM inference directly on CPU, without relying on external ML libraries. It currently supports Gemma, Llama 3.2, and multimodal Phi-3.5. It also offers options for quantization and an optional web interface.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.
How my predictions for 2024 turned out:… ~ Clement Delangue
ai agents are the new apps ~ Greg Isenberg
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉