- unwind ai
- Posts
- Multi-Step AI Agents with Long-term Memory
Multi-Step AI Agents with Long-term Memory
PLUS: Advanced Voice Mode with video and screen-share, Google's AI Deep Research agent
Today’s top AI Highlights:
Build AI agents that can handle multi-step tasks and persistent long-term memory
Search, RAG, recommendations, and analytics over complex structured & unstructured data
Not to be Outdone: OpenAI also releases video and screen-sharing capabilities with Advanced Voice Mode
Google’s AI agent that does multi-step research and presents an organized report on any topic
Simple, unified interface to multiple LLM providers
& so much more!
Read time: 3 mins
AI Tutorials
While analyzing videos or searching the web individually is powerful, combining these capabilities opens up entirely new possibilities for AI applications.
In this tutorial, we'll build a Multimodal AI Agent using Google's Gemini 2.0 Flash model that can simultaneously analyze videos and conduct web searches. This powerful combination allows the agent to provide comprehensive responses by understanding both visual content and related web information.
Gemini 2.0 Flash, Google's latest model, brings impressive capabilities to the table. It offers better performance than even the Pro model while being 2x faster with built-in tool integration. The best part? the API is free with a generous rate limit while it’s in the experimental phase!
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Julep is a platform for building AI agents beyond simple, linear prompt-response patterns, common in frameworks like LangChain. These agents can remember past interactions and execute complex, multi-step tasks. You can design workflows that include dynamic decision-making, loops, and parallel processing, while directly integrating various external tools and APIs into their agent's processes.
Julep handles all the underlying complexities of stateful sessions and long-running processes, so you can focus on the core logic of your AI app rather than the intricacies of its infrastructure.
Key Highlights:
Build Complex Workflows with Ease - Julep lets you create multi-step tasks using familiar YAML syntax. You can define workflows with conditional logic, loops, and parallel execution, enabling the creation of advanced AI agents capable of managing complicated processes without the need to build all the complexity from scratch. Julep handles execution, retries, and keeps the tasks running reliably.
Stateful Sessions - Julep lets you build persistent AI agents that remember past interactions. This means your agents can maintain context over long-term conversations. You don’t have to worry about managing context windows. It also includes an "Adaptive Context" feature, that intelligently manages context size, which means you can build indefinite conversations.
Direct Integration with Tools and APIs - Julep allows for direct integration with a number of external tools (Brave Search, BrowserBase, email) and APIs so the agents can do everything they need within the platform. This reduces the need to write custom code to interact with various services, and Julep provides a framework to define user-defined tools as well.
Task Definition & Execution - Define tasks in YAML with a variety of step types: prompts, tool calls, data manipulation, conditional logic, and more. Julep handles the execution of tasks on its servers and provides tooling to manage task executions, allowing developers to monitor the process in real-time.
Ready to Level up your work with AI?
HubSpot’s free guide to using ChatGPT at work is your new cheat code to go from working hard to hardly working
HubSpot’s guide will teach you:
How to prompt like a pro
How to integrate AI in your personal workflow
Over 100+ useful prompt ideas
All in order to help you unleash the power of AI for a more efficient, impactful professional life.
Superlinked is an open-source Python framework to build search, recommendation, and RAG systems that work with both structured and unstructured data. The framework lets you construct custom data and query embedding models by combining pre-trained encoders for different data types - from text and images to numbers and timestamps.
Beyond basic vector search, Superlinked handles complex scenarios like semantic chunking, behavioral events, and multi-modal embeddings while maintaining a clean, intuitive API. The framework is particularly valuable when working with nuanced queries that blend multiple data types including numbers, categories, and even timestamps.
Key Highlights:
Smart Vector Creation - Superlinked lets you embed not just text but also numerical, categorical, and time-based data alongside images into a single vector space. It can handle complex queries like “popular science fiction movies from the 80s with strong female leads” that traditionally require separate search layers and re-ranking.
Flexible Query System - Define queries with dynamic parameters and weights that can be adjusted at runtime. This means you can fine-tune search behavior without re-embedding your data, and even implement natural language queries through LLM integration.
Production-Ready Architecture - Choose between running locally with SQLite for development or scaling up with PostgreSQL for production. The framework includes built-in performance optimizations, proper error handling, and comprehensive logging to help you deploy with confidence.
Developer-Centric Design - Get started quickly with minimal boilerplate code. The framework handles complex tasks like vector computation and query processing out of the box, while still giving you granular control when needed. Includes detailed documentation and example notebooks to help you implement common use cases.
Quick Bites
Nous Research has released Hermes 3 3B, a compact 3B LLM fine-tuned from Llama 3.2 with capabilities for function calling, JSON mode, and structured outputs. The model is available on Hugging Face with GGUF quantized versions, making it easy to run on consumer hardware like phones and laptop. Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities.
OpenAI has expanded ChatGPT's Advanced Voice Mode to include real-time video chat and screen-sharing capabilities, allowing you to have visual conversations and share your screens directly with GPT-4o for assistance. In a festive addition, they've also introduced a special Santa persona in Voice Mode for December, complete with voice interactions and holiday-themed conversations.
Midjourney has released Patchwork, an infinite canvas for collaborating and building fictional worlds. Combining language models, image models, and a canvas interface, Patchwork lets you construct the foundations for stories by generating, organizing, and modifying text and image scraps, while collaborating with others in real time.
Google has released Deep Research, an AI research agent in Gemini Advanced that autonomously conducts multi-step research on your behalf. When you enter your question, it creates a multi-step research plan for you to revise or approve. Once you approve, it begins analyzing relevant information from across the web. It browses, synthesizes, and reiterates, just like a human would. The final report is clean and organized, with relevant source links given. You can even ask follow-up questions or refine the report further.
Tools of the Trade
aisuite: A Python library that provides a unified interface for interacting with multiple LLM providers, such as OpenAI, Anthropic, and Google, using a consistent API similar to OpenAI's. It acts as a thin wrapper around provider-specific SDKs so you can easily swap between providers and compare results without changing the core application code.
Claude Engineer v3: A powerful self-improving AI assistant for creating and managing AI tools with Claude 3.5. It enables Claude to generate and manage its own tools, continuously expanding its capabilities through conversation. Available both as a CLI and a modern web interface.
Docling: Document parsing library that converts PDFs, Office files, and other document formats into HTML, Markdown, or JSON while preserving their structure and layout. It provides a unified document representation format and integrates with LlamaIndex and LangChain.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
ilya could have joined up with anthropic when he left
mira could have joined up with ilya when she left
but the AGI ring of sauron has a strong pull, and those who have been close enough to it know that it's the final competition on earth, and few want to share that power/glory ~
Daniel FaggellaApple Intelligence should just integrate with Perplexity. Real pressure test ~
Ananay Arora
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
Reply