Gemini 2.0 Brings the Era of Multimodal AI Agents
PLUS: RAG pipeline as-a-service, AI engineer Devin available for $500/month
Today’s top AI Highlights:
Google releases Gemini 2.0 and Multimodal Realtime API with native tool-use
Build RAG pipelines using only 2 API endpoints
AI SWE Devin is now generally available at $500 a month
Replit Agent is out of early access with new features
Open-source framework to build, deploy, and run code-based AI workflows
& so much more!
Read time: 3 mins
AI Tutorials
In this tutorial, we build a multi-agent AI legal team where each AI agent represents a different legal specialist role, from research and contract analysis to strategic planning, working together to provide thorough legal analysis and recommendations. We use OpenAI's GPT-4o, Phidata, and the Qdrant vector database.
This Streamlit application mirrors a full-service legal team where these specialized AI agents collaborate just like a human legal team - researching legal documents, analyzing contracts, and developing legal strategies - all working in concert to provide comprehensive legal insights.
The AI Agent Team:
Legal Researcher - Equipped with DuckDuckGo search tool to find and cite relevant legal cases and precedents. Provides detailed research summaries with sources and references specific sections from uploaded documents.
Contract Analyst - Specializes in thorough contract review, identifying key terms, obligations, and potential issues. References specific clauses from documents for detailed analysis.
Legal Strategist - Focuses on developing comprehensive legal strategies, providing actionable recommendations while considering both risks and opportunities.
Team Lead - Coordinates analysis between team members, ensures comprehensive responses, properly sourced recommendations, and references to specific document parts. Acts as an Agent Team coordinator for all three agents.
Our application provides five distinct types of analysis, each activating a different combination of our specialized agents.
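As a conceptual sketch, the analysis-type routing described above can be expressed as a simple mapping from analysis type to the subset of agents it activates. The analysis-type names below are illustrative placeholders, not the tutorial's actual identifiers; only the four agent roles come from the text.

```python
# Hypothetical routing table for the legal AI team. The four agent roles
# match the tutorial; the analysis-type keys are illustrative examples.
AGENTS = {"legal_researcher", "contract_analyst", "legal_strategist", "team_lead"}

# Each analysis type activates a different combination of agents;
# the Team Lead coordinates every request.
ANALYSIS_ROUTES = {
    "legal_research":  {"legal_researcher", "team_lead"},
    "contract_review": {"contract_analyst", "team_lead"},
    "risk_assessment": {"contract_analyst", "legal_strategist", "team_lead"},
    "strategy":        {"legal_researcher", "legal_strategist", "team_lead"},
    "full_analysis":   AGENTS,
}

def agents_for(analysis_type: str) -> set:
    """Return the agent subset that handles a given analysis type."""
    try:
        return ANALYSIS_ROUTES[analysis_type]
    except KeyError:
        raise ValueError(f"unknown analysis type: {analysis_type}")
```

In the actual Streamlit app, each selected analysis type would drive which agents' outputs the Team Lead aggregates into the final response.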
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Building RAG systems can take weeks of engineering time - handling document parsing, chunking, embeddings, and vector search operations. Changing this is Memoire, a document retrieval pipeline “as-a-service” that provides these capabilities through a clean API interface. Instead of spending time on infrastructure, you can now implement production-ready RAG in your applications within hours.
The service handles everything from PDF parsing to hybrid search algorithms, while offering a source-available license that's free for hobby projects and paid for production use.
Key Highlights:
Developer-First API - Get started with just two API endpoints - one to index documents and another to search them. No need to write parsers or manage embedding pipelines. Send your documents (PDFs, Word files, Excel sheets) and start searching with a few lines of code.
Light on Resources - Keep your infrastructure costs low. The entire pipeline runs comfortably on 1GB RAM and a single CPU core, even when handling 100k documents. Docker configurations help you test and understand your resource needs before scaling.
Flexible Development Flow - Choose what works for your stack. Use local CPU models during development, then switch to AWS Bedrock, OpenAI, or Azure for production - all through simple environment variables. No code changes needed when switching providers.
Production Ready - Focus on building features instead of infrastructure. Memoire handles the complex parts - hybrid search algorithms, automatic document chunking, authentication, and monitoring. Comprehensive logging helps you track and debug when needed.
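To make the two-endpoint workflow concrete, here is a minimal sketch of the calls involved. The endpoint paths (`/index`, `/search`), field names, and bearer-token auth are assumptions for illustration - check Memoire's docs for the real routes. Each helper just assembles the pieces you would hand to any HTTP client.

```python
# Hypothetical request builders for a two-endpoint retrieval service.
# Paths, field names, and auth scheme are illustrative assumptions.

def index_request(base_url: str, api_key: str, doc_id: str, filename: str) -> dict:
    """Build the request for the document-indexing endpoint."""
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/index",   # assumed path
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"id": doc_id, "filename": filename},
    }

def search_request(base_url: str, api_key: str, query: str, top_k: int = 5) -> dict:
    """Build the request for the search endpoint (hybrid keyword + vector)."""
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/search",  # assumed path
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"query": query, "limit": top_k},
    }
```

The point of the sketch: indexing and searching are the only two operations your application code has to know about; parsing, chunking, and embedding happen behind the first endpoint.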
Google just launched Gemini 2.0 Flash, a powerful new model designed for speed and performance, and it's now available to developers. The big news isn't just improved speed; it's the push toward agentic AI: models that can understand context, make plans, and perform actions. Think code execution, tool use, and real-time interaction using modalities like images, voice, and video.
Gemini 2.0 Flash is setting the stage for building more immersive apps, and its experimental phase is open for you to try using the free API access with generous limits. So, what does this mean for your workflow? Read on.
Gemini 2.0 Flash - This model is 2x faster than Gemini 1.5 Pro while outperforming it on text, code, and video benchmarks. It also natively generates images and custom audio, and has integrated tools like Google Search and code execution. A single API call supports multimodal output with text, audio, and images.
Availability and Pricing - You can start building using the Gemini API in Google AI Studio and Vertex AI for free, with limits of 10 RPM, 4M TPM, and 1500 requests per day, plus free Search tool usage during the experimental phase. General availability of Flash 2.0 and other model sizes is planned for early next year.
Multimodal Live API - OpenAI just demoed, but Google shipped! Powered by Gemini 2.0 Flash, this new API lets you stream audio, video, and text into your applications while dynamically calling tools like search and code execution in the background, all simultaneously. You can try it right away in the Google AI Studio (we did too and it’s darn good!).
Project Astra: Universal AI Assistant - Google’s universal AI assistant Project Astra is getting better, with multilingual dialogue, up to 10 minutes of in-session memory, tool use like Search and Lens, and memory of past conversations to personalize responses. It is currently being tested on phones and glasses.
Project Mariner: Browser-Based Agents - Like Claude Computer Use, Project Mariner is an AI agent that can understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you. It’s still in the early stages.
Jules: AI Code Agent - This AI coding agent, powered by Gemini 2.0, integrates into GitHub and handles coding tasks asynchronously, including planning, code modification, and creating pull requests for Python and JavaScript. It scored 51.8% on SWE-bench Verified, higher than even the new Claude 3.5 Sonnet. Currently available to a select group of testers, Jules should roll out more widely by early 2025. You can sign up at labs.google.com/jules.
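If you want to try the free-tier access mentioned above, here is a sketch of how a minimal text request to the Gemini API's REST surface is assembled. The body shape follows the public v1beta `generateContent` endpoint; the model identifier `gemini-2.0-flash-exp` was the experimental name at launch and may change. Nothing is sent here - plug the URL and body into any HTTP client with your own key.

```python
# Sketch of a Gemini API generateContent request (v1beta REST surface).
# The model name is the launch-time experimental identifier; verify it
# against current docs before use.
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"

def generate_content_request(model: str, prompt: str, api_key: str):
    """Build the URL and JSON body for a text-only generateContent call."""
    url = f"{GEMINI_BASE}/models/{model}:generateContent?key={api_key}"
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return url, body

url, body = generate_content_request(
    "gemini-2.0-flash-exp",
    "Summarize what's new in Gemini 2.0.",
    "YOUR_API_KEY",
)
```

You can obtain a free key from Google AI Studio; the same request shape extends to multimodal inputs by adding more entries to `parts`.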
Quick Bites
Cognition Labs' Devin, an autonomous coding agent, is now generally available for engineering teams at $500/month, offering tools like Slack integration and IDE extensions for tasks like bug fixes and code refactors. No seat limits, you can now start using Devin directly via app.devin.ai.
ChatGPT integration with Siri in Apple Intelligence is now available. Siri can now take ChatGPT’s help when it cannot answer your questions. To enable this on your iPhone, iPad, or Mac, go to Settings > Apple Intelligence & Siri > enable Apple Intelligence and ChatGPT in Integrations. This also lets you use ChatGPT via Camera Control on your iPhone and give ChatGPT an image directly from your Camera app.
Lmarena just launched WebDev Arena, an arena where two LLMs each build a web app. You can compare the two side-by-side and vote for the best output. A leaderboard is coming soon. Code from the LLMs is executed in isolated E2B sandboxes.
Replit Agent is out of early access with new features:
Replit Assistant - An AI agent for iterations and quick updates without disrupting your workflow. No more copy-pasting code - just describe what you need, and the Assistant makes the change directly.
Checkpoint Billing: Unlimited usage through a checkpoint system. You’ll get a monthly allotment and then transparent, usage-based billing for Agent “checkpoints”.
React & Design Cloning: Replit Agent now uses React to create more visually stunning applications. You can also share a screenshot or URL for the Agent to "clone" that design and incorporate it into your app.
Tools of the Trade
Recase: Open-source framework to build, deploy, and run code-based AI workflows. These workflows can be triggered via a UI, API, or natural language commands in Slack, integrating LLMs, web scraping, and other tools.
DocComment: Generates detailed, human-readable code explanations, either as non-intrusive sidecar comments or inline documentation, for various languages. It uses code structure analysis and LLMs to provide different levels of granularity in comments.
LLM Sandbox: A Python library that provides a sandboxed environment for executing LLM-generated code, using Docker containers for isolation. You can manage the lifecycle of these containers, run code within them, and copy files between your host and the runtime environment.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
devin is $500 per month
o1 pro is $200 per month
it's pretty clear where this is going. average people are getting priced tf out.
if u wanna get access to the highest performant AI tools, you need to be rich. and having access to them will make you even richer.
this is going to accelerate the gap between rich and poor ~
Moritz Kremb

I've lost all confidence in OpenAI. They have clearly hit a wall. I might be mistaken, but from the outside, it looks like they are focused on maximizing short-term profits before everyone realizes the emperor is naked. ~
Santiago
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉