Gemini 2.5, DeepSeek v3 & MCP Agents
PLUS: GPT-4o native image generation, Business engine for AI agents
Today’s top AI Highlights:
Build and deploy MCP-enabled Python Agents in minutes
Gemini 2.5 Pro: 1M context, agentic coding, and native reasoning
Some AI image generation and editing updates
Build multi-agent teams that communicate directly instead of awkward Handoffs
Business infrastructure for monetizing AI Agents - just 5 lines of code
& so much more!
Read time: 3 mins
AI Tutorials
When OpenAI released their Agents SDK with the new voice pipeline integration, we immediately saw an opportunity to solve a practical development challenge: how to create dynamic audio content that adapts to user input without constantly re-recording voice talent.
In this tutorial, we'll build a Self-Guided AI Audio Tour Agent - a conversational voice system that generates personalized audio tours based on a user's location, interests, and preferred tour duration.
Our multi-agent architecture leverages the OpenAI Agents SDK to create specialized agents that handle different aspects of tour content, from historical information to architectural details, culinary recommendations, and cultural insights.
OpenAI’s new speech model, GPT-4o-mini TTS, enhances the overall user experience with natural, emotion-rich voices. One of its most powerful features is how easily you can steer voice characteristics with simple natural language instructions - you can adjust tone, pacing, emotion, and personality traits without complex parameter tuning.
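To give a feel for that steering, here is a minimal sketch using the OpenAI Python SDK's speech endpoint. The voice name, instruction text, and output path are illustrative assumptions, so check the API reference for the exact options available to you.

```python
# Minimal sketch: steering gpt-4o-mini-tts with a natural-language instruction.
# The voice, instruction text, and file path below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Welcome! On your left is the old clock tower, built in 1887.",
    instructions="Speak like a warm, enthusiastic tour guide; unhurried pace, light excitement.",
) as response:
    # Stream the generated audio straight to disk.
    response.stream_to_file("tour_intro.mp3")
```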
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments

fast-agent is an open-source Python library for defining and deploying MCP-enabled AI agents in minutes. It is the first framework with complete, end-to-end tested support for the Model Context Protocol, and it eliminates complex boilerplate with a clean, decorator-based syntax that lets you focus on agent logic rather than infrastructure.
fast-agent functions as an MCP client, enabling agents to connect seamlessly with 1000s of third-party data sources and tools through standardized MCP servers. It supports both Anthropic and OpenAI models, providing flexible model selection per task and handling multimodal inputs natively.
Key highlights:
Minimal boilerplate code - Create functional agents with just a few lines of Python using simple decorators like @fast.agent(), @fast.chain(), or @fast.orchestrator(), letting you focus on prompts and logic rather than infrastructure (example below).
Rich workflow patterns - Implement advanced patterns out-of-the-box including chains, parallel processing, evaluator-optimizers, routers, and orchestrators that can dynamically plan and execute multi-step tasks.
Model flexibility - Mix and match models within a single workflow, using different providers and capabilities for specific tasks (e.g., GPT-4o for analysis, Claude for coding), with easy model switching for testing.
MCP integration - Connect your agents to any MCP-compatible server including Google Drive, GitHub, Postgres, and custom tools, with built-in support for consuming MCP services.
Testing and Debugging - Evaluate how different models handle Agent and MCP Server calling tasks. Debug and refine agents through the built-in interactive shell, allowing real-time intervention, diagnosis, and direct communication with any agent in your workflow.
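Here's a minimal sketch of that decorator syntax, modeled on the patterns in the fast-agent README. The import path, the agent instruction, and the "fetch" MCP server name (assumed to be configured in fastagent.config.yaml) are assumptions to verify against the project's docs.

```python
# Minimal fast-agent sketch; import path and "fetch" server name are assumptions.
import asyncio
from mcp_agent.core.fastagent import FastAgent

fast = FastAgent("Research Assistant")

@fast.agent(
    instruction="You are a concise research assistant. Use the fetch tool for web content.",
    servers=["fetch"],  # an MCP server assumed to be defined in fastagent.config.yaml
)
async def main():
    async with fast.run() as agent:
        # Send a single message to the default agent and print its reply.
        await agent("Summarize the Model Context Protocol in two sentences.")

if __name__ == "__main__":
    asyncio.run(main())
```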
The #1 AI Meeting Assistant
Still taking manual meeting notes in 2025? Let AI handle the tedious work so you can focus on the important stuff.
Fellow is the AI meeting assistant that:
✔️ Auto-joins your Zoom, Google Meet, and Teams calls to take notes for you.
✔️ Tracks action items and decisions so nothing falls through the cracks.
✔️ Answers questions about meetings and searches through your transcripts, like ChatGPT
Try Fellow today and get unlimited AI meeting notes for 30 days.

Google has launched Gemini 2.5 Pro, its most advanced AI model to date. The model offers unified reasoning capabilities, a 1 million token context window, and multimodal processing (audio, images, and video). It tops the LMArena leaderboard, outperforming peers including DeepSeek R1, OpenAI o3-mini, GPT-4.5, and Claude 3.7 Sonnet across benchmarks for reasoning, coding, math, agentic capabilities, and more.
Google prioritized coding performance in this release, with Gemini 2.5 Pro demonstrating significant improvements in generating executable code, and handling code transformation and editing tasks. It is now available as an experimental version for free in Google AI Studio and the Gemini API, with pricing details coming soon.
Key Highlights:
Benchmark performance - The model tops the LMArena leaderboard, which measures human preferences, by a significant margin. It scores 18.8% on Humanity's Last Exam, 63.8% on SWE-Bench Verified for agentic coding, and shows strong results on GPQA Diamond and AIME 2025.
Coding capabilities - Google specifically focused on enhancing coding performance, with the model excelling at creating web apps, developing agentic code applications, and handling code transformation and editing tasks.
API features - Gemini 2.5 Pro supports structured outputs, function calling, and native tool use, making it versatile for developers building complex applications (example below).
Compatibility options - While code execution and search grounding are supported, the model doesn't currently support tuning, caching, image generation, audio generation, or live API capabilities.
Developer access - Available immediately in Google AI Studio and via API with rate limits of 2 RPM and 50 requests per day for free users, and 5 RPM for paid tier users.
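For orientation, here's a minimal sketch of calling the model through the google-genai Python SDK. The model identifier shown is the experimental name at launch and is an assumption that may change as the release matures.

```python
# Minimal sketch using the google-genai SDK; the model id is an assumption and may change.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # or configure the key via environment variable

response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",  # experimental identifier at launch (assumed)
    contents="Write a Python function that deduplicates a list while preserving order.",
)
print(response.text)
```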
Quick Bites
DeepSeek has finally released more details on its latest non-reasoning model, DeepSeek-V3-0324. The new model shows dramatic improvements in math reasoning, front-end coding, and tool use, competing strongly with, and in some cases outperforming, GPT-4.5 and Claude 3.7 Sonnet. The model is available to download on Hugging Face, through the API, and via chat.deepseek.com.
Reve AI has emerged from stealth mode with Reve Image 1.0, a text-to-image model that's claimed the top spot on Artificial Analysis' Image Arena, outperforming Google's Imagen 3, FLUX 1.1 Pro, and Midjourney v6.1. The model stands out for its exceptional prompt adherence, superior text rendering capabilities, and natural language editing features. You can try the free preview now at preview.reve.art, though API access isn't available yet.
Alibaba has released Qwen2.5-VL-32B-Instruct, an open-source multimodal model under the Apache 2.0 license. The 32B model features improved human-aligned responses, enhanced mathematical reasoning capabilities, and fine-grained image understanding, with particularly strong results on complex reasoning benchmarks. It outperforms comparable models like Mistral-Small-3.1-24B, Gemma-3-27B, and even GPT-4o by significant margins.
OpenAI has released native image generation capabilities in GPT-4o. The model can now generate precise, photorealistic outputs that excel at rendering text and following detailed prompts.
It can handle up to 10-20 different objects with proper relationships, significantly outperforming competitors' 5-8 object limitations.
You can edit images using simple prompts, while maintaining visual consistency across multiple iterations.
Now rolling out to Plus, Pro, Team, and Free users as ChatGPT's default image generator, with API access for developers coming soon.
Agno just released Agent Teams 2.0, a revamped multi-agent architecture that moves beyond the basic handoff approach Agno pioneered a year ago and other frameworks have since copied. The new architecture introduces 3 collaboration modes: Route (directing requests to specialists), Coordinate (delegating tasks and combining results), and Collaborate (all members tackle the same problem together). What makes this really promising is the new "Agentic Context" system that maintains shared memory between team members, helping these multi-agent systems actually scale in production environments.
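Here's a rough sketch of what a coordinate-mode team might look like in Agno's Python API. The member roles and model choice are illustrative, and exact class and parameter names should be checked against Agno's Teams documentation.

```python
# Rough sketch of an Agno team in "coordinate" mode; verify class/parameter names in Agno's docs.
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.team import Team

researcher = Agent(name="Researcher", role="Finds and verifies facts", model=OpenAIChat(id="gpt-4o"))
writer = Agent(name="Writer", role="Turns research into a short brief", model=OpenAIChat(id="gpt-4o"))

news_team = Team(
    mode="coordinate",               # other modes: "route", "collaborate"
    members=[researcher, writer],
    model=OpenAIChat(id="gpt-4o"),   # the team leader that delegates tasks and combines results
)

news_team.print_response("Summarize today's most important AI releases.", stream=True)
```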
Tools of the Trade
Paid: Business infrastructure platform for AI agent monetization that handles pricing, subscriptions, billing, and margin tracking with minimal code. It offers flexible pricing models aligned with AI agent value delivery, margin analysis tools, and a client portal that quantifies agent work for clear ROI demonstration.
MCPAdapt: Connects agentic frameworks to over 650 MCP servers, providing access to various data and tools. It adapts these MCP servers into usable tools within frameworks like Smolagents, Langchain, and CrewAI.
HolmesGPT: AI agent that automatically investigates alerts by fetching and analyzing data from multiple observability sources. It connects to various monitoring tools (Kubernetes, Grafana, Prometheus, etc.) to identify root causes of technical issues without human intervention.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes
vibe coders are just you guys when you were learning to code
we use state of the art AI to code
you googled or stack overflowed
excuuuuuuuse us for getting some stuff wrong along the way ~ Ben Tossell
Man was born to generate high performance code, but everywhere he is forced to generate more images. ~ Bojan Tunguz
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉