Opensource General-Purpose AI Agent

PLUS: ChatGPT image generation in the API, Perplexity can now take multi-app actions

Shubham Saboo & Gargi Gupta
April 24, 2025

Today’s top AI Highlights:

Autonomous open browser agent for complex web tasks
100% opensource general AI Agent built to work like a human
Generate OpenAI’s viral Ghibli-style images via API
Perplexity releases web browsing and multi-app actions agent
MCP MCP – an MCP server to list MCP servers

& so much more!

Read time: 3 mins

AI Tutorial

Financial management is a deeply personal and context-sensitive domain where one-size-fits-all AI solutions fall short. Building truly helpful AI financial advisors requires understanding the interplay between budgeting, saving, and debt management as interconnected rather than isolated concerns.

A multi-agent system provides the perfect architecture for this approach, allowing us to craft specialized agents that collaborate rather than operate in silos, mirroring how human financial advisors actually work.

In this tutorial, we'll build a Multi-Agent Personal Financial Coach application using Google’s newly released Agent Development Kit (ADK) and the Gemini model. Our application will feature specialized agents for budget analysis, savings strategies, and debt reduction, working together to provide comprehensive financial advice. The system will offer actionable recommendations with interactive visualizations.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build a Multi-Agent Personal Finance Coach

Fully functional multi-agent app with step-by-step instructions (100% opensource)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Opensource Browser Agent Powered by Reasoning Models 🌐♻️

Laminar AI has released Index, a new open-source agent for automating complex tasks directly within a browser environment. Index uses a focused script that combines JavaScript, computer vision, and OCR to locate interactive web elements, then presents them visually to a vision-enabled LLM, which chooses the next action.

It achieved a 92% score on the WebVoyager benchmark with Claude 3.7 (extended thinking), outperforming OpenAI Operator (CUA), which scored 87%. Index also includes built-in browser session replay and trace synchronization tailored for developer debugging. Available via a simple Python package, CLI, or API, Index integrates with leading models like Gemini 2.5 Pro, Claude 3.7, and OpenAI o4-mini, offering flexibility for various use cases.

Key Highlights:

Observability - Index records complete browser sessions while tracking agent steps and LLM calls, then synchronizes everything in a unified interface. This debugging environment lets you see exactly what the agent perceives alongside execution traces, making it easier to identify and fix problems.
Element Detection - The core of Index is its carefully developed detection script, created through extensive testing. Combined with computer vision and OCR techniques, it accurately identifies interactive elements across different web interfaces.
Architecture - Index runs on a straightforward while loop powered by well-crafted prompts. This approach, improved through thousands of evaluation runs, delivers strong performance with minimal complexity.
Integration Options - Install with pip install lmnr-index, then run Index from Python code, an interactive CLI, or a serverless API. The tool supports multiple LLMs, including Gemini 2.5 Pro/Flash, Claude 3.7 Sonnet, and OpenAI o4-mini.
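
To make the "straightforward while loop" idea concrete, here is a minimal sketch of an observe, decide, act loop. It is not Index's actual code: the names Action, AgentState, run_agent, and scripted_decide are hypothetical, and the LLM call is replaced by a scripted stand-in so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str           # hypothetical action names, e.g. "click", "type", "done"
    argument: str = ""


@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (observation, action) pairs


def run_agent(goal, observe, decide, act, max_steps=10):
    """Loop observe -> decide -> act until the model returns 'done' or steps run out."""
    state = AgentState(goal=goal)
    steps = 0
    while steps < max_steps:
        observation = observe()               # e.g. screenshot plus detected elements
        action = decide(state, observation)   # in a real agent, an LLM call
        state.history.append((observation, action))
        if action.name == "done":
            break
        act(action)                           # e.g. drive the browser
        steps += 1
    return state


def scripted_decide(state, observation):
    # Stand-in for the LLM: click once, then report completion.
    if not state.history:
        return Action("click", "element-3")
    return Action("done", "finished")
```

Calling run_agent("demo", lambda: "page", scripted_decide, print) performs one click and then stops. In a real agent, observe would return the annotated page and decide would prompt a vision-capable model.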

Opensource Generalist AI Agent That Acts for You 💻🎯🪄

Generalist AI agents like Manus AI and Genspark are redefining automation by operating a computer and completing complex real-world tasks much as a human assistant or employee would. Suna is a new entrant in this space: a 100% open-source agent available under the Apache 2.0 license. It acts as your AI employee, helping you tackle everyday tasks through natural conversation.

Suna's powerful toolkit includes seamless browser automation to navigate the web and extract data, file management for document creation and editing, web crawling and extended search capabilities, command-line execution for system tasks, website deployment, and integration with various APIs and services.

Key Highlights:

Autonomous Task Execution - Suna breaks down your instructions into logical steps, formulates effective strategies, and navigates between different tools and information sources to accomplish objectives. The agent adapts based on intermediate results, overcomes obstacles, and maintains focus on the original goal throughout the entire process.
Use Cases - From generating B2B lead lists with contact information to creating detailed market research reports with visualizations, Suna handles diverse real-world applications. The agent can scrape data, analyze public reviews to identify trends, plan detailed travel itineraries, monitor stock performance, or even create interactive web games.
Toolkit Integration - The agent combines browser automation, file management, web crawling, command-line execution, and API integration into a cohesive system. This lets Suna use and switch seamlessly between different tools to solve problems.
Architecture - Suna consists of four main components: a Python/FastAPI backend, a Next.js/React frontend, isolated Docker execution environments, and a Supabase database. The setup requires minimal configuration with standard components like Redis, making deployment straightforward.
Production-Ready - Each agent operates in an isolated Docker container with controlled access permissions, ensuring task execution happens in a secure sandbox environment. This architecture provides the flexibility needed for powerful automation while maintaining appropriate security boundaries for sensitive operations or data handling tasks.
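
As a rough illustration of what an isolated Docker container with controlled permissions can look like, the sketch below builds a locked-down docker run command. This is an assumption for illustration only, not Suna's actual sandbox configuration: the function name sandbox_command, the default memory limit, and the choice of restrictions are all hypothetical, though the flags themselves are standard docker run options.

```python
def sandbox_command(image, command, memory="512m", allow_network=False):
    """Build a `docker run` argument list with common hardening flags."""
    args = [
        "docker", "run",
        "--rm",                # remove the container when the task finishes
        "--read-only",         # mount the root filesystem read-only
        "--memory", memory,    # cap memory usage
        "--cap-drop", "ALL",   # drop all Linux capabilities
    ]
    if not allow_network:
        args += ["--network", "none"]  # no network access unless requested
    return args + [image] + list(command)
```

The resulting list can be passed to subprocess.run on a machine with Docker installed, for example sandbox_command("python:3.12-slim", ["python", "-c", "print('hi')"]).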

Quick Bites

OpenAI has released its image generation capabilities worldwide in the API with gpt-image-1. The new model offers enhanced features including higher fidelity images, diverse visual styles, precise editing capabilities, rich world knowledge integration, and consistent text rendering. You can now access these capabilities through dedicated endpoints for image generation and editing, with additional support for the Responses API coming soon.
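
A minimal sketch of calling the image generation endpoint from the official openai Python package is shown below. The helper name generate_image_bytes is hypothetical, and the default size is an assumption; the function takes the client as a parameter so it can be tried with a stand-in object before spending API credits.

```python
import base64


def generate_image_bytes(client, prompt, size="1024x1024"):
    """Request one image from gpt-image-1 and return the decoded image bytes."""
    result = client.images.generate(model="gpt-image-1", prompt=prompt, size=size)
    # gpt-image-1 returns the image as base64-encoded data.
    return base64.b64decode(result.data[0].b64_json)


# Usage (requires the openai package and an OPENAI_API_KEY environment variable):
# from openai import OpenAI
# image = generate_image_bytes(OpenAI(), "A cozy cottage in a hand-drawn anime style")
# with open("cottage.png", "wb") as f:
#     f.write(image)
```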

Groq has launched Groq Desktop, a new MCP host that brings lightning-fast tool use to local models like Llama 4 Scout and Qwen QwQ 32B. Unlike Claude Desktop, which can lag with tool calls, Groq Desktop connects to any MCP server and returns results in seconds — even when models fetch files or trigger APIs. It’s already in beta, supports local chat with image inputs, and works with all MCP-compatible models running on Groq.

Agno unveils Memory 2.0 for its agent ecosystem, providing multi-user and multi-session memory capabilities similar to ChatGPT but designed specifically for AI agents. The new system offers three memory types - session storage for conversation history, user memories for personalization, and session summaries for condensing lengthy interactions - all now available as the default memory driver for all Agno Agents.
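
The three memory types can be pictured with a small data structure. The sketch below is purely illustrative and is not Agno's API: the class AgentMemory and its methods are hypothetical, and the "summary" is a naive join of recent messages standing in for an LLM-written summary.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Session storage: conversation history per (user, session).
    sessions: dict = field(default_factory=lambda: defaultdict(list))
    # User memories: durable facts per user, shared across sessions.
    user_memories: dict = field(default_factory=lambda: defaultdict(list))
    # Session summaries: condensed text per (user, session).
    summaries: dict = field(default_factory=dict)

    def add_message(self, user_id, session_id, role, text):
        self.sessions[(user_id, session_id)].append((role, text))

    def remember(self, user_id, fact):
        if fact not in self.user_memories[user_id]:
            self.user_memories[user_id].append(fact)

    def summarize(self, user_id, session_id, max_messages=2):
        # Placeholder: a real system would ask a model to write the summary.
        recent = self.sessions[(user_id, session_id)][-max_messages:]
        summary = " | ".join(f"{role}: {text}" for role, text in recent)
        self.summaries[(user_id, session_id)] = summary
        return summary
```

Keying history by both user and session is what lets one agent serve many users at once without mixing their conversations.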

Perplexity has finally rolled out its Voice Assistant feature to iOS users, enabling iPhone owners to perform tasks across multiple apps through natural language commands. The assistant can now help users book reservations, manage calendar events, draft emails, play media content, and navigate ride-sharing services—all while maintaining conversations even when users switch between apps.

Tools of the Trade

MCP MCP: An MCP server for discovering other MCP servers from any MCP client, such as Claude Desktop or Cursor. Just use prompts like “which MCP server will help me manage my calendar,” and it’ll give you a list of available MCP servers.
Astral.now: Build web-browsing AI agents that automate marketing workflows. These agents execute defined steps like searching platforms, evaluating content with AI, and generating targeted replies to find potential leads.
Scira MCP Chat: Opensource minimalist web client that connects to any MCP server using Composio, Zapier, or your own setup. It supports multiple models (like Grok-3, GPT-4.1 Mini), has built-in reasoning support, and is built with Next.js, Tailwind, and Vercel’s AI SDK — no auth needed, full chat history included.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

OpenAI tried to acquire Cursor at $300M ARR but they chose to say no and raise at $10B.
Then they offered Windsurf $3B as they hit $100M ARR.
The model labs clearly want to buy breakout apps built on them.
~ Deedy Das

vibe coding is dead
its just called coding
~ Ben Tossell

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads | Facebook

Awesome LLM Apps | Sponsor Us

PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
