
o3 and o4-mini with Agentic Tool Use

PLUS: Vibe code production-grade software, first open-source native 1-bit LLM

Today’s top AI Highlights:

  1. Build full-stack production-ready applications with multi-agent platform

  2. OpenAI’s reasoning models can agentically use every tool within ChatGPT

  3. First open-source native 1-bit LLM with 2B parameters by Microsoft

  4. Google releases Veo 2 in the Gemini app and via API

  5. Scrape web data without specifying URLs using an AI web action agent

& so much more!

Read time: 3 mins

AI Tutorial

Financial management is a deeply personal and context-sensitive domain where one-size-fits-all AI solutions fall short. Building truly helpful AI financial advisors requires understanding the interplay between budgeting, saving, and debt management as interconnected rather than isolated concerns.

A multi-agent system provides the perfect architecture for this approach, allowing us to craft specialized agents that collaborate rather than operate in silos, mirroring how human financial advisors actually work.

In this tutorial, we'll build a Multi-Agent Personal Financial Coach application using Google’s newly released Agent Development Kit (ADK) and the Gemini model. Our application will feature specialized agents for budget analysis, savings strategies, and debt reduction, working together to provide comprehensive financial advice. The system will offer actionable recommendations with interactive visualizations.
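The multi-agent structure described above can be sketched in plain Python. This is an illustrative toy, not actual ADK or Gemini code: each "agent" here is a simple rule-based function, and the data fields (`budgets`, `debts`, etc.) are made up for the example.

```python
# Toy sketch of the coordinator/specialist pattern (plain Python, not ADK code).
# Each specialist handles one concern; the coordinator routes and merges advice.

def budget_agent(data):
    """Flag spending categories that exceed their monthly limit."""
    return [c for c, (spent, limit) in data["budgets"].items() if spent > limit]

def savings_agent(data):
    """Suggest extra savings needed to hit a 20%-of-income target."""
    target = 0.20 * data["income"]
    return max(0.0, target - data["current_savings"])

def debt_agent(data):
    """Order debts by interest rate, highest first (avalanche method)."""
    return sorted(data["debts"], key=lambda d: d["rate"], reverse=True)

def financial_coach(data):
    """Coordinator: run all specialists and combine their outputs."""
    return {
        "over_budget": budget_agent(data),
        "extra_savings_needed": savings_agent(data),
        "debt_payoff_order": [d["name"] for d in debt_agent(data)],
    }

profile = {
    "income": 5000.0,
    "current_savings": 600.0,
    "budgets": {"dining": (450, 300), "rent": (1500, 1600)},
    "debts": [{"name": "card", "rate": 0.24}, {"name": "auto", "rate": 0.07}],
}
print(financial_coach(profile))
```

In the real tutorial, each specialist function would instead be an ADK agent backed by Gemini, but the delegation pattern is the same.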

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

The market is flooded with vibe coding platforms like Bolt and Lovable for building simple toy applications—to-do lists, invitation apps, or basic landing pages that look impressive in demos but fall short in production. Builders, however, now want to ship serious products that can handle real-world complexity, scale, and production-grade requirements.

Emergent is an agentic platform for building complete, production-ready applications without you writing a single line of code. This AI-native engineering platform doesn't just generate code snippets—it builds entire systems with databases, APIs, authentication, and infrastructure, all through simple prompts. Emergent employs a team of specialized agents for frontend development, backend architecture, infrastructure configuration, DevOps automation, and quality assurance, all collaborating seamlessly to complete your task.

Key Highlights:

  1. Full-Stack Development - Just tell it what you want to build in plain English, and watch it handle everything from frontend design to database schema creation, API development, and infrastructure setup—no manual wiring required.

  2. Entire Development Lifecycle - Emergent manages the entire product journey with automated testing, debugging, refactoring, and continuous integration. The platform resolves frontend bugs, boosts test coverage, and maintains documentation without you micromanaging each step.

  3. Data and Infrastructure Management - Build efficient ETL pipelines, streamline data migrations, and process information at scale with built-in data warehousing capabilities. Emergent handles database configuration, server setup, and third-party integrations automatically.

  4. Build Real Applications - Create genuine SaaS products and AI applications with login systems and payment processing, functional marketplaces with user onboarding, and data-rich dashboards with complex logic—all deployable to real customers immediately.

  5. Built-In Security and Scalability - Every application comes with enterprise-grade security features and architecture that scales with your user base, eliminating the typical growing pains of early-stage products.

Emergent is giving beta access to UnwindAI readers before going public. Be among the first to experience the future of building today — Sign up now and enter code "UNWIND" for immediate access.

OpenAI just released o3 and o4-mini, their smartest agentic reasoning models. These models don't just think longer before responding - they can now search the web, analyze files with Python, process images, and even generate visuals, all within a single conversation.

What makes this special is how they're trained to decide when and how to use these tools, solving complex problems in under a minute by chaining multiple actions together without you having to guide each step.

Both models particularly excel in coding, math, and scientific domains, with greater efficiency and speed than previous o-series models. Sam Altman has also confirmed that OpenAI will release o3-pro for ChatGPT Pro users in a few weeks.

Key Highlights:

  1. Full Tool Integration - Both models can agentically use and combine all ChatGPT tools in a single session, chaining web searches, data analysis, visual reasoning, and image generation to tackle multi-step problems without any manual guidance.

  2. Visual Intelligence - The models can now think with images directly in their reasoning process - they'll analyze your photos, diagrams, or sketches (even blurry or poorly lit ones), and can manipulate these images by rotating or zooming them as part of problem-solving.

  3. Codex CLI - OpenAI also launched Codex CLI, an open-source, terminal-based coding agent similar to Claude Code, that leverages o3 and o4-mini's reasoning capabilities with direct access to your local code, plus support for passing screenshots or sketches directly from your command line.

  4. Benchmark Performance - o3 sets new state-of-the-art scores on the Codeforces, SWE-bench, and MMMU benchmarks, while o4-mini tops the charts on AIME 2024 and 2025 and is faster than previous models.

  5. Availability - The models are now available to ChatGPT Plus, Pro, and Team users (replacing o1 and o3-mini). Developers can access both models via the Chat Completions API and Responses API, with upcoming tool integration support in the Responses API.
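For developers, switching to the new models is mostly a matter of changing the model name in an API request. A minimal sketch with the Chat Completions API (the prompt is invented for illustration; the live call is commented out since it needs an API key):

```python
# Minimal sketch of calling o4-mini via the Chat Completions API.
# The request is built as a plain dict; the commented lines show the actual
# call with the official openai SDK.

def build_request(prompt: str, model: str = "o4-mini") -> dict:
    """Assemble a Chat Completions request body for the given model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this CSV and plot revenue by quarter.")

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# response = client.chat.completions.create(**req)
# print(response.choices[0].message.content)
```

Swap in `"o3"` as the model name for the larger model; both are also exposed through the Responses API.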

Microsoft's new BitNet b1.58 2B4T brings 1-bit AI to a whole new scale. The company has open-sourced the 2-billion-parameter model, built on a native 1.58-bit architecture that runs on CPUs, including Apple's M2. Trained on 4 trillion tokens, it matches the performance of similar-sized full-precision models while drastically cutting resource needs.

It uses an innovative approach by quantizing weights to just three values (-1, 0, +1), making it dramatically more resource-friendly than traditional models. However, unlocking its speed and energy efficiency requires using a dedicated C++ library (like Microsoft’s bitnet.cpp), which is outside the standard Hugging Face ecosystem.

Key Highlights:

  1. Performance-to-Size Ratio - BitNet b1.58 2B4T matches or exceeds similar-sized models like LLaMA 3.2 1B and Gemma 3 1B on various benchmarks, particularly excelling in GSM8K (58.38%) and WinoGrande (71.90%), proving that extreme quantization doesn't have to sacrifice capability.

  2. Dramatic Resource Efficiency - The model uses just 0.4 GB of memory (non-embedding) compared to 2–5 GB for competitors, with CPU decoding latency of 29 ms versus 41–124 ms for similar models, making it perfect for resource-constrained environments and edge computing.

  3. Speed Without Compromise - When run through the bitnet.cpp framework, the model delivers up to twice the speed of comparable models while using significantly less energy.

  4. Implementation - Available under an MIT license with multiple variants including packed 1.58-bit weights for efficient deployment and BF16 format for training/fine-tuning, plus GGUF format compatible with the bitnet.cpp library for optimized CPU inference.
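The core quantization idea is simple to sketch. The BitNet b1.58 paper describes absmean quantization: scale each weight by the mean absolute value of its tensor, then round and clip to {-1, 0, +1}. The snippet below mirrors that idea in plain Python; it is an illustration, not Microsoft's implementation.

```python
# Sketch of BitNet-style absmean ternary quantization: scale weights by
# their mean absolute value, then round and clip to {-1, 0, +1}.
# Illustrative only -- not Microsoft's code.

def ternary_quantize(weights, eps=1e-8):
    # absmean scale for the whole weight group
    gamma = sum(abs(w) for w in weights) / len(weights)
    # round to nearest integer, then clip into the ternary set
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma  # dequantize approximately as q[i] * gamma

w = [0.8, -0.05, -1.2, 0.3]
q, gamma = ternary_quantize(w)
print(q)  # every entry is -1, 0, or +1
```

Because each weight needs only ~1.58 bits (log2 of 3 states) instead of 16, and matrix multiplies reduce to additions and subtractions, memory and energy drop sharply.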

Quick Bites

Google has released its Veo 2 video generation model to Gemini Advanced users in the Gemini app and to developers through the Gemini API. The model can create 720p videos from simple text prompts with fluid character movement and lifelike scenes that accurately simulate real-world physics across diverse visual styles.

  • Gemini Advanced users can generate 8-second-long 16:9 landscape format videos. Just select Veo 2 from the model dropdown in the Gemini app.

  • Veo 2 is also available in Whisk by Google Labs (only for Google One AI Premium subscribers), where you can not only create new images using text and image prompts but also animate them into videos with Veo 2.

  • Developers can use Veo 2 via the Gemini API and Google AI Studio to integrate both text-to-video and image-to-video generation capabilities into their apps, priced at $0.35 per second of video (much higher than Runway, which charges about $0.50 for a 5-second video, i.e., $0.10 per second).

  • The API provides flexible options including video length (5-8 seconds), aspect ratios (16:9 or 9:16), and safety filters.
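At those per-second rates, the pricing gap is easy to quantify; a quick back-of-the-envelope comparison (using the prices mentioned above):

```python
# Cost comparison at the quoted per-second prices:
# Veo 2 via the Gemini API at $0.35/s vs. Runway at $0.50 per 5-second clip.

VEO2_PER_SECOND = 0.35
RUNWAY_PER_SECOND = 0.50 / 5  # $0.10 per second

def clip_cost(seconds, rate):
    """Price of a clip of the given length, rounded to cents."""
    return round(seconds * rate, 2)

print(clip_cost(8, VEO2_PER_SECOND))    # 2.8  -> an 8s Veo 2 clip costs $2.80
print(clip_cost(8, RUNWAY_PER_SECOND))  # 0.8  -> the same length on Runway, $0.80
```

So a maximum-length 8-second Gemini clip costs 3.5x what an equivalent Runway clip would.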

xAI has released Grok Studio, a canvas-like feature similar to OpenAI's and Google's Canvas offerings, that can generate documents, code, reports, and browser games. Grok Studio opens your content in a separate window, allowing both you and Grok to collaborate on it together. It lets you preview HTML, run code in multiple languages, including Python and JavaScript, and even use Google Drive files in the same canvas.

Firecrawl has launched Extract v2, a major upgrade to their data extraction endpoint powered by their FIRE-1 agent released just yesterday. While v1 could only pull structured data from specific URLs via prompts, v2 now actively interacts with websites, handles multi-page navigation, and can even extract data without requiring a URL through its built-in search functionality.
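A request to the new endpoint might look like the sketch below. The field names and structure are assumptions based on Firecrawl's extract API, and the prompt and schema are invented; check the official docs before relying on this.

```python
# Hypothetical request payload for Firecrawl's extract endpoint with the
# FIRE-1 agent. Field names are assumptions -- verify against Firecrawl's docs.

payload = {
    "prompt": "Find the pricing page and extract all plan names and monthly prices",
    "schema": {
        "type": "object",
        "properties": {
            "plans": {"type": "array", "items": {"type": "string"}},
        },
    },
    "agent": {"model": "FIRE-1"},  # enables multi-page navigation
    # note: no "urls" key -- v2 can discover pages via built-in search
}

# import requests
# r = requests.post("https://api.firecrawl.dev/v1/extract",
#                   headers={"Authorization": "Bearer <API_KEY>"}, json=payload)
```

The key difference from v1 is that last point: with no `urls` provided, the agent searches for relevant pages itself before extracting.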

Tools of the Trade

  1. Operative.sh: Vibe test your apps using AI agents that simulate real user interactions and automatically test user flows (using BrowserUse). Their MCP server also enables MCP client agents like Cursor to automatically test and verify your code end-to-end.

  2. Multi-Agent Canvas with MCP: Open-source canvas-style app to chat with multiple agents in one dynamic conversation and add MCP servers to enable these agents to complete your tasks. Built with Next.js, LangGraph, and CopilotKit.

  3. Codebuff: Terminal-based AI coding agent that indexes your entire codebase in seconds to give context-aware assistance. It gives you precise control for writing frontend and backend code across various languages and frameworks. It has a persistent memory to learn from previous sessions.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. "good research takes time", the o9 superintelligence muttered to itself after its 8th consecutive research idea failed within the last 640 milliseconds ~
    James Campbell

  2. "pip install" is dead. "uv add" is the new king. ~
    Santiago

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
