
Gemini 2.5 Flash with Thinking Budget

PLUS: Use OpenAI's Codex CLI with Gemini, Fully-hosted RAG chunking service

Today’s top AI Highlights:

  1. Google releases Gemini 2.5 Flash with a Thinking Budget

  2. HTTP API for Claude Code, Goose, Aider, and Codex

  3. OpenAI Codex CLI working with Gemini and Ollama models

  4. The fastest way to turn any website into a clean text file for LLMs

  5. MCP server for AI agents to execute Python code in a sandboxed environment

& so much more!

Read time: 3 mins

AI Tutorial

Financial management is a deeply personal and context-sensitive domain where one-size-fits-all AI solutions fall short. Building truly helpful AI financial advisors requires understanding the interplay between budgeting, saving, and debt management as interconnected rather than isolated concerns.

A multi-agent system provides the perfect architecture for this approach, allowing us to craft specialized agents that collaborate rather than operate in silos, mirroring how human financial advisors actually work.

In this tutorial, we'll build a Multi-Agent Personal Financial Coach application using Google’s newly released Agent Development Kit (ADK) and the Gemini model. Our application will feature specialized agents for budget analysis, savings strategies, and debt reduction, working together to provide comprehensive financial advice. The system will offer actionable recommendations with interactive visualizations.

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Google has released Gemini 2.5 Flash in preview, its first fully hybrid reasoning model: developers can turn thinking on or off and control exactly how much "thinking" the model does based on task complexity.

This new model builds on the speed and cost efficiency of 2.0 Flash while adding significant reasoning capabilities with an excellent price-to-intelligence ratio, making it ideal for building complex AI solutions that are both high-volume and reasoning-sensitive. Even with thinking turned off, you'll still get better performance than the previous version while maintaining impressively fast response times.

Key Highlights:

  1. Intelligent Resource Management - The model is trained to gauge how long to think for a given prompt and automatically scales its thinking to the perceived task complexity. It’ll use fewer tokens for simple queries and more for complex problems like multi-variable equations or code generation.

  2. Flexible Reasoning Control - The “thinking budget” parameter (ranging from 0 to 24,576 tokens) gives you fine-grained control over the maximum number of tokens the model can generate while thinking, so you can find your balance between quality, speed, and cost. Set it to 0 for fastest performance or crank it up for multi-step reasoning tasks.

  3. Superior Price-Performance - 2.5 Flash absolutely crushed Claude 3.7 Sonnet across benchmarks, including Humanity’s Last Exam, science, math, and visual reasoning, while being 20-25x cheaper!

  4. Availability - An early version of Gemini 2.5 Flash is being rolled out today in preview in the Gemini API via Google AI Studio and Vertex AI. Gemini 2.5 Flash is also available to everyone in the Gemini app, and can be used with new features like Canvas.
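To make the budget concrete, here’s a minimal sketch of calling the preview model with the google-genai Python SDK. The model identifier, SDK install (`pip install google-genai`), and the `GEMINI_API_KEY` environment variable are assumptions; the 0–24,576 range is the one stated above.

```python
import os

# The announced "thinking budget" range: 0 to 24,576 tokens.
MIN_BUDGET, MAX_BUDGET = 0, 24_576

def clamp_thinking_budget(requested: int) -> int:
    """Clamp a requested budget into the documented range."""
    return max(MIN_BUDGET, min(MAX_BUDGET, requested))

# Hedged sketch of an API call (only runs if a key is configured;
# the preview model name below may change).
if os.environ.get("GEMINI_API_KEY"):
    from google import genai
    from google.genai import types

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents="Solve: 3x + 7 = 22",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                # 0 disables thinking entirely for fastest responses
                thinking_budget=clamp_thinking_budget(1024)
            )
        ),
    )
    print(response.text)
```

Setting the budget to 0 gives you the fast, non-thinking behavior; raising it buys more multi-step reasoning per request.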

AgentAPI brings HTTP connectivity to terminal-based coding agents, letting you control Claude Code, Goose, Aider, and Codex with simple API calls. This open-source tool creates a standardized interface for these AI coding assistants without requiring you to modify their codebase. Now you can integrate these powerful coding agents into your workflows, applications, and automation pipelines while keeping their full capabilities intact.

Key Highlights:

  1. Simple deployment and integration - Get started in minutes with a single command that works across all supported agents. The HTTP server exposes a clean API with just four endpoints for complete control over your coding assistants.

  2. Universal compatibility - Works with Claude Code, Goose, Aider, and Codex without requiring changes to your existing agent installations. The standardized interface lets you switch between different agents without rewriting your integration code.

  3. Real-time communication - Track agent activity through server-sent events that provide immediate status updates and message notifications. The attach command even lets you drop into an interactive terminal session at any point.

  4. Smart terminal handling - AgentAPI intelligently parses the terminal output to extract meaningful messages while automatically filtering out TUI elements and user inputs, giving you clean, usable responses from your agents.
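As a rough illustration of what "four endpoints" buys you, here’s a minimal Python sketch of talking to a locally running AgentAPI server. The base URL, default port, endpoint paths, and message payload shape are all assumptions drawn from the description above; check the project’s docs for the authoritative routes.

```python
import json
import urllib.request

# Assumed base URL for a locally running AgentAPI server.
BASE_URL = "http://localhost:3284"

def url_for(path: str, base: str = BASE_URL) -> str:
    """Join an endpoint path onto the server's base URL."""
    return base.rstrip("/") + path

def send_message(text: str, base: str = BASE_URL):
    """POST a user message to the running agent (network call;
    endpoint path and payload shape are assumptions)."""
    req = urllib.request.Request(
        url_for("/message", base),
        data=json.dumps({"content": text, "type": "user"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the interface is plain HTTP, the same few lines work whether the server is wrapping Claude Code, Goose, Aider, or Codex.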

Quick Bites

Roboflow has released "trackers," a new open-source library offering clean Pythonic implementations of multi-object tracking algorithms for computer vision workflows. Built on supervision, it seamlessly integrates with detection models from Ultralytics, Transformers, and MMDetection.

Mistral AI has released "Classifier Factory," a platform to create custom classification models for a variety of applications, including content moderation, sentiment analysis, and fraud detection. Using the Mistral 3B model, it is available directly in La Plateforme and via Mistral's API.

Someone just forked the original OpenAI Codex CLI (OpenAI’s new terminal-based AI coding agent) to work with Gemini and other models via OpenRouter and Ollama. This open-codex is built for developers who already live in the terminal and want ChatGPT‑level reasoning plus the power to actually run code, manipulate files, and iterate – all under version control. It's chat‑driven development that understands your repo and executes code in it.

Tools of the Trade

  1. llmstxt.new by Firecrawl: Converts any URL into clean text files optimized for LLMs. Just add "llmstxt.new/" before any URL and instantly get a .txt file. It produces two outputs: a concise summary file and a complete content file, both formatted for LLM training and inference.

  2. MCP Run Python: MCP server that allows agents to execute Python code in a secure, sandboxed environment. It uses Pyodide to run Python code in a JavaScript environment with Deno, isolating execution from the host system.

  3. cua-mcp-server: MCP server for computer-use agents, allowing you to run CUA through Claude Desktop or other MCP clients. It exposes CUA's full functionality through standardized tool calls, and supports single-task commands and multi-task sequences, giving Claude Desktop direct access to all of CUA's computer control capabilities.

  4. Chonkie Cloud: A fully hosted RAG chunking service that processes text documents into optimized chunks without requiring infrastructure management. Comes with visualization tools to help developers build and debug their RAG systems more effectively.

  5. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
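The llmstxt.new trick above is just URL prefixing, which makes it trivial to script. A minimal helper (assuming the https scheme for the service):

```python
def llmstxt_url(page_url: str) -> str:
    """Return the llmstxt.new address for a given page URL
    by prepending "llmstxt.new/" to it, as described above."""
    return "https://llmstxt.new/" + page_url
```

Feed the returned address to any HTTP client to fetch the LLM-ready .txt version of the page.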

Hot Takes

  1. we’ll look back at this era like the gold rush.
    except this time:
    – picks + shovels = prompts + AI agents
    – gold = attention, data, distribution
    – miners = builders automating boring work
    – gold pans = n8n, replit, bolt, lovable
    – land grabs = ai-first domains + keywords
    – mining towns = niche discords + communities
    – saloons = X, short form video
    – mentors = youtube + pods
    – railroads = zapier, lindy, chatgpt + open source workflows
    – hardware stores = marketplaces for agents + templates
    – prospecting = searching reddit, docs, search consoles
    – speculators = people flipping AI tools
    – outlaws = folks scraping sites and charging
    – sheriffs = devs enforcing rate limits + terms
    you don’t need funding.
    you need a browser, a niche and a good idea.
    start digging. ~
    GREG ISENBERG

  2. btw 90% of the people I’ve watched use an LLM can’t prompt if their life depended on it ~
    Sully

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
