Voice AI Agents and GPT-4o Audio
PLUS: Liner Deep Research outperforms Perplexity and GPT-4.5, Moore’s Law for AI Agents
Today’s top AI Highlights:
Build voice AI agents with OpenAI's new audio models and SDK
This multi-agent Deep Research beats Perplexity and GPT-4.5
Moore’s Law for AI Agents
Fine-tune LLMs with RL in your browser with this fully managed service
OpenAI's o1-pro is the most costly model in the API
& so much more!
Read time: 3 mins
AI Tutorials
In this tutorial, we'll show you how to create your own powerful Deep Research Agent that performs in minutes what might take human researchers hours or even days—all without the hefty subscription fees. Using OpenAI's Agents SDK and Firecrawl, you'll build a multi-agent system that searches the web, extracts content, and synthesizes comprehensive reports through a clean Streamlit interface.
OpenAI's Agents SDK is a lightweight framework for building AI applications with specialized agents that work together. It provides primitives like agents, handoffs, and guardrails that make it easy to coordinate tasks between multiple AI assistants.
Firecrawl’s new deep-research endpoint enables our agent to autonomously explore the web, gather relevant information, and synthesize findings into comprehensive insights.
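To make that concrete, here's a minimal sketch of the two building blocks wired together. The agent names, instructions, and the Firecrawl parameters (max_depth, time_limit) are illustrative assumptions rather than the tutorial's exact code; check Firecrawl's docs for the precise response shape.

```python
import asyncio
from agents import Agent, Runner, function_tool  # OpenAI Agents SDK
from firecrawl import FirecrawlApp

firecrawl = FirecrawlApp(api_key="fc-...")  # your Firecrawl API key

@function_tool
def deep_research(query: str) -> str:
    """Autonomously search the web and return synthesized findings."""
    # Parameter names and response shape are assumptions -- see Firecrawl's docs.
    result = firecrawl.deep_research(query=query, max_depth=3, time_limit=180)
    return result["data"]["finalAnalysis"]

# The report agent receives a handoff once research is done.
report_agent = Agent(
    name="Report Agent",
    instructions="Write a structured report from the research findings, citing sources.",
)

research_agent = Agent(
    name="Research Agent",
    instructions="Use deep_research to investigate the topic, then hand off to the Report Agent.",
    tools=[deep_research],
    handoffs=[report_agent],
)

async def main():
    result = await Runner.run(research_agent, "State of open-source LLMs in 2025")
    print(result.final_output)

asyncio.run(main())
```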
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Everyone's rushing to launch a Deep Research agent, slapping on a $20+ monthly fee for the agentic research privilege. But they're getting it wrong – most aren't even getting the facts right. Agentic AI search platform Liner is changing that.
Meet Liner Deep Research: the multi-agent research system that actually delivers. Liner (again!) topped OpenAI's SimpleQA factuality benchmark, beating Perplexity's Deep Research and even OpenAI's own state-of-the-art GPT-4.5 with a stunning 95.3% accuracy. That’s not all – what took OpenAI's and Perplexity's Deep Research 5-6 minutes, Liner did in just 1 minute, drawing on more sources.
Here's what Liner Deep Research does:
✅ Multi-Agent Teamwork - Liner deploys a network of specialized AI agents. You'll see them work: initial search, deep dives into sources, synthesis, validation – each agent has a focused role.
✅ Watch Agents in Action - Watch as your question gets broken into specific research angles – no more black-box answers without context.
✅ Visual Insights - Who wants to slog through a 10-page text dump? Liner delivers high-quality, visual insights – tables, charts, and summaries.
✅ From the Source - Accuracy is everything. Liner automatically prioritizes and cross-references information from 50+ trusted sources like ArXiv, PubMed, and Nature. Plus, their proprietary re-ranker prioritizes the most reliable web pages, not just popular ones.
✅ Your Research, Your Way - Export your complete findings in the format you need: PDF, MS Word, OneNote, or a simple text file. It's all about seamless integration into your workflow.
Don't settle for the inflated prices and questionable accuracy of other "deep research" tools. Try Liner Deep Research today with 10 free reports EVERY DAY (offer available for a limited time!).

OpenAI has released a suite of tools for developers to build intuitive and customizable voice AI agents. The new offerings include two state-of-the-art speech-to-text models, a new text-to-speech model with improved expressiveness, and updates to their recently launched Agents SDK for seamless integration of voice capabilities.
These tools allow you to create voice experiences where your end users can speak naturally to AI agents that not only understand what is said but can respond with appropriate tone and emotion—all with minimal code changes to existing text-based agents.
Key Highlights:
State-of-the-art STT - OpenAI introduced gpt-4o-transcribe and gpt-4o-mini-transcribe, two new speech-to-text models that set a new bar for accuracy. They boast significantly lower Word Error Rates (WER) across multiple languages compared to previous Whisper models and competitors, and are specifically designed to handle challenging audio conditions like background noise and accents.
Highly Customizable TTS - The new gpt-4o-mini-tts text-to-speech model gives you fine-grained control over the style of generated speech. Using a simple "instructions" field, you can now direct the model to speak with specific tones, emotions, and pacing – think "mad scientist" or "sympathetic customer service agent" (see the sketch after this list).
Audio API Features - The updated STT APIs now support streaming, delivering transcription responses in real time. Other improvements include built-in noise cancellation and a semantic voice activity detector that intelligently segments audio. A significant plus: TTS style and tone are prompted, not fine-tuned, granting more flexibility.
Agents SDK Voice Integration - The Agents SDK now includes a VoicePipeline component. It wraps a simple three-step process – STT, agent logic execution, and TTS – with minimal code changes required. It's effectively adding an STT model and a TTS model on either end of your existing text agents.
API Pricing and Demo - gpt-4o-transcribe is priced at $0.06/minute (same as Whisper), while gpt-4o-mini-transcribe comes in at just $0.03/minute. gpt-4o-mini-tts is even more affordable at $0.01/minute. You can immediately experiment with the new TTS model and even grab the code at openai.fm.
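For a feel of the developer surface, here's a minimal sketch of both new endpoints via the OpenAI Python SDK. The file names, voice choice, and instruction text are illustrative, not prescribed.

```python
from openai import OpenAI

client = OpenAI()

# Speech-to-text with the new gpt-4o-transcribe model
# (swap in gpt-4o-mini-transcribe for half the price).
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )
print(transcript.text)

# Text-to-speech: the new "instructions" field steers style and tone.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="Thanks for your patience. Your refund is on its way!",
    instructions="Speak like a sympathetic customer service agent.",
) as response:
    response.stream_to_file("reply.mp3")
```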
Quick Bites
Here’s an interesting study from the METR team revealing a "Moore's Law for AI agents": the length of tasks AI agents can complete autonomously is doubling approximately every 7 months. Current frontier models like Claude 3.7 Sonnet can handle tasks that take humans about an hour with 50% reliability, while earlier models like GPT-2 could only manage tasks of a few seconds.
If this exponential trend continues, AI agents could be capable of autonomously completing week-long projects by the end of this decade, dramatically expanding their practical applications and economic impact.
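As a back-of-the-envelope check on that projection (the 7-month doubling time and the roughly one-hour current horizon come from the study; the extrapolation below is just arithmetic):

```python
import math

DOUBLING_MONTHS = 7    # METR's measured doubling time
baseline_hours = 1.0   # rough task horizon of frontier models today (50% reliability)

# Project the autonomous-task horizon forward on the exponential trend.
for months_ahead in (12, 24, 36, 48, 60):
    horizon = baseline_hours * 2 ** (months_ahead / DOUBLING_MONTHS)
    print(f"+{months_ahead:>2} months: ~{horizon:,.0f}-hour tasks")

# A one-week (40-hour) project needs log2(40) ≈ 5.3 doublings,
# i.e. roughly 37 months out on this trend -- well before 2030 if it holds.
print(f"40-hour projects in ~{math.log2(40) * DOUBLING_MONTHS:.0f} months")
```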
Predibase is the first end-to-end platform for Reinforcement Fine-Tuning (RFT), allowing developers to adapt open-source LLMs with minimal labeled data. Their platform combines fully managed, serverless infrastructure with an integrated workflow from data to deployment, featuring the optimized GRPO methodology that DeepSeek-R1 popularized.
OpenAI has released o1-pro in the API, available to select developers on tiers 1-5. It supports vision, function calling, and structured outputs, and works with the Responses and Batch APIs, but comes with hefty pricing at $150/1M input tokens and $600/1M output tokens — 2x the input cost of GPT-4.5 and 10x the output cost of regular o1.
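If you're on an eligible tier, a call looks like any other Responses API request; the prompt below is illustrative.

```python
from openai import OpenAI

client = OpenAI()

# o1-pro is served through the Responses API, not Chat Completions.
response = client.responses.create(
    model="o1-pro",
    input="Outline an algorithm to detect cycles in a directed graph.",
)
print(response.output_text)
```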
A little late to the party, but Claude can now search the web to provide more up-to-date and relevant responses. When Claude uses information from the web, it provides direct citations so you can easily fact-check sources. Available for all paid Claude users in the US. To get started, toggle on web search in your profile settings.
Tools of the Trade
Glama AI: An all-in-one AI workspace that provides access to multiple AI models through a single interface with features like API gateway, agents, and MCP integration. It hosts a directory of MCP servers and tools that extend AI capabilities through standardized connections.
MCP.so: A repository that hosts various MCP server implementations, including multi-service MCP servers for services like GitHub, GitLab, Google Maps, memory storage, and web automation via Puppeteer.
Prompt flow by Microsoft: An end-to-end toolkit for developing LLM-based applications that streamlines the entire process from creation to deployment by connecting LLMs, prompts, code, and other tools into executable workflows.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes
first, they told us that leetcode questions are the best way to interview software engineers.
now, they tell us that models optimized on competitive programming benchmarks will take software engineer jobs.
do they even understand what software engineers do? ~ Santiago
It's 2025 and most content is still written for humans instead of LLMs. 99.9% of attention is about to be LLM attention, not human attention.
E.g. 99% of libraries still have docs that basically render to some pretty .html static pages assuming a human will click through them. In 2025 the docs should be a single your_project.md text file that is intended to go into the context window of an LLM.
Repeat for everything. ~ Andrej Karpathy
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉