GPT-4.5 Performance at 1% Cost

PLUS: Build AI web agents, AI Coding Agent to build full-stack apps in minutes

Shubham Saboo & Gargi Gupta
March 17, 2025

Today’s top AI Highlights:

China’s Baidu releases ERNIE model matching GPT-4.5 at 1% API price
Large action model framework to develop AI web agents
Open-Sora 2.0, trained for just $200K, closes the performance gap with OpenAI's Sora
Ask Claude Code to “think”, “think more”, or “think harder”
AI Coding Agent to build full-stack apps in minutes in your IDE

& so much more!

Read time: 3 mins

AI Tutorials

In this tutorial, we'll show you how to create your own powerful Deep Research Agent that performs in minutes what might take human researchers hours or even days—all without the hefty subscription fees. Using OpenAI's Agents SDK and Firecrawl, you'll build a multi-agent system that searches the web, extracts content, and synthesizes comprehensive reports through a clean Streamlit interface.

OpenAI's Agents SDK is a lightweight framework for building AI applications with specialized agents that work together. It provides primitives like agents, handoffs, and guardrails that make it easy to coordinate tasks between multiple AI assistants.

Firecrawl’s new deep-research endpoint enables our agent to autonomously explore the web, gather relevant information, and synthesize findings into comprehensive insights.

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build a Deep Research Agent with OpenAI Agents SDK and Firecrawl

Fully functional AI agent app with step-by-step instructions (100% open source)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

GPT-4.5-level Performance at 1% Cost 💸📈

China is aggressively challenging Silicon Valley's dominance, first with DeepSeek R1 (which OpenAI is attempting to get banned from the US), and now with Baidu. Baidu just dropped two powerful new AI models, ERNIE 4.5 and ERNIE X1, offering performance similar to GPT-4.5 and DeepSeek R1, and they're immediately available for free for individual developers on ERNIE Bot!

ERNIE X1 is a deep-thinking reasoning model with multimodal capabilities that delivers performance on par with DeepSeek R1 at only half the price. Meanwhile, ERNIE 4.5 is a new-generation native multimodal model that competes with GPT-4.5 at just 1% of the price. Both models are set to be open-sourced in June 2025.

Key Highlights:

ERNIE 4.5 architecture - The model uses "FlashMask" Dynamic Attention Masking and Multimodal Mixture-of-Experts to optimize performance across different modalities. These enhance its understanding, generation, reasoning, and memory while reducing hallucinations.
ERNIE X1 specialized reasoning - Built with Progressive Reinforcement Learning and Chains of Thought and Action, ERNIE X1 excels in understanding, planning, reflection, logical reasoning, and complex calculations.
Technical capabilities - ERNIE X1 supports advanced tool use including document Q&A, image understanding, code interpreting, and various search capabilities, making it suitable for complex agentic applications and RAG implementations.
Benchmark Performance - ERNIE 4.5 competes strongly with SOTA models like GPT-4.5 and DeepSeek V3, matching or outperforming them on benchmarks like MMLU Pro, GSM8K, and HumanEval+.
Aggressive Pricing - Baidu's pricing is incredibly disruptive. ERNIE X1's input costs start at just $0.28 per million tokens, and ERNIE 4.5's at $0.55, which is about 1% of what GPT-4.5 costs.

This undercuts OpenAI and puts serious pressure on Western AI companies' pricing models. China is really not playing around.
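
To sanity-check the "1% of the cost" claim above, here is a minimal arithmetic sketch. The two ERNIE input prices come from the highlights; the GPT-4.5 input price of $75 per million tokens is an assumption based on OpenAI's launch list price and is not stated in this post. The helper names are made up for illustration.

```python
# Input prices in USD per one million tokens.
ERNIE_X1_INPUT = 0.28   # quoted in the highlights above
ERNIE_45_INPUT = 0.55   # quoted in the highlights above
GPT_45_INPUT = 75.00    # ASSUMPTION: OpenAI launch list price, not from this post


def price_ratio_percent(price: float, baseline: float) -> float:
    """Return `price` as a percentage of `baseline`."""
    return 100.0 * price / baseline


def input_cost(tokens: int, price_per_million: float) -> float:
    """Return the USD cost of sending `tokens` input tokens."""
    return tokens / 1_000_000 * price_per_million


ratio = price_ratio_percent(ERNIE_45_INPUT, GPT_45_INPUT)
ernie_bill = input_cost(10_000_000, ERNIE_45_INPUT)
gpt_bill = input_cost(10_000_000, GPT_45_INPUT)

print(f"ERNIE 4.5 input price: {ratio:.2f}% of the assumed GPT-4.5 price")
print(f"10M input tokens: ${ernie_bill:.2f} vs ${gpt_bill:.2f}")
```

Under that assumed baseline the ratio lands a little below 1%, so the headline figure is a fair rounding for input tokens. Output-token prices are not quoted here, so the sketch leaves them out.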

The #1 AI Meeting Assistant

Still taking manual meeting notes in 2025? Let AI handle the tedious work so you can focus on the important stuff.

Fellow is the AI meeting assistant that:

✔️ Auto-joins your Zoom, Google Meet, and Teams calls to take notes for you.
✔️ Tracks action items and decisions so nothing falls through the cracks.
✔️ Answers questions about meetings and searches through your transcripts, like ChatGPT.

Try Fellow today and get unlimited AI meeting notes for 30 days.

Large Action Model Framework for AI Web Agents 🌐🧑‍🔧

Computer-using AI agents are quickly becoming the key focus in the AI automation space. People are ready to pay $200 a month for OpenAI Operator, and everyone’s just waiting for Manus AI’s invite code. This is a great time to build truly capable computer-use agents and commercialize them. Even Manus AI, the most sophisticated solution so far, is powered by the open-source framework Browser Use.

So here’s another one: LaVague is an open-source framework for creating AI web agents that automate processes for end users. These agents can take an objective in natural language, such as "Print installation steps for Hugging Face's Diffusers library," and then generate and perform the actions required to achieve it.

Key Highlights:

Two-Component Architecture - LaVague uses a combination of a World Model and an Action Engine. The World Model takes the user objective and current state of a webpage to generate the next steps, and the Action Engine converts these instructions into executable automation code for browsers.
Multi-Environment Compatibility - The framework works with three driver options (Selenium, Playwright, and Chrome Extension), giving you flexibility in how you implement automation. Selenium provides the most comprehensive feature set with headless operation, iframe handling, and multi-tab support.
Model Flexibility - LaVague integrates seamlessly with popular AI providers through built-in contexts for OpenAI, Anthropic, Azure, Fireworks, and Gemini. You can use default configurations or swap in custom models from the LlamaIndex ecosystem to optimize for different performance needs.
Developer Tooling - The framework ships with customizable configurations, token counting for cost estimation, comprehensive logging, and debugging utilities. An optional Gradio interface makes it easier to develop and test agents during implementation.
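
The two-component split described above can be pictured with a toy loop. This is a minimal sketch in plain Python and not LaVague's API: `Page`, `ToyWorldModel`, `ToyActionEngine`, and `run_agent` are hypothetical names, the "page" is a small in-memory object instead of a browser, and the hard-coded rules stand in for what an LLM would decide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for browser state: a URL plus printed output."""
    url: str = "about:blank"
    printed: List[str] = field(default_factory=list)


class ToyWorldModel:
    """Maps (objective, current page state) to the next plain-language step."""

    def next_instruction(self, objective: str, page: Page) -> Optional[str]:
        if page.url == "about:blank":
            return "open docs"
        if not page.printed:
            return "print installation steps"
        return None  # nothing left to do, objective reached


class ToyActionEngine:
    """Turns a plain-language step into an executable action on the page."""

    def execute(self, instruction: str, page: Page) -> None:
        if instruction == "open docs":
            page.url = "https://example.com/docs"  # placeholder URL
        elif instruction == "print installation steps":
            page.printed.append("pip install <package>")  # placeholder text
        else:
            raise ValueError(f"unknown instruction: {instruction}")


def run_agent(objective: str, max_steps: int = 10) -> Page:
    """Alternate between planning and acting until the planner returns None."""
    page = Page()
    world_model, engine = ToyWorldModel(), ToyActionEngine()
    for _ in range(max_steps):
        instruction = world_model.next_instruction(objective, page)
        if instruction is None:
            break
        engine.execute(instruction, page)
    return page


result_page = run_agent("Print installation steps for a library")
print(result_page.url, result_page.printed)
```

Keeping the planner and the executor separate means the driver can change without touching the planning logic, and the planning model can change without touching the driver.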

Quick Bites

Singapore-based HPC-AI Tech has open-sourced Open-Sora 2.0, a high-quality video generation model trained for just $200,000, which is five to ten times cheaper than comparable models. The model narrows the performance gap with OpenAI's Sora to less than 1% on benchmark tests while outperforming alternatives like HunyuanVideo and Runway Gen-3 in human evaluations. Available under Apache 2.0.

LM Studio now supports Speculative Decoding for faster token generation in llama.cpp and MLX engines. This technique uses a smaller "draft model" to predict tokens, which are then verified by the main model, leading to 1.5x-3x speedups. You can enable this feature and select compatible draft models within the LM Studio interface or via the API.

Nous Research has released their latest DeepHermes Preview models, a series of "Hybrid Reasoners" available in 24B, 8B, and 3B parameter sizes that lets you toggle between quick answers and detailed reasoning paths for harder problems. Built on Mistral and Llama 3 architectures, these models boast impressive performance gains—with the 24B version showing a fourfold improvement on complex math problems and a 43% boost on the STEM-based GPQA benchmark. Available through HuggingFace and Nous Research's API, with both standard and quantized GGUF versions.

Anthropic has released a new feature in Claude Code: extended thinking. You can simply ask Claude to “think”, “think more”, or “think harder” and it’ll show its extended thinking process. The feature is powered by Anthropic's hybrid reasoning model, Claude 3.7 Sonnet. First, tell Claude about your task and let it gather context from your project. Then, ask it to “think” to create a plan. How much Claude thinks depends on the words you use.
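
Of the items above, speculative decoding is the one where a sketch helps most. Below is a toy greedy version of the draft-then-verify idea: a cheap draft model proposes a few tokens, and the main model keeps the longest prefix it agrees with and supplies its own token at the first disagreement. Both "models" are made-up arithmetic functions, and none of this is LM Studio's implementation or API. Real engines verify all drafted positions in one batched forward pass and use a probabilistic acceptance rule when sampling.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # maps a token context to the next token


def target_model(ctx: List[int]) -> int:
    """Toy 'main' model: next token is the sum of the last two, mod 10."""
    return (ctx[-1] + ctx[-2]) % 10


def draft_model(ctx: List[int]) -> int:
    """Toy 'draft' model: same rule, but deliberately wrong after a 5."""
    if ctx[-1] == 5:
        return 0
    return (ctx[-1] + ctx[-2]) % 10


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: call the model once per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: same output as decoding with `target`."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. Draft k tokens with the cheap model.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. Verify them in order against the main model.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # main model's token replaces the miss
                break
        else:
            # Every draft matched, so the main model adds one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal]


baseline = greedy_decode(target_model, [1, 1], 12)
speculative = speculative_decode(target_model, draft_model, [1, 1], 12)
print(speculative == baseline)
```

The result matches plain decoding with the main model because every kept token is one the main model would have produced itself. The speedup in practice comes from checking several drafted tokens per main-model pass, which this sequential toy does not attempt to measure.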

Tools of the Trade

rtrvr.ai: AI web agent that understands your instructions in natural language and can carry out complex tasks across different websites. It can help you navigate, gather information, and perform actions automatically. Available via Chrome extension.
GoCodeo: AI coding agent in VS Code for building full-stack apps in minutes, with one-click Vercel deployment and seamless Supabase integration. It gives context-aware coding assistance and even automates unit testing. Works with multiple LLMs, across multiple programming languages and frameworks.
Augment Code: A developer AI platform designed for professional software engineers working with large, complex codebases. Offers features like codebase understanding, contextual chat, guided edits, and intelligent code completions. Integrates with popular IDEs including VSCode, JetBrains, Vim, and Neovim.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

Sam: "Totally hopeless to compete with us."
Sam: "I wish he would just compete by building a better product."
also Sam: We need to ban my competitors!
~ Santiago
Grok should automatically generate a high quality transcript for videos uploaded to X (X is a text-first platform).
YouTube has an illegible transcript layout and current Grok can’t even look inside a video uploaded to X.
~ Naval Ravikant

That’s all for today! See you tomorrow with more such AI-filled content.

PS: We curate this AI newsletter every day for FREE. Your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
