Amazon Nova Web AI Agent or Adept 2.0

PLUS: OpenAI's open-weight reasoning model, an agentic terminal beyond Claude Code

Today’s top AI Highlights:

  1. Amazon releases general-purpose web AI agent with SDK

  2. The agentic terminal that does the work while you grab coffee

  3. OpenHands 32B - Open coding agent model outperforming DeepSeek R1

  4. OpenAI plans to actually go open with their first open reasoning model

  5. Train a robot in 2 mins and scrape the web on autopilot with no-code

& so much more!

Read time: 3 mins

AI Tutorials

We've been stuck in text-based AI interfaces for too long. Sure, they work, but they're not the most natural way humans communicate. Now, with OpenAI's new Agents SDK and their recent text-to-speech models, we can build voice applications without drowning in complexity or code.

In this tutorial, we'll build a Multi-agent Voice RAG system that speaks its answers aloud. We'll create a multi-agent workflow where specialized AI agents handle different parts of the process - one agent focuses on processing documentation content, another optimizes responses for natural speech, and finally OpenAI's text-to-speech model delivers the answer in a human-like voice.

Our RAG app uses OpenAI Agents SDK to create and orchestrate these agents that handle different stages of the workflow. OpenAI’s new speech model GPT-4o-mini TTS enhances the overall user experience with a natural, emotion-rich voice. You can easily steer its voice characteristics like the tone, pacing, emotion, and personality traits with simple natural language instructions.
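
To make the workflow concrete, here's a minimal sketch of the pattern using the Agents SDK and gpt-4o-mini-tts. The agent names, instructions, and the inline `docs_text` placeholder are illustrative assumptions, not the exact code from the tutorial (which retrieves real documentation instead):

```python
from agents import Agent, Runner  # pip install openai-agents
from openai import OpenAI         # pip install openai

# Placeholder knowledge base; the tutorial retrieves real documentation instead.
docs_text = "Nova Act is Amazon's SDK for building web-browsing agents..."

# Agent 1: answers questions grounded in the documentation text.
docs_agent = Agent(
    name="Documentation Agent",
    instructions=f"Answer questions using only this documentation:\n{docs_text}",
)

# Agent 2: rewrites the answer so it sounds natural when spoken aloud.
voice_agent = Agent(
    name="Voice Optimizer",
    instructions="Rewrite the given answer as short, conversational spoken English.",
)

question = "What does Nova Act do?"
draft = Runner.run_sync(docs_agent, question).final_output
spoken = Runner.run_sync(voice_agent, draft).final_output

# Turn the optimized answer into speech with gpt-4o-mini-tts.
client = OpenAI()
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input=spoken,
    instructions="Speak in a warm, friendly tone at a relaxed pace.",
) as response:
    response.stream_to_file("answer.mp3")
```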

We share hands-on tutorials like this every week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Amazon has released Nova Act, a new AI model for web-browsing agents that handles tasks from filling forms to automating complex workflows with impressive accuracy. This puts Amazon in direct competition with OpenAI's Operator and Anthropic's Computer Use, with all three tech giants betting that AI agents navigating the web will make today's chatbots significantly more useful.

The company is releasing Nova Act as a Research Preview SDK, with tools to break complex workflows down into reliable commands and chain them together. The SDK provides building blocks that can be composed into complex automations that run without constant supervision.

Key Highlights:

  1. Action-Oriented SDK - The Nova Act SDK lets you break complex tasks down into smaller, reliable atomic actions like "search", "checkout", or "answer questions about the screen" and chain them into larger workflows. You can also add more detailed instructions (e.g., "don't accept the insurance upsell"), call APIs, and even interleave direct browser manipulation through Playwright to further strengthen reliability (e.g., for entering passwords).

  2. Python-first - The SDK is built to work seamlessly with your existing Python tools and libraries. This lets you leverage existing code, add custom logic, set breakpoints for debugging, and parallelize browser tasks with thread pools.

  3. Headless Mode - Once you've got your agent working, you can switch to headless mode and schedule it to run in the background. Turn your agent into an API, run it on a schedule, or integrate it directly into your product. Setting up a cron job to run it is as simple as flipping a switch.

  4. Structured Data Extraction - Nova Act isn't just about clicking buttons; it can also extract structured data from web pages. You can define a Pydantic class and the agent will return JSON matching that schema, which you can then use directly in your applications (see the sketch below).
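
Here's roughly what that looks like with the Research Preview SDK. This is a hedged sketch based on Amazon's announcement; treat the exact parameter and attribute names (starting_page, headless, schema, matches_schema, parsed_response) as assumptions and check the SDK docs before relying on them:

```python
from pydantic import BaseModel
from nova_act import NovaAct  # pip install nova-act (Research Preview)

class Book(BaseModel):
    title: str
    author: str

# headless=True lets the same script run unattended, e.g. from a cron job.
with NovaAct(starting_page="https://www.amazon.com", headless=True) as nova:
    # Small, atomic instructions chained into a workflow.
    nova.act("search for 'python programming books'")
    nova.act("open the first result")

    # Structured extraction: ask for JSON that matches a Pydantic schema.
    result = nova.act(
        "Return the title and author of the book on this page",
        schema=Book.model_json_schema(),
    )
    if result.matches_schema:
        book = Book.model_validate(result.parsed_response)
        print(book.title, "by", book.author)
```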

Claude Code has a serious competitor. This weekend, we used Warp.dev to download a YouTube video, clip it, and caption it with Whispr AI without running a single command. This is how development should feel in 2025!

Warp.dev is a reimagined terminal with a general-purpose AI agent that understands natural language and executes complex tasks autonomously. You simply type what you want to accomplish; Warp drafts a plan you can review, then handles everything from code generation to environment setup.

Whether that's implementing a feature, grokking a large codebase, or automating a routine task, you describe it and Warp does the work.

  1. Command with AI - Use natural language to tell Warp what you need. No cryptic commands. Ask it to install dependencies, debug your code, or even deploy a server, all using plain English.

  2. Plan, Don't Just Execute - Review and approve AI-generated execution plans before they run. See exactly what Warp will do before it does it, ensuring you're always in control and preventing unexpected actions.

  3. Automate Complex Workflows - Delegate repetitive tasks to Warp's Agent Mode. Create and store reusable workflows and let Warp handle everything from linting and testing to git commits and pull requests.

  4. Fix Errors Instantly - Let Warp analyze error messages and suggest solutions. Active AI proactively recommends fixes and next actions based on your terminal errors, inputs, and outputs.

All Hands AI, the team behind the open-source alternative to Devin, just dropped two major releases. They've released OpenHands LM, an open-weight coding model built on the foundation of Qwen Coder 2.5 Instruct 32B, for local execution with competitive performance on real-world coding tasks. This 32B-parameter model achieves an impressive 37.4% success rate on SWE-bench Verified, rivaling DeepSeek R1 and the latest DeepSeek V3, models roughly 20x its size.

Alongside this model, All Hands AI has launched OpenHands Cloud, a hosted version of their platform accessible via web browser or GitHub integration, offering $50 in free credits for new users.

Key Highlights:

  1. Local Deployment - OpenHands LM features a 128K token context window ideal for large codebases while remaining small enough to run on consumer hardware, making it accessible for individual developers.

  2. Competitive Performance - The model resolves 37.4% of real-world issues on SWE-bench Verified, approaching the capabilities of DeepSeek's 671B parameter model (38.8%) despite being 20x smaller.

  3. Specialized Training - Built on Alibaba's Qwen Coder 2.5 Instruct, OpenHands LM was fine-tuned using the SWE-Gym environment with successful agent trajectories generated from OpenHands itself working on diverse open-source repositories.

  4. Access Options - You can download the model directly from Hugging Face, create an OpenAI-compatible endpoint with SGLang or vLLM (see the sketch below), or use the new cloud platform, which lets you run multiple agents in parallel and solve GitHub issues by simply mentioning "openhands" in comments.
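
For the local route, a minimal sketch of serving the model behind an OpenAI-compatible endpoint with vLLM and querying it could look like this; the Hugging Face repo id is an assumption, so check the model card for the exact name:

```python
# Assumed setup (run in a shell first); the repo id below is an assumption,
# verify it against the Hugging Face model card:
#   pip install vllm
#   vllm serve all-hands/openhands-lm-32b-v0.1 --port 8000

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API, so the standard client works as-is.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="all-hands/openhands-lm-32b-v0.1",  # assumed repo id
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(response.choices[0].message.content)
```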

Quick Bites

OpenAI is preparing to release its first open-weight model since GPT-2, with a focus on reasoning capabilities. Sam Altman announced they're organizing developer events globally to gather feedback and demonstrate early prototypes, inviting developers to sign up for sessions starting in San Francisco before expanding to Europe and Asia-Pacific regions.

Runway just dropped Gen-4, their new video generation model that creates videos with consistent characters and environments across different scenes. It lets you generate high-fidelity videos from reference images and text prompts, maintaining subject consistency while allowing different angles and environments, all without requiring fine-tuning or additional training. Gen-4 Image-to-Video is now available to all paid and Enterprise customers.

China's Manus AI is rolling out upgrades, including a new iOS app, extended context length, and improved multimodal capabilities. The platform is now fully powered by Anthropic's Claude 3.7 for all tasks with no fallback to 3.5, alongside a more stable sandbox environment. Manus has also introduced two premium subscription tiers—a $39/month Starter plan with 3,900 credits and 2 concurrent tasks, and a $199/month Pro plan offering 19,900 credits and up to 5 simultaneous tasks.

Tools of the Trade

  1. Gemini Code: AI coding agent that brings Gemini 2.5 Pro directly to your terminal. Inspired by Anthropic's Claude Code, it can write, debug, and refactor code, execute terminal commands, manage files and directories, run tests, and perform code quality checks—all without leaving your CLI.

  2. Nodezator: A multi-purpose node editor for Python. It takes your functions (or other callables) and turns them into visual Python nodes, allowing you to create and execute complex node layouts and even export them back as Python code.

  3. Maxun: No-code, open-source platform to extract data from any website with a simple point-and-click interface. You can create no-code robots in 2 minutes to automate data extraction, run robots on a specific schedule, handle infinite scrolling and pagination, and more.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. Prediction (a bit crazy, but I wouldn’t be shocked if it happens): Big AI will soon start lobbying universities to transition Computer Science curriculums from “programming” to “prompting”. ~ Santiago

  2. Modern AI coding tools write poor code. They don't recognize when to split a function or divide a file. They keep adding code until it's unmaintainable. This makes sense: the system gets a user prompt and codebase context, follows the prompt, and considers the task complete. It has no incentive to refactor code into modular, readable pieces. Worse, as code quality declines, refactoring becomes costlier and less aligned with the user's query, giving the AI even less reason to fix it. This is one of dozens of small anti-patterns modern language models exhibit when programming that companies will inevitably have to deal with in the coming months and years. ~ Yosi Frost

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
