Agents Are Not Enough
PLUS: Multi-agent systems in a few lines of code, open-source Computer Use model
Today’s top AI Highlights:
AI agents are not enough to make Agentic AI a reality
Build multi-agent systems in a few lines of code with LlamaIndex’s new framework
ByteDance made Computer Use free for everyone (100% open-sourced)
Browser infrastructure for your AI Apps and Agents
& so much more!
Read time: 3 mins
AI Tutorials
Game development demands juggling a daunting array of specialized skills - a compelling narrative and storylines, intricate mechanics, visual aesthetics, technical architecture, and more. Keeping these in sync is a constant struggle: scope creep, misaligned creative visions, and technical bottlenecks.
In this tutorial, we'll build an AI Game Design Agent Team that coordinates multiple specialized AI agents - each focusing on their domain expertise - to generate cohesive game concepts where narrative, gameplay, visuals, and technical specifications work in harmony.
The entire process is automated so developers can quickly iterate on ideas and ensure all crucial aspects of game design are considered.
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
The buzz around AI agents is growing, but the real challenges run deeper than technical improvements. New research argues that simply creating more capable autonomous programs isn't enough for widespread adoption.
University of Washington and Microsoft researchers offer a new perspective on agentic systems that focuses on the user experience, not just task completion. This new ecosystem comprises Agents, Sims, and Assistants. 'Sims' would represent each user, carrying an awareness of the user’s preferences, while 'Assistants' manage user interactions and coordinate complex tasks using a network of specialized agents.
Key Highlights:
Rethinking Agent Architecture - The paper critiques current agent designs, from rule-based systems to multi-agent frameworks, for their lack of generalization, poor scalability, and limited real-world adaptability. The takeaway: think beyond pure LLM outputs and integrate more structured, explainable elements for more reliable agent behavior.
Introducing Sims for Personalized AI - A "Sim" captures user-specific profiles, preferences, and contexts, bridging the gap between raw task execution and user needs. Think of a Sim as a detailed, dynamically updated user persona that can inform agent behavior across tasks. Instead of giving agents direct access to user data, implement this Sims layer as privacy-preserving middleware for agent-user interactions.
Assistants as Orchestrators - Assistants are personalized agents that directly interact with users. These Assistants, with a deep understanding of user needs, handle the user interactions, coordinating Sims and Agents for task execution. This layer offers a new framework for us to explore how to create systems that can effectively balance autonomy and user control.
Ecosystem-First Approach - The core message is that we need to move away from individual "super-agents" to a model built on an agent ecosystem. We need to start thinking about separating our agent architecture into three layers: task-specific agents, user representatives, and orchestrators. This separation will help us focus on each component’s strengths, and make our system more maintainable and scalable.
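As a concrete sketch, the three-layer separation might look like the following. All class and method names here are hypothetical illustrations of the paper's concepts - the paper describes the layers conceptually and does not define an API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Task-specific worker: does one narrow thing well."""
    skill: str

    def run(self, task: str) -> str:
        return f"[{self.skill}] completed: {task}"

@dataclass
class Sim:
    """User representative: exposes preferences, never raw user data."""
    preferences: dict = field(default_factory=dict)

    def preference(self, key: str, default=None):
        # Agents query the Sim instead of touching user data directly.
        return self.preferences.get(key, default)

@dataclass
class Assistant:
    """Orchestrator: talks to the user, routes work to agents via the Sim."""
    sim: Sim
    agents: dict = field(default_factory=dict)

    def handle(self, skill: str, task: str) -> str:
        agent = self.agents[skill]
        style = self.sim.preference("style", "concise")
        return f"({style}) " + agent.run(task)

sim = Sim(preferences={"style": "detailed"})
assistant = Assistant(sim=sim, agents={"travel": Agent(skill="travel")})
print(assistant.handle("travel", "book a flight to Seattle"))
```

Because each layer has one job, you can swap out a task agent or update the user's Sim without touching the orchestration logic - which is the maintainability argument the paper is making.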
LlamaIndex has released AgentWorkflows, a high-level system built on top of their Workflows orchestration layer that lets you create multi-agent AI systems with just a few lines of code. The system handles collaboration and handoffs between agents based on their specialized capabilities while maintaining state and context across interactions.
AgentWorkflows introduces built-in abstractions for human-in-the-loop capabilities, state management, and real-time monitoring - features that would otherwise require significant custom development work. You can now focus on defining agent behaviors and tools rather than building orchestration infrastructure.
Key Highlights:
Multi-Agent Orchestration - AgentWorkflows significantly reduce the boilerplate code needed to set up multi-agent systems. Instead of managing complex logic, you can focus on defining specialized agent roles and their tools, using a few lines of code.
Agent Types and Customization - You are not locked into a single agent pattern, as AgentWorkflows support FunctionAgent (ideal for LLMs with function calls) and ReActAgent (suitable for any LLM). You can easily plug in custom tools and even define custom agent types to match your specific needs.
Shared Context and State Management - AgentWorkflows provide a global Context object accessible to all agents and tools. This shared context ensures state is maintained across interactions. Crucially, you can easily access and modify state within tools, and serialize the whole context for persistent runs – allowing you to save and restore workflows mid-execution for more robust and stateful applications.
Granular Control Flow - You can define agent handoff rules, implement approval workflows with human-in-the-loop capabilities, and customize the interaction between agents - all while keeping the underlying complexity of multi-agent orchestration abstracted away.
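To make the handoff-plus-shared-state idea concrete, here is a dependency-free Python sketch of the pattern AgentWorkflows automates. The names (SimpleAgent, Workflow, Context) are illustrative stand-ins, not LlamaIndex's actual classes - consult the LlamaIndex docs for the real API:

```python
class Context(dict):
    """Shared state visible to every agent, like AgentWorkflows' global Context."""

class SimpleAgent:
    def __init__(self, name, handle, handoff_to=None):
        self.name = name
        self.handle = handle          # callable(ctx, msg) -> reply
        self.handoff_to = handoff_to  # name of the next agent, or None

class Workflow:
    def __init__(self, agents, root):
        self.agents = {a.name: a for a in agents}
        self.root = root

    def run(self, msg):
        ctx = Context()
        current, replies = self.root, []
        while current is not None:
            agent = self.agents[current]
            replies.append(agent.handle(ctx, msg))
            current = agent.handoff_to  # hand off per the wiring
        return ctx, replies

# A research agent writes notes into the shared context; a writer reads them.
research = SimpleAgent(
    "research",
    lambda ctx, msg: ctx.setdefault("notes", f"notes on: {msg}"),
    handoff_to="write",
)
write = SimpleAgent(
    "write",
    lambda ctx, msg: f"draft using {ctx['notes']}",
)
ctx, replies = Workflow([research, write], root="research").run("AI agents")
print(replies[-1])
```

Since the Context here is just a dict, serializing it for persistent runs (as AgentWorkflows supports) amounts to dumping and reloading that state between executions.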
Quick Bites
Liquid AI has released LFM-7B, a new 7-billion-parameter model based on its liquid foundation model architecture. LFM-7B is optimized for private enterprise chat, code, fast instruction following, and agentic workflows. Alongside impressive benchmark performance in its size class, the model has a low memory footprint and fast inference speed. It is available through multiple platforms, including AWS Marketplace, Lambda API, and OpenRouter.
ByteDance has open-sourced UI-TARS (User Interface — Task Automation and Reasoning System), a native GUI agent model that controls computers through GUI interactions - a concept similar to Claude's Computer Use API, but freely accessible. The model combines perception, reasoning, grounding, and memory into a unified vision-language model (VLM) to automate tasks across desktop, mobile, and web platforms.
The model follows a workflow of: analyzing screenshots through computer vision, planning a sequence of actions based on the task goal, and executing these actions, while continuously monitoring the screen for changes and adjusting its approach if needed.
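That loop can be sketched in a few lines of Python. Here, take_screenshot, plan_actions, and execute are hypothetical stand-ins for UI-TARS's internal VLM-driven steps, not its real interface:

```python
def run_agent(goal, take_screenshot, plan_actions, execute, max_steps=10):
    """Perceive-plan-act loop: screenshot, plan actions, execute, repeat."""
    for _ in range(max_steps):
        screen = take_screenshot()            # perceive current UI state
        actions = plan_actions(goal, screen)  # plan next actions toward the goal
        if not actions:                       # nothing left to do: goal reached
            return True
        for action in actions:
            execute(action)                   # e.g. click / type / scroll
        # looping back re-screenshots, so the agent adapts if the UI changed
    return False                              # gave up after max_steps
```

The key property is that planning always starts from a fresh screenshot, which is what lets the agent recover when an action has an unexpected effect on the screen.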
UI-TARS achieves state-of-the-art performance on multiple GUI agent benchmarks, outperforming models like GPT-4o and Claude.
The model (in 3 sizes) and its code have been open-sourced under Apache License 2.0, making it freely available to use and build upon.
Tools of the Trade
Hyperbrowser: A cloud platform to launch and manage headless browsers at scale, with built-in features for bypassing anti-bot detection, solving CAPTCHAs, and managing browser sessions. It can spin up hundreds of concurrent browser sessions with sub-second startup times to help AI agents reliably interact with websites.
NodeFlow AI: A visual workspace to create custom analysis workflows by connecting content from various platforms (YouTube, Instagram, TikTok, PDFs) to AI models like ChatGPT and Claude through a node-based interface.
Cuse: Open-source framework for AI agents to interact with computers through features like display control, file operations, and shell access, handling authentication through a secure keychain service so agents can log into services.
Dropstone: AI tool to automate common software development tasks like debugging, code optimization, and project analysis. The tool integrates with GitHub and streamlines development workflows through automated code scanning, bug detection, and performance optimization features.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
Re India training its foundation models debate: I feel like India fell into the same trap I did while running Perplexity. Thinking models are going to cost a shit ton of money to train. But India must show the world that it's capable of ISRO-like feats for AI. Elon Musk appreciated ISRO (not even Blue Origin) because he respects when people can get stuff done by not spending a lot. That's how he operates. I think that's possible for AI, given the recent achievements of DeepSeek. So, I hope India changes its stance from wanting to reuse models from open-source and instead tries to build the muscle to train its own models that are not just good for Indic languages but are globally competitive on all benchmarks. ~ Aravind Srinivas
The deeper I get into my project, the more I realize how far we are from having an AI that replaces developers. ~ Santiago
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉