
Agent Protocol to Deploy AI Agents in Production

PLUS: Train AI models 50% faster, Control your computer with your voice


Today’s top AI Highlights:

  1. All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  2. LangChain’s Agent Protocol for deploying AI agents in production environments

  3. Train AI models 50% faster using PyTorch's latest Float8 implementation

  4. Control your computer with your voice - No keyboard or mouse needed

  5. One stack is all you need to create robust AI agents

& so much more!

Read time: 3 mins

AI Tutorials

Choosing the right model for each AI task is often challenging. Some tasks need the power of GPT-4o, while lighter models, like the smaller Llama variants, handle others just as well. This is where RouteLLM helps: it automatically routes each query to a model based on its complexity.

In this tutorial, we’ll show how to build a Multi-LLM Routing Chat App using RouteLLM in just 30 lines of Python code. This Streamlit app dynamically selects the best model for every query, ensuring efficiency and performance.
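To give you a feel for the pattern before the full tutorial, here's a minimal RouteLLM sketch based on the project's README. The router name, model choices, and the 0.11593 threshold are illustrative values from RouteLLM's docs, not the settings from our app:

```python
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-..."  # both models here use OpenAI; any litellm-style model works

# "mf" (matrix factorization) is one of RouteLLM's pre-trained routers.
# The strong/weak model names are illustrative and fully configurable.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="gpt-4o-mini",
)

# The threshold embedded in the model string controls how often the strong
# model is chosen; 0.11593 is a calibrated example value from the docs.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```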

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

Vector databases, LLMs, and data pipelines shouldn't require three different systems to set up and maintain. NeuML's txtai packs everything you need into one Python framework. It combines vector databases, graph networks, and relational databases into a system that works right out of the box - whether you're testing locally or deploying to production.

You can quickly build anything from basic similarity search to complex RAG pipelines and autonomous agents. And with pre-built components for common tasks like transcription, translation, and summarization, you can focus on building features instead of wrestling with infrastructure.
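Here's a minimal sketch of that out-of-the-box experience using txtai's Embeddings API with its default model (the documents are illustrative):

```python
from txtai import Embeddings

# Default configuration downloads a small sentence-transformers model
embeddings = Embeddings()

# Index a couple of example documents; txtai handles vectorization and storage
embeddings.index([
    "US tops 5 million confirmed virus cases",
    "Canada's last fully intact ice shelf has suddenly collapsed",
])

# Semantic search returns (id, score) pairs for the closest matches
print(embeddings.search("climate change", 1))
```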

Key Highlights:

  1. Fast prototyping with minimal code - Index and search data in just 3 lines using the built-in API. No need to manually set up separate vector stores, databases or orchestration layers. The system handles data storage, vector computation and query processing out of the box.

  2. Production-ready deployment options - Run locally during development, then seamlessly deploy to containers, serverless functions or Kubernetes clusters. Built-in support for model caching, cloud storage sync, and API bindings for Python, JavaScript, Java, Rust and Go.

  3. Flexible architecture for custom workflows - Mix and match embeddings, LLMs, and specialized models based on your needs. The workflow system lets you chain components together while handling batching and scaling automatically. Easily integrate with external tools via the agent framework.

  4. Practical RAG implementation with citations - Build production RAG applications with built-in support for source tracking and citations. The system handles vector search, prompt construction, and response generation while maintaining references to original documents. Integration with popular LLM frameworks like llama.cpp and hosted API services.

  5. Quick start - Install with pip install txtai; the default models handle common use cases out of the box. Over 60 example notebooks cover everything from basic search to complex workflows.

An entirely new way to present ideas

Gamma’s AI creates beautiful presentations, websites, and more. No design or coding skills required. Try it free today.

LangChain has released Agent Protocol, a framework-agnostic API specification for deploying LLM agents in production environments. The protocol standardizes how agents handle execution runs, manage conversational threads, and interact with persistent storage - three core components that developers frequently rebuild from scratch.

While LangGraph Platform implements this protocol natively, LangChain has open-sourced the specification to encourage broader adoption and community implementations. The protocol addresses common production challenges like concurrency control, state management, and streaming outputs, providing developers with ready-to-use patterns instead of building custom solutions.

Key Highlights:

  1. Flexible Execution Control - The protocol offers both synchronous and asynchronous execution modes for agent runs, letting developers choose the best approach for their use case. It includes built-in handling for common production scenarios like retry mechanisms, connection drops, and traffic spikes.

  2. Smart State Management - You can manage conversation state and history through a thread system that handles concurrent access and automatic versioning. The system maintains an append-only log of states and supports operations like copying threads or rolling back to previous states.

  3. Production-Ready Memory System - The memory system provides flexible storage options with support for different scoping levels - from user-level to thread-level memory. Developers can store and retrieve both simple text and structured data, with built-in search functionality that makes it easy to implement features like context retrieval.

  4. Developer-Friendly Implementation - The API follows RESTful principles with clear documentation and predictable patterns, making it straightforward to implement in existing systems. It provides streaming capabilities for real-time updates and includes practical features like connection recovery and run cancellation, addressing common challenges in production deployments.
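As a rough illustration of the thread/run split, here's a hypothetical client sketch against an Agent Protocol server. The base URL, assistant id, and exact endpoint paths are assumptions on our part, so check the published OpenAPI spec before relying on them:

```python
import requests

BASE = "http://localhost:8123"  # assumed local Agent Protocol server

# Threads hold conversational state across runs
thread = requests.post(f"{BASE}/threads", json={}).json()

# Kick off a run on the thread and block until it finishes
# (the spec also describes streaming and background run variants)
run = requests.post(
    f"{BASE}/threads/{thread['thread_id']}/runs/wait",
    json={
        "assistant_id": "my-agent",  # hypothetical agent registered on the server
        "input": {"messages": [{"role": "user", "content": "Summarize my last order."}]},
    },
).json()
print(run)
```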

Quick Bites

PyTorch users can now achieve up to 50% faster training throughput using float8 precision with FSDP2, DTensor, and torch.compile. These improvements, demonstrated on Meta Llama models ranging from 1.8B to 405B, show significant speed gains without sacrificing loss convergence or model quality. You can readily implement these updates using the latest PyTorch nightlies and torchao.
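As a single-GPU sketch (the FSDP2 wiring is omitted), the torchao conversion looks roughly like this. Layer sizes are illustrative, and float8 matmuls need H100-class hardware with dimensions divisible by 16:

```python
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

# Toy model; in practice this would be a transformer wrapped with FSDP2
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).to("cuda", torch.bfloat16)

# Swap nn.Linear layers for float8 training variants (dynamic scaling by default)
convert_to_float8_training(model)

# torch.compile fuses the float8 casting/scaling into the matmul kernels
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters())
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
model(x).sum().backward()
optimizer.step()
```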

AI2 just dropped the OLMo 2 family of 7B and 13B language models, trained on a whopping 5T tokens. Notably, OLMo 2 7B outperforms Llama 3.1 8B, and OLMo 2 13B outperforms Qwen 2.5 7B despite lower total training FLOPs. OLMo 2 ships not just the model weights, but also the full training data, code, and recipes. Check out the playground and start experimenting with this new, fully open model.
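If you'd rather test locally than in the playground, the weights are on the Hugging Face Hub. A minimal transformers sketch (the model id is assumed from AI2's release naming, so verify it on the Hub):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-1124-7B"  # assumed Hub id for the 7B base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Fully open language models matter because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```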

Fireworks AI has launched f1, a compound AI model for complex reasoning, simplifying development by allowing declarative prompting instead of intricate system building. Available now in preview on the Fireworks AI Playground, f1 and its faster variant, f1-mini, aim to match or exceed the reasoning capabilities of many frontier models.
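Fireworks serves its models through an OpenAI-compatible endpoint, so trying f1 should look roughly like this. The model id below is our guess for the preview; confirm it in the Playground:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/f1-preview",  # assumed preview model id
    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}],
)
print(resp.choices[0].message.content)
```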

This voice AI will use your computer and make the mouse and keyboard outdated! Hume AI's EVI 2 model now works with Anthropic's Computer Use API to make it happen. EVI processes speech, sends instructions to the agentic computer-control loop, explains its actions with voice, and can even be interrupted to change course, all in real time through an API. Perfect for crafting voice-first apps like smarter assistants or customer support tools!
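A simplified sketch of the loop described above: a voice transcript (EVI's output, stubbed here as a hypothetical function) is handed to Anthropic's Computer Use beta, which plans the mouse/keyboard actions. Wiring up EVI's real-time stream is omitted, so treat this as an assumed integration shape rather than Hume's actual implementation:

```python
import anthropic

def transcript_from_evi() -> str:
    # Placeholder for Hume EVI's speech-to-instruction output
    return "Open the browser and search for today's weather."

client = anthropic.Anthropic()
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": transcript_from_evi()}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)  # tool_use blocks describe the next GUI action to execute
```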

Tools of the Trade

  1. AgentStack: A command-line tool for quickly creating AI agent projects in Python. It provides preconfigured frameworks and tools like CrewAI, AutoGen, Mem0, MultiOn, E2B, etc., and a simple project structure. It is not a low-code alternative to development.

  2. Phantasm: Toolkit to build human-in-the-loop approval layers for AI agents, letting humans monitor and guide AI workflows in real-time. It includes a server, dashboard, and client library to integrate human approval into AI agent actions before execution.

  3. AgentServe: A lightweight framework for hosting and scaling AI agents. It is easy to use and integrates with existing projects and agent / LLM frameworks. It wraps your agent in a REST API and supports optional task queuing for scalability.

  4. Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

  1. everyone loves to quibble about the definition of AGI, but it's really quite simple
    AGI is defined as the capability at which OpenAI chooses to terminate their agreement with Microsoft ~
    James Campbell


  2. The irony of AI is we expected it to be awesome at math & be all cool logic (“your idea of love does not compute!”)
    Instead, AI is bad at math, just wants to write poems, & is all hot, weird simulated emotion. For example, if you make GPT-3.5 “anxious,” it changes its behavior! ~
    Ethan Mollick

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
