unwind ai

Opensource AI-agent-as-a-Service

PLUS: GitHub Copilot free in VS Code, AI agent automations with no-code

Shubham Saboo & Gargi Gupta
December 19, 2024

Today’s top AI Highlights:

Build AI agents as services with this open-source framework
LLMs move beyond tokens to a new way of understanding text
You can now call ChatGPT and chat with it on WhatsApp
GitHub Copilot is now FREE in VS Code
Open-source version of bolt.new - choose the LLM you want to build full-stack apps with

& so much more!

Read time: 3 mins

AI Tutorials

Building powerful RAG applications has often meant trading off between model performance, cost, and speed. Today, we're changing that by using Cohere's newly released Command R7B model - their most efficient model that delivers top-tier performance in RAG, tool use, and agentic behavior while keeping API costs low and response times fast.

In this tutorial, we'll build a production-ready RAG agent that combines Command R7B's capabilities with Qdrant for vector storage, Langchain for RAG pipeline management, and LangGraph for orchestration. You'll create a system that not only answers questions from your documents but intelligently falls back to web search when needed.
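
To make the "fall back to web search" idea concrete, here is a minimal sketch of the routing decision only. It is not the tutorial's code: the bag-of-words `embed`, the `web_search` stub, and the `min_score` threshold are all stand-ins invented for illustration. A real build would use an embedding model with Qdrant for retrieval and a real search tool.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; stands in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list) -> tuple:
    """Return (score, document) for the best-matching document."""
    q = embed(query)
    return max(((cosine(q, embed(d)), d) for d in docs), default=(0.0, ""))


def web_search(query: str) -> str:
    """Hypothetical stub; a real agent would call a search tool here."""
    return f"[web result for: {query}]"


def answer_context(query: str, docs: list, min_score: float = 0.3) -> tuple:
    """Use the documents when retrieval is confident, else fall back to the web."""
    score, doc = retrieve(query, docs)
    if score >= min_score:
        return "documents", doc
    return "web", web_search(query)


docs = [
    "Qdrant stores vectors for similarity search",
    "LangGraph orchestrates agent steps",
]
print(answer_context("how does qdrant store vectors", docs)[0])
print(answer_context("weather in paris today", docs)[0])
```

The threshold is the main design choice: set it too low and the agent answers from irrelevant documents, too high and it goes to the web for questions your documents already cover.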

Command R7B brings an impressive 128k context window and leads the HuggingFace Open LLM Leaderboard in its size class. What makes it particularly exciting for our RAG application is its native in-line citation capabilities and strong performance on enterprise RAG use-cases, all with just 7B parameters.

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Build a RAG Agent with Cohere ⌘R

Fully functional RAG Agentic system using Command R7B (step-by-step instructions)

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

AI Agent Server for the Enterprise 🏢

Eidolon AI is an open-source framework to build and deploy enterprise-grade AI agents as seamless services. The platform removes deployment complexity by treating agents as infrastructure rather than applications, allowing deployment right into your organization's Kubernetes pipeline. Built with a YAML-based approach, Eidolon helps quickly configure agents and handles agent-to-agent communication out of the box – no need to build custom networking layers or message formats.

Key Highlights:

Agent Development & Deployment - Eidolon cuts development time with pre-built agent templates and declarative YAML configuration. You can start quickly and define custom agents using existing frameworks or plain code. Direct Kubernetes deployment ensures that the agents scale efficiently and comply with enterprise security policies.
Modular and Pluggable Components - Eidolon lets you easily swap components such as LLMs (OpenAI, Anthropic, Mistral, etc.) and memory backends. You can configure every part of an agent's processing unit (APU) with reusable references, allowing for both experimental iterations and solid system configuration.
Inter-Agent Communication - Eidolon includes a built-in mechanism for agents to communicate with one another. You can define agent-to-agent communication using simple YAML configurations (agent_refs) to create more sophisticated multi-agent systems. It automatically generates tool functions from agent definitions.
Enterprise-ready Features - It includes policy enforcement capabilities to control resource access and security boundaries between agents. Features like containerization and human-in-the-loop options make it production-ready for enterprise environments. Comprehensive logging and monitoring track every agent action and decision. The platform provides clear debugging tools and audit trails.
Consumption Methods - Agents can be consumed through a range of methods. A REST API, React components, and a CLI offer flexibility for building UIs, connecting agents directly with applications, or just experimenting. You can start with a simple CLI interaction for experimentation, create a UI with React components, or just use HTTP requests.
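
The idea of generating tool functions from agent definitions can be pictured with a small sketch. This is a conceptual illustration only, not Eidolon's API or schema: apart from the `agent_refs` name mentioned above, every field (`description`, `handler`), the `AGENTS` registry, and the `call_<name>` naming are assumptions made up for this example.

```python
from typing import Callable, Dict

# Hypothetical in-memory agent definitions. Only `agent_refs` comes from the
# text above; the other fields are illustrative, not Eidolon's schema.
AGENTS = {
    "researcher": {
        "description": "Finds facts about a topic",
        "handler": lambda message: f"facts about {message}",
        "agent_refs": [],
    },
    "writer": {
        "description": "Drafts text, delegating research when needed",
        "handler": lambda message: f"draft on {message}",
        "agent_refs": ["researcher"],
    },
}


def build_tools(agent_name: str, agents: dict) -> Dict[str, Callable[[str], str]]:
    """Create one callable tool per agent listed in `agent_refs`."""
    tools: Dict[str, Callable[[str], str]] = {}
    for ref in agents[agent_name]["agent_refs"]:
        if ref not in agents:
            raise KeyError(f"unknown agent_ref: {ref}")
        target = agents[ref]

        # Bind the handler as a default argument so each tool keeps its own target.
        def tool(message: str, _handler=target["handler"]) -> str:
            return _handler(message)

        tool.__name__ = f"call_{ref}"
        tool.__doc__ = target["description"]
        tools[tool.__name__] = tool
    return tools


writer_tools = build_tools("writer", AGENTS)
print(list(writer_tools))
print(writer_tools["call_researcher"]("BLT"))
```

In a real deployment the handler would be a network call to the other agent's service rather than a local function, which is the part the framework is described as handling for you.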

The Era of Fixed Tokens May Be Over 📝

Meta has released BLT (Byte Latent Transformer) - a major shift away from traditional tokenization in LLMs. For the first time, a byte-level architecture not only matches tokenizer-based model performance but opens up new possibilities for scaling.

BLT works directly with raw bytes, dynamically grouping them into patches based on complexity - no fixed vocabulary needed. This brings a unique advantage: by adjusting patch sizes, you can scale up model size without proportionally increasing inference costs. It also delivers impressive results across the board, from better handling of messy inputs to improved performance on low-resource languages, while potentially cutting inference costs by up to 50%.

Key Highlights:

Resource Management - BLT introduces a fundamentally different approach to text processing: rather than using fixed tokens, it dynamically adjusts patch sizes based on complexity. When handling predictable content like common word endings, it creates larger patches to save compute. For complex sequences requiring detailed analysis, it maintains smaller patches - compute allocation exactly where it's needed.
Three-Part Harmony - BLT orchestrates three specialized components: a lightweight Local Encoder converting bytes into patch representations, a powerful Latent Transformer handling high-level reasoning, and a Local Decoder generating the final byte sequence. This gives efficient processing while preserving access to vital byte-level details.
Scaling Innovation - Here's where BLT really shines: you can grow your model size while keeping inference costs in check by adjusting patch sizes. This new dimension of scalability means better performance without the usual computational overhead.
Built-in Resilience - Working directly with bytes gives BLT natural advantages in handling messy inputs, understanding character-level patterns, and processing low-resource languages. The model shows particular strength in tasks requiring precise text manipulation, spelling analysis, and working with diverse scripts and languages.
Open-source Code - The team has open-sourced code so you can experiment and integrate into your current AI/ML workflows.
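
Here is a rough sketch of what "grouping bytes into patches based on complexity" could look like. The text above does not say how complexity is measured, so this example assumes it is the uncertainty (entropy) of the next byte, estimated with a toy bigram counter. The bigram model, the `threshold` value, and all function names are stand-ins for illustration, not Meta's implementation.

```python
import math
from collections import Counter, defaultdict
from typing import List


def train_bigram(corpus: bytes) -> dict:
    """Count which byte follows which; a toy stand-in for a small byte-level model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts


def next_byte_entropy(counts: dict, prev: int) -> float:
    """Entropy in bits of the next-byte distribution after `prev`."""
    dist = counts.get(prev)
    if not dist:
        return 8.0  # unseen context: treat as maximally uncertain (8 bits)
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def patch(data: bytes, counts: dict, threshold: float = 1.0) -> List[bytes]:
    """Start a new patch wherever the next byte is hard to predict."""
    if not data:
        return []
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(counts, data[i - 1]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


counts = train_bigram(b"the cat sat on the mat. the dog sat on the log.")
for p in patch(b"the cat sat on the log.", counts):
    print(p)
```

Raising `threshold` merges more bytes into each patch. In this sketch that means fewer patches for the large model to process, which is the knob the scaling claim above relies on.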

Quick Bites

Anthropic has made several features generally available in their API, including prompt caching (cutting costs by up to 90%), an expanded Message Batches API supporting 100k messages per batch, token counting, and visual PDF support. Alongside these, new Java and Go SDKs (in alpha) have been released with type-safe API access and convenient helpers for authentication, pagination, error handling, and retries in their respective languages.
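
To see where a figure like "up to 90%" can come from, here is a back-of-the-envelope sketch. It assumes cached input tokens are billed at a fraction of the normal input price and ignores any surcharge for writing to the cache. The 0.10 multiplier and the $3 per million tokens price are placeholder assumptions for illustration, so check the current pricing page before relying on them.

```python
def input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_mtok: float,
    cache_read_multiplier: float = 0.10,  # assumed discount rate for cached tokens
) -> float:
    """Input-side cost in dollars when part of the prompt is read from cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    billable = uncached + cached_tokens * cache_read_multiplier
    return billable * price_per_mtok / 1_000_000


def savings(prompt_tokens: int, cached_tokens: int, price_per_mtok: float) -> float:
    """Fraction of input cost saved compared with no caching."""
    full = input_cost(prompt_tokens, 0, price_per_mtok)
    cached = input_cost(prompt_tokens, cached_tokens, price_per_mtok)
    return 1 - cached / full


# Placeholder price of $3 per million input tokens.
print(round(savings(100_000, 100_000, 3.0), 2))  # whole prompt cached
print(round(savings(100_000, 90_000, 3.0), 2))   # 90% of the prompt cached
```

The "up to" matters: the maximum saving only applies when nearly the whole prompt is a cache hit, so long shared prefixes (system prompts, documents) benefit most.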

Nexa AI has released OmniAudio-2.6B, which it bills as the fastest and most efficient audio-language model, reaching up to 66 tokens/second. The model integrates audio and text processing into a single architecture, enabling responsive voice QA, content generation, and more directly on devices with just 1.3GB RAM. You can explore the model through HuggingFace or with the Nexa SDK for local deployment.

OpenAI just made ChatGPT accessible through phone calls and WhatsApp, allowing users to interact with the AI through voice conversations and messaging. US users can call 1-800-242-8478 for 15 free minutes of voice chat per month, while WhatsApp access is available globally for text-based conversations.

NVIDIA has supercharged its entry-level AI developer kit with the new and compact Jetson Orin Nano Super Developer Kit, delivering AI performance of 67 TOPS (up from 40 TOPS) and memory bandwidth of 102 GB/s through a software update. Priced at $249, this compact edge AI powerhouse lets you run modern generative AI models including LLMs and vision models. Existing Jetson Orin Nano users can upgrade their kits via a free software upgrade.

GitHub Copilot is now available for free in VS Code! With just a GitHub account, developers get 2,000 code completions and 50 chat requests per month, with access to both GPT-4o and Claude 3.5 Sonnet models. The free plan also includes new features like multi-file editing, custom instructions, full project awareness, voice chat, and terminal integration, and will soon support vision-based UI generation.

Tools of the Trade

Helicone: Open-source LLM developer platform that logs, observes, analyzes, and evaluates your LLM API requests through a simple integration. It also integrates with numerous LLM providers and frameworks.
Workloop: No-code platform for building automated workflows using AI Agents. It allows you to integrate various tools, create workflows via drag-and-drop nodes, and schedule automated runs via triggers.
bolt.diy: Open-source version of Bolt.new. Build full-stack web apps in your browser with the LLM you want to use - OpenAI, Anthropic, Ollama, OpenRouter, Gemini, LMStudio, xAI, HF, DeepSeek, or Groq models - and it is easily extended to use any other model supported by the Vercel AI SDK.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.

Hot Takes

Why are they called agents? Because it sells… try explaining a function calling LLM that can query a db and use other “tools” to someone that isn’t on X and you’ll realize they won’t get it ~ anton
Given that Google has assembled all the pieces for a working AI assistant in the coming months with Gemini 2 Flash multimodal plus Mariner, I really wonder if Apple catches up or if AI is finally the Nokia moment for iPhones. ~ Ethan Mollick

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads | Facebook

Awesome LLM Apps | Sponsor Us

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
