Build AI Agents at Scale
PLUS: Cohere's Command R 7B, ChatGPT gets Google Search-like features
Today’s top AI Highlights:
Open-source platform to build AI agents using your existing code
Cohere’s Command R7B brings advanced RAG to standard MacBooks and CPUs
xAI releases new version of Grok-2 on X and via API, $25 free credits available
Google’s NotebookLM now lets you “Join” the AI podcast to ask questions verbally
Windsurf IDE’s AI code agent gets a memory to remember your coding style
& so much more!
Read time: 3 mins
AI Tutorials
Building powerful RAG applications has often meant trading off between model performance, cost, and speed. Today, we're changing that by using Cohere's newly released Command R7B model - their most efficient model that delivers top-tier performance in RAG, tool use, and agentic behavior while keeping API costs low and response times fast.
In this tutorial, we'll build a production-ready RAG agent that combines Command R7B's capabilities with Qdrant for vector storage, Langchain for RAG pipeline management, and LangGraph for orchestration. You'll create a system that not only answers questions from your documents but intelligently falls back to web search when needed.
Command R7B brings an impressive 128k context window and leads the HuggingFace Open LLM Leaderboard in its size class. What makes it particularly exciting for our RAG application is its native in-line citation capabilities and strong performance on enterprise RAG use-cases, all with just 7B parameters.
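The core routing idea in the tutorial — answer from your documents when retrieval is confident, otherwise fall back to web search — can be sketched in a few lines. This is an illustrative stub, not the tutorial's actual code: the retriever and threshold (`retrieve`, `SCORE_THRESHOLD`) are placeholders for a real Qdrant lookup and a tuned cutoff.

```python
# Minimal sketch of the "answer from documents, else web search" routing
# step in a RAG agent. Retrieval is stubbed; in the real pipeline it
# would query Qdrant, and the "web_search" branch would call a search tool.

SCORE_THRESHOLD = 0.75  # illustrative cutoff, tune per embedding model

def retrieve(query: str) -> list[tuple[str, float]]:
    """Stub for a vector-store lookup returning (chunk, similarity) pairs."""
    corpus = {
        "What is Command R7B?": [("Command R7B is Cohere's 7B RAG model.", 0.91)],
    }
    return corpus.get(query, [("", 0.0)])

def route_query(query: str) -> str:
    """Decide whether the documents can answer, or we fall back to web search."""
    hits = retrieve(query)
    best_score = max(score for _, score in hits)
    return "vectorstore" if best_score >= SCORE_THRESHOLD else "web_search"

print(route_query("What is Command R7B?"))   # vectorstore
print(route_query("Latest AI news today?"))  # web_search
```

In the full tutorial, this decision becomes a conditional edge in the LangGraph graph, with each branch feeding retrieved context into Command R7B for a cited answer.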
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
Inferable is an open-source platform to help you build reliable production-ready AI agent applications that integrate with your existing infrastructure. It lets you transform your functions and APIs into tools for AI agents with built-in support for distributed execution, fault tolerance, and structured outputs.
You won't need to learn new frameworks; you bring your domain expertise, and Inferable handles the AI side, including model routing. The platform’s on-premise execution model ensures your code runs securely within your environment, and you retain complete control over data and compute.
Key Highlights:
Integration with Existing Code - Inferable allows you to easily convert existing functions, REST APIs, and GraphQL APIs into tools. You can use your existing codebase without major overhauls, saving development time and reducing the learning curve. The platform handles complexities like serialization, routing, and fault tolerance out of the box.
Durable Execution - The platform's "Runs" feature provides complete end-to-end state management, making it easy to build complex workflows that incorporate agent chat, function calls, and point-in-time recovery.
Human-in-the-Loop - A simple API for "Human-in-the-Loop" lets you pause execution for approvals, regardless of how long it takes for a human response, ensuring that critical processes are reviewed before moving ahead.
On-Premise Execution - Your functions always run on your infrastructure, with only the outputs being sent to Inferable’s control plane, and you maintain complete control over your data and sensitive information. Enterprise users have the option of using their own models with the managed runtime.
Architecture - You can start building right away using your preferred language, with native SDKs available for Node.js, Go, and .NET. It also comes with a built-in ReAct agent and provides LLM routing for optimal performance. The platform is fully open-source and self-hostable.
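The "existing functions become agent tools" pattern described above can be sketched generically. To be clear, this is not the Inferable SDK (which ships for Node.js, Go, and .NET) — it is a hypothetical Python illustration of the idea: deriving a tool schema from a function's signature so the function can be registered with an agent without rewriting it.

```python
# Generic sketch of the pattern Inferable automates: deriving a
# JSON-schema-style tool description for an LLM agent from an existing
# function's signature. Illustrative only; not Inferable's actual API.
import inspect

PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def to_tool_schema(fn):
    """Build a tool description from a plain function's signature and docstring."""
    sig = inspect.signature(fn)
    props = {
        name: {"type": PY_TO_JSON.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": props,
            "required": list(props),
        },
    }

def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order."""
    return f"Order {order_id}: shipped"

schema = to_tool_schema(get_order_status)
print(schema["name"])                                  # get_order_status
print(schema["parameters"]["properties"]["order_id"])  # {'type': 'string'}
```

A platform like Inferable layers serialization, routing, and fault tolerance on top of this kind of registration, which is what makes existing code usable by agents without major overhauls.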
Cohere has just released Command R7B, the smallest, fastest, and final model in their R series of LLMs. This 7B model is designed for speed and efficiency, running on commodity GPUs and even MacBooks and CPUs, while still delivering high performance.
Command R7B maintains the 128k context window of its larger siblings while delivering strong capabilities in multilingual processing, RAG with citations, reasoning, and tool use. It shows particularly strong results in enterprise use cases, outperforming similarly-sized open models on tasks like RAG, function calling, and agent behaviors. The model’s weights are open for self-hosting, and the model is also available via API.
Key Highlights:
Optimized for Speed - Built for high-throughput applications like chatbots and code assistants, the model runs efficiently on consumer GPUs and CPUs. This enables faster development cycles and significantly lower deployment costs compared to larger models.
Strong Enterprise Features - Excels at business-critical tasks with built-in citation support for RAG, robust tool-use capabilities, and reliable agent behaviors. The model ranks first among similar-sized open models on the HuggingFace Open LLM Leaderboard.
Production-Ready Performance - Maintains high accuracy across math, code, and multilingual tasks while using fewer parameters than competitors. Particularly strong at avoiding unnecessary tool calls and handling multi-step reasoning - critical features for production environments.
Cost-Effective Deployment - Available through Cohere's API at $0.0375 per 1M input tokens and $0.15 per 1M output tokens, with model weights also accessible on HuggingFace for self-hosting applications.
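A quick back-of-the-envelope check of those listed prices (the per-token rates come from the announcement above; the token counts below are made-up examples):

```python
# API cost at Command R7B's listed prices:
# $0.0375 per 1M input tokens, $0.15 per 1M output tokens.
INPUT_PER_M = 0.0375
OUTPUT_PER_M = 0.15

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total API cost in USD for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# e.g. a RAG workload: 10M input tokens (long contexts), 1M output tokens
print(round(cost_usd(10_000_000, 1_000_000), 4))  # 0.525
```

Even a retrieval-heavy workload that pushes tens of millions of context tokens stays under a dollar, which is the point of a 7B model at these rates.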
Quick Bites
OpenAI wrapped up its 8th day of the 12-day announcement series with some really exciting updates to ChatGPT’s Search functionality that should make Google sweat. They have released three new features:
Search in ChatGPT is now faster and shows visual results like images and videos, with in-line citations, right within the ChatGPT interface.
The ChatGPT mobile app now also integrates with Apple Maps to give location-based results like descriptions, operating hours, directions, etc. What makes it even better than Maps alone is the ability to ask follow-up questions about locations.
Search is now integrated with the Advanced Voice Mode, something that we all have been waiting for. This will be released in the coming days.
Search capabilities are now being rolled out to all free, logged-in users of ChatGPT worldwide, across all platforms (web, iOS, and Android).
xAI has rolled out an upgraded version of Grok-2 to all X users, featuring 3x faster performance and improved accuracy, instruction-following, and multilingual capabilities. They have also released two new API models (grok-2-1212 and grok-2-vision-1212) with reduced pricing at $2/1M input tokens and $10/1M output tokens. You can try the API with $25 in free credits. xAI will add their new Aurora image generation model to the API in the coming weeks.
Codeium's Windsurf IDE just got a wave of updates! The latest version introduces "Cascade Memories," letting you set rules for its AI, like preferred languages or API usage. Plus, Cascade can now auto-run safe terminal commands, with customizable allow/deny lists. Other goodies include beta WSL support and improved devcontainer integration.
Google’s NotebookLM has received a significant overhaul with a redesigned interface optimized for managing and generating content. The update also lets you “Join” the AI-generated podcast to interact directly with the AI hosts using your voice, asking questions and receiving explanations. Additionally, Google has launched NotebookLM Plus, a subscription tier offering increased usage limits, enhanced customization, and team collaboration features for power users and organizations.
Tools of the Trade
Llama 3.3 crawler: A tool built with Together AI and Firecrawl's new extract endpoint: it refines a crawling objective, crawls the target websites, and structures the extracted data.
Your Source to Prompt: Browser-based tool to convert local code files into a single text file optimized for LLM prompts, running entirely in your browser - no installation or external dependencies required. It provides a GUI for selecting files, supports preset management for repeated use, includes minification options, and more.
Prompty: A standardized asset format for LLM prompts that enhances their observability, understandability, and portability. It includes a VS Code extension for creating, previewing, and running prompts with various model configurations, and is integrable with popular orchestration frameworks.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
For real though, why isn't Elon just funding Anthropic? Why Grok? I thought his issue was with OpenAI specifically? Just imagine whatever is behind Sonnet-3.5 with Grok's budget. I don't get it, human progress is slowed by petty human issues. That's so stupid ~
Victor Taelin

My advice to the younger generation:
Focus all your energy on AI.
Put all your savings in Bitcoin. ~
Fred Krueger
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉