Turn AI Agents into Production APIs
PLUS: Social media simulator with 1 million AI agents, Open-source web app testing agent
Today’s top AI Highlights:
Open-source, hackable, and production-ready framework to build AI agents
Build, debug, and evaluate RAG pipelines with centralized tracing
DeepSeek V3 outperforms Claude 3.5 Sonnet and GPT-4o in coding and math
Social media simulator with 1 million AI agents
Create AI workflows with a drag-and-drop no-code canvas
& so much more!
Read time: 3 mins
AI Tutorials
We built a sophisticated RAG system with intelligent database routing that uses multiple specialized vector databases with an agent-based router to direct queries to the most relevant database.
This app allows users to upload multiple documents to three different databases: Product Information, Customer Support & FAQ, and Financial Information. Users can then query the uploaded information in natural language, and the app routes each query to the most relevant database.
When no relevant documents are found, it gracefully falls back to web search using DuckDuckGo. A confidence threshold mechanism ensures that vector similarity routing is only used when we're sufficiently confident about the database choice (score ≥ 0.5).
Tech stack: Langchain (RAG orchestration), Phidata (router agent), LangGraph (fallback mechanism), Qdrant (vector store), Streamlit (user interface), and GPT-4o as the LLM
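If you want a feel for the routing step, here's a minimal sketch of the confidence-threshold logic, assuming three Qdrant collections are already populated. The collection names, URL, and the query-embedding step are placeholders (embed the query with the same model used at indexing), and the DuckDuckGo fallback itself is omitted:

```python
# Minimal sketch of the confidence-threshold routing described above.
# Assumes three pre-populated Qdrant collections; names, URL, and the
# embedding step are placeholders, not the tutorial's exact code.
from qdrant_client import QdrantClient

THRESHOLD = 0.5  # only trust vector routing above this similarity score
COLLECTIONS = ["product_info", "customer_support", "financial_info"]

client = QdrantClient(url="http://localhost:6333")

def route_query(query_vector):
    """Return (collection, score) for the best-matching database,
    or None to signal a fallback to web search."""
    best_name, best_score = None, 0.0
    for name in COLLECTIONS:
        hits = client.search(collection_name=name,
                             query_vector=query_vector,
                             limit=1)
        if hits and hits[0].score > best_score:
            best_name, best_score = name, hits[0].score
    # Below the threshold, the agent falls back to DuckDuckGo web search
    return (best_name, best_score) if best_score >= THRESHOLD else None
```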
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
The Cheshire Cat AI is an open-source framework to create production-ready AI agents without wrestling with infrastructure complexities. Built with Docker at its core, it lets you quickly spin up an AI agent that can process documents, connect to external APIs, and use both commercial and open-source LLMs.
The framework handles everything from vector storage to conversation management, letting you focus on building features instead of setting up databases. With live reload capabilities and a growing plugin ecosystem, you can quickly prototype and extend your agent's capabilities while maintaining full control over deployment.
Key Highlights:
Document Processing - Upload PDFs, markdown files, JSON, or web pages directly into the agent's memory. The framework automatically handles vector storage and retrieval, with built-in support for both PostgreSQL and SQLite databases. When you need to scale, transition seamlessly from local development to production without changing your code.
Plugin-First Architecture - Create plugins by simply adding a folder with your Python code - no complex boilerplate needed. The framework handles hot reloading, so you can iterate quickly while your agent stays online. Install community plugins with one click or build your own to add custom functionality like API integrations or specialized tools (a minimal plugin sketch follows this list).
LLM Integration - Choose between commercial APIs like OpenAI/Claude or run open-source models locally through Ollama integration. The framework provides a unified interface, letting you swap models without rewriting code.
Production-Ready Features - Ships with essential production capabilities like user authentication, memory management, and an admin panel for monitoring. The 100% dockerized setup makes deployment straightforward, while the microservice architecture lets you easily integrate with existing systems through extensive HTTP and WebSocket APIs.
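To give a feel for the plugin model, here's a hypothetical plugin file. The import path and @tool decorator follow the project's plugin conventions as we understand them, so treat this as a sketch to adapt rather than copy-paste:

```python
# Hypothetical Cheshire Cat plugin: a single Python file dropped into a
# plugin folder. The @tool decorator (import path assumed from the
# project's plugin conventions) exposes a function to the agent, and the
# docstring tells the LLM when to call it.
from cat.mad_hatter.decorators import tool

@tool
def convert_temperature(celsius, cat):
    """Convert a temperature from Celsius to Fahrenheit.
    Input is the temperature in Celsius as a number."""
    return f"{float(celsius) * 9 / 5 + 32} °F"
```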
Humanloop is bringing dev-friendly evals into the AI development cycle, making it easier to ship and scale AI applications with confidence. The platform gives you everything you need to continuously evaluate your LLM apps through code or UI, with support for automated evals, human feedback collection, and real-time performance monitoring.
Unlike traditional code-centric tools, Humanloop adapts to AI's unique challenges where outputs are stochastic and testing needs to account for both code and data quality. The platform integrates seamlessly with existing CI/CD pipelines while providing an intuitive UI for collaborating with non-technical domain experts.
Key highlights:
Code- and UI-first - Humanloop offers both UI and code-based editors with built-in version control for prompts, tools, and agents. It works with various model providers, so you can use the best models for your specific tasks. Humanloop lets you maintain your model’s source of truth by storing it alongside your existing code in Git.
Flexible Evaluation - You can automatically evaluate your models, incorporating metrics into your deployment pipelines. Configure code-based, AI-driven, and human evaluators to prevent regressions and iteratively refine your systems, and integrate evaluators built with your existing libraries via your own code (an illustrative evaluator follows this list).
Detailed Observability - Get step-by-step views of your system's performance through detailed logging of function calls, tool usage, and model interactions. Monitor production performance, track user feedback, and set up alerts using Humanloop's built-in tools.
Full visibility into RAG - Trace and replay every step of your RAG pipeline, from retrieval to generation. The platform tracks each component's performance and helps identify exactly where improvements are needed, taking the guesswork out of debugging complex AI systems.
API-First - Humanloop also supports code-first workflows through its API and SDKs, offering full programmatic access for logging, evaluations, and deployments, with SDKs for Python and TypeScript.
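For flavor, here's what a code-based evaluator can look like: a plain function that receives a logged model call and returns a score. The signature and field names below are our assumptions for illustration, not Humanloop's documented contract:

```python
# Illustrative code-based evaluator: checks that a model's output is
# valid JSON containing an "answer" field. The log structure here is an
# assumption for the sketch, not Humanloop's exact schema.
import json

def valid_json_evaluator(log: dict) -> bool:
    """Return True when the logged output parses as JSON and
    contains the required 'answer' field."""
    try:
        parsed = json.loads(log["output"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    return isinstance(parsed, dict) and "answer" in parsed
```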
Quick Bites
Researchers have introduced OASIS, a new social media simulator with AI agents to help scientists study online social dynamics without running potentially disruptive experiments on real platforms. OASIS can model interactions between up to one million AI agents across platforms like Twitter and Reddit, allowing researchers to observe how information spreads, groups polarize, and herd behaviors emerge at realistic scales.
OASIS successfully replicated several real-world social phenomena, with findings suggesting that larger agent populations lead to more diverse and helpful discussions.
Google has released the Gemini API Cookbook, a collection of 100+ notebooks to help you quickly start building with the Gemini API. The cookbook includes quickstarts, tutorials, and examples covering various features like multimodal inputs, JSON mode, function calling, and code execution. You can also explore the latest Gemini 2.0 capabilities and experiment with audio streaming and spatial understanding.
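A typical quickstart call with the google-generativeai Python SDK looks roughly like this (the model name is just an example, not necessarily what a given notebook uses):

```python
# Rough quickstart sketch with the google-generativeai SDK; the model
# name is an example, and your API key goes in configure().
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Explain RAG in two sentences.")
print(response.text)
```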
DeepSeek has officially released the DeepSeek V3 Base model. This open-source model is a 671B parameter Mixture-of-Experts model with 37B activated parameters per token. It boasts a 128K context window and 3x faster inference speed (60 tokens/second) than V2.
Trained on 14.8T high-quality tokens at a total cost of just $5.57 million.
Outperforms state-of-the-art models including Claude 3.5 Sonnet, GPT-4o, and Llama 3 405B on reasoning, math, and coding benchmarks.
API pricing is $0.27/million input tokens ($0.07/million with cache hits) and $1.10/million output tokens starting Feb 8.
Available on Hugging Face with accompanying technical report.
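DeepSeek serves the model through an OpenAI-compatible API, so the standard OpenAI SDK can be pointed at it. The base URL and model name below are taken from DeepSeek's public docs as we understand them, so verify before relying on them:

```python
# Calling DeepSeek V3 via its OpenAI-compatible endpoint; base_url and
# model name are assumptions from DeepSeek's docs, verify before use.
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-chat",  # V3 is served under this model name
    messages=[{"role": "user", "content": "Write a one-line Python quicksort."}],
)
print(resp.choices[0].message.content)
```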
Tools of the Trade
tldraw computer: Create workflows on a canvas using interconnected components representing text, images, and other elements. These components are linked by arrows, visualizing data flow, and they execute procedures using AI models to process and transform data.
Languine: AI tool to automate the translation of application content across 100+ languages by detecting changes in code using Git diff. It supports multiple file formats and integrates with version control systems.
Hercules: Open-source testing agent to automate end-to-end tests by converting Gherkin steps into executable actions, no code required. It uses a multi-agent system to interact with web browsers, handle complex UIs, and provide detailed test reports with videos and logs.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
> Sets up brand new Flash-Thinking
> Feels good
> "Quick Twitter break"
> Learns Flash-Thinking is ancient tech ~
Tom Dörr
nobody should give or receive any career advice right now. everyone is broadly underestimating the scope and scale of change and the high variance of the future. your L4 engineer buddy at meta telling you “bro cs degrees are cooked” doesn’t know shit ~
roon
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉