- unwind ai
- Posts
- Multimodal Voice AI Agents
Multimodal Voice AI Agents
PLUS: Knowledge Table better than GraphRAG, Messages Batch API for Claude
Receive Honest News Today
Join over 4 million Americans who start their day with 1440 – your daily digest for unbiased, fact-centric news. From politics to sports, we cover it all by analyzing over 100 sources. Our concise, 5-minute read lands in your inbox each morning at no cost. Experience news without the noise; let 1440 help you make up your own mind. Sign up now and invite your friends and family to be part of the informed.
Today’s top AI Highlights:
Opensource Knowledge Table gives 2x better accuracy than GraphRAG
Build AI Agents that handle real-time audio and text seamlessly
Anthropic releases new API to batch 10,000 queries at half the cost
Opensource tool that builds entire apps from scratch by talking to you
The easiest way to create AI apps powered by OpenAI API
& so much more!
Read time: 3 mins
AI Tutorials
Building AI tools that can handle customer interactions while retaining context is becoming increasingly important for modern applications.
In this tutorial, we’ll show you how to create a powerful AI customer support agent using GPT-4o, with memory capabilities to recall previous interactions.
The AI assistant’s memory will be managed using Mem0 with Qdrant as the vector store. The assistant will handle customer queries while maintaining a persistent memory of interactions, making the experience seamless and more intelligent.
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
🎁 Bonus worth $50 💵
Latest Developments
Dealing with multi-document extraction and retrieval in RAG (Retrieval-Augmented Generation) systems isn't just difficult — it’s time-consuming and prone to errors, especially if you're after granular responses instead of broad summaries. WhyHow.AI has opensourced Knowledge Table, a package that simplifies extraction, organizes data into a structured form, and integrates smoothly with your RAG pipeline.
If you aim for precision and control in your AI systems, you will find this tool invaluable. It introduces a tabular intermediary in the backend, significantly improving the quality of graph-based retrieval.
Key highlights:
Boost in accuracy - Knowledge Table improves multi-document retrieval accuracy by 2.5x, beating even well-known tools like Text2Cypher and GraphRAG.
Custom extraction rules - You get granular control over the extraction process, with the ability to define what must or may be returned, reducing unwanted noise in your data.
Ontology-powered queries - The query engine lets you directly interact with the structured data using tools and specific columns, making your retrievals both fast and precise.
Flexible export options - Extracted data can be exported in CSV or Graph Triples format, allowing you to extend your own processes or integrate with WhyHow’s platform seamlessly.
OpenAI and LiveKit have released a MultimodalAgent API, designed for building real-time AI applications that can handle both audio and text seamlessly, using Realtime API. This API allows ultra-low latency interactions, perfect for creating voice assistants, real-time transcription tools, and conversational agents. The API abstracts much of the complexity, so you can focus on building your AI without worrying about communication protocols. You can try the API in the LiveKit playground and get started with building advanced AI agents that interact with users in real-time.
Key Highlights:
Realtime Audio Streaming with Low Latency - The API uses WebRTC to ensure fast communication, perfect for voice apps or any use case needing quick responses.
Multimodal Support - Handle audio, video, and text inputs in one go, making it ideal for building agents that can listen, see, and respond in real-time. No need to stitch different services together.
Plug-and-Play with Major Models - Easily add support for popular LLMs, text-to-speech, and transcription services like Google, DeepGram, and OpenAI with pre-built plugins—saving you hours of work.
Works Everywhere - Whether you’re developing locally, on your own servers, or using LiveKit Cloud, the setup is identical, and you can scale it whenever you need.
Quick Bites
Anthropic has introduced a new Message Batches API, allowing you to process up to 10,000 queries per batch asynchronously at 50% less cost than standard API calls. The API, now in public beta, supports all Claude models and processes batches within 24 hours, making it ideal for non-time-sensitive tasks.
The Godfather of AI Geoff Hinton and AI pioneer John Hopfield have been awarded the 2024 Nobel Prize in Physics for their groundbreaking work on artificial neural networks.
LangChain has introduced long-term memory support in LangGraph for AI agents to store and recall information across conversations in Python and JavaScript. This new feature allows AI applications to adapt to user preferences, with built-in persistent memory available for all LangGraph Cloud & Studio users.
LM Studio 0.3.4 now ships with Apple's MLX engine for fast on-device LLMs for Apple Silicon Macs, including models like Llama 3.2 running at 250 tokens per second on M3 Max. This update also supports using MLX models via Chat UI or code, structured output in JSON, and vision models like LLaVA.
Tools of the Trade
Pythagora: An opensource VS Code extension that builds production-ready apps from scratch and talks to you whenever it needs a creative decision or feedback. It keeps you, the human, in the loop throughout the entire development process. It manages the entire codebase and talks to you in natural language.
openai-gradio: A Python package that makes it very easy for developers to create web apps that are powered by OpenAI API in just a few lines of code.
LLM Chatbot Web UI: a Gradio-based chatbot using LangChain and Hugging Face for conversational AI and PDF retrieval. It allows control over text generation settings and PDF-based queries with RAG.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.
Hot Takes
AC currently broken at OpenAI office - AGI delayed four days ~
Steven HeidelA lot of engineering teams complain that interview candidates cheat using AI.
This is not cheating; it is resourcefulness 😀
The answer is to let candidates use any LLM or AI assistant they want to solve the interview questions.
This will also force you to ask the right questions — ones that AI cannot solve without human input. ~
Bindu Reddy
Meme of the Day
That’s all for today! See you tomorrow with more such AI-filled content.
🎁 Bonus worth $50 💵
Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get AI resource pack worth $50 for FREE. Valid for limited time only!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it wtith at least one, two (or 20) of your friends 😉
Reply