Voice AI Agents with RAG

PLUS: Meta's video generation model, Apple Intelligence releasing at the end of October

In partnership with

Learn AI strategies worth a million dollars in this 3-hour AI workshop. Join now for $0

Everyone tells you to learn AI but no one tells you where.

We have partnered with GrowthSchool to bring this ChatGPT & AI Workshop to our readers. It usually costs $199, but it is free for you because you are our loyal readers 🎁

This workshop has been taken by 1 Million people across the globe, who have been able to:

  • Build businesses that make $10,000 using just AI tools

  • Make quicker & smarter decisions using AI-led data insights

  • Write emails, content & more in seconds using AI

  • Solve complex problems, research 10x faster & save 16 hours every week

You’ll wish you knew about this FREE AI Training sooner (Btw, it’s rated at 9.8/10 ⭐)

Today’s top AI Highlights:

  1. Meta unveils AI model that generates, edits, and personalizes videos

  2. Build RAG apps with real-time voice interactions running locally using OpenAI Realtime API

  3. Google's open-source project turns real objects into interactive AR portals

  4. Apple Intelligence is coming by the end of this month

  5. Instruct AI agent with your voice as it crawls websites in real-time

& so much more!

Read time: 3 mins

AI Tutorials

Building AI tools that can handle customer interactions while retaining context is becoming increasingly important for modern applications.

In this tutorial, we’ll show you how to create a powerful AI customer support agent using GPT-4o, with memory capabilities to recall previous interactions.

The AI assistant’s memory will be managed using Mem0 with Qdrant as the vector store. The assistant will handle customer queries while maintaining a persistent memory of interactions, making the experience seamless and more intelligent.
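To make the memory idea concrete, here is a minimal sketch of the store-then-recall loop. It does not use Mem0 or Qdrant: `ToyMemory`, `embed`, and `build_prompt` are hypothetical names, and the bag-of-words "embedding" is a deliberately crude stand-in for a real embedding model and vector store.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Hypothetical per-customer memory; a real setup would persist to a vector store."""

    def __init__(self) -> None:
        self._store = defaultdict(list)  # user_id -> list of (text, vector)

    def add(self, user_id: str, text: str) -> None:
        self._store[user_id].append((text, embed(text)))

    def search(self, user_id: str, query: str, k: int = 2) -> list[str]:
        # Score only this customer's memories and drop those with no overlap.
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self._store[user_id]]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]


def build_prompt(memory: ToyMemory, user_id: str, query: str) -> str:
    """Prepend recalled memories to the customer's message before calling the LLM."""
    recalled = memory.search(user_id, query)
    context = "\n".join(f"- {m}" for m in recalled) or "- (no relevant memories)"
    return f"Relevant past interactions:\n{context}\n\nCustomer: {query}"


if __name__ == "__main__":
    memory = ToyMemory()
    memory.add("alice", "Order 1042 arrived with a cracked screen")
    print(build_prompt(memory, "alice", "Any update on my order?"))
```

The design point is that memories are keyed by customer and retrieved by similarity to the new query, so the prompt sent to the model carries only the relevant history rather than the full conversation log.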

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

Meta has set an audacious goal for Meta AI's user base. Currently at 400 million active users, Meta AI is a state-of-the-art assistant that not only responds to your text but also generates images, responds to your voice, and even edits images from your text prompts, all for free!

Taking these capabilities multiple steps forward, Meta has released a report on Movie Gen, its next generation of foundation models for video and audio generation. These models will allow you to generate and edit videos from simple text prompts. But the real innovation lies in generating personalized videos featuring your own face in realistic scenarios. The system can also create audio tracks and sound effects tailored to video and text inputs.

Key Highlights:

  1. HD Video Generation - Generate HD videos of up to 16 seconds at 16 fps using text prompts and images. The model excels at representing object motion, subject interactions, and camera movements even for complex video sequences.

  2. Precise Video Editing - Precisely edit videos using text prompts. You can add, remove, or replace video elements with great control while preserving the original content.

  3. Personalized Video - Go beyond standard text-to-video. Give it your images and generate videos featuring you, adapting to the text prompt while preserving your identity and motion in the generated video.

  4. Synchronized Audio Generation - Generate high-fidelity audio synchronized with video content. This includes sound effects, background music, and ambient sounds.

  5. Under the Hood - Movie Gen is a family of two foundation models - Movie Gen Video (30B parameters) and Movie Gen Audio (13B parameters). The personalized video generation and video editing capabilities were added to the Movie Gen Video model via post-training procedures.

We are as excited as you are to try the models and build on top of them. Meta might even open-source the models (while we keep waiting for Sora, or maybe not anymore).

Thinking of combining RAG with voice interaction? Here’s VoiceRAG by Microsoft, a ready-to-use pattern that combines the GPT-4o Realtime API with Azure AI Search, specifically for real-time voice-based apps. This solution offers real-time audio interaction along with secure, backend-managed retrieval of relevant data. The best part? You can run it locally to experiment with real-time voice-driven apps. The model handles live conversations and can make retrieval calls without interrupting the flow.

Key Highlights:

  1. Function Calling for RAG - The GPT-4o model uses function calling to invoke searches in the knowledge base during a live conversation, seamlessly integrating external data retrieval.

  2. Middle Tier for Security - All model configurations and API keys are handled on the backend, ensuring the client side is secure and doesn't require direct access to sensitive components.

  3. Full-Duplex Audio - The Realtime API supports full-duplex audio for live back-and-forth conversations while handling data lookups in the background.

  4. Run Locally - You can set up and run the VoiceRAG application locally, with the full code available on GitHub. The setup instructions guide you through configuring Azure OpenAI and AI Search services.
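The function-calling and middle-tier points above can be sketched roughly as follows. This is not VoiceRAG's code: the `search` tool, the `KNOWLEDGE_BASE` contents, the `handle_tool_call` helper, and the exact shape of the tool schema are all assumptions for illustration, and a keyword match stands in for Azure AI Search.

```python
import json

# Hypothetical knowledge base; the real pattern queries a search service instead.
KNOWLEDGE_BASE = {
    "doc-1": "Employees accrue 20 vacation days per year.",
    "doc-2": "The health plan covers annual eye exams.",
}

# Tool description the backend would register with the model (schema shape assumed).
SEARCH_TOOL = {
    "type": "function",
    "name": "search",
    "description": "Search the knowledge base for passages relevant to the query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def search(query: str) -> list[dict]:
    """Naive keyword match standing in for a real retrieval call."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in KNOWLEDGE_BASE.items():
        words = set(text.lower().rstrip(".").split())
        if terms & words:
            hits.append({"id": doc_id, "text": text})
    return hits


# Registry of tools the backend is willing to run on the model's behalf.
TOOLS = {"search": search}


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run a model-requested tool on the backend and return a JSON string result."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    return json.dumps(TOOLS[name](**args))


if __name__ == "__main__":
    # Simulates the model asking for a lookup mid-conversation.
    print(handle_tool_call("search", '{"query": "vacation days"}'))
```

Because the tool registry and the search call live on the backend, the browser client only streams audio; it never sees API keys or search credentials.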

Quick Bites

Google has launched XR-Objects, an open-source augmented reality prototype that turns real-world objects into interactive digital tools using real-time object segmentation and multimodal LLMs. When you interact with an object, XR-Objects shows context menus with related information, lets you compare objects, and places interactive widgets in 3D space.

The iPhone 16 lineup's marketing leaned heavily on Apple Intelligence, but it’s not here yet. A report says that a few Apple Intelligence features are rolling out on October 28 with the release of iOS 18.1. The early features will include the writing tools, notification summaries, and a revamped Siri.

Meta released new features for the Ray-Ban Meta smart glasses: the AI-assisted camera lets you ask questions about your surroundings and save locations as memories. This has sparked questions about privacy and how the data is used. Meta declined to comment on whether it’ll use these images to train its future AI models, which leaves open the possibility that it eventually might.

Tools of the Trade

  1. OpenAI Realtime Console with Firecrawl: Control an AI agent with your voice to crawl and browse websites in real-time using OpenAI’s Realtime API and Firecrawl. It runs locally in a React app, utilizing full-duplex audio and function calling for tasks like web crawling.

  2. Voice_chat_pdf: Uses OpenAI’s Realtime API to chat with PDFs by integrating a simple RAG system built on LlamaIndex in Next.js. After generating document embeddings, you can interact with the PDFs via real-time voice commands.

  3. Auto_Jobs_Applier_AIHawk: Automates job applications using AI, allowing you to apply to multiple jobs efficiently. It personalizes applications and resumes for each position.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. first, we use ai to apply to jobs. then, we use ai to screen candidates. next thing we know we’re being assigned jobs at birth. ~
    Yohei

  2. As much shit as I give OpenAI they are totally about to nuke a good portion of white collar work and nobody seems to notice. 85% of my audience still doesn't believe we have AGI when AI has:
    1. Surpassed most human coders
    2. Crushes the LSAT and qualifies for MENSA
    3. Can provide better diagnostic interpretation than humans
    4. Can aid in cancer research
    This is what I mean when I say I'm bored of AI. Mostly I'm bored of humans. Y'all will catch up when AI bitchslaps you into the 21st century. ~
    David Shapiro

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
