ChatGPT like Llama-3.1 Voice Mode

PLUS: Build AI Agent Apps, RAG vs. Long Contexts with Claude 3.5 Sonnet

  • Explore future trends, infrastructure & scalability, and more

  • Learn from the best in Model Development and Performance Optimization

  • Get inspired by real-world case studies and success stories

Don’t miss out - use code SHUBHAM15 for 15% off your ticket! 

See you in Austin, TX, November 7-8, 2024

Today’s top AI Highlights:

  1. Build agentic apps with LangGraph’s new AI agent templates

  2. Open-source Llama 3.1-based voice AI with the same latency as OpenAI’s Advanced Voice Mode

  3. Boost your RAG accuracy with Anthropic's Contextual Retrieval

  4. Add code interpreting to your AI apps with this open-source SDK

  5. AI pair programmer with global codebase understanding & personalized memory

& so much more!

Read time: 3 mins

AI Tutorials

Gemini with Gmail looks great! But is it worth $20 a month?

In just 30 lines of Python, you can build an AI assistant that connects with your Gmail inbox, retrieves email content, and answers questions about your emails using RAG.

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

🎁 Bonus at the end!

Latest Developments

LangGraph has just introduced a set of templates to help you build and deploy AI agent-based apps easily. These templates provide customizable entry points for building agentic apps, and they can be easily modified to fit specific use cases. Available in both Python and JavaScript, the templates are fully compatible with LangGraph Studio and LangGraph Cloud.

The best part? You can customize everything—from prompts to chaining logic—without getting locked into a specific provider, so you can tailor it to your project’s needs as it scales.

Key Highlights:

  1. Three ready-to-go templates - Get started with a ReAct agent, RAG chatbot, or data enrichment agent. Together they cover common architectures for solving real-world problems right away.

  2. Full control and customization - Clone the repo, modify the code, and fine-tune your agents. Change prompts, adjust workflows, and select the LLM or vector store that works best for your use case.

  3. Seamless debugging and deployment - All templates are set up to work with LangGraph Studio for visualizing and debugging, and they can be deployed to LangGraph Cloud with a single click.

  4. Room to grow - LangGraph is kicking off with these three high-quality templates, but more are in the pipeline, meaning more tools to solve increasingly complex challenges will be available soon.
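
To make the ReAct pattern behind the first template concrete, here is a minimal sketch of the reason-act-observe loop in plain Python. It does not use LangGraph's API, and every name in it (`scripted_model`, `calculator`, `react_agent`) is hypothetical; the scripted model is an assumption that stands in for a real LLM call so the example runs offline.

```python
def calculator(expression: str) -> str:
    """Tool: evaluate a simple 'a op b' expression (assumed input format)."""
    a, op, b = expression.split()
    a, b = float(a), float(b)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])


TOOLS = {"calculator": calculator}


def scripted_model(history):
    """Hypothetical stand-in for an LLM: call a tool once, then answer."""
    observations = [s for s in history if s.startswith("Observation:")]
    if not observations:
        return "Action: calculator[6 * 7]"
    return "Final Answer: " + observations[-1].split(": ", 1)[1]


def react_agent(question, model, tools, max_steps=5):
    """Alternate model replies and tool observations until a final answer."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model(history)
        history.append(reply)
        if reply.startswith("Final Answer:"):
            return reply.split(":", 1)[1].strip(), history
        # Parse a reply of the form "Action: tool_name[tool_input]".
        action = reply.split(":", 1)[1].strip()
        name, _, arg = action.partition("[")
        observation = tools[name](arg.rstrip("]"))
        history.append(f"Observation: {observation}")
    return None, history  # step budget exhausted without an answer


answer, trace = react_agent("What is 6 times 7?", scripted_model, TOOLS)
print(answer)
for step in trace:
    print(step)
```

In a real agent, the model function would send the history to an LLM and the tool table would hold search, retrieval, or code-execution tools; the loop itself stays the same.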

As with OpenAI’s Sora, the craze for Advanced Voice Mode will likely fade as more players ship their own real-time voice assistants, and there are already several open-source models to experiment with. One of them is LLaMA-Omni, a voice AI model based on Llama 3.1 8B that understands speech and generates text and audio responses simultaneously. Its response latency can be as low as 226ms, comparable to or even faster than GPT-4o's audio latency! The architecture eliminates the need for speech transcription.

Key Highlights:

  1. Architecture - LLaMA-Omni uses the Whisper-large-v3 audio encoder for speech understanding and a HiFi-GAN vocoder for high-quality speech synthesis.

  2. Efficient Training - Trained on a new 200K speech instruction dataset, LLaMA-Omni achieves strong performance in under 3 days on just 4 NVIDIA L40 GPUs, making it incredibly resource-efficient.

  3. GPT-4o-level latency - With response latency as low as 226ms, comparable to GPT-4o's audio latency, LLaMA-Omni lets you build highly responsive, real-time speech applications.

  4. Open Source - The model is open source and supports running inference locally. Full instructions for setup and running the model are available on GitHub.

Quick Bites

Anthropic introduces Contextual Retrieval, a novel technique addressing the loss of context when using RAG with large knowledge bases. Rather than splitting documents into isolated chunks and embedding them, it adds relevant context to each chunk before embedding or indexing, using Claude to generate chunk-specific explanatory context.

  • Reduces retrieval failures: Contextual Retrieval cuts retrieval failures by up to 49%.

  • Further enhancement with reranking: Integrating a reranking step with Contextual Retrieval can reduce retrieval failures by up to 67%, ensuring only the most relevant chunks are used.

  • Cookbook to get started: Anthropic has published a cookbook for building a RAG app with Contextual Retrieval using Claude.
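
The core idea, prepending chunk-specific context before indexing, can be sketched in a few lines of Python. This is an illustrative toy, not Anthropic's implementation: `generate_context` is a hypothetical placeholder for the Claude call that writes the context, the keyword-overlap `score` stands in for a real embedding or BM25 index, and the two sample reports are made up.

```python
def split_into_chunks(text):
    """Naive splitter: one chunk per sentence."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def generate_context(doc_title, chunk):
    """Hypothetical stand-in for the LLM call that situates a chunk in its document."""
    return f"This chunk is from the document '{doc_title}'."


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,'\"").lower() for w in text.split())
    return [w for w in words if w]


def score(query, chunk):
    """Toy relevance score: how often query terms occur in the chunk."""
    terms = set(tokenize(query))
    return sum(1 for token in tokenize(chunk) if token in terms)


def retrieve(query, index, k=1):
    """Return the k best-scoring chunks (ties keep index order)."""
    return sorted(index, key=lambda c: score(query, c), reverse=True)[:k]


# Made-up sample documents for illustration only.
docs = {
    "Globex Q2 2023 report": "Revenue fell 1% over the previous quarter. Headcount rose.",
    "ACME Q2 2023 report": "Revenue grew 3% over the previous quarter. Costs stayed flat.",
}

plain_index = []
contextual_index = []
for title, text in docs.items():
    for chunk in split_into_chunks(text):
        plain_index.append(chunk)
        # Contextual Retrieval: prepend the generated context before indexing.
        contextual_index.append(f"{generate_context(title, chunk)} {chunk}")

query = "ACME revenue"
plain_hit = retrieve(query, plain_index)[0]
contextual_hit = retrieve(query, contextual_index)[0]
print(plain_hit)
print(contextual_hit)
```

Here the isolated chunk "Revenue grew 3% over the previous quarter." never mentions ACME, so the plain index cannot tell the two companies' revenue chunks apart, while the contextualized index can.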

Perplexity has added a new "reasoning" focus (beta) for Pro users, powered by OpenAI's new o1-mini. There is no web search integration yet, the model is slow, and usage is capped by rate limits. It works well for puzzles, math, and coding.

Amazon is adding new generative AI tools to its shopping experience, offering personalized product recommendations and enhanced descriptions based on users' preferences and purchase history.

For sellers, Amazon has introduced an AI video generator for creating product ads and a chatbot assistant to help improve business performance.

Tools of the Trade

  1. E2B's Code Interpreter SDK: Add code interpreting capabilities to your AI apps using a secure, sandboxed environment. It is open source, works with Python and JavaScript, supports LLMs, and runs AI-generated code safely on serverless or edge functions.

  2. Sourcetable: AI-powered spreadsheet that lets you analyze data, build financial models, and generate reports without any coding. Upload CSVs, query databases, create visual dashboards with live data updates, and work with complex data in one place.

  3. MagiCode: AI pair programmer in VS Code that anticipates your next steps and offers real-time assistance without needing prompts. It understands your entire project, keeps track of your edits, and provides proactive suggestions.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. NotebookLM makes me think AGI has been achieved internally (at Google) ~
    gfodor.id

  2. one of my hot takes is that we won’t have embedding-based RAG in 10 years; if LLMs can crawl through data & index it, and LLMs can search the index directly…having LLM agents do indexing & querying is more MLOps-friendly than a separate model with unintuitive params ~
    Shreya Shankar

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - Twitter | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉 
