Llama 3.1 Can Now Listen

PLUS: Grok 2 mini got 2x faster, Mistral NeMo distilled down to 8B parameters

Today’s top AI Highlights:

  1. Llama 3.1 8B is ready to listen, without transcription

  2. NVIDIA releases small 8B language model with state-of-the-art accuracy

  3. Grok 2 mini is now 2x faster and more accurate

  4. AI data scientist for Jupyter and Metabase

& so much more!

Read time: 3 mins

Latest Developments

AI research lab Homebrew has released Llama3-s v0.2, a multimodal checkpoint for enhanced speech understanding. Built on Llama 3.1 8B, this open-source model uses semantic tokens, drawing inspiration from WhisperVQ, to process audio data directly, bypassing the traditional transcription step. This results in faster processing and lets the model understand and respond to speech in near real-time.

Key Highlights:

  1. Training Data - The team used a combination of real human speech (the MLS-10k dataset) and synthetically generated speech to equip the model to handle varied speech patterns and accents.

  2. Semantic Token Efficiency - Unlike acoustic tokens, semantic tokens offer better compression and consistent speech-feature extraction, leading to faster and more efficient audio processing.

  3. Outperforming Competition - Llama3-s v0.2 surpasses models like SALMONN, Qwen-Audio, and WavLLM on the ALPACA-Audio evaluation, demonstrating its superior performance in speech understanding tasks.

  4. Open-Source - The model is open-source, and you can try the demo here.
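The direct-audio approach described above can be sketched in miniature: encode speech into discrete semantic tokens, then wrap them in special markers so a text LLM can consume them without a transcript. Everything below is an illustrative toy, not the model's real API; `encode_to_semantic_tokens` and the `<snd_*>` markers are hypothetical stand-ins for a WhisperVQ-style encoder and the model's actual special tokens.

```python
# Illustrative sketch of a speech-to-LLM pipeline that skips transcription.
# All names here are hypothetical stand-ins, not Llama3-s's real interface.

def encode_to_semantic_tokens(audio_frames):
    """Stand-in for a WhisperVQ-style encoder: maps raw audio frames to a
    short sequence of discrete semantic token IDs (quantized speech features)."""
    # Toy quantizer: bucket each frame's magnitude into one of 8 codebook entries.
    return [int(abs(f) * 8) % 8 for f in audio_frames]

def wrap_for_llm(semantic_tokens):
    """Interleave the audio tokens with special markers so a text-trained LLM
    can attend to speech directly instead of a transcript."""
    inner = " ".join(f"<snd_{t}>" for t in semantic_tokens)
    return f"<sound_start> {inner} <sound_end>"

audio = [0.12, -0.53, 0.87, 0.31]       # pretend these are audio frames
prompt = wrap_for_llm(encode_to_semantic_tokens(audio))
print(prompt)
```

The key point the sketch captures: because the codebook of semantic tokens is small and discrete, the LLM sees a compact token sequence rather than raw waveforms or a lossy intermediate transcript.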

NVIDIA has released Mistral-NeMo-Minitron 8B, a smaller, more efficient version of the Mistral NeMo 12B language model. The new model boasts state-of-the-art accuracy in a compact size. It excels in tasks like chatbots, virtual assistants, and content creation, and is small enough to run on NVIDIA RTX-powered workstations. The model leverages both pruning and distillation techniques to achieve this balance of size and performance.

Key Highlights:

  1. High Accuracy, Low Compute - Mistral-NeMo-Minitron 8B offers comparable accuracy to the original 12B model but with significantly lower computational requirements. This makes it easier and more cost-effective to deploy, especially for resource-constrained environments.

  2. Deployment Options - The model is available as an NVIDIA NIM microservice with a standard API, downloadable from Hugging Face, and soon as a downloadable NIM for deployment on any GPU-accelerated system. This flexibility allows for integration into various workflows and infrastructures.

  3. Further Customization - You can leverage the NVIDIA AI Foundry to further prune and distill the 8B model for even more specific use cases and devices, like smartphones or embedded systems, while retaining high accuracy.
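The prune-then-distill recipe mentioned above can be illustrated with the classic distillation objective: after pruning, the smaller student model is trained to match the larger teacher's temperature-softened output distribution via KL divergence. This is a minimal numpy sketch of that loss under standard assumptions; the temperature, shapes, and names are illustrative, not NVIDIA's actual training code.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.
    Minimizing this trains the pruned student to mimic the teacher."""
    p = softmax(teacher_logits, temperature)   # teacher's soft targets
    q = softmax(student_logits, temperature)   # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])   # pretend 12B-teacher logits for 3 tokens
student = np.array([3.5, 1.2, 0.4])   # pruned 8B-student logits
print(f"distillation loss: {distillation_loss(teacher, student):.4f}")
```

A temperature above 1 softens both distributions, so the student also learns from the teacher's relative rankings of unlikely tokens, which is much of what makes distillation recover accuracy lost to pruning.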

Quick Bites

  1. Build a fully working web app with just voice commands. Use the Better Dictation tool in Cursor Compose to give voice instructions to the AI assistant and watch it code like magic.

  2. Grok 2 mini is now 2x faster. After xAI rewrote the inference stack from scratch, both Grok 2 and Grok 2 mini have gotten faster and slightly more accurate.

  3. Apple is holding its Apple Event on September 9, where it will showcase the iPhone 16 lineup. Apple Intelligence is rumored to be a big part of the event. Currently, only the iPhone 15 Pro models support Apple Intelligence, but the entire iPhone 16 lineup might be compatible with it.

  4. After Microsoft poached Inflection AI’s CEO, leaving the company high and dry, Inflection is capping free access to its chatbot Pi and shifting focus toward enterprise products. The company is also letting users export their conversations as it navigates resource constraints and explores licensing its AI models to businesses.

  5. AI coding platform Cursor has raised $60 million in Series A funding from Andreessen Horowitz, OpenAI, and other investors at a $400 million valuation.

Tools of the Trade

  1. MinusX: A Chrome extension that automates data analysis by interacting with your existing tools like Jupyter and Metabase. It uses AI to click, type, and perform tasks directly within your analytics apps without changing your workflows.

  2. CursorLens: An open-source dashboard for the Cursor IDE. Log AI code generations, track usage, and control AI models (including local ones). Run it locally or use the upcoming hosted version.

  3. Trellis: AI tool that transforms unstructured data, like financial documents and emails, into structured, SQL-ready formats for data and operations teams. It integrates with over 200 data sources and supports various formats.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. Future be like tab tab tab ~
    Andrej Karpathy

  2. This is now one of my favorite ways to think about generative AI: it’s an inherently unreliable technology, which explains why it’s so unintuitively difficult to learn how to get good results from it - and why so many people are rejecting it as useless or over-hyped ~
    Simon Willison

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

Unwind AI - Twitter | LinkedIn | Instagram | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one (or 20) of your friends!
