
Advanced Voice Mode using Local Llama 3.1

PLUS: OpenAI Swarm with Groq and Anthropic, Adobe's new text-to-video AI model


Today’s top AI Highlights:

  1. Llama 3.1 can now listen, think, and speak - all in real-time

  2. Open-source framework for programming - not prompting - foundation models

  3. Transform static Physics diagrams in PDFs into interactive simulations

  4. Anthropic CEO writes about how “radical the upside of AI could be”

  5. A fork of OpenAI Swarm that supports Groq and Anthropic

& so much more!

Read time: 3 mins

AI Tutorials

Build a web-scraping AI agent that runs entirely on your local machine using Llama 3.2. With just a few lines of code, you can scrape any website and customize what information you want to extract, all powered by a locally running AI.

The AI agent uses Ollama to run the model locally and ScrapeGraphAI, a web-scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents.
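Here’s a minimal sketch of that setup, assuming Ollama is running locally with the llama3.2 and nomic-embed-text models already pulled; the prompt and target URL are placeholders:

```python
from scrapegraphai.graphs import SmartScraperGraph

# Configuration pointing ScrapeGraphAI at a locally running Ollama server
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "format": "json",  # Ollama needs the output format specified
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
    "verbose": True,
}

# Describe *what* to extract in plain English; the library builds
# the scraping pipeline from this prompt and the page content
scraper = SmartScraperGraph(
    prompt="List all article titles and their links on this page.",
    source="https://example.com/blog",  # placeholder URL
    config=graph_config,
)

print(scraper.run())
```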

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

The Stanford NLP team has introduced DSPy, a framework that changes how we build applications with LLMs. It simplifies complex LLM pipelines by automating prompt engineering and optimization: instead of tweaking prompts manually, you define your application logic in Python and let DSPy handle the LLM interactions. This makes it easier to experiment and to adapt to different LLMs. DSPy also introduces optimization techniques that improve the performance and reliability of LLM-powered applications.

Key Highlights:

  1. Automated Prompt Optimization - DSPy's optimizers learn from data to generate effective prompts, eliminating the need for manual tweaking and making it easier to achieve high performance. This is particularly helpful for complex multi-stage pipelines where prompt engineering can be a major bottleneck.

  2. Modular Design - DSPy separates application logic from LLM parameters (prompts and weights). This makes it simple to modify and experiment with different pipeline architectures, LLMs, or datasets without rewriting large portions of code.

  3. Support for Diverse LLMs - DSPy can "compile" the same program into different instructions and prompts optimized for various large language models, including GPT-4, T5-base, and Llama 13B. This simplifies the process of switching between LLMs or deploying to different environments.

  4. Data-Driven Optimization - DSPy's optimizers use a user-defined metric to fine-tune prompts and even model weights. This data-driven approach enables continuous improvement and adaptation to changing requirements. You can use just a few examples, or hundreds, depending on the optimizer.
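To make this concrete, here’s a minimal sketch of the DSPy workflow, from declaring a module to compiling it with an optimizer. The model name, the toy metric, and the one-example training set are placeholder assumptions; the `LM`, `ChainOfThought`, and `BootstrapFewShot` interfaces are DSPy’s documented ones:

```python
import dspy

# Point DSPy at a backend model (placeholder name; a local Ollama
# model or any other supported LLM works the same way)
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Declare *what* the module does via a signature; DSPy generates
# and manages the actual prompt behind the scenes
qa = dspy.ChainOfThought("question -> answer")
print(qa(question="Why is the sky blue?").answer)

# Optimizers "compile" the program against a metric and examples,
# tuning prompts (and optionally weights) automatically
from dspy.teleprompt import BootstrapFewShot

def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
]
compiled_qa = BootstrapFewShot(metric=exact_match).compile(qa, trainset=trainset)
```

The same program can then be re-compiled for a different LLM without touching the application logic.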

Writer RAG tool: build production-ready RAG apps in minutes

  • Writer RAG Tool: build production-ready RAG apps in minutes with simple API calls.

  • Knowledge Graph integration for intelligent data retrieval and AI-powered interactions.

  • Streamlined full-stack platform eliminates complex setups for scalable, accurate AI workflows.

OpenAI's Realtime API is impressive, but pricey. If "free" sounds better, here’s an open-source alternative. AI research lab Homebrew has released 🍓 Ichigo, a multimodal checkpoint built on Llama 3.1 that directly processes speech and responds in voice, eliminating the need for separate transcription and text-to-speech models.

Ichigo was trained in three phases to address limitations in previous checkpoints, focusing on multilingual capabilities, retaining base model performance, and handling inaudible inputs. The model, code, and dataset are available to try and develop further.

Key Highlights:

  1. Early-fusion architecture - Ichigo employs an early-fusion approach, processing audio and text together for improved multimodal understanding, inspired by Meta's Chameleon paper.

  2. Multi-turn Conversations and Inaudible Input Handling - It is trained to handle multi-turn conversations with speech input and intelligently reject inaudible or nonsensical audio, leading to a smoother user experience.

  3. Three-phase training - The training focused on incorporating multilingual speech data, recovering base LLM performance (MMLU score of 63.79), and specifically training the model to respond appropriately to inaudible input.

  4. Open-source - The model weights and code are publicly available under the Apache 2.0 license for you to experiment with, fine-tune, and integrate Ichigo into your apps. You can try out the demo here.

Quick Bites

University of Calgary students have developed "Augmented Physics," an AI tool that turns static physics diagrams from textbooks into interactive simulations using the Segment Anything Model and multimodal LLMs like Gemini. Students can manipulate parameters, observe real-time effects, and create animations, transforming how physics is taught and learned.

Adobe kicked off its Adobe MAX 2024 event, introducing powerful new generative AI tools across its platforms for creators. Its text-to-video Firefly Video Model is now available in beta, creating 5-second videos from simple text prompts. It’s free to try on the Firefly website, probably with rate limits.
Adobe has also released the Generative Extend feature in beta, which adds frames to the beginning or end of a video clip using the Firefly Video Model. Just grab, drag, and extend the clips.

Anthropic CEO Dario Amodei predicts that AI could radically transform fields like medicine, neuroscience, economies, and governance within 5-10 years, achieving progress that would otherwise take 50-100 years.
This is similar to the timeline Sam Altman has given for achieving AGI. Whether you're a techno-optimist or an AI doomer, if the two most powerful AI companies are talking about such rapid transformations, it’s time to seriously consider how AI should be used responsibly for a better and safer future.

Tools of the Trade

  1. Microagent: Forked from OpenAI's Swarm project, it adds support for Groq and Anthropic LLMs while retaining the same agent semantics. It keeps Swarm's core concepts, such as agent coordination and handoffs, and introduces enhancements like real-time streaming of agents’ responses. A usage sketch follows this list.

  2. Screenpipe: Open-source tool that records your screen and mic 24/7, storing data locally and connecting it to a local LLM for search, automation, and more. It keeps everything on your computer for privacy, and offers various integrations for productivity tools and custom AI plugins.

  3. Finic: Provides browser infrastructure for developers building web scrapers, browser automations, and AI agents in Python. It does this by giving you a browser in the cloud that you can control remotely using Playwright or Puppeteer (in just a few lines), or Selenium (with some work). See the second sketch after this list.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.
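Here’s a hedged usage sketch for Microagent, assuming the fork keeps OpenAI Swarm's documented `Agent` and `client.run()` semantics; the `Microagent` client name and its `llm_type` switch are assumptions based on the repo description, so check the README for exact names:

```python
# Sketch assuming Microagent mirrors OpenAI Swarm's API; the
# `Microagent` client name and `llm_type` parameter are assumptions
from microagent import Microagent, Agent

client = Microagent(llm_type="groq")  # or "anthropic" / "openai"

spanish_agent = Agent(
    name="Spanish Agent",
    instructions="You only speak Spanish.",
)

def transfer_to_spanish_agent():
    """Hand the conversation off to the Spanish-speaking agent."""
    return spanish_agent

english_agent = Agent(
    name="English Agent",
    instructions="You only speak English.",
    functions=[transfer_to_spanish_agent],  # Swarm-style handoff
)

response = client.run(
    agent=english_agent,
    messages=[{"role": "user", "content": "Hola. ¿Cómo estás?"}],
)
print(response.messages[-1]["content"])
```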
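And a minimal sketch of the remote-control pattern Finic describes, using Playwright's standard CDP attach; the endpoint URL is a placeholder, not Finic's real API:

```python
from playwright.sync_api import sync_playwright

# Placeholder endpoint -- a real cloud-browser provider hands you a
# CDP/WebSocket URL (plus auth) to attach to
CDP_URL = "wss://browser.example.com/cdp?token=YOUR_TOKEN"

with sync_playwright() as p:
    # Attach to the remote Chromium over the Chrome DevTools Protocol
    # instead of launching a local browser
    browser = p.chromium.connect_over_cdp(CDP_URL)
    context = browser.contexts[0] if browser.contexts else browser.new_context()
    page = context.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```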

Hot Takes

  1. Curious if there are any VCs investing like they actually expect AGI in the lifetime of their current funds. Many talk like AGI is in 5-10 years but invest like today's AI models are the best they are ever going to get. Revealed beliefs vs cheap talk? Failure of imagination? ~ Ethan Mollick

  2. Hi OpenAI can you please upgrade your hardware to GroqInc? I am sure it will be faster than what you currently have. ~ Marcial Messmer

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
