
Build Your Own Llama 3.2 from Scratch

PLUS: New Whisper model by OpenAI, Autonomous coding agent in VS Code

Today’s top AI Highlights:

  1. OpenAI reduces Whisper Large model to half the size, Groq ramps up speed

  2. Build Your Own Llama 3.2 with this Jupyter Notebook implementation

  3. Run AI models locally to chat with Microsoft Word and Excel files

  4. OpenAI chooses a public benefit structure to protect itself from takeovers

  5. Autonomous coding agent that edits files, fixes bugs, and monitors your terminal

& so much more!

Read time: 3 mins

AI Tutorials

Building AI tools that can handle customer interactions while retaining context is becoming increasingly important for modern applications.

In this tutorial, we’ll show you how to create a powerful AI customer support agent using GPT-4o, with memory capabilities to recall previous interactions.

The AI assistant’s memory will be managed using Mem0 with Qdrant as the vector store. The assistant will handle customer queries while maintaining a persistent memory of interactions, making the experience seamless and more intelligent.
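The core pattern is: retrieve relevant past interactions, prepend them to the prompt, and persist the new exchange. Here's a minimal, dependency-free sketch of that loop — note this is illustrative only: the tutorial's actual build swaps this toy dict-backed store for Mem0 with Qdrant (embedding similarity instead of word overlap) and sends the prompt to GPT-4o. All class and function names below are hypothetical.

```python
class SupportMemory:
    """Per-customer interaction store with naive keyword retrieval.
    Stands in for Mem0 + Qdrant purely for illustration."""

    def __init__(self):
        self._store = {}  # customer_id -> list of past interactions

    def add(self, customer_id, text):
        self._store.setdefault(customer_id, []).append(text)

    def search(self, customer_id, query, top_k=3):
        # Score past interactions by word overlap with the query;
        # a vector store like Qdrant would use embedding similarity instead.
        words = set(query.lower().split())
        scored = [
            (len(words & set(t.lower().split())), t)
            for t in self._store.get(customer_id, [])
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [t for score, t in scored[:top_k] if score > 0]


def build_prompt(memory, customer_id, query):
    # Retrieve relevant history and prepend it to the prompt;
    # in the tutorial this context goes to GPT-4o for the answer.
    context = memory.search(customer_id, query)
    prompt = "Known history:\n" + "\n".join(context) + "\nCustomer: " + query
    memory.add(customer_id, query)  # persist the new interaction
    return prompt
```

With this, a follow-up like "any update on my order?" automatically pulls in the customer's earlier damaged-order complaint, which is what makes the assistant feel like it remembers you.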

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

Last week, OpenAI released Whisper Large v3 Turbo, a streamlined version of Whisper Large v3 with the decoder pruned from 32 layers to just 4. It is nearly 50% smaller than the non-Turbo Large v3 model, cutting the parameter count from 1.55B to 0.8B. The result is a much faster model with only a slight compromise in quality, making it practical for many real-time use cases. It's now available in Hugging Face's Transformers library for speech recognition tasks that balance speed and accuracy.

Groq has also integrated Whisper Large v3 Turbo into GroqCloud, making it accessible to developers at competitive pricing for fast multilingual speech recognition.

Key Highlights:

  1. Whisper Large Optimized - Whisper Large v3 Turbo, by OpenAI, prunes the decoder down to 4 layers, boosting speed with only a minor quality trade-off compared to the original 32-layer Whisper Large v3.

  2. Usage - The model is supported in the Hugging Face Transformers library and allows for different decoding strategies, such as temperature fallback and timestamp prediction.

  3. 216x Faster on Groq - Groq's implementation of Whisper Large v3 Turbo on GroqCloud reaches an impressive speed factor of 216x real-time, offering fast and efficient automatic speech recognition (ASR).

  4. Competitive API Price - Available at $0.04 per hour, Groq's Whisper Large v3 Turbo provides a cost-effective ASR solution for industries needing rapid, multilingual transcription, such as customer service, education, and media.
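Since the model ships in the Transformers library, trying it locally is a few lines. A hedged sketch (the "openai/whisper-large-v3-turbo" Hub id is the published one; building the pipeline downloads roughly 1.6 GB of weights on first use, so it's wrapped in a function here rather than run at import):

```python
def build_transcriber(model_id="openai/whisper-large-v3-turbo"):
    """Create an ASR pipeline; downloads the model weights on first use."""
    from transformers import pipeline  # requires `pip install transformers`
    return pipeline("automatic-speech-recognition", model=model_id)

# Usage (not run here): transcribe a local audio file with timestamps.
# asr = build_transcriber()
# result = asr("meeting.mp3", return_timestamps=True)
# print(result["text"])
```

The same pipeline accepts the decoding options mentioned above (e.g. timestamp prediction via return_timestamps); consult the Transformers docs for the full set.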

If you've been eager to get your hands more dirty with Llama 3.2, look no further! Here’s a new, minimal Jupyter Notebook implementation of Llama 3.2 (1B and 3B) from scratch, providing you with a concise codebase to explore and experiment. The code is open-source (Apache 2.0), ready for you to tinker with and adapt.

Key Highlights:

  1. Clean architecture, readily available - Explore the inner workings of Llama 3.2 with a straightforward implementation using torch and tiktoken. Grasp the core concepts of Grouped-Query Attention and RoPE and leverage the code for your own projects.

  2. Run it anywhere (almost!) - Run Llama 3.2 even on a MacBook Air with adequate RAM, thanks to bfloat16 precision and smart memory management with buffer reuse. Learn how to optimize LLMs for diverse hardware.

  3. Weights at your fingertips - Download the pre-trained weights and tokenizer directly from the Hugging Face Hub. Clear instructions guide you through authentication and license acceptance.

  4. Generate text, experiment, learn - Start generating text immediately with the included example. Explore different prompts, sampling methods, and text cleaning techniques to customize text generation.
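To give a flavor of what the notebook walks through: RoPE (Rotary Position Embedding) encodes token position by rotating pairs of embedding dimensions, each pair at its own frequency. Here's a dependency-free sketch of the idea — the notebook itself implements this in torch, so treat this pure-Python version as illustrative only:

```python
import math

def rope(x, pos, base=10000.0):
    """Apply Rotary Position Embedding to one vector x at position pos.
    Each pair of dimensions (x[i], x[i+1]) is rotated by pos * theta_i,
    where theta_i falls off geometrically with the pair index."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = base ** (-i / d)          # per-pair rotation frequency
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        out[i]     = x[i] * c - x[i + 1] * s   # standard 2D rotation
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because each step is a pure rotation, position 0 leaves the vector untouched and every other position preserves its norm — only the relative angles between query and key vectors change, which is exactly what attention ends up comparing.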

Quick Bites

Google DeepMind CEO Demis Hassabis and Director John Jumper have been awarded the 2024 Nobel Prize in Chemistry for creating the AlphaFold AI model. They share the prize with David Baker, head of the Institute for Protein Design at the University of Washington.

Nomic’s platform for running AI models locally, GPT4All, has been upgraded. It now supports Llama 3.2 Instruct 1B and 3B, models that run faster and smoother on low-end devices. The update also adds support for Excel and Word files, so you can bring these file types directly into chat sessions.

OpenAI is restructuring as a public benefit corporation to shield itself from hostile takeovers and keep control firmly in its own hands. By adopting this model, OpenAI seems to be making a calculated move to balance investor interests while maintaining its narrative of benefiting humanity - giving it the leverage to ward off unwelcome outsiders.

OpenAI is reportedly working on a new Advanced Voice Mode feature for macOS that will let you create and share short voice clips of your chats.

Tools of the Trade

  1. Cline (prev. Claude Dev): Autonomous coding agent that integrates with your CLI and editor to help with tasks like creating and editing files, running commands, and fixing bugs. It provides a human-in-the-loop GUI to approve every file change and terminal command.

  2. GitHub → LLM: A free web tool that converts GitHub links (projects, folders, or files) into a text format readable by LLMs. Just paste the link and get the formatted text ready. The tool automatically excludes binary files like images, audio, etc., and is also incredibly fast!

  3. Candle: A lightweight ML framework for Rust that runs AI models locally, supporting both CPU and GPU. It supports popular models like Llama and Stable Diffusion, ideal for fast, serverless deployment.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. I was particularly fortunate to have many very clever students, much cleverer than me, who actually made things work. They've gone on to do great things. I'm particularly proud of the fact that one of my students fired Sam Altman. ~
    Geoffrey Hinton

  2. There's millions on the table for anyone who builds a solid realtime AI audio app in the next few months. ~
    Pietro Schirano

  3. 1-person companies that rely on AI are on the rise
    Why do you need a copywriter, website programmer, a designer or lawyer when AI can do all the heavy lifting?
    You can also get rid of an HR system, CRM and other expensive SaaS software
    Soon there will be millions of 1-2 person businesses and some of them will be wildly successful ~
    Bindu Reddy

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
