
Llama 3 Inference gets 97% Faster

PLUS: Human-in-the-loop AI Agents, OpenAI o1 Engineer, OpenAI raises $6.6B

In partnership with

Writer RAG tool: build production-ready RAG apps in minutes

  • Writer RAG Tool: build production-ready RAG apps in minutes with simple API calls.

  • Knowledge Graph integration for intelligent data retrieval and AI-powered interactions.

  • Streamlined full-stack platform eliminates complex setups for scalable, accurate AI workflows.

Today’s top AI Highlights:

  1. PyTorch introduces torchao for faster LLM inference and lower memory use

  2. Build production-ready AI agents and expose their operations to end users

  3. OpenAI raised $6.6B in new funding to continue building AGI

  4. OpenAI o1 engineer turns your command line into a coding assistant

  5. The first AI text-to-show generator that generates characters, scripts, and scenes with a single prompt

& so much more!

Read time: 3 mins

AI Tutorials

Cutting-edge AI apps don’t always require the cloud. Today we are combining a local Llama 3.1 model with tool use, making it possible to interact with real-world data sources while the AI itself runs locally.

In this tutorial, we’ll guide you through building a local Llama 3.1-powered assistant that integrates tools like Yahoo Finance for stock data and SerpAPI for web searches. While the assistant itself operates locally, it will access real-time data through these external APIs.

Not just that! You can even select which tools (YFinance and/or SerpAPI) you want the assistant to use via checkboxes in the sidebar.
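For a feel of what those tools look like in code, here’s a minimal sketch of the two callables. It assumes the yfinance and serpapi (google-search-results) Python packages are installed and a SERPAPI_API_KEY environment variable is set; the local Llama 3.1 agent wiring from the tutorial is left out.

```python
# Minimal sketch of the two tools the local assistant can call.
# Assumptions: yfinance and serpapi packages installed, SERPAPI_API_KEY set.
# The agent framework that wires these into local Llama 3.1 is omitted here.
import os

import yfinance as yf
from serpapi import GoogleSearch


def get_stock_price(symbol: str) -> str:
    """Return the latest closing price for a ticker symbol via Yahoo Finance."""
    history = yf.Ticker(symbol).history(period="1d")
    return f"{symbol} last close: {history['Close'].iloc[-1]:.2f}"


def web_search(query: str, num_results: int = 3) -> list[str]:
    """Return the top organic result titles for a query via SerpAPI."""
    results = GoogleSearch(
        {"q": query, "num": num_results, "api_key": os.environ["SERPAPI_API_KEY"]}
    ).get_dict()
    return [item["title"] for item in results.get("organic_results", [])[:num_results]]


if __name__ == "__main__":
    print(get_stock_price("NVDA"))
    print(web_search("latest PyTorch release"))
```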

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills, subscribe now and be the first to access our latest tutorials.

🎁 Bonus worth $50 💵

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Latest Developments

The PyTorch team has released torchao, a new library to optimize your PyTorch models. torchao focuses on reducing model size and increasing speed by using low-bit data types, quantization, and sparsity, while maintaining accuracy.

The library is approachable: it is written mostly in plain PyTorch code and offers composable APIs that work on virtually any model, including popular ones like Llama 3 and diffusion models. With torchao, the team achieved a 97% speedup for Llama 3 8B inference with minimal accuracy loss. Similar results were seen with Flux1.dev: a 53% inference speedup with float8 dynamic quantization.
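To make that concrete, here is a rough sketch of the quantize_ API on a toy model. It assumes a CUDA GPU and a recent torchao build, uses int8 weight-only quantization, and the exact entry points may differ across torchao versions.

```python
# Rough sketch of torchao's quantize_ API (int8 weight-only) on a toy model.
# Assumptions: CUDA GPU, recent torchao; entry points may vary by version.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).to(device="cuda", dtype=torch.bfloat16)

# Swap Linear weights to int8 in place, then compile the model; the fused
# compiled kernels are where most of the reported speedups come from.
quantize_(model, int8_weight_only())
model = torch.compile(model, mode="max-autotune")

with torch.inference_mode():
    out = model(torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16))
print(out.shape)
```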

Key Highlights:

  1. Effortless Quantization - Apply weight-only or dynamic activation quantization with various data types and sparse layouts using a simple quantize_ API. An autoquant feature even handles layer-specific optimization.

  2. Boosted Training Efficiency - torchao enables low-precision compute and communication, including float8 for torch.nn.Linear layers, leading to significant speedups in training large models.

  3. Memory-saving Optimizers - Reduce optimizer memory footprint by 2x-4x using 8-bit or 4-bit quantized optimizers, available as drop-in replacements for AdamW.

  4. Seamless Integration - torchao is designed for compatibility with popular tools like Hugging Face Transformers, Diffusers, torchtune, and FSDP2, ensuring smooth integration into existing workflows. Here’s the GitHub repo.

CopilotKit, the popular open-source framework for integrating AI copilots into applications, just got a powerful upgrade. It now introduces CoAgents - a new way to build human-in-the-loop AI agents.

CoAgents allow end-users to monitor and steer AI agents in real-time, ensuring better task completion and allowing human intervention when needed. This new feature integrates with LangGraph Studio, the visual IDE for building AI agents, making it easier to track the agent’s decisions step by step and respond as needed.
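To picture what that shared, steerable state looks like on the agent side, here is an illustrative LangGraph sketch with an explicit typed state object. The field names are invented for the example, and the CopilotKit/CoAgents frontend wiring that would stream and edit this state is not shown.

```python
# Illustrative only: a tiny LangGraph graph with an explicit, typed state.
# The state fields below are made up for the sketch; CoAgents' value is that
# such state can be streamed to, and steered from, the CopilotKit frontend.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class ResearchState(TypedDict):
    query: str
    findings: list[str]
    needs_human_review: bool


def search_node(state: ResearchState) -> ResearchState:
    # A real agent would call tools or an LLM here; we just update the state.
    findings = state["findings"] + [f"stub result for: {state['query']}"]
    return {**state, "findings": findings, "needs_human_review": True}


graph = StateGraph(ResearchState)
graph.add_node("search", search_node)
graph.add_edge(START, "search")
graph.add_edge("search", END)
app = graph.compile()

print(app.invoke({"query": "CoAgents", "findings": [], "needs_human_review": False}))
```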

Key Highlights:

  1. Real-time agent monitoring - CoAgents allow end-users to track AI agent activities as they happen, ensuring full visibility into the agent's decision-making process. They can also bring agents back on track when they drift.

  2. Shared State - Enables seamless collaboration between agents and users by synchronizing data between the application and the agent. This means both are working with the same understanding of the context.

  3. LangGraph integration - CoAgents takes LangGraph Studio a step further by streaming live agent states from different nodes of the workflow. This gives developers and users a real-time view of each step the agent is processing, helping them understand and intervene if necessary.

  4. Flexible agent control - Developers can manually emit agent states or messages through the frontend, offering fine-tuned control over how AI agents interact with the application and its users.

  5. Getting Started - Check out this demo of a Perplexity clone where the agent state is streamed live to the front-end, and see the documentation for setting up agent states, streaming processes, and integrating human-in-the-loop systems.

Quick Bites

OpenAI has finally closed $6.6 billion in its latest funding round at a soaring post-money valuation of $157 billion. The round was led by Thrive Capital, which invested around $1.3 billion. Microsoft invested a little less than $1 billion, while Nvidia pledged $100 million and SoftBank put in $500 million.

OpenAI’s projected loss for this year is $5 billion. Whether this funding goes towards AGI research or just keeps them out of bankruptcy, we can’t know 🤷‍♀️

Google is reportedly building AI models similar to OpenAI’s o1 models. These models will mimic human reasoning to solve complex problems in areas like math and programming, using chain-of-thought prompting.

Text-to-video AI platform Pika Labs has released its latest version, Pika 1.5, with some eye-popping effects that defy the laws of physics. Called “Pikaffects”, these let you transform an object into a bizarre state: explode, melt, crush, or inflate anything. Be wild with your imagination and Pika will do it!

Tools of the Trade

  1. o1-engineer: A coding assistant built from the ground up to leverage o1 reasoning capabilities. It can create and edit multiple files or entire folders, plan complex projects, execute them, and write code reviews, all from your terminal.

  2. JARS.ai: The first AI text-to-show generator. Just give it a simple prompt describing the basic narrative, and it automatically generates characters, casts them into the show, and sets up their world to create engaging parody or TV-inspired content.

  3. GraphRAG-UI: A web interface that simplifies using GraphRAG to manage and query large text corpora with RAG. It supports local LLMs through platforms like Ollama, offering tools for index management, query execution, and more.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue ~
    merve

  2. If you think Yann LeCun is out of touch, you should see Geoff Hinton and Yoshua Bengio. ~
    Pedro Domingos

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

🎁 Bonus worth $50 💵 

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!

Unwind AI - Twitter | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE; your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
