Run 100B LLM on a CPU
PLUS: BabyAGI-2o, OpenAI Swarm in VS Code
Today’s top AI Highlights:
The simplest self-building general autonomous agent that builds and uses new tools as needed
Run models of up to 100B parameters locally, up to 6x faster, without a GPU
Sam Altman’s new eye-scanning Orb will prove you’re a real human
Google AI Studio’s Compare Mode lets you compare different Gemini models side-by-side
Use OpenAI’s multi-agent Swarm framework in VS Code to write production-grade code
& so much more!
Read time: 3 mins
AI Tutorials
Google search is great, but how about building your own AI research assistant that not only searches academic papers but also remembers your preferences based on your past queries? Sounds complex, right?
This tutorial breaks it down step-by-step, guiding you through building an AI agent that queries arXiv, processes results intelligently, and retains user context over time using memory storage.
The app combines several components: GPT-4o-mini for parsing search results, MultiOn for web browsing, and Mem0 with Qdrant to manage user-specific memory. With just 40 lines of code, you’ll have a personalized research assistant that gets smarter with every interaction.
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
🎁 Bonus worth $50 💵
Latest Developments
Here is BabyAGI-2o, a compact, self-building autonomous agent written in just 174 lines of Python. This agent dynamically creates and utilizes tools as needed to execute user-provided tasks. It's a simplified exploration into autonomous agent creation and meant to be integrated with the BabyAGI 2 framework for persistent tool storage. It's surprisingly versatile, tackling tasks like web scraping, image analysis, and image generation, although reliability can vary.
Key Highlights:
Dynamic Tool Generation - The agent builds tools on-the-fly using a single LLM loop. It starts with three core tools: create_or_update_tool, install_dependencies, and task_completed. The create_or_update_tool function is dynamically loaded, enabling the agent to adapt to new task requirements.
Automatic Dependency Management - BabyAGI-2o automatically installs any Python packages required by the dynamically created tools via install_dependencies.
Iterative Error Handling - The agent can process errors encountered during execution, feeding the error information back into the LLM loop to refine the code until it works correctly.
Supported Models - BabyAGI-2o supports various language models with tool-calling capabilities, including GPT-4, Claude-2, and other models supported by LiteLLM so you can experiment with different models.
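To make the "build a tool, run it, feed the error back" loop concrete, here is a minimal sketch. It is not BabyAGI-2o's actual code: the tool registry, call_tool, run_task, and the fake_llm stand-in are all hypothetical names, and no real model is called. Only the create_or_update_tool name comes from the project description above.

```python
import traceback

# Registry of callable tools, keyed by name (hypothetical structure).
TOOLS = {}

def create_or_update_tool(name, code):
    """Execute generated source and register the function called `name`."""
    namespace = {}
    exec(code, namespace)  # runs generated code: only do this in a sandbox
    TOOLS[name] = namespace[name]
    return f"Tool '{name}' registered."

def call_tool(name, **kwargs):
    """Run a tool; on failure return the traceback text instead of raising."""
    try:
        return {"ok": True, "result": TOOLS[name](**kwargs)}
    except Exception:
        return {"ok": False, "error": traceback.format_exc()}

def fake_llm(history):
    """Stand-in for the model: buggy code first, a fix once it has seen an error."""
    saw_error = any(not step["ok"] for step in history)
    if saw_error:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a + c\n"  # deliberate bug: `c` is undefined

def run_task(max_iterations=5):
    """Single loop: (re)build the tool, call it, and stop on the first success."""
    history = []
    for _ in range(max_iterations):
        create_or_update_tool("add", fake_llm(history))
        outcome = call_tool("add", a=2, b=3)
        history.append(outcome)  # errors stay in the history the model sees next
        if outcome["ok"]:
            return outcome["result"], len(history)
    raise RuntimeError("task not completed within the iteration budget")

result, attempts = run_task()
print(result, attempts)
```

In a real agent the fake_llm function would be a tool-calling model request, and the error text in the history is what lets the model rewrite the tool on the next pass.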
Microsoft has just open-sourced bitnet.cpp, a fast inference framework for 1-bit LLMs on CPUs. A 1-bit LLM stores its weights in a heavily compressed format, which reduces storage and memory requirements, reportedly without sacrificing accuracy.
What makes this exciting? Even a 100B-parameter model can now run on a CPU at speeds close to human reading (5-7 tokens per second). NPU and GPU support are on the way, but this release already brings impressive improvements for those who want to explore high-performance AI on local devices.
Key Highlights:
Performance Gains - bitnet.cpp delivers performance gains of up to 5.07x on ARM CPUs and up to 6.17x on x86 CPUs, significantly enhancing the speed of LLM inference.
Energy Efficiency - The framework reduces energy consumption by 55.4% to 70% on ARM CPUs and 71.9% to 82.2% on x86 CPUs, making it a sustainable option for running AI models.
Local Deployment - Models like BitNet b1.58 and Llama3-8B-1.58 can now run locally on CPUs with optimized kernels, making high-performance AI more accessible to developers.
Setup - Installation requires Python, CMake, and Clang, with Conda recommended for managing the environment. Models from Hugging Face can be integrated easily so you can quickly test real-world performance and benchmarks.
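For a feel of why such models fit on a CPU, here is a rough sketch of ternary weight quantization plus a back-of-the-envelope memory estimate. It assumes the absmean scheme described for BitNet b1.58 (scale by the mean absolute weight, round, clip to -1, 0, +1). The function names are made up for illustration, this is not bitnet.cpp's code, and real storage formats may pack weights differently from the idealized 1.58 bits used here.

```python
def quantize_ternary(weights):
    """Map float weights to {-1, 0, +1} with one shared scale (absmean scheme)."""
    # Fall back to 1.0 so an all-zero tensor does not divide by zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

def dequantize(ternary, scale):
    """Approximate reconstruction of the original weights."""
    return [t * scale for t in ternary]

def model_size_gb(num_params, bits_per_weight):
    """Weight storage only, in decimal gigabytes; ignores activations and overhead."""
    return num_params * bits_per_weight / 8 / 1e9

weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
ternary, scale = quantize_ternary(weights)
print(ternary)  # [1, 0, -1, 1, 0, -1]

fp16_gb = model_size_gb(100e9, 16)       # 100B parameters at 16 bits each
ternary_gb = model_size_gb(100e9, 1.58)  # idealized 1.58 bits per weight
print(round(fp16_gb, 2), round(ternary_gb, 2))
```

Under these assumptions a 100B-parameter model drops from about 200 GB of weights at 16-bit precision to roughly 20 GB, which is within reach of ordinary system RAM.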
Quick Bites
Worldcoin, the crypto project co-founded by Sam Altman, has rebranded to "World" and introduced a new iris-scanning Orb that verifies human identity to distinguish people from AI online. The company aims to expand its verified user base by introducing verification at coffee shops and partnering with delivery service Rappi for home verifications.
Anthropic has released another Claude project as part of the Anthropic Quickstart collection. This is a financial data analyst powered by Claude 3 Haiku & Claude 3.5 Sonnet. It lets you upload spreadsheets, documents, or financial charts and get instant insights with visualizations. Deployable in seconds from GitHub, it supports various file formats and generates charts like line, bar, and pie for detailed financial analysis.
Since Mira Murati stepped down from OpenAI, the rumor is that she is planning to raise funding for her own AI startup and is poaching OpenAI employees. Her venture will reportedly focus on building AI products based on proprietary models and could raise more than $100 million in this round.
Google AI Studio’s new Compare Mode lets you prompt Gemini models and evaluate their responses side-by-side within the Studio. It will help you understand the critical tradeoffs involved in model selection, such as cost, latency, token limits, and response quality.
Tools of the Trade
OpenAI Swarm in VS Code: AI code assistant CodeGPT now uses Swarm, OpenAI's new open-source multi-agent framework, to help you write production-grade code within VS Code. It helps you set up, customize, and deploy your projects using the Swarm framework. You can try out this web version here.
Manicode: A CLI tool that lets you generate and edit code directly from your terminal using simple prompts. It can install packages, run scripts, and make context-aware edits across your codebase without clicks or lock-in to any specific IDE.
Virtual Try-On Prototype: A virtual clothing try-on app built using Flask, Twilio's WhatsApp API, and Gradio's virtual try-on model. You can send images via WhatsApp to Twilio to try on garments virtually, and you’ll get the results back on the same channel.
Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.
Hot Takes
Unpopular Opinion: Google will achieve AGI first, and we likely have only a few thousand days left to witness it. ~ Ashutosh Shrivastava
Do not expect OpenAI to release anything important besides the full O1 model before the election.
OpenAI showed regulators GPT-5, and I imagine they agreed to wait until after the election to introduce the next generation. ~ Haider.
Meme of the Day
How I deal with impersonators
— Jason (@mytechceoo)
8:53 PM • Oct 19, 2024
That’s all for today! See you tomorrow with more such AI-filled content.
🎁 Bonus worth $50 💵
Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get an AI resource pack worth $50 for FREE. Valid for a limited time only!
PS: We curate this AI newsletter every day for FREE, and your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉