Run Llama on your Old Windows PC
PLUS: Multimodal AI agents, blazing-fast graph database for RAG
Today’s top AI Highlights:
Build multimodal AI agents that can talk, hear, and process in real-time
Graph database for blazing-fast RAG with large-scale data
Running Llama 2 model on Windows 98 Pentium II machine
Generate production-ready Next.js apps with this tool
& so much more!
Read time: 3 mins
AI Tutorials
Ever had your RAG system confidently give completely irrelevant information? Or watched it stubbornly stick to outdated data when better sources were just a web search away? You're not alone. Traditional RAG systems, while powerful, often act like that one friend who never admits when they need to double-check their facts.
In this tutorial, we'll fix that by building a Corrective RAG Agent that implements a multi-stage workflow with document retrieval, relevance assessment, and web search. Using LangGraph's workflow capabilities, we'll create a system that can evaluate its responses, adapt on the fly, and even reach out to the web when its local knowledge falls short. Think of it as RAG with a built-in fact-checker and research assistant.
We'll combine the analytical prowess of Claude 3.5 Sonnet with LangGraph's flexible workflow engine. By the end of this tutorial, you'll have a RAG system that's not just smarter but also more honest about what it knows (and doesn't know).
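Before you dive into the full build, here's a minimal sketch of that corrective loop using LangGraph's StateGraph API. The helper functions (search_vector_store, is_relevant, run_web_search, answer_with_llm) are hypothetical placeholders standing in for your vector store, the Claude-based relevance grader, and a web search tool.

```python
# Sketch of a corrective RAG workflow with LangGraph.
# The four helpers below are placeholders; the tutorial wires up the real
# retriever, Claude 3.5 Sonnet grader, and web search tool.
from typing import List, TypedDict
from langgraph.graph import StateGraph, END

def search_vector_store(question: str) -> List[str]:
    return []  # placeholder: query your vector store

def is_relevant(question: str, doc: str) -> bool:
    return True  # placeholder: ask the LLM to grade relevance

def run_web_search(question: str) -> List[str]:
    return []  # placeholder: call a web search tool

def answer_with_llm(question: str, docs: List[str]) -> str:
    return ""  # placeholder: generate the final answer

class CRAGState(TypedDict):
    question: str
    documents: List[str]
    needs_web_search: bool
    answer: str

def retrieve(state: CRAGState) -> dict:
    return {"documents": search_vector_store(state["question"])}

def grade_documents(state: CRAGState) -> dict:
    relevant = [d for d in state["documents"] if is_relevant(state["question"], d)]
    # If nothing relevant survives grading, fall back to web search.
    return {"documents": relevant, "needs_web_search": len(relevant) == 0}

def web_search(state: CRAGState) -> dict:
    return {"documents": state["documents"] + run_web_search(state["question"])}

def generate(state: CRAGState) -> dict:
    return {"answer": answer_with_llm(state["question"], state["documents"])}

workflow = StateGraph(CRAGState)
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("web_search", web_search)
workflow.add_node("generate", generate)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    lambda s: "web_search" if s["needs_web_search"] else "generate",
    {"web_search": "web_search", "generate": "generate"},
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("generate", END)
app = workflow.compile()

result = app.invoke({"question": "What is corrective RAG?", "documents": [],
                     "needs_web_search": False, "answer": ""})
```

The conditional edge after grading is what makes the system "corrective": it only reaches for the web when local retrieval comes up short.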
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
LiveKit Agents is a framework for building interactive AI applications capable of real-time voice, video, and text interactions, all managed through WebRTC. This framework isn't just another API; it lets you create stateful, long-running AI agents that can handle everything from voice conversations to real-time video analysis.
Built for programmers who want to avoid complexity without sacrificing functionality, LiveKit Agents manages the WebRTC transport and media handling while letting you focus on building the agent's core logic. You can develop locally with the included worker service and seamlessly deploy to production without changing code.
Key Highlights:
Familiar Development Tools - LiveKit Agents allows you to build multimodal AI experiences using Python or Node.js, so you don't have to learn a new language. The framework handles the complexities of WebRTC transport, media devices, and encoding, so you can focus only on the agent’s logic.
Flexible Plugin System - It offers a wide selection of pre-built plugins for major providers like OpenAI, Deepgram, Google, and ElevenLabs, making integration simple. You can also create custom plugins for your own preferred AI models or providers if needed.
Manage Agent State - Agents are designed to be stateful and maintain context across user interactions. They handle everything from managing the conversation state, to buffering responses from the model, and sending them to the user in real-time.
Transcriptions and Function Calling - You can send synchronized transcriptions to your frontend, and implement complex workflows with integrated LLM function calls for accessing external tools or triggering actions within your app.
Testing Environment - Start building and testing your agents instantly with the included Agents Playground. Run your agent locally during development, then deploy to production without changing code or managing complex infrastructure transitions.
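To get a feel for the API, here's a minimal voice-agent sketch in Python, assuming the livekit-agents package and its Silero, Deepgram, and OpenAI plugins; exact class and plugin names may vary between versions.

```python
# Minimal voice-agent sketch with livekit-agents (API surface may differ by version).
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

async def entrypoint(ctx: JobContext):
    # Join the room; WebRTC transport and media handling are managed by the framework.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),                # voice activity detection
        stt=deepgram.STT(),                   # speech-to-text plugin
        llm=openai.LLM(model="gpt-4o-mini"),  # any supported LLM plugin
        tts=openai.TTS(),                     # text-to-speech plugin
        chat_ctx=llm.ChatContext().append(
            role="system", text="You are a friendly voice assistant."
        ),
    )
    agent.start(ctx.room)
    await agent.say("Hey, how can I help you today?")

if __name__ == "__main__":
    # The same worker runs locally for testing and in production.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

The same entrypoint runs unchanged whether you launch the worker locally against the Agents Playground or deploy it to production.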
FalkorDB is a graph database built to handle massive-scale knowledge graphs with blazing-fast performance. Its architecture goes beyond simple nearest-neighbor search, incorporating sparse linear algebra under the hood to deliver a system that is both fast and accurate.
It integrates seamlessly with leading LLM frameworks while maintaining sub-millisecond query response times, even when handling complex graph operations. The included GraphRAG SDK allows developers to rapidly build production-ready RAG applications with direct access to the knowledge graph's relationship data.
Key Highlights:
Low-Latency Operations - FalkorDB's implementation of GraphBLAS sparse matrix operations lets it process complex graph queries with sub-millisecond latency. The architecture eliminates the overhead of heavy dependencies while maintaining high throughput, letting you build responsive applications that scale with your data.
Production-Ready Features - Everything you need for production deployment comes ready out of the box - access control, data persistence, replication, monitoring, and cluster support. The database includes comprehensive logging tools that track queries, performance metrics, and system health.
RAG Tools - The GraphRAG SDK handles the heavy lifting of building RAG applications by managing ontology creation, knowledge graph construction, and LLM integration. It supports major providers like OpenAI, Google, and Azure, while letting you customize the pipeline for your specific use case.
Integration Options - Built-in support for both RESP and Bolt protocols means you can connect using Redis clients or Neo4j-compatible tools. The database exposes a REST API and offers client libraries for Python, JavaScript, Java, Go and other major languages, making it easy to integrate with your existing stack.
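Because FalkorDB speaks RESP, connecting from Python feels a lot like connecting to Redis. Here's a minimal sketch, assuming the falkordb client package; the graph name and Cypher statements are purely illustrative.

```python
# Small sketch of querying FalkorDB from Python (falkordb client package assumed).
from falkordb import FalkorDB

# FalkorDB speaks RESP, so it connects like a Redis instance.
db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("knowledge")

# Build a tiny graph and traverse a relationship with Cypher.
g.query("CREATE (:Person {name:'Ada'})-[:WROTE]->(:Paper {title:'Notes'})")
result = g.query("MATCH (p:Person)-[:WROTE]->(d:Paper) RETURN p.name, d.title")
for row in result.result_set:
    print(row)
```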
Quick Bites
Frontier AI doesn't have to run in a datacenter. EXO Labs managed to run Llama 2 on a Windows 98 Pentium II (a 25-year-old machine) using a custom C implementation. Their open-source project llama98.c achieved inference speeds of:
39 tokens/second with a 260K parameter model
1 token/second with a 15M parameter model.
This shows that AI models can run on legacy hardware with some creative engineering, using era-appropriate tools like Borland C++ 5.02 and FTP transfers.
aiXplain has developed a new framework that enables AI agent systems to autonomously optimize themselves through iterative refinement and feedback loops. The system employs specialized agents for refinement, execution, evaluation, modification, and documentation, using Llama 3.2-3B to analyze outputs and generate improvements without human intervention.
Case studies across multiple domains, including market research, AI architecture, and career transitions, demonstrated significant performance gains. Complete code and evaluation data are available here.
Tools of the Trade
Gemini Coder: Use Google's Gemini API to generate production-ready Next.js and Tailwind applications from simple text prompts. The generated code can be previewed in-browser, customized, and downloaded as complete web applications.
SRE Buddy: AI tool for SRE and DevOps teams to aggregate alerts and events from sources like Datadog and AWS Health, and provide a daily report on ongoing issues. Uses RAG and graph-based search.
LOTUS: A semantic query engine that processes structured and unstructured data using natural language expressions, offering Pandas-like operators such as semantic joins, filters, and aggregations. It provides a simple programming interface where users can specify operations in plain English; see the quick sketch after this list.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
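Here's the quick LOTUS sketch referenced above, assuming the lotus-ai package; the model name and the example DataFrame are illustrative only.

```python
# Sketch of LOTUS's Pandas-style semantic operators (lotus-ai package assumed).
import pandas as pd
import lotus
from lotus.models import LM

# Configure the LLM backend LOTUS will use for semantic operators.
lotus.settings.configure(lm=LM(model="gpt-4o-mini"))

courses = pd.DataFrame({"Course Name": [
    "Probability and Random Processes",
    "Cooking Fundamentals",
    "Optimization Methods in Engineering",
]})

# Semantic filter: rows are kept when the natural-language predicate holds.
math_heavy = courses.sem_filter("{Course Name} requires a lot of math")
print(math_heavy)
```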
Hot Takes
One of the best points that someone made to me once was this:
"Humans generalize on far less data than AI currently does. That means there's something our brains are doing algorithmically to do far more with far less data. Until we figure out that paradigm, we are no where near the top of what we can do with AI."
I'm sort of paraphrasing, but it was such a poignant observation that I still think about it regularly. ~
David Shapiro
Hot take: once OAI and Salesforce lose their political patronage, San Francisco is finished for good. ~
Bojan Tunguz
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉