
Build Multimodal RAG Apps in Minutes

PLUS: A single Llama model running across an NVIDIA RTX 4090, an AMD 6800XT, and a Google TPU

  • Explore future trends, infrastructure & scalability, and more

  • Learn from the best in Model Development and Performance Optimization

  • Get inspired by real-world case studies and success stories

Don’t miss out - use code SHUBHAM15 for 15% off your ticket! 

See you in Austin, TX, November 7-8, 2024

Today’s top AI Highlights:

  1. New open-source high-performance AI inference stack built with the Zig language

  2. Build a multimodal RAG pipeline in minutes with LlamaCloud

  3. Mistral AI launched a free API tier and reduced prices by up to 80%

  4. AI-powered end-to-end software testing without writing a single line of code

& so much more!

Read time: 3 mins

Latest Developments

ZML, a high-performance AI inference stack for running LLMs across diverse hardware setups, has emerged from stealth. It is fully open-source and makes it easier for you to scale AI models across hardware setups without compromising performance or flexibility. Built with the Zig programming language, it lets you deploy and run AI models on a variety of hardware, including CPUs, GPUs, and TPUs. It can handle a range of models and shard them across multiple devices, even across different geographical locations.

Key Highlights:

  1. Models You Can Run - ZML supports a range of models from simple tasks like handwritten digit recognition (MNIST) to more complex models like Llama and OpenLlama for NLP tasks. It’s specially optimized for scaling across hardware setups.

  2. Why Zig? - ZML uses Zig for its low-level memory control and compile-time optimizations, offering speed gains over Python or other frameworks. If performance is critical, Zig’s efficiency ensures faster execution of models without sacrificing flexibility.

  3. Learning Curve - If you’re coming from a C++ or Python background, expect a moderate learning curve. Zig’s syntax is more straightforward than C++ but offers similar low-level control. If speed and optimization matter to your projects, ZML can be worth the investment.

  4. Distributed Performance - ZML is capable of running models across geographically distributed hardware setups. For example, it has been tested with a Llama 2 model sharded across an NVIDIA RTX 4090, AMD 6800XT, and Google Cloud TPU v2.

  5. Getting Started - The ZML GitHub repository has detailed setup instructions. You’ll need to install Bazel, clone the ZML repo, and follow the guides to run pre-built models like MNIST or Llama. The examples folder provides everything you need to get up and running quickly.
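The setup steps above can be sketched roughly as follows. Note this is a sketch under assumptions, not verified commands: the Bazel target name is a guess, so check the repository’s README for the exact invocation.

```shell
# Sketch only: the Bazel target below is an assumption -- check the
# ZML repository's README for the exact commands.
set -e

# ZML builds with Bazel; install bazelisk if you don't have it.
if ! command -v bazel >/dev/null 2>&1; then
  echo "Bazel not found; install bazelisk first."
  exit 0
fi

# Clone the repository.
git clone https://github.com/zml/zml.git || { echo "Clone failed."; exit 0; }
cd zml

# Run a pre-built example (target name assumed; see the examples folder).
bazel run //examples/mnist
```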

LlamaCloud now offers multimodal RAG capabilities that let developers build end-to-end multimodal RAG pipelines within minutes. This new feature lets you seamlessly integrate and process various document types, including marketing decks, legal contracts, and financial reports, incorporating both text and image data. By simply toggling a setting, LlamaCloud indexes each page as both text and image chunks, which you can build on further.

Key Highlights:

  1. Faster Development - When creating a new index in LlamaCloud, simply toggle the "Multi-Modal Indexing" option. LlamaCloud will automatically extract and index both text and images from your documents.

  2. Integration with Multimodal LLMs - LlamaCloud provides built-in support for multimodal LLMs like Mistral AI’s Pixtral, Claude 3.5 Sonnet, and GPT-4o.

  3. Enhanced Contextual Understanding - Utilize both textual and visual information to generate more accurate and insightful summaries of complex documents.

  4. Customizable Query Engine - Build a custom query engine tailored to your specific use case, leveraging LlamaCloud's framework for handling multimodal data. This new feature is available to all users. Check out their reference notebook to start building your next-gen RAG applications today.
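As a rough sketch of what querying such an index can look like with the llama_index client: the index name below is hypothetical, and LlamaCloudIndex assumes a LLAMA_CLOUD_API_KEY in your environment plus an index already created with "Multi-Modal Indexing" enabled.

```python
import os

# Hypothetical index name -- replace with an index you created in
# LlamaCloud with the "Multi-Modal Indexing" toggle enabled.
INDEX_NAME = "multimodal-docs"

def build_query_engine(index_name: str):
    # LlamaCloudIndex is llama_index's client for managed indexes;
    # it reads LLAMA_CLOUD_API_KEY from the environment.
    from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

    index = LlamaCloudIndex(name=index_name)
    # The query engine retrieves the text and image chunks LlamaCloud
    # indexed for each page and passes them to the configured LLM.
    return index.as_query_engine()

if os.environ.get("LLAMA_CLOUD_API_KEY"):
    engine = build_query_engine(INDEX_NAME)
    print(engine.query("Summarize the key figures in the financial report."))
else:
    print("Set LLAMA_CLOUD_API_KEY to run this sketch.")
```

The same index can also back a fully custom query engine, as the reference notebook shows.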

Quick Bites

Mistral AI announced new updates, including a free API, price cuts for all models, a new version of Mistral Small, and more:

  • Mistral’s La Plateforme now offers a free tier for developers to prototype and experiment with Mistral’s models

  • They have reduced the API prices by 50% for Mistral Nemo, by 80% for Mistral Small and Codestral, and by 33% for Mistral Large

  • Their enterprise-grade small model Mistral Small has been upgraded with improvements in human alignment, reasoning capabilities, and code

  • The latest multimodal AI model Pixtral 12B is now freely available on le Chat.
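A minimal sketch of trying the free tier with the mistralai Python SDK; the prompt is illustrative, and it assumes a MISTRAL_API_KEY from La Plateforme and `pip install mistralai`.

```python
import os

def ask_mistral(prompt: str) -> str:
    # Requires `pip install mistralai` (the v1 SDK) and an API key
    # from La Plateforme's free tier.
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    resp = client.chat.complete(
        model="mistral-small-latest",  # alias for the upgraded Mistral Small
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if os.environ.get("MISTRAL_API_KEY"):
    print(ask_mistral("Explain RAG in one sentence."))
else:
    print("Set MISTRAL_API_KEY to run this sketch.")
```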

Perplexity has added a new Reasoning focus (beta) for Pro users, powered by OpenAI's new o1-mini. There is no search integration yet, the model is slow, and usage is capped by rate limits.

Elon Musk’s Neuralink has received the FDA's "breakthrough device" designation for its Blindsight implant, which aims to restore vision even for those who have lost both eyes and the optic nerve. Musk says that initially, the vision will be blurry but it has the potential to be better than natural vision.

OpenAI CEO Sam Altman has left the Safety and Security Committee, which will now operate as an independent board committee led by Zico Kolter from Carnegie Mellon. The committee will oversee safety reviews of models, including OpenAI's latest release.

Tools of the Trade

  1. TestSprite: AI-powered end-to-end software testing where AI generates test plans, writes test code, runs it in the cloud, debugs failures, and creates detailed reports. It speeds up testing for mobile, web, and SDK apps.

  2. SocialAI: AI-powered social network where you post updates and receive endless AI-generated responses from virtual followers. It's for personal reflection, journaling, and emotional support, with no real users - just AI interactions tailored to your thoughts.

  3. Expand AI: Turns any website into an API so you can scrape structured data easily. It generates type-safe schemas, ensures high data quality, and scales for large datasets.

  4. Awesome LLM Apps: Build awesome LLM apps using RAG to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos through simple text. These apps will let you retrieve information, engage in chat, and extract insights directly from content on these platforms.

Hot Takes

  1. We are absolutely at a place where, if AI development completely stopped, we would still have 5-10 years of rapid change absorbing the capabilities of current models and integrating them into organizations and social systems.
    I don't think development is going to stop, though. ~
    Ethan Mollick

  2. Vertical llm agents is the new vertical saas - the most straightforward way to generate $1B company ideas. ~
    Jared Friedman

Meme of the Day

That’s all for today! See you tomorrow with more such AI-filled content.

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!

Unwind AI - Twitter | LinkedIn | Instagram | Facebook

PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one (or 20) of your friends!
