RAG Pipelines with Visual Embeddings
PLUS: AI observability library by Hugging Face, 1-bit LLMs match full FP16
Today’s top AI Highlights:
Build RAG pipelines with Visual Embeddings: skip OCR, index images directly
Open source SDK to track every AI interaction without extra code
1-bit quantization finally matches full FP16 accuracy
The largest open multilingual dataset, 100% free
Open source LLMOps platform: prompt playground, prompt management, LLM evaluation, and observability all in one place
& so much more!
Read time: 3 mins
AI Tutorials
AI tools are transforming how entrepreneurs identify trends and make decisions, but building a scalable solution to analyze startup opportunities often means integrating multiple data sources and processing them quickly. With advanced LLMs equipped with the right tools, this process can be automated to deliver actionable insights.
In this tutorial, we’ll guide you through building an AI Startup Trend Analysis Agent. This AI agent will analyze startup news, identify emerging trends, and validate ideas. It’ll integrate Newspaper4k and DuckDuckGo with Claude 3.5 Sonnet in fewer than 50 lines of Python code.
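As a rough sketch of how such an agent's loop fits together: search for news, extract article text, then have the model analyze it. Every function body below is a hypothetical stub, not the tutorial's actual Newspaper4k, DuckDuckGo, or Claude calls:

```python
# Conceptual pipeline only — each stub stands in for a real tool call.

def search_news(topic):
    # Stand-in for a DuckDuckGo search returning article URLs.
    return [f"https://example.com/{topic}-story-{i}" for i in range(3)]

def extract_article(url):
    # Stand-in for Newspaper4k downloading and parsing the page.
    return f"Full text scraped from {url}"

def analyze(articles):
    # Stand-in for a Claude 3.5 Sonnet request that summarizes trends.
    return {"trend": "placeholder", "evidence": len(articles)}

def analyze_startup_trends(topic):
    urls = search_news(topic)
    articles = [extract_article(u) for u in urls]
    return analyze(articles)

report = analyze_startup_trends("ai-agents")
print(report["evidence"])  # number of articles considered
```

The real agent framework handles tool routing and prompting for you; this only shows the shape of the data flow.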
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.
Latest Developments
ColiVara takes a fresh perspective on Retrieval Augmented Generation by moving away from text-based processing to visual embeddings. Instead of struggling with text extraction from complex documents, the framework uses vision models to understand and process your content - meaning your tables, charts, and intricate layouts stay intact and meaningful. Whether you're working with PDFs, Word documents, or PowerPoint files, ColiVara handles them all through a clean Python SDK that lets you implement RAG in just a few lines of code.
Key Highlights:
Document Processing Pipeline - Directly process documents using vision models instead of complex text extraction pipelines. Handle PDFs, presentations, and web pages through a single unified API that preserves tables, charts, and layouts. The system automatically takes screenshots of web pages and processes them, eliminating OCR or text parsing.
Production-Ready Performance - Built on the ColPali paper and using ColQwen2 for embeddings, the system achieves up to 87.6% accuracy on benchmark tests. Each document processes in about 7 seconds per page, with built-in support for asynchronous processing. The API includes comprehensive error handling and connection recovery mechanisms.
Developer-First Implementation - Get started with just 3 lines of code using the Python SDK. Built-in filtering capabilities let you search by collection name, metadata, and document properties. Supports both synchronous and asynchronous document processing with automatic batch handling for large document sets.
Flexible Integration Options - Deploy using the cloud API with a simple API key, or self-host using Docker for complete control. Includes built-in support for S3-compatible storage and PostgreSQL with pgvector, making it adaptable to existing infrastructure.
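The retrieval step behind this approach can be illustrated with a toy late-interaction (MaxSim) scorer, the mechanism the ColPali family of models uses: each query-token vector is matched to its most similar page-patch vector, and the maxima are summed. The vectors below are made-up stand-ins for real ColQwen2 embeddings, not actual SDK output:

```python
# Toy MaxSim scoring over multi-vector page embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, page_vecs):
    # For each query-token vector, take its best match among the
    # page's patch vectors, then sum those maxima.
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]             # two query-token embeddings
pages = {
    "table-page": [[0.9, 0.1], [0.2, 0.8]],  # patch embeddings per page
    "text-page":  [[0.3, 0.3], [0.1, 0.2]],
}

best = max(pages, key=lambda name: maxsim(query, pages[name]))
print(best)  # table-page
```

Because pages are embedded as images, tables and charts contribute to these patch vectors directly, with no OCR step in between.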
Observers is a Python SDK that lets you track and analyze your AI model interactions through multiple storage backends like Hugging Face datasets, DuckDB, and Argilla. The library provides a lightweight wrapper for OpenAI-compatible LLM providers, making it easy to monitor model behavior, store interaction data, and perform detailed analysis.
With minimal configuration, you can start recording AI interactions, run SQL queries to analyze patterns, and use familiar tools like the DuckDB CLI or Hugging Face's Dataset Viewer to explore your data. Beyond basic logging, Observers includes integrations with document intelligence platforms and supports multiple LLM routers like AISuite and LiteLLM.
Key Highlights:
Minimal Code - Wrap any OpenAI-compatible LLM provider using a single function call. The SDK works with existing code and requires no architecture changes - just install with pip and add two lines to start tracking interactions. Supports major providers through AISuite and LiteLLM integrations.
Storage Options with Query Capabilities - Store interaction data in Hugging Face datasets for cloud storage with UI-based filtering, DuckDB for local SQL querying, or Argilla for annotation workflows. Each backend maintains complete interaction records including messages, timestamps, and raw responses.
Tracking Without Performance Impact - Records full interaction context including model responses, timestamps, and error states while maintaining production performance. The lightweight design adds minimal overhead to API calls, making it suitable for both development and production environments.
Extended Support - Integrates with document intelligence platforms through Docling, enabling tracking of document-based AI interactions. Install the Docling extension to monitor how your models process and respond to document inputs.
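To make the wrapping idea concrete, here is a hand-rolled sketch of the record-every-interaction pattern: the wrapper forwards each call to the underlying client and appends a full record (messages, timestamp, response or error) to a log. This is illustrative only, not the real Observers API:

```python
import time

class RecordingClient:
    """Wraps any chat client and logs every interaction."""

    def __init__(self, client, log):
        self._client = client
        self.log = log

    def chat(self, messages, model="fake-model"):
        record = {"model": model, "messages": messages,
                  "timestamp": time.time()}
        try:
            response = self._client.chat(messages, model=model)
            record["response"] = response
        except Exception as exc:
            record["error"] = repr(exc)  # error states are tracked too
            raise
        finally:
            self.log.append(record)      # record survives even on failure
        return response

class FakeLLM:
    """Stand-in for an OpenAI-compatible provider."""
    def chat(self, messages, model):
        return "ok: " + messages[-1]["content"]

log = []
client = RecordingClient(FakeLLM(), log)
print(client.chat([{"role": "user", "content": "hi"}]))  # ok: hi
print(len(log))  # 1
```

Observers does the same thing behind a one-call wrapper, and swaps the in-memory list for a real backend (Hugging Face dataset, DuckDB, or Argilla).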
Quick Bites
Here’s the first 10B parameter LLM trained across the globe, on up to 14 concurrent nodes distributed across 3 continents, using 112 H100 GPUs simultaneously with open compute. Built on the Llama 3 architecture, it was trained using distributed low-communication training with innovations like ElasticDeviceMesh and a custom int8 all-reduce. This achievement proves that large-scale model training can be successful despite bandwidth constraints and node volatility. The model, code, and datasets are available on Hugging Face.
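The custom int8 all-reduce cuts communication cost by shrinking each gradient value to one byte before transmission. A toy, single-process sketch of that quantize-transmit-dequantize idea (not Prime Intellect's actual implementation, which also handles error feedback and real network transport):

```python
# Simulate two nodes compressing gradients to int8 before an all-reduce.

def quantize_int8(values):
    # One shared scale maps the largest magnitude onto the int8 range.
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

grads_node_a = [0.5, -1.0, 0.25]
grads_node_b = [0.3, 0.2, -0.25]

# Each node quantizes and "transmits" its int8 payload; the reduced
# sum is then computed from the dequantized values.
qa, sa = quantize_int8(grads_node_a)
qb, sb = quantize_int8(grads_node_b)
reduced = [x + y for x, y in zip(dequantize(qa, sa), dequantize(qb, sb))]
print([round(g, 2) for g in reduced])
```

The payload shrinks 4x versus FP32 at the cost of small rounding error, which is why such schemes tolerate the limited bandwidth between continents.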
1-bit quantization usually forces us to choose between model size and performance. Here’s Bi-Mamba, a scalable 1-bit Mamba architecture that lets you deploy LLMs across sizes (780M, 1.3B, and 2.7B parameters) while achieving accuracy comparable to full-precision models (e.g., FP16 or BF16) and significantly reducing memory footprint. This might be an interesting area to watch in the next year!
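The core idea of 1-bit weights can be sketched in a few lines: keep only the sign of each weight, plus one shared scale that minimizes reconstruction error. The mean absolute value is a common choice for that scale in binarization schemes; Bi-Mamba's exact recipe may differ, so treat this as the general technique, not the paper's method:

```python
# Binarize a weight vector to signs plus one shared scale.

def binarize(weights):
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha

def reconstruct(signs, alpha):
    # Each weight is approximated as alpha * sign(w).
    return [s * alpha for s in signs]

w = [0.7, -0.3, 0.5, -0.9]
signs, alpha = binarize(w)
print(signs)            # [1, -1, 1, -1]
print(round(alpha, 2))  # 0.6
print([round(x, 2) for x in reconstruct(signs, alpha)])
```

Storage drops from 16 bits to roughly 1 bit per weight, which is where the memory savings come from.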
Elon Musk’s xAI could soon make its next move to compete with OpenAI: launching a standalone app for its Grok chatbot this December. It is currently accessible only via X.
Pleias has just released Common Corpus, a 2-trillion-token corpus that is the largest open dataset for multilingual LLMs, completely free on Hugging Face. This dataset isn't just big; it's diverse, spanning code, legal documents, books, and more, with extensive curation and detailed provenance. You can get it here.
Tools of the Trade
Agenta: Open source platform for building production-grade LLM apps. It offers a prompt playground, evaluation tools, prompt management, and observability features to accelerate the LLM app development lifecycle.
SQLite-Vec: A lightweight, portable vector search extension for SQLite that stores embeddings and searches them by similarity. It’s great for on-device RAG, letting systems run local vector searches entirely within a SQLite database.
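To see what such an extension saves you, here is a stdlib-only sketch of the underlying idea: vectors stored as blobs in an ordinary SQLite table, ranked by brute-force cosine similarity in Python. sqlite-vec performs this kind of search inside the SQL engine itself, so treat this only as an illustration of the concept:

```python
import sqlite3, struct, math

def to_blob(vec):
    # Pack floats into a compact binary blob for storage.
    return struct.pack(f"{len(vec)}f", *vec)

def from_blob(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id TEXT, embedding BLOB)")
db.executemany("INSERT INTO chunks VALUES (?, ?)", [
    ("intro", to_blob([1.0, 0.0])),
    ("usage", to_blob([0.0, 1.0])),
])

query = [0.9, 0.1]
rows = db.execute("SELECT id, embedding FROM chunks").fetchall()
best = max(rows, key=lambda r: cosine(query, from_blob(r[1])))
print(best[0])  # intro
```

Everything lives in one SQLite file, which is exactly the property that makes this pattern attractive for on-device RAG.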
GitDigest: Transforms any GitHub repository into a text digest optimized for LLMs. The web interface lets you input a GitHub URL and receive formatted output containing file structure, token statistics, and a repository summary, all tailored for LLM prompts.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
perplexity is garbage software and im tired of pretending its not ~ Tom Lynch
random guess: OpenAI is not building a browser for humans; it's building one for AI's
to fully solve web agents, it likely helps to re-engineer the browser from the ground up ~ James Campbell
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉