Open-Source On-Prem RAG
PLUS: Local LLM + web search, Serverless platform to build AI agents
Today’s top AI Highlights:
Build on-premise RAG - Run LLMs, embeddings, and reranking on your hardware
Develop and deploy modular AI agents and fully featured AI apps
A new neural network combines the speed of RNNs with the performance of transformers
Local LLMs can now search the web without compromising data privacy
Compare LLM inference APIs with performance/cost/uptime metrics
& so much more!
Read time: 3 mins
AI Tutorials
Data analysis often requires complex SQL queries and deep technical knowledge, creating a barrier for many who need quick insights from their data. What if we could make data analysis as simple as having a conversation?
In this tutorial, we'll build an AI Data Analysis Agent that lets users analyze CSV and Excel files using natural language queries. Powered by GPT-4o and DuckDB, this tool translates plain English questions into SQL queries, making data analysis accessible to everyone – no SQL expertise required.
We're using Phidata, a framework specifically designed for building and orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.
We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you’re serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.
Latest Developments
asterai offers a serverless platform to build and deploy AI applications through a modular plugin system. The platform combines LLMs, semantic search, and vector databases with custom logic that you define through plugins written in AssemblyScript or Go.
What makes asterai useful is its ability to handle both structured data and natural language - your AI agents can process user queries and trigger programmatic actions on your backend while returning formatted data to enhance your UI. The platform takes care of infrastructure management, providing a single API key that covers your entire AI stack from LLM providers to vector databases.
Key Highlights:
Plugin-Based Development - asterai’s plugin system lets you build modular AI functionalities that can be quickly added to your applications, enabling fast development cycles. You define plugin behavior in a Protobuf manifest, implement the logic in AssemblyScript or Go, and deploy directly from your terminal. This makes it easier to wire custom logic and existing APIs into LLM-powered apps.
Managed AI Infrastructure - Skip managing multiple provider keys and infrastructure setups. asterai handles the AI stack including LLMs, vector databases, and knowledge bases. The platform offers 99.99% uptime with managed scaling and a predictable serverless pricing model.
Data Handling & Output - Plugins can output both natural language and structured data to build apps with both AI responses and interactive UIs. Using structured data, front-end widgets can be displayed, which can trigger front-end function calls along with an LLM's response. This flexibility allows for richer and more dynamic experiences than plain text-based chatbots.
Developer-Friendly Integration - Query agents through REST APIs or client SDKs (currently JavaScript/TypeScript), with support for both streaming and full responses. The platform provides TypeScript types for type safety and detailed documentation for common patterns like chatbots and AI search; a rough sketch of a query call follows below.
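To show the querying model concretely, here is a rough Python sketch. The endpoint URL, auth header, and response fields are illustrative assumptions, not asterai’s documented API (the official SDKs are JavaScript/TypeScript), so consult the docs for the real contract.

```python
# Hypothetical REST query to a deployed asterai app; the URL, header,
# and JSON fields are placeholders, not asterai's documented API.
import requests

API_KEY = "your-asterai-key"  # the single key covering your AI stack
APP_ID = "your-app-id"

resp = requests.post(
    f"https://api.asterai.io/app/{APP_ID}/query",  # assumed endpoint shape
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": "Track my order #123"},
    timeout=30,
)
resp.raise_for_status()
body = resp.json()

# A plugin can return both a natural-language answer and structured
# data that your front end can render as widgets.
print(body.get("answer"))
print(body.get("data"))
```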
Minima is an open-source RAG tool for building on-premises conversational AI systems with configurable container setups. It can integrate with external services like ChatGPT and Claude, or run completely locally, putting data privacy front and center.
Minima's flexibility spans from handling different document types (.pdf, .xls, .docx, etc.) to a configurable indexing and query system. What makes it even more appealing is the ability to run all the core components, including LLMs, rerankers, and embeddings, on your own hardware, giving you complete control over your data.
Key Highlights:
Deployment Options - Minima provides three distinct modes: a fully isolated on-premises setup using Ollama, integration with ChatGPT through custom GPTs, or integration with Anthropic Claude via the Model Context Protocol (MCP). You can choose the mode that best fits your infrastructure and needs; the fully isolated mode is ideal when nothing can leave your own hardware.
Customizable Configuration - Minima is configured via a straightforward .env file. You can easily tweak parameters such as embedding models (Sentence Transformers), embedding sizes, and reranking models (BAAI). This level of customization allows for optimization based on your specific use case and the resources you have available.
Containerized Architecture - Everything runs within Docker containers, streamlining deployment and management. The tool includes pre-built docker-compose files for each mode, simplifying initial setup: choose the appropriate docker-compose file and launch it with your .env applied, as sketched after this list.
Local Usage - For a fully local setup, Minima provides an Electron app that can be launched using npm. This lets you quickly test and integrate RAG functionality in a convenient local environment. Even when used with ChatGPT or Claude, indexing still happens locally, keeping document processing on your own hardware.
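To make the setup concrete, here is a hedged sketch of the fully local flow. The variable names and compose file name below are assumptions inferred from the description above, not Minima’s documented configuration, so check the project’s README for the exact keys your version expects.

```
# Illustrative .env (key names are assumptions; see Minima's README)
LOCAL_FILES_PATH=/path/to/your/docs
EMBEDDING_MODEL_ID=sentence-transformers/all-mpnet-base-v2
EMBEDDING_SIZE=768
RERANKER_MODEL=BAAI/bge-reranker-base
```

Then pick the docker-compose file for your mode and launch it with the .env applied:

```
# Compose file name is illustrative; use the one shipped for your mode
docker compose -f docker-compose-ollama.yml --env-file .env up --build
```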
Quick Bites
Here’s an exciting one! The Linux Foundation AI project RWKV has released version 7 ("Goose"), a novel language model architecture that combines RNN and transformer characteristics. This attention-free model achieves transformer-level performance while offering linear computational scaling, constant memory usage, and effectively unlimited context length. Now shipping in Windows & Office, RWKV’s hybrid RNN-transformer design offers an efficient way to deploy LLMs.
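As a rough intuition for the linear-scaling claim: self-attention compares each new token against all previous ones, so total work grows quadratically and memory grows with context, while an RNN-style model folds the past into a fixed-size state, giving linear total work and constant memory. The toy sketch below illustrates that recurrent pattern only; it is not RWKV-7’s actual update rule.

```python
import numpy as np

d = 64                  # hidden size
state = np.zeros(d)     # fixed-size state: memory does not grow with context

def step(state, token_embedding, decay=0.9):
    # Toy recurrent update (illustrative only, not RWKV-7's real rule):
    # the entire past is folded into `state`, so each new token costs
    # O(d) work no matter how long the context already is.
    return decay * state + (1 - decay) * token_embedding

rng = np.random.default_rng(0)
for tok in rng.normal(size=(100_000, d)):   # 100k-token "context"
    state = step(state, tok)                # linear time, constant memory
```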
Phind, an answer engine for programming questions, has been upgraded to version 2. Where it previously ran a web search for every input, Phind v2 is now essentially a pair-programming agent that knows when to browse the web, ask clarifying questions, and call itself recursively. The answer engine now defaults to GPT-4, works without a login, and integrates with your codebase via a new VS Code extension.
Anthropic has released a roadmap outlining upcoming features for their Model Context Protocol. These include remote connection support with secure authentication, reference implementations for developers, improved server distribution and package management, as well as expanded support for AI agent workflows and integration. This should make connecting external tools and data sources to AI models even more straightforward and efficient.
EXO Labs has launched Private Search, a system that enables local LLMs to privately access real-time data from sources like Twitter and Wikipedia using homomorphic encryption, delivering <2s latency with 100,000x less data transfer than traditional client sync.
The system works by clustering document vectors and using encrypted similarity search to privately identify and retrieve only the most relevant documents, allowing local models to maintain privacy while accessing current information.
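As a rough intuition for why this cuts data transfer, here is a toy plaintext analogue in Python of the cluster-then-search flow. The real system performs the similarity search under homomorphic encryption so the server never sees the query, which this sketch deliberately omits.

```python
# Toy plaintext analogue of the two-stage retrieval described above;
# the real system runs stage 2 under homomorphic encryption.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 64))  # stand-in document embeddings
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(docs)

def retrieve(query, k=5):
    # Stage 1: route the query to its nearest cluster so only a small
    # shard of the corpus is searched, which is what cuts data transfer
    # versus syncing the whole index to the client.
    cluster = kmeans.predict(query[None, :])[0]
    members = np.where(kmeans.labels_ == cluster)[0]
    # Stage 2: similarity search inside that cluster. In EXO's system this
    # comparison happens under encryption, so the server never learns the
    # query vector or which documents matched.
    sims = docs[members] @ query
    return members[np.argsort(-sims)[:k]]

print(retrieve(rng.normal(size=64)))
```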
Tools of the Trade
YPerf: A web-based monitoring tool that tracks and compares the real-time performance metrics (latency, throughput, uptime) of various LLM inference APIs. Make data-driven decisions when selecting LLM providers by combining performance data from OpenRouter with benchmark rankings and cost estimates.
Datafuel: An API service that transforms websites and knowledge bases into clean, markdown-structured data optimized for LLM and RAG systems. With a single API call, it can scrape entire websites, handle authenticated content, and output data in multiple formats (like MD and JSON).
Chipper: A local AI development toolkit that provides web and CLI interfaces for building RAG pipelines with Haystack, Ollama, and Elasticsearch. You can create and manage embedding pipelines, document processing, and query workflows through a hackable architecture that runs locally or scales as a containerized service.
Awesome LLM Apps: Build awesome LLM apps with RAG, AI agents, and more to interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
Hot Takes
There’s a ~12 month capabilities-to-wide-scale-production gap: most vision use cases work now, but aren’t widely deployed. Agents still need a little more work for billion-user-level scale. ~ Logan Kilpatrick
AI will never take our jobs. If we were smart enough to invent Scrum Masters and convince an entire generation to use Jira and meet every 15 minutes in the name of “productivity”, then we can figure this one out. ~ Santiago
That’s all for today! See you tomorrow with more such AI-filled content.
Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!
PS: We curate this AI newsletter every day for FREE, your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉