RAG-as-a-Service

PLUS: Personal AI supercomputer, Minimalist LLM framework in 100 lines

Today’s top AI Highlights:

  1. RAG-as-a-Service that keeps your data in your infrastructure

  2. Build AI agents with this open-source 100% Python framework

  3. NVIDIA’s personal AI supercomputer for $3,000 to run 200B LLMs locally

  4. Minimalist framework in 100 lines to enable LLMs to program themselves

& so much more!

Read time: 3 mins

AI Tutorials

Data analysis often requires complex SQL queries and deep technical knowledge, creating a barrier for many who need quick insights from their data. What if we could make data analysis as simple as having a conversation?

In this tutorial, we'll build an AI Data Analysis Agent that lets users analyze CSV and Excel files using natural language queries. Powered by GPT-4o and DuckDB, this tool translates plain English questions into SQL queries, making data analysis accessible to everyone – no SQL expertise required.

We're using Phidata, a framework specifically designed for building and orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.
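
To make this concrete, here's a minimal sketch of what the agent setup can look like, assuming Phidata's Agent, OpenAIChat, and DuckDbTools building blocks; the exact wiring in the full tutorial (file upload, UI, etc.) may differ, so treat this as a skeleton rather than the finished app.

# Minimal sketch of the data analysis agent (Phidata + GPT-4o + DuckDB);
# the full tutorial may differ in details.
from phi.agent import Agent
from phi.model.openai import OpenAIChat
from phi.tools.duckdb import DuckDbTools

data_analyst = Agent(
    model=OpenAIChat(id="gpt-4o"),   # GPT-4o turns plain-English questions into SQL
    tools=[DuckDbTools()],           # DuckDB runs the generated SQL locally
    instructions=[
        "Load the user's CSV or Excel file into DuckDB before answering.",
        "Write and run SQL for each question, then explain the result in plain English.",
    ],
    show_tool_calls=True,
    markdown=True,
)

# Ask a question in natural language instead of writing SQL by hand
data_analyst.print_response("Load 'sales.csv' and show the top 5 products by revenue.")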

We share hands-on tutorials like this 2-3 times a week, designed to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Latest Developments

RAGaaS brings production-grade RAG infrastructure to your applications while keeping your data within your control. The platform handles complex document processing, embedding generation, and vector search directly in your infrastructure - no data leaves your systems.

Built by the team behind SiteGPT.ai after serving 100+ customers, RAGaaS packages battle-tested components into a simple API to process, embed, and search your private documents within your own infrastructure, never storing your actual data on their servers.

Key Highlights:

  1. Privacy-First - Your data stays in your infrastructure, always. RAGaaS processes documents through its API but stores everything directly in your S3-compatible storage and vector database. The platform never retains your data, making it ideal for handling sensitive information and meeting compliance requirements.

  2. Production-Ready Performance - Skip months of infrastructure work with a platform tested across 100+ deployments. The hybrid search pipeline combines semantic search, keyword matching, and reranking to deliver better results, while the document processing handles complex PDFs, OCR, and parsing reliably at scale.

  3. Developer Experience - Get started with just two API calls - one to process documents, one to search them (see the sketch after this list). The platform integrates with your existing S3 storage and vector database, letting you maintain full control while RAGaaS handles the complex parts like chunking, embedding generation, and search optimization.

  4. Enterprise-Ready Features - Built-in support for multi-tenant isolation, rate limiting, and access controls tested with enterprise clients. The platform includes real-time content updates, advanced metadata filtering, and comprehensive logging for production deployments.
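
The "two API calls" flow from the Developer Experience point above might look roughly like the sketch below. The base URL, endpoint paths, payload fields, and headers are illustrative placeholders, not RAGaaS's documented API, so check the official docs for the real contract.

# Hypothetical two-call flow: process a document, then search it.
# Endpoints, fields, and headers are placeholders, NOT RAGaaS's documented API.
import requests

BASE_URL = "https://api.example-ragaas.com/v1"   # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Call 1: process a document -- chunking, embedding, and vector writes happen against
# YOUR S3-compatible storage and vector database; nothing is retained by the service.
requests.post(
    f"{BASE_URL}/documents/process",
    headers=HEADERS,
    json={
        "source_url": "s3://your-bucket/docs/handbook.pdf",
        "namespace": "tenant-42",   # e.g. per-customer isolation
    },
).raise_for_status()

# Call 2: hybrid search (semantic + keyword + reranking) over your own stores.
results = requests.post(
    f"{BASE_URL}/search",
    headers=HEADERS,
    json={"query": "What is the parental leave policy?", "namespace": "tenant-42", "top_k": 5},
).json()

for hit in results.get("matches", []):
    print(hit.get("score"), hit.get("text", "")[:80])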

Ready to Level up your work with AI?

HubSpot’s free guide to using ChatGPT at work is your new cheat code to go from working hard to hardly working.

HubSpot’s guide will teach you:

  • How to prompt like a pro

  • How to integrate AI in your personal workflow

  • 100+ useful prompt ideas

All in order to help you unleash the power of AI for a more efficient, impactful professional life.

Nevron is a highly customizable, 100% Python framework for building AI agents that work autonomously. It provides the core building blocks - memory storage, decision making, and task execution - needed to create agents that learn from experience and adapt their behavior.

Rather than starting from scratch, you get a collection of pre-built tools for common tasks like social media posting and research, along with clear patterns for adding custom capabilities. The framework makes it simple to create agents that can understand their environment, make decisions, and take actions without constant supervision.

Key Highlights:

  1. Modular and Extensible - Nevron's core is built with planning, feedback, and memory modules, enabling developers to customize the framework with unique workflows and tool integrations. The modular design means you can pick and choose components, swapping them out and extending them as required.

  2. LLM and External Integrations - The framework supports OpenAI (GPT-4o for decision-making), Anthropic, and other models for intelligence, and provides ready-to-use tools for integrating with external services such as Telegram, Twitter, and Perplexity.

  3. Q-Learning for Autonomous Decisions - Using a Q-learning algorithm, the framework gives you a starting point for an agent that makes its own decisions. The planning algorithm's parameters are configurable, and it learns from feedback, allowing the agent to adapt over time (a generic sketch of this technique follows this list).

  4. Memory Management & Vector Databases - The memory module uses vector embeddings to store and retrieve data, allowing the agent to recall past experiences. Nevron supports both Chroma (default, local) and Qdrant (distributed) for this, so you can choose the right database for your use case.
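
To ground the Q-learning highlight, here is a bare-bones sketch of tabular Q-learning, the general technique Nevron's planning builds on. The action names, reward, and hyperparameters below are made up for illustration; Nevron's actual classes and configuration will differ.

# Illustrative tabular Q-learning loop -- the generic technique, not Nevron's actual API.
import random
from collections import defaultdict

ACTIONS = ["post_to_twitter", "research_topic", "idle"]   # example agent actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate

q_table = defaultdict(float)   # (state, action) -> expected long-term reward

def choose_action(state):
    # Epsilon-greedy: mostly exploit the best-known action, occasionally explore
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    # Standard Q-learning update driven by the feedback signal
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])

# One step of the agent loop: observe state, act, receive feedback, learn
state = "no_new_mentions"
action = choose_action(state)
reward = 1.0 if action == "research_topic" else 0.0   # toy feedback from the environment
update(state, action, reward, next_state="topic_researched")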

Quick Bites

The highly anticipated CES 2025 kicked off yesterday in Las Vegas, where the biggest tech companies made a slew of announcements. As one would expect, generative and agentic AI were the highlight, permeating nearly every product category from silicon to software. Here are the announcements you can’t miss:

  1. NVIDIA

    • GeForce RTX 50 series GPUs were announced, powered by the Blackwell architecture, introducing breakthroughs in AI rendering. The lineup includes the RTX 5090 (3,352 AI TOPS for $1,999), RTX 5080 (1,801 AI TOPS for $999), RTX 5070 Ti (1,406 AI TOPS for $749), and RTX 5070 (988 AI TOPS for an insane $549). This level of compute at these price points is unprecedented: you can locally train, fine-tune, and deploy LLMs that would traditionally require large-scale data centers.

    • Cosmos World Foundation Models (WFMs) are a family of AI models capable of generating physics-aware video. The models, ranging from 4B to 14B parameters, are offered in Nano, Super, and Ultra tiers, catering to different needs for latency and fidelity.

    • Project Digits is a personal AI supercomputer featuring NVIDIA's Grace Blackwell Superchip, offering a petaflop of computing performance for AI development. Capable of running models of up to 200B parameters, it can be linked with a second unit to run 405B models on NVIDIA's software stack, at a starting price of JUST $3,000!

  2. AMD announced a wide array of new processors, targeting various segments from high-performance desktops to mobile and handheld gaming devices. The releases include the Ryzen 9 9950X3D desktop CPU for gamers and creators, a new "Fire Range" of laptop chips, the Ryzen AI 300 and Max series for AI-accelerated PCs, and new Ryzen Z2 series chips for handheld gaming devices.

    • AI PC Chips: AMD’s Ryzen AI 300 and Max series have dedicated Neural Processing Units (NPUs) to accelerate AI workloads. With 6-8 cores clocked up to 5 GHz, Ryzen AI 300 prioritizes power efficiency (aiming for 24+ hour battery life), while the Max series (6-16 cores up to 5.1 GHz) targets higher AI and 3D rendering performance.

    • Graphics Cards: The new Radeon RX 9070 XT and RX 9070 GPUs, based on the 4nm RDNA 4 architecture, feature improvements in ray tracing performance, media encoding quality, and AI acceleration. They will also be the first hardware to benefit from FidelityFX Super Resolution 4.0, an upscaling technology that uses AI to deliver 4K resolution with minimal latency.

    • Handheld Processors: AMD is expanding the Ryzen Z2 series with the new lightweight Ryzen Z2 Go (4 cores up to 4.3 GHz with 12 graphics cores) and the Ryzen Z2 Extreme (8 cores up to 5 GHz with 16 graphics cores).

  3. Samsung’s Live Translate feature, previously on its mobile devices, is now available for select 2025 TV models. It provides real-time translation of closed captions on live broadcasts in 7 languages, alongside an AI-based voice removal and audio subtitles feature that caters to visually impaired users.

Tools of the Trade

  1. MiniLLMFlow: A 100-line Python framework that provides the core abstraction of an LLM application. It represents tasks as a nested directed graph of LLM steps, with branching and recursion for agent-like behavior (a toy sketch of the idea follows this list). The framework intentionally avoids vendor-specific wrappers and is designed to be easily understood and used by LLMs for self-programming.

  2. Stanza: An AI development tool that integrates with GitHub repos to provide code understanding and analysis. It has three core features: a natural language chat interface for codebase queries, automated code reviews on pull requests, and a developer API for building custom tools.

  3. LLM Scraper: A TypeScript library that converts webpage content into structured data using LLMs. It works with various LLM providers like OpenAI, Ollama, and GGUF. The library offers features like type safety, schema definition using Zod, multiple formatting modes (HTML, markdown, text, image), etc.

  4. Awesome LLM Apps: A curated collection of LLM apps built with RAG and AI agents that interact with data sources like GitHub, Gmail, PDFs, and YouTube videos, and automate complex work.
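
For a feel of the "nested directed graph of LLM steps" idea behind MiniLLMFlow (item 1 above), here is a toy sketch. The Node/Flow classes and the stubbed call_llm function are invented for illustration and are not MiniLLMFlow's actual API.

# Toy directed graph of LLM steps with branching; names here are illustrative,
# not MiniLLMFlow's real API.
def call_llm(prompt):
    return "APPROVE"   # stub: replace with a real LLM call

class Node:
    # One LLM step; outgoing edges are keyed by the LLM's answer
    def __init__(self, prompt_template):
        self.prompt_template = prompt_template
        self.successors = {}   # answer -> next Node

    def on(self, answer, node):
        self.successors[answer] = node
        return node

    def run(self, context):
        answer = call_llm(self.prompt_template.format(**context))
        return answer, self.successors.get(answer)   # no successor -> flow ends

class Flow:
    # Walks the graph from a start node, following branches until a leaf
    def __init__(self, start):
        self.start = start

    def run(self, context):
        node, answer = self.start, None
        while node is not None:
            answer, node = node.run(context)
        return answer

review = Node("Should this reply be sent? Answer APPROVE or REVISE: {draft}")
revise = Node("Revise the reply, then answer APPROVE or REVISE: {draft}")
review.on("REVISE", revise)   # branch into a revision step
revise.on("REVISE", revise)   # recursion: keep revising until approved

print(Flow(review).run({"draft": "Sorry about the damaged order..."}))   # -> APPROVE with the stub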

Hot Takes

  1. When I watched Her, it really bothered me that they had extremely advanced AI and society didn't seem to care. What I thought was a plot hole turns out to be spot on ~
    Tom Dörr

  2. Actually wild to think that a 1.5B LLM (gpt-2) was at one point considered “too dangerous” and now we have 405B models available to download ~
    anton

That’s all for today! See you tomorrow with more such AI-filled content.

Don’t forget to share this newsletter on your social channels and tag Unwind AI to support us!

Unwind AI - X | LinkedIn | Threads | Facebook

PS: We curate this AI newsletter every day for FREE; your support is what keeps us going. If you find value in what you read, share it with at least one, two (or 20) of your friends 😉
