Build a Local RAG Agent with Llama 3.2 and Vector Database

A fully functional RAG agent running locally in under 20 lines of Python code (step-by-step instructions)

Running a fully local RAG (Retrieval-Augmented Generation) agent without internet access is a powerful setup: you get complete control over your data, low-latency responses, and strong privacy guarantees.

Building a local RAG system opens up possibilities for secure applications where online connections are not an option. In this tutorial, you’ll learn how to build a local RAG agent that uses Llama 3.2 3B via Ollama for text generation, combined with Qdrant as the vector database for fast document retrieval.

Why a Local RAG Agent?

Unlike cloud-based setups, this RAG agent operates without relying on external APIs or the internet. With Llama 3.2 as the LLM and Qdrant for vector search, you’ll have a fully self-contained RAG solution running right on your computer.

What We’re Building

This application implements a RAG system using Llama 3.2 via Ollama, with Qdrant as the vector database.

Features

  • Fully local RAG implementation

  • Powered by Llama 3.2 3B through Ollama

  • Vector search using Qdrant

  • Interactive playground interface

  • No external API dependencies

Prerequisites

Before we begin, make sure you have:

  1. Python installed on your machine (version 3.7 or higher is recommended)

  2. Ollama installed, plus Docker for running Qdrant locally

  3. Basic familiarity with Python programming

  4. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git

  2. Navigate to the local_rag_agent folder and install the dependencies:

cd rag_tutorials/local_rag_agent
pip install -r requirements.txt

  3. Pull and start the Qdrant vector database locally with Docker:

docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
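
Note that this command keeps the index only inside the container, so it is lost when the container is removed. To persist the collection across restarts, you can mount a local directory as Qdrant’s storage path, as shown in Qdrant’s own quickstart (the qdrant_storage folder name here is just an example):

docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant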

  4. Download and install Ollama, then pull the Llama 3.2 model:

ollama pull llama3.2
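
One caveat: depending on your phidata version, the OllamaEmbedder used later in the code may default to a separate embedding model (for example, openhermes). If the indexing step fails with a model-not-found error, pull that model as well:

ollama pull openhermes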

Code Walkthrough

Let’s build the app. Create a new file local_rag_agent.py and add the following code:

  1. Import necessary libraries:

    • Qdrant for vector storage

    • Phidata framework for agent creation

    • Ollama for running local Llama 3.2

from phi.agent import Agent
from phi.model.ollama import Ollama
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.qdrant import Qdrant
from phi.embedder.ollama import OllamaEmbedder
from phi.playground import Playground, serve_playground_app

  2. Set up the Qdrant vector database connection:

    Follow the instructions in the Qdrant Setup Guide to install Qdrant locally for free: https://qdrant.tech/documentation/guides/installation

collection_name = "thai-recipe-index"

vector_db = Qdrant(
    collection=collection_name,
    url="http://localhost:6333/",  # the local Qdrant instance started with Docker
    embedder=OllamaEmbedder(),  # embeddings are generated locally via Ollama
)
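
Before moving on, you can optionally confirm that Qdrant is reachable at that URL. This quick check is not part of the tutorial code; it calls Qdrant’s standard REST endpoint for listing collections and assumes the requests package is installed:

import requests

# Optional sanity check: list collections on the local Qdrant instance.
# A successful response confirms the url passed to Qdrant(...) is reachable.
response = requests.get("http://localhost:6333/collections")
print(response.json())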

  3. Create a knowledge base from a PDF:

    • Loads PDF from URL

    • Processes content

    • Stores in Qdrant vector database

knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    vector_db=vector_db,
)

  4. Load and index the knowledge base:

    • Processes the PDF

    • Creates embeddings

    • Stores in Qdrant database

knowledge_base.load(recreate=True, upsert=True)
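
Note that recreate=True drops and rebuilds the collection on every run, re-embedding the entire PDF each time. Once the first run has succeeded, a reasonable tweak (using the same load signature shown above) is to reuse the existing index:

# After the initial indexing run, reuse the existing collection instead of
# rebuilding it from scratch on every start.
knowledge_base.load(recreate=False, upsert=True)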

  5. Create the RAG agent:

    • Uses local Llama 3.2

    • Connects to knowledge base

    • Handles recipe queries

agent = Agent(
    name="Local RAG Agent",
    model=Ollama(id="llama3.2"),
    knowledge=knowledge_base,
)
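
Before wiring up the UI, you can optionally smoke-test the agent from the terminal. This sketch assumes phidata’s Agent.print_response helper, which streams the answer to stdout; check the exact signature against your installed phidata version:

# Optional: ask the agent a question directly, without the playground UI.
agent.print_response("Which Thai curry recipes are in the knowledge base?", stream=True)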

  6. Create and serve the user interface for the RAG agent:

    • Creates interactive user interface

    • Serves the Playground app

app = Playground(agents=[agent]).get_app()

if __name__ == "__main__":
    serve_playground_app("local_rag_agent:app", reload=True)

Running the App

With our code in place, it's time to launch the app.

  • Before running the app, you need to authenticate your local environment with Phidata, which configures your setup to run the Agent UI locally. This one-time step requires an internet connection (as does the initial PDF download); after indexing, queries run fully offline. Run the following command in your terminal:

phi auth
  • Once that’s done, navigate to the project folder in your terminal and run:

python local_rag_agent.py
  • Phidata will print a local URL (typically localhost:7777). Open it in your web browser to interact with the RAG agent through Phidata’s playground interface.

Conclusion

You’ve built a local RAG agent that can search, retrieve, and generate responses from embedded data using Llama 3.2 3B and Qdrant, all without internet. The system operates in a secure, offline environment, perfect for applications that require privacy and quick access to a predefined knowledge base.

For further enhancements, consider:

  1. Fine-Tuning Retrieval Parameters: Adjust vector search parameters in Qdrant for improved search accuracy.

  2. Supporting Additional File Types: Extend the knowledge base to process different file formats, like Word or text files.

  3. Customizable Query Handling: Allow the agent to answer questions with predefined templates or multiple response options.

  4. User Access Controls: Add user login and access permissions for a more secure interactive interface.

  5. Logging and Analytics: Implement logging to track queries, responses, and improvements over time (a minimal sketch follows below).
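
As a starting point for the logging idea in point 5, here is a minimal sketch. It assumes phidata’s Agent.run method, which returns a response object with a .content attribute (verify against your installed version); the log file name is arbitrary:

import logging

# Log every question/answer pair to a local file for later analysis.
logging.basicConfig(filename="rag_queries.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def ask_with_logging(agent, question: str) -> str:
    """Run a query through the RAG agent and log the exchange."""
    response = agent.run(question)  # assumes phidata's Agent.run API
    logging.info("Q: %s | A: %s", question, response.content)
    return response.content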

Keep experimenting and refining to build even smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
