Build a Local RAG Agent with Llama 3.2 and Vector Database
A fully functional RAG agent running locally in under 20 lines of Python code (step-by-step instructions)
Running a fully local RAG (Retrieval-Augmented Generation) agent without internet access is a powerful setup: it gives you complete control over your data, low-latency responses, and full privacy.
Building a local RAG system opens up possibilities for secure applications where online connections are not an option. In this tutorial, you’ll learn to create a local RAG agent using Llama 3.2 3B via Ollama for text generation, combined with Qdrant as the vector database for fast document retrieval.
Why a local RAG agent?
Unlike cloud-based setups, this RAG agent operates without relying on external APIs or the internet. With Llama 3.2 as the LLM and Qdrant for vector search, you'll have a fully self-contained RAG solution running right on your computer.
What We’re Building
This application implements a RAG system using Llama 3.2 via Ollama, with Qdrant as the vector database.
Features
Fully local RAG implementation
Powered by Llama 3.2 3B through Ollama
Vector search using Qdrant
Interactive playground interface
No external API dependencies
Prerequisites
Before we begin, make sure you have:
Python 3.8 or later installed
Docker installed and running (to host Qdrant locally)
Ollama installed on your machine
A free Phidata account (needed later for phi auth and the playground UI)
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the local_rag_agent folder:
cd rag_tutorials/local_rag_agent
Install the required dependencies:
pip install -r requirements.txt
Install and start the Qdrant vector database locally:
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
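If you want to confirm that Qdrant is up before moving on, here is a minimal sanity-check sketch from Python (it assumes the qdrant-client package, which phidata's Qdrant integration relies on, is installed):
from qdrant_client import QdrantClient

# Connect to the local Qdrant instance started by Docker
client = QdrantClient(url="http://localhost:6333")

# A fresh install should report an empty list of collections
print(client.get_collections())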
Download and install Ollama, then pull the Llama 3.2 model:
ollama pull llama3.2
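To verify that Ollama can actually serve the model, you can hit its local REST API, which listens on port 11434 by default. This is an optional sketch and assumes the requests package is available in your environment:
import requests

# Ask llama3.2 for a short, non-streamed completion as a smoke test
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.2", "prompt": "Say hello in five words.", "stream": False},
)
print(resp.json()["response"])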
Code Walkthrough
Let's create our app. Create a new file local_rag_agent.py and add the following code:
Import necessary libraries:
• Qdrant for vector storage
• Phidata framework for agent creation
• Ollama for running local Llama 3.2
from phi.agent import Agent
from phi.model.ollama import Ollama
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.qdrant import Qdrant
from phi.embedder.ollama import OllamaEmbedder
from phi.playground import Playground, serve_playground_app
Set up the Qdrant vector database connection:
Follow the instructions in the Qdrant Setup Guide to install Qdrant locally for free: https://qdrant.tech/documentation/guides/installation
collection_name = "thai-recipe-index"
vector_db = Qdrant(
    collection=collection_name,
    url="http://localhost:6333/",
    embedder=OllamaEmbedder()
)
Create knowledge base from PDF:
• Loads PDF from URL
• Processes content
• Stores in Qdrant vector database
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
    vector_db=vector_db,
)
Load and index the knowledge base:
• Processes the PDF
• Creates embeddings
• Stores in Qdrant database
knowledge_base.load(recreate=True, upsert=True)
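Note that recreate=True drops and rebuilds the collection on every run. Once the PDF has been indexed, a small tweak (a sketch, not required by the tutorial) skips the expensive re-indexing on subsequent runs:
# Reuse the existing collection on later runs; switch back to recreate=True
# whenever the source documents change.
knowledge_base.load(recreate=False, upsert=True)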
Create the RAG Agent:
• Uses local Llama 3.2
• Connects to knowledge base
• Handles recipe queries
agent = Agent(
    name="Local RAG Agent",
    model=Ollama(id="llama3.2"),
    knowledge=knowledge_base,
)
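At this point you can already query the agent from a plain script, without the playground UI. A quick sketch (the question is only an example):
# Stream an answer to the terminal directly from Python
agent.print_response("How do I make Tom Kha Gai?", stream=True)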
Create and serve the user interface for RAG agent:
• Creates interactive user interface
• Serves the Playground app
app = Playground(agents=[agent]).get_app()
if __name__ == "__main__":
    serve_playground_app("local_rag_agent:app", reload=True)
Running the App
With our code in place, it's time to launch the app.
Before running the app, you need to authenticate your local environment with Phidata. This ensures that your setup is properly configured to run the Agent UI locally. To do that, run the following command in your terminal:
phi auth
Once done, navigate to the project folder in your terminal and run the following command:
python local_rag_agent.py
Phidata will provide a local URL (typically localhost:7777). Open your web browser and navigate to the URL provided in the console output to interact with the RAG agent through Phidata’s playground interface.
Working Application Demo
Conclusion
You’ve built a local RAG agent that can search, retrieve, and generate responses from embedded data using Llama 3.2 3B and Qdrant, all without internet. The system operates in a secure, offline environment, perfect for applications that require privacy and quick access to a predefined knowledge base.
For further enhancements, consider:
Fine-Tuning Retrieval Parameters: Adjust vector search parameters in Qdrant for improved search accuracy (see the sketch after this list).
Supporting Additional File Types: Extend the knowledge base to process different file formats, like Word or text files.
Customizable Query Handling: Allow the agent to answer questions with predefined templates or multiple response options.
User Access Controls: Add user login and access permissions for a more secure interactive interface.
Logging and Analytics: Implement logging to track queries, responses, and improvements over time.
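As an example of the first suggestion, here is a minimal sketch that queries the indexed collection directly with explicit search parameters. The query text, limit, and hnsw_ef value are purely illustrative, and it assumes the qdrant-client package is installed:
from phi.embedder.ollama import OllamaEmbedder
from qdrant_client import QdrantClient, models

# Embed the query with the same embedder used during indexing
query_embedding = OllamaEmbedder().get_embedding("How do I make green curry?")

client = QdrantClient(url="http://localhost:6333")
hits = client.search(
    collection_name="thai-recipe-index",
    query_vector=query_embedding,
    limit=5,                                         # number of chunks to retrieve
    search_params=models.SearchParams(hnsw_ef=128),  # higher = more accurate, slower
)
for hit in hits:
    print(hit.score)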
Keep experimenting and refining to build even smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.