Build a Local RAG Reasoning Agent with DeepSeek R1

Fully functional AI agent RAG app with step-by-step instructions (100% open source)

Building powerful AI applications that can reason over documents while maintaining data privacy is a critical need for many organizations. However, most solutions require cloud connectivity and can't operate in air-gapped environments.

In this tutorial, we'll create a powerful reasoning agent that combines local DeepSeek models with RAG capabilities. The app operates in two modes: a simple local chat mode and an advanced RAG mode powered by DeepSeek R1. Local mode enables direct interaction with the model, while RAG mode adds comprehensive document analysis and web search capabilities - all while running primarily on your machine.

What We’re Building

This Streamlit application implements a sophisticated reasoning system that combines locally running DeepSeek R1 models with document processing and optional web search capabilities. The system is designed to provide both basic interactions and advanced RAG features for complex document analysis.

The application operates in two modes:

  1. Local Chat Mode - Direct interaction with DeepSeek models running locally, perfect for general queries and conversations.

  2. RAG Mode - Enhanced reasoning with document processing, vector search, and optional web search integration for comprehensive information retrieval.

Features
  • Flexible dual-mode operation with easy switching between local chat and RAG capabilities.

  • Support for both lightweight (1.5B) and more capable (7B) DeepSeek models to match your hardware capabilities.

  • PDF and webpage processing with automatic text chunking and vector storage.

  • Web search with Llama 3.2 3B in two distinct settings - automatic fallback when document search yields no relevant results, and a manual toggle for forcing web search when needed.

  • Configurable similarity threshold for fine-tuning document retrieval precision.

Tech Stack

DeepSeek R1 serves as our primary language model, providing local inference (via Ollama) with both 1.5B and 7B variants to accommodate different hardware capabilities.

Snowflake Arctic Embed model (running locally via Ollama) handles document embeddings, providing state-of-the-art performance for semantic search and document retrieval.

Llama 3.2 3B powers our web search agent, offering robust capabilities for processing and summarizing web search results.

Agno (previously Phidata) framework orchestrates the entire system, managing agent interactions and providing a performant infrastructure for both local and RAG operations.
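For a sense of scale, a minimal Agno agent backed by a local Ollama model takes only a few lines. This is a sketch, assuming Ollama is running and the model has already been pulled:

from agno.agent import Agent
from agno.models.ollama import Ollama

# A bare-bones local agent: no tools, no RAG, just the model
agent = Agent(model=Ollama(id="deepseek-r1:1.5b"), markdown=True)
agent.print_response("Explain retrieval-augmented generation in one paragraph.")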

Qdrant functions as our vector database, handling efficient storage and retrieval of document embeddings with support for similarity search.
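Connecting to Qdrant takes two lines with the official Python client (a sketch; the URL and API key come from your Qdrant Cloud dashboard):

from qdrant_client import QdrantClient

client = QdrantClient(url="https://xxx-xxx.cloud.qdrant.io", api_key="your-qdrant-api-key")
print(client.get_collections())  # quick connectivity check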

Exa AI enables web search capabilities when needed, combining neural and keyword search approaches for comprehensive online results.

Streamlit powers the user interface, offering intuitive controls for mode switching, document upload, and search configuration.

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Ollama installed

  3. API keys for Qdrant and, optionally, Exa AI

  4. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  5. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the deepseek_local_rag_agent folder:

cd rag_tutorials/deepseek_local_rag_agent

  3. Install the dependencies:

pip install -r requirements.txt
  4. Initial setup and API keys:

    1. Ollama Setup

      Install Ollama

      Pull the DeepSeek R1 model(s):

      # For the lighter model
      ollama pull deepseek-r1:1.5b

      # For the more capable model (if your hardware supports it)
      ollama pull deepseek-r1:7b

      # Pull the Snowflake Arctic Embed model
      ollama pull snowflake-arctic-embed

      # Pull the Llama 3.2 model (for the web search agent)
      ollama pull llama3.2
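
      You can confirm the downloads with Ollama's list command; the pulled models should all appear before you move on:

      ollama list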

    2. Qdrant Cloud Setup (for RAG Mode)

      Visit Qdrant Cloud

      Create an account or sign in

      Create a new cluster

      Get your credentials:

    • Qdrant API Key: Found in API Keys section

    • Qdrant URL: Your cluster URL (format: https://xxx-xxx.cloud.qdrant.io)

    3. Exa AI API Key (Optional)

      Visit Exa AI

      Sign up for an account

      Generate an API key

Creating the Streamlit App

Let’s create our app. Create a new file deepseek_rag_agent.py and add the following code:

  1. Let's set up our imports:

import os
import tempfile
from datetime import datetime
from typing import List

import bs4
import streamlit as st
from agno.agent import Agent
# Agno's OllamaEmbedder is aliased so it doesn't collide with our wrapper class below
from agno.embedder.ollama import OllamaEmbedder as AgnoOllamaEmbedder
from agno.models.ollama import Ollama
from agno.tools.exa import ExaTools
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
from langchain_core.embeddings import Embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
  2. Define our custom OllamaEmbedder class:

class OllamaEmbedder(Embeddings):
    """LangChain-compatible wrapper around Agno's Ollama embedder."""

    def __init__(self, model_name="snowflake-arctic-embed"):
        # Use the aliased Agno embedder so we don't shadow this class
        self.embedder = AgnoOllamaEmbedder(id=model_name, dimensions=1024)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(text) for text in texts]

    def embed_query(self, text: str) -> List[float]:
        return self.embedder.get_embedding(text)
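
A quick sanity check (assuming snowflake-arctic-embed has already been pulled via Ollama): the wrapper should return 1024-dimensional vectors.

embedder = OllamaEmbedder()
vec = embedder.embed_query("hello world")
print(len(vec))  # expected: 1024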
  3. Set up the Streamlit interface and initialize session state:

# Streamlit App Initialization
st.title("🤔 DeepSeek Local RAG Reasoning Agent")

# Session State Initialization
if 'qdrant_api_key' not in st.session_state:
    st.session_state.qdrant_api_key = ""
if 'qdrant_url' not in st.session_state:
    st.session_state.qdrant_url = ""
if 'model_version' not in st.session_state:
    st.session_state.model_version = "deepseek-r1:1.5b"  # Default to lighter model
if 'vector_store' not in st.session_state:
    st.session_state.vector_store = None
if 'processed_documents' not in st.session_state:
    st.session_state.processed_documents = []
if 'history' not in st.session_state:
    st.session_state.history = []
if 'exa_api_key' not in st.session_state:
    st.session_state.exa_api_key = ""
if 'use_web_search' not in st.session_state:
    st.session_state.use_web_search = False
if 'force_web_search' not in st.session_state:
    st.session_state.force_web_search = False
if 'similarity_threshold' not in st.session_state:
    st.session_state.similarity_threshold = 0.7
if 'rag_enabled' not in st.session_state:
    st.session_state.rag_enabled = True  # RAG is enabled by default
  4. Configure the sidebar and model selection:

st.sidebar.header("🤖 Agent Configuration")
st.sidebar.header("📦 Model Selection")
model_help = """
- 1.5b: Lighter model, suitable for most laptops
- 7b: More capable but requires better GPU/RAM
"""
st.session_state.model_version = st.sidebar.radio(
    "Select Model Version",
    options=["deepseek-r1:1.5b", "deepseek-r1:7b"],
    help=model_help
)
  5. Add RAG configuration and API settings:

st.session_state.rag_enabled = st.sidebar.toggle("Enable RAG Mode", value=st.session_state.rag_enabled)

if st.session_state.rag_enabled:
    st.sidebar.header("🔑 API Configuration")
    # Store the values in session state so the rest of the app can read them
    st.session_state.qdrant_api_key = st.sidebar.text_input("Qdrant API Key", type="password")
    st.session_state.qdrant_url = st.sidebar.text_input("Qdrant URL")
  6. Implement document processing for PDFs:

def process_pdf(file) -> List:
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp_file:
            tmp_file.write(file.getvalue())
            loader = PyPDFLoader(tmp_file.name)
            documents = loader.load()
            
            for doc in documents:
                doc.metadata.update({
                    "source_type": "pdf",
                    "file_name": file.name,
                    "timestamp": datetime.now().isoformat()
                })
            
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=1000,
                chunk_overlap=200
            )
            return text_splitter.split_documents(documents)
    except Exception as e:
        st.error(f"📄 PDF processing error: {str(e)}")
        return []
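
The chunking values are worth a note: 1000-character chunks with a 200-character overlap are common starting points. The overlap keeps sentences that straddle a chunk boundary retrievable from either side; both values can be tuned for your corpus.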
  7. Add web processing functionality:

def process_web(url: str) -> List:
    try:
        loader = WebBaseLoader(
            web_paths=(url,),
            bs_kwargs=dict(
                parse_only=bs4.SoupStrainer(
                    class_=("post-content", "post-title", "post-header", "content", "main")
                )
            )
        )
        documents = loader.load()

        # Add metadata, mirroring process_pdf
        for doc in documents:
            doc.metadata.update({
                "source_type": "url",
                "url": url,
                "timestamp": datetime.now().isoformat()
            })

        # Split text into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        return text_splitter.split_documents(documents)
    except Exception as e:
        st.error(f"🌐 Web processing error: {str(e)}")
        return []
  8. Implement vector store management:

# Name of the Qdrant collection (any identifier works; pick one and reuse it)
COLLECTION_NAME = "deepseek-rag"

def create_vector_store(client, texts):
    try:
        client.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(
                size=1024,  # must match the embedder's dimensions
                distance=Distance.COSINE
            )
        )
        
        vector_store = QdrantVectorStore(
            client=client,
            collection_name=COLLECTION_NAME,
            embedding=OllamaEmbedder()
        )
        
        vector_store.add_documents(texts)
        return vector_store
    except Exception as e:
        st.error(f"🔴 Vector store error: {str(e)}")
        return None
  9. Create the web search agent:

def get_web_search_agent() -> Agent:
    # search_domains: optional list of domains to restrict results to,
    # collected from the sidebar (use [] for no restriction)
    return Agent(
        name="Web Search Agent",
        model=Ollama(id="llama3.2"),
        tools=[ExaTools(
            api_key=st.session_state.exa_api_key,
            include_domains=search_domains,
            num_results=5
        )],
        instructions="""You are a web search expert...""",
        show_tool_calls=True,
        markdown=True,
    )
  10. Implement the main RAG agent:

def get_rag_agent() -> Agent:
    return Agent(
        name="DeepSeek RAG Agent",
        model=Ollama(id=st.session_state.model_version),
        instructions="""You are an Intelligent Agent...""",
        show_tool_calls=True,
        markdown=True,
    )
  11. Add document relevance checking:

def check_document_relevance(query: str, vector_store, threshold: float = 0.7) -> tuple[bool, List]:
    if not vector_store:
        return False, []
        
    retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"k": 5, "score_threshold": threshold}
    )
    docs = retriever.invoke(query)
    return bool(docs), docs
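
The automatic web search fallback from the Features section hinges on this helper's boolean: when no chunk clears the similarity threshold, the query can be routed to the web search agent instead. A minimal sketch, assuming the session state keys defined earlier:

# Sketch: fall back to web search when retrieval comes up empty
query = "What does the report say about Q3 revenue?"  # example query
found, docs = check_document_relevance(query, st.session_state.vector_store,
                                       st.session_state.similarity_threshold)
if not found and st.session_state.use_web_search:
    response = get_web_search_agent().run(query)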
  12. Set up the chat interface:

chat_col, toggle_col = st.columns([0.9, 0.1])
with chat_col:
    prompt = st.chat_input("Ask about your documents..." if st.session_state.rag_enabled else "Ask me anything...")
with toggle_col:
    st.session_state.force_web_search = st.toggle('🌐', help="Force web search")
  13. Implement the main chat logic:

if prompt:
    st.session_state.history.append({"role": "user", "content": prompt})
    # ... (main chat logic)
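
The repository fills in the full logic here; the sketch below shows one plausible shape for that block (the prompt assembly and exact agno calls are our assumptions, not the repo's verbatim code). Documents are tried first, web search kicks in as a fallback or when forced, and the RAG agent answers with whatever context was found:

# Expanded sketch of the `if prompt:` block above (illustrative)
if prompt:
    st.session_state.history.append({"role": "user", "content": prompt})

    context = ""
    if st.session_state.rag_enabled and not st.session_state.force_web_search:
        # 1. Try the vector store first
        found, docs = check_document_relevance(
            prompt, st.session_state.vector_store, st.session_state.similarity_threshold
        )
        if found:
            context = "\n\n".join(doc.page_content for doc in docs)

    if not context and (st.session_state.force_web_search or st.session_state.use_web_search):
        # 2. Fall back to (or force) web search via the Llama 3.2 agent
        web_results = get_web_search_agent().run(prompt)
        context = web_results.content

    # 3. Answer with the DeepSeek RAG agent, passing any retrieved context
    rag_agent = get_rag_agent()
    full_prompt = f"Context:\n{context}\n\nQuestion: {prompt}" if context else prompt
    response = rag_agent.run(full_prompt)

    st.session_state.history.append({"role": "assistant", "content": response.content})
    with st.chat_message("assistant"):
        st.write(response.content)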

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run deepseek_rag_agent.py
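
Streamlit will print a local URL (typically http://localhost:8501); open it in your browser to start chatting. Make sure Ollama is running in the background so the models are reachable.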

Working Application Demo

Conclusion

You've built a powerful RAG reasoning agent that can operate entirely locally while maintaining the flexibility to leverage web resources when needed. This dual-mode setup offers the best of both worlds - secure local processing for sensitive data and enhanced capabilities through RAG when required.

Here are some potential enhancements to consider:

  • Add a conversation memory layer to help the agent maintain context across chat sessions and remember frequently accessed documents.

  • Extend beyond PDFs and web pages by adding support for additional document types like Word, Excel, or specialized formats.

  • Implement techniques like query expansion or hybrid search to improve document retrieval accuracy and reduce reliance on web search fallback.

  • Allow users to create and maintain specialized knowledge bases for different topics or departments.

This foundation can be adapted for various use cases, from personal knowledge management to enterprise document analysis systems. Keep experimenting and refining to build even smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
