
Build a RAG Agent with Database Routing

Fully functional agentic RAG app using GPT-4o (step-by-step instructions)

Imagine you're building a customer service AI that needs to handle queries about products, support issues, and financial matters. You could dump all your documents into a single vector database, but that would be like having a library where cookbooks, technical manuals, and financial reports are all mixed together. Not very efficient, right?

Traditional RAG systems treat all documents uniformly, leading to slower search times and diluted results. What if we could automatically route queries to the most relevant database while maintaining high performance?

In this tutorial, we'll build a sophisticated RAG system with intelligent database routing that uses multiple specialized vector databases (product info, customer support, financial data) with an agent-based router to direct queries to the most relevant database. When no relevant documents are found, it gracefully falls back to web search using DuckDuckGo.

The app uses:

  • Langchain for RAG orchestration

  • Phidata as the router agent to determine the most relevant database for a given query

  • LangGraph as a fallback mechanism, utilizing DuckDuckGo for web research when necessary

  • Streamlit for a user-friendly interface for document upload and querying

  • Qdrant for storing and retrieving document embeddings

  • GPT-4o for answer synthesis

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

What We’re Building

We’re building a RAG app with database routing capabilities that allows users to upload multiple documents to three different databases: Product Information, Customer Support & FAQ, and Financial Information. The user can query the uploaded information in natural language, and the app will route to the most relevant database.

Features:

  1. Query Routing - The system uses a three-stage routing approach:

    • Vector similarity search across all databases

    • LLM-based routing for ambiguous queries

    • Web search fallback for unknown topics

  2. Dual-stage smart query routing:

    • Primary: vector similarity scoring with a confidence threshold of 0.5

    • Fallback: a GPT-4o-powered routing agent when similarity confidence is low

  3. Document Processing

    • Automatic text extraction from PDFs

    • Smart text chunking with overlap

    • Vector embedding generation

  4. Answer Generation

    • Context-aware retrieval

    • Confidence-based responses
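
To make the routing decision concrete, here is a minimal pure-Python sketch of the confidence-threshold logic described above (the function name and the scores are illustrative, not part of the app's code):

```python
def pick_database(scores: dict, threshold: float = 0.5):
    """Return the best-scoring database, or None to defer to the LLM router."""
    best_db = max(scores, key=scores.get)
    return best_db if scores[best_db] >= threshold else None

# A query that clearly matches one database gets routed directly:
pick_database({"products": 0.82, "support": 0.31, "finance": 0.12})  # → "products"

# When every score is weak, None signals the GPT-4o routing agent to decide:
pick_database({"products": 0.21, "support": 0.34, "finance": 0.18})  # → None
```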

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Your OpenAI API key and Qdrant API key, along with your Qdrant cluster URL

  3. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  4. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the rag_database_routing folder and install the dependencies:

cd rag_tutorials/rag_database_routing
pip install -r requirements.txt
  3. Get your API keys:

    • Obtain an OpenAI API key and set it in the application.

    • Qdrant Cloud Setup - Visit Qdrant Cloud > Create an account or sign in > Create a new cluster > Get your credentials:
      - Qdrant API Key: Found in API Keys section
      - Qdrant URL: Your cluster URL (format: https://xxx-xxx.aws.cloud.qdrant.io)
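
Before launching the app, it can help to confirm your credentials are in place. The helper below is a hypothetical convenience, not part of the tutorial code: the environment-variable names are common conventions, and the app itself reads keys from the Streamlit sidebar instead.

```python
import os

# Conventional variable names -- an assumption, not required by the app
REQUIRED_KEYS = ["OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"]

def missing_credentials(env=os.environ) -> list:
    """Return the names of any required credentials that are unset."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

if missing_credentials():
    print("Missing:", ", ".join(missing_credentials()))
```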

Creating the Streamlit App

Let’s create our app. Create a new file rag_database_routing.py and add the following code:

  1. Let's set up our imports and configurations:

import tempfile
from dataclasses import dataclass
from typing import List, Optional

import streamlit as st
from langchain_core.documents import Document
from langchain_core.language_models import BaseLanguageModel
from langchain_core.prompts import ChatPromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Qdrant
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langgraph.prebuilt import create_react_agent
from qdrant_client import QdrantClient
from phi.agent import Agent
from phi.model.openai import OpenAIChat

# Database keys are plain strings: "products", "support", or "finance"
DatabaseType = str

# Define database types and configurations
@dataclass
class CollectionConfig:
    name: str
    description: str
    collection_name: str

COLLECTIONS = {
    "products": CollectionConfig(...),
    "support": CollectionConfig(...),
    "finance": CollectionConfig(...)
}
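
The `CollectionConfig(...)` entries above are left for you to fill in. Purely as an illustration (the names and descriptions here are made up; choose ones that describe your own documents), a products entry might look like:

```python
from dataclasses import dataclass

@dataclass
class CollectionConfig:
    name: str
    description: str
    collection_name: str

# Hypothetical example entry -- adjust the values for your own data
products_config = CollectionConfig(
    name="Product Information",
    description="Product specs, features, pricing, and usage guides",
    collection_name="products_collection",
)
```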
  2. Initialize session state and models:

def initialize_models():
    if st.session_state.openai_api_key:
        st.session_state.embeddings = OpenAIEmbeddings(
            api_key=st.session_state.openai_api_key
        )
        st.session_state.llm = ChatOpenAI(
            model="gpt-4o",
            temperature=0,
            api_key=st.session_state.openai_api_key
        )

        client = QdrantClient(
            url=st.session_state.qdrant_url,
            api_key=st.session_state.qdrant_api_key
        )
  3. Document processing pipeline:

def process_document(file) -> List[Document]:
    with tempfile.NamedTemporaryFile(suffix='.pdf') as tmp_file:
        tmp_file.write(file.getvalue())
        tmp_file.flush()  # ensure the full PDF is on disk before loading
        loader = PyPDFLoader(tmp_file.name)
        documents = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    return text_splitter.split_documents(documents)
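
To see what "smart text chunking with overlap" means mechanically, here is a simplified, pure-Python sketch of the sliding-window idea. The real `RecursiveCharacterTextSplitter` is smarter: it prefers to break on separators such as paragraphs and sentences rather than at fixed character offsets.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into chunk_size pieces, each sharing `overlap` chars with the previous one."""
    chunks = []
    step = chunk_size - overlap  # advance by 800 chars per chunk with the defaults
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap ensures a sentence that straddles a chunk boundary appears whole in at least one chunk, which improves retrieval quality.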
  4. Create the routing agent with Phidata:

def create_routing_agent() -> Agent:
    return Agent(
        model=OpenAIChat(id="gpt-4o"),
        description="Query routing expert",
        instructions=[
            "Route questions to: products, support, or finance",
            "Return ONLY the database name",
            "Consider question context carefully"
        ]
    )
  5. Implement smart query routing:

def route_query(question: str) -> Optional[DatabaseType]:
    # Try vector similarity routing first
    best_score, best_db_type = 0.0, None
    for db_type, db in st.session_state.databases.items():
        results = db.similarity_search_with_score(question, k=3)
        if results:
            avg_score = sum(score for _, score in results) / len(results)
            if avg_score > best_score:
                best_score, best_db_type = avg_score, db_type

    # Fall back to LLM routing if confidence is low
    confidence_threshold = 0.5
    if best_score < confidence_threshold:
        routing_agent = create_routing_agent()
        response = routing_agent.run(question)
        db_type = response.content.strip().lower()
        return db_type if db_type in COLLECTIONS else None
    return best_db_type
  6. Database querying logic:

def query_database(db: Qdrant, question: str) -> tuple[str, list]:
    retriever = db.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4}
    )

    retrieval_qa_prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer based on the provided context."),
        ("human", "Context: {context}"),
        ("human", "{input}")
    ])

    combine_docs_chain = create_stuff_documents_chain(
        st.session_state.llm, retrieval_qa_prompt
    )
    chain = create_retrieval_chain(retriever, combine_docs_chain)
    result = chain.invoke({"input": question})
    return result["answer"], result["context"]
  7. Web search fallback:

def create_fallback_agent(chat_model: BaseLanguageModel):
    def web_research(query: str) -> str:
        """Search the web for information about a query."""
        search = DuckDuckGoSearchRun()
        return search.run(query)

    agent = create_react_agent(
        model=chat_model,
        tools=[web_research]
    )
    return agent
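
Conceptually, the app's whole answering strategy is a fallback chain: try the routed database first, then the web-search agent. A hypothetical, distilled sketch of that pattern (not the app's actual code):

```python
def answer_with_fallback(question, handlers):
    """Try each answer source in order; skip to the next on failure or empty result."""
    for handler in handlers:
        try:
            answer = handler(question)
        except Exception:
            continue  # a broken source shouldn't take down the whole app
        if answer:
            return answer
    return "Sorry, I couldn't find an answer."
```

In the app, `handlers` would be the routed database query followed by the DuckDuckGo agent.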
  8. Streamlit interface setup:

def main():
    st.set_page_config(page_title="RAG Agent with Routing")

    with st.sidebar:
        st.header("Configuration")
        api_key = st.text_input("OpenAI API Key", type="password")
        qdrant_url = st.text_input("Qdrant URL")
        qdrant_api_key = st.text_input("Qdrant API Key", type="password")
  9. Document upload interface:

tabs = st.tabs([config.name for config in COLLECTIONS.values()])
for (collection_type, config), tab in zip(COLLECTIONS.items(), tabs):
    with tab:
        uploaded_files = st.file_uploader(
            f"Upload to {config.name}",
            type="pdf",
            accept_multiple_files=True
        )
  10. Document processing and storage:

if uploaded_files:
    with st.spinner('Processing...'):
        all_texts = []
        for file in uploaded_files:
            texts = process_document(file)
            all_texts.extend(texts)
            
        db = st.session_state.databases[collection_type]
        db.add_documents(all_texts)
  11. Query handling:

question = st.text_input("Enter your question:")
if question:
    collection_type = route_query(question)
    
    if collection_type:
        db = st.session_state.databases[collection_type]
        answer, docs = query_database(db, question)
    else:
        answer, docs = _handle_web_fallback(question)
  12. Response display:

st.write("### Answer")
st.write(answer)

if docs:
    with st.expander("Sources"):
        for doc in docs:
            st.markdown(f"- {doc.page_content[:200]}...")
  13. Error handling and fallbacks:

try:
    # Main query logic
    answer, docs = query_database(db, question)
except Exception as e:
    st.error(f"Error: {str(e)}")
    answer, docs = _handle_web_fallback(question)

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run rag_database_routing.py

Working Application Demo

Conclusion

You've successfully built a production-ready RAG system with intelligent database routing! Unlike basic RAG implementations, your system now intelligently directs queries to specialized databases, making it vastly more efficient and accurate.

For further enhancements, you can:

  • Implement cross-database queries when questions span multiple domains

  • Add routing history analysis to improve future routing decisions

  • Add support for more document formats (DOCX, HTML, Markdown)

  • Implement caching for frequently accessed documents

  • Implement source citations in responses

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

