Build a RAG Agent with Database Routing
Fully functional agentic RAG app using GPT-4o (step-by-step instructions)
Imagine you're building a customer service AI that needs to handle queries about products, support issues, and financial matters. You could dump all your documents into a single vector database, but that would be like having a library where cookbooks, technical manuals, and financial reports are all mixed together. Not very efficient, right?
Traditional RAG systems treat all documents uniformly, leading to slower search times and diluted results. What if we could automatically route queries to the most relevant database while maintaining high performance?
In this tutorial, we'll build a sophisticated RAG system with intelligent database routing that uses multiple specialized vector databases (product info, customer support, financial data) with an agent-based router to direct queries to the most relevant database. When no relevant documents are found, it gracefully falls back to web search using DuckDuckGo.
The app uses:
Langchain for RAG orchestration
Phidata as the router agent to determine the most relevant database for a given query
LangGraph as a fallback mechanism, utilizing DuckDuckGo for web research when necessary
Streamlit for a user-friendly interface for document upload and querying
Qdrant for storing and retrieving document embeddings
GPT-4o for answer synthesis
What We’re Building
We’re building a RAG app with database routing capabilities that allows users to upload multiple documents to three different databases: Product Information, Customer Support & FAQ, and Financial Information. The user can query the uploaded information in natural language, and the app will route to the most relevant database.
Features:
Query Routing - The system uses a three-stage routing approach:
Primary: Vector similarity scoring across all databases, with a 0.5 confidence threshold
Fallback: A GPT-4o-powered routing agent that handles ambiguous queries when similarity confidence is low
Last resort: Web search for topics not covered by any database
Document Processing
Automatic text extraction from PDFs
Smart text chunking with overlap
Vector embedding generation
Answer Generation
Context-aware retrieval
Confidence-based responses
Prerequisites
Before we begin, make sure you have the following:
Python installed on your machine
An OpenAI API key
A Qdrant Cloud account (you'll need the cluster URL and an API key)
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the rag_database_routing folder:
cd rag_tutorials/rag_database_routing
Install the required dependencies:
pip install -r requirements.txt
Get your API Key:
Obtain an OpenAI API key and set it in the application.
Qdrant Cloud Setup - Visit Qdrant Cloud > Create an account or sign in > Create a new cluster > Get your credentials:
- Qdrant API Key: Found in the API Keys section
- Qdrant URL: Your cluster URL (format: https://xxx-xxx.aws.cloud.qdrant.io)
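Optionally, verify the credentials before wiring them into the app. Here's a quick standalone check (the URL and key placeholders are yours to fill in):
from qdrant_client import QdrantClient

# Connect with the credentials from your Qdrant Cloud dashboard
client = QdrantClient(
    url="https://xxx-xxx.aws.cloud.qdrant.io",  # your cluster URL
    api_key="your-qdrant-api-key",
)

# Listing collections succeeds only if the credentials are valid
print(client.get_collections())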
Creating the Streamlit App
Let's create our app. Create a new file rag_database_routing.py and add the following code:
Let's set up our imports and configurations:
import streamlit as st
import tempfile
from dataclasses import dataclass
from typing import List, Optional
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Qdrant
from langchain_community.document_loaders import PyPDFLoader
from qdrant_client import QdrantClient

# Define database types and configurations
@dataclass
class CollectionConfig:
    name: str             # display name shown in the UI
    description: str      # used by the routing agent to pick a database
    collection_name: str  # Qdrant collection name

# Example values; adjust the descriptions to match your documents
COLLECTIONS = {
    "products": CollectionConfig(
        name="Product Information",
        description="Product details, specifications, and features",
        collection_name="products_collection"),
    "support": CollectionConfig(
        name="Customer Support & FAQ",
        description="Support questions, troubleshooting, and FAQs",
        collection_name="support_collection"),
    "finance": CollectionConfig(
        name="Financial Information",
        description="Financial reports, revenue, and budgets",
        collection_name="finance_collection")
}
Initialize session state and models:
def initialize_models():
    """Set up the embeddings model, LLM, and Qdrant client from session state."""
    if st.session_state.openai_api_key:
        st.session_state.embeddings = OpenAIEmbeddings()
        st.session_state.llm = ChatOpenAI(temperature=0)
        client = QdrantClient(
            url=st.session_state.qdrant_url,
            api_key=st.session_state.qdrant_api_key
        )
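The snippet above stops after creating the Qdrant client, but the rest of the app reads vector stores from st.session_state.databases. Here's a minimal sketch of how those stores might be initialized (the initialize_vectorstores helper and the 1536-dimension vector size for OpenAI's default embedding model are assumptions, not code from the repo):
from qdrant_client.models import VectorParams, Distance

def initialize_vectorstores(client: QdrantClient):
    # Hypothetical helper: populates st.session_state.databases used below
    st.session_state.databases = {}
    for db_type, config in COLLECTIONS.items():
        # Create the Qdrant collection on first run
        if not client.collection_exists(config.collection_name):
            client.create_collection(
                collection_name=config.collection_name,
                vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
            )
        st.session_state.databases[db_type] = Qdrant(
            client=client,
            collection_name=config.collection_name,
            embeddings=st.session_state.embeddings
        )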
Document processing pipeline:
def process_document(file) -> List[Document]:
    """Extract text from an uploaded PDF and split it into overlapping chunks."""
    with tempfile.NamedTemporaryFile(suffix='.pdf') as tmp_file:
        tmp_file.write(file.getvalue())
        tmp_file.flush()  # make sure all bytes are on disk before loading
        loader = PyPDFLoader(tmp_file.name)
        documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,   # characters per chunk
        chunk_overlap=200  # overlap preserves context across chunk boundaries
    )
    return text_splitter.split_documents(documents)
Create the routing agent with Phidata:
from phi.agent import Agent
from phi.model.openai import OpenAIChat

def create_routing_agent() -> Agent:
    """Phidata agent that replies with only the target database name."""
    return Agent(
        model=OpenAIChat(id="gpt-4o"),
        description="Query routing expert",
        instructions=[
            "Route questions to: products, support, or finance",
            "Return ONLY the database name",
            "Consider question context carefully"
        ]
    )
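You can sanity-check the router on its own; in Phidata, Agent.run() returns a response object whose content attribute carries the model's text (the sample question here is illustrative):
routing_agent = create_routing_agent()
response = routing_agent.run("How do I reset my password?")
print(response.content)  # expected output: "support"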
Implement smart query routing:
def route_query(question: str) -> Optional[str]:
    """Return the best-matching database key, or None to trigger web fallback."""
    confidence_threshold = 0.5
    best_score, best_db_type = 0.0, None
    # Try vector similarity routing first
    for db_type, db in st.session_state.databases.items():
        results = db.similarity_search_with_score(question, k=3)
        if results:
            avg_score = sum(score for _, score in results) / len(results)
            if avg_score > best_score:
                best_score, best_db_type = avg_score, db_type
    # Fallback to LLM routing if low confidence
    if best_score < confidence_threshold:
        routing_agent = create_routing_agent()
        response = routing_agent.run(question)
        db_type = response.content.strip().lower()
        return db_type if db_type in COLLECTIONS else None
    return best_db_type
Database querying logic:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

def query_database(db: Qdrant, question: str) -> tuple[str, list]:
    """Retrieve relevant chunks and synthesize an answer with the LLM."""
    retriever = db.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4}
    )
    retrieval_qa_prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer based on context"),
        ("human", "{context}"),
        ("human", "{input}")
    ])
    # Chain that stuffs retrieved docs into the prompt for the LLM
    combine_docs_chain = create_stuff_documents_chain(
        st.session_state.llm, retrieval_qa_prompt
    )
    chain = create_retrieval_chain(retriever, combine_docs_chain)
    result = chain.invoke({"input": question})
    return result["answer"], result["context"]
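Called directly, it returns the synthesized answer plus the retrieved chunks, which the UI later lists as sources (the database key and question are illustrative):
db = st.session_state.databases["products"]
answer, docs = query_database(db, "What colors does this product come in?")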
Web search fallback:
from langchain_core.language_models import BaseLanguageModel
from langchain_community.tools import DuckDuckGoSearchRun
from langgraph.prebuilt import create_react_agent

def create_fallback_agent(chat_model: BaseLanguageModel):
    """LangGraph ReAct agent that can research a query on the web."""
    def web_research(query: str) -> str:
        """Search the web for information about the query."""
        search = DuckDuckGoSearchRun(num_results=5)
        return search.run(query)

    agent = create_react_agent(
        model=chat_model,
        tools=[web_research]
    )
    return agent
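The query-handling code below calls a _handle_web_fallback helper that wraps this agent. It never appears in the snippets above, so here is a minimal sketch of what it might look like (the message format and the empty sources list are assumptions):
def _handle_web_fallback(question: str) -> tuple[str, list]:
    # Hypothetical helper: run the web-search agent and return (answer, sources)
    agent = create_fallback_agent(st.session_state.llm)
    result = agent.invoke({"messages": [("human", question)]})
    answer = result["messages"][-1].content
    return answer, []  # no local documents to cite for web results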
Streamlit interface setup:
def main():
    st.set_page_config(page_title="RAG Agent with Routing")
    with st.sidebar:
        st.header("Configuration")
        api_key = st.text_input("OpenAI API Key", type="password")
        qdrant_url = st.text_input("Qdrant URL")
        qdrant_api_key = st.text_input("Qdrant API Key", type="password")
Document upload interface:
tabs = st.tabs([config.name for config in COLLECTIONS.values()])
for (collection_type, config), tab in zip(COLLECTIONS.items(), tabs):
    with tab:
        uploaded_files = st.file_uploader(
            f"Upload to {config.name}",
            type="pdf",
            accept_multiple_files=True
        )
Document processing and storage:
if uploaded_files:
    with st.spinner('Processing...'):
        all_texts = []
        for file in uploaded_files:
            texts = process_document(file)
            all_texts.extend(texts)
        db = st.session_state.databases[collection_type]
        db.add_documents(all_texts)
Query handling:
question = st.text_input("Enter your question:")
if question:
    collection_type = route_query(question)
    if collection_type:
        db = st.session_state.databases[collection_type]
        answer, docs = query_database(db, question)
    else:
        answer, docs = _handle_web_fallback(question)
Response display:
st.write("### Answer")
st.write(answer)
if docs:
with st.expander("Sources"):
for doc in docs:
st.markdown(f"- {doc.page_content[:200]}...")
Error handling and fallbacks:
try:
    ...  # main query logic (routing, retrieval, answer generation)
except Exception as e:
    st.error(f"Error: {str(e)}")
    answer, docs = _handle_web_fallback(question)
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run rag_database_routing.py
Streamlit will provide a local URL (typically http://localhost:8501).
Conclusion
You've successfully built a production-ready RAG system with intelligent database routing! Unlike basic RAG implementations, your system now intelligently directs queries to specialized databases, making it vastly more efficient and accurate.
For further enhancements, you can:
Implement cross-database queries when questions span multiple domains
Add routing history analysis to improve future routing decisions
Add support for more document formats (DOCX, HTML, Markdown)
Implement caching for frequently accessed documents
Implement source citations in responses
Keep experimenting and refining to build smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.