Build a RAG App with Hybrid Search using Claude 3.5 Sonnet
Fully functional RAG app using Claude 3.5 Sonnet, OpenAI embeddings, and PostgreSQL (step-by-step instructions)
Traditional chatbots can either access general knowledge or search through specific documents - but rarely do both well. Modern applications need the ability to intelligently combine document search with language model capabilities. Enter Hybrid Search RAG (Retrieval Augmented Generation), a powerful approach that combines the best of both worlds.
In this tutorial, we'll build a sophisticated document Q&A system that seamlessly combines document-specific knowledge with Claude's general intelligence to deliver accurate and contextual responses. It:
Allows users to upload PDF files
Automatically creates text chunks and embeddings
Uses Hybrid Search to find relevant information in documents
Uses Claude for high-quality responses
Falls back to Claude's general knowledge when needed
Provides an intuitive chat interface
What is Hybrid Search RAG?
Hybrid Search RAG combines two search strategies with retrieval-augmented generation:
Semantic Search: Uses embeddings to find contextually similar content
Keyword Search: Finds exact or close matches to specific terms
RAG (Retrieval Augmented Generation): Uses the retrieved content to generate accurate, contextual responses
This combination helps overcome limitations of each approach:
Pure semantic search might miss exact matches
Pure keyword search might miss contextually relevant content
RAG ensures the language model's responses are grounded in your documents
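To make the fusion concrete, here is a minimal, purely illustrative sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking and a semantic ranking into a single list. RAGLite performs its own fusion internally, so you never write this yourself; the chunk IDs below are hypothetical:
# Illustrative only: merge two ranked lists of chunk IDs with
# reciprocal rank fusion (RRF). Chunks ranked highly by either
# method accumulate score; chunks both methods agree on rise to the top.
def reciprocal_rank_fusion(keyword_ids, semantic_ids, k=60):
    scores = {}
    for ranking in (keyword_ids, semantic_ids):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: keyword search finds exact matches for "payment terms",
# while semantic search surfaces contextually related chunks.
keyword_hits = ["chunk_12", "chunk_03", "chunk_41"]
semantic_hits = ["chunk_03", "chunk_77", "chunk_12"]
print(reciprocal_rank_fusion(keyword_hits, semantic_hits))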
Key Components
1. RAGLite
RAGLite is the foundation - a Python toolkit for RAG that provides:
Document processing and chunking
Vector and keyword search capabilities
Integration with various LLM providers
Database storage (PostgreSQL or SQLite)
2. Model Stack
We use three different models, each specialized for a specific task:
Claude 3.5 Sonnet: Main language model for generating responses
OpenAI text-embedding-3-large: Creates embeddings for semantic search
Cohere Reranker: Improves search result relevance by reordering results
3. Database
Supports multiple options:
PostgreSQL (recommended for production)
SQLite (great for development)
Stores both text and vector embeddings
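The backend is selected purely through the connection URL you pass to RAGLite later as db_url. Both shapes, with placeholder credentials:
# PostgreSQL (production) - e.g. a Neon connection string (placeholders only)
db_url = "postgresql://user:password@host/dbname"
# SQLite (development) - a local file, no server required
db_url = "sqlite:///raglite.sqlite"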
Prerequisites
Before we begin, make sure you have the following:
1. Database:
Create a free PostgreSQL database at Neon:
Sign up/Login at Neon
Create a new project
Copy the connection string (it looks like postgresql://user:password@host/dbname)
2. API Keys:
OpenAI API key for embeddings
Anthropic API key for Claude
Cohere API key for reranking
3. Software Requirements:
Python installed on your machine (version 3.7 or higher is recommended)
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the hybrid_search_rag folder:
cd rag_tutorials/hybrid_search_rag
Install the required dependencies:
pip install -r requirements.txt
Install spaCy Model:
pip install https://github.com/explosion/spacy-models/releases/download/xx_sent_ud_sm-3.7.0/xx_sent_ud_sm-3.7.0-py3-none-any.whl
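To verify the model installed correctly, you can try loading it by its package name:
python -c "import spacy; spacy.load('xx_sent_ud_sm')"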
Creating the Streamlit App
Let’s create our app. Create a new file main.py and add the following code:
Import required libraries and setup:
• RAGLite for core RAG functionality
• Anthropic for LLM (Claude 3.5 Sonnet)
• Reranking capabilities using Cohere
import os
import logging
import time
import warnings
from pathlib import Path
from typing import List

import anthropic
import streamlit as st
from raglite import RAGLiteConfig, insert_document, hybrid_search, retrieve_chunks, rerank_chunks, rag
from rerankers import Reranker

# Module-level logger used by the helper functions below
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
Set up the RAGLite configuration with multi-LLM support:
def initialize_config(openai_key: str, anthropic_key: str, cohere_key: str, db_url: str) -> RAGLiteConfig:
    try:
        # Make the keys available to RAGLite's underlying providers
        os.environ["OPENAI_API_KEY"] = openai_key
        os.environ["ANTHROPIC_API_KEY"] = anthropic_key
        os.environ["COHERE_API_KEY"] = cohere_key
        return RAGLiteConfig(
            db_url=db_url,
            llm="claude-3-5-sonnet-20240620",
            embedder="text-embedding-3-large",
            embedder_normalize=True,
            chunk_max_size=2000,
            embedder_sentence_window_size=2,
            reranker=Reranker("cohere", api_key=cohere_key, lang="en")
        )
    except Exception as e:
        raise ValueError(f"Configuration error: {e}")
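As a quick sanity check, you can exercise this function outside Streamlit. The snippet below is illustrative only and assumes your API keys are already exported in the environment:
# Illustrative standalone usage (not part of the app itself)
config = initialize_config(
    openai_key=os.environ["OPENAI_API_KEY"],
    anthropic_key=os.environ["ANTHROPIC_API_KEY"],
    cohere_key=os.environ["COHERE_API_KEY"],
    db_url="sqlite:///raglite.sqlite",  # local SQLite for a quick test
)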
Implement document processing pipeline:
def process_document(file_path: str) -> bool:
    try:
        if not st.session_state.get('my_config'):
            raise ValueError("Configuration not initialized")
        insert_document(Path(file_path), config=st.session_state.my_config)
        return True
    except Exception as e:
        logger.error(f"Error processing document: {str(e)}")
        return False
Create hybrid search functionality:
def perform_search(query: str) -> List[dict]:
    try:
        chunk_ids, scores = hybrid_search(query, num_results=10, config=st.session_state.my_config)
        if not chunk_ids:
            return []
        chunks = retrieve_chunks(chunk_ids, config=st.session_state.my_config)
        return rerank_chunks(query, chunks, config=st.session_state.my_config)
    except Exception as e:
        logger.error(f"Search error: {str(e)}")
        return []
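With documents ingested, retrieval is a one-liner. The query string below is just an example:
# Illustrative: retrieve and print the top reranked chunks for a query
results = perform_search("What does the contract say about payment terms?")
for chunk in results[:3]:
    print(chunk)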
Implement intelligent fallback with Claude:
def handle_fallback(query: str) -> str:
    try:
        client = anthropic.Anthropic(api_key=st.session_state.user_env["ANTHROPIC_API_KEY"])
        system_prompt = """You are a helpful AI assistant. When you don't know something,
        be honest about it. Provide clear, concise, and accurate responses. If the question
        is not related to any specific document, use your general knowledge to answer."""
        message = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": query}],
            temperature=0.7
        )
        return message.content[0].text
    except Exception as e:
        logger.error(f"Fallback error: {str(e)}")
        return "I apologize, but I couldn't generate a response. Please try again."
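Together, perform_search and handle_fallback give the app its routing policy: answer from documents when retrieval finds something, otherwise fall back to general knowledge. A minimal sketch of that decision (the full app makes this choice inside its chat handler):
def route_query(query: str):
    # Prefer answers grounded in the uploaded documents
    chunks = perform_search(query)
    if chunks:
        return "rag", chunks  # feed these into the RAG pipeline
    return "fallback", handle_fallback(query)  # nothing relevant found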
Set up Streamlit interface with multi-key configuration:
def main():
    st.set_page_config(page_title="LLM-Powered Hybrid Search-RAG Assistant", layout="wide")
    # Initialize all session state variables with sensible defaults
    defaults = {'chat_history': [], 'documents_loaded': False, 'my_config': None, 'user_env': {}}
    for state_var, default in defaults.items():
        if state_var not in st.session_state:
            st.session_state[state_var] = default
    with st.sidebar:
        st.title("Configuration")
        openai_key = st.text_input("OpenAI API Key", value=st.session_state.get('openai_key', ''), type="password", placeholder="sk-...")
        anthropic_key = st.text_input("Anthropic API Key", value=st.session_state.get('anthropic_key', ''), type="password", placeholder="sk-ant-...")
        cohere_key = st.text_input("Cohere API Key", value=st.session_state.get('cohere_key', ''), type="password", placeholder="Enter Cohere key")
        db_url = st.text_input("Database URL", value=st.session_state.get('db_url', 'sqlite:///raglite.sqlite'), placeholder="sqlite:///raglite.sqlite")
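These inputs still need to be wired into initialize_config once the user submits them. A minimal sketch of that wiring (the button label is illustrative), which also populates the user_env dict that handle_fallback reads:
        if st.button("Save Configuration"):
            try:
                st.session_state.my_config = initialize_config(openai_key, anthropic_key, cohere_key, db_url)
                # handle_fallback reads the Anthropic key from user_env
                st.session_state.user_env = {"ANTHROPIC_API_KEY": anthropic_key}
                st.success("Configuration saved!")
            except Exception as e:
                st.error(f"Configuration error: {e}")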
Implement document upload and processing:
    uploaded_files = st.file_uploader("Upload PDF documents", type=["pdf"], accept_multiple_files=True, key="pdf_uploader")
    if uploaded_files:
        success = False
        for uploaded_file in uploaded_files:
            with st.spinner(f"Processing {uploaded_file.name}..."):
                # Write the upload to a temporary file so RAGLite can read it from disk
                temp_path = f"temp_{uploaded_file.name}"
                with open(temp_path, "wb") as f:
                    f.write(uploaded_file.getvalue())
                if process_document(temp_path):
                    st.success(f"Successfully processed: {uploaded_file.name}")
                    success = True
                else:
                    st.error(f"Failed to process: {uploaded_file.name}")
                os.remove(temp_path)  # Clean up the temporary file either way
Create chat interface with history:
    # Replay the conversation so far
    for msg in st.session_state.chat_history:
        with st.chat_message("user"):
            st.write(msg[0])
        with st.chat_message("assistant"):
            st.write(msg[1])
    user_input = st.chat_input("Ask a question about the documents...")
Implement RAG response generation:
    # Flatten (user, assistant) pairs into the alternating message list the API expects
    formatted_messages = [
        {"role": "user" if i % 2 == 0 else "assistant", "content": msg}
        for i, msg in enumerate([m for pair in st.session_state.chat_history for m in pair])
        if msg
    ]
    # RAG_SYSTEM_PROMPT is a string constant defined at module level in the full
    # script; it instructs the model to answer from the retrieved context.
    response_stream = rag(
        prompt=user_input,
        system_prompt=RAG_SYSTEM_PROMPT,
        search=hybrid_search,
        messages=formatted_messages,
        max_contexts=5,
        config=st.session_state.my_config
    )
Add streaming response handling:
    # Render tokens as they arrive, with a cursor to show generation in progress
    message_placeholder = st.empty()
    full_response = ""
    for chunk in response_stream:
        full_response += chunk
        message_placeholder.markdown(full_response + "▌")
    message_placeholder.markdown(full_response)
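After streaming completes, the exchange should be stored so the history loop above can replay it on the next rerun. A minimal sketch:
    # Persist the finished exchange in the chat history
    st.session_state.chat_history.append((user_input, full_response))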
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run main.py
Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter your API keys and database URL in the sidebar, upload a PDF, and start asking questions about your documents.
Working Application Demo
Conclusion
You've successfully built a sophisticated document Q&A system that combines:
Hybrid search for better document retrieval
Multiple specialized AI models working together
Automatic fallback for general knowledge questions
This foundation can be enhanced in several ways:
Add support for more document formats beyond PDF (see the sketch after this list)
Implement memory to maintain conversation context
Create a citation system to track source documents
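As a starting point for the first of these, the uploader widget can simply accept more extensions. This is a hypothetical extension, and it assumes the ingestion pipeline (insert_document) can parse the formats you allow:
    # Hypothetical extension: accept more formats than PDF, assuming
    # insert_document can parse them
    uploaded_files = st.file_uploader(
        "Upload documents",
        type=["pdf", "txt", "md"],
        accept_multiple_files=True,
        key="doc_uploader",
    )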
Keep experimenting and refining to build smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.