
Build a RAG App with Hybrid Search using Claude 3.5 Sonnet

Fully functional RAG app using Claude 3.5 Sonnet, OpenAI embeddings, and PostgreSQL (step-by-step instructions)

Traditional chatbots can either access general knowledge or search through specific documents - but rarely do both well. Modern applications need the ability to intelligently combine document search with language model capabilities. Enter Hybrid Search RAG (Retrieval Augmented Generation), a powerful approach that combines the best of both worlds.

In this tutorial, we'll build a sophisticated document Q&A system that seamlessly combines document-specific knowledge with Claude's general intelligence to deliver accurate and contextual responses. It:

  • Allows users to upload PDF files

  • Automatically creates text chunks and embeddings

  • Uses Hybrid Search to find relevant information in documents

  • Uses Claude for high-quality responses

  • Falls back to Claude's general knowledge when needed

  • Provides an intuitive chat interface

What is Hybrid Search RAG?

Hybrid Search RAG combines three complementary approaches:

  1. Semantic Search: Uses embeddings to find contextually similar content

  2. Keyword Search: Finds exact or close matches to specific terms

  3. RAG (Retrieval Augmented Generation): Uses the retrieved content to generate accurate, contextual responses

This combination helps overcome the limitations of each approach (a short fusion sketch follows this list):

  • Pure semantic search might miss exact matches

  • Pure keyword search might miss contextually relevant content

  • RAG ensures the language model's responses are grounded in your documents
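
To make this concrete, here is a minimal, self-contained sketch of Reciprocal Rank Fusion (RRF), one common way to merge a keyword ranking with a semantic ranking. RAGLite handles this fusion internally, so the snippet is purely illustrative and the document IDs are made up:

# Illustrative only: Reciprocal Rank Fusion merges two ranked result lists.
# RAGLite performs its own fusion internally; the doc IDs are hypothetical.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each document by summing 1 / (k + rank) across all rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_3", "doc_1", "doc_7"]   # ranked by exact-term match
semantic_hits = ["doc_1", "doc_5", "doc_3"]  # ranked by embedding similarity
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# doc_1 and doc_3 surface first because both searches agree on them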


Key Components

1. RAGLite

RAGLite is the foundation - a Python toolkit for RAG that provides the following (a compact workflow sketch follows this list):

  • Document processing and chunking

  • Vector and keyword search capabilities

  • Integration with various LLM providers

  • Database storage (PostgreSQL or SQLite)
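
Here's a compact sketch of that workflow using the same RAGLite functions this tutorial imports later. Treat it as an outline: it assumes API keys are already set in the environment, and the full model configuration is covered step by step below.

# Outline of RAGLite's core workflow (assumes API keys are set in the
# environment; the tutorial configures the models explicitly later).
from pathlib import Path
from raglite import RAGLiteConfig, insert_document, hybrid_search, retrieve_chunks

config = RAGLiteConfig(db_url="sqlite:///raglite.sqlite")
insert_document(Path("my_paper.pdf"), config=config)    # chunk, embed, and store
chunk_ids, scores = hybrid_search("hybrid search", num_results=5, config=config)
chunks = retrieve_chunks(chunk_ids, config=config)      # fetch the matching text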

2. Model Stack

We use three different models, each specialized for a specific task:

  • Claude 3.5 Sonnet: Main language model for generating responses

  • OpenAI text-embedding-3-large: Creates embeddings for semantic search

  • Cohere Reranker: Improves search result relevance by reordering results

3. Database

Supports multiple options:

  • PostgreSQL (recommended for production)

  • SQLite (great for development)

  • Both options store text chunks and their vector embeddings

Prerequisites

Before we begin, make sure you have the following:

1. Database:

Create a free PostgreSQL database at Neon:

  1. Sign up/Login at Neon

  2. Create a new project

  3. Copy the connection string; it looks like postgresql://user:password@hostname/dbname (see the examples below)
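
For reference, both of the following forms work in the app's "Database URL" field. The Neon hostname below is a placeholder; use the exact string from your Neon dashboard:

# Placeholder example; copy the real connection string from your Neon dashboard
db_url = "postgresql://user:password@ep-cool-name-123456.us-east-2.aws.neon.tech/dbname"

# Or, for local development, a SQLite file that RAGLite creates on first use
db_url = "sqlite:///raglite.sqlite"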

2. API Keys:

  1. OpenAI API key for embeddings

  2. Anthropic API key for Claude

  3. Cohere API key for reranking

3. Software Requirements:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  3. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the hybrid_search_rag folder and install the dependencies:

cd rag_tutorials/hybrid_search_rag
pip install -r requirements.txt
  3. Install the spaCy model used for sentence segmentation:

pip install https://github.com/explosion/spacy-models/releases/download/xx_sent_ud_sm-3.7.0/xx_sent_ud_sm-3.7.0-py3-none-any.whl
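
To confirm the model installed correctly, you can load it once from a Python shell (standard spaCy usage; the sample text is arbitrary):

# Quick sanity check: load the multilingual sentence-segmentation model
import spacy

nlp = spacy.load("xx_sent_ud_sm")
doc = nlp("RAGLite splits documents into sentences. This is the second one.")
print([sent.text for sent in doc.sents])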

Creating the Streamlit App

Let’s create our app. Create a new file main.py and add the following code:

  1. Import required libraries and setup:

    • RAGLite for core RAG functionality

    • Anthropic for LLM (Claude 3.5 Sonnet)

    • Reranking capabilities using Cohere

import os
import logging
import streamlit as st
from raglite import RAGLiteConfig, insert_document, hybrid_search, retrieve_chunks, rerank_chunks, rag
from rerankers import Reranker
from typing import List
from pathlib import Path
import anthropic
import time
import warnings

# Module-level logger used by the functions below
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
  2. Set up the RAGLite configuration with multi-LLM support:

def initialize_config(openai_key: str, anthropic_key: str, cohere_key: str, db_url: str) -> RAGLiteConfig:
    try:
        os.environ["OPENAI_API_KEY"] = openai_key
        os.environ["ANTHROPIC_API_KEY"] = anthropic_key
        os.environ["COHERE_API_KEY"] = cohere_key

        return RAGLiteConfig(
            db_url=db_url,
            llm="claude-3-5-sonnet-20240620",
            embedder="text-embedding-3-large",
            embedder_normalize=True,
            chunk_max_size=2000,
            embedder_sentence_window_size=2,
            reranker=Reranker("cohere", api_key=cohere_key, lang="en")
        )
    except Exception as e:
        raise ValueError(f"Configuration error: {str(e)}")
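
To show how this function is meant to be used: the app calls it once after the user saves their keys and caches the result in Streamlit's session state, where the functions below look it up. This is a sketch; the actual sidebar wiring appears in step 6:

# Sketch: create the config once and cache it in session state so that
# process_document, perform_search, and rag can reach it later.
if st.session_state.my_config is None:
    st.session_state.my_config = initialize_config(
        openai_key=openai_key,        # values collected in the sidebar (step 6)
        anthropic_key=anthropic_key,
        cohere_key=cohere_key,
        db_url=db_url,
    )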
  3. Implement the document processing pipeline:

def process_document(file_path: str) -> bool:
    try:
        if not st.session_state.get('my_config'):
            raise ValueError("Configuration not initialized")
        insert_document(Path(file_path), config=st.session_state.my_config)
        return True
    except Exception as e:
        logger.error(f"Error processing document: {str(e)}")
        return False
  4. Create the hybrid search functionality:

def perform_search(query: str) -> List[dict]:
    try:
        chunk_ids, scores = hybrid_search(query, num_results=10, config=st.session_state.my_config)
        if not chunk_ids:
            return []
        chunks = retrieve_chunks(chunk_ids, config=st.session_state.my_config)
        return rerank_chunks(query, chunks, config=st.session_state.my_config)
    except Exception as e:
        logger.error(f"Search error: {str(e)}")
        return []
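
As a quick illustration, once a document has been uploaded and indexed you could call this directly. The query string is just an example, and the exact shape of each chunk depends on the RAGLite version:

# Example usage; assumes a document has already been processed and indexed
results = perform_search("What are the paper's key findings?")
for chunk in results[:3]:
    print(chunk)  # reranked chunks, most relevant first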
  5. Implement an intelligent fallback with Claude:

def handle_fallback(query: str) -> str:
    try:
        client = anthropic.Anthropic(api_key=st.session_state.user_env["ANTHROPIC_API_KEY"])
        system_prompt = """You are a helpful AI assistant. When you don't know something,
        be honest about it. Provide clear, concise, and accurate responses. If the question
        is not related to any specific document, use your general knowledge to answer."""

        message = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": query}],
            temperature=0.7
        )
        return message.content[0].text
    except Exception as e:
        logger.error(f"Fallback error: {str(e)}")
        return "I'm sorry, I ran into an error while answering. Please try again."
  6. Set up the Streamlit interface with multi-key configuration:

def main():
    st.set_page_config(page_title="LLM-Powered Hybrid Search-RAG Assistant", layout="wide")
    
    # Initialize session state with a sensible default for each variable
    defaults = {'chat_history': [], 'documents_loaded': False, 'my_config': None, 'user_env': {}}
    for state_var, default in defaults.items():
        if state_var not in st.session_state:
            st.session_state[state_var] = default

    with st.sidebar:
        st.title("Configuration")
        openai_key = st.text_input("OpenAI API Key", value=st.session_state.get('openai_key', ''), type="password", placeholder="sk-...")
        anthropic_key = st.text_input("Anthropic API Key", value=st.session_state.get('anthropic_key', ''), type="password", placeholder="sk-ant-...")
        cohere_key = st.text_input("Cohere API Key", value=st.session_state.get('cohere_key', ''), type="password", placeholder="Enter Cohere key")
        db_url = st.text_input("Database URL", value=st.session_state.get('db_url', 'sqlite:///raglite.sqlite'), placeholder="sqlite:///raglite.sqlite")
  7. Implement document upload and processing:

        uploaded_files = st.file_uploader("Upload PDF documents", type=["pdf"], accept_multiple_files=True, key="pdf_uploader")

        if uploaded_files:
            success = False
            for uploaded_file in uploaded_files:
                with st.spinner(f"Processing {uploaded_file.name}..."):
                    temp_path = f"temp_{uploaded_file.name}"
                    with open(temp_path, "wb") as f:
                        f.write(uploaded_file.getvalue())
                    
                    if process_document(temp_path):
                        st.success(f"Successfully processed: {uploaded_file.name}")
                        success = True
                    else:
                        st.error(f"Failed to process: {uploaded_file.name}")
                    os.remove(temp_path)
  8. Create the chat interface with history (this lives in the main area, after the sidebar block):

    # Render prior conversation turns
    for msg in st.session_state.chat_history:
        with st.chat_message("user"):
            st.write(msg[0])
        with st.chat_message("assistant"):
            st.write(msg[1])

    user_input = st.chat_input("Ask a question about the documents...")
  9. Implement RAG response generation:

    # When the user submits a question, rebuild the message history in the
    # alternating user/assistant format and stream a document-grounded answer.
    if user_input:
        formatted_messages = [{"role": "user" if i % 2 == 0 else "assistant", "content": msg}
                              for i, msg in enumerate([m for pair in st.session_state.chat_history for m in pair]) if msg]

        response_stream = rag(prompt=user_input,
                              system_prompt=RAG_SYSTEM_PROMPT,  # system prompt constant defined elsewhere in main.py
                              search=hybrid_search,
                              messages=formatted_messages,
                              max_contexts=5,
                              config=st.session_state.my_config)
  10. Add streaming response handling:

        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            for chunk in response_stream:
                full_response += chunk
                message_placeholder.markdown(full_response + "▌")  # typing cursor
            message_placeholder.markdown(full_response)  # final render without the cursor
            st.session_state.chat_history.append((user_input, full_response))

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run main.py
  • Streamlit will provide a local URL (typically http://localhost:8501). Open it in your browser, enter your API keys and database URL in the sidebar, upload a PDF, and start asking questions about your documents.


Conclusion

You've successfully built a sophisticated document Q&A system that combines the power of:

  • Hybrid search for better document retrieval

  • Multiple specialized AI models working together

  • Automatic fallback for general knowledge questions

This foundation can be enhanced in several ways:

  • Add support for more document formats beyond PDF

  • Implement memory to maintain conversation context

  • Create a citation system to track source documents

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
