
Build a RAG App with Hybrid Search using Claude 3.5 Sonnet

Fully functional RAG app using Claude 3.5 Sonnet, OpenAI embeddings, and PostgreSQL (step-by-step instructions)

Traditional chatbots can either access general knowledge or search through specific documents - but rarely do both well. Modern applications need the ability to intelligently combine document search with language model capabilities. Enter Hybrid Search RAG (Retrieval Augmented Generation), a powerful approach that combines the best of both worlds.

In this tutorial, we'll build a sophisticated document Q&A system that seamlessly combines document-specific knowledge with Claude's general intelligence to deliver accurate and contextual responses. It:

  • Allows users to upload PDF files

  • Automatically creates text chunks and embeddings

  • Uses Hybrid Search to find relevant information in documents

  • Uses Claude for high-quality responses

  • Falls back to Claude's general knowledge when needed

  • Provides an intuitive chat interface

What is Hybrid Search RAG?

Hybrid Search RAG combines three complementary approaches:

  1. Semantic Search: Uses embeddings to find contextually similar content

  2. Keyword Search: Finds exact or close matches to specific terms

  3. RAG (Retrieval Augmented Generation): Uses the retrieved content to generate accurate, contextual responses

This combination helps overcome the limitations of each approach (a short fusion sketch follows this list):

  • Pure semantic search might miss exact matches

  • Pure keyword search might miss contextually relevant content

  • RAG ensures the language model's responses are grounded in your documents
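
To make this concrete, here is a minimal, self-contained sketch of Reciprocal Rank Fusion (RRF), one common way to merge a keyword ranking with a semantic ranking. RAGLite handles this fusion internally, so the snippet is purely illustrative and the document IDs are made up:

# Illustrative only: Reciprocal Rank Fusion merges two ranked result lists.
# RAGLite performs its own fusion internally; the doc IDs are hypothetical.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each document by summing 1 / (k + rank) across all rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_3", "doc_1", "doc_7"]   # ranked by exact-term match
semantic_hits = ["doc_1", "doc_5", "doc_3"]  # ranked by embedding similarity
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# doc_1 and doc_3 surface first because both searches agree on them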


Key Components

1. RAGLite

RAGLite is the foundation - a Python toolkit for RAG that provides the following (a compact workflow sketch follows this list):

  • Document processing and chunking

  • Vector and keyword search capabilities

  • Integration with various LLM providers

  • Database storage (PostgreSQL or SQLite)
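
Here's a compact sketch of that workflow using the same RAGLite functions this tutorial imports later. Treat it as an outline: it assumes API keys are already set in the environment, and the full model configuration is covered step by step below.

# Outline of RAGLite's core workflow (assumes API keys are set in the
# environment; the tutorial configures the models explicitly later).
from pathlib import Path
from raglite import RAGLiteConfig, insert_document, hybrid_search, retrieve_chunks

config = RAGLiteConfig(db_url="sqlite:///raglite.sqlite")
insert_document(Path("my_paper.pdf"), config=config)    # chunk, embed, and store
chunk_ids, scores = hybrid_search("hybrid search", num_results=5, config=config)
chunks = retrieve_chunks(chunk_ids, config=config)      # fetch the matching text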

2. Model Stack

We use three different models, each specialized for a specific task:

  • Claude 3.5 Sonnet: Main language model for generating responses

  • OpenAI text-embedding-3-large: Creates embeddings for semantic search

  • Cohere Reranker: Improves search result relevance by reordering results

3. Database

Supports multiple options:

  • PostgreSQL (recommended for production)

  • SQLite (great for development)

  • Both options store text chunks and their vector embeddings

Prerequisites

Before we begin, make sure you have the following:

1. Database:

Create a free PostgreSQL database at Neon:

  1. Sign up/Login at Neon

  2. Create a new project

  3. Copy the connection string; it looks like postgresql://user:password@hostname/dbname (see the examples below)
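
For reference, both of the following forms work in the app's "Database URL" field. The Neon hostname below is a placeholder; use the exact string from your Neon dashboard:

# Placeholder example; copy the real connection string from your Neon dashboard
db_url = "postgresql://user:password@ep-cool-name-123456.us-east-2.aws.neon.tech/dbname"

# Or, for local development, a SQLite file that RAGLite creates on first use
db_url = "sqlite:///raglite.sqlite"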

2. API Keys:

  1. OpenAI API key for embeddings

  2. Anthropic API key for Claude

  3. Cohere API key for reranking

3. Software Requirements:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  3. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the hybrid_search_rag folder and install the dependencies:

cd rag_tutorials/hybrid_search_rag
pip install -r requirements.txt
  3. Install the spaCy model used for sentence segmentation:

pip install https://github.com/explosion/spacy-models/releases/download/xx_sent_ud_sm-3.7.0/xx_sent_ud_sm-3.7.0-py3-none-any.whl
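
To confirm the model installed correctly, you can load it once from a Python shell (standard spaCy usage; the sample text is arbitrary):

# Quick sanity check: load the multilingual sentence-segmentation model
import spacy

nlp = spacy.load("xx_sent_ud_sm")
doc = nlp("RAGLite splits documents into sentences. This is the second one.")
print([sent.text for sent in doc.sents])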

Creating the Streamlit App

Let’s create our app. Create a new file main.py and add the following code:

  1. Import required libraries and setup:

    • RAGLite for core RAG functionality

    • Anthropic for LLM (Claude 3.5 Sonnet)

    • Reranking capabilities using Cohere

import os
import logging
import streamlit as st
from raglite import RAGLiteConfig, insert_document, hybrid_search, retrieve_chunks, rerank_chunks, rag
from rerankers import Reranker
from typing import List
from pathlib import Path
import anthropic
import time
import warnings

# Module-level logger used by the functions below
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
  2. Set up the RAGLite configuration with multi-LLM support:

def initialize_config(openai_key: str, anthropic_key: str, cohere_key: str, db_url: str) -> RAGLiteConfig:
    try:
        os.environ["OPENAI_API_KEY"] = openai_key
        os.environ["ANTHROPIC_API_KEY"] = anthropic_key
        os.environ["COHERE_API_KEY"] = cohere_key

        return RAGLiteConfig(
            db_url=db_url,
            llm="claude-3-5-sonnet-20240620",
            embedder="text-embedding-3-large",
            embedder_normalize=True,
            chunk_max_size=2000,
            embedder_sentence_window_size=2,
            reranker=Reranker("cohere", api_key=cohere_key, lang="en")
        )
    except Exception as e:
        raise ValueError(f"Configuration error: {str(e)}")
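
To show how this function is meant to be used: the app calls it once after the user saves their keys and caches the result in Streamlit's session state, where the functions below look it up. This is a sketch; the actual sidebar wiring appears in step 6:

# Sketch: create the config once and cache it in session state so that
# process_document, perform_search, and rag can reach it later.
if st.session_state.my_config is None:
    st.session_state.my_config = initialize_config(
        openai_key=openai_key,        # values collected in the sidebar (step 6)
        anthropic_key=anthropic_key,
        cohere_key=cohere_key,
        db_url=db_url,
    )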
  3. Implement the document processing pipeline:

def process_document(file_path: str) -> bool:
    try:
        if not st.session_state.get('my_config'):
            raise ValueError("Configuration not initialized")
        insert_document(Path(file_path), config=st.session_state.my_config)
        return True
    except Exception as e:
        logger.error(f"Error processing document: {str(e)}")
        return False
  4. Create the hybrid search functionality:

def perform_search(query: str) -> List[dict]:
    try:
        chunk_ids, scores = hybrid_search(query, num_results=10, config=st.session_state.my_config)
        if not chunk_ids:
            return []
        chunks = retrieve_chunks(chunk_ids, config=st.session_state.my_config)
        return rerank_chunks(query, chunks, config=st.session_state.my_config)
    except Exception as e:
        logger.error(f"Search error: {str(e)}")
        return []
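
As a quick illustration, once a document has been uploaded and indexed you could call this directly. The query string is just an example, and the exact shape of each chunk depends on the RAGLite version:

# Example usage; assumes a document has already been processed and indexed
results = perform_search("What are the paper's key findings?")
for chunk in results[:3]:
    print(chunk)  # reranked chunks, most relevant first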
  5. Implement an intelligent fallback with Claude:

def handle_fallback(query: str) -> str:
    try:
        client = anthropic.Anthropic(api_key=st.session_state.user_env["ANTHROPIC_API_KEY"])
        system_prompt = """You are a helpful AI assistant. When you don't know something,
        be honest about it. Provide clear, concise, and accurate responses. If the question
        is not related to any specific document, use your general knowledge to answer."""

        message = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": query}],
            temperature=0.7
        )
        return message.content[0].text
    except Exception as e:
        logger.error(f"Fallback error: {str(e)}")
        return "I'm sorry, I ran into an error while answering. Please try again."
  6. Set up the Streamlit interface with multi-key configuration:

def main():
    st.set_page_config(page_title="LLM-Powered Hybrid Search-RAG Assistant", layout="wide")
    
    # Initialize session state with a sensible default for each variable
    defaults = {'chat_history': [], 'documents_loaded': False, 'my_config': None, 'user_env': {}}
    for state_var, default in defaults.items():
        if state_var not in st.session_state:
            st.session_state[state_var] = default

    with st.sidebar:
        st.title("Configuration")
        openai_key = st.text_input("OpenAI API Key", value=st.session_state.get('openai_key', ''), type="password", placeholder="sk-...")
        anthropic_key = st.text_input("Anthropic API Key", value=st.session_state.get('anthropic_key', ''), type="password", placeholder="sk-ant-...")
        cohere_key = st.text_input("Cohere API Key", value=st.session_state.get('cohere_key', ''), type="password", placeholder="Enter Cohere key")
        db_url = st.text_input("Database URL", value=st.session_state.get('db_url', 'sqlite:///raglite.sqlite'), placeholder="sqlite:///raglite.sqlite")
  7. Implement document upload and processing:

        uploaded_files = st.file_uploader("Upload PDF documents", type=["pdf"], accept_multiple_files=True, key="pdf_uploader")

        if uploaded_files:
            success = False
            for uploaded_file in uploaded_files:
                with st.spinner(f"Processing {uploaded_file.name}..."):
                    temp_path = f"temp_{uploaded_file.name}"
                    with open(temp_path, "wb") as f:
                        f.write(uploaded_file.getvalue())
                    
                    if process_document(temp_path):
                        st.success(f"Successfully processed: {uploaded_file.name}")
                        success = True
                    else:
                        st.error(f"Failed to process: {uploaded_file.name}")
                    os.remove(temp_path)
  8. Create the chat interface with history (this lives in the main area, after the sidebar block):

    # Render prior conversation turns
    for msg in st.session_state.chat_history:
        with st.chat_message("user"):
            st.write(msg[0])
        with st.chat_message("assistant"):
            st.write(msg[1])

    user_input = st.chat_input("Ask a question about the documents...")
  9. Implement RAG response generation:

    # When the user submits a question, rebuild the message history in the
    # alternating user/assistant format and stream a document-grounded answer.
    if user_input:
        formatted_messages = [{"role": "user" if i % 2 == 0 else "assistant", "content": msg}
                              for i, msg in enumerate([m for pair in st.session_state.chat_history for m in pair]) if msg]

        response_stream = rag(prompt=user_input,
                              system_prompt=RAG_SYSTEM_PROMPT,  # system prompt constant defined elsewhere in main.py
                              search=hybrid_search,
                              messages=formatted_messages,
                              max_contexts=5,
                              config=st.session_state.my_config)
  10. Add streaming response handling:

        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            for chunk in response_stream:
                full_response += chunk
                message_placeholder.markdown(full_response + "▌")  # typing cursor
            message_placeholder.markdown(full_response)  # final render without the cursor
            st.session_state.chat_history.append((user_input, full_response))

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run main.py
  • Streamlit will provide a local URL (typically http://localhost:8501). Open it in your browser, enter your API keys and database URL in the sidebar, upload a PDF, and start asking questions about your documents.


Conclusion

You've successfully built a sophisticated document Q&A system that combines the power of:

  • Hybrid search for better document retrieval

  • Multiple specialized AI models working together

  • Automatic fallback for general knowledge questions

This foundation can be enhanced in several ways:

  • Add support for more document formats beyond PDF

  • Implement memory to maintain conversation context

  • Create a citation system to track source documents

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
