
Build a RAG Agent with Cohere ⌘R

Fully functional RAG Agentic system using Command R7B (step-by-step instructions)

Building powerful RAG applications has often meant trading off between model performance, cost, and speed. Today, we're changing that by using Cohere's newly released Command R7B model - their most efficient model that delivers top-tier performance in RAG, tool use, and agentic behavior while keeping API costs low and response times fast.

In this tutorial, we'll build a production-ready RAG agent that combines Command R7B's capabilities with Qdrant for vector storage, LangChain for RAG pipeline management, and LangGraph for orchestration. You'll create a system that not only answers questions from your documents but intelligently falls back to web search when needed.

Command R7B brings an impressive 128k context window and leads the HuggingFace Open LLM Leaderboard in its size class. What makes it particularly exciting for our RAG application is its native in-line citation capabilities and strong performance on enterprise RAG use-cases, all while being efficient enough to run on commodity hardware.


What We’re Building

Our application allows users to upload documents, ask questions about them, and receive AI-powered responses with automatic fallback to web search when needed.

Features:

  1. Document Processing

    • Upload and process PDF documents

    • Automatic text chunking and embedding

    • Vector storage in Qdrant cloud

  2. Intelligent Querying

    • RAG-based document retrieval

    • Similarity search with threshold filtering

    • Source attribution for answers

  3. Advanced Capabilities

    • DuckDuckGo web search integration

    • Context-aware response generation

    • Long answer summarization

How the App Works

The application follows a sophisticated workflow:

  1. Document Processing Pipeline:

    • Documents are chunked using RecursiveCharacterTextSplitter

    • Text chunks are embedded using Cohere's embed-english-v3.0 model

    • Embeddings are stored in Qdrant for fast retrieval

  2. Query Processing Flow: User queries are processed through a two-stage system (sketched in code after this list):

    • Primary RAG retrieval with a similarity threshold

    • Automatic fallback to web search if no relevant documents are found

  3. Response Generation:

    • Command R7B generates responses with source attribution using retrieved context

    • Long responses are automatically summarized

    • Web search results are integrated when needed

  4. Agent Orchestration:

    • LangGraph manages the interaction between components

    • Handles graceful fallbacks and error recovery

    • Maintains conversation context and history
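
To make this flow concrete, here is a minimal sketch of the two-stage routing (the helper names are hypothetical; the real implementations are built step by step below):

def answer_query(vectorstore, query: str) -> str:
    # Stage 1: RAG retrieval with a similarity threshold
    docs = retrieve_with_threshold(vectorstore, query)
    if docs:
        # Grounded answer with in-line source attribution
        return generate_with_citations(docs, query)
    # Stage 2: no document cleared the threshold, so fall back to web search
    return web_search_fallback(query)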

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Your Cohere API key, plus your Qdrant API key and cluster URL

  3. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  4. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the rag_agent_cohere folder and install the dependencies:

cd rag_tutorials/rag_agent_cohere
pip install -r requirements.txt
  3. Get your API keys:

    • Cohere API Key - Go to Cohere Platform > Sign up or log in to your account > Navigate to API Keys section > Create a new API key

    • Qdrant Cloud Setup - Visit Qdrant Cloud > Create an account or sign in > Create a new cluster > Get your credentials:

      • Qdrant API Key: Found in API Keys section

      • Qdrant URL: Your cluster URL (format: https://xxx-xxx.aws.cloud.qdrant.io)

Creating the Streamlit App

Let’s create our app. Create a new file rag_agent_cohere.py and add the following code:

  1. Let's set up our imports:

import os
import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings, ChatCohere
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain import hub
import tempfile
from langgraph.prebuilt import create_react_agent
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
from langchain_core.language_models import BaseLanguageModel
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from time import sleep
from tenacity import retry, wait_exponential, stop_after_attempt
  2. Set up session state and API configuration:

def init_session_state():
    if 'api_keys_submitted' not in st.session_state:
        st.session_state.api_keys_submitted = False
    if 'chat_history' not in st.session_state:
        st.session_state.chat_history = []
    if 'vectorstore' not in st.session_state:
        st.session_state.vectorstore = None
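
In the full app, these session values are joined by the credentials themselves. A minimal sketch of collecting them with a Streamlit form (the function name and widget labels are assumptions):

def render_api_key_form():
    # Collect credentials once and flag them as submitted in session state
    with st.sidebar.form("api_keys"):
        st.session_state.cohere_api_key = st.text_input("Cohere API Key", type="password")
        st.session_state.qdrant_api_key = st.text_input("Qdrant API Key", type="password")
        st.session_state.qdrant_url = st.text_input("Qdrant URL")
        if st.form_submit_button("Submit"):
            st.session_state.api_keys_submitted = True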
  3. Create Qdrant client initialization:

def init_qdrant() -> QdrantClient:
    return QdrantClient(
        url=st.session_state.qdrant_url,
        api_key=st.session_state.qdrant_api_key,
        timeout=60
    )
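
The Distance and VectorParams imports come into play when creating the collection itself. A minimal sketch, assuming a collection named "cohere_rag" and Cohere's 1024-dimensional embed-english-v3.0 vectors:

COLLECTION_NAME = "cohere_rag"  # assumed collection name

def ensure_collection(client: QdrantClient):
    # embed-english-v3.0 produces 1024-dimensional vectors
    existing = [c.name for c in client.get_collections().collections]
    if COLLECTION_NAME not in existing:
        client.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
        )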
  4. Document processing pipeline:

def process_document(file):
    # Persist the uploaded file so PyPDFLoader can read it from disk
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp_file:
        tmp_file.write(file.getvalue())

    loader = PyPDFLoader(tmp_file.name)
    documents = loader.load()
    # Split into overlapping chunks so each embedding keeps local context
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, 
        chunk_overlap=200
    )
    texts = text_splitter.split_documents(documents)
    os.unlink(tmp_file.name)  # clean up the temporary file
    return texts
  5. Set up vector store creation:

def create_vector_stores(texts):
    # client, COLLECTION_NAME, and embedding are the globals defined above
    vector_store = QdrantVectorStore(
        client=client,
        collection_name=COLLECTION_NAME,
        embedding=embedding
    )
    vector_store.add_documents(texts)
    return vector_store
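
The embedding object referenced here is Cohere's embed-english-v3.0 model via the langchain_cohere wrapper; a minimal sketch:

embedding = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key=st.session_state.cohere_api_key,
)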
  6. Create rate-limited web search:

class RateLimitedDuckDuckGo(DuckDuckGoSearchRun):
    # Retry with exponential backoff, giving up after three attempts
    @retry(wait=wait_exponential(multiplier=1, min=4, max=10),
           stop=stop_after_attempt(3))
    def run(self, query: str) -> str:
        sleep(2)  # rate limiting between DuckDuckGo requests
        return super().run(query)
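
The fallback agent in the next step references a web_research tool. One minimal way to provide it is to instantiate the rate-limited search above (the tool name here is an assumption):

web_research = RateLimitedDuckDuckGo(name="web_research")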
  7. Implement fallback agent:

def create_fallback_agent(chat_model: BaseLanguageModel):
    # A ReAct-style LangGraph agent that can call the web search tool
    tools = [web_research]
    agent = create_react_agent(
        model=chat_model,
        tools=tools,
        debug=False
    )
    return agent
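
When retrieval comes up empty, the compiled LangGraph agent can be invoked with the standard message format; roughly:

fallback_agent = create_fallback_agent(chat_model)
result = fallback_agent.invoke(
    {"messages": [HumanMessage(content=f"Answer from the web: {query}")]}
)
answer = result["messages"][-1].content  # the agent's final message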
  8. Query processing with RAG:

def process_query(vectorstore, query):
    # Retrieve up to 10 chunks, keeping only those above the similarity threshold
    retriever = vectorstore.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={
            "k": 10,
            "score_threshold": 0.7
        }
    )
    relevant_docs = retriever.get_relevant_documents(query)
    # An empty result here triggers the web-search fallback agent from step 7;
    # otherwise the retrieved context feeds the answer chain built next
  9. Answer generation chain:

# Pull a ready-made retrieval QA prompt from the LangChain Hub
retrieval_qa_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
# "Stuff" the retrieved documents into the prompt for the chat model
combine_docs_chain = create_stuff_documents_chain(
    chat_model, 
    retrieval_qa_prompt
)
retrieval_chain = create_retrieval_chain(
    retriever, 
    combine_docs_chain
)
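
This chain assumes a chat_model created earlier in the app. A minimal sketch of building it around Command R7B (Cohere's model ID command-r7b-12-2024) and invoking the chain:

chat_model = ChatCohere(
    model="command-r7b-12-2024",
    temperature=0.1,  # temperature is an illustrative choice
    cohere_api_key=st.session_state.cohere_api_key,
)

# create_retrieval_chain expects {"input": ...} and returns a dict with
# the generated "answer" plus the retrieved "context" documents
response = retrieval_chain.invoke({"input": query})
answer, sources = response["answer"], response["context"]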
  10. Streamlit interface:

st.title("RAG Agent with Cohere ⌘R")
uploaded_file = st.file_uploader(
    "Choose a PDF File", 
    type=["pdf"]
)
query = st.chat_input("Ask a question:")
  11. Chat history management:

if query:
    st.session_state.chat_history.append({
        "role": "user", 
        "content": query
    })
    answer, sources = process_query(
        st.session_state.vectorstore, 
        query
    )
    st.session_state.chat_history.append({
        "role": "assistant", 
        "content": answer
    })
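
To render the conversation, the standard Streamlit chat pattern works; for example:

for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.write(message["content"])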
  12. Data clearing functionality:

if st.button('Clear All Data'):
    # Drop the Qdrant collection if it exists, then reset local state
    collection_names = [c.name for c in client.get_collections().collections]
    if COLLECTION_NAME in collection_names:
        client.delete_collection(COLLECTION_NAME)
    st.session_state.vectorstore = None

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run rag_agent_cohere.py

Working Application Demo

Conclusion

You've successfully built a production-ready RAG Agent powered by Cohere's Command R7B that combines intelligent document querying with automatic web search fallback.

For further enhancements, consider:

  • Adding support for more file formats (Word, HTML, Markdown)

  • Fine-tuning similarity thresholds based on your use case

  • Adding user authentication and multi-user support

  • Implementing caching for frequently asked questions

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
