Build a RAG Agent with Cohere ⌘R
A fully functional agentic RAG system using Command R7B (step-by-step instructions)
Building powerful RAG applications has often meant trading off between model performance, cost, and speed. Today, we're changing that by using Cohere's newly released Command R7B model - their most efficient model that delivers top-tier performance in RAG, tool use, and agentic behavior while keeping API costs low and response times fast.
In this tutorial, we'll build a production-ready RAG agent that combines Command R7B's capabilities with Qdrant for vector storage, Langchain for RAG pipeline management, and LangGraph for orchestration. You'll create a system that not only answers questions from your documents but intelligently falls back to web search when needed.
Command R7B brings an impressive 128k context window and leads the HuggingFace Open LLM Leaderboard in its size class. What makes it particularly exciting for our RAG application is its native in-line citation capabilities and strong performance on enterprise RAG use-cases, all while being efficient enough to run on commodity hardware.
What We’re Building
Our application allows users to upload documents, ask questions about them, and receive AI-powered responses with automatic fallback to web search when needed.
Features:
Document Processing
Upload and process PDF documents
Automatic text chunking and embedding
Vector storage in Qdrant cloud
Intelligent Querying
RAG-based document retrieval
Similarity search with threshold filtering
Source attribution for answers
Advanced Capabilities
DuckDuckGo web search integration
Context-aware response generation
Long answer summarization
How the App Works
The application follows a sophisticated workflow:
Document Processing Pipeline:
Documents are chunked using RecursiveCharacterTextSplitter
Text chunks are embedded using Cohere's embed-english-v3.0 model
Embeddings are stored in Qdrant for fast retrieval
Query Processing Flow: User queries are processed through a two-stage system:
Primary RAG retrieval with a similarity threshold
Automatic fallback to web search if no relevant documents are found (see the sketch after this list)
Response Generation:
Command R7B generates responses with source attribution using retrieved context
Long responses are automatically summarized
Web search results are integrated when needed
Agent Orchestration:
LangGraph manages the interaction between components
Handles graceful fallbacks and error recovery
Maintains conversation context and history
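To make the two-stage flow concrete, here is a minimal sketch of the routing logic, assuming the helpers defined later in this tutorial (a threshold-filtered retriever and create_fallback_agent) and a hypothetical rag_answer helper for the document path:

def answer(vectorstore, chat_model, query):
    # Stage 1: threshold-filtered retrieval over the uploaded documents
    retriever = vectorstore.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"k": 10, "score_threshold": 0.7},
    )
    relevant_docs = retriever.get_relevant_documents(query)
    if relevant_docs:
        # Documents cleared the threshold: answer from the PDF (hypothetical helper)
        return rag_answer(relevant_docs, query)
    # Stage 2: nothing relevant found, fall back to the web-search agent
    agent = create_fallback_agent(chat_model)
    result = agent.invoke({"messages": [HumanMessage(content=query)]})
    return result["messages"][-1].content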
Prerequisites
Before we begin, make sure you have the following:
A recent Python installation (3.9 or later)
A Cohere account (for the API key)
A Qdrant Cloud account (for hosted vector storage)
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the rag_agent_cohere folder:
cd rag_tutorials/rag_agent_cohere
Install the required dependencies:
pip install -r requirements.txt
Get your API Key:
Cohere API Key - Go to Cohere Platform > Sign up or log in to your account > Navigate to the API Keys section > Create a new API key
Qdrant Cloud Setup - Visit Qdrant Cloud > Create an account or sign in > Create a new cluster > Get your credentials:
Qdrant API Key: Found in the API Keys section
Qdrant URL: Your cluster URL (format: https://xxx-xxx.aws.cloud.qdrant.io)
Creating the Streamlit App
Let's create our app. Create a new file rag_agent_cohere.py and add the following code:
Let's set up our imports:
import os
import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings, ChatCohere
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain import hub
import tempfile
from langgraph.prebuilt import create_react_agent
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
from langchain_core.language_models import BaseLanguageModel
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from time import sleep
from tenacity import retry, wait_exponential, stop_after_attempt
Set up session state and API configuration:
def init_session_state():
    if 'api_keys_submitted' not in st.session_state:
        st.session_state.api_keys_submitted = False
    if 'chat_history' not in st.session_state:
        st.session_state.chat_history = []
    if 'vectorstore' not in st.session_state:
        st.session_state.vectorstore = None
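The snippets below read the credentials from session state (cohere_api_key, qdrant_url, qdrant_api_key). One way to collect them, as a minimal sketch with assumed field names, is a sidebar form:

with st.sidebar:
    st.header("API Configuration")
    cohere_key = st.text_input("Cohere API Key", type="password")
    qdrant_url = st.text_input("Qdrant URL")
    qdrant_key = st.text_input("Qdrant API Key", type="password")
    if st.button("Submit Keys"):
        st.session_state.cohere_api_key = cohere_key
        st.session_state.qdrant_url = qdrant_url
        st.session_state.qdrant_api_key = qdrant_key
        st.session_state.api_keys_submitted = True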
Create Qdrant client initialization:
def init_qdrant() -> QdrantClient:
    return QdrantClient(
        url=st.session_state.qdrant_url,
        api_key=st.session_state.qdrant_api_key,
        timeout=60
    )
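The later snippets refer to a shared client, embedding model, chat_model, and COLLECTION_NAME. A minimal sketch of that module-level setup, assuming the keys were submitted as above (the collection name is arbitrary; the model IDs are Cohere's published names at the time of writing):

COLLECTION_NAME = "cohere_rag"

client = init_qdrant()
embedding = CohereEmbeddings(
    model="embed-english-v3.0",
    cohere_api_key=st.session_state.cohere_api_key,
)
chat_model = ChatCohere(
    model="command-r7b-12-2024",
    cohere_api_key=st.session_state.cohere_api_key,
    temperature=0.1,
)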
Document processing pipeline:
def process_document(file):
    # Write the upload to a temp file so PyPDFLoader can read it from disk
    with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp_file:
        tmp_file.write(file.getvalue())
        file_path = tmp_file.name
    try:
        loader = PyPDFLoader(file_path)
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        return text_splitter.split_documents(documents)
    finally:
        os.unlink(file_path)  # clean up the temp file
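A note on the numbers: 1,000-character chunks with a 200-character overlap are a common starting point for PDFs; the overlap keeps sentences that straddle a chunk boundary retrievable from either side.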
Set up vector store creation:
def create_vector_stores(texts):
    # Create the collection on first use; embed-english-v3.0 produces
    # 1024-dimensional vectors, compared with cosine distance
    if not client.collection_exists(COLLECTION_NAME):
        client.create_collection(
            collection_name=COLLECTION_NAME,
            vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
        )
    vector_store = QdrantVectorStore(
        client=client,
        collection_name=COLLECTION_NAME,
        embedding=embedding
    )
    vector_store.add_documents(texts)
    return vector_store
Create rate-limited web search:
class RateLimitedDuckDuckGo(DuckDuckGoSearchRun):
    # Back off exponentially on failures and give up after three attempts
    @retry(wait=wait_exponential(multiplier=1, min=4, max=10),
           stop=stop_after_attempt(3))
    def run(self, query: str) -> str:
        sleep(2)  # crude rate limiting between DuckDuckGo calls
        return super().run(query)
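The fallback agent below expects a web_research tool. A minimal sketch of how it could wrap the rate-limited search class (the @tool decorator turns a function into a LangChain tool, with the docstring serving as the description the agent sees):

from langchain_core.tools import tool

_search = RateLimitedDuckDuckGo()

@tool
def web_research(query: str) -> str:
    """Search the web for current information on a topic."""
    return _search.run(query)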
Implement fallback agent:
def create_fallback_agent(chat_model: BaseLanguageModel):
    # web_research is the rate-limited search tool defined above
    tools = [web_research]
    agent = create_react_agent(
        model=chat_model,
        tools=tools,
        debug=False
    )
    return agent
Query processing with RAG:
def process_query(vectorstore, query):
    # Stage 1: similarity search, keeping only sufficiently relevant chunks
    retriever = vectorstore.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={
            "k": 10,
            "score_threshold": 0.7
        }
    )
    relevant_docs = retriever.get_relevant_documents(query)
Answer generation chain:
    # Continuing inside process_query: build the answer-generation chain
    # from a standard RAG prompt pulled off the LangChain Hub
    retrieval_qa_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
    combine_docs_chain = create_stuff_documents_chain(
        chat_model,
        retrieval_qa_prompt
    )
    retrieval_chain = create_retrieval_chain(
        retriever,
        combine_docs_chain
    )
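To round off process_query, the chain can be invoked and the answer returned together with its sources. A sketch, relying on create_retrieval_chain's standard output keys ("answer" and "context"):

    # Still inside process_query: run the chain and collect source paths
    response = retrieval_chain.invoke({"input": query})
    sources = [doc.metadata.get("source", "") for doc in response["context"]]
    return response["answer"], sources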
Streamlit interface:
st.title("RAG Agent with Cohere ⌘R")
uploaded_file = st.file_uploader(
"Choose a PDF File",
type=["pdf"]
)
query = st.chat_input("Ask a question:")
Chat history management:
if query:
    st.session_state.chat_history.append({
        "role": "user",
        "content": query
    })
    answer, sources = process_query(
        st.session_state.vectorstore,
        query
    )
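To keep the conversation on screen, the assistant's reply can be appended and the history rendered with Streamlit's chat primitives. A minimal sketch:

    # Record the assistant's reply alongside the user's question
    st.session_state.chat_history.append({
        "role": "assistant",
        "content": answer
    })

for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.write(message["content"])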
Data clearing functionality:
if st.button('Clear All Data'):
    # get_collections() returns collection objects; compare against their names
    collection_names = [c.name for c in client.get_collections().collections]
    if COLLECTION_NAME in collection_names:
        client.delete_collection(COLLECTION_NAME)
    st.session_state.vectorstore = None
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run rag_agent_cohere.py
Streamlit will provide a local URL (typically http://localhost:8501).
Working Application Demo
Conclusion
You've successfully built a production-ready RAG Agent powered by Cohere's Command R7B that combines intelligent document querying with automatic web search fallback.
For further enhancements, consider:
Adding support for more file formats (Word, HTML, Markdown)
Fine-tuning similarity thresholds based on your use case
Adding user authentication and multi-user support
Implementing caching for frequently asked questions (a sketch follows below)
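For the caching idea above, Streamlit's built-in cache is a low-effort starting point. A sketch, assuming the process_query helper from earlier (note the cache key is just the query string, so it ignores which document is currently loaded):

@st.cache_data(show_spinner=False)
def cached_answer(query: str) -> str:
    # Repeat questions skip the retrieval + LLM round-trip entirely
    answer, _ = process_query(st.session_state.vectorstore, query)
    return answer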
Keep experimenting and refining to build smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.