Build a Customer Support Voice Agent
Fully functional agentic RAG voice app with step-by-step instructions (100% open source)
Voice is the most natural and accessible way for users to interact with an application, and we see it used most often in customer support use cases. But building a voice agent that can access your knowledge base can be complex and time-consuming.
In this tutorial, we'll build a Customer Support Voice Agent using OpenAI's Agents SDK, combining GPT-4o with OpenAI's latest TTS (text-to-speech) model. Our application will crawl documentation websites, process the content into a searchable knowledge base, and provide both text and voice responses to user queries through a clean Streamlit interface.
We'll be using:
Firecrawl to extract content from documentation websites,
Qdrant for vector storage and search capabilities,
GPT-4o as the LLM and GPT-4o-mini-tts as the TTS model,
OpenAI Agents SDK for orchestrating the AI agents and the voice pipeline, and
FastEmbed for generating embeddings.
What We’re Building
This application implements a Voice RAG system powered by OpenAI's Agents SDK that delivers voice responses to documentation queries. The system creates a searchable knowledge base from your documentation and uses a multi-agent approach to generate contextually relevant answers through both text and speech.
Features
Multi-agent RAG system with:
Documentation Processor Agent that analyzes documents and generates clear, informative responses to user queries
TTS Optimization Agent that refines responses for natural speech patterns with proper pacing and emphasis
Documentation crawling, processing, and chunking via Firecrawl
Qdrant vector database for similarity search
Real-time text-to-speech with multiple voice options
Downloadable audio responses
Support for crawling multi-page documentation sites
How The App Works
The application workflow consists of three main phases:
1. System Initialization
User enters API credentials (Qdrant, Firecrawl, OpenAI) in the sidebar
User inputs the documentation URL they want to analyze
User selects their preferred TTS voice from 11 options
Upon clicking "Initialize System":
System connects to Qdrant and creates a vector collection
Firecrawl extracts content from the specified documentation URL
Content is processed into chunks, embedded via FastEmbed, and stored in Qdrant
Two OpenAI agents are configured with specific instructions and models
2. Query Processing
User enters a question in the main interface
System generates an embedding of the question
Qdrant searches for the most relevant documentation chunks
Top 3 chunks are extracted and formatted with the question as context
Documentation Processor agent (GPT-4o) generates a comprehensive answer
TTS Agent formats the response for optimal speech synthesis
OpenAI's GPT-4o-mini TTS converts text to audio with the selected voice
3. Response Presentation
Text response appears in the main panel
Audio player provides immediate voice playback
Source URLs are displayed for attribution
Download button allows saving the audio file
Prerequisites
Before we begin, make sure you have the following:
Python installed on your machine (version 3.10 or higher is recommended)
Your OpenAI and Firecrawl API keys, plus a Qdrant Cloud API key along with your cluster URL
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming
Code Walkthrough
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the customer_support_voice_agent folder:
cd ai_agent_tutorials/customer_support_voice_agent
Install the required dependencies:
pip install -r requirements.txt
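If you'd rather assemble the dependencies yourself instead of using the repo's requirements file, the list looks roughly like this (package names inferred from the imports we'll use below; pin versions as needed):
streamlit
openai
openai-agents
firecrawl-py
qdrant-client
fastembed
python-dotenv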
API Keys: Get your OpenAI API key and Firecrawl API key. Set up a Qdrant Cloud account and get your API key and URL.
Creating the Streamlit App
Let's create our app. Create a new file customer_support_voice_agent.py and add the following code.
First, import the necessary libraries:
from typing import List, Dict, Optional
from pathlib import Path
import os
from firecrawl import FirecrawlApp
from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import Distance, VectorParams
from fastembed import TextEmbedding
from agents import Agent, Runner
from openai import AsyncOpenAI
import tempfile
import uuid
from datetime import datetime
import time
import streamlit as st
from dotenv import load_dotenv
import asyncio
load_dotenv()
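Because the script calls load_dotenv(), you can optionally keep credentials in a .env file next to the script. Note that the app as written reads keys from the sidebar inputs, so this is purely a convenience, and the variable names below are an assumption:
# .env (optional; hypothetical variable names)
OPENAI_API_KEY=sk-...
FIRECRAWL_API_KEY=fc-...
QDRANT_URL=https://YOUR-CLUSTER.qdrant.io
QDRANT_API_KEY=...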
Initialize the app state and session variables:
def init_session_state():
    defaults = {
        "initialized": False,
        "qdrant_url": "",
        "qdrant_api_key": "",
        "firecrawl_api_key": "",
        "openai_api_key": "",
        "doc_url": "",
        "setup_complete": False,
        "client": None,
        "embedding_model": None,
        "processor_agent": None,
        "tts_agent": None,
        "selected_voice": "coral"
    }
    for key, value in defaults.items():
        if key not in st.session_state:
            st.session_state[key] = value
Set up the sidebar configuration:
def sidebar_config():
    with st.sidebar:
        st.title("🔑 Configuration")
        st.markdown("---")
        # API key inputs
        st.session_state.qdrant_url = st.text_input("Qdrant URL", value=st.session_state.qdrant_url, type="password")
        st.session_state.qdrant_api_key = st.text_input("Qdrant API Key", value=st.session_state.qdrant_api_key, type="password")
        st.session_state.firecrawl_api_key = st.text_input("Firecrawl API Key", value=st.session_state.firecrawl_api_key, type="password")
        st.session_state.openai_api_key = st.text_input("OpenAI API Key", value=st.session_state.openai_api_key, type="password")
        # Document URL input
        st.markdown("---")
        st.session_state.doc_url = st.text_input("Documentation URL", value=st.session_state.doc_url, placeholder="https://docs.example.com")
        # Voice selection
        st.markdown("---")
        st.markdown("### 🎤 Voice Settings")
        voices = ["alloy", "ash", "ballad", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer", "verse"]
        st.session_state.selected_voice = st.selectbox("Select Voice", options=voices, index=voices.index(st.session_state.selected_voice))
Vector database setup and document crawling:
def setup_qdrant_collection(qdrant_url, qdrant_api_key, collection_name="docs_embeddings"):
    client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key)
    # Embed a dummy string once to discover the embedding dimension
    embedding_model = TextEmbedding()
    test_embedding = list(embedding_model.embed(["test"]))[0]
    embedding_dim = len(test_embedding)
    try:
        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=embedding_dim, distance=Distance.COSINE)
        )
    except Exception as e:
        # Re-running the app against an existing collection is fine
        if "already exists" not in str(e):
            raise e
    return client, embedding_model
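A quick way to sanity-check the connection once this is in place (a hypothetical smoke test, not part of the app; substitute your own cluster URL and key):
client, embedding_model = setup_qdrant_collection(
    "https://YOUR-CLUSTER.qdrant.io", "YOUR_QDRANT_API_KEY"
)
print(client.get_collection("docs_embeddings").status)  # expect a healthy ("green") status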
def crawl_documentation(firecrawl_api_key, url, output_dir=None):
    firecrawl = FirecrawlApp(api_key=firecrawl_api_key)
    pages = []
    # Implementation details...
    return pages
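The post elides the crawl body. Here is a minimal sketch of what it might look like, assuming firecrawl-py's crawl_url method; the response shape has changed across SDK releases, so treat the dictionary keys below as assumptions and check the version you have installed:
def crawl_documentation(firecrawl_api_key, url, output_dir=None):
    # Hypothetical implementation; verify against your firecrawl-py version.
    firecrawl = FirecrawlApp(api_key=firecrawl_api_key)
    crawl_result = firecrawl.crawl_url(
        url,
        params={"limit": 25, "scrapeOptions": {"formats": ["markdown"]}},
    )
    pages = []
    for item in crawl_result.get("data", []):
        metadata = item.get("metadata", {})
        pages.append({
            "content": item.get("markdown", ""),
            "url": metadata.get("sourceURL", url),
            "metadata": {"title": metadata.get("title", "")},
        })
    return pages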
Store page embeddings:
def store_embeddings(client, embedding_model, pages, collection_name):
    for page in pages:
        # One point per page: the text is embedded, and the raw content,
        # source URL, and metadata ride along as the payload
        embedding = list(embedding_model.embed([page["content"]]))[0]
        client.upsert(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=str(uuid.uuid4()),
                    vector=embedding.tolist(),
                    payload={
                        "content": page["content"],
                        "url": page["url"],
                        **page["metadata"]
                    }
                )
            ]
        )
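Note that this stores each crawled page as a single vector. The workflow above mentions chunking; if your pages are long, a simple splitter like the following (a hypothetical helper, not part of the original code) keeps each embedded chunk to a manageable size:
def chunk_text(text, max_chars=2000, overlap=200):
    # Naive fixed-size splitter with overlap; swap in a sentence- or
    # heading-aware splitter if you need cleaner chunk boundaries.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks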
Set up OpenAI agents:
def setup_agents(openai_api_key):
    os.environ["OPENAI_API_KEY"] = openai_api_key
    processor_agent = Agent(
        name="Documentation Processor",
        instructions="""You are a helpful documentation assistant...""",
        model="gpt-4o"
    )
    tts_agent = Agent(
        name="Text-to-Speech Agent",
        instructions="""You are a text-to-speech agent...""",
        model="gpt-4o-mini-tts"
    )
    return processor_agent, tts_agent
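The full instruction prompts are elided above. To give a sense of their shape, here is hypothetical wording you could paste into the two Agent definitions; tune it to your own support domain:
PROCESSOR_INSTRUCTIONS = """You are a helpful documentation assistant.
Answer the user's question using only the provided documentation context.
Be clear and concise, and cite the source URLs you relied on."""

TTS_INSTRUCTIONS = """You are a text-to-speech optimization agent.
Given an answer, produce delivery instructions for a TTS model:
tone, pacing, and which phrases to emphasize for natural speech."""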
Create the query processing function:
async def process_query(query, client, embedding_model, processor_agent, tts_agent, collection_name, openai_api_key):
    try:
        # Create query embedding and search for similar documents
        query_embedding = list(embedding_model.embed([query]))[0]
        search_response = client.query_points(
            collection_name=collection_name,
            query=query_embedding.tolist(),
            limit=3,
            with_payload=True
        )
        # Process search results and build context
        search_results = search_response.points if hasattr(search_response, 'points') else []
        context = "Based on the following documentation:\n\n"
        for result in search_results:
            # Pull each chunk's text and source URL out of the payload
            payload = result.payload or {}
            context += f"{payload.get('content', '')}\nSource: {payload.get('url', '')}\n\n"
        context += f"Question: {query}\n\nProvide a clear answer based on the documentation above."
        # Generate text response with processor agent
        processor_result = await Runner.run(processor_agent, context)
        processor_response = processor_result.final_output
        # Generate TTS instructions with TTS agent
        tts_result = await Runner.run(tts_agent, processor_response)
        tts_response = tts_result.final_output
Generate and store the audio response:
        # Generate audio with OpenAI TTS
        async_openai = AsyncOpenAI(api_key=openai_api_key)
        audio_response = await async_openai.audio.speech.create(
            model="gpt-4o-mini-tts",
            voice=st.session_state.selected_voice,
            input=processor_response,
            instructions=tts_response,
            response_format="mp3"
        )
        # Save audio file temporarily
        temp_dir = tempfile.gettempdir()
        audio_path = os.path.join(temp_dir, f"response_{uuid.uuid4()}.mp3")
        with open(audio_path, "wb") as f:
            f.write(audio_response.content)
        # Return results
        return {
            "status": "success",
            "text_response": processor_response,
            "audio_path": audio_path,
            # Additional details...
        }
    except Exception as e:
        # Surface any failure to the UI instead of crashing the app
        return {"status": "error", "error": str(e)}
Create the main Streamlit app:
def run_streamlit():
    st.set_page_config(
        page_title="Customer Support Voice Agent",
        page_icon="🎙️",
        layout="wide"
    )
    init_session_state()
    sidebar_config()
    st.title("🎙️ Customer Support Voice Agent")
    st.markdown("""
    Get voice-powered answers to your documentation questions! Simply:
    1. Configure your API keys in the sidebar
    2. Enter the documentation URL you want to learn about
    3. Ask your question below and get both text and voice responses
    """)
Handle user queries and display responses:
    query = st.text_input(
        "What would you like to know about the documentation?",
        placeholder="e.g., How do I authenticate API requests?",
        disabled=not st.session_state.setup_complete
    )
    if query and st.session_state.setup_complete:
        with st.status("Processing your query...") as status:
            # Process query and get result
            result = asyncio.run(process_query(
                query,
                st.session_state.client,
                st.session_state.embedding_model,
                st.session_state.processor_agent,
                st.session_state.tts_agent,
                "docs_embeddings",
                st.session_state.openai_api_key
            ))
            if result["status"] == "success":
                # Display text response
                st.markdown("### Response:")
                st.write(result["text_response"])
                # Display audio player and download button
                st.markdown(f"### 🔊 Audio Response (Voice: {st.session_state.selected_voice})")
                st.audio(result["audio_path"], format="audio/mp3")
                # Add download button
                with open(result["audio_path"], "rb") as audio_file:
                    audio_bytes = audio_file.read()
                st.download_button(
                    label="📥 Download Audio Response",
                    data=audio_bytes,
                    file_name=f"voice_response_{st.session_state.selected_voice}.mp3",
                    mime="audio/mp3"
                )
            else:
                # Show the error returned by process_query
                st.error(result.get("error", "Something went wrong."))
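Finally, add the entry point at the bottom of the file so the app actually renders when Streamlit executes the script (the original post omits this line):
if __name__ == "__main__":
    run_streamlit()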
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run customer_support_voice_agent.py
Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter your API keys, and you're ready to query your documentation with voice RAG.
Working Application Demo
Conclusion
You've successfully built a Customer Support Voice Agent that can process documentation, answer user questions, and deliver responses in both text and natural-sounding speech.
Want to take it further? Here are some ideas:
Streaming Responses: Modify the application to stream both text and audio as they're generated, rather than waiting for complete outputs.
Adding Context Retention: Extend the application to maintain conversation history, allowing follow-up questions without repeating context.
Implementing Local Fallbacks: Add fallback capabilities using local models when API connectivity is limited or for handling common queries.
Conversation Flows: Implement guided conversation paths for common support scenarios, allowing the agent to proactively gather required information rather than waiting for perfect user queries.
Knowledge Management: Extend beyond single URL crawling to support multiple documentation sources with appropriate tagging and metadata.
Keep experimenting with different agent configurations and features to build more sophisticated AI applications.
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.