
Build a Customer Support Voice Agent

Fully functional agentic RAG voice app with step-by-step instructions (100% open-source)

Voice is the most natural and accessible way for users to interact with an application, and customer support is where we see it used most. But building a voice agent that can access your knowledge base can be complex and time-consuming.

In this tutorial, we'll build a Customer Support Voice Agent using OpenAI's Agents SDK, combining GPT-4o with OpenAI's latest text-to-speech (TTS) model. Our application will crawl documentation websites, process the content into a searchable knowledge base, and provide both text and voice responses to user queries through a clean Streamlit interface.

We'll be using:

  • Firecrawl to extract content from documentation websites,

  • Qdrant for vector storage and search capabilities,

  • GPT-4o as the LLM and gpt-4o-mini-tts as the text-to-speech model,

  • OpenAI Agents SDK for orchestrating the AI agents and the voice pipeline, and

  • FastEmbed for generating embeddings.


What We’re Building

This application implements a Voice RAG system powered by OpenAI's Agents SDK that delivers voice responses to documentation queries. The system creates a searchable knowledge base from your documentation and uses a multi-agent approach to generate contextually relevant answers through both text and speech.

Features

  1. Multi-agent RAG system with:

    • Documentation Processor Agent that analyzes documents and generates clear, informative responses to user queries

    • TTS Optimization Agent that refines responses for natural speech patterns with proper pacing and emphasis

  2. Documentation crawling, processing, and chunking

  3. Qdrant vector database for similarity search

  4. Real-time text-to-speech with multiple voice options

  5. Downloadable audio responses

  6. Source URL attribution for every response

How The App Works

The application workflow consists of three main phases:

1. System Initialization

  • User enters API credentials (Qdrant, Firecrawl, OpenAI) in the sidebar

  • User inputs the documentation URL they want to analyze

  • User selects their preferred TTS voice from 11 options

  • Upon clicking "Initialize System":

    • System connects to Qdrant and creates a vector collection

    • Firecrawl extracts content from the specified documentation URL

    • Content is processed into chunks, embedded via FastEmbed, and stored in Qdrant

    • Two OpenAI agents are configured with specific instructions and models

2. Query Processing

  • User enters a question in the main interface

  • System generates an embedding of the question

  • Qdrant searches for the most relevant documentation chunks

  • Top 3 chunks are extracted and formatted with the question as context

  • Documentation Processor agent (GPT-4o) generates a comprehensive answer

  • TTS Agent formats the response for optimal speech synthesis

  • OpenAI's GPT-4o-mini TTS converts text to audio with the selected voice

3. Response Presentation

  • Text response appears in the main panel

  • Audio player provides immediate voice playback

  • Source URLs are displayed for attribution

  • Download button allows saving the audio file

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Your OpenAI, Firecrawl, and Qdrant Cloud API keys, along with your Qdrant cluster URL

  3. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  4. Basic familiarity with Python programming

Code Walkthrough

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the customer_support_voice_agent folder and install the dependencies:

cd ai_agent_tutorials/customer_support_voice_agent
pip install -r requirements.txt
  3. API Keys: Get your OpenAI API key and Firecrawl API key. Set up a Qdrant Cloud account and get your API key and cluster URL.
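The imports section below calls load_dotenv(), so the app can pick up environment variables from a .env file in the project folder. The layout below is an illustrative assumption (the walkthrough code primarily reads keys from the sidebar inputs), but keeping credentials in a .env file is a convenient way to avoid pasting them repeatedly:

OPENAI_API_KEY=sk-...
FIRECRAWL_API_KEY=fc-...
QDRANT_URL=https://your-cluster-url.qdrant.io
QDRANT_API_KEY=your-qdrant-api-key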

Creating the Streamlit App

Let’s create our app. Create a new file customer_support_voice_agent.py and add the following code:

  1. First, import the necessary libraries:

from typing import List, Dict, Optional
from pathlib import Path
import os
from firecrawl import FirecrawlApp
from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import Distance, VectorParams
from fastembed import TextEmbedding
from agents import Agent, Runner
from openai import AsyncOpenAI
import tempfile
import uuid
from datetime import datetime
import time
import streamlit as st
from dotenv import load_dotenv
import asyncio

load_dotenv()
  2. Initialize the app state and session variables:

def init_session_state():
    defaults = {
        "initialized": False,
        "qdrant_url": "",
        "qdrant_api_key": "",
        "firecrawl_api_key": "",
        "openai_api_key": "",
        "doc_url": "",
        "setup_complete": False,
        "client": None,
        "embedding_model": None,
        "processor_agent": None,
        "tts_agent": None,
        "selected_voice": "coral"
    }
    
    for key, value in defaults.items():
        if key not in st.session_state:
            st.session_state[key] = value
  3. Set up the sidebar configuration:

def sidebar_config():
    with st.sidebar:
        st.title("🔑 Configuration")
        st.markdown("---")
        
        # API key inputs
        st.session_state.qdrant_url = st.text_input("Qdrant URL", value=st.session_state.qdrant_url, type="password")
        st.session_state.qdrant_api_key = st.text_input("Qdrant API Key", value=st.session_state.qdrant_api_key, type="password")
        st.session_state.firecrawl_api_key = st.text_input("Firecrawl API Key", value=st.session_state.firecrawl_api_key, type="password")
        st.session_state.openai_api_key = st.text_input("OpenAI API Key", value=st.session_state.openai_api_key, type="password")
        
        # Document URL input
        st.markdown("---")
        st.session_state.doc_url = st.text_input("Documentation URL", value=st.session_state.doc_url, placeholder="https://docs.example.com")
        
        # Voice selection
        st.markdown("---")
        st.markdown("### 🎤 Voice Settings")
        voices = ["alloy", "ash", "ballad", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer", "verse"]
        st.session_state.selected_voice = st.selectbox("Select Voice", options=voices, index=voices.index(st.session_state.selected_voice))
  4. Vector database setup and document crawling:

def setup_qdrant_collection(qdrant_url, qdrant_api_key, collection_name="docs_embeddings"):
    client = QdrantClient(url=qdrant_url, api_key=qdrant_api_key)
    embedding_model = TextEmbedding()
    test_embedding = list(embedding_model.embed(["test"]))[0]
    embedding_dim = len(test_embedding)
    
    try:
        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=embedding_dim, distance=Distance.COSINE)
        )
    except Exception as e:
        if "already exists" not in str(e):
            raise e
    
    return client, embedding_model

def crawl_documentation(firecrawl_api_key, url, output_dir=None):
    firecrawl = FirecrawlApp(api_key=firecrawl_api_key)
    pages = []
    # Implementation details...
    return pages
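The repository implements the actual crawling inside crawl_documentation; the snippet above elides it. Here is a rough sketch under the assumption that firecrawl-py's scrape_url returns markdown content. The exact call signature and response shape differ across firecrawl-py releases, so treat this as illustrative rather than the repo's implementation:

def crawl_documentation(firecrawl_api_key, url, output_dir=None):
    # Illustrative sketch only -- firecrawl-py's API has changed across versions.
    firecrawl = FirecrawlApp(api_key=firecrawl_api_key)
    result = firecrawl.scrape_url(url)  # assumption: single-page scrape returning markdown
    if isinstance(result, dict):
        content = result.get("markdown") or result.get("content", "")
    else:
        content = getattr(result, "markdown", "") or ""

    # Split into fixed-size chunks so each embedding covers a manageable span
    pages = []
    chunk_size = 1000
    for i in range(0, len(content), chunk_size):
        pages.append({
            "content": content[i:i + chunk_size],
            "url": url,
            "metadata": {"chunk_index": i // chunk_size},
        })
    return pages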
  5. Store page embeddings:

def store_embeddings(client, embedding_model, pages, collection_name):
    for page in pages:
        embedding = list(embedding_model.embed([page["content"]]))[0]
        client.upsert(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=str(uuid.uuid4()),
                    vector=embedding.tolist(),
                    payload={
                        "content": page["content"],
                        "url": page["url"],
                        **page["metadata"]
                    }
                )
            ]
        )
  6. Set up OpenAI agents:

def setup_agents(openai_api_key):
    os.environ["OPENAI_API_KEY"] = openai_api_key
    
    processor_agent = Agent(
        name="Documentation Processor",
        instructions="""You are a helpful documentation assistant...""",
        model="gpt-4o"
    )

    tts_agent = Agent(
        name="Text-to-Speech Agent",
        instructions="""You are a text-to-speech agent...""",
        model="gpt-4o-mini-tts"
    )
    
    return processor_agent, tts_agent
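The walkthrough doesn't show the "Initialize System" button handler that ties these helpers together. A minimal sketch of that wiring is below, assuming a st.button("Initialize System") in the sidebar calls it once all credentials and the documentation URL are filled in; the original app's exact layout and messages may differ:

def initialize_system():
    # Assumed wiring for the "Initialize System" flow described earlier
    with st.spinner("Crawling documentation and building the knowledge base..."):
        client, embedding_model = setup_qdrant_collection(
            st.session_state.qdrant_url, st.session_state.qdrant_api_key
        )
        pages = crawl_documentation(
            st.session_state.firecrawl_api_key, st.session_state.doc_url
        )
        store_embeddings(client, embedding_model, pages, "docs_embeddings")
        processor_agent, tts_agent = setup_agents(st.session_state.openai_api_key)

        # Keep everything in session state so query processing can reuse it
        st.session_state.client = client
        st.session_state.embedding_model = embedding_model
        st.session_state.processor_agent = processor_agent
        st.session_state.tts_agent = tts_agent
        st.session_state.setup_complete = True
    st.success("System initialized!")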
  7. Create the query processing function:

async def process_query(query, client, embedding_model, processor_agent, tts_agent, collection_name, openai_api_key):
    try:
        # Create query embedding and search for similar documents
        query_embedding = list(embedding_model.embed([query]))[0]
        search_response = client.query_points(
            collection_name=collection_name,
            query=query_embedding.tolist(),
            limit=3,
            with_payload=True
        )
        
        # Process search results and build context
        search_results = search_response.points if hasattr(search_response, 'points') else []
        context = "Based on the following documentation:\n\n"
        for result in search_results:
            # Payload fields match what store_embeddings saved earlier
            payload = result.payload or {}
            context += payload.get("content", "") + "\n"
            context += f"Source: {payload.get('url', 'unknown')}\n\n"
        # Append the user's question so the agent answers in context
        context += f"\nUser question: {query}\n\nAnswer using only the documentation above."
        
        # Generate text response with processor agent
        processor_result = await Runner.run(processor_agent, context)
        processor_response = processor_result.final_output
        
        # Generate TTS instructions with TTS agent
        tts_result = await Runner.run(tts_agent, processor_response)
        tts_response = tts_result.final_output
  8. Generate and store the audio response:

        # Generate audio with OpenAI TTS
        async_openai = AsyncOpenAI(api_key=openai_api_key)
        audio_response = await async_openai.audio.speech.create(
            model="gpt-4o-mini-tts",
            voice=st.session_state.selected_voice,
            input=processor_response,
            instructions=tts_response,
            response_format="mp3"
        )
        
        # Save audio file temporarily
        temp_dir = tempfile.gettempdir()
        audio_path = os.path.join(temp_dir, f"response_{uuid.uuid4()}.mp3")
        with open(audio_path, "wb") as f:
            f.write(audio_response.content)
        
        # Return results
        return {
            "status": "success",
            "text_response": processor_response,
            "audio_path": audio_path,
            # Additional details...
        }
    except Exception as e:
        # Close the try block opened at the top of process_query
        return {"status": "error", "error": str(e)}
  9. Create the main Streamlit app:

def run_streamlit():
    st.set_page_config(
        page_title="Customer Support Voice Agent",
        page_icon="🎙️",
        layout="wide"
    )
    
    init_session_state()
    sidebar_config()
    
    st.title("🎙️ Customer Support Voice Agent")
    st.markdown("""
    Get voice-powered answers to your documentation questions! Simply:
    1. Configure your API keys in the sidebar
    2. Enter the documentation URL you want to learn about
    3. Ask your question below and get both text and voice responses
    """)
  10. Handle user queries and display responses:

    query = st.text_input(
        "What would you like to know about the documentation?",
        placeholder="e.g., How do I authenticate API requests?",
        disabled=not st.session_state.setup_complete
    )
    
    if query and st.session_state.setup_complete:
        with st.status("Processing your query...") as status:
            # Process query and get result
            result = asyncio.run(process_query(...))
            
            if result["status"] == "success":
                # Display text response
                st.markdown("### Response:")
                st.write(result["text_response"])
                
                # Display audio player and download button
                st.markdown(f"### 🔊 Audio Response (Voice: {st.session_state.selected_voice})")
                st.audio(result["audio_path"], format="audio/mp3")
                
                # Add download button
                with open(result["audio_path"], "rb") as audio_file:
                    audio_bytes = audio_file.read()
                    st.download_button(
                        label="📥 Download Audio Response",
                        data=audio_bytes,
                        file_name=f"voice_response_{st.session_state.selected_voice}.mp3",
                        mime="audio/mp3"
                    )
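Finally, the script needs to invoke run_streamlit() when Streamlit executes it. The repository may simply call it at module level; a standard entry-point guard at the bottom of the file works too:

if __name__ == "__main__":
    run_streamlit()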

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run customer_support_voice_agent.py
  • Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter your API keys and documentation URL, initialize the system, and you're ready to query your documentation with voice RAG.

Working Application Demo

Conclusion

You've successfully built a Customer Support Voice Agent that can process documentation, answer user questions, and deliver responses in both text and natural-sounding speech.

Want to take it further? Here are some ideas:

  1. Streaming Responses: Modify the application to stream both text and audio as they're generated, rather than waiting for complete outputs (see the sketch after this list).

  2. Adding Context Retention: Extend the application to maintain conversation history, allowing follow-up questions without repeating context.

  3. Implementing Local Fallbacks: Add fallback capabilities using local models when API connectivity is limited or for handling common queries.

  4. Conversation Flows: Implement guided conversation paths for common support scenarios, allowing the agent to proactively gather required information rather than waiting for perfect user queries.

  5. Knowledge Management: Extend beyond single URL crawling to support multiple documentation sources with appropriate tagging and metadata.
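For the streaming idea, OpenAI's Python SDK already exposes a streaming helper on the speech endpoint. A minimal sketch, assuming gpt-4o-mini-tts and an MP3 output path of your choosing (error handling omitted):

async def stream_audio_response(openai_api_key, text, voice, out_path):
    # Stream the TTS audio to disk as it is generated instead of
    # waiting for the full response.
    async_openai = AsyncOpenAI(api_key=openai_api_key)
    async with async_openai.audio.speech.with_streaming_response.create(
        model="gpt-4o-mini-tts",
        voice=voice,
        input=text,
        response_format="mp3",
    ) as response:
        await response.stream_to_file(out_path)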

Keep experimenting with different agent configurations and features to build more sophisticated AI applications.

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
