Build a Multi LLM Routing App with Llama 3.1 and GPT-4o
Automatically route each input to the most efficient LLM in just 30 lines of Python code (step-by-step instructions)
Choosing the right model for AI tasks is often challenging. Some tasks need the power of GPT-4o, while others can be handled efficiently by lighter, cheaper models. This is where RouteLLM helps: it automates routing between different models based on query complexity.
In this tutorial, we’ll show how to build a Multi-LLM Routing Chat App using RouteLLM in just 30 lines of Python code. This Streamlit app dynamically selects the best model for every query, ensuring efficiency and performance.
Why Multi-LLM Routing matters:
A multi-LLM routing app optimizes both performance and cost. Instead of using high-powered models for all queries, it assigns simpler ones to more efficient models, reducing latency and operational costs.
What is RouteLLM?
RouteLLM allows you to route user queries dynamically between models, optimizing both performance and cost. With built-in routers like the mf (matrix factorization) router, it estimates the complexity of each query and directs it accordingly, avoiding unnecessary calls to high-powered models.
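Concretely, the pattern is small: you create a Controller with one strong and one weak model, then call it through an OpenAI-style interface whose model string names the router and its threshold. Here is the gist of what we will build out step by step below (same models and threshold as in the full app):
from routellm.controller import Controller

# One strong and one weak model; the "mf" router decides per query
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o-mini",
    weak_model="together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
)

# "router-mf-0.11593" = mf router with a calibrated cost threshold
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)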
What We’re Building
This Streamlit application demonstrates the use of RouteLLM, a system that intelligently routes queries between different language models based on the complexity of the task. It provides a chat interface where users can interact with AI models, and the app automatically selects the most appropriate model for each query.
Features
Chat interface for interacting with AI models
Automatic model selection using RouteLLM
Utilizes both GPT-4o-mini and Meta-Llama 3.1 models
Displays chat history with model information
Prerequisites
Before we begin, make sure you have:
Python installed on your machine (version 3.7 or higher is recommended)
Your OpenAI API Key and Together AI API Key
Basic familiarity with Python programming
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the llm_router_app folder:
cd advanced_tools_frameworks/llm_router_app
Install the required dependencies:
pip install -r requirements.txt
Get your API Keys: Sign up for an OpenAI account and a Together AI account to obtain your API keys.
Creating the Streamlit App
Let’s create our app. Create a new file llm_router.py and add the following code:
Set up the environment and import libraries:
• Sets up API keys as environment variables
• Imports Streamlit for the web interface
• Imports RouteLLM Controller for LLM routing
import os
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
os.environ['TOGETHERAI_API_KEY'] = "your_togetherai_api_key"
import streamlit as st
from routellm.controller import Controller
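Hard-coding keys is fine for a quick local demo, but a slightly safer variant (assuming you have already exported OPENAI_API_KEY and TOGETHERAI_API_KEY in your shell) is to read them from the environment and fail fast if they are missing:
import os

# Read the keys from the environment instead of pasting them into the script
if not os.getenv("OPENAI_API_KEY") or not os.getenv("TOGETHERAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY and TOGETHERAI_API_KEY before running the app")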
Initialize the RouteLLM client:
• Sets up RouteLLM with multiple models
• Uses the "mf" (matrix factorization) router
• Defines strong and weak models for different tasks
client = Controller(
    routers=["mf"],
    strong_model="gpt-4o-mini",
    weak_model="together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
)
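If you want to sanity-check the controller before wiring it into Streamlit, a quick throwaway snippet like this (our own illustration, not part of the app) sends one prompt through the router and prints which underlying model answered:
# Quick routing check outside Streamlit
test_response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
print(test_response['model'])  # the model the router actually picked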
Set up the Streamlit app and initialize chat history:
• Creates a title for the app
• Initializes an empty chat history in the session state
st.title("RouteLLM Chat App")
if "messages" not in st.session_state:
st.session_state.messages = []
Display existing chat messages:
• Iterates through existing messages
• Displays each message with its role (user/assistant)
• Shows which model was used for assistant responses
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
        if "model" in message:
            st.caption(f"Model used: {message['model']}")
Handle user input:
• Captures user input using Streamlit's chat_input
• Adds user message to chat history
• Displays the user message in the chat interface
if prompt := st.chat_input("What is your message?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
Generate and display the RouteLLM response (still inside the if block):
• Uses RouteLLM to generate a response
• Extracts the message content and model name
• Displays the response and the model used
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        response = client.chat.completions.create(
            model="router-mf-0.11593",
            messages=[{"role": "user", "content": prompt}]
        )
        message_content = response['choices'][0]['message']['content']
        model_name = response['model']
        # Display assistant's response
        message_placeholder.markdown(message_content)
        st.caption(f"Model used: {model_name}")
Add the assistant's response to the chat history (also inside the if block):
• Stores the assistant's response in the chat history
• Includes the content and the model used for future reference
    st.session_state.messages.append({"role": "assistant", "content": message_content, "model": model_name})
How the Code Works
RouteLLM Initialization: The app initializes the RouteLLM controller with two models:
Strong model: GPT-4o-mini
Weak model: Meta-Llama 3.1 70B Instruct Turbo
Chat Interface: Users can input messages through a chat interface.
Model Selection: RouteLLM automatically selects the appropriate model based on the complexity of the user's query (see the note on the router threshold after this list).
Response Generation: The selected model generates a response to the user's input.
Display: The app displays the response along with information about which model was used.
History: The chat history is maintained and displayed, including model information for each response.
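A note on the model string passed to create(): as we understand RouteLLM's convention, it encodes both the router and a cost threshold as router-<router>-<threshold>. The mf router scores each query, and the threshold controls how often the strong model is chosen; raising it makes the router more conservative about calling the strong model, while lowering it does the opposite. The 0.11593 used in this tutorial is simply a pre-calibrated value, and RouteLLM's documentation describes how to calibrate a threshold for your own target mix of strong and weak calls. Swapping in a different (here hypothetical) threshold changes only the model string:
# Hypothetical threshold of 0.3 for illustration; the rest of the app is unchanged
response = client.chat.completions.create(
    model="router-mf-0.3",
    messages=[{"role": "user", "content": prompt}]
)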
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run llm_router.py
Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, start asking questions, and watch the app switch between LLMs from query to query.
Working Application Demo
Conclusion
And that's it: your Multi-LLM Routing Chat App is ready, intelligently switching between models to balance efficiency, cost, and performance.
This setup can now be expanded further:
Personalize routing logic: Tailor how the mf router distributes queries to better match your use case.
Calibrate thresholds: Adjust the cost-quality threshold to control how often high-powered models are used.
Add session-based memory: Implement a memory layer so the system can recall past interactions, enhancing the user experience with context-aware responses (see the sketch below).
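For the session-based memory idea, one minimal sketch (our own illustration, not code from the repo) is to pass the accumulated chat history to the router instead of only the latest prompt, so whichever model is selected sees the conversation context:
# Build the message list from the stored history, dropping our extra "model" field
history = [{"role": m["role"], "content": m["content"]} for m in st.session_state.messages]

response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=history,  # full conversation instead of just the latest prompt
)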
Keep experimenting and refining to build even smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills, subscribe now and be the first to access our latest tutorials.