
Build a Multi LLM Routing App with Llama 3.1 and GPT-4o

Automatically route each query to the most efficient LLM in just 30 lines of Python code (step-by-step instructions)

Choosing the right model for AI tasks is often challenging. Some tasks need the power of GPT-4o, while others can be handled just as well by lighter models like the smaller Llama variants. This is where RouteLLM helps: it automatically routes each query to a model suited to its complexity.

In this tutorial, we’ll show how to build a Multi-LLM Routing Chat App using RouteLLM in just 30 lines of Python code. This Streamlit app dynamically selects the best model for every query, ensuring efficiency and performance.

Why Multi-LLM Routing matters:
A multi-LLM routing app optimizes both performance and cost. Instead of using high-powered models for all queries, it assigns simpler ones to more efficient models, reducing latency and operational costs.

What is RouteLLM?
RouteLLM lets you route user queries dynamically between models, optimizing both performance and cost. With built-in routers like the mf (matrix factorization) router, it estimates the complexity of each query and directs it accordingly, avoiding unnecessary calls to high-powered models.


What We’re Building

This Streamlit application demonstrates the use of RouteLLM, a system that intelligently routes queries between different language models based on the complexity of the task. It provides a chat interface where users can interact with AI models, and the app automatically selects the most appropriate model for each query.

Features

  • Chat interface for interacting with AI models

  • Automatic model selection using RouteLLM

  • Utilizes both GPT-4o-mini and Meta-Llama 3.1 models

  • Displays chat history with model information


Prerequisites

Before we begin, make sure you have:

  1. Python installed on your machine (version 3.7 or higher is recommended)

  2. Your OpenAI API Key and Together AI API Key

  3. Basic familiarity with Python programming

  4. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Navigate to the llm_router_app folder and install the dependencies:

cd advanced_tools_frameworks/llm_router_app
pip install -r requirements.txt
  3. Get your API keys: sign up for an OpenAI account and a Together AI account to obtain your keys.
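
If you prefer not to hardcode keys in the script (the next section shows them hardcoded for simplicity), one option is to export them as environment variables before launching the app; if you do, the os.environ lines in the code below can be dropped:

export OPENAI_API_KEY="your_openai_api_key"
export TOGETHERAI_API_KEY="your_togetherai_api_key"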

Creating the Streamlit App

Let’s create our app. Create a new file llm_router.py and add the following code:

  1. Set up the environment and import libraries: 
    • Sets up API keys as environment variables
    • Imports Streamlit for the web interface
    • Imports RouteLLM Controller for LLM routing

import os

os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
os.environ['TOGETHERAI_API_KEY'] = "your_togetherai_api_key"

import streamlit as st
from routellm.controller import Controller
  2. Initialize the RouteLLM client:
    • Sets up RouteLLM with multiple models
    • Uses the "mf" (matrix factorization) router
    • Defines a strong and a weak model for different tasks

client = Controller(
    routers=["mf"],
    strong_model="gpt-4o-mini",
    weak_model="together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
)
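
As an optional sanity check before wiring up the interface, you can call the controller directly from a Python shell. This is a minimal sketch, not part of the final app, and assumes your API keys are already set:

# Quick test: send one query through the router and see which model answered
test_response = client.chat.completions.create(
    model="router-mf-0.11593",  # mf router with a calibrated cost threshold
    messages=[{"role": "user", "content": "What is 2 + 2?"}]
)
print(test_response['model'])  # the underlying model that handled the query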
  3. Set up the Streamlit app and initialize chat history:
    • Creates a title for the app
    • Initializes an empty chat history in the session state

st.title("RouteLLM Chat App")

if "messages" not in st.session_state:
    st.session_state.messages = []
  4. Display existing chat messages:
    • Iterates through existing messages
    • Displays each message with its role (user/assistant)
    • Shows which model was used for assistant responses

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
        if "model" in message:
            st.caption(f"Model used: {message['model']}")
  5. Handle user input:
    • Captures user input using Streamlit's chat_input
    • Adds the user message to the chat history
    • Displays the user message in the chat interface

if prompt := st.chat_input("What is your message?"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
  6. Generate and display the RouteLLM response:
    • Uses RouteLLM to generate a response; the model string router-mf-0.11593 follows RouteLLM's router-{name}-{threshold} convention, selecting the mf router with a calibrated cost threshold
    • Extracts the message content and model name
    • Displays the response and the model used

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        response = client.chat.completions.create(
            model="router-mf-0.11593",
            messages=[{"role": "user", "content": prompt}]
        )
        message_content = response['choices'][0]['message']['content']
        model_name = response['model']
        
        # Display assistant's response
        message_placeholder.markdown(message_content)
        st.caption(f"Model used: {model_name}")
  7. Add the assistant's response to chat history:
    • Stores the assistant's response in the chat history
    • Includes the content and the model used for future reference

    st.session_state.messages.append({"role": "assistant", "content": message_content, "model": model_name})

How the Code Works

  1. RouteLLM Initialization: The app initializes the RouteLLM controller with two models:

    • Strong model: GPT-4o-mini

    • Weak model: Meta-Llama 3.1 70B Instruct Turbo

  2. Chat Interface: Users can input messages through a chat interface.

  3. Model Selection: RouteLLM automatically selects the appropriate model based on the complexity of the user's query.

  4. Response Generation: The selected model generates a response to the user's input.

  5. Display: The app displays the response along with information about which model was used.

  6. History: The chat history is maintained and displayed, including model information for each response.

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run llm_router.py
  • Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, start asking questions, and watch the app switch between LLMs based on query complexity!

Working Application Demo

Conclusion

And with that, your Multi-LLM Routing Chat App is ready: it intelligently switches between models, balancing efficiency and performance.

This setup can now be expanded further (a short sketch illustrating the last two ideas follows the list):

  • Personalize routing logic: Tailor how the mf router distributes queries to better match your use case.

  • Calibrate thresholds: Adjust the cost-quality threshold to control how often high-powered models are used.

  • Add session-based memory: Implement a memory layer to allow the system to recall past interactions, enhancing user experience with context-aware responses.
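
As a starting point for the last two ideas, here is a minimal sketch: it passes the stored chat history back to the router for context-aware replies and swaps in a different threshold. The 0.3 value is purely hypothetical; calibrate it against your own traffic:

# Strip the extra "model" key before sending stored history back to the API
history = [{"role": m["role"], "content": m["content"]} for m in st.session_state.messages]

response = client.chat.completions.create(
    model="router-mf-0.3",  # hypothetical threshold; raising it generally routes fewer queries to the strong model
    messages=history,       # full conversation history enables context-aware responses
)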

Keep experimenting and refining to build even smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
