unwind ai
Posts
Build a Multimodal AI Agent Design Team

Build a Multimodal AI Agent Design Team

Fully functional multi-agent app using Gemini 2.0 Flash (step-by-step instructions)

Shubham Saboo & Gargi Gupta
December 13, 2024

Multi-agent AI systems are a powerful paradigm where specialized agents collaborate to solve complex problems. Each agent has distinct capabilities and objectives with which we can create systems that are robust and truly useful. When we add multimodal capabilities like images, text, videos, and structured data – these systems become even more powerful.

In this tutorial, we’re building a Multi-Agent Design Team powered by Google's Gemini 2.0, where three specialized agents work in concert to provide comprehensive design insights.

Each agent uses Gemini's multimodal capabilities to understand design assets in different ways: analyzing visual hierarchies, evaluating interaction patterns, and contextualizing market positioning. The agents communicate and coordinate their findings to deliver unified, actionable insights.

We're using Agno, a framework specifically designed for orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.

Also, Gemini 2.0 Flash brings impressive capabilities to our AI agents with multimodality, excellent performance, and fast inference.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

What We’re Building

This application leverages multiple specialized AI agents to provide a comprehensive analysis of UI/UX designs of your product and your competitors, combining visual understanding, user experience evaluation, and market research insights.

Our Design Team:

Vision Agent - A visual analysis expert that identifies design elements, patterns, visual hierarchy, and evaluates composition fundamentals like color schemes and typography. It focuses on the technical aspects of visual design, analyzing everything from component relationships to overall brand consistency.
UX Agent - A user experience specialist that evaluates user flows, interaction patterns, and identifies usability issues and opportunities for improvement. It applies best practices in UX design and accessibility to provide actionable recommendations for enhancing user interaction.
Market Agent - A market research expert equipped with DuckDuckGo integration that analyzes market trends and competitor patterns while providing strategic positioning insights. This agent combines design analysis with market research to deliver context-aware recommendations and industry-specific guidance.

Features:

Integrated analysis across all three agent perspectives
Comparative analysis with competitor designs
Customizable focus areas for detailed insights
Context-aware analysis for better relevance
Real-time processing with progress indicators
Structured, actionable output

How the App Works

The application orchestrates the three agents through a structured analysis workflow:

Analysis Types and Agent Assignment:

Visual Design Analysis - Handled by the Vision Agent
- Processes uploaded images
- Analyzes specific elements like color schemes, typography, layout based on user-selected focus areas
- Provides technical analysis of visual components
User Experience Analysis - Managed by the UX Agent
- Evaluates the same images from a UX perspective
- Focuses on user flows, interactions, and accessibility
- Provides practical improvement suggestions
Market Analysis - Conducted by the Market Agent
- Combines visual analysis with web research using DuckDuckGo
- Provides market context and competitive insights
- Suggests positioning strategies

Workflow Process:

Users upload design files and optional competitor designs
They select which types of analysis to run (can choose any combination of the three)
They can specify focus areas like Color Scheme, Typography, Layout, Navigation, Interactions, Accessibility, Branding, or Market Fit
Each selected analysis type triggers its respective agent
All agents have access to the same images but analyze them through their specialized lens
Results are compiled into a comprehensive report, with each agent's insights clearly separated
If multiple analysis types are selected, a combined "Key Takeaways" section shows how the different perspectives interconnect

Prerequisites

Before we begin, make sure you have the following:

Python installed on your machine (version 3.10 or higher is recommended)
Your Gemini API Key
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git

🌟 Don't forget to star the opensource repo to show your support.

Go to the ai_multimodal_design_agent folder:

cd advanced_ai_agents/multi_agent_apps/agent_teams/multimodal_design_agent_team

Install the required dependencies:

pip install -r requirements.txt

API Key: Visit Google AI Studio > Create or select a project > Generate an API key

Creating the Streamlit App

Let’s create our app. Create a new file design_agent_team.py and add the following code:

Import required libraries and setup:

from agno.agent import Agent
from agno.models.google import Gemini
from agno.media import Image as AgnoImage
from agno.tools.duckduckgo import DuckDuckGoTools
import streamlit as st
from typing import List, Optional
import logging
from pathlib import Path
import tempfile
import os

Initialize the specialized AI agents:

def initialize_agents(api_key: str) -> tuple[Agent, Agent, Agent]:
    try:
        model = Gemini(id="gemini-2.0-flash-exp", api_key=api_key)
        
        vision_agent = Agent(
            model=model,
            instructions=[
                "You are a visual analysis expert that:",
                "1. Identifies design elements, patterns, and visual hierarchy",
                "2. Analyzes color schemes, typography, and layouts",
                "3. Detects UI components and their relationships",
                "4. Evaluates visual consistency and branding",
                "Be specific and technical in your analysis"
            ],
            markdown=True
        )
        
        ux_agent = Agent(
            model=model,
            instructions=[
                "You are a UX analysis expert that:",
                "1. Evaluates user flows and interaction patterns",
                "2. Identifies usability issues and opportunities",
                "3. Suggests UX improvements based on best practices",
                "4. Analyzes accessibility and inclusive design",
                "Focus on user-centric insights and practical improvements"
            ],
            markdown=True
        )
        
        market_agent = Agent(
            model=model,
            tools=[DuckDuckGoTools()],
            instructions=[
                "You are a market research expert that:",
                "1. Identifies market trends and competitor patterns",
                "2. Analyzes similar products and features",
                "3. Suggests market positioning and opportunities",
                "4. Provides industry-specific insights",
                "Focus on actionable market intelligence"
            ],
            markdown=True
        )
        
        return vision_agent, ux_agent, market_agent
    except Exception as e:
        st.error(f"Error initializing agents: {str(e)}")
        return None, None, None

Set up the Streamlit app and API configuration:

st.set_page_config(page_title="Multimodal AI Design Agent Team", layout="wide")

# Sidebar for API key input
with st.sidebar:
    st.header("🔑 API Configuration")
    if "api_key_input" not in st.session_state:
        st.session_state.api_key_input = ""
    
    api_key = st.text_input(
        "Enter your Gemini API Key",
        value=st.session_state.api_key_input,
        type="password",
        help="Get your API key from Google AI Studio",
        key="api_key_widget"
    )
    
    if api_key != st.session_state.api_key_input:
        st.session_state.api_key_input = api_key
    
    if api_key:
        st.success("API Key provided! ✅")
    else:
        st.warning("Please enter your API key to proceed")

Create the file upload interface:

st.header("📤 Upload Content")
col1, space, col2 = st.columns([1, 0.1, 1])

with col1:
    design_files = st.file_uploader(
        "Upload UI/UX Designs",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True,
        key="designs"
    )
    
    if design_files:
        for file in design_files:
            st.image(file, caption=file.name, use_container_width=True)

with col2:
    competitor_files = st.file_uploader(
        "Upload Competitor Designs (Optional)",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True,
        key="competitors"
    )
    
    if competitor_files:
        for file in competitor_files:
            st.image(file, caption=f"Competitor: {file.name}", use_container_width=True)

Configure analysis options:

st.header("🎯 Analysis Configuration")

analysis_types = st.multiselect(
    "Select Analysis Types",
    ["Visual Design", "User Experience", "Market Analysis"],
    default=["Visual Design"]
)

specific_elements = st.multiselect(
    "Focus Areas",
    ["Color Scheme", "Typography", "Layout", "Navigation",
     "Interactions", "Accessibility", "Branding", "Market Fit"]
)

context = st.text_area(
    "Additional Context",
    placeholder="Describe your product, target audience, or specific concerns..."
)

Implement image processing functionality:

def process_images(files):
    processed_images = []
    for file in files:
        try:
            temp_dir = tempfile.gettempdir()
            temp_path = os.path.join(temp_dir, f"temp_{file.name}")
            with open(temp_path, "wb") as f:
                f.write(file.getvalue())
            agno_image = AgnoImage(filepath=Path(temp_path))
            processed_images.append(agno_image)
        except Exception as e:
            logger.error(f"Error processing image {file.name}: {str(e)}")
            continue
    return processed_images

Execute analysis workflow:

if st.button("🚀 Run Analysis", type="primary"):
    if design_files:
        try:
            st.header("📊 Analysis Results")
            
            design_images = process_images(design_files)
            competitor_images = process_images(competitor_files) if competitor_files else []
            all_images = design_images + competitor_images
            
            # Visual Design Analysis
            if "Visual Design" in analysis_types and design_files:
                with st.spinner("🎨 Analyzing visual design..."):
                    if all_images:
                        vision_prompt = f"""
                        Analyze these designs focusing on: {', '.join(specific_elements)}
                        Additional context: {context}
                        Provide specific insights about visual design elements.
                        Please format your response with clear headers and bullet points.
                        Focus on concrete observations and actionable insights.
                        """
                        response = vision_agent.run(
                            message=vision_prompt,
                            images=all_images
                        )
                        st.subheader("🎨 Visual Design Analysis")
                        st.markdown(response.content)

Add UX and Market Analysis:

# UX Analysis
            if "User Experience" in analysis_types:
                with st.spinner("🔄 Analyzing user experience..."):
                    if all_images:
                        ux_prompt = f"""
                        Evaluate the user experience considering: {', '.join(specific_elements)}
                        Additional context: {context}
                        Focus on user flows, interactions, and accessibility.
                        Please format your response with clear headers and bullet points.
                        Focus on concrete observations and actionable improvements.
                        """
                        response = ux_agent.run(
                            message=ux_prompt,
                            images=all_images
                        )
                        st.subheader("🔄 UX Analysis")
                        st.markdown(response.content)
            
            # Market Analysis
            if "Market Analysis" in analysis_types:
                with st.spinner("📊 Conducting market analysis..."):
                    market_prompt = f"""
                    Analyze market positioning and trends based on these designs.
                    Context: {context}
                    Compare with competitor designs if provided.
                    Suggest market opportunities and positioning.
                    Please format your response with clear headers and bullet points.
                    Focus on concrete market insights and actionable recommendations.
                    """
                    response = market_agent.run(
                        message=market_prompt,
                        images=all_images
                    )
                    st.subheader("📊 Market Analysis")
                    st.markdown(response.content)
        
        except Exception as e:
            st.error("An error occurred during analysis. Please check the logs for details.")
    else:
        st.warning("Please upload at least one design to analyze.")

Running the App

With our code in place, it's time to launch the app.

In your terminal, navigate to the project folder, and run the following command

streamlit run design_agent_team.py

Streamlit will provide a local URL (typically http://localhost:8501).

Working Application Demo

Conclusion

And you've just built a powerful multi-agent design analysis team with multiple AI agents powered by Gemini 2.0. This tool can significantly streamline any design review process and provide valuable insights for improvement.

As you continue developing your AI agent team, consider these enhancements:

Adding support for video analysis using Gemini's video capabilities
Creating custom analysis templates for different design types
Adding export capabilities for reports

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!

Reply

or to participate.