
Build a Multimodal AI Agent Design Team

Fully functional multi-agent app using Gemini 2.0 Flash (step-by-step instructions)

Multi-agent AI systems are a powerful paradigm in which specialized agents collaborate to solve complex problems. Because each agent has distinct capabilities and objectives, we can compose systems that are robust and genuinely useful. When we add multimodal capabilities like images, text, videos, and structured data, these systems become even more powerful.

In this tutorial, we’re building a Multi-Agent Design Team powered by Google's new Gemini 2.0, where three specialized agents work in concert to provide comprehensive design insights.

Each agent uses Gemini's multimodal capabilities to understand design assets in different ways: analyzing visual hierarchies, evaluating interaction patterns, and contextualizing market positioning. The agents communicate and coordinate their findings to deliver unified, actionable insights.

We're using Phidata, a framework specifically designed for orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration. Using Phidata, we can easily create agents that not only process multiple input modalities but also reason about them in combination.

Gemini 2.0 Flash also brings impressive capabilities to our AI agents: multimodal understanding, strong performance, and fast inference. The best part is that the API is free with a generous rate limit while the model is in its experimental phase!


What We’re Building

This application leverages multiple specialized AI agents to provide a comprehensive analysis of your product's UI/UX designs and your competitors', combining visual understanding, user experience evaluation, and market research insights.

Our Design Team:

  1. Vision Agent - A visual analysis expert that identifies design elements, patterns, visual hierarchy, and evaluates composition fundamentals like color schemes and typography. It focuses on the technical aspects of visual design, analyzing everything from component relationships to overall brand consistency.

  2. UX Agent - A user experience specialist that evaluates user flows, interaction patterns, and identifies usability issues and opportunities for improvement. It applies best practices in UX design and accessibility to provide actionable recommendations for enhancing user interaction.

  3. Market Agent - A market research expert equipped with DuckDuckGo integration that analyzes market trends and competitor patterns while providing strategic positioning insights. This agent combines design analysis with market research to deliver context-aware recommendations and industry-specific guidance.

Features:

  • Integrated analysis across all three agent perspectives

  • Comparative analysis with competitor designs

  • Customizable focus areas for detailed insights

  • Context-aware analysis for better relevance

  • Real-time processing with progress indicators

  • Structured, actionable output

How the App Works

The application orchestrates the three agents through a structured analysis workflow:

Analysis Types and Agent Assignment:

  1. Visual Design Analysis - Handled by the Vision Agent

    • Processes uploaded images

    • Analyzes specific elements like color schemes, typography, layout based on user-selected focus areas

    • Provides technical analysis of visual components

  2. User Experience Analysis - Managed by the UX Agent

    • Evaluates the same images from a UX perspective

    • Focuses on user flows, interactions, and accessibility

    • Provides practical improvement suggestions

  3. Market Analysis - Conducted by the Market Agent

    • Combines visual analysis with web research using DuckDuckGo

    • Provides market context and competitive insights

    • Suggests positioning strategies

Workflow Process:

  • Users upload design files and optional competitor designs

  • They select which types of analysis to run (can choose any combination of the three)

  • They can specify focus areas like Color Scheme, Typography, Layout, Navigation, Interactions, Accessibility, Branding, or Market Fit

  • Each selected analysis type triggers its respective agent (see the sketch after this list)

  • All agents have access to the same images but analyze them through their specialized lens

  • Results are compiled into a comprehensive report, with each agent's insights clearly separated

  • If multiple analysis types are selected, a combined "Key Takeaways" section shows how the different perspectives interconnect
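
To make the routing concrete, here is a simplified sketch of that orchestration (the names mirror the code we'll write below; build_prompt is a hypothetical helper standing in for the per-type prompts, and the actual app uses separate if-blocks):

# Simplified orchestration sketch, not the exact code from the app.
# Each user-selected analysis type is routed to its specialist agent;
# every agent sees the same images through its own type-specific prompt.
AGENT_FOR_TYPE = {
    "Visual Design": vision_agent,
    "User Experience": ux_agent,
    "Market Analysis": market_agent,
}

results = {}
for analysis_type in analysis_types:  # chosen by the user in the UI
    agent = AGENT_FOR_TYPE[analysis_type]
    prompt = build_prompt(analysis_type, specific_elements, context)  # hypothetical helper
    results[analysis_type] = agent.run(message=prompt, images=all_images)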

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Your Gemini API Key

  3. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  4. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git

  2. Go to the ai_multimodal_design_agent folder and install the dependencies:

cd awesome-llm-apps/ai_agent_tutorials/ai_multimodal_design_agent
pip install -r requirements.txt

  3. API key: visit Google AI Studio > Create or select a project > Generate an API key

Creating the Streamlit App

Let's build the app. Create a new file named design_agent_team.py and add the following code:

  1. Import the required libraries and set up the model:

    • Streamlit for the interface
    • Phidata for AI agents
    • Pillow for image processing
    • DuckDuckGo for web search
    • Google Gemini as the LLM

from phi.agent import Agent
from phi.model.google import Gemini
from phi.tools.duckduckgo import DuckDuckGo
import streamlit as st
from PIL import Image
from typing import List, Optional
import tempfile  # needed by the image processing helper below
import os        # needed to build temporary file paths

def initialize_agents(api_key: str) -> tuple[Agent, Agent, Agent]:
    model = Gemini(id="gemini-2.0-flash-exp", api_key=api_key)

  2. Create Vision Analysis Agent:

vision_agent = Agent(
    model=model,
    instructions=[
        "You are a visual analysis expert that:",
        "1. Identifies design elements, patterns, and visual hierarchy",
        "2. Analyzes color schemes, typography, and layouts",
        "3. Detects UI components and their relationships",
        "4. Evaluates visual consistency and branding"
    ],
    markdown=True
)

  3. Create UX Analysis Agent:

ux_agent = Agent(
    model=model,
    instructions=[
        "You are a UX analysis expert that:",
        "1. Evaluates user flows and interaction patterns",
        "2. Identifies usability issues and opportunities",
        "3. Suggests UX improvements based on best practices",
        "4. Analyzes accessibility and inclusive design"
    ],
    markdown=True
)

  4. Create Market Research Agent:

market_agent = Agent(
    model=model,
    tools=[DuckDuckGo(search=True)],
    instructions=[
        "You are a market research expert that:",
        "1. Identifies market trends and competitor patterns",
        "2. Analyzes similar products and features",
        "3. Suggests market positioning and opportunities",
        "4. Provides industry-specific insights"
    ],
    markdown=True
)
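
The three agent definitions above live inside initialize_agents, so close the function by returning them as the tuple declared in its signature (function-body indentation is omitted here, as in the snippets above):

# Still inside initialize_agents: hand the three agents back to the caller
return vision_agent, ux_agent, market_agent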

  5. Set Up the Streamlit Interface and API Configuration:

with st.sidebar:
    st.header("🔑 API Configuration")
    api_key = st.text_input(
        "Enter your Gemini API Key",
        value=st.session_state.get("api_key_input", ""),  # avoids an error if the key isn't stored yet
        type="password",
        help="Get your API key from Google AI Studio"
    )
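
Once a key has been entered, the agents can be created with the initialize_agents function defined above. A minimal sketch (the real app may additionally cache the agents in st.session_state):

# Stop the script early until a key is provided, then build the three agents
if not api_key:
    st.warning("Please enter your Gemini API key to continue.")
    st.stop()

vision_agent, ux_agent, market_agent = initialize_agents(api_key)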

  6. Create File Upload Section:

st.header("📤 Upload Content")
col1, space, col2 = st.columns([1, 0.1, 1])

with col1:
    design_files = st.file_uploader(
        "Upload UI/UX Designs",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True
    )

with col2:
    competitor_files = st.file_uploader(
        "Upload Competitor Designs (Optional)",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True
    )

  7. Analysis Configuration:

st.header("🎯 Analysis Configuration")
analysis_types = st.multiselect(
    "Select Analysis Types",
    ["Visual Design", "User Experience", "Market Analysis"]
)

specific_elements = st.multiselect(
    "Focus Areas",
    ["Color Scheme", "Typography", "Layout", "Navigation", 
     "Interactions", "Accessibility", "Branding", "Market Fit"]
)

context = st.text_area(
    "Additional Context",
    placeholder="Describe your product, target audience..."
)

  8. Image Processing Function:

def process_images(files):
    """Save each uploaded file to a temporary path and return the list of paths."""
    processed_images = []
    for file in files:
        temp_dir = tempfile.gettempdir()
        temp_path = os.path.join(temp_dir, f"temp_{file.name}")
        with open(temp_path, "wb") as f:
            f.write(file.getvalue())
        processed_images.append(temp_path)
    return processed_images
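
The analysis snippets below refer to an all_images list. The tutorial doesn't show this line explicitly, but a reasonable assumption is to run both sets of uploads through the helper above:

# Assumption: combine the primary designs and any competitor designs into a
# single list of temporary file paths that every agent will analyze
all_images = process_images(design_files) + process_images(competitor_files or [])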

  9. Visual Design Analysis:

if "Visual Design" in analysis_types:
    vision_prompt = f"""
    Analyze these designs focusing on: {', '.join(specific_elements)}
    Additional context: {context}
    Provide specific insights about visual design elements.
    """
    vision_response = vision_agent.run(
        message=vision_prompt,
        images=all_images
    )
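
Optionally, wrap each agent call in a Streamlit spinner to get the real-time progress indicators mentioned in the feature list; for example, the block above could become:

if "Visual Design" in analysis_types:
    # vision_prompt built exactly as above
    with st.spinner("Analyzing visual design..."):
        vision_response = vision_agent.run(
            message=vision_prompt,
            images=all_images
        )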

  10. UX Analysis:

if "User Experience" in analysis_types:
    ux_prompt = f"""
    Evaluate the user experience considering: {', '.join(specific_elements)}
    Additional context: {context}
    Focus on user flows, interactions, and accessibility.
    """
    ux_response = ux_agent.run(
        message=ux_prompt,
        images=all_images
    )

  11. Market Analysis:

if "Market Analysis" in analysis_types:
    market_prompt = f"""
    Analyze market positioning and trends based on these designs.
    Context: {context}
    Compare with competitor designs if provided.
    """
    market_response = market_agent.run(
        message=market_prompt,
        images=all_images
    )

  12. Results Display:

st.subheader("🎨 Visual Design Analysis")
st.markdown(response.content)

st.subheader("🔄 UX Analysis")
st.markdown(response.content)

st.subheader("📊 Market Analysis")
st.markdown(response.content)
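
The combined "Key Takeaways" section mentioned earlier isn't shown in these snippets; one possible sketch (an assumption, not the original code) is to have one of the agents synthesize the individual reports whenever more than one analysis type was selected:

# Hypothetical sketch: synthesize a combined summary when several analyses ran
if len(analysis_types) > 1:
    reports = []
    if "Visual Design" in analysis_types:
        reports.append(vision_response.content)
    if "User Experience" in analysis_types:
        reports.append(ux_response.content)
    if "Market Analysis" in analysis_types:
        reports.append(market_response.content)

    synthesis_prompt = (
        "Summarize the key takeaways from these analyses and explain how the "
        "different perspectives interconnect:\n\n" + "\n\n---\n\n".join(reports)
    )
    takeaways = vision_agent.run(message=synthesis_prompt)

    st.subheader("💡 Key Takeaways")
    st.markdown(takeaways.content)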

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run design_agent_team.py

Working Application Demo

Conclusion

And that's it: you've just built a design analysis team of specialized AI agents powered by Gemini 2.0 Flash. This tool can significantly streamline any design review process and provide valuable insights for improvement.

As you continue developing your AI agent team, consider these enhancements:

  • Adding support for video analysis using Gemini's video capabilities

  • Creating custom analysis templates for different design types

  • Adding export capabilities for reports

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
