Build a Multimodal AI Agent Design Team
Fully functional multi-agent app using Gemini 2.0 Flash (step-by-step instructions)
Multi-agent AI systems are a powerful paradigm in which specialized agents collaborate to solve complex problems. Because each agent has distinct capabilities and objectives, we can combine them into systems that are robust and genuinely useful. Adding multimodal capabilities (images, text, video, and structured data) makes these systems even more powerful.
In this tutorial, we’re building a Multi-Agent Design Team powered by Google's new Gemini 2.0, where three specialized agents work in concert to provide comprehensive design insights.
Each agent uses Gemini's multimodal capabilities to understand design assets in different ways: analyzing visual hierarchies, evaluating interaction patterns, and contextualizing market positioning. The agents communicate and coordinate their findings to deliver unified, actionable insights.
We're using Phidata, a framework specifically designed for orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration. Using Phidata, we can easily create agents that not only process multiple input modalities but also reason about them in combination.
Also, Gemini 2.0 Flash brings impressive capabilities to our AI agents with multimodality, excellent performance, and fast inference. The best part is that the API is free with a generous rate limit while the model is in the experimental phase!
What We’re Building
This application uses multiple specialized AI agents to analyze the UI/UX designs of your product and your competitors' products, combining visual understanding, user experience evaluation, and market research insights.
Our Design Team:
Vision Agent - A visual analysis expert that identifies design elements, patterns, and visual hierarchy, and evaluates composition fundamentals like color schemes and typography. It focuses on the technical aspects of visual design, analyzing everything from component relationships to overall brand consistency.
UX Agent - A user experience specialist that evaluates user flows and interaction patterns, and identifies usability issues and opportunities for improvement. It applies best practices in UX design and accessibility to provide actionable recommendations for enhancing user interaction.
Market Agent - A market research expert equipped with DuckDuckGo integration that analyzes market trends and competitor patterns while providing strategic positioning insights. This agent combines design analysis with market research to deliver context-aware recommendations and industry-specific guidance.
Features:
Integrated analysis across all three agent perspectives
Comparative analysis with competitor designs
Customizable focus areas for detailed insights
Context-aware analysis for better relevance
Real-time processing with progress indicators
Structured, actionable output
How the App Works
The application orchestrates the three agents through a structured analysis workflow:
Analysis Types and Agent Assignment:
Visual Design Analysis - Handled by the Vision Agent
• Processes uploaded images
• Analyzes specific elements like color schemes, typography, and layout based on user-selected focus areas
• Provides technical analysis of visual components
User Experience Analysis - Managed by the UX Agent
• Evaluates the same images from a UX perspective
• Focuses on user flows, interactions, and accessibility
• Provides practical improvement suggestions
Market Analysis - Conducted by the Market Agent
• Combines visual analysis with web research using DuckDuckGo
• Provides market context and competitive insights
• Suggests positioning strategies
Workflow Process:
1. Users upload design files and, optionally, competitor designs
2. They select which types of analysis to run (any combination of the three)
3. They can specify focus areas like Color Scheme, Typography, Layout, Navigation, Interactions, Accessibility, Branding, or Market Fit
4. Each selected analysis type triggers its respective agent
5. All agents have access to the same images but analyze them through their specialized lens
6. Results are compiled into a comprehensive report, with each agent's insights clearly separated
7. If multiple analysis types are selected, a combined "Key Takeaways" section shows how the different perspectives interconnect
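The workflow above can be sketched in plain Python. This is a minimal illustration rather than the app's actual code: the `run_workflow` function and the lambda analyzers are hypothetical stand-ins for the real agent calls, and the "Key Takeaways" text is a placeholder.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an agent: takes image paths, returns analysis text.
Analyzer = Callable[[List[str]], str]


def run_workflow(
    selected: List[str],
    analyzers: Dict[str, Analyzer],
    images: List[str],
) -> Dict[str, str]:
    """Run only the selected analyses and return one report section per agent."""
    report: Dict[str, str] = {}
    for analysis_type in selected:
        # Every selected agent receives the same list of images.
        report[analysis_type] = analyzers[analysis_type](images)
    if len(report) > 1:
        # The combined section is added only when several perspectives exist.
        report["Key Takeaways"] = "Combined view of: " + ", ".join(selected)
    return report


analyzers = {
    "Visual Design": lambda imgs: f"visual notes on {len(imgs)} image(s)",
    "User Experience": lambda imgs: f"UX notes on {len(imgs)} image(s)",
    "Market Analysis": lambda imgs: f"market notes on {len(imgs)} image(s)",
}

report = run_workflow(["Visual Design", "User Experience"], analyzers, ["home.png"])
for title, body in report.items():
    print(f"{title}: {body}")
```

Keeping the agents in a dictionary keyed by analysis type means a new perspective could be added without touching the dispatch loop.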
Prerequisites
Before we begin, make sure you have the following:
Python installed on your machine (version 3.10 or higher is recommended)
Your Gemini API Key
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the ai_multimodal_design_agent folder:
cd awesome-llm-apps/ai_agent_tutorials/ai_multimodal_design_agent
Install the required dependencies:
pip install -r requirements.txt
API Key: Visit Google AI Studio > Create or select a project > Generate an API key
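The app asks for the key in its sidebar. If you would rather not paste it on every run, one option is to read it from an environment variable first. The sketch below is an assumption for illustration: the variable name `GEMINI_API_KEY` and the helper `load_api_key` are not defined by the repository.

```python
import os
from typing import Optional


def load_api_key(env_var: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the key from the environment, or None if it is unset or blank."""
    value = os.environ.get(env_var, "").strip()
    return value or None


key = load_api_key()
print("API key found" if key else "No API key set; enter it in the app sidebar")
```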
Creating the Streamlit App
Let’s create our app. Create a new file design_agent_team.py and add the following code:
Import required libraries and setup:
• Streamlit for the interface
• Phidata for AI agents
• Pillow for image processing
• DuckDuckGo for web search
• Google Gemini as the LLM
import os
import tempfile
from phi.agent import Agent
from phi.model.google import Gemini
from phi.tools.duckduckgo import DuckDuckGo
import streamlit as st
from PIL import Image
from typing import List, Optional

def initialize_agents(api_key: str) -> tuple[Agent, Agent, Agent]:
    # One Gemini model instance is shared by all three agents
    model = Gemini(id="gemini-2.0-flash-exp", api_key=api_key)
Create Vision Analysis Agent:
    vision_agent = Agent(
        model=model,
        instructions=[
            "You are a visual analysis expert that:",
            "1. Identifies design elements, patterns, and visual hierarchy",
            "2. Analyzes color schemes, typography, and layouts",
            "3. Detects UI components and their relationships",
            "4. Evaluates visual consistency and branding"
        ],
        markdown=True
    )
Create UX Analysis Agent:
    ux_agent = Agent(
        model=model,
        instructions=[
            "You are a UX analysis expert that:",
            "1. Evaluates user flows and interaction patterns",
            "2. Identifies usability issues and opportunities",
            "3. Suggests UX improvements based on best practices",
            "4. Analyzes accessibility and inclusive design"
        ],
        markdown=True
    )
Create Market Research Agent:
    market_agent = Agent(
        model=model,
        tools=[DuckDuckGo(search=True)],
        instructions=[
            "You are a market research expert that:",
            "1. Identifies market trends and competitor patterns",
            "2. Analyzes similar products and features",
            "3. Suggests market positioning and opportunities",
            "4. Provides industry-specific insights"
        ],
        markdown=True
    )
    # Return all three agents, matching the function's declared return type
    return vision_agent, ux_agent, market_agent
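All three agents share the same instruction shape: a role line followed by numbered duties. If you add more agents, a small helper can keep that shape consistent. This is an optional sketch; `build_instructions` is a hypothetical helper, not part of Phidata or the tutorial's repository.

```python
from typing import List


def build_instructions(role: str, duties: List[str]) -> List[str]:
    """Return a role line followed by numbered duty lines."""
    lines = [f"You are a {role} that:"]
    lines += [f"{i}. {duty}" for i, duty in enumerate(duties, start=1)]
    return lines


ux_instructions = build_instructions(
    "UX analysis expert",
    [
        "Evaluates user flows and interaction patterns",
        "Identifies usability issues and opportunities",
    ],
)
print(ux_instructions)
```

The resulting list has the same form as the lists passed to `instructions=` in the agents above.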
Setup Streamlit Interface and API Configuration:
with st.sidebar:
    st.header("🔑 API Configuration")
    api_key = st.text_input(
        "Enter your Gemini API Key",
        # .get avoids an AttributeError on the first run, before a key is stored
        value=st.session_state.get("api_key_input", ""),
        type="password",
        help="Get your API key from Google AI Studio"
    )
Create File Upload Section:
st.header("📤 Upload Content")
col1, space, col2 = st.columns([1, 0.1, 1])
with col1:
    design_files = st.file_uploader(
        "Upload UI/UX Designs",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True
    )
with col2:
    competitor_files = st.file_uploader(
        "Upload Competitor Designs (Optional)",
        type=["jpg", "jpeg", "png"],
        accept_multiple_files=True
    )
Analysis Configuration:
st.header("🎯 Analysis Configuration")
analysis_types = st.multiselect(
    "Select Analysis Types",
    ["Visual Design", "User Experience", "Market Analysis"]
)
specific_elements = st.multiselect(
    "Focus Areas",
    ["Color Scheme", "Typography", "Layout", "Navigation",
     "Interactions", "Accessibility", "Branding", "Market Fit"]
)
context = st.text_area(
    "Additional Context",
    placeholder="Describe your product, target audience..."
)
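Before any agent runs, it can help to check that the user has actually provided something to analyze. The following is a minimal sketch of such a check, assuming you want at least one design file and one analysis type; `validate_inputs` is a hypothetical helper and the messages are placeholders.

```python
from typing import List, Sequence


def validate_inputs(
    design_files: Sequence[object],
    analysis_types: Sequence[str],
) -> List[str]:
    """Return a list of problems; an empty list means the analysis can start."""
    problems: List[str] = []
    if not design_files:
        problems.append("Upload at least one design file.")
    if not analysis_types:
        problems.append("Select at least one analysis type.")
    return problems


print(validate_inputs([], ["Visual Design"]))
```

In the app, each returned message could be shown with a Streamlit warning before the analysis is allowed to start.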
Image Processing Function:
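# Uses os and tempfile from the standard library (import both at the top of the file)
def process_images(files):
    processed_images = []
    for file in files:
        # Save each upload to the system temp directory and keep its path
        temp_dir = tempfile.gettempdir()
        temp_path = os.path.join(temp_dir, f"temp_{file.name}")
        with open(temp_path, "wb") as f:
            f.write(file.getvalue())
        processed_images.append(temp_path)
    return processed_images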
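One thing to be aware of: because the function above names each temp file after the upload, a design file and a competitor file that share a filename would write to the same path. A possible variant, sketched here under the assumption that uploads expose `.name` and `.getvalue()`, uses randomly named temp files instead. `FakeUpload` and `save_uploads` are hypothetical names used only so the example runs on its own.

```python
import os
import tempfile
from typing import Iterable, List


class FakeUpload:
    """Hypothetical stand-in for an uploaded file: has .name and .getvalue()."""

    def __init__(self, name: str, data: bytes) -> None:
        self.name = name
        self._data = data

    def getvalue(self) -> bytes:
        return self._data


def save_uploads(files: Iterable[FakeUpload]) -> List[str]:
    """Write each upload to its own temp file and return the paths."""
    paths: List[str] = []
    for file in files:
        # Keep only the extension; the random temp name avoids collisions
        # between uploads that happen to share a filename.
        suffix = os.path.splitext(file.name)[1]
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            tmp.write(file.getvalue())
            paths.append(tmp.name)
    return paths


paths = save_uploads([FakeUpload("home.png", b"a"), FakeUpload("home.png", b"b")])
print(paths)
for path in paths:
    os.remove(path)  # clean up once the analysis is done
```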
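Visual Design Analysis:
# Assumed glue, not shown in the original excerpt: create the agents and
# save every upload to disk so all agents receive the same image paths
vision_agent, ux_agent, market_agent = initialize_agents(api_key)
all_images = process_images(design_files or []) + process_images(competitor_files or [])
if "Visual Design" in analysis_types:
    vision_prompt = f"""
    Analyze these designs focusing on: {', '.join(specific_elements)}
    Additional context: {context}
    Provide specific insights about visual design elements.
    """
    response = vision_agent.run(
        message=vision_prompt,
        images=all_images
    )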
UX Analysis:
if "User Experience" in analysis_types:
    ux_prompt = f"""
    Evaluate the user experience considering: {', '.join(specific_elements)}
    Additional context: {context}
    Focus on user flows, interactions, and accessibility.
    """
    response = ux_agent.run(
        message=ux_prompt,
        images=all_images
    )
Market Analysis:
if "Market Analysis" in analysis_types:
    market_prompt = f"""
    Analyze market positioning and trends based on these designs.
    Context: {context}
    Compare with competitor designs if provided.
    """
    response = market_agent.run(
        message=market_prompt,
        images=all_images
    )
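The three prompts follow the same pattern: a task sentence, the selected focus areas, and the user's context. When no focus areas are selected, the joined string is empty and the prompt reads "focusing on:" with nothing after it. One way to handle that is sketched below; `build_prompt` and its fallback phrases are assumptions for illustration, not the tutorial's code.

```python
from typing import Sequence


def build_prompt(task: str, focus_areas: Sequence[str], context: str) -> str:
    """Assemble a prompt, with fallbacks for empty focus areas and context."""
    focus = ", ".join(focus_areas) if focus_areas else "overall design quality"
    extra = context.strip() or "none provided"
    return f"{task} Focus on: {focus}. Additional context: {extra}."


print(build_prompt("Analyze these designs.", [], ""))
# prints: Analyze these designs. Focus on: overall design quality. Additional context: none provided.
```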
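Results Display (place each pair inside its matching if block, directly after that agent's run call, so that response holds that agent's output):
st.subheader("🎨 Visual Design Analysis")
st.markdown(response.content)
st.subheader("🔄 UX Analysis")
st.markdown(response.content)
st.subheader("📊 Market Analysis")
st.markdown(response.content)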
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run design_agent_team.py
Streamlit will provide a local URL (typically http://localhost:8501).
Conclusion
You've just built a multi-agent design analysis team powered by Gemini 2.0. This tool can streamline a design review process and surface concrete suggestions for improvement.
As you continue developing your AI agent team, consider these enhancements:
Adding support for video analysis using Gemini's video capabilities
Creating custom analysis templates for different design types
Adding export capabilities for reports
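As a starting point for the export idea, here is a minimal sketch that writes report sections to a Markdown file. It assumes the report is available as a dictionary of section titles to text; `export_report` and the file name are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def export_report(sections: Dict[str, str], path: Path) -> Path:
    """Write report sections to a Markdown file, one heading per section."""
    parts = [f"## {title}\n\n{body.strip()}\n" for title, body in sections.items()]
    path.write_text("\n".join(parts), encoding="utf-8")
    return path


out = export_report(
    {"Visual Design Analysis": "Strong contrast.", "UX Analysis": "Clear navigation."},
    Path(tempfile.gettempdir()) / "design_report.md",
)
print(out.read_text(encoding="utf-8"))
```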
Keep experimenting and refining to build smarter AI solutions!