Build an AI Data Analysis Agent
Fully functional AI agent app using GPT-4o (step-by-step instructions)
Data analysis often requires complex SQL queries and deep technical knowledge, creating a barrier for many who need quick insights from their data. What if we could make data analysis as simple as having a conversation?
In this tutorial, we'll build an AI Data Analysis Agent that lets users analyze CSV and Excel files using natural language queries. Powered by GPT-4o and DuckDB, this tool translates plain English questions into SQL queries, making data analysis accessible to everyone – no SQL expertise required.
We're using Phidata, a framework specifically designed for building and orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.
What We’re Building
An AI data analysis agent built with the Phidata agent framework and OpenAI's GPT-4o model. It lets users analyze CSV and Excel files through natural-language queries, with DuckDB handling data processing under the hood - making data analysis accessible to users regardless of their SQL expertise.
Features
📤 File Upload Support:
Upload CSV and Excel files
Automatic data type detection and schema inference
Support for multiple file formats
💬 Natural Language Queries:
Convert natural language questions into SQL queries
Get instant answers about your data
No SQL knowledge required
🔍 Advanced Analysis:
Perform complex data aggregations
Filter and sort data
Generate statistical summaries
Create data visualizations
🎯 Interactive UI:
User-friendly Streamlit interface
Real-time query processing
Clear result presentation
Prerequisites
Before we begin, make sure you have the following:
Python installed on your machine (version 3.10 or higher is recommended)
Your OpenAI API Key
A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)
Basic familiarity with Python programming
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Navigate to the project folder:
cd ai_agent_tutorials/ai_data_analysis_agent
Install the required dependencies:
pip install -r requirements.txt
Get your API key: you'll need an OpenAI API key from the OpenAI platform.
Creating the Streamlit App
Let’s create our app. Create a new file ai_data_analyst.py
and add the following code:
Let's set up our imports:
import json
import tempfile
import csv
import streamlit as st
import pandas as pd
from phi.model.openai import OpenAIChat
from phi.agent.duckdb import DuckDbAgent
from phi.tools.pandas import PandasTools
import re
Create file preprocessing function:
def preprocess_and_save(file):
    try:
        # Handle different file types
        if file.name.endswith('.csv'):
            df = pd.read_csv(file, encoding='utf-8')
        elif file.name.endswith('.xlsx'):
            df = pd.read_excel(file)
        else:
            st.error("Unsupported file format. Please upload a CSV or Excel file.")
            return None, None, None
        # Escape embedded double quotes in text columns
        for col in df.select_dtypes(include=['object']):
            df[col] = df[col].astype(str).replace({r'"': '""'}, regex=True)
        # Parse columns whose names suggest dates
        for col in df.columns:
            if 'date' in col.lower():
                df[col] = pd.to_datetime(df[col], errors='coerce')
Save the cleaned data to a temporary CSV file:
        # Save to a temporary CSV so DuckDB can query it by path
        with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as temp_file:
            temp_path = temp_file.name
            df.to_csv(temp_path, index=False, quoting=csv.QUOTE_ALL)
        return temp_path, df.columns.tolist(), df
    except Exception as e:
        st.error(f"Error processing file: {e}")
        return None, None, None
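The QUOTE_ALL setting matters because free-text columns often contain commas or quotes that would otherwise corrupt the CSV. A quick standalone check of the round trip (pandas only, no Streamlit; the sample strings are my own):

```python
import csv
import tempfile

import pandas as pd

# A column whose values contain commas and quotes -- the hard case for CSV
df = pd.DataFrame({"comment": ['He said "hi", then left', "plain text"], "n": [1, 2]})

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv", mode="w") as f:
    path = f.name
df.to_csv(path, index=False, quoting=csv.QUOTE_ALL)

# The awkward values survive the round trip intact
back = pd.read_csv(path)
print(back["comment"][0])
```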
Set up Streamlit interface:
st.title("📊 Data Analyst Agent")

with st.sidebar:
    st.header("API Keys")
    openai_key = st.text_input("Enter your OpenAI API key:", type="password")
    if openai_key:
        st.session_state.openai_key = openai_key
Add file upload and data display:
uploaded_file = st.file_uploader("Upload a CSV or Excel file", type=["csv", "xlsx"])
if uploaded_file is not None:
    temp_path, columns, df = preprocess_and_save(uploaded_file)
    if df is not None:
        st.write("Uploaded Data:")
        st.dataframe(df)
        st.write("Uploaded columns:", columns)
Configure semantic model:
semantic_model = {
    "tables": [
        {
            "name": "uploaded_data",
            "description": "Contains the uploaded dataset.",
            "path": temp_path,
        }
    ]
}
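The agent receives this semantic model as a JSON string (which is why the next step passes it through json.dumps). A quick standard-library check that the structure round-trips cleanly - the path below is a placeholder; at runtime it is the temp_path from preprocessing:

```python
import json

# Placeholder path; in the app this is the temp_path returned by preprocessing
semantic_model = {
    "tables": [
        {
            "name": "uploaded_data",
            "description": "Contains the uploaded dataset.",
            "path": "/tmp/uploaded_data.csv",
        }
    ]
}

payload = json.dumps(semantic_model)
print(payload)
```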
Initialize DuckDB Agent:
duckdb_agent = DuckDbAgent(
    model=OpenAIChat(model="gpt-4o", api_key=st.session_state.openai_key),
    semantic_model=json.dumps(semantic_model),
    tools=[PandasTools()],
    markdown=True,
    system_prompt="You are an expert data analyst. Generate SQL queries..."
)
Create query interface:
user_query = st.text_area("Ask a query about the data:")
if st.button("Submit Query"):
    if user_query.strip() == "":
        st.warning("Please enter a query.")
    else:
        with st.spinner('Processing your query...'):
            # Run the agent once and render its answer in the app
            response = duckdb_agent.run(user_query)
            st.markdown(response.content)
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run:
streamlit run ai_data_analyst.py
Streamlit will provide a local URL (typically http://localhost:8501).
Working Application Demo
Conclusion
You've built a powerful AI Data Analysis Agent that makes data analysis accessible to everyone. This tool bridges the gap between complex data analysis and user-friendly interfaces.
To enhance the application further, consider:
Adding support for more file formats
Implementing data visualization capabilities
Creating a query history feature
Adding export options for analysis results
Implementing batch processing for multiple files
Keep experimenting and refining to build smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.