Build an AI Data Analysis Agent

Fully functional AI agent app using GPT-4o (step-by-step instructions)

Data analysis often requires complex SQL queries and deep technical knowledge, creating a barrier for many who need quick insights from their data. What if we could make data analysis as simple as having a conversation?

In this tutorial, we'll build an AI Data Analysis Agent that lets users analyze CSV and Excel files using natural language queries. Powered by GPT-4o and DuckDB, this tool translates plain English questions into SQL queries, making data analysis accessible to everyone – no SQL expertise required.

We're using Phidata, a framework specifically designed for building and orchestrating AI agents. It provides the infrastructure for agent communication, memory management, and tool integration.
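If you haven't used Phidata before, here is a minimal sketch of a bare agent so you can see the shape of the API. It is illustrative only (this app uses Phidata's DuckDbAgent variant instead), and the constructor arguments mirror how we call OpenAIChat later in the tutorial:

# Minimal Phidata agent sketch (illustrative; not part of the app below)
from phi.agent import Agent
from phi.model.openai import OpenAIChat

agent = Agent(
    model=OpenAIChat(model="gpt-4o", api_key="sk-..."),  # replace with your key
    markdown=True,
)
agent.print_response("Summarize what DuckDB is in one sentence.")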


What We’re Building

An AI data analysis agent built with the Phidata agent framework and OpenAI's GPT-4o model. It lets users analyze their data (CSV and Excel files) through natural language queries, with DuckDB handling the data processing efficiently, making data analysis accessible to users regardless of their SQL expertise.

Features

  • 📤 File Upload Support:

    • Upload CSV and Excel files

    • Automatic data type detection and schema inference

    • Support for multiple file formats

  • 💬 Natural Language Queries:

    • Convert natural language questions into SQL queries

    • Get instant answers about your data

    • No SQL knowledge required

  • 🔍 Advanced Analysis:

    • Perform complex data aggregations

    • Filter and sort data

    • Generate statistical summaries

    • Create data visualizations

  • 🎯 Interactive UI:

    • User-friendly Streamlit interface

    • Real-time query processing

    • Clear result presentation

Prerequisites

Before we begin, make sure you have the following:

  1. Python installed on your machine (version 3.10 or higher is recommended)

  2. Your OpenAI API Key

  3. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

  4. Basic familiarity with Python programming

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the ai_data_analysis_agent folder and install the dependencies:

cd ai_agent_tutorials/ai_data_analysis_agent
pip install -r requirements.txt
  3. Get your API key: you'll need an OpenAI API key to run the agent.
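Before wiring the key into the app, you can optionally confirm it works with a quick standalone check using the official openai Python client (this snippet is illustrative and not part of the tutorial code):

# Optional sanity check for your OpenAI API key (not part of the app)
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # paste your actual key
print(client.models.list().data[0].id)  # prints a model ID if the key is valid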

Creating the Streamlit App

Let’s create our app. Create a new file ai_data_analyst.py and add the following code:

  1. Let's set up our imports:

import csv
import json
import re
import tempfile

import pandas as pd
import streamlit as st
from phi.agent.duckdb import DuckDbAgent
from phi.model.openai import OpenAIChat
from phi.tools.pandas import PandasTools
  2. Create the file preprocessing function:

def preprocess_and_save(file):
    try:
        # Read the uploaded file into a DataFrame based on its extension
        if file.name.endswith('.csv'):
            df = pd.read_csv(file, encoding='utf-8')
        elif file.name.endswith('.xlsx'):
            df = pd.read_excel(file)
        else:
            st.error("Unsupported file format. Please upload a CSV or Excel file.")
            return None, None, None

        # Escape double quotes in text columns so the temp CSV we write later stays valid
        for col in df.select_dtypes(include=['object']):
            df[col] = df[col].astype(str).replace({r'"': '""'}, regex=True)

        # Parse columns whose names suggest dates
        for col in df.columns:
            if 'date' in col.lower():
                df[col] = pd.to_datetime(df[col], errors='coerce')
  3. Add data type conversion and CSV saving:

        # Coerce remaining text columns to numeric where possible
        for col in df.select_dtypes(include=['object']):
            try:
                df[col] = pd.to_numeric(df[col])
            except (ValueError, TypeError):
                pass  # keep non-numeric columns as text

        # Save the cleaned DataFrame to a temp CSV for DuckDB to query
        with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as temp_file:
            temp_path = temp_file.name
            df.to_csv(temp_path, index=False, quoting=csv.QUOTE_ALL)

        return temp_path, df.columns.tolist(), df
    except Exception as e:
        st.error(f"Error processing file: {e}")
        return None, None, None
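If you want to sanity-check the preprocessing logic outside Streamlit, here is a quick sketch you could run in a separate script or REPL (sample.csv is a placeholder file name):

# Hypothetical standalone check of preprocess_and_save (sample.csv is a placeholder)
with open("sample.csv", "rb") as f:
    temp_path, columns, df = preprocess_and_save(f)
print(temp_path)  # path to the cleaned temp CSV
print(columns)    # detected column names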
  4. Set up the Streamlit interface:

st.title("📊 Data Analyst Agent")

with st.sidebar:
    st.header("API Keys")
    openai_key = st.text_input("Enter your OpenAI API key:", type="password")
    if openai_key:
        st.session_state.openai_key = openai_key
  5. Add file upload and data display:

uploaded_file = st.file_uploader("Upload a CSV or Excel file", type=["csv", "xlsx"])

if uploaded_file is not None:
    temp_path, columns, df = preprocess_and_save(uploaded_file)

    if temp_path is None:
        st.stop()  # preprocessing failed; the error is already shown

    st.write("Uploaded Data:")
    st.dataframe(df)
    st.write("Uploaded columns:", columns)
  6. Configure the semantic model:

semantic_model = {
    "tables": [
        {
            "name": "uploaded_data",
            "description": "Contains the uploaded dataset.",
            "path": temp_path,
        }
    ]
}
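The semantic model tells the agent which tables it can query and where they live. If you later want to expose more than one file at a time, the structure extends naturally; here is a hedged sketch with placeholder table names and paths:

# Hypothetical multi-table semantic model (names and paths are placeholders)
semantic_model = {
    "tables": [
        {"name": "sales", "description": "Monthly sales records.", "path": "/tmp/sales.csv"},
        {"name": "customers", "description": "Customer master data.", "path": "/tmp/customers.csv"},
    ]
}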
  7. Initialize the DuckDB agent:

duckdb_agent = DuckDbAgent(
    model=OpenAIChat(model="gpt-4o", api_key=st.session_state.openai_key),  # GPT-4o, as used throughout this tutorial
    semantic_model=json.dumps(semantic_model),
    tools=[PandasTools()],
    markdown=True,
    system_prompt="You are an expert data analyst. Generate SQL queries..."
)
  8. Create the query interface:

user_query = st.text_area("Ask a query about the data:")

if st.button("Submit Query"):
    if user_query.strip() == "":
        st.warning("Please enter a query.")
    else:
        with st.spinner('Processing your query...'):
            # Run the agent once and render its answer in the app
            # (print_response only writes to the terminal, not the page)
            response = duckdb_agent.run(user_query)
            content = response.content if hasattr(response, 'content') else str(response)
            st.markdown(content)
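At this point the pipeline is complete: the agent receives a plain-English question, generates SQL against the uploaded_data table defined in the semantic model, runs it with DuckDB, and renders the answer in the app. Try questions like "What is the average of each numeric column?" or "Which column has the most missing values?".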

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run ai_data_analyst.py


Conclusion

You've built a powerful AI Data Analysis Agent that makes data analysis accessible to everyone. This tool bridges the gap between complex data analysis and user-friendly interfaces.

To enhance the application further, consider:

  • Adding support for more file formats

  • Implementing data visualization capabilities

  • Creating a query history feature (see the sketch after this list)

  • Adding export options for analysis results

  • Implementing batch processing for multiple files
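For the query-history idea above, here is a minimal sketch using Streamlit's session state; it assumes the user_query and content variables from step 8 and would sit alongside that code:

# Minimal query-history sketch (assumes user_query and content from step 8)
if "query_history" not in st.session_state:
    st.session_state.query_history = []

# Remember each answered query
st.session_state.query_history.append({"query": user_query, "answer": content})

# Show past questions in the sidebar
with st.sidebar:
    st.header("Query History")
    for item in st.session_state.query_history:
        st.markdown(f"**Q:** {item['query']}")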

Keep experimenting and refining to build smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about leveling up your AI skills, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
