Build a Web Scraping AI Agent with Llama 3.2 Running Locally
LLM App using Llama 3.2 running locally in less than 40 lines of Python code (step-by-step instructions)
This tutorial will show you how to build a web-scraping AI agent that runs entirely on your local machine using Llama 3.2. With just a few lines of code, you can scrape any website and customize what information you want to extract, all powered by a locally running AI.
The AI agent uses Ollama to run the model locally and ScrapeGraphAI, a web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents.
What We’re Building
This Streamlit app will help you scrape websites using the local Llama 3.2 model via Ollama. With this agent, you can:
Scrape any website by entering its URL
Use Llama 3.2 as the LLM for intelligent web scraping
Customize the scraping process by specifying what data you want the AI to extract
You’ll enter the website URL and provide a specific instruction on what you need to scrape. The agent will return the extracted data directly to you.
Prerequisites
Before we begin, make sure you have:
Python and pip installed on your machine
Ollama installed and running, with the Llama 3.2 and nomic-embed-text models pulled
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the web_scrapping_ai_agent folder:
cd awesome-llm-apps/web_scrapping_ai_agent
Install the required dependencies:
pip install -r requirements.txt
Check that Ollama is running at localhost port 11434.
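If you haven't set up the models yet, you can pull them and confirm the Ollama server is reachable from your terminal (a quick sketch; the model names match the configuration used later in this tutorial):

```shell
# Pull the chat model and the embedding model the scraper will use
ollama pull llama3.2
ollama pull nomic-embed-text

# The Ollama server should respond on its default port;
# it replies "Ollama is running" when the server is up
curl http://localhost:11434
```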
Creating the Streamlit App
Let's create our Streamlit app. Create a new file `local_ai_scrapper.py` and add the following code:
Import Required Libraries: At the top of your file, add
• Streamlit for building the web app
• ScrapeGraphAI for creating scraping pipelines with LLMs
import streamlit as st
from scrapegraphai.graphs import SmartScraperGraph
Set up the Streamlit App: Streamlit lets you create the user interface. For this app, we will add a title and a subtitle using 'st.title()' and 'st.caption()'
st.title("Web Scraping AI Agent 🕵️♂️")
st.caption("This app allows you to scrape a website using Llama 3.2")
Configure the SmartScraperGraph:
• Set the LLM as 'ollama/llama3.2' served locally, and the output format as JSON.
• Set the embedding model as 'ollama/nomic-embed-text'
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "verbose": True,
}
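Since graph_config is a plain Python dict, you can add a quick sanity check before constructing the scraper, verifying that the sections it expects are present. This is a minimal sketch; the `check_config` helper is our own addition, not part of ScrapeGraphAI:

```python
def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a ScrapeGraphAI-style config dict."""
    problems = []
    for section in ("llm", "embeddings"):
        block = config.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing '{section}' section")
            continue
        if "model" not in block:
            problems.append(f"'{section}' has no 'model' key")
        if not block.get("base_url", "").startswith("http"):
            problems.append(f"'{section}' has no valid 'base_url'")
    return problems

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "format": "json",
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
    "verbose": True,
}

print(check_config(graph_config))  # an empty list means the config looks OK
```

Catching a missing key here gives you a readable error message instead of a failure deep inside the scraping pipeline.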
Get the website URL and user prompt:
• Use 'st.text_input()' to get the URL of the website to scrape
• Use 'st.text_input()' to get the user prompt specifying what to scrape from the website
url = st.text_input("Enter the URL of the website you want to scrape")
# Get the user prompt
user_prompt = st.text_input("What do you want the AI agent to scrape from the website?")
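Streamlit text inputs start out empty, and SmartScraperGraph will fail on a malformed source, so it can help to validate the URL before kicking off a scrape. A small guard you might add (our own addition, not part of the original app):

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(is_valid_url("https://example.com/blog"))  # True
print(is_valid_url("example.com"))               # False: no scheme
print(is_valid_url(""))                          # False: empty input
```

In the app, you could gate the scrape with `if st.button("Scrape") and is_valid_url(url):` and show an `st.error(...)` message otherwise.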
Initialize the SmartScraperGraph: Create an instance of SmartScraperGraph with the user prompt, website URL, and graph configuration
smart_scraper_graph = SmartScraperGraph(
    prompt=user_prompt,
    source=url,
    config=graph_config
)
Scrape the website and display the result:
• Add a "Scrape" button using 'st.button()'
• When the button is clicked, run the SmartScraperGraph and display the result using 'st.write()'
if st.button("Scrape"):
    result = smart_scraper_graph.run()
    st.write(result)
Running the App
With our code in place, it's time to launch the app.
Start the Streamlit App: In your terminal, navigate to the project folder, and run the following command
streamlit run local_ai_scrapper.py
Access Your AI Assistant: Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter the URL of the website you want the AI agent to scrape along with your instructions, and have fun!
Working Application Demo
Conclusion
And your fully functional Web Scraping AI agent is ready! You've successfully built a powerful tool using Llama 3.2, customized scraping tasks with ScrapeGraphAI, and provided an easy-to-use interface through Streamlit.
For the next steps, consider expanding the agent's capabilities by integrating more APIs to handle dynamic content or adding advanced scraping features like handling pagination or login-required sites. You could also improve the scraping efficiency by exploring more sophisticated techniques for parsing complex websites.
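For the pagination idea above, one simple approach is to generate the page URLs yourself and run the scraper once per page. Here's a sketch of such a helper; the `?page=` query parameter is a common but site-specific convention, so adjust it to the target site:

```python
from urllib.parse import urlencode, urlparse, urlunparse

def paginated_urls(base_url: str, pages: int, param: str = "page"):
    """Yield base_url with ?page=1, ?page=2, ... appended."""
    parts = urlparse(base_url)
    for n in range(1, pages + 1):
        query = urlencode({param: n})
        yield urlunparse(parts._replace(query=query))

urls = list(paginated_urls("https://example.com/blog", 3))
print(urls)
# ['https://example.com/blog?page=1', 'https://example.com/blog?page=2',
#  'https://example.com/blog?page=3']
```

Each generated URL could then be passed as the `source` of a fresh SmartScraperGraph, with the per-page results merged afterwards.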
Keep experimenting and refining to build even smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.