Build a Web Scraping AI Agent with Llama 3.2 Running Locally
LLM App using Llama 3.2 running locally in less than 40 lines of Python code (step-by-step instructions)
This tutorial will show you how to build a web-scraping AI agent that runs entirely on your local machine using Llama 3.2. With just a few lines of code, you can scrape any website and customize what information you want to extract, all powered by a locally running AI.
The AI agent uses Ollama to run the model locally and ScrapeGraphAI, a web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents.
What We’re Building
This Streamlit app will help you scrape websites using the local Llama 3.2 model via Ollama. With this agent, you can:
Scrape any website by entering its URL
Use Llama 3.2 as the LLM for intelligent web scraping
Customize the scraping process by specifying what data you want the AI to extract
You’ll enter the website URL and provide a specific instruction on what you need to scrape. The agent will return the extracted data directly to you.
Prerequisites
Before we begin, make sure you have:
Python and pip installed on your machine
Ollama installed and running, with the Llama 3.2 and nomic-embed-text models pulled
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the web_scrapping_ai_agent folder:
cd awesome-llm-apps/web_scrapping_ai_agent
Install the required dependencies:
pip install -r requirements.txt
Check that Ollama is running at localhost port 11434.
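If you haven't set up the models yet, you can pull them and confirm the Ollama server is reachable from your terminal (a quick sketch; the model names match the configuration used later in this tutorial):

```shell
# Pull the chat model and the embedding model the scraper will use
ollama pull llama3.2
ollama pull nomic-embed-text

# The Ollama server should respond on its default port;
# it replies "Ollama is running" when the server is up
curl http://localhost:11434
```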
Creating the Streamlit App
Let's create our Streamlit app. Create a new file `local_ai_scrapper.py` and add the following code:
Import Required Libraries: At the top of your file, add
• Streamlit for building the web app
• ScrapeGraphAI for creating scraping pipelines with LLMs
import streamlit as st
from scrapegraphai.graphs import SmartScraperGraph
Set up the Streamlit App: Streamlit lets you create the user interface. For this app, we will add a title and a subtitle using 'st.title()' and 'st.caption()'
st.title("Web Scraping AI Agent 🕵️♂️")
st.caption("This app allows you to scrape a website using Llama 3.2")
Configure the SmartScraperGraph:
• Set the LLM as 'ollama/llama3.2' served locally, and the output format as JSON.
• Set the embedding model as 'ollama/nomic-embed-text'
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set Ollama URL
    },
    "verbose": True,
}
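Since graph_config is a plain Python dict, you can add a quick sanity check before constructing the scraper, verifying that the sections it expects are present. This is a minimal sketch; the `check_config` helper is our own addition, not part of ScrapeGraphAI:

```python
def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a ScrapeGraphAI-style config dict."""
    problems = []
    for section in ("llm", "embeddings"):
        block = config.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing '{section}' section")
            continue
        if "model" not in block:
            problems.append(f"'{section}' has no 'model' key")
        if not block.get("base_url", "").startswith("http"):
            problems.append(f"'{section}' has no valid 'base_url'")
    return problems

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "format": "json",
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
    "verbose": True,
}

print(check_config(graph_config))  # an empty list means the config looks OK
```

Catching a missing key here gives you a readable error message instead of a failure deep inside the scraping pipeline.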
Get the website URL and user prompt:
• Use 'st.text_input()' to get the URL of the website to scrape
• Use 'st.text_input()' to get the user prompt specifying what to scrape from the website
url = st.text_input("Enter the URL of the website you want to scrape")
# Get the user prompt
user_prompt = st.text_input("What do you want the AI agent to scrape from the website?")
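Streamlit text inputs start out empty, and SmartScraperGraph will fail on a malformed source, so it can help to validate the URL before kicking off a scrape. A small guard you might add (our own addition, not part of the original app):

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

print(is_valid_url("https://example.com/blog"))  # True
print(is_valid_url("example.com"))               # False: no scheme
print(is_valid_url(""))                          # False: empty input
```

In the app, you could gate the scrape with `if st.button("Scrape") and is_valid_url(url):` and show an `st.error(...)` message otherwise.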
Initialize the SmartScraperGraph: Create an instance of SmartScraperGraph with the user prompt, website URL, and graph configuration
smart_scraper_graph = SmartScraperGraph(
    prompt=user_prompt,
    source=url,
    config=graph_config
)
Scrape the website and display the result:
• Add a "Scrape" button using 'st.button()'
• When the button is clicked, run the SmartScraperGraph and display the result using 'st.write()'
if st.button("Scrape"):
    result = smart_scraper_graph.run()
    st.write(result)
Running the App
With our code in place, it's time to launch the app.
Start the Streamlit App: In your terminal, navigate to the project folder, and run the following command
streamlit run local_ai_scrapper.py
Access Your AI Assistant: Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter the URL of the website you want the AI agent to scrape along with your instructions, and have fun!
Working Application Demo
Conclusion
And your fully functional Web Scraping AI agent is ready! You've successfully built a powerful tool using Llama 3.2, customized scraping tasks with ScrapeGraphAI, and provided an easy-to-use interface through Streamlit.
For the next steps, consider expanding the agent's capabilities by integrating more APIs to handle dynamic content or adding advanced scraping features like handling pagination or login-required sites. You could also improve the scraping efficiency by exploring more sophisticated techniques for parsing complex websites.
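For the pagination idea above, one simple approach is to generate the page URLs yourself and run the scraper once per page. Here's a sketch of such a helper; the `?page=` query parameter is a common but site-specific convention, so adjust it to the target site:

```python
from urllib.parse import urlencode, urlparse, urlunparse

def paginated_urls(base_url: str, pages: int, param: str = "page"):
    """Yield base_url with ?page=1, ?page=2, ... appended."""
    parts = urlparse(base_url)
    for n in range(1, pages + 1):
        query = urlencode({param: n})
        yield urlunparse(parts._replace(query=query))

urls = list(paginated_urls("https://example.com/blog", 3))
print(urls)
# ['https://example.com/blog?page=1', 'https://example.com/blog?page=2',
#  'https://example.com/blog?page=3']
```

Each generated URL could then be passed as the `source` of a fresh SmartScraperGraph, with the per-page results merged afterwards.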
Keep experimenting and refining to build even smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.