Build a local ChatGPT Clone with memory using Llama 3.1
A local AI chatbot with memory and a vector database, completely free and with no internet required (step-by-step instructions)
A ChatGPT-like assistant that runs entirely offline and recalls past conversations: an AI that learns from each chat and personalizes its responses, all without any internet dependency. Giving users this kind of control is a powerful way to make AI both secure and adaptable for private use cases.
In this tutorial, we’ll build a local ChatGPT clone using Llama 3.1 8B with a memory feature, making it capable of recalling past conversations. All components, from the language model to memory and vector storage, will run on your local machine.
For this app, we’re using Qdrant for vector storage, Ollama to run Llama 3.1 locally, and Mem0 to manage memory.
What is Mem0.ai?
Mem0 is an open-source framework that enhances AI assistants and agents with an intelligent memory layer for personalized interactions. Mem0 remembers user preferences, adapts to individual needs, and improves continuously over time. It handles storing, retrieving, and updating contextual information, and is simple to integrate into your workflows.
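To make that concrete, here is a minimal sketch of the three Mem0 operations this tutorial relies on; the config object it references is the local Ollama + Qdrant configuration we define in the walkthrough below:
from mem0 import Memory

# Initialize from the local configuration defined later in this tutorial
m = Memory.from_config(config)

# Store a fact tied to a specific user
m.add("I'm vegetarian and love spicy food", user_id="alice")

# Retrieve memories relevant to a query
related = m.search("What should Alice cook?", user_id="alice")

# Or fetch everything stored for that user
all_memories = m.get_all(user_id="alice")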
What We’re Building
This Streamlit application implements a fully local ChatGPT-like experience using Llama 3.1, featuring personalized memory storage for each user. All components, including the language model, embeddings, and vector store, run locally without requiring external API keys.
Features
Powered by Llama 3.1 8B via Ollama
Personal memory space for each user
Local embedding generation using Nomic Embed
Vector storage with Qdrant
Prerequisites
Before we begin, make sure you have:
Python installed (3.10+ recommended)
Docker installed and running (we’ll use it for Qdrant)
Ollama installed on your machine
Step-by-Step Instructions
Setting Up the Environment
First, let's get our development environment ready:
Clone the GitHub repository:
git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
Go to the local_chatgpt_with_memory folder:
cd llm_apps_with_memory_tutorials/local_chatgpt_with_memory
Install the required dependencies:
pip install -r requirements.txt
Install and start the Qdrant vector database locally:
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
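Qdrant stores its data inside the container by default, so memories vanish if the container is removed. Optionally, mount a local directory (the qdrant_storage path here is just an example) so they persist, and verify the database is up by opening the dashboard at http://localhost:6333/dashboard:
docker run -p 6333:6333 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant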
Install Ollama, then pull Llama 3.1 8B along with the Nomic embedding model used in the configuration below:
ollama pull llama3.1
ollama pull nomic-embed-text
Code Walkthrough
Let’s create our app. Create a new file local_chatgpt_memory.py and add the following code:
Import necessary libraries:
• Qdrant for vector storage
• Mem0 for memory management
• LiteLLM for model interactions
• Ollama for running local Llama 3.1
import streamlit as st
from mem0 import Memory
from litellm import completion
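A quick note on LiteLLM: its completion function exposes an OpenAI-style chat interface over many backends, and the ollama/ prefix in the model name (used later when we generate responses) routes the request to the local Ollama server instead of any hosted API.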
Set up the configuration for the vector store, LLM, and embedder:
• Local Qdrant setup
• Llama 3.1 configuration
• Embeddings configuration
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "local-chatgpt-memory",
            "host": "localhost",
            "port": 6333,
            "embedding_model_dims": 768,
        },
    },
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "llama3.1:latest",
            "temperature": 0,
            "max_tokens": 8000,
            "ollama_base_url": "http://localhost:11434",  # Ensure this URL is correct
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text:latest",
            # Alternatively, you can use "snowflake-arctic-embed:latest"
            "ollama_base_url": "http://localhost:11434",
        },
    },
    "version": "v1.1"
}
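One detail worth noting: embedding_model_dims must match the output size of the embedder. nomic-embed-text produces 768-dimensional vectors, which is why the Qdrant collection is configured for 768. If you swap in a different embedding model (such as snowflake-arctic-embed), check its dimensionality and update this value accordingly.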
Initialize Streamlit app and session state:
• Manages chat history
• Tracks user sessions
• Personal memory spaces
st.title("Local ChatGPT using Llama 3.1 with Personal Memory 🧠")
st.caption("Each user gets their own personalized memory space!")
# Initialize session state for chat history and previous user ID
if "messages" not in st.session_state:
    st.session_state.messages = []
if "previous_user_id" not in st.session_state:
    st.session_state.previous_user_id = None
Create user authentication sidebar:
• Simple username login
• Session management
• Memory persistence
with st.sidebar:
    st.title("User Settings")
    user_id = st.text_input("Enter your Username", key="user_id")

    # Check if user ID has changed
    if user_id != st.session_state.previous_user_id:
        st.session_state.messages = []  # Clear chat history
        st.session_state.previous_user_id = user_id  # Update previous user ID
Add memory viewing functionality:
• View personal memory
• User-specific context
• Organized display
if user_id:
    st.success(f"Logged in as: {user_id}")

    # Initialize Memory with the configuration
    m = Memory.from_config(config)

    # Memory viewing section
    st.header("Memory Context")
    if st.button("View My Memory"):
        memories = m.get_all(user_id=user_id)
        if memories and "results" in memories:
            st.write(f"Memory history for **{user_id}**:")
            for memory in memories["results"]:
                if "memory" in memory:
                    st.write(f"- {memory['memory']}")
Implement chat interface:
• Chat input handling
• Memory storage
• Message display
if user_id:  # Only show chat interface if user is "logged in"
    # Display chat history
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # User input
    if prompt := st.chat_input("What is your message?"):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        # Display user message
        with st.chat_message("user"):
            st.markdown(prompt)
        # Add to memory
        m.add(prompt, user_id=user_id)
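Note that m.add is not a raw transcript dump: Mem0 passes the message through the configured LLM to extract discrete facts before writing them to Qdrant. Each add therefore takes a moment, and what appears under “View My Memory” is distilled facts rather than verbatim messages.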
Generate responses with context:
• Uses Llama 3.1 locally
• Includes memory context
• Streams responses
        memories = m.get_all(user_id=user_id)
        context = ""
        if memories and "results" in memories:
            for memory in memories["results"]:
                if "memory" in memory:
                    context += f"- {memory['memory']}\n"

        # Generate assistant response
        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            # Stream the response
            try:
                response = completion(
                    model="ollama/llama3.1:latest",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant with access to past conversations. Use the context provided to give personalized responses."},
                        {"role": "user", "content": f"Context from previous conversations with {user_id}: {context}\nCurrent message: {prompt}"}
                    ],
                    api_base="http://localhost:11434",
                    stream=True
                )
Handle streaming responses:
• Real-time updates
• Smooth typing effect
• Error handling
                for chunk in response:
                    if hasattr(chunk, 'choices') and len(chunk.choices) > 0:
                        content = chunk.choices[0].delta.get('content', '')
                        if content:
                            full_response += content
                            message_placeholder.markdown(full_response + "▌")
                # Final update
                message_placeholder.markdown(full_response)
            except Exception as e:
                st.error(f"Error generating response: {str(e)}")
                full_response = "I apologize, but I encountered an error generating the response."
                message_placeholder.markdown(full_response)
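If you hit an AttributeError on delta.get, your LiteLLM version is returning the streamed delta as an object rather than a dict; in that case, read chunk.choices[0].delta.content (guarding against None) instead.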
Store chat history and memory:
• Maintains conversation flow
• Updates memory
• Preserves context
        st.session_state.messages.append({"role": "assistant", "content": full_response})
        # Add response to memory
        m.add(f"Assistant: {full_response}", user_id=user_id)
else:
    st.info("👈 Please enter your username in the sidebar to start chatting!")
Running the App
With our code in place, it's time to launch the app.
In your terminal, navigate to the project folder and run the following command:
streamlit run local_chatgpt_memory.py
Streamlit will provide a local URL (typically http://localhost:8501). Open it in your browser, enter a username in the sidebar, and start chatting. No API keys are needed, and the assistant will draw on memories from your previous conversations to personalize its responses.
Working Application Demo
Conclusion
Your local ChatGPT clone with memory is ready and operates fully offline. This setup ensures privacy, gives you full control, and provides a flexible base for further development.
For enhancements, consider:
Adding custom instructions: Allow users to set personalized prompts to tailor responses further.
Building Voice Interaction: Enable voice input and output for a more interactive experience.
Topic Segmentation: Enable the assistant to organize memories by topic, making it easier to recall relevant details based on user-selected themes.
Configurable Privacy Modes: Introduce a “private mode” that temporarily disables memory storage so that sensitive conversations are not saved (see the sketch below).
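As a starting point for that privacy mode, here is a minimal sketch, assuming you add a Streamlit toggle (private_mode is a hypothetical name, not part of the tutorial code) and gate each memory write on it:
# Hypothetical "private mode" toggle added to the sidebar
with st.sidebar:
    private_mode = st.toggle("Private mode (don't save this conversation)")

# ...then gate every memory write on the toggle:
if not private_mode:
    m.add(prompt, user_id=user_id)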
Keep experimenting and refining to build even smarter AI solutions!
We share hands-on tutorials like this 2-3 times a week to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.