
Build a local ChatGPT Clone with memory using Llama 3.1

A local AI chatbot with memory and a vector database, completely free and with no internet required (step-by-step instructions)

A ChatGPT-like assistant that runs entirely offline and recalls past conversations—an AI that learns from each chat and personalizes its responses, all without any internet dependency. Giving this kind of control to users is a powerful way to make AI both secure and adaptable for private use cases.

In this tutorial, we’ll build a local ChatGPT clone using Llama 3.1 8B with a memory feature, making it capable of recalling past conversations. All components, from the language model to memory and vector storage, run on your local machine.

For this app, we’re using Qdrant for vector storage, Ollama to run Llama 3.1 locally, and Mem0 to manage memory.

What is Mem0.ai?

Mem0 is an open-source framework that enhances AI assistants and agents with an intelligent memory layer for personalized AI interactions. Mem0 remembers user preferences, adapts to individual needs, and continuously improves over time. It handles storing, retrieving, and updating contextual information, and is simple to integrate into your workflows.
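To see how simple the integration is, here is a minimal sketch of Mem0’s core API. Note that Mem0’s default configuration may call a hosted LLM, which is exactly why the app below supplies a fully local config instead:

from mem0 import Memory

m = Memory()  # default config; the app below uses Memory.from_config(...) for fully local operation
m.add("I prefer short, direct answers.", user_id="alice")  # store a memory
hits = m.search("How does alice like her answers?", user_id="alice")  # retrieve relevant memories
print(hits)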


What We’re Building

This Streamlit application implements a fully local ChatGPT-like experience using Llama 3.1, featuring personalized memory storage for each user. All components, including the language model, embeddings, and vector store, run locally without requiring external API keys.

Features

  • Powered by Llama 3.1 8B via Ollama

  • Personal memory space for each user

  • Local embedding generation using Nomic Embed

  • Vector storage with Qdrant

Prerequisites

Before we begin, make sure you have:

  1. Python installed on your machine (version 3.9 or higher is recommended)

  2. Ollama installed, and Docker (to run Qdrant locally)

  3. Basic familiarity with Python programming

  4. A code editor of your choice (we recommend VS Code or PyCharm for their excellent Python support)

Step-by-Step Instructions

Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  2. Go to the local_chatgpt_with_memory folder and install the dependencies:

cd llm_apps_with_memory_tutorials/local_chatgpt_with_memory
pip install -r requirements.txt
  3. Install and start the Qdrant vector database locally:

docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant
  4. Install Ollama, then pull Llama 3.1 8B and the Nomic embedding model used by the app (a quick health check follows this list):

ollama pull llama3.1
ollama pull nomic-embed-text
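Optionally, verify that both services are reachable before continuing. This is a quick sketch (not part of the original tutorial) that assumes Qdrant’s REST API on port 6333, Ollama’s API on port 11434, and the requests package installed:

import requests

# Qdrant: lists collections (empty on a fresh install)
print(requests.get("http://localhost:6333/collections").json())

# Ollama: lists pulled models (should include llama3.1 and nomic-embed-text)
print(requests.get("http://localhost:11434/api/tags").json())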

Code Walkthrough

Let’s build the app. Create a new file local_chatgpt_memory.py and add the following code:

  1. Import the necessary libraries:
    • Streamlit for the web interface

    • Mem0 for memory management (with Qdrant as the vector store)

    • LiteLLM for streaming completions from Llama 3.1 running on Ollama

import streamlit as st
from mem0 import Memory
from litellm import completion
  2. Set up the configuration for the vector store, LLM, and embedder:

    • Local Qdrant setup

    • Llama 3.1 configuration

    • Embeddings configuration

config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "local-chatgpt-memory",
            "host": "localhost",
            "port": 6333,
            "embedding_model_dims": 768,
        },
    },
    "llm": {
        "provider": "ollama",
        "config": {
            "model": "llama3.1:latest",
            "temperature": 0,
            "max_tokens": 8000,
            "ollama_base_url": "http://localhost:11434",  # Ensure this URL is correct
        },
    },
    "embedder": {
        "provider": "ollama",
        "config": {
            "model": "nomic-embed-text:latest",
            # Alternatively, you can use "snowflake-arctic-embed:latest"
            "ollama_base_url": "http://localhost:11434",
        },
    },
    "version": "v1.1"
}
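Before building the UI, you can smoke-test this configuration directly from a Python shell. A minimal sketch, assuming Qdrant and Ollama are already running:

from mem0 import Memory

m = Memory.from_config(config)
m.add("My favorite editor is VS Code.", user_id="demo")
print(m.get_all(user_id="demo"))  # the stored memory should appear under "results"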
  3. Initialize the Streamlit app and session state:

    • Manages chat history

    • Tracks user sessions

    • Maintains a personal memory space per user

st.title("Local ChatGPT using Llama 3.1 with Personal Memory 🧠")
st.caption("Each user gets their own personalized memory space!")

# Initialize session state for chat history and previous user ID
if "messages" not in st.session_state:
    st.session_state.messages = []
if "previous_user_id" not in st.session_state:
    st.session_state.previous_user_id = None
  4. Create the user authentication sidebar:

    • Simple username login

    • Session management

    • Memory persistence

with st.sidebar:
    st.title("User Settings")
    user_id = st.text_input("Enter your Username", key="user_id")
    
    # Check if user ID has changed
    if user_id != st.session_state.previous_user_id:
        st.session_state.messages = []  # Clear chat history
        st.session_state.previous_user_id = user_id  # Update previous user ID
  5. Add memory viewing functionality:

    • View personal memory

    • User-specific context

    • Organized display

    if user_id:
        st.success(f"Logged in as: {user_id}")
        
        # Initialize Memory with the configuration
        m = Memory.from_config(config)
        
        # Memory viewing section
        st.header("Memory Context")
        if st.button("View My Memory"):
            memories = m.get_all(user_id=user_id)
            if memories and "results" in memories:
                st.write(f"Memory history for **{user_id}**:")
                for memory in memories["results"]:
                    if "memory" in memory:
                        st.write(f"- {memory['memory']}")
  6. Implement the chat interface:

    • Chat input handling

    • Memory storage

    • Message display

if user_id:  # Only show chat interface if user is "logged in"
    # Display chat history
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # User input
    if prompt := st.chat_input("What is your message?"):
        # Add user message to chat history
        st.session_state.messages.append({"role": "user", "content": prompt})
        
        # Display user message
        with st.chat_message("user"):
            st.markdown(prompt)

        # Add to memory
        m.add(prompt, user_id=user_id)
  7. Generate responses with context:

    • Uses Llama 3.1 locally

    • Includes memory context

    • Streams responses

        memories = m.get_all(user_id=user_id)
        context = ""
        if memories and "results" in memories:
            for memory in memories["results"]:
                if "memory" in memory:
                    context += f"- {memory['memory']}\n"

        # Generate assistant response
        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            
            # Stream the response
            try:
                response = completion(
                    model="ollama/llama3.1:latest",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant with access to past conversations. Use the context provided to give personalized responses."},
                        {"role": "user", "content": f"Context from previous conversations with {user_id}: {context}\nCurrent message: {prompt}"}
                    ],
                    api_base="http://localhost:11434",
                    stream=True
                )  
  8. Handle streaming responses:

    • Real-time updates

    • Smooth typing effect

    • Error handling

                for chunk in response:
                    if hasattr(chunk, 'choices') and len(chunk.choices) > 0:
                        content = chunk.choices[0].delta.get('content', '')
                        if content:
                            full_response += content
                            message_placeholder.markdown(full_response + "▌")  # "▌" acts as a typing cursor
                
                # Final update
                message_placeholder.markdown(full_response)
            except Exception as e:
                st.error(f"Error generating response: {str(e)}")
                full_response = "I apologize, but I encountered an error generating the response."
                message_placeholder.markdown(full_response)
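A note on compatibility: depending on your LiteLLM version, each streamed delta may arrive as an object rather than a dict, which makes the delta.get() call above fail with an AttributeError. In that case, attribute-style access is a reasonable substitute (an assumption to verify against your installed version):

content = getattr(chunk.choices[0].delta, "content", None) or ""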
  9. Store chat history and memory:

    • Maintains conversation flow

    • Updates memory

    • Preserves context

        st.session_state.messages.append({"role": "assistant", "content": full_response})
        
        # Add response to memory
        m.add(f"Assistant: {full_response}", user_id=user_id)

else:
    st.info("👈 Please enter your username in the sidebar to start chatting!")
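One optional refinement once everything works: rather than injecting every stored memory into the prompt, you can retrieve only the memories relevant to the current message using Mem0’s search API. A sketch of the idea, which would replace the get_all() context-building in step 7 (the limit value is illustrative):

        related = m.search(prompt, user_id=user_id, limit=5)  # top matches only
        context = ""
        if related and "results" in related:
            for memory in related["results"]:
                if "memory" in memory:
                    context += f"- {memory['memory']}\n"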

Running the App

With our code in place, it's time to launch the app.

  • In your terminal, navigate to the project folder and run the following command:

streamlit run local_chatgpt_memory.py
  • Streamlit will provide a local URL (typically http://localhost:8501). Open it in your web browser, enter a username in the sidebar, and start chatting. No API keys are needed; everything runs locally, and the assistant will recall your past conversations.


Conclusion

Your local ChatGPT clone with memory is ready and operates fully offline. This setup ensures privacy, gives you full control, and provides a flexible base for further development.

For enhancements, consider:

  1. Adding custom instructions: Allow users to set personalized prompts to tailor responses further.

  2. Building Voice Interaction: Enable voice input and output for a more interactive experience.

  3. Topic Segmentation: Enable the assistant to organize memories by topic, making it easier to recall relevant details based on user-selected themes.

  4. Configurable Privacy Modes: Introduce a “private mode” that temporarily disables memory storage so that sensitive conversations are not saved (see the sketch after this list).
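A minimal sketch of such a private mode, assuming a recent Streamlit version with st.toggle (the widget name and placement are illustrative, not part of the app above):

with st.sidebar:
    # Hypothetical toggle: when enabled, skip all memory writes
    private_mode = st.toggle("Private mode (don't save memories)")

# ...then guard each m.add(...) call in the chat flow:
if not private_mode:
    m.add(prompt, user_id=user_id)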

Keep experimenting and refining to build even smarter AI solutions!

We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Don’t forget to share this tutorial on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to support us!
