• unwind ai
  • Posts
  • Fine-tune Llama 3.2 for Free in 30 Lines of Python Code

Fine-tune Llama 3.2 for Free in 30 Lines of Python Code

Fine-tune Llama 3.2 in Google Colab for free (step-by-step instructions with Code)

Meta’s new Llama 3.2 models are here, offering incredible advancements in speed and accuracy for their size. Do you want to fine-tune the models but are worried about the complexity and cost? Look no further!

In this blog post, we’ll walk you through finetuning Llama 3.2 models (1B and 3B) using Unsloth AI and Low-Rank Adaptation (LoRA) for efficient tuning in just 30 lines of Python code. We’ll also leverage the FineTome-100k dataset to train the model, but you can easily swap in your own dataset.

With Unsloth, the process is faster than ever—2x faster, in fact. And the best part? You can finetune Llama 3.2 for free on Google Colab.

🎁 $50 worth AI Bonus Content at the end!

What We’re Building

In this tutorial, we’ll use Unsloth to finetune Llama 3.2 models on the FineTome-100k dataset. This will allow you to:

  • Finetune Llama 3.2 models (1B and 3B) using LoRA for faster, efficient tuning

  • Configure training parameters like batch size, sequence length, and more

  • Run everything on Google Colab’s free GPU for a cost-effective solution

Prerequisites

Before we begin, make sure you have:

  1. A Google Colab account (or a local Python environment with a GPU)

  2. Python installed on your machine (version 3.7 or higher is recommended)

  3. Basic familiarity with Python programming

Step-by-Step Instructions

Step 1: Setting Up the Environment

First, let's get our development environment ready:

  1. Clone the GitHub repository:

git clone https://github.com/Shubhamsaboo/awesome-llm-apps.git
  1. Go to the llama3.2_finetuning folder:

cd llama3.2_finetuning
pip install -r requirements.txt

Step 2: Finetuning Llama 3.2

Open the script finetune_llama3.2.py in Google Colab. The script will guide you through finetuning Llama 3.2 with LoRA and Unsloth.:

  • Import Required Libraries: At the top of your file, import the following libraries.
    Torch for PyTorch operations
    Unsloth for 2x faster fine-tuning
    Datasets and TRL for dataset management and training

import torch
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth.chat_templates import get_chat_template, standardize_sharegpt
  • Loading the Pre-trained Model: Next, we'll load the pre-trained Llama 3.2 model:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",
    max_seq_length=2048, load_in_4bit=True,
)

This code uses Unsloth's FastLanguageModel to load the Llama 3.2 3B Instruct model. We set the maximum sequence length to 2048 and enable 4-bit quantization for efficiency

  • Adding LoRA Adapters: To make our finetuning more efficient, we'll use Low-Rank Adaptation (LoRA). It also targets specific modules in the model architecture:

model = FastLanguageModel.get_peft_model(
    model, r=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
)
  • Preparing the Dataset: Now, let's set up our chat template and prepare the dataset. This code loads the FineTome-100k dataset, standardizes it, and applies the Llama 3.1 chat template to format the conversations:

tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(
    lambda examples: {
        "text": [
            tokenizer.apply_chat_template(convo, tokenize=False)
            for convo in examples["conversations"]
        ]
    },
    batched=True
)
  • Configuring the Trainer: Now we'll set up the SFTTrainer with specific training arguments. This configuration sets the batch size, gradient accumulation, learning rate, and other training parameters. It also automatically selects FP16 or BF16 based on hardware support:

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        output_dir="outputs",
    ),
)
  • Training and Saving the Model: Finally, we'll start the training process and save our finetuned model:

trainer.train()
model.save_pretrained("finetuned_model")

This initiates the training process with our configured settings and saves the finetuned model to a directory named "finetuned_model".

Conclusion

And there you have it! With just 30 lines of Python code, we've set up a complete pipeline for finetuning Llama 3.2. This approach leverages the efficiency of Unsloth and the power of LoRA to make the process accessible even on free platforms like Google Colab.

Here's a summary of what we've accomplished:

  1. Set up the environment with necessary libraries

  2. Loaded the pre-trained Llama 3.2 model

  3. Applied LoRA for efficient finetuning

  4. Prepared the FineTome-100k dataset

  5. Configured the training process

  6. Trained and saved the finetuned model

Remember, you can easily modify this script to use different datasets, adjust the model size, or tweak the training parameters to suit your specific needs. Some ideas for customization:

  • Try different datasets by changing the load_dataset function call

  • Experiment with various LoRA configurations by adjusting the target_modules and r value

  • Modify the training hyperparameters in TrainingArguments to optimize for your specific use case

This tutorial demonstrates how accessible and straightforward it can be to finetune LLMs. With tools like Unsloth and techniques like LoRA, even resource-intensive tasks like finetuning Llama 3.2 can be performed on free platforms like Google Colab.

We share hands-on tutorials like this 2-3 times a week, to help you stay ahead in the world of AI. If you're serious about levelling up your AI skills and staying ahead of the curve, subscribe now and be the first to access our latest tutorials.

Happy coding!

Bonus worth $50 💵💰

Share this newsletter on your social channels and tag Unwind AI (X, LinkedIn, Threads, Facebook) to get AI resource pack worth $50 for FREE. Valid for limited time only!

Reply

or to participate.