Fine-tuning Gemma 3n: A Deep Dive with Unsloth

The Gemma-3n.net Team
June 29, 2025

While running pre-trained Gemma 3n models is powerful, the real magic happens when you adapt them to your specific domain or task. This process, known as fine-tuning, can dramatically improve performance on specialized datasets. However, fine-tuning large models has traditionally been resource-intensive.

Enter Unsloth, a library that makes fine-tuning 2-5x faster while using 50-80% less memory, with no loss in accuracy. This makes it practical to fine-tune models like Gemma 3n on a free Google Colab notebook or a consumer-grade GPU.

In this deep dive, we’ll walk through the concepts and code needed to fine-tune Gemma 3n using Unsloth and the Hugging Face ecosystem.

Key Concepts

Two ideas underpin the whole workflow. First, 4-bit quantization stores the frozen base model's weights at roughly half a byte per parameter, which is what lets a model of this size fit into a single consumer GPU's memory. Second, LoRA (Low-Rank Adaptation) keeps those base weights frozen and trains only small low-rank adapter matrices injected into selected layers, so the number of trainable parameters is a tiny fraction of the total. Combining the two is commonly called QLoRA, and it is exactly the setup Unsloth configures for us below.

Step 1: Setting Up the Environment (Google Colab)

The easiest way to start is with a free Google Colab notebook, which provides a T4 GPU. The following code will install all necessary libraries.

# Install necessary libraries
# Note: Unsloth's [colab-ampere] extra targets Ampere GPUs (A100/L4);
# on the free-tier T4, the plain install from source below works.
!pip install "unsloth @ git+https://github.com/unslothai/unsloth.git"
!pip install "git+https://github.com/huggingface/transformers.git"
!pip install "git+https://github.com/huggingface/peft.git"
!pip install "git+https://github.com/huggingface/accelerate.git"
!pip install "git+https://github.com/huggingface/trl.git"
!pip install "git+https://github.com/huggingface/datasets.git"

Step 2: Loading the Model and Tokenizer with Unsloth

We’ll use Unsloth’s FastLanguageModel to load Gemma 3n. This special wrapper automatically applies Unsloth’s performance optimizations.

from unsloth import FastLanguageModel
import torch

# A 4-bit quantized, instruction-tuned Gemma 3n E4B checkpoint from Unsloth
# (a smaller E2B variant is also available)
model_name = "unsloth/gemma-3n-e4b-it-bnb-4bit"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = 2048,
    dtype = None, # Autodetect
    load_in_4bit = True,
)

Here, load_in_4bit=True is the key setting: it loads the model with its weights quantized to 4-bit precision, which is what drastically reduces the memory footprint.
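
If you want to see the effect for yourself, a quick check like the one below (a minimal sketch; get_memory_footprint comes from the underlying Transformers model) prints how much memory the quantized weights actually occupy:

import torch

# Rough sanity check of the 4-bit model's size on the GPU.
print(f"Model footprint      : {model.get_memory_footprint() / 1e9:.2f} GB")
print(f"GPU memory allocated : {torch.cuda.memory_allocated() / 1e9:.2f} GB")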

Step 3: Configuring LoRA Adapters

Now, we add the LoRA adapters to the model. Unsloth makes this easy.

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                          # LoRA rank: size of the low-rank update matrices
    lora_alpha = 32,                 # Scaling factor applied to the LoRA update
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_dropout = 0,                # No dropout on the adapter weights
    bias = "none",                   # Leave bias terms untouched
    use_gradient_checkpointing = True,  # Trades extra compute for lower memory use
    random_state = 42,               # For reproducibility
)

We are telling Unsloth to add LoRA adapters to all the key attention and feed-forward layers of the model.
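
A quick way to confirm this (a small sketch, not part of the original walkthrough) is to count how many parameters actually require gradients; with LoRA it should be a small fraction of the total:

# Count trainable (LoRA) parameters vs. the full model.
# (The total is approximate for 4-bit layers, whose weights are stored packed.)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total     = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")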

Step 4: Preparing Your Dataset

For this example, we'll use the cleaned Alpaca instruction-following dataset. You can replace it with any dataset of your own, as long as you format it into a single text column as we do below.

from datasets import load_dataset

# A simple dataset for demonstration
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{}

### Response:
{}"""

# Append the tokenizer's EOS token so the model learns when to stop generating.
EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    outputs      = examples["output"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Note: alpaca-cleaned also has an "input" column, which this simple
        # template ignores.
        text = alpaca_prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

# Load and format the dataset
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
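
Before training, it's worth eyeballing one formatted example to confirm the template was applied correctly (a quick check, not part of the original post):

# Inspect the first formatted training example.
print(dataset[0]["text"])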

Step 5: Training the Model

We use the SFTTrainer (Supervised Fine-tuning Trainer) from the trl library, which works perfectly with our Unsloth-prepared model.

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60, # A short training run for demonstration
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "outputs",
    ),
)

trainer.train()
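
Once training finishes, you can run a quick smoke test. The snippet below is only a sketch: the prompt is made up, and FastLanguageModel.for_inference is Unsloth's helper for switching into its faster inference mode (fall back to a plain model.generate() call if your version doesn't have it).

# Switch to inference mode and try the fine-tuned model on a sample prompt.
FastLanguageModel.for_inference(model)

prompt = alpaca_prompt.format("Give three tips for staying healthy.", "")
inputs = tokenizer(prompt, return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))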

Video Guide

For a more in-depth explanation of fine-tuning with Unsloth, including helpful analysis, watch this video:

Conclusion

You have now successfully fine-tuned Gemma 3n on a custom task! This process, made incredibly efficient by Unsloth, allows you to create highly specialized models for your own needs. You can now save your trained LoRA adapters or merge them into the base model for deployment.
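
As a rough sketch of those last steps (the output paths are placeholders, and save_pretrained_merged is Unsloth's merged-save helper; double-check that your installed version supports it):

# Save only the small LoRA adapter weights (plus the tokenizer) for later reuse.
model.save_pretrained("gemma-3n-lora")
tokenizer.save_pretrained("gemma-3n-lora")

# Or merge the adapters back into the base weights for standalone deployment.
model.save_pretrained_merged("gemma-3n-merged", tokenizer, save_method = "merged_16bit")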

This is a foundational skill for any advanced AI developer, and mastering it opens the door to a new level of application development.
