How to Fine-Tune LLMs with LoRA Adapters using Hugging Face TRL
This notebook demonstrates how to efficiently fine-tune large language models using LoRA (Low-Rank Adaptation) adapters. LoRA is a parameter-efficient fine-tuning technique that:
Freezes the pre-trained model weights
Adds small trainable rank decomposition matrices to attention layers
Typically reduces trainable parameters by ~90%
Maintains model performance while being memory efficient
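As a quick sketch of the idea: LoRA keeps the pre-trained weight matrix $W_0$ frozen and learns a low-rank update, so a layer's forward pass becomes

$$h = W_0 x + \Delta W x = W_0 x + B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$

Only $A$ and $B$ are trained (with the update scaled by $\alpha / r$, the `lora_alpha` hyperparameter), which is why the number of trainable parameters drops so sharply.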
We'll cover:
Setup development environment and LoRA configuration
Create and prepare the dataset for adapter training
Fine-tune using trl and SFTTrainer with LoRA adapters
Test the model and merge adapters (optional)
1. Setup development environment
Our first step is to install the Hugging Face libraries and PyTorch, including trl, transformers, and datasets. If you haven't heard of trl yet, don't worry. It is a library built on top of transformers and datasets that makes it easier to fine-tune, align, and apply RLHF to open LLMs.
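A minimal install cell might look like this (the package list follows the text above; versions are not pinned here):

```python
# Install PyTorch and the Hugging Face libraries used in this notebook
# (bitsandbytes is only needed for QLoRA's 4-bit quantization)
%pip install torch transformers datasets trl peft bitsandbytes
```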
2. Load the dataset
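For instance, a conversational dataset can be pulled straight from the Hub with datasets (the dataset name below is a placeholder; any chat-style dataset that SFTTrainer understands will do):

```python
from datasets import load_dataset

# Load a chat-style dataset from the Hugging Face Hub
# (placeholder name; swap in the dataset you want to fine-tune on)
dataset = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations")

print(dataset)               # inspect the available splits
print(dataset["train"][0])   # look at one raw sample
```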
3. Fine-tune LLM using trl and the SFTTrainer with LoRA
The SFTTrainer from trl provides integration with LoRA adapters through the PEFT library. Key advantages of this setup include:
Memory Efficiency:
Only adapter parameters are stored in GPU memory
Base model weights remain frozen and can be loaded in lower precision
Enables fine-tuning of large models on consumer GPUs
Training Features:
Native PEFT/LoRA integration with minimal setup
Support for QLoRA (Quantized LoRA) for even better memory efficiency
Adapter Management:
Adapter weight saving during checkpoints
Features to merge adapters back into base model
We'll use QLoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:
Define the LoRA configuration (rank, alpha, dropout)
Create the SFTTrainer with PEFT config
Train and save the adapter weights
The SFTTrainer supports a native integration with peft, which makes it super easy to efficiently tune LLMs using, e.g., LoRA. We only need to create our LoraConfig and provide it to the trainer.
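A typical configuration might look like the following sketch (the rank, alpha, and target-module values are common defaults, not values mandated by the course):

```python
from peft import LoraConfig

# LoRA hyperparameters: rank of the update matrices, scaling factor,
# dropout on the adapter, and which modules receive adapters
peft_config = LoraConfig(
    r=16,                         # rank of the low-rank decomposition
    lora_alpha=32,                # scaling factor (effective scale = alpha / r)
    lora_dropout=0.05,            # dropout applied to adapter activations
    target_modules="all-linear",  # attach adapters to all linear layers
    task_type="CAUSAL_LM",        # causal language modeling task
)
```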
Exercise: Define LoRA parameters for finetuning
Take a dataset from the Hugging Face Hub and finetune a model on it.
Difficulty Levels
🐢 Use the general parameters for an arbitrary finetune
🐕 Adjust the parameters and review them in Weights & Biases.
🦁 Adjust the parameters and show the change in inference results.
Before we can start our training we need to define the hyperparameters (TrainingArguments) we want to use.
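A sketch of sensible hyperparameters (values are illustrative; recent trl versions use SFTConfig, which subclasses TrainingArguments):

```python
from trl import SFTConfig

# Training hyperparameters; tune these for your hardware and dataset
args = SFTConfig(
    output_dir="sft-lora-model",    # where checkpoints and adapter weights go
    num_train_epochs=3,             # the text below trains for 3 epochs
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size = 4 * 4
    learning_rate=2e-4,             # a common starting point for LoRA
    logging_steps=10,
    save_strategy="epoch",          # save adapter weights each epoch
)
```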
We now have every building block we need to create our SFTTrainer and start training our model.
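Putting the pieces together might look like this (the model name is a placeholder; pick any base model you want to adapt):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

model_name = "HuggingFaceTB/SmolLM2-135M"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = SFTTrainer(
    model=model,
    args=args,                   # the SFTConfig defined above
    train_dataset=dataset["train"],
    peft_config=peft_config,     # the LoraConfig defined above
    processing_class=tokenizer,  # older trl versions call this `tokenizer=`
)
```

For the QLoRA variant mentioned above, you would additionally load the base model with a 4-bit quantization config, e.g. `quantization_config=BitsAndBytesConfig(load_in_4bit=True)` from transformers.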
Start training our model by calling the train() method on our Trainer instance. This will start the training loop and train our model for 3 epochs. Since we are using a PEFT method, we will only save the adapted model weights and not the full model.
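Kicking off training and saving only the adapter weights:

```python
# Start the training loop; with a PEFT config, only adapter weights are updated
trainer.train()

# Saves just the LoRA adapter (a few MB), not the full base model
trainer.save_model()
```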
The training with Flash Attention for 3 epochs with a dataset of 15k samples took 4:14:36 on a g5.2xlarge. The instance costs $1.21/h, which brings us to a total cost of only ~$5.30.
Merge LoRA Adapter into the Original Model
When using LoRA, we only train adapter weights while keeping the base model frozen. During training, we save only these lightweight adapter weights (~2-10MB) rather than a full model copy. However, for deployment, you might want to merge the adapters back into the base model for:
Simplified Deployment: Single model file instead of base model + adapters
Inference Speed: No adapter computation overhead
Framework Compatibility: Better compatibility with serving frameworks
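One way to merge the adapter into the base model is with peft's AutoPeftModelForCausalLM (the paths below are placeholders):

```python
from peft import AutoPeftModelForCausalLM

# Load the base model with the trained adapter attached
model = AutoPeftModelForCausalLM.from_pretrained("sft-lora-model")

# Fold the low-rank update into the base weights and drop the adapter
merged_model = model.merge_and_unload()

# Save a standalone model that serving frameworks can load directly
merged_model.save_pretrained("sft-merged-model", safe_serialization=True)
```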
4. Test Model and run Inference
After the training is done, we want to test our model. We will load different samples from the original dataset and evaluate the model on those samples, using a simple loop and accuracy as our metric.
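A minimal sketch of such a loop, assuming the merged model from the step above and placeholder field names for the prompt and reference answer (adapt these to your dataset's actual columns):

```python
from transformers import pipeline

# Build a generation pipeline from the merged model saved above
pipe = pipeline("text-generation", model="sft-merged-model")

# NOTE: "question" / "answer" are placeholder field names
samples = dataset["train"].select(range(10))
correct = 0
for sample in samples:
    generated = pipe(sample["question"], max_new_tokens=100)[0]["generated_text"]
    correct += sample["answer"] in generated  # crude containment check

print(f"accuracy: {correct / len(samples):.2%}")
```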
Bonus Exercise: Load LoRA Adapter
Use what you learnt from the example notebook to load your trained LoRA adapter for inference.
Let's test some prompt samples and see how the model performs.
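One way to load the saved adapter (without merging) and run a few prompts; the checkpoint path and prompts are placeholders:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the trained LoRA adapter applied on top
model = AutoPeftModelForCausalLM.from_pretrained("sft-lora-model")
tokenizer = AutoTokenizer.from_pretrained("sft-lora-model")

prompts = [
    "What is the capital of Germany?",
    "Write a haiku about programming.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```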