Path: blob/main/sagemaker/08_distributed_summarization_bart_t5/sagemaker-notebook.ipynb
Huggingface Sagemaker-sdk - Distributed Training Demo
Distributed Summarization with transformers scripts + Trainer and samsum dataset
Tutorial
We will use the new Hugging Face DLCs and the Amazon SageMaker extension to train a distributed Seq2Seq-transformer model on summarization using the transformers and datasets libraries, then upload it to huggingface.co and test it.
As our distributed training strategy we are going to use SageMaker Data Parallelism, which has been built into the Trainer API. To use data parallelism we only have to define the distribution parameter in our HuggingFace estimator.
In this tutorial, we will use an Amazon SageMaker Notebook Instance for running our training job. You can learn here how to set up a Notebook Instance.
What are we going to do:
Set up a development environment and install sagemaker
Choose 🤗 Transformers examples/ script
Configure distributed training and hyperparameters
Create a HuggingFace estimator and start training
Upload the fine-tuned model to huggingface.co
Test inference
Model and Dataset
We are going to fine-tune facebook/bart-base on the samsum dataset. "BART is sequence-to-sequence model trained with denoising as pretraining objective." [REF]
The samsum dataset contains about 16k messenger-like conversations with summaries.
NOTE: You can run this demo in SageMaker Studio, on your local machine or on SageMaker Notebook Instances.
Set up a development environment and install sagemaker
Installation
Note: The use of Jupyter is optional: we could also launch SageMaker Training jobs from anywhere we have an SDK installed, connectivity to the cloud and appropriate permissions, such as a laptop, another IDE or a task scheduler like Airflow or AWS Step Functions.
Development environment
Permissions
If you are going to use SageMaker in a local environment, you need access to an IAM Role with the required permissions for SageMaker. You can find more about it here.
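A minimal sketch of how the role could be resolved in both situations, assuming the sagemaker and boto3 packages are installed and AWS credentials are configured. The function name resolve_execution_role and its role_name and iam_client arguments are hypothetical and only serve this illustration; the libraries are imported lazily so the snippet loads even where they are missing.

```python
def resolve_execution_role(role_name=None, iam_client=None):
    """Return the ARN of the IAM role that SageMaker should assume.

    role_name=None: ask the SDK for the role attached to the current
    Notebook Instance or Studio session.
    role_name given: look the role up by name through IAM, which is the
    usual route in a local environment.
    """
    if role_name is None:
        import sagemaker  # lazy import: only needed on this branch
        return sagemaker.get_execution_role()

    if iam_client is None:
        import boto3  # lazy import: only needed when no client is injected
        iam_client = boto3.client("iam")
    return iam_client.get_role(RoleName=role_name)["Role"]["Arn"]
```

Inside a Notebook Instance, calling resolve_execution_role() without arguments should be enough; on a laptop we would pass the name of a role that carries the SageMaker permissions.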
Choose 🤗 Transformers examples/ script
The 🤗 Transformers repository contains several examples/ scripts for fine-tuning models on tasks from language-modeling to token-classification. In our case, we are using the run_summarization.py script from the seq2seq/ examples.
Note: you can follow this tutorial unchanged to train your model with a different examples script.
Since the HuggingFace Estimator has git support built-in, we can specify a training script that is stored in a GitHub repository as entry_point and source_dir.
We are going to use the transformers 4.4.2 DLC, which means we need to configure v4.4.2 as the branch to pull the compatible example scripts.
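As a sketch, the git settings could look like the following. The branch value comes from the text above; the repository URL (the upstream 🤗 Transformers repository) and the source_dir path are assumptions and should be checked against the repository layout at that version.

```python
# Where the estimator should fetch the training script from.
git_config = {
    "repo": "https://github.com/huggingface/transformers.git",  # assumed upstream repo
    "branch": "v4.4.2",  # must match the transformers version of the DLC
}

# Script to run and the directory inside the repository that contains it.
entry_point = "run_summarization.py"
source_dir = "./examples/seq2seq"  # assumed location of the seq2seq/ examples
```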
Configure distributed training and hyperparameters
Next, we will define our hyperparameters and configure our distributed training strategy. As hyperparameters, we can define any Seq2SeqTrainingArguments and the ones defined in run_summarization.py.
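A minimal sketch of both dictionaries follows. The model and dataset names come from this tutorial; the batch sizes, epoch count, learning rate and seed are illustrative values, not tuned recommendations. The distribution dictionary uses the key layout the SageMaker SDK expects for its data parallel library.

```python
# Arguments forwarded to run_summarization.py on every training instance.
hyperparameters = {
    "model_name_or_path": "facebook/bart-base",
    "dataset_name": "samsum",
    "do_train": True,
    "do_predict": True,
    "predict_with_generate": True,
    "output_dir": "/opt/ml/model",  # directory SageMaker packs into model.tar.gz
    "per_device_train_batch_size": 4,  # illustrative value
    "per_device_eval_batch_size": 4,   # illustrative value
    "num_train_epochs": 3,             # illustrative value
    "learning_rate": 5e-5,             # illustrative value
    "seed": 7,                         # illustrative value
    "fp16": True,
}

# Switch on SageMaker Data Parallelism for the training job.
distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}
```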
Create a HuggingFace estimator and start training
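The estimator could be assembled roughly as sketched here, assuming version 2 of the SageMaker Python SDK. The helper names estimator_kwargs and start_training are hypothetical. The instance type, instance count, PyTorch version and Python version are assumptions: they must match a Hugging Face DLC that exists in your region and an instance type supported by SageMaker Data Parallelism. The role, hyperparameters, distribution and git configuration are passed in as arguments.

```python
def estimator_kwargs(role, hyperparameters, distribution, git_config):
    """Collect the keyword arguments for the HuggingFace estimator."""
    return {
        "entry_point": "run_summarization.py",
        "source_dir": "./examples/seq2seq",   # assumed path inside the repository
        "git_config": git_config,
        "instance_type": "ml.p3dn.24xlarge",  # assumption: a data-parallel capable type
        "instance_count": 2,                  # assumption
        "transformers_version": "4.4.2",
        "pytorch_version": "1.6.0",           # assumption: pairs with the 4.4.2 DLC
        "py_version": "py36",                 # assumption: pairs with the 4.4.2 DLC
        "role": role,
        "hyperparameters": hyperparameters,
        "distribution": distribution,
    }


def start_training(kwargs):
    """Create the estimator and launch the training job (this incurs AWS costs)."""
    from sagemaker.huggingface import HuggingFace  # lazy import of the SDK

    estimator = HuggingFace(**kwargs)
    estimator.fit()  # blocks until the training job finishes
    return estimator
```

Keeping the arguments in a plain dictionary makes it easy to inspect or log the configuration before any billable resources are started.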
Deploying the endpoint
To deploy our endpoint, we call deploy() on our HuggingFace estimator object, passing in our desired number of instances and instance type.
Then, we use the returned predictor object to call the endpoint.
Finally, we delete the endpoint again.
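These three steps can be sketched as one helper. The function name summarize_once, the default instance type and the {"inputs": ...} request format are assumptions; the estimator argument is expected to be a trained HuggingFace estimator.

```python
def summarize_once(estimator, conversation, instance_count=1,
                   instance_type="ml.g4dn.xlarge"):
    """Deploy an endpoint, request one summary, then delete the endpoint."""
    predictor = estimator.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
    )
    try:
        # Assumed request format: a dictionary with the text under "inputs".
        return predictor.predict({"inputs": conversation})
    finally:
        # Always clean up, even if the request fails, to avoid idle costs.
        predictor.delete_endpoint()
```

Wrapping the request in try/finally is a design choice for a demo: the endpoint is removed even when the prediction raises an error.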
Upload the fine-tuned model to huggingface.co
We can download our model from Amazon S3 and unzip it using the following snippet.
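One possible version of such a snippet is sketched below, assuming the SageMaker SDK is installed and that the training job stored its output as model.tar.gz. The helper names are hypothetical; model_data_uri would be the model_data attribute of the trained estimator.

```python
import os
import tarfile


def extract_model_archive(archive_path, local_path):
    """Unpack a .tar.gz archive into local_path and list the resulting files."""
    os.makedirs(local_path, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=local_path)
    return sorted(os.listdir(local_path))


def download_and_extract(model_data_uri, local_path="my_bart_model",
                         sagemaker_session=None):
    """Download model.tar.gz from S3, unpack it and remove the archive."""
    from sagemaker.s3 import S3Downloader  # lazy import of the SDK

    os.makedirs(local_path, exist_ok=True)
    S3Downloader.download(
        s3_uri=model_data_uri,
        local_path=local_path,
        sagemaker_session=sagemaker_session,
    )
    archive = os.path.join(local_path, "model.tar.gz")
    files = extract_model_archive(archive, local_path)
    os.remove(archive)
    return [name for name in files if name != "model.tar.gz"]
```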
Before we upload our model to huggingface.co we need to create a model_card. The model_card describes the model and includes hyperparameters, results and which dataset was used for training. To create a model_card we create a README.md in our local_path.
After we extract all the metrics we want to include, we create our README.md. In addition to the automated generation of the results table, we add the metrics manually to the metadata of our model card under model-index.
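A minimal sketch of generating such a README.md, assuming the metrics have already been collected into a dictionary. The function names, the tag list and the section titles are hypothetical, and the metadata keys follow the model-index layout as we understand it, so they are worth checking against the Hub documentation. No metric values are shown here; they are supplied by the caller.

```python
import os


def build_model_card(model_name, dataset, hyperparameters, metrics):
    """Return README.md text: YAML metadata, hyperparameters, results table."""
    meta = [
        "---",
        "tags:",
        "- sagemaker",
        "- summarization",
        "datasets:",
        f"- {dataset}",
        "model-index:",
        f"- name: {model_name}",
        "  results:",
        "  - task:",
        "      type: summarization",
        "      name: Summarization",
        "    dataset:",
        f"      type: {dataset}",
        f"      name: {dataset}",
        "    metrics:",
    ]
    # One metadata entry per metric; the key doubles as type and display name.
    for key, value in metrics.items():
        meta += [f"    - type: {key}", f"      name: {key}", f"      value: {value}"]
    meta.append("---")

    body = ["", f"# {model_name}", "", "## Hyperparameters", ""]
    body += [f"- {key}: {value}" for key, value in hyperparameters.items()]
    body += ["", "## Results", "", "| key | value |", "| --- | --- |"]
    body += [f"| {key} | {value} |" for key, value in metrics.items()]
    return "\n".join(meta + body) + "\n"


def write_model_card(local_path, card_text):
    """Write the card as README.md inside local_path and return its path."""
    os.makedirs(local_path, exist_ok=True)
    readme = os.path.join(local_path, "README.md")
    with open(readme, "w", encoding="utf-8") as handle:
        handle.write(card_text)
    return readme
```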
After we have our unzipped model and model card located in my_bart_model, we can either use the huggingface_hub SDK to create a repository and upload it to huggingface.co, or go to https://huggingface.co/new and create a new repository and upload it.
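For the SDK route, a sketch could look like this, assuming a recent huggingface_hub release and a login token already configured on the machine. The function name upload_model and the repo_id value in the usage note are hypothetical; the optional api argument exists only so that the client can be swapped out.

```python
def upload_model(folder_path, repo_id, api=None):
    """Create the repository if needed and upload every file in folder_path."""
    if api is None:
        from huggingface_hub import HfApi  # lazy import of the Hub client

        api = HfApi()
    # exist_ok=True makes the call safe to repeat for an existing repository.
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_folder(folder_path=folder_path, repo_id=repo_id)
    return f"https://huggingface.co/{repo_id}"
```

A call such as upload_model("my_bart_model", "your-username/your-model-name") would then push the model files together with the README.md.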