GitHub Repository: better-data-science/TensorFlow
Path: blob/main/015_CNN_008_Transfer_Learning.ipynb

Kernel: Python 3 (ipykernel)

CNN 8 - Transfer Learning

Dataset:
- https://www.kaggle.com/shaunthesheep/microsoft-catsvsdogs-dataset
The dataset isn't deep-learning-compatible by default, so it must be preprocessed before use.

What you should know by now:

How to preprocess image data
How to load image data from a directory
What's a convolution, pooling, and a fully-connected layer
Categorical vs. binary classification
What is data augmentation and why is it useful

Let's start

We'll import the libraries first:

In [1]:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import tensorflow as tf

We'll have to load training and validation data from different directories throughout the notebook
The best practice is to declare a function for that
The function will also apply data augmentation to the training dataset:

In [2]:

def init_data(train_dir: str, valid_dir: str) -> tuple:
    train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1/255.0,
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest'
    )
    valid_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1/255.0
    )
    
    train_data = train_datagen.flow_from_directory(
        directory=train_dir,
        target_size=(224, 224),
        class_mode='categorical',
        batch_size=64,
        seed=42
    )
    valid_data = valid_datagen.flow_from_directory(
        directory=valid_dir,
        target_size=(224, 224),
        class_mode='categorical',
        batch_size=64,
        seed=42
    )
    
    return train_data, valid_data
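
To make two of the settings above concrete, here is a minimal NumPy sketch of what `rescale=1/255.0` and `horizontal_flip=True` do to a single image array. It is only an illustration on a tiny made-up array, not the generator's actual implementation, and the name `toy_image` is hypothetical.

```python
import numpy as np

# Hypothetical 2x3 RGB "image" with uint8 pixel values (not from the dataset)
toy_image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3) * 14

# rescale=1/255.0 multiplies every pixel, mapping [0, 255] to [0, 1]
rescaled = toy_image * (1 / 255.0)

# A horizontal flip reverses the width axis (axis 1 in height x width x channels)
flipped = rescaled[:, ::-1, :]

print(rescaled.min(), rescaled.max())
print(np.array_equal(flipped[:, 0, :], rescaled[:, -1, :]))
```

The validation generator applies only the rescaling step, so validation images are never distorted.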

Let's now load our dogs and cats dataset:

In [3]:

train_data, valid_data = init_data(
    train_dir='data/train/', 
    valid_dir='data/validation/'
)

Out[3]:

Found 20030 images belonging to 2 classes.
Found 2488 images belonging to 2 classes.
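
With a batch size of 64, these image counts determine how many batches make up one epoch. The arithmetic can be sketched as follows, assuming the iterator also yields the final, smaller batch (which is consistent with the 313 steps shown in the training log further down).

```python
import math

batch_size = 64   # same value as in init_data()
n_train = 20030   # from the "Found ... images" output above
n_valid = 2488

# The last, smaller batch still counts as a step, hence the ceiling
train_steps = math.ceil(n_train / batch_size)
valid_steps = math.ceil(n_valid / batch_size)

print(train_steps, valid_steps)  # 313 39
```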

Transfer Learning in TensorFlow

With transfer learning, we're basically loading a huge pretrained model without the top classification layer
That way, we can freeze the learned weights and only add the output layer to match our case
For example, most pretrained models were trained on the ImageNet dataset, which has 1000 classes
- We only have two classes (cat and dog), so we'll need to specify that
We'll also add a couple of additional layers to prevent overfitting:

In [4]:

def build_transfer_learning_model(base_model):
    # `base_model` stands for the pretrained model
    # We want to use the learned weights, and to do so we must freeze them
    for layer in base_model.layers:
        layer.trainable = False
        
    # Declare a sequential model that combines the base model with custom layers
    model = tf.keras.Sequential([
        base_model,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dropout(rate=0.2),
        tf.keras.layers.Dense(units=2, activation='softmax')
    ])
    
    # Compile the model
    model.compile(
        loss='categorical_crossentropy',
        optimizer=tf.keras.optimizers.Adam(),
        metrics=['accuracy']
    )
    
    return model
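
As a rough illustration of the first custom layer, `GlobalAveragePooling2D` averages each feature map over its spatial dimensions. The NumPy sketch below mimics that on random data, assuming the base model outputs a 7x7x512 feature map, which is what VGG16 without its top produces for 224x224 inputs. The array `features` is a hypothetical stand-in for that output.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the base model output: (batch, height, width, channels)
features = rng.normal(size=(4, 7, 7, 512))

# Average over height and width, keeping one value per channel
pooled = features.mean(axis=(1, 2))

print(pooled.shape)  # (4, 512)
```

Under that assumption the final `Dense` layer sees 512 inputs per image, so it holds 512 * 2 + 2 = 1026 trainable weights.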

In [5]:

# Let's use a simple and well-known architecture - VGG16
from tensorflow.keras.applications.vgg16 import VGG16

# We'll specify it as a base model
# `include_top=False` means we don't want the top classification layer
# Specify the `input_shape` to match our image size
# Specify the `weights` accordingly
vgg_model = build_transfer_learning_model(
    base_model=VGG16(include_top=False, input_shape=(224, 224, 3), weights='imagenet')
)

# Train the model for 10 epochs
vgg_hist = vgg_model.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)

Out[5]:

Metal device set to: Apple M1 Pro
Epoch 1/10
313/313 [==============================] - 160s 510ms/step - loss: 0.3786 - accuracy: 0.8258 - val_loss: 0.3144 - val_accuracy: 0.8943
Epoch 2/10
313/313 [==============================] - 160s 510ms/step - loss: 0.2897 - accuracy: 0.8712 - val_loss: 0.1988 - val_accuracy: 0.9224
Epoch 3/10
313/313 [==============================] - 160s 510ms/step - loss: 0.2751 - accuracy: 0.8800 - val_loss: 0.1944 - val_accuracy: 0.9216
Epoch 4/10
313/313 [==============================] - 160s 510ms/step - loss: 0.2717 - accuracy: 0.8812 - val_loss: 0.1820 - val_accuracy: 0.9264
Epoch 5/10
313/313 [==============================] - 160s 511ms/step - loss: 0.2699 - accuracy: 0.8829 - val_loss: 0.1809 - val_accuracy: 0.9268
Epoch 6/10
313/313 [==============================] - 160s 511ms/step - loss: 0.2709 - accuracy: 0.8822 - val_loss: 0.1792 - val_accuracy: 0.9297
Epoch 7/10
313/313 [==============================] - 160s 511ms/step - loss: 0.2668 - accuracy: 0.8852 - val_loss: 0.1763 - val_accuracy: 0.9236
Epoch 8/10
313/313 [==============================] - 162s 516ms/step - loss: 0.2688 - accuracy: 0.8817 - val_loss: 0.1889 - val_accuracy: 0.9212
Epoch 9/10
313/313 [==============================] - 160s 511ms/step - loss: 0.2667 - accuracy: 0.8857 - val_loss: 0.1760 - val_accuracy: 0.9264
Epoch 10/10
313/313 [==============================] - 160s 511ms/step - loss: 0.2685 - accuracy: 0.8836 - val_loss: 0.1802 - val_accuracy: 0.9281

We got amazing accuracy right from the start!
With the custom architecture, we couldn't surpass 77% accuracy on the validation set; with the VGG16 model we're at about 93%
The beauty of transfer learning isn't only that it yields highly accurate models - you can also train models with less data, as the model doesn't have to learn as much
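
The per-epoch numbers in the log can also be summarized programmatically. The sketch below copies the validation accuracies from the output above into a plain list. In a live session the same values should be available as `vgg_hist.history['val_accuracy']`.

```python
# Validation accuracy per epoch, copied from the training log above
val_accuracy = [0.8943, 0.9224, 0.9216, 0.9264, 0.9268,
                0.9297, 0.9236, 0.9212, 0.9264, 0.9281]

# Index of the highest value, shifted by one because epochs are 1-based
best_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__) + 1
best_value = val_accuracy[best_epoch - 1]

print(f'Best epoch: {best_epoch}, val_accuracy: {best_value:.4f}')
# Best epoch: 6, val_accuracy: 0.9297
```

Validation accuracy plateaus after the second epoch, so most of the gain comes early.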

Transfer Learning on a 20 times smaller subset

We want to see if reducing the dataset size negatively affects the predictive power
To do so, we'll create a new directory structure for training and validation images:

In [6]:

import random
import pathlib
import shutil

random.seed(42)


dir_data = pathlib.Path.cwd().joinpath('data_small')
dir_train = dir_data.joinpath('train')
dir_valid = dir_data.joinpath('validation')

if not dir_data.exists(): dir_data.mkdir()
if not dir_train.exists(): dir_train.mkdir()
if not dir_valid.exists(): dir_valid.mkdir()

for cls in ['cat', 'dog']:
    if not dir_train.joinpath(cls).exists(): dir_train.joinpath(cls).mkdir()
    if not dir_valid.joinpath(cls).exists(): dir_valid.joinpath(cls).mkdir()
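
The same folder tree can be created more compactly with `mkdir(parents=True, exist_ok=True)`, which removes the need for the `exists()` checks. This is an alternative sketch rather than the notebook's own code, and it works in a temporary directory so that it does not touch real data.

```python
import pathlib
import tempfile

# Work in a throwaway directory so the sketch does not touch real data
root = pathlib.Path(tempfile.mkdtemp())

for split in ['train', 'validation']:
    for cls in ['cat', 'dog']:
        # parents=True creates missing parent folders,
        # exist_ok=True makes the call safe to repeat
        root.joinpath('data_small', split, cls).mkdir(parents=True, exist_ok=True)

# Collect every created folder as a path relative to the temporary root
created = sorted(p.relative_to(root).as_posix() for p in root.rglob('*') if p.is_dir())
print(created)
```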

Here's the directory structure printed:

In [9]:

!ls -R data_small | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'

Out[9]:

   |-train
   |---cat
   |---dog
   |-validation
   |---cat
   |---dog

Now, we'll copy only a sample of images to the new folders
We'll declare a copy_sample() function which takes n images from the src_folder and copies them to the tgt_folder
We'll set n to 500 by default, which is a pretty small number:

In [10]:

def copy_sample(src_folder: pathlib.Path, tgt_folder: pathlib.Path, n: int = 500):
    # Pick n random files from the source folder
    imgs = random.sample(list(src_folder.iterdir()), n)

    for img in imgs:
        # `img.name` is the file name without its parent path (works on any OS)
        img_name = img.name
        
        shutil.copy(
            src=img,
            dst=tgt_folder / img_name
        )
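
Note that `random.sample()` raises a `ValueError` when `n` exceeds the number of files in the source folder. A more defensive variant could look like the sketch below. The name `copy_sample_safe()` is hypothetical, and the demonstration uses empty placeholder files in temporary folders rather than real images.

```python
import pathlib
import random
import shutil
import tempfile


def copy_sample_safe(src_folder: pathlib.Path, tgt_folder: pathlib.Path, n: int = 500) -> int:
    # Only regular files are candidates; sorting keeps a seeded sample reproducible
    files = sorted(p for p in src_folder.iterdir() if p.is_file())
    # Never ask random.sample() for more items than exist
    imgs = random.sample(files, min(n, len(files)))

    for img in imgs:
        shutil.copy(src=img, dst=tgt_folder / img.name)

    # Report how many files were actually copied
    return len(imgs)


# Tiny demonstration with empty placeholder files in temporary folders
src = pathlib.Path(tempfile.mkdtemp())
tgt = pathlib.Path(tempfile.mkdtemp())
for i in range(5):
    src.joinpath(f'{i}.jpg').touch()

random.seed(42)
copied = copy_sample_safe(src, tgt, n=10)
print(copied)  # 5
```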

Let's now copy the training and validation images
For the validation set, we'll copy only 100 images per class:

In [11]:

# Train - cat
copy_sample(
    src_folder=pathlib.Path.cwd().joinpath('data/train/cat/'), 
    tgt_folder=pathlib.Path.cwd().joinpath('data_small/train/cat/'), 
)

# Train - dog
copy_sample(
    src_folder=pathlib.Path.cwd().joinpath('data/train/dog/'), 
    tgt_folder=pathlib.Path.cwd().joinpath('data_small/train/dog/'), 
)

# Valid - cat
copy_sample(
    src_folder=pathlib.Path.cwd().joinpath('data/validation/cat/'), 
    tgt_folder=pathlib.Path.cwd().joinpath('data_small/validation/cat/'),
    n=100
)

# Valid - dog
copy_sample(
    src_folder=pathlib.Path.cwd().joinpath('data/validation/dog/'), 
    tgt_folder=pathlib.Path.cwd().joinpath('data_small/validation/dog/'),
    n=100
)

Let's count the number of files in each folder to verify the images were copied successfully:

In [12]:

!ls data_small/train/cat/ | wc -l

Out[12]:

500

In [13]:

!ls data_small/validation/cat/ | wc -l

Out[13]:

100

In [14]:

!ls data_small/train/dog/ | wc -l

Out[14]:

500

In [15]:

!ls data_small/validation/dog/ | wc -l

Out[15]:

100
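
The four `ls | wc -l` cells depend on a Unix shell. A portable alternative is sketched below with a hypothetical helper, `count_files()`, demonstrated on a temporary tree of placeholder files.

```python
import pathlib
import tempfile


def count_files(root: pathlib.Path) -> dict:
    # Map each '<split>/<class>' folder to the number of files it contains
    counts = {}
    for folder in sorted(root.glob('*/*')):
        if folder.is_dir():
            key = folder.relative_to(root).as_posix()
            counts[key] = sum(1 for p in folder.iterdir() if p.is_file())
    return counts


# Demonstration on a temporary tree with placeholder files
demo_root = pathlib.Path(tempfile.mkdtemp())
for split, n in [('train', 3), ('validation', 2)]:
    folder = demo_root / split / 'cat'
    folder.mkdir(parents=True)
    for i in range(n):
        (folder / f'{i}.jpg').touch()

demo_counts = count_files(demo_root)
print(demo_counts)  # {'train/cat': 3, 'validation/cat': 2}
```

In the notebook, `count_files(pathlib.Path('data_small'))` should report 500 for each training folder and 100 for each validation folder, matching the outputs above.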

Now use init_data() to load in the images again:

In [6]:

train_data, valid_data = init_data(
    train_dir='data_small/train/', 
    valid_dir='data_small/validation/'
)

Out[6]:

Found 1000 images belonging to 2 classes.
Found 200 images belonging to 2 classes.

There's a total of 1000 training images
It will be interesting to see if we can get a decent model out of a dataset this small
The model architecture is the same, but we'll train for more epochs because the dataset is smaller
- Also, we can afford to train for longer since the training time per epoch is reduced:

In [8]:

vgg_model = build_transfer_learning_model(
    base_model=VGG16(include_top=False, input_shape=(224, 224, 3), weights='imagenet')
)

vgg_hist = vgg_model.fit(
    train_data,
    validation_data=valid_data,
    epochs=20
)

Out[8]:

Epoch 1/20
16/16 [==============================] - 9s 572ms/step - loss: 0.8472 - accuracy: 0.5740 - val_loss: 0.7049 - val_accuracy: 0.5100
Epoch 2/20
16/16 [==============================] - 9s 551ms/step - loss: 0.6389 - accuracy: 0.6840 - val_loss: 0.6876 - val_accuracy: 0.5150
Epoch 3/20
16/16 [==============================] - 9s 551ms/step - loss: 0.4936 - accuracy: 0.7800 - val_loss: 0.6461 - val_accuracy: 0.5300
Epoch 4/20
16/16 [==============================] - 9s 552ms/step - loss: 0.4318 - accuracy: 0.8020 - val_loss: 0.6082 - val_accuracy: 0.5850
Epoch 5/20
16/16 [==============================] - 9s 552ms/step - loss: 0.3935 - accuracy: 0.8270 - val_loss: 0.5831 - val_accuracy: 0.6450
Epoch 6/20
16/16 [==============================] - 9s 551ms/step - loss: 0.3945 - accuracy: 0.8100 - val_loss: 0.5638 - val_accuracy: 0.7000
Epoch 7/20
16/16 [==============================] - 9s 545ms/step - loss: 0.3444 - accuracy: 0.8300 - val_loss: 0.5374 - val_accuracy: 0.7350
Epoch 8/20
16/16 [==============================] - 9s 553ms/step - loss: 0.3490 - accuracy: 0.8510 - val_loss: 0.5064 - val_accuracy: 0.8100
Epoch 9/20
16/16 [==============================] - 9s 552ms/step - loss: 0.3523 - accuracy: 0.8330 - val_loss: 0.4810 - val_accuracy: 0.8500
Epoch 10/20
16/16 [==============================] - 9s 553ms/step - loss: 0.3317 - accuracy: 0.8610 - val_loss: 0.4618 - val_accuracy: 0.8650
Epoch 11/20
16/16 [==============================] - 9s 552ms/step - loss: 0.3084 - accuracy: 0.8740 - val_loss: 0.4410 - val_accuracy: 0.8800
Epoch 12/20
16/16 [==============================] - 9s 551ms/step - loss: 0.2890 - accuracy: 0.8740 - val_loss: 0.4182 - val_accuracy: 0.8850
Epoch 13/20
16/16 [==============================] - 9s 552ms/step - loss: 0.2823 - accuracy: 0.8780 - val_loss: 0.3945 - val_accuracy: 0.9200
Epoch 14/20
16/16 [==============================] - 9s 552ms/step - loss: 0.3029 - accuracy: 0.8610 - val_loss: 0.3769 - val_accuracy: 0.9100
Epoch 15/20
16/16 [==============================] - 9s 552ms/step - loss: 0.2998 - accuracy: 0.8590 - val_loss: 0.3614 - val_accuracy: 0.9150
Epoch 16/20
16/16 [==============================] - 9s 552ms/step - loss: 0.2905 - accuracy: 0.8790 - val_loss: 0.3403 - val_accuracy: 0.9300
Epoch 17/20
16/16 [==============================] - 9s 555ms/step - loss: 0.2736 - accuracy: 0.8740 - val_loss: 0.3255 - val_accuracy: 0.9400
Epoch 18/20
16/16 [==============================] - 9s 553ms/step - loss: 0.2956 - accuracy: 0.8780 - val_loss: 0.3126 - val_accuracy: 0.9200
Epoch 19/20
16/16 [==============================] - 9s 563ms/step - loss: 0.2556 - accuracy: 0.8920 - val_loss: 0.2992 - val_accuracy: 0.9150
Epoch 20/20
16/16 [==============================] - 9s 561ms/step - loss: 0.2718 - accuracy: 0.8820 - val_loss: 0.2887 - val_accuracy: 0.9150

It looks like we got roughly the same validation accuracy (about 92% versus 93%) as with the model trained on the full set of roughly 20K training images, which is amazing!
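
One caveat when comparing the two numbers: the small validation set has only 200 images, so each image moves the accuracy by 0.5 percentage points. The sketch below estimates the sampling noise with the binomial standard error. This is a textbook approximation added for illustration and is not part of the original notebook.

```python
import math


def accuracy_std_error(accuracy: float, n_images: int) -> float:
    # Binomial standard error of an accuracy estimated from n_images samples
    return math.sqrt(accuracy * (1 - accuracy) / n_images)


# Final-epoch validation accuracies and validation set sizes from the logs above
se_full = accuracy_std_error(0.9281, 2488)
se_small = accuracy_std_error(0.9150, 200)

print(f'full: {se_full:.4f}, small: {se_small:.4f}')
```

Under this approximation the small-set estimate is uncertain by about two percentage points, so the gap between the two models is within the noise.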

Homework:

Use both models to predict the entire test set directory
How do the accuracies compare?
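
As a starting point, a model with a softmax output returns one row of class probabilities per image. The sketch below turns such rows into an accuracy score. The probabilities and labels are made-up toy values, and `accuracy_from_probs()` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def accuracy_from_probs(probs: np.ndarray, labels: np.ndarray) -> float:
    # argmax picks the most probable class index for each row
    predicted = probs.argmax(axis=1)
    return float((predicted == labels).mean())


# Toy values only: 4 images, 2 classes
# (index 0 = cat, 1 = dog, following alphabetical folder order)
toy_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
toy_labels = np.array([0, 1, 1, 1])

print(accuracy_from_probs(toy_probs, toy_labels))  # 0.75
```

When predicting from a directory iterator, creating it with `shuffle=False` should keep the predictions aligned with the iterator's `classes` attribute.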
