
GitHub Repository: better-data-science/TensorFlow
Path: blob/main/013_CNN_006_Increasing_Model_Complexity.ipynb
Kernel: Python 3 (ipykernel)

CNN 6 - Do Larger Models Lead to Better Performance?

What you should know by now:

  • How to preprocess image data

  • How to load image data from a directory

  • What convolutional, pooling, and fully-connected layers are

  • Categorical vs. binary classification


  • First things first, let's import the libraries

  • The models we'll declare today will have more layers than the ones before

    • We'll import the individual classes we need from TensorFlow

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow info/warning logs

import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import BinaryAccuracy

tf.random.set_seed(42)  # for reproducibility

# Let GPU memory grow as needed instead of reserving it all upfront
physical_devices = tf.config.list_physical_devices('GPU')
try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except Exception:
    pass  # no GPU available, or memory growth is already configured
  • I'm using an Nvidia RTX 3060 Ti

physical_devices
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Load in the data

  • Use ImageDataGenerator to rescale image pixel values to the 0-1 range

  • Load the images from their directories and resize them to 224x224x3

  • Due to memory concerns, we'll lower the batch size:

train_datagen = ImageDataGenerator(rescale=1/255.0)
valid_datagen = ImageDataGenerator(rescale=1/255.0)

train_data = train_datagen.flow_from_directory(
    directory='data/train/',
    target_size=(224, 224),
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=42
)
valid_data = valid_datagen.flow_from_directory(
    directory='data/validation/',
    target_size=(224, 224),
    class_mode='categorical',
    batch_size=32,
    seed=42
)
Found 20030 images belonging to 2 classes.
Found 2478 images belonging to 2 classes.
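
  • As a quick sanity check (not part of the original notebook), we can pull one batch from the generator and inspect the shapes and the class mapping; the class names in the comment below are just an assumption about the dataset:

# Illustrative sanity check on the generators
images, labels = next(train_data)   # one batch from the training generator
print(images.shape)                 # (32, 224, 224, 3) - rescaled images
print(labels.shape)                 # (32, 2) - one-hot labels for the 2 classes
print(train_data.class_indices)     # e.g. {'cat': 0, 'dog': 1} (assumed class names)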

Model 1

  • Block 1: Conv, Conv, Pool

  • Block 2: Conv, Conv, Pool

  • Block 3: Flatten, Dense

  • Output


  • We won't mess with the hyperparameters today

model_1 = tf.keras.Sequential([
    # Block 1: Conv, Conv, Pool
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(224, 224, 3), activation='relu'),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 2: Conv, Conv, Pool
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 3: Flatten, Dense
    Flatten(),
    Dense(units=128, activation='relu'),
    # Output
    Dense(units=2, activation='softmax')
])

model_1.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=[BinaryAccuracy(name='accuracy')]
)

model_1_history = model_1.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)
Epoch 1/10
626/626 [==============================] - 40s 61ms/step - loss: 0.6586 - accuracy: 0.6149 - val_loss: 0.6115 - val_accuracy: 0.6804
Epoch 2/10
626/626 [==============================] - 37s 59ms/step - loss: 0.5223 - accuracy: 0.7422 - val_loss: 0.5265 - val_accuracy: 0.7554
Epoch 3/10
626/626 [==============================] - 38s 60ms/step - loss: 0.4073 - accuracy: 0.8125 - val_loss: 0.5061 - val_accuracy: 0.7571
Epoch 4/10
626/626 [==============================] - 38s 61ms/step - loss: 0.2476 - accuracy: 0.8942 - val_loss: 0.6336 - val_accuracy: 0.7672
Epoch 5/10
626/626 [==============================] - 38s 61ms/step - loss: 0.1004 - accuracy: 0.9625 - val_loss: 1.0141 - val_accuracy: 0.7571
Epoch 6/10
626/626 [==============================] - 39s 62ms/step - loss: 0.0419 - accuracy: 0.9863 - val_loss: 1.3990 - val_accuracy: 0.7700
Epoch 7/10
626/626 [==============================] - 38s 61ms/step - loss: 0.0352 - accuracy: 0.9894 - val_loss: 1.2963 - val_accuracy: 0.7680
Epoch 8/10
626/626 [==============================] - 39s 62ms/step - loss: 0.0263 - accuracy: 0.9932 - val_loss: 1.4017 - val_accuracy: 0.7684
Epoch 9/10
626/626 [==============================] - 38s 61ms/step - loss: 0.0263 - accuracy: 0.9940 - val_loss: 1.3149 - val_accuracy: 0.7780
Epoch 10/10
626/626 [==============================] - 38s 61ms/step - loss: 0.0237 - accuracy: 0.9940 - val_loss: 1.6602 - val_accuracy: 0.7482
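
  • The training log above already hints at heavy overfitting: training accuracy climbs to ~99% while validation accuracy stalls around 75%

  • Here's a minimal sketch of how you could visualize that from model_1_history, assuming matplotlib is installed:

import matplotlib.pyplot as plt

# Plot training vs. validation accuracy from the History object returned by fit()
hist = model_1_history.history
epochs = range(1, len(hist['accuracy']) + 1)

plt.plot(epochs, hist['accuracy'], label='Training accuracy')
plt.plot(epochs, hist['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()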

  • Not bad, but we already got 75% accuracy on the validation set in notebook 010

  • Will adding complexity to the model increase the accuracy?

Model 2

  • Block 1: Conv, Conv, Pool

  • Block 2: Conv, Conv, Pool

  • Block 3: Conv, Conv, Pool

  • Block 4: Flatten, Dense

  • Output


  • This architecture is a bit of overkill for our dataset

  • The model isn't learning at all:

model_2 = Sequential([
    # Block 1: Conv, Conv, Pool
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(224, 224, 3), activation='relu'),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 2: Conv, Conv, Pool
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 3: Conv, Conv, Pool
    Conv2D(filters=128, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=128, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 4: Flatten, Dense
    Flatten(),
    Dense(units=128, activation='relu'),
    # Output
    Dense(units=2, activation='softmax')
])

model_2.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=[BinaryAccuracy(name='accuracy')]
)

model_2_history = model_2.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)
Epoch 1/10
626/626 [==============================] - 39s 62ms/step - loss: 0.7040 - accuracy: 0.4955 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 2/10
626/626 [==============================] - 39s 62ms/step - loss: 0.6932 - accuracy: 0.4959 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/10
626/626 [==============================] - 39s 62ms/step - loss: 0.6932 - accuracy: 0.4987 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 4/10
626/626 [==============================] - 39s 62ms/step - loss: 0.6932 - accuracy: 0.4993 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 5/10
626/626 [==============================] - 39s 62ms/step - loss: 0.6932 - accuracy: 0.5006 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 6/10
626/626 [==============================] - 40s 64ms/step - loss: 0.6932 - accuracy: 0.4924 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 7/10
626/626 [==============================] - 40s 64ms/step - loss: 0.6932 - accuracy: 0.5020 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 8/10
626/626 [==============================] - 40s 63ms/step - loss: 0.6932 - accuracy: 0.5023 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 9/10
626/626 [==============================] - 40s 64ms/step - loss: 0.6932 - accuracy: 0.5003 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 10/10
626/626 [==============================] - 40s 64ms/step - loss: 0.6932 - accuracy: 0.5034 - val_loss: 0.6932 - val_accuracy: 0.5000

  • When that happens, you can try experimenting with the learning rate and other parameters (see the sketch after this list)

  • Let's dial the complexity down a bit next
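
  • Here's a minimal sketch of what retrying with a lower learning rate could look like (the 0.0001 value is just an illustrative choice, not something this notebook actually runs):

# Recompile model_2 with a smaller learning rate before calling fit() again (illustrative only)
model_2.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(learning_rate=0.0001),   # Adam's default is 0.001
    metrics=[BinaryAccuracy(name='accuracy')]
)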


Model 3

  • Block 1: Conv, Conv, Pool

  • Block 2: Conv, Conv, Pool

  • Block 3: Flatten, Dense, Dropout, Dense

  • Output


  • The first model was better than the second

  • We can try adding a dropout layer as a regularizer and tweaking the fully-connected layers:

model_3 = tf.keras.Sequential([
    # Block 1: Conv, Conv, Pool
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(224, 224, 3), activation='relu'),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 2: Conv, Conv, Pool
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    # Block 3: Flatten, Dense, Dropout, Dense
    Flatten(),
    Dense(units=512, activation='relu'),
    Dropout(rate=0.3),
    Dense(units=128),
    # Output
    Dense(units=2, activation='softmax')
])

model_3.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=[BinaryAccuracy(name='accuracy')]
)

model_3_history = model_3.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)
Epoch 1/10
626/626 [==============================] - 39s 62ms/step - loss: 0.7498 - accuracy: 0.5622 - val_loss: 0.6580 - val_accuracy: 0.6295
Epoch 2/10
626/626 [==============================] - 39s 62ms/step - loss: 0.6101 - accuracy: 0.6744 - val_loss: 0.5645 - val_accuracy: 0.7159
Epoch 3/10
626/626 [==============================] - 39s 62ms/step - loss: 0.5007 - accuracy: 0.7562 - val_loss: 0.5734 - val_accuracy: 0.7070
Epoch 4/10
626/626 [==============================] - 39s 63ms/step - loss: 0.3297 - accuracy: 0.8585 - val_loss: 0.7222 - val_accuracy: 0.7038
Epoch 5/10
626/626 [==============================] - 40s 64ms/step - loss: 0.1246 - accuracy: 0.9556 - val_loss: 1.1581 - val_accuracy: 0.6965
Epoch 6/10
626/626 [==============================] - 39s 63ms/step - loss: 0.0786 - accuracy: 0.9786 - val_loss: 0.8357 - val_accuracy: 0.6832
Epoch 7/10
626/626 [==============================] - 40s 64ms/step - loss: 0.0425 - accuracy: 0.9877 - val_loss: 1.3557 - val_accuracy: 0.7006
Epoch 8/10
626/626 [==============================] - 40s 64ms/step - loss: 0.0277 - accuracy: 0.9934 - val_loss: 2.0383 - val_accuracy: 0.6780
Epoch 9/10
626/626 [==============================] - 40s 64ms/step - loss: 0.0334 - accuracy: 0.9926 - val_loss: 1.0312 - val_accuracy: 0.6913
Epoch 10/10
626/626 [==============================] - 40s 64ms/step - loss: 0.0298 - accuracy: 0.9925 - val_loss: 1.5798 - val_accuracy: 0.6985

  • It made the model worse

  • More complex models don't necessarily lead to an increase in performance
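
  • To back that up, here's a quick comparison of the best validation accuracy each model reached, pulled from the three History objects above:

# Best validation accuracy per model, taken from the History objects
for name, history in [('Model 1', model_1_history),
                      ('Model 2', model_2_history),
                      ('Model 3', model_3_history)]:
    best_val_acc = max(history.history['val_accuracy'])
    print(f'{name}: best validation accuracy = {best_val_acc:.4f}')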


Conclusion

  • There you have it - we've been focusing on the wrong thing from the start

  • Our model architecture in notebook 010 was solid

    • Adding more layers and complexity only decreased the predictive power here

  • We should shift our focus to improving the dataset quality

  • The following notebook will teach you all about data augmentation, and you'll see how it increases the power of our model (there's a tiny preview at the end of this notebook)

  • After that you'll take your models to new heights with transfer learning, and you'll see why coming up with custom architectures is a waste of time in most cases
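
  • As a small teaser, augmentation boils down to passing random transformation parameters to ImageDataGenerator; the values below are placeholders, not the settings the next notebook will use:

# Illustrative augmentation setup - the parameter values here are placeholders
train_datagen_aug = ImageDataGenerator(
    rescale=1/255.0,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)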