CoCalc provides the best real-time collaborative environment for Jupyter Notebooks, LaTeX documents, and SageMath, scalable from individual users to large groups and classes!
Path: blob/main/008_CNN_001_Working_With_Image_Data.ipynb
CNN 1 - Working with Image data
Cat Dog
Two folders, one per class (cat and dog); the data isn't split into subsets yet
Set these variables to match your own setup
We'll later split the dataset into training, validation, and test sets with an 80:10:10 ratio
Directory structure
We want a single root folder that contains dedicated subfolders for training, validation, and test images
Each of these subfolders will contain two folders: one for cats and one for dogs
We'll declare a function which creates the directory structure:
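The following is a minimal sketch of such a function. It assumes the root folder holds train, validation, and test subfolders, each containing cat and dog folders. The function name setup_folder_structure and these folder names are hypothetical and may differ from the notebook's own code.

```python
import os

# Hypothetical subset and class folder names; adjust them to your own setup.
SPLITS = ("train", "validation", "test")
CLASSES = ("cat", "dog")


def setup_folder_structure(root_dir: str) -> None:
    """Create root_dir/<split>/<class> for every subset and class."""
    for split in SPLITS:
        for cls in CLASSES:
            # exist_ok=True lets the function run again without raising an error
            os.makedirs(os.path.join(root_dir, split, cls), exist_ok=True)
```

For example, calling setup_folder_structure("data") would create data/train/cat, data/train/dog, data/validation/cat, and so on. Using os.makedirs with exist_ok=True keeps the function idempotent, which helps when a notebook cell is re-run.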
Train/Test/Validation split
It's recommended to split the data into three subsets when training image models
Training set - The largest subset, on which the model trains
Validation set - A separate subset used to evaluate the model during training
Test set - Used for a final evaluation on images the model has never seen
We'll iterate over every image in the cat and dog folders and generate a random number between 0 and 1 for each one
If the random number is 0.80 or below, the image will belong to the training set
If the random number is above 0.80 and at most 0.90, the image will belong to the validation set
Otherwise, the image will belong to the test set
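The rule above fits in a small function. This is a sketch: the name assign_split and the subset labels are hypothetical. The short simulation tallies 10,000 draws from a seeded generator so you can check that the counts land near the 8,000/1,000/1,000 an 80:10:10 ratio implies.

```python
import random


def assign_split(r: float) -> str:
    """Map a random number in [0, 1) to a subset name using the 80:10:10 rule."""
    if r <= 0.80:
        return "train"
    elif r <= 0.90:
        return "validation"
    return "test"


# Simulate 10,000 hypothetical images and count how many land in each subset.
rng = random.Random(42)  # fixed seed so the run is reproducible
counts = {"train": 0, "validation": 0, "test": 0}
for _ in range(10_000):
    counts[assign_split(rng.random())] += 1
print(counts)
```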
You can use the shutil module to copy the images
Because each image is assigned at random, the split won't be exactly 80:10:10, but it will be close enough
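Here is a sketch that combines the random assignment rule with shutil.copy. It assumes a flat source folder per class and the same hypothetical train/validation/test layout as before. The function name split_and_copy and its seed parameter are illustrative, not the notebook's own API.

```python
import os
import random
import shutil
from typing import Dict, Optional


def split_and_copy(source_dir: str, target_root: str, cls: str,
                   seed: Optional[int] = None) -> Dict[str, int]:
    """Copy every file in source_dir into target_root/<split>/<cls>.

    Each file gets its own random number: <= 0.80 -> train,
    <= 0.90 -> validation, otherwise test. Returns per-subset counts.
    """
    rng = random.Random(seed)
    counts = {"train": 0, "validation": 0, "test": 0}
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and other non-file entries
        r = rng.random()
        split = "train" if r <= 0.80 else "validation" if r <= 0.90 else "test"
        dst_dir = os.path.join(target_root, split, cls)
        os.makedirs(dst_dir, exist_ok=True)  # in case the folders don't exist yet
        shutil.copy(src, os.path.join(dst_dir, name))  # original file stays in place
        counts[split] += 1
    return counts
```

You would call it once per class, for example split_and_copy("path/to/cats", "data", "cat"). Copying rather than moving leaves the original dataset untouched, so you can re-run the split with a different seed.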
Visualizing images
Always visualize the dataset when working with images
This function plots a random subset of 10 images from a given directory
The images are displayed in a grid of 2 rows and 5 columns:
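Below is a sketch of such a plotting function using matplotlib. The name plot_random_sample and its n_rows, n_cols, and seed parameters are hypothetical, and the notebook may load images differently (for example with Pillow directly). matplotlib.image.imread relies on Pillow for JPEG files; Pillow is installed as a matplotlib dependency in recent versions.

```python
import os
import random

import matplotlib.image as mpimg
import matplotlib.pyplot as plt


def plot_random_sample(img_dir: str, n_rows: int = 2, n_cols: int = 5, seed=None):
    """Plot up to n_rows * n_cols randomly chosen images from img_dir in a grid."""
    files = [f for f in os.listdir(img_dir) if os.path.isfile(os.path.join(img_dir, f))]
    k = min(n_rows * n_cols, len(files))
    sample = random.Random(seed).sample(files, k)
    # squeeze=False always returns a 2D array of axes, even for a 1x1 grid
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(3 * n_cols, 3 * n_rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, including on any unused cells
    for ax, name in zip(axes.flat, sample):
        ax.imshow(mpimg.imread(os.path.join(img_dir, name)))
        ax.set_title(name, fontsize=8)
    fig.tight_layout()
    plt.show()
    return fig
```

For example, plot_random_sample("data/train/cat") would show 10 random training cat images. Titling each cell with its file name makes it easy to track down corrupted or mislabeled images.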
What's next?
We'll explore what image data actually is and how to work with it