Path: blob/main/011_CNN_004_Convolutions_From_Scratch.ipynb
CNN 4 - Convolutions from scratch
Dataset:
The dataset isn't deep-learning-compatible by default; here's how to preprocess it:
Today we'll implement convolutions from scratch in pure NumPy
A convolution boils down to repetitive element-wise matrix multiplication and summation, which should be easy to implement
Let's declare two functions for plotting images
The first one plots a single image
The second one plots two images side by side (1 row, 2 columns):
And now let's load in the image
We'll apply grayscaling and resizing to 224x224
Without grayscaling you'd have to apply convolution to each of the three color channels individually
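The loading code isn't reproduced here, so this is only a rough sketch of the grayscaling step. It assumes the image is already a 224x224x3 RGB array (a random stand-in below, not the notebook's image) and uses the common ITU-R 601 luma weights, which may differ from what the original cell did:

```python
import numpy as np

# Stand-in for a loaded RGB image (assumption: the notebook loads a real file instead)
rng = np.random.default_rng(seed=42)
rgb = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float64)

# Grayscale as a weighted sum of the R, G, and B channels (ITU-R 601 luma weights)
luma_weights = np.array([0.299, 0.587, 0.114])
gray = rgb @ luma_weights

print(rgb.shape)   # three color channels
print(gray.shape)  # a single 2D matrix, which is all a 2D convolution needs
```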
Declare filters for convolutions
The task of a convolutional layer is to find N filters (kernels) that best extract features from the dataset
Did you know there are known filters for doing various image operations?
We'll declare ones for sharpening, blurring, and outlining
Explore the rest here: https://setosa.io/ev/image-kernels/
These are just 3x3 matrices:
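A minimal sketch of the three filters as NumPy arrays. The values follow the sharpen, blur, and outline kernels as commonly listed on the page linked above; the variable names are placeholders and the notebook's exact values may differ:

```python
import numpy as np

# Sharpen: boosts the center pixel and subtracts its four direct neighbors
sharpen = np.array([
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
])

# Blur: a weighted average of the 3x3 neighborhood (weights sum to 1)
blur = np.array([
    [0.0625, 0.125, 0.0625],
    [0.125, 0.25, 0.125],
    [0.0625, 0.125, 0.0625],
])

# Outline: responds to differences between a pixel and its neighbors (weights sum to 0)
outline = np.array([
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
])

print(sharpen.shape, blur.shape, outline.shape)
```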
Implement convolution from scratch
We'll declare a helper function to make our lives easier
It will calculate the target image size
Sliding a 3x3 filter over an image means we'll lose a single pixel on every side
You can address this with padding, but more on that later
For example, sliding a 3x3 filter over a 224x224 image results in a 222x222 image
Sliding a 5x5 filter over a 224x224 image results in a 220x220 image
Let's write the function:
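One possible version of the helper, assuming a square image, a square kernel, and a stride of 1. The name calculate_target_size is a placeholder, and the notebook's own function may be written differently:

```python
def calculate_target_size(img_size: int, kernel_size: int) -> int:
    """Side length of the output when a kernel slides over a square image with stride 1."""
    # The kernel fits in (img_size - kernel_size + 1) positions along each axis
    return img_size - kernel_size + 1


print(calculate_target_size(img_size=224, kernel_size=3))  # 222
print(calculate_target_size(img_size=224, kernel_size=5))  # 220
```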
Works as advertised:
Here's what convolution boils down to:
Let's extract the first 3x3 matrix from our image:
Do an element-wise multiplication between the image and the filter:
Sum the elements in the matrix:
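The three steps above can be sketched on a tiny stand-in image. A 5x5 ramp of the numbers 0 to 24 is used here instead of the notebook's photo, so the arithmetic is easy to check by hand; the sharpening kernel values are an assumption:

```python
import numpy as np

# Tiny stand-in image: values 0..24 laid out in a 5x5 grid
img = np.arange(25, dtype=np.float64).reshape(5, 5)

# Sharpening kernel (assumed values)
sharpen = np.array([
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
])

# Step 1: extract the first 3x3 matrix (top-left corner)
patch = img[0:3, 0:3]

# Step 2: element-wise multiplication between the patch and the filter
product = patch * sharpen

# Step 3: sum the elements to get a single output pixel
value = product.sum()

print(patch)
print(product)
print(value)  # 6.0
```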
And that's it!
We can now apply this logic to the entire image
The trickiest part is keeping track of the current N x N matrix
You need to iterate over all rows and all columns in the image, and then subset the image from there and apply the convolution:
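A minimal sketch of that loop, assuming a 2D grayscale image, a square kernel, and a stride of 1. The function name convolve is a placeholder, and a random matrix stands in for the notebook's image:

```python
import numpy as np


def convolve(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a square kernel over a 2D image (stride 1, no padding)."""
    k = kernel.shape[0]
    out_rows = img.shape[0] - k + 1
    out_cols = img.shape[1] - k + 1
    convolved = np.zeros((out_rows, out_cols))

    for i in range(out_rows):
        for j in range(out_cols):
            # Current k x k window with its top-left corner at (i, j)
            patch = img[i:i + k, j:j + k]
            # Element-wise multiplication followed by summation
            convolved[i, j] = np.sum(patch * kernel)

    return convolved


# Stand-in for the 224x224 grayscale image
rng = np.random.default_rng(seed=0)
img = rng.integers(0, 256, size=(224, 224)).astype(np.float64)

sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
img_sharpened = convolve(img, sharpen)
print(img_sharpened.shape)  # (222, 222)
```

The double Python loop is slow compared to vectorized code, but it keeps the window bookkeeping visible, which is the point here.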
Let's test it
Sharpening filter first:
Here's what the image looks like in matrix format:
Let's visualize it:
The colors are a bit off since values in the matrix don't range between 0 and 255
It's not a problem, but we can "fix" it by replacing all negative values with zeros:
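A small sketch of that fix, assuming the convolved image is a NumPy array. The helper name negative_to_zero is a placeholder:

```python
import numpy as np


def negative_to_zero(img: np.ndarray) -> np.ndarray:
    """Return a copy of the image with every negative value replaced by 0."""
    img = img.copy()  # leave the caller's array untouched
    img[img < 0] = 0
    return img


# Small example matrix with a few negative values
convolved = np.array([
    [-12.0, 40.0, 255.0],
    [3.0, -0.5, 310.0],
])
print(negative_to_zero(convolved))
```

Values above 255 are left alone here; np.clip(img, 0, 255) would cap both ends if that's what you want.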
And plot it again:
You can see that the image definitely looks sharper, no arguing there
Let's blur the image next:
The blurring filter matrix doesn't have negative values, so the coloring stays the same as in the original image
You can clearly see how the image was blurred
Finally, let's apply the outline:
It suffers from the same coloring problem:
Amazing!
All convolved images are of shape 222x222
What if you want to keep the original size of 224x224?
That's where padding comes into play
Implement convolutions with padding from scratch
TensorFlow's Conv2D layer lets you specify either "valid" or "same" for the padding parameter
The first one is default, which means no padding is added to the images (what we implemented above)
The second one will add padding depending on the kernel size, so the source and convolved images are of the same shape
Padding is essentially just a "black" border around the image
It's black because typically zeros are added, and zeros represent the color black
The black border adds nothing to the sums, since it's all zeros, and a convolution operation multiplies elements of an image with the elements of a filter. Anything multiplied by zero is zero
First, let's declare a helper function that calculates how "thick" of a border we need to add to the image
The bigger the kernel size, the thicker the border
All sides of the image will have the exact same border
It's just an integer division:
For example, a 3x3 kernel means 3 // 2, which is 1
Add 1 pixel to each side:
5 // 2 = 2:
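As a sketch, the helper can be a one-liner. The name get_padding_width_per_side is a placeholder, and an odd kernel size is assumed:

```python
def get_padding_width_per_side(kernel_size: int) -> int:
    """Border thickness needed on each side so the output keeps the input size."""
    # Integer division: 3 -> 1, 5 -> 2, 7 -> 3, ...
    return kernel_size // 2


print(get_padding_width_per_side(3))  # 1
print(get_padding_width_per_side(5))  # 2
```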
Let's declare yet another helper function
Its task is to add padding to the image
First, the function declares a matrix of zeros with a shape of (image.shape + padding * 2)
We multiply the padding with 2 because we need it on all sides
Then we index the matrix so the padding is ignored and change the zeros with the actual image values:
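A minimal sketch of such a function, assuming a 2D grayscale image. The name add_padding_to_image is a placeholder, and a random matrix stands in for the real image:

```python
import numpy as np


def add_padding_to_image(img: np.ndarray, padding_width: int) -> np.ndarray:
    """Surround a 2D image with a border of zeros, padding_width pixels thick."""
    rows, cols = img.shape
    # Matrix of zeros, enlarged by the padding on both sides of each axis
    padded = np.zeros((rows + padding_width * 2, cols + padding_width * 2))
    # Write the image into the middle, leaving the zero border untouched
    padded[padding_width:padding_width + rows, padding_width:padding_width + cols] = img
    return padded


# Stand-in for the 224x224 grayscale image
rng = np.random.default_rng(seed=0)
img = rng.integers(1, 256, size=(224, 224)).astype(np.float64)

img_padded_3x3 = add_padding_to_image(img, padding_width=1)
img_padded_5x5 = add_padding_to_image(img, padding_width=2)
print(img_padded_3x3.shape)  # (226, 226)
print(img_padded_5x5.shape)  # (228, 228)
```

NumPy's built-in np.pad(img, padding_width) produces the same zero border; the manual version just makes the indexing explicit.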
Let's test it by adding padding to the image for a 3x3 filter:
It adds a 1-pixel-wide border to the image and makes it 226x226 in size
Here's what the matrix looks like:
You can see the original image surrounded with zeros - that's just what we wanted
Let's see if the same is true for the 5x5 kernel:
You can now visually see the black border, but still let's verify it's there:
Everything looks good
Let's apply a convolution operation to our 226x226 image (1 pixel-wide border):
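Putting the pieces together as a self-contained sketch. The helper names and the random stand-in image are assumptions, and the convolution is the same loop-based version sketched earlier:

```python
import numpy as np


def convolve(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a square kernel over a 2D image (stride 1, no padding)."""
    k = kernel.shape[0]
    out_rows = img.shape[0] - k + 1
    out_cols = img.shape[1] - k + 1
    convolved = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            convolved[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return convolved


def add_padding_to_image(img: np.ndarray, padding_width: int) -> np.ndarray:
    """Surround a 2D image with a zero border, padding_width pixels thick."""
    rows, cols = img.shape
    padded = np.zeros((rows + padding_width * 2, cols + padding_width * 2))
    padded[padding_width:padding_width + rows, padding_width:padding_width + cols] = img
    return padded


# Stand-in for the 224x224 grayscale image
rng = np.random.default_rng(seed=0)
img = rng.integers(0, 256, size=(224, 224)).astype(np.float64)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])

# Pad by kernel_size // 2 on each side, then convolve
pad_width = sharpen.shape[0] // 2
img_padded = add_padding_to_image(img, pad_width)
img_sharpened = convolve(img_padded, sharpen)

print(img_padded.shape)     # (226, 226)
print(img_sharpened.shape)  # (224, 224)
```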
The result is a 224x224 image, which is the same size as the original one!
Let's plot them side by side to verify:
And that's how convolutions and padding work
TensorFlow's Conv2D layer is here to find the optimal filter matrices, but once it does, this is essentially what happens.
The next notebook will cover pooling from scratch, so stay tuned.