1. Introduction 📑

This notebook is just me being frustrated with deep learning and trying to understand, in "baby steps", what is going on. For somebody starting in this area with no background whatsoever it can be very confusing, especially because code with thorough explanations and comments is hard to find.

So, if you are frustrated just like I was when I started, I hope the following guidelines will help you. I am by no means a teacher, but in this notebook I will:

  1. Share articles/videos I watched that TRULY helped
  2. Explain code along the way to the best of my ability

Note: Deep learning code is structured very differently from the usual sklearn workflow for machine learning. In addition, it usually works with images and text, while classical ML usually works with tabular data. So please be patient with yourself: if you don't understand something right away, continue reading and coding and it will all make sense in the end.

2. Before we start ✋

This is my third notebook in the "series": How I taught myself Deep Learning. Here is what the previous two notebooks covered:

  1. How I taught myself Deep Learning: Vanilla NNs
     * PyTorch and Tensors
     * Neural Network Basics, Perceptrons and a Plain Vanilla Neural Net model
     * MNIST Classification using FNN
     * Activation Functions
     * Forward Pass
     * Backpropagation (Loss and Optimizer Functions)
     * Batching, Iterations and Epochs
     * Computing Classification Accuracy
     * Overfitting: Data Augmentation, Weight Decay, Learning Rate, Dropout() and Layer Optimization   
  2. Recurrent Neural Networks and LSTMs Explained
     * 1 Layer RNNs
     * Multiple Neurons RNN
     * Vanilla RNN for MNIST Classification
     * Multilayer RNNs
     * Tanh Activation Function
     * Multilayer RNN for MNIST
     * LSTMs and Vanishing Gradient Problem
     * Bidirectional LSTMs
     * LSTM for MNIST Classification        

3. Convolutional Neural Networks 🏕🏞🛤🏜🏖🏝🏔

Pro Tip: Use this tool to create your own convolutional neural nets.

3.1 Why FNNs might not be the best approach 🤔

For image classification, Feed Forward Neural Nets are not the best approach for a number of reasons:

  1. They do not take the 2D geometry of the image into account, so there is no notion of proximity. The human eye detects features in an image locally: it looks at portions of a picture and recognizes patterns there, whereas FNNs don't. Instead, 1 neuron in the hidden layer connects to ALL pixels of the image, NO matter their position in it.

  2. FNNs require many connections, and therefore many weights (parameters) to compute. For example, a 100x100 pixel image => 10,000 neurons in the first layer. If the second layer has 500 neurons, we end up with 10,000x500 = 5,000,000 weights. Hence, there is a large computational cost (see the sketch after this list).

  3. FNNs are prone to overfitting: with so many parameters, they tend to memorize specific features instead of learning to generalize.
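
To make point 2 concrete, here is a minimal sketch (in PyTorch, which this notebook uses throughout) that counts the parameters of such a fully connected layer:

```python
import torch.nn as nn

# The arithmetic from point 2: a 100x100 image flattened into 10,000 inputs,
# fully connected to a second layer of 500 neurons.
layer = nn.Linear(in_features=100 * 100, out_features=500)

print(f"weights: {layer.weight.numel():,}")  # weights: 5,000,000
print(f"biases:  {layer.bias.numel():,}")    # biases:  500
```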

3.2 YouTube Videos to save you time: 🎥

Watch these two YouTube videos to get a better understanding of CNNs.

3.3 Convolutions

Convolutions solve the issues encountered by FNNs: convolutional layers have multiple filters, composed of weights, that learn patterns in the data by sliding over the entire image.

Filters (like neurons in the human visual cortex) in the early layers detect edges and lines; as the layers go deeper, they start detecting shapes, patterns and even faces/objects.

3.4 Computing a convolutional kernel 🧾🖊

Convolutional Filter == Kernel == Filter (a convolutional layer applies one or more of these filters)

Feature Map == Activation Map
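
To make the terminology concrete, here is a tiny worked example (a sketch with made-up numbers): the kernel slides over the image and, at each position, the output value is the sum of the element-wise product between the kernel and the patch it sits on.

```python
import torch
import torch.nn.functional as F

# A toy 4x4 "image" and a 3x3 kernel (a classic vertical edge detector).
image = torch.tensor([[1., 2., 0., 1.],
                      [0., 1., 3., 1.],
                      [2., 1., 0., 0.],
                      [1., 0., 1., 2.]]).reshape(1, 1, 4, 4)  # (batch, channels, H, W)

kernel = torch.tensor([[1., 0., -1.],
                       [1., 0., -1.],
                       [1., 0., -1.]]).reshape(1, 1, 3, 3)

# No padding, stride 1: the 3x3 kernel fits in 2x2 positions -> 2x2 feature map.
out = F.conv2d(image, kernel)
print(out)  # tensor([[[[ 0.,  2.], [-1., -1.]]]])
```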

4. Understanding Convolutions 🧐

Let's create some convolutions on an image sample. We'll also introduce the notions of kernel size, padding and stride.

4.1 Imports and Data Preparation 📥

Libraries 📚
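
The original import cell isn't reproduced here, so the following is an assumption of the usual stack for this kind of notebook:

```python
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
```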

Seed 🌱
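
A minimal sketch of the seeding step (the value 42 is an arbitrary choice):

```python
import numpy as np
import torch

# Fix the random seeds so the results are reproducible across runs.
torch.manual_seed(42)
np.random.seed(42)
```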

4.2 Create the Convolutions: 📚

To create Convolutions you need to have: an input image, a kernel size, a padding value and a stride.

Example: original image (6x6 pixels) | kernel_size (3x3) | padding (1) | stride (2)

Let's first visualize the way this convolution works:
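
Here is that toy setup as a minimal PyTorch sketch (with random pixel values, since the toy image itself isn't reproduced here):

```python
import torch
import torch.nn as nn

# The setup from the example above: 6x6 image, 3x3 kernel, padding 1, stride 2.
conv = nn.Conv2d(in_channels=1, out_channels=1,
                 kernel_size=3, padding=1, stride=2)

image = torch.rand(1, 1, 6, 6)  # (batch, channels, height, width)
print(conv(image).shape)        # torch.Size([1, 1, 3, 3])
```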

Note: You can see that the Convolution slightly alters the size of the image (from 320x320 to 316x316). To compute the new image shape after each convolution, use the following formula:

Output: [(W - K + 2P)/S + 1] x [(W - K + 2P)/S + 1]

where W is the input width/height, K the kernel size, P the padding and S the stride (the division is floored).
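
As a sketch, here is the formula as a helper function; the K=5, P=0, S=1 values below are an assumption that reproduces the 320 -> 316 change noted above:

```python
def conv_output_size(W, K, P, S):
    """[(W - K + 2P)/S + 1], applied to each spatial dimension."""
    return (W - K + 2 * P) // S + 1

print(conv_output_size(W=320, K=5, P=0, S=1))  # 316
print(conv_output_size(W=6, K=3, P=1, S=2))    # 3 (the toy example above)
```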

4.3 Parameters (weights) of the Convolutional Layer: ⚖

FNN: the trainable parameters include the network weights and biases (one weight for each connection, one bias for each output unit)

CNN: the trainable parameters include the convolutional kernels (filters) and also a set of biases. There is one bias for each output channel. Each bias is added to every element in that output channel.
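
A quick sketch (with arbitrary channel and kernel sizes) to see those parameter shapes in PyTorch:

```python
import torch.nn as nn

# One convolutional layer: 1 input channel, 10 filters, each 5x5.
conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)

print(conv.weight.shape)  # torch.Size([10, 1, 5, 5]) -> the 10 kernels
print(conv.bias.shape)    # torch.Size([10])          -> one bias per output channel
```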

4.4 Visualize Convolutions 🔎

4.5 Another Example: Increasing Padding and Stride:

Let's visualize what has happened:

New activation map size (remember, activation maps are the result of filters applied to the image or to another activation map): ((316 - 10 + 2*2) / 2) + 1 = 156
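
The same arithmetic verified in code (a single-channel sketch; the real activation map may have more channels):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1,
                 kernel_size=10, padding=2, stride=2)
print(conv(torch.rand(1, 1, 316, 316)).shape)  # torch.Size([1, 1, 156, 156])
```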

5. AlexNet 🎇

AlexNet is a very popular CNN architecture that is capable of achieving high accuracy in classifying 1,000 different classes (animals, breeds, objects etc.). However, removing any layer or changing any of its parameters could drastically degrade its performance.

It is composed of a features block and a classifier block:
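
If you want to inspect the two blocks yourself, torchvision ships an AlexNet implementation whose submodules carry exactly those names:

```python
from torchvision import models

model = models.alexnet()   # random weights; enough to inspect the structure
print(model.features)      # the convolutional part
print(model.classifier)    # the fully connected part
```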

For more information head here

6. MNIST Classification using CNNs 🔢

Finally, let's put everything into practice.

6.1 CNN_MNISTClassifier neural network

The Architecture will contain 2 main parts: a convolutional features block and a fully connected classifier (sketched in the code after the notes below).

MaxPool() - Data is spatially autocorrelated: if a given pixel is green, an adjacent pixel is more likely to be a different tone of green than bright pink. So, to reduce the computational load, after each Convolution we can call MaxPool2d() to halve the activation map size. This method also introduces a degree of spatial invariance.

ReLU() - It simply takes all the negative numbers in the activation map and turns them into 0.

So, a natural sequence during convolutions is: Conv2d() -> ReLU() -> MaxPool().
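
A minimal sketch of what such a classifier could look like (the channel counts and layer sizes here are assumptions, not necessarily the exact ones used in the notebook):

```python
import torch.nn as nn

class CNN_MNISTClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Part 1: convolutional features, Conv2d() -> ReLU() -> MaxPool() twice
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        # Part 2: fully connected classifier over the flattened activation maps
        self.classifier = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 32*7*7)
        return self.classifier(x)
```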

6.2 Understanding how the Network Works: 😎

Here is how the schema of this example looks:
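
One way to follow the schema is to push a fake batch through the sketch model from above and watch the shapes change:

```python
import torch

model = CNN_MNISTClassifier()   # the sketch class defined above
x = torch.rand(64, 1, 28, 28)   # a batch of 64 fake MNIST images

print(model.features(x).shape)  # torch.Size([64, 32, 7, 7])
print(model(x).shape)           # torch.Size([64, 10]) -> one score per digit
```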

6.3 Training on all Images: 🚀

6.3.1 Accuracy Function: ✔

We'll use the same accuracy function used in the How I taught myself Deep Learning: Vanilla NNs notebook.
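
The original lives in that notebook; here is a sketch of what it likely looks like:

```python
import torch

def accuracy(model, loader, device="cpu"):
    """Fraction of correctly predicted digits over a data loader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)  # most likely digit per image
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```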

6.3.2 Training Function: 💪

We'll use the same train function used in the How I taught myself Deep Learning: Vanilla NNs notebook (the only change is that we'll create an Adam optimizer instead of SGD).
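
Again, the original lives in that notebook; here is a sketch of the loop with the Adam swap:

```python
import torch.nn as nn
import torch.optim as optim

def train(model, train_loader, epochs=5, lr=1e-3, device="cpu"):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)  # the one change vs. SGD
    model.to(device)
    model.train()
    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()                    # clear the old gradients
            loss = criterion(model(images), labels)  # forward pass + loss
            loss.backward()                          # backpropagation
            optimizer.step()                         # update the weights
        print(f"epoch {epoch + 1}: loss {loss.item():.4f}")
```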

6.3.3 Training...

Notice the much higher Test Accuracy for the CNN model vs. the plain Vanilla FNN from the last notebook.

Bonuses 📌

1. Confusion Matrix 🙃
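
A sketch of how the matrix can be built with sklearn (here `model` and `test_loader` are assumed to exist from the sections above):

```python
import torch
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Collect predicted and true digits over the whole test set.
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        all_preds.extend(model(images).argmax(dim=1).tolist())
        all_labels.extend(labels.tolist())

cm = confusion_matrix(all_labels, all_preds)
ConfusionMatrixDisplay(cm).plot()
plt.show()
```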

2. 2D Visualization of Convolutional Neural Nets 💎

Here is a very good resource to better visualize and understand MNIST Classification using CNNs.

Other How I taught myself Deep Learning Notebooks 📒

If you have any questions, please do not hesitate to ask. This notebook is meant to bring a clearer understanding of concepts and code, and your feedback will also help me add to, modify and improve it.

If you liked this, upvote!

Cheers!

References: 📇