1. Introduction 📜

This notebook is just me being frustrated with deep learning and trying to understand, in "baby steps", what is going on here. For somebody who starts in this area with no background whatsoever it can be very confusing, especially because I seemed unable to find code with many explanations and comments.

So, if you are frustrated just like I was when I started this stuff, I hope the following guidelines will help you. I am by no means a teacher, but in this notebook I will:

  1. Share articles/videos I watched that TRULY helped
  2. Explain code along the way to the best of my ability
Note: Deep learning code is VERY different in structure from the usual sklearn code for machine learning. In addition, it usually works with images and text, while classical ML usually works with tabular data. So please, be patient with yourself, and if you don't understand something right away, continue reading/coding and it will all make sense in the end.

2. Before we start 📝

This is my third notebook in the "series": How I taught myself Deep Learning.

  1. How I taught myself Deep Learning: Vanilla NNs
     * PyTorch and Tensors
     * Neural Network Basics, Perceptrons and a Plain Vanilla Neural Net model
     * MNIST Classification using FNN
     * Activation Functions
     * Forward Pass
     * Backpropagation (Loss and Optimizer Functions)
     * Batching, Iterations and Epochs
     * Computing Classification Accuracy
     * Overfitting: Data Augmentation, Weight Decay, Learning Rate, Dropout() and Layer Optimization   
  2. Convolutional Neural Nets (CNNs) Explained
     * Why ConvNets
     * Convolutions Explained
     * Computing Activation Maps
     * Kernels, Padding, Stride
     * AlexNet
     * MNIST Classification using Convolutions

3. RNN with 1 Layer 📘

Recurrent Neural Networks are very different from FNNs or CNNs.

RNNs model sequential data, meaning they have sequential memory. An RNN takes in different kinds of inputs (text, words, letters, parts of an image, sounds, etc.) and returns different kinds of outputs (the next word/letter in the sequence; paired with an FNN it can return a classification, etc.).

How an RNN works:

  1. It uses information from previous timesteps to influence later ones
  2. There are 3 layers: Input, Output and Hidden (where the information is stored)
  3. The loop passes the input forward sequentially, while retaining information about it
  4. This info is stored in the hidden state
  5. There are only 3 matrices (U, V, W) that contain the weights as parameters. These DON'T change with the input; they stay the same through the entire sequence (see the sketch below).
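To make point 5 concrete, here is a tiny sketch (my own toy example, not the notebook's code) showing the same U, W, V matrices being reused at every timestep:

```python
import torch

n_inputs, n_hidden, n_outputs, seq_len = 4, 3, 2, 5

U = torch.randn(n_inputs, n_hidden)    # input  -> hidden
W = torch.randn(n_hidden, n_hidden)    # hidden -> hidden (the "loop")
V = torch.randn(n_hidden, n_outputs)   # hidden -> output

x = torch.randn(seq_len, n_inputs)     # one sequence with 5 timesteps
h = torch.zeros(n_hidden)              # initial hidden state

for t in range(seq_len):
    h = torch.tanh(x[t] @ U + h @ W)   # the hidden state carries information forward
    y = h @ V                          # output at timestep t
```

Notice that U, W and V are created once and reused inside the loop; only the hidden state h changes from one timestep to the next.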

3.1 Youtube Videos to Save you Time 🎥

I highly recommend watching the following to better understand RNNs.

3.2 RNN with 1 Layer and 1 Neuron (🎇)

You can always increase the number of neurons in an RNN; for the moment we'll stick with 1. We'll have 2 timesteps, 0 and 1. The architecture of our class will look like the figure below:
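Before the figure, here is a minimal sketch of what such a class might look like in code (the names Wx, Wy and the dimensions are my own illustration, not necessarily the notebook's):

```python
import torch
import torch.nn as nn

class SingleNeuronRNN(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        self.Wx = nn.Parameter(torch.randn(n_inputs, 1))  # input  -> neuron
        self.Wy = nn.Parameter(torch.randn(1, 1))         # neuron -> neuron (recurrent weight)
        self.b  = nn.Parameter(torch.zeros(1))

    def forward(self, x0, x1):
        # timestep 0: there is no previous state, only the input contributes
        y0 = torch.tanh(x0 @ self.Wx + self.b)
        # timestep 1: the previous output y0 is fed back in through Wy
        y1 = torch.tanh(x1 @ self.Wx + y0 @ self.Wy + self.b)
        return y0, y1

rnn = SingleNeuronRNN(n_inputs=3)
x0, x1 = torch.randn(4, 3), torch.randn(4, 3)   # batch of 4 samples, 3 features each
y0, y1 = rnn(x0, x1)
```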

3.3 RNN with 1 Layer and Multiple Neurons (🎇🎇🎇)

Differences compared to the 1-layer, 1-neuron RNN (a quick sketch follows below):
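Concretely, the only change is in the weight shapes; a short sketch (again my own illustration, not the notebook's exact code):

```python
import torch

n_inputs, n_neurons = 3, 5
Wx = torch.randn(n_inputs, n_neurons)    # was (n_inputs, 1) in the 1-neuron case
Wy = torch.randn(n_neurons, n_neurons)   # was (1, 1)
b  = torch.zeros(n_neurons)

x0, x1 = torch.randn(4, n_inputs), torch.randn(4, n_inputs)
y0 = torch.tanh(x0 @ Wx + b)
y1 = torch.tanh(x1 @ Wx + y0 @ Wy + b)   # y1 now has shape (4, n_neurons)
```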

3.4 Vanilla RNN for MNIST Classification 🔢

From now on we'll use the built-in nn.RNN() from PyTorch. As you can see, the previous examples can't handle large inputs and outputs, since we would have to feed in the information at every timestep and collect the results by hand.

Note: When using an RNN for image classification, it is hard to find the logic of "why" exactly we are doing this. It is not like CNNs, where we know we apply many "filters" to the image to extract its essence. I see it as just another mathematical method with which the computer learns the numbers and can therefore identify patterns.

This is why RNNs might feel like a weird approach to image classification, but they are nevertheless very effective.

The RNN is a very powerful neural net. As you'll see, its performance is far greater than that of a normal FNN or CNN.

Side Note: Images used as input NEED to have 1 channel (so they need to be grayscale).

3.4.1 Import the Data 📥

Note: for further augmentations of the data, check out Albumentations for PyTorch.
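For reference, a hedged sketch of how the MNIST data could be loaded with torchvision (the exact transforms and batch_size used in the notebook may differ):

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])

train_data = datasets.MNIST(root="data", train=True,  download=True, transform=transform)
test_data  = datasets.MNIST(root="data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
test_loader  = torch.utils.data.DataLoader(test_data,  batch_size=64, shuffle=False)
```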

3.4.2 RNN Architecture for MNIST Classification 🪓

Note: Don't bother with the prints; they are only there to help you understand later what's happening inside the network.

Pro Tip: Use print() a lot if you don't understand what is happening (it helps you visualize things).
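To give you a rough picture of the model we are about to build, here is a minimal sketch (hyperparameter values are illustrative, not necessarily the notebook's): each 28x28 image is read as a sequence of 28 rows with 28 pixels each.

```python
import torch
import torch.nn as nn

class VanillaRNN(nn.Module):
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                      # x: (batch, 1, 28, 28)
        x = x.squeeze(1)                       # -> (batch, 28, 28): 28 timesteps of 28 features
        out, hidden = self.rnn(x)              # out: (batch, 28, hidden_size)
        return self.fc(out[:, -1, :])          # classify using the LAST timestep's output

model = VanillaRNN()
print(model(torch.randn(64, 1, 28, 28)).shape)   # torch.Size([64, 10])
```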

Understanding the Model:

Here is what's happening to the batch below:

If we unfold the RNN:

3.4.3 Training... 🚀

We'll use the get_accuracy() and train_network() functions from my previous notebook, but with some changes (suited to the RNN's needs).

Side Note: It's AMAZING how important hyperparameters are. Try changing the learning_rate to 0.01 and see what happens. Also try changing the batch_size to 20 instead of 64, or adding weight_decay to the optimizer. The accuracy of the model right now is impressive, but altering some of these hyperparameters can instantly move us out of the "sweet spot" we found.
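For orientation, here is a hedged sketch of what a training loop like train_network() roughly does (the real function also tracks accuracy per epoch; the variable names here are assumptions):

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)   # try lr=0.01 to watch the "sweet spot" vanish

for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)            # forward pass through the RNN
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagation
        optimizer.step()                   # update the weights
```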

4. Multilayer RNNs 📚

4.1 Why multilayers?

Why use multiple layers rather than 1?

to create higher-level abstractions and capture more non-linearities in the data

Multilayers in RNN:

Activation functions: ReLU vs Tanh

switching between them is one way to try to mitigate the vanishing gradient problem (we'll come back to this in the next chapter, on LSTMs); see the sketch below
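Both knobs live in nn.RNN's constructor; a quick sketch (values illustrative):

```python
import torch.nn as nn

# 2 stacked recurrent layers with the default tanh activation
rnn_tanh = nn.RNN(input_size=28, hidden_size=128, num_layers=2,
                  nonlinearity='tanh', batch_first=True)

# same architecture with ReLU, which can sometimes help with vanishing gradients
rnn_relu = nn.RNN(input_size=28, hidden_size=128, num_layers=2,
                  nonlinearity='relu', batch_first=True)
```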

4.2 Multilayer RNN for MNIST Classification 🔢

Understanding the Model:

Here is what's happening in the batch below:

If we unfold the Multilayer RNN Example:
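A quick shape check (my own illustration) of what "unfolding" gives us: with num_layers=2 the final hidden state has one slice per layer, while the output only comes from the top layer.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=28, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(64, 28, 28)                  # (batch, seq_len, features)
out, h_n = rnn(x)
print(out.shape)    # torch.Size([64, 28, 128]) -> TOP layer's output at every timestep
print(h_n.shape)    # torch.Size([2, 64, 128]) -> final hidden state of EACH layer
```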

Training... 🚀

Let's see how the model performs by adding 1 more layer.

Accuracy improves faster compared to the Vanilla RNN, while the final TEST accuracy is slightly higher.

5. LSTM (Long Short Term Memory RNNs) 💾

5.1 Material to Save you Time 🎥

I highly recommend going through the references below before continuing. You will understand how LSTMs are different from RNNs, how they work and what the Vanishing Gradient Problem is.

5.2 Why RNN might not be the best idea:

Issues in Vanilla RNNs: 🍦

5.3 Vanishing Gradient Problem 🌪

What the vanishing gradient problem is:

5.4 How does an LSTM work?

An LSTM is more complex than a simple RNN:

Note: Check THIS blog post for a more detailed explanation.

The stacked LSTM is like the multilayer RNN: it has multiple hidden LSTM layers, each containing multiple memory cells.
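A quick sketch of the extra piece an LSTM carries around: besides the hidden state h_n, nn.LSTM also returns a cell state c_n (the "long-term memory"), and num_layers > 1 gives the stacked version (values illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=28, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(64, 28, 28)
out, (h_n, c_n) = lstm(x)
print(out.shape)   # torch.Size([64, 28, 128]) -> top layer's output at every timestep
print(h_n.shape)   # torch.Size([2, 64, 128]) -> hidden state per layer
print(c_n.shape)   # torch.Size([2, 64, 128]) -> cell state per layer
```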

5.5 LSTM for MNIST Classification 🔢

Bidirectional LSTMs: an extension of traditional LSTMs that can improve model performance on sequence classification problems. They run over the same input both forward and backward (so for a 1-layer LSTM we get 2 hidden states and 2 cell states).
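A quick sketch (illustrative values) of what bidirectional=True changes in PyTorch: every layer gets a forward and a backward pass, so the states double and the output features double too.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=28, hidden_size=128, num_layers=1,
                 bidirectional=True, batch_first=True)
x = torch.randn(64, 28, 28)
out, (h_n, c_n) = bilstm(x)
print(out.shape)   # torch.Size([64, 28, 256]) -> 2 * hidden_size (forward + backward)
print(h_n.shape)   # torch.Size([2, 64, 128]) -> 1 layer * 2 directions
print(c_n.shape)   # torch.Size([2, 64, 128])
```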

How the Model Works:

Below is a schema of how the example code works

Training on ALL IMAGES 🚀

Now we get even HIGHER accuracies than the ones before. To recap: the FNNs from my previous notebook had an accuracy of ~80%, CNNs had an accuracy of almost 90%, while the RNN reached 97%. Lastly, the LSTMs were the best performing ones (99% accuracy).

6. Bonuses ➕

6.1 Confusion Matrix 🤔

A good way to better visualize how the model is performing is through a confusion matrix. You can see how well each label is predicted and which labels the model confuses with others (for example, a 7 can sometimes be confused with a 1).
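A hedged sketch of how such a confusion matrix could be computed and plotted (it assumes the model and test_loader from earlier; the notebook's own plot may look different):

```python
import torch
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)   # predicted digit for each image
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

cm = confusion_matrix(all_labels, all_preds)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label'); plt.ylabel('True label')
plt.show()
```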

6.2 Why shouldn't you use Transfer Learning?🧠

Transfer learning is an ingenious way to use the weights of a model pretrained on another set of images. This technique is often used in deep learning classification problems that use CNNs (like EfficientNets, ResNets, etc.).

However, this is not a technique regularly used for RNNs. This is mainly because sequential data cannot really be generalized across tasks the way static data (images) can be.

Read more about this here.

Other How I taught myself Deep Learning Notebooks📋

If you have any questions, please do not hesitate to ask. This notebook is meant to bring a clearer understanding of concepts and code, so your questions will also help me add to, modify and improve it.

If you liked this, upvote!

Cheers!

References📇: