Ungraded Lab: Intro to Keras Tuner

Developing machine learning models is usually an iterative process. You start with an initial design then reconfigure until you get a model that can be trained efficiently in terms of time and compute resources. As you may already know, these settings that you adjust are called hyperparameters. These are the variables that govern the training process and the topology of an ML model. These remain constant over the training process and directly impact the performance of your ML program.

The process of finding the optimal set of hyperparameters is called hyperparameter tuning or hypertuning, and it is an essential part of a machine learning pipeline. Without it, you might end up with a model that has unnecessary parameters and take too long to train.

Hyperparameters are of two types:

  1. Model hyperparameters which influence model selection such as the number and width of hidden layers

  2. Algorithm hyperparameters which influence the speed and quality of the learning algorithm such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k Nearest Neighbors (KNN) classifier.

For more complex models, the number of hyperparameters can increase dramatically and tuning them manually can be quite challenging.

In this lab, you will practice hyperparameter tuning with Keras Tuner, a package from the Keras team that automates this process. For comparison, you will first train a baseline model with pre-selected hyperparameters, then redo the process with tuned hyperparameters. Some of the examples and discussions here are taken from the official tutorial provided by Tensorflow but we've expounded on a few key parts for clarity.

Let's begin!

Note: The notebooks in this course are shared with read-only access. To be able to save your work, kindly select File > Save a Copy in Drive from the Colab menu and run the notebook from there. You will need a Gmail account to save a copy.

Download and prepare the dataset

Let us first load the Fashion MNIST dataset into your workspace. You will use this to train a machine learning model that classifies images of clothing.

For preprocessing, you will normalize the pixel values to make the training converge faster.

Baseline Performance

As mentioned, you will first have a baseline performance using arbitrarily handpicked parameters so you can compare the results later. In the interest of time and resource limits provided by Colab, you will just build a shallow dense neural network (DNN) as shown below. This is to demonstrate the concepts without involving huge datasets and long tuning and training times. As you'll see later, even small models can take some time to tune. You can extend the concepts here when you get to build more complex models in your own projects.

As shown, we hardcoded all the hyperparameters when declaring the layers. These include the number of hidden units, activation, and dropout. You will see how you can automatically tune some of these a bit later.

Let's then setup the loss, metrics, and the optimizer. The learning rate is also a hyperparameter you can tune automatically but for now, let's set it at 0.001.

With all settings set, you can start training the model. We've set the number of epochs to 10 but feel free to increase it if you have more time to go through the notebook.

Finally, you want to see how this baseline model performs against the test set.

Let's define a helper function for displaying the results so it's easier to compare later.

That's it for getting the results for a single set of hyperparameters. As you can see, this process can be tedious if you want to try different sets of parameters. For example, will your model improve if you use learning_rate=0.00001 and units=128? What if 0.001 paired with 256? The process will be even more difficult if you decide to also tune the dropout and try out other activation functions as well. Keras Tuner solves this problem by having an API to automatically search for the optimal set. You will just need to set it up once then wait for the results. You will see how this is done in the next sections.

Keras Tuner

To perform hypertuning with Keras Tuner, you will need to:

Install and import packages

You will start by installing and importing the required packages.

Define the model

The model you set up for hypertuning is called a hypermodel. When you build this model, you define the hyperparameter search space in addition to the model architecture.

You can define a hypermodel through two approaches:

In this lab, you will take the first approach: you will use a model builder function to define the image classification model. This function returns a compiled model and uses hyperparameters you define inline to hypertune the model.

The function below basically builds the same model you used earlier. The difference is there are two hyperparameters that are setup for tuning:

You will see that this is done with a HyperParameters object which configures the hyperparameter you'd like to tune. For this exercise, you will:

You can view all available methods and its sample usage in the official documentation.

Instantiate the Tuner and perform hypertuning

Now that you have the model builder, you can then define how the tuner can find the optimal set of hyperparameters, also called the search strategy. Keras Tuner has four tuners available with built-in strategies - RandomSearch, Hyperband, BayesianOptimization, and Sklearn.

In this tutorial, you will use the Hyperband tuner. Hyperband is an algorithm specifically developed for hyperparameter optimization. It uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket wherein the algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of models to the next round. You can read about the intuition behind the algorithm in section 3 of this paper.

Hyperband determines the number of models to train in a bracket by computing 1 + log`factor`(max_epochs) and rounding it up to the nearest integer. You will see these parameters (i.e. factor and max_epochs passed into the initializer below). In addition, you will also need to define the following to instantiate the Hyperband tuner:

You can refer to the documentation for other arguments you can pass in.

Let's see a summary of the hyperparameters that you will tune:

You can pass in a callback to stop training early when a metric is not improving. Below, we define an EarlyStopping callback to monitor the validation loss and stop training if it's not improving after 5 epochs.

You will now run the hyperparameter search. The arguments for the search method are the same as those used for tf.keras.model.fit in addition to the callback above. This will take around 10 minutes to run.

You can get the top performing model with the get_best_hyperparameters() method.

Build and train the model

Now that you have the best set of hyperparameters, you can rebuild the hypermodel with these values and retrain it.

You will then get its performance against the test set.

We can compare the results we got with the baseline model we used at the start of the notebook. Results may vary but you will usually get a model that has less units in the dense layer, while having comparable loss and accuracy. This indicates that you reduced the model size and saved compute resources while still having more or less the same accuracy.

Bonus Challenges (optional)

If you want to keep practicing with Keras Tuner in this notebook, you can do a factory reset (Runtime > Factory reset runtime) and take on any of the following:

Wrap Up

In this tutorial, you used Keras Tuner to conveniently tune hyperparameters. You defined which ones to tune, the search space, and search strategy to arrive at the optimal set of hyperparameters. These concepts will again be discussed in the next sections but in the context of AutoML, a package that automates the entire machine learning pipeline. On to the next!