Developing machine learning models is usually an iterative process. You start with an initial design then reconfigure until you get a model that can be trained efficiently in terms of time and compute resources. As you may already know, these settings that you adjust are called hyperparameters. These are the variables that govern the training process and the topology of an ML model. These remain constant over the training process and directly impact the performance of your ML program.
The process of finding the optimal set of hyperparameters is called hyperparameter tuning or hypertuning, and it is an essential part of a machine learning pipeline. Without it, you might end up with a model that has unnecessary parameters and takes too long to train.
Hyperparameters are of two types:
Model hyperparameters which influence model selection such as the number and width of hidden layers
Algorithm hyperparameters which influence the speed and quality of the learning algorithm such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k Nearest Neighbors (KNN) classifier.
For more complex models, the number of hyperparameters can increase dramatically and tuning them manually can be quite challenging.
In this lab, you will practice hyperparameter tuning with Keras Tuner, a package from the Keras team that automates this process. For comparison, you will first train a baseline model with pre-selected hyperparameters, then redo the process with tuned hyperparameters. Some of the examples and discussions here are taken from the official tutorial provided by TensorFlow, but we've expanded on a few key parts for clarity.
Let's begin!
Note: The notebooks in this course are shared with read-only access. To be able to save your work, kindly select File > Save a Copy in Drive from the Colab menu and run the notebook from there. You will need a Gmail account to save a copy.
Let us first load the Fashion MNIST dataset into your workspace. You will use this to train a machine learning model that classifies images of clothing.
# Import keras
from tensorflow import keras
# Download the dataset and split into train and test sets
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()
For preprocessing, you will normalize the pixel values to make the training converge faster.
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0
As mentioned, you will first establish a baseline performance using arbitrarily handpicked hyperparameters so you can compare the results later. Given the time and resource limits imposed by Colab, you will just build a shallow dense neural network (DNN) as shown below. This is to demonstrate the concepts without involving huge datasets and long tuning and training times. As you'll see later, even small models can take some time to tune. You can extend the concepts here when you get to build more complex models in your own projects.
# Build the baseline model using the Sequential API
b_model = keras.Sequential()
b_model.add(keras.layers.Flatten(input_shape=(28, 28)))
b_model.add(keras.layers.Dense(units=512, activation='relu', name='dense_1')) # You will tune this layer later
b_model.add(keras.layers.Dropout(0.2))
b_model.add(keras.layers.Dense(10, activation='softmax'))
# Print model summary
b_model.summary()
As shown, we hardcoded all the hyperparameters when declaring the layers. These include the number of hidden units, activation, and dropout. You will see how you can automatically tune some of these a bit later.
Let's then set up the loss, metrics, and the optimizer. The learning rate is also a hyperparameter you can tune automatically, but for now, let's set it at 0.001.
# Setup the training parameters
b_model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                loss=keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])
With all settings set, you can start training the model. We've set the number of epochs to 10 but feel free to increase it if you have more time to go through the notebook.
# Number of training epochs.
NUM_EPOCHS = 10
# Train the model
b_model.fit(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2)
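To make the validation_split=0.2 argument concrete: Keras holds out the last fraction of the samples passed to fit, before any shuffling, and uses them only for validation. The sketch below mimics that slicing on a plain list. Note that split_train_validation is a hypothetical helper written for illustration, not a Keras function, and its use of round() is an assumption.

```python
def split_train_validation(samples, validation_split=0.2):
    """Hold out the LAST fraction of samples for validation (illustrative helper)."""
    n_val = round(len(samples) * validation_split)  # rounding rule is an assumption
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


# Fashion MNIST has 60,000 training images; plain indices stand in for the images here.
train_part, val_part = split_train_validation(list(range(60000)), validation_split=0.2)
print(len(train_part), len(val_part))
```

With the 60,000 training images, this leaves 48,000 samples for training and 12,000 for validation.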
Finally, you want to see how this baseline model performs against the test set.
# Evaluate model on the test set
b_eval_dict = b_model.evaluate(img_test, label_test, return_dict=True)
Let's define a helper function for displaying the results so it's easier to compare later.
# Define helper function
def print_results(model, model_name, eval_dict):
    '''
    Prints the values of the hyperparameters to tune, and the results of model evaluation
    Args:
        model (Model) - Keras model to evaluate
        model_name (string) - arbitrary string to be used in identifying the model
        eval_dict (dict) - results of model.evaluate
    '''
    print(f'\n{model_name}:')
    print(f'number of units in 1st Dense layer: {model.get_layer("dense_1").units}')
    # `learning_rate` is the current attribute name; the `lr` alias is deprecated
    print(f'learning rate for the optimizer: {model.optimizer.learning_rate.numpy()}')
    for key, value in eval_dict.items():
        print(f'{key}: {value}')
# Print results for baseline model
print_results(b_model, 'BASELINE MODEL', b_eval_dict)
That's it for getting the results for a single set of hyperparameters. As you can see, this process can be tedious if you want to try different sets of hyperparameters. For example, will your model improve if you use learning_rate=0.00001 and units=128? What if 0.001 is paired with 256? The process will be even more difficult if you decide to also tune the dropout and try out other activation functions. Keras Tuner solves this problem by providing an API that automatically searches for the optimal set. You just need to set it up once, then wait for the results. You will see how this is done in the next sections.
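To get a feel for how quickly manual tuning grows, the sketch below counts the training runs an exhaustive search would need. The learning rates and unit counts are the ones mentioned above; the dropout and activation candidates are hypothetical values added only for illustration.

```python
from itertools import product

# Candidate values: the first two lists come from the text, the rest are assumptions.
search_grid = {
    'learning_rate': [0.00001, 0.001],
    'units': [128, 256],
    'dropout': [0.1, 0.2, 0.3],      # hypothetical values
    'activation': ['relu', 'tanh'],  # hypothetical values
}


def count_combinations(grid):
    """Number of distinct hyperparameter sets in an exhaustive grid."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total


# Materialize every combination as a dict, e.g. to loop over them manually.
combinations = [dict(zip(search_grid, combo)) for combo in product(*search_grid.values())]
print(count_combinations(search_grid), 'training runs needed')
print(combinations[0])
```

Each extra hyperparameter multiplies the number of runs, which is why an automated search strategy becomes attractive.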
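To perform hypertuning with Keras Tuner, you will need to define the model, select which hyperparameters to tune, define their search space, and define the search strategy.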
You will start by installing and importing the required packages.
# Install Keras Tuner
!pip install -q -U keras-tuner
# Import required packages
import tensorflow as tf
import keras_tuner as kt  # the package was renamed from `kerastuner`
The model you set up for hypertuning is called a hypermodel. When you build this model, you define the hyperparameter search space in addition to the model architecture.
You can define a hypermodel through two approaches:
By using a model builder function
By subclassing the HyperModel class of the Keras Tuner API
In this lab, you will take the first approach: you will use a model builder function to define the image classification model. This function returns a compiled model and uses hyperparameters you define inline to hypertune the model.
The function below basically builds the same model you used earlier. The difference is that two hyperparameters are set up for tuning:
the number of hidden units of the first Dense layer
the learning rate of the Adam optimizer
You will see that this is done with a HyperParameters object, which configures the hyperparameters you'd like to tune. For this exercise, you will:
use its Int() method to define the search space for the Dense units. This allows you to set a minimum and maximum value, as well as the step size when incrementing between these values.
use its Choice() method for the learning rate. This allows you to define discrete values to include in the search space when hypertuning.
You can view all available methods and their sample usage in the official documentation.
def model_builder(hp):
    '''
    Builds the model and sets up the hyperparameters to tune.
    Args:
        hp - Keras tuner object
    Returns:
        model with hyperparameters to tune
    '''
    # Initialize the Sequential API and start stacking the layers
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    # Tune the number of units in the first Dense layer
    # Choose an optimal value between 32-512
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(keras.layers.Dense(units=hp_units, activation='relu', name='dense_1'))
    # Add next layers
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Dense(10, activation='softmax'))
    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])
    return model
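As a rough check of how large the search space defined in model_builder is, the sketch below enumerates the candidate values in plain Python. It assumes the max_value passed to Int() is inclusive; int_search_space is a hypothetical helper and does not call Keras Tuner.

```python
def int_search_space(min_value, max_value, step):
    """Values covered by an integer range whose upper bound is inclusive (assumption)."""
    return list(range(min_value, max_value + 1, step))


unit_values = int_search_space(32, 512, 32)  # mirrors hp.Int('units', ...)
learning_rates = [1e-2, 1e-3, 1e-4]          # mirrors hp.Choice('learning_rate', ...)

print(unit_values)
print(len(unit_values) * len(learning_rates), 'possible configurations')
```

Under that assumption there are 16 candidate unit counts and 3 learning rates, so 48 configurations in total. That is already more than you would want to train one by one.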
Now that you have the model builder, you can then define how the tuner can find the optimal set of hyperparameters, also called the search strategy. Keras Tuner has four tuners available with built-in strategies - RandomSearch, Hyperband, BayesianOptimization, and Sklearn.
In this tutorial, you will use the Hyperband tuner. Hyperband is an algorithm specifically developed for hyperparameter optimization. It uses adaptive resource allocation and early stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket wherein the algorithm trains a large number of models for a few epochs and carries forward only the top-performing fraction of them to the next round (half of them when the reduction factor is 2). You can read about the intuition behind the algorithm in section 3 of this paper.
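The championship-style bracket can be sketched in a few lines of plain Python. This is a simplified illustration of successive halving, the idea Hyperband builds on, and not Keras Tuner's actual implementation. The toy_score function is a made-up stand-in for validation accuracy, and the candidate values are chosen only for the demo.

```python
import math
from itertools import product


def toy_score(config, epochs):
    """Made-up stand-in for validation accuracy; NOT a real training result."""
    units_penalty = abs(config['units'] - 256) / 256
    lr_penalty = abs(math.log10(config['learning_rate']) + 3)
    return 0.01 * epochs - units_penalty - lr_penalty


def successive_halving(configs, score_fn, factor=3, max_epochs=10):
    """Each round, keep the top 1/factor configs and multiply the epoch budget by factor."""
    survivors = list(configs)
    epochs = 1
    rounds = []  # (epochs used, number of configs evaluated) for each round
    while True:
        ranked = sorted(survivors, key=lambda c: score_fn(c, epochs), reverse=True)
        rounds.append((epochs, len(ranked)))
        if len(ranked) == 1 or epochs >= max_epochs:
            return ranked[0], rounds
        survivors = ranked[:max(1, len(ranked) // factor)]
        epochs = min(max_epochs, epochs * factor)


# Nine hypothetical candidates: 3 unit counts x 3 learning rates.
candidates = [
    {'units': units, 'learning_rate': lr}
    for units, lr in product([64, 256, 512], [1e-2, 1e-3, 1e-4])
]
best_config, rounds = successive_halving(candidates, toy_score)
print(best_config)
print(rounds)
```

Most candidates only ever receive a single cheap epoch, and the full budget is spent on the few that survive. That is where the time savings over training every configuration to completion come from.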
Hyperband determines the number of models to train in a bracket by computing 1 + log_factor(max_epochs), i.e. one plus the logarithm of max_epochs in base factor, and rounding it up to the nearest integer. You will see these parameters (i.e. factor and max_epochs) passed into the initializer below. In addition, you will also need to define the following to instantiate the Hyperband tuner:
objective to optimize (e.g. validation accuracy)
directory to save logs and checkpoints for every trial (model configuration) run during the hyperparameter search. If you re-run the hyperparameter search, Keras Tuner uses the existing state from these logs to resume the search. To disable this behavior, pass an additional overwrite=True argument while instantiating the tuner.
project_name to differentiate from other runs. This will be used as a subdirectory name under the directory.
You can refer to the documentation for other arguments you can pass in.
# Instantiate the tuner
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='kt_dir',
                     project_name='kt_hyperband')
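With the values passed to the tuner above (max_epochs=10, factor=3), the bracket formula quoted earlier can be worked out by hand. The helper below is only a sketch of that arithmetic; models_per_bracket is a hypothetical name and not part of the Keras Tuner API.

```python
import math


def models_per_bracket(max_epochs, factor):
    """1 + log base `factor` of `max_epochs`, rounded up to the nearest integer."""
    return math.ceil(1 + math.log(max_epochs, factor))


# log_3(10) is roughly 2.096, so 1 + 2.096 = 3.096, which rounds up to 4.
print(models_per_bracket(max_epochs=10, factor=3))
```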
Let's see a summary of the hyperparameters that you will tune:
# Display hypertuning settings
tuner.search_space_summary()
You can pass in a callback to stop training early when a metric is not improving. Below, we define an EarlyStopping callback to monitor the validation loss and stop training if it's not improving after 5 epochs.
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
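The effect of patience=5 can be illustrated without Keras. The sketch below is a simplified model of the stopping rule. It ignores options such as min_delta and restore_best_weights, and the loss values are invented for the demonstration.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Invented validation losses: the best value is reached at epoch 2, then it stalls.
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]
print(early_stop_epoch(losses, patience=5))
```

In this example, training stops at epoch 7, after five consecutive epochs without improvement, so the better loss at epoch 8 is never reached. A larger patience trades extra training time for a lower chance of stopping too soon.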
You will now run the hyperparameter search. The arguments for the search method are the same as those used for tf.keras.Model.fit, in addition to the callback above. This will take around 10 minutes to run.
# Perform hypertuning
tuner.search(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2, callbacks=[stop_early])
You can get the best-performing set of hyperparameters with the get_best_hyperparameters() method.
# Get the optimal hyperparameters from the results
best_hps=tuner.get_best_hyperparameters()[0]
print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")
Now that you have the best set of hyperparameters, you can rebuild the hypermodel with these values and retrain it.
# Build the model with the optimal hyperparameters
h_model = tuner.hypermodel.build(best_hps)
h_model.summary()
# Train the hypertuned model
h_model.fit(img_train, label_train, epochs=NUM_EPOCHS, validation_split=0.2)
You will then get its performance against the test set.
# Evaluate the hypertuned model against the test set
h_eval_dict = h_model.evaluate(img_test, label_test, return_dict=True)
You can now compare these results with those of the baseline model you trained at the start of the notebook. Results may vary, but you will usually get a model that has fewer units in the dense layer while having comparable loss and accuracy. This indicates that you reduced the model size and saved compute resources while keeping more or less the same accuracy.
# Print results of the baseline and hypertuned model
print_results(b_model, 'BASELINE MODEL', b_eval_dict)
print_results(h_model, 'HYPERTUNED MODEL', h_eval_dict)
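To quantify what "fewer units" means for model size, you can count the trainable parameters of this architecture by hand. The sketch below assumes the same Flatten, Dense, Dropout, Dense stack used throughout this lab. The value 128 is just an illustrative unit count, not a result of the search.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def model_param_count(hidden_units, n_pixels=28 * 28, n_classes=10):
    """Trainable parameters of Flatten -> Dense(hidden_units) -> Dropout -> Dense(n_classes)."""
    return dense_param_count(n_pixels, hidden_units) + dense_param_count(hidden_units, n_classes)


print(model_param_count(512))  # baseline model with 512 hidden units
print(model_param_count(128))  # illustrative smaller model
```

The baseline has 407,050 trainable parameters, while a 128-unit variant would have 101,770, roughly a quarter of the size.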
If you want to keep practicing with Keras Tuner in this notebook, you can do a factory reset (Runtime > Factory reset runtime) and take on any of the following:
tune the dropout layer with hp.Float() or hp.Choice()
tune the activation function of the first Dense layer with hp.Choice()
explore the pre-defined HyperModel classes - HyperXception and HyperResNet for computer vision applications.
In this tutorial, you used Keras Tuner to conveniently tune hyperparameters. You defined which ones to tune, the search space, and the search strategy to arrive at the optimal set of hyperparameters. These concepts will be discussed again in the next sections but in the context of AutoML, a package that automates the entire machine learning pipeline. On to the next!