Ungraded Lab: Hyperparameter tuning and model training with TFX

In this lab, you will again be doing hyperparameter tuning, but this time it will be within a TensorFlow Extended (TFX) pipeline.

We have already introduced some TFX components in Course 2 of this specialization related to data ingestion, validation, and transformation. In this notebook, you will work with two more that are related to model development and training: Tuner and Trainer.

TFX pipeline (image source: https://www.tensorflow.org/tfx/guide)

You will again be working with the Fashion MNIST dataset and will feed it through the TFX pipeline up to the Trainer component. You will quickly review the earlier components from Course 2, then focus on the two new components introduced here.

Let's begin!

Setup

Install TFX

You will first install TFX, a framework for developing end-to-end machine learning pipelines.
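
A minimal install cell might look like the following; this is just a sketch, and the actual lab may pin a specific version:

# Install TFX (restart the runtime afterwards, as noted below).
!pip install -U tfx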

Note: In Google Colab, you need to restart the runtime at this point to finalize updating the packages you just installed. You can do so by clicking the Restart Runtime button at the end of the output cell above (after installation), or by selecting Runtime > Restart Runtime in the menu bar. Please do not proceed to the next section without restarting. You can also ignore the errors about version incompatibility of some of the bundled packages because we won't be using those in this notebook.

Imports

You will then import the packages you will need for this exercise.
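
The imports typically include TensorFlow, TensorFlow Datasets, and the TFX components and protos used later; exact import paths can vary slightly across TFX versions, so treat this as a sketch:

import os

import tensorflow as tf
import tensorflow_datasets as tfds

from tfx.components import (ImportExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Tuner, Trainer)
from tfx.proto import example_gen_pb2, trainer_pb2
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext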

Download and prepare the dataset

As mentioned earlier, you will be using the Fashion MNIST dataset just like in the previous lab. This will allow you to compare the similarities and differences when using Keras Tuner as a standalone library and within an ML pipeline.

You will first need to set up the directories that you will use to store the dataset, as well as the pipeline artifacts and metadata store.
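
For example (the directory names here are assumptions, not the lab's exact values):

# Locations for the raw data, the pipeline artifacts, and the metadata store.
_data_root = './data/fmnist'
_pipeline_root = './pipeline'
_metadata_path = os.path.join(_pipeline_root, 'metadata.sqlite')

for directory in [_data_root, _pipeline_root]:
    os.makedirs(directory, exist_ok=True)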

You will now download Fashion MNIST from TensorFlow Datasets. The with_info flag will be set to True so you can display information about the dataset in the next cell (i.e. using ds_info).
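
A download cell along these lines would work, assuming the _data_root directory defined above:

# Download Fashion MNIST and keep the dataset info object for inspection.
ds, ds_info = tfds.load('fashion_mnist', data_dir=_data_root, with_info=True)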

You can review the downloaded files with the code below. For this lab, you will only be using the train TFRecord, so take note of its filename. You will not use the test TFRecord in this lab.

You will then copy the train split from the downloaded data so it can be consumed by the ExampleGen component in the next step. This component requires that your files are in a directory without extra files (e.g. JSONs and TXT files).
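
Something like the following would do it; the glob pattern is a placeholder based on TFDS's usual file layout, so adjust it to the actual filename you noted above:

import glob
import shutil

_examples_root = './tfrecord_data'
os.makedirs(_examples_root, exist_ok=True)

# Copy only the train TFRecord(s) into a clean directory for ExampleGen.
for filepath in glob.glob(os.path.join(_data_root, 'fashion_mnist', '*', '*train.tfrecord*')):
    shutil.copy(filepath, _examples_root)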

TFX Pipeline

With the setup complete, you can now proceed to creating the pipeline.

Initialize the Interactive Context

You will start by initializing the InteractiveContext so you can run the components within this Colab environment. You can safely ignore the warning because you will just be using a local SQLite file for the metadata store.
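
Assuming the pipeline root defined earlier, initializing the context can be as simple as:

# Passing no metadata config makes the context fall back to a local SQLite store.
context = InteractiveContext(pipeline_root=_pipeline_root)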

ExampleGen

You will start the pipeline by ingesting the TFRecord you set aside. The ImportExampleGen component consumes TFRecords, and you can specify splits as shown below. For this exercise, you will split the train TFRecord to use 80% for the train set and the remaining 20% as the eval/validation set.
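
A sketch of that configuration, using hash buckets to get the 80/20 split (variable names come from the setup sketches above):

# Split the single train TFRecord into an 80% train / 20% eval split.
output_config = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=8),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=2),
    ]))

example_gen = ImportExampleGen(input_base=_examples_root, output_config=output_config)
context.run(example_gen)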

StatisticsGen

Next, you will compute the statistics of the dataset with the StatisticsGen component.
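
The component only needs the examples channel produced by ExampleGen:

statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)
context.show(statistics_gen.outputs['statistics'])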

SchemaGen

You can then infer the dataset schema with SchemaGen. This will be used to validate incoming data to ensure that it is formatted correctly.
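
For example:

schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
context.run(schema_gen)
context.show(schema_gen.outputs['schema'])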

ExampleValidator

You can assume that the dataset is clean since we downloaded it from TFDS. But just to review, let's run it through ExampleValidator to detect if there are anomalies within the dataset.
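
The component takes the statistics and the inferred schema:

example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)
context.show(example_validator.outputs['anomalies'])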

Transform

Let's now use the Transform component to scale the image pixels and convert the data types to float. You will first define the transform module containing these operations before you run the component.
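
A transform module along these lines would do it. The raw feature keys ('image', 'label'), the _xf suffix, and the assumption that images arrive as PNG-encoded byte strings are based on common TFX examples rather than the lab's exact code:

import tensorflow as tf

_IMAGE_KEY = 'image'
_LABEL_KEY = 'label'

def _transformed_name(key):
    return key + '_xf'

def _parse_and_scale(raw_image):
    # Decode one PNG byte string, then scale pixel values to [0, 1] as floats.
    image = tf.io.decode_png(raw_image, channels=1)
    image = tf.reshape(image, (28, 28, 1))
    return tf.cast(image, tf.float32) / 255.0

def preprocessing_fn(inputs):
    """tf.Transform callback: scale the image pixels and cast to float."""
    raw_images = tf.reshape(inputs[_IMAGE_KEY], [-1])
    images = tf.map_fn(_parse_and_scale, raw_images, fn_output_signature=tf.float32)
    return {
        _transformed_name(_IMAGE_KEY): images,
        _transformed_name(_LABEL_KEY): inputs[_LABEL_KEY],
    }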

You will run the component by passing in the examples, schema, and transform module file.
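
Assuming the module above was saved to a file (e.g. with the %%writefile magic), running the component looks like this:

_transform_module_file = 'fmnist_transform.py'  # assumed filename of the module above

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=_transform_module_file)
context.run(transform)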

Note: You can safely ignore the warnings and udf_utils related errors.

Tuner

As the name suggests, the Tuner component tunes the hyperparameters of your model. To use this, you will need to provide a tuner module file which contains a tuner_fn() function. In this function, you will mostly do the same steps as you did in the previous ungraded lab but with some key differences in handling the dataset.

The Transform component earlier saved the transformed examples as GZIP-compressed TFRecords (.gz), and you will need to load them. Once loaded, you will need to create batches of features and labels so you can finally use them for hypertuning. This process is modularized in the _input_fn() below.
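
A sketch of such an _input_fn; the transformed label key ('label_xf') follows the transform sketch above, and the default batch size of 32 matches the discussion later in this notebook:

import tensorflow as tf
import tensorflow_transform as tft

def _gzip_reader_fn(filenames):
    # The Transform component writes its examples as GZIP-compressed TFRecords.
    return tf.data.TFRecordDataset(filenames, compression_type='GZIP')

def _input_fn(file_pattern, tf_transform_output, batch_size=32):
    # Parse the transformed examples into batches of (features, label) pairs.
    transformed_feature_spec = tf_transform_output.transformed_feature_spec().copy()
    return tf.data.experimental.make_batched_features_dataset(
        file_pattern=file_pattern,
        batch_size=batch_size,
        features=transformed_feature_spec,
        reader=_gzip_reader_fn,
        label_key='label_xf')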

Going back, the tuner_fn() function will return a TunerFnResult namedtuple containing your tuner object and a set of arguments to pass to the tuner.search() method. You will see these in action in the following cells. When reviewing the module file, we recommend viewing the tuner_fn() first before looking at the other auxiliary functions.
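
Here is a condensed sketch of such a module. The model architecture, the search space, the Hyperband strategy, and the import paths are illustrative assumptions, and FnArgs attribute names such as transform_graph_path can vary slightly across TFX versions; it reuses the _input_fn sketch above.

import kerastuner
import tensorflow as tf
import tensorflow_transform as tft
from tfx.components.tuner.component import TunerFnResult

def _build_model(hp):
    # Hypothetical search space: tune only the dense layer width and the learning rate.
    # The input name must match the transformed image key produced by Transform.
    inputs = tf.keras.Input(shape=(28, 28, 1), name='image_xf')
    x = tf.keras.layers.Flatten()(inputs)
    x = tf.keras.layers.Dense(hp.Int('units', 32, 512, step=32), activation='relu')(x)
    outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model

def tuner_fn(fn_args):
    # Load the transformed examples, then wire up the tuner and its search arguments.
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_graph_path)
    train_set = _input_fn(fn_args.train_files, tf_transform_output)
    eval_set = _input_fn(fn_args.eval_files, tf_transform_output)

    tuner = kerastuner.Hyperband(
        hypermodel=_build_model,
        objective='val_accuracy',
        max_epochs=10,
        directory=fn_args.working_dir,
        project_name='fmnist_tuning')

    return TunerFnResult(
        tuner=tuner,
        fit_kwargs={
            'x': train_set,
            'validation_data': eval_set,
            'steps_per_epoch': fn_args.train_steps,
            'validation_steps': fn_args.eval_steps,
        })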

With the module defined, you can now set up the Tuner component. You can see the description of each argument here.
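
A sketch of the component wiring, assuming the module above was written to a file; the eval num_steps value here is a placeholder:

_tuner_module_file = 'tuner_module.py'  # assumed filename of the tuner module above

tuner = Tuner(
    module_file=_tuner_module_file,
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(splits=['train'], num_steps=500),
    eval_args=trainer_pb2.EvalArgs(splits=['eval'], num_steps=100))
context.run(tuner)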

Notice that we passed num_steps values to the train and eval args; these were used in the steps_per_epoch and validation_steps arguments in the tuner module above. This can be useful if you don't want to go through the entire dataset when tuning. For example, if you have 10GB of training data, it would be incredibly time consuming to iterate through all of it for just one epoch and one set of hyperparameters. You can set the number of steps so your program will only go through a fraction of the dataset.

You can compute the total number of steps in one epoch as: number of examples / batch size. For this particular example, that is 48,000 examples / 32 (the default batch size), which equals 1,500 steps per epoch for the train set (compute the validation steps from the 12,000 eval examples the same way). Since you passed 500 as the num_steps of the train args, some examples will be skipped. This will likely result in lower accuracy readings but will save time during hypertuning. Try modifying this value later and see if you arrive at the same set of hyperparameters.

Trainer

Like the Tuner component, the Trainer component also requires a module file to set up the training process. It will look for a run_fn() function that defines and trains the model. The steps will look similar to those in the tuner module file:
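
A condensed sketch of a trainer module; it assumes the _input_fn and _build_model helpers from the tuner sketch above are copied into (or imported by) this module, and the log directory location is just one reasonable choice:

import os

import kerastuner
import tensorflow as tf
import tensorflow_transform as tft

def run_fn(fn_args):
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_graph_path)
    train_set = _input_fn(fn_args.train_files, tf_transform_output)
    eval_set = _input_fn(fn_args.eval_files, tf_transform_output)

    # Rebuild the model from the best hyperparameters found by the Tuner.
    hparams = kerastuner.HyperParameters.from_config(fn_args.hyperparameters)
    model = _build_model(hparams)

    # Log training so it can be visualized later with TensorBoard.
    log_dir = os.path.join(os.path.dirname(fn_args.serving_model_dir), 'logs')
    tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

    model.fit(
        train_set,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_set,
        validation_steps=fn_args.eval_steps,
        callbacks=[tensorboard_cb])

    model.save(fn_args.serving_model_dir, save_format='tf')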

You can pass the output of the Tuner component to the Trainer by filling the hyperparameters argument with the Tuner output. This is indicated by the tuner.outputs['best_hyperparameters'] below. You can see the definition of the other arguments here.
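
The wiring would look roughly like this (depending on your TFX version you may also need to point the Trainer at the generic executor so that run_fn is used):

_trainer_module_file = 'trainer_module.py'  # assumed filename of the trainer module above

trainer = Trainer(
    module_file=_trainer_module_file,
    examples=transform.outputs['transformed_examples'],
    hyperparameters=tuner.outputs['best_hyperparameters'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(splits=['train'], num_steps=500),
    eval_args=trainer_pb2.EvalArgs(splits=['eval'], num_steps=100))
context.run(trainer)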

Take note that when re-training your model, you don't always have to retune your hyperparameters. Once you have a set that you think performs well, you can just import it with the ImporterNode as shown in the official docs:

hparams_importer = ImporterNode(
    instance_name='import_hparams',
    # This can be Tuner's output file or manually edited file. The file contains
    # text format of hyperparameters (kerastuner.HyperParameters.get_config())
    source_uri='path/to/best_hyperparameters.txt',
    artifact_type=HyperParameters)

trainer = Trainer(
    ...
    # An alternative is directly use the tuned hyperparameters in Trainer's user
    # module code and set hyperparameters to None here.
    hyperparameters = hparams_importer.outputs['result'])

Your model should now be saved in your pipeline directory and you can navigate through it as shown below. The file is saved as saved_model.pb.
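
One simple way to locate it, without relying on version-specific artifact APIs, is to walk the pipeline root:

# Look for the SavedModel file written by the Trainer (exact layout varies by TFX version).
for root, dirs, files in os.walk(_pipeline_root):
    if 'saved_model.pb' in files:
        print(os.path.join(root, 'saved_model.pb'))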

You can also visualize the training results by loading the logs saved by the TensorBoard callback.
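
In Colab, the TensorBoard notebook extension can point at the pipeline root and pick up the event files recursively; the exact log location depends on where your TensorBoard callback wrote them:

%load_ext tensorboard
%tensorboard --logdir {_pipeline_root}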

Congratulations! You have now created an ML pipeline that includes hyperparameter tuning and model training. You will learn more about the next components in future lessons, but in the next section, you will first learn about a framework for automatically building ML pipelines: AutoML. Enjoy the rest of the course!