Ungraded lab: Serve a model with TensorFlow Serving


In this lab you will take a look at TFX's model serving system for production, TensorFlow Serving. This system is tightly integrated with the TensorFlow stack and provides an easy and straightforward way of deploying models.

Specifically, you will save a pretrained model in SavedModel format, inspect it with the saved_model_cli utility, serve it with tensorflow_model_server, and make REST requests to get predictions from it.

This lab draws inspiration from this official TensorFlow tutorial, so check it out if you have doubts about the topics covered here.

Notice that, unlike the last ungraded lab, you will be working with TF Serving without using Docker. This is to show you a different way in which this serving system can be used.

Let's get started!

Imports

Downloading the data

During this lab you are not going to train a model; instead you will use an existing one to get predictions, so you need some test images. The model you are going to use was originally trained on images from the Cats and Dogs and Caltech Birds datasets. Some images from the test set are provided so you can request predictions:
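As a rough sketch of this step (the download URL below is purely a placeholder, not the actual location of the lab's files), the images could be fetched and unpacked with tf.keras.utils.get_file:

```python
import tensorflow as tf

# NOTE: hypothetical URL -- replace it with the actual location of the test images.
zip_path = tf.keras.utils.get_file(
    "test_images.zip",
    origin="https://example.com/test_images.zip",
    extract=True,
)
print(zip_path)
```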

Now that you are familiar with the data you're going to be working with, let's jump to the model.

Load a pretrained model

The purpose of this lab is to showcase TF Serving's capabilities, so you are not going to spend any time training a model. Instead, you will use a model that you trained during Course 1 of the specialization. This model classifies images of birds, cats, and dogs, and was trained with image augmentation, so it yields very good results.

First, download the necessary files:

Now, load the model into memory:
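A minimal sketch of this step, assuming the downloaded model was saved as an HDF5 file (the file name animal_classifier.h5 is an assumption):

```python
import tensorflow as tf

# Load the pretrained Keras model from disk.
# NOTE: the file name is an assumption -- point this at the file you just downloaded.
model = tf.keras.models.load_model("animal_classifier.h5")
```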

At this point you can assume you have successfully trained the model yourself. You can ignore the warnings about the model being trained on an older version of TensorFlow.

For context, this model uses a simple CNN architecture. Take a quick look at the layers that make it up:
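For example, Keras can print the architecture directly:

```python
# Print a layer-by-layer summary of the architecture.
model.summary()
```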

Save your model

To load our trained model into TensorFlow Serving we first need to save it in SavedModel format. This will create a protobuf file in a well-defined directory hierarchy, and will include a version number. TensorFlow Serving allows us to select which version of a model, or "servable" we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
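A minimal sketch of this export, where the base directory and version number are arbitrary choices:

```python
import os
import tempfile

MODEL_DIR = tempfile.mkdtemp()   # base directory for all versions of this model
version = 1
export_path = os.path.join(MODEL_DIR, str(version))

# Export the Keras model in SavedModel format under MODEL_DIR/1
model.save(export_path, save_format="tf")
print(f"Model exported to: {export_path}")
```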

A saved model on disk includes a saved_model.pb file, which stores the serialized graph along with its signatures, and a variables/ directory holding the trained weights (an assets/ directory may also be present for any extra files the model needs).

Take a quick look at these files:
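For example, from a notebook cell:

```python
# List everything under the export directory.
!find {MODEL_DIR}
```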

Examine your saved model

We'll use the command line utility saved_model_cli to look at the MetaGraphDefs (the models) and SignatureDefs (the methods you can call) in our SavedModel. See this discussion of the SavedModel CLI in the TensorFlow Guide.
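For example, using the export_path defined earlier:

```python
# Show all MetaGraphDefs and SignatureDefs in the SavedModel.
!saved_model_cli show --dir {export_path} --all
```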

That tells us a lot about our model! In this case we didn't explicitly train the model, so any information about the inputs and outputs is very valuable. For instance, we know that this model expects inputs of shape (150, 150, 3), which in combination with the use of conv2d layers suggests the model expects color images at a resolution of 150 by 150. Also, the outputs of the model have shape (3), suggesting a softmax activation over 3 classes.

Prepare data for inference

Now that you know the shape of the data expected by the model, it is time to preprocess the test images accordingly. These images come in a wide variety of resolutions; luckily, Keras has you covered with its ImageDataGenerator. Using this object you can normalize pixel values, resize every image to the same resolution, and yield batches of images together with their labels.

Since this object is a generator, you can get a batch of images and labels using the next function:
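A sketch of that preprocessing, assuming the test images live in a local directory called test_images (the directory name and class_mode are assumptions):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values into the [0, 1] range.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Resize every image to 150x150 and yield batches of 32 images with their labels.
test_generator = test_datagen.flow_from_directory(
    "test_images",           # assumed directory with one sub-folder per class
    target_size=(150, 150),
    batch_size=32,
    class_mode="sparse",     # integer labels; the lab's actual setting may differ
)

data_imgs, labels = next(test_generator)
```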

As expected, data_imgs is an array containing 32 color images of 150x150 resolution. Similarly, labels holds the true label for each of these 32 images.

To check that everything is working properly, do a sanity check to plot the first 5 images in the batch:

All images have the same resolution and the true labels are correct.

Let's jump to serving the model!

Serve your model with TensorFlow Serving

Install TensorFlow Serving

You will need to install an older version (2.8.0) because more recent versions are currently incompatible with Colab.
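One common way to do this pinned install in a Colab cell is to download the versioned .deb package and install it with dpkg; note that the exact download URL is an assumption here and may change over time:

```python
# Download and install the pinned tensorflow-model-server package.
# NOTE: the URL below is an assumption based on the tensorflow-serving-apt bucket layout.
!wget 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-2.8.0/t/tensorflow-model-server/tensorflow-model-server_2.8.0_all.deb'
!dpkg -i tensorflow-model-server_2.8.0_all.deb
```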

Start running TensorFlow Serving

This is where we start running TensorFlow Serving and load our model. After it loads we can start making inference requests using REST. There are some important parameters:
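In particular, --rest_api_port sets the port used for REST requests, --model_name gives the servable the name you will use in the request URL, and --model_base_path points at the directory that contains the versioned SavedModel. A sketch of launching the server from a notebook (the servable name animal_classifier is an assumption; just keep it consistent with the request URL later on):

```python
import os

# Make the SavedModel base directory visible to the shell command below.
os.environ["MODEL_DIR"] = MODEL_DIR

# Launch tensorflow_model_server in the background; its logs go to server.log.
get_ipython().system_raw(
    "nohup tensorflow_model_server "
    "--rest_api_port=8501 "
    "--model_name=animal_classifier "
    "--model_base_path=${MODEL_DIR} >server.log 2>&1 &"
)
```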

Take a look at the end of the logs printed out by the TF model server:

The server was able to successfully load and serve the model!

Since you are going to interact with the server through HTTP/REST, you should point the requests to localhost:8501, as printed in the logs above.

Make a request to your model in TensorFlow Serving

At this point you already know what your test data looks like. You are going to make predictions for color images of 150x150, in batches of 32 images (represented by numpy arrays) at a time.

Since REST expects the data to be in JSON format, and JSON does not support custom Python data types such as numpy arrays, you first need to convert these arrays into nested lists.

TF Serving expects a field called instances, which contains the input tensors for the model. To pass your data to the model, you should create a JSON object with your data as the value for the key instances.
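For instance, building on the data_imgs batch from earlier:

```python
import json

# Convert the numpy batch into nested lists and wrap it in the "instances"
# field that TF Serving expects.
payload = json.dumps({"instances": data_imgs.tolist()})
```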

Make REST requests

We'll send a predict request as a POST request to our server's REST endpoint, and pass it the batch of 32 images.

Remember that the endpoint that serves the model is located at http://localhost:8501. However, this URL needs some additional path components to properly handle the request. You should append /v1/models/name-of-your-model:predict to it so TF Serving knows which model to look up and that it should perform a predict task.

You should also pass the data containing the nested list that represents the 32 images to the request, along with a headers dictionary that specifies the content type, which is JSON in this case.

After you get a response from the server you can get the predictions out of it by inspecting the predictions field of the JSON that the response returned.
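Putting those pieces together, a sketch of the full request (the servable name animal_classifier must match the --model_name used when the server was started):

```python
import json
import requests

headers = {"content-type": "application/json"}

# POST the JSON payload to the model's predict endpoint.
response = requests.post(
    "http://localhost:8501/v1/models/animal_classifier:predict",
    data=payload,
    headers=headers,
)

# The "predictions" field holds one softmax vector per image in the batch.
predictions = json.loads(response.text)["predictions"]
```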

You might find it odd that the prediction returned 3 values for each image. However, remember that the last layer of the model is a softmax function, so it returns a value for each one of the classes. To get the actual predictions you need to find the maximum argument:
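For example:

```python
import numpy as np

# Pick the class with the highest softmax score for each image.
predicted_classes = np.argmax(predictions, axis=1)
```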

Now you have a predicted class for each one of the test images! Nice!

To test how well the model is performing, let's plot the first 10 images along with their true and predicted labels:
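A sketch of that comparison using matplotlib; class_names is an assumed mapping from integer labels to readable names, so adjust it to match the generator's class indices:

```python
import matplotlib.pyplot as plt

# Assumed label-to-name mapping -- check test_generator.class_indices for the real order.
class_names = ["bird", "cat", "dog"]

fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(data_imgs[i])
    ax.set_title(
        f"true: {class_names[int(labels[i])]}\n"
        f"pred: {class_names[int(predicted_classes[i])]}"
    )
    ax.axis("off")
plt.show()
```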

To do some further testing you can plot more images out of the 32 or even try to generate a new batch from the generator and repeat the steps above.

Optional Challenge

Try recreating the steps above for the next batch of 32 images:

Solution

If you want some help, the answer can be found in the next cell:

Conclusion

Congratulations on finishing this ungraded lab!

Now you should have a deeper understanding of TF Serving's internals. In the previous ungraded lab you saw how to use TFS alongside Docker; in this one you saw how TFS and tensorflow-model-server work on their own. You also saw how to save a model and what the structure of a SavedModel looks like.

Keep it up!