Bring your own pre-trained MXNet or TensorFlow models into Amazon SageMaker

Not only does Amazon SageMaker provide easy scalability and distribution for training and hosting ML models, it is also modular, so that training a model is decoupled from deploying it. This means that models trained outside of Amazon SageMaker can be brought into SageMaker just to be deployed. This is very useful if you have models that are already trained and you want to use only the hosting part of SageMaker instead of the entire pipeline. It is also useful if you don't train your own models, but instead buy pre-trained models.

This blog post explains how to deploy models that you trained in TensorFlow or MXNet on Amazon SageMaker. The following is an overview of the entire process.

  • Step 1: Model definitions are written in a framework of choice.

  • Step 2: The model is trained in that framework.

  • Step 3: The model is exported and model artifacts that can be understood by Amazon SageMaker are created.

  • Step 4: Model artifacts are uploaded to an Amazon S3 bucket.

  • Step 5: Using the model definitions, artifacts, and the Amazon SageMaker Python SDK, a SageMaker model is created.

  • Step 6: The SageMaker model is deployed as an endpoint.

What is a model and how do I get my hands on one?

A trained model file is simply the model artifact created by the training process. It contains the values of all the parameters of the model, along with additional metadata that describes how these parameters are connected to each other (the model architecture). Different frameworks have different ways of exporting model files.

TensorFlow

TensorFlow models can be exported from a trained TensorFlow Estimator. The model files have to follow certain conventions so that Amazon SageMaker can understand them. Consider the following definition of a simple TensorFlow Estimator:

import tensorflow as tf

INPUT_TENSOR_NAME = 'inputs'

def estimator_fn(run_config, params):
    feature_columns = [tf.feature_column.numeric_column(INPUT_TENSOR_NAME, shape=[4])]
    return tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                      hidden_units=[10, 20, 10],
                                      n_classes=3,
                                      config=run_config)

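Before it can be exported, this Estimator has to be trained (Step 2 in the overview). The following is a minimal sketch of what that might look like; the training data and hyperparameters are purely illustrative, and the resulting classifier object is the Estimator that gets exported later:

import numpy as np

# Illustrative 4-dimensional training data with 3 classes.
train_x = np.random.rand(120, 4).astype(np.float32)
train_y = np.random.randint(0, 3, size=(120,))

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={INPUT_TENSOR_NAME: train_x},
    y=train_y,
    batch_size=32,
    num_epochs=None,
    shuffle=True)

classifier = estimator_fn(run_config=tf.estimator.RunConfig(), params=None)
classifier.train(input_fn=train_input_fn, steps=1000)
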
The estimator returned by estimator_fn is a classifier that expects as input a 4-dimensional numeric feature column in a dictionary keyed by INPUT_TENSOR_NAME. All TensorFlow model exports need a serving input function, whose job is to provide an input pipeline to the exported model when it is deployed. Fortunately, TensorFlow provides a builder to create this serving function. Consider the following piece of code:

def serving_input_fn():
    feature_spec = {INPUT_TENSOR_NAME: tf.FixedLenFeature(dtype=tf.float32, shape=[4])}
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()

This method creates a feature_spec, which is a dictionary that maps each input name (a string) to the feature values that our TensorFlow model expects. It should have the same structure and dimensionality as the input features that estimator_fn expects. We can then use the TensorFlow export method to export our model:

import tarfile

# 'classifier' is the trained Estimator defined above.
exported_model = classifier.export_savedmodel(export_dir_base='export/Servo/',
                                              serving_input_receiver_fn=serving_input_fn)

Amazon SageMaker expects TensorFlow models to be tarballs (tar.gz files) that, when extracted, contain all of the model artifacts in the directory export/Servo. The following code shows how this can be accomplished:

with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

MXNet

Analogous to TensorFlow Estimators, MXNet uses Modules to represent models. In addition to the model params file saved with the checkpoint method, Amazon SageMaker needs the input data shape stored as a .json file. We can simply extend the MXNet checkpoint saver to create model artifacts that Amazon SageMaker can understand. The following piece of code accomplishes that:

import os
import json
import tarfile

os.mkdir('model')

# 'model' is a trained MXNet Module. Save the symbol and params files,
# plus the input data shape that Amazon SageMaker needs.
model.save_checkpoint('model/model', 0)
with open('model/model-shapes.json', 'w') as shapes:
    json.dump([{"shape": model.data_shapes[0][1], "name": "data"}], shapes)

# Tar the artifacts without the 'model/' directory prefix.
def flatten(tarinfo):
    tarinfo.name = os.path.basename(tarinfo.name)
    return tarinfo

tar = tarfile.open("model.tar.gz", "w:gz")
tar.add("model", filter=flatten)
tar.close()
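
The code above assumes that model is an MXNet Module that has already been bound and trained. For reference, the following is a minimal sketch of how such a Module might be created and trained; the network, data, and hyperparameters are purely illustrative:

import mxnet as mx
import numpy as np

# Illustrative network: a small multi-layer perceptron with 3 output classes.
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data, num_hidden=64, name='fc1')
net = mx.sym.Activation(net, act_type='relu', name='relu1')
net = mx.sym.FullyConnected(net, num_hidden=3, name='fc2')
net = mx.sym.SoftmaxOutput(net, name='softmax')

# Illustrative training data with a 4-dimensional feature vector.
train_x = np.random.rand(120, 4).astype('float32')
train_y = np.random.randint(0, 3, size=(120,)).astype('float32')
train_iter = mx.io.NDArrayIter(train_x, train_y, batch_size=32, shuffle=True)

model = mx.mod.Module(symbol=net, data_names=['data'], label_names=['softmax_label'])
model.fit(train_iter, optimizer='sgd', num_epoch=10)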

How to host the model on Amazon SageMaker

Before hosting a model, the model artifacts have to be moved to an Amazon S3 bucket. After the artifacts are in the S3 bucket, the model can be deployed as an endpoint on Amazon SageMaker. To perform these tasks, we can make use of the Python SDK that accompanies Amazon SageMaker, which conveniently provides an API to upload our models to S3.

import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

You can verify this upload by checking the Amazon SageMaker default bucket that is created in your AWS account. The prefix is the directory under which the model is stored. Converting this model in S3 into an Amazon SageMaker endpoint is a two-step process. First, a SageMaker model is created from the artifacts; this prepares the model to be deployed.

For TensorFlow model artifacts:

from sagemaker.tensorflow.model import TensorFlowModel

sagemaker_model = TensorFlowModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role=role,
                                  entry_point='classifier.py')

For MXNet model artifacts:

from sagemaker.mxnet.model import MXNetModel

sagemaker_model = MXNetModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                             role=role,
                             entry_point='classifier.py')
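
In both cases, entry_point refers to the Python script that contains the model definitions. For the TensorFlow model, a minimal classifier.py can simply collect the estimator_fn and serving_input_fn shown earlier; the sketch below assumes that layout (the exact hook functions the hosting container expects depend on the framework and container version):

# classifier.py -- sketch of a TensorFlow entry point
import tensorflow as tf

INPUT_TENSOR_NAME = 'inputs'

def estimator_fn(run_config, params):
    feature_columns = [tf.feature_column.numeric_column(INPUT_TENSOR_NAME, shape=[4])]
    return tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                      hidden_units=[10, 20, 10],
                                      n_classes=3,
                                      config=run_config)

def serving_input_fn():
    feature_spec = {INPUT_TENSOR_NAME: tf.FixedLenFeature(dtype=tf.float32, shape=[4])}
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()

For the MXNet model, the hosting container's default model-loading logic uses the saved checkpoint and the model-shapes.json file, so the entry point often does not need to define a custom model_fn; consult the SageMaker MXNet documentation for the hooks your container version supports.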

The second step is to deploy this SageMaker model to an endpoint:

predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

These arguments give us options to choose the number and type of instances on which to deploy our model. This innocent-looking call does a lot of things under the hood. It creates a container with our files and our model in it, provisions hardware according to our specifications, and returns an endpoint that we can use for inference. This method might take several minutes to finish. After the endpoint is created, the SDK also provides methods to query it. In the case of our TensorFlow model, we had a feature dimension of 4. To query the TensorFlow endpoint, we could do the following:

sample = [6.4,3.2,4.5,1.5]
predictor.predict(sample)
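
The MXNet endpoint is queried the same way; the input just has to match the data shape recorded in model-shapes.json. For example, with the illustrative 4-dimensional Module sketched earlier:

import numpy as np

# A single sample with the same feature dimension the Module was trained on.
sample = np.random.rand(1, 4).astype('float32')
predictor.predict(sample)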

After we are done with the endpoint, it should be deleted so that we aren't charged for compute resources we are no longer using. The SDK also provides methods to delete endpoints:

sagemaker.Session().delete_endpoint(predictor.endpoint)

Keep in mind that when you bring your own models into the SageMaker frameworks, the Amazon SageMaker endpoint containers are maintained independently for particular versions of the frameworks. For instance, models that are trained on MXNet 1.0 can't be deployed on a SageMaker container that supports MXNet version 0.12, so the artifact builder needs to ensure that compatible artifacts are built. The SageMaker Python SDK lets you bring in models for multiple framework versions by passing the appropriate parameters for version selection; the SDK documentation shows how to use these parameters.
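
For example, the SageMaker Python SDK exposes a framework_version parameter on the framework model classes. The following sketch assumes an SDK version that supports this argument:

# Assumes an SDK version that accepts framework_version; pick the version your model was trained with.
sagemaker_model = MXNetModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                             role=role,
                             entry_point='classifier.py',
                             framework_version='1.0')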

Notebooks

The code and details in this blog post are also available in notebook format. There are two notebooks that demonstrate how to bring your models into Amazon SageMaker for deployment, one each for TensorFlow and MXNet. Refer to these notebooks for a complete tutorial.

Conclusion

In this blog post, we showed how you can use Amazon SageMaker to deploy TensorFlow and MXNet models that were trained outside of Amazon SageMaker. We also demonstrated how to export models from these frameworks into a format that Amazon SageMaker can consume.


About the Author

Ragav Venkatesan is a Research Scientist with AWS Deep Learning. He has an MS in Electrical Engineering and a PhD in Computer Science from Arizona State University. His current areas of research include neural network compression and computer vision algorithms.