Data scientists and developers can use the Amazon SageMaker fully managed machine learning service to build and train machine learning (ML) models, and then directly deploy them into a production-ready hosted environment.
In this blog post we’ll show you how to use Amazon SageMaker to do transfer learning using a TensorFlow container with our own code for training and inference.
Transfer learning is a well-known technique used in computer vision to re-train an existing, already-trained neural network, such as AlexNet or ResNet[1], on additional custom labels. Amazon SageMaker also supports transfer learning for image classification through its built-in image classification algorithm, which lets you re-train a ResNet[1] network on your own labeled image data; see the image classification documentation for more details on Amazon SageMaker. To understand when to use transfer learning, and for related guidelines, refer to this.
Although the Amazon SageMaker built-in image classification algorithm is great for a variety of use cases, you might have scenarios where you need a different combination of the pre-trained network and the image data on which it was trained. For example, some of the criteria to keep in mind are the similarity of the new dataset to the original dataset, the size of the new dataset, the number of labels required, the accuracy of the model, the size of the trained model, and, last but not least, the amount of compute power needed to re-train. If, for example, you are trying to deploy the trained model on a handheld device, you might be better off with a smaller-footprint model such as MobileNet. Alternatively, if you want a more compute-efficient model, Xception would be better than VGG16 or Inception.
In this blog post, we take an Inception V3 network pre-trained on the ImageNet dataset and re-train it on the Caltech-256 dataset (Griffin, G., Holub, A. D., Perona, P. The Caltech 256. Caltech Technical Report). Amazon SageMaker makes it very easy to bundle your own container and import it into Amazon Elastic Container Registry (Amazon ECR). Alternatively, you can start from the container provided by Amazon SageMaker: https://github.com/aws/sagemaker-tensorflow-containers. We will customize the Amazon SageMaker TensorFlow container with our own transfer learning code written in the TensorFlow framework, import this container into Amazon ECR, and use it for model training and inference.
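As a refresher on the technique itself, here is a minimal, illustrative sketch of transfer learning with the Keras API in TensorFlow. It is not the code we bundle into the container later in this post (that code is based on TensorFlow's image retraining sample); it only shows the general pattern of freezing a pre-trained Inception V3 base and training a new classification head for the custom labels.

import tensorflow as tf

# Illustrative only -- not the container code used later in this post.
num_classes = 257  # Caltech-256: 256 object categories plus a "clutter" category

# Inception V3 pre-trained on ImageNet, without its original classification head.
base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False)
base.trainable = False  # freeze the pre-trained feature extractor

# New classification head trained on the custom labels.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_dataset, epochs=5)  # train only the new head on the new images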
I hope this gives you sufficient background. Now let's start putting it together.
Launching and preparing the environment
We’ll use the Jupyter notebook instance provided by Amazon SageMaker to customize the provided TensorFlow container. We will import the container into this notebook environment before registering it in Amazon ECR, and the same container will be used for both training and inference. Note that we’ll build on a previous blog post, which explains the basics of importing your own container; the difference here is that we are customizing the TensorFlow container.
Launch an Amazon SageMaker notebook instance
Log in to the AWS Management Console and go to the Amazon SageMaker console. The notebook instance has all the building blocks needed to build a custom container and the Docker container image.
Open the Amazon SageMaker Dashboard. Choose Create notebook instance.
For the purposes of this blog post, we will do the following:
- Place the notebook instance in a subnet inside a VPC that has internet access.
- Pick any of the instance types from the drop-down list; we recommend ml.m4.xlarge at a minimum.
- Create a new IAM role or attach an existing one. Make sure that this IAM role allows Amazon ECR full access and S3 bucket access. This access is in addition to the default access that you get when creating a new IAM role for Amazon SageMaker. For this blog post, these permissions should be sufficient to get started; however, if you want to further narrow the scope of permissions and limit them to specific resources, see Amazon SageMaker Roles in the documentation.
- Also, create a security group in this VPC that has at least ports 80 and 8888 open.
- Enable internet access for this notebook. For the purposes of this blog, this should be sufficient, but for additional security considerations of a notebook instance, review Notebook Instance Security in the documentation.
- Keep the rest of the options at the default settings.
Choose Create notebook instance and you should see an instance launching (it's in Pending status).
Wait for the status to change from Pending to InService.
After the notebook instance is in service, choose Open. You now have a fully working Jupyter notebook with a variety of environments pre-configured and pre-built for you.
A screen similar to the following opens.
You can see all the pre-configured environments by choosing New in the top-right corner.
Choose Terminal to launch a terminal.
We will be using this terminal for most of this walkthrough.
Get the Amazon SageMaker provided TensorFlow container
Amazon SageMaker provides containers for TensorFlow, MXNet, and Chainer. We will customize the TensorFlow container with our own training and inference code. To begin, clone the TensorFlow container Git repository from AWS.
In the terminal window, execute the following command:
$ git clone https://github.com/aws/sagemaker-tensorflow-containers.git
Next, clone the TensorFlow code for transfer learning and inference into a folder called tfblog.
Check the files present in tfblog/tensorflow_code and look for the important ones, such as train, which is used while training the model. Amazon SageMaker invokes the Docker container so that this script is called during the training process. Similarly, a script called predictor.py loads the trained model parameters and, through a wrapper function in serve, waits for prediction queries. It parses the input data (image, text, or CSV), calls the model function, and returns the predictions to the caller. Note that the code in train and predictor.py is adapted from the TensorFlow image retraining and image labeling sample code; the primary changes ensure that the inference code runs as a Flask application instance and responds to ping and invocation requests from Amazon SageMaker. This sample notebook shows you how to package the Docker container and the folder structure required for it to work in Amazon SageMaker.
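For context, Amazon SageMaker expects a serving container to answer two HTTP routes: GET /ping for health checks and POST /invocations for predictions, served on port 8080 (typically behind nginx and gunicorn started by the serve script). The following is a simplified, self-contained sketch of that Flask structure; classify() here is a hypothetical stand-in for the real inference code in predictor.py.

# Simplified sketch of the SageMaker serving contract implemented by predictor.py.
import flask

app = flask.Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Amazon SageMaker calls this periodically to confirm the container is healthy.
    return flask.Response(response="\n", status=200, mimetype="application/json")

@app.route("/invocations", methods=["POST"])
def invocations():
    # Amazon SageMaker forwards each InvokeEndpoint request here.
    image_url = flask.request.data.decode("utf-8")  # the client sends a text/plain image URL
    labels = classify(image_url)  # hypothetical helper standing in for the real inference code
    return flask.Response(response=labels, status=200, mimetype="text/plain")

def classify(image_url):
    # Placeholder so the sketch is self-contained; the real code loads the retrained
    # Inception V3 graph (once, then cached) and classifies the image behind image_url.
    return "label\tprobability"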
Now build the container and give it a name, such as 'blog_img_5' or a name that you prefer. This image will be registered in Amazon ECR:
$ bash build_and_push.sh blog_img_5    # or whatever name you prefer for the image
You can use the Caltech-256 dataset, as mentioned earlier.
Make sure that the folder structure has all the images belonging to a label grouped in their own folder:
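For example, the layout might look like the following (the category folder names come from Caltech-256; the image file names are just illustrative):

caltech256/
    001.ak47/
        0001.jpg
        0002.jpg
        ...
    002.american-flag/
        0001.jpg
        ...
    ...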
Now, let's test the training locally from within the SageMaker notebook instance. Go to the folder called local_test, which has utilities for building, deploying, and running predictions locally before publishing to Amazon SageMaker.
This is very useful for testing and debugging purposes. Note the helper script called train_local.sh and how it launches the container, mounts the training data path, and passes ‘train’ as the entry point.
Copy a small amount of data to test the functionality (for example, to ensure that 'train' and 'predict' are invoked), not the training accuracy.
To start the local training:
$ bash train_local.sh blog_img_5
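Once local testing looks good, the walkthrough trains the model in Amazon SageMaker using the container image we pushed to Amazon ECR and deploys it to a real-time endpoint named blog-ep-new. A rough sketch of those steps with the version 1.x SageMaker Python SDK looks like the following; the account ID, Region, S3 paths, and instance types are placeholders that you would replace with your own values.

# Rough sketch (SageMaker Python SDK v1 syntax): train with the custom container
# pushed by build_and_push.sh, then deploy a real-time endpoint named blog-ep-new.
# The account ID, Region, bucket names, and instance types below are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

image = '123456789012.dkr.ecr.us-east-1.amazonaws.com/blog_img_5:latest'  # your ECR image URI

estimator = Estimator(image_name=image,
                      role=role,
                      train_instance_count=1,
                      train_instance_type='ml.p2.xlarge',
                      output_path='s3://your-bucket/output',   # where model artifacts are written
                      sagemaker_session=session)

# The training channel points at the Caltech-256 images uploaded to Amazon S3,
# organized one folder per label as described above.
estimator.fit('s3://your-bucket/caltech256/')

# Deploy the trained model behind a real-time endpoint.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge',
                             endpoint_name='blog-ep-new')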
After the endpoint is in service, you can test it from the notebook by sending it the URL of an image. The endpoint receives the image URL as plain text and returns the model's prediction:

import boto3

runtime = boto3.Session().client(service_name='runtime.sagemaker')

# Send the URL of a test image to the endpoint as plain text.
url = 'https://images.unsplash.com/photo-1519046947096-f43d6481532b'
response = runtime.invoke_endpoint(EndpointName='blog-ep-new',
                                   ContentType='text/plain',
                                   Body=url)

result = response['Body'].read()
result  # display the prediction returned by the endpoint
Clean-up and delete the resources
Remember to clean up the resources that were provisioned so you can avoid costs after running this scenario.
Delete endpoint
To delete the endpoint, highlight the endpoint that was created and select Delete:
Delete endpoint configuration
Similarly, delete the endpoint configuration by selecting the appropriate endpoint configuration and selecting Delete:
Delete model
Next we’ll delete the model:
Delete notebook instance
Finally, delete the notebook instance.
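If you prefer to script the clean-up instead of using the console, the same resources can be removed with boto3. This is just a convenience sketch; the endpoint configuration, model, and notebook instance names below are assumptions that you should replace with the names shown in your own console.

# Optional: clean up programmatically with boto3 instead of the console.
# The endpoint configuration, model, and notebook instance names are assumptions.
import boto3

sm = boto3.client('sagemaker')

sm.delete_endpoint(EndpointName='blog-ep-new')
sm.delete_endpoint_config(EndpointConfigName='blog-ep-new')  # replace with your endpoint config name
sm.delete_model(ModelName='blog-ep-new')                     # replace with your model name

# A notebook instance must be stopped before it can be deleted.
# sm.stop_notebook_instance(NotebookInstanceName='your-notebook-instance')
# sm.delete_notebook_instance(NotebookInstanceName='your-notebook-instance')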
Conclusion
To conclude, I hope this blog post gave you some ideas for taking existing code in a framework, bundling it into an Amazon SageMaker-provided container, and porting it over to Amazon SageMaker. The containerized approach that this service provides makes it very easy for developers and data scientists to focus on their core strengths. You can use any framework you choose while still leveraging the wide array of AWS services, such as Amazon S3, Amazon CloudWatch, Amazon ECR, and IAM, to scale and operate your infrastructure.
About the Author
Amit Sharma is an AWS solutions architect specializing in Analytics and Machine Learning services. He helps various AWS customers and partners with technical guidance on related projects, enabling them to leverage these services for their business benefit.