Semantic Segmentation algorithm is now available in Amazon SageMaker

Amazon SageMaker is a managed and infinitely scalable machine learning (ML) platform. With this platform, it is easy to build, train, and deploy machine learning models. Amazon SageMaker already has two popular built-in computer vision algorithms for image classification and object detection. The Amazon SageMaker image classification algorithm learns to categorize images into a set of pre-defined categories. The Amazon SageMaker object detection algorithm learns to draw bounding boxes and identify objects in the boxes. Today, we are excited to announce that we are enhancing our computer vision family of algorithms with the launch of the Amazon SageMaker semantic segmentation algorithm.

An example of the Amazon SageMaker semantic segmentation algorithm at work. Photo by Pixabay via PEXELS.

Semantic segmentation (SS) is the task of classifying every pixel in an image with a class from a known set of labels. The segmentation output is usually represented as different RGB (or grayscale, if the number of classes is fewer than 255) values. Therefore the output is a matrix (or grayscale image) with the same shape as the input image. This output image is also called a segmentation mask. With the Amazon SageMaker semantic segmentation algorithm, you can train your models with your own dataset, plus you can use our pre-trained models for favorable initialization. The algorithm is built using the MXNet Gluon framework and the Gluon CV toolkit. It provides an option of three built-in, state-of-the-art algorithms with which you can learn the semantic segmentation model:

All algorithms have two distinct components:

An encoder or a backbone
A decoder.

The backbone is a network that produces reliable activation maps of image features. The decoder is a network that constructs the segmentation mask from the encoded activation maps. Amazon SageMaker semantic segmentation provides a choice of pre-trained or randomly initialized ResNet50 or ResNet101 as options for backbones. The backbones come with pre-trained artifacts that were originally trained on the ImageNet classification task. These are reliable pre-trained artifacts that users can use to fine-tune their FCN or PSP backbones for segmentation. Alternatively, users can initialize these networks from scratch. Decoders are never pre-trained.

The algorithm can be trained using P2/P3 type Amazon Elastic Compute Cloud (Amazon EC2) instances in single machine configurations. Trained models from the algorithm can be hosted on all CPU and GPU instances supported by Amazon SageMaker. However, training on CPU machines is always more expensive than GPU machines since we are able to make use of advanced math libraries to fully use GPUs for convolutional networks. Therefore, we restrict training only to GPU machines. When trained and properly hosted, the algorithm can either generate segmentation masks for a query image as a PNG file or produce a probability score for each pixel for each class. The algorithm can handle a variety of segments of varying sizes, shapes and scales natively.

Getting started

Amazon SageMaker semantic segmentation expects the customer’s training dataset to be on Amazon Simple Storage Service (Amazon S3). Once trained, it produces the resulting model artifacts on Amazon S3. Amazon SageMaker takes care of starting and stopping Amazon EC2 instances for the customers during training. After the model is trained, it can be deployed to an endpoint. For a general, high-level overview of the Amazon SageMaker workflow, see the Amazon SageMaker documentation. The Amazon SageMaker semantic segmentation algorithm can be trained using several interfaces. The AWS Management Console interface has a simple form-like structure that can be used to kick off training jobs and creating endpoints. There are also APIs that are available in Python that are explained using the associated notebook.

I/O format

The Amazon SageMaker semantic segmentation algorithm will supports the following file input format. This format allows the user to directly pass images. The dataset in Amazon S3 is expected to be presented in four channels: two for train and two for validation using four directories, two for images and two for annotations. Annotations are expected to be uncompressed PNG images. The dataset may also have a label map that describes how the annotation mappings are established. If not, a default will be used. The algorithm is capable of working with annotations from various annotation systems and standard benchmarking datasets. The algorithm also supports an augmented manifest for PIPE mode training straight from S3. Refer to the documentation on how the I/O format works. The algorithm allows inputs to be supplied using an augmented manifest, which works in Pipe mode straight from S3.

Inference formats

To query a trained model using the model’s endpoint, an image needs to be supplied along with an Accept Type denoting the type of output required. Depending on the request, the algorithm will output a PNG file with a segmentation mask in the same format as the labels itself, or it outputs class probabilities encoded in a protobuf format. Refer to the documentation for more information on AcceptTypes.

Training job

Note that the Amazon SageMaker semantic segmentation algorithm only supports GPU instances for training. We recommend using GPU instances with more memory for training with large batch sizes. While the algorithm trains, you can monitor the progress through either at the Amazon SageMaker notebook or Amazon CloudWatch. After the training is done, the trained model artifacts will be uploaded to the Amazon S3 output location that you specified in the training configuration. To deploy the model as an endpoint, you can choose to use either a CPU or a GPU instance.

Performance numbers

The following numbers demonstrate some performance numbers for the Amazon SageMaker semantic segmentation algorithm. We trained on the PASCAL VOC12 training dataset and observe the mean Intersection-over-Union (mIOU) on the VOC12 validation dataset with a crop size of 240X240. For the experiment, we used backbone = "resnet-50" and “rmsprop” as the optimizer with default parameters (momentum = 0.9, weight_decay = 0.0001). We trained the model for 20 epochs and achieved an mIOU of 0.62. Using backbone="resnet-50", we observe an approximately 5.83x speedup in training speed while going from a single GPU (ml.p3.2xlarge) to 8 GPUs in (ml.p3.16xlarge) instances with a mini_batch_size of 8 for the former and 64 for the latter. Analogously, we also observed greater than 2.5x speed increase when moving from ml.p2.16xlarge to ml.p3.16xlarge multi-GPU instances.

Notebooks

An example of object detection is available in the SageMaker notebooks repository. Refer to this for a complete tutorial and some recommendations for data preparation and hyperparameters.

Conclusion

In this blog post we announced the launch of the Amazon SageMaker Semantic Segmentation algorithm. We described how to get started with training your own semantic segmentation models, and we presented a few performance numbers. We look forward to hearing from you as you set up your own implementation of semantic segmentation.

About the Authors

Ragav Venkatesan is a Research Scientist with AWS AI Labs. He has an MS in Electrical Engineering and a PhD in Computer Science from Arizona State University. His current area of research includes Neural Network compression and Computer Vision algorithms for Amazon SageMaker. Outside of work, Ragav is a session bassist and producer at Thaalam Studios.**

Saksham Saini did his BS in Computer Engineering from University of Illinois at Urbana-Champaign. He is currently working on building highly optimized and scalable algorithms for Amazon SageMaker. Outside work, he enjoys reading, music and traveling.

Satyaki Chakraborty is an MS student at Carnegie Mellon University studying computer vision. He contributed to Amazon SageMaker Semantic Segmentation during his summer internship.

Xiong Zhou is an Applied Scientist with AWS AI Labs. He has a PhD in Electrical and Electronics Engineering from University of Houston. His current research focus involves developing domain adaptation and active learning algorithms. He is also working on building computer vision algorithms for Amazon SageMaker.

Luka Krajcar is a Software Development Engineer on the AWS AI Algorithms team. He received his M.S. in Computer Science at the Faculty of Electrical Engineering and Computing at the University of Zagreb. Outside of work, Luka enjoys reading fiction, running, and video gaming.

Hang Zhang is an Applied Scientist with Amazon AI. He has a PhD from Rutgers University. He is currently working with the GluonCV team.