The AWS Deep Learning AMIs for Ubuntu and Amazon Linux now come with an optimized build of TensorFlow 1.9 custom-built directly from source and fine-tuned for high performance training across Amazon EC2 instances. In addition, the AMIs come with the latest Apache MXNet 1.2 with several performance and usability improvements, the new Keras 2-MXNet backend with high performance multi-GPU training support, and a new MXBoard tool for improved debugging and visualization for training MXNet models.
Faster training with optimized TensorFlow 1.9 and Horovod
The Amazon Machine Images (AMIs) now come with a compute-optimized build of TensorFlow 1.9 custom-built directly from source to accelerate training performance on the Intel Xeon Platinum processors powering Amazon EC2 C5 instances. Training a ResNet-50 benchmark with synthetic ImageNet dataset using our custom build of TensorFlow 1.9 on a C5.18xlarge instance type was 7X faster than training on the stock TensorFlow 1.9 binaries. The AMIs also come with a GPU-optimized build of TensorFlow 1.9 configured with NVIDIA CUDA 9 and cuDNN 7 to take advantage of mixed precision training on the Volta V100 GPUs powering Amazon EC2 P3 instances. A Deep Learning AMI automatically deploys the high performance build of TensorFlow optimized for the EC2 instance of your choice when you activate the TensorFlow virtual environment for the first time.
For developers looking to scale their TensorFlow training from a single GPU to multiple GPUs, the AWS Deep Learning AMIs come with Horovod, which is fully optimized for efficiently utilizing distributed training cluster topologies composed of Amazon EC2 P3 instances. Horovod is an open source distributed training framework based on the Message Passing Interface (MPI) model. This is a popular standard for passing messages and managing communication between nodes in a high-performance distributed computing environment. For optimizing Horovod on AWS Deep Learning AMI, we have fine-tuned the MPI rank placement pattern to place each physical CPU socket in the Amazon EC2 P3 cluster closest to its corresponding GPU, thus minimizing the overhead from frequent I/O and memory copy operations. In addition, we have optimized the allreduce operation used in combining tensors generated by individual nodes during distributed training. We have fine-tuned the fusion threshold parameter that governs the size of memory buffer used for reducing a batch of tensors together. Although a small buffer size limits the number of reduction operations that can be combined in one batch, a large buffer size can introduce additional wait times. You can run our ResNet-50 distributed training script that trains a ResNet-50 network with ImageNet dataset on a cluster of 8 Amazon EC2 P3.16xlarge instances (64 Volta V100 GPUs) using the EC2 P3-optimized Horovod on AWS Deep Learning AMI.
Enhanced performance and interoperability with Apache MXNet 1.2
The Deep Learning AMIs now come with the latest release of Apache MXNet 1.2 that brings ease of use, faster performance, and enhanced interoperability. MXNet 1.2 adds a new Scala-based inference API that offers thread-safe high-level APIs for performing predictions with deep learning models trained with MXNet. The new Intel MKL-DNN integration in MXNet 1.2 accelerates neural network operators such as convolution, deconvolution, and pooling on compute-optimized C5 instances. The enhanced FP16 support accelerates mixed precision training on Tensor Cores of NVIDIA Volta V100 GPUs powering Amazon EC2 P3 instances. In addition, MXNet 1.2 now comes with a new Open Neural Network Exchange Format (ONNX) module for easily importing ONNX models into the MXNet symbolic interface.
High performance multi-GPU training with the MXNet backend for Keras 2
The Deep Learning AMI now comes pre-installed with the new Keras-MXNet deep learning backend. Keras is a high-level neural network API written in Python that is popular for its quick and easy prototyping of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Keras developers can now use the high-performance MXNet deep learning engine as their backend for distributed training of CNNs and RNNs. Developers can design in Keras, train with Keras-MXNet, and run inference in production, at scale with MXNet.
You can use the Keras MXNet benchmarks to test the faster training and efficient scaling across multiple GPUs. Training a ResNet50 network with the ImageNet dataset on a p3.16xlarge EC2 instance showed near linear scaling of throughput going from 1 GPU to 8 GPUs. Follow our step-by-step guide to get started with Keras2-MXNet.**
Improved debugging support with MXBoard
The AMIs now come pre-installed with MXBoard, a Python package that provides APIs for logging MXNet data for visualization in TensorBoard. With MXBoard, developers can easily debug and visualize their MXNet model training with support for a range of visualizations including histograms, convolutional filters, visual embedding, and more. Follow our step-by-step guide to get started with MXBoard.
Getting started with the Deep Learning AMIs
You can quickly get started with the AWS Deep Learning AMIs by using our getting started tutorial and, you can go to our developer guide for more tutorials, resources, and release notes. The latest AMIs are now available on the AWS Marketplace. You can also subscribe to our discussion forum to get new launch announcements and post your questions.
About the Author
Sumit Thakur is a Senior Product Manager for AWS Deep Learning. He works on products that make it easy for customers to get started with deep learning on cloud, with a specific focus on making it easy to use engines on Deep Learning AMI. In his spare time, he likes connecting with nature and watching sci-fi TV series.