Accelerate model training using faster Pipe mode on Amazon SageMaker

Amazon SageMaker now comes with a faster Pipe mode implementation, significantly accelerating the speeds at which data can be streamed from Amazon Simple Storage Service (S3) into Amazon SageMaker while training machine learning models.

Pipe mode offers significantly better read throughput than the File mode that downloads data to the local Amazon Elastic Block Store (EBS) volume prior to starting the model training. This means your training jobs start sooner, finish quicker, and need less disk space, reducing your overall cost to train machine learning models on Amazon SageMaker. For example, we conducted internal benchmarks earlier this year when we launched Pipe Input Mode for Amazon SageMaker built-in algorithms. We learned that start times were reduced by up to 87 percent on a 78 GB training dataset. In addition, we saw that throughput was twice as fast in some benchmarks, resulting in up to 35 percent reduction in total training time.

Overview

Amazon SageMaker supports two mechanisms for transferring training data: File mode and Pipe mode. In File mode, the training data is downloaded first to an encrypted EBS volume attached to the training instance prior to commencing the training. However, in Pipe mode the input data is streamed directly to the training algorithm while it is running. This continuous streaming of data enables a few significant advantages. First, the startup time of a training job becomes independent of the size of the input data, resulting in much quicker startup, especially while training on gigabyte- and petabyte-scale datasets. Furthermore, you don’t have to pay for a large disk volume to download large datasets. Finally, if your training algorithm is I/O-bound, the highly concurrent, high-throughput reading mechanism employed by Pipe mode can significantly speed up your model training.

Higher I/O throughput with faster Pipe mode

The latest implementation of Pipe mode provides higher data streaming throughputs than before. The following chart demonstrates the throughput improvements in Pipe mode compared to when we launched Pipe mode support earlier this year. For apples to apples comparison, the streaming throughput numbers are baselined against that of File mode as measured across instance types supported by Amazon SageMaker training.

As you can see, streaming training data using Pipe mode is now up to three times faster than before in some cases. Pipe mode support is available out of the box for Amazon SageMaker built-in algorithms. Now we will present an example of how you can take advantage of the Pipe mode if you are bringing your own custom training algorithms to Amazon SageMaker.

Writing Pipe mode training code

In Pipe mode, data is pre-fetched from Amazon S3 at high-concurrency and throughput and streamed into Unix Named Pipes (aka FIFOs). There is one FIFO per channel per epoch. The algorithm must open the FIFO for reading and read through to (or optionally abort mid-stream). It must close its end-of-the-file descriptor when done. It can then optionally wait for the next epoch’s FIFO to get created and then it can commence reading. The algorithm iterates through epochs until achieves its completion criteria.

This notebook example has an extremely simple Pipe mode “training” algorithm implementation in Python. It conforms to the specifications required by Amazon SageMaker training It reads data in Pipe mode but does nothing with the data. It simply reads it and throws it away. The example is written this way to illustrate exactly what’s needed to support Pipe mode without complicating the code with a real training algorithm.

The train.py Python program contains the code. The following code snippet iterates through reading each epoch’s data through its corresponding FIFO:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# We're allocating a byte array here to read data into; a real algorithm
# may opt to prefetch the data into a memory buffer and train
# in parallel so that both IO and training happen simultaneously
data = bytearray(16777216)
total_read = 0
total_duration = 0
for epoch in range(num_epochs):
 check_termination()
 epoch_bytes_read = 0
 # As per the Amazon SageMaker training spec, the FIFO's path will be based on
 # the channel name and the current epoch:
 fifo_path = '{0}/{1}_{2}'.format(data_dir, channel_name, epoch)

 # Usually the FIFO will already exist by the time we get here, but
 # to be safe we should wait to confirm:
 wait_till_fifo_exists(fifo_path)
 with open(fifo_path, 'rb', buffering=0) as fifo:
 print('opened fifo: %s' % fifo_path)
 # Now simply iterate reading from the file until EOF. Again, a
 # real algorithm will actually do something with the data
 # rather than simply reading and immediately discarding like we
 # are doing here
 start = time.time()
 bytes_read = fifo.readinto(data)
 total_read += bytes_read
 epoch_bytes_read += bytes_read
 while bytes_read > 0 and not terminated:
 bytes_read = fifo.readinto(data)
 total_read += bytes_read
 epoch_bytes_read += bytes_read

 duration = time.time() - start
 total_duration += duration
 epoch_throughput = epoch_bytes_read / duration / 1000000
 print('Completed epoch %s; read %s bytes; time: %.2fs, throughput: %.2f MB/s'
 % (epoch, epoch_bytes_read, duration, epoch_throughput))

Using Pipe mode versus File mode

There are a few situations where Pipe mode may not be the optimum choice for training. In that case you should stick to using File mode:

  • If your algorithm needs to backtrack or skip ahead within an epoch. This isn’t possible in Pipe mode because the underlying FIFO can’t not support lseek() operations.

  • If your training dataset is small enough to fit in memory and you need to run multiple epochs. In this case it might be quicker and easier just to load it all into memory and iterate.

  • If it is not easy to parse your training dataset from a streaming source.

In all other scenarios, if you have an I/O-bound training algorithm, switching to Pipe mode should give you a significant throughput-boost as well as reduce the size of the disk volume required. This should result in both saving you time and reducing training costs.


About the authors

Ishaaq Chandy is a Senior Engineer in Amazon AI where he loves his work in building an innovative and massively scalable training platform for Amazon Sagemaker. Prior to this he was working on AWS ELB where he was part of the launch teams for both ALB as well as NLB.

 

Sumit Thakur works on products that make it quick and easy for customers to get started with deep learning on cloud. He is product manager for Amazon SageMaker and AWS Deep Learning AMI. In his spare time, he likes connecting with nature and watching sci-fi TV series.**