Amazon Transcribe now supports real-time transcriptions

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to applications. We’re excited to announce a new feature called Streaming Transcription, which enables users to pass a live audio stream to our service and receive text transcripts in real time.

Real-time transcriptions benefit use cases across diverse verticals, including contact centers, media and entertainment, courtroom record keeping, finance, and insurance. For example, contact centers can detect keywords in real-time transcriptions to trigger downstream actions, like automatically summoning a supervisor. In media, live broadcasting of news or shows can benefit from live subtitling. Video game companies can use streaming transcription to meet accessibility requirements for in-game chat, helping players who have hearing impairments. In the legal domain, courtrooms can leverage real-time transcriptions to enable stenography, while lawyers can also make legal annotations on top of live transcripts for deposition purposes. In business productivity, companies can leverage real-time transcription to capture meeting notes on the fly.

Streaming Transcription utilizes HTTP 2’s implementation of bidirectional streams to handle streaming audio and transcripts between your application and the Amazon Transcribe service. Bidirectional streams allow your application to handle sending and receiving data at the same time, resulting in quicker, more reactive results.

To demonstrate how to use the AWS SDK to take advantage of Streaming Transcription within your own applications, we’ve created an example application. This application creates a basic user interface that allows you to stream audio from your microphone or an audio file to Amazon Transcribe and receive transcripts in real time.

The example application can be found on the AWS GitHub account (https://github.com/aws-samples). Download the example app by choosing the green Clone or download button and selecting the Download ZIP link. Alternatively, you can clone the repository to your desktop using Git or SVN.

Build the application with Apache Maven (https://maven.apache.org/index.html) and then execute the resulting jar with the following commands:

export AWS_ACCESS_KEY_ID=<your key id>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
export AWS_REGION=<desired region endpoint to use, such as us-east-1>
mvn clean package
java -jar target/aws-transcribe-sample-application-1.0-SNAPSHOT-jar-with-dependencies.jar

You should be off and transcribing! Live!

To explore the code, start with the startTranscription method in the TranscribeStreamingClientWrapper class:

return client.startStreamTranscription(
 //Request parameters. Refer to API documentation for details.
 getRequest(sampleRate),
 //AudioEvent publisher containing "chunks" of audio data to transcribe
 requestStream,
 //Defines what to do with transcripts as they arrive from the service
 responseHandler);

All the code necessary to set up an audio stream and a response handler can be found in the repository. We recommend using this example as a starting point for your application.

Good luck and happy transcribing!

About the authors

Paul Zhao is a Sr. Product Manager at AWS Machine Learning. He manages the Amazon Transcribe service. Outside of work, Paul is a motorcycle enthusiast and avid woodworker.

Paul Kohan is a Sr. Software Engineer at Amazon Transcribe. Outside of work Paul enjoys hanging out with his dog, Toby, and playing video and board games.