Long Short-Term Memory dramatically improves Google Voice and more – now available to a billion users

A type of “recurrent neural network” architecture initially developed in Juergen Schmidhuber’s research groups at the Swiss AI Lab IDSIA and TU Munich has greatly improved Google Voice (by 49%) and is now available to a billion users. See the recent Google Research Blog post on this by Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays and Johan Schalkwyk:

http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html

This is a speech recognition application of “Long Short-Term Memory (LSTM)” recurrent neural networks (since 1997) [1] with “forget gates” (since 1999) [2], trained by “Connectionist Temporal Classification (CTC)” (since 2006) [3].
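
To make these ingredients concrete, here is a minimal sketch of a single LSTM step with a forget gate [1,2] in plain Python/NumPy. All dimensions, initializations, and variable names are illustrative assumptions, not the configuration of Google's production models:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, W, b):
        # One LSTM time step. W and b stack the weights/biases of the
        # input gate, forget gate, cell update, and output gate.
        z = W @ np.concatenate([x, h_prev]) + b
        i, f, g, o = np.split(z, 4)
        i = sigmoid(i)            # input gate: how much to write
        f = sigmoid(f)            # forget gate: how much to keep [2]
        g = np.tanh(g)            # candidate cell update
        o = sigmoid(o)            # output gate: how much to expose
        c = f * c_prev + i * g    # new cell state (the "memory")
        h = o * np.tanh(c)        # new hidden state
        return h, c

    # Toy usage: run a random 10-step sequence through one LSTM layer.
    rng = np.random.default_rng(0)
    n_in, n_hidden = 8, 16
    W = 0.1 * rng.standard_normal((4 * n_hidden, n_in + n_hidden))
    b = np.zeros(4 * n_hidden)
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    for t in range(10):
        h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)

The forget gate f controls, per memory cell, how much of the previous cell state survives to the next step; this gating is what lets the network bridge long time lags without the error signal vanishing.

CTC [3] is the training criterion that lets such a network learn from unsegmented audio: it sums over all possible alignments between the per-frame outputs and the target label sequence. A hedged sketch using PyTorch's nn.CTCLoss, with toy sizes and data assumed purely for illustration:

    import torch
    import torch.nn as nn

    T, N, C = 50, 4, 20        # frames, batch size, labels incl. blank
    rnn = nn.LSTM(input_size=40, hidden_size=64)   # 40-dim features
    proj = nn.Linear(64, C)

    x = torch.randn(T, N, 40)                 # toy batch of audio features
    out, _ = rnn(x)
    log_probs = proj(out).log_softmax(-1)     # (T, N, C), as CTCLoss expects

    targets = torch.randint(1, C, (N, 10))    # label sequences (0 = blank)
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 10, dtype=torch.long)

    loss = nn.CTCLoss(blank=0)(log_probs, targets,
                               input_lengths, target_lengths)
    loss.backward()   # gradients flow through all frame-label alignments

At decoding time, repeated labels are merged and the blank is removed, which is how the per-frame outputs become a label sequence without any pre-segmented training data.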

Google also uses LSTM for numerous other applications, such as state-of-the-art machine translation [4], image caption generation [5], and natural language processing.
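
For the machine translation case [4], here is a minimal sketch of the sequence-to-sequence idea: an encoder LSTM compresses the source sentence into its final state, which then conditions a decoder LSTM that produces the translation. Vocabulary sizes, dimensions, and the toy batch below are illustrative assumptions (the actual system in [4] used deep LSTMs and further refinements):

    import torch
    import torch.nn as nn

    src_vocab, tgt_vocab, emb, hid = 1000, 1000, 32, 64
    src_embed = nn.Embedding(src_vocab, emb)
    tgt_embed = nn.Embedding(tgt_vocab, emb)
    encoder = nn.LSTM(emb, hid, batch_first=True)
    decoder = nn.LSTM(emb, hid, batch_first=True)
    readout = nn.Linear(hid, tgt_vocab)

    src = torch.randint(0, src_vocab, (2, 12))   # 2 toy source sentences
    tgt = torch.randint(0, tgt_vocab, (2, 9))    # shifted target inputs

    _, state = encoder(src_embed(src))       # final (h, c) summarizes source
    out, _ = decoder(tgt_embed(tgt), state)  # decode conditioned on it
    logits = readout(out)                    # (2, 9, tgt_vocab) token scores

In training one would apply a cross-entropy loss to these logits against the shifted target tokens; at test time the decoder runs step by step, feeding back its own predictions.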

References

[1] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. ftp://ftp.idsia.ch/pub/juergen/lstm.pdf

[2] F. Gers, N. Schraudolph, J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research 3:115-143, 2002. ftp://ftp.idsia.ch/pub/juergen/TimeCountOsci-JMLR-final.pdf

[3] A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Proceedings of the International Conference on Machine Learning (ICML-06, Pittsburgh), 2006. ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf

[4] I. Sutskever, O. Vinyals, Q. V. Le. Sequence to Sequence Learning with Neural Networks. Proc. NIPS 2014. http://arxiv.org/abs/1409.3215

[5] O. Vinyals, A. Toshev, S. Bengio, D. Erhan. Show and Tell: A Neural Image Caption Generator. 2014. http://arxiv.org/abs/1411.4555