TR2019-159

Streaming End-to-End Speech Recognition with Joint CTC-Attention Based Models



In this paper, we present a one-pass decoding algorithm for streaming recognition with joint connectionist temporal classification (CTC) and attention-based end-to-end automatic speech recognition (ASR) models. The decoding scheme is based on a frame-synchronous CTC prefix beam search algorithm and the recently proposed triggered attention concept. To achieve a fully streaming end-to-end ASR system, the CTC-triggered attention decoder is combined with a unidirectional encoder neural network based on parallel time-delayed long short-term memory (PTDLSTM) streams, which has demonstrated superior performance compared to various other streaming encoder architectures in earlier work. In addition, a new pre-training method is studied to further improve our streaming ASR models: residual connections are added to the encoder neural network and then removed layer by layer during training. The proposed joint CTC-triggered attention decoding algorithm, which enables streaming recognition with attention-based ASR systems, achieves ASR accuracy similar to that of offline CTC-attention decoding and significantly better than that of CTC prefix beam search decoding alone.
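The frame-synchronous CTC prefix beam search named in the abstract is the backbone of the proposed decoder. As a minimal illustration of that standard algorithm in isolation (this is not the authors' implementation, and it omits the triggered attention rescoring described in the paper), the following Python sketch decodes a sequence of per-frame CTC log posteriors; the function name ctc_prefix_beam_search and all parameters are illustrative assumptions.

import math
from collections import defaultdict

import numpy as np

NEG_INF = -float("inf")

def log_add(*args):
    """Numerically stable log(sum(exp(args)))."""
    m = max(args)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(a - m) for a in args))

def ctc_prefix_beam_search(log_probs, beam_size=8, blank=0):
    """Frame-synchronous CTC prefix beam search (illustrative sketch).

    log_probs: (T, V) array of per-frame log posteriors over V symbols,
               where index `blank` is the CTC blank symbol.
    Returns the most probable label prefix as a tuple of symbol ids.
    """
    # Each beam entry maps a prefix to a pair of log probabilities:
    # (prob of paths ending in blank, prob of paths ending in non-blank).
    beam = {(): (0.0, NEG_INF)}
    for t in range(log_probs.shape[0]):
        next_beam = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beam.items():
            for s in range(log_probs.shape[1]):
                p = log_probs[t, s]
                if s == blank:
                    # A blank keeps the prefix unchanged.
                    nb_b, nb_nb = next_beam[prefix]
                    next_beam[prefix] = (log_add(nb_b, p_b + p, p_nb + p), nb_nb)
                    continue
                new_prefix = prefix + (s,)
                nb_b, nb_nb = next_beam[new_prefix]
                if prefix and s == prefix[-1]:
                    # Repeated symbol: only paths ending in blank start a new
                    # emission; otherwise the repeat collapses into the prefix.
                    next_beam[new_prefix] = (nb_b, log_add(nb_nb, p_b + p))
                    sb_b, sb_nb = next_beam[prefix]
                    next_beam[prefix] = (sb_b, log_add(sb_nb, p_nb + p))
                else:
                    next_beam[new_prefix] = (nb_b, log_add(nb_nb, p_b + p, p_nb + p))
        # Frame-synchronous pruning to the top beam_size prefixes.
        beam = dict(sorted(next_beam.items(),
                           key=lambda kv: log_add(*kv[1]),
                           reverse=True)[:beam_size])
    return max(beam.items(), key=lambda kv: log_add(*kv[1]))[0]

if __name__ == "__main__":
    # Toy example: random log posteriors for 20 frames over 5 symbols (id 0 = blank).
    rng = np.random.default_rng(0)
    logits = rng.standard_normal((20, 5))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    print(ctc_prefix_beam_search(log_probs, beam_size=4))

In the paper's setting, such CTC prefix hypotheses additionally trigger the attention decoder, whose scores are combined with the CTC scores during the one-pass search; that joint scoring step is intentionally left out of this sketch.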