TR2019-015

Triggered Attention for End-to-End Speech Recognition

- Moritz, N., Hori, T., Le Roux, J., "Triggered Attention for End-to-End Speech Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2019.8683510, May 2019.
  @inproceedings{Moritz2019may,
    author = {Moritz, Niko and Hori, Takaaki and {Le Roux}, Jonathan},
    title = {{Triggered Attention for End-to-End Speech Recognition}},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2019,
    month = may,
    doi = {10.1109/ICASSP.2019.8683510},
    url = {https://www.merl.com/publications/TR2019-015}
  }
MERL Contact:
- Jonathan Le Roux
Research Areas:

Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

A new system architecture for end-to-end automatic speech recognition (ASR) is proposed that combines the alignment capabilities of the connectionist temporal classification (CTC) approach and the modeling strength of the attention mechanism. The proposed system architecture, named triggered attention (TA), uses a CTC-based classifier to control the activation of an attention-based decoder neural network. This allows for a frame-synchronous decoding scheme with an adjustable look-ahead parameter to control the induced delay and opens the door to streaming recognition with attention-based end-to-end ASR systems. We present ASR results of the TA model on three data sets of different size and language and compare the scores to a well-tuned attention-based end-to-end ASR baseline system, which consumes input frames in the traditional full-sequence manner. The proposed triggered attention (TA) decoder concept achieves similar or better ASR results in all experiments compared to the full-sequence attention model, while also limiting the decoding delay to two look-ahead frames, which in our setup corresponds to an output delay of 80 ms.
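
The triggering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a trigger fires at each frame where a greedy, frame-wise CTC labeling emits a new non-blank label, and that the attention decoder may then look at encoder frames up to the trigger frame plus the look-ahead. The function names, the blank index 0, and the 40 ms encoder frame shift are illustrative assumptions; the frame shift is only inferred from the abstract's statement that two look-ahead frames correspond to 80 ms.

```python
from typing import List, Tuple

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_triggers(frame_labels: List[int], blank: int = BLANK) -> List[Tuple[int, int]]:
    """Return (frame_index, label) wherever the greedy CTC labeling starts a new label."""
    triggers = []
    prev = blank
    for t, label in enumerate(frame_labels):
        # A trigger fires on a non-blank label that differs from the previous frame.
        if label != blank and label != prev:
            triggers.append((t, label))
        prev = label
    return triggers


def attention_windows(
    frame_labels: List[int], lookahead: int, blank: int = BLANK
) -> List[Tuple[int, int]]:
    """Return (label, last_visible_frame) for each trigger.

    The decoder is assumed to attend to encoder frames 0..last_visible_frame,
    where last_visible_frame is the trigger frame plus the look-ahead,
    clipped to the end of the utterance.
    """
    last = len(frame_labels) - 1
    return [
        (label, min(t + lookahead, last))
        for t, label in ctc_triggers(frame_labels, blank)
    ]


def lookahead_delay_ms(lookahead: int, frame_shift_ms: float = 40.0) -> float:
    """Delay added by the look-ahead, assuming a fixed encoder frame shift."""
    return lookahead * frame_shift_ms


if __name__ == "__main__":
    labels = [0, 3, 3, 0, 3, 5, 0]  # toy per-frame greedy CTC labels
    print(ctc_triggers(labels))
    print(attention_windows(labels, lookahead=2))
    print(lookahead_delay_ms(2))
```

In this toy setup the look-ahead is the only tunable term in the added delay: a larger value gives the decoder more future context per output label at the cost of a proportionally longer wait.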

Related News & Events

NEWS MERL presenting 16 papers at ICASSP 2019
Date: May 12, 2019 - May 17, 2019
Where: Brighton, UK
MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
