TR2016-011

Minimum Word Error Training of Long Short-Term Memory Recurrent Neural Network Language Models for Speech Recognition



This paper describes minimum word error (MWE) training of recurrent neural network language models (RNNLMs) for speech recognition. RNNLMs are usually trained to minimize the cross entropy of the estimated word probabilities against the correct word sequence, which corresponds to a maximum likelihood criterion. However, this training does not necessarily maximize the performance measure of the target task, i.e., it does not explicitly minimize the word error rate (WER) in speech recognition. To address this problem, several discriminative training methods have already been proposed for n-gram language models, but such methods for RNNLMs have not been sufficiently investigated. In this paper, we propose an MWE training method for RNNLMs and report significant WER reductions when applying the MWE method to a standard Elman-type RNNLM and to a more advanced model, a long short-term memory (LSTM) RNNLM. We also present efficient MWE training with N-best lists on graphics processing units (GPUs).
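To make the distinction with cross-entropy training concrete, the following is a minimal sketch, not the paper's implementation, of the kind of objective an N-best-based MWE criterion optimizes: the posterior-weighted (expected) word error over the hypotheses of an N-best list. The function name, the scaling factor, and the assumption that each hypothesis carries a precomputed combined log score and an edit distance to the reference are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of an expected word error
# objective over an N-best list.
import numpy as np

def expected_word_error(log_scores, word_errors, scale=1.0):
    """Posterior-weighted word errors of an N-best list.

    log_scores  -- combined acoustic + language model log scores, one per hypothesis
    word_errors -- edit distances to the reference transcript, one per hypothesis
    scale       -- hypothetical scaling factor applied before forming posteriors
    """
    # Hypothesis posteriors via a numerically stable softmax over the log scores.
    z = scale * (log_scores - np.max(log_scores))
    posteriors = np.exp(z) / np.sum(np.exp(z))
    # MWE-style objective: the expected number of word errors under the posteriors.
    return float(np.dot(posteriors, word_errors))

# Toy usage: three hypotheses with their log scores and word error counts.
print(expected_word_error(np.array([-10.2, -11.0, -12.5]),
                          np.array([0, 2, 3])))
```

Minimizing such an expected-error objective (rather than cross entropy against the single correct word sequence) is what couples language model training directly to the WER of the recognition task.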