TALK Advanced Recurrent Neural Networks for Automatic Speech Recognition

Date released: April 29, 2016

Date & Time:

Friday, April 29, 2016; 12:00 PM - 1:00 PM
Abstract:

A recurrent neural network (RNN) is a class of neural network models in which connections between neurons form a directed cycle. This creates an internal state of the network, which allows it to exhibit dynamic temporal behavior. Recently, RNN-based acoustic models, in particular an advanced variant of the RNN that exploits a structure called long short-term memory (LSTM), have greatly improved automatic speech recognition (ASR) accuracy on many tasks. However, ASR performance with distant microphones, with low resources, in noisy or reverberant conditions, and on multi-talker speech is still far from satisfactory compared to humans. To address these issues, we develop new RNN structures inspired by two principles: (1) the structure follows the intuition of human speech recognition; (2) the structure is easy to optimize. The talk will go beyond basic RNNs and introduce prediction-adaptation-correction RNNs (PAC-RNNs) and highway LSTMs (HLSTMs). It covers both uni-directional and bi-directional RNNs, as well as discriminative training applied on top of the RNNs. For efficient training of such RNNs, the talk will describe two algorithms for learning their parameters in some detail: (1) latency-controlled bi-directional model training; and (2) two-pass forward computation for sequence training. Finally, the talk will analyze the advantages and disadvantages of the different variants and propose future directions.
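
As a rough illustration of the recurrence described in the abstract (not code from the talk), the sketch below implements one step of a basic Elman-style RNN in plain Python. The function names, the tanh nonlinearity, and the toy weights are assumptions chosen for illustration only; the sketch does not represent the PAC-RNN or highway LSTM structures.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a basic RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b).

    All names here are hypothetical; this is a textbook recurrence,
    not the speaker's model.
    """
    h = []
    for i in range(len(b)):
        s = b[i]
        # Contribution of the current input frame.
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # Contribution of the previous hidden state (the "directed cycle").
        s += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(s))
    return h


def run_rnn(xs, W_xh, W_hh, b):
    """Run the recurrence over a sequence, starting from a zero state."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states


# Toy example: one input dimension, one hidden unit, two time steps.
states = run_rnn([[1.0], [0.0]], W_xh=[[1.0]], W_hh=[[0.5]], b=[0.0])
```

In this toy run the second input is zero, yet the second hidden state is nonzero because it still depends on the first state. That dependence is the internal state the abstract refers to.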
Speaker:

Yu Zhang
MIT
Yu Zhang is a graduate student in the Spoken Language Systems group at CSAIL, MIT, where he works with Prof. James Glass. He received his BS and MS degrees from Shanghai Jiao Tong University. His research interests lie at the intersection of speech recognition and machine learning. In particular, he enjoys designing new variants of recurrent neural networks for specific tasks. He is also an open-source practitioner, interested in designing and implementing efficient open-source training toolkits for deep learning.
Research Area:

Speech & Audio
