TR2014-079

Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition


Tachioka, Y., Watanabe, S., Le Roux, J., Hershey, J.R., "Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition", Interspeech, September 2014, vol. 15, pp. 2415-2419.

BibTeX:

@inproceedings{Tachioka2014sep,
  author    = {Tachioka, Y. and Watanabe, S. and {Le Roux}, J. and Hershey, J.R.},
  title     = {Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition},
  booktitle = {Interspeech},
  year      = 2014,
  volume    = 15,
  pages     = {2415--2419},
  month     = sep,
  publisher = {International Speech Communication Association},
  issn      = {2308-457X},
  url       = {https://www.merl.com/publications/TR2014-079}
}

Research Areas: Artificial Intelligence, Speech & Audio

Abstract:

Linear discriminant analysis (LDA) is a simple and effective feature transformation technique that aims to improve discriminability by maximizing the ratio of the between-class variance to the within-class variance. However, LDA does not explicitly consider a sequential discriminative criterion, that is, one that directly reduces the errors of a speech recognizer. This paper proposes a simple extension of LDA, called sequential LDA (sLDA), based on a sequential discriminative criterion computed from Gaussian statistics obtained through sequential maximum mutual information (MMI) training. Although the objective function of the proposed LDA can be regarded as a special case of various discriminative feature transformation techniques (for example, f-MPE or the bottom layer of a neural network), the transformation matrix can be obtained as the closed-form solution of a generalized eigenvalue problem, in contrast to the gradient-descent-based optimization usually used in these techniques. Experiments on a large-vocabulary continuous speech recognition task (Corpus of Spontaneous Japanese) and a noisy speech recognition task (2nd CHiME challenge) show consistent improvements over standard LDA due to the sequential discriminative training. In addition, despite its simple and fast computation, the proposed method improved performance in combination with discriminative feature transformation (f-bMMI), perhaps by providing a good initialization for f-bMMI.
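
For context, the sketch below illustrates how a standard LDA transform is obtained as the closed-form solution of a generalized eigenvalue problem, as described in the abstract. The proposed sLDA would instead build the scatter statistics from the Gaussian statistics accumulated during sequential MMI training, which is not shown here; the function name, the regularization term, and the usage values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: standard LDA via a generalized eigenvalue problem.
# This shows only the closed-form solution step; sLDA would replace the
# per-class scatter statistics below with MMI-derived Gaussian statistics.
import numpy as np
from scipy.linalg import eigh

def lda_transform(features, labels, out_dim):
    """Estimate an LDA projection that maximizes the ratio of
    between-class to within-class variance.
    features: (N, D) array, labels: (N,) array of class ids."""
    classes = np.unique(labels)
    mean_all = features.mean(axis=0)
    d = features.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        X_c = features[labels == c]
        mean_c = X_c.mean(axis=0)
        S_w += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - mean_all)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)
    # Generalized eigenvalue problem S_b v = lambda S_w v.
    # A small diagonal term keeps S_w positive definite (assumed here).
    eigvals, eigvecs = eigh(S_b, S_w + 1e-6 * np.eye(d))
    # eigh returns eigenvalues in ascending order; keep the top out_dim.
    return eigvecs[:, ::-1][:, :out_dim]

# Hypothetical usage: project 39-dim acoustic features to 30 dimensions.
# X = np.random.randn(1000, 39); y = np.random.randint(0, 40, size=1000)
# W = lda_transform(X, y, 30); X_lda = X @ W
```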