Mitsubishi Electric Research Laboratories

A Companding Front End for Noise-Robust Automatic Speech Recognition

Citation: Guinness, J.; Raj, B.; Schmidt-Nielsen, B.; Turicchia, L.; Sarpeshkar, R., "A Companding Front End for Noise-Robust Automatic Speech Recognition", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ISSN: 1520-6149, Vol. 1, pp. 249-252, March 2005 (IEEE Xplore)
MERL Report: TR2005-023

Feature computation for automatic speech recognition (ASR) systems has long been modeled on the human auditory system. Most current ASR systems model the critical band response and equal loudness characteristics of the auditory system. It has been postulated that more detailed models of the human auditory system can lead to more noise-robust speech recognition. An auditory phenomenon of particular relevance to robustness is simultaneous masking, whereby dominant frequencies suppress adjacent weaker frequencies. In this paper we present a companding-based model that mimics simultaneous masking in the front end of a speech recognizer. In an automotive digit recognition task, the front end reduces word error rate by 4.0% absolute (25% relative to Mel cepstra) at -5 dB SNR, at the cost of a 1.7% increase at 15 dB SNR.
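
The two figures quoted for -5 dB SNR can be related with simple arithmetic. The sketch below is illustrative only: it assumes the 4.0% figure is an absolute reduction in word error rate and that both numbers are exact, whereas the abstract most likely rounds them. The function name and the derived values are not taken from the paper.

```python
def implied_baseline_wer(absolute_reduction, relative_reduction):
    """Baseline WER (in percent) implied by an absolute reduction
    (in percentage points) and the matching relative reduction (a fraction)."""
    return absolute_reduction / relative_reduction


# Figures quoted in the abstract for -5 dB SNR (assumed exact here).
absolute_reduction = 4.0   # percentage points
relative_reduction = 0.25  # 25% relative to the Mel cepstra baseline

baseline_wer = implied_baseline_wer(absolute_reduction, relative_reduction)
companding_wer = baseline_wer - absolute_reduction

print(f"Implied Mel cepstra WER at -5 dB SNR: about {baseline_wer:.1f}%")
print(f"Implied companding front end WER:     about {companding_wer:.1f}%")
```

Under these assumptions the Mel cepstra baseline would sit near 16% word error rate at -5 dB SNR and the companding front end near 12%. The abstract does not give enough information to do the same for the 15 dB SNR case.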

 Read the full technical report (PDF: 362.8 kB)