Speech & Audio

Audio source separation, recognition, and understanding.

The speech and audio group at MERL is interested in a broad spectrum of challenging speech and audio machine perception problems. In the area of front-end signal processing, we are focusing on microphone array processing, single-channel speech enhancement, and acoustic event detection. For traditional automatic speech recognition (ASR) tasks, we are working on discriminative methods for acoustic and language modeling. Our efforts extend to back-end inference problems such as natural language understanding, topic modeling, user prediction, and interactive multi-modal human-machine interfaces.

We strive to explore these tasks by developing novel machine-learning models and devising strategies to scale the resulting inference algorithms to the constraints of real-world applications. By testing different independence and modularity assumptions along the way, we hope to elucidate the inherent structure of each problem.

This work enables us to collaborate with Mitsubishi Electric's R&D labs in Japan to create innovative products. In keeping with MERL's philosophy, we make it a priority to publish our findings in major journals and conferences, and organize collaborative activities in the speech and audio community to advance the state of scientific knowledge.