TR2021-127

Convolutive Prediction for Reverberant Speech Separation

- Wang, Z.-Q., Wichern, G., Le Roux, J., "Convolutive Prediction for Reverberant Speech Separation", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), DOI: 10.1109/WASPAA52581.2021.9632667, October 2021, pp. 56-60.
  @inproceedings{Wang2021oct4,
  author = {Wang, Zhong-Qiu and Wichern, Gordon and {Le Roux}, Jonathan},
  title = {{Convolutive Prediction for Reverberant Speech Separation}},
  booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year = 2021,
  pages = {56--60},
  month = oct,
  publisher = {IEEE},
  doi = {10.1109/WASPAA52581.2021.9632667},
  url = {https://www.merl.com/publications/TR2021-127}
  }
MERL Contacts:
- Gordon Wichern
- Jonathan Le Roux
Research Areas:

Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

We propose convolutive prediction, a novel formulation of linear prediction for speech dereverberation, and apply it to monaural and multi-microphone speaker separation in reverberant conditions. The key idea is to first use a deep neural network (DNN) to estimate the direct-path signal of each speaker, and then identify delayed and decayed copies of the estimated direct-path signal, which are likely due to reverberation. They can be directly removed for dereverberation or used as extra features for another DNN to perform better dereverberation. To identify such copies, we solve a linear regression problem per-frequency efficiently in the time-frequency domain to estimate the underlying room impulse response (RIR). In the multi-channel extension, we perform minimum variance distortionless response (MVDR) beamforming on the outputs of convolutive prediction. The beamforming and dereverberation results are used as extra features for a second DNN to perform better separation and dereverberation. State-of-the-art results are obtained on the SMS-WSJ corpus.
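The per-frequency regression described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single speaker and a single microphone, uses plain (unweighted) least squares with a small ridge term, and takes the DNN output as a given array. The function name `convolutive_prediction` and the parameters `taps`, `delay`, and `ridge` are hypothetical names chosen for this example.

```python
import numpy as np


def convolutive_prediction(Y, S_hat, taps=8, delay=1, ridge=1e-6):
    """Illustrative sketch: estimate and subtract reverberation per frequency.

    Y      : complex STFT of the reverberant signal, shape (frames, freqs)
    S_hat  : complex STFT of the estimated direct-path signal, same shape
             (in the paper this estimate comes from a DNN; here it is an input)
    taps   : number of filter taps per frequency (assumed value)
    delay  : delay in frames before the first tap (assumed value)
    ridge  : small regularizer for numerical stability (assumption)
    """
    T, F = Y.shape
    reverb = np.zeros_like(Y)
    for f in range(F):
        # Column k holds the direct-path estimate delayed by (delay + k) frames.
        A = np.zeros((T, taps), dtype=Y.dtype)
        for k in range(taps):
            shift = delay + k
            if shift >= T:
                break
            A[shift:, k] = S_hat[: T - shift, f]
        # Whatever the direct-path estimate does not explain is treated
        # as a sum of delayed, decayed copies of that estimate.
        target = Y[:, f] - S_hat[:, f]
        # Ridge-regularized normal equations for the complex filter g.
        gram = A.conj().T @ A + ridge * np.eye(taps)
        g = np.linalg.solve(gram, A.conj().T @ target)
        reverb[:, f] = A @ g
    # Removing the estimated reverberation gives the dereverberated signal.
    return Y - reverb, reverb
```

In this sketch the first return value corresponds to the "directly removed" option in the abstract, while the second (the estimated reverberation) is the kind of signal that could be passed as an extra feature to a second DNN.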

Related Publication

Wang, Z.-Q., Wichern, G., Le Roux, J., "Convolutive Prediction for Reverberant Speech Separation", arXiv, August 2021.


@article{Wang2021aug2,
author = {Wang, Zhong-Qiu and Wichern, Gordon and {Le Roux}, Jonathan},
title = {{Convolutive Prediction for Reverberant Speech Separation}},
journal = {arXiv},
year = 2021,
month = aug,
url = {https://arxiv.org/abs/2108.07194}
}
