Deep Unfolding for Multichannel Source Separation

We propose a new deep computational network that combines the advantages of generative models and deep networks for multichannel source separation.

Scott Wisdom (University of Washington), John R. Hershey, Jonathan Le Roux, and Shinji Watanabe (MERL, Speech & Audio).

Keywords: Speech & Audio, deep unfolding, source separation, multichannel GMM, Markov random field.


Deep unfolding has recently been proposed to derive novel deep network architectures from model-based approaches. In this paper, we consider its application to multichannel source separation. We unfold a multichannel Gaussian mixture model (MCGMM), resulting in a deep MCGMM computational network that directly processes complex-valued frequency-domain multichannel audio and has an architecture defined explicitly by a generative model, thus combining the advantages of deep networks and model-based approaches. We further extend the deep MCGMM by modeling the GMM states using a Markov random field (MRF), whose unfolded mean-field inference updates add dynamics across layers. Experiments on source separation for multichannel mixtures of two simultaneous speakers show that the deep MCGMM improves performance over the original MCGMM.
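To make the idea of unfolding concrete, the following is a minimal sketch of unfolded mean-field inference, not the paper's deep MCGMM. It assumes a toy setting chosen only for illustration: a real-valued, single-channel signal, a scalar GMM with K states, and a chain MRF over time frames with a pairwise log-potential matrix `W`. All names (`unfolded_mean_field`, `layers`, `W`, and so on) are hypothetical. Each iteration of the mean-field update becomes one layer with its own copy of the parameters; in deep unfolding these per-layer parameters would then be trained discriminatively by backpropagation, which this forward-pass sketch omits.

```python
import numpy as np

def log_gaussian(x, means, variances):
    """Log-likelihood of each scalar x[t] under each of K Gaussians, shape (T, K)."""
    diff = x[:, None] - means[None, :]
    return -0.5 * (np.log(2 * np.pi * variances)[None, :] + diff**2 / variances[None, :])

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unfolded_mean_field(x, layers, log_prior):
    """Run one synchronous mean-field update per layer.

    Assumption (toy model): each layer is a dict with its own untied
    parameters "means" (K,), "variances" (K,), and "W" (K, K).
    """
    T, K = x.shape[0], log_prior.shape[0]
    q = np.tile(softmax(log_prior), (T, 1))  # initial state posteriors, shape (T, K)
    for p in layers:
        loglik = log_gaussian(x, p["means"], p["variances"])
        msg = np.zeros((T, K))
        msg[1:] += q[:-1] @ p["W"]      # from frame t-1: sum_j q[t-1, j] * W[j, k]
        msg[:-1] += q[1:] @ p["W"].T    # from frame t+1: sum_j q[t+1, j] * W[k, j]
        q = softmax(loglik + log_prior[None, :] + msg, axis=1)
    return q

# Toy usage: two well-separated clusters, three layers with identical initial parameters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 20), rng.normal(2.0, 0.5, 20)])
K, num_layers = 2, 3
layers = [
    {"means": np.array([-2.0, 2.0]), "variances": np.array([0.25, 0.25]), "W": np.eye(K)}
    for _ in range(num_layers)
]
q = unfolded_mean_field(x, layers, np.log(np.full(K, 0.5)))
```

Here the messages from neighboring frames depend on the posteriors computed by the previous layer, which is the sense in which an unfolded MRF adds dynamics across layers. Untying the parameters per layer is what distinguishes the unfolded network from simply running the original iterative inference for a fixed number of steps.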


Supplementary material

Detailed derivations (pdf)



MERL Publications

  •  Wisdom, S., Hershey, J.R., Le Roux, J., Watanabe, S., "Deep Unfolding for Multichannel Source Separation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2016.7471649, March 2016, pp. 121-125.
    TR2016-008
    @inproceedings{Wisdom2016mar,
      author = {Wisdom, Scott and Hershey, John R. and {Le Roux}, Jonathan and Watanabe, Shinji},
      title = {{Deep Unfolding for Multichannel Source Separation}},
      booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      year = 2016,
      pages = {121--125},
      month = mar,
      doi = {10.1109/ICASSP.2016.7471649},
      url = {https://www.merl.com/publications/TR2016-008}
    }
  •  Le Roux, J., Hershey, J.R., Weninger, F.J., "Deep NMF for Speech Separation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2015.7177933, April 2015, pp. 66-70.
    TR2015-029
    @inproceedings{LeRoux2015apr1,
      author = {{Le Roux}, J. and Hershey, J.R. and Weninger, F.J.},
      title = {{Deep NMF for Speech Separation}},
      booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      year = 2015,
      pages = {66--70},
      month = apr,
      publisher = {IEEE},
      doi = {10.1109/ICASSP.2015.7177933},
      url = {https://www.merl.com/publications/TR2015-029}
    }
  •  Hershey, J.R., Le Roux, J., Weninger, F., "Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures", arXiv, August 2014.
    @article{Hershey2014aug,
      author = {Hershey, J.R. and {Le Roux}, J. and Weninger, F.},
      title = {{Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures}},
      journal = {arXiv},
      year = 2014,
      month = aug,
      url = {https://arxiv.org/abs/1409.2574}
    }