TR2022-115

Heterogeneous Target Speech Separation


    •  Tzinis, E., Wichern, G., Subramanian, A.S., Smaragdis, P., Le Roux, J., "Heterogeneous Target Speech Separation", Interspeech, DOI: 10.21437/Interspeech.2022-10717, September 2022, pp. 1796-1800.
      BibTeX:
      @inproceedings{Tzinis2022sep,
        author = {Tzinis, Efthymios and Wichern, Gordon and Subramanian, Aswin Shanmugam and Smaragdis, Paris and Le Roux, Jonathan},
        title = {Heterogeneous Target Speech Separation},
        booktitle = {Interspeech},
        year = 2022,
        pages = {1796--1800},
        month = sep,
        doi = {10.21437/Interspeech.2022-10717},
        url = {https://www.merl.com/publications/TR2022-115}
      }
  • Research Areas:

    Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

Efthymios Tzinis (1,2), Gordon Wichern (1), Aswin Subramanian (1), Paris Smaragdis (2), Jonathan Le Roux (1)
(1) Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA
(2) University of Illinois at Urbana-Champaign, Urbana, IL, USA
{etzinis2,paris}@illinois.edu, {wichern,subramanian,leroux}@merl.com

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc.). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates the generalization to new concepts with unseen out-of-domain data while also performing substantially higher than single-domain specialist models. Notably, such training leads to more robust learning of new harder source separation discriminative concepts and can yield improvements over permutation invariant training with oracle source selection.

We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate emerging problems with challenging separation conditions. We release the collection of preparation recipes for all datasets used to further promote research towards this challenging task.
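
As a rough illustration of what "non-mutually exclusive concepts" means in practice, the sketch below shows how a single condition (a concept and a value) can pick out the target source in a mixture from its metadata. It is not the authors' implementation: the paper conditions a separation model on such concepts, whereas this toy example only covers the target-selection step. The names `Source`, `select_target`, the metadata fields, and the values `"loudest"` and `"quietest"` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical per-source metadata for one speaker in a mixture."""
    name: str
    gender: str
    language: str
    energy_db: float  # stand-in for loudness


def select_target(sources, concept, value):
    """Return indices of the sources that match the (concept, value) condition."""
    if concept == "loudness":
        # Loudness is relative: it is defined by comparing sources in the mixture.
        if value not in ("loudest", "quietest"):
            raise ValueError(f"unknown loudness value: {value!r}")
        energies = [s.energy_db for s in sources]
        wanted = max(energies) if value == "loudest" else min(energies)
        return [i for i, e in enumerate(energies) if e == wanted]
    # Other concepts are absolute attributes of each source.
    return [i for i, s in enumerate(sources) if getattr(s, concept) == value]


mixture = [
    Source("spk_a", gender="female", language="english", energy_db=-20.0),
    Source("spk_b", gender="male", language="french", energy_db=-26.0),
]

# Different concepts can point at the same source: they are not mutually exclusive.
print(select_target(mixture, "gender", "female"))      # index of spk_a
print(select_target(mixture, "loudness", "quietest"))  # index of spk_b
print(select_target(mixture, "language", "french"))    # index of spk_b
```

In this toy mixture, the conditions "quietest" and "french" both describe the second speaker, so either one could be used to request the same target.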