TR2019-014

Bootstrapping Single-Channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures

- Seetharaman, P., Wichern, G., Le Roux, J., Pardo, B., "Bootstrapping Single-Channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2019.8683198, May 2019.
  BibTeX TR2019-014 PDF
  - @inproceedings{Seetharaman2019may2,
  - author = {Seetharaman, Prem and Wichern, Gordon and {Le Roux}, Jonathan and Pardo, Bryan},
  - title = {{Bootstrapping Single-Channel Source Separation via Unsupervised Spatial Clustering on Stereo Mixtures}},
  - booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  - year = 2019,
  - month = may,
  - doi = {10.1109/ICASSP.2019.8683198},
  - url = {https://www.merl.com/publications/TR2019-014}
  - }
MERL Contacts:
- Gordon
  Wichern
- Jonathan
  Le Roux
Research Areas:

Artificial Intelligence, Machine Learning, Speech & Audio

Abstract:

Separating an audio scene into isolated sources is a fundamental problem in computer audition, analogous to image segmentation in visual scene analysis. Source separation systems based on deep learning are currently the most successful approaches for solving the underdetermined separation problem, where there are more sources than channels. Such systems are normally trained on sound mixtures where the ground truth decomposition is already known. In this work, we use an unsupervised spatial source separation on stereo mixtures which generates initial decompositions of mixtures to train a deep learning source separation model. These estimated decompositions vary greatly in quality across the training mixtures. To overcome this, we weight the data during training using a confidence measure that assesses which mixtures or parts of mixtures are well-separated by the unsupervised algorithm. Once trained, the model can be applied to separate single-channel mixtures, where no source direction information is available. The idea is to use simple, low-level processing to separate sources in an unsupervised fashion, identify easy conditions, and then use that knowledge to bootstrap a (self-)supervised source separation model for difficult conditions. We also explore using the two approaches in an ensemble.

Related News & Events

NEWS MERL presenting 16 papers at ICASSP 2019
Date: May 12, 2019 - May 17, 2019
Where: Brighton, UK
MERL Contacts: Petros T. Boufounos; Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Tim K. Marks; Philip V. Orlik; Anthony Vetro; Pu (Perry) Wang; Gordon Wichern
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL researchers will be presenting 16 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Brighton, UK from May 12-17, 2019. Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and parameter estimation. MERL is also a sponsor of the conference and will be participating in the student career luncheon; please join us at the lunch to learn about our internship program and career opportunities.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.

MERL Contacts:

GordonWichern

JonathanLe Roux

Research Areas:

Abstract:

Gordon
Wichern

Jonathan
Le Roux