TR2015-023

Sparse NMF -- half-baked or well done?


    •  Le Roux, J.; Weninger, F.J.; Hershey, J.R., "Sparse NMF -- half-baked or well done?," Tech. Rep. TR2015-023, Mitsubishi Electric Research Laboratories, March 2015.
      @techreport{LeRoux2015mar,
        author = {{Le Roux}, J. and Weninger, F.J. and Hershey, J.R.},
        title = {Sparse NMF -- half-baked or well done?},
        institution = {Mitsubishi Electric Research Laboratories},
        year = 2015,
        number = {TR2015-023},
        address = {Cambridge MA, USA},
        month = mar,
        url = {http://www.merl.com/publications/TR2015-023}
      }
  • Research Areas: Multimedia, Speech & Audio


Non-negative matrix factorization (NMF) has been a popular method for modeling audio signals, in particular for single-channel source separation. An important factor in the success of NMF-based algorithms is the "quality" of the basis functions that are obtained from training data. In order to model rich signals such as speech or wide ranges of non-stationary noises, NMF typically requires using a large number of basis functions. However, without additional constraints, using a large number of bases leads to trivial solutions where the bases can indiscriminately model any signal. Two main approaches have been considered to cope with this issue: introducing sparsity on the activation coefficients, or skipping training altogether and randomly selecting basis functions as a subset of the training data ("exemplar-based NMF"). Surprisingly, the sparsity route is widely regarded as leading to similar or worse results than the simple and extremely efficient (no training!) exemplar-based approach. Only a small fraction of researchers have realized that sparse NMF works well if implemented correctly. However, to our knowledge, no thorough comparison has been presented in the literature, and many researchers in the field may remain unaware of this fact. We review exemplar-based NMF as well as two versions of sparse NMF, a simplistic ad hoc one and a principled one, giving a detailed derivation of the update equations for the latter in the general case of beta divergences, and we perform a thorough comparison of the three methods on a speech separation task using the 2nd CHiME Speech Separation and Recognition Challenge dataset. Results show that, contrary to a popular belief in the community, learning basis functions using NMF with sparsity, if done the right way, leads to significant gains in source-to-distortion ratio with respect to both exemplar-based NMF and the ad hoc implementation of sparse NMF.
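As a rough illustration of the exemplar-based route described in the abstract, the following is a minimal sketch and not the report's implementation. It assumes the KL divergence (the beta = 1 member of the beta-divergence family), an L1 penalty on the activations with weight `mu`, and L2-normalized bases. The function names, `mu`, the iteration count, and the random test data are all hypothetical choices made for illustration. The bases are simply sampled from the training frames, and only the activations are estimated with the standard multiplicative update for fixed bases.

```python
import numpy as np

def exemplar_bases(train, n_bases, rng):
    """Pick random training frames (columns) as bases; no training step."""
    idx = rng.choice(train.shape[1], size=n_bases, replace=False)
    W = train[:, idx] + 1e-12                        # avoid all-zero columns
    return W / np.linalg.norm(W, axis=0, keepdims=True)  # unit L2 columns

def sparse_activations(V, W, mu=1.0, n_iter=100, eps=1e-12, seed=0):
    """Estimate H >= 0 with W fixed, for the cost KL(V | WH) + mu * sum(H)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    denom = W.sum(axis=0)[:, None] + mu              # W^T 1 + mu
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / denom     # multiplicative update
    return H

def objective(V, W, H, mu, eps=1e-12):
    """KL divergence between V and WH plus the L1 sparsity penalty on H."""
    L = W @ H + eps
    kl = np.sum(V * np.log((V + eps) / L) - V + L)
    return kl + mu * H.sum()

# Hypothetical data: non-negative "spectrogram" matrices (bins x frames).
rng = np.random.default_rng(42)
train = rng.random((20, 30))
mix = rng.random((20, 15))

W = exemplar_bases(train, n_bases=10, rng=rng)
H = sparse_activations(mix, W, mu=1.0, n_iter=50)
print(objective(mix, W, H, mu=1.0))
```

In this sketch the bases never change, which is what makes the exemplar-based approach cheap. A sparse NMF in the sense of the abstract would additionally update `W` from the training data, with the normalization handled inside the objective as derived in the report.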