Deep NMF for Speech Separation

Non-negative matrix factorization (NMF) has been widely used for challenging single-channel audio source separation tasks. However, inference in NMF-based models relies on iterative inference methods, typically formulated as multiplicative updates. We propose "deep NMF", a novel non-negative deep network architecture which results from unfolding the NMF iterations and untying its parameters. This architecture can be discriminatively trained for optimal separation performance. To optimize its non-negative parameters, we show how a new form of back-propagation, based on multiplicative updates, can be used to preserve non-negativity, without the need for constrained optimization. We show on a challenging speech separation task that deep NMF improves in terms of accuracy upon NMF and is competitive with conventional sigmoid deep neural networks, while requiring a tenth of the number of parameters.