TALK: Theory and Applications of Sparse Model-Based Recurrent Neural Networks

Date released: March 6, 2018


  • Date & Time:

    Tuesday, March 6, 2018; 12:00 PM

  • Abstract:

    Recurrent neural networks (RNNs) are effective, data-driven models for sequential data such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes: although they work well, practitioners cannot directly interpret their weights and architectures. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through "deep unfolding," which builds and explains deep network architectures by establishing an equivalence to inference algorithms for statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into why they are effective, and helps interpret what these networks learn.
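
    As a minimal illustration of deep unfolding, here is our own Python/NumPy sketch. It unfolds nonnegative ISTA, the algorithm discussed in the next paragraph, for a fixed dictionary D and penalty weight lam. Each ISTA iteration becomes one ReLU layer with weights W = I - D^T D / alpha, input weights V = D^T / alpha, and bias lam / alpha, where alpha is the squared spectral norm of D. One way to read the identity term in W is as a residual connection. All function names and parameter values are hypothetical, and the code is not presented as the speaker's exact formulation.

```python
import numpy as np


def relu_threshold(z, thresh):
    """Nonnegative soft-thresholding max(z - thresh, 0): a ReLU with bias -thresh."""
    return np.maximum(z - thresh, 0.0)


def ista_nonneg(D, y, lam, num_iters):
    """Nonnegative ISTA for min_x 0.5*||y - D x||^2 + lam*||x||_1 subject to x >= 0."""
    alpha = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient; step size 1/alpha
    x = np.zeros(D.shape[1])
    for _ in range(num_iters):
        grad = D.T @ (D @ x - y)  # gradient of the least-squares term
        x = relu_threshold(x - grad / alpha, lam / alpha)
    return x


def unfold_ista(D, lam, num_layers):
    """Turn num_layers ISTA iterations into per-layer parameters (W, V, b)."""
    alpha = np.linalg.norm(D, 2) ** 2
    W = np.eye(D.shape[1]) - D.T @ D / alpha  # identity term acts like a residual path
    V = D.T / alpha                            # input-to-hidden weights
    b = lam / alpha                            # L1 penalty becomes a ReLU bias
    # Untied copies: each layer could later be trained separately from this initialization.
    return [(W.copy(), V.copy(), b) for _ in range(num_layers)]


def forward(layers, y):
    """Run the unfolded network: one ReLU layer per ISTA iteration."""
    x = np.zeros(layers[0][0].shape[0])
    for W, V, b in layers:
        x = relu_threshold(W @ x + V @ y, b)
    return x


def objective(D, y, lam, x):
    """L1-regularized least-squares cost that ISTA minimizes."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))


# Usage: random stand-in data; the untrained unfolded network reproduces ISTA.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 30))
y = rng.standard_normal(20)
x_ista = ista_nonneg(D, y, lam=0.1, num_iters=50)
x_net = forward(unfold_ista(D, lam=0.1, num_layers=50), y)
```

    Before training, the unfolded network computes exactly the same iterates as ISTA, so x_net matches x_ista up to floating-point error. In deep unfolding, these model-derived values serve as the initialization, and training then adjusts W, V, and b.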

    In particular, I will show that RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple, classic algorithm for solving L1-regularized least squares. This equivalence allows state-of-the-art unitary RNNs (uRNNs) to be interpreted as an unfolded sparse coding algorithm. I will also describe a new RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF), which is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also remaining interpretable for practitioners.
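
    The sequential and NMF ideas above can be sketched in the same way. The following is a minimal, hypothetical Python/NumPy sketch that makes several assumptions. It uses a fixed nonnegative dictionary whose columns are split between two sources and an L1 penalty. Activations are inferred frame by frame with nonnegative ISTA, warm-started from the previous frame's activations. Carrying the activations across frames is what turns the unfolded iterations into a recurrent network. Separation then applies a ratio mask built from each source's reconstruction. This is an illustrative simplification, not the DR-NMF architecture itself, and the names and default values are our own.

```python
import numpy as np


def sequential_nonneg_ista(D, X, lam, iters_per_frame):
    """Sparse nonnegative activations for each column of X, warm-started across frames.

    D: (freq, atoms) nonnegative dictionary; X: (freq, frames) nonnegative spectrogram.
    """
    alpha = np.linalg.norm(D, 2) ** 2
    W = np.eye(D.shape[1]) - D.T @ D / alpha  # hidden-to-hidden weights (identity = residual path)
    V = D.T / alpha                           # input weights
    b = lam / alpha                           # ReLU bias from the L1 penalty
    H = np.zeros((D.shape[1], X.shape[1]))
    h = np.zeros(D.shape[1])                  # hidden state carried from frame to frame
    for t in range(X.shape[1]):
        for _ in range(iters_per_frame):      # unfolded ISTA layers within one time step
            h = np.maximum(W @ h + V @ X[:, t] - b, 0.0)
        H[:, t] = h
    return H


def separate(D1, D2, X, lam=0.01, iters_per_frame=20, eps=1e-8):
    """Split X into two source estimates using a ratio mask from the NMF reconstructions."""
    D = np.hstack([D1, D2])
    H = sequential_nonneg_ista(D, X, lam, iters_per_frame)
    k = D1.shape[1]
    R1 = D1 @ H[:k]              # reconstruction from source-1 atoms
    R2 = D2 @ H[k:]              # reconstruction from source-2 atoms
    M1 = R1 / (R1 + R2 + eps)    # ratio mask in [0, 1)
    return M1 * X, (1.0 - M1) * X, H


# Usage with random stand-in data (16 frequency bins, 10 frames, 4 atoms per source).
rng = np.random.default_rng(1)
D1 = np.abs(rng.standard_normal((16, 4)))
D2 = np.abs(rng.standard_normal((16, 4)))
X = np.abs(rng.standard_normal((16, 10)))
S1, S2, H = separate(D1, D2, X)
```

    Because the two masks sum to one, the source estimates always add back up to the mixture. In a deep-unfolding setting, the per-layer copies of W, V, and b would then be trained end to end, starting from these model-derived values.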

  • Speaker:

    Scott Wisdom
    Affectiva

    Scott Wisdom is currently a senior research scientist at Affectiva in Boston, MA. He received his M.S. in 2014 and his Ph.D. in 2017 from the Department of Electrical Engineering at the University of Washington in Seattle, where he was advised by Les Atlas and James Pitton. His primary research interests include machine learning and statistical signal processing for detection, enhancement, and classification of nonstationary time series, especially audio and speech. He received the Best Student Paper Award at the IEEE WASPAA 2017 conference. Scott has also interned at MERL and Microsoft Research, and he was a graduate student member of the Far-Field Speech team at the summer 2015 Jelinek Workshop on Speech and Language Technology.

  • MERL Host:

    Jonathan Le Roux

  • External Link:

    https://stwisdom.github.io/

  • Research Area:

    Speech & Audio