Audio and Music Signal Processing Mini-Symposium

October 20, 2011

Mitsubishi Electric Research Labs (MERL) is hosting a mini-symposium on audio and music signal processing, with three talks by eminent researchers in the field: Prof. Mark Plumbley, Dr. Cédric Févotte and Prof. Nobutaka Ono.

  • Date: Thursday, October 20, 2011
  • Time: 2:00 PM - 5:00 PM
  • Hosts: Jonathan Le Roux, John R. Hershey (MERL Speech and Audio Team)
  • Location: Mitsubishi Electric Research Labs (MERL), 201 Broadway, 8th Floor, Cambridge, MA 02139 (MERL is a few minutes' walk from MIT and the Kendall/MIT T station)


2:00 pm Welcome and Introduction to MERL, Dr. Kent Wittenburg (MERL)
2:20 pm "Analysing Digital Music", Prof. Mark Plumbley (Queen Mary, London)
3:00 pm "Itakura-Saito nonnegative matrix factorization and friends for music signal decomposition", Dr. Cédric Févotte (CNRS - Telecom ParisTech, Paris)
3:40 pm "Auxiliary Function Approach to Source Localization and Separation", Prof. Nobutaka Ono (National Institute of Informatics, Tokyo)

Speaker: Prof. Mark Plumbley (Queen Mary, London)

Title: "Analysing Digital Music"


Although music has been "digital" since the introduction of the Compact Disc over 20 years ago, the term "Digital Music" has only recently come into widespread use, as computer and internet technologies have begun to be used to analyze, discover and deliver music and associated information to listeners. Much of this work is about extracting meaningful semantic information about a music track, such as the artist, instruments, genre (rock/pop/jazz), lyrics, key, notes, beats, and so on. In this talk, I will explore some of the technologies emerging in this exciting and evolving area. I will also talk about some of our work in the analysis of musical audio signals, including automatic music transcription, beat tracking, audio source separation, and sound visualization.

Speaker Biography:

Prof. Mark Plumbley is Director of the Centre for Digital Music (C4DM) at Queen Mary University of London. His research interests include the analysis of audio and music signals, including beat tracking, automatic music transcription and source separation, using techniques such as neural networks, information theory, and sparse representations. He is Principal Investigator on several current EPSRC grants, including "Information Dynamics of Music" and "Sustainable Software for Digital Music and Audio Research", and he holds an EPSRC Leadership Fellowship. He leads the UK Digital Music Research Network, is Chair of the International Independent Component Analysis (ICA) Steering Committee, and is a member of the IEEE Audio and Acoustic Signal Processing Technical Committee.

Speaker: Dr. Cédric Févotte (CNRS - Telecom ParisTech, Paris)

Title: "Itakura-Saito nonnegative matrix factorization and friends for music signal decomposition"


Over the last 10 years, nonnegative matrix factorization (NMF) has become a popular unsupervised dictionary learning/adaptive data decomposition technique with applications in many fields. In particular, much research on this topic has been driven by applications in audio, where NMF has been applied with success to automatic music transcription and single-channel source separation. In this setting, the nonnegative data is formed by the magnitude or power spectrogram of the sound signal and is decomposed as the product of a dictionary matrix, containing elementary spectra representative of the data, and an activation matrix, which contains the expansion coefficients of the data frames in the dictionary.

After a general overview of NMF and a focus on majorization-minimization (MM) algorithms for NMF, the presentation will discuss model selection issues in the audio setting, pertaining to 1) the choice of time-frequency representation (essentially, magnitude or power spectrogram), and 2) the measure of fit used for the computation of the factorization. We will give arguments in support of factorizing the power spectrogram with the Itakura-Saito (IS) divergence. In particular, IS-NMF is shown to be connected to maximum likelihood estimation of variance parameters in a well-defined statistical model of superimposed Gaussian components, and this model is in turn shown to be well suited to audio.

The presentation will then briefly address variants of IS-NMF, namely IS-NMF with regularization of the activation coefficients (Markov model, group sparsity), online IS-NMF, automatic relevance determination for model order selection, and multichannel IS-NMF. Audio source separation demos will be played.
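
For readers unfamiliar with the decomposition described in the abstract above, the following is a minimal illustrative sketch, not the speaker's implementation. It factorizes a synthetic "power spectrogram" V into a dictionary W and activations H under the Itakura-Saito divergence, using multiplicative updates with a square-root exponent (the form commonly derived from an MM argument for this divergence). The function names, the synthetic data, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries (V, V_hat > 0)."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_components, n_iter=100, seed=0):
    """Approximate V (F x N, strictly positive) as W @ H under the IS divergence.

    Hypothetical helper: multiplicative updates with exponent 1/2.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Strictly positive random initialization keeps all ratios well defined.
    W = rng.random((F, n_components)) + 0.1
    H = rng.random((n_components, N)) + 0.1
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H
        # Update activations with the dictionary held fixed.
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        V_hat = W @ H
        # Update dictionary with the activations held fixed.
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        costs.append(is_divergence(V, W @ H))
    return W, H, costs


# Synthetic data (assumption): a rank-3 nonnegative variance model with
# multiplicative exponential noise, standing in for a real power spectrogram.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3)) + 0.1
H_true = rng.random((3, 50)) + 0.1
V = (W_true @ H_true) * rng.exponential(1.0, size=(20, 50)) + 1e-12

W, H, costs = is_nmf(V, n_components=3, n_iter=100)
print("IS divergence, first and last iteration:", costs[0], costs[-1])
```

With real audio, V would instead be the squared magnitude of a short-time Fourier transform, with columns of W read as elementary spectra and rows of H as their activations over time.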

Speaker Biography:

Cédric Févotte obtained the State Engineering degree and the MSc degree in Control and Computer Science from Ecole Centrale de Nantes (France) in 2000, and then the PhD degree in 2003. As a PhD student he was with the Signal Processing Group at Institut de Recherche en Communication et Cybernetique de Nantes (IRCCyN), where he worked on time-frequency approaches to blind source separation. From 2003 to 2006 he was a research associate with the Signal Processing Laboratory at the University of Cambridge (Engineering Dept), where he developed Bayesian approaches to sparse component analysis with applications to audio source separation. He was then a research engineer with the start-up company Mist-Technologies (now Audionamix) in Paris, designing mono/stereo to 5.1 surround sound upmix solutions. In Mar. 2007, he joined Telecom ParisTech, first as a research associate and then, in Nov. 2007, as a CNRS tenured research scientist. His research interests generally concern statistical signal processing and unsupervised machine learning, in particular applications to blind source separation and music signal processing. He is the scientific leader of project TANGERINE (Theory and applications of nonnegative matrix factorization), funded by the French research funding agency ANR.

Speaker: Prof. Nobutaka Ono (National Institute of Informatics, Tokyo)

Title: "Auxiliary Function Approach to Source Localization and Separation"


Many source localization and separation problems can be formulated as nonlinear optimization problems, which generally have no closed-form solutions. Fast and stable computation therefore calls for effective iterative algorithms. The auxiliary function technique, also known as the majorization-minimization (MM) algorithm, is an attractive approach to such problems, since it can yield simple update rules with guaranteed convergence for parameter estimation. This talk will showcase auxiliary-function-based algorithms for source localization and separation, including TDOA-based source localization, blind alignment of asynchronously recorded signals, harmonic/percussive sound separation, and independent component/vector analysis.

Speaker Biography:

Nobutaka Ono received the Ph.D. degree in mathematical engineering and information physics from the University of Tokyo (Japan) in 2001. He worked at the Graduate School of Information Science and Technology, University of Tokyo, as a Research Associate from 2001 to 2004, and as a Lecturer from 2005 to 2010. In 2011, he joined the Principles of Informatics Research Division at the National Institute of Informatics (NII, Tokyo, Japan) as an Associate Professor. His research interests include source separation and localization, array signal processing, acoustic and music signal processing, audio coding, and machine learning. He was the Secretary of the Technical Committee of Psychological and Physiological Acoustics in Japan from 2006 to 2009. He received the Sato Prize Paper Award from the Acoustical Society of Japan (ASJ) in 2000, the Igarashi Award at the Sensor Symposium on Sensors, Micromachines, and Applied Systems from the Institute of Electrical Engineers of Japan (IEEJ) in 2004, the Awaya Prize Young Researcher Award from ASJ in 2007, and the Best Paper Award at the International Symposium on Industrial Electronics (ISIE) in 2008.

