News & Events

48 MERL Events and MERL Talks found.


  •  TALK    Efficiently sampling wave fields
    Date & Time: Thursday, October 17, 2013; 12:00 PM
    Speaker: Prof. Laurent Daudet, Paris Diderot University, France
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Abstract
    • In acoustics, one may wish to acquire a wavefield over a whole spatial domain, while we can only make point measurements (i.e., with microphones). Even with few sources, this remains a difficult problem because of reverberation, which can be hard to characterize. This can be seen as a sampling / interpolation problem, and it raises a number of interesting questions: how many sample points are needed, where to choose the sampling points, etc. In this presentation, we will review some case studies, in 2D (vibrating plates) and 3D (room acoustics), with numerical and experimental data, where we have developed sparse models, possibly with additional 'structures', based on a physical modeling of the acoustic field. These types of models are well suited to reconstruction techniques known as compressed sensing. These principles can also be used for sub-Nyquist optical imaging: we will show preliminary experimental results of a new compressive imager, remarkably simple in its principle, using a multiply scattering medium.
  •  
  •  EVENT    CHiME 2013 - The 2nd International Workshop on Machine Listening in Multisource Environments
    Date & Time: Saturday, June 1, 2013; 9:00 AM - 6:00 PM
    Location: Vancouver, Canada
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL researchers Shinji Watanabe and Jonathan Le Roux are members of the organizing committee of CHiME 2013, the 2nd International Workshop on Machine Listening in Multisource Environments, with Jonathan acting as Program Co-Chair. MERL is also a sponsor of the event.

      CHiME 2013 is a one-day workshop to be held in conjunction with ICASSP 2013 that will consider the challenge of developing machine listening applications for operation in multisource environments, i.e. real-world conditions with acoustic clutter, where the number and nature of the sound sources are unknown and changing over time. CHiME brings together researchers from a broad range of disciplines (computational hearing, blind source separation, speech recognition, machine learning) to discuss novel and established approaches to this problem. The cross-fertilisation of ideas will foster fresh approaches that efficiently combine the complementary strengths of each research field.
  •  
  •  EVENT    ICASSP 2013 - Student Career Luncheon
    Date & Time: Thursday, May 30, 2013; 12:30 PM - 2:30 PM
    Location: Vancouver, Canada
    MERL Contacts: Anthony Vetro; Petros T. Boufounos; Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL is a sponsor for the first ICASSP Student Career Luncheon that will take place at ICASSP 2013. MERL members will take part in the event to introduce MERL and talk with students interested in positions or internships.
  •  
  •  TALK    Practical kernel methods for automatic speech recognition
    Date & Time: Tuesday, May 7, 2013; 2:30 PM
    Speaker: Dr. Yotaro Kubo, NTT Communication Science Laboratories, Kyoto, Japan
    Research Area: Speech & Audio
    Abstract
    • Kernel methods are important because they provide both convexity in estimation and the ability to represent nonlinear classification. However, in the field of automatic speech recognition, kernel methods have not been widely used. In this presentation, I will introduce several attempts to practically incorporate kernel methods into acoustic models for automatic speech recognition. The presentation will consist of two parts. The first part will describe maximum entropy discrimination and its application to kernel machine training. The second part will describe dimensionality reduction of kernel-based features.
  •  
  •  TALK    Probabilistic Latent Tensor Factorisation
    Date & Time: Tuesday, February 26, 2013; 12:00 PM
    Speaker: Prof. Taylan Cemgil, Bogazici University, Istanbul, Turkey
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Abstract
    • Algorithms for decompositions of matrices are of central importance in machine learning, signal processing and information retrieval, with SVD and NMF (Nonnegative Matrix Factorisation) being the most widely used examples. Probabilistic interpretations of matrix factorisation models are also well known and are useful in many applications (Salakhutdinov and Mnih 2008; Cemgil 2009; Fevotte et al. 2009). In recent years, decompositions of multiway arrays, known as tensor factorisations, have gained significant popularity for the analysis of large data sets with more than two entities (Kolda and Bader 2009; Cichocki et al. 2008). We will discuss a subset of these models from a statistical modelling perspective, building upon probabilistic Bayesian generative models and generalised linear models (McCullagh and Nelder). In both views, the factorisation is implicit in a well-defined hierarchical statistical model and factorisations can be computed via maximum likelihood.

      We express a tensor factorisation model using a factor graph and the factor tensors are optimised iteratively. In each iteration, the update equation can be implemented by a message passing algorithm, reminiscent of variable elimination in a discrete graphical model. This setting provides a structured and efficient approach that enables very easy development of application-specific custom models, as well as algorithms for the so-called coupled (collective) factorisations where an arbitrary set of tensors is factorised simultaneously with shared factors. Extensions to full Bayesian inference for model selection, via variational approximations or MCMC, are also feasible. Well-known models of multiway analysis such as Nonnegative Matrix Factorisation (NMF), Parafac, Tucker, and audio processing (Convolutive NMF, NMF2D, SF-SSNTF) appear as special cases and new extensions can easily be developed. We will illustrate the approach with applications in link prediction and audio and music processing.
  •  
  •  TALK    Bayesian Group Sparse Learning
    Date & Time: Monday, January 28, 2013; 11:00 AM
    Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
    Research Area: Speech & Audio
    Abstract
    • Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning approach and its application to speech recognition and signal separation. First, I will present the group sparse hidden Markov models (GS-HMMs), where a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The features across states and within states are represented accordingly. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution. The robustness of speech recognition is illustrated. In addition, the LSM distribution is incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF). This approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal. A Monte Carlo procedure is presented to infer the two groups of parameters. Future work on Bayesian learning will also be discussed.
  •  
  •  TALK    Speech recognition for closed-captioning
    Date & Time: Tuesday, December 11, 2012; 12:00 PM
    Speaker: Takahiro Oku, NHK Science & Technology Research Laboratories
    Research Area: Speech & Audio
    Abstract
    • In this talk, I will present human-friendly broadcasting research conducted at NHK and research on speech recognition for real-time closed-captioning. The goal of human-friendly broadcasting research is to make broadcasting more accessible and enjoyable for everyone, including children, the elderly, and physically challenged persons. The automatic speech recognition technology that NHK has developed makes it possible to create captions for the hearing impaired automatically and in real time. For sports programs such as professional sumo wrestling, a closed-captioning system has already been implemented in which captions are created by using speech recognition on a captioning re-speaker. In 2011, NHK General Television started broadcasting closed captions for the information program "Morning Market". After introducing the implemented closed-captioning system, I will talk about our recent improvement obtained by an adaptation method that creates a more effective acoustic model using error correction results. The method reflects recognition error tendencies more effectively.
  •  
  •  TALK    Understanding Audition via Sound Analysis and Synthesis
    Date & Time: Wednesday, October 24, 2012; 11:45 AM
    Speaker: Josh McDermott, MIT, BCS
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Recognizing and Classifying Environmental Sounds
    Date & Time: Wednesday, October 24, 2012; 11:00 AM
    Speaker: Prof. Dan Ellis, Columbia University
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  EVENT    SANE 2012 - Speech and Audio in the Northeast
    Date & Time: Wednesday, October 24, 2012; 8:30 AM - 5:00 PM
    Location: MERL
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • SANE 2012, a one-day event gathering researchers and students in speech and audio from the northeast of the American continent, will be held on Wednesday, October 24, 2012 at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA.
  •  
  •  TALK    Self-Organizing Units (SOUs): Training Speech Recognizers Without Any Transcribed Audio
    Date & Time: Wednesday, October 24, 2012; 2:15 PM
    Speaker: Dr. Herb Gish, BBN - Raytheon
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    A new class of dynamical system models for speech and audio
    Date & Time: Wednesday, October 24, 2012; 4:05 PM
    Speaker: Dr. John R. Hershey, MERL
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Factorial Hidden Restricted Boltzmann Machines for Noise Robust Speech Recognition
    Date & Time: Wednesday, October 24, 2012; 3:20 PM
    Speaker: Dr. Steven J. Rennie, IBM Research
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Advances in Acoustic Modeling at IBM Research: Deep Belief Networks, Sparse Representations
    Date & Time: Wednesday, October 24, 2012; 9:55 AM
    Speaker: Dr. Tara Sainath, IBM Research
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Zero-Resource Speech Pattern and Sub-Word Unit Discovery
    Date & Time: Wednesday, October 24, 2012; 9:10 AM
    Speaker: Prof. Jim Glass and Chia-ying Lee, MIT CSAIL
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Latent Topic Modeling of Conversational Speech
    Date & Time: Wednesday, October 24, 2012; 1:30 PM
    Speaker: Dr. Timothy J. Hazen and David Harwath, MIT Lincoln Labs / MIT CSAIL
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Non-negative Hidden Markov Modeling of Audio
    Date & Time: Thursday, October 11, 2012; 2:30 PM
    Speaker: Dr. Gautham J. Mysore, Adobe
    Research Area: Speech & Audio
    Abstract
    • Non-negative spectrogram factorization techniques have become quite popular in the last decade as they are effective in modeling the spectral structure of audio. They have been extensively used for applications such as source separation and denoising. These techniques, however, fail to account for non-stationarity and temporal dynamics, which are two important properties of audio. In this talk, I will introduce the non-negative hidden Markov model (N-HMM) and the non-negative factorial hidden Markov model (N-FHMM) to model single sound sources and sound mixtures respectively. They jointly model the spectral structure and temporal dynamics of sound sources, while accounting for non-stationarity. I will also discuss the use of these models in various applications such as source separation, denoising, and content-based audio processing, showing why they yield improved performance when compared to non-negative spectrogram factorization techniques.
  •  
  •  TALK    Tensor representation of speaker space for arbitrary speaker conversion
    Date & Time: Thursday, September 6, 2012; 12:00 PM
    Speaker: Dr. Daisuke Saito, The University of Tokyo
    Research Area: Speech & Audio
    Abstract
    • In voice conversion studies, realization of conversion from/to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, similarly to speaker recognition approaches, a speaker space is constructed based on GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each of the speaker GMMs. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this talk, we revisit construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond, respectively, to the Gaussian components and the dimensions of the mean vector, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
  •  
  •  TALK    Learning Intermediate-Level Representations of Form and Motion from Natural Movies
    Date & Time: Wednesday, February 22, 2012; 11:00 AM
    Speaker: Dr. Charles Cadieu, McGovern Institute for Brain Research, MIT
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Abstract
    • The human visual system processes complex patterns of light into a rich visual representation where the objects and motions of our world are made explicit. This remarkable feat is performed through a hierarchically arranged series of cortical areas. Little is known about the details of the representations in the intermediate visual areas. Therefore, we ask the question: can we predict the detailed structure of the representations we might find in intermediate visual areas?

      In pursuit of this question, I will present a model of intermediate-level visual representation that is based on learning invariances from movies of the natural environment and produces predictions about intermediate visual areas. The model is composed of two stages of processing: an early feature representation layer, and a second layer in which invariances are explicitly represented. Invariances are learned as the result of factoring apart the temporally stable and dynamic components embedded in the early feature representation. The structure contained in these components is made explicit in the activities of second-layer units that capture invariances in both form and motion. When trained on natural movies, the first layer produces a factorization, or separation, of image content into a temporally persistent part representing local edge structure and a dynamic part representing local motion structure. The second-layer units are split into two populations according to the factorization in the first layer. The form-selective units receive their input from the temporally persistent part (local edge structure) and, after training, yield a diverse set of higher-order shape features consisting of extended contours, multi-scale edges, textures, and texture boundaries. The motion-selective units receive their input from the dynamic part (local motion structure) and, after training, yield a representation of image translation over different spatial scales and directions, in addition to more complex deformations. These representations provide a rich description of dynamic natural images, provide testable hypotheses regarding intermediate-level representation in visual cortex, and may be useful representations for artificial visual systems.
  •  
  •  TALK    Auxiliary Function Approach to Source Localization and Separation
    Date & Time: Thursday, October 20, 2011; 3:40 PM
    Speaker: Prof. Nobutaka Ono, National Institute of Informatics, Tokyo
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  EVENT    Audio and Music Signal Processing Mini-Symposium
    Date & Time: Thursday, October 20, 2011; 2:00 PM - 5:00 PM
    Location: MERL
    MERL Contact: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • MERL is hosting a mini-symposium on audio and music signal processing, with three talks by eminent researchers in the field: Prof. Mark Plumbley, Dr. Cedric Fevotte and Prof. Nobutaka Ono.
  •  
  •  TALK    Itakura-Saito nonnegative matrix factorization and friends for music signal decomposition
    Date & Time: Thursday, October 20, 2011; 3:00 PM
    Speaker: Dr. Cedric Fevotte, CNRS - Telecom ParisTech, Paris
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •  
  •  TALK    Analysing Digital Music
    Date & Time: Thursday, October 20, 2011; 2:20 PM
    Speaker: Prof. Mark Plumbley, Queen Mary, London
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
  •