News & Events

80 Events and Talks were found.


  •  TALK   Bayesian Group Sparse Learning
    Date & Time: Monday, January 28, 2013; 11:00 AM
    Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
    Research Areas: Multimedia, Speech & Audio
    Brief
    • Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning approach and its application to speech recognition and signal separation. First, I present the group sparse hidden Markov models (GS-HMMs), where a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The features across states and within states are represented accordingly. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution. The robustness of speech recognition is illustrated. In addition, the LSM distribution is incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF). This approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal. A Monte Carlo procedure is presented to infer the two groups of parameters. Future work on Bayesian learning will also be discussed.
  •  TALK   Electromagnetic Remote Sensing for the Detection of Concealed Objects
    Date & Time: Thursday, December 13, 2012; 12:00 PM
    Speaker: Dr. Tomasz M. Grzegorczyk, Delpsi LLC
    MERL Host: Anthony Vetro
    Research Area: Multimedia
    Brief
    • Electromagnetic (EM) remote sensing is a well-established modality for the detection, tracking, and identification of concealed targets. The degree of freedom offered by the operating frequency (and the associated propagation or induction regimes) makes EM waves sufficiently versatile to interrogate large as well as small structures, metallic as well as dielectric objects, in close proximity or further away. This wide flexibility has made EM remote sensing a modality of choice in many applications. This presentation will focus on two implementations of non-destructive and non-contact EM sensing. The first is based on a tomographic approach, whereby EM waves are used to infer material properties within the volume of accessible structures. The two examples to be discussed are breast cancer detection, i.e. locating areas of high vascularity in otherwise healthy biological tissues, and inspection of concrete structures, i.e. identifying volumetric material property variations to locate rebars and cracks. The second area we will discuss is that of subsurface target detection, again with two very different applications. The first pertains to ground penetrating radars with frequencies in the GHz range, aimed at the detection of buried weak dielectric scatterers, whereas the second focuses on the detection of metallic targets in the magnetic induction regime, for which much lower frequencies are used. In all these applications, the data collected by the appropriate hardware are processed by combining fundamental EM concepts with inverse methods for parameter estimation. We will discuss both a deterministic method -- Gauss-Newton -- and a stochastic method -- Kalman filters for real-time target detection.
  •  TALK   Speech recognition for closed-captioning
    Date & Time: Tuesday, December 11, 2012; 12:00 PM
    Speaker: Takahiro Oku, NHK Science & Technology Research Laboratories
    Research Areas: Multimedia, Speech & Audio
    Brief
    • In this talk, I will present human-friendly broadcasting research conducted at NHK and research on speech recognition for real-time closed-captioning. The goal of human-friendly broadcasting research is to make broadcasting more accessible and enjoyable for everyone, including children, the elderly, and physically challenged persons. The automatic speech recognition technology that NHK has developed makes it possible to create captions for the hearing impaired automatically and in real time. For sports programs such as professional sumo wrestling, a closed-captioning system has already been implemented in which captions are created by using speech recognition on a captioning re-speaker. In 2011, NHK General Television started broadcasting closed captions for the information program "Morning Market". After introducing the implemented closed-captioning system, I will talk about our recent improvement obtained by an adaptation method that creates a more effective acoustic model using error correction results. The method reflects recognition error tendencies more effectively.
  •  EVENT   APSIPA 2012
    Date: Monday, December 3, 2012 - Thursday, December 6, 2012
    MERL Contact: Anthony Vetro
    Location: Hollywood, CA
    Research Area: Multimedia
    Brief
    • MERL is a sponsor of APSIPA 2012, the fourth annual conference organized by the Asia-Pacific Signal and Information Processing Association.
  •  TALK   Self-Organizing Units (SOUs): Training Speech Recognizers Without Any Transcribed Audio
    Date & Time: Wednesday, October 24, 2012; 2:15 PM
    Speaker: Dr. Herb Gish, BBN - Raytheon
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  TALK   Zero-Resource Speech Pattern and Sub-Word Unit Discovery
    Date & Time: Wednesday, October 24, 2012; 9:10 AM
    Speaker: Prof. Jim Glass and Chia-ying Lee, MIT CSAIL
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  TALK   A new class of dynamical system models for speech and audio
    Date & Time: Wednesday, October 24, 2012; 4:05 PM
    Speaker: Dr. John R. Hershey, MERL
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  EVENT   SANE 2012 - Speech and Audio in the Northeast
    Date & Time: Wednesday, October 24, 2012; 8:30 AM - 5:00 PM
    MERL Contact: Jonathan Le Roux
    Location: MERL
    Research Areas: Multimedia, Speech & Audio
    Brief
    • SANE 2012, a one-day event gathering researchers and students in speech and audio from the northeast of the American continent, will be held on Wednesday October 24, 2012 at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA.
  •  TALK   Factorial Hidden Restricted Boltzmann Machines for Noise Robust Speech Recognition
    Date & Time: Wednesday, October 24, 2012; 3:20 PM
    Speaker: Dr. Steven J. Rennie, IBM Research
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  TALK   Recognizing and Classifying Environmental Sounds
    Date & Time: Wednesday, October 24, 2012; 11:00 AM
    Speaker: Prof. Dan Ellis, Columbia University
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  TALK   Understanding Audition via Sound Analysis and Synthesis
    Date & Time: Wednesday, October 24, 2012; 11:45 AM
    Speaker: Josh McDermott, MIT, BCS
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  TALK   Latent Topic Modeling of Conversational Speech
    Date & Time: Wednesday, October 24, 2012; 1:30 PM
    Speaker: Dr. Timothy J. Hazen and David Harwath, MIT Lincoln Labs / MIT CSAIL
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  TALK   Advances in Acoustic Modeling at IBM Research: Deep Belief Networks, Sparse Representations
    Date & Time: Wednesday, October 24, 2012; 9:55 AM
    Speaker: Dr. Tara Sainath, IBM Research
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
  •  EVENT   Automotive UI 2012 - 4th International Conference on Automotive User Interfaces and Interactive Vehicular Applications
    Date: Wednesday, October 17, 2012 - Friday, October 19, 2012
    MERL Contact: Anthony Vetro
    Location: Portsmouth, NH
    Research Area: Multimedia
    Brief
    • MERL is a sponsor of the Fourth International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Automotive UI 2012.
  •  TALK   Non-negative Hidden Markov Modeling of Audio
    Date & Time: Thursday, October 11, 2012; 2:30 PM
    Speaker: Dr. Gautham J. Mysore, Adobe
    MERL Host: John Hershey
    Research Areas: Multimedia, Speech & Audio
    Brief
    • Non-negative spectrogram factorization techniques have become quite popular in the last decade as they are effective in modeling the spectral structure of audio. They have been extensively used for applications such as source separation and denoising. These techniques, however, fail to account for non-stationarity and temporal dynamics, which are two important properties of audio. In this talk, I will introduce the non-negative hidden Markov model (N-HMM) and the non-negative factorial hidden Markov model (N-FHMM) to model single sound sources and sound mixtures, respectively. They jointly model the spectral structure and temporal dynamics of sound sources, while accounting for non-stationarity. I will also discuss the use of these models in applications such as source separation, denoising, and content-based audio processing, showing why they yield improved performance when compared to non-negative spectrogram factorization techniques.
  •  EVENT   ICIP 2012 - IEEE International Conference on Image Processing
    Date: Sunday, September 30, 2012 - Wednesday, October 3, 2012
    MERL Contact: Anthony Vetro
    Location: Orlando, FL
    Research Area: Multimedia
    Brief
    • Anthony Vetro is the Industrial Co-chair of ICIP 2012, the IEEE International Conference on Image Processing, to be held in Orlando, Florida, in September 2012.
  •  TALK   Tensor representation of speaker space for arbitrary speaker conversion
    Date & Time: Thursday, September 6, 2012; 12:00 PM
    Speaker: Dr. Daisuke Saito, The University of Tokyo
    Research Areas: Multimedia, Speech & Audio
    Brief
    • In voice conversion studies, conversion from and to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, as in speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In the speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this talk, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond, respectively, to the Gaussian components and the dimensions of the mean vector, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
  •  TALK   Communication Systems for Oilfield Applications
    Date & Time: Tuesday, August 7, 2012; 12:00 PM
    Speaker: Dr. Julius Kusuma, Schlumberger-Doll Research
    MERL Host: Petros Boufounos
    Research Area: Multimedia
    Brief
    • The oilfield is a rich area for research and engineering in communication and signal processing. Communicating over non-standard channels with constrained sources, in noisy environments, and with limited computational and energy resources is among the key challenges in this domain. In this talk I will first introduce the role of science and technology, in particular communication and signal processing, in the oilfield. Due to its unique role in the industry, Schlumberger has a rich variety of communication systems over EM wireless, wired, acoustic, and even fluid pressure channels.

      In this talk we give a brief tour of the state of the art and showcase how technology has revolutionized the practice of the industry, enabling innovations such as horizontal drilling, logging-while-drilling, and well-placement. At the same time, we give a tutorial on how the lifecycle of a reservoir is managed, including imaging, drilling, logging, sampling, testing, and completing. Throughout, we will show how communication has revolutionized practice in the industry.
  •  TALK   Nonparametric Bayesian Latent Variable Models
    Date & Time: Friday, July 27, 2012; 12:00 PM
    Speaker: Mingyuan Zhou, Duke University
    MERL Host: Dehong Liu
    Research Area: Multimedia
    Brief
    • Bayesian nonparametrics, using stochastic processes as prior distributions, is a relatively young and rapidly growing research area in statistics and machine learning. In this talk, we first briefly review completely random measures, a family of pure-jump non-negative stochastic processes that are simple to construct and amenable to posterior computation. We then present nonparametric Bayesian latent variable models based on the beta process, Bernoulli process, gamma process, Poisson process, and in particular, the negative binomial process. Specifically, for continuous data, we discuss dictionary learning with the beta-Bernoulli process and dependent hierarchical beta process, and for count data, we present the beta-negative binomial process and Poisson factor analysis. Furthermore, we discuss how the seemingly disjoint problems of count modeling and mixture modeling can be united under the negative binomial process framework, providing new opportunities to build mixture and hierarchical mixture models with better data fitting, more efficient inference and more flexible model constructions. We show successful applications of our nonparametric Bayesian latent variable models to image processing, topic modeling and count data analysis.
  •  TALK   Quadratic Gaussian Multiterminal Source Coding
    Date & Time: Friday, July 6, 2012; 12:00 PM
    Speaker: Zixiang Xiong, Texas A&M University
    MERL Host: Anthony Vetro
    Research Area: Multimedia
    Brief
    • Driven by a host of emerging applications, distributed source coding has attracted renewed interest in the past decade. Although the Slepian-Wolf theorem has been known for almost 40 years and progress has been made recently on the rate region of quadratic Gaussian two-terminal source coding, finding the sum-rate bound of quadratic Gaussian multiterminal source coding with more than two terminals is still an open problem. In this talk, I'll briefly go over existing results on distributed source coding problems before describing a set of new results we obtained recently.