News & Events

TALK Advances in Acoustic Modeling at IBM Research: Deep Belief Networks, Sparse Representations
Date & Time: Wednesday, October 24, 2012; 9:55 AM
Speaker: Dr. Tara Sainath, IBM Research
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Recognizing and Classifying Environmental Sounds
Date & Time: Wednesday, October 24, 2012; 11:00 AM
Speaker: Prof. Dan Ellis, Columbia University
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Understanding Audition via Sound Analysis and Synthesis
Date & Time: Wednesday, October 24, 2012; 11:45 AM
Speaker: Josh McDermott, MIT, BCS
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Self-Organizing Units (SOUs): Training Speech Recognizers Without Any Transcribed Audio
Date & Time: Wednesday, October 24, 2012; 2:15 PM
Speaker: Dr. Herb Gish, BBN - Raytheon
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
TALK Non-negative Hidden Markov Modeling of Audio
Date & Time: Thursday, October 11, 2012; 2:30 PM
Speaker: Dr. Gautham J. Mysore, Adobe
Research Area: Speech & Audio
Abstract
- Non-negative spectrogram factorization techniques have become quite popular in the last decade because they are effective at modeling the spectral structure of audio. They have been used extensively for applications such as source separation and denoising. These techniques, however, fail to account for non-stationarity and temporal dynamics, two important properties of audio. In this talk, I will introduce the non-negative hidden Markov model (N-HMM) and the non-negative factorial hidden Markov model (N-FHMM) to model single sound sources and sound mixtures, respectively. They jointly model the spectral structure and temporal dynamics of sound sources while accounting for non-stationarity. I will also discuss the use of these models in applications such as source separation, denoising, and content-based audio processing, showing why they yield improved performance compared to non-negative spectrogram factorization techniques.
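For context, the non-negative spectrogram factorization baseline that the N-HMM is compared against can be sketched with the standard multiplicative updates for the generalized Kullback-Leibler divergence (Lee and Seung). This is not the N-HMM itself, which, per the abstract, additionally models temporal dynamics and non-stationarity; the toy matrix, the rank, and the names `nmf_kl` and `kl_divergence` are illustrative assumptions.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor a non-negative matrix V (frequency x time) as V ~= W @ H using
    multiplicative updates that do not increase the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps  # spectral basis vectors (columns)
    H = rng.random((rank, n_time)) + eps  # time-varying activations (rows)
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (column sums of W)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # W <- W * ((V / WH) H^T) / (row sums of H)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

# Usage on a toy rank-2 non-negative "spectrogram"
rng = np.random.default_rng(1)
V = rng.random((64, 2)) @ rng.random((2, 100))
W, H = nmf_kl(V, rank=2)
print(kl_divergence(V, W @ H))  # small relative to the total energy V.sum()
```

Because the updates are multiplicative with non-negative factors, W and H stay non-negative without any explicit projection step.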
TALK Interactive Visual Analysis for Engineering Applications
Date & Time: Thursday, October 11, 2012; 12:00 PM
Speaker: Kresimir Matkovic, VRVis Research Center, Vienna
Abstract
- Increasing complexity and a large number of control parameters make the design and understanding of modern engineering systems impossible without simulation today. Advances in simulation technology and the ability to run multiple simulations with different sets of parameters pose new challenges for analysis techniques. In this talk we will present our experience in the exploration and analysis of simulation ensembles, gained in several projects with experts from the automotive, meteorology, and medical domains. We tightly integrate simulation, numerical optimization, and interactive visual analysis in a unified framework. Our new data model supports families of curves and families of surfaces. The accompanying interactive visual analysis techniques offer new possibilities for data exploration and analysis: it is possible to start with a simple analysis, continue by identifying hidden features, and finally explore very complex dependencies using advanced interaction and on-the-fly data derivation and aggregation. All proposed techniques will be illustrated using a coordinated multiple views system and real-life data from various projects with scientists and engineers, including the optimization of an automotive rail injection system.
TALK Tensor representation of speaker space for arbitrary speaker conversion
Date & Time: Thursday, September 6, 2012; 12:00 PM
Speaker: Dr. Daisuke Saito, The University of Tokyo
Research Area: Speech & Audio
Abstract
- In voice conversion studies, conversion from or to an arbitrary speaker's voice is one of the important objectives. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice Gaussian mixture model (EV-GMM) was proposed. In EVC, similarly to speaker recognition approaches, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of all components of a speaker's GMM. In the speaker space, each speaker is represented by a small number of weight parameters on the eigen-supervectors. In this talk, we revisit the construction of the speaker space by introducing tensor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the Gaussian components and the dimensions of the mean vector, respectively, and the speaker space is derived by tensor analysis of the set of these matrices. Our approach solves an inherent problem of the supervector representation and improves voice conversion performance. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
TALK Challenges on shape acquisition of moving object
Date & Time: Friday, August 17, 2012; 12:00 PM
Speaker: Prof. Hiroshi Kawasaki, Kagoshima University
Research Area: Computer Vision
Abstract
- In this talk, I will give an overview of my research projects on 3D shape acquisition of moving objects. The talk mainly focuses on two parts: the first is our 3D shape acquisition technique using a projector-camera (pro-cam) system, and the second is entire shape acquisition using a multi-view pro-cam system. I will also briefly cover the following topics:
  
  -- Theory of shape from coplanarity technique
  -- Texture recovery method on pro-cam system
  -- Future plan on medical application of our scanner
  
  This research was conducted jointly with Prof. Katsushi Ikeuchi (Univ. of Tokyo), Prof. Ryo Furukawa (Hiroshima City Univ.), and Prof. Ryusuke Sagawa (AIST).
TALK Communication Systems for Oilfield Applications
Date & Time: Tuesday, August 7, 2012; 12:00 PM
Speaker: Dr. Julius Kusuma, Schlumberger-Doll Research
MERL Host: Petros T. Boufounos
Abstract
- The oilfield is a rich area for research and engineering in communication and signal processing. Key challenges in this domain include communication over non-standard channels with constrained sources, noisy environments, and limited computational and energy resources. In this talk I will first introduce the role of science and technology, in particular communication and signal processing, in the oilfield. Owing to its unique role in the industry, Schlumberger has a rich variety of communication systems over EM wireless, wired, acoustic, and even fluid pressure channels.
  
  In this talk we give a brief tour of the state of the art and show how technology has revolutionized the practice of the industry, enabling innovations such as horizontal drilling, logging-while-drilling, and well placement. At the same time, we give a tutorial on how the lifecycle of a reservoir is managed, including imaging, drilling, logging, sampling, testing, and completion. Throughout, we will show how communication has revolutionized practice in the industry.
TALK Feedback Particle Filter and its Applications
Date & Time: Wednesday, August 1, 2012; 12:00 PM
Speaker: Prof. Prashant Mehta, University of Illinois at Urbana-Champaign
MERL Host: Scott A. Bortoff
Abstract
- In my talk, I will present a self-contained introduction to nonlinear filtering and describe some recent developments. The subject of the talk is a new formulation of the nonlinear filter (for Bayesian inference) that is based on concepts from optimal control and mean-field game theory. Specifically, I will introduce the feedback particle filter and show how it admits an innovation error-based feedback control structure. The control is chosen so that the posterior distribution of any particle matches the posterior distribution of the true state given the observations. Nonlinear filtering is important to many applications in engineering, biology, economics, atmospheric sciences, and neuroscience. Several applications will be described to illustrate the theoretical concepts.
  
  This is joint work with Tao Yang and Sean Meyn at the University of Illinois.
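A minimal sketch of a scalar feedback particle filter, under stated assumptions: the model is dX = a(X) dt + sigma_b dB with observations dZ = h(X) dt + sigma_w dW, time is discretized with an Euler-Maruyama step, and the gain uses the constant-gain approximation, which for a linear-Gaussian model coincides with the Kalman-Bucy gain. The general filter obtains the gain by solving a Poisson equation, which this sketch does not attempt; all function and parameter names are hypothetical.

```python
import numpy as np

def feedback_particle_filter(dz, dt, a, h, sigma_b, sigma_w, n_particles=500, seed=0):
    """Scalar feedback particle filter with the constant-gain approximation.

    Assumed model: dX = a(X) dt + sigma_b dB,  dZ = h(X) dt + sigma_w dW.
    dz holds the observed increments dZ over steps of length dt.
    Returns the particle-mean estimate of X after each step.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)  # prior particles drawn from N(0, 1) (assumption)
    estimates = np.empty(len(dz))
    for k, dz_k in enumerate(dz):
        hx = h(x)
        h_hat = hx.mean()
        # Constant-gain approximation: K = E[(h(X) - h_hat) X] / sigma_w^2
        gain = np.mean((hx - h_hat) * x) / sigma_w**2
        # Each particle's innovation: dI = dZ - (h(X^i) + h_hat) / 2 * dt
        innovation = dz_k - 0.5 * (hx + h_hat) * dt
        # Euler-Maruyama step: drift + process noise + feedback control term
        x = (x + a(x) * dt
             + sigma_b * np.sqrt(dt) * rng.standard_normal(n_particles)
             + gain * innovation)
        estimates[k] = x.mean()
    return estimates

# Usage: estimate a constant hidden state x = 1 from noisy observations
obs_rng = np.random.default_rng(42)
dt, n_steps, x_true, sigma_w = 0.01, 1000, 1.0, 0.5
dz = x_true * dt + sigma_w * np.sqrt(dt) * obs_rng.standard_normal(n_steps)
est = feedback_particle_filter(dz, dt, a=lambda x: 0.0 * x, h=lambda x: x,
                               sigma_b=0.0, sigma_w=sigma_w)
print(est[-1])  # should be close to 1.0
```

Note that no particle weights or resampling appear: each particle is steered by its own innovation term, which is the feedback control structure the abstract refers to.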
TALK Nonparametric Bayesian Latent Variable Models
Date & Time: Friday, July 27, 2012; 12:00 PM
Speaker: Mingyuan Zhou, Duke University
MERL Host: Dehong Liu
Abstract
- Bayesian nonparametrics, which uses stochastic processes as prior distributions, is a relatively young and rapidly growing research area in statistics and machine learning. In this talk, we first briefly review completely random measures, a family of pure-jump non-negative stochastic processes that are simple to construct and amenable to posterior computation. We then present nonparametric Bayesian latent variable models based on the beta process, Bernoulli process, gamma process, Poisson process, and, in particular, the negative binomial process. Specifically, for continuous data, we discuss dictionary learning with the beta-Bernoulli process and the dependent hierarchical beta process, and for count data, we present the beta-negative binomial process and Poisson factor analysis. Furthermore, we discuss how the seemingly disjoint problems of count and mixture modeling can be unified under the negative binomial process framework, providing new opportunities to build mixture and hierarchical mixture models with better data fitting, more efficient inference, and more flexible model construction. We show successful applications of our nonparametric Bayesian latent variable models to image processing, topic modeling, and count data analysis.
TALK A Pole-Placement Approach to the Design of Robust Linear Multivariable Control Systems
Date: Thursday, July 19, 2012
Speaker: Rick Vaccaro, University of Rhode Island
MERL Host: Scott A. Bortoff
Abstract
- The ability to directly specify the closed-loop poles of a multivariable control system is a major benefit of pole-placement algorithms for calculating state-feedback and observer gains. The drawback of these algorithms is the lack of any guarantee on the stability robustness of the resulting control system. The optimal control approach for calculating state-feedback gains (LQR) has certain guaranteed robustness margins, but adding an observer (e.g., a Kalman filter, as in LQG) can result in arbitrarily poor robustness. In this talk, a new pole-placement approach is introduced for calculating state-feedback and observer gains. The new approach optimizes robustness and gives impressive results, particularly for output-feedback, observer-based control systems.
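The abstract does not give enough detail to reproduce the new algorithm, so as a point of reference here is conventional multivariable pole placement with SciPy's `scipy.signal.place_poles`, whose default `method="YT"` (Tits-Yang) picks, among the many gains that achieve the requested poles with several inputs, one that makes the closed-loop eigenvectors well conditioned. The observer gain follows by duality. The plant matrices and pole locations below are hypothetical.

```python
import numpy as np
from scipy.signal import place_poles

# Hypothetical 3-state, 2-input, 1-output plant (open-loop unstable)
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [2.0, -1.0, 0.5]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0, 0.0]])

# State-feedback gain K so that eig(A - B K) equals the controller poles
controller_poles = [-1.0, -2.0, -3.0]
K = place_poles(A, B, controller_poles).gain_matrix

# Observer gain L by duality: eig(A^T - C^T L^T) = eig(A - L C)
observer_poles = [-4.0, -5.0, -6.0]
L = place_poles(A.T, C.T, observer_poles).gain_matrix.T

print(np.sort(np.linalg.eigvals(A - B @ K).real))  # approximately [-3, -2, -1]
print(np.sort(np.linalg.eigvals(A - L @ C).real))  # approximately [-6, -5, -4]
```

Placing controller and observer poles separately like this is exactly the setting in which, as the abstract notes, robustness of the combined output-feedback loop is not guaranteed.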
TALK Threat Assessment and Semi-Autonomous Control of Manned and Unmanned Vehicles
Date & Time: Monday, July 16, 2012; 2:00 PM
Speaker: Dr. Karl Iagnemma, Director, MIT Robotic Mobility Group
MERL Host: Stefano Di Cairano
Abstract
- Operator error is a significant factor in a majority of manned and unmanned vehicle accidents. In this talk, a framework for semi-autonomous vehicle accident avoidance will be presented that has been shown to effectively mitigate collisions caused by operator error. The framework analyzes sensor data (from vision and/or LIDAR) to identify "no go" regions in the environment and automatically synthesizes constraints on vehicle position. An optimal trajectory and associated control inputs are then found via linear or nonlinear model predictive control. The "threat" to the vehicle is quantified from various metrics computed over the optimal trajectory. A number of approaches for arbitrating between operator and control system authority, based on the predicted threat, will be discussed. Extensive simulation and experimental testing will be described for both manned and unmanned scenarios. Future directions in threat assessment and semi-autonomous control, based on the integration of vision-based sensing and active steering control, will also be discussed.
TALK Applications of Mobile Augmented Reality and Pervasive Computing in Architecture, Engineering, and Construction
Date & Time: Tuesday, July 10, 2012; 11:00 AM
Speaker: Prof. Vineet Kamat, University of Michigan
Research Area: Computer Vision
Abstract
- This talk will present ongoing research at the University of Michigan Laboratory for Interactive Visualization in Engineering (LIVE) that is exploring applications of mobile pervasive computing and visualization in design, engineering, and construction. Findings from three specific research projects will be presented: Interactive Visualization of Construction Operations in Mobile Outdoor Augmented Reality; Rapid Building Damage Evaluation using Augmented Reality and Structural Simulation; and Location-Aware Contextual Information Access and Retrieval for Rapid On-Site Decision Making. In each case, the development of fundamental algorithms, their implementation as reusable and modular software, and their integration into engineering applications will be described.
TALK Quadratic Gaussian Multiterminal Source Coding
Date & Time: Friday, July 6, 2012; 12:00 PM
Speaker: Zixiang Xiong, Texas A&M University
MERL Host: Anthony Vetro
Abstract
- Driven by a host of emerging applications, distributed source coding has attracted renewed interest in the past decade. Although the Slepian-Wolf theorem has been known for almost 40 years and progress has recently been made on the rate region of quadratic Gaussian two-terminal source coding, finding the sum-rate bound of quadratic Gaussian multiterminal source coding with more than two terminals is still an open problem. In this talk, I'll briefly go over existing results on distributed source coding problems before describing a set of new results we obtained recently.
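To make the Slepian-Wolf result concrete: for two correlated discrete sources X and Y that are compressed separately and decoded jointly, a rate pair (R1, R2) is achievable (in the usual asymptotic sense) exactly when R1 >= H(X|Y), R2 >= H(Y|X), and R1 + R2 >= H(X,Y). The sketch below evaluates these bounds from a joint pmf. It covers only the lossless discrete case, not the quadratic Gaussian problem the talk addresses, and the function names and example source are illustrative.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def slepian_wolf_bounds(joint):
    """Return (H(X|Y), H(Y|X), H(X,Y)) for joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    return h_xy - entropy(p_y), h_xy - entropy(p_x), h_xy

def in_slepian_wolf_region(joint, r1, r2, tol=1e-12):
    """True if the rate pair (r1, r2), in bits/symbol, satisfies all three bounds."""
    h_x_given_y, h_y_given_x, h_xy = slepian_wolf_bounds(joint)
    return r1 >= h_x_given_y - tol and r2 >= h_y_given_x - tol and r1 + r2 >= h_xy - tol

# Doubly symmetric binary source: X uniform, Y = X flipped with probability 0.11
p = 0.11
joint = [[(1 - p) / 2, p / 2],
         [p / 2, (1 - p) / 2]]
print(slepian_wolf_bounds(joint))               # roughly (0.5, 0.5, 1.5)
print(in_slepian_wolf_region(joint, 1.0, 0.5))  # True: X sent in full, Y at about H(Y|X)
print(in_slepian_wolf_region(joint, 0.6, 0.6))  # False: sum rate below H(X,Y)
```

The point of the theorem is visible in the example: separate encoders need only about 1.5 bits per pair in total, the same as a joint encoder, instead of the 2 bits needed without exploiting the correlation.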
TALK Sparse projections onto convex sets
Date: Tuesday, July 3, 2012
Speaker: Prof. Volkan Cevher, EPFL
MERL Host: Petros T. Boufounos
Abstract
- Many natural and man-made signals exhibit only a few degrees of freedom relative to their dimension, due to natural parameterizations or constraints. The inherent low-dimensional structure of such signals is mathematically modeled via combinatorial and geometric concepts, such as sparsity, unions of subspaces, or spectral sets, and these models are now revolutionizing the way we address linear inverse problems from incomplete data.
  
  In this talk, we describe a set of structured sparse models for constrained linear inverse problems that feature exact and epsilon-approximate projections in polynomial time. We pay particular attention to sparsity models based on matroids, multi-knapsack constraints, and clustering, as well as spectrally constrained models. We then study sparse projections onto convex sets, such as the (general) simplex and the ℓ1, ℓ2, and ℓ∞ balls. Finally, we describe a hybrid optimization framework that explicitly leverages these non-convex models along with additional convex constraints to obtain better recovery performance in compressive sensing, learn interpretable sparse densities from finite samples, and construct improved sparse Markowitz portfolios with better return/cost performance.
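The simplex mentioned above admits a simple exact Euclidean projection by sorting (a well-known O(n log n) method). The sketch below implements it and then forms a k-sparse point on the simplex by keeping the k largest entries and projecting only those. That greedy support rule is an assumption made for this sketch, not a statement of the speaker's algorithms, and the function names are hypothetical.

```python
def project_simplex(v, z=1.0):
    """Euclidean projection of v onto the simplex {w : w >= 0, sum(w) = z}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - z) / j
        if u_j - t > 0:  # holds for j = 1..rho, so the last hit sets the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]

def sparse_project_simplex(v, k, z=1.0):
    """k-sparse point on the simplex: keep the k largest entries of v (greedy
    support choice, an assumption of this sketch), project them, zero the rest."""
    support = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in support], z)
    w = [0.0] * len(v)
    for i, p in zip(support, projected):
        w[i] = p
    return w

print(project_simplex([0.2, 0.9, 0.4]))            # about [0.033, 0.733, 0.233]
print(sparse_project_simplex([0.2, 0.9, 0.4], 2))  # about [0.0, 0.75, 0.25]
```

The projection subtracts a single threshold theta from every entry and clips at zero, which is why a sorted cumulative sum is enough to find it.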
TALK Visual 3D/4D modeling of urban places and events
Date & Time: Friday, June 29, 2012; 2:30 PM
Speaker: Prof. Marc Pollefeys, ETH Zurich and UNC Chapel Hill
Research Area: Computer Vision
Abstract
- One of the fundamental problems of computer vision is to extract 3D shape and motion from images. This can be achieved when a scene or object is observed from different viewpoints or over a period of time. First, we will discuss image-based 3D modeling and localization in large environments, e.g., urban 3D reconstruction from vehicle-borne cameras and (geo)localization from mobile-phone images. In this context, we will discuss some of the challenges and opportunities offered by symmetries of architectural structures. We will also discuss how changes in an urban environment can be detected from images, making it possible to efficiently acquire 4D models. In addition to explicit 4D modeling of an event, we'll consider the possibility of performing interactive video-based rendering from casually captured videos.
TALK Toward Efficient and Robust Human Pose Estimation
Date & Time: Tuesday, June 26, 2012; 12:00 PM
Speaker: Min Sun, University of Michigan
Research Area: Computer Vision
Abstract
- Robust human pose estimation is a challenging problem in computer vision because body part configurations are often subject to severe deformations and occlusions. Moreover, efficiency is a desirable property of pose estimation in many applications. The trade-off between accuracy and efficiency has been explored in a large number of approaches. On the one hand, models with simple representations (like tree or star models) can be applied efficiently to pose estimation problems. However, these models are often prone to body part misclassification errors. On the other hand, models with rich representations (e.g., loopy graphical models) are theoretically more robust, but their inference complexity may increase dramatically. In this talk, we present an efficient and exact inference algorithm based on branch-and-bound to solve the human pose estimation problem on loopy graphical models. We show that our method is empirically much faster (about 74 times) than the state-of-the-art exact inference algorithm [Sontag et al. UAI'08]. By extending a state-of-the-art tree model [Sapp et al. ECCV'10] to a loopy graphical model, we show that the estimation accuracy improves for most body parts (especially lower arms) on popular datasets such as Buffy [Ferrari et al. CVPR'08] and Stickmen [Eichner and Ferrari BMVC'09]. Our method can also be used to exactly solve most of the inference problems of Stretchable Models [Sapp et al. CVPR'11] on video sequences (which contain a few hundred variables) in just a few minutes. Finally, we show that the novel inference algorithm can potentially be used to solve human behavior understanding and biological computation problems.
TALK A Real-Time Algorithm for Nonlinear Model Predictive Control and Its Applications
Date & Time: Monday, June 25, 2012; 10:30 AM
Speaker: Prof. Toshiyuki Ohtsuka, Osaka University
MERL Host: Stefano Di Cairano
Abstract
- In this talk, a real-time algorithm for nonlinear model predictive control and its applications will be introduced. The continuation method is combined with GMRES, an efficient linear solver, to trace the time-dependent optimal solution without iterative searches. Applications of the algorithm include position control of an underactuated hovercraft, route tracking of a ship with redundant actuators, and path generation for an automobile. Automatic code generation by symbolic computation and other related topics will also be introduced.
TALK Cooperative Cuts: Coupling Edges via Submodularity
Date & Time: Thursday, April 12, 2012; 12:00 PM
Speaker: Dr. Stefanie Jegelka, UC Berkeley
Research Area: Computer Vision
Abstract
- Graph cuts that represent pairwise Markov random fields have been a popular tool in computer vision, but they have some well-known shortcomings that arise from their locality and conditional independence assumptions. We therefore extend graph cuts to "cooperative cuts", where "cooperating" graph edges incur a lower combined cost. This cooperation is modeled by submodular functions on edges. The resulting family of global energy functions includes recent models in computer vision as well as new criteria that, for example, significantly improve image segmentation results for finely structured objects and for images with variation in contrast. While "minimum cooperative cut" is NP-hard, the underlying indirect submodularity and the graph structure enable efficient approximations.
  
  In the second part of the talk, I will switch topics and briefly address Hilbert space embeddings of distributions. With the kernel trick, such embeddings help generalize clustering objectives to consider higher-order moments of distributions instead of merely point locations.
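A toy illustration of the cooperative-cut idea from the first part of the abstract: edges are grouped, and the cost of a cut applies a concave square root to the total weight cut within each group, which is a submodular edge function, so edges in the same group become cheaper to cut together. The groups, weights, square-root cost, and brute-force search (exponential, usable only on tiny graphs) are illustrative assumptions; the talk's approximation algorithms are not reproduced here.

```python
from itertools import combinations
from math import sqrt

def cut_edges(edges, source_side):
    """Edges (u, v, weight, group) leaving the source side of the partition."""
    return [e for e in edges if e[0] in source_side and e[1] not in source_side]

def additive_cost(cut):
    """Standard graph-cut cost: sum of cut edge weights."""
    return sum(w for _, _, w, _ in cut)

def cooperative_cost(cut):
    """Submodular cost: sqrt of the cut weight within each group, summed over groups."""
    totals = {}
    for _, _, w, g in cut:
        totals[g] = totals.get(g, 0.0) + w
    return sum(sqrt(t) for t in totals.values())

def min_cut(edges, inner_nodes, cost):
    """Brute-force minimum s-t cut; returns (cost, sorted list of cut edges)."""
    best = None
    for r in range(len(inner_nodes) + 1):
        for subset in combinations(inner_nodes, r):
            cut = cut_edges(edges, {"s", *subset})
            c = cost(cut)
            if best is None or c < best[0]:
                best = (c, sorted((u, v) for u, v, _, _ in cut))
    return best

# Three parallel s-t paths; the source edges share one group, the sink edges do not
edges = [
    ("s", "a", 1.0, "src"), ("s", "b", 1.0, "src"), ("s", "c", 1.0, "src"),
    ("a", "t", 0.9, "p"), ("b", "t", 0.9, "q"), ("c", "t", 0.9, "r"),
]
print(min_cut(edges, ["a", "b", "c"], additive_cost))     # cuts the sink edges, cost about 2.7
print(min_cut(edges, ["a", "b", "c"], cooperative_cost))  # cuts the source edges, cost sqrt(3)
```

With the additive cost the cheaper individual edges win, while the cooperative cost prefers cutting the three same-group edges together, even though each one is heavier.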
TALK Control Design with Uncertain Predictions in Autonomous Systems: Theory and Practice
Date & Time: Friday, March 16, 2012; 10:00 AM
Speaker: Prof. Francesco Borrelli, UC Berkeley
MERL Host: Stefano Di Cairano
Abstract
- Forecasts will play an increasingly important role in the next generation of autonomous and semi-autonomous systems. In nominal conditions, predictions of system dynamics, human behavior and environmental envelope can be used by the control algorithm to improve safety and performance of the resulting system. However, in practice, constraint satisfaction, performance guarantees and real-time computation are challenged by the (1) growing complexity of the engineered system, (2) uncertainty in the human/machine interaction and (3) uncertainty in the environment where the system operates.
  
  In this talk I will present the theory and tools that we have developed over the past ten years for the systematic design of predictive controllers for uncertain linear and nonlinear systems. I will first provide an overview of our theoretical efforts. Then, I will focus on our recent results in addressing constraint satisfaction and real-time computation in nonlinear systems and large-scale networked systems. Throughout the talk I will use two applications to motivate our research and show the benefits of the proposed techniques: Safe Autonomous Cars and Green Intelligent Buildings.
TALK Research and Development in JSK Robotics Lab, Univ. of Tokyo
Date & Time: Thursday, March 8, 2012; 9:30 AM
Speaker: Prof. Masayuki Inaba, Professor, Director of JSK Robotics Lab; Department of Creative Informatics; Department of Mechano-Informatics; Graduate School of Information Technology and Science; The University of Tokyo
Abstract
- This talk introduces the history and ongoing activities of research and development in the JSK Robotics Lab, The University of Tokyo, including hand-eye coordination in rope handling, correlation-based tracking vision, vision-based robotics, the wireless remote-brained approach, whole-body behaviors on humanoids, tactile deformable devices for a robot sensor suit, musculoskeletal spined humanoids, power systems for human speed and torque performance, learning and assistive activities on HRP2 (Japanese Humanoid Robot Project Platform) and PR2 (Willow Garage's Personal Robot Platform for the open-source Robot Operating System, ROS), the common software architecture in all JSK robots, and their mother environment for inherited research and development in JSK.
TALK Learning Intermediate-Level Representations of Form and Motion from Natural Movies
Date & Time: Wednesday, February 22, 2012; 11:00 AM
Speaker: Dr. Charles Cadieu, McGovern Institute for Brain Research, MIT
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract
- The human visual system processes complex patterns of light into a rich visual representation where the objects and motions of our world are made explicit. This remarkable feat is performed through a hierarchically arranged series of cortical areas. Little is known about the details of the representations in the intermediate visual areas. Therefore, we ask the question: can we predict the detailed structure of the representations we might find in intermediate visual areas?
  
  In pursuit of this question, I will present a model of intermediate-level visual representation that is based on learning invariances from movies of the natural environment and produces predictions about intermediate visual areas. The model is composed of two stages of processing: an early feature representation layer, and a second layer in which invariances are explicitly represented. Invariances are learned as the result of factoring apart the temporally stable and dynamic components embedded in the early feature representation. The structure contained in these components is made explicit in the activities of second-layer units that capture invariances in both form and motion. When trained on natural movies, the first layer produces a factorization, or separation, of image content into a temporally persistent part representing local edge structure and a dynamic part representing local motion structure. The second-layer units are split into two populations according to the factorization in the first layer. The form-selective units receive their input from the temporally persistent part (local edge structure) and after training result in a diverse set of higher-order shape features consisting of extended contours, multi-scale edges, textures, and texture boundaries. The motion-selective units receive their input from the dynamic part (local motion structure) and after training result in a representation of image translation over different spatial scales and directions, in addition to more complex deformations. These representations provide a rich description of dynamic natural images, provide testable hypotheses regarding intermediate-level representation in visual cortex, and may be useful representations for artificial visual systems.
TALK User-guided 2D-to-3D Conversion
Date & Time: Tuesday, February 21, 2012; 12:00 PM
Speaker: Dimitri Androutsos, Richard Rzeszutek, Ryerson University
MERL Host: Anthony Vetro
Abstract
- The problem of converting monoscopic footage into stereoscopic or multi-view content is inherently difficult and ill-posed. On the surface, this does not appear to be the case as the problem may be summed up as, "Given single-view image or video, create one or more views as if they were taken from a different camera view." However, capturing a three-dimensional scene as a two-dimensional image is a lossy process and any information regarding the distance of objects to the camera is lost. Methods exist for extracting depth information from a monoscopic view and it is possible to obtain metrically-correct depth estimates under certain conditions. But since conversion is primarily used as a post-processing stage in film production, the user requires a degree of control over the results. This, in turn, makes it ill-posed as there is no way to know ahead of time what the user wants from the conversion. In this talk we will present the work being done at Ryerson University on user-guided 2D-to-3D conversion. In particular, we will focus on how existing image segmentation techniques may be combined to produce reasonable depth maps for conversion while still providing complete control to the user. We will also discuss how our research can be applied to both images and video without any significant alterations to our methods.
TALK Secure Computation and Interference in Networks: Performance Limits and Efficient Protocols
Date & Time: Wednesday, January 4, 2012; 12:00 PM
Speaker: Dr. Ye Wang, AgaMatrix, Inc.
Abstract
- In the field of Secure Multi-party Computation, the general objective is to design protocols that allow a group of parties to securely compute functions of their collective private data while maintaining privacy (no party reveals any more information about its personal data than necessary) and ensuring correctness (no party can disrupt or influence the computation beyond the effect of changing its input data). Information-theoretic approaches to this broad problem, which provide provable (unconditional) security guarantees (even against adversaries with unbounded computational power), have established that general computation is possible in a variety of scenarios. However, these general solutions are not always the most efficient or finely tuned to the requirements of specific problems and applications.
  
  In this talk, we will give an overview of our work toward the development of efficient information-theoretic approaches for secure multi-party computation applications, within the common theme of secure computation and inference over a distributed data network. These applications include:
  
  1) private information retrieval, where the objective is to privately obtain data without revealing what was selected;
  2) secure statistical analysis, the problem of extracting statistics without revealing anything else about the underlying distributed data;
  3) secure sampling, which is the secure distributed generation of new data with a given joint distribution; and
  4) secure authentication, where the identity of a party needs to be authenticated via inference on the party's credentials and stored registration data.
  
  Our contributions toward these applications include the following. We proposed a novel oblivious transfer protocol, applicable to private information retrieval, that trades off a small amount of privacy for a drastic increase in efficiency. We leveraged a dimensionality reduction that exploits functional structure to simultaneously achieve arbitrarily high accuracy and efficiency in protocols that perform secure statistical analysis of distributed databases. Toward characterizing the region of distributions that can be securely sampled from scratch, we fully characterized the two-party scenario and provided inner and outer bounds for the multi-party scenario. Toward enabling secure distributed authentication, we proposed a two-factor secure biometric authentication system that is robust against the compromise of registered biometric data, allowing for revocability and providing resistance against cross-enrollment attacks.
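As a concrete instance of the secure statistical analysis setting (item 2 above), here is the textbook additive secret-sharing protocol for computing a sum, which is information-theoretically private against any single semi-honest party. It is not one of the speaker's contributions; the modulus, the simulation of all parties in one process, and the function names are assumptions of this sketch.

```python
import secrets

# Field modulus: the Mersenne prime 2**61 - 1 (assumed larger than any true sum)
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split value into n_parties additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    """Simulate n semi-honest parties computing the sum of their private inputs.

    Party i sends its j-th share to party j; each party then publishes only the
    sum of the shares it received. Every individual share is uniformly random,
    so a single share reveals nothing about any one input.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]  # all_shares[i][j]: from i to j
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME

print(secure_sum([12, 30, 7]))  # 49
```

Only the total is revealed; in a real deployment each share would travel over a private channel between the two parties involved rather than being held in one list.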