News & Events

TALK Advanced Recurrent Neural Networks for Automatic Speech Recognition
Date & Time: Friday, April 29, 2016; 12:00 PM - 1:00 PM
Speaker: Yu Zhang, MIT
Research Area: Speech & Audio
Abstract
- A recurrent neural network (RNN) is a class of neural network models where connections between its neurons form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Recently, RNN-based acoustic models have greatly improved automatic speech recognition (ASR) accuracy on many tasks, particularly those based on an advanced version of the RNN that exploits a structure called long short-term memory (LSTM). However, ASR performance with distant microphones, with low resources, in noisy and reverberant conditions, and on multi-talker speech is still far from satisfactory compared to humans. To address these issues, we develop new RNN structures inspired by two principles: (1) the structure follows the intuition of human speech recognition; (2) the structure is easy to optimize. The talk will go beyond basic RNNs and introduce prediction-adaptation-correction RNNs (PAC-RNNs) and highway LSTMs (HLSTMs). It studies both uni-directional and bi-directional RNNs, as well as discriminative training applied on top of the RNNs. For efficient training of such RNNs, the talk will describe two algorithms for learning their parameters in some detail: (1) Latency-Controlled bi-directional model training; and (2) Two-pass forward computation for sequence training. Finally, this talk will analyze the advantages and disadvantages of different variants and propose future directions.
TALK A data-centric approach to driving behavior research: How can signal processing methods contribute to the development of autonomous driving?
Date & Time: Tuesday, March 15, 2016; 12:00 PM - 12:45 PM
Speaker: Prof. Kazuya Takeda, Nagoya University
Research Area: Speech & Audio
Abstract
- Thanks to advanced "internet of things" (IoT) technologies, situation-specific human behavior has become an area of development for practical applications involving signal processing. One important area of development of such practical applications is driving behavior research. Since 1999, I have been collecting driving behavior data in a wide range of signal modalities, including speech/sound, video, physical/physiological sensors, CAN bus, LIDAR and GNSS. The objective of this data collection is to evaluate how well signal models can represent human behavior while driving. In this talk, I would like to summarize our 10 years of study of driving behavior signal processing, which has been based on these signal corpora. In particular, statistical signal models of interactions between traffic contexts and driving behavior, i.e., stochastic driver modeling, will be discussed, in the context of risky lane change detection. I greatly look forward to discussing the scalability of such corpus-based approaches, which could be applied to almost any traffic situation.
TALK Driver's mental workload estimation based on the reflex eye movement
Date & Time: Tuesday, March 15, 2016; 12:45 PM - 1:30 PM
Speaker: Prof. Hirofumi Aoki, Nagoya University
Research Area: Speech & Audio
Abstract
- Driving requires complex skills that involve the vehicle itself (e.g., speed control and instrument operation), other road users (e.g., other vehicles, pedestrians), the surrounding environment, and so on. During driving, visual cues are the main source of information supplied to the brain. In order to stabilize the visual information when you are moving, the eyes move in the opposite direction based on the input to the vestibular system. This involuntary eye movement is called the vestibulo-ocular reflex (VOR), and physiological models of it have been studied. Obinata et al. found that the VOR can be used to estimate mental workload. Since then, our research group has been developing methods to quantitatively estimate mental workload during driving by means of reflex eye movement. In this talk, I will explain the basic mechanism of the reflex eye movement and how to apply it to mental workload estimation. I will also introduce the latest work on combining the VOR and OKR (optokinetic reflex) models for naturalistic driving environments.
TALK Emotion Detection for Health Related Issues
Date & Time: Tuesday, February 16, 2016; 12:00 PM - 1:00 PM
Speaker: Dr. Najim Dehak, MIT
Research Area: Speech & Audio
Abstract
- Recently, there has been a great increase of interest in the field of emotion recognition based on different human modalities, such as speech and heart rate. Emotion recognition systems can be very useful in several areas, such as medicine and telecommunications. In the medical field, identifying emotions can be an important tool for detecting and monitoring patients with mental health disorders. In addition, the identification of the emotional state from voice provides opportunities for the development of automated dialogue systems capable of producing reports to the physician based on frequent phone communication between the system and the patients. In this talk, we will describe a health-related application of an emotion recognition system based on the human voice, used to detect and monitor people's emotional state.
TALK Skewness in the Passive Tracer Problem
Date & Time: Monday, November 23, 2015; 12:00 PM
Speaker: Manuchehr Aminian, University of North Carolina, Chapel Hill
Abstract
- The classic work by G.I. Taylor describes the enhanced longitudinal diffusivity of a passive tracer subjected to laminar pipe flow. Much work since then has gone into extending this result, particularly in calculating the evolution of the scalar variance. However, less work has been done to describe the evolution of asymmetry in the distribution. We present the results from a modeling effort to understand how the higher moments of the tracer distribution depend on geometry, based on explicit results in the circular pipe. We do this via analysis of "channel-limiting" geometries (rectangular ducts and elliptical pipes parameterized by their aspect ratio), using both new analytical tools and Monte Carlo simulation, which have revealed a wealth of nontrivial behavior of the distributions at short and intermediate time.
TALK The Wireless Control Network: A New Approach For Control Over Networks
Date & Time: Friday, October 18, 2013; 12:00 PM
Speaker: Dr. Shreyas Sundaram, University of Waterloo
Abstract
- This talk will describe a method to stabilize a plant with a network of resource-constrained wireless nodes. As opposed to traditional networked control schemes where the nodes simply route information to and from a dedicated controller, our approach treats the network itself as the controller. Specifically, we formulate a strategy where each node repeatedly updates its state to be a linear combination of the states of neighboring nodes. We show that this causes the entire network to behave as a linear dynamical system, with sparsity constraints imposed by the network topology. We provide a numerical design procedure to determine the appropriate linear combinations for each node so that the transmissions of the nodes closest to the actuators are stabilizing. We also make connections to decentralized control theory and the concept of fixed modes to provide topological conditions under which stabilization is possible. We show that this "Wireless Control Network" requires low computational and communication overhead, simplifies transmission scheduling, and enables compositional design. We also consider the issue of security in this control scheme. Using structured system theory, we show that a certain number of malicious or misbehaving nodes can be detected and identified provided that the connectivity of the network is sufficiently high.
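  As an illustration of the idea that the network itself acts as the controller, the following is a minimal sketch, not the speaker's design procedure. It assumes a hypothetical unstable scalar plant and a two-node line network (sensor to node 1 to node 2 to actuator), with hand-picked weights; all names and numbers are assumptions made for the example. Stacking the plant state and the node states gives one linear dynamical system whose zero pattern mirrors the topology.

```python
import numpy as np

# Hypothetical scalar plant x[k+1] = a*x[k] + b*u[k], y[k] = c*x[k]; unstable since |a| > 1.
a, b, c = 1.2, 1.0, 1.0

# Assumed line topology: sensor -> node 1 -> node 2 -> actuator.
# Zeros encode links that do not exist; the nonzero weights are hand-picked.
W = np.array([[0.0, 0.0],
              [1.0, 0.0]])   # node 2 copies node 1; node 1 hears no other node
h = np.array([1.0, 0.0])     # only node 1 hears the plant output
g = np.array([0.0, -0.3])    # the actuator only hears node 2


def closed_loop_matrix(a, b, c, W, h, g):
    """Stack plant state and node states: s[k+1] = M @ s[k], s = [x, z1, ..., zn]."""
    n = W.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 0] = a          # plant dynamics
    M[0, 1:] = b * g     # actuator input u = g @ z
    M[1:, 0] = h * c     # nodes that hear the sensor
    M[1:, 1:] = W        # node-to-node linear combinations
    return M


def spectral_radius(M):
    """Largest eigenvalue magnitude; below 1 means the linear iteration is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def simulate(M, x0, steps):
    """Run the closed loop from plant state x0 with all node states at zero."""
    s = np.zeros(M.shape[0])
    s[0] = x0
    trajectory = [s.copy()]
    for _ in range(steps):
        s = M @ s
        trajectory.append(s.copy())
    return np.array(trajectory)


M = closed_loop_matrix(a, b, c, W, h, g)
trajectory = simulate(M, x0=1.0, steps=200)
```

  With these particular weights the spectral radius of `M` is below one, so the plant state decays even though the plant alone is unstable. Choosing such weights systematically for a given topology is what the numerical design procedure in the talk addresses.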
TALK Efficiently sampling wave fields
Date & Time: Thursday, October 17, 2013; 12:00 PM
Speaker: Prof. Laurent Daudet, Paris Diderot University, France
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract
- In acoustics, one may wish to acquire a wavefield over a whole spatial domain, while we can only make point measurements (i.e., with microphones). Even with few sources, this remains a difficult problem because of reverberation, which can be hard to characterize. This can be seen as a sampling / interpolation problem, and it raises a number of interesting questions: how many sample points are needed, where to choose the sampling points, etc. In this presentation, we will review some case studies, in 2D (vibrating plates) and 3D (room acoustics), with numerical and experimental data, where we have developed sparse models, possibly with additional 'structures', based on a physical modeling of the acoustic field. These types of models are well suited to reconstruction techniques known as compressed sensing. These principles can also be used for sub-Nyquist optical imaging: we will show preliminary experimental results of a new compressive imager, remarkably simple in its principle, using a multiply scattering medium.
TALK Embedded Vision R&D at Texas Instruments
Date & Time: Friday, October 4, 2013; 12:00 PM
Speaker: Dr. Goksel Dedeoglu, Texas Instruments
Research Area: Computer Vision
Abstract
- There are growing needs to accelerate computer vision algorithms on embedded processors for wide-ranging equipment including mobile phones, network cameras, robots, and automotive safety systems. In our Vision R&D group, we conduct various projects to understand how the vision requirements can be best addressed on Digital Signal Processors (DSP), where the compute bottlenecks are, and how we should evolve our hardware & software architectures to meet our customers' future needs. Towards this end, we build prototypes wherein we design and optimize embedded software for real-world application performance and robustness. In this talk, I will provide examples of vision problems that we have recently tackled.
TALK Design of Active Inputs for Set-Based Fault Diagnosis
Date & Time: Friday, September 6, 2013; 12:00 PM
Speaker: Dr. Davide M. Raimondo, University of Pavia, Italy
MERL Host: Stefano Di Cairano
Abstract
- Although there are many fault diagnosis algorithms available, there has been very little work on the design or modification of control inputs with the aim of increasing the detectability and isolability of faults. The use of such inputs has clear potential for overcoming a central difficulty in fault detection, which is to distinguish the effects of faults from those of disturbances, process uncertainties, etc. Accordingly, the use of active inputs could be a transformative technology in industry, provided that such inputs can be computed reliably and efficiently.
  This presentation discusses new methods for computing active inputs that guarantee that the input-output data of a process will be sufficient to correctly identify a fault from a given library of possible faults. This problem is inherently nonconvex and has a combinatorial dependence on the number of faults considered. To address this, a new formulation is considered, along with related approximations, that is amenable to efficient solution using standard optimization packages (e.g. CPLEX). The theoretical contributions combine ideas from reachability analysis, set-based computations, and optimization theory to exploit detailed problem structure and thereby manage the problem complexity. Comparisons with an existing method show that the proposed formulation provides a dramatic reduction in the required computational effort.
TALK Decoupling Systems By Design
Date & Time: Friday, August 23, 2013; 12:00 PM
Speaker: Dr. Cornel Sultan, Virginia Tech
Abstract
- Coordinate coupling raises serious numerical, analysis, and control design problems that grow with the size of the system. On the other hand, decoupled dynamic equations facilitate all of the above processes since each equation can be treated independently. Unfortunately, due to the inherent heterogeneity typical of most practical, complex systems, these are not naturally decoupled so developing accurate enough decoupled approximations is of interest.
  
  In this talk the issue of building such accurate decoupled approximations is addressed by leveraging concepts from robust control theory. Specifically, system gains (e.g. energy gain, peak to peak gain) are used to characterize the approximation error. Then some system parameters are selected to minimize this approximation error. The advantage of using system gains is that the decoupling approximation is guaranteed to be accurate over an entire class of signals (e.g. finite energy/finite peak signals). These ideas are illustrated on linearized models of tensegrity structures which are designed to yield accurate decoupled models with respect to all signals of finite energy and finite peak. Further analysis corrects several misconceptions regarding decoupling, system properties, and control design.
TALK A Dirichlet Process Mixture Model for Clustering of Household Electricity Load Profiles
Date & Time: Tuesday, July 30, 2013; 12:00 PM
Speaker: Ramon Granell, Oxford University
MERL Host: Daniel N. Nikovski
Research Area: Data Analytics
Abstract
- We show that real electricity-use patterns can be distinguished using a Bayesian nonparametric model based on the Dirichlet Process Mixture Model. By modelling the load profiles as discrete counters we make use of the Dirichlet-Multinomial distribution. Clusters are computed with the Chinese Restaurant Process method, and posterior probability distributions are estimated with a Gibbs sampling algorithm.
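  To make the Chinese Restaurant Process concrete, here is a minimal sketch that only draws cluster assignments from the CRP prior. It is not the speaker's model: the Dirichlet-Multinomial likelihood and the Gibbs sampler are left out, and the function name, the concentration parameter `alpha`, and the seed are assumptions for illustration. Each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Draw cluster labels for n_customers from a CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    assignments = []   # cluster label of each customer, in arrival order
    table_sizes = []   # number of customers currently in each cluster
    for _ in range(n_customers):
        # Existing clusters are weighted by size; the last slot is a new cluster.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new cluster
        else:
            table_sizes[table] += 1    # join an existing cluster
        assignments.append(table)
    return assignments, table_sizes


labels, sizes = crp_assignments(n_customers=50, alpha=1.0, seed=0)
```

  The number of clusters is not fixed in advance; it grows with the data, and larger values of `alpha` favour more clusters.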
TALK Topics in Intelligent Building Systems Control
Date & Time: Tuesday, July 23, 2013; 12:00 PM
Speaker: Dr. Sandipan Mishra, Rensselaer Polytechnic Institute
MERL Host: Stefano Di Cairano
Abstract
- This talk will present the breadth of research activities in the Intelligent Systems, Automation & Control Laboratory at Rensselaer Polytechnic Institute, ranging from building systems control to additive manufacturing and adaptive optics. In particular, we will focus on the modeling and control design paradigms for intelligent building systems and smart LED lighting systems. Since building systems have substantial variability of occupancy, usage, ambient environment, and physical properties over time, strategies for "model-free" control algorithms for building temperature control will be illustrated. The seminar will also discuss the state-of-the-art in feedback control of lighting systems and demonstrate the efficacy of distributed control and consensus type algorithms for these large-scale lighting systems. Finally, some interesting examples of bio-inspired estimation from blurry images for adaptive optics will be presented.
TALK Challenges in Model-Based System Engineering: Past, Present and Future
Date & Time: Tuesday, July 16, 2013; 12:00 PM
Speaker: Dr. Michael Tiller, Xogeny
Abstract
- Model-based System Engineering has been recognized, for some time, as a way for companies to improve their product development processes. However, change takes time in engineering and we still have only scratched the surface of what is possible. New ideas and technologies are constantly emerging that can improve a model-based approach. In this talk, I will discuss some of my past experiences with model-based system engineering in the automotive industry. I'll also discuss the shifts I see from numerical approaches to more symbolic approaches and how this manifests itself in a shift from imperative representations of engineering models to more declarative ones. I'll cover some of the interesting challenges I've seen trying to model automotive systems and how I think those challenges can be overcome moving forward. Finally, I'll talk about some of the exciting possibilities I see on the horizon for modeling.
TALK On distributed conflict resolution at road intersections
Date & Time: Wednesday, June 26, 2013; 12:00 PM
Speaker: Gabriel Rodrigues de Campos, Chalmers University
Abstract
- In this talk, we consider a scenario where several vehicles have to coordinate among themselves in order to cross a traffic intersection. The control problem thus relies on the optimization of a global cost function while guaranteeing collision avoidance and the satisfaction of local constraints. We propose a decentralized solution, where vehicles sequentially solve local optimization problems that allow them to cross the intersection safely. This approach pays special attention to how to quantify the degrees of freedom available to each vehicle to avoid a potential collision, and leads to a formalism in which collision avoidance is enforced through local state constraints at given time instants. Finally, simulation results on the efficiency, performance and optimality of the proposed approach are presented.
TALK Holistic Models for Visual Perception in Autonomous Systems
Date & Time: Thursday, May 23, 2013; 12:00 PM
Speaker: Prof. Raquel Urtasun, TTI-Chicago
Research Area: Computer Vision
Abstract
- The development of autonomous systems that can effectively assist people with everyday tasks is one of the grand challenges in modern computer science. Notable examples are personal robotics for the elderly and people with disabilities, as well as autonomous driving systems which can help decrease fatalities caused by traffic accidents. To achieve full autonomy, multiple perception tasks must be solved: Autonomous systems should sense the environment, recognize the 3D world and interact with it. While most approaches have tackled individual perceptual components in isolation, I believe that the next generation of perceptual systems should reason jointly about multiple tasks.
  
  In this talk I'll argue that there are four key aspects towards developing such holistic models: (i) learning, (ii) inference, (iii) representation, and (iv) data. I'll describe efficient Markov random field learning and inference algorithms that exploit both the structure of the problem and parallel computation to achieve computational and memory efficiency. I'll demonstrate the effectiveness of our models on a wide variety of examples, and show representations and inference strategies that allow us to achieve state-of-the-art performance and result in several orders of magnitude speed-ups in a variety of challenging tasks, including 3D reconstruction, 3D layout parsing, object detection, semantic segmentation and free text exploitation for holistic visual recognition.
TALK Application of Multi-scale Modeling and Approximation to Design Optimization of Heat Exchangers
Date & Time: Wednesday, May 8, 2013; 12:00 PM
Speaker: Vikrant Aute, University of Maryland
MERL Host: Christopher R. Laughman
Research Area: Data Analytics
Abstract
- Heat exchangers are a key component in any air-conditioning, heat pumping and refrigeration system. These heat exchangers (aka evaporators, condensers, indoor units, outdoor units) not only contribute significantly to the total cost of the system but also contain the most refrigerant charge. There is a continued interest in improving the designs of heat exchangers and making them more compact while reducing the cost. Compact heat exchangers help improve system performance, reduce power consumption and lower the first costs. Due to the lower internal volume, they hold lower refrigerant charge which in turn results in lower environmental impact.
  
  In the simulation based design and optimization of compact heat exchangers, there are two main challenges. The first challenge arises from the use of computationally expensive analysis tools such as Computational Fluid Dynamics (CFD). The second challenge is the effect of scales. The use of CFD tools can make the optimization infeasible due to computing and engineering resource limitations. Furthermore, during CFD analysis, certain simplifications are made to the computational domain such as simulating a small periodic segment of a given heat transfer surface. In this talk, three technologies are introduced that assist in addressing these issues. These technologies are (1) Approximation Assisted Optimization, (2) Parallel Parameterized CFD, and (3) Multi-scale modeling of heat exchangers. These technologies together help reduce the computational effort by more than 90% and engineering time by more than 50%. Two real world applications focusing on air-to-refrigerant and liquid-to-refrigerant heat exchangers will be discussed, that demonstrate the application of these technologies.
TALK Practical kernel methods for automatic speech recognition
Date & Time: Tuesday, May 7, 2013; 2:30 PM
Speaker: Dr. Yotaro Kubo, NTT Communication Science Laboratories, Kyoto, Japan
Research Area: Speech & Audio
Abstract
- Kernel methods are important for achieving both convexity in estimation and the ability to represent nonlinear classification. However, kernel methods have not been widely used in the field of automatic speech recognition. In this presentation, I will introduce several attempts to practically incorporate kernel methods into acoustic models for automatic speech recognition. The presentation will consist of two parts. The first part will describe maximum entropy discrimination and its application to kernel machine training. The second part will describe dimensionality reduction of kernel-based features.
TALK Visual Signal Analysis and Compression: Focus on Texture Similarity
Date & Time: Friday, May 3, 2013; 12:00 PM
Speaker: Prof. Thrasyvoulos N. Pappas, Northwestern University
MERL Host: Anthony Vetro
Abstract
- Texture is an important visual attribute both for human perception and image analysis systems. We present new structural texture similarity metrics and applications that critically depend on such metrics, with emphasis on image compression and content-based retrieval. The new metrics account for human visual perception and the stochastic nature of textures. They rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are similar or essentially identical.
  
  We also present new testing procedures for objective texture similarity metrics. We identify three operating domains for evaluating the performance of such similarity metrics: the top of the similarity scale, where a monotonic relationship between metric values and subjective scores is desired; the ability to distinguish between perceptually similar and dissimilar textures; and the ability to retrieve "identical" textures. Each domain has different performance goals and requires different testing procedures. Experimental results demonstrate both the performance of the proposed metrics and the effectiveness of the proposed subjective testing procedures.
TALK Anomaly Detection in Very Large Graphs: Modeling and Computational Considerations
Date & Time: Thursday, May 2, 2013; 12:00 PM
Speaker: Ben Miller, MIT
Abstract
- Graph theory provides an intuitive mathematical foundation for dealing with relational data, but there are numerous computational challenges in the detection of interesting behavior within small subsets of vertices, especially as the graphs grow larger and the behavior becomes more subtle. This presentation discusses computational considerations of a residuals-based subgraph detection framework, including the implications on inference with recent statistical models. We also present scaling properties, demonstrating analysis of a billion-vertex graph using commodity hardware.
TALK Managing Global Innovation
Date & Time: Tuesday, April 23, 2013; 12:00 PM
Speaker: Prof. Joe Santos, MIT Sloan
Abstract
- A "local innovation" and a "global innovation" should not be distinct because of their use or market (which could be universal or worldwide in both cases) but rather because of where they came to be: a "global innovation" is an innovation from the World; a "local innovation" is an innovation from one place. Most innovations around us, be it product innovations, technology or process innovations, and business model or strategy innovations, are "local". I will argue that as the World becomes more global, the likelihood and value of "local innovations" will diminish and that "global innovations" are fast becoming more relevant in shaping company performance. But "global innovations", unlike "local innovations", do not just occur through some mix of creativity, serendipity and entrepreneurship. The process of "global innovation" must be managed -- and this applies particularly to breakthrough innovations. My presentation demonstrates such propositions and covers the critical challenges faced by those who manage global innovation. I will also present some solutions from our research on this matter over the last fifteen years or so.
TALK Signal Processing on Graphs: Theory and Applications
Date & Time: Thursday, March 21, 2013; 12:00 PM
Speaker: Prof. Antonio Ortega, University of Southern California
MERL Host: Anthony Vetro
Abstract
- Graphs have long been used in a wide variety of problems, such as analysis of social networks, machine learning, network protocol optimization, decoding of LDPCs or image processing. Techniques based on spectral graph theory provide a "frequency" interpretation of graph data and have proven to be quite popular in multiple applications.
  
  In the last few years, a growing amount of work has started extending and complementing spectral graph techniques, leading to the emergence of "Graph Signal Processing" as a broad research field. A common characteristic of this recent work is that it considers the data attached to the vertices as a "graph-signal" and seeks to create new techniques (filtering, sampling, interpolation), similar to those commonly used in conventional signal processing (for audio, images or video), so that they can be applied to these graph signals.
  
  In this talk, we first introduce some of the basic tools needed in developing new graph signal processing operations. We then introduce our design of wavelet filterbanks of graphs, which for the first time provides multi-resolution, critically-sampled, frequency- and graph-localized transforms for graph signals. We conclude by providing several examples of how these new transforms and tools can be applied to existing problems. Time permitting, we will discuss applications to image processing, depth video compression, recommendation system design and network optimization.
TALK Communication/computation tradeoffs and other practical considerations in distributed convex optimization
Date & Time: Thursday, March 21, 2013; 12:00 PM
Speaker: Konstantinos Tsianos, McGill, Montreal, Canada
MERL Host: Petros T. Boufounos
Abstract
- Distributed algorithms become necessary to employ the computational resources needed for solving the large scale optimization problems that arise in areas such as machine learning, computational biology and others. We study a very general distributed setting where the data is distributed over many machines that can communicate with one another over a network that does not have any specialized communication infrastructure. In this setting the role of the network becomes critical in the performance of a distributed algorithm. From a more theoretical standpoint we discuss two questions: 1) How many nodes should we use for a given problem before communication becomes a bottleneck? and 2) How often should the nodes communicate with one another for the communication cost to be worth the transmission? In addition, we discuss some more practical issues that one needs to consider in implementing algorithms that are asynchronous and robust to communication delays.
TALK Label Propagation over Graphs
Date & Time: Friday, March 8, 2013; 12:00 PM
Speaker: Prof. Hiroshi Mamitsuka, Kyoto University
Abstract
- Semi-structured data, particularly graphs, are now abundant in molecular biology. Typical examples are protein-protein interactions, gene regulatory networks, metabolic pathways, etc. To understand cellular mechanisms from this type of data, I've been working on semi-structured data, covering a wide variety of general topics in machine learning or data mining, such as link prediction, graph clustering, frequent subgraph mining, and label propagation over graphs. In this talk I will focus on label propagation, in which nodes are partially labeled and the objective is to predict unknown labels using labels and links. I'll present two approaches under two different inputs in sequence: 1) a single graph and 2) multiple graphs sharing a common node set.
  
  1) Existing methods extract features, considering either graph smoothness or discrimination. The proposed method extracts features, considering both aspects, as spectral transforms. The obtained features or eigenvectors can be used to generate kernels, leading to multiple kernel learning to solve the label propagation problem efficiently.
  
  2) Existing methods estimate weights over given graphs, like selecting the most reliable graph. This framework is however unable to consider densely connected subgraphs, which we call locally informative graphs (LIGs). The proposed method first runs spectral graph partitioning over each graph to capture LIGs in eigenvectors and then an existing method of label propagation for multiple graphs is run over the entire eigenvectors.
  
  I will show empirical advantages of the two proposed methods by using both synthetic and real biological networks.
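  
  For readers new to the label propagation task itself, the following is a minimal sketch of a standard iterative baseline on a single graph. It is not either of the proposed spectral methods. It assumes every node has at least one edge, marks unlabeled nodes with `-1`, and uses a hypothetical six-node path graph as toy data. Labeled nodes are clamped to their known class, and every other node repeatedly takes the average class distribution of its neighbours.

```python
import numpy as np


def propagate_labels(adjacency, labels, n_classes, n_iters=1000):
    """Return per-node class scores; labels uses -1 for unlabeled nodes."""
    A = np.asarray(adjacency, dtype=float)
    P = A / A.sum(axis=1, keepdims=True)       # row-normalised transition matrix
    labels = np.asarray(labels)
    labeled = labels >= 0
    clamp = np.eye(n_classes)[labels[labeled]]  # one-hot rows for known labels
    F = np.full((len(labels), n_classes), 1.0 / n_classes)
    F[labeled] = clamp
    for _ in range(n_iters):
        F = P @ F            # each node averages its neighbours' scores
        F[labeled] = clamp   # known labels stay fixed
    return F


# Toy data (an assumption for illustration): a path graph 0-1-2-3-4-5
# with node 0 known to be class 0 and node 5 known to be class 1.
n_nodes = 6
path = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    path[i, i + 1] = path[i + 1, i] = 1.0

scores = propagate_labels(path, [0, -1, -1, -1, -1, 1], n_classes=2)
predicted = scores.argmax(axis=1)
```

  On this path graph the class-1 score rises linearly from node 0 to node 5, so the three nodes nearest each labeled end inherit that end's label.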
TALK Probabilistic Latent Tensor Factorisation
Date & Time: Tuesday, February 26, 2013; 12:00 PM
Speaker: Prof. Taylan Cemgil, Bogazici University, Istanbul, Turkey
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract
- Algorithms for decompositions of matrices are of central importance in machine learning, signal processing and information retrieval, with SVD and NMF (Nonnegative Matrix Factorisation) being the most widely used examples. Probabilistic interpretations of matrix factorisation models are also well known and are useful in many applications (Salakhutdinov and Mnih 2008; Cemgil 2009; Fevotte et al. 2009). In recent years, decompositions of multiway arrays, known as tensor factorisations, have gained significant popularity for the analysis of large data sets with more than two entities (Kolda and Bader, 2009; Cichocki et al. 2008). We will discuss a subset of these models from a statistical modelling perspective, building upon probabilistic Bayesian generative models and generalised linear models (McCullagh and Nelder). In both views, the factorisation is implicit in a well-defined hierarchical statistical model and factorisations can be computed via maximum likelihood.
  
  We express a tensor factorisation model using a factor graph and the factor tensors are optimised iteratively. In each iteration, the update equation can be implemented by a message passing algorithm, reminiscent of variable elimination in a discrete graphical model. This setting provides a structured and efficient approach that enables very easy development of application specific custom models, as well as algorithms for the so-called coupled (collective) factorisations where an arbitrary set of tensors is factorised simultaneously with shared factors. Extensions to full Bayesian inference for model selection, via variational approximations or MCMC, are also feasible. Well known models of multiway analysis such as Nonnegative Matrix Factorisation (NMF), Parafac, Tucker, and audio processing (Convolutive NMF, NMF2D, SF-SSNTF) appear as special cases and new extensions can easily be developed. We will illustrate the approach with applications in link prediction and audio and music processing.
TALK Bayesian Group Sparse Learning
Date & Time: Monday, January 28, 2013; 11:00 AM
Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
Research Area: Speech & Audio
Abstract
- Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning method and its application to speech recognition and signal separation. First, I present the group sparse hidden Markov models (GS-HMMs), where a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The features across states and within states are represented accordingly. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution. The robustness of speech recognition is illustrated. In addition, the LSM distribution is incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF). This approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal. A Monte Carlo procedure is presented to infer two groups of parameters. Future work on Bayesian learning will be discussed.