News & Events


  •  AWARD   CHiME 2012 Speech Separation and Recognition Challenge Best Performance
    Date: June 1, 2013
    Awarded to: Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux and John R. Hershey
    Awarded for: "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark"
    Awarded by: International Workshop on Machine Listening in Multisource Environments (CHiME)
    MERL Contact: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
    Brief
    • The results of the 2nd 'CHiME' Speech Separation and Recognition Challenge are out! The team formed by MELCO researcher Yuuki Tachioka and MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John Hershey obtained the best results in the continuous speech recognition task (Track 2). This very challenging task consisted of recognizing speech corrupted by highly non-stationary noises recorded in a real living room. Our proposal, which also included a simple yet extremely efficient denoising front-end, focused on investigating and developing state-of-the-art automatic speech recognition back-end techniques: feature transformation methods, as well as discriminative training methods for acoustic and language modeling. Our system significantly outperformed those of the other participants. Our code has since been released as an improved baseline for the community to use.
  •  NEWS   International Workshop on Machine Listening in Multisource Environments (CHiME) 2013: publication by Jonathan Le Roux, John R. Hershey, Shinji Watanabe and others
    Date: June 1, 2013
    Where: International Workshop on Machine Listening in Multisource Environments (CHiME)
    MERL Contact: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
    Brief
    • The paper "Discriminative Methods for Noise Robust Speech Recognition: A CHiME Challenge Benchmark" by Tachioka, Y., Watanabe, S., Le Roux, J. and Hershey, J.R. was presented at the International Workshop on Machine Listening in Multisource Environments (CHiME).
  •  NEWS   MERL obtains best results in the 2nd CHiME Speech Separation and Recognition Challenge
    Date: June 1, 2013
    MERL Contact: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
    Brief
    • The results of the 2nd CHiME Speech Separation and Recognition Challenge are out! The team formed by MELCO researcher Yuuki Tachioka and MERL Speech & Audio team researchers Shinji Watanabe, Jonathan Le Roux and John Hershey obtained the best results in the continuous speech recognition task (Track 2). This very challenging task consisted of recognizing speech corrupted by highly non-stationary noises recorded in a real living room. Our proposal, which also included a simple yet extremely efficient denoising front-end, focused on investigating and developing state-of-the-art automatic speech recognition back-end techniques: feature transformation methods, as well as discriminative training methods for acoustic and language modeling. Our system significantly outperformed those of the other participants. Our code has since been released as an improved baseline for the community to use.
  •  EVENT   ICASSP 2013 - Student Career Luncheon
    Date & Time: Thursday, May 30, 2013; 12:30 PM - 2:30 PM
    MERL Contacts: Anthony Vetro; Petros Boufounos; Jonathan Le Roux
    Location: Vancouver, Canada
    Research Areas: Multimedia, Speech & Audio
    Brief
    • MERL is a sponsor for the first ICASSP Student Career Luncheon that will take place at ICASSP 2013. MERL members will take part in the event to introduce MERL and talk with students interested in positions or internships.
  •  NEWS   ICASSP 2013: 9 publications by Jonathan Le Roux, Dehong Liu, Robert A. Cohen, Dong Tian, Shantanu D. Rane, Jianlin Guo, John R. Hershey, Shinji Watanabe, Petros T. Boufounos, Zafer Sahinoglu and Anthony Vetro
    Date: May 26, 2013
    Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
    MERL Contacts: Dehong Liu; Jianlin Guo; Anthony Vetro; Dong Tian; Petros Boufounos; Jonathan Le Roux
    Research Areas: Multimedia, Electronics & Communications
    Brief
    • The papers "Stereo-based Feature Enhancement Using Dictionary Learning" by Watanabe, S. and Hershey, J.R., "Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech" by Tachioka, Y., Watanabe, S. and Hershey, J.R., "Non-negative Dynamical System with Application to Speech and Audio" by Fevotte, C., Le Roux, J. and Hershey, J.R., "Source Localization in Reverberant Environments using Sparse Optimization" by Le Roux, J., Boufounos, P.T., Kang, K. and Hershey, J.R., "A Keypoint Descriptor for Alignment-Free Fingerprint Matching" by Garg, R. and Rane, S., "Transient Disturbance Detection for Power Systems with a General Likelihood Ratio Test" by Song, JX., Sahinoglu, Z. and Guo, J., "Disparity Estimation of Misaligned Images in a Scanline Optimization Framework" by Rzeszutek, R., Tian, D. and Vetro, A., "Screen Content Coding for HEVC Using Edge Modes" by Hu, S., Cohen, R.A., Vetro, A. and Kuo, C.C.J. and "Random Steerable Arrays for Synthetic Aperture Imaging" by Liu, D. and Boufounos, P.T. were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
  •  EVENT   ISCAS 2013 - IEEE International Symposium on Circuits & Systems
    Date: Sunday, May 19, 2013 - Thursday, May 23, 2013
    MERL Contact: Anthony Vetro
    Location: Beijing, China
    Research Area: Multimedia
    Brief
    • Anthony Vetro is the Demo Co-chair of ISCAS 2013, the IEEE International Symposium on Circuits & Systems, to be held in Beijing, China, in May 2013.
  •  NEWS   ISCAS 2013: publication by Anthony Vetro, Dong Tian and others
    Date: May 19, 2013
    Where: IEEE International Symposium on Circuits and Systems (ISCAS)
    MERL Contacts: Dong Tian; Anthony Vetro
    Research Areas: Multimedia, Digital Video
    Brief
    • The paper "View Synthesis Prediction Using Skip and Merge Candidates for HEVC-based 3D Video Coding" by Zou, F., Tian, D. and Vetro, A. was presented at the IEEE International Symposium on Circuits and Systems (ISCAS).
  •  NEWS   Conference on Telecommunications (Conftele) 2013: publication by Anthony Vetro, Dong Tian and others
    Date: May 8, 2013
    Where: Conference on Telecommunications (Conftele)
    MERL Contacts: Dong Tian; Anthony Vetro
    Research Areas: Multimedia, Digital Video
    Brief
    • The paper "Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding" by Graziosi, D.B., Rodrigues, N.M.M., de Faria, S.M.M., Tian, D. and Vetro, A. was presented at the Conference on Telecommunications (Conftele).
  •  TALK   Practical kernel methods for automatic speech recognition
    Date & Time: Tuesday, May 7, 2013; 2:30 PM
    Speaker: Dr. Yotaro Kubo, NTT Communication Science Laboratories, Kyoto, Japan
    Research Areas: Multimedia, Speech & Audio
    Brief
    • Kernel methods are important for achieving both convexity in estimation and the ability to represent nonlinear classification. However, kernel methods have not conventionally been widely used in the field of automatic speech recognition. In this presentation, I will introduce several attempts to practically incorporate kernel methods into acoustic models for automatic speech recognition. The presentation will consist of two parts. The first part will describe maximum entropy discrimination and its application to kernel machine training. The second part will describe dimensionality reduction of kernel-based features.
  •  TALK   Visual Signal Analysis and Compression: Focus on Texture Similarity
    Date & Time: Friday, May 3, 2013; 12:00 PM
    Speaker: Prof. Thrasyvoulos N. Pappas, Northwestern University
    MERL Host: Anthony Vetro
    Research Area: Multimedia
    Brief
    • Texture is an important visual attribute both for human perception and image analysis systems. We present new structural texture similarity metrics and applications that critically depend on such metrics, with emphasis on image compression and content-based retrieval. The new metrics account for human visual perception and the stochastic nature of textures. They rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are similar or essentially identical.

      We also present new testing procedures for objective texture similarity metrics. We identify three operating domains for evaluating the performance of such similarity metrics: the top of the similarity scale, where a monotonic relationship between metric values and subjective scores is desired; the ability to distinguish between perceptually similar and dissimilar textures; and the ability to retrieve "identical" textures. Each domain has different performance goals and requires different testing procedures. Experimental results demonstrate both the performance of the proposed metrics and the effectiveness of the proposed subjective testing procedures.
  •  NEWS   ICLR 2013: publication by Jonathan Le Roux and others
    Date: May 2, 2013
    Where: International Conference on Learning Representations (ICLR)
    MERL Contact: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
    Brief
    • The paper "Block Coordinate Descent for Sparse NMF" by Potluru, V.K., Plis, S.M., Le Roux, J., Pearlmutter, B.A., Calhoun, V.D. and Hayes, T.P. was presented at the International Conference on Learning Representations (ICLR).
  •  NEWS   Emerging Technologies for 3D Video: Creation, Coding, Transmission and Rendering: publication by Anthony Vetro and others
    Date: May 1, 2013
    Where: Emerging Technologies for 3D Video: Creation, Coding, Transmission and Rendering
    MERL Contact: Anthony Vetro
    Research Areas: Multimedia, Digital Video
    Brief
    • The article "Depth Based 3D Video Formats and Coding Technology" by Vetro, A. and Muller, K. was published in the book Emerging Technologies for 3D Video: Creation, Coding, Transmission and Rendering.
  •  TALK   Signal Processing on Graphs: Theory and Applications
    Date & Time: Thursday, March 21, 2013; 12:00 PM
    Speaker: Prof. Antonio Ortega, University of Southern California
    MERL Host: Anthony Vetro
    Research Area: Multimedia
    Brief
    • Graphs have long been used in a wide variety of problems, such as analysis of social networks, machine learning, network protocol optimization, decoding of LDPCs or image processing. Techniques based on spectral graph theory provide a "frequency" interpretation of graph data and have proven to be quite popular in multiple applications.

      In the last few years, a growing amount of work has started extending and complementing spectral graph techniques, leading to the emergence of "Graph Signal Processing" as a broad research field. A common characteristic of this recent work is that it considers the data attached to the vertices as a "graph-signal" and seeks to create new techniques (filtering, sampling, interpolation), similar to those commonly used in conventional signal processing (for audio, images or video), so that they can be applied to these graph signals.

      In this talk, we first introduce some of the basic tools needed in developing new graph signal processing operations. We then introduce our design of wavelet filterbanks on graphs, which for the first time provides multi-resolution, critically sampled, frequency- and graph-localized transforms for graph signals. We conclude by providing several examples of how these new transforms and tools can be applied to existing problems. Time permitting, we will discuss applications to image processing, depth video compression, recommendation system design and network optimization.
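
      As a small illustration of the "frequency" interpretation mentioned in this abstract, the sketch below evaluates the graph Laplacian quadratic form, a standard smoothness measure from spectral graph theory: it is small for signals that vary slowly across edges and large for signals that oscillate between neighbours. This is not the speaker's wavelet filterbank design. The 4-vertex path graph, the two signals and the function name laplacian_quadratic_form are assumptions made for the example.

```python
def laplacian_quadratic_form(edges, signal):
    """Return x^T L x, computed as the sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(w * (signal[i] - signal[j]) ** 2 for i, j, w in edges)


# Hypothetical path graph 0 - 1 - 2 - 3 with unit edge weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]

smooth = [1.0, 1.0, 1.0, 1.0]        # constant signal: the lowest graph "frequency"
oscillating = [1.0, -1.0, 1.0, -1.0]  # sign flips on every edge: high graph "frequency"

low = laplacian_quadratic_form(edges, smooth)
high = laplacian_quadratic_form(edges, oscillating)
print(low, high)
```

      The constant signal gives zero because no edge sees a difference, while the alternating signal accumulates a squared difference of 4 on each of the three edges.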
  •  TALK   Communication/computation tradeoffs and other practical considerations in distributed convex optimization
    Date & Time: Thursday, March 21, 2013; 12:00 PM
    Speaker: Konstantinos Tsianos, McGill, Montreal, Canada
    MERL Host: Petros Boufounos
    Research Area: Multimedia
    Brief
    • Distributed algorithms become necessary to employ the computational resources needed for solving the large-scale optimization problems that arise in areas such as machine learning, computational biology and others. We study a very general distributed setting where the data is distributed over many machines that can communicate with one another over a network that does not have any specialized communication infrastructure. In this setting the role of the network becomes critical in the performance of a distributed algorithm. From a more theoretical standpoint we discuss two questions: 1) How many nodes should we use for a given problem before communication becomes a bottleneck? and 2) How often should the nodes communicate with one another for the communication cost to be worth the transmission? In addition, we discuss some more practical issues that one needs to consider in implementing algorithms that are asynchronous and robust to communication delays.
  •  NEWS   DCC 2013: publication by Petros T. Boufounos and Shantanu D. Rane
    Date: March 20, 2013
    Where: Data Compression Conference (DCC)
    MERL Contact: Petros Boufounos
    Research Areas: Multimedia, Computational Sensing
    Brief
    • The paper "Efficient Coding of Signal Distances Using Universal Quantized Embeddings" by Boufounos, P.T. and Rane, S. was presented at the Data Compression Conference (DCC).
  •  NEWS   IEEE Signal Processing Letters: publication by Jonathan Le Roux and others
    Date: March 1, 2013
    Where: IEEE Signal Processing Letters
    MERL Contact: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
    Brief
    • The article "Consistent Wiener Filtering for Audio Source Separation" by Le Roux, J. and Vincent, E. was published in IEEE Signal Processing Letters.
  •  NEWS   Journal of Machine Learning Research (JMLR): publication by Petros T. Boufounos and others
    Date: March 1, 2013
    Where: Journal of Machine Learning Research (JMLR)
    MERL Contact: Petros Boufounos
    Research Areas: Multimedia, Computational Sensing
    Brief
    • The article "Greedy Sparsity-Constrained Optimization" by Bahmani, S., Raj, B. and Boufounos, P. was published in the Journal of Machine Learning Research (JMLR).
  •  TALK   Probabilistic Latent Tensor Factorisation
    Date & Time: Tuesday, February 26, 2013; 12:00 PM
    Speaker: Prof. Taylan Cemgil, Bogazici University, Istanbul, Turkey
    MERL Host: Jonathan Le Roux
    Research Areas: Multimedia, Speech & Audio
    Brief
    • Algorithms for decompositions of matrices are of central importance in machine learning, signal processing and information retrieval, with SVD and NMF (Nonnegative Matrix Factorisation) being the most widely used examples. Probabilistic interpretations of matrix factorisation models are also well known and are useful in many applications (Salakhutdinov and Mnih 2008; Cemgil 2009; Fevotte et al. 2009). In recent years, decompositions of multiway arrays, known as tensor factorisations, have gained significant popularity for the analysis of large data sets with more than two entities (Kolda and Bader, 2009; Cichocki et al. 2008). We will discuss a subset of these models from a statistical modelling perspective, building upon probabilistic Bayesian generative models and generalised linear models (McCullagh and Nelder). In both views, the factorisation is implicit in a well-defined hierarchical statistical model and factorisations can be computed via maximum likelihood.

      We express a tensor factorisation model using a factor graph and the factor tensors are optimised iteratively. In each iteration, the update equation can be implemented by a message passing algorithm, reminiscent of variable elimination in a discrete graphical model. This setting provides a structured and efficient approach that enables very easy development of application-specific custom models, as well as algorithms for the so-called coupled (collective) factorisations where an arbitrary set of tensors are factorised simultaneously with shared factors. Extensions to full Bayesian inference for model selection, via variational approximations or MCMC, are also feasible. Well-known models of multiway analysis such as Nonnegative Matrix Factorisation (NMF), Parafac, Tucker, and audio processing (Convolutive NMF, NMF2D, SF-SSNTF) appear as special cases and new extensions can easily be developed. We will illustrate the approach with applications in link prediction and audio and music processing.
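
      As a rough illustration of the simplest special case named in this abstract, the sketch below runs NMF with the classic multiplicative updates for the squared Euclidean cost (Lee and Seung), written in plain Python. It is not the speaker's probabilistic latent tensor factorisation framework. The matrix v, the helper names, the random initialisation and the iteration count are all assumptions chosen for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise a nonnegative matrix v as w h with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation (an arbitrary choice).
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [squared_error(v, w, h)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), elementwise; eps avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # w <- w * (v h^T) / (w h h^T), elementwise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(squared_error(v, w, h))
    return w, h, errors


# Hypothetical 3x3 nonnegative matrix that has an exact rank-2 factorisation.
v = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 3.0]]
w, h, errors = nmf(v, rank=2)
print(errors[0], errors[-1])
```

      Because each update only multiplies by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection step.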
  •  NEWS   IEEE Signal Processing Magazine: 2 publications by Petros T. Boufounos and Shantanu D. Rane
    Date: February 13, 2013
    Where: IEEE Signal Processing Magazine
    MERL Contact: Petros Boufounos
    Research Area: Multimedia
    Brief
    • The articles "Privacy-Preserving Nearest Neighbor Methods: Comparing Signals without Revealing Them" by Rane, S. and Boufounos, P.T. and "Privacy-preserving Speech Processing: Cryptographic and String-Matching Frameworks Show Promise" by Pathak, M.A., Raj, B., Rane, S. and Smaragdis, P. were published in IEEE Signal Processing Magazine.
  •  TALK   Bayesian Group Sparse Learning
    Date & Time: Monday, January 28, 2013; 11:00 AM
    Speaker: Prof. Jen-Tzung Chien, National Chiao Tung University, Taiwan
    Research Areas: Multimedia, Speech & Audio
    Brief
    • Bayesian learning provides attractive tools to model, analyze, search, recognize and understand real-world data. In this talk, I will introduce a new Bayesian group sparse learning approach and its application to speech recognition and signal separation. First, I present the group sparse hidden Markov models (GS-HMMs), where a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The features across states and within states are represented accordingly. The sparse prior is imposed by introducing the Laplacian scale mixture (LSM) distribution. The robustness of speech recognition is illustrated. The LSM distribution is also incorporated into Bayesian group sparse learning based on nonnegative matrix factorization (NMF). This approach is developed to estimate the reconstructed rhythmic and harmonic music signals from a single-channel source signal. A Monte Carlo procedure is presented to infer the two groups of parameters. Future work on Bayesian learning will be discussed.