News & Events

  •  EVENT   MERL 3rd Annual Open House
    Date & Time: Thursday, November 29, 2018; 4-6pm
    MERL Contacts: Marissa Deegan; Elizabeth Phillips; Jeroen van Baar; Anthony Vetro
    Location: 201 Broadway, 8th floor, Cambridge, MA
    Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
    Brief
    • Snacks, demos, science: On Thursday 11/29, Mitsubishi Electric Research Labs (MERL) will host an open house for graduate+ students interested in internships, post-docs, and research scientist positions. The event will be held from 4-6pm and will feature demos & short presentations in our main areas of research including artificial intelligence, robotics, computer vision, speech processing, optimization, machine learning, data analytics, signal processing, communications, sensing, control and dynamical systems, as well as multi-physical modeling and electronic devices. MERL is a high-impact, publication-oriented research lab with very extensive internship and university collaboration programs. Most internships lead to publication; many of our interns and staff have gone on to notable careers at MERL and in academia. Come mix with our researchers, see our state-of-the-art technologies, and learn about our research opportunities. Dress code: casual, with resumes.

      Pre-registration for the event is strongly encouraged:
      merlopenhouse.eventbrite.com

      Current internship and employment openings:
      www.merl.com/internship/openings
      www.merl.com/employment/employment

      Information about working at MERL:
      www.merl.com/employment
  •  EVENT   SANE 2018 - Speech and Audio in the Northeast
    Date: Thursday, October 18, 2018
    MERL Contacts: Takaaki Hori; Jonathan Le Roux
    Location: Google, Cambridge, MA
    Research Area: Speech & Audio
    Brief
    • SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.

      It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.

      SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
  •  TALK   Theory and Applications of Sparse Model-Based Recurrent Neural Networks
    Date & Time: Tuesday, March 6, 2018; 12:00 PM
    Speaker: Scott Wisdom, Affectiva
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.

      In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
  •  TALK   Advances in Accelerated Computing
    Date & Time: Friday, February 2, 2018; 12:00
    Speaker: Dr. David Kaeli, Northeastern University
    MERL Host: Abraham Goldsmith
    Research Areas: Control, Optimization, Machine Learning, Speech & Audio
    Brief
    • GPU computing is alive and well! The GPU has allowed researchers to overcome a number of computational barriers in important problem domains. But challenges remain in using a GPU to target more general-purpose applications. GPUs achieve impressive speedups when compared to CPUs, since GPUs have a large number of compute cores and high memory bandwidth. Recent GPU performance is approaching 10 teraflops of single-precision performance on a single device. In this talk we will discuss current trends with GPUs, including some advanced features that allow them to exploit multi-context grains of parallelism. Further, we consider how GPUs can be treated as cloud-based resources, enabling a GPU-enabled server to deliver HPC cloud services by leveraging virtualization and collaborative filtering. Finally, we argue for new heterogeneous workloads and discuss the role of the Heterogeneous System Architecture (HSA), a standard that further supports integration of the CPU and GPU into a common framework. We present a new class of benchmarks specifically tailored to evaluate the benefits of features supported in the new HSA programming model.
  •  EVENT   MERL leads organization of dialog technology challenges and associated workshop
    Date: Sunday, December 10, 2017
    MERL Contacts: Bret Harsham; Chiori Hori; Takaaki Hori
    Location: Hyatt Regency, Long Beach, CA
    Research Areas: Speech & Audio, Artificial Intelligence
    Brief
    • MERL researcher Chiori Hori led the organization of the 6th edition of the Dialog System Technology Challenges (DSTC6). This year's edition of DSTC is split into three tracks: End-to-End Goal Oriented Dialog Learning, End-to-End Conversation Modeling, and Dialogue Breakdown Detection. A total of 23 teams from all over the world competed in the various tracks, and will meet at the Hyatt Regency in Long Beach, CA, USA on December 10 to present their results at a dedicated workshop colocated with NIPS 2017.

      MERL's Speech and Audio Team and Mitsubishi Electric Corporation jointly submitted a set of systems to the End-to-End Conversation Modeling Track, obtaining the best rank among 19 submissions in terms of objective metrics.
  •  TALK   Generative Model-Based Text-to-Speech Synthesis
    Date & Time: Wednesday, February 1, 2017; 12:00-13:00
    Speaker: Dr. Heiga ZEN, Google
    MERL Host: Chiori Hori
    Research Area: Speech & Audio
    Brief
    • Recent progress in generative modeling has improved the naturalness of synthesized speech significantly. In this talk I will summarize these generative model-based approaches for speech synthesis such as WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems.
      See https://deepmind.com/blog/wavenet-generative-model-raw-audio/ for further details.
  •  EVENT   MERL organizes Workshop on End-to-End Speech and Audio Processing at NIPS 2016
    Date: Saturday, December 10, 2016
    Location: Centre Convencions Internacional Barcelona, Barcelona, Spain
    Research Areas: Machine Learning, Speech & Audio
    Brief
    • MERL researcher John Hershey is organizing a Workshop on End-to-End Speech and Audio Processing, on behalf of MERL's Speech and Audio team, and in collaboration with Philemon Brakel of the University of Montreal. The workshop focuses on recent advances in end-to-end deep learning methods to address alignment and structured prediction problems that naturally arise in speech and audio processing. The all-day workshop takes place on Saturday, December 10th at NIPS 2016, in Barcelona, Spain.
  •  EVENT   John Hershey to present tutorial at the 2016 IEEE SLT Workshop
    Date: Tuesday, December 13, 2016
    Speaker: John Hershey, MERL
    MERL Contact: Jonathan Le Roux
    Location: 2016 IEEE Spoken Language Technology Workshop, San Diego, California
    Research Areas: Machine Learning, Speech & Audio
    Brief
    • MERL researcher John Hershey presents an invited tutorial at the 2016 IEEE Workshop on Spoken Language Technology, in San Diego, California. The topic, "developing novel deep neural network architectures from probabilistic models," stems from MERL work with collaborators Jonathan Le Roux and Shinji Watanabe on a principled framework that seeks to improve our understanding of deep neural networks and draws inspiration for new types of deep networks from the arsenal of principles and tools developed over the years for conventional probabilistic models. The tutorial covers a range of parallel ideas in the literature that have formed a recent trend, as well as their application to speech and language.
  •  EVENT   2016 IEEE Workshop on Spoken Language Technology: Sponsored by MERL
    Date: Tuesday, December 13, 2016 - Friday, December 16, 2016
    Location: San Diego, California
    Research Area: Speech & Audio
    Brief
    • The IEEE Workshop on Spoken Language Technology is a premier international showcase for advances in spoken language technology. The theme for 2016 is "machine learning: from signal to concepts," which reflects the current excitement about end-to-end learning in speech and language processing. This year, MERL is showing its support for SLT as one of its top sponsors, along with Amazon and Microsoft.
  •  EVENT   SANE 2016 - Speech and Audio in the Northeast
    Date: Friday, October 21, 2016
    MERL Contact: Jonathan Le Roux
    Location: MIT, McGovern Institute for Brain Research, Cambridge, MA
    Research Area: Speech & Audio
    Brief
    • SANE 2016, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Friday October 21, 2016 at MIT's Brain and Cognitive Sciences Department, at the McGovern Institute for Brain Research, in Cambridge, MA.

      It is a follow-up to SANE 2012 (Mitsubishi Electric Research Labs - MERL), SANE 2013 (Columbia University), SANE 2014 (MIT CSAIL), and SANE 2015 (Google NY). Since the first edition, the audience has steadily grown, gathering 140 researchers and students in 2015.

      SANE 2016 will feature invited talks by leading researchers: Juan P. Bello (NYU), William T. Freeman (MIT/Google), Nima Mesgarani (Columbia University), Dan Ellis (Google), Shinji Watanabe (MERL), Josh McDermott (MIT), and Jesse Engel (Google). It will also feature a lively poster session during lunch time, open to both students and researchers.

      SANE 2016 is organized by Jonathan Le Roux (MERL), Josh McDermott (MIT), Jim Glass (MIT), and John R. Hershey (MERL).
  •  TALK   Speech structure and its application to speech processing -- Relational, holistic and abstract representation of speech
    Date & Time: Friday, June 3, 2016; 1:30PM - 3:00PM
    Speaker: Nobuaki Minematsu and Daisuke Saito, The University of Tokyo
    Research Area: Speech & Audio
    Brief
    • Speech signals convey various kinds of information, which are grouped into two kinds, linguistic and extra-linguistic information. Many speech applications, however, focus on only a single aspect of speech. For example, speech recognizers try to extract only word identity from speech and speaker recognizers extract only speaker identity. Here, irrelevant features are often treated as hidden or latent by applying probability theory to a large number of samples, or the irrelevant features are normalized to have quasi-standard values. In speech analysis, however, phases are usually removed, not hidden or normalized, and pitch harmonics are also removed, not hidden or normalized. The resulting speech spectrum still contains both linguistic information and extra-linguistic information. Is there any good method to remove extra-linguistic information from the spectrum? In this talk, we introduce our answer to that question, called speech structure. Extra-linguistic variation can be modeled as feature space transformation, and our speech structure is based on the transform-invariance of f-divergence. This proposal was inspired by findings in classical studies of structural phonology and recent studies of developmental psychology. Speech structure has been applied to accent clustering, speech recognition, and language identification. These applications are also explained in the talk.
  •  EVENT   John Hershey Invited to Speak at Deep Learning Summit 2016 in Boston
    Date: Thursday, May 12, 2016 - Friday, May 13, 2016
    Location: Deep Learning Summit, Boston, MA
    Research Areas: Speech & Audio, Artificial Intelligence
    Brief
    • MERL Speech and Audio Senior Team Leader John Hershey is among a set of high-profile researchers invited to speak at the Deep Learning Summit 2016 in Boston on May 12-13, 2016. John will present the team's groundbreaking work on general sound separation using a novel deep learning framework called Deep Clustering. For the first time, an artificial intelligence is able to crack the half-century-old "cocktail party problem", that is, to isolate the speech of a single person from a mixture of multiple unknown speakers, as humans do when having a conversation in a loud crowd.
  •  TALK   Advanced Recurrent Neural Networks for Automatic Speech Recognition
    Date & Time: Friday, April 29, 2016; 12:00 PM - 1:00 PM
    Speaker: Yu Zhang, MIT
    Research Area: Speech & Audio
    Brief
    • A recurrent neural network (RNN) is a class of neural network models where connections between its neurons form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Recently, RNN-based acoustic models have greatly improved automatic speech recognition (ASR) accuracy on many tasks, in particular an advanced version of the RNN that exploits a structure called long short-term memory (LSTM). However, ASR performance with distant microphones, low resources, noisy, reverberant conditions, and on multi-talker speech is still far from satisfactory as compared to humans. To address these issues, we develop new structures of RNNs inspired by two principles: (1) the structure follows the intuition of human speech recognition; (2) the structure is easy to optimize. The talk will go beyond basic RNNs and introduce prediction-adaptation-correction RNNs (PAC-RNNs) and highway LSTMs (HLSTMs). It studies both uni-directional and bi-directional RNNs, as well as discriminative training applied on top of the RNNs. For efficient training of such RNNs, the talk will describe two algorithms for learning their parameters in some detail: (1) Latency-Controlled bi-directional model training; and (2) Two pass forward computation for sequence training. Finally, this talk will analyze the advantages and disadvantages of different variants and propose future directions.
  •  TALK   Driver's mental workload estimation based on the reflex eye movement
    Date & Time: Tuesday, March 15, 2016; 12:45 PM - 1:30 PM
    Speaker: Prof. Hirofumi Aoki, Nagoya University
    Research Area: Speech & Audio
    Brief
    • Driving requires complex skills that involve the vehicle itself (e.g., speed control and instrument operation), other road users (e.g., other vehicles, pedestrians), the surrounding environment, and so on. During driving, visual cues are the main source of information for the brain. In order to stabilize the visual information when you are moving, the eyes move in the opposite direction based on the input to the vestibular system. This involuntary eye movement is called the vestibulo-ocular reflex (VOR), and physiological models of it have been studied. Obinata et al. found that the VOR can be used to estimate mental workload. Since then, our research group has been developing methods to quantitatively estimate mental workload during driving by means of reflex eye movement. In this talk, I will explain the basic mechanism of the reflex eye movement and how to apply it to mental workload estimation. I will also introduce the latest work on combining the VOR and OKR (optokinetic reflex) models for naturalistic driving environments.
  •  TALK   A data-centric approach to driving behavior research: How can signal processing methods contribute to the development of autonomous driving?
    Date & Time: Tuesday, March 15, 2016; 12:00 PM - 12:45 PM
    Speaker: Prof. Kazuya Takeda, Nagoya University
    Research Area: Speech & Audio
    Brief
    • Thanks to advanced "internet of things" (IoT) technologies, situation-specific human behavior has become an area of development for practical applications involving signal processing. One important area of development of such practical applications is driving behavior research. Since 1999, I have been collecting driving behavior data in a wide range of signal modalities, including speech/sound, video, physical/physiological sensors, CAN bus, LIDAR and GNSS. The objective of this data collection is to evaluate how well signal models can represent human behavior while driving. In this talk, I would like to summarize our 10 years of study of driving behavior signal processing, which has been based on these signal corpora. In particular, statistical signal models of interactions between traffic contexts and driving behavior, i.e., stochastic driver modeling, will be discussed, in the context of risky lane change detection. I greatly look forward to discussing the scalability of such corpus-based approaches, which could be applied to almost any traffic situation.
  •  TALK   Emotion Detection for Health Related Issues
    Date & Time: Tuesday, February 16, 2016; 12:00 PM - 1:00 PM
    Speaker: Dr. Najim Dehak, MIT
    Research Area: Speech & Audio
    Brief
    • Recently, there has been a great increase of interest in the field of emotion recognition based on different human modalities, such as speech, heart rate, etc. Emotion recognition systems can be very useful in several areas, such as medicine and telecommunications. In the medical field, identifying emotions can be an important tool for detecting and monitoring patients with mental health disorders. In addition, the identification of the emotional state from voice provides opportunities for the development of automated dialogue systems capable of producing reports to the physician based on frequent phone communication between the system and the patients. In this talk, we will describe a health-related application of an emotion recognition system based on the human voice to detect and monitor the emotional state of people.
  •  EVENT   SANE 2015 - Speech and Audio in the Northeast
    Date: Thursday, October 22, 2015
    MERL Contact: Jonathan Le Roux
    Location: Google, New York City, NY
    Research Area: Speech & Audio
    Brief
    • SANE 2015, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 22, 2015 at Google, in New York City, NY.

      It is a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), SANE 2013, held at Columbia University, and SANE 2014, held at MIT, which each gathered 70 to 90 researchers and students.

      SANE 2015 will feature invited talks by leading researchers from the Northeast, as well as from the international community: Rohit Prasad (Amazon), Michael Mandel (Brooklyn College, CUNY), Ron Weiss (Google), John Hershey (MERL), Pablo Sprechmann (NYU), Tuomas Virtanen (Tampere University of Technology), and Paris Smaragdis (UIUC). It will also feature a lively poster session during lunch time, open to both students and researchers.

      SANE 2015 is organized by Jonathan Le Roux (MERL), Hank Liao (Google), Andrew Senior (Google), and John R. Hershey (MERL).
  •  EVENT   SANE 2014 - Speech and Audio in the Northeast
    Date: Thursday, October 23, 2014
    MERL Contact: Jonathan Le Roux
    Location: MIT, Cambridge, MA
    Research Area: Speech & Audio
    Brief
    • SANE 2014, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 23, 2014 at MIT, in Cambridge, MA. It is a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), and SANE 2013, held at Columbia University, which each gathered around 70 researchers and students. SANE 2014 will feature invited talks by leading researchers from the Northeast as well as Europe: Najim Dehak (MIT), Hakan Erdogan (MERL/Sabanci University), Gael Richard (Telecom ParisTech), George Saon (IBM Research), Andrew Senior (Google Research), Stavros Tsakalidis (BBN - Raytheon), and David Wingate (Lyric). It will also feature a lively poster session during lunch time, open to both students and researchers. SANE 2014 is organized by Jonathan Le Roux (MERL), Jim Glass (MIT), and John R. Hershey (MERL).
  •  EVENT   SANE 2013 - Speech and Audio in the Northeast
    Date & Time: Thursday, October 24, 2013; 8:45 AM - 5:00 PM
    MERL Contact: Jonathan Le Roux
    Location: Columbia University
    Research Area: Speech & Audio
    Brief
    • SANE 2013, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 24, 2013 at Columbia University, in New York City.

      A follow-up to SANE 2012 held in October 2012 at MERL in Cambridge, MA, this year's SANE will be held in conjunction with the WASPAA workshop, held October 20-23 in upstate New York. WASPAA attendees are welcome and encouraged to attend SANE.

      SANE 2013 will feature invited speakers from the Northeast, as well as from the international community. It will also feature a lively poster session during lunch time, open to both students and researchers.

      SANE 2013 is organized by Prof. Dan Ellis (Columbia University), Jonathan Le Roux (MERL) and John R. Hershey (MERL).
  •  TALK   Efficiently sampling wave fields
    Date & Time: Thursday, October 17, 2013; 12:00 PM
    Speaker: Prof. Laurent Daudet, Paris Diderot University, France
    MERL Host: Jonathan Le Roux
    Research Area: Speech & Audio
    Brief
    • In acoustics, one may wish to acquire a wavefield over a whole spatial domain, while we can only make point measurements (i.e., with microphones). Even with few sources, this remains a difficult problem because of reverberation, which can be hard to characterize. This can be seen as a sampling / interpolation problem, and it raises a number of interesting questions: how many sample points are needed, where to choose the sampling points, etc. In this presentation, we will review some case studies, in 2D (vibrating plates) and 3D (room acoustics), with numerical and experimental data, where we have developed sparse models, possibly with additional 'structures', based on physical modeling of the acoustic field. These types of models are well suited to reconstruction techniques known as compressed sensing. These principles can also be used for sub-Nyquist optical imaging: we will show preliminary experimental results of a new compressive imager, remarkably simple in its principle, using a multiply scattering medium.