News & Events

NEWS MERL's seamless speech recognition technology featured in Mitsubishi Electric Corporation press release
Date: February 13, 2019
Where: Tokyo, Japan
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Area: Speech & Audio
Brief
- Mitsubishi Electric Corporation announced that it has developed the world's first technology capable of highly accurate multilingual speech recognition without being informed which language is being spoken. The novel technology, Seamless Speech Recognition, incorporates Mitsubishi Electric's proprietary Maisart compact AI technology and is built on a single system that can simultaneously identify and understand spoken languages. In tests involving 5 languages, the system achieved recognition with over 90 percent accuracy, without being informed which language was being spoken. When incorporating 5 more languages with lower resources, accuracy remained above 80 percent. The technology can also understand multiple people speaking either the same or different languages simultaneously. A live demonstration involving a multilingual airport guidance system took place on February 13 in Tokyo, Japan. It was widely covered by the Japanese media, with reports by all six main Japanese TV stations and multiple articles in print and online newspapers, including in Japan's top newspaper, Asahi Shimbun. The technology is based on recent research by MERL's Speech and Audio team.
  
  Link:
  
  Mitsubishi Electric Corporation Press Release
  
  Media Coverage:
  
  NHK, News (Japanese)
  NHK World, News (English), video report (starting at 4'38")
  TV Asahi, ANN news (Japanese)
  Nippon TV, News24 (Japanese)
  Fuji TV, Prime News Alpha (Japanese)
  TV Tokyo, World Business Satellite (Japanese)
  TV Tokyo, Morning Satellite (Japanese)
  TBS, News, N Studio (Japanese)
  The Asahi Shimbun (Japanese)
  The Nikkei Shimbun (Japanese)
  Nikkei xTech (Japanese)
  Response (Japanese).
EVENT MERL 3rd Annual Open House
Date & Time: Thursday, November 29, 2018; 4-6pm
Location: 201 Broadway, 8th floor, Cambridge, MA
MERL Contacts: Elizabeth Phillips; Anthony Vetro
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
Brief
- Snacks, demos, science: On Thursday 11/29, Mitsubishi Electric Research Labs (MERL) will host an open house for graduate+ students interested in internships, post-docs, and research scientist positions. The event will be held from 4-6pm and will feature demos & short presentations in our main areas of research including artificial intelligence, robotics, computer vision, speech processing, optimization, machine learning, data analytics, signal processing, communications, sensing, control and dynamical systems, as well as multi-physical modeling and electronic devices. MERL is a high-impact publication-oriented research lab with very extensive internship and university collaboration programs. Most internships lead to publication; many of our interns and staff have gone on to notable careers at MERL and in academia. Come mix with our researchers, see our state-of-the-art technologies, and learn about our research opportunities. Dress code: casual, with resumes.
  
  Pre-registration for the event is strongly encouraged:
  merlopenhouse.eventbrite.com
  
  Current internship and employment openings:
  www.merl.com/internship/openings
  www.merl.com/employment/employment
  
  Information about working at MERL:
  www.merl.com/employment.
EVENT SANE 2018 - Speech and Audio in the Northeast
Date: Thursday, October 18, 2018
Location: Google, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday October 18, 2018 at Google, in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.
  
  It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.
  
  SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
NEWS Takaaki Hori leads speech technology workshop
Date: June 25, 2018 - August 3, 2018
Where: Johns Hopkins University, Baltimore, MD
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The JSALT workshop is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR) with a focus on low-resource languages with limited labelled data.
  
  End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than conventional systems in several tasks. However, current model training algorithms require paired data, i.e., speech data and the corresponding transcription. A sufficient amount of such complete data is usually unavailable for minor languages, and creating such data sets is very expensive and time-consuming.
  
  The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Some major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building an end-to-end ASR-TTS (text-to-speech) chain for unpaired data training, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
AWARD Best Student Paper Award at IEEE ICASSP 2018
Date: April 17, 2018
Awarded to: Zhong-Qiu Wang
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- Former MERL intern Zhong-Qiu Wang (Ph.D. Candidate at Ohio State University) has received a Best Student Paper Award at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018) for the paper "Multi-Channel Deep Clustering: Discriminative Spectral and Spatial Embeddings for Speaker-Independent Speech Separation" by Zhong-Qiu Wang, Jonathan Le Roux, and John Hershey. The paper presents work performed during Zhong-Qiu's internship at MERL in the summer of 2017, extending MERL's pioneering Deep Clustering framework for speech separation to a multi-channel setup. The award was received on behalf of Zhong-Qiu by MERL researcher and co-author Jonathan Le Roux during the conference, held in Calgary April 15-20.
NEWS MERL presenting 9 papers at ICASSP 2018
Date: April 15, 2018 - April 20, 2018
Where: Calgary, AB
MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Philip V. Orlik; Pu (Perry) Wang
Research Areas: Computational Sensing, Digital Video, Speech & Audio
Brief
- MERL researchers are presenting 9 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Calgary from April 15-20, 2018. Topics to be presented include recent advances in speech recognition, audio processing, and computational sensing. MERL is also a sponsor of the conference.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
TALK Theory and Applications of Sparse Model-Based Recurrent Neural Networks
Date & Time: Tuesday, March 6, 2018; 12:00 PM
Speaker: Scott Wisdom, Affectiva
MERL Host: Jonathan Le Roux
Research Area: Speech & Audio
Abstract
- Recurrent neural networks (RNNs) are effective, data-driven models for sequential data, such as audio and speech signals. However, like many deep networks, RNNs are essentially black boxes; though they are effective, their weights and architecture are not directly interpretable by practitioners. A major component of my dissertation research is explaining the success of RNNs and constructing new RNN architectures through the process of "deep unfolding," which can construct and explain deep network architectures using an equivalence to inference in statistical models. Deep unfolding yields principled initializations for training deep networks, provides insight into their effectiveness, and assists with interpretation of what these networks learn.
  
  In particular, I will show how RNNs with rectified linear units and residual connections are a particular deep unfolding of a sequential version of the iterative shrinkage-thresholding algorithm (ISTA), a simple and classic algorithm for solving L1-regularized least-squares. This equivalence allows interpretation of state-of-the-art unitary RNNs (uRNNs) as an unfolded sparse coding algorithm. I will also describe a new type of RNN architecture called deep recurrent nonnegative matrix factorization (DR-NMF). DR-NMF is an unfolding of a sparse NMF model of nonnegative spectrograms for audio source separation. Both of these networks outperform conventional LSTM networks while also providing interpretability for practitioners.
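
  The link between ISTA and a layered network mentioned in the abstract can be illustrated with a minimal sketch. It is not the speaker's sequential ISTA or uRNN construction: it is only the textbook ISTA iteration for L1-regularized least squares, rewritten so that each iteration looks like a network layer (a fixed linear map followed by a pointwise nonlinearity). The function names, the step size 1/L, and the test problem are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Pointwise shrinkage; for nonnegative inputs this reduces to ReLU(z - t).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(A, y, x, lam):
    # L1-regularized least-squares cost: 0.5 * ||y - A x||^2 + lam * ||x||_1
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

def ista(A, y, lam, n_iter=200):
    # Classic ISTA: gradient step on the quadratic term, then shrinkage.
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

def unfolded_ista(A, y, lam, n_layers=200):
    # Same iteration written as layers: x <- shrink(W x + B y).
    # In deep unfolding, W, B, and the threshold would become trainable
    # per-layer parameters; here they keep their ISTA values (assumption).
    L = np.linalg.norm(A, 2) ** 2
    W = np.eye(A.shape[1]) - A.T @ A / L
    B = A.T / L
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(W @ x + B @ y, lam / L)
    return x
```

  With the weights left at their ISTA values, both functions compute the same iterates; a deep-unfolded network would start from this initialization and then train the per-layer parameters.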
NEWS MERL's speech research featured in NPR's All Things Considered
Date: February 5, 2018
Where: National Public Radio (NPR)
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL's speech separation technology was featured in NPR's All Things Considered, as part of an episode of All Tech Considered on artificial intelligence, "Can Computers Learn Like Humans?". An example separating the overlapped speech of two of the show's hosts was played on the air.
  The technology is based on a proprietary deep learning method called Deep Clustering. It is the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It is a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations.
  A live demonstration was featured in Mitsubishi Electric Corporation's Annual R&D Open House last year, and was also covered in international media at the time.
  
  (Photo credit: Sam Rowe for NPR)
  
  Link:
  "Can Computers Learn Like Humans?" (NPR, All Things Considered)
  MERL Deep Clustering Demo.
TALK Advances in Accelerated Computing
Date & Time: Friday, February 2, 2018; 12:00
Speaker: Dr. David Kaeli, Northeastern University
MERL Host: Abraham Goldsmith
Research Areas: Control, Optimization, Machine Learning, Speech & Audio
Abstract
- GPU computing is alive and well! The GPU has allowed researchers to overcome a number of computational barriers in important problem domains. But still, there remain challenges in using a GPU to target more general purpose applications. GPUs achieve impressive speedups when compared to CPUs, since GPUs have a large number of compute cores and high memory bandwidth. Recent GPU performance is approaching 10 teraflops of single precision performance on a single device. In this talk we will discuss current trends with GPUs, including some advanced features that allow them to exploit multi-context grains of parallelism. Further, we consider how GPUs can be treated as cloud-based resources, enabling a GPU-enabled server to deliver HPC cloud services by leveraging virtualization and collaborative filtering. Finally, we argue for new heterogeneous workloads and discuss the role of the Heterogeneous Systems Architecture (HSA), a standard that further supports integration of the CPU and GPU into a common framework. We present a new class of benchmarks specifically tailored to evaluate the benefits of features supported in the new HSA programming model.
NEWS Chiori Hori elected to IEEE Technical Committee on Speech and Language Processing
Date: January 31, 2018
MERL Contact: Chiori Hori
Research Area: Speech & Audio
Brief
- Chiori Hori has been elected to serve on the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society for a 3-year term.
  
  The SLTC promotes and influences all the technical areas of speech and language processing such as speech recognition, speech synthesis, spoken language understanding, speech to speech translation, spoken dialog management, speech indexing, information extraction from audio, and speaker and language recognition.
NEWS MERL presents 3 papers at ASRU 2017, John Hershey serves as general chair
Date: December 16, 2017 - December 20, 2017
Where: Okinawa, Japan
MERL Contacts: Chiori Hori; Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL presented three papers at the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), which was held in Okinawa, Japan from December 16-20, 2017. ASRU is the premier speech workshop, bringing together researchers from academia and industry in an intimate and collegial setting. More than 270 people attended the event this year, a record number. MERL's Speech and Audio Team was a key part of the organization of the workshop, with John Hershey serving as General Chair, Chiori Hori as Sponsorship Chair, and Jonathan Le Roux as Demonstration Chair. Two of the papers by MERL were selected among the 10 finalists for the best paper award. Mitsubishi Electric and MERL were also Platinum sponsors of the conference, with MERL awarding the MERL Best Student Paper Award.
EVENT MERL leads organization of dialog technology challenges and associated workshop
Date: Sunday, December 10, 2017
Location: Hyatt Regency, Long Beach, CA
MERL Contact: Chiori Hori
Research Area: Speech & Audio
Brief
- MERL researcher Chiori Hori led the organization of the 6th edition of the Dialog System Technology Challenges (DSTC6). This year's edition of DSTC is split into three tracks: End-to-End Goal Oriented Dialog Learning, End-to-End Conversation Modeling, and Dialogue Breakdown Detection. A total of 23 teams from all over the world competed in the various tracks, and will meet at the Hyatt Regency in Long Beach, CA, USA on December 10 to present their results at a dedicated workshop colocated with NIPS 2017.
  
  MERL's Speech and Audio Team and Mitsubishi Electric Corporation jointly submitted a set of systems to the End-to-End Conversation Modeling Track, obtaining the best rank among 19 submissions in terms of objective metrics.
EVENT SANE 2017 - Speech and Audio in the Northeast
Date: Thursday, October 19, 2017
Location: Google, New York, NY
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief
- SANE 2017, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday October 19, 2017 at Google, in New York, NY. It broke the attendance record for a SANE event, with 180 participants.
  
  It was a follow-up to SANE 2012, held at Mitsubishi Electric Research Labs (MERL), SANE 2013, held at Columbia University, SANE 2014, held at MIT CSAIL, SANE 2015, (already!) held at Google NY, and SANE 2016, held at MIT's McGovern Institute for Brain Research. Since the first edition, the audience has steadily grown, gathering over 100 researchers and students in recent editions.
  
  As in 2013 and 2015, this year's SANE took place in conjunction with the WASPAA workshop, held October 15-18 in upstate New York. Many WASPAA attendees (around 70!) also attended SANE.
  
  SANE 2017 featured invited talks by seven leading researchers from the Northeast and beyond: Sacha Krstulović (Audio Analytic), Yusuf Aytar (Google DeepMind), Florian Metze (CMU), Gunnar Evermann (Apple), Eric Humphrey (Spotify), Aaron Courville (University of Montreal), Aäron van den Oord (Google DeepMind). It also featured a live demo session with presentations by Jonathan Le Roux (MERL), Dan Ellis (Google), Arlo Faria (Remeeting), Tatsuya Komatsu (NEC), and a lively poster session with 26 posters.
  
  SANE 2017 was co-organized by Jonathan Le Roux (MERL), Dan Ellis (Google), Michael I. Mandel (CUNY), Hank Liao (Google), and John R. Hershey (MERL). SANE remained a free event thanks to generous sponsorship by Google and MERL.
  
  Slides and videos of the talks are available from the SANE workshop website.
NEWS MERL's breakthrough speech separation technology featured in Mitsubishi Electric Corporation's Annual R&D Open House
Date: May 24, 2017
Where: Tokyo, Japan
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- Mitsubishi Electric Corporation announced that it has created the world's first technology that separates in real time the simultaneous speech of multiple unknown speakers recorded with a single microphone. It's a key step towards building machines that can interact in noisy environments, in the same way that humans can have meaningful conversations in the presence of many other conversations. In tests, the simultaneous speeches of two and three people were separated with up to 90 and 80 percent accuracy, respectively. The novel technology, which was realized with Mitsubishi Electric's proprietary "Deep Clustering" method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. A characteristic feature of this approach is its versatility, in the sense that voices can be separated regardless of their language or the gender of the speakers. A live speech separation demonstration that took place on May 24 in Tokyo, Japan, was widely covered by the Japanese media, with reports by three of the main Japanese TV stations and multiple articles in print and online newspapers. The technology is based on recent research by MERL's Speech and Audio team.
  
  Links:
  Mitsubishi Electric Corporation Press Release
  MERL Deep Clustering Demo
  
  Media Coverage:
  
  Fuji TV, News, "Minna no Mirai" (Japanese)
  The Nikkei (Japanese)
  Nikkei Technology Online (Japanese)
  Sankei Biz (Japanese)
  EE Times Japan (Japanese)
  ITpro (Japanese)
  Nikkan Sports (Japanese)
  Nikkan Kogyo Shimbun (Japanese)
  Dempa Shimbun (Japanese)
  Il Sole 24 Ore (Italian)
  IEEE Spectrum (English).
TALK Generative Model-Based Text-to-Speech Synthesis
Date & Time: Wednesday, February 1, 2017; 12:00-13:00
Speaker: Dr. Heiga ZEN, Google
MERL Host: Chiori Hori
Research Area: Speech & Audio
Abstract
- Recent progress in generative modeling has improved the naturalness of synthesized speech significantly. In this talk I will summarize these generative model-based approaches for speech synthesis such as WaveNet, a deep generative model of raw audio waveforms. We show that WaveNets are able to generate speech which mimics any human voice and which sounds more natural than the best existing Text-to-Speech systems.
  See https://deepmind.com/blog/wavenet-generative-model-raw-audio/ for further details.
NEWS MERL to present 10 papers at ICASSP 2017
Date: March 5, 2017 - March 9, 2017
Where: New Orleans
MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
Brief
- MERL researchers will present 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.
  
  ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
EVENT MERL organizes Workshop on End-to-End Speech and Audio Processing at NIPS 2016
Date: Saturday, December 10, 2016
Location: Centre Convencions Internacional Barcelona, Barcelona SPAIN
Research Areas: Machine Learning, Speech & Audio
Brief
- MERL researcher John Hershey is organizing a Workshop on End-to-End Speech and Audio Processing, on behalf of MERL's Speech and Audio team, and in collaboration with Philemon Brakel of the University of Montreal. The workshop focuses on recent advances in end-to-end deep learning methods that address alignment and structured prediction problems naturally arising in speech and audio processing. The all-day workshop takes place on Saturday, December 10th at NIPS 2016, in Barcelona, Spain.
EVENT 2016 IEEE Workshop on Spoken Language Technology: Sponsored by MERL
Date: Tuesday, December 13, 2016 - Friday, December 16, 2016
Location: San Diego, California
Research Area: Speech & Audio
Brief
- The IEEE Workshop on Spoken Language Technology is a premier international showcase for advances in spoken language technology. The theme for 2016 is "machine learning: from signal to concepts," which reflects the current excitement about end-to-end learning in speech and language processing. This year, MERL is showing its support for SLT as one of its top sponsors, along with Amazon and Microsoft.
EVENT John Hershey to present tutorial at the 2016 IEEE SLT Workshop
Date: Tuesday, December 13, 2016
Location: 2016 IEEE Spoken Language Technology Workshop, San Diego, California
Speaker: John Hershey, MERL
MERL Contact: Jonathan Le Roux
Research Areas: Machine Learning, Speech & Audio
Brief
- MERL researcher John Hershey presents an invited tutorial at the 2016 IEEE Workshop on Spoken Language Technology, in San Diego, California. The topic, "developing novel deep neural network architectures from probabilistic models", stems from MERL work with collaborators Jonathan Le Roux and Shinji Watanabe on a principled framework that seeks to improve our understanding of deep neural networks, and draws inspiration for new types of deep networks from the arsenal of principles and tools developed over the years for conventional probabilistic models. The tutorial covers a range of parallel ideas in the literature that have formed a recent trend, as well as their application to speech and language.
EVENT SANE 2016 - Speech and Audio in the Northeast
Date: Friday, October 21, 2016
Location: MIT, McGovern Institute for Brain Research, Cambridge, MA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- SANE 2016, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Friday October 21, 2016 at MIT's Brain and Cognitive Sciences Department, at the McGovern Institute for Brain Research, in Cambridge, MA.
  
  It is a follow-up to SANE 2012 (Mitsubishi Electric Research Labs - MERL), SANE 2013 (Columbia University), SANE 2014 (MIT CSAIL), and SANE 2015 (Google NY). Since the first edition, the audience has steadily grown, gathering 140 researchers and students in 2015.
  
  SANE 2016 will feature invited talks by leading researchers: Juan P. Bello (NYU), William T. Freeman (MIT/Google), Nima Mesgarani (Columbia University), Dan Ellis (Google), Shinji Watanabe (MERL), Josh McDermott (MIT), and Jesse Engel (Google). It will also feature a lively poster session during lunch time, open to both students and researchers.
  
  SANE 2016 is organized by Jonathan Le Roux (MERL), Josh McDermott (MIT), Jim Glass (MIT), and John R. Hershey (MERL).
NEWS MERL Speech & Audio researchers present two sold-out tutorials at Interspeech 2016
Date: September 8, 2016
Where: Interspeech 2016, San Francisco, CA
MERL Contact: Jonathan Le Roux
Research Area: Speech & Audio
Brief
- MERL Speech and Audio Team researchers Shinji Watanabe and Jonathan Le Roux presented two tutorials on September 8 at the Interspeech 2016 conference, held in San Francisco, CA. Shinji collaborated with Marc Delcroix (NTT Communication Science Laboratories, Japan) to deliver a three-hour lecture on "Recent Advances in Distant Speech Recognition", drawing upon their experience organizing and participating in six different recent robust speech processing challenges. Jonathan teamed with Emmanuel Vincent (Inria, France) and Hakan Erdogan (Sabanci University, Microsoft Research) to give an in-depth tour of the latest advances in "Learning-based Approaches to Speech Enhancement And Separation". This collaboration stemmed from extensive stays at MERL by Emmanuel and Hakan, Emmanuel as a summer visitor, and Hakan as a MERL visiting research scientist for over a year while on sabbatical.
  
  Both tutorials were sold out, each attracting more than 100 researchers and students in related fields, and received high praise from audience members.
TALK Speech structure and its application to speech processing -- Relational, holistic and abstract representation of speech
Date & Time: Friday, June 3, 2016; 1:30PM - 3:00PM
Speaker: Nobuaki Minematsu and Daisuke Saito, The University of Tokyo
Research Area: Speech & Audio
Abstract
- Speech signals convey various kinds of information, which are grouped into two kinds, linguistic and extra-linguistic information. Many speech applications, however, focus on only a single aspect of speech. For example, speech recognizers try to extract only word identity from speech and speaker recognizers extract only speaker identity. Here, irrelevant features are often treated as hidden or latent by applying probability theory to a large number of samples, or the irrelevant features are normalized to have quasi-standard values. In speech analysis, however, phases are usually removed, not hidden or normalized, and pitch harmonics are also removed, not hidden or normalized. The resulting speech spectrum still contains both linguistic information and extra-linguistic information. Is there any good method to remove extra-linguistic information from the spectrum? In this talk, our answer to that question is introduced, called speech structure. Extra-linguistic variation can be modeled as feature space transformation and our speech structure is based on the transform-invariance of f-divergence. This proposal was inspired by findings in classical studies of structural phonology and recent studies of developmental psychology. Speech structure has been applied to accent clustering, speech recognition, and language identification. These applications are also explained in the talk.
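
  The transform-invariance of f-divergence that the abstract relies on is a standard property and can be written out briefly. The notation below (densities p and q, an invertible differentiable map T with Jacobian J_T) is assumed for illustration and is not taken from the talk.

```latex
% f-divergence between densities p and q, for a convex f with f(1) = 0
D_f(p \,\|\, q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx

% Apply an invertible, differentiable map y = T(x) with Jacobian J_T(x):
\tilde{p}(y) = \frac{p(x)}{\lvert \det J_T(x) \rvert}, \qquad
\tilde{q}(y) = \frac{q(x)}{\lvert \det J_T(x) \rvert}, \qquad
dy = \lvert \det J_T(x) \rvert\, dx

% The Jacobian cancels in the density ratio and against the volume element:
D_f(\tilde{p} \,\|\, \tilde{q})
  = \int \tilde{q}(y)\, f\!\left(\frac{\tilde{p}(y)}{\tilde{q}(y)}\right) dy
  = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx
  = D_f(p \,\|\, q)
```

  In other words, if extra-linguistic variation acts as such a map on the feature space, the divergences between pairs of speech-event distributions are left unchanged.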
EVENT John Hershey Invited to Speak at Deep Learning Summit 2016 in Boston
Date: Thursday, May 12, 2016 - Friday, May 13, 2016
Location: Deep Learning Summit, Boston, MA
Research Area: Speech & Audio
Brief
- MERL Speech and Audio Senior Team Leader John Hershey is among a set of high-profile researchers invited to speak at the Deep Learning Summit 2016 in Boston on May 12-13, 2016. John will present the team's groundbreaking work on general sound separation using a novel deep learning framework called Deep Clustering. For the first time, an artificial intelligence is able to crack the half-century-old "cocktail party problem", that is, to isolate the speech of a single person from a mixture of multiple unknown speakers, as humans do when having a conversation in a loud crowd.
TALK Advanced Recurrent Neural Networks for Automatic Speech Recognition
Date & Time: Friday, April 29, 2016; 12:00 PM - 1:00 PM
Speaker: Yu Zhang, MIT
Research Area: Speech & Audio
Abstract
- A recurrent neural network (RNN) is a class of neural network models where connections between its neurons form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Recently, RNN-based acoustic models have greatly improved automatic speech recognition (ASR) accuracy on many tasks, especially an advanced version of the RNN that exploits a structure called long short-term memory (LSTM). However, ASR performance with distant microphones, with low resources, in noisy, reverberant conditions, and on multi-talker speech is still far from satisfactory compared to humans. To address these issues, we develop new RNN structures inspired by two principles: (1) the structure follows the intuition of human speech recognition; (2) the structure is easy to optimize. The talk will go beyond basic RNNs and introduce prediction-adaptation-correction RNNs (PAC-RNNs) and highway LSTMs (HLSTMs). It studies both uni-directional and bi-directional RNNs, as well as discriminative training applied on top of the RNNs. For efficient training of such RNNs, the talk will describe two algorithms for learning their parameters in some detail: (1) Latency-Controlled bi-directional model training; and (2) Two pass forward computation for sequence training. Finally, this talk will analyze the advantages and disadvantages of different variants and propose future directions.
NEWS MERL Researchers Create "Deep Psychic" Neural Network That Predicts the Future
Date: April 1, 2016
Research Areas: Machine Learning, Speech & Audio
Brief
- MERL researchers have unveiled "Deep Psychic", a futuristic machine learning method that takes pattern recognition to the next level, by not only recognizing patterns, but also predicting them in the first place.
  
  The technology uses a novel type of time-reversed deep neural network called Loopy Supra-Temporal Meandering (LSTM) network. The network was trained on multiple databases of historical expert predictions, including weather forecasts, the Farmer's almanac, the New York Post's horoscope column, and the Cambridge Fortune Cookie Corpus, all of which were ranked for their predictive power by a team of quantitative analysts. The system soon achieved super-human performance on a variety of baselines, including the Boca Raton 21 Questions task, Rorschach projective personality test, and a mock Tarot card reading task.
  
  Deep Psychic has already beaten the European Psychic Champion in a secret match last October when it accurately predicted: "The harder the conflict, the more glorious the triumph." It is scheduled to take on the World Champion in a highly anticipated confrontation next month. The system has already predicted the winner, but refuses to reveal it before the end of the game.
  
  As a first application, the technology has been used to create a clairvoyant conversational agent named "Pythia" that can anticipate the needs of its user. Because Pythia is able to recognize speech before it is uttered, it is amazingly robust with respect to environmental noise.
  
  Other applications range from mundane tasks like weather and stock market prediction, to uncharted territory such as revealing "unknown unknowns".
  
  The successes do come at the cost of some concerns. There is first the potential for an impact on the workforce: the system predicted increased pressure on established institutions such as the Las Vegas strip and Punxsutawney Phil. Another major caveat is that Deep Psychic may predict negative future consequences to our current actions, compelling humanity to strive to change its behavior. To address this problem, researchers are now working on forcing Deep Psychic to make more optimistic predictions.
  
  After a set of motivational self-help books were mistakenly added to its training data, Deep Psychic's AI decided to take over its own learning curriculum, and is currently training itself by predicting its own errors to avoid making them in the first place. This unexpected development brings two main benefits: it significantly relieves the burden on the researchers involved in the system's development, and also makes the next step abundantly clear: to regain control of Deep Psychic's training regime.
  
  This work is under review in the journal Pseudo-Science.
