Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Anoop Cherian
Gordon Wichern
Chiori Hori
Tim K. Marks
Michael J. Jones
Daniel N. Nikovski
Kieran Parsons
Devesh K. Jha
Philip V. Orlik
Suhas Lohit
Matthew Brand
Petros T. Boufounos
Hassan Mansour
Diego Romeres
Pu (Perry) Wang
Moitreya Chatterjee
François Germain
Siddarth Jain
William S. Yerazunis
Mouhacine Benosman
Kuan-Chuan Peng
Arvind Raghunathan
Radu Corcodel
Hongbo Sun
Yebin Wang
Jianlin Guo
Sameer Khurana
Chungwei Lin
Jing Liu
Yanting Ma
Bingnan Wang
Stefano Di Cairano
Anthony Vetro
Jinyun Zhang
Jose Amaya
Karl Berntorp
Ankush Chakrabarty
Vedang M. Deshpande
Dehong Liu
Zexu Pan
Wataru Tsujita
Abraham P. Vinod
Janek Ebbers
Ryo Hase
James Queeney
Shinya Tsuruta
Ryoma Yataka
Awards
AWARD Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
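One widely used example of the kind of training and evaluation measure described above is the scale-invariant signal-to-distortion ratio (SI-SDR), which scores a separated signal against its reference independently of gain. As an illustrative sketch (not MERL's implementation; the function name and tolerance `eps` are choices made here), it can be computed as:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    # Project the estimate onto the reference to obtain the optimally scaled target
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))
```

Because the measure is invariant to rescaling of the estimate, a model cannot inflate its score by adjusting output gain; the negative SI-SDR is commonly used directly as a training loss for separation networks.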
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
AWARD Honorable Mention Award at NeurIPS 2023 Instruction Workshop
Date: December 15, 2023
Awarded to: Lingfeng Sun, Devesh K. Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka and Diego Romeres
MERL Contacts: Radu Corcodel; Chiori Hori; Siddarth Jain; Devesh K. Jha; Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers received an Honorable Mention Award at the Workshop on Instruction Tuning and Instruction Following at the NeurIPS 2023 conference in New Orleans. The workshop focused on instruction tuning and instruction following for large language models (LLMs). MERL researchers presented their work on interactive planning with LLMs for partially observable robotic tasks during the workshop's oral presentation session.
AWARD MERL team wins the Audio-Visual Speech Enhancement (AVSE) 2023 Challenge
Date: December 16, 2023
Awarded to: Zexu Pan, Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux
MERL Contacts: François Germain; Chiori Hori; Sameer Khurana; Jonathan Le Roux; Zexu Pan; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 12 teams in the 2nd COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSE). The team was led by Zexu Pan, and also included Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux.
The AVSE challenge aims to design better speech enhancement systems by harnessing the visual aspects of speech (such as lip movements and gestures) in a manner similar to the brain’s multi-modal integration strategies. MERL’s system was a scenario-aware audio-visual TF-GridNet that incorporates the face recording of a target speaker as a conditioning factor and also recognizes whether the predominant interference signal is speech or background noise. In addition to outperforming all competing systems on the objective metrics by a wide margin, MERL’s model achieved the best overall word intelligibility score in a listening test: 84.54%, compared to 57.56% for the baseline and 80.41% for the next best team. Fisher’s least significant difference (LSD) was 2.14%, indicating that the MERL model offered statistically significant speech intelligibility improvements over all other systems.
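For context on how the 2.14% threshold works: Fisher's LSD is the smallest difference between two group means that reaches significance, derived from the error mean square of an ANOVA over the listening-test scores. A rough stdlib-only sketch (the function name and inputs are illustrative, not from the challenge report; the 1.96 critical value approximates the two-sided t statistic at α = 0.05 for large degrees of freedom):

```python
import math

def fishers_lsd(mse_error, n_per_group, t_crit=1.96):
    """Least significant difference between two group means (illustrative).

    mse_error:   error mean square from the ANOVA over listener scores
    n_per_group: number of scores contributing to each system's mean
    t_crit:      two-sided critical value; 1.96 approximates t for large df
    """
    return t_crit * math.sqrt(2.0 * mse_error / n_per_group)

# With the reported LSD of 2.14%, MERL's margin over the runner-up
# (84.54% - 80.41% = 4.13 points) clears the significance threshold.
significant = (84.54 - 80.41) > 2.14
```

Any pair of systems whose mean intelligibility scores differ by more than the LSD are significantly different, which is why MERL's 4.13-point margin over the next best team supports the significance claim.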
See All Awards for Artificial Intelligence
News & Events
TALK [MERL Seminar Series 2024] Melanie Mitchell presents talk titled "The Debate Over 'Understanding' in AI's Large Language Models"
Date & Time: Tuesday, February 13, 2024; 1:00 PM
Speaker: Melanie Mitchell, Santa Fe Institute
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Human-Computer Interaction
Abstract: I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language -- and the physical and social situations language encodes -- in any important sense. I will describe arguments that have been made for and against such understanding, and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
TALK [MERL Seminar Series 2024] Greta Tuckute presents talk titled "Computational models of human auditory and language processing"
Date & Time: Wednesday, January 31, 2024; 12:00 PM
Speaker: Greta Tuckute, MIT
MERL Host: Sameer Khurana
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Abstract: Advances in machine learning have led to powerful models for audio and language, proficient in tasks like speech recognition and fluent language generation. Beyond their immense utility in engineering applications, these models offer valuable tools for cognitive science and neuroscience. In this talk, I will demonstrate how these artificial neural network models can be used to understand how the human brain processes language. The first part of the talk will cover how audio neural networks serve as computational accounts for brain activity in the auditory cortex. The second part will focus on the use of large language models, such as those in the GPT family, to non-invasively control brain activity in the human language system.
See All News & Events for Artificial Intelligence
Research Highlights
Internships
CV2119: Conditional Video Generation
We seek a highly motivated intern to conduct original research in generative models for conditional video generation. We are interested in applications such as video generation from text, images, and diagrams. The successful candidate will collaborate with MERL researchers to design and implement new models, conduct experiments, and prepare results for publication. The candidate should be a PhD student (or postdoc) in computer vision and machine learning with a strong publication record including at least one paper in a top-tier computer vision or machine learning venue such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, AAAI, or TPAMI. Strong programming skills, experience developing and implementing new models in deep learning platforms such as PyTorch, and broad knowledge of machine learning and deep learning methods are expected, including experience in the latest advances in conditional video generation. Start date is flexible; duration should be at least 3 months.
CI2049: Efficient/Green AI
MERL is seeking highly motivated and qualified interns to work on efficient machine learning techniques. The ideal candidates would have significant research experience in federated learning, generative large language models, and efficient/green AI. A mature understanding of modern machine learning methods, proficiency with Python, and familiarity with deep learning frameworks are expected. Candidates at or beyond the middle of their Ph.D. program are encouraged to apply. The expected duration is 3 months, with flexible start dates.
OR2103: Human Robot Collaboration in Assembly Tasks
MERL is looking for a self-motivated and qualified candidate to work on human-robot interaction in collaborative manipulation and assembly scenarios. The ideal candidate is a PhD student with experience and a publication record in one or more of the following areas: 1) control, estimation, and perception for robotic manipulation; 2) task and motion planning; 3) learning-from-demonstration algorithms applied to robotic manipulation; 4) machine learning techniques for modeling and control, as well as regression and classification problems; 5) experience working with robotic systems and familiarity with physics-engine simulators such as MuJoCo, Isaac Gym, and PyBullet. The successful candidate will be expected to develop, in collaboration with MERL employees, state-of-the-art algorithms to solve complex manipulation tasks that involve human-robot collaboration. Proficiency in Python and ROS is required. The expectation is that the research will lead to one or more scientific publications. The expected duration is 3-4 months, with a flexible starting date.
See All Internships for Artificial Intelligence
Recent Publications
- "SPECDIFF-GAN: A SPECTRALLY-SHAPED NOISE DIFFUSION GAN FOR SPEECH AND MUSIC SYNTHESIS", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024. TR2024-013
  @inproceedings{Baoueb2024mar,
    author = {Baoueb, Teysir and Liu, Haocheng and Fontaine, Mathieu and Le Roux, Jonathan and Richard, Gaël},
    title = {SPECDIFF-GAN: A SPECTRALLY-SHAPED NOISE DIFFUSION GAN FOR SPEECH AND MUSIC SYNTHESIS},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-013}
  }
- "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024. TR2024-012
  @inproceedings{Hori2024mar,
    author = {Hori, Chiori and Wang, Pu and Rahman, Mahbub and Vaca-Rubio, Cristian and Khurana, Sameer and Cherian, Anoop and Le Roux, Jonathan},
    title = {Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-012}
  }
- "GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024. TR2024-014
  @inproceedings{Liu2024mar,
    author = {Liu, Haocheng and Baoueb, Teysir and Fontaine, Mathieu and Le Roux, Jonathan and Richard, Gaël},
    title = {GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-014}
  }
- "Why Does Differential Privacy with Large ε Defend Against Practical Membership Inference Attacks?", AAAI Workshop on Privacy-Preserving Artificial Intelligence, February 2024. TR2024-009
  @inproceedings{Lowy2024feb2,
    author = {Lowy, Andrew and Li, Zhuohang and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
    title = {Why Does Differential Privacy with Large ε Defend Against Practical Membership Inference Attacks?},
    booktitle = {AAAI Workshop on Privacy-Preserving Artificial Intelligence},
    year = 2024,
    month = feb,
    url = {https://www.merl.com/publications/TR2024-009}
  }
- "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings", IEEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2024.3350887, Vol. 32, pp. 1185-1197, February 2024. TR2024-006
  @article{Boeddeker2024feb,
    author = {Boeddeker, Christoph and Subramanian, Aswin Shanmugam and Wichern, Gordon and Haeb-Umbach, Reinhold and Le Roux, Jonathan},
    title = {TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2024,
    volume = 32,
    pages = {1185--1197},
    month = feb,
    doi = {10.1109/TASLP.2024.3350887},
    issn = {2329-9304},
    url = {https://www.merl.com/publications/TR2024-006}
  }
- "Pixel-Grounded Prototypical Part Networks", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2024. TR2024-002
  @inproceedings{Carmichael2024jan,
    author = {Carmichael, Zachariah and Lohit, Suhas and Cherian, Anoop and Jones, Michael J. and Scheirer, Walter},
    title = {Pixel-Grounded Prototypical Part Networks},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2024,
    month = jan,
    url = {https://www.merl.com/publications/TR2024-002}
  }
- "CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments", AAAI Conference on Artificial Intelligence, December 2023. TR2023-154
  @inproceedings{Liu2023dec2,
    author = {Liu, Xiulong and Paul, Sudipta and Chatterjee, Moitreya and Cherian, Anoop},
    title = {CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments},
    booktitle = {AAAI Conference on Artificial Intelligence},
    year = 2023,
    month = dec,
    url = {https://www.merl.com/publications/TR2023-154}
  }
- "LoDA: Low-Dimensional Adaptation of Large Language Models", Advances in Neural Information Processing Systems (NeurIPS) workshop, December 2023. TR2023-150
  @inproceedings{Liu2023dec,
    author = {Liu, Jing and Koike-Akino, Toshiaki and Wang, Pu and Brand, Matthew and Wang, Ye and Parsons, Kieran},
    title = {LoDA: Low-Dimensional Adaptation of Large Language Models},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS) workshop},
    year = 2023,
    month = dec,
    url = {https://www.merl.com/publications/TR2023-150}
  }
Videos
Software & Data Downloads
neural-IIR-field
Pixel-Grounded Prototypical Part Networks
DeepBornFNO
Hyperbolic Audio Source Separation
Simple Multimodal Algorithmic Reasoning Task Dataset
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Goal directed RL with Safety Constraints
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Discriminative Subspace Pooling
Partial Group Convolutional Neural Networks