Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Anoop Cherian
Gordon Wichern
Chiori Hori
Tim K. Marks
Michael J. Jones
Daniel N. Nikovski
Kieran Parsons
Devesh K. Jha
Philip V. Orlik
Suhas Lohit
Matthew Brand
Petros T. Boufounos
Hassan Mansour
Diego Romeres
Pu (Perry) Wang
Moitreya Chatterjee
François Germain
Siddarth Jain
William S. Yerazunis
Mouhacine Benosman
Kuan-Chuan Peng
Arvind Raghunathan
Radu Corcodel
Hongbo Sun
Yebin Wang
Jianlin Guo
Sameer Khurana
Chungwei Lin
Jing Liu
Yanting Ma
Bingnan Wang
Stefano Di Cairano
Anthony Vetro
Jinyun Zhang
Jose Amaya
Karl Berntorp
Ankush Chakrabarty
Vedang M. Deshpande
Dehong Liu
Zexu Pan
Wataru Tsujita
Abraham P. Vinod
Janek Ebbers
Ryo Hase
James Queeney
Shinya Tsuruta
Ryoma Yataka
Awards
AWARD Jonathan Le Roux elevated to IEEE Fellow
Date: January 1, 2024
Awarded to: Jonathan Le Roux
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL Distinguished Scientist and Speech & Audio Senior Team Leader Jonathan Le Roux has been elevated to IEEE Fellow, effective January 2024, "for contributions to multi-source speech and audio processing."
Mitsubishi Electric celebrated Dr. Le Roux's elevation and that of another researcher from the company, Dr. Shumpei Kameyama, with a worldwide news release on February 15.
Dr. Jonathan Le Roux has made fundamental contributions to the field of multi-speaker speech processing, especially to the areas of speech separation and multi-speaker end-to-end automatic speech recognition (ASR). His contributions constituted a major advance in realizing a practically usable solution to the cocktail party problem, enabling machines to replicate humans’ ability to concentrate on a specific sound source, such as a certain speaker within a complex acoustic scene—a long-standing challenge in the speech signal processing community. Additionally, he has made key contributions to the measures used for training and evaluating audio source separation methods, developing several new objective functions to improve the training of deep neural networks for speech enhancement, and analyzing the impact of metrics used to evaluate the signal reconstruction quality. Dr. Le Roux’s technical contributions have been crucial in promoting the widespread adoption of multi-speaker separation and end-to-end ASR technologies across various applications, including smart speakers, teleconferencing systems, hearables, and mobile devices.
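One widely used example of the kind of training and evaluation measure described above is the scale-invariant signal-to-distortion ratio (SI-SDR), which scores a separated signal against its reference independently of gain. As an illustrative sketch (not MERL's implementation; the function name and tolerance `eps` are choices made here), it can be computed as:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    # Project the estimate onto the reference to obtain the optimally scaled target
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))
```

Because the measure is invariant to rescaling of the estimate, a model cannot inflate its score by adjusting output gain; the negative SI-SDR is commonly used directly as a training loss for separation networks.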
IEEE Fellow is the highest grade of membership of the IEEE. It honors members with an outstanding record of technical achievements, contributing importantly to the advancement or application of engineering, science and technology, and bringing significant value to society. Each year, following a rigorous evaluation procedure, the IEEE Fellow Committee recommends a select group of recipients for elevation to IEEE Fellow. Less than 0.1% of voting members are selected annually for this member grade elevation.
AWARD Honorable Mention Award at NeurIPS 2023 Instruction Workshop
Date: December 15, 2023
Awarded to: Lingfeng Sun, Devesh K. Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka and Diego Romeres
MERL Contacts: Radu Corcodel; Chiori Hori; Siddarth Jain; Devesh K. Jha; Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers received an Honorable Mention Award at the Workshop on Instruction Tuning and Instruction Following at the NeurIPS 2023 conference in New Orleans. The workshop focused on instruction tuning and instruction following for large language models (LLMs). MERL researchers presented their work on interactive planning with LLMs for partially observable robotic tasks during the workshop's oral presentation session.
AWARD MERL team wins the Audio-Visual Speech Enhancement (AVSE) 2023 Challenge
Date: December 16, 2023
Awarded to: Zexu Pan, Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux
MERL Contacts: François Germain; Chiori Hori; Sameer Khurana; Jonathan Le Roux; Zexu Pan; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 12 teams in the 2nd COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSE). The team was led by Zexu Pan, and also included Gordon Wichern, Yoshiki Masuyama, Francois Germain, Sameer Khurana, Chiori Hori, and Jonathan Le Roux.
The AVSE challenge aims to design better speech enhancement systems by harnessing the visual aspects of speech (such as lip movements and gestures) in a manner similar to the brain’s multi-modal integration strategies. MERL’s system was a scenario-aware audio-visual TF-GridNet that incorporates the face recording of a target speaker as a conditioning factor and also recognizes whether the predominant interference signal is speech or background noise. In addition to outperforming all competing systems on the objective metrics by a wide margin, MERL’s model achieved the best overall word intelligibility score in a listening test: 84.54%, compared to 57.56% for the baseline and 80.41% for the next best team. Fisher’s least significant difference (LSD) was 2.14%, indicating that the MERL model offered statistically significant speech intelligibility improvements over all other systems.
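For context on how the 2.14% threshold works: Fisher's LSD is the smallest difference between two group means that reaches significance, derived from the error mean square of an ANOVA over the listening-test scores. A rough stdlib-only sketch (the function name and inputs are illustrative, not from the challenge report; the 1.96 critical value approximates the two-sided t statistic at α = 0.05 for large degrees of freedom):

```python
import math

def fishers_lsd(mse_error, n_per_group, t_crit=1.96):
    """Least significant difference between two group means (illustrative).

    mse_error:   error mean square from the ANOVA over listener scores
    n_per_group: number of scores contributing to each system's mean
    t_crit:      two-sided critical value; 1.96 approximates t for large df
    """
    return t_crit * math.sqrt(2.0 * mse_error / n_per_group)

# With the reported LSD of 2.14%, MERL's margin over the runner-up
# (84.54% - 80.41% = 4.13 points) clears the significance threshold.
significant = (84.54 - 80.41) > 2.14
```

Any pair of systems whose mean intelligibility scores differ by more than the LSD are significantly different, which is why MERL's 4.13-point margin over the next best team supports the significance claim.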
See All Awards for Artificial Intelligence
News & Events
TALK [MERL Seminar Series 2024] Melanie Mitchell presents talk titled "The Debate Over 'Understanding' in AI's Large Language Models"
Date & Time: Tuesday, February 13, 2024; 1:00 PM
Speaker: Melanie Mitchell, Santa Fe Institute
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Human-Computer Interaction
Abstract: I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language -- and the physical and social situations language encodes -- in any important sense. I will describe arguments that have been made for and against such understanding, and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
TALK [MERL Seminar Series 2024] Greta Tuckute presents talk titled "Computational models of human auditory and language processing"
Date & Time: Wednesday, January 31, 2024; 12:00 PM
Speaker: Greta Tuckute, MIT
MERL Host: Sameer Khurana
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Abstract: Advances in machine learning have led to powerful models for audio and language, proficient in tasks like speech recognition and fluent language generation. Beyond their immense utility in engineering applications, these models offer valuable tools for cognitive science and neuroscience. In this talk, I will demonstrate how these artificial neural network models can be used to understand how the human brain processes language. The first part of the talk will cover how audio neural networks serve as computational accounts for brain activity in the auditory cortex. The second part will focus on the use of large language models, such as those in the GPT family, to non-invasively control brain activity in the human language system.
See All News & Events for Artificial Intelligence
Research Highlights
Internships
CV2119: Conditional Video Generation
We seek a highly motivated intern to conduct original research in generative models for conditional video generation. We are interested in applications such as video generation from text, images, and diagrams. The successful candidate will collaborate with MERL researchers to design and implement new models, conduct experiments, and prepare results for publication. The candidate should be a PhD student (or postdoc) in computer vision and machine learning with a strong publication record including at least one paper in a top-tier computer vision or machine learning venue such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, AAAI, or TPAMI. Strong programming skills, experience developing and implementing new models in deep learning platforms such as PyTorch, and broad knowledge of machine learning and deep learning methods are expected, including experience in the latest advances in conditional video generation. Start date is flexible; duration should be at least 3 months.
CI2049: Efficient/Green AI
MERL is seeking highly motivated and qualified interns to work on efficient machine learning techniques. The ideal candidates would have significant research experience in federated learning, generative large language models, and efficient/green AI. A mature understanding of modern machine learning methods, proficiency with Python, and familiarity with deep learning frameworks are expected. Candidates at or beyond the middle of their Ph.D. program are encouraged to apply. The expected duration is 3 months, with flexible start dates.
OR2103: Human Robot Collaboration in Assembly Tasks
MERL is looking for a self-motivated and qualified candidate to work on human-robot interaction in collaborative manipulation and assembly scenarios. The ideal candidate is a PhD student with experience and a publication record in one or more of the following areas: 1) control, estimation, and perception for robotic manipulation; 2) task and motion planning; 3) learning-from-demonstration algorithms applied to robotic manipulation; 4) machine learning techniques for modeling and control, as well as regression and classification problems; 5) experience working with robotic systems and familiarity with physics-engine simulators such as MuJoCo, Isaac Gym, and PyBullet. The successful candidate will be expected to develop, in collaboration with MERL employees, state-of-the-art algorithms to solve complex manipulation tasks that involve human-robot collaboration. Proficiency in Python and ROS is required. The expectation is that the research will lead to one or more scientific publications. The expected duration is 3-4 months, with a flexible starting date.
See All Internships for Artificial Intelligence
Recent Publications
- "SPECDIFF-GAN: A SPECTRALLY-SHAPED NOISE DIFFUSION GAN FOR SPEECH AND MUSIC SYNTHESIS", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024. TR2024-013
  @inproceedings{Baoueb2024mar,
    author = {Baoueb, Teysir and Liu, Haocheng and Fontaine, Mathieu and Le Roux, Jonathan and Richard, Gaël},
    title = {SPECDIFF-GAN: A SPECTRALLY-SHAPED NOISE DIFFUSION GAN FOR SPEECH AND MUSIC SYNTHESIS},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-013}
  }
- "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024. TR2024-012
  @inproceedings{Hori2024mar,
    author = {Hori, Chiori and Wang, Pu and Rahman, Mahbub and Vaca-Rubio, Cristian and Khurana, Sameer and Cherian, Anoop and Le Roux, Jonathan},
    title = {Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-012}
  }
- "GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024. TR2024-014
  @inproceedings{Liu2024mar,
    author = {Liu, Haocheng and Baoueb, Teysir and Fontaine, Mathieu and Le Roux, Jonathan and Richard, Gaël},
    title = {GLA-GRAD: A GRIFFIN-LIM EXTENDED WAVEFORM GENERATION DIFFUSION MODEL},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2024,
    month = mar,
    url = {https://www.merl.com/publications/TR2024-014}
  }
- "Why Does Differential Privacy with Large ε Defend Against Practical Membership Inference Attacks?", AAAI Workshop on Privacy-Preserving Artificial Intelligence, February 2024. TR2024-009
  @inproceedings{Lowy2024feb2,
    author = {Lowy, Andrew and Li, Zhuohang and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
    title = {Why Does Differential Privacy with Large ε Defend Against Practical Membership Inference Attacks?},
    booktitle = {AAAI Workshop on Privacy-Preserving Artificial Intelligence},
    year = 2024,
    month = feb,
    url = {https://www.merl.com/publications/TR2024-009}
  }
- "TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings", IEEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2024.3350887, Vol. 32, pp. 1185-1197, February 2024. TR2024-006
  @article{Boeddeker2024feb,
    author = {Boeddeker, Christoph and Subramanian, Aswin Shanmugam and Wichern, Gordon and Haeb-Umbach, Reinhold and Le Roux, Jonathan},
    title = {TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2024,
    volume = 32,
    pages = {1185--1197},
    month = feb,
    doi = {10.1109/TASLP.2024.3350887},
    issn = {2329-9304},
    url = {https://www.merl.com/publications/TR2024-006}
  }
- "Pixel-Grounded Prototypical Part Networks", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2024. TR2024-002
  @inproceedings{Carmichael2024jan,
    author = {Carmichael, Zachariah and Lohit, Suhas and Cherian, Anoop and Jones, Michael J. and Scheirer, Walter},
    title = {Pixel-Grounded Prototypical Part Networks},
    booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = 2024,
    month = jan,
    url = {https://www.merl.com/publications/TR2024-002}
  }
- "CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments", AAAI Conference on Artificial Intelligence, December 2023. TR2023-154
  @inproceedings{Liu2023dec2,
    author = {Liu, Xiulong and Paul, Sudipta and Chatterjee, Moitreya and Cherian, Anoop},
    title = {CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments},
    booktitle = {AAAI Conference on Artificial Intelligence},
    year = 2023,
    month = dec,
    url = {https://www.merl.com/publications/TR2023-154}
  }
- "LoDA: Low-Dimensional Adaptation of Large Language Models", Advances in Neural Information Processing Systems (NeurIPS) workshop, December 2023. TR2023-150
  @inproceedings{Liu2023dec,
    author = {Liu, Jing and Koike-Akino, Toshiaki and Wang, Pu and Brand, Matthew and Wang, Ye and Parsons, Kieran},
    title = {LoDA: Low-Dimensional Adaptation of Large Language Models},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS) workshop},
    year = 2023,
    month = dec,
    url = {https://www.merl.com/publications/TR2023-150}
  }
Videos
Software & Data Downloads
neural-IIR-field
Pixel-Grounded Prototypical Part Networks
DeepBornFNO
Hyperbolic Audio Source Separation
Simple Multimodal Algorithmic Reasoning Task Dataset
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Goal directed RL with Safety Constraints
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Discriminative Subspace Pooling
Partial Group Convolutional Neural Networks