Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, as well as data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, as well as cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
Researchers
Jonathan
Le Roux
Toshiaki
Koike-Akino
Ye
Wang
Gordon
Wichern
Anoop
Cherian
Chiori
Hori
Tim K.
Marks
Michael J.
Jones
Kieran
Parsons
Daniel N.
Nikovski
Jing
Liu
Devesh K.
Jha
Suhas
Lohit
Matthew
Brand
Pu (Perry)
Wang
Philip V.
Orlik
Kuan-Chuan
Peng
Moitreya
Chatterjee
Diego
Romeres
Yoshiki
Masuyama
Petros T.
Boufounos
Siddarth
Jain
Hassan
Mansour
William S.
Yerazunis
Radu
Corcodel
Pedro
Miraldo
Arvind
Raghunathan
Jianlin
Guo
Hongbo
Sun
Yebin
Wang
Ankush
Chakrabarty
Chungwei
Lin
Yanting
Ma
Bingnan
Wang
Stefano
Di Cairano
Saviz
Mowlavi
Anthony
Vetro
Jinyun
Zhang
Vedang M.
Deshpande
Christopher R.
Laughman
Dehong
Liu
Alexander
Schperberg
Wataru
Tsujita
Abraham P.
Vinod
Kenji
Inomata
-
Awards
-
AWARD MERL Wins Awards at NeurIPS LLM Privacy Challenge
Date: December 15, 2024
Awarded to: Jing Liu, Ye Wang, Toshiaki Koike-Akino, Tsunato Nakai, Kento Oonishi, Takuya Higashi
MERL Contacts: Toshiaki Koike-Akino; Jing Liu; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Information Security
Brief: The Mitsubishi Electric Privacy Enhancing Technologies (MEL-PETs) team, a collaboration of MERL and Mitsubishi Electric researchers, won awards at the NeurIPS 2024 Large Language Model (LLM) Privacy Challenge. In the Blue Team track of the challenge, we won the 3rd Place Award, and in the Red Team track, we won the Special Award for Practical Attack.
-
AWARD University of Padua and MERL team wins the AI Olympics with RealAIGym competition at IROS24
Date: October 17, 2024
Awarded to: Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Robotics
Brief: The team, composed of the control group at the University of Padua and MERL's Optimization and Robotics team, ranked 1st among the 4 finalist teams that reached the 2nd AI Olympics with RealAIGym competition at IROS 24, which focused on control of under-actuated robots. The team consisted of Niccolò Turcato, Alberto Dalla Libera, Giulio Giacomuzzo, Ruggero Carli and Diego Romeres. The competition was organized by the German Research Center for Artificial Intelligence (DFKI), the Technical University of Darmstadt, and Chalmers University of Technology.
The competition and award ceremony were hosted by the IEEE International Conference on Intelligent Robots and Systems (IROS) on October 17, 2024 in Abu Dhabi, UAE. Diego Romeres presented the team's method, based on a model-based reinforcement learning algorithm called MC-PILCO.
-
AWARD MERL team wins the Listener Acoustic Personalisation (LAP) 2024 Challenge
Date: August 29, 2024
Awarded to: Yoshiki Masuyama, Gordon Wichern, Francois G. Germain, Christopher Ick, and Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Gordon Wichern; Yoshiki Masuyama
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: MERL's Speech & Audio team ranked 1st out of 7 teams in Task 2 of the 1st SONICOM Listener Acoustic Personalisation (LAP) Challenge, which focused on "Spatial upsampling for obtaining a high-spatial-resolution HRTF from a very low number of directions". The team was led by Yoshiki Masuyama, and also included Gordon Wichern, Francois Germain, MERL intern Christopher Ick, and Jonathan Le Roux.
The LAP Challenge workshop and award ceremony were hosted by the 32nd European Signal Processing Conference (EUSIPCO 24) on August 29, 2024 in Lyon, France. Yoshiki Masuyama presented the team's method, "Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization", and received the award from Prof. Michele Geronazzo (University of Padova, IT, and Imperial College London, UK), Chair of the Challenge's Organizing Committee.
The LAP challenge aims to explore challenges in the field of personalized spatial audio, with the first edition focusing on the spatial upsampling and interpolation of head-related transfer functions (HRTFs). HRTFs with dense spatial grids are required for immersive audio experiences, but their recording is time-consuming. Although HRTF spatial upsampling has recently shown remarkable progress with approaches involving neural fields, HRTF estimation accuracy remains limited when upsampling from only a few measured directions, e.g., 3 or 5 measurements. The MERL team tackled this problem by proposing a retrieval-augmented neural field (RANF). RANF retrieves a subject whose HRTFs are close to those of the target subject at the measured directions from a library of subjects. The HRTF of the retrieved subject at the target direction is fed into the neural field in addition to the desired sound source direction. The team also developed a neural network architecture that can handle an arbitrary number of retrieved subjects, inspired by a multi-channel processing technique called transform-average-concatenate.
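As a rough illustration of the retrieval step described above, the NumPy sketch below picks the library subject whose HRTFs best match the target at the measured directions, then appends that subject's HRTF at the target direction to the direction input. The function names, array shapes, and the mean-squared-error distance are assumptions made for this illustration and are not taken from the team's implementation; the neural field itself is omitted.

```python
import numpy as np

def retrieve_subjects(target_meas, library_meas, k=1):
    """Return indices of the k library subjects closest to the target.

    target_meas:  (D, F) target HRTF features at D measured directions.
    library_meas: (S, D, F) the same D directions for S library subjects.
    The mean-squared-error distance is an assumption of this sketch.
    """
    diff = library_meas - target_meas[None, :, :]
    dist = np.mean(diff ** 2, axis=(1, 2))  # one distance per library subject
    return np.argsort(dist)[:k]

def build_field_input(direction, library_at_target_dir, retrieved):
    """Concatenate the source direction with the retrieved subjects' HRTFs.

    direction:             (2,) hypothetical azimuth/elevation encoding.
    library_at_target_dir: (S, F) library HRTF features at the target direction.
    """
    retrieved_hrtfs = library_at_target_dir[retrieved]  # shape (k, F)
    return np.concatenate([direction, retrieved_hrtfs.ravel()])

# Toy data: 4 library subjects, 3 measured directions, 8 frequency bins.
rng = np.random.default_rng(0)
library_meas = rng.normal(size=(4, 3, 8))
# The target is a slightly perturbed copy of library subject 2.
target_meas = library_meas[2] + 0.01 * rng.normal(size=(3, 8))

retrieved = retrieve_subjects(target_meas, library_meas, k=1)
library_at_target_dir = rng.normal(size=(4, 8))
field_input = build_field_input(np.array([0.5, 0.1]), library_at_target_dir, retrieved)
```

With `k > 1`, the concatenated input grows with the number of retrieved subjects; the brief notes that the team instead designed an architecture that handles an arbitrary number of retrieved subjects, which this sketch does not attempt.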
-
News & Events
-
NEWS Diego Romeres Delivers Invited Talks at Fraunhofer Italia and the University of Padua
Date: July 16, 2025 - July 18, 2025
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Control, Machine Learning, Optimization, Robotics, Human-Computer Interaction
Brief: MERL researcher Diego Romeres was invited to present MERL's latest research at two institutions in Italy this July, focusing on human-robot collaboration and LLM-driven assembly systems.
On July 16th, Dr. Romeres delivered a talk titled “Human-Robot Collaborative Assembly” at Fraunhofer Italia – Innovation Engineering Center (EIC) in Bolzano. His presentation showcased research on human-robot collaboration for efficient and flexible assembly processes. Fraunhofer Italia EIC is a non-profit research institute focused on enabling digital and sustainable transformation through applied innovation in close collaboration with both public and private sectors.
Two days later, on July 18th, Dr. Romeres was hosted by the University of Padua, one of Europe’s oldest and most renowned universities. His invited lecture, “Robot Assembly through Human Collaboration & Large Language Models”, explored how artificial intelligence can enhance human-robot synergy in complex assembly tasks.
-
NEWS Toshiaki Koike-Akino to give a tutorial talk at ISIT 2025 Quantum Hackathon
Date: June 22, 2025
Where: IEEE International Symposium on Information Theory (ISIT)
MERL Contact: Toshiaki Koike-Akino
Research Areas: Artificial Intelligence, Communications, Data Analytics, Machine Learning, Optimization, Signal Processing, Human-Computer Interaction, Information Security
Brief: Toshiaki Koike-Akino is invited to present a tutorial talk at the IEEE ISIT 2025 Quantum Hackathon, to be held in Ann Arbor, Michigan, USA. The talk, entitled "Emerging Quantum AI Technology", will discuss the recent trends, challenges, and applications of quantum artificial intelligence (QAI) technologies.
The ISIT 2025 Quantum Hackathon invites participants to explore the intersection of quantum computing and information theory. Participants will work with quantum simulators, available quantum hardware, and state-of-the-art development kits to create innovative solutions that connect quantum advancements with challenges in communication and signal processing.
The IEEE International Symposium on Information Theory (ISIT) is the flagship conference of the IEEE Information Theory Society. The symposium centers on presentations in all areas of information theory, including source and channel coding, communication theory and systems, cryptography and security, detection and estimation, networks, pattern recognition and learning, statistics, stochastic processes and complexity, and signal processing.
-
Research Highlights
-
PS-NeuS: A Probability-guided Sampler for Neural Implicit Surface Rendering
Quantum AI Technology
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-Aware Spatio-Temporal Sampling
Steered Diffusion
Sustainable AI
Robust Machine Learning
mmWave Beam-SNR Fingerprinting (mmBSF)
Video Anomaly Detection
Biosignal Processing for Human-Machine Interaction
Task-aware Unified Source Separation - Audio Examples
-
-
Internships
-
CA0157: Internship - Spatio-temporal monitoring using mobile robots
MERL is seeking a highly motivated intern to collaborate on developing a framework for spatio-temporal monitoring using heterogeneous mobile robots. The work will involve multi-domain research, including multi-agent planning and control, optimization, adaptive and learning-based control, and computer vision. The methods will be implemented and evaluated using physical experiments on robotic platforms (e.g., Crazyflies, Turtlebots). The results of the internship are expected to be published in top-tier conferences and/or journals. The internship will take place during Fall/Winter 2025 (exact dates are flexible) with an expected duration of 4-6 months.
Please use your cover letter to explain how you meet the following requirements, preferably with links to papers, code repositories, etc., indicating your proficiency.
Required Specific Experience
- Current enrollment in a PhD program in Mechanical Engineering, Electrical Engineering, Computer Science, or a related field, with a focus on Robotics and/or Control Systems
- Experience in some/all of these topics: multi-agent planning and control, optimization, adaptive and learning-based control, and computer vision
- Experience with ROS2 and validation of algorithms on robotic platforms
- Strong programming skills in Python and/or C/C++
Desired Specific Experience
- Experience with Crazyflie quadrotors and the Crazyswarm2 library
- Experience with cvxpy and/or gurobipy
- Experience in convex optimization and model predictive control
- Experience with computer vision
-
CI0169: Internship - Robotic AI Agent
If you are passionate about pushing the boundaries of embodied AI, join our research team as an intern and contribute to the development of generalist AI agents for humanoid robots. This is an opportunity to work on impactful projects aimed at publication in top-tier AI and robotics venues.
What We’re Looking For
We’re seeking highly motivated individuals with:
- Advanced research experience in robotic AI, edge AI, and agentic AI systems.
- Hands-on expertise in Large Language Models (LLMs), Vision-Language-Action (VLA) models and Foundation Models
- Strong proficiency with Python, PyTorch, deep learning, and robotic agent frameworks
Internship Details
- Duration: ~3 months
- Start Date: Flexible
- Goal: Publish research at leading AI/robotics conferences and journals
If you're excited about shaping the future of humanoid robotics and AI agents, we’d love to hear from you!
-
OR0171: Internship - Foundation Models for Robotic Manipulation
MERL is seeking a highly motivated and qualified intern to conduct research on applying foundation models to robotic manipulation. The focus will be on leveraging large-scale pretrained models (e.g., vision-language models, multimodal transformers, diffusion policies) to enable generalist manipulation capabilities across diverse objects, tasks, and embodiments, including humanoids. Potential research topics include few-shot policy learning, multimodal grounding of multiple sensor modalities to robot actions, and adapting foundation models for precise control and high success rates.
The ideal candidate will be a senior Ph.D. student with a strong background in machine learning for robotics, particularly in areas such as foundation models, imitation learning, reinforcement learning, and multimodal perception. Knowledge of large-scale Vision-Language-Action (VLA) and multimodal foundation models is expected. The internship will involve algorithm design, model fine-tuning, simulation experiments, and deployment on physical robot platforms equipped with cameras, tactile sensors, and force/torque sensors. The successful candidate will collaborate closely with MERL researchers, with the expectation of publishing in top-tier robotics or AI conferences/journals. Interested candidates should apply with an updated CV and relevant publications.
Required Specific Experience
- Strong background in machine learning for robotics, particularly foundation models (e.g., pi_0, OpenVLA, RT-X, etc.) and imitation learning
- Experience with simulation environments such as MuJoCo, Isaac Gym, or RLBench
- Experience with physical robot platforms and sensors (vision, tactile, force/torque)
- Proficiency in Python, PyTorch, and modern deep learning frameworks
- Strong publication record in robotics, machine learning, or AI venues
Internship Details
- Duration: ~3 months
- Start Date: Fall 2025 (flexible based on mutual agreement)
- Goal: Publish research at leading robotics/AI conferences and journals
-
-
Recent Publications
- "Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts", IEEE International Conference on Computer Vision (ICCV), October 2025. TR2025-124
  @inproceedings{Yang2025oct,
    author = {Yang, Chiao-An and Peng, Kuan-Chuan and Yeh, Raymond},
    title = {{Toward Long-Tailed Online Anomaly Detection through Class-Agnostic Concepts}},
    booktitle = {IEEE International Conference on Computer Vision (ICCV)},
    year = 2025,
    month = oct,
    url = {https://www.merl.com/publications/TR2025-124}
  }
- "LoDA: Low-Dimensional Adaptation of Large Language Models" in Springer Book, September 2025. TR2025-130
  @incollection{Liu2025sep,
    author = {Liu, Jing and Koike-Akino, Toshiaki and Wang, Pu and Brand, Matthew and Parsons, Kieran and Wang, Ye},
    title = {{LoDA: Low-Dimensional Adaptation of Large Language Models}},
    booktitle = {Springer Book},
    year = 2025,
    month = sep,
    url = {https://www.merl.com/publications/TR2025-130}
  }
- "HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement", Interspeech, August 2025. TR2025-122
  @inproceedings{Hussein2025aug,
    author = {Hussein, Amir and Khurana, Sameer and Wichern, Gordon and Germain, François G and {Le Roux}, Jonathan},
    title = {{HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement}},
    booktitle = {Interspeech},
    year = 2025,
    month = aug,
    url = {https://www.merl.com/publications/TR2025-122}
  }
- "Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses", Interspeech, DOI: 10.21437/Interspeech.2025-1912, August 2025, pp. 933-937. TR2025-120
  @inproceedings{Ick2025aug,
    author = {Ick, Christopher and Wichern, Gordon and Masuyama, Yoshiki and Germain, François G and {Le Roux}, Jonathan},
    title = {{Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses}},
    booktitle = {Interspeech},
    year = 2025,
    pages = {933--937},
    month = aug,
    doi = {10.21437/Interspeech.2025-1912},
    url = {https://www.merl.com/publications/TR2025-120}
  }
- "Factorized RVQ-GAN For Disentangled Speech Tokenization", Interspeech, August 2025. TR2025-123
  @inproceedings{Khurana2025aug,
    author = {Khurana, Sameer and Klement, Dominik and Laurent, Antoine and Bobos, Dominik and Novosad, Juraj and Gazdik, Peter and Zhang, Ellen and Huang, Zilli and Hussein, Amir and Marxer, Ricard and Masuyama, Yoshiki and Aihara, Ryo and Hori, Chiori and Germain, François G and Wichern, Gordon and {Le Roux}, Jonathan},
    title = {{Factorized RVQ-GAN For Disentangled Speech Tokenization}},
    booktitle = {Interspeech},
    year = 2025,
    month = aug,
    url = {https://www.merl.com/publications/TR2025-123}
  }
- "Investigating Continuous Autoregressive Generative Speech Enhancement", Interspeech, August 2025. TR2025-119
  @inproceedings{Yang2025aug,
    author = {Yang, Haici and Wichern, Gordon and Aihara, Ryo and Masuyama, Yoshiki and Khurana, Sameer and Germain, François G and {Le Roux}, Jonathan},
    title = {{Investigating Continuous Autoregressive Generative Speech Enhancement}},
    booktitle = {Interspeech},
    year = 2025,
    month = aug,
    url = {https://www.merl.com/publications/TR2025-119}
  }
- "Audio Signal Processing in the Artificial Intelligence Era: Challenges and Directions", Journal of the Audio Engineering Society, August 2025. TR2025-116
  @article{Steinmetz2025aug,
    author = {Steinmetz, Christian and Uhle, Christian and Everardo, Flavio and Mitcheltree, Christopher and McElveen, J. Keith and Jot, Jean-Marc and Wichern, Gordon},
    title = {{Audio Signal Processing in the Artificial Intelligence Era: Challenges and Directions}},
    journal = {Journal of the Audio Engineering Society},
    year = 2025,
    month = aug,
    url = {https://www.merl.com/publications/TR2025-116}
  }
- "Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents", ACL 2025 workshop on Generation, Evaluation & Metrics (GEM), July 2025. TR2025-114
  @inproceedings{Lewis2025jul2,
    author = {Lewis, Ashley and White, Michael and Liu, Jing and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
    title = {{Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents}},
    booktitle = {ACL 2025 workshop on Generation, Evaluation \& Metrics (GEM)},
    year = 2025,
    month = jul,
    url = {https://www.merl.com/publications/TR2025-114}
  }
-
Software & Data Downloads
-
MEL-PETs Joint-Context Attack for LLM Privacy Challenge
Subject- and Dataset-Aware Neural Field for HRTF Modeling
Local Density-Based Anomaly Score Normalization for Domain Generalization
MEL-PETs Defense for LLM Privacy Challenge
Learned Born Operator for Reflection Tomographic Imaging
Long-Tailed Online Anomaly Detection dataset
Group Representation Networks
Task-Aware Unified Source Separation
Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
Self-Monitored Inference-Time INtervention for Generative Music Transformers
Transformer-based model with LOcal-modeling by COnvolution
Sound Event Bounding Boxes
Enhanced Reverberation as Supervision
Gear Extensions of Neural Radiance Fields
Long-Tailed Anomaly Detection Dataset
Neural IIR Filter Field for HRTF Upsampling and Personalization
Target-Speaker SEParation
Pixel-Grounded Prototypical Part Networks
Steered Diffusion
Hyperbolic Audio Source Separation
Simple Multimodal Algorithmic Reasoning Task Dataset
Partial Group Convolutional Neural Networks
SOurce-free Cross-modal KnowledgE Transfer
Audio-Visual-Language Embodied Navigation in 3D Environments
Nonparametric Score Estimators
3D MOrphable STyleGAN
Instance Segmentation GAN
Audio Visual Scene-Graph Segmentor
Generalized One-class Discriminative Subspaces
Goal directed RL with Safety Constraints
Hierarchical Musical Instrument Separation
Generating Visual Dynamics from Sound and Context
Adversarially-Contrastive Optimal Transport
Online Feature Extractor Network
MotionNet
FoldingNet++
Quasi-Newton Trust Region Policy Optimization
Landmarks’ Location, Uncertainty, and Visibility Likelihood
Robust Iterative Data Estimation
Gradient-based Nikaido-Isoda
Discriminative Subspace Pooling
Open Vocabulary Attribute Detection Dataset
-