-
CI0197: Internship - Embodied AI & Humanoid Robotics
If you are passionate about pushing the boundaries of embodied AI, join our cutting-edge research team as an intern and contribute to the development of generalist AI agents for humanoid robots. This is a unique opportunity to work on impactful projects aimed at publication in top-tier AI and robotics venues.
What We’re Looking For
We’re seeking highly motivated individuals with:
- Advanced research experience in robotic AI, edge AI, and agentic AI systems.
- Hands-on expertise in Vision-Language-Action (VLA) models and foundation models.
- Strong proficiency with Python, PyTorch/JAX, deep learning, and robotic agent frameworks.
Internship Details
- Duration: ~3 months
- Start Date: Flexible
- Goal: Publish research at leading AI/robotics conferences and journals
If you're excited about shaping the future of humanoid robotics and AI agents, we’d love to hear from you!
The pay range for this internship position will be $6-8K per month.
- Research Areas: Applied Physics, Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics, Signal Processing, Speech & Audio, Optimization
- Host: Toshi Koike-Akino
-
CV0267: Internship - Audio-Visual Learning for Spatial Audio Processing
MERL is looking for a highly motivated intern to work on an original research project on audio-visual learning, with a focus on spatial audio and on training models using limited labeled data. A strong background in computer vision, audio processing, and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, Room Impulse Response (RIR) estimation, and large (vision-) language models is a plus. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks such as PyTorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publications. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI.
- Knowledge of the latest self-supervised and weakly-supervised learning techniques.
- Experience with large (vision-) language models and spatial audio processing techniques.
- Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or TensorFlow.
The pay range for this internship position will be $6-8K per month.
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
- Host: Moitreya Chatterjee
-
SA0191: Internship - Human-Robot Interaction Based on Multimodal Scene Understanding
We are looking for a graduate student interested in advancing the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring with a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
Required Specific Experience
- Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
The pay range for this internship position will be $6-8K per month.
- Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
-
SA0186: Internship - Neural Spatial Audio Processing and Understanding
We are seeking graduate students interested in advancing the fields of spatial audio, room acoustics, physics-informed machine learning, and scene understanding (e.g., sound source localization and spatial-aware captioning). The interns will work closely with MERL researchers to develop novel algorithms, record data, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: microphone array processing, physics-informed machine learning, and 3D modeling in computer vision. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
The pay range for this internship position will be $6-8K per month.
- Research Areas: Speech & Audio, Machine Learning, Signal Processing
- Host: Yoshiki Masuyama
-
SA0187: Internship - Sound event and anomaly detection
We are seeking graduate students interested in helping advance the fields of machine sound source separation, sound event detection/localization, anomaly detection, and physics-informed deep learning for machine sounds in extremely noisy conditions. The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, integrate audio signals with other sensors (electrical, vision, vibration, etc.), and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, audio source separation (music, speech, or general sounds), microphone array processing, sound event localization and detection, anomaly detection, and physics-informed machine learning.
Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
The pay range for this internship position will be $6-8K per month.
- Research Areas: Speech & Audio, Signal Processing, Machine Learning, Artificial Intelligence
- Host: Gordon Wichern
-
SA0188: Internship - Audio separation, generation, and analysis
We are seeking graduate students interested in helping advance the fields of generative audio, source separation, speech enhancement, and robust ASR in challenging multi-source and far-field scenarios. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, probabilistic modeling, and deep generative modeling.
Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
The pay range for this internship position will be $6-8K per month.
- Research Areas: Speech & Audio, Machine Learning, Artificial Intelligence
- Host: Jonathan Le Roux