Internship Openings

SA0191: Internship - Human-Robot Interaction Based on Multimodal Scene Understanding
- We are looking for a graduate student interested in advancing the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring with a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with a flexible start date (not just Spring/Summer but throughout 2026) and duration (typically 3-6 months).
  Required Specific Experience
  - Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
  The pay range for this internship position will be 6-8K per month.
- Research Areas: Artificial Intelligence, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
SA0302: Internship - Audio Processing for Moving Sounds
- We are seeking graduate students interested in advancing the application of sophisticated audio processing techniques (e.g., source separation, localization, anomalous sound detection) to moving sound sources (e.g., vehicles). The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, source separation, physics-informed machine learning, outlier detection, and unsupervised learning.
  The pay range for this internship position will be 6-8K per month.
- Research Areas: Artificial Intelligence, Speech & Audio, Machine Learning
- Host: Gordon Wichern
CV0075: Internship - Multimodal Embodied AI
- MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate is a Ph.D. student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience with animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with computer vision researchers at MERL to develop algorithms and prepare manuscripts for scientific publication.
  Required Specific Experience
  - Experience in designing 3D interactive scenes
  - Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus).
  - Experience training large language models on multimodal data
  - Experience with training reinforcement learning algorithms
  - Strong foundations in machine learning and programming
  - Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Speech & Audio, Robotics, Machine Learning
- Host: Anoop Cherian