-
SA0045: Internship - Universal Audio Compression and Generation
We are seeking graduate students interested in helping advance the fields of universal audio compression and generation. We aim to build a single generative model that can perform multiple audio generation tasks conditioned on multimodal context. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are Ph.D. students with experience in some of the following: deep generative modeling, large language models, neural audio codecs. The internship typically lasts 3-6 months.
- Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
- Host: Sameer Khurana
- Apply Now
-
SA0044: Internship - Multimodal Scene Understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, focusing on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
Required Specific Experience
- Experience with ROS2, C/C++, Python, and deep learning frameworks such as PyTorch is essential.
- Research Areas: Artificial Intelligence, Computer Vision, Control, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
- Apply Now
-
SA0040: Internship - Sound event and anomaly detection
We are seeking graduate students interested in helping advance the fields of sound event detection/localization, anomaly detection, and physics informed deep learning for machine sounds. The interns will collaborate with MERL researchers to derive and implement novel algorithms, record data, conduct experiments, integrate audio signals with other sensors (electrical, vision, vibration, etc.), and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, physics informed machine learning, outlier detection, and unsupervised learning.
Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2025) and duration (typically 3-6 months).
- Research Areas: Artificial Intelligence, Speech & Audio, Machine Learning, Data Analytics
- Host: Gordon Wichern
- Apply Now
-
SA0041: Internship - Audio separation, generation, and analysis
We are seeking graduate students interested in helping advance the fields of generative audio, source separation, speech enhancement, spatial audio, and robust ASR in challenging multi-source and far-field scenarios. The interns will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work.
The ideal candidates are senior Ph.D. students with experience in some of the following: audio signal processing, microphone array processing, spatial audio reproduction, probabilistic modeling, deep generative modeling, and physics informed machine learning techniques (e.g., neural fields, PINNs, sound field and reverberation modeling).
Multiple positions are available with flexible start dates (not just Spring/Summer but throughout 2025) and duration (typically 3-6 months).
- Research Areas: Speech & Audio, Machine Learning, Artificial Intelligence
- Host: Jonathan Le Roux
- Apply Now
-
CV0078: Internship - Audio-Visual Learning with Limited Labeled Data
MERL is looking for a highly motivated intern to work on an original research project on multimodal learning, such as audio-visual learning, using limited labeled data. A strong background in computer vision and deep learning is required. Experience in audio-visual (multimodal) learning, weakly/self-supervised learning, continual learning, and large (vision-) language models is an added plus and will be valued. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI, and to possess solid programming skills in Python and popular deep learning frameworks such as PyTorch. The intern will collaborate with MERL researchers to develop and implement novel algorithms and prepare manuscripts for scientific publications. Successful applicants are typically graduate students on a Ph.D. track or recent Ph.D. graduates. Duration and start date are flexible, but the internship is expected to last for at least 3 months.
Required Specific Experience
- Prior publications in top-tier computer vision and/or machine learning venues, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, or AAAI.
- Knowledge of the latest self-supervised and weakly-supervised learning techniques.
- Experience with Large (Vision-) Language Models.
- Proficiency in scripting languages, such as Python, and deep learning frameworks such as PyTorch or TensorFlow.
- Research Areas: Computer Vision, Machine Learning, Speech & Audio, Artificial Intelligence
- Host: Moitreya Chatterjee
- Apply Now
-
CV0075: Internship - Multimodal Embodied AI
MERL is looking for a self-motivated intern to work on problems at the intersection of multimodal large language models and embodied AI in dynamic indoor environments. The ideal candidate would be a Ph.D. student with a strong background in machine learning and computer vision, as demonstrated by top-tier publications. The candidate must have prior experience in designing synthetic scenes (e.g., 3D games) using popular graphics software, embodied AI, large language models, reinforcement learning, and the use of simulators such as Habitat/SoundSpaces. Hands-on experience with animated 3D human shape models (e.g., SMPL and variants) is desired. The intern is expected to collaborate with researchers in computer vision at MERL to develop algorithms and prepare manuscripts for scientific publications.
Required Specific Experience
- Experience in designing 3D interactive scenes
- Experience with vision-based embodied AI using simulators (implementation on real robotic hardware would be a plus).
- Experience training large language models on multimodal data
- Experience with training reinforcement learning algorithms
- Strong foundations in machine learning and programming
- Strong track record of publications in top-tier computer vision and machine learning venues (such as CVPR, NeurIPS, etc.).
- Research Areas: Artificial Intelligence, Computer Vision, Speech & Audio, Robotics, Machine Learning
- Host: Anoop Cherian
- Apply Now