-
CV1552: Multimodal Reasoning
MERL is looking for a self-motivated intern to work on problems at the intersection of video understanding, audio processing, and language models. The ideal candidate would be a PhD student with a strong mathematical background in machine learning and computer vision. The candidate must have prior experience in using deep learning methods for image and video representations (such as scene graphs) and deep audio analysis (such as source separation and localization). Proficiency in Python and flexibility in using different deep learning frameworks (especially PyTorch) are expected. The intern is expected to collaborate with the computer vision and speech teams at MERL to develop algorithms and prepare manuscripts for scientific publications. The internship lasts 3 months, with a flexible start date. An onsite internship at MERL is preferred, but it may be done remotely from where you live if the COVID pandemic makes that necessary.
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
- Host: Anoop Cherian
-
SA1469: Audio source separation and sound event detection
We are seeking multiple graduate students interested in helping advance the fields of source separation, speech enhancement, and sound event detection/localization in challenging multi-source and far-field scenarios. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. The ideal candidate would be a senior Ph.D. student with experience in audio signal processing, microphone array processing, probabilistic modeling, and deep learning techniques requiring minimal supervision (e.g., unsupervised, weakly-supervised, self-supervised, or few-shot learning). The expected duration of the internship is 3-6 months, and the start date is flexible.
- Research Areas: Machine Learning, Speech & Audio
- Host: Gordon Wichern