We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, with a focus on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern''s doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start date (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
- Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Robotics, Speech & Audio
- Host: Chiori Hori
- Apply Now