(Learn more about the MERL Seminar Series.)
Date & Time:
Tuesday, September 6, 2022; 12:00 PM EDT
Human sensory perception of the physical world is rich and multimodal and can flexibly integrate input from all five sensory modalities -- vision, touch, smell, hearing, and taste. In AI, however, attention has primarily focused on visual perception. In this talk, I will introduce my efforts to connect vision with sound, which allow machine perception systems to see objects and infer physics from multi-sensory data. In the first part of my talk, I will introduce a self-supervised approach that learns to parse images and separate sound sources by watching and listening to unlabeled videos, without requiring additional manual supervision. In the second part of my talk, I will show how we may further infer the underlying causal structure in 3D environments through visual and auditory observations. This enables agents to seek the source of a repeating environmental sound (e.g., an alarm) or to identify what object has fallen, and where, from an intermittent impact sound.
UMass Amherst & MIT-IBM Watson AI Lab
Chuang Gan is an assistant professor at UMass Amherst. Before that, he was a researcher at MIT and IBM, working with Prof. Antonio Torralba and Prof. Josh Tenenbaum. He completed his Ph.D. with the highest honor at Tsinghua University, supervised by Prof. Andrew Chi-Chih Yao. His research interests sit at the intersection of computer vision, machine learning, and cognitive science. His research has been recognized by a Microsoft Fellowship, a Baidu Fellowship, and media coverage from the BBC, WIRED, Forbes, and MIT Technology Review. He has served as an area chair of CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS, and ACL, and as an associate editor of IEEE Transactions on Image Processing.