Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover deep learning and artificial intelligence for object and action detection, classification, and scene understanding; robotic vision and object manipulation; 3D processing and computational geometry; and simulation of physical systems to enhance machine learning systems.
Researchers
Tim K. Marks
Anoop Cherian
Michael J. Jones
Chiori Hori
Alan Sullivan
Matthew E. Brand
Hassan Mansour
Jonathan Le Roux
Ronald N. Perry
Jay Thornton
Anthony Vetro
Radu Corcodel
Devesh K. Jha
Suhas Lohit
Petros T. Boufounos
Dehong Liu
Daniel N. Nikovski
Diego Romeres
Ye Wang
Siddarth Jain
Arvind Raghunathan
Gordon Wichern
William S. Yerazunis
Toshiaki Koike-Akino
Philip V. Orlik
Kuan-Chuan Peng
Huifang Sun
Yebin Wang
Pedro Miraldo
Awards
AWARD Best Paper - Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
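As a rough, hypothetical illustration of the patch-prior idea (not the authors' released code), the PyTorch sketch below recovers an image from random linear measurements by optimizing one latent code per patch; the untrained stand-in generator, image sizes, and measurement count are placeholder assumptions.

import torch

torch.manual_seed(0)
P, GRID, LATENT = 8, 4, 16      # patch size, patches per side, latent dimension
IMG = P * GRID                  # assembled image is 32x32
M = 256                         # number of compressive measurements (< 1024 pixels)

# Stand-in patch generator (untrained); the paper uses a deep generative
# network trained on natural image patches.
patch_gen = torch.nn.Sequential(
    torch.nn.Linear(LATENT, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, P * P), torch.nn.Sigmoid(),
)

A = torch.randn(M, IMG * IMG) / M ** 0.5   # random sensing matrix
x_true = torch.rand(IMG * IMG)             # toy ground-truth image (flattened)
y = A @ x_true                             # compressive measurements

# Recover the image by optimizing one latent code per patch so that the image
# assembled from generated patches agrees with the measurements.
z = torch.zeros(GRID * GRID, LATENT, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(200):
    patches = patch_gen(z).view(GRID, GRID, P, P)
    img = patches.permute(0, 2, 1, 3).reshape(IMG, IMG)   # tile patches into an image
    loss = torch.sum((A @ img.reshape(-1) - y) ** 2)
    opt.zero_grad()
    loss.backward()
    opt.step()
print("measurement residual:", loss.item())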
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
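As a minimal sketch of the loss at the core of this kind of approach (assumed PyTorch code, not MERL's implementation), the snippet below computes a Gaussian log-likelihood loss for 2D landmark predictions, with each landmark's 2x2 covariance parameterized by its Cholesky factor.

import torch

def gaussian_nll(mu, chol_params, target):
    # mu:          (B, L, 2) predicted landmark means
    # chol_params: (B, L, 3) parameters [log l11, log l22, l21] of the
    #              lower-triangular Cholesky factor of each 2x2 covariance
    # target:      (B, L, 2) ground-truth landmark locations
    l11 = chol_params[..., 0].exp()
    l22 = chol_params[..., 1].exp()
    l21 = chol_params[..., 2]
    d = target - mu
    u1 = d[..., 0] / l11                      # solve L u = d by forward substitution
    u2 = (d[..., 1] - l21 * u1) / l22
    mahalanobis = u1 ** 2 + u2 ** 2           # d^T Sigma^{-1} d
    log_det = 2 * (chol_params[..., 0] + chol_params[..., 1])
    return 0.5 * (mahalanobis + log_det).mean()

# Toy usage with random tensors standing in for a landmark head's output.
B, L = 4, 68
mu = torch.randn(B, L, 2, requires_grad=True)
chol = torch.zeros(B, L, 3, requires_grad=True)
gt = torch.randn(B, L, 2)
loss = gaussian_nll(mu, chol, gt)
loss.backward()
print(float(loss))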
AWARD R&D100 award for Deep Learning-based Water Detector
Date: November 16, 2018
Awarded to: Ziming Zhang, Alan Sullivan, Hideaki Maehara, Kenji Taira, Kazuo Sugimoto
MERL Contact: Alan Sullivan
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: Researchers and developers from MERL, Mitsubishi Electric and Mitsubishi Electric Engineering (MEE) have been recognized with an R&D100 award for the development of a deep learning-based water detector. Automatic detection of water levels in rivers and streams is critical for early warning of flash flooding. Existing systems require a height gauge to be placed in the river or stream, which is costly and sometimes impossible. The new deep learning-based water detector uses only images from a video camera, along with 3D measurements of the river valley, to determine water levels and warn of potential flooding. The system is robust to lighting and weather conditions, working well at night as well as in fog or rain. Deep learning is a relatively new technique in which neural networks are trained on real data to perform human-level recognition tasks. This work is powered by Mitsubishi Electric's Maisart AI technology.
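Purely as a conceptual sketch (the function, mask, elevation map, and percentile heuristic are assumptions, not the awarded product's algorithm), the Python snippet below turns a water segmentation mask and a per-pixel elevation map derived from a 3D survey of the river valley into a water-level estimate.

import numpy as np

def estimate_water_level(water_mask, elevation_map, percentile=95):
    # water_mask:    (H, W) boolean mask of pixels a segmentation network labels as water
    # elevation_map: (H, W) elevation in meters per pixel, precomputed from the
    #                3D survey of the river valley and the camera calibration
    levels = elevation_map[water_mask]
    if levels.size == 0:
        return None                      # no water detected in this frame
    # The waterline lies near the highest water pixels; a high percentile is
    # more robust to segmentation noise than the plain maximum.
    return float(np.percentile(levels, percentile))

# Toy usage with synthetic data.
H, W = 120, 160
elev = np.tile(np.linspace(12.0, 2.0, H)[:, None], (1, W))   # valley wall, meters
mask = elev < 5.0                                            # pretend water below 5 m
print(estimate_water_level(mask, elev))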
See All Awards for Computer Vision
News & Events
NEWS MERL presenting 8 papers at ICASSP 2022
Date: May 22, 2022 - May 27, 2022
Where: Singapore
MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
Brief: MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
NEWS MERL Scientists Presenting 5 Papers at IEEE International Conference on Communications (ICC) 2022
Date: May 16, 2022 - May 20, 2022
Where: Seoul, Korea
MERL Contacts: Jianlin Guo; Kyeong Jin (K.J.) Kim; Toshiaki Koike-Akino; Philip V. Orlik; Kieran Parsons; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Machine Learning, Signal Processing
Brief: MERL Connectivity & Information Processing Team scientists remotely presented 5 papers at the IEEE International Conference on Communications (ICC) 2022, held in Seoul, Korea on May 16-20, 2022. Topics presented include recent advancements in communications technologies, deep learning methods, and quantum machine learning (QML). Presentation videos are also available on our YouTube channel. In addition, K. J. Kim organized the "Industrial Private 5G-and-beyond Wireless Networks Workshop" at the conference.
IEEE ICC is one of the IEEE Communications Society's two flagship conferences (ICC and Globecom). Each year, close to 2,000 attendees from over 70 countries attend IEEE ICC to take advantage of a program consisting of exciting keynote sessions, robust technical paper sessions, innovative tutorials and workshops, and engaging industry sessions. This 5-day event is known for bringing together audiences from both industry and academia to learn about the latest research and innovations in communications and networking technology, share ideas and best practices, and collaborate on future projects.
See All News & Events for Computer Vision
Internships
CV1738: Robot autonomous grasping using tactile sensing
The Computer Vision group is offering an internship opportunity in autonomous robot grasping using tactile sensing. The internship is open to highly skilled graduate students on a PhD track. Candidates should have a solid understanding of reinforcement learning, contact mechanics and contact simulation, grasping, pose estimation, and point cloud processing. The learned policies will be deployed on physical robots, with sensing provided by various types of tactile sensing arrays. Strong programming skills are required, including experience with MuJoCo, ROS, C++, and Python. Duration and start dates are flexible.
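As a small, illustrative sketch of the kind of contact simulation and tactile readout involved (the model and all names are assumptions, using the open-source mujoco Python bindings), the snippet below drops a box onto a plane and reads a touch sensor attached to a pad site on its underside.

import mujoco

# Minimal model: a box with a "tactile pad" site on its underside; a touch
# sensor reports the total contact normal force sensed at that pad.
XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 0.2">
      <freejoint/>
      <geom type="box" size="0.05 0.05 0.05"/>
      <site name="pad" type="box" size="0.06 0.06 0.01" pos="0 0 -0.05"/>
    </body>
  </worldbody>
  <sensor>
    <touch name="pad_force" site="pad"/>
  </sensor>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)
for _ in range(500):                     # ~1 s of simulation at the default timestep
    mujoco.mj_step(model, data)
print("pad contact force [N]:", float(data.sensordata[0]))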
CV1703: Software development in ROS for robotic manipulation
MERL is offering an internship position for non-research software development for robotic manipulation. The scope of the internship is to develop robust ROS packages by refactoring existing experimental code. The position is open to prospective candidates with very strong programming skills in ROS (Robot Operating System), primarily using C++ and secondarily Python. The selected intern will have a software engineering role rather than a research role. The position is open to both senior undergraduate students and master's students. Start and end dates are flexible.
See All Internships for Computer Vision
Recent Publications
- "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2022.BibTeX TR2022-019 PDF
- @inproceedings{Shah2022apr,
- author = {Shah, Ankit Parag and Geng, Shijie and Gao, Peng and Cherian, Anoop and Hori, Takaaki and Marks, Tim K. and Le Roux, Jonathan and Hori, Chiori},
- title = {Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2022,
- month = apr,
- url = {https://www.merl.com/publications/TR2022-019}
- }
, - "Learning to Synthesize Volumetric Meshes from Vision-based Tactile Imprints", IEEE International Conference on Robotics and Automation (ICRA) 2022, March 2022.BibTeX TR2022-035 PDF
- @article{Zhu2022mar,
- author = {Zhu, Xinghao and Jain, Siddarth and Tomizuka, Masayoshi and van Baar, Jeroen},
- title = {Learning to Synthesize Volumetric Meshes from Vision-based Tactile Imprints},
- journal = {IEEE International Conference on Robotics and Automation (ICRA) 2022},
- year = 2022,
- month = mar,
- url = {https://www.merl.com/publications/TR2022-035}
- }
, - "Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10", The 10th Dialog System Technology Challenge Workshop at AAAI, February 2022.BibTeX TR2022-016 PDF
- @inproceedings{Hori2022feb,
- author = {Hori, Chiori and Shah, Ankit Parag and Geng, Shijie and Gao, Peng and Cherian, Anoop and Hori, Takaaki and Le Roux, Jonathan and Marks, Tim K.},
- title = {Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10},
- booktitle = {The 10th Dialog System Technology Challenge Workshop at AAAI},
- year = 2022,
- month = feb,
- url = {https://www.merl.com/publications/TR2022-016}
- }
, - DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning, February 2022.BibTeX TR2022-025 PDF
- @book{Shah2022feb,
- author = {Shah, Ankit Parag and Hori, Takaaki and Le Roux, Jonathan and Hori, Chiori},
- title = {DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning},
- year = 2022,
- month = feb,
- url = {https://www.merl.com/publications/TR2022-025}
- }
, - "(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering", AAAI Conference on Artificial Intelligence, February 2022.BibTeX TR2022-014 PDF Video Presentation
- @inproceedings{Cherian2022feb,
- author = {Cherian, Anoop and Hori, Chiori and Marks, Tim K. and Le Roux, Jonathan},
- title = {(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering},
- booktitle = {AAAI Conference on Artificial Intelligence},
- year = 2022,
- month = feb,
- url = {https://www.merl.com/publications/TR2022-014}
- }
, - "Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition", AAAI Conference on Artificial Intelligence, February 2022.BibTeX TR2022-015 PDF Presentation
- @inproceedings{Ke2022feb,
- author = {Ke, Lipeng and Peng, Kuan-Chuan and Lyu, Siwei},
- title = {Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition},
- booktitle = {AAAI Conference on Artificial Intelligence},
- year = 2022,
- month = feb,
- url = {https://www.merl.com/publications/TR2022-015}
- }
, - "Max-Margin Contrastive Learning", AAAI Conference on Artificial Intelligence, February 2022.BibTeX TR2022-013 PDF
- @inproceedings{Shah2022feb,
- author = {Shah, Anshul and Sra, Suvrit and Chellappa, Rama and Cherian, Anoop},
- title = {Max-Margin Contrastive Learning},
- booktitle = {AAAI Conference on Artificial Intelligence},
- year = 2022,
- month = feb,
- url = {https://www.merl.com/publications/TR2022-013}
- }
, - "MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation", AAAI Conference on Artificial Intelligence, February 2022.BibTeX TR2022-011 PDF Video
- @inproceedings{Medin2022feb,
- author = {Medin, Safa C. and Egger, Bernhard and Cherian, Anoop and Wang, Ye and Tenenbaum, Joshua B. and Liu, Xiaoming and Marks, Tim K.},
- title = {MOST-GAN: 3D Morphable StyleGAN for Disentangled Face Image Manipulation},
- booktitle = {AAAI Conference on Artificial Intelligence},
- year = 2022,
- month = feb,
- url = {https://www.merl.com/publications/TR2022-011}
- }
,
- "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2022.
-
Videos
- [MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning
- [MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision
- HealthCam: A system for non-contact monitoring of vital signs
- [MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning
- [MERL Seminar Series 2021] Look and Listen: From Semantic to Spatial Audio-Visual Perception
- Towards Human-Level Learning of Complex Physical Puzzles
- Scene-Aware Interaction Technology
- 3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
- Joint 3D Reconstruction of a Static Scene and Moving Objects
- Direct Multichannel Tracking
- FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
- FasTFit: A fast T-spline fitting algorithm
- CASENet: Deep Category-Aware Semantic Edge Detection
- Object Detection and Tracking in RGB-D SLAM via Hierarchical Feature Grouping
- Pinpoint SLAM: A Hybrid of 2D and 3D Simultaneous Localization and Mapping for RGB-D Sensors
- Action Detection Using A Deep Recurrent Neural Network
- Dose optimization for particle beam therapy
- 3D Reconstruction
- MERL Research on Autonomous Vehicles
- Saffron - Digital Type System
- Obstacle Detection
- Semantic Scene Labeling
- Robot Bin Picking
- Sapphire - High Accuracy NC Milling Simulation
- Deep Hierarchical Parsing for Semantic Segmentation
- Global Local Face Upsampling Network
- Gaussian Conditional Random Field Network for Semantic Segmentation
- Fast Graspability Evaluation on Single Depth Maps for Bin Picking with General Grippers
- Point-Plane SLAM for Hand-Held 3D Sensors
- Tracking an RGB-D Camera Using Points and Planes
- Fast Plane Extraction in Organized Point Clouds Using Agglomerative Hierarchical Clustering
- Calibration of Non-Overlapping Cameras Using an External SLAM System
- Voting-Based Pose Estimation for Robotic Assembly Using a 3D Sensor
- Fast Object Localization and Pose Estimation in Heavy Clutter for Robotic Bin Picking
- Learning to rank 3D features
Software Downloads
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Contact-Implicit Trajectory Optimization
- FoldingNet++
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering