Object Detection and Tracking in RGB-D SLAM via Hierarchical Feature Grouping

We present an object detection and tracking framework integrated into a simultaneous localization and mapping (SLAM) system that uses an RGB-D camera. We propose a compact object representation based on hierarchical feature grouping. Just as a keyframe is a collection of features, an object is represented as a set of segments, where a segment is a subset of the features in a frame. Like keyframes, segments are registered with each other in a map, which we call an object map. We use the same SLAM procedure in both the offline object scanning mode and the online object detection mode. In the offline scanning mode, we scan an object with an RGB-D camera to generate an object map. In the online detection mode, a set of object maps for different objects is given, and the objects are detected via appearance-based matching between the segments in the current frame and those in the object maps. When a match is found, the object is localized with respect to the map being reconstructed by the SLAM system using RANSAC-based registration. In subsequent frames, the objects are tracked by predicting their poses. We also incorporate constraints obtained from the objects into bundle adjustment to improve both the object pose estimation accuracy and the SLAM reconstruction accuracy. We demonstrate our technique in an object picking scenario using a robot arm. Experimental results show that the system detects and picks up objects successfully from different viewpoints and distances.
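
The feature, segment, and object map hierarchy described above can be illustrated with a minimal sketch. All names here (`Feature`, `Segment`, `ObjectMap`, `match_segment`, `max_dist`) are hypothetical. The segment signature (a mean of feature descriptors) and the Euclidean distance threshold are assumptions chosen for brevity, not the appearance-based matching used in the system. In a full pipeline, a matched segment would then be passed to the RANSAC registration step, which is omitted here.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List, Optional, Tuple


@dataclass
class Feature:
    position: Tuple[float, float, float]  # 3D point recovered from RGB-D data
    descriptor: Tuple[float, ...]         # appearance descriptor of the feature


@dataclass
class Segment:
    """A subset of the features observed in a single frame (assumed non-empty)."""
    features: List[Feature]

    def appearance(self) -> Tuple[float, ...]:
        # Assumption: summarize the segment by the mean of its feature descriptors.
        n = len(self.features)
        dim = len(self.features[0].descriptor)
        return tuple(
            sum(f.descriptor[i] for f in self.features) / n for i in range(dim)
        )


@dataclass
class ObjectMap:
    """An object represented as a set of segments."""
    name: str
    segments: List[Segment] = field(default_factory=list)


def match_segment(
    query: Segment, object_maps: List[ObjectMap], max_dist: float = 0.5
) -> Optional[Tuple[str, int, float]]:
    """Return (object name, segment index, distance) of the closest stored
    segment within max_dist, or None if no stored segment is close enough."""
    q = query.appearance()
    best = None
    for obj in object_maps:
        for i, seg in enumerate(obj.segments):
            d = dist(q, seg.appearance())
            if d <= max_dist and (best is None or d < best[2]):
                best = (obj.name, i, d)
    return best
```

A segment extracted from the current frame would be passed as `query`, with the object maps produced in the offline scanning mode passed as `object_maps`. A `None` result means no object was detected for that segment.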