GPU Pose Estimation

Estimating object pose (location and orientation) is a common task in many computer vision applications. Although many methods exist, most algorithms require a reasonably accurate initial pose and lack robustness to illumination variation, appearance change, and partial occlusion. We propose a fast GPU-based method for automatic pose estimation that matches the shape of a 3D model to a range image of the scene and requires no manually set initial pose. Our algorithm is simple and accurately estimates the pose of partially occluded objects in cluttered scenes in about half a second.

Background & Objective:  Pose estimation in scenes with clutter (due to unwanted objects and noise) and occlusions (due to multiple overlapping objects) is challenging. Pose estimation from range images, in which each pixel contains an estimate of the distance to the closest object, is more robust to changes in illumination, shadows, and lack of features. Range images can be robustly acquired with active light systems. If a database of 3D models of objects is available, one can use model-based techniques, in which the 3D model of the object is matched to the range image of the scene. We developed a novel model-based pose estimation algorithm for range images that runs entirely on modern Graphics Processing Units (GPUs). The massively data-parallel processing of the GPU (NVIDIA GeForce 8800 GTX) makes our method over 60 times faster than a comparable CPU implementation (on a Pentium D 945). Our method does not require manual setting of the initial pose and accurately computes object poses for synthetic or laser scan data in about half a second.
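To make the notion of a range image concrete, the following is a minimal CPU sketch in Python, not the project's GPU code. It assumes an orthographic projection in which point coordinates are already given in pixel units; the function name make_range_image and the sample points are hypothetical and chosen only for illustration.

```python
import math


def make_range_image(points, width, height, background=math.inf):
    """Build a range image: each pixel keeps the smallest depth that falls on it.

    Assumption: orthographic projection, with x and y already in pixel units.
    Pixels that no point projects onto keep the background value.
    """
    image = [[background] * width for _ in range(height)]
    for x, y, z in points:
        col, row = math.floor(x), math.floor(y)
        if 0 <= col < width and 0 <= row < height:
            # The closest surface wins, as in a depth buffer.
            image[row][col] = min(image[row][col], z)
    return image


# Two points land on pixel (row 0, col 0); only the nearer one is kept.
points = [(0.2, 0.1, 5.0), (0.7, 0.4, 3.0), (2.5, 1.5, 4.0)]
range_image = make_range_image(points, width=3, height=2)
```

Keeping the minimum depth per pixel is what makes the image record the closest object, so occluded surfaces simply do not appear in it.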

Technical Discussion:  The figure shows an overview of our method. In a pre-processing step, we use a 3D model or detailed scan of the object and render it in different poses. Each rendered pose is stored as a reference range map in texture memory. This step has to be performed only once per reference object. During online pose estimation, we acquire a 3D scan of the scene using an active light method (in our case, a laser range scan). We smooth the 3D scan on the GPU with a median filter to compute the input range map. The task is then to find the best match between the reference range maps and the input range map by minimizing an error through pairwise comparisons. We devised a novel error function that uses the range values and Euclidean distance maps. The error function can be evaluated per pixel, which makes it suitable for efficient processing on GPUs. To minimize the error efficiently, we developed a novel data-parallel version of the downhill simplex algorithm that runs entirely on the GPU.
All GPU code is implemented using NVIDIA’s Compute Unified Device Architecture (CUDA).
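The matching step described above can be illustrated with a small serial sketch in Python. It is not the project's CUDA implementation and makes several simplifying assumptions: the error is a plain mean squared range difference (the actual error function also uses Euclidean distance maps, whose form is not given here), the "rendering" is a toy stand-in that only shifts and tilts a base depth map, and the minimizer is a basic textbook downhill simplex (Nelder-Mead) variant rather than the data-parallel version. All function names and values are hypothetical.

```python
def render_reference(base, dz, tilt):
    """Toy stand-in for rendering the model in a pose: shift and tilt a depth map."""
    return [[d + dz + tilt * x for x, d in enumerate(row)] for row in base]


def pixel_error(ref, inp):
    """Mean squared range difference; every pixel contributes independently."""
    total, count = 0.0, 0
    for ref_row, inp_row in zip(ref, inp):
        for r, i in zip(ref_row, inp_row):
            total += (r - i) ** 2
            count += 1
    return total / count


def downhill_simplex(f, start, step=0.5, iters=300):
    """Basic Nelder-Mead: reflect, expand, contract toward the centroid, or shrink."""
    n = len(start)
    simplex = [list(start)]
    for k in range(n):
        p = list(start)
        p[k] += step
        simplex.append(p)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of all vertices except the worst one.
        centroid = [sum(p[k] for p in simplex[:-1]) / n for k in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best vertex.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Hypothetical data: the "scene" is the base map seen at an unknown pose.
base = [[1.0, 1.2, 1.5, 1.1], [1.3, 1.0, 1.4, 1.6]]
true_pose = (0.3, 0.05)
input_map = render_reference(base, *true_pose)


def objective(pose):
    return pixel_error(render_reference(base, pose[0], pose[1]), input_map)


estimate = downhill_simplex(objective, [0.0, 0.0])
```

Because the error is a sum of independent per-pixel terms, each term could be computed by its own GPU thread and then reduced, which is the property the text relies on. In this toy setup the search starts from an arbitrary pose and should recover values close to true_pose.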

Future Direction:  We are currently improving the speed and robustness of our GPU pose estimation algorithm.

Contacts:
Joseph Katz
In Kyu Park

Technology Area:  Imaging

Modification Date:  September 17, 2007