Mitsubishi Electric Research Laboratories

Video Object Tracking

As a part of the Physical Security SK, we developed robust, computationally efficient, unsupervised multi-object tracking techniques that require minimal initialization effort and fine-tuning for stationary camera setups. Object tracking is a key technology in most surveillance products developed at Mitsubishi Electric.

Background & Objective:  Accurate object segmentation and tracking under the constraint of low computational complexity is a challenge. Generally speaking, objects can be tracked either by back-tracking or by forward-tracking. The back-tracking approach segments foreground regions in the current image and then establishes the correspondence between these regions and those in the previous image. The forward-tracking approach estimates the positions of the regions in the current frame using the segmentation result obtained for the previous image. To establish correspondence, several object templates are used. The limitation of this approach is that a single template is not sufficient for a wide variety of applications; for example, human tracking and traffic monitoring require different templates. A possible forward-tracking technique is mean-shift analysis. Mean-shift is a nonparametric density gradient estimator. It is employed to find the object candidate that is most similar to a given model while predicting the next object location. This method provides accurate localization and is computationally fast. However, the mean-shift tracker is not automatic, since it requires initial model properties such as the object boundary.
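
To make the mean-shift idea concrete, the following is a minimal sketch of the mode-seeking iteration on 2-D sample points, not the tracker described here. It assumes a Gaussian kernel; the function name mean_shift_mode, the bandwidth value, and the sample points are all hypothetical and chosen only for illustration. In a tracker, the samples would be pixel locations weighted by how well their color matches the object model.

```python
import math

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Climb from `start` to a nearby density mode of `points` (Gaussian kernel)."""
    x, y = start
    for _ in range(max_iter):
        wsum = sx = sy = 0.0
        for px, py in points:
            d2 = (px - x) ** 2 + (py - y) ** 2
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # kernel weight of this sample
            wsum += w
            sx += w * px
            sy += w * py
        if wsum == 0.0:  # no sample within numerical reach of the kernel
            break
        nx, ny = sx / wsum, sy / wsum  # weighted mean = new window center
        shift = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if shift < tol:  # converged: the mean-shift vector is (almost) zero
            break
    return x, y

# Hypothetical data: a tight cluster around (5, 5) plus one distant outlier.
points = [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8), (0.0, 0.0)]
mode = mean_shift_mode(points, start=(4.0, 4.0), bandwidth=1.0)
```

Each iteration moves the window center to the kernel-weighted mean of the samples, which follows the estimated density gradient uphill. The starting point matters: this is why a mean-shift tracker needs an initial object location before it can run.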

Technical Discussion:  We developed an automatic, real-time object-tracking algorithm by integrating mixture-model-based background subtraction into a mean-shift-based forward-tracking mechanism. The combination yields an automatic and robust tracker that can handle high-resolution color video in real time. We also address other major difficulties, namely sudden illumination changes in the scene, overcast conditions, shadows, and correspondence problems. We find human faces and arms by applying a skin color mask that is formulated in RGB space through offline training. Our method generates a reference image using pixel-wise mixture models, finds the changed parts of the image by background subtraction, removes shadows by analyzing the color and spatial properties of pixels, determines objects, and tracks them across consecutive frames.
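
The background subtraction and shadow removal steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the method described above: it keeps a single running Gaussian per pixel and channel instead of a mixture, and the shadow test (a pixel that is darker than the background but has nearly the same color direction) is a common heuristic that uses color only, not spatial properties. The class and function names, thresholds, and pixel values are all hypothetical.

```python
import math

class RunningGaussianBackground:
    """Per-pixel running mean/variance per RGB channel (single Gaussian, not a mixture)."""

    def __init__(self, first_frame, alpha=0.05, init_var=100.0, k=2.5):
        self.mean = [[[float(v) for v in px] for px in row] for row in first_frame]
        self.var = [[[init_var] * 3 for _ in row] for row in first_frame]
        self.alpha = alpha  # learning rate (assumed value)
        self.k = k          # match threshold in standard deviations (assumed value)

    def apply(self, frame):
        """Return a 0/1 foreground mask; adapt the model only at background pixels."""
        mask = []
        for r, row in enumerate(frame):
            mask_row = []
            for c, px in enumerate(row):
                mean, var = self.mean[r][c], self.var[r][c]
                fg = any((px[i] - mean[i]) ** 2 > self.k ** 2 * var[i] for i in range(3))
                if not fg:
                    for i in range(3):
                        d = px[i] - mean[i]
                        mean[i] += self.alpha * d
                        # Floor the variance so the model never becomes over-confident.
                        var[i] = max((1 - self.alpha) * var[i] + self.alpha * d * d, 4.0)
                mask_row.append(1 if fg else 0)
            mask.append(mask_row)
        return mask

def is_shadow(pixel, background, lo=0.4, hi=0.95, max_angle_deg=5.0):
    """Heuristic: a shadow is darker than the background but keeps its color direction."""
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_b = math.sqrt(sum(b * b for b in background))
    if norm_p == 0.0 or norm_b == 0.0:
        return False
    cos_a = min(1.0, sum(p * b for p, b in zip(pixel, background)) / (norm_p * norm_b))
    angle = math.degrees(math.acos(cos_a))
    return lo <= norm_p / norm_b <= hi and angle <= max_angle_deg

# Hypothetical 2x2 scene: one strongly colored object pixel and one shadow pixel.
bg = (100, 120, 140)
model = RunningGaussianBackground([[bg, bg], [bg, bg]])
frame = [[bg, (250, 30, 30)], [(60, 72, 84), bg]]
mask = model.apply(frame)
shadow_free = [
    [1 if mask[r][c] and not is_shadow(frame[r][c], model.mean[r][c]) else 0
     for c in range(2)]
    for r in range(2)
]
```

In this example, both the object pixel and the shadow pixel differ enough from the reference image to enter the change mask, and the shadow test then drops the pixel that is only a darker copy of the background. A mixture model would replace the single mean and variance per pixel with several weighted modes, which lets the reference image represent backgrounds that alternate between appearances.
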
     Currently, we are improving our state-of-the-art tracker by integrating particle-filter-based posterior probability estimation methods. In addition to color histograms, we now use edge, motion, and appearance models to guide the mean-shift procedure. We also use a novel inside/outside scale adaptation.
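
As a rough illustration of particle-filter-based posterior estimation, the sketch below runs a generic bootstrap particle filter on a 1-D position. It is not the tracker described above: the constant-velocity motion model, the Gaussian measurement likelihood, the noise levels, and the name particle_filter_step are all assumptions made for the example, and the measurements are taken to be noise-free for simplicity.

```python
import math
import random

def particle_filter_step(particles, measurement, rng, velocity=1.0,
                         process_sigma=0.5, meas_sigma=2.0):
    """One predict / weight / resample cycle of a 1-D bootstrap particle filter."""
    # Predict: move every particle with the assumed motion model plus process noise.
    predicted = [p + velocity + rng.gauss(0.0, process_sigma) for p in particles]
    # Weight: Gaussian likelihood of the measurement given each particle.
    weights = [math.exp(-((measurement - p) ** 2) / (2.0 * meas_sigma ** 2))
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # The posterior mean serves as the state estimate.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample in proportion to the weights to concentrate on likely states.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate

rng = random.Random(0)  # fixed seed so the run is repeatable
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
true_pos = 20.0
for _ in range(20):
    true_pos += 1.0  # the object moves one unit per frame
    particles, estimate = particle_filter_step(particles, true_pos, rng)
```

The weighted particle set approximates the posterior distribution over the object state, so it can keep several hypotheses alive at once. In a video tracker the state would include 2-D position and possibly scale, and the likelihood would come from image cues such as color histograms, edges, or motion.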

Publications:
Porikli, F.M., "Road Extraction by Point-wise Gaussian Models", SPIE Algorithms and Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery IX, Vol. 5093, pp. 758-764, September 2003 (SPIE Proceedings, TR2003-059)

Technology Area:  Computer Vision

Modification Date:  July 14, 2005