Morphable 3D models from video

Flexible factorization solves the dual problems of visually tracking a nonrigid (or rigid) object and recovering its 3D structure and motion. The method works with low-quality, low-resolution video from an uncalibrated camera, and can be fully automated. The recovered morphable model can be re-animated with new motions, shapes, or textures. The animation above shows a 148-frame video re-rendered with 3D depth information recovered from a flexible factorization of the face. The inset shows the original frames. Some synthetic profile views are shown at left, at the original resolution.

The recovered model consists of a set of 3D points specifying shape, and for each point, a basis set of 3D point displacements, or morphs, whose combinations reproduce the nonrigid motions viewed in the original video. The middle image in the figure at right shows the recovered 3D points projected onto one video frame. The flanking diagrams depict front and side views of the recovered model with the three dominant morphs represented as displacement arrows. These morphs factor the actor's facial expressions into eyebrow and upper-face motions (blue), mouth and cheek contractions (green), and smiles/grimaces (red).
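One way to write down such a model, as a sketch in notation that is assumed here rather than taken from the paper: let $\bar{S}$ hold the $P$ recovered 3D points, $D_k$ the $k$-th morph (one 3D displacement per point), and $c_{k,t}$ the weight of morph $k$ in frame $t$. The shape in frame $t$ is then a linear combination of the base shape and the morphs:

```latex
S_t \;=\; \bar{S} \;+\; \sum_{k=1}^{K} c_{k,t}\, D_k ,
\qquad \bar{S},\, D_k \in \mathbb{R}^{3 \times P},
\quad c_{k,t} \in \mathbb{R}.
```

For the face shown here, $K = 3$ would correspond to the three dominant morphs drawn as blue, green, and red arrows.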

Technical overview: Nonrigid 3D structure-from-motion and 2D optical flow can both be formulated as tensor factorization problems, where the rank of the factors can be deduced from the motion itself. The two problems can be related through a (noisy) affine transform, yielding a combined nonrigid-structure-from-intensities problem that we solve via structured matrix decompositions. In practice, the preconditions for this factorization are often violated by image noise and by data that are too sparse relative to the sample complexity of the problem. Both issues are remedied with careful use of rank constraints, norm constraints, and integration over uncertainty in the intensity values. The resulting algorithm can track and 3D-reconstruct surfaces that have very little texture, such as the forehead and cheeks of the actor (which appear smooth in the low-resolution video) and even the smooth skin of young children. Although we give a "zero-knowledge" (untrained and purely algebraic) solution, the algorithm also has an especially efficient incremental variant that "learns" as it goes, getting better at tracking as new video frames come in and inferring the locations of self-occluded points.
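The rank property behind the factorization can be checked numerically. The sketch below is not the algorithm from the paper; it only illustrates, under simplifying assumptions (orthographic camera, centered and noise-free point tracks, a linear morph basis), that a matrix of 2D tracks generated from a base shape plus K morphs has rank at most 3(K+1). All function and variable names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def random_rotation(rng):
    # Orthonormalize a Gaussian matrix, then flip one column if needed
    # so that the result is a proper rotation (determinant +1).
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def synth_tracks(n_frames=40, n_points=60, n_morphs=2, seed=0):
    """Stack the 2D projections of a flexing 3D shape into a 2F x P matrix."""
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((3, n_points))                      # base shape
    morphs = 0.3 * rng.standard_normal((n_morphs, 3, n_points))    # morph basis
    rows = []
    for _ in range(n_frames):
        cam = random_rotation(rng)[:2]           # 2x3 orthographic camera
        coeffs = rng.standard_normal(n_morphs)   # per-frame morph weights
        shape = base + np.tensordot(coeffs, morphs, axes=1)  # 3 x P
        rows.append(cam @ shape)                 # 2 x P image coordinates
    return np.vstack(rows)

def numerical_rank(W, tol=1e-8):
    # Count singular values that are non-negligible relative to the largest.
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

W = synth_tracks()
rank = numerical_rank(W)
print(W.shape, rank)
```

With the default of two morphs, the rank should come out as 3(2+1) = 9 even though the matrix is 80 x 60. Real tracks are noisy, so the trailing singular values are small rather than zero, which is one reason the rank and norm constraints mentioned above matter.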

Morphable 3D models from video.
Matthew Brand.
Best paper prize, Computer Vision & Pattern Recognition, CVPR 2001.


© 2000-2001 Matthew Brand