
Flexible factorization solves the twin problems of
visually tracking a nonrigid (or rigid) object and recovering its 3D
structure and motion. The method works with low-quality, low-resolution video
from an uncalibrated camera, and can be fully automated. The recovered
morphable model can be re-animated with new motions, shape, or texture. The
animation above shows a 148-frame video re-rendered with 3D depth
information recovered from a flexible factorization of the face. The
inset shows the original frames. Some synthetic profiles are shown at left,
at the original resolution.
The recovered model consists of a set of 3D points specifying shape, and
for each point, a basis set of 3D point displacements, or morphs,
whose combinations reproduce the nonrigid motions viewed in the original video.
The middle image in the figure at right shows the recovered 3D points
projected onto one video frame. The flanking diagrams depict front and
side views of the recovered model with the three dominant morphs represented
as displacement arrows. These morphs factor the actor's facial expressions
into eyebrow and upper-face motions (blue), mouth and cheek contractions
(green), and smiles/grimaces (red).
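
As a rough illustration of the model described above (a set of 3D points plus a basis of per-point displacement morphs), the following Python sketch builds a tiny toy model and renders one frame. All names (`deform`, `project`, `base`, `morphs`) and the numbers are hypothetical, and the weak-perspective camera is an assumption made for illustration; this is not the code used to produce the results shown here.

```python
import numpy as np

def deform(base, morphs, coeffs):
    """Add a weighted combination of morphs to the base shape.

    base:   (3, N) array of 3D points
    morphs: (K, 3, N) array, one 3D displacement per point per morph
    coeffs: (K,) array of per-frame morph weights
    """
    return base + np.tensordot(coeffs, morphs, axes=1)

def project(shape3d, rotation, translation, scale=1.0):
    """Weak-perspective projection (an assumption): rotate, drop depth, scale, shift."""
    return scale * rotation[:2] @ shape3d + translation[:, None]

# Toy model: four points on a unit square in the z = 0 plane.
base = np.array([[0.0, 1.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 1.0],
                 [0.0, 0.0, 0.0, 0.0]])

morphs = np.zeros((2, 3, 4))
morphs[0, 1, 2:] = 0.5   # morph 0 lifts the two upper points
morphs[1, 0, 1] = 0.25   # morph 1 slides point 1 sideways

# One frame: morph 0 fully on, morph 1 off, frontal view, no translation.
frame = project(deform(base, morphs, np.array([1.0, 0.0])),
                rotation=np.eye(3), translation=np.zeros(2))
```

Re-animating the model then amounts to supplying new coefficient vectors and camera poses, one per output frame.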
Technical overview: Nonrigid 3D structure-from-motion and 2D optical flow can both be formulated as tensor factorization problems in which the rank of the factors can be deduced from the motion itself. The two problems are related through a (noisy) affine transform, yielding a combined nonrigid-structure-from-intensities problem that we solve via structured matrix decompositions. In practice, the preconditions for this factorization are often violated by image noise and by shortfalls in the data relative to the sample complexity of the problem. Both issues are remedied through careful use of rank constraints, norm constraints, and integration over uncertainty in the intensity values. The resulting algorithm can track and reconstruct in 3D surfaces that have very little texture, such as the actor's forehead and cheeks (which appear smooth in the low-resolution video) and even the smooth skin of young children. Although we give a "zero-knowledge" (untrained and purely algebraic) solution, the algorithm has an especially efficient incremental variant that "learns" as it goes, getting better at tracking as new video frames come in and inferring the locations of self-occluded points.
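
To make the rank argument concrete, here is a minimal Python sketch of a generic rank-constrained factorization. It assumes that 2D point tracks are already available and centered, so it uses a plain truncated SVD in place of the intensity-based, uncertainty-aware decomposition described above. The sizes `F`, `N`, `K`, the synthetic data, and the function name `truncated_factorization` are all hypothetical. With K morphs plus one base shape, the stacked track matrix has rank at most 3(K+1).

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 40, 30, 2  # frames, tracked points, morphs (hypothetical sizes)

# Synthetic nonrigid object: a base shape plus K displacement morphs.
base = rng.standard_normal((3, N))
morphs = 0.3 * rng.standard_normal((K, 3, N))

rows = []
for _ in range(F):
    # Random orthonormal 3x3 matrix standing in for the camera pose.
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    coeffs = rng.standard_normal(K)
    shape = base + np.tensordot(coeffs, morphs, axes=1)
    rows.append(q[:2] @ shape)       # keep the two image-plane rows
W = np.vstack(rows)                  # (2F, N) matrix of centered 2D tracks

def truncated_factorization(W, rank):
    """Best rank-`rank` factorization W ~ motion @ structure via SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])
    motion = U[:, :rank] * root              # (2F, rank)
    structure = root[:, None] * Vt[:rank]    # (rank, N)
    return motion, structure

rank = 3 * (K + 1)                   # base shape plus K morphs, 3 rows each
clean_rank = np.linalg.matrix_rank(W)

# The rank constraint suppresses noise that lies outside the model subspace.
W_noisy = W + 0.01 * rng.standard_normal(W.shape)
motion, structure = truncated_factorization(W_noisy, rank)
residual = np.linalg.norm(W - motion @ structure) / np.linalg.norm(W)
```

The factors returned here are determined only up to an invertible `rank` x `rank` transform; recovering actual rotations, morph coefficients, and 3D shape from them requires additional constraints that this sketch does not attempt.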
Morphable 3D models from video. Matthew Brand.
Best paper prize, Computer Vision & Pattern Recognition, CVPR 2001.