TR97-25

Learning concise models of human activity from ambient video via a structure-inducing M-step estimator


    •  Matthew Brand, "Learning concise models of human activity from ambient video via a structure-inducing M-step estimator", Tech. Rep. TR97-25, Mitsubishi Electric Research Laboratories, Cambridge, MA, November 1997.
      @techreport{MERL_TR97-25,
        author = {Matthew Brand},
        title = {Learning concise models of human activity from ambient video via a structure-inducing M-step estimator},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address = {Cambridge, MA 02139},
        number = {TR97-25},
        month = nov,
        year = 1997,
        url = {http://www.merl.com/publications/TR97-25/}
      }
  • Research Area: Algorithms


We introduce a method for structure discovery in data and use it to learn a normative theory about the behavior of the visual world from coarse image representations. The theory takes the form of a concise probabilistic automaton (specifically, a continuous-output hidden Markov model, or HMM), but the induction method applies generally to any conditional probability model. The learning algorithm introduces and exploits an entropic prior for fast, simultaneous estimation of model structure and parameters. Although not motivated as such, the prior and its maximum a posteriori (MAP) estimator can be understood as an exact formulation of minimum description length (MDL) for Bayesian point estimation; we present an exact solution for the MAP estimator, which thus folds MDL into the M-step of expectation-maximization (EM) algorithms. Consequently, there is no speculative or wasted computation as in search-based MDL approaches. In contrast to conventionally trained HMMs, entropically trained models are so concise and highly structured that they are interpretable, and can be automatically converted into a flowchart and/or a map of characteristic activities (motion patterns) in the field of view. In this paper we examine the model formed by the system from roughly a half-hour of video of office activity, then demonstrate its ability to detect unusual behavior.
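
To make the idea of a structure-inducing M-step concrete, here is a minimal Python sketch of a MAP estimate for a single multinomial parameter vector. It rests on assumptions that the abstract itself does not state: the entropic prior is taken to be proportional to exp(-H(theta)), that is, the product of theta_i**theta_i, and the evidence is a vector of expected counts from an E-step. The report presents an exact solution for the estimator; this sketch does not reproduce it and instead solves the stationarity condition numerically by nested bisection. The names `entropic_map` and `theta_for` are hypothetical.

```python
import math

def entropic_map(counts):
    """Numerical MAP estimate of a multinomial under an assumed entropic prior.

    Maximizes sum_i counts[i]*log(theta[i]) + sum_i theta[i]*log(theta[i])
    over the probability simplex (prior assumed proportional to exp(-entropy)).
    Illustrative sketch only; requires at least one count >= 1.
    """
    if min(counts) < 0.0 or max(counts) < 1.0:
        raise ValueError("sketch assumes nonnegative counts, at least one >= 1")

    def theta_for(lam, w):
        # Solve w/t + log(t) + 1 + lam = 0 for t in (0, w]; the left-hand
        # side is strictly decreasing in t on that interval.
        if w == 0.0:
            return 0.0  # no evidence: the parameter is driven to zero
        lo, hi = 0.0, w
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if w / mid + math.log(mid) + 1.0 + lam > 0.0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def total(lam):
        return sum(theta_for(lam, w) for w in counts)

    # Largest multiplier for which every per-parameter equation has a root.
    lam_hi = -2.0 - math.log(max(counts))
    # total(lam) increases with lam, so walk downward until it drops below 1.
    step, lam_lo = 1.0, lam_hi - 1.0
    while total(lam_lo) > 1.0:
        step *= 2.0
        lam_lo = lam_hi - step
    # Bisect on the Lagrange multiplier until the parameters sum to one.
    for _ in range(100):
        lam = 0.5 * (lam_lo + lam_hi)
        if total(lam) > 1.0:
            lam_hi = lam
        else:
            lam_lo = lam
    theta = [theta_for(0.5 * (lam_lo + lam_hi), w) for w in counts]
    s = sum(theta)
    return [t / s for t in theta]  # absorb residual rounding error

# Usage: compare against the maximum-likelihood estimate counts/sum(counts).
counts = [90.0, 9.0, 1.0]
ml = [c / sum(counts) for c in counts]
theta = entropic_map(counts)
```

Under these assumptions the estimate is sharper than the maximum-likelihood one: well-supported parameters are nudged upward and weakly supported ones are pushed toward zero, which is the mechanism by which such an M-step can trim model structure during EM.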