TR2003-132

Unsupervised Mining of Statistical Temporal Structures in Video


    •  Xie, L., Chang, S.-F., Divakaran, A., Sun, H., "Unsupervised Mining of Statistical Temporal Structures in Video" in Video Mining, DOI: 10.1007/978-1-4757-6928-9_10, Springer, October 2003.
      BibTeX:
      @incollection{Xie2003oct,
        author = {Xie, L. and Chang, S.-F. and Divakaran, A. and Sun, H.},
        title = {Unsupervised Mining of Statistical Temporal Structures in Video},
        booktitle = {Video Mining},
        year = 2003,
        month = oct,
        publisher = {Springer},
        doi = {10.1007/978-1-4757-6928-9_10},
        url = {https://www.merl.com/publications/TR2003-132}
      }
  • Research Area: Digital Video

Abstract:

In this paper, we present algorithms for unsupervised mining of structures in video using multi-scale statistical models. Video structures are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim at alleviating the burden of labelling and training, as well as providing a scalable solution for generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing unsupervised video structuring works primarily use clustering techniques, while the rich statistical characteristics in the temporal dimension at different granularities remain unexplored. Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multi-level statistical structures with hierarchical hidden Markov models based on a multi-level Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm. We have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact, relevant feature sets using hybrid wrapper-filter methods.
When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) the automatically selected feature sets for soccer and baseball videos match those manually selected with domain knowledge; (2) the system automatically discovers high-level structures that match the semantic events in the video; (3) the system achieves even slightly better accuracy in detecting semantic events in unlabelled soccer videos than a competing supervised approach designed and trained with domain knowledge.
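To make the multi-level Markov dependency assumption more concrete, the following is a minimal sketch of how a two-level hierarchical HMM can be expanded into an ordinary flat HMM over (top state, child state) pairs. It is an illustration under stated assumptions, not the parameterization used in the paper: the function name `flatten_two_level_hhmm`, the per-state exit probabilities `child_exit`, and all numeric values are hypothetical.

```python
def flatten_two_level_hhmm(top_trans, child_trans, child_init, child_exit):
    """Expand a two-level HHMM into a flat transition matrix (illustrative only).

    top_trans[q][q2]      : P(next top state = q2 | current top state = q, child exited)
    child_trans[q][i][j]  : P(next child = j | child = i, parent = q, no exit)
    child_init[q][j]      : P(first child = j | parent q was just entered)
    child_exit[q][i]      : P(child i under parent q hands control back to the parent)
    """
    # Each flat state is a (top state, child state) pair.
    states = [(q, i)
              for q in range(len(top_trans))
              for i in range(len(child_trans[q]))]
    index = {s: k for k, s in enumerate(states)}
    flat = [[0.0] * len(states) for _ in states]

    for (q, i), row in index.items():
        stay = 1.0 - child_exit[q][i]
        # Case 1: no exit, so the child moves within the same parent.
        for j, p in enumerate(child_trans[q][i]):
            flat[row][index[(q, j)]] += stay * p
        # Case 2: exit, the parent level transitions, and the new parent
        # picks its first child from child_init.
        for q2, p_top in enumerate(top_trans[q]):
            for j, p_init in enumerate(child_init[q2]):
                flat[row][index[(q2, j)]] += child_exit[q][i] * p_top * p_init
    return states, flat


# Hypothetical toy model: two top-level states, each with two child states.
top_trans = [[0.0, 1.0], [1.0, 0.0]]
child_trans = [[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.3, 0.7]]]
child_init = [[1.0, 0.0], [0.6, 0.4]]
child_exit = [[0.0, 0.1], [0.2, 0.2]]

states, flat = flatten_two_level_hhmm(top_trans, child_trans, child_init, child_exit)
for state, row in zip(states, flat):
    print(state, [round(p, 3) for p in row])
```

Each row of the flattened matrix sums to one, so standard HMM inference can be run on it. The hierarchical form is still useful because it ties parameters across levels and keeps the higher-level states interpretable as candidate semantic events.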

 

  • Related News & Events

    •  NEWS    Video Mining: 2 publications by Ajay Divakaran, Huifang Sun and others
      Date: October 31, 2003
      Where: Video Mining
      MERL Contact: Huifang Sun
      Brief
      • The articles "Unsupervised Mining of Statistical Temporal Structures in Video" by Xie, L., Chang, S.-F., Divakaran, A. and Sun, H. and "Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors" by Divakaran, A., Peker, K.A., Radhakrishnan, R., Xiong, Z. and Cabasson, R. were published in the book Video Mining.