Adaptive Fast Playback-Based Video Skimming Using a Compressed-Domain Visual Complexity Measure

    •  Kadir A. Peker, Ajay Divakaran, "Adaptive Fast Playback-Based Video Skimming Using a Compressed-Domain Visual Complexity Measure", Tech. Rep. TR2003-33, Mitsubishi Electric Research Laboratories, Cambridge, MA, June 2004.
      @techreport{MERL_TR2003-33,
        author = {Kadir A. Peker and Ajay Divakaran},
        title = {Adaptive Fast Playback-Based Video Skimming Using a Compressed-Domain Visual Complexity Measure},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address = {Cambridge, MA 02139},
        number = {TR2003-33},
        month = jun,
        year = 2004,
        url = {}
      }
  • Research Area: Digital Video


Psychophysical experiments have shown that the human visual system is sensitive to visual stimuli only within a certain spatio-temporal window. The location of a moving image in the spatio-temporal space is determined by the spatial frequency content of its image regions and their velocity. We present a novel compressed-domain measure of the spatio-temporal motion activity of a video segment, which provides a criterion for how fast the segment can be played within human perceptual limits. Alternatively, the measure allows us to determine the spatio-temporal filtering required for acceptable playback of a video segment at a given fast playback speed. Because the spatio-temporal activity measure is computed in the compressed domain, it allows instant skims to be generated from any point in the video content onward using adaptive fast playback. The adaptive fast playback method using spatio-temporal complexity is based only on early vision characteristics of the human visual system and is thus independent of content type and semantics, so it is applicable in a wide range of applications. It is best suited to summarization at low temporal compression ratios. The content remains visible at all times, so the temporal continuity of the action is preserved and the risk of missing an important event is eliminated. The user can switch between skim mode and regular playback at any time, or change the speed-up ratio of the fast playback. Our simulations on various types of video indicate that the presented video skimming and summarization method is effective and useful. Finally, the adaptive fast playback framework can be extended to include other inputs such as face detection, dialog detection, or semantic annotation. It can also be integrated with other summarization methods that try to capture the semantics.
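
The abstract does not give the formula for the complexity measure or the speed rule, so the following is only a minimal sketch of the general idea of adaptive fast playback, not the authors' method. It assumes a positive per-frame visual complexity score has already been computed by some means (the `complexity` list is hypothetical), and it assumes a simple inverse-proportional rule: each frame is played at a speed inversely proportional to its complexity, scaled so that the whole segment meets a requested overall speed-up. The function names `adaptive_speeds` and `playback_duration` are illustrative.

```python
def adaptive_speeds(complexity, target_speedup):
    """Return a per-frame playback speed factor.

    Assumption (not from the report): speed is inversely proportional
    to the frame's complexity score, normalized so that the segment as
    a whole is shortened by exactly `target_speedup`.
    """
    if target_speedup <= 0:
        raise ValueError("target_speedup must be positive")
    if not complexity or any(c <= 0 for c in complexity):
        raise ValueError("complexity must be a non-empty list of positive scores")

    mean_c = sum(complexity) / len(complexity)
    # Frames above the mean complexity play slower than the target speed,
    # frames below it play faster.
    return [target_speedup * mean_c / c for c in complexity]


def playback_duration(speeds, fps):
    """Total display time in seconds when frame i is shown for 1 / (fps * speeds[i])."""
    return sum(1.0 / (fps * s) for s in speeds)


if __name__ == "__main__":
    # Hypothetical complexity scores for four frames.
    scores = [1.0, 2.0, 4.0, 1.0]
    speeds = adaptive_speeds(scores, target_speedup=4.0)
    print(speeds)
    print(playback_duration(speeds, fps=30.0))
```

Under this rule the most complex frame is slowed the most while the total skim length still matches the requested speed-up ratio. A practical player would also need to cap the per-frame speed and smooth it over time, which this sketch leaves out.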