TR2009-085

Broadcast Video Content Segmentation by Supervised Learning


    •  Wilson, K.W., Divakaran, A., "Broadcast Video Content Segmentation by Supervised Learning" in Multimedia Content Analysis, Divakaran, A., Eds., Signals and Communication Technology, pp. 1-17, Springer, March 2009.
      @incollection{Wilson2009mar,
        author    = {Wilson, K.W. and Divakaran, A.},
        title     = {Broadcast Video Content Segmentation by Supervised Learning},
        booktitle = {Multimedia Content Analysis},
        year      = 2009,
        editor    = {Divakaran, A.},
        series    = {Signals and Communication Technology},
        pages     = {1--17},
        month     = mar,
        publisher = {Springer},
        isbn      = {978-0-387-76569-3},
        url       = {https://www.merl.com/publications/TR2009-085}
      }
  • Research Area: Digital Video

Abstract:

Today's viewers of broadcast content are presented with huge amounts of content from broadcast networks, cable networks, pay-per-view, and more. Streaming video over the internet is beginning to add to this flow. Viewers do not have enough time to watch all of this content, and in many cases, even after selecting a few programs of interest, they may want to speed up their viewing of the chosen content, either by summarizing it or by using tools that let them rapidly navigate to the most important parts. New display devices and new viewing environments, for example using a cell phone to watch content while riding the bus, will also increase the need for new video summarization and management tools. Video summarization tools can vary substantially in their goals. For example, tools may seek to create a set of still-image keyframes, or they may create a condensed video skim [14]. Even after specifying the format of the summary, there can be different semantic objectives for the summary. A summary meant to best convey the plot of a situation comedy could differ substantially from a summary meant to show the funniest few scenes from the show. Most of these processing goals remain unachieved despite over a decade of work on video summarization. The fundamental reason for this difficulty is the "semantic gap," the large separation between computationally easy-to-extract audio and visual features and semantically meaningful items such as spoken words, visual objects, and elements of narrative structure. Because most video summarization goals are stated in semantic terms ("the most informative summary," "the most exciting plays of the match"), while our computational tools are best at extracting simple features such as audio energy and color histograms, we must find some way to bridge these two domains.
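To make the feature-to-semantics bridge concrete, below is a minimal sketch of the supervised-learning approach named in the title: compute easy-to-extract low-level features (a per-frame color histogram plus short-term audio energy) and train a classifier to predict a semantic segment label. The 8-bin-per-channel histogram, the synthetic stand-in data, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the chapter's exact features or model.

    # Sketch: low-level audio-visual features feeding a supervised classifier.
    # Feature definitions and the classifier choice are illustrative assumptions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def color_histogram(frame, bins=8):
        """Concatenated per-channel histogram of an RGB frame (H x W x 3, uint8)."""
        hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
                 for c in range(3)]
        h = np.concatenate(hists).astype(float)
        return h / h.sum()  # normalize so the feature is independent of frame size

    def audio_energy(samples):
        """Mean squared amplitude of the audio samples aligned with one frame."""
        return np.mean(np.asarray(samples, dtype=float) ** 2)

    def frame_features(frame, samples):
        return np.append(color_histogram(frame), audio_energy(samples))

    # Synthetic stand-in data: 200 frames with binary segment labels
    # (e.g. "program" vs. "commercial").
    rng = np.random.default_rng(0)
    frames = rng.integers(0, 256, size=(200, 36, 64, 3), dtype=np.uint8)
    audio = rng.normal(size=(200, 1470))  # roughly one frame's worth of 44.1 kHz audio
    labels = rng.integers(0, 2, size=200)

    X = np.array([frame_features(f, a) for f, a in zip(frames, audio)])
    clf = LogisticRegression(max_iter=1000).fit(X[:150], labels[:150])
    print("held-out accuracy:", clf.score(X[150:], labels[150:]))

In a real pipeline, the labels would come from hand-annotated broadcast content, and the classifier's per-frame decisions would be smoothed into contiguous segments rather than used directly.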
