TR2003-119

Investigation on Effectiveness of Mid-level Feature Representation for Semantic Boundary Detection in News Video


    •  Radhakrishan, R., Xiong, Z., Divakaran, A., Raj, B., "Investigation on Effectiveness of Mid-Level Feature Representation for Semantic Boundary Detection in News Video", SPIE Conference on Internet Multimedia Management Systems, September 2003, vol. 5242, pp. 74-80.
      @inproceedings{Radhakrishan2003sep,
        author = {Radhakrishan, R. and Xiong, Z. and Divakaran, A. and Raj, B.},
        title = {Investigation on Effectiveness of Mid-Level Feature Representation for Semantic Boundary Detection in News Video},
        booktitle = {SPIE Conference on Internet Multimedia Management Systems},
        year = 2003,
        volume = 5242,
        pages = {74--80},
        month = sep,
        url = {https://www.merl.com/publications/TR2003-119}
      }
Abstract:

In our past work, we attempted to use a mid-level feature, namely the state population histogram obtained from the Hidden Markov Model (HMM) of a general sound class, for speaker change detection in order to extract semantic boundaries in broadcast news. In this paper, we compare the performance of our previous approach with another approach based on video shot detection and speaker change detection using the Bayesian Information Criterion (BIC). Our experiments show that the latter approach performs significantly better than the former. This motivated us to examine the mid-level feature more closely. We found that the component population histogram enabled discovery of broad phonetic categories such as vowels, nasals, and fricatives, regardless of the number of distinct speakers in the test utterance. For the histogram to be useful for speaker change detection, the individual components would need to model the phonetic sounds of each speaker separately. From our experiments, we conclude that state/component population histograms can only be useful for further clustering or semantic class discovery if the features are chosen carefully so that the individual states represent the semantic categories of interest.
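
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that a population histogram is the fraction of frames in a window assigned to each HMM state or mixture component, and that the BIC test compares a single full-covariance Gaussian over a window against two Gaussians split at a candidate change point. The names `population_histogram`, `delta_bic`, and `penalty_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def population_histogram(state_sequence, num_states):
    """Fraction of frames assigned to each state (or mixture component).

    Assumption: `state_sequence` holds one integer label per frame, for
    example from Viterbi decoding. The sequence must be non-empty.
    """
    counts = np.bincount(np.asarray(state_sequence), minlength=num_states)
    return counts / counts.sum()


def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for a candidate change point at frame index `split`.

    Assumption: each segment is modeled by one full-covariance Gaussian.
    Positive values favor the two-segment model, i.e. a change at `split`.
    """
    x = np.asarray(features, dtype=float)
    n, d = x.shape
    left, right = x[:split], x[split:]

    def logdet_cov(segment):
        # Maximum-likelihood covariance (bias=True); reshape covers d == 1.
        cov = np.cov(segment, rowvar=False, bias=True).reshape(d, d)
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain from splitting the window into two Gaussians.
    gain = 0.5 * (
        n * logdet_cov(x)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    # Extra free parameters of the second Gaussian: mean plus covariance.
    num_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * num_params * np.log(n)
    return gain - penalty
```

In a sketch like this, a change detector would slide `split` across a window of frame-level features and flag the position with the largest positive `delta_bic`. The histogram, by contrast, discards frame order within the window, which is consistent with the abstract's observation that it reflects which sound categories occur rather than who is speaking.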

 

  • Related News & Events

    •  NEWS    SPIE Conference on Internet Multimedia Management Systems 2003: 2 publications by Ajay Divakaran and others
      Date: September 9, 2003
      Where: SPIE Conference on Internet Multimedia Management Systems
      Brief
      • The papers "An Extended Framework for Adaptive Playback-Based Video Summarization" by Peker, K.A. and Divakaran, A. and "Investigation on Effectiveness of Mid-Level Feature Representation for Semantic Boundary Detection in News Video" by Radhakrishan, R., Xiong, Z., Divakaran, A. and Raj, B. were presented at the SPIE Conference on Internet Multimedia Management Systems.