TR2005-032

A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from Unscripted Multimedia


    •  Radhakrishnan, R., Divakaran, A., Xiong, Z., Otsuka, I., "A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from Unscripted Multimedia", EURASIP Journal on Applied Signal Processing, Vol. 2006, pp. 1-24, May 2005.
      @article{Radhakrishnan2005may,
        author  = {Radhakrishnan, R. and Divakaran, A. and Xiong, Z. and Otsuka, I.},
        title   = {A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from Unscripted Multimedia},
        journal = {EURASIP Journal on Applied Signal Processing},
        year    = 2005,
        volume  = 2006,
        pages   = {1--24},
        month   = may,
        url     = {https://www.merl.com/publications/TR2005-032}
      }
Abstract:

We propose a content-adaptive analysis and representation framework to discover events for summarization, using audio features from "unscripted" multimedia such as sports and surveillance video. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely against a background of usual or "uninteresting" events. We treat the sequence of low- and mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each detected outlier as the probability that it is an outlier, and then establish a relationship between the parameters of the proposed framework and this confidence measure. Furthermore, we use the confidence measure to rank the detected outliers by their departure from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We then show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, together with the ranking based on departure from the background, can be used to generate content summaries of any desired length. Finally, we show that the proposed framework can also be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.
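As a rough illustration of the outlier-detection step described above, the sketch below (our own simplification, not the paper's exact algorithm) treats a 1-D feature series as a time series, summarizes each fixed-length window with a single Gaussian model, builds an affinity matrix from a symmetrized KL divergence between the window models, and bipartitions the windows with the second eigenvector of the normalized affinity matrix. The window size, the single-Gaussian model, and the synthetic data are all assumptions made for the demo; the paper works with richer statistical models of audio features.

```python
import numpy as np

def detect_outlier_windows(x, win=50):
    """Spectral inlier/outlier segmentation of a 1-D feature time series.

    Each fixed-length window is summarized by a Gaussian (mean, variance),
    an affinity matrix is built from a symmetrized KL divergence between
    the window models, and the second-largest eigenvector of the normalized
    affinity matrix bipartitions the windows; the smaller cluster is
    declared the outlier ("interesting") set.
    """
    n = len(x) // win
    w = np.reshape(x[: n * win], (n, win))
    mu, var = w.mean(axis=1), w.var(axis=1) + 1e-6

    # Symmetrized KL divergence between every pair of window Gaussians.
    dmu2 = (mu[:, None] - mu[None, :]) ** 2
    kl = 0.5 * (var[:, None] / var[None, :] + var[None, :] / var[:, None]
                + dmu2 * (1.0 / var[:, None] + 1.0 / var[None, :])) - 1.0
    A = np.exp(-kl)

    # Normalized affinity (symmetric normalized-cut form); the sign pattern
    # of its second-largest eigenvector splits background from outliers.
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    labels = vecs[:, -2] > 0
    if labels.sum() > n / 2:           # outliers are the minority cluster
        labels = ~labels
    return labels                      # True marks an outlier window

# Demo: quiet Gaussian background with one loud burst (a "highlight").
rng = np.random.default_rng(0)
feat = rng.normal(0.0, 1.0, 1000)
feat[400:500] += rng.normal(5.0, 3.0, 100)   # windows 8 and 9 cover the burst
mask = detect_outlier_windows(feat, win=50)
```

In this toy setting the cross-affinity between the burst windows and the background windows is nearly zero, so the affinity matrix is close to block-diagonal and the second eigenvector cleanly separates the two clusters; a confidence measure, as in the paper, would additionally quantify how far each flagged window departs from the background model.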

 
