Mitsubishi Electric Research Laboratories

Discovering Meaningful Multimedia Patterns with Audio-Visual Concepts and Associated Text

Citation: Xie, L.; Kennedy, L.; Chang, S-F; Divakaran, A.; Sun, H.; Lin, C-Y, "Discovering Meaningful Multimedia Patterns with Audio-Visual Concepts and Associated Text", IEEE International Conference on Image Processing (ICIP), ISSN: 1522-4880, Vol. 4, pp. 2383-2386, October 2004 (IEEE Xplore)
MERL Report: TR2004-128

This paper presents algorithms for finding the meanings of audio-visual patterns obtained through unsupervised discovery in video. The problem is of interest in domains where neither perceptual patterns nor semantic concepts have simple structures. The patterns are modeled with hierarchical hidden Markov models, using efficient algorithms that jointly learn the model parameters, the optimal model complexity, and the relevant feature subsets. The meanings are drawn from the words of the video's speech transcript. Pattern-word associations are obtained via co-occurrence analysis and machine translation models. Promising results are obtained on TRECVID news videos: video patterns that associate with distinct topics such as El Niño and politics are identified, and a temporal structure model compares favorably to a non-temporal clustering algorithm.
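
The co-occurrence step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each video segment already carries a discrete pattern label (for example, a decoded HHMM state) together with the transcript words spoken during it, and it scores each pattern-word pair by a simple lift ratio, P(word | pattern) / P(word). The function name `cooccurrence_lift`, the toy segments, and the choice of lift as the association score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_lift(segments):
    """Score pattern-word association by lift = P(word | pattern) / P(word).

    segments: iterable of (pattern_label, list_of_words), one per video segment.
    Each word is counted at most once per segment.
    """
    segments = list(segments)
    n = len(segments)
    pattern_count = Counter()          # segments per pattern label
    word_count = Counter()             # segments containing each word
    joint_count = defaultdict(Counter)  # segments with both label and word

    for label, words in segments:
        pattern_count[label] += 1
        for w in set(words):
            word_count[w] += 1
            joint_count[label][w] += 1

    lift = {}
    for label, counts in joint_count.items():
        lift[label] = {
            w: (c / pattern_count[label]) / (word_count[w] / n)
            for w, c in counts.items()
        }
    return lift


# Hypothetical toy data: two pattern labels and a handful of transcript words.
segments = [
    (0, ["storm", "weather", "rain"]),
    (0, ["weather", "flood"]),
    (1, ["president", "senate", "vote"]),
    (1, ["vote", "weather"]),
]
lift = cooccurrence_lift(segments)
```

A lift above 1 means the word appears with the pattern more often than its overall frequency would suggest; in the toy data, "vote" is tied to pattern 1 while "weather" leans toward pattern 0. The machine translation models named in the abstract are a separate technique that this sketch does not cover.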
