Mitsubishi Electric Research Laboratories

Audio-Visual Event Detection for Consumer and Surveillance Video

Audio-visual analysis for event detection in consumer and surveillance video lets us summarize content by including only the "interesting" events in the summary. The focus of this project is to combine audio and visual cues to detect these events, in order to achieve higher accuracy as well as a deeper understanding of audio-visual events.

Background & Objective:  In unscripted content such as sports and surveillance video, "interesting" events happen sparsely against a background of "uninteresting" events. Therefore, by detecting this set of "interesting" events one can summarize the content. Such detection supports both the detection of live events and the summarization of content; summarization in turn enables rapid browsing of stored video, which is applicable to both personal video recorders and surveillance recorders.  The events of interest have characteristic patterns in both video and audio features that may or may not be time aligned. By learning statistical models for these characteristic patterns from the audio-visual features, one can achieve higher accuracy for event detection than is possible when using only one modality (audio or video). In the past we used exclusively video or exclusively audio to detect events for both consumer and surveillance applications. We have used motion activity to detect patterns corresponding to sports highlights as well as changes in highway traffic density. We have also used the "Viola-Jones" real-time object detection framework to detect key video objects in sports video, such as goal posts in soccer and catchers in baseball. We have also used audio to detect sports highlights by finding long stretches of audience reaction in the form of cheering, applause and the commentator's excited speech. While both modalities provide powerful cues, we can achieve higher detection accuracy by combining the results from audio and video analysis.
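As an illustration of the audio cue mentioned above, the following is a minimal sketch of finding long stretches of audience reaction in a sequence of per-window audio class labels. It is not the project's implementation: the label names, the function name find_reaction_segments, and the parameters min_length and hop_seconds are all assumptions made for the example, and the classifier that produces the labels is taken as given.

```python
from itertools import groupby

# Hypothetical label names; the text only names the kinds of reaction.
REACTION_LABELS = {"cheering", "applause", "excited_speech"}


def find_reaction_segments(labels, min_length=3, hop_seconds=1.0):
    """Return (start_sec, end_sec) pairs for runs of reaction labels.

    labels      -- one audio class label per analysis window, in time order
    min_length  -- minimum run length, in windows, to count as a "long stretch"
    hop_seconds -- assumed time step between consecutive windows
    """
    segments = []
    index = 0  # window index where the current run starts
    for is_reaction, run in groupby(labels, key=lambda lab: lab in REACTION_LABELS):
        run_length = len(list(run))
        if is_reaction and run_length >= min_length:
            start = index * hop_seconds
            end = (index + run_length) * hop_seconds
            segments.append((start, end))
        index += run_length
    return segments


if __name__ == "__main__":
    demo_labels = [
        "speech", "cheering", "applause", "cheering", "speech",
        "applause", "speech", "excited_speech", "cheering",
        "applause", "applause",
    ]
    print(find_reaction_segments(demo_labels))
```

Consecutive windows with any of the reaction labels are merged into one run, so a stretch that alternates between cheering and applause still counts as a single segment. Short isolated reactions are dropped by the min_length threshold.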

Technical Discussion:  The challenge lies in identifying key audio-visual markers (objects) that are indicative of events of interest in a domain, and then in developing statistical models to detect these markers. For instance, in sports video the audience reaction audio class gets us in the vicinity of an "interesting" event. However, to get to the beginning of the interesting event one needs to find a suitable key video object such as the goal post or the baseball catcher. Then, by associating a video marker with an audio marker, one can capture the whole event. Another challenge is handling streaming video, as opposed to stored video.
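The association of a video marker with an audio marker can be sketched as follows. This is one simple pairing rule under stated assumptions, not the statistical models used in the project: each audience reaction segment is paired with the latest key video object detection that precedes it, and the event is taken to run from that detection to the end of the reaction. The function name associate_markers and the max_lead parameter (the longest allowed gap, in seconds, between the video marker and the start of the reaction) are hypothetical.

```python
from bisect import bisect_right


def associate_markers(video_marker_times, audio_segments, max_lead=30.0):
    """Pair each audio reaction segment with a preceding video marker.

    video_marker_times -- times (seconds) at which a key video object was detected
    audio_segments     -- (start_sec, end_sec) pairs of audience reaction
    max_lead           -- assumed maximum gap between marker and reaction start

    Returns (event_start, event_end) pairs, where event_start is the video
    marker time and event_end is the end of the audio reaction.
    """
    times = sorted(video_marker_times)
    events = []
    for audio_start, audio_end in audio_segments:
        # Position of the first marker strictly after the reaction start.
        pos = bisect_right(times, audio_start)
        if pos == 0:
            continue  # no video marker at or before this reaction
        marker = times[pos - 1]
        if audio_start - marker <= max_lead:
            events.append((marker, audio_end))
    return events


if __name__ == "__main__":
    markers = [10.0, 95.0, 200.0]
    reactions = [(20.0, 32.0), (150.0, 160.0), (5.0, 8.0)]
    print(associate_markers(markers, reactions))
```

In the usage example, only the first reaction has a video marker within 30 seconds before it, so only one event is reported. Because the rule looks only backward in time from each reaction, it could in principle be applied to streaming video as detections arrive, although that use is an assumption of this sketch.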

Publications:
Divakaran, A.; Radhakrishnan, R.; Peker, K.A., "Blind Summarization: Content-Adaptive Video Summarization Using Time-Series Analysis", SPIE Conference Multimedia Content Analysis, Management and Retrieval, Vol. 6073, pp. 6-10, January 2006 (SPIE Proceedings, TR2006-026)

Radhakrishnan, R.; Divakaran, A.; Smaragdis, P., "Audio Analysis for Surveillance Applications", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 158-161, October 2005 (IEEE Xplore, TR2005-139)

Goh, K-S; Miyahara, K.; Radhakrishnan, R.; Xiong, Z.; Divakaran, A., "Audio-Visual Event Detection Based on Mining of Semantic Audio-Visual Labels", SPIE Conference on Storage and Retrieval for Multimedia Databases, Vol. 5307, pp. 292-299, January 2004 (SPIE Proceedings, TR2004-008)

Divakaran, A.; Miyahara, K.; Peker, K.A.; Radhakrishnan, R.; Xiong, Z., "Video Mining Using Combinations of Unsupervised and Supervised Learning Techniques", SPIE Conference on Storage and Retrieval for Multimedia Databases, Vol. 5307, pp. 235-243, January 2004 (SPIE Proceedings, TR2004-007)

Divakaran, A.; Peker, K.A.; Radhakrishnan, R.; Xiong, Z.; Cabasson, R., "Video Summarization Using MPEG-7 Motion Activity and Audio Descriptors", Video Mining, Rosenfeld, A.; Doermann, D.; DeMenthon, D., October 2003 (Kluwer Academic Publishers, TR2003-034)

Technology Areas:
Computer Vision
Audio Video Processing
Digital Video

Modification Date:  June 13, 2008