Video Querying Via Compact Descriptors of Visually Salient Objects

We consider the problem of extracting descriptors that represent visually salient portions of a video sequence. Most state-of-the-art schemes generate video descriptors by extracting keypoint-based features, e.g., SIFT or SURF, from individual video frames. This approach is wasteful in scenarios that constrain storage, communication overhead, and the allowable computational complexity of video querying. More importantly, the resulting descriptors generally provide no semantic clues about the video content. In this paper, we investigate new feature-agnostic approaches for efficient retrieval of similar video content. We first evaluate the efficiency and accuracy of retrieval when k-means clustering is applied to image features extracted from video frames. We then propose a new approach in which the extraction of compact video descriptors is cast as a Non-negative Matrix Factorization (NMF) problem. Initial experiments on video-based matching suggest that compact descriptors obtained via low-rank matrix factorization are more discriminative and more robust to parameter selection than those obtained via k-means clustering.
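
As an illustration only, the two descriptor constructions contrasted above can be sketched as follows. The sketch assumes that the features of a video are stacked as the columns of a non-negative matrix X of size d x n, and that the NMF is computed with standard multiplicative updates for the Frobenius-norm objective; the abstract does not specify the solver, the matrix layout, or the synthetic data used here, and the function names nmf_descriptor and kmeans_descriptor are hypothetical.

```python
import numpy as np


def nmf_descriptor(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative feature matrix X (d x n) as X ~ L @ R.

    The columns of L (d x rank) act as a compact descriptor of the video.
    Multiplicative updates are an assumed solver, not the paper's.
    """
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("NMF requires non-negative features")
    rng = np.random.default_rng(seed)
    d, n = X.shape
    L = rng.random((d, rank)) + eps
    R = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not
        # increase the reconstruction error ||X - L @ R||_F.
        R *= (L.T @ X) / (L.T @ L @ R + eps)
        L *= (X @ R.T) / (L @ R @ R.T + eps)
    # Normalise descriptor columns to unit length; absorb the scale into R.
    norms = np.linalg.norm(L, axis=0) + eps
    return L / norms, R * norms[:, None]


def kmeans_descriptor(X, k, n_iter=50, seed=0):
    """Baseline: the k centroids of the feature columns (Lloyd's algorithm)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(X.shape[1], size=k, replace=False)
    centroids = X[:, start].copy()
    for _ in range(n_iter):
        # Squared distance from every centroid to every feature: shape (k, n).
        d2 = ((X[:, None, :] - centroids[:, :, None]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=0)
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] > 0:  # leave an empty cluster unchanged
                centroids[:, j] = members.mean(axis=1)
    return centroids


# Synthetic stand-in for the features of one video: 60 non-negative
# 128-dimensional vectors with an underlying rank of 4 (made-up data).
rng = np.random.default_rng(1)
X = rng.random((128, 4)) @ rng.random((4, 60))

L, R = nmf_descriptor(X, rank=4)
C = kmeans_descriptor(X, k=4)
rel_err = np.linalg.norm(X - L @ R) / np.linalg.norm(X)
print("NMF descriptor:", L.shape, "k-means descriptor:", C.shape)
print("relative NMF reconstruction error:", rel_err)
```

In both cases the n feature vectors of a video are replaced by a d x r matrix with r much smaller than n, which is what makes the descriptor compact. How two such descriptors are compared during querying is not stated in the abstract and is left out of the sketch.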