Human action recognition from video sequences is one of the fundamental problems in computer vision. In this research, we propose a representation learning approach to this problem, which we call discriminative subspace pooling. Specifically, we combine recent deep learning approaches with techniques for generating adversarial perturbations to learn novel representations that summarize long video sequences into compact descriptors. These descriptors capture the properties of the input videos that are sufficient to achieve good recognition rates. We make two contributions. First, we propose a subspace-based discriminative classifier, similar to a non-linear SVM but with piecewise-linear decision boundaries that lie along orthogonal directions (and thus form a subspace). Computing such decision boundaries does not require kernel space embeddings; it can be achieved using Riemannian optimization techniques. However, a classifier needs a negative set to classify against. To this end, our second contribution is to generate the negative set by applying universal adversarial perturbations to deep features computed from the input videos. These perturbations are chosen so that the perturbed deep features are highly likely to be misclassified by the classifier originally trained on them. Our learned subspace thus picks up those dimensions in the data that are vulnerable to misclassification, implicitly capturing deep features that are action related.
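The idea of learning an orthonormal subspace with Riemannian optimization can be illustrated with a small NumPy sketch. This is not the released implementation: the objective (mean projected energy of perturbed features minus that of clean features) is a simplified stand-in for the paper's classifier, the "perturbation" is a single shared random vector rather than a true universal adversarial perturbation derived from a trained network, and the names `qr_retract`, `objective`, and `learn_subspace` are hypothetical. The optimizer is plain Riemannian gradient descent on the Stiefel manifold with a QR retraction.

```python
import numpy as np

def qr_retract(A):
    """Map a full-rank d x p matrix to one with orthonormal columns."""
    Q, R = np.linalg.qr(A)
    # Fix column signs so that the diagonal of R is positive.
    signs = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * signs

def objective(U, M):
    """tr(U^T M U): perturbed energy minus clean energy inside span(U)."""
    return float(np.trace(U.T @ M @ U))

def learn_subspace(X, Y, p=3, iters=200, seed=0):
    """Illustrative subspace learning (assumed objective, not the paper's).

    X: (n, d) clean per-frame features (positive set).
    Y: (n, d) perturbed features (negative set).
    Returns U (d, p) with orthonormal columns and the matrix M.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    M = (Y.T @ Y - X.T @ X) / n          # symmetric d x d matrix
    U = qr_retract(rng.standard_normal((d, p)))
    step = 0.1 / (np.linalg.norm(M, 2) + 1e-12)
    for _ in range(iters):
        G = 2.0 * M @ U                  # Euclidean gradient of tr(U^T M U)
        sym = 0.5 * (U.T @ G + G.T @ U)
        rgrad = G - U @ sym              # project onto the tangent space at U
        U = qr_retract(U - step * rgrad) # step, then return to the manifold
    return U, M

# Toy data: 40 "frames" of 16-dimensional features (synthetic, for illustration).
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 16))
delta = 0.5 * rng.standard_normal(16)    # one shared perturbation (stand-in)
Y = X + delta

U, M = learn_subspace(X, Y, p=3)
descriptor = U.ravel()                   # flattened subspace as a descriptor (assumption)
```

Because the stand-in objective is a trace form, its optimum could also be read off an eigendecomposition of `M`; the iterative version is shown only to make the manifold step (tangent-space projection followed by retraction) concrete.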
This software implements Discriminative Subspace Pooling (DSP) for video-based action recognition. The software has two modules: (i) one that generates adversarial perturbations using a fully-connected neural network, and (ii) one that computes the DSP descriptors using these perturbations. We also provide sample feature data (from a subset of the popular HMDB51 dataset) to demonstrate how our scheme works. The approach presented in this code was published at the 2018 European Conference on Computer Vision (ECCV) in a paper titled "Discriminative Subspace Pooling Using Adversarial Perturbations".
To download the software, please enter some information about yourself, then review and agree to MERL's research-only licensing terms.