Application-agnostic spatio-temporal hand graph representations for stable activity understanding


Understanding complex hand actions, such as assembly tasks or kitchen activities, from hand skeleton data is an important yet challenging problem. This paper introduces a hand graph-based spatio-temporal feature extraction method that uniquely represents complex hand actions in an unsupervised manner. To evaluate the efficacy of the proposed representation, we consider action segmentation and recognition tasks. The segmentation problem involves an assembly task in an industrial setting, while the recognition problem deals with kitchen and office activities. Additionally, for both segmentation and recognition models, we propose notions of stability, which we use to demonstrate the robustness of our approach. Specifically, we introduce validation loss stability (ValS) and estimation stability with cross-validation (EtS) to analyze the robustness of any supervised classification model. The proposed method shows classification performance comparable to state-of-the-art methods, while achieving significantly better accuracy and stability in a cross-person setting.