Leveraging Self-Supervised Training for Unintentional Action Recognition

Abstract

Unintentional actions are rare occurrences that are difficult to define precisely and that are highly dependent on the temporal context of the action. In this work, we explore such actions and seek to identify the points in videos where the actions transition from intentional to unintentional. We propose a multi-stage framework that exploits inherent biases such as motion speed, motion direction, and order to recognize unintentional actions. To enhance representations via self-supervised training for the task of unintentional action recognition, we propose temporal transformations, called Temporal Transformations of Inherent Biases of Unintentional Actions (T2IBUA). The multi-stage approach models the temporal information on both the level of individual frames and full clips. These enhanced representations show strong performance for unintentional action recognition tasks. We provide an extensive ablation study of our framework and report results that significantly improve over the state-of-the-art.
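To illustrate the kind of self-supervised signal the abstract describes, the sketch below shows how temporal transformations targeting motion speed, direction, and order might be applied to a clip's frame indices, with the applied transformation serving as a free pretext label. The function name, the specific transformations, and the subsampling rate are illustrative assumptions, not the paper's actual implementation.

```python
import random

def temporal_transform(frame_indices, transform):
    """Hypothetical sketch: perturb a clip's frame order to create a
    self-supervised pretext task (which transformation was applied?).
    The transformation set mirrors the biases named in the abstract:
    motion speed, motion direction, and temporal order."""
    if transform == "speed":
        # Subsample every other frame -> apparent motion speed-up.
        return frame_indices[::2]
    if transform == "reverse":
        # Reverse playback -> inverted motion direction.
        return frame_indices[::-1]
    if transform == "shuffle":
        # Randomly permute frames -> broken temporal order.
        shuffled = list(frame_indices)
        random.shuffle(shuffled)
        return shuffled
    # Identity: the untransformed (plausibly intentional-looking) clip.
    return list(frame_indices)

# Usage: the model is trained to predict which transform produced the clip.
clip = list(range(8))
print(temporal_transform(clip, "speed"))    # → [0, 2, 4, 6]
print(temporal_transform(clip, "reverse"))  # → [7, 6, 5, 4, 3, 2, 1, 0]
```

In practice such transformed clips would be fed to the frame-level and clip-level stages of the framework, with the transformation identity as the pretraining target.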