    Group Activity Recognition With Differential Recurrent Convolutional Neural Networks

    Human group activity recognition has drawn the attention of researchers worldwide because of the significant role it plays in many applications, including video surveillance and public security. Existing solutions for group activity recognition rely on human detection and tracking. To ensure high detection accuracy, current state-of-the-art tracking techniques require human supervision to identify objects of interest before automatic tracking can take place. This limitation has prevented existing approaches from being used in real-world applications. In scenarios where human supervision is unavailable, tracking algorithms can generate inaccurate trajectories and degrade the performance of existing group analysis methods. To address these drawbacks, we investigate in this paper an end-to-end deep model, Differential Recurrent Convolutional Neural Networks (DRCNN). Our model consists of convolutional neural networks (CNN) and stacked differential long short-term memory (DLSTM) networks. It takes sequential raw video data as input and does not consider each group member as an individual object. Unlike traditional non-end-to-end solutions, which separate the steps of feature extraction and parameter learning, DRCNN uses a unified deep model to optimize the parameters of CNN and LSTM hand in hand. It thus has the potential to generate a more harmonious model. In addition, by taking advantage of the semantic representation of CNN and the memory states of DLSTM, DRCNN has strong capabilities in understanding complex scene semantics and group dynamics. Extensive experimental studies indicate that the proposed technique can accomplish the task of fully automatic group activity recognition without sacrificing performance, and even outperforms the human-aided state-of-the-art methods on two benchmark group activity datasets. To the best of our knowledge, this is the first end-to-end group activity recognition technique ever proposed.