2,905 research outputs found

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for analysing people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, meanwhile, plays a key role in action recognition and affective computing: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements. Both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; in the second, a solution based on 2D skeletons and two-branch stacked LSTM-RNNs is presented for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them well suited to analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared to current literature methods.
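
    As a concrete illustration of the second module's design, below is a minimal PyTorch sketch of a two-branch stacked LSTM over 2D skeleton sequences. The branch inputs (joint positions and frame-to-frame motions), layer sizes, and class count are illustrative assumptions, not the thesis's exact configuration.

        # Hypothetical two-branch stacked LSTM for 2D-skeleton action recognition.
        import torch
        import torch.nn as nn

        class TwoBranchStackedLSTM(nn.Module):
            def __init__(self, n_joints=18, hidden=128, n_classes=60):
                super().__init__()
                feat = n_joints * 2  # (x, y) coordinates per joint
                # Two stacked (2-layer) LSTM branches: raw positions and motions.
                self.pos_branch = nn.LSTM(feat, hidden, num_layers=2, batch_first=True)
                self.mot_branch = nn.LSTM(feat, hidden, num_layers=2, batch_first=True)
                self.classifier = nn.Linear(2 * hidden, n_classes)

            def forward(self, skel):  # skel: (batch, time, n_joints * 2)
                motion = skel[:, 1:] - skel[:, :-1]  # frame-to-frame displacements
                _, (h_pos, _) = self.pos_branch(skel)
                _, (h_mot, _) = self.mot_branch(motion)
                # Fuse the top-layer final hidden states of the two branches.
                fused = torch.cat([h_pos[-1], h_mot[-1]], dim=-1)
                return self.classifier(fused)

        model = TwoBranchStackedLSTM()
        logits = model(torch.randn(4, 32, 36))  # 4 clips, 32 frames, 18 joints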

    Video Action Transformer Network

    We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution, person-specific, class-agnostic queries, the model spontaneously learns to track individual people and to pick up on semantic context from the actions of others. Additionally, its attention mechanism learns to emphasize hands and faces, which are often crucial to discriminating an action, all without explicit supervision other than boxes and class labels. We train and test our Action Transformer network on the Atomic Visual Actions (AVA) dataset, outperforming the state of the art by a significant margin using only raw RGB frames as input. (CVPR 2019)
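
    The core mechanism can be sketched compactly: a pooled, person-specific query attends over flattened spatiotemporal context features. The sketch below uses PyTorch's MultiheadAttention; the feature dimensions and classifier head are placeholder assumptions, not the paper's exact architecture.

        # Hypothetical person-query attention over spatiotemporal context.
        import torch
        import torch.nn as nn

        d = 128
        attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

        person_query = torch.randn(1, 1, d)        # pooled feature of the person box
        context = torch.randn(1, 4 * 14 * 14, d)   # flattened T x H x W feature map

        # The query attends over the whole clip; the weights reveal which
        # spatiotemporal locations (hands, faces, other actors) it relies on.
        updated, weights = attn(person_query, context, context)
        action_logits = nn.Linear(d, 80)(updated.squeeze(1))  # AVA has 80 classes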

    Smart environment monitoring through micro unmanned aerial vehicles

    In recent years, the improvements of small-scale Unmanned Aerial Vehicles (UAVs) in terms of flight time, automatic control, and remote transmission are promoting the development of a wide range of practical applications. In aerial video surveillance, monitoring broad areas still presents many challenges, since several tasks, including mosaicking, change detection, and object detection, must be performed in real time. In this thesis, a vision system based on a small-scale UAV for maintaining regular surveillance over target areas is proposed. The system works in two modes. The first mode monitors an area of interest across several flights. During the first flight, it creates an incremental geo-referenced mosaic of the area and classifies all the known elements (e.g., persons) found on the ground using a previously trained, improved Faster R-CNN architecture. In subsequent reconnaissance flights, the system searches for any changes (e.g., the disappearance of persons) that may occur in the mosaic using an algorithm based on histogram equalization and RGB Local Binary Patterns (RGB-LBP); if changes are present, the mosaic is updated. The second mode performs real-time classification, again using the improved Faster R-CNN model, which is useful for time-critical operations. Thanks to several design features, the system works in real time and performs the mosaicking and change detection tasks at low altitude, thus allowing the classification of even small objects. The proposed system was tested on the whole set of challenging video sequences contained in the UAV Mosaicking and Change Detection (UMCD) dataset and on other public datasets. Evaluation with well-known performance metrics has shown remarkable results in terms of mosaic creation and updating, as well as change detection and object detection.
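
    The change detection step can be illustrated with a tile-wise comparison of per-channel LBP histograms computed after histogram equalization. The descriptor parameters, distance measure, and threshold below are assumptions for illustration, not the thesis's tuned pipeline.

        # Hypothetical RGB-LBP change test between two aligned mosaic tiles.
        import numpy as np
        from skimage import exposure
        from skimage.feature import local_binary_pattern

        def rgb_lbp_hist(tile, P=8, R=1.0):
            """Concatenate uniform-LBP histograms of the R, G, B channels."""
            hists = []
            for c in range(3):
                # Histogram equalization normalizes illumination per channel.
                chan = (exposure.equalize_hist(tile[..., c]) * 255).astype(np.uint8)
                lbp = local_binary_pattern(chan, P, R, method="uniform")
                h, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
                hists.append(h)
            return np.concatenate(hists)

        def changed(tile_old, tile_new, thresh=0.25):
            """Flag a change when the chi-square histogram distance is large."""
            h1, h2 = rgb_lbp_hist(tile_old), rgb_lbp_hist(tile_new)
            chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-9))
            return chi2 > thresh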

    Protective Behavior Detection in Chronic Pain Rehabilitation: From Data Preprocessing to Learning Model

    Chronic pain (CP) rehabilitation extends beyond physiotherapist-directed clinical sessions and primarily takes place in people's everyday lives. Unfortunately, self-directed rehabilitation is difficult because patients need to deal with both their pain and the mental barriers that pain imposes on routine functional activities. Physiotherapists adjust patients' exercise plans and advice in clinical sessions based on the amount of protective behavior (i.e., a sign of anxiety about movement) displayed by the patient. The goal of such adjustments is to help patients overcome their fears and maintain physical functioning. Unfortunately, physiotherapists' support is absent during self-directed rehabilitation, also called self-management, that people conduct in their daily lives. To be effective, technology for chronic-pain self-management should be able to detect protective behavior in order to facilitate personalized support. To that end, this thesis addresses the key challenges of ubiquitous automatic protective behavior detection (PBD). Our investigation takes advantage of an available dataset (EmoPain) containing movement and muscle activity data from healthy people and people with CP engaged in typical everyday activities. First, we examine data augmentation methods and segmentation parameters using various vanilla neural networks in order to enable activity-independent PBD within pre-segmented activity instances. Second, by incorporating temporal and bodily attention mechanisms, we improve PBD performance and support the theoretical/clinical understanding that, during feared movements, the attention of a person with CP shifts between body parts perceived as risky. Third, we use human activity recognition (HAR) to improve continuous PBD in data covering various activity types. The approaches above are validated against a ground truth established by majority voting among expert annotators. Unfortunately, such majority-voted ground truth causes information loss, whereas learning directly from all annotators is vulnerable to noise from disagreements. As the final study, we improve learning from multiple annotators by leveraging the agreement information for regularization.
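
    One plausible reading of that final idea: keep every annotator's label and use their agreement to weight the training objective. The PyTorch sketch below shows such an agreement-weighted loss; it is an assumed formulation and may differ from the thesis's actual regularizer.

        # Hypothetical agreement-weighted loss over multiple annotators' labels.
        import torch
        import torch.nn.functional as F

        def agreement_weighted_loss(logits, labels):
            """logits: (batch, 2); labels: (batch, n_annotators), values in {0, 1}."""
            soft = labels.float().mean(dim=1)       # fraction voting "protective"
            majority = (soft > 0.5).long()          # majority-vote target
            # Agreement in [0, 1]: 1 = unanimous, 0 = an even split.
            agreement = (soft - 0.5).abs() * 2
            per_sample = F.cross_entropy(logits, majority, reduction="none")
            # Down-weight ambiguous samples rather than discarding disagreement.
            return (agreement * per_sample).mean()

        loss = agreement_weighted_loss(torch.randn(8, 2),
                                       torch.randint(0, 2, (8, 4)))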