250,141 research outputs found

    Geometric Invariance In The Analysis Of Human Motion In Video Data

    Human motion analysis is one of the major problems in computer vision research. It deals with the study of the motion of the human body in video data from different aspects, ranging from the tracking of body parts and the reconstruction of 3D human body configuration, to higher-level interpretation of human actions and activities in image sequences. When human motion is observed through a video camera, it is perspectively distorted and may appear totally different from different viewpoints. It is therefore highly challenging to establish correct relationships between human motions across video sequences with different camera settings. In this work, we investigate geometric invariance in the motion of the human body, which is critical for accurately understanding human motion in video data regardless of variations in camera parameters and viewpoints. In human action analysis, the representation of the action is a central issue: it usually determines the nature of the solutions, including their limits in resolving the problem. Unlike existing research that studies human motion as a whole 2D/3D object or a sequence of postures, we study human motion as a sequence of body pose transitions. We further decompose a human body pose into a number of body point triplets, and break down a pose transition into the transitions of a set of body point triplets. In this way, the study of the complex non-rigid motion of the human body is reduced to that of the motion of rigid body point triplets, i.e. a collection of planes in motion. As a result, projective geometry and linear algebra can be applied to explore geometric invariance in human motion. Based on this formulation, we have discovered the fundamental ratio invariant and the eigenvalue equality invariant in human motion. We also propose solutions based on these geometric invariants to the problems of view-invariant recognition of human postures and actions, as well as the analysis of human motion styles. These invariants and their applicability have been validated by experimental results, which support their effectiveness in understanding human motion under various camera parameters and viewpoints.
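
    To make the viewpoint problem concrete, here is a minimal numpy sketch (with made-up camera matrices and an arbitrary shoulder-elbow-wrist triplet; it is not the thesis' invariant computation): it projects one rigid body point triplet through two cameras and shows how differently the same pose appears across viewpoints, which is precisely the variation the proposed invariants are designed to factor out.

    # Illustrative sketch, not the thesis' algorithm: a rigid body point
    # triplet spans a plane, and its image coordinates change drastically
    # with the camera. All camera parameters below are made-up values.
    import numpy as np

    def project(P, X):
        """Project 3D points X (N x 3) with a 3x4 camera matrix P."""
        Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coords
        x = (P @ Xh.T).T
        return x[:, :2] / x[:, 2:3]                    # perspective division

    # A body point triplet (e.g. shoulder, elbow, wrist) in world coordinates.
    triplet = np.array([[0.0, 1.5, 3.0],
                        [0.3, 1.2, 3.1],
                        [0.5, 0.9, 3.0]])

    # Two hypothetical cameras: identical intrinsics, different viewpoints.
    K = np.diag([800.0, 800.0, 1.0])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    R = np.array([[0.866, 0.0, 0.5],
                  [0.0,   1.0, 0.0],
                  [-0.5,  0.0, 0.866]])                # ~30-degree yaw
    P2 = K @ np.hstack([R, np.array([[0.2], [0.0], [0.1]])])

    print("view 1:\n", project(P1, triplet))
    print("view 2:\n", project(P2, triplet))  # same pose, very different pixels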

    Video-based similar gesture action recognition using deep learning and GAN-based approaches

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    Human action is not merely a matter of patterns of motion of different parts of the body; it is also an expression of the intention, emotion and thoughts of the person. Hence, it has become a crucial component in human behavior analysis and understanding. Human action recognition has a wide variety of applications such as surveillance, robotics, health care, video searching and human-computer interaction. Analysing human actions manually is tedious and error-prone. Therefore, computer scientists have been trying to bring the abilities of cognitive video understanding to human action recognition systems by using computer vision techniques. However, human action recognition is a complex task in computer vision because of camera motion, occlusion, background clutter, viewpoint variation, varying execution rates and similar gestures. These challenges significantly degrade the performance of human action recognition systems. The purpose of this research is to propose solutions based on traditional machine learning methods as well as state-of-the-art deep learning methods to automatically process video-based human action recognition. This thesis investigates three research areas of video-based human action recognition: traditional human action recognition, similar gesture action recognition, and data augmentation for human action recognition. To start with, feature-based methods using classic machine learning algorithms are studied. Recently, deep convolutional neural networks (CNNs) have taken their place in computer vision and human action recognition research and have achieved tremendous success compared with traditional machine learning techniques; current state-of-the-art CNNs are applied to the human action recognition task. Furthermore, recurrent neural networks (RNNs) and their long short-term memory (LSTM) variant are used to process time-series features, either handcrafted or extracted from the CNN. However, these methods struggle with the similar gestures that appear in human action videos. Thus, a hierarchical classification framework is proposed for similar gesture action recognition, and its performance is improved by the multi-stage classification approach. Additionally, the framework is extended into an end-to-end system, so that similar gestures can be processed automatically. A novel data augmentation framework for action recognition is also proposed; the objective is to generate well-learnt video frames from action videos, which can enlarge the dataset size as well as the feature bias. It is very important for a human action recognition system to recognize actions with similar gestures as accurately as possible. For such a system, a generative adversarial net (GAN) is applied to learn the original video datasets and generate video frames by playing an adversarial game. Furthermore, a framework is developed that first classifies the original dataset with a CNN to obtain a confusion matrix. Similar gesture actions are paired based on the confusion matrix results, and the final classification is performed on a fusion dataset which contains both original and generated video frames. This study provides real-time, practical solutions for autonomous human action recognition systems.
The analysis of similar gesture actions improves the performance of existing CNN-based approaches. In addition, the GAN-based approaches from computer vision have been applied to the graph embedding area, because graph embedding is similar to image embedding but serves different purposes. Unlike in computer vision, where the GAN generates images, in graph embedding the GAN can be used to regularize the embedding. The proposed methods are thus able to reconstruct both structural characteristics and node features, naturally capturing the interaction between these two sources of information while learning the embedding.
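
    As a rough sketch of the confusion-matrix pairing step described above (the CNN itself is abstracted away; y_true and y_pred stand in for its predictions on the original dataset), the class pairs with the highest mutual confusion are the "similar gesture" candidates handed to the second classification stage:

    # Hedged sketch, assuming per-video predictions from a first-stage CNN.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    def pair_similar_gestures(y_true, y_pred, n_classes, top_k=3):
        """Return the top_k most mutually confused class pairs (i, j)."""
        cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
        sym = cm.astype(float) + cm.T          # confusion of i with j, both ways
        np.fill_diagonal(sym, 0.0)             # ignore correct classifications
        pairs = [(sym[i, j], i, j)
                 for i in range(n_classes) for j in range(i + 1, n_classes)]
        pairs.sort(reverse=True)
        return [(i, j) for _, i, j in pairs[:top_k]]

    # Toy example: classes 1 and 2 are similar gestures and get mixed up.
    y_true = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
    y_pred = [0, 0, 1, 2, 2, 1, 2, 1, 3, 3]
    print(pair_similar_gestures(y_true, y_pred, n_classes=4, top_k=1))  # [(1, 2)]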

    Exploring Motion Signatures for Vision-Based Tracking, Recognition and Navigation

    As cameras become increasingly popular in intelligent systems, algorithms and systems for understanding video data become increasingly important. There is a broad range of applications, including object detection, tracking, scene understanding, and robot navigation. Besides static information, video data contains rich motion information about the environment. Biological visual systems, like human and animal eyes, are very sensitive to motion information, which has inspired active research on vision-based motion analysis in recent years. The main focus of motion analysis has been on low-level motion representations of pixels and image regions. However, motion signatures can benefit a broader range of applications if further in-depth analysis techniques are developed. In this dissertation, we mainly discuss how to exploit motion signatures to solve problems in two applications: object recognition and robot navigation. First, we use bird species recognition as the application to explore motion signatures for object recognition. We begin with a study of the periodic wingbeat motion of flying birds. To analyze the wing motion of a flying bird, we establish kinematic models for bird wings and obtain wingbeat periodicity in image frames after perspective projection. Time series of salient extremities on bird images are extracted, and the wingbeat frequency is acquired for species classification. Physical experiments show that the frequency-based recognition method is robust to segmentation errors and measurement loss of up to 30%. In addition to the wing motion, the body motion of the bird is also analyzed to extract the flying velocity in 3D space. An interacting multiple model approach is then designed to capture the combined object motion patterns under different environmental conditions. The proposed systems and algorithms are tested in physical experiments, and the results show a false positive rate of around 20% with a false negative rate close to zero. Second, we explore motion signatures for vision-based vehicle navigation. We discover that motion vectors (MVs) encoded in Moving Picture Experts Group (MPEG) videos provide rich information about motion in the environment, which can be used to reconstruct the vehicle's ego-motion and the structure of the scene. However, MVs suffer from high noise levels. To handle this challenge, an error propagation model for MVs is first proposed. Several steps, including MV merging, plane-at-infinity elimination, and planar region extraction, are designed to further reduce noise. The extracted planes are used as landmarks in an extended Kalman filter (EKF) for simultaneous localization and mapping. Results show that the algorithm performs localization and plane mapping with a relative trajectory error below 5.1%. Exploiting the fact that MVs encode both environment information and moving obstacles, we further propose to track moving objects at the same time as localization and mapping. This enables the two critical navigation functionalities, localization and obstacle avoidance, to be performed in a single framework. MVs are labeled as stationary or moving according to their consistency with geometric constraints, so the extracted planes are separated into moving objects and the stationary scene. Multiple EKFs are used to track the static scene and the moving objects simultaneously. In physical experiments, we show a detection rate of moving objects of 96.6% and a mean absolute localization error below 3.5 meters.
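
    The wingbeat-frequency idea lends itself to a small sketch: assuming the time series of a salient wing extremity has already been extracted from the frames (the signal below is synthetic, not real tracking data), the dominant frequency can be read off the FFT magnitude spectrum.

    # Hedged sketch of frequency-based recognition on a synthetic signal.
    import numpy as np

    def wingbeat_frequency(signal, fps):
        """Dominant frequency (Hz) of a real-valued time series via FFT."""
        signal = np.asarray(signal, dtype=float)
        signal = signal - signal.mean()            # remove the DC component
        spectrum = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
        return freqs[np.argmax(spectrum[1:]) + 1]  # skip the zero-frequency bin

    fps = 60.0
    t = np.arange(0, 2.0, 1.0 / fps)
    # Hypothetical wingtip height: an 8 Hz wingbeat plus measurement noise.
    y = np.sin(2 * np.pi * 8.0 * t) + 0.3 * np.random.randn(t.size)
    print(wingbeat_frequency(y, fps))  # ~8.0 Hz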

    Going Deeper into Action Recognition: A Survey

    Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis has evolved from early schemes, often limited to controlled environments, to today's advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications, from video surveillance to human-computer interaction, scientific milestones in action recognition are being achieved more rapidly, quickly rendering once-dominant methods obsolete. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then navigate into the realm of deep-learning-based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable setbacks, in the hope of raising fresh questions and motivating new research directions for the reader.

    A PCA approach to the object constancy for faces using view-based models of the face

    The analysis of object and face recognition by humans attracts a great deal of interest, mainly because of its many applications in various fields, including psychology, security, computer technology, medicine and computer graphics. The aim of this work is to investigate whether a PCA-based mapping approach can offer a new perspective on models of object constancy for faces in human vision. An existing system for facial motion capture and animation, developed for performance-driven animation of avatars, is adapted, improved and repurposed to study face representation in the context of viewpoint and lighting invariance. The main goal of the thesis is to develop and evaluate a new approach to viewpoint invariance that is view-based and allows facial variation to be mapped between different views to construct a multi-view representation of the face. The thesis describes a computer implementation of a model that uses PCA to generate example-based models of the face. The work explores the joint encoding of expression and viewpoint using PCA and the mapping between view-specific PCA spaces. Simultaneous, synchronised video recordings of six views of the face were used to construct multi-view representations, which helped to investigate how well multiple views could be recovered from a single view via the content-addressable memory property of PCA. A similar approach was taken to lighting invariance. Finally, the possibility of constructing a multi-view representation from asynchronous view-based data was explored. The results of this thesis have implications for a continuing research problem in computer vision: recognising faces and objects from different perspectives and under different lighting. They also provide a new approach to understanding viewpoint invariance and lighting invariance in human observers.
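
    The content-addressable recovery described above can be sketched on synthetic data (random low-rank vectors stand in for the six synchronised face views): PCA is fit on concatenated multi-view vectors, the component coefficients of a new sample are estimated from a single observed view, and the remaining views are reconstructed from those coefficients.

    # Hedged sketch, assuming views are concatenated into one vector per sample.
    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, view_dim, n_views, n_components = 200, 50, 6, 10

    # Synthetic multi-view data driven by shared latent factors.
    latent = rng.normal(size=(n_samples, n_components))
    mixing = rng.normal(size=(n_components, n_views * view_dim))
    X = latent @ mixing + 0.01 * rng.normal(size=(n_samples, n_views * view_dim))

    # PCA via SVD on centred data; W holds the leading principal axes.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:n_components]                 # (n_components, n_views * view_dim)

    # Observe only view 0 of a sample and recover all six views from it.
    x_full = X[0]
    obs = slice(0, view_dim)              # columns belonging to the observed view
    coeffs, *_ = np.linalg.lstsq(W[:, obs].T, x_full[obs] - mean[obs], rcond=None)
    x_rec = mean + coeffs @ W             # reconstruction of the full multi-view vector

    print(np.linalg.norm(x_rec - x_full) / np.linalg.norm(x_full))  # small error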