4 research outputs found

    Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection

    This paper addresses a spatiotemporal pattern-recognition problem. The main purpose of this study is to find the right representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool for inspecting linear relations between two sets of vectors, to two multiway data arrays (tensors). The proposed method takes video volumes directly as inputs, avoiding the difficult problem of explicit motion estimation required by traditional methods, and provides a spatiotemporal pattern matching that is robust to intraclass variations of actions. The matching is demonstrated for action classification with a simple Nearest Neighbor classifier. We moreover propose an automatic action detection method that performs a 3D window search over an input video with action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand-gesture data set showed that the proposed method is significantly more accurate than various state-of-the-art methods. Our method has low time complexity and does not require any major tuning parameters.
    Index Terms: Action categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification.
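    To make the matching idea concrete, here is a minimal sketch of plain vector-set CCA, where the canonical correlations are the cosines of the principal angles between the linear subspaces spanned by two sets of vectors. The paper's contribution extends this to tensors of video volumes; this toy (with its illustrative `cca_similarity` helper and random data) shows only the underlying vector-set case.

```python
import numpy as np

def orthonormal_basis(X, rank):
    """Orthonormal basis of the column space of X (features x samples)."""
    Q, _ = np.linalg.qr(X)
    return Q[:, :rank]

def cca_similarity(X1, X2, rank=5):
    """Similarity of two vector sets as the sum of canonical correlations.

    The canonical correlations are the cosines of the principal angles
    between the two subspaces, i.e. the singular values of Q1^T Q2.
    """
    Q1 = orthonormal_basis(X1, rank)
    Q2 = orthonormal_basis(X2, rank)
    sigma = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return float(np.sum(sigma))

# Hypothetical usage: each video volume treated as a set of frame
# vectors (pixels x frames); a nearest-neighbour classifier would use
# this similarity as the matching score against action exemplars.
rng = np.random.default_rng(0)
query = rng.standard_normal((1024, 30))     # e.g. 32x32 frames, 30 frames
exemplar = rng.standard_normal((1024, 30))
print(cca_similarity(query, exemplar, rank=5))
```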

    Development of a human fall detection system based on depth maps

    Assistive-care products are increasingly in demand with recent developments in health-related technologies. Several studies are concerned with improving, and removing barriers to, the delivery of quality health care to all people, especially the elderly who live alone and those who cannot leave their homes for reasons such as disability or obesity. Among these efforts, human fall detection systems play an important role in daily life, because falls are the main obstacle to elderly people living independently and are also a major health concern for an aging population. The three basic approaches used to build fall detection systems rely on wearable devices, ambient devices, or non-invasive vision-based devices using live cameras. Most such systems are based on wearable or ambient sensors, which are often rejected by users due to high false-alarm rates and the difficulty of carrying them during daily activities. This study therefore proposes a non-invasive human fall detection system based on the height, velocity, statistical analysis, fall-risk factors, and position of the subject, using depth information from the Microsoft Kinect sensor. Falls are distinguished from other activities of daily life using the subject's height and velocity extracted from the depth information, after taking the user's fall-risk level into account. Acceleration and activity detection are also employed when velocity and height fail to classify the activity. Finally, the position of the subject is identified to confirm the fall, or a statistical analysis is conducted to verify the fall event. In experiments, the proposed system achieved an average accuracy of 98.3%, with a sensitivity of 100% and a specificity of 97.7%, and accurately distinguished all fall events from other activities of daily life.
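    A minimal sketch of the height/velocity stage follows, assuming hypothetical thresholds and a per-frame head-height series; the actual system derives its decision rules from Kinect depth maps, the user's fall-risk level, and statistical confirmation, none of which this toy reproduces.

```python
import numpy as np

# Hypothetical thresholds; the paper derives its decision rules from
# depth data and fall-risk analysis, not from these constants.
HEIGHT_THRESHOLD_M = 0.45    # head height above the floor after a fall
VELOCITY_THRESHOLD_MS = 1.0  # downward speed suggestive of a fall, m/s

def detect_fall(heights, timestamps):
    """Flag a candidate fall from a per-frame subject-height series.

    heights    : head height above the floor plane (metres), e.g. taken
                 from Kinect depth/skeleton tracking.
    timestamps : frame times in seconds.
    """
    heights = np.asarray(heights, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Downward velocity between consecutive frames.
    velocity = -np.diff(heights) / np.diff(timestamps)
    for i, v in enumerate(velocity):
        fast_drop = v > VELOCITY_THRESHOLD_MS
        ends_low = heights[i + 1] < HEIGHT_THRESHOLD_M
        if fast_drop and ends_low:
            # Candidate fall; the full system would then confirm it via
            # the subject's position or a statistical analysis.
            return True
    return False

# Toy trace at 10 Hz: standing (~1.7 m), then a rapid drop to the floor.
t = np.arange(0.0, 2.0, 0.1)
h = np.where(t < 1.0, 1.7, np.maximum(1.7 - 4.0 * (t - 1.0), 0.2))
print(detect_fall(h, t))  # True
```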

    New human action recognition scheme with geometrical feature representation and invariant discretization for video surveillance

    Human action recognition is an active research area in computer vision because of its wide application in video surveillance, video retrieval, security systems, video indexing, and human-computer interaction. Action recognition deals with time-varying feature data generated by humans under different viewpoints, and aims to build a mapping between dynamic image information and semantic understanding. Although a great deal of progress has been made over the last two decades, the approaches reported in the literature remain limited, and further research is needed to address the ongoing challenges and develop more efficient approaches to human action recognition. Feature extraction is the main task in action recognition and forms the core of any recognition procedure. It transforms the input data describing the shape of a segmented silhouette of a moving person into a set of features representing action poses. In video surveillance, global moment invariants based on Geometric Moment Invariants (GMI) are widely used for human action recognition. However, GMI has drawbacks, such as its lack of a granular interpretation of the invariants relative to the shape, and as a consequence the representation of features has not been standardized. This study therefore proposes a new human action recognition (HAR) scheme that uses geometric moment invariants for feature extraction and supervised invariant discretization to identify the uniqueness of actions in video sequences. The proposed scheme is tested on the IXMAS dataset, whose video sequences contain non-rigid human poses resulting from drastic illumination changes, pose variation, and erratic motion patterns. The invariance of the proposed scheme is validated through intra-class and inter-class analysis. The proposed scheme outperforms the conventional scheme, with an average accuracy above 99%, while preserving the shape of the human actions in the video images.
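    As an illustration of the feature-extraction idea, the sketch below computes the first two classical Hu moment invariants of a binary silhouette, the best-known instance of geometric moment invariants. This is a minimal sketch of the GMI stage only; the paper's supervised invariant discretization step is omitted, and the exact invariants it uses may differ.

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a binary silhouette image."""
    ys, xs = np.nonzero(img)
    x_bar, y_bar = xs.mean(), ys.mean()
    return np.sum((xs - x_bar) ** p * (ys - y_bar) ** q)

def hu_invariants(img):
    """First two Hu invariants (translation/scale/rotation invariant)."""
    mu00 = central_moment(img, 0, 0)
    def eta(p, q):  # scale-normalised central moment
        return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# Toy silhouette: a filled rectangle. A translated, scaled, or rotated
# copy yields (near-)identical invariants, which is what makes them
# usable as pose features ahead of a discretization stage.
sil = np.zeros((64, 64), dtype=np.uint8)
sil[20:44, 10:54] = 1
print(hu_invariants(sil))
```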

    Human action recognition using mutual invariants

    Static and temporally varying 3D invariants are proposed for capturing the spatio-temporal dynamics of a general human action, enabling its representation in a compact, view-invariant manner. Two variants of the representation are presented and studied: (1) a restricted-3D version, whose theory and implementation are simple and efficient but which applies only to a restricted class of human actions, and (2) a full-3D version, whose theory and implementation are more complex but which applies to any general human action. A detailed analysis of the two representations is presented. We show why a straightforward implementation of the key ideas does not work well in the general case, and present strategies designed to overcome inherent weaknesses in the approach. The result is an approach to human action modeling and recognition that is not only invariant to viewpoint, but also robust enough to handle different people, different speeds of action (and hence frame rates), and minor variability in a given action, while encoding sufficient distinction among actions. Results are presented on 2D projections of human motion capture data and on manually segmented real image sequences.
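    As a toy illustration of the general idea of view-invariant quantities over 3D point configurations (not the paper's restricted-3D or full-3D constructions), the sketch below computes a ratio of tetrahedron volumes over five 3D joints, which is unchanged by any affine transform of the points and hence by rigid viewpoint changes; the joint data here are random stand-ins.

```python
import numpy as np

def tetra_volume(p0, p1, p2, p3):
    """Signed volume of the tetrahedron spanned by four 3D points."""
    return np.linalg.det(np.stack([p1 - p0, p2 - p0, p3 - p0])) / 6.0

def volume_ratio_invariant(points):
    """Ratio of two tetrahedron volumes over five 3D points.

    Both volumes scale by det(A) under an affine map x -> Ax + t,
    so the ratio is invariant, in particular under viewpoint changes.
    """
    a, b, c, d, e = points
    return tetra_volume(a, b, c, d) / tetra_volume(a, b, c, e)

rng = np.random.default_rng(1)
joints = rng.standard_normal((5, 3))  # e.g. five motion-capture joints
# Random orthogonal view transform (possibly a reflection; the ratio
# is invariant either way) plus a translation.
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
viewed = joints @ R.T + np.array([0.5, -1.0, 2.0])
print(volume_ratio_invariant(joints))
print(volume_ratio_invariant(viewed))  # same value
```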