Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection
Abstract—This paper addresses a spatiotemporal pattern recognition problem. The main purpose of this study is to find a suitable representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool for inspecting linear relations between two sets of vectors, to two multiway data arrays (tensors). The proposed method analyzes video volumes as inputs, avoiding the difficult problem of explicit motion estimation required in traditional methods, and provides a way of spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification with a simple Nearest Neighbor classifier. We moreover propose an automatic action detection method, which performs a 3D window search over an input video with action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action dataset (KTH) and a self-recorded hand-gesture dataset showed that the proposed method is significantly more accurate than various state-of-the-art methods. Our method has low time complexity and does not require any major tuning parameters.
Index Terms—Action categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification.
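The CCA that the abstract builds on can be sketched for the ordinary two-vector-set case as follows. This is a minimal numpy sketch; the function name and the small ridge term `reg` are implementation assumptions, not from the paper, which further extends CCA from vector sets to tensors.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between two sets of vectors (rows = samples).

    Returns the correlations sorted in decreasing order. `reg` is a small
    ridge term for numerical stability (an implementation choice, not part
    of the paper's method).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularised (cross-)covariance matrices.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition (C is SPD).
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    s = np.linalg.svd(M, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

When `Y` is (close to) a linear function of `X`, the leading canonical correlation approaches 1, which is what makes correlations between volume subspaces usable as a similarity score.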
Development of a human fall detection system based on depth maps
Assistive-care products are increasingly in demand with recent developments in health-sector technologies. Several studies are concerned with improving and eliminating barriers to providing quality health care to all people, especially the elderly who live alone and those who cannot leave their homes for reasons such as disability or obesity. Among these, human fall detection systems play an important role in daily life, because falls are a major obstacle to independent living for elderly people and a significant health concern for an aging population. The three basic approaches used to develop human fall detection systems are wearable devices, ambient-based devices, and non-invasive vision-based devices using live cameras. Most such systems are based on wearable or ambient sensors, which are often rejected by users because of high false-alarm rates and the difficulty of carrying them during daily activities. This study therefore proposes a non-invasive human fall detection system based on the height, velocity, statistical analysis, fall risk factors, and position of the subject, using depth information from a Microsoft Kinect sensor. Falls are distinguished from other activities of daily life using the subject's height and velocity extracted from the depth information, after considering the user's fall risk level. Acceleration and activity detection are also employed if velocity and height fail to classify the activity. Finally, the position of the subject is identified to confirm the fall, or statistical analysis is conducted to verify the fall event. In experiments, the proposed system achieved an average accuracy of 98.3%, with a sensitivity of 100% and a specificity of 97.7%, and accurately distinguished all fall events from other activities of daily life.
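The core height-and-velocity decision rule described above can be illustrated with a minimal sketch. The threshold values and function name here are illustrative assumptions, not the paper's calibrated parameters, and the full system additionally uses acceleration, activity detection, position, and statistical verification.

```python
def detect_fall(heights, dt=1 / 30.0,
                height_thresh=0.45, velocity_thresh=-1.0):
    """Flag a fall when the subject's height above the floor (metres,
    estimated from depth data) drops below `height_thresh` while the
    vertical velocity is more negative (faster downward) than
    `velocity_thresh` in m/s.

    `heights` is a per-frame sequence sampled at interval `dt`.
    All thresholds are illustrative assumptions, not calibrated values.
    """
    for i in range(1, len(heights)):
        v = (heights[i] - heights[i - 1]) / dt  # negative = moving down
        if heights[i] < height_thresh and v < velocity_thresh:
            return True
    return False
```

Requiring both conditions is what separates a fall from, say, slowly sitting down: sitting lowers the height but at a small downward velocity, so neither threshold pair fires.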
New human action recognition scheme with geometrical feature representation and invariant discretization for video surveillance
Human action recognition is an active research area in computer vision because of its immense application in video surveillance, video retrieval, security systems, video indexing, and human–computer interaction. Action recognition deals with the time-varying feature data generated by humans under different viewpoints, and aims to build a mapping between dynamic image information and semantic understanding. Although a great deal of progress has been made in recognizing human actions over the last two decades, relatively few of the approaches proposed in the literature address the ongoing challenges, so further research is needed to develop more efficient approaches to human action recognition. Feature extraction is the main task in action recognition and represents the core of any recognition procedure: it transforms input data describing the shape of a segmented silhouette of a moving person into a set of features representing action poses. In video surveillance, global moment invariants based on Geometrical Moment Invariants (GMI) are widely used in human action recognition. However, GMI has drawbacks, such as its lack of a granular interpretation of the invariants relative to shape; consequently, the representation of features has not been standardized. Hence, this study proposes a new human action recognition (HAR) scheme with geometrical moment invariants for feature extraction and supervised invariant discretization for identifying the uniqueness of actions in video sequences. The proposed scheme is tested on the IXMAS dataset, whose video sequences contain non-rigid human poses resulting from drastic illumination changes, changes in pose, and erratic motion patterns. The invariance of the proposed scheme is validated through intra-class and inter-class analysis.
The proposed scheme yields better performance in action recognition than the conventional scheme, with an average accuracy of more than 99%, while preserving the shape of the human actions in video images.
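Geometrical moment invariants of the kind this scheme builds on can be sketched via the first two Hu invariants of a binary silhouette. This is a minimal numpy sketch of such features, which are translation- and scale-invariant by construction; the paper's full scheme adds supervised invariant discretization on top.

```python
import numpy as np

def hu_invariants(img):
    """First two Hu moment invariants of a binary silhouette (2-D array).

    A minimal sketch of geometric-moment-based shape features; central
    moments give translation invariance, and the eta normalisation gives
    scale invariance.
    """
    ys, xs = np.nonzero(img)
    m00 = len(xs)                      # zeroth-order moment (area)
    xbar, ybar = xs.mean(), ys.mean()  # silhouette centroid

    def mu(p, q):
        # Central moment of order (p, q), taken about the centroid.
        return np.sum((xs - xbar) ** p * (ys - ybar) ** q)

    def eta(p, q):
        # Scale-normalised central moment.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

Because the moments are computed about the centroid, the same silhouette placed anywhere in the frame yields identical feature values, which is what makes such invariants usable across a moving subject.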
Human action recognition using mutual invariants
Static and temporally varying 3D invariants are proposed for capturing the spatio-temporal dynamics of a general human action, enabling its representation in a compact, view-invariant manner. Two variants of the representation are presented and studied: (1) a restricted-3D version, whose theory and implementation are simple and efficient but which can be applied only to a restricted class of human actions, and (2) a full-3D version, whose theory and implementation are more complex but which can be applied to any general human action. A detailed analysis of the two representations is presented. We show why a straightforward implementation of the key ideas does not work well in the general case, and present strategies designed to overcome inherent weaknesses in the approach. The result is an approach to human action modeling and recognition that is not only invariant to viewpoint, but also robust enough to handle different people, different speeds of action (and hence frame rates), and minor variability in a given action, while encoding sufficient distinction among actions. Results on 2D projections of human motion capture data and on manually segmented real images are presented.
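As a concrete illustration of what view invariance means here, the classic cross-ratio of four collinear points is preserved under any projective view change (homography). This is a textbook example of a projective invariant, not the paper's specific 3D mutual invariants; the homography matrix below is an arbitrary simulated view change.

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio of four collinear 2-D points: the classic quantity
    that stays fixed under any projective transformation (view change)."""
    d = lambda a, b: np.linalg.norm(a - b)
    return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))

def apply_homography(H, p):
    """Map a 2-D point through a 3x3 homography (a simulated camera
    view change), dehomogenising the result."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])
```

However the camera moves, features built from such ratios describe the action itself rather than the viewpoint, which is the property the two representations above are designed around.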