Logsig-RNN: a novel network for robust and efficient skeleton-based action recognition
This paper contributes to the challenge of skeleton-based human action recognition in
videos. The key step is to develop a generic network architecture to extract discriminative
features for the spatio-temporal skeleton data. In this paper, we propose a novel module,
namely Logsig-RNN, which is the combination of the log-signature layer and recurrent
type neural networks (RNNs). The former one comes from the mathematically principled
technology of signatures and log-signatures as representations for streamed data, which
can manage high sample rate streams, non-uniform sampling and time series of variable
length. It serves as an enhancement of the recurrent layer, which can be conveniently
plugged into neural networks. In addition, we propose two path transformation layers to
significantly reduce path dimension while retaining the essential information fed into
the Logsig-RNN module. (The network architecture is illustrated in Figure 1 (Right).)
Finally, numerical results demonstrate that replacing the RNN module by the Logsig-RNN
module in SOTA networks consistently improves the performance on both Chalearn
gesture data and NTU RGB+D 120 action data in terms of accuracy and robustness.
In particular, we achieve the state-of-the-art accuracy on Chalearn2013 gesture data by
combining simple path transformation layers with the Logsig-RNN
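The idea of summarising local structure with log-signatures before a recurrent layer can be sketched as follows. This is a minimal illustration, not the authors' implementation: it truncates the log-signature at depth 2 (total increment plus Lévy areas), uses pure Python, and the names `logsig_depth2` and `logsig_sequence`, as well as the equal-length segmentation, are assumptions made for the example.

```python
def logsig_depth2(path):
    """Depth-2 log-signature of a piecewise-linear path (list of d-dim points).

    Returns the d total increments followed by the d*(d-1)/2 Levy areas.
    """
    d = len(path[0])
    level1 = [path[-1][i] - path[0][i] for i in range(d)]
    # s2[i][j] accumulates the iterated integral of (X_i - X_i(0)) dX_j.
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [q[i] - p[i] for i in range(d)]
        # On a linear piece the integrand averages to its midpoint value.
        mid = [p[i] - path[0][i] + 0.5 * inc[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += mid[i] * inc[j]
    # The antisymmetric part of level 2 is the Levy area.
    area = [0.5 * (s2[i][j] - s2[j][i]) for i in range(d) for j in range(i + 1, d)]
    return level1 + area


def logsig_sequence(stream, n_segments):
    """Split a stream into segments and return one log-signature per segment.

    Neighbouring segments share their boundary point so no increment is lost.
    """
    n = len(stream) - 1
    bounds = [k * n // n_segments for k in range(n_segments + 1)]
    return [logsig_depth2(stream[a:b + 1]) for a, b in zip(bounds, bounds[1:])]
```

For a 3-dimensional stream of 101 samples split into 4 segments, a recurrent layer would receive 4 feature vectors of length 6 rather than 101 raw samples, which is the sense in which the time dimension is reduced.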
Skeleton-Based Gesture Recognition With Learnable Paths and Signature Features
For skeleton-based gesture recognition, graph
convolutional networks (GCNs) have achieved remarkable performance since the human skeleton is a natural graph. However,
the biological structure might not be the crucial one for motion
analysis. Also, spatial differential information like joint distance
and angle between bones may be overlooked during the graph
convolution. In this paper, we focus on obtaining meaningful joint
groups and extracting their discriminative features by the path
signature (PS) theory. Firstly, to characterize the constraints and
dependencies of various joints, we propose three types of paths,
i.e., spatial, temporal, and learnable paths. In particular, a learnable
path generation mechanism can group together joints that are not
directly connected or that are far apart, according to their kinematic
characteristics. Secondly, to obtain informative and compact features,
a deep integration of PS with few parameters is introduced.
The entire computational process is packed into two modules, i.e.,
spatial-temporal path signature module (ST-PSM) and learnable
path signature module (L-PSM) for the convenience of utilization.
They are plug-and-play modules available for any neural network
like CNNs and GCNs to enhance the feature extraction ability.
Extensive experiments have been conducted on three mainstream
datasets (ChaLearn 2013, ChaLearn 2016, and AUTSL). We
achieved state-of-the-art results with a simpler framework and
a much smaller model size. By inserting our two modules into
several GCN-based networks, we observe clear improvements,
demonstrating the effectiveness of our proposed method
Signature features with the visibility transformation
The signature in rough path theory provides a graduated summary of a path
through an examination of the effects of its increments. Inspired by recent
developments of signature features in the context of machine learning, we
explore a transformation that is able to embed the effect of the absolute
position of the data stream into signature features. This unified feature is
particularly effective for its simplifying role in allowing the signature
feature set to accommodate nonlinear functions of absolute and relative values
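The motivation for such a transformation can be illustrated with a small sketch: the signature is built from increments only, so it cannot distinguish a path from a translated copy. The code below is not the visibility transformation proposed in the paper. It uses a simpler stand-in, prepending the origin as a base point, to show how absolute position can be made visible to signature features. The function names `signature_depth2` and `with_basepoint` are assumptions made for the example.

```python
def signature_depth2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path."""
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [q[i] - p[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending one linear piece.
                level2[i][j] += level1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            level1[i] += inc[i]
    return level1, level2


def with_basepoint(path):
    """Prepend the origin so the first increment encodes the starting position."""
    origin = tuple(0.0 for _ in path[0])
    return [origin] + list(path)


path = [(1.0, 2.0), (2.0, 4.0), (4.0, 3.0)]
shifted = [(x + 10.0, y - 5.0) for x, y in path]

# Identical features: translation is invisible to the plain signature.
same = signature_depth2(path) == signature_depth2(shifted)
# Different features once the base point is prepended.
differ = signature_depth2(with_basepoint(path)) != signature_depth2(with_basepoint(shifted))
```

With the base point added, the first signature level equals the end point of the path, so absolute and relative information sit in the same feature set.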
Log signatures in machine learning
Rough path theory, which originated as a branch of stochastic analysis, is an emerging tool for analysing complex sequential data and is attracting increasing attention in machine learning. This is owing to the core mathematical object of rough path theory, i.e., the signature/log-signature of a path, which has analytical and algebraic properties. This thesis aims to develop a principled and effective model for time series data based on the log-signature method and the recurrent neural network (RNN). The proposed (generalized) Logsig-RNN model can be regarded as a generalization of the RNN model, which boosts the model performance of the RNN by reducing the time dimension and summarising the local structures of sequential data via the log-signature feature. This hybrid model serves as a generic neural network for a wide range of time series applications.
In this thesis, we construct the mathematical formulation for the (generalized) Logsig-RNN model, analyse its complexity and establish its universality. We validate the effectiveness of the proposed method for time series analysis in both supervised learning and generative tasks. In particular, for skeleton-based human action recognition tasks, we demonstrate that replacing the RNN module with the Logsig-RNN in state-of-the-art (SOTA) networks improves accuracy, efficiency and robustness. In addition, our generator based on the Logsig-RNN model performs better at generating realistic-looking time series data than classical RNN generators and other baseline methods from the literature. A further contribution of our work is a novel Sig-WGAN framework that addresses the inefficiency and training instability of traditional generative adversarial networks for time series generation
Detecting early signs of depressive and manic episodes in patients with bipolar disorder using the signature-based model
Recurrent major mood episodes and subsyndromal mood instability cause
substantial disability in patients with bipolar disorder. Early identification
of mood episodes enabling timely mood stabilisation is an important clinical
goal. Recent technological advances allow the prospective reporting of mood in
real time enabling more accurate, efficient data capture. The complex nature of
these data streams, in combination with the challenge of deriving meaning from
missing data, poses a significant analytic challenge. The signature method
is derived from stochastic analysis and has the ability to capture important
properties of complex ordered time series data. To explore whether the onset of
episodes of mania and depression can be identified using self-reported mood
data.
Comment: 12 pages, 3 tables, 10 figures
Learning stochastic differential equations using RNN with log signature features
This paper contributes to the challenge of learning a function on streamed
multimodal data through evaluation. The core of the result of our paper is the
combination of two quite different approaches to this problem. One comes from
the mathematically principled technology of signatures and log-signatures as
representations for streamed data, while the other draws on the techniques of
recurrent neural networks (RNN). The ability of the former to manage high
sample rate streams and the latter to manage large scale nonlinear interactions
allows hybrid algorithms that are easy to code, quicker to train, and of lower
complexity for a given accuracy.
We illustrate the approach by approximating the unknown functional as a
controlled differential equation. Linear functionals on solutions of controlled
differential equations are the natural universal class of functions on data
streams. Following this approach, we propose a hybrid Logsig-RNN algorithm that
learns functionals on streamed data. By testing on various datasets, i.e.
synthetic data, NTU RGB+D 120 skeletal action data, and Chalearn2013 gesture
data, our algorithm achieves outstanding accuracy with superior efficiency
and robustness
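For orientation, the controlled differential equation viewpoint mentioned in this abstract can be written out as below. The symbols ($X$ for the input stream, $Y$ for the hidden state, $f$ for the vector field, $w$ for the read-out weights) are notation assumed for this sketch and are not taken from the paper.

```latex
% Controlled differential equation driven by the data stream X : [0, T] -> R^d
\[
  dY_t = f(Y_t)\, dX_t, \qquad Y_0 = y_0 ,
\]
% A linear functional of the solution gives the output
\[
  \ell(X) = \langle w, Y_T \rangle .
\]
% Sampling X at times t_0 < t_1 < ... < t_N, an Euler step has the shape of an RNN update
\[
  Y_{t_{k+1}} \approx Y_{t_k} + f(Y_{t_k}) \bigl( X_{t_{k+1}} - X_{t_k} \bigr) .
\]
```

Read this way, a recurrent update driven by raw increments is one discretisation of the equation. Replacing the increment over each sub-interval with a richer summary of the stream, such as its log-signature, is the step that the hybrid algorithm builds on.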
Gesture passwords: concepts, methods and challenges
Biometrics are a convenient alternative to traditional forms of access control such as passwords and pass-cards since they rely solely on user-specific traits. Unlike alphanumeric passwords, biometrics cannot be given or told to another person, and unlike pass-cards, are always “on-hand.” Perhaps the most well-known biometrics with these properties are: face, speech, iris, and gait. This dissertation proposes a new biometric modality: gestures.
A gesture is a short body motion that contains static anatomical information and changing behavioral (dynamic) information. This work considers both full-body gestures such as a large wave of the arms, and hand gestures such as a subtle curl of the fingers and palm. For access control, a specific gesture can be selected as a “password” and used for identification and authentication of a user. If this particular motion were somehow compromised, a user could readily select a new motion as a “password,” effectively changing and renewing the behavioral aspect of the biometric.
This thesis describes a novel framework for acquiring, representing, and evaluating gesture passwords for the purpose of general access control. The framework uses depth sensors, such as the Kinect, to record gesture information from which depth maps or pose features are estimated. First, various distance measures, such as the log-euclidean distance between feature covariance matrices and distances based on feature sequence alignment via dynamic time warping, are used to compare two gestures, and train a classifier to either authenticate or identify a user. In authentication, this framework yields an equal error rate on the order of 1-2% for body and hand gestures in non-adversarial scenarios. Next, through a novel decomposition of gestures into posture, build, and dynamic components, the relative importance of each component is studied. The dynamic portion of a gesture is shown to have the largest impact on biometric performance with its removal causing a significant increase in error. In addition, the effects of two types of threats are investigated: one due to self-induced degradations (personal effects and the passage of time) and the other due to spoof attacks. For body gestures, both spoof attacks (with only the dynamic component) and self-induced degradations increase the equal error rate as expected. Further, the benefits of adding additional sensor viewpoints to this modality are empirically evaluated. Finally, a novel framework that leverages deep convolutional neural networks for learning a user-specific “style” representation from a set of known gestures is proposed and compared to a similar representation for gesture recognition. This deep convolutional neural network yields significantly improved performance over prior methods.
A byproduct of this work is the creation and release of multiple publicly available,
user-centric (as opposed to gesture-centric) datasets based on both body and hand gestures
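One of the distance measures named in this abstract, alignment of feature sequences via dynamic time warping, can be sketched as follows. This is the textbook dynamic-programming formulation with a Euclidean frame-to-frame cost, not necessarily the exact variant used in the dissertation. The function name `dtw_distance` and the representation of a gesture as a list of per-frame feature tuples are assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_cost = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = frame_cost + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same motion performed at a different speed aligns with zero cost.
slow = [(0, 0), (0, 0), (1, 1), (2, 2), (2, 2)]
fast = [(0, 0), (1, 1), (2, 2)]
speed_invariant = dtw_distance(slow, fast) == 0.0
```

A verifier could compare a probe gesture with enrolled gestures using this distance and accept when it falls below a threshold. The equal error rate quoted in the abstract is the operating point at which false accepts and false rejects are equally frequent.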
Infant Cognitive Scores Prediction With Multi-stream Attention-based Temporal Path Signature Features
The human brain develops remarkably rapidly in the first year of life. Some studies have revealed a tight connection between cognitive skills and cortical morphology in this period. Nonetheless, it is still a great challenge to predict cognitive scores using brain morphological features, given issues like small sample size and missing data in longitudinal studies. In this work, for the first time, we introduce the path signature method to explore hidden analytical and geometric properties of longitudinal cortical morphology features. A novel BrainPSNet is proposed with a differentiable temporal path signature layer to produce informative representations of different time points and various temporal granules. Further, a two-stream neural network is included to combine groups of raw features and path signature features for predicting the cognitive score. More importantly, considering the different influence of each brain region on cognitive function, we design a learning-based attention mask generator to automatically weight regions accordingly. Experiments are conducted on an in-house longitudinal dataset. Compared with several recent algorithms, the proposed method achieves state-of-the-art performance. The relationship between morphological features and cognitive abilities is also analyzed