
    Multimodal Multipart Learning for Action Recognition in Depth Videos

    The articulated and complex nature of human actions makes the task of action recognition difficult. One approach to handling this complexity is to divide it into the kinetics of body parts and analyze the actions based on these partial descriptors. We propose a joint sparse regression based learning method which utilizes structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent the dynamics and appearance of parts, we employ a heterogeneous set of depth and skeleton based features. The proper structure of the multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, which regularizes the structured features of each part and applies sparsity between them, in favor of group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method: it outperforms other methods on all three tested datasets and saturates one of them by achieving perfect accuracy.
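
To make the idea of sparsity between body parts concrete, the following is a minimal sketch of a group-sparsity penalty. It is a simplified stand-in (a plain L2,1 group norm), not the hierarchical mixed norm proposed in the paper, whose exact form is not given in the abstract. The function name `group_l21_norm` and the part names are hypothetical.

```python
import math


def group_l21_norm(weights_by_part):
    """Sum of per-part L2 norms (an L2,1 group norm).

    Assumption: each body part owns one flat list of regression weights.
    Minimizing this penalty tends to drive whole parts to zero, which is
    the group feature selection effect the abstract describes.
    """
    return sum(
        math.sqrt(sum(w * w for w in weights))
        for weights in weights_by_part.values()
    )


# Hypothetical weights: only the arm contributes to this action.
weights = {"arm": [3.0, 4.0], "leg": [0.0, 0.0]}
penalty = group_l21_norm(weights)
```

A part whose weights are all zero adds nothing to the penalty, so a regularized fit can afford to ignore irrelevant parts entirely.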

    A modular network treatment of Baars' Global Workspace consciousness model

    Network theory provides an alternative to the renormalization and phase transition methods used in Wallace's (2005a) treatment of Baars' Global Workspace model. Like the earlier study, the new analysis produces the workspace itself, the tunable threshold of consciousness, and the essential role for embedding contexts, in an explicitly analytic 'necessary conditions' manner which suffers neither the mereological fallacy inherent to brain-only theories nor the sufficiency indeterminacy of neural network or agent-based simulations. This suggests that the new approach and the earlier one represent different analytically solvable limits in a broad continuum of possible models, analogous to the differences between bond and site percolation or between the two-body and many-body limits of classical mechanics. The development significantly extends the theoretical foundations for an empirical general cognitive model (GCM) based on the Shannon-McMillan Theorem. Patterned after the general linear model, which reflects the Central Limit Theorem, the proposed technique should be useful both for the reduction of experimental data on consciousness and in the design of devices with capacities which may transcend those of conventional machines and provide new perspectives on the varieties of biological consciousness.

    Learning to Transform Time Series with a Few Examples

    We describe a semi-supervised regression algorithm that learns to transform one time series into another time series given examples of the transformation. This algorithm is applied to tracking, where a time series of observations from sensors is transformed to a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. The algorithm searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and lends itself to a closed-form solution. It is closely related to nonlinear system identification and manifold learning techniques. We demonstrate our algorithm on the tasks of tracking RFID tags from signal strength measurements and recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences. For these tasks, this algorithm requires significantly fewer examples than fully-supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account.

    Log signatures in machine learning

    Rough path theory, which originated as a branch of stochastic analysis, is an emerging tool for analysing complex sequential data in machine learning and is attracting increasing attention. This is owing to the core mathematical object of rough path theory, i.e., the signature/log-signature of a path, which has useful analytical and algebraic properties. This thesis aims to develop a principled and effective model for time series data based on the log-signature method and the recurrent neural network (RNN). The proposed (generalized) Logsig-RNN model can be regarded as a generalization of the RNN model, which boosts the performance of the RNN by reducing the time dimension and summarising the local structures of sequential data via the log-signature feature. This hybrid model serves as a generic neural network for a wide range of time series applications. In this thesis, we construct the mathematical formulation for the (generalized) Logsig-RNN model, analyse its complexity and establish its universality. We validate the effectiveness of the proposed method for time series analysis in both supervised learning and generative tasks. In particular, for skeleton-based human action recognition tasks, we demonstrate that replacing the RNN module with the Logsig-RNN in state-of-the-art (SOTA) networks improves accuracy, efficiency and robustness. In addition, our generator based on the Logsig-RNN model exhibits better performance in generating realistic-looking time series data than classical RNN generators and other baseline methods from the literature. Another contribution of our work is a novel Sig-WGAN framework that addresses the efficiency issues and training instability of traditional generative adversarial networks for time series generation.
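
As a small illustration of what a log-signature summarises, the sketch below computes the depth-2 log-signature of a piecewise-linear path in the plane: the total increment (level 1) and the Lévy area (level 2). It is a hand-rolled toy under the assumption of a 2D path given as a list of points, not the Logsig-RNN model from the thesis; the function name `logsig_depth2` is hypothetical.

```python
def logsig_depth2(path):
    """Depth-2 log-signature of a piecewise-linear 2D path.

    Returns (dx_total, dy_total, levy_area), where the Lévy area is
    0.5 * integral of ((x - x0) dy - (y - y0) dx) along the path.
    """
    x0, y0 = path[0]
    area = 0.0
    for (xa, ya), (xb, yb) in zip(path, path[1:]):
        dx, dy = xb - xa, yb - ya
        # Exact contribution of one linear segment; the second-order
        # terms 0.5*dx*dy cancel in the antisymmetric difference.
        area += 0.5 * ((xa - x0) * dy - (ya - y0) * dx)
    xn, yn = path[-1]
    return (xn - x0, yn - y0, area)


# A counter-clockwise unit square: zero net increment, signed area 1.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
features = logsig_depth2(square)
```

Three numbers replace the whole list of points here, which is the sense in which a log-signature feature reduces the time dimension while keeping local geometric structure.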

    3D PersonVLAD: Learning Deep Global Representations for Video-based Person Re-identification

    In this paper, we introduce a global video representation for video-based person re-identification (re-ID) that aggregates local 3D features across the entire video extent. Most existing methods rely on 2D convolutional networks (ConvNets) to extract frame-wise deep features, which are pooled temporally to generate video-level representations. However, 2D ConvNets lose temporal input information immediately after the convolution, and a separate temporal pooling is limited in capturing human motion in shorter sequences. To this end, we present a global video representation (3D PersonVLAD), complementary to 3D ConvNets, as a novel layer to capture the appearance and motion dynamics in full-length videos. However, encoding each video frame in its entirety and computing an aggregate global representation across all frames is tremendously challenging due to occlusions and misalignments. To resolve this, our proposed network is further augmented with a 3D part alignment module to learn local features through a soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3D features are demonstrated to achieve state-of-the-art results on three benchmark datasets: MARS, iLIDS-VID, and PRID 2011.
    Comment: Accepted to appear at IEEE Transactions on Neural Networks and Learning Systems.