47,327 research outputs found

    3D Cylindrical Trace Transform based feature extraction for effective human action classification

    Get PDF
    Human action recognition is currently one of the hottest areas in pattern recognition and machine intelligence. Its applications vary from console and exertion gaming and human computer interaction to automated surveillance and assistive environments. In this paper, we present a novel feature extraction method for action recognition, extending the capabilities of the Trace transform to the 3D domain. We define the notion of a 3D form of the Trace transform on discrete volumes extracted from spatio-temporal image sequences. On a second level, we propose the combination of the novel transform, named 3D Cylindrical Trace Transform, with Selective Spatio-Temporal Interest Points, in a feature extraction scheme called Volumetric Triple Features, which manages to capture the valuable geometrical distribution of interest points in spatio-temporal sequences and to give prominence to their action-discriminant geometrical correlations. The technique provides noise robust, distortion invariant and temporally sensitive features for the classification of human actions. Experiments on different challenging action recognition datasets provided impressive results indicating the efficiency of the proposed transform and of the overall proposed scheme for the specific task

    Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

    Full text link
    Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG or SVD based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU. Moreover, the proposed network with ResNet architecture can converge in much less epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By finetuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COVComment: Accepted to CVPR 201

    Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model

    Full text link
    A fundamental challenge in machine learning today is to build a model that can learn from few examples. Here, we describe a reservoir based spiking neural model for learning to recognize actions with a limited number of labeled videos. First, we propose a novel encoding, inspired by how microsaccades influence visual perception, to extract spike information from raw video data while preserving the temporal correlation across different frames. Using this encoding, we show that the reservoir generalizes its rich dynamical activity toward signature action/movements enabling it to learn from few training examples. We evaluate our approach on the UCF-101 dataset. Our experiments demonstrate that our proposed reservoir achieves 81.3%/87% Top-1/Top-5 accuracy, respectively, on the 101-class data while requiring just 8 video examples per class for training. Our results establish a new benchmark for action recognition from limited video examples for spiking neural models while yielding competetive accuracy with respect to state-of-the-art non-spiking neural models.Comment: 13 figures (includes supplementary information

    Profiling user activities with minimal traffic traces

    Full text link
    Understanding user behavior is essential to personalize and enrich a user's online experience. While there are significant benefits to be accrued from the pursuit of personalized services based on a fine-grained behavioral analysis, care must be taken to address user privacy concerns. In this paper, we consider the use of web traces with truncated URLs - each URL is trimmed to only contain the web domain - for this purpose. While such truncation removes the fine-grained sensitive information, it also strips the data of many features that are crucial to the profiling of user activity. We show how to overcome the severe handicap of lack of crucial features for the purpose of filtering out the URLs representing a user activity from the noisy network traffic trace (including advertisement, spam, analytics, webscripts) with high accuracy. This activity profiling with truncated URLs enables the network operators to provide personalized services while mitigating privacy concerns by storing and sharing only truncated traffic traces. In order to offset the accuracy loss due to truncation, our statistical methodology leverages specialized features extracted from a group of consecutive URLs that represent a micro user action like web click, chat reply, etc., which we call bursts. These bursts, in turn, are detected by a novel algorithm which is based on our observed characteristics of the inter-arrival time of HTTP records. We present an extensive experimental evaluation on a real dataset of mobile web traces, consisting of more than 130 million records, representing the browsing activities of 10,000 users over a period of 30 days. Our results show that the proposed methodology achieves around 90% accuracy in segregating URLs representing user activities from non-representative URLs

    Temporal segmentation of human actions in video sequences

    Get PDF
    Most of the published works concerning action recognition, usually assume that the action sequences have been previously segmented in time, that is, the action to be recognized starts with the first sequence frame and ends with the last one. However, temporal segmentation of actions in sequences is not an easy task, and is always prone to errors. In this paper, we present a new technique to automatically extract human actions from a video sequence. Our approach presents several contributions. First of all, we use a projection template scheme and find spatio-temporal features and descriptors within the projected surface, rather than extracting them in the whole sequence. For projecting the sequence we use a variant of the R transform, which has never been used before for temporal action segmentation. Instead of projecting the original video sequence, we project its optical flow components, preserving important information about action motion. We test our method on a publicly available action dataset, and the results show that it performs very well segmenting human actions compared with the state-of-the-art methods.Peer ReviewedPostprint (author's final draft

    Convex Relaxations of SE(2) and SE(3) for Visual Pose Estimation

    Get PDF
    This paper proposes a new method for rigid body pose estimation based on spectrahedral representations of the tautological orbitopes of SE(2)SE(2) and SE(3)SE(3). The approach can use dense point cloud data from stereo vision or an RGB-D sensor (such as the Microsoft Kinect), as well as visual appearance data. The method is a convex relaxation of the classical pose estimation problem, and is based on explicit linear matrix inequality (LMI) representations for the convex hulls of SE(2)SE(2) and SE(3)SE(3). Given these representations, the relaxed pose estimation problem can be framed as a robust least squares problem with the optimization variable constrained to these convex sets. Although this formulation is a relaxation of the original problem, numerical experiments indicate that it is indeed exact - i.e. its solution is a member of SE(2)SE(2) or SE(3)SE(3) - in many interesting settings. We additionally show that this method is guaranteed to be exact for a large class of pose estimation problems.Comment: ICRA 2014 Preprin
    • …
    corecore