47,327 research outputs found
3D Cylindrical Trace Transform based feature extraction for effective human action classification
Human action recognition is currently one of the hottest areas in pattern recognition and machine intelligence. Its
applications vary from console and exertion gaming and human-computer interaction to automated surveillance and assistive environments. In this paper, we present a novel feature extraction method for action recognition, extending the capabilities of the Trace transform to the 3D domain. We define the notion of a 3D form of the Trace transform on discrete volumes extracted from spatio-temporal image sequences. On a second level, we propose the combination of the novel transform, named 3D Cylindrical Trace Transform, with Selective Spatio-Temporal Interest Points,
in a feature extraction scheme called Volumetric Triple Features, which captures the valuable geometrical distribution of interest points in spatio-temporal sequences and gives prominence to their action-discriminant geometrical correlations. The technique provides noise-robust, distortion-invariant and temporally sensitive features for the classification of human actions. Experiments on several challenging action recognition datasets yielded impressive results, indicating the efficiency of the proposed transform and of the overall scheme for this task.
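To make the underlying idea concrete, here is a minimal 2D sketch of a Trace transform: a functional is applied along every line of the image, for each tracing angle. This is my own illustrative implementation (nearest-neighbour rotation, `functional` as a free parameter), not the paper's 3D Cylindrical variant, which extends this idea to discrete spatio-temporal volumes.

```python
import numpy as np

def trace_transform(img, angles, functional=np.sum):
    """2D Trace transform sketch: for each angle, rotate the image and
    apply `functional` along every horizontal line of the rotated copy.
    With functional=np.sum this reduces to the Radon transform."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    out = []
    for th in angles:
        c, s = np.cos(th), np.sin(th)
        # nearest-neighbour rotation about the image centre
        xr = np.clip(np.round(c * (xs - cx) - s * (ys - cy) + cx), 0, w - 1).astype(int)
        yr = np.clip(np.round(s * (xs - cx) + c * (ys - cy) + cy), 0, h - 1).astype(int)
        rot = img[yr, xr]
        out.append([functional(row) for row in rot])  # one value per line
    return np.array(out)  # shape (len(angles), h)
```

Different functionals (sum, max, weighted moments) produce different invariance properties, which is what makes the Trace transform a family of features rather than a single projection.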
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global covariance pooling in convolutional neural networks has achieved
impressive improvement over the classical first-order pooling. Recent works
have shown matrix square root normalization plays a central role in achieving
state-of-the-art performance. However, existing methods depend heavily on
eigendecomposition (EIG) or singular value decomposition (SVD), suffering from
inefficient training due to limited support of EIG and SVD on GPU. Towards
addressing this problem, we propose an iterative matrix square root
normalization method for fast end-to-end training of global covariance pooling
networks. At the core of our method is a meta-layer designed with loop-embedded
directed graph structure. The meta-layer consists of three consecutive
nonlinear structured layers, which perform pre-normalization, coupled matrix
iteration and post-compensation, respectively. Our method is much faster than
EIG or SVD based ones, since it involves only matrix multiplications, suitable
for parallel implementation on GPU. Moreover, the proposed network with ResNet
architecture can converge in much less epochs, further accelerating network
training. On large-scale ImageNet, we achieve competitive performance superior
to existing counterparts. By finetuning our models pre-trained on ImageNet, we
establish state-of-the-art results on three challenging fine-grained
benchmarks. The source code and network models will be available at
http://www.peihuali.org/iSQRT-COV
Comment: Accepted to CVPR 201
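The core numerical idea — a coupled Newton-Schulz iteration with trace-based pre-normalization and post-compensation, using only matrix multiplications — can be sketched as follows. This is a plain NumPy illustration of the iteration the abstract describes, not the authors' meta-layer implementation.

```python
import numpy as np

def isqrt_newton_schulz(A, n_iter=8):
    """Approximate square root of an SPD matrix A via the coupled
    Newton-Schulz iteration: pre-normalize by trace so the iteration
    converges, iterate with matrix multiplications only, then
    post-compensate to undo the normalization."""
    n = A.shape[0]
    normA = np.trace(A)          # pre-normalization
    Y = A / normA
    Z = np.eye(n)
    I3 = 3.0 * np.eye(n)
    for _ in range(n_iter):
        T = 0.5 * (I3 - Z @ Y)   # shared intermediate term
        Y, Z = Y @ T, T @ Z      # Y -> sqrt, Z -> inverse sqrt
    return np.sqrt(normA) * Y    # post-compensation

A = np.array([[4.0, 1.0], [1.0, 3.0]])
S = isqrt_newton_schulz(A)
```

Because every step is a matrix multiplication, the whole loop maps naturally onto GPU kernels, which is the efficiency argument made against EIG/SVD-based normalization.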
Learning to Recognize Actions from Limited Training Examples Using a Recurrent Spiking Neural Model
A fundamental challenge in machine learning today is to build a model that
can learn from few examples. Here, we describe a reservoir based spiking neural
model for learning to recognize actions with a limited number of labeled
videos. First, we propose a novel encoding, inspired by how microsaccades
influence visual perception, to extract spike information from raw video data
while preserving the temporal correlation across different frames. Using this
encoding, we show that the reservoir generalizes its rich dynamical activity
toward signature action/movements enabling it to learn from few training
examples. We evaluate our approach on the UCF-101 dataset. Our experiments
demonstrate that our proposed reservoir achieves 81.3%/87% Top-1/Top-5
accuracy, respectively, on the 101-class data while requiring just 8 video
examples per class for training. Our results establish a new benchmark for
action recognition from limited video examples for spiking neural models while
yielding competitive accuracy with respect to state-of-the-art non-spiking
neural models.
Comment: 13 figures (includes supplementary information)
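The reservoir principle — a fixed random recurrent network whose state summarizes a sequence, with only a linear readout trained — is what makes few-example learning plausible. Below is a rate-based echo-state sketch of that principle (my own simplification; the paper uses a spiking reservoir with a microsaccade-inspired spike encoding, neither of which is modelled here).

```python
import numpy as np

class EchoStateReservoir:
    """Rate-based reservoir sketch: fixed random input and recurrent
    weights; run() returns the final state as a sequence summary."""
    def __init__(self, d_in, n_res=200, leak=0.3, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, d_in))
        W = rng.normal(size=(n_res, n_res))
        # scale recurrent weights to spectral radius rho (echo-state property)
        self.W = W * (rho / np.abs(np.linalg.eigvals(W)).max())
        self.leak, self.n_res = leak, n_res

    def run(self, seq):
        x = np.zeros(self.n_res)
        for u in seq:  # leaky-integrator update, weights stay fixed
            x = (1 - self.leak) * x + self.leak * np.tanh(
                self.W_in @ u + self.W @ x)
        return x

def train_readout(states, labels):
    """Only this linear readout is trained, so a handful of labeled
    sequences can suffice."""
    X = np.column_stack([states, np.ones(len(states))])  # bias column
    W_out, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return W_out
```

The design choice mirrors the paper's argument: the untrained reservoir supplies rich dynamics, so the trainable part has very few parameters relative to an end-to-end deep network.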
Profiling user activities with minimal traffic traces
Understanding user behavior is essential to personalize and enrich a user's
online experience. While there are significant benefits to be accrued from the
pursuit of personalized services based on a fine-grained behavioral analysis,
care must be taken to address user privacy concerns. In this paper, we consider
the use of web traces with truncated URLs - each URL is trimmed to only contain
the web domain - for this purpose. While such truncation removes the
fine-grained sensitive information, it also strips the data of many features
that are crucial to the profiling of user activity. We show how to overcome the
severe handicap of lack of crucial features for the purpose of filtering out
the URLs representing a user activity from the noisy network traffic trace
(including advertisement, spam, analytics, webscripts) with high accuracy. This
activity profiling with truncated URLs enables the network operators to provide
personalized services while mitigating privacy concerns by storing and sharing
only truncated traffic traces.
In order to offset the accuracy loss due to truncation, our statistical
methodology leverages specialized features extracted from a group of
consecutive URLs that represent a micro user action like web click, chat reply,
etc., which we call bursts. These bursts, in turn, are detected by a novel
algorithm which is based on our observed characteristics of the inter-arrival
time of HTTP records. We present an extensive experimental evaluation on a real
dataset of mobile web traces, consisting of more than 130 million records,
representing the browsing activities of 10,000 users over a period of 30 days.
Our results show that the proposed methodology achieves around 90% accuracy in
segregating URLs representing user activities from non-representative URLs.
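The burst notion can be sketched with a simple gap-threshold rule: consecutive HTTP records whose inter-arrival times stay below a threshold form one burst. This is an illustrative simplification (the threshold value and the grouping rule here are my assumptions; the paper's detection algorithm is derived from observed inter-arrival-time characteristics, not a fixed cutoff).

```python
def detect_bursts(timestamps, gap_threshold=1.0):
    """Group sorted HTTP record arrival times (seconds) into bursts:
    a new burst starts whenever the gap to the previous record
    exceeds gap_threshold."""
    bursts, current = [], [timestamps[0]]
    for prev, t in zip(timestamps, timestamps[1:]):
        if t - prev <= gap_threshold:
            current.append(t)        # same micro user action
        else:
            bursts.append(current)   # gap too large: close the burst
            current = [t]
    bursts.append(current)
    return bursts
```

Features are then computed per burst (size, duration, domain mix) rather than per URL, which is how the method recovers discriminative signal after URL truncation.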
Temporal segmentation of human actions in video sequences
Most published works on action recognition assume that the action sequences have been previously segmented in time, that is, the action to be recognized starts with the first sequence frame and ends with the last one. However, temporal segmentation of actions in sequences is not an easy task, and is always prone to errors. In this paper, we present a new technique to automatically extract human actions from a video sequence. Our approach makes several contributions. First of all, we use a projection template scheme and find spatio-temporal features and descriptors within the projected surface, rather than extracting them in the whole sequence. For projecting the sequence we use a variant of the R transform, which has never been used before for temporal action segmentation. Instead of projecting the original video sequence, we project its optical flow components, preserving important information about action motion. We test our method on a publicly available action dataset, and the results show that it segments human actions very well compared with state-of-the-art methods.
Peer Reviewed. Postprint (author's final draft)
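For reference, the classical R transform collapses each frame to a rotation-invariant scalar profile by summing the squared Radon projections over all offsets: R(θ) = Σ_ρ Radon(ρ, θ)². The sketch below is my own minimal implementation on a single 2D array (e.g. an optical-flow magnitude frame); the paper applies a variant of this per frame and segments actions from the resulting temporal signal.

```python
import numpy as np

def r_transform(frame, angles):
    """R transform sketch: R(theta) = sum_rho Radon(rho, theta)^2,
    with the Radon projection computed by nearest-neighbour rotation
    followed by summing along rows."""
    h, w = frame.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    vals = []
    for th in angles:
        c, s = np.cos(th), np.sin(th)
        xr = np.clip(np.round(c * (xs - cx) - s * (ys - cy) + cx), 0, w - 1).astype(int)
        yr = np.clip(np.round(s * (xs - cx) + c * (ys - cy) + cy), 0, h - 1).astype(int)
        proj = frame[yr, xr].sum(axis=1)  # Radon projection at angle th
        vals.append((proj ** 2).sum())    # R transform value
    return np.array(vals)
```

Squaring before summing emphasizes frames with concentrated motion, which is what makes the per-frame profile useful as a segmentation cue.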
Convex Relaxations of SE(2) and SE(3) for Visual Pose Estimation
This paper proposes a new method for rigid body pose estimation based on
spectrahedral representations of the tautological orbitopes of SE(2) and
SE(3). The approach can use dense point cloud data from stereo vision or an
RGB-D sensor (such as the Microsoft Kinect), as well as visual appearance data.
The method is a convex relaxation of the classical pose estimation problem, and
is based on explicit linear matrix inequality (LMI) representations for the
convex hulls of SE(2) and SE(3). Given these representations, the relaxed
pose estimation problem can be framed as a robust least squares problem with
the optimization variable constrained to these convex sets. Although this
formulation is a relaxation of the original problem, numerical experiments
indicate that it is indeed exact - i.e. its solution is a member of SE(2) or
SE(3) - in many interesting settings. We additionally show that this method
is guaranteed to be exact for a large class of pose estimation problems.
Comment: ICRA 2014 Preprint
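The flavor of such a relaxation can be seen in the simplest case: the convex hull of SO(2) is the set of matrices [[a, -b], [b, a]] with a² + b² ≤ 1, so the relaxed rotation-fitting problem has a closed-form solution (minimize over the disk, then note the optimum lands on the boundary for clean data, i.e. the relaxation is exact). This is my own 2D rotation-only analogue, not the paper's LMI formulation over full SE(2)/SE(3).

```python
import numpy as np

def relaxed_rotation_fit(P, Q):
    """Fit R in conv SO(2) = {[[a,-b],[b,a]] : a^2 + b^2 <= 1}
    minimizing sum_i ||R p_i - q_i||^2. The objective is quadratic
    in (a, b), so the relaxed problem is the unconstrained least
    squares solution projected onto the unit disk."""
    px, py = P[:, 0], P[:, 1]
    qx, qy = Q[:, 0], Q[:, 1]
    s = (px ** 2 + py ** 2).sum()
    a = (px * qx + py * qy).sum() / s   # unconstrained minimizer
    b = (px * qy - py * qx).sum() / s
    r = np.hypot(a, b)
    if r > 1.0:                          # project onto the disk
        a, b = a / r, b / r
    return np.array([[a, -b], [b, a]])

theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
P = np.random.default_rng(1).normal(size=(20, 2))
Q = P @ R_true.T                         # noiseless observations
R_est = relaxed_rotation_fit(P, Q)
```

For noiseless data the recovered (a, b) equals (cos θ, sin θ) exactly, illustrating the exactness phenomenon the abstract reports: the solution of the relaxed problem is itself a rotation.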