14 research outputs found

Action Classification with Locality-constrained Linear Coding

Author: Huynh Du
Mahmood Arif
Mian Ajmal
Rahmani Hossein
Publication date: 01/01/2014

We propose an action classification algorithm that uses Locality-constrained Linear Coding (LLC) to capture discriminative information about human body variations in each spatiotemporal subsequence of a video sequence. Our method divides the input video into equally spaced, overlapping spatiotemporal subsequences, each of which is decomposed into blocks and then cells. We use the Histogram of Oriented Gradient (HOG3D) feature to encode the information in each cell. We justify the use of LLC for encoding the block descriptor by demonstrating its superiority over Sparse Coding (SC). Our sequence descriptor is obtained via a logistic regression classifier with L2 regularization. We evaluate and compare our algorithm with ten state-of-the-art algorithms on five benchmark datasets. Experimental results show that, on average, our algorithm gives better accuracy than these ten algorithms. Comment: ICPR 201
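As a rough illustration of the coding step named in this abstract, the sketch below encodes one descriptor with the commonly used approximated form of LLC: keep the k nearest codebook atoms, solve a small regularised linear system, and normalise the weights to sum to one. It is not the authors' implementation. The function name `llc_encode`, the parameters `k` and `reg`, and the random codebook in the usage note are assumptions made for the example; the abstract does not give these details.

```python
import numpy as np

def llc_encode(x, codebook, k=5, reg=1e-4):
    """Approximate LLC code of descriptor x over a codebook of shape (M, D).

    Hypothetical helper: k and reg are illustrative defaults,
    not values taken from the paper.
    """
    # 1. Locality: keep only the k codebook atoms closest to x.
    dists = np.linalg.norm(codebook - x, axis=1)
    nearest = np.argsort(dists)[:k]

    # 2. Shift the selected atoms so that x is the origin and
    #    form their local covariance (Gram) matrix.
    z = codebook[nearest] - x              # shape (k, D)
    cov = z @ z.T                          # shape (k, k)
    # Regularise so the system stays well conditioned.
    cov += reg * max(np.trace(cov), 1e-12) * np.eye(k)

    # 3. Solve cov @ w = 1 and enforce the sum-to-one constraint.
    w = np.linalg.solve(cov, np.ones(k))
    w /= w.sum()

    # 4. Scatter the k weights into a sparse code of length M.
    code = np.zeros(codebook.shape[0])
    code[nearest] = w
    return code
```

For example, with a random 20-atom codebook in 8 dimensions, `llc_encode(x, codebook, k=5)` returns a length-20 vector with at most five non-zero entries that sum to one.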

arXiv.org e-Print Archive

CiteSeerX

Crossref

Lancaster E-Prints

Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions

Author: CM Bishop
D Weinland
DA Bini
F Perronnin
G Csurka
I Traore
J Aggarwal
J Sánchez
K Guo
MT Harandi
N Aggarwal
P Turaga
R Poppe
S Ali
S Hirose
SR Ke
V Arsigny
Y Wu
Ó Pérez
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, as well as their sensitivity to more challenging conditions (variations in scale and translation). Despite recent advancements in handling manifolds, manifold-based techniques obtain the lowest performance, and their kernel representations are more unstable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV deals best with moderate scale and translation changes.

arXiv.org e-Print Archive

Crossref

University of Queensland eSpace

Generalized Rank Pooling for Activity Recognition

Author: Cherian Anoop
Fernando Basura
Gould Stephen
Harandi Mehrtash
Publication date: 22/07/2017

Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity. Usually, this pooling step discards the temporal order of the frames, which could otherwise be used for better recognition. Towards this end, we propose a novel pooling method, generalized rank pooling (GRP), that takes as input features from the intermediate layers of a CNN trained on tiny sub-sequences, and produces as output the parameters of a subspace which (i) provides a low-rank approximation to the features and (ii) preserves their temporal order. We propose to use these parameters as a compact representation for the video sequence, which is then used in a classification setup. We formulate an objective for computing this subspace as a Riemannian optimization problem on the Grassmann manifold, and propose an efficient conjugate gradient scheme for solving it. Experiments on several activity recognition datasets show that our scheme leads to state-of-the-art performance. Comment: Accepted at IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 201

arXiv.org e-Print Archive

Crossref

Non-Linear Temporal Subspace Representations for Activity Recognition

Author: Cherian Anoop
Gould Stephen
Hartley Richard
Sra Suvrit
Publication date: 27/03/2018

Representations that can compactly and effectively capture the temporal evolution of semantic content are important to computer vision and machine learning algorithms that operate on multivariate time-series data. We investigate such representations motivated by the task of human action recognition. Here each data instance is encoded by a multivariate feature (such as via a deep CNN) where action dynamics are characterized by their variations in time. As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which capture their temporal order. We develop this idea further and show that such a pooling scheme can be cast as an order-constrained kernelized PCA objective. We then propose to use the parameters of a kernelized low-rank feature subspace as the representation of the sequences. We cast our formulation as an optimization problem on generalized Grassmann manifolds and then solve it efficiently using Riemannian optimization techniques. We present experiments on several action recognition datasets using diverse feature modalities and demonstrate state-of-the-art results. Comment: Accepted at the IEEE International Conference on Computer Vision and Pattern Recognition, CVPR, 2018. arXiv admin note: substantial text overlap with arXiv:1705.0858

arXiv.org e-Print Archive

Crossref

Recognition and localization of relevant human behavior in videos, SPIE

Author: Coen Van Leeuwen
Gertjan Burghouts
Henri Bouma
Johan-Martijn Ten Hove
Leo De Penning
Maarten Kruithof
Patrick Hanckmann
Sander Landsmeer
Sanne Korzec
Sebastiaan Van Den Broek
Publication date: 01/01/2013

Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visual-intelligence systems are desired. In this paper, we present an improved system that can recognize actions of a human and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent surveillance data in the DARPA Mind's Eye program, with hours of videos of challenging scenes. The results show that our system is able to track the people, detect and localize events, and discriminate between different behaviors, and it performs 3.4 times better than our previous system.

CiteSeerX

Learning Binary Code for Fast Nearest Subspace Search

Author: Andoni
Andoni
Arya
Bai
Baraniuk
Basri
Basri
Basri
Bauml
Beveridge
Blank
Blanz
Broomhead
Broomhead
Chang
Datar
Dong
Dong
Edelman
Edwin R. Hancock
Fitzgibbon
Ghasedi Dizaji
Gionis
Goldstein
Golub
Gong
Gross
Hamm
Hotelling
Ji
Ji
Jun Zhou
Kim
Kushilevitz
Lei Zhou
Li
Lin
Lin
Liong
Liu
Liu
Liu
Luo
Marrinan
Muja
O’Hara
Pirsiavash
Shen
Song
Soomro
Sun
Vidal
Wang
Wang
Wang
Wang
Wang
Weiss
Wolf
Wright
Xianglong Liu
Xiao Bai
Xu
Zhang
Zhang
Zhang
Zhou
Publication venue: Elsevier BV
Publication date: 01/02/2020

Crossref

White Rose Research Online

Object and action detection methods using MOSSE filters

Author: Arn Robert T.
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2012

2012 Fall. Includes bibliographical references. In this thesis we explore the application of the Minimum Output Sum of Squared Error (MOSSE) filter to object detection in images as well as action detection in video. We exploit the properties of the Fourier transform for computing correlations in two and three dimensions. We perform a comprehensive examination of the shape parameters of the desired target response and determine values to optimize the filter performance for specific objects and actions. In addition, we propose the Gaussian Iterative Response (GIR) algorithm and the Multi-Sigma Geometric Mean method to improve the MOSSE filter response on test signals. Also, new detection criteria are investigated and shown to boost the detection accuracy on two well-known data sets.
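To make the Fourier-domain correlation mentioned in this abstract concrete, here is a minimal two-dimensional sketch of the standard closed-form MOSSE filter with a Gaussian desired response. It does not reproduce the thesis's GIR or Multi-Sigma Geometric Mean methods. The function names, the `sigma` shape parameter default, and the `eps` regulariser are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired target response: a 2-D Gaussian peaked at `center` (row, col)."""
    rows, cols = np.indices(shape)
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def train_mosse(images, centers, sigma=2.0, eps=1e-5):
    """Return the conjugate filter H* in the Fourier domain.

    Closed form: H* = sum_i(G_i * conj(F_i)) / (sum_i |F_i|^2 + eps),
    where F_i and G_i are the FFTs of image i and its desired response.
    eps is an assumed small constant that avoids division by zero.
    """
    numerator = 0.0
    denominator = 0.0
    for image, center in zip(images, centers):
        F = np.fft.fft2(image)
        G = np.fft.fft2(gaussian_response(image.shape, center, sigma))
        numerator = numerator + G * np.conj(F)
        denominator = denominator + np.abs(F) ** 2
    return numerator / (denominator + eps)

def correlate(H_conj, image):
    """Apply the filter: element-wise product in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * H_conj))
```

A filter trained on one image with the target at (10, 12) produces a response map whose maximum sits at that location on the training image, and the maximum moves with the target when the image is circularly shifted. Varying `sigma` changes how sharply peaked the desired response is, which is the kind of shape parameter the abstract refers to.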

Mountain Scholar (Digital Collections of Colorado and Wyoming)

One-shot learning with pretrained convolutional neural network

Author: Yu Zhixian
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Summer. Includes bibliographical references. Recent progress in convolutional neural networks and deep learning has revolutionized the image classification field, and computers can now classify images with very high accuracy. However, unlike the human vision system, which efficiently recognizes a new object after seeing a similar one, recognizing new classes of images requires a time- and resource-consuming process of retraining a neural network due to several restrictions. Since a pretrained neural network has seen a large amount of training data, it may generalize to recognize new classes effectively and efficiently, because it may extract patterns from training images. This has inspired research in one-shot learning, which is the process of learning to classify a novel class from one training image of that class. One-shot learning can help expand the use of a trained convolutional neural network without costly model retraining. In addition to the practical application of one-shot learning, it is also important to understand how a convolutional neural network supports one-shot learning. More specifically, how is the feature space structured to support one-shot learning? This can potentially help us better understand the mechanisms of convolutional neural networks. This thesis proposes an approximate nearest neighbor-based method for one-shot learning. This method makes use of the features produced by a pretrained convolutional neural network and builds a proximity forest to classify new classes. The algorithm is tested on two datasets with different scales and achieves reasonably high classification accuracy on both datasets. Furthermore, this thesis tries to understand the feature space to explain the success of our proposed method. A novel tool, generalized curvature analysis, is used to probe the feature space structure of the convolutional neural network.
The results show that the feature space curves around samples with both known classes and unknown in-domain classes, but not around transition samples between classes or out-of-domain samples. In addition, the low curvature of out-of-domain samples is correlated with the inability of a pretrained convolutional neural network to classify out-of-domain classes, indicating that a pretrained model cannot generate useful feature representations for out-of-domain samples. In summary, this thesis proposes a new method for one-shot learning, and provides insight into understanding the feature space of convolutional neural networks.
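The nearest-neighbour idea in this abstract can be sketched as follows. This is a simplified stand-in, not the thesis's method: it uses an exact brute-force search with cosine similarity in place of the proximity forest, and it assumes the feature vectors have already been extracted by some pretrained network. The function name and the toy three-dimensional features are hypothetical.

```python
import numpy as np

def one_shot_classify(query, support_features, support_labels):
    """Assign `query` the label of its most similar support feature.

    support_features holds one feature vector per novel class (the single
    training example of that class); similarity is cosine similarity.
    Exact search is used here purely for illustration.
    """
    support = np.asarray(support_features, dtype=float)
    q = np.asarray(query, dtype=float)

    # Normalise to unit length so the dot product equals cosine similarity.
    support_unit = support / np.linalg.norm(support, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)

    similarities = support_unit @ q_unit
    return support_labels[int(np.argmax(similarities))]
```

With one support vector per class, for instance `[[1, 0, 0], [0, 1, 0], [0, 0, 1]]` labelled `["cat", "dog", "bird"]`, a query of `[0.1, 0.9, 0.2]` is assigned the label of the second support vector.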

Mountain Scholar (Digital Collections of Colorado and Wyoming)