Search CORE

12 research outputs found

Log-Euclidean Bag of Words for Human Action Recognition

Authors: Bhatia R.; Conrad Sanderson; Lazebnik S.; Masoud Faraki; Maziar Palhang; Wong Y.
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2015

Representing videos by densely extracted local space-time features has recently become a popular approach for analysing actions. In this paper, we tackle the problem of categorising human actions by devising Bag of Words (BoW) models based on covariance matrices of spatio-temporal features, with the features formed from histograms of optical flow. Since covariance matrices form a special type of Riemannian manifold, the space of Symmetric Positive Definite (SPD) matrices, non-Euclidean geometry should be taken into account when discriminating between covariance matrices. To this end, we propose to embed SPD manifolds into Euclidean spaces via a diffeomorphism and extend the BoW approach to its Riemannian version. The proposed BoW approach takes into account the manifold geometry of SPD matrices during the generation of the codebook and histograms. Experiments on challenging human action datasets show that the proposed method obtains notable improvements in discrimination accuracy compared with several state-of-the-art methods.
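As a rough illustration of the pipeline this abstract describes, here is a minimal NumPy sketch, not the authors' implementation. It assumes the diffeomorphism is the matrix logarithm (the Log-Euclidean map), that the codebook is learned with plain k-means on the log-mapped vectors, and that histograms use hard assignment. The per-cuboid features are random stand-ins for the optical-flow histograms, and every function name is hypothetical.

```python
import numpy as np


def covariance_descriptor(features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Covariance of n feature vectors (rows), plus eps*I so the result is strictly SPD."""
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(spd: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_vector(spd: np.ndarray) -> np.ndarray:
    """Vectorize the upper triangle of log(spd); off-diagonal terms are scaled by
    sqrt(2) so Euclidean distances equal Frobenius distances between the logs."""
    log_mat = spd_log(spd)
    rows, cols = np.triu_indices(log_mat.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return log_mat[rows, cols] * weights


def kmeans(points: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means (assumed codebook learner); returns a (k, dim) codebook."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(vectors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Hard-assign each vector to its nearest codeword; return an L1-normalized histogram."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical data: 40 cuboids, each with 30 samples of a 5-dim feature.
    cuboids = [rng.normal(size=(30, 5)) for _ in range(40)]
    vecs = np.stack([log_euclidean_vector(covariance_descriptor(c)) for c in cuboids])
    codebook = kmeans(vecs, k=4)
    print(bow_histogram(vecs[:10], codebook))  # histogram for a "video" of 10 cuboids
```

The sqrt(2) weighting is what lets ordinary Euclidean clustering respect the Log-Euclidean geometry: the distance between two vectors equals the Frobenius distance between the corresponding matrix logarithms.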

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences

Authors: Harandi Mehrtash; Hartley Richard; Lovell Brian; Sanderson Conrad
Publication date: 30/08/2014

This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas. Unlike traditional sparse coding schemes that work in vector spaces, this paper discusses how SPD matrices can be described by sparse combinations of dictionary atoms, where the atoms are also SPD matrices. We propose to perform sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences. This not only leads to an efficient way of performing sparse coding, but also to an online, iterative scheme for dictionary learning. We apply the proposed methods to several computer vision tasks where images are represented by region covariance matrices. Our proposed algorithms outperform state-of-the-art methods on a wide range of classification tasks, including face recognition, action recognition, material classification and texture categorization.
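The abstract does not name the two Bregman divergences. The sketch below assumes the Stein (Jensen-Bregman LogDet) divergence and the Jeffrey (symmetrized LogDet) divergence, which are common choices for SPD matrices, and builds kernels of the form exp(-beta * D) from them. Sparse codes are then computed in the induced Hilbert space with iterative soft thresholding (ISTA), which is our choice of solver rather than necessarily the authors'; dictionary learning is omitted. All function names are hypothetical, and exp(-beta * S) is positive definite only for certain values of beta, so beta needs care.

```python
import numpy as np


def stein_divergence(x: np.ndarray, y: np.ndarray) -> float:
    """Stein (Jensen-Bregman LogDet) divergence (assumed choice):
    S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X Y)."""
    _, logdet_mid = np.linalg.slogdet((x + y) / 2.0)
    _, logdet_x = np.linalg.slogdet(x)
    _, logdet_y = np.linalg.slogdet(y)
    return float(logdet_mid - 0.5 * (logdet_x + logdet_y))


def jeffrey_divergence(x: np.ndarray, y: np.ndarray) -> float:
    """Jeffrey divergence, a symmetrized LogDet divergence (assumed choice):
    J(X, Y) = 0.5 tr(X^-1 Y) + 0.5 tr(Y^-1 X) - d."""
    d = x.shape[0]
    return float(0.5 * np.trace(np.linalg.solve(x, y))
                 + 0.5 * np.trace(np.linalg.solve(y, x)) - d)


def kernel_values(x, atoms, divergence, beta):
    """k(A, B) = exp(-beta * divergence(A, B)) between X and each atom (k_xd)
    and among the atoms themselves (K_dd)."""
    k_xd = np.array([np.exp(-beta * divergence(x, d)) for d in atoms])
    K_dd = np.array([[np.exp(-beta * divergence(di, dj)) for dj in atoms]
                     for di in atoms])
    return k_xd, K_dd


def ista_sparse_code(k_xd, K_dd, lam, iters=500):
    """Minimize ||phi(X) - sum_j a_j phi(D_j)||^2 + lam * ||a||_1, which expands to
    k(X, X) - 2 a^T k_xd + a^T K_dd a + lam * ||a||_1, by soft thresholding."""
    step = 1.0 / (2.0 * np.abs(np.linalg.eigvalsh(K_dd)).max())  # 1 / Lipschitz constant
    a = np.zeros_like(k_xd)
    for _ in range(iters):
        z = a - step * 2.0 * (K_dd @ a - k_xd)                    # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mats = rng.normal(size=(7, 3, 3))
    spds = [m @ m.T + 3.0 * np.eye(3) for m in mats]  # hypothetical SPD matrices
    x, atoms = spds[0], spds[1:]
    for div in (stein_divergence, jeffrey_divergence):
        k_xd, K_dd = kernel_values(x, atoms, div, beta=0.5)
        print(div.__name__, np.round(ista_sparse_code(k_xd, K_dd, lam=0.01), 3))
```

Only kernel evaluations enter the solver, which is what makes coding in the (possibly infinite-dimensional) Hilbert space tractable.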

arXiv.org e-Print Archive

CiteSeerX

Representing visual appearance by video Brownian covariance descriptor for human action recognition

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Second-order Temporal Pooling for Action Recognition

Authors: Cherian Anoop; Gould Stephen
Publication date: 06/08/2018

Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated into video-level representations by computing statistics on these features. Typically, zeroth-order (max) or first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling, that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolutions of clip-level CNN features computed across the video. Such a descriptor, while computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than its first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking Activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes, which, when combined with hand-crafted features (as is standard practice), achieve state-of-the-art accuracy. Comment: Accepted in the International Journal of Computer Vision (IJCV).
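To make the contrast between first- and second-order pooling concrete, here is a minimal NumPy sketch of correlating the temporal evolution of each feature dimension. It assumes each dimension's length-T series is centered and unit-normalized before taking inner products, and it has no learnable parameters; the authors' end-to-end learnable version, any regularization they apply, and the RKHS extension are not reproduced. Function names are hypothetical.

```python
import numpy as np


def average_pooling(clip_features: np.ndarray) -> np.ndarray:
    """First-order pooling: mean over time of (T, d) clip-level features."""
    return clip_features.mean(axis=0)


def max_pooling(clip_features: np.ndarray) -> np.ndarray:
    """Max pooling over time."""
    return clip_features.max(axis=0)


def temporal_correlation_pooling(clip_features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Second-order pooling: correlation between the length-T temporal evolutions
    of every pair of feature dimensions, giving a (d, d) descriptor."""
    centered = clip_features - clip_features.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=0, keepdims=True)
    series = centered / (norms + eps)  # unit-norm time series per dimension
    return series.T @ series


def vectorize_upper(mat: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle (including the diagonal) of a symmetric descriptor."""
    rows, cols = np.triu_indices(mat.shape[0])
    return mat[rows, cols]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 16, 4                             # hypothetical: 16 clips, 4-dim CNN features
    feats = rng.normal(size=(T, d))
    feats[:, 1] = 2.0 * feats[:, 0] + 1.0    # dims 0 and 1 co-activate over time
    C = temporal_correlation_pooling(feats)
    print(average_pooling(feats).shape, C.shape, round(float(C[0, 1]), 3))
```

Average pooling keeps only d numbers per video, whereas the correlation matrix records which dimensions rise and fall together, which is the co-activation information the abstract refers to.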

arXiv.org e-Print Archive

The Australian National University

Extrinsic methods for coding and dictionary learning on Grassmann manifolds

Authors: Harandi Mehrtash; Hartley Richard; Lovell Brian; Sanderson Conrad; Shen Chunhua
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/06/2015

Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown to be useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning on Grassmann manifolds, i.e., the space of linear subspaces. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose an algorithm for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into higher-dimensional Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy compared with state-of-the-art methods such as the kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis.
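The abstract does not write the mapping out. A standard choice in the Grassmann literature, assumed here, is the projection embedding U -> U U^T, where U is an orthonormal basis of the subspace. The NumPy sketch below shows that the embedding does not depend on the chosen basis, relates the resulting distance to principal angles, and codes one subspace by others via ridge-regularized least squares in the embedded space, using only the kernel ||A^T B||_F^2. The ridge coder is a non-sparse stand-in for the paper's sparse coding schemes, and all names are hypothetical.

```python
import numpy as np


def orthonormal_basis(spanning: np.ndarray) -> np.ndarray:
    """Orthonormal basis (n, p) for the column span of a full-rank (n, p) matrix."""
    q, _ = np.linalg.qr(spanning)
    return q


def projection_embedding(basis: np.ndarray) -> np.ndarray:
    """Map the subspace spanned by orthonormal U to the symmetric matrix U U^T.
    Any other orthonormal basis U R (R orthogonal) yields the same matrix."""
    return basis @ basis.T


def projection_distance(u1: np.ndarray, u2: np.ndarray) -> float:
    """Frobenius distance between embeddings, scaled by 1/sqrt(2)."""
    diff = projection_embedding(u1) - projection_embedding(u2)
    return float(np.linalg.norm(diff, "fro") / np.sqrt(2.0))


def principal_angles(u1: np.ndarray, u2: np.ndarray) -> np.ndarray:
    """Principal angles: arccos of the singular values of U1^T U2."""
    cosines = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def projection_coding(x, atoms, ridge=1e-6):
    """Ridge least-squares coefficients a minimizing
    ||X X^T - sum_j a_j D_j D_j^T||_F^2 + ridge * ||a||^2, computed only from the
    projection kernel k(A, B) = ||A^T B||_F^2 = <A A^T, B B^T>_F."""
    gram = np.array([[np.linalg.norm(a.T @ b) ** 2 for b in atoms] for a in atoms])
    rhs = np.array([np.linalg.norm(x.T @ d) ** 2 for d in atoms])
    return np.linalg.solve(gram + ridge * np.eye(len(atoms)), rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: two 3-dim subspaces of R^10.
    u1 = orthonormal_basis(rng.normal(size=(10, 3)))
    u2 = orthonormal_basis(rng.normal(size=(10, 3)))
    theta = principal_angles(u1, u2)
    # The two numbers agree because ||U1U1^T - U2U2^T||_F^2 = 2 * sum(sin^2(theta)).
    print(projection_distance(u1, u2), np.sqrt(np.sum(np.sin(theta) ** 2)))
```

Because the embedded points live in an ordinary space of symmetric matrices with the Frobenius inner product, vector-space coding machinery applies directly once the Gram matrix is formed.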

University of Queensland eSpace

Study of visual descriptors for obstacle detection in first-person-camera ski videos (Estudio de caracterizadores visuales para la detección de obstáculos en vídeos de ski con cámara subjetiva)

Authors: Francia Molinero Héctor; Miguel Casado Gregorio de
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/2016

This work focuses on the study of visual descriptors for identifying simple objects or obstacles in ski videos. To this end, machine learning techniques were used to develop a software prototype, supported by a test set created specifically for this study. To our knowledge, these techniques had not previously been applied to this field, so a database of first-person images had to be created. The project showed that, for certain descriptors, good results are obtained, reaching up to 90% precision in recognizing the object classes created. The report includes a review of the state of the art, which summarizes a range of physical-activity monitoring devices (wristbands, smartwatches, and the cloud of applications that provide extended services to these devices). The state-of-the-art review also summarizes the main papers related to this study and on which it builds, covering both vision techniques based on visual descriptors and learning techniques. Next, the system architecture is presented, with an overall summary and a description of the visual descriptors used: SURF (Speeded Up Robust Features), HOG (Histogram of Oriented Gradients), HOF (Histogram of Optical Flow) and MBH (Motion Boundary Histogram). The system prototype section then presents the full implementation in a structured way. Finally, the results are presented with their evaluation, together with the project management and conclusions.

Repositorio Universidad de Zaragoza

Fast and accurate image and video analysis on Riemannian manifolds

Author: Zhao Kun
Publication venue: 'University of Queensland Library'
Publication date: 25/11/2016

University of Queensland eSpace

Human action recognition under Log-Euclidean Riemannian metric

Authors: F. Perronnin; K. Mikolajczyk; M.J. Lucena; O. Tuzel; T. Kadir; Y. Rubner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

This paper presents a new action recognition approach based on local spatio-temporal features. The main contributions of our approach are twofold. First, a new local spatio-temporal feature is proposed to represent the cuboids detected in video sequences. Specifically, the descriptor uses the covariance matrix to capture the self-correlation information of the low-level features within each cuboid. Since covariance matrices do not lie in a Euclidean space, the Log-Euclidean Riemannian metric is used to measure distances between covariance matrices. Second, the Earth Mover's Distance (EMD) is used to match any pair of video sequences. In contrast to the widely used Euclidean distance, EMD is more robust when matching histograms/distributions of different sizes. Experimental results on two datasets demonstrate the effectiveness of the proposed approach.
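A minimal NumPy sketch of the cuboid covariance descriptor and the Log-Euclidean distance described above. The low-level features sampled inside each cuboid are not specified here, so random arrays stand in for them; the regularization `eps` and all function names are hypothetical. The EMD matching step, which would use this distance as its ground distance, is not shown.

```python
import numpy as np


def region_covariance(samples: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Covariance of the (n, d) low-level feature vectors sampled inside one
    cuboid, plus eps * I so the matrix is strictly positive definite."""
    cov = np.cov(samples, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def spd_log(spd: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def sym_exp(sym: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix (the inverse of spd_log)."""
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.exp(eigvals)) @ eigvecs.T


def log_euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """d_LE(A, B) = ||log(A) - log(B)||_F."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), "fro"))


def log_euclidean_mean(mats) -> np.ndarray:
    """Log-Euclidean mean: exp of the arithmetic mean of the matrix logarithms."""
    return sym_exp(np.mean([spd_log(m) for m in mats], axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical: 200 samples of 6 low-level features per cuboid.
    c1 = region_covariance(rng.normal(size=(200, 6)))
    c2 = region_covariance(rng.normal(size=(200, 6)) * 1.5)
    print(log_euclidean_distance(c1, c2))
    # Invariant to inversion, since log(A^-1) = -log(A).
    print(log_euclidean_distance(np.linalg.inv(c1), np.linalg.inv(c2)))
```

Unlike the plain Frobenius distance between the covariance matrices themselves, this distance is unchanged by matrix inversion and by a common rescaling of both inputs, since log(sA) = log(s) I + log(A).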

CiteSeerX

Crossref

Birkbeck Institutional Research Online