25 research outputs found

Supervised local descriptor learning for human action recognition

Author: Cao Xianbin
Shao Ling
Xu Dan
Zhen Xiantong
Zheng Feng
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/05/2017

Local features have been widely used in computer vision tasks, e.g., human action recognition, but handling large-scale, high-dimensional local features that carry redundant information is extremely challenging. In this paper, we propose a novel fully supervised local descriptor learning algorithm, called the discriminative embedding method based on the image-to-class distance (I2CDDE), to learn compact but highly discriminative local feature descriptors for more accurate and efficient action recognition. By leveraging the advantages of the I2C distance, the proposed I2CDDE incorporates class labels to enable fully supervised learning of local feature descriptors, which yields highly discriminative but compact local descriptors. The objective of I2CDDE is to minimize the I2C distances from samples to their corresponding classes while maximizing the I2C distances to the other classes in the low-dimensional space. To further improve performance, we propose incorporating a manifold regularization based on the graph Laplacian into the objective function, which enhances the smoothness of the embedding by extracting the local intrinsic geometrical structure. The proposed I2CDDE achieves, for the first time, fully supervised learning of local feature descriptors. It significantly improves the performance of I2C-based methods by increasing the discriminative ability of local features, while dimensionality reduction greatly reduces the computational burden of handling large-scale data. We apply the proposed I2CDDE algorithm to human action recognition on four widely used benchmark datasets. The results show that I2CDDE significantly improves I2C-based classifiers and achieves state-of-the-art performance.
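The objective described in this abstract can be illustrated with a small sketch. It assumes the I2C distance is the sum, over a sample's local descriptors, of the squared distance to the nearest descriptor in a class pool, and it evaluates "own-class distance minus other-class distance" under a fixed linear projection. The function names, the toy data, and the averaging over other classes are illustrative assumptions, not the paper's formulation, and the graph-Laplacian regularizer is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def project(W, x):
    """Apply a linear map W (given as a list of rows) to vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def i2c_distance(descriptors, class_pool):
    """Sum of nearest-neighbour squared distances from each local
    descriptor of one sample to the pooled descriptors of one class."""
    return sum(min(sq_dist(d, c) for c in class_pool) for d in descriptors)


def i2c_margin(W, sample, label, pools):
    """I2C distance to the sample's own class minus the mean I2C distance
    to the other classes, measured after projecting with W.
    Lower values mean better separation (assumed form of the objective)."""
    z = [project(W, d) for d in sample]
    proj = {k: [project(W, c) for c in pool] for k, pool in pools.items()}
    own = i2c_distance(z, proj[label])
    others = [i2c_distance(z, p) for k, p in proj.items() if k != label]
    return own - sum(others) / len(others)


# Toy data (hypothetical): the third coordinate is noise for class "walk".
pools = {"walk": [[0, 0, 5], [1, 0, -5]], "run": [[5, 5, 0], [6, 5, 0]]}
sample = [[0, 1, 0], [1, 1, 0]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
drop_noise = [[1, 0, 0], [0, 1, 0]]  # 3-D -> 2-D projection

full = i2c_margin(identity, sample, "walk", pools)
reduced = i2c_margin(drop_noise, sample, "walk", pools)
```

In this toy case the lower-dimensional projection gives a smaller (better) margin than the original space, which is the effect a learned embedding would aim for; learning W itself is outside the scope of the sketch.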

Crossref

University of East Anglia digital repository

Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

Author: Kothari Nisarg
Leung Thomas
Mori Greg
Pantofaru Caroline
Toderici George
Toshev Alexander
Yang Weilong
Publication date: 01/07/2015

We present a method for learning an embedding that places images of humans in similar poses close together. This embedding can be used as a direct method of comparing images based on human pose, avoiding potential challenges of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used to allow learning of a representation capable of making distinctions between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method.
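A triplet-based distance criterion is commonly written as a hinge loss over (anchor, positive, negative) embeddings. The sketch below uses that common form with squared Euclidean distance; the abstract does not state the exact loss or margin, so both are assumptions, and the embeddings here are plain lists rather than outputs of a deep network.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form): zero once the negative is
    at least `margin` farther from the anchor than the positive is."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))


# Hypothetical 2-D embeddings: the positive shares the anchor's pose.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
far_negative = [3.0, 0.0]    # already well separated, so the loss is zero
near_negative = [1.0, 0.0]   # too close to the anchor, so the loss is positive

loss_far = triplet_loss(anchor, positive, far_negative)
loss_near = triplet_loss(anchor, positive, near_negative)
```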

arXiv.org e-Print Archive

CiteSeerX

Shared nearest neighbors match kernel for bird songs identification - LifeCLEF 2015 challenge

Author: Buisson Olivier
Champ Julien
Joly Alexis
Leveau Valentin
Publication venue: HAL CCSD
Publication date: 08/09/2015

This paper presents a new fine-grained audio classification technique designed and evaluated in the context of the LifeCLEF 2015 bird species identification challenge. Inspired by recent work on fine-grained image classification, we introduce a new match kernel based on the shared nearest neighbors of the low-level audio features extracted at the frame level. To make such a strategy scalable to the tens of millions of MFCC features extracted from the tens of thousands of audio recordings in the training set, we used high-dimensional hashing techniques coupled with an efficient approximate nearest neighbors search algorithm with controlled quality. Further improvements are obtained by (i) using a sliding window for the temporal pooling of the raw matches and (ii) weighting each low-level feature according to the semantic coherence of its nearest neighbors. Results show the effectiveness of the proposed technique, which ranked 2nd among the 7 research groups participating in the LifeCLEF bird challenge.
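The idea of a match kernel built on shared nearest neighbors can be sketched as follows. This is a simplified reading of the abstract: each frame-level feature is replaced by the set of its k nearest neighbors in a reference set, two features match in proportion to the neighbors they share, and the kernel averages over all feature pairs. Brute-force search stands in for the hashing-based approximate search, and the normalization, the function names, and the 1-D toy features are assumptions; temporal pooling and neighbor weighting are left out.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_ids(query, reference, k):
    """Indices of the k reference features closest to the query
    (exact search; an approximate index would be used at scale)."""
    order = sorted(range(len(reference)), key=lambda i: sq_dist(query, reference[i]))
    return set(order[:k])


def snn_match_kernel(feats_a, feats_b, reference, k=2):
    """Average fraction of shared nearest neighbors over all feature pairs
    of two recordings; the result lies in [0, 1]."""
    nn_a = [knn_ids(f, reference, k) for f in feats_a]
    nn_b = [knn_ids(f, reference, k) for f in feats_b]
    shared = sum(len(a & b) for a in nn_a for b in nn_b)
    return shared / (len(feats_a) * len(feats_b) * k)


# Hypothetical 1-D "MFCC" features: two clusters in the reference set.
reference = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
rec_a = [[0.4], [1.6]]
rec_b = [[0.6], [1.4]]     # same region of feature space as rec_a
rec_c = [[10.5], [11.6]]   # different region

similar = snn_match_kernel(rec_a, rec_b, reference)
dissimilar = snn_match_kernel(rec_a, rec_c, reference)
```

Note that rec_a and rec_b contain no identical features; they are judged similar only because their features fall in the same neighborhoods of the reference set.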

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Good appearance and shape descriptors for object category recognition

Author: Dias J.
Gaspar F.
Proença P.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

For the problem of object category recognition, we have studied different families of descriptors exploiting RGB and 3D information. Furthermore, we have shown empirically that 3D shape-based descriptors are more suitable for this type of recognition because of their low intra-class shape variance, as opposed to image texture-based descriptors. In addition, we have shown how an efficient Naive Bayes Nearest Neighbor (NBNN) classifier can scale to a large hierarchical RGB-D Object Dataset [2] and achieve, with a single descriptor type, an accuracy close to that of state-of-the-art learning-based approaches using combined descriptors.
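For readers unfamiliar with NBNN, the decision rule can be sketched in a few lines: every local descriptor of the query votes with its squared distance to the nearest descriptor in each class, and the class with the smallest total wins. The sketch below uses exact brute-force search and made-up 2-D descriptors; the class names and data are hypothetical, and nothing here reflects the efficiency measures the abstract refers to.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(descriptors, class_pools):
    """Return (predicted label, per-class scores), where each score is the
    sum of nearest-neighbour squared distances to that class's pool."""
    scores = {
        label: sum(min(sq_dist(d, c) for c in pool) for d in descriptors)
        for label, pool in class_pools.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical pools of training descriptors, one pool per category.
class_pools = {"mug": [[0, 0], [1, 0]], "ball": [[5, 5], [6, 5]]}
query = [[0, 1], [1, 1]]

label, scores = nbnn_classify(query, class_pools)
```

No descriptor is quantized and no model is trained; the training descriptors themselves act as the classifier, which is why nearest-neighbor search cost dominates at scale.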

Crossref

Repositório Institucional do ISCTE-IUL

Local Pyramidal Descriptors for Image Recognition

Author: Bagdanov Andrew D.
Del Bimbo Alberto
Seidenari Lorenzo
Serra Giuseppe
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

In this paper, we present a novel method to improve the flexibility of descriptor matching for image recognition by using local multiresolution pyramids in feature space. We propose that image patches be represented at multiple levels of descriptor detail and that these levels be defined in terms of local spatial pooling resolution. Preserving multiple levels of detail in local descriptors is a way of hedging one's bets on which levels will be most relevant for matching during learning and recognition. We introduce the Pyramid SIFT (P-SIFT) descriptor and show that its use in four state-of-the-art image recognition pipelines improves accuracy and yields state-of-the-art results. Our technique is applicable independently of spatial pyramid matching, and we show that spatial pyramids can be combined with local pyramids to obtain further improvement. We achieve state-of-the-art results on Caltech-101 (80.1%) and Caltech-256 (52.6%) when compared to other approaches based on SIFT features over intensity images. Our technique is efficient and extremely easy to integrate into image recognition pipelines.
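One plausible reading of "levels defined in terms of local spatial pooling resolution" is sketched below: start from a SIFT-like 4x4 grid of orientation histograms and repeatedly merge 2x2 blocks of cells, keeping the flattened descriptor at every level. The grid size, the sum pooling, and the function names are assumptions for illustration; the actual P-SIFT construction and its normalization may differ.

```python
def pool_2x2(grid):
    """Halve the spatial resolution by summing the histograms of each
    2x2 block of cells. `grid` is a square list of rows of histograms."""
    n = len(grid) // 2
    bins = len(grid[0][0])
    return [
        [
            [
                sum(grid[2 * r + dr][2 * c + dc][b] for dr in (0, 1) for dc in (0, 1))
                for b in range(bins)
            ]
            for c in range(n)
        ]
        for r in range(n)
    ]


def pyramid_descriptor(grid):
    """Return one flattened descriptor per pooling level, finest first,
    ending with a single cell that covers the whole patch."""
    levels = []
    while True:
        levels.append([v for row in grid for cell in row for v in cell])
        if len(grid) == 1:
            break
        grid = pool_2x2(grid)
    return levels


# Hypothetical patch: 4x4 cells, 8 orientation bins, one count per bin.
patch = [[[1] * 8 for _ in range(4)] for _ in range(4)]
levels = pyramid_descriptor(patch)
sizes = [len(level) for level in levels]  # 128, 32 and 8 values
```

Coarser levels are more tolerant of small misalignments while finer levels keep spatial detail, so a classifier given all levels can weight whichever turns out to be most useful.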

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Florence Research

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

GOLD: Gaussians of Local Descriptors for Image Representation

Author: Cucchiara Rita
Grana Costantino
Manfredi Marco
Serra Giuseppe
Publication venue: Elsevier BV
Publication date: 01/01/2015

The Bag of Words paradigm has been the baseline from which several successful image classification solutions were developed in the last decade. These represent images by quantizing local descriptors and summarizing their distribution. The quantization step introduces a dependency on the dataset that, even if it significantly boosts performance in some contexts, severely limits generalization. In contrast, in this paper we propose to model the distribution of local features with a multivariate Gaussian, without any quantization. The full-rank covariance matrix, which lies on a Riemannian manifold, is projected onto the tangent Euclidean space and concatenated to the mean vector. The resulting representation, a Gaussian of local descriptors (GOLD), allows the dot product to be used to closely approximate a distance between distributions without the need for expensive kernel computations. We describe an image by an improved spatial pyramid, which avoids boundary effects with soft assignment: local descriptors contribute to neighboring Gaussians, forming a weighted spatial pyramid of GOLD descriptors. In addition, we extend the model by leveraging dataset characteristics in a mixture-of-Gaussians formulation, further improving the classification accuracy. To deal with large-scale datasets and high-dimensional feature spaces, a Stochastic Gradient Descent solver is adopted. Experimental results on several publicly available datasets show that the proposed method obtains state-of-the-art performance.
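The core of the representation (mean vector plus a covariance matrix mapped to a Euclidean tangent space) can be sketched with NumPy. The sketch assumes the projection is the matrix logarithm of the covariance, computed by eigendecomposition, and that the symmetric result is vectorized by its upper triangle with off-diagonal entries scaled by sqrt(2); the regularization constant `eps`, the function name, and these conventions are assumptions rather than details given in the abstract. The spatial pyramid, soft assignment, and mixture extension are not covered.

```python
import numpy as np


def gold_descriptor(descriptors, eps=1e-6):
    """Mean and log-mapped covariance of a set of local descriptors,
    concatenated into one vector of length d + d * (d + 1) / 2.
    Expects an (n, d) array with d >= 2."""
    X = np.asarray(descriptors, dtype=float)
    d = X.shape[1]
    mean = X.mean(axis=0)
    # Small ridge keeps the covariance strictly positive definite.
    cov = np.cov(X, rowvar=False) + eps * np.eye(d)
    # Matrix logarithm of a symmetric positive definite matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # Upper triangle; sqrt(2) on off-diagonals preserves the Frobenius norm.
    rows, cols = np.triu_indices(d)
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return np.concatenate([mean, scale * log_cov[rows, cols]])


# Hypothetical input: 50 random 3-D descriptors from one image region.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 3))
gold = gold_descriptor(features)  # length 3 + 6 = 9
```

Because the output is an ordinary fixed-length vector, two regions can be compared with a dot product or fed to a linear classifier, which is the practical benefit the abstract points to.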

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia