Object Level Deep Feature Pooling for Compact Image Representation
Convolutional Neural Network (CNN) features have been successfully employed
in recent works as an image descriptor for various vision tasks. But the
inability of the deep CNN features to exhibit invariance to geometric
transformations and object compositions poses a great challenge for image
search. In this work, we demonstrate the effectiveness of the objectness prior
over the deep CNN features of image regions for obtaining an invariant image
representation. The proposed approach represents the image as a vector of
pooled CNN features describing the underlying objects. This representation
provides robustness to spatial layout of the objects in the scene and achieves
invariance to general geometric transformations, such as translation, rotation
and scaling. The proposed approach also leads to a compact representation of
the scene, making each image occupy a smaller memory footprint. Experiments
show that the proposed representation achieves state of the art retrieval
results on a set of challenging benchmark image datasets, while maintaining a
compact representation. Comment: Deep Vision 201
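As a rough illustration of the pooling idea described in this abstract, the sketch below collapses per-region descriptors into one fixed-length image vector. The abstract does not state which pooling operator or normalization is used, so max-pooling and L2 normalization are assumptions here, the function name `pool_region_features` is hypothetical, and the random array stands in for real CNN features of object proposals.

```python
import numpy as np


def pool_region_features(region_features: np.ndarray) -> np.ndarray:
    """Pool (n_regions, dim) descriptors into one L2-normalized vector.

    Max-pooling is an assumption; the abstract only says "pooled".
    """
    if region_features.ndim != 2 or region_features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_regions, dim) array")
    # Element-wise maximum over regions: the result has a fixed size
    # no matter how many regions the image produced.
    pooled = region_features.max(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Placeholder data standing in for CNN descriptors of 5 object regions.
rng = np.random.default_rng(0)
features = rng.random((5, 8))

image_vector = pool_region_features(features)
# Reordering the regions (a different spatial layout of the same
# objects) leaves the pooled vector unchanged.
shuffled_vector = pool_region_features(features[::-1])
```

Because the pooled vector has the descriptor dimension regardless of the number of regions, storage per image stays constant, which is one plausible reading of the compactness claim.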
A Brief Review On Image Retrieval Techniques and its Scope
This paper presents a novel approach for image retrieval. Image retrieval is an important problem in many applications, such as copyright infringement detection, tag annotation, commercial retrieval, and landmark identification. The paper defines image retrieval and explains its concept and significance. It then describes various image retrieval techniques, including content-based, sketch-based, and annotation-based methods. The last section gives the approach for retrieval as a problem formulation.
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and, thus, the constraints imposed on the type of video
that each technique is able to address. Stating these hypotheses and
constraints explicitly makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing the newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area. Comment:
Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
Multiple vocabulary coding for 3D shape retrieval using Bag of Covariances
Bag of Covariance matrices (BoC) have recently been introduced as an extension of the standard Bag of Words (BoW) to the space of positive semi-definite matrices, which has a Riemannian structure. BoC descriptors can be constructed with various Riemannian metrics and using various quantization approaches. Each construction results in some quantization error, which is often reduced by increasing the vocabulary size. This, however, results in a signature that is not compact, increasing both the storage and computational complexity. This article demonstrates that a compact signature, with minimum distortion, can be constructed by using multiple-vocabulary-based coding. Each vocabulary is constructed from a different quantization method of the covariance feature space. The proposed method also extracts non-linear dependencies between the different BoC signatures to compose the final compact signature. Our experiments show that the proposed approach can boost the performance of the BoC descriptors in various 3D shape classification and retrieval tasks.
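To make the BoC construction more concrete, here is a minimal sketch of quantizing covariance descriptors against several vocabularies. It rests on several assumptions not given in the abstract: the log-Euclidean mapping is only one of the possible Riemannian metrics, the codewords are random placeholders rather than learned vocabularies, and the per-vocabulary histograms are simply concatenated, whereas the paper reports extracting non-linear dependencies between signatures. All function names are hypothetical.

```python
import numpy as np


def log_euclidean_vector(cov: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Flattened matrix logarithm of a covariance matrix.

    A small ridge (eps) is added so the matrix is strictly positive
    definite before taking the logarithm (an assumption of this sketch).
    """
    eigvals, eigvecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    return log_cov.ravel()


def bag_of_covariances(cov_list, vocabulary: np.ndarray) -> np.ndarray:
    """Normalized histogram of nearest codewords, vocabulary is (k, d*d)."""
    points = np.stack([log_euclidean_vector(c) for c in cov_list])
    # Squared Euclidean distance between every point and every codeword.
    dist2 = ((points[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    assignments = dist2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()


def multi_vocabulary_signature(cov_list, vocabularies) -> np.ndarray:
    """Concatenate one BoC histogram per vocabulary (a simplification)."""
    return np.concatenate(
        [bag_of_covariances(cov_list, v) for v in vocabularies]
    )


# Placeholder data: 20 local patches, each with 30 samples of 3 features.
rng = np.random.default_rng(0)
patches = rng.normal(size=(20, 30, 3))
covs = [np.cov(p, rowvar=False) for p in patches]  # 3x3 covariances

# Two small placeholder vocabularies (4 and 6 codewords) in log space.
vocabularies = [rng.normal(size=(4, 9)), rng.normal(size=(6, 9))]
signature = multi_vocabulary_signature(covs, vocabularies)
```

With two small vocabularies the signature has 4 + 6 = 10 entries, which illustrates the trade-off discussed in the abstract: several small vocabularies can stand in for one large vocabulary while keeping the signature short.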