Learning midlevel image features for natural scene and texture classification
This paper addresses the coding of natural scenes in order to extract semantic information. We present a new scheme to project natural scenes onto a basis in which each dimension encodes statistically independent information. Basis extraction is performed by independent component analysis (ICA) applied to image patches culled from natural scenes. The study of the resulting coding units (coding filters) extracted from well-chosen categories of images shows that they adapt and respond selectively to discriminant features in natural scenes. Given this basis, we define global and local image signatures relying on the maximal activity of filters on the input image. Locally, the construction of the signature takes into account the spatial distribution of the maximal responses within the image. We propose a criterion to reduce the size of the representation space for faster computation. The proposed approach is tested in the context of texture classification (111 classes), as well as natural scene classification (11 categories, 2037 images). Using a common protocol, other commonly used descriptors reach at most 47.7% accuracy on average, while our method reaches up to 63.8%. We show that this advantage does not depend on the size of the signature and demonstrate the efficiency of the proposed criterion to select ICA filters and reduce the dimensio
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
Prototypicality effects in global semantic description of objects
In this paper, we introduce a novel approach for semantic description of
object features based on the prototypicality effects of the Prototype Theory.
Our prototype-based description model encodes and stores the semantic meaning
of an object, while describing its features using the semantic prototype
computed by CNN classification models. Our method uses semantic prototypes to
create discriminative descriptor signatures that describe an object
highlighting its most distinctive features within the category. Our experiments
show that: i) our descriptor preserves the semantic information used by the
CNN-models in classification tasks; ii) our distance metric can be used as the
object's typicality score; iii) our descriptor signatures are semantically
interpretable and enable the simulation of the prototypical organization of
objects within a category.
Comment: Paper accepted at the IEEE Winter Conference on Applications of Computer
Vision 2019 (WACV2019). Content: 10 pages (8 + 2 reference) with 7 figures
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and constraints
explicit makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective on the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables