3,290 research outputs found
Image Parsing with a Wide Range of Classes and Scene-Level Context
This paper presents a nonparametric scene parsing approach that improves the
overall accuracy, as well as the coverage of foreground classes in scene
images. We first improve the label likelihood estimates at superpixels by
merging likelihood scores from different probabilistic classifiers. This boosts
the classification performance and enriches the representation of
less-represented classes. Our second contribution consists of incorporating
semantic context in the parsing process through global label costs. Our method
does not rely on image retrieval sets but rather assigns a global likelihood
estimate to each label, which is plugged into the overall energy function. We
evaluate our system on two large-scale datasets, SIFTflow and LMSun. We achieve
state-of-the-art performance on the SIFTflow dataset and near-record results on
LMSun.Comment: Published at CVPR 2015, Computer Vision and Pattern Recognition
(CVPR), 2015 IEEE Conference o
Fusing image representations for classification using support vector machines
In order to improve classification accuracy different image representations
are usually combined. This can be done by using two different fusing schemes.
In feature level fusion schemes, image representations are combined before the
classification process. In classifier fusion, the decisions taken separately
based on individual representations are fused to make a decision. In this paper
the main methods derived for both strategies are evaluated. Our experimental
results show that classifier fusion performs better. Specifically Bayes belief
integration is the best performing strategy for image classification task.Comment: Image and Vision Computing New Zealand, 2009. IVCNZ '09. 24th
International Conference, Wellington : Nouvelle-Z\'elande (2009
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes
Salient Regions for Query by Image Content
Much previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This paper discusses how this problem can be circumvented by using salient interest points and compares and contrasts an extension to previous work in which the concept of scale is incorporated into the selection of salient regions to select the areas of the image that are most interesting and generate local descriptors to describe the image characteristics in that region. The paper describes and contrasts two such salient region descriptors and compares them through their repeatability rate under a range of common image transforms. Finally, the paper goes on to investigate the performance of one of the salient region detectors in an image retrieval situation
Fusing MPEG-7 visual descriptors for image classification
This paper proposes three content-based image classification techniques based on fusing various low-level MPEG-7 visual descriptors. Fusion is necessary as descriptors would be otherwise incompatible and inappropriate to directly include e.g. in a Euclidean distance. Three approaches are described: A “merging” fusion combined with an SVM classifier, a back-propagation fusion combined with a KNN classifier and a Fuzzy-ART neurofuzzy network. In the latter case, fuzzy rules can be extracted in an effort to bridge the “semantic gap” between the low-level descriptors and the high-level semantics of an image. All networks were evaluated using content from the repository of the aceMedia project1 and more specifically in a beach/urban scene classification problem
The aceToolbox: low-level audiovisual feature extraction for retrieval and classification
In this paper we present an overview of a software platform
that has been developed within the aceMedia project,
termed the aceToolbox, that provides global and local lowlevel feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM),
with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox as well as providing an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images
- …