Feature Extraction Using Fractal Codes
Fast and successful searching for an object in a multimedia database is a highly desirable functionality. Several approaches to content-based retrieval for multimedia databases can be found in the literature [9,10,12,14,17]. The approach we consider is feature extraction. A feature can be seen as a way to present simple information, such as the texture, color, and spatial information of an image, or the pitch and frequency of a sound.
In this paper we present a method for feature extraction on texture and spatial similarity, using fractal coding techniques. Our method is based upon the observation that the coefficients describing the fractal code of an image contain very useful information about the structural content of the image. We apply simple statistics to the information produced by fractal image coding. The statistics reveal features and require a small amount of storage. Several invariances follow from the methods used: size, global contrast, orientation
A content-based image retrieval system for texture and color queries
In recent years, very large collections of images and videos have grown rapidly.
In parallel with this growth, content-based retrieval and querying the indexed collections
are required to access visual information. Two of the main components of
the visual information are texture and color. In this thesis, a content-based image
retrieval system is presented that computes texture and color similarity among
images. The underlying technique is based on the adaptation of a statistical approach
to texture analysis. An optimal set of five second-order texture statistics
is extracted from the Spatial Grey Level Dependency Matrix of each image, so
as to render the feature vector for each image maximally informative, and yet
to obtain a low vector dimensionality for efficiency in computation. Color
analysis uses color histograms, and the information captured within the
histograms is extracted after a pre-processing phase that performs color transformation,
quantization, and filtering. The features thus extracted and stored within
feature vectors are later compared with an intersection-based method. The system
is also extended for pre-processing images to segment regions with different
textural quality, rather than operating globally over the whole image. The system
also includes a framework for object-based color and texture querying, which
might be useful for reducing the similarity error while comparing rectangular regions
as objects. It is shown through experimental results and precision-recall
analysis that the content-based retrieval system is effective in terms of retrieval
and scalability.
Konak, Eyüp Sabri. M.S.
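The abstract above compares feature vectors with "an intersection-based method" but does not spell it out. As a minimal sketch, assuming the classic histogram intersection on normalized histograms (the thesis may use a different variant), the comparison could look like this. The function names `normalize` and `histogram_intersection` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have a positive total count")
    return [h / total for h in hist]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]: the sum of bin-wise minima of two normalized histograms.

    Assumption: both histograms use the same quantization (same bins, same order).
    """
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    na, nb = normalize(a), normalize(b)
    return sum(min(x, y) for x, y in zip(na, nb))


# Identical color distributions score 1, disjoint ones score 0.
same = histogram_intersection([1, 1, 2], [2, 2, 4])
disjoint = histogram_intersection([1, 0], [0, 1])
partial = histogram_intersection([2, 2], [4, 0])
```

Normalizing first makes the score independent of image size, which is one plausible reason to prefer it over raw bin counts.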
Aggregated Deep Local Features for Remote Sensing Image Retrieval
Remote Sensing Image Retrieval remains a challenging topic due to the special
nature of Remote Sensing Imagery. Such images contain various different
semantic objects, which clearly complicates the retrieval task. In this paper,
we present an image retrieval pipeline that uses attentive, local convolutional
features and aggregates them using the Vector of Locally Aggregated Descriptors
(VLAD) to produce a global descriptor. We study various system parameters such
as the multiplicative and additive attention mechanisms and descriptor
dimensionality. We propose a query expansion method that requires no external
inputs. Experiments demonstrate that even without training, the local
convolutional features and global representation outperform other systems.
After system tuning, we can achieve state-of-the-art or competitive results.
Furthermore, we observe that our query expansion method increases overall
system performance by about 3%, using only the top-three retrieved images.
Finally, we show how dimensionality reduction produces compact descriptors with
increased retrieval performance and fast retrieval computation times, e.g. 50%
faster than the current systems.
Comment: Published in Remote Sensing. The first two authors have equal
contribution
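The VLAD aggregation named in the abstract above can be sketched as follows. This is a minimal illustration of plain VLAD only: it assumes a codebook of cluster centers is already available, and it omits the attention weighting, query expansion, and dimensionality reduction that the paper describes. The function name `vlad` and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence


def vlad(descriptors: Sequence[Sequence[float]],
         centers: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate local descriptors into one global vector of length k * d.

    Each descriptor is assigned to its nearest center, the residuals
    (descriptor minus center) are summed per center, and the concatenated
    result is L2-normalized.
    """
    k, d = len(centers), len(centers[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Index of the nearest center by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((xj - cj) ** 2 for xj, cj in zip(x, centers[i])),
        )
        for j in range(d):
            acc[nearest][j] += x[j] - centers[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy example: two 2-D centers and two local descriptors.
centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[3.0, 0.0], [10.0, 14.0]]
global_descriptor = vlad(descriptors, centers)
```

With these toy inputs the residuals are (3, 0) for the first center and (0, 4) for the second, so the normalized descriptor has four components and unit length.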
Learning Aligned Cross-Modal Representations from Weakly Aligned Data
People can recognize scenes across many different modalities beyond natural
images. In this paper, we investigate how to learn cross-modal scene
representations that transfer across modalities. To study this problem, we
introduce a new cross-modal scene dataset. While convolutional neural networks
can categorize cross-modal scenes well, they also learn an intermediate
representation not aligned across modalities, which is undesirable for
cross-modal transfer applications. We present methods to regularize cross-modal
convolutional neural networks so that they have a shared representation that is
agnostic of the modality. Our experiments suggest that our scene representation
can help transfer representations across modalities for retrieval. Moreover,
our visualizations suggest that units emerge in the shared representation that
tend to activate on consistent concepts independently of the modality.
Comment: Conference paper at CVPR 201
Object Level Deep Feature Pooling for Compact Image Representation
Convolutional Neural Network (CNN) features have been successfully employed
in recent works as an image descriptor for various vision tasks. But the
inability of the deep CNN features to exhibit invariance to geometric
transformations and object compositions poses a great challenge for image
search. In this work, we demonstrate the effectiveness of the objectness prior
over the deep CNN features of image regions for obtaining an invariant image
representation. The proposed approach represents the image as a vector of
pooled CNN features describing the underlying objects. This representation
provides robustness to spatial layout of the objects in the scene and achieves
invariance to general geometric transformations, such as translation, rotation
and scaling. The proposed approach also leads to a compact representation of
the scene, making each image occupy a smaller memory footprint. Experiments
show that the proposed representation achieves state-of-the-art retrieval
results on a set of challenging benchmark image datasets, while maintaining a
compact representation.
Comment: Deep Vision 201
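The abstract above describes the image as a vector of pooled CNN features over object regions, without naming the pooling operator. As a rough sketch, assuming element-wise max pooling over per-region feature vectors (an assumption, not something the abstract confirms), the order-independence that gives robustness to spatial layout can be seen directly. The function name `pool_region_features` is hypothetical, and the region features are assumed to be computed elsewhere.

```python
from typing import List, Sequence


def pool_region_features(region_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over region feature vectors.

    The output has the same length as a single region's feature vector,
    regardless of how many regions the image contains.
    """
    if not region_features:
        raise ValueError("at least one region feature vector is required")
    dim = len(region_features[0])
    if any(len(f) != dim for f in region_features):
        raise ValueError("all region feature vectors must have the same length")
    return [max(f[j] for f in region_features) for j in range(dim)]


# Two object regions with 3-D features; swapping their order changes nothing.
regions = [[0.1, 0.9, 0.0], [0.7, 0.2, 0.4]]
pooled = pool_region_features(regions)
pooled_swapped = pool_region_features(list(reversed(regions)))
```

Because the maximum ignores the order of its inputs, rearranging the objects in the scene leaves the pooled vector unchanged, and the descriptor size stays fixed however many regions are proposed.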
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and
constraints explicit makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective on the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
tables