822 research outputs found
Linear Spatial Pyramid Matching Using Non-convex and non-negative Sparse Coding for Image Classification
Recently sparse coding have been highly successful in image classification
mainly due to its capability of incorporating the sparsity of image
representation. In this paper, we propose an improved sparse coding model based
on linear spatial pyramid matching(SPM) and Scale Invariant Feature Transform
(SIFT ) descriptors. The novelty is the simultaneous non-convex and
non-negative characters added to the sparse coding model. Our numerical
experiments show that the improved approach using non-convex and non-negative
sparse coding is superior than the original ScSPM[1] on several typical
databases
Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds
Sparsity-based representations have recently led to notable results in
various visual recognition tasks. In a separate line of research, Riemannian
manifolds have been shown useful for dealing with features and models that do
not lie in Euclidean spaces. With the aim of building a bridge between the two
realms, we address the problem of sparse coding and dictionary learning over
the space of linear subspaces, which form Riemannian structures known as
Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into
the space of symmetric matrices by an isometric mapping. This in turn enables
us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we
propose closed-form solutions for learning a Grassmann dictionary, atom by
atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann
sparse coding and dictionary learning algorithms through embedding into Hilbert
spaces.
Experiments on several classification tasks (gender recognition, gesture
classification, scene analysis, face recognition, action recognition and
dynamic texture classification) show that the proposed approaches achieve
considerable improvements in discrimination accuracy, in comparison to
state-of-the-art methods such as kernelized Affine Hull Method and
graph-embedding Grassmann discriminant analysis.Comment: Appearing in International Journal of Computer Visio
Spatial histograms of soft pairwise similar patches to improve the bag-of-visual-words model
International audienceIn the context of category level scene classification, the bag-of-visual-words model (BoVW) is widely used for image representation. This model is appearance based and does not contain any information regarding the arrangement of the visual words in the 2D image space. To overcome this problem, recent approaches try to capture information about either the absolute or the relative spatial location of visual words. In the first category, the so-called Spatial Pyramid Representation (SPR) is very popular thanks to its simplicity and good results. Alternatively, adding information about occurrences of relative spatial configurations of visual words was proven to be effective but at the cost of higher computational complexity, specifically when relative distance and angles are taken into account. In this paper, we introduce a novel way to incorporate both distance and angle information in the BoVW representation. The novelty is first to provide a computationally efficient representation adding relative spatial information between visual words and second to use a soft pairwise voting scheme based on the distance in the descriptor space. Experiments on challenging data sets MSRC-2, 15Scene, Caltech101, Caltech256 and Pascal VOC 2007 demonstrate that our method outperforms or is competitive with concurrent ones. We also show that it provides important complementary information to the spatial pyramid matching and can improve the overall performance
Sparse Coding-Based Method Comparison for Land-Use Classification
Land-use classification utilize high-resolution remote sensing image. The image is utilized for improving the classification problem. Nonetheless, in other side, the problem becomes more challenging cause the image is too complex. We have to represent the image appropriately. On of the common method to deal with it is Bag of Visual Word (BOVW). The method needs a coding process to get the final data interpretation. There are many methods to do coding such as Hard Quantization Coding (HQ), Sparse Coding (SC), and Locality-constrained Linear Coding (LCC). However, that coding methods use a different assumption. Therefore, we have to compare the result of each coding method. The coding method affects classification accuracy. The best coding method will produce the better classification result. Dataset UC Merced consisted 21 classes is used in this research. The experiment result shows that LCC got better performance / accuracy than SC and HQ. LCC method got 86.48 % accuracy. Furthermore, LCC also got the best performance on various number of training data for each class
Fast Low-rank Representation based Spatial Pyramid Matching for Image Classification
Spatial Pyramid Matching (SPM) and its variants have achieved a lot of
success in image classification. The main difference among them is their
encoding schemes. For example, ScSPM incorporates Sparse Code (SC) instead of
Vector Quantization (VQ) into the framework of SPM. Although the methods
achieve a higher recognition rate than the traditional SPM, they consume more
time to encode the local descriptors extracted from the image. In this paper,
we propose using Low Rank Representation (LRR) to encode the descriptors under
the framework of SPM. Different from SC, LRR considers the group effect among
data points instead of sparsity. Benefiting from this property, the proposed
method (i.e., LrrSPM) can offer a better performance. To further improve the
generalizability and robustness, we reformulate the rank-minimization problem
as a truncated projection problem. Extensive experimental studies show that
LrrSPM is more efficient than its counterparts (e.g., ScSPM) while achieving
competitive recognition rates on nine image data sets.Comment: accepted into knowledge based systems, 201
Expanded Parts Model for Semantic Description of Humans in Still Images
We introduce an Expanded Parts Model (EPM) for recognizing human attributes
(e.g. young, short hair, wearing suit) and actions (e.g. running, jumping) in
still images. An EPM is a collection of part templates which are learnt
discriminatively to explain specific scale-space regions in the images (in
human centric coordinates). This is in contrast to current models which consist
of a relatively few (i.e. a mixture of) 'average' templates. EPM uses only a
subset of the parts to score an image and scores the image sparsely in space,
i.e. it ignores redundant and random background in an image. To learn our
model, we propose an algorithm which automatically mines parts and learns
corresponding discriminative templates together with their respective locations
from a large number of candidate parts. We validate our method on three recent
challenging datasets of human attributes and actions. We obtain convincing
qualitative and state-of-the-art quantitative results on the three datasets.Comment: Accepted for publication in IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI
Remote Sensing Image Scene Classification: Benchmark and State of the Art
Remote sensing image scene classification plays an important role in a wide
range of applications and hence has been receiving remarkable attention. During
the past years, significant efforts have been made to develop various datasets
or present a variety of approaches for scene classification from remote sensing
images. However, a systematic review of the literature concerning datasets and
methods for scene classification is still lacking. In addition, almost all
existing datasets have a number of limitations, including the small scale of
scene classes and the image numbers, the lack of image variations and
diversity, and the saturation of accuracy. These limitations severely limit the
development of new approaches especially deep learning-based methods. This
paper first provides a comprehensive review of the recent progress. Then, we
propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly
available benchmark for REmote Sensing Image Scene Classification (RESISC),
created by Northwestern Polytechnical University (NWPU). This dataset contains
31,500 images, covering 45 scene classes with 700 images in each class. The
proposed NWPU-RESISC45 (i) is large-scale on the scene classes and the total
image number, (ii) holds big variations in translation, spatial resolution,
viewpoint, object pose, illumination, background, and occlusion, and (iii) has
high within-class diversity and between-class similarity. The creation of this
dataset will enable the community to develop and evaluate various data-driven
algorithms. Finally, several representative methods are evaluated using the
proposed dataset and the results are reported as a useful baseline for future
research.Comment: This manuscript is the accepted version for Proceedings of the IEE
- …