1,534 research outputs found
Unsupervised feature learning using self-organizing maps.
In recent years a great amount of research has focused on algorithms that learn features from unlabeled data. These approaches are known as feature learning or deep learning methods and have been successfully applied to classify scene images and to recognize handwritten characters with high precision. In this thesis we show that a feature learning approach can be used to segment complex textures, a problem long addressed by proposing a large number of handcrafted descriptors and local optimization strategies. We employ the SOM neural network for its ability to natively provide a set of topologically ordered features. These features allow us to obtain a highly accurate local description, even in areas characterized by a transition from one texture to another. We also show that a single feature learning unit can be combined with others in order to significantly improve the quality of the texture description and, consequently, reduce the segmentation errors. The results obtained show that the proposed segmentation method is valid and provides a real alternative to other state-of-the-art methods. Since the proposed framework is simple, we easily combined it with a pyramidal histogram encoding and a supervised SVM classifier in order to classify scene images. We show that the important topological ordering property, inherited from the SOM network, allows us to resize the feature set obtained during the initial unsupervised learning while avoiding an unpredictable performance loss. Moreover, the results on the standard Caltech-101 dataset show a significant improvement over some state-of-the-art computer vision methods designed specifically for image classification.
Unsupervised Understanding of Location and Illumination Changes in Egocentric Videos
Wearable cameras stand out as one of the most promising devices for the
upcoming years, and as a consequence, the demand for computer algorithms to
automatically understand the videos recorded with them is increasing quickly.
Automatically understanding these videos is not an easy task, and the cameras' mobile
nature implies important challenges to be faced, such as the changing light
conditions and the unrestricted locations recorded. This paper proposes an
unsupervised strategy based on global features and manifold learning to endow
wearable cameras with contextual information regarding the light conditions and
the location captured. Results show that non-linear manifold methods can
capture contextual patterns from global features without requiring large
computational resources. As an application case, the proposed strategy is used
as a switching mechanism to improve hand detection in egocentric
videos. Comment: Submitted for publication
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing. Comment: 10 pages, 19 figures
Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification
Designing discriminative, powerful texture features robust to realistic
imaging conditions is a challenging computer vision problem with many
applications, including material recognition and analysis of satellite or
aerial imagery. In the past, most texture description approaches were based on
dense orderless statistical distribution of local features. However, most
recent approaches to texture recognition and remote sensing scene
classification are based on Convolutional Neural Networks (CNNs). The de facto
practice when learning these CNN models is to use RGB patches as input with
training performed on large amounts of labeled data (ImageNet). In this paper,
we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained
using mapped coded images with explicit texture information provide
complementary information to the standard RGB deep models. Additionally, two
deep architectures, namely early and late fusion, are investigated to combine
the texture and color information. To the best of our knowledge, we are the
first to investigate Binary Patterns encoded CNNs and different deep network
fusion architectures for texture recognition and remote sensing scene
classification. We perform comprehensive experiments on four texture
recognition datasets and four remote sensing scene classification benchmarks:
UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with
7 categories and the recently introduced large scale aerial image dataset (AID)
with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary
information to the standard RGB deep model of the same network architecture. Our
late fusion TEX-Net architecture always improves the overall performance
compared to the standard RGB network on both recognition problems. Our final
combination outperforms the state of the art without employing fine-tuning or
an ensemble of RGB network architectures. Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing
Feature Extraction Methods by Various Concepts using SOM
Image retrieval systems have gained traction with the increased use of visual and media data. It is critical to understand and manage big data, and a lot of analysis has been done in image retrieval applications. Given the considerable difficulty involved in handling big data using a traditional approach, there is a demand for its efficient management, particularly regarding accuracy and robustness. To solve these issues, we employ content-based image retrieval (CBIR) methods for both supervised and unsupervised pictures. Self-Organizing Maps (SOM), a competitive unsupervised learning aggregation technique, are applied in our multilevel fusion methodology to extract categorised features. The proposed methodology outperformed state-of-the-art algorithms with 90.3% precision, approximate retrieval precision (ARP) of 0.91, and approximate retrieval recall (ARR) of 0.82 when tested on several benchmark datasets.
Content Based Image Retrieval by Convolutional Neural Networks
Hamreras S., Benítez-Rochel R., Boucheham B., Molina-Cabello M.A., López-Rubio E. (2019) Content Based Image Retrieval by Convolutional Neural Networks. In: Ferrández Vicente J., Álvarez-Sánchez J., de la Paz López F., Toledo Moreo J., Adeli H. (eds) From Bioinspired Systems and Biomedical Applications to Machine Learning. IWINAC 2019. Lecture Notes in Computer Science, vol 11487. Springer. In this paper, we present a Convolutional Neural Network (CNN) for feature extraction in Content Based Image Retrieval (CBIR). The proposed CNN aims at reducing the semantic gap between low-level and high-level features, thus improving retrieval results. Our CNN is the result of a transfer learning technique using the pretrained AlexNet network. It learns how to extract representative features from a learning database and then uses this knowledge in query feature extraction. Experiments performed on the Wang (Corel 1K) database show a significant improvement in terms of precision over the state-of-the-art classic approaches. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making the hypotheses and constraints
explicit makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables
CTex - an adaptive unsupervised segmentation algorithm based on color-texture coherence
This paper presents the development of an unsupervised image segmentation framework (referred to as CTex) that is based on the adaptive inclusion of color and texture in the process of data partition. An important contribution of this work consists of a new formulation for the extraction of color features that evaluates the input image in a multispace color representation. To achieve this, we have used the opponent characteristics of the RGB and YIQ color spaces, where the key component was the inclusion of the self-organizing map (SOM) network in the computation of the dominant colors and the estimation of the optimal number of clusters in the image. The texture features are computed using a multichannel texture decomposition scheme based on Gabor filtering. The major contribution of this work resides in the adaptive integration of the color and texture features in a compound mathematical descriptor with the aim of identifying the homogeneous regions in the image. This integration is performed by a novel adaptive clustering algorithm that enforces spatial continuity during the data assignment process. A comprehensive qualitative and quantitative performance evaluation has been carried out, and the experimental results indicate that the proposed technique is accurate in capturing the color and texture characteristics when applied to complex natural images.