From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state-of-the-art results. Looking back at
what has been achieved so far, the survey discusses open challenges and
directions for future research.
Comment: Accepted by IJC
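The Bag of Words pipeline the survey covers can be illustrated with a minimal sketch: local descriptors are quantized against a learned codebook and the image is summarized as a normalized histogram of codeword counts. The codebook and descriptors below are illustrative stand-ins, not any specific method from the survey.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """L1-normalized histogram of codeword assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

# Toy example: 2-D descriptors quantized against a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 1.1), (0.0, 0.9), (0.1, 0.0)]
print(bow_histogram(descriptors, codebook))  # -> [0.5, 0.25, 0.25]
```

In practice the codebook is learned (e.g. by k-means over training descriptors) and the descriptors come from local filters or patches, but the quantize-and-count structure is the same.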
HEp-2 Cell Classification via Fusing Texture and Shape Information
Indirect Immunofluorescence (IIF) HEp-2 cell images provide effective
evidence for the diagnosis of autoimmune diseases. Recently, computer-aided
diagnosis of autoimmune diseases via IIF HEp-2 cell classification has
attracted great attention. However, the HEp-2 cell classification task is
quite challenging due to large intra-class variation and small between-class
variation. In this paper
we propose an effective and efficient approach for the automatic classification
of IIF HEp-2 cell image by fusing multi-resolution texture information and
richer shape information. To be specific, we propose to: a) capture the
multi-resolution texture information by a novel Pairwise Rotation Invariant
Co-occurrence of Local Gabor Binary Pattern (PRICoLGBP) descriptor, b) depict
the richer shape information by using an Improved Fisher Vector (IFV) model
with RootSIFT features which are sampled from large image patches in multiple
scales, and c) combine them properly. We evaluate systematically the proposed
approach on the IEEE International Conference on Pattern Recognition (ICPR)
2012, IEEE International Conference on Image Processing (ICIP) 2013 and ICPR
2014 contest data sets. The proposed methods significantly outperform the
winners of the ICPR 2012 and ICIP 2013 contests, and achieve performance
comparable to that of the winner of the newly released ICPR 2014 contest.
Comment: 11 pages, 7 figures
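One common way to fuse heterogeneous descriptors such as the texture and shape features above (the paper's exact combination scheme is not specified in the abstract, so this is only a generic sketch) is to normalize each modality independently and then concatenate:

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(texture_feat, shape_feat):
    """Normalize each modality independently, then concatenate."""
    return l2_normalize(texture_feat) + l2_normalize(shape_feat)

# Toy features of different dimensionalities, fused into one vector.
fused = fuse([3.0, 4.0], [1.0, 0.0, 0.0])
print(fused)  # -> [0.6, 0.8, 1.0, 0.0, 0.0]
```

Per-modality normalization keeps one descriptor from dominating the other purely because of its scale; the fused vector can then be fed to any standard classifier.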
On the Use of Default Parameter Settings in the Empirical Evaluation of Classification Algorithms
We demonstrate that, for a range of state-of-the-art machine learning
algorithms, the differences in generalisation performance obtained using
default parameter settings and using parameters tuned via cross-validation can
be similar in magnitude to the differences in performance observed between
state-of-the-art and uncompetitive learning systems. This means that fair and
rigorous evaluation of new learning algorithms requires performance comparison
against benchmark methods with best-practice model selection procedures, rather
than using default parameter settings. We investigate the sensitivity of three
key machine learning algorithms (support vector machine, random forest and
rotation forest) to their default parameter settings, and provide guidance on
determining sensible default parameter values for implementations of these
algorithms. We also conduct an experimental comparison of these three
algorithms on 121 classification problems and find that, perhaps surprisingly,
rotation forest is significantly more accurate on average than both random
forest and a support vector machine.
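The paper's central point, that tuned and default hyper-parameters can differ as much as competing algorithms do, can be illustrated with a toy stand-in: a 1-D k-nearest-neighbour classifier whose k is either left at an arbitrary default or chosen by leave-one-out cross-validation. The data and the default value here are made up for illustration.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = sum(knn_predict(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

data = [(0.0, 'a'), (0.1, 'a'), (0.2, 'a'),
        (0.9, 'b'), (1.0, 'b'), (1.1, 'b'), (0.55, 'b')]
default_k = 5                                  # arbitrary "out of the box" setting
tuned_k = max([1, 3, 5], key=lambda k: loo_accuracy(data, k))
print(loo_accuracy(data, default_k), loo_accuracy(data, tuned_k))
```

On this toy set the tuned k clearly beats the default, which is exactly the gap the paper argues must be closed before comparing algorithms fairly.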
Multi-modal Face Pose Estimation with Multi-task Manifold Deep Learning
Human face pose estimation aims at estimating the gazing direction or head
posture from 2D images. It provides important cues for tasks such as
communicative gesture analysis and saliency detection, and has recently
attracted considerable attention. However, it is challenging because of
complex backgrounds, varied orientations and limited visibility of facial
appearance. Therefore, a descriptive representation of face images and a
mapping from it to poses are critical. In this
paper, we make use of multi-modal data and propose a face pose estimation
method that uses a novel deep learning framework named Multi-task Manifold
Deep Learning. It is based on feature extraction with improved deep neural
networks and multi-modal mapping relationship with multi-task learning. In the
proposed deep learning based framework, Manifold Regularized Convolutional
Layers (MRCL) improve traditional convolutional layers by learning the
relationship among outputs of neurons. Besides, in the proposed mapping
relationship learning method, different modals of face representations are
naturally combined to learn the mapping function from face images to poses. In
this way, the computed mapping model with multiple tasks is improved.
Experimental results on three challenging benchmark datasets DPOSE, HPID and
BKHPD demonstrate the outstanding performance of the proposed method.
Gender Recognition Based on SIFT Features
This paper proposes a robust approach for face detection and gender
classification in color images. Previous research on gender recognition has
relied on a computationally expensive and time-consuming alignment
pre-processing step, in which face images are aligned so that facial
landmarks such as the eyes, nose, lips and chin are placed in uniform
locations in the image. In this paper, a novel technique based on
mathematical analysis is presented in three stages that eliminates the
alignment step. First, a new color-based face detection method is presented,
with better results and more robustness against complex backgrounds. Next,
features that are invariant to affine transformations
are extracted from each face using scale invariant feature transform (SIFT)
method. To evaluate the performance of the proposed algorithm, experiments have
been conducted by employing an SVM classifier on a database of face images
containing 500 images of distinct people with an equal ratio of males and
females.
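The extract-features-then-classify structure described above can be sketched schematically. Real SIFT extraction (e.g. via OpenCV) is replaced here by a hypothetical `extract_descriptor` stub, and the SVM by a nearest-class-mean rule, purely to keep the sketch self-contained; neither stand-in is the paper's method.

```python
def extract_descriptor(image):
    """Stand-in for SIFT: a crude 4-bin intensity histogram of the image."""
    bins = [0] * 4
    for row in image:
        for px in row:
            bins[min(px * 4 // 256, 3)] += 1
    total = sum(bins) or 1
    return [b / total for b in bins]

def train_means(samples):
    """Mean descriptor per class; `samples` maps label -> list of images."""
    means = {}
    for label, images in samples.items():
        descs = [extract_descriptor(im) for im in images]
        means[label] = [sum(col) / len(descs) for col in zip(*descs)]
    return means

def classify(image, means):
    """Assign the class whose mean descriptor is nearest to the image's."""
    d = extract_descriptor(image)
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(means, key=lambda lbl: dist2(d, means[lbl]))

dark = [[10, 20], [30, 40]]        # toy "image": mostly low intensities
bright = [[200, 210], [220, 230]]  # toy "image": mostly high intensities
means = train_means({'male': [dark], 'female': [bright]})
print(classify([[15, 25], [35, 45]], means))  # -> 'male'
```

Swapping the stub for genuine SIFT descriptors and the nearest-mean rule for an SVM recovers the pipeline the abstract describes.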
Texture image analysis and texture classification methods - A review
Tactile texture refers to the tangible feel of a surface, while visual
texture refers to the visual appearance of the shape or contents of an image.
In image processing, texture can be defined as a function of the spatial
variation of the brightness intensity of the pixels. Texture is the main term
used to define objects or
concepts of a given image. Texture analysis plays an important role in computer
vision cases such as object recognition, surface defect detection, pattern
recognition, medical image analysis, etc. To date, many approaches have been
proposed to describe texture images accurately. Texture analysis methods are
usually classified into four categories: statistical, structural, model-based
and transform-based methods. This paper discusses the various methods used
for texture analysis in detail. Recent research shows the power of
combinational methods for texture analysis, which do not fit into a single
category; this paper reviews well-known combinational methods in detail in a
dedicated section. The advantages and disadvantages of well-known texture
image descriptors are summarized in the results section. The main focus
across all of the surveyed methods is on discrimination performance,
computational complexity and resistance to challenges such as noise,
rotation, etc. A brief review is also made of the common classifiers used for
texture image classification, and a survey of texture image benchmark
datasets is included.
Comment: 29 pages. Keywords: Texture Image, Texture Analysis, Texture
classification, Feature extraction, Image processing, Local Binary Patterns,
Benchmark texture image dataset
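One concrete instance of the statistical category mentioned above is the gray-level co-occurrence matrix (GLCM): count how often pairs of gray levels occur at a fixed spatial offset, then derive scalar features from the counts. The sketch below uses only the horizontal neighbour offset and two classic features, contrast and energy, on made-up toy images.

```python
def glcm(image, levels):
    """Co-occurrence counts for pairs (pixel, its right neighbour)."""
    m = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            m[a][b] += 1
    return m

def contrast(m):
    """Weighted by squared gray-level difference: high for abrupt changes."""
    n = len(m)
    return sum(m[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def energy(m):
    """Sum of squared normalized entries: high for uniform textures."""
    total = sum(sum(r) for r in m) or 1
    return sum((c / total) ** 2 for r in m for c in r)

flat = [[0, 0, 0], [0, 0, 0]]    # uniform texture: zero contrast
stripy = [[0, 1, 0], [1, 0, 1]]  # alternating texture: high contrast
print(contrast(glcm(flat, 2)), contrast(glcm(stripy, 2)))  # -> 0 4
```

Full GLCM descriptors aggregate several offsets and angles and add further features (homogeneity, correlation, entropy), but the counting step is as above.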
Interpretable Transformations with Encoder-Decoder Networks
Deep feature spaces have the capacity to encode complex transformations of
their input data. However, understanding the relative feature-space
relationship between two transformed encoded images is difficult. For instance,
what is the relative feature space relationship between two rotated images?
What is decoded when we interpolate in feature space? Ideally, we want to
disentangle confounding factors, such as pose, appearance, and illumination,
from object identity. Disentangling these is difficult because they interact in
very nonlinear ways. We propose a simple method to construct a deep feature
space, with explicitly disentangled representations of several known
transformations. A person or algorithm can then manipulate the disentangled
representation, for example, to re-render an image with explicit control over
parameterized degrees of freedom. The feature space is constructed using a
transforming encoder-decoder network with a custom feature transform layer,
acting on the hidden representations. We demonstrate the advantages of explicit
disentangling on a variety of datasets and transformations, and as an aid for
traditional tasks, such as classification.
Comment: Accepted at ICCV 201
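The idea of a feature transform layer acting on hidden representations can be sketched in miniature: if a latent code is laid out as 2-D sub-vectors, an input rotation by angle theta can be mirrored in feature space by rotating each pair by the same angle. The pairing scheme here is an assumption for illustration, not the paper's exact layer.

```python
import math

def rotate_pairs(z, theta):
    """Apply a 2-D rotation by `theta` to each consecutive pair of a latent code."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x, y in zip(z[0::2], z[1::2]):
        out += [c * x - s * y, s * x + c * y]
    return out

# Rotating the latent code by 90 degrees permutes each pair accordingly.
z = [1.0, 0.0, 0.0, 1.0]
rotated = rotate_pairs(z, math.pi / 2)
print([round(v, 6) for v in rotated])  # -> [0.0, 1.0, -1.0, 0.0]
```

Because the transform is explicit and parameterized, one can manipulate theta directly in feature space and decode, which is the kind of controlled re-rendering the abstract describes.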
Alignment Distances on Systems of Bags
Recent research in image and video recognition indicates that many visual
processes can be thought of as being generated by a time-varying generative
model. A natural descriptive model for visual processes is thus a statistical
distribution that varies over time. Specifically, modeling visual processes as
streams of histograms generated by a kernelized linear dynamic system turns out
to be efficient. We refer to such a model as a System of Bags. In this work, we
investigate Systems of Bags with special emphasis on dynamic scenes and dynamic
textures. Parameters of linear dynamic systems suffer from ambiguities. In
order to cope with these ambiguities in the kernelized setting, we develop a
kernelized version of the alignment distance. For its computation, we use a
Jacobi-type method and prove its convergence to a set of critical points. We
employ it as a dissimilarity measure on Systems of Bags. As such, it
outperforms other known dissimilarity measures for kernelized linear dynamic
systems, in particular the Martin Distance and the Maximum Singular Value
Distance, in every tested classification setting. A considerable margin can be
observed in settings where classification is performed with respect to an
abstract mean of video sets. For this scenario, the presented approach can
outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or
Orthogonal Tensor Dictionary Learning.
Joint Maximum Purity Forest with Application to Image Super-Resolution
In this paper, we propose a novel random-forest scheme, namely Joint Maximum
Purity Forest (JMPF), for classification, clustering, and regression tasks. In
the JMPF scheme, the original feature space is transformed into a compactly
pre-clustered feature space, via a trained rotation matrix. The rotation matrix
is obtained through an iterative quantization process, where the input data
belonging to different classes are clustered to the respective vertices of the
new feature space with maximum purity. In the new feature space, orthogonal
hyperplanes, which are employed at the split-nodes of decision trees in random
forests, can tackle the clustering problems effectively. We evaluated our
proposed method on public benchmark datasets for regression and classification
tasks, and experiments showed that JMPF remarkably outperforms other
state-of-the-art random-forest-based approaches. Furthermore, we applied JMPF
to image super-resolution, because the transformed, compact features are more
discriminative to the clustering-regression scheme. Experiment results on
several public benchmark datasets also showed that the JMPF-based image
super-resolution scheme is consistently superior to recent state-of-the-art
image super-resolution algorithms.
Comment: 18 pages, 7 figures
Fault detection in operating helicopter drive train components based on support vector data description
The objective of the paper is to develop a vibration-based automated procedure dealing with early detection of
mechanical degradation of helicopter drive train components using Health and Usage Monitoring Systems (HUMS) data. An anomaly-detection method is developed to quantify the degree of deviation of the mechanical state of a component from its nominal condition. The method is based on an Anomaly Score (AS) formed by combining a set of statistical features correlated with specific damage types, known as Condition Indicators (CI); operational variability is thus implicitly included in the model through the CI correlation. The problem of fault detection is then recast as a one-class classification problem in the space spanned by a set of CI, with the aim of globally differentiating between normal and anomalous observations, related respectively to healthy and supposedly faulty components. A procedure based on an efficient one-class classification method that requires no assumption on the data distribution is used. The core of the approach is the Support Vector Data Description (SVDD), which allows an efficient data description without the need for a large amount of statistical data. Several analyses have been carried out to validate the proposed procedure, using flight vibration data collected from an H135 (formerly known as EC135) helicopter in service, for which micro-pitting damage on a gear was detected by HUMS and assessed through visual inspection. The capability of the proposed approach to provide a better trade-off between false-alarm and missed-detection rates, with respect to individual CI and to the AS obtained by assuming jointly Gaussian-distributed CI, has also been analysed.
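The one-class idea behind SVDD can be sketched in miniature: hard-margin SVDD with a linear kernel reduces to finding a minimum enclosing ball of the training (healthy) condition indicators, and a new observation is flagged when it falls outside that ball. The Badoiu-Clarkson iteration below is a simple approximation of the ball, and the CI values are made up for illustration; the actual paper uses the full kernelized SVDD formulation.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def enclosing_ball(points, iters=200):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration."""
    center = list(points[0])
    for i in range(1, iters + 1):
        # Move the center a shrinking step toward the farthest point.
        far = max(points, key=lambda p: dist(center, p))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(dist(center, p) for p in points)
    return center, radius

# Healthy-condition indicator vectors (illustrative 2-D values).
healthy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius = enclosing_ball(healthy)

def is_anomalous(x, margin=1.05):
    """Flag observations outside the (slightly enlarged) healthy ball."""
    return dist(center, x) > radius * margin

print(is_anomalous((0.5, 0.5)), is_anomalous((4.0, 4.0)))  # -> False True
```

The distance to the ball surface plays the role of an anomaly score, and the margin factor trades false alarms against missed detections, mirroring the trade-off analysed in the paper.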