Nonnegative OPLS for supervised design of filter banks: application to image and audio feature extraction
Audio and visual data analysis tasks usually have to deal with high-dimensional, nonnegative signals. However, most data analysis methods suffer from overfitting and numerical problems when data have more than a few dimensions, so a dimensionality-reduction preprocessing step is needed. Moreover, interpretability of how and why filters work is a desired property in audio and visual applications, especially when energy or spectral signals are involved. In these cases, due to the nature of the signals, nonnegativity of the filter weights is desirable to better understand their operation. Motivated by these two necessities, we propose several methods that reduce the dimensionality of the data while guaranteeing the nonnegativity and interpretability of the solution. In particular, we propose a generalized methodology for designing filter banks in a supervised way for applications dealing with nonnegative data, and we explore different ways of solving the proposed objective function, which consists of a nonnegative version of the orthonormalized partial least squares (OPLS) method. We analyze the discriminative power of the features obtained with the proposed methods for two widely studied applications: texture classification and music genre classification. Furthermore, we compare the filter banks obtained by our methods with other state-of-the-art methods specifically designed for feature extraction.

This work was supported in part by the MINECO projects TEC2013-48439-C4-1-R, TEC2014-52289-R, TEC2016-75161-C2-1-R, TEC2016-75161-C2-2-R, TEC2016-81900-REDT/AEI, and PRICAM (S2013/ICE-2933).
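The supervised, nonnegative projection idea above can be sketched with a simple projected-gradient ascent. This is a hypothetical simplification (the paper studies several solvers for the exact OPLS objective): the function name `nonneg_supervised_projection` and the unit-norm column constraint standing in for full orthonormality are both assumptions, not the authors' method.

```python
import numpy as np

def nonneg_supervised_projection(X, Y, n_components, n_iter=200, lr=1e-3):
    """Projected-gradient sketch in the spirit of nonnegative OPLS:
    ascend trace(W^T X^T Y Y^T X W) subject to W >= 0, with unit-norm
    columns standing in for the orthonormality constraint (a relaxation;
    the paper explores other ways of solving the objective)."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[1], n_components))   # nonnegative init
    C = X.T @ Y @ Y.T @ X                        # label-driven scatter matrix
    for _ in range(n_iter):
        W += lr * 2.0 * C @ W                    # gradient ascent step
        W = np.maximum(W, 0.0)                   # project onto W >= 0
        W /= np.linalg.norm(W, axis=0, keepdims=True) + 1e-12
    return W

# toy demo on random nonnegative data (shapes are illustrative only)
rng = np.random.default_rng(1)
X = rng.random((20, 8))                  # 20 samples, 8 nonnegative features
Y = np.eye(3)[rng.integers(0, 3, 20)]    # one-hot labels, 3 classes
W = nonneg_supervised_projection(X, Y, n_components=2)
Z = X @ W                                # reduced, nonnegative-weighted features
```

Because every weight in `W` is nonnegative, each column can be read directly as a filter over the input bands, which is the interpretability argument made in the abstract.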
Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification
Designing discriminative powerful texture features robust to realistic
imaging conditions is a challenging computer vision problem with many
applications, including material recognition and analysis of satellite or
aerial imagery. In the past, most texture description approaches were based on
dense orderless statistical distribution of local features. However, most
recent approaches to texture recognition and remote sensing scene
classification are based on Convolutional Neural Networks (CNNs). The de facto
practice when learning these CNN models is to use RGB patches as input with
training performed on large amounts of labeled data (ImageNet). In this paper,
we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained
using mapped coded images with explicit texture information provide
complementary information to the standard RGB deep models. Additionally, two
deep architectures, namely early and late fusion, are investigated to combine
the texture and color information. To the best of our knowledge, we are the
first to investigate Binary Patterns encoded CNNs and different deep network
fusion architectures for texture recognition and remote sensing scene
classification. We perform comprehensive experiments on four texture
recognition datasets and four remote sensing scene classification benchmarks:
UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with
7 categories and the recently introduced large scale aerial image dataset (AID)
with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary
information to the standard RGB deep model of the same network architecture. Our
late fusion TEX-Net architecture always improves the overall performance
compared to the standard RGB network on both recognition problems. Our final
combination outperforms the state-of-the-art without employing fine-tuning or
ensemble of RGB network architectures.

Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing
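The "mapped coded images" that TEX-Nets take as input are built from binary patterns. A minimal sketch of the classic 8-neighbour local binary pattern (LBP) code is shown below; the paper's exact coded-image variant and bit mapping may differ, so treat this as a generic illustration.

```python
import numpy as np

def lbp_map(img):
    """Basic 8-neighbour local binary pattern (LBP) code per pixel.
    Generic sketch only; TEX-Nets' exact mapped coding may differ."""
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    out = np.zeros((h - 2, w - 2), dtype=np.int64)
    # the 8 neighbours, in a fixed clockwise order, one bit each
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out += (neigh >= centre).astype(np.int64) << bit
    return out

# toy demo: a perfectly flat patch yields the all-ones code 255 everywhere
img = np.full((4, 4), 7)
codes = lbp_map(img)
```

The resulting code image encodes local micro-texture explicitly, which is the complementary signal the abstract argues RGB-trained CNNs lack.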
Development of Multirate Filter – Based Region Features for Iris Identification
The emergence of biometric systems is seen as the next-generation technological solution for strengthening social and national security. The evolution of biometrics has shifted the paradigm of authentication from classical token- and knowledge-based systems to systems based on physiological and behavioural traits. R&D on iris biometrics over the last decade has established the iris as one of the most promising traits. Although iris biometrics takes high-resolution near-infrared (NIR) images as input, its authentication accuracy is very commendable. Its performance is often influenced by the presence of noise, database size, and feature representation. This thesis focuses on the use of multiresolution analysis (MRA) in developing suitable features for non-ideal iris images. Our investigation starts with an iris feature extraction technique using the Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) filter bank. In this work, a technique is proposed to deal with issues such as segmentation failure and occlusion. The experimental studies demonstrate the superiority of the CDF 9/7 filter bank over frequency-based techniques. Since there is scope for improving the frequency selectivity of the CDF 9/7 filter bank, a tunable filter bank is proposed to extract region-based features from non-cooperative iris images. The proposed method is based on a half-band polynomial of 14th order. Since regularity and frequency selectivity are inversely related, the filter coefficients are derived without imposing the maximum number of zeros. Also, the half-band polynomial is expressed in the x-domain so that semidefinite programming can be applied, which yields optimized coefficients for the analysis/synthesis filters. The next contribution of this thesis is the development of another powerful MRA known as the triplet half-band filter bank (THFB). The advantage of the THFB is the flexibility in choosing the frequency response, which allows one to overcome the magnitude constraints.
The proposed filter bank has improved frequency selectivity along with other desired properties, and is then used for iris feature extraction. The last contribution of the thesis describes a wavelet cepstral feature derived from the CDF 9/7 filter bank to characterize iris texture. The wavelet cepstrum helps reduce the dimensionality of the detail coefficients; hence, a compact feature representation is possible, with improved accuracy over CDF 9/7 alone. The efficacy of the suggested features is validated for iris recognition on three publicly available databases, namely CASIAv3, UBIRISv1, and IITD. The features are compared with other transform-domain features such as the FFT and Gabor filters, and a comprehensive evaluation of all the suggested features is carried out. It has been observed that the suggested features show superior performance with respect to accuracy; among them, the THFB feature performs best.
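The wavelet-cepstrum idea, i.e. taking the cepstrum of a wavelet detail subband to get a compact feature, can be sketched as follows. This is an assumption-laden toy: a one-level Haar transform stands in for the CDF 9/7 analysis filter bank, and the function names are invented for illustration.

```python
import numpy as np

def haar_detail(x):
    """One-level Haar detail coefficients, used here as a simple stand-in
    for the CDF 9/7 analysis filter bank described in the thesis."""
    x = x[: len(x) // 2 * 2]             # drop a trailing odd sample
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def wavelet_cepstrum(x, n_keep=8):
    """Cepstrum of the detail subband: inverse DFT of the log magnitude
    spectrum; keeping the first few coefficients gives a compact feature."""
    d = haar_detail(x)
    spec = np.abs(np.fft.fft(d)) + 1e-12     # floor to avoid log(0)
    ceps = np.real(np.fft.ifft(np.log(spec)))
    return ceps[:n_keep]

# toy demo on a random 1-D signal (a real system would use iris texture rows)
rng = np.random.default_rng(2)
feat = wavelet_cepstrum(rng.random(64))
```

Truncating the cepstrum to `n_keep` coefficients is what delivers the dimensionality reduction the abstract credits to this feature.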
Interactive real-time three-dimensional visualisation of virtual textiles
Virtual textile databases provide a cost-efficient alternative to the use of existing hardcover
sample catalogues. By taking advantage of the high performance features offered by the
latest generation of programmable graphics accelerator boards, it is possible to combine
photometric stereo methods with 3D visualisation methods to implement a virtual textile
database. In this thesis, we investigate and combine rotation invariant texture retrieval with
interactive visualisation techniques.
We use a generic 3D surface data representation that allows us to combine
real-time interactive 3D visualisation methods with present-day texture retrieval
methods. We begin by investigating the most suitable data format for the 3D
surface representation, identify relief mapping combined with Bézier surfaces as
the most suitable 3D surface representation for our needs, and go on to describe
how these representations can be combined for real-time rendering. We then
investigate ten different
methods of implementing rotation invariant texture retrieval using feature vectors. These
results show that first order statistics in the form of histogram data are very effective for
discriminating colour albedo information, while rotation invariant gradient maps are
effective for distinguishing between different types of micro-geometry using either first or
second order statistics.

Engineering and Physical Sciences Research Council (EPSRC)
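The claim that first-order statistics (histograms) are rotation invariant is easy to demonstrate: a histogram ignores pixel arrangement, so rotating the image cannot change it. A minimal retrieval-style sketch, with a chi-square distance as one common (assumed, not thesis-specified) histogram comparison:

```python
import numpy as np

def histogram_feature(channel, bins=16):
    """First-order statistics: a normalised intensity histogram. It is
    rotation invariant because it ignores pixel arrangement entirely."""
    h, _ = np.histogram(channel, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def chi_square(p, q, eps=1e-12):
    """Chi-square distance, a common choice for comparing histogram
    feature vectors during retrieval."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

# toy demo: rotating the image leaves the histogram feature unchanged
rng = np.random.default_rng(3)
albedo = rng.random((8, 8))
f_orig = histogram_feature(albedo)
f_rot = histogram_feature(np.rot90(albedo))
```

The same recipe applies to the gradient maps mentioned above: compute a rotation-invariant map first, then take its first- or second-order statistics as the feature vector.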
Feature Learning from Spectrograms for Assessment of Personality Traits
Several methods have recently been proposed to analyze speech and
automatically infer the personality of the speaker. These methods often rely on
prosodic and other hand crafted speech processing features extracted with
off-the-shelf toolboxes. To achieve high accuracy, numerous features are
typically extracted using complex and highly parameterized algorithms. In this
paper, a new method based on feature learning and spectrogram analysis is
proposed to simplify the feature extraction process while maintaining a high
level of accuracy. The proposed method learns a dictionary of discriminant
features from patches extracted in the spectrogram representations of training
speech segments. Each speech segment is then encoded using the dictionary, and
the resulting feature set is used to perform classification of personality
traits. Experiments indicate that the proposed method achieves state-of-the-art
results with a significant reduction in complexity when compared to the most
recent reference methods. The number of features and the difficulties linked to
the feature extraction process are greatly reduced, as only one type of
descriptor is used, for which the 6 parameters can be tuned automatically. In
contrast, the simplest reference method uses 4 types of descriptors to which 6
functionals are applied, resulting in over 20 parameters to be tuned.

Comment: 12 pages, 3 figures
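The pipeline described above, learn a dictionary from spectrogram patches, then encode each segment against it, can be sketched with a minimal k-means dictionary. Everything here is illustrative: the patch size, dictionary size, and the histogram encoding are assumptions, not the paper's tuned settings.

```python
import numpy as np

def extract_patches(spec, patch=4, step=2):
    """Collect small (patch x patch) windows of a freq-by-time spectrogram
    as flat vectors."""
    ps = [spec[i:i + patch, j:j + patch].ravel()
          for i in range(0, spec.shape[0] - patch + 1, step)
          for j in range(0, spec.shape[1] - patch + 1, step)]
    return np.array(ps)

def kmeans(data, k, n_iter=10, seed=0):
    """Minimal k-means returning the learned dictionary (cluster centres)."""
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = data[labels == c].mean(axis=0)
    return centres

def encode(spec, centres, patch=4, step=2):
    """Encode a segment as a normalised histogram of nearest dictionary atoms."""
    p = extract_patches(spec, patch, step)
    dist = np.linalg.norm(p[:, None, :] - centres[None, :, :], axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(centres))
    return hist / hist.sum()

# toy demo: random matrix standing in for a log-spectrogram
rng = np.random.default_rng(4)
spec = rng.random((16, 16))
atoms = kmeans(extract_patches(spec), k=5)
code = encode(spec, atoms)
```

The per-segment `code` vector is what would be fed to the personality-trait classifier; using one descriptor type with a handful of parameters is exactly the simplification the abstract highlights.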
Retinal vessel segmentation using textons
Segmenting vessels from retinal images, like segmentation in many other medical imaging domains, is a challenging task, as there is no unified way to extract the vessels accurately. However, it is the most critical stage in the automatic assessment of various diseases (e.g. glaucoma, age-related macular degeneration, diabetic retinopathy, and cardiovascular diseases). Our research investigates retinal image segmentation approaches based on textons, as they provide a compact description of texture that can be learnt from a training set. This thesis presents a brief review of those diseases, including their current status, future trends, and the techniques used for their automatic diagnosis in routine clinical applications. The importance of retinal vessel segmentation is particularly emphasized in such applications. An extensive review of previous work on retinal vessel segmentation and salient texture analysis methods is presented. Five automatic retinal vessel segmentation methods are proposed in this thesis. The first method focuses on removing pathological anomalies (drusen and exudates) prior to retinal vessel segmentation, which other researchers have identified as a common source of error. The results show that the modified method offers some improvement over a previously published method. The second, novel supervised segmentation method employs textons. We propose a new filter bank (MR11) that includes bar detectors for vascular feature extraction and other kernels to detect edges and photometric variations in the image. The k-means clustering algorithm is adopted for texton generation based on the vessel and non-vessel elements identified by ground truth. The third, improved supervised method is developed from the second: textons are generated by k-means clustering, and texton maps representing vessels are derived by back-projecting pixel clusters onto hand-labelled ground truth. A further step ensures that the best combinations of textons are represented in the map and subsequently used to identify vessels in the test set. The experimental results on two benchmark datasets show that our proposed method performs well compared to other published work and the results of human experts. A further test of our system on an independent set of optical fundus images verified its consistent performance. The statistical analysis of the experimental results also reveals that it is possible to train unified textons for retinal vessel segmentation. In the fourth method, a novel scheme using a Gabor filter bank for vessel feature extraction is proposed. The method is inspired by the human visual system. Machine learning is used to optimize the
Gabor filter parameters. The experimental results demonstrate that our method significantly enhances the true positive rate while maintaining a level of specificity that is comparable with other approaches. Finally, we propose a new unsupervised texton-based retinal vessel
segmentation method using the derivative of SIFT and multi-scale Gabor filters. The lack of sufficient quantities of hand-labelled ground truth, and the high variability in ground-truth labels amongst experts, motivates this approach. The evaluation results reveal that our unsupervised segmentation method is comparable with the best supervised methods and other state-of-the-art methods.
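The Gabor-filter-bank idea used for vessel feature extraction can be sketched generically: build oriented kernels and keep the maximum response over orientations, so a vessel is detected whichever direction it runs. The parameter values and helper names below are illustrative assumptions; the thesis tunes the Gabor parameters with machine learning rather than fixing them.

```python
import numpy as np

def gabor_kernel(theta, sigma=2.0, lam=4.0, size=9):
    """Real Gabor kernel at orientation theta. Parameter values here are
    illustrative; the thesis optimizes them with machine learning."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)
    return g - g.mean()              # zero mean: no response on flat regions

def max_gabor_response(img, n_orient=8):
    """Maximum filter response over orientations, a standard way to make
    line detection orientation independent."""
    resp = None
    for k in range(n_orient):
        ker = gabor_kernel(np.pi * k / n_orient)
        s = ker.shape[0]
        out = np.zeros((img.shape[0] - s + 1, img.shape[1] - s + 1))
        for i in range(out.shape[0]):         # plain 'valid' correlation
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + s, j:j + s] * ker)
        resp = out if resp is None else np.maximum(resp, out)
    return resp

# toy demo: a single vertical "vessel" in an otherwise dark image
img = np.zeros((20, 20))
img[:, 10] = 1.0
resp = max_gabor_response(img)
```

Thresholding `resp` (or feeding it to a classifier, as the supervised methods above do) then yields the vessel map.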