    Nonnegative OPLS for supervised design of filter banks: application to image and audio feature extraction

    Audio or visual data analysis tasks usually have to deal with high-dimensional and nonnegative signals. However, most data analysis methods suffer from overfitting and numerical problems when data have more than a few dimensions, so a dimensionality-reduction preprocessing step is needed. Moreover, interpretability of how and why filters work for audio or visual applications is a desired property, especially when energy or spectral signals are involved. In these cases, due to the nature of these signals, nonnegativity of the filter weights is desirable because it makes the filters easier to understand. Because of these two needs, we propose different methods to reduce the dimensionality of data while ensuring the nonnegativity and interpretability of the solution. In particular, we propose a generalized methodology to design filter banks in a supervised way for applications dealing with nonnegative data, and we explore different ways of solving the proposed objective function, which consists of a nonnegative version of the orthonormalized partial least-squares method. We analyze the discriminative power of the features obtained with the proposed methods for two different and widely studied applications: texture and music genre classification. Furthermore, we compare the filter banks achieved by our methods with other state-of-the-art methods specifically designed for feature extraction. This work was supported in part by the MINECO projects TEC2013-48439-C4-1-R, TEC2014-52289-R, TEC2016-75161-C2-1-R, TEC2016-75161-C2-2-R, TEC2016-81900-REDT/AEI, and PRICAM (S2013/ICE-2933).

    Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification

    Designing discriminative, powerful texture features robust to realistic imaging conditions is a challenging computer vision problem with many applications, including material recognition and analysis of satellite or aerial imagery. In the past, most texture description approaches were based on dense orderless statistical distribution of local features. However, most recent approaches to texture recognition and remote sensing scene classification are based on Convolutional Neural Networks (CNNs). The de facto practice when learning these CNN models is to use RGB patches as input with training performed on large amounts of labeled data (ImageNet). In this paper, we show that Binary Patterns encoded CNN models, codenamed TEX-Nets, trained using mapped coded images with explicit texture information provide complementary information to the standard RGB deep models. Additionally, two deep architectures, namely early and late fusion, are investigated to combine the texture and color information. To the best of our knowledge, we are the first to investigate Binary Patterns encoded CNNs and different deep network fusion architectures for texture recognition and remote sensing scene classification. We perform comprehensive experiments on four texture recognition datasets and four remote sensing scene classification benchmarks: UC-Merced with 21 scene categories, WHU-RS19 with 19 scene classes, RSSCN7 with 7 categories and the recently introduced large scale aerial image dataset (AID) with 30 aerial scene types. We demonstrate that TEX-Nets provide complementary information to the standard RGB deep model of the same network architecture. Our late fusion TEX-Net architecture always improves the overall performance compared to the standard RGB network on both recognition problems. Our final combination outperforms the state-of-the-art without employing fine-tuning or an ensemble of RGB network architectures. Comment: To appear in ISPRS Journal of Photogrammetry and Remote Sensing
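
To make the idea of a binary-pattern "coded image" concrete, here is a minimal sketch of the classic 8-neighbour local binary pattern (LBP). It is an illustration only: the abstract does not say which binary-pattern variant or mapping TEX-Nets use, and the function name `lbp_image` and the neighbour ordering are assumptions made for this example.

```python
def lbp_image(img):
    """Return classic 8-neighbour LBP codes for the interior pixels of img.

    img is a list of rows of grey levels. This is the textbook LBP operator,
    assumed here for illustration; it is not taken from the TEX-Net paper.
    """
    # Neighbour offsets, clockwise from the top-left pixel (an arbitrary choice).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(img), len(img[0])
    codes = []
    for r in range(1, height - 1):
        row_codes = []
        for c in range(1, width - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


if __name__ == "__main__":
    patch = [[9, 0, 0],
             [0, 5, 0],
             [0, 0, 0]]
    print(lbp_image(patch))
```

The resulting code image has one value in 0..255 per interior pixel, which is the kind of texture-coded input that could be fed to a network in place of raw RGB values.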

    Development of Multirate Filter – Based Region Features for Iris Identification

    The emergence of biometric systems is seen as the next-generation technological solution for strengthening social and national security. The evolution of biometrics has shifted the paradigm of authentication from classical token- and knowledge-based systems to systems based on physiological and behavioral traits. R&D on iris biometrics over the last decade has established the iris as one of the most promising traits. Even though iris biometrics takes high-resolution near-infrared (NIR) images as input, its authentication accuracy is very commendable. Its performance is often influenced by the presence of noise, database size, and feature representation. This thesis focuses on the use of multiresolution analysis (MRA) in developing suitable features for non-ideal iris images. Our investigation starts with an iris feature extraction technique using the Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) filter bank. In this work, a technique is proposed to deal with issues such as segmentation failure and occlusion. The experimental studies show the superiority of the CDF 9/7 filter bank over frequency-based techniques. Since there is scope for improving the frequency selectivity of the CDF 9/7 filter bank, a tunable filter bank is proposed to extract region-based features from non-cooperative iris images. The proposed method is based on a half-band polynomial of 14th order. Since regularity and frequency selectivity are inversely related, the filter coefficients are derived without imposing the maximum number of zeros. Also, the half-band polynomial is expressed in the x-domain so that semidefinite programming can be applied, which results in optimization of the coefficients of the analysis/synthesis filters. The next contribution of this thesis deals with the development of another powerful MRA tool known as the triplet half-band filter bank (THFB). The advantage of the THFB is the flexibility in choosing the frequency response, which allows one to overcome the magnitude constraints. The proposed filter bank has improved frequency selectivity along with other desired properties, and is then used for iris feature extraction. The last contribution of the thesis describes a wavelet cepstral feature derived from the CDF 9/7 filter bank to characterize iris texture. The wavelet cepstrum feature helps in reducing the dimensionality of the detail coefficients; hence, a compact feature representation is possible with improved accuracy over CDF 9/7. The efficacy of the suggested features is validated for iris recognition on three publicly available databases, namely CASIAv3, UBIRISv1, and IITD. The features are compared with other transform-domain features such as FFT and Gabor filter features, and a comprehensive evaluation is done for all suggested features as well. It has been observed that the suggested features show superior performance with respect to accuracy. Among all suggested features, THFB has shown the best performance.

    Interactive real-time three-dimensional visualisation of virtual textiles

    Virtual textile databases provide a cost-efficient alternative to the use of existing hardcover sample catalogues. By taking advantage of the high performance features offered by the latest generation of programmable graphics accelerator boards, it is possible to combine photometric stereo methods with 3D visualisation methods to implement a virtual textile database. In this thesis, we investigate and combine rotation invariant texture retrieval with interactive visualisation techniques. We use a generic 3D surface representation that allows us to combine real-time interactive 3D visualisation methods with present-day texture retrieval methods. We begin by investigating the most suitable data format for the 3D surface representation and identify relief mapping combined with Bézier surfaces as the most suitable 3D surface representations for our needs, and go on to describe how these representations can be combined for real-time rendering. We then investigate ten different methods of implementing rotation invariant texture retrieval using feature vectors. The results show that first order statistics in the form of histogram data are very effective for discriminating colour albedo information, while rotation invariant gradient maps are effective for distinguishing between different types of micro-geometry using either first or second order statistics. Engineering and Physical Sciences Research Council (EPSRC)
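
As a small illustration of why first order statistics suit rotation invariant retrieval, the sketch below builds a normalised grey-level histogram and checks that it is unchanged when the image is rotated by 90 degrees. It is an assumption-laden toy, not one of the ten methods studied in the thesis; the helper names (`histogram`, `rotate90`, `l1_distance`) and the bin count are hypothetical.

```python
def histogram(img, bins=4, max_value=256):
    """Normalised histogram of integer pixel values in [0, max_value)."""
    counts = [0] * bins
    total = 0
    for row in img:
        for value in row:
            counts[min(value * bins // max_value, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def rotate90(img):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    query = [[0, 64], [128, 255]]
    rotated = rotate90(query)
    # A histogram ignores pixel positions, so the distance is zero.
    print(l1_distance(histogram(query), histogram(rotated)))
```

Because the histogram discards pixel positions, any rotation that only permutes pixels leaves the feature vector unchanged, so a nearest-neighbour search on such vectors retrieves a texture regardless of its orientation.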

    Feature Learning from Spectrograms for Assessment of Personality Traits

    Several methods have recently been proposed to analyze speech and automatically infer the personality of the speaker. These methods often rely on prosodic and other hand-crafted speech processing features extracted with off-the-shelf toolboxes. To achieve high accuracy, numerous features are typically extracted using complex and highly parameterized algorithms. In this paper, a new method based on feature learning and spectrogram analysis is proposed to simplify the feature extraction process while maintaining a high level of accuracy. The proposed method learns a dictionary of discriminant features from patches extracted from the spectrogram representations of training speech segments. Each speech segment is then encoded using the dictionary, and the resulting feature set is used to perform classification of personality traits. Experiments indicate that the proposed method achieves state-of-the-art results with a significant reduction in complexity when compared to the most recent reference methods. The number of features and the difficulties linked to the feature extraction process are greatly reduced, as only one type of descriptor is used, for which the 6 parameters can be tuned automatically. In contrast, the simplest reference method uses 4 types of descriptors to which 6 functionals are applied, resulting in over 20 parameters to be tuned. Comment: 12 pages, 3 figures
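
The patch-and-dictionary pipeline described in this abstract can be sketched roughly as follows. This is a toy under stated assumptions: the dictionary is given rather than learned, each patch is assigned to its nearest atom (hard assignment), and the names `extract_patches` and `encode` are hypothetical. The paper's actual dictionary learning and encoding scheme are not specified in the abstract.

```python
def extract_patches(spec, size, step):
    """Flatten size x size patches taken every `step` cells from a 2D spectrogram."""
    patches = []
    for r in range(0, len(spec) - size + 1, step):
        for c in range(0, len(spec[0]) - size + 1, step):
            patches.append([spec[r + i][c + j]
                            for i in range(size) for j in range(size)])
    return patches


def encode(patches, dictionary):
    """Histogram of nearest-atom assignments, normalised to sum to 1."""
    hist = [0] * len(dictionary)
    for patch in patches:
        dists = [sum((a - b) ** 2 for a, b in zip(patch, atom))
                 for atom in dictionary]
        hist[dists.index(min(dists))] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    spectrogram = [[0, 0, 1, 1],
                   [0, 0, 1, 1]]
    atoms = [[0, 0, 0, 0], [1, 1, 1, 1]]  # assumed, not learned
    print(encode(extract_patches(spectrogram, size=2, step=2), atoms))
```

The encoded histogram has one entry per dictionary atom, so a speech segment of any length maps to a fixed-size feature vector that a standard classifier can consume.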

    Retinal vessel segmentation using textons

    Segmenting vessels from retinal images, like segmentation in many other medical image domains, is a challenging task, as there is no unified approach that can be adopted to extract the vessels accurately. However, it is the most critical stage in the automatic assessment of various diseases (e.g. glaucoma, age-related macular degeneration, diabetic retinopathy and cardiovascular diseases). Our research aims to investigate retinal image segmentation approaches based on textons, as they provide a compact description of texture that can be learnt from a training set. This thesis presents a brief review of those diseases, including their current situation, future trends and the techniques used for their automatic diagnosis in routine clinical applications. The importance of retinal vessel segmentation is particularly emphasized in such applications. An extensive review of previous work on retinal vessel segmentation and salient texture analysis methods is presented. Five automatic retinal vessel segmentation methods are proposed in this thesis. The first method focuses on removing pathological anomalies (drusen, exudates) for retinal vessel segmentation, which have been identified by other researchers as a problem and a common source of error. The results show that the modified method achieves some improvement compared to a previously published method. The second, novel supervised segmentation method employs textons. We propose a new filter bank (MR11) that includes bar detectors for vascular feature extraction and other kernels to detect edges and photometric variations in the image. The k-means clustering algorithm is adopted for texton generation based on the vessel and non-vessel elements, which are identified by ground truth. The third, improved supervised method is developed from the second one: textons are generated by k-means clustering, and texton maps representing vessels are derived by back-projecting pixel clusters onto hand-labelled ground truth. A further step is implemented to ensure that the best combinations of textons are represented in the map and subsequently used to identify vessels in the test set. The experimental results on two benchmark datasets show that our proposed method performs well compared to other published work and the results of human experts. A further test of our system on an independent set of optical fundus images verified its consistent performance. The statistical analysis of the experimental results also reveals that it is possible to train unified textons for retinal vessel segmentation. In the fourth method, a novel scheme using a Gabor filter bank for vessel feature extraction is proposed. The method is inspired by the human visual system. Machine learning is used to optimize the Gabor filter parameters. The experimental results demonstrate that our method significantly enhances the true positive rate while maintaining a level of specificity that is comparable with other approaches. Finally, we propose a new unsupervised texton-based retinal vessel segmentation method using a derivative of SIFT and multi-scale Gabor filters. The lack of sufficient quantities of hand-labelled ground truth and the high level of variability in ground truth labels amongst experts provide the motivation for this approach. The evaluation results reveal that our unsupervised segmentation method is comparable with the best supervised methods and other state-of-the-art methods.
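
The texton generation step mentioned above (k-means over per-pixel filter responses) can be illustrated with a minimal pure-Python sketch. It assumes each pixel has already been turned into a response vector by some filter bank; the MR11 filters themselves are not reproduced here, and the deterministic initialisation, the iteration count and the names `nearest` and `kmeans` are choices made for this example, not details from the thesis.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, center))
             for center in centers]
    return dists.index(min(dists))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centers (assumed)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


if __name__ == "__main__":
    # Hypothetical 2-D filter responses for four pixels.
    responses = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
    textons = kmeans(responses, k=2)
    texton_map = [nearest(r, textons) for r in responses]
    print(textons, texton_map)
```

Each cluster center plays the role of a texton, and labelling every pixel with its nearest texton yields a texton map that can then be compared against hand-labelled vessel masks.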