1,158 research outputs found

    Manifold Based Deep Learning: Advances and Machine Learning Applications

    Manifolds are topological spaces that are locally Euclidean and find applications in dimensionality reduction, subspace learning, visual domain adaptation, clustering, and more. In this dissertation, we propose a framework for linear dimensionality reduction called proxy matrix optimization (PMO), which uses the Grassmann manifold for optimizing over orthogonal matrix manifolds. PMO is an iterative and flexible method that finds the lower-dimensional projections for various linear dimensionality reduction methods by changing the objective function. PMO is suitable for Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), Maximum Autocorrelation Factors (MAF), and Locality Preserving Projections (LPP). We extend PMO to incorporate robust Lp-norm versions of PCA and LDA, which use fractional p-norms, making them more robust to noisy data and outliers. The PMO method is designed to be realized as a layer in a neural network for maximum benefit. To that end, incremental versions of PCA, LDA, and LPP are included in the PMO framework for problems where the data is not all available at once. Next, we explore the topic of domain shift in visual domain adaptation by combining concepts from spherical manifolds and deep learning. We measure domain shift with a metric called Spherical Optimal Transport (SpOT), which quantifies how well a model trained on a source domain adapts to a similar target domain. We adopt the spherical manifold along with an orthogonal projection loss to obtain the features from the source and target domains. We then use optimal transport with the cosine distance between the features to measure the gap between the domains. We show, in our experiments with domain adaptation datasets, that SpOT outperforms existing measures for quantifying domain shift and demonstrates a better correlation with the gain of transfer across domains.
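As a rough illustration of the idea in the abstract above (finding a projection by iteratively optimizing an objective over matrices with orthonormal columns), the following sketch recovers a PCA subspace by gradient ascent with a QR re-orthonormalisation after each step. It is not the PMO algorithm itself: the function name, step size, and iteration count are assumptions made for this example, and a plain Euclidean gradient with QR retraction stands in for whatever update the dissertation uses.

```python
import numpy as np

def pca_by_orthonormal_ascent(C, k, step=0.1, iters=300, seed=0):
    """Maximise trace(W^T C W) over d x k matrices W with orthonormal columns.

    C is a symmetric covariance matrix. Each iteration takes a Euclidean
    gradient step and then maps the result back to orthonormal columns
    with a QR decomposition (an assumed, generic retraction).
    """
    rng = np.random.default_rng(seed)
    d = C.shape[0]
    # Random orthonormal starting point (hypothetical initialisation).
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(iters):
        grad = 2.0 * C @ W                      # gradient of trace(W^T C W)
        W, _ = np.linalg.qr(W + step * grad)    # step, then re-orthonormalise
    return W
```

Swapping the objective (and hence `grad`) is what would turn this skeleton into a different linear dimensionality reduction method; the orthonormality step stays the same.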

    VOICE BIOMETRICS UNDER MISMATCHED NOISE CONDITIONS

    This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker. This is found to adversely affect the accuracy of speaker recognition. To address this problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under these adverse operating conditions. Through experimental investigations, based on the two main classes of speaker recognition (i.e. verification and open-set identification), it is shown that the proposed approach can significantly improve recognition accuracy under mismatched noise conditions. To further improve the recognition accuracy in severe mismatch conditions, an enhancement of this method is proposed. The enhancement, which provides a closer adjustment of the reference speaker models to the noise condition in the test utterance, is shown to considerably increase the accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated.
    The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
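For readers unfamiliar with test normalisation, the sketch below shows the conventional T-Norm calculation: the score of a test utterance against the claimed speaker model is shifted and scaled by the mean and standard deviation of the same utterance's scores against a set of cohort (impostor) models. This is the standard textbook form, not the new variant the thesis proposes; the function name, the `eps` guard, and the use of the population standard deviation are assumptions for this example.

```python
import numpy as np

def t_norm(raw_score, cohort_scores, eps=1e-12):
    """Conventional T-Norm of a single matching score.

    raw_score:     score of the test utterance against the claimed model.
    cohort_scores: scores of the same test utterance against cohort models.
    eps:           small constant guarding against a zero standard deviation.
    """
    cohort = np.asarray(cohort_scores, dtype=float)
    # Population standard deviation (ddof=0) is an assumed choice here.
    return (raw_score - cohort.mean()) / (cohort.std() + eps)
```

Because the cohort statistics are computed from the test utterance itself, any noise that shifts all scores for that utterance is partly cancelled out before a decision threshold is applied.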

    Open-set Speaker Identification

    This study is motivated by the growing need for effective extraction of intelligence and evidence from audio recordings in the fight against crime, a need made ever more apparent by the recent expansion of criminal and terrorist organisations. The main focus is to enhance the open-set speaker identification process within speaker identification systems, which are affected by noisy audio data obtained in uncontrolled environments such as in the street, in restaurants or in other places of business. Two investigations are therefore carried out first. One examines the effects of environmental noise on the accuracy of open-set speaker recognition, thoroughly covering relevant conditions in the considered application areas, such as variable training data length, background noise and real-world noise. The other examines the effects of short and varied-duration reference data in open-set speaker recognition. The investigations led to a novel method termed “vowel boosting” to enhance the reliability of speaker identification when operating with varied-duration speech data under uncontrolled conditions. Vowels naturally contain more speaker-specific information. Emphasising this property of speech data therefore enables better identification performance. Traditional state-of-the-art GMM-UBM and i-vector systems are used to evaluate “vowel boosting”. The proposed approach boosts the impact of the vowels on the speaker scores, which improves the recognition accuracy for the specific case of open-set identification with short and varied-duration speech material.

    Sparsity Preserving Discriminant Projections with Applications to Face Recognition

    Dimensionality reduction is extremely important for understanding the intrinsic structure hidden in high-dimensional data. In recent years, sparse representation models have been widely used in dimensionality reduction. In this paper, a novel supervised learning method, called Sparsity Preserving Discriminant Projections (SPDP), is proposed. SPDP, which attempts to preserve the sparse representation structure of the data and maximize the between-class separability simultaneously, can be regarded as a combination of manifold learning and sparse representation. Specifically, SPDP first creates a concatenated dictionary by classwise PCA decompositions and learns the sparse representation structure of each sample under the constructed dictionary using the least-squares method. Second, a local between-class separability function is defined to characterize the scatter of the samples in the different submanifolds. Then, SPDP integrates the learned sparse representation information with the local between-class relationship to construct a discriminant function. Finally, the proposed method is transformed into a generalized eigenvalue problem. Extensive experimental results on several popular face databases demonstrate the feasibility and effectiveness of the proposed approach.
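The final step mentioned in the abstract above, a generalized eigenvalue problem of the form A v = λ B v, can be sketched as follows. The matrices `A` and `B` here are placeholders: the abstract does not say how SPDP builds them, so this only shows how projection directions might be extracted once a symmetric objective matrix and a symmetric positive-definite constraint matrix are available. The function name and the Cholesky-based reduction are choices made for this example.

```python
import numpy as np

def top_generalized_eigvecs(A, B, d):
    """Return the d largest eigenvalues and eigenvectors of A v = lambda B v.

    A: symmetric matrix (placeholder for the objective to maximise).
    B: symmetric positive-definite matrix (placeholder for the constraint).
    The problem is reduced to a standard symmetric one via B = L L^T.
    """
    L = np.linalg.cholesky(B)
    X = np.linalg.solve(L, A)                 # X = L^{-1} A
    C = np.linalg.solve(L, X.T).T             # C = L^{-1} A L^{-T}
    C = 0.5 * (C + C.T)                       # remove rounding asymmetry
    evals, Y = np.linalg.eigh(C)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:d]       # indices of the d largest
    V = np.linalg.solve(L.T, Y[:, order])     # map back: v = L^{-T} y
    return evals[order], V
```

The columns of `V` would serve as the projection matrix; they satisfy `V.T @ B @ V = I`, the usual normalisation for this kind of discriminant projection.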

    The Phase-based Gabor Fisher Classifier and its application to face recognition under varying illumination conditions
