21 research outputs found

    Analysis of Brain Magnetic Resonance Images: Voxel-Based Morphometry and Pattern Classification Approaches

    This thesis aims to examine two types of techniques for analysing brain magnetic resonance imaging (MRI) data: the voxel-based morphometry (VBM) and the support vector machine (SVM) approaches. While VBM is a standard and well-established mass-univariate method, multivariate SVM analysis has rarely been applied to brain MRI data. Improving our knowledge of the pattern classification approach is necessary both to assess its exploratory capability and to point out its advantages and disadvantages with respect to the more widely used VBM approach. Although these methods are potentially suitable for investigating a large variety of neurological and neuropsychiatric disorders, in the present study they were employed to detect neuroanatomical and gender-related abnormalities in children with autism spectrum disorders (ASD). In fact, differences in the neuroanatomy of young children with ASD are an intriguing and still poorly investigated issue. After a description of the physical principles of nuclear magnetic resonance and an overview of magnetic resonance imaging, we specified the two algorithms that are the object of the current study: the voxel-based morphometry and support vector machine classification methods. We described the theoretical principles they are based on, pointing out the schemes and procedures employed to implement these analysis approaches. We then applied the VBM and SVM methods to a suitably chosen sample of MRI data. A total of 152 structural MRI scans were selected; specifically, the dataset comprised 76 ASD children and 76 matched controls in the 2-7 year age range. The images were preprocessed with the SPM8 algorithm, based on the diffeomorphic anatomical registration through exponentiated Lie algebra (DARTEL) procedure. The resulting grey matter (GM) segments were analysed with the conventional voxel-wise two-sample t-test VBM analysis, employing the stringent family-wise error (FWE) rate correction according to random Gaussian field theory. The same preprocessed GM segments were then analysed with the SVM pattern classification approach, which has the advantage of intrinsically taking interregional correlations into account. Moreover, this technique allows the predictive value of structural MRI scans to be investigated: the SVM classification capability can be quantified in terms of the area under the receiver operating characteristic curve (AUC). A leave-pair-out cross-validation protocol was adopted to evaluate the classification performance. The recursive feature elimination (RFE) procedure was implemented both to reduce the large number of features in the classification problem and to enhance the classification capability. SVM-RFE also allows the most discriminant voxels to be localized and visualized in a discrimination map. However, the pattern classification method was not employed to predict the class membership of undiagnosed subjects, but as a figure of merit for determining an optimal threshold on the discrimination maps, where possible between-group structural differences are encoded. To strengthen the SVM-based methods applied to brain data and to guarantee the reliability and reproducibility of the results, we set up the following tests:

    1. We evaluated the consistency among all discrimination maps, each obtained from one of the SVM leave-pair-out cross-validation steps, within the chosen range of numbers of retained features.
    2. We assessed the dependency on the population of the training set within the cross-validation procedure. This allowed us to check the stability of the statistical results with respect to the number of subjects employed during the learning phase, and to evaluate the classification performance for different cross-validation schemes.

    Among our results, we found that SVMs applied to GM scans correctly discriminate both male and female ASD individuals from controls with an AUC above 87%, with a fraction of retained voxels in the 0.4-29% range. Choosing as the operating point of the system the one corresponding to the smallest number of significant voxels (0.4% of the total), we obtained a sensitivity of 82% and a specificity of 80%. The resulting discrimination maps showed several significant regions where an excess of GM characterizes ASD subjects with respect to the matched control group. These regions appeared consistent with those obtained from the VBM analysis; nevertheless, the SVM analysis highlighted a larger number of interesting gender-specific discriminating regions. Hence, multivariate methods based on the SVM could contribute not only to distinguishing ASD from control children, but also to disentangling the gender specificity of ASD brain alterations, consistently with the mass-univariate approach. Achieving a better AUC could make it possible to employ the pattern recognition approach not only to identify brain regions discriminating between patients and controls, but also to predict the class membership of undiagnosed subjects, thus facilitating the early diagnosis of ASD.
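    As a concrete illustration of the classification pipeline described above (a linear SVM, recursive feature elimination, and leave-pair-out cross-validation), the following sketch shows how such an analysis could look in Python with scikit-learn. It is not the thesis' actual code: the feature matrix X merely stands in for flattened grey-matter segments, the sample sizes and the number of retained features are arbitrary, and the leave-pair-out AUC is estimated as the fraction of ASD/control pairs ranked correctly.

```python
# Hypothetical sketch of SVM-RFE inside leave-pair-out cross-validation.
# X stands in for flattened grey-matter segments (one row per subject),
# y holds the labels (1 = ASD, 0 = matched control).
import numpy as np
from itertools import product
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))            # toy data: 20 subjects, 500 "voxels"
y = np.array([1] * 10 + [0] * 10)

asd_idx = np.flatnonzero(y == 1)
ctl_idx = np.flatnonzero(y == 0)

correct = 0
for i, j in product(asd_idx, ctl_idx):    # hold out one ASD/control pair per fold
    test = np.array([i, j])
    train = np.setdiff1d(np.arange(len(y)), test)

    # Recursive feature elimination retains a small subset of voxels,
    # then a linear SVM is refit on the retained features only.
    selector = RFE(LinearSVC(C=1.0, dual=False),
                   n_features_to_select=25, step=0.5).fit(X[train], y[train])
    clf = LinearSVC(C=1.0, dual=False).fit(selector.transform(X[train]), y[train])

    s = clf.decision_function(selector.transform(X[test]))
    correct += float(s[0] > s[1])         # is the ASD subject ranked above the control?

auc = correct / (len(asd_idx) * len(ctl_idx))
print(f"leave-pair-out AUC: {auc:.2f}")
```

    Counting the fraction of held-out ASD/control pairs in which the ASD subject receives the higher decision value is the standard way of turning leave-pair-out outputs into an AUC estimate.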

    K-means algorithms for functional data

    Cluster analysis of functional data considers that the objects on which a taxonomy is to be performed are functions f : X ⊆ ℝ^p → ℝ, and that the available information about each object is a sample on a finite set of points {(x_i, y_i)}_{i=1}^n ⊂ X × ℝ. The aim is to infer the meaningful groups by working explicitly with the infinite-dimensional nature of the data. In this paper the use of K-means algorithms to solve this problem is analysed. A comparative study of three K-means algorithms has been conducted: the K-means algorithm for raw data, a kernel K-means algorithm for raw data, and a K-means algorithm using two distances for functional data. These distances, called d_Vn and d_φ, are based on projections onto Reproducing Kernel Hilbert Spaces (RKHS) and Tikhonov regularization theory. Although it is shown that both distances are equivalent, they lead to two different strategies for reducing the dimensionality of the data. In the case of the d_Vn distance the most suitable strategy is Johnson-Lindenstrauss random projections, while the dimensionality reduction for d_φ is based on spectral methods.
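    To make the random-projection strategy concrete, here is a minimal sketch (an illustration under stated assumptions, not the paper's d_Vn construction): curves observed on a common grid are treated as vectors, reduced with a Johnson-Lindenstrauss Gaussian random projection, and then clustered with ordinary K-means.

```python
# Sketch: K-means on functional data after a Johnson-Lindenstrauss random
# projection, in the spirit of the d_Vn strategy described above.  The synthetic
# curves and projection dimension are illustrative assumptions.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 200)                      # common evaluation grid for all curves

# Two synthetic groups of noisy curves sampled on the grid
curves = np.vstack([
    np.sin(2 * np.pi * grid) + 0.1 * rng.normal(size=(30, grid.size)),
    np.cos(2 * np.pi * grid) + 0.1 * rng.normal(size=(30, grid.size)),
])

# Random projection to a low-dimensional space, then ordinary K-means
proj = GaussianRandomProjection(n_components=20, random_state=0)
reduced = proj.fit_transform(curves)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
print(np.bincount(labels))                         # cluster sizes
```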

    Enhanced protein fold recognition through a novel data integration approach

    Background: Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold-discriminatory data sources which use physicochemical and structural properties, as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold-discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources.

    Results: In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations, which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to solve MKLdiv-dc efficiently by a difference of convex (DC) programming method and MKLdiv-conv by a projected gradient descent algorithm. The effectiveness of the proposed approaches is evaluated on a benchmark dataset for protein fold recognition and a yeast protein function prediction problem.

    Conclusion: Our proposed methods MKLdiv-dc and MKLdiv-conv are able to achieve state-of-the-art performance on the SCOP PDB-40D benchmark dataset for protein fold prediction and provide useful insights into the relative significance of informative data sources. In particular, MKLdiv-dc further improves the fold discrimination accuracy to 75.19%, a more than 5% improvement over competitive Bayesian probabilistic and SVM margin-based kernel learning methods. Furthermore, we report a competitive performance on the yeast protein function prediction problem.
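    The MKLdiv-conv formulation is described above only at a high level; the toy sketch below shows what a projected-gradient update for convex kernel-combination weights can look like. The Gaussian-KL form of the objective (trace plus log-determinant terms), the step size and the random data are assumptions made for illustration and are not taken from the paper.

```python
# Toy sketch of an MKLdiv-conv-style update: projected gradient descent on a
# KL-type objective between an "output" kernel matrix K_out and a convex
# combination of input kernel matrices.  The Gaussian form of the KL term used
# here is an assumption for illustration, not necessarily the paper's exact one.
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def kl_mkl(kernels, K_out, steps=200, lr=1e-3, eps=1e-6):
    """Learn convex combination weights for the input kernel matrices."""
    m, n = len(kernels), K_out.shape[0]
    lam = np.full(m, 1.0 / m)
    for _ in range(steps):
        M = sum(w * K for w, K in zip(lam, kernels)) + eps * np.eye(n)
        Minv = np.linalg.inv(M)
        # Gradient of tr(M^{-1} K_out) + log det M with respect to each weight
        grad = np.array([np.trace(Minv @ K) - np.trace(Minv @ K @ Minv @ K_out)
                         for K in kernels])
        lam = simplex_projection(lam - lr * grad)
    return lam

# Tiny demo with random PSD input kernels and a label-derived output kernel
rng = np.random.default_rng(0)
n = 30
y = np.sign(rng.normal(size=n))
K_out = np.outer(y, y)                        # ideal output kernel from labels
kernels = []
for _ in range(3):
    A = rng.normal(size=(n, n))
    kernels.append(A @ A.T / n)               # random positive semi-definite kernel
print(kl_mkl(kernels, K_out))
```

    Projecting onto the probability simplex after each gradient step keeps the weights non-negative and summing to one, so the learned matrix remains a convex combination of the input kernels.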

    Making Faces - State-Space Models Applied to Multi-Modal Signal Processing


    Discriminant feature pursuit: from statistical learning to informative learning.

    Lin Dahua. Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves 233-250). Abstracts in English and Chinese.

    Contents:
    Abstract
    Acknowledgement
    Chapter 1: Introduction
      1.1 The Problem We are Facing
      1.2 Generative vs. Discriminative Models
      1.3 Statistical Feature Extraction: Success and Challenge
      1.4 Overview of Our Works
        1.4.1 New Linear Discriminant Methods: Generalized LDA Formulation and Performance-Driven Subspace Learning
        1.4.2 Coupled Learning Models: Coupled Space Learning and Inter-Modality Recognition
        1.4.3 Informative Learning Approaches: Conditional Infomax Learning and Information Channel Model
      1.5 Organization of the Thesis
    Part I: History and Background
      Chapter 2: Statistical Pattern Recognition
        2.1 Patterns and Classifiers
        2.2 Bayes Theory
        2.3 Statistical Modeling
          2.3.1 Maximum Likelihood Estimation
          2.3.2 Gaussian Model
          2.3.3 Expectation-Maximization
          2.3.4 Finite Mixture Model
          2.3.5 A Nonparametric Technique: Parzen Windows
      Chapter 3: Statistical Learning Theory
        3.1 Formulation of Learning Model
          3.1.1 Learning: Functional Estimation Model
          3.1.2 Representative Learning Problems
          3.1.3 Empirical Risk Minimization
        3.2 Consistency and Convergence of Learning
          3.2.1 Concept of Consistency
          3.2.2 The Key Theorem of Learning Theory
          3.2.3 VC Entropy
          3.2.4 Bounds on Convergence
          3.2.5 VC Dimension
      Chapter 4: History of Statistical Feature Extraction
        4.1 Linear Feature Extraction
          4.1.1 Principal Component Analysis (PCA)
          4.1.2 Linear Discriminant Analysis (LDA)
          4.1.3 Other Linear Feature Extraction Methods
          4.1.4 Comparison of Different Methods
        4.2 Enhanced Models
          4.2.1 Stochastic Discrimination and Random Subspace
          4.2.2 Hierarchical Feature Extraction
          4.2.3 Multilinear Analysis and Tensor-based Representation
        4.3 Nonlinear Feature Extraction
          4.3.1 Kernelization
          4.3.2 Dimension Reduction by Manifold Embedding
      Chapter 5: Related Works in Feature Extraction
        5.1 Dimension Reduction
          5.1.1 Feature Selection
          5.1.2 Feature Extraction
        5.2 Kernel Learning
          5.2.1 Basic Concepts of Kernel
          5.2.2 The Reproducing Kernel Map
          5.2.3 The Mercer Kernel Map
          5.2.4 The Empirical Kernel Map
          5.2.5 Kernel Trick and Kernelized Feature Extraction
        5.3 Subspace Analysis
          5.3.1 Basis and Subspace
          5.3.2 Orthogonal Projection
          5.3.3 Orthonormal Basis
          5.3.4 Subspace Decomposition
        5.4 Principal Component Analysis
          5.4.1 PCA Formulation
          5.4.2 Solution to PCA
          5.4.3 Energy Structure of PCA
          5.4.4 Probabilistic Principal Component Analysis
          5.4.5 Kernel Principal Component Analysis
        5.5 Independent Component Analysis
          5.5.1 ICA Formulation
          5.5.2 Measurement of Statistical Independence
        5.6 Linear Discriminant Analysis
          5.6.1 Fisher's Linear Discriminant Analysis
          5.6.2 Improved Algorithms for Small Sample Size Problem
          5.6.3 Kernel Discriminant Analysis
    Part II: Improvement in Linear Discriminant Analysis
      Chapter 6: Generalized LDA
        6.1 Regularized LDA
          6.1.1 Generalized LDA Implementation Procedure
          6.1.2 Optimal Nonsingular Approximation
          6.1.3 Regularized LDA Algorithm
        6.2 A Statistical View: When is LDA Optimal?
          6.2.1 Two-class Gaussian Case
          6.2.2 Multi-class Cases
        6.3 Generalized LDA Formulation
          6.3.1 Mathematical Preparation
          6.3.2 Generalized Formulation
      Chapter 7: Dynamic Feedback Generalized LDA
        7.1 Basic Principle
        7.2 Dynamic Feedback Framework
          7.2.1 Initialization: K-Nearest Construction
          7.2.2 Dynamic Procedure
        7.3 Experiments
          7.3.1 Performance in Training Stage
          7.3.2 Performance on Testing Set
      Chapter 8: Performance-Driven Subspace Learning
        8.1 Motivation and Principle
        8.2 Performance-Based Criteria
          8.2.1 The Verification Problem and Generalized Average Margin
          8.2.2 Performance-Driven Criteria Based on Generalized Average Margin
        8.3 Optimal Subspace Pursuit
          8.3.1 Optimal Threshold
          8.3.2 Optimal Projection Matrix
          8.3.3 Overall Procedure
          8.3.4 Discussion of the Algorithm
        8.4 Optimal Classifier Fusion
        8.5 Experiments
          8.5.1 Performance Measurement
          8.5.2 Experiment Setting
          8.5.3 Experiment Results
          8.5.4 Discussion
    Part III: Coupled Learning of Feature Transforms
      Chapter 9: Coupled Space Learning
        9.1 Introduction
          9.1.1 What is Image Style Transform
          9.1.2 Overview of Our Framework
        9.2 Coupled Space Learning
          9.2.1 Framework of Coupled Modelling
          9.2.2 Correlative Component Analysis
          9.2.3 Coupled Bidirectional Transform
          9.2.4 Procedure of Coupled Space Learning
        9.3 Generalization to Mixture Model
          9.3.1 Coupled Gaussian Mixture Model
          9.3.2 Optimization by EM Algorithm
        9.4 Integrated Framework for Image Style Transform
        9.5 Experiments
          9.5.1 Face Super-resolution
          9.5.2 Portrait Style Transforms
      Chapter 10: Inter-Modality Recognition
        10.1 Introduction to the Inter-Modality Recognition Problem
          10.1.1 What is Inter-Modality Recognition
          10.1.2 Overview of Our Feature Extraction Framework
        10.2 Common Discriminant Feature Extraction
          10.2.1 Formulation of the Learning Problem
          10.2.2 Matrix Form of the Objective
          10.2.3 Solving the Linear Transforms
        10.3 Kernelized Common Discriminant Feature Extraction
        10.4 Multi-Mode Framework
          10.4.1 Multi-Mode Formulation
          10.4.2 Optimization Scheme
        10.5 Experiments
          10.5.1 Experiment Settings
          10.5.2 Experiment Results
    Part IV: A New Perspective: Informative Learning
      Chapter 11: Toward Information Theory
        11.1 Entropy and Mutual Information
          11.1.1 Entropy
          11.1.2 Relative Entropy (Kullback-Leibler Divergence)
        11.2 Mutual Information
          11.2.1 Definition of Mutual Information
          11.2.2 Chain Rules
          11.2.3 Information in Data Processing
        11.3 Differential Entropy
          11.3.1 Differential Entropy of Continuous Random Variable
          11.3.2 Mutual Information of Continuous Random Variable
      Chapter 12: Conditional Infomax Learning
        12.1 An Overview
        12.2 Conditional Informative Feature Extraction
          12.2.1 Problem Formulation and Features
          12.2.2 The Information Maximization Principle
          12.2.3 The Information Decomposition and the Conditional Objective
        12.3 The Efficient Optimization
          12.3.1 Discrete Approximation Based on AEP
          12.3.2 Analysis of Terms and Their Derivatives
          12.3.3 Local Active Region Method
        12.4 Bayesian Feature Fusion with Sparse Prior
        12.5 The Integrated Framework for Feature Learning
        12.6 Experiments
          12.6.1 A Toy Problem
          12.6.2 Face Recognition
      Chapter 13: Channel-based Maximum Effective Information
        13.1 Motivation and Overview
        13.2 Maximizing Effective Information
          13.2.1 Relation between Mutual Information and Classification
          13.2.2 Linear Projection and Metric
          13.2.3 Channel Model and Effective Information
          13.2.4 Parzen Window Approximation
        13.3 Parameter Optimization on Grassmann Manifold
          13.3.1 Grassmann Manifold
          13.3.2 Conjugate Gradient Optimization on Grassmann Manifold
          13.3.3 Computation of Gradient
        13.4 Experiments
          13.4.1 A Toy Problem
          13.4.2 Face Recognition
      Chapter 14: Conclusion

    A probabilistic framework for classification and fusion of remotely sensed hyperspectral data

    Reliable and accurate material identification is a crucial component underlying higher-level autonomous tasks within the context of autonomous mining. Such tasks can include exploration, reconnaissance and guidance of machines (e.g. autonomous diggers and haul trucks) to mine sites. This thesis focuses on the problem of classification of materials (rocks and minerals) using high spatial and high spectral resolution (hyperspectral) imagery, collected remotely from mine faces in operational open pit mines. A new method is developed for the classification of hyperspectral data including field spectra and imagery using a probabilistic framework and Gaussian Process regression. The developed method uses, for the first time, the Observation Angle Dependent (OAD) covariance function to classify high-dimensional sets of data. The performance of the proposed method of classification is assessed and compared to standard methods used for the classification of hyperspectral data. This is done using a staged experimental framework. First, the proposed method is tested using high-resolution field spectrometer data acquired in the laboratory and in the field. Second, the method is extended to work on hyperspectral imagery acquired in the laboratory and its performance evaluated. Finally, the method is evaluated for imagery acquired from a mine face under natural illumination, and the use of independent spectral libraries to classify imagery is explored. A probabilistic framework was selected because it best enables the integration of internal and external information from a variety of sensors. To demonstrate the advantages of the proposed GP-OAD method over existing deterministic methods, a new framework is proposed to fuse hyperspectral images using the classified probabilistic outputs from several different images acquired of the same mine face. This method maximises the amount of information but reduces the amount of data by condensing all available information into a single map. Thus, the proposed fusion framework removes the need to manually select a single classification among many individual classifications of a mine face as the 'best' one and increases the classification performance by combining more information. The methods proposed in this thesis are steps forward towards an automated mine face inspection system that can be used within the existing autonomous mining framework to improve productivity and efficiency. Last but not least, the proposed methods will also contribute to increased mine safety.
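    A hedged sketch of the two stages described above (probabilistic per-pixel classification followed by fusion of several classified images of the same face) is given below. The Observation Angle Dependent (OAD) covariance function is not available in scikit-learn, so a standard RBF kernel and a Gaussian Process classifier stand in for the thesis' GP-OAD method; the spectral library, the toy image data and the product-rule fusion are illustrative assumptions.

```python
# Sketch: probabilistic classification of pixel spectra with a GP, then fusion
# of class-probability maps from two acquisitions of the same mine face.
# An RBF kernel is used here as a stand-in for the OAD covariance function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n_bands = 50

# Toy spectral library: two material classes with distinct mean spectra
lib_X = np.vstack([rng.normal(0.3, 0.05, size=(40, n_bands)),
                   rng.normal(0.7, 0.05, size=(40, n_bands))])
lib_y = np.array([0] * 40 + [1] * 40)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(lib_X, lib_y)

# Two "images" of the same face: pixel spectra under different illumination/noise
img_a = rng.normal(0.3, 0.10, size=(100, n_bands))
img_b = rng.normal(0.3, 0.15, size=(100, n_bands))

p_a = gpc.predict_proba(img_a)            # per-pixel class posteriors, image A
p_b = gpc.predict_proba(img_b)            # per-pixel class posteriors, image B

# Fusion: combine posteriors (a simple product rule here) into a single map
fused = p_a * p_b
fused /= fused.sum(axis=1, keepdims=True)
labels = fused.argmax(axis=1)
print(labels[:10])
```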

    Stochastic chaos and thermodynamic phase transitions: theory and Bayesian estimation algorithms

    Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 177-200).

    The chaotic behavior of dynamical systems underlies the foundations of statistical mechanics through ergodic theory. This putative connection is made more concrete in Part I of this thesis, where we show how to quantify certain chaotic properties of a system that are of relevance to statistical mechanics and kinetic theory. We consider the motion of a particle trapped in a double-well potential coupled to a noisy environment. By use of the classic Langevin and Fokker-Planck equations, we investigate Kramers' escape rate problem. We show that there is a deep analogy between kinetic rate theory and stochastic chaos, for which we propose a novel definition. In Part II, we develop techniques based on Volterra series modeling and Bayesian non-linear filtering to distinguish between dynamic noise and measurement noise. We quantify how much of the system's ergodic behavior can be attributed to intrinsic deterministic dynamical properties vis-a-vis inevitable extrinsic noise perturbations.

    by Zhi-De Deng. M.Eng. and S.B.
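    Part I's starting point, Kramers' escape problem for a particle in a noisy double-well potential, can be illustrated with a few lines of simulation. The sketch below is not the thesis code: it integrates the overdamped Langevin equation for V(x) = x^4/4 - x^2/2 with Euler-Maruyama, counts committed well-to-well transitions, and compares the empirical rate with the high-barrier Kramers estimate; the potential, temperature and step size are arbitrary choices for illustration.

```python
# Overdamped Langevin dynamics in a double-well potential and a crude
# empirical estimate of the Kramers escape rate from counting transitions.
import numpy as np

def drift(x):
    # -V'(x) for V(x) = x**4 / 4 - x**2 / 2 (wells at x = -1 and x = +1, barrier at 0)
    return x - x**3

rng = np.random.default_rng(0)
dt, n_steps, temperature = 1e-3, 1_000_000, 0.15
noise = rng.normal(size=n_steps) * np.sqrt(2.0 * temperature * dt)

x, side, crossings = -1.0, -1, 0           # start in the left well
for w in noise:
    x += drift(x) * dt + w                 # Euler-Maruyama step
    if side < 0 and x > 1.0:               # committed transition to the right well
        crossings += 1
        side = 1
    elif side > 0 and x < -1.0:            # committed transition back to the left well
        crossings += 1
        side = -1

total_time = n_steps * dt
print("empirical escape rate:", crossings / total_time)

# Kramers' high-barrier (Smoluchowski) estimate for comparison:
# rate = sqrt(V''(x_well) * |V''(0)|) / (2*pi) * exp(-dV / T), with
# V''(+-1) = 2, |V''(0)| = 1 and barrier height dV = 1/4.
rate_kramers = np.sqrt(2.0) / (2.0 * np.pi) * np.exp(-0.25 / temperature)
print("Kramers estimate:     ", rate_kramers)
```

    With a barrier of 1/4 and temperature 0.15 the two numbers should come out roughly comparable, although the Kramers formula is only asymptotically valid for barriers much larger than the temperature.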