5 research outputs found

    Generalizations, extensions and applications for principal component analysis

    Principal component analysis (PCA) is one of the most important dimension reduction techniques. It is widely used in many applications, including economics, finance, and medical research. In this research, several novel generalizations of PCA are proposed to adapt the technique to more complicated scenarios. In the first project, we propose a principal surface model for manifold-like datasets in 3D space. In the second part, a new concept, the graphical intra-class correlation coefficient (GICC), is defined, and a Markov Chain Monte Carlo Expectation-Maximization (mcmcEM) algorithm is used for likelihood optimization. In the third part, we propose multilevel binary principal component analysis (MBPCA) models for finding the principal components of multilevel binary datasets. A variational expectation-maximization algorithm is used for likelihood optimization.

    Statistical Inference on Multiple Graphs

    Given multiple graphs, an important question is how to perform statistical inference on them. This question has become more significant in recent years with the explosion of graph data and the increasing complexity of data analysis. Successfully addressing this question will have a large impact on various scientific fields, including neuroscience, social network analysis, and internet mapping. Graphs are naturally complex objects with intrinsic topological structure, which poses significant challenges for traditional statistical inference. Therefore, graph pre-processing, feature extraction, and dimension reduction are essential to obtain good subsequent inference performance. In this dissertation, I develop pre-processing, feature extraction, and dimension reduction methods for data taking the form of multiple graphs. The methods are motivated by classical statistical approaches, including analysis of variance, feature screening, and principal component analysis. Some methods can be applied under both supervised and unsupervised settings; others are designed only for problems involving labels of interest. I analyze the theoretical properties of these methods jointly with subsequent inference performance under suitable random graph models. Simulations, which include graph clustering, classification, and regression, are provided to demonstrate the properties of the proposed methods. I further apply the methods developed here to real data sets such as human brain networks acquired through neuroimaging techniques. The main contribution of this dissertation is the presentation of a set of methods for analyzing multiple graphs. These methods are supported with theory and numerical experiments. I further demonstrate the utility of the methods by exploring real data sets and discovering statistical patterns.

    Statistical Analysis of Functional Connectivity in Brain Imaging: Measurement Reliability and Clinical Applications

    Measurement reliability is crucial for research on functional connectivity data in the context of pursuing more reproducible research. Unfortunately, the utility of traditional reliability measures, such as the intraclass correlation coefficient, is limited given the size and complexity of functional connectivity data. In recent work, novel reliability measures have been introduced in the context where a set of subjects is measured twice or more, including fingerprinting, rank sums, and generalizations of the intraclass correlation coefficient. However, the relationships between these measures, and the best practices among them, remain largely unknown. In this thesis, we consider a novel reliability measure, discriminability. We show that it is deterministically linked with the correlation coefficient under univariate random effect models, and has the desired property of optimal accuracy for inferential tasks using multivariate measurements. Additionally, we propose a universal framework for reliability tests based on permutations of the statistics. The power of the permutation tests derived from these measures is compared numerically under Gaussian and non-Gaussian settings, with and without simulated batch effects. Motivated by both theoretical and empirical results, we provide methodological recommendations for each benchmark setting to serve as a resource for future analyses. We investigate the Poisson and Gaussian approximations of the tests so that the computational cost is reduced. We demonstrate possible follow-up research using reliability tests via applications to the Human Connectome Project functional connectivity data. We believe these results will play an important role in improving reproducibility not only for functional connectivity, but also in fields such as functional magnetic resonance imaging in general, genomics, pharmacology, and more. Lastly, we illustrate the potential of functional connectivity as a source of causal biomarkers with an example of analyzing the trial data for an aphasia treatment.
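
    As a rough illustration of the permutation-based reliability testing described in the abstract above, the sketch below computes a discriminability-style statistic and a permutation p-value. It is not the thesis's implementation. The abstract does not define discriminability, so the code assumes the common definition: the fraction of comparisons in which a within-subject distance is smaller than a between-subject distance. The function names `discriminability` and `permutation_test`, the Euclidean distance, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def discriminability(X, subjects):
    """Assumed definition: fraction of comparisons in which a within-subject
    distance is strictly smaller than a between-subject distance."""
    X = np.asarray(X, dtype=float)
    subjects = np.asarray(subjects)
    # Pairwise Euclidean distances between all measurements (rows of X).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    hits, total = 0, 0
    for i in range(len(subjects)):
        same = subjects == subjects[i]
        between = D[i, ~same]  # distances to other subjects' measurements
        for j in np.flatnonzero(same):
            if j == i:
                continue
            hits += int(np.sum(D[i, j] < between))
            total += between.size
    return hits / total

def permutation_test(X, subjects, n_perm=200, seed=0):
    """Permutation p-value: shuffle subject labels to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = discriminability(X, subjects)
    null = np.array([
        discriminability(X, rng.permutation(subjects)) for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Toy data (made up for illustration): 5 subjects, 2 measurements each,
# with subject means far apart relative to the within-subject spread.
subj_demo = np.repeat(np.arange(5), 2)
X_demo = (np.repeat(np.arange(5.0), 2) + np.tile([0.0, 0.1], 5))[:, None]
obs, p = permutation_test(X_demo, subj_demo)
```

    On this toy data every within-subject distance is smaller than every between-subject distance, so `obs` equals 1.0, and shuffling the labels rarely reproduces that, which gives a small permutation p-value. Real functional connectivity data would need a distance suited to connectivity matrices, and the power comparisons in the thesis are not reproduced here.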