2,498 research outputs found

    Sparse reduced-rank regression for imaging genetics studies: models and applications

    Get PDF
    We present a novel statistical technique; the sparse reduced rank regression (sRRR) model which is a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model is able to enforce sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR possesses better power to detect the deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step, prior to the imaging genetics study. Finally, we present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. In the very end, we illustrate the application perspective of the proposed statistical models in three real imaging genetics datasets and highlight some potential associations

    A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction

    Full text link
    While linear mixed model (LMM) has shown a competitive performance in correcting spurious associations raised by population stratification, family structures, and cryptic relatedness, more challenges are still to be addressed regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We proposed the sparse graph-structured linear mixed model (sGLMM) that can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM in the real-world genomic dataset on two different species from plants and humans. In Arabidopsis thaliana data, sGLMM behaves better than all other baseline models for 63.4% traits. We also discuss the potential causal genetic variation of Human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.Comment: Code available at https://github.com/YeWenting/sGLM

    Identifying progressive imaging genetic patterns via multi-task sparse canonical correlation analysis: a longitudinal study of the ADNI cohort

    Get PDF
    Motivation Identifying the genetic basis of the brain structure, function and disorder by using the imaging quantitative traits (QTs) as endophenotypes is an important task in brain science. Brain QTs often change over time while the disorder progresses and thus understanding how the genetic factors play roles on the progressive brain QT changes is of great importance and meaning. Most existing imaging genetics methods only analyze the baseline neuroimaging data, and thus those longitudinal imaging data across multiple time points containing important disease progression information are omitted. Results We propose a novel temporal imaging genetic model which performs the multi-task sparse canonical correlation analysis (T-MTSCCA). Our model uses longitudinal neuroimaging data to uncover that how single nucleotide polymorphisms (SNPs) play roles on affecting brain QTs over the time. Incorporating the relationship of the longitudinal imaging data and that within SNPs, T-MTSCCA could identify a trajectory of progressive imaging genetic patterns over the time. We propose an efficient algorithm to solve the problem and show its convergence. We evaluate T-MTSCCA on 408 subjects from the Alzheimer’s Disease Neuroimaging Initiative database with longitudinal magnetic resonance imaging data and genetic data available. The experimental results show that T-MTSCCA performs either better than or equally to the state-of-the-art methods. In particular, T-MTSCCA could identify higher canonical correlation coefficients and capture clearer canonical weight patterns. This suggests that T-MTSCCA identifies time-consistent and time-dependent SNPs and imaging QTs, which further help understand the genetic basis of the brain QT changes over the time during the disease progression. Availability and implementation The software and simulation data are publicly available at https://github.com/dulei323/TMTSCCA. Supplementary information Supplementary data are available at Bioinformatics online

    Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO!

    Get PDF
    Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neu- roimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimag- ing data

    Structured Sparse Methods for Imaging Genetics

    Get PDF
    abstract: Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Oftentimes, imaging genetics studies are challenging due to the relatively small number of subjects but extremely high-dimensionality of both imaging data and genomic data. In this dissertation, I carry on my research on imaging genetics with particular focuses on two tasks---building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this end, I consider a suite of structured sparse methods---that can produce interpretable models and are robust to overfitting---for imaging genetics. With carefully-designed sparse-inducing regularizers, different biological priors are incorporated into learning models. More specifically, in the Allen brain image--gene expression study, I adopt an advanced sparse coding approach for image feature extraction and employ a multi-task learning approach for multi-class annotation. Moreover, I propose a label structured-based two-stage learning framework, which utilizes the hierarchical structure among labels, for multi-label annotation. In the Alzheimer's disease neuroimaging initiative (ADNI) imaging genetics study, I employ Lasso together with EDPP (enhanced dual polytope projections) screening rules to fast identify Alzheimer's disease risk SNPs. I also adopt the tree-structured group Lasso with MLFre (multi-layer feature reduction) screening rules to incorporate linkage disequilibrium information into modeling. Moreover, I propose a novel absolute fused Lasso model for ADNI imaging genetics. This method utilizes SNP spatial structure and is robust to the choice of reference alleles of genotype coding. In addition, I propose a two-level structured sparse model that incorporates gene-level networks through a graph penalty into SNP-level model construction. Lastly, I explore a convolutional neural network approach for accurate predicting Alzheimer's disease related imaging phenotypes. Experimental results on real-world imaging genetics applications demonstrate the efficiency and effectiveness of the proposed structured sparse methods.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Identification of associations between genotypes and longitudinal phenotypes via temporally-constrained group sparse canonical correlation analysis

    Get PDF
    Motivation: Neuroimaging genetics identifies the relationships between genetic variants (i.e., the single nucleotide polymorphisms) and brain imaging data to reveal the associations from genotypes to phenotypes. So far, most existing machine-learning approaches are widely used to detect the effective associations between genetic variants and brain imaging data at one time-point. However, those associations are based on static phenotypes and ignore the temporal dynamics of the phenotypical changes. The phenotypes across multiple time-points may exhibit temporal patterns that can be used to facilitate the understanding of the degenerative process. In this article, we propose a novel temporally constrained group sparse canonical correlation analysis (TGSCCA) framework to identify genetic associations with longitudinal phenotypic markers. Results: The proposed TGSCCA method is able to capture the temporal changes in brain from longitudinal phenotypes by incorporating the fused penalty, which requires that the differences between two consecutive canonical weight vectors from adjacent time-points should be small. A new efficient optimization algorithm is designed to solve the objective function. Furthermore, we demonstrate the effectiveness of our algorithm on both synthetic and real data (i.e., the Alzheimer’s Disease Neuroimaging Initiative cohort, including progressive mild cognitive impairment, stable MCI and Normal Control participants). In comparison with conventional SCCA, our proposed method can achieve strong associations and discover phenotypic biomarkers across multiple time-points to guide disease-progressive interpretation
    • …
    corecore