203 research outputs found

    Sparse reduced-rank regression for imaging genetics studies: models and applications

    We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model, a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model enforces sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in each of the imaging and genetics domains, we additionally propose the use of several structured penalties within the sRRR model. Using simulation procedures that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR has greater power to detect the deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails the use of phenotypic vectors of enormous dimensionality, we suggest the use of a sparse classification model as a de-noising step prior to the imaging genetics study. We also present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach we are able to rank the genetic markers in order of importance of association to the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. Finally, we illustrate the application of the proposed statistical models to three real imaging genetics datasets and highlight some potential associations.
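The idea of enforcing sparsity on both the genetic and the imaging side of a reduced-rank model can be sketched as follows. This is a simplified rank-one illustration, not the authors' implementation: it assumes the predictors are roughly uncorrelated and standardised (so that X^T X is close to n times the identity), uses plain lasso-style soft-thresholding rather than the structured penalties mentioned in the abstract, and the names `rank_one_srrr`, `lam_b` and `lam_a` are hypothetical.

```python
import numpy as np


def soft_threshold(v, lam):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def _unit(v):
    """Rescale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def rank_one_srrr(X, Y, lam_b, lam_a, n_iter=100, tol=1e-8):
    """Alternating sparse rank-one fit of Y ~ X b a^T (illustrative sketch).

    X : (n, p) predictors (e.g. genetic markers), assumed standardised.
    Y : (n, q) responses (e.g. imaging phenotypes), assumed centred.
    lam_b, lam_a : hypothetical sparsity levels for b (predictor weights)
                   and a (response weights).
    Assumes X^T X is approximately n * I, which reduces each update to a
    soft-thresholded matrix-vector product.
    """
    n = X.shape[0]
    M = X.T @ Y / n  # (p, q) scaled cross-product matrix
    # Initialise a with the leading right singular vector of M.
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    a = vt[0]
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b_new = _unit(soft_threshold(M @ a, lam_b))
        a_new = _unit(soft_threshold(M.T @ b_new, lam_a))
        change = np.linalg.norm(b_new - b) + np.linalg.norm(a_new - a)
        a, b = a_new, b_new
        if change < tol:
            break
    return b, a
```

Non-zero entries of the returned `b` would flag selected markers, and non-zero entries of `a` the phenotypes they are associated with. In practice the two sparsity levels would need tuning, for example by the re-sampling approach the abstract describes.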

    Graph-Based Fusion of Imaging, Genetic and Clinical Data for Degenerative Disease Diagnosis

    Graph learning methods have achieved noteworthy performance in disease diagnosis due to their ability to represent unstructured information such as inter-subject relationships. While it has been shown that imaging, genetic and clinical data are crucial for degenerative disease diagnosis, existing methods rarely consider the relationships among them, and how best to utilize information from all three remains a challenging problem. This study proposes a novel graph-based fusion (GBF) approach to meet this challenge. To extract effective imaging-genetic features, we propose an imaging-genetic fusion module which uses an attention mechanism to obtain modality-specific and joint representations within and between imaging and genetic data. Then, considering the effectiveness of clinical information for diagnosing degenerative diseases, we propose a multi-graph fusion module to further fuse imaging-genetic and clinical features, which adopts a learnable graph construction strategy and a graph ensemble method. Experimental results on two benchmarks for degenerative disease diagnosis (Alzheimer's Disease Neuroimaging Initiative and Parkinson's Progression Markers Initiative) demonstrate its effectiveness compared to state-of-the-art graph-based methods. Our findings should help guide further development of graph-based models for dealing with imaging, genetic and clinical data.

    Advanced Methods for Discovering Genetic Markers Associated with High Dimensional Imaging Data

    Imaging genetic studies have been widely applied to discover genetic factors of inherited neuropsychiatric diseases. Despite the notable contribution of genome-wide association studies (GWAS) in neuroimaging research, it has always been difficult to efficiently perform association analysis on imaging phenotypes. There are several challenges arising from this topic, such as the large dimensionality of imaging data and genetic data, the potential spatial dependency of imaging phenotypes and the computational burden of the GWAS problem. All the aforementioned issues motivate us to investigate new statistical methods in neuroimaging genetic analysis. In the first project, we develop a hierarchical functional principal regression model (HFPRM) to simultaneously study diffusion tensor bundle statistics on multiple fiber tracts. Theoretically, the asymptotic distribution of the global test statistic on the common factors has been studied. Simulations are conducted to evaluate the finite sample performance of HFPRM. Finally, we apply our method to a GWAS of a neonate population to explore important genetic architecture in early human brain development. In the second project, we consider an association test between functional data acquired on a single curve and scalar variables in a varying coefficient model. We propose a functional projection regression model and an associated global test statistic to aggregate weak signals across the domain of functional data. Theoretically, we examine the asymptotic distribution of the global test statistic and provide a strategy to adaptively select the tuning parameter. Simulation experiments show that the proposed test outperforms existing state-of-the-art methods in functional statistical inference. We also apply the proposed method to a GWAS in the UK Biobank dataset. 
In the third project, we introduce an adaptive projection regression model (APRM) to perform statistical inference on high dimensional imaging responses in the presence of high correlations. Dimension reduction of the phenotypes is achieved through a linear projection regression model. We also implement an adaptive inference procedure to detect signals at multiple levels. Numerical simulations demonstrate that APRM outperforms many state-of-the-art methods in high dimensional inference. Finally, we apply APRM to a GWAS of volumetric data on 93 regions of interest in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.

    Optimizing fMRI analysis in Schizophrenia research: methodology improvements

    Functional Magnetic Resonance Imaging (fMRI) is a modern neuroimaging technique that allows neuronal activity to be localized with high spatial resolution. fMRI exploits local changes in blood oxygenation, which appear as small intensity changes in a particular type of magnetic resonance image. The technique's ability to detect functional changes in the healthy and diseased brain, and to localize abnormal function, makes fMRI an ideal technique for the treatment of numerous neuronal diseases and injuries. It has already been applied clinically to localize functional areas affected by tumours, pre- and post-operatively. Schizophrenia, a far-reaching illness present in one percent of the global population, has recently been studied with functional neuroimaging techniques, with more than 300 studies on schizophrenia and fMRI published in journals. Understanding the neural substrates of schizophrenia requires a precise determination of the extent and distribution of abnormalities in brain function and anatomy. Because the symptoms are widely dispersed, a phenomenological approach to the illness should be used to relate abnormalities, symptoms and prognosis accurately. Patients who have mainly positive symptoms, such as auditory hallucinations and delusions, may have different brain abnormalities from those with pronounced negative symptoms. Therefore, in the present study a positive symptom, the presence of auditory hallucinations in schizophrenic patients, was chosen as the selection criterion for a homogeneous group of schizophrenic patients with auditory hallucinations. This thesis presents the application of fMRI to the study of schizophrenia.
Finally, a new method for filtering fMRI data, NL-means, has been proposed, and its …
Lull Noguera, JJ. (2008). Optimizing fMRI analysis in Schizophrenia research: methodology improvements [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/48597

    Biologically informed risk scoring in schizophrenia based on genome-wide omics data

    Extensive efforts in characterizing the biological architecture of schizophrenia have moved psychiatric research closer towards clinical application. As our understanding of psychiatric illness is slowly shifting towards a conceptualization as dimensional constructs that cut across traditional diagnostic boundaries, opportunities for personalized medicine applications that are afforded by the application of advanced data science methods on the increasingly available, large-scale and multimodal data repositories are starting to be more broadly recognized. A particularly intriguing phenomenon is the discrepancy between the high heritability of schizophrenia and the difficulty in identifying predictive genetic signatures, for which polygenic risk scores of common variants that explain approximately 18% of illness-associated variance remain the gold standard. A substantial body of research points towards two lines of investigation that may lead to a significant advance, resolve at least in part the ‘missing heritability’ phenomenon, and potentially provide the basis for more predictive, personalized clinical tools. First, it is paramount to better understand the impact of environmental factors on illness risk and elucidate the biology underlying their impact on altered brain function in schizophrenia. This thesis aims to close a major gap in our understanding of the multivariate, epigenetic landscape associated with schizophrenia, its interaction with polygenic risk and its association with DLPFC-HC connectivity, a well-established and robust neural intermediate phenotype of schizophrenia. As a basis for this, we have developed a novel biologically-informed machine learning framework by incorporating systems-level biological domain knowledge, i.e., gene ontological pathways, entitled ‘BioMM’ using genome-wide DNA methylation data obtained from whole blood samples. 
An epigenetic poly-methylation score termed ‘PMS’ was estimated at the individual level using BioMM, trained and validated using a total of 2230 whole-blood samples and 244 post-mortem brain samples. The pathways contributing most to this PMS were strongly associated with synaptic, neural and immune system-related functions. The identified PMS could be successfully validated in two independent cohorts, demonstrating the robust generalizability of the identified model. Furthermore, the PMS could significantly differentiate patients with schizophrenia from healthy controls when predicted in DLPFC post-mortem brain samples, suggesting that the epigenetic landscape of schizophrenia is to a certain extent shared between the central and peripheral tissues. Importantly, the peripheral PMS was associated with an intermediate neuroimaging phenotype (i.e., DLPFC-HC functional connectivity) in two independent imaging samples under the working memory paradigm. However, we did not find sufficient evidence for a combined genetic and epigenetic effect on brain function by integrating PRS derived from GWAS data, which suggested that DLPFC-HC coupling was predominantly impacted by environmental risk components, rather than polygenic risk of common variants. Furthermore, the epigenetic signature was not associated with GWAS-derived risk scores, implying that the observed epigenetic effect likely did not depend on the underlying genetics; this was further substantiated by investigation of data from unaffected first-degree relatives of patients with SCZ, BD, MDD and autism. In summary, the characterization of PMS through the systems-level integration of multimodal data elucidates the multivariate impact of epigenetic effects on schizophrenia-relevant brain function and its interdependence with genetic illness risk.
Second, the limited predictive value of polygenic risk scores and the difficulty in identifying associations with heritable neural differences found in schizophrenia may arise because the functional consequences of genetic risk are modulated by spatio-temporal as well as sex-specific effects. To address this, this thesis identifies sex differences in the spatio-temporal expression trajectories during human development of genes that showed significant prefrontal co-expression with schizophrenia risk genes during the fetal phase and adolescence, consistent with a core developmental hypothesis of schizophrenia. More specifically, it was found that during these two time-periods, prefrontal expression was significantly more variable in males compared to females, a finding that could be validated in an independent data source and that was specific for schizophrenia compared to other psychiatric as well as somatic illnesses. Similar to the epigenetic differences described above, the genes underlying the risk-associated gene expression differences were significantly linked to synaptic function. Notably, individual genes with male-specific variability increases were distinct between the fetal phase and adolescence, potentially suggesting different risk-associated mechanisms that converge on the shared synaptic involvement of these genes. These results provide substantial support to the hypothesis that the functional consequences of genetic risk show spatio-temporal specificity. Importantly, the temporal specificity was linked to the fetal phase and adolescence, time-periods that are thought to be of predominant importance for the brain-functional consequences of environmental risk exposure. Therefore, the presented results provide the basis for future studies exploring the polygenic risk architecture and its interaction with environmental effects in a multivariate and spatio-temporally stratified manner.
In summary, the work presented in this thesis describes multivariate, multimodal approaches to characterize the (epi-)genetic basis of schizophrenia, explores its association with a well-established neural intermediate phenotype of the illness and investigates the spatio-temporal specificity of schizophrenia-relevant gene expression effects. This work expands our knowledge of the complex biology underlying schizophrenia and provides the basis for the future development of more predictive biological algorithms that may aid in advancing personalized medicine in psychiatry.
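For readers unfamiliar with the polygenic risk scores (PRS) that this abstract uses as a comparison point, the conventional construction is a weighted sum of risk-allele dosages, with weights taken from GWAS effect-size estimates. The sketch below shows only that generic construction. It is not the BioMM-based PMS described in the abstract, and the function name and toy inputs are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, effect_sizes):
    """Generic polygenic score: one weighted sum of allele dosages per subject.

    dosages      : (n_subjects, n_variants) allele counts in {0, 1, 2}
                   (or imputed dosages between 0 and 2).
    effect_sizes : (n_variants,) per-variant weights, e.g. GWAS log odds ratios.
    Both inputs are hypothetical placeholders for illustration.
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("dosages must be (n_subjects, n_variants) "
                         "with one effect size per variant")
    return dosages @ effect_sizes


# Toy usage: two subjects, three variants.
scores = polygenic_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.3])
```

Real pipelines typically add steps such as linkage-disequilibrium pruning and p-value thresholding before the sum, which are omitted here.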

    26th Annual Computational Neuroscience Meeting (CNS*2017): Part 1


    Sparse multivariate models for pattern detection in high-dimensional biological data

    Recent advances in technology have made it possible and affordable to collect biological data of unprecedented size and complexity. When analysing such data, traditional statistical methods and machine learning algorithms suffer from the curse of dimensionality. Parsimonious models, where parsimony may refer to model structure and/or model parameters, have been shown to improve both the biological interpretability of the model and its generalisability to new data. In this thesis we are concerned with model selection in both supervised and unsupervised learning tasks. For supervised learning, we propose a new penalty called graph-guided group lasso (GGGL) and employ this penalty in penalised linear regression. GGGL is able to integrate prior structured information with data mining, where variables sharing similar biological functions are collected into groups and the pairwise relatedness between groups is organised into a network. Such prior information guides the selection of variables that are predictive of a univariate response, so that the model selects variable groups that are close in the network and important variables within the selected groups. We then generalise the idea of incorporating network-structured prior knowledge to association studies consisting of multivariate predictors and multivariate responses and propose the network-driven sparse reduced-rank regression (NsRRR). In NsRRR, pairwise relatedness between predictors and between responses is represented by two networks, and the model identifies associations between a subnetwork of predictors and a subnetwork of responses such that both subnetworks tend to be connected. For unsupervised learning, we are concerned with a multi-view learning task in which we compare the variance of high-dimensional biological features collected from multiple sources, which are referred to as “views”.
We propose the sparse multi-view matrix factorisation (sMVMF), which is parsimonious in both model structure and model parameters. sMVMF can identify latent factors that regulate the variability shared across all views and the variability that is characteristic of a specific view. For each novel method, we also present simulation studies and an application to real biological data to illustrate variable selection and model interpretability.
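Group-structured penalties such as the one described above build on the plain group lasso, whose proximal operator shrinks each group of coefficients as a block and zeroes out whole groups at once. The sketch below shows that standard building block only. It does not implement the graph-guided coupling between groups that GGGL adds, and the function name and example groups are hypothetical.

```python
import numpy as np


def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the (unweighted) group lasso penalty.

    beta   : (p,) coefficient vector.
    groups : list of index lists forming non-overlapping groups.
    lam    : threshold; a group whose Euclidean norm is at most lam is
             set to zero, otherwise it is shrunk towards zero by lam.
    """
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * beta[idx]
    return out


# Toy usage: the strong first group is shrunk, the weak second group is dropped.
shrunk = group_soft_threshold([3.0, 4.0, 0.1, -0.1], [[0, 1], [2, 3]], lam=1.0)
```

Applying this operator inside a proximal-gradient loop yields group-level variable selection. Selecting individual variables within the retained groups, as the abstract describes, would need an additional element-wise penalty.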