628 research outputs found

    Any-way and Sparse Analyses for Multimodal Fusion and Imaging Genomics

    This dissertation aims to develop new algorithms, built on the independent component analysis (ICA) framework, that leverage sparsity and mutual information across data modalities to improve the performance of current ICA-based multimodal fusion approaches. These algorithms are applied to both simulated data and real neuroimaging and genomic data to examine their performance. The identified neuroimaging and genomic patterns can help better delineate the pathology of mental disorders or brain development. To alleviate the signal-background separation difficulties in infomax-decomposed sources for genomic data, we propose a sparse infomax that enhances a robust sparsity measure, the Hoyer index. The Hoyer index is scale-invariant and well suited for ICA frameworks, since the scale of decomposed sources is arbitrary. Simulation results demonstrate that sparse infomax increases the component detection accuracy for situations where the source signal-to-background ratio (SBR) is low, particularly for single nucleotide polymorphism (SNP) data. The proposed sparse infomax is further extended to two data modalities as a sparse parallel ICA for applications to imaging genomics, in order to investigate the associations between brain imaging and genomics. Simulation results show that sparse parallel ICA outperforms parallel ICA, with improved accuracy for structural magnetic resonance imaging (sMRI)-SNP association detection and component spatial map recovery, as well as enhanced sparsity for sMRI and SNP components under noisy conditions. Applying the proposed sparse parallel ICA to fuse the whole-brain sMRI and whole-genome SNP data of 24,985 participants in the UK Biobank, we identify three stable and replicable sMRI-SNP pairs. The identified sMRI components highlight frontal, parietal, and temporal regions and associate with multiple cognitive measures (with different association strengths in different age groups for the temporal component). 
Top SNPs in the identified SNP factor are enriched in inflammatory disease and inflammatory response pathways, which also regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region, and the regulation effects are significantly enriched. Applying the proposed sparse parallel ICA to imaging genomics in attention-deficit/hyperactivity disorder (ADHD), we identify and replicate one SNP component related to gray matter volume (GMV) alterations in superior and middle frontal gyri underlying working memory deficit in adults and adolescents with ADHD. The association is more significant in ADHD families than in controls and stronger in adults and older adolescents than in younger ones. The identified SNP component highlights SNPs in long non-coding RNAs (lncRNAs) on chromosome 5 and in several protein-coding genes that are involved in ADHD, such as MEF2C, CADM2, and CADPS2. Top SNPs are enriched in human brain neuron cells and regulate gene expression, isoform percentage, transcription expression, or methylation level in the frontal region. Moreover, to increase the flexibility and robustness in mining multimodal data, we propose aNy-way ICA, which optimizes the entire correlation structure of linked components across any number of modalities via Gaussian independent vector analysis and simultaneously optimizes independence via separate (parallel) ICAs. Simulation results demonstrate that aNy-way ICA recovers sources and loadings, as well as the true covariance patterns, with improved accuracy compared to existing multimodal fusion approaches, especially under noisy conditions. 
Applying the proposed aNy-way ICA to integrate structural MRI, fractal n-back, and emotion identification task functional MRIs collected in the Philadelphia Neurodevelopmental Cohort (PNC), we identify and replicate one linked GMV-threat-2-back component, and the threat and 2-back components are related to intelligence quotient (IQ) score in both discovery and replication samples. Lastly, we extend the proposed aNy-way ICA with a reference constraint to enable prior-guided multimodal fusion. Simulation results show that aNy-way ICA with reference recovers the designed linkages between reference and modalities, the cross-modality correlations, and the loading and component matrices with improved accuracy compared to multi-site canonical correlation analysis with reference (MCCAR)+joint ICA under noisy conditions. Applying aNy-way ICA with reference to supervise the fusion of structural MRI, fractal n-back, and emotion identification task functional MRIs in PNC with IQ as the reference, we identify and replicate one IQ-related GMV-threat-2-back component, and this component is significantly correlated across modalities in both discovery and replication samples. Ph.D.
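
The scale invariance attributed to the Hoyer index in the abstract above can be checked numerically. The sketch below assumes the commonly used definition, (sqrt(n) - L1/L2) / (sqrt(n) - 1), which is not spelled out in the abstract; the function name `hoyer_index` and the toy vectors are illustrative only, and this is not the dissertation's sparse infomax implementation.

```python
import math


def hoyer_index(x):
    """Hoyer sparsity of a vector (assumed definition, see lead-in).

    Ranges from 0 (all entries equal in magnitude) to 1 (one nonzero entry).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two entries")
    l1 = sum(abs(v) for v in x)            # L1 norm
    l2 = math.sqrt(sum(v * v for v in x))  # L2 norm
    if l2 == 0.0:
        raise ValueError("undefined for the all-zero vector")
    # Multiplying x by a constant scales l1 and l2 equally, so l1 / l2
    # (and therefore the index) does not change.
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


# Toy vectors (hypothetical data, not from the dissertation).
dense = [1.0, 1.0, 1.0, 1.0]           # flat source: least sparse
sparse = [0.0, 0.0, 5.0, 0.0]          # single active entry: most sparse
rescaled = [10.0 * v for v in sparse]  # same pattern, different scale

print(hoyer_index(dense), hoyer_index(sparse), hoyer_index(rescaled))
```

Because ICA recovers sources only up to an arbitrary scale, a measure that depends on the ratio of norms rather than on their magnitudes gives the same sparsity value for a source and any rescaled copy of it.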

    PARALLEL INDEPENDENT COMPONENT ANALYSIS WITH REFERENCE FOR IMAGING GENETICS: A SEMI-BLIND MULTIVARIATE APPROACH

    Imaging genetics is an emerging field dedicated to the study of the genetic underpinnings of brain structure and function. Over the last decade, brain imaging techniques such as magnetic resonance imaging (MRI) have been increasingly applied to measure morphometry, task-based function, and connectivity in living brains. Meanwhile, high-throughput genotyping employing genome-wide techniques has made it feasible to sample the entire genome of a substantial number of individuals. While there is growing interest in image-wide and genome-wide approaches, which allow unbiased searches over a large range of variants, one of the most challenging problems is the correction for the huge number of statistical tests used in univariate models. In contrast, a reference-guided multivariate approach offers a specific advantage: it assesses many variables simultaneously for aggregate effects while leveraging prior information, and it can improve the robustness of the results compared to a fully blind approach. In this dissertation we present a semi-blind multivariate approach, parallel independent component analysis with reference (pICA-R), to better reveal relationships between hidden factors of particular attributes. First, a consistency-based order estimation approach is introduced to advance the application of ICA to genotype data. The pICA-R approach is then presented, where independent components are extracted from two modalities in parallel and inter-modality associations are subsequently optimized for pairs of components. In particular, prior information is incorporated to elicit components of particular interest, which helps identify factors carrying small amounts of variance in large, complex datasets. The pICA-R approach is further extended to accommodate multiple references whose interrelationships are unknown, allowing the investigation of the functional influence on neurobiological traits of potentially related genetic variants implicated in biology. 
Applied to a schizophrenia study, pICA-R reveals that a complex genetic factor involving multiple pathways underlies schizophrenia-related gray matter deficits in prefrontal and temporal regions. The extended multi-reference approach, when employed to study alcohol dependence, delineates a complex genetic architecture in which the CREB-BDNF pathway plays a key role in the genetic factor underlying a proportion of the variation in cue-elicited brain activations, which plays a role in the phenotypic symptoms of alcohol dependence. In summary, our work makes several important contributions to advance the application of ICA to imaging genetics studies, which holds promise for improving our understanding of the genetics underlying brain structure and function in health and disease.

    Data integration in inflammatory bowel disease

    INTRODUCTION: Inflammatory bowel disease is a complex intestinal disease with several genetic and environmental factors that can influence its course. The etiology and pathophysiology of the disease are not fully understood. There is some evidence that the microbiome can play a role. Finding relationships between the microbiome and the host’s mucosa could help advance prevention, diagnosis, or treatment. METHODS: We based our analysis on intestinal bacterial 16S rRNA and human transcriptome data from biopsies taken at multiple timepoints and from multiple intestinal segments. We extended regularized generalized canonical correlation analysis to find models that are coherent with previous knowledge of the disease, taking into account the samples’ information. Multiple inflammatory bowel disease datasets covering different treatments and conditions were analysed, and the models defining those datasets were compared. The results were compared with multiple co-inertia analysis. RESULTS: Splitting sample variables into different blocks results in models of these relationships that differ in the genes and microorganisms selected. The models generated using our new method, inteRmodel, outperformed multiple co-inertia analysis in classifying the samples according to their location. Despite being applied to datasets from different sources, the resulting models show similar relationships between variables. DISCUSSION: Comparing multiple models helps uncover the relationships within datasets. Our method finds how strong the relationships between the microbiome, the transcriptome, and environmental variables are. The genes selected were common across different datasets. This approach is robust and flexible across different datasets and settings. CONCLUSION: With inteRmodel we found that the microbiome relates more closely to the sample location than to disease, but the transcriptome is highly related to the location of the sample in the intestine. There is a common transcriptome between datasets, while the microorganisms depend on the dataset. We can improve sample classification by taking into account both bacterial 16S rRNA and the host transcriptome.

    Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers

    The Genetics Core of the Alzheimer’s Disease Neuroimaging Initiative (ADNI), formally established in 2009, aims to provide resources and facilitate research related to genetic predictors of multidimensional Alzheimer’s disease (AD)-related phenotypes. Here, we provide a systematic review of genetic studies published between 2009 and 2012 where either ADNI APOE genotype or genome-wide association study (GWAS) data were used. We review and synthesize ADNI genetic associations with disease status or quantitative disease endophenotypes including structural and functional neuroimaging, fluid biomarker assays, and cognitive performance. We also discuss the diverse analytical strategies used in these studies, including univariate and multivariate analysis, meta-analysis, pathway analysis, and interaction and network analysis. Finally, we perform pathway and network enrichment analyses of these ADNI genetic associations to highlight key mechanisms that may drive disease onset and trajectory. Major ADNI findings included all the top 10 AD genes, and several of these (e.g., APOE, BIN1, CLU, CR1, and PICALM) were corroborated by ADNI imaging, fluid, and cognitive phenotypes. ADNI imaging genetics studies discovered novel findings (e.g., FRMD6) that were later replicated on different data sets. Several other genes (e.g., APOC1, FTO, GRIN2B, MAGI2, and TOMM40) were associated with multiple ADNI phenotypes, warranting further investigation on other data sets. The broad availability and wide scope of ADNI genetic and phenotypic data have advanced our understanding of the genetic basis of AD and have nominated novel targets for future studies employing next-generation sequencing and convergent multi-omics approaches, and for clinical drug and biomarker development. Electronic supplementary material: The online version of this article (doi:10.1007/s11682-013-9262-z) contains supplementary material, which is available to authorized users.