6 research outputs found

    Bayesian longitudinal low-rank regression models for imaging genetic data from longitudinal studies

    To perform a joint analysis of multivariate neuroimaging phenotypes and candidate genetic markers obtained from longitudinal studies, we develop a Bayesian longitudinal low-rank regression (L2R2) model. The L2R2 model integrates three key methodologies: a low-rank matrix for approximating the high-dimensional regression coefficient matrices corresponding to the genetic main effects and their interactions with time, penalized splines for characterizing the overall time effect, and a sparse factor analysis model coupled with random effects for capturing within-subject spatio-temporal correlations of longitudinal phenotypes. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. Simulations show that the L2R2 model outperforms several other competing methods. We apply the L2R2 model to investigate the effect of single nucleotide polymorphisms (SNPs) on the top 10 and top 40 previously reported Alzheimer’s disease-associated genes. We also identify associations between the interactions of these SNPs with patient age and the tissue volumes of 93 regions of interest from patients’ brain images obtained from the Alzheimer’s Disease Neuroimaging Initiative.
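The low-rank idea mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the L2R2 model (there are no priors, splines, or random effects here); it only shows, under assumed dimensions, how a rank-r factorization shrinks the number of free parameters in a SNP-by-region coefficient matrix. The function name `low_rank_approx`, the number of SNPs, the rank, the noise level, and the seed are all hypothetical; only the 93 regions of interest come from the abstract.

```python
import numpy as np

def low_rank_approx(B, rank):
    """Best rank-`rank` approximation of B in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(0)
p, d, r = 200, 93, 3  # hypothetical: 200 SNPs, 93 regions of interest, rank 3

# Simulated coefficient matrix with exact rank r, plus small noise.
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, d))
B_noisy = B_true + 0.01 * rng.normal(size=(p, d))

B_hat = low_rank_approx(B_noisy, r)

# Free parameters: full matrix versus the two rank-r factors.
full_params = p * d
low_rank_params = r * (p + d)
print(full_params, low_rank_params)
```

In this toy setting the rank-3 factors need far fewer parameters than the full matrix, which is the motivation for a low-rank representation when the number of SNPs and phenotypes is large.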

    Machine learning approaches for high-dimensional genome-wide association studies

    Genome-wide association studies (GWAS) aim to find statistical associations between genetic variants and traits of interest. Genetic variants that explain much of the variation in genome-wide gene expression may lead to confounding in expression quantitative trait loci (eQTL) analyses. To account for these confounding factors, in Article I we proposed LVREML, a method conceptually analogous to estimating fixed and random effects in linear mixed models (LMM). We showed that the maximum-likelihood latent variables can always be chosen orthogonal to the known factors (such as genetic variants). This indicates that the maximum-likelihood variables explain the sample covariance that is not already explained by the genetic variants in the model. To identify which traits are affected by the identified genetic variants, we need to reverse the functional relation between genotypes and traits. In this regard, multi-trait approaches are more advantageous than studying the traits individually. The multi-trait approaches benefit from increased power from considering cross-trait covariances and from a reduced multiple-testing burden, because a single test is needed to test for associations with a set of traits. In Article II, we analyzed various machine learning methods (ridge regression, Naive Bayes/independent univariate correlation, random forests and support vector machines) for reverse regression in multi-trait GWAS, using genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains to evaluate the methods. In Article III, we extended the above approach to a human dataset. An important difference between the data from Article II and Article III is that no ground-truth data are available for the latter. We used genotypes and brain-imaging features extracted from MRIs obtained from the ADNI database. The results from both Article II and Article III showed that genotype prediction performance varied across genetic variants. This helped in identifying genomic regions that are associated with a high number of traits in high-dimensional phenotypic data. We also observed that the feature coefficients of fitted machine learning models correlated with the strength of association between variants and traits. Our results also showed that non-linear machine learning methods like random forests identified genetic variants distinct from those identified by the linear methods. In particular, we observed in Article III that random forests were able to identify single-nucleotide polymorphisms (SNPs) that were distinct from the ones identified by ridge and lasso regression. Further analysis showed that the identified SNPs belonged to genes previously associated with brain-related disorders.
    Doctoral thesis
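The reverse-regression idea described in this abstract (predicting a variant's genotype from many traits at once) can be sketched on simulated data. This is a minimal illustration, not the thesis's pipeline: it uses a closed-form ridge regression written in NumPy, and the helper `ridge_fit`, the sample size, the number of traits, the effect sizes, and the seed are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns the coefficients."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, Xc.T @ yc)

rng = np.random.default_rng(1)
n, d = 500, 20  # hypothetical: 500 samples, 20 traits

# Genotype of one variant, coded as 0/1/2 copies of an allele.
g = rng.integers(0, 3, size=n).astype(float)

# Only the first three traits depend on the genotype (assumed effect sizes).
effects = np.zeros(d)
effects[:3] = [1.0, 0.8, 0.6]
Y = np.outer(g, effects) + rng.normal(size=(n, d))

# Reverse regression: traits are the predictors, the genotype is the response.
w = ridge_fit(Y, g, alpha=1.0)

# Traits ranked by absolute coefficient; the associated ones should lead.
top = np.argsort(-np.abs(w))[:3]
print(sorted(top.tolist()))
```

In this simulation the largest coefficients fall on the traits that were generated with a genotype effect, which mirrors the abstract's observation that model coefficients track the strength of variant-trait association.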