2,741 research outputs found
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression
We present a new method for the detection of gene pathways associated with a
multivariate quantitative trait, and use it to identify causal pathways
associated with an imaging endophenotype characteristic of longitudinal
structural change in the brains of patients with Alzheimer's disease (AD). Our
method, known as pathways sparse reduced-rank regression (PsRRR), uses group
lasso penalised regression to jointly model the effects of genome-wide single
nucleotide polymorphisms (SNPs), grouped into functional pathways using prior
knowledge of gene-gene interactions. Pathways are ranked in order of importance
using a resampling strategy that exploits finite sample variability. Our
application study uses whole genome scans and MR images from 464 subjects in
the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs
are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise
imaging signatures characteristic of AD are obtained by analysing 3D patterns
of structural change at 6, 12 and 24 months relative to baseline. High-ranking,
AD endophenotype-associated pathways in our study include those describing
chemokine, Jak-stat and insulin signalling pathways, and tight junction
interactions. All of these have been previously implicated in AD biology. In a
secondary analysis, we investigate SNPs and genes that may be driving pathway
selection, and identify a number of previously validated AD genes including
CR1, APOE and TOMM40
Recommended from our members
GenEpi: gene-based epistasis discovery using machine learning.
BackgroundGenome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be a key for discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD).ResultsIn this regard, this study presents GenEpi, a computational package to uncover epistasis associated with phenotypes by the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. The simulated data showed that GenEpi outperforms other widely-used methods on detecting the ground-truth epistasis. As real data is concerned, this study uses AD as an example to reveal the capability of GenEpi in finding disease-related variants and variant interactions that show both biological meanings and predictive power.ConclusionsThe results on simulation data and AD demonstrated that GenEpi has the ability to detect the epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to largely facilitate the studies of many complex diseases in the near future
Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases
Recent advances of information technology in biomedical sciences and other
applied areas have created numerous large diverse data sets with a high
dimensional feature space, which provide us a tremendous amount of information
and new opportunities for improving the quality of human life. Meanwhile, great
challenges are also created driven by the continuous arrival of new data that
requires researchers to convert these raw data into scientific knowledge in
order to benefit from it. Association studies of complex diseases using SNP
data have become more and more popular in biomedical research in recent years.
In this paper, we present a review of recent statistical advances and
challenges for analyzing correlated high dimensional SNP data in genomic
association studies for complex diseases. The review includes both general
feature reduction approaches for high dimensional correlated data and more
specific approaches for SNPs data, which include unsupervised haplotype
mapping, tag SNP selection, and supervised SNPs selection using statistical
testing/scoring, statistical modeling and machine learning methods with an
emphasis on how to identify interacting loci.Comment: Published in at http://dx.doi.org/10.1214/07-SS026 the Statistics
Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical
Statistics (http://www.imstat.org
- ā¦