A fast algorithm for detecting gene-gene interactions in genome-wide association studies
With the recent advent of high-throughput genotyping techniques, genetic data
for genome-wide association studies (GWAS) have become increasingly available,
which calls for the development of efficient and effective statistical
approaches. Although many such approaches have been developed and used to
identify single-nucleotide polymorphisms (SNPs) that are associated with
complex traits or diseases, few are able to detect gene-gene interactions among
different SNPs. Genetic interactions, also known as epistasis, have been
recognized to play a pivotal role in contributing to the genetic variation of
phenotypic traits. However, because of the extremely large number of SNP-SNP
combinations in GWAS, the model dimensionality can quickly become so
overwhelming that no prevailing variable selection method is capable of
handling this problem. In this paper, we present a statistical framework for
characterizing main genetic effects and epistatic interactions in GWAS.
Specifically, we first propose a two-stage sure independence screening (TS-SIS)
procedure and generate a pool of candidate SNPs and interactions, which serve
as predictors to explain and predict the phenotypes of a complex trait. We also
propose a rates adjusted thresholding estimation (RATE) approach to determine
the size of the reduced model selected by an independence screening.
Regularization regression methods, such as LASSO or SCAD, are then applied to
further identify important genetic effects. Simulation studies show that the
TS-SIS procedure is computationally efficient and has an outstanding finite
sample performance in selecting potential SNPs as well as gene-gene
interactions. We apply the proposed framework to analyze an
ultrahigh-dimensional GWAS data set from the Framingham Heart Study, and select
23 active SNPs and 24 active epistatic interactions for the body mass index
variation. This demonstrates the capability of our procedure to resolve the
complexity of genetic control.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS771 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
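The abstract above does not spell out the screening details, so the following is only a minimal sketch of a two-stage marginal screening, under stated assumptions: stage 1 ranks SNPs by absolute Pearson correlation with the phenotype, and stage 2 ranks pairwise products of the retained SNPs in the same way. The function names (`marginal_corr`, `two_stage_screen`) and the fixed cutoffs `d1` and `d2` are hypothetical. The paper's RATE approach for choosing the model size and the subsequent LASSO or SCAD step are not implemented here.

```python
import numpy as np

def marginal_corr(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get a score of 0
    return np.abs(Xc.T @ yc) / denom

def two_stage_screen(X, y, d1, d2):
    """Illustrative two-stage screening (an assumption, not the paper's exact TS-SIS).

    Stage 1 keeps the d1 SNPs with the largest marginal correlation.
    Stage 2 forms all pairwise products of those SNPs and keeps the d2
    products with the largest marginal correlation.
    """
    if d1 < 2:
        raise ValueError("d1 must be at least 2 to form interactions")

    # Stage 1: screen main effects.
    main_scores = marginal_corr(X, y)
    main_idx = np.argsort(main_scores)[::-1][:d1]

    # Stage 2: screen interactions among the retained SNPs only,
    # which reduces the number of candidate pairs from p(p-1)/2 to d1(d1-1)/2.
    pairs = [(int(j), int(k))
             for a, j in enumerate(main_idx)
             for k in main_idx[a + 1:]]
    Z = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    inter_scores = marginal_corr(Z, y)
    top = np.argsort(inter_scores)[::-1][:d2]

    main = sorted(int(j) for j in main_idx)
    inter = [tuple(sorted(pairs[t])) for t in top]  # ordered best first
    return main, inter
```

Here `X` would be an n-by-p genotype matrix (for example coded 0, 1, 2) and `y` a length-n phenotype vector. The returned candidate pool is what a penalized regression would then refine.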
Feature Screening via Distance Correlation Learning
This paper is concerned with screening features in ultrahigh dimensional data
analysis, which has become increasingly important in diverse scientific fields.
We develop a sure independence screening procedure based on the distance
correlation (DC-SIS, for short). The DC-SIS can be implemented as easily as the
sure independence screening procedure based on the Pearson correlation (SIS,
for short) proposed by Fan and Lv (2008). However, the DC-SIS can significantly
improve on the SIS. Fan and Lv (2008) established the sure screening property for
the SIS based on linear models, but the sure screening property is valid for
the DC-SIS under more general settings including linear models. Furthermore,
the implementation of the DC-SIS does not require model specification (e.g.,
linear model or generalized linear model) for responses or predictors. This is
a very appealing property in ultrahigh dimensional data analysis. Moreover, the
DC-SIS can be used directly to screen grouped predictor variables and for
multivariate response variables. We establish the sure screening property for
the DC-SIS, and conduct simulations to examine its finite sample performance.
Numerical comparison indicates that the DC-SIS performs much better than the
SIS in various models. We also illustrate the DC-SIS through a real data
example.
Comment: 32 pages, 5 tables and 1 figure. Wei Zhong is the corresponding
author
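As a rough illustration of the screening idea in this abstract, the sketch below computes the sample distance correlation from double-centered pairwise distance matrices and ranks predictors by it. The function names (`distance_correlation`, `dc_sis`) and the cutoff `d` are hypothetical choices for this example, not the authors' code, and the O(n^2) memory use makes it suitable only for moderate sample sizes.

```python
import numpy as np

def _centered_dist(v):
    """Double-centered Euclidean distance matrix of the rows of v."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]  # treat a vector as n observations of one variable
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation between x and y (each may be multivariate)."""
    A, B = _centered_dist(x), _centered_dist(y)
    dcov2 = (A * B).mean()                     # squared distance covariance
    dvar2 = (A * A).mean() * (B * B).mean()    # product of squared distance variances
    if dvar2 <= 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

def dc_sis(X, y, d):
    """Rank the columns of X by distance correlation with y and keep the top d."""
    scores = np.array([distance_correlation(X[:, j], y)
                       for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:d]  # indices ordered best first
    return keep, scores
```

Because the score is built from pairwise distances only, no linear or generalized linear model is specified, and passing a two-dimensional `y` or a block of columns as `x` covers the multivariate-response and grouped-predictor cases mentioned in the abstract.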