1 research outputs found
An Efficient Sufficient Dimension Reduction Method for Identifying Genetic Variants of Clinical Significance
Fast and cheaper next generation sequencing technologies will generate
unprecedentedly massive and highly-dimensional genomic and epigenomic variation
data. In the near future, a routine part of medical record will include the
sequenced genomes. A fundamental question is how to efficiently extract genomic
and epigenomic variants of clinical utility which will provide information for
optimal wellness and interference strategies. Traditional paradigm for
identifying variants of clinical validity is to test association of the
variants. However, significantly associated genetic variants may or may not be
usefulness for diagnosis and prognosis of diseases. Alternative to association
studies for finding genetic variants of predictive utility is to systematically
search variants that contain sufficient information for phenotype prediction.
To achieve this, we introduce concepts of sufficient dimension reduction and
coordinate hypothesis which project the original high dimensional data to very
low dimensional space while preserving all information on response phenotypes.
We then formulate clinically significant genetic variant discovery problem into
sparse SDR problem and develop algorithms that can select significant genetic
variants from up to or even ten millions of predictors with the aid of dividing
SDR for whole genome into a number of subSDR problems defined for genomic
regions. The sparse SDR is in turn formulated as sparse optimal scoring
problem, but with penalty which can remove row vectors from the basis matrix.
To speed up computation, we develop the modified alternating direction method
for multipliers to solve the sparse optimal scoring problem which can easily be
implemented in parallel. To illustrate its application, the proposed method is
applied to simulation data and the NHLBI's Exome Sequencing Project datase