Shrinkage methods for variable selection and prediction with applications to genetic data
Identifying genotypes using genetic material was at first a painstaking
laboratory task. In the decades since the first gene was sequenced,
techniques have progressed through milestones requiring massive international
collaboration. Today’s genotype sequencing facilities use
high-throughput technology to sequence entire genomes within days.
Despite these technological improvements, and the resultant volume
of genetic data, the identification of meaningful genotype-phenotype
associations has not been as straightforward as was anticipated in the
pre-genome era. The genetic architecture of many common diseases
is complex, and heritability often cannot be explained when simple
statistical tests are used.
This thesis addresses a clinically important problem in statistical genetics:
predicting disease risk based on genotype information.
First, we review progress and current limitations in genetic risk prediction.
We then introduce penalised regression. This thesis focusses
on ridge regression, a penalised regression approach that has shown
promise in risk prediction for high-dimensional data. The choice of the
ridge parameter, which controls the amount of penalisation in ridge
regression, has not been addressed in the literature with the specific
aim of analysing genetic data. We present a method for automatically
choosing the ridge parameter based on genome-wide SNP data. Software
implementing the method is available to the community. We evaluate
the method using simulation studies and a real data example.
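As a rough illustration of how the ridge parameter controls penalisation, the sketch below fits ridge regression in closed form on simulated genotype-like data with more predictors than observations. It is a minimal sketch, not the thesis's method: the function name `ridge_fit`, the toy data, and the hand-picked grid of `lam` values are all assumptions made for illustration, and no automatic choice of the parameter is attempted.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


# Toy data (assumed, not from the thesis): 50 individuals, 200 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n_samples, n_snps = 50, 200
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
X -= X.mean(axis=0)  # centre each SNP so no intercept is needed

true_beta = np.zeros(n_snps)
true_beta[:5] = 0.5  # five hypothetical causal variants
y = X @ true_beta + rng.normal(size=n_samples)
y -= y.mean()

# A larger ridge parameter shrinks the coefficient vector towards zero.
for lam in (0.1, 1.0, 10.0, 100.0):
    beta = ridge_fit(X, y, lam)
    print(f"lam={lam:>6}: ||beta|| = {np.linalg.norm(beta):.4f}")
```

The penalty makes the linear system solvable even though there are more SNPs than individuals, which is why ridge regression remains usable for high-dimensional data.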
A ridge regression model does not indicate the strength of association
of individual variants with the outcome, a property that is often of
interest to geneticists. To address this, we extend a previously proposed
test of significance in ridge regression models to high-dimensional data and
to the logistic model, which commonly occurs in the biomedical context.
The test is evaluated by comparison to a permutation test, which we
view as a benchmark, and is integrated into the software package
mentioned above.
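The idea of a permutation benchmark can be sketched as follows. This is a generic permutation test on linear ridge coefficients, written under assumptions: the exact permutation scheme used in the thesis is not described here, and the names `ridge_fit` and `permutation_pvalues`, the toy data, and the fixed `lam` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


def permutation_pvalues(X, y, lam, n_perm=200, seed=0):
    """Per-coefficient p-values from shuffling the outcome.

    Shuffling y breaks any genotype-phenotype association, so the refitted
    coefficients approximate the distribution under no association.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(ridge_fit(X, y, lam))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        permuted = np.abs(ridge_fit(X, rng.permutation(y), lam))
        exceed += permuted >= observed
    # Add one to numerator and denominator so no p-value is exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Toy data (assumed): 100 individuals, 20 SNPs, only the first SNP has an effect.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(100, 20)).astype(float)
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] + rng.normal(size=100)
y -= y.mean()

pvals = permutation_pvalues(X, y, lam=1.0)
print("p-value for the causal SNP:", pvals[0])
print("smallest p-value among the other SNPs:", pvals[1:].min())
```

Each permutation requires a full refit, so this approach is expensive for genome-wide data, which is one reason an analytical test is attractive.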
ABC-SysBio—approximate Bayesian computation in Python with GPU support
Motivation: The growing field of systems biology has driven demand for flexible tools to model and simulate biological systems. Two established problems in the modeling of biological processes are model selection and the estimation of associated parameters. A number of statistical approaches, both frequentist and Bayesian, have been proposed to answer these questions.
Significance testing in ridge regression for genetic data.