Population Structure and Cryptic Relatedness in Genetic Association Studies
We review the problem of confounding in genetic association studies, which
arises principally because of population structure and cryptic relatedness.
Many treatments of the problem consider only a simple "island" model of
population structure. We take a broader approach, which views population
structure and cryptic relatedness as different aspects of a single confounder:
the unobserved pedigree defining the (often distant) relationships among the
study subjects. Kinship is therefore a central concept, and we review methods
of defining and estimating kinship coefficients, both pedigree-based and
marker-based. In this unified framework we review solutions to the problem of
population structure, including family-based study designs, genomic control,
structured association, regression control, principal components adjustment and
linear mixed models. The last solution makes the most explicit use of the
kinships among the study subjects, and has an established role in the analysis
of animal and plant breeding studies. Recent computational developments mean
that analyses of human genetic association data are beginning to benefit from
its powerful tests for association, which protect against population structure
and cryptic kinship, as well as intermediate levels of confounding by the
pedigree.
Comment: Published at http://dx.doi.org/10.1214/09-STS307 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
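Genomic control, one of the solutions listed in the abstract above, can be illustrated with a minimal sketch. It assumes the common formulation in which 1-df chi-square association statistics are divided by an inflation factor estimated from their median. The function names and the input statistics are hypothetical and are not taken from the paper.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Estimate the inflation factor lambda from observed 1-df statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each statistic by lambda; lambda is floored at 1 so that
    statistics are only ever deflated, never inflated."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats], lam


# Hypothetical association statistics for five markers (illustration only).
stats = [0.2, 0.5, 0.91, 1.3, 4.0]
adjusted, lam = genomic_control(stats)
```

After adjustment the median statistic equals the null median, which is the sense in which the inflation attributed to confounding has been removed.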
Improving the Efficiency of Genomic Selection
We investigate two approaches to increase the efficiency of phenotypic
prediction from genome-wide markers, which is a key step for genomic selection
(GS) in plant and animal breeding. The first approach is feature selection
based on Markov blankets, which provide a theoretically sound framework for
identifying non-informative markers. Fitting GS models using only the
informative markers results in simpler models, which may allow cost savings
from reduced genotyping. We show that this is accompanied by no loss, and
possibly a small gain, in predictive power for four GS models: partial least
squares (PLS), ridge regression, LASSO and elastic net. The second approach is
the choice of kinship coefficients for genomic best linear unbiased prediction
(GBLUP). We compare kinships based on different combinations of centring and
scaling of marker genotypes, and a newly proposed kinship measure that adjusts
for linkage disequilibrium (LD).
We illustrate the use of both approaches and examine their performances using
three real-world data sets from plant and animal genetics. We find that elastic
net with feature selection and GBLUP using LD-adjusted kinships performed
similarly well, and were the best-performing methods in our study.
Comment: 17 pages, 5 figures
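To make "different combinations of centring and scaling of marker genotypes" concrete, here is a minimal sketch of two widely used marker-based kinship estimators. It assumes genotypes coded as 0/1/2 allele counts and allele frequencies estimated from the sample. The function names and the toy data are hypothetical, the exact definitions compared in the paper may differ, and the LD-adjusted measure is not attempted.

```python
def allele_frequencies(genotypes):
    """Sample allele frequency of each marker (genotypes coded 0/1/2)."""
    n, m = len(genotypes), len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]


def kinship(genotypes, scale_per_marker=False):
    """Marker-based kinship matrix from centred genotypes.

    scale_per_marker=False: divide the summed cross-products by the total
        marker variance, sum_j 2 p_j (1 - p_j).
    scale_per_marker=True: standardise each marker by its own variance
        2 p_j (1 - p_j), then average over markers.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = allele_frequencies(genotypes)
    var = [2 * pj * (1 - pj) for pj in p]
    if any(v == 0 for v in var):
        raise ValueError("monomorphic markers must be removed first")
    # Centre each marker at its expected allele count, 2 p_j.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):
            if scale_per_marker:
                val = sum(z[i][j] * z[k][j] / var[j] for j in range(m)) / m
            else:
                val = sum(z[i][j] * z[k][j] for j in range(m)) / sum(var)
            K[i][k] = K[k][i] = val
    return K


# Toy data: 4 individuals x 3 markers (illustration only).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
K_centred = kinship(genotypes)
K_scaled = kinship(genotypes, scale_per_marker=True)
```

Per-marker scaling gives more weight to markers with rare alleles, whereas overall scaling weights every centred allele count equally. Either matrix could then be supplied as the kinship in a GBLUP model.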
Encoding of low-quality DNA profiles as genotype probability matrices for improved profile comparisons, relatedness evaluation and database searches
Many DNA profiles recovered from crime scene samples are of a quality that
does not allow them to be searched against, nor entered into, databases. We
propose a method for the comparison of profiles arising from two DNA samples,
one or both of which can have multiple donors and be affected by low DNA
template or degraded DNA. We compute likelihood ratios to evaluate the
hypothesis that the two samples have a common DNA donor, and hypotheses
specifying the relatedness of two donors. Our method uses a probability
distribution for the genotype of the donor of interest in each sample. This
distribution can be obtained from a statistical model, or we can exploit the
ability of trained human experts to assess genotype probabilities, thus
extracting much information that would be discarded by standard interpretation
rules. Our method is compatible with established methods in simple settings,
but is more widely applicable and can make better use of information than many
current methods for the analysis of mixed-source, low-template DNA profiles. It
can accommodate uncertainty arising from relatedness instead of or in addition
to uncertainty arising from noisy genotyping. We describe a computer program
GPMDNA, available under an open source license, to calculate LRs using the
method presented in this paper.
Comment: 28 pages. Accepted for publication 2-Sep-2016 - Forensic Science
International: Genetics
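The common-donor comparison described above can be sketched for a single locus. This is an illustration under stated assumptions, not the GPMDNA implementation: each sample is summarised by a probability distribution over the genotype of its donor of interest, the alternative hypothesis is two unrelated donors, and the prior genotype probabilities come from population allele frequencies. Under those assumptions the likelihood ratio is the sum over genotypes g of q_a(g) q_b(g) / pi(g). The function name and all numbers are hypothetical.

```python
def common_donor_lr(probs_a, probs_b, prior):
    """Single-locus likelihood ratio for 'common donor' versus
    'two unrelated donors'.

    probs_a, probs_b: dicts mapping genotype -> probability that the donor
        of interest in each sample has that genotype.
    prior: dict mapping genotype -> population genotype probability.
    """
    lr = 0.0
    for g, pi_g in prior.items():
        if pi_g > 0:
            lr += probs_a.get(g, 0.0) * probs_b.get(g, 0.0) / pi_g
    return lr


# Hypothetical locus with alleles A and B at frequency 0.5 each,
# Hardy-Weinberg genotype proportions.
prior = {"AA": 0.25, "AB": 0.5, "BB": 0.25}

clear_ab = {"AB": 1.0}               # unambiguous profile
noisy_ab = {"AB": 0.8, "AA": 0.2}    # low-template profile, some doubt

lr_clear = common_donor_lr(clear_ab, clear_ab, prior)
lr_noisy = common_donor_lr(clear_ab, noisy_ab, prior)
lr_uninformative = common_donor_lr(clear_ab, prior, prior)
```

When both profiles are unambiguous the ratio reduces to the reciprocal of the genotype probability, matching the standard single-source calculation. A sample whose distribution equals the prior carries no information and gives a ratio of 1. Under the further assumption of independent loci, per-locus ratios would be multiplied.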
Modelling cost-effective air pollution abatement: a multi-period linear programming approach
Improvements in air quality for some criteria pollutants in Sydney, Wollongong and the Lower Hunter have been achieved, whilst further improvements are required for others.
Environmental Economics and Policy
Multiple Quantitative Trait Analysis Using Bayesian Networks
Models for genome-wide prediction and association studies usually target a
single phenotypic trait. However, in animal and plant genetics it is common to
record information on multiple phenotypes for each individual that will be
genotyped. Modeling traits individually disregards the fact that they are most
likely associated due to pleiotropy and shared biological basis, thus providing
only a partial, confounded view of genetic effects and phenotypic interactions.
In this paper we use data from a Multiparent Advanced Generation Inter-Cross
(MAGIC) winter wheat population to explore Bayesian networks as a convenient
and interpretable framework for the simultaneous modeling of multiple
quantitative traits. We show that they are equivalent to multivariate genetic
best linear unbiased prediction (GBLUP), and that they are competitive with
single-trait elastic net and single-trait GBLUP in predictive performance.
Finally, we discuss their relationship with other additive-effects models and
their advantages in inference and interpretation. MAGIC populations provide an
ideal setting for this kind of investigation because the very low population
structure and large sample size result in predictive models with good power and
limited confounding due to relatedness.
Comment: 28 pages, 1 figure, code at
http://www.bnlearn.com/research/genetics1
Assessing the forensic value of DNA evidence from Y chromosomes and mitogenomes
Y-chromosomal and mitochondrial DNA profiles have been used as evidence in
courts for decades, yet the problem of evaluating the weight of evidence has
not been adequately resolved. Both are lineage markers (inherited from just one
parent), which presents different interpretation challenges compared with
standard autosomal DNA profiles (inherited from both parents), for which
recombination increases profile diversity and weakens the effects of
relatedness. We review approaches to the evaluation of lineage marker profiles
for forensic identification, focussing on the key roles of profile mutation
rate and relatedness. Higher mutation rates imply fewer individuals matching
the profile of an alleged contributor, but they will be more closely related.
This makes it challenging to evaluate the possibility that one of these
matching individuals could be the true source, because relatedness may make
them more plausible alternative contributors than less-related individuals, and
they may not be well mixed in the population. These issues reduce the
usefulness of profile databases drawn from a broad population: the larger the
population, the lower the profile relative frequency because of lower
relatedness with the alleged contributor. Many evaluation methods do not
adequately take account of relatedness, but its effects have become more
pronounced with the latest generation of high-mutation-rate Y profiles
- …