
235 research outputs found

Population Structure and Cryptic Relatedness in Genetic Association Studies

Author: Astle William
Balding David J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009
Field of study

We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple "island" model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-STS307
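Genomic control, one of the solutions listed above, can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes 1-df chi-square association statistics, estimates the inflation factor lambda as the observed median statistic divided by its null median, and (as an assumed, commonly used convention) never lets lambda fall below 1. The function names and the toy statistics are hypothetical.

```python
import math
from statistics import NormalDist, median

# Null median of a 1-df chi-square statistic: the squared 75th percentile
# of the standard normal distribution (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median statistic over its null expectation."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide every statistic by lambda, floored at 1 (assumed convention)."""
    lam = max(genomic_control_lambda(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats], lam


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


# Toy statistics (not real data) whose median exceeds the null median.
stats = [0.2, 0.45, 0.6, 0.9, 1.4, 3.5, 12.0]
adjusted, lam = genomic_control_adjust(stats)
print(round(lam, 3), [round(chi2_1df_pvalue(s), 4) for s in adjusted])
```

Genomic control rescales all markers by a single factor, which is why the abstract contrasts it with methods, such as linear mixed models, that use the pairwise kinships directly.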

arXiv.org e-Print Archive

CiteSeerX

Crossref

OpenGrey Repository

University of Melbourne Institutional Repository

Improving the Efficiency of Genomic Selection

Author: Balding David J.
Mackay Ian
Scutari Marco
Publication venue
Publication date: 01/01/2013
Field of study

We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performances using three real-world data sets from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well, and that they were the best-performing methods in our study. Comment: 17 pages, 5 figures
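The kinship choices described above (centring and/or scaling marker genotypes before forming kinships) and their use in GBLUP can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the authors' implementation: genotypes are 0/1/2 allele counts, "centring" subtracts twice the sample allele frequency, "scaling" divides by sqrt(2p(1-p)), and the kinship is the average cross-product over markers. The LD-adjusted kinship from the paper is not reproduced here. In `gblup_predict`, `lam` stands in for the variance ratio sigma_e^2 / sigma_g^2, and the mean is estimated by the plain training average, which simplifies the usual generalised least-squares estimate. All function names and toy data are hypothetical.

```python
import math


def kinship(genotypes, centre=True, scale=True):
    """Marker-based kinship from an n x m matrix of 0/1/2 allele counts.

    centre: subtract 2p (twice the sample allele frequency) per marker.
    scale:  divide by sqrt(2p(1-p)); monomorphic markers are zeroed out.
    """
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = []
    for row in genotypes:
        zrow = []
        for j, x in enumerate(row):
            p = freqs[j]
            v = x - 2.0 * p if centre else float(x)
            if scale:
                var = 2.0 * p * (1.0 - p)
                v = v / math.sqrt(var) if var > 0 else 0.0
            zrow.append(v)
        z.append(zrow)
    # Average cross-product over markers: K = Z Z^T / m.
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / m for b in range(n)]
            for a in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def gblup_predict(k, y_train, train_idx, test_idx, lam):
    """mu + K[test, train] (K[train, train] + lam I)^-1 (y - mu)."""
    mu = sum(y_train) / len(y_train)
    k_tt = [[k[i][j] + (lam if i == j else 0.0) for j in train_idx] for i in train_idx]
    alpha = solve(k_tt, [y - mu for y in y_train])
    return [mu + sum(k[t][i] * a for i, a in zip(train_idx, alpha)) for t in test_idx]


# Toy genotypes (5 individuals x 4 markers) and phenotypes; not data from the paper.
G = [[0, 1, 2, 1], [1, 1, 2, 0], [2, 0, 1, 1], [0, 2, 0, 1], [1, 1, 1, 2]]
y = [3.1, 2.7, 4.0, 2.2]
for centre, scale in [(True, True), (True, False), (False, False)]:
    K = kinship(G, centre=centre, scale=scale)
    print(centre, scale, gblup_predict(K, y, [0, 1, 2, 3], [4], lam=1.0))
```

Because centring uses the sample allele frequencies, each row of a centred kinship matrix sums to zero; adding `lam` to the diagonal keeps the training system invertible.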

arXiv.org e-Print Archive

Oxford University Research Archive

University of Melbourne Institutional Repository

Encoding of low-quality DNA profiles as genotype probability matrices for improved profile comparisons, relatedness evaluation and database searches

Author: Balding David J.
Ryan K.
Williams D. Gareth
Publication venue: 'Elsevier BV'
Publication date: 14/09/2016
Field of study

Many DNA profiles recovered from crime scene samples are of a quality that does not allow them to be searched against or entered into databases. We propose a method for the comparison of profiles arising from two DNA samples, one or both of which can have multiple donors and be affected by low DNA template or degraded DNA. We compute likelihood ratios (LRs) to evaluate the hypothesis that the two samples have a common DNA donor, and hypotheses specifying the relatedness of two donors. Our method uses a probability distribution for the genotype of the donor of interest in each sample. This distribution can be obtained from a statistical model, or we can exploit the ability of trained human experts to assess genotype probabilities, thus extracting much information that would be discarded by standard interpretation rules. Our method is compatible with established methods in simple settings, but is more widely applicable and can make better use of information than many current methods for the analysis of mixed-source, low-template DNA profiles. It can accommodate uncertainty arising from relatedness instead of, or in addition to, uncertainty arising from noisy genotyping. We describe a computer program, GPMDNA, available under an open source license, to calculate LRs using the method presented in this paper. Comment: 28 pages. Accepted for publication 2-Sep-2016 - Forensic Science International: Genetics
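To show how a genotype probability distribution can feed into a common-donor LR, here is a minimal single-locus sketch. It relies on a standard identity rather than on GPMDNA itself: if each sample's distribution is a posterior computed with the population genotype frequencies f(g) as prior, then the LR for "same donor" against "two unrelated donors" is the sum over genotypes g of P1(g) P2(g) / f(g), and loci are multiplied assuming independence. Hardy-Weinberg equilibrium, the absence of relatedness and subpopulation corrections, and all names and numbers below are assumptions for illustration.

```python
def hwe_genotype_freqs(allele_freqs):
    """Genotype frequencies under Hardy-Weinberg equilibrium; keys are sorted allele pairs."""
    alleles = sorted(allele_freqs)
    freqs = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            pa, pb = allele_freqs[a], allele_freqs[b]
            freqs[(a, b)] = pa * pa if a == b else 2.0 * pa * pb
    return freqs


def common_donor_lr(gp1, gp2, pop_freqs):
    """Single-locus LR, same donor vs two unrelated donors.

    gp1, gp2: dicts mapping genotype -> probability for the donor of interest
    in each sample, assumed to be posteriors computed with prior pop_freqs.
    """
    return sum(gp1.get(g, 0.0) * gp2.get(g, 0.0) / f
               for g, f in pop_freqs.items() if f > 0)


def profile_lr(loci):
    """Multiply single-locus LRs, assuming independent loci."""
    lr = 1.0
    for gp1, gp2, pop_freqs in loci:
        lr *= common_donor_lr(gp1, gp2, pop_freqs)
    return lr


# Hypothetical allele frequencies and genotype probabilities at one locus.
freqs = hwe_genotype_freqs({"12": 0.1, "13": 0.2, "14": 0.7})
crime = {("12", "13"): 0.8, ("13", "13"): 0.2}   # uncertain genotype (toy values)
reference = {("12", "13"): 1.0}                  # unambiguous reference genotype
print(profile_lr([(crime, reference, freqs)]))
```

Two sanity checks follow from the formula: two certain, matching genotypes give the familiar 1/f(g), and two uninformative distributions (equal to the prior) give an LR of 1.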

arXiv.org e-Print Archive

UCL Discovery

Modelling cost-effective air pollution abatement: a multi-period linear programming approach

Author: Adams David
Balding Jeremy
Godden David P.
Hohnen Laura
Publication venue
Publication date
Field of study

Improvements in air quality for some criteria pollutants in Sydney, Wollongong and the Lower Hunter have been achieved, whilst further improvements are required for others. Environmental Economics and Policy

Research Papers in Economics

Multiple Quantitative Trait Analysis Using Bayesian Networks

Author: Balding David J.
Howell Phil
Mackay Ian
Scutari Marco
Publication venue
Publication date: 01/01/2014
Field of study

Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this paper we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP), and that they are competitive with single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness. Comment: 28 pages, 1 figure, code at http://www.bnlearn.com/research/genetics1
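A Gaussian Bayesian network models several traits jointly by regressing each node on its parents in a directed acyclic graph. The sketch below illustrates only that parameter-fitting and prediction step in pure Python; it is not the authors' bnlearn code. The three-node graph (a marker `M` affecting traits `T1` and `T2`, with `T1` also affecting `T2`) is hypothetical and assumed rather than learned from data, and the toy values are made up. Prediction propagates conditional means from observed ancestors to descendants in topological order, which is exact for the conditional mean in a linear Gaussian network.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def ols(y, xs):
    """Least-squares coefficients [intercept, b1, ...] of y on the columns in xs."""
    design = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[r] * row[c] for row in design) for c in range(p)] for r in range(p)]
    xty = [sum(row[r] * yi for row, yi in zip(design, y)) for r in range(p)]
    return solve(xtx, xty)


def fit_gbn(data, parents):
    """Fit each node as a linear regression on its parents (DAG assumed known)."""
    return {node: ols(data[node], [data[p] for p in pa]) for node, pa in parents.items()}


def predict(coefs, parents, order, evidence):
    """Propagate conditional means downstream in topological order."""
    values = dict(evidence)
    for node in order:
        if node not in values:
            b = coefs[node]
            values[node] = b[0] + sum(w * values[p] for w, p in zip(b[1:], parents[node]))
    return values


# Hypothetical DAG: M -> T1, M -> T2, T1 -> T2.
parents = {"M": [], "T1": ["M"], "T2": ["M", "T1"]}
order = ["M", "T1", "T2"]
m = [0, 1, 2, 0, 1, 2, 1, 0]
noise = [0.1, -0.2, 0.05, -0.1, 0.2, -0.05, 0.0, 0.0]
t1 = [1 + 2 * mi + e for mi, e in zip(m, noise)]
t2 = [0.5 + mi + 3 * ti for mi, ti in zip(m, t1)]   # toy values, no residual noise
coefs = fit_gbn({"M": m, "T1": t1, "T2": t2}, parents)
print(predict(coefs, parents, order, {"M": 2}))   # T1 about 5.0, T2 about 17.5
```

Fitting node by node is what makes the network interpretable: each coefficient describes the direct effect of one parent on one trait, given the other parents.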

arXiv.org e-Print Archive

PubMed Central

Oxford University Research Archive

Y-profile evidence: Close paternal relatives and mixtures

Author: Andersen Mikkel Meyer
Balding David J.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019
Field of study

VBN

Assessing the forensic value of DNA evidence from Y chromosomes and mitogenomes

Author: Andersen Mikkel Meyer
Balding David J.
Publication venue: arXiv.org
Publication date: 01/01/2021
Field of study

VBN

Assessing the forensic value of DNA evidence from Y chromosomes and mitogenomes

Author: Andersen Mikkel M
Balding David J
Publication venue
Publication date: 01/01/2021
Field of study

Y-chromosomal and mitochondrial DNA profiles have been used as evidence in courts for decades, yet the problem of evaluating the weight of evidence has not been adequately resolved. Both are lineage markers (inherited from just one parent), which presents different interpretation challenges compared with standard autosomal DNA profiles (inherited from both parents), for which recombination increases profile diversity and weakens the effects of relatedness. We review approaches to the evaluation of lineage marker profiles for forensic identification, focussing on the key roles of profile mutation rate and relatedness. Higher mutation rates imply fewer individuals matching the profile of an alleged contributor, but they will be more closely related. This makes it challenging to evaluate the possibility that one of these matching individuals could be the true source, because relatedness may make them more plausible alternative contributors than less-related individuals, and they may not be well mixed in the population. These issues reduce the usefulness of profile databases drawn from a broad population: the larger the population, the lower the profile relative frequency because of lower relatedness with the alleged contributor. Many evaluation methods do not adequately take account of relatedness, but its effects have become more pronounced with the latest generation of high-mutation-rate Y profiles.
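The claim that higher mutation rates leave fewer, but more closely related, matching individuals can be illustrated with a deliberately simplified model. The sketch assumes a per-meiosis profile mutation rate `mu` and treats any mutation as breaking the match (ignoring back and parallel mutations), so a relative separated by d meioses matches with probability (1 - mu)^d. The relative counts per meiotic distance and both mutation rates are hypothetical, not estimates from the review.

```python
def match_probability(meioses, mu):
    """P(profile unchanged across `meioses` meioses), per-meiosis rate mu.

    Simplifying assumption: back and parallel mutations are ignored.
    """
    return (1.0 - mu) ** meioses


def expected_matches(relatives_by_distance, mu):
    """Expected number of paternal-line relatives whose profile still matches."""
    return sum(n * match_probability(d, mu) for d, n in relatives_by_distance.items())


def mean_match_distance(relatives_by_distance, mu):
    """Average meiotic distance of matching relatives, weighted by match probability."""
    total = expected_matches(relatives_by_distance, mu)
    return sum(d * n * match_probability(d, mu)
               for d, n in relatives_by_distance.items()) / total


# Hypothetical counts of paternal-line relatives at each meiotic distance.
relatives = {2: 1, 4: 3, 6: 9, 8: 27, 10: 81, 20: 2000}
for mu in (0.01, 0.15):   # hypothetical low and high profile mutation rates
    print(mu, round(expected_matches(relatives, mu), 1),
          round(mean_match_distance(relatives, mu), 1))
```

Raising `mu` shrinks the expected number of matches and shifts the remaining matches towards close relatives, which is the pattern that makes such relatives plausible alternative contributors.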

arXiv.org e-Print Archive

Directory of Open Access Journals

Copenhagen University Research Information System

UCL Discovery

VBN

University of Melbourne Institutional Repository