1,585 research outputs found

Correcting for ascertainment bias in the inference of population structure

Author: Foll Matthieu
Guillot Gilles
Publication date: 02/08/2017

Background: The ascertainment process of molecular markers amounts to disregarding loci carrying alleles with low frequencies. This can result in strong biases in inferences under population genetics models if not properly taken into account by the inference algorithm. Attempting to model this censoring process in order to infer population structure (i.e., identifying clusters of individuals) brings up challenging numerical difficulties. Method: These difficulties are related to the presence of intractable normalizing constants in Metropolis-Hastings acceptance ratios. They can be overcome with a Markov chain Monte Carlo (MCMC) algorithm known as the single variable exchange algorithm (SVEA). Result: We show how this general solution can be implemented for a class of clustering models of broad interest in population genetics that includes the models underlying the computer programs STRUCTURE, GENELAND and GESTE. We also implement the proposed method for a simple example and show that it allows us to reduce the bias substantially. Availability: Further details and a computer program implementing the method are available from http://folk.uio.no/gillesg/AscB/

RERO DOC Digital Library

Population Structure and Cryptic Relatedness in Genetic Association Studies

Author: Astle William
Balding David J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009

We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple "island" model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree. Comment: Published at http://dx.doi.org/10.1214/09-STS307 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

OpenGrey Repository

University of Melbourne Institutional Repository

The date of interbreeding between Neandertals and modern humans

Author: Li Heng
Patterson Nick
Pääbo Svante
Reich David
Sankararaman Sriram
Publication date: 01/01/2012

Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000-86,000 years before the present (BP), and most likely 47,000-65,000 years ago. This supports the recent interbreeding hypothesis, and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa

arXiv.org e-Print Archive

CiteSeerX

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

MPG.PuRe

FigShare

On the informativeness of dominant and co-dominant genetic markers for Bayesian supervised clustering

Author: Carpentier-Skandalis Alexandra
Guillot Gilles
Publication venue: 'Bentham Science Publishers Ltd.'
Publication date: 01/01/2011

We study the accuracy of a Bayesian supervised method used to cluster individuals into genetically homogeneous groups on the basis of dominant or codominant molecular markers. We provide a formula relating an error criterion to the number of loci used and the number of clusters. This formula is exact and holds for an arbitrary number of clusters and markers. Our work suggests that dominant-marker studies can achieve an accuracy similar to that of codominant-marker studies if the number of markers used in the former is about 1.7 times larger than in the latter.

arXiv.org e-Print Archive

Online Research Database In Technology

Inference of population splits and mixtures from genome-wide allele frequency data

Author: Joseph K. Pickrell
Jonathan K. Pritchard
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com. Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15 figures. This is an updated version of the preprint available at http://precedings.nature.com/documents/6956/version/

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

FigShare

Correcting the Site Frequency Spectrum for Divergence-Based Ascertainment

Author: Kern Andrew D
Publication venue: Dartmouth Digital Commons
Publication date: 16/04/2009

Comparative genomics based on sequenced reference genomes is essential to hypothesis generation and testing within population genetics. However, selection of candidate regions for further study on the basis of elevated or depressed divergence between species leads to a divergence-based ascertainment bias in the site frequency spectrum within selected candidate loci. Here, a method to correct this problem is developed that obtains maximum-likelihood estimates of the unascertained allele frequency distribution using numerical optimization. I show how divergence-based ascertainment may mimic the effects of natural selection and offer correction formulae for properly estimating the strength of selection in candidate regions in a maximum-likelihood setting.

Dartmouth Digital Commons (Dartmouth College)

Correcting the Site Frequency Spectrum for Divergence-Based Ascertainment

Author: Andrew D. Kern
Publication venue: Public Library of Science
Publication date: 16/04/2009

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Dartmouth Digital Commons (Dartmouth College)