664 research outputs found
Reconstructing pedigrees: some identifiability questions for a recombination-mutation model
Pedigrees are directed acyclic graphs that represent ancestral relationships
between individuals in a population. Based on a schematic recombination
process, we describe two simple Markov models for sequences evolving on
pedigrees - Model R (recombinations without mutations) and Model RM
(recombinations with mutations). For these models, we ask an identifiability
question: is it possible to construct a pedigree from the joint probability
distribution of extant sequences? We present partial identifiability results
for general pedigrees: we show that when the crossover probabilities are
sufficiently small, certain spanning subgraph sequences can be counted from the
joint distribution of extant sequences. We demonstrate how pedigrees that
earlier seemed difficult to distinguish are distinguished by counting their
spanning subgraph sequences. Comment: 40 pages, 9 figures
The era of the ARG: an empiricist's guide to ancestral recombination graphs
In the presence of recombination, the evolutionary relationships between a
set of sampled genomes cannot be described by a single genealogical tree.
Instead, the genomes are related by a complex, interwoven collection of
genealogies formalized in a structure called an ancestral recombination graph
(ARG). An ARG extensively encodes the ancestry of the genome(s) and thus is
replete with valuable information for addressing diverse questions in
evolutionary biology. Despite its potential utility, technological and
methodological limitations, along with a lack of approachable literature, have
severely restricted awareness and application of ARGs in empirical evolution
research. Excitingly, recent progress in ARG reconstruction and simulation has
made ARG-based approaches feasible for many questions and systems. In this
review, we provide an accessible introduction to and exploration of ARGs, survey
recent methodological breakthroughs, and describe the potential for ARGs to
further existing goals and open avenues of inquiry that were previously
inaccessible in evolutionary genomics. Through this discussion, we aim to more
widely disseminate the promise of ARGs in evolutionary genomics and encourage
the broader development and adoption of ARG-based inference. Comment: 34 pages, 3 figures, 3 tables
Graphics for relatedness research
Studies of relatedness have been crucial in molecular ecology over the last decades. Good evidence of this is the fact that studies of population structure, evolution of social behaviours, genetic diversity and quantitative genetics all involve relatedness research. The main aim of this article is to review the most
common graphical methods used in allele sharing studies for detecting and identifying family relationships. Both IBS and IBD based allele sharing studies are considered. Furthermore, we propose two additional graphical methods from the field of compositional data analysis: the ternary diagram and scatterplots of isometric log-ratios of IBS and IBD probabilities. We illustrate all graphical tools with genetic data from the HGDP-CEPH diversity panel, using mainly 377 microsatellites genotyped for 25 individuals from the Maya population of this panel. We enhance all graphics with convex hulls obtained by simulation and use these to confirm the documented relationships. The proposed compositional graphics are shown to be useful in relatedness research, as they also single out the most prominent related pairs. The ternary diagram is advocated for its ability to display all three allele sharing probabilities simultaneously. The log-ratio plots are advocated as an attempt to overcome the problems with the Euclidean distance interpretation in the
classical graphics. Peer reviewed. Postprint (published version)
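The allele sharing quantities mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy genotypes, the function names `ibs_proportions` and `ilr`, and the particular orthonormal basis used for the isometric log-ratio coordinates are all assumptions made for illustration. The sketch counts, marker by marker, whether two individuals share 0, 1 or 2 alleles identical by state, then maps the resulting three-part composition to two log-ratio coordinates.

```python
import math


def ibs_proportions(geno_a, geno_b):
    """Return the fractions of markers sharing 0, 1 and 2 alleles IBS.

    Each genotype is a list of (allele, allele) tuples, one per marker.
    """
    counts = [0, 0, 0]
    for (a1, a2), (b1, b2) in zip(geno_a, geno_b):
        remaining = [b1, b2]
        shared = 0
        # Match each allele of A against an unused allele of B.
        for allele in (a1, a2):
            if allele in remaining:
                remaining.remove(allele)
                shared += 1
        counts[shared] += 1
    total = sum(counts)
    return tuple(c / total for c in counts)


def ilr(p0, p1, p2):
    """Isometric log-ratio coordinates of a 3-part composition.

    Uses one possible orthonormal basis (an assumption); all parts
    must be strictly positive, so zero proportions need prior handling.
    """
    z1 = math.log(p0 / p1) / math.sqrt(2)
    z2 = math.log(p0 * p1 / p2 ** 2) / math.sqrt(6)
    return z1, z2


# Hypothetical genotypes at four markers for two individuals.
geno_a = [(1, 2), (3, 3), (1, 4), (2, 2)]
geno_b = [(1, 2), (3, 5), (6, 7), (2, 2)]

p = ibs_proportions(geno_a, geno_b)
z = ilr(*p)
print(p, z)
```

The triple `p` is what a ternary diagram would display as a single point per pair, while `z` gives the coordinates for a log-ratio scatterplot. With real data, pairs with a zero sharing proportion would need some zero-replacement step before taking logarithms.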
Single nucleotide polymorphism-based dispersal estimates using noninvasive sampling
Quantifying dispersal within wild populations is an important but challenging task. Here we present a method to estimate contemporary, individual-based dispersal distance from noninvasively collected samples using a specialized panel of 96 SNPs (single nucleotide polymorphisms). One main issue in conducting dispersal studies is the requirement for a high sampling resolution at a geographic scale appropriate for capturing the majority of dispersal events. In this study, fecal samples of brown bear (Ursus arctos) were collected by volunteer citizens, resulting in a high sampling resolution spanning over 45,000 km² in Gävleborg and Dalarna counties in Sweden. SNP genotypes were obtained for unique individuals sampled (n=433) and subsequently used to reconstruct pedigrees. A Mantel test for isolation by distance suggests that the sampling scale was appropriate for females but not for males, which are known to disperse long distances. Euclidean distance was estimated between mother and offspring pairs identified through the reconstructed pedigrees. The mean dispersal distance was 12.9 km (SE 3.2) and 33.8 km (SE 6.8) for females and males, respectively. These results were significantly different (Wilcoxon's rank-sum test: P-value=0.02) and are in agreement with the previously identified pattern of male-biased dispersal. Our results illustrate the potential of using a combination of noninvasively collected samples at high resolution and specialized SNPs for pedigree-based dispersal models.
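The distance summary described in this abstract (Euclidean distance between mother and offspring locations, then a mean and standard error per sex) can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the coordinates are invented, they are assumed to be in kilometres in a projected coordinate system, and the `pairs` structure and `dispersal_summary` function are hypothetical names.

```python
import math
import statistics

# Hypothetical mother-offspring pairs; (x, y) positions in km (assumed projected).
pairs = [
    {"sex": "F", "mother": (0.0, 0.0), "offspring": (3.0, 4.0)},
    {"sex": "F", "mother": (10.0, 10.0), "offspring": (16.0, 18.0)},
    {"sex": "M", "mother": (0.0, 0.0), "offspring": (30.0, 40.0)},
    {"sex": "M", "mother": (5.0, 5.0), "offspring": (17.0, 21.0)},
]


def dispersal_summary(pairs, sex):
    """Mean Euclidean mother-offspring distance and its standard error for one sex."""
    distances = [
        math.dist(p["mother"], p["offspring"]) for p in pairs if p["sex"] == sex
    ]
    mean = statistics.mean(distances)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    se = statistics.stdev(distances) / math.sqrt(len(distances))
    return mean, se


female = dispersal_summary(pairs, "F")
male = dispersal_summary(pairs, "M")
print("females:", female)
print("males:", male)
```

A comparison between the sexes, such as the rank-sum test reported in the abstract, would then be run on the two lists of per-pair distances rather than on these summaries.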
A genetic algorithm based method for stringent haplotyping of family data
Background: The linkage phase, or haplotype, is an extra level of information that, in addition to genotype and pedigree, can be useful for reconstructing the inheritance pattern of the alleles in a pedigree and for computing, for example, Identity By Descent (IBD) probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to only use haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases. Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring parental origin of the alleles. This is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors. Conclusion: By including a stringency criterion in our haplotyping method, the user is able to maintain the error rate at a suitable level for the particular study; one can select anything from haplotyped data with a very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. Giving this choice makes the method more flexible and useful in a wide range of applications, as it is able to fulfil different requirements regarding the tolerance for haplotype errors or uncertain marker phases.
- …