21 research outputs found
A community-maintained standard library of population genetic models
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
Linking Human Evolutionary History to Phenotypic Variation
A central question in genetics asks how genetic variation influences phenotypic variation. The distribution of genetic variation in a population reflects the evolutionary forces that shape and maintain genetic diversity, such as mutation, natural selection, and genetic drift. In turn, this genetic variation affects molecular phenotypes like gene expression and eventually leads to variation in complex traits. In my dissertation, I develop statistical methods and apply computational approaches to understand these dynamics in human populations. In the first chapter, I describe a statistical model for detecting the presence of archaic haplotypes in modern human populations without having access to a reference archaic genome. I apply this method to the genomes of individuals from Europe and find that I can recover segments of DNA inherited from Neanderthals as a result of archaic admixture. In the second chapter, I apply this method to the genomes of individuals from several African populations and find that approximately 7% of the genomes are inherited from an archaic species. Modeling of the site frequency spectrum suggests that the presence of these haplotypes is best explained by admixture with an unknown archaic hominin species. In the final chapter, I focus on the more recent history of humans and the genetic architecture of complex traits. In particular, I find that a substantial portion of the genetic architecture is population specific, which limits our ability to transfer phenotype predictions across populations.
ANGSD-wrapper example sequence data: 22 Maize and Teosinte Inbred Lines.
This is the example data set for ANGSD-wrapper. It contains BAM files for 11 maize and 11 teosinte inbred lines, as well as a reference sequence and outgroup (Tripsacum
A statistical model for reference-free inference of archaic local ancestry.
Statistical analyses of genomic data from diverse human populations have demonstrated that archaic hominins, such as Neanderthals and Denisovans, interbred or admixed with the ancestors of present-day humans. Central to these analyses are methods for inferring archaic ancestry along the genomes of present-day individuals (archaic local ancestry). Methods for archaic local ancestry inference rely on the availability of reference genomes from the ancestral archaic populations for accurate inference. However, several instances of archaic admixture lack reference archaic genomes, making it difficult to characterize these events. We present a statistical method that combines diverse population genetic summary statistics to infer archaic local ancestry without access to an archaic reference genome. We validate the accuracy and robustness of our method in simulations. When applied to genomes of European individuals, our method recovers segments that are substantially enriched for Neanderthal ancestry, even though our method did not have access to any Neanderthal reference genomes.
Negative selection on complex traits limits phenotype prediction accuracy between populations
Phenotype prediction is a key goal for medical genetics. Unfortunately, most genome-wide association studies are done in European populations, which reduces the accuracy of predictions via polygenic scores in non-European populations. Here, we use population genetic models to show that human demographic history and negative selection on complex traits can result in population-specific genetic architectures. For traits where alleles with the largest effect on the trait are under the strongest negative selection, approximately half of the heritability can be accounted for by variants in Europe that are absent from Africa, leading to poor performance in phenotype prediction across these populations. Further, under such a model, individuals in the tails of the genetic risk distribution may not be identified via polygenic scores generated in another population. We empirically test these predictions by building a model to stratify heritability between European-specific and shared variants and applying it to 37 traits and diseases in the UK Biobank. Across these phenotypes, ∼30% of the heritability comes from European-specific variants. We conclude that genetic association studies need to include more diverse populations to enable the utility of phenotype prediction in all populations.
Age-dependent topic modeling of comorbidities in UK Biobank identifies disease subtypes with differential genetic risk.
The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnoses and enable personalized medicine, motivating efforts to identify disease subtypes from patient comorbidity information. Here we introduce an age-dependent topic modeling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR datasets. We applied ATM to 282,957 UK Biobank samples, identifying 52 diseases with heterogeneous comorbidity profiles; analyses of 211,908 All of Us samples produced concordant results. We defined subtypes of the 52 heterogeneous diseases based on their comorbidity profiles and compared genetic risk across disease subtypes using polygenic risk scores (PRSs), identifying 18 disease subtypes whose PRS differed significantly from other subtypes of the same disease. We further identified specific genetic variants with subtype-dependent effects on disease risk. In conclusion, ATM identifies disease subtypes with differential genome-wide and locus-specific genetic risk profiles.
Data from: Natural selection interacts with recombination to shape the evolution of hybrid genomes
To investigate the consequences of hybridization between species, we studied three replicate hybrid populations that formed naturally between two swordtail fish species, estimating their fine-scale genetic map and inferring ancestry along the genomes of 690 individuals. In all three populations, ancestry from the “minor” parental species is more common in regions of high recombination and where there is linkage to fewer putative targets of selection. The same patterns are apparent in a reanalysis of human and archaic admixture. These results support models in which ancestry from the minor parental species is more likely to persist when rapidly uncoupled from alleles that are deleterious in hybrids. Our analyses further indicate that selection on swordtail hybrids stems predominantly from deleterious combinations of epistatically interacting alleles.