Search CORE

8 research outputs found

A community-maintained standard library of population genetic models

Author: Adrion Jeffrey R.
Baumdicker Franz
Carlson Jedidiah
Cartwright Reed A.
Cole Christopher B.
Dukler Noah
Durvasula Arun
Galloway Jared G.
Gladstein Ariella L.
Gower Graham
Gravel Simon
Gronau Ilan
Gutenkunst Ryan N.
Kelleher Jerome
Kern Andrew D.
Kim Bernard Y.
Kyriazis Christopher C.
Lohmueller Kirk E.
McKenzie Patrick
Messer Philipp W.
Noskova Ekaterina
Ortega-Del Vecchyo Diego
Racimo Fernando
Ragsdale Aaron P.
Ralph Peter L.
Schrider Daniel R.
Siepel Adam
Struck Travis J.
Tsambos Georgia
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2020

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.

Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.

Copenhagen University Research Information System

The University of Arizona

Efficient ancestry and mutation simulation with msprime 1.0

Author: Baumdicker Franz
Bisschop Gertjan
Eldon Bjarki
Ellerman Castedo E.
Galloway Jared G.
Gladstein Ariella L.
Goldstein Daniel
Gorjanc Gregor
Gower Graham
Gravel Simon
Guo Bing
Jeffery Ben
Kelleher Jerome
Kern Andrew D.
Koskela Jere
Kretzschmar Warren W.
Lohse Konrad
Matschiner Michael
Nelson Dominic
Pope Nathaniel S.
Quinto-Cortés Consuelo D.
Ragsdale Aaron P.
Ralph Peter L.
Rodrigues Murillo F.
Saunack Kumar
Sellinger Thibaut
Thornton Kevin
Tsambos Georgia
van Kemenade Hugo
Wohns Anthony W.
Wong H. Yan
Zhu Sha
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/09/2021

Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime’s many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.

Copenhagen University Research Information System

PubMed Central

Edinburgh Research Explorer

eScholarship - University of California

Warwick Research Archives Portal Repository

Efficient ancestry and mutation simulation with msprime 1.0

Author: Baumdicker Franz
Bisschop Gertjan
Eldon Bjarki
Ellerman Castedo E.
Galloway Jared G.
Gladstein Ariella L.
Goldstein Daniel
Gorjanc Gregor
Gower Graham
Gravel Simon
Guo Bing
Jeffery Ben
Kelleher Jerome
Kern Andrew D.
Koskela Jere
Kretzschmar Warren W.
Lohse Konrad
Matschiner Michael
Nelson Dominic
Pope Nathaniel S.
Quinto-Cortés Consuelo D.
Ragsdale Aaron P.
Ralph Peter L.
Rodrigues Murillo F.
Saunack Kumar
Sellinger Thibaut
Thornton Kevin
Tsambos Georgia
van Kemenade Hugo
Wohns Anthony W.
Wong H. Yan
Zhu Sha
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/09/2021

Edinburgh Research Explorer

Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

Publikationer från Uppsala Universitet

Edinburgh Research Explorer

eScholarship - University of California

Digitala Vetenskapliga Arkivet - Academic Archive On-line

SimPrily: A Python framework to simplify high-throughput genomic simulations

Author: Ariella L. Gladstein
Blake L. Joyce
Consuelo D. Quinto-Cortés
David Christy
Julian L. Pistorius
Logan Gantner
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Genomic simulations are an important technique used in population genetics to infer demographic history, test for regions under selection, and create datasets to validate software. However, running thousands of simulations and manipulating large loci can present computational challenges. We present SimPrily, a Python tool optimized for high throughput computing (HTC), which facilitates simulation of whole chromosomes. SimPrily can use prior distributions of parameters to run simulations, incorporate single nucleotide polymorphism array ascertainment bias into the simulated model, and calculate a variety of genomic summary statistics. We include with SimPrily high-throughput workflows that leverage free computing resources through the Open Science Grid and CyVerse Discovery Environment, allowing researchers to run thousands or millions of large-locus simulations with minimal or no prior command line knowledge.

Keywords: Genomics, Coalescent simulation, High-throughput computing, Demographic history

Directory of Open Access Journals

The University of Arizona

Efficient ancestry and mutation simulation with msprime 1.0.

Author: Baumdicker Franz
Bisschop Gertjan
Eldon Bjarki
Ellerman E Castedo
Galloway Jared G
Gladstein Ariella L
Goldstein Daniel
Gorjanc Gregor
Gower Graham
Gravel Simon
Guo Bing
Jeffery Ben
Kelleher Jerome
Kern Andrew D
Koskela Jere
Kretzschmar Warren W
Lohse Konrad
Matschiner Michael
Nelson Dominic
Pope Nathaniel S
Quinto-Cortés Consuelo D
Ragsdale Aaron P
Ralph Peter L
Rodrigues Murillo F
Saunack Kumar
Sellinger Thibaut
Thornton Kevin
Tsambos Georgia
van Kemenade Hugo
Wohns Anthony W
Wong Yan
Zhu Sha
Publication venue: eScholarship, University of California
Publication date: 01/03/2022

Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.

eScholarship - University of California

Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

eScholarship - University of California

Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations.

Peer reviewed: True

Acknowledgements: We wish to thank the dozens of workshop attendees, and especially the two dozen or so hackathon participants, whose combined feedback motivated many of the updates made to stdpopsim in the past two years.

Funder: Robertson Foundation; FundRef: http://dx.doi.org/10.13039/100013961

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life.
During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

Apollo (Cambridge)