Two-Locus Likelihoods under Variable Population Size and Fine-Scale Recombination Rate Estimation
Two-locus sampling probabilities have played a central role in devising an
efficient composite likelihood method for estimating fine-scale recombination
rates. Due to mathematical and computational challenges, these sampling
probabilities are typically computed under the unrealistic assumption of a
constant population size, and simulation studies have shown that resulting
recombination rate estimates can be severely biased in certain cases of
historical population size changes. To alleviate this problem, we develop here
new methods to compute the sampling probability for variable population size
functions that are piecewise constant. Our main theoretical result, implemented
in a new software package called LDpop, is a novel formula for the sampling
probability that can be evaluated by numerically exponentiating a large but
sparse matrix. This formula can handle moderate sample sizes and
demographic size histories with a large number of epochs. In addition, LDpop
implements an approximate formula for the sampling probability that is
reasonably accurate and scales to hundreds in sample size.
Finally, LDpop includes an importance sampler for the posterior
distribution of two-locus genealogies, based on a new result for the optimal
proposal distribution in the variable-size setting. Using our methods, we study
how a sharp population bottleneck followed by rapid growth affects the
correlation between partially linked sites. Then, through an extensive
simulation study, we show that accounting for population size changes under
such a demographic model leads to substantial improvements in fine-scale
recombination rate estimation. LDpop is freely available for download at
https://github.com/popgenmethods/ldpop.
Comment: 32 pages, 13 figures
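The idea of propagating a probability vector through piecewise-constant epochs by exponentiating a sparse rate matrix can be illustrated on a toy problem. The sketch below is not LDpop's formula: it assumes a hypothetical two-state chain (two lineages, uncoalesced or coalesced) whose coalescence rate in each epoch is the reciprocal of that epoch's relative population size. The names `expm_action` and `propagate` and the epoch values are illustrative only, and the truncated Taylor series is adequate only for small rate-times-duration products.

```python
import math

def expm_action(v, Q, t, terms=60):
    """Return the row vector v * exp(Q * t) using a truncated Taylor series.

    Q is a sparse rate matrix stored as {row: {col: rate}}.
    Assumption: |Q| * t is small, so the series converges quickly.
    """
    result = list(v)
    term = list(v)
    for k in range(1, terms + 1):
        nxt = [0.0] * len(v)
        for i, row in Q.items():
            if term[i] == 0.0:
                continue
            for j, rate in row.items():
                nxt[j] += term[i] * rate * t / k
        term = nxt
        result = [r + x for r, x in zip(result, term)]
    return result

def propagate(v, epochs):
    """Push v through a list of (duration, relative_size) epochs."""
    for duration, size in epochs:
        # Toy chain: state 0 = two lineages, state 1 = coalesced (absorbing).
        Q = {0: {0: -1.0 / size, 1: 1.0 / size}, 1: {}}
        v = expm_action(v, Q, duration)
    return v

# Hypothetical history: an epoch of relative size 1.0, then a bottleneck of size 0.1.
epochs = [(0.5, 1.0), (0.2, 0.1)]
p = propagate([1.0, 0.0], epochs)
closed_form = math.exp(-sum(d / s for d, s in epochs))
print(p[0], closed_form)
```

For this toy chain the probability of remaining uncoalesced has a closed form, which makes the sketch easy to check. For large state spaces a library routine for the action of a sparse matrix exponential would be the more robust choice.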
Inference of Population History using Coalescent HMMs: Review and Outlook
Studying how diverse human populations are related is of historical and
anthropological interest, in addition to providing a realistic null model for
testing for signatures of natural selection or disease associations.
Furthermore, understanding the demographic histories of other species is
playing an increasingly important role in conservation genetics. A number of
statistical methods have been developed to infer population demographic
histories using whole-genome sequence data, with recent advances focusing on
allowing for more flexible modeling choices, scaling to larger data sets, and
increasing statistical power. Here we review coalescent hidden Markov models, a
powerful class of population genetic inference methods that can effectively
utilize linkage disequilibrium information. We highlight recent advances, give
advice for practitioners, point out potential pitfalls, and present possible
future research directions.
Comment: 12 pages, 2 figures
A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks
An explosion of high-throughput DNA sequencing in the past decade has led to
a surge of interest in population-scale inference with whole-genome data.
Recent work in population genetics has centered on designing inference methods
for relatively simple model classes, and few scalable general-purpose inference
techniques exist for more realistic, complex models. To achieve this, two
inferential challenges need to be addressed: (1) population data are
exchangeable, calling for methods that efficiently exploit the symmetries of
the data, and (2) computing likelihoods is intractable as it requires
integrating over a set of correlated, extremely high-dimensional latent
variables. These challenges are traditionally tackled by likelihood-free
methods that use scientific simulators to generate datasets and reduce them to
hand-designed, permutation-invariant summary statistics, often leading to
inaccurate inference. In this work, we develop an exchangeable neural network
that performs summary statistic-free, likelihood-free inference. Our framework
can be applied in a black-box fashion across a variety of simulation-based
tasks, both within and outside biology. We demonstrate the power of our
approach on the recombination hotspot testing problem, outperforming the
state-of-the-art.
Comment: 9 pages, 8 figures
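The exchangeability requirement in point (1) can be made concrete with a small permutation-invariant function. The abstract does not describe the network's architecture, so the sketch below assumes a generic construction: a shared per-individual embedding, a symmetric pooling step (here a mean), and a readout layer. All names, layer sizes, and the tiny genotype matrix are hypothetical, and the weights are random rather than trained.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random dense weights (illustrative only; no training is performed)."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, x):
    """One dense layer with a tanh nonlinearity."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def exchangeable_forward(sample, phi, rho):
    """Map a sample (list of per-individual rows) to an output vector.

    The same embedding phi is applied to every row, the embeddings are
    averaged, and rho reads out the pooled vector. Averaging is symmetric,
    so reordering the rows leaves the output unchanged up to rounding.
    """
    embedded = [apply_layer(phi, row) for row in sample]
    pooled = [sum(col) / len(embedded) for col in zip(*embedded)]
    return apply_layer(rho, pooled)

rng = random.Random(0)
phi = make_layer(4, 5, rng)   # per-individual embedding: 4 sites -> 5 features
rho = make_layer(5, 2, rng)   # readout: 5 pooled features -> 2 outputs

# Hypothetical sample: 3 haplotypes over 4 biallelic sites.
sample = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
out = exchangeable_forward(sample, phi, rho)
out_shuffled = exchangeable_forward(sample[::-1], phi, rho)
print(out, out_shuffled)
```

Because the pooling step ignores row order, no hand-designed permutation-invariant summary statistic is needed to obtain an order-independent output.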
Flexible non-parametric tests of sample exchangeability and feature independence
In scientific studies involving analyses of multivariate data, two questions
often arise for the researcher. First, is the sample exchangeable, meaning that
the joint distribution of the sample is invariant to the ordering of the units?
Second, are the features independent of one another, or can the features be
grouped so that the groups are mutually independent? We propose a
non-parametric approach that addresses these two questions. Our approach is
conceptually simple, yet fast and flexible. It controls the Type I error across
realistic scenarios, and handles data of arbitrary dimensions by leveraging
large-sample asymptotics. In the exchangeability detection setting, through
extensive simulations and a comparison against unsupervised tests of
stratification based on random matrix theory, we find that our approach
compares favorably in various scenarios of interest. We apply our method to
problems in population and statistical genetics, including stratification
detection and linkage disequilibrium splitting. We also consider other
application domains, applying our approach to post-clustering single-cell
chromatin accessibility data and World Values Survey data, where we show how
users can partition features into independent groups, which helps generate new
scientific hypotheses about the features.
Comment: Main Text: 25 pages; Supplementary Material: 39 pages
Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles
A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis
BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings. METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681. FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. 
The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1-31·8; I² = 98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8-10·7; I² = 92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6-36·8; I² = 94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3-47·6; I² = 98%) than in other migrant groups (6·6%, 1·8-11·3; I² = 92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1-55·1; I² = 96%) than in migrants in hospitals (24·3%, 16·1-32·6; I² = 98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations. INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration. FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimicrobial Resistance at Imperial College London.
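The random-effects pooling and the I² heterogeneity statistic reported above can be sketched as follows. The abstract does not say which estimator or transformation was used, so this sketch assumes the DerSimonian-Laird estimator on untransformed proportions with a normal-approximation variance. The function name and the study counts are hypothetical and are not data from the review.

```python
def pooled_prevalence_dl(events, totals):
    """Random-effects pooled prevalence (DerSimonian-Laird, an assumption here).

    Returns (pooled estimate, 95% confidence interval, I-squared in percent).
    Requires every study proportion to lie strictly between 0 and 1.
    """
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    df = len(props) - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * p for wi, p in zip(w_star, props)) / sum(w_star)
    se = (1 / sum(w_star)) ** 0.5
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical counts of resistant isolates and participants in four studies.
events = [12, 30, 8, 45]
totals = [100, 90, 120, 150]
pooled, ci, i2 = pooled_prevalence_dl(events, totals)
print(f"pooled={pooled:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f}) I2={i2:.0f}%")
```

When studies disagree strongly, the estimated between-study variance grows, the weights become more equal, and the confidence interval widens, which is the pattern that high I² values indicate.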
Economic Analysis of Knowledge: The History of Thought and the Central Themes
Following the development of knowledge economies, there has been a rapid expansion of economic analysis of knowledge, both in the context of technological knowledge in particular and decision theory in general. This paper surveys this literature by identifying the main themes and contributions and outlines the future prospects of the discipline. The wide scope of knowledge-related questions in terms of applicability and alternative approaches has led to the fragmentation of research. Nevertheless, one can identify a continuing tradition which analyses various aspects of the generation, dissemination and use of knowledge in the economy.