
    Two-Locus Likelihoods under Variable Population Size and Fine-Scale Recombination Rate Estimation

    Two-locus sampling probabilities have played a central role in devising an efficient composite likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that the resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here new methods to compute the sampling probability for variable population size functions that are piecewise constant. Our main theoretical result, implemented in a new software package called LDpop, is a novel formula for the sampling probability that can be evaluated by numerically exponentiating a large but sparse matrix. This formula can handle moderate sample sizes (n ≤ 50) and demographic size histories with a large number of epochs (D ≥ 64). In addition, LDpop implements an approximate formula for the sampling probability that is reasonably accurate and scales to sample sizes in the hundreds (n ≥ 256). Finally, LDpop includes an importance sampler for the posterior distribution of two-locus genealogies, based on a new result for the optimal proposal distribution in the variable-size setting. Using our methods, we study how a sharp population bottleneck followed by rapid growth affects the correlation between partially linked sites. Then, through an extensive simulation study, we show that accounting for population size changes under such a demographic model leads to substantial improvements in fine-scale recombination rate estimation. LDpop is freely available for download at https://github.com/popgenmethods/ldpop.
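    The matrix-exponential idea in this abstract can be illustrated in miniature. The sketch below is not LDpop's actual state space or API; it propagates a distribution through a hypothetical 3-state continuous-time Markov chain whose rates are rescaled in each piecewise-constant epoch, using SciPy's sparse `expm_multiply`:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import expm_multiply

# Toy illustration (NOT LDpop's actual state space): propagate a
# probability distribution through a sparse continuous-time Markov
# chain whose rates are rescaled in each epoch, mimicking a
# piecewise-constant population size history.
Q = csr_matrix(np.array([[-1.0, 1.0, 0.0],
                         [0.5, -1.5, 1.0],
                         [0.0, 0.0, 0.0]]))  # state 2 is absorbing

# (relative population size, epoch duration), most recent epoch first
epochs = [(2.0, 0.1), (0.5, 0.3), (1.0, 0.5)]

p = np.array([1.0, 0.0, 0.0])  # start in state 0
for size, dt in epochs:
    # coalescent-type rates scale inversely with population size, so
    # each epoch exponentiates a rescaled copy of the same sparse matrix
    p = expm_multiply((Q / size).T * dt, p)

print(p.round(4))  # still a probability vector over the 3 states
```

    LDpop's real computation works in the same spirit, but on the much larger (and still sparse) space of two-locus sample configurations.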

    Inference of Population History using Coalescent HMMs: Review and Outlook

    Studying how diverse human populations are related is of historical and anthropological interest, in addition to providing a realistic null model for testing for signatures of natural selection or disease associations. Furthermore, understanding the demographic histories of other species is playing an increasingly important role in conservation genetics. A number of statistical methods have been developed to infer population demographic histories using whole-genome sequence data, with recent advances focusing on allowing for more flexible modeling choices, scaling to larger data sets, and increasing statistical power. Here we review coalescent hidden Markov models, a powerful class of population genetic inference methods that can effectively utilize linkage disequilibrium information. We highlight recent advances, give advice for practitioners, point out potential pitfalls, and present possible future research directions.
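    A coalescent HMM ultimately rests on standard HMM machinery such as the scaled forward algorithm. Here is a minimal, self-contained sketch with invented numbers (three discretized coalescence-time states, binary homozygous/heterozygous observations); it is not any specific published model:

```python
import numpy as np

# Minimal sketch of the scaled forward algorithm at the core of a
# coalescent HMM. All numbers are invented for illustration: hidden
# states are 3 discretized coalescence times, and the observation at
# each site is 0 (homozygous) or 1 (heterozygous).
init = np.array([0.5, 0.3, 0.2])          # initial state distribution
trans = np.array([[0.90, 0.08, 0.02],     # state transitions along the
                  [0.05, 0.90, 0.05],     # genome, driven by
                  [0.02, 0.08, 0.90]])    # recombination
emit = np.array([[0.99, 0.01],            # deeper coalescence times
                 [0.97, 0.03],            # emit heterozygous sites
                 [0.90, 0.10]])           # more often

obs = [0, 0, 1, 0, 1, 0, 0]
alpha = init * emit[:, obs[0]]
c = alpha.sum()
alpha /= c
loglik = np.log(c)
for o in obs[1:]:
    alpha = (alpha @ trans) * emit[:, o]  # propagate, then weight by emission
    c = alpha.sum()
    alpha /= c                            # rescale to avoid underflow
    loglik += np.log(c)

print(round(loglik, 3))  # log-likelihood of the observed sites
```

    Real coalescent HMMs differ mainly in scale (many states, genome-length sequences) and in deriving `trans` and `emit` from coalescent theory rather than fixing them by hand.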

    A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

    An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state-of-the-art.
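    The exchangeability requirement in challenge (1) is commonly met with a shared per-individual feature map followed by symmetric pooling (the Deep Sets pattern). The sketch below uses random weights and made-up dimensions, purely to illustrate why such a network is invariant to the ordering of individuals; it is not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative permutation-invariant ("exchangeable") network:
# the same feature map is applied to every individual (row), the
# results are pooled with a symmetric function, and the pooled
# summary is mapped to the output. Weights are random, for
# demonstration only.
W1 = rng.normal(size=(4, 8))   # per-row feature map (phi)
W2 = rng.normal(size=(8, 3))   # post-pooling map (rho)

def exchangeable_net(X):
    h = np.tanh(X @ W1)        # phi applied independently to each row
    pooled = h.mean(axis=0)    # symmetric pooling => order invariance
    return np.tanh(pooled @ W2)

X = rng.normal(size=(10, 4))   # 10 individuals, 4 features each
perm = rng.permutation(10)
out1, out2 = exchangeable_net(X), exchangeable_net(X[perm])
print(np.allclose(out1, out2))  # True: output ignores row order
```

    Because the pooling step is symmetric, any permutation of the rows of `X` yields the same output, so the network cannot waste capacity learning arbitrary sample orderings.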

    Flexible non-parametric tests of sample exchangeability and feature independence

    In scientific studies involving analyses of multivariate data, two questions often arise for the researcher. First, is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Second, are the features independent of one another, or can the features be grouped so that the groups are mutually independent? We propose a non-parametric approach that addresses these two questions. Our approach is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. In the exchangeability detection setting, through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our approach compares favorably in various scenarios of interest. We apply our method to problems in population and statistical genetics, including stratification detection and linkage disequilibrium splitting. We also consider other application domains, applying our approach to post-clustering single-cell chromatin accessibility data and World Values Survey data, where we show how users can partition features into independent groups, which helps generate new scientific hypotheses about the features.
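    The general recipe behind such non-parametric tests can be sketched with a simple permutation test for independence between two feature groups. This is an illustrative stand-in with a made-up statistic (mean absolute cross-correlation), not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of a permutation test for independence between two feature
# groups A and B: permuting the rows of B breaks any dependence while
# preserving each group's marginal distribution, so the observed
# cross-association can be compared against its permutation null.
def perm_independence_test(A, B, n_perm=500, rng=rng):
    def stat(a, b):
        # mean absolute cross-correlation between the two groups
        c = np.corrcoef(a.T, b.T)[:a.shape[1], a.shape[1]:]
        return np.abs(c).mean()
    obs = stat(A, B)
    null = [stat(A, B[rng.permutation(len(B))]) for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (1 + n_perm)

n = 200
A = rng.normal(size=(n, 3))
B = A[:, :2] + 0.5 * rng.normal(size=(n, 2))  # B depends on A
p_dep = perm_independence_test(A, B)
p_ind = perm_independence_test(A, rng.normal(size=(n, 2)))
print(p_dep, p_ind)  # p_dep is tiny; p_ind is not systematically small
```

    The paper's contribution lies in choosing statistics and asymptotics that make this kind of test fast and valid in high dimensions, rather than relying on brute-force permutation alone.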

    Cognitive Training and Transcranial Direct Current Stimulation in Mild Cognitive Impairment: A Randomized Pilot Trial

    Background: Transcranial direct current stimulation (tDCS), a non-invasive stimulation technique, represents a potential intervention to enhance cognition across clinical populations including Alzheimer's disease and mild cognitive impairment (MCI). This randomized clinical trial in MCI investigated the effects of anodal tDCS (a-tDCS) delivered to the left inferior frontal gyrus (IFG) combined with gist-reasoning training (SMART), versus sham tDCS (s-tDCS) plus SMART, on measures of cognitive change and neural change in resting cerebral blood flow (rCBF). We were also interested in the effects of SMART on cognitive performance regardless of tDCS group.
    Methods: Twenty-two MCI participants who completed the baseline cognitive assessment (T1) were randomized into one of two groups: a-tDCS + SMART and s-tDCS + SMART. Of these, 20 participants completed a resting pCASL MRI scan to measure rCBF. Eight SMART sessions were administered over 4 weeks, with a-tDCS or s-tDCS stimulation for 20 min before each session. Participants were assessed immediately (T2) and 3 months after training (T3).
    Results: Significant group × time interactions showed cognitive gains at T2 on an executive function (EF) measure of inhibition [DKEFS Color-Word (p = 0.047)], on innovation [TOSL (p = 0.01)], and on episodic memory [TOSL (p = 0.048)] in the s-tDCS + SMART group but not in the a-tDCS + SMART group. These gains did not persist 3 months after training (T3). A voxel-based analysis showed a significant increase in regional rCBF in the right middle frontal cortex (MFC) (cluster-wise p = 0.05, k = 1,168 mm³) in a-tDCS + SMART compared to s-tDCS + SMART. No significant relationship was observed between the increased rCBF and cognition. Irrespective of group, the combined MCI sample showed gains at T2 in the EF of conceptual reasoning [DKEFS Card Sort (p = 0.033)] and category fluency [COWAT (p = 0.055)], along with gains at T3 in the EF of verbal fluency [COWAT (p = 0.009)].
    Conclusion: One intriguing finding is that a-tDCS to the left IFG plus SMART increased blood flow to the right MFC; however, the stimulation seemingly blocked the cognitive benefits of SMART on EF (inhibition and innovation) and episodic memory compared to the s-tDCS + SMART group. Although the sample size is small, this paper contributes to growing evidence that cognitive training provides a way to significantly enhance cognitive performance in adults showing memory loss; the role of a-tDCS in augmenting these effects needs further study.

    Ethical Considerations for the Clinical Oncologist in an Era of Oncology Drug Shortages

    Shortages of injectable drugs affect many cancer patients and providers in the U.S. today. Scholars and policymakers have recently begun to devote increased attention to these issues, but only a few tangible resources exist to guide clinical oncologists in developing strategies for dealing with drug shortages on a recurring basis. This article discusses existing information from the scholarly literature, policy analyses, and other relevant sources and seeks to provide practical ethical guidance to the broad audience of oncology professionals who are increasingly confronted with such cases in their practice. We begin by providing a brief overview of the history, causes, and regulatory context of oncology drug shortages in the U.S., followed by a discussion of ethical frameworks that have been proposed in this setting. We conclude with practical recommendations for ethical professional behavior in these increasingly common and challenging situations.

    Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

    Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we identified common sticking points and developed best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making such simulations available, transparent, and accessible to everyone.
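    stdpopsim itself wraps full genome simulators such as msprime and SLiM; as a hedged, self-contained illustration of the kind of stochastic model all such simulations build on, here is a bare-bones Wright-Fisher allele-frequency drift simulation for a single biallelic locus (this is not stdpopsim's API):

```python
import numpy as np

rng = np.random.default_rng(42)

# Bare-bones Wright-Fisher drift for one biallelic locus: each
# generation, the 2N allele copies of the next generation are drawn
# binomially from the current allele frequency. Parameters are
# arbitrary illustration values.
def wright_fisher(N, p0, generations, rng=rng):
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = rng.binomial(2 * N, p) / (2 * N)  # binomial resampling
        freqs.append(p)
    return np.array(freqs)

traj = wright_fisher(N=500, p0=0.5, generations=200)
print(traj[-1])  # final frequency; fixation at 0 or 1 is possible
```

    Frameworks like stdpopsim layer genome-scale structure on top of this core randomness: recombination maps, mutation models, demographic histories, and species-specific annotations, which is exactly the information the catalog curates.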

    The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles

    A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
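    The difficulty of choosing performance measures can be seen even on synthetic data: different measures reward different aspects of an imputed track and can sharply disagree. The toy example below (entirely made-up data, not challenge tracks) shows a prediction with a global scale error that a squared-error measure punishes heavily while a correlation measure barely notices:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic illustration of disagreement between evaluation measures.
# "truth" stands in for an observed epigenomic signal track; "imputed"
# has the right shape but the wrong overall scale.
truth = rng.gamma(shape=2.0, size=1000)
imputed = 2.0 * truth + 0.1 * rng.normal(size=1000)

mse = np.mean((imputed - truth) ** 2)          # punishes the scale error
pearson = np.corrcoef(imputed, truth)[0, 1]    # ignores the scale error
print(round(mse, 2), round(pearson, 3))  # large MSE, near-perfect correlation
```

    This is one reason redundancy and disagreement among measures confound challenge rankings: which method "wins" can depend on which aspect of the signal a measure rewards.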
