
    Ancestral population genomics

    The full genomes of several closely related species are now available, opening an emerging field of investigation that borrows from both population genetics and phylogenetics. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters, such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. We discuss the basic speciation models for closely related species, including the isolation and isolation-with-migration models. A major point in our discussion is that only a few complete genomes contain much information about the whole population. The reason is that recombination unlinks genomic regions, and therefore a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey different approaches for understanding ancestral species from analyses of genomic data from closely related species. In particular, we emphasize core assumptions and working hypotheses. Finally, we discuss computational and statistical challenges that arise in the analysis of population genomics data sets.

    Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution

    Analyses of serially-sampled data often begin with the assumption that the observations represent discrete samples from a latent continuous-time stochastic process. The continuous-time Markov chain (CTMC) is one such generative model whose popularity extends to a variety of disciplines ranging from computational finance to human genetics and genomics. A common theme among these diverse applications is the need to simulate sample paths of a CTMC conditional on realized data that is discretely observed. Here we present a general solution to this sampling problem when the CTMC is defined on a discrete and finite state space. Specifically, we consider the generation of sample paths, including intermediate states and times of transition, from a CTMC whose beginning and ending states are known across a time interval of length T. We first unify the literature through a discussion of the three predominant approaches: (1) modified rejection sampling, (2) direct sampling, and (3) uniformization. We then give analytical results for the complexity and efficiency of each method in terms of the instantaneous transition rate matrix Q of the CTMC, its beginning and ending states, and the length of sampling time T. In doing so, we show that no method dominates the others across all model specifications, and we give explicit proof of which method prevails for any given Q, T, and endpoints. Finally, we introduce and compare three applications of CTMCs to demonstrate the pitfalls of choosing an inefficient sampler. Comment: Published at http://dx.doi.org/10.1214/09-AOAS247 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
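The sampling problem described in this abstract can be illustrated with a minimal sketch of plain rejection sampling: simulate the chain forward from the starting state and keep the path only if it ends in the required state at time T. This is not the authors' code, and it is the unmodified variant rather than the "modified rejection sampling" the abstract names. The function names, the list-of-lists layout for the rate matrix Q, and the `max_tries` cap are assumptions made for illustration.

```python
import random

def simulate_forward(Q, a, T, rng):
    """Simulate a CTMC path from state a over [0, T).

    Q is a rate matrix given as a list of rows (assumed layout).
    Returns a list of (jump_time, state) pairs, starting with (0.0, a).
    """
    path = [(0.0, a)]
    t, state = 0.0, a
    while True:
        rate = -Q[state][state]
        if rate <= 0.0:  # absorbing state: no further jumps
            return path
        t += rng.expovariate(rate)  # exponential waiting time in current state
        if t >= T:
            return path
        # Choose the next state with probability proportional to its rate.
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))

def sample_endpoint_conditioned(Q, a, b, T, rng, max_tries=100000):
    """Plain rejection sampling: keep a forward path only if it ends in b."""
    for _ in range(max_tries):
        path = simulate_forward(Q, a, T, rng)
        if path[-1][1] == b:
            return path
    raise RuntimeError("no path accepted; P(X_T = b | X_0 = a) may be very small")

if __name__ == "__main__":
    Q = [[-1.0, 0.5, 0.5],
         [0.3, -0.6, 0.3],
         [0.2, 0.2, -0.4]]
    print(sample_endpoint_conditioned(Q, a=0, b=2, T=1.0, rng=random.Random(1)))
```

The acceptance rate equals the transition probability from the starting state to the ending state over time T, so this naive scheme becomes slow when that probability is small. That is the kind of inefficiency the abstract's comparison of samplers addresses.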

    Border control cooperation in the European Union: the Schengen visa policy in practice

    This research project investigates the governing of Europe’s external border. It analyses how the common Schengen short-stay visa policy has been applied in practice by member states in the period from 2005 to 2010. So far, little systematic theoretical and empirical research has been carried out on the implementation of Schengen. The contributions of the thesis are twofold. Firstly, it makes available a comprehensive and easily accessible database on the visa requirements, issuing practices and consular representation of EU states in all third countries. It enables researchers to map out and compare how restrictively the visa policy is implemented by different member states and across sending countries. Secondly, the project provides three separate papers that in different ways make use of the database to explore and explain the varying openness of Europe’s border and dynamics of cooperation among member states. The three papers are tied together by a framework conceptualising Schengen as a border regime with two key dimensions: restrictiveness and integration. The first paper asks to what extent, and why, Europe’s border is more open to visitors of some nationalities than others. The second paper investigates to what extent, and why, EU states cooperate on sharing consular facilities in the visa-issuing process. The third paper examines to what extent, and why, Schengen participation has a restrictive impact on the visa-issuing practices of member countries. The analyses test existing theories and develop new concepts and models. The three papers engage with rationalist and constructivist theories and seek to assess their relative explanatory power. In doing so, the project makes use of different quantitative comparative approaches. It employs regression analysis, social network analytical tools and quasi-experimental design.
    Overall, the thesis concludes that Schengen is characterized by extensive cooperation and by restrictive practices, especially towards visitors from poor, Muslim-majority and refugee-producing countries.

    Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains

    Background: Continuous-time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences at the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in each state and the number of changes between any two states. In applications, past evolutionary events (exact times and types of changes) are inaccessible, and the past must be inferred from DNA sequence data observed in the present. Results: We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). The implementation in R of the algorithms is available at http://www.birc.au.dk/~paula/. Conclusions: We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore, we find that UNI is usually faster than EVD.
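To make the uniformization idea concrete, the sketch below computes the expected time spent in each state, conditional on the chain starting in state a and ending in state b after time T. It is an illustrative Python sketch, not the authors' R implementation. The function name, the fixed truncation `n_terms`, and the list-of-lists layout for Q are assumptions. It uses the standard identity that the integral of P_ac(t) P_cb(T - t) over [0, T] equals the sum over n of Pois(n + 1; mu T) / mu times the sum over l = 0..n of (R^l)_ac (R^(n-l))_cb, where R = I + Q / mu and mu is the largest exit rate.

```python
import math

def expected_times_uniformization(Q, a, b, T, n_terms=60):
    """Expected time in each state given X_0 = a and X_T = b (uniformization).

    Q: rate matrix as a list of rows (assumed layout).
    n_terms: fixed truncation of the Poisson series (assumed adequate for small mu*T).
    """
    k = len(Q)
    mu = max(-Q[i][i] for i in range(k))  # uniformization rate
    R = [[(1.0 if i == j else 0.0) + Q[i][j] / mu for j in range(k)]
         for i in range(k)]

    # left[n][c] = (R^n)[a][c]; right[n][c] = (R^n)[c][b]
    left = [[1.0 if c == a else 0.0 for c in range(k)]]
    right = [[1.0 if c == b else 0.0 for c in range(k)]]
    for _ in range(n_terms):
        prev = left[-1]
        left.append([sum(prev[i] * R[i][c] for i in range(k)) for c in range(k)])
        prev = right[-1]
        right.append([sum(R[c][j] * prev[j] for j in range(k)) for c in range(k)])

    # Poisson(mu * T) probabilities for 0..n_terms jumps
    pois = [math.exp(-mu * T)]
    for n in range(1, n_terms + 1):
        pois.append(pois[-1] * mu * T / n)

    # Transition probability P_ab(T), used to normalise the integrals
    p_ab = sum(pois[n] * left[n][b] for n in range(n_terms + 1))

    times = []
    for c in range(k):
        integral = 0.0
        for n in range(n_terms):
            conv = sum(left[l][c] * right[n - l][c] for l in range(n + 1))
            integral += pois[n + 1] / mu * conv
        times.append(integral / p_ab)
    return times

if __name__ == "__main__":
    Q = [[-1.0, 0.5, 0.5],
         [0.3, -0.6, 0.3],
         [0.2, 0.2, -0.4]]
    print(expected_times_uniformization(Q, a=0, b=2, T=1.5))
```

A useful sanity check is that the conditional expected times sum to T, because the chain is always in exactly one state. A production implementation would choose the truncation point adaptively from mu T instead of using a fixed number of terms.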

    Importance sampling for Lambda-coalescents in the infinitely many sites model

    We present and discuss new importance sampling schemes for the approximate computation of the sample probability of observed genetic types in the infinitely many sites model from population genetics. More specifically, we extend the 'classical framework', where genealogies are assumed to be governed by Kingman's coalescent, to the more general class of Lambda-coalescents and develop further Hobolth et al.'s (2008) idea of deriving importance sampling schemes based on 'compressed genetrees'. The resulting schemes extend earlier work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner and Blath (2008) and Hobolth et al. (2008). We conclude with a performance comparison of classical and new schemes for Beta- and Kingman coalescents. Comment: 38 pages, 40 figures.

    Phase-type distributions in population genetics

    Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of phase-type theory, where complicated and tedious calculations are circumvented by the use of matrices. The application of phase-type theory consists of describing the stochastic model as a Markov model by appropriately setting up a state space and calculating the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these matrices. We illustrate this with a few examples: calculating the mean, variance and even higher-order moments of the site frequency spectrum in the multiple merger coalescent models, and analysing the mean and variance of the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarification of current models, in particular their formal manipulation (calculation), but also for further development or extensions.
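As a small illustration of the matrix approach described in this abstract, the sketch below treats the standard Kingman coalescent for n samples as a phase-type distribution and computes expected rewards as alpha (-S)^(-1) r. Here alpha is the initial distribution, S the sub-intensity matrix over the transient states, and r a reward vector. This is a sketch under assumptions, not code from the paper: time is taken in coalescent units (pairwise coalescence rate 1), and the helper names `solve`, `kingman_phase_type` and `expected_reward` are hypothetical.

```python
from math import comb

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kingman_phase_type(n):
    """Initial distribution and sub-intensity matrix for Kingman's coalescent.

    Transient states are the numbers of lineages n, n-1, ..., 2; with k
    lineages the chain moves to k-1 lineages at rate C(k, 2).
    """
    sizes = list(range(n, 1, -1))
    m = len(sizes)
    S = [[0.0] * m for _ in range(m)]
    for i, k in enumerate(sizes):
        rate = comb(k, 2)
        S[i][i] = -rate
        if i + 1 < m:
            S[i][i + 1] = rate
    alpha = [1.0] + [0.0] * (m - 1)  # start with all n lineages
    return alpha, S, sizes

def expected_reward(alpha, S, reward):
    """Expected accumulated reward until absorption: alpha (-S)^(-1) reward."""
    neg_S = [[-v for v in row] for row in S]
    u = solve(neg_S, reward)
    return sum(a * x for a, x in zip(alpha, u))

if __name__ == "__main__":
    alpha, S, sizes = kingman_phase_type(5)
    # Reward 1 per unit time gives the expected time to the most recent common ancestor.
    print(expected_reward(alpha, S, [1.0] * len(sizes)))
    # Reward k while k lineages remain gives the expected total branch length.
    print(expected_reward(alpha, S, [float(k) for k in sizes]))
```

For n = 5 these agree with the classical closed forms 2(1 - 1/n) for the expected tree height and 2(1 + 1/2 + ... + 1/(n - 1)) for the expected total branch length. Changing the reward vector is enough to switch between the two quantities, which is the convenience the abstract attributes to the matrix notation.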