Ancestral population genomics
The full genomes of several closely related species are now available, opening an emerging field of investigation that borrows from both population genetics and phylogenetics. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters, such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. We discuss the basic speciation models for closely related species, including the isolation and isolation-with-migration models. A major point in our discussion is that even a few complete genomes contain much information about the whole population. The reason is that recombination unlinks genomic regions, and therefore a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey different approaches for understanding ancestral species from analyses of genomic data from closely related species. In particular, we emphasize core assumptions and working hypotheses. Finally, we discuss computational and statistical challenges that arise in the analysis of population genomics data sets.
Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution
Analyses of serially-sampled data often begin with the assumption that the
observations represent discrete samples from a latent continuous-time
stochastic process. The continuous-time Markov chain (CTMC) is one such
generative model whose popularity extends to a variety of disciplines ranging
from computational finance to human genetics and genomics. A common theme among
these diverse applications is the need to simulate sample paths of a CTMC
conditional on realized data that is discretely observed. Here we present a
general solution to this sampling problem when the CTMC is defined on a
discrete and finite state space. Specifically, we consider the generation of
sample paths, including intermediate states and times of transition, from a
CTMC whose beginning and ending states are known across a time interval of
given length. We first unify the literature through a discussion of the three
predominant approaches: (1) modified rejection sampling, (2) direct sampling,
and (3) uniformization. We then give analytical results for the complexity and
efficiency of each method in terms of the instantaneous transition rate matrix
of the CTMC, its beginning and ending states, and the length of sampling
time. In doing so, we show that no method dominates the others across all
model specifications, and we give explicit proof of which method prevails for
any given rate matrix, time interval, and endpoints. Finally, we introduce and compare three
applications of CTMCs to demonstrate the pitfalls of choosing an inefficient
sampler. Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS247
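As a rough illustration of the sampling problem described in this abstract, the following is a minimal sketch of plain (unmodified) rejection sampling: simulate forward from the starting state and keep the path only if it ends in the required state. It is not the authors' implementation. The function names `forward_path` and `sample_conditioned_path`, the list-of-lists layout of the rate matrix `Q`, and the `max_tries` cap are all assumptions made for this example.

```python
import random


def forward_path(Q, start, T, rng):
    """Simulate an unconditioned CTMC path on [0, T] starting in `start`.

    Q is a rate matrix given as a list of lists (rows sum to zero).
    Returns a list of (jump_time, state) pairs, beginning with (0.0, start).
    """
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        exit_rate = -Q[state][state]
        if exit_rate <= 0.0:
            return path  # absorbing state: no further jumps
        t += rng.expovariate(exit_rate)  # exponential waiting time
        if t >= T:
            return path  # next jump would fall outside the interval
        # Pick the next state with probability proportional to its rate.
        targets = [j for j in range(len(Q)) if j != state and Q[state][j] > 0]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))


def sample_conditioned_path(Q, a, b, T, rng=None, max_tries=100_000):
    """Naive rejection sampler: retry until the path from a ends in b at time T."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        path = forward_path(Q, a, T, rng)
        if path[-1][1] == b:
            return path
    raise RuntimeError("no accepted path; the endpoint may be too unlikely")


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]  # hypothetical two-state rate matrix
    print(sample_conditioned_path(Q, a=0, b=1, T=1.0, rng=random.Random(1)))
```

The acceptance probability equals the transition probability from the start state to the end state over the interval, so this naive version becomes slow when that probability is small. That is the kind of inefficiency the abstract warns about.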
Border control cooperation in the European Union: the Schengen visa policy in practice
This research project investigates the governing of Europe’s external border. It analyses how the common Schengen short-stay visa policy has been applied in practice by member states in the period from 2005 to 2010. So far, little
systematic theoretical and empirical research has been carried out on the implementation of Schengen. The contributions of the thesis are two-fold. Firstly,
it makes available a comprehensive and easily accessible database on the visa requirements, issuing practices and consular representation of EU states in all third countries. It enables researchers to map out and compare how restrictively the visa policy is implemented by different member states and across sending countries. Secondly, the project provides three separate papers that in different
ways make use of the database to explore and explain the varying openness of Europe’s border and dynamics of cooperation among member states. The three papers are tied together by a framework conceptualising Schengen as a border
regime with two key dimensions: restrictiveness and integration. The first paper asks to what extent, and why, Europe’s border is more open to visitors of some nationalities rather than others. The second paper investigates to what extent, and why, EU states cooperate on sharing consular facilities in the visa-issuing process. The third paper examines to what extent, and why, Schengen
participation has a restrictive impact on the visa-issuing practices of member countries. The analyses test existing theories and develop new concepts and models. The three papers engage with rationalist and constructivist theories and seek to assess their relative explanatory power. In doing so, the project makes use of different quantitative comparative approaches. It employs regression analysis,
social network analytical tools and quasi-experimental design. Overall, the thesis concludes that Schengen is characterized by extensive cooperation and restrictive
practices, especially towards visitors from poor, Muslim-majority and refugee-producing countries.
Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains
Background: Continuous-time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences on the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in a state and the number of changes between any two states. In applications, past evolutionary events (exact times and types of changes) are inaccessible, and the past must be inferred from DNA sequence data observed in the present. Results: We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). The implementation in R of the algorithms is available at http://www.birc.au.dk/~paula/. Conclusions: We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore, we find that UNI is usually faster than EVD.
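For orientation, the conditional expectations mentioned in this abstract have a standard integral form. The notation here is an assumption, not taken from the abstract: Q = (q_ij) is the rate matrix, P(t) = exp(Qt) the transition probabilities, a and b the end-points, T the interval length, T_i the time spent in state i, and N_ij the number of jumps from i to j.

```latex
\begin{align}
  \mathbb{E}\bigl[T_i \mid X(0)=a,\ X(T)=b\bigr]
    &= \frac{1}{P_{ab}(T)} \int_0^T P_{ai}(t)\, P_{ib}(T-t)\, dt, \\
  \mathbb{E}\bigl[N_{ij} \mid X(0)=a,\ X(T)=b\bigr]
    &= \frac{q_{ij}}{P_{ab}(T)} \int_0^T P_{ai}(t)\, P_{jb}(T-t)\, dt,
    \qquad i \neq j .
\end{align}
```

The three algorithms compared in the abstract can be read as different ways of evaluating integrals of this type.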
Importance sampling for Lambda-coalescents in the infinitely many sites model
We present and discuss new importance sampling schemes for the approximate
computation of the sample probability of observed genetic types in the
infinitely many sites model from population genetics. More specifically, we
extend the 'classical framework', where genealogies are assumed to be governed
by Kingman's coalescent, to the more general class of Lambda-coalescents and
develop further Hobolth et al.'s (2008) idea of deriving importance sampling
schemes based on 'compressed genetrees'. The resulting schemes extend earlier
work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner
and Blath (2008) and Hobolth et al. (2008). We conclude with a performance
comparison of classical and new schemes for Beta- and Kingman coalescents. Comment: (38 pages, 40 figures)
Phase-type distributions in population genetics
Probability modelling for DNA sequence evolution is well established and
provides a rich framework for understanding genetic variation between samples
of individuals from one or more populations. We show that both classical and
more recent models for coalescence (with or without recombination) can be
described in terms of the so-called phase-type theory, where complicated and
tedious calculations are circumvented by the use of matrices. The application
of phase-type theory consists of describing the stochastic model as a Markov
model by appropriately setting up a state space and calculating the
corresponding intensity and reward matrices. Formulae of interest are then
expressed in terms of these aforementioned matrices. We illustrate this by a
few examples calculating the mean, variance and even higher order moments of
the site frequency spectrum in the multiple merger coalescent models, and by
analysing the mean and variance for the number of segregating sites for
multiple samples in the two-locus ancestral recombination graph. We believe
that phase-type theory has great potential as a tool for analysing probability
models in population genetics. The compact matrix notation is useful for
clarification of current models, in particular their formal manipulation
(calculation), but also for further development or extensions.
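To make the matrix formulation in this abstract concrete, here is a minimal sketch of one textbook case: the mean time to the most recent common ancestor under Kingman's coalescent, computed as alpha (-S)^(-1) 1 from an initial distribution alpha and a sub-intensity matrix S. The function names, the choice of states (number of remaining lineages, n down to 2), and the time scale in which each pair coalesces at rate 1 are assumptions made for this example, not details from the paper.

```python
from fractions import Fraction


def kingman_subintensity(n):
    """Sub-intensity matrix S with transient states n, n-1, ..., 2 lineages.

    With k lineages, the next coalescence occurs at rate k(k-1)/2
    (time scaled so that each pair coalesces at rate 1).
    """
    m = n - 1
    S = [[Fraction(0)] * m for _ in range(m)]
    for idx in range(m):
        k = n - idx  # number of lineages in this state
        rate = Fraction(k * (k - 1), 2)
        S[idx][idx] = -rate
        if idx + 1 < m:
            S[idx][idx + 1] = rate  # move to the state with k-1 lineages
    return S


def phase_type_mean(alpha, S):
    """Mean of a phase-type distribution: alpha * (-S)^(-1) * 1.

    Solves (-S) u = 1 by back substitution, so this sketch assumes S is
    upper triangular (true for the matrix built above).
    """
    m = len(S)
    u = [Fraction(0)] * m
    for i in reversed(range(m)):
        acc = Fraction(1) + sum(S[i][j] * u[j] for j in range(i + 1, m))
        u[i] = acc / (-S[i][i])
    return sum(a * x for a, x in zip(alpha, u))


if __name__ == "__main__":
    n = 4
    alpha = [Fraction(1)] + [Fraction(0)] * (n - 2)  # start with n lineages
    print(phase_type_mean(alpha, kingman_subintensity(n)))  # prints 3/2
```

For n = 4 the result 3/2 agrees with the classical closed form 2(1 - 1/n). The matrix route is more general, since other reward structures or coalescent models only change alpha and S.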