Search CORE

469 research outputs found

Ancestral population genomics

Author: Dutheil J.
Hobolth A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

The full genomes of several closely related species are now available, opening an emerging field of investigation that borrows from both population genetics and phylogenetics. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters, such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. We discuss the basic speciation models for closely related species, including the isolation and isolation-with-migration models. A major point in our discussion is that even a few complete genomes contain much information about the whole population: recombination unlinks genomic regions, so a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey different approaches to understanding ancestral species from analyses of genomic data from closely related species. In particular, we emphasize core assumptions and working hypotheses. Finally, we discuss computational and statistical challenges that arise in the analysis of population genomics data sets.

MPG.PuRe
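The point of the Dutheil and Hobolth abstract above, that recombination turns a few genomes into many segments with independent histories, can be illustrated with a toy calculation. The sketch below is a minimal illustration, not the method of the surveyed paper. It assumes a two-species isolation model with split time `tau` and ancestral pairwise coalescence rate `1 / two_n` (for a diploid ancestral population of effective size N, `two_n` corresponds to 2N generations), and it assumes per-locus coalescence times are observed directly, whereas in real data they are hidden and must be inferred from sequences. Both parameters are then recovered by the method of moments; all function names and parameter values are hypothetical.

```python
import random
import statistics

def simulate_isolation_tmrca(tau, two_n, n_loci, rng):
    """Toy isolation model (no migration): coalescence times at n_loci
    independent loci for one lineage sampled from each of two species.

    Lineages cannot coalesce before the split time tau; afterwards, in the
    ancestral population, they coalesce at rate 1 / two_n. Independence of
    loci stands in for the effect of recombination.
    """
    return [tau + rng.expovariate(1.0 / two_n) for _ in range(n_loci)]

def moment_estimates(times):
    """Method-of-moments estimates of (tau, two_n).

    For T = tau + Exp(mean two_n): E[T] = tau + two_n and SD[T] = two_n.
    """
    sd = statistics.stdev(times)
    return statistics.fmean(times) - sd, sd

# Hypothetical parameter values, in arbitrary time units.
rng = random.Random(42)
times = simulate_isolation_tmrca(tau=4.0, two_n=1.5, n_loci=100_000, rng=rng)
tau_hat, two_n_hat = moment_estimates(times)
```

With many independent loci both estimates land close to the true values; with a single locus the two parameters could not be separated at all, which is the sense in which a few genomes still carry population-level information.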

Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution

Author: Hobolth Asger
Stone Eric A.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2009
Field of study

Analyses of serially sampled data often begin with the assumption that the observations represent discrete samples from a latent continuous-time stochastic process. The continuous-time Markov chain (CTMC) is one such generative model whose popularity extends to a variety of disciplines ranging from computational finance to human genetics and genomics. A common theme among these diverse applications is the need to simulate sample paths of a CTMC conditional on realized data that are discretely observed. Here we present a general solution to this sampling problem when the CTMC is defined on a discrete and finite state space. Specifically, we consider the generation of sample paths, including intermediate states and times of transition, from a CTMC whose beginning and ending states are known across a time interval of length $T$. We first unify the literature through a discussion of the three predominant approaches: (1) modified rejection sampling, (2) direct sampling, and (3) uniformization. We then give analytical results for the complexity and efficiency of each method in terms of the instantaneous transition rate matrix $Q$ of the CTMC, its beginning and ending states, and the length of sampling time $T$. In doing so, we show that no method dominates the others across all model specifications, and we give explicit proof of which method prevails for any given $Q$, $T$, and endpoints. Finally, we introduce and compare three applications of CTMCs to demonstrate the pitfalls of choosing an inefficient sampler. Comment: Published at http://dx.doi.org/10.1214/09-AOAS247 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref
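The first of the three approaches listed in the Hobolth and Stone abstract, modified rejection sampling, can be sketched in a few lines. This is a minimal illustration assuming a small rate matrix Q given as nested Python lists whose rows sum to zero; it is not the authors' code, and the function names `next_state`, `forward_path` and `modified_rejection` are hypothetical. What makes it "modified" is that when the endpoints differ, at least one jump must occur, so the first jump time is drawn from an exponential distribution truncated to [0, T] before the rest of the path is simulated forward and accepted only if it ends in the required state.

```python
import math
import random

def next_state(Q, i, rng):
    """Pick the state entered when leaving i, with probability Q[i][j] / -Q[i][i]."""
    u = rng.random() * -Q[i][i]
    acc, last = 0.0, i
    for j, q in enumerate(Q[i]):
        if j == i or q <= 0:
            continue
        acc += q
        last = j
        if u < acc:
            return j
    return last  # guards against floating-point round-off in the running sum

def forward_path(Q, state, t, T, rng):
    """Forward simulation from (t, state) up to time T.
    Returns the list of (jump time, new state) and the state at time T."""
    jumps = []
    while True:
        rate = -Q[state][state]
        if rate <= 0:  # absorbing state: no further jumps
            return jumps, state
        t += rng.expovariate(rate)
        if t > T:
            return jumps, state
        state = next_state(Q, state, rng)
        jumps.append((t, state))

def modified_rejection(Q, a, b, T, rng=random, max_tries=100_000):
    """Sample a CTMC path on [0, T] conditioned on X(0) = a and X(T) = b."""
    rate_a = -Q[a][a]
    if a != b and rate_a <= 0:
        raise ValueError("state a is absorbing, so b is unreachable")
    for _ in range(max_tries):
        if a == b:
            jumps, end = forward_path(Q, a, 0.0, T, rng)
        else:
            # Inverse CDF of Exp(rate_a) truncated to [0, T]: forces a first jump.
            u = rng.random()
            t1 = -math.log(1.0 - u * (1.0 - math.exp(-rate_a * T))) / rate_a
            s = next_state(Q, a, rng)
            rest, end = forward_path(Q, s, t1, T, rng)
            jumps = [(t1, s)] + rest
        if end == b:  # accept only paths that hit the required end state
            return jumps
    raise RuntimeError("no path accepted; consider direct sampling or uniformization")
```

For example, `modified_rejection([[-1.0, 1.0], [2.0, -2.0]], 0, 1, 1.5, random.Random(0))` returns a list of jump times and states ending in state 1. Acceptance can become rare when the end state is unlikely given T, which is the kind of case where, per the abstract, another sampler may be preferable.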

Border control cooperation in the European Union: the Schengen visa policy in practice

Author: Hobolth Mogens
Publication date: 01/12/2012

This research project investigates the governing of Europe’s external border. It analyses how the common Schengen short-stay visa policy has been applied in practice by member states in the period from 2005 to 2010. So far, little systematic theoretical and empirical research has been carried out on the implementation of Schengen. The contributions of the thesis are two-fold. Firstly, it makes available a comprehensive and easily accessible database on the visa requirements, issuing-practices and consular representation of EU states in all third countries. It enables researchers to map out and compare how restrictively the visa policy is implemented by different member states and across sending countries. Secondly, the project provides three separate papers that in different ways make use of the database to explore and explain the varying openness of Europe’s border and dynamics of cooperation among member states. The three papers are tied together by a framework conceptualising Schengen as a border regime with two key dimensions: restrictiveness and integration. The first paper asks to what extent, and why, Europe’s border is more open to visitors of some nationalities rather than others. The second paper investigates to what extent, and why, EU states cooperate on sharing consular facilities in the visa-issuing process. The third paper examines to what extent, and why, Schengen participation has a restrictive impact on the visa-issuing practices of member countries. The analyses test existing theories and develop new concepts and models. The three papers engage with rationalist and constructivist theories and seek to assess their relative explanatory power. In doing so, the project makes use of different quantitative comparative approaches. It employs regression analysis, social network analytical tools and quasi-experimental design. 
Overall, the thesis concludes that Schengen is characterized by extensive cooperation and by restrictive practices, especially towards visitors from poor, Muslim-majority and refugee-producing countries.

LSE Theses Online

Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains

Author: Hobolth Asger
Tataru Paula
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Background: Continuous-time Markov chains (CTMCs) are a widely used model for describing the evolution of DNA sequences at the nucleotide, amino acid or codon level. The sufficient statistics for CTMCs are the time spent in each state and the number of changes between any two states. In applications, past evolutionary events (exact times and types of changes) are inaccessible, and the past must be inferred from DNA sequence data observed in the present.

Results: We describe and implement three algorithms for computing linear combinations of expected values of the sufficient statistics, conditioned on the end-points of the chain, and compare their performance with respect to accuracy and running time. The first algorithm is based on an eigenvalue decomposition of the rate matrix (EVD), the second on uniformization (UNI), and the third on integrals of matrix exponentials (EXPM). The R implementation of the algorithms is available at http://www.birc.au.dk/~paula/.

Conclusions: We use two different models to analyze the accuracy and eight experiments to investigate the speed of the three algorithms. We find that they have similar accuracy and that EXPM is the slowest method. Furthermore, we find that UNI is usually faster than EVD.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central
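The uniformization (UNI) idea from the Hobolth and Tataru entry can be illustrated with a small pure-Python sketch. It is not the authors' R implementation: the function name, the list-based matrices and the heuristic truncation point for the Poisson sum are assumptions made for illustration. It relies on the standard uniformization identity: with mu at least as large as every exit rate and R = I + Q / mu, P(t) = sum_n Pois(n; mu t) R^n, so the integral of P_ac(t) P_db(T - t) over [0, T] becomes a double sum over powers of R. Dividing by P_ab(T) gives the conditional expected dwell time in state c (taking d = c), and multiplying by Q[c][d] gives the conditional expected number of c -> d jumps.

```python
import math

def conditional_expectations(Q, a, b, T):
    """Expected dwell times and expected c -> d jump counts on [0, T],
    conditioned on X(0) = a and X(T) = b, via uniformization.

    Uses: int_0^T P_ac(t) P_db(T - t) dt
        = T * sum_n Pois(n; mu T) / (n + 1) * sum_{m=0}^{n} (R^m)_ac (R^(n-m))_db
    """
    S = len(Q)
    mu = max(-Q[i][i] for i in range(S))
    if mu <= 0:
        raise ValueError("Q has no transitions")
    R = [[(1.0 if i == j else 0.0) + Q[i][j] / mu for j in range(S)] for i in range(S)]
    # Truncate the Poisson sum well into its upper tail (a heuristic choice).
    n_max = int(mu * T + 10.0 * math.sqrt(mu * T) + 20)
    rows = [[1.0 if j == a else 0.0 for j in range(S)]]  # rows[m] = e_a^T R^m
    cols = [[1.0 if i == b else 0.0 for i in range(S)]]  # cols[k] = R^k e_b
    for _ in range(n_max):
        r, c = rows[-1], cols[-1]
        rows.append([sum(r[k] * R[k][j] for k in range(S)) for j in range(S)])
        cols.append([sum(R[i][k] * c[k] for k in range(S)) for i in range(S)])
    pois = [math.exp(-mu * T)]
    for n in range(1, n_max + 1):
        pois.append(pois[-1] * mu * T / n)
    p_ab = sum(pois[n] * rows[n][b] for n in range(n_max + 1))  # P_ab(T)

    def integral(c, d):
        return T * sum(
            pois[n] / (n + 1) * sum(rows[m][c] * cols[n - m][d] for m in range(n + 1))
            for n in range(n_max + 1)
        )

    dwell = [integral(c, c) / p_ab for c in range(S)]
    jumps = [[0.0 if c == d else Q[c][d] * integral(c, d) / p_ab for d in range(S)]
             for c in range(S)]
    return dwell, jumps
```

Two sanity checks follow from the definitions: the expected dwell times sum to T, and in a two-state chain conditioned to go from state 0 to state 1, the expected number of 0 -> 1 jumps exceeds the expected number of 1 -> 0 jumps by exactly one.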

Importance sampling for Lambda-coalescents in the infinitely many sites model

Author: Birkner Matthias
Blath Jochen
Steinrücken Matthias
Publication venue: 'Elsevier BV'
Publication date: 09/05/2011
Field of study

We present and discuss new importance sampling schemes for the approximate computation of the sample probability of observed genetic types in the infinitely many sites model from population genetics. More specifically, we extend the 'classical framework', where genealogies are assumed to be governed by Kingman's coalescent, to the more general class of Lambda-coalescents, and we further develop the idea of Hobolth et al. (2008) of deriving importance sampling schemes based on 'compressed genetrees'. The resulting schemes extend earlier work by Griffiths and Tavaré (1994), Stephens and Donnelly (2000), Birkner and Blath (2008) and Hobolth et al. (2008). We conclude with a performance comparison of classical and new schemes for Beta- and Kingman coalescents. Comment: (38 pages, 40 figures)

arXiv.org e-Print Archive

Crossref
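Unlike Kingman's coalescent, the Lambda-coalescents in the Birkner, Blath and Steinrücken entry allow more than two lineages to merge at once. As background to the Beta-coalescent comparison mentioned there, the sketch below computes merger rates, assuming the common parametrization Lambda = Beta(2 - alpha, alpha) with 0 < alpha < 2, for which the general rate lambda_{b,k} = integral of x^(k-2) (1 - x)^(b-k) Lambda(dx) reduces to a ratio of Beta functions. It does not implement the paper's importance sampling schemes; the function names are hypothetical. The case alpha = 1 is the Bolthausen-Sznitman coalescent, and Kingman's coalescent is the limit alpha -> 2.

```python
import math

def beta_coalescent_rate(b, k, alpha):
    """Rate lambda_{b,k} at which one specific group of k of the b blocks merges
    in the Beta(2 - alpha, alpha)-coalescent:
    lambda_{b,k} = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    if not (0 < alpha < 2 and 2 <= k <= b):
        raise ValueError("need 0 < alpha < 2 and 2 <= k <= b")

    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

    return math.exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))

def total_merger_rate(b, alpha):
    """Total rate of leaving a state with b blocks: sum_k C(b, k) * lambda_{b,k}."""
    return sum(math.comb(b, k) * beta_coalescent_rate(b, k, alpha) for k in range(2, b + 1))

def merger_size_distribution(b, alpha):
    """Probability that the next merger involves exactly k of the b blocks."""
    total = total_merger_rate(b, alpha)
    return {k: math.comb(b, k) * beta_coalescent_rate(b, k, alpha) / total
            for k in range(2, b + 1)}
```

The rates satisfy the consistency relation lambda_{b,k} = lambda_{b+1,k} + lambda_{b+1,k+1} that every Lambda-coalescent obeys, and for alpha = 1 the total rate from b blocks is b - 1.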

Transgovernmental networks in the European Union: Improving compliance effectively?

Author: Hobolth Mogens
Martinsen Dorte Sindbjerg
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2013
Field of study

Copenhagen University Research Information System

The Effectiveness of Transgovernmental Networks: Managing the Practical Application of European Integration in the case of Solvit

Author: Hobolth Mogens
Martinsen Dorte Sindbjerg
Publication venue: 'Edward Elgar Publishing'
Publication date: 01/01/2016
Field of study

Copenhagen University Research Information System

Phase-type distributions in population genetics

Author: Bladt Mogens
Hobolth Asger
Siri-Jégousse Arno
Publication venue: 'Elsevier BV'
Publication date: 04/06/2018
Field of study

Probability modelling for DNA sequence evolution is well established and provides a rich framework for understanding genetic variation between samples of individuals from one or more populations. We show that both classical and more recent models for coalescence (with or without recombination) can be described in terms of so-called phase-type theory, where complicated and tedious calculations are circumvented by the use of matrices. Applying phase-type theory consists of describing the stochastic model as a Markov model by appropriately setting up a state space and calculating the corresponding intensity and reward matrices. Formulae of interest are then expressed in terms of these matrices. We illustrate this with a few examples: calculating the mean, variance and even higher-order moments of the site frequency spectrum in multiple merger coalescent models, and analysing the mean and variance of the number of segregating sites for multiple samples in the two-locus ancestral recombination graph. We believe that phase-type theory has great potential as a tool for analysing probability models in population genetics. The compact matrix notation is useful for clarifying current models, in particular their formal manipulation (calculation), but also for further developments or extensions.

arXiv.org e-Print Archive

Copenhagen University Research Information System
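The recipe in the Bladt, Hobolth and Siri-Jégousse abstract (set up a state space, build the intensity and reward matrices, then read off moments) can be illustrated on the simplest case. The sketch below covers only Kingman's coalescent for a single sample of n lineages, not the multiple-merger or two-locus models treated in the paper. It assumes the block-counting state space k = n, ..., 2 with a sub-intensity matrix S, and uses the standard reward-transformed phase-type moments E[Y] = alpha U r and E[Y^2] = 2 alpha U diag(r) U r with U = (-S)^{-1}. A reward of 1 per state gives the time to the most recent common ancestor; a reward of k in state k gives the total branch length, which multiplied by theta / 2 gives the expected number of segregating sites. Helper names are hypothetical, and exact `Fraction` arithmetic is used so the results can be compared with closed forms.

```python
from fractions import Fraction
from math import comb

def kingman_subintensity(n):
    """States k = n, n-1, ..., 2 lineages and the sub-intensity matrix S.
    With k lineages, coalescence happens at rate C(k, 2) and leads to k - 1;
    leaving k = 2 means absorption (the most recent common ancestor)."""
    states = list(range(n, 1, -1))
    m = len(states)
    S = [[Fraction(0)] * m for _ in range(m)]
    for i, k in enumerate(states):
        S[i][i] = Fraction(-comb(k, 2))
        if i + 1 < m:
            S[i][i + 1] = Fraction(comb(k, 2))
    return states, S

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed nonsingular)."""
    m = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

def reward_mean_var(alpha, S, reward):
    """Mean and variance of Y = integral of reward(X_t) dt until absorption."""
    neg_s = [[-x for x in row] for row in S]
    x = solve(neg_s, reward)                                 # U r
    mean = sum(a * xi for a, xi in zip(alpha, x))
    y = solve(neg_s, [r * xi for r, xi in zip(reward, x)])   # U diag(r) U r
    second = 2 * sum(a * yi for a, yi in zip(alpha, y))
    return mean, second - mean ** 2

states, S = kingman_subintensity(5)
alpha = [Fraction(1)] + [Fraction(0)] * (len(states) - 1)    # start with n lineages
tmrca = reward_mean_var(alpha, S, [Fraction(1)] * len(states))
total_length = reward_mean_var(alpha, S, [Fraction(k) for k in states])
```

For n = 5 this reproduces the classical results E[T_MRCA] = 2(1 - 1/n) = 8/5 and, for the total branch length, mean 2 * sum_{i<n} 1/i = 25/6 and variance 4 * sum_{i<n} 1/i^2 = 205/36. Only the state space and the two matrices change when moving to richer models, which is the point the abstract makes.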