    Ancestral population genomics

    The full genomes of several closely related species are now available, opening an emerging field of investigation that borrows from both population genetics and phylogenetics. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters, such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. We discuss the basic speciation models for closely related species, including the isolation and isolation-with-migration models. A major point in our discussion is that even a few complete genomes contain much information about the whole population. The reason is that recombination unlinks genomic regions, so a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey different approaches for understanding ancestral species from analyses of genomic data from closely related species. In particular, we emphasize core assumptions and working hypotheses. Finally, we discuss computational and statistical challenges that arise in the analysis of population genomics data sets.
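
As a rough illustration of why segments with distinct histories carry information about the ancestral population, the sketch below simulates per-locus coalescence times for one lineage from each of two species under a strict isolation model. It assumes a constant diploid ancestral size, no migration, and independent loci; the function name and parameters are hypothetical and are not taken from the paper.

```python
import random

def sample_divergence_times(split_time, ancestral_size, n_loci, rng=None):
    """Per-locus coalescence times (in generations) under strict isolation.

    Assumptions (illustrative only): two lineages, one per species, cannot
    coalesce before the split; in the ancestral population they coalesce
    after an exponential waiting time with mean 2 * ancestral_size.
    """
    rng = rng or random.Random()
    mean_wait = 2.0 * ancestral_size
    return [split_time + rng.expovariate(1.0 / mean_wait)
            for _ in range(n_loci)]

if __name__ == "__main__":
    times = sample_divergence_times(1000.0, 500.0, 20000, random.Random(0))
    # The minimum approaches the split time; the mean approaches
    # split_time + 2 * ancestral_size.
    print(min(times), sum(times) / len(times))
```

Under these assumptions the spread of times across loci reflects the ancestral population size, while their lower bound reflects the split time, which is why many unlinked segments from a few genomes can separate the two parameters.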

    Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution

    Analyses of serially-sampled data often begin with the assumption that the observations represent discrete samples from a latent continuous-time stochastic process. The continuous-time Markov chain (CTMC) is one such generative model whose popularity extends to a variety of disciplines ranging from computational finance to human genetics and genomics. A common theme among these diverse applications is the need to simulate sample paths of a CTMC conditional on realized data that are discretely observed. Here we present a general solution to this sampling problem when the CTMC is defined on a discrete and finite state space. Specifically, we consider the generation of sample paths, including intermediate states and times of transition, from a CTMC whose beginning and ending states are known across a time interval of length T. We first unify the literature through a discussion of the three predominant approaches: (1) modified rejection sampling, (2) direct sampling, and (3) uniformization. We then give analytical results for the complexity and efficiency of each method in terms of the instantaneous transition rate matrix Q of the CTMC, its beginning and ending states, and the length of sampling time T. In doing so, we show that no method dominates the others across all model specifications, and we give explicit proof of which method prevails for any given Q, T, and endpoints. Finally, we introduce and compare three applications of CTMCs to demonstrate the pitfalls of choosing an inefficient sampler. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOAS247
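
To make the sampling problem concrete, here is a minimal sketch of naive rejection sampling: simulate the chain forward from the known starting state and keep the path only if it ends in the required state at time T. This is a simplified stand-in, not the modified rejection sampler, direct sampler, or uniformization scheme analysed in the paper, and the function names are hypothetical.

```python
import random

def simulate_path(Q, start, T, rng):
    """Forward-simulate a CTMC with rate matrix Q from `start` over [0, T].

    Returns a list of (time, state) pairs beginning with (0.0, start).
    """
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        rate = -Q[state][state]        # total rate of leaving `state`
        if rate <= 0.0:                # absorbing state: no further jumps
            return path
        t += rng.expovariate(rate)     # exponential waiting time
        if t >= T:
            return path
        # Pick the next state in proportion to the off-diagonal rates.
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))

def sample_conditioned_path(Q, start, end, T, rng=None, max_tries=100_000):
    """Naive rejection sampler: retry until the path ends in `end` at time T."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        path = simulate_path(Q, start, T, rng)
        if path[-1][1] == end:
            return path
    raise RuntimeError("no accepted path; the endpoint may be too unlikely")

if __name__ == "__main__":
    Q = [[-1.0, 1.0], [2.0, -2.0]]     # toy two-state rate matrix
    print(sample_conditioned_path(Q, 0, 1, 1.0, random.Random(1)))
```

The acceptance probability of this naive scheme equals the transition probability from the starting state to the ending state over time T, so it becomes slow when that probability is small, which illustrates the kind of inefficiency the abstract warns about.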

    Extreme selective sweeps independently targeted the X chromosomes of the great apes

    The unique inheritance pattern of the X chromosome exposes it to natural selection in a way that is different from that of the autosomes, potentially resulting in accelerated evolution. We perform a comparative analysis of X chromosome polymorphism in 10 great ape species, including humans. In most species, we identify striking megabase-wide regions, where nucleotide diversity is less than 20% of the chromosomal average. Such regions are found exclusively on the X chromosome. The regions overlap partially among species, suggesting that the underlying targets are partly shared among species. The regions have higher proportions of singleton SNPs, higher levels of population differentiation, and a higher nonsynonymous-to-synonymous substitution ratio than the rest of the X chromosome. We show that the extent to which diversity is reduced is incompatible with direct selection or the action of background selection and soft selective sweeps alone, and therefore, we suggest that very strong selective sweeps have independently targeted these specific regions in several species. The only genomic feature that we can identify as strongly associated with loss of diversity is the location of testis-expressed ampliconic genes, which also have reduced diversity around them. We hypothesize that these genes may be responsible for selective sweeps in the form of meiotic drive caused by an intragenomic conflict in male meiosis.
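
The "less than 20% of the chromosomal average" criterion can be sketched as a windowed scan of nucleotide diversity. The example below is a toy version assuming short, gap-free aligned sequences held in memory and non-overlapping windows; the function names and the window scheme are hypothetical and do not reproduce the pipeline used in the study.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * n_sites)

def low_diversity_windows(seqs, window, threshold=0.2):
    """Return (start, end) windows whose diversity is below threshold * average."""
    average = nucleotide_diversity(seqs)
    flagged = []
    # Non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(seqs[0]) - window + 1, window):
        chunk = [s[start:start + window] for s in seqs]
        if nucleotide_diversity(chunk) < threshold * average:
            flagged.append((start, start + window))
    return flagged

if __name__ == "__main__":
    seqs = ["AAAAACGT", "AAAATCGA", "AAAAACGA"]
    print(low_diversity_windows(seqs, window=4))
```

Real analyses would additionally need to handle missing data, callable-site masks, and megabase-scale windows, none of which are modelled here.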

    Blood ties: ABO is a trans-species polymorphism in primates

    The ABO histo-blood group, the critical determinant of transfusion incompatibility, was the first genetic polymorphism discovered in humans. Remarkably, ABO antigens are also polymorphic in many other primates, with the same two amino acid changes responsible for A and B specificity in all species sequenced to date. Whether this recurrence of A and B antigens is the result of an ancient polymorphism maintained across species or due to numerous, more recent instances of convergent evolution has been debated for decades, with a current consensus in support of convergent evolution. We show instead that genetic variation data in humans and gibbons as well as in Old World Monkeys are inconsistent with a model of convergent evolution and support the hypothesis of an ancient, multi-allelic polymorphism of which some alleles are shared by descent among species. These results demonstrate that the ABO polymorphism is a trans-species polymorphism among distantly related species and has remained under balancing selection for tens of millions of years. It is, to date, the only such example in Hominoids and Old World Monkeys outside of the Major Histocompatibility Complex. Comment: 45 pages, 4 Figures, 4 Supplementary Figures, 5 Supplementary Tables

    Borrowing from both population genetics and phylogenetics, the field of population genomics emerged as the full genomes of several closely related species became available. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. With the advent of resequencing technologies, genome-wide patterns of diversity in extant populations have now come to complement this picture, offering increasing power to study more recent genetic history.

    Different paths to the modern state in Europe: the interaction between domestic political economy and interstate competition

    Theoretical work on state formation and capacity has focused mostly on early modern Europe and on the experience of western European states during this period. While a number of European states monopolized domestic tax collection and achieved gains in state capacity during the early modern era, for others revenues stagnated or even declined, and these variations motivated alternative hypotheses for determinants of fiscal and state capacity. In this study we test the basic hypotheses in the existing literature, making use of the large data set we have compiled for all of the leading states across the continent. We find strong empirical support for two prevailing threads in the literature, arguing respectively that interstate wars and changes in economic structure towards an urbanized economy had a positive fiscal impact. Regarding the main point of contention in the theoretical literature, whether it was representative or authoritarian political regimes that facilitated the gains in fiscal capacity, we do not find conclusive evidence that one performed better than the other. Instead, the empirical evidence we have gathered lends support to the hypothesis that, when under pressure of war, the fiscal performance of representative regimes was better in the more urbanized-commercial economies and the fiscal performance of authoritarian regimes was better in rural-agrarian economies.

    Fast and Robust Characterization of Time-Heterogeneous Sequence Evolutionary Processes Using Substitution Mapping

    Genes and genomes do not evolve similarly in all branches of the tree of life. Detecting and characterizing the heterogeneity in time, and between lineages, of the nucleotide (or amino acid) substitution process is an important goal of current molecular evolutionary research. This task is typically achieved through the use of non-homogeneous models of sequence evolution, which, being highly parametrized and computationally demanding, are not appropriate for large-scale analyses. Here we investigate an alternative methodological option based on probabilistic substitution mapping. The idea is to first reconstruct the substitutional history of each site of an alignment under a homogeneous model of sequence evolution, then to characterize variations in the substitution process across lineages based on substitution counts. Using simulated and published datasets, we demonstrate that probabilistic substitution mapping is robust in that it typically provides accurate reconstruction of sequence ancestry even when the true process is heterogeneous but a homogeneous model is adopted. Consequently, we show that the new approach is essentially as efficient as existing methods and much faster (up to 25 000 times), thus paving the way for a systematic survey of substitution process heterogeneity across genes and lineages.

    A New Malaria Agent in African Hominids

    Plasmodium falciparum is the major human malaria agent responsible for 200 to 300 million infections and one to three million deaths annually, mainly among African infants. The origin and evolution of this pathogen within the human lineage is still unresolved. A single species, P. reichenowi, which infects chimpanzees, is known to be a close sister lineage of P. falciparum. Here we report the discovery of a new Plasmodium species infecting Hominids. This new species has been isolated in two chimpanzees (Pan troglodytes) kept as pets by villagers in Gabon (Africa). Analysis of its complete mitochondrial genome (5529 nucleotides including Cyt b, Cox I and Cox III genes) reveals an older divergence of this lineage from the clade that includes P. falciparum and P. reichenowi (∼21±9 Myrs ago using Bayesian methods and considering that the divergence between P. falciparum and P. reichenowi occurred 4 to 7 million years ago as generally considered in the literature). This time frame would be congruent with the radiation of hominoids, suggesting that this Plasmodium lineage might have been present in early hominoids and that they may both have experienced a simultaneous diversification. Investigation of the nuclear genome of this new species will further the understanding of the genetic adaptations of P. falciparum to humans. The risk of transfer and emergence of this new species in humans must now be seriously considered, given that it was found in two chimpanzees living in contact with humans and given its close relatedness to the most virulent agent of malaria.

    Auxiliary variables for Bayesian inference in multi-class queueing networks

    Queueing networks describe complex stochastic systems of both theoretical and practical interest. They provide the means to assess alterations, diagnose poor performance and evaluate robustness across sets of interconnected resources. In the present paper, we focus on the underlying continuous-time Markov chains induced by these networks, and we present a flexible method for parameter inference in multi-class Markovian cases with switching and different service disciplines. The approach is directed towards the inferential problem with missing data, where transition paths of individual tasks among the queues are often unknown. The paper introduces a slice sampling technique with mappings to the measurable space of task transitions between the service stations. This can address time and tractability issues in computational procedures, handle prior system knowledge and overcome common restrictions on service rates across existing inferential frameworks. Finally, the proposed algorithm is validated on synthetic data and applied to a real data set, obtained from a service delivery tasking tool implemented in two university hospitals.