993 research outputs found

    Evolutionary Inference via the Poisson Indel Process

    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classical evolutionary process, the TKF91 model, is a continuous-time Markov chain model comprising insertion, deletion and substitution events. Unfortunately, this model gives rise to an intractable computational problem: the computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a new stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The new model is closely related to the TKF91 model, differing only in its treatment of insertions, but the new model has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared to separate inference of phylogenies and alignments. Comment: 33 pages, 6 figures

    Novelty And Surprises In Complex Adaptive System (CAS) Dynamics: A Computational Theory of Actor Innovation

    The work of John von Neumann in the 1940s on self-reproducing machines as models for biological systems and self-organized complexity provides the computational legacy for CAS. Following this, the major hypothesis emanating from Wolfram (1984), Langton (1992, 1994), Kauffman (1993) and Casti (1994) is that the sine qua non of complex adaptive systems is their capacity to produce novelty or 'surprises' and the so-called Type IV innovation-based, structure-changing dynamics of the Wolfram-Chomsky schema. The Wolfram-Chomsky schema postulates that on varying the computational capabilities of agents, different system-wide dynamics can be generated: finite automata produce Type I dynamics with unique limit points or homogeneity; pushdown automata produce Type II dynamics with limit cycles; linear bounded automata generate Type III chaotic trajectories with strange attractors. The significance of this schema is that it postulates that only agents with the full powers of Turing Machines capable of simulating other Turing Machines, which Wolfram calls computational universality, can produce Type IV irregular, innovation-based, structure-changing dynamics associated with the three main natural exponents of CAS: evolutionary biology, immunology and capitalist growth. Langton (1990, 1992) identifies the above complexity classes for dynamical systems with the halting problem of Turing machines and famously calls the phase transition or the domain on which novel objects emerge 'life at the edge of chaos'. This paper develops the formal foundations for the emergence of novelty or innovation. Remarkably, following Binmore (1987), who first introduced to game theory the requisite dose of mechanism with players modelled as Turing Machines, with the Gödel (1931) logic involving the Liar or the pure logic of opposition, we will see that only agents qua universal Turing Machines which can make self-referential calculation of hostile objectives can bring about adaptive novelty or strategic innovation.

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package (http://biowiki.org/DART). Comment: 34 pages, 3 figures, glossary of XRate model terminology

    Complex type 4 structure changing dynamics of digital agents: Nash equilibria of a game with arms race in innovations

    The new digital economy has renewed interest in how digital agents can innovate. This follows the legacy of John von Neumann's dynamical systems theory on complex biological systems as computation. The Gödel-Turing-Post (GTP) logic is shown to be necessary to generate the innovation-based, structure-changing Type 4 dynamics of the Wolfram-Chomsky schema. Two syntactic procedures of GTP logic permit digital agents to exit from listable sets of digital technologies to produce novelty and surprises. The first is meta-analyses or offline simulations. The second is a fixed point with a two-place encoding of negation or opposition, referred to as the Gödel sentence. It is postulated that in phenomena ranging from the genome to human proteanism, the Gödel sentence is a ubiquitous syntactic construction without which escape from hostile agents qua the Liar is impossible and digital agents become entrained within fixed repertoires. The only recursive best response function of a 2-person adversarial game that can implement strategic innovation in lock-step formation of an arms race is the productive function of the Emil Post [58] set-theoretic proof of the Gödel incompleteness result. This overturns the view of game theorists that surprise and innovation cannot be a Nash equilibrium of a game.

    Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective

    Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost-efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts. Comment: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue

    Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

    The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes. Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434

    Complex event types for agent-based simulation

    This thesis presents a novel formal modelling language, complex event types (CETs), to describe behaviours in agent-based simulations. CETs are able to describe behaviours at any computationally represented level of abstraction. Behaviours can be specified both in terms of the state transition rules of the agent-based model that generate them and in terms of the state transition structures themselves. Based on CETs, novel computational statistical methods are introduced which allow statistical dependencies between behaviours at different levels to be established. Different dependencies formalise different probabilistic causal relations and Complex Systems constructs such as ‘emergence’ and ‘autopoiesis’. Explicit links are also made between the different types of CET inter-dependency and the theoretical assumptions they represent. With the novel computational statistical methods, three categories of model can be validated and discovered: (i) inter-level models, which define probabilistic dependencies between behaviours at different levels; (ii) multi-level models, which define the set of simulations for which an inter-level model holds; (iii) inferred predictive models, which define latent relationships between behaviours at different levels. The CET modelling language and computational statistical methods are then applied to a novel agent-based model of Colonic Cancer to demonstrate their applicability to Complex Systems sciences such as Systems Biology. This proof-of-principle model provides a framework for further development of a detailed integrative model of the system, which can progressively incorporate biological data from different levels and scales as these become available.