Evolutionary Inference via the Poisson Indel Process
We address the problem of the joint statistical inference of phylogenetic
trees and multiple sequence alignments from unaligned molecular sequences. This
problem is generally formulated in terms of string-valued evolutionary
processes along the branches of a phylogenetic tree. The classical evolutionary
process, the TKF91 model, is a continuous-time Markov chain model comprised of
insertion, deletion and substitution events. Unfortunately this model gives
rise to an intractable computational problem---the computation of the marginal
likelihood under the TKF91 model is exponential in the number of taxa. In this
work, we present a new stochastic process, the Poisson Indel Process (PIP), in
which the complexity of this computation is reduced to linear. The new model is
closely related to the TKF91 model, differing only in its treatment of
insertions, but the new model has a global characterization as a Poisson
process on the phylogeny. Standard results for Poisson processes allow key
computations to be decoupled, which yields the favorable computational profile
of inference under the PIP model. We present illustrative experiments in which
Bayesian inference under the PIP model is compared to separate inference of
phylogenies and alignments.
Comment: 33 pages, 6 figures
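The computational gain described above rests on per-column likelihood recursions of the Felsenstein-pruning family. As an illustrative sketch only, not the PIP recursion itself (which adds Poisson terms for insertion placement), the following computes one column's likelihood on a toy three-leaf star tree under the Jukes-Cantor substitution model; branch length and states are arbitrary:

```python
import math

def jc_prob(t):
    """Jukes-Cantor transition probabilities after time t (substitution rate 1)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def column_likelihood(leaf_states, t=0.1):
    """Felsenstein-style pruning for one alignment column on a star tree:
    a root with one branch of length t to each leaf, uniform root prior."""
    P = jc_prob(t)
    like = 0.0
    for root_state in range(4):
        partial = 0.25  # uniform prior over A, C, G, T at the root
        for s in leaf_states:
            partial *= P[root_state][s]
        like += partial
    return like

conserved = column_likelihood([0, 0, 0])  # identical residue at every leaf
divergent = column_likelihood([0, 1, 2])  # three different residues
```

A conserved column scores far higher than a discordant one; this per-column quantity is what the pruning recursion propagates up a real tree, and what PIP's Poisson machinery lets one reuse with only linear cost in the number of taxa.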
Novelty And Surprises In Complex Adaptive System (CAS) Dynamics: A Computational Theory of Actor Innovation
The work of John von Neumann in the 1940s on self-reproducing machines as models for biological systems and self-organized complexity provides the computational legacy for CAS. Following this, the major hypothesis emanating from Wolfram (1984), Langton (1992, 1994), Kauffman (1993) and Casti (1994) is that the sine qua non of complex adaptive systems is their capacity to produce novelty or 'surprises', and the so-called Type IV innovation-based, structure-changing dynamics of the Wolfram-Chomsky schema. The Wolfram-Chomsky schema postulates that varying the computational capabilities of agents generates different system-wide dynamics: finite automata produce Type I dynamics with unique limit points or homogeneity; pushdown automata produce Type II dynamics with limit cycles; linear bounded automata generate Type III chaotic trajectories with strange attractors. The significance of this schema is its postulate that only agents with the full powers of Turing Machines capable of simulating other Turing Machines, which Wolfram calls computational universality, can produce the Type IV irregular, innovation-based, structure-changing dynamics associated with the three main natural exponents of CAS: evolutionary biology, immunology and capitalist growth. Langton (1990, 1992) identifies the above complexity classes for dynamical systems with the halting problem of Turing machines and famously calls the phase transition, the domain on which novel objects emerge, 'life at the edge of chaos'. This paper develops the formal foundations for the emergence of novelty or innovation.
Remarkably, following Binmore (1987), who first introduced to game theory the requisite dose of mechanism, with players modelled as Turing Machines and the Gödel (1931) logic involving the Liar, or the pure logic of opposition, we will see that only agents qua universal Turing Machines, which can make self-referential calculations of hostile objectives, can bring about adaptive novelty or strategic innovation.
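The dynamic types of the Wolfram-Chomsky schema are most easily seen in Wolfram's elementary cellular automata. The sketch below is a toy illustration, not part of either paper's apparatus; rule numbers and class assignments follow Wolfram's standard examples: rule 250 (an OR-like rule) falls into a repeating, Type I/II-like pattern almost immediately, while rule 110, known to be computationally universal, does not revisit a state within the same window:

```python
def step(cells, rule):
    """One update of an elementary cellular automaton (periodic boundary):
    each cell's next state is the rule's bit indexed by its 3-cell neighborhood."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]

def evolve(rule, steps=60, width=64):
    """Run `steps` updates from a single seeded cell; return the state history."""
    cells = [0] * width
    cells[width // 2] = 1
    history = []
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(tuple(cells))
    return history

# Rule 250: the configuration sequence repeats within the window (simple dynamics).
repeats_250 = (lambda h: len(set(h)) < len(h))(evolve(250))
# Rule 110: every configuration in the window is distinct (complex, Type IV-like).
repeats_110 = (lambda h: len(set(h)) < len(h))(evolve(110))
```

Distinguishing Type IV from mere chaos is of course undecidable in general; the halting-problem connection drawn by Langton is exactly why no finite window can certify universality.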
Developing and applying heterogeneous phylogenetic models with XRate
Modeling sequence evolution on phylogenetic trees is a useful technique in
computational biology. Especially powerful are models which take account of the
heterogeneous nature of sequence evolution according to the "grammar" of the
encoded gene features. However, beyond a modest level of model complexity,
manual coding of models becomes prohibitively labor-intensive. We demonstrate,
via a set of case studies, the new built-in model-prototyping capabilities of
XRate (macros and Scheme extensions). These features allow rapid implementation
of phylogenetic models which would have previously been far more
labor-intensive. XRate's new capabilities for lineage-specific models,
ancestral sequence reconstruction, and improved annotation output are also
discussed. XRate's flexible model-specification capabilities and computational
efficiency make it well-suited to developing and prototyping phylogenetic
grammar models. XRate is available as part of the DART software package:
http://biowiki.org/DART .
Comment: 34 pages, 3 figures, glossary of XRate model terminology
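XRate's models are specified as S-expression grammar files, and its macro system exists precisely to avoid hand-writing large rate matrices. The sketch below is generic Python, not XRate syntax, illustrating the kind of programmatic matrix construction that such macros automate: a 61x61 codon rate-matrix skeleton built from single-nucleotide change rules (transition/transversion bias only; omega and codon frequencies omitted for brevity):

```python
import itertools

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}
# The 61 sense codons of the standard genetic code.
CODONS = [''.join(c) for c in itertools.product(BASES, repeat=3)
          if ''.join(c) not in STOPS]

def is_transition(a, b):
    """A<->G and C<->T are transitions; all other changes are transversions."""
    return {a, b} in ({"A", "G"}, {"C", "T"})

def build_rates(kappa=2.0):
    """Codon rate-matrix skeleton: only single-nucleotide changes get a
    nonzero rate, transitions are scaled by kappa, and each diagonal entry
    is set so its row sums to zero (a valid CTMC generator)."""
    n = len(CODONS)
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(CODONS):
        for j, cj in enumerate(CODONS):
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) == 1:  # codons differing at exactly one position
                a, b = diffs[0]
                Q[i][j] = kappa if is_transition(a, b) else 1.0
        Q[i][i] = -sum(Q[i])
    return Q

Q = build_rates()
```

Writing the 3,721 entries of such a matrix by hand is exactly the "prohibitively labor-intensive" coding the abstract refers to; a dozen lines of macro-style generation replace it.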
Complex type 4 structure changing dynamics of digital agents: Nash equilibria of a game with arms race in innovations
The new digital economy has renewed interest in how digital agents can innovate. This follows the legacy of John von Neumann's dynamical systems theory of complex biological systems as computation. The Gödel-Turing-Post (GTP) logic is shown to be necessary to generate the innovation-based, structure-changing Type 4 dynamics of the Wolfram-Chomsky schema. Two syntactic procedures of GTP logic permit digital agents to exit from listable sets of digital technologies to produce novelty and surprises. The first is meta-analysis, or offline simulation. The second is a fixed point with a two-place encoding of negation or opposition, referred to as the Gödel sentence. It is postulated that in phenomena ranging from the genome to human proteanism, the Gödel sentence is a ubiquitous syntactic construction without which escape from hostile agents qua the Liar is impossible and digital agents become entrained within fixed repertoires. The only recursive best-response function of a 2-person adversarial game that can implement strategic innovation in the lock-step formation of an arms race is the productive function of the Emil Post [58] set-theoretic proof of the Gödel incompleteness result. This overturns the view of game theorists that surprise and innovation cannot be a Nash equilibrium of a game.
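The fixed point with a two-place encoding of negation invoked here belongs to the same recursion-theoretic family as the Gödel sentence. Its simplest executable relative, offered only as a toy analogue and not as the paper's construction, is a quine: a program that is a fixed point of "run and print":

```python
import io
from contextlib import redirect_stdout

# A quine template: substituting the template's own repr into itself yields
# a program whose output is exactly its own source text.
template = "s = %r\nprint(s %% s)"
program = template % template

buf = io.StringIO()
with redirect_stdout(buf):
    exec(program)        # run the generated program, capturing stdout
output = buf.getvalue()  # equals `program` itself, plus print's newline
```

The diagonal substitution `template % template` is the mechanical core shared with Gödel-style self-reference: a description applied to its own description.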
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective
Rapid advances in human genomics are enabling researchers to gain a better
understanding of the role of the genome in our health and well-being,
stimulating hope for more effective and cost-efficient healthcare. However,
this also prompts a number of security and privacy concerns stemming from the
distinctive characteristics of genomic data. To address them, a new research
community has emerged and produced a large number of publications and
initiatives.
In this paper, we rely on a structured methodology to contextualize and
provide a critical analysis of the current knowledge on privacy-enhancing
technologies used for testing, storing, and sharing genomic data, using a
representative sample of the work published in the past decade. We identify and
discuss limitations, technical challenges, and issues faced by the community,
focusing in particular on those that are inherently tied to the nature of the
problem and are harder for the community alone to address. Finally, we report
on the importance and difficulty of the identified challenges based on an
online survey of genome data privacy experts.
Comment: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue
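Among the privacy-enhancing technologies this literature surveys for sharing genomic data is secure multiparty computation. The following is a toy sketch of its simplest building block, additive secret sharing, with illustrative numbers not drawn from the paper: three labs pool allele counts so that only the total is revealed, never any single cohort's count:

```python
import random

MODULUS = 2**31 - 1  # a prime large enough to hold any realistic count

def share(value, n_parties, modulus=MODULUS):
    """Split an integer into n additive shares; any n-1 shares look uniformly
    random, so no proper subset of parties learns the value."""
    shares = [random.randrange(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

# Each lab holds a private count of one allele in its cohort.
counts = [120, 45, 310]
all_shares = [share(c, 3) for c in counts]
# Party p sums the p-th share from every lab; only these partial sums circulate.
partials = [sum(all_shares[lab][p] for lab in range(3)) % MODULUS
            for p in range(3)]
total = sum(partials) % MODULUS  # the pooled allele count, and nothing more
```

Real genomic MPC protocols add authentication and malicious-security machinery on top, but the additive trick above is the core reason aggregate statistics can be computed without centralizing raw genotypes.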
Accurate reconstruction of insertion-deletion histories by statistical phylogenetics
The Multiple Sequence Alignment (MSA) is a computational abstraction that
represents a partial summary either of indel history, or of structural
similarity. Taking the former view (indel history), it is possible to use
formal automata theory to generalize the phylogenetic likelihood framework for
finite substitution models (Dayhoff's probability matrices and Felsenstein's
pruning algorithm) to arbitrary-length sequences. In this paper, we report
results of a simulation-based benchmark of several methods for reconstruction
of indel history. The methods tested include a relatively new algorithm for
statistical marginalization of MSAs that sums over a stochastically-sampled
ensemble of the most probable evolutionary histories. For mammalian
evolutionary parameters on several different trees, the single most likely
history sampled by our algorithm appears less biased than histories
reconstructed by other MSA methods. The algorithm can also be used for
alignment-free inference, where the MSA is explicitly summed out of the
analysis. As an illustration of our method, we discuss reconstruction of the
evolutionary histories of human protein-coding genes.
Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434
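The automata-theoretic framing treats the alignment as a hidden variable that can be summed out, which is what the forward algorithm of a pair HMM does. A toy sketch with illustrative (not fitted) transition and emission values:

```python
def forward(x, y, pm=0.8, pg=0.1):
    """Total probability of sequences x and y summed over ALL pairwise
    alignments: F[i][j] accumulates every path of match (diagonal),
    delete (down) and insert (right) steps through the DP matrix."""
    def emit_pair(a, b):  # pair emission, favouring identical bases
        return 0.91 / 4 if a == b else 0.03 / 4
    emit_gap = 0.25       # single-base emission in a gap state
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                F[i][j] += F[i - 1][j - 1] * pm * emit_pair(x[i - 1], y[j - 1])
            if i:
                F[i][j] += F[i - 1][j] * pg * emit_gap
            if j:
                F[i][j] += F[i][j - 1] * pg * emit_gap
    return F[n][m]

p_id = forward("ACGT", "ACGT")    # homologous-looking pair
p_mis = forward("ACGT", "TTTT")   # discordant pair
```

Because every alignment path contributes, this is the alignment-free quantity the abstract mentions: the MSA is marginalized out rather than fixed before inference.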
Complex event types for agent-based simulation
This thesis presents a novel formal modelling language, complex event types (CETs), to describe behaviours
in agent-based simulations. CETs are able to describe behaviours at any computationally
represented level of abstraction. Behaviours can be specified both in terms of the state transition rules of
the agent-based model that generate them and in terms of the state transition structures themselves.
Based on CETs, novel computational statistical methods are introduced which allow statistical dependencies
between behaviours at different levels to be established. Different dependencies formalise
different probabilistic causal relations and Complex Systems constructs such as ‘emergence’ and ‘autopoiesis’.
Explicit links are also made between the different types of CET inter-dependency and the
theoretical assumptions they represent.
With the novel computational statistical methods, three categories of model can be validated and
discovered: (i) inter-level models, which define probabilistic dependencies between behaviours at different
levels; (ii) multi-level models, which define the set of simulations for which an inter-level model
holds; (iii) inferred predictive models, which define latent relationships between behaviours at different
levels.
The CET modelling language and computational statistical methods are then applied to a novel
agent-based model of Colonic Cancer to demonstrate their applicability to Complex Systems sciences
such as Systems Biology. This proof of principle model provides a framework for further development
of a detailed integrative model of the system, which can progressively incorporate biological data from
different levels and scales as these become available
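As a toy far simpler than CETs (all names and parameters below are illustrative, not drawn from the thesis), an inter-level statistical dependency can be estimated by comparing a macro-event probability with its value conditional on a micro event:

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible

def run(bias=0.6, n_agents=20):
    """Toy ABM tick: each agent independently adopts state 1 w.p. `bias`.
    Micro event: agent 0 is in state 1. Macro event: a strict majority is."""
    states = [1 if random.random() < bias else 0 for _ in range(n_agents)]
    return states[0] == 1, sum(states) > n_agents // 2

trials = [run() for _ in range(5000)]
p_macro = sum(macro for _, macro in trials) / len(trials)
n_micro = sum(1 for micro, _ in trials if micro)
p_macro_given_micro = (sum(1 for micro, macro in trials if micro and macro)
                       / max(1, n_micro))
# Agent 0 contributes one vote to the majority, so conditioning on the micro
# event raises the macro-event estimate: a minimal inter-level dependency.
```

A CET-style analysis generalizes exactly this comparison to behaviours defined at arbitrary levels of abstraction, with the dependencies formalizing constructs such as emergence.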