38,383 research outputs found

    Evolutionary Inference via the Poisson Indel Process

    Full text link
    We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classical evolutionary process, the TKF91 model, is a continuous-time Markov chain model comprised of insertion, deletion and substitution events. Unfortunately this model gives rise to an intractable computational problem---the computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a new stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The new model is closely related to the TKF91 model, differing only in its treatment of insertions, but the new model has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared to separate inference of phylogenies and alignments.Comment: 33 pages, 6 figure

    Efficient Management of Short-Lived Data

    Full text link
    Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.Comment: switched to TimeCenter latex styl

    Streaming Complexity of Spanning Tree Computation

    Get PDF
    The semi-streaming model is a variant of the streaming model frequently used for the computation of graph problems. It allows the edges of an n-node input graph to be read sequentially in p passes using Õ(n) space. If the list of edges includes deletions, then the model is called the turnstile model; otherwise it is called the insertion-only model. In both models, some graph problems, such as spanning trees, k-connectivity, densest subgraph, degeneracy, cut-sparsifier, and (Δ+1)-coloring, can be exactly solved or (1+ε)-approximated in a single pass; while other graph problems, such as triangle detection and unweighted all-pairs shortest paths, are known to require Ω̃(n) passes to compute. For many fundamental graph problems, the tractability in these models is open. In this paper, we study the tractability of computing some standard spanning trees, including BFS, DFS, and maximum-leaf spanning trees. Our results, in both the insertion-only and the turnstile models, are as follows. Maximum-Leaf Spanning Trees: This problem is known to be APX-complete with inapproximability constant ρ ∈ [245/244, 2). By constructing an ε-MLST sparsifier, we show that for every constant ε > 0, MLST can be approximated in a single pass to within a factor of 1+ε w.h.p. (albeit in super-polynomial time for ε ≤ ρ-1 assuming P ≠ NP) and can be approximated in polynomial time in a single pass to within a factor of ρ_n+ε w.h.p., where ρ_n is the supremum constant that MLST cannot be approximated to within using polynomial time and Õ(n) space. In the insertion-only model, these algorithms can be deterministic. BFS Trees: It is known that BFS trees require ω(1) passes to compute, but the naïve approach needs O(n) passes. We devise a new randomized algorithm that reduces the pass complexity to O(√n), and it offers a smooth tradeoff between pass complexity and space usage. This gives a polynomial separation between single-source and all-pairs shortest paths for unweighted graphs. DFS Trees: It is unknown whether DFS trees require more than one pass. The current best algorithm by Khan and Mehta [STACS 2019] takes Õ(h) passes, where h is the height of computed DFS trees. Note that h can be as large as Ω(m/n) for n-node m-edge graphs. Our contribution is twofold. First, we provide a simple alternative proof of this result, via a new connection to sparse certificates for k-node-connectivity. Second, we present a randomized algorithm that reduces the pass complexity to O(√n), and it also offers a smooth tradeoff between pass complexity and space usage.ISSN:1868-896

    B-urns

    Full text link
    The fringe of a B-tree with parameter mm is considered as a particular P\'olya urn with mm colors. More precisely, the asymptotic behaviour of this fringe, when the number of stored keys tends to infinity, is studied through the composition vector of the fringe nodes. We establish its typical behaviour together with the fluctuations around it. The well known phase transition in P\'olya urns has the following effect on B-trees: for m59m\leq 59, the fluctuations are asymptotically Gaussian, though for m60m\geq 60, the composition vector is oscillating; after scaling, the fluctuations of such an urn strongly converge to a random variable WW. This limit is C\mathbb C-valued and it does not seem to follow any classical law. Several properties of WW are shown: existence of exponential moments, characterization of its distribution as the solution of a smoothing equation, existence of a density relatively to the Lebesgue measure on C\mathbb C, support of WW. Moreover, a few representations of the composition vector for various values of mm illustrate the different kinds of convergence

    Towards a Scalable Dynamic Spatial Database System

    Get PDF
    With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper we explore and measure the limits of actual algorithms and implementations regarding different application scenarios. And finally we propose a novel distributed architecture to solve the scalability issues.Comment: (2012
    corecore