
    Memory-Constrained Algorithms for Simple Polygons

    A constant-workspace algorithm has read-only access to an input array and may use only O(1) additional words of $O(\log n)$ bits, where $n$ is the size of the input. We assume that a simple $n$-gon is given by the ordered sequence of its vertices. We show that we can find a triangulation of a plane straight-line graph in $O(n^2)$ time. We also consider preprocessing a simple polygon for shortest path queries when the space constraint is relaxed to allow $s$ words of working space. After a preprocessing of $O(n^2)$ time, we are able to solve shortest path queries between any two points inside the polygon in $O(n^2/s)$ time. Comment: Preprint appeared in EuroCG 201
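
    To make the constant-workspace model concrete, here is a minimal sketch of a much simpler task than the ones in the abstract: computing the signed area of a simple polygon from a read-only vertex sequence while keeping only a constant number of extra variables. The function name `signed_area` and the use of a tuple as the read-only input are assumptions made for illustration; this is not the triangulation or shortest-path algorithm of the paper.

```python
def signed_area(vertices):
    """Shoelace formula over a read-only vertex sequence.

    Only a constant number of extra variables is kept (the index, two
    vertex pairs, and one accumulator); the input is never modified.
    The result is positive for counterclockwise vertex order.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return twice_area / 2.0


if __name__ == "__main__":
    # A tuple stands in for the read-only input array.
    square = ((0, 0), (2, 0), (2, 2), (0, 2))
    print(signed_area(square))
```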

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
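
    The role of the covering hyperspheres can be sketched as a two-stage range search: a coarse pass over cluster centers, then a fine pass inside the clusters that survive. The sketch below assumes Euclidean points and a greedy cover; the names `build_cover` and `range_search` and both radius parameters are hypothetical, and none of this is taken from Ammolite, MICA, or esFragBag. With a true metric the triangle inequality makes the pruning exact, and the number of clusters plays the part of the metric entropy.

```python
import math


def build_cover(points, cluster_radius):
    """Greedy cover: each point joins the first center within
    cluster_radius, or becomes a new center if none is close enough."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if math.dist(center, p) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, query, radius, cluster_radius):
    """Return all points within `radius` of `query`.

    Coarse stage: a cluster can contain a hit only if its center lies
    within radius + cluster_radius of the query (triangle inequality).
    Fine stage: check the members of the surviving clusters directly.
    """
    hits = []
    for center, members in clusters:
        if math.dist(query, center) <= radius + cluster_radius:
            hits.extend(p for p in members if math.dist(query, p) <= radius)
    return hits


if __name__ == "__main__":
    pts = [(i % 7 * 1.0, i % 11 * 0.5) for i in range(200)]
    cover = build_cover(pts, cluster_radius=1.0)
    found = range_search(cover, (3.2, 2.7), radius=1.5, cluster_radius=1.0)
    print(len(cover), "clusters,", len(found), "hits")
```

    The coarse stage costs one distance computation per cluster rather than one per point, which is where the dependence on the number of covering hyperspheres comes from in this simplified setting.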

    Composite repetition-aware data structures

    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from previous version
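
    The claim that the RLBWT takes space proportional to the number of BWT runs can be illustrated with a small sketch. It builds the BWT naively by sorting all rotations (quadratic space, for illustration only) and then run-length encodes it. The function names and the `$` sentinel are assumptions, and the sketch does not include any of the indexing machinery (LZ77 structures, CDAWG) described in the abstract.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of it (true for "$" against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(s)]


def run_length_decode(runs):
    """Inverse of run_length_encode."""
    return "".join(ch * n for ch, n in runs)


if __name__ == "__main__":
    repetitive = "abc" * 50
    transformed = bwt(repetitive)
    runs = run_length_encode(transformed)
    # The run count, not the text length, determines the encoded size.
    print(len(transformed), "characters,", len(runs), "runs")
```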

    The Future of Computation

    "The purpose of life is to obtain knowledge, use it to live with as much satisfaction as possible, and pass it on with improvements and modifications to the next generation." This may sound philosophical, and the interpretation of words may be subjective, yet it is fairly clear that this is what all living organisms, from bacteria to human beings, do in their lifetime. Indeed, this can be adopted as the information theoretic definition of life. Over billions of years, biological evolution has experimented with a wide range of physical systems for acquiring, processing and communicating information. We are now in a position to make the principles behind these systems mathematically precise, and then extend them as far as laws of physics permit. Therein lies the future of computation, of ourselves, and of life. Comment: 7 pages, Revtex. Invited lecture at the Workshop on Quantum Information, Computation and Communication (QICC-2005), IIT Kharagpur, India, February 200

    Quasi-Chemical Theory and Implicit Solvent Models for Simulations

    A statistical thermodynamic development is given of a new implicit solvent model that avoids the traditional system size limitations of computer simulation of macromolecular solutions with periodic boundary conditions. This implicit solvent model is based upon the quasi-chemical approach, distinct from the common integral equation trunk of the theory of liquid solutions. The physical content of this theory is the hypothesis that a small set of solvent molecules is decisive for these solvation problems. A detailed derivation of the quasi-chemical theory accompanies the development of this proposal. The numerical application of the quasi-chemical treatment to Li$^+$ ion hydration in liquid water is used to motivate and exemplify the quasi-chemical theory. Those results underscore the fact that the quasi-chemical approach refines the path for utilization of ion-water cluster results for the statistical thermodynamics of solutions. Comment: 30 pages, contribution to Santa Fe Workshop on Treatment of Electrostatic Interactions in Computer Simulation of Condensed Medi