
    RLZAP: Relative Lempel-Ziv with Adaptive Pointers

    Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. In Kuruppu et al.'s (SPIRE 2010) original implementation, a reference genome is selected and the other genomes are then greedily parsed into phrases that exactly match substrings of the reference. Deorowicz and Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with a mismatch character usually gives better compression, because many of the differences between individuals' genomes are single-nucleotide substitutions. Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers and run-length compressing them usually gives even better compression. In this paper we generalize Ferrada et al.'s idea so that it also handles short insertions, deletions, and multi-character substitutions well. We show experimentally that our generalization achieves better compression than Ferrada et al.'s implementation, with comparable random-access times.
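    The greedy parse with a trailing mismatch character (the Deorowicz and Grabowski variant described above) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RLZAP method: it searches the reference by brute force (practical RLZ tools typically use a suffix array of the reference) and leaves out relative pointers and run-length compression entirely. The names `rlz_parse` and `rlz_decode` are hypothetical.

```python
def rlz_parse(reference: str, target: str) -> list[tuple[int, int, str]]:
    """Greedily parse `target` into phrases (start, length, mismatch_char).

    reference[start:start + length] is the longest reference substring that
    matches the target at the current position; mismatch_char is the target
    character right after the match ("" if the match reaches the end).
    """
    phrases = []
    n, m = len(target), len(reference)
    i = 0
    while i < n:
        best_start, best_len = 0, 0
        # Brute-force longest-match search (assumption: no suffix array).
        for s in range(m):
            length = 0
            while (i + length < n and s + length < m
                   and reference[s + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_start, best_len = s, length
        j = i + best_len
        mismatch = target[j] if j < n else ""
        phrases.append((best_start, best_len, mismatch))
        i = j + 1  # skip past the explicit mismatch character
    return phrases


def rlz_decode(reference: str, phrases: list[tuple[int, int, str]]) -> str:
    """Rebuild the target by copying each phrase and appending its mismatch."""
    return "".join(reference[s:s + l] + c for s, l, c in phrases)


target = "ACGAACGT"
phrases = rlz_parse("ACGTACGT", target)
print(phrases)  # [(0, 3, 'A'), (0, 4, '')]
assert rlz_decode("ACGTACGT", phrases) == target
```

    Here the single-nucleotide substitution at position 3 of the target is absorbed into the first phrase's mismatch character, so the whole target costs only two phrases.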

    A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

    Run-length encoding Burrows-Wheeler-transformed strings, resulting in the Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in $O(n \lg r)$ time and $O(r \lg n)$ bits of space, where $n$ is the length of the input string $S$ received so far and $r$ is the number of runs in the BWT of the reversed $S$. We improve on the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. By adopting a dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string with direct comparisons of labels in a dynamic list. Empirical results on various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings. Comment: In Proc. IWOCA201
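    To make $r$ concrete, the sketch below builds the BWT naively by sorting rotations and then run-length encodes it. It is deliberately offline and quadratic, unlike the online run-compressed algorithm described above; the sentinel `$` (assumed absent from the input and smaller than every other character) and the helper names `bwt` and `run_length` are illustrative assumptions.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of s + sentinel and take the last column.

    Assumes the sentinel does not occur in s and sorts before every character.
    """
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def run_length(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


S = "banana"
rlbwt = run_length(bwt(S[::-1]))  # r is measured on the BWT of the reversed S
print(bwt(S[::-1]), rlbwt, "r =", len(rlbwt))
# bnn$aaa [('b', 1), ('n', 2), ('$', 1), ('a', 3)] r = 4
```

    An RLBWT stores only the `(char, length)` pairs, so its size tracks $r$ rather than $n$; on highly repetitive inputs the number of runs stays small as the text grows.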

    Practical Evaluation of Lempel-Ziv-78 and Lempel-Ziv-Welch Tries

    We present the first thorough practical study of Lempel-Ziv-78 and Lempel-Ziv-Welch computation based on trie data structures. With a careful selection of trie representations, we can beat well-tuned, popular trie data structures such as Judy, m-Bonsai, and Cedar.
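    A minimal LZ78 factorization driven by a trie might look like the sketch below. The trie is stored as a single hash map from (parent node, character) to child node; this is only one of many possible trie representations and is an illustrative choice, not the one the study recommends. The names `lz78_factorize` and `lz78_decode` are hypothetical.

```python
def lz78_factorize(text: str) -> list[tuple[int, str]]:
    """Return LZ78 factors as (index of longest previous phrase, next char).

    Node 0 is the trie root (the empty phrase); node k is the k-th phrase.
    """
    trie: dict[tuple[int, str], int] = {}  # (parent node, char) -> child node
    factors: list[tuple[int, str]] = []
    node, next_id = 0, 1
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child  # keep walking down an existing trie edge
        else:
            trie[(node, ch)] = next_id  # new phrase = phrase(node) + ch
            next_id += 1
            factors.append((node, ch))
            node = 0  # restart from the root for the next phrase
    if node != 0:
        # The text ended inside an existing phrase: emit it with no extra char.
        factors.append((node, ""))
    return factors


def lz78_decode(factors: list[tuple[int, str]]) -> str:
    """Invert the factorization by rebuilding the phrase list."""
    phrases = [""]
    out = []
    for idx, ch in factors:
        phrase = phrases[idx] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


f = lz78_factorize("abababa")
print(f)  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
assert lz78_decode(f) == "abababa"
```

    LZW differs mainly in that its dictionary starts out holding every single character and it emits only phrase indices, so the same kind of trie walk applies with a pre-populated root.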

    Composite repetition-aware data structures

    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted at such strings typically depend on only one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once and that provide competitive tradeoffs between the time for counting and reporting all exact occurrences of a pattern and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting the RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of the CDAWG and the RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats and which is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from the previous version)
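    Two of the repetition measures combined above, the number of BWT runs ($r$) and the number of LZ77 factors ($z$), can be compared on a toy input. The sketch below counts both by brute force; the LZ77 convention used here (longest match starting at an earlier position, overlaps allowed, otherwise one literal character) is one of several common variants, CDAWG-based measures are not computed, and the helper names are hypothetical.

```python
def bwt_runs(s: str) -> int:
    """Count runs in the naive BWT of s + '$' (assumes '$' is absent from s
    and sorts before every character of s)."""
    t = s + "$"
    last = [rot[-1] for rot in sorted(t[i:] + t[:i] for i in range(len(t)))]
    # One run to start with, plus one more at every change of character.
    return 1 + sum(1 for a, b in zip(last, last[1:]) if a != b)


def lz77_factor_count(s: str) -> int:
    """Greedy LZ77: each factor is the longest prefix of s[i:] that also starts
    at some earlier position j < i (overlaps allowed), or one literal char."""
    n = len(s)
    i, z = 0, 0
    while i < n:
        longest = 0
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a literal advances by one character
        z += 1
    return z


for k in (2, 4, 8):
    text = "ACGT" * k
    print(len(text), bwt_runs(text), lz77_factor_count(text))
# 8 5 5
# 16 5 5
# 32 5 5
```

    On `ACGT` repeated $k$ times both counts stay at 5 for every $k \ge 2$ while the text length grows as $4k$, a small instance of the sublinear growth the abstract refers to.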

    Traffic jams and ordering far from thermal equilibrium

    The recently suggested correspondence between domain dynamics of traffic models and the asymmetric chipping model is reviewed. It is observed that in many cases traffic domains perform the two characteristic dynamical processes of the chipping model, namely chipping and diffusion. This correspondence indicates that jamming in traffic models in which all dynamical rates are non-deterministic takes place as a broad crossover phenomenon, rather than a sharp transition. Two traffic models are studied in detail and analyzed within this picture. Comment: Contribution to the Niels Bohr Summer Institute on Complexity and Criticality; to appear in a Per Bak Memorial Issue of PHYSICA

    Dynamic Fluctuation Phenomena in Double Membrane Films

    The dynamics of double-membrane films are investigated in the long-wavelength limit, including the overdamped squeezing mode. We demonstrate that thermal fluctuations substantially modify the character of this mode owing to its nonlinear coupling to the transverse shear hydrodynamic mode. As a function of frequency, the corresponding Green function acquires a cut along the imaginary semi-axis. Fluctuations increase the attenuation of the squeezing mode, making it larger than its 'bare' value. Comment: 7 pages, RevTeX

    Classification of Dust Days by Satellite Remotely Sensed Aerosol Products

    Considerable progress has been made in the last decade in satellite remote sensing (SRS) of dust particles. From an environmental health perspective, detection of such events, once linked to ground particulate matter (PM) concentrations, can serve as a proxy for acute exposure to respirable particles with particular properties (e.g., size, composition, and toxicity). Because the Eastern Mediterranean, and Israel in particular, is considerably affected by atmospheric dust, previous studies in the region have focused on mechanistic and synoptic prediction, classification, and characterization of dust events. In particular, a scheme for identifying dust days (DD) in Israel based on ground PM10 (particulate matter smaller than 10 μm) measurements has been suggested and validated by compositional analysis. This scheme requires ground PM10 data, which are naturally limited in places with sparse ground-monitoring coverage. In such cases, SRS may be an efficient and cost-effective alternative to ground measurements. This work demonstrates a new model for identifying DD and non-DD (NDD) over Israel based on an integration of aerosol products from different satellite platforms: the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Ozone Monitoring Instrument (OMI). Analysis of ground-monitoring data from 2007 to 2008 in southern Israel revealed 67 DD, more than 88 percent of which occurred during winter and spring. A Classification and Regression Tree (CART) model applied to a database of ground-monitoring records (the dependent variable) and SRS aerosol product records (the independent variables) revealed an optimal set of binary variables for identifying DD. These variables are combinations of the following primary variables: the calendar month, ground-level relative humidity (RH), the aerosol optical depth (AOD) from MODIS, and the aerosol absorbing index (AAI) from OMI.
A logistic regression using these variables, coded as binary variables, achieved 93.2 percent correct classification of DD and NDD. Evaluation of the combined CART-logistic regression scheme in an adjacent geographical region (Gush Dan) showed good results. Using SRS aerosol products for DD and NDD identification may enable us to distinguish between the health, ecological, and environmental effects that result from exposure to these distinct particle populations.

    Factorised Steady States in Mass Transport Models

    We study a class of mass transport models in which mass is transported in a preferred direction around a one-dimensional periodic lattice and is globally conserved. The model encompasses both discrete and continuous masses, as well as parallel and random sequential dynamics, and includes models such as the zero-range process and the asymmetric random average process as special cases. We derive a necessary and sufficient condition for the steady state to factorise, which takes a rather simple form. Comment: 6 pages

    Straightening of Thermal Fluctuations in Semi-Flexible Polymers by Applied Tension

    We investigate the propagation of a suddenly applied tension along a thermally excited semi-flexible polymer using analytical approximations, scaling arguments, and numerical simulation. This problem is inherently nonlinear. We find sub-diffusive propagation with a dynamical exponent of 1/4. By generalizing the internal elasticity, we show that tense strings exhibit qualitatively different tension profiles and propagation with an exponent of 1/2. Comment: LaTeX file with three PostScript figures; .ps available at http://dept.physics.upenn.edu/~nelson/pull.p