47 research outputs found

    Sensitivity of the Burrows-Wheeler Transform to small modifications, and other problems on string compressors in Bioinformatics

    An extensive amount of data is produced in textual form nowadays, especially in bioinformatics. Several algorithms exist to store and process this data efficiently in compressed space. In this thesis, we focus on both combinatorial and practical aspects of two of the most widely used algorithms for compressing text in bioinformatics: the Burrows-Wheeler Transform (BWT) and Lempel-Ziv compression (LZ77). In the first part, we focus on combinatorial aspects of the BWT. Given a word v, r = r(v) denotes the number of maximal equal-letter runs in BWT(v). First, we investigate the relationship between r of a word and r of its reverse. We prove that there exist words for which these two values differ by a logarithmic factor in the length of the word. In other words, although the repetitiveness in the two words is preserved, the number of runs can change by a non-constant factor. This suggests that the number of runs may not be an ideal repetitiveness measure. The second combinatorial aspect we are interested in is how small alterations in a word may affect its BWT in a relevant way. We prove that the number of runs of the BWT of a word can change (increase or decrease) by up to a logarithmic factor in the length of the word by just adding, removing, or substituting a single character. We then consider the special character $ used in real-life applications to mark the end of a word. We investigate the impact of this character on words with respect to the BWT. We characterize positions in a word where $ can be inserted in order to turn it into the BWT of a $-terminated word over the same alphabet. We show that whether and where $ is allowed depends entirely on the structure of a specific permutation of the indices of the word, which is called the standard permutation of the word. The final part of this thesis treats more applied aspects of text compressors. In bioinformatics, BWT-based compressed data structures are widely used for pattern matching. We give an algorithm based on the BWT to find Maximal Unique Matches (MUMs) of a pattern with respect to a reference text in compressed space, extending an existing tool called PHONI [Boucher et al., DCC 2021]. Finally, we study some aspects of the Lempel-Ziv 77 (LZ77) factorization of a word. Modeling DNA short reads, we provide a bound on the compression size of the concatenation of regular samples of a word.
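The quantity r(v) from the abstract above can be illustrated with a minimal sketch. It assumes the textbook definition of the BWT as the last column of the sorted rotations of the word; the function names `bwt` and `bwt_runs` are hypothetical and are not taken from the thesis. The quadratic-space construction is for illustration only and is not how compressed-space tools compute the transform.

```python
from itertools import groupby


def bwt(word: str) -> str:
    """Naive BWT: last column of the lexicographically sorted rotations.

    Assumption: plain rotation-based definition, with characters compared
    by code point (so '$' sorts before lowercase letters).
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt_runs(word: str) -> int:
    """r(word): number of maximal equal-letter runs in BWT(word)."""
    return sum(1 for _ in groupby(bwt(word)))


if __name__ == "__main__":
    v = "banana"
    # Compare r for a word, its reverse, and a one-character substitution.
    for candidate in (v, v[::-1], "bananb"):
        print(candidate, bwt(candidate), bwt_runs(candidate))
```

Comparing `bwt_runs` on a word and on a single-character edit of it is the kind of experiment the sensitivity results describe; on short words like these the difference is small, since the logarithmic gap only shows up on specially constructed families of words.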

    Repetitive subwords

    The central notion of this thesis is repetitions in words. We study problems related to contiguous repetitions. More specifically, we will consider repeating scattered subwords of non-primitive words, i.e. words which are complete repetitions of other words. We will present inequalities concerning these occurrences as well as giving a partial solution to an open problem posed by Salomaa et al. We will characterize languages which are closed under the operation of duplication, that is, repeating any factor of a word. We also give new bounds on the number of occurrences of certain types of repetitions of words. We give a solution to an open problem posed by Calbrix and Nivat concerning regular languages consisting of non-primitive words. We also present some results regarding the duplication closure of languages, among which a new proof to a problem of Bovet and Varricchio.

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Jet Quenching in Relativistic Heavy Ion Collisions at the LHC

    Jet production in relativistic heavy ion collisions is studied using Pb+Pb collisions at a center of mass energy of 2.76 TeV per nucleon. The measurements reported here utilize data collected with the ATLAS detector at the LHC from the 2010 Pb ion run, corresponding to a total integrated luminosity of 7 µb^(-1). The results are obtained from jets fully reconstructed with the anti-k_t algorithm and a per-event background subtraction procedure. A centrality-dependent modification of the dijet asymmetry distribution is observed, which indicates a higher rate of asymmetric dijet pairs in central collisions relative to peripheral and pp collisions. Simultaneously, the dijet angular correlations show almost no centrality dependence. These results provide the first direct observation of jet quenching. Measurements of the single inclusive jet spectrum, measured with jet radius parameters R = 0.2, 0.3, 0.4 and 0.5, are also presented. The spectra are unfolded to correct for the finite energy resolution introduced by both detector effects and underlying event fluctuations. Single jet production, through the central-to-peripheral ratio R_CP, is found to be suppressed in central collisions by approximately a factor of two, nearly independent of the jet p_T. The R_CP is found to have a small but significant increase with increasing R, which may relate directly to aspects of radiative energy loss.

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Recommending APIs for software evolution

    Generation of interactive programming environments: GIPE
