Sensitivity of the Burrows-Wheeler Transform to small modifications, and other problems on string compressors in Bioinformatics
An extensive amount of data is produced in textual form nowadays, especially in bioinformatics. Several algorithms exist to store and process this data efficiently in compressed space. In this thesis, we focus on both combinatorial and practical aspects of two of the most widely used algorithms for compressing text in bioinformatics: the Burrows-Wheeler Transform (BWT) and Lempel-Ziv compression (LZ77). In the first part, we focus on combinatorial aspects of the BWT. Given a word v, r = r(v) denotes the number of maximal equal-letter runs in BWT(v). First, we investigate the relationship between r of a word and r of its reverse. We prove that there exist words for which these two values differ by a logarithmic factor in the length of the word. In other words, although the repetitiveness in the two words is preserved, the number of runs can change by a non-constant factor. This suggests that the number of runs may not be an ideal repetitiveness measure. The second combinatorial aspect we are interested in is how small alterations in a word may affect its BWT in a relevant way. We prove that the number of runs of the BWT of a word can change (increase or decrease) by up to a logarithmic factor in the length of the word by just adding, removing, or substituting a single character. We then consider the special character [...] can be inserted in order to turn it into the BWT of a [...] is allowed, depends entirely on the structure of a specific permutation of the indices of the word, which is called the standard permutation of the word. The final part of this thesis treats more applied aspects of text compressors. In bioinformatics, BWT-based compressed data structures are widely used for pattern matching. We give an algorithm based on the BWT to find Maximal Unique Matches (MUMs) of a pattern with respect to a reference text in compressed space, extending an existing tool called PHONI [Boucher et al., DCC 2021].
Finally, we study some aspects of the Lempel-Ziv 77 (LZ77) factorization of a word. Modeling DNA short reads, we provide a bound on the compression size of the concatenation of regular samples of a word.
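The quantity r(v) defined in the abstract above can be illustrated with a minimal sketch. It assumes the BWT is taken as the last column of the sorted rotations of v, with no end-of-string marker; the thesis may use a different convention, and the function names `bwt`, `runs` and `r` are chosen here for illustration only. The quadratic-space construction is meant for small examples, not for genomic data.

```python
def bwt(word: str) -> str:
    """BWT of `word`: last column of its lexicographically sorted rotations.

    Assumption: no end-of-string marker is appended.
    """
    n = len(word)
    rotations = sorted(word[i:] + word[:i] for i in range(n))
    return "".join(rot[-1] for rot in rotations)


def runs(text: str) -> int:
    """Number of maximal equal-letter runs in `text`."""
    return sum(1 for i, c in enumerate(text) if i == 0 or c != text[i - 1])


def r(word: str) -> int:
    """r(v): number of maximal equal-letter runs in BWT(v)."""
    return runs(bwt(word))


if __name__ == "__main__":
    v = "banana"
    print(bwt(v), r(v))   # nnbaaa 3
    print(r(v[::-1]))     # r of the reversed word
    print(r(v + "a"))     # r after appending a single character
```

Comparing `r(v)` with `r(v[::-1])`, or with r of a word that differs from v in one character, is a simple way to experiment with the sensitivity questions the abstract describes on small inputs.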
Repetitive subwords
The central notion of this thesis is repetitions in words. We study problems related to contiguous repetitions. More specifically we will consider repeating scattered subwords of non-primitive words, i.e. words which are complete repetitions of other words. We will present inequalities concerning these occurrences as well as giving a partial solution to an open problem posed by Salomaa et al. We will characterize languages which are closed under the operation of duplication, that is, repeating any factor of a word. We also give new bounds on the number of occurrences of certain types of repetitions of words. We give a solution to an open problem posed by Calbrix and Nivat concerning regular languages consisting of non-primitive words. We also present some results regarding the duplication closure of languages, among which a new proof to a problem of Bovet and Varricchio.
LIPIcs, Volume 244, ESA 2022, Complete Volume
Jet Quenching in Relativistic Heavy Ion Collisions at the LHC
Jet production in relativistic heavy ion collisions is studied using Pb+Pb collisions at a center-of-mass energy of 2.76 TeV per nucleon. The measurements reported here utilize data collected with the ATLAS detector at the LHC from the 2010 Pb ion run corresponding to a total integrated luminosity of 7 µb^(-1). The results are obtained from fully reconstructed jets, using the anti-k_t algorithm with a per-event background subtraction procedure. A centrality-dependent modification of the dijet asymmetry distribution is observed, which indicates a higher rate of asymmetric dijet pairs in central collisions relative to peripheral and pp collisions. Simultaneously, the dijet angular correlations show almost no centrality dependence. These results provide the first direct observation of jet quenching. Measurements of the single inclusive jet spectrum, performed with jet radius parameters R = 0.2, 0.3, 0.4 and 0.5, are also presented. The spectra are unfolded to correct for the finite energy resolution introduced by both detector effects and underlying event fluctuations. Single jet production, through the central-to-peripheral ratio R_CP, is found to be suppressed in central collisions by approximately a factor of two, nearly independent of the jet p_T. The R_CP is found to have a small but significant increase with increasing R, which may relate directly to aspects of radiative energy loss.
LIPIcs, Volume 274, ESA 2023, Complete Volume