293 research outputs found

    On the maximal sum of exponents of runs in a string

    A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on the maximal sum of exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is better than the best previously known proven bound of $5.6n$ by Crochemore & Ilie (2008). The lower bound of $2.035n$, obtained using a family of binary words, contradicts the conjecture of Kolpakov & Kucherov (1999) that the maximal sum of exponents of runs in a string of length $n$ is smaller than $2n$. Comment: 7 pages, 1 figure
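    The definition of a run and its exponent can be illustrated with a brute-force sketch. It is not the method of the paper: it is a naive polynomial-time enumeration, and the function names (`smallest_period`, `runs`, `exponent_sum`) are hypothetical. It assumes the period of a run is the smallest period of the repetition, so that each run is counted once.

```python
from fractions import Fraction


def smallest_period(w):
    """Smallest p >= 1 such that w[k] == w[k + p] for all valid k."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(n - p)):
            return p
    return n


def runs(s):
    """Naively list runs as (start, end, period), with end exclusive."""
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with s[k] == s[k + p].
            e = i
            while e + p < n and s[e] == s[e + p]:
                e += 1
            # s[i:e + p] has period p and cannot be extended either way.
            length = e - i + p
            if length >= 2 * p and smallest_period(s[i:e + p]) == p:
                found.append((i, e + p, p))
            i = e + 1
    return found


def exponent_sum(s):
    """Sum of exponents |v| / p over all runs of s, as an exact fraction."""
    return sum(Fraction(end - start, p) for start, end, p in runs(s))
```

    For example, `runs("aabaabaa")` contains three runs of period 1 (each `aa`) and one run of period 3 covering the whole string, so the exponent sum is 2 + 2 + 2 + 8/3.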

    Online Pattern Matching for String Edit Distance with Moves

    Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound on parsing discrepancies between different appearances of the same substrings in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built from node labels in parsing trees. However, ESP is not applicable to streaming text data where the whole text is unknown in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For the space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results on dynamic succinct trees. We experimentally test OESP on the ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency. Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE 2014)
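    The L1 distance between characteristic vectors mentioned in this abstract can be sketched as follows. This shows only the final distance step, not ESP or OESP themselves: the parse trees are assumed to be given already, each flattened into a list of node labels, and the function name `l1_distance` and the labels `X1`..`X4` are hypothetical.

```python
from collections import Counter


def l1_distance(labels_a, labels_b):
    """L1 distance between the label-count vectors of two parse trees.

    Each argument is an iterable of node labels; a label missing from
    one tree counts as zero occurrences there.
    """
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum(abs(count_a[k] - count_b[k]) for k in count_a.keys() | count_b.keys())


# Hypothetical node labels of two small parse trees.
tree_a = ["X1", "X2", "X2", "X3"]
tree_b = ["X2", "X3", "X4"]
distance = l1_distance(tree_a, tree_b)
```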

    Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

    Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et al. (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special non-crossing property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n \alpha(n))$ time, which yields an $O(n \alpha(n))$-time algorithm for computing runs.
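    To make the notion of an LCE query concrete, here is a naive sketch that answers each query by direct character comparison and uses one forward and one backward query to extend a periodic fragment. It is not the algorithm of the paper and has none of its time bounds; the names `lce`, `lce_back` and `extend_period` are hypothetical.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:]."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def lce_back(s, i, j):
    """Length of the longest common suffix of s[:i] and s[:j]."""
    k = 0
    while i - k > 0 and j - k > 0 and s[i - k - 1] == s[j - k - 1]:
        k += 1
    return k


def extend_period(s, i, p):
    """Maximal interval [start, end) with period p containing s[i:i + p]."""
    right = lce(s, i, i + p)       # how far the period carries to the right
    left = lce_back(s, i, i + p)   # how far it carries to the left
    return i - left, i + p + right
```

    For instance, `extend_period("aabaabaa", 2, 3)` extends the fragment starting at position 2 with period 3 to the whole string.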

    On the maximal number of cubic subwords in a string

    We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor - 1$, and the maximum number of subwords of the form $x^k$, for $k \ge 3$, is exactly $n-2$. In particular, the maximum number of cubes in a word is not greater than $n-2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.) Comment: 14 pages
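    The quantities counted in this abstract can be checked on small words by brute force. The sketch below assumes that distinct subwords are counted (not occurrences); the helper names are hypothetical and the enumeration is deliberately naive.

```python
def is_primitive(x):
    """True if x is not a proper power u^k with k >= 2."""
    n = len(x)
    return not any(n % d == 0 and x[:d] * (n // d) == x for d in range(1, n))


def distinct_powers(w, k):
    """Set of distinct subwords of w of the form x^k with x nonempty."""
    n = len(w)
    found = set()
    for length in range(1, n // k + 1):
        for i in range(n - k * length + 1):
            x = w[i:i + length]
            if w[i:i + k * length] == x * k:
                found.add(x * k)
    return found


def nonprimitive_squares(w):
    """Distinct squares xx in w whose root x is not primitive."""
    return {sq for sq in distinct_powers(w, 2)
            if not is_primitive(sq[:len(sq) // 2])}


def high_powers(w):
    """Distinct subwords of w of the form x^k for some k >= 3."""
    result = set()
    for k in range(3, len(w) + 1):
        result |= distinct_powers(w, k)
    return result
```

    On the unary word `"a" * 10` this gives 4 nonprimitive squares and 8 subwords of the form $x^k$ with $k \ge 3$, matching the stated values $\lfloor \frac{n}{2}\rfloor - 1$ and $n-2$ for $n = 10$.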

    Composite repetition-aware data structures

    In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from previous version)
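    The number of BWT runs, which governs the size of the RLBWT, can be illustrated on a toy input. This is a minimal sketch that builds the BWT by sorting all rotations, which is far too slow for real genomes and is not how the paper's structures are built; the function names and the `$` sentinel are assumptions.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic space)."""
    # The sentinel must be absent from the text and sort before its characters.
    assert sentinel not in text
    t = text + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """List of (character, run length) pairs for maximal equal-letter runs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")
rlbwt = run_length_encode(transformed)
number_of_runs = len(rlbwt)
```

    Here `bwt("banana")` is `annb$aa`, which run-length encodes into 5 runs.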

    Infections in temporal proximity to HPV vaccination and adverse effects following vaccination in Denmark: A nationwide register-based cohort study and case-crossover analysis

    BACKGROUND: Public trust in the human papilloma virus (HPV) vaccination programme has been challenged by reports of potential severe adverse effects. The reported adverse symptoms were heterogeneous and overlapping with those characterised as chronic fatigue syndrome (CFS) and have been described as CFS-like symptoms. Evidence suggests that CFS is often precipitated by an infection. The aim of the study was to examine if an infection in temporal proximity to HPV vaccination is a risk factor for suspected adverse effects following HPV vaccination. METHODS AND FINDINGS: The study was a nationwide register-based cohort study and case-crossover analysis. The study population consisted of all HPV vaccinated females living in Denmark, born between 1974 and 2006, and vaccinated between January 1, 2006 and December 31, 2017. The exposure was any infection in the period ± 1 month around time of first HPV vaccination and was defined as (1) hospital-treated infection; (2) redemption of anti-infective medication; or (3) having a rapid streptococcal test done at the general practitioner. The outcome was referral to a specialised hospital setting (5 national HPV centres opened June 1, 2015) due to suspected adverse effects following HPV vaccination. Multivariable logistic regression was used to estimate the association between infection and later HPV centre referral. The participants were 600,400 HPV-vaccinated females aged 11 to 44 years. Of these, 48,361 (9.7%) females had a hospital-treated infection, redeemed anti-infective medication, or had a rapid streptococcal test ± 1 month around time of first HPV vaccination. A total of 1,755 (0.3%) females were referred to an HPV centre. Having a hospital-treated infection in temporal proximity to vaccination was associated with significantly elevated risk of later referral to an HPV centre (odds ratio (OR) 2.75, 95% confidence interval (CI) 1.72 to 4.40; P < 0.001). 
    Increased risk was also observed among females who redeemed anti-infective medication (OR 1.56, 95% CI 1.33 to 1.83; P < 0.001) or had a rapid streptococcal test (OR 1.45, 95% CI 1.10 to 1.93; P = 0.010). Results from a case-crossover analysis, which was performed to adjust for potential unmeasured confounding, supported the findings. A key limitation of the study is that the HPV centres did not open until June 1, 2015, which may have led to an underestimation of the risk of suspected adverse effects, but stratified analyses by year of vaccination yielded similar results. CONCLUSIONS: Treated infection in temporal proximity to HPV vaccination is associated with increased risk for later referral with suspected adverse vaccine effects. Thus, the infection could potentially be a trigger of the CFS-like symptoms in a subset of the referred females. To our knowledge, the study is the first to investigate the role of infection in the development of suspected adverse effects after HPV vaccination, and replication of these findings is needed in other studies.

    One-dimensional staged self-assembly

    17th International Conference, DNA 17, Pasadena, CA, USA, September 19-23, 2011. Proceedings. We introduce the problem of staged self-assembly of one-dimensional nanostructures, which becomes interesting when the elements are labeled (e.g., representing functional units that must be placed at specific locations). In a restricted model in which each operation has a single terminal assembly, we prove that assembling a given string of labels with the fewest stages is equivalent, up to constant factors, to compressing the string to be uniquely derived from the smallest possible context-free grammar (a well-studied $O(\log n)$-approximable problem). Without this restriction, we show that the optimal assembly can be substantially smaller than the optimal context-free grammar, by a factor of $\Omega(\sqrt{n/\log n})$ even for binary strings of length $n$. Fortunately, we can bound this separation in model power by a quadratic function in the number of distinct glues or tiles allowed in the assembly, which is typically small in practice.