
    Online Detection of Repetitions with Backtracking

    In this paper we present two algorithms for the following problem: given a string and a rational e > 1, detect, in an online fashion, the earliest occurrence of a repetition of exponent ≥ e in the string. 1. The first algorithm supports a backtrack operation that removes the last letter of the input string. This solution runs in O(n log m) time and O(m) space, where m is the maximal length of a string generated during the execution of a given sequence of n read and backtrack operations. 2. The second algorithm works in O(n log σ) time and O(n) space, where n is the length of the input string and σ is the number of distinct letters. This algorithm is relatively simple and requires much less memory than the previously known solution with the same time and space bounds.
    Comment: 12 pages, 5 figures, accepted to CPM 201
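    As a concrete illustration of the problem statement (a naive sketch, not either of the paper's algorithms, and far slower than O(n log m)), the following reads the input one letter at a time and reports the first factor whose exponent (length divided by smallest period) reaches e:

```python
from fractions import Fraction

def smallest_period(w):
    # Smallest p such that w[i] == w[i + p] for all valid i.
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n

def first_repetition(stream, e):
    # Online in spirit: after reading each letter, test every factor
    # ending at the new position; return (start, end) of the first
    # factor of exponent >= e, or None if the stream ends first.
    s = []
    for ch in stream:
        s.append(ch)
        j = len(s)
        for i in range(j - 1):
            w = s[i:j]
            if Fraction(len(w), smallest_period(w)) >= e:
                return (i, j - 1)
    return None
```

    For example, first_repetition("abab", Fraction(2)) reports the square "abab" as (0, 3) at the moment its last letter is read.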

    Improving Developers' Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies

    Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix vulnerable regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied. Our insight is that the usability of existing detection and fixing tools will improve if we complement them with anti-patterns and fix strategies for vulnerable regexes. We developed novel anti-patterns for vulnerable regexes and a collection of fix strategies to repair them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS, and we proved the soundness and completeness of this theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. Then, we evaluated how much our anti-patterns and fix strategies improve developers' understanding of the output of detection and fixing tools. Our evaluation found that our anti-patterns were effective over a large dataset of regexes (N = 209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall. Our anti-patterns were also more effective than the state of the art when applied manually (N = 20): 100% of developers applied them effectively vs. 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers' understanding when using automatic tools (N = 9): from a median of "Very weakly" to "Strongly" when detecting vulnerabilities, and from a median of "Very weakly" to "Very strongly" when fixing them.
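    A minimal sketch of the kind of regex such anti-patterns flag (my own illustrative example, not one drawn from the paper's dataset): a nested quantifier makes the regex infinitely ambiguous, and the fix here collapses the nesting without changing the accepted language:

```python
import re

# Nested quantifiers such as (a+)+ are a classic source of infinite
# ambiguity: the same string of a's can be split among the groups in
# exponentially many ways, which a backtracking engine may explore.
vulnerable = re.compile(r"(a+)+$")

# One fix strategy: flatten the nesting; the language is unchanged.
fixed = re.compile(r"a+$")

def same_verdict(text):
    # Both patterns accept or reject exactly the same strings.
    return bool(vulnerable.fullmatch(text)) == bool(fixed.fullmatch(text))
```

    On a backtracking engine, the vulnerable pattern's matching time on rejected inputs like "aaa…ab" grows super-linearly, while the fixed pattern stays linear.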

    Optimising Unicode Regular Expression Evaluation with Previews

    The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user-selectable character encodings. An established design approach was used: a simulated non-deterministic finite automaton (NFA) implemented as a virtual machine, avoiding exponential cost in either space or time. A deterministic finite automaton (DFA) was chosen as the general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimisation strategies. The result was the development of the regular expression Preview, which summarises all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview, then describes and evaluates several optimisations built on this construct. They provide significant speed improvements, accrued from fast scanning for anchor positions, avoiding the retesting of repeated strings in unanchored searches, and efficient searching of multiple alternate expressions, which in the case of keyword searching has time complexity logarithmic in the number of words to be searched.
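    A toy sketch of the preview idea (one-character lookahead only, much simpler than the compact-DFA previews described above, and not the jsre implementation): precompute the set of characters at which any alternative can start, and attempt matches only at those anchor positions:

```python
def first_chars(alternatives):
    # A 1-character "preview": every character some alternative starts with.
    return {alt[0] for alt in alternatives if alt}

def scan(text, alternatives):
    # Use the preview to skip positions that cannot start any match,
    # then verify the surviving anchor positions against each alternative.
    preview = first_chars(alternatives)
    hits = []
    for i, ch in enumerate(text):
        if ch in preview:
            for alt in alternatives:
                if text.startswith(alt, i):
                    hits.append((i, alt))
    return hits
```

    Positions whose character is outside the preview set are rejected with a single set lookup, which is where the fast anchor scanning comes from.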

    Computing Runs on a General Alphabet

    We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length n over a general ordered alphabet in O(n log^(2/3) n) time and linear space. Our algorithm outperforms all known solutions working in Θ(n log σ) time provided σ = n^Ω(1), where σ is the alphabet size. We conjecture that there exists a linear-time RAM algorithm finding all runs.
    Comment: 4 pages, 2 figures
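    To make the object being computed concrete, here is a naive quadratic-time enumeration of runs (an illustrative sketch, nothing like the O(n log^(2/3) n) algorithm): for each candidate period p, find the maximal stretches where s[i] == s[i + p], keep those at least 2p long, and record each interval with its smallest period:

```python
def runs(s):
    # Return all runs (maximal repetitions) of s as (start, end, period),
    # where s[start:end+1] has smallest period `period` and length >= 2*period.
    n = len(s)
    found = set()
    for p in range(1, n // 2 + 1):          # candidate period
        i = 0
        while i + p < n:
            if s[i] == s[i + p]:
                j = i
                while j + p < n and s[j] == s[j + p]:
                    j += 1
                # s[i .. j+p-1] has period p; it is a run only if long enough.
                if (j + p - 1) - i + 1 >= 2 * p:
                    found.add((i, j + p - 1, p))
                i = j + 1                   # maximality: restart past the stretch
            else:
                i += 1
    # The same interval may be found with a non-minimal period; keep the smallest.
    best = {}
    for i, j, p in found:
        if (i, j) not in best or p < best[(i, j)]:
            best[(i, j)] = p
    return sorted((i, j, p) for (i, j), p in best.items())
```

    For instance, "mississippi" has exactly four runs: "ississi" with period 3 and the three squares "ss", "ss", and "pp".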

    Coarse-graining in retrodictive quantum state tomography

    Quantum state tomography often operates in the highly idealised scenario of assuming perfect measurements. The errors implied by such an approach are entwined with other imperfections relating to the information-processing protocol or application of interest. We consider the problem of retrodicting the quantum state of a system as it existed prior to the application of random but known phase errors, allowing those errors to be separated and removed. The continuously random nature of the errors implies that there is only one click per measurement outcome, a feature with a drastically adverse effect on data-processing times. We provide a thorough analysis of coarse-graining under various reconstruction algorithms, finding dramatic increases in speed for only modest sacrifices in fidelity.
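    A hypothetical sketch of the coarse-graining step only (the reconstruction algorithms themselves are not shown): because the phases are continuously random, essentially every record is unique, so the records are binned into a small number of phase intervals before reconstruction:

```python
import math
import random

# Simulated records: one click per outcome, each tagged with a known,
# continuously random phase error in [0, 2*pi).
random.seed(0)
phases = [random.uniform(0, 2 * math.pi) for _ in range(10_000)]

def coarse_grain(phases, n_bins):
    # Collapse the continuum of phase tags into n_bins intervals,
    # turning 10,000 distinct records into n_bins aggregated counts.
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for phi in phases:
        counts[min(int(phi / width), n_bins - 1)] += 1
    return counts
```

    The speed/fidelity trade-off discussed above corresponds to the choice of n_bins: fewer bins mean faster reconstruction but a coarser model of the known errors.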