
    Online Detection of Repetitions with Backtracking

    In this paper we present two algorithms for the following problem: given a string and a rational e > 1, detect, in an online fashion, the earliest occurrence of a repetition of exponent ≥ e in the string. 1. The first algorithm supports a backtrack operation that removes the last letter of the input string. This solution runs in O(n log m) time and O(m) space, where m is the maximal length of a string generated during the execution of a given sequence of n read and backtrack operations. 2. The second algorithm works in O(n log σ) time and O(n) space, where n is the length of the input string and σ is the number of distinct letters. This algorithm is relatively simple and requires much less memory than the previously known solution with the same time and space bounds.
    Comment: 12 pages, 5 figures, accepted to CPM 201
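    As a concrete illustration of the problem statement (a naive sketch, not either of the paper's algorithms, and far slower than O(n log m)), the following reads the input one letter at a time and reports the first factor whose exponent (length divided by smallest period) reaches e:

```python
from fractions import Fraction

def smallest_period(w):
    # Smallest p such that w[i] == w[i + p] for all valid i.
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n

def first_repetition(stream, e):
    # Online in spirit: after reading each letter, test every factor
    # ending at the new position; return (start, end) of the first
    # factor of exponent >= e, or None if the stream ends first.
    s = []
    for ch in stream:
        s.append(ch)
        j = len(s)
        for i in range(j - 1):
            w = s[i:j]
            if Fraction(len(w), smallest_period(w)) >= e:
                return (i, j - 1)
    return None
```

    For example, first_repetition("abab", Fraction(2)) reports the square "abab" as (0, 3) at the moment its last letter is read.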

    Improving Developers' Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies

    Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix vulnerable regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied. Our insight is that the usability of existing detection and fixing tools will improve if we complement them with anti-patterns and fix strategies for vulnerable regexes. We developed novel anti-patterns for vulnerable regexes and a collection of fix strategies to repair them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS, and we proved the soundness and completeness of this theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. Then, we evaluated how much our anti-patterns and fix strategies improve developers' understanding of the output of detection and fixing tools. Our evaluation found that our anti-patterns were effective over a large dataset of regexes (N = 209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall. Our anti-patterns were also more effective than the state of the art when applied manually (N = 20): 100% of developers applied them effectively vs. 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers' understanding when using automatic tools (N = 9): from a median of "Very weakly" to "Strongly" when detecting vulnerabilities, and from a median of "Very weakly" to "Very strongly" when fixing them.
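    A minimal sketch of the kind of regex such anti-patterns flag (my own illustrative example, not one drawn from the paper's dataset): a nested quantifier makes the regex infinitely ambiguous, and the fix here collapses the nesting without changing the accepted language:

```python
import re

# Nested quantifiers such as (a+)+ are a classic source of infinite
# ambiguity: the same string of a's can be split among the groups in
# exponentially many ways, which a backtracking engine may explore.
vulnerable = re.compile(r"(a+)+$")

# One fix strategy: flatten the nesting; the language is unchanged.
fixed = re.compile(r"a+$")

def same_verdict(text):
    # Both patterns accept or reject exactly the same strings.
    return bool(vulnerable.fullmatch(text)) == bool(fixed.fullmatch(text))
```

    On a backtracking engine, the vulnerable pattern's matching time on rejected inputs like "aaa…ab" grows super-linearly, while the fixed pattern stays linear.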

    Optimising Unicode Regular Expression Evaluation with Previews

    The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user-selectable character encodings. An established design approach was used: a simulated non-deterministic finite automaton (NFA) implemented as a virtual machine, avoiding exponential cost in either space or time. A deterministic finite automaton (DFA) was chosen as the general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimisation strategies. The result was the development of the regular expression Preview, which summarises all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview, then describes and evaluates several optimisations built on this construct. They provide significant speed improvements, accrued from fast scanning for anchor positions, avoiding the retesting of repeated strings in unanchored searches, and efficient searching of multiple alternate expressions, which in the case of keyword searching has time complexity logarithmic in the number of words to be searched.
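    A toy sketch of the preview idea (one-character lookahead only, much simpler than the compact-DFA previews described above, and not the jsre implementation): precompute the set of characters at which any alternative can start, and attempt matches only at those anchor positions:

```python
def first_chars(alternatives):
    # A 1-character "preview": every character some alternative starts with.
    return {alt[0] for alt in alternatives if alt}

def scan(text, alternatives):
    # Use the preview to skip positions that cannot start any match,
    # then verify the surviving anchor positions against each alternative.
    preview = first_chars(alternatives)
    hits = []
    for i, ch in enumerate(text):
        if ch in preview:
            for alt in alternatives:
                if text.startswith(alt, i):
                    hits.append((i, alt))
    return hits
```

    Positions whose character is outside the preview set are rejected with a single set lookup, which is where the fast anchor scanning comes from.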

    Computing Runs on a General Alphabet

    We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length n over a general ordered alphabet in O(n log^(2/3) n) time and linear space. Our algorithm outperforms all known solutions working in Θ(n log σ) time provided σ = n^Ω(1), where σ is the alphabet size. We conjecture that there exists a linear-time RAM algorithm finding all runs.
    Comment: 4 pages, 2 figures
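    To make the object being computed concrete, here is a naive quadratic-time enumeration of runs (an illustrative sketch, nothing like the O(n log^(2/3) n) algorithm): for each candidate period p, find the maximal stretches where s[i] == s[i + p], keep those at least 2p long, and record each interval with its smallest period:

```python
def runs(s):
    # Return all runs (maximal repetitions) of s as (start, end, period),
    # where s[start:end+1] has smallest period `period` and length >= 2*period.
    n = len(s)
    found = set()
    for p in range(1, n // 2 + 1):          # candidate period
        i = 0
        while i + p < n:
            if s[i] == s[i + p]:
                j = i
                while j + p < n and s[j] == s[j + p]:
                    j += 1
                # s[i .. j+p-1] has period p; it is a run only if long enough.
                if (j + p - 1) - i + 1 >= 2 * p:
                    found.add((i, j + p - 1, p))
                i = j + 1                   # maximality: restart past the stretch
            else:
                i += 1
    # The same interval may be found with a non-minimal period; keep the smallest.
    best = {}
    for i, j, p in found:
        if (i, j) not in best or p < best[(i, j)]:
            best[(i, j)] = p
    return sorted((i, j, p) for (i, j), p in best.items())
```

    For instance, "mississippi" has exactly four runs: "ississi" with period 3 and the three squares "ss", "ss", and "pp".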

    Coarse-graining in retrodictive quantum state tomography

    Quantum state tomography often operates in the highly idealised scenario of assuming perfect measurements. The errors implied by such an approach are entwined with other imperfections relating to the information-processing protocol or application of interest. We consider the problem of retrodicting the quantum state of a system as it existed prior to the application of random but known phase errors, allowing those errors to be separated and removed. The continuously random nature of the errors implies that there is only one click per measurement outcome, a feature with a drastically adverse effect on data-processing times. We provide a thorough analysis of coarse-graining under various reconstruction algorithms, finding dramatic increases in speed for only modest sacrifices in fidelity.
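    A hypothetical sketch of the coarse-graining step only (the reconstruction algorithms themselves are not shown): because the phases are continuously random, essentially every record is unique, so the records are binned into a small number of phase intervals before reconstruction:

```python
import math
import random

# Simulated records: one click per outcome, each tagged with a known,
# continuously random phase error in [0, 2*pi).
random.seed(0)
phases = [random.uniform(0, 2 * math.pi) for _ in range(10_000)]

def coarse_grain(phases, n_bins):
    # Collapse the continuum of phase tags into n_bins intervals,
    # turning 10,000 distinct records into n_bins aggregated counts.
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for phi in phases:
        counts[min(int(phi / width), n_bins - 1)] += 1
    return counts
```

    The speed/fidelity trade-off discussed above corresponds to the choice of n_bins: fewer bins mean faster reconstruction but a coarser model of the known errors.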