Search CORE

6 research outputs found

Fast Searching in Packed Strings

Author: A. Amir
D.E. Knuth
E.W. Myers
G. Navarro
J. Tarhio
K. Fredriksson
K. Fredriksson
R. Baeza-Yates
R.A. Baeza-Yates
R.M. Karp
R.S. Boyer
S. Wu
S.T. Klein
T.A. Welch
V.L. Arlazarov
W. Masek
W. Rytter
Publication venue
Publication date: 01/01/2009
Field of study

Given strings

P

and

Q

the (exact) string matching problem is to find all positions of substrings in

Q

matching

P

. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time which is optimal if we can only read one character at the time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let

m \leq n

be the lengths

P

and

Q

, respectively, and let

\sigma

denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time O\left(\frac{n}{\log_\sigma n} + m + \occ\right). Here \occ is the number of occurrences of

P

Q

. For

m = o(n)

this improves the

O(n)

bound of the Knuth-Morris-Pratt algorithm. Furthermore, if

m = O(n/\log_\sigma n)

our algorithm is optimal since any algorithm must spend at least \Omega(\frac{(n+m)\log \sigma}{\log n} + \occ) = \Omega(\frac{n}{\log_\sigma n} + \occ) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.Comment: To appear in Journal of Discrete Algorithms. Special Issue on CPM 200

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Online Research Database In Technology

Fast and Compact Regular Expression Matching

Author: Bille Philip
Farach-Colton Martin
Publication venue
Publication date: 01/01/2008
Field of study

We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

The IT University of Copenhagen's Repository

Structural Similarity between XML Documents and DTDs

Author: E. W. Myers
E. W. Myers
G. Navarro
K. Zhang
S. Wu
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Crossref

Fine-grained Complexity Meets IP = PSPACE

Author: Chen Lijie
Goldwasser Shafi
Lyu Kaifeng
Rothblum Guy N.
Rubinstein Aviad
Publication venue
Publication date: 03/11/2018
Field of study

In this paper we study the fine-grained complexity of finding exact and approximate solutions to problems in P. Our main contribution is showing reductions from exact to approximate solution for a host of such problems. As one (notable) example, we show that the Closest-LCS-Pair problem (Given two sets of strings

A

and

B

, compute exactly the maximum

\textsf{LCS}(a, b)

with

(a, b) \in A \times B

) is equivalent to its approximation version (under near-linear time reductions, and with a constant approximation factor). More generally, we identify a class of problems, which we call BP-Pair-Class, comprising both exact and approximate solutions, and show that they are all equivalent under near-linear time reductions. Exploring this class and its properties, we also show:

\bullet

Under the NC-SETH assumption (a significantly more relaxed assumption than SETH), solving any of the problems in this class requires essentially quadratic time.

\bullet

Modest improvements on the running time of known algorithms (shaving log factors) would imply that NEXP is not in non-uniform

\textsf{NC}^1

\bullet

Finally, we leverage our techniques to show new barriers for deterministic approximation algorithms for LCS. At the heart of these new results is a deep connection between interactive proof systems for bounded-space computations and the fine-grained complexity of exact and approximate solutions to problems in P. In particular, our results build on the proof techniques from the classical IP = PSPACE result

arXiv.org e-Print Archive

Recommended from our members

A framework for correlation and aggregation of security alerts in communication networks. A reasoning correlation and aggregation approach to detect multi-stage attack scenarios using elementary alerts generated by Network Intrusion Detection Systems (NIDS) for a global security perspective.

Author: Alserhani Faeiz
Publication venue: School of Computing, Informatics and Media
Publication date: 01/01/2011
Field of study

The tremendous increase in usage and complexity of modern communication and network systems connected to the Internet, places demands upon security management to protect organisations¿ sensitive data and resources from malicious intrusion. Malicious attacks by intruders and hackers exploit flaws and weakness points in deployed systems through several sophisticated techniques that cannot be prevented by traditional measures, such as user authentication, access controls and firewalls. Consequently, automated detection and timely response systems are urgently needed to detect abnormal activities by monitoring network traffic and system events. Network Intrusion Detection Systems (NIDS) and Network Intrusion Prevention Systems (NIPS) are technologies that inspect traffic and diagnose system behaviour to provide improved attack protection. The current implementation of intrusion detection systems (commercial and open-source) lacks the scalability to support the massive increase in network speed, the emergence of new protocols and services. Multi-giga networks have become a standard installation posing the NIDS to be susceptible to resource exhaustion attacks. The research focuses on two distinct problems for the NIDS: missing alerts due to packet loss as a result of NIDS performance limitations; and the huge volumes of generated alerts by the NIDS overwhelming the security analyst which makes event observation tedious. A methodology for analysing alerts using a proposed framework for alert correlation has been presented to provide the security operator with a global view of the security perspective. Missed alerts are recovered implicitly using a contextual technique to detect multi-stage attack scenarios. This is based on the assumption that the most serious intrusions consist of relevant steps that temporally ordered. The pre- and post- condition approach is used to identify the logical relations among low level alerts. The alerts are aggregated, verified using vulnerability modelling, and correlated to construct multi-stage attacks. A number of algorithms have been proposed in this research to support the functionality of our framework including: alert correlation, alert aggregation and graph reduction. These algorithms have been implemented in a tool called Multi-stage Attack Recognition System (MARS) consisting of a collection of integrated components. The system has been evaluated using a series of experiments and using different data sets i.e. publicly available datasets and data sets collected using real-life experiments. The results show that our approach can effectively detect multi-stage attacks. The false positive rates are reduced due to implementation of the vulnerability and target host information

Bradford Scholars