27,903 research outputs found
Fast and Compact Regular Expression Matching
We study 4 problems in string matching, namely, regular expression matching,
approximate regular expression matching, string edit distance, and subsequence
indexing, on a standard word RAM model of computation that allows
logarithmic-sized words to be manipulated in constant time. We show how to
improve the space and/or remove a dependency on the alphabet size for each
problem using either an improved tabulation technique of an existing algorithm
or by combining known algorithms in a new way
String Matching with Variable Length Gaps
We consider string matching with variable length gaps. Given a string and
a pattern consisting of strings separated by variable length gaps
(arbitrary strings of length in a specified range), the problem is to find all
ending positions of substrings in that match . This problem is a basic
primitive in computational biology applications. Let and be the lengths
of and , respectively, and let be the number of strings in . We
present a new algorithm achieving time and space , where is the sum of the lower bounds of the lengths of the gaps in
and is the total number of occurrences of the strings in
within . Compared to the previous results this bound essentially achieves
the best known time and space complexities simultaneously. Consequently, our
algorithm obtains the best known bounds for almost all combinations of ,
, , , and . Our algorithm is surprisingly simple and
straightforward to implement. We also present algorithms for finding and
encoding the positions of all strings in for every match of the pattern.Comment: draft of full version, extended abstract at SPIRE 201
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck
Substring filtering for low-cost linked data interfaces
Recently, Triple Pattern Fragments (TPFS) were introduced as a low-cost server-side interface when high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPFS interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPFS interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elastic Search and case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries
Dagstuhl Reports : Volume 1, Issue 2, February 2011
Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn
- …