534,324 research outputs found
Efficient Online Timed Pattern Matching by Automata-Based Skipping
The timed pattern matching problem is an actively studied topic because of
its relevance in monitoring of real-time systems. There one is given a log
and a specification (given by a timed word and a timed automaton
in this paper), and one wishes to return the set of intervals for which the log
, when restricted to the interval, satisfies the specification
. In our previous work we presented an efficient timed pattern
matching algorithm: it adopts a skipping mechanism inspired by the classic
Boyer--Moore (BM) string matching algorithm. In this work we tackle the problem
of online timed pattern matching, towards embedded applications where it is
vital to process a vast amount of incoming data in a timely manner.
Specifically, we start with the Franek-Jennings-Smyth (FJS) string matching
algorithm---a recent variant of the BM algorithm---and extend it to timed
pattern matching. Our experiments indicate the efficiency of our FJS-type
algorithm in online and offline timed pattern matching
Capturing Topology in Graph Pattern Matching
Graph pattern matching is often defined in terms of subgraph isomorphism, an
NP-complete problem. To lower its complexity, various extensions of graph
simulation have been considered instead. These extensions allow pattern
matching to be conducted in cubic-time. However, they fall short of capturing
the topology of data graphs, i.e., graphs may have a structure drastically
different from pattern graphs they match, and the matches found are often too
large to understand and analyze. To rectify these problems, this paper proposes
a notion of strong simulation, a revision of graph simulation, for graph
pattern matching. (1) We identify a set of criteria for preserving the topology
of graphs matched. We show that strong simulation preserves the topology of
data graphs and finds a bounded number of matches. (2) We show that strong
simulation retains the same complexity as earlier extensions of simulation, by
providing a cubic-time algorithm for computing strong simulation. (3) We
present the locality property of strong simulation, which allows us to
effectively conduct pattern matching on distributed graphs. (4) We
experimentally verify the effectiveness and efficiency of these algorithms,
using real-life data and synthetic data.Comment: VLDB201
APPROXIMATION ALGORITHMS FOR POINT PATTERN MATCHING AND SEARCHI NG
Point pattern matching is a fundamental problem in computational geometry.
For given a reference set and pattern set, the problem is to find a
geometric transformation applied to the pattern set that minimizes some
given distance measure with respect to the reference set. This problem has
been heavily researched under various distance measures and error models.
Point set similarity searching is variation of this problem in which a
large database of point sets is given, and the task is to preprocess
this database into a data structure so that, given a query point set,
it is possible to rapidly find the nearest point set among elements of
the database. Here, the term nearest is understood in
above sense of pattern matching, where the elements of the database may be
transformed to match the given query set. The approach presented here is
to compute a low distortion embedding of the pattern matching problem into
an (ideally) low dimensional metric space and then apply any standard
algorithm for nearest neighbor searching over this metric space.
This main focus of this dissertation is on two problems
in the area of point pattern matching and searching algorithms:
(i) improving the accuracy of alignment-based point pattern matching and
(ii) computing low-distortion embeddings of point sets into vector spaces.
For the first problem, new methods are presented for matching point sets
based on alignments of small subsets of points. It is shown that these methods
lead to better approximation bounds for alignment-based planar point pattern
matching algorithms under the Hausdorff distance. Furthermore, it is shown
that these approximation bounds are nearly the best achievable by alignment-based
methods.
For the second problem, results are presented for two different distance
measures. First, point pattern similarity search under translation for point sets
in multidimensional integer space is considered, where the distance function is
the symmetric difference. A randomized embedding into real space under the L1
metric is given. The algorithm achieves an expected distortion of O(log2 n).
Second, an algorithm is given for embedding Rd under the Earth Mover's
Distance (EMD) into multidimensional integer space under the symmetric difference
distance. This embedding achieves a distortion of O(log D), where D is
the diameter of the point set. Combining this with the above result implies that
point pattern similarity search with translation under the EMD can be embedded in
to
real space in the L1 metric with an expected distortion of O(log2 n log D)
Pattern matching and pattern discovery algorithms for protein topologies
We describe algorithms for pattern matching and pattern
learning in TOPS diagrams (formal descriptions of protein topologies).
These problems can be reduced to checking for subgraph isomorphism
and finding maximal common subgraphs in a restricted class of ordered
graphs. We have developed a subgraph isomorphism algorithm for
ordered graphs, which performs well on the given set of data. The
maximal common subgraph problem then is solved by repeated
subgraph extension and checking for isomorphisms. Despite the
apparent inefficiency such approach gives an algorithm with time
complexity proportional to the number of graphs in the input set and is
still practical on the given set of data. As a result we obtain fast
methods which can be used for building a database of protein
topological motifs, and for the comparison of a given protein of known
secondary structure against a motif database
Worst-case efficient single and multiple string matching on packed texts in the word-RAM model
AbstractIn this paper, we explore worst-case solutions for the problems of single and multiple matching on strings in the word-RAM model with word length w. In the first problem, we have to build a data structure based on a pattern p of length m over an alphabet of size Ļ such that we can answer to the following query: given a text T of length n, where each character is encoded using logĻ bits return the positions of all the occurrences of p in T (in the following we refer by occ to the number of reported occurrences). For the multi-pattern matching problem we have a set S of d patterns of total length m and a query on a text T consists in finding all positions of all occurrences in T of the patterns in S. As each character of the text is encoded using logĻ bits and we can read w bits in constant time in the RAM model, we assume that we can read up to Ī(w/logĻ) consecutive characters of the text in one time step. This implies that the fastest possible query time for both problems is O(nlogĻw+occ). In this paper we present several different results for both problems which come close to that best possible query time. We first present two different linear space data structures for the first and second problem: the first one answers to single pattern matching queries in time O(n(1m+logĻw)+occ) while the second one answers to multiple pattern matching queries to O(n(logd+logy+loglogmy+logĻw)+occ) where y is the length of the shortest pattern. We then show how a simple application of the four Russian technique permits to get data structures with query times independent of the length of the shortest pattern (the length of the only pattern in case of single string matching) at the expense of using more space
- ā¦