Pattern matching: a sheaf-theoretic approach
A general theory of pattern matching is presented by adopting an extensional, geometric view of patterns. The extension of the matching relation consists of the occurrences of all possible patterns in a particular target. The geometry of the pattern describes the structure of the pattern and the spatial relationships among its parts. The extension and the geometry, when combined, produce a structure called a sheaf. Sheaf theory is a well-developed branch of mathematics which studies the global consequences of locally defined properties. For pattern matching, an occurrence of a pattern, a global property of the pattern, is obtained by gluing together occurrences of parts of the pattern, which are locally defined properties. A sheaf-theoretic view of pattern matching provides a uniform treatment of pattern matching on any kind of data structure: strings, trees, graphs, hypergraphs, and so on. Such a parametric description is achieved by using the language of category theory, a highly abstract description of commonly occurring structures and relationships in mathematics. A generalized version of the Knuth-Morris-Pratt pattern matching algorithm is derived by gradually converting the extensional description of pattern matching as a sheaf into an intensional description. The algorithm results from a synergy of four very general program synthesis/transformation techniques: (1) divide and conquer: exploit the sheaf condition; assemble a full match by gluing together partial matches; (2) finite differencing: collect and update partial matches incrementally while traversing the target; (3) backtracking: instead of saving all partial matches, save just one; when this partial match cannot be extended, fail back to another; (4) partial evaluation: precompute pattern-based (and therefore constant) computations. The derivation is carried out in a general framework using Grothendieck topologies.
By appropriately instantiating the underlying data structures and topologies, the same scheme yields matching algorithms for patterns with variables and for multiple patterns. Slight variations of the derivation yield Earley's algorithm for context-free parsing, and Waltz filtering, a relaxation algorithm for producing 3-D interpretations of 2-D images. Other applications of a geometric view of patterns are briefly considered: rewrites, parallel algorithms, induction and computability.
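The string instance of the derived algorithm is the classical Knuth-Morris-Pratt procedure named in the abstract. A minimal Python sketch of that instance, showing how techniques (2)-(4) surface concretely: partial matches are updated incrementally while scanning, only one partial match is kept (falling back via a table when it cannot be extended), and that table is precomputed from the pattern alone.

```python
def failure_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper border of pattern[:i+1].
    Precomputed from the pattern only (partial evaluation)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]          # fall back to a shorter partial match
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(target: str, pattern: str) -> list[int]:
    """Return all starting positions of pattern in target."""
    fail, k, hits = failure_function(pattern), 0, []
    for i, c in enumerate(target):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]          # keep just one partial match; fail back
        if c == pattern[k]:
            k += 1                   # extend the current partial match
        if k == len(pattern):        # a full match, glued from partial ones
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits
```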
String Matching with Variable Length Gaps
We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O(n log k + m + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total number of occurrences of the strings in P within T. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in P for every match of the pattern.
Comment: draft of full version; extended abstract at SPIRE 2010
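As a hedged illustration of the problem itself (not the paper's algorithm, and nowhere near its bounds), a pattern of strings separated by bounded-length gaps can be expressed naively as a regular expression; the lookahead trick below finds overlapping matches, reporting one (greedy) match end per starting position.

```python
import re

def gap_pattern_end_positions(target, strings, gaps):
    """strings = [s_0, ..., s_{k-1}]; gaps[i] = (lo, hi) bounds the gap
    between s_i and s_{i+1}. Returns ending positions (exclusive) of
    greedy matches, one per starting position -- an illustration only."""
    parts = [re.escape(strings[0])]
    for (lo, hi), s in zip(gaps, strings[1:]):
        parts.append(".{%d,%d}" % (lo, hi))   # the variable length gap
        parts.append(re.escape(s))
    # a zero-width lookahead permits overlapping matches; group 1
    # captures the matched substring so we can compute its end
    rx = re.compile("(?=(" + "".join(parts) + "))")
    return [m.start(1) + len(m.group(1)) for m in rx.finditer(target)]
```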
A sheaf-theoretic approach to pattern matching and related problems
We present a general theory of pattern matching by adopting an extensional, geometric view of patterns. Representing the geometry of the pattern via a Grothendieck topology, the extension of the matching relation for a constant target and varying pattern forms a sheaf. We derive a generalized version of the Knuth-Morris-Pratt string-matching algorithm by gradually converting this extensional description into an intensional description, i.e., an algorithm. The generality of this approach is illustrated by briefly considering other applications: Earley's algorithm for parsing, Waltz filtering for scene analysis, matching modulo commutativity, and the n-queens problem.
Cyclic rewriting and conjugacy problems
Cyclic words are equivalence classes of cyclic permutations of ordinary
words. When a group is given by a rewriting relation, a rewriting system on
cyclic words is induced, which is used to construct algorithms to find minimal
length elements of conjugacy classes in the group. These techniques are applied
to the universal groups of Stallings pregroups and in particular to free
products with amalgamation, HNN-extensions and virtually free groups, to yield
simple and intuitive algorithms and proofs of conjugacy criteria.
Comment: 37 pages, 1 figure, submitted. Changes to introduction
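In the simplest setting, plain words over an alphabet, two words represent the same cyclic word exactly when one is a cyclic permutation of the other. A naive sketch of that base case (none of the paper's rewriting or pregroup machinery is shown):

```python
def are_conjugate(u: str, v: str) -> bool:
    """Two equal-length words represent the same cyclic word iff one is
    a rotation of the other, iff u occurs as a substring of v + v."""
    return len(u) == len(v) and u in v + v

def canonical_rotation(w: str) -> str:
    """A canonical (lexicographically least) representative of the cyclic
    word: a naive O(n^2) stand-in for a minimal-length-element search."""
    return min(w[i:] + w[:i] for i in range(len(w)))
```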
A Generalist Neural Algorithmic Learner
The cornerstone of neural algorithmic reasoning is the ability to solve
algorithmic tasks, especially in a way that generalises out of distribution.
While recent years have seen a surge in methodological improvements in this
area, they mostly focused on building specialist models. Specialist models are
capable of learning to neurally execute either only one algorithm or a
collection of algorithms with identical control-flow backbone. Here, instead,
we focus on constructing a generalist neural algorithmic learner -- a single
graph neural network processor capable of learning to execute a wide range of
algorithms, such as sorting, searching, dynamic programming, path-finding and
geometry. We leverage the CLRS benchmark to empirically show that, much like
recent successes in the domain of perception, generalist algorithmic learners
can be built by "incorporating" knowledge. That is, it is possible to
effectively learn algorithms in a multi-task manner, so long as we can learn to
execute them well in a single-task regime. Motivated by this, we present a
series of improvements to the input representation, training regime and
processor architecture over CLRS, improving average single-task performance by
over 20% from prior art. We then conduct a thorough ablation of multi-task
learners leveraging these improvements. Our results demonstrate a generalist
learner that effectively incorporates knowledge captured by specialist models.
Comment: 20 pages, 10 figures
A Survey of String Matching Algorithms
String matching algorithms play an important role in finding the places where one or several strings (patterns) occur in a large body of text (e.g., a data stream, a sentence, a paragraph, a book, etc.). Their applications cover a wide range, including intrusion detection systems (IDS) in computer networks, bioinformatics, plagiarism detection, information security, pattern recognition, document matching and text mining. In this paper we present a short survey of well-known and recent updated and hybrid string matching algorithms. These algorithms can be divided into two major categories, known as exact string matching and approximate string matching. The classification criteria were selected to highlight important features of matching strategies, in order to identify challenges and vulnerabilities.
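To make the two categories concrete, a minimal sketch: exact matching reports verbatim occurrences, while approximate matching tolerates errors, here measured by a textbook Levenshtein distance check (illustrative only, not any particular surveyed algorithm).

```python
def exact_occurrences(text: str, pattern: str) -> list[int]:
    """Exact matching: every position where pattern occurs verbatim."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

def within_edit_distance(a: str, b: str, k: int) -> bool:
    """Approximate matching primitive: Levenshtein distance(a, b) <= k,
    via the standard dynamic-programming row recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1] <= k
```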
Turning function and shape recognition
The turning function is a powerful technique for measuring similarity between two-dimensional shapes. The method works well when the boundary of the shape does not contain noise edges. We propose an algorithm for smoothing noise edges by decomposing the boundary into monotone components and smoothing the noise edges in each component. We also present an implementation of the proposed smoothing algorithm.
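A minimal sketch of the turning function itself, under the common convention that it maps normalized arc length to cumulative turning angle along a counter-clockwise polygon boundary (the smoothing step proposed in the abstract is not shown):

```python
import math

def turning_function(poly):
    """Return (arc_length_fraction, cumulative_angle) breakpoints for a
    closed polygon given as a list of (x, y) vertices."""
    n = len(poly)
    edges = [(poly[(i + 1) % n][0] - poly[i][0],
              poly[(i + 1) % n][1] - poly[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    total = sum(lengths)
    angle = math.atan2(edges[0][1], edges[0][0])   # heading of first edge
    s, out = 0.0, [(0.0, angle)]
    for i in range(1, n):
        s += lengths[i - 1]
        prev, cur = edges[i - 1], edges[i]
        # signed exterior angle between consecutive edges
        turn = math.atan2(prev[0] * cur[1] - prev[1] * cur[0],
                          prev[0] * cur[0] + prev[1] * cur[1])
        angle += turn
        out.append((s / total, angle))
    return out
```

For a unit square traversed counter-clockwise the function steps up by 90 degrees at each quarter of the perimeter, which is the staircase shape this representation is known for.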
Linear pattern matching on sparse suffix trees
Packing several characters into one computer word is a simple and natural way
to compress the representation of a string and to speed up its processing.
Exploiting this idea, we propose an index for a packed string, based on a {\em
sparse suffix tree} \cite{KU-96} with appropriately defined suffix links.
Assuming, under the standard unit-cost RAM model, that a word can store up to
$\log_\sigma n$ characters ($\sigma$ being the alphabet size), our index takes $O(n/\log_\sigma n)$
space, i.e. the same space as the packed string itself.
The resulting pattern matching algorithm runs in time $O(m + r^2 + r\cdot occ)$,
where $m$ is the length of the pattern, $r$ is the actual number of characters
stored in a word and $occ$ is the number of pattern occurrences
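The packing idea can be illustrated with a hypothetical 2-bit DNA encoding (the sparse suffix tree and suffix-link machinery of the paper are not shown): a 64-bit word then stores r = 32 characters, so a string of length n occupies about n/r words instead of n bytes.

```python
# Hypothetical 2-bit alphabet encoding, chosen for illustration only.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(s: str, r: int = 32) -> list[int]:
    """Pack s into machine words, r characters (2 bits each) per word.
    The last word is left-aligned with the remaining characters."""
    words = []
    for i in range(0, len(s), r):
        w = 0
        for c in s[i:i + r]:
            w = (w << 2) | CODE[c]   # append one 2-bit character code
        words.append(w)
    return words
```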
The Complexity of the Approximate Multiple Pattern Matching Problem for Random Strings
We describe a multiple string pattern matching algorithm which is well-suited for approximate search and dictionaries composed of words of different lengths. We prove that this algorithm has optimal complexity rate up to a multiplicative constant, for arbitrary dictionaries. This extends to arbitrary dictionaries the classical results of Yao [SIAM J. Comput. 8, 1979], and Chang and Marr [Proc. CPM'94, 1994].
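As a point of reference (this is the brute-force baseline, not the paper's algorithm), multiple approximate matching over a dictionary of words of different lengths can be done position by position; Hamming distance is used as the error measure here purely for brevity.

```python
def approx_dictionary_matches(text, dictionary, k):
    """Brute force: report (position, word) pairs where word occurs at
    position with at most k mismatches (Hamming distance)."""
    hits = []
    for i in range(len(text)):
        for w in dictionary:
            window = text[i:i + len(w)]
            if len(window) == len(w) and \
               sum(a != b for a, b in zip(window, w)) <= k:
                hits.append((i, w))
    return hits
```

The cost is O(n * total dictionary size) in the worst case, which is exactly the kind of bound the optimal-rate algorithms referenced above improve upon for random strings.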