190 research outputs found

    String Matching with Variable Length Gaps

    Get PDF
    We consider string matching with variable length gaps. Given a string TT and a pattern PP consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in TT that match PP. This problem is a basic primitive in computational biology applications. Let mm and nn be the lengths of PP and TT, respectively, and let kk be the number of strings in PP. We present a new algorithm achieving time O(nlog⁥k+m+α)O(n\log k + m +\alpha) and space O(m+A)O(m + A), where AA is the sum of the lower bounds of the lengths of the gaps in PP and α\alpha is the total number of occurrences of the strings in PP within TT. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of mm, nn, kk, AA, and α\alpha. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in PP for every match of the pattern.Comment: draft of full version, extended abstract at SPIRE 201

    A sheaf-theoretic approach to pattern matching and related problems

    Get PDF
    AbstractWe present a general theory of pattern matching by adopting an extensional, geometric view of patterns. Representing the geometry of the pattern via a Grothendieck topology, the extension of the matching relation for a constant target and varying pattern forms a sheaf. We derive a generalized version of the Knuth-Morris-Pratt string-matching algorithm by gradually converting this extensional description into an intensional description, i.e., an algorithm. The generality of this approach is illustrated by briefly considering other applications: Earley's algorithm for parsing, Waltz filtering for scene analysis, matching modulo commutativity, and the n-queens problem

    Cyclic rewriting and conjugacy problems

    Full text link
    Cyclic words are equivalence classes of cyclic permutations of ordinary words. When a group is given by a rewriting relation, a rewriting system on cyclic words is induced, which is used to construct algorithms to find minimal length elements of conjugacy classes in the group. These techniques are applied to the universal groups of Stallings pregroups and in particular to free products with amalgamation, HNN-extensions and virtually free groups, to yield simple and intuitive algorithms and proofs of conjugacy criteria.Comment: 37 pages, 1 figure, submitted. Changes to introductio

    A Generalist Neural Algorithmic Learner

    Full text link
    The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.Comment: 20 pages, 10 figure

    A Survey of String Matching Algorithms

    Get PDF
    ABSTRACT The concept of string matching algorithms are playing an important role of string algorithms in finding a place where one or several strings (patterns) are found in a large body of text (e.g., data streaming, a sentence, a paragraph, a book, etc.). Its application covers a wide range, including intrusion detection Systems (IDS) in computer networks, applications in bioinformatics, detecting plagiarism, information security, pattern recognition, document matching and text mining. In this paper we present a short survey for well-known and recent updated and hybrid string matching algorithms. These algorithms can be divided into two major categories, known as exact string matching and approximate string matching. The string matching classification criteria was selected to highlight important features of matching strategies, in order to identify challenges and vulnerabilities

    Turning function and shape recognition

    Full text link
    The technique of turning function is a powerful method for measuring similarity between two dimensional shapes. The method works well when the boundary of the shape does not contain noise edges. We propose an algorithm for smoothing noise edges by decomposing the boundary into monotone components and smoothing the noise edges in each component. We also present an implementation of the proposed smoothing algorithm

    Linear pattern matching on sparse suffix trees

    Get PDF
    Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a {\em sparse suffix tree} \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to logâĄÏƒn\log_{\sigma}n characters (σ\sigma the alphabet size), our index takes O(n/logâĄÏƒn)O(n/\log_{\sigma}n) space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time O(m+r2+r⋅occ)O(m+r^2+r\cdot occ), where mm is the length of the pattern, rr is the actual number of characters stored in a word and occocc is the number of pattern occurrences

    The Complexity of the Approximate Multiple Pattern Matching Problem for Random Strings

    Get PDF
    We describe a multiple string pattern matching algorithm which is well-suited for approximate search and dictionaries composed of words of different lengths. We prove that this algorithm has optimal complexity rate up to a multiplicative constant, for arbitrary dictionaries. This extends to arbitrary dictionaries the classical results of Yao [SIAM J. Comput. 8, 1979], and Chang and Marr [Proc. CPM94, 1994]
    • 

    corecore