329,702 research outputs found

    An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

    Get PDF
    We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer--Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm's running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of \emph{differences} in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call \emph{deterministic arithmetic automata} (DAAs) and \emph{probabilistic arithmetic automata} (PAAs)~\cite{Marschall2008}. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool's algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations

    A type system for Continuation Calculus

    Get PDF
    Continuation Calculus (CC), introduced by Geron and Geuvers, is a simple foundational model for functional computation. It is closely related to lambda calculus and term rewriting, but it has no variable binding and no pattern matching. It is Turing complete and evaluation is deterministic. Notions like "call-by-value" and "call-by-name" computation are available by choosing appropriate function definitions: e.g. there is a call-by-value and a call-by-name addition function. In the present paper we extend CC with types, to be able to define data types in a canonical way, and functions over these data types, defined by iteration. Data type definitions follow the so-called "Scott encoding" of data, as opposed to the more familiar "Church encoding". The iteration scheme comes in two flavors: a call-by-value and a call-by-name iteration scheme. The call-by-value variant is a double negation variant of call-by-name iteration. The double negation translation allows to move between call-by-name and call-by-value.Comment: In Proceedings CL&C 2014, arXiv:1409.259

    On the Comparison Complexity of the String Prefix-Matching Problem

    Get PDF
    In this paper we study the exact comparison complexity of the stringprefix-matching problem in the deterministic sequential comparison modelwith equality tests. We derive almost tight lower and upper bounds onthe number of symbol comparisons required in the worst case by on-lineprefix-matching algorithms for any fixed pattern and variable text. Unlikeprevious results on the comparison complexity of string-matching andprefix-matching algorithms, our bounds are almost tight for any particular pattern.We also consider the special case where the pattern and the text are thesame string. This problem, which we call the string self-prefix problem, issimilar to the pattern preprocessing step of the Knuth-Morris-Pratt string-matchingalgorithm that is used in several comparison efficient string-matchingand prefix-matching algorithms, including in our new algorithm.We obtain roughly tight lower and upper bounds on the number of symbolcomparisons required in the worst case by on-line self-prefix algorithms.Our algorithms can be implemented in linear time and space in thestandard uniform-cost random-access-machine model

    Optimal public money

    Get PDF
    In most countries, the supply of paper money is controlled by a state institution. This paper provides an explanation for why such an arrangement is typically chosen. I use a deterministic matching model with a continuum of agents where enforcement is limited and where some agents produce public goods. Agents can also, at a cost, produce a distinguishable, intrinsically useless but perfectly durable good: notes. I call a note public if it is printed by an agent who produces public goods. In this framework, I prove that the socially optimal allocation is only implemented by a pattern of trade in which exchanges are effected using public notes. JEL Classification: D8, E5Limited Commitment, Money

    Indexing Isodirectional Pointer Sequences

    Get PDF
    Many sequential and temporal data have dependency relationships among their elements, which can be represented as a sequence of pointers. In this paper, we introduce a new string matching problem with a particular type of strings, which we call isodirectional pointer sequence, in which each entry has a pointer to another entry. The proposed problem is not only a formalization of real-world dependency matching problems, but also a generalization of variants of the string matching problem such as parameterized pattern matching and Cartesian tree matching. We present a 2nlg?+2n+o(n)-bit index that preprocesses the text T[1:n] so as to count the number of occurrences of pattern P[1:m] in ?(mlg?) where ? is the number of distinct lengths of pointers in T. Our index is also easily implementable in practice because it consists of wavelet trees and range maximum query index, which are widely used building blocks in many other compact data structures. By compressing the wavelet trees, the index can also be stored into 2nH^*?(T)+2n+o(n) bits where H^*?(T) is the 0-th order empirical entropy of the distribution of pointer lengths of T

    MRSI: A Fast Pattern Matching Algorithm for Anti-virus Applications

    Full text link
    Anti-virus applications play an important role in today’s Internet communication security. Virus scanning is usually performed on email, web and file transfer traffic flows at intranet security gateways. The performance of popular anti-virus applications relies on the pattern matching algorithms implemented in these security devices. The growth of network bandwidth and the increase of virus signatures call for high speed and scalable pattern matching algorithms. Motivated by several observations of a real-life virus signature database from Clam-AV, a popular anti-virus application, a fast pattern matching algorithm named MRSI is proposed in this paper. Compared to the current algorithm implemented in Clam-AV, MRSI achieved an 80%~100 % faster virus scanning speed without excessive memory usages

    pBWT: Achieving succinct data structures for parameterized pattern matching and related problems

    Get PDF
    The fields of succinct data structures and compressed text indexing have seen quite a bit of progress over the last two decades. An important achievement, primarily using techniques based on the Burrows-Wheeler Transform (BWT), was obtaining the full functionality of the suffix tree in the optimal number of bits. A crucial property that allows the use of BWT for designing compressed indexes is order-preserving suffix links. Specifically, the relative order between two suffixes in the subtree of an internal node is same as that of the suffixes obtained by truncating the furst character of the two suffixes. Unfortunately, in many variants of the text-indexing problem, for e.g., parameterized pattern matching, 2D pattern matching, and order-isomorphic pattern matching, this property does not hold. Consequently, the compressed indexes based on BWT do not directly apply. Furthermore, a compressed index for any of these variants has been elusive throughout the advancement of the field of succinct data structures. We achieve a positive breakthrough on one such problem, namely the Parameterized Pattern Matching problem. Let T be a text that contains n characters from an alphabet , which is the union of two disjoint sets: containing static characters (s-characters) and containing parameterized characters (p-characters). A pattern P (also over ) matches an equal-length substring S of T i the s-characters match exactly, and there exists a one-to-one function that renames the p-characters in S to that in P. The task is to find the starting positions (occurrences) of all such substrings S. Previous index [Baker, STOC 1993], known as Parameterized Suffix Tree, requires (n log n) bits of space, and can find all occ occurrences in time O(jPj log +occ), where = jj. We introduce an n log +O(n)-bit index with O(jPj log +occlog n log ) query time. At the core, lies a new BWT-like transform, which we call the Parame- terized Burrows-Wheeler Transform (pBWT). The techniques are extended to obtain a succinct index for the Parameterized Dictionary Matching problem of Idury and Schaer [CPM, 1994]

    Well-Founded Recursion over Contextual Objects

    Get PDF
    We present a core programming language that supports writing well-founded structurally recursive functions using simultaneous pattern matching on contextual LF objects and contexts. The main technical tool is a coverage checking algorithm that also generates valid recursive calls. To establish consistency, we define a call-by-value small-step semantics and prove that every well-typed program terminates using a reducibility semantics. Based on the presented methodology we have implemented a totality checker as part of the programming and proof environment Beluga where it can be used to establish that a total Beluga program corresponds to a proof
    • …
    corecore