122 research outputs found

    On minimizing deterministic tree automata

    Get PDF
    We present two algorithms for minimizing deterministic frontier-to-root tree automata (dfrtas) and compare them with their string counterparts. The presentation is incremental, starting out from definitions of minimality of automata and state equivalence, in the style of earlier algorithm taxonomies by the authors. The first algorithm is the classical one, initially presented by Brainerd in the 1960s and presented (sometimes imprecisely) in standard texts on tree language theory ever since. The second algorithm is completely new. This algorithm, essentially representing the generalization to ranked trees of the string algorithm presented by Watson and Daciuk, incrementally minimizes a dfrta. As a result, intermediate results of the algorithm can be used to reduce the initial automaton’s size. This makes the algorithm useful in situations where running time is restricted (for example, in real-time applications). We also briefly sketch how a concurrent specification of the algorithm in CSP can be obtained from an existing specification for the dfa case

    On minimizing deterministic tree automata

    Get PDF
    We present two algorithms for minimizing deterministic frontier-to-root tree automata (dfrtas) and compare them with their string counterparts. The presentation is incremental, starting out from definitions of minimality of automata and state equivalence, in the style of earlier algorithm taxonomies by the authors. The first algorithm is the classical one, initially presented by Brainerd in the 1960s and presented (sometimes imprecisely) in standard texts on tree language theory ever since. The second algorithm is completely new. This algorithm, essentially representing the generalization to ranked trees of the string algorithm presented by Watson and Daciuk, incrementally minimizes a dfrta. As a result, intermediate results of the algorithm can be used to reduce the initial automaton’s size. This makes the algorithm useful in situations where running time is restricted (for example, in real-time applications). We also briefly sketch how a concurrent specification of the algorithm in CSP can be obtained from an existing specification for the dfa case

    Finite Automata Implementations Considering CPU Cache

    Get PDF
    The finite automata are mathematical models for finite state systems. More general finite automaton is the nondeterministic finite automaton (NFA) that cannot be directly used. It is usually transformed to the deterministic finite automaton (DFA) that then runs in time O(n), where n is the size of the input text. We present two main approaches to practical implementation of DFA considering CPU cache. The first approach (represented by Table Driven and Hard Coded implementations) is suitable forautomata being run very frequently, typically having cycles. The other approach is suitable for a collection of automata from which various automata are retrieved and then run. This second kind of automata are expected to be cycle-free.

    Indexing XML Documents Using Tree Paths Automaton

    Get PDF
    An XML document can be viewed as a tree in a natural way. Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the XML index problem. In this paper, we attempt to support a significant fragment of XPath queries which may use any combination of child (i.e., /) and descendant-or-self (i.e., //) axis. A systematic approach to the construction of such XML index, which is a finite automaton called Tree Paths Automaton, is presented. Given an XML tree model T, the tree is first of all preprocessed by means of its linear fragments called string paths. Since only path queries are considered, the branching structure of the XML tree model can be omitted. For individual string paths, smaller Tree Paths Automata are built, and they are afterwards combined to form the index. The searching phase uses the index, reads an input query Q of size m, and computes the list of positions of all occurrences of Q in the tree T. The searching is performed in time O(m) and does not depend on the size of the XML document. Although the number of queries is clearly exponential in the number of nodes of the XML tree model, the size of the index seems to be, according to our experimental results, usually only about 2.5 times larger than the size of the original document

    Efficient Implementation for Deterministic Finite Tree Automata Minimization

    Full text link

    Solving String Problems on Graphs Using the Labeled Direct Product

    Get PDF
    Suffix trees are an important data structure at the core of optimal solutions to many fundamental string problems, such as exact pattern matching, longest common substring, matching statistics, and longest repeated substring. Recent lines of research focused on extending some of these problems to vertex-labeled graphs, either by using efficient ad-hoc approaches which do not generalize to all input graphs, or by indexing difficult graphs and having worst-case exponential complexities. In the absence of an ubiquitous and polynomial tool like the suffix tree for labeled graphs, we introduce the labeled direct product of two graphs as a general tool for obtaining optimal algorithms in the worst case: we obtain conceptually simpler algorithms for the quadratic problems of string matching (SMLG) and longest common substring (LCSP) in labeled graphs. Our algorithms run in time linear in the size of the labeled product graph, which may be smaller than quadratic for some inputs, and their run-time is predictable, because the size of the labeled direct product graph can be precomputed efficiently. We also solve LCSP on graphs containing cycles, which was left as an open problem by Shimohira et al. in 2011. To show the power of the labeled product graph, we also apply it to solve the matching statistics (MSP) and the longest repeated string (LRSP) problems in labeled graphs. Moreover, we show that our (worst-case quadratic) algorithms are also optimal, conditioned on the Orthogonal Vectors Hypothesis. Finally, we complete the complexity picture around LRSP by studying it on undirected graphs.Peer reviewe

    A Survey of String Matching Algorithms

    Get PDF
    ABSTRACT The concept of string matching algorithms are playing an important role of string algorithms in finding a place where one or several strings (patterns) are found in a large body of text (e.g., data streaming, a sentence, a paragraph, a book, etc.). Its application covers a wide range, including intrusion detection Systems (IDS) in computer networks, applications in bioinformatics, detecting plagiarism, information security, pattern recognition, document matching and text mining. In this paper we present a short survey for well-known and recent updated and hybrid string matching algorithms. These algorithms can be divided into two major categories, known as exact string matching and approximate string matching. The string matching classification criteria was selected to highlight important features of matching strategies, in order to identify challenges and vulnerabilities

    Exact string matching algorithms : survey, issues, and future research directions

    Get PDF
    String matching has been an extensively studied research domain in the past two decades due to its various applications in the fields of text, image, signal, and speech processing. As a result, choosing an appropriate string matching algorithm for current applications and addressing challenges is difficult. Understanding different string matching approaches (such as exact string matching and approximate string matching algorithms), integrating several algorithms, and modifying algorithms to address related issues are also difficult. This paper presents a survey on single-pattern exact string matching algorithms. The main purpose of this survey is to propose new classification, identify new directions and highlight the possible challenges, current trends, and future works in the area of string matching algorithms with a core focus on exact string matching algorithms. © 2013 IEEE
    corecore