131,229 research outputs found

    State Complexity of Regular Tree Languages for Tree Matching

    Get PDF
    We study the state complexity of regular tree languages for tree matching problem. Given a tree t and a set of pattern trees L, we can decide whether or not there exists a subtree occurrence of trees in L from the tree t by considering the new language L′ which accepts all trees containing trees in L as subtrees. We consider the case when we are given a set of pattern trees as a regular tree language and investigate the state complexity. Based on the sequential and parallel tree concatenation, we define three types of tree languages for deciding the existence of different types of subtree occurrences. We also study the deterministic top-down state complexity of path-closed languages for the same problem.</jats:p

    Implementation of parallel algorithm for run of k-local tree automata

    Get PDF
    Tato práce se zabývá k-lokálními deterministickými konečnými stromovými automaty (DKSA), které hrají důležitou roli při hledání vzorů ve stromových strukturách. Existuje pracovně optimální paralelní algoritmus pro běh k-lokálních DKSA na výpočetním modelu EREW PRAM. Tento algoritmus bude implementován, experimentálně změřen a porovnán se sekvenčním algoritmem v této práci.This thesis deals with k-local deterministic finite tree automata (DFTA) which are important for tree pattern matching. There exists a work-optimal parallel algorithm for a run of k-local DFTA on EREW PRAM. This algorithm will be implemented, experimentally measured and compared with the sequential algorithm in this thesis

    On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

    Get PDF
    We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with pp processors. Given a static text of length nn, we first show how to compute the suffix array interval of a given pattern of length mm in O(mp+lgp+lglgplglgn)O(\frac{m}{p}+ \lg p + \lg\lg p\cdot\lg\lg n) time for pmp \le m. For approximate pattern matching with kk differences or mismatches, we show how to compute all occurrences of a given pattern in O(mkσkpmax(k,lglgn) ⁣+ ⁣(1+mp)lgplglgn+occ)O(\frac{m^k\sigma^k}{p}\max\left(k,\lg\lg n\right)\!+\!(1+\frac{m}{p}) \lg p\cdot \lg\lg n + \text{occ}) time, where σ\sigma is the size of the alphabet and pσkmkp \le \sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns PP and PP', we present a data structure for computing the interval of PPPP' in O(lglgn)O(\lg\lg n) sequential time, or in O(1+lgplgn)O(1+\lg_p\lg n) parallel time. All our data structures are of size O(n)O(n) bits (in addition to the suffix array)

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared

    Linear Compressed Pattern Matching for Polynomial Rewriting (Extended Abstract)

    Full text link
    This paper is an extended abstract of an analysis of term rewriting where the terms in the rewrite rules as well as the term to be rewritten are compressed by a singleton tree grammar (STG). This form of compression is more general than node sharing or representing terms as dags since also partial trees (contexts) can be shared in the compression. In the first part efficient but complex algorithms for detecting applicability of a rewrite rule under STG-compression are constructed and analyzed. The second part applies these results to term rewriting sequences. The main result for submatching is that finding a redex of a left-linear rule can be performed in polynomial time under STG-compression. The main implications for rewriting and (single-position or parallel) rewriting steps are: (i) under STG-compression, n rewriting steps can be performed in nondeterministic polynomial time. (ii) under STG-compression and for left-linear rewrite rules a sequence of n rewriting steps can be performed in polynomial time, and (iii) for compressed rewrite rules where the left hand sides are either DAG-compressed or ground and STG-compressed, and an STG-compressed target term, n rewriting steps can be performed in polynomial time.Comment: In Proceedings TERMGRAPH 2013, arXiv:1302.599

    Pattern matching of compressed terms and contexts and polynomial rewriting

    Get PDF
    A generalization of the compressed string pattern match that applies to terms with variables is investigated: Given terms s and t compressed by singleton tree grammars, the task is to find an instance of s that occurs as a subterm in t. We show that this problem is in NP and that the task can be performed in time O(ncjVar(s)j), including the construction of the compressed substitution, and a representation of all occurrences. We show that the special case where s is uncompressed can be performed in polynomial time. As a nice application we show that for an equational deduction of t to t0 by an equality axiom l = r (a rewrite) a single step can be performed in polynomial time in the size of compression of t and l; r if the number of variables is fixed in l. We also show that n rewriting steps can be performed in polynomial time, if the equational axioms are compressed and assumed to be constant for the rewriting sequence. Another potential application are querying mechanisms on compressed XML-data bases
    corecore