78,533 research outputs found

    On the Use of Quasiorders in Formal Language Theory

    Full text link
    In this thesis we use quasiorders on words to offer a new perspective on two well-studied problems from Formal Language Theory: deciding language inclusion and manipulating the finite automata representations of regular languages. First, we present a generic quasiorder-based framework that, when instantiated with different quasiorders, yields different algorithms (some of them new) for deciding language inclusion. We then instantiate this framework to devise an efficient algorithm for searching with regular expressions on grammar-compressed text. Finally, we define a framework of quasiorder-based automata constructions to offer a new perspective on residual automata.Comment: PhD thesi

    Regular Expression Search on Compressed Text

    Full text link
    We present an algorithm for searching regular expression matches in compressed text. The algorithm reports the number of matching lines in the uncompressed text in time linear in the size of its compressed version. We define efficient data structures that yield nearly optimal complexity bounds and provide a sequential implementation --zearch-- that requires up to 25% less time than the state of the art.Comment: 10 pages, published in Data Compression Conference (DCC'19

    Processing SPARQL queries with regular expressions in RDF databases

    Get PDF
    Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.X113sciescopu

    Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

    Full text link
    We study the approximate string matching and regular expression matching problem for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck

    Fast and Compact Regular Expression Matching

    Get PDF
    We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way

    Substring filtering for low-cost linked data interfaces

    Get PDF
    Recently, Triple Pattern Fragments (TPFS) were introduced as a low-cost server-side interface when high numbers of clients need to evaluate SPARQL queries. Scalability is achieved by moving part of the query execution to the client, at the cost of elevated query times. Since the TPFS interface purposely does not support complex constructs such as SPARQL filters, queries that use them need to be executed mostly on the client, resulting in long execution times. We therefore investigated the impact of adding a literal substring matching feature to the TPFS interface, with the goal of improving query performance while maintaining low server cost. In this paper, we discuss the client/server setup and compare the performance of SPARQL queries on multiple implementations, including Elastic Search and case-insensitive FM-index. Our evaluations indicate that these improvements allow for faster query execution without significantly increasing the load on the server. Offering the substring feature on TPF servers allows users to obtain faster responses for filter-based SPARQL queries. Furthermore, substring matching can be used to support other filters such as complete regular expressions or range queries

    Tree Buffers

    Get PDF
    In runtime verification, the central problem is to decide if a given program execution violates a given property. In online runtime verification, a monitor observes a program’s execution as it happens. If the program being observed has hard real-time constraints, then the monitor inherits them. In the presence of hard real-time constraints it becomes a challenge to maintain enough information to produce error traces, should a property violation be observed. In this paper we introduce a data structure, called tree buffer, that solves this problem in the context of automata-based monitors: If the monitor itself respects hard real-time constraints, then enriching it by tree buffers makes it possible to provide error traces, which are essential for diagnosing defects. We show that tree buffers are also useful in other application domains. For example, they can be used to implement functionality of capturing groups in regular expressions. We prove optimal asymptotic bounds for our data structure, and validate them using empirical data from two sources: regular expression searching through Wikipedia, and runtime verification of execution traces obtained from the DaCapo test suite
    corecore