
    Processing SPARQL queries with regular expressions in RDF databases

    Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources, such as UniProt (dev.isb-sib.ch/projects/uniprot-rdf) and Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' requests for extracting information from RDF data are diverse, and users often do not know the exact value of each fact stored in an RDF database, it is desirable to query RDF data with SPARQL queries that contain regular expression patterns. To the best of our knowledge, no existing work efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or only support matching over paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions are as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model so that the proposed framework can be integrated into existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
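For readers unfamiliar with the query form: a SPARQL regular-expression query attaches a filter such as `FILTER regex(?o, "^KIN")` to a triple pattern, and the straightforward evaluation tests every candidate literal. The Python sketch below contrasts that full scan with one simple, well-known shortcut: a range scan over sorted literals for `^`-anchored patterns that begin with a literal prefix. It is an illustration under assumed in-memory data structures (all names are hypothetical), not the framework or cost model proposed in the paper, whose internals the abstract does not describe. Python's `re` also only approximates SPARQL's REGEX, which follows XPath `fn:matches` syntax.

```python
import bisect
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object literal)


def regex_filter_scan(triples: List[Triple], predicate: str, pattern: str, flags: int = 0):
    """Baseline for { ?s <predicate> ?o . FILTER regex(?o, pattern) }: test every triple."""
    rx = re.compile(pattern, flags)
    # re.search mirrors SPARQL REGEX semantics: a match anywhere in the literal counts.
    return [(s, o) for s, p, o in triples if p == predicate and rx.search(o)]


def literal_prefix(pattern: str) -> str:
    """Hypothetical helper: a literal prefix that every match of a '^'-anchored pattern has.

    Conservative: returns '' (no usable prefix) for unanchored patterns or any '|'.
    """
    if not pattern.startswith("^") or "|" in pattern:
        return ""
    body = pattern[1:]
    out = []
    for i, ch in enumerate(body):
        if not (ch.isalnum() or ch in " _"):
            break  # stop at the first metacharacter, escape, class, or group
        if i + 1 < len(body) and body[i + 1] in "?*{":
            break  # this character may occur zero times, so it is not guaranteed
        out.append(ch)
    return "".join(out)


class LiteralIndex:
    """Hypothetical per-predicate index: (object, subject) pairs sorted by object literal."""

    def __init__(self, triples: List[Triple], predicate: str):
        self.entries = sorted((o, s) for s, p, o in triples if p == predicate)
        self.keys = [o for o, _ in self.entries]

    def regex_filter(self, pattern: str):
        """Same answers as regex_filter_scan (case-sensitive), scanning only the prefix range."""
        rx = re.compile(pattern)
        prefix = literal_prefix(pattern)
        start = bisect.bisect_left(self.keys, prefix)
        result = []
        for o, s in self.entries[start:]:
            if not o.startswith(prefix):
                break  # sorted order: no later literal can start with the prefix
            if rx.search(o):
                result.append((s, o))
        return result
```

Both functions return the same (subject, object) pairs up to ordering; the index only narrows which literals are examined. Patterns without a usable prefix, such as `ASE` or `synth|KIN`, fall back to scanning every literal of the predicate, which is exactly the case where a smarter, cost-based strategy would matter.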

    Regular Expression Search on Compressed Text

    We present an algorithm for searching regular expression matches in compressed text. The algorithm reports the number of matching lines in the uncompressed text in time linear in the size of its compressed version. We define efficient data structures that yield nearly optimal complexity bounds and provide a sequential implementation, zearch, that requires up to 25% less time than the state of the art. Comment: 10 pages, published in Data Compression Conference (DCC'19)
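The abstract does not describe zearch's data structures, so the following is only a sketch of the general idea behind counting matching lines in time proportional to the compressed size, under stated assumptions. The text is assumed to be a straight-line program (SLP), a grammar in which each rule is a single character or the concatenation of two earlier rules, and a literal pattern compiled to a KMP automaton stands in for a general regular-expression DFA (any DFA whose accepting state is absorbing within a line would plug in the same way). Each rule gets a summary of how its expansion moves the automaton from every start state and how many complete lines inside it match; the summary of a concatenation is computed from its two parts without expanding them. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union

# A rule is a terminal character or a pair (left, right) of indices of earlier rules.
Rule = Union[str, Tuple[int, int]]


def kmp_dfa(pattern: str, alphabet: Set[str]) -> List[Dict[str, int]]:
    """KMP automaton with states 0..m; state m (pattern seen) is absorbing."""
    m = len(pattern)
    delta: List[Dict[str, int]] = [{} for _ in range(m + 1)]
    restart = 0  # state reached by reading pattern[1:q] from state 0
    for q in range(m):
        for c in alphabet:
            delta[q][c] = delta[restart][c] if q > 0 else 0
        delta[q][pattern[q]] = q + 1
        if q > 0:
            restart = delta[restart][pattern[q]]
    delta[m] = {c: m for c in alphabet}
    return delta


@dataclass
class Summary:
    """Effect of one rule's expansion w on the automaton, for every start state."""
    trans: Optional[List[int]] = None   # w has no newline: trans[q] = state after reading w from q
    first: Optional[List[bool]] = None  # w has a newline: first[q] = line closed by the first newline matches
    middle: int = 0                     # matching lines that start and end inside w
    tail: int = 0                       # state after the last newline of w, restarting from state 0


def leaf(c: str, delta: List[Dict[str, int]], accept: int) -> Summary:
    if c == "\n":  # a newline closes the current line and restarts the automaton
        return Summary(first=[q == accept for q in range(len(delta))])
    return Summary(trans=[delta[q][c] for q in range(len(delta))])


def combine(y: Summary, z: Summary) -> Summary:
    """Summary of the concatenation yz in O(number of states) time."""
    if y.trans is not None and z.trans is not None:
        return Summary(trans=[z.trans[s] for s in y.trans])
    if y.trans is not None:  # only z contains a newline
        return Summary(first=[z.first[s] for s in y.trans], middle=z.middle, tail=z.tail)
    if z.trans is not None:  # only y contains a newline
        return Summary(first=y.first, middle=y.middle, tail=z.trans[y.tail])
    # Both contain a newline: the line spanning the boundary is closed by z's first newline.
    return Summary(first=y.first, middle=y.middle + z.middle + int(z.first[y.tail]), tail=z.tail)


def count_matching_lines(rules: List[Rule], pattern: str) -> int:
    """Count lines of the text derived by the last rule that contain `pattern`."""
    assert pattern and "\n" not in pattern
    alphabet = {r for r in rules if isinstance(r, str) and r != "\n"} | set(pattern)
    delta = kmp_dfa(pattern, alphabet)
    accept = len(pattern)
    summaries: List[Summary] = []
    for r in rules:  # rules refer only to earlier rules, so one pass suffices
        if isinstance(r, str):
            summaries.append(leaf(r, delta, accept))
        else:
            summaries.append(combine(summaries[r[0]], summaries[r[1]]))
    s = summaries[-1]
    if s.trans is not None:  # no newline at all: the text is one unterminated line
        return int(s.trans[0] == accept)
    # A final line without a trailing newline matches iff the tail state is accepting.
    return int(s.first[0]) + s.middle + int(s.tail == accept)


def slp_from_text(text: str) -> List[Rule]:
    """Balanced but uncompressed SLP, for testing; a grammar compressor would reuse rules."""
    rules: List[Rule] = list(text)
    level = list(range(len(text)))
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            rules.append((level[i], level[i + 1]))
            nxt.append(len(rules) - 1)
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carried up unchanged
        level = nxt
    return rules
```

With m the pattern length, each rule costs O(m) work, so the total is O(m times the number of rules), independent of the uncompressed length. For example, in the grammar `["a", "b", "\n", (0, 1), (3, 2), (4, 4), (5, 5), (6, 6)]` the last rule expands to eight copies of `ab` followed by a newline, and `count_matching_lines(rules, "ab")` returns 8 after processing only eight rules.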

    Hölder-type inequalities and their applications to concentration and correlation bounds

    Let $Y_v, v \in V,$ be $[0,1]$-valued random variables having a dependency graph $G=(V,E)$. We show that
    $$\mathbb{E}\left[\prod_{v\in V} Y_{v} \right] \leq \prod_{v\in V} \left\{ \mathbb{E}\left[Y_v^{\frac{\chi_b}{b}}\right] \right\}^{\frac{b}{\chi_b}},$$
    where $\chi_b$ is the $b$-fold chromatic number of $G$. This inequality may be seen as a dependency-graph analogue of a generalised Hölder inequality, due to Helmut Finner. Additionally, we provide applications of Hölder-type inequalities to concentration and correlation bounds for sums of weakly dependent random variables. Comment: 15 pages
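The inequality can be checked exactly on a small example. The sketch below is an illustration, not taken from the paper: it assumes a 5-cycle dependency graph built from one independent fair bit per edge, with each $Y_v$ a hypothetical $[0,1]$-valued function of the bits on the two edges at $v$, so that $Y_v$ is independent of the variables at all non-adjacent vertices. It computes $\chi_b$ by brute force (feasible only for tiny graphs) and evaluates both sides by enumerating all 32 outcomes. For the 5-cycle, $\chi_1 = 3$ while $\chi_2 = 5$, so $b = 2$ gives the exponent $5/2$ instead of $3$; since $\mathbb{E}[Y^r]^{1/r}$ is nondecreasing in $r$, the $b = 2$ bound is never weaker here.

```python
from itertools import combinations, product
from math import prod
from typing import Dict, List, Tuple


def bfold_chromatic_number(n: int, edges: List[Tuple[int, int]], b: int) -> int:
    """Smallest k such that each vertex gets b colours from range(k) and adjacent
    vertices get disjoint colour sets (brute-force backtracking; tiny graphs only)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    k = b
    while True:
        subsets = [frozenset(c) for c in combinations(range(k), b)]
        colour: List = [None] * n

        def extend(i: int) -> bool:
            if i == n:
                return True
            for s in subsets:
                if all(colour[u] is None or not (s & colour[u]) for u in adj[i]):
                    colour[i] = s
                    if extend(i + 1):
                        return True
            colour[i] = None  # undo before backtracking
            return False

        if extend(0):
            return k
        k += 1


def c5_check(b_values=(1, 2)) -> Tuple[float, Dict[int, float]]:
    """Exact left-hand side and right-hand bounds on a hypothetical 5-cycle example."""
    n = 5
    edges = [(i, (i + 1) % n) for i in range(n)]  # edge i joins vertices i and i+1
    # One independent fair bit per edge; vertex v sees the bits of edges v-1 and v.
    outcomes = list(product((0, 1), repeat=n))

    def y(v: int, x: Tuple[int, ...]) -> float:
        return (1 + x[(v - 1) % n] + 2 * x[v]) / 4  # takes values in [1/4, 1]

    def expect(f) -> float:
        return sum(f(x) for x in outcomes) / len(outcomes)

    lhs = expect(lambda x: prod(y(v, x) for v in range(n)))
    bounds = {}
    for b in b_values:
        r = bfold_chromatic_number(n, edges, b) / b  # exponent chi_b / b
        bounds[b] = prod(expect(lambda x, v=v: y(v, x) ** r) ** (1 / r) for v in range(n))
    return lhs, bounds
```

In this example the left-hand side is exactly 4475/32768 (about 0.137), and `c5_check()` confirms `lhs <= bounds[2] <= bounds[1]`, that is, the 2-fold bound is the tighter of the two.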