
6,237 research outputs found

Processing SPARQL queries with regular expressions in RDF databases

Author: Han WS
Hune cho
Jeong-Hoon Lee
Lee J
Lee J
Minh-Duc Pham
YU HWANJO
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, a W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' requests for extracting information from RDF data are diverse and users often do not know the exact value of each fact in the RDF databases, it is desirable to use SPARQL queries with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or only support matching over the paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model so that the proposed framework can be adopted by existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

Pohang University of Science and Technology (POSTECH)

From Regular Expression Matching to Parsing

Author: Bille Philip
Gørtz Inge Li
Publication date: 29/01/2019

Given a regular expression R and a string Q, the regular expression parsing problem is to determine if Q matches R and, if so, to determine how it matches, e.g., by a mapping of the characters of Q to the characters in R. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses from internet traffic analysis or extracting subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine if a string Q matches regular expression R into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear space solutions for regular expression parsing.
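The difference between matching and parsing described in this abstract can be illustrated with a small Python sketch. It uses the standard `re` module, which is a backtracking engine and not the linear-space technique the paper proposes; the `IPV4` pattern and the function names are assumptions chosen only to mirror the IP-address example.

```python
import re

# Hypothetical pattern for illustration: four dot-separated decimal fields.
IPV4 = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")

def matches(q: str) -> bool:
    """Matching: only answers whether q is in the language of the pattern."""
    return IPV4.fullmatch(q) is not None

def parse(q: str):
    """Parsing: also reports which span of q each subpattern consumed."""
    m = IPV4.fullmatch(q)
    if m is None:
        return None
    # Each entry pairs the matched text with its (start, end) offsets in q.
    return [(m.group(i), m.span(i)) for i in range(1, 5)]
```

Here `matches` yields a yes/no answer, while `parse` returns the mapping from parts of Q to subpatterns of R that makes extraction possible.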

arXiv.org e-Print Archive

Online Research Database In Technology

Answering Regular Path Queries on Workflow Provenance

Author: Bao Zhuowei
Davidson Susan B.
Huang Xiaocheng
Milo Tova
Yuan Xiaojie
Publication date: 04/08/2014

This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, it can be decided in constant time whether there is a path matching Ri between two nodes. The results of each safe subquery are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach results in an algorithm that significantly reduces the number of subqueries k over existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach.
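For readers unfamiliar with regular path queries, the following Python sketch shows the textbook baseline: a breadth-first search over pairs of (graph node, automaton state). It is not the label-based method of the paper, which avoids this kind of traversal; the graph encoding, the DFA encoding, and the function name are all assumptions made for illustration.

```python
from collections import deque

def regular_path_exists(edges, dfa, start_state, accepting, source, target):
    """Return True if some path source -> target spells a word the DFA accepts.

    edges: dict mapping node -> list of (label, next_node)
    dfa:   dict mapping (state, label) -> next_state
    """
    seen = {(source, start_state)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        if node == target and state in accepting:
            return True
        for label, nxt in edges.get(node, []):
            nxt_state = dfa.get((state, label))
            if nxt_state is None:
                continue  # the DFA has no transition on this label
            pair = (nxt, nxt_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False
```

The search visits each (node, state) pair at most once, so its cost grows with the product of graph size and automaton size, which is the per-query cost that precomputed reachability labels aim to avoid.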

arXiv.org e-Print Archive

Crossref

On the Complexity and Performance of Parsing with Derivatives

Author: Adams Michael D.
Hollenbeck Celeste
Might Matthew
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/04/2016

Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously. Might et al. (2011) introduced parsing with derivatives, which handles arbitrary context-free grammars while being both easy to understand and simple to implement. Despite much initial enthusiasm and a multitude of independent implementations, its worst-case complexity has never been proven to be better than exponential. In fact, high-level arguments claiming it is fundamentally exponential have been advanced and even accepted as part of the folklore. Performance ended up being sluggish in practice, and this sluggishness was taken as informal evidence of exponentiality. In this paper, we reexamine the performance of parsing with derivatives. We have discovered that it is not exponential but, in fact, cubic. Moreover, simple (though perhaps not obvious) modifications to the implementation by Might et al. (2011) lead to an implementation that is not only easy to understand but also highly performant in practice. Comment: 13 pages; 12 figures; implementation at http://bitbucket.org/ucombinator/parsing-with-derivatives/ ; published in PLDI '16, Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 13 - 17, 2016, Santa Barbara, CA, US
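The derivative idea is easiest to see in its original regular-expression form (Brzozowski derivatives), which parsing with derivatives generalizes to context-free grammars. The Python sketch below covers only that regular special case and is not the algorithm or implementation from the paper; the tuple encoding and all function names are assumptions.

```python
EMPTY = ("empty",)   # matches no string
EPS = ("eps",)       # matches only the empty string

def char(c): return ("chr", c)
def alt(l, r): return ("alt", l, r)
def seq(l, r): return ("seq", l, r)
def star(r): return ("star", r)

def nullable(r):
    """True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # seq

def derive(r, c):
    """Expression matching the suffixes of r's strings that start with c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(derive(r[1], c), derive(r[2], c))
    if tag == "star":
        return seq(derive(r[1], c), r)
    # seq: derive the left part; if it can be empty, c may start the right part
    left = seq(derive(r[1], c), r[2])
    return alt(left, derive(r[2], c)) if nullable(r[1]) else left

def matches(r, text):
    """Derive by each character in turn, then check for the empty string."""
    for c in text:
        r = derive(r, c)
    return nullable(r)
```

This sketch never simplifies the derived expressions, so they grow with each character consumed. That kind of unchecked growth is one reason naive derivative-based implementations can be slow in practice.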

arXiv.org e-Print Archive

Crossref

Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

Author: Berglund Martin
Drewes Frank
van der Merwe Brink
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014

We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model which captures all the aspects needed to study such matching engines in a formal way. Based on this, we propose two types of static analysis, which take a regular expression and tell whether there exists a family of strings which makes Java-style matching run in exponential time. Comment: In Proceedings AFL 2014, arXiv:1405.527
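The exponential behavior this abstract refers to can be reproduced with a toy backtracking matcher. The Python sketch below hard-codes the ambiguous pattern `(a|aa)*b` and counts recursive calls; it is a simplified model for illustration only, not Java's engine and not the automata model from the paper, and the pattern and function name are assumptions.

```python
def backtracking_steps(text: str) -> int:
    """Count the calls a naive backtracking matcher makes for (a|aa)*b."""
    steps = 0

    def match_from(pos: int) -> bool:
        nonlocal steps
        steps += 1
        # Alternative 1: consume "a" and go around the loop again.
        if text.startswith("a", pos) and match_from(pos + 1):
            return True
        # Alternative 2: consume "aa" and go around the loop again.
        if text.startswith("aa", pos) and match_from(pos + 2):
            return True
        # Leave the loop: the remaining input must be exactly "b".
        return text[pos:] == "b"

    match_from(0)
    return steps
```

On a matching input such as `"aaaab"` the first alternative succeeds at every step, so the count is linear in the input length. On a failing input of n `a` characters the count T(n) satisfies T(n) = 1 + T(n-1) + T(n-2), which grows exponentially because every way of splitting the run into `a` and `aa` pieces is tried.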

arXiv.org e-Print Archive

Directory of Open Access Journals

Program Synthesis using Natural Language

Author: Aditya Desai
Amey Karkare
Mark Marron
Nidhi Jain
Sailesh R
Subhajit Roy
Sumit Gulwani
Vineet Hingorani
Publication date: 01/09/2015

Interacting with computers is a ubiquitous activity for millions of people. Repetitive or specialized tasks often require the creation of small, often one-off, programs. End-users struggle with learning and using the myriad of domain-specific languages (DSLs) to effectively accomplish these tasks. We present a general framework for constructing program synthesizers that take natural language (NL) inputs and produce expressions in a target DSL. The framework takes as input a DSL definition and training data consisting of NL/DSL pairs. From these it constructs a synthesizer by learning optimal weights and classifiers (using NLP features) that rank the outputs of a keyword-programming based translation. We applied our framework to three domains: repetitive text editing, an intelligent tutoring system, and flight information queries. On 1200+ English descriptions, the respective synthesizers rank the desired program as the top-1 and top-3 for 80% and 90% of descriptions, respectively.

arXiv.org e-Print Archive

CiteSeerX