350 research outputs found

Graph Pattern Matching: From Intractable to Polynomial Time

Author: Fan Wenfei
Li Jianzhong
Ma Shuai
Tang Nan
Wu Yinghui
Wu Yunpeng
Publication date: 01/01/2010

Fast and Tiny Structural Self-Indexes for XML

Author: Maneth Sebastian
Sebastian Tom
Publication date: 27/12/2010

XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees have previously been used as synopses for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). For example, for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows XML results (including texts) to be serialized faster than previous systems, by a factor of about 2-3. This is due to efficient copy handling of grammar repetitions, and because materialization is avoided entirely. In order to compare with twig join implementations, we implemented a materializer which writes out pre-order numbers of result nodes, and show its competitiveness.
Comment: 13 pages
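The claim that count queries can run without decompression can be illustrated with a much simpler structure than the paper's grammar index. The following is a minimal sketch, assuming a tree given as nested `(label, children)` tuples and a plain DAG compression by hash-consing; the function names `compress` and `count_label` are hypothetical and are not taken from the paper.

```python
def compress(tree, table):
    """Hash-cons a (label, children) tree so equal subtrees share one id."""
    label, children = tree
    key = (label, tuple(compress(child, table) for child in children))
    if key not in table:
        table[key] = len(table)  # assign the next free DAG node id
    return table[key]


def count_label(root_id, nodes, label):
    """Count nodes carrying `label` in the unfolded tree.

    Each DAG node is evaluated once; shared subtrees reuse the memoized count.
    """
    memo = {}

    def go(node_id):
        if node_id not in memo:
            node_label, child_ids = nodes[node_id]
            memo[node_id] = (node_label == label) + sum(go(c) for c in child_ids)
        return memo[node_id]

    return go(root_id)


# A small repetitive document: three identical <item> subtrees.
item = ("item", [("name", []), ("price", [])])
doc = ("catalog", [item, item, item])

table = {}
root = compress(doc, table)
nodes = {node_id: key for key, node_id in table.items()}
```

The tree has ten nodes, but the DAG stores only four distinct ones, and the count for any label is computed by visiting those four. Grammar-based compression, as used in the paper, can additionally share repeated tree patterns with differing subtrees, which this sketch does not attempt.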

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Adding Logical Operators to Tree Pattern Queries on Graph-Structured Data

Author: Jiang Xiaorui
Zeng Qiang
Zhuge Hai
Publication date: 16/04/2012

As data are increasingly modeled as graphs for expressing complex relationships, the tree pattern query on graph-structured data has become an important type of query in real-world applications. Most practical query languages, such as XQuery and SPARQL, support logical expressions using logical-AND/OR/NOT operators to define structural constraints of tree patterns. In this paper, (1) we propose generalized tree pattern queries (GTPQs) over graph-structured data, which fully support propositional logic of structural constraints. (2) We make a thorough study of fundamental problems including satisfiability, containment and minimization, and analyze the computational complexity and the decision procedures of these problems. (3) We propose a compact graph representation of intermediate results and a pruning approach to reduce the size of intermediate results and the number of join operations, two factors that often impair the efficiency of traditional algorithms for evaluating tree pattern queries. (4) We present an efficient algorithm for evaluating GTPQs using 3-hop as the underlying reachability index. (5) Experiments on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our algorithm, which is from several times to orders of magnitude faster than state-of-the-art algorithms in terms of evaluation time, even for traditional tree pattern queries with only conjunctive operations.
Comment: 16 pages

arXiv.org e-Print Archive

CiteSeerX

Coventry University Pure Portal

Answering Regular Path Queries on Workflow Provenance

Author: Bao Zhuowei
Davidson Susan B.
Huang Xiaocheng
Milo Tova
Yuan Xiaojie
Publication date: 04/08/2014

This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, it can be decided in constant time whether there is a path between two nodes that matches Ri. The results of each safe subquery are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach results in an algorithm that significantly reduces the number of subqueries k over existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach.
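For readers unfamiliar with regular path queries, the sketch below shows the textbook baseline rather than the labeling-based approach of the paper: a breadth-first search over pairs of (graph node, automaton state). It assumes the query has already been compiled into a deterministic automaton given as a transition dictionary; the function name `rpq_reachable`, the edge labels, and the example graph are all hypothetical.

```python
from collections import deque


def rpq_reachable(edges, dfa, start_state, accepting, source):
    """Return nodes reachable from `source` via a path whose labels the DFA accepts."""
    # Build an adjacency list of (label, target) pairs.
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))

    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, nxt in adj.get(node, []):
            nstate = dfa.get((state, label))  # None means no transition
            if nstate is not None and (nxt, nstate) not in seen:
                seen.add((nxt, nstate))
                queue.append((nxt, nstate))
    return answers


# Hypothetical provenance-like graph and the query  calls* emits.
edges = [
    ("a", "calls", "b"),
    ("b", "calls", "c"),
    ("c", "emits", "d"),
    ("a", "emits", "e"),
]
dfa = {(0, "calls"): 0, (0, "emits"): 1}
result = rpq_reachable(edges, dfa, start_state=0, accepting={1}, source="a")
```

This traversal touches every reachable (node, state) pair at query time, which is the cost the paper aims to avoid by answering safe subqueries from precomputed reachability labels.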

arXiv.org e-Print Archive

Crossref

PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

Author: Chun Byung-Gon
Interlandi Matteo
Lee Yunseong
Santambrogio Marco Domenico
Scolari Alberto
Weimer Markus
Publication date: 01/01/2018

Machine Learning models are often composed of pipelines of transformations. While this design allows single model components to be executed efficiently at training time, prediction serving has different requirements such as low latency, high throughput and graceful performance degradation under heavy load. Current prediction serving systems consider models as black boxes, whereby prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system introducing a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL improves performance along several dimensions; compared to state-of-the-art approaches, PRETZEL is on average able to reduce 99th percentile latency by 5.5x while reducing memory footprint by 25x, and increasing throughput by 4.7x.
Comment: 16 pages, 14 figures, 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

TEAK: A Novel Computational and GUI Software Pipeline for Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, and Querying Biological Networks

Author: Judeh Thair
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2014

As high-throughput gene expression data becomes ever cheaper, researchers are faced with a deluge of data from which biological insights need to be extracted and mined, since the rate of data accumulation far exceeds the rate of data analysis. There is a need for computational frameworks to bridge the gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open source GUI and software pipeline that seeks to be one of many tools that fill this gap, and it consists of three major modules. The first module, the Gene Set Cultural Algorithm, infers biological networks de novo from gene sets using the KEGG pathways as prior knowledge. The second and third modules query against the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways. In conjunction with molecular profiling data, the subpathways are ranked and displayed to the user within the TEAK GUI. Using a public microarray yeast data set, previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation were found using TEAK. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses in the form of Directed Acyclic Graphs against the KEGG pathways.

Digital Commons@Wayne State University

Reasoning & Querying – State of the Art

Author: Bry François
Furche Tim
Weiand Klara
Publication date: 31/08/2008

Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet, where keyword search is used in many applications such as search engines, has familiarized casual users with using keyword queries to retrieve information. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.

Open Access LMU

Supporting Custom Instructions with the LLVM Compiler for RISC-V Processor

Author: İnan Bora
Yiğit Emrecan
Ünay Eymen
Publication date: 23/10/2023

The rise of hardware accelerators with custom instructions necessitates custom compiler backends supporting these accelerators. This study provides detailed analyses of LLVM and its RISC-V backend, supplemented with case studies providing an end-to-end overview of the mentioned transformations. We argue that instruction design should consider both the hardware and the software design space. If extensive compiler modifications are necessary, this may mean that the instruction is not well designed and needs to be reconsidered. We also argue that the RISC-V standard extensions provide exemplary instructions that can guide instruction designers. In this study, the process of adding a custom instruction to the compiler is split into two parts: Assembler support and pattern matching support. Without pattern matching support, conventional software requires manual entries of inline Assembly for the accelerator, which is not scalable. While it is trivial to add Assembler support regardless of the instruction semantics, adding pattern matching support is not. Pattern matching support, and choosing the right stage for the modification, require knowledge of the internal transformations in the compiler. This study delves deep into pattern matching and presents multiple ways to approach the problem of pattern matching support. Depending on the pattern's complexity, higher-level transformations, e.g. at the IR level, can be more maintainable than modifications in the Instruction Selection phase.
Comment: Electronics and Communication Engineering B.Sc. Graduation Project. Source can be found in https://github.com/eymay/Senior-Design-Projec

arXiv.org e-Print Archive

Fast Rule-Based Graph Programs

Author: Campbell Graham
Courtehoute Brian
Plump Detlef
Publication venue: 'Elsevier BV'
Publication date: 04/01/2021

Implementing graph algorithms efficiently in a rule-based language is challenging because graph pattern matching is expensive. In this paper, we present a number of linear-time implementations of graph algorithms in GP 2, an experimental programming language based on graph transformation rules which aims to facilitate program analysis and verification. We focus on two classes of rule-based graph programs: graph reduction programs which check some graph property, and programs using a depth-first search to test some property or perform an operation such as producing a 2-colouring or a topological sorting. Programs of the first type run in linear time without any constraints on input graphs while programs of the second type require input graphs of bounded degree to run in linear time. Essential for achieving the linear time complexity are so-called rooted rules in GP 2, which, in many situations, can be matched in constant time. For each of our programs, we prove both correctness and complexity, and also give empirical evidence for their run time.
Comment: 47 pages, 202
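As a point of comparison for the depth-first-search programs mentioned in the abstract, here is a minimal sketch of 2-colouring in a conventional imperative language. It is plain Python, not GP 2, and does not use rooted rules; the function name `two_colour` and the adjacency-dictionary input format are assumptions made for illustration.

```python
def two_colour(adj):
    """Return a dict mapping each node to 0 or 1, or None if no 2-colouring exists.

    `adj` maps every node to a list of its neighbours (undirected graph).
    Runs in time linear in the number of nodes plus edges.
    """
    colour = {}
    for root in adj:
        if root in colour:
            continue  # already coloured as part of an earlier component
        colour[root] = 0
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]  # give the neighbour the other colour
                    stack.append(v)
                elif colour[v] == colour[u]:
                    return None  # an edge joins two nodes of the same colour
    return colour


square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}

square_colouring = two_colour(square)
triangle_colouring = two_colour(triangle)
```

The even cycle admits a colouring while the triangle does not. In an imperative setting the linear bound comes from direct access to the current node and its adjacency list; the abstract attributes the corresponding bound in GP 2 to rooted rules that can be matched in constant time.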

arXiv.org e-Print Archive

White Rose Research Online