3,554 research outputs found

Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1

Author: Gurajada Sairam
Theobald Martin
Publication date: 01/01/2016

We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 "Query Language" component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of relational joins with additional graph-reachability predicates. For the scalable, i.e., distributed, processing of queries of this kind over very large RDF collections, we develop a suitable partitioning and indexing scheme, which allows us to shard the RDF triples over an entire cluster of compute nodes and to process an incoming SPARQL query over all of the relevant graph partitions (and thus compute nodes) in parallel. Unlike most prior works in this field, we specifically aim at the unified optimization and distributed processing of queries consisting of both relational joins and graph-reachability predicates. All communication among the compute nodes is established via a proprietary, asynchronous communication protocol based on the Message Passing Interface.
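
The query class described above (triple patterns joined with a reachability predicate) can be illustrated on a single machine. The following is a minimal sketch, not the distributed architecture from the abstract: it assumes triples are plain Python tuples, and the function names `reachable` and `answer` as well as the predicate names are hypothetical. It evaluates a pattern of the form `?x rdf:type T . ?x p+ ?y`.

```python
from collections import defaultdict, deque


def reachable(triples, start, predicate):
    """Return all nodes reachable from `start` via one or more `predicate` edges."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            adjacency[s].append(o)
    seen = set()
    queue = deque(adjacency[start])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(adjacency[node])
    return seen


def answer(triples, type_value, path_predicate):
    """Evaluate: ?x rdf:type type_value . ?x path_predicate+ ?y (hypothetical helper)."""
    # Relational part: select bindings for ?x from the type triples.
    xs = [s for s, p, o in triples if p == "rdf:type" and o == type_value]
    # Reachability part: extend each ?x binding with every reachable ?y.
    return sorted((x, y) for x in xs for y in reachable(triples, x, path_predicate))
```

In a sharded setting, the breadth-first expansion would have to cross partition boundaries, which is where the partitioning and communication scheme mentioned in the abstract would come in.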

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

MPG.PuRe

Answering Regular Path Queries on Workflow Provenance

Author: Bao Zhuowei
Davidson Susan B.
Huang Xiaocheng
Milo Tova
Yuan Xiaojie
Publication date: 04/08/2014

This paper proposes a novel approach for efficiently evaluating regular path queries over provenance graphs of workflows that may include recursion. The approach assumes that an execution g of a workflow G is labeled with query-agnostic reachability labels using an existing technique. At query time, given g, G and a regular path query R, the approach decomposes R into a set of subqueries R1, ..., Rk that are safe for G. For each safe subquery Ri, G is rewritten so that, using the reachability labels of nodes in g, whether or not there is a path which matches Ri between two nodes can be decided in constant time. The results of each safe subquery are then composed, possibly with some small unsafe remainder, to produce an answer to R. The approach results in an algorithm that significantly reduces the number of subqueries k over existing techniques by increasing their size and complexity, and that evaluates each subquery in time bounded by its input and output size. Experimental results demonstrate the benefit of this approach
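
For orientation, a regular path query can always be answered by a generic search over the product of the graph and an automaton for the query. The sketch below shows that textbook baseline only; it is not the labeling-based approach of the abstract and makes no use of reachability labels or safe subqueries. The function name `rpq_match` and the encoding of the automaton as a dictionary are assumptions made for illustration.

```python
from collections import deque


def rpq_match(edges, dfa, start_state, accepting, source, target):
    """Return True if some path from source to target spells a word the DFA accepts.

    edges: iterable of (u, label, v) triples
    dfa:   dict mapping (state, label) -> next state
    """
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))

    # Breadth-first search over (graph node, automaton state) pairs.
    start = (source, start_state)
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if node == target and state in accepting:
            return True
        for label, successor in adjacency.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is not None and (successor, next_state) not in seen:
                seen.add((successor, next_state))
                queue.append((successor, next_state))
    return False
```

This search takes time proportional to the size of the product graph per node pair, which is the kind of cost that constant-time label-based checks aim to avoid.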

arXiv.org e-Print Archive

Crossref

Reverse k Nearest Neighbor Search over Trajectories

Author: Bao Zhifeng
Cong Gao
Culpepper J. Shane
Sellis Timos
Wang Sheng
Publication date: 01/01/2017

GPS enables mobile devices to continuously provide new opportunities to improve our daily lives. For example, the data collected in applications created by Uber or Public Transport Authorities can be used to plan transportation routes, estimate capacities, and proactively identify low coverage areas. In this paper, we study a new kind of query, Reverse k Nearest Neighbor Search over Trajectories (RkNNT), which can be used for route planning and capacity estimation. Given a set of existing routes DR, a set of passenger transitions DT, and a query route Q, an RkNNT query returns all transitions that take Q as one of their k nearest travel routes. To solve the problem, we first develop an index to handle dynamic trajectory updates, so that the most up-to-date transition data are available for answering an RkNNT query. Then we introduce a filter refinement framework for processing RkNNT queries using the proposed indexes. Next, we show how to use RkNNT to solve the optimal route planning problem MaxRkNNT (MinRkNNT), which is to search for the optimal route from a start location to an end location that could attract the maximum (or minimum) number of passengers based on a pre-defined travel distance threshold. Experiments on real datasets demonstrate the efficiency and scalability of our approaches. To the best of our knowledge, this is the first work to study the RkNNT problem for route planning.

Comment: 12 pages

arXiv.org e-Print Archive

RMIT Research Repository

DR-NTU (Digital Repository of NTU)

Efficient Race Detection with Futures

Author: Agrawal Kunal
Fineman Jeremy
Lee I-Ting Angelina
Utterback Robert
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/01/2019

This paper addresses the problem of provably efficient and practically good on-the-fly determinacy race detection in task parallel programs that use futures. Prior works on determinacy race detection have mostly focused on either task parallel programs that follow a series-parallel dependence structure or ones with unrestricted use of futures that generate arbitrary dependences. In this work, we consider a restricted use of futures and show that it can be race detected more efficiently than general use of futures. Specifically, we present two algorithms: MultiBags and MultiBags+. MultiBags targets programs that use futures in a restricted fashion and runs in time $O(T_1 \alpha(m,n))$, where $T_1$ is the sequential running time of the program, $\alpha$ is the inverse Ackermann's function, $m$ is the total number of memory accesses, and $n$ is the dynamic count of places at which parallelism is created. Since $\alpha$ is a very slowly growing function (upper bounded by $4$ for all practical purposes), it can be treated as a close-to-constant overhead. MultiBags+ is an extension of MultiBags that targets programs with general use of futures. It runs in time $O((T_1+k^2)\alpha(m,n))$, where $T_1$, $\alpha$, $m$, and $n$ are defined as before, and $k$ is the number of future operations in the computation. We implemented both algorithms and empirically demonstrate their efficiency.
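
An inverse-Ackermann factor of this kind is the usual signature of a disjoint-set (union-find) structure with union by rank and path compression. The abstract does not spell out the data structures used by MultiBags, so the following is only a sketch of that standard structure, under the assumption that it is the source of the $\alpha$ term; the class name `DisjointSet` is hypothetical and this is not a race detector.

```python
class DisjointSet:
    """Union-find with union by rank and path compression.

    A sequence of m operations on n elements runs in O(m * alpha(m, n)) time.
    """

    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0] * size

    def find(self, x):
        # First pass: locate the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        # Attach the shallower tree under the deeper one.
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True
```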

arXiv.org e-Print Archive

Crossref

IO-Top-k: index-access optimized top-k query processing

Author: Bast H.
Majumdar D.
Schenkel R.
Theobalt C.
Weikum G.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2006

Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates. The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled, scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. We also discuss efficient implementation techniques for the underlying data structures. In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods: a factor of up to 3 in terms of execution costs, and a factor of 5 in terms of absolute run-times of our implementation. Our best techniques are close to a lower bound for the execution cost of the considered class of threshold algorithms
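
The family of threshold algorithms mentioned above can be illustrated with the classic round-robin variant. This is a minimal sketch, not the scheduling strategies proposed in the paper: it assumes non-negative scores, summation as the aggregation function, a score of 0 for items missing from a list, and in-memory dictionaries standing in for disk-resident index lists. The function name `threshold_topk` is hypothetical.

```python
import heapq


def threshold_topk(lists, k):
    """Return (top-k (item, score) pairs, number of sorted-access rounds used).

    lists: one dict per query condition, mapping item -> non-negative score.
    """
    # Sorted access order: each index list by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    totals = {}
    depth = 0
    max_depth = max(len(sl) for sl in sorted_lists)
    while depth < max_depth:
        threshold = 0
        for sl in sorted_lists:
            if depth < len(sl):
                item, score = sl[depth]
                threshold += score
                if item not in totals:
                    # Random accesses: look the item up in every list.
                    totals[item] = sum(l.get(item, 0) for l in lists)
        depth += 1
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # Stop once no unseen item can beat the current k-th best score.
        if len(top) == k and top[-1][1] >= threshold:
            break
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), depth
```

This variant performs random accesses eagerly for every newly seen item and scans all lists at the same rate; the abstract's point is that both of these choices can be scheduled more carefully.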

MPG.PuRe

A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters

Author: Jensen Christian S.
Wu Dingming
Publication date: 01/01/2016

Keyword-based web queries with local intent retrieve web content that is relevant to supplied keywords and that represents points of interest that are near the query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, namely the top-k spatial textual clusters (k-STC) query that returns the top-k clusters that (i) are located the closest to a given query location, (ii) contain the most relevant objects with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a basic algorithm that relies on on-line density-based clustering and exploits an early stop condition. To improve the response time, we design an advanced approach that includes three techniques: (i) an object skipping rule, (ii) spatially gridded posting lists, and (iii) a fast range query algorithm. An empirical study on real data demonstrates that the paper's proposals offer scalability and are capable of excellent performance.

arXiv.org e-Print Archive

Crossref

VBN

Qualitative Multi-Objective Reachability for Ordered Branching MDPs

Author: C Courcoubetis
I Bozic
JG Reiter
K Chatterjee
K Etessami
M Kimmel
P Haccou
R Durbin
T Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/08/2020

We study qualitative multi-objective reachability problems for Ordered Branching Markov Decision Processes (OBMDPs), or equivalently context-free MDPs, building on prior results for single-target reachability on Branching Markov Decision Processes (BMDPs). We provide two separate algorithms for "almost-sure" and "limit-sure" multi-target reachability for OBMDPs. Specifically, given an OBMDP, $\mathcal{A}$, given a starting non-terminal, and given a set of target non-terminals $K$ of size $k = |K|$, our first algorithm decides whether the supremum probability, of generating a tree that contains every target non-terminal in set $K$, is $1$. Our second algorithm decides whether there is a strategy for the player to almost-surely (with probability $1$) generate a tree that contains every target non-terminal in set $K$. The two separate algorithms are needed: we show that indeed, in this context, "almost-sure" $\neq$ "limit-sure" for multi-target reachability, meaning that there are OBMDPs for which the player may not have any strategy to achieve probability exactly $1$ of reaching all targets in set $K$ in the same generated tree, but may have a sequence of strategies that achieve probability arbitrarily close to $1$. Both algorithms run in time $2^{O(k)} \cdot |\mathcal{A}|^{O(1)}$, where $|\mathcal{A}|$ is the total bit encoding length of the given OBMDP, $\mathcal{A}$. Hence they run in polynomial time when $k$ is fixed, and are fixed-parameter tractable with respect to $k$. Moreover, we show that even the qualitative almost-sure (and limit-sure) multi-target reachability decision problem is in general NP-hard, when the size $k$ of the set $K$ of target non-terminals is not fixed.

Comment: 47 pages

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer