9,629 research outputs found

Processing Regular Path Queries on Arbitrarily Distributed Data

Author: A Halevy
A Koschmieder
AO Mendelzon
C Plake
D Calvanese
EN Gilbert
G Ladwig
G Navarro
J Umbrich
M Saleem
M Shoaran
O Hartig
S Abiteboul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/10/2015

Regular Path Queries (RPQs) are a type of graph query where answers are pairs of nodes connected by a sequence of edges matching a regular expression. We study techniques for processing such queries on a distributed graph of data. While many techniques assume the location of each data element (node or edge) is known, when the components of the distributed system are autonomous, the data will be arbitrarily distributed. As the different query processing strategies are equally costly in the worst case, we isolate query-dependent cost factors and present a method to choose between strategies, using new query cost estimation techniques. We evaluate our techniques using meaningful queries on biomedical data.
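
To make the RPQ definition above concrete, here is a minimal single-machine sketch, not the distributed technique of the paper. It assumes the regular expression has already been compiled by hand into a deterministic automaton, and it searches the product of graph and automaton. The function name `rpq_pairs`, the toy biomedical edges, and the expression `encodes interacts+` are all illustrative assumptions.

```python
from collections import deque


def rpq_pairs(edges, transitions, start_state, accept_states):
    """Return all (source, target) pairs joined by a path whose edge
    labels spell a word accepted by the given deterministic automaton."""
    adj = {}
    nodes = set()
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accept_states:
                answers.add((source, node))
            for label, nxt in adj.get(node, []):
                nxt_state = transitions.get((state, label))
                if nxt_state is not None and (nxt, nxt_state) not in seen:
                    seen.add((nxt, nxt_state))
                    queue.append((nxt, nxt_state))
    return answers


# Hypothetical toy data: one gene and three proteins.
edges = [
    ("geneA", "encodes", "protB"),
    ("protB", "interacts", "protC"),
    ("protC", "interacts", "protD"),
]
# Hand-built automaton for the expression: encodes interacts+
transitions = {(0, "encodes"): 1, (1, "interacts"): 2, (2, "interacts"): 2}
result = rpq_pairs(edges, transitions, start_state=0, accept_states={2})
```

With this toy input, `result` contains the pairs `("geneA", "protC")` and `("geneA", "protD")`. In a distributed setting the adjacency lookup would turn into remote requests, which is where the cost factors discussed in the abstract arise.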

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Performance Guarantees for Distributed Reachability Queries

Author: Fan Wenfei
Wang Xin
Wu Yinghui
Publication date: 01/01/2012

In the real world a graph is often fragmented and distributed across different sites. This highlights the need for evaluating queries on distributed graphs. This paper proposes distributed evaluation algorithms for three classes of queries: reachability for determining whether one node can reach another, bounded reachability for deciding whether there exists a path of a bounded length between a pair of nodes, and regular reachability for checking whether there exists a path connecting two nodes such that the node labels on the path form a string in a given regular expression. We develop these algorithms based on partial evaluation, to explore parallel computation. When evaluating a query Q on a distributed graph G, we show that these algorithms possess the following performance guarantees, no matter how G is fragmented and distributed: (1) each site is visited only once; (2) the total network traffic is determined by the size of Q and the fragmentation of G, independent of the size of G; and (3) the response time is decided by the largest fragment of G rather than the entire G. In addition, we show that these algorithms can be readily implemented in the MapReduce framework. Using synthetic and real-life data, we experimentally verify that these algorithms are scalable on large graphs, regardless of how the graphs are distributed. Comment: VLDB201
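
The partial-evaluation idea for plain reachability can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the paper: each site is modelled as a pair `(owned_nodes, edges)`, the helper names `fragment_summary` and `reachable` are hypothetical, and the sites are simulated in one process. Each site is processed once and reports, per entry node, only whether it reaches the target locally and which nodes owned by other sites it reaches, so the summaries depend on the fragmentation rather than on the size of the graph.

```python
from collections import deque


def fragment_summary(owned, edges, entry_nodes, target):
    """Local step at one site: for each entry node, report whether the
    target is reached inside the fragment and which foreign nodes are hit."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    summary = {}
    for entry in entry_nodes:
        hits_target, exits = False, set()
        seen, queue = {entry}, deque([entry])
        while queue:
            node = queue.popleft()
            if node == target:
                hits_target = True
            for nxt in adj.get(node, []):
                if nxt not in owned:
                    exits.add(nxt)  # continues at another site
                elif nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        summary[entry] = (hits_target, exits)
    return summary


def reachable(sites, source, target):
    """Coordinator step: gather one summary per site, then search the
    small summary graph instead of the full graph."""
    # Entry nodes are the source plus every endpoint of a cross-site edge.
    entries = {source}
    for owned, edges in sites:
        entries.update(v for _, v in edges if v not in owned)
    merged = {}
    for owned, edges in sites:  # each loop iteration stands for one site visit
        merged.update(fragment_summary(owned, edges, entries & owned, target))
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        hits_target, exits = merged.get(node, (False, set()))
        if hits_target:
            return True
        for nxt in exits:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical three-site fragmentation of the path e -> a -> b -> c -> d.
sites = [
    ({"a", "b"}, [("a", "b"), ("b", "c")]),
    ({"c", "d"}, [("c", "d")]),
    ({"e"}, [("e", "a")]),
]
```

Here `reachable(sites, "a", "d")` returns `True` and `reachable(sites, "d", "a")` returns `False`. The local steps are independent of each other, so in a real deployment they could run in parallel at the sites.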

arXiv.org e-Print Archive

Edinburgh Research Explorer

TAPER: query-aware, partition-enhancement for large, heterogenous, graphs

Author: Firth Hugo
Missier Paolo
Publication date: 23/06/2016

Graph partitioning has long been seen as a viable approach to address Graph DBMS scalability. A partitioning, however, may introduce extra query processing latency unless it is sensitive to a specific query workload, and optimised to minimise inter-partition traversals for that workload. Additionally, it should also be possible to incrementally adjust the partitioning in reaction to changes in the graph topology, the query workload, or both. Because of their complexity, current partitioning algorithms fall short of one or both of these requirements, as they are designed for offline use and as one-off operations. The TAPER system aims to address both requirements, whilst leveraging existing partitioning algorithms. TAPER takes any given initial partitioning as a starting point, and iteratively adjusts it by swapping chosen vertices across partitions, heuristically reducing the probability of inter-partition traversals for a given workload of pattern matching queries. Iterations are inexpensive thanks to time and space optimisations in the underlying support data structures. We evaluate TAPER on two different large test graphs and over realistic query workloads. Our results indicate that, given a hash-based partitioning, TAPER reduces the number of inter-partition traversals by around 80%; given an unweighted METIS partitioning, by around 30%. These reductions are achieved within 8 iterations and with the additional advantage of being workload-aware and usable online. Comment: 12 pages, 11 figures, unpublishe
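
The general notion of swapping vertices to cut down workload-weighted inter-partition traversals can be illustrated with a small brute-force sketch. This is not TAPER's heuristic or its data structures: the function names, the toy partitioning, and the traversal counts are assumptions made up for the example, and the exhaustive pair search only suits tiny inputs.

```python
def inter_partition_cost(partition, edge_weights):
    """Sum the traversal counts of edges whose endpoints sit in
    different partitions."""
    return sum(
        weight
        for (u, v), weight in edge_weights.items()
        if partition[u] != partition[v]
    )


def best_swap(partition, edge_weights):
    """Try every swap of two vertices from different partitions and
    return (cost, pair) for the cheapest one, or (base cost, None)."""
    best = (inter_partition_cost(partition, edge_weights), None)
    vertices = sorted(partition)
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            if partition[u] == partition[v]:
                continue
            trial = dict(partition)
            trial[u], trial[v] = trial[v], trial[u]
            cost = inter_partition_cost(trial, edge_weights)
            if cost < best[0]:
                best = (cost, (u, v))
    return best


# Hypothetical workload: how often queries traverse each edge.
partition = {"a": 0, "b": 1, "c": 0, "d": 1}
edge_weights = {("a", "b"): 5, ("c", "d"): 4, ("a", "c"): 1, ("b", "d"): 1}
before = inter_partition_cost(partition, edge_weights)
after, swap = best_swap(partition, edge_weights)
```

In this toy case the frequently traversed edges a-b and c-d both cross partitions, giving a cost of 9. Swapping `a` and `d` co-locates both pairs and drops the cost to 2. Swapping keeps partition sizes balanced, which is one reason to prefer it over moving single vertices.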

arXiv.org e-Print Archive

University of Birmingham Research Portal

Newcastle University E-Prints

Locating Equivalent Servants over P2P Networks

Author: Ciminiera Luigi
Marchetto Guido
Papa Manzillo Marco
Risso Fulvio Giovanni Ottavio
Torrero Livio
Publication venue: IEEE
Publication date: 01/01/2011

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Distributed Graph Simulation: Impossibility and Possibility

Author: Deng Dong
Fan Wenfei
Wang Xin
Wu Yinghui
Publication date: 01/01/2014

Edinburgh Research Explorer

Reasoning about Independence in Probabilistic Models of Relational Data

Author: Jensen David
Maier Marc
Marazopoulou Katerina
Publication date: 06/01/2014

We extend the theory of d-separation to cases in which data instances are not independent and identically distributed. We show that applying the rules of d-separation directly to the structure of probabilistic models of relational data inaccurately infers conditional independence. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models. We provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models, and we present empirical results that demonstrate effectiveness. Comment: 61 pages, substantial revisions to formalisms, theory, and related wor

arXiv.org e-Print Archive

CiteSeerX

Route Planning in Transportation Networks

Author: Bast Hannah
Delling Daniel
Goldberg Andrew
Müller-Hannemann Matthias
Pajor Thomas
Sanders Peter
Wagner Dorothea
Werneck Renato F.
Publication date: 20/04/2015

We survey recent advances in algorithms for route planning in transportation networks. For road networks, we show that one can compute driving directions in milliseconds or less even at continental scale. A variety of techniques provide different trade-offs between preprocessing effort, space requirements, and query time. Some algorithms can answer queries in a fraction of a microsecond, while others can deal efficiently with real-time traffic. Journey planning on public transportation systems, although conceptually similar, is a significantly harder problem due to its inherent time-dependent and multicriteria nature. Although exact algorithms are fast enough for interactive queries on metropolitan transit systems, dealing with continent-sized instances requires simplifications or heavy preprocessing. The multimodal route planning problem, which seeks journeys combining schedule-based transportation (buses, trains) with unrestricted modes (walking, driving), is even harder, relying on approximate solutions even for metropolitan inputs. Comment: This is an updated version of the technical report MSR-TR-2014-4, previously published by Microsoft Research. This work was mostly done while the authors Daniel Delling, Andrew Goldberg, and Renato F. Werneck were at Microsoft Research Silicon Valle

arXiv.org e-Print Archive

CiteSeerX