The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions highlighting common
patterns and challenges. Based on our interviews and survey of the rest of our
sources, we were able to answer some new questions that were raised by
participants' responses to our online survey and understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large,
scalability and visualization are undeniably the most pressing challenges faced
by participants, and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research.
Processing SPARQL Queries Over Distributed RDF Graphs
We propose techniques for processing SPARQL queries over a large RDF graph in
a distributed environment. We adopt a "partial evaluation and assembly"
framework. Answering a SPARQL query Q is equivalent to finding subgraph matches
of the query graph Q over RDF graph G. Based on properties of subgraph matching
over a distributed graph, we introduce local partial match as partial answers
in each fragment of RDF graph G. For assembly, we propose two methods:
centralized and distributed assembly. We analyze our algorithms both
theoretically and experimentally. Extensive experiments over both real and
benchmark RDF repositories of billions of triples confirm that our method is
superior to the state-of-the-art methods in both the system's performance and
scalability.
Comment: 30 pages
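The equivalence stated in this abstract, that answering a SPARQL query amounts to finding matches of the query graph over the RDF graph, can be illustrated with a minimal single-machine sketch. It is not the paper's partial evaluation and assembly framework and does not handle distribution. The names `is_var`, `unify`, and `match`, the `?`-prefix convention for variables, and the sample triples are assumptions made for illustration only.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound RDF term


def is_var(term: str) -> bool:
    """Assumed convention: query variables start with '?'."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def match(patterns: List[Triple], graph: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that maps all patterns onto graph triples."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from match(rest, graph, extended)


# Hypothetical data: who knows someone, and where does that person work?
graph = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
]
query = [("?x", "knows", "?y"), ("?y", "worksAt", "?c")]
results = list(match(query, graph))
```

The backtracking search scans the whole graph for every pattern, which is only workable for toy inputs. Each element of `results` is one match of the query graph, expressed as a variable binding.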
GSI: GPU-friendly Subgraph Isomorphism
Subgraph isomorphism is a well-known NP-hard problem that is widely used in
many applications, such as social network analysis and query over the knowledge
graph. Due to the inherent hardness, its performance is often a bottleneck in
various real-world applications. Therefore, we address this by designing an
efficient subgraph isomorphism algorithm leveraging features of GPU
architecture, such as massive parallelism and memory hierarchy. Existing
GPU-based solutions adopt a two-step output scheme, performing the same join
process twice in order to write intermediate results concurrently. They also
lack GPU architecture-aware optimizations that allow scaling to large graphs.
In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI.
Unlike existing edge join-based GPU solutions, we propose a
Prealloc-Combine strategy based on the vertex-oriented framework, which avoids
the repeated join of existing solutions. Also, a GPU-friendly data structure
(called PCSR) is proposed to represent an edge-labeled graph. Extensive
experiments on both synthetic and real graphs show that GSI outperforms the
state-of-the-art algorithms by up to several orders of magnitude and has good
scalability with graph size scaling to hundreds of millions of edges.
Comment: 15 pages, 17 figures, conference
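The abstract does not describe the layout of PCSR, so the following is not that structure. As background only, this is a minimal sketch of a conventional compressed sparse row (CSR) representation kept separately per edge label, which is one plain way to store an edge-labeled graph for neighbor lookups. The class name `LabelPartitionedCSR` and its methods are hypothetical, and the code runs on the CPU.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, str]  # (source vertex, target vertex, edge label)


class LabelPartitionedCSR:
    """One CSR (offsets + targets arrays) per edge label. Illustrative only."""

    def __init__(self, num_vertices: int, edges: Iterable[Edge]) -> None:
        by_label: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for src, dst, label in edges:
            by_label[label].append((src, dst))

        self.offsets: Dict[str, List[int]] = {}
        self.targets: Dict[str, List[int]] = {}
        for label, pairs in by_label.items():
            pairs.sort()  # group edges by source vertex
            offsets = [0] * (num_vertices + 1)
            for src, _ in pairs:
                offsets[src + 1] += 1          # out-degree of each source
            for v in range(num_vertices):
                offsets[v + 1] += offsets[v]   # prefix sum gives row starts
            self.offsets[label] = offsets
            self.targets[label] = [dst for _, dst in pairs]

    def neighbors(self, vertex: int, label: str) -> List[int]:
        """Targets reachable from `vertex` over edges carrying `label`."""
        offsets = self.offsets.get(label)
        if offsets is None:
            return []
        return self.targets[label][offsets[vertex]:offsets[vertex + 1]]


# Hypothetical 3-vertex graph with two edge labels.
csr = LabelPartitionedCSR(3, [(0, 1, "a"), (0, 2, "a"), (1, 2, "b"), (2, 0, "a")])
out_a = csr.neighbors(0, "a")
```

Each neighbor list is a contiguous slice of one array, which is the property that makes CSR-style layouts attractive for coalesced memory access. Any further GPU-specific design belongs to the paper itself.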
Regular Path Query Evaluation on Streaming Graphs
We study persistent query evaluation over streaming graphs, which is becoming
increasingly important. We focus on navigational queries that determine if
there exists a path between two entities that satisfies a user-specified
constraint. We adopt the Regular Path Query (RPQ) model that specifies
navigational patterns with labeled constraints. We propose deterministic
algorithms to efficiently evaluate persistent RPQs under both arbitrary and
simple path semantics in a uniform manner. Experimental analysis on real and
synthetic streaming graphs shows that the proposed algorithms can process up to
tens of thousands of edges per second and efficiently answer RPQs that are
commonly used in real-world workloads.
Comment: A shorter version of this paper has been accepted for publication in
2020 International Conference on Management of Data (SIGMOD 2020)
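To make the RPQ model concrete, here is a minimal sketch of evaluating one regular path query on a static graph under arbitrary path semantics. It uses the textbook approach of a breadth-first search over pairs of (graph vertex, automaton state). It is not the streaming, persistent algorithm proposed in the paper. The function `rpq_reachable`, the hand-written DFA for the expression `a* b`, and the sample graph are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # vertex -> [(edge label, target)]
Delta = Dict[Tuple[int, str], int]        # (DFA state, label) -> next state


def rpq_reachable(graph: Graph, source: str, delta: Delta,
                  start: int, accepting: Set[int]) -> Set[str]:
    """Vertices reachable from `source` along a path whose labels the DFA accepts."""
    seen = {(source, start)}
    queue = deque(seen)
    answers: Set[str] = set()
    while queue:
        vertex, state = queue.popleft()
        if state in accepting:
            answers.add(vertex)
        for label, target in graph.get(vertex, []):
            nxt = delta.get((state, label))
            if nxt is None:
                continue  # the automaton has no transition for this label
            pair = (target, nxt)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return answers


# Hypothetical labeled graph with a cycle between x and y.
graph: Graph = {
    "x": [("a", "y"), ("c", "w")],
    "y": [("a", "x"), ("b", "z")],
}
# DFA for the path expression a* b: state 0 loops on 'a', 'b' leads to accepting state 1.
delta: Delta = {(0, "a"): 0, (0, "b"): 1}
answers = rpq_reachable(graph, "x", delta, start=0, accepting={1})
```

Tracking visited (vertex, state) pairs keeps the search finite even though the graph contains a cycle. Arbitrary path semantics allows vertices to repeat along a path. Simple path semantics forbids repeats and would need a different search.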