gSmart: An Efficient SPARQL Query Engine Using Sparse Matrix Algebra -- Full Version
Efficient execution of SPARQL queries over large RDF datasets is a topic of
considerable interest due to increased use of RDF to encode data. Most of this
work has followed either relational or graph-based approaches. In this paper,
we propose an alternative query engine, called gSmart, based on matrix algebra.
This approach can potentially better exploit the computing power of
high-performance heterogeneous architectures that we target. gSmart
incorporates: (1) grouped incident edge-based SPARQL query evaluation, in which
all unevaluated edges of a vertex are evaluated together using a series of
matrix operations to fully utilize query constraints and narrow down the
solution space; (2) a graph query planner that determines the order in which
vertices in query graphs should be evaluated; (3) memory- and
computation-efficient data structures including the light-weight sparse matrix
(LSpM) storage for RDF data and the tree-based representation for evaluation
results; (4) a multi-stage data partitioner to map the incident edge-based
query evaluation onto heterogeneous HPC architectures and develop multi-level
parallelism; and (5) a parallel executor that uses a fine-grained processing
scheme, a pre-pruning technique, and a tree-pruning technique to lower inter-node
communication and enable high throughput. Evaluations of gSmart on a CPU+GPU
HPC architecture show execution time speedups of up to 46920.00x compared to
existing SPARQL query engines on a single-node machine. Additionally,
gSmart on the Tianhe-1A supercomputer achieves a maximum speedup of 6.90x
when scaling from 2 to 16 CPU+GPU nodes.
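
To make the idea of grouped incident-edge evaluation concrete, here is a minimal sketch in plain Python. It is not gSmart's implementation and does not reproduce the LSpM format. It assumes one sparse boolean matrix per predicate, stored as a hypothetical {subject: {objects}} mapping, and it evaluates all edges incident to one query vertex together by intersecting the matching rows. The names build_matrices, candidates, and the sample triples are illustrative assumptions.

```python
from collections import defaultdict


def build_matrices(triples):
    """Build one sparse boolean matrix per predicate as {row: {columns}}.

    Rows are subjects and columns are objects (an assumed layout,
    not the LSpM storage described in the abstract).
    """
    matrices = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        matrices[predicate][subject].add(obj)
    return matrices


def candidates(matrices, incident_edges):
    """Return the subjects that satisfy every incident edge of a query vertex.

    incident_edges is a list of (predicate, obj) pairs; obj is a constant,
    or None when the other end of the edge is a variable.
    """
    result = None
    for predicate, obj in incident_edges:
        rows = matrices.get(predicate, {})
        if obj is None:
            # Variable object: any row with at least one nonzero entry matches.
            matched = {s for s, cols in rows.items() if cols}
        else:
            # Constant object: the row must have a nonzero in that column.
            matched = {s for s, cols in rows.items() if obj in cols}
        # Intersecting per-edge matches narrows the solution space.
        result = matched if result is None else result & matched
    return result if result is not None else set()


TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("carol", "knows", "dave"),
    ("carol", "worksAt", "initech"),
    ("erin", "worksAt", "acme"),
]

# Query vertex ?x with two incident edges:
#   ?x knows ?y .  ?x worksAt acme .
matrices = build_matrices(TRIPLES)
print(sorted(candidates(matrices, [("knows", None), ("worksAt", "acme")])))
# ['alice']
```

Evaluating both edges of ?x at once discards carol and erin before any binding for ?y is enumerated, which is the kind of early narrowing the abstract attributes to using query constraints jointly.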
A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs
RDF has seen increased adoption in recent years, prompting the
standardization of the SPARQL query language for RDF, and the development of
local and distributed engines for processing SPARQL queries. This survey paper
provides a comprehensive review of techniques and systems for querying RDF
knowledge graphs. While other reviews on this topic tend to focus on the
distributed setting, the main focus of this work is on providing a comprehensive
survey of state-of-the-art storage, indexing and query processing techniques
for efficiently evaluating SPARQL queries in a local setting (on one machine).
To keep the survey self-contained, we also provide a short discussion on graph
partitioning techniques used in the distributed setting. We conclude by
discussing contemporary research challenges for further improving SPARQL query
engines. This extended version also provides a survey of over one hundred
SPARQL query engines and the techniques they use, along with twelve benchmarks
and their features.
Comment: This version adds 15 more systems, more details on approaches for
processing property paths, as well as some other minor changes.