17 research outputs found
Old Techniques for New Join Algorithms: A Case Study in RDF Processing
Recently there has been significant interest around designing specialized RDF
engines, as traditional query processing mechanisms suffer performance gaps of
orders of magnitude on many RDF workloads. At the same time, researchers have
released new worst-case optimal join algorithms which can be asymptotically
better than the join algorithms in traditional engines. In this paper we apply
worst-case optimal join algorithms to a standard RDF workload, the LUBM
benchmark, for the first time. We do so using two worst-case optimal engines:
(1) LogicBlox, a commercial database engine, and (2) EmptyHeaded, our prototype
research engine with enhanced worst-case optimal join algorithms. We show that
without any added optimizations both LogicBlox and EmptyHeaded outperform two
state-of-the-art specialized RDF engines, RDF-3X and TripleBit, by up to 6x on
cyclic join queries, the queries on which traditional optimizers are suboptimal. On
the remaining, less complex queries in the LUBM benchmark, we show that three
classic query optimization techniques enable EmptyHeaded to compete with RDF
engines, even when there is no asymptotic advantage to the worst-case optimal
approach. We validate that our design has merit as EmptyHeaded outperforms
MonetDB by three orders of magnitude and LogicBlox by two orders of magnitude,
while remaining within an order of magnitude of RDF-3X and TripleBit.
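The key idea behind worst-case optimal joins is to bind one variable at a time, intersecting the candidate sets contributed by every atom that mentions it, rather than joining relations pairwise. The following is a minimal sketch of that idea for the triangle query Q(a,b,c) :- R(a,b), R(b,c), R(a,c), not the actual LogicBlox or EmptyHeaded implementation; the function and variable names are illustrative.

```python
# Minimal sketch of a generic worst-case optimal join on the triangle
# query over an edge set R. Each variable is bound in turn, and the
# candidates for it are obtained by a multiway set intersection over
# all atoms mentioning that variable (here: R(b,c) and R(a,c) for c).
from collections import defaultdict

def triangles(edges):
    succ = defaultdict(set)            # adjacency index: a -> {b | (a,b) in R}
    for a, b in edges:
        succ[a].add(b)
    out = []
    for a in list(succ):               # bind a
        for b in succ[a]:              # bind b, satisfying R(a,b)
            for c in succ[b] & succ[a]:  # bind c by intersection
                out.append((a, b, c))
    return out

print(triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))
# -> [(1, 2, 3)]
```

On cyclic queries like this one, the intersection bounds the work by the size of the output's worst case, which is where pairwise join plans can be asymptotically worse.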
GSI: GPU-friendly Subgraph Isomorphism
Subgraph isomorphism is a well-known NP-hard problem that is widely used in
many applications, such as social network analysis and querying knowledge
graphs. Because of this inherent hardness, its performance is often a
bottleneck in real-world applications. We therefore design an efficient
subgraph isomorphism algorithm that leverages features of the GPU
architecture, such as massive parallelism and the memory hierarchy. Existing
GPU-based solutions adopt a two-step output scheme, performing the same join
process twice in order to write intermediate results concurrently. They also
lack GPU architecture-aware optimizations that allow scaling to large graphs.
In this paper, we propose a GPU-friendly subgraph isomorphism algorithm, GSI.
Different from existing edge-join-based GPU solutions, we propose a
Prealloc-Combine strategy within a vertex-oriented framework, which avoids
the join-twice step of existing solutions. We also propose a GPU-friendly
data structure, called PCSR, to represent edge-labeled graphs. Extensive
experiments on both synthetic and real graphs show that GSI outperforms the
state-of-the-art algorithms by up to several orders of magnitude and has good
scalability, scaling to graphs with hundreds of millions of edges.
Comment: 15 pages, 17 figures, conference
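PCSR's GPU-specific layout is described in the paper; the underlying idea of partitioning a compressed sparse row (CSR) structure by edge label can be sketched in plain Python as follows. The class name and fields here are illustrative, not GSI's actual code.

```python
# Hedged sketch of a label-partitioned CSR for an edge-labeled graph:
# one CSR per label, so a query edge with a known label only scans
# neighbors reached via that label.
class LabelCSR:
    def __init__(self, n, edges):
        # edges: iterable of (src, label, dst) for a graph with n vertices
        self.csr = {}
        by_label = {}
        for s, l, d in edges:
            by_label.setdefault(l, []).append((s, d))
        for l, es in by_label.items():
            es.sort()
            offsets = [0] * (n + 1)    # offsets[v+1] - offsets[v] = out-degree
            targets = []
            for s, d in es:
                offsets[s + 1] += 1
                targets.append(d)
            for i in range(n):         # prefix-sum the counts into offsets
                offsets[i + 1] += offsets[i]
            self.csr[l] = (offsets, targets)

    def neighbors(self, v, label):
        if label not in self.csr:
            return []
        off, tgt = self.csr[label]
        return tgt[off[v]:off[v + 1]]

g = LabelCSR(3, [(0, "knows", 1), (0, "knows", 2), (1, "likes", 2)])
print(g.neighbors(0, "knows"))   # -> [1, 2]
```

The contiguous offset/target arrays are what make CSR-style layouts amenable to coalesced memory access on GPUs, which is the property PCSR builds on.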
Adaptive Low-level Storage of Very Large Knowledge Graphs
The increasing availability and usage of Knowledge Graphs (KGs) on the Web
calls for scalable and general-purpose solutions to store this type of data
structures. We propose Trident, a novel storage architecture for very large KGs
on centralized systems. Trident uses several interlinked data structures to
provide fast access to nodes and edges, with the physical storage changing
depending on the topology of the graph to reduce the memory footprint. In
contrast to architectures designed for single tasks, our approach offers
an interface with few low-level and general-purpose primitives that can be used
to implement tasks like SPARQL query answering, reasoning, or graph analytics.
Our experiments show that Trident can handle graphs with 10^11 edges using
inexpensive hardware, delivering competitive performance on multiple workloads.
Comment: Accepted WWW 202
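To illustrate the idea of building diverse tasks on a few general-purpose primitives, here is a rough sketch of how a single scan primitive over (subject, predicate, object) triples with optional bindings is enough to answer SPARQL-style triple patterns. The API below is purely illustrative and is not Trident's actual interface.

```python
# Hedged sketch: one low-level primitive (scan with optional bindings)
# underlying higher-level tasks such as SPARQL triple-pattern matching.
class TripleStore:
    def __init__(self, triples):
        self.triples = list(triples)

    def scan(self, s=None, p=None, o=None):
        # Yield triples matching every bound position; None means wildcard.
        for t in self.triples:
            if (s is None or t[0] == s) and \
               (p is None or t[1] == p) and \
               (o is None or t[2] == o):
                yield t

kg = TripleStore([("alice", "knows", "bob"), ("bob", "knows", "carol")])
print(list(kg.scan(p="knows", o="bob")))   # -> [('alice', 'knows', 'bob')]
```

A real engine would back such a primitive with the interlinked index structures the abstract describes, rather than a linear scan.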
Subgraph query matching in multi-graphs based on node embedding
This paper presents an efficient algorithm for matching subgraph queries in a multi-graph using feature-based indexing techniques. A KD-tree data structure indexes the node features, while a set-trie index structure represents the multi-edges for efficient querying. Among the eight features used, the main ones are the vertex core number, triangle number, and vertex degree; the densest vertex in the query graph is extracted based on these main features. The proposed model consists of two phases. In the first phase, for the densest extracted query vertex, the algorithm finds a similarly dense neighborhood structure in the data graph and then runs a k-nearest-neighbor query to obtain the densest subgraph. The second phase improves the model by mapping each vertex of every layer graph to a feature vector (vertex embedding). To keep the embedding size efficient for KD-tree indexing, principal component analysis (PCA) is used for dimensionality reduction. Furthermore, symmetry-breaking conditions remove redundancy in the patterns matched against the query graph. In both phases, a filtering process minimizes the number of candidate data nodes for the initial query vertex. Finally, we test how concatenating the structural features (orbit features) with meta-features (summaries of general, statistical, information-theoretic properties, etc.) as node signatures affects model performance. The proposed model is evaluated on three real multi-graph benchmarks and two randomly generated multi-graph datasets. The results agree with the theoretical study on both random cliques and Erdos random graphs, and the experiments show that the time efficiency and scalability of the proposed model are acceptable.
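The first-phase filtering idea can be sketched as follows: describe each vertex by a small feature vector (the paper uses core number, triangle number, and degree, among eight features; degree and triangle count suffice for illustration) and keep only data vertices whose features dominate those of the densest query vertex. The function names and the dominance test below are illustrative, not the paper's code.

```python
# Hedged sketch of feature-based candidate filtering for subgraph
# matching: a data vertex can only match a query vertex if every one
# of its features is at least as large.
from itertools import combinations

def features(adj):
    # adj: undirected adjacency dict, vertex -> set of neighbors
    f = {}
    for v, nbrs in adj.items():
        # count triangles through v: neighbor pairs that are adjacent
        tri = sum(1 for a, b in combinations(nbrs, 2)
                  if b in adj.get(a, set()))
        f[v] = (len(nbrs), tri)        # (degree, triangle count)
    return f

def candidates(query_adj, data_adj):
    qf, df = features(query_adj), features(data_adj)
    dense = max(qf, key=lambda v: qf[v])   # densest query vertex
    need = qf[dense]
    return [v for v, fv in df.items()
            if all(x >= y for x, y in zip(fv, need))]
```

A KD-tree over the same feature vectors, as in the paper, replaces this linear pass with a logarithmic-time range/nearest-neighbor lookup.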