LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs
The number of linked data sources and the size of the linked open data graph
keep growing every day. As a consequence, semantic RDF services are more and
more confronted with various "big data" problems. Query processing in the
presence of inferences is one of them. For instance, to complete the answer set of
SPARQL queries, RDF database systems evaluate semantic RDFS relationships
(subPropertyOf, subClassOf) through time-consuming query rewriting algorithms
or space-consuming data materialization solutions. To reduce the memory
footprint and ease the exchange of large datasets, these systems generally
apply a dictionary approach for compressing triple data sizes by replacing
resource identifiers (IRIs), blank nodes and literals with integer values. In
this article, we present a structured resource identification scheme using a
clever encoding of concept and property hierarchies for efficiently evaluating
the most common RDFS entailment rules while minimizing triple materialization
and query rewriting. We show how this encoding can be computed by a
scalable parallel algorithm and implemented directly over the Apache Spark
framework. The efficiency of our encoding scheme is demonstrated by an evaluation
conducted over both synthetic and real-world datasets.
Comment: 8 pages, 1 figure
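
The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea of hierarchy-aware integer identifiers, not the paper's algorithm. It assumes a single-inheritance class tree, and the function names (`encode_hierarchy`, `to_ids`, `is_subclass_of`) and the toy classes are hypothetical. Each class receives a bit string that extends its parent's bit string, so a subClassOf test becomes a prefix comparison on integers rather than a query rewrite or a set of materialized triples.

```python
from collections import defaultdict
from math import ceil, log2


def encode_hierarchy(sub_class_of, root):
    """Give each class a bit string that starts with its parent's bit string.

    sub_class_of: iterable of (child, parent) pairs forming a tree (assumption).
    """
    children = defaultdict(list)
    for child, parent in sub_class_of:
        children[parent].append(child)

    codes = {root: "1"}
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = sorted(children[parent])
        if not kids:
            continue
        # Local value 0 is reserved for the parent itself, so n children
        # need enough bits to represent the values 1..n.
        width = max(1, ceil(log2(len(kids) + 1)))
        for i, kid in enumerate(kids, start=1):
            codes[kid] = codes[parent] + format(i, f"0{width}b")
            stack.append(kid)
    return codes


def to_ids(codes):
    """Right-pad every code with zeros to a common length and store it as an
    integer, together with the number of significant (prefix) bits."""
    total = max(len(code) for code in codes.values())
    return {
        cls: (int(code.ljust(total, "0"), 2), len(code), total)
        for cls, code in codes.items()
    }


def is_subclass_of(ids, sub, sup):
    """True if `sub` equals `sup` or is one of its descendants."""
    sub_id, _, total = ids[sub]
    sup_id, sup_len, _ = ids[sup]
    shift = total - sup_len  # drop the bits below the superclass prefix
    return (sub_id >> shift) == (sup_id >> shift)


# Toy hierarchy (hypothetical data, for illustration only).
edges = [
    ("Person", "Thing"),
    ("Organization", "Thing"),
    ("Student", "Person"),
    ("Professor", "Person"),
]
ids = to_ids(encode_hierarchy(edges, "Thing"))
print(is_subclass_of(ids, "Student", "Person"))
print(is_subclass_of(ids, "Student", "Organization"))
```

Under these assumptions, every descendant of a class falls in one contiguous integer range, so a query over a superclass can be answered with a shift and a comparison per triple. Classes with several parents would need additional handling that this sketch leaves out.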
- …