Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications, some of which already manage more than a trillion triples. Confronted with such huge amounts of data and their continued growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout handles updates efficiently, and its query optimizer produces efficient execution plans for ad-hoc SPARQL queries. Our experiments show the superiority of our approach over state-of-the-art approaches to partitioning and distributed SPARQL query processing.
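As a rough illustration of what query-log-driven fragmentation and allocation can look like, here is a minimal Python sketch; it is not Partout's actual algorithm, and every name in it is hypothetical. Triples are grouped into per-predicate fragments, fragments are ranked by how often their predicate appears in the query log, and each fragment is greedily assigned to the currently least-loaded host.

```python
from collections import Counter, defaultdict

def fragment_by_query_log(triples, query_log, num_hosts):
    """Illustrative workload-driven fragmentation (not Partout's actual
    algorithm): group triples by predicate, rank predicates by how often
    they appear in the query log, and greedily assign each fragment to
    the currently least-loaded host."""
    # Count how often each predicate occurs in logged query patterns.
    freq = Counter(p for patterns in query_log for (_, p, _) in patterns)

    # One fragment per predicate (a deliberately coarse fragmentation).
    fragments = defaultdict(list)
    for s, p, o in triples:
        fragments[p].append((s, p, o))

    # Greedy allocation: hottest fragments first, least-loaded host wins.
    load = [0] * num_hosts
    placement = {}
    for pred in sorted(fragments, key=lambda p: -freq[p]):
        host = min(range(num_hosts), key=load.__getitem__)
        placement[pred] = host
        load[host] += len(fragments[pred])
    return fragments, placement

triples = [("s1", "knows", "s2"), ("s1", "type", "Person"),
           ("s2", "type", "Person"), ("s2", "knows", "s3")]
query_log = [[("?x", "knows", "?y")], [("?x", "type", "Person")]]
fragments, placement = fragment_by_query_log(triples, query_log, num_hosts=2)
print(placement)  # e.g. {'knows': 0, 'type': 1}
```

Partout's real fragmentation works at the level of triple patterns and also optimizes the overall configuration; the greedy loop above only mirrors the load-balancing intuition.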
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. To ensure fair experimental results, we use Apache Spark as a common parallel computing framework, rewriting each of the algorithms under comparison against the Spark API. Spark provides guarantees in terms of fault tolerance, high availability, and scalability, which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication, and, to some extent, query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.
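For a concrete sense of what such Spark implementations involve, below is a minimal PySpark sketch of the simplest distribution strategy, hash partitioning on the subject. It is illustrative only; the five approaches compared in the paper are more elaborate.

```python
from pyspark import SparkContext

sc = SparkContext("local[2]", "rdf-hash-partitioning")

triples = [("s1", "knows", "s2"), ("s1", "type", "Person"),
           ("s2", "knows", "s3")]

# Key each triple by its subject, then hash-partition on that key so all
# triples sharing a subject land in the same partition. Subject-hash is
# the simplest distribution strategy; workload-aware schemes replace
# this keying step with more elaborate fragment assignment.
by_subject = (sc.parallelize(triples)
                .map(lambda t: (t[0], t))
                .partitionBy(4))  # 4 partitions, default hash partitioner

# Subject-subject joins can now run without reshuffling the keyed data.
print(by_subject.glom().map(len).collect())  # triples per partition
sc.stop()
```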
Almost all triple systems with independent neighborhoods are semi-bipartite
The neighborhood of a pair of vertices $u, v$ in a triple system is the set of vertices $w$ such that $uvw$ is an edge. A triple system $\mathcal{H}$ is semi-bipartite if its vertex set contains a subset $X$ such that every edge of $\mathcal{H}$ intersects $X$ in exactly two points. It is easy to see that if $\mathcal{H}$ is semi-bipartite, then the neighborhood of every pair of vertices in $\mathcal{H}$ is an independent set. We show a partial converse of this statement by proving that almost all triple systems with vertex set $[n]$ and independent neighborhoods are semi-bipartite. Our result can be viewed as an extension of the Erdős-Kleitman-Rothschild theorem to triple systems. The proof uses the Frankl-Rödl hypergraph regularity lemma and stability theorems. Similar results have recently been proved for hypergraphs with various other local constraints.
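The "easy to see" implication (semi-bipartite implies independent neighborhoods) can be checked mechanically on small examples; the following brute-force Python verifier is an illustration, not part of the paper.

```python
from itertools import combinations

def is_semibipartite(edges, X):
    """Every edge must meet X in exactly two points."""
    return all(len(set(e) & X) == 2 for e in edges)

def neighborhoods_independent(vertices, edges):
    """For every pair (u, v), the neighborhood
    N(u, v) = {w : {u, v, w} is an edge} must contain no edge."""
    edge_set = {frozenset(e) for e in edges}
    for u, v in combinations(vertices, 2):
        nbhd = {w for w in vertices if frozenset({u, v, w}) in edge_set}
        if any(e <= nbhd for e in edge_set):
            return False
    return True

# A semi-bipartite example: X = {1, 2, 3}, every edge has 2 points in X.
V = {1, 2, 3, 4, 5}
E = [{1, 2, 4}, {1, 3, 5}, {2, 3, 4}]
assert is_semibipartite(E, X={1, 2, 3})
assert neighborhoods_independent(V, E)  # the easy implication holds
```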
RORS: Enhanced Rule-based OWL Reasoning on Spark
Rule-based OWL reasoning computes the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules. Its performance is often sensitive to the rule execution order. In
this paper, we present an approach to enhancing the performance of rule-based OWL reasoning on Spark based on a locally optimal execution strategy. Firstly, we divide all rules (27 in total) into four main classes, namely SPO rules (5 rules), type rules (7 rules), sameAs rules (7 rules), and schema rules (8 rules), since, as our investigation shows, the triples matched by the first three classes of rules dominate practical data sets (e.g., over 99% of the LUBM dataset). Secondly, based on the interdependence among the entailment rules in each class, we determine a locally optimal execution order for each class and then combine these into a new execution order over all rules. Finally, we implement the new rule execution order on Spark
in a prototype called RORS. The experimental results show that the running time
of RORS is improved by about 30% as compared to Kim & Park's algorithm (2015)
using the LUBM200 data set (27.6 million triples).
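The overall execution pattern, running each rule class to a local fixpoint in a chosen order, can be sketched in a few lines. The Python below uses two toy RDFS-style rules as stand-ins for rule classes; it is not RORS's actual 27-rule implementation.

```python
def apply_until_fixpoint(triples, rule):
    """Apply one entailment rule until no new triples are derived."""
    triples = set(triples)
    while True:
        new = rule(triples) - triples
        if not new:
            return triples
        triples |= new

# Two toy RDFS-style rules standing in for a rule class each.
def subclass_transitivity(ts):
    return {(a, "subClassOf", c)
            for (a, p, b) in ts if p == "subClassOf"
            for (b2, q, c) in ts if q == "subClassOf" and b2 == b}

def type_propagation(ts):
    return {(x, "type", c)
            for (x, p, a) in ts if p == "type"
            for (a2, q, c) in ts if q == "subClassOf" and a2 == a}

# A hypothetical execution order: schema rules before type rules.
triples = {("Student", "subClassOf", "Person"),
           ("Person", "subClassOf", "Agent"),
           ("alice", "type", "Student")}
for rule in (subclass_transitivity, type_propagation):
    triples = apply_until_fixpoint(triples, rule)

print(("alice", "type", "Agent") in triples)  # True
```

Because, in general, later rule classes can produce triples that feed earlier ones, the quality of such a sequential order depends on the interdependence analysis the paper describes.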
The Heisenberg product seen as a branching problem for connected reductive groups, stability properties
In this article we study, in the context of complex representations of
symmetric groups, some aspects of the Heisenberg product, introduced by Marcelo
Aguiar, Walter Ferrer Santos, and Walter Moreira in 2017. When applied to
irreducible representations, this product gives rise to the Aguiar
coefficients. We prove that these coefficients are in fact also branching
coefficients for representations of connected complex reductive groups. This
allows us to use geometric methods already developed in a previous article, notably based on notions from Geometric Invariant Theory, and to obtain stability results on the Aguiar coefficients, generalising some of the results concerning them given by Li Ying.
Solving weighted and counting variants of connectivity problems parameterized by treewidth deterministically in single exponential time
It is well known that many local graph problems, like Vertex Cover and
Dominating Set, can be solved in 2^{O(tw)}|V|^{O(1)} time for graphs G=(V,E)
with a given tree decomposition of width tw. However, for nonlocal problems,
like the fundamental class of connectivity problems, for a long time we did not
know how to do this faster than tw^{O(tw)}|V|^{O(1)}. Recently, Cygan et al.
(FOCS 2011) presented Monte Carlo algorithms for a wide range of connectivity
problems running in time c^{tw}|V|^{O(1)} for a small constant c, e.g., for Hamiltonian Cycle and Steiner Tree. Naturally, this raises the question whether
randomization is necessary to achieve this runtime; furthermore, it is
desirable to also solve counting and weighted versions (the latter without
incurring a pseudo-polynomial cost in terms of the weights).
We present two new approaches rooted in linear algebra, based on matrix rank
and determinants, which provide deterministic c^{tw}|V|^{O(1)} time algorithms,
also for weighted and counting versions. For example, in this time we can solve
the traveling salesman problem or count the number of Hamiltonian cycles. The
rank-based ideas provide a rather general approach for speeding up even
straightforward dynamic programming formulations by identifying "small" sets of
representative partial solutions; we focus on the case of expressing
connectivity via sets of partitions, but the essential ideas should have
further applications. The determinant-based approach uses the matrix tree
theorem for deriving closed formulas for counting versions of connectivity
problems; we show how to evaluate those formulas via dynamic programming.
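A minimal sketch of the rank-based pruning idea, under simplifying assumptions and not the paper's actual algorithm: encode the partial solutions of a dynamic-programming table as rows of a 0/1 compatibility matrix over GF(2) and keep only a row basis, so at most rank-many representatives survive each step.

```python
def reduce_to_row_basis(rows):
    """Keep a subset of rows forming a basis of the GF(2) row space.
    Rows are ints used as bit vectors; any discarded row is a GF(2)
    combination of kept ones, so in the rank-based framework it is
    'represented' and can be pruned from the DP table."""
    basis = {}   # pivot bit -> reduced row carrying that pivot
    kept = []    # original rows retained as representatives
    for original in rows:
        row = original
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                kept.append(original)
                break
            row ^= basis[pivot]  # eliminate the pivot bit
    return kept

# Third row is the XOR of the first two, so it is pruned.
print(reduce_to_row_basis([0b1100, 0b0110, 0b1010, 0b0001]))
# -> [12, 6, 1]
```

In the paper, a basis-style reduction of this kind is applied to matrices indexed by set partitions (connectivity states), which is what keeps the number of representative partial solutions per table entry bounded by c^{tw}.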