4,943 research outputs found

    Partout: A Distributed Engine for Efficient RDF Processing

    Full text link
    The increasing interest in Semantic Web technologies has led not only to a rapid growth of semantic data on the Web but also to an increasing number of backend applications with already more than a trillion triples in some cases. Confronted with such huge amounts of data and the future growth, existing state-of-the-art systems for storing RDF and processing SPARQL queries are no longer sufficient. In this paper, we introduce Partout, a distributed engine for efficient RDF processing in a cluster of machines. We propose an effective approach for fragmenting RDF data sets based on a query log, allocating the fragments to nodes in a cluster, and finding the optimal configuration. Partout can efficiently handle updates and its query optimizer produces efficient query execution plans for ad-hoc SPARQL queries. Our experiments show the superiority of our approach to state-of-the-art approaches for partitioning and distributed SPARQL query processing

    On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

    Full text link
    Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. For achieving fair experimental results, we are using Apache Spark as a common parallel computing framework by rewriting the concerned algorithms using the Spark API. Spark provides guarantees in terms of fault tolerance, high availability and scalability which are essential in such systems. Our different implementations aim to highlight the fundamental implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and to some extent to query answering cost and performance. The presented measures are obtained by testing each system on one synthetic and one real-world data set over query workloads with differing characteristics and different partitioning constraints.Comment: 16 pages, 3 figure

    Almost all triple systems with independent neighborhoods are semi-bipartite

    Get PDF
    The neighborhood of a pair of vertices u,vu,v in a triple system is the set of vertices ww such that uvwuvw is an edge. A triple system \HH is semi-bipartite if its vertex set contains a vertex subset XX such that every edge of \HH intersects XX in exactly two points. It is easy to see that if \HH is semi-bipartite, then the neighborhood of every pair of vertices in \HH is an independent set. We show a partial converse of this statement by proving that almost all triple systems with vertex sets [n][n] and independent neighborhoods are semi-bipartite. Our result can be viewed as an extension of the Erd\H os-Kleitman-Rothschild theorem to triple systems. The proof uses the Frankl-R\"odl hypergraph regularity lemma, and stability theorems. Similar results have recently been proved for hypergraphs with various other local constraints

    RORS: Enhanced Rule-based OWL Reasoning on Spark

    Full text link
    The rule-based OWL reasoning is to compute the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules. The performance of the rule-based OWL reasoning is often sensitive to the rule execution order. In this paper, we present an approach to enhancing the performance of the rule-based OWL reasoning on Spark based on a locally optimal executable strategy. Firstly, we divide all rules (27 in total) into four main classes, namely, SPO rules (5 rules), type rules (7 rules), sameAs rules (7 rules), and schema rules (8 rules) since, as we investigated, those triples corresponding to the first three classes of rules are overwhelming (e.g., over 99% in the LUBM dataset) in our practical world. Secondly, based on the interdependence among those entailment rules in each class, we pick out an optimal rule executable order of each class and then combine them into a new rule execution order of all rules. Finally, we implement the new rule execution order on Spark in a prototype called RORS. The experimental results show that the running time of RORS is improved by about 30% as compared to Kim & Park's algorithm (2015) using the LUBM200 (27.6 million triples).Comment: 12 page

    The Heisenberg product seen as a branching problem for connected reductive groups, stability properties

    Full text link
    In this article we study, in the context of complex representations of symmetric groups, some aspects of the Heisenberg product, introduced by Marcelo Aguiar, Walter Ferrer Santos, and Walter Moreira in 2017. When applied to irreducible representations, this product gives rise to the Aguiar coefficients. We prove that these coefficients are in fact also branching coefficients for representations of connected complex reductive groups. This allows to use geometric methods already developped in a previous article, notably based on notions from Geometric Invariant Theory, and to obtain some stability results on Aguiar coefficients, generalising some of the results concerning them given by Li Ying

    Solving weighted and counting variants of connectivity problems parameterized by treewidth deterministically in single exponential time

    Full text link
    It is well known that many local graph problems, like Vertex Cover and Dominating Set, can be solved in 2^{O(tw)}|V|^{O(1)} time for graphs G=(V,E) with a given tree decomposition of width tw. However, for nonlocal problems, like the fundamental class of connectivity problems, for a long time we did not know how to do this faster than tw^{O(tw)}|V|^{O(1)}. Recently, Cygan et al. (FOCS 2011) presented Monte Carlo algorithms for a wide range of connectivity problems running in time $c^{tw}|V|^{O(1)} for a small constant c, e.g., for Hamiltonian Cycle and Steiner tree. Naturally, this raises the question whether randomization is necessary to achieve this runtime; furthermore, it is desirable to also solve counting and weighted versions (the latter without incurring a pseudo-polynomial cost in terms of the weights). We present two new approaches rooted in linear algebra, based on matrix rank and determinants, which provide deterministic c^{tw}|V|^{O(1)} time algorithms, also for weighted and counting versions. For example, in this time we can solve the traveling salesman problem or count the number of Hamiltonian cycles. The rank-based ideas provide a rather general approach for speeding up even straightforward dynamic programming formulations by identifying "small" sets of representative partial solutions; we focus on the case of expressing connectivity via sets of partitions, but the essential ideas should have further applications. The determinant-based approach uses the matrix tree theorem for deriving closed formulas for counting versions of connectivity problems; we show how to evaluate those formulas via dynamic programming.Comment: 36 page
    • …
    corecore