Partout: A Distributed Engine for Efficient RDF Processing
The increasing interest in Semantic Web technologies has led not only to a
rapid growth of semantic data on the Web but also to an increasing number of
backend applications with already more than a trillion triples in some cases.
Confronted with such huge amounts of data and the future growth, existing
state-of-the-art systems for storing RDF and processing SPARQL queries are no
longer sufficient. In this paper, we introduce Partout, a distributed engine
for efficient RDF processing in a cluster of machines. We propose an effective
approach for fragmenting RDF data sets based on a query log, allocating the
fragments to nodes in a cluster, and finding the optimal configuration. Partout
can efficiently handle updates and its query optimizer produces efficient query
execution plans for ad-hoc SPARQL queries. Our experiments show the superiority
of our approach to state-of-the-art approaches for partitioning and distributed
SPARQL query processing.
Entity Summarisation with Limited Edge Budget on Undirected and Directed Knowledge Graphs
This paper addresses the novel problem of summarising entities under a limited presentation budget on entity-relationship knowledge graphs and proposes an efficient algorithm for solving it. The algorithm has been implemented in two variants, undirected and directed, together with a visualisation tool. An experimental user evaluation of the algorithm was conducted on large real-world semantic knowledge graphs extracted from the web. The reported results are promising and encourage further work on improving the algorithm.
Efficient creation and incremental maintenance of the HOPI index for complex XML document collections
The HOPI index, a connection index for XML documents based on the concept of a 2-hop cover, provides space- and time-efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in XML search engines. This paper presents enhanced algorithms for building HOPI, shows how to augment the index with distance information, and discusses incremental index maintenance. Our experiments show substantial improvements over the existing divide-and-conquer algorithm for index creation, low space overhead for including distance information in the index, and efficient updates.
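To make the 2-hop cover idea concrete, the following is a minimal sketch of how reachability and distance tests work once every node carries an "out" label and an "in" label: `u` reaches `v` exactly when the two labels share a center node. The label construction here (`build_naive_labels`, which uses every node as a center) is a deliberately naive stand-in chosen for brevity; it is not the HOPI construction algorithm, and all function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable via adj."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def build_naive_labels(edges):
    """Build a valid but non-compact 2-hop labeling (assumption: every
    node acts as a center; a real index would pick far fewer centers)."""
    nodes = {n for edge in edges for n in edge}
    fwd, bwd = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)
        bwd.setdefault(v, []).append(u)
    l_out = {n: {} for n in nodes}  # l_out[u][w] = distance u -> w
    l_in = {n: {} for n in nodes}   # l_in[v][w]  = distance w -> v
    for center in nodes:
        for v, d in bfs_distances(fwd, center).items():
            l_in[v][center] = d
        for u, d in bfs_distances(bwd, center).items():
            l_out[u][center] = d
    return l_out, l_in


def distance(l_out, l_in, u, v):
    """Shortest hop distance u -> v via a common center, or None."""
    common = l_out[u].keys() & l_in[v].keys()
    if not common:
        return None
    return min(l_out[u][w] + l_in[v][w] for w in common)


def reachable(l_out, l_in, u, v):
    """2-hop reachability test: do the two labels intersect?"""
    return distance(l_out, l_in, u, v) is not None


edges = [("a", "b"), ("b", "c"), ("a", "d"), ("e", "c")]
l_out, l_in = build_naive_labels(edges)
print(reachable(l_out, l_in, "a", "c"), distance(l_out, l_in, "a", "c"))
```

The query side only intersects two small dictionaries, which is what makes the test cheap; the cost of the approach lies in choosing centers so that the labels stay small.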
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
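As background for the TPUT framework mentioned above, here is a minimal in-memory sketch of a three-phase threshold scheme in that style, assuming sum aggregation over non-negative local scores. It illustrates only the baseline idea, not the optimizations proposed in the paper; the function names and the representation of nodes as plain dictionaries are assumptions made for illustration.

```python
import heapq


def partial_sums(seen):
    """Sum the scores the coordinator has received so far, per item."""
    sums = {}
    for reported in seen:
        for item, score in reported.items():
            sums[item] = sums.get(item, 0) + score
    return sums


def kth_largest(values, k):
    """k-th largest value, or 0 if fewer than k values are known."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0


def tput_top_k(nodes, k):
    """nodes: list of dicts mapping item -> non-negative local score."""
    m = len(nodes)
    seen = [dict() for _ in nodes]  # coordinator's view of each node

    # Phase 1: every node reports its local top-k items.
    for i, local in enumerate(nodes):
        for item, score in heapq.nlargest(k, local.items(), key=lambda kv: kv[1]):
            seen[i][item] = score
    threshold = kth_largest(partial_sums(seen).values(), k) / m

    # Phase 2: every node reports all items scoring at least threshold.
    for i, local in enumerate(nodes):
        for item, score in local.items():
            if score >= threshold:
                seen[i][item] = score
    sums = partial_sums(seen)
    tau2 = kth_largest(sums.values(), k)

    # Prune items whose upper bound cannot reach the k-th partial sum;
    # an unreported local score is known to be below threshold.
    candidates = []
    for item, partial in sums.items():
        missing = sum(1 for reported in seen if item not in reported)
        if partial + missing * threshold >= tau2:
            candidates.append(item)

    # Phase 3: fetch exact scores for the surviving candidates only.
    exact = {c: sum(local.get(c, 0) for local in nodes) for c in candidates}
    return heapq.nlargest(k, exact.items(), key=lambda kv: (kv[1], kv[0]))


nodes = [
    {"a": 9, "b": 7, "c": 1, "d": 4},
    {"a": 2, "b": 8, "c": 6, "d": 4},
    {"a": 5, "b": 1, "c": 6, "d": 4},
]
print(tput_top_k(nodes, 2))
```

In this toy input, item `d` is never transmitted because its score at every node stays below the phase-2 threshold, which is the communication saving the threshold is meant to provide.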
Scientific Paper Recommendation Systems: A Literature Review of Recent Publications
Scientific writing builds upon already published papers. Manual identification of publications to read, cite or consider as related papers relies on a researcher's ability to identify fitting keywords or initial papers from which a literature search can be started. The rapidly increasing number of papers has called for automatic measures to find the desired relevant publications, so-called paper recommendation systems. As the number of publications increases, so does the number of paper recommendation systems. Former literature reviews focused on discussing the general landscape of approaches throughout the years and highlighted the main directions. We refrain from this perspective; instead, we consider only a comparatively small time frame but analyse it fully. In this literature review we discuss the methods used, datasets, evaluations and open challenges encountered in all works first released between January 2019 and October 2021. The goal of this survey is to provide a comprehensive and complete overview of current paper recommendation systems.
Transaktionen in föderierten Datenbanksystemen unter eingeschränkten Isolation Levels (Transactions in Federated Database Systems under Restricted Isolation Levels)
Atomicity and isolation of transactions are key requirements of advanced applications in federated systems consisting of distributed and heterogeneous components. While all existing federated systems support atomicity using the two-phase commit protocol, they lack support for federated concurrency control. Many possible solutions have been proposed in the literature, but they have failed to have an impact on real systems because they completely ignore the widely used concept of isolation levels, which offer optimization options to applications at the cost of less rigorous control over data consistency. This thesis compares existing definitions for isolation levels and develops a new formal characterization for Snapshot Isolation, an isolation level provided by Oracle, the market leader in the database field. We present algorithms for federated concurrency control that provably guarantee the correct execution of federated transactions even under local Snapshot Isolation, and discuss isolation levels for federated transactions. The algorithms are integrated into a federated system prototype. Performance measurements with this prototype show the practical viability of the developed methods.
