8 research outputs found

    Estimating the cardinality of conjunctive queries over RDF data using graph summarisation

    Estimating the cardinality (i.e., the number of answers) of conjunctive queries is particularly difficult in RDF systems: queries over RDF data are navigational and thus tend to involve many joins. We present a new, principled cardinality estimation technique based on graph summarisation. We interpret a summary of an RDF graph using a possible world semantics and formalise the estimation problem as computing the expected cardinality over all RDF graphs represented by the summary, and we present a closed-form formula for computing the expectation of arbitrary queries. We also discuss approaches to RDF graph summarisation. Finally, we show empirically that our cardinality technique is more accurate and more consistent, often by orders of magnitude, than the state of the art.

    Join Cardinality Estimation Graphs: Analyzing Pessimistic and Optimistic Estimators Through a Common Lens

    Join cardinality estimation is a fundamental problem that the query optimizers of database management systems solve when generating efficient query plans. This problem arises both in systems that manage relational data and in those that manage graph-structured data, where systems need to estimate the cardinalities of subgraphs in their input graphs. We focus on graph-structured data in this thesis. A popular class of join cardinality estimators uses statistics about the sizes of small queries to make estimates for larger queries. Statistics-based estimators can be broadly divided into two groups: (i) optimistic estimators, which use statistics in formulas that make degree regularity and conditional independence assumptions; and (ii) the recent pessimistic estimators, which estimate the sizes of queries using a set of upper bounds derived from linear programs, such as the AGM bound, or tighter bounds based on information-theoretic bounds, such as the MOLP bound. In this thesis, we introduce a new framework, which we call the cardinality estimation graph (CEG), that can represent the estimates of both optimistic and pessimistic estimators. We observe that there is generally more than one way to generate optimistic estimates for a query, and the choice has been either ad hoc or unspecified in previous work. We empirically show that choosing the largest candidate yields much higher accuracy than pessimistic estimators across different datasets and query workloads, and that it is an effective heuristic to combat underestimation, which optimistic estimators are known to suffer from. To further improve accuracy, we demonstrate how hash partitioning, an optimization technique designed to improve pessimistic estimators' accuracy, can be applied to optimistic estimators, and we evaluate its effectiveness. CEGs can also be used to obtain insights into pessimistic estimators. We show that the MOLP estimator is at least as tight as the pessimistic estimator and that the two are identical on acyclic queries over binary relations, and the MOLP CEG offers an intuitive combinatorial proof that the MOLP bound is tighter than the DBPLP bound.
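
The contrast between optimistic and pessimistic estimators described in this abstract can be illustrated with a small sketch. It is not the thesis's CEG framework. It assumes a toy graph with three hypothetical edge labels `A`, `B`, `C`, a 3-edge path query, a Markov-table-style optimistic formula (statistics for paths of length 2 combined under a conditional independence assumption), and the AGM bound for that path query, which reduces to |A| * |C| because the two end edges alone cover every query variable. All names below are illustrative.

```python
from collections import Counter

# Toy edge lists (source, target), one per hypothetical edge label.
A = [(1, 2), (1, 3), (4, 2)]
B = [(2, 5), (2, 6), (3, 5)]
C = [(5, 7), (6, 7), (6, 8)]


def count_2path(first, second):
    """Exact number of 2-edge paths: an edge of `first` followed by one of `second`."""
    out_deg = Counter(s for s, _ in second)
    return sum(out_deg[t] for _, t in first)


def count_3path(a, b, c):
    """Exact number of 3-edge paths a -> b -> c, counted around each middle edge."""
    in_deg = Counter(t for _, t in a)
    out_deg = Counter(s for s, _ in c)
    return sum(in_deg[s] * out_deg[t] for s, t in b)


def optimistic_3path(a, b, c):
    """Optimistic estimate: |A->B| * |B->C| / |B|.

    Assumes that extending a B edge with C is independent of how the
    B edge was reached (conditional independence).
    """
    return count_2path(a, b) * count_2path(b, c) / len(b)


def agm_bound_3path(a, b, c):
    """Pessimistic estimate: AGM bound for the 3-edge path query.

    The edge cover puts weight 1 on the first and last edge and 0 on the
    middle one, which gives the bound |A| * |C|.
    """
    return len(a) * len(c)


exact = count_3path(A, B, C)
optimistic = optimistic_3path(A, B, C)
pessimistic = agm_bound_3path(A, B, C)
print(exact, optimistic, pessimistic)
```

On this toy input the optimistic estimate falls slightly below the exact count, while the AGM bound stays above it, which matches the underestimation and upper-bound behaviour the abstract attributes to the two estimator families.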

    Time and Memory Efficient Parallel Algorithm for Structural Graph Summaries and two Extensions to Incremental Summarization and $k$-Bisimulation for Long $k$-Chaining

    We developed a flexible parallel algorithm for graph summarization based on vertex-centric programming and parameterized message passing. The base algorithm supports infinitely many structural graph summary models defined in a formal language. An extension of the parallel base algorithm allows incremental graph summarization. In this paper, we prove that the incremental algorithm is correct and show that updates are performed in time $\mathcal{O}(\Delta \cdot d^k)$, where $\Delta$ is the number of additions, deletions, and modifications to the input graph, $d$ is the maximum degree, and $k$ is the maximum distance in the subgraphs considered. Although the iterative algorithm supports values of $k>1$, it requires nested data structures for the message passing that are memory-inefficient. Thus, we extended the base summarization algorithm by a hash-based messaging mechanism to support a scalable iterative computation of graph summarizations based on $k$-bisimulation for arbitrary $k$. We empirically evaluate the performance of our algorithms using benchmark and real-world datasets. The incremental algorithm almost always outperforms the batch computation. We observe in our experiments that the incremental algorithm is faster even in cases when $50\%$ of the graph database changes from one version to the next. The incremental computation requires a three-layered hash index, which has a low memory overhead of only $8\%$ ($\pm 1\%$). Finally, the incremental summarization algorithm outperforms the batch algorithm even with fewer cores. The iterative parallel $k$-bisimulation algorithm computes summaries on graphs with over $10$M edges within seconds. We show that the algorithm processes graphs of $100+\,$M edges within a few minutes while having a moderate memory consumption of $<150$ GB. For the largest BSBM1B dataset with 1 billion edges, it computes $k=10$ bisimulation in under an hour.
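
To make the notion of $k$-bisimulation concrete, here is a minimal sequential sketch. It is not the paper's parallel, vertex-centric algorithm. It assumes forward bisimulation over labelled edges and refines a partition for $k$ rounds, where each vertex's signature combines its previous block with the set of (edge label, previous block of the target) pairs. The function name `k_bisimulation` and the toy edges are hypothetical.

```python
def k_bisimulation(edges, k):
    """Return a dict vertex -> block id after k refinement rounds.

    edges: iterable of (source, label, target) triples.
    Two vertices share a block id iff they are k-bisimilar under the
    forward, edge-labelled definition assumed in this sketch.
    """
    vertices = set()
    outgoing = {}
    for source, label, target in edges:
        vertices.update((source, target))
        outgoing.setdefault(source, []).append((label, target))

    # Round 0: every vertex is in the same block.
    block = {v: 0 for v in vertices}
    for _ in range(k):
        ids = {}        # signature -> new block id
        refined = {}
        for v in vertices:
            signature = (
                block[v],
                frozenset((label, block[t]) for label, t in outgoing.get(v, ())),
            )
            if signature not in ids:
                ids[signature] = len(ids)
            refined[v] = ids[signature]
        block = refined
    return block


# Hypothetical toy graph: "a" and "c" look alike one hop out, but differ two hops out.
toy_edges = [("a", "knows", "b"), ("c", "knows", "d"), ("b", "likes", "e")]
blocks_k1 = k_bisimulation(toy_edges, 1)
blocks_k2 = k_bisimulation(toy_edges, 2)
print(blocks_k1["a"] == blocks_k1["c"], blocks_k2["a"] == blocks_k2["c"])
```

With $k=1$ the vertices `a` and `c` fall into the same block because both have a single outgoing `knows` edge. With $k=2$ they separate, since `b` has an outgoing `likes` edge and `d` has none.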

    Semantic-based integration of sensor networks

    The progress of very-large-scale-integration CMOS technology has resulted in the intensive development of sensor technologies, wireless communications, and energy-efficient processors, which together make up sensor nodes capable of perceiving our environment, processing data, and exchanging data with other devices and users on the Internet. Such technological development has led to the emergence of the Internet of Things (IoT) vision, with the goal of providing Internet users with information from the real world that surrounds us. As one of the prerequisites for the realization of the envisioned IoT services and products, the horizontal integration of deployed sensor networks is required, by implementing platforms that are able to serve a large number of users and that allow the integration of heterogeneous sensor networks consisting of different sensor devices, which use different communication protocols and message formats. The subject of this dissertation is architectures for the integration of sensor networks, which generally support IoT applications with high performance requirements. Among the basic generic types, the architecture based on data semantics is singled out, and the aim of this dissertation is the analysis of architectures that enable semantic integration of sensor networks using Semantic Web technologies in order to achieve the interoperability of sensor data and networks. The basic generic architecture types are first identified, and their key characteristics and methods of implementation are given. For architectures based on the gateway and the message broker, a performance evaluation was carried out for real-time delivery of sensor messages by creating a simulation environment. The architectures with semantic-based integration of sensor networks are then classified by identifying typical design approaches. Two groups of approaches are identified, approaches oriented to sensor networks and application-oriented approaches, and each group contains four architectural types. For each architectural type, an analysis of the advantages and disadvantages of the approach used is given, as well as a brief description of concrete representatives.