
8 research outputs found

Estimating the cardinality of conjunctive queries over RDF data using graph summarisation

Author: Kostylev EV
Motik B
Stefanoni G
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Estimating the cardinality (i.e., the number of answers) of conjunctive queries is particularly difficult in RDF systems: queries over RDF data are navigational and thus tend to involve many joins. We present a new, principled cardinality estimation technique based on graph summarisation. We interpret a summary of an RDF graph using a possible world semantics and formalise the estimation problem as computing the expected cardinality over all RDF graphs represented by the summary, and we present a closed-form formula for computing the expectation of arbitrary queries. We also discuss approaches to RDF graph summarisation. Finally, we show empirically that our cardinality technique is more accurate and more consistent, often by orders of magnitude, than the state of the art.

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Estimating the cardinality of conjunctive queries over RDF data using graph summarisation

Author: Kostylev EV
Motik B
Stefanoni G
Publication venue: International World Wide Web Conference Committee
Publication date: 01/01/2018

Oxford University Research Archive

Join Cardinality Estimation Graphs: Analyzing Pessimistic and Optimistic Estimators Through a Common Lens

Author: Chen Jeremy Yujui
Publication venue: 'University of Waterloo'
Publication date: 25/07/2020

Join cardinality estimation is a fundamental problem that is solved in the query optimizers of database management systems when generating efficient query plans. This problem arises both in systems that manage relational data as well as those that manage graph-structured data, where systems need to estimate the cardinalities of subgraphs in their input graphs. We focus on graph-structured data in this thesis. A popular class of join cardinality estimators uses statistics about the sizes of small queries to make estimates for larger queries. Statistics-based estimators can be broadly divided into two groups: (i) optimistic estimators that use statistics in formulas that make degree regularity and conditional independence assumptions; and (ii) the recent pessimistic estimators that estimate the sizes of queries using a set of upper bounds derived from linear programs, such as the AGM bound, or tighter bounds, such as the MOLP bound, that are based on information-theoretic bounds. In this thesis, we introduce a new framework that we call the cardinality estimation graph (CEG), which can represent the estimates of both optimistic and pessimistic estimators. We observe that there is generally more than one way to generate optimistic estimates for a query, and the choice has been either ad hoc or unspecified in previous work. We empirically show that choosing the largest candidate yields much higher accuracy than pessimistic estimators across different datasets and query workloads, and that it is an effective heuristic to combat underestimation, which optimistic estimators are known to suffer from. To further improve the accuracy, we demonstrate how hash partitioning, an optimization technique designed to improve pessimistic estimators' accuracy, can be applied to optimistic estimators, and we evaluate its effectiveness. CEGs can also be used to gain insights into pessimistic estimators. We show that the MOLP estimator is at least as tight as the pessimistic estimator and that the two are identical on acyclic queries over binary relations, and that the MOLP CEG offers an intuitive combinatorial proof that the MOLP bound is tighter than the DBPLP bound.
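
To make the idea of an optimistic estimator concrete, the following is a minimal sketch, not the thesis's CEG framework. It assumes a Markov-style formula in which a 3-edge path is estimated from the exact sizes of its two overlapping 2-edge sub-paths and the shared middle edge, i.e. est(a,b,c) = |a,b| * |b,c| / |b|. The function names `count_paths` and `optimistic_estimate` and the toy edge labels are hypothetical.

```python
from collections import defaultdict


def count_paths(edges, labels):
    """Exact number of directed walks whose edge labels match `labels` in order."""
    by_label = defaultdict(list)
    for s, p, o in edges:
        by_label[p].append((s, o))
    # ends[v] = number of matching partial walks that end at vertex v
    ends = defaultdict(int)
    for _, o in by_label[labels[0]]:
        ends[o] += 1
    for label in labels[1:]:
        nxt = defaultdict(int)
        for s, o in by_label[label]:
            nxt[o] += ends[s]
        ends = nxt
    return sum(ends.values())


def optimistic_estimate(edges, labels):
    """Estimate a 3-edge path size assuming conditional independence
    of the first and last edge given the middle edge (an assumption)."""
    a, b, c = labels
    middle = count_paths(edges, [b])
    if middle == 0:
        return 0.0
    return count_paths(edges, [a, b]) * count_paths(edges, [b, c]) / middle


# Toy graph where the independence assumption holds exactly.
regular = [("x1", "a", "y1"), ("x2", "a", "y1"),
           ("y1", "b", "z1"),
           ("z1", "c", "w1"), ("z1", "c", "w2")]
# One extra, unconnected b-edge skews the statistics and causes underestimation.
skewed = regular + [("y2", "b", "z2")]

print(optimistic_estimate(regular, ["a", "b", "c"]), count_paths(regular, ["a", "b", "c"]))
print(optimistic_estimate(skewed, ["a", "b", "c"]), count_paths(skewed, ["a", "b", "c"]))
```

On the second toy graph the estimate falls below the true count, which is the kind of underestimation the abstract attributes to optimistic estimators.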

University of Waterloo's Institutional Repository

Time and Memory Efficient Parallel Algorithm for Structural Graph Summaries and two Extensions to Incremental Summarization and $k$-Bisimulation for Long $k$-Chaining

Author: Blume Till
Rau Jannik
Richerby David
Scherp Ansgar
Publication date: 04/11/2022

We developed a flexible parallel algorithm for graph summarization based on vertex-centric programming and parameterized message passing. The base algorithm supports infinitely many structural graph summary models defined in a formal language. An extension of the parallel base algorithm allows incremental graph summarization. In this paper, we prove that the incremental algorithm is correct and show that updates are performed in time $\mathcal{O}(\Delta \cdot d^k)$, where $\Delta$ is the number of additions, deletions, and modifications to the input graph, $d$ the maximum degree, and $k$ is the maximum distance in the subgraphs considered. Although the iterative algorithm supports values of $k>1$, it requires nested data structures for the message passing that are memory-inefficient. Thus, we extended the base summarization algorithm by a hash-based messaging mechanism to support a scalable iterative computation of graph summarizations based on $k$-bisimulation for arbitrary $k$. We empirically evaluate the performance of our algorithms using benchmark and real-world datasets. The incremental algorithm almost always outperforms the batch computation. We observe in our experiments that the incremental algorithm is faster even in cases when $50\%$ of the graph database changes from one version to the next. The incremental computation requires a three-layered hash index, which has a low memory overhead of only $8\%$ ($\pm 1\%$). Finally, the incremental summarization algorithm outperforms the batch algorithm even with fewer cores. The iterative parallel $k$-bisimulation algorithm computes summaries on graphs with over $10$M edges within seconds. We show that the algorithm processes graphs of $100+$M edges within a few minutes while having a moderate memory consumption of $<150$ GB. For the largest BSBM1B dataset with 1 billion edges, it computes $k=10$ bisimulation in under an hour.
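
As a rough illustration of what a $k$-bisimulation summary computes, here is a small sequential sketch. It is not the authors' parallel, hash-based algorithm; it assumes forward bisimulation over labelled edges given as `(source, label, target)` triples, and the function name `k_bisimulation` is hypothetical. In each round, a vertex's signature combines its previous block with the set of (label, target block) pairs on its outgoing edges.

```python
from collections import defaultdict


def k_bisimulation(edges, k):
    """Map each vertex to a block id after k signature-refinement rounds.

    Two vertices receive the same id iff they are k-bisimilar with respect
    to their outgoing labelled edges (forward bisimulation, an assumption).
    """
    out = defaultdict(list)
    vertices = set()
    for s, p, o in edges:
        out[s].append((p, o))
        vertices.update((s, o))

    # Round 0: every vertex is in the same block.
    block = {v: 0 for v in vertices}
    for _ in range(k):
        # Signature: own previous block plus the (label, target block) pairs.
        signatures = {
            v: (block[v], frozenset((p, block[o]) for p, o in out[v]))
            for v in vertices
        }
        ids = {}
        # Assign a fresh integer id to each distinct signature.
        block = {v: ids.setdefault(sig, len(ids)) for v, sig in signatures.items()}
    return block


edges = [("a", "p", "b"), ("c", "p", "d"), ("b", "q", "e")]
b1 = k_bisimulation(edges, 1)
b2 = k_bisimulation(edges, 2)
print(b1["a"] == b1["c"])  # a and c both have a single outgoing p-edge
print(b2["a"] == b2["c"])  # at depth 2 their targets b and d differ
```

Block ids are arbitrary integers, so only equality between ids is meaningful, not their values.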

arXiv.org e-Print Archive

Semantic-based integration of sensor networks

Author: Babović Zoran B.
Publication venue: Универзитет у Београду, Електротехнички факултет
Publication date: 28/09/2018

Progress in very-large-scale-integration CMOS technology has resulted in the intensive development of sensor technologies, wireless communications, and energy-efficient processors, which together make up sensor nodes capable of perceiving our environment, processing data, and exchanging data with other devices and users on the Internet. Such technological development has led to the emergence of the Internet of Things (IoT) vision, with the goal of providing Internet users with information from the real world that surrounds us. As one of the prerequisites for realizing the envisioned IoT services and products, the horizontal integration of deployed sensor networks is required, by implementing platforms that are able to serve a large number of users and that allow the integration of heterogeneous sensor networks consisting of different sensor devices using different communication protocols and message formats. The subject of this dissertation is architectures for the integration of sensor networks, which generally support IoT applications with high performance requirements. Among the basic generic types, an architecture based on data semantics is selected, and the aim of this dissertation is the analysis of architectures that enable semantic integration of sensor networks using Semantic Web technologies in order to achieve the interoperability of sensor data and networks. The basic generic architecture types are first identified, and their key characteristics and methods of implementation are given. For architectures based on the gateway and the message broker, a performance evaluation was carried out for real-time delivery of sensor messages by creating a simulation environment. Then the architectures with semantic-based integration of sensor networks are classified by identifying typical design approaches. Two groups of approaches are identified, approaches oriented to sensor networks and application-oriented approaches, and each group contains four architectural types. For each architectural type, an analysis of the advantages and disadvantages of the approach is given, as well as a brief description of concrete representatives.

National Repository of Dissertations in Serbia (NaRDuS)

Nardus