8 research outputs found
Estimating the cardinality of conjunctive queries over RDF data using graph summarisation
Estimating the cardinality (i.e., the number of answers) of conjunctive queries is particularly difficult in RDF systems: queries over RDF data are navigational and thus tend to involve many joins. We present a new, principled cardinality estimation technique based on graph summarisation. We interpret a summary of an RDF graph using a possible world semantics and formalise the estimation problem as computing the expected cardinality over all RDF graphs represented by the summary, and we present a closed-form formula for computing the expectation of arbitrary queries. We also discuss approaches to RDF graph summarisation. Finally, we show empirically that our cardinality technique is more accurate and more consistent, often by orders of magnitude, than the state of the art.
Join Cardinality Estimation Graphs: Analyzing Pessimistic and Optimistic Estimators Through a Common Lens
Join cardinality estimation is a fundamental problem that the query optimizers of database management systems must solve when generating efficient query plans. This problem arises both in systems that manage relational data and in those that manage graph-structured data, where systems need to estimate the cardinalities of subgraphs in their input graphs. We focus on graph-structured data in this thesis.
A popular class of join cardinality estimators uses statistics about the sizes of small queries to make estimates for larger queries. Statistics-based estimators can be broadly divided into two groups: (i) optimistic estimators, which use statistics in formulas that make degree regularity and conditional independence assumptions; and (ii) the recent pessimistic estimators, which estimate the sizes of queries using a set of upper bounds derived from linear programs, such as the AGM bound, or tighter bounds, such as the MOLP bound, which are based on information-theoretic bounds.
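The contrast between the two groups can be illustrated on the smallest join, a two-edge path query over a single edge relation. The following is a minimal sketch, not the estimators studied in the thesis: the function names and the toy edge list are hypothetical, the optimistic formula is the textbook uniformity estimate, and the pessimistic value is a simple max-degree upper bound standing in for the linear-program bounds mentioned above.

```python
from collections import Counter


def true_two_path_count(edges):
    """Exact number of answers (a, b, c) with (a, b) and (b, c) both in edges."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return sum(in_degree[v] * out_degree[v] for v in in_degree)


def optimistic_estimate(edges):
    """Uniformity-style estimate: |E| * |E| / max(#distinct join values).

    Assumes join values are spread evenly, so it can under- or overestimate.
    """
    distinct_sources = len({src for src, _ in edges})
    distinct_targets = len({dst for _, dst in edges})
    return len(edges) * len(edges) / max(distinct_sources, distinct_targets)


def pessimistic_bound(edges):
    """Guaranteed upper bound: every edge extends by at most the max degree."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return len(edges) * min(max(out_degree.values()), max(in_degree.values()))


# Hypothetical toy graph used only for illustration.
toy_edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 1)]
exact = true_two_path_count(toy_edges)
optimistic = optimistic_estimate(toy_edges)
pessimistic = pessimistic_bound(toy_edges)
```

On this toy input the optimistic value lands close to the exact count, while the pessimistic value is looser but can never fall below it.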
In this thesis, we introduce a new framework that we call cardinality estimation graph (CEG) that can represent the estimates of both optimistic and pessimistic estimators. We observe that there is generally more than one way to generate optimistic estimates for a query, and the choice has either been ad-hoc or unspecified in previous work. We empirically show that choosing the largest candidate yields much higher accuracy than pessimistic estimators across different datasets and query workloads, and it is an effective heuristic to combat underestimations, which optimistic estimators are known to suffer from.
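The idea of several candidate estimates for one query can be sketched with a tiny graph whose paths each yield one estimate. This is an illustrative assumption about how such a structure might be encoded, not the thesis's definition of a CEG: the dictionary layout, the node names, and all weights below are invented for the example.

```python
def candidate_estimates(ceg, source, target):
    """Return the product of edge weights along every source-to-target path.

    `ceg` maps a node (a sub-query) to a list of (successor, weight) pairs.
    The graph is assumed to be acyclic.
    """
    if source == target:
        return [1.0]
    estimates = []
    for successor, weight in ceg.get(source, []):
        for rest in candidate_estimates(ceg, successor, target):
            estimates.append(weight * rest)
    return estimates


# Hypothetical CEG for a two-relation join; all numbers are made up.
toy_ceg = {
    "{}": [("{R}", 100.0), ("{S}", 80.0)],
    "{R}": [("{R,S}", 2.5)],
    "{S}": [("{R,S}", 3.0)],
    "{R,S}": [],
}

estimates = candidate_estimates(toy_ceg, "{}", "{R,S}")
largest_candidate = max(estimates)
```

Taking `max(estimates)` corresponds to the "largest candidate" heuristic described above; a different aggregation (minimum, average) could be swapped in at that one line.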
To further improve accuracy, we demonstrate how hash partitioning, an optimization technique designed to improve the accuracy of pessimistic estimators, can be applied to optimistic estimators, and we evaluate its effectiveness.
CEGs can also be used to gain insight into pessimistic estimators. We show that the MOLP estimator is at least as tight as the pessimistic estimator and that the two are identical on acyclic queries over binary relations. The MOLP CEG also offers an intuitive combinatorial proof that the MOLP bound is tighter than the DBPLP bound.
Time and Memory Efficient Parallel Algorithm for Structural Graph Summaries and two Extensions to Incremental Summarization and k-Bisimulation for Long k-Chaining
We developed a flexible parallel algorithm for graph summarization based on
vertex-centric programming and parameterized message passing. The base
algorithm supports infinitely many structural graph summary models defined in a
formal language. An extension of the parallel base algorithm allows incremental
graph summarization. In this paper, we prove that the incremental algorithm is
correct and show that the time needed for updates is bounded in terms of the
number of additions, deletions, and modifications to the input graph, the
maximum degree, and the maximum distance k in
the subgraphs considered. Although the iterative algorithm supports values of
, it requires nested data structures for the message passing that are
memory-inefficient. Thus, we extended the base summarization algorithm by a
hash-based messaging mechanism to support a scalable iterative computation of
graph summarizations based on k-bisimulation for arbitrary k. We
empirically evaluate the performance of our algorithms using benchmark and
real-world datasets. The incremental algorithm almost always outperforms the
batch computation. We observe in our experiments that the incremental algorithm
is faster even in cases when of the graph database changes from one
version to the next. The incremental computation requires a three-layered hash
index, which has a low memory overhead of only (). Finally, the
incremental summarization algorithm outperforms the batch algorithm even with
fewer cores. The iterative parallel k-bisimulation algorithm computes
summaries on graphs with over M edges within seconds. We show that the
algorithm processes graphs of M edges within a few minutes while having
a moderate memory consumption of GB. For the largest BSBM1B dataset with
1 billion edges, it computes bisimulation in under an hour.
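The iterative, hash-based idea behind k-bisimulation can be sketched sequentially. The following is a minimal single-threaded illustration, not the parallel vertex-centric algorithm of the paper: the function name, the edge-triple input format, and the toy graph are assumptions made for the example, and only outgoing edges are considered.

```python
def k_bisimulation_blocks(edges, k):
    """Assign each vertex a block id so that equal ids mean k-bisimilar.

    `edges` is a list of (source, label, target) triples. Round i refines the
    blocks of round i-1 using a hashable signature: the vertex's own block
    plus the set of (label, block of target) pairs over its outgoing edges.
    """
    vertices = sorted({v for s, _, t in edges for v in (s, t)})
    outgoing = {v: [] for v in vertices}
    for source, label, target in edges:
        outgoing[source].append((label, target))

    # 0-bisimulation: all vertices are equivalent.
    block = {v: 0 for v in vertices}
    for _ in range(k):
        signature_ids = {}
        refined = {}
        for v in vertices:
            signature = (
                block[v],
                frozenset((label, block[t]) for label, t in outgoing[v]),
            )
            # The dictionary lookup hashes the signature; new ones get new ids.
            if signature not in signature_ids:
                signature_ids[signature] = len(signature_ids)
            refined[v] = signature_ids[signature]
        block = refined
    return block


# Hypothetical toy graph: a and c look alike for k=1 but differ for k=2.
toy_graph = [("a", "p", "b"), ("c", "p", "d"), ("d", "q", "e")]
blocks_k1 = k_bisimulation_blocks(toy_graph, 1)
blocks_k2 = k_bisimulation_blocks(toy_graph, 2)
```

Each round only needs the previous round's block ids, which is what makes a flat, hash-keyed message per vertex sufficient in place of nested structures.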
Semantic-based integration of sensor networks
The progress of CMOS technology with very-large-scale integration has
resulted in the intensive development of sensor technologies, wireless
communications, and energy efficient processors, which together make sensor
nodes capable of perceiving our environment, processing data and exchanging data
with other devices and users on the Internet. Such technological development has
led to the emergence of the Internet of Things (IoT) vision, with the goal of
providing Internet users with information about the real world that surrounds us. As
one of the prerequisites for realizing the envisioned IoT services and products,
the horizontal integration of deployed sensor networks is required, by
implementing platforms that are able to serve a large number of users and would
allow the integration of heterogeneous sensor networks consisting of different
sensor devices, using different communication protocols and message formats. The
subject of this dissertation is the architectures for the integration of sensor
networks, which generally support IoT applications with high performance
requirements. Among the available generic types, an architecture based on the data
semantics is selected and the aim of this dissertation is the analysis of the
architectures that enable semantic integration of sensor networks using Semantic
Web technologies in order to achieve the interoperability of sensor data and
networks. The basic generic architecture types were first identified and their key
characteristics and method of implementation were given. For architectures based
on the gateway and the message broker, performance was evaluated
in a scenario of real-time sensor message delivery by creating a simulation
environment. Then the architectures with semantic-based integration of sensor
networks are classified by identifying typical design approaches. Two groups of
approaches are identified: approaches oriented to sensor networks and
application-oriented approaches, with each group containing four architectural types. For
each architectural type, an analysis of the advantages and disadvantages of the
approach used is given, as well as a brief description of concrete representatives.