8,997 research outputs found
Exploiting the power of relational databases for efficient stream processing
Stream applications have gained significant popularity in recent years,
leading to the development of specialized stream engines.
These systems are designed from scratch, with a different
philosophy than today's database engines, in order to cope with the requirements of stream applications.
However, this means that they lack the power and sophisticated techniques of a full-fledged
database system, which exploits techniques and algorithms accumulated over many years of database research.
In this paper, we take the opposite route and design a stream engine directly on top of a database kernel.
Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket.
A continuous query can then be evaluated over its relevant baskets as a typical one-time query
exploiting the power of the relational engine.
Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket.
A basket can serve as input to a single query plan or to multiple similar ones.
Furthermore, a query plan can be split into multiple parts, each with its own
input/output baskets, allowing for flexible load sharing and query scheduling.
Contrary to traditional stream engines, which process one tuple at a time,
this model allows batch processing of tuples, e.g., querying a basket only after a number of tuples
have arrived or after a time threshold has passed.
Furthermore, we are not restricted to processing tuples in the order they arrive.
Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting
a novel query component: basket expressions.
We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages.
We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS.
A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and
the standard Linear Road benchmark demonstrate the potential of this new approach.
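The basket model described in the abstract can be illustrated with a minimal sketch. This is a hypothetical toy illustration, not DataCell's actual implementation: tuples accumulate in a basket, a continuous query is evaluated over the current batch as if it were a one-time relational query, and consumed tuples are dropped.

```python
from collections import deque

class Basket:
    """Toy sketch of a DataCell-style basket: a transient system table
    holding incoming tuples until the registered queries consume them."""

    def __init__(self):
        self.tuples = deque()

    def append(self, tup):
        self.tuples.append(tup)

    def drain_batch(self, min_tuples=1):
        # Batch semantics: release tuples only once enough have arrived.
        if len(self.tuples) < min_tuples:
            return []
        batch = list(self.tuples)
        self.tuples.clear()  # tuples seen by all queries are dropped
        return batch

# Tuples arrive and are stored directly in the basket.
basket = Basket()
for reading in [(1, 20.5), (2, 31.0), (3, 19.8)]:
    basket.append(reading)

# A continuous query evaluated as a one-time query over the batch,
# e.g. "SELECT * FROM basket WHERE value > 25".
batch = basket.drain_batch(min_tuples=3)
result = [t for t in batch if t[1] > 25]
```

In a real engine the filter would be an actual relational query plan; the sketch only shows how batching decouples tuple arrival from query evaluation.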
DataCell: Exploiting the Power of Relational Databases for Efficient Stream Processing
Designed for complex event processing, DataCell is a research prototype database system in the area of sensor stream systems. Under development at CWI, it belongs to the MonetDB database system family. CWI researchers innovatively built a stream engine directly on top of a database kernel, thus exploiting and merging technologies from the stream world and the rich area of database literature. The results are very promising.
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
Temporal Stream Algebra
Data stream management systems (DSMS) have so far focused on
event queries and hardly consider combined queries over both
data from event streams and data from a database. However,
applications like emergency management require combined
data stream and database queries. Further requirements are
the simultaneous use of multiple timestamps with different
time lines and semantics, expressive temporal relations between multiple timestamps, and
flexible negation, grouping
and aggregation which can be controlled, i.e., started and
stopped, by events and are not limited to fixed-size time
windows. Current DSMS hardly address these requirements.
This article proposes Temporal Stream Algebra (TSA) to
meet the aforementioned requirements. Temporal
streams are a common abstraction of data streams and
database relations; the operators of TSA are generalizations of
the usual operators of Relational Algebra. An in-depth analysis of temporal relations guarantees that valid TSA expressions are non-blocking, i.e., can be evaluated incrementally.
In this respect TSA differs significantly from previous algebraic approaches, which use specialized operators to prevent
blocking expressions on a "syntactical" level.
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments are able to generate data
streams at high velocity, that include information from a variety of domains,
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. These allow
accessible analytics over data streams having properties from different
disciplines, and help span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
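The X-CEP abstract describes queries whose temporal scope spans past (persistent) and present (real-time) events. A minimal sketch of that idea, under the assumption that both sources yield timestamped events already sorted by time (this is an illustration, not SCEPter's actual API):

```python
import heapq

def unified_stream(historical, live):
    """Toy sketch: evaluate one query across a persistent (historical)
    event log and a real-time (live) stream by merging both into a
    single time-ordered sequence, as a CEP engine spanning temporal
    spaces might. Both inputs are assumed sorted by timestamp."""
    # heapq.merge lazily interleaves the sorted inputs by timestamp.
    yield from heapq.merge(historical, live, key=lambda e: e[0])

# Hypothetical Smart Power Grid events: (timestamp, event_type).
historical = [(1, "breaker_open"), (3, "load_spike")]
live = [(2, "voltage_sag"), (4, "breaker_close")]
events = list(unified_stream(historical, live))
```

A real engine would additionally push semantic predicates into both sources and index the high-volume historic portion, which is where the paper's optimizations apply.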
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.
Comment: 46 pages, 16 figures, Technical Report
Analysing Temporal Relations – Beyond Windows, Frames and Predicates
This article proposes an approach that relies on the standard
operators of relational algebra (including grouping and
aggregation) for processing complex events without requiring
window specifications. In this way the approach can
process complex event queries of the kind encountered in
applications such as emergency management in metro networks.
This article presents Temporal Stream Algebra (TSA), which
combines the operators of relational algebra with an
analysis of temporal relations at compile time. This analysis
determines which relational algebra queries can be evaluated
against data streams, i.e., the analysis is able to distinguish
valid from invalid stream queries. Furthermore, the analysis
derives functions similar to the pass, propagation and keep
invariants in Tucker et al.'s "Exploiting Punctuation
Semantics in Continuous Data Streams". These functions enable
the incremental evaluation of TSA queries, the propagation
of punctuations, and garbage collection. The evaluation of
TSA queries combines bulk-wise and out-of-order processing,
which makes it tolerant to workload bursts as they typically
occur in emergency management. The approach has been
conceived for efficiently processing complex event queries on
top of a relational database system. It has been deployed
and tested on MonetDB.
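The punctuation-driven evaluation that TSA's pass/propagation/keep functions enable can be sketched in miniature. This is a hypothetical illustration of the general technique, not TSA's actual operators: grouped state is emitted and garbage-collected as soon as a punctuation guarantees that no earlier tuples can still arrive, with no fixed-size window involved.

```python
from collections import defaultdict

def incremental_count(stream):
    """Toy sketch of punctuation-driven incremental grouping:
    count tuples per timestamp, and emit (pass) plus discard
    (garbage-collect) a group once a punctuation promises that
    no tuple with an earlier-or-equal timestamp will arrive."""
    counts = defaultdict(int)
    results = []
    for item in stream:
        if item[0] == "punct":
            # Punctuation: no tuple with ts <= item[1] will follow.
            ripe = [ts for ts in counts if ts <= item[1]]
            for ts in sorted(ripe):
                results.append((ts, counts.pop(ts)))  # emit and GC
        else:
            ts, _payload = item
            counts[ts] += 1
    return results

# Tuples may arrive out of order between punctuations.
stream = [(1, "a"), (1, "b"), (2, "c"), ("punct", 1), (2, "d"), ("punct", 2)]
out = incremental_count(stream)
```

Note how the group for timestamp 2 stays open across the first punctuation and absorbs the late tuple `(2, "d")`, which is the bulk-wise, out-of-order tolerance the abstract describes.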
Efficient data representation for XML in peer-based systems
Purpose - New directions in the provision of end-user computing experiences mean that the best way to share data between small mobile computing devices needs to be determined. Partitioning large structures so that they can be shared efficiently provides a basis for data-intensive applications on such platforms. The partitioned structure can be compressed using dictionary-based approaches and then directly queried without firstly decompressing the whole structure.
Design/methodology/approach - The paper describes an architecture for partitioning XML into structural and dictionary elements and the subsequent manipulation of the dictionary elements to make the best use of available space.
Findings - The results indicate that considerable savings are available by removing duplicate dictionaries. The paper also identifies the most effective strategy for defining dictionary scope.
Research limitations/implications - This evaluation is based on a range of benchmark XML structures and the approach to minimising dictionary size shows benefit in the majority of these. Where structures are small and regular, the benefits of efficient dictionary representation are lost. The authors' future research now focuses on heuristics for further partitioning of structural elements.
Practical implications - Mobile applications that need access to large data collections will benefit from the findings of this research. Traditional client/server architectures are not suited to dealing with high volume demands from a multitude of small mobile devices. Peer data sharing provides a more scalable solution and the experiments that the paper describes demonstrate the most effective way of sharing data in this context.
Social implications - Many services are available via smartphone devices but users are wary of exploiting the full potential because of the need to conserve battery power. The approach mitigates this challenge and consequently expands the potential for users to benefit from mobile information systems. This will have impact in areas such as advertising, entertainment and education but will depend on the acceptability of file sharing being extended from the desktop to the mobile environment.
Originality/value - The original work characterises the most effective way of sharing large data sets between small mobile devices. This will save battery power on devices such as smartphones, thus providing benefits to users of such devices.