Integrating database and data stream systems
Traditionally, database systems are viewed as passive data stores: finite data sets are stored in them and retrieved when needed. But applications such as sensor networks, network monitoring, and retail transactions produce infinite data sets. A new kind of system, the Data Stream Management System (DSMS), is under research and development to deal with such data. In a DSMS, a data stream is a continuous source of sequential data. Programming languages such as C/C++ and Java already have the concept of a stream, viewed as a channel into which data is inserted at one end and from which it is retrieved at the other. To the database world, the stream is a relatively new concept. In a DSMS, data is processed on-line. Because the data fed to an application through a data stream is never stored, it can be lost, which makes a data stream non-persistent. Database systems, by contrast, are persistent, and this is the basis of my hypothesis: a Data Stream Management System and a database system can be combined under the same concepts, and a data stream can be made persistent. In this project, I have used an embedded database as middleware to cache the data that is fed to an application through a data stream. The embedded database is linked directly to the application that requires access to the stored data and is faster than a conventional database management system. Storing the streaming data in an embedded database makes the data stream persistent. In the system developed, the embedded database also stores the history of data from the database system, so any query run against the embedded database generates a combined result from data streams and database systems. An application was developed, using the Active Collection Framework as a test bed, to prove the concept
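The idea of caching stream data in an embedded database and querying it together with stored data can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system described in the abstract: SQLite (through Python's standard `sqlite3` module) stands in for the embedded database, and the table names, the `(sensor_id, ts, value)` schema and the helper functions are all hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stream tuple: (sensor_id, timestamp, value).
Reading = Tuple[str, int, float]


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the embedded database; pass a file path to survive restarts."""
    conn = sqlite3.connect(path)
    # Cache for tuples arriving on the stream (assumed schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stream_cache "
        "(sensor_id TEXT, ts INTEGER, value REAL)"
    )
    # Stored data copied from the conventional database (assumed schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history "
        "(sensor_id TEXT PRIMARY KEY, location TEXT)"
    )
    return conn


def consume(conn: sqlite3.Connection,
            stream: Iterable[Reading]) -> Iterator[Reading]:
    """Persist each stream tuple, then pass it on to the application."""
    for reading in stream:
        conn.execute("INSERT INTO stream_cache VALUES (?, ?, ?)", reading)
        conn.commit()
        yield reading


def combined_query(conn: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    """Join cached stream data with stored data: count and mean per location."""
    return conn.execute(
        "SELECT h.location, COUNT(*), AVG(s.value) "
        "FROM stream_cache AS s JOIN history AS h "
        "ON h.sensor_id = s.sensor_id "
        "GROUP BY h.location ORDER BY h.location"
    ).fetchall()
```

With a file path instead of `":memory:"`, tuples that have already passed through `consume` remain queryable after the stream has moved on, which is the persistence property the abstract argues for.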
A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring
In recent years, the application and wide adoption of Internet of Things (IoT)-based
technologies have increased the proliferation of monitoring systems, which has consequently
exponentially increased the amounts of heterogeneous data generated. Processing and analysing
the massive amount of data produced is cumbersome, and is gradually moving from the classical
‘batch’ processing (extract, transform, load, or ETL) technique to real-time processing. For instance,
in the environmental monitoring and management domain, time-series data and historical datasets are
crucial for prediction models. However, the environmental monitoring domain still utilises legacy
systems, which complicates real-time analysis of the essential data and integration with big data
platforms, and entails reliance on batch processing. Herein, as a solution, a distributed stream processing
middleware framework for real-time analysis of heterogeneous environmental monitoring and
management data is presented and tested on a cluster using open source technologies in a big data
environment. The system ingests datasets from legacy systems and sensor data from heterogeneous
automated weather systems irrespective of the data types to Apache Kafka topics using Kafka Connect
APIs for processing by the Kafka streaming processing engine. The stream processing engine executes
the predictive numerical models and algorithms represented in event processing (EP) languages
for real-time analysis of the data streams. To prove the feasibility of the proposed framework,
we implemented the system using a case study scenario of drought prediction and forecasting based
on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form
that could be executed by the streaming engine for real-time computing. Secondly, the model is
applied to the ingested data streams and datasets to predict drought through persistent querying of
the infinite streams to detect anomalies. Finally, the performance of the distributed stream
processing middleware infrastructure is evaluated to determine the real-time
effectiveness of the framework
RDF Stream Processing: Let's React
Stream processing has recently gained a prominent role in Computer Science research. From networks or databases to information theory or programming languages, a lot of work has been dedicated to conceiving ways of representing, transmitting, processing and understanding infinite sequences of data. Nevertheless, there are still aspects that need time to reach a mature state. In particular, heterogeneity in stream data management and event processing is both a challenging topic and a key enabler for the rising Web of Things, where smart devices continuously sense properties of the surrounding world. Different proposals on RDF and Linked Data streams have shown promising results for managing this type of data, while keeping explicit semantics on the data streams, and linking them to other datasets in a web-friendly way. With time, these efforts led to the emergence of initiatives such as the RDF Stream Processing (RSP) W3C community group, aiming at specifying a base RDF stream model and query language for that model. Although these works produced interesting results in defining overarching model definitions, there are still multiple orthogonal challenges that need to be addressed. In this work we identify some of these challenges, and we link them to the characteristics of what are nowadays called reactive systems. This paradigm includes natively supporting event-driven asynchronous message passing, non-blocking data communication and processing through all layers, and on-demand flexible scalability. We argue that RDF stream systems, combined with reactive techniques, can lead to powerful, resilient and interoperable systems at Web scale
Temporal Stream Algebra
Data stream management systems (DSMS) so far focus on
event queries and hardly consider combined queries to both
data from event streams and from a database. However,
applications like emergency management require combined
data stream and database queries. Further requirements are
the simultaneous use of multiple timestamps after different
time lines and semantics, expressive temporal relations
between multiple timestamps, and flexible negation, grouping
and aggregation which can be controlled, i.e. started and
stopped, by events and are not limited to fixed-size time
windows. Current DSMS hardly address these requirements.
This article proposes Temporal Stream Algebra (TSA) so
as to meet the aforementioned requirements. Temporal
streams are a common abstraction of data streams and
database relations; the operators of TSA are generalizations of
the usual operators of Relational Algebra. An in-depth analysis
of temporal relations guarantees that valid TSA expressions
are non-blocking, i.e. can be evaluated incrementally.
In this respect TSA differs significantly from previous algebraic
approaches, which use specialized operators to prevent
blocking expressions on a "syntactical" level
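The requirement of aggregation that is started and stopped by events, rather than tied to a fixed-size time window, can be illustrated with a small sketch. It does not implement TSA or any of its operators; the event kinds `"start"`, `"stop"` and `"reading"` and the function name are assumptions made for the example. The generator is non-blocking in the sense used above: it emits each result as soon as the closing event arrives and never needs to see the end of the stream.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (kind, value), where kind is one of
# "start", "stop" or "reading". The value is ignored for start/stop.
Event = Tuple[str, float]


def event_controlled_sum(events: Iterable[Event]) -> Iterator[float]:
    """Sum the readings between each "start" event and the next "stop".

    Results are yielded incrementally, one per closed interval, so the
    function also works on an unbounded input stream.
    """
    active = False
    total = 0.0
    for kind, value in events:
        if kind == "start":
            # A start event opens (or restarts) the aggregation.
            active, total = True, 0.0
        elif kind == "stop" and active:
            # A stop event closes the interval and releases the result.
            active = False
            yield total
        elif kind == "reading" and active:
            total += value
        # Readings outside an open interval are ignored.
```

Usage: feeding the generator a reading, a start, two readings of 1.0 and 2.0, and a stop produces a single result for the interval that the two control events delimit, regardless of how much time elapsed between them.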
A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures
This deliverable is a survey on the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state of the art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks and complex event processing. Even though the deliverable’s authors have tried to avoid overly technical language and to explain every concept referred to, the deliverable might seem rather technical to readers who are not yet familiar with the techniques it describes
Complex Event Processing (CEP)
Event-driven information systems demand a systematic and automatic processing of events. Complex Event Processing (CEP) encompasses methods, techniques, and tools for processing events while they occur, i.e., in a continuous and timely fashion. CEP derives valuable higher-level knowledge from lower-level events; this knowledge takes the form of so-called complex events, that is, situations that can only be recognized as a combination of several events.
1 Application Areas
Service Oriented Architecture (SOA), Event-Driven Architecture (EDA), cost reductions in sensor technology and the monitoring of IT systems due to legal, contractual, or operational concerns have led to a significantly increased generation of events in computer systems in recent years. This development is accompanied by a demand to manage and process these events in an automatic, systematic, and timely fashion. Important application areas for Complex Event Processing (CEP) are the following
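As an illustration of the complex-event idea in the entry above (not of its application areas), the sketch below derives a higher-level event from a combination of two lower-level ones. Everything in it is an assumption made for the example: the event kinds `"smoke"` and `"high_temp"`, the derived `"fire"` event, the 10-second window, and the requirement that events arrive in timestamp order. It is not taken from any particular CEP engine or query language.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    kind: str        # hypothetical low-level kinds: "smoke", "high_temp"
    location: str
    ts: float        # timestamp in seconds


@dataclass(frozen=True)
class ComplexEvent:
    kind: str        # derived higher-level kind, here "fire"
    location: str
    ts: float


def detect_fire(events: Iterable[Event],
                window: float = 10.0) -> Iterator[ComplexEvent]:
    """Emit "fire" when smoke and high temperature occur at the same
    location within `window` seconds of each other.

    Assumes events arrive in non-decreasing timestamp order.
    """
    partner = {"smoke": "high_temp", "high_temp": "smoke"}
    # Latest timestamp seen so far for each (kind, location) pair.
    last_seen: Dict[Tuple[str, str], float] = {}
    for ev in events:
        if ev.kind not in partner:
            continue  # unrelated events do not take part in the pattern
        other_ts = last_seen.get((partner[ev.kind], ev.location))
        if other_ts is not None and ev.ts - other_ts <= window:
            # Neither event alone signals a fire; their combination does.
            yield ComplexEvent("fire", ev.location, ev.ts)
        last_seen[(ev.kind, ev.location)] = ev.ts
```

The detector processes events as they occur and keeps only one timestamp per kind and location, which fits the continuous, timely processing that CEP calls for.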
Asynchronous Multi-Context Systems
In this work, we present asynchronous multi-context systems (aMCSs), which
provide a framework for loosely coupling different knowledge representation
formalisms that allows for online reasoning in a dynamic environment. Systems
of this kind may interact with the outside world via input and output streams
and may therefore react to a continuous flow of external information. In
contrast to recent proposals, contexts in an aMCS communicate with each other
in an asynchronous way which fits the needs of many application domains and is
beneficial for scalability. The federal semantics of aMCSs renders our
framework an integration approach rather than a knowledge representation
formalism itself. We illustrate the introduced concepts by means of an example
scenario dealing with rescue services. In addition, we compare aMCSs to
reactive multi-context systems and describe how to simulate the latter with our
novel approach.
Comment: International Workshop on Reactive Concepts in Knowledge
Representation (ReactKnow 2014), co-located with the 21st European Conference
on Artificial Intelligence (ECAI 2014). Proceedings of the International
Workshop on Reactive Concepts in Knowledge Representation (ReactKnow 2014),
pages 31-37, technical report, ISSN 1430-3701, Leipzig University, 2014.
http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-15056
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines at Siemens. The OBDA
approach has great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow
efficient processing of analytical tasks over static and streaming data. We
implement our approach in a system and evaluate our system with Siemens turbine
data
S-Net for multi-memory multicores
Copyright ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 5th ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming: http://doi.acm.org/10.1145/1708046.1708054
S-Net is a declarative coordination language and component technology aimed at modern multi-core/many-core architectures and systems-on-chip. It builds on the concept of stream processing to structure dynamically evolving networks of communicating asynchronous components. Components themselves are implemented using a conventional language suitable for the application domain. This two-level software architecture maintains a familiar sequential development environment for large parts of an application and offers a high-level declarative approach to component coordination. In this paper we present a conservative language extension for the placement of components and component networks in a multi-memory environment, i.e. architectures that associate individual compute cores or groups thereof with private memories. We describe a novel distributed runtime system layer that complements our existing multithreaded runtime system for shared memory multicores. Particular emphasis is put on efficient management of data communication. Last but not least, we present preliminary experimental data