
    Integrating database and data stream systems

    Traditionally, database systems are viewed as passive data storage: finite data sets are stored and retrieved when needed. But applications such as sensor networks, network monitoring, and retail transactions produce unbounded data sets. A new class of system, the Data Stream Management System (DSMS), is under research and development to deal with such infinite data sets. In a DSMS, a data stream is a continuous source of sequential data. In programming languages such as C/C++ and Java, the concept of a stream already exists: the stream is viewed as a channel into which data is inserted at one end and retrieved from the other. To the database world, the stream is a relatively new concept. In a DSMS, data is processed on-line, and by its very nature the data fed to an application through a data stream can be lost, as it is never stored. This makes data streams non-persistent. Database systems, in contrast, are persistent, which is the basis of my hypothesis: a Data Stream Management System and a Database System can be combined under the same concepts, and a data stream can be made persistent. In this project, I have used an embedded database as middleware to cache the data that is fed to an application through a data stream. The embedded database is linked directly into the application that requires access to the stored data and is faster than a conventional database management system. Storing the streaming data in an embedded database makes the data stream persistent. In the system developed, the embedded database also stores the history of data from the database system, so any query run against the embedded database generates a combined result from data streams and database systems. An application is developed, using the Active Collection Framework as a test bed, to prove the concept.
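
    A minimal sketch of the caching idea, assuming a simple (sensor_id, value, timestamp) stream: Python's sqlite3 stands in for the embedded database, and all table and sensor names are invented for illustration.

        # Minimal sketch, assuming a (sensor_id, value, timestamp) stream;
        # sqlite3 stands in for the embedded database and all names are invented.
        import sqlite3
        import time

        db = sqlite3.connect(":memory:")  # embedded DB linked into the application
        db.execute("CREATE TABLE stream_cache (sensor_id TEXT, value REAL, ts REAL)")
        db.execute("CREATE TABLE history (sensor_id TEXT, value REAL, ts REAL)")

        # History assumed to be loaded from the conventional database system.
        db.execute("INSERT INTO history VALUES ('s1', 20.5, 0.0)")

        def on_stream_tuple(sensor_id, value):
            """Cache each arriving stream tuple, making the stream persistent."""
            db.execute("INSERT INTO stream_cache VALUES (?, ?, ?)",
                       (sensor_id, value, time.time()))

        on_stream_tuple("s1", 21.3)

        # One query against the embedded DB combines stream and database data.
        rows = db.execute("SELECT sensor_id, value, ts FROM stream_cache "
                          "UNION ALL SELECT sensor_id, value, ts FROM history").fetchall()
        print(rows)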

    A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring

    In recent years, the wide adoption of Internet of Things (IoT)-based technologies has increased the proliferation of monitoring systems, which has in turn exponentially increased the amount of heterogeneous data generated. Processing and analysing this massive amount of data is cumbersome, and is gradually moving from the classical 'batch' extract-transform-load (ETL) technique to real-time processing. In the environmental monitoring and management domain, for instance, time-series data and historical datasets are crucial for prediction models. However, this domain still relies on legacy systems and batch processing, which complicates real-time analysis of the essential data and integration with big data platforms. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of the data types, into Apache Kafka topics using Kafka Connect APIs, for processing by the Kafka Streams processing engine. The stream processing engine executes the predictive numerical models and algorithms, expressed in event processing (EP) languages, for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system for a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that can be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. Finally, we evaluate the performance of the distributed stream processing middleware infrastructure to determine the real-time effectiveness of the framework.
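
    As a rough illustration of the ingestion and persistent-querying path, the hedged Python sketch below uses the kafka-python client to read a hypothetical weather-obs topic; a simple running-mean deviation is only a placeholder for the actual EDI computation, which the paper executes inside the streaming engine.

        # Hedged sketch: kafka-python consumer on a hypothetical 'weather-obs'
        # topic; a running-mean deviation is a placeholder for the EDI model.
        import json
        from collections import deque
        from kafka import KafkaConsumer  # pip install kafka-python

        consumer = KafkaConsumer(
            "weather-obs",                       # assumed topic name
            bootstrap_servers="localhost:9092",  # assumed broker address
            value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        )

        window = deque(maxlen=365)  # ~one year of daily precipitation values

        for msg in consumer:        # persistent query over an infinite stream
            rainfall = msg.value["precip_mm"]    # assumed message schema
            window.append(rainfall)
            mean = sum(window) / len(window)
            # Placeholder anomaly test; a real deployment would compute the EDI.
            if len(window) == window.maxlen and rainfall < 0.5 * mean:
                print("possible drought signal:", msg.value)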

    RDF Stream Processing: Let's React

    Stream processing has recently gained a prominent role in Computer Science research. From networks and databases to information theory and programming languages, a lot of work has been dedicated to conceiving ways of representing, transmitting, processing and understanding infinite sequences of data. Nevertheless, there are still aspects that need time to reach a mature state. In particular, heterogeneity in stream data management and event processing is both a challenging topic and a key enabler for the rising Web of Things, where smart devices continuously sense properties of the surrounding world. Different proposals on RDF and Linked Data streams have shown promising results for managing this type of data, while keeping explicit semantics on the data streams and linking them to other datasets in a web-friendly way. With time, these efforts led to the emergence of initiatives such as the RDF Stream Processing (RSP) W3C community group, which aims at specifying a base RDF stream model and a query language for that model. Although these works have produced interesting results in defining overarching model definitions, there are still multiple orthogonal challenges to address. In this work we identify some of these challenges and link them to the characteristics of what are nowadays called reactive systems. This paradigm includes native support for event-driven asynchronous message passing, non-blocking data communication and processing through all layers, and on-demand flexible scalability. We argue that RDF stream systems, combined with reactive techniques, can lead to powerful, resilient and interoperable systems at Web scale.
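
    The following minimal Python sketch illustrates one ingredient of such an RDF stream model, assuming streams of timestamped triples: a tumbling time window over an unbounded triple stream, with a simple continuous filter as a stand-in for an RSP query. All names and the window width are invented.

        # Illustrative only: an RDF stream as timestamped triples plus a
        # tumbling time window; names and window width are invented.
        from typing import Iterator, List, Tuple

        Triple = Tuple[str, str, str]  # (subject, predicate, object)

        def tumbling_window(stream: Iterator[Tuple[Triple, float]], width: float):
            """Group an unbounded stream of timestamped triples into windows."""
            bucket: List[Tuple[Triple, float]] = []
            bucket_end = None
            for triple, ts in stream:
                if bucket_end is None:
                    bucket_end = ts + width
                if ts >= bucket_end:
                    yield bucket
                    bucket, bucket_end = [], ts + width
                bucket.append((triple, ts))
            if bucket:
                yield bucket

        stream = iter([
            (("ex:sensor1", "ex:temperature", "21.5"), 0.0),
            (("ex:sensor1", "ex:temperature", "22.1"), 3.0),
            (("ex:sensor2", "ex:humidity", "0.61"), 7.0),
        ])

        for window in tumbling_window(stream, width=5.0):
            # A simple continuous query: temperature observations per window.
            print([t for t, _ in window if t[1] == "ex:temperature"])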

    Temporal Stream Algebra

    Data stream management systems (DSMS) so far focus on event queries and hardly consider combined queries over both data from event streams and data from a database. However, applications like emergency management require such combined data stream and database queries. Further requirements are the simultaneous use of multiple timestamps based on different time lines and semantics, expressive temporal relations between multiple timestamps, and flexible negation, grouping and aggregation which can be controlled, i.e. started and stopped, by events and are not limited to fixed-size time windows. Current DSMS hardly address these requirements. This article proposes Temporal Stream Algebra (TSA) to meet the aforementioned requirements. Temporal streams are a common abstraction of data streams and database relations; the operators of TSA are generalizations of the usual operators of Relational Algebra. An in-depth analysis of temporal relations guarantees that valid TSA expressions are non-blocking, i.e. can be evaluated incrementally. In this respect TSA differs significantly from previous algebraic approaches, which use specialized operators to prevent blocking expressions on a "syntactical" level.
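
    A small Python sketch of the non-blocking idea, with generators standing in for algebra operators over temporal streams (timestamp-ordered tuples): each operator emits results incrementally rather than waiting for the end of its input. The operators and data are illustrative, not the article's formal definitions.

        # Illustrative generators standing in for TSA-style operators over
        # temporal streams (timestamp-ordered tuples); data is invented.
        import heapq
        from typing import Dict, Iterator, Tuple

        TemporalTuple = Tuple[float, Dict]  # (timestamp, attributes)

        def selection(stream: Iterator[TemporalTuple], predicate):
            """Filter tuples as they arrive, never waiting for end-of-stream."""
            for ts, row in stream:
                if predicate(row):
                    yield ts, row

        def temporal_union(a: Iterator[TemporalTuple], b: Iterator[TemporalTuple]):
            """Merge two timestamp-ordered streams, preserving temporal order."""
            yield from heapq.merge(a, b, key=lambda t: t[0])

        events = iter([(1.0, {"src": "stream", "level": 7}),
                       (4.0, {"src": "stream", "level": 2})])
        relation = iter([(2.0, {"src": "db", "level": 9})])

        # Incremental evaluation: results appear as soon as inputs arrive.
        for ts, row in selection(temporal_union(events, relation),
                                 lambda r: r["level"] > 5):
            print(ts, row)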

    A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures

    This deliverable is a survey of the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state of the art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks, and complex event processing. Even though the deliverable's authors have tried to avoid overly technical language and to explain every concept referred to, the deliverable may still seem rather technical to readers not yet familiar with the techniques it describes.

    Complex Event Processing (CEP)

    Event-driven information systems demand systematic and automatic processing of events. Complex Event Processing (CEP) encompasses methods, techniques, and tools for processing events while they occur, i.e., in a continuous and timely fashion. CEP derives valuable higher-level knowledge from lower-level events; this knowledge takes the form of so-called complex events, that is, situations that can only be recognized as a combination of several events. Service-Oriented Architecture (SOA), Event-Driven Architecture (EDA), cost reductions in sensor technology, and the monitoring of IT systems due to legal, contractual, or operational concerns have led to a significantly increased generation of events in computer systems in recent years. This development is accompanied by a demand to manage and process these events in an automatic, systematic, and timely fashion. Important application areas for CEP include the following.
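
    A hedged Python sketch of the core idea, deriving a higher-level complex event from a combination of lower-level events; the "smoke followed by high temperature" pattern, the threshold and the time window are invented for illustration.

        # Invented pattern for illustration: a complex 'fire' event is derived
        # when smoke is followed by a high temperature within 60 seconds.
        from dataclasses import dataclass

        @dataclass
        class Event:
            kind: str
            value: float
            ts: float

        def detect_fire(events):
            """Derive higher-level events from a combination of lower-level ones."""
            last_smoke = None
            for e in events:
                if e.kind == "smoke":
                    last_smoke = e
                elif e.kind == "temperature" and e.value > 60.0:
                    if last_smoke and e.ts - last_smoke.ts <= 60.0:
                        yield Event("fire", e.value, e.ts)  # the complex event

        stream = [Event("smoke", 1.0, 10.0), Event("temperature", 72.0, 40.0)]
        for complex_event in detect_fire(stream):
            print(complex_event)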

    Asynchronous Multi-Context Systems

    In this work, we present asynchronous multi-context systems (aMCSs), which provide a framework for loosely coupling different knowledge representation formalisms that allows for online reasoning in a dynamic environment. Systems of this kind may interact with the outside world via input and output streams and may therefore react to a continuous flow of external information. In contrast to recent proposals, contexts in an aMCS communicate with each other in an asynchronous way, which fits the needs of many application domains and is beneficial for scalability. The federal semantics of aMCSs renders our framework an integration approach rather than a knowledge representation formalism itself. We illustrate the introduced concepts by means of an example scenario dealing with rescue services. In addition, we compare aMCSs to reactive multi-context systems and describe how to simulate the latter with our novel approach.
    Comment: International Workshop on Reactive Concepts in Knowledge Representation (ReactKnow 2014), co-located with the 21st European Conference on Artificial Intelligence (ECAI 2014). Proceedings of the International Workshop on Reactive Concepts in Knowledge Representation (ReactKnow 2014), pages 31-37, technical report, ISSN 1430-3701, Leipzig University, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-15056
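
    As an illustration of the asynchronous communication style, the Python asyncio sketch below runs two invented "contexts" that exchange messages via a queue without blocking each other, loosely echoing the paper's rescue-services scenario; it is an analogy, not the aMCS semantics.

        # An analogy, not the aMCS semantics: two invented contexts exchange
        # messages asynchronously via a queue, so neither blocks the other.
        import asyncio

        async def sensor_context(out_q: asyncio.Queue):
            """Feeds observations from the outside world into the system."""
            for report in ["accident on A4", "fire at plant 2"]:
                await out_q.put(report)
                await asyncio.sleep(0.1)  # information arrives over time
            await out_q.put(None)         # end of the input stream

        async def planner_context(in_q: asyncio.Queue):
            """Reasons over incoming reports whenever one becomes available."""
            while (report := await in_q.get()) is not None:
                print("dispatching units for:", report)

        async def main():
            q = asyncio.Queue()
            await asyncio.gather(sensor_context(q), planner_context(q))

        asyncio.run(main())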

    Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

    Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios, such as the diagnostics of turbines at Siemens. The OBDA approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended to become analytics-, source-, and cost-aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA in which aggregate and other analytical functions are first-class citizens. Moreover, we develop query optimisation techniques that allow analytical tasks over static and streaming data to be processed efficiently. We implement our approach in a system and evaluate it with Siemens turbine data.
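
    A minimal Python sketch of the kind of analytical task targeted here, with invented turbine data: a stream is aggregated (AVG playing the role of a first-class analytical function) and the aggregate is joined with static reference data. The paper's declarative OBDA language would express this as a query; the procedural code is only an analogy.

        # Invented turbine data; AVG plays the role of a first-class
        # analytical function combined with static reference data.
        from collections import defaultdict
        from statistics import mean

        # Static data, e.g. what ontology mappings would expose.
        turbines = {"t1": {"site": "plant-A", "max_temp": 90.0}}

        # Streaming measurements: (turbine_id, temperature).
        stream = [("t1", 81.0), ("t1", 95.5), ("t1", 88.0)]

        readings = defaultdict(list)
        for turbine_id, temp in stream:
            readings[turbine_id].append(temp)

        # Join the stream aggregate with static data to flag hot turbines.
        for turbine_id, temps in readings.items():
            avg = mean(temps)
            if avg > 0.9 * turbines[turbine_id]["max_temp"]:
                print(turbine_id, "at", turbines[turbine_id]["site"], "avg:", avg)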

    S-Net for multi-memory multicores

    Copyright ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 5th ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming: http://doi.acm.org/10.1145/1708046.1708054
    S-Net is a declarative coordination language and component technology aimed at modern multi-core/many-core architectures and systems-on-chip. It builds on the concept of stream processing to structure dynamically evolving networks of communicating asynchronous components. Components themselves are implemented in a conventional language suitable for the application domain. This two-level software architecture maintains a familiar sequential development environment for large parts of an application and offers a high-level declarative approach to component coordination. In this paper we present a conservative language extension for the placement of components and component networks in a multi-memory environment, i.e. architectures that associate individual compute cores or groups thereof with private memories. We describe a novel distributed runtime system layer that complements our existing multithreaded runtime system for shared-memory multicores. Particular emphasis is put on efficient management of data communication. Last but not least, we present preliminary experimental data.
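
    A Python analogy (not S-Net syntax) of the two-level idea: components are ordinary functions in a conventional language, while a thin coordination layer wires them into a network of streams. The pipeline and record format are invented for illustration.

        # Python analogy, not S-Net syntax: components are plain functions,
        # and the last line is the 'coordination layer' wiring the network.
        def producer():
            """A component emitting records onto its output stream."""
            for i in range(5):
                yield {"value": i}

        def scale(stream, factor):
            """A component transforming records as they pass through."""
            for record in stream:
                yield {"value": record["value"] * factor}

        def consumer(stream):
            """A sink component at the end of the network."""
            for record in stream:
                print(record)

        consumer(scale(producer(), factor=10))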