25 research outputs found

    MonetDB/DataCell: Online Analytics in a Streaming Column-Store

    In DataCell, we design streaming functionalities in a modern relational database kernel which targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages for modern applications in need of online analytics such as web logs, network monitoring and scientific data management. The major challenge then becomes the efficient support for specialized stream features, e.g., multi-query processing and incremental window-based processing, as well as exploiting standard DBMS functionalities, such as indexing, in a streaming environment. In this demo, we present the DataCell system, an extension of the MonetDB open-source column-store for online analytics. The demo gives the user the opportunity to experience the features of DataCell such as processing both stream and persistent data and performing window-based processing. The demo provides a visual interface to monitor the critical system components, e.g., how query plans transform from typical DBMS query plans to online query plans, how data flows through the query plans as the streams evolve, how DataCell maintains intermediate results in columnar form to avoid repeated evaluation of the same stream portions, etc. The demo also provides the ability to interactively set the test scenarios regarding input data and various DataCell knobs.
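
The idea of keeping intermediate results so that overlapping windows do not re-evaluate the same stream portion can be illustrated with a minimal Python sketch. This is not DataCell's implementation: the class name `SlidingWindowSum`, its parameters, and the choice of a sum aggregate are all assumptions made for illustration. The stream is cut into non-overlapping basic windows of `slide` tuples, one partial sum is kept per basic window, and each window result is combined from those partials.

```python
from collections import deque


class SlidingWindowSum:
    """Hypothetical illustration: incremental sliding-window sum.

    Assumes a count-based window whose size is a multiple of the slide.
    """

    def __init__(self, window_size, slide):
        if window_size <= 0 or slide <= 0 or window_size % slide != 0:
            raise ValueError("window_size must be a positive multiple of slide")
        self.slide = slide
        self.panes_per_window = window_size // slide
        self.partials = deque()   # one cached partial sum per basic window
        self.buffer = []          # tuples of the basic window being filled

    def push(self, value):
        """Add one tuple; return the window sum when a window completes, else None."""
        self.buffer.append(value)
        if len(self.buffer) < self.slide:
            return None
        # A basic window is complete: aggregate it once and cache the result.
        self.partials.append(sum(self.buffer))
        self.buffer = []
        # Expire the oldest basic window once it falls out of the window.
        if len(self.partials) > self.panes_per_window:
            self.partials.popleft()
        if len(self.partials) == self.panes_per_window:
            # Combine cached partials instead of rescanning the raw tuples.
            return sum(self.partials)
        return None


window = SlidingWindowSum(window_size=4, slide=2)
results = [r for v in range(1, 9) if (r := window.push(v)) is not None]
print(results)
```

Each tuple is aggregated exactly once into its basic window; a window result then costs one addition per cached partial rather than one per tuple.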

    MOBANA: A distributed stream-based information system for public transit

    Public transit generates a wide range of diverse data, including static data and high-velocity data streams from sensors. Integrating and processing this big real-time data is a challenge in developing analytical systems for public transit. We here propose MOBANA (MOBility ANAlyzer), a distributed stream-based system that provides real-time information to a wide range of users for monitoring and analyzing the performance of public transit. To do so, MOBANA integrates the diverse data sources of public transit and converts them into standard and exchangeable data formats. In order to manage such diverse data, we propose a layered architecture, where each layer handles a specific kind of data. MOBANA is designed to be efficient; e.g., it identifies the real-time position of vehicles by adjusting the planned position with real-time data only as needed, thus reducing network load. MOBANA is implemented with a Distributed Stream Processing Engine (DSPE) and a Distributed Messaging System (DMS), which pursue scalable, efficient and reliable real-time processing and analytics. MOBANA was deployed as a pilot in Pavia and tested with real data.

    Collaborative Reuse of Streaming Dataflows in IoT Applications

    Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure, and make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continues to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving resource efficiency. In this paper, we propose dataflow reuse algorithms that, given a submitted dataflow, identify the intersection of reusable tasks and streams from a collection of running dataflows to form a merged dataflow. Similar algorithms to unmerge dataflows when they are removed are also proposed. We implement these algorithms for the popular Apache Storm DSPS, and validate their performance and resource savings for 35 synthetic dataflows based on public OPMW workflows with diverse arrival and departure distributions, and on 21 real IoT dataflows from RIoTBench. Comment: To appear in IEEE eScience Conference 201
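
To make the notion of reusing overlapping tasks concrete, here is a minimal Python sketch. It is not the paper's algorithm and does not use any Apache Storm API: the dictionary representation of a dataflow, the function names `task_signatures` and `merge_dataflow`, the `new:` id prefix, and the equivalence rule (two tasks are treated as equivalent when they run the same operator on equivalent inputs in the same order) are all assumptions made for illustration.

```python
def task_signatures(dataflow):
    """Map each task id to a signature built from its operator and its inputs.

    dataflow: {task_id: (operator_name, [upstream_task_ids])}, assumed acyclic.
    """
    sigs = {}

    def sig(task):
        if task not in sigs:
            operator, upstreams = dataflow[task]
            sigs[task] = (operator, tuple(sig(u) for u in upstreams))
        return sigs[task]

    for task in dataflow:
        sig(task)
    return sigs


def merge_dataflow(running, submitted):
    """Merge `submitted` into `running`, reusing equivalent running tasks.

    Returns (merged dataflow, submitted-to-merged id mapping, reused task ids).
    Assumes no running task id starts with the hypothetical "new:" prefix.
    """
    by_sig = {s: t for t, s in task_signatures(running).items()}
    sub_sigs = task_signatures(submitted)
    merged = dict(running)
    mapping = {}

    def place(task):
        if task in mapping:
            return mapping[task]
        signature = sub_sigs[task]
        if signature in by_sig:
            # An equivalent task already runs: subscribe to its output stream.
            mapping[task] = by_sig[signature]
        else:
            operator, upstreams = submitted[task]
            new_id = f"new:{task}"
            # Wire the new task to the (possibly reused) upstream tasks.
            merged[new_id] = (operator, [place(u) for u in upstreams])
            by_sig[signature] = new_id
            mapping[task] = new_id
        return mapping[task]

    for task in submitted:
        place(task)
    reused = sorted(t for t, m in mapping.items() if m in running)
    return merged, mapping, reused


running = {
    "src": ("sensor_stream", []),
    "parse": ("parse_json", ["src"]),
    "avg": ("avg_5min", ["parse"]),
}
submitted = {
    "a": ("sensor_stream", []),
    "b": ("parse_json", ["a"]),
    "c": ("threshold_alert", ["b"]),
}
merged, mapping, reused = merge_dataflow(running, submitted)
print(mapping)
print(reused)
```

In this example the submitted source and parsing tasks map onto tasks that already run, so only the alerting task is added. Unmerging would additionally need to track which submitted dataflows depend on each task, which the sketch leaves out.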