
5 research outputs found

NMSTREAM: A SCALABLE EVENT-DRIVEN ETL FRAMEWORK FOR PROCESSING HETEROGENEOUS STREAMING DATA

Author: C. Li
F. Xiao
Y. Wu
Z. Wu
Publication venue: 'Copernicus GmbH'
Publication date: 01/09/2018

ETL (Extract-Transform-Load) tools, traditionally developed to operate offline on historical data for feeding data warehouses, need to be enhanced to handle continuously increasing volumes of streaming data and to run at the network level while data streams are being acquired. This paper presents NMStream, a scalable, web-based ETL system. NMStream is based on an event-driven architecture and is designed to integrate distributed, heterogeneous streaming data by combining Apache Flume with the Cassandra database; the ETL processes are carried out through the Flume agent object. NMStream can feed traditional or real-time data warehouses and data-analytics tools in a stable and effective manner.
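The event-driven ETL pattern described in this abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not NMStream's code and not Apache Flume's API (Flume itself is a Java system): `Event`, `Agent`, `on_event`, `drain` and `parse_reading` are names invented here, an in-memory queue stands in for a Flume channel, and a plain Python list stands in for a Cassandra table. Each arriving event is buffered (extract), normalised or dropped if malformed (transform), and appended to the store (load).

```python
from dataclasses import dataclass
from queue import Empty, Queue
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """Hypothetical streaming record: metadata headers plus a raw payload."""
    headers: Dict[str, str]
    body: str


class Agent:
    """Hypothetical event-driven ETL agent: source -> channel -> transform -> sink."""

    def __init__(self, transform: Callable[[Event], Optional[dict]],
                 sink: Callable[[dict], None]) -> None:
        self.channel: "Queue[Event]" = Queue()  # buffers events between stages
        self.transform = transform
        self.sink = sink

    def on_event(self, event: Event) -> None:
        """Extract: the source pushes each arriving event into the channel."""
        self.channel.put(event)

    def drain(self) -> int:
        """Transform and load every buffered event; return the number of rows loaded."""
        loaded = 0
        while True:
            try:
                event = self.channel.get_nowait()
            except Empty:
                return loaded
            row = self.transform(event)
            if row is not None:  # the transform drops malformed events
                self.sink(row)
                loaded += 1


def parse_reading(event: Event) -> Optional[dict]:
    """Transform: normalise a heterogeneous 'sensor,value' payload into a row."""
    parts = event.body.split(",")
    if len(parts) != 2:
        return None
    try:
        value = float(parts[1])
    except ValueError:
        return None
    return {"source": event.headers.get("host", "unknown"),
            "sensor": parts[0].strip(), "value": value}


table: List[dict] = []  # stand-in for a Cassandra table (assumption)
agent = Agent(parse_reading, table.append)
agent.on_event(Event({"host": "gw1"}, "temp, 21.5"))
agent.on_event(Event({"host": "gw2"}, "garbage"))
agent.on_event(Event({}, "humidity,40"))
print(agent.drain())  # prints 2: the malformed "garbage" event is dropped
print(table)
```

Keeping the channel separate from the transform lets the source accept events as they arrive while loading proceeds at its own pace, which is the decoupling an event-driven design relies on.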

Directory of Open Access Journals

X-Warehousing: An XML-Based Approach for Warehousing Complex Data

Author: J. Pokorný
J. Trujillo
M. Golfarelli
M. Heath
P. Krill
R. Kimball
V. Nassis
W. Hümmer
X. Baril
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Crossref

Understanding the Performance of Low Power Raspberry Pi Cloud for Big Data

Author: W. Hajji
F. P. Tso
Publication venue: 'MDPI AG'
Publication date: 01/06/2016

Nowadays, Internet-of-Things (IoT) devices generate data at high speed and in large volumes. Often the data require real-time processing to support high system responsiveness, which can be provided by localised Cloud and/or Fog computing paradigms. However, there are considerably large IoT deployments, such as sensor networks in remote areas, where Internet connectivity is sparse, which challenges the localised Cloud and/or Fog computing paradigms. With the advent of the Raspberry Pi, a credit card-sized single-board computer, there is a great opportunity to construct a low-cost, low-power portable cloud to support real-time data processing next to IoT deployments. In this paper, we extend our previous work on constructing a Raspberry Pi Cloud to study its feasibility for real-time big data analytics under realistic application-level workloads in both native and virtualised environments. We have extensively tested the performance of a single-node Raspberry Pi 2 Model B with httperf and of a 12-node cluster with Apache Spark and HDFS (Hadoop Distributed File System). Our results demonstrate that our portable cloud is useful for supporting real-time big data analytics. On the other hand, our results also reveal that the overhead for CPU-bound workloads in a virtualised environment is surprisingly high, at 67.2%. We have found that, for big data applications, the virtualisation overhead is small for small jobs but becomes more significant for large jobs, reaching up to 28.6%.

Multidisciplinary Digital Publishing Institute

LJMU Research Online (Liverpool John Moores University)

Directory of Open Access Journals

Raspberry Pi Technology

Publication venue: 'MDPI AG'
Publication date: 20/11/2017

Portsmouth University Research Portal (Pure)

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016

Digitale Bibliothek Thüringen