Patterns for distributed real-time stream processing

Author: Arias Fisteus Jesús
Basanta Val Pablo
Fernández García Norberto
Sánchez Fernández Luis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

In recent years, big data systems have become an active area of research and development. Stream processing is one of the potential application scenarios of big data systems, where the goal is to process a continuous, high-velocity flow of information items. High-frequency trading (HFT) in stock markets and trending topic detection in Twitter are examples of stream processing applications. In some cases (for instance, in HFT), these applications have end-to-end quality-of-service requirements and may benefit from the use of real-time techniques. Taking this into account, the present article analyzes, from the point of view of real-time systems, a set of patterns that can be used when implementing a stream processing application. For each pattern, we discuss its advantages and disadvantages, as well as its impact on application performance, measured as response time, maximum input frequency, and changes in utilization demands due to the pattern. This work has been partially supported by Distributed Java Infrastructure for Real-Time Big Data (CAS14/00118). It has also been partially funded by eMadrid (S2013/ICE-2715), HERMES-MARTDRIVER (TIN2013-46801-C4-2-R) and AUDACity (TIN2016-77158-C4-1-R), and by the European Union's 7th Framework Program under Grant Agreement FP7-IC6-318763. We are also indebted to our anonymous reviewers, who improved the quality of the article.

Universidad Carlos III de Madrid e-Archivo

MOBANA: A distributed stream-based information system for public transit

Author: Liu Dongmeng
Liu Kaixu
Ma Tianyi
Motta Gianmario Piero Antonio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Public transit generates a wide range of diverse data, including static data and high-velocity data streams from sensors. Integrating and processing this big real-time data is a challenge in developing analytical systems for public transit. Here we propose MOBANA (MOBility ANAlyzer), a distributed stream-based system that provides real-time information to a wide range of users for monitoring and analyzing the performance of public transit. To do so, MOBANA integrates the diverse data sources of public transit and converts them into standard, exchangeable data formats. To manage such diverse data, we propose a layered architecture in which each layer handles a specific kind of data. MOBANA is designed to be efficient: for example, it identifies the real-time position of vehicles by adjusting the planned position with real-time data only as needed, thus reducing network load. MOBANA is implemented with a Distributed Stream Processing Engine (DSPE) and a Distributed Messaging System (DMS), which aim at scalable, efficient, and reliable real-time processing and analytics. MOBANA was deployed as a pilot in Pavia and tested with real data.

Adaptive Normalization in Streaming Data

Author: Elwell R.
Lopez M. A.
Tan P. N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/10/2019

In today's digital era, data are everywhere, from the Internet of Things to health care and financial applications. This leads to potentially unbounded, ever-growing Big Data streams, which need to be utilized effectively. Data normalization is an important preprocessing technique for data analytics. It helps prevent mismodeling and reduces the complexity inherent in the data, especially for data integrated from multiple sources and contexts. Normalization of Big Data streams is challenging because of evolving inconsistencies, time and memory constraints, and the non-availability of the whole data beforehand. This paper proposes a distributed approach to adaptive normalization for Big Data streams. Using sliding windows of fixed size, it provides a simple mechanism to adapt the statistics for normalizing changing data in each window. Implemented on Apache Storm, a distributed real-time stream data framework, our approach exploits distributed data processing for efficient normalization. Unlike other existing adaptive approaches that normalize data for a specific use (e.g., classification), ours does not. Moreover, our adaptive mechanism allows flexible controls, via user-specified thresholds, for normalization tradeoffs between time and precision. The paper illustrates our proposed approach along with a few other techniques and experiments on both synthesized and real-world data. The normalized data obtained from our proposed approach, on 160,000 instances of a data stream, improves over the baseline by 89%, with 0.0041 root-mean-square error compared with the actual data.

arXiv.org e-Print Archive
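
The sliding-window idea in the abstract above can be illustrated with a minimal, single-process sketch. It is not the paper's algorithm: the class name `SlidingWindowNormalizer`, the choice of z-score normalization, and the per-item recomputation of statistics are all assumptions made for illustration, and neither the Apache Storm distribution nor the user-specified thresholds are modeled.

```python
from collections import deque
import math


class SlidingWindowNormalizer:
    """Hypothetical helper: z-score each value against the last `window_size` values."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        # deque with maxlen drops the oldest value automatically
        self.window = deque(maxlen=window_size)

    def update(self, x):
        """Add x to the window and return x normalized by the window statistics."""
        self.window.append(x)
        n = len(self.window)
        mean = sum(self.window) / n
        # Population variance of the current window
        var = sum((v - mean) ** 2 for v in self.window) / n
        std = math.sqrt(var)
        # A constant window has no spread; report 0.0 rather than dividing by zero
        return 0.0 if std == 0 else (x - mean) / std


if __name__ == "__main__":
    norm = SlidingWindowNormalizer(window_size=3)
    # The stream shifts level halfway through; the window statistics follow it
    for value in [1, 2, 3, 103, 104, 105]:
        print(value, round(norm.update(value), 4))
```

Because the statistics come only from the most recent window, a stream whose level shifts is re-centered after `window_size` items, which is the adaptivity the abstract describes in general terms.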

Real-Time Data Processing With Lambda Architecture

Author: Malusare Omkar Ashok
Publication venue: SJSU ScholarWorks
Publication date: 20/05/2019

Data has evolved immensely in recent years in type, volume, and velocity. There are several frameworks to handle big data applications. The project focuses on the Lambda Architecture (LA) proposed by Marz and its application to real-time data processing. The architecture is a solution that unites the benefits of batch and stream processing techniques. Data can be processed historically with high precision and involved algorithms without losing short-term information, alerts, and insights. The Lambda Architecture can serve a wide range of use cases and workloads, and it withstands hardware failures and human mistakes. The layered architecture enhances loose coupling and flexibility in the system. This is a huge benefit, because it allows understanding the trade-offs and the application of various tools and technologies across the layers. The approach to building the LA has advanced thanks to improvements in the underlying tools. The project demonstrates a simplified architecture for the LA that is maintainable.

SJSU ScholarWorks
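
The combination of batch and stream processing described in the abstract above can be sketched as a toy, in-memory event counter. This is only an illustration of the general Lambda Architecture idea (a batch view recomputed from the full log, plus a speed view covering recent events, merged at query time); the class `LambdaCounter` and its methods are hypothetical and do not reproduce the project's simplified architecture or any particular tool.

```python
from collections import Counter


class LambdaCounter:
    """Hypothetical toy: per-key event counts served from batch and speed views."""

    def __init__(self):
        self.master_log = []         # append-only master dataset
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # incremental counts not yet in the batch view

    def ingest(self, key):
        """Record an event in the master log and update the speed view at once."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from the whole log, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Serving step: merge the precomputed batch view with the recent speed view."""
        return self.batch_view[key] + self.speed_view[key]


if __name__ == "__main__":
    lc = LambdaCounter()
    for k in ["a", "b", "a"]:
        lc.ingest(k)
    print(lc.query("a"))  # answered from the speed view only
    lc.run_batch()
    lc.ingest("a")
    print(lc.query("a"))  # batch view plus one recent event
```

Recomputing the batch view from the full log is what lets such a design recover from a faulty incremental update: the speed view is discarded after each batch run, so errors in it are short-lived.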