
    Real-Time Context-Aware Microservice Architecture for Predictive Analytics and Smart Decision-Making

    The rapid evolution of the Internet of Things and the vast amount of data flowing through these systems create an inspiring scenario for Big Data analytics and for valuable real-time, context-aware predictions and smart decision-making. However, this requires a scalable system for continuous stream processing that can also make decisions and take actions based on the resulting predictions. This paper proposes a scalable architecture that provides real-time, context-aware actions based on predictive stream processing of data, as an evolution of a previously presented event-driven service-oriented architecture that already permitted the context-aware detection and notification of relevant data. For this purpose, we have defined and implemented a microservice-based architecture that provides real-time, context-aware actions based on predictive stream processing. As a result, our architecture has been enhanced in two ways: on the one hand, it has been supplied with reliable predictions through predictive analytics and complex event processing techniques, which permit the notification of relevant context-aware information ahead of time; on the other hand, it has been refactored towards a microservice architecture pattern, greatly improving its maintainability and evolution. The architecture's performance has been evaluated with an air quality case study.
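
    Below is a minimal, self-contained sketch of the kind of pipeline the abstract describes: a predictive step feeding a complex-event-processing-style rule that raises context-aware alerts ahead of time. It is not the paper's architecture; the predictor, the threshold, and the names (predict_next, AIR_QUALITY_LIMIT) are hypothetical, chosen to echo the air quality case study.

```python
# Minimal sketch (hypothetical names, not the paper's implementation):
# score incoming readings with a simple predictor and fire a context-aware
# notification ahead of time when the predicted value crosses a threshold.
from collections import deque
from dataclasses import dataclass

AIR_QUALITY_LIMIT = 150.0   # hypothetical alert threshold (e.g. PM10)
WINDOW = 5                  # number of readings the predictor looks at

@dataclass
class Reading:
    station: str
    value: float

def predict_next(history: deque) -> float:
    """Naive linear-trend predictor standing in for the real analytics model."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])

def process(stream):
    """CEP-style rule: notify when the *predicted* value exceeds the limit."""
    windows = {}
    for reading in stream:
        window = windows.setdefault(reading.station, deque(maxlen=WINDOW))
        window.append(reading.value)
        if predict_next(window) > AIR_QUALITY_LIMIT:
            yield f"ALERT {reading.station}: limit expected to be exceeded soon"

if __name__ == "__main__":
    demo = [Reading("station-1", v) for v in (120.0, 130.0, 142.0, 149.0)]
    for alert in process(demo):
        print(alert)
```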

    Real time Big Data analysis by using Apache Kudu and NoSQL Redis in web applications

    In recent years, data processing in Big Data applications has been moving from a batch processing model to a live streaming model, which allows data to be processed in real time or near real time. In many cases, data from various sources must be combined: both newly generated data and data previously gathered and archived. In each case a different storage mechanism is appropriate to achieve the best flexibility and to avoid bottlenecks when processing data. In this article we review some of the features of Apache Kudu and the NoSQL Redis system that make them suitable to be used together for fast processing of streaming data.
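
    As a rough illustration of the split the article describes, the sketch below keeps the freshest stream values in Redis (via the redis-py client) for fast reads, while historical records stay in the archive; the Apache Kudu scan is reduced to a placeholder function so no specific Kudu API is implied, and the key names are assumptions.

```python
# Sketch only: hot path in Redis (redis-py), cold path behind a placeholder
# standing in for an Apache Kudu scan. Key names are assumptions.
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_archived(sensor_id: str) -> list:
    """Placeholder for a query against an archived Apache Kudu table."""
    return []

def ingest(sensor_id: str, value: float, ts: int) -> None:
    # Keep the latest value in Redis so reads avoid hitting cold storage.
    r.set(f"latest:{sensor_id}", json.dumps({"value": value, "ts": ts}), ex=3600)

def current_view(sensor_id: str) -> dict:
    # Combine the hot value from Redis with history from the archive.
    raw = r.get(f"latest:{sensor_id}")
    return {
        "latest": json.loads(raw) if raw else None,
        "history": fetch_archived(sensor_id),
    }
```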

    Streaming Feature Grouping and Selection (Sfgs) For Big Data Classification

    Real-time data has always been an essential element for organizations when the speed of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain the benefits derived from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time: it allows us to process the data stream as it arrives. Streaming data is generated dynamically, and the full stream is unknown or even infinite. Such data becomes massive and diverse and gives rise to what is known as the big data challenge. In machine learning, streaming feature selection has always been a preferred method for preprocessing streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation’s main contribution is addressing the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. The literature review also presents a comprehensive survey of current streaming feature selection approaches and highlights the state-of-the-art algorithms in this area. The proposed algorithm groups similar features together to reduce redundancy and handles the stream of features in an online fashion. It has been implemented and evaluated on benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques, and the results showed better prediction accuracy than the state-of-the-art algorithms.
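
    The following toy sketch illustrates the general idea of streaming feature grouping and selection, not the dissertation's SFGS algorithm: features arrive one at a time, a feature highly correlated with an already selected representative joins that representative's group as redundant, and otherwise it starts a new group. The correlation threshold and the omission of a relevance test against the class label are simplifications.

```python
# Toy illustration of streaming feature grouping/selection (not the SFGS
# algorithm itself): redundant features are grouped under a representative.
import numpy as np

def stream_group_select(feature_stream, corr_threshold=0.9):
    selected = []   # (feature_index, column) representatives
    groups = {}     # representative index -> indices grouped under it
    for idx, col in feature_stream:
        placed = False
        for rep_idx, rep_col in selected:
            if abs(np.corrcoef(col, rep_col)[0, 1]) >= corr_threshold:
                groups[rep_idx].append(idx)   # redundant: group it
                placed = True
                break
        if not placed:                        # novel: new representative
            selected.append((idx, col))
            groups[idx] = [idx]
    return [i for i, _ in selected], groups

# Feature 1 nearly duplicates feature 0, so it is grouped rather than selected.
X = np.array([[1.0, 1.0, 5.0],
              [2.0, 2.1, 1.0],
              [3.0, 3.0, 4.0]])
reps, groups = stream_group_select((j, X[:, j]) for j in range(X.shape[1]))
print(reps, groups)   # [0, 2] {0: [0, 1], 2: [2]}
```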

    Data Systems Fault Coping for Real-time Big Data Analytics Required Architectural Crucibles

    This paper analyzes the properties and characteristics of unknown and unexpected faults introduced into information systems while processing Big Data in real time. The authors hypothesize that there are new faults and new requirements for fault handling, and propose an analytic model and architectural framework to assess and manage these faults, to mitigate the risks of correlating or integrating otherwise uncorrelated Big Data, and to ensure the source pedigree, quality, set integrity, freshness, and validity of the data being consumed. We argue that new architectures, methods, and tools for handling and analyzing real-time Big Data systems must address and mitigate the faults arising from real-time streaming processes while ensuring that variables such as synchronization, redundancy, and latency are accounted for. This paper concludes that, with improved designs, real-time Big Data systems can continuously deliver the value and benefits of streaming Big Data.
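
    As a hedged illustration of the freshness, validity, and latency concerns raised above (not the authors' framework), the sketch below shows a simple gate that a real-time pipeline could place in front of its analytics; the field names and the staleness budget are hypothetical.

```python
# Hedged illustration only: a freshness/validity gate in front of a
# real-time pipeline. Field names and the staleness budget are hypothetical.
import time

MAX_STALENESS_S = 5.0   # hypothetical latency budget for "real-time" data

def is_consumable(record: dict, now=None) -> bool:
    """Reject records that are stale, unsourced, or structurally incomplete."""
    now = time.time() if now is None else now
    required = {"source", "timestamp", "payload"}
    if not required.issubset(record):
        return False                               # set integrity / validity
    if now - record["timestamp"] > MAX_STALENESS_S:
        return False                               # freshness / latency
    return bool(record["source"])                  # source pedigree is present

def consume(stream):
    for rec in stream:
        if is_consumable(rec):
            yield rec   # hand off to the analytics stage
        # else: route to a fault-handling path (log, quarantine, re-request)
```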

    Development of HU Cloud-based Spark Applications for Streaming Data Analytics

    Nowadays, streaming data pours in from sources and technologies such as the Internet of Things (IoT), making conventional data analytics methods unable to keep the latency of data processing in step with the growing demand for high processing speed and algorithmic scalability [1]. Real-time streaming data analytics, which processes data while it is in motion, is required to allow organizations to analyze streaming data effectively and efficiently and to act on it more proactively. To analyze real-time “Big” streaming data, parallel and distributed computing over a cloud of computers has become the mainstream approach for scalability, resilience to failure, and fast processing of massive data sets. Several open source data analytics frameworks have been proposed and developed for streaming data analytics. Apache Spark is one such framework; developed at the University of California, Berkeley, it has gained a great deal of attention because it reduces I/O by keeping data in memory and uses a distinctive execution model. In Computer & Information Sciences (CISC) at Harrisburg University (HU), we have been building a private cloud for future research and plan to pursue industry collaborations in which high volumes of real-time streaming data are used to develop solutions to practical problems in industry. By developing an HU cloud-based environment for Apache Spark streaming data analytics applications, with batch processing on the Hadoop Distributed File System (HDFS), we can prepare for a big data era in which big data is turned into beneficial actions for industry needs. This research aims to develop Spark applications supporting an entire streaming data analytics workflow, consisting of data ingestion, data analytics, data visualization, and data storage. In particular, we will focus on a real-time stock recommender system based on state-of-the-art Machine Learning (ML) and Deep Learning (DL) frameworks such as MLlib, TensorFlow, Apache MXNet, and PyTorch. The plan is to gather real-time stock market data from Google/Yahoo finance data streams to build a model that predicts future stock market trends. The proposed Spark applications on the HU cloud-based architecture will emphasize a time-series forecasting module for a specific period, typically based on selected attributes. In addition, we will test the scale-out architecture, efficient parallel processing, and fault tolerance of Spark applications on the HU cloud-based HDFS. We believe that this research will bring the CISC program at HU significant competitive advantages globally.
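
    A minimal PySpark Structured Streaming sketch in the spirit of the proposed ingest/analyze/store workflow is shown below: it reads hypothetical stock ticks as "symbol,price" lines from a socket and maintains a one-minute average price per symbol. The data source, host, port, and line format are assumptions; the real system would plug in the finance feeds and an ML/DL model.

```python
# Sketch of a streaming analytics step (windowed average per stock symbol);
# the socket source and CSV line format are stand-ins for the real feeds.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hu-stock-stream-sketch").getOrCreate()

raw = (spark.readStream
       .format("socket")
       .option("host", "localhost")
       .option("port", 9999)
       .load())

ticks = raw.select(
    F.split(raw.value, ",").getItem(0).alias("symbol"),
    F.split(raw.value, ",").getItem(1).cast("double").alias("price"),
    F.current_timestamp().alias("ts"),
)

avg_prices = (ticks
              .withWatermark("ts", "2 minutes")
              .groupBy(F.window("ts", "1 minute"), "symbol")
              .agg(F.avg("price").alias("avg_price")))

query = (avg_prices.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```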

    Challenges in managing real-time data in health information system (HIS)

    In this paper, we discuss the challenges of handling real-time medical big data collection and storage in a health information system (HIS). Based on these challenges, we propose a model for real-time analysis of medical big data. We exemplify the approach with Spark Streaming and Apache Kafka applied to the processing of a health big data stream. Apache Kafka works very well for transporting data among different systems such as relational databases, Apache Hadoop, and non-relational databases. However, Apache Kafka cannot analyze the stream itself, whereas the Spark Streaming framework can perform operations on it. We identify the challenges in current real-time systems and propose a solution for coping with medical big data streams.
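
    The Kafka-transport/Spark-analysis split described above can be sketched with Spark Structured Streaming's Kafka source as follows; the broker address, topic name, and the JSON schema of the vitals messages are assumptions for illustration, not the paper's implementation.

```python
# Sketch: Kafka transports the stream; Spark does the analysis (flagging
# abnormal vitals). Broker, topic, and message schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("his-vitals-sketch").getOrCreate()

schema = StructType([
    StructField("patient_id", StringType()),
    StructField("heart_rate", DoubleType()),
    StructField("spo2", DoubleType()),
])

vitals = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "patient-vitals")   # hypothetical topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("v"))
          .select("v.*"))

# Simple analysis step on the stream: flag abnormal readings.
alerts = vitals.filter((F.col("heart_rate") > 120) | (F.col("spo2") < 90))

(alerts.writeStream
 .outputMode("append")
 .format("console")
 .start()
 .awaitTermination())
```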

    Big Data Processing with Apache Spark in Tertiary Institutions: Spark Streaming

    In tertiary institutions, different sets of information are produced by the various departments and other functional sections, each of which manages its data separately. This situation has resulted in a large number of disparate data sets across the departments of tertiary institutions. There is no centralized data centre from which information can be retrieved for the management committee when the need arises: data captured in an academic institution is restricted to the section that collected it, and no centralization of the data held by the various functional sections exists. This makes it difficult for the management committee to take decisions based on the relevant information needed. To address this problem, we propose Spark Streaming, a Spark component that facilitates the processing of live flows of data. Spark Streaming will be able to capture data in real time, process it, and make it available to the management committee when the need arises.
    Keywords: Spark, Streaming, Big data, Processing, Tertiary, Institution
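
    A minimal sketch of what such a centralized Spark Streaming view could look like is given below, using the classic DStream API: records arriving from the departments (here as "department,record_id" lines on a socket) are counted per department in each micro-batch. The host, port, and line format are assumptions.

```python
# Sketch: centralize department record streams with Spark Streaming (DStreams);
# the socket source and the "department,record_id" line format are assumptions.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="institution-stream-sketch")
ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)
per_department = (lines
                  .map(lambda line: (line.split(",")[0], 1))
                  .reduceByKey(lambda a, b: a + b))
per_department.pprint()                        # central, real-time view

ssc.start()
ssc.awaitTermination()
```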