
    An OPC UA-based industrial Big Data architecture

    Industry 4.0 factories are complex and data-driven. Data is produced by many sources, including sensors, PLCs, and other devices, but also by IT systems such as ERP or CRM. We ask how to collect and process this data so that it retains metadata and can be used for industrial analytics or to derive intelligent support systems. This paper describes a new, query-model-based approach that uses a big data architecture to capture data from various sources with OPC UA as a foundation. It buffers and preprocesses the information in order to harmonize it and provide a holistic state space of a factory, as well as mappings to the current state of a production site. That information can be made available to multiple processing sinks, decoupled from the data sources, which enables them to work with the information without interfering with production devices, disturbing the networks those devices operate in, or negatively influencing the production process. Metadata and connected semantic information are kept throughout the process, allowing algorithms to be fed with meaningful data: it can be accessed in its entirety to perform time series analysis, machine learning, or similar evaluations, and it can be replayed from the buffer for repeatable simulations.
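
    A minimal sketch of the capture side of such an architecture, using the asyncua Python client; the server endpoint and node id are hypothetical, and printing JSON stands in for publishing to the buffer the paper describes:

    import asyncio
    import json
    import time

    from asyncua import Client  # pip install asyncua

    OPC_URL = "opc.tcp://localhost:4840"        # hypothetical server endpoint
    NODE_ID = "ns=2;s=Machine1.Temperature"     # hypothetical sensor node

    async def poll_sensor() -> None:
        async with Client(url=OPC_URL) as client:
            node = client.get_node(NODE_ID)
            while True:
                value = await node.read_value()
                # Keep metadata (node id, timestamp) with the raw value so
                # downstream sinks receive self-describing records.
                record = {"node": NODE_ID, "value": value, "ts": time.time()}
                print(json.dumps(record))  # stand-in for publishing to a buffer
                await asyncio.sleep(1.0)

    if __name__ == "__main__":
        asyncio.run(poll_sensor())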

    Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

    Distributed stream processing engines are designed with a focus on scalability in order to process big data volumes continuously. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. At the core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect its scalability. We propose to design one benchmark per use case and relevant workload dimension. We present a general benchmarking framework that can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. This way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams and Apache Flink, as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both for evaluating the Theodolite method and for benchmarking the scalability of Kafka Streams and Flink under different deployment options.
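
    Theodolite's framework itself is not reproduced here, but the shape of such a scalability benchmark — sweep workloads and instance counts, and record the smallest deployment that keeps up — can be sketched as follows; the workload values and the toy capacity model are assumptions:

    WORKLOADS = [10_000, 50_000, 100_000]   # e.g., messages per second
    INSTANCE_COUNTS = [1, 2, 4, 8]          # parallel processing instances

    def deployment_keeps_up(workload: int, instances: int) -> bool:
        """Deploy the use case at this scale and check whether it keeps up
        with the workload (e.g., consumer lag stays bounded over the run).
        Stubbed here with a toy linear capacity model."""
        return instances * 15_000 >= workload

    def minimum_instances() -> dict[int, int]:
        """For each workload, find the smallest sufficient instance count;
        the shape of this curve is what the benchmark reports."""
        result = {}
        for workload in WORKLOADS:
            for instances in INSTANCE_COUNTS:
                if deployment_keeps_up(workload, instances):
                    result[workload] = instances
                    break
        return result

    if __name__ == "__main__":
        for workload, instances in minimum_instances().items():
            print(f"{workload} msgs/s -> {instances} instance(s)")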

    Big Data Architectures and Concepts

    Nowadays, the processing of big data has become a major concern for businesses, not only for storage and processing but also for operational requirements such as speed, maintaining performance at scale, reliability, availability, security, and cost control, ultimately enabling them to maximize their profits by using the new possibilities offered by Big Data. In this article, we explore the concepts and architectures of Big Data, in particular through the open-source Hadoop framework, and see how it meets the needs set out above through its cluster structure, its components, its Lambda and Kappa architectures, and so on. We also deploy Hadoop in a virtualized Linux environment with several nodes, under the Oracle VirtualBox virtualization software, and use the experimental method to compare the processing time of the MapReduce algorithm on two datasets with one, two, three, and four DataNodes successively, thus observing the gains in processing time as the number of nodes in the cluster increases.
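
    As an illustration of the kind of job such an experiment might time, here is a word-count MapReduce written for Hadoop Streaming, which lets plain Python scripts act as mapper and reducer; the input and output paths in the comment are illustrative:

    #!/usr/bin/env python3
    # Word-count mapper/reducer for Hadoop Streaming. Run (paths illustrative):
    #   hadoop jar hadoop-streaming.jar \
    #     -input /data/corpus -output /data/counts \
    #     -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
    #     -file wordcount.py
    import sys

    def mapper():
        # Emit one (word, 1) pair per token on stdin.
        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")

    def reducer():
        # Hadoop sorts mapper output by key, so equal words arrive adjacently.
        current, count = None, 0
        for line in sys.stdin:
            word, _, n = line.rstrip("\n").partition("\t")
            if word != current:
                if current is not None:
                    print(f"{current}\t{count}")
                current, count = word, 0
            count += int(n)
        if current is not None:
            print(f"{current}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()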

    Real-time data analytic platform

    Nowadays, the data world is growing, especially in the areas of Data Science and Data Engineering. Data analysis has become increasingly relevant for gaining deeper insight into a given company and represents a business opportunity, precisely because of the emerging presence of data derived from Artificial Intelligence, the Internet of Things (IoT), social media, and software/hardware components. In order to process, analyze, and distribute this data within a short time, real-time processing has gained popularity, and real-time data analytics platforms have begun to emerge, setting aside traditional batch data processing. Indeed, to develop a data analytics platform, real-time or not, Big Data architectures and their components have become essential. The existing Big Data architectures, Lambda and Kappa, are supported by several components, offering the opportunity to explore their functionalities to develop real-time data analytics platforms. When implementing this kind of solution, the question sometimes arises as to which of the architectures is best suited to a given type of business. This internship report presents the analysis and conclusions regarding a possible correlation between business types and the most suitable data analytics solutions to support them. Throughout this document, the possibility is also considered of developing a real-time data analytics platform generic enough to be applicable to any type of business, significantly reducing development and implementation costs. In this context, the Lambda and Kappa architectures are examined in order to understand whether they are universal enough for that possibility or whether a customization based on their components is feasible. In order to verify whether either of these architectures can be implemented in a generic real-time data analytics platform, the report also describes the development of a specific use case based on the Kappa architecture.
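
    As a rough illustration of the Kappa idea the report builds on — a single streaming path over an immutable log, with reprocessing done by replaying that log — a minimal sketch using the kafka-python client; the broker address and topic names are hypothetical:

    import json

    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    BROKER = "localhost:9092"        # hypothetical broker
    IN_TOPIC = "raw-events"          # hypothetical input log
    OUT_TOPIC = "enriched-events"    # hypothetical serving topic

    # In a Kappa architecture there is no separate batch layer: this single
    # stream job also handles reprocessing, by resetting the consumer offset
    # and replaying the immutable log from the beginning.
    consumer = KafkaConsumer(
        IN_TOPIC,
        bootstrap_servers=BROKER,
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=BROKER,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for message in consumer:
        event = message.value
        event["processed"] = True          # stand-in for real enrichment
        producer.send(OUT_TOPIC, event)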

    An Iterative Methodology for Defining Big Data Analytics Architectures

    Thanks to the advances achieved in the last decade, the lack of adequate technologies to deal with Big Data characteristics such as data volume is no longer an issue. Instead, recent studies highlight that one of the main Big Data issues is the lack of expertise to select adequate technologies and build the correct Big Data architecture for the problem at hand. In order to tackle this problem, we present our methodology for the generation of Big Data pipelines based on several requirements derived from Big Data features that are critical for the selection of the most appropriate tools and techniques. Our approach thus reduces the know-how required to select and build Big Data architectures by providing a step-by-step methodology that leads Big Data architects to create the Big Data pipelines for the case at hand. Our methodology has been tested in two use cases. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) from the Spanish Ministry of Science, Innovation and Universities.
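
    The paper's requirement catalogue is not reproduced here, but the flavour of such a step-by-step selection can be sketched as a rule table mapping requirements to candidate components; the requirement keys and tool names below are illustrative assumptions, not the paper's:

    # Illustrative requirement -> candidate-technology rules; the actual
    # methodology derives these from Big Data features (volume, velocity, ...).
    RULES = {
        ("velocity", "real-time"): ["stream processor (e.g., Flink)"],
        ("velocity", "batch"): ["batch engine (e.g., Spark, MapReduce)"],
        ("volume", "large"): ["distributed storage (e.g., HDFS)"],
        ("serving", "low-latency queries"): ["OLAP store (e.g., Druid)"],
    }

    def suggest_pipeline(requirements: dict[str, str]) -> list[str]:
        """Walk the requirements step by step, collecting candidate tools."""
        pipeline = []
        for key, value in requirements.items():
            pipeline.extend(RULES.get((key, value), []))
        return pipeline

    if __name__ == "__main__":
        print(suggest_pipeline({"velocity": "real-time", "volume": "large"}))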

    Distributed Processing and Analytics of IoT data in Edge Cloud

    Sensors of different kinds connect to the IoT network and generate a large number of data streams. We explore the possibility of performing stream processing at the network edge and an architecture for doing so. This thesis work is based on a prototype solution developed by Nokia. The system operates close to the data sources and retrieves data based on requests made by applications through the system. Processing data close to where it is generated can save bandwidth and assist in decision making. This work proposes a processing component operating at the far edge. The applicability of the prototype solution with the proposed processing component was illustrated in three use cases: analysis of Key Performance Indicator values, processing of data streams generated by air-quality sensors called Sensordrones, and car license plate recognition using deep learning.
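
    A far-edge processing component of this kind might, for example, pre-aggregate raw readings so that only summaries cross the uplink; a minimal sketch, with the sliding-window length and sample values assumed:

    from collections import deque
    from statistics import mean

    class EdgeAggregator:
        """Keeps a sliding window of recent sensor readings at the far edge
        and answers application requests with a summary instead of raw data,
        saving uplink bandwidth."""

        def __init__(self, window: int = 60):
            self.readings = deque(maxlen=window)

        def ingest(self, value: float) -> None:
            self.readings.append(value)

        def summary(self) -> dict:
            # Only this small record needs to leave the edge.
            return {
                "count": len(self.readings),
                "mean": mean(self.readings) if self.readings else None,
                "max": max(self.readings, default=None),
            }

    if __name__ == "__main__":
        agg = EdgeAggregator(window=5)
        for v in [21.0, 21.4, 22.1, 35.9, 22.0]:   # e.g., air-quality samples
            agg.ingest(v)
        print(agg.summary())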

    An Optimized Kappa Architecture for IoT Data Management in Smart Farming

    Agriculture 4.0 is a fast-growing IoT domain that produces large amounts of data from machines, robots, and sensor networks. This data must be processed very quickly, especially by systems that need to make real-time decisions. The Kappa architecture provides a way to process Agriculture 4.0 data at high speed in the cloud, and thus meets these processing requirements. This paper presents an optimized version of the Kappa architecture allowing fast and efficient data management in agriculture. The goal of this optimized version of the classical Kappa architecture is to improve memory management and processing speed. The Kappa architecture's parameters are fine-tuned in order to process data from a concrete use case. The results of this work show the impact of parameter tweaking on processing speed, and we also show that the combination of Apache Samza with Apache Druid offers the best performance.
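
    The paper's pipeline runs on Apache Samza with Apache Druid as the serving store; as a plain-Python illustration of the kind of job involved, here is a tumbling-window average per sensor over a Kafka topic of field readings (broker, topic, message format, and window length are assumptions):

    import json
    import time
    from collections import defaultdict

    from kafka import KafkaConsumer  # pip install kafka-python

    BROKER = "localhost:9092"     # hypothetical broker
    TOPIC = "farm-sensors"        # hypothetical topic of field readings
    WINDOW_S = 10                 # tumbling-window length in seconds

    # Tumbling-window average per sensor: the kind of high-rate job the paper
    # runs on Samza; plain Python here purely for illustration.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKER,
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    window_start = time.time()
    sums, counts = defaultdict(float), defaultdict(int)
    for message in consumer:
        reading = message.value                    # {"sensor": ..., "value": ...}
        sums[reading["sensor"]] += reading["value"]
        counts[reading["sensor"]] += 1
        if time.time() - window_start >= WINDOW_S:
            for sensor in sums:
                print(sensor, sums[sensor] / counts[sensor])  # -> e.g., Druid
            sums.clear()
            counts.clear()
            window_start = time.time()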

    Internet-of-Things Streaming over Realtime Transport Protocol: A reusability-oriented approach to enable IoT Streaming

    The Internet of Things (IoT) as a group of technologies is gaining momentum to become a prominent factor in novel applications. High computing capability and a vast number of IoT devices can be observed in the market today. However, transport protocols are also required to bridge these two advantages. This thesis discussed the delivery of IoT data through the lens of a few selected streaming protocols, namely the Real-time Transport Protocol (RTP) and its companion protocols, the RTP Control Protocol (RTCP) and the Session Initiation Protocol (SIP). These protocols support the transfer of multimedia content with heavy streaming requirements. The main contribution of this work was a multi-layer reusability schema for IoT streaming over RTP. IoT streaming was defined as a new concept, and its characteristics were introduced to clarify its requirements. After that, the RTP stack and its commercial implementation, VoLTE (Voice over LTE), were investigated to collect technical insights. Based on this distilled knowledge, the application areas for IoT usage and the adoption methods were described. In addition, prototypes were made as a proof of concept for streaming IoT data with RTP functionality between distant devices. These prototypes proved the possibility of applying the same two-plane architecture (signaling/data transfer) widely used in RTP implementations for multimedia services. By following an IETF standard, this implementation is a minimal example of adopting an existing standard for IoT streaming applications.
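
    A minimal sketch of the data-transfer plane: building and sending an RTP packet per RFC 3550 over UDP, with a sensor sample as payload; the receiver address, SSRC, and payload type are assumptions:

    import socket
    import struct
    import time

    DEST = ("192.0.2.10", 5004)   # hypothetical receiver (RFC 5737 test address)
    SSRC = 0x1234ABCD             # arbitrary stream identifier
    PAYLOAD_TYPE = 96             # dynamic payload type, here for sensor data

    def rtp_packet(seq: int, timestamp: int, payload: bytes) -> bytes:
        """Build a minimal 12-byte RTP header per RFC 3550: version 2,
        no padding, no extension, no CSRCs, marker bit clear."""
        header = struct.pack(
            "!BBHII",
            0x80,                 # V=2, P=0, X=0, CC=0
            PAYLOAD_TYPE & 0x7F,  # M=0, PT
            seq & 0xFFFF,
            timestamp & 0xFFFFFFFF,
            SSRC,
        )
        return header + payload

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for seq, value in enumerate([21.5, 21.7, 22.0]):   # fake sensor samples
        payload = struct.pack("!f", value)
        sock.sendto(rtp_packet(seq, int(time.time()), payload), DEST)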