An OPC UA-based industrial Big Data architecture
Industry 4.0 factories are complex and data-driven. Data originates from many sources, including sensors, PLCs, and other devices, but also from IT systems such as ERP or CRM. We ask how to collect and process this data so that it retains metadata and can be used for industrial analytics or to derive intelligent support systems. This paper describes a new, query-model-based approach that uses a big data architecture to capture data from various
sources using OPC UA as a foundation. It buffers and preprocesses the
information for the purpose of harmonizing and providing a holistic state space
of a factory, as well as mappings to the current state of a production site.
That information can be made available to multiple processing sinks, decoupled from the data sources, so that they can work with the information without interfering with production devices, disturbing the networks those devices operate in, or negatively influencing the production process. Metadata and connected semantic information are preserved throughout the process, so that algorithms can be fed with meaningful data and the data can be accessed in its entirety to perform time series analysis, machine learning, or similar evaluations, as well as to replay the data from the buffer for repeatable simulations.
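The decoupling of data sources from processing sinks described above can be sketched, in simplified form, as a buffer that retains each reading together with its metadata. The names and record structure below are illustrative, not the paper's implementation:

```python
import queue
import time

# Illustrative buffer decoupling OPC UA-like sources from processing sinks.
# Each record keeps semantic metadata (node id, unit, timestamp) with the value.
buffer = queue.Queue()

def publish(node_id, value, unit):
    """Source side: enqueue a reading together with its metadata."""
    buffer.put({"node": node_id, "value": value, "unit": unit, "ts": time.time()})

def drain():
    """Sink side: consume buffered readings without touching the devices."""
    records = []
    while not buffer.empty():
        records.append(buffer.get())
    return records

publish("ns=2;s=Temperature", 21.5, "degC")
publish("ns=2;s=Pressure", 1.013, "bar")
snapshot = drain()                      # sinks read the buffer, not the devices
units = [r["unit"] for r in snapshot]   # metadata survives alongside the values
```

Because sinks read only from the buffer, analytics and replay can run repeatedly without generating additional load on the production network.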
Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures
Distributed stream processing engines are designed with a focus on
scalability to process big data volumes in a continuous manner. We present the
Theodolite method for benchmarking the scalability of distributed stream
processing engines. At the core of this method is the definition of use cases that
microservices implementing stream processing have to fulfill. For each use
case, our method identifies relevant workload dimensions that might affect the
scalability of a use case. We propose to design one benchmark per use case and
relevant workload dimension. We present a general benchmarking framework, which
can be applied to execute the individual benchmarks for a given use case and
workload dimension. Our framework executes an implementation of the use case's
dataflow architecture for different workloads of the given dimension and
various numbers of processing instances. This way, it identifies how resource
demand evolves with increasing workloads. Within the scope of this paper, we
present 4 identified use cases, derived from processing Industrial Internet of
Things data, and 7 corresponding workload dimensions. We provide
implementations of 4 benchmarks with Kafka Streams and Apache Flink as well as
an implementation of our benchmarking framework to execute scalability
benchmarks in cloud environments. We use both for evaluating the Theodolite
method and for benchmarking Kafka Streams' and Flink's scalability for
different deployment options.
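The benchmarking loop described above, which varies one workload dimension against the number of processing instances, can be sketched as follows. The `run_experiment` stand-in and its per-instance capacity are hypothetical placeholders for actually deploying the use case and checking a service-level objective:

```python
# Illustrative scalability-benchmark loop in the spirit of the Theodolite method:
# for each workload along one dimension, find the fewest instances meeting an SLO.

def run_experiment(workload, instances):
    # Hypothetical stand-in for a real deployment and lag measurement;
    # assumes one instance sustains 50k messages/s.
    return workload <= instances * 50_000  # True if the SLO is met

def min_instances(workloads, max_instances):
    """Map each workload to the minimum instance count that satisfies the SLO."""
    result = {}
    for w in workloads:
        for n in range(1, max_instances + 1):
            if run_experiment(w, n):
                result[w] = n
                break
    return result

demand = min_instances([50_000, 100_000, 250_000], max_instances=10)
```

The shape of `demand` over increasing workloads is exactly the "how resource demand evolves" curve the framework reports.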
Big Data Architectures and Concepts
Nowadays, the processing of big data has become a major preoccupation for businesses, not only for storage and processing but also for operational requirements such as speed, maintaining performance at scale, reliability, availability, security, and cost control, ultimately enabling them to maximize their profits by using the new possibilities offered by Big Data. In this article, we explore the concepts and architectures of Big Data, in particular through the open-source Hadoop framework, and examine how it meets the needs set out above through its cluster structure, its components, its Lambda and Kappa architectures, and so on. We also deploy Hadoop in a virtualized Linux environment with several nodes under the Oracle VirtualBox virtualization software, and use the experimental method to compare the processing time of the MapReduce algorithm on two datasets with one, two, three, and four DataNodes, thereby observing the gains in processing time as the number of nodes in the cluster increases.
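The MapReduce computation being timed in the experiment above follows the classic map/shuffle/reduce pattern. A minimal single-process sketch of word count, the canonical MapReduce example, looks like this; Hadoop's contribution is distributing these same phases across the DataNodes:

```python
from collections import defaultdict
from itertools import chain

# Minimal single-process sketch of the MapReduce word-count pattern.
# Hadoop runs map tasks in parallel on DataNodes, shuffles by key, then reduces.

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big cluster", "data node data"]
mapped = chain.from_iterable(map_phase(l) for l in lines)  # shuffle omitted here
counts = reduce_phase(mapped)
```

Adding DataNodes speeds this up because independent map tasks run concurrently on different splits of the input, which is the effect the experiment measures.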
Real-time data analytic platform
Currently, the world of data is growing, especially in the areas of Data Science and Data Engineering. Data analysis has become increasingly relevant for gaining deeper knowledge about a given company and represents a business opportunity, precisely because of the emerging presence of data derived from Artificial Intelligence, the Internet of Things (IoT), social media, and software/hardware components.
In order to process, analyze, and distribute this data in a short time, real-time processing has gained popularity and real-time data analytics platforms have begun to emerge, setting aside traditional batch data processing. In fact, to develop a data analytics platform, whether real-time or not, Big Data architectures and their components have become essential.
The existing Big Data architectures, Lambda and Kappa, are supported by several components, offering the opportunity to explore their functionalities to develop real-time data analytics platforms. When implementing this kind of solution, the question sometimes arises as to which of the architectures is the most suitable for a given type of business.
This internship report presents the analysis and conclusions regarding a possible correlation between business types and the data analytics solutions best suited to support them. Throughout this document, the possibility of developing a real-time data analytics platform generic enough to be applicable to any type of business, significantly reducing development and implementation costs, is also considered. In this context, the Lambda and Kappa architectures are examined in order to understand whether they are universal enough for this possibility or whether a customization based on their components is viable. To verify whether either of these Big Data architectures can be implemented in a generic real-time data analytics platform, the report also describes the development of a specific use case based on the Kappa architecture.
An Iterative Methodology for Defining Big Data Analytics Architectures
Thanks to the advances achieved in the last decade, the lack of adequate technologies to deal with Big Data characteristics such as Data Volume is no longer an issue. Instead, recent studies highlight that one of the main Big Data issues is the lack of expertise to select adequate technologies and build the correct Big Data architecture for the problem at hand. To tackle this problem, we present our methodology for the generation of Big Data pipelines based on several requirements derived from Big Data features that are critical for the selection of the most appropriate tools and techniques. Thus, our approach reduces the required know-how to select and build Big Data architectures by providing a step-by-step methodology that guides Big Data architects in creating their Big Data pipelines for the case at hand. Our methodology has been tested in two use cases. This work has been funded by the ECLIPSE project (RTI2018-094283-B-C32) of the Spanish Ministry of Science, Innovation and Universities.
Distributed Processing and Analytics of IoT data in Edge Cloud
Sensors of different kinds connect to the IoT network and generate a large number of data streams. We explore the possibility of performing stream processing at the network edge and an architecture for doing so. This thesis work is based on a prototype solution developed by Nokia. The system operates close to the data sources and retrieves data based on requests made by applications. Processing data close to where it is generated can save bandwidth and assist in decision making. This work proposes a processing component operating at the far edge. The applicability of the prototype solution extended with the proposed processing component was illustrated in three use cases. Those use cases involve analysis performed on values of Key Performance Indicators, on data streams generated by air quality sensors called Sensordrones, and on recognizing car license plates through an application of deep learning.
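The bandwidth-saving idea above, processing close to the source so that only compact results leave the edge, can be sketched as a small far-edge aggregation component. The function name, record shape, and threshold are illustrative assumptions, not Nokia's prototype:

```python
# Illustrative far-edge processing component: aggregate raw sensor readings
# locally so only a compact summary (plus alert-worthy values) crosses the
# network instead of the full stream.

def edge_summarize(readings, threshold):
    """Summarize a batch of numeric readings and flag values above threshold."""
    alerts = [r for r in readings if r > threshold]
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "alerts": alerts,
    }

raw = [12.0, 14.0, 55.0, 13.0]                 # e.g., air quality readings
summary = edge_summarize(raw, threshold=50.0)  # only this summary is sent on
```

Shipping the three-field summary instead of every raw reading is what saves bandwidth, while the `alerts` list still supports timely decision making.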
An Optimized Kappa Architecture for IoT Data Management in Smart Farming
Agriculture 4.0 is a fast-growing domain of IoT which produces large amounts of data from machines, robots, and sensor networks. This data must be processed very quickly, especially for systems that need to make real-time decisions. The Kappa architecture provides a way to process Agriculture 4.0 data at high speed in the cloud, and thus meets these processing requirements. This paper presents an optimized version of the Kappa architecture allowing fast and efficient data management in agriculture. The goal of this optimized version of the classical Kappa architecture is to improve memory management and processing speed. The Kappa architecture's parameters are fine-tuned in order to process data from a concrete use case. The results of this work show the impact of parameter tweaking on processing speed. We have also shown that the combination of Apache Samza with Apache Druid offers the best performance.
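The Kappa architecture's defining trait is a single streaming path: live data and historical replays flow through the same computation. A minimal sketch of such a path, here a tumbling-window average whose window size stands in for the kind of parameter the paper tunes (all values illustrative):

```python
# Sketch of a Kappa-style single processing path: every event, whether live or
# replayed from the log, goes through the same windowed computation.

def windowed_averages(stream, window_size):
    """Tumbling-window mean over an ordered stream of (timestamp, value)."""
    windows = {}
    for ts, value in stream:
        key = ts // window_size              # assign each event to its window
        windows.setdefault(key, []).append(value)
    return {k: sum(v) / len(v) for k, v in sorted(windows.items())}

# e.g., soil-moisture events as (seconds, value); window_size is a tunable
# parameter: smaller windows mean faster decisions but more output to store.
events = [(0, 10.0), (3, 14.0), (5, 20.0), (9, 22.0)]
means = windowed_averages(events, window_size=5)
```

In a production Kappa deployment the windowing would run in a stream processor such as Apache Samza, with results served from a store such as Apache Druid, as evaluated in the paper.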
Internet-of-Things Streaming over Real-time Transport Protocol: A reusability-oriented approach to enable IoT streaming
The Internet of Things (IoT) as a group of technologies is gaining momentum and becoming a prominent enabler of novel applications. High computing capability and a vast number of IoT devices can be observed in the market today. However, transport protocols are required to bridge these two advantages.
This thesis discussed the delivery of IoT data through the lens of a few selected streaming protocols: the Real-time Transport Protocol (RTP) and its companion protocols, the RTP Control Protocol (RTCP) and the Session Initiation Protocol (SIP). These protocols support the transfer of multimedia content with heavily stream-oriented requirements.
The main contribution of this work was a multi-layer reusability schema for IoT streaming over RTP. IoT streaming was defined as a new concept, and its characteristics were introduced to clarify its requirements. After that, the RTP stacks and their commercial implementation, VoLTE (Voice over LTE), were investigated to collect technical insights. Based on this distilled knowledge, the application areas for IoT usage and the adoption methods were described.
In addition, prototypes were built as a proof of concept for streaming IoT data with RTP functionality between distant devices. These prototypes demonstrated the possibility of applying the same two-plane architecture (signaling/data transfer) widely used in RTP implementations for multimedia services. Following an IETF standard, this implementation is a minimal example of adopting an existing standard for IoT streaming applications.
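On the data plane, carrying IoT payloads over RTP means prefixing each datagram with RTP's 12-byte fixed header (RFC 3550). A minimal sketch of packing that header, with a dynamic payload type and an illustrative JSON sensor payload (the payload format and field values are assumptions, not the thesis's prototype):

```python
import struct

# Minimal RTP fixed header (RFC 3550) packer: V=2, no padding, no extension,
# no CSRCs. Payload type 96 is a common dynamic-range value; the JSON sensor
# payload below is purely illustrative.

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    byte0 = 2 << 6                        # version=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type  # marker bit + 7-bit payload type
    # network byte order: two bytes, 16-bit seq, 32-bit timestamp, 32-bit SSRC
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

packet = rtp_header(seq=1, timestamp=1000, ssrc=0x1234ABCD) + b'{"temp":21.5}'
header = packet[:12]                      # fixed header is always 12 bytes here
```

Sequence number and timestamp give IoT consumers the same loss-detection and ordering/timing information RTP provides to multimedia receivers; the signaling plane (SIP/RTCP) would negotiate the payload type out of band.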