MOMIS Dashboard: a powerful data analytics tool for Industry 4.0
In this work we present the MOMIS Dashboard, an interactive data analytics tool to explore and visualize the content of data sources through several kinds of dynamic views (e.g., maps and bar, line, and pie charts). The software tool is very versatile and supports connections to the main relational DBMSs and Big Data sources. Moreover, it can be connected to MOMIS, a powerful open-source data integration system able to integrate heterogeneous data sources such as enterprise information systems as well as sensor data. The MOMIS Dashboard provides secure permission management to limit data access based on user roles, and a Designer to create and share personalized insights on company KPIs, facilitating enterprise collaboration. We illustrate the efficacy of the MOMIS Dashboard in a real enterprise scenario: a production monitoring platform that analyzes real-time and historical data collected by sensors located on production machines, optimizing production and energy consumption and enabling preventive maintenance.
Blockchain based Access Control for Enterprise Blockchain Applications
Access control is one of the fundamental security mechanisms of IT systems. Most existing access control schemes rely on a centralized party to manage and enforce access control policies. As blockchain technologies, especially permissioned networks, find more applicability beyond cryptocurrencies in enterprise solutions, it is expected that the security requirements will increase. Therefore, it is necessary to develop an access control system that works in a decentralized environment without compromising the unique features of a blockchain. A straightforward method to support access control is to deploy a firewall in front of the enterprise blockchain application. However, this approach does not take advantage of the desirable features of blockchain. In order to address these concerns, we propose a novel blockchain-based access control scheme, which keeps the decentralization feature for access-control-related operations. The newly proposed system also protects users' privacy by leveraging ring signatures. We implement a prototype of the scheme using Hyperledger Fabric and assess its performance to show that it is practical for real-world applications.
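The core idea of recording access-control policies on an append-only ledger and checking them at request time can be illustrated with a toy sketch. This is not the paper's scheme: the class name, the grant/revoke entries, and the replay logic are invented for illustration, and the Hyperledger Fabric chaincode enforcement and ring-signature privacy layer described above are omitted entirely.

```python
# Toy append-only "ledger" of access-control entries, replayed at
# request time. Decentralized enforcement and requester privacy
# (ring signatures) from the actual scheme are not modeled here.

class PolicyLedger:
    """Append-only history of (grant|revoke, resource, role, action) entries."""

    def __init__(self):
        self._entries = []  # history is only ever appended to

    def grant(self, resource, role, action):
        self._entries.append(("grant", resource, role, action))

    def revoke(self, resource, role, action):
        # Revocation is a new entry; history is never deleted or rewritten.
        self._entries.append(("revoke", resource, role, action))

    def is_allowed(self, resource, role, action):
        # Replay the ledger: the most recent matching entry wins.
        allowed = False
        for kind, res, rl, act in self._entries:
            if (res, rl, act) == (resource, role, action):
                allowed = (kind == "grant")
        return allowed


ledger = PolicyLedger()
ledger.grant("orders", "auditor", "read")
ledger.revoke("orders", "auditor", "read")
print(ledger.is_allowed("orders", "auditor", "read"))  # False
```

Because the full history is retained, any node holding a copy of the ledger can independently recompute the current policy, which is the property a centralized firewall in front of the application cannot offer.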
On Efficiently Partitioning a Topic in Apache Kafka
Apache Kafka addresses the general problem of delivering extremely high-volume
event data to diverse consumers via a publish-subscribe messaging system. It
uses partitions to scale a topic across many brokers for producers to write
data in parallel, and also to facilitate parallel reads by consumers. Even
though Apache Kafka provides some out of the box optimizations, it does not
strictly define how each topic shall be efficiently distributed into
partitions. The well-formulated fine-tuning that is needed in order to improve
an Apache Kafka cluster performance is still an open research problem. In this
paper, we first model the Apache Kafka topic partitioning process for a given
topic. Then, given the set of brokers, constraints and application requirements
on throughput, OS load, replication latency and unavailability, we formulate
the optimization problem of finding how many partitions are needed and show
that it is computationally intractable, being an integer program. Furthermore,
we propose two simple yet efficient heuristics to solve the problem: the first
tries to minimize and the second to maximize the number of brokers used in the
cluster. Finally, we evaluate their performance via large-scale simulations,
considering as benchmarks some Apache Kafka cluster configuration
recommendations provided by Microsoft and Confluent. We demonstrate that,
unlike the recommendations, the proposed heuristics respect the hard
constraints on replication latency and perform better w.r.t. unavailability
time and OS load, using the system resources in a more prudent way.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This work was funded by the European Union's Horizon 2020 research and innovation programme MARVEL under grant agreement No 95733.
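The flavor of the first ("minimize brokers") heuristic can be sketched in a few lines. The cost model and every number below are invented for illustration and are far simpler than the paper's formulation, which also constrains OS load, replication latency, and unavailability:

```python
import math

def min_brokers_partitioning(target_throughput_mbs, per_partition_mbs,
                             max_partitions_per_broker, num_brokers):
    """Toy heuristic: pick enough partitions to meet the throughput target,
    then use the fewest brokers that can host them.
    Returns (partitions, brokers_used), or None if infeasible."""
    partitions = math.ceil(target_throughput_mbs / per_partition_mbs)
    for brokers_used in range(1, num_brokers + 1):
        if partitions <= brokers_used * max_partitions_per_broker:
            return partitions, brokers_used
    return None  # the available brokers cannot host that many partitions


# Hypothetical cluster: 100 MB/s target, 10 MB/s per partition,
# at most 4 partitions per broker, 5 brokers available.
print(min_brokers_partitioning(100, 10, 4, 5))  # (10, 3)
```

The mirror-image heuristic would instead spread the same partition count over as many brokers as possible, trading resource frugality for lower per-broker load.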
Building the Architecture of an Intelligent Control System for an Urban Rail Transit System
The increase in the volume of passenger transportation in megalopolises and large urban agglomerations is efficiently supported by the integration of urban public transit systems and city railways. Traffic management under these conditions requires creating intelligent, centralised, multi-level traffic control systems that deliver the required levels of quality, comfort, and safety in passenger transportation. Moreover, modern control systems contribute to traction power savings and are a foundation and integral part of the digitalisation of urban transit and of cities as a whole. Systems that solve traffic planning and control tasks are built using algorithms based on artificial-intelligence methods, the principles of hierarchically structured centralised systems, and the opportunities provided by Big Data technology. Under these conditions it is necessary to consider growing requirements not only for software but also for the theoretical design and practical implementation of the network organisation. This article discusses designing the architecture and shaping the requirements for the developed applications and their integration with databases to create a centralised intelligent control system for the urban rail transit system (CICS URTS). The article proposes an original network architecture, routing of information flows, and software for the CICS URTS. The routing design is based on a fully connected network. This significantly increases the network bandwidth and meets the information-protection requirements, since the information flows are formed from protocols of the same type, which prevents the emergence of covert transmission channels. Implementing the core with full connectivity makes it possible to pre-form, according to the tags of the information flows, the routes for exchanging information between the servers and applications deployed in the CICS URTS.
The use of encrypted tags on information flows makes it much more difficult to carry out attacks and to collect information about the structure of the network. Platforms for developing intelligent control systems (ICS), a class to which the CICS URTS belongs, together with high computing power, data-storage capacity, and new frameworks, are becoming more available to researchers and developers and enable rapid development of ICS. The article discusses the interaction of applications with databases through a combination of several approaches used in the Big Data field, and substantiates combining the Internet of Things (IoT) methodology with a microservice architecture. This combination makes it possible to single out the business processes in the system and to organise streaming processing of the data that require prompt analysis by a human, as shown by relevant examples. Thus, the objective of the article is to formalise the principles of organising data exchange between the CICS URTS and the automated control systems (ACS) of railway companies (in our case, using the example of JSC Russian Railways), providers of URTS services, and city government bodies; to implement the developed approaches in the architecture of the CICS URTS; and to formalise the principles of organising the microservice architecture of the CICS URTS software. The main research methods are graph theory and the methods of Big Data and IoT.
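The pre-formed, tag-keyed routing over a fully connected core can be reduced to a lookup table: since every server can reach every other server directly, a route is simply a (source, destination) pair resolved by flow tag. The tags and server names below are invented for illustration, and the encryption of the tags described in the abstract is not modeled:

```python
# Toy sketch of pre-formed routes keyed by information-flow tag.
# In a fully connected core, a route degenerates to one direct hop.
ROUTES_BY_TAG = {
    "telemetry": ("edge-gateway", "stream-processor"),
    "schedule":  ("planning-app", "traffic-db"),
    "alerts":    ("stream-processor", "operator-console"),
}

def route(tag):
    """Resolve a pre-formed route. Unknown tags are rejected, which also
    denies any flow that does not carry a recognised tag."""
    if tag not in ROUTES_BY_TAG:
        raise PermissionError(f"no pre-formed route for tag {tag!r}")
    return ROUTES_BY_TAG[tag]


print(route("telemetry"))  # ('edge-gateway', 'stream-processor')
```

Rejecting untagged traffic is what makes the table double as a security control: a flow that cannot present a valid tag never obtains a path through the core.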
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Distributed Stream Processing Systems (DSPSs) are among the most rapidly
emerging topics in data management, with applications ranging from real-time
event monitoring to processing complex dataflow programs and big data
analytics. The major market players in this domain are clearly represented by
Apache Spark and Flink, which provide a variety of frontend APIs for SQL,
statistical inference, machine learning, stream processing, and many others.
Yet rather few details are reported on the integration of these engines into
the underlying High-Performance Computing (HPC) infrastructure and the
communication protocols they use. Spark and Flink, for example, are implemented
in Java and still rely on a dedicated master node for managing their control
flow among the worker nodes in a compute cluster.
In this paper, we describe the architecture of our AIR engine, which is
designed from scratch in C++ using the Message Passing Interface (MPI) and
pthreads for multithreading, and which is deployed directly on top of a common
HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding
protocol (referred to as "Asynchronous Iterative Routing"), which facilitates a
direct and asynchronous communication among all client nodes and thereby
completely avoids the overhead induced by the control flow with a master node
that may otherwise form a performance bottleneck. Our experiments over a
variety of benchmark settings confirm that AIR outperforms Spark and Flink in
terms of latency and throughput by a factor of up to 15; moreover, we
demonstrate that AIR scales out much better than existing DSPSs to clusters
consisting of up to 8 nodes and 224 cores.
Comment: 16 pages, 6 figures, 15 plots.
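The master-less routing idea, in which every node shards records directly to the peer that owns a key rather than consulting a coordinator, can be illustrated with a toy in-process model. This only mirrors the concept described above: AIR itself is written in C++ over MPI and pthreads, whereas here each "node" is a Python thread with an inbox queue, and all names and data are invented:

```python
import queue
import threading

NUM_WORKERS = 4
inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
counts = [{} for _ in range(NUM_WORKERS)]  # per-worker partial word counts

def worker(wid):
    """Each worker drains only its own inbox; no coordinator is involved."""
    while True:
        key = inboxes[wid].get()
        if key is None:  # shutdown marker
            return
        counts[wid][key] = counts[wid].get(key, 0) + 1

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()

# The producer routes each record directly to the owner of its key
# (hash partitioning), never asking a master where the key lives.
for word in ["spark", "flink", "air", "air", "mpi", "air"]:
    inboxes[hash(word) % NUM_WORKERS].put(word)

for q in inboxes:      # signal shutdown to every worker
    q.put(None)
for t in threads:
    t.join()

total = {}
for c in counts:       # merge the partial counts
    for k, v in c.items():
        total[k] = total.get(k, 0) + v
print(total["air"])  # 3
```

Because the partitioning function is known to every sender, control-flow round trips to a master node disappear from the data path, which is the overhead the abstract identifies as a potential bottleneck in Spark and Flink.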