On Efficiently Partitioning a Topic in Apache Kafka
Apache Kafka addresses the general problem of delivering extremely high-volume
event data to diverse consumers via a publish-subscribe messaging system. It
uses partitions to scale a topic across many brokers for producers to write
data in parallel, and also to facilitate parallel reading of consumers. Even
though Apache Kafka provides some out of the box optimizations, it does not
strictly define how each topic shall be efficiently distributed into
partitions. The well-formulated fine-tuning that is needed in order to improve
an Apache Kafka cluster's performance is still an open research problem. In this
paper, we first model the Apache Kafka topic partitioning process for a given
topic. Then, given the set of brokers, constraints and application requirements
on throughput, OS load, replication latency and unavailability, we formulate
the optimization problem of finding how many partitions are needed and show
that it is computationally intractable, being an integer program. Furthermore,
we propose two simple, yet efficient heuristics to solve the problem: the first
tries to minimize and the second to maximize the number of brokers used in the
cluster. Finally, we evaluate their performance via large-scale simulations,
considering as benchmarks some Apache Kafka cluster configuration
recommendations provided by Microsoft and Confluent. We demonstrate that,
unlike the recommendations, the proposed heuristics respect the hard
constraints on replication latency and perform better w.r.t. unavailability
time and OS load, using the system resources in a more prudent way.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. This work was funded by the European Union's Horizon
2020 research and innovation programme MARVEL under grant agreement No 95733
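As a point of reference for the partition-count question above, here is a minimal sketch of the widely quoted throughput rule of thumb for sizing a topic: take the larger of target/producer and target/consumer per-partition throughput, rounded up. This is not the optimization model or either heuristic from the paper, and the abstract does not say which specific vendor recommendations were used as benchmarks. The function name and the sample throughput figures are hypothetical.

```python
import math


def partitions_for_throughput(target_mb_s, producer_mb_s, consumer_mb_s):
    """Rule-of-thumb partition count: max(t/p, t/c), rounded up.

    target_mb_s   -- desired topic throughput (t)
    producer_mb_s -- measured throughput of one partition on the write side (p)
    consumer_mb_s -- measured throughput of one partition on the read side (c)
    All three values are assumptions the operator has to measure.
    """
    if min(target_mb_s, producer_mb_s, consumer_mb_s) <= 0:
        raise ValueError("throughputs must be positive")
    by_producer = math.ceil(target_mb_s / producer_mb_s)
    by_consumer = math.ceil(target_mb_s / consumer_mb_s)
    return max(by_producer, by_consumer)


if __name__ == "__main__":
    # Hypothetical figures: 100 MB/s target, 10 MB/s per partition when
    # producing, 20 MB/s per partition when consuming.
    print(partitions_for_throughput(100, 10, 20))
```

The rule looks only at throughput. It ignores the replication latency, OS load and unavailability constraints that the abstract lists as inputs to the optimization problem.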
Distributed Path Reconfiguration and Data Forwarding in Industrial IoT Networks
In today's typical industrial environments, the computation of the data
distribution schedules is highly centralised. Typically, a central entity
configures the data forwarding paths so as to guarantee low delivery delays
between data producers and consumers. However, these requirements might become
impossible to meet later on, due to link or node failures, or excessive
degradation of their performance. In this paper, we focus on maintaining the
network functionality required by the applications after such events. We avoid
continuously recomputing the configuration centrally, by designing an energy
efficient local and distributed path reconfiguration method. Specifically,
given the operational parameters required by the applications, we provide
several algorithmic functions which locally reconfigure the data distribution
paths, when a communication link or a network node fails. We compare our method
through simulations to other state of the art methods and we demonstrate
performance gains in terms of energy consumption and data delivery success rate
as well as some emerging key insights which can lead to further performance
gains
ML-based Approaches for Wireless NLOS Localization: Input Representations and Uncertainty Estimation
The challenging problem of non-line-of-sight (NLOS) localization is critical
for many wireless networking applications. The lack of available datasets has
made NLOS localization difficult to tackle with ML-driven methods, but recent
developments in synthetic dataset generation have provided new opportunities
for research. This paper explores three different input representations: (i)
single wireless radio path features, (ii) wireless radio link features
(multi-path), and (iii) image-based representations. Inspired by the two latter
new representations, we design two convolutional neural networks (CNNs) and we
demonstrate that, although not significantly improving the NLOS localization
performance, they are able to support richer prediction outputs, thus allowing
deeper analysis of the predictions. In particular, the richer outputs enable
reliable identification of non-trustworthy predictions and support the
prediction of the top-K candidate locations for a given instance. We also
measure how the availability of various features (such as angles of signal
departure and arrival) affects the model's performance, providing insights
about the types of data that should be collected for enhanced NLOS
localization. Our insights motivate future work on building more efficient
neural architectures and input representations for improved NLOS localization
performance, along with additional useful application features.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. Work partly supported by the RA Science Committee grant
No. 22rl-052 (DISTAL) and the EU under Italian National Recovery and
Resilience Plan of NextGenerationEU on "Telecommunications of the Future"
(PE00000001 - program "RESTART")
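The two "richer outputs" mentioned in the abstract above, top-K candidate locations and the flagging of non-trustworthy predictions, can be illustrated with a small sketch. It assumes the model emits a probability distribution over a discrete set of candidate locations, and it uses an entropy threshold as the trust criterion. Both are assumptions: the abstract does not describe the paper's output format or its uncertainty estimator, and all names and the threshold value are hypothetical.

```python
import math


def top_k(probs, k):
    """Return the k most probable candidates as (index, probability) pairs."""
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def normalised_entropy(probs):
    """Shannon entropy scaled to [0, 1]; 1 means a uniform, least certain output."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def is_trustworthy(probs, max_entropy=0.5):
    """Flag a prediction as trustworthy when its entropy is below a threshold.

    The 0.5 threshold is an arbitrary illustrative value, not a tuned one.
    """
    return normalised_entropy(probs) <= max_entropy


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]
    ambiguous = [0.25, 0.25, 0.25, 0.25]
    print(top_k(confident, 2), is_trustworthy(confident))
    print(top_k(ambiguous, 2), is_trustworthy(ambiguous))
```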
Agnostic Learning for Packing Machine Stoppage Prediction in Smart Factories
The cyber-physical convergence is opening up new business opportunities for
industrial operators. The need for deep integration of the cyber and the
physical worlds establishes a rich business agenda towards consolidating new
system and network engineering approaches. This revolution would not be
possible without the rich and heterogeneous sources of data, as well as the
ability of their intelligent exploitation, mainly due to the fact that data
will serve as a fundamental resource to promote Industry 4.0. One of the most
fruitful research and practice areas emerging from this data-rich,
cyber-physical, smart factory environment is the data-driven process monitoring
field, which applies machine learning methodologies to enable predictive
maintenance applications. In this paper, we examine popular time series
forecasting techniques as well as supervised machine learning algorithms in the
applied context of Industry 4.0, by transforming and preprocessing the
historical industrial dataset of a packing machine's operational state
recordings (real data coming from the production line of a manufacturing plant
from the food and beverage domain). In our methodology, we use only a single
signal concerning the machine's operational status to make our predictions,
without considering other operational variables or fault and warning signals,
hence its characterization as "agnostic". In this respect, the results
demonstrate that the adopted methods achieve a quite promising performance on
three targeted use cases
IEEE Access Special Section Editorial: Wirelessly Powered Networks, and Technologies
Wireless Power Transfer (WPT) is, by definition, a process that occurs in any system where electrical energy is transmitted from a power source to a load without the connection of electrical conductors. WPT is the driving technology that will enable the next stage in the current consumer electronics revolution, including battery-less sensors, passive RF identification (RFID), passive wireless sensors, the Internet of Things and 5G, and machine-to-machine solutions. WPT-enabled devices can be powered by harvesting energy from the surroundings, including electromagnetic (EM) energy, leading to a new communication networks paradigm, the Wirelessly Powered Networks
A Survey on Networked Data Streaming With Apache Kafka
Apache Kafka has become a popular solution for managing networked data streaming in a variety of applications, from industrial to general purpose. This paper systematically surveys the research literature in this field by carefully classifying it into key macro areas, namely algorithms, networks, data, cyber-physical systems, and security. Through this meticulous classification, the paper aims to identify and analyze the optimization aspects relevant to each area, drawing upon practical applications as the basis for analysis. In this respect, the paper synthesizes and consolidates existing knowledge, saving researchers valuable time and effort in searching for relevant information across multiple sources. The tangible benefits of this survey paper include providing a consolidated knowledge base about research-intensive Apache Kafka topics, highlighting practical insights and novel approaches, pointing up cross-domain applications, identifying related research challenges, and serving as a trusted reference for the Apache Kafka community
Performance Analysis of Latency-Aware Data Management in Industrial IoT Networks
Maintaining critical data access latency requirements is an important challenge of Industry 4.0. The traditional, centralized industrial networks, which transfer the data to a central network controller prior to delivery, might be incapable of meeting such strict requirements. In this paper, we exploit distributed data management to overcome this issue. Given a set of data, the set of consumer nodes and the maximum access latency that consumers can tolerate, we consider a method for identifying and selecting a limited set of proxies in the network where data needed by the consumer nodes can be cached. The method aims to balance two requirements: data access latency within the given constraints and a low number of selected proxies. We implement the method and evaluate its performance using a network of WSN430 IEEE 802.15.4-enabled open nodes. Additionally, we validate a simulation model and use it for performance evaluation in larger scales and more general topologies. We demonstrate that the proposed method (i) guarantees average access latency below the given threshold and (ii) outperforms traditional centralized and even distributed approaches
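The trade-off described above, few proxies while every consumer stays within its latency bound, can be sketched as a greedy covering problem. This is an illustration only and not the method evaluated in the paper. It assumes latency is approximated by hop count on an undirected graph, and it uses a standard greedy set-cover step. The function names and the example topology are hypothetical.

```python
from collections import deque


def hops_from(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def select_proxies(graph, consumers, max_hops):
    """Greedy cover: repeatedly pick the node that serves the most
    still-uncovered consumers within max_hops (ties broken by node name)."""
    reach = {}
    for node in graph:
        dist = hops_from(graph, node)
        reach[node] = {c for c in consumers
                       if dist.get(c, max_hops + 1) <= max_hops}
    uncovered = set(consumers)
    proxies = []
    while uncovered:
        best = max(sorted(graph), key=lambda n: len(reach[n] & uncovered))
        if not reach[best] & uncovered:
            raise ValueError("some consumers cannot be covered")
        proxies.append(best)
        uncovered -= reach[best]
    return proxies


if __name__ == "__main__":
    # Hypothetical five-node line topology: a - b - c - d - e
    line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
            "d": ["c", "e"], "e": ["d"]}
    print(select_proxies(line, {"a", "c", "e"}, max_hops=1))
    print(select_proxies(line, {"a", "c", "e"}, max_hops=2))
```

Relaxing the hop bound from 1 to 2 lets a single central node serve all three consumers, which shows how a looser latency threshold reduces the number of proxies needed.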
Wireless energy transfer in sensor networks with adaptive, limited knowledge protocols
We investigate the problem of efficient wireless energy transfer in Wireless Rechargeable Sensor Networks (WRSNs). In such networks a special mobile entity (called the Mobile Charger) traverses the network and wirelessly replenishes the energy of sensor nodes. In contrast to most current approaches, we envision methods that are distributed, adaptive and use limited network information. We propose three new, alternative protocols for efficient charging, addressing key issues which we identify, most notably (i) to what extent each sensor should be charged, (ii) what is the best split of the total energy between the charger and the sensors and (iii) what are good trajectories the Mobile Charger should follow. One of our protocols (LRP) performs some distributed, limited sampling of the network status, while another one (RTP) reactively adapts to energy shortage alerts judiciously spread in the network. We conduct detailed simulations in uniform and non-uniform network deployments, using three different underlying routing protocol families. In most cases, both our charging protocols significantly outperform known state of the art methods, while their performance gets quite close to the performance of the global knowledge method (GKP) we also provide. (C) 2014 Elsevier B.V. All rights reserved