Editorial for FGCS Special issue on “Time-critical Applications on Software-defined Infrastructures”
Performance requirements in many applications can often be modelled as constraints related to time, for example, the time span of data processing for disaster early warning [1], latency in live event broadcasting [2], and jitter during audio/video conferences [3]. These time constraints are often treated either in an “as fast as possible” manner, such as sensitive latencies in high-performance computing or communication tasks, or in a “timeliness” way where tasks have to be finished within a given window in real-time systems, as classified in [4]. To meet the required time constraints, one has to carefully analyse those constraints, engineer and integrate system components, and optimise the scheduling of computing and communication tasks. The development of a time-critical application is thus time-consuming and costly.
During the past decades, the infrastructure technologies of computing, storage and networking have made tremendous progress. Beyond the capacity and performance of physical devices, virtualisation technologies offer effective resource management and isolation at different levels, such as Java Virtual Machines at the application level, Docker containers at the operating-system level, and Virtual Machines at the whole-system level. Moreover, network embedding [5] and software-defined networking [6] provide network-level virtualisation and control that enable a new paradigm of infrastructure, where infrastructure resources can be virtualised, isolated, and dynamically customised based on application needs.
Software-defined infrastructures, including Cloud, Fog, Edge, software-defined networking and network function virtualisation, are emerging as new environments for distributed applications with time-critical requirements, but such applications also face challenges in effectively utilising the advanced infrastructure features in system engineering and dynamic control. This special issue on “Time-critical Applications on Software-defined Infrastructures” focuses on practical aspects of the design, development, customisation and performance-oriented operation of such applications for Clouds and other distributed environments.
Dependability Evaluation of Middleware Technology for Large-scale Distributed Caching
Distributed caching systems (e.g., Memcached) are widely used by service providers to satisfy accesses by millions of concurrent clients. Given their large scale, modern distributed systems rely on a middleware layer to manage caching nodes, to make applications easier to develop, and to apply load balancing and replication strategies. In this work, we performed a dependability evaluation of three popular middleware platforms, namely Twemproxy by Twitter, Mcrouter by Facebook, and Dynomite by Netflix, to assess availability and performance under faults, including failures of Memcached nodes and congestion due to unbalanced workloads and network link bandwidth bottlenecks. We point out the different availability and performance trade-offs achieved by the three platforms, and scenarios in which a few faulty components cause cascading failures of the whole distributed system.

Comment: 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE 2020)
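The cascading-failure scenario mentioned in the abstract can be illustrated with a toy capacity model (a hypothetical sketch, not the paper's testbed or methodology): when a node fails, its traffic is redistributed to the survivors, and if a survivor is pushed past its capacity it fails too, so a small number of initial faults can take down the whole cluster.

```python
# Toy model (hypothetical, for illustration only): a cluster of identical
# caching nodes sharing a fixed total load. A failed node's share is
# redistributed evenly over the survivors; any survivor pushed past its
# capacity fails as well, which can cascade.

def surviving_nodes(n_nodes, total_load, capacity_per_node, initial_failures):
    """Return how many nodes remain up once the cascade settles."""
    alive = n_nodes - initial_failures
    while alive > 0:
        load_per_node = total_load / alive
        if load_per_node <= capacity_per_node:
            return alive          # survivors can absorb the redistributed load
        alive -= 1                # one more node overloads and fails
    return 0                      # total outage

# With 10 nodes, load 900 and capacity 100 per node, one failure is
# absorbed, but a second failure overloads the survivors and cascades.
```

In this model the cluster tolerates exactly one failure; losing a second node leaves each survivor above capacity, and the whole cluster collapses, mirroring the "few faulty components" scenario the evaluation uncovered.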
Overload control for virtual network functions under CPU contention
In this paper, we analyze the problem of overloads caused by physical CPU contention in cloud infrastructures, from the perspective of time-critical applications (such as Virtual Network Functions) running at guest level. We show that guest-level overload control solutions designed to counteract traffic spikes (e.g., traffic throttling) are counterproductive against overloads caused by CPU contention. We then propose a general guest-level solution that protects applications from overloads also in the case of CPU contention. We reproduced the phenomena on an IP Multimedia Subsystem (IMS) testbed based on OpenStack on top of KVM. The results show that the approach can dynamically adapt the service throughput to the actual system capacity in both cases of traffic spikes and CPU contention, while at the same time guaranteeing the IMS latency requirements.
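The general idea of adapting throughput to the actual capacity, rather than to a fixed traffic threshold, can be sketched as a latency-driven admission controller (a hypothetical AIMD-style sketch, not the paper's actual algorithm): the admitted rate is reduced whenever the observed latency violates the SLO, whatever the cause (traffic spike or CPU contention), and is cautiously raised when there is headroom.

```python
# Hypothetical sketch of a guest-level admission controller that tracks the
# system's actual capacity via observed latency, using additive-increase /
# multiplicative-decrease (AIMD). Names and parameters are illustrative.

class AdmissionController:
    def __init__(self, initial_rate, latency_slo_ms, step=10.0, backoff=0.7):
        self.rate = float(initial_rate)   # requests/s currently admitted
        self.latency_slo_ms = latency_slo_ms
        self.step = step                  # additive increase per interval
        self.backoff = backoff            # multiplicative decrease factor

    def update(self, observed_latency_ms):
        """Adjust the admitted rate based on the latest latency sample."""
        if observed_latency_ms > self.latency_slo_ms:
            self.rate *= self.backoff     # overload (spike or contention): shed load
        else:
            self.rate += self.step        # latency within SLO: probe for headroom
        return self.rate
```

Because the controller reacts to end-to-end latency rather than to the incoming traffic rate, it backs off even when the overload is caused by CPU contention at the host, a case where pure traffic throttling fails.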