6 research outputs found
Highlighting the Container Memory Consolidation Problems in Linux
The container mechanism supports server consolidation; to ensure memory performance isolation, Linux relies on static memory limits. However, this results in poor performance because an application's needs are dynamic. In this article, we show the current problems with memory consolidation for containers in Linux.
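The static limits the abstract refers to are the per-cgroup memory caps the kernel enforces. A minimal sketch of the mechanism, assuming a cgroup v2 hierarchy mounted at the standard path and a hypothetical cgroup name (this is not the paper's code, only an illustration of how a fixed limit is set):

```python
# Sketch: a static container memory limit via cgroup v2.
# The cgroup name "my-container" is hypothetical; writing the file
# requires root and an existing cgroup directory.
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # typical cgroup v2 mount point

def memory_max_path(cgroup_name: str) -> Path:
    """Control file holding the hard memory cap for a cgroup."""
    return CGROUP_ROOT / cgroup_name / "memory.max"

def set_static_limit(cgroup_name: str, limit_bytes: int) -> None:
    """Write a fixed byte limit; beyond it the kernel reclaims or OOM-kills."""
    memory_max_path(cgroup_name).write_text(f"{limit_bytes}\n")

# Example (illustrative only):
# set_static_limit("my-container", 512 * 1024 * 1024)  # 512 MiB cap
```

Once written, the cap never adapts to the application's phase changes, which is exactly the rigidity the article criticizes.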
Predicting Dynamic Memory Requirements for Scientific Workflow Tasks
With the increasing amount of data available to scientists in disciplines as
diverse as bioinformatics, physics, and remote sensing, scientific workflow
systems are becoming increasingly important for composing and executing
scalable data analysis pipelines. When writing such workflows, users need to
specify the resources to be reserved for tasks so that sufficient resources are
allocated on the target cluster infrastructure. Crucially, underestimating a
task's memory requirements can result in task failures. Therefore, users often
resort to overprovisioning, resulting in significant resource wastage and
decreased throughput.
In this paper, we propose a novel online method that uses monitoring time
series data to predict task memory usage in order to reduce the memory wastage
of scientific workflow tasks. Our method predicts a task's runtime, divides it
into k equally-sized segments, and learns the peak memory value for each
segment depending on the total file input size. We evaluate the prototype
implementation of our method using workflows from the publicly available
nf-core repository, showing an average memory wastage reduction of 29.48%
compared to the best state-of-the-art approach.
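The core of the method described above is learning, for each of k runtime segments, how peak memory scales with total input size. A minimal sketch of that idea, assuming historical runs already provide per-segment peak memory values (all function names here are illustrative, not the authors' implementation):

```python
# Sketch: per-segment linear models peak_mem ~ input_size,
# fit from historical task runs, then used for prediction.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def fit_segment_models(history, k):
    """history: list of (input_size, [peak_mem for each of k segments])."""
    models = []
    for seg in range(k):
        xs = [size for size, _ in history]
        ys = [peaks[seg] for _, peaks in history]
        models.append(fit_line(xs, ys))
    return models

def predict_peaks(models, input_size):
    """Predicted peak memory per segment for a new task's input size."""
    return [a * input_size + b for a, b in models]
```

A scheduler can then reserve each segment's predicted peak (plus a safety margin) instead of one conservative maximum for the whole runtime.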
Container Resource Allocation versus Performance of Data-intensive Applications on Different Cloud Servers
In recent years, data-intensive applications have been increasingly deployed
on cloud systems. Such applications utilize significant compute, memory, and
I/O resources to process large volumes of data. Optimizing the performance and
cost-efficiency for such applications is a non-trivial problem. The problem
becomes even more challenging with the increasing use of containers, which are
popular due to their lower operational overheads and faster boot speed at the
cost of weaker resource assurances for the hosted applications. In this paper,
two containerized data-intensive applications with very different performance
objectives and resource needs were studied on cloud servers with Docker
containers running on Intel Xeon E5 and AMD EPYC Rome multi-core processors
with a range of CPU, memory, and I/O configurations. Primary findings from our
experiments include: 1) Allocating multiple cores to a compute-intensive
application can improve performance, but only if the cores do not contend for
the same caches, and the optimal core counts depend on the specific workload;
2) allocating more memory to a memory-intensive application than its
deterministic data workload does not further improve performance; however, 3)
having multiple such memory-intensive containers on the same server can lead to
cache and memory bus contention leading to significant and volatile performance
degradation. The comparative observations on Intel and AMD servers provided
insights into trade-offs between larger numbers of distributed chiplets
interconnected by higher-speed buses (AMD) and larger numbers of centrally
integrated cores and caches connected by slower buses (Intel). For the two
types of applications studied, the more distributed caches and faster data
buses benefited the deployment of larger numbers of containers.
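Finding 1 above hinges on whether the allocated cores share a last-level cache. A minimal sketch of how one might select non-contending cores from the topology Linux exposes in sysfs (the sysfs paths are the standard layout; the greedy selection policy is illustrative, not the paper's method):

```python
# Sketch: pick CPU cores that do not share a last-level cache (LLC),
# using the cache topology exposed under /sys/devices/system/cpu.
from pathlib import Path

def llc_group(cpu: int, sysfs: Path = Path("/sys/devices/system/cpu")) -> str:
    """CPUs listed in this file share this core's LLC (index3 is
    typically L3 on x86 servers)."""
    path = sysfs / f"cpu{cpu}/cache/index3/shared_cpu_list"
    return path.read_text().strip()

def pick_non_contending(cpus, group_of):
    """Greedy pick: keep at most one CPU per shared-LLC group."""
    seen, chosen = set(), []
    for cpu in cpus:
        g = group_of(cpu)
        if g not in seen:
            seen.add(g)
            chosen.append(cpu)
    return chosen

# Example (on a real host): pick_non_contending(range(16), llc_group)
```

`group_of` is injected so the policy can be tested without real hardware; on a live server one would pass `llc_group`.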
MemOpLight: Leveraging application feedback to improve container memory consolidation
The container mechanism amortizes costs by consolidating several servers onto the same machine while keeping them mutually isolated. Specifically, to ensure performance isolation, Linux relies on memory limits. These limits are static, despite the fact that application needs are dynamic; this results in poor performance. To solve this issue, MemOpLight uses dynamic application feedback to rebalance physical memory allocation between containers, focusing on under-performing ones. This paper presents the issues, explains the design of MemOpLight, and validates it experimentally. Our approach increases total satisfaction by 13% compared to the default.
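The feedback idea described in this abstract can be sketched as a rebalancing loop: containers reporting satisfactory performance donate memory, and under-performing ones receive it, keeping the total allocation constant. This is only an illustration of the principle under simplified assumptions (boolean satisfaction signals, a fixed transfer step), not MemOpLight's actual algorithm:

```python
# Sketch: feedback-driven rebalancing of memory limits between containers.
# `satisfied` stands in for application-level performance feedback.

def rebalance(limits, satisfied, step):
    """limits: {container: bytes}; satisfied: {container: bool}.
    Moves `step` bytes away from each satisfied container and splits
    the freed memory among under-performing ones."""
    donors = [c for c, ok in satisfied.items() if ok]
    needy = [c for c, ok in satisfied.items() if not ok]
    if not donors or not needy:
        return dict(limits)          # nothing to rebalance
    new = dict(limits)
    moved = 0
    for c in donors:
        give = min(step, new[c])     # never drive a limit below zero
        new[c] -= give
        moved += give
    share = moved // len(needy)
    for c in needy:
        new[c] += share
    return new
```

Run periodically, such a loop converges allocations toward the containers that actually need memory, which is the behavior the 13% satisfaction gain reflects.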
Modeling the Linux page cache for accurate simulation of data-intensive applications
The emergence of Big Data in recent years has led to a growing need for data processing and an increasing number of data-intensive applications. Processing and storing massive amounts of data require large-scale solutions, so data-intensive applications must be executed on infrastructures such as cloud or High Performance Computing (HPC) clusters. Although advances in the hardware/software stack enable larger computing platforms, relevant challenges remain in resource management, performance, scheduling, scalability, etc. As a result, there is an increasing demand for optimizing and quantifying performance when executing data-intensive applications on those platforms. While infrastructures with sufficient computing power and storage capacity are available, disk I/O performance remains a bottleneck. To tackle this problem, apart from hardware improvements, the Linux page cache is an efficient architectural approach to reducing I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to the limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks simulate page caching incompletely, or not at all. As a result, simulation-based performance studies of data-intensive applications lead to inaccurate results.
This thesis proposes an I/O simulation model that captures the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model in different conditions, including sequential and concurrent applications, as well as local and remote I/Os. The results show that our page cache model reduces the simulation error by up to an order of magnitude when compared to state-of-the-art, cacheless simulations.
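The order-of-magnitude error gap follows from a simple observation: a cacheless simulator charges every read the disk cost, while a cache-aware one charges memory cost for hits. A toy illustration of that effect, assuming an LRU page cache and made-up per-access costs (this is not the WRENCH/SimGrid model, only a sketch of why modeling the cache matters):

```python
# Sketch: simulated read time with vs. without an LRU page cache.
# t_mem and t_disk are illustrative per-page access costs.
from collections import OrderedDict

def simulate_reads(accesses, cache_pages, t_mem=0.001, t_disk=1.0):
    """Total simulated time for a sequence of page reads."""
    cache, total = OrderedDict(), 0.0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)        # hit: refresh LRU position
            total += t_mem
        else:
            total += t_disk                # miss: fetch from disk
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least-recently used
    return total
```

With `cache_pages=0` every access misses, reproducing a cacheless simulation; with a cache large enough to hold the working set, repeated reads cost orders of magnitude less, which is the gap the thesis measures.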