Swap Fairness for Thrashing Mitigation
The swap mechanism allows an operating system to work with more memory than the available RAM space by temporarily flushing some data to disk. However, the system sometimes ends up spending more time swapping data in and out of disk than performing actual computation. This state is called thrashing. Classical strategies against thrashing rely on reducing system load so as to decrease memory pressure and increase global throughput. These approaches may, however, be counterproductive when tricked into favoring malicious or long-standing processes. This is particularly true in the context of shared hosting or virtualization, where multiple users run uncoordinated and selfish workloads. To address this challenge, we propose an accounting layer that enforces swap fairness among processes competing for main memory. It ensures that a process cannot monopolize the swap subsystem by delaying the swap operations of abusive processes, reducing the number of system-wide page faults while maximizing memory utilization.
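The accounting idea can be sketched as follows. This is a minimal, hypothetical illustration of per-process swap accounting with delay-based throttling; the class name, window-based budget, and delay policy are assumptions for illustration, not the paper's actual design.

```python
import time
from collections import defaultdict

class SwapAccountant:
    """Illustrative sketch: charge swap I/O to the requesting process and
    delay processes that exceed their fair share within a time window."""

    def __init__(self, pages_per_window=1000, window_s=1.0):
        self.budget = pages_per_window   # fair share of swapped pages per window
        self.window_s = window_s
        self.usage = defaultdict(int)    # pages swapped per pid in this window
        self.window_start = time.monotonic()

    def _roll_window(self):
        # Reset accounting when the current window expires.
        now = time.monotonic()
        if now - self.window_start >= self.window_s:
            self.usage.clear()
            self.window_start = now

    def charge(self, pid, pages):
        """Account a swap request; return the delay (seconds) to impose."""
        self._roll_window()
        self.usage[pid] += pages
        excess = self.usage[pid] - self.budget
        if excess <= 0:
            return 0.0
        # Delay grows with how far the process exceeds its fair share,
        # so an abusive process cannot monopolize the swap subsystem.
        return min(excess / self.budget * 0.01, self.window_s)
```

A real implementation would sit in the kernel's swap path rather than in user space, but the principle is the same: well-behaved processes see no delay, while heavy swappers are slowed down in proportion to their excess.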
Using TCP/IP traffic shaping to achieve iSCSI service predictability
This thesis reproduces the properties of load interference common in many storage devices using resource sharing for flexibility and maximum hardware utilization. The nature of resource sharing and load is studied and compared to assumptions and models used in previous work. The results are used to design a method for throttling iSCSI initiators, attached to an iSCSI target server, using a packet delay module in Linux Traffic Control. The packet delay throttle enables close-to-linear rate reduction for both read and write operations. Dynamic packet matching, needed for rapidly changing throttling values, is added using iptables and ipset. All throttling is achieved without triggering TCP retransmit timeout and the subsequent slow start caused by packet loss. A control mechanism for dynamically adapting throttling values to rapidly changing workloads is implemented using a modified proportional integral derivative (PID) controller. Using experiments, control engineering filtering techniques, and results from previous research, a suitable per-resource saturation indicator was found. The indicator is an exponential moving average of the wait time of active resource consumers. It is used as the input value to the PID controller managing the packet rates of resource consumers, creating a closed control loop managed by the PID controller. Finally, a prototype of an autonomic resource prioritization framework is designed. The framework identifies and maintains information about resources, their consumers, their average wait time for active consumers, and their set of throttleable consumers. The information is kept in shared memory and a PID controller is spawned for each resource, thus safeguarding read response times by throttling writers on a per-resource basis. The framework is exposed to extreme workload changes and demonstrates a strong ability to keep read response time below a predefined threshold. Using moderate tuning efforts, the framework exhibits low overhead and resource consumption, promising suitability for large-scale operation in production environments.
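The core control loop described above can be sketched as an exponential moving average feeding a PID controller. The class names, gains, and update signatures below are illustrative assumptions, not the thesis's actual code.

```python
class EMA:
    """Exponential moving average of a sampled signal (e.g. consumer wait time)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample  # seed with the first sample
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value

class PID:
    """Textbook PID controller; the output would drive the packet delay."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measurement, dt=1.0):
        error = measurement - self.setpoint
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # Positive output -> wait time above setpoint -> throttle harder.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In the closed loop, each resource's smoothed wait time is compared against a target threshold, and the controller's output adjusts the per-consumer packet delay in Traffic Control.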
Energy Management for Hypervisor-Based Virtual Machines
Current approaches to power management are based on operating systems with full knowledge of and full control over the underlying hardware; the distributed nature of multi-layered virtual machine environments renders such approaches insufficient. In this paper, we present a novel framework for energy management in modular, multi-layered operating system structures. The framework provides a unified model to partition and distribute energy, and mechanisms for energy-aware resource accounting and allocation. As a key property, the framework explicitly takes the recursive energy consumption into account, which is spent, e.g., in the virtualization layer or subsequent driver components.
Our prototypical implementation targets hypervisor-based virtual machine systems and comprises two components: a host-level subsystem, which controls machine-wide energy constraints and enforces them among all guest OSes and service components, and a complementary energy-aware guest operating system, capable of fine-grained, application-specific energy management. Guest-level energy management thereby relies on effective virtualization of physical energy effects provided by the virtual machine monitor. Experiments with CPU and disk devices and an external data acquisition system demonstrate that our framework accurately controls and stipulates the power consumption of individual hardware devices, both for energy-aware and energy-unaware guest operating systems.
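The notion of recursive energy accounting can be sketched as follows: energy drawn by a device is charged to the guest that caused the work, and energy spent in the virtualization layer or driver components on that guest's behalf is added to the same account. The class and attribution rule below are illustrative assumptions, not the paper's framework.

```python
class EnergyAccountant:
    """Sketch of per-guest energy accounting with recursive overhead."""

    def __init__(self):
        self.accounts = {}  # guest name -> joules charged so far

    def charge(self, guest, device_joules, overhead_joules=0.0):
        """Charge direct device energy plus recursive overhead energy
        (e.g. hypervisor or driver work performed on the guest's behalf)."""
        total = device_joules + overhead_joules
        self.accounts[guest] = self.accounts.get(guest, 0.0) + total
        return self.accounts[guest]

    def over_budget(self, guest, budget_joules):
        """Host-level check used to enforce machine-wide energy constraints."""
        return self.accounts.get(guest, 0.0) > budget_joules
```

A host-level subsystem could use such accounts to throttle or deschedule guests that exceed their allotted energy budget.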
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects for middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
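The task model described above (applications structured as graphs of discrete tasks, with explicit input/output dependencies forming the edges) can be sketched as a small dependency-driven dispatcher. The function and its data layout are illustrative assumptions, not any particular MTC system's API.

```python
from collections import deque

def run_task_graph(tasks, deps):
    """tasks: {name: callable}; deps: {name: set of prerequisite names}.
    Dispatches each task once all its predecessors have completed and
    returns the dispatch order (raises on a dependency cycle)."""
    indegree = {t: len(deps.get(t, set())) for t in tasks}
    dependents = {t: [] for t in tasks}
    for t, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()  # dispatch; a real MTC system must keep this overhead tiny
        order.append(t)
        for d in dependents[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(tasks):
        raise ValueError("cycle in task graph")
    return order
```

The engineering pressure the report describes comes from scale: with millions of very short tasks, the per-task dispatch step and the I/O used to pass data along graph edges dominate, which is why MTC middleware must minimize dispatch overhead and avoid communicating through the file system.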
RADIO: managing the performance of large, distributed storage systems
High-performance computing systems continue to grow in size and complexity, and often have to manage many different tasks simultaneously. The I/O subsystem is frequently a bottleneck for overall system performance, and interference between applications can lead to disproportionate performance degradation, unpredictable execution times, and inefficient use of resources. This talk presents our ongoing research on how the execution of large distributed storage systems should be managed and guaranteed. We will discuss our general model for performance management, survey our solutions for the CPU, disk, network, storage, and cache server, and discuss our ongoing research aimed at applying these solutions to the control and management of distributed systems.
Analysis of Performance and Power Aspects of Hypervisors in Soft Real-Time Embedded Systems
The exponential growth of malware designed to attack soft real-time embedded systems has necessitated solutions to secure these systems. Hypervisors are one such solution, but the overhead they impose needs to be quantitatively understood. Experiments were conducted to quantify the overhead hypervisors impose on soft real-time embedded systems. A soft real-time computer vision algorithm was executed, with average and worst-case execution times measured, as well as average power consumption. These experiments were conducted with two hypervisors and a control configuration. They showed that each hypervisor imposed a different amount of overhead, with one achieving near-native performance and the other noticeably impacting the performance of the system.
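The measurement methodology above (average and worst-case execution times over repeated runs) can be sketched with a minimal harness; the function name and sampling approach are assumptions for illustration, not the study's actual instrumentation.

```python
import time

def measure(workload, iterations=100):
    """Run `workload` repeatedly; return (average, worst-case) execution
    time in seconds. Worst-case here means the maximum observed sample."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples), max(samples)
```

Running such a harness natively and then under each hypervisor, with the same workload, lets the per-hypervisor overhead be read off as the difference in the averages and in the observed worst cases.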
Memory-Aware Scheduling for Fixed Priority Hard Real-Time Computing Systems
As a major component of a computing system, memory has been a key performance and power consumption bottleneck in computer system design. While processor speeds have kept rising dramatically, the overall computing performance improvement of the entire system is limited by how fast the memory can feed instructions/data to the processing units (the so-called memory wall problem). The increasing transistor density and surging access demands from a rapidly growing number of processing cores have also significantly elevated the power consumption of the memory system. In addition, the interference of memory accesses from different applications and processing cores significantly degrades computation predictability, which is essential to ensure timing specifications in real-time system design. Recent IC technologies (such as 3D-IC technology) and emerging data-intensive real-time applications (such as Virtual Reality/Augmented Reality, Artificial Intelligence, and the Internet of Things) further amplify these challenges. We believe that it is not simply desirable but necessary to adopt a joint CPU/Memory resource management framework to deal with these grave challenges.
In this dissertation, we focus on studying how to schedule fixed-priority hard real-time tasks with memory impacts taken into consideration. We target the fixed-priority real-time scheduling scheme since it is one of the most commonly used strategies for practical real-time applications. Specifically, we first develop an approach that takes into consideration not only the execution time variations with cache allocations but also the task period relationship, showing a significant improvement in the feasibility of the system. We further study the problem of how to guarantee timing constraints for hard real-time systems under CPU and memory thermal constraints. We first study the problem under an architecture model with a single core and its main memory individually packaged. We develop a thermal model that can capture the thermal interaction between the processor and memory, and incorporate the periodic resource server model into our scheduling framework to guarantee both the timing and thermal constraints. We further extend our research to multi-core architectures with processing cores and memory devices integrated into a single 3D platform. To the best of our knowledge, this is the first research that can guarantee hard deadline constraints for real-time tasks under temperature constraints for both processing cores and memory devices. Extensive simulation results demonstrate that our proposed scheduling can significantly improve the feasibility of hard real-time systems under thermal constraints.
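As background for the feasibility analyses discussed above, the classical sufficient test for fixed-priority rate-monotonic scheduling is the Liu-Layland utilization bound; the dissertation's own tests additionally account for cache allocation and thermal constraints, so this sketch is only the textbook baseline, not the proposed method.

```python
def rm_schedulable(tasks):
    """tasks: list of (wcet, period) pairs with rate-monotonic priorities.
    Returns True if the task set passes the sufficient Liu-Layland test:
    sum(C_i / T_i) <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)  # approaches ln 2 (~0.693) as n grows
    return utilization <= bound
```

The test is sufficient but not necessary: task sets that fail the bound may still be schedulable, which is one reason exact response-time analysis and period-relationship-aware tests, like those studied in the dissertation, can admit more systems.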
User-aware performance evaluation and optimization of parallel job schedulers
The dissertation "User-Aware Performance Evaluation and Optimization of Parallel Job Schedulers" deals with the realistic, dynamic simulation and optimization of load situations in parallel computing systems, taking into account feedback effects between performance and user behavior. The distinguishing feature of such systems is their shared use by multiple users, which restricts the availability of resources. If not all compute requests, the so-called jobs, can be executed simultaneously, they are buffered in queues. Since user behavior is not precisely known, there is great uncertainty about future load situations. The goal is to find methods that produce a resource allocation matching the users' objectives as closely as possible, and to evaluate these methods realistically. It must also be taken into account that user behavior and user objectives vary depending on the load situation and the resource allocation. A three-part research approach is chosen. Analysis of user behavior under resource constraints: traces of parallel computing systems show that the waiting time for compute results correlates with future user behavior, i.e. on average it takes longer for a user who had to wait a long time to use the system again. As part of the doctoral project, this analysis was continued and additional correlations between further system parameters (besides waiting time) and user behavior were uncovered. Furthermore, functions describing user satisfaction and user reaction to varying response times of computing systems were developed. These results were obtained through a survey among users of parallel computers at TU Dortmund, for which a dedicated questionnaire was developed.
Modeling of user behavior and feedback: because of the dynamic relationship between system performance and user behavior, allocation strategies must be evaluated in dynamic, feedback-driven simulations. For this purpose, a multi-stage user model was developed that incorporates the current assumptions about user behavior and allows additional behavioral components to be added in the future. So far, the core elements comprise models for the day-and-night rhythm, the working rhythm, and properties of the submitted jobs. The dynamic feedback is designed such that only the completion of certain jobs triggers future job submissions. Optimization of allocation strategies to increase user satisfaction: the waiting-time acceptance of users, derived with the help of the questionnaire, has been optimized using a MILP. The MILP searches for solutions that start as many jobs as possible within an accepted waiting-time window, or that minimize the sum of tardiness. Because of the complexity of this optimization algorithm, the evaluation so far covers only fixed, static scenarios that represent snapshots of particular system and queue states. It is therefore further planned to evaluate scheduling methods for increasing the number of submitted jobs and waiting-time satisfaction using the dynamic model.
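The objective of starting as many jobs as possible within their accepted waiting-time window can be illustrated with a simple earliest-deadline-first heuristic; this is an illustrative stand-in, not the dissertation's MILP, and the single-resource, back-to-back execution assumptions are simplifications.

```python
def schedule_within_windows(jobs):
    """jobs: list of (name, runtime, accepted_wait). Returns the start
    order and how many jobs begin within their accepted waiting-time
    window, assuming jobs run back-to-back on one resource from time 0."""
    # Start jobs whose acceptance window closes soonest first.
    order = sorted(jobs, key=lambda j: j[2])
    t, on_time = 0, 0
    for name, runtime, accepted_wait in order:
        if t <= accepted_wait:  # job starts before its window closes
            on_time += 1
        t += runtime
    return [j[0] for j in order], on_time
```

A MILP formulation generalizes this by deciding start times on multiple resources jointly and by minimizing total tardiness when not all windows can be met.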