27 research outputs found
rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Scientific applications often contain large and computationally intensive
parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a
balanced load execution of such applications on high performance computing
(HPC) systems. Large HPC systems are vulnerable to processor or node failures
and perturbations in the availability of resources. Most self-scheduling
approaches do not consider fault-tolerant scheduling or depend on failure or
perturbation detection and react by rescheduling failed tasks. In this work, a
robust dynamic load balancing (rDLB) approach is proposed for the robust self
scheduling of independent tasks. The proposed approach is proactive and does
not depend on failure or perturbation detection. The theoretical analysis of
the proposed approach shows that it is linearly scalable and that its cost decreases
quadratically as the system size increases. rDLB is integrated into an MPI DLS
library to evaluate its performance experimentally with two computationally
intensive scientific applications. Results show that rDLB enables the tolerance
of up to P - 1 processor failures, where P is the number of processors
executing an application. In the presence of perturbations, rDLB boosted the
robustness of DLS techniques by up to 30 times and decreased application execution
time by up to 7 times compared to their counterparts without rDLB.
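As a minimal illustration of what "self-scheduling" means here, the sketch below computes the chunk sizes that one well-known DLS technique, guided self-scheduling (GSS), would hand out: each idle worker requests work and receives ceil(remaining / P) iterations. The abstract does not say which DLS techniques the library implements, so the choice of GSS and the function name `gss_chunk_sizes` are assumptions made for illustration only. The sketch does not model rDLB's proactive mechanism.

```python
def gss_chunk_sizes(n_iterations, n_workers):
    """Return the sequence of chunk sizes under guided self-scheduling.

    Each scheduling request receives ceil(remaining / n_workers) iterations.
    Illustrative only: GSS is an assumed example of a DLS technique.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    remaining = n_iterations
    chunks = []
    while remaining > 0:
        chunk = -(-remaining // n_workers)  # integer ceiling division
        chunks.append(chunk)
        remaining -= chunk
    return chunks


if __name__ == "__main__":
    # Chunk sizes shrink as the loop drains, which evens out finish times.
    print(gss_chunk_sizes(100, 4))
```

Large early chunks keep scheduling overhead low, while the small final chunks let faster workers absorb the imbalance left by slower ones.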
A Measure of Robustness Against Multiple Kinds of Perturbations
Parallel and distributed heterogeneous computing systems may operate in an environment that undergoes unpredictable changes causing certain system performance features to degrade. Such systems need robustness to guarantee limited degradation despite fluctuations in the behavior of their component parts or environment. Our previous work in this area presented a method for generating a measure of robustness for a given system. However, the focus of that approach was on a scenario where all perturbations were of the same kind, e.g., all perturbations were in message sizes or computation times, but not both message sizes and computation times. This paper gives an extended discussion of the case where perturbations could be of different kinds, and presents some new insights.
10- #1123 Robust Design of the Milk Collection and Refrigeration Logistics System Through Analysis of the Trade-offs Between CO2 Emissions and Net Present Value
Logistics system design is a strategic-level problem that involves selecting one or more depots from a set of candidate locations. In recent years, many logistics and operations research problems have been extended to include greenhouse-effect concerns and financial aspects related to the environmental impact of transportation activities. This work presents a robust design of the milk collection and refrigeration logistics system of a cooperative (Tordecilla-Madera, Polo, Muñoz, González-Rodríguez, 2017). The design consists of locating refrigeration tanks, each of which collects the milk of several producers. The proposed model is formulated as a bi-objective problem that minimizes the greenhouse gas emissions produced by transporting milk cans by motorcycle and maximizes the net present value (NPV) of the system configuration. Characterizing the robustness-NPV and robustness-CO2 relationships made it possible to determine which configuration is the most robust and how this robustness arises. The proposed mathematical model is solved with the classical epsilon-constraint technique, and robustness is determined using the FePia methodology (Ali, Maciejewski, Siegel, 2004). It was thus determined that the cooperative should set up its milk collection and refrigeration logistics system according to the chosen configuration, for which a tactical plan was designed that optimizes the use of the installed refrigeration tanks.
Robust resource allocation in weather data processing systems
Includes bibliographical references (pages [9-10]). Reliability of weather data processing systems is of prime importance to ensure the efficient operation of space-based weather monitoring systems. This work defines a heterogeneous weather data processing system that is susceptible to uncertainties in data set arrival times. The resource allocation must be robust with respect to these uncertainties. The tasks to be executed by the data processing system are classified into three broad categories: telemetry, tracking and control (high priority); data processing (medium priority); and data research (low priority). The high priority tasks must be completed before considering medium and low priority tasks. The goal of this research is to find a resource allocation that minimizes the makespan of the high priority tasks, and to find a mapping that maximizes a function of the completion time and priority of the medium and low priority tasks. Different heuristic techniques to find near-optimal solutions are studied, and their performance is evaluated.
Robust processor allocation for independent tasks when dollar cost for processors is a constraint
Includes bibliographical references (pages 9-10). In a distributed heterogeneous computing system, the resources have different capabilities and tasks have different requirements. Different classes of machines used in such systems typically vary in dollar cost based on their computing efficiencies. Makespan (defined as the completion time for an entire set of tasks) is often the performance feature that is optimized. Resource allocation is often done based on estimates of the computation time of each task on each class of machines. Hence, it is important that makespan be robust against errors in computation time estimates. The dollar cost to purchase the machines for use can be a constraint such that only a subset of the machines available can be purchased. The goal of this study is to: (1) select a subset of all the machines available so that the cost constraint for the machines is satisfied, and (2) find a static mapping of tasks so that the robustness of the desired system feature, makespan, is maximized against the errors in task execution time estimates. Six heuristic techniques for this problem are presented and evaluated.
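To make "robustness of makespan against errors in execution time estimates" concrete, here is a small sketch under stated assumptions. It takes a static mapping of independent tasks, computes each machine's estimated finish time, and then measures how far (in Euclidean distance) the estimates on one machine can drift before that machine's finish time reaches a tolerance `tau`. The abstract does not give the paper's exact metric or heuristics, so the distance-to-boundary measure, the tolerance `tau`, and all names below are illustrative assumptions.

```python
import math


def finish_times(mapping, etc, n_machines):
    """Estimated finish time of each machine.

    mapping[i] is the machine assigned to task i; etc[i][j] is the
    estimated time to compute task i on machine j (hypothetical inputs).
    """
    totals = [0.0] * n_machines
    for task, machine in enumerate(mapping):
        totals[machine] += etc[task][machine]
    return totals


def makespan(mapping, etc, n_machines):
    """Completion time of the entire task set: the latest machine finish time."""
    return max(finish_times(mapping, etc, n_machines))


def robustness_radius(mapping, etc, n_machines, tau):
    """Smallest Euclidean error in the time estimates that pushes some
    machine's finish time up to tau (assumed tolerance, tau >= makespan).

    For a machine with n assigned tasks and finish time F, the distance from
    the estimate vector to the hyperplane sum(times) = tau is (tau - F)/sqrt(n).
    """
    totals = finish_times(mapping, etc, n_machines)
    counts = [0] * n_machines
    for machine in mapping:
        counts[machine] += 1
    radii = [
        (tau - totals[j]) / math.sqrt(counts[j])
        for j in range(n_machines)
        if counts[j] > 0  # machines with no tasks cannot violate tau
    ]
    return min(radii)


if __name__ == "__main__":
    etc = [[2, 4], [3, 1], [4, 2], [1, 3], [5, 2]]  # 5 tasks, 2 machines
    mapping = [0, 1, 1, 0, 1]
    print(makespan(mapping, etc, 2))
    print(robustness_radius(mapping, etc, 2, tau=6.0))
```

A heuristic that maximizes robustness would search over mappings (and, in this study's setting, over affordable machine subsets) for the one with the largest such radius.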
Robustness of resource allocation in parallel and distributed computing systems, The
Includes bibliographical references (page [9]). This paper gives an overview of the material to be discussed in the invited keynote presentation by H. J. Siegel; it summarizes our research in [1]. Performing computing and communication tasks on parallel and distributed systems involves the coordinated use of different types of machines, networks, interfaces, and other resources. Decisions about how best to allocate resources are often based on estimated values of task and system parameters, due to uncertainties in the system environment. An important research problem is the development of resource management strategies that can guarantee a particular system performance given such uncertainties. We have designed a methodology for deriving the degree of robustness of a resource allocation - the maximum amount of collective uncertainty in system parameters within which a user-specified level of system performance (QoS) can be guaranteed. Our four-step procedure for deriving a robustness metric for an arbitrary system will be presented. We will illustrate this procedure and its usefulness by deriving robustness metrics for some example distributed systems.
The Robustness of Resource Allocation in Parallel and Distributed Computing Systems
This paper gives an overview of the material to be discussed in the invited keynote presentation by H. J. Siegel. Performing computing and communication tasks on parallel and distributed systems involves the coordinated use of different types of machines, networks, interfaces, and other resources. Decisions about how best to allocate resources are often based on estimated values of task and system parameters, due to uncertainties in the system environment. An important research problem is the development of resource management strategies that can guarantee a particular system performance given such uncertainties. We have designed a methodology for deriving the degree of robustness of a resource allocation - the maximum amount of collective uncertainty in system parameters within which a user-specified level of system performance (QoS) can be guaranteed. Our four-step procedure for deriving a robustness metric for an arbitrary system will be presented. We will illustrate this procedure and its usefulness by deriving robustness metrics for some example distributed systems.
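One way to write down "the maximum amount of collective uncertainty within which a level of performance can be guaranteed" is as a distance to a constraint boundary. The notation below is an assumption introduced for illustration and is not quoted from the abstract: \pi is the vector of uncertain system parameters, \pi^{\mathrm{orig}} holds their estimated values, f maps parameter values to a performance feature \phi, and \beta^{\max} is the user-specified tolerable bound on that feature.

```latex
\[
  r_{\mu}(\phi, \pi)
  \;=\;
  \min_{\pi \,:\, f(\pi) = \beta^{\max}}
  \left\lVert \pi - \pi^{\mathrm{orig}} \right\rVert_{2}
\]
```

Read this as: any combination of parameter errors whose Euclidean norm stays below r_{\mu} leaves the feature within its bound, and the robustness of a whole allocation would then be the smallest such radius over all features of interest.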