2,011 research outputs found

    rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks

    Full text link
    Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a balanced load execution of such applications on high performance computing (HPC) systems. Large HPC systems are vulnerable to processors or node failures and perturbations in the availability of resources. Most self-scheduling approaches do not consider fault-tolerant scheduling or depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and its cost decrease quadratically by increasing the system size. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show that rDLB enables the tolerance of up to (P minus one) processor failures, where P is the number of processors executing an application. In the presence of perturbations, rDLB boosted the robustness of DLS techniques up to 30 times and decreased application execution time up to 7 times compared to their counterparts without rDLB

    A Framework for an adaptive grid scheduling: an organizational perspective

    Get PDF
    Grid systems are complex computational organizations made of several interacting components evolving in an unpredictable and dynamic environment. In such context, scheduling is a key component and should be adaptive to face the numerous disturbances of the grid while guaranteeing its robustness and efficiency. In this context, much work remains at low-level focusing on the scheduling component taken individually. However, thinking the scheduling adaptiveness at a macro level with an organizational view, through its interactions with the other components, is also important. Following this view, in this paper we model a grid system as an agent-based organization and scheduling as a cooperative activity. Indeed, agent technology provides high level organizational concepts (groups, roles, commitments, interaction protocols) to structure, coordinate and ease the adaptation of distributed systems efficiently. More precisely, we make the following contributions. We provide a grid conceptual model that identifies the concepts and entities involved in the cooperative scheduling activity. This model is then used to define a typology of adaptation including perturbing events and actions to undertake in order to adapt. Then, we provide an organizational model, based on the Agent Group Role (AGR) meta-model of Freber, to support an adaptive scheduling at the organizational level. Finally, a simulator and an experimental evaluation have been realized to demonstrate the feasibility of our approach

    Self-Healing Distributed Scheduling Platform

    Get PDF
    International audienceDistributed systems require effective mechanisms to manage the reliable provisioning of computational resources from different and distributed providers. Moreover, the dynamic environment that affects the behaviour of such systems and the complexity of these dynamics demand autonomous capabilities to ensure the behaviour of distributed scheduling platforms and to achieve business and user objectives. In this paper we propose a self-adaptive distributed scheduling platform composed of multiple agents implemented as intelligent feedback control loops to support policy-based scheduling and expose self-healing capabilities. Our platform leverages distributed scheduling processes by (i) allowing each provider to maintain its own internal scheduling process, and (ii) implementing self-healing capabilities based on agent module recovery. Simulated tests are performed to determine the optimal number of agents to be used in the negotiation phase without affecting the scheduling cost function. Test results on a real-life platform are presented to evaluate recovery times and optimize platform parameters

    A Review on Software Architectures for Heterogeneous Platforms

    Full text link
    The increasing demands for computing performance have been a reality regardless of the requirements for smaller and more energy efficient devices. Throughout the years, the strategy adopted by industry was to increase the robustness of a single processor by increasing its clock frequency and mounting more transistors so more calculations could be executed. However, it is known that the physical limits of such processors are being reached, and one way to fulfill such increasing computing demands has been to adopt a strategy based on heterogeneous computing, i.e., using a heterogeneous platform containing more than one type of processor. This way, different types of tasks can be executed by processors that are specialized in them. Heterogeneous computing, however, poses a number of challenges to software engineering, especially in the architecture and deployment phases. In this paper, we conduct an empirical study that aims at discovering the state-of-the-art in software architecture for heterogeneous computing, with focus on deployment. We conduct a systematic mapping study that retrieved 28 studies, which were critically assessed to obtain an overview of the research field. We identified gaps and trends that can be used by both researchers and practitioners as guides to further investigate the topic

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Hierarchical workflow management system for life science applications

    Get PDF
    In modern laboratories, an increasing number of automated stations and instruments are applied as standalone automated systems such as biological high throughput screening systems, chemical parallel reactors etc. At the same time, the mobile robot transportation solution becomes popular with the development of robotic technologies. In this dissertation, a new superordinate control system, called hierarchical workflow management system (HWMS) is presented to manage and to handle both, automated laboratory systems and logistics systems.In modernen Labors werden immer mehr automatisierte Stationen und Instrumente als eigenständige automatisierte Systeme eingesetzt, wie beispielsweise biologische High-Throughput-Screening-Systeme und chemische Parallelreaktoren. Mit der Entwicklung der Robotertechnologien wird gleichzeitig die mobile Robotertransportlösung populär. In der vorliegenden Arbeit wurde ein hierarchisches Verwaltungssystem für Abeitsablauf, welches auch als HWMS bekannt ist, entwickelt. Das neue übergeordnete Kontrollsystem kann sowohl automatisierte Laborsysteme als auch Logistiksysteme verwalten und behandeln

    Parallel evolutionary algorithms for scheduling on heterogeneous computing and grid environments

    Get PDF
    This thesis studies the application of sequential and parallel evolutionary algorithms to the scheduling problem in heterogeneous computing and grid environments, a key problem when executing tasks in distributed computing systems. Since the 1990's, this class of systems has been increasingly employed to provide support for solving complex problems using high-performance computing techniques. The scheduling problem in heterogeneous computing systems is an NP-hard optimization problem, which has been tackled using several optimization methods in the past. Among many new techniques for optimization, evolutionary computing methods have been successfully applied to this class of problems. In this work, several evolutionary algorithms in their sequential and parallel variants are specically designed to provide accurate solutions for the problem, allowing to compute an eficient planning for heterogeneous computing and grid environments. New problem instances, far more complex than those existing in the related literature, are introduced in this thesis in order to study the scalability of the presented parallel evolutionary algorithms. In addition, a new parallel micro-CHC algorithm is developed, inspired by useful ideas from the multiobjective optimization field. Eficient numerical results of this algorithm are reported in the experimental analysis performed on both well-known problem instances and the large instances specially designed in this work. The comparative study including traditional methods and evolutionary algorithms shows that the new parallel micro-CHC is able to achieve a high problem solving eficacy, outperforming previous results already reported for the problem and also having a good scalability behavior when solving high dimension problem instances.In addition, two variants of the scheduling problem in heterogeneous environments are also tackled, showing the versatility of the proposed approach using parallel evolutionary algorithms to deal with both dynamic and multi-objective scenarios.Esta tesis estudia la aplicación de algoritmos evolutivos secuenciales y paralelos para el problema de planicación de tareas en entornos de cómputo heterogéneos y de computación grid. Desde la década de 1990, estos sistemas computacionales han sido utilizados con éxito para resolver problemas complejos utilizando técnicas de computación de alto desempeo. El problema de planificación de tareas en entornos heterogéneos es un problema de optimización NP-difícil que ha sido abordado utilizando diversas técnicas. Entre las técnicas emergentes para optimización combinatoria, los algoritmos evolutivos han sido aplicados con éxito a esta clase de problemas. En este trabajo, varios algoritmos evolutivos en sus versiones secuenciales y paralelas han sido especificamente diseados para alcanzar soluciones precisas para el problema de planicación de tareas en entornos de heterogéneos, permitiendo calcular planificaciones eficientes para entornos que modelan clusters de computadores y plataformas de computación grid. Nuevas instancias del problema, con una complejidad mucho mayor que las previamente existentes en la literatura relacionada, son presentadas en esta tesis con el objetivo de analizar la escalabilidad de los algoritmos evolutivos propuestos. Complementariamente, un nuevo método, el micro-CHC paralelo es desarrollado, inspirado en ideas ítiles provenientes del área de optimización multiobjetivo. Resultados numéricos precisos y eficientes se reportan en el análisis experimental realizado sobre instancias estándar del problema y sobre las nuevas instancias especificamente diseñadas en este trabajo.El estudio comparativo que incluye a métodos tradicionales para planificación de tareas, los nuevos métodos propuestos y algoritmos evolutivos previamente aplicados al problema, demuestra que el nuevo micro-CHC paralelo es capaz de alcanzar altos valores de eficacia, superando a los mejores resultados previamente reportados en la literatura del área y mostrando un buen comportamiento de escalabilidad para resolver las instancias de gran dimensión. Además, dos variantes del problema de planificación de tareas en entornos heterogéneos han sido inicialmente estudiadas, comprobándose la versatilidad del enfoque propuesto para resolver las variantes dinámica y multiobjetivo del problema

    A framework for smart production-logistics systems based on CPS and industrial IoT

    Get PDF
    Industrial Internet of Things (IIoT) has received increasing attention from both academia and industry. However, several challenges including excessively long waiting time and a serious waste of energy still exist in the IIoT-based integration between production and logistics in job shops. To address these challenges, a framework depicting the mechanism and methodology of smart production-logistics systems is proposed to implement intelligent modeling of key manufacturing resources and investigate self-organizing configuration mechanisms. A data-driven model based on analytical target cascading is developed to implement the self-organizing configuration. A case study based on a Chinese engine manufacturer is presented to validate the feasibility and evaluate the performance of the proposed framework and the developed method. The results show that the manufacturing time and the energy consumption are reduced and the computing time is reasonable. This paper potentially enables manufacturers to deploy IIoT-based applications and improve the efficiency of production-logistics systems
    corecore