
    Using simple PID-inspired controllers for online resilient resource management of distributed scientific workflows

    Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. Although the scientific community has addressed the challenge of failure handling through both theoretical and practical approaches, failure prediction, detection, and recovery still raise many research questions. In this paper, we propose an approach, inspired by the control theory developed as part of autonomic computing, to predict failures before they happen and to mitigate them when possible. The approach builds on the proportional-integral-derivative (PID) controller control loop mechanism, widely used in industrial control systems, in which the controller reacts by adjusting its output to mitigate faults. PID controllers aim to detect the possibility of a non-steady state far enough in advance so that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of large-scale data-intensive workflows: data storage overload and memory overflow. We developed a simulator that implements and evaluates simple standalone PID-inspired controllers to autonomously manage the data and memory usage of a data-intensive bioinformatics workflow that consumes/produces over 4.4 TB of data and requires over 24 TB of memory to run all tasks concurrently. Experimental results obtained via simulation indicate that workflow executions may significantly benefit from the controller-inspired approach, in particular under online and unknown conditions. Simulation results show that nearly-optimal executions (slowdown of 1.01) can be attained with our proposed method, with faults detected and mitigated far in advance of their occurrence.
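
    For reference, the discrete-time PID control law that such controllers implement has the standard textbook form (the concrete error signals and gain values used in the paper are not reproduced here):

    u(t_k) = K_p \, e(t_k) + K_i \sum_{j=1}^{k} e(t_j) \, \Delta t + K_d \, \frac{e(t_k) - e(t_{k-1})}{\Delta t}

    where e(t_k) is the difference between the setpoint (e.g., a target level of disk or memory usage) and the measured value at sampling step k, and the gains K_p, K_i, and K_d determine how strongly the controller reacts to the current error, its accumulated history, and its rate of change.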

    Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

    Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. In spite of many success stories, a key challenge for running workflows in distributed systems is failure prediction, detection, and recovery. In this paper, we propose an approach that uses control theory developed as part of autonomic computing to predict failures before they happen and to mitigate them when possible. The proposed approach applies the proportional-integral-derivative (PID) controller control loop mechanism, which is widely used in industrial control systems, to mitigate faults by adjusting the inputs of the controller. The PID controller aims at detecting the possibility of a fault far enough in advance so that an action can be performed to prevent it from happening. To demonstrate the feasibility of the approach, we tackle two common execution faults of the Big Data era: data storage overload and memory overflow. We define, implement, and evaluate simple PID controllers to autonomously manage the data and memory usage of a bioinformatics workflow that consumes/produces over 4.4 TB of data and requires over 24 TB of memory to run all tasks concurrently. Experimental results indicate that workflow executions may significantly benefit from PID controllers, in particular under online and unknown conditions. Simulation results show that nearly-optimal executions (slowdown of 1.01) can be attained when using our proposed method, and faults are detected and mitigated far in advance of their occurrence.
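
    The abstracts above do not spell out the controllers' implementation, so the following is only a minimal sketch of the mechanism they describe: a PID-style loop that compares measured disk utilization against a setpoint and emits a control output that a workflow manager could act on, for example by delaying task submission or cleaning up files. The class name, gains, setpoint, and sampling period are all invented for the example.

    /** Minimal sketch of a PID-style controller watching disk utilization.
     *  All gains, setpoints, and thresholds are illustrative, not from the paper. */
    public class PidDiskController {
        private final double kp, ki, kd;     // proportional, integral, derivative gains
        private final double setpoint;       // target disk utilization, e.g. 0.75 (75%)
        private double integral = 0.0;       // accumulated error over time
        private double previousError = 0.0;  // error at the previous sampling step

        public PidDiskController(double kp, double ki, double kd, double setpoint) {
            this.kp = kp; this.ki = ki; this.kd = kd; this.setpoint = setpoint;
        }

        /** One control step: returns the control output for a measured utilization. */
        public double step(double measuredUtilization, double dtSeconds) {
            double error = measuredUtilization - setpoint; // positive when over target
            integral += error * dtSeconds;
            double derivative = (error - previousError) / dtSeconds;
            previousError = error;
            return kp * error + ki * integral + kd * derivative;
        }

        public static void main(String[] args) {
            PidDiskController controller = new PidDiskController(1.0, 0.1, 0.5, 0.75);
            double[] samples = {0.60, 0.70, 0.78, 0.85, 0.91}; // simulated utilization trace
            for (double u : samples) {
                double output = controller.step(u, 60.0); // sample every 60 s
                // A positive, growing output signals an approaching storage overload;
                // a real controller would trigger a mitigation action here.
                System.out.printf("utilization=%.2f -> control output=%.3f%n", u, output);
            }
        }
    }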

    Online Multi-User Workflow Scheduling Algorithm for Fairness and Energy Optimization

    This article tackles the problem of scheduling multi-user scientific workflows with unpredictable random arrivals and uncertain task execution times in a Cloud environment, from the Cloud provider's point of view. The solution is a deadline-sensitive online algorithm, named NEARDEADLINE, that optimizes two metrics: the energy consumption and the fairness between users. Scheduling workflows in a private Cloud environment is a difficult optimization problem, as capacity constraints must be fulfilled in addition to the dependency constraints between the tasks of the workflows. Furthermore, NEARDEADLINE is built upon a new workflow execution platform. To the best of our knowledge, no existing work combines both energy consumption and fairness metrics in its optimization problem. The experiments conducted on a real infrastructure (clusters of Grid'5000) demonstrate that the NEARDEADLINE algorithm offers real benefits in reducing energy consumption and enhancing user fairness.
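
    The abstract does not describe NEARDEADLINE's internals, so the sketch below is only a hypothetical illustration of the kind of decision a deadline-sensitive, energy-aware scheduler must make for each ready task: discard the hosts that cannot meet the deadline, then pick the cheapest remaining host by estimated energy. The host model, the numbers, and the selection rule are all invented.

    import java.util.*;

    /** Hypothetical sketch of a deadline-sensitive, energy-aware placement step;
     *  NEARDEADLINE's actual logic is not given in the abstract. */
    public class DeadlineAwarePlacement {
        record Host(String name, double powerWatts, double speedFactor) {}

        /** Pick the host that finishes the task before its deadline while
         *  minimizing estimated energy (power x estimated runtime). */
        static Optional<Host> place(double taskSeconds, double deadlineSeconds, List<Host> hosts) {
            return hosts.stream()
                    .filter(h -> taskSeconds / h.speedFactor() <= deadlineSeconds) // meets deadline
                    .min(Comparator.comparingDouble(
                            h -> h.powerWatts() * taskSeconds / h.speedFactor())); // least energy
        }

        public static void main(String[] args) {
            List<Host> hosts = List.of(
                    new Host("low-power", 60.0, 0.8),
                    new Host("fast", 220.0, 2.0));
            // A 600 s task with a 500 s deadline: only the fast host qualifies.
            System.out.println(place(600, 500, hosts));
            // With a relaxed 900 s deadline, the low-power host wins on energy
            // (60 W x 750 s = 45 kJ vs 220 W x 300 s = 66 kJ), showing the
            // tension between deadlines and energy that the article optimizes.
            System.out.println(place(600, 900, hosts));
        }
    }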

    Analysis and optimization of task granularity on the Java virtual machine

    Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks (i.e., small tasks carrying out few computations) may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks (i.e., large tasks performing substantial computations) may not fully utilize the available CPU cores, leading to missed parallelization opportunities. We focus on task-parallel applications running in a single Java Virtual Machine on a shared-memory multicore. Although their performance may depend considerably on the granularity of their tasks, this topic has received little attention in the literature. Our work fills this gap by analyzing and optimizing the task granularity of such applications. In this dissertation, we present a new methodology to accurately and efficiently collect the granularity of each executed task, implemented in a novel profiler. Our profiler collects carefully selected metrics from the whole system stack with low overhead. Our tool helps developers locate performance and scalability problems, and identifies the classes and methods where optimizations related to task granularity are needed, guiding developers towards useful optimizations. Moreover, we introduce a novel technique to drastically reduce the overhead of task-granularity profiling by reifying the class hierarchy of the target application within a separate instrumentation process. Our approach allows the instrumentation process to instrument only the classes representing tasks, inserting more efficient instrumentation code that decreases the overhead of task detection. Our technique significantly speeds up task-granularity profiling and thus enables the collection of accurate metrics with low overhead. We use our novel techniques to analyze task granularity in the DaCapo, ScalaBench, and Spark Perf benchmark suites. We reveal inefficiencies related to fine-grained and coarse-grained tasks in several workloads. We demonstrate that the collected task-granularity profiles are actionable by optimizing task granularity in numerous benchmarks, performing optimizations in the classes and methods indicated by our tool. Our optimizations result in significant speedups (up to a factor of 5.90) in numerous workloads suffering from fine- and coarse-grained tasks in different environments. Our results highlight the importance of analyzing and optimizing task granularity on the Java Virtual Machine.
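
    The dissertation's profiler works through bytecode instrumentation from a separate process and gathers metrics from the whole system stack; none of that machinery is reproduced here. Purely to make the notion of task granularity concrete, here is a hypothetical wrapper that times each submitted Runnable, which is the simplest possible granularity measurement. All names are invented.

    import java.util.concurrent.*;

    /** Hypothetical illustration of task-granularity measurement: wrap each
     *  Runnable so its wall-clock execution time is recorded when it runs.
     *  The profiler described above works differently and collects far more metrics. */
    public class GranularityDemo {
        static Runnable profiled(String name, Runnable task) {
            return () -> {
                long start = System.nanoTime();
                task.run();
                long elapsedMicros = (System.nanoTime() - start) / 1_000;
                // Very short times indicate fine-grained tasks whose scheduling
                // overhead may dominate their useful work.
                System.out.println(name + " granularity: " + elapsedMicros + " us");
            };
        }

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                final int n = i;
                pool.submit(profiled("task-" + n, () -> {
                    long sum = 0;                        // small CPU-bound payload
                    for (long j = 0; j < 1_000_000L * (n + 1); j++) sum += j;
                    if (sum == 42) System.out.println(); // keep the loop live
                }));
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }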

    Controlling fairness and task granularity in distributed, online, non-clairvoyant workflow executions

    Distributed computing infrastructures are commonly used for scientific computing, and science gateways provide complete middleware stacks to allow their transparent exploitation by end users. However, administrating such systems manually is time-consuming and sub-optimal because of the complexity of the execution conditions. Algorithms and frameworks aiming at automating system administration must deal with online and non-clairvoyant conditions, where most parameters are unknown and evolve over time. We consider the problem of controlling task granularity and fairness among scientific workflows executed in these conditions. We present two self-managing loops that monitor the fineness, coarseness, and fairness of workflow executions, compare these metrics with thresholds extracted from knowledge acquired in previous executions, and plan appropriate actions to keep these metrics within appropriate ranges. Experiments on the European Grid Infrastructure show that our task granularity control can speed up executions by up to a factor of 2, and that our fairness control reduces slowdown variability by a factor of 3 to 7 compared with first-come, first-served. We also study the interaction between granularity control and fairness control: our experiments demonstrate that controlling task granularity degrades fairness, but that our fairness control algorithm can compensate for this degradation.
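
    The paper's actual metric definitions, thresholds, and planned actions are not given in the abstract; the sketch below only illustrates the shape of such a self-managing loop, using an invented "fineness" metric (the share of recently completed tasks that ran very briefly) and an invented grouping action that batches waiting tasks pairwise when the execution looks too fine-grained.

    import java.util.*;

    /** Hypothetical sketch of one iteration of a granularity control loop.
     *  The metric and threshold are invented; the paper derives its thresholds
     *  from knowledge acquired in previous executions. */
    public class GranularityControlLoop {
        static final double FINENESS_THRESHOLD = 0.5;  // illustrative value
        static final double SHORT_TASK_SECONDS = 30.0; // illustrative cutoff

        /** Fineness: share of completed tasks that ran shorter than the cutoff. */
        static double fineness(List<Double> completedTaskSeconds) {
            long shortTasks = completedTaskSeconds.stream()
                    .filter(t -> t < SHORT_TASK_SECONDS).count();
            return (double) shortTasks / completedTaskSeconds.size();
        }

        /** Group waiting tasks pairwise when the execution looks too fine-grained. */
        static List<List<String>> plan(List<Double> completed, List<String> waiting) {
            List<List<String>> groups = new ArrayList<>();
            int groupSize = fineness(completed) > FINENESS_THRESHOLD ? 2 : 1;
            for (int i = 0; i < waiting.size(); i += groupSize)
                groups.add(waiting.subList(i, Math.min(i + groupSize, waiting.size())));
            return groups;
        }

        public static void main(String[] args) {
            List<Double> completed = List.of(5.0, 12.0, 45.0, 8.0); // 3 of 4 are short
            List<String> waiting = List.of("t1", "t2", "t3", "t4", "t5");
            System.out.println(plan(completed, waiting)); // [[t1, t2], [t3, t4], [t5]]
        }
    }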

    Grid-based virtual screening of compounds isolated in Vietnam

    Virtual Screening (VS) is a computational technique used in the drug discovery process to select the most promising candidate drugs for in vitro testing from millions of chemical compounds. This method offers an efficient alternative that can reduce the cost of drug discovery. The Natural Products Chemistry Institute of the Academy of Sciences of Vietnam (INPC) collects samples from local biodiversity and determines the 3D structure of single molecules. Its challenge is to set up a virtual screening platform on grid computing so that its chemists can process their data. However, as the number of users grows, with virtual screening applications of widely varying size (in terms of number of tasks and execution time) and limited computing resources available, it becomes crucial to devise an effective scheduling policy that can ensure a certain degree of fairness, user satisfaction, and overall system throughput. In this context, the thesis focuses on an effective scheduling policy for virtual screening workflows in which multiple users with varying numbers of tasks actively share a common system infrastructure. We studied candidate policies theoretically and, based on both simulation results and experiments on a real system, we propose the policy that best ensures fairness between users, which can be applied to the INPC virtual screening platform.

    The Natural Products Chemistry Institute of the Academy of Sciences of Vietnam (INPC) has for several years been developing a research activity around the discovery of new drugs derived from biodiversity. Developing a new drug takes on the order of ten years and goes through several phases. In the discovery phase, the activity of chemical compounds on a biological target is measured in order to identify an inhibitory action. The development of in silico approaches for the virtual screening of chemical compounds is an alternative to the classical in vitro approaches, which are much more expensive to implement. Grid computing has been identified as an economically promising way to support the search for new drugs in Vietnam. Indeed, the development of new strategies based on task-submission platforms (DIRAC, HTCaaS) has considerably improved success rates and user experience, paving the way for a democratization of the grid.

    In this context, the objective of this thesis is to study to what extent multidisciplinary platforms can meet the needs of INPC's chemists. The work focused on how a grid task-submission platform can be shared fairly by one or more user communities. Task scheduling on a shared server must ensure that the different groups have a positive and comparable experience. On the EGEE and EGI grid infrastructures in Europe, two broad categories of users can be distinguished: "normal" users, who request resources for tasks typically requiring from a few tens to a few hundred hours of computation, and "heavy" users, who launch large productions requiring the processing of several thousand tasks over tens or even hundreds of thousands of hours of computation. The scheduling strategies deployed today on platforms such as DIRAC or HTCaaS cannot serve these two families of users optimally and simultaneously.

    The manuscript presents a simulation-based evaluation of the performance of several task-scheduling strategies for a platform submitting pilot jobs. The SimGrid toolkit was used to simulate the regional grid infrastructure deployed in Auvergne, based on archived traces of its usage. After evaluating the performance of several scheduling policies from the literature, a new policy was proposed in which normal users and very heavy users are managed independently. With this policy, the slowdown experienced by very heavy users is reduced significantly without excessively penalizing normal users. The study was extended to a federation of clouds using the same resources and reached the same conclusions. The performance of the scheduling policies was then evaluated in production environments, namely the European Grid Infrastructure (EGI) and the national supercomputing infrastructure of South Korea. A DIRAC server was attached to the resources of EGI's biomedical virtual organization to study the slowdowns observed by the users of this server. Likewise, the slowdowns experienced by users of the HTCaaS platform at KISTI were found to be in excellent agreement with the SimGrid simulation results. This work confirms the feasibility and value of a single platform in Vietnam serving the scientific communities that use academic grid and cloud resources, in particular for the discovery of new drugs.
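
    The thesis's policy is only described at the level above (normal and very heavy users managed independently), so the following is a hypothetical sketch of one way such a separation can work: two queues with a fixed share of pilot-job slots reserved for normal users, so that large productions cannot starve small ones. The cutoff, the reserved share, and all names are invented.

    import java.util.*;

    /** Hypothetical sketch: serve "normal" and "heavy" users from separate queues,
     *  reserving a share of slots for normal users at each dispatch round. */
    public class TwoClassScheduler {
        record Task(String user, int userPendingTasks) {}

        private static final int HEAVY_USER_CUTOFF = 1000; // pending tasks (invented)
        private final Deque<Task> normalQueue = new ArrayDeque<>();
        private final Deque<Task> heavyQueue = new ArrayDeque<>();

        void submit(Task t) {
            if (t.userPendingTasks() > HEAVY_USER_CUTOFF) heavyQueue.add(t);
            else normalQueue.add(t);
        }

        /** Dispatch up to 'slots' tasks, reserving 30% of them for normal users. */
        List<Task> dispatch(int slots) {
            List<Task> scheduled = new ArrayList<>();
            int normalShare = (int) Math.ceil(slots * 0.3);
            while (normalShare-- > 0 && !normalQueue.isEmpty()) scheduled.add(normalQueue.poll());
            while (scheduled.size() < slots && !heavyQueue.isEmpty()) scheduled.add(heavyQueue.poll());
            while (scheduled.size() < slots && !normalQueue.isEmpty()) scheduled.add(normalQueue.poll());
            return scheduled;
        }

        public static void main(String[] args) {
            TwoClassScheduler s = new TwoClassScheduler();
            s.submit(new Task("alice", 10));
            s.submit(new Task("bigProduction", 50_000));
            s.submit(new Task("bob", 3));
            System.out.println(s.dispatch(2)); // alice keeps the reserved normal slot
        }
    }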