
    Desarrollo de un workflow genérico para el modelado de problemas de barrido paramétrico en sistemas distribuidos (Development of a generic workflow for modeling parameter sweep problems in distributed systems)

    This work presents the development and experimental validation of a generic workflow model applicable to any parameter sweep problem: the Parameter Sweep Scientific Workflow (PSWF) model. As part of it, a model for monitoring and managing scientific workflows on distributed systems is developed. This model, Star Superscalar Status (SsTAT), is applicable to the StarSs programming model family. PSWF and SsTAT can be used by the scientific community as a reference for solving problems with the parameter sweep strategy. As an integral part of the work, the treatment of the parameter sweep problem is formalized by developing a general solution based on the PSNSS (Parameter Sweep Nested Summation Symbol) algorithm, in both its original sequential version and a concurrent version. Both versions are implemented and validated, showing their applicability to all automatable phases of the PSWF lifecycle. Load testing shows that large-scale parameter sweep problems can be addressed efficiently with the proposed approach. In addition, the generic SsTAT monitoring and management model is instantiated for a Grid environment, yielding an operational implementation based on GRIDSs: GSTAT (GRID Superscalar Status). A series of tests on a real, heterogeneous Grid of computers shows that GSTAT performs its functions properly even in such a demanding environment. As a practical case, the proposed model is applied to determining molecular potential energy hypersurfaces. For this purpose, a specific instance of the workflow, called PSHYP (Parameter Sweep Hypersurfaces), is created.

    Graph-based task replication for workflow applications

    No full text
    The Grid is a heterogeneous and dynamic environment that enables distributed computation, which makes it a technology prone to failures. Some related work uses replication to overcome failures in sets of independent tasks and in workflow applications, but it does not consider possible resource limitations when scheduling the replicas. In this paper, we focus on task replication techniques for workflow applications, aiming not only to tolerate possible failures during an execution but also to speed up the computation without requiring the user to implement application-level checkpointing, which may be difficult depending on the application. Moreover, we study what to do when there are not enough resources to replicate all running tasks. We establish different replication priorities depending on the graph of the workflow application, giving higher priority to tasks with a higher output degree. We have implemented the proposed policy in the GRID superscalar system and have run fastDNAml as an experiment to show that our objectives are met. Finally, we have identified and studied a problem that may arise from the use of replication in workflow applications: the replication wait time. Peer reviewed.
