11 research outputs found

    A Comparison of Scheduling Approaches for Mixed-Parallel Applications on Heterogeneous Platforms

    Get PDF
    International audienceMixed-parallel applications can take advantage of large-scale computing platforms but scheduling them efficiently on such platforms is challenging. In this paper we compare the two main proposed approaches for solving this scheduling problem on a heterogeneous set of homogeneous clusters. We first modify previously proposed algorithms for both approaches and show that our modifications lead to significant improvements. We then perform a comparison of the modified algorithms in simulation over a wide range of application and platform conditions. We find that although both approaches have advantages, one of them is most likely he most appropriate for the majority of users

    Scheduling Delta-Critical Tasks in Mixed-Parallel Applications on a National Grid

    Get PDF
    International audienceMixed-parallel applications can take advantage of large-scale computing platforms but scheduling them efficiently on such platforms is challenging. When relying on classic list scheduling algorithms, the issue of independent and selfish task allocation determination may arise. Indeed the allocation of the most critical task may lead to poor allocations for subsequent tasks. In this paper we propose a new mixed-parallel scheduling heuristic that takes into account that several tasks may have almost the same level of criticality during the allocation process. We then perform a comparison of this heuristic with other algorithms in simulation over a wide range of application and on platform conditions. We find that our heuristic achieves better performance in terms of schedule length, speedup and degradation from best

    A Comparison of Scheduling Approaches for Mixed-Parallel Applications on Heterogeneous Platforms

    Get PDF
    International audienceMixed-parallel applications can take advantage of large-scale computing platforms but scheduling them efficiently on such platforms is challenging. In this paper we compare the two main proposed approaches for solving this scheduling problem on a heterogeneous set of homogeneous clusters. We first modify previously proposed algorithms for both approaches and show that our modifications lead to significant improvements. We then perform a comparison of the modified algorithms in simulation over a wide range of application and platform conditions. We find that although both approaches have advantages, one of them is most likely he most appropriate for the majority of users

    Hybrid MPI-Thread Parallelization of the Fast Multipole Method

    Get PDF
    We present in this paper multi-thread and multi-process parallelizations of the Fast Multipole Method (FMM) for Laplace equation, for uniform and non uniform distributions. These parallelizations apply to the original FMM formulation and to our new matrix formulation with BLAS (Basic Linear Algebra Subprograms) routines. Differences between the multi-thread and the multi-process versions are detailed, and a hybrid MPI-thread approach enables to gain parallel efficiency and memory scalability over the pure MPI one on clusters of SMP nodes. On 128 processors, we obtain 85% (respectively 75%) parallel efficiency for uniform (respectively non uniform) distributions with up to 100 million particles

    Towards Data Partitioning for Parallel Computing on Three Interconnected Clusters

    Get PDF
    6th International Symposium on Parallel and Distributed Computing, 2007 (ISPDC 2007), Hagenberg, Austria, 5 - 8 July 2007We present a new data partitioning strategy for parallel computing on three interconnected clusters. This partitioning has two advantages over existing partitionings. First it can reduce communication time due to a lower total volume of communication and a more efficient communication schedule. When the network topology is a linear array this partitioning always results in a lower total volume of communication compared to existing partitionings, provided the most powerful node is at the center of the array. When the topology is fully connected this partitioning results in a lower total volume of communication for all but a few power ratios. Second, it allows for the overlapping of communication and computation. These two inherent advantages work together to reduce overall execution time significantly.Science Foundation Irelan

    Optimizing recovery protocols for replicated database systems

    Full text link
    En la actualidad, el uso de tecnologías de informacíon y sistemas de cómputo tienen una gran influencia en la vida diaria. Dentro de los sistemas informáticos actualmente en uso, son de gran relevancia los sistemas distribuidos por la capacidad que pueden tener para escalar, proporcionar soporte para la tolerancia a fallos y mejorar el desempeño de aplicaciones y proporcionar alta disponibilidad. Los sistemas replicados son un caso especial de los sistemas distribuidos. Esta tesis está centrada en el área de las bases de datos replicadas debido al uso extendido que en el presente se hace de ellas, requiriendo características como: bajos tiempos de respuesta, alto rendimiento en los procesos, balanceo de carga entre las replicas, consistencia e integridad de datos y tolerancia a fallos. En este contexto, el desarrollo de aplicaciones utilizando bases de datos replicadas presenta dificultades que pueden verse atenuadas mediante el uso de servicios de soporte a mas bajo nivel tales como servicios de comunicacion y pertenencia. El uso de los servicios proporcionados por los sistemas de comunicación de grupos permiten ocultar los detalles de las comunicaciones y facilitan el diseño de protocolos de replicación y recuperación. En esta tesis, se presenta un estudio de las alternativas y estrategias empleadas en los protocolos de replicación y recuperación en las bases de datos replicadas. También se revisan diferentes conceptos sobre los sistemas de comunicación de grupos y sincronia virtual. Se caracterizan y clasifican diferentes tipos de protocolos de replicación con respecto a la interacción o soporte que pudieran dar a la recuperación, sin embargo el enfoque se dirige a los protocolos basados en sistemas de comunicación de grupos. Debido a que los sistemas comerciales actuales permiten a los programadores y administradores de sistemas de bases de datos renunciar en alguna medida a la consistencia con la finalidad de aumentar el rendimiento, es importante determinar el nivel de consistencia necesario. En el caso de las bases de datos replicadas la consistencia está muy relacionada con el nivel de aislamiento establecido entre las transacciones. Una de las propuestas centrales de esta tesis es un protocolo de recuperación para un protocolo de replicación basado en certificación. Los protocolos de replicación de base de datos basados en certificación proveen buenas bases para el desarrollo de sus respectivos protocolos de recuperación cuando se utiliza el nivel de aislamiento snapshot. Para tal nivel de aislamiento no se requiere que los readsets sean transferidos entre las réplicas ni revisados en la fase de cetificación y ya que estos protocolos mantienen un histórico de la lista de writesets que es utilizada para certificar las transacciones, este histórico provee la información necesaria para transferir el estado perdido por la réplica en recuperación. Se hace un estudio del rendimiento del protocolo de recuperación básico y de la versión optimizada en la que se compacta la información a transferir. Se presentan los resultados obtenidos en las pruebas de la implementación del protocolo de recuperación en el middleware de soporte. La segunda propuesta esta basada en aplicar el principio de compactación de la informacion de recuperación en un protocolo de recuperación para los protocolos de replicación basados en votación débil. El objetivo es minimizar el tiempo necesario para transfeir y aplicar la información perdida por la réplica en recuperación obteniendo con esto un protocolo de recuperación mas eficiente. Se ha verificado el buen desempeño de este algoritmo a través de una simulación. Para efectuar la simulación se ha hecho uso del entorno de simulación Omnet++. En los resultados de los experimentos puede apreciarse que este protocolo de recuperación tiene buenos resultados en múltiples escenarios. Finalmente, se presenta la verificación de la corrección de ambos algoritmos de recuperación en el Capítulo 5.Nowadays, information technology and computing systems have a great relevance on our lives. Among current computer systems, distributed systems are one of the most important because of their scalability, fault tolerance, performance improvements and high availability. Replicated systems are a specific case of distributed system. This Ph.D. thesis is centered in the replicated database field due to their extended usage, requiring among other properties: low response times, high throughput, load balancing among replicas, data consistency, data integrity and fault tolerance. In this scope, the development of applications that use replicated databases raises some problems that can be reduced using other fault-tolerant building blocks, as group communication and membership services. Thus, the usage of the services provided by group communication systems (GCS) hides several communication details, simplifying the design of replication and recovery protocols. This Ph.D. thesis surveys the alternatives and strategies being used in the replication and recovery protocols for database replication systems. It also summarizes different concepts about group communication systems and virtual synchrony. As a result, the thesis provides a classification of database replication protocols according to their support to (and interaction with) recovery protocols, always assuming that both kinds of protocol rely on a GCS. Since current commercial DBMSs allow that programmers and database administrators sacrifice consistency with the aim of improving performance, it is important to select the appropriate level of consistency. Regarding (replicated) databases, consistency is strongly related to the isolation levels being assigned to transactions. One of the main proposals of this thesis is a recovery protocol for a replication protocol based on certification. Certification-based database replication protocols provide a good basis for the development of their recovery strategies when a snapshot isolation level is assumed. In that level readsets are not needed in the validation step. As a result, they do not need to be transmitted to other replicas. Additionally, these protocols hold a writeset list that is used in the certification/validation step. That list maintains the set of writesets needed by the recovery protocol. This thesis evaluates the performance of a recovery protocol based on the writeset list tranfer (basic protocol) and of an optimized version that compacts the information to be transferred. The second proposal applies the compaction principle to a recovery protocol designed for weak-voting replication protocols. Its aim is to minimize the time needed for transferring and applying the writesets lost by the recovering replica, obtaining in this way an efficient recovery. The performance of this recovery algorithm has been checked implementing a simulator. To this end, the Omnet++ simulating framework has been used. The simulation results confirm that this recovery protocol provides good results in multiple scenarios. Finally, the correction of both recovery protocols is also justified and presented in Chapter 5.García Muñoz, LH. (2013). Optimizing recovery protocols for replicated database systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/31632TESI

    Crash recovery with partial amnesia failure model issues

    Full text link
    Replicated systems are a kind of distributed systems whose main goal is to ensure that computer systems are highly available, fault tolerant and provide high performance. One of the last trends in replication techniques managed by replication protocols, make use of Group Communication Sys- tem, and more specifically of the communication primitive atomic broadcast for developing more eficient replication protocols. An important aspect in these systems consists in how they manage the disconnection of nodes {which degrades their service{ and the connec- tion/reconnection of nodes for maintaining their original support. This task is delegated in replicated systems to recovery protocols. How it works de- pends specially on the failure model adopted. A model commonly used for systems managing large state is the crash-recovery with partial amnesia be- cause it implies short recovery periods. But, assuming it implies arising several problems. Most of them have been already solved in the literature: view management, abort of local transactions started in crashed nodes { when referring to transactional environments{ or for example the reinclu- sion of new nodes to the replicated system. Anyway, there is one problem related to the assumption of this second failure model that has not been completely considered: the amnesia phenomenon. Phenomenon that can lead to inconsistencies if it is not correctly managed. This work presents this inconsistency problem due to the amnesia and formalizes it, de ning the properties that must be ful lled for avoiding it and de ning possible solutions. Besides, it also presents and formalizes an inconsistency problem {due to the amnesia{ which appears under a speci c sequence of events allowed by the majority partition progress condition that will imply to stop the system, proposing the properties for overcoming it and proposing di erent solutions. As a consequence it proposes a new majority partition progress condition. In the sequel there is deDe Juan Marín, R. (2008). Crash recovery with partial amnesia failure model issues [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/3302Palanci

    Bridging a Gap Between Research and Production: Contributions to Scheduling and Simulation

    Get PDF
    Large scale distributed computing infrastructures (e.g., data centers, grids, or clouds) are used by scientists from various domains to produce outstanding research results, such as the discovery of the Higgs Boson in High Energy Physics. These infrastructures are also studied by Computer Scientists to produce their own set of scientific results. Ideally, a virtuous circle should exist between Domain and Computer Scientists: the former raising challenges that could be addressed by the latter. Unfortunately, in many occasions, a gap exists that prevents such an ideal and fostering collaboration. This habilitation covers research works conducted in the fields of scheduling and simulation that contribute to the filling of this gap. It discusses the necessary conditions to achieve this goal and details concrete initiatives in this endeavor

    Revised reference model

    Get PDF
    This document contains an update of the HIDENETS Reference Model, whose preliminary version was introduced in D1.1. The Reference Model contains the overall approach to development and assessment of end-to-end resilience solutions. As such, it presents a framework, which due to its abstraction level is not only restricted to the HIDENETS car-to-car and car-to-infrastructure applications and use-cases. Starting from a condensed summary of the used dependability terminology, the network architecture containing the ad hoc and infrastructure domain and the definition of the main networking elements together with the software architecture of the mobile nodes is presented. The concept of architectural hybridization and its inclusion in HIDENETS-like dependability solutions is described subsequently. A set of communication and middleware level services following the architecture hybridization concept and motivated by the dependability and resilience challenges raised by HIDENETS-like scenarios is then described. Besides architecture solutions, the reference model addresses the assessment of dependability solutions in HIDENETS-like scenarios using quantitative evaluations, realized by a combination of top-down and bottom-up modelling, as well as verification via test scenarios. In order to allow for fault prevention in the software development phase of HIDENETS-like applications, generic UML-based modelling approaches with focus on dependability related aspects are described. The HIDENETS reference model provides the framework in which the detailed solution in the HIDENETS project are being developed, while at the same time facilitating the same task for non-vehicular scenarios and application
    corecore