
HAL UVSQ

A study of various load information exchange mechanisms for a distributed application using dynamic scheduling

Author: Guermouche Abdou
L'Excellent Jean-Yves
Publication venue: HAL CCSD
Publication date: 01/01/2005

We consider a distributed asynchronous system where processes can only communicate by message passing and need a coherent view of the load (e.g., workload, memory) of the others to take dynamic scheduling decisions. We present several mechanisms to obtain a distributed view of such information. In the first type of approach, the view is maintained through regular message exchanges; in the second type (demand-driven or snapshot mechanisms), the process that needs the information sends a request and then receives the load information corresponding to that request. We perform an experimental study in the context of a real application that uses distributed dynamic schedulers, an asynchronous parallel solver for large sparse systems of linear equations.
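The two families of mechanisms named in this abstract can be contrasted with a small single-process simulation. This is only an illustrative sketch, not the authors' implementation: the class names, the broadcast `threshold`, and the message counting are assumptions, and the demand-driven variant ignores the concurrency issues that a real snapshot algorithm has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One simulated process: its true load and its view of the others."""
    rank: int
    load: float = 0.0
    last_sent: float = 0.0                     # load value last broadcast
    view: dict = field(default_factory=dict)   # rank -> last known load


class MaintainedView:
    """Each process broadcasts its load once it drifts by >= threshold (assumed policy)."""

    def __init__(self, n, threshold):
        self.procs = [Process(r) for r in range(n)]
        self.threshold = threshold
        self.messages = 0
        for p in self.procs:
            p.view = {q.rank: 0.0 for q in self.procs if q.rank != p.rank}

    def add_load(self, rank, delta):
        p = self.procs[rank]
        p.load += delta
        if abs(p.load - p.last_sent) >= self.threshold:
            for q in self.procs:
                if q.rank != rank:
                    q.view[rank] = p.load      # one update message per peer
                    self.messages += 1
            p.last_sent = p.load

    def least_loaded(self, rank):
        # Decision uses the local, possibly stale, view: no message needed.
        view = self.procs[rank].view
        return min(view, key=view.get)


class DemandDriven:
    """The requester asks every peer for its current load when it must decide."""

    def __init__(self, n):
        self.procs = [Process(r) for r in range(n)]
        self.messages = 0

    def add_load(self, rank, delta):
        self.procs[rank].load += delta         # no message on load change

    def least_loaded(self, rank):
        replies = {}
        for q in self.procs:
            if q.rank != rank:
                self.messages += 2             # one request + one reply
                replies[q.rank] = q.load
        return min(replies, key=replies.get)
```

In this toy model the maintained view pays messages whenever loads change and may decide on stale values, while the demand-driven scheme pays messages only at decision time but always sees current values.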

HAL-ENS-LYON

Elsevier - Publisher Connector

Timed Specification For Web Services Compatibility Analysis

Author: Guermouche Nawal
Perrin Olivier
Ringeissen Christophe
Publication venue: Elsevier B.V.
Publication date: 14/12/2007

Web services are becoming one of the main technologies for designing and building complex inter-enterprise business applications. Usually, a business application cannot be fulfilled by one Web service but by coordinating a set of them. To perform such a coordination, one of the important investigations is the compatibility analysis. Two Web services are said to be compatible if they can interact correctly. In the literature, the proposed frameworks for service compatibility checking rely on the supported sequences of messages. The interaction of services also depends on other properties, such as the exchanged data flow. Thus, considering only the supported sequences of messages seems to be insufficient. Temporal constraints are another property on which service interaction can rely. In this paper, we focus on the compatibility analysis of Web services regarding (1) their supported sequences of messages, (2) the exchanged data flow, (3) constraints related to the exchanged data flow and (4) the temporal requirements. Based on these properties, we study three compatibility classes: (i) absolute compatibility, (ii) likely compatibility and (iii) absolute incompatibility.

HAL - Université de Franche-Comté

Self-management of machine-to-machine communications: a multi-models approach

Author: Eichler Cédric
Gharbi Ghada
Guermouche Nawal
Monteil Thierry
Stolf Patricia
Publication venue: Inderscience Publishers
Publication date: 01/01/2016

The Machine-to-Machine (M2M) paradigm applies to systems composed of numerous devices sharing information and making cooperative decisions with little or no human intervention. The M2M standard defined by the European Telecommunications Standards Institute (ETSI) is the only one providing an end-to-end view of the global M2M architecture. Notably, it provides a standardised framework for inter-operable M2M services that satisfies most M2M modelling requirements. However, even though M2M systems usually operate in highly evolving contexts, this standard does not address the issue of system adaptations. It is furthermore unsuitable for building self-managed systems. This paper introduces a multi-model approach for modelling manageable M2M systems. The approach consists of a formal graph-based model on top of the ETSI M2M standard, alongside bi-directional updates that ensure layer coherency. Its fitness for enforcing self-management properties is demonstrated by designing high-level reconfiguration rules. Finally, its applicability is illustrated and evaluated using a smart-metering application.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL-INSA Toulouse

Multifrontal QR Factorization for Multicore Architectures over Runtime Systems

Author: Agullo Emmanuel
Buttari Alfredo
Guermouche Abdou
Lopez Florent
Publication venue: HAL CCSD
Publication date: 01/01/2013

To face the advent of multicore processors and the ever increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high performance, scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability of runtime systems for complex applications, namely, sparse matrix multifrontal factorizations, which constitute extremely irregular workloads, with tasks of different granularities and characteristics and with a variable memory consumption. Experimental results on real-life matrices show that it is possible to achieve the same efficiency as with an ad hoc scheduler which relies on the knowledge of the algorithm. A detailed analysis shows the performance behavior of the resulting code and possible ways of improving the effectiveness of runtime systems.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime systems

Author: Agullo Emmanuel
Buttari Alfredo
Guermouche Abdou
Lopez Florent
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/07/2016

To face the advent of multicore processors and the ever increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high performance, scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability and effectiveness of runtime systems based on the Sequential Task Flow model for complex applications, namely, sparse matrix multifrontal factorizations, which feature extremely irregular workloads, with tasks of different granularities and characteristics and with a variable memory consumption. Most importantly, it shows how this parallel programming model eases the development of complex features that benefit the performance of sparse, direct solvers as well as their memory consumption. We illustrate our discussion with the multifrontal QR factorization running on top of the StarPU runtime system. ACM Reference Format: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche and Florent Lopez, 2014. Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime system
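The Sequential Task Flow idea mentioned in this abstract (tasks are submitted in program order with their data access modes, and the runtime infers the DAG) can be sketched as follows. This is a toy model, not the StarPU API: the `TaskFlow` class, the `submit` signature, and the task and data names are all hypothetical and chosen only to mimic a small parent front with two children.

```python
from collections import defaultdict


class TaskFlow:
    """Toy dependency tracker: infers a task DAG from sequential submissions."""

    def __init__(self):
        self.deps = {}                     # task -> set of tasks it waits for
        self.last_writer = {}              # data -> last task that wrote it
        self.readers = defaultdict(list)   # data -> readers since last write

    def submit(self, name, reads=(), writes=()):
        deps = set()
        for d in reads:
            # Read-after-write: wait for the last writer of the data.
            if d in self.last_writer:
                deps.add(self.last_writer[d])
        for d in writes:
            # Write-after-write and write-after-read dependencies.
            if d in self.last_writer:
                deps.add(self.last_writer[d])
            deps.update(self.readers[d])
        for d in reads:
            self.readers[d].append(name)
        for d in writes:
            self.last_writer[d] = name
            self.readers[d] = []           # a new write resets the reader list
        deps.discard(name)
        self.deps[name] = deps


# Hypothetical submission order for a parent front with two child fronts.
flow = TaskFlow()
flow.submit("factor_child1", writes=["front1"])
flow.submit("factor_child2", writes=["front2"])
flow.submit("assemble_parent", reads=["front1", "front2"], writes=["front3"])
flow.submit("factor_parent", reads=["front3"], writes=["front3"])
```

Here the two child factorizations end up with no dependencies and could run concurrently, while the parent assembly waits for both; none of these edges was written explicitly by the caller.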

Scientific Publications of the University of Toulouse II Le Mirail

Exploiting a Parametrized Task Graph model for the parallelization of a sparse direct multifrontal solver

Author: Agullo Emmanuel
Bosilca George
Buttari Alfredo
Guermouche Abdou
Lopez Florent
Publication venue: Springer Science and Business Media LLC
Publication date: 24/08/2016

The advent of multicore processors requires reconsidering the design of high performance computing libraries to embrace portable and effective techniques of parallel software engineering. One of the most promising approaches consists in abstracting an application as a directed acyclic graph (DAG) of tasks. While this approach has been popularized for shared memory environments by the OpenMP 4.0 standard, where dependencies between tasks are automatically inferred, we investigate an alternative approach, capable of describing the DAG of tasks in a distributed setting, where task dependencies are explicitly encoded. So far this approach has mostly been used for algorithms with a regular data access pattern, and we show in this study that it can be efficiently applied to a highly irregular numerical algorithm such as a sparse multifrontal QR method. We present the resulting implementation and discuss the potential and limits of this approach in terms of productivity and effectiveness in comparison with more common parallelization techniques. Although the work is at an early stage of development, preliminary results show the potential of the parallel programming model that we investigate.

Scientific Publications of the University of Toulouse II Le Mirail

Multi-criteria checkpointing strategies: response-time versus resource utilization

Author: Bouteiller Aurélien
Cappello Franck
Dongarra Jack
Guermouche Amina
Herault Thomas
Robert Yves
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Failures are increasingly threatening the efficiency of HPC systems, and current projections of Exascale platforms indicate that rollback recovery, the most convenient method for providing fault tolerance to general-purpose applications, reaches its own limits at such scales. One of the reasons explaining this unnerving situation comes from the focus that has been given to per-application completion time, rather than to platform efficiency. In this paper, we discuss the case of uncoordinated rollback recovery where the idle time spent waiting for recovering processors is used to progress a different, independent application from the system batch queue. We then propose an extended model of uncoordinated checkpointing that can discriminate between idle time and wasted computation. We instantiate this model in a simulator to demonstrate that, with this strategy, the per-application completion time of uncoordinated checkpointing is unchanged, while it delivers near-perfect platform efficiency.

Scientific Publications of the University of Toulouse II Le Mirail

Hybrid scheduling for the parallel solution of linear systems

Author: Abdou Guermouche
Jean-Yves L’Excellent
Patrick R. Amestoy
Stéphane Pralet
École Normale
Publication venue: HAL CCSD
Publication date: 01/01/2004

In this paper, we consider the problem of designing a dynamic scheduling strategy that takes into account both workload and memory information in the context of the parallel multifrontal factorization of sparse matrices. The resulting strategies are bi-criteria: they address both the performance and the memory consumption of the algorithm. The originality of our approach is that we base our estimations (work and memory) on a static optimistic scenario, simulated during the analysis phase. This scenario is then used during the factorization phase to constrain the dynamic scheduling decisions. The task scheduler has been redesigned and a new scheduler implemented to take these constraints into account. Moreover, performance has been improved because the new constraints allow the new scheduler to make better decisions that were forbidden or too dangerous in unconstrained formulations. Performance analysis shows that the memory estimation becomes much closer to the memory effectively used and that, even in a constrained memory environment, the factorization time is significantly reduced with respect to the initial approach.

HAL-ENS-LYON

CiteSeerX

HAL Descartes

Experimental analysis of vectorized instructions impact on energy and power consumption under thermal design power constraints

Author: Guermouche Amina
Orgerie Anne-Cécile
Publication venue: Royal College of Obstetricians & Gynaecologists (RCOG)
Publication date: 01/01/2019

Vectorized instructions were introduced to improve the performance of applications. However, they come with an increase in power consumption. As a consequence, processors are designed to limit their frequency when such instructions are used in order to stay within the thermal design power. In this paper, we study and compare the impact of thermal design power and SIMD instructions on the performance, power and energy consumption of processors and memory. The study is performed on three different architectures providing different characteristics and four applications with different profiles (including one application with different phases, each phase having a different profile). The study shows that, because of processor frequency, performance and power consumption are strongly related under thermal design power. It also shows that AVX512 has unexpected behavior regarding processor power consumption, while DRAM power consumption is impacted by SIMD instructions because of the generated memory throughput.