    Minimizing Energy Consumption of MPI Programs in Realistic Environment

    Dynamic voltage and frequency scaling is an efficient way of reducing the energy consumption of servers. Energy savings are typically achieved by setting a well-chosen frequency during some program phases. However, determining suitable program phases and their associated optimal frequencies is a complex problem. Moreover, hardware is constrained by non-negligible frequency transition latencies. Various heuristics have therefore been proposed to determine and apply frequencies, but evaluating their efficiency remains an issue. In this paper, we translate the energy minimization problem into a mixed integer program that specifically models most current hardware limitations. The solution of this problem then gives an estimate of the minimal energy consumption and the associated frequency schedule. The paper provides two different formulations and discusses the feasibility of each of them on realistic applications.

    A study of various load information exchange mechanisms for a distributed application using dynamic scheduling

    We consider a distributed asynchronous system where processes can only communicate by message passing and need a coherent view of the load (e.g., pending work, memory used) of the other processes to take dynamic scheduling decisions. We present several mechanisms to obtain a distributed view of such information. In the first type of approach, the view is maintained through regular message exchanges; in the second (demand-driven or snapshot mechanisms), the process that needs the information sends a request and then receives the corresponding load information. We perform an experimental study of these approaches in the context of a real application that uses distributed dynamic schedulers, an asynchronous parallel solver for large sparse systems of linear equations.

    Timed Specification For Web Services Compatibility Analysis

    Web services are becoming one of the main technologies for designing and building complex inter-enterprise business applications. Usually, a business application cannot be fulfilled by a single Web service but requires coordinating a set of them. An important step in performing such a coordination is the compatibility analysis. Two Web services are said to be compatible if they can interact correctly. In the literature, the proposed frameworks for checking service compatibility rely on the supported sequences of messages. However, the interaction of services also depends on other properties, such as the exchanged data flow, so considering only the supported sequences of messages is insufficient. Temporal constraints are another property on which service interaction can rely. In this paper, we focus on the compatibility analysis of Web services with regard to (1) their supported sequences of messages, (2) the exchanged data flow, (3) constraints related to the exchanged data flow and (4) the temporal requirements. Based on these properties, we study three compatibility classes: (i) absolute compatibility, (ii) likely compatibility and (iii) absolute incompatibility.

    Self-management of machine-to-machine communications: a multi-models approach

    The Machine-to-Machine (M2M) paradigm applies to systems composed of numerous devices sharing information and making cooperative decisions with little or no human intervention. The M2M standard defined by the European Telecommunications Standards Institute (ETSI) is the only one providing an end-to-end view of the global M2M architecture. Notably, it provides a standardised framework for inter-operable M2M services that satisfies most M2M modelling requirements. However, even though M2M systems usually operate in highly evolving contexts, this standard does not address the issue of system adaptations. It is furthermore unsuitable for building self-managed systems. This paper introduces a multi-model approach for modelling manageable M2M systems. The approach consists of a formal graph-based model on top of the ETSI M2M standard, together with bi-directional updates that ensure layer coherency. Its fitness for enforcing self-management properties is demonstrated by designing high-level reconfiguration rules. Finally, its applicability is illustrated and evaluated using a smart-metering application.

    Multifrontal QR Factorization for Multicore Architectures over Runtime Systems

    To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, as well as powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability of runtime systems for complex applications, namely sparse matrix multifrontal factorizations, which constitute extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. Experimental results on real-life matrices show that it is possible to achieve the same efficiency as with an ad hoc scheduler that relies on knowledge of the algorithm. A detailed analysis shows the performance behavior of the resulting code and possible ways of improving the effectiveness of runtime systems.

    Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime systems

    To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, as well as powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability and effectiveness of runtime systems based on the Sequential Task Flow model for complex applications, namely sparse matrix multifrontal factorizations, which feature extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. Most importantly, it shows how this parallel programming model eases the development of complex features that benefit the performance of sparse direct solvers as well as their memory consumption. We illustrate our discussion with the multifrontal QR factorization running on top of the StarPU runtime system. ACM Reference Format: Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche and Florent Lopez, 2014. Implementing multifrontal sparse solvers for multicore architectures with Sequential Task Flow runtime system

    Exploiting a Parametrized Task Graph model for the parallelization of a sparse direct multifrontal solver

    The advent of multicore processors requires reconsidering the design of high-performance computing libraries to embrace portable and effective techniques of parallel software engineering. One of the most promising approaches consists in abstracting an application as a directed acyclic graph (DAG) of tasks. While this approach has been popularized for shared-memory environments by the OpenMP 4.0 standard, where dependencies between tasks are automatically inferred, we investigate an alternative approach, capable of describing the DAG of tasks in a distributed setting, where task dependencies are explicitly encoded. So far this approach has mostly been used for algorithms with a regular data access pattern, and we show in this study that it can be efficiently applied to a highly irregular numerical algorithm such as a sparse multifrontal QR method. We present the resulting implementation and discuss the potential and limits of this approach in terms of productivity and effectiveness in comparison with more common parallelization techniques. Although the work is at an early stage of development, preliminary results show the potential of the parallel programming model that we investigate.

    Multi-criteria checkpointing strategies: response-time versus resource utilization

    Failures are increasingly threatening the efficiency of HPC systems, and current projections of Exascale platforms indicate that rollback recovery, the most convenient method for providing fault tolerance to general-purpose applications, reaches its own limits at such scales. One of the reasons for this unnerving situation is the focus that has been given to per-application completion time, rather than to platform efficiency. In this paper, we discuss the case of uncoordinated rollback recovery where the idle time spent waiting for recovering processors is used to progress a different, independent application from the system batch queue. We then propose an extended model of uncoordinated checkpointing that can discriminate between idle time and wasted computation. We instantiate this model in a simulator to demonstrate that, with this strategy, the per-application completion time under uncoordinated checkpointing is unchanged, while platform efficiency is near-perfect.

    Hybrid scheduling for the parallel solution of linear systems

    In this paper, we consider the problem of designing a dynamic scheduling strategy that takes into account both workload and memory information in the context of the parallel multifrontal factorization of sparse matrices. The originality of our approach is that we base our estimations (work and memory) on a static optimistic scenario simulated during the analysis phase. This scenario is then used during the factorization phase to constrain the dynamic scheduling decisions. The task scheduler has been redesigned to take these new constraints into account. Moreover, performance has been improved because the new constraints allow the new scheduler to make better decisions that were forbidden or too dangerous in unconstrained formulations. Performance analysis shows that the memory estimation becomes much closer to the memory effectively used and that, even in a constrained memory environment, the factorization time decreases with respect to the initial approach.

    Experimental analysis of vectorized instructions impact on energy and power consumption under thermal design power constraints

    Vectorized instructions were introduced to improve the performance of applications. However, they come with an increase in power consumption. As a consequence, processors are designed to limit their frequency when such instructions are used in order to stay within the thermal design power. In this paper, we study and compare the impact of thermal design power and SIMD instructions on the performance, power and energy consumption of processors and memory. The study is performed on three different architectures with different characteristics and on four applications with different profiles (including one application with different phases, each phase having a different profile). The study shows that, because of processor frequency, performance and power consumption are strongly related under thermal design power. It also shows that AVX512 has unexpected behavior regarding processor power consumption, while DRAM power consumption is impacted by SIMD instructions because of the generated memory throughput.
