11 research outputs found

    Improving Memory Hierarchy Utilisation for Stencil Computations on Multicore Machines

    Full text link
    Although modern supercomputers are composed of multicore machines, some scientists still execute legacy applications that were developed for monocore clusters, where the memory hierarchy is dedicated to a single core. The main objective of this paper is to propose and evaluate an algorithm that identifies an efficient blocksize to be applied to MPI stencil computations on multicore machines. In the light of an extensive experimental analysis, this work shows the benefits of identifying blocksizes that divide the data across the various cores, and suggests a methodology that exploits the memory hierarchy available in modern machines
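    The blocksize idea above can be illustrated with a cache-blocking (tiling) sketch for a 2D Jacobi-style stencil sweep. This is a generic example of the technique, not the paper's algorithm; the block sizes `bi` and `bj` are illustrative assumptions.

    ```python
    # Minimal sketch of cache blocking for a 2D Jacobi stencil sweep.
    # Tiling the loops improves locality on multicore machines; block
    # sizes (bi, bj) here are illustrative, not the paper's choices.

    def jacobi_sweep(grid):
        """Plain sweep: average of the four neighbours, read from `grid`."""
        n = len(grid)
        out = [row[:] for row in grid]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                out[i][j] = (grid[i - 1][j] + grid[i + 1][j] +
                             grid[i][j - 1] + grid[i][j + 1]) / 4.0
        return out

    def jacobi_sweep_blocked(grid, bi=4, bj=4):
        """Same sweep, visiting the interior in bi-by-bj tiles."""
        n = len(grid)
        out = [row[:] for row in grid]
        for ii in range(1, n - 1, bi):
            for jj in range(1, n - 1, bj):
                for i in range(ii, min(ii + bi, n - 1)):
                    for j in range(jj, min(jj + bj, n - 1)):
                        out[i][j] = (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]) / 4.0
        return out
    ```

    Both functions compute identical results; the blocked variant only changes the traversal order, which is what makes blocksize selection a pure performance question.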

    Versatile Communication Cost Modelling for Multicomputer Task Scheduling

    No full text
    Institute for Computing Systems Architecture. Programmers face daunting problems when attempting to design portable programs for multicomputers. This is mainly due to the huge variation in communication performance on the range of multicomputer platforms currently in use. These programmers require a computational model that is sufficiently abstract to allow them to ignore machine-specific performance features, and yet sufficiently versatile to allow the computational structure to be mapped efficiently to a wide range of multicomputer platforms. This dissertation focusses on parallel computations that can be expressed as task graphs: tasks that must be scheduled on the multicomputer's processors. In the past, scheduling models have considered only the message delay as the predominant communication parameter. In the current generation of parallel machines, however, latency is negligible compared to the CPU penalty of the activity associated with inter-processor communication. This CPU penalty cannot be modelled by a latency parameter because the CPU activity consumes time otherwise available for useful computation. In view of this, we consider a model in which the CPU penalty is significant and is associated with communication events that are incurred when applications execute in parallel. In this dissertation a new multi-stage scheduling approach that takes into account these communication parameters is proposed. Initially, in the first stage, the input task graph is transformed into a new structure that can be scheduled with a smaller number of communication events. Task replication is incorporated to produce clusters of tasks. However, a different view of clusters is adopted. Tasks are clustered so that messages are bundled and, consequently, the number of communication events is decreased. The communication event tasks are associated with the relationship between the clusters. 
More specifically, this stage comprises a family of scheduling heuristics that can be customised to classes of parallel machines, according to their communication performance characteristics, through parameterisation and by varying the order in which the heuristics are applied. A second stage is necessary, in which the actual schedule on the target machine is defined. The mechanisms implemented carefully analyse the clusters and their relationships so that communication costs are minimised and the degree of parallelism is exploited. The aim of the proposed approach is therefore to tackle the min-max problem while considering realistic architectural issues

    TOWARDS OPTIMAL STATIC TASK SCHEDULING FOR REALISTIC MACHINE MODELS: THEORY AND PRACTICE

    No full text
    Task scheduling is a key element in achieving high performance from multicomputer systems. Efficient scheduling algorithms reduce the interprocessor communication and improve processor utilization. To do so effectively, such algorithms must be based on a communication cost model appropriate for computing systems in use. The optimal scheduling of tasks is NP-hard, and a large number of heuristic algorithms have been proposed for a range of differing scheduling conditions (graph types, granularities and cost or architectural models). Unfortunately, due both to the variety of systems available and the rate at which these systems evolve, an appropriate representative cost model has yet to be established. In this paper we study the problem of task scheduling unde

    On the Scope of Applicability of the ETF Algorithm

    No full text
    Superficially, the Earliest Task First (ETF) heuristic [1] is attractive because it models heterogeneous messages passing through a heterogeneous network. On closer inspection, however, this is precisely the set of circumstances that can cause ETF to produce seriously sub-optimal schedules. In this paper we analyze the scope of applicability of ETF. We show that ETF performs well if messages are short and the links are fast, and poorly otherwise. For the first application we choose the Diamond DAG with unit execution time for each task and a multiprocessor system in the form of a fully connected network. We show that ETF partitions the DAG into lines, each of which is scheduled on the same processor. The analysis reveals that if the communication times between pairs of adjacent tasks in a precedence relation are all less than or equal to unit time then the schedule is optimal. If the communication time is equal to the processing time needed to evaluate a row then the ..
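    For orientation, the ETF idea can be sketched as a greedy list scheduler: at each step, among the ready tasks, pick the (task, processor) pair with the earliest possible start time, where a predecessor's result must cross the network if it was placed on a different processor. This is an illustrative reconstruction of the heuristic on a homogeneous fully connected machine, not the paper's exact formulation.

    ```python
    # Illustrative ETF-style greedy scheduler on a fully connected,
    # homogeneous machine. `comm` gives per-edge message delays; a
    # message is free if both endpoints land on the same processor.

    def etf_schedule(tasks, preds, cost, comm, nprocs):
        """tasks: ids; preds: {t: [predecessors]}; cost: {t: exec time};
        comm: {(u, v): delay}; returns (placement, finish times)."""
        finish, place = {}, {}
        proc_free = [0.0] * nprocs          # when each processor is idle
        scheduled = set()
        while len(scheduled) < len(tasks):
            ready = [t for t in tasks if t not in scheduled
                     and all(p in scheduled for p in preds.get(t, []))]
            best = None
            for t in ready:
                for p in range(nprocs):
                    est = proc_free[p]
                    for u in preds.get(t, []):
                        # Data arrives later if it must cross the network.
                        arrive = finish[u] + (0 if place[u] == p
                                              else comm.get((u, t), 0))
                        est = max(est, arrive)
                    if best is None or est < best[0]:
                        best = (est, t, p)
            est, t, p = best
            place[t], finish[t] = p, est + cost[t]
            proc_free[p] = finish[t]
            scheduled.add(t)
        return place, finish
    ```

    On a four-task diamond with unit execution times and zero communication cost, this schedule finishes at time 3; raising the edge delays quickly changes which placements win, which is the sensitivity the paper analyses.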

    Towards an Effective Task Clustering Heuristic for LogP Machines

    No full text
    This paper describes a task scheduling algorithm, based on a LogP-type model, for allocating arbitrary task graphs to fully connected networks of processors. This problem is known to be NP-complete even under the delay model (a special case under the LogP model). The strategy exploits the replication and clustering of tasks to minimise the ill effects of communication overhead on the makespan. The quality of the schedules produced by this LogP-based algorithm, initially under delay model conditions, is compared with that of other good delay model-based approaches
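    The difference between the delay model and a LogP-type model mentioned above can be made concrete with the textbook LogP send-time formula. The function below is a generic illustration of the LogP parameters (L = latency, o = per-message CPU overhead, g = gap), not the paper's cost model; the parameter values in the test are arbitrary.

    ```python
    # Toy LogP-style cost for streaming m messages between two tasks.
    # Unlike the pure delay model, the sender's CPU pays overhead `o`
    # per message and must respect the gap `g` between injections.

    def logp_send_time(m, L, o, g):
        """Time from the first send until the last message is received:
        (m - 1) injection slots of max(o, g), then the last message's
        send overhead o, wire latency L, and receive overhead o."""
        return (m - 1) * max(o, g) + o + L + o
    ```

    With a single message this reduces to the familiar o + L + o; as m grows, the max(o, g) term shows why overhead consumes CPU time that the delay model would treat as free.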

    Harnessing Low-Cost Virtual Machines on the Spot

    No full text
    Public cloud providers offer computing resources through a plethora of Virtual Machine (VM) instances of different capacities. Each instance is composed of a pre-determined set of virtualized hardware components of different types and/or quantities (number of cores, memory, storage and bandwidth capacities, etc.), in an attempt to satisfy the demands of a diverse range of user applications. Typically, cloud providers offer these instances under several contract models that differ in terms of availability guarantees and prices (On-demand, Spot, Reserved). This chapter provides an overview on how users might utilize and benefit from the variety of instances and different contract models on offer from public cloud providers to reduce their financial outlays. A methodology to dynamically schedule applications with deadline constraints in both hibernation-prone Spot VMs and On-Demand Instances in order to lower costs in relation to a pure On-demand solution is described. Independent of the chosen contract model, identifying the appropriate instance type for applications is also important when attempting to trim expenses. Since it may not be obvious, a short discussion motivates why this decision is not solely related to defining the required resource capacities the chosen instances should have. Finally, given that some cloud providers have recently introduced the concept of Burstable Instances that can boost their performance for a limited period of time, the chapter closes with a summary of approaches that exploit the discounted rates afforded by this new instance class

    Optimizing computational costs of Spark for SARS‐CoV‐2 sequences comparisons on a commercial cloud

    No full text
    Cloud computing is currently one of the prime choices in the computing infrastructure landscape. In addition to advantages such as the pay-per-use bill model and resource elasticity, there are technical benefits regarding heterogeneity and large-scale configuration. Alongside the classical need for performance, for example, time, space, and energy, there is an interest in the financial cost that might come from budget constraints. Based on scalability considerations and the pricing model of traditional public clouds, a reasonable optimization strategy output could be the most suitable configuration of virtual machines to run a specific workload. From the perspective of runtime and monetary cost optimizations, we provide the adaptation of a Hadoop applications execution cost model extracted from the literature aiming at Spark applications modeled with the MapReduce paradigm. We evaluate our optimizer model executing an improved version of the Diff Sequences Spark application to perform SARS-CoV-2 coronavirus pairwise sequence comparisons using the AWS EC2's virtual machine instances. The experimental results with our model outperformed 80% of the random resource selection scenarios. By only employing spot worker nodes exposed to revocation scenarios rather than on-demand workers, we obtained an average monetary cost reduction of 35.66% with a slight runtime increase of 3.36%
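    The spot-versus-on-demand trade-off reported above is easy to sanity-check with back-of-the-envelope arithmetic: a steep hourly discount dominates a small revocation-induced runtime increase. The prices, worker count, and slowdown factor below are illustrative assumptions, not figures from the paper.

    ```python
    # Rough cost comparison: all-on-demand run vs. the same job on
    # discounted spot workers that suffer a small revocation slowdown.
    # All numbers are illustrative, not the paper's measurements.

    def run_cost(runtime_h, n_workers, price_per_h):
        """Total bill for a cluster of identical workers."""
        return runtime_h * n_workers * price_per_h

    on_demand = run_cost(runtime_h=2.0, n_workers=8, price_per_h=0.10)
    # Spot: ~38% of the on-demand price, with checkpoint/restart
    # overhead modelled as a 3.36% runtime increase.
    spot = run_cost(runtime_h=2.0 * 1.0336, n_workers=8, price_per_h=0.062)
    saving = 1 - spot / on_demand
    ```

    With these assumed prices the saving comes out near 36%, the same order as the 35.66% average reduction the abstract reports for its real workload.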

    COMBINATORIAL OPTIMIZATION TECHNIQUES APPLIED TO A PARALLEL PRECONDITIONER BASED ON THE SPIKE ALGORITHM

    No full text
    Abstract. Parallel algorithms capable of efficiently using thousands of multi-core processors are the trend in High Performance Computing. To achieve high scalability, hybrid solvers are suitable candidates since they can combine the robustness of direct methods with the low computational cost of iterative methods. The parallel hybrid SPIKE algorithm