
    Load Balancing Regular Meshes on SMPs with MPI

    Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to partition the work exactly among the available processors (now cores). However, these strategies often do not account for the inherent system noise, which can hinder MPI application scalability on emerging petascale machines with 10,000+ nodes. In this work, we suggest a solution that uses a tunable hybrid static/dynamic scheduling strategy that can be incorporated into current MPI implementations of mesh codes. By applying this strategy to a 3D Jacobi algorithm, we achieve performance gains of at least 16% for 64 SMP nodes.
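    The hybrid idea can be illustrated with a small sketch (assumptions: a single shared-memory node, Python threads standing in for the MPI+SMP setup, and a hypothetical static_fraction knob; this is not the authors' code). Each worker receives a static block of iterations up front, and the remaining iterations are claimed dynamically in chunks, so a worker delayed by system noise simply picks up fewer dynamic chunks:

    ```python
    import threading

    def hybrid_schedule(num_rows, num_workers, static_fraction=0.8, chunk=4, work=lambda r: None):
        # Tunable split: static_fraction of the rows are pre-assigned, the rest are shared.
        static_per_worker = int(num_rows * static_fraction) // num_workers
        dynamic_start = static_per_worker * num_workers   # first dynamically scheduled row
        next_dynamic = [dynamic_start]                     # shared cursor, guarded by a lock
        lock = threading.Lock()

        def worker(wid):
            # Statically owned rows: contiguous block, no synchronization needed.
            for r in range(wid * static_per_worker, (wid + 1) * static_per_worker):
                work(r)
            # Dynamically scheduled remainder: grab chunks until the rows run out.
            while True:
                with lock:
                    start = next_dynamic[0]
                    next_dynamic[0] += chunk
                if start >= num_rows:
                    break
                for r in range(start, min(start + chunk, num_rows)):
                    work(r)

        threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    ```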

    A fine-grain time-sharing Time Warp system

    Although Parallel Discrete Event Simulation (PDES) platforms relying on the Time Warp (optimistic) synchronization protocol already allow parallelism to be exploited, several techniques have been proposed to further improve performance. Among them are optimized approaches to state restore, as well as techniques for load balancing or (dynamically) controlling the degree of speculation, the latter being specifically targeted at reducing the incidence of causality errors that lead to wasted computation. However, in state-of-the-art Time Warp systems, event processing is not preemptable, which may prevent prompt reaction to the injection of higher-priority (i.e., lower-timestamp) events. Delaying the processing of these events may, in turn, give rise to a higher incidence of incorrect speculation. In this article we present the design and realization of a fine-grain time-sharing Time Warp system, to be run on multi-core Linux machines, which makes systematic use of event preemption in order to dynamically reassign the CPU to higher-priority events/tasks. Our proposal is based on a truly dual-mode execution, application vs. platform, which includes timer-interrupt-based support for bringing control back to platform mode for possible CPU reassignment at very fine-grain periods. The latter facility is offered by an ad-hoc timer-interrupt management module for Linux, which we release, together with the overall time-sharing support, within the open-source ROOT-Sim platform. An experimental assessment based on the classical PHOLD benchmark and two real-world models is presented, which shows how our proposal effectively reduces the incidence of causality errors, compared to traditional Time Warp, especially when running with higher degrees of parallelism.
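    A rough user-space approximation of this mechanism (assumptions: a POSIX interval timer stands in for the paper's Linux timer-interrupt module, and the "preemption" here is a cooperative yield at the next safe point rather than true dual-mode CPU reassignment) periodically hands control back to platform-side code that checks whether a lower-timestamp event has arrived:

    ```python
    import heapq
    import signal

    pending = []                 # min-heap of (timestamp, event) pairs
    platform_tick = [False]      # set by the timer, cleared by the platform check

    def inject(timestamp, event):
        heapq.heappush(pending, (timestamp, event))      # event injected by another worker

    def on_timer(signum, frame):
        platform_tick[0] = True  # request a platform-mode check at the next safe point

    signal.signal(signal.SIGALRM, on_timer)
    signal.setitimer(signal.ITIMER_REAL, 0.001, 0.001)   # ~1 ms fine-grain period

    def process(event_ts, steps):
        """Process one event; give the CPU back if a lower-timestamp event shows up."""
        for _ in range(steps):
            pass                             # one unit of event computation would go here
            if platform_tick[0]:
                platform_tick[0] = False
                if pending and pending[0][0] < event_ts:
                    return False             # "preempted": a higher-priority event is waiting
        return True                          # event completed without preemption
    ```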

    Leakage-Aware Multiprocessor Scheduling


    Incremental and Modular Context-sensitive Analysis

    Context-sensitive global analysis of large code bases can be expensive, which can make its use impractical during software development. However, there are many situations in which modifications are small and isolated within a few components, and it is desirable to reuse previous analysis results as much as possible. This has been achieved to date through incremental global analysis fixpoint algorithms that achieve cost reductions at fine levels of granularity, such as changes in program lines. However, these fine-grained techniques are not directly applicable to modular programs, nor are they designed to take advantage of modular structure. This paper describes, implements, and evaluates an algorithm that performs efficient context-sensitive analysis incrementally on modular partitions of programs. The experimental results show that the proposed modular algorithm yields significant improvements, in both time and memory consumption, when compared to existing non-modular, fine-grain incremental analysis techniques. Furthermore, thanks to the proposed inter-modular propagation of analysis information, our algorithm also outperforms traditional modular analysis even when analyzing from scratch.
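    The module-level incremental propagation can be pictured with a small worklist sketch (hypothetical names; this is not the paper's algorithm): per-module summaries are cached, and a module is re-analyzed only when it was edited or when a module it imports changed its exported summary.

    ```python
    def incremental_modular_analysis(deps, analyze, summaries, changed):
        """
        deps:      dict mapping a module to the set of modules that import it
        analyze:   function(module, summaries) -> new summary for that module
        summaries: dict of previously computed summaries (reused when unchanged)
        changed:   set of modules edited since the last analysis run
        """
        worklist = set(changed)
        while worklist:
            m = worklist.pop()
            new_summary = analyze(m, summaries)        # context-sensitive analysis of one module
            if summaries.get(m) != new_summary:        # exported information changed
                summaries[m] = new_summary
                worklist |= set(deps.get(m, ()))       # re-analyze importers only
        return summaries
    ```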

    Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs

    Energy efficiency is becoming a pressing issue, especially in large data centers, where it entails at the same time a non-negligible management cost, an increased probability of hardware faults, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can provide benefits for both power saving and overall application execution performance. This is related to the fact that encapsulating shared-data accesses within transactions gives the STM middleware the freedom to both ensure consistency and reduce the actual data contention, the latter having been shown to affect the overall power needed to complete the application's execution. We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to show how self-adaptive computation can better capture the actual degree of parallelism and/or logical contention on shared data, further enhancing the intrinsic benefits provided by STM. Of course, this benefit comes at a cost, namely the execution time required by the proposed approaches to precisely tune the execution parameters for reducing power consumption and enhancing execution performance. Nevertheless, the results provided here show that adaptivity is a strictly necessary requirement to reduce energy consumption in STM systems: without it, it is not possible to reach any acceptable level of energy efficiency at all.
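    A minimal sketch of the adaptive part (assumptions: a hypothetical measure() callback reporting something like committed transactions per joule, and a plain hill-climbing controller; this is not the TinySTM/R-STM extension itself) could adjust the number of threads allowed to run transactions:

    ```python
    def tune_concurrency(measure, min_threads=1, max_threads=16, rounds=20):
        """measure(n) -> efficiency score observed with n active transactional threads;
        the score is assumed to be supplied by the runtime (e.g. commits per joule)."""
        current, step = max_threads, -1            # start fully parallel, try shrinking first
        best = measure(current)
        for _ in range(rounds):
            candidate = min(max(current + step, min_threads), max_threads)
            score = measure(candidate)
            if score > best:                        # keep moving in the profitable direction
                best, current = score, candidate
            else:
                step = -step                        # reverse direction and keep probing
        return current
    ```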

    Active Processor Scheduling Using Evolution Algorithms

    The allocation of processes to processors has long been of interest to engineers. The processor allocation problem considered here assigns multiple applications onto a computing system. With this algorithm, researchers could more efficiently examine real-time sensor data, such as that used by United States Air Force digital signal processing efforts, or perform real-time aerosol hazard detection as examined by the Department of Homeland Security. Different choices for the design of a load balancing algorithm are examined in both the problem and algorithm domains. Evolutionary algorithms are used to find near-optimal solutions. These algorithms incorporate multiobjective, coevolutionary, and parallel principles to create an effective and efficient algorithm for real-world allocation problems. Three evolutionary algorithms (EAs) are developed. The primary algorithm generates a solution to the processor allocation problem; this allocation EA is capable of evaluating objectives in both an aggregate single-objective and a Pareto multiobjective manner. The other two EAs are designed for fine-tuning the solutions returned by the allocation EA. One coevolutionary algorithm is used to optimize the parameters of the allocation algorithm. This meta-EA is parallelized using a coarse-grain approach to improve performance, and experiments are conducted that validate the improved effectiveness of the parallelized algorithm. A Pareto multiobjective approach is used to optimize both effectiveness and efficiency objectives. The other coevolutionary algorithm generates difficult allocation problems for testing the capabilities of the allocation EA. The effectiveness of both coevolutionary algorithms for optimizing the allocation EA is examined quantitatively using standard statistical methods, and the allocation EA's objective tradeoffs are analyzed and compared.
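    To make the allocation-EA idea concrete, here is a toy sketch (assumptions: known per-task costs, a single makespan objective, truncation selection with one-point crossover; the actual multiobjective, coevolutionary implementation described above is considerably richer):

    ```python
    import random

    def evolve_allocation(task_costs, num_procs, pop_size=40, generations=200):
        """Evolve an assignment of tasks to processors that minimizes the makespan."""
        def makespan(alloc):                       # lower is better
            loads = [0.0] * num_procs
            for task, proc in enumerate(alloc):
                loads[proc] += task_costs[task]
            return max(loads)

        ntasks = len(task_costs)
        pop = [[random.randrange(num_procs) for _ in range(ntasks)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=makespan)
            survivors = pop[: pop_size // 2]       # truncation selection
            children = []
            while len(survivors) + len(children) < pop_size:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, ntasks)  # one-point crossover
                child = a[:cut] + b[cut:]
                if random.random() < 0.2:          # mutation: move one task to a random processor
                    child[random.randrange(ntasks)] = random.randrange(num_procs)
                children.append(child)
            pop = survivors + children
        return min(pop, key=makespan)
    ```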