Load Balancing Regular Meshes on SMPs with MPI
Domain decomposition for regular meshes on parallel computers has
traditionally been performed by attempting to exactly partition the work among the available processors (now cores). However, these
strategies often do not consider the inherent system noise which can hinder MPI application scalability to emerging peta-scale machines
with 10000+ nodes. In this work, we suggest a solution that uses a tunable hybrid static/dynamic scheduling strategy that can be incorporated into current MPI implementations of mesh codes. By applying this strategy to a 3D Jacobi algorithm, we achieve performance gains
of at least 16% for 64 SMP nodes.
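The hybrid static/dynamic idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' MPI implementation: it uses Python threads inside one process, and the names `plan_hybrid`, `run_hybrid`, and `static_fraction` are hypothetical. A tunable fraction of the mesh blocks is assigned to workers up front, and the remainder is pulled from a shared queue so that workers delayed by system noise take fewer of the leftover blocks.

```python
import queue
import threading


def plan_hybrid(num_blocks, num_workers, static_fraction):
    """Split block indices into equal per-worker static lists and a shared dynamic pool."""
    if not 0.0 <= static_fraction <= 1.0:
        raise ValueError("static_fraction must be in [0, 1]")
    num_static = int(num_blocks * static_fraction)
    # Round down to a multiple of num_workers so every static share has equal size.
    num_static -= num_static % num_workers
    per_worker = num_static // num_workers
    static = [list(range(w * per_worker, (w + 1) * per_worker))
              for w in range(num_workers)]
    dynamic = list(range(num_static, num_blocks))
    return static, dynamic


def run_hybrid(work, num_blocks, num_workers, static_fraction):
    """Run work(block) for every block; return the blocks each worker processed."""
    static, dynamic = plan_hybrid(num_blocks, num_workers, static_fraction)
    pool = queue.Queue()
    for block in dynamic:
        pool.put(block)
    done = [[] for _ in range(num_workers)]

    def worker(w):
        # Static phase: blocks fixed in advance, no coordination needed.
        for block in static[w]:
            work(block)
            done[w].append(block)
        # Dynamic phase: take leftover blocks until the shared pool is empty.
        while True:
            try:
                block = pool.get_nowait()
            except queue.Empty:
                return
            work(block)
            done[w].append(block)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Setting `static_fraction` to 1.0 gives a purely static partition (any blocks left after rounding still go to the pool), and 0.0 gives a purely dynamic one; values in between trade scheduling overhead against tolerance to noise.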
A fine-grain time-sharing Time Warp system
Although Parallel Discrete Event Simulation (PDES) platforms relying on the Time Warp (optimistic) synchronization
protocol already allow for exploiting parallelism, several techniques have been proposed to
further favor performance. Among them we can mention optimized approaches for state restore, as well as
techniques for load balancing or (dynamically) controlling the speculation degree, the latter being specifically
targeted at reducing the incidence of causality errors that lead to wasted computation. However, in
state-of-the-art Time Warp systems, event processing is not preemptable, which may prevent the system
from promptly reacting to the injection of higher-priority (that is, lower-timestamp) events. Delaying the processing
of these events may, in turn, give rise to higher incidence of incorrect speculation. In this article we present
the design and realization of a fine-grain time-sharing Time Warp system, to be run on multi-core Linux
machines, which makes systematic use of event preemption in order to dynamically reassign the CPU to
higher priority events/tasks. Our proposal is based on a truly dual mode execution, application vs platform,
which includes a timer-interrupt based support for bringing control back to platform mode for possible CPU
reassignment according to very fine grain periods. The latter facility is offered by an ad-hoc timer-interrupt
management module for Linux, which we release, together with the overall time-sharing support, within the
open source ROOT-Sim platform. An experimental assessment based on the classical PHOLD benchmark and
two real world models is presented, which shows how our proposal effectively leads to the reduction of the
incidence of causality errors, as compared to traditional Time Warp, especially when running with higher
degrees of parallelism.
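To make the notion of a causality error concrete, the following toy sketch shows a single optimistic logical process that rolls back when a lower-timestamp (straggler) event arrives late. It is an illustration under stated assumptions, not ROOT-Sim code: the class `LogicalProcess`, its fields, and the state update rule are hypothetical, and the sketch does not model preemption or anti-messages.

```python
import bisect


class LogicalProcess:
    """Toy optimistic logical process with state saving and rollback."""

    def __init__(self):
        self.state = 0
        self.lvt = 0.0          # local virtual time: timestamp of the last processed event
        self.processed = []     # (timestamp, value, state_before), kept in timestamp order
        self.rollbacks = 0

    def _apply(self, ts, value):
        # Save the pre-event state so the event can be undone later.
        self.processed.append((ts, value, self.state))
        # Order-dependent update, so processing events out of order gives a wrong state.
        self.state = self.state * 2 + value
        self.lvt = ts

    def receive(self, ts, value):
        if ts >= self.lvt:
            self._apply(ts, value)
            return
        # Straggler: undo every event at or after ts, then replay in timestamp order.
        self.rollbacks += 1
        timestamps = [entry[0] for entry in self.processed]
        idx = bisect.bisect_left(timestamps, ts)
        undone = self.processed[idx:]
        self.state = undone[0][2]
        del self.processed[idx:]
        self._apply(ts, value)
        for undone_ts, undone_value, _ in undone:
            self._apply(undone_ts, undone_value)
```

Every rollback repeats work that was already done, which is why reacting quickly to lower-timestamp events, as the abstract proposes, can reduce wasted computation.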
Incremental and Modular Context-sensitive Analysis
Context-sensitive global analysis of large code bases can be expensive, which
can make its use impractical during software development. However, there are
many situations in which modifications are small and isolated within a few
components, and it is desirable to reuse as much as possible previous analysis
results. This has been achieved to date through incremental global analysis
fixpoint algorithms that achieve cost reductions at fine levels of granularity,
such as changes in program lines. However, these fine-grained techniques are
not directly applicable to modular programs, nor are they designed to take
advantage of modular structures. This paper describes, implements, and
evaluates an algorithm that performs efficient context-sensitive analysis
incrementally on modular partitions of programs. The experimental results show
that the proposed modular algorithm shows significant improvements, in both
time and memory consumption, when compared to existing non-modular, fine-grain
incremental analysis techniques. Furthermore, thanks to the proposed
inter-modular propagation of analysis information, our algorithm also
outperforms traditional modular analysis even when analyzing from scratch.
Comment: 56 pages, 27 figures. To be published in Theory and Practice of Logic
Programming. v3 corresponds to the extended version of the ICLP2018 Technical
Communication. v4 is the revised version submitted to Theory and Practice of
Logic Programming. v5 (this one) is the final author version to be published
in TPL
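The general idea of reanalyzing only what a change can affect, at module granularity, can be sketched as a worklist over the module import graph. This is a simplified illustration, not the algorithm from the paper: the analysis here is a context-insensitive toy (each module's summary is the set of names it defines plus those of its imports), imports are assumed to be acyclic, and the names `analyze` and `reanalyze` are hypothetical.

```python
from collections import deque


def analyze(module, sources, imports, summaries):
    """Toy per-module analysis: own names plus the current summaries of imported modules."""
    result = set(sources[module])
    for dep in imports[module]:
        result |= summaries.get(dep, frozenset())
    return frozenset(result)


def reanalyze(changed, sources, imports, summaries):
    """Reanalyze the changed modules and propagate only where a summary actually changed.

    Updates `summaries` in place and returns the modules visited, in order.
    Assumes the import graph has no cycles.
    """
    # Invert the import graph: which modules depend on each module.
    dependents = {m: [] for m in sources}
    for module, deps in imports.items():
        for dep in deps:
            dependents[dep].append(module)

    worklist = deque(changed)
    visited = []
    while worklist:
        module = worklist.popleft()
        visited.append(module)
        new_summary = analyze(module, sources, imports, summaries)
        if new_summary != summaries.get(module):
            summaries[module] = new_summary
            # Only modules importing this one can be affected by the change.
            worklist.extend(dependents[module])
    return visited
```

After an edit confined to one module, calling `reanalyze` with just that module leaves unrelated modules untouched, which is the source of the savings over a whole-program rerun.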
Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs
Energy efficiency is becoming a pressing issue, especially in large data centers where it entails, at the same time, a non-negligible management cost, an enhancement of hardware fault probability, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can provide benefits on both power saving and the overall applications’ execution performance. This is related to the fact that encapsulating shared-data accesses within transactions gives the freedom to the STM middleware to both ensure consistency and reduce the actual data contention, the latter having been shown to affect the overall power needed to complete the application’s execution.
We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to prove how self-adapting computation can capture the actual degree of parallelism and/or logical contention on shared data in a better way, enhancing even more the intrinsic benefits provided by STM. Of course, this benefit comes at a cost, which is the actual execution time required by the proposed approaches to precisely tune the execution parameters for reducing power consumption and enhancing execution performance. Nevertheless, the results hereby provided show that adaptivity is a strictly necessary requirement to reduce energy consumption in STM systems: Without it, it is not possible to reach any acceptable level of energy efficiency at all
Active Processor Scheduling Using Evolution Algorithms
The allocation of processes to processors has long been of interest to engineers. The processor allocation problem considered here assigns multiple applications onto a computing system. With this algorithm researchers could more efficiently examine real-time sensor data like that used by United States Air Force digital signal processing efforts or real-time aerosol hazard detection as examined by the Department of Homeland Security. Different choices for the design of a load balancing algorithm are examined in both the problem and algorithm domains. Evolutionary algorithms are used to find near-optimal solutions. These algorithms incorporate multiobjective coevolutionary and parallel principles to create an effective and efficient algorithm for real-world allocation problems. Three evolutionary algorithms (EAs) are developed. The primary algorithm generates a solution to the processor allocation problem. This allocation EA is capable of evaluating objectives in both an aggregate single-objective and a Pareto multiobjective manner. The other two EAs are designed for fine-tuning the solutions returned by the allocation EA. One coevolutionary algorithm is used to optimize the parameters of the allocation algorithm. This meta-EA is parallelized using a coarse-grain approach to improve performance. Experiments are conducted that validate the improved effectiveness of the parallelized algorithm. A Pareto multiobjective approach is used to optimize both effectiveness and efficiency objectives. The other coevolutionary algorithm generates difficult allocation problems for testing the capabilities of the allocation EA. The effectiveness of both coevolutionary algorithms for optimizing the allocation EA is examined quantitatively using standard statistical methods. The allocation EA's objective tradeoffs are also analyzed and compared.
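The two evaluation modes named in the abstract above, aggregate single-objective and Pareto multiobjective, can be contrasted with a short sketch. It assumes every objective is minimized and represents each candidate allocation only by its tuple of objective values; the function names and the example numbers are hypothetical and not taken from the thesis.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum(point, weights):
    """Aggregate several objectives into one score using fixed weights."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical (objective_1, objective_2) values for four candidate allocations.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
best = min(candidates, key=lambda p: weighted_sum(p, (0.5, 0.5)))
```

The Pareto mode returns a set of incomparable tradeoffs and leaves the choice open, while the aggregate mode commits to one candidate as soon as the weights are fixed.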