Building scalable software systems in the multicore era
Software systems face two challenges today: growing complexity and increasing parallelism in the underlying computational models. The problem of increased complexity is often solved by dividing systems into modules in a way that permits analysis of these modules in isolation. The problem of insufficient concurrency is often tackled by dividing system execution into tasks in a way that permits execution of these tasks in isolation. The key challenge in software design is to manage the explicit and implicit dependences between modules, which decrease modularity. The key challenge for concurrency is to manage the explicit and implicit dependences between tasks, which decrease parallelism. Even though these challenges appear strikingly similar, current software design practices and languages do not take advantage of this similarity. The net effect is that the modularity and concurrency goals are often tackled separately: making progress towards one goal does not naturally contribute towards the other. My position is that, for programmers who are not formally and rigorously trained in the discipline of concurrency, the safest and most productive way to obtain scalable software is to improve its modularity, using programming language features and design practices that reconcile the modularity and concurrency goals. I briefly discuss the preliminary efforts of my group; we have only touched the tip of the iceberg.
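To make the parallel between modules and tasks concrete, the following sketch (not taken from the paper; the function names are invented for illustration) expresses two independent computations as isolated tasks using C++ std::async. The same decomposition that keeps them analysable in isolation also lets them run concurrently.

// Hypothetical sketch: two independent "modules" expressed as isolated tasks.
// Function names (scaleVector, sumVector) are illustrative, not from the paper.
#include <functional>
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

std::vector<double> scaleVector(std::vector<double> v, double s) {
    for (double& x : v) x *= s;          // no shared state with other tasks
    return v;
}

double sumVector(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

int main() {
    std::vector<double> a(1'000'000, 1.0), b(1'000'000, 2.0);

    // Because the two computations do not depend on each other, the same
    // decomposition that gives modular units also gives parallelism.
    auto fa = std::async(std::launch::async, scaleVector, a, 3.0);
    auto fb = std::async(std::launch::async, sumVector, std::cref(b));

    std::vector<double> scaled = fa.get();
    double total = fb.get();
    std::cout << scaled[0] << " " << total << "\n";
}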
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices, where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
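As a minimal illustration of the kind of kernel such solvers are built around (not code from the Exascale Computing Project libraries), the sketch below shows a CSR sparse matrix-vector product. Its rows are mutually independent, so the loop over rows is a natural place to expose concurrency, e.g. with OpenMP.

// Illustrative sketch: CSR sparse matrix-vector product y = A * x,
// parallelized over rows. Not taken from any ECP solver library.
#include <cstdio>
#include <vector>

struct CsrMatrix {
    int n;                       // number of rows
    std::vector<int> rowPtr;     // size n+1: start of each row in col/val
    std::vector<int> col;        // column index of each nonzero
    std::vector<double> val;     // value of each nonzero
};

void spmv(const CsrMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (int k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            sum += A.val[k] * x[A.col[k]];
        y[i] = sum;              // rows are independent, so this loop scales
    }
}

int main() {
    // 2x2 example: [[4, 1], [0, 3]]
    CsrMatrix A{2, {0, 2, 3}, {0, 1, 1}, {4.0, 1.0, 3.0}};
    std::vector<double> x{1.0, 2.0}, y(2);
    spmv(A, x, y);
    std::printf("%g %g\n", y[0], y[1]);   // expected: 6 6
}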
A review of High Performance Computing foundations for scientists
The increase of available computational capabilities has made simulation emerge as a third discipline of Science, lying midway between the experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities that would otherwise not be accessible, helps to improve experiments and provides new insights into the systems being analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them improve the performance of their theoretical models and simulations. This review includes some technical essentials useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues rather than on technological aspects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way that is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, cover distributed computing, and discuss essential aspects to take into account when running scientific calculations on computers.
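As one example of the kind of fundamentals such a review covers, the sketch below works through Amdahl's law, which bounds the speedup obtainable from parallel hardware when part of a program remains serial. The numbers are illustrative and the code is not taken from the cited paper.

// Illustrative only: a worked instance of Amdahl's law.
#include <cstdio>

// Speedup of a program whose parallel fraction is p, run on n processors.
double amdahlSpeedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    // Even with 95% of the work parallelized, speedup saturates near 20x.
    for (int n : {1, 8, 64, 512, 4096})
        std::printf("n = %4d  speedup = %6.2f\n", n, amdahlSpeedup(0.95, n));
}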
Sniper: scalable and accurate parallel multi-core simulation
Sniper is a next-generation parallel, high-speed and accurate x86 simulator. This multi-core simulator is based on the interval core model and the Graphite simulation infrastructure, allowing for fast and accurate simulation and for trading off simulation speed for accuracy, which enables a range of flexible simulation options when exploring different homogeneous and heterogeneous multi-core architectures. The Sniper simulator allows one to perform timing simulations for both multi-programmed workloads and multi-threaded, shared-memory applications running on 10s to 100+ cores, at high speed compared to existing simulators. The main feature of the simulator is its core model, which is based on interval simulation, a fast mechanistic core model. Interval simulation raises the level of abstraction in architectural simulation, which allows for faster simulator development and evaluation times; it does so by 'jumping' between miss events, called intervals. Sniper has been validated against multi-socket Intel Core2 and Nehalem systems and provides average performance prediction errors within 25% at a simulation speed of up to several MIPS. This simulator, and the interval core model, is useful for uncore and system-level studies that require more detail than the typical one-IPC models, but for which cycle-accurate simulators are too slow to allow workloads of meaningful sizes to be simulated. As an added benefit, the interval core model allows the generation of CPI stacks, which show the number of cycles lost due to different characteristics of the system, like the cache hierarchy or branch predictor, and lead to a better understanding of each component's effect on total system performance. This extends the use of Sniper to application characterization and hardware/software co-design. The Sniper simulator is available for download at http://snipersim.org and can be used freely for academic research.
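The following sketch illustrates only the bookkeeping idea behind a CPI stack: cycles lost to different causes are attributed to named components and normalized by instruction count. The component names and numbers are invented for illustration; this is not Sniper's interval model.

// Hypothetical sketch of CPI-stack accounting; values and component names are
// made up, and the aggregation shown here is generic, not Sniper-specific.
#include <cstdio>
#include <map>
#include <string>

int main() {
    long long instructions = 1'000'000;

    // Cycles attributed to the base pipeline plus each source of lost time.
    std::map<std::string, long long> cycles = {
        {"base",           700'000},
        {"branch mispred", 150'000},
        {"L2 miss",        250'000},
        {"DRAM access",    400'000},
    };

    long long total = 0;
    for (const auto& kv : cycles) total += kv.second;

    std::printf("total CPI = %.2f\n", double(total) / instructions);
    for (const auto& [name, c] : cycles)
        std::printf("  %-14s %.2f CPI (%4.1f%% of cycles)\n",
                    name.c_str(), double(c) / instructions, 100.0 * c / total);
}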
Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics while maintaining a moderate number of unknowns. However, flexibility in the meshes goes along with severe drawbacks with respect to parallel execution, especially with respect to the definition of adequate smoothers. This point becomes particularly pronounced in the framework of fine-grained parallelism on GPUs with hundreds of execution units. We use the approach of matrix-based multigrid, which has high flexibility and adapts well to the demands of modern computing platforms. In this work we investigate multi-colored Gauß-Seidel-type smoothers and power(q)-pattern enhanced multi-colored ILU(p) smoothers with fill-ins.
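As a simplified illustration of the coloring idea (the paper targets unstructured meshes and more general multi-colorings), the sketch below shows a red-black Gauss-Seidel sweep on a structured 2D Poisson grid: points of the same color do not depend on each other, so each color sweep can be executed fully in parallel.

// Simplified sketch: one red-black Gauss-Seidel smoothing sweep for
// -Laplace(u) = f on an n x n grid with spacing h (boundary rows/columns
// held fixed). Illustrative only; not the smoothers from the paper.
#include <vector>

void redBlackGaussSeidel(std::vector<double>& u, const std::vector<double>& f,
                         int n, double h) {
    for (int color = 0; color < 2; ++color) {
        // All points of the current color are independent of each other.
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i) {
            for (int j = 1; j < n - 1; ++j) {
                if ((i + j) % 2 != color) continue;   // update one color per pass
                u[i * n + j] = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j] +
                                       u[i * n + (j - 1)] + u[i * n + (j + 1)] +
                                       h * h * f[i * n + j]);
            }
        }
    }
}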
- …