The future of computing beyond Moore's Law.
Moore's Law is a techno-economic model that has enabled the information technology industry to roughly double the performance and functionality of digital electronics every two years at fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and of the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continuing the scaling of successors to the first exascale machine. Lastly, the article covers the many opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of the discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Efficient multicore-aware parallelization strategies for iterative stencil computations
Stencil computations consume a major part of runtime in many scientific
simulation codes. As prototypes for this class of algorithms we consider the
iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient
parallel implementations for cache-based multicore architectures. Temporal
cache blocking is a known advanced optimization technique, which can reduce the
pressure on the memory bus significantly. We apply and refine this optimization
for a recently presented temporal blocking strategy designed to explicitly
utilize multicore characteristics. Especially for Gauss-Seidel smoothers, we
show that simultaneous multi-threading (SMT) can yield substantial performance
improvements for our optimized algorithm.
Comment: 15 pages, 10 figures
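The Jacobi smoother named above can be sketched in a few lines. This is a minimal 1D version showing only the baseline sweep kernel that a temporal blocking scheme would then fuse across several time steps while the tile stays in cache; the problem size and sweep count are illustrative, not taken from the paper:

```python
import numpy as np

def jacobi_sweep(u, f, h2):
    """One Jacobi sweep for the 1D Poisson problem -u'' = f, mesh width h."""
    new = u.copy()
    # Each interior point is replaced by the average of its neighbours
    # plus the scaled source term.
    new[1:-1] = 0.5 * (u[:-2] + u[2:] + h2 * f[1:-1])
    return new

def jacobi_smoother(u, f, h2, sweeps):
    """Apply several sweeps; temporal blocking would fuse these loops."""
    for _ in range(sweeps):
        u = jacobi_sweep(u, f, h2)
    return u

# Smooth a random initial error for the homogeneous problem u'' = 0,
# u(0) = u(1) = 0 (illustrative sizes).
n = 65
u = np.random.default_rng(0).random(n)
u[0] = u[-1] = 0.0
f = np.zeros(n)
smoothed = jacobi_smoother(u, f, (1.0 / (n - 1)) ** 2, 50)
```

Because each sweep reads the whole grid and writes a new copy, an out-of-cache grid forces one full memory traversal per sweep; fusing sweeps over cache-sized tiles is exactly where the pressure on the memory bus is reduced.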
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices, where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of the discussion meeting issue 'Numerical algorithms for high-performance computational science'.
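As a concrete illustration of the kind of kernel such solvers are built on, the sketch below pairs a sparse matrix-vector product in CSR format with an unpreconditioned conjugate gradient loop. This is a generic textbook sketch, not the ECP solvers themselves; the test matrix (a 1D Laplacian) and sizes are illustrative:

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a sparse matrix stored in CSR format."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        start, end = indptr[i], indptr[i + 1]
        y[i] = data[start:end] @ x[indices[start:end]]
    return y

def cg(matvec, b, tol=1e-10, maxit=1000):
    """Unpreconditioned conjugate gradient for a SPD operator."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x starts at zero)
    p = r.copy()          # search direction
    rs = r @ r
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Assemble a 1D Laplacian (tridiagonal, SPD) in CSR form (illustrative).
n = 50
data, indices, indptr = [], [], [0]
for i in range(n):
    for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
        if 0 <= j < n:
            data.append(v)
            indices.append(j)
    indptr.append(len(data))
data, indices, indptr = map(np.array, (data, indices, indptr))
b = np.ones(n)
x = cg(lambda v: csr_matvec(data, indices, indptr, v), b)
```

The inner products and the matvec in this loop are exactly the global reductions and halo exchanges whose latency a highly parallel solver must hide or restructure away.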
Improving the Efficiency of FP-LAPW Calculations
The full-potential linearized augmented-plane wave (FP-LAPW) method is well
known to enable highly accurate calculations of the electronic structure and
magnetic properties of crystals and surfaces. The implementation of atomic
forces has greatly increased its applicability, but it is still generally
believed that FP-LAPW calculations require substantially higher computational
effort than pseudopotential plane-wave (PPW) based methods.
In the present paper we analyse the FP-LAPW method from a computational point
of view. Starting from an existing implementation (the WIEN95 code), we
identified the time-consuming parts and show how some of them can be
formulated more efficiently. In this context, the hardware architecture also
plays a crucial role. The remaining computational effort is mainly determined
by the setup and diagonalization of the Hamiltonian matrix. For the latter,
two different iterative schemes are compared. The speed-up gained by these
optimizations is compared to the runtime of the "original" version of the
code and to the PPW approach. We expect that the strategies described here
can also be used to speed up other computer codes where similar tasks must be
performed.
Comment: 20 pages, 3 figures. Appears in Comp. Phys. Comm. Other related
publications can be found at http://www.rz-berlin.mpg.de/th/paper.htm
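The iterative diagonalization step mentioned above can be illustrated with a minimal Lanczos scheme for the lowest eigenvalue of a symmetric matrix. This is a generic sketch of iterative diagonalization, not either of the two schemes compared in the paper; the test matrix and iteration count are illustrative:

```python
import numpy as np

def lanczos_lowest(A, k=40, seed=0):
    """Estimate the lowest eigenvalue of a symmetric matrix A with k
    Lanczos steps (full reorthogonalization keeps this toy stable)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = [q]                      # orthonormal Krylov basis
    alphas, betas = [], []       # tridiagonal entries of Q^T A Q
    for j in range(k):
        w = A @ Q[-1]
        a = Q[-1] @ w
        alphas.append(a)
        w = w - a * Q[-1]
        if j > 0:
            w = w - betas[-1] * Q[-2]
        for v in Q:              # full reorthogonalization
            w = w - (v @ w) * v
        b = np.linalg.norm(w)
        if b < 1e-12:            # invariant subspace found: stop early
            break
        betas.append(b)
        Q.append(w / b)
    m = len(alphas)
    # Eigenvalues of the small tridiagonal projection approximate the
    # extreme eigenvalues of A.
    T = (np.diag(alphas)
         + np.diag(betas[:m - 1], 1)
         + np.diag(betas[:m - 1], -1))
    return np.linalg.eigvalsh(T)[0]

# Illustrative test matrix with well-separated spectrum 1, 2, ..., 100.
A = np.diag(np.arange(1.0, 101.0))
est = lanczos_lowest(A)
```

The attraction for Hamiltonian diagonalization is that the scheme only needs matrix-vector products, so the full dense eigensolve over the basis-set dimension is replaced by a small projected problem.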
Scalability Analysis of Parallel GMRES Implementations
Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations that adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency as problem size and number of processors increase). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
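A restarted GMRES(k) cycle can be sketched as follows. For brevity this toy version orthogonalizes the Arnoldi basis with modified Gram-Schmidt rather than the Householder reflections analyzed in the abstract; the matrix, restart value k and tolerances are illustrative:

```python
import numpy as np

def gmres_restarted(A, b, k=10, restarts=50, tol=1e-8):
    """GMRES(k): build a k-step Arnoldi basis, solve the small
    least-squares problem, then restart from the updated iterate."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, k + 1))   # Arnoldi basis vectors
        H = np.zeros((k + 1, k))   # upper Hessenberg projection
        V[:, 0] = r / beta
        m = k
        for j in range(k):
            w = A @ V[:, j]
            for i in range(j + 1):          # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:         # happy breakdown
                m = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta*e1 - H y|| over the Krylov subspace.
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
        x = x + V[:, :m] @ y
    return x

# Illustrative diagonally dominant nonsymmetric system.
rng = np.random.default_rng(1)
n = 30
A = 5.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = gmres_restarted(A, b)
```

The orthogonalization phase is the communication hot spot the paper targets: each Gram-Schmidt (or Householder) step implies global reductions, which is why the choice of orthogonalization and of the restart value k drives parallel scalability.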