
    The future of computing beyond Moore's Law.

    Get PDF
    Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every two years within a fixed cost, power and area budget. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and of the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continued scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of the discussion meeting issue 'Numerical algorithms for high-performance computational science'.
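
    As a rough illustration of the doubling model described in this abstract, the sketch below projects a normalized performance metric that doubles every two years at fixed cost; the baseline value and the years sampled are illustrative assumptions, not figures from the article.

```python
# Minimal sketch of the Moore's Law doubling model: a metric that doubles
# every 2 years at fixed cost, power and area. Baseline and years are assumed.

def projected_metric(baseline: float, years: float, doubling_period: float = 2.0) -> float:
    """Return the metric after `years`, doubling every `doubling_period` years."""
    return baseline * 2 ** (years / doubling_period)

if __name__ == "__main__":
    baseline = 1.0  # normalized performance at year 0 (assumed)
    for year in (0, 2, 4, 10, 20):
        print(f"year {year:2d}: {projected_metric(baseline, year):8.1f}x")
```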

    Efficient multicore-aware parallelization strategies for iterative stencil computations

    Full text link
    Stencil computations consume a major part of the runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel implementations for cache-based multicore architectures. Temporal cache blocking is a known advanced optimization technique that can significantly reduce the pressure on the memory bus. We apply and refine this optimization for a recently presented temporal blocking strategy designed to explicitly utilize multicore characteristics. Especially for Gauss-Seidel smoothers, we show that simultaneous multi-threading (SMT) can yield substantial performance improvements for our optimized algorithm.
    Comment: 15 pages, 10 figures
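
    For readers unfamiliar with the baseline iteration these optimizations target, the sketch below shows a plain Jacobi smoother for an assumed 2D 5-point Poisson stencil in NumPy. It is not the authors' temporally blocked, SMT-aware implementation; it only illustrates the memory-bound sweep structure that temporal cache blocking aims to accelerate.

```python
import numpy as np

def jacobi_sweeps(u: np.ndarray, f: np.ndarray, sweeps: int) -> np.ndarray:
    """Plain Jacobi smoother for -laplace(u) = f on a unit-spaced 2D grid
    (5-point stencil). The temporal cache blocking and SMT-aware scheduling
    discussed in the paper are deliberately not reproduced here."""
    u = u.copy()
    for _ in range(sweeps):
        new = u.copy()
        # Every interior point is updated from the *previous* iterate (Jacobi).
        new[1:-1, 1:-1] = 0.25 * (
            u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:] + f[1:-1, 1:-1]
        )
        u = new
    return u

# Example usage on an assumed 64x64 grid with zero right-hand side.
if __name__ == "__main__":
    n = 64
    rng = np.random.default_rng(0)
    u0 = rng.standard_normal((n, n))
    u0[0, :] = u0[-1, :] = u0[:, 0] = u0[:, -1] = 0.0  # Dirichlet boundary
    f = np.zeros((n, n))
    smoothed = jacobi_sweeps(u0, f, sweeps=10)
    print("max interior value after smoothing:", np.abs(smoothed[1:-1, 1:-1]).max())
```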

    Improving the Efficiency of FP-LAPW Calculations

    Full text link
    The full-potential linearized augmented-plane-wave (FP-LAPW) method is well known to enable highly accurate calculations of the electronic structure and magnetic properties of crystals and surfaces. The implementation of atomic forces has greatly increased its applicability, but it is still generally believed that FP-LAPW calculations require substantially higher computational effort than pseudopotential plane wave (PPW) based methods. In the present paper we analyse the FP-LAPW method from a computational point of view. Starting from an existing implementation (the WIEN95 code), we identify the time-consuming parts and show how some of them can be formulated more efficiently. In this context the hardware architecture also plays a crucial role. The remaining computational effort is mainly determined by the setup and diagonalization of the Hamiltonian matrix. For the latter, two different iterative schemes are compared. The speed-up gained by these optimizations is compared to the runtime of the "original" version of the code and of the PPW approach. We expect that the strategies described here can also be used to speed up other computer codes where similar tasks must be performed.
    Comment: 20 pages, 3 figures. Appears in Comp. Phys. Commun. Other related publications can be found at http://www.rz-berlin.mpg.de/th/paper.htm
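
    The abstract identifies the setup and iterative diagonalization of the Hamiltonian matrix as the dominant cost. The sketch below illustrates the general idea of iterative diagonalization on a synthetic sparse Hermitian stand-in matrix using SciPy's Lanczos-based eigensolver; it is not the WIEN95 scheme, and the matrix is an assumption made purely for demonstration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Synthetic sparse "Hamiltonian": a tridiagonal Hermitian matrix standing in
# for the real FP-LAPW operator. Iterative schemes pay off when only a few of
# the lowest eigenpairs are needed instead of a full dense diagonalization.
n = 2000                       # basis size (assumed)
diag = np.linspace(0.0, 50.0, n)
off = 0.1 * np.ones(n - 1)     # weak coupling between neighbouring basis functions
H = sp.diags([off, diag, off], offsets=[-1, 0, 1], format="csr")

# Lanczos-type iteration: compute only the 10 lowest eigenpairs.
vals, vecs = eigsh(H, k=10, which="SA")
print("lowest eigenvalues:", vals)
```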

    Scalability Analysis of Parallel GMRES Implementations

    Get PDF
    Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods, such as GMRES(k). Two parallel versions of GMRES(k), based on different data distributions and using Householder reflections in the orthogonalization phase, together with variations that adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency as problem size and number of processors increase). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.
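
    To make the restart parameter k concrete, the sketch below implements a serial restarted GMRES(k). It uses modified Gram-Schmidt Arnoldi rather than the Householder orthogonalization analyzed in the paper, has no parallel data distribution, and runs on a synthetic test matrix; all of these are simplifying assumptions for illustration only.

```python
import numpy as np

def gmres_k(A, b, k=20, max_restarts=50, tol=1e-8):
    """Restarted GMRES(k) with modified Gram-Schmidt Arnoldi.

    The paper's implementations use Householder reflections in the
    orthogonalization phase and distribute data across processors;
    this serial version only illustrates the restart mechanism."""
    n = b.shape[0]
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        V = np.zeros((n, k + 1))
        H = np.zeros((k + 1, k))
        V[:, 0] = r / beta
        m = k
        for j in range(k):
            w = A @ V[:, j]
            for i in range(j + 1):          # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:         # happy breakdown
                m = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Solve the small least-squares problem min || beta*e1 - H y ||.
        e1 = np.zeros(m + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[: m + 1, :m], e1, rcond=None)
        x += V[:, :m] @ y
    return x

# Example usage on an assumed well-conditioned nonsymmetric test matrix.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
    b = rng.standard_normal(n)
    x = gmres_k(A, b, k=30)
    print("final residual norm:", np.linalg.norm(b - A @ x))
```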