Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are key
components of scientific computing in many fields. Although their design has
been highly optimized for modern processors, they still consume a considerable
amount of energy. As CPU-GPU heterogeneous systems are commonly used for matrix
decompositions, in this work we aim to further improve the energy saving of
one-sided matrix decompositions on CPU-GPU heterogeneous systems. We first build
an Algorithm-Based Fault Tolerance protected overclocking technique (ABFT-OC)
that lets us exploit reliable overclocking for key matrix decomposition
operations. Then, we design an energy-saving matrix decomposition framework,
Bi-directional Slack Reclamation (BSR), that intelligently combines the
capabilities provided by ABFT-OC and DVFS to maximize energy saving while
maintaining performance and reliability. Experiments show that BSR saves up to
11.7% more energy than the current best energy-saving optimization approach with
no performance degradation, and achieves up to a 14.1% reduction in
Energy × Delay². BSR also enables Pareto-efficient performance-energy
trade-offs, providing up to a 1.43x performance improvement without costing
extra energy.
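ABFT-OC relies on algorithm-based fault tolerance to catch errors that overclocking may introduce. The abstract does not describe the authors' checksum scheme, so the following is only a minimal sketch of the classic full-checksum technique (Huang and Abraham) applied to a matrix product, the kind of matrix-multiply-like update that dominates blocked Cholesky, LU, and QR. All function names are hypothetical, the code uses plain Python lists rather than BLAS, and the absolute tolerance `tol` is an assumption; a practical version would scale it to the expected floating-point rounding error.

```python
# Minimal sketch of full-checksum ABFT for C = A @ B (not the authors' ABFT-OC code).
# Function names and the absolute tolerance are illustrative assumptions.

def matmul(A, B):
    """Plain triple-loop product of list-of-lists matrices."""
    k, m = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(m)] for row in A]


def add_checksum_row(A):
    """Append a row holding the sum of each column of A."""
    return A + [[sum(col) for col in zip(*A)]]


def add_checksum_col(B):
    """Append a column holding the sum of each row of B."""
    return [row + [sum(row)] for row in B]


def find_mismatches(Cf, tol=1e-9):
    """Return (rows, cols) whose data sums disagree with the stored checksums.

    Cf is the (n+1) x (m+1) full-checksum product; its last row and column are checksums.
    """
    n, m = len(Cf) - 1, len(Cf[0]) - 1
    bad_rows = [i for i in range(n) if abs(sum(Cf[i][:m]) - Cf[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(Cf[i][j] for i in range(n)) - Cf[n][j]) > tol]
    return bad_rows, bad_cols


def check_and_correct(Cf, tol=1e-9):
    """Detect errors and repair a single corrupted data element in place.

    Returns "ok", "corrected", or "uncorrectable". In this simplified sketch a fault
    in a checksum entry itself is reported as "uncorrectable".
    """
    rows, cols = find_mismatches(Cf, tol)
    if not rows and not cols:
        return "ok"
    if len(rows) == 1 and len(cols) == 1:
        i, j = rows[0], cols[0]
        m = len(Cf[0]) - 1
        # Rebuild the faulty element from its row checksum and the other row entries.
        Cf[i][j] = Cf[i][m] - sum(Cf[i][k] for k in range(m) if k != j)
        return "corrected"
    return "uncorrectable"


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    Cf = matmul(add_checksum_row(A), add_checksum_col(B))
    Cf[1][0] += 5.0  # simulate a silent error in one element of C
    print(check_and_correct(Cf), Cf[1][0])  # corrected 43.0
```

A single corrupted element shows up as exactly one mismatched row and one mismatched column, which both locates it and allows it to be recomputed; the checksum cost is one extra row and column on top of the O(n³) product.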
Direct N-body code on low-power embedded ARM GPUs
This work arises in the context of the ExaNeSt project, which aims at the design
and development of an exascale-ready supercomputer with a low energy-consumption
profile that is nonetheless able to support the most demanding scientific and
technical applications. The ExaNeSt compute unit consists of densely packed
low-power 64-bit ARM processors embedded within Xilinx FPGA SoCs. SoC boards are
heterogeneous architectures in which computing power is supplied by both CPUs
and GPUs, and they are emerging as a possible low-power, low-cost alternative to
clusters based on traditional CPUs. A state-of-the-art direct N-body code
suitable for astrophysical simulations has been re-engineered to exploit SoC
heterogeneous platforms based on ARM CPUs and embedded GPUs. Performance tests
show that embedded GPUs can be used effectively to accelerate real-life
scientific calculations, and that they are also promising because of their
energy efficiency, which is a crucial design requirement for future exascale
platforms.
Comment: 16 pages, 7 figures, 1 table, accepted for publication in the
Computing Conference 2019 proceedings
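A direct N-body code evaluates every pairwise gravitational interaction, so each force evaluation costs O(N²) operations; this regular all-pairs structure is what makes the method a natural candidate for GPU acceleration. The sketch below is a plain CPU version of that kernel in Python, not the re-engineered code described above; the units (G = 1) and the Plummer-style softening length `eps` are assumptions added for illustration.

```python
import math


def accelerations(pos, mass, eps=0.0, G=1.0):
    """All-pairs gravitational accelerations, O(N^2) work.

    pos: list of [x, y, z]; mass: list of masses; eps: softening length (assumed).
    Newton's third law is used so that each pair (i, j) is visited once.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]  # vector from i to j
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3  # i is pulled toward j
                acc[j][k] -= G * mass[i] * d[k] * inv_r3  # j is pulled toward i
    return acc


if __name__ == "__main__":
    # Two unit masses one length unit apart attract each other with unit acceleration.
    print(accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0]))
```

On a GPU, a common mapping is instead one thread per target particle i looping over all sources j; that repeats each pair twice but avoids concurrent writes to the same accumulator.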
The future of computing beyond Moore's Law
Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements and how it affects the options available for continued scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue improving computing performance in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
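The doubling rule quoted above compounds quickly. Assuming, for illustration only, a strict doubling period of T = 2 years sustained over the full 50 years (the abstract states the period only approximately), the cumulative improvement factor is:

```latex
P(t) = P_0 \, 2^{t/T}, \qquad
\frac{P(50\,\text{yr})}{P_0} = 2^{50/2} = 2^{25} = 33{,}554{,}432 \approx 3.4 \times 10^{7}
```

Under the same model, each doubling period that passes without the historical driver forgoes a factor of two.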
3E: Energy-Efficient Elastic Scheduling for Independent Tasks in Heterogeneous Computing Systems
Reducing energy consumption is a major design constraint for modern heterogeneous computing systems, as a way to minimize electricity costs, improve system reliability, and protect the environment. Conventional energy-efficient scheduling strategies developed for these systems do not sufficiently exploit system elasticity and adaptability to maximize energy savings, and do not simultaneously take user-expected finish times into account. In this paper, we develop a novel scheduling strategy, energy-efficient elastic (3E) scheduling, for aperiodic, independent, non-real-time tasks with user-expected finish times on DVFS-enabled heterogeneous computing systems. The 3E strategy adjusts processors’ supply voltages and frequencies according to the system workload and makes trade-offs between energy consumption and user-expected finish times. Compared with other energy-efficient strategies, 3E significantly improves scheduling quality and effectively enhances system elasticity.
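The abstract does not spell out the 3E algorithm, so the following is only an illustrative sketch of the underlying DVFS trade-off under a simplified model: dynamic power scales as C·V²·f, so a task of W cycles run at level (f, V) takes W/f seconds and consumes roughly C·V²·W of dynamic energy, ignoring static power. For a single task, the sketch picks the lowest-energy voltage/frequency level whose finish time still meets the user-expected finish time, and falls back to the fastest level when none can. The `Level` type, the `pick_level` function, and the voltage/frequency table are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One hypothetical DVFS operating point."""
    freq_ghz: float  # clock frequency in GHz (giga-cycles per second)
    volt: float      # supply voltage in volts


def pick_level(cycles_g, start_s, expected_finish_s, levels, c_eff=1.0):
    """Return (level, finish_time_s, energy) for one task.

    cycles_g: task work in giga-cycles; c_eff: effective switched capacitance
    (arbitrary units). Dynamic energy is modeled as c_eff * V^2 * cycles, so
    static power is ignored in this simplified sketch.
    """
    best = None
    for lv in levels:
        finish = start_s + cycles_g / lv.freq_ghz
        energy = c_eff * lv.volt ** 2 * cycles_g
        # Keep the cheapest level that still meets the user-expected finish time.
        if finish <= expected_finish_s and (best is None or energy < best[2]):
            best = (lv, finish, energy)
    if best is None:
        # No level meets the expectation: trade energy for time and run fastest.
        fastest = max(levels, key=lambda lv: lv.freq_ghz)
        best = (fastest, start_s + cycles_g / fastest.freq_ghz,
                c_eff * fastest.volt ** 2 * cycles_g)
    return best


if __name__ == "__main__":
    levels = [Level(1.0, 0.9), Level(2.0, 1.1), Level(3.0, 1.3)]
    lv, finish, energy = pick_level(4.0, 0.0, 3.0, levels)
    print(lv.freq_ghz, finish)  # 2.0 2.0
```

Because dynamic energy per cycle grows with V², the loosest level that still meets the finish time is usually the cheapest; a full scheduler would additionally choose among heterogeneous processors and account for queued work.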
- …