The Green500 List: Escapades to Exascale
Energy efficiency is now a top priority. The first four years of the Green500 have seen the importance of energy efficiency in supercomputing grow from an afterthought to the forefront of innovation as we near a point where systems will be forced to stop drawing more power. Even so, the landscape of efficiency in supercomputing continues to shift, with new trends emerging and unexpected departures from previous predictions.
This paper offers an in-depth analysis of the new and shifting trends in the Green500. In addition, the analysis offers early indications of the track we are taking toward exascale, and what an exascale machine in 2018 is likely to look like. Lastly, we discuss the new efforts and collaborations toward designing and establishing better metrics, methodologies, and workloads for the measurement and analysis of energy-efficient supercomputing.
Taming Multi-core Parallelism with Concurrent Mixin Layers
The recent shift in computer system design to multi-core technology requires that the developer leverage explicit parallel programming techniques in order to utilize available performance. Nevertheless, developing the requisite parallel applications remains a prohibitively difficult undertaking, particularly for the general programmer. To mitigate many of the challenges in creating concurrent software, this paper introduces a new parallel programming methodology that leverages feature-oriented programming (FOP) to logically decompose a product line architecture (PLA) into concurrent execution units. In addition, our efficient implementation of this methodology, which we call concurrent mixin layers, uses a layered architecture to facilitate the development of parallel applications. To validate our methodology and accompanying implementation, we present a case study of a product line of multimedia applications deployed within a typical multi-core environment. Our performance results demonstrate that a product line can be effectively transformed into parallel applications capable of utilizing multiple cores, thus improving performance. Furthermore, concurrent mixin layers significantly reduces the complexity of parallel programming by eliminating the need for the programmer to introduce explicit low-level concurrency control. Our initial experience gives us reason to believe that concurrent mixin layers is a promising technique for taming parallelism in multi-core environments.
Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units
Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. This paper demonstrates how one can take advantage of graphics processing units (GPUs) available in today's typical desktop computer, together with a multiscale approximation method, to significantly speed up such computations. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is implemented on an ATI Radeon 4870 GPU in combination with the hierarchical charge partitioning (HCP) multiscale approximation. This implementation delivers a combined 1800-fold speedup for a 476,040-atom viral capsid.
GPU First -- Execution of Legacy CPU Codes on GPUs
Utilizing GPUs is critical for high performance on heterogeneous systems.
However, leveraging the full potential of GPUs for accelerating legacy CPU
applications can be a challenging task for developers. The porting process
requires identifying code regions amenable to acceleration, managing distinct
memories, synchronizing host and device execution, and handling library
functions that may not be directly executable on the device. This complexity
makes it challenging for non-experts to leverage GPUs effectively, or even to
start offloading parts of a large legacy application.
In this paper, we propose a novel compilation scheme called "GPU First" that
automatically compiles legacy CPU applications directly for GPUs without any
modification of the application source. Library calls inside the application
are either resolved through our partial libc GPU implementation or via
automatically generated remote procedure calls to the host. Our approach
simplifies the task of identifying code regions amenable to acceleration and
enables rapid testing of code modifications on actual GPU hardware in order to
guide porting efforts.
Our evaluation covers two HPC proxy applications with OpenMP CPU and GPU
parallelism, four microbenchmarks with originally GPU-only parallelism, and
three benchmarks from the SPEC OMP 2012 suite featuring hand-optimized OpenMP
CPU parallelism, showcasing the simplicity of porting host applications to the
GPU. For existing parallel loops, we often match the performance of
corresponding manually offloaded kernels, with up to a 14.36x speedup on the
GPU, validating that our GPU First methodology can effectively guide porting
efforts for large legacy applications.
Defer Mechanism for {C}
The defer mechanism can restore a previously known property or invariant that is altered during the processing of a code block. The defer mechanism is useful for paired operations, where one operation is performed at the start of a code block and the paired operation is performed before exiting the block. Because blocks can be exited using a variety of mechanisms, operations are frequently paired incorrectly. The defer mechanism in C is intended to help ensure the proper pairing of these operations. This pattern is common in resource management, synchronization, and outputting balanced strings (e.g., parentheses or HTML). A separable feature of the defer mechanism is a panic/recover mechanism that allows error handling at a distance.