Load-Varying LINPACK: A Benchmark for Evaluating Energy Efficiency in High-End Computing

Author: Feng Wu-chun
Subramaniam Balaji
Publication date: 01/12/2010

For decades, performance has driven the high-end computing (HEC) community. However, as highlighted in recent exascale studies that chart a path from petascale to exascale computing, power consumption is fast becoming the major design constraint in HEC. Consequently, the HEC community needs to address this issue in future petascale and exascale computing systems. Current scientific benchmarks, such as LINPACK and SPEChpc, only evaluate HEC systems when running at full throttle, i.e., 100% workload, resulting in a focus on performance and ignoring the issues of power and energy consumption. In contrast, efforts like SPECpower evaluate the energy efficiency of a compute server at varying workloads. This is analogous to evaluating the energy efficiency (i.e., fuel efficiency) of an automobile at varying speeds (e.g., miles per gallon highway versus city). SPECpower, however, only evaluates the energy efficiency of a single compute server rather than a HEC system; furthermore, it is based on SPEC's Java Business Benchmarks (SPECjbb) rather than a scientific benchmark. Given the absence of a load-varying scientific benchmark to evaluate the energy efficiency of HEC systems at different workloads, we propose the load-varying LINPACK (LV-LINPACK) benchmark. In this paper, we identify application parameters that affect performance and provide a methodology to vary the workload of LINPACK, thus enabling a more rigorous study of energy efficiency in supercomputers, or more generally, HEC

Computer Science Technical Reports @Virginia Tech

Towards Energy-Proportional Computing for Enterprise-Class Server Workloads

Author: Feng Wu-chun
Subramaniam Balaji
Publication date: 01/01/2012

Massive data centers housing thousands of computing nodes have become commonplace in enterprise computing, and the power consumption of such data centers is growing at an unprecedented rate. Adding to the problem is the inability of the servers to exhibit energy proportionality, i.e., provide energy-efficient execution under all levels of utilization, which diminishes the overall energy efficiency of the data center. It is imperative that we realize effective strategies to control the power consumption of the server and improve the energy efficiency of data centers. With the advent of Intel Sandy Bridge processors, we have the ability to specify a limit on power consumption during runtime, which creates opportunities to design new power-management techniques for enterprise workloads and make the systems that they run on more energy-proportional. In this paper, we investigate whether it is possible to achieve energy proportionality for an enterprise-class server workload, namely the SPECpower_ssj2008 benchmark, by using Intel's Running Average Power Limit (RAPL) interfaces. First, we analyze the power consumption and characterize the instantaneous power profile of the SPECpower benchmark at a subsystem level using the on-chip energy meters exposed via the RAPL interfaces. We then analyze the impact of RAPL power limiting on the performance, per-transaction response time, power consumption, and energy efficiency of the benchmark under different load levels. Our observations and results shed light on the efficacy of the RAPL interfaces and provide guidance for designing power-management techniques for enterprise-class workloads.

Computer Science Technical Reports @Virginia Tech

Bipolaronic blockade effect in quantum dots with negative charging energy

Author: Fang Tie-Feng
Niu Chun-Jiang
Sun Qing-feng
Zhang Shu-Feng
Publication venue: 'IOP Publishing'
Publication date: 04/11/2013

We investigate single-electron transport through quantum dots with negative charging energy induced by a polaronic energy shift. For weak dot-lead tunnel couplings, we demonstrate a bipolaronic blockade effect at low biases which suppresses the oscillating linear conductance, while the conductance resonances under large biases are enhanced. A novel conductance plateau develops when the coupling asymmetry is introduced, with its height and width tuned by the coupling strength and external magnetic field. It is further shown that the amplitude ratio of magnetic-split conductance peaks changes from 3 to 1 for increasing coupling asymmetry. Though we demonstrate all these transport phenomena in the low-order single-electron tunneling regime, they are already strikingly different from the usual Coulomb blockade physics and are easy to observe experimentally. Comment: 6 pages, 5 figures

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

Author: Feng Wu-chun
Gardner Mark
Martinez Gabriel
Publication date: 01/01/2011

The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that possesses a novel design and clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and Rodinia benchmark suite. The performance of our automatically translated applications via CU2CL is on par with their manually ported counterparts.

Computer Science Technical Reports @Virginia Tech

CiteSeerX

The Green500 List: Escapades to Exascale

Author: Feng Wu-chun
Scogland Tom
Subramaniam Balaji
Publication date: 01/01/2011

Energy efficiency is now a top priority. The first four years of the Green500 have seen the importance of energy efficiency in supercomputing grow from an afterthought to the forefront of innovation as we near a point where systems will be forced to stop drawing more power. Even so, the landscape of efficiency in supercomputing continues to shift, with new trends emerging, and unexpected shifts in previous predictions. This paper offers an in-depth analysis of the new and shifting trends in the Green500. In addition, the analysis offers early indications of the track we are taking toward exascale, and what an exascale machine in 2018 is likely to look like. Lastly, we discuss the new efforts and collaborations toward designing and establishing better metrics, methodologies and workloads for the measurement and analysis of energy-efficient supercomputing.

Computer Science Technical Reports @Virginia Tech

Crossref

Convergence of the Ginzburg-Landau approximation for the Ericksen-Leslie system

Author: Feng Zhewen
Hong Min-Chun
Mei Yu
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 13/06/2019

We establish the local well-posedness of the general Ericksen-Leslie system in liquid crystals with the initial velocity and director field in $H^1 \times H_b^2$. In particular, we prove that the solutions of the Ginzburg-Landau approximation system converge smoothly to the solution of the Ericksen-Leslie system for any $t \in (0,T^\ast)$ with a maximal existence time $T^\ast$ of the Ericksen-Leslie system.

arXiv.org e-Print Archive

University of Queensland eSpace