Search CORE

110,676 research outputs found

Iso-energy-efficiency: An approach to power-constrained parallel computation

Author: Ge Rong
Kirk Cameron
Song Shuaiwen
Su Chunyi
Vishnu Abhinav
Publication venue
Publication date: 01/01/2010
Field of study

Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict energy-performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine and application dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making

Computer Science Technical Reports @Virginia Tech

EOS software evolution enabling LHC Run 3

Author: Amadio Guilherme
Caffy Cedric
Guenther Jaroslav
Lekshmanan Abhishek
Mascetti Luca
Peters Andreas
Reis Manuel
Simon Michal Kamil
Sindrilaru Elvin
Smith David
Publication venue: EDP Sciences
Publication date: 01/01/2024
Field of study

EOS has been the main storage system at CERN for more than a decade, continuously improving in order to meet the ever evolving requirements of the LHC experiments and the whole physics user community. In order to satisfy the demands of LHC Run-3, in terms of storage performance and tradeoff between cost and capacity, EOS was enhanced with a set of new functionalities and features that we will detail in this paper. First of all, we describe the use of erasure coded layouts in a large-scale deployment which enables an efficient use of available storage capacity, while at the same time providing end-users with better throughput when accessing their data. This new operating model implies more coupling between the machines in a cluster, which in turn leads to the next set of EOS improvements that we discuss, targeting I/O traffic shaping, better I/O scheduling policies and tagged traffic prioritization. Increasing the size of the EOS clusters to cope with experiment demands, means stringent constraints on the data integrity and durability that we addressed by a re-designed consistency check engine. Another focus area of EOS development was to minimize the operational load by making the internal operational procedures (draining, balancing or conversions) more robust and efficient, to allow managing easily multiple clusters and avoid possible scaling issues. All these improvements available in the EOS 5 release series, are coupled with the new XRootD 5 framework which brings additional security features like TLS support and optimizations for large data transfers like page read and page write functionalities. Last but not least, the area of authentication/authorization methods has seen important developments by adding support for different types of bearer tokens that we will describe along with EOS specific token extensions. We conclude by highlighting potential areas of the EOS architecture that might require further developments or re-design in order to cope with the ever-increasing demands of our end-users

Directory of Open Access Journals

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Author: Chen Yu-Hsin
Emer Joel
Sze Vivienne
Yang Tien-Ju
Publication venue
Publication date: 20/05/2019
Field of study

A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models are different from the traditional large ones in that there is much more variation in their layer shapes and sizes, and often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.Comment: accepted for publication in IEEE Journal on Emerging and Selected Topics in Circuits and Systems. This extended version on arXiv also includes Eyexam in the appendi

arXiv.org e-Print Archive

DSpace@MIT