
    Green HPC: Optimizing Software Stack Energy Efficiency of Large Data Systems

    High-performance computing (HPC) is indispensable in modern scientific research and industry applications, but its energy consumption is a growing concern. This thesis presents two novel approaches to optimizing energy consumption in large data systems. The first chapter discusses the use of Dynamic Voltage and Frequency Scaling (DVFS) to optimize the energy efficiency of two popular lossy compression algorithms, SZ and ZFP. By adjusting the voltage and frequency levels of computing resources, DVFS can reduce energy consumption while maintaining the desired level of performance and accuracy. The second chapter focuses on a detailed comparison and analysis of the energy consumption of asynchronous and synchronous checkpointing using the VELOC and GenericIO libraries. The study investigates the trade-offs between these two checkpointing techniques, offering insights into their energy consumption patterns and performance impacts on large-scale HPC systems. Based on the analysis, we provide recommendations for choosing the most energy-efficient checkpointing method for specific application scenarios. Together, these two approaches contribute to the development of Green HPC, paving the way for more sustainable and energy-efficient large data systems. This thesis provides valuable insights for researchers and industry practitioners aiming to optimize energy consumption while maintaining high-performance computing capabilities.
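    The DVFS study above can be pictured as a frequency sweep that measures energy and runtime per compression run and then picks the most efficient setting within a slowdown budget. The sketch below illustrates only that loop; set_cpu_frequency, read_energy_joules, and run_compressor are hypothetical placeholders for platform mechanisms such as cpufreq governors, RAPL energy counters, and SZ/ZFP bindings, not the thesis code.

```python
# Illustrative sketch: sweep DVFS frequency levels and record energy per
# compression run to locate an energy-efficient operating point.
import time

FREQ_LEVELS_MHZ = [1200, 1600, 2000, 2400, 2800]  # assumed available P-states

def set_cpu_frequency(mhz):
    """Placeholder: on Linux this would adjust the cpufreq governor via sysfs."""
    pass

def read_energy_joules():
    """Placeholder: on Intel CPUs this could read a RAPL energy counter."""
    return 0.0

def run_compressor(data, error_bound):
    """Placeholder for a lossy compressor call (e.g., SZ or ZFP bindings)."""
    return data

def profile_dvfs(data, error_bound=1e-3):
    results = []
    for mhz in FREQ_LEVELS_MHZ:
        set_cpu_frequency(mhz)
        e0, t0 = read_energy_joules(), time.perf_counter()
        run_compressor(data, error_bound)
        energy = read_energy_joules() - e0
        elapsed = time.perf_counter() - t0
        results.append((mhz, energy, elapsed))
    # pick the frequency that minimizes energy within a 10% slowdown budget
    fastest = min(t for _, _, t in results)
    feasible = [r for r in results if r[2] <= 1.1 * fastest]
    return min(feasible, key=lambda r: r[1])
```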

    Big Data Application and System Co-optimization in Cloud and HPC Environment

    The emergence of big data requires powerful computational resources and memory subsystems that can be scaled efficiently to accommodate its demands. The cloud is a well-established computing paradigm that can offer customized computing and memory resources to meet the scalable demands of big data applications. In addition, the flexible pay-as-you-go pricing model offers opportunities for using resources at large scale with low cost and no infrastructure maintenance burden. High-performance computing (HPC), on the other hand, also has powerful infrastructure with the potential to support big data applications. In this dissertation, we explore application and system co-optimization opportunities to support big data in both cloud and HPC environments. Specifically, we exploit the unique features of both application and system to find overlooked optimization opportunities or to tackle challenges that are difficult to address by looking at the application or system individually. To derive optimized deployment and runtime schemes from the characteristics of the workloads and their underlying systems, we divide the workloads into four categories: 1) memory intensive applications; 2) compute intensive applications; 3) both memory and compute intensive applications; 4) I/O intensive applications.

    When deploying memory intensive big data applications to public clouds, one important yet challenging problem is selecting a specific instance type whose memory capacity is large enough to prevent out-of-memory errors while the cost is minimized without violating performance requirements. In this dissertation, we propose two techniques for efficient deployment of big data applications with dynamic and intensive memory footprints in the cloud. The first approach builds a performance-cost model that can accurately predict how, and by how much, virtual memory size would slow down the application and, consequently, impact the overall monetary cost. The second approach employs a lightweight memory usage prediction methodology based on dynamic meta-models adjusted by the application's own traits. The key idea is to eliminate periodic checkpointing and migrate the application only when the predicted memory usage exceeds the physical allocation.

    When moving compute intensive applications to the cloud, it is critical to make the applications scalable so that they can benefit from the massive cloud resources. We first use Kirchhoff's law, one of the most widely used physical laws in many engineering disciplines, as an example workload for our study. The key challenge of applying Kirchhoff's law to real-world applications at scale lies in the high, if not prohibitive, computational cost of solving a large number of nonlinear equations. We propose a high-performance deep-learning-based approach for Kirchhoff analysis, namely HDK. HDK employs two techniques to improve performance: (i) early pruning of unqualified input candidates, which simplifies the equations and selects a meaningful input data range; and (ii) parallelization of forward labelling, which executes steps of the problem in parallel.

    For applications that are both memory and compute intensive in the cloud, we use a blockchain system as a benchmark. Existing blockchain frameworks present a technical barrier for many users who wish to modify or test new research ideas in blockchains. To make matters worse, many advantages of blockchain systems can be demonstrated only at large scales, which are not always available to researchers. In this dissertation, we develop an accurate and efficient emulation system to replay the execution of large-scale blockchain systems on tens of thousands of nodes in the cloud.

    For I/O intensive applications, we observe one important yet often neglected side effect of lossy scientific data compression. Lossy compression techniques have demonstrated promising results in significantly reducing scientific data size while guaranteeing compression error bounds, but the compressed data sizes are often highly skewed and thus impact the performance of parallel I/O. Therefore, we believe it is critical to pay more attention to the unbalanced parallel I/O caused by lossy scientific data compression.
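    As an illustration of the instance-selection problem described for memory intensive applications, the following sketch picks the cheapest instance whose predicted peak footprint fits in memory. The instance catalog, prices, and the meta-model are invented for the example and are not taken from the dissertation.

```python
# Illustrative sketch: choose the cheapest cloud instance whose memory capacity
# covers a predicted peak footprint. All names and numbers are hypothetical.
INSTANCE_TYPES = [
    {"name": "m.small",  "mem_gb": 8,  "usd_per_hour": 0.10},
    {"name": "m.medium", "mem_gb": 16, "usd_per_hour": 0.19},
    {"name": "m.large",  "mem_gb": 32, "usd_per_hour": 0.38},
    {"name": "m.xlarge", "mem_gb": 64, "usd_per_hour": 0.77},
]

def predicted_peak_memory_gb(input_size_gb, model=lambda x: 2.5 * x + 1.0):
    """Hypothetical meta-model: peak footprint as a function of input size."""
    return model(input_size_gb)

def select_instance(input_size_gb, runtime_hours):
    need = predicted_peak_memory_gb(input_size_gb)
    feasible = [i for i in INSTANCE_TYPES if i["mem_gb"] >= need]
    if not feasible:
        raise RuntimeError("no instance large enough; consider sharding the job")
    best = min(feasible, key=lambda i: i["usd_per_hour"])
    return best, best["usd_per_hour"] * runtime_hours

if __name__ == "__main__":
    instance, cost = select_instance(input_size_gb=5, runtime_hours=3)
    print(instance["name"], f"${cost:.2f}")
```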

    From detection to optimization: impact of soft errors on high-performance computing applications

    As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels in the software stack. Among the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Understanding the behavior of HPC applications in the presence of soft errors is critical for effective utilization of HPC systems. This understanding can guide the development of algorithm-based error detection informed by application characteristics gathered from fault injection and error propagation studies. Furthermore, the realization that applications are tolerant to small errors allows optimizations such as lossy compression on high-cost data transfers. Lossy compression introduces small, user-controllable amounts of error when compressing data in order to reduce its size before expensive data transfers, saving time. This dissertation investigates and improves the resiliency of HPC applications to soft errors, and explores lossy compression as a new form of optimization for expensive, time-consuming data transfers.
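    The fault injection studies mentioned above can be illustrated with a minimal single-bit-flip experiment: corrupt one element of the input, rerun the computation, and measure how far the error propagates. The helper names and the toy prefix-sum kernel below are assumptions made for the sketch, not the dissertation's tooling.

```python
# Illustrative fault-injection sketch: flip a single bit in one input element
# and compare the faulty result against a golden run.
import struct
import random

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a 64-bit float's binary representation."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    return struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))[0]

def run_kernel(data):
    """Stand-in numerical kernel: a simple prefix sum."""
    out, acc = [], 0.0
    for x in data:
        acc += x
        out.append(acc)
    return out

def inject_and_compare(n=1000, seed=0):
    random.seed(seed)
    data = [random.random() for _ in range(n)]
    golden = run_kernel(data)
    idx, bit = random.randrange(n), random.randrange(64)
    corrupted = list(data)
    corrupted[idx] = flip_bit(corrupted[idx], bit)
    faulty = run_kernel(corrupted)
    max_err = max(abs(a - b) for a, b in zip(golden, faulty))
    return idx, bit, max_err

if __name__ == "__main__":
    print(inject_and_compare())
```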

    Approachable Error Bounded Lossy Compression

    Compression is commonly used in HPC applications to move and store data. Traditional lossless compression, however, does not provide adequate compression of the floating-point data often found in scientific codes. Recently, researchers and scientists have turned to lossy compression techniques that approximate the original data rather than reproduce it in order to achieve the desired levels of compression. Typical lossy compressors do not bound the errors introduced into the data, which led to the development of error-bounded lossy compressors (EBLC). These tools provide the desired levels of compression with mathematical guarantees on the errors introduced. However, the current state of EBLC leaves much to be desired. Existing EBLCs all have different interfaces, requiring codes to be changed to adopt new techniques; EBLCs have many more configuration options than their predecessors, making them more difficult to use; and EBLCs typically bound quantities such as pointwise errors rather than the higher-level metrics, such as spectra, p-values, or test statistics, that scientists typically use. My dissertation aims to provide a uniform interface to compression and to develop tools that allow application scientists to understand and apply EBLC. This dissertation proposal presents three groups of work: LibPressio, a standard interface for compression and analysis; FRaZ/LibPressio-Opt, frameworks for the automated configuration of compressors using LibPressio; and tools for analyzing errors in particular domains.
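    To make the idea of a uniform compression interface concrete, the sketch below exposes every compressor through the same configure/compress/decompress calls so that applications can swap techniques without code changes. This is an illustration of the concept only; it is not the actual LibPressio API, and the zlib backend is just a lossless stand-in that keeps the example runnable.

```python
# Minimal sketch of a uniform compressor interface: all backends share one
# entry point and one options dictionary.
from abc import ABC, abstractmethod
import zlib

class ErrorBoundedCompressor(ABC):
    def __init__(self, **options):
        self.options = {"abs_error_bound": 1e-4}  # common, backend-agnostic option
        self.options.update(options)

    @abstractmethod
    def compress(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decompress(self, data: bytes) -> bytes: ...

class ZlibBackend(ErrorBoundedCompressor):
    """Lossless stand-in backend used only to make the sketch runnable."""
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)
    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)

def compress_with(backend_cls, data: bytes, **options) -> bytes:
    # application code only ever sees this one call, so swapping the
    # compressor backend does not require code changes
    return backend_cls(**options).compress(data)

if __name__ == "__main__":
    payload = bytes(range(256)) * 100
    out = compress_with(ZlibBackend, payload, abs_error_bound=1e-3)
    print(len(payload), "->", len(out))
```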

    Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity

    Main memory is a critical resource in modern computer systems and is in increasing demand. An increasing number of on-chip cores and specialized accelerators improves the potential processing throughput but also calls for higher data rates and greater memory capacity. In addition, new emerging data-intensive applications further increase memory traffic and footprint. On the other hand, memory bandwidth is pin-limited and power-constrained and is therefore more difficult to scale, while memory capacity is limited by cost and energy considerations. This thesis proposes a variety of memory compression techniques as a means to reduce the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. To reduce transferred data volumes, lossy compression is applied, which is able to reach more aggressive compression ratios. A reduction of off-chip memory traffic leads to reduced memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.

    The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1 GB of 1600 MHz DDR4 per core and 1 MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20%, while introducing at most 1.2% error in the application output.

    The second part of this thesis proposes Memory Squeeze (MemSZ), introducing a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error-limiting capability of AVR by keeping track of the lifetime accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1 GB of 800 MHz DDR4 per core and 1 MB of LLC space per core, MemSZ improves execution time, energy, and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.

    The third part of the thesis describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data, and falls back to lossless compression if an error threshold is exceeded. In a system with 4 GB of 800 MHz DDR4 per core and 1 MB of LLC space per core, L2C improves on the performance of MemSZ by 9%, and energy consumption by 3%.

    The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack is able to reduce the traffic overhead compared to other memory compaction systems, thus retaining the bandwidth benefits of compression. Furthermore, FlatPack is flexible to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4 GB of 800 MHz DDR4 per core and 1 MB of LLC space per core, FlatPack increases system performance compared to current state-of-the-art designs by 36%, while reducing system energy consumption by 12%.
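    The hybrid principle behind L2C, lossy compression for approximable blocks with a lossless fallback when an error threshold is exceeded, can be sketched in a few lines. The downsampling "compressor", the threshold value, and the block layout below are illustrative assumptions, not the hardware design evaluated in the thesis.

```python
# Sketch of a hybrid lossy/lossless block compressor: approximate the block,
# check the reconstruction error, and fall back to lossless if it is too large.
import zlib

def lossy_compress(block, step=2):
    """Toy lossy scheme: keep every `step`-th sample (downsampling)."""
    return block[::step], step

def lossy_reconstruct(samples, step, length):
    out = []
    for s in samples:
        out.extend([s] * step)
    return out[:length]

def compress_block(block, error_threshold=0.05):
    samples, step = lossy_compress(block)
    approx = lossy_reconstruct(samples, step, len(block))
    max_err = max(abs(a - b) for a, b in zip(block, approx))
    if max_err <= error_threshold:
        return ("lossy", samples)
    # error too large: fall back to lossless compression for this block
    raw = b"".join(str(x).encode() + b"," for x in block)
    return ("lossless", zlib.compress(raw))

if __name__ == "__main__":
    smooth = [0.01 * i for i in range(64)]                        # approximable
    noisy = [((i * 2654435761) % 97) / 97 for i in range(64)]     # not approximable
    print(compress_block(smooth)[0], compress_block(noisy)[0])
```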

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    A FLEXIBLE APPROACH FOR ORCHESTRATING ADAPTIVE SCIENTIFIC WORKFLOWS FOR SCALABLE COMPUTING

    Modern scientific workflows are becoming complex with the incorporation of non-traditional computation methods and advances in technologies enabling on-the-fly analysis. These workflows exhibit unpredictable runtime behaviors and have dynamic requirements. For example, such workflows must maintain overall performance and throughput while dealing with undesired events, adapting to failures, and supporting data-driven adaptive analysis. A fixed, predetermined resource assignment, common on HPC machines, is inefficient for overall performance, throughput, and data-driven adaptive analysis. While solutions exist to enable elastic resource management, there is no support that can manage workflows at runtime to determine when the resource assignment and/or the runtime state of tasks (i.e., stopping, starting, changing task parameters to adapt the analysis, or changing how data is sent and received by the workflow tasks) needs to be revised, and then perform the feasible changes at runtime accordingly. This dissertation provides a flexible and portable model, DYFLOW, with strategies to automate the management of scalable and adaptive workflows. The model gathers runtime statistics, tracks the occurrence of important events, and finalizes a plan of action to execute in response to the events that occurred, by mediating between the suggested actions, the running state of the workflow tasks, and resource availability. Further, the model supports a wide range of constructs and tunable parameters that allow users to express events of interest, select prospective responses, and set various preferences that define the expected quality of service, e.g., throughput, performance, resilience to failures, or quality of results. To showcase that the DYFLOW model supports the adaptive functionality desired for emerging workflows, several examples of problematic behavior are demonstrated in which DYFLOW accommodates the specific requirements and automates the runtime management process for scientists while delivering the desired quality of service.
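    A minimal sketch of the monitor-detect-respond loop described above is given below: gather per-task statistics, match them against user-defined event predicates, and apply the selected response. The policy table, task names, and statistics source are hypothetical; DYFLOW's actual constructs are considerably richer.

```python
# Illustrative runtime management loop: collect statistics, detect events,
# and apply the first matching user-defined response per task.
import random
import time

def collect_stats(tasks):
    """Placeholder: in practice, query per-task throughput, queue depth, failures."""
    return {t: {"throughput": random.uniform(0.5, 1.5), "failed": random.random() < 0.05}
            for t in tasks}

# (event predicate, response action) pairs supplied by the user
POLICIES = [
    (lambda s: s["failed"],            lambda task: print(f"restart {task}")),
    (lambda s: s["throughput"] < 0.7,  lambda task: print(f"add resources to {task}")),
    (lambda s: s["throughput"] > 1.3,  lambda task: print(f"shrink {task}")),
]

def manage(tasks, cycles=3, interval=0.1):
    for _ in range(cycles):
        stats = collect_stats(tasks)
        for task, s in stats.items():
            for predicate, respond in POLICIES:
                if predicate(s):
                    respond(task)   # first matching policy wins this cycle
                    break
        time.sleep(interval)

if __name__ == "__main__":
    manage(["simulation", "in-situ-analysis", "checkpoint-writer"])
```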

    Technology 2004, Vol. 2

    Proceedings from symposia of the Technology 2004 Conference, November 8-10, 1994, Washington, DC. Volume 2 features papers on computers and software, virtual reality simulation, environmental technology, video and imaging, medical technology and life sciences, robotics and artificial intelligence, and electronics

    How To Touch a Running System

    The increasing importance of distributed and decentralized software architectures brings more and more attention to adaptive software. Obtaining adaptiveness, however, is a difficult task, as the software design needs to foresee and cope with a variety of situations. Using reconfiguration of components facilitates this task, as the adaptivity is handled at the architecture level instead of directly in the code. This results in a separation of concerns: the appropriate reconfiguration can be devised on a coarse level, while the implementation of the components can remain largely unaware of reconfiguration scenarios. We study reconfiguration in component frameworks based on formal theory. We first discuss programming with components, exemplified by the development of the cmc model checker. This highly efficient model checker is made of C++ components and serves as an example of component-based software development practice in general, while also providing insights into the principles of adaptivity. However, its component model focuses on high performance and is not geared towards using the structuring principle of components for controlled reconfiguration. We thus complement this highly optimized model with a message-passing-based component model that takes reconfigurability as its central principle. Supporting reconfiguration in a framework is about relieving the programmer of its peculiarities as much as possible. We utilize the formal description of the component model to provide an algorithm for reconfiguration that retains as much flexibility as possible, while avoiding most problems that arise due to concurrency. This algorithm is embedded in a general four-stage adaptivity model inspired by physical control loops. The reconfiguration is designed to work with stateful components, retaining their data and unprocessed messages. Reconfiguration plans, which are given a formal semantics, form the input of the reconfiguration algorithm. We show that the algorithm achieves perceived atomicity of the reconfiguration process for an important class of plans, i.e., the whole process of reconfiguration is perceived as one atomic step, while minimizing the blocking of components. We illustrate the applicability of our approach to reconfiguration with several examples, such as fault tolerance and automated resource control.
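    The reconfiguration discipline described above, blocking only the affected components, carrying over their state and unprocessed messages, and releasing them in a single step so the change appears atomic to the rest of the system, can be sketched as follows. The Component class and apply_plan function are simplified illustrations under these assumptions, not the formal model or implementation from the thesis.

```python
# Sketch of reconfiguration with perceived atomicity: quiesce the affected
# components, hand over state and pending messages, then release in one step.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    state: dict = field(default_factory=dict)
    mailbox: list = field(default_factory=list)
    blocked: bool = False

def apply_plan(system: dict, plan: dict):
    """plan maps old component names to factories building their replacements."""
    affected = [system[name] for name in plan]
    # 1. quiesce only the affected components (minimize blocking)
    for comp in affected:
        comp.blocked = True
    try:
        # 2. build replacements and hand over state plus unprocessed messages
        for name, factory in plan.items():
            old = system[name]
            new = factory(name)
            new.state, new.mailbox = old.state, old.mailbox
            system[name] = new
    finally:
        # 3. release in one step: the rest of the system observes the
        #    reconfiguration as a single atomic transition
        for name in plan:
            system[name].blocked = False

if __name__ == "__main__":
    system = {"logger": Component("logger", state={"lines": 42}, mailbox=["msg1"])}
    apply_plan(system, {"logger": lambda n: Component(n)})
    print(system["logger"].state, system["logger"].mailbox)
```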