11 research outputs found

PARALLEX FILE SYSTEM (PXFS): BRIDGING THE GAP BETWEEN EXASCALE PROCESSING CAPABILITIES AND I/O PERFORMANCE

Author: Snyder Shane
Publication venue: Clemson University Libraries
Publication date: 01/05/2013

Due to processors reaching the maximum performance allowable by current technology, architectural trends for computer systems continue to increase the number of cores per processing chip to maximize system performance. Most estimates suggest massively parallel systems will be available within the decade, containing millions of cores and capable of exaflops of performance. New models of execution are necessary to maximize processor utilization and minimize power costs for these exascale systems. ParalleX is one such execution model, which attempts to address inefficiencies of current execution models by exposing fine-grained parallelism, increasing system utilization using asynchronous workflow, and resolving resource contention through the use of adaptive and dynamic resource scheduling. A particularly important aspect of these exascale execution models is the design of the I/O subsystem, which has seen limited performance increases compared to processor and network technologies. Parallel file systems have been designed to help alleviate the poor performance of storage technologies by distributing file data across multiple nodes of a parallel system to maximize the aggregate throughput attainable by file system clients. However, the design of parallel file systems needs to be modified to explicitly address the inherent high latency of remote file system operations without degrading file system performance and scalability. We present modifications to OrangeFS, a high-performance, working model parallel file system geared towards the facilitation of research in the field of parallel I/O, to help address the inefficiencies of current file systems. We call our resulting parallel file system implementation the ParalleX File System (PXFS), as it attempts to support the features required by the I/O subsystem of the ParalleX execution model.
Specifically, PXFS offers mechanisms for masking the latency of file system operations, defining meaningful computation to be overlapped with file system communication, and maintaining the high performance and scalability exhibited by OrangeFS. Our results indicate PXFS successfully improves file system performance and supports the semantics of ParalleX with limited programmer intervention, potentially simplifying the design and increasing the performance of many ParalleX applications.

Clemson University: TigerPrints

Scalable and Reliable Sparse Data Computation on Emergent High Performance Computing Systems

Author: Miao Zheng
Publication venue: Clemson University Libraries
Publication date: 01/05/2022

Heterogeneous systems with both CPUs and GPUs have become important system architectures in emergent High Performance Computing (HPC) systems. Heterogeneous systems must address both performance-scalability and power-scalability in the presence of failures. Aggressive power reduction pushes hardware to its operating limit and increases the failure rate. Resilience allows programs to progress when subjected to faults and is an integral component of large-scale systems, but incurs significant time and energy overhead. Future exascale systems are expected to have higher power consumption with higher fault rates. Sparse data computation is the fundamental kernel in many scientific applications. It is suitable for studies of scalability and resilience on heterogeneous systems due to its computational characteristics. To deliver the promised performance within the given power budget, heterogeneous computing mandates a deep understanding of the interplay between scalability and resilience. Managing scalability and resilience is challenging in heterogeneous systems, due to the heterogeneous compute capability, power consumption, and varying failure rates between CPUs and GPUs. Scalability and resilience have traditionally been studied in isolation, and optimizing one typically harms the other. While prior work has proven successful in optimizing scalability and resilience on CPU-based homogeneous systems, simply extending current approaches to heterogeneous systems results in suboptimal performance-scalability and/or power-scalability. To address these research challenges, we propose novel resilience and energy-efficiency technologies to optimize scalability and resilience for sparse data computation on heterogeneous systems with CPUs and GPUs.
First, we present generalized analytical and experimental methods to analyze and quantify the time and energy costs of various recovery schemes, and develop and prototype performance optimization and power management strategies to improve scalability for sparse linear solvers. Our results quantitatively reveal that each resilience scheme has its own advantages depending on the fault rate, system size, and power budget, and that forward recovery can further benefit from our performance and power optimizations for large-scale computing. Second, we design a novel resilience technique that relaxes the requirement of synchronization and identicalness for processes, and allows them to run on heterogeneous resources with power reduction. Our results show a significant reduction in energy for unmodified programs in various fault situations compared to exact replication techniques. Third, we propose a novel distributed sparse tensor decomposition that utilizes an asynchronous RDMA-based approach with OpenSHMEM to improve scalability on large-scale systems and prove that our method works well in heterogeneous systems. Our results show our irregularity-aware workload partition and balanced-asynchronous algorithms are scalable and outperform the state-of-the-art distributed implementations. We demonstrate that understanding different bottlenecks for various types of tensors plays a critical role in improving scalability.

Clemson University: TigerPrints

Laboratory Directed Research and Development Program FY 2007 Annual Report

Author: Sjoreen Terrence P
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 01/04/2008

The Oak Ridge National Laboratory (ORNL) Laboratory Directed Research and Development (LDRD) program reports its status to the U.S. Department of Energy (DOE) in March of each year. The program operates under the authority of DOE Order 413.2B, 'Laboratory Directed Research and Development' (April 19, 2006), which establishes DOE's requirements for the program while providing the Laboratory Director broad flexibility for program implementation. LDRD funds are obtained through a charge to all Laboratory programs. This report includes summaries for all ORNL LDRD research activities supported during FY 2007. The associated FY 2007 ORNL LDRD Self-Assessment (ORNL/PPA-2008/2) provides financial data and an internal evaluation of the program's management process. ORNL is a DOE multiprogram science, technology, and energy laboratory with distinctive capabilities in materials science and engineering, neutron science and technology, energy production and end-use technologies, biological and environmental science, and scientific computing. With these capabilities ORNL conducts basic and applied research and development (R&D) to support DOE's overarching mission to advance the national, economic, and energy security of the United States and promote scientific and technological innovation in support of that mission. As a national resource, the Laboratory also applies its capabilities and skills to specific needs of other federal agencies and customers through the DOE Work for Others (WFO) program. Information about the Laboratory and its programs is available on the Internet at http://www.ornl.gov/. 
LDRD is a relatively small but vital DOE program that allows ORNL, as well as other DOE laboratories, to select a limited number of R&D projects for the purpose of: (1) maintaining the scientific and technical vitality of the Laboratory; (2) enhancing the Laboratory's ability to address future DOE missions; (3) fostering creativity and stimulating exploration of forefront science and technology; (4) serving as a proving ground for new research; and (5) supporting high-risk, potentially high-value R&D. Through LDRD the Laboratory is able to improve its distinctive capabilities and enhance its ability to conduct cutting-edge R&D for its DOE and WFO sponsors. To meet the LDRD objectives and fulfill the particular needs of the Laboratory, ORNL has established a program with two components: the Director's R&D Fund and the Seed Money Fund. As outlined in Table 1, these two funds are complementary. The Director's R&D Fund develops new capabilities in support of the Laboratory initiatives, while the Seed Money Fund is open to all innovative ideas that have the potential for enhancing the Laboratory's core scientific and technical competencies. Provision for multiple routes of access to ORNL LDRD funds maximizes the likelihood that novel ideas with scientific and technological merit will be recognized and supported.

UNT Digital Library

Partial aggregation for collective communication in distributed memory machines

Author: Kowalewski Roger
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 03/08/2021

High Performance Computing (HPC) systems interconnect a large number of Processing Elements (PEs) in high-bandwidth networks to simulate complex scientific problems. The increasing scale of HPC systems poses great challenges for algorithm designers. As the average distance between PEs increases, data movement across hierarchical memory subsystems introduces high latency. Minimizing latency is particularly challenging in collective communications, where many PEs may interact in complex communication patterns. Although collective communications can be optimized for network-level parallelism, occasional synchronization delays due to dependencies in the communication pattern degrade application performance. To reduce the performance impact of communication and synchronization costs, parallel algorithms are designed with sophisticated latency hiding techniques. The principle is to interleave computation with asynchronous communication, which increases the overall occupancy of compute cores. However, collective communication primitives abstract parallelism, which limits the integration of latency hiding techniques. Approaches to work around these limitations either modify the algorithmic structure of application codes, or replace collective primitives with verbose low-level communication calls. While these approaches give fine-grained control for latency hiding, implementing collective communication algorithms is challenging and requires expert knowledge of HPC network topologies. A collective communication pattern is commonly described as a Directed Acyclic Graph (DAG) where a set of PEs, represented as vertices, resolve data dependencies through communication along the edges. Our approach improves latency hiding in collective communication through partial aggregation. Based on mathematical rules of binary operations and homomorphism, we expose data parallelism in a respective DAG to overlap computation with communication.
The proposed concepts are implemented and evaluated with a subset of collective primitives in the Message Passing Interface (MPI), an established communication standard in scientific computing. An experimental analysis with communication-bound microbenchmarks shows considerable performance benefits for the evaluated collective primitives. A detailed case study with a large-scale distributed sort algorithm demonstrates how partial aggregation significantly improves performance in data-intensive scenarios. Besides better latency hiding capabilities with collective communication primitives, our approach enables further optimizations of their implementations within MPI libraries. The vast number of asynchronous programming models, which are actively studied in the HPC community, benefit from partial aggregation in collective communication patterns. Future work can utilize partial aggregation to improve the interaction of MPI collectives with accelerator architectures, and to design more efficient communication algorithms.

Digitale Hochschulschriften der LMU

From detection to optimization: impact of soft errors on high-performance computing applications

Author: Calhoun Jon Cameron
Publication date: 01/08/2017

As high-performance computing (HPC) continues to progress, constraints on HPC system design force the handling of errors to higher levels in the software stack. Of the types of errors facing HPC, soft errors that silently corrupt system or application state are among the most severe. Insight into the behavior of HPC applications in the presence of soft errors is critical for effective utilization of HPC systems. Understanding this behavior can be used to develop algorithm-based error detection guided by application characteristics from fault injection and error propagation studies. Furthermore, the realization that applications are tolerant to small errors allows optimizations such as lossy compression on high-cost data transfers. Lossy compression adds small, user-controllable amounts of error when compressing data, reducing data size before expensive data transfers and saving time. This dissertation investigates and improves the resiliency of HPC applications to soft errors, and explores lossy compression as a new form of optimization for expensive, time-consuming data transfers.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Software for Exascale Computing - SPPEXA 2016-2019

Publication venue: 'Springer Science and Business Media LLC'

This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

OAPEN Library

Generalized averaged Gaussian quadrature and applications

Author: Spalević Miodrag
Publication date: 01/01/2019

A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas will be presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not exist, and can be used as an adequate alternative in order to estimate the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.

Machinery - Repository of the Faculty of Mechanical Engineering, University of Belgrade

MS FT-2-2 7 Orthogonal polynomials and quadrature: Theory, computation, and applications

Authors: Pranić Miroslav, Reichel Lothar, Spalević Miodrag
Publication date: 01/01/2019

Quadrature rules find many applications in science and engineering. Their analysis is a classical area of applied mathematics and continues to attract considerable attention. This seminar brings together speakers with expertise in a large variety of quadrature rules. The aim of the seminar is to provide an overview of recent developments in the analysis of quadrature rules. The computation of error estimates and novel applications are also described.

Machinery - Repository of the Faculty of Mechanical Engineering, University of Belgrade
