16 research outputs found

    Software Roadmap to Plug and Play Petaflop/s


    Tools for analyzing parallel I/O

    Parallel application I/O performance often does not meet user expectations. Additionally, slight access pattern modifications may lead to significant changes in performance due to complex interactions between hardware and software. These issues call for sophisticated tools to capture, analyze, understand, and tune application I/O. In this paper, we highlight advances in monitoring tools to help address these issues. We also describe best practices, identify issues in measurement and analysis, and provide practical approaches to translate parallel I/O analysis into actionable outcomes for users, facility operators, and researchers.
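
    As a small illustration of how sensitive I/O performance can be to access patterns, the Python micro-benchmark below times contiguous versus strided writes. This is a minimal sketch, not taken from the paper; the file name, block size, and stride are arbitrary assumptions.

        import os
        import time

        # Illustrative micro-benchmark (not from the paper): compare contiguous
        # and strided writes to show how a small access-pattern change can shift
        # I/O cost. File name, block size, and stride are assumptions.
        PATH = "io_pattern_test.dat"
        BLOCK = 1 << 20          # 1 MiB per write
        COUNT = 64               # 64 writes in total
        STRIDE = 4 * BLOCK       # gap between strided writes

        def timed_writes(strided):
            buf = b"\0" * BLOCK
            start = time.perf_counter()
            with open(PATH, "wb") as f:
                for i in range(COUNT):
                    if strided:
                        f.seek(i * STRIDE)   # leave holes between blocks
                    f.write(buf)
                f.flush()
                os.fsync(f.fileno())         # include the cost of reaching the device
            return time.perf_counter() - start

        print(f"contiguous: {timed_writes(False):.3f} s")
        print(f"strided:    {timed_writes(True):.3f} s")
        os.remove(PATH)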

    Capturing the impact of external interference on HPC application performance

    HPC applications are large software packages with high computation and storage requirements. To meet these requirements, the architectures of supercomputers are continuously evolving and their capabilities are continuously increasing. Present-day supercomputers have achieved petaflops of computational power by utilizing thousands to millions of compute cores, connected through specialized communication networks, and are equipped with petabytes of storage using a centralized I/O subsystem. While fulfilling the high resource demands of HPC applications, such a design also entails its own challenges. Applications running on these systems own the computation resources exclusively, but share the communication interconnect and the I/O subsystem with other concurrently running applications. Simultaneous access to these shared resources causes contention and inter-application interference, leading to degraded application performance.

    Inter-application interference is one of the sources of run-to-run variation. While other sources of variation, such as operating system jitter, have been investigated before, this doctoral thesis specifically focuses on inter-application interference and studies it from the perspective of an application. Variation in execution time not only causes uncertainty and affects user expectations (especially during performance analysis), but also causes suboptimal usage of HPC resources. Therefore, this thesis aims to evaluate inter-application interference, establish trends among applications under contention, and approximate the impact of external influences on the runtime of an application.

    To this end, this thesis first presents a method to correlate the performance of applications running side-by-side. The method divides the runtime of a system into globally synchronized, fine-grained time slices for which application performance data is recorded separately. The evaluation of the method demonstrates that correlating application performance data can identify inter-application interference. The thesis further uses the method to study I/O interference and shows that file access patterns are a significant factor in determining the interference potential of an application. This thesis also presents a technique to estimate the impact of external influences on an application run. The technique introduces the concept of intrinsic performance characteristics to cluster similar application execution segments. Anomalies in the cluster are the result of external interference. An evaluation with several benchmarks shows high accuracy in estimating the impact of interference from a single application run. The contributions of this thesis will help establish interference trends and devise interference mitigation techniques. Similarly, estimating the impact of external interference will restore user expectations and help performance analysts separate application performance from external influence.
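
    A minimal sketch of the time-slice idea described above, assuming a sample format of (application, timestamp, bytes transferred) and a fixed one-second slice width (both assumptions, not the thesis implementation): performance samples are bucketed into globally synchronized slices, and strongly negatively correlated per-slice throughput between two applications is flagged as a possible interference pair.

        from collections import defaultdict
        from statistics import correlation  # Python 3.10+

        SLICE = 1.0  # seconds per globally synchronized time slice (assumed)

        def slice_series(samples, t0, t1):
            """samples: iterable of (app_name, timestamp, bytes_transferred)."""
            n_slices = int((t1 - t0) / SLICE) + 1
            series = defaultdict(lambda: [0.0] * n_slices)
            for app, ts, nbytes in samples:
                series[app][int((ts - t0) / SLICE)] += nbytes
            return series

        def interference_candidates(series, threshold=-0.5):
            # A strong negative correlation between two applications' per-slice
            # throughput (one drops while the other rises) hints at contention.
            apps = sorted(series)
            flagged = []
            for i, a in enumerate(apps):
                for b in apps[i + 1:]:
                    r = correlation(series[a], series[b])
                    if r < threshold:
                        flagged.append((a, b, r))
            return flagged

        # Toy example: two applications sampled over roughly 3.5 seconds.
        samples = [("A", 0.2, 100), ("A", 1.3, 20), ("A", 2.1, 110), ("A", 3.4, 30),
                   ("B", 0.5, 10), ("B", 1.1, 90), ("B", 2.6, 15), ("B", 3.2, 95)]
        print(interference_candidates(slice_series(samples, 0.0, 3.5)))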

    Holistic energy and failure aware workload scheduling in Cloud datacenters

    The global uptake of Cloud computing has attracted increased interest within both academia and industry, resulting in the formation of large-scale and complex distributed systems. This has led to increased failure occurrence within computing systems, which has a substantial negative impact on system performance and on the task reliability perceived by users. Such systems also consume vast quantities of power, resulting in significant operational costs for providers. Virtualization, a technology commonly deployed within Cloud datacenters, can enable flexible scheduling of virtual machines to maximize system reliability and energy efficiency. However, existing work addresses these two objectives separately, providing limited understanding of the explicit trade-offs involved in building dependable and energy-efficient compute infrastructure. In this paper, we propose two failure-aware, energy-efficient scheduling algorithms that exploit the holistic operational characteristics of the Cloud datacenter, comprising the cooling unit, the computing infrastructure, and server failures. By comprehensively modeling the power and failure profiles of a Cloud datacenter, we propose the workload scheduling algorithms Ella-W and Ella-B, capable of reducing cooling and compute energy while minimizing the impact of system failures. A novel overall metric that combines energy efficiency and reliability is proposed to characterize the performance of the various algorithms. We evaluate our algorithms against Random, MaxUtil, TASA, MTTE, and OBFIT under various system conditions of failure prediction accuracy and workload intensity. Evaluation results demonstrate that Ella-W reduces energy usage by 29.5% and improves the task completion rate by 3.6%, while Ella-B reduces energy usage by 32.7% with no degradation in task completion rate.
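
    To make the energy/reliability trade-off concrete, the sketch below shows a simplified host-selection rule that weighs predicted extra energy against predicted failure risk. The scoring formula, field names, and the weight alpha are illustrative assumptions and do not reproduce Ella-W or Ella-B.

        from dataclasses import dataclass

        @dataclass
        class Host:
            name: str
            idle_power_w: float   # power draw when idle
            busy_power_w: float   # power draw at full utilization
            utilization: float    # current CPU utilization in [0, 1]
            failure_prob: float   # predicted probability of failing during the task

        def placement_score(host, task_util, task_hours, alpha=0.7):
            # Lower is better: weighted sum of extra energy (kWh) and failure risk.
            extra_power_w = (host.busy_power_w - host.idle_power_w) * task_util
            energy_kwh = extra_power_w * task_hours / 1000.0
            return alpha * energy_kwh + (1.0 - alpha) * host.failure_prob

        def pick_host(hosts, task_util, task_hours):
            feasible = [h for h in hosts if h.utilization + task_util <= 1.0]
            return min(feasible, key=lambda h: placement_score(h, task_util, task_hours))

        hosts = [
            Host("h1", 120, 300, 0.40, 0.02),
            Host("h2", 100, 260, 0.10, 0.15),
            Host("h3", 110, 280, 0.70, 0.01),
        ]
        print(pick_host(hosts, task_util=0.25, task_hours=2.0).name)  # "h3" with these numbers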

    The Fifth Workshop on HPC Best Practices: File Systems and Archives

    The workshop on High Performance Computing (HPC) Best Practices on File Systems and Archives was the fifth in a series sponsored jointly by the Department of Energy (DOE) Office of Science and the DOE National Nuclear Security Administration. The workshop gathered technical and management experts in the operation of HPC file systems and archives from around the world. Attendees identified and discussed best practices in use at their facilities and documented their findings for the DOE and the HPC community in this report.

    Laboratory Directed Research and Development Program FY 2008 Annual Report


    Cyber-Human Systems, Space Technologies, and Threats

    CYBER-HUMAN SYSTEMS, SPACE TECHNOLOGIES, AND THREATS is our eighth textbook in a series covering the world of UASs / CUAS / UUVs / SPACE. Other textbooks in our series are Space Systems Emerging Technologies and Operations; Drone Delivery of CBNRECy – DEW Weapons: Emerging Threats of Mini-Weapons of Mass Destruction and Disruption (WMDD); Disruptive Technologies with applications in Airline, Marine, Defense Industries; Unmanned Vehicle Systems & Operations On Air, Sea, Land; Counter Unmanned Aircraft Systems Technologies and Operations; Unmanned Aircraft Systems in the Cyber Domain: Protecting USA’s Advanced Air Assets, 2nd edition; and Unmanned Aircraft Systems (UAS) in the Cyber Domain: Protecting USA’s Advanced Air Assets, 1st edition. Our previous seven titles have received considerable global recognition in the field (Nichols & Carter, 2022; Nichols et al., 2021; Nichols, R. K., et al., 2020; Nichols, R., et al., 2020; Nichols, R., et al., 2019; Nichols, R. K., 2018; Nichols, R. K., et al., 2022).