PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

Author: Day Judy
Lenhart Suzanne
Peterson Gregory D.
Ponce Eduardo
Stephenson Brittany
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/07/2018

The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward since they may need multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources at the expense of users handling the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on a keyword-value pair syntax, thus removing from the user the overhead of creating complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework will run as user processes, and can be used in single/multi-node and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiply using OpenMP are presented as parameter and performance studies, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies, while increasing resource utilization.

Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, US

arXiv.org e-Print Archive

Crossref

XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

Publication date: 30/09/2012

The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357

Illinois Digital Environment for Access to Learning and Scholarship Repository

XSEDE: The Extreme Science and Engineering Discovery Environment (OAC 15-48562) Interim Project Report 13: Report Year 5, Reporting Period 2 August 1, 2020 – October 31, 2020

Publication date: 19/11/2020

This is the Interim Project Report 13 (IPR13) for the NSF XSEDE project. It includes Key Performance Indicator data and project highlights for Reporting Year 5, Report Period 2 (August 1-October 31, 2020). NSF OAC 15-48562

Illinois Digital Environment for Access to Learning and Scholarship Repository

XSEDE: The Extreme Science and Engineering Discovery Environment Post-XSEDE 2.0 Preliminary Transition Plan

Publication date: 30/01/2019

The XSEDE team is committed to a seamless transition with no interruption in services at the hand-off from current XSEDE 2.0 operations to potential follow-on award(s) and awardee(s). XSEDE comprises six Work Breakdown Structure (WBS) Level 2 sub-groups (L2s), and each of those is further divided into WBS Level 3 areas (L3s). This report includes specific documents and activities that would be transitioned in each of these L2/L3 areas to follow-on award(s) or awardee(s). National Science Foundation grant number ACI-1548562

Illinois Digital Environment for Access to Learning and Scholarship Repository

Design and implementation of a telemetry platform for high-performance computing environments

Author: Weidner Ole
Publication venue: The University of Edinburgh
Publication date: 30/11/2021

A new generation of high-performance and distributed computing applications and services relies on adaptive and dynamic architectures and execution strategies to run efficiently, resiliently, and at scale in today’s HPC environments. These architectures require insights into their execution behaviour and the state of their execution environment at various levels of detail, in order to make context-aware decisions. HPC telemetry provides this information. It describes the continuous stream of time series and event data that is generated on HPC systems by the hardware, operating systems, services, runtime systems, and applications. Current HPC ecosystems do not provide the conceptual models, infrastructure, and interfaces to collect, store, analyse, and integrate telemetry in a structured and efficient way. Consequently, applications and services largely depend on one-off solutions and custom-built technologies to achieve these goals, introducing significant development overheads that inhibit portability and mobility. To facilitate a broader mix of applications, more efficient application development, and swift adoption of adaptive architectures in production, a comprehensive framework for telemetry management and analysis must be provided as part of future HPC ecosystem designs. This thesis provides the blueprint for such a framework: it proposes a new approach to telemetry management in HPC, the Telemetry Platform concept. Departing from the observation that telemetry data and the corresponding analysis and integration patterns on modern multi-tenant HPC systems have a lot of similarities to the patterns observed in large-scale data analytics or “Big Data” platforms, the telemetry platform concept takes the data platform paradigm and architectural approach and applies them to HPC telemetry.
The result is the blueprint for a system that provides services for storing, searching, analysing, and integrating telemetry data in HPC applications and other HPC system services. It allows users to create and share telemetry data-driven insights using everything from simple time-series analysis to complex statistical and machine learning models while at the same time hiding many of the inherent complexities of data management such as data transport, clean-up, storage, cataloguing, access management, and providing appropriate and scalable analytics and integration capabilities. The main contributions of this research are (1) the application of the data platform concept to HPC telemetry data management and usage; (2) a graph-based, time-variant telemetry data model that captures structures and properties of platform and applications and in which telemetry data can be organized; (3) an architecture blueprint and prototype of a concrete implementation and integration architecture of the telemetry platform; and (4) a proposal for decoupled HPC application architectures, separating telemetry data management and feedback-control-loop logic from the core application code. First experimental results with the prototype implementation suggest that the telemetry platform paradigm can reduce overhead and redundancy in the development of telemetry-based application architectures, and lower the barrier for HPC systems research and the provisioning of new, innovative HPC system services.

Edinburgh Research Archive

2015 XSEDE Federation Risk Assessment Overview

Author: Fleury Terry
Slagell Adam
Publication date: 25/09/2019

The methodology and working documentation for performing the 2012 and 2015 XSEDE Security Risk Assessments. NSF #1053575

Illinois Digital Environment for Access to Learning and Scholarship Repository

Scalable Observation, Analysis, and Tuning for Parallel Portability in HPC

Author: Wood Chad
Publication venue: University of Oregon
Publication date: 10/05/2022

It is desirable for general productivity that high-performance computing applications be portable to new architectures, or can be optimized for new workflows and input types, without the need for costly code interventions or algorithmic re-writes. Parallel portability programming models provide the potential for high performance and productivity; however, they come with a multitude of runtime parameters that can have significant impact on execution performance. Selecting the optimal set of parameters, so that HPC applications perform well in different system environments and on different input data sets, is not trivial. This dissertation maps out a vision for addressing this parallel portability challenge, and then demonstrates this plan through an effective combination of observability, analysis, and in situ machine learning techniques. A platform for general-purpose observation in HPC contexts is investigated, along with support for its use in human-in-the-loop performance understanding and analysis. The dissertation culminates in a demonstration of lessons learned in order to provide automated tuning of HPC applications utilizing parallel portability frameworks.

University of Oregon Scholars' Bank

Managing Computational Gateway Resources with XDMoD

Author: Abani Patra
Benjamin Plessinger
Jeanette Sperhac
Jeffrey T. Palmer
Joseph P. White
Martins Innus
Matthew D. Jones
Nikolay Simakov
Robert L. DeLeon
Ryan Rathsam
Steven M. Gallo
Thomas R. Furlani
Thomas Yearke

The U.S. National Science Foundation (NSF) has invested heavily in research computing, funding XSEDE to integrate supercomputers with science gateways and datasets for researchers in the U.S. and around the world. It is important to understand how these tools contribute to knowledge, plan wisely for future resource investments, and enable end users to make better use of these resources. Enter XDMoD (XD Metrics on Demand), a comprehensive tool that collects and presents detailed data about resource usage, for program managers, developers, support staff, and end users alike. As XSEDE adds non-traditional resources and diversifies its computational resources, the considerable capabilities offered by XDMoD must keep pace. In this short paper, we introduce XDMoD's current capabilities, describe the state of its support for gateway resources, and outline our plans to further enhance these offerings.

FigShare