Containerization on Petascale HPC Clusters
Containerization technologies provide a mechanism to encapsulate applications and many of their dependencies, facilitating software portability and reproducibility on HPC systems. However, in order to access many of the architectural features that enable HPC system performance, compatibility between certain components of the container and the host is required, resulting in a trade-off between portability and performance. In this work, we discuss our early experiences running three state-of-the-art containerization technologies on the petascale Frontera system. We present how we build the containers to ensure performance and security, and their performance at scale. We ran microbenchmarks at a scale of 4,096 nodes and demonstrate near-native performance and minimal memory overhead of the containerized environments at 70,000 processes on 1,296 nodes with the scientific application MILC, a quantum chromodynamics code.
Funding: UT Austin-Portugal Program, a collaboration between the Portuguese Foundation for Science and Technology and the University of Texas at Austin, award UTA18-001217; Texas Advanced Computing Center (TACC).
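As a loose illustration of the kind of microbenchmark such a native-vs-container comparison relies on (not the paper's actual benchmark suite; the collective operation, message size, and iteration count below are assumptions), a minimal MPI timing loop in C might look like this, built and run once on the bare host and once inside each container image:

/* Minimal MPI_Allreduce timing sketch (illustrative only; not the paper's
 * benchmark). Build natively and inside a container with a compatible MPI
 * to compare containerized vs. bare-metal latency. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    const int count = 1024;          /* assumed message size: 1024 doubles */
    const int iters = 1000;          /* assumed iteration count */
    double *in, *out, t0, t1;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    in  = malloc(count * sizeof(double));
    out = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++) in[i] = (double)i;

    MPI_Barrier(MPI_COMM_WORLD);     /* synchronize before timing */
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Allreduce(in, out, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg MPI_Allreduce time: %.3f us\n", 1e6 * (t1 - t0) / iters);

    free(in);
    free(out);
    MPI_Finalize();
    return 0;
}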
PADLL: Taming Metadata-intensive HPC Jobs Through Dynamic, Application-agnostic QoS Control
Modern I/O applications that run on HPC infrastructures are increasingly becoming read- and metadata-intensive. However, having multiple concurrent applications submitting large amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to overall performance degradation and I/O unfairness. We present PADLL, an application- and file-system-agnostic storage middleware that enables QoS control of data and metadata workflows in HPC storage systems. It adopts ideas from Software-Defined Storage, building data plane stages that mediate and rate limit POSIX requests submitted to the shared file system, and a control plane that holistically coordinates how all I/O workflows are handled. We demonstrate its performance and feasibility under multiple QoS policies using synthetic benchmarks, real-world applications, and traces collected from a production file system. Results show that PADLL can enforce complex storage QoS policies over concurrent metadata-aggressive jobs, ensuring fairness and prioritization. Comment: To appear at the 23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid'23).
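As a simplified sketch of the rate-limiting idea behind PADLL's data plane stages (not PADLL's actual code or API; the token-bucket parameters and the choice of stat() as the throttled call are assumptions for illustration), a stage that mediates POSIX metadata requests could work roughly as follows:

/* Illustrative token-bucket throttle for POSIX metadata calls (a simplified
 * sketch of a rate-limiting data plane stage; not PADLL's implementation). */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static double bucket_tokens = 0.0;
static const double bucket_rate  = 500.0;  /* assumed: 500 metadata ops/s */
static const double bucket_burst = 100.0;  /* assumed burst size */
static struct timespec last_refill;

static double elapsed_since(struct timespec *t) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    double dt = (now.tv_sec - t->tv_sec) + (now.tv_nsec - t->tv_nsec) / 1e9;
    *t = now;
    return dt;
}

/* Block until one token is available, refilling at bucket_rate tokens/s. */
static void acquire_token(void) {
    for (;;) {
        bucket_tokens += bucket_rate * elapsed_since(&last_refill);
        if (bucket_tokens > bucket_burst) bucket_tokens = bucket_burst;
        if (bucket_tokens >= 1.0) { bucket_tokens -= 1.0; return; }
        usleep(1000);  /* wait ~1 ms before rechecking */
    }
}

int main(void) {
    struct stat st;
    clock_gettime(CLOCK_MONOTONIC, &last_refill);
    for (int i = 0; i < 2000; i++) {   /* simulate a metadata-heavy job */
        acquire_token();               /* mediate the request rate */
        stat("/tmp", &st);             /* the throttled metadata operation */
    }
    printf("done: 2000 rate-limited stat() calls\n");
    return 0;
}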
Towards the use of Online Social Networks for Efficient Internet Content Distribution
A large contributor to the growing Internet traffic is user-generated content shared via online social networking websites. Our insight is that these websites can reveal valuable information that can be used in content delivery networks for better caching and pre-fetching performance. In this paper, we combine five different datasets from Twitter and other sources, and make several observations that can lead to helpful heuristics for better content placement. In particular, we study the temporal growth and decay, the geographical spread, and the social spread of topics on the social network. We also describe in detail our data collection methodologies, which can be useful for other researchers working in this space. In the future, we will use these observations to design heuristics for improved CDN performance.
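One common way to turn the temporal growth and decay of topic popularity into a caching signal (an assumption for illustration, not the heuristic the authors propose) is an exponentially decaying per-item popularity score, sketched below in C:

/* Exponentially decayed popularity score for a content item, weighting recent
 * shares more heavily (illustrative assumption, not the paper's heuristic).
 * A CDN cache could prefer items with higher scores. */
#include <math.h>
#include <stdio.h>

/* Decay the running score to time `now`, then add `weight` for a new share. */
static double update_score(double score, double last_ts, double now,
                           double half_life, double weight) {
    double decay = pow(0.5, (now - last_ts) / half_life);
    return score * decay + weight;
}

int main(void) {
    double half_life = 3600.0;   /* assumed: popularity halves every hour */
    double score = 0.0, last = 0.0;
    double share_times[] = {0.0, 600.0, 1200.0, 7200.0};  /* seconds */

    for (int i = 0; i < 4; i++) {
        score = update_score(score, last, share_times[i], half_life, 1.0);
        last = share_times[i];
        printf("t=%6.0fs  score=%.3f\n", share_times[i], score);
    }
    return 0;
}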
Taming metadata-intensive HPC jobs through dynamic, application-agnostic QoS control
Modern I/O applications that run on HPC infrastructures are increasingly becoming read- and metadata-intensive. However, having multiple applications submitting large amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to overall performance degradation and I/O unfairness. We present PADLL, an application- and file-system-agnostic storage middleware that enables QoS control of data and metadata workflows in HPC storage systems. It adopts ideas from Software-Defined Storage, building data plane stages that mediate and rate limit POSIX requests submitted to the shared file system, and a control plane that holistically coordinates how all I/O workflows are handled. We demonstrate its performance and feasibility under multiple QoS policies using synthetic benchmarks, real-world applications, and traces collected from a production file system. Results show that PADLL can enforce complex storage QoS policies over concurrent metadata-aggressive jobs, ensuring fairness and prioritization.
We thank AIST for providing access to the computational resources of ABCI. We thank Claudia Brito and Tânia Esteves for reviewing initial versions of this work. This work is financed by the ERDF - European Regional Development Fund, through the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 Programme under the Portugal 2020 Partnership Agreement, and by National Funds through the FCT - Portuguese Foundation for Science and Technology, I.P., within the scope of the UT Austin Portugal Program project BigHPC, with reference POCI-01-0247-FEDER-045924 (Mariana Miranda); through PhD Fellowships SFRH/BD/146059/2019 and PD/BD/151403/2021; and by the UT Austin-Portugal Program, a collaboration between the Portuguese Foundation for Science and Technology and the University of Texas at Austin, award UTA18-001217. The first two authors contributed equally to this work.
SI2-SSI (2018): Collaborative Research: A Software Infrastructure for MPI Performance Engineering: Integrating MVAPICH and TAU via the MPI Tools Interface
This research aims to create an open-source integrated software infrastructure built on the MPI_T interface, which defines the API for interaction and information interchange to enable fine-grained performance optimizations for HPC applications. The challenges addressed by the project include: 1) enhancing existing support for MPI_T in MVAPICH to expose a richer set of performance and control variables; 2) redesigning TAU to take advantage of the new MPI_T variables exposed by MVAPICH; 3) extending and enhancing TAU and MVAPICH with the ability to generate recommendations and performance engineering reports; 4) proposing fundamental design changes to make MPI libraries like MVAPICH "reconfigurable" at runtime; and 5) adding support to MVAPICH and TAU for interactive performance engineering sessions.
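For context, MPI_T is a standard C interface; a minimal sketch of how a tool such as TAU might enumerate the performance variables an MPI library like MVAPICH exposes through it (standard MPI-3 tools-interface calls; the variables reported depend entirely on the MPI implementation) is:

/* List the MPI_T performance variables exposed by the linked MPI library
 * (standard MPI-3 tools-interface calls; variable names differ per library). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, num_pvar;

    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_pvar_get_num(&num_pvar);
    printf("MPI_T performance variables: %d\n", num_pvar);

    for (int i = 0; i < num_pvar; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, var_class, bind, readonly, continuous, atomic;
        MPI_Datatype datatype;
        MPI_T_enum enumtype;

        MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                            &datatype, &enumtype, desc, &desc_len,
                            &bind, &readonly, &continuous, &atomic);
        printf("  [%d] %s\n", i, name);
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}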