BOSS-LDG: A Novel Computational Framework that Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery
We present a novel computational framework that connects Blue Waters, the
NSF-supported, leadership-class supercomputer operated by NCSA, to the Laser
Interferometer Gravitational-Wave Observatory (LIGO) Data Grid via Open Science
Grid technology. To enable this computational infrastructure, we configured,
for the first time, a LIGO Data Grid Tier-1 Center that can submit
heterogeneous LIGO workflows using Open Science Grid facilities. In order to
enable a seamless connection between the LIGO Data Grid and Blue Waters via
Open Science Grid, we utilize Shifter to containerize LIGO's workflow software.
This work represents the first time Open Science Grid, Shifter, and Blue Waters
are unified to tackle a scientific problem and, in particular, it is the first
time a framework of this nature is used in the context of large scale
gravitational wave data analysis. This new framework has been used in the last
several weeks of LIGO's second discovery campaign to run the most
computationally demanding gravitational wave search workflows on Blue Waters,
and accelerate discovery in the emergent field of gravitational wave
astrophysics. We discuss the implications of this novel framework for a wider
ecosystem of High Performance Computing users.
Comment: 10 pages, 10 figures. Accepted as a Full Research Paper to the 13th IEEE International Conference on eScience.
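To make the container-based submission path concrete, here is a minimal sketch of how a workflow task might be launched inside a Shifter container through a batch scheduler. It assumes a Slurm-managed system with the Shifter runtime installed; the image name, executable, and configuration file are hypothetical stand-ins, not the actual BOSS-LDG components.

```python
#!/usr/bin/env python3
"""Minimal sketch: launch a containerized workflow task through a batch
scheduler using Shifter. Assumes a Slurm-managed cluster with the Shifter
runtime installed; image, executable, and config names are hypothetical."""
import subprocess
import textwrap

# Hypothetical container image holding the LIGO workflow software stack.
IMAGE = "docker:example/ligo-workflow:latest"

batch_script = textwrap.dedent(f"""\
    #!/bin/bash
    #SBATCH --job-name=gw-search
    #SBATCH --nodes=1
    #SBATCH --time=02:00:00
    # Run the analysis inside the Shifter container so the job sees the
    # same userland it would on a LIGO Data Grid submit host.
    shifter --image={IMAGE} ./run_search --config search.ini
    """)

with open("gw_search.sbatch", "w") as f:
    f.write(batch_script)

# Hand the script to the scheduler; sbatch reports the assigned job id.
subprocess.run(["sbatch", "gw_search.sbatch"], check=True)
```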
Catch Me If You Can: Using Power Analysis to Identify HPC Activity
Monitoring users on large computing platforms such as high performance
computing (HPC) and cloud computing systems is non-trivial. Utilities such as
process viewers provide limited insight into what users are running due to
granularity limitations, and other sources of data, such as system call tracing,
can impose significant operational overhead. However, despite technical and
procedural measures, instances of users abusing valuable HPC resources for
personal gain have been documented in the past \cite{hpcbitmine}, and systems
that are open to large numbers of loosely-verified users from around the world
are at risk of abuse. In this paper, we show how electrical power consumption
data from an HPC platform can be used to identify what programs are executed.
The intuition is that during execution, programs exhibit various patterns of
CPU and memory activity. These patterns are reflected in the power consumption
of the system and can be used to identify programs running. We test our
approach on an HPC rack at Lawrence Berkeley National Laboratory using a
variety of scientific benchmarks. Among other interesting observations, our
results show that by monitoring the power consumption of an HPC rack, it is
possible to identify whether particular programs are running with precision
and recall of up to 95\%, even in noisy scenarios.
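As an illustration of the underlying idea, the sketch below labels an unseen power trace by comparing simple summary features against reference traces of known programs. The feature set and nearest-neighbor matching are illustrative assumptions, not the paper's actual classification pipeline.

```python
"""Minimal sketch of power-based program identification: summarize each
power trace with simple features and label new traces by nearest neighbor.
The features and toy data are illustrative, not the paper's method."""
import numpy as np

def features(trace: np.ndarray) -> np.ndarray:
    """Summary statistics of a 1-D power-consumption time series (watts)."""
    spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
    dominant = np.argmax(spectrum[1:]) + 1  # strongest periodic component
    return np.array([trace.mean(), trace.std(), trace.max(), float(dominant)])

def identify(trace, reference_traces, labels):
    """Return the label of the reference trace closest in feature space."""
    x = features(np.asarray(trace))
    dists = [np.linalg.norm(x - features(np.asarray(r)))
             for r in reference_traces]
    return labels[int(np.argmin(dists))]

# Toy example: two synthetic "programs" with distinct power signatures.
t = np.linspace(0, 10, 1000)
compute_bound = 300 + 5 * np.sin(2 * np.pi * 0.5 * t)    # steady, slow ripple
memory_bound = 250 + 30 * np.sin(2 * np.pi * 4.0 * t)    # bursty, fast ripple
unknown = memory_bound + np.random.normal(0, 2, t.size)  # noisy observation

print(identify(unknown, [compute_bound, memory_bound], ["compute", "memory"]))
```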
RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing
The rigid MPI programming model and batch scheduling dominate
high-performance computing. While clouds brought new levels of elasticity into
the world of computing, supercomputers still suffer from low resource
utilization rates. To enhance supercomputing clusters with the benefits of
serverless computing, a modern cloud programming paradigm for pay-as-you-go
execution of stateless functions, we present rFaaS, the first RDMA-aware
Function-as-a-Service (FaaS) platform. With hot invocations and decentralized
function placement, we overcome the major performance limitations of FaaS
systems and provide low-latency remote invocations in multi-tenant
environments. We evaluate the new serverless system through a series of
microbenchmarks and show that remote functions execute with negligible
performance overheads. We demonstrate how serverless computing can bring
elastic resource management into MPI-based high-performance applications.
Overall, our results show that MPI applications can benefit from modern cloud
programming paradigms to guarantee high performance at lower resource costs.
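The hot-invocation idea can be sketched in a few lines: pay the executor allocation cost once, then reuse the warm executor so later calls skip setup entirely. The Python mock below only illustrates the pattern; rFaaS itself is a C++ system that places functions on remote nodes and invokes them over RDMA, and its real API differs. Class and method names here are invented for illustration.

```python
"""Pure-Python mock of the hot-invocation pattern: allocate an executor on
the first call, then reuse it so subsequent invocations stay on the fast
path. Names are hypothetical; this is not the rFaaS API."""
import time

class Executor:
    """Stands in for a remote, pre-allocated function executor."""
    def __init__(self):
        time.sleep(0.05)  # simulate one-time allocation and connection setup

    def invoke(self, fn, payload):
        return fn(payload)  # hot path: no setup, just run the function

class FaaSClient:
    def __init__(self):
        self._executor = None

    def invoke(self, fn, payload):
        if self._executor is None:            # cold invocation: pay setup cost
            self._executor = Executor()
        return self._executor.invoke(fn, payload)  # later calls stay hot

def square(x):
    return x * x

client = FaaSClient()
for label in ("cold", "hot"):
    start = time.perf_counter()
    client.invoke(square, 4)
    print(f"{label} invocation: {(time.perf_counter() - start) * 1e3:.3f} ms")
```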