Campus Grids: A Framework to Facilitate Resource Sharing
It is common for research institutions to maintain multiple clusters. These might fulfill different needs and policies, or represent different owners or generations of hardware. Many of these clusters are underutilized while researchers in other departments may need those resources. This problem can be solved by linking clusters with grid middleware. This thesis describes a distributed high-throughput computing framework that links clusters without changing their security or execution environments. The framework initially keeps jobs local to the submitter, overflowing if necessary to the campus and then to the regional grid. The framework is implemented spanning two campuses at the Holland Computing Center. We evaluate the framework against five characteristics of campus grids. The framework is then further expanded to bridge campus grids into a regional grid and to overflow to national cyberinfrastructure.
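The overflow policy described above can be illustrated with a minimal sketch. This is not the thesis's actual scheduler; the tier names and the `place_job` helper are hypothetical, standing in for the idea of trying resources in order of locality and spilling over only when the nearer tier is full.

```python
# Hypothetical sketch of locality-ordered overflow scheduling; the tiers
# and the helper are illustrative, not the framework's real implementation.

TIERS = ["local", "campus", "regional"]  # preferred order, most local first

def place_job(free_slots):
    """Return the first tier (local -> campus -> regional) with capacity.

    free_slots: dict mapping a tier name to its number of idle slots.
    Returns None if every tier is full.
    """
    for tier in TIERS:
        if free_slots.get(tier, 0) > 0:
            return tier
    return None

# Example: the local cluster is full, so the job overflows to the campus grid.
print(place_job({"local": 0, "campus": 12, "regional": 40}))  # campus
```

In this model a job only leaves the submitter's cluster when no local slot is free, mirroring the "keep jobs local first" behavior the abstract describes.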
Discovering Job Preemptions in the Open Science Grid
The Open Science Grid (OSG) is a worldwide computing system that facilitates
distributed computing for scientific research. It can distribute a
computationally intensive job to geo-distributed clusters and process the job's
tasks in parallel. For compute clusters on the OSG, physical resources may be
shared between OSG jobs and a cluster's local user-submitted jobs, with local
jobs preempting OSG-based ones. As a result, job preemptions occur frequently
on the OSG, sometimes significantly delaying job completion time.
We have collected job data from the OSG over a period of more than 80 days. We
present an analysis of the data, characterizing the preemption patterns and
different types of jobs. Based on these observations, we group OSG jobs into
five categories and analyze the runtime statistics for each category. We
further choose different statistical distributions to estimate the probability
density function of job runtime for the different classes.
Comment: 8 page
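The distribution-fitting step the abstract mentions can be sketched for the simplest candidate family. This is illustrative only: the paper evaluates several distributions per job category, while here we fit just an exponential by maximum likelihood (rate = 1 / sample mean) to a made-up list of runtimes.

```python
# Illustrative sketch, not the paper's analysis: fit an exponential
# distribution to job runtimes (seconds) by maximum likelihood.
import math

def fit_exponential(runtimes):
    """Return the MLE rate parameter: n / sum of observations."""
    return len(runtimes) / sum(runtimes)

def exp_pdf(x, rate):
    """Probability density of Exponential(rate) at x >= 0."""
    return rate * math.exp(-rate * x)

runtimes = [120.0, 300.0, 60.0, 480.0, 240.0]  # hypothetical sample
rate = fit_exponential(runtimes)               # 1 / mean = 1 / 240
print(round(rate, 6))                          # 0.004167
```

Fitting one such density per job category, then comparing goodness of fit across candidate families, is the general shape of the estimation the abstract describes.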
Data Access for LIGO on the OSG
During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory
(LIGO) conducted a three-month observing campaign. These observations delivered
the first direct detection of gravitational waves from binary black hole
mergers. To search for these signals, the LIGO Scientific Collaboration uses
the PyCBC search pipeline. To deliver science results in a timely manner, LIGO
collaborated with the Open Science Grid (OSG) to distribute the required
computation across a series of dedicated, opportunistic, and allocated
resources. To deliver the petabytes necessary for such a large-scale
computation, our team deployed a distributed data access infrastructure based
on the XRootD server suite and the CernVM File System (CVMFS). This data access
strategy grew from simply accessing remote storage to a POSIX-based interface
underpinned by distributed, secure caches across the OSG.
Comment: 6 pages, 3 figures, submitted to PEARC1
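The caching layer at the heart of this data access strategy can be modeled very simply. The class below is a conceptual stand-in, not the actual XRootD or CVMFS client logic: a read is served from the nearest cache when possible and falls back to the remote origin on a miss, populating the cache on the way back.

```python
# Simplified model of read-through caching (hypothetical, not XRootD code):
# a hit avoids a wide-area transfer; a miss fetches from the origin and
# stores the data locally so later reads are served from the cache.

class Cache:
    def __init__(self, origin):
        self.store = {}
        self.origin = origin  # dict standing in for remote storage

    def read(self, path):
        if path in self.store:           # cache hit: no WAN transfer
            return self.store[path], "hit"
        data = self.origin[path]         # cache miss: fetch from origin
        self.store[path] = data          # populate cache for reuse
        return data, "miss"

origin = {"/ligo/frame.gwf": b"strain-data"}
cache = Cache(origin)
print(cache.read("/ligo/frame.gwf")[1])  # miss
print(cache.read("/ligo/frame.gwf")[1])  # hit
```

The POSIX-based interface the abstract mentions hides exactly this hit-or-fetch decision behind an ordinary file read.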
SciTokens: Capability-Based Secure Access to Remote Scientific Data
The management of security credentials (e.g., passwords, secret keys) for
computational science workflows is a burden for scientists and information
security officers. Problems with credentials (e.g., expiration, privilege
mismatch) cause workflows to fail to fetch needed input data or store valuable
scientific results, distracting scientists from their research by requiring
them to diagnose the problems, re-run their computations, and wait longer for
their results. In this paper, we introduce SciTokens, open source software to
help scientists manage their security credentials more reliably and securely.
We describe the SciTokens system architecture, design, and implementation
addressing use cases from the Laser Interferometer Gravitational-Wave
Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey
Telescope (LSST) projects. We also present our integration with widely-used
software that supports distributed scientific computing, including HTCondor,
CVMFS, and XRootD. SciTokens uses IETF-standard OAuth tokens for
capability-based secure access to remote scientific data. The access tokens
convey the specific authorizations needed by the workflows, rather than
general-purpose authentication impersonation credentials, to address the risks
of scientific workflows running on distributed infrastructure including NSF
resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds
(e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the
interoperability and security of scientific workflows, SciTokens 1) enables use
of distributed computing for scientific domains that require greater data
protection and 2) enables use of more widely distributed computing resources by
reducing the risk of credential abuse on remote systems.
Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
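The capability idea behind SciTokens can be sketched with the payload of a JWT-style token whose `scope` claim names the specific operations a workflow may perform. This is an unsigned, illustrative example with made-up claim values; real SciTokens are cryptographically signed, verified against the issuer, and follow the SciTokens claim profile.

```python
# Illustrative sketch only: build and decode the base64url payload of a
# JWT-like token. The issuer and scopes here are hypothetical; real
# SciTokens are signed and verified, which this sketch omits.
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def decode_payload(segment: str) -> dict:
    """Restore padding and decode a base64url JWT payload segment."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

payload = {"iss": "https://example.org",
           "scope": "read:/ligo write:/ligo/output"}
segment = b64url(json.dumps(payload).encode())
claims = decode_payload(segment)
# The token authorizes reads under /ligo and writes under /ligo/output only,
# rather than impersonating the user's full identity.
print(claims["scope"].split())  # ['read:/ligo', 'write:/ligo/output']
```

Carrying only these narrow authorizations is what distinguishes capability-based tokens from the general-purpose impersonation credentials the abstract contrasts them with.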
IceCube experience using XRootD-based Origins with GPU workflows in PNRP
The IceCube Neutrino Observatory is a cubic kilometer neutrino telescope
located at the geographic South Pole. Understanding detector systematic effects
is a continuous process. This requires the Monte Carlo simulation to be updated
periodically to quantify potential changes and improvements in science results
with more detailed modeling of the systematic effects. IceCube's largest
systematic effect comes from the optical properties of the ice the detector is
embedded in. Over the last few years there have been considerable improvements
in the understanding of the ice, which require a significant processing
campaign to update the simulation. IceCube normally stores the results in a
central storage system at the University of Wisconsin-Madison, but it ran out
of disk space in 2022. The Prototype National Research Platform (PNRP) project
thus offered to provide both GPU compute and storage capacity to IceCube in
support of this activity. The storage access was provided via XRootD-based OSDF
Origins, a first for IceCube computing. We report on the overall experience
using PNRP resources, with both successes and pain points.
Comment: 7 pages, 3 figures, 1 table, To be published in Proceedings of CHEP2
BoscoR: Extending R from the desktop to the Grid
In this paper, we describe a framework to execute R functions on remote resources from the desktop using Bosco. The R language is attractive to researchers because of its high-level programming constructs, which lower the barrier to entry. As the use of the R programming language in HPC and High Throughput Computing (HTC) has grown, so too has the need for parallel libraries to utilize computing resources.
Bosco is middleware that uses common protocols to manage job submissions to a variety of remote computational platforms and resources. The researcher is able to control and monitor remote submission from their interactive R IDE, such as RStudio. Bosco is capable of managing many concurrent tasks submitted to remote resources while providing feedback to the interactive R environment. We will also show how this framework can be used to access national infrastructure such as the Open Science Grid.
Through interviews with R users and their feedback after using BoscoR, we learned how R users work and designed BoscoR to fit their needs. We incorporated their feedback to improve BoscoR by adding much-needed features, such as remote package management. A key design goal was a flat learning curve for any R user.
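The submit-and-collect pattern BoscoR exposes can be sketched conceptually. This example is in Python rather than R and does not use the BoscoR API: a user-defined function and a list of arguments are handed to an executor that runs the calls elsewhere and returns the results to the interactive session, with a local thread pool standing in for the remote cluster.

```python
# Conceptual sketch only (Python, not R; not the BoscoR API): submit many
# independent calls of one function and gather the results, the way an
# apply-style call over remote resources behaves.
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """Stand-in for a user-supplied function submitted to remote resources."""
    return seed * seed

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each task is independent, so they can run concurrently on any resource.
    results = list(pool.map(simulate, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```

In BoscoR the executor role is played by Bosco submitting jobs to remote clusters, while the researcher stays inside their R IDE and only sees the gathered results.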
Creating a content delivery network for general science on the internet backbone using XCaches
A general problem faced by opportunistic users computing on the grid is
that delivering cycles is simpler than delivering data to those cycles. In this
project we show how we integrated XRootD caches placed on the internet backbone
to implement a content delivery network for general science workflows. We
show that for some workflows in science domains such as high energy
physics and gravitational waves, the combination of data reuse across
workflows and the use of caches increases CPU efficiency while
decreasing network bandwidth use.
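The bandwidth effect of data reuse can be made concrete with a back-of-the-envelope model. The numbers below are hypothetical, not the paper's measurements: if each file is read many times, a backbone cache serves every read after the first, so only one origin transfer per file crosses the wide-area network.

```python
# Back-of-the-envelope model with made-up numbers, not measured results:
# compare wide-area traffic with and without a cache in the path.

def wan_bytes(file_size, n_files, reads_per_file, cached=True):
    """Bytes crossing the backbone for a workflow's reads."""
    if cached:
        return file_size * n_files                   # one origin fetch per file
    return file_size * n_files * reads_per_file      # every read hits the origin

size, files, reads = 2 * 1024**3, 100, 10            # 2 GiB files, read 10x each
saved = (wan_bytes(size, files, reads, cached=False)
         - wan_bytes(size, files, reads))
print(saved // 1024**3)  # 1800 (GiB of backbone traffic avoided)
```

The same reuse also keeps CPUs from stalling on remote reads, which is the CPU-efficiency gain the abstract reports.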