Exploring compression techniques for ROOT IO
ROOT provides a flexible format used throughout the HEP community. The
number of use cases - from an archival data format to end-stage analysis - has
required a number of tradeoffs to be exposed to the user. For example, a high
"compression level" in the traditional DEFLATE algorithm will result in a
smaller file (saving disk space) at the cost of slower decompression (costing
CPU time when read). At the scale of the LHC experiments, poor design choices
can result in terabytes of wasted space or wasted CPU time. We explore and
attempt to quantify some of these tradeoffs. Specifically, we explore: the use
of alternative compression algorithms to optimize for read performance; an
alternative method of compressing individual events to allow efficient random
access; and a new approach to whole-file compression. Quantitative results are
given, as well as guidance on how to make compression decisions for different
use cases.
Comment: Proceedings for the 22nd International Conference on Computing in High
Energy and Nuclear Physics (CHEP 2016)
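The compression-level tradeoff and the idea of compressing events individually for random access can be illustrated with Python's standard `zlib` module, which implements DEFLATE. This is a minimal sketch, not ROOT's I/O API: the toy events and the function names `compress_whole`, `compress_per_event` and `read_event` are hypothetical.

```python
import zlib

def compress_whole(events, level):
    """Compress all events as a single DEFLATE stream: best ratio, but no random access."""
    return zlib.compress(b"".join(events), level)

def compress_per_event(events, level):
    """Compress each event independently so any one event can be read back on its own."""
    return [zlib.compress(e, level) for e in events]

def read_event(blobs, i):
    """Random access: only the i-th compressed blob has to be inflated."""
    return zlib.decompress(blobs[i])

# Hypothetical toy "events": repetitive byte records standing in for serialized data.
events = [(b"event %d: " % i) + (b"pt=12.5 eta=0.31 phi=1.7 " * 20) for i in range(200)]

for level in (1, 6, 9):
    whole = compress_whole(events, level)
    per_event = compress_per_event(events, level)
    print(f"level={level} whole={len(whole)} per_event={sum(len(b) for b in per_event)}")
```

Comparing the two totals at each level gives a feel for the space paid for random access: each independent stream restarts its dictionary, so it cannot exploit redundancy shared across events.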
Continuous Performance Benchmarking Framework for ROOT
Foundational software libraries such as ROOT are under intense pressure to
avoid software regression, including performance regressions. Continuous
performance benchmarking, as a part of continuous integration and other code
quality testing, is an industry best-practice to understand how the performance
of a software product evolves over time. We present a framework, built from
industry best practices and tools, to help understand ROOT code performance
and monitor the efficiency of the code on several processor architectures.
It also allows historical performance measurements of the ROOT I/O,
vectorization and parallelization sub-systems.
Comment: 8 pages, 5 figures, CHEP 2018 - 23rd International Conference on
Computing in High Energy and Nuclear Physics
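The core check in continuous performance benchmarking is comparing a new measurement against a stored baseline. The abstract does not name the tools used, so the sketch below is a hypothetical stand-alone illustration using Python's `timeit`; the workload, the 10% tolerance and the helper names are assumptions.

```python
import statistics
import timeit

def measure(fn, repeat=5, number=100):
    """Return the median per-call time in seconds over several timeit repeats."""
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return statistics.median(totals) / number

def is_regression(current, baseline, tolerance=0.10):
    """Flag a regression when the current timing exceeds the baseline by more than `tolerance`."""
    return current > baseline * (1.0 + tolerance)

# Hypothetical workload standing in for a ROOT I/O or vectorization benchmark.
def workload():
    return sum(i * i for i in range(1000))

baseline = measure(workload)  # in practice, loaded from results recorded for an earlier commit
current = measure(workload)
print(f"baseline={baseline:.2e}s current={current:.2e}s "
      f"regression={is_regression(current, baseline)}")
```

Using the median rather than the mean makes the comparison less sensitive to one-off slowdowns caused by other processes on a shared CI machine.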
Discovering Job Preemptions in the Open Science Grid
The Open Science Grid (OSG) is a worldwide computing system that facilitates
distributed computing for scientific research. It can distribute a
computationally intensive job to geo-distributed clusters and process the job's
tasks in parallel. For compute clusters on the OSG, physical resources may be
shared between OSG jobs and the cluster's locally submitted user jobs, with local jobs
preempting OSG-based ones. As a result, job preemptions occur frequently in
OSG, sometimes significantly delaying job completion time.
We have collected job data from OSG over a period of more than 80 days. We
present an analysis of the data, characterizing the preemption patterns and
different types of jobs. Based on these observations, we grouped OSG jobs into 5
categories and analyzed the runtime statistics for each category. We further
chose different statistical distributions to estimate the probability density
function of job runtime for each category.
Comment: 8 pages
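Choosing a distribution for each job category amounts to fitting several candidates and comparing them. As a hypothetical illustration (the abstract does not say which distributions or criterion were used), the sketch below fits exponential and log-normal models by maximum likelihood and picks the one with the lower AIC; the category names and synthetic runtimes are invented, and runtimes are assumed positive and not all identical.

```python
import math
import random

def fit_exponential(xs):
    """MLE for an exponential distribution: rate = 1 / sample mean."""
    rate = len(xs) / sum(xs)
    loglik = sum(math.log(rate) - rate * x for x in xs)
    return rate, loglik

def fit_lognormal(xs):
    """MLE for a log-normal: mu and sigma are the mean and (biased) std of log(x)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    loglik = sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi)) - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )
    return (mu, sigma), loglik

def best_fit(xs):
    """Pick the candidate with the lower AIC = 2k - 2 log L (k = number of parameters)."""
    _, ll_exp = fit_exponential(xs)
    _, ll_logn = fit_lognormal(xs)
    aic = {"exponential": 2 * 1 - 2 * ll_exp, "lognormal": 2 * 2 - 2 * ll_logn}
    return min(aic, key=aic.get), aic

# Hypothetical runtimes (seconds) for two job categories, generated only for illustration.
rng = random.Random(0)
runtimes = {
    "short": [rng.expovariate(1 / 300) for _ in range(500)],
    "long": [rng.lognormvariate(9.0, 0.4) for _ in range(500)],
}
for category, xs in runtimes.items():
    name, aic = best_fit(xs)
    print(category, name, {k: round(v, 1) for k, v in aic.items()})
```

AIC is used instead of raw log-likelihood because the log-normal has one more free parameter than the exponential.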
Extending ROOT through Modules
The ROOT software framework is foundational for the HEP ecosystem, providing
capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses
object-oriented concepts and build-time components to layer between them. We
believe additional layering formalisms will benefit ROOT and its users. We
present the modularization strategy for ROOT, which aims to formalize the
description of existing source components, make the dependencies and other
metadata available outside the build system, and allow post-install
additions of functionality in the runtime environment. Components can then be
grouped into packages that are installable from external repositories, so that
missing packages can be delivered as a post-install step. This provides a
mechanism for the wider software ecosystem to interact with a minimalistic
install. Reducing intra-component dependencies improves maintainability and
code hygiene. We believe that maintaining the smallest possible "base install"
will help embedding use cases. The modularization effort draws inspiration from
the Java, Python, and Swift ecosystems. Keeping aligned with modern C++, this
strategy relies on forthcoming features such as C++ modules. We hope that
formalizing the component layer will provide simpler ROOT installs, improve
extensibility, and decrease the complexity of embedding in other ecosystems.
Comment: 8 pages, 2 figures, 1 listing, CHEP 2018 - 23rd International
Conference on Computing in High Energy and Nuclear Physics
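Exposing component dependencies as metadata outside the build system is what makes a post-install step possible: given a requested component, a package tool can compute which missing components to install and in what order. The sketch below is hypothetical; the component names echo familiar ROOT libraries, but the dependency lists are a simplified assumption, and `install_plan` is not a ROOT tool. It uses `graphlib`, available in Python 3.9+.

```python
from graphlib import TopologicalSorter

# Hypothetical component metadata, kept outside the build system:
# each component lists the components it depends on.
COMPONENTS = {
    "Core": [],
    "RIO": ["Core"],
    "Tree": ["RIO", "Core"],
    "Hist": ["Core"],
    "TreePlayer": ["Tree", "Hist"],
}

def install_plan(requested, installed):
    """Return the missing components needed for `requested`, dependencies first."""
    needed, stack = set(), [requested]
    while stack:  # collect the transitive closure of dependencies
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(COMPONENTS[name])
    graph = {name: COMPONENTS[name] for name in needed}
    order = TopologicalSorter(graph).static_order()  # predecessors come first
    return [name for name in order if name not in installed]

# A minimal base install containing only Core; the user asks for TreePlayer.
print(install_plan("TreePlayer", installed={"Core"}))
```

Because the metadata lives outside the build system, the same lookup works against a minimal base install without rebuilding anything.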
Designing Computing System Architecture and Models for the HL-LHC era
This paper describes a programme to study the computing model in CMS after
the next long shutdown near the end of the decade.
Comment: Submitted to proceedings of the 21st International Conference on
Computing in High Energy and Nuclear Physics (CHEP2015), Okinawa, Japan
Data Access for LIGO on the OSG
During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory
(LIGO) conducted a three-month observing campaign. These observations delivered
the first direct detection of gravitational waves from binary black hole
mergers. To search for these signals, the LIGO Scientific Collaboration uses
the PyCBC search pipeline. To deliver science results in a timely manner, LIGO
collaborated with the Open Science Grid (OSG) to distribute the required
computation across a series of dedicated, opportunistic, and allocated
resources. To deliver the petabytes necessary for such a large-scale
computation, our team deployed a distributed data access infrastructure based
on the XRootD server suite and the CernVM File System (CVMFS). This data access
strategy grew from simply accessing remote storage to a POSIX-based interface
underpinned by distributed, secure caches across the OSG.
Comment: 6 pages, 3 figures, submitted to PEARC1
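The abstract does not describe the cache internals, so the following is only a generic sketch of the read-through caching idea: a read is served from a local cache when possible and otherwise fetched from the origin and stored. The class name, the `fetch` callable (a stand-in for a remote read, for example through XRootD) and the byte-capacity LRU eviction policy are all assumptions.

```python
from collections import OrderedDict
from typing import Callable

class ReadThroughCache:
    """LRU read-through cache: serve from cache, otherwise fetch from origin and store."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int):
        self.fetch = fetch
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self.entries[path]
        self.misses += 1
        data = self.fetch(path)  # hypothetical remote read from the origin
        self.entries[path] = data
        self.used += len(data)
        # Evict least recently used entries, but always keep the newest one,
        # even if it alone exceeds the capacity.
        while self.used > self.capacity and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return data

# Hypothetical origin: a dict standing in for remote storage.
ORIGIN = {"/ligo/frames/a.gwf": b"A" * 600, "/ligo/frames/b.gwf": b"B" * 600}
cache = ReadThroughCache(ORIGIN.__getitem__, capacity_bytes=1000)
cache.read("/ligo/frames/a.gwf")
cache.read("/ligo/frames/a.gwf")
print(cache.hits, cache.misses)  # 1 1
```

Placing such caches near the compute sites means repeated reads of the same input files do not all travel back to the origin storage.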
Long Term Dynamics for Two Three-Species Food Webs
In this paper, we analyze two possible scenarios for food webs with two prey and one predator (a food web is similar to a food chain, except that in a web there is more than one species at some levels). In neither scenario do the prey compete; rather, the scenarios differ in the selection method used by the predator. We determine how the dynamics depend on various parameter values. For some parameter values, one or more species die out. For other parameter values, all species coexist at equilibrium. For still other parameter values, the populations behave cyclically. We have even discovered parameter values for which the system exhibits chaos and has a positive Lyapunov exponent. Our analysis relies on common techniques such as nullcline analysis, equilibrium analysis, and singular perturbation analysis.
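The abstract does not give the model equations, so the following is only a hypothetical illustration of how such a system can be explored numerically. It integrates a generic two-prey, one-predator model in which the prey grow logistically with no competition term between them and the predator uses a linear (Holling type I) response. The equations, the parameter names and values, and the helpers `food_web` and `simulate` are assumptions and do not reproduce either of the paper's predator-selection scenarios.

```python
def rk4_step(f, t, state, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, state)
    k2 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
    k3 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
    k4 = f(t + h, [s + h * k for s, k in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

# Hypothetical parameters, chosen for illustration only.
PARAMS = {"r": 1.0, "K": 1.0, "a1": 1.0, "a2": 0.5, "b1": 0.5, "b2": 0.5, "death": 0.3}

def food_web(t, state, p=PARAMS):
    """Assumed model: prey x and y grow logistically with no x-y competition term;
    predator z eats both with a linear response and dies at a constant rate."""
    x, y, z = state
    dx = x * (1.0 - x) - p["a1"] * x * z
    dy = p["r"] * y * (1.0 - y / p["K"]) - p["a2"] * y * z
    dz = z * (-p["death"] + p["b1"] * p["a1"] * x + p["b2"] * p["a2"] * y)
    return [dx, dy, dz]

def simulate(state, t_end=200.0, h=0.01):
    """Integrate from t = 0 to t_end with a fixed step and return the trajectory."""
    steps = int(round(t_end / h))
    traj = [list(state)]
    for n in range(steps):
        state = rk4_step(food_web, n * h, state, h)
        traj.append(state)
    return traj

traj = simulate([0.5, 0.5, 0.1])
print("final populations (x, y, z):", [round(v, 4) for v in traj[-1]])
```

Sweeping the parameters and inspecting the long-run trajectory (extinction, steady equilibrium, or oscillation) is the numerical counterpart of the nullcline and equilibrium analysis mentioned above; for example, with these equations the predator-free point x = 1, y = K, z = 0 has zero derivative.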
SciTokens: Capability-Based Secure Access to Remote Scientific Data
The management of security credentials (e.g., passwords, secret keys) for
computational science workflows is a burden for scientists and information
security officers. Problems with credentials (e.g., expiration, privilege
mismatch) cause workflows to fail to fetch needed input data or store valuable
scientific results, distracting scientists from their research by requiring
them to diagnose the problems, re-run their computations, and wait longer for
their results. In this paper, we introduce SciTokens, open source software to
help scientists manage their security credentials more reliably and securely.
We describe the SciTokens system architecture, design, and implementation
addressing use cases from the Laser Interferometer Gravitational-Wave
Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey
Telescope (LSST) projects. We also present our integration with widely-used
software that supports distributed scientific computing, including HTCondor,
CVMFS, and XRootD. SciTokens uses IETF-standard OAuth tokens for
capability-based secure access to remote scientific data. The access tokens
convey the specific authorizations needed by the workflows, rather than
general-purpose authentication impersonation credentials, to address the risks
of scientific workflows running on distributed infrastructure including NSF
resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds
(e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the
interoperability and security of scientific workflows, SciTokens 1) enables use
of distributed computing for scientific domains that require greater data
protection and 2) enables use of more widely distributed computing resources by
reducing the risk of credential abuse on remote systems.
Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22-26, 2018, Pittsburgh, PA, US
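The capability idea is that a token carries narrowly scoped authorizations rather than a general identity. To illustrate, here is a minimal check modeled on space-delimited `action:/path` scope strings. The scope syntax as written here, the treatment of a bare action as covering `/`, and the function names are assumptions for illustration; this does not use the SciTokens library's API and performs no signature, expiry, or issuer verification.

```python
import posixpath

def parse_scopes(scope_claim):
    """Split a space-delimited scope string such as 'read:/ligo write:/ligo/results'
    into (action, path) pairs; a bare action is assumed to cover '/'."""
    grants = []
    for item in scope_claim.split():
        action, _, path = item.partition(":")
        grants.append((action, path or "/"))
    return grants

def is_authorized(scope_claim, action, path):
    """Allow `action` on `path` only if a grant for that action covers the path."""
    path = posixpath.normpath(path)  # collapse '..' so '/ligo/../etc' cannot escape
    for granted_action, prefix in parse_scopes(scope_claim):
        if granted_action != action:
            continue
        base = prefix.rstrip("/")
        # Match on a path-component boundary: '/ligo' covers '/ligo/frames'
        # but not '/ligoevil'.
        if path == base or path.startswith(base + "/"):
            return True
    return False

claim = "read:/ligo/frames write:/ligo/results"
print(is_authorized(claim, "read", "/ligo/frames/H1.gwf"))   # True
print(is_authorized(claim, "write", "/ligo/frames/H1.gwf"))  # False
```

Because the token names only the actions and paths a workflow needs, a stolen token exposes far less than a general-purpose credential would.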