Data handling in KLOE
The KLOE experiment will acquire and manage petabytes of data. An efficient and easy-to-use system is essential to cope with this amount of data. In this paper a general overview of the approach chosen at KLOE is presented.
Demonstrating 100 Gbps in and out of the public Clouds
There is growing recognition that public Cloud providers offer capabilities not found elsewhere, with elasticity being a major driver.
The value of elastic scaling is however tightly coupled to the capabilities of
the networks that connect all involved resources, both in the public Clouds and
at the various research institutions. This paper presents results of
measurements involving file transfers inside public Cloud providers, fetching
data from on-prem resources into public Cloud instances, and fetching data from
public Cloud storage into on-prem nodes. The networking of the three major
Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google
Cloud Platform, has been benchmarked. The on-prem nodes were managed by either
the Pacific Research Platform or located at the University of Wisconsin -
Madison. The observed sustained throughput was on the order of 100 Gbps in all the tests moving data in and out of the public Clouds, and reached into the Tbps range for data movements inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
Comment: 4 pages, 6 figures, 3 tables
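As a rough illustration of the measurement style described in this abstract, the sketch below times several parallel HTTP downloads and reports the aggregate throughput. It is a minimal sketch only: the URLs, stream count, and chunk size are placeholder assumptions, not the endpoints or tuning actually used in the paper.

```python
# Minimal sketch: estimate aggregate HTTP download throughput with parallel streams.
# The URLs, stream count, and chunk size are illustrative placeholders, not the
# actual endpoints or tuning used in the paper.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URLS = ["https://example.org/testfile.bin"] * 16  # hypothetical test objects
CHUNK = 4 * 1024 * 1024                           # 4 MiB read size

def fetch(url: str) -> int:
    """Stream one object over HTTP and return the number of bytes received."""
    total = 0
    with urllib.request.urlopen(url) as resp:
        while True:
            data = resp.read(CHUNK)
            if not data:
                return total
            total += len(data)

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(URLS)) as pool:
    nbytes = sum(pool.map(fetch, URLS))
elapsed = time.perf_counter() - start

print(f"{nbytes * 8 / elapsed / 1e9:.2f} Gbps aggregate over {len(URLS)} streams")
```

Multiple concurrent streams are used because a single TCP stream rarely fills a high-bandwidth, high-latency path on its own.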
Defining a canonical unit for accounting purposes
Compute resource providers often put in place batch compute systems to
maximize the utilization of such resources. However, compute nodes in such
clusters, both physical and logical, contain several complementary resources,
with notable examples being CPUs, GPUs, memory and ephemeral storage. User jobs
will typically require more than one such resource, resulting in co-scheduling
trade-offs of partial nodes, especially in multi-user environments. When
accounting for either user billing or scheduling overhead, it is thus important
to consider all such resources together. We thus define the concept of a threshold-based "canonical unit" that combines several resource types into a single discrete unit, and use it to characterize scheduling overhead and to make resource billing fairer for both resource providers and users. Note that the
exact definition of a canonical unit is not prescribed and may change between
resource providers. Nevertheless, we provide a template and two example
definitions that we consider appropriate in the context of the Open Science
Grid.
Comment: 6 pages, 2 figures, to be published in proceedings of PEARC2
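The abstract stresses that the exact definition is not prescribed; the sketch below is therefore only one plausible reading of a threshold-based canonical unit, with made-up thresholds rather than the Open Science Grid values from the paper.

```python
# Minimal sketch of a threshold-based "canonical unit": each resource request is
# divided by a per-resource threshold and the job is charged the largest
# (rounded-up) ratio. The thresholds and the max-over-ratios rule are illustrative
# assumptions, not the actual Open Science Grid definitions from the paper.
import math

THRESHOLDS = {          # hypothetical size of one canonical unit
    "cpus": 1,          # cores
    "gpus": 1,          # devices
    "memory_gb": 4,     # GiB of RAM
    "scratch_gb": 20,   # GiB of ephemeral storage
}

def canonical_units(request: dict) -> int:
    """Return the number of whole canonical units a job request occupies."""
    ratios = [request.get(k, 0) / v for k, v in THRESHOLDS.items()]
    return max(1, math.ceil(max(ratios)))

# Example: a 2-core, 24 GiB job is dominated by its memory footprint.
print(canonical_units({"cpus": 2, "memory_gb": 24}))  # -> 6
```

Charging by the dominant resource is what makes partial-node co-scheduling visible in the accounting: a job that pins most of a node's memory is billed accordingly, even if it asks for few cores.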
Porting and optimizing UniFrac for GPUs
UniFrac is a commonly used metric in microbiome research for comparing
microbiome profiles to one another ("beta diversity"). The recently implemented
Striped UniFrac added the capability to split the problem into many independent
subproblems and exhibits near-linear scaling. In this paper we describe the steps undertaken in porting and optimizing Striped UniFrac to GPUs. We reduced the
run time of computing UniFrac on the published Earth Microbiome Project dataset
from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12 minutes on an NVIDIA Tesla
V100 GPU, and to about one hour on a laptop with NVIDIA GTX 1050 (with minor
loss in precision). Computing UniFrac on a larger dataset containing 113k
samples reduced the run time from over one month on the CPU to less than 2
hours on the V100 and 9 hours on an NVIDIA RTX 2080TI GPU (with minor loss in
precision). This was achieved by using OpenACC for generating the GPU offload
code and by improving the memory access patterns. A BSD-licensed implementation
is available, which produces a C shared library linkable by any programming
language.
Comment: 4 pages, 3 figures, 4 tables
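To make the "many independent subproblems" idea concrete, here is a minimal sketch of a striped decomposition of an all-pairs distance computation. Euclidean distance stands in for UniFrac (the real metric also walks a phylogenetic tree), so this shows only the partitioning pattern, not the algorithm or the OpenACC offload from the paper.

```python
# Minimal sketch of the "striped" decomposition idea: an all-pairs distance
# computation is split into independent diagonal stripes that can be processed in
# parallel (or offloaded to a GPU) with no cross-stripe dependencies. Euclidean
# distance is a stand-in for UniFrac here.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 8, 16
profiles = rng.random((n_samples, n_features))

def stripe(k: int) -> np.ndarray:
    """Distances between sample i and sample (i + k) mod n for all i: one stripe."""
    shifted = np.roll(profiles, -k, axis=0)
    return np.linalg.norm(profiles - shifted, axis=1)

# Each stripe is an independent subproblem; here they run serially, but nothing
# prevents handing each one to a different worker or device. For even n the last
# stripe visits each of its pairs twice.
stripes = [stripe(k) for k in range(1, n_samples // 2 + 1)]
print(f"{len(stripes)} independent stripes, each of length {n_samples}")
```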
Characterizing network paths in and out of the clouds
Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns. An
important aspect of any scientific computing production campaign is data
movement, both incoming and outgoing. While the performance and cost of VMs are relatively well understood, the network performance and cost are not. This
paper provides a characterization of networking in various regions of Amazon
Web Services, Microsoft Azure and Google Cloud Platform, both between Cloud
resources and major DTNs in the Pacific Research Platform, including OSG data
federation caches in the network backbone, and inside the clouds themselves.
The paper contains a qualitative analysis of the results as well as latency and throughput measurements. It also includes an analysis of the costs involved with Cloud-based networking.
Comment: 7 pages, 1 figure, 5 tables, to be published in CHEP19 proceedings
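A minimal sketch of the kind of latency probe such a characterization relies on is shown below; it uses TCP connect time as a round-trip proxy, and the target endpoints are placeholders rather than the Cloud regions or PRP DTNs measured in the paper.

```python
# Minimal sketch of a latency probe: TCP connect time to a remote endpoint serves
# as a round-trip-time proxy (ICMP ping usually needs elevated privileges). The
# hosts listed are placeholders, not the endpoints measured in the paper.
import socket
import statistics
import time

TARGETS = [("example.org", 443), ("example.net", 443)]  # hypothetical endpoints
SAMPLES = 10

for host, port in TARGETS:
    rtts = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            rtts.append((time.perf_counter() - start) * 1000.0)
    print(f"{host}: median {statistics.median(rtts):.1f} ms over {SAMPLES} probes")
```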
Measuring gravitational lensing of the cosmic microwave background using cross correlation with large scale structure
We cross correlate the gravitational lensing map extracted from cosmic
microwave background measurements by the Wilkinson Microwave Anisotropy Probe
(WMAP) with the radio galaxy distribution from the NRAO VLA Sky Survey (NVSS)
by using a quadratic estimator technique. We use the full covariance matrix to
filter the data, and calculate the cross-power spectra for the lensing-galaxy
correlation. We explore the impact of changing the values of cosmological
parameters on the lensing reconstruction, and obtain the corresponding statistical detection significances. The results of all cross correlations pass the
curl null test as well as a complementary diagnostic test using the NVSS data
in equatorial coordinates. We forecast the potential for Planck and NVSS to
constrain the lensing-galaxy cross correlation as well as the galaxy bias. The
lensing-galaxy cross-power spectra are found to be Gaussian distributed.
Comment: 16 pages, 10 figures
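For orientation, a standard full-sky-style form of a lensing-galaxy cross-power estimator is shown below; the paper's actual estimator additionally filters the data with the full covariance matrix, which is not captured here.

```latex
% A standard (full-sky-style) form of the lensing--galaxy cross-power estimator;
% the paper's covariance-matrix filtering is not spelled out here.
\[
  \hat{C}_\ell^{\kappa g}
    = \frac{1}{(2\ell + 1)\, f_{\mathrm{sky}}}
      \sum_{m=-\ell}^{\ell} \hat{\kappa}_{\ell m}\, g^{*}_{\ell m},
\]
where $\hat{\kappa}_{\ell m}$ are the harmonic coefficients of the reconstructed
lensing convergence, $g_{\ell m}$ those of the galaxy overdensity map, and
$f_{\mathrm{sky}}$ the observed sky fraction.
```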
Flexible Session Management in a Distributed Environment
Many secure communication libraries used by distributed systems, such as SSL,
TLS, and Kerberos, fail to make a clear distinction between the authentication,
session, and communication layers. In this paper we introduce CEDAR, the secure
communication library used by the Condor High Throughput Computing software,
and present the advantages to a distributed computing system resulting from
CEDAR's separation of these layers. Regardless of the authentication method
used, CEDAR establishes a secure session key, which has the flexibility to be
used for multiple capabilities. We demonstrate how a layered approach to
security sessions can avoid round-trips and latency inherent in network
authentication. The creation of a distinct session management layer allows for
optimizations to improve scalability by way of delegating sessions to other
components in the system. This session delegation creates a chain of trust that
reduces the overhead of establishing secure connections and enables centralized
enforcement of system-wide security policies. Additionally, secure channels
based upon UDP datagrams are often overlooked by existing libraries; we show
how CEDAR's structure accommodates this as well. As an example of the utility
of this work, we show how the use of delegated security sessions and other
techniques inherent in CEDAR's architecture enables US CMS to meet its scalability requirements in deploying Condor over large-scale, wide-area grid systems.
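The following sketch illustrates the session-delegation idea in the abstract: one expensive authentication handshake yields a session key that a delegate can reuse to authenticate messages. The class and function names are invented for illustration and are not CEDAR's actual API.

```python
# Conceptual sketch of session delegation: a session key established after one
# expensive authentication round-trip is handed to a second component, which can
# then secure traffic without re-authenticating. Names are illustrative only;
# this is not CEDAR's actual API.
import os
import hmac
import hashlib

class SecuritySession:
    """A negotiated session: a shared key plus the policy it was granted under."""
    def __init__(self, key: bytes, peer: str, policy: str):
        self.key, self.peer, self.policy = key, peer, policy

    def sign(self, message: bytes) -> bytes:
        """Authenticate a message with the session key instead of re-authenticating."""
        return hmac.new(self.key, message, hashlib.sha256).digest()

def authenticate(peer: str) -> SecuritySession:
    """Stand-in for the expensive authentication handshake (Kerberos, SSL, ...)."""
    return SecuritySession(key=os.urandom(32), peer=peer, policy="WRITE")

def delegate(session: SecuritySession) -> SecuritySession:
    """Hand the established session to another component (the chain of trust)."""
    return SecuritySession(session.key, session.peer, session.policy)

# One handshake, then both the original holder and the delegate reuse the key.
submit_side = authenticate("schedd.example.org")
worker_side = delegate(submit_side)
assert worker_side.sign(b"job ad") == submit_side.sign(b"job ad")
```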
Running a Pre-Exascale, Geographically Distributed, Multi-Cloud Scientific Simulation
As we approach the Exascale era, it is important to verify that the existing
frameworks and tools will still work at that scale. Moreover, public Cloud
computing has been emerging as a viable solution for both prototyping and
urgent computing. Using the elasticity of the Cloud, we have thus put in place
a pre-exascale HTCondor setup for running a scientific simulation in the Cloud,
with the chosen application being IceCube's photon propagation simulation. That is, this was not purely a demonstration run; it was also used to produce valuable and much-needed scientific results for the IceCube collaboration. In
order to reach the desired scale, we aggregated GPU resources across 8 GPU models in many geographic regions of Amazon Web Services, Microsoft
Azure, and the Google Cloud Platform. Using this setup, we reached a peak of
over 51k GPUs corresponding to almost 380 PFLOP32s, for a total integrated
compute of about 100k GPU hours. In this paper we provide the description of
the setup, the problems that were discovered and overcome, as well as a short
description of the actual science output of the exercise.
Comment: 18 pages, 5 figures, 4 tables, to be published in Proceedings of ISC High Performance 202
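As a back-of-the-envelope consistency check on the quoted peak numbers (an average over a heterogeneous mix of eight GPU models, so only indicative):

```latex
% Average fp32 throughput per GPU implied by the quoted peaks.
\[
  \frac{380\ \mathrm{PFLOP32s}}{51{,}000\ \mathrm{GPUs}}
    \approx 7.5\ \mathrm{TFLOP32s\ per\ GPU\ on\ average}.
\]
```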
Testing GitHub projects on custom resources using unprivileged Kubernetes runners
GitHub is a popular platform for hosting software projects, due both to its ease of use and to its seamless integration with a testing environment. Native
GitHub Actions make it easy for software developers to validate new commits and
have confidence that new code does not introduce major bugs. The freely
available test environments are limited to only a few popular setups but can be
extended with custom Action Runners. Our team had access to a Kubernetes
cluster with GPU accelerators, so we explored the feasibility of automatically
deploying GPU-providing runners there. All available Kubernetes-based setups,
however, require cluster-admin level privileges. To address this problem, we
developed a simple custom setup that operates in a completely unprivileged
manner. In this paper we provide a summary description of the setup and our
experience using it in the context of two Knight lab projects on the Prototype
National Research Platform system.
Comment: 5 pages, 1 figure, to be published in proceedings of PEARC2
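Any such custom-runner deployment, privileged or not, has to obtain a registration token from GitHub before the stock runner binary can join the repository; the sketch below shows that step via the GitHub REST API. The repository name and the GITHUB_PAT environment variable are illustrative assumptions, and this is not necessarily how the setup in the paper performs registration.

```python
# Minimal sketch of the step any custom-runner deployment needs: asking the
# GitHub REST API for a short-lived runner registration token, which an
# unprivileged pod can then pass to the stock runner binary. The repository name
# and token environment variable are placeholders; error handling is omitted.
import json
import os
import urllib.request

OWNER_REPO = "example-org/example-repo"          # hypothetical repository
url = f"https://api.github.com/repos/{OWNER_REPO}/actions/runners/registration-token"

req = urllib.request.Request(
    url,
    method="POST",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_PAT']}",  # needs repo admin rights
    },
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

# The returned token is short-lived and is consumed by the runner's registration
# step when it joins the repository.
print(reply["token"], reply["expires_at"])
```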