Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud
While traditional HPC has satisfied, and continues to satisfy, most workflows, a
new generation of researchers has emerged looking for sophisticated, scalable,
on-demand, and self-service control of compute infrastructure in a cloud-like
environment. Many also seek safe harbors in which to operate on or store
sensitive and/or controlled-access data in a high-capacity environment.
To cater to these modern users, the Minnesota Supercomputing Institute
designed and deployed Stratus, a locally-hosted cloud environment powered by
the OpenStack platform, and backed by Ceph storage. The subscription-based
service complements existing HPC systems by satisfying the following unmet
needs of our users: a) on-demand availability of compute resources, b)
long-running jobs (i.e., days), c) container-based computing with
Docker, and d) adequate security controls to comply with controlled-access data
requirements.
This document provides an in-depth look at the design of Stratus with respect
to security and compliance with the NIH's controlled-access data policy.
Emphasis is placed on lessons learned while integrating OpenStack and Ceph
features into a so-called "walled garden", and how those technologies
influenced the security design. Many features of Stratus, including tiered
secure storage with the introduction of a controlled-access data "cache",
fault-tolerant live-migrations, and fully integrated two-factor authentication,
depend on recent OpenStack and Ceph features.
Comment: 7 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, US
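As context for the self-service, on-demand compute model this abstract contrasts with batch HPC, here is a minimal sketch of provisioning an instance on an OpenStack cloud such as Stratus via the openstacksdk Python client. The cloud profile, image, flavor, network, and keypair names are illustrative assumptions, not details of the Stratus deployment itself.

    # Minimal on-demand provisioning sketch with openstacksdk.
    # Assumes a clouds.yaml entry named "stratus" and that the named
    # image/flavor/network/keypair exist (all hypothetical values).
    import openstack

    conn = openstack.connect(cloud="stratus")

    image = conn.compute.find_image("ubuntu-22.04")
    flavor = conn.compute.find_flavor("m1.medium")
    network = conn.network.find_network("project-net")

    server = conn.compute.create_server(
        name="demo-instance",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
        key_name="my-keypair",
    )
    server = conn.compute.wait_for_server(server)  # block until ACTIVE
    print(server.status)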
2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation
We report on improvements made over the past two decades to our adaptive
treecode N-body method (HOT). A mathematical and computational approach to the
cosmological N-body problem is described, with performance and scalability
measured up to 256k (2^18) processors. We present error analysis and
scientific application results from a series of more than ten 69 billion
(4096^3) particle cosmological simulations, accounting for 4 x 10^20
floating point operations. These results include the first simulations using
the new constraints on the standard model of cosmology from the Planck
satellite. Our simulations set a new standard for accuracy and scientific
throughput, while meeting or exceeding the computational efficiency of the
latest generation of hybrid TreePM N-body methods.
Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC '13
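Since the abstract assumes familiarity with hashed oct-trees, a brief illustration may help: codes in the HOT family label each particle with a space-filling-curve key built by interleaving the bits of its quantized coordinates, and the tree is then addressed through a hash of those keys. Below is a minimal Python sketch of Morton (Z-order) key generation; it illustrates the keying idea only and is not the authors' implementation.

    import numpy as np

    def morton_key(ix, iy, iz, bits=10):
        # Interleave the low `bits` bits of the integer coordinates into
        # one key; spatially nearby particles get nearby keys (Z-order).
        key = 0
        for i in range(bits):
            key |= ((ix >> i) & 1) << (3 * i)
            key |= ((iy >> i) & 1) << (3 * i + 1)
            key |= ((iz >> i) & 1) << (3 * i + 2)
        return int(key)

    def particle_keys(pos, bits=10):
        # Quantize unit-cube positions onto a 2^bits grid, return keys.
        q = np.minimum((pos * (1 << bits)).astype(np.int64), (1 << bits) - 1)
        return [morton_key(*p, bits=bits) for p in q]

    pos = np.random.rand(8, 3)          # toy particle positions in [0, 1)^3
    keys = sorted(particle_keys(pos))   # sorted keys drive tree build / domain split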
4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem
As an entry for the 2012 Gordon-Bell performance prize, we report performance
results of astrophysical N-body simulations of one trillion particles performed
on the full system of the K computer. This is the first gravitational trillion-body
simulation in the world. We describe the scientific motivation, the numerical
algorithm, the parallelization strategy, and the performance analysis. Unlike
many previous Gordon-Bell prize winners that used the tree algorithm for
astrophysical N-body simulations, we used the hybrid TreePM method, which
achieves a similar level of accuracy: the short-range force is calculated by
the tree algorithm, while the long-range force is solved by the particle-mesh
algorithm.
We developed a highly-tuned gravity kernel for short-range forces, and a novel
communication algorithm for long-range forces. The average performance on
24,576 and 82,944 nodes of the K computer is 1.53 and 4.45 Pflops,
respectively, corresponding to 49% and 42% of the peak speed.
Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012 (http://sc12.supercomputing.org/), Gordon Bell Prize winner. Additional information: http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
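The TreePM decomposition mentioned above splits the Newtonian pair force into a short-range piece that decays quickly (computed by the tree) and a smooth long-range remainder (computed on the mesh via FFT). The exact splitting function used on the K computer is not given in the abstract; the sketch below shows the standard Gaussian-smoothing split used in GADGET-style TreePM codes, with the splitting scale r_s as an assumed parameter.

    import numpy as np
    from scipy.special import erfc

    def short_range_factor(r, r_s):
        # Multiplier on the Newtonian force G*m1*m2/r^2 kept for the tree
        # part; it approaches 1 for r << r_s and 0 for r >> r_s.
        x = r / (2.0 * r_s)
        return erfc(x) + (r / (r_s * np.sqrt(np.pi))) * np.exp(-x * x)

    def long_range_factor(r, r_s):
        # Complementary smooth part handled by the particle-mesh solver.
        return 1.0 - short_range_factor(r, r_s)

    r = np.linspace(0.5, 8.0, 4)
    print(short_range_factor(r, r_s=1.0))  # ~1 near zero, -> 0 at large r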
Distributed-memory parallelization of an explicit time-domain volume integral equation solver on Blue Gene/P
Two distributed-memory schemes for efficiently parallelizing the explicit marching-on-in-time based solution of the time-domain volume integral equation on the IBM Blue Gene/P platform are presented. In the first scheme, each processor stores the time history of all source fields, and only the computationally dominant step of the tested-field computations is distributed among processors. This scheme requires all-to-all global communications to update the time history of the source fields from the tested fields. In the second scheme, the source fields as well as all steps of the tested-field computations are distributed among processors. This scheme requires sequential global communications to update the time history of the distributed source fields from the tested fields. Numerical results demonstrate that both schemes scale well on the IBM Blue Gene/P platform, and that the memory-efficient second scheme allows for the characterization of transient wave interactions on composite structures discretized using three million spatial elements, without an acceleration algorithm.
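To make the first scheme concrete, here is a toy mpi4py sketch of its communication pattern: every rank holds the full source-field vector, computes only its block of rows of the tested-field product, and an Allgather rebuilds the full vector each time step. The matrix and field sizes are stand-ins, not the paper's marching-on-in-time operators.

    # Toy sketch of scheme 1: replicated source fields, row-distributed
    # tested-field computation, global Allgather per time step.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    block = 64
    n = block * size                      # total unknowns (toy value)
    rows = slice(rank * block, (rank + 1) * block)

    # Every rank builds the same toy interaction matrix, then keeps its rows.
    Z_local = (np.random.default_rng(0).random((n, n)) / n)[rows]
    field = np.ones(n)                    # full source-field vector (replicated)

    for step in range(10):                # marching-on-in-time loop (toy)
        tested_local = Z_local @ field    # dominant step, distributed by rows
        field = np.empty(n)
        comm.Allgather(tested_local, field)  # all-to-all update of the history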
From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud
As primary provider for research computing services at the University of
Minnesota, the Minnesota Supercomputing Institute (MSI) has long been
responsible for serving the needs of a user-base numbering in the thousands.
In recent years, MSI, like many other HPC centers, has observed a growing
need for self-service, on-demand, data-intensive research, as well as the
emergence of many new controlled-access datasets for research purposes. In
light of this, MSI constructed a new on-premise cloud service, named Stratus,
which is architected from the ground up to easily satisfy data-use agreements
and fill four gaps left by traditional HPC. The resulting OpenStack cloud,
constructed from HPC-specific compute nodes and backed by Ceph storage, is
designed to fully comply with controls set forth by the NIH Genomic Data
Sharing Policy.
Herein, we present twelve lessons learned during the ambitious sprint to take
Stratus from inception into production in less than 18 months. Important, and
often overlooked, components of this timeline included the development of new
leadership roles, staff and user training, and user support documentation.
Along the way, the lessons learned extended well beyond the technical
challenges often associated with acquiring, configuring, and maintaining
large-scale systems.
Comment: 8 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, US