16,795 research outputs found

    Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud

    Full text link
    While traditional HPC has and continues to satisfy most workflows, a new generation of researchers has emerged looking for sophisticated, scalable, on-demand, and self-service control of compute infrastructure in a cloud-like environment. Many also seek safe harbors to operate on or store sensitive and/or controlled-access data in a high capacity environment. To cater to these modern users, the Minnesota Supercomputing Institute designed and deployed Stratus, a locally-hosted cloud environment powered by the OpenStack platform, and backed by Ceph storage. The subscription-based service complements existing HPC systems by satisfying the following unmet needs of our users: a) on-demand availability of compute resources, b) long-running jobs (i.e., >30> 30 days), c) container-based computing with Docker, and d) adequate security controls to comply with controlled-access data requirements. This document provides an in-depth look at the design of Stratus with respect to security and compliance with the NIH's controlled-access data policy. Emphasis is placed on lessons learned while integrating OpenStack and Ceph features into a so-called "walled garden", and how those technologies influenced the security design. Many features of Stratus, including tiered secure storage with the introduction of a controlled-access data "cache", fault-tolerant live-migrations, and fully integrated two-factor authentication, depend on recent OpenStack and Ceph features.Comment: 7 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

    2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

    Full text link
    We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2182^{18}) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (409634096^3) particle cosmological simulations, accounting for 4×10204 \times 10^{20} floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.Comment: 12 pages, 8 figures, 77 references; To appear in Proceedings of SC '1

    Enhancing speed and scalability of the ParFlow simulation code

    Full text link
    Regional hydrology studies are often supported by high resolution simulations of subsurface flow that require expensive and extensive computations. Efficient usage of the latest high performance parallel computing systems becomes a necessity. The simulation software ParFlow has been demonstrated to meet this requirement and shown to have excellent solver scalability for up to 16,384 processes. In the present work we show that the code requires further enhancements in order to fully take advantage of current petascale machines. We identify ParFlow's way of parallelization of the computational mesh as a central bottleneck. We propose to reorganize this subsystem using fast mesh partition algorithms provided by the parallel adaptive mesh refinement library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. We evaluate the scaling performance of the modified version of ParFlow, demonstrating good weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test an example application at large scale.Comment: The final publication is available at link.springer.co

    4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem

    Full text link
    As an entry for the 2012 Gordon-Bell performance prize, we report performance results of astrophysical N-body simulations of one trillion particles performed on the full system of K computer. This is the first gravitational trillion-body simulation in the world. We describe the scientific motivation, the numerical algorithm, the parallelization strategy, and the performance analysis. Unlike many previous Gordon-Bell prize winners that used the tree algorithm for astrophysical N-body simulations, we used the hybrid TreePM method, for similar level of accuracy in which the short-range force is calculated by the tree algorithm, and the long-range force is solved by the particle-mesh algorithm. We developed a highly-tuned gravity kernel for short-range forces, and a novel communication algorithm for long-range forces. The average performance on 24576 and 82944 nodes of K computer are 1.53 and 4.45 Pflops, which correspond to 49% and 42% of the peak speed.Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012 (http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional information is http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201

    From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud

    Full text link
    As primary provider for research computing services at the University of Minnesota, the Minnesota Supercomputing Institute (MSI) has long been responsible for serving the needs of a user-base numbering in the thousands. In recent years, MSI---like many other HPC centers---has observed a growing need for self-service, on-demand, data-intensive research, as well as the emergence of many new controlled-access datasets for research purposes. In light of this, MSI constructed a new on-premise cloud service, named Stratus, which is architected from the ground up to easily satisfy data-use agreements and fill four gaps left by traditional HPC. The resulting OpenStack cloud, constructed from HPC-specific compute nodes and backed by Ceph storage, is designed to fully comply with controls set forth by the NIH Genomic Data Sharing Policy. Herein, we present twelve lessons learned during the ambitious sprint to take Stratus from inception and into production in less than 18 months. Important, and often overlooked, components of this timeline included the development of new leadership roles, staff and user training, and user support documentation. Along the way, the lessons learned extended well beyond the technical challenges often associated with acquiring, configuring, and maintaining large-scale systems.Comment: 8 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US
    • …
    corecore