1,450 research outputs found
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components, that may in
turn be executed separately and on the most suitable resources. To address
these needs we present funcX---a distributed function as a service (FaaS)
platform that enables flexible, scalable, and high performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap
with arXiv:1908.0490
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a
dataflow. An end-to-end dataflow of single-process programs is known as a
many-task application. Typically, tools from the HPC software stack are used to
parallelize these analyses. In this work, we investigate an alternate approach
that uses Apache Spark -- a modern big data platform -- to parallelize
many-task applications. We present Kira, a flexible and distributed astronomy
image processing toolkit using Apache Spark. We then use the Kira toolkit to
implement a Source Extractor application for astronomy images, called Kira SE.
With Kira SE as the use case, we study the programming flexibility, dataflow
richness, scheduling capacity and performance of Apache Spark running on the
EC2 cloud. By exploiting data locality, Kira SE achieves a 2.5x speedup over an
equivalent C program when analyzing a 1TB dataset using 512 cores on the Amazon
EC2 cloud. Furthermore, we show that by leveraging software originally designed
for big data infrastructure, Kira SE achieves competitive performance to the C
implementation running on the NERSC Edison supercomputer. Our experience with
Kira indicates that emerging Big Data platforms such as Apache Spark are a
performant alternative for many-task scientific applications
HIL: designing an exokernel for the data center
We propose a new Exokernel-like layer to allow mutually untrusting physically deployed services to efficiently share the resources of a data center. We believe that such a layer offers not only efficiency gains, but may also enable new economic models, new applications, and new security-sensitive uses. A prototype (currently in active use) demonstrates that the proposed layer is viable, and can support a variety of existing provisioning tools and use cases.Partial support for this work was provided by the MassTech Collaborative Research Matching Grant Program, National Science Foundation awards 1347525 and 1149232 as well as the several commercial partners of the Massachusetts Open Cloud who may be found at http://www.massopencloud.or
Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud
While traditional HPC has and continues to satisfy most workflows, a new
generation of researchers has emerged looking for sophisticated, scalable,
on-demand, and self-service control of compute infrastructure in a cloud-like
environment. Many also seek safe harbors to operate on or store sensitive
and/or controlled-access data in a high capacity environment.
To cater to these modern users, the Minnesota Supercomputing Institute
designed and deployed Stratus, a locally-hosted cloud environment powered by
the OpenStack platform, and backed by Ceph storage. The subscription-based
service complements existing HPC systems by satisfying the following unmet
needs of our users: a) on-demand availability of compute resources, b)
long-running jobs (i.e., days), c) container-based computing with
Docker, and d) adequate security controls to comply with controlled-access data
requirements.
This document provides an in-depth look at the design of Stratus with respect
to security and compliance with the NIH's controlled-access data policy.
Emphasis is placed on lessons learned while integrating OpenStack and Ceph
features into a so-called "walled garden", and how those technologies
influenced the security design. Many features of Stratus, including tiered
secure storage with the introduction of a controlled-access data "cache",
fault-tolerant live-migrations, and fully integrated two-factor authentication,
depend on recent OpenStack and Ceph features.Comment: 7 pages, 5 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
From Bare Metal to Virtual: Lessons Learned when a Supercomputing Institute Deploys its First Cloud
As primary provider for research computing services at the University of
Minnesota, the Minnesota Supercomputing Institute (MSI) has long been
responsible for serving the needs of a user-base numbering in the thousands.
In recent years, MSI---like many other HPC centers---has observed a growing
need for self-service, on-demand, data-intensive research, as well as the
emergence of many new controlled-access datasets for research purposes. In
light of this, MSI constructed a new on-premise cloud service, named Stratus,
which is architected from the ground up to easily satisfy data-use agreements
and fill four gaps left by traditional HPC. The resulting OpenStack cloud,
constructed from HPC-specific compute nodes and backed by Ceph storage, is
designed to fully comply with controls set forth by the NIH Genomic Data
Sharing Policy.
Herein, we present twelve lessons learned during the ambitious sprint to take
Stratus from inception and into production in less than 18 months. Important,
and often overlooked, components of this timeline included the development of
new leadership roles, staff and user training, and user support documentation.
Along the way, the lessons learned extended well beyond the technical
challenges often associated with acquiring, configuring, and maintaining
large-scale systems.Comment: 8 pages, 5 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
M2: Malleable Metal as a Service
Existing bare-metal cloud services that provide users with physical nodes
have a number of serious disadvantage over their virtual alternatives,
including slow provisioning times, difficulty for users to release nodes and
then reuse them to handle changes in demand, and poor tolerance to failures. We
introduce M2, a bare-metal cloud service that uses network-mounted boot drives
to overcome these disadvantages. We describe the architecture and
implementation of M2 and compare its agility, scalability, and performance to
existing systems. We show that M2 can reduce provisioning time by over 50%
while offering richer functionality, and comparable run-time performance with
respect to tools that provision images into local disks. M2 is open source and
available at https://github.com/CCI-MOC/ims.Comment: IEEE International Conference on Cloud Engineering 201
- …