Virtualizing the Stampede2 Supercomputer with Applications to HPC in the Cloud
Methods developed at the Texas Advanced Computing Center (TACC) are described
and demonstrated for automating the construction of an elastic, virtual cluster
emulating the Stampede2 high performance computing (HPC) system. The cluster
can be built and/or scaled in a matter of minutes on the Jetstream self-service
cloud system and shares many properties of the original Stampede2, including:
i) common identity management, ii) access to the same file systems, iii)
equivalent software application stack and module system, iv) similar job
scheduling interface via Slurm.
We measure time-to-solution for a number of common scientific applications on
our virtual cluster against equivalent runs on Stampede2 and develop an
application profile where performance is similar or otherwise acceptable. For
such applications, the virtual cluster provides an effective form of "cloud
bursting" with the potential to significantly improve overall turnaround time,
particularly when Stampede2 is experiencing long queue wait times. In addition,
the virtual cluster can be used for test and debug without directly impacting
Stampede2. We conclude with a discussion of how science gateways can leverage
the TACC Jobs API web service to incorporate this cloud bursting technique
transparently to the end user.
Comment: 6 pages, 0 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
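Since both Stampede2 and the virtual cluster expose an equivalent module system and Slurm scheduling interface, the same batch script can in principle be submitted to either. A minimal sketch of building such a script follows; the partition name, module, and launch command are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: one Slurm batch script usable on either Stampede2 or
# the virtual Jetstream-hosted cluster, since both expose the same module
# system and sbatch interface. Partition/module names are assumptions.

def make_sbatch_script(job_name, nodes, partition, commands):
    """Return the text of a minimal sbatch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job_name}",   # job name
        f"#SBATCH -N {nodes}",      # number of nodes
        f"#SBATCH -p {partition}",  # queue/partition
        "#SBATCH -t 01:00:00",      # wall-clock limit
        "module load gcc",          # same module system on both clusters
    ] + list(commands)
    return "\n".join(lines) + "\n"

# The resulting file would be submitted with `sbatch job.sh` on whichever
# cluster currently offers the shorter queue wait.
script = make_sbatch_script("demo-run", 4, "normal", ["./my_app input.dat"])
print(script)
```

Because the two systems also share identity management and file systems, no script or data changes are needed when bursting.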
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant given the rapidly growing wave of new HPC applications coming from
big data and artificial intelligence.
Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR)
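The hybrid model sketched above (steady workloads on-premise, peak demand bursting to pay-as-you-go cloud resources) amounts to a simple decision policy. The following toy sketch is illustrative only; the thresholds and parameters are assumptions, not taken from the survey.

```python
# Illustrative burst-decision policy for a hybrid HPC environment (names and
# thresholds are assumptions): burst a job to the cloud when doing so would
# finish sooner than waiting in the on-premise queue and the pay-as-you-go
# cost stays within budget.

def should_burst(queue_wait_h, cloud_runtime_h, onprem_runtime_h,
                 cloud_cost_per_h, budget):
    """Return True if the job should burst to the cloud."""
    onprem_turnaround = queue_wait_h + onprem_runtime_h  # wait + run time
    cloud_cost = cloud_runtime_h * cloud_cost_per_h      # pay-as-you-go cost
    return cloud_runtime_h < onprem_turnaround and cloud_cost <= budget

# Long on-premise queue (24 h): bursting wins despite the extra cost.
print(should_burst(queue_wait_h=24, cloud_runtime_h=6,
                   onprem_runtime_h=4, cloud_cost_per_h=3.0, budget=50.0))
```

Real policies must also weigh data-transfer time and the performance uncertainty of the underlying cloud platform, which the survey highlights as open challenges.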
Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures.
Fine-Grained Scheduling for Containerized HPC Workloads in Kubernetes Clusters
Containerization technology offers lightweight OS-level virtualization and
enables portability, reproducibility, and flexibility by packaging applications
with low performance overhead and little effort to maintain and scale them.
Moreover, container orchestrators (e.g., Kubernetes) are widely used in the
Cloud to manage large clusters running many containerized applications.
However, scheduling policies that consider the performance nuances of
containerized High Performance Computing (HPC) workloads have not been
well explored. This paper proposes fine-grained scheduling policies for
containerized HPC workloads in Kubernetes clusters, focusing on
partitioning each job into a suitable multi-container deployment according to
the application profile. We implement our scheduling schemes on different
layers of management (application and infrastructure), so that each component
has its own focus and algorithms but still collaborates with others. Our
results show that our fine-grained scheduling policies outperform both the
baseline policy and a baseline with CPU/memory affinity enabled, reducing the
overall response time by 35% and 19%, respectively, and improving the makespan
by 34% and 11%, respectively. They also provide better usability and flexibility
to specify HPC workloads than other comparable HPC Cloud frameworks, while
providing better scheduling efficiency thanks to their multi-layered approach.
Comment: HPCC202
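The core idea of partitioning each job into a suitable multi-container deployment can be sketched abstractly. The helper below is a hypothetical illustration, not the paper's algorithm: it splits a job's requested cores into container-sized chunks, where the per-container granularity would be chosen from the application profile.

```python
# Hypothetical sketch of multi-container partitioning: split a job's total
# CPU request into per-container requests of at most `cores_per_container`,
# a granularity the paper would derive from the application profile.
# The specific values here are assumptions for illustration.

def partition_job(total_cores, cores_per_container):
    """Split a job into container CPU requests summing to total_cores."""
    if total_cores <= 0 or cores_per_container <= 0:
        raise ValueError("core counts must be positive")
    full, rest = divmod(total_cores, cores_per_container)
    return [cores_per_container] * full + ([rest] if rest else [])

# A 10-core job with a 4-core granularity becomes three containers.
print(partition_job(10, 4))  # → [4, 4, 2]
```

Each entry in the returned list would map to one container's CPU request in the resulting Kubernetes deployment, letting the infrastructure layer place containers with finer granularity than whole-job placement allows.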
Dynamic HPC Clusters within Amazon Web Services (AWS)
Amazon Web Services (AWS) provides public cloud computing resources and services and is one of the largest cloud computing providers in the world. However, getting started with AWS requires many hours spent overcoming its steep learning curve and terminology. This is especially true for researchers looking to create and use a High Performance Computing (HPC) cluster within AWS, because a large number of AWS services and resources must be created and linked together to produce a fully functional HPC cluster.
The Dynamic AWS HPC Cluster Project aims to simplify the steps needed to create a fully functional, dynamic HPC cluster within AWS. The user completes a simple wizard specifying the details of the desired cluster: the size and type of the shared filesystem, the type of HPC scheduler, the number of Compute Instances, the IP addresses from which the cluster should be accessible, and the number of Login/Head Instances required. The project then makes the required calls to the AWS APIs to create all the necessary AWS resources. Once created, the resources are automatically configured and networked together, and usernames and passwords are pushed out to all of the cluster instances for SSH login.
The user can then run their jobs; when no jobs remain, they can pause the cluster so that no compute charges accrue, and later resume it when more jobs arrive. Paying for the cluster only when it is needed can save users money.
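The savings from pause/resume come down to simple billing arithmetic: compute charges accrue only while instances run. The sketch below illustrates this; the instance count and hourly price are assumptions, not AWS list prices.

```python
# Illustrative billing arithmetic (instance count and price are assumptions):
# a pausable cluster is billed only for active hours, while an always-on
# cluster is billed for the whole period.

def compute_cost(active_hours, total_hours, instances, price_per_instance_h,
                 pausable=True):
    """Compute-instance cost over total_hours; paused hours cost nothing."""
    billed = active_hours if pausable else total_hours
    return billed * instances * price_per_instance_h

# One week (168 h) containing 40 h of actual jobs, on 8 instances at $0.50/h.
always_on = compute_cost(40, 168, 8, 0.50, pausable=False)
paused = compute_cost(40, 168, 8, 0.50, pausable=True)
print(always_on - paused)  # compute-charge savings from pause/resume
```

Note that storage (e.g., the shared filesystem) typically continues to incur charges while paused; the savings above cover compute instances only.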