2,494 research outputs found
Evaluation of Docker Containers for Scientific Workloads in the Cloud
The HPC community is actively researching and evaluating tools to support
execution of scientific applications in cloud-based environments. Among the
various technologies, containers have recently gained importance as they have
significantly better performance compared to full-scale virtualization, support
for microservices and DevOps, and work seamlessly with workflow and
orchestration tools. Docker is currently the leader in containerization
technology because it offers low overhead, flexibility, portability of
applications, and reproducibility. Singularity is another container solution
that is of interest as it is designed specifically for scientific applications.
It is important to conduct performance and feature analysis of the container
technologies to understand their applicability for each application and target
execution environment. This paper presents a (1) performance evaluation of
Docker and Singularity on bare metal nodes in the Chameleon cloud (2) mechanism
by which Docker containers can be mapped with InfiniBand hardware with RDMA
communication and (3) analysis of mapping elements of parallel workloads to the
containers for optimal resource management with container-ready orchestration
tools. Our experiments are targeted toward application developers so that they
can make informed decisions on choosing the container technologies and
approaches that are suitable for their HPC workloads on cloud infrastructure.
Our performance analysis shows that scientific workloads for both Docker and
Singularity based containers can achieve near-native performance. Singularity
is designed specifically for HPC workloads. However, Docker still has
advantages over Singularity for use in clouds as it provides overlay networking
and an intuitive way to run MPI applications with one container per rank for
fine-grained resources allocation
Revisiting Matrix Product on Master-Worker Platforms
This paper is aimed at designing efficient parallel matrix-product algorithms
for heterogeneous master-worker platforms. While matrix-product is
well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm
and ScaLAPACK outer product algorithm), there are three key hypotheses that
render our work original and innovative:
- Centralized data. We assume that all matrix files originate from, and must
be returned to, the master.
- Heterogeneous star-shaped platforms. We target fully heterogeneous
platforms, where computational resources have different computing powers.
- Limited memory. Because we investigate the parallelization of large
problems, we cannot assume that full matrix panels can be stored in the worker
memories and re-used for subsequent updates (as in ScaLAPACK).
We have devised efficient algorithms for resource selection (deciding which
workers to enroll) and communication ordering (both for input and result
messages), and we report a set of numerical experiments on various platforms at
Ecole Normale Superieure de Lyon and the University of Tennessee. However, we
point out that in this first version of the report, experiments are limited to
homogeneous platforms
Learning from the Success of MPI
The Message Passing Interface (MPI) has been extremely successful as a
portable way to program high-performance parallel computers. This success has
occurred in spite of the view of many that message passing is difficult and
that other approaches, including automatic parallelization and directive-based
parallelism, are easier to use. This paper argues that MPI has succeeded
because it addresses all of the important issues in providing a parallel
programming model.Comment: 12 pages, 1 figur
DIVERSE: a Software Toolkit to Integrate Distributed Simulations with Heterogeneous Virtual Environments
We present DIVERSE (Device Independent Virtual Environments- Reconfigurable, Scalable, Extensible), which is a modular collection of complimentary software packages that we have developed to facilitate the creation of distributed operator-in-the-loop simulations. In DIVERSE we introduce a novel implementation of remote shared memory (distributed shared memory) that uses Internet Protocol (IP) networks. We also introduce a new method that automatically extends hardware drivers (not in the operating system kernel driver sense) into inter-process and Internet hardware services. Using DIVERSE, a program can display in a CAVEâą, ImmersaDeskâą, head mounted display (HMD), desktop or laptop without modification. We have developed a method of configuring user programs at run-time by loading dynamic shared objects (DSOs), in contrast to the more common practice of creating interpreted configuration languages. We find that by loading DSOs the development time, complexity and size of DIVERSE and DIVERSE user applications is significantly reduced. Configurations to support different I/O devices, device emulators, visual displays, and any component of a user application including interaction techniques, can be changed at run-time by loading different sets of DIVERSE DSOs. In addition, interpreted run-time configuration parsers have been implemented using DIVERSE DSOs; new ones can be created as needed.
DIVERSE is free software, licensed under the terms of the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL) licenses.
We describe the DIVERSE architecture and demonstrate how DIVERSE was used in the development of a specific application, an operator-in-the-loop Navy ship-board crane simulator, which runs unmodified on a desktop computer and/or in a CAVE with motion base motion queuing
GraphH: High Performance Big Graph Analytics in Small Clusters
It is common for real-world applications to analyze big graphs using
distributed graph processing systems. Popular in-memory systems require an
enormous amount of resources to handle big graphs. While several out-of-core
approaches have been proposed for processing big graphs on disk, the high disk
I/O overhead could significantly reduce performance. In this paper, we propose
GraphH to enable high-performance big graph analytics in small clusters.
Specifically, we design a two-stage graph partition scheme to evenly divide the
input graph into partitions, and propose a GAB (Gather-Apply-Broadcast)
computation model to make each worker process a partition in memory at a time.
We use an edge cache mechanism to reduce the disk I/O overhead, and design a
hybrid strategy to improve the communication performance. GraphH can
efficiently process big graphs in small clusters or even a single commodity
server. Extensive evaluations have shown that GraphH could be up to 7.8x faster
compared to popular in-memory systems, such as Pregel+ and PowerGraph when
processing generic graphs, and more than 100x faster than recently proposed
out-of-core systems, such as GraphD and Chaos when processing big graphs
Checkpointing as a Service in Heterogeneous Cloud Environments
A non-invasive, cloud-agnostic approach is demonstrated for extending
existing cloud platforms to include checkpoint-restart capability. Most cloud
platforms currently rely on each application to provide its own fault
tolerance. A uniform mechanism within the cloud itself serves two purposes: (a)
direct support for long-running jobs, which would otherwise require a custom
fault-tolerant mechanism for each application; and (b) the administrative
capability to manage an over-subscribed cloud by temporarily swapping out jobs
when higher priority jobs arrive. An advantage of this uniform approach is that
it also supports parallel and distributed computations, over both TCP and
InfiniBand, thus allowing traditional HPC applications to take advantage of an
existing cloud infrastructure. Additionally, an integrated health-monitoring
mechanism detects when long-running jobs either fail or incur exceptionally low
performance, perhaps due to resource starvation, and proactively suspends the
job. The cloud-agnostic feature is demonstrated by applying the implementation
to two very different cloud platforms: Snooze and OpenStack. The use of a
cloud-agnostic architecture also enables, for the first time, migration of
applications from one cloud platform to another.Comment: 20 pages, 11 figures, appears in CCGrid, 201
- âŠ