Exploring the Fairness and Resource Distribution in an Apache Mesos Environment
Apache Mesos, a cluster-wide resource manager, is widely deployed at massive
scale in several clouds and data centers. Mesos aims to provide high cluster
utilization via fine-grained resource co-scheduling, and resource fairness among
multiple users through Dominant Resource Fairness (DRF) based allocation. DRF
takes into account the different resource types (CPU, memory, disk I/O) requested
by each application and determines the share of each cluster resource that
can be allocated to the applications. Mesos has adopted a two-level
scheduling policy: (1) DRF to allocate resources to competing frameworks, and
(2) task-level scheduling by each framework for the resources allocated in
the previous step. We have conducted experiments in a local Mesos cluster
with frameworks such as Apache Aurora, Marathon, and our own framework
Scylla, to study resource fairness and cluster utilization. Experimental
results show how informed decisions regarding the second-level scheduling
policy of frameworks, and attributes such as offer holding period, offer
refusal cycle, and task arrival rate, can reduce unfair resource distribution.
A bin-packing scheduling policy on Scylla with Marathon can reduce unfair
allocation from 38% to 3%. By reducing unused free resources in offers, we
bring down the unfairness from 90% to 28%. We also show the effect of the task
arrival rate, reducing the unfairness from 23% to 7%.
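The DRF allocation step described above can be sketched as a short simulation. This is an illustration of the policy only, not the Mesos allocator; the cluster capacity and per-task demands below are made up for the example:

```python
# Minimal sketch of Dominant Resource Fairness (DRF) allocation.
# Illustrative only -- not Mesos' actual allocation module.

def dominant_share(usage, capacity):
    # A user's dominant share is its largest per-resource share.
    return max(usage[r] / capacity[r] for r in capacity)

def total_used(usage, resource):
    return sum(u[resource] for u in usage.values())

def drf_allocate(capacity, demands):
    """Repeatedly offer resources to the user with the lowest dominant share."""
    usage = {user: {r: 0.0 for r in capacity} for user in demands}
    tasks = {user: 0 for user in demands}
    while True:
        # DRF considers the user with the least dominant share first.
        user = min(demands, key=lambda u: dominant_share(usage[u], capacity))
        task = demands[user]
        if any(total_used(usage, r) + task[r] > capacity[r] for r in capacity):
            break  # the next task no longer fits: stop allocating
        for r in capacity:
            usage[user][r] += task[r]
        tasks[user] += 1
    return tasks
```

With a 9-CPU, 18 GB cluster, tasks of <1 CPU, 4 GB> for user A and <3 CPU, 1 GB> for user B, this yields 3 tasks for A and 2 for B, equalizing their dominant shares at 2/3 (the classic DRF example).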
Tromino: Demand and DRF Aware Multi-Tenant Queue Manager for Apache Mesos Cluster
Apache Mesos, a two-level resource scheduler, provides resource sharing
across multiple users in a multi-tenant cluster environment. Computational
resources (e.g., CPU, memory, and disk) are distributed according to the
Dominant Resource Fairness (DRF) policy. Mesos frameworks (users) receive
resources based on their current usage and are responsible for scheduling their
tasks within the allocation. We have observed that multiple frameworks can
cause fairness imbalance in a multiuser environment. For example, a greedy
framework consuming more than its fair share of resources can deny resource
fairness to others. The user with the least Dominant Share is considered first
by the DRF module to get its resource allocation. However, the default DRF
implementation, in Apache Mesos' Master allocation module, does not consider
the overall resource demands of the tasks in the queue for each user/framework.
This lack of awareness can result in users without any pending task receiving
more resource offers while users with a queue of pending tasks starve due to
their high dominant shares. We have developed a policy-driven queue manager,
Tromino, for an Apache Mesos cluster where tasks for individual frameworks can
be scheduled based on each framework's overall resource demands and current
resource consumption. Dominant Share and demand awareness of Tromino and
scheduling based on these attributes can reduce (1) the impact of unfairness
due to a framework specific configuration, and (2) unfair waiting time due to
higher resource demand in a pending task queue. In the best case, Tromino can
significantly reduce the average waiting time of a framework by using the
proposed Demand-DRF aware policy.
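The core idea can be sketched as a demand-aware variant of the DRF pick. This is an illustrative reconstruction in the spirit of Tromino, not its actual code; the field names (`name`, `pending_tasks`, `usage`) are assumptions:

```python
# Hypothetical sketch of a Demand-DRF aware pick: frameworks with an empty
# pending-task queue are skipped, so resource offers go only to frameworks
# that still have demand, ordered by lowest dominant share.
# Field names ('name', 'pending_tasks', 'usage') are illustrative.

def pick_framework(frameworks, capacity):
    def dominant_share(usage):
        return max(usage[r] / capacity[r] for r in capacity)

    # Plain DRF would consider every framework; demand awareness
    # filters out those with no tasks waiting to run.
    eligible = [f for f in frameworks if f["pending_tasks"] > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda f: dominant_share(f["usage"]))["name"]
```

Under this pick, an idle framework with a low dominant share no longer absorbs offers while a framework with a queue of pending tasks starves.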
Evaluation of Docker Containers for Scientific Workloads in the Cloud
The HPC community is actively researching and evaluating tools to support
execution of scientific applications in cloud-based environments. Among the
various technologies, containers have recently gained importance as they offer
significantly better performance than full-scale virtualization, support
microservices and DevOps practices, and work seamlessly with workflow and
orchestration tools. Docker is currently the leader in containerization
technology because it offers low overhead, flexibility, portability of
applications, and reproducibility. Singularity is another container solution
that is of interest as it is designed specifically for scientific applications.
It is important to conduct performance and feature analysis of the container
technologies to understand their applicability for each application and target
execution environment. This paper presents (1) a performance evaluation of
Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a
mechanism by which Docker containers can be mapped to InfiniBand hardware with
RDMA communication, and (3) an analysis of mapping elements of parallel
workloads to containers for optimal resource management with container-ready
orchestration tools. Our experiments are targeted toward application developers so that they
can make informed decisions on choosing the container technologies and
approaches that are suitable for their HPC workloads on cloud infrastructure.
Our performance analysis shows that scientific workloads can achieve
near-native performance in both Docker- and Singularity-based containers.
Singularity is designed specifically for HPC workloads; however, Docker still
has advantages over Singularity for use in clouds, as it provides overlay
networking and an intuitive way to run MPI applications with one container per
rank for fine-grained resource allocation.
LEMO-MR: Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications
Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! and Facebook for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond its use with data-intensive applications and disk-based systems, and it can also be brought to bear in processing small but CPU-intensive distributed applications. In this work, we focus both on the performance of processing large-scale hierarchical data in distributed scientific applications, and on the processing of smaller but demanding input sizes primarily used in diskless, memory-resident I/O systems. In this paper, we present LEMO-MR (Low overhead, elastic, configurable for in-memory applications, and on-demand fault tolerance), an optimized implementation of MapReduce for both on-disk and in-memory applications; we describe its architecture and identify not only the necessary components of this model, but also the trade-offs and factors to be considered. We show the efficacy of our implementation in terms of the potential speedup that can be achieved for representative data sets used by cloud applications. Finally, we quantify the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute-intensive environment.
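The paradigm itself is compact. A minimal in-memory map-shuffle-reduce, purely illustrative and not LEMO-MR's implementation, can be written as:

```python
from collections import defaultdict

# Minimal in-memory MapReduce sketch (illustrative; not LEMO-MR itself).
def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:                # map phase
        for key, value in mapper(record):
            groups[key].append(value)     # shuffle: group values by key
    # reduce phase: fold each key's values into a single result
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical word-count example:
docs = ["the quick fox", "the lazy dog"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda key, values: sum(values),
)
# counts["the"] == 2
```

An in-memory variant like this avoids the disk I/O of HDFS-backed shuffles, which is the intuition behind targeting memory- and CPU-intensive workloads.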
MARIANE: MApReduce Implementation Adapted for HPC Environments
MapReduce is increasingly becoming a popular framework and a potent programming model. The most popular open-source implementation of MapReduce, Hadoop, is based on the Hadoop Distributed File System (HDFS). However, as HDFS is not POSIX compliant, it cannot be fully leveraged by applications running on a majority of existing HPC environments such as TeraGrid and NERSC. These HPC environments typically support globally shared file systems such as NFS and GPFS. On such resourceful HPC infrastructures, the use of Hadoop not only creates compatibility issues, but also affects overall performance due to the added overhead of HDFS. This paper not only presents a MapReduce implementation directly suitable for HPC environments, but also exposes the design choices behind better performance gains in those settings. By leveraging inherent distributed file system functions, and abstracting them away from its MapReduce framework, MARIANE (MApReduce Implementation Adapted for HPC Environments) not only allows for the use of the model in an expanding number of HPC environments, but also allows for better performance in such settings. This paper shows the applicability and high performance of the MapReduce paradigm through MARIANE, an implementation designed for clustered and shared-disk file systems and as such not dedicated to a specific MapReduce solution. The paper identifies the components and trade-offs necessary for this model, and quantifies the performance gains exhibited by our approach over Apache Hadoop in a data-intensive setting on the Magellan testbed at the National Energy Research Scientific Computing Center (NERSC).
A Study on the Seismic Response of Ground and Reinforced Concrete Buildings in Belgaum Region, India
In recent years, each unpredictable earthquake (for example, Kobe, 1995; Taiwan, 1999; Kocaeli, 1999; and Bhuj, 2001) has revealed its own saga of destruction and failure of engineered structures. Therefore, regional variation in potential earthquake damage should be carefully assessed for better planning towards disaster mitigation. This requires assessment of the seismic ground motion parameters. The present study focuses on the seismic response of ground and reinforced concrete buildings in the Belgaum region (located in zone III, as per IS 1893–Part 1: 2002) in Karnataka state. At present no strong-motion records are available in this region; therefore, a wavelet-based spectrum compatibility approach is used to generate synthetic earthquake motions for the region. The effect of soil deposits in the region on the propagation of seismic motion parameters to the ground surface is investigated based on an equivalent linear approach. Frequency response analysis of buildings of various configurations is carried out with three-dimensional numerical modeling. The results indicate maximum spectral accelerations at the ground surface in the range of 0.68g to 1.29g and peak ground acceleration (PGA) amplifications in the range of 2.16 to 3.13. The predicted fundamental period of soil deposits in the region varies from 0.2 s to 0.4 s. The configurations of buildings susceptible to resonance due to close matching with the resulting wave frequencies of the ground are identified.
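As a rough illustration of the resonance screen (not part of the study's three-dimensional numerical modeling), IS 1893 (Part 1): 2002 gives an empirical fundamental period for reinforced concrete frame buildings that can be compared against the 0.2–0.4 s soil periods predicted for the region; the building heights below are invented for the example:

```python
# Illustrative resonance screen, not the paper's numerical model.
# IS 1893 (Part 1): 2002 approximates the fundamental period of an RC
# moment-resisting frame building (without brick infill) as
# Ta = 0.075 * h**0.75, with h the building height in metres.

def fundamental_period(height_m):
    return 0.075 * height_m ** 0.75

def may_resonate(height_m, soil_period_range=(0.2, 0.4)):
    # Flag buildings whose estimated period falls inside the predicted
    # 0.2-0.4 s range of soil fundamental periods for the region.
    lo, hi = soil_period_range
    return lo <= fundamental_period(height_m) <= hi
```

For instance, a roughly 6 m tall RC frame building (Ta ≈ 0.29 s) falls inside the predicted soil-period band, whereas a 30 m building (Ta ≈ 0.96 s) does not, which is consistent with low-rise configurations being the resonance-susceptible ones.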
A Revised Analysis of the Open Grid Services Infrastructure
This paper began its life as an unpublished technical review of the proposed Open Grid Services Architecture (OGSA) as described in the papers "The Physiology of the Grid" by Ian Foster, Carl Kesselman, Jeffrey Nick and Steven Tuecke, and "The Grid Service Specification (Draft 2/15/02)" by Foster, Kesselman, Tuecke and Karl Czajkowski, Jeffrey Frey and Steve Graham. However, much has changed since the publication of the original documents. The architecture has evolved substantially, and the vast majority of our initial concerns have been addressed. In this paper we describe the evolution of the specification from its original form to the current draft of 10/4/02, authored by S. Tuecke, K. Czajkowski, J. Frey, S. Graham, C. Kesselman, and P. Vanderbilt, which is now the central component of the Global Grid Forum Open Grid Services Infrastructure (OGSI) working group, co-chaired by Steven Tuecke and David Snelling.