Exploring the Fairness and Resource Distribution in an Apache Mesos Environment
Apache Mesos, a cluster-wide resource manager, is widely deployed at massive
scale in several clouds and data centers. Mesos aims to provide high cluster
utilization via fine-grained resource co-scheduling, and resource fairness among
multiple users through Dominant Resource Fairness (DRF) based allocation. DRF
takes into account the different resource types (CPU, memory, disk I/O) requested
by each application and determines the share of each cluster resource that
can be allocated to the applications. Mesos has adopted a two-level
scheduling policy: (1) DRF to allocate resources to competing frameworks, and
(2) task-level scheduling by each framework for the resources allocated in
the previous step. We have conducted experiments in a local Mesos cluster
with frameworks such as Apache Aurora, Marathon, and our own framework
Scylla, to study resource fairness and cluster utilization. Experimental
results show how informed decisions regarding the second-level scheduling
policy of frameworks, and attributes such as offer holding period, offer
refusal cycle, and task arrival rate, can reduce unfair resource distribution.
A bin-packing scheduling policy on Scylla with Marathon can reduce unfair
allocation from 38% to 3%. By reducing unused free resources in offers, we
bring down the unfairness from 90% to 28%. We also show the effect of the task
arrival rate, reducing the unfairness from 23% to 7%.
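The DRF allocation step described above can be sketched as a short simulation. This is an illustration of the policy only, not the Mesos allocator; the cluster capacity and per-task demands below are made up for the example:

```python
# Minimal sketch of Dominant Resource Fairness (DRF) allocation.
# Illustrative only -- not Mesos' actual allocation module.

def dominant_share(usage, capacity):
    # A user's dominant share is its largest per-resource share.
    return max(usage[r] / capacity[r] for r in capacity)

def total_used(usage, resource):
    return sum(u[resource] for u in usage.values())

def drf_allocate(capacity, demands):
    """Repeatedly offer resources to the user with the lowest dominant share."""
    usage = {user: {r: 0.0 for r in capacity} for user in demands}
    tasks = {user: 0 for user in demands}
    while True:
        # DRF considers the user with the least dominant share first.
        user = min(demands, key=lambda u: dominant_share(usage[u], capacity))
        task = demands[user]
        if any(total_used(usage, r) + task[r] > capacity[r] for r in capacity):
            break  # the next task no longer fits: stop allocating
        for r in capacity:
            usage[user][r] += task[r]
        tasks[user] += 1
    return tasks
```

With a 9-CPU, 18 GB cluster, tasks of <1 CPU, 4 GB> for user A and <3 CPU, 1 GB> for user B, this yields 3 tasks for A and 2 for B, equalizing their dominant shares at 2/3 (the classic DRF example).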
Tromino: Demand and DRF Aware Multi-Tenant Queue Manager for Apache Mesos Cluster
Apache Mesos, a two-level resource scheduler, provides resource sharing
across multiple users in a multi-tenant cluster environment. Computational
resources (e.g., CPU, memory, and disk) are distributed according to the
Dominant Resource Fairness (DRF) policy. Mesos frameworks (users) receive
resources based on their current usage and are responsible for scheduling their
tasks within the allocation. We have observed that multiple frameworks can
cause fairness imbalance in a multiuser environment. For example, a greedy
framework consuming more than its fair share of resources can deny resource
fairness to others. The user with the least Dominant Share is considered first
by the DRF module to get its resource allocation. However, the default DRF
implementation, in Apache Mesos' Master allocation module, does not consider
the overall resource demands of the tasks in the queue for each user/framework.
This lack of awareness can result in users without any pending task receiving
more resource offers while users with a queue of pending tasks starve due to
their high dominant shares. We have developed a policy-driven queue manager,
Tromino, for an Apache Mesos cluster where tasks for individual frameworks can
be scheduled based on each framework's overall resource demands and current
resource consumption. Dominant Share and demand awareness of Tromino and
scheduling based on these attributes can reduce (1) the impact of unfairness
due to a framework specific configuration, and (2) unfair waiting time due to
higher resource demand in a pending task queue. In the best case, Tromino can
significantly reduce the average waiting time of a framework by using the
proposed Demand-DRF aware policy.
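The core idea can be sketched as a demand-aware variant of the DRF pick. This is an illustrative reconstruction in the spirit of Tromino, not its actual code; the field names (`name`, `pending_tasks`, `usage`) are assumptions:

```python
# Hypothetical sketch of a Demand-DRF aware pick: frameworks with an empty
# pending-task queue are skipped, so resource offers go only to frameworks
# that still have demand, ordered by lowest dominant share.
# Field names ('name', 'pending_tasks', 'usage') are illustrative.

def pick_framework(frameworks, capacity):
    def dominant_share(usage):
        return max(usage[r] / capacity[r] for r in capacity)

    # Plain DRF would consider every framework; demand awareness
    # filters out those with no tasks waiting to run.
    eligible = [f for f in frameworks if f["pending_tasks"] > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda f: dominant_share(f["usage"]))["name"]
```

Under this pick, an idle framework with a low dominant share no longer absorbs offers while a framework with a queue of pending tasks starves.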
Evaluation of Docker Containers for Scientific Workloads in the Cloud
The HPC community is actively researching and evaluating tools to support
execution of scientific applications in cloud-based environments. Among the
various technologies, containers have recently gained importance as they offer
significantly better performance than full-scale virtualization, support
microservices and DevOps practices, and work seamlessly with workflow and
orchestration tools. Docker is currently the leader in containerization
technology because it offers low overhead, flexibility, portability of
applications, and reproducibility. Singularity is another container solution
that is of interest as it is designed specifically for scientific applications.
It is important to conduct performance and feature analysis of the container
technologies to understand their applicability for each application and target
execution environment. This paper presents (1) a performance evaluation of
Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a
mechanism by which Docker containers can be mapped to InfiniBand hardware with
RDMA communication, and (3) an analysis of mapping elements of parallel
workloads to containers for optimal resource management with container-ready
orchestration tools. Our experiments are targeted toward application developers so that they
can make informed decisions on choosing the container technologies and
approaches that are suitable for their HPC workloads on cloud infrastructure.
Our performance analysis shows that scientific workloads can achieve
near-native performance in both Docker- and Singularity-based containers.
Singularity is designed specifically for HPC workloads; however, Docker still
has advantages over Singularity for use in clouds, as it provides overlay
networking and an intuitive way to run MPI applications with one container per
rank for fine-grained resource allocation.
LEMO-MR: Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications
Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! and Facebook for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond its use with data-intensive applications and disk-based systems, and it can also be brought to bear in processing small but CPU-intensive distributed applications. In this work, we focus both on the performance of processing large-scale hierarchical data in distributed scientific applications, and on the processing of smaller but demanding input sizes primarily used in diskless, memory-resident I/O systems. In this paper, we present LEMO-MR (Low overhead, elastic, configurable for in-memory applications, and on-demand fault tolerance), an optimized implementation of MapReduce for both on-disk and in-memory applications; we describe its architecture and identify not only the necessary components of this model, but also the trade-offs and factors to be considered. We show the efficacy of our implementation in terms of the potential speedup that can be achieved for representative data sets used by cloud applications. Finally, we quantify the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute-intensive environment.
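The paradigm itself is compact. A minimal in-memory map-shuffle-reduce, purely illustrative and not LEMO-MR's implementation, can be written as:

```python
from collections import defaultdict

# Minimal in-memory MapReduce sketch (illustrative; not LEMO-MR itself).
def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:                # map phase
        for key, value in mapper(record):
            groups[key].append(value)     # shuffle: group values by key
    # reduce phase: fold each key's values into a single result
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical word-count example:
docs = ["the quick fox", "the lazy dog"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda key, values: sum(values),
)
# counts["the"] == 2
```

An in-memory variant like this avoids the disk I/O of HDFS-backed shuffles, which is the intuition behind targeting memory- and CPU-intensive workloads.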
MARIANE: MApReduce Implementation Adapted for HPC Environments
MapReduce is increasingly becoming a popular framework and a potent programming model. The most popular open-source implementation of MapReduce, Hadoop, is based on the Hadoop Distributed File System (HDFS). However, as HDFS is not POSIX compliant, it cannot be fully leveraged by applications running on a majority of existing HPC environments such as TeraGrid and NERSC. These HPC environments typically support globally shared file systems such as NFS and GPFS. On such resourceful HPC infrastructures, the use of Hadoop not only creates compatibility issues, but also affects overall performance due to the added overhead of HDFS. This paper not only presents a MapReduce implementation directly suitable for HPC environments, but also exposes the design choices behind better performance gains in those settings. By leveraging inherent distributed file system functions, and abstracting them away from its MapReduce framework, MARIANE (MApReduce Implementation Adapted for HPC Environments) not only allows for the use of the model in an expanding number of HPC environments, but also allows for better performance in such settings. This paper shows the applicability and high performance of the MapReduce paradigm through MARIANE, an implementation designed for clustered and shared-disk file systems and as such not dedicated to a specific MapReduce solution. The paper identifies the components and trade-offs necessary for this model, and quantifies the performance gains exhibited by our approach over Apache Hadoop in a data-intensive setting on the Magellan testbed at the National Energy Research Scientific Computing Center (NERSC).
A Study on the Seismic Response of Ground and Reinforced Concrete Buildings in Belgaum Region, India
In recent years, each unpredictable earthquake (for example, Kobe, 1995; Taiwan, 1999; Kocaeli, 1999; and Bhuj, 2001) has revealed its own saga of destruction and failure of engineered structures. Therefore, regional variation in potential earthquake damage should be carefully assessed for better planning towards disaster mitigation. This requires assessment of the seismic ground motion parameters. The present study focuses on the seismic response of ground and reinforced concrete buildings in the Belgaum region (located in zone III, as per IS 1893–Part 1: 2002) in Karnataka state. At present no strong-motion records are available in this region; therefore, a wavelet-based spectrum compatibility approach is used to generate synthetic earthquake motions for the region. The effect of soil deposits in the region on the propagation of seismic motion parameters to the ground surface is investigated based on an equivalent linear approach. Frequency response analysis of buildings of various configurations is carried out with three-dimensional numerical modeling. The results indicate maximum spectral accelerations at the ground surface in the range of 0.68g to 1.29g and peak ground acceleration (PGA) amplifications in the range of 2.16 to 3.13. The predicted fundamental period of soil deposits in the region varies from 0.2 s to 0.4 s. The configurations of buildings susceptible to resonance due to close matching with the resulting wave frequencies of the ground are identified.
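As a rough illustration of the resonance screen (not part of the study's three-dimensional numerical modeling), IS 1893 (Part 1): 2002 gives an empirical fundamental period for reinforced concrete frame buildings that can be compared against the 0.2–0.4 s soil periods predicted for the region; the building heights below are invented for the example:

```python
# Illustrative resonance screen, not the paper's numerical model.
# IS 1893 (Part 1): 2002 approximates the fundamental period of an RC
# moment-resisting frame building (without brick infill) as
# Ta = 0.075 * h**0.75, with h the building height in metres.

def fundamental_period(height_m):
    return 0.075 * height_m ** 0.75

def may_resonate(height_m, soil_period_range=(0.2, 0.4)):
    # Flag buildings whose estimated period falls inside the predicted
    # 0.2-0.4 s range of soil fundamental periods for the region.
    lo, hi = soil_period_range
    return lo <= fundamental_period(height_m) <= hi
```

For instance, a roughly 6 m tall RC frame building (Ta ≈ 0.29 s) falls inside the predicted soil-period band, whereas a 30 m building (Ta ≈ 0.96 s) does not, which is consistent with low-rise configurations being the resonance-susceptible ones.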
A Revised Analysis of the Open Grid Services Infrastructure
This paper began its life as an unpublished technical review of the proposed Open Grid Services Architecture (OGSA) as described in the papers "The Physiology of the Grid" by Ian Foster, Carl Kesselman, Jeffrey Nick and Steven Tuecke, and "The Grid Service Specification (Draft 2/15/02)" by Foster, Kesselman, Tuecke and Karl Czajkowski, Jeffrey Frey and Steve Graham. However, much has changed since the publication of the original documents. The architecture has evolved substantially, and the vast majority of our initial concerns have been addressed. In this paper we describe the evolution of the specification from its original form to the current draft of 10/4/02, authored by S. Tuecke, K. Czajkowski, J. Frey, S. Graham, C. Kesselman, and P. Vanderbilt, which is now the central component of the Global Grid Forum Open Grid Services Infrastructure (OGSI) working group, co-chaired by Steven Tuecke and David Snelling.