Identifying Logical Homogeneous Clusters for Efficient Wide-area Communications
Recently, many works have focused on the implementation of collective
communication operations adapted to wide-area computational systems, such as
computational Grids or global computing. Due to the inherent heterogeneity of
such environments, most works separate "clusters" into different hierarchy
levels to better model the communication. However, in our opinion, such works
do not give enough attention to the delimitation of such clusters, as they
normally use the locality or the IP subnet of the machines to delimit a
cluster without
verifying the "homogeneity" of such clusters. In this paper, we describe a
strategy to gather network information from different local-area networks and
to construct "logical homogeneous clusters", better suited to the performance
modelling.
Comment: http://www.springerlink.com/index/TTJJL61R1EXDLCM
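The idea of checking homogeneity instead of trusting the IP subnet can be illustrated with a minimal sketch. It is not the strategy described in the paper: the greedy grouping, the function name `logical_clusters`, the `threshold` parameter, and the latency values are all assumptions made for illustration.

```python
def logical_clusters(latency, threshold):
    """Greedily group node indices so every pair in a group has latency <= threshold.

    `latency` is a symmetric matrix (list of lists) of measured pairwise
    latencies; `threshold` is a hypothetical homogeneity tolerance.
    """
    clusters = []
    for node in range(len(latency)):
        for cluster in clusters:
            # A node joins a cluster only if it is close to every member.
            if all(latency[node][member] <= threshold for member in cluster):
                cluster.append(node)
                break
        else:
            # No existing cluster is homogeneous with this node: open a new one.
            clusters.append([node])
    return clusters


# Four machines on the same subnet (made-up latencies in milliseconds):
# 0 and 1 are close, 2 and 3 are close, but the two pairs are far apart.
latency_ms = [
    [0.0, 0.1, 2.0, 2.1],
    [0.1, 0.0, 2.2, 2.0],
    [2.0, 2.2, 0.0, 0.1],
    [2.1, 2.0, 0.1, 0.0],
]
clusters = logical_clusters(latency_ms, threshold=0.5)
```

With these made-up measurements, one subnet splits into two logical clusters, which is the kind of distinction a subnet-based delimitation would miss.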
Holistic Slowdown Driven Scheduling and Resource Management for Malleable Jobs
In job scheduling, the concept of malleability has been explored for many
years. Research shows that malleability improves system performance, but
its utilization in HPC never became widespread. The causes are the difficulty
in developing malleable applications, and the lack of support and integration
of the different layers of the HPC software stack. However, in recent years,
malleability in job scheduling has become more critical because of the
increasing complexity of hardware and workloads. In this context, using nodes
in exclusive mode is not always the most efficient solution, as it was for
traditional HPC jobs, where applications were highly tuned for static
allocations but offered no flexibility for dynamic execution. This paper
proposes a new holistic, dynamic job scheduling policy, Slowdown Driven
(SD-Policy), which exploits the malleability of applications as the key
technology to reduce the average slowdown and response time of jobs. SD-Policy
is based on backfill and node sharing. It applies malleability to running jobs
to make room for jobs that will run with a reduced set of resources, only when
the estimated slowdown improves over the static approach. We implemented
SD-Policy in SLURM and evaluated it in a real production environment, and with
a simulator using workloads of up to 198K jobs. Results show better resource
utilization, with reductions in makespan, response time, slowdown, and energy
consumption of up to 7%, 50%, 70%, and 6%, respectively, for the evaluated
workloads.
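The decision rule in the abstract (use malleability only when the estimated slowdown improves over the static approach) can be sketched as follows. This is a simplified illustration, not the SD-Policy implementation: it assumes the common definition of slowdown as response time divided by run time, measures both options against the run time on the full static allocation, and uses hypothetical function names and made-up timings.

```python
def slowdown(wait_time, run_time, reference_run_time):
    """Response time (wait + run) relative to a reference run time."""
    return (wait_time + run_time) / reference_run_time


def prefer_malleable(static_wait, static_run, reduced_wait, reduced_run):
    """Return True if starting on fewer resources gives a lower estimated slowdown.

    Both options are compared against the run time of the full static
    allocation (an assumption of this sketch).
    """
    static_sd = slowdown(static_wait, static_run, static_run)
    malleable_sd = slowdown(reduced_wait, reduced_run, static_run)
    return malleable_sd < static_sd


# Long queue: waiting 3600 s for a 1800 s run, versus starting now and
# running 3000 s on a reduced set of resources (made-up numbers).
long_queue = prefer_malleable(3600, 1800, 0, 3000)

# Short queue: waiting only 600 s makes the static allocation the better choice.
short_queue = prefer_malleable(600, 1800, 0, 3000)
```

In the first case the estimated slowdown drops from 3.0 to about 1.67, so the reduced allocation is preferred; in the second it would rise from about 1.33 to 1.67, so the job keeps its static allocation.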
Analysis of DNA sequence transformations on grids
The study of the evolution of species or organisms is essential for various biological applications. Evolution is typically studied at the molecular level by analyzing the mutations of the DNA sequences of organisms. Techniques have been developed for building phylogenetic or evolutionary trees for a set of sequences. Though phylogenetic trees capture the overall evolutionary relationships among the sequences, they do not reveal fine-level details of the evolution. In this work, we attempt to resolve various fine-level sequence transformation details associated with a phylogenetic tree using cellular automata. In particular, our work tries to determine the cellular automata rules for neighbor-dependent mutations of segments of DNA sequences. We also determine the number of time steps needed for the evolution of a progeny from an ancestor, and the unknown segments of the intermediate sequences in the phylogenetic tree. Due to the vast number of cellular automata rules, we have developed a grid system that performs parallel guided explorations of the rules on grid resources. We demonstrate our techniques by conducting experiments on a grid comprising machines in three countries and obtaining potentially useful statistics regarding evolution in three HIV sequences. In particular, our work is able to verify the phenomenon of neighbor-dependent mutations and finds that certain combinations of neighbor-dependent mutations, defined by a cellular automata rule, occur with greater than 90% probability. We also find the average number of time steps for mutations for some branches of the phylogenetic tree over a large number of possible transformations, with standard deviations less than 2.
HyPar: A divide-and-conquer model for hybrid CPU-GPU graph processing
Efficient processing of graph applications on heterogeneous CPU-GPU systems requires effectively harnessing the combined power of both the CPU and GPU devices. This paper presents HyPar, a divide-and-conquer model for processing graph applications on hybrid CPU-GPU systems. Our strategy partitions the given graph across the devices and performs simultaneous independent computations on both devices. The model provides a simple and generic API, supported by efficient runtime strategies for hybrid executions. The divide-and-conquer model is demonstrated with five graph applications. Experiments with these applications on a heterogeneous system show that our HyPar strategy provides performance equivalent to the state-of-the-art, optimized CPU-only and GPU-only implementations of the corresponding applications. When compared to the prevalent BSP approach for multi-device executions of graphs, our HyPar method yields 74%-92% average performance improvements.
An Efficient MPI_Allgather for Grids
Allgather is an important MPI collective communication. Most of the algorithms for allgather have been designed for homogeneous and tightly coupled systems. The existing algorithms for allgather on Grid systems do not efficiently utilize the bandwidths available on the slow wide-area links of the grid. In this paper, we present an algorithm for allgather on grids that efficiently utilizes wide-area bandwidths and is also wide-area optimal. Our algorithm also adapts to grid load dynamics, since it considers transient network characteristics when dividing the nodes into clusters. Our experiments on a real grid setup consisting of 3 sites show that our algorithm gives an average performance improvement of 52% over existing strategies.
SRS: A framework for developing malleable and migratable parallel applications for distributed systems
The ability to produce malleable parallel applications that can be stopped and reconfigured during execution can offer attractive benefits for both the system and the applications. The reconfiguration can consist of varying the parallelism of the applications, changing the data distributions during execution, or dynamically changing the software components involved in the application execution. In distributed and Grid computing systems, migration and reconfiguration of such malleable applications across distributed heterogeneous sites that do not share common file systems provide flexibility for scheduling and resource management. Existing reconfiguration systems do not support migration of parallel applications to distributed locations. In this paper, we discuss a framework for developing malleable and migratable MPI message-passing parallel applications for distributed systems. The framework includes a user-level checkpointing library called SRS and a runtime support system that manages the checkpointed data for distribution to distributed locations. Our experimental results indicate that parallel applications instrumented with the SRS library were able to achieve reconfigurability while incurring about 15-35% overhead.