Evaluation of a distributed numerical simulation optimization approach applied to aquifer remediation
In this paper we evaluate a distributed approach that uses numerical simulation and optimization techniques to automatically find remediation solutions for a hypothetical contaminated aquifer. The repeated execution of the numerical simulation model of the aquifer through the optimization cycles tends to be computationally expensive. To overcome this drawback, the numerical simulations are executed in parallel on a network of heterogeneous workstations. Because performance metrics for heterogeneous environments are not trivial, a new way of calculating speedup and efficiency for Bag-of-Tasks (BoT) applications is proposed. The performance of the parallel approach is evaluated.
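The proposed speedup and efficiency formulas are not given in this excerpt; as a point of reference, the sketch below shows one conventional way to account for heterogeneity, normalizing each workstation by a benchmark-derived relative power so that ideal speedup equals the aggregate relative power rather than the machine count. The function names and example numbers are illustrative, not the paper's definitions.

```python
# Illustrative sketch only: a conventional baseline for speedup/efficiency on
# heterogeneous workstations, where "ideal" speedup is the sum of the machines'
# benchmark-derived relative powers rather than the number of machines.

def heterogeneous_speedup(t_seq_ref: float, t_parallel: float) -> float:
    """Speedup relative to the serial time on a reference workstation."""
    return t_seq_ref / t_parallel

def heterogeneous_efficiency(speedup: float, relative_powers: list) -> float:
    """relative_powers[i] = workstation i's benchmark score divided by the
    reference machine's score, so ideal speedup is sum(relative_powers)."""
    return speedup / sum(relative_powers)

# Example: four workstations, two as fast as the reference, two half as fast.
powers = [1.0, 1.0, 0.5, 0.5]
s = heterogeneous_speedup(t_seq_ref=600.0, t_parallel=230.0)
print(f"speedup = {s:.2f}, efficiency = {heterogeneous_efficiency(s, powers):.2f}")
```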
MOON: MapReduce On Opportunistic eNvironments
Abstract—MapReduce offers a flexible programming model for processing and generating large data sets on dedicated resources, where only a small fraction of such resources are ever unavailable at any given time. In contrast, when MapReduce is run on volunteer computing systems, which opportunistically harness idle desktop computers via frameworks like Condor, performance suffers due to the volatility of the resources, in particular the high rate of node unavailability. Specifically, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. The adaptive task and data scheduling algorithms in MOON distinguish between (1) different types of MapReduce data and (2) different types of node outages in order to strategically place tasks and data on both volatile and dedicated nodes. Our tests demonstrate that MOON can deliver a 3-fold performance improvement over Hadoop in volatile volunteer computing environments.
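The abstract does not spell out MOON's placement rules, but the flavor of volatility-aware replication can be sketched as follows: pick a replication factor so that the chance of all volatile replicas being offline at once stays below a target, and anchor critical data with one copy on the dedicated tier. This is an illustrative sketch of the general idea, not MOON's actual algorithm; all names and thresholds are assumptions.

```python
import math

# Illustrative sketch (not MOON's actual algorithm): decide where and how often
# to replicate a block depending on how critical it is and how volatile the
# volunteer nodes are.

def replicas_needed(node_unavailability: float, target_unavailability: float) -> int:
    """Smallest r with p^r <= target, i.e. all r volatile replicas down at once."""
    return math.ceil(math.log(target_unavailability) / math.log(node_unavailability))

def place_block(critical: bool, p_volatile: float, target: float) -> dict:
    """Critical data gets a dedicated-node copy; the rest rides on volatile nodes."""
    r = replicas_needed(p_volatile, target)
    if critical:
        # One durable copy on the small dedicated tier, fewer volatile copies.
        return {"dedicated": 1, "volatile": max(r - 1, 1)}
    return {"dedicated": 0, "volatile": r}

# Example: 40% of volunteer nodes unavailable at any time, 0.1% loss target.
print(place_block(critical=True,  p_volatile=0.4, target=0.001))
print(place_block(critical=False, p_volatile=0.4, target=0.001))
```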
Runtime support for load balancing of parallel adaptive and irregular applications
Applications critical to today's engineering research often must make use of the increased memory and processing power of a parallel machine. While advances in architecture design are leading to more and more powerful parallel systems, the software tools needed to realize their full potential are in a much less advanced state. In particular, efficient, robust, and high-performance runtime support software is critical in the area of dynamic load balancing. While the load balancing of loosely synchronous codes, such as field solvers, has been studied extensively for the past 15 years, there exists a class of problems, known as asynchronous and highly adaptive, for which the dynamic load balancing problem remains open. As we discuss, the characteristics of this class of problems render compile-time or static analysis of little benefit and complicate the dynamic load balancing task immensely.

We make two contributions to this area of research. The first is the design and development of a runtime software toolkit, known as the Parallel Runtime Environment for Multi-computer Applications, or PREMA, which provides interprocessor communication, a global namespace, a framework for the implementation of customized scheduling policies, and several such policies that are prevalent in the load balancing literature. The PREMA system is designed to support coarse-grained domain decompositions with the goals of portability, flexibility, and maintainability in mind, so that developers will quickly feel comfortable incorporating it into existing codes and developing new codes that make use of its functionality. We demonstrate that the programming model and implementation are efficient and lead to the development of robust and high-performance applications.

Our second contribution is in the area of performance modeling. In order to make the most effective use of the PREMA runtime software, certain parameters governing its execution must be set off-line. Optimal values for these parameters may be determined through repeated executions of the target application; however, this is not always possible, particularly in large-scale environments and long-running applications. We present an analytic model that allows the user to quickly and inexpensively predict application performance and fine-tune applications built on the PREMA platform.
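PREMA's actual API is not shown here; purely as an illustration of the kind of customized scheduling policy such a framework hosts, the sketch below implements a simple diffusion-style policy that repeatedly moves one coarse-grained work unit from the most loaded processor to the least loaded one. The class names and threshold are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a pluggable load-balancing policy of the kind a
# runtime framework like PREMA hosts (its real API differs). A policy sees
# per-processor loads and decides where to move coarse-grained work units.

@dataclass
class Processor:
    rank: int
    tasks: list = field(default_factory=list)   # coarse-grained work units

    @property
    def load(self) -> int:
        return len(self.tasks)

def diffusion_step(procs: list, threshold: int = 1) -> None:
    """One round of a simple diffusion policy: the most loaded processor
    sheds one task to the least loaded one while the gap exceeds threshold."""
    while True:
        donor = max(procs, key=lambda p: p.load)
        receiver = min(procs, key=lambda p: p.load)
        if donor.load - receiver.load <= threshold:
            break
        receiver.tasks.append(donor.tasks.pop())

procs = [Processor(0, list(range(9))), Processor(1, [9]), Processor(2, [])]
diffusion_step(procs)
print([p.load for p in procs])   # roughly even: [4, 3, 3]
```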
A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker
Computational Grids, coupling geographically distributed resources such as PCs, workstations, clusters, and scientific instruments, have emerged as a next-generation computing platform for solving large-scale problems in science, engineering, and commerce. However, application development, resource management, and scheduling in these environments remain a complex undertaking. In this article, we discuss our efforts in developing a resource management system for scheduling computations on resources distributed across the world with varying quality of service. Our service-oriented grid computing system, called Nimrod-G, manages all operations associated with remote execution, including resource discovery, trading, and scheduling based on economic principles and user-defined quality-of-service requirements. The Nimrod-G resource broker is implemented by leveraging existing technologies such as Globus, and provides new services that are essential for constructing industrial-strength Grids. We discuss results of preliminary experiments on scheduling some parametric computations using the Nimrod-G resource broker on a worldwide grid testbed that spans five continents.
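As a rough illustration of scheduling based on economic principles with user-defined QoS, the sketch below assigns jobs to the cheapest resources that can still meet a deadline, stopping at the budget. It is a hedged sketch of the general deadline-and-budget-constrained idea, not the Nimrod-G broker's actual algorithm; resource names, prices, and rates are invented.

```python
from dataclasses import dataclass

# Sketch of a deadline-and-budget-constrained, cost-minimizing assignment in
# the spirit of economic scheduling (illustrative, not the broker's actual
# algorithm). Each resource advertises a price per job and a throughput; jobs
# go to the cheapest resource that still meets the deadline, within budget.

@dataclass
class Resource:
    name: str
    price_per_job: float     # grid-currency units
    jobs_per_hour: float
    assigned: int = 0

def schedule(n_jobs: int, deadline_h: float, budget: float,
             resources: list) -> float:
    spent = 0.0
    for res in sorted(resources, key=lambda r: r.price_per_job):  # cheapest first
        capacity = int(res.jobs_per_hour * deadline_h)            # fits deadline
        affordable = int((budget - spent) // res.price_per_job)
        res.assigned = min(n_jobs, capacity, affordable)
        n_jobs -= res.assigned
        spent += res.assigned * res.price_per_job
        if n_jobs == 0:
            break
    if n_jobs > 0:
        raise RuntimeError("deadline/budget infeasible for remaining jobs")
    return spent

pool = [Resource("cluster-A", 2.0, 40), Resource("cloud-B", 5.0, 100)]
cost = schedule(n_jobs=120, deadline_h=2.0, budget=400.0, resources=pool)
print(cost, [(r.name, r.assigned) for r in pool])   # 360.0, 80 + 40 jobs
```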
The GridWay Job Template Manager, a tool for parameter sweeping
Parameter sweeping is a widely used algorithmic technique in computational science. It is especially suited to high-throughput computing, since the jobs evaluating the parameter space are loosely coupled or independent.
A tool that integrates the modeling of a parameter study with the control of jobs in a distributed architecture is presented. Its main task is to facilitate the creation and deletion of job templates, which are the elements describing the jobs to be run. Extra functionality relies upon the GridWay Metascheduler, acting as the middleware layer for job submission and control. The tool supports features such as a multi-dimensional sweep space, wildcarding of parameters, functional evaluation of ranges, value skipping, and automatic indexing of job templates.
The use of this tool increases the reliability of a parameter sweep study thanks to the systematic bookkeeping of job templates and their respective job statuses. Furthermore, it simplifies the porting of the target application to the grid, reducing the required time and effort.
Comment: 26 pages, 1 figure
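The core of any such tool is enumerating a multi-dimensional sweep space while honoring value skipping and automatic indexing. The sketch below shows that core in a few lines; the function name, skip predicate, and job-template file naming are illustrative and do not reflect the GridWay tool's actual template syntax.

```python
import itertools

# Minimal sketch of a parameter-sweep enumerator: walk the Cartesian product
# of the parameter space, drop skipped combinations, and index the survivors.

def sweep(params, skip=lambda point: False):
    """Yield (index, point) for every non-skipped point of the sweep space."""
    keys = list(params)
    points = (dict(zip(keys, values))
              for values in itertools.product(*params.values()))
    for index, point in enumerate(p for p in points if not skip(p)):
        yield index, point

# Two-dimensional sweep with one combination skipped.
space = {"alpha": [0.1, 0.5, 1.0], "n": range(2)}
for index, point in sweep(space, skip=lambda p: p["alpha"] == 0.5 and p["n"] == 0):
    print(f"job_{index:03d}.jt", point)   # hypothetical template file name
```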
Support for flexible and transparent distributed computing
Modern distributed computing developed from the traditional supercomputing community, rooted firmly in the culture of batch management. The field has therefore been dominated by queuing-based resource managers and workflow-based job submission environments, where static resource demands must be determined and reserved prior to launching executions. This has made it difficult to support resource environments (e.g. Grid, Cloud) where both the available resources and the resource requirements of applications may be dynamic and unpredictable. This thesis introduces a flexible execution model in which compute capacity can be adapted to fit the needs of applications as they change during execution. Resource provision in this model is based on a fine-grained, self-service approach instead of the traditional one-time, system-level model. The thesis introduces a middleware-based Application Agent (AA) that provides a platform for applications to dynamically interact and negotiate resources with the underlying resource infrastructure.

We also consider the issue of transparency, i.e., hiding the provision and management of the distributed environment; this is key to attracting the public to the technology. The AA not only replaces the user-controlled process of preparing and executing an application with a transparent, software-controlled process, it also hides the complexity of selecting the right resources to ensure execution QoS. This service is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating with the flexible execution model. The AA constantly monitors utility-based feedback from the application during execution and is thus able to learn its behaviour and resource characteristics. This allows it to automatically compose the most efficient execution environment on the fly and satisfy any execution requirements defined by users. Two policies supervise the information learning and resource tuning in the OAC. The Utility Classification policy classifies hosts according to their historical performance contributions to the application; based on this classification, the AA chooses high-utility hosts and withdraws low-utility hosts to configure an optimum environment. The Desired Processing Power Estimation (DPPE) policy dynamically configures the execution environment according to the estimated total processing power needed to satisfy users' execution requirements.

Through this flexibility and transparency, a user is able to run a dynamic or conventional distributed application anywhere with optimised execution performance, without managing distributed resources. Building on the standalone model, the thesis further introduces a federated resource negotiation framework as a step towards an autonomous multi-user distributed computing world.
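As a rough sketch of the Utility Classification idea described above, the snippet below keeps an exponentially weighted utility score per host from application feedback and splits hosts into those to keep and those to withdraw. The decay factor, threshold, and feedback signal are assumptions, not the thesis's actual policy parameters.

```python
# Hypothetical sketch of a utility-classification step: blend per-host
# utility feedback into a running score, then keep high scorers and
# withdraw low scorers when (re)configuring the execution environment.

class UtilityTracker:
    def __init__(self, decay: float = 0.8):
        self.decay = decay
        self.score = {}   # host -> exponentially weighted utility

    def feedback(self, host: str, utility: float) -> None:
        """Blend the latest per-host utility report into its running score."""
        prev = self.score.get(host, utility)
        self.score[host] = self.decay * prev + (1 - self.decay) * utility

    def classify(self, threshold: float):
        keep = [h for h, s in self.score.items() if s >= threshold]
        drop = [h for h, s in self.score.items() if s < threshold]
        return keep, drop

tracker = UtilityTracker()
for host, u in [("n1", 0.9), ("n2", 0.2), ("n1", 0.8), ("n2", 0.3)]:
    tracker.feedback(host, u)
keep, drop = tracker.classify(threshold=0.5)
print("keep:", keep, "withdraw:", drop)   # keep: ['n1'] withdraw: ['n2']
```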
Efficient distributed load balancing for parallel algorithms
With the advent of massively parallel processing technology, exploiting the power offered by hundreds, or even thousands, of processors is anything but a trivial task. Computing with multi-processor, multi-core, or many-core systems adds a number of challenges related to the cooperation and communication of multiple processing units.
The uneven distribution of data among the various processors, i.e. load imbalance, represents one of the major problems in data-parallel applications. Without good load distribution strategies, we cannot achieve good speedup, and thus good efficiency.
Load balancing strategies can be classified in several ways, according to the methods used to balance workload. For instance, dynamic load balancing algorithms make scheduling decisions during execution and commonly result in better performance than static approaches, where task assignment is done before execution.
Even more important is the difference between centralized and distributed load balancing approaches. Although centralized algorithms have a wider view of the computation, and hence may exploit smarter balancing techniques, they expose global synchronization and communication bottlenecks involving the master node, which does not assure scalability with the number of processors.
This dissertation studies the impact of different load balancing strategies. In particular, one of the key observations driving our work is that distributed algorithms work better than centralized ones in the context of load balancing for multi-processors (and likewise for multi-cores and many-cores).
We first show a centralized approach to load balancing, then propose several distributed approaches for problems with different parallelization, workload distribution, and communication patterns. We try to combine several approaches efficiently to improve performance, in particular using predictive metrics to obtain per-task compute-time estimations, using adaptive subdivision, improving dynamic load balancing, and addressing distributed balancing schemes. The main challenge tackled in this thesis has been to combine all these approaches into new and efficient load balancing schemes.
We assess the proposed balancing techniques, from centralized approaches to distributed ones, in distinctive real-case scenarios: Mesh-like computation, Parallel Ray Tracing, and Agent-based Simulations. Moreover, we test our algorithms on parallel hardware such as clusters of workstations and multi-core processors, also exploiting SIMD vector instruction sets.
Finally, we conclude the thesis with several remarks about the impact of distributed techniques, the effect of the communication pattern and workload distribution, the use of cost estimation for adaptive partitioning, the trade-off between speed and accuracy in prediction-based approaches, the effectiveness of work stealing combined with sorting, and a non-trivial way to exploit hybrid CPU-GPU computations. [edited by author]
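One of the combinations the dissertation explores, prediction-based cost estimation feeding task assignment, can be illustrated with a minimal sketch: treat each task's measured time in the previous phase as its predicted cost for the next phase, sort tasks by that prediction, and assign them greedily to the least loaded processor. The simple predictor and LPT-style assignment below are stand-ins for the thesis's richer schemes.

```python
import heapq

# Sketch: previous-phase times serve as next-phase cost predictions; tasks are
# sorted by predicted cost and assigned longest-first to the least loaded
# processor (classic LPT). Names and the naive predictor are illustrative.

def predictive_lpt(prev_times: dict, n_procs: int) -> list:
    """Longest-predicted-task-first assignment onto n_procs processors."""
    tasks = sorted(prev_times, key=prev_times.get, reverse=True)
    heap = [(0.0, p) for p in range(n_procs)]     # (accumulated load, proc id)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_procs)]
    for t in tasks:
        load, p = heapq.heappop(heap)             # least loaded processor
        plan[p].append(t)
        heapq.heappush(heap, (load + prev_times[t], p))
    return plan

prev = {"t0": 8.0, "t1": 3.0, "t2": 3.0, "t3": 2.0, "t4": 6.0}
print(predictive_lpt(prev, n_procs=2))   # [['t0', 't2'], ['t4', 't1', 't3']]
```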
Experiences with Mesh-like computations using Prediction Binary Trees
In this paper we aim at exploiting the temporal coherence among successive phases of a computation in order to implement a load-balancing technique for mesh-like computations mapped onto a cluster of processors. A key concept on which the load-balancing scheme is built is the use of a Predictor component that is in charge of providing an estimate of the imbalance between successive phases. Using this information, our method partitions the computation into balanced tasks through the Prediction Binary Tree (PBT). At each new phase, the current PBT is updated by using each task's computing time from the previous phase as the cost estimate for the next phase. The PBT is designed to balance the load across tasks as well as to reduce dependency among processors for higher performance. Dependency is reduced by using rectangular tiles of the mesh of almost-square shape (i.e., one dimension is at most twice the other). By reducing dependency, one can reduce inter-processor communication or exploit local dependencies among tasks (such as data locality). Furthermore, we also provide two heuristics that take advantage of data locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows good scalability and improves performance on both cheaper commodity clusters and high-performance clusters with low-latency networks. We report various measurements showing that task granularity is a key point for the performance of our decomposition/mapping strategy.
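A simplified sketch of the tile-splitting idea: recursively split a rectangular region along its longer side, stopping when the desired number of tiles is reached, so the resulting tiles stay almost square. For brevity this sketch splits by area, which corresponds to the uniform-cost case; the actual PBT drives its split points with the per-task timings measured in the previous phase.

```python
# Simplified, uniform-cost sketch of PBT-style tiling: split the longer side
# of each rectangle so both halves get a proportional share of the work,
# recursing until the requested number of tiles is produced.

def split(x0, y0, x1, y1, tiles):
    """Return `tiles` rectangles covering [x0,x1) x [y0,y1)."""
    if tiles == 1:
        return [(x0, y0, x1, y1)]
    left = tiles // 2                        # work share for the first half
    if x1 - x0 >= y1 - y0:                   # split the longer side so tiles
        xm = x0 + (x1 - x0) * left // tiles  # stay almost square
        return (split(x0, y0, xm, y1, left)
                + split(xm, y0, x1, y1, tiles - left))
    ym = y0 + (y1 - y0) * left // tiles
    return (split(x0, y0, x1, ym, left)
            + split(x0, ym, x1, y1, tiles - left))

# Example: a 1024x512 mesh cut into 8 square 256x256 tiles.
for tile in split(0, 0, 1024, 512, tiles=8):
    print(tile)
```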