1,593 research outputs found

    Evaluation of a distributed numerical simulation optimization approach applied to aquifer remediation

    Get PDF
    AbstractIn this paper we evaluate a distributed approach which uses numerical simulation and optimization techniques to automatically find remediation solutions to a hypothetical contaminated aquifer. The repeated execution of the numerical simulation model of the aquifer through the optimization cycles tends to be computationally expensive. To overcome this drawback, the numerical simulations are executed in parallel using a network of heterogeneous workstations. Performance metrics for heterogeneous environments are not trivial; a new way of calculating speedup and efficiency for Bag-of-Tasks (BoT) applications is proposed. The performance of the parallel approach is evaluated

    MOON: MapReduce On Opportunistic eNvironments

    Get PDF
    Abstract—MapReduce offers a flexible programming model for processing and generating large data sets on dedicated resources, where only a small fraction of such resources are every unavailable at any given time. In contrast, when MapReduce is run on volunteer computing systems, which opportunistically harness idle desktop computers via frameworks like Condor, it results in poor performance due to the volatility of the resources, in particular, the high rate of node unavailability. Specifically, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. The adaptive task and data scheduling algorithms in MOON distinguish between (1) different types of MapReduce data and (2) different types of node outages in order to strategically place tasks and data on both volatile and dedicated nodes. Our tests demonstrate that MOON can deliver a 3-fold performance improvement to Hadoop in volatile, volunteer computing environments

    Runtime support for load balancing of parallel adaptive and irregular applications

    Get PDF
    Applications critical to today\u27s engineering research often must make use of the increased memory and processing power of a parallel machine. While advances in architecture design are leading to more and more powerful parallel systems, the software tools needed to realize their full potential are in a much less advanced state. In particular, efficient, robust, and high-performance runtime support software is critical in the area of dynamic load balancing. While the load balancing of loosely synchronous codes, such as field solvers, has been studied extensively for the past 15 years, there exists a class of problems, known as asynchronous and highly adaptive , for which the dynamic load balancing problem remains open. as we discuss, characteristics of this class of problems render compile-time or static analysis of little benefit, and complicate the dynamic load balancing task immensely.;We make two contributions to this area of research. The first is the design and development of a runtime software toolkit, known as the Parallel Runtime Environment for Multi-computer Applications, or PREMA, which provides interprocessor communication, a global namespace, a framework for the implementation of customized scheduling policies, and several such policies which are prevalent in the load balancing literature. The PREMA system is designed to support coarse-grained domain decompositions with the goals of portability, flexibility, and maintainability in mind, so that developers will quickly feel comfortable incorporating it into existing codes and developing new codes which make use of its functionality. We demonstrate that the programming model and implementation are efficient and lead to the development of robust and high-performance applications.;Our second contribution is in the area of performance modeling. In order to make the most effective use of the PREMA runtime software, certain parameters governing its execution must be set off-line. Optimal values for these parameters may be determined through repeated executions of the target application; however, this is not always possible, particularly in large-scale environments and long-running applications. We present an analytic model that allows the user to quickly and inexpensively predict application performance and fine-tune applications built on the PREMA platform

    A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Brok

    Full text link
    Computational Grids, coupling geographically distributed resources such as PCs, workstations, clusters, and scientific instruments, have emerged as a next generation computing platform for solving large-scale problems in science, engineering, and commerce. However, application development, resource management, and scheduling in these environments continue to be a complex undertaking. In this article, we discuss our efforts in developing a resource management system for scheduling computations on resources distributed across the world with varying quality of service. Our service-oriented grid computing system called Nimrod-G manages all operations associated with remote execution including resource discovery, trading, scheduling based on economic principles and a user defined quality of service requirement. The Nimrod-G resource broker is implemented by leveraging existing technologies such as Globus, and provides new services that are essential for constructing industrial-strength Grids. We discuss results of preliminary experiments on scheduling some parametric computations using the Nimrod-G resource broker on a world-wide grid testbed that spans five continents

    The Grid[Way] Job Template Manager, a tool for parameter sweeping

    Full text link
    Parameter sweeping is a widely used algorithmic technique in computational science. It is specially suited for high-throughput computing since the jobs evaluating the parameter space are loosely coupled or independent. A tool that integrates the modeling of a parameter study with the control of jobs in a distributed architecture is presented. The main task is to facilitate the creation and deletion of job templates, which are the elements describing the jobs to be run. Extra functionality relies upon the GridWay Metascheduler, acting as the middleware layer for job submission and control. It supports interesting features like multi-dimensional sweeping space, wildcarding of parameters, functional evaluation of ranges, value-skipping and job template automatic indexation. The use of this tool increases the reliability of the parameter sweep study thanks to the systematic bookkeping of job templates and respective job statuses. Furthermore, it simplifies the porting of the target application to the grid reducing the required amount of time and effort.Comment: 26 pages, 1 figure

    Support for flexible and transparent distributed computing

    Get PDF
    Modern distributed computing developed from the traditional supercomputing community rooted firmly in the culture of batch management. Therefore, the field has been dominated by queuing-based resource managers and work flow based job submission environments where static resource demands needed be determined and reserved prior to launching executions. This has made it difficult to support resource environments (e.g. Grid, Cloud) where the available resources as well as the resource requirements of applications may be both dynamic and unpredictable. This thesis introduces a flexible execution model where the compute capacity can be adapted to fit the needs of applications as they change during execution. Resource provision in this model is based on a fine-grained, self-service approach instead of the traditional one-time, system-level model. The thesis introduces a middleware based Application Agent (AA) that provides a platform for the applications to dynamically interact and negotiate resources with the underlying resource infrastructure. We also consider the issue of transparency, i.e., hiding the provision and management of the distributed environment. This is the key to attracting public to use the technology. The AA not only replaces user-controlled process of preparing and executing an application with a transparent software-controlled process, it also hides the complexity of selecting right resources to ensure execution QoS. This service is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating with the flexible execution model. The AA constantly monitors utility-based feedbacks from the application during execution and thus is able to learn its behaviour and resource characteristics. This allows it to automatically compose the most efficient execution environment on the fly and satisfy any execution requirements defined by users. Two policies are introduced to supervise the information learning and resource tuning in the OAC. The Utility Classification policy classifies hosts according to their historical performance contributions to the application. According to this classification, the AA chooses high utility hosts and withdraws low utility hosts to configure an optimum environment. The Desired Processing Power Estimation (DPPE) policy dynamically configures the execution environment according to the estimated desired total processing power needed to satisfy users’ execution requirements. Through the introducing of flexibility and transparency, a user is able to run a dynamic/normal distributed application anywhere with optimised execution performance, without managing distributed resources. Based on the standalone model, the thesis further introduces a federated resource negotiation framework as a step forward towards an autonomous multi-user distributed computing world

    Efficient distributed load balancing for parallel algorithms

    Get PDF
    2009 - 2010With the advent of massive parallel processing technology, exploiting the power offered by hundreds, or even thousands of processors is all but a trivial task. Computing by using multi-processor, multi-core or many-core adds a number of additional challenges related to the cooperation and communication of multiple processing units. The uneven distribution of data among the various processors, i.e. the load imbalance, represents one of the major problems in data parallel applications. Without good load distribution strategies, we cannot reach good speedup, thus good efficiency. Load balancing strategies can be classified in several ways, according to the methods used to balance workload. For instance, dynamic load balancing algorithms make scheduling decisions during the execution and commonly results in better performance compared to static approaches, where task assignment is done before the execution. Even more important is the difference between centralized and distributed load balancing approaches. In fact, despite that centralized algorithms have a wider vision of the computation, hence may exploit smarter balancing techniques, they expose global synchronization and communication bottlenecks involving the master node. This definitely does not assure scalability with the number of processors. This dissertation studies the impact of different load balancing strategies. In particular, one of the key observations driving our work is that distributed algorithms work better than centralized ones in the context of load balancing for multi-processors (alike for multi-cores and many-cores as well). We first show a centralized approach for load balancing, then we propose several distributed approaches for problems having different parallelization, workload distribution and communication pattern. We try to efficiently combine several approaches to improve performance, in particular using predictive metrics to obtain a per task compute-time estimation, using adaptive subdivision, improving dynamic load balancing and addressing distributed balancing schemas. The main challenge tackled on this thesis has been to combine all these approaches together in new and efficient load balancing schemas. We assess the proposed balancing techniques, starting from centralized approaches to distributed ones, in distinctive real case scenarios: Mesh-like computation, Parallel Ray Tracing, and Agent-based Simulations. Moreover, we test our algorithms with parallel hardware such has cluster of workstations, multi-core processors and exploiting SIMD vectorial instruction set. Finally, we conclude the thesis with several remarks, about the impact of distributed techniques, the effect of the communication pattern and workload distribution, the use of cost estimation for adaptive partitioning, the trade-off fast versus accuracy in prediction-based approaches, the effectiveness of work stealing combined with sorting, and a non-trivial way to exploit hybrid CPUGPU computations. [edited by author]IX n.s

    Experiences with Mesh-like computations using Prediction Binary Trees

    Get PDF
    In this paper we aim at exploiting the temporal coherence among successive phases of a computation, in order to implement a load-balancing technique in mesh-like computations to be mapped on a cluster of processors. A key concept, on which the load balancing schema is built on, is the use of a Predictor component that is in charge of providing an estimation of the unbalancing between successive phases. By using this information, our method partitions the computation in balanced tasks through the Prediction Binary Tree (PBT). At each new phase, current PBT is updated by using previous phase computing time for each task as next phase's cost estimate. The PBT is designed so that it balances the load across the tasks as well as reduces {\em dependency} among processors for higher performances. Reducing dependency is obtained by using rectangular tiles of the mesh, of almost-square shape (i. e. one dimension is at most twice the other). By reducing dependency, one can reduce inter-processors communication or exploit local dependencies among tasks (such as data locality). Furthermore, we also provide two heuristics which take advantage of data-locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows a good scalability, and improves performance in both cheaper commodity cluster and high performance clusters with low latency networks. We report different measurements showing that tasks granularity is a key point for the performances of our decomposition/mapping strategy
    corecore