2,095 research outputs found

    Space and time adaptation for parallel applications via data over-partitioning.

    Get PDF
    Adaptive resource allocation is a new feature to run parallel applications. It is used to obtain better space and time sharing according to current workload, to schedule around obstacles through reservation and to cope with lack of accurate predictability on heterogeneous resources. The implementation of resource adaptation is potentially very expensive if total remapping or partitioning from scratch has to be performed. The existing popular run-time systems include AMPI and Dome. AMPI, which uses huge numbers of threads in MPI process to implement resource adaptation, suffers from frequent thread switches and loss of cache locality; and Dome, an object-based migration environment, suffers from lack of general language supports. When resource adaptation occurs, load balancing techniques are used to allocate the workload fairly across processors, so that each processor takes roughly the same time to execute the processes assigned to it, and that every processor has the same workload to obtain the best performance and maximize resource utilization. This thesis proposes a novel approach---Adaptive Time/space sharing via Over-Partitioning (ATOP)---to implement resource adaptation with better performance in terms of time overhead. Total workload is represented by a data graph. ATOP performs over-partitioning on the graph to create a certain number of workload pieces, or partitions, while processing partitions per processor as one data collection in a single MPI process. Typically, the number of partitions is set equal to the number of processors potentially allocated. This approach is feasible for the applications using 2n processors. In the cases where our over-partitioning approach does not perform well, or non-fitting numbers of resources need to be chosen, ATOP still provides the alternative option to repartition from scratch. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .H36. Source: Masters Abstracts International, Volume: 43-03, page: 0876. Adviser: A. C. Sodan. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    Achieving High Speed CFD simulations: Optimization, Parallelization, and FPGA Acceleration for the unstructured DLR TAU Code

    Get PDF
    Today, large scale parallel simulations are fundamental tools to handle complex problems. The number of processors in current computation platforms has been recently increased and therefore it is necessary to optimize the application performance and to enhance the scalability of massively-parallel systems. In addition, new heterogeneous architectures, combining conventional processors with specific hardware, like FPGAs, to accelerate the most time consuming functions are considered as a strong alternative to boost the performance. In this paper, the performance of the DLR TAU code is analyzed and optimized. The improvement of the code efficiency is addressed through three key activities: Optimization, parallelization and hardware acceleration. At first, a profiling analysis of the most time-consuming processes of the Reynolds Averaged Navier Stokes flow solver on a three-dimensional unstructured mesh is performed. Then, a study of the code scalability with new partitioning algorithms are tested to show the most suitable partitioning algorithms for the selected applications. Finally, a feasibility study on the application of FPGAs and GPUs for the hardware acceleration of CFD simulations is presented

    Hypergraph Partitioning in the Cloud

    Get PDF
    The thesis investigates the partitioning and load balancing problem which has many applications in High Performance Computing (HPC). The application to be partitioned is described with a graph or hypergraph. The latter is of greater interest as hypergraphs, compared to graphs, have a more general structure and can be used to model more complex relationships between groups of objects such as non-symmetric dependencies. Optimal graph and hypergraph partitioning is known to be NP-Hard but good polynomial time heuristic algorithms have been proposed. In this thesis, we propose two multi-level hypergraph partitioning algorithms. The algorithms are based on rough set clustering techniques. The first algorithm, which is a serial algorithm, obtains high quality partitionings and improves the partitioning cut by up to 71\% compared to the state-of-the-art serial hypergraph partitioning algorithms. Furthermore, the capacity of serial algorithms is limited due to the rapid growth of problem sizes of distributed applications. Consequently, we also propose a parallel hypergraph partitioning algorithm. Considering the generality of the hypergraph model, designing a parallel algorithm is difficult and the available parallel hypergraph algorithms offer less scalability compared to their graph counterparts. The issue is twofold: the parallel algorithm and the complexity of the hypergraph structure. Our parallel algorithm provides a trade-off between global and local vertex clustering decisions. By employing novel techniques and approaches, our algorithm achieves better scalability than the state-of-the-art parallel hypergraph partitioner in the Zoltan tool on a set of benchmarks, especially ones with irregular structure. Furthermore, recent advances in cloud computing and the services they provide have led to a trend in moving HPC and large scale distributed applications into the cloud. Despite its advantages, some aspects of the cloud, such as limited network resources, present a challenge to running communication-intensive applications and make them non-scalable in the cloud. While hypergraph partitioning is proposed as a solution for decreasing the communication overhead within parallel distributed applications, it can also offer advantages for running these applications in the cloud. The partitioning is usually done as a pre-processing step before running the parallel application. As parallel hypergraph partitioning itself is a communication-intensive operation, running it in the cloud is hard and suffers from poor scalability. The thesis also investigates the scalability of parallel hypergraph partitioning algorithms in the cloud, the challenges they present, and proposes solutions to improve the cost/performance ratio for running the partitioning problem in the cloud. Our algorithms are implemented as a new hypergraph partitioning package within Zoltan. It is an open source Linux-based toolkit for parallel partitioning, load balancing and data-management designed at Sandia National Labs. The algorithms are known as FEHG and PFEHG algorithms

    Building science gateways by utilizing the generic WS-PGRADE/gUSE workflow system

    Get PDF

    Time adaptation for parallel applications in unbalanced time sharing environment

    Get PDF
    Time adaptation is very significant for parallel jobs running on a parallel centralized or distributed multiprocessor machine. The turnaround time of an individual job depends on the turnaround time of each of its processes. Dynamic load balancing for unbalanced time sharing environment helps to equally distribute the work load among the available resources, so that all processes of a single job end almost at the same time, thus minimizing the turnaround time and maximizing the resource utilization. In this thesis we propose and implement an approach that helps parallel applications to use our library so that it can adapt in time dimension (if running in a time sharing environment) without changing the space allocation. This approach provides an interface between application, monitoring information, the job scheduler and a cost model that considers application, system and load-balancing information. This interface allows binding of different adaptation approaches for synchronous adaptation and semi-static remapping. We also determined job types for what this approach is suitable and at the end we present results from our test run on a 16-node cluster with synthetic MPI programs and a time adaptation approach, demonstrating the gain from our approach. In this work, we make extension of existing ATOP [11] work. We directly use their over partitioning strategy. But unlike ATOP, applications can use our adaptation library and adapt dynamically. We also adopted the dynamic directory concept used in SCOJO [8]. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .A74. Source: Masters Abstracts International, Volume: 44-03, page: 1393. Thesis (M.Sc.)--University of Windsor (Canada), 2005
    corecore