441 research outputs found

    Improving utilization of heterogeneous clusters

    Datacenters often aggregate sets of nodes with different capabilities, leading to sub-optimal resource utilization. One of the best ways of improving utilization is to balance the load by taking into account the heterogeneity of these clusters. This article presents a novel way of expressing computational capacity, better suited to heterogeneous clusters, and also advocates task migration in order to further improve utilization. The experimental evaluation shows that both proposals are advantageous and allow improving the utilization of heterogeneous clusters and reducing the makespan to 16.7% and 17.1%, respectively. This work has been supported by the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and TIN2016-81840-REDT (CAPAP-H6 network) and the European HiPEAC Network of Excellence.
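
    As a rough illustration of why heterogeneity-aware balancing matters, the sketch below compares a heterogeneity-blind round-robin placement with a greedy earliest-finish-time placement. It is not the capacity metric or the migration scheme proposed in the article: the `speeds` values, the task sizes, and all function names are assumptions made up for this example.

```python
def makespan(assignment, speeds):
    """Time at which the slowest-finishing node completes its tasks."""
    return max(sum(tasks) / speed for tasks, speed in zip(assignment, speeds))


def round_robin(task_sizes, speeds):
    """Heterogeneity-blind baseline: deal tasks out in turn."""
    assignment = [[] for _ in speeds]
    for i, size in enumerate(task_sizes):
        assignment[i % len(speeds)].append(size)
    return assignment


def earliest_finish(task_sizes, speeds):
    """Greedy placement: largest task first, on the node that would finish it soonest."""
    assignment = [[] for _ in speeds]
    load = [0.0] * len(speeds)  # work units already placed on each node
    for size in sorted(task_sizes, reverse=True):
        best = min(range(len(speeds)), key=lambda n: (load[n] + size) / speeds[n])
        assignment[best].append(size)
        load[best] += size
    return assignment


# Hypothetical cluster: two slow nodes and one node that is four times faster.
speeds = [1.0, 1.0, 4.0]
tasks = [4.0] * 6

blind = makespan(round_robin(tasks, speeds), speeds)
aware = makespan(earliest_finish(tasks, speeds), speeds)
print(blind, aware)  # 8.0 4.0
```

    In this toy setting the blind placement leaves the fast node idle for most of the run, while the speed-aware placement halves the makespan.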

    HARP: A Dynamic Inertial Spectral Partitioner

    Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such as recursive spectral bisection (RSB), have proven effective in generating high-quality partitions of realistically-sized meshes. The major problem which hindered their widespread use was their long execution times. This paper presents a new inertial spectral partitioner, called HARP. The main objective of the proposed approach is to quickly partition the meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigenvectors of the original mesh. Results for various meshes ranging in size from 1000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2, which is several times faster than other spectral partitioners, while maintaining the solution quality of the proven RSB method. A parallel MPI version of HARP has also been implemented on IBM SP2 and Cray T3E. Parallel HARP, running on a 64-processor SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.
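
    For readers unfamiliar with spectral partitioning, the following is a minimal sketch of a single spectral bisection step, the building block of RSB. It is not HARP itself: the inertial projection step is omitted, and a simple shifted power iteration stands in for the Lanczos-type eigensolver a production partitioner would use. The function names and the small test graph are assumptions for illustration, and the graph is assumed to be connected with at least two vertices.

```python
def fiedler_vector(n, edges, iterations=200):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    shift = 2 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    def apply_shifted(x):
        # y = (shift * I - L) x with L = D - A, so the Fiedler vector becomes
        # the dominant direction once the constant vector is removed.
        y = [(shift - degree[i]) * x[i] for i in range(n)]
        for u, v in edges:
            y[u] += x[v]
            y[v] += x[u]
        return y

    x = [float(i) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant eigenvector
        x = apply_shifted(x)
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
    return x


def spectral_bisect(n, edges):
    """Split vertices into two equal halves by their Fiedler-vector value."""
    f = fiedler_vector(n, edges)
    order = sorted(range(n), key=lambda i: f[i])
    half = n // 2
    return set(order[:half]), set(order[half:])


def cut_size(edges, part):
    """Number of edges with exactly one endpoint in `part`."""
    return sum((u in part) != (v in part) for u, v in edges)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
left, right = spectral_bisect(6, edges)
print(sorted(left), sorted(right), cut_size(edges, left))
```

    On this graph the bisection separates the two triangles and cuts only the bridge edge. RSB would then apply the same step recursively to each half.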

    Parallel Architectures for Planetary Exploration Requirements (PAPER)

    The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration, with particular reference to the research needs of NASA/LaRC (NASA Langley Research Center) for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concept appears to be a promising candidate, except that more detailed information is required. The feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified.

    Order Acceptance and Scheduling: A Taxonomy and Review

    Over the past 20 years, the topic of order acceptance has attracted considerable attention from those who study scheduling and those who practice it. In a firm that strives to align its functions so that profit is maximized, the coordination of capacity with demand may require that business sometimes be turned away. In particular, there is a trade-off between the revenue brought in by a particular order and all of its associated processing costs. The present study focuses on the body of research that approaches this trade-off by considering two decisions: which orders to accept for processing, and how to schedule them. This paper presents a taxonomy and a review of this literature, catalogs its contributions, and suggests opportunities for future research in this area.

    Large-scale parallelism for constraint-based local search: the costas array case study

    We present the parallel implementation of a constraint-based Local Search algorithm and investigate its performance on several hardware platforms with several hundreds or thousands of cores. We chose as the basis for these experiments the Adaptive Search method, an efficient sequential Local Search method for Constraint Satisfaction Problems (CSP). After preliminary experiments on some CSPLib benchmarks, we detail the modeling and solving of a hard combinatorial problem related to radar and sonar applications: the Costas Array Problem. Performance evaluation on some classical CSP benchmarks shows that speedups are very good for a few tens of cores, and good up to a few hundreds of cores. However, for a hard combinatorial search problem such as the Costas Array Problem, performance evaluation of the sequential version shows results outperforming previous Local Search implementations, while the parallel version shows nearly linear speedups up to 8,192 cores. The proposed parallel scheme is simple and based on independent multi-walks with no communication between processes during search. We also investigated a cooperative multi-walk scheme where processes share simple information, but this scheme does not seem to improve performance.
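
    The independent multi-walk idea can be sketched as follows. This is only an illustration under stated assumptions: the local search is a plain swap-based hill climber, not the Adaptive Search method used in the paper, and the walks are run one after another in a loop to emulate what would be separate processes on separate cores. All function names and parameters are hypothetical.

```python
import random


def costas_cost(perm):
    """Count repeated difference values per row distance; 0 means a Costas array."""
    n = len(perm)
    cost = 0
    for d in range(1, n):
        seen = set()
        for i in range(n - d):
            diff = perm[i + d] - perm[i]
            if diff in seen:
                cost += 1
            seen.add(diff)
    return cost


def is_costas(perm):
    """True if perm is a permutation of 0..n-1 with all displacement vectors distinct."""
    return sorted(perm) == list(range(len(perm))) and costas_cost(perm) == 0


def walk(n, seed, max_steps=20000):
    """One independent walk: random start, then swaps that do not worsen the cost."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    cost = costas_cost(perm)
    for _ in range(max_steps):
        if cost == 0:
            return perm
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        new_cost = costas_cost(perm)
        if new_cost <= cost:
            cost = new_cost  # keep improving or sideways moves
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo a worsening swap
    return perm if cost == 0 else None


def multi_walk(n, walkers):
    """Emulate independent walks that differ only in their seed; no information is shared."""
    for seed in range(walkers):
        result = walk(n, seed)
        if result is not None:
            return result
    return None


solution = multi_walk(6, walkers=8)
print(solution)
```

    Because the walks never communicate, running them on separate cores requires no synchronization beyond stopping once any walk reports a solution, which is consistent with the simplicity the abstract attributes to the scheme.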

    A high-performance computing framework for Monte Carlo ocean color simulations

    This paper presents a high-performance computing (HPC) framework for Monte Carlo (MC) simulations in the ocean color (OC) application domain. The objective is to optimize a parallel MC radiative transfer code named MOX, developed by the authors to create a virtual marine environment for investigating the quality of OC data products derived from in situ measurements of in-water radiometric quantities. A consolidated set of solutions for performance modeling, prediction, and optimization is implemented to enhance the efficiency of MC OC simulations on HPC run-time infrastructures. HPC, machine learning, and adaptive computing techniques are applied taking into account a clear separation and systematic treatment of accuracy and precision requirements for large-scale MC OC simulations. The added value of the work is the integration of computational methods and tools for MC OC simulations in the form of an HPC-oriented problem-solving environment specifically tailored to investigate data acquisition and reduction methods for OC field measurements. Study results highlight the benefit of close collaboration between HPC and application domain researchers to improve the efficiency and flexibility of computer simulations in the marine optics application domain. © 2016 The Authors. Concurrency and Computation: Practice and Experience Published by John Wiley & Sons Ltd. Portuguese Foundation for Science and Technology (FCT/MEC) [PEst-OE/EEI/UI0527/2011]; ESA [22576/09/I-OL, ARG/003-025/1406/CIMA]; NOVA LINCS [UID/CEC/04516/2013].

    Optimizing work stealing algorithms with scheduling constraints

    The fork-join paradigm of concurrent expression has gained popularity in conjunction with work-stealing schedulers. Random work-stealing schedulers have been shown to effectively perform dynamic load balancing, yielding provably efficient schedules and space bounds on shared-memory architectures with uniform memory models. However, the advent of hierarchical, non-uniform multicore systems and large-scale distributed-memory architectures has reduced the efficacy of these scheduling policies. Furthermore, random work-stealing schedulers do not exploit persistence within iterative, scientific applications. In this thesis, we prove several properties of work-stealing schedulers that enable online tracing of the tasks with very low overhead. We then describe new scheduling policies that use online schedule introspection to understand scheduler placement and thus improve the performance on NUMA and distributed-memory architectures. Finally, by incorporating an inclusive data effect system into fork-join programs with schedule placement knowledge, we show how we can transform a fork-join program to significantly improve locality.
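
    The baseline that the thesis builds on, random work stealing over a fork-join computation, can be sketched with a small single-threaded simulation. This is an assumption-laden toy and not the thesis's tracing or constrained scheduling policies: workers take turns in a loop rather than running concurrently, and the task (summing a range of integers), the grain size, and all names are invented for illustration.

```python
import random
from collections import deque


def run_work_stealing(n_items, n_workers=4, grain=8, seed=0):
    """Simulate random work stealing on a divide-and-conquer sum of range(n_items)."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(n_workers)]
    deques[0].append((0, n_items))  # the root task starts on worker 0
    total, steals = 0, 0
    while any(deques):  # some worker still holds a pending task
        for w in range(n_workers):
            if deques[w]:
                lo, hi = deques[w].pop()  # owner takes its newest task (bottom)
            else:
                victim = rng.randrange(n_workers)
                if victim == w or not deques[victim]:
                    continue  # failed steal attempt; try again next round
                lo, hi = deques[victim].popleft()  # thief takes the oldest task (top)
                steals += 1
            if hi - lo <= grain:
                total += sum(range(lo, hi))  # leaf task: do the actual work
            else:
                mid = (lo + hi) // 2  # fork: split into two subtasks
                deques[w].append((lo, mid))
                deques[w].append((mid, hi))
    return total, steals


total, steals = run_work_stealing(1000)
print(total, steals)
```

    Victims are chosen uniformly at random here, which is exactly the locality-blind behaviour the abstract identifies as a weakness on NUMA and distributed-memory machines.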