7 research outputs found

    Enhancing Resource Management through Prediction-based Policies

    Full text link
    Task-based programming models are emerging as a promising alternative to make the most of multi-/many-core systems. These programming models rely on runtime systems, and their goal is to improve application performance by properly scheduling application tasks to cores. Additionally, these runtime systems offer policies to cope with application phases that lack in parallelism to fill all cores. However, these policies are usually static and favor either performance or energy efficiency. In this paper, we have extended a task-based runtime system with a lightweight monitoring and prediction infrastructure that dynamically predicts the optimal number of cores required for each application phase, thus improving both performance and energy efficiency. Through the execution of several benchmarks in multi-/many-core systems, we show that our prediction-based policies have competitive performance while improving energy efficiency when compared to state of the art policies.Comment: Postprint submitted and published at Euro-Par2020: International European Conference on Parallel and Distributed Computing (Springer) (https://link.springer.com/chapter/10.1007%2F978-3-030-57675-2_31

    A power-aware, self-adaptive macro data flow framework

    Get PDF
    The dataflow programming model has been extensively used as an effective solution to implement efficient parallel programming frameworks. However, the amount of resources allocated to the runtime support is usually fixed once by the programmer or the runtime, and kept static during the entire execution. While there are cases where such a static choice may be appropriate, other scenarios may require to dynamically change the parallelism degree during the application execution. In this paper we propose an algorithm for multicore shared memory platforms, that dynamically selects the optimal number of cores to be used as well as their clock frequency according to either the workload pressure or to explicit user requirements. We implement the algorithm for both structured and unstructured parallel applications and we validate our proposal over three real applications, showing that it is able to save a significant amount of power, while not impairing the performance and not requiring additional effort from the application programmer

    Adaptive, efficient, parallel execution of parallel programs

    Get PDF
    Abstract Future multicore processors will be heterogeneous, be increasingly less reliable, and operate in dynamically changing operating conditions. Such environments will result in a constantly varying pool of hardware resources which can greatly complicate the task of efficiently exposing a program's parallelism onto these resources. Coupled with this uncertainty is the diverse set of efficiency metrics that users may desire. This paper proposes Varuna, a system that dynamically, continuously, rapidly and transparently adapts a program's parallelism to best match the instantaneous capabilities of the hardware resources while satisfying different efficiency metrics. Varuna is applicable to both multithreaded and task-based programs and can be seamlessly inserted between the program and the operating system without needing to change the source code of either. We demonstrate Varuna's effectiveness in diverse execution environments using unaltered C/C++ parallel programs from various benchmark suites. Regardless of the execution environment, Varuna always outperformed the state-of-the-art approaches for the efficiency metrics considered

    Holistic run-time parallelism management for time and energy efficiency

    No full text

    Adaptive parallelism mapping in dynamic environments using machine learning

    Get PDF
    Modern day hardware platforms are parallel and diverse, ranging from mobiles to data centers. Mainstream parallel applications execute in the same system competing for resources. This resource contention may lead to a drastic degradation in a program’s performance. In addition, the execution environment composed of workloads and hardware resources, is dynamic and unpredictable. Efficient matching of program parallelism to machine parallelism under uncertainty is hard. The mapping policies that determine the optimal allocation of work to threads should anticipate these variations. This thesis proposes solutions to the mapping of parallel programs in dynamic environments. It employs predictive modelling techniques to determine the best degree of parallelism. Firstly, this thesis proposes a machine learning-based model to determine the optimal thread number for a target program co-executing with varying workloads. For this purpose, this offline trained model uses static code features and dynamic runtime information as input. Next, this thesis proposes a novel solution to monitor the proposed offline model and adjust its decisions in response to the environment changes. It develops a second predictive model for determining how the future environment should be, if the current thread prediction was optimal. Depending on how close this prediction was to the actual environment, the predicted thread numbers are adjusted. Furthermore, considering the multitude of potential execution scenarios where no single policy is best suited in all cases, this work proposes an approach based on the idea of mixture of experts. It considers a number of offline experts or mapping policies, each specialized for a given scenario, and learns online the best expert that is optimal for the current execution. When evaluated on highly dynamic executions, these solutions are proven to surpass default, state-of-art adaptive and analytic approaches
    corecore