2 research outputs found

    A Software Cache Autotuning Strategy for Dataflow Computing with UPC++ DepSpawn

    This is the accepted version of the following article: B. B. Fraguela, D. Andrade. A software cache autotuning strategy for dataflow computing with UPC++ DepSpawn. Computational and Mathematical Methods, 3(6), e1148, November 2021, which has been published in final form at http://dx.doi.org/10.1002/cmm4.1148. This article may be used for noncommercial purposes in accordance with the Wiley Self-Archiving Policy [http://www.wileyauthors.com/self-archiving].

    [Abstract] Dataflow computing allows computations to start as soon as all their dependencies are satisfied. This is particularly useful in applications with irregular or complex dependency patterns, which would otherwise require either coarse-grain synchronizations that degrade performance or high programming costs. UPC++ DepSpawn is a recent proposal for the easy development of performant dataflow algorithms in hybrid shared/distributed memory systems. Among the many techniques it applies to provide good performance is a software cache that minimizes the communications among the processes involved. In this article we describe the implementation and operation of this cache and present an autotuning strategy that simplifies its usage by freeing the user from having to estimate an adequate cache size. Instead, the runtime is now able to define reasonably sized caches that provide near-optimal behavior.

    This research was funded by the Ministry of Science and Innovation of Spain (TIN2016-75845-P and PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/501100011033) and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF), under the Consolidation Programme of Competitive Reference Groups (ED431C 2017/04). The authors also acknowledge the support of the Centro Singular de Investigación de Galicia "CITIC," funded by Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014-2020 Program) through grant ED431G 2019/01, and the Centro de Supercomputación de Galicia (CESGA) for the use of its computers.
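
    As a purely illustrative sketch (this code is not from the article; the class, the LRU policy, and the retuning heuristic are all hypothetical), a software cache of remote data blocks whose size is adjusted by the runtime rather than by the user could look roughly as follows in C++:

        // Hypothetical sketch of a software cache for remote data blocks with a
        // runtime-chosen capacity. Names and policy are illustrative only; this is
        // not the actual UPC++ DepSpawn implementation.
        #include <cstddef>
        #include <cstdint>
        #include <list>
        #include <unordered_map>
        #include <utility>
        #include <vector>

        class SoftwareCache {
          using Key   = std::uintptr_t;        // identifies a remote block
          using Block = std::vector<char>;     // local copy of its contents

          std::size_t capacity_;               // maximum number of cached blocks
          std::list<Key> lru_;                 // most recently used key at the front
          std::unordered_map<Key, std::pair<Block, std::list<Key>::iterator>> map_;
          std::size_t hits_ = 0, misses_ = 0;

        public:
          explicit SoftwareCache(std::size_t capacity) : capacity_(capacity) {}

          // Returns the cached copy, or nullptr if the block must be fetched remotely.
          const Block* lookup(Key k) {
            auto it = map_.find(k);
            if (it == map_.end()) { ++misses_; return nullptr; }
            ++hits_;
            lru_.splice(lru_.begin(), lru_, it->second.second);  // mark as recently used
            return &it->second.first;
          }

          // Stores a freshly fetched block, evicting the least recently used one if full.
          void insert(Key k, Block data) {
            if (map_.count(k)) return;         // already cached
            if (map_.size() >= capacity_) {
              map_.erase(lru_.back());
              lru_.pop_back();
            }
            lru_.push_front(k);
            map_[k] = {std::move(data), lru_.begin()};
          }

          double hit_rate() const {
            const auto total = hits_ + misses_;
            return total ? static_cast<double>(hits_) / total : 0.0;
          }

          // Naive autotuning step: enlarge the cache while it still misses often,
          // shrink it when almost every access hits (purely illustrative heuristic).
          void retune() {
            if (hit_rate() < 0.9) capacity_ *= 2;
            else if (hit_rate() > 0.99 && capacity_ > 1) capacity_ /= 2;
            hits_ = misses_ = 0;
          }
        };

    In a scheme of this kind it is the runtime that periodically invokes something like retune() to settle on a reasonable cache size, which conveys the sort of burden the article's autotuning strategy removes from the user, although the actual strategy is described in the paper itself.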

    A Comparison of Task Parallel Frameworks based on Implicit Dependencies in Multi-core Environments

    The greater flexibility that task parallelism offers with respect to data parallelism comes at the cost of higher complexity, due to the variety of tasks and the arbitrary patterns of dependences they can exhibit. These dependences must be expressed not only correctly but also optimally, i.e., avoiding over-constraining the computation, in order to obtain the maximum performance from the underlying hardware. There have been many proposals to facilitate this non-trivial task, particularly for today's ubiquitous multi-core architectures. A particularly interesting family of solutions, because of its wide scope of application, ease of use, and potential performance, is that in which the user declares the dependences of each task and lets the parallel programming framework figure out which concrete dependences appear at runtime, scheduling the parallel tasks accordingly. Nevertheless, as far as we know, there are no comparative studies of these frameworks that help users identify their relative advantages. In this paper we describe and evaluate four tools of this class, discussing the strengths and weaknesses we have found in their use.
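
    For context, one widely used representative of this "declare your dependences" style is OpenMP tasking, where each task only states which variables it reads and writes and the runtime derives the actual ordering. The minimal example below is not taken from the paper and makes no claim about which four tools it evaluates:

        // Tasks with declared dependences in OpenMP (illustrative example).
        // The depend clauses order t1 before t2 and t3, while t2 and t3 may
        // run concurrently with each other.
        #include <cstdio>

        int main() {
          int x = 0, y = 0, z = 0;
          #pragma omp parallel
          #pragma omp single
          {
            #pragma omp task depend(out: x)                 // t1: produces x
            x = 42;

            #pragma omp task depend(in: x) depend(out: y)   // t2: consumes x, produces y
            y = x + 1;

            #pragma omp task depend(in: x) depend(out: z)   // t3: consumes x, produces z
            z = x * 2;

            #pragma omp taskwait                            // wait for t1, t2 and t3
            std::printf("y = %d, z = %d\n", y, z);
          }
          return 0;
        }

    Building with -fopenmp (GCC/Clang) enables the pragmas; without it the program simply runs sequentially. The same idea of declaring per-task inputs and outputs, and letting the framework infer the runtime dependences, underlies the class of tools the abstract refers to.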