4,803 research outputs found

    Polly's Polyhedral Scheduling in the Presence of Reductions

    Full text link
    The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence based, definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups up to 2.21×\times on a quad core machine by exploiting the multidimensional reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho

    copulaedas: An R Package for Estimation of Distribution Algorithms Based on Copulas

    Get PDF
    The use of copula-based models in EDAs (estimation of distribution algorithms) is currently an active area of research. In this context, the copulaedas package for R provides a platform where EDAs based on copulas can be implemented and studied. The package offers complete implementations of various EDAs based on copulas and vines, a group of well-known optimization problems, and utility functions to study the performance of the algorithms. Newly developed EDAs can be easily integrated into the package by extending an S4 class with generic functions for their main components. This paper presents copulaedas by providing an overview of EDAs based on copulas, a description of the implementation of the package, and an illustration of its use through examples. The examples include running the EDAs defined in the package, implementing new algorithms, and performing an empirical study to compare the behavior of different algorithms on benchmark functions and a real-world problem

    Introduction to StarNEig -- A Task-based Library for Solving Nonsymmetric Eigenvalue Problems

    Full text link
    In this paper, we present the StarNEig library for solving dense non-symmetric (generalized) eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared and distributed memory machines. Some components of the library support GPUs. The library is currently in an early beta state and only real arithmetic is supported. Support for complex data types is planned for a future release. This paper is aimed for potential users of the library. We describe the design choices and capabilities of the library, and contrast them to existing software such as ScaLAPACK. StarNEig implements a ScaLAPACK compatibility layer that should make it easy for a new user to transition to StarNEig. We demonstrate the performance of the library with a small set of computational experiments.Comment: 10 pages, 4 figures (10 when counting sub-figures), 2 tex-files. Submitted to PPAM 2019, 13th international conference on parallel processing and applied mathematics, September 8-11, 2019. Proceedings will be published after the conference by Springer in the LNCS series. Second author's first name is "Carl Christian" and last name "Kjelgaard Mikkelsen

    Chain-based scheduling: Part I - loop transformations and code generation

    Get PDF
    Chain-based scheduling [1] is an efficient partitioning and scheduling scheme for nested loops on distributed-memory multicomputers. The idea is to take advantage of the regular data dependence structure of a nested loop to overlap and pipeline the communication and computation. Most partitioning and scheduling algorithms proposed for nested loops on multicomputers [1,2,3] are graph algorithms on the iteration space of the nested loop. The graph algorithms for partitioning and scheduling are too expensive (at least O(N), where N is the total number of iterations) to be implemented in parallelizing compilers. Graph algorithms also need large data structures to store the result of the partitioning and scheduling. In this paper, we propose compiler loop transformations and the code generation to generate chain-based parallel codes for nested loops on multicomputers. The cost of the loop transformations is O(nd), where n is the number of nesting loops and d is the number of data dependences. Both n and d are very small in real programs. The loop transformations and code generation for chain-based partitioning and scheduling enable parallelizing compilers to generate parallel codes which contain all partitioning and scheduling information that the parallel processors need at run time

    Fast Hardware Implementations of Static P Systems

    Get PDF
    In this article we present a simulator of non-deterministic static P systems using Field Programmable Gate Array (FPGA) technology. Its major feature is a high performance, achieving a constant processing time for each transition. Our approach is based on representing all possible applications as words of some regular context-free language. Then, using formal power series it is possible to obtain the number of possibilities and select one of them following a uniform distribution, in a fair and non-deterministic way. According to these ideas, we yield an implementation whose results show an important speed-up, with a strong independence from the size of the P system.Ministry of Science and Innovation of the Spanish Government under the project TEC2011-27936 (HIPERSYS)European Regional Development Fund (ERDF)Ministry of Education of Spain (FPU grant AP2009-3625)ANR project SynBioTI
    • …
    corecore