Polly's Polyhedral Scheduling in the Presence of Reductions
The polyhedral model provides a powerful mathematical abstraction to enable
effective optimization of loop nests with respect to a given optimization goal,
e.g., exploiting parallelism. Unexploited reduction properties are a frequent
reason for polyhedral optimizers to assume parallelism-prohibiting dependences.
To our knowledge, no polyhedral loop optimizer available in any production
compiler provides support for reductions. In this paper, we show that
leveraging the parallelism of reductions can lead to a significant performance
increase. We give a precise, dependence-based definition of reductions and
discuss ways to extend polyhedral optimization to exploit the associativity and
commutativity of reduction computations. We have implemented a
reduction-enabled scheduling approach in the Polly polyhedral optimizer and
evaluate it on the standard Polybench 3.2 benchmark suite. We were able to
detect and model all 52 arithmetic reductions and achieve speedups of up to
2.21× on a quad-core machine by exploiting the multidimensional
reduction in the BiCG benchmark.
Comment: Presented at the IMPACT15 workshop.
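As a rough illustration of why associativity and commutativity matter here, the sketch below splits a dot-product-style reduction into independent chunks, each with a private accumulator, and then combines the partial results. It is not Polly's implementation (Polly operates on compiler IR); the function names and the chunk count are hypothetical, and integer arithmetic is assumed so that the reordered sum matches the sequential one exactly.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_dot(a, b):
    """Reference reduction: every iteration updates the same accumulator,
    which looks like a loop-carried dependence to a conservative optimizer."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def partial_dot(a, b, start, stop):
    """Reduce one chunk into a private accumulator (no shared state)."""
    acc = 0
    for i in range(start, stop):
        acc += a[i] * b[i]
    return acc


def chunked_dot(a, b, num_chunks=4):
    """Split the reduction into independent chunks and combine the partials.

    Regrouping the additions is valid only because + is associative and
    commutative. Threads are used here to show that the chunks are
    independent, not to demonstrate a speedup.
    """
    n = len(a)
    bounds = [(k * n // num_chunks, (k + 1) * n // num_chunks)
              for k in range(num_chunks)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = list(pool.map(lambda se: partial_dot(a, b, *se), bounds))
    return sum(partials)
```

With floating-point data the regrouped sum may differ from the sequential one in the last bits, which is one reason a compiler needs an explicit model of reductions before reordering them.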
copulaedas: An R Package for Estimation of Distribution Algorithms Based on Copulas
The use of copula-based models in EDAs (estimation of distribution
algorithms) is currently an active area of research. In this context, the
copulaedas package for R provides a platform where EDAs based on copulas can be
implemented and studied. The package offers complete implementations of various
EDAs based on copulas and vines, a group of well-known optimization problems,
and utility functions to study the performance of the algorithms. Newly
developed EDAs can be easily integrated into the package by extending an S4
class with generic functions for their main components. This paper presents
copulaedas by providing an overview of EDAs based on copulas, a description of
the implementation of the package, and an illustration of its use through
examples. The examples include running the EDAs defined in the package,
implementing new algorithms, and performing an empirical study to compare the
behavior of different algorithms on benchmark functions and a real-world
problem.
Introduction to StarNEig -- A Task-based Library for Solving Nonsymmetric Eigenvalue Problems
In this paper, we present the StarNEig library for solving dense
non-symmetric (generalized) eigenvalue problems. The library is built on top of
the StarPU runtime system and targets both shared and distributed memory
machines. Some components of the library support GPUs. The library is currently
in an early beta state and only real arithmetic is supported. Support for
complex data types is planned for a future release. This paper is aimed at
potential users of the library. We describe the design choices and capabilities
of the library, and contrast them to existing software such as ScaLAPACK.
StarNEig implements a ScaLAPACK compatibility layer that should make it easy
for a new user to transition to StarNEig. We demonstrate the performance of the
library with a small set of computational experiments.
Comment: 10 pages, 4 figures (10 when counting sub-figures), 2 tex-files.
Submitted to PPAM 2019, 13th International Conference on Parallel Processing
and Applied Mathematics, September 8-11, 2019. Proceedings will be published
after the conference by Springer in the LNCS series. Second author's first
name is "Carl Christian" and last name "Kjelgaard Mikkelsen".
Chain-based scheduling: Part I - loop transformations and code generation
Chain-based scheduling [1] is an efficient partitioning and scheduling scheme for nested loops on distributed-memory multicomputers. The idea is to take advantage of the regular data dependence structure of a nested loop to overlap and pipeline communication and computation. Most partitioning and scheduling algorithms proposed for nested loops on multicomputers [1,2,3] are graph algorithms on the iteration space of the nested loop. These graph algorithms are too expensive (at least O(N), where N is the total number of iterations) to be implemented in parallelizing compilers, and they also need large data structures to store the result of the partitioning and scheduling. In this paper, we propose compiler loop transformations and code generation techniques that produce chain-based parallel code for nested loops on multicomputers. The cost of the loop transformations is O(nd), where n is the number of nested loops and d is the number of data dependences. Both n and d are very small in real programs. The loop transformations and code generation for chain-based partitioning and scheduling enable parallelizing compilers to generate parallel code that contains all the partitioning and scheduling information that the parallel processors need at run time.
Fast Hardware Implementations of Static P Systems
In this article we present a simulator of non-deterministic static P systems
using Field Programmable Gate Array (FPGA) technology. Its major feature
is high performance, achieving a constant processing time for each transition. Our
approach is based on representing all possible applications as words of some regular
context-free language. Then, using formal power series it is possible to obtain the
number of possibilities and select one of them following a uniform distribution, in
a fair and non-deterministic way. Based on these ideas, we obtain an implementation
whose results show a significant speed-up that is largely independent of
the size of the P system.
Funding: Ministry of Science and Innovation of the Spanish Government under the project TEC2011-27936 (HIPERSYS); European Regional Development Fund (ERDF); Ministry of Education of Spain (FPU grant AP2009-3625); ANR project SynBioTI