specific models for GPU and other accelerators need to be considered. An answer to this problem has come from the parallel patterns community [6]. Parallel patterns can be seen as a mechanism for expressing parallelism in existing sequential applications. They raise the abstraction level and keep application logic and implementation details separate as distinct aspects of the software. Many common algorithms can be expressed with well-known parallel patterns [7]. Moreover, many of those patterns can be readily exploited on different heterogeneous parallel architectures. In fact, parallel patterns have become an excellent way of expressing algorithms portably across traditional multi-cores and more innovative accelerator-based systems.
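As a minimal sketch of this separation (not taken from any of the cited works), the Python fragment below writes a map-reduce pattern once and lets the caller swap the execution backend. The names `map_reduce`, `square`, and `add` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce(transform, combine, items, initial, executor=None):
    """Apply `transform` to every item, then fold the results with `combine`.

    With no executor the map step runs sequentially; with an executor it is
    dispatched through `executor.map`, which preserves input order.
    """
    mapper = map if executor is None else executor.map
    return reduce(combine, mapper(transform, items), initial)


def square(x):
    # Application logic: knows nothing about how it is scheduled.
    return x * x


def add(a, b):
    # Associative combiner used by the reduce step.
    return a + b


data = list(range(10))

# Same pattern, sequential backend.
sequential_result = map_reduce(square, add, data, 0)

# Same pattern, thread-pool backend; only the executor argument changes.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_result = map_reduce(square, add, data, 0, executor=pool)
```

Both calls compute the same value, and the application functions are untouched when the backend changes. In standard CPython builds with the global interpreter lock, threads do not accelerate CPU-bound work, so the sketch illustrates the separation of concerns rather than a performance gain.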
Special issue presentation
This special issue includes 12 new research contributions. Four of these are extended versions of research contributions from the RePara 2017 workshop, which took place in conjunction with the ParCo 2017 Conference in Bologna, Italy, in September 2017. The remaining eight contributions were selected from an open call for papers.
In "Prediction models for performance, power, and energy efficiency of software executed on heterogeneous hardware" [8], Bán et al. use both static source code metrics and dynamic measurements of execution time, power, and energy to build predictive models of the improvements. When training those models, they found that static code metrics cannot, in general, predict concrete continuous values of dynamic properties. However, they obtained good results for category prediction, which in most cases is enough to make refactoring decisions.
In "Supporting structured parallel program design, development and tuning in FastFlow" [9], Gazzarri and Danelutto focus on the separation of concerns provided by structured parallel programming. They describe a shell for exploring the design space of functionally equivalent parallel compositions with different nonfunctional properties.
In "Stream parallelism with ordered data constraints on multi-core systems" [10], Griebler et al. propose a new technique that can be easily integrated into different C++ parallel programming frameworks supporting stream parallelism. The strategy focuses on those cases where ordering is relevant in stream parallelism with an irregular number of tasks in different stages.
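To make the ordering constraint concrete, the following sketch shows one common way to restore input order after a replicated stage: each item carries a sequence number, and results that arrive early wait in a reorder buffer. This is a generic illustration, not the technique proposed in [10]; the function `reorder` and the sample data are hypothetical.

```python
import heapq


def reorder(tagged_results):
    """Yield payloads in sequence-number order.

    `tagged_results` is an iterable of (sequence_number, payload) pairs that
    may arrive in any order, as they would from parallel workers. Sequence
    numbers are assumed to be 0, 1, 2, ... with no gaps or duplicates.
    """
    pending = []   # min-heap of results that arrived too early
    next_seq = 0   # sequence number the downstream stage expects next
    for seq, payload in tagged_results:
        heapq.heappush(pending, (seq, payload))
        # Release every buffered result that is now in order.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


# Results as they might leave a parallel stage: out of order.
arrivals = [(2, "c"), (0, "a"), (1, "b"), (4, "e"), (3, "d")]
ordered = list(reorder(arrivals))
```

The buffer grows only while the expected item is still outstanding, so its size depends on how far the workers drift apart.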
In "SpExSim: assessing kernel suitability for C-based high-level hardware synthesis" [11], Oppermann et al. introduce techniques for surveying existing legacy C codebases that could be accelerated by FPGA-based compute units. Their approach focuses on high-level synthesis of source code to minimize development cost and effort.
In "Simultaneous multiprocessing in a software-defined heterogeneous FPGA" [12], Núñez-Yáñez et al. investigate how to reduce overheads so that all CPU cores and an FPGA can be used together in a heterogeneous environment when a high-level general-purpose language such as C++ is used.
In "Hybrid static-dynamic selection of implementation alternatives in heterogeneous environments" [13], David del Rio et al. focus on the combination of static and dynamic techniques to select mappings of software components from an application to different computing devices in a heterogeneous system. Their technique is capable of generating a compile-time decision tree that can be used to select the best mapping.
In "On dynamic memory allocation in sliding-window parallel patterns for streaming analytics" [14], Torquati et al. study the issue of dynamic memory management for streaming parallelism. Their study shows that the default memory management mechanisms provided by the C++ standard are not the best suited to this subset of applications. They provide alternative techniques that combine custom allocators with variants of smart pointers to improve the performance of pipelines and other streaming patterns.
In "Experiences with implementing parallel discrete-event simulation on GPU" [15], Sang et al. focus on porting discrete-event simulators to run on GPUs. They compare two open-source approaches that provide interfaces similar to the existing C++ Standard Template Library, which is well aligned with the current parallel STL and offers a path toward upcoming parallelism extensions in new versions of C++.
In "Multi-objective algorithms for the application mapping problem in heterogeneous multiprocessor embedded system design" [16], Sinaei and Fatemi address problems in Electronic System-Level design, where both simulation and design space exploration are performance-critical steps. A special focus is given to two specific multi-objective optimization algorithms.
In "Toward a software transactional memory for heterogeneous CPU-GPU processors" [17], Villegas et al. introduce APUTM, a software transactional memory solution for APUs (Accelerated Processing Units), where CPU and GPU are integrated into a single chip. APUTM makes it possible to experiment with and better understand the trade-offs in this category of platforms.
In "A hybrid sample generation approach in speculative multithreading" [18], Li et al. apply machine learning techniques to speculative multithreading to perform thread partitioning. They apply those techniques to benchmark applications and compare them with approaches based on heuristic rules, which cannot generate adaptive samples.
In "Toward fault-tolerant hybrid programming over large-scale heterogeneous clusters via checkpointing/restart optimization" [19], Chen et al. focus on programming models for large clusters, where traditional models such as MPI + X have been more concerned with performance and reliability. Their approach is supported by in-memory checkpointing, which provides new capabilities for heterogeneous applications and hence simplifies application-level checkpointing. Notably, the results were validated on different benchmarks and applications on the Tianhe-2 supercomputer.
In summary, the papers included in this special issue are representative of the progress achieved by the research community at various levels, from the very high level using parallel patterns to lower levels using, for example, software transactional memory. The integration of GPUs and FPGAs into the landscape is also essential to achieve better performance in different categories of applications. All these innovative research directions will contribute to the long-term goal of better refactoring existing applications for new and evolving parallel heterogeneous architectures.
