## **PREFACE**

The ACACES summer school aims to create an opportunity to learn new things and to meet new people. We believe that the 12 courses and the two invited talks, all given by world-class experts, are sufficient to reach the first goal.

The second goal is a bigger challenge. How can we bring each participant into contact with as many other participants of the summer school as possible in one week? To reach this goal, all meals and coffee breaks are taken together, the breaks are long, and, most importantly, we organize a poster session on Wednesday afternoon.

The basic idea is that you present your own research to the other participants and learn more about theirs. We have scheduled the poster session in the middle of the week so that people with common research interests still have enough time during the rest of the week to discuss them, hopefully resulting in long-lasting research collaborations and joint research contributions. In this way, the poster session will help you further develop your professional network, which is what HiPEAC is all about.

65 posters will be presented during the poster session, and you will not have time to discuss them all in one afternoon. Therefore, we have collected the abstracts in a book of abstracts. The abstracts in this book were not reviewed, as we did not want to exclude anybody from participating in the poster session or from making new contacts. The sole purpose of the book is to help you prepare your visit to the poster session: you can select in advance the posters you want to discuss and then visit them (the posters are arranged on the poster panels in the same order as in the book). If you present a poster yourself, make sure that you spend about 50% of your time at your own poster and the other 50% visiting other posters.

I wish you a very productive poster session.

Koen De Bosschere, Summer School Organizer

## **CONTENTS**

| Title and authors | Page |
|-------------------|------|
| A Highly Efficient, Thread-Safe Software Cache Implementation for Tightly-Coupled Multicore Clusters<br>Christian Pinto and Luca Benini | 1 |
| Memory affinity in multi-threading: the Bowtie2 case study<br>Claudia Misale, Marco Aldinucci and Massimo Torquati | 5 |
| Assessing the effect on inter-task interferences in real multicores<br>Gabriel Fernandez, Mikel Fernandez, Jaume Abella, Eduardo Quinones, Luca Fossati, Marco Zulianello and Francisco J. Cazorla | 9 |
| Design of a legacy-free operating system for multicore platforms<br>Laust Brock-Nannestad and Sven Karlsson | 13 |
| Optimizing the Overhead for Network-on-Chip Routing Reconfiguration in Parallel Multi-Core Platforms<br>Marco Balboni, Francisco Triviño, José Flich and Davide Bertozzi | 17 |
| Performance and Power Efficiency Optimization and Evaluation of a Data Cleansing Algorithm on Multicore Processors<br>Abdullah Al Hasib and Lasse Natvig | 21 |
| Integration of HW IPs into tightly coupled multicore clusters: a synthesis-friendly approach<br>Francesco Conti, Andrea Marongiu and Luca Benini | 25 |
| Architecture for Transparent Binary Acceleration with External Memory Accesses<br>Nuno Miguel Cardanha Paulino, João Canas Ferreira and João Manuel Paiva Cardoso | 29 |
| A Communication-efficient Mapping of AUTOSAR Runnables on Multicores<br>H. R. Faragardi, T. Nolte and B. Lisper | 33 |
| An overview of queuing schemes for HPC-systems interconnection networks with direct and hybrid topologies<br>Pedro Yebenes, Jesus Escudero-Sahuquillo, Crispin Gomez, Pedro J. Garcia and Francisco Quiles | 37 |
| Methodological Study of Shared Cache Optimizations<br>K. Kavi, M. Islam and M. Scrbak | 41 |
| Memory Array Protection: Check on Reads or Check on Writes?<br>Panagiota Nikolaou, Yiannakis Sazeides, Lorena Ndreou, Emre Ozer and Sachin Idgunji | 45 |
| Energy Efficient Memory Systems<br>Nico Reissmann and Magnus Jahre | 49 |
| Automatic Estimation of DVFS Potential<br>Nicolas Triquenaux | 53 |
| Performance Analysis of Caches in Faulty Real-Time Systems<br>Mladen Slijepcevic, Leonidas Kosmidis, Jaume Abella, Eduardo Quinones and Francisco J. Cazorla | 57 |
| Dynamic Command Scheduling for Real-Time Memory Controller<br>Yonghui Li, Benny Akesson and Kees Goossens | 61 |
| Hard Real-Time Task Migration on Embedded Heterogeneous Many-Core Processors<br>Peter Munk and Hans-Ulrich Heiß | 63 |
| Dynamic Application Adaptation for Heterogeneous Platforms<br>Christos Margiolas and Michael F. P. O'Boyle | 67 |
| Heterogeneous Programming Library: A Framework for Quick Development of Heterogeneous Applications<br>Moisés Viñas, Zeki Bozkus and Basilio B. Fraguela | 69 |
| Enabling the OpenMP programming model on embedded heterogeneous manycore SoC<br>Alessandro Capotondi, Andrea Marongiu and Luca Benini | 73 |
| OpenMP extensions to exploit HW acceleration on shared-memory many-core clusters<br>Paolo Burgio, Andrea Marongiu and Luca Benini | 77 |
| Coordination Programming Approach for Linear Algebra Applications<br>Pavels Zaicenkovs | 81 |
| Data abstractions for portable parallel codes<br>Javier Fresno, Arturo Gonzalez-Escribano and Diego R. Llanos | 85 |
| Adaptive Cooperative Caching for Many-cores systems<br>Safae Dahmani, Loïc Cudennec and Guy Gogniat | 89 |
| SHiC approach for Agile Application Placement in Many-Core Systems<br>Mohammad Fattah, Masoud Daneshtalab, Pasi Liljeberg and Juha Plosila | 93 |
| A Scalable Distributed Data-flow Scheduler for Many-Cores<br>Andrea Mondelli | 97 |
| Hybrid multi-core data flow architecture<br>Charles Shelor | 101 |
| Combining a Dataflow Substrate with Multi-level Checkpointing<br>Omer Subasi, Javier Arias, Osman Unsal, Jesus Labarta and Adrian Cristal | 105 |
| Profiling of Dataflow-Based Coarse-Grained Reconfigurable Platforms<br>Carlo Sau, Francesca Palumbo and Luigi Raffo | 109 |
| Efficient Fault Emulation using Dynamic FPGA Reconfiguration<br>Alexandra Kourfali, Karel Bruneel and Dirk Stroobandt | 113 |
| Fault recovery for an FPGA mapped artificial pancreas using partial reconfiguration<br>Michail Vavouras and Christos-Savvas Bouganis | 115 |
| Markov Chain Monte Carlo: An FPGA implementation perspective<br>Grigorios Mingas and Christos-Savvas Bouganis | 119 |
| Maximizing GEMM Performances via Offline Heuristic Generation and Run-time Specialization<br>Victor Lomuller and Henri-Pierre Charles | 123 |
| DART: A GPU architecture exploiting temporal SIMD for divergent workloads<br>Jan Lucas, Sohan Lal, Mauricio Alvarez-Mesa, Ahmed Elhossini and Ben Juurlink | 127 |
| Exploring GPGPUs Workload Characteristics and Power Consumption<br>Sohan Lal, Jan Lucas, Mauricio Alvarez-Mesa, Ahmed Elhossini and Ben Juurlink | 131 |
| Integrated code generation for clustered VLIW architectures<br>Nikolai Kim | 135 |
| Design Space Exploration and Analysis Of Compiler Transformation in VLIW Processors<br>Amir Hossein Ashouri, Gianluca Palermo, Cristina Silvano, Vittorio Zaccaria and Sotiris Xydis | 139 |
| Sniper: A Fast and Accurate Many-Core Simulator<br>Wim Heirman, Trevor Carlson, Kenzo Van Craeynest and Lieven Eeckhout | 141 |
| PIKE - Improving COTSon Interface for Easier Design Space Exploration<br>Andrea Mondelli, Kang Cai and Roberto Giorgi | 145 |
| Improving a Design Space Exploration Framework for Computing Systems Multi-Objective Optimization<br>Radu Chis and Lucian Vinta | 149 |
| Virtual Platforms for Fast Memory Subsystem Exploration Using gem5 and TLM2.0<br>Matthias Jung, MohammadSadegh Sadri and Norbert Wehn | 153 |
| Identifying Sequences of Optimizations for HW/SW Compilation<br>Ricardo Nobre and João M. P. Cardoso | 157 |
| pFS: A partitioned filesystem targeting Virtual Machine images<br>Anastasios Papagiannis, Yannis Sfakianakis, Stelios Mavridis, Manolis Marazakis and Angelos Bilas | 161 |
| Efficient Techniques for Detecting and Exploiting Runtime Phases<br>Andreas Sembrant | 165 |
| Kernel level profiling of I/O intensive applications<br>Spyridon Papageorgiou, Manolis Marazakis and Angelos Bilas | 169 |
| A Unified Approach to Identifying and Healing Vulnerabilities in x86 Machine Codes<br>Kirill Kononenko | 173 |
| Benchmarking the Hardware Error Sensitivity of Machine Instructions<br>Behrooz Sangchoolie, Fatemeh Ayatolahi, Raul Barbosa, Roger Johansson and Johan Karlsson | 177 |
| DOME: Delaying and Overcoming Microprocessor Errors<br>Negar Miralaei, Jyothish Soman, Timothy M Jones and Alan Mycroft | 181 |
| Fault tolerance techniques in the router's micro-architecture inside NoC<br>Alirad Malek, Ioannis Sourdis and Stavros Tzilis | 185 |
| Time-Based Sampled Simulation of Synchronizing Multi-Threaded Applications<br>Trevor E. Carlson, Wim Heirman and Lieven Eeckhout | 189 |
| Transient Error Detection<br>Konstantina Mitropoulou, Vasileios Porpodas and Marcelo Cintra | 193 |
| Design of Energy-Efficient Adder Units for Vector Processors<br>Ivan Ratkovic, Oscar Palomar, Milan Stanic, Osman Unsal, Adrian Cristal and Mateo Valero | 197 |
| Rapid Characterization and Vectorization Using Vector Library<br>Milan Stanic, Oscar Palomar, Ivan Ratkovic, Osman Unsal, Adrian Cristal and Mateo Valero | 201 |
| Automatic Vector Custom Instruction Set Extensions<br>Anadi Mishra and Laura Pozzi | 205 |
| An Automated Negotiation Model based on Different Strategies in an Adaptive Multi-Agent System<br>Serban Radu | 209 |
| Parallel implementation of N-gram algorithm for document comparison<br>Maciej Wielgosz, Sebastian Koryciak, Marcin Janiszewski, Marcin Pietron, Pawel Russek, Ernest Jamro, Agnieszka Dabrowska-Boruch and Kazimierz Wiatr | 213 |
| Parallel MPI implementation of N-gram algorithm for document comparison<br>Maciej Wielgosz, Sebastian Koryciak, Marcin Janiszewski, Marcin Pietron, Agnieszka Dabrowska-Boruch, Pawel Russek, Ernest Jamro and Kazimierz Wiatr | 217 |
| PARTEE: PARallel Task Execution Engine<br>Nikolaos Papakonstantinou and Polyvios Pratikakis | 221 |
| Philosophy of Thought and Action in a programming model<br>T.A. Atabong | 225 |
| A Novel Framework for the Design of Low-complexity QC-LDPC Encoders<br>Georgios Tzimpragos, Christoforos Kachris, Dimitrios Soudris and Ioannis Tomkos | 227 |
| ELB-trees: Efficient Lock-free B+trees<br>Lars Bonnichsen, Sven Karlsson and Christian Probst | 231 |
| Strengthening Consistency in the Cassandra Distributed Key-value Store<br>Panagiotis Garefalakis, Panagiotis Papadopoulos, Ioannis Manousakis and Kostas Magoutis | 235 |
| Shattering the Telecom Infrastructure<br>Mohamed El-Refaey | 239 |
| Revisiting Value Prediction<br>Arthur Perais | 241 |
| Simultaneous Optical Path Setup for Reconfigurable Photonic Networks in Tiled CMPs<br>Paolo Grani and Sandro Bartolini | 245 |
| A Variability-Aware Voltage Island Formation Framework for Multi/Many-Core Architectures at Near-Threshold Computing<br>Ioannis Stamelakos | 249 |