## **PREFACE**

The ACACES summer school wants to create an opportunity to learn new things and to meet new people. We believe that the 12 courses and the two invited talks, all by world-class experts, suffice to reach the first goal. The second goal is a bigger challenge: how can we bring each participant into contact with as many other participants as possible in one week? To reach this goal, we arranged to have all meals and coffee breaks together, we scheduled long breaks, and, very importantly, we organize a poster session on Wednesday afternoon.

The basic idea is that you can present your own research to the other participants, and that you learn more about their research. We have put the poster session in the middle of the week so that people with a common research interest still have enough time during the rest of the week to discuss it, hopefully resulting in long-lasting research collaborations and joint research contributions. The poster session will thus help you further develop your professional network, which is what HiPEAC is all about.

There will be 82 posters presented during the poster session. You will not have time to discuss them all during one afternoon. Therefore, we have collected the abstracts in a book of abstracts. The abstracts in this book were not reviewed, as we did not want to exclude anybody from participating in the poster session and from making new contacts. The sole purpose of the book is to help you prepare your visit to the poster session. You can select in advance the posters you want to discuss and then visit them (the order of posters on the poster panels is the same as in the book). If you present a poster yourself, make sure that you spend about 50% of your time at your poster and the other 50% visiting other posters.

I wish you a very productive poster session.

Koen De Bosschere<br>Summer School Organizer

## **CONTENTS**

| Title and authors | Page |
|---|---|
| Towards a Performance Scalable File System Design<br>Konstantinos Chasapis, Yannis Klonatos, Stelios Mavridis, Michail D. Flouris, Manolis Marazakis, Angelos Bilas | 1 |
| Towards an HPC I/O framework for clusters of Virtual Machines<br>Anastassios Nanos, Nectarios Koziris | 5 |
| A full custom modular switch for CMP Systems<br>Antoni Roca, José Flich, Federico Silla, José Duato | 9 |
| A minimal/non-minimal routing algorithm for NoCs to misroute packets around congested areas<br>Masoumeh Ebrahimi, Masoud Daneshtalab, Pasi Liljeberg, Hannu Tenhunen | 13 |
| An Approach to the Performance of Congestion Management Techniques in Interconnection Networks with Direct Topologies<br>Daniel Gomez-García, Pedro Javier García, Francisco José Quiles, Jesús Escudero-Sahuquillo, Juan Antonio Villar, José Flich, José Duato | 15 |
| Exploring the Coherence Protocol Acceleration through the Interconnection Network<br>Lucía G. Menezo, Adrián Colaso, Valentín Puente, Jose-Ángel Gregorio | 19 |
| Exploring the On-Board Interconnect Requirements of Multi-Chip Architectures<br>Karthikeyan Palavedu Saravanan, Alejandro Rico, Felipe Cabarcas, Alex Ramirez | 23 |
| Exploring 3D-NoC based architectures<br>Daniele Bortolotti, Andrea Marongiu, Martino Ruggiero, Luca Benini | 27 |
| A New Selection Policy for Low Power Networks on Chip<br>Diana Salemi, Maurizio Palesi | 31 |
| DVFS Management in Real Processors<br>Vasileios Spiliopoulos, Georgios Keramidas, Stefanos Kaxiras, Konstantinos Efstathiou | 35 |
| Online Performance Prediction in Processors with DVFS Capabilities<br>Qixiao Liu, Miquel Moreto, Jaume Abella, Francisco J. Cazorla | 39 |
| Exploring the performance-energy tradeoffs in Sparse Matrix-Vector Multiplication<br>Vasileios Karakasis, Georgios Goumas, Nectarios Koziris | 43 |
| Reducing energy consumption with flexible memory systems<br>Andreas Koltes, Robert Mullins | 47 |
| Combining technologies to reduce energy in L1 data caches<br>Alejandro Valero, Julio Sahuquillo, Salvador Petit, Pedro López, José Duato | 51 |
| Memory Hierarchy and Network Co-design through Trace-Driven Simulation<br>Mario Lodde, José Flich | 55 |
| Improving the World's Fastest Cache Simulator<br>Andreas Sandberg, Peter Vestberg, Erik Hagersten | 59 |
| StatCC: Modeling Multi-Core Cache Sharing in a Fraction of a Second<br>David Eklov, David Black-Schaffer, Erik Hagersten | 63 |
| Using Miss Ratio Curves To Understand Program Optimization<br>Muneeb Khan, Nikos Nikoleris, Erik Hagersten | 67 |
| Towards Value-Aware Caches<br>Angelos Arelakis, Per Stenstrom | 71 |
| Cache Pirating: Measuring the Performance Impact of Cache Sharing<br>Nikos Nikoleris, David Eklov, David Black-Schaffer, Erik Hagersten | 75 |
| Scarphase: Fast Online Phase Classification<br>Andreas Sembrant, David Eklov, Erik Hagersten | 79 |
| How sensitive is processor customization to the workload's input data sets?<br>Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, Olivier Temam, Chengyong Wu, Lieven Eeckhout | 83 |
| Characterizing Phase Behavior for Dynamically Reconfigurable Architectures<br>Zhibin Yu, Nikola Puzovic, Antonio Portero, Roberto Giorgi | 89 |
| Communication Strategy for Embedded Distributed Architectures<br>Celine Azar, Stephane Chevobbe, Yves Lhuillier, Jean-Philippe Diguet | 93 |
| Coarse-Grained Reconfigurable Approach for Multi-Dataflow Systems<br>Nicola Carta, Francesca Palumbo, Luigi Raffo | 97 |
| Efficiently generating FPGA configurations through a stack machine<br>Fatma Abouelella, Karel Bruneel, Dirk Stroobandt | 101 |
| FPGAs for general purpose computing<br>Javier Olivito, Javier Resano | 105 |
| Peak Performance Model for a Custom Precision Floating-Point Dot Product on FPGAs<br>Manfred Muecke, Bernd Lesser, Wilfried N. Gansterer | 109 |
| Fast ASIP Design Space Exploration on FPGAs through Binary Translation<br>Sebastiano Pomata, Giuseppe Tuveri, Paolo Meloni, Menno Lindwer | 115 |
| Architectural Support for Concurrency on Reconfigurable Systems<br>Pavel Zaykov, Georgi Kuzmanov | 119 |
| A configurable and scalable multi-core architecture template supporting hybrid Model of Computation<br>Giuseppe Tuveri, Sebastiano Pomata, Simone Secchi, Paolo Meloni | 123 |
| Parallel Access Schemes for Polymorphic Register Files: Motivation Study<br>Catalin Ciobanu, Georgi Kuzmanov, Alex Ramirez, Georgi Gaydadjiev | 127 |
| Mapping irregular MPSoC topologies onto 2D-meshes<br>José Cano, José Flich, José Duato, Marcello Coppola, Riccardo Locatelli | 131 |
| Automated Architecture Synthesis and Application Mapping for ASIP based adaptable MPSoCs<br>Erkan Diken, Roel Jordans, Rosilde Corvino, Lech Jozwiak, Menno Lindwer | 135 |
| Thermal-aware SoC design through micro-architectures selective block replication<br>Dionisios Diamantopoulos, Kostas Siozios, Sotiris Xydis, Dimitrios Soudris | 139 |
| Early Exploration of Partitioning Trade-offs for Heterogeneous MPSoCs<br>Prashant Agrawal, Robert Fasthuber, Praveen Raghavan, Tom Vander Aa, Francky Catthoor, Liesbet Van der Perre | 143 |
| Accelerating Embedded Systems with C-based Hardware Synthesis<br>Vito Giovanni Castellana, Christian Pilato, Fabrizio Ferrandi | 147 |
| Hardware OpenVG Rendering Engine<br>Yong-Luo Shen, Sang-woo Seo, Seok-Jae Kim, Hyun-Goo Lee, Hyeong-Cheol Oh | 151 |
| Automatic Run-time Parallelism Extraction for the Design of Hardware Accelerators<br>Silvia Lovergine, Christian Pilato, Fabrizio Ferrandi | 155 |
| Portability for Heterogeneous Parallel Architectures<br>Peter Calvert, Alan Mycroft | 159 |
| An Algorithm Template for Parallel Irregular Algorithms<br>Carlos H. González, Basilio B. Fraguela | 163 |
| Employing Helper Threads as a Parallelization Paradigm<br>Anastasios Katsigiannis, Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas, Nectarios Koziris | 167 |
| SCOOP: Source-level COmpiler Optimizations for Parallelism<br>Foivos S. Zakkak, Dimitrios Chasapis, Polyvios Pratikakis, Angelos Bilas, Dimitrios S. Nikolopoulos | 171 |
| SVP - a concurrency model for many-core computing<br>Q. Yang, C.R. Jesshope | 175 |
| A Predictive Modelling based Approach to Runtime Adaptation of Parallel Programs<br>Murali Krishna Emani, Michael O'Boyle | 179 |
| VMAD: a Virtual Machine for Advanced Dynamic Analysis<br>Alexandra Jimborean, Matthieu Herrmann, Philippe Clauss, Vincent Loechner | 183 |
| Elasticity through Fault-Tolerance in a Cloud-based Distributed Stream Processing Engine<br>Dimokritos Stamatakis, Kostas Magoutis | 187 |
| Improving efficiency in the data center - The case of data streaming applications<br>Shoaib Akram, Angelos Bilas | 191 |
| An Auto-tuning Solution to Data Streams Clustering in OpenCL<br>Jianbin Fang, Ana Lucia Varbanescu, Henk Sips | 195 |
| Rapid Prototyping in OpenCL with V-Parallel Process Networks<br>Ana Balevic, Bart Kienhuis | 199 |
| CUDA tuning and configuration parameters on Fermi architecture<br>Yuri Torres De La Sierra, Arturo González Escribano, Diego R. Llanos Ferraris | 203 |
| Microscopic traffic simulation using CUDA<br>Pavol Korcek, Lukas Sekanina, Otto Fucik | 207 |
| Efficient Independent Component Analysis on a GPU<br>Rui Ramalho, Pedro Tomas, Leonel Sousa | 211 |
| GPU performance analysis using the FFT<br>Jacobo Lobeiras, Margarita Amor, Ramón Doallo | 215 |
| Memory-Hierarchy-Aware Decoding of Structured LDPC Codes on GPUs<br>Joao Andrade, Gabriel Falcao, Vitor Silva | 219 |
| Analysis of parallel sorting algorithms on different parallel platforms<br>Marko Misic, Milo Tomasevic | 223 |
| Optimally Mapping a CFD Application on a HPC Architecture<br>Ion Dan Mironescu, Lucian Vintan | 227 |
| Multi-layered Abstractions for Partial Differential Equations from High-level Descriptions<br>Florian Rathgeber, David A. Ham, Mike B. Giles, Paul H. J. Kelly, Graham R. Markall, Gihan R. Mudalige | 231 |
| Compiler analysis for improving OpenMP code generation<br>Sara Royuela, Roger Ferrer, Alex Duran, Xavier Martorell | 233 |
| Analysis and Visualization of Software<br>Pierre Caserta | 237 |
| Implementation and Empirical Comparison of Partitioning-based Multi-core Scheduling<br>Yi Zhang, Nan Guan, Wang Yi | 239 |
| Implications of Merging Phases on Scalability of Multi-core Architectures<br>Madhavan Manivannan, Ben Juurlink, Per Stenstrom | 243 |
| Architecture for a Million Core Processor<br>Zeus Gomez Marmolejo, Victor Garcia, Alex Ramirez, Nacho Navarro | 245 |
| Exploiting Scalability on the Intel SCC Processor<br>Andreas Diavastos, Panayiotis Petrides, Gabriel Falcao, Pedro Trancoso | 253 |
| MapReduce for the Single-Chip-Cloud Architecture<br>Anastasios Papagiannis, Dimitrios S. Nikolopoulos | |
| Memory-intensive parallel computing on the Single Chip Cloud Computer: A case study with Mergesort<br>Nicolas Melot, Kenan Avdic, Christoph Kessler, Jörg Keller | 261 |
| Graphic Rendering Application Profiling on a Shared Memory MPSoC Architecture<br>Matthieu Texier, Raphael David, Karim Ben Chehida, Olivier Sentieys | 265 |
| Pipelining Producer-Consumer Tasks using Custom Multi-Core Architectures<br>Ali Azarian, Joao M. P. Cardoso | 269 |
| User-directed Auto-vectorization in OmpSs<br>Diego Caballero, Xavi Martorell, Roger Ferrer, Alex Duran, Eduard Aigüadé | 273 |
| T-Star (T*): An x86-64 ISA Extension to support thread execution on many cores<br>Antoni Portero, Zhibin Yu, Rania Mameesh, Roberto Giorgi | 277 |
| PEPPHER: Performance Portability and Programmability for Heterogeneous Many-core Architectures<br>Siegfried Benkner, Sabri Pllana, Jesper Larsson Träff, Philippas Tsigas, Andrew Richards, Raymond Namyst, Beverly Bachmayer, Christoph Keßler, David Moloney, Peter Sanders | 281 |
| Facing the Challenges of Heterogeneous Systems at Application Runtime<br>Mario Kicherer, Wolfgang Karl | 285 |
| Fast JIT Code Generation for x86-64 with LLVM<br>Viktor PAVLU, Andreas KRALL | 289 |
| Hardware Support for Dynamic Languages<br>Pascal Schleuniger, Sven Karlsson, Christian W. Probst | 291 |
| NumCIL: Numeric operations in the Common Intermediate Language<br>Kenneth Skovhede | 295 |
| Emeraude: Embedded Real-Time Adaptative Virtualization for Post-Moore Architectures<br>Pierre Boulet, Julien Forget, Abdoulaye Gamatié, Laure Gonnord, Samuel Hym, Richard Olejnik | 299 |
| How to model real-time task constraints on a high-performance processor simulator<br>José Luis March, Julio Sahuquillo, Salvador Petit, Houcine Hassan, José Duato | 301 |
| A Count-Based Scheme for Fault Detection in Memory Arrays<br>Yiannakis Sazeides, Bushra Ahsan, Isidoros Sideris, Lorena Ndreu, Sachin Idgunji, Emre Ozer | 305 |
| Cryptography on embedded devices with application to in-vehicle communication<br>Pal-Stefan Murvay | 309 |