# Poster: Implications of Merging Phases on Scalability of Multi-core Architectures

Madhavan Manivannan, Chalmers University of Technology, Gothenburg, Sweden, madhavan@chalmers.se

Ben Juurlink, Technische Universität Berlin, Berlin, Germany, juurlink@ce.tu-berlin.de

Per Stenström, Chalmers University of Technology, Gothenburg, Sweden, per.stenstrom@chalmers.se

## ABSTRACT

Amdahl's Law predicts that parallel applications with negligible serial sections can potentially scale to many cores. However, in data mining applications with merging phases, the serial sections do not remain constant as the core count grows. We extend Amdahl's model to account for this and show that Amdahl's Law can overestimate the scalability offered by both symmetric and asymmetric architectures for such applications.

**Implications:** 1) For both symmetric and asymmetric architectures, chip area is better spent on fewer, more capable cores than on simply increasing the number of cores; and 2) the performance advantage of asymmetric over symmetric multi-core architectures is limited for such applications.

## **Categories and Subject Descriptors**

C.1.2 [Multiprocessors]: Multiprocessors

## **General Terms**

Performance, Design

## **Keywords**

Amdahl's Law, Reduction Operations, Multi-core architecture

## **1. INTRODUCTION**

This poster studies the scalability of a set of data mining workloads that have negligible serial sections. The formulation of Amdahl's Law in [1], which optimistically assumes constant serial sections, predicts that these workloads scale to a large number of cores; however, the overhead of carrying out merging (or reduction) operations causes scalability to peak at a much lower core count. We establish this by extending Amdahl's speedup model to factor in the impact of reduction operations on application speedup for both symmetric and asymmetric chip multiprocessor (CMP) designs.

## **2. MODEL AND ITS IMPLICATIONS**

The mathematical formulation of Amdahl's Law in [1] assumes that the serial section remains constant, independent of scaling. However, our analysis of the clustering applications in the MineBench suite [2] reveals that serial sections do not remain constant with scaling. This behavior can be attributed to merging phases, in which partial results computed by different threads are combined. Merging operations have an inherent serial component whose cost grows with the number of threads. We extend the model in [1] to incorporate this observation and validate it with runs on simulators as well as on real hardware [3].
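To make the growing serial component concrete, here is a minimal Python sketch, assuming a kmeans-style reduction in which each thread accumulates per-cluster coordinate sums and point counts. The function names (`partial_sums`, `merge`, `simulate_threads`) and the round-robin data split are hypothetical illustrations, not MineBench code, and the threads are simulated sequentially. The point is that the serial `merge` loop must visit every thread's partial result, so its work grows linearly with the thread count.

```python
"""Sketch of a kmeans-style merging (reduction) phase; illustrative only."""
from typing import List, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[List[float]], List[int]]


def partial_sums(points: List[Point], labels: List[int], k: int, dim: int) -> Partial:
    """Parallel phase: one thread's per-cluster coordinate sums and point counts."""
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point, label in zip(points, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += point[d]
    return sums, counts


def merge(partials: List[Partial], k: int, dim: int) -> Tuple[List[List[float]], List[int], int]:
    """Serial merging phase: fold every thread's partial result into one.

    Returns the merged sums, the merged counts and the number of coordinate
    additions performed, which equals len(partials) * k * dim.
    """
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    additions = 0
    for thread_sums, thread_counts in partials:
        for c in range(k):
            counts[c] += thread_counts[c]
            for d in range(dim):
                sums[c][d] += thread_sums[c][d]
                additions += 1
    return sums, counts, additions


def simulate_threads(points: List[Point], labels: List[int], k: int, dim: int, threads: int):
    """Hypothetical driver: split the data round-robin into `threads` chunks,
    compute each chunk's partial result (sequentially here), then merge serially."""
    partials = [partial_sums(points[t::threads], labels[t::threads], k, dim)
                for t in range(threads)]
    return merge(partials, k, dim)


if __name__ == "__main__":
    pts = [(float(i), float(2 * i)) for i in range(12)]
    labs = [i % 3 for i in range(12)]
    for p in (1, 2, 4):
        _, _, work = simulate_threads(pts, labs, k=3, dim=2, threads=p)
        print(f"threads={p}: serial merge additions = {work}")
```

With k = 3 clusters and 2 dimensions, the loop prints 6, 12 and 24 serial additions for 1, 2 and 4 threads: the merged result is identical, but the serial merge work scales with the number of threads.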

Copyright is held by the author/owner(s).

*ICS'11*, May 31–June 4, 2011, Tucson, Arizona, USA. ACM 978-1-4503-0102-2/11/05.

Figure 1 shows the scalability predicted for kmeans by the model presented in [1] and by the extended Amdahl's model that incorporates reduction. With reduction operations factored in, speedup tapers off at a lower core count (71.9 instead of 246.5). This shows that naively applying Amdahl's Law can lead to an overestimate of speedup.



Figure 1. kmeans scalability using different models (x-axis: number of cores)
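The poster does not spell out the extended model's equations (they are in the technical report [3]). As a minimal sketch of why a growing serial section makes speedup peak, the Python below assumes, purely for illustration, that merging adds a serial cost of c per core, so the serial time becomes (1 - f) + c·n instead of a constant (1 - f). The names `speedup_with_merge` and `peak_core_count`, the linear form of the merge cost, and the value of c are hypothetical and not fitted to kmeans.

```python
"""Amdahl's Law vs. a HYPOTHETICAL model whose serial section grows with n."""


def amdahl_speedup(f: float, n: int) -> float:
    """Classic Amdahl's Law: the serial fraction (1 - f) is constant."""
    return 1.0 / ((1.0 - f) + f / n)


def speedup_with_merge(f: float, n: int, c: float) -> float:
    """Hypothetical extension (not the authors' model): a serial merge
    phase that costs c per core, i.e. grows linearly with n."""
    return 1.0 / ((1.0 - f) + f / n + c * n)


def peak_core_count(f: float, c: float, max_cores: int = 1024) -> int:
    """Core count in [1, max_cores] that maximises speedup_with_merge."""
    return max(range(1, max_cores + 1), key=lambda n: speedup_with_merge(f, n, c))


if __name__ == "__main__":
    f, c = 0.99, 1e-4  # illustrative values only
    for n in (16, 64, 256, 1024):
        print(f"n={n:5d}  Amdahl={amdahl_speedup(f, n):6.1f}  "
              f"with merge={speedup_with_merge(f, n, c):6.1f}")
    print("peak core count:", peak_core_count(f, c))
```

Under this assumption, minimizing f/n + c·n gives a peak near n* = √(f/c), about 99.5 cores for the illustrative values, whereas plain Amdahl's Law keeps rising with n. With c = 0 the two functions coincide.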

We use the resource model presented in [1] to obtain the theoretical speedup limit for a hypothetical application that spends 1% of its time in serial sections (f = 0.99), on a chip with a budget of 256 base core equivalents (BCEs), i.e., 256 simple cores. Figure 2 compares the speedup obtained with the model in [1] (marked 'Amdahl') and with the model that includes reduction (marked 'Reduction') for symmetric (sym) and asymmetric (asym) CMPs. Because of the reduction overhead, the performance advantage of the asymmetric CMP over the symmetric CMP is limited: with reduction, the asymmetric and symmetric CMPs reach speedups of 43.3 and 36.2, respectively, as opposed to 162.3 and 79.7 under the model in [1].
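For reference, the baseline ('Amdahl') curves come from the resource model of [1]: a chip has a budget of n BCEs, a core built from r BCEs has sequential performance perf(r), and [1] assumes perf(r) = √r. A symmetric CMP has n/r such cores; an asymmetric CMP has one r-BCE core plus n - r single-BCE cores, and the big core also runs the serial code. The reduction term of the extended model is not given in this poster, so only the baseline formulas are shown:

$$
\mathrm{Speedup}_{\mathrm{sym}}(f,n,r) = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f\,r}{\mathrm{perf}(r)\,n}},
\qquad
\mathrm{Speedup}_{\mathrm{asym}}(f,n,r) = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}
$$

For example, with f = 0.99, n = 256 and r = 4 (perf(4) = 2), the symmetric speedup is 1/(0.01/2 + 0.99·4/(2·256)) = 1/(0.005 + 0.0077344) ≈ 78.5; the speedup limit is obtained by maximizing over r.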



Figure 2. Scalability on symmetric and asymmetric CMPs
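The sweep behind a comparison like Figure 2 can be sketched in Python. With `c = 0`, `symmetric` and `asymmetric` below follow the formulas of [1] with perf(r) = √r. The `c > 0` term is a hypothetical stand-in for reduction, not the authors' extended model: it assumes a serial merge costing c per core, run on one core of r BCEs. The names `symmetric`, `asymmetric`, `best` and the value of c are illustrative assumptions, and the numbers do not reproduce the poster's.

```python
"""Hill-Marty symmetric/asymmetric speedup with an optional HYPOTHETICAL merge term."""
import math
from typing import Callable, Tuple


def perf(r: float) -> float:
    """Assumption from [1]: a core built from r BCEs runs sqrt(r) times faster."""
    return math.sqrt(r)


def symmetric(f: float, n: int, r: int, c: float = 0.0) -> float:
    """n/r identical cores of r BCEs each. c > 0 adds a hypothetical
    serial merge costing c per core, executed on one r-BCE core."""
    cores = n / r
    serial = (1.0 - f) / perf(r) + c * cores / perf(r)
    parallel = f * r / (perf(r) * n)
    return 1.0 / (serial + parallel)


def asymmetric(f: float, n: int, r: int, c: float = 0.0) -> float:
    """One big core of r BCEs plus n - r single-BCE cores. Serial code and
    the hypothetical merge (c per core) run on the big core."""
    cores = 1 + n - r
    serial = (1.0 - f) / perf(r) + c * cores / perf(r)
    parallel = f / (perf(r) + n - r)
    return 1.0 / (serial + parallel)


def best(model: Callable[..., float], f: float, n: int, c: float = 0.0) -> Tuple[float, int]:
    """Sweep r over 1..n and return (maximum speedup, r at the maximum)."""
    return max((model(f, n, r, c), r) for r in range(1, n + 1))


if __name__ == "__main__":
    f, n = 0.99, 256
    for c in (0.0, 1e-4):  # 0.0 = model of [1]; 1e-4 = illustrative merge cost
        s_sym, r_sym = best(symmetric, f, n, c)
        s_asym, r_asym = best(asymmetric, f, n, c)
        print(f"c={c}: sym {s_sym:.1f} (r={r_sym}), asym {s_asym:.1f} (r={r_asym}), "
              f"asym/sym = {s_asym / s_sym:.2f}")
```

Under this assumption, adding the merge term shrinks the asymmetric-over-symmetric ratio and pushes the best r upward for both designs (fewer, larger cores), which mirrors the qualitative trend in the implications without claiming to match the poster's figures.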

## **3. REFERENCES**

- [1] Mark Hill and Mike Marty. Amdahl's Law in the multicore era. *IEEE Computer*, vol. 41, no. 7, pp. 33-38, July 2008.
- [2] Ramanathan Narayanan et al. MineBench: A Benchmark Suite for Data Mining Workloads. In *Proceedings of IISWC*, 2006.
- [3] M. Manivannan, B. Juurlink, P. Stenström. Implications of Merging Phases on Scalability of Multi-core Architectures. *Technical Report*, Department of Computer Science and Engineering, Chalmers University. http://www.cse.chalmers.se/~madhavan/tr\_madhavan\_2011\_01.pdf