I. INTRODUCTION
Manycore platforms are gaining momentum within the processor design spectrum. Chips with thousands of cores promise high computing power when multiple applications are running simultaneously, each with multiple tasks and multiple threads. There are a number of design challenges for these architectures, including high power consumption, memory and cache coherence, network architectures, and the impact of Process, Voltage, and Temperature (PVT) variation [1]. In this paper, we tackle the problem of power reduction under PVT impact.
One common way of reducing power in integrated circuits is to divide the chip into voltage islands. A voltage island is a region of a chip that is powered by a single voltage level. Regions of the chip that are timing critical can be mapped to voltage islands powered by relatively high voltages, while regions of the chip that are not timing critical can be mapped to voltage islands powered by relatively low voltages. The regions powered by high voltages consume more power but run faster; the regions powered by low voltages consume less dynamic and static power, but run slower [2] [3] [4] [5] [6]. In a manycore chip, each voltage island will typically contain many adjacent cores. These voltage islands are usually rectangular [2] [4] [5]. In this paper, we show that this is a particularly bad choice in the presence of Process, Voltage, and Temperature variations for manycore architectures.
The 2005 International Technology Roadmap for Semiconductors (ITRS) projected that these variations will be the major obstacle to high chip yields. On-chip variations include process variation due to manufacturing deficiencies at nano-scale technologies, and voltage and temperature variations due to activity and workload distribution. Process variation can be die-to-die (D2D) or within-die (WID). We focus on WID variation in this work [2] [5] [7].
978-1-4244-4467 -0/09/$25.00 ©2009 IEEE 001
Within-die process variation will cause some cores to run slower than others. We can mitigate this effect by increasing the supply voltage of the slower cores. However, if a chip is designed using voltage islands, increasing the voltage of a single core requires increasing the voltage (and hence power) of all cores within the island. Thus, the voltage level for each island is dictated by the slowest core in that island [2] [4] [5].
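The cost of this constraint can be illustrated with a minimal sketch. It assumes hypothetical per-core minimum voltages and assumes that dynamic power scales with the square of the supply voltage; neither the numbers nor the function names come from this paper.

```python
# Illustrative sketch only: function names and voltages are hypothetical.

def island_voltage(required_voltages):
    """An island runs at the voltage needed by its slowest core,
    i.e. the highest per-core minimum voltage in the island."""
    return max(required_voltages)


def relative_dynamic_power(required_voltages):
    """Island dynamic power relative to powering every core at its own
    minimum voltage, assuming power is proportional to V^2."""
    v_island = island_voltage(required_voltages)
    shared = len(required_voltages) * v_island ** 2
    individual = sum(v ** 2 for v in required_voltages)
    return shared / individual


# Four cores: one slow core needs 1.0 V, the others only 0.8 V.
cores = [0.8, 0.8, 0.8, 1.0]
overhead = relative_dynamic_power(cores)  # greater than 1: the slow core taxes the island
```

Under these assumptions, a single slow core raises the power of every other core in its island, which is why the spread of core speeds inside an island matters.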
In this paper, we show that, due to the spatial correlation inherent in within-die (WID) variation, circular or cloud-shaped islands result in better power characteristics than rectangular islands. We then present a novel algorithm that uses this insight to create voltage islands that are optimized for both PVT variations and the inter-island communication required by the application, and places individual cores within these islands. We evaluate our algorithm in the context of a power-aware mapping algorithm involving voltage island creation, voltage selection and an optimization flow.
II. RELATED WORK
In [5], the authors statistically modeled transistor parameters to construct a process variation model. Then, they attempted to reduce power consumption using the voltage/delay dependency. Humenay et al. provided a model for systematic variation and then showed the implications of process variation and thermal throttling on the cores' frequency [8]. Herbert and Marculescu compared the throughput of chip-level multiprocessors (CMP) constructed using frequency islands across a range of core counts and sizes under process variation [9]. They showed that designs with smaller cores are more tolerant to variability than designs with fewer larger cores. Huang et al. showed that an architecture with many simple cores rather than complex cores reduces the thermal power density (TPD) [10]. They proposed on-chip adaptive regulation of the voltage supply. However, on-chip regulation can lead to a significant increase in energy. Wang et al. presented variation-aware task scheduling on a multiprocessor System-on-Chip (MPSoC) [11]. They introduced performance yield as a metric to assist the task allocation as a statistical timing task graph. Coskun et al. provided an exploration of thermal-aware task allocation on an MPSoC [12]. They presented an OS-level dynamic task allocation algorithm for different policies with negligible performance overhead. Li et al. described a tool for NoC architecture space exploration under process and temperature variation [13]. This previous research dealt with simple multicore platforms with fewer than 100 cores. It did not combine the effects of process, voltage and temperature variation in energy optimization techniques for manycore designs. There is no prior study on the relationship between the voltage island shape and PVT impact.
III. PVT-VARIATION MODELS
Before describing our voltage island technique, we review the PVT models assumed in this work. To capture the WID process variation for circuit parameters, such as Vt and Leff, we used the methodology described in [2] [7]. The systematic process variation is created using a multivariate normal distribution with a spherical spatial correlation structure. The chip is divided into small rectangular fragments. Each fragment is given a normal distribution of Vt and Leff, the main causes of the systematic variations, with a mean μ and a standard deviation given by σsys. The correlation factor, ρ(r), is a function of the distance r between fragments X and Y. The derived correlation function:
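A spherical correlation structure is commonly written in the form sketched below, where the correlation falls from 1 at zero distance to 0 at a range φ. This is the textbook spherical model, given here as an assumption rather than as the exact function used in [2] [7]; the parameter name `phi` and the fragment coordinates are hypothetical.

```python
import math


def spherical_correlation(r, phi):
    """Textbook spherical correlation: 1 at r = 0, falling to 0 for r >= phi.

    r   : distance between two fragments (same units as phi)
    phi : range beyond which fragments are uncorrelated (assumed parameter)
    """
    if r >= phi:
        return 0.0
    x = r / phi
    return 1.0 - 1.5 * x + 0.5 * x ** 3


def fragment_correlation(frag_x, frag_y, phi):
    """Correlation between two fragments given their (x, y) centres."""
    return spherical_correlation(math.dist(frag_x, frag_y), phi)
```

With this form, nearby fragments are strongly correlated and fragments farther apart than `phi` are independent, which matches the qualitative behaviour the model relies on.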
The random variation is represented by σrand. The random component is due to uncorrelated random doping effects. The random and systematic variations are assumed to be equal. For Leff, the systematic variation σtotal/μ is assumed to be half of that of Vt. We assumed the total standard deviation over the mean, σtotal/μ, to be 0.12, which is somewhat pessimistic [2].
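The resulting component magnitudes can be worked out as follows, under the assumption (not stated explicitly above) that the systematic and random components are independent, so that their variances add. The function and variable names are illustrative.

```python
import math


def split_equal_components(sigma_total):
    """Return (sigma_sys, sigma_rand) for two equal, independent components
    satisfying sigma_total^2 = sigma_sys^2 + sigma_rand^2."""
    component = sigma_total / math.sqrt(2.0)
    return component, component


# Vt: total sigma/mu of 0.12; Leff: half of that.
vt_sys, vt_rand = split_equal_components(0.12)      # each about 0.085
leff_sys, leff_rand = split_equal_components(0.06)  # each about 0.042
```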
To compute the temperature variation in the system, the HotSpot tool [14] is used to calculate the temperature of each core during execution of a segment of code. The power grid model used in [2] was used to estimate the voltage variation. The power grid was assumed to be a resistive network, with each processor modeled as a current source. Each voltage level was assumed to have its own power grid.
As in [2], the model is extended to a multiprocessor platform by dividing the chip into separate core regions. The process variation within the region has a direct impact on the speed of the core. Based on the variation across a core and its location on the chip, the frequency can be computed for each core using the delay along the critical path. Figure 1(a) shows an example of a manycore platform with process variation. Indeed, visual inspection reveals the potential advantage of cloud-shaped islands.
IV. VOLTAGE ISLAND FORMATION
In this section, we present our voltage island formation algorithm and core placement algorithm.
A. Motivation
Intuitively, our algorithm should create voltage islands and position the cores such that: (a) the distance among cores in the same island is minimized to reduce PVT impact; (b) islands with heavy communication are placed close to each other; and (c) islands that will be supplied the same voltage level are placed next to each other to allow these islands to be merged.
The first goal suggests that islands should be circular, or cloud-shaped, rather than very oblong rectangles. Intuitively, these shapes would have a smaller average distance between cores. Due to the spatial correlation inherent in within-die process variations, cores that are close to each other share similar characteristics, but as the distance between two cores increases, such similarities start to fade [2] [7] [8]. In the case of a thousand-core platform, the wide spread of cores makes frequency differences among cores significant [1]. Figure 2 shows this graphically. The rectangular voltage island VI1 is much more affected by the systematic variations due to its span across the chip, whereas the circular voltage island VI2 is more PVT-tolerant. A PVT-tolerant solution is more power efficient, since the voltage supplied to each island is dictated by the slowest core in the island; large variations within an island mean that, on average, more cores have to be powered at a higher voltage than would otherwise be necessary.
The second goal ensures that communication paths are short. A good solution will balance these goals. Circular voltage islands would minimize the maximum distance between cores within each island. However, it is difficult to pack circular voltage islands closely together, which has a negative impact on communication between different islands. Our algorithm "grows" islands in such a way as to balance these optimization goals.
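The distance argument can be checked with a small sketch that compares two islands of 16 cores each on a unit grid: a single-row strip and a square block. The square block is only a stand-in for a compact, cloud-like shape, and the grid coordinates and function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def rectangle(width, height):
    """Grid coordinates of the cores in a width x height island."""
    return [(x, y) for x in range(width) for y in range(height)]


def mean_pairwise_distance(cores):
    """Average Euclidean distance over all pairs of cores."""
    pairs = list(combinations(cores, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def max_pairwise_distance(cores):
    """Largest Euclidean distance between any two cores."""
    return max(math.dist(a, b) for a, b in combinations(cores, 2))


oblong = rectangle(16, 1)   # 16 cores in a single row
compact = rectangle(4, 4)   # 16 cores in a square block

# The compact island has both a smaller average and a smaller maximum
# core-to-core distance than the oblong one.
shorter_on_average = mean_pairwise_distance(compact) < mean_pairwise_distance(oblong)
shorter_at_worst = max_pairwise_distance(compact) < max_pairwise_distance(oblong)
```

Because correlation decays with distance, the island with the smaller core-to-core distances is the one whose cores are more likely to share similar characteristics.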
B. Cloud-Shaped Voltage Island Formation Algorithm
We assume that the cores have already been assigned to islands and that the islands have been partitioned into d sets, where each island in a set will eventually be assigned the same voltage level (and hence merged into a single island). The number of islands in set j is denoted n_j.
Core locations are governed by three seeds: a placement seed, a global seed for each set, and a local seed for each island. Each seed represents a physical location on the chip. Figure 3 shows our algorithm. We first set the placement seed to be one corner of the chip (other strategies are possible). We then cycle through each set, each time considering the set that has the highest inter-island communication with the islands already placed. For this set, we determine the closest unused location to the placement seed, and assign this to the set's global seed. We then cycle through all voltage islands within the set, each time considering the voltage island with the highest inter-island communication with the islands already placed. For each voltage island, we then determine the closest unused location to the global seed, and assign this to the island's local seed. Finally, we cycle through all cores within the island, and greedily position each core at the closest unused location to the island's local seed.
An example is shown in Figure 4. VI1 and VI3 communicate heavily and therefore they should be placed close to one another. Core C1 does not communicate with the outside world, so it should be closer to the seed of VI1. Since VI1 and VI2 will be merged, they should be placed adjacent to one another so that they can be assigned the same voltage later. Figure 5 shows the actual implementation of the example shown in Figure 4. In the example of Figure 5, Islands 1 and 3 have higher communication, so they should be placed close to each other. Similarly, core C3 has high inter-island communication, thus it should be placed close to the border of the island.
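The greedy placement described above can be sketched as follows. This is a simplified illustration, not the implementation of Figure 3: it assumes the sets, islands, and cores are already ordered by communication (the paper selects the next set and island dynamically), it uses Euclidean distance on a unit grid, and all names are hypothetical.

```python
import math


def closest_free(seed, free):
    """Closest unused grid location to `seed`; ties broken by coordinates."""
    return min(free, key=lambda loc: (math.dist(loc, seed), loc))


def place_cores(grid_w, grid_h, island_sets, placement_seed=(0, 0)):
    """Greedy three-seed placement on a grid_w x grid_h chip.

    island_sets: list of sets; each set is a list of islands; each island
    is a list of core names. The ordering is assumed to reflect
    communication priority. The grid must have room for every core.
    Returns a dict mapping core name to its (x, y) location.
    """
    free = {(x, y) for x in range(grid_w) for y in range(grid_h)}
    placement = {}
    for island_set in island_sets:
        # Global seed: closest unused location to the placement seed.
        global_seed = closest_free(placement_seed, free)
        for island in island_set:
            # Local seed: closest unused location to the set's global seed.
            local_seed = closest_free(global_seed, free)
            for core in island:
                # Each core takes the closest unused location to the local seed.
                loc = closest_free(local_seed, free)
                free.remove(loc)
                placement[core] = loc
    return placement


# Two sets: the first holds two islands that will later be merged.
sets = [[["a1", "a2", "a3", "a4"], ["b1", "b2"]], [["c1", "c2", "c3"]]]
placement = place_cores(4, 4, sets)
```

Because every core is drawn toward its island's local seed, each island grows outward as a compact cluster instead of following a predefined rectangular outline.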
V. EXPERIMENTAL RESULTS
We evaluate our algorithm in the context of a larger power-aware mapping flow. Figure 7 shows the overall mapping algorithm. First, different applications are allocated a portion of the available resources. Tasks are then scheduled and assigned to cores using an As-Soon-As-Possible (ASAP) scheduling algorithm. ... The slack of each task is computed, and used to scale the voltage of each core as low as possible such that all tasks can meet their deadlines. Voltage islands are then formed by grouping cores with the same voltage. Voltage islands are merged using the algorithm from [6], and cloud-shaped voltage islands are then formed as described in this paper.
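The voltage selection and grouping steps can be sketched as below. The sketch assumes a hypothetical lookup table giving core speed at each voltage level relative to the highest level, and a single execution time and deadline per core; the actual flow works on task slack and is not reproduced here.

```python
from collections import defaultdict


def lowest_feasible_level(exec_time_at_max, deadline, speed_ratio):
    """Lowest voltage level whose slowdown still meets the deadline.

    speed_ratio maps voltage level -> speed relative to the highest level
    (hypothetical table). Falls back to the highest level if none fits.
    """
    for level in sorted(speed_ratio):
        if exec_time_at_max / speed_ratio[level] <= deadline:
            return level
    return max(speed_ratio)


def group_into_islands(core_voltage):
    """Group cores that were assigned the same voltage level."""
    islands = defaultdict(list)
    for core, level in core_voltage.items():
        islands[level].append(core)
    return dict(islands)


# Hypothetical levels and relative speeds, for illustration only.
speed_ratio = {0.6: 0.4, 0.9: 0.7, 1.2: 0.9, 1.4: 1.0}
workloads = {"c0": 3.0, "c1": 8.0, "c2": 2.0}  # execution time at the top level
core_voltage = {c: lowest_feasible_level(t, 10.0, speed_ratio)
                for c, t in workloads.items()}
islands = group_into_islands(core_voltage)
```

Cores with ample slack drop to the lowest level, while cores with tighter timing stay at a higher one, and each resulting group is a candidate voltage island.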
The routing traffic is optimized using a genetic algorithm. Each potential route is represented by a chromosome, and the outer loop of the algorithm optimizes each route. Further details can be found in [2] and [4] .
In our experiments, we used 30 test cases. In every test case, a randomly generated task graph with 950 tasks was used [15]. The number of cores considered in the simulation varies from 400 to 2025. Up to four voltage levels between 0.6V and 1.4V are available to the optimization algorithm. Any unused cores are switched off completely. A 65nm technology with nominal Vt=400mV and maximum Vdd=1.4V was assumed. The processor speed was 50MHz, and the network speed was 100MHz. The dynamic power was estimated to be 0.4mW/MHz [2]. The optimization and placement operations were carried out on a representative process variation (PV) profile for calibration purposes; then, the resulting mapping was enforced on 8 other arbitrary profiles [2]. The voltage levels of the islands in the manycore platform were scaled up until the slowest core met the timing requirements. The energy savings of all the test cases are grouped and averaged.
Figure 8 compares the energy savings obtained by our optimization algorithm for two cases, cloud-shaped voltage islands and rectangular voltage islands, for various numbers of cores. Using cloud-shaped islands, power is reduced by 10% to 12% beyond that with rectangular islands.
VI. CONCLUSIONS
In this paper, we proposed a novel approach for voltage island formation in manycore designs under the influence of within-die process, temperature, and voltage variations. Because of the locality of these effects, cloud-shaped voltage islands are much more PVT-tolerant than rectangular voltage islands. The proposed change in voltage island shape improves energy savings by up to 12%.
