Abstract: Optical Network-on-Chip (ONoC) is an emerging technology considered as one of the key solutions for future generation on-chip interconnects. However, silicon photonic devices in ONoC are highly sensitive to temperature variation, which leads to a lower efficiency of Vertical-Cavity Surface-Emitting Lasers (VCSELs), a resonant wavelength shift of Microring Resonators (MRs), and consequently a lower Signal to Noise Ratio (SNR). In this paper, we propose a methodology enabling thermal-aware design for optical interconnects relying on CMOS-compatible VCSELs. Thermal simulations allow ONoC interfaces to be designed with a low temperature gradient, and analytical models allow the SNR to be evaluated.
I. INTRODUCTION
Technology scaling down to the ultra-deep submicron domain provides for billions of transistors on chip, enabling the integration of hundreds of cores. Many-core designs, integrating interconnect that can support low latency and high data bandwidth, are increasingly required in modern embedded systems to address the increasingly stringent power and performance constraints of embedded applications. Designing such systems using traditional electrical interconnect presents a significant challenge: due to capacitive and inductive coupling [5] , both interconnect noise and propagation delay of global interconnect increase. The increase in propagation delay requires global interconnect to be clocked at a low rate, which limits the achievable bandwidth and system performance.
In this context, Optical Network-on-Chip (ONoC) is an emerging technology considered as one of the key solutions for future generations of on-chip interconnects. It relies on optical waveguides to carry optical signals, replacing electrical interconnect and providing the low latency and high bandwidth characteristic of optical interconnect. Among the proposed ONoCs, wavelength-routing based interconnect solutions are of considerable interest to the major players in the field, since they do not require any arbitration [1] [2] [4] to propagate the optical signals. They rely on passive Microring Resonators (MRs) that filter the optical signals based on their resonant wavelengths. However, silicon photonic devices are highly sensitive to temperature variation, which leads to a drift of the MR resonant wavelength and consequently a lower Signal to Noise Ratio (SNR). Furthermore, the power efficiency of integrated laser sources decreases at higher temperatures, which further decreases the interconnect power efficiency. Among the available laser source candidates, CMOS-compatible Vertical-Cavity Surface-Emitting Lasers (VCSELs) are of high interest despite relying on less mature technologies (they usually require the inclusion of III-V semiconductors). Indeed, CMOS-compatible VCSELs allow direct modulation of the optical signals and can thus be dedicated to a communication channel. Their size is of the same order of magnitude as that of an MR used to modulate continuous waves emitted by shared lasers. VCSELs are thus sufficiently compact to be implemented in large numbers and at any position, which leads to the following key advantages. First, integration is easier since layout constraints are relaxed. Second, power savings are expected since waveguide lengths are reduced and waveguide crossings are avoided. Third, higher scalability is obtained since the laser sources are fully distributed.
For the first time, this paper proposes a thermal-aware methodology to design ONoCs relying on fully distributed CMOS-compatible VCSELs. The methodology relies on i) steady-state thermal simulations and ii) SNR analyses taking into consideration the temperature of the VCSELs and the MRs. Design space exploration of the MR heating power and the laser modulation current allows the ONoC temperature gradient to be reduced while keeping the VCSEL temperature acceptable. SNR analyses allow the ONoC reliability and power efficiency to be estimated under a given chip activity.
The paper is structured as follows. Section II presents the related work. Then, the considered architecture models and laser sources are described in Section III. Section IV details the design methodology and Section V presents the case study and gives results. Section VI concludes the paper.
II. RELATED WORK
Reducing the thermal sensitivity of optical communication has been addressed at both device and system levels. At device level, solutions relying on athermal devices [9] , voltage tuning [10] , local heating [11] , and feed-back control schemes [12] have been explored to limit the thermal impact on or control the resonant wavelength of MRs. In [13] , system level analyses allow evaluating the influence of temperature variation on the optical signal power received by the photodetectors. In order to counter-balance the temperature variation, communication channels can be remapped through ONoC reconfiguration [15] and DVFS and workload migration techniques can be applied [16] . These techniques depend on redundant resources to re-map the wavelength channel or detect run-time temperature variation.
In [14] , the authors proposed a thermally-aware job allocation policy to minimize the temperature gradients among MRs. This work relies on device characteristics such as the Free Spectral Range (FSR), the number of waveguides and wavelengths. In our work, we evaluate the influence of the temperature variation on the SNR, taking into account the actual efficiency of the VCSEL under a given chip activity. In [21] , the authors explore the placement of shared on-chip lasers on a layer located on top of the optical interconnect. In our work, we focus on CMOS-compatible VCSEL (allowing dedicated communication channel assignment) distributed on a layer dedicated to optical interconnects.
III. 3D ARCHITECTURE
This section presents the architecture including the ONoC. The challenges addressed in this paper are then introduced.
A. Architecture Overview
Figure 1-a illustrates the MPSoC architecture we consider. It is composed of i) an electrical layer implementing processors (in tiles) and memories and ii) an optical layer dedicated to the implementation of a ring-based ONoC. The activity of the processors leads to local and global communications that are implemented with electrical interconnect and ONoC respectively. The communication hierarchy is defined at design time and depends mainly on the number of processors and the ONoC complexity and bandwidth. The silicon photonic fabrication process is compatible with the CMOS one, which allows VCSELs, waveguides, MRs and photodetectors to be integrated. These devices are gathered into so-called Optical Network Interfaces (ONIs), which are responsible for emitting the light, modulating optical signals with the data to be transmitted, transporting the modulated signals and receiving them on the destination side (Figure 1-c). VCSELs and photodetectors are connected to CMOS drivers and CMOS receivers respectively through TSVs (Figure 1-c). We choose ORNoC [2] for the implementation of the optical communications. ORNoC is a ring-based network allowing point-to-point communications between source and destination with passive MRs. As reported in [20], ORNoC demonstrates reduced worst-case and average insertion losses compared with related optical crossbars including Matrix [18], λ-router [1] and Snake [4] (e.g., at 4x4 scale, a 42.5% reduction on average for the worst-case loss and 38% for the average loss), which is a significant advantage for reducing the laser power consumption.
B. ONoC Interface and Thermal Sensitivity
The ONIs are responsible for emitting, transmitting and receiving the optical signals on the optical layer, as illustrated in Figure 1. Device-level calibration processes [16] help improve the SNR by aligning the resonant wavelengths. However, such techniques suffer from a significant power consumption overhead: voltage tuning and heat tuning of MRs (which allow blue shift and red shift of the resonant wavelength respectively) cost 130µW/nm and 190µW/nm respectively, as reported in [17]. For large-scale ONoCs (e.g. Corona [17], which includes approximately 1.1×10^6 MRs), the power dedicated to the calibration process represents more than 50% of the total network power. Since calibration comes with a performance overhead due to algorithm execution and heating latency, it is generally coupled with MR clustering techniques. Indeed, clustering the MRs helps reduce the algorithm complexity by assuming the same local temperature among MRs that are close enough to each other. However, this technique requires careful design of the ONIs to ensure a homogeneous temperature under different processing activities.
Maintaining a low temperature gradient within an ONI that includes on-chip lasers is a challenging task since the lasers dissipate a relatively high power. Heating the MRs to reduce the temperature gradient is thus mandatory, which can be achieved by placing a resistance on top of each MR. In addition, alternately placing VCSELs and MRs contributes to reducing the MR heating power through a better initial distribution of the heat generated by the VCSELs. This leads to the chessboard-like layout illustrated in Figure 1-b: 4 waveguides propagating signals in clockwise and counter-clockwise directions are alternately placed and, for each waveguide, 4 receivers and 4 transmitters are alternately placed.
C. CMOS-Compatible VCSEL
VCSEL-based lasers [7] [8] offer a direct emission of data through current modulation. While the fabrication processes of CMOS-compatible VCSELs are less mature than those of microdisk lasers [19], they offer significant advantages in terms of scalability (a higher laser output power is achievable) and spectral density due to their small 3dB bandwidth (typically 0.1nm). The drawbacks of on-chip lasers compared with their off-chip counterparts are their intrinsically lower efficiency and their higher sensitivity to chip activity variation, since they are located above the processing layer. More precisely, each VCSEL is located above a CMOS driver that converts input electrical data coming from an IP core (represented as a binary voltage) into a current, as illustrated in Figure 2-a. The current propagates through a TSV and directly modulates the VCSEL. An optical signal is vertically emitted and is redirected to a horizontal waveguide through a taper. The power of the optical signal injected into the network (OP_net) thus depends on i) the intensity of the modulation current I_VCSEL, ii) the laser efficiency (η_laser) and iii) the taper coupling efficiency (η_coupling, assumed to be 70%). The VCSEL efficiency is highly sensitive to its temperature: it can drop from 15% at 40°C to 4% at 60°C. This rather low efficiency leads to a high dissipated power (P_VCSEL) which, together with the power dissipated by the CMOS driver (P_driver) and the chip (P_chip), influences the on-chip laser temperature. Hence, for a given modulation current, the power of the emitted optical signal (OP_VCSEL) depends on the laser temperature, which is influenced by P_chip, P_VCSEL and P_driver, as illustrated in Figure 2-b.
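The dependence of the injected optical power on temperature can be sketched numerically. The snippet below is only an illustration: it assumes a straight-line interpolation of the laser efficiency between the two quoted points (15% at 40°C, 4% at 60°C), and it takes a hypothetical electrical input power `p_elec_mw` instead of deriving it from the modulation current. The function names are illustrative and the actual efficiency curve is device-specific.

```python
def laser_efficiency(temp_c: float) -> float:
    """Approximate VCSEL efficiency at a given temperature.

    Assumption: linear interpolation between 15% at 40 degC and
    4% at 60 degC, clamped outside that range.
    """
    t_lo, eta_lo = 40.0, 0.15
    t_hi, eta_hi = 60.0, 0.04
    t = min(max(temp_c, t_lo), t_hi)
    return eta_lo + (eta_hi - eta_lo) * (t - t_lo) / (t_hi - t_lo)


def injected_optical_power_mw(p_elec_mw: float, temp_c: float,
                              eta_coupling: float = 0.70) -> float:
    """Optical power reaching the waveguide (OP_net), in mW.

    p_elec_mw is a hypothetical electrical power delivered to the laser;
    eta_coupling is the 70% taper coupling efficiency assumed in the text.
    """
    return p_elec_mw * laser_efficiency(temp_c) * eta_coupling


if __name__ == "__main__":
    for temp in (40.0, 50.0, 60.0):
        print(temp, injected_optical_power_mw(1.0, temp))
```

Under these assumptions, the same electrical power yields almost four times less injected optical power at 60°C than at 40°C, which is why the laser temperature has to be part of the design loop.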
Furthermore, for an increasing activity of the processing layer (which is expected to result in additional communications), either the optical interconnect bandwidth will decrease for the same modulation current (the SNR being lower, data will have to be retransmitted) or the optical interconnect power consumption will increase (a higher modulation current is required to compensate for the reduced efficiency). The modulation current must thus be carefully selected since i) too small a value leads to a low SNR and ii) too high a value leads to a power-hungry solution.
D. Contribution
The temperature gradient and the average temperature in an ONI are critical when designing VCSEL-based optical interconnects: i) a low temperature gradient within an ONI eases the run-time calibration process and reduces the design complexity; ii) a low ONI average temperature helps maintain a reasonable power efficiency of the VCSELs.
For the first time, we propose a temperature-aware design methodology allowing efficient use of CMOS-compatible VCSELs. Temperature evaluation and system-level analyses allow the ONoC SNR to be estimated.
IV. PROPOSED DESIGN METHODOLOGY
In this section, we describe the methodology for designing thermal-aware optical interconnects.
A. Design Methodology Overview
Figure 3 illustrates the design methodology. System-level inputs include information on the packaging (heat sink base and fins, fan, etc.) and the considered architecture (die size, location of the ONIs, distance between the optical layer and the electrical layer, material used for each layer, etc.). The ONIs are described in more detail: it is possible to specify the number and the type of each device (VCSELs, MRs/heaters, TSVs and photodetectors), their size and their relative location. Key parameters such as the modulation current I_VCSEL and the MR heater power (P_heater) are specified by the user. The VCSEL electrical characteristics are fetched from a library, and different activities can be considered to simulate the power dissipated by the processing layer (e.g. uniform and diagonal).
B. System Specification
In order to perform thermal evaluations, our architecture model is based on the real physical structure of the system. The different components of the system (i.e. package, die, heat sources, and optical devices) are represented as rectangular blocks, each defined by its dimensions, its position, and its constitutive material. Blocks can be assigned power values, which allows the heat sources of the system to be modeled. The Back-End-Of-Line (BEOL) is modeled as a thin layer (10µm) and the heat sources (cores, cache, router, etc.) are represented as rectangular blocks with power values, situated in the BEOL layer.
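The block-based system description can be pictured as a small data structure. The sketch below is not the input format of any particular simulator; the `Block` class, its field names, the materials and all positions and die dimensions are hypothetical. Only the 10µm BEOL thickness, the 15x30µm² VCSEL footprint, the III-V layer thicknesses (0.6µm + 0.45µm + 0.4µm) and the 3.6mW VCSEL power come from this paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Block:
    """A rectangular block: position, dimensions, material, optional power."""
    name: str
    position_um: Tuple[float, float, float]  # (x, y, z) of one corner
    size_um: Tuple[float, float, float]      # (dx, dy, dz)
    material: str
    power_w: float = 0.0                     # 0 for passive blocks

    def volume_um3(self) -> float:
        dx, dy, dz = self.size_um
        return dx * dy * dz


def total_power_w(blocks: Iterable[Block]) -> float:
    """Sum the power assigned to all heat-source blocks."""
    return sum(b.power_w for b in blocks)


# Hypothetical layout: one BEOL slab, one core, one VCSEL.
example_blocks = [
    Block("beol", (0.0, 0.0, 0.0), (1000.0, 1000.0, 10.0), "SiO2"),
    Block("core0", (100.0, 100.0, 0.0), (400.0, 400.0, 10.0), "Si", power_w=2.0),
    Block("vcsel0", (600.0, 600.0, 50.0), (15.0, 30.0, 1.45), "III-V",
          power_w=0.0036),
]

if __name__ == "__main__":
    print(total_power_w(example_blocks))
```

Keeping geometry, material and power in one record per block makes it straightforward to sweep a single parameter (for instance the VCSEL power) without touching the rest of the description.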
IcTherm [23] is a thermal simulator for electronic devices which accurately models their complex structure and provides 3D full-chip temperature maps. IcTherm solves the physical equations that govern the temperature in the chip using the Finite Volume Method [24], a numerical method for solving partial differential equations. IcTherm was validated against the commercial simulator COMSOL [22]: its maximal error was found to be less than 1% [23]. The structure of the system is discretized into small cubic cells that match the distribution of the materials and the heat sources. Figure 4 illustrates the discretization of a section of the system. Because the interfaces contain micro-scale components (e.g. TSVs, VCSELs and CMOS drivers), we use a fine-grain resolution with a cell size of 5µm x 5µm for meshing the region containing the interfaces. For the rest of the system, we use a coarser resolution with a cell size of 100µm x 100µm for the heat sources and 500µm x 500µm for the package.
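A quick count shows why the mesh resolution is varied across regions. The helper below is an illustrative sketch (the function name and the 1000µm x 1000µm example region are assumptions, not taken from the simulator); it counts how many cells of a given size are needed to cover a rectangular region in one layer.

```python
import math


def cell_count(width_um: float, height_um: float, cell_um: float) -> int:
    """Number of square cells of side cell_um needed to cover a rectangle."""
    return math.ceil(width_um / cell_um) * math.ceil(height_um / cell_um)


if __name__ == "__main__":
    # Hypothetical 1000 um x 1000 um region meshed at the three resolutions.
    for cell in (5.0, 100.0, 500.0):
        print(cell, cell_count(1000.0, 1000.0, cell))
    # A 15 um x 30 um VCSEL footprint at the fine 5 um resolution.
    print(cell_count(15.0, 30.0, 5.0))
```

For the same area, the 5µm mesh needs 400 times more cells than the 100µm mesh, so restricting the fine resolution to the interface regions keeps the problem size manageable while still resolving individual VCSELs.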
C. SNR Analysis
For a given activity scenario, the thermal map gives the temperature of the lasers and the MRs, from which the temperature gradient of each ONI is extracted. We assume that the temperature gradient within an ONI must remain below 1°C (since we consider MRs with a 1.55nm 3dB bandwidth, a 0.1nm drift of their resonant wavelength corresponds to a 6.5% transmission loss). The design space of the MR heater power can be explored in order to satisfy the 1°C temperature gradient constraint inside each ONI. However, the temperature gradient among ONIs also influences the SNR, as detailed in the following. Figure 5-a illustrates the transmission of an optical signal OP_in at wavelength λ_signal into an MR with a resonant wavelength λ_MR. The 3dB bandwidth of the signal is assumed to be small compared to that of the MR (0.1nm and 1.55nm respectively). The signal power at the through port (OP_through) and at the drop port (OP_drop) depends on the alignment between λ_signal and λ_MR (Figure 5-b). Maximum transmission to the drop port is obtained when λ_signal = λ_MR. When the two wavelengths are significantly different (by more than 1.5nm), most of the input signal power continues propagating along the waveguide to the through port. The misalignment of the wavelengths may be required for proper ONoC operation (for wavelength routing) or it can be a side effect of a temperature difference between the ONIs. Assuming that λ_signal equals λ_MR when the laser source and the MR are at the same temperature, 50% of the signal will be (wrongly) dropped from the waveguide for a 7.7°C temperature difference (0.77nm misalignment). This leads to crosstalk and signal attenuation, which are evaluated with an analytical model.
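The effect of a temperature difference on the drop-port transmission can be illustrated with a simple filter model. The sketch below assumes a Lorentzian drop response with a 1.55nm full width at half maximum and a thermal drift of 0.1nm/°C (the ratio implied by the 7.7°C and 0.77nm figures above). The Lorentzian shape and the function name are assumptions; the analytical model used in this paper may differ, so the sketch reproduces the 50% point but is not guaranteed to match the other quoted values.

```python
def drop_fraction(delta_t_c: float,
                  fwhm_nm: float = 1.55,
                  drift_nm_per_c: float = 0.1) -> float:
    """Fraction of the input power sent to the drop port of an MR.

    delta_t_c is the temperature difference between the laser source and
    the MR, converted to a wavelength detuning with drift_nm_per_c.
    Assumption: lossless Lorentzian response, 1.0 at perfect alignment.
    """
    detuning_nm = drift_nm_per_c * delta_t_c
    return 1.0 / (1.0 + (2.0 * detuning_nm / fwhm_nm) ** 2)


def through_fraction(delta_t_c: float) -> float:
    """Fraction of the input power that continues to the through port."""
    return 1.0 - drop_fraction(delta_t_c)


if __name__ == "__main__":
    for dt in (0.0, 1.0, 7.7, 15.0):
        print(dt, drop_fraction(dt), through_fraction(dt))
```

With these parameters, a detuning of half the 3dB bandwidth (about 0.775nm, or 7.75°C) splits the power equally between the drop and through ports, which is consistent with the 50% figure quoted for a 7.7°C difference.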
N is the number of ONIs in the optical interconnect. L_k is the propagation loss (in %) along a communication path (e.g. L_1 for C_1→2 and L_2 for C_2→3), l_k is the corresponding length of the waveguide and L_propagation is the propagation loss (in % per cm).
The crosstalk noise received by each photodetector is estimated by considering the total signal power dropped due to wavelengths misalignment and is formulated as follows: 
given chip activity. This crucial information allows the exploration of the design space and particularly the driver power consumption. Indeed, P_driver is directly related to the laser modulation current and, therefore, it impacts the laser efficiency and the optical signal power. Such exploration is illustrated in the following section.
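Once the signal power and the crosstalk contributions at a photodetector are known, the SNR follows from the usual power ratio expressed in decibels. The helper below is a minimal sketch of that last step only; the function name and the example power values are hypothetical, and it does not reproduce the analytical model used to obtain the signal and crosstalk powers.

```python
import math
from typing import Sequence


def snr_db(signal_mw: float, crosstalk_mw: Sequence[float]) -> float:
    """SNR in dB from a signal power and a list of crosstalk powers (mW).

    Assumption: crosstalk contributions add up in power and are the only
    noise source considered.
    """
    noise_mw = sum(crosstalk_mw)
    if signal_mw <= 0.0 or noise_mw <= 0.0:
        raise ValueError("signal and total crosstalk power must be positive")
    return 10.0 * math.log10(signal_mw / noise_mw)


if __name__ == "__main__":
    # Hypothetical values: 0.1 mW of signal, two crosstalk contributions.
    print(snr_db(0.1, [0.0002, 0.0003]))
```

Because the ratio is logarithmic, halving the received signal power (for example through lower laser efficiency at higher temperature) costs about 3dB of SNR for the same crosstalk.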
V. RESULTS
We first detail the considered architecture and VCSEL model. Then, MR heater power is explored through thermal simulations in order to reduce the gradient temperature. Finally, ONoC solutions are compared according to the SNR.
A. Case Study
The targeted system used in our experiments is based on Intel's Single-Chip Cloud Computer (SCC) [6], a 24-tile, 48-core IA-32 45nm processor with a maximum power dissipation of 125W. Given the large number of cores, the SCC is a good candidate for silicon-photonic communications. We model the same package as the one used by Intel [6]. Figure 7 shows the assembly view of the targeted system, which contains the following components: steel back-plate, motherboard, socket, SCC chip with silicon-photonic links and on-chip laser sources, copper lid and heat sink. We assume ORNoC relying on 4 waveguides and 4 lasers per waveguide per ONI. The CMOS drivers and receivers are placed on an empty area of the electrical router of each SCC tile. We consider a VCSEL with a 15x30µm² footprint [7] [8]. It relies on mirrors redirecting the vertically generated light into a horizontal waveguide, which allows the thickness of the laser to be reduced below 4µm (thus making the VCSEL compatible with CMOS). The direct modulation bandwidth is 12GHz and the 3dB bandwidth is about 0.1nm. The VCSEL [7] is composed of 3 layers of III-V material (0.6µm, 0.45µm and 0.4µm thick respectively). The laser effect is generated in the central active layer (in red) and the power is dissipated in the adjacent layers (in grey). Two contacts drive the current from the CMOS layer through 5µm-diameter TSVs. The active layers are surrounded by Si/SiO2 lines constituting the mirror structure, which allows the light to be coupled into the horizontal waveguide (optical signal shown as a blue arrow) using a taper with an estimated 70% efficiency. Figure 8-b gives the laser efficiency according to its temperature and I_VCSEL. Figure 8-c gives the influence of P_VCSEL and the laser temperature on the actual emitted light OP_VCSEL.
B. Reduction of the ONI Gradient Temperature
We first evaluate the influence of P_chip and P_VCSEL on the ONI average temperature and temperature gradient. We run thermal simulations under homogeneous 12.5W, 18.75W, 25W and 31.25W chip activities and we explore P_VCSEL values ranging from 0 to 6mW (we assume P_VCSEL = P_driver, which corresponds to the worst-case scenario since the total energy received by the VCSEL is dissipated as heat). Figure 9-a illustrates the average temperature results: a 6W increase of the total chip activity leads to roughly +3.3°C on the average temperature, while a 6mW increase of P_VCSEL leads to +11°C. This demonstrates the importance of calibrating the laser modulation current to the requirements: a slightly oversized current leads to a significant power consumption overhead. The results also show a significant impact of P_VCSEL on the temperature gradient between lasers and MRs (1.7°C/mW). Such a temperature gradient does not realistically allow clustering techniques to be used for the run-time calibration process. We thus explore MR heating power values (P_heater) to reduce this gradient at design time. As illustrated in Figure 9-b, the smallest gradient is obtained for P_heater = 0.3×P_VCSEL. In Figure 10, we compare the temperature results with and without MR heaters: for P_VCSEL = 1mW, the solutions with and without heaters lead to 0.3°C and 1°C temperature gradients respectively. A significant improvement of the temperature gradient is obtained for higher P_VCSEL values: for 6mW, the temperature gradient drops from 5.8°C to 1.3°C (i.e. -4.5°C), which is significant compared to the reasonable 0.8°C increase of the average laser temperature.
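The reported sensitivities lend themselves to a quick back-of-the-envelope estimate. The sketch below assumes that the two average-temperature trends (+3.3°C per 6W of chip activity, +11°C per 6mW of P_VCSEL) are linear and can be superposed, which is a simplification of the simulated behaviour; the function names are illustrative. The heater sizing uses the P_heater = 0.3×P_VCSEL ratio found for this architecture.

```python
# Sensitivities read from the reported results (linear fit assumed).
DEG_C_PER_W_CHIP = 3.3 / 6.0      # average temperature vs. chip power
DEG_C_PER_MW_VCSEL = 11.0 / 6.0   # average temperature vs. VCSEL power
HEATER_RATIO = 0.3                # P_heater / P_VCSEL giving the smallest gradient


def heater_power_mw(p_vcsel_mw: float) -> float:
    """MR heater power for a given VCSEL power, using the 0.3 ratio."""
    return HEATER_RATIO * p_vcsel_mw


def avg_temp_rise_c(delta_chip_w: float, delta_vcsel_mw: float) -> float:
    """Estimated rise of the ONI average temperature.

    Assumption: both contributions are linear and simply add up.
    """
    return (DEG_C_PER_W_CHIP * delta_chip_w
            + DEG_C_PER_MW_VCSEL * delta_vcsel_mw)


if __name__ == "__main__":
    print(heater_power_mw(3.6))        # heater power for P_VCSEL = 3.6 mW
    print(avg_temp_rise_c(6.25, 1.0))  # one activity step plus 1 mW of laser power
```

Per unit of power, the laser contribution is more than three orders of magnitude larger than the chip contribution, simply because the VCSEL dissipates its power right next to the devices whose temperature is being measured.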
C. System Level Estimation of SNR
We evaluate the influence of the ONIs location on the SNR.
As illustrated in Figure 11, we consider 3 scenarios leading to 18mm (in red), 32.4mm (in blue) and 46.8mm (in green) waveguide lengths. P_VCSEL and P_heater are set to 3.6mW and 1.08mW respectively (simulations validated that the temperature gradient within each ONI remains below 1°C). OP_VCSEL is estimated from the ONI average temperature and Figure 8-c. Table 1 summarizes the technological parameters we have considered and Figure 12 gives the worst-case SNR results.
Under a uniform activity, the asymmetric structure of the SCC chip leads to a 3°C difference among the ONIs for the 46.8mm length. The crosstalk is relatively small and the SNR thus only depends on the signal power, which depends on the propagation losses. The SNR drops from 38dB for the 18mm length to 13dB for 46.8mm. Under the diagonal activity, the upper-right and bottom-left parts of the chip each dissipate 4W while the upper-left and bottom-right parts each dissipate 8W. This leads to heterogeneous ONI temperatures (54.62-55.92°C for case 1, 54.33-56.92°C for case 2 and 56.16-60.85°C for case 3). Compared to the uniform activity, the diagonal activity exhibits a lower average temperature since the upper-right and bottom-left parts dissipate less power. This leads to a higher laser efficiency and therefore a higher OP_VCSEL. However, since the temperature gradient among the ONIs is higher, additional crosstalk occurs, which results in a lower SNR compared to the uniform activity. A random activity leads to intermediate SNR results. This analysis validates that the ONoC meets the receiver sensitivity and SNR requirements. Further exploration of the design space allows the ONoC to be optimized. For instance, if a lower SNR is acceptable, P_VCSEL and P_heater can be reduced to save energy.
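The temperature ranges reported for the diagonal activity can be turned into worst-case wavelength misalignments between ONIs. The sketch below assumes a thermal drift of 0.1nm/°C (the ratio implied by the 7.7°C and 0.77nm figures used for the MR model); the dictionary and function names are illustrative.

```python
# ONI temperature ranges (degC) under the diagonal activity, per scenario.
ONI_TEMP_RANGES_C = {
    "case 1": (54.62, 55.92),
    "case 2": (54.33, 56.92),
    "case 3": (56.16, 60.85),
}

DRIFT_NM_PER_C = 0.1  # assumed resonant wavelength drift per degree


def inter_oni_spread_c(temp_range_c):
    """Largest temperature difference between two ONIs in a scenario."""
    t_min, t_max = temp_range_c
    return t_max - t_min


def worst_case_misalignment_nm(temp_range_c):
    """Wavelength misalignment corresponding to the largest spread."""
    return DRIFT_NM_PER_C * inter_oni_spread_c(temp_range_c)


if __name__ == "__main__":
    for case, temp_range in ONI_TEMP_RANGES_C.items():
        print(case,
              round(inter_oni_spread_c(temp_range), 2),
              round(worst_case_misalignment_nm(temp_range), 3))
```

The spread grows from about 1.3°C in case 1 to about 4.7°C in case 3, that is, from roughly 0.13nm to 0.47nm of misalignment under the assumed drift, which is in line with the additional crosstalk observed for the longer waveguides.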
VI. CONCLUSION
In this paper, we proposed a thermal-aware design methodology for CMOS-compatible VCSEL-based ONoCs. Thermal simulations allow design parameters such as the laser modulation current and the heater power to be explored in order to reduce the temperature gradient within an ONoC interface. A heater dissipating 30% of the VCSEL power leads to an optimal solution for the considered architecture. SNR analyses allow ONoCs to be compared under different chip activities.
