Abstract-The next generation of MPSoC points to the integration of thousands of IP cores, requiring high-performance interconnect for high-throughput communications. Optical on-chip interconnect enables significantly increased bandwidth and decreased latency in MPSoC. However, the interface between electrical and photonic devices implies strong layout constraints that may impact system performance and scalability. In this paper, we propose a novel optical interconnect named CHAMELEON. The interface simplifies the layout and allows the bandwidth between IP cores to be adapted according to the communication requirements. Compared to related networks, CHAMELEON demonstrates improved scalability and flexibility at the cost of a minor increase in power consumption.
I. INTRODUCTION
Technology scaling down to the ultra-deep submicron domain provides for billions of transistors on chip, enabling the integration of hundreds of cores. Many-core designs are increasingly used in modern embedded systems to address the growing power and performance constraints of embedded applications. Given the increasing number of cores, we are faced with a major challenge in the design of many-core embedded systems: the design and implementation of an interconnect that can support high data bandwidth between the cores, as well as between cores and on-chip memories. Designing such systems using traditional electrical interconnect poses a significant challenge: due to capacitive and inductive coupling [12], the noise and propagation delay of global interconnect increase. The increase in propagation delay requires global interconnect to be clocked at a very low rate, which limits the achievable bandwidth and overall system performance. Some attempts were made to solve this problem using different interconnect architectures; however, a new on-chip interconnect technology that can overcome the problems of electrical interconnect is highly desirable.
Optical Network-on-Chip (ONoC) is an emerging technology that is considered one of the key solutions for the future generation of on-chip interconnects. It relies on optical waveguides to carry optical signals, replacing electrical interconnect and providing the low latency and high bandwidth of optical interconnect. One of the main factors determining the impact of an ONoC on the overall performance of an MPSoC is the Optical Network Interface (ONI) architecture. The data rate, the flexibility and scalability, as well as the ease of layout synthesis, are some examples of MPSoC metrics that are directly related to the ONI definition.
In this paper, we propose CHAMELEON, which stands for CHANNEL Efficient ONoC, a novel optical interconnect. Compared with existing ONoCs, the main features of the proposed architecture are:
• Higher scalability due to the reuse of layout synthesis offered by a regular ONI and ring topology;
• Highly adaptable bandwidths between IP cores offered by the reconfigurable feature of the ONI;
• Reduced power consumption by assuming the combined use of on-chip lasers and both clockwise (C) and counterclockwise (CC) directions for signal propagation;
• Higher bandwidth by considering waveguide partitioning to realize independent communications using the same wavelength in the same waveguide.
The paper is structured as follows. Section II discusses the ONoCs proposed in related work. Section III presents CHAMELEON, the proposed ONoC. Section IV gives the evaluation results. Section V concludes the paper.
II. RELATED WORK
ONoCs proposed in related work are divided into two classes, active or passive, indicating the use of configurable or passive Microring Resonators (MRs) respectively. The type of MRs directly impacts the network scalability and efficiency. Indeed, the most scalable networks are obtained by using configurable MRs, since they allow communications to be time-multiplexed over photonic devices, leading to circuit-switching architectures commonly addressed in electrical Networks-on-Chip (NoCs) [2] [3]. However, the most efficient networks rely on passive filters, since their dedicated point-to-point communications do not require any arbitration [1] [4]. This leads to ORNoC [4], which allows wavelengths to be reused in the same waveguide to design energy-efficient point-to-point channels. CHAMELEON extends ORNoC with a reconfiguration layer to open channels at run-time, thus allowing better adaptation of the bandwidth to the application traffic.
978-3-9815370-2-4/DATE14/©2014 EDAA
Snake [10] is a wavelength-routed optical network providing point-to-point connections between IP cores. A custom place-and-route optimizes the layout by reducing the number of waveguide crossings and the length of waveguide. However, the use of Photonic Switching Elements (PSEs) implicitly requires waveguide crossings that cannot be removed, which significantly impact the optical losses. CHAMELEON does not need any waveguide crossings thanks to the ring topology, which leads to more efficient energy transmission of data.
In ATAC [8], the optical network is used for global broadcasting. Its topology is somewhat similar to that of CHAMELEON, but its contention-free property is based only on WDM, while in our network it is based on both WDM and wavelength reuse. CHAMELEON has the potential for fewer waveguides/wavelengths, which eases scaling to a large-scale architecture. Moreover, contrary to our approach, ATAC does not support simultaneous communications between one source and multiple destinations, unless it is a broadcast of the same message. As opposed to ATAC, our network relies on on-chip laser sources that provide key advantages discussed below.
Efficient on-chip lasers usually require the inclusion of III-V semiconductors: gallium arsenide (GaAs) or indium phosphide (InP) are currently considered to be the best options. Microlasers, based on microdisk structures coupling light evanescently from the cavity resonant mode to the guided mode in an adjacent silicon waveguide, are sufficiently compact to be implemented in large numbers and at any position. For a given wavelength, the size of an on-chip laser is of the same order of magnitude as the size of an MR used to modulate continuous waves emitted by off-chip lasers, which leads to a similar on-chip size for both approaches. While on-chip laser sources require the use of less mature technologies compared to their off-chip counterparts, they have the potential to provide the following three key advantages:
• Easier and more efficient integration by relaxing constraints on layout: it is not necessary to distribute the light from an external source to the modulators (e.g. through the so-called power waveguide in Corona [7]). Relaxing such constraints contributes to reducing the number of waveguide crossings or even to avoiding them altogether in the ring topology.
• Higher scalability by keeping the architecture fully distributed, which is not achievable by considering centralized off-chip lasers.
• Lower power by reducing the worst-case communication distance: source IP → destination IP with an on-chip laser, versus off-chip laser → source IP → destination IP. This contributes to reducing the propagation losses and consequently the minimum required laser output power. The power consumption can be further improved by locally turning off the laser when no communication is required.
To the best of our knowledge, the network presented in this paper is the first to take advantage of the three above-mentioned key advantages. No waveguide crossing is required and the layout is regular, which makes the network implicitly scalable without any custom place-and-route tool [6] [10].
These features are achieved thanks to both the use of on-chip lasers and the special architecture design. For example, these benefits are not observable in architectures like Snake [10], λ-router [1] or WANoC [11], even when they employ on-chip lasers. CHAMELEON remains fully distributed (including the laser sources), which further contributes to its scalability and also offers the potential for custom design and local run-time control of optical resources, e.g. to turn lasers ON or OFF.
III. CHAMELEON ARCHITECTURE
In this section, we describe CHAMELEON and the ONI. Possible communication schemes are highlighted and an optical loss model is proposed. The main feature of the network is that it has a regular architecture. Moreover, the ONI architecture is regular and can be reconfigured at run-time. The regularity of the ONI architecture facilitates reuse and layout synthesis, while run-time reconfiguration facilitates low-power communications. Moreover, as in ORNoC [4], CHAMELEON allows the reuse of wavelengths to realize several independent communications on a single waveguide. Run-time reconfiguration and wavelength reuse allow the use of the available bandwidth to be adapted according to the communication traffic. The combined use of WDM and multiple waveguides leads to a high overall bandwidth in the optical network.
A. Architecture Overview

B. Network interface
The ONI, illustrated in Figure 2 , facilitates the communication between IP cores through the optical interconnect. On the technological side, an ONI is composed of an optical part used to propagate data between IP cores through the optical network and an electrical part responsible for the resource allocation through the control network.
Figure 2: Optical Network Interface
Each ONI consists of a receiver part and a transmitter part crossed by a waveguide. The receiver part is composed of wavelength-specific MRs that can be turned ON or OFF, in order to respectively configure drop (receive) and pass-through operations on the signals at the corresponding wavelength. Signals dropped from the waveguide reach a photodetector, where opto-electronic conversion generates an electrical signal suitable for the electrical receiver circuit. Optical signals that pass through an ONI continue to propagate in the waveguide until they reach the receiver, i.e. an ONI whose MR at the same wavelength is in the ON state.
The transmitter is composed of on-chip laser sources that can emit and inject optical signals at a specific wavelength into the waveguide. The data are directly transmitted from these lasers through current modulation and each laser source can also be turned OFF in case it is unused.
The receiver and transmitter parts have a symmetric structure: the receiver can eject signals at a given set of wavelengths while the transmitter can inject signals with the same set of wavelengths. In CHAMELEON implementing WDM with N wavelengths, N MRs and N on-chip lasers are used for each waveguide. Each MR or laser source can be configured (i.e. turned ON/OFF) independently. The configuration of these resources is performed by the electrical part of the ONI (i.e. the control network) and it allows the following operations to be realized:
• injection (send): the electrical data coming from the IP cores are converted into a current used to control an on-chip laser. For this purpose, the laser must be turned ON. The light is emitted and injected into the waveguide, and then propagates until it reaches the receiver part of the destination ONI. Since each laser is wavelength-specific, the selection of the lasers used to realize a communication relies on a communication protocol not detailed in this paper. Figure 2 illustrates the injection of signals with wavelength λ 1 (shown in green) into the waveguide.
• ejection (drop): an optical signal propagating along the waveguide crosses the MRs of the receiver. Signals whose wavelength matches that of an MR in the ON state are dropped into the perpendicular waveguide and reach the photodetector (this happens in the receiver part). Figure 2 illustrates the ejection of signals with wavelengths λ 0 and λ 1 from the waveguide (shown in red and green, respectively). It is important to notice that, further along the waveguide, the ejected wavelengths (λ 0 and λ 1 ) are unused. This allows the same wavelength to be reused in the injection part of the ONI in order to realize another communication, as illustrated with the optical signals at λ 1 .
• pass through: an optical signal propagating along the waveguide is not ejected, and no optical signal at the same wavelength is injected, for the sake of coherency and to avoid interference. The signal thus crosses the ONI without being modified, as represented by the signal at wavelength λ 2 (blue color) in Figure 2, meaning that the receiver for the signal is further along the waveguide.
When no communication occurs, all the MRs and laser sources are turned OFF to save energy. They are turned ON/OFF according to the configuration specified by the electrical control network, in order to allocate resources dynamically according to the communication to be realized.
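As an illustration, the three operations above can be sketched as a small behavioral model of one ONI on one waveguide. This is only a sketch under stated assumptions: the class `ONI`, its `cross` method and the payload strings are hypothetical names introduced here, and the control network that sets the ON/OFF states is not modeled.

```python
class ONI:
    """Hypothetical behavioral model of one ONI crossed by one waveguide."""

    def __init__(self, name, n_wavelengths):
        self.name = name
        # One receiver MR and one transmitter laser per wavelength, all OFF.
        self.mr_on = [False] * n_wavelengths
        self.laser_on = [False] * n_wavelengths

    def cross(self, incoming):
        """Apply the receiver part, then the transmitter part.

        `incoming` maps a wavelength index to the payload it carries.
        Returns (dropped, outgoing), two dicts with the same structure.
        """
        dropped, outgoing = {}, {}
        for wl, payload in incoming.items():
            if self.mr_on[wl]:
                dropped[wl] = payload    # ejection: MR in the ON state
            else:
                outgoing[wl] = payload   # pass through: MR in the OFF state
        for wl, on in enumerate(self.laser_on):
            if on:
                if wl in outgoing:
                    # Injecting on a wavelength that passes through would interfere.
                    raise ValueError(f"{self.name}: wavelength {wl} is busy")
                outgoing[wl] = f"data from {self.name}"  # injection
        return dropped, outgoing


# Configuration described for Figure 2: drop wavelengths 0 and 1,
# let wavelength 2 pass through, and reuse wavelength 1 for injection.
oni = ONI("X", n_wavelengths=3)
oni.mr_on[0] = oni.mr_on[1] = True
oni.laser_on[1] = True
dropped, outgoing = oni.cross({0: "a", 1: "b", 2: "c"})
print(dropped, outgoing)
```

In this sketch, the wavelength freed by the drop is immediately available to the transmitter of the same ONI, which is the reuse mechanism described for λ 1.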
C. Communication Schemes
The configurability of CHAMELEON allows multiple communication schemes to be realized by opening dedicated channels between IP cores, as illustrated in Figure 3:
• Opening dedicated point-to-point communication channels is facilitated by waveguide partitioning, which allows ONIs to reuse a given wavelength to realize multiple independent communications in the same waveguide. In Figure 3 a), λ 0 is used to realize communications ONI A → ONI B , ONI B → ONI C and ONI C → ONI A . Concurrently, λ 1 and λ 2 are used to realize ONI C → ONI B and ONI A → ONI D respectively. This facilitates the virtual partitioning of a waveguide for a given wavelength.
• Broadcast (resp. multicast) can be realized by opening dedicated communication channels between a source ONI and all the remaining ONIs (resp. the destination ONIs).
• Multiple waveguides can be used to propagate optical signals in both clockwise (C) and counter-clockwise (CC) directions. In addition to reducing the worst-case losses in the network (and consequently the power consumption, as will be discussed later on), this allows bi-directional dedicated communication channels to be opened, which is suitable for processor-memory communications.
These communication schemes can be combined as long as sufficient bandwidth is available in the network. For instance, high-bandwidth channels can be opened while other lower-bandwidth channels are already open. This high flexibility makes CHAMELEON suitable for executing applications from various classes (or domains). However, opening channels at the granularity of the wavelength leads to a higher complexity in the control network, which may result in additional latency during the allocation of optical resources to channels. To make CHAMELEON efficient, each channel should thus transmit the largest set of data possible before closing. This suits the streaming model of computation particularly well, since it usually requires the transfer of large amounts of data for a short period.
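Waveguide partitioning can be illustrated with a minimal bookkeeping sketch for a single clockwise waveguide. The allocation protocol is not detailed in this paper, so the first-fit policy, the class `RingWaveguide` and its methods are assumptions made for illustration only. We also assume that the ONIs are placed in the order A, B, C, D along the ring.

```python
class RingWaveguide:
    """Hypothetical bookkeeping for one clockwise waveguide shared by n_onis ONIs."""

    def __init__(self, n_onis, n_wavelengths):
        self.n_onis = n_onis
        # busy[wl][s] is True when wavelength wl is in use on segment s,
        # where segment s links ONI s to ONI (s + 1) % n_onis.
        self.busy = [[False] * n_onis for _ in range(n_wavelengths)]

    def _segments(self, src, dst):
        """Segments crossed by a clockwise channel from src to dst."""
        if src == dst:
            raise ValueError("source and destination must differ")
        hops = (dst - src) % self.n_onis
        return [(src + k) % self.n_onis for k in range(hops)]

    def open_channel(self, src, dst):
        """First-fit allocation: return a wavelength index, or None if none is free."""
        segs = self._segments(src, dst)
        for wl, row in enumerate(self.busy):
            if not any(row[s] for s in segs):
                for s in segs:
                    row[s] = True
                return wl
        return None

    def close_channel(self, src, dst, wl):
        """Release the segments held by a channel previously opened on wl."""
        for s in self._segments(src, dst):
            self.busy[wl][s] = False


A, B, C, D = range(4)
ring = RingWaveguide(n_onis=4, n_wavelengths=3)
requests = [(A, B), (B, C), (C, A), (C, B), (A, D)]
print([ring.open_channel(src, dst) for src, dst in requests])  # [0, 0, 0, 1, 2]
```

Under these assumptions, the three channels that occupy disjoint portions of the ring share wavelength index 0, while the two longer channels each need a separate wavelength, which matches the assignment described for Figure 3 a).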
D. Optical Loss Model
CHAMELEON is composed of on-chip laser sources, MRs, photodetectors and waveguides that introduce optical losses as the signal propagates. In order to evaluate the minimum laser output power P min_laser (in dBm), the minimum optical power received by the detector P min_receiver (in dBm) and the worst-case losses in the optical path L wc (in dB) are considered as follows:
$$P_{min\_laser} = P_{min\_receiver} + L_{wc}$$
L wc depends on the propagation loss in the waveguide L waveguide (in dB), the through loss L through (in dB) and P drop (in dB), which corresponds to the drop loss occurring when an MR is in the ON state (in CHAMELEON, an optical signal crosses a single MR in drop mode; this occurs in the destination ONI to eject the signal from the waveguide). In addition, we assume that there is negligible bending loss and no waveguide crossing, due to the topology and layout properties [4], which leads to the following equation:
$$L_{wc} = L_{waveguide} + L_{through} + P_{drop}$$
L waveguide is obtained from the intrinsic propagation losses of the optical signal in the waveguide P propagation (in dB/cm) and from d max (in cm), the longest distance between the source and destination assuming a serpentine layout. By considering only the C direction for the signal propagation in a network including N×M ONIs, d max is defined as follows:
where d is the distance between two neighboring ONIs. By considering C-CC (i.e. C and CC directions for signal propagation through the use of separated waveguides), d max is defined as follows:
L through is the product of the loss for each MR in through mode P through (in dB) and N through , the maximum number of MRs in through mode crossed by an optical signal at the corresponding wavelength. By considering the C direction, N through equals (N×M-2). By considering C-CC directions,
when M is odd or even, respectively.
It is worth noticing that d max and N through are significantly reduced for the C-CC case compared to the C case. This will result in a lower worst-case loss, which directly contributes to the energy-efficiency of the network. Indeed, for a given photodetector responsivity and a given target Bit Error Rate (BER), the lower loss in the communication path results in a lower minimum required laser output power.
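The loss model above can be turned into a short numerical sketch. Several assumptions are made here and are not taken from the paper: the ONIs are uniformly spaced by d on a closed ring, the worst case in the C direction spans N×M-1 hops, and the worst case in the C-CC case spans half of the ring (the odd/even distinction on M is ignored). The function names and the parameter values in the usage lines are hypothetical placeholders, not the values of Table 1.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power in dBm to mW."""
    return 10 ** (p_dbm / 10)


def worst_case_loss_db(n_onis, d_cm, p_prop_db_per_cm, p_through_db,
                       p_drop_db, bidirectional):
    """Worst-case loss L_wc = L_waveguide + L_through + P_drop (in dB)."""
    if bidirectional:
        hops = n_onis // 2   # assumption: shortest of the C and CC paths
    else:
        hops = n_onis - 1    # farthest destination in the C direction
    n_through = hops - 1     # intermediate ONIs crossed in through mode
    l_waveguide = p_prop_db_per_cm * hops * d_cm
    l_through = p_through_db * n_through
    return l_waveguide + l_through + p_drop_db


def min_laser_power_dbm(p_min_receiver_dbm, l_wc_db):
    """Minimum laser output power: receiver sensitivity plus worst-case loss."""
    return p_min_receiver_dbm + l_wc_db


# Placeholder technology parameters (illustrative only).
params = dict(n_onis=16, d_cm=0.5, p_prop_db_per_cm=0.5,
              p_through_db=0.01, p_drop_db=0.5)
for label, both in (("C", False), ("C-CC", True)):
    l_wc = worst_case_loss_db(bidirectional=both, **params)
    p_dbm = min_laser_power_dbm(-20.0, l_wc)
    print(f"{label}: L_wc = {l_wc:.2f} dB, "
          f"P_min_laser = {p_dbm:.2f} dBm ({dbm_to_mw(p_dbm):.4f} mW)")
```

With the C direction only, `n_through` equals N×M-2, in agreement with the text. The C-CC case roughly halves both the distance and the number of MRs crossed, which is why its worst-case loss is lower in this sketch.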
IV. RESULTS
We compare CHAMELEON with Snake [10] , ORNoC [4] and SWMR (Single Write Multiple Read), which is modeled on ATAC [8] . Snake and ORNoC are passive networks relying on multistage and ring topologies respectively. For CHAMELEON and ORNoC, we consider both C-only and C-CC directions for signal propagation, thus leading to CHAMELEON C , CHAMELEON C-CC , ORNoC C and ORNoC C-CC .
A. Architectures
The comparison considers three architectures. Arch 1 is extracted from [10] and is a processor-to-memory application. Figure 4 a) illustrates the layout considered for Snake: it is adapted from [10] to match the requirements of a fully-integrated system in which 4 processors (P 0 …P 3 ) and 4 memories (M 0 …M 3 ) share the same 20x10mm² electrical layer. Processors are interconnected through a crossbar located in the center (not shown in the figure), which prevents Snake from being placed in this area. We assume a 5mm distance d between optical interfaces in the optical network. The layouts for ORNoC and CHAMELEON involve closed waveguides successively crossing M 0 , M 1 , P 1 , P 3 , M 3 , M 2 , P 2 and P 0 . For a fair comparison with CHAMELEON, we assume that on-chip lasers are used in Snake.
Arch 2 corresponds to 4x4 IP cores (IP 0 …IP 15 ) connected with the optical network. A 20x20mm² die size is assumed and d=5mm. Figure 4 b) represents the layout for Snake which is designed to avoid any waveguide crossing between the network interfaces and the Snake multistage itself (Snake is located in the middle of the optical layer for layout optimization purposes and is represented as a box for the sake of clarity). Snake interconnects 16 inputs (in red lines) with 16 outputs (in black lines) through 112 PSEs. Th of Snake would assume 120 PSEs but, for a we adapt the reduction method from [1] to S remove unused PSEs. The layout we assume and ORNoC is the one illustrated in Figure 1 .
Arch 3 extends Arch 2 to 8x8 IP cores, th ATAC architecture: 20x20mm² die size an assumed. The size of Snake is increased to connectivity requirement, and the layouts of C ORNoC are extended from Arch 2 . Table 1 summarizes the parameters we use networks. Similarly to [8] , we consider conse aggressive (Ag) values. 
B. Network Comparisons
We evaluate the following ONoC chara case optical loss (L WC ) considering both Ag number of waveguides (N WG ), number of waveguide (N WL ), number of on-chip las number of MRs (N MR ) (NB: for Snake, the takes into account both the MRs based filter part of the network interface and the MRs ba network itself -one PSE counts for 2 MRs).
Regarding Arch 1 , we estimate the worst c Snake as follows: from P 2 to M 2 , the signa through a distance estimated to be equivalen minimum distance between ONIs (i.e. d). Sinc path also suffers from 6 waveguide crossing PSEs), the total loss is estimated to be 1.7dB Ag and Co values respectively (considering m deposited technology, which allows 3D p would contribute to reducing the losses composed of 56 MRs, including 12 PSEs (as bracket in Table 2 ). It requires 4 wavelength which is the limit considered for ORNoCs an ORNoC C shares similar characteristics t considering conservative values. However, b does not suffer from any waveguide crossin improvement is obtained when considering a This allows, for instance, a allocated for a given memory to pr other channels. As another examp opened between 2 processors, whic and ORNoC unless it is specified at lasers are turned-off, CHAMELEON power consumption. As a primary offers a run-time flexibility to adapt according to the connectivity req acceptable extra losses compared to mentioned in [10] ). 
C. Power Efficiency of CHAMELEON
We evaluate the total laser output power required for the evaluated worst-case loss in Arch 3 and for the target BER of 10^-12 [6]. Since we consider a germanium photodetector with a responsivity of 1A/W, the minimum received power is consequently -20dBm (10µW) for error-free operation at the target BER. Figure 5 represents the estimation for the aggressive values. The smaller waveguide for Snake does not compensate for the high number of waveguide crossings specific to multistage topologies (24.45mW and 332W for Ag and Co respectively). ORNoC C-CC is the most power-efficient network, requiring 10.67mW (Ag) and 417mW (Co). Compared to ORNoC C-CC , CHAMELEON C-CC requires 7.4% and 42% additional power for Ag and Co respectively, which appears acceptable considering its reconfigurability features. However, simulations are required to evaluate its run-time behavior since, on the one hand, we assumed the network to be already configured as a crossbar (thus not considering the reconfiguration time) and, on the other hand, we do not take advantage of the potential of CHAMELEON to reduce the system power or to improve the execution performance by adapting the bandwidth to the application traffic.
V. CONCLUSION
In this paper, we presented CHAMELEON, a novel ONoC that makes good use of WDM to create communication channels between IP cores. To the best of our knowledge, this network is the first to allow the run-time creation of point-to-point (i.e. dedicated) channels without any waveguide crossing in the optical path, which leads to energy-efficient optical transmission of data. Compared to related static (i.e. non-configurable) ONoCs designed to fully interconnect 8x8 cores, CHAMELEON can be configured at run-time to realize the same connectivity with an energy overhead of 7.4% when compared to the most energy-efficient non-configurable solution. The ring topology and the regular layout of the interfaces contribute to the good scalability of CHAMELEON. The combined use of clockwise and counter-clockwise directions for signal propagation allows a substantial improvement of its energy efficiency and scalability. The reconfigurability of CHAMELEON allows the bandwidth between IP cores to be adapted according to application traffic requirements, which will further reduce the energy per bit of data transmission for a given application. This will be further evaluated in future work.
