Abstract: Multi-level voltage scaling is a highly effective technique for reducing power and matching required speed in an integrated circuit. However, additional circuitry is required at the interfaces of circuit blocks that operate at different voltage levels. These circuits impose a significant delay overhead and can prohibit the use of multi-voltage scaling on specific critical paths. These paths often cross the boundaries of blocks that could otherwise operate at a different supply voltage, eliminating the benefits of multiple voltage domains. A by-pass circuit is proposed to alleviate these timing issues and simultaneously support multi-voltage scaling under specific operating conditions. The new circuit results in performance improvements of up to 89% and power reductions of up to 52% compared to traditional level-up and level-down shifters at a 32 nm technology node. Furthermore, greater performance and power savings are demonstrated where more cells, such as isolation cells, are by-passed.
I. INTRODUCTION
A traditional challenge for the semiconductor industry is reducing power while maintaining the performance of a system. Various methodologies have been proposed for decreasing power, such as clock [1] and power gating [2], gate-level power optimization [3], multi-voltage supply (MVS) [4], multi-threshold logic [5], and power sequencing [6]. MVS and power sequencing are the most efficient techniques for reducing power consumption due to the quadratic dependency of dynamic power on supply voltage. Additionally, MVS is a technique where performance is considered alongside power, since different blocks in modern SoCs have different performance objectives and constraints [4].
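The quadratic dependency mentioned above can be illustrated with a short sketch. It assumes the standard switching-power estimate P = alpha * C * Vdd^2 * f, which is textbook material rather than something taken from this paper; the function names and the 1.0 V to 0.8 V example are illustrative assumptions.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, freq: float) -> float:
    """Standard switching-power estimate: P = alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def relative_dynamic_power(vdd_scaled: float, vdd_nominal: float) -> float:
    """Dynamic power after voltage scaling, relative to the nominal supply,
    with activity factor, capacitance, and frequency held constant."""
    return (vdd_scaled / vdd_nominal) ** 2


if __name__ == "__main__":
    # Illustrative values only: scaling a block from 1.0 V down to 0.8 V.
    ratio = relative_dynamic_power(0.8, 1.0)
    print(f"relative dynamic power: {ratio:.2f}")
    print(f"dynamic power saving:   {1.0 - ratio:.0%}")
```

With frequency held constant, a 20% reduction in supply voltage removes roughly a third of the dynamic power, which is why the voltage-based techniques dominate the others listed.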
However, several challenges arise in the design process for MVS and power sequencing systems, notably at the interfaces of the blocks. The primary difficulty in utilizing these techniques is the need for additional circuitry at the interfaces between blocks that operate at different supply voltages. Signals propagate between blocks that utilize different power rails. Therefore, additional circuits are required to scale the voltage level of signals up or down and to retain the previous state or clamp signals to a specific state when a circuit block is powered down [6]. These interface circuits, particularly level-up shifters that convert a signal from a low voltage to a high voltage, add significant delay to the paths where they are employed. Therefore, research effort has been devoted to improving the performance of these cells [5], [7], [8]. A traditional feedback-based level-up shifter is adopted at a 0.35 µm technology node in [7] to support "by-passing" functionality by employing pass transistors. Furthermore, multi-threshold cells are employed to improve the power and the performance of level-up shifters in [5]. However, in deep submicrometer technologies, the supply voltage headroom is smaller than the threshold voltage; therefore, pass transistors drive weak signals that cannot support voltage conversion. In addition, in a multi-threshold CMOS technology, the available threshold voltages are limited to a few discrete values, thus decreasing the effectiveness of the design proposed in [5].
Furthermore, the additional delay of these cells hinders timing closure for a circuit. A typical example of this situation can be observed in cached CPUs, where a core can usually operate at a lower voltage than the level 1 (L1) cache¹ to yield further power reduction [6], [9]. However, the timing-critical paths often include the interconnections between the core and the cache. To enable different power supplies between the cache and the core, voltage interface circuits should be added, which entail a considerable penalty in delay. Consequently, to avoid a performance loss, the power supply remains the same for both the L1 cache and the core and is scaled less aggressively. This situation leads, in turn, to lower power savings.
To mitigate this issue, an advanced interface circuit for maintaining performance in systems with multiple supply voltages is presented. The novelty of this design is based on the principle that in a multi-voltage scaling environment, different blocks can have the same voltage in specific operating conditions, and thus the additional circuitry can be by-passed. This circuit interfaces different voltage domains and is suitable for by-passing several cells, such as level shifters, clamp/isolation cells, and retention flops employed in MVS and power sequencing techniques. The performance of the circuit is investigated in several operating conditions typically employed in systems with multi-voltage scaling power supplies. In addition, as several types of circuits can be by-passed, depending on the complexity of the interface, the merit of employing the proposed circuit across all these scenarios is evaluated.
The paper is organized as follows. In Section II, the proposed by-pass circuit is described and the circuits that can be by-passed are discussed. In Section III, the simulation setup and the results for the traditional metrics of the proposed circuit, namely performance, power, and area, are discussed. Some conclusions are offered in Section IV.
II. BY-PASS CIRCUIT DESIGN
In this section, the design of the by-pass circuit is described. The proposed interface circuit is based on the notion that in a multi-voltage scaling environment, different blocks have the same voltage in specific (e.g., high performance) operating conditions. Hence, the level-conversion circuitry can be circumvented. The proposed by-pass circuit comprises three transmission gates of the same size and an NMOS transistor around the interface cells, as illustrated in Fig. 1. The first two transmission gates (TG1 and TG2) operate as a demultiplexer, which has a single data input and routes the data to one of two paths according to the control signal (Sel). The third transmission gate (TG3) ensures that no current flows through the by-passed circuit. Furthermore, the NMOS transistor (MN1) strongly pulls the input of the interface circuit low, thereby avoiding undesirable switching and leakage due to a weak ground at Node 1, which can result when TG1 is off.
The circuit operates as follows. If the voltage domains operate at different voltages, Sel is set high and the level conversion cell is employed to amplify the signal. If the voltage domains are at the same voltage, Sel is set low and the by-pass path is utilized. In addition, MN1 is enabled to ground the output of TG1, and TG3 is turned off to prevent current from flowing backwards.
¹ The voltage required for stable operation in memory elements is typically higher than that of standard logic cells [6].
978-1-5090-0493-5/16/$31.00 © 2016 European Union.
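The select logic of the by-pass circuit can be summarized as a small behavioral sketch. It is a logic-level abstraction rather than a transistor-level model, and it assumes that TG1 feeds the interface cell (Node 1) while TG2 forms the by-pass path, which is inferred from the description rather than stated explicitly. The names `GateStates`, `gate_states`, and `active_path` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateStates:
    tg1_on: bool  # input -> interface cell (Node 1); assumed mapping
    tg2_on: bool  # input -> by-pass path; assumed mapping
    tg3_on: bool  # interface cell output -> shared output
    mn1_on: bool  # pull-down that grounds Node 1


def gate_states(sel: bool) -> GateStates:
    """Map the control signal Sel to the on/off state of each device.

    Sel high: domains differ, so the interface (level-conversion) path is used.
    Sel low:  domains match, so the by-pass path is used, Node 1 is grounded
              by MN1, and TG3 isolates the by-passed cell from the output.
    """
    return GateStates(tg1_on=sel, tg2_on=not sel, tg3_on=sel, mn1_on=not sel)


def active_path(sel: bool) -> str:
    """Name of the path that carries the signal for a given Sel value."""
    return "interface" if sel else "bypass"


if __name__ == "__main__":
    for sel in (True, False):
        print(sel, active_path(sel), gate_states(sel))
```

In this abstraction TG1 and MN1 are never on at the same time, so the input driver is never shorted to ground through Node 1.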
The by-pass circuit is oblivious to the type of interface employed and is thus applicable to all types of interface circuits, such as isolation cells and retention flops in addition to voltage shifters. Furthermore, latches that synchronize the interface between blocks in two different voltage domains can be by-passed in high performance mode, where the frequency ratio between these blocks is 1:1. In the case where the interface is complex and contains several components connected in series [6], the benefits from by-passing this interface are higher. Alternatively, for signals with multiple fan-out, the demultiplexer can be extended by flanking each interface circuit with two transmission gates while maintaining the low-latency by-pass path for the iso-voltage operating conditions. In an MVS system, where a few fixed voltage levels are supported for different operating conditions, the power management unit can be programmed to generate the select signals [6]. In the same manner, select signals can be generated in a dynamic voltage and frequency scaling (DVFS) environment. The details of generating the select signals from the power management unit are beyond the scope of this work. Furthermore, in an adaptive voltage scaling (AVS) scheme, where a control loop is used to adjust the voltage of different blocks, the circuit proposed in [10] can be utilized for generating the control signal(s).
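Although the generation of the select signals is outside the scope of this work, the intent for a system with a few fixed operating points can be sketched as a lookup table. The mode names, voltage values, and the helper `select_signal` are hypothetical and are not taken from the paper or from any power management unit interface.

```python
def select_signal(v_src: float, v_dst: float, tol: float = 1e-3) -> bool:
    """Return True (Sel high, interface cell in use) when the two domains
    differ by more than tol volts, and False (Sel low, by-pass) otherwise."""
    return abs(v_src - v_dst) > tol


# Hypothetical operating points: mode -> (source-domain V, destination-domain V).
OPERATING_POINTS = {
    "low_power": (0.8, 1.0),
    "nominal": (1.0, 1.0),
    "high_performance": (1.2, 1.2),
}

# Table a power management unit could be programmed with (illustrative only).
SEL_TABLE = {mode: select_signal(v1, v2) for mode, (v1, v2) in OPERATING_POINTS.items()}

if __name__ == "__main__":
    for mode, sel in SEL_TABLE.items():
        print(f"{mode:>16}: Sel = {int(sel)}")
```

Because the supported voltage levels are fixed, the table can be computed once at design time, so no run-time voltage comparison is needed on the signal path.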
III. RESULTS
The simulation setup and the effectiveness of the proposed circuit in terms of propagation delay, power, and area overhead are presented in this section. The by-pass interface circuit is simulated with HSPICE® [11] at a 32 nm technology node [12], and pre-designed circuits for level-conversion and isolation cells are obtained from the Synopsys® 32 nm generic library [13]. The nominal operating voltage for the 32 nm CMOS technology is 1 V [12]. A typical ±0.2 V swing from the nominal supply voltage is therefore considered for low-power/high-performance conditions in a multi-voltage environment [14]. The proposed circuit is simulated in a variety of operating scenarios listed in Table I. Blocks 1 and 2 can be considered as the core and the L1 caches, respectively. Paths that traverse these blocks from the core to the L1 cache utilize level-up shifters. Conversely, paths from the L1 cache to the core use level-down shifters. The first three scenarios (A, B, C) represent the situation where the core operates at a reduced voltage as compared to the L1 cache to maximize power savings. Scenarios D and E, in contrast, represent the nominal and high performance modes, respectively, where both blocks have the same voltage.
A. Performance Analysis
The delay of the proposed by-pass circuit is investigated in this subsection. The feedback-based level-up shifter (FLS) [15] and the traditional level-down shifter (LSDN) [13] are utilized in our by-pass design for performance characterization. Emphasis is placed on the feedback-based level-up shifter, as it is broadly used and adds a large delay to the paths [6], [13].
The performance traits of the proposed by-pass circuit (PC), where the FLS is utilized in an MVS system, are illustrated in Fig. 2. When both blocks operate at the same voltage (scenarios D and E), the propagation delay decreases by 86%, on average, since the level-up shifters are by-passed. In the case where the blocks are supplied by the highest voltage assumed for the employed technology node, which indicates that the highest speed mode is enabled (scenario E), the delay decreases by 89%. Therefore, potential timing issues due to the considerable delay of the level-up shifters can be effectively alleviated. In contrast, if the interconnected blocks operate at different voltages (scenarios A, B, C), there is a delay overhead of 10.7%. However, this overhead is negligible relative to the delay of the entire path, as these modes represent the power-saving and consequently low-speed modes of an MVS system. Thus, timing closure is less of an issue in these low-speed modes. Furthermore, if isolation cells are also connected in series with the FLS, the performance improvement increases further to 92%, on average (scenarios D, E), while the delay overhead in the low-speed modes (scenarios A, B, C) remains low at 9.5%, on average (see Fig. 3). In addition, the performance traits of the proposed by-pass circuit where LSDN and isolation cells are utilized in an MVS system are listed in Table II. For scenarios A, B, and C, the delay overhead from the proposed circuit increases slightly to 15%, on average, as level-down shifters are faster than the FLS. However, the performance gains are noticeable (84% speedup on average) for the cases where the by-pass path is enabled (scenarios D, E). Moreover, paths that traverse critical blocks in two industrial circuits (Ind 1 and Ind 2) are simulated in a high performance mode (see Table I) to capture their latency for a variety of interface cells. The results are listed in Table III.
In the case where level shifters (FLS) are utilized, the latency of these paths is increased by 4.8% and 9.7% for Ind 1 and Ind 2, respectively, as compared to the case where no interface cells are employed. Furthermore, the delay increases more if isolation cells are also connected in series with the shifters (FLS+ISO), by 7.7% for paths in Ind 1 and 13.8% for paths in Ind 2. However, the additional delay of the PC (By-pass) is negligible, as the latency of these paths is increased by only 0.7% for Ind 1 and 2% for Ind 2. This behavior demonstrates that timing bottlenecks at the critical interfaces of blocks are effectively alleviated in high performance conditions in multi-Vdd systems by employing the proposed by-pass circuit. Note also that latches are utilized in the first industrial circuit path. Hence, in the case where the proposed circuit by-passes the latches in a high performance mode, a 5% decrease is observed in the latency of this path. Additionally, the support of disparate voltage supplies can be used conversely, where an intrinsically slower voltage domain can be overdriven to match the performance of the faster voltage domain.
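The averages reported earlier in this subsection for the FLS interface can be turned into a simple estimate of the interface delay with the proposed circuit. The sketch assumes the percentages apply uniformly to a standalone FLS delay measured under the same operating condition. The 100 ps baseline is a hypothetical value chosen for illustration, not a measurement from the paper.

```python
BYPASS_DELAY_REDUCTION = 0.86   # average over scenarios D and E (FLS by-passed)
SHIFTER_DELAY_OVERHEAD = 0.107  # scenarios A, B, C (FLS in use, plus by-pass gates)


def interface_delay(fls_delay_ps: float, same_voltage: bool) -> float:
    """Estimated interface delay (ps) with the by-pass circuit in place.

    fls_delay_ps is the delay of the standalone level-up shifter under the
    same operating condition (a hypothetical input, not a paper value).
    """
    if same_voltage:
        return fls_delay_ps * (1.0 - BYPASS_DELAY_REDUCTION)
    return fls_delay_ps * (1.0 + SHIFTER_DELAY_OVERHEAD)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical standalone FLS delay in ps
    print(f"by-pass path:   {interface_delay(baseline, True):.1f} ps")
    print(f"interface path: {interface_delay(baseline, False):.1f} ps")
```

The asymmetry is the point of the design: the large reduction applies in the modes where timing is critical, while the overhead is incurred only in the low-speed modes.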
B. Power Analysis
In this subsection, the power consumed by the proposed circuit is investigated. The proposed circuit dissipates up to 52% less power than the FLS, as depicted in Fig. 4, by employing the by-pass path where the same supply voltage is applied to both interfaced circuits (scenarios D, E). In contrast, an overhead (6.8% on average) exists when the level-up shifters are employed (scenarios A, B, C) due to the transmission gates. This power overhead drops to 5%, on average (scenarios A, B, C), where more cells are by-passed, such as the FLS in series with isolation cells (see Table IV). In this situation, the power improvements are greater (up to 58.2%) where the same voltage is applied to both blocks (scenarios D, E). Moreover, in scenarios A, B, and C, where level-down shifters are utilized, the power overhead from the proposed circuit increases slightly to 9.1%, on average. This is because level-down shifters consume less power than the FLS, and thus the relative impact of the proposed circuit is larger. However, the power improvements are noticeable (46% on average) for the cases where the level-down shifters are by-passed (scenarios D, E).
Fig. 4. Power dissipation of the proposed by-pass circuit compared to traditional level-up shifters.
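The trade-off between the power overhead in scenarios A, B, C and the savings in scenarios D, E depends on how much time a system spends in each mode. The sketch below estimates the time-averaged interface power under stated assumptions: the baseline powers and the time fraction are hypothetical inputs, and the 52% saving is the best-case figure reported for the FLS, so the result is optimistic.

```python
POWER_OVERHEAD_DIFF = 0.068  # average overhead, scenarios A, B, C (FLS in use)
POWER_SAVING_SAME = 0.52     # best-case saving, scenarios D, E (FLS by-passed)


def average_interface_power(p_fls_diff: float, p_fls_same: float, frac_same: float):
    """Time-averaged interface power with and without the by-pass circuit.

    p_fls_diff: standalone FLS power when the domains differ (hypothetical input)
    p_fls_same: standalone FLS power when the domains match (hypothetical input)
    frac_same:  fraction of time spent in iso-voltage modes, between 0 and 1
    Returns a (with_bypass, without_bypass) tuple in the units of the inputs.
    """
    if not 0.0 <= frac_same <= 1.0:
        raise ValueError("frac_same must lie in [0, 1]")
    frac_diff = 1.0 - frac_same
    without = frac_diff * p_fls_diff + frac_same * p_fls_same
    with_pc = (frac_diff * p_fls_diff * (1.0 + POWER_OVERHEAD_DIFF)
               + frac_same * p_fls_same * (1.0 - POWER_SAVING_SAME))
    return with_pc, without


if __name__ == "__main__":
    # Hypothetical: equal baseline power in both modes, half the time iso-voltage.
    with_pc, without = average_interface_power(1.0, 1.0, 0.5)
    print(f"with by-pass: {with_pc:.3f}, without: {without:.3f}")
```

Under these assumptions the by-pass circuit reduces the average interface power whenever a non-trivial share of time is spent in the iso-voltage modes, since the saving per unit time is much larger than the overhead.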
C. Area Analysis
In this subsection, the area overhead of the proposed circuit is investigated. The proposed circuit poses an area overhead of 55%, on average, as compared to the traditional level-up/down shifters (see Table V). In the case where more cells, such as isolation cells alongside level shifters, are by-passed, the area overhead of the proposed circuit drops to 41%, on average. Although this overhead is due to the extra transmission gates of the proposed circuit, the total increase in the area of a block using this circuit is negligible, as blocks typically include hundreds of thousands of cells. In addition, the proposed circuit can be employed selectively at interfaces where different blocks have the same voltage in specific operating conditions. The area overhead on designs where the MVS technique is employed is listed in Table VI. For a small design, such as Ind 2, the area increases by up to 3.5% when the proposed circuit is utilized, as compared to the design where traditional MVS cells are employed. On the other hand, for a bigger design (Ind 1), the proposed circuit poses a 1.9% area overhead relative to the design with traditional interface cells.
IV. CONCLUSIONS
In this paper, a by-pass circuit for multi-voltage scaling systems is presented. The key idea is that in a multi-voltage scaling environment, different blocks have the same voltage in specific operating conditions. Consequently, the interface circuits can be by-passed to avoid performance and power losses where high speed operation is required. The proposed circuit is simulated under different operating scenarios at a 32 nm technology node. Compared with the traditional level-up and level-down shifters, the proposed by-pass circuit enhances speed by up to 89% and decreases power consumption by up to 52% where the interfaced blocks operate at the same supply voltage. In the case where the proposed circuit is utilized in critical paths of industrial circuits, the latency increase is negligible (0.7% and 2.1%) as compared to the latency of the paths where no level-conversion circuit is employed. This behavior demonstrates that traditional timing bottlenecks at the block interfaces are alleviated in high performance conditions for MVS systems by employing the proposed circuit. Thus, MVS can be applied to smaller blocks (e.g., core and L1 caches).
