The power and performance benefits of scaling are increasingly lost to worst-case margins as the uncertainty of device characteristics grows. Adaptive techniques can dynamically adjust the margins required to tolerate variability and recover a significant part of the benefits lost to worst-case conditions. Additionally, the stringent timing requirements for the synthesis of low-skew clock trees involve higher power consumption and limit the adaptability to varying operating conditions. This paper introduces an elastic clocking scheme as an adaptive technique to confront variability and provide substantial power savings by dynamically adjusting to operating conditions. The synthesis and sign-off analysis of elastic clocks are fully automated, and the required changes to the design flow are handled by automated design flow support.
INTRODUCTION
Increasing process variability and decreasing operating voltage, as feature sizes scale down, reduce potential power-performance gains. Statistical design methods can reduce overdesign due to unrealistic worst-case assumptions [1], [6]. Large-volume parts can be binned and sold at different price points to recover some portion of the performance versus yield trade-off. However, binning is not applicable to ASICs for commercial reasons, and statistical timing analysis does not address the margins needed for environmental variations such as temperature and voltage changes.
Ad-hoc recovery of design margins is commonplace in today's world. Off-the-shelf processor parts can be over-clocked well beyond their rated speeds by employing sophisticated cooling. By the same token, reducing the supply voltage to run a fast part at the specified frequency can save energy and power, which are becoming a primary concern for all electronic systems. Adaptive Voltage Scaling (AVS) provides this capability by sensing on-chip conditions dynamically and reducing or increasing the supply voltage to run the part at the required speed [2]. The power gains in AVS, however, are limited by the ability to predict data path delays across the variation space [3].
AVS addresses static (process) or slowly varying (temperature, and to some degree aging) variations. The response time of the voltage regulation loop is usually hundreds or thousands of clock cycles. Cycle-to-cycle variations, such as IR-drop due to dynamic loads, must be handled by increasing the margins, thus reducing the achievable power gains.
Fine-grained application of AVS to individual cores or blocks in an SOC further improves power gains, based on load and performance requirements. However, the clock skew due to voltage domain crossing quickly becomes the limiting factor for performance, and increases the hold time fixing overhead. A solution to overcome this limitation is the adoption of asynchronous communication techniques between blocks [4]. The GALS (Globally Asynchronous Locally Synchronous) approach provides the flexibility to have each block driven by its own separate clock, and possibly supply voltage, while still enabling safe communication with other blocks. The main drawback of a GALS approach is the synchronization latency required to cross different clock domains, which may have a significant impact on the performance of the system. Elastic clocks, where the period is dynamically adjusted to data path delays at the current operating conditions, provide the ability to minimize AVS margins due to IR-drop and clock skew. They also reduce latency in inter-block communication due to the asynchronous nature of the local clock controller protocol. Elastic clocks are implemented in a synchronous design flow through the desynchronization process.
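The difference between a fixed worst-case clock and an elastic clock can be illustrated with a toy model. The sketch below is not the circuit described in this paper: it assumes an idealized elastic clock whose period in each cycle equals the data path delay seen in that cycle plus a small tracking margin, and a conventional clock whose period is the worst observed delay plus a larger margin. The function names, the delay values, and both margin values are hypothetical and chosen only for illustration.

```python
def fixed_clock_period(cycle_delays_ns, margin=0.10):
    """Period of a conventional clock: worst-case delay plus margin, paid in every cycle."""
    return max(cycle_delays_ns) * (1.0 + margin)


def elastic_clock_periods(cycle_delays_ns, margin=0.03):
    """Per-cycle periods of an idealized elastic clock that tracks the delay of each cycle."""
    return [d * (1.0 + margin) for d in cycle_delays_ns]


# Hypothetical per-cycle critical delays; the third cycle models an IR-drop event.
delays = [1.00, 1.05, 1.30, 1.02, 0.98]

fixed_total = fixed_clock_period(delays) * len(delays)
elastic_periods = elastic_clock_periods(delays)
elastic_total = sum(elastic_periods)

print(f"fixed clock total time:   {fixed_total:.3f} ns")
print(f"elastic clock total time: {elastic_total:.3f} ns")
```

In this model the elastic clock stretches only the cycle affected by the IR-drop event, whereas the fixed clock carries that worst case in every cycle.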
DESYNCHRONIZATION
The separation between functionality and performance has always been a cornerstone of digital circuit design, enabling the development of tools that support functional specification, using synthesizable Verilog or VHDL; logic synthesis; physical design; equivalence checking and static timing analysis. Even testing schemes based on coupling of full stuck-at functional testing with limited at-speed performance testing benefit from this separation.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
Asynchronous circuits are interesting for the performance, power, modularity and Electro-Magnetic Interference properties. However, they so far suffered from a fundamental violation of the separation between functionality and performance, which required new tools and languages to be developed and learned, thus making their widespread adoption virtually impossible. In recent years, extensive work has been done on a local clock signal generation technique known as "desynchronization" [5], [9].
The desynchronization process is automated by a tool that essentially augments Clock Tree Synthesis in a standard design flow, by creating localized clocks, and synthesizing handshake logic between the asynchronous controllers that generate them, thus creating elastic clocks. This control layer contains matched delays to guarantee the appropriate synchronization timing (setup and hold) between all pairs of communicating registers.
Desynchronization retains the separation between functionality and timing throughout the design flow, because it actually does not modify the logic and registers (apart from some transformations of flip-flops into latch pairs). In fact, the elastic circuit as a result of desynchronization still works using the notion of "cycles", except that now the cycles are determined by local combinational logic delays (including the effects of PVT variations) and handshaking, rather than by an external PLL or synchronous clock reference. As such, it permits the re-use of IP blocks, design tools and design flows that were originally created for synchronous design, but provides the advantages of asynchronous circuit to a "traditional" designer.
ELASTIC CLOCKS
The control layer, and the corresponding delay elements that generate the clocking for different partitions of the circuit, configure the elastic clocks that dynamically adjust their frequency to the data path delays. The delay elements are located close to the associated data path blocks, increasing the spatial correlation between the data path and the delay elements. Hence, the margins required to cover the variability can be reduced. The control layer and the delay elements can immediately react to the cycle-to-cycle variations that affect the data path, such as IR-drop and voltage ripple, and can delay the clock pulses as needed. This guarantees that the circuit operates correctly in a wider range of static or dynamic variations than traditional synchronous AVS.
The delay lines are synthesized such that each pair of communicating registers is always clocked to ensure setup and hold times. The false paths are ignored, as in usual Static Timing Analysis, and do not contribute to the synthesis of the delay lines. Multi-cycle paths require special attention. During calculation of the longest delay, the delays of the multi-cycle paths are normalized with the multi-cycle factor, and they contribute based on their slack. As a simplified example, if the minimum slack path between a pair of registers is due to a multi-cycle path of factor 3, then the delay line is synthesized such that it has one third the delay of the multi-cycle path.
Elastic clocks are generated on a per clock domain basis; multiple clock domains are handled by desynchronizing each domain separately. In this scheme, asynchronous clock domains are handled without any extra steps. The communication between asynchronous clock domains is left untouched through insertion of elastic clocks. Synchronous clock domains require explicit handshake signals to control data transfer between them.
The area overhead of elastic clocks is minimized by transforming only the interface flip-flops [...] latches. Interface flip-flops are defined [...] input from other partitions. Internal [...] from their partition in their transitive [...] automatically created for the master latches [...] connected to the slave enable trees, [...] flip-flops that are internal to the partition. [...] master enable trees is minimized by [...] number and location of the required [...] logic has a constant overhead of a few [...] hence the relative area increase depends [...] partition connections which is mirrored [...] overall area overhead for a partition size [...] gates is about 2%.
Scan testing is supported with the [...] non-overlapping clocks applied to [...] latches, hence operating them [...] overlapping clocks can be generated [...] or provided externally. Overall, the [...] scan-based methodology and ATPG [...] change is necessary for testing flows.
Clock gating is a widely used technique [...] power. During desynchronization, [...] are copied to slave enable trees. If [...] registers across multiple slave enables [...] cloned. The data path delays include [...] matched delay lines are adjusted to [...] arrival times, if they become critical.
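The delay-line sizing rule described earlier in this section (false paths ignored, multi-cycle paths normalized by their multi-cycle factor) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the synthesis algorithm of the tool: the `TimingPath` record and the `required_delay_line_ns` function are hypothetical names, and setup times, hold checks, and safety margins are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class TimingPath:
    """One register-to-register path (hypothetical record, for illustration only)."""
    delay_ns: float
    multicycle_factor: int = 1   # 1 means an ordinary single-cycle path
    false_path: bool = False


def required_delay_line_ns(paths):
    """Smallest matched delay that covers every true path, per cycle.

    False paths are skipped; a multi-cycle path contributes its delay
    divided by its multi-cycle factor.
    """
    effective = [p.delay_ns / p.multicycle_factor for p in paths if not p.false_path]
    if not effective:
        raise ValueError("no true timing paths between these registers")
    return max(effective)


paths = [
    TimingPath(delay_ns=2.0),                       # single-cycle path
    TimingPath(delay_ns=9.0, multicycle_factor=3),  # multi-cycle path of factor 3
    TimingPath(delay_ns=50.0, false_path=True),     # ignored, as in static timing analysis
]
delay_line = required_delay_line_ns(paths)
```

With these hypothetical numbers, the multi-cycle path of factor 3 is the limiting one, so the delay line receives one third of its delay, which matches the simplified example given in the text.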
DESIGN FLOW
The insertion of elastic clocks is done [...] level netlist, just before clock tree synthesis. [...] is first partitioned into a set of logic blocks [...] created for each block. Handshake [...] synchronize elastic clocks that drive [...] communicating block. The period [...] determined by the post-placement timing [...] corresponding block. [...] (Figure 2) for proper [...] to match data path delays at all sign-off corners. The margins between the delay elements and the corresponding data path delays can be optimized using statistical methods [7].
Figure 2. Elastic clock connections
During sign-off, explicit timing checks are done between the data path and the delay elements at all corners, to guarantee the correct clocking of the logic to meet the setup and hold conditions. In Figure 2, this corresponds to verifying that the data path delay between partitions A and B is always smaller than the delay between the associated controllers ("Delay") at all design corners, including the clock-tree insertion delays in each partition and the setup times of the receiving registers. These checks are performed using standard industrial sign-off tools by means of automatically generated scripts.
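The setup-side check described above can be illustrated with a short sketch. It is not the generated sign-off script, and it does not use the interface of any sign-off tool. It assumes a simplified timing model in which data launched in partition A arrives after the launch clock-tree insertion delay plus the data path delay, and must be stable one setup time before the capture edge in partition B, which occurs after the matched delay plus the capture clock-tree insertion delay. The `Corner` record, the function names, and all numbers are hypothetical; hold checks are omitted.

```python
from dataclasses import dataclass


@dataclass
class Corner:
    """Delays extracted at one sign-off corner (hypothetical record and values)."""
    name: str
    data_path_ns: float          # register-to-register delay from partition A to B
    matched_delay_ns: float      # delay between the two controllers ("Delay")
    launch_insertion_ns: float   # clock-tree insertion delay in partition A
    capture_insertion_ns: float  # clock-tree insertion delay in partition B
    setup_ns: float              # setup time of the receiving registers


def setup_slack_ns(c):
    """Capture-edge time minus required data arrival time; negative means a violation."""
    required = c.launch_insertion_ns + c.data_path_ns + c.setup_ns
    capture = c.matched_delay_ns + c.capture_insertion_ns
    return capture - required


def failing_corners(corners):
    """Names of the corners where the matched delay does not cover the data path."""
    return [c.name for c in corners if setup_slack_ns(c) < 0.0]


corners = [
    Corner("slow", 2.00, 2.30, 0.30, 0.30, 0.10),
    Corner("fast", 1.00, 1.05, 0.15, 0.10, 0.05),
]
violations = failing_corners(corners)
```

In this made-up data set the fast corner fails even though the matched delay exceeds the data path delay, because the insertion delay mismatch and the setup time consume the margin. This is why the check has to include both terms at every corner.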
At the top level, each elastic block is connected to the desynchronized on-chip network asynchronously, by using the handshake signals. The handshake signals are used to "sense" the operating conditions and drive a voltage controller circuitry. This provides a unique opportunity to apply AVS where the sensing circuit is also an elastic clock generator, thus reducing the stringent clock skew requirements across multiple cores or blocks and reducing margins (e.g. due to cycle-to-cycle IR-drop). These signals are also used to control the voltage for a fine-grain AVS [8].
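One way to picture how a measured handshake period could drive a voltage controller is the step-wise regulation sketch below. The paper does not specify the control law, so everything here is an assumption: the `next_voltage` function, the step size, the deadband, and the voltage limits are hypothetical, and the model is purely behavioral.

```python
def next_voltage(v_now, measured_period_ns, target_period_ns,
                 step_v=0.0125, deadband=0.02, v_min=0.70, v_max=1.10):
    """One update of a hypothetical step-wise AVS controller.

    The elastic clock period observed on the handshake signals acts as the
    speed sensor: a period above the target means the block is too slow, so
    the supply is raised; a period comfortably below the target leaves room
    to lower the supply and save power.
    """
    if measured_period_ns > target_period_ns:
        v_next = v_now + step_v
    elif measured_period_ns < target_period_ns * (1.0 - deadband):
        v_next = v_now - step_v
    else:
        v_next = v_now  # inside the deadband: hold the voltage to avoid oscillation
    return min(v_max, max(v_min, v_next))


raised = next_voltage(0.90, measured_period_ns=1.20, target_period_ns=1.00)
lowered = next_voltage(0.90, measured_period_ns=0.90, target_period_ns=1.00)
held = next_voltage(0.90, measured_period_ns=0.99, target_period_ns=1.00)
```

The deadband is a design choice of this sketch. It keeps the controller from toggling between two voltage steps when the measured period sits just under the target.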
CONCLUSION
An AVS scheme using elastic clocks eliminates stringent clock skew requirements across multiple cores and blocks, and reduces margins due to cycle-to-cycle variations. The asynchronous nature of elastic clocks avoids the latency penalty introduced by GALS schemes, and makes fine-grain voltage scaling possible without performance overhead.
