Abstract. Clock network synthesis is an important part of digital integrated circuit design. To further reduce the effects of clock skew and on-chip variation, this paper implements a clock mesh structure with the Encounter EDI tool, building on traditional clock tree synthesis. Experimental results validate the advantages of clock mesh in optimizing clock skew and on-chip variation.
Introduction
In digital integrated circuit design, the clock signal is the reference for data transfer and plays a decisive role in the functionality, performance, and stability of synchronous digital circuits, so the characteristics and implementation of the clock network receive particular attention. As designs grow in scale, process nodes shrink, and the requirements on system clock characteristics keep rising, clock network structures become more complex, and clock network synthesis has become an increasingly important part of digital integrated circuit back-end design. Especially at 65nm, 40nm, and below, timing closure has become a major bottleneck in back-end design, and traditional clock tree synthesis, because of its inherent limitations, cannot meet the requirements of high-performance clocks.
Based on a comparative analysis with traditional clock tree synthesis, this paper uses clock mesh technology to implement the clock network of a module. Experimental results show that clock mesh reduces clock insertion delay and clock skew while improving the on-chip variation of the clock network, which verifies the advantages of clock mesh.
Clock Network
As process nodes shrink, clock skew and on-chip variation have become decisive factors in circuit performance. The purpose of clock network synthesis is to minimize clock skew, phase delay, and on-chip variation, and to reduce clock network power and interconnect coupling effects as much as possible [1].
Clock Skew
The total delay of a path comprises the standard-cell delays and the wire delays:
$$T_{total} = \sum T_{cell} + \sum T_{wire} \qquad (1)$$
where the sums run over the delays T_cell of the standard cells and T_wire of the wire segments along the path. Ideally, the delay from the clock root to the clock pins of all registers in a clock domain should be identical. In practice, however, differences in clock wire length and load cause the clock signal to arrive at different register clock pins at different times. This difference is called clock skew [2]. Figure 1 shows a partial timing path [3] in which registers reg1 and reg2 are clocked from the same source, CLK. Through the clock tree, the clock signal reaches reg1 and reg2 with delays T1 and T2, respectively, and TS denotes the clock skew between them, i.e., the difference between T1 and T2.
International Conference on Information Technology and Management Innovation (ICITMI 2015)
In theory, clock skew can be driven toward zero by inserting buffers and adjusting the wiring between registers.
Fig. 1 A partial timing path.
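Eq. (1) and the skew definition can be illustrated with a small Python sketch. The cell and wire delays are hypothetical values in picoseconds, and the sign convention TS = T2 - T1 is an assumption made here for illustration. The last step models the balancing idea from the text by adding one extra cell delay, standing in for an inserted buffer, on the faster path.

```python
from typing import Sequence


def path_delay(cell_delays: Sequence[float], wire_delays: Sequence[float]) -> float:
    """Eq. (1): total path delay = sum of cell delays + sum of wire delays."""
    return sum(cell_delays) + sum(wire_delays)


def clock_skew(t1: float, t2: float) -> float:
    """Skew between two clock pins, using the (assumed) convention TS = T2 - T1."""
    return t2 - t1


# Hypothetical clock paths (ps) from the common source CLK to reg1 and reg2.
t1 = path_delay(cell_delays=[35.0, 40.0], wire_delays=[5.0, 8.0])
t2 = path_delay(cell_delays=[35.0, 40.0, 30.0], wire_delays=[5.0, 8.0, 6.0])
ts = clock_skew(t1, t2)
print(f"T1 = {t1} ps, T2 = {t2} ps, TS = {ts} ps")  # T1 = 88.0 ps, T2 = 124.0 ps, TS = 36.0 ps

# Balancing: pad the faster (reg1) path with an extra delay of TS,
# modelled here as one more cell delay, so that both arrivals match.
t1_balanced = path_delay(cell_delays=[35.0, 40.0, ts], wire_delays=[5.0, 8.0])
print(f"skew after padding: {clock_skew(t1_balanced, t2)} ps")  # skew after padding: 0.0 ps
```

In a real flow the padding comes from discrete buffer cells and wire detours, so the residual skew is small but rarely exactly zero.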

On-Chip Variation
On-chip variation describes the differences in process, voltage, and temperature parameters across different parts of the same chip, and it directly affects circuit timing. For example, two identical buffer cells with the same input transition and the same load can have different cell delays because they sit at different locations on the chip. At nanometer technology nodes, on-chip variation imposes increasingly stringent timing requirements.
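A common way for static timing analysis to account for on-chip variation is to scale path delays with flat derating factors: the launch clock and data paths are slowed by a late derate, and the capture clock path is sped up by an early derate. The Python sketch below applies this to a setup check. The derate values (1.05 and 0.95), the delays, and the helper name `setup_slack` are hypothetical, and common path pessimism removal is ignored. It shows why a longer clock insertion delay makes a path more sensitive to on-chip variation.

```python
def setup_slack(period, launch_clk, data, capture_clk, setup,
                late_derate=1.0, early_derate=1.0):
    """Setup slack (ps) with flat OCV derates (hypothetical helper).

    The launch clock and data delays are scaled by the late derate,
    the capture clock delay by the early derate. Common path
    pessimism removal is deliberately ignored in this sketch.
    """
    arrival = (launch_clk + data) * late_derate
    required = period + capture_clk * early_derate - setup
    return required - arrival


PERIOD, DATA, SETUP = 1000.0, 500.0, 50.0  # ps, hypothetical values
for insertion in (800.0, 300.0):  # long vs short clock insertion delay
    nominal = setup_slack(PERIOD, insertion, DATA, insertion, SETUP)
    derated = setup_slack(PERIOD, insertion, DATA, insertion, SETUP,
                          late_derate=1.05, early_derate=0.95)
    print(f"insertion {insertion:.0f} ps: nominal slack {nominal:.1f} ps, "
          f"OCV slack {derated:.1f} ps, OCV penalty {nominal - derated:.1f} ps")
```

With these numbers the nominal slack is 450 ps in both cases, but the OCV penalty grows from 55 ps to 105 ps as the insertion delay grows from 300 ps to 800 ps.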
Clock Tree Synthesis
Currently, most digital integrated circuit designs implement the clock network with traditional clock tree synthesis. In the clock tree structure shown in Figure 2, the clock signal starts at the root node, follows the tree topology, and finally reaches the leaf nodes, i.e., the clock input pins of registers or other clock endpoints.
Fig. 2 Clock tree structure.
Clock tree synthesis inserts inverters and buffers into the clock paths; by driving the clock through multiple levels of inverters and buffers, it balances the delay of each clock path. The method is well supported by EDA tool flows and is widely used. Its limitations are modest clock network performance, many cascaded clock stages, large clock insertion delay, clock skew that is difficult to control, and greater sensitivity to on-chip variation.
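To see why the number of cascaded stages, and with it the insertion delay, grows with design size, the Python sketch below counts the buffers per level of an idealised balanced clock tree. It assumes a fixed maximum fanout per buffer and a uniform stage delay; both values are hypothetical, and real clock tree synthesis also accounts for wire delay, placement, and cell sizing. The sink count 4332 is the register count of the design in the Design and Analysis section.

```python
def buffers_per_level(num_sinks: int, max_fanout: int) -> list[int]:
    """Buffers needed at each level of an ideal balanced tree, root level first."""
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need num_sinks >= 1 and max_fanout >= 2")
    counts = []
    load = num_sinks
    while True:
        n = -(-load // max_fanout)  # ceiling division: buffers to drive this level
        counts.append(n)
        if n == 1:
            break
        load = n
    return counts[::-1]


def insertion_delay(num_sinks: int, max_fanout: int, stage_delay_ps: float) -> float:
    """Idealised insertion delay: one uniform stage delay per buffer level."""
    return len(buffers_per_level(num_sinks, max_fanout)) * stage_delay_ps


for fanout in (8, 16):  # hypothetical maximum fanouts
    levels = buffers_per_level(4332, fanout)
    print(f"fanout {fanout}: levels {levels}, "
          f"insertion delay {insertion_delay(4332, fanout, 50.0):.0f} ps")
```

The level count grows roughly with the logarithm of the sink count, so every additional level adds a full stage delay, and its variation, to every clock path.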
Clock Mesh
A clock mesh divides a clock domain into many grid areas. The clock signal reaches the mesh nodes through the pre-driver buffer chain, and the clock leaf cells obtain the required clock signal from a nearby grid line.
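The rule that each leaf cell takes its clock from a nearby grid line can be sketched in one dimension: with vertical grid lines at a fixed pitch, each register connects to the closest line, so inside the grid its stub is never longer than half the pitch. The grid origin, pitch, line count, register coordinates, and helper names below are hypothetical (coordinates in micrometres).

```python
def nearest_grid_line(x: float, origin: float, pitch: float, num_lines: int) -> int:
    """Index of the grid line closest to x, clamped to the lines that exist."""
    k = round((x - origin) / pitch)
    return max(0, min(num_lines - 1, k))


def stub_length(x: float, origin: float, pitch: float, num_lines: int) -> float:
    """Length of the local connection from x to its nearest grid line."""
    k = nearest_grid_line(x, origin, pitch, num_lines)
    return abs(x - (origin + k * pitch))


ORIGIN, PITCH, LINES = 0.0, 50.0, 4   # grid lines at x = 0, 50, 100, 150 um
registers = [12.0, 47.5, 80.0, 139.0]  # hypothetical register x-coordinates
for x in registers:
    print(x, nearest_grid_line(x, ORIGIN, PITCH, LINES), stub_length(x, ORIGIN, PITCH, LINES))
```

A smaller pitch shortens the worst-case stub and therefore the local delay differences, at the cost of more grid wiring.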
As shown in Figure 3, the clock mesh structure consists of three parts: the top level chain, the global mesh, and the local trees.
Fig. 3 Clock mesh structure [4].
The top level chain is the cascaded buffer chain from the clock root to the first stage of pre-driver buffers. The clock signal starts at the clock root, enters the module across its boundary, and is carried by the top level chain to a suitable position inside the module from which the global mesh is driven; this also reduces the clock skew caused by the position of the clock root. Clock gating and other structures can also be inserted in this part. The top level chain of the clock mesh structure is optional.
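The top level chain is a cascade of buffers of increasing size. A textbook way to size such a chain (tapered buffering, as in logical-effort analysis) is to let each stage drive a fixed multiple of its own input capacitance. The Python sketch below applies that rule; the target stage fanout of 4, the capacitance values, and the helper name `tapered_chain` are assumptions, and it is not how Encounter selects buffers.

```python
import math


def tapered_chain(c_in: float, c_load: float, target_fanout: float = 4.0) -> list[float]:
    """Input capacitance of each stage of a geometrically tapered buffer chain.

    The number of stages is chosen so the per-stage fanout is close to
    target_fanout; the exact fanout f then satisfies f ** stages == c_load / c_in.
    """
    if c_load <= c_in:
        return [c_in]  # a single stage already suffices
    ratio = c_load / c_in
    stages = max(1, round(math.log(ratio) / math.log(target_fanout)))
    f = ratio ** (1.0 / stages)
    return [c_in * f ** i for i in range(stages)]


# Hypothetical: 2 fF input at the clock root, 2048 fF total pre-driver input load.
chain = tapered_chain(2.0, 2048.0)
print([round(c, 1) for c in chain])  # [2.0, 8.0, 32.0, 128.0, 512.0]
```

The last stage (512 fF input) drives the 2048 fF load, so every stage sees the same fanout of 4.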
The global mesh uses two layers of metal wiring that cross each other to spread the clock signal delivered by the top level chain across the whole clock domain, keeping clock skew and clock delay well under control. The global mesh consists of pre-drive buffers and grid lines.
The pre-drive buffers must have a large enough drive strength to guarantee sufficient drive capability. Because even a single buffer with the largest drive strength cannot meet this requirement, multiple buffers drive the global clock grid in parallel. To withstand the current of these parallel drivers, the grid requires metal wiring with multiple widths and multiple pitches. The branches, which connect to the trunk through vias, also require multi-width, multi-pitch metal wiring.
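Why a single buffer is not enough can be seen from a rough load estimate. The Python sketch below totals the wire length of a square mesh, converts it to capacitance with an assumed per-unit-length value, adds an assumed local-tree input load, and divides by an assumed maximum load per pre-drive buffer. All numbers and helper names are hypothetical; a real flow also checks slew, current density, and other limits.

```python
import math


def mesh_wire_length(width_um: float, height_um: float, pitch_um: float) -> float:
    """Total length of a full mesh: vertical lines span the height, horizontal lines the width."""
    n_vertical = int(width_um // pitch_um) + 1
    n_horizontal = int(height_um // pitch_um) + 1
    return n_vertical * height_um + n_horizontal * width_um


def predrivers_needed(total_cap_ff: float, max_cap_per_driver_ff: float) -> int:
    """Number of identical parallel drivers so each sees at most its maximum load."""
    return math.ceil(total_cap_ff / max_cap_per_driver_ff)


length = mesh_wire_length(400.0, 400.0, 50.0)           # 9 + 9 lines of 400 um
wire_cap = length * 0.25                                # assumed 0.25 fF/um
pin_cap = 200 * 2.0                                     # assumed 200 local-tree inputs x 2 fF
drivers = predrivers_needed(wire_cap + pin_cap, 150.0)  # assumed 150 fF per driver
print(length, wire_cap + pin_cap, drivers)              # 7200.0 2200.0 15
```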
In Encounter, a clock mesh can take one of two structures: H-tree and fishbone. As shown in Figure 4, the main advantage of the H-tree structure is its high symmetry. Because the pre-drive buffers are evenly distributed along the trunk, it controls clock skew well in clock domains with a large number of flip-flops. Compared with other structures, however, the H-tree structure consumes more routing resources and more power. In the fishbone structure shown in Figure 5, the trunk of the final grid stage is driven by multi-stage buffers, and the branches, which cross the trunk orthogonally, are driven by the trunk. The parallel pre-drive buffers have to be placed as close as possible to the lower-level drivers [4].
The local tree connects the clock signal from the global grid to the clock pin of each leaf node. Compared with driving the leaf nodes directly from the grid, it better balances clock skew and on-chip variation while reducing the load and power consumption. Local trees are also optional; when they are used, the input port of the clock source has to be specified. With a more reasonable local signal distribution, clock skew, signal transition time, and other parameters can be adjusted appropriately.
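The symmetry argument for the H-tree structure can be checked geometrically. The Python sketch below recursively generates the tap points of an ideal H-tree over a unit square, alternating horizontal and vertical segments and halving the segment length after each vertical level, and records the wire length from the centre to every tap. It is a geometric illustration with hypothetical dimensions and a hypothetical helper name, not the Encounter mesh generator.

```python
def h_tree_taps(x: float, y: float, half_len: float, levels: int,
                horizontal: bool = True, dist: float = 0.0) -> list[tuple[float, float, float]]:
    """Return (x, y, wire_length_from_root) for every leaf tap of an ideal H-tree.

    Each level splits a segment of half-length half_len in two directions;
    after every vertical level the next H is half the size.
    """
    if levels == 0:
        return [(x, y, dist)]
    next_half = half_len if horizontal else half_len / 2
    taps = []
    for sign in (-1, 1):
        nx = x + sign * half_len if horizontal else x
        ny = y if horizontal else y + sign * half_len
        taps += h_tree_taps(nx, ny, next_half, levels - 1, not horizontal, dist + half_len)
    return taps


taps = h_tree_taps(0.5, 0.5, 0.25, levels=4)   # 16 taps on a unit square
print(sorted({t[0] for t in taps}))            # [0.125, 0.375, 0.625, 0.875]
print({t[2] for t in taps})                    # {0.75}
```

Every tap is reached over the same wire length, which is the geometric reason the H-tree balances skew well; the price is the extra wiring the text mentions.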
The wiring pitch of the global mesh directly affects the parasitic coupling capacitance: increasing the pitch appropriately reduces noise, and increasing the wire width reduces delay, and both must be balanced against routing resource consumption. Existing routing, placement blockages, memories, and other obstructions in the design can prevent the trunk and branch buffers from being placed properly, so their layout and placement have to be planned carefully.
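The pitch and width trade-off can be estimated with first-order parallel-plate models: coupling capacitance to the two neighbours scales with thickness over spacing, ground capacitance with width over dielectric height, and resistance inversely with the wire cross-section; the delay of a distributed RC line is approximated by the Elmore value 0.5 * r * c * L^2. The Python sketch below uses placeholder geometry and material values (not SMIC 65nm data), ignores fringing fields, and reports the track pitch (width + spacing) as a proxy for routing resource use.

```python
EPS = 3.9 * 8.854e-12   # F/m, assumed SiO2-like dielectric permittivity
RHO = 2.2e-8            # ohm*m, assumed copper-like resistivity
T = 0.2e-6              # m, assumed metal thickness
H = 0.2e-6              # m, assumed dielectric height to the layer below


def wire_rc(width: float, spacing: float) -> tuple[float, float]:
    """Resistance (ohm/m) and total capacitance (F/m) per unit length."""
    r = RHO / (width * T)
    c_coupling = 2 * EPS * T / spacing   # two parallel neighbours
    c_ground = EPS * width / H           # plate to the layer below
    return r, c_coupling + c_ground


def elmore_delay(length: float, width: float, spacing: float) -> float:
    """Elmore delay (s) of a distributed RC wire: 0.5 * r * c * length^2."""
    r, c = wire_rc(width, spacing)
    return 0.5 * r * c * length ** 2


LENGTH = 400e-6  # m, hypothetical grid-line length
for w, s in ((0.1e-6, 0.1e-6), (0.1e-6, 0.2e-6), (0.2e-6, 0.1e-6)):
    print(f"width {w * 1e6:.1f} um, spacing {s * 1e6:.1f} um: "
          f"delay {elmore_delay(LENGTH, w, s) * 1e12:.2f} ps, track pitch {(w + s) * 1e6:.1f} um")
```

In this model, doubling either the spacing or the width lowers the delay, but each also widens the track pitch from 0.2 um to 0.3 um, which is the routing resource cost that has to be weighed.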
Compared with a clock tree, a clock mesh has smaller clock skew. In a clock mesh, the propagation time from the clock root to the global mesh is essentially the same everywhere; only the propagation time from the global mesh through the local trees to the registers differs. Because each register is connected to the nearest grid line, this difference, and hence the clock skew, is smaller than when the clock travels from the source through a clock tree. Because the mesh is driven by multiple buffers in parallel, fewer cascaded stages are needed between the clock source and the register clock pins, so its insertion delay is smaller than that of a clock tree. In addition, the redundant structure of the clock grid makes it more robust against on-chip variation.
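The averaging argument can be illustrated with a toy statistical model in Python. Every buffer stage gets a random delay. In the tree, leaves inherit the delays of their shared ancestors through five cascaded levels; in the mesh, a shared top chain feeds mesh nodes whose arrival is idealised as the average of several parallel drivers, followed by one local stage. The stage delay, variation, structure sizes, and helper names are hypothetical, and averaging the drivers is only a crude stand-in for the shorted grid; this is not a circuit simulation.

```python
import random
import statistics

STAGE_DELAY = 50.0   # ps, nominal delay of every buffer stage (hypothetical)
SIGMA = 3.0          # ps, per-stage random variation (hypothetical)


def tree_arrivals(levels: int, fanout: int, rng: random.Random) -> list[float]:
    """Leaf arrival times of a balanced tree; each buffer has its own random delay,
    and leaves in the same subtree share their ancestors' delays."""
    arrivals = [0.0]
    for _ in range(levels):
        arrivals = [a + rng.gauss(STAGE_DELAY, SIGMA)
                    for a in arrivals for _ in range(fanout)]
    return arrivals


def mesh_arrivals(top_stages: int, nodes: int, drivers_per_node: int,
                  leaves_per_node: int, rng: random.Random) -> list[float]:
    """Toy mesh: shared top chain, node arrival = mean of parallel drivers
    (idealised shorting), then one local stage per leaf."""
    top = sum(rng.gauss(STAGE_DELAY, SIGMA) for _ in range(top_stages))
    arrivals = []
    for _ in range(nodes):
        node = top + statistics.fmean(rng.gauss(STAGE_DELAY, SIGMA)
                                      for _ in range(drivers_per_node))
        arrivals += [node + rng.gauss(STAGE_DELAY, SIGMA) for _ in range(leaves_per_node)]
    return arrivals


rng = random.Random(2015)
tree = tree_arrivals(levels=5, fanout=4, rng=rng)                 # 1024 leaves
mesh = mesh_arrivals(top_stages=2, nodes=64, drivers_per_node=4,
                     leaves_per_node=16, rng=rng)                 # 1024 leaves
for name, arr in (("tree", tree), ("mesh", mesh)):
    print(f"{name}: mean insertion {statistics.fmean(arr):.1f} ps, "
          f"skew {max(arr) - min(arr):.1f} ps, std {statistics.pstdev(arr):.2f} ps")
```

Because the driver variations partly cancel and fewer stages are cascaded, the mesh arrivals show a smaller spread and a lower mean insertion delay in this model.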
Because of its structure, clock mesh also has some disadvantages. First, the clock mesh structure consumes more routing resources, and the redundant interconnect brings more power consumption. Second, at the current stage of the technology, clock mesh implementation has a relatively low degree of automation, and many steps of the implementation still require designer intervention.
Design and Analysis
Using Cadence's EDI tools, this paper implemented the clock network of a design with 4332 registers in the SMIC 65nm process.
When designing the top level chain, the location of the primary clock signal has to be determined, and the type, drive strength, and other properties of the buffers are set by the tool. Max Skew should not be set too small; otherwise the tool inserts too many buffers in order to meet the requirement. As Table 1 shows, compared with clock tree synthesis, the clock mesh design greatly improves clock skew and on-chip variation: clock skew is improved by 47% and on-chip variation by 43%, while the clock mesh uses relatively fewer clock cells.
Conclusions
Exploiting the advantages of clock mesh, this paper carried out a comparative clock mesh and clock tree synthesis design of the experimental module and completed its clock network design. The experimental results show the advantages of clock mesh over clock tree synthesis in clock skew and on-chip variation. Clock mesh is therefore well suited to larger-scale circuit designs.
