We propose a multilevel full-chip routing algorithm that improves testability and diagnosability, manufacturability, and signal integrity for yield enhancement. Two major issues are addressed.
INTRODUCTION
With ever-decreasing feature sizes and increasing chip dimensions, the integration complexity in system-on-a-chip (SOC) designs grows dramatically [1]. The high integration complexity is caused not only by the huge number of transistors and interconnects fabricated in a single chip, but also by modern SOC design issues in testability, manufacturability, and signal integrity. In particular, it is well known that interconnect delay dominates circuit performance in nanometer IC designs. Therefore, it is desirable to handle large-scale interconnect integration while considering testability and diagnosability (defect reduction, yield enhancement, etc.), manufacturability (process variation control, optical proximity correction, etc.), and signal integrity (crosstalk minimization, etc.) simultaneously.
Testability and diagnosability are very important issues for interconnect design in SOC ICs. Many research works on interconnect testing can be found in the literature. Earlier works on interconnect testing targeted board-level testing. However, it is very difficult to apply these interconnect testing methods in the SOC environment without design-for-testability (DFT) support. The popular IEEE P1500 provides structural support for core testing as well as interconnect testing in SOC. The P1500 SOC test environment consists of a centralized test access mechanism (TAM) and wrappers around cores. The TAM defines the test control, while the wrappers provide a standardized interface for test data transmission. An oscillation ring test (ORT) [2] method for interconnect test was proposed to detect not only stuck-at and open faults, but also delay and crosstalk glitch faults. Many testing and diagnosis problems are incurred by particular interconnect structures and can be partly solved by carefully determining the interconnect structures. Further, to reduce the probability of multiple faults, it is desirable to reduce wiring congestion in a specific area. This approach is especially important as the probability of back-end-of-line (BEOL) defects (i.e., high-resistance via and interconnect defects) increases [3]. Therefore, many issues with testability and diagnosability should be addressed during routing.
As technology advances, the manufacturing process increasingly constrains physical layout design and verification. Chemical-mechanical polishing (CMP) [4] is widely used to increase the number of metal layers integrated in a single chip. CMP-induced variation is kept within acceptable limits by controlling local feature (interconnect) density, relative to a process-specific "window size," to achieve global planarization for manufacturability and performance. Balancing interconnect density therefore minimizes the CMP-induced variation, and routing plays an important role in determining that variation.
OPC is one of the most effective methods adopted to compensate for the light diffraction effect, typically used as a post layout process to improve manufacturability [5] . Again, balancing interconnect density can improve the OPC effects efficiently and effectively since the effects are also influenced by neighboring structures and shapes.
Signal integrity is an important factor that affects yield in nanometer IC technology, and crosstalk is one of its main threats. Two adjacent wires form a coupling capacitor, and a signal change on an aggressor net can interfere with the signal on a victim net. Crosstalk is therefore a crucial issue in modern router design.
In this paper, we handle the modern SOC design issues of testability and diagnosability, manufacturability, and signal integrity simultaneously in the routing stage for yield improvement (see Figure 1(a)). Traditionally, those issues are tackled at the post-layout stage. With increasing design complexity, it is very difficult and even infeasible to handle those issues at the post-layout stage, when most interconnect layouts are fixed and cannot easily be changed. In particular, those design issues can all be improved through balancing the routing congestion (see Figure 1(b)). Therefore, we present a congestion-driven routing algorithm for yield improvement. Balancing routing congestion reduces multiple-fault probability, CMP-induced variation, OPC, and crosstalk, all of which improve yield.
Traditionally, the complex routing problem is solved by using the two-stage approach of global routing followed by detailed routing. Global routing first partitions the routing area into tiles and decides tile-to-tile paths for all nets, while detailed routing assigns actual tracks and vias for nets. The two-level, hierarchical routing framework, however, lacks information on the interactions among the subregions and is thus still insufficient for handling the dramatically growing complexity in current and future IC designs [6]. Therefore, it is desirable to employ more levels of routing for very large-scale IC designs. The multilevel framework has attracted much attention in the literature recently. It employs a two-stage technique: coarsening followed by uncoarsening. The coarsening stage iteratively groups a set of circuit components (e.g., circuit nodes, cells, modules, routing tiles, etc.) based on a predefined cost metric until the number of components being considered is smaller than a threshold. Then, the uncoarsening stage iteratively ungroups a set of previously clustered circuit components and refines the solution by using a combinatorial optimization technique (e.g., simulated annealing, local refinement, etc.). The multilevel framework has been successfully applied to VLSI physical design. For example, the famous multilevel partitioners ML [7] and hMETIS [8], the multilevel placer mPL [9], and the multilevel floorplanner/placer MB*-tree [10] all show the promise of the multilevel framework for large-scale circuit partitioning, placement, and floorplanning. A framework similar to multilevel routing was presented in [11], [12]. Cong, Xie, and Zhang later proposed an enhanced multilevel routing system, named MARS [13], which incorporates resource reservation, a graph-based Steiner tree heuristic, and a history-based multi-iteration scheme to improve the quality of the multilevel global routing algorithm.
Lin and Chang also proposed a multilevel framework for full-chip routing, which considers both routability and performance [14] .
Experimental results on the MCNC benchmark circuits show that the proposed OR method achieves 100% fault coverage and maximal diagnosis resolution for interconnects, and that the multilevel routing algorithm effectively balances the routing density while achieving 100% routing completion. Compared with [14], our router improves the maximal congestion by 1.24X--6.11X with a runtime speedup of 1.08X--7.66X, and improves the average congestion by 1.00X--4.52X and the congestion deviation by 1.37X--5.55X.
PRELIMINARIES
OR Test Architecture for Interconnect
In this section, we discuss the oscillation ring test for interconnects. The oscillation ring (OR) test is a useful and efficient method to detect faults in SOC interconnect [2]. An oscillation ring is a closed loop of a circuit under test that has an odd number of signal inversions. Once the ring is constructed during test mode, an oscillation signal appears on the ring. Figure 2 illustrates a global counter-based test architecture for both delay and crosstalk glitch detection for SOC ICs. This test architecture implements the IEEE P1500 core test standard, in which each input/output pin of a core is attached to a wrapper cell, and a centralized test access mechanism (TAM) is provided to coordinate all test processes. In addition to the normal input/output connections, all wrapper cells in a core can also be connected as a shift register, which is usually referred to as a scan path, to facilitate test access. A modified wrapper cell design has been proposed to provide extra connections and inversion control so that the oscillation rings can be constructed through the wires and the boundary scan paths in cores [2]. For example, Figure 2 shows one oscillation ring and a neighboring net; the ring is closed through two scan paths in cores C 1 and C 2 .
This test architecture can detect stuck-at, open, delay, and crosstalk glitch faults. If an oscillation ring fails to oscillate, there exists at least one stuck-at or open fault in the oscillation ring. The period of the oscillation signal can also be measured by using a delay counter in a core to test delay faults, and a similar approach can be used for crosstalk glitch detection. A local counter is included in each core, and a central counter is in the TAM of the chip. The central counter in the TAM is enabled by signal OscTest and triggered by the system clock. A local counter is connected to one wrapper cell in each core; however, it can be accessed by every wrapper cell through the wrapper cell chain.
When an oscillation ring passes through a core, an internal scan path is formed to connect the oscillation signal to the local counter. For example, consider core C 1 , through which the oscillation ring passes (see Figure 2). The oscillation signal is fed to the local counter through a series of modified wrapper cells that are configured as SI→SO. When an oscillation test session starts (OscTest = 1), the TAM enables its own central counter as well as all local counters in cores. After the counter in the TAM counts to a specific number n, the oscillation test session terminates and all local counters are disabled (OscTest = 0). All the local counter contents can then be scanned out to the ATE for inspection.
Assume that m oscillation rings are tested. Let the frequency of the system clock be f, and the delay counter contents of the rings be n 1 , n 2 , …, n m , respectively. Since the central counter counts n system-clock cycles during a test session, the i-th ring's oscillation frequency f i can be approximated by
$$f_i \approx \frac{n_i}{n} f, \quad i = 1, 2, \ldots, m.$$
Since the frequency of each ring is predetermined during the design phase, a delay fault can be detected and measured by comparing this estimate with the expected frequency.
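The counter-based estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the session length is taken as T 0 = n/f, the estimate as n i /T 0 , and the function names and the 100 MHz clock used in the usage note are hypothetical, not taken from the test architecture itself.

```python
def estimate_ring_frequency(n_i: int, n: int, f_clk: float) -> float:
    """Estimate a ring's oscillation frequency from its local counter content.

    n_i   : content of the local (delay) counter for ring i
    n     : value the central TAM counter counts to during the session
    f_clk : system clock frequency in Hz
    """
    if n <= 0 or f_clk <= 0:
        raise ValueError("n and f_clk must be positive")
    t0 = n / f_clk          # length of the test session, T0 = n / f
    return n_i / t0         # oscillations counted per unit time


def has_delay_fault(n_i: int, n: int, f_clk: float,
                    f_min: float, f_max: float) -> bool:
    """Flag a ring whose estimated frequency leaves the window [f_min, f_max]."""
    f_i = estimate_ring_frequency(n_i, n, f_clk)
    return not (f_min <= f_i <= f_max)
```

For instance, with an assumed 100 MHz clock and n = 25000, a counter content of 1000 corresponds to roughly 4 MHz, and a content of 900 would fall below a 4 MHz lower bound.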
Process Variation on Oscillation Signals
In order to consider process variation effect on this proposed OR scheme, we conducted an experiment for a ring consisting of 7 inverters (plus transmission gates) and 20µm lines.
The Monte Carlo simulation was conducted by changing the W/L ratio of all transistors and the R, C parameters of the nets. The mean was the nominal value, while the distribution was Gaussian with 3σ = 20% of the nominal value. In all, 30 simulation runs were performed, and the simulation results are shown in Figure 3, in which all oscillation signals start at time 0. At the end of the first cycle, there is a small variation in the cycle length, and the variations are less than 0.9% of the nominal period of the oscillation signal. The simulation results show that (1) this scheme can oscillate with an odd number of inversions, and (2) process variation with 3σ = 20% changes the oscillation frequency and period by less than 0.9%.
Interconnect Model in OR Test
A multi-terminal net is usually modeled by a hypergraph. The circuit structure of an SOC can be directly transformed into a hypergraph, in which each vertex denotes a pin while each hypernet represents a signal net. However, this graph model is not good enough for the OR test problem, as two branches of a net should belong to two different rings, and they cannot be tested simultaneously [2]. Therefore, it would be better to consider each branch of a hypernet separately, instead of treating them as a whole. Each branch of a hypernet thus corresponds to a 2-pin net, which connects the source vertex to one of its sink vertices. An n-terminal hypernet is thus broken into (n-1) 2-pin nets. The result is a normal graph G = (V, E), where E is the set of 2-pin nets.
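The decomposition of hypernets into 2-pin nets can be sketched as follows. The data layout (a dictionary mapping a net name to its source pin and sink pins) and the function name are assumptions made for illustration only.

```python
def decompose_hypernets(hypernets):
    """Break every hypernet into source-to-sink 2-pin nets.

    hypernets: dict mapping net name -> (source_pin, [sink_pins])
    Returns the edge list E of the graph G = (V, E); each edge is a
    (source, sink, net_name) triple.
    """
    edges = []
    for name, (source, sinks) in hypernets.items():
        for sink in sinks:
            edges.append((source, sink, name))
    return edges


# A 4-terminal hypernet (one source, three sinks) yields 4 - 1 = 3 two-pin nets.
example = {"net_a": ("c1.out", ["c2.in", "c3.in", "c4.in"])}
two_pin_nets = decompose_hypernets(example)
```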
A complete test for all interconnections is thus reduced to the problem of finding a set of rings that cover all edges corresponding to the interconnection structure in the graph G. This is equivalent to finding a set of sub-circuits (rings) $R = \{G_1, G_2, \ldots, G_m\}$ such that each $G_i = (V_i, E_i)$ is a subgraph of G, $G_i$ is a ring, and $\bigcup_{i=1}^{m} E_i = E$.
If delay faults are considered, the signal delay on each net along the ring should also be considered. The period of the oscillation signal is thus the summation of the path delays on all wires and scan paths. A large delay on an interconnect wire can be detected by observing the frequency of an oscillation signal that passes through the wire under consideration. The detection can be masked by the variation of delays on other wires in the same ring, and thus the control of process variation is crucial for correct detection.
Diagnosis with Oscillation Ring Tests
Diagnosis is the process of locating the exact fault site. The oscillation ring test can also be used for interconnect diagnosis. For interconnect diagnosis, however, the two-pin net model is not sufficient either. Consider the 4-terminal net shown in Figure 4(a), which is divided into five edge segments e 1 to e 5 . If edge e 1 is faulty, all three rings will not oscillate correctly. A faulty e 3 affects rings 2 and 3, while faults on edges e 2 , e 4 , and e 5 affect rings 1, 2, and 3, respectively. For diagnosis purposes, all five segments must be distinguished.
From the above discussion, it is obvious that hypernets cannot be used for diagnosis. Therefore, the interconnect structure is transformed into a graph model as follows. The scan path and wrapper cells in a core are lumped into a single terminal node, as we assume that they are fault-free. The fanout points of a hypernet form dummy intermediate nodes, and a wire segment connecting two nodes is an edge. For example, the diagnosis graph model for the hypernet of Figure 4(a) is shown in Figure 4(b), in which the white node is a terminal node and gray nodes are intermediate nodes. An edge is the smallest unit of a wire segment that can be uniquely diagnosed. From the above discussion, it can be seen that a fault on any stem affects all the downstream nodes and edges.
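The relation between faulty edges and failing rings in the 4-terminal example can be checked with a small sketch. The tree topology below (a stem e 1 , a branch e 2 to the first sink, and a second stem e 3 feeding e 4 and e 5 ) is inferred from the fault behavior described in the text, and the node names and function names are hypothetical.

```python
def fault_signatures(parent_edge, ring_sinks):
    """Map each edge to the set of rings that pass through it.

    parent_edge: dict node -> (parent_node, edge_name) for the routing tree
    ring_sinks : dict ring_id -> sink node reached by that ring
    """
    signatures = {}
    for ring_id, node in ring_sinks.items():
        # Walk from the sink back to the source, marking every edge on the way.
        while node in parent_edge:
            parent, edge = parent_edge[node]
            signatures.setdefault(edge, set()).add(ring_id)
            node = parent
    return signatures


def uniquely_diagnosable(signatures):
    """True if no two edges cause the same set of rings to fail."""
    seen = [frozenset(rings) for rings in signatures.values()]
    return len(set(seen)) == len(seen)


# Assumed topology: source s, intermediate (fanout) nodes a and b, sinks t1..t3.
tree = {"a": ("s", "e1"), "t1": ("a", "e2"), "b": ("a", "e3"),
        "t2": ("b", "e4"), "t3": ("b", "e5")}
rings = {1: "t1", 2: "t2", 3: "t3"}
sig = fault_signatures(tree, rings)
```

Under this assumed topology every edge has a distinct failure signature, which is what makes each of the five segments individually diagnosable.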
MULTILEVEL ROUTING FRAMEWORK
We propose in this section a new multilevel routing framework, as illustrated in Figure 6, that considers routability, performance, testability, diagnosability, process variation, and crosstalk. The oscillation rings for test are based on circuit connectivity, and thus they can be constructed before routing. However, when delay faults are considered, the routing structure must also be considered, since the wire delay is mainly decided by the wire length. On the other hand, the diagnosis process has to consider the actual net layout, and thus it must be performed after the routing process.
Routing Model
Our global routing algorithm is based on a graph search technique guided by the congestion information associated with routing regions. The router assigns higher costs to route nets through congested areas (or those of higher delay and/or crosstalk costs) to balance the net distribution among routing regions. Before we can apply the graph search technique to multilevel routing, we first need to model the routing architecture as a graph such that the graph topology can represent the chip structure. Figure 5 illustrates the routing graph model.
For the modeling, we first partition a chip into an array of rectangular subregions. These subregions are called global cells (GC). A node in the graph represents a GC in the chip, and an edge denotes the boundary between two adjacent GCs. Each edge is assigned a weight/capacity according to the physical area or the number of tracks of a GC. The graph is used to represent the routing area and is called a multilevel routing graph, denoted by G k , where k is the level ID. A global router finds GC-to-GC paths for all nets on a routing graph to guide the detailed routing. The goal of global routing is to route as many nets as possible while meeting the capacity constraint of each edge and any other constraints, if specified.
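A minimal sketch of such a routing graph is given below, assuming a uniform rectangular array of global cells and a single capacity value for every boundary; both simplifications, and the function name, are assumptions for illustration.

```python
def build_routing_graph(rows, cols, capacity):
    """Build a grid routing graph.

    Nodes are global cells (GCs) identified by (row, col); each edge stands
    for the boundary between two adjacent GCs and carries a capacity.
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    edges = {}
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:                      # boundary to the right neighbor
                edges[((r, c), (r, c + 1))] = capacity
            if r + 1 < rows:                      # boundary to the neighbor below
                edges[((r, c), (r + 1, c))] = capacity
    return nodes, edges


nodes, edges = build_routing_graph(3, 4, capacity=10)
```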
As the process technology advances, multiple routing layers are possible. The number of layers in a modern chip can be more than eight. Wires in each layer can run either horizontally (H) or vertically (V) in a grid style.
Figure 5. The routing graph: (a) partitioned layout; (b) routing graph.
As illustrated in Figure 6, G 0 corresponds to the routing graph of level 0 of the multilevel coarsening stage. At each level, our global router first finds routing paths for the local nets (or local 2-pin connections), that is, those nets that sit entirely inside a GC. After the global routing is performed, we merge each 2×2 group of GCs into a larger GC and at the same time perform resource estimation for use at the next level (i.e., level 1 here). Coarsening continues until the number of GCs at a level, say the k-th level, is below a threshold. The uncoarsening stage tries to refine the routing solution of the unassigned segments of level k. During uncoarsening, the unroutable nets are handled by point-to-path maze routing and rip-up and re-route to refine the routing solution. Then we proceed to the next level (level k-1) of uncoarsening by expanding each GC of G k into four finer GCs of G k-1 . The process continues until we reach level 0, where the final routing solution is obtained.
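The 2×2 merging scheme can be illustrated with a short sketch. It assumes that GCs are indexed by integer (row, column) coordinates at level 0 and that a connection becomes local at the first level where both of its pins fall into the same GC; the function names are hypothetical.

```python
def coarsening_levels(rows, cols, threshold):
    """List the GC array size at each level until the GC count drops below threshold."""
    levels = [(rows, cols)]
    while rows * cols >= threshold and rows * cols > 1:
        # Merging 2x2 groups halves each dimension (rounding up at the border).
        rows, cols = (rows + 1) // 2, (cols + 1) // 2
        levels.append((rows, cols))
    return levels


def local_level(cell_a, cell_b):
    """Smallest level at which two level-0 GCs lie inside the same merged GC."""
    level = 0
    while (cell_a[0] >> level, cell_a[1] >> level) != \
          (cell_b[0] >> level, cell_b[1] >> level):
        level += 1
    return level
```

A connection whose pins sit in neighboring level-0 cells (0, 0) and (0, 1) becomes local at level 1, while one spanning (0, 0) to (3, 5) is only local at level 3.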
Testability-Aware Multilevel Routing
In the coarsening stage of multilevel routing, shorter nets are routed first, and a congestion-driven heuristic is used to guide a pattern router. For all the nets that can be successfully routed, both global routing and detailed routing are conducted. All the nets that fail to complete are handled at the uncoarsening stage. At the uncoarsening stage, the failed nets are routed by a global router with a different cost function to avoid heavily congested areas, and a detailed maze router is used to determine the final routing path. In addition to the traditional multilevel framework, we incorporate an oscillation ring test in the preprocessing stage (see Figure 6).
Diagnosability-Aware Routing Structure
The minimum spanning tree (MST) topology leads to the minimum total wire length, and thus congestion is often easier to control for an MST than for other topologies. However, this topology may result in longer critical paths and thus degrade circuit performance. In contrast, a shortest path tree (SPT) may result in the best performance, but its total wire length (and congestion) may be significantly larger than that constructed by the MST algorithm.
The diagnosis problem also affects the routing structure. For instance, consider the 4-terminal net example shown in Figure 7 .
With the spanning tree connection given in Figure 7(a), there are three different net segments to be diagnosed. On the other hand, for the Steiner tree connection given in Figure 7(b), following the diagnosis graph model shown in Figure 4(b), there are two intermediate nodes (indicated by the two dotted circles) and thus five net segments to be diagnosed. In general, a spanning tree connection yields fewer wire segments to be diagnosed, and thus it is favored in our router. Our algorithm first constructs the minimum spanning tree (MST) structure whenever possible, which is best for diagnosability. Otherwise, it finds a routing tree with the least number of intermediate nodes.
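The segment counts above follow from the fact that a tree with n terminals and s intermediate nodes has n - 1 + s edges. The sketch below pairs that count with a simple Prim-style MST over pin locations; the use of Manhattan distance as the edge weight and the function names are assumptions for illustration, not a description of the router's actual implementation.

```python
def manhattan(p, q):
    """Rectilinear distance between two pin locations."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mst_connections(pins):
    """Prim's algorithm on the complete graph of pins; returns 2-pin connections."""
    in_tree = [pins[0]]
    remaining = list(pins[1:])
    connections = []
    while remaining:
        # Pick the cheapest connection from the current tree to a new pin.
        u, v = min(((u, v) for u in in_tree for v in remaining),
                   key=lambda uv: manhattan(*uv))
        connections.append((u, v))
        in_tree.append(v)
        remaining.remove(v)
    return connections


def segments_to_diagnose(num_terminals, num_intermediate_nodes=0):
    """Number of tree edges (net segments) in the diagnosis graph."""
    return num_terminals - 1 + num_intermediate_nodes


conns = mst_connections([(0, 0), (0, 3), (4, 0), (4, 3)])
```

For a 4-terminal net this gives three segments for a spanning tree and five for a Steiner tree with two intermediate nodes, matching the counts in the text.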
Cost Metric for Routing Density Control
A router that incurs imbalanced routing density may degrade system performance in many ways.
Crosstalk effects are the results of signal coupling between adjacent wires, and the coupling capacitance is usually inversely proportional to the distance between wires. In a heavily congested area, the distance between adjacent wires is small and thus the probability of crosstalk faults is increased.
Physical defects in a congested area may create multiple faults, which are difficult to detect and diagnose.
Process variation due to CMP is usually caused by unbalanced routing congestion/density.
Therefore, it is desirable to balance routing congestion/density in all areas for router design. Given a netlist, we first run the minimum spanning tree (MST) algorithm to construct the topology for each net, and then decompose each net into 2-pin connections, with each connection corresponding to an edge of the minimum spanning tree. Our multilevel framework starts from coarsening the finest tiles of level 0. At each level, tiles are processed one by one, and only local nets (connections) are routed. At each level, the two-stage routing approach of global routing followed by detailed routing is applied. The global routing is based on the approach used in the pattern router [25] and first routes local nets on the tiles of level 0. Let the multilevel routing graph of level i be G i = (V i , E i ). Let R e = {e∈E i | e is the edge chosen for routing}. In order to balance the routing density, we use the cost function α: E i →R to guide the routing:
where c e is the congestion of edge e, defined in terms of its capacity p e and the number of nets d e assigned to it. The parameter t defines the target level of the maximum density, and it can be determined either by the user or by averaging over all routing areas. For example, if the goal is to make the average routing density half of the maximum acceptable density, then t is set to 2.
After the global routing is completed, we perform detailed routing with the guidance of the global-routing results and find a real path in the chip. Our detailed router is based on the maze-searching algorithm. Pattern routing uses an L-shaped or a Z-shaped route to make the connection, which gives the shortest path length between two points. Therefore, the wire length is minimized, and we do not include wire length in the cost function at this stage. We measure the routing congestion based on the commonly used channel density. After the detailed routing finishes routing a net, the channel density associated with an edge of a multilevel graph is updated accordingly.
Our global router first tries L-shaped pattern routing. If the routing fails, we try Z-shaped pattern routing. If both pattern routes fail, we give up routing the connection, and an overflow occurs. We refer to a net (connection) that causes an overflow as a failed net (failed connection). The failed nets (connections) are reconsidered (refined) at the uncoarsening stage.
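The L-then-Z fallback can be sketched on a grid of global cells as follows. Several details are assumptions made only for illustration: cells are (x, y) tuples, every boundary has the same capacity, and among feasible candidates the one with the lowest peak demand is chosen as a stand-in for the congestion cost of the router.

```python
def _edge(a, b):
    return (a, b) if a <= b else (b, a)


def _straight(a, b):
    """Cells from a to b (inclusive) along a single axis."""
    (x0, y0), (x1, y1) = a, b
    if x0 == x1:
        step = 1 if y1 >= y0 else -1
        return [(x0, y) for y in range(y0, y1 + step, step)]
    step = 1 if x1 >= x0 else -1
    return [(x, y0) for x in range(x0, x1 + step, step)]


def _through(corners):
    path = [corners[0]]
    for a, b in zip(corners, corners[1:]):
        path.extend(_straight(a, b)[1:])
    return path


def l_patterns(s, t):
    return [_through([s, (t[0], s[1]), t]), _through([s, (s[0], t[1]), t])]


def z_patterns(s, t):
    (x0, y0), (x1, y1) = s, t
    paths = [_through([s, (x, y0), (x, y1), t])
             for x in range(min(x0, x1) + 1, max(x0, x1))]
    paths += [_through([s, (x0, y), (x1, y), t])
              for y in range(min(y0, y1) + 1, max(y0, y1))]
    return paths


def route_connection(s, t, demand, capacity):
    """Try L-shaped, then Z-shaped routes; return None on overflow (failed connection)."""
    def uses(path):
        return [demand.get(_edge(a, b), 0) for a, b in zip(path, path[1:])]
    for candidates in (l_patterns(s, t), z_patterns(s, t)):
        feasible = [p for p in candidates if all(d < capacity for d in uses(p))]
        if feasible:
            best = min(feasible, key=lambda p: max(uses(p), default=0))
            for a, b in zip(best, best[1:]):
                demand[_edge(a, b)] = demand.get(_edge(a, b), 0) + 1
            return best
    return None
```

With a capacity of one track per boundary, two connections between the same pair of cells take the two L-shapes, and a third one fails and would be left for the uncoarsening stage.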
The uncoarsening stage starts by refining each local failed net (connection) left from the coarsening stage. The global router is now changed to the maze router with the following cost function β: E i →R:
$$\beta(R_e) = \sum_{e \in R_e} (a \cdot c_e + b \cdot o_e),$$
where a and b are user-defined parameters, and o e ∈ {0,1}. If an overflow happens, o e is set to 1; otherwise, it is set to 0.
There is a trade-off between minimizing congestion and overflow. At the uncoarsening stage, we intend to resolve the overflow in a tile. Therefore, we make b much larger than a. Also, a detailed maze routing is performed after the global maze routing. Iterative refinement of a failed net is stopped when a route is found or several tries have been made. Uncoarsening continues until the first level G 0 is reached and the final solution is found.
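A maze router driven by a congestion term and a heavily weighted overflow term can be sketched with Dijkstra's algorithm. The sketch assumes a per-edge cost of the form a·c e + b·o e summed along the path, and it uses the utilization after adding the net, (d e + 1)/p e , as a stand-in for the congestion c e ; the grid layout, the uniform capacity, and the default weights are likewise assumptions.

```python
import heapq


def maze_route(s, t, rows, cols, demand, capacity, a=1.0, b=100.0):
    """Cheapest path on a rows x cols grid under an assumed cost a*c_e + b*o_e."""
    def edge(u, v):
        return (u, v) if u <= v else (v, u)

    def cost(u, v):
        d = demand.get(edge(u, v), 0)
        c_e = (d + 1) / capacity            # assumed stand-in for congestion
        o_e = 1 if d + 1 > capacity else 0  # overflow indicator
        return a * c_e + b * o_e

    dist, prev = {s: 0.0}, {}
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == t:
            break
        if du > dist[u]:
            continue                        # stale heap entry
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                dv = du + cost(u, v)
                if dv < dist.get(v, float("inf")):
                    dist[v], prev[v] = dv, u
                    heapq.heappush(heap, (dv, v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1], dist[t]
```

Because b is much larger than a, the router prefers a longer detour over any boundary that would overflow, which mirrors the priority described above.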
EXPERIMENTAL RESULTS
The multilevel routing system was implemented in the C programming language on a 900 MHz SUN Blade 2500 workstation with 1GB memory. We conducted two sets of experiments: (1) testability enhancement, and (2) congestion control for routing considering multiple faults, manufacturability, and crosstalk. Three types of benchmarks were used in our experiments: the first type is for inter-module interconnects only (see Table I); the second is the full-chip benchmarks (only mcc1 and mcc2), which include both inter-module interconnections and intra-module interconnections; the third type contains only intra-module interconnections, which are local interconnections within standard-cell modules. The results of the experiments based on type-2 and type-3 benchmarks are given in Table II.
Testability Enhancement
For testability enhancement, the experimental results of the embedded OR scheme in the proposed multilevel routing framework are reported in Table I. We have presented both a detection scheme (the preprocessing stage) and a diagnosis scheme (the postprocessing stage), as shown in Figure 6, for oscillation-ring-based interconnect testing in SOC in a predetermined design flow. Here, f min ≤ f i ≤ f max gives the timing specification for this scheme, where f i is the estimated oscillation frequency for the i-th ring. Since this OR scheme targets interconnects among modules, our experiments were conducted on the MCNC benchmark circuits with inter-module connections. Table I gives the name of the circuit, the statistics for the circuits (the number of cores, #core; the number of pads, #pad; the number of hyperedges, #hyp; the number of 2-pin nets), the number of rings constructed for detection, |R t |, and the number of rings constructed for diagnosis, |R d |. Thus, |R t | is the testability-driven cost in the preprocessing stage, and |R d | - |R t | is the additional cost for the postprocessing stage. In addition to the 100% fault coverage of the oscillation ring detection scheme, we also obtained 100% net segment diagnosability.
To show the feasibility of this scheme, we include the estimated ATE measurement times in parentheses in Table I. Since the frequency of each ring is predetermined during the design phase, a delay fault can thus be detected and measured by inspecting the contents of the local core counters (see Figure 2). Let the oscillation frequency of the rings, according to the timing specification, be f min ≤ f i ≤ f max , with the unit measurement time T 0 (= n/f). Thus, the delay counter contents satisfy n min ≤ n i ≤ n max , where n min = f min ×T 0 and n max = f max ×T 0 . Let ξ be the resolution of delay measurement, and ε be the maximum measurement error. Since a counter's maximum measurement error is ±1, ε is the reciprocal of f min ×T 0 , and the requirement is ε ≤ ξ.
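The timing requirement can be turned into a quick calculation. The sketch below reads the requirement as ε = 1/(f min ×T 0 ) ≤ ξ, so that T 0 ≥ 1/(ξ×f min ), and assumes that rings are measured one after another; the function names are hypothetical, and the 4 MHz lower bound, ξ = 0.001, and ring count of 133 are used only as sample values.

```python
def required_unit_time(f_min_hz: float, resolution: float) -> float:
    """Smallest measuring time T0 (seconds) such that 1 / (f_min * T0) <= resolution."""
    return 1.0 / (resolution * f_min_hz)


def total_measurement_time(num_rings: int, t0: float) -> float:
    """Total ATE time if each ring is measured for T0, one ring after another."""
    return num_rings * t0


t0 = required_unit_time(4e6, 0.001)         # 250 microseconds
detect_time = total_measurement_time(133, t0)
```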
We show an example of the delay measurement. Let the frequency specification of the oscillation rings be 4 MHz to 400 MHz, and let ξ be 0.001, which implies that the counter content n min is at least 1000. From Equation (4), the required T 0 is 250 µs. Thus, we get the estimated detection and diagnosis times in parentheses. For example, for the ac3 circuit, we need 133 rings for detection and 374 rings for diagnosis; therefore, 133 × 250 µs = 33.25 ms is needed for interconnect detection, and 374 × 250 µs = 93.5 ms for interconnect diagnosis. This shows the effectiveness and efficiency of the testability enhancement.
            152400×152400   4   7541    25024
Struct      4903x4904       3   3551    5717
Primary1    7552x4988       3   2037    2941
Primary2    10438x6468      3   8197    11226
S5378       4330x2370       3   3124    4734
S9234       4020x2230       3   2774    4185
S13207      6590x3640       3   6995    10562
S15850      7040x3880       3   8321    12566
S38417      111430x6180     3   21035   32210
S38584      12940x6710      3   28177   42589
, and (C) our proposed method (with MST routing and balanced density).
Congestion Control for Multi-objective Optimization
In each case, we give the maximum (critical path) delay d max , the average delay d avg , and the maximum number of nets crossing a level-0 tile #Net max , which is a good estimate of the maximum routing density. In our experiments, we set the parameter t = 4 for the ISCAS89 circuits and t = 2 for the other benchmarks. The completion rate is 100% for all cases. It can be seen that the proposed method achieves about the same level of performance as the routability-driven method, with at most a 0.2% increase in d max and d avg , but the maximum density is much smaller. Compared with [24], the experimental results show that our router improves the maximal congestion (#Net max ) by 1.24X--6.11X with a runtime speedup of 1.08X--7.66X.
In Table IV, we show some statistical density results. The average number of nets crossing a level-0 tile is denoted by #Net avg , and we also list those of vertical tiles and horizontal tiles, #Net avg_v and #Net avg_h , respectively. Also, σ_v denotes the standard deviation over the vertical tiles and σ_h that over the horizontal tiles. The results show that our scheme is more effective for the full-chip benchmarks mcc1 and mcc2. For the other, intra-module routing benchmarks, our scheme also improves the results in most cases. Compared with [24], the experimental results show that our router improves the average congestion by about 1.00X--4.52X, and improves the congestion balance (σ_v and σ_h, the standard deviations for vertical and horizontal tiles, respectively) by 1.37X--5.55X.
To demonstrate the effectiveness of the proposed algorithm in balancing the routing density, the number of horizontal wires crossing each level-0 tile for benchmark mcc1 is shown in Figure 9 for the three algorithms. It can be seen that the performance-driven MR results in the least balanced routing, and the peak congestion is 181 (#Net max ) in mcc1. The routability-driven MR tries to avoid congested areas to improve the probability of successful routing, and thus reduces the maximum density; its peak congestion is 61. With the proposed algorithm, the maximum density is further reduced to 45, and thus the manufacturability effects, the probability of multiple faults, and crosstalk effects are reduced accordingly. For mcc1, our proposed algorithm improves the maximal congestion by 1.36X compared to the routability-driven MR and by 4.02X compared to the performance-driven MR. It improves the average congestion by 1.01X--1.02X compared to the routability-driven MR and by 2.81X--2.85X compared to the performance-driven MR. For balanced congestion on mcc1, our proposed algorithm improves the result by 1.38X--1.48X compared to the routability-driven MR and by 2.72X--3.32X compared to the performance-driven MR. For runtime, our approach achieves a speedup of 1.06X compared to the routability-driven MR and of 3.08X compared to the performance-driven MR.
Further, for the inter-module connections in mcc1 and mcc2, the maximal and average congestion improve by 1.39X--3.23X and 1.27X--2.36X, respectively, and the congestion balance (σ_v and σ_h, the standard deviations for vertical and horizontal tiles, respectively) improves by 1.37X--2.76X.
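The density statistics and improvement factors discussed in this section can be reproduced from per-tile net counts with a short sketch. It assumes that an improvement factor is the baseline value divided by ours and that the deviation is the population standard deviation; the paper does not state which estimator was used, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def density_stats(nets_per_tile):
    """Return (#Net_max, #Net_avg, sigma) for a list of per-tile net counts."""
    return max(nets_per_tile), mean(nets_per_tile), pstdev(nets_per_tile)


def improvement(baseline: float, ours: float) -> float:
    """Improvement factor, reported as 'X' in the text (baseline / ours)."""
    return baseline / ours


# Peak congestion on mcc1: 181 (performance-driven), 61 (routability-driven), 45 (ours).
vs_routability = round(improvement(61, 45), 2)
vs_performance = round(improvement(181, 45), 2)
```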
CONCLUDING REMARKS
We have shown that the embedded oscillation ring test and diagnosis scheme is feasible based on the simulation results with TSMC 0.18 µm process technology. Also, this OR scheme achieves 100% fault detection coverage and maximal diagnosability. We have also presented an effective multilevel routing framework that applies a congestion-driven routing 
