THE COMPLEXITY AND SPEED of field-programmable gate array systems are limited by the gate equivalents available in the largest FPGAs: about 100,000 (compared to microprocessors with some 3.5 million transistors). Multichip modules (MCMs) like the Teramac Custom Computer, 1 built of FPGAs and memory, overcome IC area limits and most resistance-capacitance delays of board-based multichip systems. Yet even MCMs, from which most defective ICs are removed in fabrication, require error correction techniques for large FPGA systems. Monolithic substrate integration achieves higher speeds than MCMs, but large-area devices require more flexible methods of bypassing defects. Fortunately, researchers have developed wafer-scale integration (WSI) techniques that are applicable to FPGAs.
Postfabrication laser processing techniques have proved very effective in removing defects and enhancing fault tolerance in VLSI circuits. A focused laser beam can cut signal paths, and with specialized structures, make permanent connections in a circuit. In commercial devices, such as dynamic RAMs or microprocessors with large, onboard level 1 and level 2 cache memories, laser defect avoidance leads to significantly enhanced yields. In experimental applications, it has expanded usable silicon area into WSI devices with areas greater than 25 square centimeters. 2, 3 For defect repair, designers generally break the overall device structure into redundant cells (blocks) surrounded by communication buses and switches to route signals among the cells. These cells may range from repetitive blocks or columns and rows of memory to complex processing elements such as FPGA cells. At the periphery, there are specialized I/O cells and (in larger systems) clock drivers. After fabrication, engineers test the system, identify working sections, and avoid defects by physically routing signals only to the operable cells.
A problem arises with this defect avoidance procedure: It limits designs to systems with repeatable units. However, we can produce a very flexible system if we apply the technique to building very large area or wafer-scale FPGAs. In FPGAs, the basic cells are programmable logic blocks interconnected by programmable routing buses, exactly what is needed for successful defect avoidance methods. Thus, FPGAs and custom computers have the four features that make them ideal for very large area applications.
Routing is a key element in any FPGA design, involving 70% to 90% of the chip area. 5 Furthermore, previous WSI work 2, 3 has shown that in very large area FPGAs, both cells and signal lines may have defects, making the routing problem more difficult. Recent studies have considered this problem from a theoretical point of view. One article suggested using Monte Carlo simulation of defects and substituting medium-size blocks of cells. 6 Another simulation study considered the use of electrically alterable connections (antifuses and fuses) or active switches in the bus lines. 7 In a practical application, Altera has begun using a patented laser-programmed, column substitution redundancy technique, which works only on small devices, in its Flex 10K FPGA series.
In contrast, we have investigated and measured power and routing problems in large-area and wafer-scale FPGAs, using small, experimental prototypes containing all the features required for WSI devices. 8 Our work focuses on the following question: With a random distribution of working FPGA cells, how can we minimize defect avoidance signal delays so that the bypassing of defective cells is invisible to users?
Laser-link defect avoidance
The regular structure of IC memory allows the deletion technique of laser line cutting for removing and replacing dead rows or columns. First used in 64K DRAM chips, this type of error correction has significantly increased device yields. But deletion techniques are insufficient for large-area and wafer-scale systems, which need additive defect avoidance procedures to substitute spare cells for failed circuit elements as well as the line-cutting capability. Yield problems make transistor-based, active switching elements difficult to implement as a redundancy method for digital WSI systems. 2, 3 In our work, we use the successful WSI approach of Massachusetts Institute of Technology's Lincoln Laboratory: a postfabrication laser-diffused link. 2
Figure 1 illustrates such a structure. The link consists of two conductively doped lines separated by a 2-µm gap in 1.5-micron CMOS processes. Exposing the gap to an argon laser beam pulse (typically a 100-µs, 4-W, 1.2-µm spot) spreads the dopant throughout the gap, reducing resistances from gigaohms to about 70 ohms. Both first- and second-metal lines can also be cut. We built the structures described here in 1.5-micron CMOS; laser links have also been successfully tested in 3-, 2-, and 1.2-micron CMOS and 0.8-micron BiCMOS with similar resistances. Bernstein and Colella 9 have produced metal links using two metal lines 2 µm wide by 6 µm long and separated by a 1-µm gap. Laser pulses crack the intermetal glass and allow laser-melted metal to flow between the two lines, generating resistances of less than 3 ohms.
The greatest advantage of laser linking over bidirectional transistor switches is that the connections are permanent and have a lower impedance. Furthermore, laser connections take only 10% to 25% of the area of active switches and their control sections. 4 Researchers have used laser-link defect avoidance to build fully operational wafer-scale circuits in nine different designs applied to DSP filter and speech recognition systems, all in large, metal MCM packages. 2, 3

Design of FPGA redundancy paths
In designing FPGAs with built-in defect avoidance, the important question is how to enhance the design to maximize the harvest, that is, the fraction of working cells available for system operation. FPGA systems have small, repeatable cells, with signals routed through buses by switches, forming a large, 2D array at fabrication. Defects require the switches to bypass the bad blocks and connect the working sections, creating a smaller logical 2D array. Ideally, the FPGA user should need to know only the logical array, not the physical cell arrangement. In our work, we tested each cell by working from one end of the array to the other and setting the switches to bring the inputs (including cell programming) and outputs to and from the bonded lines.
The best way to restructure FPGAs is with row-column and cell substitution to create a 5 × 5 logic array, as shown in Figure 2. Here, we use a cell-by-cell method to replace defective cells with neighboring cells. Extra columns allow row-column substitution to bypass entire columns if a signal bus is defective. Also, the extra column can replace columns containing the most faults and thus gain extra rows, creating pseudofaults or unused cells (see column 5 in Figure 2).
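The idea of hiding the physical layout behind a smaller logical array can be sketched in a few lines of Python. This is a deliberately simplified greedy mapping, not the restructuring algorithm used in this work: it assumes a hypothetical 6 × 6 physical grid, treats each row independently, skips any row with too few working cells, and ignores defective buses and the alignment of vertical connections. The function name `build_logical_array` and the example defect positions are illustrative assumptions.

```python
def build_logical_array(working, rows, cols):
    """Greedy sketch: map a rows x cols logical array onto a physical grid.

    working[r][c] is True when physical cell (r, c) passed its test.
    Returns a list of logical rows, each a list of (r, c) physical
    coordinates, or None if the target size cannot be reached.
    """
    logical = []
    for r, row in enumerate(working):
        # Cell substitution within a row: keep only the working cells.
        good = [(r, c) for c, ok in enumerate(row) if ok]
        if len(good) >= cols:
            logical.append(good[:cols])
        # Rows with too many defects are bypassed entirely.
        if len(logical) == rows:
            return logical
    return None


if __name__ == "__main__":
    # Hypothetical 6 x 6 physical array with four defective cells.
    grid = [[True] * 6 for _ in range(6)]
    for r, c in [(0, 2), (3, 3), (3, 4), (5, 0)]:
        grid[r][c] = False

    mapping = build_logical_array(grid, rows=5, cols=5)
    for logical_row, cells in enumerate(mapping):
        print(logical_row, cells)
```

The user of such a mapping would address cells only by logical row and column; the table of physical coordinates stays internal to the restructuring step.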
To test the validity of this method, we wrote a Monte Carlo simulation program for an array of 100 × 100 cells. Recognizing that clustering occurs on wafers during fabrication, the program uses the negative binomial distribution for the defect points. In this distribution, P is the probability of having x defects per area S, λ is the defect density per cell, and α_c is the cluster coefficient. Cluster coefficients range from infinity, which gives a purely random Poisson distribution of defects, through α_c = 1, a moderate clustering, to α_c = 0.1, at which most defects occur near each other.
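For reference, the negative binomial defect model is commonly written as follows, using the symbols defined above. This is the standard textbook form and is assumed, not confirmed, to match the expression used in the simulation.

```latex
% Negative binomial probability of x defects in area S
P(x) = \frac{\Gamma(x + \alpha_c)}{x!\,\Gamma(\alpha_c)}
       \cdot
       \frac{\left(\lambda S / \alpha_c\right)^{x}}
            {\left(1 + \lambda S / \alpha_c\right)^{x + \alpha_c}}

% Probability that area S is defect free (x = 0)
P(0) = \left(1 + \frac{\lambda S}{\alpha_c}\right)^{-\alpha_c}

% Poisson limit for purely random defects
\lim_{\alpha_c \to \infty} P(x) = \frac{(\lambda S)^{x}\, e^{-\lambda S}}{x!}
```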
The simulation starts with a defect-free array 4 and adds defects when a random number generated for each cell exceeds the failure probability. The failure probability depends on the number of defects already in the cell and its nearest neighbors. The simulation continues adding defects until it reaches the desired density. We simulated 10 lots of 100 wafers, each with 100 × 100 cells, for defect distributions at various cluster parameters.
Our purpose was to test the ability of various restructuring layout designs to harvest the most FPGA cells. For example, we tested the effect of adding spare signal lines. Our model included a point usually missed: the significance of cell area for switches and buses (40% in our experiment), with their own resulting defects. For each simulation, we restructured the wafers with cell-by-cell substitution and row-column bypassing using the algorithm introduced by R. Gupta. 10 Using the defect distributions of these 1,000 wafers, we calculated the fraction of time a target 2D-array size could be built from the 100 × 100-cell array.
Figure 3 shows the effect of adding extra lines to the design. If the cell yield is 93% (600 defects per wafer, or λ = 0.06), with no spare lines, the 50% yield point drops drastically to a 35 × 35 array, almost independently of the cluster coefficient effect. However, adding one spare line increases the 50% yield target to an 85 × 85 array, a second spare having little effect because the chance of clustered defects blocking both spares is limited. If the yield rises to 99% (100 defects per wafer, or λ = 0.01), the no-spare 50% yield point becomes 80 × 80 cells, while one spare line increases it to 97 × 97. Thus, we designed the FPGA test vehicle structure to allow one spare general line accessible by the cells.
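A rough feel for how clustering changes the defect map can be had from a short Python sketch. It does not reproduce the neighbor-dependent procedure described above. Instead it swaps in a gamma-Poisson mixture, a standard way of sampling negative binomial counts: each cell draws its own defect rate from a gamma distribution with mean λ and shape α_c, then a Poisson count at that rate. This gives the right per-cell statistics but no spatial correlation between neighbors, and it ignores bus and switch defects. All function names are illustrative assumptions.

```python
import math
import random


def poisson(mu, rng):
    """Draw a Poisson(mu) count with Knuth's method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def defect_counts(n_cells, lam, alpha, rng):
    """Per-cell defect counts from a gamma-Poisson (negative binomial) mixture."""
    counts = []
    for _ in range(n_cells):
        # Gamma with shape alpha and scale lam/alpha has mean lam.
        rate = rng.gammavariate(alpha, lam / alpha)
        counts.append(poisson(rate, rng))
    return counts


def cell_yield(counts):
    """Fraction of cells with no defects."""
    return sum(1 for c in counts if c == 0) / len(counts)


def expected_yield(lam, alpha):
    """Closed-form negative binomial probability of zero defects in one cell."""
    return (1 + lam / alpha) ** (-alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    for alpha in (0.1, 1.0, 10.0):
        counts = defect_counts(100 * 100, 0.06, alpha, rng)
        print(alpha, round(cell_yield(counts), 3),
              round(expected_yield(0.06, alpha), 3))
```

Comparing the simulated fraction of defect-free cells with the closed-form value is a quick sanity check before layering any restructuring step on top of the defect map.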
Test vehicle design
The main building block of any FPGA system or custom computer is the basic FPGA cell, along with its switching sections. For the test vehicle, we built a very simple cell that incorporates all the characteristics of more complex wafer-scale FPGA cells (see Dufort). Figure 5 shows the FPGA cell's I/O connections, which use a bus switch box and 12 tracks per channel. The connection box switches each I/O to six possible lines. Either switch box routes the whole channel from the input to the three remaining directions. Each cell's state is input at programming time and stored in the logic block's static RAM, which forms the shift register. The bus switch box uses active pass transistors controlled by the FPGA cell. Alongside them, the laser switch box contains four merged laser links, creating the four standard switch directions: straight horizontal, downward turn, upward turn, and straight down.
Cell utilities: power and clock routing
A critical aspect of monolithic large-area design is the cell utilities: power and clock routing. Early WSI research showed that a single power short kills the entire system. Typically, 50% of failed blocks draw high current, so cells cannot initially be connected to power. 2 Similarly, the Teramac MCM 1 requires physically separating the chips to remove the power shorts. Previous WSI systems used either large laser links or long transistors for power connections. 4 Our FPGA test vehicle uses a new device called the testable laser link, which combines a long laser link and a small transistor. 8 Activating the transistor section provides a modest connection (a 0.08-V drop) for low-frequency cell power checks. After testing, laser linking makes a permanent connection with a small, 0.006-V drop for the working cells. For clock lines, these combined devices allow redundancy lines to route clock signals to all cells. In clock routing, lower-resistance laser links reduce clock skew across the wafer. Using metal laser links enhances these effects for both power and clocks.
Currently, FPGA redundancy concentrates on simple column substitution methods. Our work suggests that column power connections using testable laser links may be an important factor in improving FPGA yields.
Delay tests
In FPGA applications such as custom computers, the major drawback is circuit speed. For a given circuit, a custom implementation is much faster than an FPGA implementation because of large delays in the latter's routing circuitry. Adding defect avoidance could significantly decrease the circuit speed because of the extra routing and the array's physical restructuring, especially with active switches. Thus, testing the effects of delays was an important aim of our research.
Probably the fastest circuit that can be built with an FPGA is an nth-order ring oscillator, in which the output of a chain of an odd number of inverters is fed back to the input of the first inverter. In one series of experiments, we programmed ring oscillators into the test vehicle FPGA to determine the impact of restructuring on the circuit's maximum frequency of operation. To achieve the best results, we programmed an odd number of cells to simulate inverters, with a final cell acting as an output buffer, so that the oscilloscope's capacitance would not affect frequency. Figure 6a shows the oscillator without laser restructuring, and Figure 6b shows the active-cell bypass. Although the test chip consisted of a 1 × 12 row, it contained spare lines so that we could route the output inverter signal back to the input, creating the ring oscillator. Also, because we combined active switches and laser links in the switch boxes, it was not necessary to pass through an active switch to reach the laser switch boxes.
To conduct the test, we built and simulated (using HSpice) the oscillator without restructuring, then with one restructured path, and so on until we obtained the maximum number of restructured paths possible (N_R = 4). To obtain the experimental results, we incrementally restructured the ring oscillator and measured the frequency for each value of N_R from 0 to 4. Table 1 lists both the simulated and the experimental results of the active-switch rerouting and the laser-link bypassing. Both sets of results illustrate the expected trends of decreasing frequency with increasing N_R. With laser linking, the ring oscillator's frequency decreases about 5% for each restructuring bypass N_R value, whereas the decrease is about 10% for active devices. Thus, with laser linking, the heavy bypassing of N_R = 4 has almost the same time delay as N_R = 2 with active switching.
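The comparison between N_R = 4 with laser links and N_R = 2 with active switches follows directly from the quoted per-bypass figures. The short Python sketch below works through that arithmetic. It assumes the decrease is linear in N_R and uses the rounded 5% and 10% figures from the text, not the measured values in Table 1; the function name `relative_frequency` is illustrative.

```python
def relative_frequency(drop_per_bypass, n_r):
    """Oscillator frequency relative to the unrestructured case.

    Assumes a linear decrease per bypass, which is a simplification of
    the approximate trend quoted in the text.
    """
    return 1.0 - drop_per_bypass * n_r


LASER_DROP = 0.05   # about 5% per bypass with laser links
ACTIVE_DROP = 0.10  # about 10% per bypass with active switches

for n_r in range(5):
    laser = relative_frequency(LASER_DROP, n_r)
    active = relative_frequency(ACTIVE_DROP, n_r)
    print(f"N_R={n_r}: laser {laser:.2f}, active {active:.2f}")
```

Under this linear assumption, four laser-link bypasses and two active-switch bypasses both leave about 80% of the original frequency, which matches the observation above.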
Indeed, as these results reflect the FPGA's fastest speed, the percentage decrease is even smaller for more complicated FPGA circuit implementations. Even doubling the active transistor sizes still leaves the simulated circuit's speed only 80% that of laser-link designs, at the cost of significant active-switch area increases.
In similar tests with a 2 × 5-matrix test vehicle, the shorter column substitutions produced smaller effects of 2% to 9% for N_R = 1 to N_R = 4. 4
This FPGA's relatively modest speed was a function of the fabrication process. We directly measured the delays from restructuring with active and laser links on the FPGA chip. In the first test path, a set of two lines was connected directly past three cells, with routing to another line and back, giving a delay of N_R = 2. We used laser-link switch boxes for routing in one case, and active-switch boxes in the second case. The delays for a 1-MHz square wave signal were 7 to 8 ns for laser links versus 180 ns for active switches (due to their large effective resistance).
Effect of changing cell size
An important question is the effect of block size change on routing delays. 6 We used our experimental results in simulations to extrapolate the effects of changing the cell size, with an optimized logic block to get higher speeds. We compared the delays of defect bypassing with direct-wired connections (the ideal wire path delays that metal laser links would approach), routing switches using diode laser links, and active switches. We then simulated cells of one, two, and five times (1 ×, 2 ×, and 5 ×) the linear size of the base cell (1,206 × 650 µm), or one, four, and 25 times the area. We used the same restructuring scheme as in the experiments described earlier. Figure 7 shows the resulting ring oscillator frequencies for diode laser-link and active-switch bypassing. The structure's total length changes from 1.5 cm at 1 ×, to 3 cm at 2 ×, to 7.5 cm at 5 ×. This shows that the diode laser link is even better for very large cells with N_R = 4: from 1.6 to 2 times faster than active cells. Metal links (direct connections) are 3.7 times faster than active cells in the 5 × case.
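The scaling figures can be checked with a few lines of Python. The sketch assumes that the structure is the 12-cell row of the test chip laid end to end along the 1,206-µm cell dimension; that layout is an inference, not something stated in the text. Under that assumption the computed lengths come out slightly below the rounded values quoted above.

```python
BASE_CELL_UM = (1206, 650)  # base cell width and height in micrometers
CELLS_IN_ROW = 12           # assumption: the 1 x 12 test row


def area_ratio(scale):
    """Cell area relative to the base cell for a given linear scale."""
    return scale ** 2


def row_length_cm(scale):
    """Length of a row of scaled cells in centimeters (1 cm = 10,000 um)."""
    return CELLS_IN_ROW * BASE_CELL_UM[0] * scale / 10_000


for scale in (1, 2, 5):
    print(f"{scale}x: area ratio {area_ratio(scale)}, "
          f"row length {row_length_cm(scale):.2f} cm")
```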
OUR SIMULATIONS show that in wafer-scale FPGAs, larger 2D arrays require an additional spare line in each row and column. Testable laser links removed power-short defects in large-area FPGA prototypes. The type of defect avoidance routing used affects the system speed of large-area FPGAs. Experimental and simulation results show that laser interconnection methods can increase the operation frequency of defective FPGAs from 1.6 to 2 times, depending on the cell size. Thus, laser defect avoidance techniques solve the major problems of power and defects for large-area FPGAs.
We are continuing our work on this defect avoidance technique, applying it to current and next-generation FPGA designs and investigating modifications for better applicability to commercial devices. 
TTTC Sponsors DFT Benchmark Effort
The IEEE TTTC is sponsoring an initiative to collect a new set of ATPG benchmarks for presentation at the ITC99. The Benchmarks subcommittee, under the DFT TAC, plans to distribute a set of realistic benchmark circuits to all design-for-testability researchers and seeks your input.
The last full set of ATPG benchmarks was released almost a decade ago, in 1989. These benchmarks jump-started work on sequential ATPG by providing a common basis for comparing algorithms. However, the gate-level ISCAS 85 and ISCAS 89 benchmarks are not hierarchical, do not include realistic circuit features such as memories, and are small. These benchmarks have lost their value: they are much too simple in today's world of million-gate designs. Since today's ATPGs can handle the ISCAS 89 designs, progress in ATPG algorithms has stalled from lack of a challenge.
What should a new set of benchmarks contain?
Some benchmarks should be available at the register-transfer level. They should contain multiple clock domains, internal tristate buses, memories, inputs/outputs, and other needs that real-world test writers face. Some benchmarks available as system-on-a-chip designs would let us compare the effectiveness of the many solutions for embedded core testing being proposed today.
Though commercial tools can handle almost all designs (assuming the use of full scan), some designs are still difficult. Embedded memory support requires either scanning around the memory (and possibly losing fault coverage for logic inside the scan ring) or modeling the memory, which is difficult and slows down test generation. New university research in memory support is hampered today by a lack of examples. So far, there has been little successful work on high-level test generation or DFT. Published algorithms might work well on large RTL designs but none are freely available on which to test algorithms. Freely published and reproducible results will lead to healthy competition between research groups and accelerate the discovery of new, better DFT algorithms and techniques.
Visit the benchmark site at http://www.cerc.utexas.edu/itc99-benchmarks/bench.html and subscribe to the benchmark mailing list to view more detail on benchmark requirements and justification. If you want to receive the benchmarks, fill out a survey so we can better understand your requirements.
If you work in a design organization, could you get any antiquated ASIC designs cleared for distribution? (You can change all the signal names.) Could you provide a large subcircuit of an old design? If you work for a CAD company, could you share test circuits? If you are in a university, could you donate designs done for class or research projects? Interesting designs do not have to be state of the art. If you are interested in donating a design, fill out the form on the Web page. All donations are anonymous.
For more information, visit the Web page or e-mail Scott Davidson at scott.davidson@eng.sun.com.
