We present the first delay-fault testing approach for FPGAs, applicable both for manufacturing and for on-line testing. Our approach is based on BIST, is comprehensive, and does not require expensive ATE. We have successfully implemented this BIST approach on the ORCA 2C series FPGA.
Introduction
Advances in VLSI technology have resulted in more defects affecting the delays in the circuit, thus increasing the importance of delay-fault testing. The current FPGA manufacturing testing practice relies on configuring the FPGA with many designs and running them at speed. This is useful for speed-binning, but it is not a comprehensive delay-fault test. One of the main difficulties of the problem of delay-fault testing for FPGAs originates in the fact that users' circuits are not known when the chip is manufactured. While for an ASIC the target clock frequency is known before the chip is fabricated, it is impossible for the FPGA manufacturer to test the circuits that will be implemented on each device. Previous work in FPGA delay-fault testing [7] [8] considered only the testing of the user's circuit. However, such an approach is not acceptable either to the FPGA manufacturers, who want to ship defect-free chips, or to users, who want to purchase defect-free parts; otherwise, system-level debugging becomes a nightmare if design problems are intermixed with manufacturing faults.
In testing FPGAs, we can take advantage of their unique properties (reconfigurability, partial reconfigurability, and regular structure) to achieve features that cannot be realized in ASIC testing. Unlike in ASICs, where BIST incurs a large area overhead and some performance degradation, FPGA BIST can be done with zero area overhead and no delay penalty [11]. In FPGAs we can pseudoexhaustively test [9] both the programmable logic blocks (PLBs) and the programmable interconnect network, both for off-line [4] [13] and on-line testing [3] [12], with reasonable test application times. Pseudoexhaustive testing is practically complete and removes the need to evaluate the test quality. Thus FPGA test does not require either ATPG or fault simulation, which are computationally expensive tasks for ASICs. Accurate fault diagnosis is difficult to achieve in ASICs, but it can be done very efficiently in FPGAs [4] [3] [12]. In ASICs, fault tolerance for manufacturing faults cannot be accomplished without massive redundancy, but FPGAs allow efficient fault tolerance without dedicated spare resources [5] [6].
Our first attempt at FPGA delay-fault testing was to run our BIST configurations at the specified clock rate for the target FPGA. However, we quickly realized that this could not work unless we significantly increased the number of configurations. The number of configurations is the dominant factor determining the total test time, because the configuration time is several orders of magnitude larger than the pattern-application time. To limit the number of configurations, our BIST techniques try to test as many resources as possible within the same configuration. For off-line PLB test, this requires distributing the patterns from the test-pattern generator (TPG) to many PLBs under test using long signal paths. Similarly, for interconnect test, the wires under test are long, connecting many wire segments and switches. Under these conditions, we must run the BIST configurations at a frequency much lower than that used in normal operation, so most delay faults will not be detected by these tests. An alternative would be to reduce the amount of logic and/or interconnect under test during any BIST configuration to allow BIST execution at a higher clock frequency; however, this would require more configurations, and ultimately significant increases in testing time and cost.
In this paper, we introduce a novel technique for delay-fault testing in FPGAs. Our technique is independent of the applications implemented on the FPGA, and it is applicable both to off-line manufacturing testing and to on-line testing (the latter in the context of the roving STARs approach [3]). Our method is based on BIST, it is comprehensive, and it can work with any low-cost ATE. For our implementation, we used the Lattice ORCA 2C series FPGA [17], but our technique can also be applied to most other FPGA architectures, such as the Xilinx Virtex [18].
The remainder of this paper is organized as follows. In Section 2 we present the principle of the BIST technique, and in Section 3 we analyze its implementation issues. In Section 4 we discuss the application of the delay-fault BIST approach to on-line and off-line testing. Finally, we present conclusions in Section 5.
The Main Idea
An FPGA is composed of PLBs, programmable I/O cells, and programmable interconnect; the latter consists of wire segments that can be connected via programmable switches referred to as configurable interconnect points (CIPs). Wire segments in the programmable interconnect network are bounded by these CIPs and are considered to be either global or local routing resources. Global routing resources connect non-adjacent PLBs, while local routing resources connect a PLB to global routing resources or to adjacent PLBs. The routing resources are bus-oriented, with the number of wires per bus typically ranging between 4 and 8. The PLB functions and the CIPs are controlled by writing the configuration RAM.
The basic CIP structure consists of a transmission gate controlled by a configuration memory bit (Figure 1a). There are three types of CIPs, which we refer to as the cross-point CIP (Figure 1b), the break-point CIP (Figure 1c), and the multiplexer (MUX) CIP (Figure 1d) [17]. While a cross-point CIP connects wire segments located in disjoint planes (a horizontal segment with a vertical one), a break-point CIP connects two wire segments in the same plane. The MUX CIP comes in two varieties: decoded and non-decoded. A decoded MUX CIP is a group of 2^k cross-point CIPs sharing a common output wire and controlled by k configuration bits, such that the input wire being addressed by the configuration bits is connected to the output wire; the decoding logic is incorporated between the configuration bits and the transmission gates. A non-decoded MUX CIP contains a configuration bit for each transmission gate, such that k wire segments are controlled by k configuration bits; usually only one of the configuration bits is active for any configuration. There is also a compound CIP (Figure 1e), which is a combination of four cross-point and two break-point CIPs, each separately controlled by a configuration bit [18]. Most recent FPGA interconnect architectures are primarily constructed from non-decoded MUX CIPs that are buffered to prevent signal degradation due to the series resistance of each transmission gate the signal passes through.
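The difference between the two MUX CIP varieties can be illustrated with a small behavioral model. This is only a sketch: the function names, the MSB-first address convention, and the choice to reject configurations with more than one active bit are assumptions made for illustration and are not taken from any device data sheet.

```python
def decoded_mux_cip(config_bits, inputs):
    """Decoded MUX CIP: k configuration bits address one of 2**k input wires."""
    k = len(config_bits)
    if len(inputs) != 2 ** k:
        raise ValueError("a decoded MUX CIP with k bits needs 2**k inputs")
    # Interpret the configuration bits as a binary address (MSB first, assumed).
    address = 0
    for bit in config_bits:
        address = (address << 1) | bit
    return inputs[address]


def non_decoded_mux_cip(config_bits, inputs):
    """Non-decoded MUX CIP: one configuration bit per transmission gate."""
    if len(config_bits) != len(inputs):
        raise ValueError("one configuration bit per input wire is required")
    selected = [value for bit, value in zip(config_bits, inputs) if bit]
    if len(selected) > 1:
        # Modeling choice: treat several enabled gates as an invalid configuration.
        raise ValueError("more than one transmission gate enabled")
    # With no gate enabled, the output wire is not driven.
    return selected[0] if selected else None
```

Selecting the third of four wires takes two configuration bits in the decoded variety, `decoded_mux_cip([1, 0], wires)`, and four bits in the non-decoded one, `non_decoded_mux_cip([0, 0, 1, 0], wires)`.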
A signal path is formed by connecting several wire segments and PLBs in a continuous sequence via multiple CIPs. The propagation delay along the path accumulates the delays of all its PLBs, segments, and CIPs. A path may have different delays for rising (0/1) and falling (1/0) transitions. Figure 2a illustrates our BIST architecture. We configure several paths under test (PUTs), so that every path has the same sequence of PLBs, wire segments, and CIPs. Each PLB on the path is programmed as an identity function, so it appears as a buffer for the signal propagating along the path. The PUTs are identical, except for their position in the FPGA. This works well with the bus structure of the programmable interconnect of most FPGAs (with 4 to 8 wires per bus).
Our technique compares the delays of the PUTs. Assume that a rising transition is applied at their common input I. This transition propagates along every PUT, and it will eventually appear at the inputs of the OR and the NAND gates. The signal FIRST responds to the fastest arriving transition, while LAST changes only after the slowest one has arrived. FIRST enables the local oscillator loop, and LAST stops the oscillations (see Figure 2b) . Thus the count of OSC pulses measures the difference D between the fastest and the slowest propagation delays along the PUTs. In a circuit free of delay faults, D should be smaller than a predetermined threshold; otherwise we say that a delay fault is detected. Note that the same circuit can detect a delay fault affecting the propagation of a 1/0 transition, the only difference being that the roles of FIRST and LAST are reversed.
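The comparison can be sketched as an idealized behavioral model. The function names and the use of integer picoseconds are assumptions; the model ignores the gate delays of the ORA itself and counts a partial OSC pulse as a full one.

```python
def osc_pulse_count(arrivals_ps, osc_period_ps):
    """Number of OSC pulses counted between the FIRST and LAST transitions."""
    first = min(arrivals_ps)  # FIRST responds to the fastest PUT
    last = max(arrivals_ps)   # LAST changes only after the slowest PUT
    d = last - first          # the measured delay difference D
    # Ceiling division: a partial pulse still increments the counter.
    return -(-d // osc_period_ps)


def delay_fault_detected(arrivals_ps, osc_period_ps, threshold_count):
    """Report a delay fault when the count exceeds a predetermined threshold."""
    return osc_pulse_count(arrivals_ps, osc_period_ps) > threshold_count
```

For example, with a hypothetical OSC period of 4115 ps, four PUTs arriving within 1000 ps of each other yield a count of 1, while a single PUT that is 15000 ps late yields a count of 4.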
Since the first OSC pulse may be generated (possibly as a partial pulse) even when the transitions of FIRST and LAST are very close, a count of one should not be interpreted as indicating a delay fault. Figure 3 illustrates the typical structure of a PLB, consisting of look-up tables (LUTs), flip-flops (FFs) that can also be configured as latches, and output MUX logic. Figure 4 shows the structure of a PUT traversing both a LUT and a FF inside a PLB. The rising input transition is applied to all LUT inputs, and the LUT is configured as an AND gate, whose output propagates the slowest of its input transitions. This allows concurrent testing of all paths through the LUT. The FF/latch is configured as a latch, and its clock input is kept at the active value, so that the latch is open and behaves like a buffer. In this way the entire PLB implements an identity function. The paths bypassing the FFs and the paths bypassing the LUTs are tested by similar configurations. When propagating a falling transition, the LUT is configured to implement an OR gate.
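The reason for choosing an AND gate for rising transitions and an OR gate for falling transitions can be checked with a short model. This is a sketch that assumes an ideal LUT with zero internal delay; the function name and the time units are illustrative only.

```python
def lut_output_transition_time(input_times, transition):
    """Time at which the LUT output switches, given when each input switches."""
    if transition == "rising":
        gate, before, after = all, 0, 1   # LUT configured as an AND gate
    elif transition == "falling":
        gate, before, after = any, 1, 0   # LUT configured as an OR gate
    else:
        raise ValueError("transition must be 'rising' or 'falling'")
    # Re-evaluate the gate at every input event until the output reaches
    # its final value.
    for t in sorted(set(input_times)):
        values = [after if t >= ti else before for ti in input_times]
        if int(gate(values)) == after:
            return t
```

In both configurations the output switches only when the last input has switched, so a slow path through any single LUT input delays the output and is therefore observable at the end of the PUT.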
It is surprising to observe that, unlike ASIC delay-fault testing, our technique does not involve clocking. This is not a problem, since delays on the clock distribution paths are implicitly checked during speed-binning tests. The BIST circuitry is very simple: the TPG only needs to generate the two transitions, and the output response analyzer (ORA) consists of the three gates that produce the oscillation and the counter. The counter is reset before each experiment. Both the TPG and the counter are controlled, and the counter is read, via the FPGA boundary-scan access mechanism.
Implementation Issues
The smallest difference between the delays of the fastest and slowest PUTs detectable with our scheme corresponds to one OSC cycle. When testing a path with ASIC-type delay-fault testing, the smallest detectable delay fault is generally about 5% of the path delay. To achieve a similar resolution, PUTs should be constructed so that their delay corresponds to at least 20 OSC cycles.
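The figure of 20 OSC cycles follows from 1/0.05 = 20. The short calculation below restates it and, as an illustration, converts it to an absolute delay using the 243 MHz oscillator frequency reported for the ORCA 2C15A implementation; the function names are assumptions.

```python
import math
from fractions import Fraction


def min_put_delay_cycles(resolution_percent):
    """OSC cycles a PUT must span so that one cycle is the given percentage."""
    return math.ceil(Fraction(100, resolution_percent))


def min_put_delay_ns(resolution_percent, osc_freq_mhz):
    """The same bound expressed in nanoseconds for a given OSC frequency."""
    osc_period_ns = 1000.0 / osc_freq_mhz
    return min_put_delay_cycles(resolution_percent) * osc_period_ns
```

For a 5% resolution at 243 MHz, this gives 20 cycles, or a PUT delay of roughly 82 ns.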
While making PUTs as long as possible would increase the number of FPGA resources concurrently tested, thus possibly reducing the number of configurations required for a complete test, it may also cause false alarms, in which fault-free paths are reported as faulty. For example, assume a path P1 where all of its components (PLBs, CIPs, and wire segments) are just 1% slower than their counterparts on path P2. If the PUTs involve a large number of components, the accumulated difference between the delays of P1 and P2 may be incorrectly reported as a delay fault. Therefore PUTs should be constructed so that their delay is not significantly larger than that of an average "normal" path that would be used in circuits implemented in the FPGA.
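The accumulation effect can be made concrete with a small calculation. The component delay (2000 ps) and the OSC period (4115 ps, roughly one cycle at 243 MHz) used in the example are hypothetical, and a count of one is treated as passing.

```python
def accumulated_skew_ps(component_delays_ps, slowdown_percent):
    """Delay difference between two paths when every component of one path
    is slower than its counterpart by the given percentage."""
    return sum(component_delays_ps) * slowdown_percent // 100


def flagged_as_fault(skew_ps, osc_period_ps, threshold_count=1):
    """True when the OSC pulse count for this skew exceeds the threshold."""
    count = -(-skew_ps // osc_period_ps)  # ceiling division
    return count > threshold_count
```

With 50 components of 2000 ps each, a uniform 1% slowdown accumulates to 1000 ps and passes; with 500 such components it accumulates to 10000 ps and would be flagged, even though no single component is faulty.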
In any comparison-based BIST, a passing result may be produced when the compared elements are all faulty; in our case, this means that all the compared PUTs are equally slow. Such a situation is unlikely when we compare several (4 to 8) paths. However, a validation test to protect against this case can be easily done by selecting one of the paths that passed the test and comparing it with a new path which was not part of the compared group.
No delay faults will be detected in a slow device where all paths are equally slow. This is the correct result, and such a chip will be identified by speed-binning and may be allowed to work as a lower speed-grade.
Our approach may fail if a PUT has compensating delay faults, where the effect of a slow path segment is masked by the presence of a fast segment, so that the overall path delay remains about the same as that of the other PUTs. In general, however, most delay faults slow down the circuit, and such a multiple fault is unlikely to occur in practice.
The use of the local oscillator created from the inverting feedback in the PLB logic could give rise to concerns of the quality of the clock feeding the ORA counter, specifically, the duty cycle and period needed for proper operation of the counter. One solution to this problem is to configure a single flip-flop as a toggle flip-flop with the output of the local oscillator driving the clock input to this flip-flop. This effectively divides the local oscillator frequency by 2 and ensures a near 50% duty cycle. The lower-frequency clock will only reduce the resolution of delay fault detection as opposed to preventing this delay fault BIST from working. However, we have implemented the delay fault BIST approach in an ORCA 2C15A FPGA and found the oscillator clock to run at 243 MHz while producing a duty cycle and clock waveform of sufficient quality to obtain reproducible results from one execution of the delay-fault BIST sequence to the next. Therefore, dividing the clock may not be necessary.
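The effect of the toggle flip-flop can be seen in a minimal behavioral sketch. It assumes an ideal flip-flop that toggles on every rising edge of the oscillator; the function names and time units are illustrative.

```python
def toggle_flip_flop(rising_edges):
    """Output events (time, level) of a T flip-flop clocked by rising edges."""
    level = 0
    events = []
    for t in rising_edges:
        level ^= 1  # toggle on each rising edge of the local oscillator
        events.append((t, level))
    return events


def period_and_duty(events):
    """Period and duty cycle of the divided clock from its first full cycle."""
    rises = [t for t, level in events if level == 1]
    falls = [t for t, level in events if level == 0]
    period = rises[1] - rises[0]
    high_time = falls[0] - rises[0]
    return period, high_time / period
```

Because only the rising edges of the oscillator matter, the divided clock has a 50% duty cycle whenever the oscillator period is constant, regardless of the oscillator's own duty cycle. The price is a doubled period, which halves the delay resolution.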
We emulated delay faults by creating a PUT longer than the other PUTs, and our technique detected all such faults.
Application to Off-Line and On-Line Testing
Our roving STARs approach [1] [3] introduced new techniques for on-line FPGA testing, diagnosis, and fault tolerance, applicable to any FPGA supporting incremental run-time reconfiguration (RTR) via its boundary-scan interface [16]. A STAR (self-testing area) is a temporarily off-line section of the FPGA where self-testing occurs without disturbing the normal system activity in the rest of the chip. Roving the STARs periodically brings every section of the FPGA under test. Our approach guarantees complete testing of the FPGA, including all its spare resources, and does not require any part of the chip to be fault-free. In this section, we discuss the application of our delay-fault BIST to roving STARs for delay-fault detection. Figure 5 depicts an FPGA with a vertical STAR (V-STAR) and a horizontal STAR (H-STAR); the system application resides in the working areas outside the STARs. Note that global horizontal routing resources in V-STAR and global vertical routing resources in H-STAR may be used by the system signals connecting the working areas separated by the STARs. Partial RTR via the boundary-scan interface allows the test configurations used by STARs to be downloaded without impacting the system operation. After self-testing of a STAR has been completed (both for PLBs and interconnect), the STAR roves to a new location by exchanging places with an equal-size slice of the working area; roving the STARs across the FPGA is implemented by a sequence of precomputed partial reconfigurations and ensures that the entire FPGA will eventually be tested. The roving process and the use of roving STARs for test and diagnosis of PLBs are described in detail in [1] and [3].
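The coverage property of roving can be illustrated with a deliberately simplified model of a single V-STAR. This is a sketch under strong assumptions: columns are numbered from 0, the STAR sweeps left to right in steps of its own width, and the column count is a multiple of that width. The actual roving sequence is the one described in [1] and [3], not this model.

```python
def rove_v_star(num_columns, star_width):
    """Yield (star_columns, working_columns) for each V-STAR position."""
    if num_columns % star_width:
        raise ValueError("this sketch assumes the width divides the column count")
    for start in range(0, num_columns, star_width):
        # The STAR exchanges places with an equal-size slice of the working area.
        star = set(range(start, start + star_width))
        working = set(range(num_columns)) - star
        yield star, working
```

Taking the union of the STAR columns over one full sweep returns every column of the array, which is the sense in which roving eventually brings the entire FPGA under test.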
Testing for delay faults follows the pattern of interconnect testing in our on-line routing BIST [12] , where horizontal and vertical routing resources are tested in H-STAR and V-STAR, respectively. Figure 6a illustrates this process (T denotes the TPG and O the ORA). Delay faults in the PLBs are also tested as part of the interconnect delay-fault testing sequence. As a result, no additional test phases are required for the PLB logic. Testing for delay faults in the cross-point CIPs connecting global horizontal and vertical routing busses must involve both STARs and can only be performed at the intersection of the two STARs, as illustrated in Figure 6b . Table 1 summarizes the set of on-line BIST configurations needed for a complete delay-fault test of a Lattice ORCA 2C series FPGA in terms of the number of test phases that must be executed in each STAR position.
One way of characterizing the difference between on-line and off-line testing is that no system function exists during off-line (manufacturing) testing. Hence for off-line testing, we can populate the entire FPGA with a "galaxy" of parallel STARs (either vertical or horizontal), all concurrently executing the same delay-fault BIST (Figure 7). The set of BIST configurations given in Table 1 is the same for both on-line and off-line testing.
Conclusions
We have presented the first delay-fault testing approach for FPGAs. Our technique is applicable both to off-line manufacturing testing and to on-line testing within the framework of the roving STARs approach. Our method is based on BIST, it is comprehensive, and it does not require expensive ATE. We have successfully implemented this BIST approach on the ORCA 2C series FPGA and have verified that the approach is not only feasible but also practical.
The focus of our future research will be diagnosis and fault tolerance for delay faults. 
