Memory Block Based Scan-BIST Architecture for Application-Dependent FPGA Testing

Keita Ito, Tomokazu Yoneda, Yuta Yamato, Kazumi Hatayama, Michiko Inoue
Nara Institute of Science and Technology, Nara, Japan
Japan Science and Technology Agency, CREST, Tokyo, Japan
{keita-i, yoneda, yamato, k-hatayama, kounoe}@is.naist.jp

ABSTRACT
This paper presents a scan-based BIST architecture for FPGAs used as application-specific embedded devices for low-volume products. The proposed architecture efficiently utilizes memory blocks, instead of logic elements, to build up BIST components such as LFSR, MISR and scan chains for test points. It also provides enhanced scan functionality for test points and performs a hybrid test application of LOC and enhanced scan to improve delay test quality. Experimental results show that the proposed BIST architecture achieves high delay test quality with efficient resource utilization.

Categories and Subject Descriptors
B.8.1 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance

Keywords
built-in self-test; delay test; test point

1. INTRODUCTION
Continuous advances in silicon manufacturing technologies allow us to design large, high-speed and low-power products. Among the many challenges imposed by these technologies, high in-field reliability is a major concern, and periodic online self-test is essential for overcoming reliability issues such as transistor aging. FPGAs are manufactured with the most advanced technologies and are also used as mission-critical and application-specific embedded devices, instead of ASICs, for low-volume products due to their low development cost and short time-to-market. However, advanced technologies are more vulnerable to transistor aging, and it is therefore important to ensure the in-field reliability of application-specific circuits implemented on FPGA devices.

For ASIC products, several approaches have been proposed for circuit failure prediction [8, 6, 7, 11] to overcome reliability issues. Circuit failure prediction anticipates the occurrence of a circuit failure before any error appears; the basic principle is to capture the gradual delay shift caused by transistor aging using delay test schemes. For this purpose, these approaches usually utilize a scan-based BIST architecture, which is an extensively studied and widely used infrastructure for manufacturing test of ASIC products.

There is no doubt that scan-based BIST is the most promising and well-established architecture for high-quality delay test of ASIC products. However, adopting an ASIC scan-based BIST architecture as it is for FPGA devices is not efficient in terms of resource utilization. FPGAs are not scan-ready devices (no scan cells and no scan chains), so scan cells/chains must be implemented using general resources such as Look-Up-Tables (LUTs), registers in Logic Elements (LEs) and local/global interconnects. It was reported that scan insertion for an application-dependent circuit introduces a 50% increase in LE usage [9]. Even though many approaches have been proposed for testing application-dependent circuits implemented on FPGAs [12, 13, 1, 10], only a few approaches adopt a scan-based architecture [9, 3] due to the high area overhead.

In this paper, we assume that the circuits targeted for scan-based BIST are logic-intensive and require LE resources rather than memory resources, which are another type of resource embedded on FPGAs and can be configured to provide various memory functions. This paper presents an efficient scan-based BIST architecture for application-dependent circuits on FPGAs using shift register configurations of memory blocks. To the best of the authors’ knowledge, this is the first paper that discusses an efficient implementation of scan-based BIST on FPGA devices. The contributions of the paper are summarized as follows.

- It presents an FPGA-specific and area-efficient architecture for the Linear Feedback Shift Register (LFSR) and Multiple Input Signature Register (MISR) [2] using shift register configurations of memory blocks.
- It presents a scan chain architecture for test point FFs. The proposed test point chain architecture is also implemented using memory blocks for area efficiency and provides enhanced scan [5] functionality that can improve delay test quality.
- Experimental results show the effectiveness of the proposed architecture in terms of area and fault coverage of transition delay faults compared to a conventional scan-based BIST architecture used for ASIC designs.

The rest of the paper is organized as follows. Section 2 describes the proposed BIST architecture and its test application scheme. Experimental results are shown in Section 3. Finally, Section 4 concludes this paper.

2. PROPOSED BIST ARCHITECTURE

2.1 Overview
Figure 1 shows a conventional scan-based BIST architecture with test point insertion, which is widely used for ASIC products. An LFSR and a MISR are used as a test pattern generator and a test response compactor, respectively. FFs in the Circuit-Under-Test (CUT) are replaced with scan cells, and several scan chains are constructed. Moreover, control points (CPs) and observation points (OPs) are added to improve random pattern testability. In the target BIST architecture, we assume that each CP and OP share one FF as a test point and that these test points are stitched to form independent scan chains. We call the scan chains for test points test point chains (TPCs) to distinguish them from the other scan chains in the CUT. We also assume the Launch-On-Capture (LOC) [5] test application scheme, which is widely used in industry for delay fault testing.

In this paper, we assume that the circuits targeted for scan-based BIST insertion are logic-intensive. In other words, the application-dependent circuits targeted in this paper require LE resources rather than memory resources, which are another type of resource embedded on FPGAs and can be configured to provide various memory functions such as RAM, ROM, FIFO buffers and shift registers without using LEs. The main idea is to efficiently utilize unused memory blocks, instead of LEs, to implement BIST components such as the LFSR, MISR and TPCs. An overview of the proposed BIST architecture is shown in Fig. 2. The detailed architectures of the LFSR, MISR and TPCs and their test application scheme are explained in the following subsections. The unique characteristics are summarized as follows.

- The shift register mode of memory blocks is efficiently configured to implement the LFSR, MISR and TPCs.
- TPCs and normal scan chains are controlled by independent scan enable signals, \( SE_{TPC} \) and \( SE_{CUT} \), respectively. TPCs remain unchanged during half of the test application cycles.
- TPCs naturally fit an enhanced scan cell implementation that can improve delay test quality in the LOC test application scheme.

2.2 LFSR, MISR and TPCs

As we explained in Section 2.1, memory blocks can be configured to provide shift register functions without using LEs. However, several design constraints must be satisfied to implement a shift register on a memory block. For example, memory blocks embedded on Altera devices provide the shift register configuration shown in Fig. 3. The shift register configuration is determined by the input data width \( w \), the length of the taps \( m \) and the number of taps \( n \), and the size of a \( w \times m \times n \) shift register must be less than or equal to the maximum number of memory bits. In addition, the length of the taps \( m \) must be greater than or equal to 3.
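The constraints above can be illustrated with a small behavioral model. The following Python sketch is not the vendor primitive: the class name `TappedShiftRegister`, the helper `check_config` and the `max_memory_bits` parameter are hypothetical names introduced here, and the model assumes \( w = 1 \) with one output tap every \( m \) bits.

```python
from collections import deque


def check_config(w, m, n, max_memory_bits):
    """Return True if a w x m x n shift register meets the stated constraints."""
    # Constraint 1: tap length m must be at least 3.
    # Constraint 2: total size w * m * n must fit in the available memory bits.
    return m >= 3 and w * m * n <= max_memory_bits


class TappedShiftRegister:
    """Behavioral model of a 1-bit-wide shift register with n taps of length m."""

    def __init__(self, m, n, max_memory_bits):
        # max_memory_bits is device dependent; it is a caller-supplied assumption.
        if not check_config(1, m, n, max_memory_bits):
            raise ValueError("configuration violates the shift register constraints")
        self.m = m
        self.n = n
        # Index 0 holds the newest bit, index m*n - 1 the oldest.
        self.bits = deque([0] * (m * n), maxlen=m * n)

    def shift(self, bit_in):
        """Shift in one bit and return the n tap outputs."""
        # With maxlen set, appendleft drops the oldest bit from the right end.
        self.bits.appendleft(bit_in & 1)
        return self.taps()

    def taps(self):
        """Tap k exposes the bit that entered (k + 1) * m shifts ago."""
        return [self.bits[(k + 1) * self.m - 1] for k in range(self.n)]


if __name__ == "__main__":
    sr = TappedShiftRegister(m=3, n=2, max_memory_bits=1024)
    for bit in (1, 0, 0):
        outputs = sr.shift(bit)
    print(outputs)  # the single 1 reaches the first tap after 3 shifts: [1, 0]
```

In this model, a configuration with \( m = 2 \) is rejected regardless of how much memory is available, which mirrors the minimum tap length constraint.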

In order to implement the shift register part of the LFSR on a memory block, we select the following configuration: (1) \( w = 1 \), (2) \( m = 3 \) (minimum length of the taps) and (3) \( n = N_{sc} \), where \( N_{sc} \) is the total number of scan chains including TPCs, as shown in Fig. 4. All \( n \) output bits of the \( n \)-tap 3-bit shift register on the memory block are connected to the scan-in ports to feed pseudo-random patterns to the CUT.

Similarly, a MISR is implemented using a shift register configuration of a memory block, as shown in Fig. 5. In the MISR case, we select the following configuration: (1) \( w = 1 \), (2) \( m = 3 \) (minimum length of the taps) and (3) \( n = 1 \), and prepare \( N_{sc} \) 1-tap 3-bit shift registers. The \( N_{sc} \) 1-tap 3-bit shift registers are connected in series through XOR gates, which are implemented on LEs, to form the MISR.

A TPC is also implemented using a shift register configuration of a memory block, as shown in Fig. 6. We select the following configuration for each test point (i.e., a pair of CP and OP): (1) \( w = 1 \), (2) \( m = 4 \) and (3) \( n = 1 \). Each 1-tap 4-bit shift register is connected to a single CP and OP. These shift registers are also connected in series through multiplexers, which are implemented on LEs, to form a TPC.

If the length of a TPC is larger than that of the normal scan chains, the TPC must be divided into several TPCs, none of which exceeds the length of the normal scan chains. The 1-tap 4-bit shift register can store two 2-pattern delay tests (i.e., 4 bits) for each CP and works as an enhanced scan cell during test application to improve delay test quality.
As can be observed from the figures, the size of the LFSR/MISR/TPC becomes $m$ times larger (3 for LFSR/MISR and 4 for TPC in the FPGA device used in this paper) than in the conventional BIST architecture in order to satisfy the design constraints. However, most of the LFSR is implemented on a memory block, so the increase in LE usage is small.
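The sizing rules in this subsection can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function name `proposed_bist_sizes` is hypothetical, and it assumes that the TPC bits are split into the smallest number of chains that do not exceed the normal scan chain length (ceiling division). The example values follow the b12 column header of Table 1 (8 scan chains of length 16, 16 CP/OP test point pairs).

```python
import math

LFSR_MISR_TAP_LENGTH = 3  # m = 3 for the LFSR/MISR shift registers
TPC_TAP_LENGTH = 4        # m = 4 for each test point (CP/OP pair)


def proposed_bist_sizes(num_scan_chains, scan_chain_length, num_test_point_pairs):
    """Estimate TPC and LFSR/MISR sizes for the memory-block-based architecture."""
    # Each test point is a 1-tap 4-bit shift register.
    tpc_bits = TPC_TAP_LENGTH * num_test_point_pairs
    # Assumption: split into the fewest TPCs no longer than a normal scan chain.
    num_tpcs = math.ceil(tpc_bits / scan_chain_length)
    # N_sc counts the normal scan chains plus the TPCs.
    n_sc = num_scan_chains + num_tpcs
    return {
        "tpc_bits": tpc_bits,
        "num_tpcs": num_tpcs,
        "lfsr_misr_size": LFSR_MISR_TAP_LENGTH * n_sc,
    }


if __name__ == "__main__":
    print(proposed_bist_sizes(8, 16, 16))
    # {'tpc_bits': 64, 'num_tpcs': 4, 'lfsr_misr_size': 36}
```

For these b12 parameters the sketch gives 64 TPC bits and an LFSR/MISR size of 36, i.e., (8 + 4) × 3, which agrees with the corresponding entries of Table 1.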

2.3 Test Application Scheme

Figure 7 shows a timing diagram of the proposed BIST architecture during test application. Basically, it follows the LOC-based at-speed delay test application scheme controlled by the scan enable signal $SE_{CUT}$; only the TPCs behave differently, under the control of $SE_{TPC}$ and $EN_{TPC}$, as explained below.

Only when an even-numbered test pattern (i.e., the pattern id is $i \times 2$) is scanned in are the TPCs active, working as a single shift register between the LFSR and MISR (i.e., $EN_{TPC} = 1$ and $SE_{TPC} = 1$). At the end of the scan-in cycles for the even-numbered test pattern, each 1-tap 4-bit shift register in the TPCs contains two 2-pattern delay tests (i.e., 4 bits) for a CP. Then, $SE_{TPC}$ is de-activated before the launch cycle, and the TPCs are switched to work as independent 1-tap 4-bit shift registers to capture test responses from OPs. When the next test pattern (odd-numbered pattern) is scanned in, $EN_{TPC}$ is de-activated and the TPCs become inactive to avoid consuming unnecessary power. Finally, $EN_{TPC}$ is activated again and the launch and capture operations are performed. This process is repeated until the BIST test application process is completed.
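The control sequence described above can be summarized as a per-cycle schedule. The following Python sketch is a simplified reading of the text, not a reproduction of Fig. 7: the generator name `tpc_control_schedule` is hypothetical, each pattern is modeled as a fixed number of shift cycles followed by one launch and one capture cycle, and the value of $SE_{TPC}$ while the TPCs are disabled (odd-numbered patterns) is assumed to be 0 because the text does not specify it.

```python
def tpc_control_schedule(num_patterns, shift_cycles):
    """Yield (pattern_id, phase, SE_CUT, SE_TPC, EN_TPC) for every clock cycle."""
    for pattern_id in range(num_patterns):
        # TPCs are loaded only while an even-numbered pattern is scanned in.
        load_tpc = pattern_id % 2 == 0
        for _ in range(shift_cycles):
            se_tpc = 1 if load_tpc else 0  # assumption: 0 while TPCs are disabled
            en_tpc = 1 if load_tpc else 0  # TPCs hold their contents when EN_TPC = 0
            yield (pattern_id, "shift", 1, se_tpc, en_tpc)
        # LOC launch and capture: scan enables are low and TPCs are enabled,
        # so each 4-bit register advances by 2 bits per pattern.
        for phase in ("launch", "capture"):
            yield (pattern_id, phase, 0, 0, 1)


if __name__ == "__main__":
    for cycle in tpc_control_schedule(num_patterns=2, shift_cycles=3):
        print(cycle)
```

Under this reading, the two launch/capture pairs of an even/odd pattern pair consume the 4 bits loaded into each test point register, which is consistent with storing two 2-pattern delay tests per CP.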

3. EXPERIMENTAL RESULTS

In this section, we present experimental results for two ITC’99 benchmark circuits, b12 and b17 [4]. They were synthesized with 8 scan chains of length 16 and 20 scan chains of length 65, respectively. We refer to these designs as "Org. w/ scan" and use them as the baseline for the comparison shown in Table 1.

In our experiments, we randomly added 16 and 128 CP/OP test point pairs to b12 and b17, respectively. After the test point insertion, we implemented two BIST architectures, the conventional BIST (“Conv. BIST”) and the proposed BIST (“Prop. BIST”), on an FPGA device. In the conventional BIST architecture, we prepare one FF per CP/OP test point pair, and the size of the LFSR/MISR equals the total number of scan chains including TPCs (9 = 8 + 1 for b12 in Table 1). In contrast, in the proposed BIST architecture, we used 3-bit and 4-bit shift register configurations for the LFSR/MISR and TPCs, respectively, to satisfy the design constraints for shift register realization on memory blocks. Consequently, the number of FFs and chains for TPCs in the proposed BIST is 4 times larger, and the size of the LFSR and MISR is much larger, than in the conventional BIST, as shown in Table 1.

The designs were implemented on an Altera Cyclone III device using Quartus II 13.0 Web Edition, and area overhead is evaluated in terms of “Total logic elements”, “Total combinational functions”, “Total registers” and “Total memory bits”. Furthermore, fault simulations were performed to evaluate delay test quality for the two BIST architectures. 500 and 20,000 pseudo-random patterns were used for the fault simulations of b12 and b17, respectively. The results for area and fault coverage are also included in Table 1. Note that the fault coverages are not very high since the test points are randomly inserted without considering controllability/observability in the CUT, and the BIST does not adopt complementary approaches such as phase shifters and reseeding [2], which are widely used to improve fault coverage.

First, we compared the two results in columns “Memory” and “LE” of “Prop. BIST”. These two designs have exactly the same architecture (the proposed architecture). The only difference is that “Memory” implements the architecture using memory blocks while “LE” implements it with LEs only. Therefore, the results for these two columns are identical except for the area-related items. This comparison shows that the proposed architecture and its implementation using memory blocks can drastically reduce LE utilization, so users do not need to worry about how many LE resources can be used for the application and how many should be reserved for the BIST implementation.

Then, we compared the two results in columns “Prop. BIST (Memory)” and “Conv. BIST”. Although the proposed BIST architecture has a larger LFSR/MISR and includes more TPCs, the difference in LE usage is very small since the proposed BIST architecture efficiently replaces LE resources with memory bits. On the other hand, the difference in delay test quality is remarkable. The proposed method achieves 5% and 6.3% higher fault coverage than the conventional method. The differences become more visible when we compare the number of pseudo-random patterns required to reach the same fault coverage. For example, for b17, the proposed BIST architecture requires only 6,120 patterns to reach 49.43% fault coverage, which is the final coverage of the conventional BIST architecture after applying 20,000 pseudo-random patterns. This shows that the proposed method can obtain around a 70% reduction in pattern count for the same fault coverage. These gains come from two reasons: (1) the proposed BIST architecture efficiently implements a larger LFSR and MISR and (2) the TPCs have enhanced scan cell capability for the LOC-based delay test application scheme.

We further analyzed the impact of test point insertion on the proposed BIST architecture for b12. Random test point insertion (16 CP/OP test point pairs) was performed 5 times, and the proposed BIST architecture was implemented for each design. Figure 8 shows the area overhead (the number of LEs) and fault coverage for the 5 designs. Notably, the area overhead depends on the locations where the test points are inserted. This trend is unique to FPGA devices and cannot be observed from test point insertion for ASIC products. It is also interesting to note that area overhead and fault coverage are not strongly correlated (the correlation coefficient is 0.49). For example, "Design2" provides high fault coverage (68%) with low area overhead (the number of LEs is 469). In this case, the proposed BIST architecture achieves 5% higher fault coverage with lower area overhead compared to the conventional BIST architecture shown in Table 1. The analysis tells us that a new test point insertion method is required for scan-based BIST on FPGA devices, even though test point insertion has been extensively studied for ASIC products.

4. CONCLUSIONS
We have presented a scan-based BIST architecture for application-dependent FPGA testing. The main idea is to utilize shift register configurations of memory blocks on FPGAs to efficiently implement BIST components such as the LFSR, MISR and TPCs. Moreover, the proposed TPC architecture provides an enhanced scan capability and improves the test quality of the LOC-based delay test application scheme. One direction for future work is to investigate a method for test point insertion based on the proposed BIST architecture to optimize delay test quality as well as LE usage.

5. ACKNOWLEDGMENTS
This work was supported in part by the Japan Society for the Promotion of Science (JSPS) under Grants-in-Aid for Scientific Research (B) (No. 25280015).

6. REFERENCES

Table 1: Results for b12 and b17.

<table>
<thead>
<tr>
<th>Design</th>
<th>b12 (#scan_chains=8, length=16)</th>
<th>b17 (#scan_chains=20, length=65)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Org. w/ scan</td>
<td>LE</td>
</tr>
<tr>
<td>Test point</td>
<td>- 16 CP/OP test point pairs</td>
<td>- 128 CP/OP test point pairs</td>
</tr>
<tr>
<td>Number of flip-flops for TPCs</td>
<td>- 16</td>
<td>64</td>
</tr>
<tr>
<td>LFSR/MISR size</td>
<td>9 (8 + 1)</td>
<td>36 ((8 + 4) × 3)</td>
</tr>
<tr>
<td>Resource for shift register</td>
<td>- LE Memory LE</td>
<td>- LE Memory LE</td>
</tr>
<tr>
<td>Total logic elements</td>
<td>411</td>
<td>472</td>
</tr>
<tr>
<td>Total combinational functions</td>
<td>405</td>
<td>455</td>
</tr>
<tr>
<td>Total registers</td>
<td>121</td>
<td>155</td>
</tr>
<tr>
<td>Total memory bits</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Delay test</td>
<td>LOC + enhanced scan</td>
<td>LOC + enhanced scan</td>
</tr>
<tr>
<td>Fault coverage [%]</td>
<td>- 63.01</td>
<td>69.38</td>
</tr>
</tbody>
</table>