Generally, a programmable LSI such as an FPGA is difficult to test compared to an ASIC. There are two major reasons for this. The first is that an automatic test pattern generator (ATPG) cannot be used because of the programmability of the FPGA. The other reason is that the FPGA architecture is very complex. In this paper, we propose a new FPGA architecture that will simplify the testing of the device. The base of our architecture is general island-style FPGA architecture, but it consists of a few types of circuit blocks and orderly wire connections. This paper also presents efficient test configurations for our proposed architecture. We evaluated our architecture and test configurations using a prototype chip. As a result, the chip was fully tested using our configurations in a short test time. Moreover, our architecture can provide comparable performance to a conventional FPGA architecture.
key words: design for testability, homogeneous architecture, test method, prototype chip
Introduction
At present, the intellectual property (IP) core based system on a chip (SoC) is an important component in the embedded system industry. The use of IP technology has reduced manufacturing costs and improved time-to-market. However, the SoC is unsuitable for the low-volume production of a wide variety of products because it cannot be customized after its manufacture. Thus, reconfigurable logic IP cores such as FPGAs have become very important components that help provide flexibility to the SoC for each customer. However, reconfigurable components require a long manufacturing test time compared to application specific IPs, for which automatic test pattern generator (ATPG) tools can be used. To test the FPGA, generally, many test configurations must be downloaded into the FPGA. This increases the total test time since the configuration process accounts for a large part of the test time [1] , [2] . As a result, reconfigurable IPs cause shipping delays. In order to mitigate this problem, we need to consider efficient FPGA testing with fewer configurations.
One of the factors responsible for the long test time is the presence of numerous types of circuit blocks in the FPGA. Testing these blocks requires test configurations suitable for each block, and therefore, the number of configurations is large. For example, we consider the general island-style FPGA architecture based on VPR [3], [4] (see Fig. 1). It consists of an array of N × N configurable logic blocks (LBs) that can be programmed to implement logical functions. The LBs are connected by switch blocks (SBs), connection blocks (CBs), and routing tracks. IO blocks (IOBs) are used to communicate with circuits outside the chip. Although this architecture seems to have regular circuit blocks, it includes irregular circuit blocks near the boundary with the IOBs (shown as the shaded area in Fig. 1). Accordingly, this architecture requires three types of SBs, each with a different structure. Therefore, we need to prepare test configurations suitable for each type of SB, which increases the number of configurations. Similarly, CBs can also have several different structures. Having many structures is one of the reasons why a large number of test configurations is needed.
As mentioned above, a simpler architecture that can be easily tested is required. However, recent FPGAs are not suitable because they have very complex logic block and interconnect architectures, and testing them requires more configurations. Thus, in order to perform comprehensive testing with the smallest number of configurations, we propose an easily testable FPGA routing architecture and a test method.
• New easily testable FPGA architecture: We propose a genuinely regular FPGA architecture that consists of fewer types of circuit blocks. Our architecture is easily testable, and it keeps the area and speed overheads to a minimum.
• Configuration dataset for testing: The circuit resources that are the most difficult to test are the components of the programmable interconnects in the FPGA. Therefore, we introduce test configurations for the interconnect part. Our test configurations are based on the regularity of the SB topology, and all global interconnects can be tested completely with only five test configurations.
• Test equipment: We use several circuits such as test pattern generators (TPGs) and output response analyzers (ORAs), which are employed in built-in self-tests (BISTs). These blocks shorten the test time. Moreover, as the circuits are very small, there is little area overhead.
In this study, we evaluated the performance and testability of our architecture by using a prototype chip. We achieved 100% stuck-at fault coverage in a short test time with little performance overhead. The paper is organized as follows. Related work is discussed in Sect. 2. In Sect. 3, we discuss the concept of an easily testable architecture, and describe our proposed architecture. Section 4 introduces efficient test configurations for programmable interconnects. In Sect. 5, we first discuss FPGA testing and then present an evaluation of the performance of our proposed architecture. Finally, Sect. 6 concludes this paper.
Copyright © 2012 The Institute of Electronics, Information and Communication Engineers
Related Work
The most widely used test technique is a BIST that uses the programmability of the FPGA [1] , [2] , [5] . In this technique, LBs have three roles: as test targets, TPGs, and ORAs. The test is basically performed in three sessions, with the role of the LB changed in each session. The advantage of this technique is that there is no area overhead since no additional equipment is used. However, this method increases the number of configurations required to complete the test because only a few LBs can be tested at a given time. On the other hand, our method involves the inclusion of specific circuits that can be used as TPGs and ORAs in the device. While the inclusion increases the device area, the test time is shortened since all LBs can be configured as test targets.
Techniques like BIST have become attractive through the use of "shift-configuration" [6]. This technique involves the modification of the SRAM so that it can shift the configuration data. The test is then performed for the entire device by reusing the shifted data. This leads to a greater reduction in configuration time compared to techniques that use external configuration. Since configuration time accounts for a major part of the test time [1], [2], this technique is very effective. However, it can be applied only to a completely homogeneous FPGA architecture. In fact, Doumar et al. [6] modified the SRAM structure so that configuration data can be shifted only in the homogeneous part of the FPGA. Our FPGA architecture has a completely homogeneous structure, thus allowing the use of the above-mentioned technique for the entire device.
Renovell et al. [7] proposed very efficient test configurations for the programmable interconnect. They focused on the connections of SBs, which are divided into three types: orthogonal and two types of diagonal connections, and showed that 100% fault coverage could be achieved for SBs with three configurations. However, CBs and SBs can be of several types in an actual FPGA. Therefore, additional test configurations are required. Our architecture has regular interconnects. We do not need additional configurations for different structures.
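The three-configuration argument can be checked on a very small model. The sketch below treats one switch point of an SB as a node with four sides and enumerates the six possible side-to-side connections. Which side pairs belong to each diagonal type is an assumption made for illustration; the source only names one orthogonal and two diagonal types.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")

# Hypothetical assignment of side-to-side connections to the three types.
CONFIGS = {
    "orthogonal": [("N", "S"), ("E", "W")],
    "diagonal_1": [("N", "E"), ("S", "W")],
    "diagonal_2": [("N", "W"), ("E", "S")],
}

# Every unordered pair of sides is one programmable connection to be tested.
all_pairs = {frozenset(p) for p in combinations(SIDES, 2)}

covered = set()
for name, pairs in CONFIGS.items():
    used_sides = [side for pair in pairs for side in pair]
    # Within one configuration each side is used once, so both
    # connections can be exercised at the same time without conflict.
    assert sorted(used_sides) == sorted(SIDES), name
    covered |= {frozenset(p) for p in pairs}

missing = all_pairs - covered
print(len(all_pairs), len(missing))  # 6 0
```

Under this model, no connection is left untested after the three configurations, and none is tested twice.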
In this work, we aim to use our device as one of the IPs in an SoC. Accordingly, our device needs to offer testability comparable to that of other application-specific IPs in order to avoid shipping delays. To meet this demand, we adopt a simpler architecture, like the traditional island-style architecture. Moreover, we provide no hard macros such as digital signal processors (DSPs) and embedded memories because they are already implemented as other IPs in SoCs. On the other hand, recent FPGAs such as Virtex-7 and Stratix-V offer high-performance devices with very complex architectures and several hard macros. The strategy of our device is therefore different from that of commercial devices, and we do not compare the two.
Architecture for Testability
In this section, we present a new FPGA architecture that can easily be tested. First, we discuss the type of architecture that facilitates easy testing and then we introduce our architecture.
Demands for Testability
In FPGA testing, it is important to test exhaustively in a short time. To achieve this, we need to design a regular FPGA architecture so that additional configurations for irregular parts can be avoided.
In addition, the number of required configurations depends on the complexity of the FPGA architecture. In particular, the interconnect network is very complex, and the number of test configurations for interconnects is generally larger than that for logic blocks [8]. The interconnect architecture should be simple in order to reduce the number of test configurations.
Proposed Architecture
Figure 2 shows a diagram of our proposed architecture. The most distinctive feature of this architecture is that it is composed of only a single type of tile and IOBs. In other words, all SBs and CBs have the same structure, unlike the architecture of Fig. 1. To simplify the task of designing circuits, the connections between tiles are kept simple, and the connections are formed only by the routing tracks.
To realize this architecture, we consider the following issues.
• Eliminating irregular parts around IOBs: As previously mentioned, the irregular parts in the area around IOBs lead to an increase in the number of configurations. Therefore, they should be eliminated. We modify the structure of the CB, SB, and IOB to make the connections well ordered.
• Simple connections between circuit blocks: In the conventional architecture, LBs, CBs, SBs, and wire segments are intricately connected to one another. Their connections make it difficult to test them. To overcome this problem, we design these components to have simple connections.
To resolve the above issues, we describe four points of modification in the following subsections.
CB Connections
In our architecture, we eliminate the shaded area shown in Fig. 1 to maintain the regularity. Further, tiles are connected to each other using only routing tracks. Therefore, we need to modify the CB connections as shown in Fig. 3 . In the conventional island-style architecture, outputs of the four CBs are connected to one LB beyond the tiles, and the connections between tiles become very complex, as shown in Fig. 3 (a) . On the other hand, in the proposed architecture, the two CBs are connected to one LB in the same tile, as shown in Fig. 3 (b) . Therefore, the connections between tiles are simplified.
SB Structure
Similarly, we modify the connections between SBs and LBs, as shown in Fig. 4 . In the conventional architecture, outputs of four LBs are connected to a single SB beyond the tiles, as shown in Fig. 4 (a) . In our proposed architecture, only one LB is connected to a single SB in the same tile, as shown in Fig. 4 (b) . In addition, we modify the SB structure to improve the routability of the SBs. In Fig. 4 (a) , the LB outputs can be propagated in any direction using four SBs. On the other hand, the outputs of the LB are connected to only one SB in the proposed architecture. We need to modify the SB so that the LB outputs can be propagated in all directions as shown in Fig. 4 (b) , which is necessary to maintain routing flexibility.
IOB Structure
In a typical island-style architecture (see Fig. 1 ), IOBs are connected to routing tracks through CBs. We modify the IOBs to connect directly to the routing tracks, as shown in Fig. 5 . The IOB has several MUXs, which can connect to routing tracks. Moreover, the IOB provides a TPG and ORA, which can be used for testing. The details of these circuits are described in Sect. 4. 
Alignment of Wire Segments
Most FPGA architectures have several types of wire segments. These render the routing structure complex. For example, we consider the connections of the quad line shown in Fig. 6 (a) . Each quad line is connected to alternate SBs to obtain high routability. In this case, we have to design many types of tiles with different connections between the SBs and wire segments.
In this work, we use aligners to simplify the connections of wire segments, as shown in Fig. 6 (b). Aligners are implemented between SBs to align wire segments. Unlike the previous architecture, all SBs can have uniform connections for wire segments, and the number of types of tiles is reduced. Further, because only the connection pattern changes, there is no area overhead.
Configuration Scheme
We also consider a configuration scheme to reduce testing time. The basic approach is to use the shift-configuration method in [6] . The example of shift-configuration is shown in Fig. 7 . First, we configure the entire chip. Then, the new configuration bits for one row will be inserted into the bottom of the chip by shift-configuration. Other configuration bits in the chip will be shifted to upper rows. In order to utilize this technique, we prepare two configuration paths: the Tile array conf. path and IOB conf. path as shown in Fig. 8 . These paths consist of shift-registers used as configuration units. By dividing the shift-registers into two paths, we can configure the tile array and IOBs individually and shift configuration data in each path.
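The row-shifting behavior described above can be sketched in a few lines. The model below is illustrative only: `shift_configure` and the row labels are hypothetical names, each label stands in for the configuration bits of one row of tiles, and the real chip shifts bits through shift registers rather than moving Python objects.

```python
from collections import deque

def shift_configure(rows, new_row):
    """Shift every row's configuration up by one and insert new_row at the bottom.

    rows[0] is the top row of the tile array; its old contents are shifted out.
    """
    chain = deque(rows, maxlen=len(rows))
    chain.append(new_row)  # appending to a full deque drops the leftmost (top) entry
    return list(chain)

# Hypothetical 4-row array after a full configuration.
array = ["cfg_A", "cfg_B", "cfg_C", "cfg_D"]

# One shift-configuration loads a single new row instead of the whole array.
array = shift_configure(array, "cfg_E")
print(array)  # ['cfg_B', 'cfg_C', 'cfg_D', 'cfg_E']
```

Only one row of new data is loaded per shift, which is why a shift-configuration is much cheaper than a full tile array configuration.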
Test Method
In this section, we explain our test method. First, we describe the test strategy, that is, the fault model and the target circuit resources, and then we present the test procedure.
Test Strategy
Our target fault model is a single stuck-at fault model, which is widely used as the basis for automatic test pattern generation in digital circuit testing. This paper mainly focuses on the global routing region, which includes routing wires and SBs. The routing resources occupy the major part of the chip area, up to 90% [9], and therefore, the possibility of fault occurrence is higher in them than in the other resources. Of course, the other resources, such as LBs, CBs, and IOBs, are also tested by using additional test configurations.
Target FPGA Architecture
Table 1 presents the details of the target architecture in this work. The FPGA has our proposed homogeneous routing architecture, and it uses abridged adaptive LUTs (A²LUTs), which are our proposed logic cells [11]. Further, we utilize NV-FFs (non-volatile flip-flops) as configuration memory cells, which contain FeRAM (ferroelectric random access memories), FFs, and power-gating control circuits [12], [13]. Our prototype chip used in the evaluation, with an area of 54.76 mm², is shown in Figs. 9 and 10.
A²LUT Architecture
In a previous work [11], we investigated the appearance ratio of the logic functions using the P-equivalence class [14]. As a result, we found that only a small portion of the P-equivalence classes can cover a large portion of the logic functions used to implement circuits. Based on this result, we have proposed an abridged adaptive LUT (A²LUT), as shown in Fig. 11, to reduce the FPGA area. The A²LUT has a 4-input LUT (two 3-input LUTs), two extra configuration memory bits M[8] and M[9], another three MUXs and an AND/NOR/OR gate, six input pins, and one output pin. Like the 4-LUT, the A²LUT can implement all 4-input logic functions using the MUX connected to In[0]. The A²LUT can also implement a part of the 5- or 6-input functions.
Proposed Test Method for Global Interconnects
Fig. 9 Layout photo of the prototype chip.
Fig. 10 Die photo of the prototype chip.
Fig. 13 Test configuration for path (a).
Utilizing the regularity of SBs, we can perform efficient testing. We adopt a Wilton-type SB, which has high routability [10]. At least three configurations are required for testing global routings completely in this architecture [7]. Figure 12 shows the three types of configurations of the SB, referred to as orthogonal, clockwise, and counterclockwise. First, all SBs are configured as the orthogonal type, as shown in Fig. 13. In this test case, all input and output data are accessed for the test using the I/O pin. However, it is impossible to test all resources simultaneously owing to the limited number of I/O pins. Therefore, we include TPGs and ORAs in all IOBs, as shown in Fig. 5. The TPG consists of one inverter and one Flip-Flop and propagates a toggle signal. The ORA consists of one OR gate and EXOR gates to compare the propagated test signals with each other. If a stuck-at fault does not exist, ORA will output the "Low" signal; otherwise, it will output the "High" signal. Using the TPGs and ORAs, all resources can be tested and the constraint imposed by the number of available I/O pins can be overcome. Because these test support circuits are very small compared to the logic and routing resources, there is little area overhead.
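The behavior of the TPG and ORA can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit on the chip: the function names are hypothetical, the ORA is assumed to compare neighboring signals pairwise, and a fault is modeled by forcing one wire to a constant value.

```python
def tpg(cycles, start=0):
    """Toggle generator: one flip-flop fed back through an inverter."""
    q = start
    for _ in range(cycles):
        yield q
        q ^= 1  # the inverted output is latched on the next clock

def ora(signals):
    """Return 1 ("High") if any two neighbouring signals differ, else 0 ("Low")."""
    mismatch = 0
    for a, b in zip(signals, signals[1:]):
        mismatch |= a ^ b  # EXOR compares a pair, OR collects the mismatches
    return mismatch

def run_test(num_paths, cycles, stuck=None):
    """Drive num_paths wires from one TPG; stuck=(index, value) injects a fault."""
    flagged = 0
    for bit in tpg(cycles):
        wires = [bit] * num_paths
        if stuck is not None:
            wires[stuck[0]] = stuck[1]  # the faulty wire ignores the test signal
        flagged |= ora(wires)
    return flagged

print(run_test(4, 4))                # 0: fault-free, ORA stays Low
print(run_test(4, 4, stuck=(2, 0)))  # 1: stuck-at-0 seen when the TPG drives 1
print(run_test(4, 4, stuck=(2, 1)))  # 1: stuck-at-1 seen when the TPG drives 0
```

Because the test signal toggles, both stuck-at-0 and stuck-at-1 faults on a wire produce a mismatch within two clock cycles in this model.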
Next, all SBs are configured as the clockwise type. Figure 14 shows some configured SBs. In this figure, two paths can be seen. Note that the bold dotted lines denote quad lines, and the connections between neighboring SBs are formed using single lines. Path (b)-1 forms a closed loop. This path is drawn with a single stroke from a certain SB, and it returns to the same SB. In the test sequence, we incorporate a TPG and an ORA into one LB at the circled point in Fig. 14. A test signal leaves the TPG and reaches the ORA through this path. In this case, the test signal is propagated in a "clockwise" direction through clockwise switches of different SBs. Moreover, the set of switches in one test path is exactly the same as the set of switches of one SB. The ORA analyzes the propagated signal and stores it in a Flip-Flop in [...] (7) test the SB at bottom-left on the tile array, but they do not collide with each other. In fact, 256 paths are combined for all SB testing because there are 256 SBs in our 16 × 16 tile array. Hence, all clockwise connections of all SBs can be tested simultaneously. This method depends only on the topology of the SB and not on the flexibility of the SB, the type and number of wire segments, or the size of the FPGA array. However, in the case of path (b)-1, a certain output pin of the SB at the circled point is not tested because the SB uses one output pin to connect to a TPG output. For this reason, we need one more clockwise configuration, in which the TPG and ORA are implemented in different LBs, to test the pin at the circled point. The configuration of the counterclockwise type is tested in the same way as the configuration of the clockwise type. In total, our method can test global wires completely using only five tile array configurations (one orthogonal type, two clockwise types, and two counterclockwise types) and one IOB configuration.
Other Resources Testing

The Connections between LB-SB
The test method for the connections from LB outputs to SBs is shown in Figs. 16 and 17. Our LB has four basic logic elements (BLEs) (BLE0-BLE3), and the number of LB outputs is four. In our method, one BLE is configured as a TPG, and the other three BLEs are configured as ORAs. For example, BLE3 is configured as a TPG in Fig. 16. The BLE3 output is then propagated to all SB outputs. Note that Fig. 16 shows only horizontal channels. One SB propagates 9 signals per side, and these signals are input to LBs through CBs. Finally, the BLEs configured as ORAs analyze the signals, and the results of the analysis are read back through scan flip-flops. However, we need many ORAs to analyze all signals. Thus, we prepare two more LBs only for ORAs, and they are configured between the LBs under test, as shown in Fig. 16. For the three LBs (one tested LB and two LBs used as ORAs), a total of 18 signals are propagated from SBs, and they are distributed to the three LBs and analyzed. Figure 17 shows the configuration of the entire device. In this configuration, we implement five types of tile: four for testing BLE0-BLE3 and one for analyzing signals. Then, we test them while shifting configuration bits with the shift-configuration technique in order to test all BLEs in all tiles in a short time. In this method, we need one configuration each for the tile array and the IOBs, and 11 shift-configurations for exhaustive testing.
CB
Figure 18 shows the detailed structure of the CB, which has 6 outputs to an LB and 48 inputs from global tracks. The CB consists of six 24-to-1 MUXs. Therefore, 24 test configurations are required to test all CB connections. For these configurations, we use one configuration each for the tile array and the IOBs, and 23 shift-configurations. In this test, test signals are propagated from the TPGs in the IOBs through global tracks, and they are analyzed by the ORAs in each LB.
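The reason a 24-to-1 MUX in the CB needs 24 configurations can be seen from a simple model: a fault on one MUX input is observable only while the configuration selects that input. The sketch below is illustrative; `mux_output` and `detecting_selects` are hypothetical helpers, and the fault is modeled as a single input stuck at a constant value.

```python
NUM_INPUTS = 24  # inputs of one CB multiplexer

def mux_output(inputs, select, stuck=None):
    """Return the selected input; stuck=(index, value) forces one input to a constant."""
    values = list(inputs)
    if stuck is not None:
        index, value = stuck
        values[index] = value
    return values[select]

def detecting_selects(num_inputs, stuck):
    """List the select settings under which a toggling test signal exposes the fault."""
    found = []
    for select in range(num_inputs):
        for bit in (0, 1):  # the TPG drives both logic values
            if mux_output([bit] * num_inputs, select, stuck) != bit:
                found.append(select)
                break
    return found

print(detecting_selects(NUM_INPUTS, stuck=(5, 0)))  # [5]

# One select value per configuration: one full configuration plus shifts.
full_configurations = 1
shift_configurations = NUM_INPUTS - full_configurations
print(full_configurations, shift_configurations)  # 1 23
```

In this model every input must be selected once, which matches the count of one tile array configuration followed by 23 shift-configurations.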
LB
In order to test stuck-at faults of the A²LUT, we need to check that 0/1 signals can be propagated from the configuration memories to the A²LUT outputs. This is because the A²LUT is mainly composed of MUXs that select memory bits according to the input signals. Hence, we configure the A²LUT twice, namely with all configuration memories configured to "0" or "1".
Testing of the local interconnects of the LB is performed simultaneously with the testing of other resources. Therefore, no configurations dedicated to local interconnects are required.
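The need for two configurations can be illustrated with a plain LUT model. This sketch is a simplification: it models an ordinary 4-input LUT as a memory indexed by the inputs, not the full A²LUT with its extra MUXs and gate, and it assumes the two patterns are all-"0" and all-"1". The function names are hypothetical.

```python
from itertools import product

def lut(memory, inputs):
    """Select one memory bit according to the input bits (a MUX tree)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return memory[index]

def passes(memory_value, num_inputs=4, stuck_cell=None):
    """Fill the memory with one value, sweep all inputs, and check the output.

    stuck_cell=(cell, value) models a path from one memory cell stuck at value.
    """
    memory = [memory_value] * (1 << num_inputs)
    if stuck_cell is not None:
        cell, value = stuck_cell
        memory[cell] = value
    return all(
        lut(memory, ins) == memory_value
        for ins in product((0, 1), repeat=num_inputs)
    )

# A stuck-at-0 path is invisible with the all-0 pattern and caught by all-1.
print(passes(0, stuck_cell=(3, 0)))  # True
print(passes(1, stuck_cell=(3, 0)))  # False
```

A stuck-at-1 path behaves the other way round, so both patterns are needed to cover both fault polarities in this model.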
IOB
The actual IOB structure of our chip is shown in Fig. 19. It has 24 inputs/outputs from/to global tracks and two I/O pins. Note that we add I/O elements (IOEs) into the IOBs in the chip to improve the availability of the IOB. The IOE includes a scan flip-flop, which can be used for boundary scan. In IOB testing, we need to check the following paths. For paths (1) and (3), for example, test signals are first input using path (1) of the IOBs at the left side of the tile array, and then go through the tile array configured as shown in Fig. 13. They finally arrive at path (3) of the IOBs at the right side of the tile array. Next, we swap the role of each IOB and test again. In this way, paths (1) and (3) of all IOBs can be tested. We test the other paths in all IOBs similarly, and in total we need three configurations of the tile array and 38 configurations of the IOBs to test all IOB resources.
Evaluation
The proposed architecture and test method were evaluated and compared in terms of test efficiency and performance. 
Test Evaluation
We discuss the efficiency of our proposed test method using the prototype chip. We measured the toggle coverage as the fault coverage for detecting stuck-at faults. We used an AND/OR tree suite as test circuits for comparison with our proposed method; the tree suite is widely used to detect stuck-at faults [6]. This suite consists of 16 AND trees and 16 OR trees, which we implemented using the CAD tools A²LUT-EMap, T-VPack [15], and VPR 5.0 [4], as shown in Fig. 21. A²LUT-EMap is a technology mapping tool for the A²LUT; it is based on the EMap used for standard FPGAs [16]. The VPR program is also modified to support a homogeneous architecture. First, we designed the FPGA with Verilog-HDL and synthesized a gate-level netlist with Synopsys Design Compiler Y-2006.06-SP6-2. Then, we performed a functional simulation and measured the toggle coverage using Cadence NC-Verilog 06.20-s004. Figure 20 shows the waveform of the interconnect testing using an AND tree. In this test, "L" signals were input into all input pins, and then toggle signals were input to verify the behavior of the AND tree. The configuration sequence was carried out at 16.4 MHz, and the test circuits were run at 20.4 MHz. Table 2 gives the test results for the interconnects. Note that the sequences of tile array configuration and IOB configuration take 87,040 and 2,752 cycles, respectively. The shift-configuration used to shift configuration data in one row of tiles needs 5,440 cycles. The results show that, whereas the test coverage using the AND/OR tree is 86.2%, our proposed method achieved 100% fault coverage. Additionally, our method showed a reduction in test time of up to 84.1% relative to the AND/OR tree suite. Thus, our method is very efficient. Table 3 shows the test of the entire device including LBs, SBs, CBs, and IOBs. The AND/OR tree suite was tested in the same way as in interconnect testing.
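The cycle counts quoted above allow a rough estimate of configuration time. The sketch below uses only the per-sequence cycle counts and the 16.4 MHz configuration clock from the text; `config_cycles` is a hypothetical helper, the estimate ignores the cycles spent applying test signals, and it does not reproduce the values of Table 2.

```python
TILE_ARRAY_CYCLES = 87_040   # one full tile array configuration
IOB_CYCLES = 2_752           # one IOB configuration
SHIFT_CYCLES = 5_440         # one shift-configuration (one row of tiles)
CONFIG_CLOCK_HZ = 16.4e6     # configuration clock

def config_cycles(tile_array, iob, shift=0):
    """Total configuration cycles for a given number of each sequence type."""
    return (tile_array * TILE_ARRAY_CYCLES
            + iob * IOB_CYCLES
            + shift * SHIFT_CYCLES)

# A full tile array configuration equals 16 row loads for the 16 x 16 array.
assert TILE_ARRAY_CYCLES == 16 * SHIFT_CYCLES

# Global interconnect test: five tile array configurations and one IOB configuration.
interconnect = config_cycles(tile_array=5, iob=1)
print(interconnect)  # 437952
print(f"{interconnect / CONFIG_CLOCK_HZ * 1e3:.2f} ms of configuration time")
```

The same helper shows why shift-configurations are cheap: each one costs one sixteenth of a full tile array configuration.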
Unlike interconnect testing, the proposed test needs more configurations than the AND/OR tree suite to test the entire device exhaustively. The numbers of IOB configurations and shift-configurations are particularly large. The major reason is the testing of multiplexers in CBs and IOBs. The multiplexers in CBs and IOBs have a large number of inputs, and testing them requires at least as many configurations as the number of inputs. However, IOB configurations and shift-configurations take less time than tile array configurations because the number of configured bits in them is very small. As a result, our proposed method was executed in half the time taken by the AND/OR tree suite.
Performance Evaluation
We compared our architecture with the conventional island-style architecture shown in Fig. 1 by using benchmark circuits. The 15 circuits in the MCNC suite were used to measure the performance of each architecture. The implementation flow is shown in Fig. 21. The area and delay are reported by these tools. Note that VPR estimates the array size and the number of routing tracks for each benchmark circuit. Figure 22 shows the number of routing tracks, area, and critical path delay for each circuit. The graph gives values normalized by the values for the conventional architecture. The number of routing tracks in our architecture is 13% larger on average. The main reason is that the LB outputs are gathered to a single SB, whereas the LB outputs in the conventional architecture are distributed to the surrounding SBs. Because of this difference, the possibility of signal congestion in our architecture is higher than that in the conventional one. However, the total area of the proposed architecture is only 7% larger than that of the conventional architecture on average. Although the delays in our architecture increased by 0%-25%, these performance penalties are acceptable given the efficiency of testing. We believe that these results can be improved by refining the algorithms of the tools used in Fig. 21.
Conclusion
In this study, we developed a completely homogeneous FPGA architecture that simplifies the testing of the FPGA. This architecture has two main characteristics. One is that it has fewer types of circuit blocks compared to the conventional architecture, while the other is that it has simplified connections. As a result, fewer test configurations enable exhaustive testing.
We also proposed a test method that can be used for testing all interconnect resources in a short test time. Our method is based on the regularity of the Wilton-type SB. This method is independent of other architecture parameters, for example the array size, number of routing tracks, and types of wire segments.
Using our architecture and test method, we can achieve 100% fault coverage for global routing resources. Experimental results show that the performance overhead is very small in our architecture. We are now considering how to improve the device performance of our architecture in terms of architectures and CAD tools.
