In this paper we present a novel architecture for testing digital circuits in hardware on a reconfigurable processing unit using compile-time reconfiguration.
I. INTRODUCTION
This paper introduces an architecture that targets circuit testing on a reconfigurable architecture. The proposed architecture consists of a soft-core fixed microprocessor and a Reconfigurable Processor Unit (RPU).
The former guarantees that soft and hard deadlines are met so that the intended system functionality is achieved, while the latter holds different functional blocks, referred to as Hardware Blocks (HBs). An HB can be any circuit; however, this paper targets an HB composed of a Circuit Under Test (CUT) and three compressors. The CUT and compressors are realized from [1]. The RPU holds a fixed number of HBs, but by exploiting Run-Time Reconfigurable (RTR) architectures, we can reconfigure the RPU with different HBs while the device is active [2]. RTR architectures can change the functionality of the programmable hardware while the system is executing the given flow. Nevertheless, a description of RTR is outside the scope of this paper. Additionally, the RPU allows several HBs to execute simultaneously, in either a static or a dynamic RPU.
Reconfigurable computer architectures, which use a programmable-hardware device to perform tasks that would execute more slowly on a traditional fixed-hardware processor, have been studied by Li et al. [3]. Systems employing such architectures have been implemented in [4]. These systems introduce RTR devices and put into practice the concept of an RPU as the programmable-hardware device that acts as a coprocessor for the main fixed-hardware processor.
Our ongoing project, Embedded Research Architecture for Co-design Environment (ERACE), by the Group for Embedded MicroSystems (GEMS) research lab at the University of Ottawa, has deployed two MicroBlazes. One MicroBlaze acts as the main microprocessor of the whole system and is referred to as µB_A, which stands for MicroBlaze Application. The other MicroBlaze acts as a co-microprocessor and is referred to as µB_R, which stands for MicroBlaze Reconfiguration. Figure 1 shows the final system architecture and where the RPU sits with respect to the other modules. Section II of this paper examines the reconfigurable digital circuit testing architectures already developed. The RPU architecture is presented in Section III, and the paper is concluded in Section IV. For the system described herein, a Field Programmable Gate Array (FPGA) functions as the programmable-hardware device, and, for the current design, a soft-core microprocessor (Xilinx [5] MicroBlaze) is placed on the FPGA to act as the fixed-hardware processor. The demonstration board is a Xilinx Multimedia Board equipped with an XC2V2000-FF896-4 FPGA, courtesy of the Canadian Microelectronics Corporation (CMC).
II. BACKGROUND
In this section we investigate other hardware approaches to digital circuit testing, as well as other co-processor system architectures that have used reconfigurable computing to extend the processor's functionality and achieve better performance.
We concentrate our examination on the former, digital circuit testing.
The Reconfigurable Co-Processor (RCoP) is based on the interconnection of several Xilinx 4000 series FPGAs [6]. Systems based on these chips suffer from two disadvantages: reconfiguration is slow, and data transfer has to be done via the I/O pins of the FPGA [7]. Furthermore, the communication delays introduced to achieve efficient parallelism among the XC4000 FPGAs incur significant overhead. In contrast, the system presented here is implemented entirely on a single Xilinx Virtex-II chip, a partially reconfigurable FPGA that is optimized for high speed and covers a large range of densities, up to 10 million system gates [8]. In addition, the RCoP's hardware instruction calls are detected at run time, during execution of the program code, whereas in our architecture this is done at compile time. A compile-time approach enhances parallelism because it predicts the reconfiguration of the modules required by a particular program before they are invoked in the application flow.
The RECON system consists of a SUN SparcStation host and a reconfigurable co-processor board [8]. The board uses an XC4010 FPGA as the reconfigurable processor unit and an XC2064 FPGA as the co-processor controller. Like the RCoP architecture mentioned earlier, this system inherits the drawbacks of the XC4000 series FPGAs. In addition, the use of a single reconfigurable FPGA with a highly restricted gate density limits the programmability of the co-processor. Moreover, the system clearly lacks parallelism, as partial RTR is unsupported on the XC4000. Conversely, in our system, maximum parallelism is exploited: the HB methodology, together with the partial RTR capabilities of the Virtex-II FPGA, enables the co-processor to execute several independent tasks in parallel on a single chip.
Earlier literature has shown fault-injection testing to be extremely effective once the circuit has been realized. That was the classical approach, however; several authors have since demonstrated the advantages of fault-injection testing at an early stage in the design flow, primarily at the entry level. In [20] we see that this can be done at a very early stage (i.e., directly in the Hardware Description Language (HDL) modules), thus allowing for test verification during the functional simulation stages. Although functional simulation is very useful to the whole design cycle, simulation time grows dramatically with the number of injected faults, so the approach lacks temporal scalability [21]. Going further, several authors investigated the idea of using Field Programmable Gate Arrays (FPGAs) as the test prototype medium, where speedups can be achieved through hardware fault injection. This approach provides the testing stage with the physical, real-time inputs and outputs of the digital circuits [22] [23]. Therefore, we decided to perform our digital circuit testing using Built-In Self-Test (BIST) techniques on an FPGA platform.
BIST techniques have been verified, implemented, and tested on many benchmarks. These benchmarks were established at the International Symposium on Circuits and Systems (ISCAS) (e.g., c17, c432, etc.). The three categories of benchmarks are combinational, sequential, and core (combinational plus sequential) benchmark circuits. These three benchmark sets were established in ISCAS'85, ISCAS'89, and ISCAS'99, respectively, and are referred to as the ISCAS'85, ISCAS'89, and ISCAS'99 benchmarks. The scope of this work covers the first of these, the ISCAS'85 benchmarks (combinational benchmark circuits), such as c17 and c432. Each benchmark has a known number of tests, detected faults, and undetected faults for each test pattern. The test patterns can be generated using one of the following three methods: Deterministic Test Pattern Generator (DTPG), pseudo-Random Test Pattern Generator (RTPG), or Exhaustive Test Pattern Generator (ETPG). DTPG uses a known test set. RTPG patterns can be generated using any pseudo-random technique, as implemented in [24], where the authors stated that "highly efficient testing techniques that ensure correct system performance has thus assumed significant importance" [24]; one of the techniques used to achieve such efficient testing was taken from [25]. ETPG uses all the input combinations of the truth table as the test set.
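The three pattern sources can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the pseudo-random generator is a seeded software PRNG standing in for the technique of [24] and [25] (which is not reproduced here), and the deterministic set is the four c17 patterns quoted later in this paper.

```python
import itertools
import random

def exhaustive_patterns(num_inputs):
    """ETPG: every row of the truth table, as tuples of 0/1."""
    return list(itertools.product((0, 1), repeat=num_inputs))

def pseudo_random_patterns(num_inputs, count, seed=1):
    """RTPG stand-in: a seeded software PRNG (assumption, not the method of [24])."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(num_inputs))
            for _ in range(count)]

def deterministic_patterns(bit_strings):
    """DTPG: a known test set supplied as bit strings."""
    return [tuple(int(bit) for bit in s) for s in bit_strings]

# The c17 test patterns in this paper are five bits wide.
etpg = exhaustive_patterns(5)
rtpg = pseudo_random_patterns(5, count=8)
dtpg = deterministic_patterns(["01111", "11010", "10000", "10101"])
print(len(etpg), len(rtpg), len(dtpg))
```

The exhaustive set grows as 2^n with the number of inputs, which is why it is practical only for the smallest circuits.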
BIST techniques can be tested in either software or hardware. The former has been widely and successfully used as the de facto first step in testing BIST techniques. The software results obtained for this work, against which the hardware results are compared, were obtained with the following software simulators: FSIM, ATALANTA, and HOPE. FSIM is used for combinational RTPG testing. ATALANTA covers both combinational DTPG and ETPG testing. HOPE is used for sequential testing; however, sequential testing is not covered in this work. To achieve better performance (i.e., a shorter processing time to detect all faults), a hardware system is exploited, more precisely an FPGA (as stated above).
Hardware generally outperforms software in execution time, although in some cases the overhead of a hardware implementation can offset this advantage; the results show that this is not the case for this application. Other hardware implementations, such as [15]-[19], have been developed and tested to demonstrate the speedup achievable in hardware.
In [15], faults are injected at the inputs and outputs of the Look-Up Tables (LUTs), and these fault injections are partially reconfigured via a "Boundary Scan" JTAG (Joint Test Action Group) download (hence off-board communication). In this work, we address the first limitation by injecting faults at the inputs and outputs of every gate, which is a finer-grained fault-injection approach. We address the second by realizing all fault-injection selections on-chip, with no off-board communication through JTAG. The result is a fully on-chip BIST system in which all modules are on-chip and execute autonomously.
In [16]-[19], RTR (Run-Time Reconfiguration) is utilized to manipulate the fault injections through small-bit manipulation, one of the two flows for partial reconfiguration [26]. Stuck-at faults are injected into the LUTs by changing the values of the LUT inputs and outputs, using small-bit manipulation, without compromising the execution of the rest of the circuit. The primary downside of this method is the limited accuracy of the test. The testing strategy injects faults (stuck-at-0 or stuck-at-1) only at the inputs or outputs of the CUT (Circuit Under Test), thus restricting the range of hidden and non-hidden faults that can be discovered. As Figure 2 shows, if only the inputs or outputs of the CUT are injected with faults, the gates and lines in between (between the input and output stages) are not. Hence, with reference to Figure 2, gates 10 and 11 and lines 8, 9, 12, and 13 are not injected with faults. Stuck-at-1 and stuck-at-0 faults are referred to as s-a-1 and s-a-0, respectively. Keeping Figure 2 as our reference, if we put an s-a-0 at input 2 the output of gate 10 will always be 1, however if line 12 is always s-a-1, by faults which may or may not cause a failure [27]. Many types of faults may or may not cause such failures: stuck-at faults, bridging faults, breaks and transistor stuck-on/-open faults in CMOS (Complementary Metal-Oxide Semiconductor), and delay faults. The scope of this work covers only the first of these, stuck-at faults. All of the aforementioned faults are logical faults, as opposed to non-logical faults, which include a malfunctioning clock signal, power failure, and so forth.
Beyond the reason given above, there is another incentive for using fault-injection methods other than LUTs at the inputs and outputs of the CUT. Figure 2 shows the smallest and least complex circuit of all the ISCAS'85, ISCAS'89, and ISCAS'99 benchmark circuits. Table 1 shows how dramatically the subsequent circuits grow. The data in the table were calculated from the original BENCH files provided by [28]. The number in each circuit's name is the sum of its gate count (which includes the inverters) and its line count. Clearly, the above fault-injection methods [15]-[19] can inject faults only at the inputs or outputs of the circuit, which is a colossal drawback for full CUT testing. Moreover, a lack of scalability is evident when the circuits are limited to a specific benchmark or a specific CUT, leaving room for improvement in those systems.
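To make the growth concrete, the following sketch tabulates the Table 1 values, checks that each circuit name equals gates plus lines, and counts the single stuck-at faults obtained when one s-a-0 and one s-a-1 are injected per line (two per line, as this paper does for c17). The dictionary and function names are illustrative only.

```python
# (gates including inverters, lines) for each circuit, copied from Table 1.
TABLE_1 = {
    "C17": (6, 11),
    "C432": (160, 272),
    "C499": (202, 297),
    "C880": (383, 497),
    "C1355": (546, 809),
    "C1908": (880, 1028),
    "C2670": (1193, 1477),
    "C3540": (1669, 1871),
    "C5315": (2307, 3008),
    "C6288": (2416, 3872),
    "C7552": (3512, 4040),
}

def stuck_at_fault_count(lines):
    """One stuck-at-0 and one stuck-at-1 fault per line."""
    return 2 * lines

for name, (gates, lines) in TABLE_1.items():
    # The number in the circuit name is gates + lines.
    assert int(name[1:]) == gates + lines
    print(f"{name:6s} gates={gates:5d} lines={lines:5d} "
          f"stuck-at faults={stuck_at_fault_count(lines):5d}")
```

Under this counting, c17 needs 22 fault injections while C7552 needs 8080, which is the scale a fault-injection scheme has to handle to cover the whole benchmark set.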
If we look more closely at [16], we notice a vital detail that the authors overlooked. They state that CTR (Compile-Time Reconfiguration) can be realized by "several VHDL codes containing different faults and each of them can be executed separately" [16]. However, with the right fault-injection technique, such as the one in [24], which uses FIMs (Fault Injection Multiplexers) to inject stuck-at faults into any gate's input or output, a single configuration suffices. An FSM (Finite State Machine) was developed to inject stuck-at faults at every gate input and output of the CUT, thus allowing a finer discovery of hidden and non-hidden faults with one configuration.
The main objective of our research is to demonstrate the advantages of sequential CTR and parallel CTR and to illustrate the speed-ups associated with digital circuit testing using BIST for software versus sequential CTR and sequential CTR versus parallel CTR.
III. SYSTEM ARCHITECTURE
The proposed final architecture, shown in Figure 1, consists of an RPU attached to the OPB_R. The RPU design went through two stages: static designs and dynamic designs. The former was designed as a static coprocessor, while the latter utilizes RTR functionality in order to respond to changing environment inputs. The RPU allows for the simultaneous execution of multiple hardware functional units by exploiting a Just-In-Time (JIT) compiler to maintain synchronization between the hardware and software flows. Descriptions of the JIT compiler and of RTR are outside the scope of this paper.

Table 1
Circuit   Gates*   Lines
C17            6      11
C432         160     272
C499         202     297
C880         383     497
C1355        546     809
C1908        880    1028
C2670       1193    1477
C3540       1669    1871
C5315       2307    3008
C6288       2416    3872
C7552       3512    4040
* These include inverters

Figure 1 shows the RPU on the OPB_R (On-chip Peripheral Bus) with Fast Simplex Links (FSLs) to and from both µB_A and µB_R. These FSLs allow the MicroBlazes to communicate with the appropriate HBs. Figure 3 shows an advanced high-level design. We introduce OPB-to-OPB bridging to allow the MicroBlaze to communicate through OPB_R. We also propose the addition of local memory (LM) to OPB_RPU to allow communication between HBs. This is the system architecture we use to test digital circuits. Each HB encompasses the CUT plus the BIST technique used. Figure 4 shows the HB architecture and all of its inner modules. The BIST technique is encompassed within the CUT and the FSM. We introduce a stuck-at-0 and a stuck-at-1 fault for each line in the circuit; hence, in Figure 2 (the C17 benchmark circuit), we introduce 22 stuck-at faults (11 stuck-at-0 and 11 stuck-at-1). These stuck-at faults are injected using the Fault-Injection Multiplexer (FIM) technique. A hardware fault-injection technique is needed in order to iteratively inject faults into every mutually exclusive wire and to test both the stuck-at-0 and the stuck-at-1 faults. To that end, a scheme was adopted from [24], as shown in Figure 5.
As Figure 5 shows, every mutually exclusive wire now has a FIM inserted in it, which allows us either to run the wire as is or to inject a stuck-at-0 or stuck-at-1 fault. If the select signals of a multiplexer are at "00", the wire runs as is. If they are at "01", a stuck-at-1 fault is injected, indicated by the logical 1 value coming into the multiplexer. If they are at "10", a stuck-at-0 fault is injected, indicated by the logical 0 value coming into the multiplexer. Finally, if they are at "11", the wire again operates normally.
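The select encoding described above can be modeled behaviorally as follows. This is a software sketch of the multiplexer's truth table, not the hardware description used in the design; the function name `fim` and the string representation of the select lines are assumptions made for illustration.

```python
def fim(wire_value, select):
    """Behavioral model of one Fault-Injection Multiplexer.

    select "00" or "11": pass the wire through unchanged
    select "01":         inject stuck-at-1
    select "10":         inject stuck-at-0
    """
    if select == "01":
        return 1
    if select == "10":
        return 0
    if select in ("00", "11"):
        return wire_value
    raise ValueError(f"invalid select value: {select!r}")

# Drive a wire carrying logic 0 through each select setting.
for sel in ("00", "01", "10", "11"):
    print(sel, fim(0, sel))
```

Because "00" and "11" both pass the wire through, a single FIM never forces a fault by accident when both select bits toggle together.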
The flow of the FSM for executing the HB is shown in Figure 6. The flow starts by injecting a stuck-at fault on one of the lines and then accepts the test pattern input. From the test pattern input we obtain the real output (i.e., the CUT's output without any fault injection) and the output of the fault-injected CUT. We then use three different compressors to see whether any of them makes a significant difference, for future research. At the comparator stage, we detect whether there are any discrepancies between the real outputs and the outputs of the fault-injected CUT. If a fault is detected, we increment a counter for the respective compressor, and we keep track of the fault-causing injections and their respective test patterns. We then repeat the above steps for a different input until all the inputs have been executed, and then inject a stuck-at fault on a different line. Testing is complete once every combination of fault and test pattern has been run.
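The loop structure of this flow can be sketched in software. The sketch below rests on several assumptions: it uses the commonly distributed c17 netlist of six NAND gates, whose net numbering may differ from Figure 2; it maps pattern bits to inputs 1, 2, 3, 6, 7 in that order; and it compares raw outputs directly rather than passing them through the three compressors. All names are illustrative.

```python
def nand(a, b):
    return 1 - (a & b)

# Commonly distributed c17 netlist (assumed; numbering may differ from Figure 2).
INPUTS = ("1", "2", "3", "6", "7")
GATES = (  # (output net, input net a, input net b), in topological order
    ("10", "1", "3"),
    ("11", "3", "6"),
    ("16", "2", "11"),
    ("19", "11", "7"),
    ("22", "10", "16"),
    ("23", "16", "19"),
)
OUTPUTS = ("22", "23")
LINES = INPUTS + tuple(gate[0] for gate in GATES)  # 11 lines

def simulate(pattern, fault=None):
    """Evaluate the circuit; fault is None or a (line, stuck_value) pair."""
    def drive(line, value):
        # Models the FIM on each line: override the value if this line is faulted.
        if fault is not None and fault[0] == line:
            return fault[1]
        return value

    nets = {}
    for name, bit in zip(INPUTS, pattern):
        nets[name] = drive(name, bit)
    for out, a, b in GATES:
        nets[out] = drive(out, nand(nets[a], nets[b]))
    return tuple(nets[o] for o in OUTPUTS)

def run_test(patterns):
    """For each line and stuck value, apply every pattern and compare outputs."""
    detected = {}  # (line, stuck_value) -> list of detecting patterns
    steps = 0
    for line in LINES:
        for stuck in (0, 1):
            for pattern in patterns:
                steps += 1
                if simulate(pattern, (line, stuck)) != simulate(pattern):
                    detected.setdefault((line, stuck), []).append(pattern)
    return steps, detected

patterns = [tuple(int(b) for b in s)
            for s in ("01111", "11010", "10000", "10101")]
steps, detected = run_test(patterns)
print(steps, "fault/pattern pairs;", len(detected), "faults detected")
```

With 11 lines, two stuck values per line, and four patterns, the loop visits 88 fault/pattern pairs.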
IV. CONCLUSION AND FUTURE WORK
Since we are testing only combinational circuits, we can calculate the expected execution time to test one CUT. To compare our expected hardware results with software results, we executed c17 on the ATALANTA simulator with 4 test inputs (deterministic testing): 01111, 11010, 10000, and 10101. The simulator needed a total of 17 ms to finish execution. Our hardware needs about 88 Clock Cycles (CC), which translates to 1.76 µs with a 50 MHz clock (a 20 ns period), a speedup of approximately 1 × 10^4. This is very promising, and we can expand our research further by implementing parallel fault injections.
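The arithmetic behind this estimate can be checked with a short calculation. The constant names are illustrative. The 88-cycle figure is consistent with one clock cycle per fault/pattern pair (22 stuck-at faults times 4 test patterns), although that reading is an assumption rather than something stated explicitly.

```python
CLOCK_HZ = 50e6           # 50 MHz clock, i.e. a 20 ns period
CYCLES = 88               # clock cycles reported for testing c17 in hardware
SOFTWARE_SECONDS = 17e-3  # ATALANTA run time reported for c17

hardware_seconds = CYCLES / CLOCK_HZ
speedup = SOFTWARE_SECONDS / hardware_seconds

print(f"hardware time: {hardware_seconds * 1e6:.2f} us")
print(f"speedup: {speedup:.0f}x")
```

At a 20 ns period, 88 cycles take 1.76 µs, and 17 ms divided by 1.76 µs is roughly 9.7 × 10^3, which rounds to the order of 10^4.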
