Abstract-This paper introduces a novel architecture that is targeted for digital core testing and built-in self-test (BIST) algorithm development. This reconfigurable architecture is validated by an application that implements the novel idea of verifying algorithms for testing digital circuits by using runtime reconfigurable techniques in order to minimize the circuit area, as well as the test generation and application time. The idea revolves around the dynamic partial reconfiguration of circuits under test in order to inject stuck-at faults at different locations of the circuit and uncover both detectable and undetectable faults. Four testing strategies are presented, and two are experimentally compared, namely, the sequential compile-time reconfiguration and runtime reconfiguration strategies.
I. INTRODUCTION

THE EMERGENCE of reconfigurable computing (RC) has given rise to a new paradigm for a flexible solution to today's processing bottleneck: Either a generic processor is used for all applications, including specialized ones, or an application-specific processor is used for all applications, including generic ones. RC provides the combination of the microprocessor that handles all real-time deadlines and a reconfigurable coprocessor, allowing for the parallel execution of multiple hardware functional units. RC dramatically lowers design fabrication and verification times and produces significant speed gains because instructions are no longer fetched, decoded, and executed, since a parallel execution of "instruction streams" is allowed. RC also eases field upgrades and ports to newer process technologies [1] and increases fault tolerance, as faults can be isolated and switched out on location.
Manuscript received June 30, 2006; revised March 15, 2007. V. Groza
The increasing complexity of deep submicrometer integrated circuits has increased the probability of fault occurrences in digital systems. These systems, which are designed to operate in the field for an extended period of time, require more efficient methods of testing to ensure their reliability. Due to the ever-increasing integration of core-based designs, it is imperative to find highly efficient testing techniques that ensure correct system performance [2]-[7]. Verification appears in many phases during the design flow. However, even after the design has been synthesized, placed, and routed, we would like to know whether the realization is fault free. Introducing stuck-at-0 and stuck-at-1 faults will inform us of detectable and undetectable, or hidden, faults. The former is a measure of how well the circuit responds to the injection of faults, whereas the latter is a measure of how reliable the circuit is, since a hidden fault can give the impression that the circuit is correctly functioning, although in reality, it is not. These fault-injection techniques for testing digital systems have been shown to be very efficient in studying circuit behavior when faults occur. This is done by applying test patterns generated by a test pattern generator (TPG) to the circuit under test (CUT) and comparing the responses to known correct responses in order to determine the transparent interconnects, which do not have an effect on the output and, hence, are labeled as hidden faults. It is imperative to locate these hidden faults within a digital circuit in order to correctly identify working circuits and/or chips. The preceding test procedure works well for reasonably sized circuits; however, for more complex circuits, due to the large storage requirements of fault-free responses, the test procedure becomes rather expensive and nonscalable.
Alternative approaches are sought to minimize the amount of required storage. Moreover, the amount of time required to test a large circuit increases with each new interconnect wire. Although it is obvious that the ideal massively parallel hardware implementation involves simultaneously testing all of the possible CUTs in one clock cycle (CC), realistically, that is not a very efficient use of the hardware logical resources. We need, and will introduce in this paper, sequential hardware implementations, partially parallel hardware implementations, and a hybrid of both. In this paper, however, experimental results will be presented and discussed for the two sequential approaches only.
This paper describes the ongoing Embedded Research Architecture for Co-design Environment (ERACE) project by the Group for Embedded MicroSystems research laboratory at the University of Ottawa. The project introduces four competing realizations of an algorithm that is used to catch both detectable and hidden faults within a CUT. The hardware implementations of the design test architecture and verification environment are suitable for built-in self-testing (BIST) of digital cores, build on the work presented in [2] and [8], and extend the work presented in [9].
This paper is organized as follows. Section II provides an overview of the verification scheme and previous techniques that utilize the field-programmable gate array (FPGA) as the test prototype. Section III outlines the four different realizations of the proposed testing strategy: the first purely in software and the remaining three, with varying degrees of integration, in hardware. Section IV describes the architecture of the underlying system that will execute the four methods of the testing strategy. Section V outlines the reconfigurable processing unit (RPU) that is used to execute the hardware test implementations.
Section VI discusses the results that are achieved by the experiments for two of the realizations, whereas Section VII concludes the paper by discussing the bases for future work.
II. BACKGROUND
Reconfigurable architectures such as FPGAs are ideal devices for prototyping test schemes using either hardware description languages (HDLs), such as the very high speed integrated circuit HDL (VHDL) and Verilog, or low-level configuration application programming interfaces (APIs), such as JBits [10]. Today's advanced FPGA architectures (e.g., the Virtex-II family from Xilinx [11]) allow a user to atomically reconfigure, at runtime, a frame of logical and routing resources without affecting the surrounding frames. This runtime reconfiguration (RTR) allows designers to streamline their testing strategy through on-chip emulation before any manufacturing and production runs occur on the particular digital circuit. RTR involves the direct manipulation of the available hardware resources at runtime in order to respond to the surrounding requirements that are placed upon the system. It allows for the time sharing of different tasks (hence, temporal partitioning) and, thus, allows for the minimization of the required silicon area, the birth of the virtual hardware concept, the cycle-by-cycle context switching, the postfabrication adaptation to new standards, features, implementations, or bug fixes and field updates, the acceleration of the application through hardware realizations of software-intensive loops, and the true multitasking of applications and algorithms [12].
In our previous work [8], we used a development framework for Xilinx FPGAs based on the Java language, called the JBits SDK. The latter is an API that provides low-level access to the configuration of the resources in a Xilinx FPGA. The most popular Xilinx FPGAs are static random access memory based and can be configured numerous times. JBits [13], [14] provides a solution to support RTR. It provides the ability to rapidly create and modify Xilinx Virtex and XC4000 FPGA circuitry at runtime by allowing direct access into the configuration bit stream. The JBits API is an ideal design environment for implementing logic circuits on mainstream FPGAs, especially when the hardware is not present or a proof-of-concept is initially required. Our published results indicate that it is possible to ensure that a design is functioning correctly in hardware by performing a design for testability based on the BIST approach. This paper extends that approach by realizing the digital testing and circuit verification in a hardware-emulated environment.
Classically, fault-injection testing has been shown to be very effective when the circuit has already been realized; however, several authors recently showed the benefit of fault-injection testing earlier in the digital design flow, mainly at the design entry level. This can be done at very early stages (i.e., directly in the VHDL or Verilog models) [15], which then allows for test verification during the functional simulation stages. The latter technique is very useful; however, it lacks temporal scalability, as the amount of simulation time required to process a complex circuit increases dramatically with the number of injected faults [16]. Going further, authors began investigating the use of FPGAs as the test prototype medium, thus simultaneously providing the design and test verification of the digital circuit, with physical and real-time inputs and propagation delays [17], [18]. This method requires the user to fully synthesize, place, route, and map the circuit onto the particular FPGA, a process usually requiring dozens of minutes for the latest device families (i.e., Xilinx Virtex and Virtex-II families).
More recently, researchers have begun the exploration of integrating RTR into synthesized circuitry in the hopes of modifying, at runtime, the internal structure of the circuit by applying the stuck-at fault model. This reconfiguration is very low in granularity and can be performed using either module-based partial reconfiguration, difference-based partial reconfiguration [19], or the aforementioned JBits API. Leveugle et al. [20] use JBits in order to perform their small-bit manipulation by changing the values of lookup table (LUT) inputs and outputs. Stuck-at faults are injected by setting, at runtime, one of these inputs or outputs to 1 or 0, without modifying the operation of the rest of the circuit. The main drawbacks of the presented method are test verification integrity and RTR timing feasibility. The cause of the former is the technique's testing strategy: Only the inputs or outputs of a CUT can be injected with faults, hence limiting the test spectrum of the circuit to a few injections. The cause of the latter is the lack of any valid comparisons between RTR and its functional equivalent known as compile-time reconfiguration (CTR), which was referenced earlier in this paper as single-context reconfiguration [21]. In order to solidify the case toward RTR as a test verification paradigm, it becomes imperative to include the reconfiguration temporal overhead (i.e., the time and area that it takes to inject a fault at runtime) in calculating the figures of merit (e.g., total test time, total mapped area, total routing resources, and total logical resources) and to compare those to the ones that were obtained using static design and test methods.
The main goal of the undertaken project is to validate RTR as a viable paradigm of design and test verification by studying the software, the sequential CTR, the partially parallel CTR, and the sequential RTR strategies of the fault-injection model.
The following analysis and discussion show the feasibility of RTR test verification in terms of computational speed and circuit area.
III. TEST STRATEGIES
As mentioned above, four different strategies will be explored, in increasing order of hardware integration. The first is a software realization of the fault model, where the CUT and set of aliasing-free compressors [22] are simulated in software in order to extract all the faults of that particular CUT. An aliasing-free compressor is one that does not compromise the CUT's fault-detecting capabilities. Thus, all faults that can be detected at the outputs of the CUT can also be detected at the outputs of the CUT and compressor combined. The compressors are used to reduce the number of system outputs and, thus, allow for a smaller storage area. This strategy is a common method used to prototype fault models, without paying too much attention to real-time interactions or results. This will provide us with our software simulation benchmark results.
The second strategy is a sequential CTR realization of the fault model (Fig. 1) , where the CUT and compressors are synthesized and mapped to the target device, with fault-injection multiplexers (FIMs) (discussed in Section III-B) built into the CUT. This allows us to sequentially perform test verification on the circuit by applying the test patterns and recording the circuit's behavior for each applied test pattern and injected fault. This strategy is equivalent to the contemporary ones used to test circuits after the synthesis and mapping steps. This will provide us with our sequential hardware benchmark results.
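The sequential procedure described above (apply every test pattern for every injected fault, then compare against the fault-free responses) can be illustrated with a minimal software sketch. The two-gate circuit, the net names, and the helper functions below are hypothetical and chosen only because the circuit contains redundant logic; they are not one of the benchmark circuits used in this paper, and the compressors are omitted.

```python
from itertools import product

# Hypothetical toy CUT with redundant logic: w = a AND b, y = a OR w (so y == a).
NETS = ("a", "b", "w", "y")


def simulate(a, b, fault=None):
    """Evaluate the toy CUT; `fault` is (net_name, stuck_value) or None."""
    def net(name, value):
        # Force the net to its stuck value if it is the faulty one.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a = net("a", a)
    b = net("b", b)
    w = net("w", a & b)
    return net("y", a | w)


def classify_faults():
    """Inject each single stuck-at fault in turn and apply all patterns."""
    patterns = list(product((0, 1), repeat=2))
    golden = [simulate(a, b) for a, b in patterns]  # fault-free responses
    detectable, hidden = [], []
    for name in NETS:
        for stuck in (0, 1):
            responses = [simulate(a, b, (name, stuck)) for a, b in patterns]
            if responses != golden:
                detectable.append((name, stuck))
            else:
                hidden.append((name, stuck))
    return detectable, hidden


detectable, hidden = classify_faults()
print("detectable:", detectable)
print("hidden:", hidden)
```

In this toy circuit, the faults on b and the stuck-at-0 fault on w never change the output, which is the behavior that the text labels as hidden faults.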
The third strategy is a partially parallel CTR realization of the fault model, where n * m * 2 CUTs are synthesized and mapped to the target device, with one particular fault injection applied to each CUT. Hence, the test patterns are applied to all the CUTs simultaneously, providing us with the circuit's behavior for each applied test pattern only. Note that in a fully parallel strategy, n * m * l * 2 CUTs are synthesized and mapped, where
n is the number of wires in the CUT;
m is the number of test patterns used;
l is the number of available test patterns;
2 is the number of stuck-at faults to inject.
Conversely, in a partially parallel strategy, only n * m * 2 CUTs are required. Hence, if realizing a fully parallel strategy, one CC is required to generate the circuit's behavior for all applications of test patterns and injected faults. The fully parallel strategy was not performed due to the unrealistic abuse of logical resources, as well as the fact that not all the test patterns are known at compile time, specifically in the pseudorandom test pattern generation (TPG) stage. The partially parallel strategy is similar, in implementation, to the previous one; however, the FSM does not inject any faults into the CUT, as they are already synthesized into the circuit. This will provide us with our parallel hardware benchmark results.
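To give a feel for how quickly the replicated hardware grows, the two expressions above can be evaluated directly. The function name and the example values of n, m, and l below are illustrative assumptions, not measurements from this paper.

```python
def cut_counts(n, m, l):
    """Return (partially_parallel, fully_parallel) CUT counts.

    n: wires in the CUT, m: test patterns used, l: available test patterns.
    The factor 2 covers the stuck-at-0 and stuck-at-1 faults.
    """
    partially_parallel = n * m * 2
    fully_parallel = n * m * l * 2
    return partially_parallel, fully_parallel


# Hypothetical example: 11 wires, 8 patterns used, 32 patterns available.
print(cut_counts(11, 8, 32))
```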
The fourth and final strategy is a sequential RTR realization of the fault model, where the CUT is synthesized and mapped to the target device, with neither FIMs nor synthesized particular fault injections. Instead, we utilize the RTR technique to insert stuck-at faults at different wires of the CUT and record the circuit's behavior accordingly. This strategy is similar, in implementation, to the second strategy; however, the CUT itself is reconfigured at runtime and, hence, is free of any space-consuming FIMs. This will provide us with our sequential RTR benchmark results.
A. Fault-Injection Techniques
In this paper, two fault-injection techniques are utilized. The first involves an FIM and is used in the second and third aforementioned test strategies, whereas the second one involves the RTR of LUT values and is used in the fourth aforementioned test strategy. 
B. FIM
The hardware fault injection technique is imperative in order to iteratively inject faults to every mutually exclusive wire and to test both the stuck-at-0 and stuck-at-1 faults. For that matter, a plan was devised based on Fig. 2 [8] .
As we can see in Fig. 2, every mutually exclusive wire now has an FIM introduced within it, which allows us to either run the wire as is or inject either a stuck-at-0 or a stuck-at-1 fault. If the select signals of a multiplexer are at "00," then the wire runs as is. If they are at "01," then a stuck-at-1 fault is injected, as indicated by the logical 1 value coming into the multiplexer, and if they are at "10," then a stuck-at-0 fault is injected, as indicated by the logical 0 value coming into the multiplexer. Finally, if the select signals are at "11," then we again assume normal operation of the wire.
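The select-signal encoding described above can be captured in a short behavioral model. This is only a software sketch of the multiplexer's truth table; the function name and the use of a two-character string for the select signals are assumptions made for illustration.

```python
def fim(wire_value, select):
    """Behavioral model of a fault-injection multiplexer (FIM).

    select "00" or "11": pass the wire through unchanged.
    select "01": inject stuck-at-1.  select "10": inject stuck-at-0.
    """
    if select == "01":
        return 1
    if select == "10":
        return 0
    if select in ("00", "11"):
        return wire_value
    raise ValueError("select must be a 2-bit string")


for sel in ("00", "01", "10", "11"):
    print(sel, [fim(v, sel) for v in (0, 1)])
```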
C. Fault-Injection LUT
Fault injection using LUTs is one of the main features of using RTR for fault injections, as different faults can be emulated, for the same CUT, without the need to recompile or download the CUT.
Faults can be injected at the inputs of the LUT, as shown in Fig. 3. In this figure, the truth table of a four-input LUT is shown in Fig. 3(a). If a stuck-at-1 fault is to be injected on the F3 input of the LUT, then the output value of F4F3F2F1 = 0100 must be written to the output value of F4F3F2F1 = 0000, the value for F4F3F2F1 = 0101 to F4F3F2F1 = 0001, etc. Fig. 3(b) shows how the LUT has to be modified to resemble such a fault. The modified outputs are in bold.
The same can be applied for stuck-at-0 faults (except now, the output value of F4F3F2F1 = 0000 must be written to the output value of F4F3F2F1 = 0100, etc.) and to other inputs. Note that the output column is the actual reconfiguration values of LUT.
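The rewriting rule for input faults can be sketched as follows. The sketch assumes that the LUT is held as a list of 16 output bits indexed by the address F4F3F2F1 with F1 as the least significant bit; that ordering, the function name, and the example LUT contents are assumptions for illustration and do not describe the actual bit stream layout.

```python
def inject_input_stuck_at(lut, input_index, stuck_value):
    """Return the LUT contents that emulate a stuck-at fault on one input.

    lut: 16 output bits indexed by address F4F3F2F1 (F1 assumed to be the LSB).
    input_index: 1..4 for F1..F4.  stuck_value: 0 or 1.
    """
    mask = 1 << (input_index - 1)
    faulty = []
    for addr in range(16):
        # Every address reads the entry it would see with the input forced.
        source = (addr | mask) if stuck_value else (addr & ~mask)
        faulty.append(lut[source])
    return faulty


# Arbitrary example contents (hypothetical function).
lut = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
sa1_f3 = inject_input_stuck_at(lut, 3, 1)
print(sa1_f3)
```

With a stuck-at-1 on F3, entry 0000 takes the value of entry 0100 and entry 0001 takes the value of entry 0101, as described in the text.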
Similarly, to inject a stuck-at-1 or stuck-at-0 fault at the output of an LUT, all 16 bit positions of the LUT reconfiguration vector must be programmed to 1 or 0, respectively. However, to inject a stuck-at fault within the LUT contents, a functionally driven method, instead of a line-driven one as in the previous two methods, is required. The number of simulated faults in this method is identical to the number of possible combinations of LUT active inputs, i.e., a fault can be injected by inverting only the LUT position that corresponds to one input combination. The total coverage of faults in this method corresponds to the exhaustive LUT functionality test [17].
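The output and content faults admit an equally small sketch, again assuming a 16-entry list representation of the LUT vector; the function names are hypothetical.

```python
def inject_output_stuck_at(stuck_value):
    """Output fault: every position of the 16-bit vector holds the stuck value."""
    return [stuck_value] * 16


def inject_content_fault(lut, addr):
    """Content fault: invert only the position for one input combination."""
    faulty = list(lut)
    faulty[addr] ^= 1
    return faulty


lut = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]  # arbitrary example
print(inject_output_stuck_at(1))
print(inject_content_fault(lut, 0b0101))
```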
Xilinx Virtex-II FPGAs support only four-input/one-output LUT primitives. Those four-input LUTs can be used to implement functions of 0, 1, 2, 3, and 4 inputs, respectively. Multiple LUTs can be cascaded together to form one function. LUTs are represented by 16-bit vectors. However, not all of these bits are used when implementing functions with less than four inputs. For example, a three-input function can be implemented by one LUT. However, only 8 bits are used to represent the function within the 16-bit LUT vector.
It is important to identify the number of inputs relevant for each LUT used in order to include the corresponding faults in the fault list. Some researchers [17] suggested the use of the bit stream file generated by Xilinx tools to extract such information. This can be easily done by comparing the 8-bit vector values when the relevant input is either 0 or 1. If both vectors are identical, then the relevant input is not active. However, such information is no longer available for the Virtex-II family; nevertheless, the information can be extracted by reading back each LUT configuration vector during runtime using the internal configuration access port (ICAP) interface. Our approach extracts such information in advance by controlling the way the HDL code is synthesized and LUTs are instantiated.
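The comparison of the two 8-bit half-vectors can be sketched as follows, under the same assumed representation of the LUT as 16 bits indexed by F4F3F2F1 with F1 as the least significant bit. The function name and the example function are illustrative only.

```python
def is_input_active(lut, input_index):
    """An input is active if the 8-bit half-vectors for input = 0 and 1 differ."""
    mask = 1 << (input_index - 1)
    when_zero = [lut[a] for a in range(16) if not a & mask]
    when_one = [lut[a] for a in range(16) if a & mask]
    return when_zero != when_one


# Hypothetical three-input function F1 AND F2 AND F3 mapped to one LUT; F4 unused.
and3 = [1 if (a & 0b0111) == 0b0111 else 0 for a in range(16)]
active = [k for k in (1, 2, 3, 4) if is_input_active(and3, k)]
print(active)
```

Only the inputs reported as active would contribute entries to the fault list.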
IV. SYSTEM ARCHITECTURE
The realized system consists of two high-level components: the application flow (AF) and the reconfiguration flow (RF), as presented in Fig. 4 . The former is executed as a software program, with some of its parts being executed on hardware blocks (HBs) that are dynamically inserted and removed under RF's control to/from the RPU.
The hardware/software partitioning of AF is performed by a just-in-time (JIT) compiler [23] , which generates both the AF and RF, as well as ensures proper synchronization between the two flows. The system architecture includes the following components:
1) an application element (AE), consisting of a soft IP microprocessor that is used as the execution unit for the main system application;
2) a reconfiguration element (RE), containing a second soft IP microprocessor that is employed as the execution unit for the RTR of the RPU;
3) an RPU, which is used as the execution unit for the tasks to be mapped directly in hardware.
The respective operations of the AF and RF are shown in Fig. 5.
The ICAP [24] is the FPGA interface that allows for the dynamic partial reconfiguration of the RPU with the appropriate HBs under the control of the RF through the HWICAP controller. While the AF runs on the AE, the RF, which runs on the RE, sends the signal RTR_PREP_HB to the ICAP controller (HWICAP) to start the loading of the first HB bit stream (of length "Size," from the address "BRAM Offset" in OM_R) onto the RPU. Once that HB is ready within the RPU, the HWICAP sends back a Done signal to the RE. The newly placed HB begins execution as soon as it is enabled (EN). Upon completion, the HB sets the flag RTR_DONE to make the AF aware that it is ready for use. Once the AF has prepared the data that the HB needs for computation, it makes the HB aware of this by asserting the DATA_READY flag. The AF continues to run as long as it does not need the results computed by the HB. The latter asserts EXE_DONE when it completes its task and has prepared the results to be read by the AF. When the AF needs these results, it checks the EXE_DONE flag and blocks if it is not yet set. The AF gets the results and then asserts DATA_ACK to acknowledge to the HB that it has received the valid data. Once this HB is no longer needed, it can be replaced with another one. Furthermore, another HB can be loaded at a different location of the RPU as soon as the current uploading process is completed.
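The flag handshake between the AF and an HB can be modeled in software as a sketch. The model below uses Python threads and events in place of the hardware flags; the class, the function names, and the shared dictionary standing in for the data registers are all assumptions, and the bit stream loading by the RE and HWICAP is not modeled.

```python
import threading


class Flags:
    """Software stand-ins for the RTR_DONE, DATA_READY, EXE_DONE, DATA_ACK flags."""
    def __init__(self):
        self.rtr_done = threading.Event()
        self.data_ready = threading.Event()
        self.exe_done = threading.Event()
        self.data_ack = threading.Event()


def hardware_block(flags, mailbox, compute):
    flags.rtr_done.set()        # HB is configured and ready for use
    flags.data_ready.wait()     # wait until the AF has prepared the data
    mailbox["result"] = compute(mailbox["data"])
    flags.exe_done.set()        # results are ready to be read
    flags.data_ack.wait()       # AF confirms it has received valid data


def application_flow(flags, mailbox, data):
    flags.rtr_done.wait()       # HB must be in place before it is used
    mailbox["data"] = data
    flags.data_ready.set()
    # The AF may continue with unrelated work here.
    flags.exe_done.wait()       # block only when the results are needed
    result = mailbox["result"]
    flags.data_ack.set()
    return result


def run_once(data, compute):
    flags, mailbox = Flags(), {}
    hb = threading.Thread(target=hardware_block, args=(flags, mailbox, compute))
    hb.start()
    result = application_flow(flags, mailbox, data)
    hb.join()
    return result


print(run_once([1, 2, 3], sum))
```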
The implementation of this architecture employed several elements that improved the system's speed while observing the platform's technological constraints. Fig. 6 presents the detailed block diagram of the system. Micriµm's µC/OS-II [25] was incorporated as the real-time operating system for both the AE and RE microprocessors (µB_A and µB_R). The kernel and the main application code for each processor fit into that processor's local memory (LM). Several OPB slave peripherals were added: timer, interrupt controller (INTC), external memory controller (EMC), general-purpose input/output (GPIO), shared memory controller (CTL), UART, and SnoopP [26]. The latter is a core required to profile the AE at the hardware level.
The system's internal communication was improved by employing unidirectional fast simplex links (FSLs) [27] as intercommunication paths between the RPU and the AE and RE. Note that the Virtex-II that powers the Xilinx Multimedia board [28] contains 56 block RAMs (BRAMs), for a total of 112 kB of addressable on-chip RAM.
V. RPU
The final system architecture, as shown in Fig. 6, includes an RPU connected to OPB_R, which allows for the simultaneous execution of multiple hardware functional units by exploiting the JIT compiler in order to maintain hardware and software flow synchronization. The RPU executes those AF tasks that have been targeted for hardware by employing task-specific HBs. Since the JIT compiler generates the RF at compile time, µB_R can accurately schedule the appropriate HB bit streams, stored in OM_R, to be loaded onto the RPU through the ICAP controller. Each HB is assigned a preset ID when configured. Since this ID is set by the JIT compiler, both µB_R and µB_A have a reference to each HB for addressing purposes. Fig. 7 shows the RPU's high-level design. We introduced OPB_RPU in order to alleviate a bottleneck between bit stream loading into OM_R and data transfer from OM_A to the RPU. The dual-ported OM_A has one of its controllers residing on the OPB_RPU bus, which is a slave to one of the many HB masters of that bus. The FSL interfaces of the RPU are required to allow both µB_A and µB_R to read/write register values into the appropriate HBs. Furthermore, a local memory LM_RPU was added to provide for HB-to-HB communication.
A. RTR
In this section, we present the realization details of the fourth testing strategy (sequential RTR realization of the fault model), where the CUT is synthesized and mapped to the target device, with neither FIMs nor synthesized particular fault injections. Instead, we utilize the RTR technique to insert stuck-at faults at different wires of the CUT and record the circuit's behavior accordingly. This strategy is similar, in implementation, to the second strategy; however, the CUT itself is reconfigured at runtime and, hence, is free of any space-consuming FIMs. This will provide us with our RTR benchmark results.
For any successful partial reconfiguration design, a strict design methodology should be used. For module-based partial reconfiguration, practical guidelines are required in addition to those specified by the Xilinx Modular Design methodology in [19], [29], and [30].
1) Bus macros, as shown in Fig. 8, are required between active and static modules of the design.
2) The size and location of the reconfigurable module (active) is always fixed.
3) The reconfigurable module is always the full height of the device.
4) All logic resources that are located within the width of the module are considered part of the reconfigurable module's bit stream frame. This includes slices, tristate buffers, BRAMs, multipliers, input/output blocks, and all routing resources.
This paper follows the Xilinx guidelines for modular design, including the same directory structure for synthesizing and assembling the design. The basic difference, however, is the use of an embedded processor as the static module within the top level design.
A partial bit stream is generated for the active (dynamic) part of the FPGA. The -g ActiveReconfig:Yes option is required for partial reconfiguration, which means that the device remains in full operation while the new partial bit stream is downloaded. Omitting this option will assert the global set reset (GSR) during configuration, resetting the entire design. The full bit stream configuration must already be programmed into the device before downloading the partial bit stream. Multiple bit streams can be generated for every partially reconfigurable module variation. Fig. 9 demonstrates the FPGA layout, including the two softcore Microblaze [31] processors, the block RAMs, and the bus macros that allow for communication between the static and active blocks.
This work was performed using v7.1i of both the EDK and ISE tools. The prototyping platform was the Xilinx Multimedia Board, which carries an XC2V2000-FF896-4 Virtex-II FPGA. The design station was the IBM IntelliStation Z Pro, which possesses a 3.6-GHz Intel Xeon processor with 2 MB of L2 cache and 2 GB of system memory. See [32] for more details on our customized flow for RTR implementation utilizing the aforementioned prototyping platform.
B. RTR BIST Methodology
A methodology similar to the one described above, as well as the LUT manipulation described in [20], is used; however, in this paper, the active and static logic areas have adjacent boundaries and are separated by bus macros. As in any BIST circuit, the pattern generator and analyzer modules are used to stimulate and analyze the CUT, respectively, and will be located in the static logic area, connecting to the CUT only through bus macros. This logic separation will standardize the CUT interface logic and simplify testing multiple CUTs using the same static (embedded processor) configuration. The self-reconfigurable platform (SRP) [24], which is provided by Xilinx, is utilized in the development of a runtime reconfigurable BIST technique. The SRP consists of a softcore Microblaze processor and the aforementioned ICAP. The SRP API enables fine reconfiguration control down to the frame level (there are multiple frames per CLB column). Clear boundaries are drawn between the reconfigurable circuit and the rest of the FPGA. Fig. 10 demonstrates the architecture for constructing a runtime reconfigurable BIST methodology, which includes the following components:
1) the SRP, which consists of the ICAP core and its associated BRAM and Microblaze, as indicated in [24];
2) the pattern generator, which can be in any of the three following configurations: deterministic, pseudorandom, and exhaustive;
3) the active block, which contains the CUT or series of CUTs;
4) the analyzer, which consists of a module based on linear feedback shift registers.
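As a rough illustration of the pseudorandom pattern generator and the LFSR-based analyzer, the following sketch models a 4-bit linear feedback shift register and a serial signature register. The register width, the feedback polynomial x^4 + x^3 + 1, the seed, and the function names are assumptions chosen for brevity; the text does not specify the polynomials or widths used in the actual modules.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def feedback(state):
    """Feedback bit for the assumed polynomial x^4 + x^3 + 1 (taps at bits 3 and 2)."""
    return ((state >> 3) ^ (state >> 2)) & 1


def lfsr_patterns(seed=0b0001, count=15):
    """Pseudorandom pattern generator: successive nonzero LFSR states."""
    state = seed & MASK
    patterns = []
    for _ in range(count):
        patterns.append(state)
        state = ((state << 1) | feedback(state)) & MASK
    return patterns


def signature(response_bits, state=0):
    """Serial signature analyzer: fold a CUT response stream into WIDTH bits."""
    for bit in response_bits:
        state = ((state << 1) | (feedback(state) ^ bit)) & MASK
    return state


print(lfsr_patterns())
print(bin(signature([1, 0, 1, 1])))
```

Comparing the signature of a faulty response stream with the fault-free signature replaces the storage of every individual response, at the cost of possible aliasing for a general LFSR-based compactor.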
VI. RESULTS AND DISCUSSION
The two experimental setups were executed on the aforementioned Xilinx Multimedia Board using the ERACE architecture. We executed four different digital circuits: c17, c432, ec13, and ec37. The former two circuits (i.e., c17 and c432) are ISCAS'85 benchmark circuits. The latter two circuits (i.e., ec13 and ec37) are circuits used in the literature to give a variety of testing features, since the gap between c17 and c432 is large (with respect to the number of gates and the number of lines). The ec13 and ec37 circuits do not retain all of the aforementioned software performance measures. Nevertheless, these circuits illustrate the hardware circuits' modularity in executing different circuits.
A. Sequential CTR Results
The experimental results obtained from hardware sequential CTR testing of the aforementioned circuits, using deterministic and pseudorandom TPGs, are presented in Table I. Since hardware is best measured with CCs, we have employed a clock counter to keep track of the number of CCs needed to complete testing. The CC count and its equivalent time at the clock speed used (50 MHz) are presented.
Our experimental results have shown that sequential CTR is 1000-5000 times faster, as expected, than its software equivalent, using two widely used fault simulation tools: ATALANTA and FSIM. Note that our methodology augments software simulations by providing a localization table, which identifies the signal (wire) where the fault occurred and whether an s-a-0 (stuck-at-0) or s-a-1 (stuck-at-1) fault caused the detection.
B. Sequential RTR Results
In the case of the sequential RTR testing strategy, we have extracted the reconfiguration overhead timings. Using the difference-based partial reconfiguration technique [19], it was found that in order to reconfigure a single LUT, two steps are required: a readback and then a reconfigure. The first was performed in 3862 CCs, whereas the latter was performed in 4132 CCs, translating to a 77.24-µs delay and an 82.64-µs delay, respectively, given a 50-MHz global clock frequency. In total, it takes about 160 µs to change the contents of an LUT. Note that for the FPGA in use (Virtex-II, XC2V2000-FF896), the atomic unit of reconfiguration is the configuration bit stream frame, which varies in size, depending on the target device, and is column based. Hence, even changing one LUT requires two frames of data: one for readback and another for reconfiguration. In our device, there are 22 frames per CLB column and 64 frames per BRAM column, with the device containing 48 CLB columns.
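The conversion from clock cycles to time used above can be checked with a few lines of arithmetic. The helper name is hypothetical; the cycle counts and the 50-MHz clock are the values reported in the text.

```python
CLOCK_HZ = 50_000_000  # 50-MHz global clock, as reported in the text


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a clock-cycle count to microseconds."""
    return cycles * 1e6 / clock_hz


readback_us = cycles_to_us(3862)
reconfigure_us = cycles_to_us(4132)
total_us = readback_us + reconfigure_us
print(f"readback {readback_us:.2f} us, reconfigure {reconfigure_us:.2f} us, "
      f"total {total_us:.2f} us per LUT update")
```

Multiplying the per-LUT total by the number of injected faults gives a first-order estimate of the reconfiguration overhead of a full sequential RTR test run, ignoring pattern application time.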
Note that since we have eliminated the FIMs and their associated control logic, the circuit area has decreased by as much as 60% while maintaining the desired functionality. This is another major advantage of the sequential RTR technique, which allows for the utilization of the same logical area for multiple tasks: in this case, the sequential deterministic or pseudorandom testing of multiple circuits.
VII. CONCLUSION AND FUTURE WORK
In this paper, a novel multiprocessor architecture was introduced. It is targeted for BIST algorithm development and testing and is capable of handling variant functions (arithmetic, DSP, etc.), all at runtime. A task is available for the RPU to use once it is implemented in an HDL, and its respective bit stream is generated. The entire system was implemented on a system-on-chip (SoC), with two softcore microprocessors handling all real-time deadlines and an RPU, allowing for the parallel execution of multiple hardware functional units: in this case, the CUTs, with the help of a JIT compiler that manages both the AF and RF.
A future design would allow the user more flexibility by reconfiguring the processing units, depending on the computational and functional needs of the intended application. This would target low-power operation, as idle HBs are swapped out of the RPU, as well as real-time application adaptation, as updates to HBs can be reconfigured on-the-fly, whereas novel HBs can be downloaded for trial purposes, as is currently done on the Internet. This allows BIST algorithm developers to remotely upload and test their novel methodologies using high-speed testing strategies. The future computer will immensely benefit from the addition of an RPU to complement the extremely fast and efficient yet inflexible contemporary processors.
