Abstract-This paper presents a mixed-grained reconfigurable VLSI array architecture that can cover applications ranging from mission-critical systems to consumer products through C-to-array application mapping. A proof-of-concept VLSI chip was fabricated in a 65nm process. Measurement results show that applications on the chip can keep working in a harsh radiation environment. Irradiation tests also show the correlation between the number of sensitive bits and the mean time to failure. Furthermore, the temporal error rate of an example application due to soft errors in the datapath was measured and demonstrated for reliability-aware mapping.
I. INTRODUCTION
With the exponential rate of enhancement both in transistor performance and integration scale, VLSI systems have been driving the advancement of information systems. People and society, in the meantime, have become more and more dependent on the services provided by these systems, and it has now become a societal requirement to guarantee the reliability of information systems and, accordingly, that of VLSI systems.
On the other hand, as VLSI technology advances, nonrecurring engineering (NRE) costs are rising significantly, and nowadays only the very highest-volume applications can accommodate the high NRE cost of custom SoC design. Therefore, reconfigurable VLSIs are drawing much attention as an alternative to SoC development, not only for small-volume consumer products but also for mission-critical applications, such as space and medical ones.
Coarse-grained reconfigurable architectures (CGRAs) are inherently superior in soft error immunity to FPGAs, since the number of configuration bits is smaller than that of FPGAs by orders of magnitude. Although there are several CGRA proposals for reliability (e.g. [1] [2]), none of them has been validated on silicon except [3]. The work in [3] successfully demonstrated a trade-off between soft error immunity and area; however, its compatibility with design tools is not well established.
In this paper, we propose a mixed-grained reconfigurable array with flexible reliability supporting C-to-array mapping, and we confirm its reliability under radiation with 65nm prototype chips. We have developed a reliability-configurable array in which the reliability level of each processing element can be flexibly chosen depending on applications and environments. This enables system designers to systematically trade off area for improved soft error immunity without having deep knowledge of reliability enhancement techniques. In addition, a high-level design automation tool can be utilized to map an application from standard C programs, taking into account the given reliability requirement. To support such a design tool [4] [5], the proposed architecture introduces 1-bit processing elements into the CGRA for implementing state machines. The proposed architecture was fabricated in a 65nm CMOS technology, and its reliability against soft errors was clarified in the following two ways. First, a live video demonstration was performed, showing data processing at different reliability levels in the presence of radiation. Second, quantitative soft error rate data were obtained using alpha-particle irradiation experiments and an FPGA.
The rest of this paper is organized as follows. Section II presents the proposed mixed-grained architecture, and Section III explains the silicon implementation of the architecture. Experimental results including irradiation tests are shown in Section IV and concluding remarks are given in Section V.
II. PROPOSED ARCHITECTURE
The proposed architecture consists of coarse-grained arithmetic logic unit (ALU) clusters, fine-grained lookup table (LUT) clusters, and memory clusters, where the basic element is referred to as a cluster. The reliability level of each cluster can be configured individually to improve the reliability-area trade-off with selective redundancy [6]. The basic architecture of the ALU and LUT clusters is shown in Fig. 1. Twenty LUT clusters are organized in a two-dimensional array forming a LUT block. LUT blocks, ALU clusters and memory clusters are placed in a two-dimensional array (Fig. 1).
As for dynamic reconfiguration, an ALU cluster includes three 16-word instruction register files (InstRF) for state-wise reconfiguration of the execution module (EM), three interconnect registers (InterRF) for context-wise interconnection, and a constant register (Fig. 2). The EM performs logic operations, shifting and clamp functions in addition to arithmetic operations such as 16-bit multiplication and 16/17-bit signed and unsigned addition and subtraction. The reliability levels, which are enabled by the redundancy control unit (RDU) and the comparing and voting unit (CVU), are summarized in Table I. TMR, SMS and SMM offer different immunities to soft errors (SEU/SET: single event upset/transient) and different dynamic reconfiguration capabilities (number of contexts). At the TMR level, both the InstRF (configuration information) and the EM (datapath) are redundant; at the SMS level, only the InstRF is redundant; and at the SMM level, both the InstRF and the EM are singular. The InterRF is protected by ECC (error correction code) at all three levels. All the register files are overwritten through ECC decoders or voters on a refreshing clock, which prevents error accumulation. The refreshing clock frequency thus sets a trade-off between reliability and power consumption.
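The voting step behind the TMR level can be illustrated with a generic bitwise majority voter. This is a minimal sketch of textbook triple modular redundancy and not the actual CVU design; the function names `majority_vote` and `copies_disagree` and the default 16-bit width are assumptions made for illustration.

```python
def majority_vote(a: int, b: int, c: int, width: int = 16) -> int:
    """Bitwise 2-out-of-3 majority over three redundant copies (hypothetical helper)."""
    mask = (1 << width) - 1
    # A bit is 1 in the result when at least two of the three copies agree on 1.
    return ((a & b) | (b & c) | (a & c)) & mask


def copies_disagree(a: int, b: int, c: int) -> bool:
    """Comparison step: report whether any copy differs from the others."""
    return not (a == b == c)


golden = 0x1234
upset = golden ^ 0x0010          # a single bit flipped in one copy
voted = majority_vote(golden, upset, golden)
print(hex(voted), copies_disagree(golden, upset, golden))
```

Writing the voted value back into all three copies on every refresh is one way to keep single upsets from accumulating into a double fault, which is the role the text assigns to the refreshing clock.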
The EM in a LUT cluster is a 4-input LUT suitable for 1-bit processing, such as flag operations and state machine implementations. LUT clusters can be cascaded to form a larger LUT, and they provide the TMR and SMM reliability levels.
As for the memory cluster, while it contains only a dual-port SRAM macro protected by ECC, it is designed to be compatible with the interconnect structure. There are two reliability levels; one provides dual-port access while the other gives single-port access. In the latter case, the other port is used for periodic refreshing to avoid error accumulation.
Figure 3 illustrates the mixed-grained array consisting of 26 ALU clusters, 6 memory clusters and 4 LUT blocks, together with the chip layout. A customized version of Cyber Workbench [4] [5], which is a commercially available behavioral synthesis tool, synthesizes an RTL implementation from a C program, taking into account the reliability specification given by [6]. Finally, the RTL implementation is mapped onto the array through technology mapping and P&R.
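How 4-input LUTs can be cascaded into a larger LUT can be sketched with a Shannon expansion: two LUTs hold the function for the fifth input equal to 0 and to 1, and a third LUT acts as a multiplexer. The way the fabricated array wires its cascade is not described here, so this structure, the 16-bit truth-table encoding and all function names are assumptions made for illustration.

```python
def lut4(table: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT whose truth table is packed into a 16-bit integer."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


def make_table(fn) -> int:
    """Build the 16-bit truth table of a 4-input Boolean function."""
    table = 0
    for index in range(16):
        bits = [(index >> k) & 1 for k in range(4)]
        table |= (fn(*bits) & 1) << index
    return table


# Third LUT used as a 2:1 multiplexer: inputs are (low, high, select, unused).
MUX = make_table(lambda low, high, select, unused: high if select else low)


def lut5(table_e0: int, table_e1: int, a: int, b: int, c: int, d: int, e: int) -> int:
    """Cascade three 4-input LUTs into one 5-input LUT (Shannon expansion on e)."""
    low = lut4(table_e0, a, b, c, d)    # function value when e == 0
    high = lut4(table_e1, a, b, c, d)   # function value when e == 1
    return lut4(MUX, low, high, e, 0)


# Example: 5-input parity, such as a flag computed for a state machine.
PARITY_E0 = make_table(lambda a, b, c, d: a ^ b ^ c ^ d)
PARITY_E1 = make_table(lambda a, b, c, d: a ^ b ^ c ^ d ^ 1)
print(lut5(PARITY_E0, PARITY_E1, 1, 0, 1, 1, 1))
```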
III. IMPLEMENTATION
The array was fabricated in a 65nm 12ML CMOS process. The die size is 4.2×4.2 mm². Loading the configuration bits is performed through a scan chain consisting of 165,312 FFs. The ALU, LUT and memory clusters include 120k, 4k and 99k gates, respectively.
Thanks to dynamic reconfiguration using states generated by an FSM, area-efficient mapping becomes possible. For example, an edge detection filter can be implemented with 25 ALU clusters, while 62 ALU clusters are necessary without dynamic reconfiguration (i.e. in a single state). The number of clusters is reduced by 60%. The proposed architecture is consistent with such latency-area exploration in behavioral synthesis. 
IV. MEASUREMENT RESULTS
This section presents measurement results of the fabricated prototype chip. We first evaluated the maximum speed of the state and context distribution using a PLL on the chip, since the maximum frequency is limited by the distribution of state and context signals. Results show that the maximum frequency at which state signals can be propagated is 240MHz, and that for context signals is 165MHz. In the following subsections, results of irradiation testing are described.
A. Demonstration
To validate the functionality and reliability of the architecture, a demonstration using two mappings, one at the SMM level and one at the SMS level, was performed (Fig. 4). The chip receives a live data stream from a video camera, processes it, and sends it to a monitor that displays the processed stream. The mapping of the negative filter was generated from C source code.
A snapshot of the results is shown in Fig. 5. After an Am-241 alpha foil with a flux of 9 × 10⁹ cm⁻² h⁻¹ was positioned over the chip [7], the SMS mapping was observed to continue outputting the processed video as expected. On the other hand, the SMM mapping was corrupted in 2 s due to SEUs in the configuration registers, and video processing stopped within 10 s in all four trials. We also tested the TMR mapping and confirmed the expected continuous functionality. The proposed architecture thus enables reliable operation even under irradiation through application mapping.
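For reference, the per-pixel operation of a negative filter can be sketched as follows. The C source used for the demonstration is not reproduced in this paper; the sketch is written in Python for brevity, and the 8-bit pixel depth, the nested-list frame format and the name `negative_filter` are assumptions made for illustration.

```python
def negative_filter(frame, max_value=255):
    """Invert every pixel of a grayscale frame (assumed 8-bit depth by default)."""
    return [[max_value - pixel for pixel in row] for row in frame]


# A 2x2 example frame: dark pixels become bright and vice versa.
frame = [[0, 255],
         [100, 30]]
print(negative_filter(frame))
```

Each output pixel depends only on the corresponding input pixel, so the operation is a simple streaming kernel of the kind that maps onto a chain of ALU clusters.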
B. Irradiation experiment
To quantitatively evaluate the immunity to soft errors, we carried out alpha particle irradiation experiments. Figure 6 illustrates the test configuration. The ALU clusters on the array are serially connected to compose a pipelined chain. For simplicity, the NOT operation is selected as the ALU function. We also implemented the same function as a golden circuit on an FPGA. A PC gives the same input patterns to the array and the golden circuit. The outputs of the array and the golden circuit are compared on the FPGA, and any inconsistency between them is detected as an error. Here, there are two types of errors: permanent errors and temporal errors. A permanent error happens when the configuration information is corrupted by SEUs in the configuration memory and the circuit functionality becomes wrong. On the other hand, a temporal error originates from SEUs and SETs in the datapath. In the current configuration, a temporal error does not accumulate in the datapath and is eventually flushed out of the pipeline.
To distinguish permanent errors from temporal errors, we implemented an error analyzer on the FPGA. If the inconsistency at the outputs continues for five clock cycles, we regard this situation as a permanent error. When the inconsistency lasts for less than five cycles, we judge that there is a temporal error. We count the numbers of permanent and temporal errors. In the case of a permanent error, we need to reload the configuration information to eliminate the SEUs in the configuration memory, and the Config. loader fills this role.
We first evaluated, using the scan chain, the number of SEUs accumulated in the configuration memory when a permanent error is detected. In the configuration memory, there are don't care bits which do not affect the functionality, and hence the number of accumulated SEUs can be larger than 1. Figure 7 shows the histogram. Cases with fewer than 50 accumulated SEUs are the most frequent, but there are also cases in which a large number of SEUs accumulate before a permanent error arises. To accurately estimate the soft error rate of reconfigurable devices, we need to pay attention to these don't care bits.
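The classification rule above can be sketched in software as a run-length check on the per-cycle comparison result. The analyzer in the experiment is implemented on the FPGA; this Python version, the name `classify_errors` and the choice to leave an unfinished run at the end of a trace unclassified are assumptions made for illustration.

```python
def classify_errors(mismatch_trace, threshold=5):
    """Count permanent and temporal errors in a per-cycle mismatch trace.

    mismatch_trace: iterable of truthy/falsy values, one per clock cycle,
    where a truthy value means the array output differs from the golden output.
    Returns (permanent_count, temporal_count).
    """
    permanent = 0
    temporal = 0
    run = 0  # length of the current run of consecutive mismatching cycles
    for mismatch in mismatch_trace:
        if mismatch:
            run += 1
            if run == threshold:
                # Mismatch has persisted for `threshold` cycles: permanent error.
                # In the experiment this is where the configuration is reloaded.
                permanent += 1
        else:
            if 0 < run < threshold:
                # The mismatch cleared by itself before reaching the threshold.
                temporal += 1
            run = 0
    # A short run still in progress at the end of the trace is left unclassified.
    return permanent, temporal


trace = [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
print(classify_errors(trace))
```

In the example trace, the isolated one-cycle and two-cycle mismatches count as temporal errors, while the six-cycle run counts as a single permanent error.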
We next evaluate the MTTF (mean time to failure) focusing on permanent errors; temporal errors are not regarded as failures in this evaluation. A refreshing clock of 15MHz is supplied to eliminate SEUs from the TMRed memory (InstRF) and the registers with ECC (InterRF). We prepared five array configurations: all SMM, SMM/SMS, all SMS, SMS/TMR and all TMR. In the SMM/SMS (SMS/TMR) configuration, 50% of the ALU clusters are at the SMM (SMS) level and the others are at the SMS (TMR) level. Note that in all TMR, voters and inter-cluster connections are triplicated and hence there is no single point of failure. Table II lists the measured MTTF and the number of permanent errors observed in about 300 s of irradiation. Table II also includes the number of sensitive bits, where a configuration memory element that impacts the primary output of a particular design is defined as a "sensitive bit". Thus, the don't care bits are not counted as sensitive bits. In the all-SMM configuration, the number of sensitive bits is 1575, the largest among the configurations, and the MTTF is 1.51 s, the shortest. When the SMM/SMS configuration is selected, the number of sensitive bits is reduced to 819, and consequently the MTTF is extended to 3.8 s. In contrast, the all-SMS, SMS/TMR and all-TMR configurations do not include any sensitive bits, and no permanent errors were observed for them. These results clearly show that the MTTF is strongly dependent on the number of sensitive bits, and that the system designer can trade area for robustness to soft errors guided by the number of sensitive bits.
We finally evaluate the temporal error rate. While the all-SMS configuration is expected to have very few permanent errors as explained above, its datapath is not redundant and temporal errors are expected to occur. We counted the number of temporal errors in the all-SMS configuration. During a 19-min irradiation, 1,109 temporal errors were observed, giving a rate of 0.95 bit flips per second. Assuming a standard package whose alpha emission rate is 10 cm⁻² h⁻¹, this rate corresponds to 4.3 × 10³ FIT in a nominal environment. If this FIT rate is not acceptable, the system designer needs to use the TMR level. On the other hand, if this FIT rate is acceptable, the SMS level is a good choice for area efficiency. Thus, the proposed array can be used for various environments and reliability specifications.
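The conversion from an accelerated error rate to a field FIT figure can be sketched as a first-order linear scaling by the ratio of the test flux to the field flux, with 1 FIT defined as one failure per 10⁹ device-hours. The function name `accelerated_rate_to_fit` and the assumption of strict linearity in flux are made for illustration; the sketch does not model any geometric or derating factors that the reported figure may include, so it is only expected to land in the same order of magnitude as the value quoted above.

```python
def accelerated_rate_to_fit(errors_per_second, test_flux, field_flux):
    """Scale an error rate measured under accelerated flux to a field FIT rate.

    test_flux and field_flux must use the same unit (here cm^-2 h^-1).
    Assumes the error rate is proportional to the particle flux.
    """
    errors_per_hour = errors_per_second * 3600.0
    acceleration = test_flux / field_flux      # how much faster the test is
    field_errors_per_hour = errors_per_hour / acceleration
    return field_errors_per_hour * 1e9         # failures per 10^9 hours


# Figures quoted in the text: 0.95 errors/s under a 9e9 cm^-2 h^-1 foil,
# and an assumed package emission rate of 10 cm^-2 h^-1.
fit = accelerated_rate_to_fit(0.95, test_flux=9e9, field_flux=10.0)
print(f"{fit:.2e} FIT")
```

Under these assumptions the result is on the order of a few thousand FIT, which is the range in which the choice between the SMS and TMR levels discussed above has to be made.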
V. CONCLUSION
This paper presented a 65nm silicon implementation of a mixed-grained reconfigurable array which supports C-to-array mapping and flexible reliability. We irradiated the fabricated test chip with an alpha foil and confirmed that the application mapped for high reliability kept functioning, while the application mapped for ordinary reliability stopped due to permanent errors in the configuration memory. In addition, we quantitatively evaluated the soft error immunity of each reliability level, and showed a concrete FIT number originating from SEUs and SETs in the datapath. These evaluation results give the underlying characteristics of the array, which are indispensable for reliability-aware mapping.
