Abstract-The Xilinx Virtex-5QV is robust to configuration errors, but will experience Block RAM upsets. Various approaches to scrubbing Block RAM to support the Xilinx MicroBlaze soft processor were tested at LANSCE. Use of the embedded ECC blocks for system memory is contrasted with soft ECC blocks.
I. INTRODUCTION
Los Alamos National Laboratory provides wide-band Radio Frequency (RF) sensors for satellite applications. We are currently in the design phase of a new generation of RF sensors based on Software Defined Radio (SDR) concepts. Instead of using complex and inflexible analog electronics, these new instruments are based on a Digital Signal Processing (DSP) approach, using Field Programmable Gate Array (FPGA) technology to provide the compute power. Each instrument consists of two FPGA-based processing cards built around a Xilinx Virtex-5QV SRAM-based FPGA. This marks the latest iteration of a long history of LANL use of reconfigurable processing in space, from the Virtex-I-based Cibola Flight Experiment [1] to the Virtex-4-based MRM [2].
While the configuration memory of the Virtex-5QV is quite robust to Single Event Upsets (SEU) (approximately 5 upsets/device year), the embedded Block RAM (BRAM) on the device is susceptible to SEU. The Block RAM is important because we intend to use the Xilinx MicroBlaze soft-core processor to control the embedded firmware, as well as to do mission data processing. Because the MicroBlaze uses the Block RAM for all of its instruction memory, as well as for the storage of stack and heap space, it is important that this memory be relatively robust to SEUs. The Xilinx Radiation Test Consortium (XRTC) estimates 12 bit errors/day in GEO [3]. While the BRAM space required for the flight software is at most 1/10th of the device, the remaining 1 bit error/day still must be addressed.
Error-Correcting Codes (ECC) are a viable mechanism to protect the memory from upsets. In this scheme, extra bits are provided that can be used to recover corrupted data. In this paper, only Single-Error-Correct/Double-Error-Detect (SEC/DED) schemes are considered. The Block RAM is not the main data storage resource on the board; we include a bank of DDR1 DRAM from 3D-Plus for bulk data storage. The same problems would apply if we were using the Block RAM solely as cache, although a cache would have a native mechanism to simply replace the cache block if an ECC failure was detected. In this paper, we describe three variants of the ECC memory module: the default user-logic ECC and two variations on the embedded Block RAM scrubber. The first includes an auto-scrub function that scrubs a bank of Block RAM in parallel, allowing the entire memory to be scrubbed every 512 clock cycles; the second is a lower-power, less intensive approach that allows the software itself to control when and how it scrubs. We then present results from 84 hours of neutron beam time at LANSCE.
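As an illustration of the SEC/DED idea, the following is a minimal Python sketch of an extended Hamming code. For a 64-bit word it produces 8 check bits (72 bits in total), which matches the width of the protected word, but the bit ordering used here is an assumption for illustration and is not claimed to match the layout used inside the Virtex-5QV Block RAM. The function names `encode` and `decode` are hypothetical.

```python
def encode(data_bits):
    """Extended Hamming encode. Index 0 holds the overall parity bit;
    indices 1..n use the classic layout (powers of two are check bits)."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # number of Hamming check bits
        r += 1
    n = m + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity             # make parity over this group even
    code[0] = sum(code[1:]) % 2      # overall parity enables double-error detect
    return code


def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected',
    'double', or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1         # syndrome 0 means the overall parity bit itself
        status = "corrected"
    elif overall == 1:
        status = "uncorrectable"     # odd number (>1) of errors
    else:
        status = "double"            # even parity but nonzero syndrome
    data = [fixed[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = [(0xDEADBEEFCAFEF00D >> i) & 1 for i in range(64)]
codeword = encode(word)
upset = list(codeword)
upset[37] ^= 1                       # inject a single-bit upset
status, recovered = decode(upset)
```

A scrubber built on such a code reads each word, corrects a single-bit error if one is reported, and writes the corrected word back before a second upset can accumulate in the same word.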
II. MEMORY ARCHITECTURE
The use of ECC on memory is certainly not novel. However, using the hard ECC block in the Virtex-5QV as the main memory for the MicroBlaze has not been documented, as far as we know. There are several supported mechanisms for providing error correcting codes in an embedded microcontroller. The Xilinx Embedded Development Kit (EDK), otherwise known as Xilinx Platform Studio, provides some free and supported options for commercial-grade parts, but these options do not make use of the special ECC scrub extension. The basic MicroBlaze BRAM controller can provide native ECC support; however, this approach is implemented entirely in user logic and requires software-driven touching of each address. The ideal approach is to use the embedded Block RAM ECC function, an ASIC extension to the Block RAM that provides ECC functionality. However, the scrubbing function requires a lot of manual intervention as well as one of the read ports of the BRAM. This makes it tricky to use in a soft-processor system, particularly with the MicroBlaze, because the processor requires two ports into the same memory for the instruction bus and the data bus.
When using the EN_ECC_SCRUB function on a RAMB36SDP Block RAM component, the block becomes a "simple dual port RAM": instead of having two read/write ports, it has one write port and a single read port. These must be shared by time-multiplexing the ports over multiple cycles, while producing the illusion to the microprocessor of two dedicated ports. The two-cycle approach is important for sharing the instruction bus and the data bus, but it is also important for allowing time for a read-modify-write (RMW) cycle. The Xilinx Block RAM ECC block depends on a 64-bit write; writes of fewer than 64 bits are not supported, because all 64 bits are required to compute the ECC. Thus convenient functionality like byte enables is impossible to implement with ECC without RMW. While most peripheral control and control flow code requires only 32-bit writes, many useful functions, like printf, require byte enables. This requirement forced us to move to read-modify-write. The RMW approach is also used in the Xilinx ECC core provided with the Embedded Development Kit, but as it is implemented entirely in user logic it is more expensive than the embedded ECC approach.

As the Block RAM with EN_ECC_SCRUB is a "simple dual port RAM", there are no ports open for the scrubber. The mechanism we explore in this paper is to "double clock", that is, to clock the Block RAM at twice the normal rate and multiplex the instruction and data ports so that another port can be left open for the scrubber functionality. Fig. 2 illustrates the shared-bus timing of the hard ECC scrubbing infrastructure. There are three read addresses active in any logical cycle (which takes two double-frequency clock steps, n and n + 1): the data bus (R/W) on address x, the instruction bus on address y, and the "auto-scrub address", z. The MicroBlaze DLMB data bus gets the first cycle to read, and the MicroBlaze ILMB instruction bus gets the second cycle. For data bus writes, the word read in the first cycle gets modified with the byte-enabled input data and written back in the second cycle.

Fig. 2. Timing of the shared bus structure. The RAMB36SDP with EN_ECC_SCRUB provides one read port and one write port. These must be shared between a scrubbing port and two independent data and instruction buses. The MicroBlaze DLMB data bus gets the first cycle to read, and the MicroBlaze ILMB instruction bus gets the second cycle. The instruction bus is read-only, but the data bus requires two cycles to execute a read-modify-write to provide byte-enabled writes, along with the required 64-bit write for the ECC function. The scrubbing port gets the first read cycle only if the R/W DLMB is not active.
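The read-modify-write step described above can be sketched in Python as a merge of a byte-enabled 32-bit write into the 64-bit word read in the first cycle. This is only a behavioral model: the function name `rmw_merge`, the lane numbering (bit 0 of the byte enable selects the least-significant byte), and the use of `word_sel` to pick the 32-bit half are assumptions for illustration and do not describe the actual LMB byte ordering.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rmw_merge(old_word64, wdata32, byte_enable4, word_sel):
    """Merge a byte-enabled 32-bit write into a 64-bit memory word.

    old_word64   : 64-bit word read (and ECC-corrected) in the first cycle
    wdata32      : 32-bit write data from the data bus
    byte_enable4 : 4-bit mask, one bit per byte of wdata32 (assumed ordering)
    word_sel     : 0 for the lower 32 bits, 1 for the upper 32 bits (assumed)
    """
    new_word = old_word64
    for lane in range(4):
        if (byte_enable4 >> lane) & 1:
            shift = 32 * word_sel + 8 * lane
            byte = (wdata32 >> (8 * lane)) & 0xFF
            new_word = (new_word & ~(0xFF << shift)) | (byte << shift)
    # The full 64-bit result is what gets written back in the second cycle,
    # so the ECC bits can be recomputed over the whole word.
    return new_word & MASK64


low_byte = rmw_merge(0x1111111122222222, 0xAABBCCDD, 0b0001, 0)
high_byte = rmw_merge(0x1111111122222222, 0xAABBCCDD, 0b1000, 1)
```

A full-word 32-bit store is the same operation with all four byte enables set; it still requires the read, because the other 32-bit half of the 64-bit word must be preserved.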
A key architectural tradeoff in sharing three logical ports across one physical read port is that the data and instruction buses must return immediately, but the auto-scrubber has relaxed constraints. That is, when the data bus makes a read or write request, the ECC scrub is applied to the data bus's address, not the "auto-scrub" address. This only applies to the particular RAMB36SDP component being accessed, not the memory module as a whole; thus, very few addresses get skipped consistently. We were able to measure the duty cycle of the data bus by instrumenting the memory interface with Xilinx ChipScope. In a representative C program on the MicroBlaze, the DLMB bus is active only one of every seven cycles, meaning that the auto-scrubber gets the bus 6/7ths of the cycles. For the non-auto-scrub version, simply reading the address in software causes the scrub to take place. The relative benefit of various scrubbing schemes has been well thought through in [4].
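To give a feel for what the measured duty cycle means for scrub latency, the following Python sketch counts how many system clock cycles one block needs to complete a scrub lap when the data bus takes the read slot one cycle in seven. It assumes, pessimistically, that every data bus access lands on the same Block RAM and that the accesses are evenly spaced; the function name and the 512-address lap length (taken from the block depth quoted in this paper) are used purely for illustration.

```python
def scrub_lap_cycles(n_addresses=512, busy_every=7):
    """Count cycles for the auto-scrubber to visit n_addresses when the
    data bus claims the read slot on every busy_every-th cycle."""
    cycles = 0
    scrubbed = 0
    while scrubbed < n_addresses:
        cycles += 1
        data_bus_active = (cycles % busy_every == 0)
        if not data_bus_active:
            scrubbed += 1        # scrubber gets the first read slot
    return cycles


lap = scrub_lap_cycles()              # 597 cycles with a 1-in-7 duty cycle
lap_time_us = lap / 50e6 * 1e6        # about 11.9 microseconds at 50 MHz
```

Under these assumptions the lap stretches from 512 to 597 cycles, which is still many orders of magnitude faster than the upset rates discussed in this paper.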
The memory module is composed of 32 individual Block RAMs. Each Block RAM receives the lower bits of the address, and the data outputs are selected by the most-significant bits (MSB). This allows each memory block to access its memory separately from the module as a whole, and allows us to have an individual scrubber for each Block RAM, which is possible because the ECC is part of the Block RAM itself. All that is required is to provide the address and the read-modify-write functionality to the Block RAM; each individual Block RAM can otherwise take care of itself. The aspect ratio of the Block RAM is such that there are a total of 512 individual addresses, meaning that the entire memory can be scrubbed every 512 cycles if the data bus is not active (of course, the interface is clocked at twice the rate of the system, so from another perspective it requires 1024 cycles).
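The address split described above can be sketched as follows. The constants come from the text (32 blocks, 512 addresses per block, 64-bit words); the function name and the exact bit positions are assumptions for illustration rather than the actual RTL.

```python
BLOCKS = 32              # Block RAMs in the memory module
WORDS_PER_BLOCK = 512    # 64-bit words per Block RAM
BYTES_PER_WORD = 8


def decode_address(byte_addr):
    """Split a byte address into (block, word offset, byte lane).

    The most-significant bits select the Block RAM whose output is muxed
    onto the bus; the low word-address bits go to every block in parallel.
    """
    word_index = byte_addr // BYTES_PER_WORD
    block = word_index // WORDS_PER_BLOCK
    offset = word_index % WORDS_PER_BLOCK
    lane = byte_addr % BYTES_PER_WORD
    if block >= BLOCKS:
        raise ValueError("address outside the memory module")
    return block, offset, lane


capacity_bytes = BLOCKS * WORDS_PER_BLOCK * BYTES_PER_WORD   # 131072 bytes
```

Because every block sees the same low address bits, a single scrub counter running from 0 to 511 can drive all 32 blocks at once, which is why a full lap takes only as long as the depth of one block.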
III. EXPERIMENT SETUP
The Block RAM test was executed as follows. As illustrated in Fig. 6, the LANL-designed Virtex-5QV board was centered in the neutron beam. The external connections to the FPGA board were solely power and JTAG. All configuration and telemetry were handled solely via the JTAG interface. The JTAG cable is connected to a SPARC Leon-based JTAG configuration scrubber. The scrubber is similar to [5], but operates over JTAG rather than SelectMAP (the Micro-TCA backplane provides JTAG by default). This interface provides flight-like scrubbing of the configuration memory of the Virtex-5QV FPGA, and is the approach intended for flight. We compared the virtues of internal vs. external scrubbing, in much the same vein as [6]. However, because the Leon-based JTAG host is already responsible for managing application bitstreams and software, having it also scrub causes less duplication of functionality. The expected configuration upset rate is low enough that the speed difference between scrubbing with SelectMAP or JTAG should not have a significant impact. This scrubber assembly was then connected via Ethernet to a host computer providing configuration streams, software control, and telemetry logging.
Configuration of the test unit included bitstream programming, as well as loading the software onto the soft processor, also via JTAG. Using various custom design scripts, the soft processor is loaded via hardcoded JTAG sequences, and the processor is stopped and started via the JTAG interface as well. Telemetry is also provided through JTAG. This was realized by implementing the XMD commands memory read and memory write, as well as native Xilinx user register command sequences to read BSCAN-based user logic registers. The memory read approach allowed us to interact directly with the software without having to bring out telemetry to user-logic registers in the firmware. This also allowed us to provide embedded watchdog functionality by directly checking registers while the software is running (XMD commands do not work while the processor is executing). The combination of these approaches allowed us to prove that the software was executing successfully, to detect anomalous behavior, and to provide ongoing scrub and upset counts.
The test set-up merits some discussion. We had 10 boards under test in the beam simultaneously, run by experimenters from various universities with multiple test approaches. Neutron counts were tabulated by a small embedded system we developed at the beam that simply captured the detection pulses off of the test equipment and tabulated the counts. The counts are timestamped and broadcast to the assembled experimenters via UDP. This was a nice approach that provided everyone with the data every second, allowing the neutron counts to be integrated into their test logs on the fly, rather than requiring significant preprocessing to correlate their experiment behavior with the neutron counts.

Fig. 3. Flowchart of test procedure. All configuration and telemetry retrieval accomplished through JTAG operations, scrubbing handled only by user logic.
For each scrub cycle the following procedure took place (as illustrated in Fig. 3). First, the system usr access register was queried to determine if the FPGA was programmed. If not, a new copy of the bitstream was sent to the FPGA. If the FPGA was already programmed, the software for the MicroBlaze was uploaded via an ELF file translated into a sequence of JTAG commands. The MicroBlaze was then commanded to start execution (the reset and run commands). One of the first tasks the MicroBlaze executes is resetting a watchdog timer implemented in user logic. The test host checks via JTAG that the watchdog has been set, proving that the MicroBlaze has executed the software correctly to that point. The software then continues independently, writing a test sequence into all of the Block RAM modules.
Next, the software enables the hardware scrubber and then goes into a loop scrubbing all the RAMs sequentially. The scrubbing loop continues for about 15 seconds, at which point the test host stops the processor and reads out the telemetry from both the user logic registers and the software counters. These software counters additionally have a header that provides some confidence that the correct memory address is being read, as well as that the software had correctly written those addresses. The MicroBlaze software regions were protected by the auto-scrub ECC memory. All of the telemetry is logged to the test host after each pass. Each of these cycles takes about 15 seconds, and during the test in the neutron beam we received an upset approximately every seventh reset cycle.

Fig. 4. Large memories are generated by stacking wide, shallow BRAM blocks together and using most-significant bits (MSB) for address decoding. Address decoding happens externally, at user logic speeds, but ECC is internal to the BRAM block and operates at ASIC speeds.

Fig. 5. Large memories are generated by concatenating high-depth, narrow BRAM blocks together. Address decoding happens internally to the BRAM block, at ASIC speeds, but ECC is external, at user logic speeds.
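The per-cycle test procedure described in this section can be summarized with the following Python sketch. Everything here is hypothetical scaffolding: `FakeJtagTarget` is a stand-in that only records the order of operations, and none of the method names correspond to a real JTAG or XMD API. The sketch also assumes that the software is uploaded after a fresh bitstream load as well as when the FPGA was already programmed.

```python
import time


class FakeJtagTarget:
    """Hypothetical stand-in for the JTAG link; records calls only."""

    def __init__(self, programmed=False):
        self.programmed = programmed
        self.watchdog_set = False
        self.log = []

    def is_programmed(self):
        self.log.append("query_programmed")
        return self.programmed

    def program_bitstream(self):
        self.log.append("program_bitstream")
        self.programmed = True

    def load_elf(self):
        self.log.append("load_elf")

    def reset_and_run(self):
        self.log.append("reset_and_run")
        self.watchdog_set = True     # software resets the watchdog early on

    def watchdog_was_set(self):
        self.log.append("check_watchdog")
        return self.watchdog_set

    def stop(self):
        self.log.append("stop")

    def read_telemetry(self):
        self.log.append("read_telemetry")
        return {"header_ok": True, "single_bit": 0, "double_bit": 0}


def run_test_cycle(target, dwell_s=15.0):
    """One pass of the procedure: program if needed, load, run, dwell, read out."""
    if not target.is_programmed():
        target.program_bitstream()
    target.load_elf()
    target.reset_and_run()
    if not target.watchdog_was_set():
        return None                  # software did not start; caller retries
    time.sleep(dwell_s)              # scrub loop runs on the target meanwhile
    target.stop()                    # memory reads require a stopped processor
    return target.read_telemetry()


target = FakeJtagTarget(programmed=False)
telemetry = run_test_cycle(target, dwell_s=0)
```

In a real harness each returned telemetry record would be logged alongside the timestamped neutron counts so that upsets can be correlated with fluence afterwards.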
IV. ARCHITECTURAL CONSIDERATIONS
While both approaches to instruction and scrubbing are valid and functional, even at high event rates, the differences come out in the resource costs of each implementation, power, and effects on the critical path.
The default MicroBlaze ECC mechanism uses a user-logic ECC. However, address decoding happens inside the BRAM block, at ASIC speeds. Xilinx BRAMs can internally produce various aspect ratios, for example partitioning the 18k or 36k block into a byte-wide by 2k- or 4k-deep memory. The deep memories are then concatenated together, which is a free operation compared with an address muxing scheme. For larger memories, this appears to have a significant positive effect on the critical path compared with user-logic address decoding.
The hard-ECC blocks require that the memory width be 64 bits. This leads to a depth of 512 words per block. Combining these blocks together into a usable memory requires extra logic, and the datapath costs appear to outweigh the savings from an ASIC ECC. For this reason, we targeted a 50 MHz clock for all tests. Another hybrid approach that we did not test is using only the hard ECC from the BRAM block, but not the rest of the BRAM. In this case we get the benefits of the deep BRAM block as well as the ASIC scrubber. The downside of this approach is that a 64-bit word still needs to be presented to the scrubber, so the memory still has to be configured wider than necessary and then selected down to a usable width.
Xilinx does not allow the use of MIG (Memory Interface Generator) cores on the Virtex-5QV. This is enforced through a tool check that detects the MIG signature. We pursued a vendor solution for the memory interface, choosing a DDR1 core from Northwest Logic. Unfortunately, the Northwest Logic core does not support PLB (the Xilinx standard bus for Virtex-5), only AXI. Xilinx does not 'support' AXI on the Virtex-5. However, no architectural feature or component is missing; the problem is actually the older HDL parser used in XST for Virtex-5 parts. Simply adding "-use_new_parser" on the XST command line for the AXI components solves the problem. Moving the microprocessor bus to AXI allows use of some new components and provides a path towards reuse of IP in future architectures.
The reader will notice an emphasis on the use of JTAG for all programming and telemetry operations. The flight computer interfaces with the V5 signal processing system solely through JTAG and SpaceWire. The configuration scrubber is a JTAG-based frame scrubber with an external CRC codebook. For flight operations, we have a choice between attempting to recover and resetting the entire board. Recovery implies scrubbing the upset frame and allowing the signal processing filters to flush out errors. Resetting everything takes 5-10 seconds and includes a full reprogramming of the bitstream, resending operating parameters from the flight computer, and resynchronizing both signal processing boards.
The control of the MicroBlaze processor via JTAG is somewhat unusual, as to our knowledge no one else has published the protocol for controlling the processor over JTAG. Xilinx provides the capability through the Xilinx Microprocessor Debugger (XMD) tool, but the protocol is not documented. Through the use of XMD's XSVF export functionality, we were able to isolate and decode the protocol used by the MicroBlaze for the memory read, memory write, stop, run, and reset commands. This provides a very convenient approach to controlling a MicroBlaze without a JTAG cable and an XMD installation.
All of the XMD commands were decoded using the following process: turn on SVF writing in XMD, run some example commands, and determine what each bit does.
Example capture in xmd:
connect mb mdm -cable type xilinx_svffile fname output.svf -configdevice devicenr 1 partname XC5VFX130T irlength 14 idcode 0x23300093 -debugdevice devicenr 1 cpunr 1 cpu_version microblaze_v72
(microblaze_v72 fixes the endianness reversal in the ELF.)
Interestingly, there are variations in syntax as the required address length changes. That is, when the address bits are zero above the 16th bit, the SDR command is only 40 bits, but when attempting to access higher memory, e.g., mrd 0x10000000, the SDR shift increases to 72 bits. We have defaulted to the larger shift for convenience, at some expense of bandwidth.
We also use Xilinx XAPP058 [7], the XSVF player implementation, ported to the Leon 3 processor and VxWorks. XSVF files can be generated by the EDK tools to load software to the MicroBlaze. The XSVF functionality is probably better suited to XMD commands such as download (software initialization). At the moment there is no way to discern whether the processor is stopped except to check some known register for valid data.
V. EXPERIMENTAL RESULTS
We tested all three approaches in the Los Alamos Neutron Science Center (LANSCE) neutron beam for 84 hours. During this period of time, we detected over 8,000 total upsets in the Block RAM cells but never encountered a double bit upset. This is the ideal case for operating through errors, because single-bit errors can be corrected on-the-fly, but double-bit errors corrupt data.
This was an ideal upset rate for determining whether the scrubber works at a level that will be functional for flight, where we do not expect anywhere near that rate of upsets, in fact only 12 upsets per device day (see footnote 1). Single- and double-bit errors were recorded out of the ECC module, and a software comparison against a known pattern provided detection of any other types of errors. No data errors were detected during the 84 hours, and in no case did we detect a situation where the ECC was not successful in detecting and correcting a single-bit error. We did not detect any entire BRAM-clearing events [8].
The BRAM-clearing event is thought to be a configuration upset that incorrectly asserts the write enable signal for one of the write ports on a BRAM. This behavior is catastrophic for a microprocessor, as the processor instructions and data are suddenly overwritten with whatever happens to be on the bus. However, this behavior is not specific to the MicroBlaze, or to any microprocessor, as it is a configuration susceptibility. Some detection or protection mechanism could be attempted, but the configuration bits close to the BRAM are always susceptible.
The board was supported by a CommAgility test fixture (Fig. 6) and was fixed at normal incidence to the beam, 2.4 m from the beam cap. This suggests an attenuation to 79% of the initial fluence. The total fluence at the beam cap was 2.67E+11 neutrons/cm², or 2.11E+11 neutrons/cm² at the Virtex-5QV board. The FPGA was kept at ambient temperature through the use of a heatsink and fan and was run at nominal voltage provided by an external voltage regulator. The neutron cross section per bit for our tests was 1.28E-14 cm²/bit ± 4.547E-16 cm²/bit for both of the hard scrubbers, and 9.64E-15 cm²/bit ± 1.1E-15 cm²/bit for the soft scrubber. The hard scrubbers used 32 Block RAM units, and the soft scrubber used 4, plus a portion of another for the ECC bits. The BRAM used in the test represents about 30% of the device; it was difficult to meet timing at higher utilization (MicroBlaze running at 50 MHz, the BRAM internally running at 100 MHz).
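For readers reproducing this kind of analysis, a per-bit cross section is conventionally the number of observed upsets divided by the product of fluence and the number of bits under test. The Python sketch below shows that arithmetic and checks the attenuated fluence quoted above. The upset count and bit count in the example call are hypothetical placeholders chosen for round numbers, not values from our test logs.

```python
def attenuated_fluence(fluence_at_cap, transmission):
    """Fluence at the board given the fraction of fluence that reaches it."""
    return fluence_at_cap * transmission


def cross_section_per_bit(upsets, fluence, bits):
    """Per-bit cross section in cm^2/bit: upsets / (fluence * bits)."""
    return upsets / (fluence * bits)


# Values quoted in the text: 2.67E+11 n/cm^2 at the beam cap, 79% at the board.
fluence_at_board = attenuated_fluence(2.67e11, 0.79)   # about 2.11E+11 n/cm^2

# Hypothetical inputs, for illustration only.
example_sigma = cross_section_per_bit(upsets=1000, fluence=1.0e11, bits=1_000_000)
```

The same function can be applied separately to each scrubber variant, using the fluence accumulated while that variant was in the beam and the number of Block RAM bits it actually exercised.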
Considering only SEUs, it would take about 200 seconds at the observed LANSCE flux for two upsets to occur; based on eqn 17 of [9], this produces a data error probability of 6E-5. Thus, even in the accelerated beam test, scrubbing every 10 seconds is far in excess of what is required. While we did not see any Multiple-Bit Upsets (MBU) in a given memory line, we did notice that approximately 10% of the errors were detected on the same scrub lap of memory (see Fig. 8). This implies that many of the upsets detected on a single scrub pass were in fact multiple-cell upsets (MCU).

Footnote 1: The BRAM scrubber was also tested in a heavy ion beam at Berkeley, and even at low fluence the upset rate was far higher and included many multi-bit upsets. Multi-bit upsets were a problem for two reasons. First, there is no recovering from a multi-bit upset even with ECC; the only recovery is for that data line to be rewritten via an external configuration host. Second, a flaw in the scrubber telemetry caused the upset counts to increase uncontrollably when a multi-bit upset was detected. This removed the value of the telemetry, except for indicating that a multiple-bit upset had in fact occurred.
This suggests that Xilinx has successfully applied interleaving to the Block RAM to reduce the incidence of MBUs causing multiple errors per row, which would otherwise cause the ECC to fail. In [10], we stated that the "MCUs are 6%±0.55% of all events at 65MeV and 10%±0.22% of all events at 200 MeV at a normal incidence." Our test data from LANSCE indicates that the percentage of MCUs in a white-spectrum neutron environment correlates well with our higher-energy proton test results. The earlier data also indicates that nearly all MCUs should be 2-bit upsets (8% of all events) or 3-bit upsets (1% of all events), which also correlates with our LANSCE results.
VI. CONCLUSION
We tested several variations on a Block RAM ECC core for the MicroBlaze soft processor. All approaches were successful at the upset rates we experienced at LANSCE. In general, the test was considered a successful checkout of several approaches, all are considered 2 . Because the upset rates during flight are not expected to be more than one per hour, only a modest scrub rate is required, and a pass can be executed during periods when the processor is not busy with mission data. Given the successful testing of all of the approaches, the determining factor becomes the effect on the critical path. Based on this metric, the built-in "soft" ECC has the least effect on the clock rate, and is also the easiest to integrate into a system.
