Due to harsh and inaccessible operating environments, space computing presents many unique challenges and constraints that limit onboard computing performance. However, the increasing need for real-time sensor and autonomous processing, coupled with limited communication bandwidth with ground stations, is increasing onboard computing demands for next-generation space missions. Because currently available space-grade processors cannot satisfy this growing demand, research into various processors is conducted to ensure that potential new processors are based upon architectures that will best meet the computing needs of space missions. Device metrics are used to measure and compare the theoretical capabilities of processors based upon vendor-provided data and tools, enabling the study of large and diverse sets of architectures. Architectural tradeoffs are determined that can be considered when comparing or designing space-grade processors. Results demonstrate how onboard computing capabilities are increasing due to emerging architectures that support high levels of parallelism in terms of computational units, internal memories, and input/output resources; and that performance varies between applications, depending on the compute-intensive kernels used. Furthermore, the overheads incurred by radiation hardening are quantified and used to analyze low-power commercial-off-the-shelf processors for potential hardening and use in future space missions.
I. Introduction
Most currently available space-grade processors are the result of commercial-off-the-shelf (COTS) processor architectures being selected for radiation hardening and use in space missions. Because creating space-grade processors is a lengthy, complex, and costly process, and because space mission design typically requires lengthy development cycles, there is a large and potentially growing technological gap between space-grade and COTS processors that results in limited and outdated processor options for space missions.
Although current space-grade processors increasingly lag behind the capabilities of emerging COTS processors [1] [2] [3] , computing requirements for space missions are becoming more demanding due to the increasing need for real-time sensor and autonomous processing [4] [5] [6] . Furthermore, improving sensor technology and increasing mission data rates, data precisions, and problem sizes are increasing the demand for communication bandwidth to ground stations. Due to limited bandwidth and long transmission latencies, remote transmission of sensor data or real-time operating decisions become impractical for space missions. High-performance onboard computing can alleviate these challenges and address the unique computing needs of space missions by processing data before transmission to ground stations and making real-time operating decisions autonomously.
To address the continually increasing demand for high-performance onboard space computing, careful consideration is required when selecting processors for future space missions, and new architectures must be analyzed for potential new space-grade processors. Existing space-grade processors are typically based upon COTS processors with architectures that were not explicitly designed for the unique challenges of space computing. To ensure that new space-grade processors are based upon architectures that are most suitable for next-generation space missions, tradeoffs in architectural characteristics should be determined and considered when comparing or designing space-grade processors or when selecting a COTS architecture for hardening and use in space missions. However, the set of available processors is large and diverse, with many possible architectures to evaluate.
To analyze the large and diverse set of existing and potential future processor architectures for space computing, a suite of device metrics is leveraged that provides a theoretical basis for the study of architectural capabilities [7] [8] [9] [10] . Facilitated by device metrics, quantitative analysis and objective comparisons are conducted for many diverse space-grade and low-power COTS processor architectures, from categories such as multicore and many-core central processing units (CPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), and hybrid configurations of these architectures. A device metrics analysis provides insights into the performance, power efficiency, memory bandwidth, and input/output bandwidth of specific implementations of these processors to track the current and future progress of their development and to determine which can best meet the computing needs of space missions. Although other metrics are also of interest, such as the cost and reliability of each processor, this information is not standardized between vendors and is often unavailable or highly dependent on mission-specific factors.
The remainder of this paper is structured as follows. Section II describes background and related research for space-grade processors and device metrics. Section III describes methodologies for the analysis of fixed-logic, reconfigurable-logic, and hybrid processors with device metrics. Section IV provides a comparative analysis of present and future space-grade processors with device metrics, including comparisons of space-grade processors to one another, in-depth analysis of how performance of space-grade processors varies between applications and kernels based on operations mix, comparisons of space-grade processors to the closest COTS counterparts upon which they were based to determine overheads incurred from hardening, and comparisons of top-performing space-grade and COTS processors to determine the potential for future space-grade processors. Finally, Sec. V provides conclusions and future research directions. Data for all results are tabulated and included in the Appendix.
II. Background and Related Research
Many radiation hazards exist in the harsh space environment such as galactic cosmic rays, solar particle events, and trapped radiation in the Van Allen belts, which threaten the operation of onboard processors [11, 12] . Space-grade processors must be radiation hardened or radiation tolerant to withstand cumulative radiation effects such as charge buildup within the gate oxide that causes damage to the silicon lattice over time, and they must provide immunity to single-event effects that occur when single particles pass through the silicon lattice and cause errors that can lead to data corruption or disrupt the functionality of the processor [13] [14] [15] . Several techniques exist for the fabrication of space-grade processors [16] [17] [18] , including radiation hardening by process, which involves the use of an insulating oxide layer, and radiation hardening by design, which involves specialized transistor-layout techniques. Although both space-grade and COTS processors can be used in space, space-grade processors are often necessary, depending on the mission's orbit or location, planned lifetime, and requirements for reliability and accessibility. However, creating a space-grade version of a COTS processor often comes with associated costs [19] , including slower operating frequencies, decreased numbers of processor cores or computational units, increased power dissipation, and decreased input/output resources. Traditionally, space-grade processors have come in the form of single-core CPUs [20] . However, in recent years, development has occurred on space-grade processors with more advanced architectures such as multicore and many-core CPUs, DSPs, and FPGAs.
To analyze and compare processors for use in space missions, an established set of device metrics is leveraged for the quantitative analysis of diverse processor architectures in terms of performance, power efficiency, memory bandwidth, and input/output bandwidth [7] [8] [9] [10] . Device metrics provide a theoretical basis for the analysis of a processor's capabilities and enable the objective comparison of diverse architectures, from categories such as multicore and many-core CPUs, DSPs, FPGAs, GPUs, and hybrid configurations of these architectures. Device metrics can be calculated solely based upon architectural characteristics described by vendor-provided documentation and software tools, without the hardware costs and development efforts required for device benchmarking, thus providing a practical methodology for the comparison and analysis of a large and diverse set of processors. However, device metrics describe only the theoretical capabilities of each architecture without complete consideration of software requirements and implementation details, which may result in additional costs to performance, productivity, and other factors. Therefore, once the best processors have been identified with device metrics, more exhaustive hardware experimentation and analysis can then be conducted with device benchmarking to determine realizable capabilities.
Computational density (CD), reported in gigaoperations per second (GOPS), is a metric for the steady-state performance of a processor's computational units for a stream of independent operations. By default, calculations are based upon an operations mix of half additions and half multiplications. However, the default can be varied to analyze how performance differs between applications that contain kernels that require other operations mixes. Multiply-accumulate functions are only considered to be one operation each because they require data dependency between each addition and multiplication. CD is calculated separately for each data type considered, including 8 bit, 16 bit, and 32 bit integers, as well as both single-precision and double-precision floating points (hereafter referred to as Int8, Int16, Int32, SPFP, and DPFP, respectively). The CD per watt (CD/W), reported in GOPS per watt (GOPS/W), is a metric for the performance achieved for each watt of power dissipated by the processor. The internal memory bandwidth (IMB), reported in gigabytes per second, is a metric for the throughput between a processor and onchip memories. The external memory bandwidth (EMB), reported in gigabytes per second, is a metric for the throughput between a processor and offchip memories through dedicated memory controllers. The input/output bandwidth (IOB), reported in gigabytes per second, is a metric for the total throughput between a processor and offchip resources through both dedicated memory controllers and all other available forms of input/output. Although no single metric can completely characterize the performance of any given processor, each metric provides unique insight into specific architectural features that can be related to applications and kernels as needed.
The most relevant metric for performance may be CD when bound computationally, CD/W when bound by power efficiency, IMB or EMB when bound by memory, IOB when bound by input/output resources, or some combination of multiple metrics depending on specific application requirements.
III. Device Metrics Methodology
To calculate device metrics for a fixed-logic processor such as a CPU, DSP, or GPU, several key pieces of information are required about the architecture that are obtained from vendor-provided documentation [7, 8]. For example, Eqs. (1-15) demonstrate the process of calculating device metrics for the Freescale QorIQ® P5040, which is a quad-core CPU [21] [22] [23]. CD calculations require information about the operating frequency, the number of each type of computational unit, and the number of operations per cycle that can be achieved by each type of computational unit for all operations mixes and data types considered. As shown in Eqs. (1) and (2), there is one integer addition unit and one integer multiplication unit on each processor core, allowing for one addition and one multiplication to be issued simultaneously per cycle for all integer data types. There is only one floating-point unit on each processor core, which handles both additions and multiplications, allowing for only one operation to be issued per cycle for all floating-point data types. CD/W calculations require the same information as CD calculations, in addition to the maximum power dissipation. As shown in Eqs. (3) and (4), the CD/W is calculated using the corresponding CD calculations and the maximum power dissipation. IMB calculations require information about the number of each type of onchip memory unit, such as caches and register files, and associated operating frequencies, bus widths, access latencies, and data rates. As shown in Eqs. (5-7), the IMB is calculated for all types of caches available on each processor core. Assuming cache hits, both types of L1 cache can supply data in each clock cycle. Although the L2 cache has a higher bus width, it also requires a substantial access latency, which limits the overall bandwidth. IMB values are combined to obtain the total IMB.
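The fixed-logic calculations described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a reproduction of Eqs. (1-7): the operating frequency, maximum power dissipation, cache bus widths, and L2 access latency used here are hypothetical placeholders rather than P5040 datasheet values, and the function names are invented for this example. Only the core count and the operations issued per cycle follow the text.

```python
def computational_density(cores, freq_ghz, ops_per_cycle):
    """CD in GOPS: cores x frequency (GHz) x operations issued per cycle per core."""
    return cores * freq_ghz * ops_per_cycle


def memory_bandwidth(units, freq_ghz, bytes_per_access, cycles_per_access=1):
    """Bandwidth in GB/s: units x frequency (GHz) x bytes per access / cycles per access."""
    return units * freq_ghz * bytes_per_access / cycles_per_access


# Quad-core CPU as in the text; the remaining values are hypothetical placeholders.
CORES = 4
FREQ_GHZ = 2.0      # assumed operating frequency
MAX_POWER_W = 40.0  # assumed maximum power dissipation

# Integer: one addition and one multiplication issued per cycle (two operations).
int_cd = computational_density(CORES, FREQ_GHZ, ops_per_cycle=2)
# Floating point: a single shared unit, so one operation per cycle.
fp_cd = computational_density(CORES, FREQ_GHZ, ops_per_cycle=1)

int_cd_per_watt = int_cd / MAX_POWER_W
fp_cd_per_watt = fp_cd / MAX_POWER_W

# Two L1 caches per core supplying data every cycle (assumed 8-byte buses),
# and one L2 cache per core with a wider bus but a multi-cycle access latency
# (assumed 32 bytes every 8 cycles).
l1_imb = memory_bandwidth(units=2 * CORES, freq_ghz=FREQ_GHZ, bytes_per_access=8)
l2_imb = memory_bandwidth(units=CORES, freq_ghz=FREQ_GHZ, bytes_per_access=32,
                          cycles_per_access=8)
total_imb = l1_imb + l2_imb

print(f"Integer CD: {int_cd} GOPS, CD/W: {int_cd_per_watt} GOPS/W")
print(f"Floating-point CD: {fp_cd} GOPS, CD/W: {fp_cd_per_watt} GOPS/W")
print(f"Total IMB: {total_imb} GB/s")
```

With these placeholder values, the L2 cache contributes less bandwidth than the L1 caches despite its wider bus, which mirrors the latency effect described in the text.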
EMB calculations require information about the number of each type of dedicated controller for offchip memories and associated operating frequencies, bus widths, and data rates. As shown in Eq. (8), the EMB is calculated for the dedicated controllers available for external memories on the QorIQ P5040. IOB calculations require the same information as EMB calculations, in addition to the number of each type of available input/output resource and associated operating frequencies, bus widths, and data rates. As shown in Eqs. (9-15), the IOB is calculated for each type of input/output resource available using optimal configurations for signal multiplexing. IOB values are combined to obtain the total IOB.
To calculate device metrics for a reconfigurable-logic processor such as an FPGA, the process is more complex than for fixed-logic processors, and it requires several key pieces of information about the architecture that are obtained from vendor-provided documentation, software tools, and test cores [7] [8] [9] [10]. For example, Eqs. (16-30) demonstrate the process of calculating device metrics for the Xilinx Virtex®-5 FX130T, which is an FPGA [24] [25] [26] [27]. CD calculations require information about the total available logic resources of the architecture in terms of flip flops, lookup tables, and digital-signal-processing units. Additionally, the use of software tools and test cores is required to generate information about the operating frequencies and logic resources used for all operations mixes and data types considered [25, 26]. A linear-programming algorithm is used for optimization, based upon operating frequencies and the configuration of computational units on the reconfigurable fabric [9, 10]. As shown in Eqs. (16-20), the CD is calculated separately for each integer and floating-point data type, based upon the operating frequencies and logic resources used for additions and multiplications, where each computational unit can compute one operation per cycle and multiple versions of each computational unit are considered that make use of different types of logic resources. CD/W calculations require the use of software tools to generate information about power dissipation given the configuration of computational units for each data type [27]. As shown in Eqs. (21-25), the CD/W is calculated separately for each integer and floating-point data type using estimates for maximum power dissipation generated with vendor-provided tools. IMB calculations require information about the number of onchip memory units such as block random-access-memory (BRAM) units and the associated operating frequencies, number of ports, bus widths, and data rates. As shown in Eq. (26), the IMB is calculated for the internal BRAM units on the Virtex-5. EMB calculations require the operating frequency, logic and input/output resource usage, bus widths, and data rates for dedicated controllers for offchip memories. As shown in Eq. (27), the EMB is calculated for dedicated controllers for external memories, where the maximum number of controllers is limited by the number of input/output ports available. IOB calculations, shown in Eqs. (28-30), require the same type of information that is required for fixed-logic processors.
To calculate device metrics for a hybrid processor that contains some combination of CPU, DSP, GPU, and FPGA architectures, the calculations must first be completed for each constituent architecture. CD values are then combined to obtain the hybrid CD, which is then divided by the combined maximum power dissipation to obtain the hybrid CD/W.
IMB, EMB, and IOB values are also combined to obtain the hybrid IMB, EMB, and IOB, but they must account for any overlap of memory and input/output resources that are shared between the constituent architectures.
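For the reconfigurable-logic case described earlier in this section, the idea of packing computational units onto a fabric under resource limits can be illustrated with a small search. The sketch below is not the linear-programming algorithm of [9, 10]; it substitutes an exhaustive search over unit counts, which is only practical for toy sizes. All resource totals, per-unit resource costs, and frequencies are hypothetical, and the assumptions that every unit shares one clock (set by the slowest unit in use) and that a half-additions and half-multiplications mix needs equal numbers of adders and multipliers are simplifications made for this example.

```python
from itertools import product

# Hypothetical fabric totals (not Virtex-5 values).
RESOURCES = {"lut": 1000, "ff": 1000, "dsp": 10}

# Hypothetical unit variants: resources consumed and achievable frequency (GHz).
ADDER = {"lut": 10, "ff": 10, "dsp": 0, "freq_ghz": 0.4}
MULTIPLIERS = [
    {"name": "mul_lut", "lut": 100, "ff": 50, "dsp": 0, "freq_ghz": 0.25},
    {"name": "mul_dsp", "lut": 0, "ff": 10, "dsp": 1, "freq_ghz": 0.3},
]


def best_half_add_half_mul(resources, adder, multipliers):
    """Exhaustively search multiplier counts, pairing each multiplier with one adder."""
    # Upper bound on each multiplier variant from its own resource usage.
    limits = [min(resources[r] // m[r] for r in resources if m[r] > 0)
              for m in multipliers]
    best = None
    for counts in product(*(range(n + 1) for n in limits)):
        n_mul = sum(counts)
        if n_mul == 0:
            continue
        n_add = n_mul  # equal adders and multipliers for a 50/50 operations mix
        used = {r: n_add * adder[r] + sum(c * m[r] for c, m in zip(counts, multipliers))
                for r in resources}
        if any(used[r] > resources[r] for r in resources):
            continue
        # Assumed single clock domain: the slowest unit in use sets the frequency.
        freq = min([adder["freq_ghz"]] +
                   [m["freq_ghz"] for c, m in zip(counts, multipliers) if c > 0])
        cd = (n_add + n_mul) * freq  # one operation per unit per cycle, in GOPS
        if best is None or cd > best["cd_gops"]:
            best = {"cd_gops": cd, "adders": n_add,
                    "multipliers": dict(zip((m["name"] for m in multipliers), counts))}
    return best


best = best_half_add_half_mul(RESOURCES, ADDER, MULTIPLIERS)
print(best)
```

In this toy configuration, the search uses both multiplier variants even though the lookup-table variant lowers the shared clock, because the additional units outweigh the frequency loss. A linear-programming formulation would reach such tradeoffs without enumerating every combination.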
IV. Experiments, Results, and Analysis
To enable quantitative analysis and objective comparisons of space-grade processors, device metrics are calculated for many diverse space-grade and low-power COTS processors. First, space-grade processors are compared to one another. Next, top-performing space-grade processors are further analyzed to determine how performance varies between applications and kernels based on operations mix. Then, space-grade processors are compared to the closest COTS counterparts upon which they were based to determine the overheads incurred from hardening. Finally, top-performing space-grade and low-power COTS processors are compared to determine the potential for future space-grade processors.
A. Space-Grade Processors
Using the methods described in Sec. III, Fig. 1 provides the CD, CD/W, IMB, EMB, and IOB for various existing and emerging space-grade processors in logarithmic scale, including the Honeywell HXRHPPC™ [28] and BAE Systems RAD750® [29], which are single-core CPUs; the Cobham GR712RC™ [30, 31], Cobham GR740™ [32] [33] [34], and BAE Systems RAD5545™ [35], which are multicore CPUs; the Boeing Maestro™ [36] [37] [38], which is a many-core CPU; the Ramon Chips RC64™ [39] [40] [41] [42] [43] and BAE Systems RADSPEED™ [44, 45], which are multicore DSPs; and the Xilinx Virtex-5QV FX130 [25-27, 46] and Microsemi RTG4™ [47] [48] [49] [50], which are FPGAs. Data from Fig. 1 is provided within Table A1.
The HXRHPPC, RAD750, and GR712RC achieve lower CD and CD/W due to slower operating frequencies and older single-core or dual-core CPU architectures with limited computational units. Additionally, they achieve a low IMB due to limited internal caches, a low EMB due to limited or no dedicated external-memory controllers, and a low IOB due to limited and slow input/output resources. CPUs such as the GR740, RAD5545, and Maestro achieve a much higher CD than older CPUs due to their higher operating frequencies, newer multicore and many-core architectures, and (in the case of both the RAD5545 and Maestro) multiple integer computational units within each processor core. Of all the CPUs compared, the Maestro achieved the highest CD and IMB due to its large number of processor cores and caches, whereas the GR740 achieved the highest CD/W due to its low power dissipation.
Although the capabilities of space-grade processors are greatly increasing due to newer CPUs, even further gains are made with DSPs and FPGAs. The RC64 achieves a high integer CD, and the RADSPEED achieves a high floating-point CD due to large levels of parallelism for these types of computational units; and both achieve a high IMB due to large numbers of internal caches and register files. The Virtex-5QV achieves high CD and CD/W, and the RTG4 achieves high integer CD and CD/W because they support large numbers of computational units at a relatively low power dissipation; and both achieve a high IMB due to large numbers of internal BRAM units, a high EMB because they support multiple dedicated controllers for external memories, and a high IOB due to the large number of general-purpose input/output ports available. By comparing space-grade processors using device metrics, the changes in capabilities of space-grade processors can be analyzed. The performance achieved by space-grade processors has increased by several orders of magnitude due to newer processors with more advanced architectures that support higher levels of parallelism in terms of computational units, internal memories, and input/output resources.
B. Performance Variations in Space-Grade Processors
The CD calculations for each processor are based upon an operations mix of half additions and half multiplications by default because this is a common and critical operations mix for many compute-intensive kernels that are used in space applications. However, a further analysis can be conducted for other important operations mixes. Figure 2 displays several examples of kernels used in space applications and their corresponding operations mixes of additions and multiplications [51] [52] [53] [54] [55] [56], where subtractions are considered logically equivalent to additions. Although overheads are required during implementation, these operations mixes characterize the work operations involved, and thus provide a foundation for the performance of each kernel and the applications in which they are used. Figure 3 provides the CD for each top-performing space-grade processor using all possible operations mixes consisting of additions and multiplications in order to demonstrate how the performance varies between different kernels. Data from Fig. 3 is provided within Table A2. Further experimentation could be conducted for additional operations mixes that relate to other kernels consisting of operations such as divisions, shifts, square roots, and trigonometric functions; however, this is not currently possible because information about the performance of these operations is often not included in vendor-provided documentation, or the operations are accomplished using software emulation.
The GR740 contains an integer computational unit for each processor core that can compute one Int8, Int16, or Int32 addition or multiplication per cycle. The GR740 also contains a floating-point computational unit for each processor core that can compute one SPFP or DPFP addition or multiplication per cycle. Therefore, both integer and floating-point CDs remain constant for all operations mixes because additions and multiplications are computed in the same number of cycles. The
these units in the same cycle, resulting in the ability to compute both an addition and a multiplication per cycle, two additions per cycle, or one multiplication per cycle. Therefore, the integer CD remains constant for operations mixes with a majority of additions, but it decreases up to 50% as the percentage of multiplications surpasses the percentage of additions due to more multiplications that cannot be computed simultaneously with additions. The RAD5545 also contains a floating-point computational unit for each processor that can compute one SPFP or DPFP addition or multiplication per cycle. Therefore, the floating-point CD remains constant for all operations mixes because additions and multiplications are computed in the same number of cycles.
The Maestro contains several integer computational units for each processor core, including two units that can each compute four Int8 additions, two Int16 additions, or one Int32 addition per cycle, and one unit that can compute one Int8, Int16, or Int32 multiplication in two cycles. Therefore, the integer CD decreases up to 94% as the percentage of multiplications increases because multiplications take more cycles to compute than additions and have fewer computational units for each processor core. The Maestro also contains a floating-point computational unit for each processor core that can compute one SPFP or DPFP addition per cycle and one SPFP or DPFP multiplication in two cycles, with the ability to interleave additions with multiplications. Therefore, the floating-point CD remains constant for operations mixes with a majority of additions but decreases up to 50% as the percentage of multiplications surpasses the percentage of additions because multiplications take more cycles to compute and this results in more multiplications that cannot be interleaved with additions.
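As a rough check on the 94% figure, the following sketch models per-core Int8 throughput on the Maestro as a function of the fraction of multiplications in the operations mix. It assumes, which the text does not state, that the addition units and the multiplication unit can work concurrently, so that the slower of the two streams bounds the rate. The function name and parameters are illustrative only; the unit counts and cycle counts come from the paragraph above.

```python
def ops_per_cycle(add_rate, mul_rate, mul_fraction):
    """Steady-state operations per cycle for a mix of additions and multiplications.

    add_rate and mul_rate are peak operations per cycle for each operation type.
    Assumes additions and multiplications proceed concurrently on separate units,
    so the time per operation is set by whichever stream takes longer.
    """
    add_time = (1.0 - mul_fraction) / add_rate
    mul_time = mul_fraction / mul_rate
    return 1.0 / max(add_time, mul_time)


# Maestro Int8, per core: two units x four additions per cycle,
# and one unit that completes one multiplication every two cycles.
MAESTRO_INT8_ADD_RATE = 2 * 4
MAESTRO_INT8_MUL_RATE = 1 / 2

all_adds = ops_per_cycle(MAESTRO_INT8_ADD_RATE, MAESTRO_INT8_MUL_RATE, 0.0)
all_muls = ops_per_cycle(MAESTRO_INT8_ADD_RATE, MAESTRO_INT8_MUL_RATE, 1.0)
max_decrease = 1.0 - all_muls / all_adds  # 0.9375, i.e. about 94%

print(f"All additions: {all_adds} ops/cycle")
print(f"All multiplications: {all_muls} ops/cycle")
print(f"Maximum decrease: {max_decrease:.4f}")
```

The endpoints (all additions versus all multiplications) do not depend on the concurrency assumption; only the shape of the curve between them does.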
The RC64 contains several computational units for each processor core that can compute eight Int8 or Int16 additions per cycle, four Int32 additions per cycle, four Int8 or Int16 multiplications per cycle, one Int32 multiplication per cycle, or one SPFP addition or multiplication per cycle. DPFP operations are not supported. Therefore, the integer CD decreases up to 75% as the percentage of multiplications increases because multiplications take more cycles to compute than additions. The floating-point CD remains constant for all operations mixes because additions and multiplications are computed in the same number of cycles.
The RADSPEED contains an integer computational unit for each processor core that can compute one Int8 addition per cycle, one Int16 addition in two cycles, one Int32 addition in four cycles, one Int8 or Int16 multiplication in four cycles, or one Int32 multiplication in seven cycles. Therefore, the integer CD decreases up to 75% as the percentage of multiplications increases because multiplications take more cycles to compute than additions. The RADSPEED also contains several floating-point computational units for each processor core, including one unit that can compute one SPFP or DPFP addition per cycle and one unit that can compute one SPFP or DPFP multiplication per cycle. Operations can be issued to both of these units in the same cycle, resulting in the ability to compute both an addition and a multiplication per cycle but not two additions or two multiplications per cycle. However, the ability to compute two operations per cycle only applies to SPFP operations because DPFP operations are limited by bus widths. Therefore, the single-precision floating-point CD peaks when the percentages of additions and multiplications are equal and decreases up to 50% as the percentages of additions and multiplications become more unbalanced. The double-precision floating-point CD remains constant for all operations mixes because additions and multiplications are computed in the same number of cycles.
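The two RADSPEED behaviors described above can be contrasted with a pair of small throughput models. This is a simplified sketch under stated assumptions: the shared integer unit is treated as fully serialized (one operation in flight at a time), and the floating-point units are treated as ideal dual issue with one adder and one multiplier. The function names are hypothetical; the cycle counts come from the paragraph above.

```python
def shared_unit_ops_per_cycle(add_cycles, mul_cycles, mul_fraction):
    """One unit handles both operation types, one at a time (assumed serialized)."""
    average_cycles = (1.0 - mul_fraction) * add_cycles + mul_fraction * mul_cycles
    return 1.0 / average_cycles


def dual_issue_ops_per_cycle(mul_fraction):
    """One adder and one multiplier, each one operation per cycle, issued together.

    The more heavily used unit is the bottleneck, so throughput peaks at a
    50/50 mix and falls toward one operation per cycle at either extreme.
    """
    return 1.0 / max(mul_fraction, 1.0 - mul_fraction)


# Int8 on the shared integer unit: additions take 1 cycle, multiplications 4.
int8_all_adds = shared_unit_ops_per_cycle(1, 4, 0.0)
int8_all_muls = shared_unit_ops_per_cycle(1, 4, 1.0)
int8_decrease = 1.0 - int8_all_muls / int8_all_adds  # 0.75

# SPFP on the dual-issue floating-point units.
spfp_balanced = dual_issue_ops_per_cycle(0.5)
spfp_all_adds = dual_issue_ops_per_cycle(0.0)
spfp_decrease = 1.0 - spfp_all_adds / spfp_balanced  # 0.5

print(f"Int8 decrease from all additions to all multiplications: {int8_decrease}")
print(f"SPFP decrease from balanced to all additions: {spfp_decrease}")
```

Under these assumptions, the 75% integer figure corresponds to the Int8 case; wider integer types would give smaller decreases because their additions also take multiple cycles.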
The Virtex-5QV and RTG4 contain reconfigurable fabrics that support computational units that compute one Int8, Int16, Int32, SPFP, or DPFP addition or multiplication per cycle. As data types and precisions increase, only slower operating frequencies can typically be achieved and more logic resources are required. For Int8, Int16, and Int32 operations, multiplications typically achieve slower operating frequencies than additions and require more logic resources. Therefore, the integer CD decreases up to ∼92% for the Virtex-5QV and up to ∼99% for the RTG4 as the percentage of multiplications increases. For SPFP and DPFP operations, multiplications typically achieve slower operating frequencies than additions and require fewer logic resources when digital-signal-processing units are used, but they require more logic resources when these units are not used. Therefore, the floating-point CD either increases or decreases as the percentage of multiplications increases, depending on the use of digital-signal-processing units. However, the floating-point CD does not vary as much as the integer CD because the differences between logic resources used for additions and multiplications are not as significant.
By matching the operations mixes from Fig. 2 with the results from Fig. 3, the variations in performance between different kernels can be analyzed for each top-performing space-grade processor. For all operations on the GR740, the floating-point operations on the RAD5545 and RC64, and the double-precision floating-point operations on the RADSPEED, the CD does not vary between kernels. For integer operations on the RAD5545 and floating-point operations on the Maestro, the CD is highest for kernels that use at least half additions (such as matrix addition, fast Fourier transform, matrix multiplication, and matrix convolution), becomes worse for kernels that use more than half multiplications (such as Jacobi transformation), and is lowest for kernels that use all multiplications (such as the Kronecker product). For integer operations on the Maestro, RC64, RADSPEED, Virtex-5QV, and RTG4, the CD is highest for kernels that use all additions such as matrix addition and becomes worse for all other kernels where more multiplications are used. For single-precision floating-point operations on the RADSPEED, the CD is highest for kernels that use half additions and half multiplications (such as matrix multiplication and matrix convolution), becomes worse for all other kernels as either more additions or more multiplications are used, and is lowest for kernels that use either all additions or all multiplications (such as matrix addition or the Kronecker product). For floating-point operations on the Virtex-5QV and RTG4, the CD varies moderately between kernels. Variations in the CD demonstrate how the performance of space-grade processors is affected by the operations mixes of compute-intensive kernels used in space applications.
C. Overheads Incurred from Radiation Hardening
Figure 4 provides the CD, CD/W, IMB, EMB, and IOB for the closest COTS counterparts to the space-grade processors from Fig. 1 in logarithmic scale, where the HXRHPPC was based upon the Freescale PowerPC603e™ [57], the RAD750 was based upon the IBM PowerPC750™ [58] [59] [60], the RAD5545 was based upon the QorIQ P5040 [21-23], the Maestro was based upon the Tilera TILE64™ [61, 62], the RADSPEED was based upon the ClearSpeed CSX700™ [63, 64], and the Virtex-5QV FX130 was based upon the Virtex-5 FX130T [24] [25] [26] [27]. The GR712RC, GR740, RC64, and RTG4 are not included because they were not based upon any specific COTS devices. Data from Fig. 4 is provided within Table A3.
By comparing the results from Figs. 1 and 4, the overheads incurred from hardening of the COTS processors can be calculated. Figure 5 provides the percentages of operating frequencies, the number of computational cores, power dissipation, CD, CD/W, IMB, EMB, and IOB achieved by each space-grade processor as compared to its closest COTS counterpart. Data from Fig. 5 is provided within Tables A4 and A5.
The largest decreases in operating frequencies were for the multicore and many-core CPUs because their closest COTS counterparts benefited from high operating frequencies that were significantly decreased in order to be sustainable on space-grade processors, whereas the closest COTS counterparts to the RADSPEED and Virtex-5QV only required moderate operating frequencies to begin with, and therefore did not need to be decreased as significantly. The largest decreases in the number of computational cores were for the Maestro, RADSPEED, and Virtex-5QV because their closest COTS counterparts contained large levels of parallelism that could not be sustained after hardening, whereas the closest COTS counterparts of the multicore CPUs did not contain enough parallelism to require any decreases to the number of computational cores during hardening. The Maestro achieved a larger floating-point CD and CD/W than its closest COTS counterpart due to the addition of floating-point units to each processor core, resulting in the only occurrence of increases in device metrics after radiation hardening. Increases and decreases in power dissipation were more unpredictable because they were dependent on many factors, including decreases in operating frequencies and the number of computational cores and changes to input/output peripherals.
By comparing space-grade processors to their closest COTS counterparts using device metrics, the overheads incurred from hardening can be analyzed. The largest decreases in the CD and IMB occurred for the multicore and many-core CPUs rather than the DSP and FPGA, demonstrating that large decreases in operating frequencies had a more significant impact on the resulting CD and IMB than decreases in the number of computational cores. The smallest decreases in the CD/W occurred for the Virtex-5QV due to relatively small decreases in the CD and only minor variations in power dissipation. The largest decreases in the EMB and IOB occurred for the older single-core CPUs because their input/output resources were highly dependent on operating frequencies that were significantly decreased. These overheads can be considered when analyzing processors for potential hardening and use in space missions.
D. Projected Future Space-Grade Processors
Figure 6 provides the CD, CD/W, IMB, EMB, and IOB for a variety of low-power COTS processors in logarithmic scale, including the Intel Quark™ X1000 [65, 66], which is a single-core CPU; the Intel Atom™ Z3770 [67, 68], Intel Core™ i7-4610Y [69] [70] [71] [72], and Samsung Exynos™ 5433 [73] [74] [75], which are multicore CPUs; the Tilera TILE-Gx8036™ [76] [77] [78], which is a many-core CPU; the Freescale MSC8256™ [79] [80] [81], [89, 90], NVIDIA Tegra K1 [83, 84, 91], and Tegra X1 [74, 75, 92], which are GPUs paired with multicore CPUs. Several modern processors are considered from each architectural category with power dissipation no larger than 30 W. Data from Fig. 6 is provided within Table A6.
By comparing many low-power COTS processors, the top-performing architectures can be selected and considered for potential hardening and use in future space missions. Although the Core i7-4610Y is the top-performing CPU in most cases, the Exynos 5433 achieves the largest CD/W of the CPUs due to its small power dissipation. The top-performing DSP, FPGA, and GPU are the KeyStone-II, Kintex-7Q, and Tegra X1, respectively. However, if the architectures from these COTS processors were to be used in potential future space-grade processors, several overheads that must be considered would likely be incurred during the hardening process. Therefore, the results for top-performing COTS processors from Fig. 6 are decreased based on the worst-case and best-case hardening overheads from Fig. 5 in order to project device metrics for potential future space-grade processors. Figure 7 provides worst-case and best-case projections in logarithmic scale for potential future space-grade processors based on the Core i7-4610Y, Exynos 5433, KeyStone-II, Kintex-7Q, and Tegra X1 alongside the top-performing space-grade processors from Fig. 1 to determine how additional radiation hardening of top-performing COTS processors could impact the capabilities of space-grade processors. Data from Fig. 7 is provided within Tables A7 and A8.
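The projection step described above can be sketched as follows, under the assumption that each hardening overhead is represented as a retained fraction (the space-grade value divided by the value of the closest COTS counterpart). The names `Projection` and `project_metric` and all numeric values are hypothetical and are not taken from Figs. 5 through 7; the multiplicative form is an illustrative assumption rather than a statement of the exact procedure used to generate the projections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Projection:
    """Worst-case and best-case projected values of one device metric."""
    worst_case: float
    best_case: float


def project_metric(cots_value: float,
                   worst_retained: float,
                   best_retained: float) -> Projection:
    """Scale a COTS metric by worst-case and best-case retained fractions.

    A retained fraction is the assumed ratio of a space-grade metric to the
    same metric of its closest COTS counterpart. Fractions above 1.0 are
    allowed because a metric can increase after hardening.
    """
    if cots_value < 0:
        raise ValueError("metric value must be non-negative")
    if not 0 < worst_retained <= best_retained:
        raise ValueError("require 0 < worst_retained <= best_retained")
    return Projection(worst_case=cots_value * worst_retained,
                      best_case=cots_value * best_retained)


# Hypothetical placeholder inputs (not data from this study).
projected_cd = project_metric(cots_value=200.0,
                              worst_retained=0.25,
                              best_retained=0.5)
print(projected_cd)
```

Applying the same function to each metric (CD, CD/W, IMB, EMB, and IOB) of a COTS processor would give a pair of bounds per metric that can be plotted alongside existing space-grade results.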
By comparing top-performing and projected future space-grade processors using device metrics, the potential benefits of hardening additional COTS architectures can be analyzed. Although the results from Fig. 5 suggest that the hardening of CPUs typically results in large overheads, the Core i7-4610Y and Exynos 5433 achieve the largest CD and CD/W for each data type considered, as well as the largest IMB, out of all space-grade CPUs even when using worst-case projections. However, the results from Fig. 5 also suggest that the hardening of DSPs and FPGAs typically results in smaller overheads. When using best-case projections, the KeyStone-II and Kintex-7Q achieve the largest CD and CD/W for each data type considered, the largest EMB, and, in most cases, the largest IMB and IOB out of all space-grade processors. Finally, although there are no past results for the hardening of GPUs, the Tegra X1 achieves a large CD and CD/W and a moderate IMB, EMB, and IOB within the range of projections used. Based on the projections and comparisons from Fig. 7, COTS processors from each architectural category have a high potential to increase the capabilities of space-grade processors, even with the overheads incurred from hardening. Therefore, as expected, the hardening of modern COTS processors could benefit space computing in terms of performance, power efficiency, memory bandwidth, and input/output bandwidth; and these results help to quantify potential outcomes.
V. Conclusions
As the performance needs for onboard space computing are continually increasing, existing and emerging space-grade and low-power commercial-off-the-shelf (COTS) processors are analyzed for potential use in future space missions. A device metrics analysis is demonstrated as a methodology to quantitatively and objectively analyze a large and diverse set of processor architectures. The results are generated to enable comparisons of space-grade processors to one another, comparisons of space-grade processors to their closest COTS counterparts to determine overheads incurred from radiation hardening, and comparisons of top-performing space-grade and COTS processors to determine the potential for future space-grade processors.
The results demonstrate and quantify how emerging space-grade processors with multicore and many-core CPU, DSP, and FPGA architectures are continually increasing the capabilities of space missions by supporting high levels of parallelism in terms of computational units, internal memories, and input/output resources. In particular, the best results are provided by the RC64, Virtex-5QV, and RTG4 for the integer CD and CD/W; the RADSPEED and Virtex-5QV for the floating-point CD and CD/W; the RC64 and Virtex-5QV for the IMB; the RAD5545 and Virtex-5QV for the EMB; and the RAD5545, Virtex-5QV, and RTG4 for the IOB. Additionally, CD results for each top-performing space-grade processor are further analyzed to demonstrate and evaluate how the performance can vary significantly between applications, depending on the operations mixes used within compute-intensive kernels, with the largest variations occurring for integer operations on the Maestro, Virtex-5QV, and RTG4.
Furthermore, the overheads incurred from radiation hardening were quantified and analyzed; the overheads incurred by the space-grade CPUs were typically much larger than those incurred by the RADSPEED and Virtex-5QV because the CPUs required more significant decreases in operating frequencies. Finally, overheads from past cases of hardening were used to project device metrics for potential future space-grade processors, demonstrating and quantifying how the hardening of modern COTS processors from each architectural category could result in significant increases in the capabilities of future space missions. In particular, the Core i7-4610Y and Exynos 5433 could provide the largest CD, CD/W, and IMB out of all space-grade CPUs; the KeyStone-II 66AK2H12 and Kintex-7Q K410T could provide the largest CD, CD/W, and EMB out of all space-grade processors, as well as the largest IMB and IOB in most cases; and the Tegra X1 could provide a large CD and CD/W, as well as moderate IMB, EMB, and IOB.
By using device metrics to analyze and compare present and future space-grade processors, tradeoffs between architectures were determined and could be considered when comparing or designing processors for future space missions. Future research directions will involve optimized device benchmarking of top-performing space-grade processors to analyze and optimize their performance capabilities for key space applications and kernels. 
Appendix: Device Metrics Data

