JPL has captured its experience from over four decades of robotic space exploration into a set of design rules. These rules have gradually changed into explicit requirements and are now formally implemented and verified. Over an extended period of time, the initial understanding of intent and rationale for these rules has faded and rules are now frequently applied without further consideration. In the meantime, mission classes and their associated risk postures have evolved, coupled with resource constraints and growing design diversity, bringing into question the current "one size fits all" thermal margin approach. This paper offers a systematic review of the heat flow path from an electronic junction to the eventual heat rejection to space. This includes the identification of different regimes along this path and the associated requirements. The work resulted in a renewed understanding of the intent behind JPL requirements for hot thermal margins and a framework for relevant considerations, which in turn enables better decision making when a deviation to these requirements is considered.
I. Introduction
Thermal design at JPL has evolved over several decades. Many lessons were learned during this time and the practices that developed around these lessons were eventually codified and captured in institutional documents. Today, the authoritative sources for thermal design requirements are: 
Figure 2. Thermal Design and Margin Domains
Together, these documents cover the engineering domains along the path that heat will take, from its generation at the part junction, until it is rejected to space. These domains consist of electronic parts and their derating, packaging of these parts, reliability and qualification/protoflight margin, and the subsystem thermal design and heat rejection. Reliability and qualification/protoflight margins are governed by the requirements established in JPL's Design Principles and are typically depicted as indicated in Fig. 1 below.
When the thermal margin relationships are shown as a function of AFT, the previously mentioned domains become distinct, as shown in Fig. 2 on the next page.
The effect of the minimum hot qualification temperature of 70°C can be seen when the AFT drops below 50°C. Under this condition, it limits the temperature rise from baseplate to part junction to 40°C (110°C - 70°C). In this paper we use the term "reliability margin" to refer to this "additional margin" beyond the 20°C margin that applies when the AFT is at or above 50°C. Note that upper AFTs are typically between 40°C and 50°C for electronic assemblies. The exception to this practice occurs most often in science instrument electronics.
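The two regimes described above can be sketched numerically. The following is a minimal Python sketch, assuming the 20°C qualification margin, the 70°C minimum hot qualification temperature, and the 110°C derated junction temperature quoted in this paper. The function names are illustrative only and do not correspond to any JPL tool.

```python
# Illustrative sketch only. Constants are the values quoted in the text;
# the function names are hypothetical.
QUAL_MARGIN_C = 20.0    # margin added to the upper AFT
MIN_HOT_QUAL_C = 70.0   # minimum hot qualification/protoflight temperature
DERATED_TJ_C = 110.0    # typical derated junction temperature


def hot_qual_temp(aft_hot_c: float) -> float:
    """Hot qualification temperature: AFT + 20 C, but never below 70 C."""
    return max(aft_hot_c + QUAL_MARGIN_C, MIN_HOT_QUAL_C)


def reliability_margin(aft_hot_c: float) -> float:
    """Additional margin beyond the 20 C; nonzero only for an AFT below 50 C."""
    return hot_qual_temp(aft_hot_c) - (aft_hot_c + QUAL_MARGIN_C)


def allowable_junction_rise(aft_hot_c: float) -> float:
    """Allowable baseplate-to-junction rise at the hot qualification temperature."""
    return DERATED_TJ_C - hot_qual_temp(aft_hot_c)


for aft in (40.0, 50.0, 60.0):
    print(aft, hot_qual_temp(aft), reliability_margin(aft), allowable_junction_rise(aft))
```

Under these assumptions, an upper AFT of 40°C carries 10°C of reliability margin and a 40°C allowable rise, while an upper AFT of 60°C carries no reliability margin and leaves only 30°C of allowable rise.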
Thermal Margins
by temperature. The lower the Ea, the easier it is to initiate the failure mechanism by temperature. Typical activation energies range from -1 eV for hot carrier injection, to 1.3 eV for slow charge trapping. Manufacturers continue to rely on the Arrhenius methodology to determine acceleration factors for failure rate calculations and equivalent stress testing protocols. Through accelerated testing, the user is able to reduce the time to failure and obtain data in a shorter time than would otherwise be required. This technique remains widely used throughout the semiconductor industry.
Many suppliers use product life testing at, or near, maximum junction temperature of the device to validate product lifetime; this is typically performed at 125°C to 150°C. Targeted product lifetimes for Mil-product are generally 10 years at maximum rated junction temperature; however, some designs are customer driven and reflect a 15-, 20-, or even 25-year targeted product lifetime. Product lifetime definitions vary among suppliers; therefore, the user should request the specific test conditions and confidence level associated with a given FIT (Failure in Time) rate. Supplier reported Mil-product target FIT rates for >0.18μm technologies have ranged from 50 FITs (0.5% cumulative failure rate) at 10 years to as low as 0.76 FIT (0.01% cumulative failure rate) at 15 years, 60% confidence level. One (1) FIT for intrinsic failure mechanisms over 10 years (0.01% cumulative failure rate or 1 failure/billion device hours), 60% confidence level, is the historical benchmark. Therefore, we consider typical microelectronic lifetime for Mil-products to be 10 years at maximum rated junction temperature unless otherwise specified. Advertised FIT rates, calculation assumptions, and targeted product lifetimes should be considered with new technologies.
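The relationship between a FIT rate and the cumulative failure percentages quoted above can be checked with a short calculation. The sketch below assumes a constant failure rate (exponential model) and continuous operation at 8760 hours per year. Both are simplifying assumptions made here for illustration, not conditions stated by any supplier, and the function name is hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumption: continuous operation, 365-day years


def cumulative_failure_fraction(fit: float, years: float) -> float:
    """Fraction of devices expected to fail over the given period.

    fit is failures per 1e9 device-hours. A constant failure rate
    (exponential model) is assumed for illustration.
    """
    rate_per_hour = fit * 1e-9
    return 1.0 - math.exp(-rate_per_hour * years * HOURS_PER_YEAR)


for fit, years in ((50.0, 10.0), (0.76, 15.0), (1.0, 10.0)):
    pct = 100.0 * cumulative_failure_fraction(fit, years)
    print(f"{fit} FIT over {years} years -> {pct:.3f}% cumulative failures")
```

Under these assumptions the three cases come out near 0.44%, 0.010%, and 0.009%, which agrees with the rounded 0.5% and 0.01% figures in the text.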
Historically, junction temperature (Tj) derating for silicon microcircuits in ceramic hermetic packages has been limited to between 110°C and 115°C. The basis of this calculation can be described as follows:
Adding a safety margin of two to the typical 10-year Mil-product design lifetime has been standard practice in industry for many years. In order to achieve twice the lifetime, the junction temperature must be lowered such that Mean-Time-To-Failure (MTTF) is twice the nominal value. For a 125°C max rated Tj device, assuming an Ea = 0.6 eV, the typical 10-year MTTF can be extended by a safety margin of two by lowering the junction temperature by 15°C, to 110°C (Ref. 7). The current D-8545 typical derated Tj for silicon microcircuits remains 110°C or 40°C below the manufacturer's absolute maximum Tj rating, whichever is lower. Absolute max rated junction temperatures for many parts are typically between 150°C and 175°C, but can be higher or lower.
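The factor-of-two claim can be reproduced with the standard Arrhenius acceleration factor, AF = exp[(Ea/k)(1/T_low - 1/T_high)], with temperatures in kelvin. The sketch below is an illustrative check using the values in the text (Ea = 0.6 eV, 125°C and 110°C); the function name is an assumption made for this example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_life_multiplier(ea_ev: float, t_low_c: float, t_high_c: float) -> float:
    """Ratio of MTTF at t_low_c to MTTF at t_high_c under the Arrhenius model."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k)
    return math.exp(exponent)


life_multiplier = arrhenius_life_multiplier(0.6, 110.0, 125.0)
print(f"MTTF multiplier when derating from 125 C to 110 C: {life_multiplier:.2f}")
```

The result is very close to 2, consistent with the 15°C derating described above. The same function shows how sensitive the result is to the assumed activation energy: a lower Ea yields a smaller lifetime gain for the same temperature reduction.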
Effective thermal management from the system level to the component level is a key element in the overall design of reliable systems. Thermal margin stack-up plays a large role in robust, reliable system design. Thermal management in space systems must consider a wide range of issues, including thermal loading of many different components on conduction-cooled boards, radiation degradation of components, which may cause standby currents to increase, and the frequent temperature cycling of some systems, such as MSL. Conservative design practices are helpful, but they should be supplemented by radiation and reliability performance data over temperature for the wide range of microelectronic devices that are used on modern spacecraft.
B. Electronic Part Packaging
The main challenge to overcome in packaging is the limited allowable temperature rise between the Protoflight temperature and the ceiling established by the part derating limits. Thermal performance is dominated by heat conduction, with no convection and usually minor radiation transfer. Key interfaces are bolted joints and bonded joints. Component heat sinking usually takes the form of a bond to the circuit board to supplement lead thermal conduction. Some components can be directly mounted to the chassis for greater heat rejection within acceptable temperature rises.
Temperature margins are not intentionally added in the thermal analysis process. Thermal models are meant to be accurate, with some conservatism in material thermal properties and in other aspects, such as the assumption that heat flows through bolted joints only in areas very near a fastener. There is likely to be some margin in the power dissipations used for analysis compared to the actual component dissipations during normal operation, but this depends on the source of the power dissipations. Typically, power dissipation is provided from an Electronic Parts Stress Analysis (EPSA) and/or by the Cognizant Engineer. The primary margin is in the Protoflight Temperature used for analysis compared to the Allowable Flight Temperature (AFT).
The requirements governing electronic parts packaging are documented in JPL's Spacecraft Electronic Packaging/Cabling Design and Fabrication Standard. Besides electrical performance, components must meet power/temperature derating requirements in Protoflight conditions, and adequate electrical interconnect fatigue life must be achieved for the vibration environments and for the thermal cycling environments. "Card cage" chassis designs, some with backplanes, make use of standardized wedgelock interfaces for multiple cards. The result can be a single chassis with relatively high power dissipation due to the number of cards operating, with significant temperature rise from the chassis mounting surface to the individual card interfaces. There are also unique chassis designs with various methods of circuit board mounting. Dry mounting with screws may suffice only for moderate power dissipation, but as power increases, the ability to supplement the mounting fasteners with a thermal bonding material such as RTV (Room Temperature Vulcanization) adhesives becomes more desirable. Circuit board bonding to chassis may be needed for higher power dissipation, in addition to more internal PWB copper and/or thermal vias (plated holes).
Circuit boards with parts on both sides tend to have less board area available for making a good thermal attachment to chassis. Some circuit board bonding to the chassis may be needed in addition to the mounting fasteners to limit circuit board temperature rise. If the board is to be mounted on raised bosses or standoffs, no bonding area may be available and then the clearance of internal copper planes to the mounting holes will have to be minimized to avoid a significant "thermal choke" at the mounting fastener. A significant improvement can be made by attaching an internal plane to a plated mounting hole in combination with some surface copper that will be pressed against the chassis by the mounting screw. This plane would then be at chassis potential which may not be favorable in some designs in terms of electrical noise or other function. Copper planes are key thermal elements of circuit boards to provide lateral heat spreading and minimize hot spots; performance is degraded when the planes are segmented or split. Increased power density due to higher dissipation parts and/or decrease in packaging volume will increase the importance of having some unbroken copper planes in the area, potentially one or more "chassis ground" planes if other planes must be segmented. Managing the board-to-case temperature rise may require heatsink methods in addition to mounting the component on its leads only. Some of these methods need to be applied with caution since they may compromise solder joint thermal cycle durability.
C. Reliability
The requirements for reliability design analysis at JPL are invoked in JPL's institutional reliability requirements document for flight projects (Ref. 8) and JPL's Reliability Assurance Handbook (Ref. 9). The requirements document establishes which reliability design and development activities a flight project must undertake, such as reliability design analyses. The handbook describes details about how to perform the reliability analyses, covering the processes and methodologies to satisfy the requirements.
Commonly performed reliability analyses include Thermal Analysis (TA), Electronic Parts Stress Analysis (EPSA), Worst-Case Analysis (WCA), Board level Structural Stress Analysis, and Reliability Design Estimates. This discussion focuses on EPSA & WCA and their relation to the Thermal Stress Analysis. Figure 3 depicts the information flow among the primary design analyses.
Figure 3. Design Analyses Information Flow
American Institute of Aeronautics and Astronautics
Electronic Parts Stress Analysis (EPSA) is a detailed evaluation of the electrical and thermal stresses for electronic and electromechanical parts. The stress capabilities for some parts and some part parameters are temperature dependent. A thermal analysis (TA) provides thermal rises from the assembly thermal control surface (baseplates) to the localized heat sources (electronics, etc.).
The boundary temperatures for these analyses are based upon qualification/protoflight temperatures at the thermal control surfaces (TCS). Figure 4 (page 9) illustrates these relationships. Performing the EPSA at protoflight test temperatures ensures that the hardware is not overstressed during the testing, even if for a short time. Thus, unlike the mission operating environment, there is no margin in the test environment. The technical rationale for using qualification/protoflight temperatures as a basis for the analysis is to ensure proper performance at the marginal conditions, with the goal to demonstrate that there is graceful degradation and there is not a catastrophic drop-off of performance. Commonly, the EPSA is completed first using an assumed thermal rise (JPL assumes a 20°C rise) from the TCS to the part case, and the TA is completed with estimated power levels. The EPSA and TA must be checked afterward for any discrepancies.
The WCA is a detailed analysis to verify performance requirements in the presence of performance limiting factors. The purpose of the circuit WCA is to demonstrate that the electronic design will perform within functional specification under the extreme life, environment, and circuit conditions such as aging, radiation, temperature, initial part tolerance, and supply variations. The temperatures are based upon the hot and cold qualification/protoflight temperatures and an assumed thermal rise from the thermal control surfaces to the part case (JPL assumes a 10°C rise for the hot condition and a 0°C rise for the cold condition). Like the EPSA, the temperature assumptions used in the WCA must be verified and reconciled once the TA results are available.
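The reconciliation step can be illustrated with a small bookkeeping sketch. The assumed rises (20°C for the EPSA, 10°C for the hot-case WCA) are taken from the text. The part reference designators and thermal analysis results below are invented purely for illustration, and the function and class names are hypothetical.

```python
from dataclasses import dataclass

# Assumed TCS-to-part-case rises quoted in the text (hot condition).
ASSUMED_RISE_C = {"EPSA": 20.0, "WCA": 10.0}


@dataclass
class PartRise:
    ref_des: str       # hypothetical reference designator
    ta_rise_c: float   # rise predicted by the thermal analysis (made-up values)


def reconcile(parts, analysis):
    """Return (ref_des, exceedance) for parts whose TA rise exceeds the assumed rise."""
    assumed = ASSUMED_RISE_C[analysis]
    return [(p.ref_des, p.ta_rise_c - assumed) for p in parts if p.ta_rise_c > assumed]


parts = [PartRise("U1", 8.0), PartRise("U7", 14.5), PartRise("Q3", 23.0)]
print("WCA exceedances:", reconcile(parts, "WCA"))
print("EPSA exceedances:", reconcile(parts, "EPSA"))
```

In this made-up data set, two parts exceed the WCA assumption and one exceeds the EPSA assumption, so both analyses would need to be revisited for those parts. A fuller version could also report parts whose predicted rise is well below the assumption, where the original analysis may have been pessimistic.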
A reliability estimate is the practice of estimating the reliability of a design based upon failure rates. When evaluating electronics, failure rates are related to the temperature assumed for the equipment and are also completely dependent upon the data available. Because of the uniqueness of JPL space missions, there is no database that is widely accepted as applicable across the institution. However, JPL uses reliability estimates in Reliability Trade Studies to evaluate different design architectures. For reliability estimates, various temperatures are used for failure rate estimates. However, it is generally accepted that qualification/protoflight temperatures are too extreme for reliability estimates because of the logarithmic relationship between temperature and aging.
When planning for and executing analyses, the amount of work necessary to prove the design will work needs to be traded against the time needed to develop and analyze the model. Analysis assumptions are often used to bound the possible outcomes and save time in developing the model. Design analyses need to be good enough to bound the performance expectations, but they do not usually have to provide a high degree of fidelity or accuracy. Without the cushion of design margin, much more time would be needed to develop higher fidelity models. However, prudence is needed in choosing margins, since unrealistic margins can over-constrain the design effort. Hence design margins, when chosen wisely, provide a practical approach and a net savings in development resources.
Complications arise because of the interdependency of the thermal analysis, EPSA, and WCA, and the need to iterate the results across these analyses. Many analysts and detailed designers are not aware of the need to reconcile these analysis assumptions. This has been observed both at JPL and at contracting organizations. As a result, the task of reconciling the assumptions is sometimes not done, which has several consequences. First, design issues fall through the cracks. A mistake or oversight in one analysis can ripple through other analyses. This may be the case when an analysis yields either overly optimistic or overly pessimistic results. In the former case, design problems that do not manifest in the primary analysis may surface in one of the dependent analyses. For example, underestimating thermal rise could mask a design issue and result in problems meeting actual worst-case circuit performance in thermal testing or late in the mission life. Alternatively, overestimating thermal rises may give the false appearance that electrical stress is too high. Waivers that are not actually needed are sometimes sought in these cases because the design analysts have not resolved the discrepancies between these analyses.
Projects with short development time often don't get the work done in time. If development time is too rushed for designers, they sometimes don't complete all this work in a schedule appropriate for the development activities. This drives designers into excessive workloads and schedule problems later in the development cycle, often resulting in design changes after CDR (Critical Design Review) and possibly even into the System Integration, Test, & Launch Preparations Phase.
There has been discussion in recent years that temperature rises for many electronic parts exceed the current 10°C rise assumption, which means that updating the WCAs with the real thermal rises may now cost more than it did when the 20°C thermal rise assumption was used. This problem could therefore be getting worse for WCA.
Questions about thermal design margins arise regularly. In some cases, this happens because of misunderstandings about where the margins actually lie. It is not uncommon to hear complaints about a WCA or EPSA temperature being unrealistically high in comparison to a nominal operating temperature or room temperature. This is largely a training issue arising from the lack of understanding that the basis for the design analysis temperatures is the test temperatures and includes thermal rises. In some cases, questions arise because of the lack of data to justify the protoflight test temperatures coupled with the fact that some missions have less stringent mission requirements. The only real data available is that missions have historically been successful with the margins that have been used. But there is not adequate data to know whether refinements to the margins can be made without adversely impacting reliability.
D. Qualification
Electronics assemblies are designed and subjected to qualification (Qual) or protoflight (PF) thermal vacuum testing at the upper allowable flight temperature with a 20°C margin applied or an upper limit of +70°C, whichever yields a hotter temperature limit. Figure 1 (page 2) illustrates these relationships. On the cold side, the cold design temperature and Qual/PF limit is obtained by applying a margin of 15°C to the lower allowable flight temperature, or using the standard limit of -35°C, whichever is colder. Thus the standard qualification/protoflight temperature range for electronics becomes -35°C to +70°C for any Allowable Flight Temperature range between -20°C and +50°C.
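The rule for deriving the qualification/protoflight range from an AFT range can be written compactly. The sketch below encodes only the margins and limits stated in the preceding paragraph (20°C hot margin with a 70°C floor, 15°C cold margin with a -35°C ceiling); the function name is an illustrative assumption.

```python
def qual_pf_range(aft_min_c: float, aft_max_c: float) -> tuple:
    """Return (cold, hot) qualification/protoflight limits for an AFT range.

    Hot limit: AFT max + 20 C or +70 C, whichever is hotter.
    Cold limit: AFT min - 15 C or -35 C, whichever is colder.
    """
    cold = min(aft_min_c - 15.0, -35.0)
    hot = max(aft_max_c + 20.0, 70.0)
    return cold, hot


print(qual_pf_range(-20.0, 50.0))   # AFT range at the standard bounds
print(qual_pf_range(0.0, 40.0))     # narrower AFT range, same Qual/PF range
print(qual_pf_range(-30.0, 60.0))   # wider AFT range pushes both limits out
```

Any AFT range that falls within -20°C to +50°C returns the standard -35°C to +70°C range, while an AFT range extending beyond those bounds widens the qualification range accordingly.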
The minimum hot electronics qualification/protoflight temperature limit of 70°C promotes the design and construction of robust and reliable hardware that will lead to successful missions. The rationale and benefits of the practice are as follows:
1. Testing at 70°C provides a robust electronics box/assembly-level stress screen, especially when coupled with the dwell requirement of 72 hours. This test provides high confidence that residual workmanship and "infant mortality" type failures have not escaped parts screening and/or have not been subsequently introduced during board fabrication or re-work. Arrhenius theory is commonly used to describe the acceleration of the electronic parts failure mechanisms as a function of temperature. Therefore, time spent at elevated temperature is equivalent to an even greater amount of time at the expected mission temperature.
2. Qual/PF testing all electronics at the assembly level to 70°C increases the likelihood of uncovering design and workmanship problems early in the program, when it is least expensive to fix them. Electronics qualified to 70°C are less likely to experience hardware failures during system integration and test; correcting these late failures not only costs much more, but can also lead to project schedule slips and reduced system reliability.
3. Designing for a 70°C qualification limit yields lower in-flight part junction temperatures, which may increase the useful life of the assembly. When high test temperatures are combined with piece part junction temperature restrictions (i.e., derating limits), the temperature rise between the baseplate and the part junction is typically limited to 40°C or less (based on JPL's typical junction temperature derating of 110°C). Voyager, Cassini, and other missions that followed this practice have part junctions that typically run less than 60°C during the mission, resulting in higher reliability over the life of the electronics.
4. The limited thermal rise to the electronic parts, promoted by the 70°C PF/Qual limit, also reduces the delta-T effect that will occur during any equipment power cycling during the mission, thus preserving thermal fatigue life, which is a consumable.
5. Having a standard electronics Qual/PF hot temperature limit of 70°C allows for a standard maximum AFT limit of 50°C, which has the following advantages:
   a. It decouples the electronic assembly thermal design from flight system thermal design, allowing both disciplines to proceed with their designs in parallel with little chance for margin deterioration.
   b. An AFT limit of 50°C provides more flexibility and reliability in the system thermal design than lower AFT limits would. For example, decreasing the AFT from 50°C to 40°C may require going from a passive to an active thermal design, which may decrease reliability and increase mass.
   c. A standard electronics Qual/PF hot temperature limit of 70°C allows inherited electronics designs to be used in multiple mission applications.
   d. 70°C is consistent with the most robust of aerospace and industry requirements (71°C is the qualification requirement per MIL-STD-1540).
The hot temperature test qualification level was originally established for the Block 1 Rangers and Mariner R spacecraft development phases. Electronics housed in the six bus bays were designed to allowable flight temperatures of 5°C to 50°C. The upper limit of 50°C, according to anecdotal stories by engineers involved, was determined by predicting the maximum temperature on the exposed, passively controlled (white-painted) surface of an electronics bay housing with the sun impinging directly on it while the spacecraft was traveling between the Earth and the moon. Since there was a great deal of uncertainty in the thermal design and thermal models, it was decided to incorporate robustness into the performance demonstration for the electronics housed in the bays (most of these were analog electronics). The low temperature, 5°C, was chosen to keep the hydrazine above its 2°C freezing point during the flight.
Since the first planetary mission was planned for Venus and both the lunar and planetary missions may have had to encounter a passage through the Earth's shadow, a concern for successful operation of critical electronics at the cold extremes was also indicated. As a result, the qualification temperature extremes were margined by AFT +/-25°C, yielding a requirement of -20°C to 75°C. These were the Ranger-Mariner "lines in the sand". Subsequently, this same requirement was applied to the Mariners, Viking, Voyager, Galileo, and Cassini. Exceptions were introduced for unique hardware such as batteries, antennae, and appendage items. For Mars Pathfinder, during the Faster/Better/Cheaper era, and to be more consistent with Military Standards, the qualification range was changed to -35°C to 70°C. After one iteration back to 75°C in 2003, the 70°C limit is the current high temperature line in the sand.
The challenge of meeting JPL margin requirements leads to frequent exceptions to the practice:
1. Some electronics designs, particularly for sensitive instruments, cannot perform within specification over the entire qualification range of -35°C to +70°C. For these assemblies, an exception is typically granted that limits demonstration of in-specification performance to the Flight Acceptance range (AFT +/-5°C) and requires demonstration of predictable, repeatable, and non-damaging operation over the full qualification range (-35°C to +70°C).
2. Many of JPL's inherited designs come from industry and typically have lower margins and qualification/protoflight temperatures. In that case a waiver (a documented authorization intentionally releasing a program or project from meeting a requirement) is required. Although time consuming to generate and process, these waivers are routinely accepted with low risk if reliable heritage is demonstrated.
3. As electronic parts become smaller and their packaging density increases, electronic packaging designs may have difficulty meeting the JPL derating guidelines for junction temperatures, which can lead to a reduced availability of parts for JPL flight applications and/or to waivers.
E. Thermal Control
The thermal control system is designed to maintain the payload and the spacecraft subsystems within their Allowable Flight Temperature (AFT) requirements for all operating modes, in all thermal environments they may be exposed to, throughout the mission lifetime. Thermal control is achieved by implementing design features, thermal hardware, and spacecraft or instrument operational constraints. The standard JPL thermal engineering practice prescribes worst-case methodologies for design. In this process, environmental and key uncertain thermal parameters (e.g., thermal blanket performance, interface conductance, optical properties) are stacked in a worst-case fashion to yield the upper and lower bounds of mission temperatures. This represents JPL's thermal design approach and is captured in JPL's thermal design procedure. Uncertainty in the margins and the absolute temperatures is usually estimated by sensitivity analyses and/or by comparing the worst-case results with "expected" results. Credibility checks are performed, such as energy balances, heat flow diagrams, and comparisons to development test data. These sanity checks are captured in JPL's best practices as well as available handbooks. Details and assumptions of the analytical model being used for design purposes, along with any temperature requirement violations, are documented in peer and project design review material.
Thermal subsystem design requirements are most commonly expressed in terms of Allowable Flight Temperatures (AFT), temperature gradient, temperature stability, and interface heat flow. These requirements are typically determined between the hardware Cognizant Engineer (CogE), the Environmental Requirements Engineer (ERE), and the Thermal Control Engineer.
There are several main operational modes that affect the thermal system design. During launch ascent, the spacecraft will be subject to radiant heating inside the launch fairing that must be accommodated by the thermal design. Spacecraft in orbit are subject to direct sunlight, planetary body albedo, and IR energy which all must be managed appropriately. Surface operations and ground testing will be subject to convection from ambient air and radiant heat exchange. Other surface effects, e.g., wind, must also be considered. The trajectory is also of concern. Missions that require very close approaches to the sun (perihelion) typically have pointing constraints as part of the thermal control strategy. Trajectory correction maneuvers will typically require the spacecraft to point in the direction of its velocity vector. This may result in exposure to thermal environments that are detrimental to the spacecraft (e.g. sun broadside to the spacecraft or onto radiators). These transient events are typically designed to be accommodated by the hardware's thermal capacitance. In some cases, the maneuvers may be segmented to fit within the hardware's transient capability. Additional attitude constraints may exist, such as sun keep-out zones for optics, which can impose additional thermal control constraints. Outer planets missions experience very cold environments because of their large distance from the sun. In addition, the thermal system must be robust enough to support the safety of assemblies during system fault conditions.
The current and future state of thermal control systems requires much more verification and validation of flight systems by analysis than ever before. Spacecraft are becoming too large for our simulators, or simulating their environment on the ground is technically very challenging. Simulating surface systems in the Martian atmosphere and gravity can be problematic. Also, the testing of scale models may not be representative. To mitigate these types of issues, a robust thermal control system with large margins is typically implemented.
F. Assessment
The level of conservatism in the system design thermal margin must be considered in aggregate. From the semiconductor junction, through the component case and circuit board assembly, to the protoflight operating conditions and the predicted allowable in-flight system operating conditions, margin is cumulatively stacked to produce a robust system design that improves reliability and reduces uncertainty. An example of integrated thermal margins of the different elements is represented in Fig. 4.
Figure 4. Illustration of integrated thermal margins
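The stacking idea can be approximated with a simple tally of the hot-side margins discussed in this paper. The sketch below is an illustration only: it assumes a part with a 150°C absolute maximum junction rating, applies the derating rule of 110°C or 40°C below the absolute maximum (whichever is lower), and assumes the junction rise in flight equals the full allowable rise at qualification, which is a deliberately pessimistic simplification. The function and dictionary key names are hypothetical.

```python
def margin_stack(aft_hot_c: float, abs_max_tj_c: float = 150.0) -> dict:
    """Tally hot-side thermal margins from part rating down to the upper AFT.

    Assumptions (illustrative): 20 C qual margin with a 70 C floor, derated Tj
    of 110 C or (absolute max - 40 C), whichever is lower, and an in-flight
    junction rise equal to the full allowable rise at qualification.
    """
    derated_tj = min(110.0, abs_max_tj_c - 40.0)
    qual_hot = max(aft_hot_c + 20.0, 70.0)
    allowable_rise = derated_tj - qual_hot
    inflight_tj_bound = aft_hot_c + allowable_rise
    return {
        "derated_tj_c": derated_tj,
        "part_derating_margin_c": abs_max_tj_c - derated_tj,
        "allowable_junction_rise_c": allowable_rise,
        "qual_margin_over_aft_c": qual_hot - aft_hot_c,
        "inflight_tj_bound_c": inflight_tj_bound,
        "total_margin_to_abs_max_c": abs_max_tj_c - inflight_tj_bound,
    }


for name, value in margin_stack(50.0).items():
    print(f"{name}: {value}")
```

For an upper AFT of 50°C, this tally gives a 40°C derating margin, a 40°C allowable junction rise, and a 20°C qualification margin, so the bounding in-flight junction temperature of 90°C sits 60°C below the assumed absolute maximum rating. Lowering the upper AFT to 40°C increases the qualification margin over AFT to 30°C without changing the allowable rise.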
In addition to margins, designers in the respective domains employ a worst-case methodology. Typically, this includes the assumption of concurrent extremes in power dissipation, environment, and operating mode, in combination with the least favorable physical properties. Together, this presents a formidable degree of conservatism intended to counteract the significant risks inherent in the space business.
What becomes complex, and is the subject of an ongoing debate at JPL, is the rationale and justification by which margins are allowed to depart from the current norm. It is not so much a question of process. It is much more a question of departing from the "tried and true" and entering into a regime for which less experience exists. While compromising mission success is no more an option than before, competitive pressures call for a less resource intensive design. The option space for margin ranges from retaining the current approach, which is rigid but familiar, all the way to changing qualification/protoflight margins and reliability margins, possibly at the discretion of a project, to gain flexibility and potential cost savings at the expense of increased risk. The topic of thermal margins is polarizing and that effect is reflected in the evaluation of the option space. Depending on perspective, an option may have strong positive and negative connotations at the same time. Negatives are typically derived from loss of robustness and heritage. Positives are derived from increased flexibility in design and a reduction of waivers with the implication of cost savings. In many cases a change has either little effect or the outcome is mission specific and not generally predictable with a reasonable amount of certainty. However, there are a few options that appear to be predominately positive or negative. As an additional aid, the potential effect that a reduction of margins has on maximum in-flight junction temperature has been plotted in Figure 5 in the Appendix.
Conclusions
To the degree that a mission is classified (or willing) to take a certain amount of risk, a structured way to assess margin choices and consequences is needed. The work reported in this paper has revisited the rationale and intent for thermal margins that has evolved over several decades at JPL and is now codified in our design rules. At present, JPL management and technical experts are debating various options and their impacts that could be pursued in this complex, inter-related area of tailoring thermal requirements and margins to future mission needs.
