Abstract: An in-depth analysis of the 80x86 processor families identifies architectural properties that may have unexpected, and undesirable, results in secure computer systems. In addition, reported implementation errors in some processor versions render them undesirable for secure systems because of potential security and reliability problems. In this paper, we discuss the imbalance in scrutiny for hardware protection mechanisms relative to software, and why this imbalance is increasingly difficult to justify as hardware complexity increases. We illustrate this difficulty with examples of architectural subtleties and reported implementation errors.
This analysis is being performed under the auspices of the National Security Agency's Trusted Product Evaluation Program (TPEP). The Trusted Product Evaluation Program was established to evaluate commercial products used in classified computing environments against the requirements defined within the Trusted Computer System Evaluation Criteria (TCSEC) [Ncsc85]. A target assurance rating is assigned to a product under evaluation based on its security features, developmental controls, and the degree of analysis to which its security architecture is subjected. Because the preponderance of products in or being considered for TPEP evaluation use 80x86 processors (particularly products seeking the TCSEC's highest assurance rating), the intent is to provide a common base of analysis that can be used by evaluation teams for different products incorporating these processors.
The work reported in this paper was performed by The Aerospace Corporation under a contract to the National Security Agency as part of the Trusted Product Evaluation Program (contract F04701-93-C-0094). Oxford Systems participated in this study under contract to The Aerospace Corporation.
O. Sibert is with Oxford Systems Inc., 30 Ingleside Road, Lexington, MA 02173-2522 USA; email: sibert@oxford.com.
P. Porras and R. Lindell are with The Aerospace Corporation, P.O. Box 92957, Los Angeles, CA 90009-2957, USA; email: porras@aero.org and lindell@aero.org. This paper is an extended version of a previous conference paper by the same authors [Sibe95].
1 The "80x86" designation in this paper means, roughly, "80386 or better." The 8086 family is not included because it has no security mechanisms. Although the 80286 includes some of the same security mechanisms as later processors, it is not included because its architecture is significantly different and it is not being used for new product development.
2 Intel 386, Intel 486, and Pentium are trademarks of Intel Corporation. All other trademarks are the property of their respective owners.
Traditionally, computer security evaluations have devoted little attention to hardware. 3 Section II explores this approach of implicitly trusting a system's hardware layer with minimal analysis, and why the increasing complexity of modern microprocessors makes this practice untenable.
The initial stage of the analysis focused on aspects of the architecture that may present pitfalls for secure system designers. Although we found no gross security flaws "required by the specification," we identified several features that, if not properly managed, introduce previously unreported covert channels 4 and other subtle problems. These results were surprising; we did not expect well-defined architectural features to cause undesirable security behavior. In retrospect, however, the very complexity of the architecture suggests that it was bound to include some unexpected feature interactions. These problems are discussed in section III.
In addition, we collected numerous reports of implementation errors claimed to exist in some processor versions. These are summarized in section IV. We were aware of a few widely-publicized computational errors in some processor versions: the recent floating point divide problem in the Pentium is the foremost example of a well-publicized (although not security-relevant) processor implementation flaw. However, we were surprised to learn of so many distinct and varied reported implementation errors. Because some of these errors could introduce a means to bypass the protection mechanisms of a software platform hosted on the afflicted hardware, it is important that they not be overlooked during a security evaluation. Although the problems appear to be fixed in subsequent versions of each processor, it is not clear whether the dearth of reported flaws in more modern processors (e.g., Intel 486, Pentium, non-Intel implementations) represents actual improvement in implementations, the predictable lag between product introduction and problem reports, or simply lesser efforts in making such information available.
The long-term goal of the analysis project is the development of a test suite to exercise 80x86 security features and search for additional flaws. Essentially, we are performing an architecture study of the 80x86 protection mechanisms and a penetration testing effort. Sections V and VI review related work and discuss our plans.
II. Background
The degree of assurance we gain in the protection mechanisms of a computer system is achieved through the various forms of analysis, testing and developmental controls that we impose on the system. A high-assurance secure computing system (such as one intended to satisfy the requirements of the Trusted Computer System Evaluation Criteria (TCSEC) [Ncsc85] at B2 or above) must be able to enforce security policies correctly and reliably, even while under hostile attack. Future versions of the system (developed in accordance with appropriate configuration management procedures) must continue to enforce those policies reliably. Moreover, the protection mechanisms that support these properties must be carefully structured and well-defined for evaluation purposes. These fundamental aspects of secure systems form the basis for design, development, criteria, and evaluation worldwide.
A central concept in the design and analysis of trusted systems is that of the Trusted Computing Base (TCB). A TCB refers to the totality of protection mechanisms within a computer system that combine to enforce a unified security policy. These protection mechanisms may be provided at the hardware, firmware, and software layers, and each is important to the overall security of the system.
In general, however, much more effort is applied to assurance for a system's software components than for hardware components. Designers and evaluators tend to concentrate on the software portion of the Trusted Computing Base (TCB); hardware is assumed to operate securely if used correctly (this assumption is often implicit; [Gutt90] points out the need to verify such assumptions). Any hardware-related effort is primarily to ensure that software makes correct use of documented hardware features. This imbalance in the depth of hardware versus software analysis has been satisfactory 5 in the past, but is increasingly difficult to justify.
A. Trusted System Hardware Dependencies
Although software may receive greater attention, hardware components are clearly critical to a system's security. Indeed (but for a few specialized exceptions), all hardware components are part of a system's TCB.
Generally the Central Processing Unit (CPU) is most important. This component, which may include a distinct Memory Management Unit (MMU), I/O processor, and/or other functional units, is responsible for isolating the TCB from untrusted subjects, and subjects from each other. This may involve mechanisms such as user/supervisor state or rings, address space separation, segmentation and segment protection, page protection, I/O device or address protection, etc. These mechanisms are fundamental to security enforcement: without them, the rest of the TCB could not maintain security.
In addition, the CPU is trusted to be functionally correct; that is, to perform correct computations on behalf of the TCB software. If the CPU operates incorrectly, the TCB software may fail unpredictably. However, such failures are less likely to introduce a security flaw, and more likely to introduce computational errors that will cause non-TCB software to noticeably malfunction.
Other hardware components (memory, disks, tapes, other peripherals) are also trusted to function correctly in order for the TCB software to operate. Because such components rarely contain security mechanisms, their incorrect operation is similarly less likely to introduce an exploitable flaw 6 and more likely to be noticed.
B. Evaluation of Hardware Assurance
What is the reason for the traditional focus on software assurance in trusted systems? There are several. Foremost, perhaps, is that this traditional approach "appears to work." There are many well-documented examples of security flaws due to software errors, but few for hardware [Land94].
Another reason is that, as developers and evaluators, we concentrate on what we understand and can affect. Hardware is typically presented as a black box, a "given," on which a secure system must be built. This hardware is often procured from a third party with no incentive (or capability) to provide any details about its implementation. In contrast to the malleable nature of software, hardware cannot be modified easily, and often not at all. Furthermore, modifying and assessing hardware components requires different knowledge and skills than for software; these are rarely found together.
Although comfort and familiarity are difficult to justify scientifically, a third reason is on firmer ground: relative complexity and reliability of software versus hardware. In general, the visible interface to a system's hardware components (such as a CPU) is much simpler than the trusted software interface. Because the interface is simpler, it is easier to test thoroughly, and given the speed of hardware operations, far more testing is possible. Hardware design is a more systematic and disciplined process than software design, contributing to reliability and general correctness (although there is less focus on isolation of security functions in hardware than for secure software). Together, these properties all contribute to a real distinction in the way that hardware assurance is provided.
Unfortunately, simple interfaces do not necessarily correspond to simple implementations. In software, complexity is considered a prime breeding ground for flaws; the same seems likely for hardware. However, limited visibility into hardware implementations makes it more difficult to judge hardware complexity. Furthermore, because hardware components are often designed with an incomplete (or absent) understanding of security issues, the result may be poorly matched to the needs of a secure system.
C. Microprocessor Assurance
The microprocessor has changed the way trusted systems are built. In the 1970's, trusted systems were typically built entirely by one company: the same organization, although not necessarily the same people, produced both hardware and software components. Most hardware implementations were relatively simple, because complex hardware was extremely costly to design and build. Security-relevant dependencies between hardware and software components could be addressed within the confines of a single organization. Analysis of, documentation for, or assurance about hardware components could be provided within the organization as required.
In the 1980's, this began to change: increasingly, hardware components became commodities, and organizations that had previously built their own processors and peripherals started to acquire those components from external sources over which they had relatively little influence. Fortunately, at least from an assurance perspective, trusted system development lagged this trend, and continued to rely either on proprietary hardware or on relatively simple commodity hardware.
In the 1990's, however, simplicity of hardware components is no longer a given. Processors in particular have become far more complex and less expensive, and trusted systems are migrating to those platforms. In particular, the architectural security features of the 80x86 processors and the prevalence of the 80x86-based PC-compatible architecture as a development and delivery platform have led many developers to target that environment. Table I shows all products available in mid-1994 that have been evaluated for TCSEC class B2 or better by the NSA's TPEP program.
Of those, all but XTS-200 are hosted on an 80x86 architecture; however, its follow-on product, XTS-300, is claimed to be targeted for B3 evaluation and runs only on 80x86 platforms. Additionally, Trusted Information Systems (TIS) claims its Trusted Mach [Sebe94] project is targeted for B3 evaluation and employs the Intel 486 as its reference platform, and there are several products in development that also depend on these processors, including several targeted for B1 or below. Clearly, the 80x86 is the processor of choice for trusted systems in the 1990's, and if we were to include the list of systems accredited for use in security-critical processing environments, this list would expand significantly.
Unfortunately, this hardware cannot safely be dismissed as "simple" or not in need of in-depth analysis. A relatively simple interface can hide vast implementation complexities, and the 80x86 interface is far from simple. For example, the Pentium contains approximately 3.1 million transistors [Int94]. Its instruction processing and pipeline architecture are extremely complex. The popularity of these processors also raises the concern that a flaw affecting one trusted system could easily affect others.
C.1 Assurance By Design and Analysis
Typically, software developers provide assurance by using top-down, modular design, minimizing complexity, structuring the development process, and documenting the design and implementation. Similarly, evaluators assess assurance by examination of the implementation and of the development evidence and documentation. These techniques correspond directly to modern software engineering practice, and involve information that is readily understood by both developers and evaluators.
Unfortunately, these techniques do not apply well to hardware. The first problem is that details of hardware design, structure, documentation, and so forth are generally not available to evaluators (if, indeed, they exist at all). Even in the mainframe era, this information was difficult to obtain within the confines of a single organization. For modern microprocessors, it is effectively impossible: the processor supplier is often unrelated to the system developer, and in any case considers the design details extremely proprietary. Although in-depth accurate hardware design documentation is more likely to exist today, it is simply not available in the context of a trusted system evaluation.
The second problem is that, even if hardware design documentation were available, it would be of limited utility to developers and evaluators whose skills lie largely in the software world. Although for simple processors a software-oriented evaluator may be able to perform an informal assessment of hardware security (see section V-A), it is impractical to gain the same degree of understanding as one would for software. For the complex implementations of today's microprocessors the situation is much more difficult.
Formal assurance methods represent another approach; this is discussed further in section V-C.
C.2 Assurance by Testing and Exposure
The other primary technique for gaining assurance is testing. Hardware components support testing better than software, so this is relatively more effective. Testing, however, must be directed. It is a common fallacy that the exposure of a system to "millions of users" ensures that it is secure. In fact, this technique is of limited value, even for ensuring that a system is reliable: user acceptance may instead reflect a tolerance of failure, and is in any case relevant primarily to the most heavily-used features of the system.
The more subtle problem with exposure testing for security is that security is the absence of undesired behavior. During normal system operation, a user is likely to notice when a function does not behave as advertised, because application programs use those functions. One is much less likely to notice that a security policy has not been enforced. A correctly operating application program simply will not attempt operations that the security policies would prohibit, so there can be no meaningful dependency on such operations. Thus, the exposure of the 80x86 processors to millions of PC users tells us little about the soundness of their security mechanisms.
Not all PC users even use the protection mechanisms (MS-DOS users do not). Of those who do, the vast majority do not depend on the mechanisms to enforce security policy. Of that tiny minority that do expect security enforcement, few run hostile programs that would attempt operations that the hardware security mechanisms would prevent. Directed testing of the protection mechanisms seems essential, regardless of the total size of the user base.
While directed testing is clearly necessary, it too must contend with problems such as undocumented features. A processor with undocumented security-relevant features can undermine the TCB software's best efforts at policy enforcement. Although it is relatively easy to analyze software to determine all the functions it can perform, without visibility into the implementation, this is very difficult for a hardware component, and testing is not an effective substitute.
III. Architectural Pitfalls
This section describes several pitfalls in using the 80x86 architecture to build secure systems. These result from apparent design oversights and/or unexpected interactions among processor features. In some cases, performance optimizations provide unexpected avenues for information flow.
The 80x86 architecture includes many mechanisms for constructing secure systems. The details of these mechanisms can be found in [Int386], [Int486], [IntP5].
The Pentium processor brings a new aspect to this problem: it incorporates some functions that are explicitly undocumented. These are described in an "Appendix H" to [IntP5] that is available only under strict non-disclosure protection. It is claimed in [IntP5] that these functions only provide optimizations for certain TCB software operations, although without documentation, they cannot be fully analyzed for architectural pitfalls.
The pitfalls discussed here apply only to features (such as the unprivileged instruction set) that are directly visible to unprivileged programs, or that a TCB might virtualize to make indirectly available (such as the debugging registers). Features that would necessarily be used solely by the TCB (such as page tables) are not addressed, since those resources must be entirely in the TCB's control.
In multi-level secure systems development, there is a concerted effort to define and enforce rules of information flow. Covert channels provide avenues of attack that allow hostile processes to violate those information flow rules [Lamp73]. Several of the pitfalls discussed in the following sections represent covert channels. These are particularly interesting because some were discovered during the course of a high-assurance TPEP evaluation, and may be applicable to other products. Fortunately, countermeasures were found to close the flows. They were not previously detected because the relevant processor features were not represented in the system specifications. The others are simply features whose rules for safe use are more subtle than might be expected from an initial study of the architecture.
A. TS Flag Information Flow
The 80x86 processors have a logically distinct Floating Point Unit (FPU) with its own register set and context. Because not all programs need or use the FPU, it would be inefficient to save and restore the FPU contents at every task switch. Therefore, there is a mechanism to help minimize FPU saves and restores: the Task Switched (TS) flag.
The TS flag works as follows: whenever a program executes an FPU instruction, if the TS flag is set, a hardware-detected exception occurs, invoking the TCB. The TCB then has an opportunity to see whether the current process is the one that was last using the FPU. If so, it clears the TS flag and restarts the instruction immediately. If not, it must save the FPU context, re-load the FPU from the current process's FPU context, then clear the TS flag and restart the instruction. The TS flag is set automatically by the hardware whenever a task switch (i.e., loading the processor state for some process) occurs.
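The lazy save-and-restore logic described above can be sketched as a small state machine. The following Python model is illustrative only: the class `LazyFpuModel`, its attributes, and the swap counter are hypothetical names introduced here, not part of any Intel interface, and the model ignores everything about the processor except the TS flag and FPU ownership.

```python
class LazyFpuModel:
    """Toy model of the TS flag and lazy FPU context switching (hypothetical names)."""

    def __init__(self):
        self.ts_flag = False    # Task Switched flag, set by "hardware" on every task switch
        self.current = None     # process currently running on the processor
        self.fpu_owner = None   # process whose context is currently loaded in the FPU
        self.fpu_swaps = 0      # number of FPU context save/reload operations performed

    def task_switch(self, process):
        """Hardware behavior: loading the state of a process sets the TS flag."""
        self.current = process
        self.ts_flag = True

    def fpu_instruction(self):
        """Run one FPU instruction; return True if the TCB exception handler ran."""
        if not self.ts_flag:
            return False                      # fast path: no exception is raised
        if self.fpu_owner != self.current:    # TCB handler: FPU holds another context
            self.fpu_swaps += 1               # save the old context, load the current one
            self.fpu_owner = self.current
        self.ts_flag = False                  # TCB clears TS and restarts the instruction
        return True
```

In this model, switching from process A to process B and back to A, without B touching the FPU, costs one exception but no context swap, which is the saving the mechanism is meant to provide.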
This would be fine except that the TS flag is visible outside the TCB. Although it is located in a restricted control register, an unprivileged instruction (Store Machine Status Word, SMSW) makes the TS flag visible. The following scenario illustrates how the TS flag could be used as a signaling mechanism, and applies to all 80x86 processors (as well as the 80286).
Figure 1 provides a simple demonstration of how the TS flag can be used to support a binary signal. In this example, the sending process signals the binary message 010 to the receiving process. The scenario assumes that two processes, the sender S and receiver R, share a single processor with no other processes active on the processor. Also assumed is the existence of a mechanism allowing S to sleep for a specified period of time, and preempt R when this period is completed. S and R are synchronized so that a bit is transmitted every cycle. A cycle is a fixed period of time, during which R executes three operations in sequence. R begins the cycle by executing an FPU instruction, causing the TS flag to be cleared. Next, R pauses (e.g., enters a wait loop), retaining control of the CPU. Finally, R completes the cycle by polling the TS flag using SMSW. R polls the TS flag to determine if its value was altered during the pause.
The sender S is then able to transmit one binary signal per cycle by preempting R's wait loop to signal 1 or remaining silent through the cycle to signal 0. As illustrated in Figure 1, S uses the system's sleep mechanism to suspend its operation until midway through R's second cycle. During the first transmission cycle, S sleeps through the entire cycle, resulting in no change to the TS flag: this is interpreted by R as having received a 0. Midway through the second cycle, the alarm mechanism causes S to awaken and preempt R's wait loop. Upon exchanging S for R, the 80x86 task switch mechanism sets the TS flag. S then immediately sleeps again until the next transmission cycle in which a 1 will be sent. This allows R to swap back in and continue its wait loop, unaware it has been preempted. Upon completion of the wait loop in cycle 2, R polls the TS flag, and determines that it is now set. A set TS flag indicates that S has transmitted a 1. In the last cycle, S continues to sleep, causing no change to the TS flag, which is then interpreted by R as receiving the final 0.
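The three-step receiver cycle in this scenario can be traced with a short simulation. This is a minimal sketch under stated assumptions: the function name `run_ts_channel` is hypothetical, a preemption by the sender is taken to encode 1 and silence to encode 0, and scheduling is idealized so that the sender preempts exactly during the receiver's wait loop.

```python
def run_ts_channel(message_bits):
    """Simulate the receiver's cycles; return the bits it infers from the TS flag."""
    received = []
    ts_flag = True                  # receiver has just been dispatched, so TS starts set
    for bit in message_bits:
        # Step 1: receiver executes an FPU instruction; the TCB handler clears TS.
        ts_flag = False
        # Step 2: receiver waits. The sender wakes (preempting the receiver) only to send 1.
        if bit == 1:
            ts_flag = True          # task switch receiver -> sender sets TS
            ts_flag = True          # task switch sender -> receiver sets TS again
        # Step 3: receiver polls TS with the unprivileged store instruction.
        received.append(1 if ts_flag else 0)
    return received
```

The sketch makes one property of the channel explicit: the receiver never needs to observe the sender directly, only the side effect that any intervening task switch leaves in the flag.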
Although a TCB-provided timer facility is assumed for the example, this is not a necessary aspect of the channel. For instance, in a multiprocessor system, the real sender S, running on processor A, could simply send interrupts or IPC messages to its proxy S' that is sharing processor B with R, causing S' to wake up (and thus task switch) or not.
Methodologies for identifying covert channels through the analysis of a system's formal top-level specification (FTLS) and/or code analysis have been generally successful. However, one difficulty in attempting to detect the TS flag channel with an FTLS and/or code analysis is that the setting of the TS flag occurs internally to the processor. The state change representing the TS flag is not readily visible without a specific examination of the internal processor state, and is thus often omitted from analysis as an implementation detail. Indeed, this situation was observed during one security evaluation.
Interestingly, beginning with the i386, Intel moved the TS flag into the privileged CR0 register (it had previously been located in the 80286 Machine Status Word, which was removed in the i386). Intel recommends against further use of the SMSW instruction, and instead recommends use of the privileged MOV instruction to read CR0. Unfortunately, compatibility with old (80286-based) programs requires that SMSW, and thus the value of the TS flag, remain available to unprivileged programs.
B. FPU Context Timing Channel
The previous scenario illustrates an information flow based on visibility of the TS flag with SMSW. However, even if the TS flag were not visible, its state could potentially be sensed by the duration of an FPU instruction. If the TS flag is set, an exception will occur and be handled by the TCB before the FPU instruction completes. This is likely to take considerably longer than if the TS flag were clear. This delay can be observed as greater execution time for the FPU instruction.
Additionally, the Intel scheme for handling numeric exception errors is to handle the error in the context of the next FPU instruction, regardless of which process executes the next instruction. Thus if process S performs an erroneous FPU instruction and immediately swaps out, process R can sense this by performing an FPU instruction and observing the delay from the FPU context swap, plus the delay from handling the numeric exception.
Fundamentally, these flows are caused by the FPU context-saving optimization itself, not by visibility of the TS flag. Thus, a full remedy requires making the TS flag's effects, as well as its value, invisible to unprivileged programs.
C. Segment Accessed Bit
The "accessed" bit in a segment descriptor is visible through the Load Access Rights (LAR) instruction. This bit is set whenever the segment's contents are accessed, and is reset only by TCB software.
If two processes share access to a segment descriptor (for instance, a read-only data segment in the Global Descriptor Table (GDT)), a sending process can read from the segment and in doing so, signal to a receiving process that is waiting for the accessed bit to change. In practice, this is unlikely to be a serious problem because the bit, once set, could be reset only by TCB software, and because the number of possible GDT entries is fixed. However, it is clearly a flow: an operation (performed by the sender) whose semantics clearly imply "read" actually performs a "write," and the result of that "write" is visible to the receiver without restriction.
A simple countermeasure is to ensure that the "accessed" bit of each descriptor is set when the descriptor is created. This, however, would effectively disable the feature, and render it unavailable to the TCB for use in segment management.
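The countermeasure can be illustrated with the access-rights byte of a data segment descriptor. The sketch below is a hedged illustration, not TCB code: the helper names `make_data_access_byte` and `after_access` are hypothetical, and the bit positions (accessed in bit 0, writable in bit 1, descriptor type in bit 4, privilege level in bits 5 and 6, present in bit 7) are taken from the standard 80x86 descriptor layout and should be checked against the processor manual.

```python
ACCESSED = 0x01      # bit 0: set by hardware when the segment is accessed
WRITABLE = 0x02      # bit 1: data segment is writable
CODE_OR_DATA = 0x10  # bit 4: descriptor describes a code or data segment
PRESENT = 0x80       # bit 7: segment is present

def make_data_access_byte(dpl, writable, preset_accessed=True):
    """Build the access-rights byte for a data segment (hypothetical helper)."""
    byte = PRESENT | CODE_OR_DATA | ((dpl & 0x3) << 5)
    if writable:
        byte |= WRITABLE
    if preset_accessed:
        byte |= ACCESSED     # countermeasure: the bit is already set at creation
    return byte

def after_access(access_byte):
    """Model the hardware side effect of any access to the segment."""
    return access_byte | ACCESSED
```

With the bit preset, a read by the sender leaves the byte unchanged, so a receiver inspecting the access rights sees nothing new; without it, the same read changes the value that the receiver can observe.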
D. Other Segment Attributes
A similar situation to the segment-accessed bit exists for other segment attributes. There is an "available for software" bit that is also visible with LAR, as well as a "present" bit; the TCB's use of those bits must be designed to avoid possible flows. Another attribute (segment limit) is also visible through an instruction (Load Segment Limit, LSL), and could be used to determine whether a segment had been extended by a reference beyond its earlier limit. A third potential problem comes from access rights themselves, visible both with LAR and the access-checking instructions VERify Read (VERR) and VERify Write (VERW): if a system uses access rights to implement demand segmentation, a program can determine whether a segment is actually writable. This could yield different results when checking writability than when attempting a write, as the latter would be made to work by the system's fault handler.
As with segment-accessed, none of these appear to present great risk, and they seem unlikely to be exploitable in real systems. However, they are examples of a design that is problematic for security in that it makes visible real attributes that should be virtualizable by a TCB. Because the applicable instructions are available without restriction (except for the target segment's PL), the TCB has no effective means of controlling their use, and must instead be designed to eliminate their potential misuse.
The effect of these instructions is subtle: although it is clear that the Page Directory Table and GDT themselves must be protected, it is not as obvious that the parts of descriptors that can be read (but not modified) through these instructions can present a security problem.
E. Page Access Invisibility
The attribute-visibility pitfalls appear only at the segment level: there are no corresponding operations for interrogating page attributes, so page management can be truly invisible outside the TCB.
Unfortunately, this means that, based on segment attributes, the access-checking instructions (VERR and VERW) can report that an address is readable (or writable) when it actually is not. A TCB that relies on page-level protection to enforce access control cannot use VERR and VERW to validate parameter addresses: even though a segment may be accessible, pages may be individually protected. These two instructions were provided to help OS developers avoid access problems caused by invalid parameter addresses. Unfortunately, they are unreliable for systems using page-level protection, because they could lead the TCB to conclude, incorrectly (and insecurely), that its caller has appropriate access to a protected page.
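The gap between segment-level and page-level checks can be made concrete with a small model of parameter validation. The sketch below assumes a hypothetical TCB that keeps a dictionary mapping page numbers to a user-writable flag; the function names and the dictionary-based segment and page tables are inventions for illustration, and the segment-level test stands in for the kind of answer a segment-only check would give.

```python
PAGE_SIZE = 4096

def segment_check_write(segment, offset, length):
    """Segment-level view only: writable segment and range within the limit."""
    return (length > 0 and segment["writable"]
            and offset + length - 1 <= segment["limit"])

def page_check_write(user_writable_pages, linear_start, length):
    """Page-level view: every page touched by the range must be user-writable."""
    first = linear_start // PAGE_SIZE
    last = (linear_start + length - 1) // PAGE_SIZE
    return all(user_writable_pages.get(page, False)
               for page in range(first, last + 1))

def tcb_validate_write(segment, user_writable_pages, offset, length):
    """A TCB relying on page protection must apply both checks to a caller's buffer."""
    if not segment_check_write(segment, offset, length):
        return False
    return page_check_write(user_writable_pages, segment["base"] + offset, length)
```

A buffer that starts near the end of a writable page and spills into a protected one passes the segment-level test but fails the combined test, which is exactly the case in which a segment-only check would mislead the TCB.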
F. Internal Register Visibility
Several internal registers (those designating the local, global and interrupt descriptor tables; LDT, GDT, and IDT respectively) are visible with unprivileged instructions. A program can issue the Store LDT Register (SLDT) instruction to determine the current location of its LDT. If LDTs are ever moved or reassigned, perhaps in response to contention caused by other subjects, SLDT makes such changes visible, thus creating a potential information flow. Because the GDT and IDT are system-wide tables, their locations are less likely to be useful for covert channels.
Like segment attributes, the LDT represents information that should be virtualizable by the TCB, or simply not accessible at all. Because the SLDT instruction is unprivileged, the TCB must be designed to preclude use of the LDT as a covert channel.
G. Debug Register Values
Because the Debug Registers (DRs) provide a powerful facility for program debugging, a general-purpose system may make them visible to unprivileged subjects. The DRs are accessible only at PL 0, so they must be virtualized and kept as part of a process's context. However, this is not sufficient protection: the values set in DRs must also be validated by the TCB to eliminate the potential for interference with TCB operation.
The breakpoint addresses are linear addresses, and so must be calculated by the TCB relative to the process's segments. The control values must be validated to avoid introducing undefined values into the DRs, or setting values in undefined DR fields. In practice, this should not be difficult.
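The validation steps described above might look like the following sketch. Everything here is an assumption made for illustration: the function name `validate_breakpoint`, the representation of TCB memory as a list of inclusive linear address ranges, and the treatment of the access-type and length encoding `0b10` as undefined (which matches the commonly documented 80386 and Intel 486 debug control fields, but should be confirmed against the manual for the processor in question).

```python
RW_DEFINED = {0b00, 0b01, 0b11}           # execute, write, read/write; 0b10 treated as undefined
LEN_BYTES = {0b00: 1, 0b01: 2, 0b11: 4}   # length codes; 0b10 treated as undefined

def validate_breakpoint(seg_base, seg_limit, offset, rw, length_code, tcb_ranges):
    """Return the linear breakpoint address, or None if the request is rejected."""
    if rw not in RW_DEFINED or length_code not in LEN_BYTES:
        return None                        # would place undefined values in the control fields
    if rw == 0b00 and length_code != 0b00:
        return None                        # execution breakpoints use length code 0
    size = LEN_BYTES[length_code]
    if offset < 0 or offset + size - 1 > seg_limit:
        return None                        # outside the requesting process's segment
    linear = seg_base + offset             # translate segment-relative offset to linear
    for start, end in tcb_ranges:
        if linear <= end and linear + size - 1 >= start:
            return None                    # overlaps TCB addresses
    return linear
```

Returning `None` for any rejected request keeps the policy conservative: the TCB loads a debug register only with a value it has computed itself from validated inputs.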
H. Time Stamp Counter
The Pentium processors support a time stamp counter (TSC) register that provides a count of machine cycles since the processor was reset. This allows extremely high-resolution timing of program activity and is useful for performance measurement and real-time programs. Although this is not fully documented in non-proprietary parts of [IntP5], that manual contains sufficient hints to determine how to understand the feature and make it available. In addition, [VanG94] provides a more detailed description.
High-resolution timing, however, is also the key to efficient exploitation of covert timing channels [Hu91]. Fortunately, the Pentium has a control flag that makes the Read TSC (RDTSC) instruction privileged; by making RDTSC privileged, a TCB can virtualize the TSC to reduce its effectiveness for covert channels. As with any high-resolution clock, the TSC must be either virtualized or eliminated entirely to reduce the covert channel threat.
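One way a TCB might virtualize the counter is to return a deliberately coarsened value from the handler that runs when an unprivileged program attempts to read it. The sketch below shows only that coarsening step; the function name `virtual_tsc` and the choice of rounding down to a fixed granularity are assumptions made for illustration, not a policy prescribed by the text, and other policies (such as adding noise) are possible.

```python
def virtual_tsc(real_cycles, granularity):
    """Return the cycle count rounded down to a multiple of `granularity`."""
    if granularity <= 0:
        raise ValueError("granularity must be a positive number of cycles")
    return real_cycles - (real_cycles % granularity)
```

For example, with a granularity of 10,000 cycles, any two readings taken within the same 10,000-cycle window return the same value, so events shorter than the window cannot be distinguished by the reader. The trade-off is that legitimate performance measurement loses the same resolution.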
I. Performance Counters
The Pentium has a set of Model-Specific Registers (MSRs), most of which are undefined in [IntP5]. They are, however, defined to be accessible only at PL 0, reserving them to the TCB.
Recently, descriptions of some of these registers have been published [Math94], based on analysis of an Intel-developed performance monitoring package. These MSRs count internal processor events, such as cache misses, pipeline activity, etc., and can be used for characterizing program performance.
Although they are not directly accessible outside PL 0, a TCB could (because they are useful to application programmers) provide virtual access to these MSRs. As with the TSC, however, this requires care to reduce the threat from covert channels. Unlike the TSC, which measures an external clock, some performance MSRs can measure effects of other concurrent activities (e.g., cache snoops by another processor) or of previous activities (e.g., cache hits and misses due to cache activity by a previous subject). Earlier work [Wray91] has shown how to exploit such mechanisms based on measurement against a time reference, but MSRs provide a direct indication.
Cache and Timing Channels
As mentioned in the preceding scenario, caches present potential for covert timing channels. Even without MSRs for direct measurements of cache activity, cache hits and misses can be detected strictly from instruction timing, as described in [Wray91]. To eliminate these flows, caches must be managed. This can reduce their efficiency considerably, depending on cache architecture, as it introduces otherwise unnecessary cache flush and invalidation activity.
For the debug registers, in a system that relies on paging and page protection for all its separation, and provides a subject with one code and one data segment covering the entire linear address space, the breakpoint address need not be translated; however, it still must be validated to ensure that it does not permit breakpoints to be set for TCB addresses.
The Intel 486 and later processors include caches for data and instructions, as do some of the non-Intel 80386-family processors. In addition, all 80x86 processors have a Translation Lookaside Buffer (TLB) that supports a covert timing channel. Because the TLB holds fewer entries in aggregate than the processor cache, the capacity of this channel is lower.
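The effect of cache management on such a channel can be illustrated with a toy model. The sketch below is a simplified simulation under stated assumptions, not a description of any 80x86 cache: it models a small direct-mapped cache addressed by line number, assigns arbitrary hit and miss costs, and lets a sender signal one bit per scheduling round by evicting (or not evicting) a line that the receiver has loaded. All names and constants are hypothetical.

```python
# Illustrative sketch only: a toy direct-mapped cache used to show how hit
# and miss timing can carry information, and how flushing removes it.

HIT_CYCLES, MISS_CYCLES = 1, 10  # assumed access costs


class DirectMappedCache:
    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.tags = [None] * n_lines

    def access(self, line_addr):
        """Access one line and return the simulated cost in cycles."""
        index, tag = line_addr % self.n_lines, line_addr // self.n_lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # a miss replaces the resident line
        return MISS_CYCLES

    def flush(self):
        self.tags = [None] * self.n_lines


def transmit(bits, flush_on_switch):
    """Return what the receiver infers for each bit the sender tries to send."""
    cache = DirectMappedCache(8)
    receiver_line, sender_line = 0, 8  # both map to index 0, with different tags
    received = []
    for bit in bits:
        cache.access(receiver_line)        # receiver loads its line
        if flush_on_switch:
            cache.flush()                  # TCB flushes on switch to sender
        if bit:
            cache.access(sender_line)      # sender evicts the line to send 1
        if flush_on_switch:
            cache.flush()                  # TCB flushes on switch to receiver
        cost = cache.access(receiver_line) # receiver times a second access
        received.append(1 if cost == MISS_CYCLES else 0)
    return received
```

Without flushing, the receiver recovers the sender's bits exactly; with a flush on every switch, every probe misses and the receiver's observations no longer depend on the sender. The flushes are also the source of the efficiency loss noted above, since the receiver pays the miss cost on every access after a switch.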
Undefined Values
Most 80x86 computational instructions are defined explicitly to return "undefined" values for arithmetic flags, presumably to aid implementation optimizations. Although it seems extremely likely that the actual values of the flags are derived deterministically from legitimately accessible information, 9 the architecture does not require it, and it is conceivable that they might represent the results of some processor activity performed by a previous subject and thus be usable as a covert storage channel.
A few unprivileged instructions return values in which some bits are explicitly undefined. Again, it seems extremely likely that those bits are derived from legitimately accessible information, but that is not required by the specification.
IV. Reported Implementation Errors
The 80x86 processors, particularly the early versions of the Intel 386 and Intel 486, have had a history of implementation errors reported in the computer trade press. One such widely reported error involved incorrect results from 32-bit multiplication with certain operand values in some Intel 386 "B1 step" processors; Intel actually undertook additional testing and, until the problem was fixed, provided specially-marked versions of these processors in which the error did not occur. An even more widely reported processor error, the Pentium flaw, led to the replacement of faulty processors and has led to a significant change in the way such errors are now reported by the vendor.
All of these widely-reported flaws seemed to involve errors that, while they might cause an application program or possibly a TCB operation to operate incorrectly, did not appear to translate into exploitable security flaws. A more in-depth search for reported flaws, however, turned up some that clearly did involve the security architecture, as well as numerous others that did not.
These reports also suggest several general properties of flaws:
1. It takes a while for any flaw reports to become public. This appears at least partly due to the understandable reluctance of system and processor manufacturers to announce flaws in their products. Occasional user reports of application malfunctions are not of much interest to the trade press, and in any case it may take considerable engineering effort to determine that a malfunction is caused by a hardware flaw.
2. Flaw reports (and, presumably, flaws) are much more common for early versions of the processors. This is to be expected: manufacturers are strongly motivated to fix these problems while their effect on users is limited. Often, flaw reports are not available until long after the flaws have been fixed.
3. Flaw reports concentrate on processor functions that are heavily used in common application environments, but are not limited to those functions. Again, this is no surprise: if a processor feature is not used, it is unlikely that user exposure will result in discovery of an error in its implementation. If anything, it is a little surprising that some of the reported flaws deal with obscure or pathological uses of processor features; it seems clear that some of these reports derive from explicit testing rather than accidental discovery.
Overall, it does not seem safe to generalize from these properties. Without visibility into the proprietary development procedures of processor and system manufacturers, we cannot reach any conclusion about, for example, why there are so few distinct reported flaws in the Pentium. Although one Pentium flaw (in floating point divide) has been extensively reported, few other reports have surfaced. It is difficult to know whether this is due to the customary reporting lag, an actual absence of flaws, or simply a lack of available reports. It is similarly difficult to determine whether the little-used processor features are more or less likely to harbor flaws (on the one hand, they are less exercised, so errors may not have been discovered; on the other hand, they are less likely to have a complex and optimized implementation, so errors may not have been introduced).
9 There is a parallel here to [Karg74], in which an undocumented instruction was discovered on the GE-645 that accessed an internal processor register of no apparent significance.
A. Reported Flaws
We have identified 11 reports of implementation errors affecting various versions of 80x86 processors from the de-
In many cases, the reports do not identify specific processor versions, but refer to "early" or "some" versions; where there are multiple reports of the same flaw, it is counted in the category with the most specific version information. It seems generally safe to assume that a flaw reported in one version is also present in the earlier versions; however, there are a few reports of new flaws introduced by new versions. It is also unclear to what extent non-Intel processors exhibit any of these faults. Some AMD processors, which share microcode with the Intel versions, would presumably share any microcode-based flaws. Others, such as Cyrix processors, could introduce flaws not found in Intel or AMD versions [Meth94].
We classified instruction flaw reports as follows:
Hardware protection flaw: This refers to a flaw directly exploitable by an unprivileged process that either provides a means to violate some hardware-enforced access restriction or that appears to have an adverse effect on security-critical data within the processor. These types of flaws, by their nature, cannot be masked or removed by the software system that runs on the processor. This is the most serious type of security-relevant flaw and would lead many system developers to avoid versions of processors with these types of errors. An extreme example of this type of error is the ability to access either privileged registers or privileged address spaces directly from an unprivileged process.
System instruction flaw: This refers to a flaw in an instruction that would likely be used in a system implementation, and which could lead to the mishandling of security-critical data within the TCB or processor. While it is usually possible to correct these types of flaws in software (e.g., by avoiding odd byte boundaries for certain data structures), this requires that a system developer know of the existence of the flaw and how to implement a work-around. In the context of this paper, we consider such flaws to be pitfalls, since both published errata and work-arounds for these types of flaws are not easily accessible to secure system developers or analysts, if they exist at all.
Denial of service: This is a flaw that could apparently be used by an unprivileged program to "hang" the CPU so that it can no longer operate. Again, these do not include problems due to incorrect initialization of TCB data structures.
Other: These are all the other flaws, such as those causing incorrect computational results, those where an incorrect exception is reported, or those that result in anomalous or unpredictable values being used by system or application software. Although all of these flaws could potentially affect the correct operation of TCB software, they are not considered security-relevant because the effect is so indirect.
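The classification above amounts to a small decision procedure, which can be sketched as follows. The category names follow those used in the discussion of Table II; the field names, the `classify` function, and in particular the order in which the tests are applied are assumptions made for this illustration, since the text does not say how a report matching more than one description was counted.

```python
# Illustrative sketch only: field names and the precedence of the tests are
# assumptions, not part of the classification as published.
from dataclasses import dataclass
from enum import Enum


class FlawCategory(Enum):
    HARDWARE_PROTECTION = "hardware protection flaw"
    SYSTEM_INSTRUCTION = "system instruction flaw"
    DENIAL_OF_SERVICE = "denial of service"
    OTHER = "other"


@dataclass
class FlawReport:
    unprivileged_exploit: bool     # directly usable by an unprivileged process
    violates_access_control: bool  # defeats a hardware access restriction or
                                   # harms security-critical processor data
    mishandles_tcb_data: bool      # system instruction mishandles
                                   # security-critical data
    hangs_cpu: bool                # can stop the processor from operating


def classify(report):
    """Assign a report to the first category whose description it matches."""
    if report.unprivileged_exploit and report.violates_access_control:
        return FlawCategory.HARDWARE_PROTECTION
    if report.mishandles_tcb_data:
        return FlawCategory.SYSTEM_INSTRUCTION
    if report.unprivileged_exploit and report.hangs_cpu:
        return FlawCategory.DENIAL_OF_SERVICE
    return FlawCategory.OTHER
```

Under this ordering, a report of an incorrect arithmetic result that needs no privilege but touches no protection mechanism falls through to `OTHER`, matching the treatment of purely computational flaws described above.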
In addition to instruction flaw reports, there are several reported flaws concerning incorrect bus or cache operation.
We have excluded reports where apparent instruction misbehavior is attributed to problems in external circuitry. The primary goal of this effort is to identify exploitable errors that can be exercised from within the untrusted processor state. While problems that arise from faulty external circuitry clearly have an effect on the trustworthiness of secure systems, the identification of such problems is considered outside the scope of this effort.
Table II characterizes the 11 reports of distinct flaws that we have encountered. Of these flaws, seven are categorized as hardware protection flaws, providing a means for unprivileged processes to circumvent hardware access restrictions or cause security-critical processor data to be mishandled. Five flaws are categorized as system instruction flaws, requiring special attention by developers and analysts to ensure these flaws do not introduce weaknesses in a system's security architecture. Lastly, ten examples are provided of flaws that introduce denial of service errors in unprivileged instructions, allowing the unrestricted ability to halt the processor. From the reports on which this table is based, it appears that it would be generally safe 11 to construct a trusted system with any Intel 386 processor later than step B1, and any Pentium. It must be emphasized, however, that this is based only on public reports of flaws: no attempt has yet been made to verify these reports or to determine whether they have been fixed in later processor versions. In addition, as discussed above, it is not clear whether the lack of reported security flaws in the Pentium corresponds to a lack of actual flaws or simply a lack of available reports.
B. Hardware Protection Flaws
This section describes potential security flaws that have been reported for various versions of the Intel 386 and Intel 486 processors. The descriptions in this and sections IV-C and IV-D are necessarily terse and dependent on a detailed knowledge of the 80x86 instruction set. A reference to a more detailed description is associated with flaws 1-6. One flaw was independently confirmed by our group. We did not attempt to construct detailed exploitation scenarios for these flaws because those would necessarily be dependent on an operating system's architecture. Although in some cases it is difficult to see how a security-relevant flaw might be exploited, it is even more difficult to show that exploitation would not be possible, and we chose to err on the conservative side in our assessments. The existence of any known flaws, and the possibility that others might be present but unknown, argues strongly for avoiding the affected processor versions in any trusted system.
11 The security flaw for the Intel 386 0 step involves incorrect interpretation of the I/O permission bitmap. A system that does not use that feature would not be affected by this flaw; even systems that do use it would generally not be adversely affected. However, to be safe, one should use a processor later than the 0 step if the I/O bitmap is used.
1. The instruction is used to load the Stack Selector with data from memory. It should verify that the Requestor Privilege Level of the selector equals the Descriptor Privilege Level of the current code segment. The A1 step of the Intel 386 fails to perform this check.
INVLPG, INVD, and WBINVD are privileged instructions introduced in the Intel 486 to invalidate TLB entries, invalidate cache, and flush and invalidate cache, respectively. Although they are designated privileged instructions, they operate successfully when executed from a non-privileged state (PL 1-PL 3).
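The two checks that these flaws bypass can be stated compactly. The sketch below models only the comparisons named in the text and is not a description of the processor's full protection logic; the function names are hypothetical, and the sketch assumes that the requestor privilege level occupies the low two bits of a selector and that the current privilege level is the privilege level of the current code segment.

```python
# Illustrative sketch only: a model of the privilege comparisons described
# above. Function names are assumptions for this example.


def requestor_privilege_level(selector):
    """Extract the requestor privilege level from the low two selector bits."""
    return selector & 0b11


def stack_selector_load_permitted(selector, current_privilege_level):
    """The check a correct processor should make before loading the selector."""
    return requestor_privilege_level(selector) == current_privilege_level


def privileged_instruction_permitted(current_privilege_level):
    """Privileged instructions should execute only at privilege level 0."""
    return current_privilege_level == 0
```

A test program for either flaw would attempt the operation in a state where the corresponding function returns `False` and report a flawed processor if no exception is raised.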
C. System Instruction Flaws
This section describes flaws that have been reported to cause various versions of the Intel microprocessors to mishandle security-critical data or incorrectly perform security-relevant operations. For each of the flaws mentioned, a brief description of a work-around is presented.
When implementing a trusted system, developers should use care to avoid the pitfalls associated with these flaws.
1. A segment descriptor contains information needed by the memory management unit (MMU) to coordinate access to the associated segment. However, segment descriptors may be read incorrectly when not doubleword aligned on the 80386 A1 step. [Turl88] Work-around: Segment descriptors should always be doubleword aligned.
2. Under the trap and fault logic of the 80x86, error status codes are defined to inform the system of the cause of an exception. Unfortunately, on some versions of the 80386, a page fault concurrent with certain prefetch operations will yield incorrect values for defined bits in the error code. [Humm92] Work-around: A page fault handler should examine the target linear address, privileges, and the page tables to determine the true state of the MMU.
3. Two instructions on some versions of the 80386 malfunction when a null selector is specified as an operand. Instead of unconditionally clearing the flag (i.e., indicating a failed outcome), the instructions will attempt to perform their operation using descriptor 0 from the Global Descriptor Table (GDT). [Humm92] Work-around: Systems should initialize an invalid descriptor 0 for the GDT and avoid these instructions entirely. Access rights determination can be performed by directly reading the descriptors and decoding the appropriate information.
4. The processor will hang when a double fault occurs on an interrupt corresponding to an invalid entry in the Interrupt Descriptor Table (
In [Glig85], there is a description of the security analysis of the relatively simple security-enhanced Honeywell Level-6 minicomputer used in the A1-evaluated SCOMP system. The analysis was performed relative to a specification of security properties, treating each instruction as a security-preserving transition.
The Multics B2 evaluation [Mult85] included an informal analysis of security mechanism implementations in all hardware components. This was performed in 1984 by one of the authors (Sibert), based on interactive guidance from hardware design engineers. For the CPU, this analysis was based on walk-throughs of the security-critical parts of the CPU logic design at the gate level (the Multics CPU is not microcoded). For the microcoded I/O controllers, the critical parts of microcode were examined.
There are several studies (e.g., [Baue84]) about architectural suitability of specific processors for building secure systems, and many more about hardware security features in general. These generally deal with high-level issues (e.g., are there enough domains, is process switching too expensive?), rather than with the sort of low-level details that yield the architectural pitfalls discussed in this paper. In [McAu92], there is a discussion of maintaining hardware assurance when modifying a trusted product, but it too addresses only high-level architecture aspects.
These efforts were directed at overall correctness, a superset of security correctness. Most of these efforts dealt with processors that are far simpler than modern general-purpose microprocessors, and generally employ simplified specifications of these processors.
The recent effort by SRI and Collins [Sriv95] has demonstrated that it is possible to explore complex pipelined architectures with current verification tools and practices. Collins is investigating the further use of verification as a replacement for some design reviews in the development of the AAMP processors. These processors are used in highly safety-critical applications and are subject to design reviews and testing that presumably exceed the scrutiny applied during a commercial microprocessor development cycle.
However, it is not clear that such techniques are generally scalable to modern and much larger microprocessors with features such as multiple pipelines and speculative execution. Furthermore, the amount of time expended in these efforts is quite high and has apparently discouraged chip manufacturers from investing in them, except for some small research projects. The authors are aware of no current attempts to integrate formal verification into a commercial microprocessor development cycle.
VI. Future Directions
Thus far, this project has identified architectural pitfalls and categorized flaw reports in one microprocessor architecture. Our findings point out the utility, indeed the necessity, of closer examination of microprocessors in high-assurance secure system development.
Next, we intend to perform a thorough penetration-style test of the 80x86 processors. Such testing is limited to showing the presence of flaws, not their absence, but we believe that even if testing discovers no flaws, it will provide additional empirical support for trusting the processors, and if security flaws are discovered, it will be extremely important for secure system developers to avoid them.
In addition, we will develop tests to exercise already-reported flaws, at least those for which we have enough detail to construct a plausible exploitation scenario. This effort may be hampered by the unavailability of flawed processor versions on which to run the tests, but the goal is to have tests that can show easily whether a particular processor exhibits any of the known security flaws, rather than relying on ambiguous associations of flaw reports with processor versions.
We will build a general purpose framework for test development to aid future penetration test efforts. The framework will support easy creation of protected mode environments, allowing customization of the framework's environment to correspond to the environments used by different secure systems, which differ in their use of various processor mechanisms (e.g., use of segments and paging). The framework will also support automated test case generation.
Currently, our penetration effort is limited by availability of information about the processors. In traditional penetration testing efforts, evaluators have complete access to internal design and implementation information about the system. Here, we are using only public information. A more thorough and cost-effective test could be done if design information were available. Despite these limitations, we are optimistic; we can only hope to be as successful as the analysis reported in [Karg74], which was conducted under similar conditions.
The planned tests are static: they will set up execution conditions in a stand-alone environment and observe the results. It would be a valuable extension to run these tests under more stressful conditions, such as might arise in a multiprocessor system: the interaction of externally-caused cache flushes, bus interactions, etc., might reveal additional problems.
Lastly, it is not clear whether the problems identified here demonstrate greater weaknesses within the 80x86 family (perhaps due to its complex architecture) than for other processors, or whether they merely reflect the popularity and exposure of the 80x86 relative to its competitors. During our literature survey on microprocessor implementations, we found that the preponderance of flaw reports concerned 80x86 processors, while other processor families (e.g., SPARC, MIPS, Motorola 68000) had few or no reported flaws.
