Designing, testing, and producing a new computer processor is a complex and very expensive process. To reduce costly mistakes in hardware, the microarchitecture is usually designed and tested with the aid of a software simulator. The FAST System enables microarchitects to develop architecture simulators rapidly and is less error-prone than hand-coding a simulator in a high-level language such as C. In this paper, we describe how the FAST System's Architecture Description Language (ADL) has been extended to facilitate the description of complex instruction sets such as Intel's IA-32 instruction set architecture. In particular, we demonstrate that inheritance, a key concept in object-oriented programming languages, can be extended to selective inheritance, which enables the specification of complex instruction set architectures in architecture description languages.
Introduction
Microarchitecture exploration is a difficult, error-prone, and development-intensive endeavour. Traditionally, there have been three distinct approaches to microarchitecture exploration: hand-coding a custom simulator, generating one through a hardware description language, and generating one automatically through an architecture description language.
Custom simulators for a specific architecture are hand-coded in a general-purpose high-level language, e.g., C. This group includes SimpleScalar, SuperDLX, SPIM, and URM [2, 5, 3, 9], among others. The second group includes hardware description languages and simulators such as VHDL, Verilog, and ELLA [1, 12, 8, 4]. Most hand-coded simulators are so specific to the architecture they simulate that it is difficult to modify the ISA or the microarchitecture to see how the changes affect performance. Ranging from several thousand to 30,000 lines of C code and taking 12 to 24 man-months to develop, these are complicated software systems. Despite the best efforts spent on maintainability, such simulators suffer from the problems of all large-scale software projects. Studying an existing simulator's source code and making changes without breaking anything is problematic at best. Similarly, hardware description languages are not suitable for microarchitecture exploration because they are designed to describe hardware implementations.
* This work is supported in part by a grant from DARPA, PACC Award no. F29601-00-1-0183, to Michigan Technological University and by a CAREER award (CCR-0347592) from the National Science Foundation to Soner Önder.
Architecture description languages, on the other hand, can specify the instruction set architecture (ISA), can automatically generate support tools such as the assembler and the linker, and hide the details of instructions from the programmer. As a result, they enable a clean model of microarchitecture operation. More importantly, they can specify and model the operation of the microarchitecture without tying it to a particular hardware implementation, and can therefore map the instruction set specification seamlessly onto the microarchitecture specification.
The Flexible Architecture Simulation Tool (FAST) and its Architecture Description Language (ADL) [6] form one such system, which has been used at a number of universities to describe and simulate microarchitectures of varying complexity. FAST fills the gap between high-level, architecture-specific simulators and low-level hardware simulators. In doing so, it allows the necessary system tools (assemblers, linkers, and so on) to be generated automatically from the ADL description. When using the FAST system, the first step is to describe the architecture in question in ADL. An ADL architecture description consists of two distinct sections: one describes the ISA, and the other describes how the microarchitecture works (e.g., what the pipeline stages are and what happens during each stage). Instructions are described in a declarative form, while the microarchitecture is described in an imperative form similar to high-level imperative languages such as C. An example instruction specification is shown in Figure 2.
IA-32 Architecture
The latest incarnation of IA-32, as seen in the Pentium 4 processor, has its roots in the 8086 and 8088 processors from 1978. The ISA uses a variable-length instruction encoding, and the processor supports several memory models, including segmented memory. The architecture also includes overlapping registers. There are very few, if any, wasted bits in a typical x86 instruction. All of these properties make the Intel IA-32 architecture quite challenging to specify in an ADL.
In addition to its compact encoding, IA-32 uses many addressing modes, which are for the most part independent of the instruction because they are encoded in the ModR/M byte (and an SIB byte if necessary). The challenge is to define these many addressing modes succinctly in ADL, since the fields are mostly independent of the opcode; that is, the opcode alone does not determine all of the fields that follow it. The simplest approach is to enumerate every variation of an instruction as if it were a separate instruction, since ADL allows instructions to be overloaded, just like functions in C++. However, this forces the same instruction to be overloaded many times because of the many addressing modes. There are nine addressing modes; however, three of them (Base + Displacement, Base + Index + Displacement, and Base + (Index*Scale) + Displacement) can use either an 8-bit or a 32-bit displacement, giving 12 effective modes. Furthermore, there are restrictions on when %esp and %ebp can be used as base or index registers. Treating these restrictions as special addressing modes (which would be necessary in the current version of ADL) adds 6 special-case modes, for a total of 18 addressing modes! Creating a separate ADL instruction definition for every combination of x86 opcode and addressing mode would generate thousands of ADL instructions, which is tedious and highly error-prone.
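The mode arithmetic above can be checked with a short Python sketch. The counts (nine modes, three with a displacement-size choice, six %esp/%ebp special cases) come from the text; the function names are ours, and the usage line assumes, purely for illustration, 400 opcodes that all take a memory operand.

```python
from itertools import product

# Counts quoted in the text.
BASE_MODES = 9
SPECIAL_ESP_EBP_MODES = 6

# The three modes the text names as taking either an 8-bit or a 32-bit displacement.
DISP_SIZED_MODES = ("base+disp", "base+index+disp", "base+index*scale+disp")
DISP_WIDTHS = ("disp8", "disp32")


def effective_modes() -> int:
    """Nine modes, with each displacement-sized mode split into two variants."""
    split = [f"{mode}/{width}" for mode, width in product(DISP_SIZED_MODES, DISP_WIDTHS)]
    unsplit = BASE_MODES - len(DISP_SIZED_MODES)
    return unsplit + len(split)


def total_modes() -> int:
    """Effective modes plus the %esp/%ebp special cases."""
    return effective_modes() + SPECIAL_ESP_EBP_MODES


def naive_descriptions(memory_form_opcodes: int) -> int:
    """One ADL instruction per (opcode, addressing mode) pair."""
    return memory_form_opcodes * total_modes()


if __name__ == "__main__":
    print(effective_modes(), total_modes(), naive_descriptions(400))  # prints: 12 18 7200
```

Even under this rough assumption the naive enumeration runs to several thousand descriptions, which is the blow-up the templates below are meant to avoid.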
Another challenge is overlapping registers. IA-32 includes eight 32-bit general-purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI. However, to maintain backward compatibility, there are aliases for 8-bit and 16-bit parts of these registers. For example, AL, AH, and AX all describe different parts of the EAX register: AL and AH are the lower and upper 8 bits of the 16-bit register AX, and AX is the low 16 bits of the 32-bit register EAX.
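The aliasing can be illustrated with a minimal Python sketch in which one stored 32-bit value is exposed through masked views, so that a write to AL or AX changes EAX and vice versa. The class and property names (GPR, r32, r16, h8, l8) are hypothetical and only illustrate the bit layout, not how FAST represents registers.

```python
MASK32 = 0xFFFF_FFFF


class GPR:
    """A 32-bit register with its 16-bit and 8-bit aliases (e.g. EAX, AX, AH, AL).

    In IA-32 only EAX, ECX, EDX and EBX have the AH-style high-byte alias;
    this sketch does not enforce that restriction.
    """

    def __init__(self, value: int = 0) -> None:
        self.value = value & MASK32

    @property
    def r32(self) -> int:            # EAX
        return self.value

    @r32.setter
    def r32(self, v: int) -> None:
        self.value = v & MASK32

    @property
    def r16(self) -> int:            # AX: low 16 bits
        return self.value & 0xFFFF

    @r16.setter
    def r16(self, v: int) -> None:   # the upper 16 bits are preserved
        self.value = (self.value & 0xFFFF_0000) | (v & 0xFFFF)

    @property
    def h8(self) -> int:             # AH: bits 8..15
        return (self.value >> 8) & 0xFF

    @h8.setter
    def h8(self, v: int) -> None:
        self.value = (self.value & 0xFFFF_00FF) | ((v & 0xFF) << 8)

    @property
    def l8(self) -> int:             # AL: bits 0..7
        return self.value & 0xFF

    @l8.setter
    def l8(self, v: int) -> None:
        self.value = (self.value & 0xFFFF_FF00) | (v & 0xFF)
```

For example, with eax = GPR(0x12345678), eax.h8 is 0x56 and eax.l8 is 0x78; after eax.l8 = 0xFF, eax.r32 reads 0x123456FF.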
Finally, unlike instructions in RISC architectures, instructions in CISC machines can operate directly on memory, which gives rise to instructions with a wide variety of arguments. The mov instruction, for example, takes two arguments, a source and a destination: one must be a register, and the other can be either a register or a memory address. A memory address can use any one of the 18 addressing modes. In addition to creating a plethora of instructions, this makes the pipeline specification extremely complex.
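As a rough illustration of the "plethora", the following Python sketch encodes the operand rule stated above and counts the operand forms of mov for a single operand size, treating each memory addressing mode as a separate form. The names are hypothetical, and immediate-source forms of mov are deliberately left out.

```python
from enum import Enum


class Operand(Enum):
    REG = "register"
    MEM = "memory"


def mov_operands_legal(src: Operand, dst: Operand) -> bool:
    """The rule from the text: at least one operand must be a register."""
    return Operand.REG in (src, dst)


def mov_forms(addressing_modes: int = 18) -> int:
    """Count operand forms if every memory addressing mode is a distinct form."""
    forms = 0
    for src in Operand:
        for dst in Operand:
            if not mov_operands_legal(src, dst):
                continue  # memory-to-memory is not allowed
            memory_operands = (src, dst).count(Operand.MEM)
            # each memory operand may use any of the addressing modes
            forms += addressing_modes ** memory_operands
    return forms
```

With 18 modes this gives 1 register-to-register form plus 18 load forms plus 18 store forms, i.e. 37 forms for one mnemonic at one operand size.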
Although some of these issues have been addressed in the context of instruction set specification by the SLED approach [10, 11], that approach is inadequate for automatically generating simulators. Although one can describe the x86 ISA in fewer than 500 lines of SLED, the language was designed only for encoding and decoding instructions (as its name implies). Many x86 instructions are encoded and decoded the same way, differing only in the opcode, so SLED uses patterns to define many instructions in one line. To tie in the microarchitecture specification, however, one must be able to specify the semantics of each instruction. Since the semantics of these instructions differ widely, they cannot be attached to many opcodes in one line, and an alternative technique must be sought.
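The split between shared encodings and per-instruction semantics can be illustrated with a Python sketch (ours, not SLED or ADL syntax). It uses the IA-32 register-direct forms of five "op r/m32, r32" instructions: a single encoding function covers all of them, while each one still needs its own semantic function. Flags and the memory forms are left out.

```python
MASK32 = 0xFFFF_FFFF

# Register numbers as used in the ModR/M reg and r/m fields.
REG_NUM = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
           "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# "op r/m32, r32" opcodes: the encodings differ only in this byte.
OPCODES = {"add": 0x01, "or": 0x09, "and": 0x21, "sub": 0x29, "xor": 0x31}

# Semantics, by contrast, must be written out per instruction.
SEMANTICS = {
    "add": lambda dst, src: (dst + src) & MASK32,
    "or": lambda dst, src: dst | src,
    "and": lambda dst, src: dst & src,
    "sub": lambda dst, src: (dst - src) & MASK32,
    "xor": lambda dst, src: dst ^ src,
}


def encode_rr(mnemonic: str, src: str, dst: str) -> bytes:
    """One shared pattern: opcode, then ModR/M with mod=11 (register direct)."""
    modrm = (0b11 << 6) | (REG_NUM[src] << 3) | REG_NUM[dst]
    return bytes([OPCODES[mnemonic], modrm])


def execute(mnemonic: str, dst_value: int, src_value: int) -> int:
    """Apply the instruction's own semantics; flags are ignored in this sketch."""
    return SEMANTICS[mnemonic](dst_value, src_value)
```

For instance, encode_rr("add", "ecx", "eax") yields the bytes 01 C8 (add %ecx, %eax in AT&T syntax). Adding another opcode to the encoding table costs one entry, but each new entry in the semantics table is genuinely new behaviour.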
Our Solutions
Most of the problems we encountered (overlapping registers, variable-length instructions, etc.) were handled by a simple extension of ADL or a modification of existing syntax. The real problem with x86 was the combination of instructions with multiple addressing modes. With over 400 instructions, many of them using up to 18 memory addressing modes plus register-to-register arguments, we could easily end up with thousands of descriptions if we enumerated each permutation as a separate instruction.
Figure 3. Use of typesets to describe addressing modes
Our careful study of the SLED encoding scheme, where patterns are used extensively, led us to ask whether we could define an encoding pattern for the addressing modes and let each instruction inherit the right pattern. This turned out to be the key idea behind the x86 extensions to ADL: treat instructions as objects and use multiple inheritance with a twist! The encoding patterns are defined by a series of templates, and the real instructions inherit their properties from these patterns. The objects in ADL are the instruction templates and the instructions themselves. Templates differ from normal instructions in two ways.
Fields in instruction templates can be grouped or made optional with a regular-expression-like syntax [7]. Parentheses group fields together, (field1 field2 ...), to indicate that all of the fields in the group must appear together; that is, field1 cannot exist without field2, and vice versa. This is useful for larger fields such as the ModR/M byte in IA-32, which consists of three smaller fields used to describe how memory and/or registers will be addressed: the 2-bit mod field, the 3-bit reg/op field, and the 3-bit r/m field. A '?' following a field indicates that the field is optional. For example, the SIB byte is optional depending on the ModR/M byte, so it appears as (scale index base)?. A '?' is really just shorthand for the {n,m} syntax (with n=0 and m=1), which says that the previous item must appear at least n times but no more than m times. Finally, a '|' indicates logical-or, which is useful for fields that vary in size. Some instructions have 8-bit immediates, some 16-bit, some 32-bit, and some none at all, so, putting it all together: (imm8 | imm16 | imm32)?
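Because the template syntax borrows from regular expressions, its meaning can be approximated with Python's re module over a sequence of field names, as in the sketch below. The mapping is our assumption, not FAST's implementation: each field name becomes a token followed by a space, ( ... ) becomes a regex group, and ?, {n,m} and | keep their regex meaning. Nesting the SIB group inside the ModR/M group is also our choice, reflecting that the SIB byte can only follow a ModR/M byte; the disp8/disp32 sizes follow the 32-bit modes discussed above.

```python
import re
from typing import Sequence

INTEL_EMIT = re.compile(
    r"(prefix ){0,4}"                        # up to four prefix bytes
    r"(opcode ){1,2}"                        # one- or two-byte opcode
    r"(mod reg rm (scale index base )?)?"    # ModR/M group; SIB only after it
    r"(disp8 |disp32 )?"                     # optional displacement
    r"(imm8 |imm16 |imm32 )?"                # optional immediate
)


def matches(fields: Sequence[str]) -> bool:
    """True if the field sequence is allowed by the INTEL_EMIT pattern."""
    text = "".join(f"{name} " for name in fields)
    return INTEL_EMIT.fullmatch(text) is not None
```

A sequence such as opcode, mod, reg, rm is accepted, while opcode, mod, reg is rejected because the ModR/M group must appear whole, and a lone SIB group without ModR/M is rejected as well.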
Templates can inherit properties from other instruction templates and override fields or sections of the parent. This allows the creation of a master template. A master template is really just another template (i.e., it is not a special type of template), but it helps the programmer avoid syntactic errors. In IA-32, there is one general instruction format:
Prefixes | Opcode | ModR/M | SIB | Displacement | Immediate
In this format there may be up to 4 prefixes, each 1 byte long. The opcode field can be 1 or 2 bytes long and is followed by the optional ModR/M and SIB bytes. The Displacement and Immediate fields can each be anywhere from 0 to 4 bytes long. Templates and instructions inherit the properties illustrated in Figure 4.
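How the optional fields depend on one another can be made concrete with a small Python decoder for the ModR/M and SIB bytes under 32-bit addressing. This is a sketch with names of our own choosing; it covers only the field split and the rules that decide whether an SIB byte and a displacement follow.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModRM:
    mod: int  # 2 bits: addressing form
    reg: int  # 3 bits: register number or opcode extension
    rm: int   # 3 bits: register or memory operand


def split_modrm(byte: int) -> ModRM:
    return ModRM(mod=(byte >> 6) & 0b11, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


def split_sib(byte: int) -> Tuple[int, int, int]:
    """Return (scale factor, index register number, base register number)."""
    return 1 << ((byte >> 6) & 0b11), (byte >> 3) & 0b111, byte & 0b111


def has_sib(m: ModRM) -> bool:
    # With a memory operand (mod != 11), r/m = 100 means an SIB byte follows.
    return m.mod != 0b11 and m.rm == 0b100


def disp_bytes(m: ModRM, sib_base: Optional[int] = None) -> int:
    """Displacement size in bytes implied by ModR/M (and the SIB base, if any)."""
    if m.mod == 0b01:
        return 1
    if m.mod == 0b10:
        return 4
    if m.mod == 0b00:
        if m.rm == 0b101:
            return 4  # disp32 with no base register
        if has_sib(m) and sib_base == 0b101:
            return 4  # SIB with no base register: disp32
    return 0
```

For example, mov 8(%esp), %eax encodes as 8B 44 24 08: ModR/M 0x44 has mod=01 and r/m=100, so an SIB byte (0x24: scale 1, no index, base %esp) and an 8-bit displacement follow. This is the %esp special case that the addressing-mode count above has to treat separately.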
The first item to notice is on the 3rd line: no arguments are given to the generic instruction name intel. The non-existent arguments will be overridden by the templates that follow. The emit line, on the other hand, defines every field that a descendant might emit and uses the ? and {n,m} modifiers to indicate optional fields. Only two attributes are defined at this time; more will probably be needed when the microarchitecture is implemented. Finally, three pipeline stages are used to execute the instruction: an execute stage, plus memory-access stages before and after it for instructions that need them (more on this below). Again, the pipeline stages may change once the microarchitecture is implemented.
An instruction that inherits from this intel master template is free to override the arguments, any of the emit fields, any of the attributes, or any pipeline stage. (Note that if a pipeline stage is overridden, the entire stage must be overridden, even if only one line is changed.) Inheritance is indicated by the inherits keyword following the instruction's arguments as shown in Figure 4 .
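The inheritance rules can be mimicked with ordinary Python classes, as in the sketch below; it is an analogy, not ADL. The class names, the stage method names (mem_before, ex, mem_after) and the EMIT dictionaries are hypothetical stand-ins for the intel master template, a template like intel_r8_bis, and an instruction built on it. Overriding a Python method replaces the whole method, which mirrors the rule that an overridden pipeline stage must be rewritten in full, while emit fields are merged so that a child overrides only the fields it names.

```python
from typing import Any, Dict

State = Dict[str, Any]


class Intel:
    """Stand-in for the 'intel' master template: no arguments, three stages."""

    ARGS: tuple = ()
    EMIT: Dict[str, Any] = {"opcode": None}

    # Three stages, as in the text: memory access, execute, memory access.
    def mem_before(self, s: State) -> None:
        pass

    def ex(self, s: State) -> None:
        pass

    def mem_after(self, s: State) -> None:
        pass

    @classmethod
    def emit_fields(cls) -> Dict[str, Any]:
        """Merge EMIT from the root template down, so children override parents."""
        merged: Dict[str, Any] = {}
        for klass in reversed(cls.__mro__):
            merged.update(klass.__dict__.get("EMIT", {}))
        return merged

    def run(self, s: State) -> State:
        for stage in (self.mem_before, self.ex, self.mem_after):
            stage(s)
        return s


class IntelR8BIS(Intel):
    """Template: 8-bit destination register, base+index*scale memory operand."""

    ARGS = ("rd8", "bis")
    EMIT = {"modrm": "rm=100", "sib": "scale index base"}

    def mem_before(self, s: State) -> None:
        # Load one byte into a temporary pipeline register.
        s["temp"] = s["memory"][s["addr"]] & 0xFF


class MovR8BIS(IntelR8BIS):
    """A real instruction: supplies its opcode and execute-stage semantics."""

    EMIT = {"opcode": 0x8A}  # IA-32 opcode for mov r8, r/m8

    def ex(self, s: State) -> None:
        s["rd8"] = s["temp"]
```

Running MovR8BIS().run({"memory": {0x10: 0xAB}, "addr": 0x10}) leaves 0xAB in rd8: the load comes from the template's memory stage and the move from the instruction's own execute stage.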
The template intel_r8_bis has two arguments: an 8-bit destination register, rd8, and a memory location addressed in base-index-scale mode. It inherits from the intel master template and then defines exactly which fields are emitted for this type of instruction. The scale, index, and base functions are built into ADL and, with the help of the regular expressions for the addressing modes, return the respective values for scale, index, and base. Finally, the s_MEM pipeline stage loads a byte into a temporary pipeline register, which is used by the s_EX stage in instructions that inherit from this template. (The code that actually loads from memory will be specific to how the microarchitecture is implemented, so for now we describe what has to be done in comments.)
Figure 4. Instruction templates and using inheritance
Memory Addressing and Conditional Inheritance: Once all the templates are defined as shown above, the final step is to create conditional inheritance. This borrows from the idea of multiple inheritance, except that instead of inheriting all of the features of all of its parents, an instruction inherits only from the one parent that fits best. The best fit is determined by the arguments to the instruction. (Note that this implies that the inherited arguments must be unambiguous.)
This allows us to create one template for each addressing mode. Each template has the emit fields and other common properties defined. A child that inherits from the template then overrides the emitted opcode field and defines in the pipeline stage exactly what the instruction does (i.e., the semantics that make languages like SLED infeasible for our work). The common addressing modes are then combined, using the conditional inheritance feature, into one template from which the real instructions inherit. To reinforce the idea that this is not traditional multiple inheritance, the | operator (logical or) is used to separate the parents. For example, instructions that have a 32-bit register as the destination and a 32-bit word in memory as the source would inherit from the intel_r32_rm32 template shown in Figure 5.
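Conditional inheritance can be approximated in Python as choosing, among a template's parents, the first one whose argument pattern accepts the instruction's operand. This is a sketch under stated assumptions: intel_r32_bd8, intel_r32_bd32 and intel_r32_bisd8 are names used in this paper, intel_r32_bisd32 is a hypothetical fourth parent, the %esp/%ebp special-case parents are omitted, and listing parents in preference order (short displacements first) is our stand-in for the pragma that lets the compiler prefer an 8-bit displacement.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Mem:
    """A memory operand: disp(base, index, scale) in AT&T terms."""
    base: Optional[str]
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


@dataclass(frozen=True)
class Parent:
    name: str
    accepts: Callable[[Mem], bool]


def fits8(d: int) -> bool:
    return -128 <= d <= 127


def fits32(d: int) -> bool:
    return -(2**31) <= d < 2**31


def base_only(m: Mem) -> bool:
    return m.base is not None and m.index is None


def base_index(m: Mem) -> bool:
    return m.base is not None and m.index is not None


# Parents of a hypothetical intel_r32_rm32, in preference order.
INTEL_R32_RM32: Sequence[Parent] = (
    Parent("intel_r32_bd8", lambda m: base_only(m) and fits8(m.disp)),
    Parent("intel_r32_bd32", lambda m: base_only(m) and fits32(m.disp)),
    Parent("intel_r32_bisd8", lambda m: base_index(m) and fits8(m.disp)),
    Parent("intel_r32_bisd32", lambda m: base_index(m) and fits32(m.disp)),
)


def select_parent(parents: Sequence[Parent], operand: Mem) -> Optional[str]:
    """Return the first (best-fitting) parent that accepts the operand, or None."""
    for parent in parents:
        if parent.accepts(operand):
            return parent.name
    return None
```

A base-plus-small-displacement operand such as Mem("ebp", disp=8) selects intel_r32_bd8, Mem("ebx", "esi", 4, 4) selects intel_r32_bisd8, and an operand with no base register selects nothing, in which case a compiler would go on to try other templates.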
    mov inherits intel_r32_rm32
        emit opcode=0x8B
        begin exact s_EX
            rd32 = temp;
        end;
    end,
Figure 5. Conditional inheritance and an example instruction using inheritance
Each of the 18 templates that intel_r32_rm32 inherits from defines an addressing mode (12 modes plus 6 special modes for using %esp or %ebp as a base register). A real instruction then inherits from intel_r32_rm32, as shown in Figure 5. To see this in action, consider the following x86 instructions:
The instruction is the same in the first two cases, movl, but the arguments differ. However, they differ in a unique and unambiguous way that allows the compiler to match each one against only one parent. The first instruction matches intel_r32_rm32 through its parent intel_r32_bd8 (base + 8-bit displacement). (Technically it also matches intel_r32_bd32, but a pragma lets the compiler choose the 8-bit displacement when it can.) Likewise, the second instruction matches intel_r32_rm32, but through a different parent, intel_r32_bisd8. The third instruction matches none of the parents of intel_r32_rm32, so the compiler looks for another instruction to match against (in this case, mov inherits intel_r16_rm16 and its parent intel_r16_bd8).
Overlapping Registers: C-style unions and typesets were introduced to deal with overlapping registers. First, the physical registers are defined as generic
Conclusions and Future Work
The x86 is a powerful and compact ISA, but it is this same compactness that makes it so difficult to work with (in compilers, in simulators, and elsewhere). We have shown that, by introducing the notion of conditional multiple inheritance, we can tackle the most difficult challenges of x86 within an architecture description language. Our future work on IA-32 in FAST falls into two broad areas: implementing the language constructs in the ADL compiler and completing the microarchitecture specification.
