Abstract. There is growing interest in analyzing executables to look for bugs and security vulnerabilities. This paper describes the design and implementation of a language for describing the semantics of an instruction set, along with a runtime system to support the static analysis of executables written in that instruction set. The work advances the state of the art by creating multiple analysis phases from a specification of the concrete operational semantics of the language to be analyzed. By exploiting this powerful infrastructure for creating analysis components, it will be possible for recently developed techniques for analyzing executables to be applied more broadly, to executables written in a variety of instruction sets.
Introduction
The problem of analyzing executables to recover information about their execution properties has been receiving increased attention. However, much of this work has focused on specialized analyses to identify aliasing relationships [19], data dependences [7, 13], targets of indirect calls [18], values of strings [12], bounds on stack height [34], and values of parameters and return values [40]. In contrast, Balakrishnan and Reps [8, 10] developed ways to address all of these problems by means of an analysis that discovers an overapproximation of the set of states that can be reached at each point in the executable, where a state means all of the state: values of registers, flags, and the contents of memory. Moreover, their approach can be applied to stripped executables (i.e., neither source code nor symbol-table/debugging information is available).
Although their techniques, in principle, are language-independent, they were instantiated only for the Intel IA32 instruction set. Our motivation is to provide a systematic way of extending those analyses (and others) to instruction sets other than IA32. The situation that we face is actually typical of much work on program analysis: although the techniques described in the literature are, in principle, language-independent, implementations are often tied to a specific language or intermediate representation (IR). This state of affairs reduces the impact that good ideas developed in one context (e.g., Java program analysis) have in other contexts (e.g., C++ analysis).
For high-level languages, the situation has been addressed by developing common intermediate languages, e.g., GCC's RTL, Microsoft's MSIL, etc. (although the academic research community has not rallied around a similar common platform). The situation is more serious for low-level instruction sets, because of (i) instruction-set evolution over time (and the desire to have backward compatibility as word size increased from 8 bits to 64 bits), which has led to instruction sets with several hundred instructions, and (ii) a variety of architecture-specific features that are incompatible with other architectures.
To address these issues, we developed a language for describing the semantics of an instruction set, along with a run-time system to support the static analysis of executables written in that instruction set. The work reported in this paper advances the state of the art by creating a system for automatically generating analysis components from a specification of the language to be analyzed. The system that we have created, called TSL (for "Transformer Specification Language"), has two classes of users: (1) instruction-set-specification (ISS) developers and (2) analysis developers. The former are involved in specifying the semantics of different instruction sets; the latter are involved in extending the analysis framework.
In the design of the TSL system, we were guided by the following principles:
- There should be a formal language for specifying the semantics of the language to be analyzed. Moreover, ISS developers should specify only the abstract syntax and a concrete operational semantics of the language to be analyzed; each analyzer should be automatically generated from this specification.
- Concrete syntactic issues, including (i) decoding (machine code to abstract syntax), (ii) encoding (abstract syntax to machine code), (iii) parsing assembly (assembly code to abstract syntax), and (iv) assembly pretty-printing (abstract syntax to assembly code), should be handled separately from the abstract syntax and concrete semantics.
- Transformer-composition analyses [16, 37], which are particularly useful for context-sensitive interprocedural analysis.
- Unification-based analyses for flow-insensitive interprocedural analysis.
In addition, an emulator (for the concrete semantics) is also supported.
Implemented Analyses. These mechanisms have been instantiated for a number of specific analyses that are useful for analyzing low-level code, including: value-set analysis [8, 10] (§4.1), affine-relation analysis [8, §7.2] (§4.2), aggregate structure identification [11] (§4.3), def-use analysis (for memory, registers, and flags) (§4.4), and generation of symbolic expressions for an instruction's semantics (§4.5).
Established Applicability. The capabilities of our approach have been demonstrated by writing specifications for IA32 and PowerPC. These are nearly complete specifications of the languages-not idealized subsets, as are often used in academic studies-and include such features as (1) aliasing among 8-bit, 16-bit, and 32-bit registers, e.g., al, ah, ax, and eax (for IA32), (2) endianness, (3) issues arising due to bounded-wordsize arithmetic (overflow/underflow, carry and borrow, shifting, rotation, etc.), and (4) setting of condition codes (and their subsequent interpretation at jump instructions).
The abstract transformers for these analyses that are created from the IA32 and PowerPC32 TSL specifications have been put together to create a system that essentially duplicates CodeSurfer/x86 [9] . A similar analysis system for PowerPC is under construction. (The TSL-generated components are in place; only a few mundane infrastructure components are lacking.)
We have also experimented with sufficiently complex features of other low-level languages (e.g., register windows for Sun SPARC and conditional execution of instructions for ARM) to know that they fit our specification and implementation models.
There are many specification languages for instruction sets and many purposes to which they have been applied. In our work, we needed a mechanism to create abstract interpreters of instruction-set specifications. There are (at least) four issues that arise: during the abstract interpretation of each transformer, the abstract interpreter must be able to (i) execute over abstract states, (ii) execute both branches of a conditional expression, (iii) compare abstract states and terminate abstract execution when a fixed point is reached, and (iv) apply widening operators, if necessary, to ensure termination. Such a mechanism did not appear to be available in the languages that we looked at. As far as we know, TSL is the first instruction-set-specification language to support such mechanisms.
Although this paper only discusses the application of TSL to low-level instruction sets, we believe that only small extensions would be needed to be able to apply TSL to source-code languages (i.e., to create language-independent analyzers for source-level IRs). The main obstacle is that the concrete semantics of a source-code language generally uses an execution state based on nested variable-to-value (or variable-to-location, location-to-value) maps. For a low-level language, the state incorporates an address-based memory model, for which the TSL language provides appropriate primitives.
The remainder of the paper is organized as follows: §2 introduces TSL and the capabilities of the system. §3 presents how the TSL system handles some important issues, such as recursion and conditional branches in CIR. §4 explains how CIR is instantiated to create an analyzer for a specific analysis component. §5 describes quirky features of several instruction sets, and discusses how those features are handled in TSL. §6 discusses related work.
Overview of the TSL System
This section provides an overview of the TSL system. We discuss how three analysis components are created automatically from a TSL specification, using a fragment of the IA32 instruction set to illustrate the process. Fig. 1 shows part of a specification of the IA32 instruction set taken from the manual [1]. The specification contains information about the registers as well as the addressing modes that are supported. It also provides the specification of the ADD instruction's action, i.e., how it manipulates its operands and how it changes the state.

However, the specification from Fig. 1 is only semi-formal: it uses a mixture of English and pseudo-code. Our work is based on completely formal specifications, which are written in a language that we designed (TSL). TSL is a first-order functional language with a datatype-definition mechanism for defining recursive datatypes, plus deconstruction by means of pattern matching. Fig. 2 shows the part of the TSL specification that corresponds to Fig. 1.

Fig. 2 (excerpt, TSL specification):
[25] ADD32_32(dstOp, srcOp):
[26]   let dstVal = interpOp(S, dstOp);
[27]       srcVal = interpOp(S, srcOp);
[28]       res = dstVal + srcVal;
[29]       S2 = updateFlag(S, dstVal, srcVal, res);
[30]   in ( updateState( S2, dstOp, res ) ),
[31] .

Fig. 3 (excerpt, generated CIR):
[25] switch(I.id) {
[26]   case ID_ADD32_32: {
[27]     operand32 dstOp = I.op1;
[28]     operand32 srcOp = I.op2;
[29]     INTERP::INT32 dstVal = interpOp(S, dstOp);
[30]     INTERP::INT32 srcVal = interpOp(S, srcOp);
[31]     INTERP::INT32 res = INTERP::Add(dstVal,srcVal);
[32]     state S2 = updateFlag(S, dstVal, srcVal, res);
[33]     ans = updateState(S2, dstOp, res);
[34]   } break;
[35] .
TSL from an ISS Developer's Standpoint
Much of what an ISS developer writes is similar to writing an interpreter for an instruction set in first-order ML [20]. An ISS developer specifies the abstract-syntax grammar by defining the constructors for a language of instructions (lines 2-10), a concrete-state type (lines 13-15), and the concrete semantics of each instruction (lines 23-33 of Fig. 2). These form part of the API available to analysis engines that use the TSL-generated transformers (see §4). The reserved types are used as an interface between the CIR and analysis domain implementations.
The definition of types and constructors on lines 2-10 of Fig. 2 is an abstract-syntax grammar for IA32. The definitions for var32 and var_bool wrap the user-types reg32 and flag, respectively. Type reg32 consists of nullary constructors for IA32 registers, such as EAX() and EBX(); flag consists of nullary constructors for the IA32 condition codes, such as ZF() and SF(). Lines 4-7 define types and constructors to represent the various kinds of operands that IA32 supports, i.e., various sizes of direct register, immediate, and indirect memory operands. The reserved (but user-defined) type instruction consists of user-defined constructors for each instruction, such as ADD32_32 and ADD16_16, which represent instructions with different operand sizes.
The type state specifies the structure of the execution state. The state for IA32 is defined on lines 13-15 of Fig. 2 to consist of a memory-map, a register-map, and a flag-map. The concrete semantics is specified by writing a function named interpInstr (see line 23 of Fig. 2), which maps an instruction and a state to a state.

Fig. 3 shows part of the TSL CIR automatically generated from Fig. 2. Each generated CIR is specific to a given instruction-set specification, but common (whence the name CIR) across generated analyses. Each generated CIR is a template class that takes as input INTERP, an abstract domain for an analysis (lines 1-2). The user-defined abstract syntax (lines 2-10 of Fig. 2) is translated to a set of C++ abstract-domain classes (lines 3-15 of Fig. 3) that contain appropriate abstract operators. The user-defined types, such as reg32, operand32, and instruction, are translated to abstract C++ classes, and the constructors, such as EAX, Indirect32, and ADD32_32, are subclasses of the parent abstract C++ class. Each user-defined function is translated to a CIR member function.
Common Intermediate Representation (CIR)
Each TSL basetype and basetype-operator is prepended with the template parameter name INTERP; INTERP is supplied for each analysis by an analysis designer. The with expression and the pattern matching on lines 24-25 of 
TSL from an Analysis Developer's Standpoint
The generated CIR is instantiated for an analysis by defining (in C++) an interpretation: a representation class for each TSL basetype, and implementations of each TSL basetype-operator and built-in function. Tab. 2 shows the implementations of primitives for three selected analyses: value-set analysis (VSA, see §4.1), quantifier-free bit-vector semantics (QFBV, see §4.5), and def-use analysis (DUA, see §4.4).
Each interpretation defines an abstract domain. For example, line 3 of each column defines the abstract-domain class for INT32: ValueSet32, QFBVTerm32, and UseSet. Each abstract domain is also required to contain a set of reserved functions, such as join, meet, and widen, which forms an additional part of the API available to analysis engines that use TSL-generated transformers (see §4).
Note that the work that an analysis developer performs is TSL-specific but independent of each language to be analyzed; from the interpretation that defines an analysis, the abstract transformers for that analysis can be generated automatically for every instruction set for which one has a TSL specification.
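As a rough illustration of this division of labor, the following C++ sketch shows a single function written against a template parameter INTERP and then instantiated with two different interpretations. The names MiniCIR, ConcreteInterp, UseInterp, Reg, and addValues are hypothetical and are not part of the TSL system; only the idea that each analysis supplies its own INTERP::INT32 and INTERP::Add follows the text.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical concrete interpretation: INT32 is a machine integer.
struct ConcreteInterp {
    using INT32 = std::uint32_t;
    static INT32 Add(INT32 a, INT32 b) { return a + b; }  // wraps modulo 2^32
};

// Hypothetical def-use-style interpretation: INT32 is the set of names read.
struct UseInterp {
    using INT32 = std::set<std::string>;
    static INT32 Reg(const std::string& r) { return INT32{r}; }
    static INT32 Add(const INT32& a, const INT32& b) {
        INT32 out = a;
        out.insert(b.begin(), b.end());
        return out;
    }
};

// Stand-in for a generated CIR function: written once, reused for every INTERP.
template <typename INTERP>
struct MiniCIR {
    static typename INTERP::INT32 addValues(const typename INTERP::INT32& dst,
                                            const typename INTERP::INT32& src) {
        return INTERP::Add(dst, src);
    }
};
```

Instantiating MiniCIR<ConcreteInterp> yields ordinary 32-bit addition, whereas MiniCIR<UseInterp> yields the union of the names used by the two operands; the body of addValues is the same in both cases.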
Generated Transformers
Consider the instruction "add ebx,eax", which causes the sum of the values of the 32-bit registers ebx and eax to be assigned into ebx. When the CIR of Fig. 3 is instantiated with the three interpretations of Tab. 2, the resulting code implements the three transformers presented (using mathematical notation) in Tab. 3.

[Table 3. Transformers generated by the TSL system. Columns: Analysis; Generated Transformers for "add ebx,eax".]

[Figure caption fragment: The grey boxes represent TSL-generated analysis components.]

The TSL system provides two dimensions of parameterizability: different instruction sets and different analyses. Each ISS developer specifies an instruction-set semantics, and each analysis developer defines an abstract domain for a desired analysis by giving an interpretation (i.e., the implementations of TSL basetypes, basetype-operators, and access/update functions). Given the inputs from these two classes of users, the TSL system automatically generates an analysis component. Thus, to create M × N analysis components, the TSL system only requires M specifications of the concrete semantics of instruction sets, and N analysis implementations (Fig. 4), i.e., M + N inputs to obtain M × N analysis-component implementations.
The TSL system provides considerable leverage for implementing analysis tools and experimenting with new ones. New analyses are easily implemented because a clean interface is provided for defining an interpretation. It took approximately 1 man-day to create each of the DUA and QFBV interpretations.
Another measure of success is demonstrated by our effort to use TSL to recreate the analysis components used in CodeSurfer/x86 [9] . We estimate that the task of writing transformers (for eight analysis phases used in CodeSurfer/x86) consumed about 20 man-months; in contrast, we have invested a total of about 1 man-month to write the C++ code for the set of TSL interpretations that are used to generate the replacement components. To this, one should add 10-20 man-days to write the TSL specification for IA32: the current specifications for IA32 and PowerPC are, respectively, 2,834 and 1,370 (non-comment, non-blank) lines of TSL; the IA32 specification has gone through multiple revisions as the TSL system took shape; however, the PowerPC specification was written after the language stabilized, and took approximately 4 man-days.
Because each analysis is defined by providing an interpretation for the collection of TSL primitives, implementations of the abstract transformers for each analysis can be generated automatically for every instruction set for which one has a TSL specification. For instance, from the PowerPC specification, we were immediately able to generate all of the analyses that had been developed while working with the IA32 specification.
Ever since the days of the first compilers, systems that take over programming tasks previously performed manually have faced the question of how well their output performs compared to that created by human programmers. Due to the nature of the transformers used in one of the analyses that we implemented (affine-relation analysis (ARA) [28]), it was possible to write an algorithm to compare the TSL-generated ARA transformers and the hand-coded ARA transformers that were incorporated in CodeSurfer/x86. On a corpus of 542 instruction instances that covered various opcodes, addressing modes, and operand sizes, we found that the TSL-generated transformers were equivalent in 324 cases and more precise than the hand-coded transformers in the remaining 218 cases.⁵

In addition to leverage and thoroughness, for a system like CodeSurfer/x86 (which uses multiple analysis phases), automating the process of creating abstract transformers ensures semantic consistency; that is, because analysis implementations are generated from a single specification of the concrete semantics, a consistent view of the concrete semantics is guaranteed to be adopted by all of the analyses used in the system.

It takes approximately 8 seconds (on an Intel Pentium 4 with a 3.00GHz CPU and 2GB of memory, running CentOS 4) for the TSL compiler to compile the IA32 specification to C++, followed by approximately 20 minutes wall-clock time (on an Intel Pentium 4 with a 1.73GHz CPU and 1.5GB of memory, running Windows XP) to compile the generated C++.

⁵ Approximately 130 of the cases of improvement can be ascribed to "fatigue factor" on the part of the human programmer: the hand-coded versions adopted a pessimistic view and just treated certain instructions as always assigning an unknown value to the registers that they affected, regardless of the values of the arguments. Because the TSL-generated transformers are based on the ARA interpretation's definitions of the TSL basetype-operators, the TSL-generated transformers were more thorough: a basetype-operator's definition in an interpretation is used in all places that the operator arises in the specification of the instruction set's concrete semantics.
Generation of the Common Intermediate Representation
Given a TSL specification of an instruction set, the TSL system generates CIR that consists of two parts: one is a list of C++ classes for the user-defined abstract-syntax grammar; the other is a list of C++ template functions for the user-defined functions, including the interface function interpInstr. The C++ functions are generated by linearizing the TSL specification, in evaluation order, into a series of C++ statements. However, there are some important issues that need to be properly handled for the resulting code to be usable for creating abstract interpreters for an instruction-set specification. In particular, the code generated for each transformer must be able to: (i) execute over abstract states (§3.1), (ii) possibly propagate abstract states to more than one successor in a conditional expression (§3.2), (iii) compare abstract states and terminate abstract execution when a fixed point is reached (§3.3), and (iv) apply widening operators, if necessary, to ensure termination (§3.3). In §3.4, we discuss an additional issue that arises in CIR generation, which is important for avoiding loss of precision for some generated analyzers.
Execution Over Abstract States
As discussed in §2.2

Fig. 6 (excerpt, TSL specification of repMovsd):
     State(memory, regs, flags):
[5]    let direction = VarBoolAccess(flags, DF());
[6]        edi = RegValue32(regs, EDI());
[7]        esi = RegValue32(regs, ESI());
[8]        src = MemAccess_32_8_LE_32(memory, esi);
[9]        newRegs = direction
[10]         ? RegUpdate32(RegUpdate32(
[11]             regs, EDI(), edi-4), ESI(), esi-4)
[12]         : RegUpdate32(RegUpdate32(
[13]             regs, EDI(), edi+4), ESI(), esi+4);
[14]       newS = State(MemUpdate_32_8_LE_32(
[15]             memory, edi, src), newRegs, flags);
[16]   in ( repMovsd(newS, count - 1) )
[17]

Fig. 7 (excerpt, generated CIR for repMovsd):
[6]  global_S = ⊥;
[7]  global_count = ⊥;
[8]  global_retval = ⊥;
[9]  return repMovsdAux(S, count);
[10]

Recursion is not often used in TSL specifications, but is needed for handling some instructions that involve iteration, such as the IA32 string-manipulation instructions (STOS, LODS, MOVS, etc., with various REP prefixes), and the PowerPC multiple-word load/store instructions (LMW, STMW, etc.). For these instructions, the amount of work performed is controlled by the value of a register, the value of one or more strings, etc. These instructions can be specified in TSL using recursion.⁶ For each recursive function specified by an ISS developer, the TSL system generates a function that appropriately compares abstract values and terminates the recursion if abstract values are found to be equal (i.e., the recursion has reached a fixed point). The function is also prepared to apply the widening operator that the analysis developer has specified for the abstract domain in use.

For example, Fig. 6 shows the user-defined TSL function that handles "rep movsd", which copies the contents of one area of memory to a second area.⁷ The amount of memory to be copied is passed into the function as the argument count. Fig. 7 shows its translation into the CIR. A recursive function like repMovsd (Fig. 6) is split into two functions, repMovsd (line 4 of Fig. 7) and repMovsdAux (line 11 of Fig. 7). The TSL system initializes appropriate global variables global_S and global_count (lines 6-8) in repMovsd, and then calls repMovsdAux (line 9). At the beginning of repMovsdAux, it generates statements that widen each of the global variables with respect to the arguments, and test whether all of the global variables have reached a fixpoint (lines 13-17). If so, repMovsdAux returns global_retval (line 18). If not, the body of repMovsdAux is analyzed again (lines 23-26). Note that at the translation of each normal return from repMovsdAux (e.g., line 27), the return value is joined into global_retval. The TSL system requires each analysis developer to define the functions join and widen for the basetypes of the interpretation used in the analysis.
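The widen-then-test pattern described above can be sketched in C++ on a much smaller example than "rep movsd". The sketch below analyzes a hypothetical recursive function countdown(n) = (n == 0) ? 0 : countdown(n - 1) over an interval domain. The Interval type, its join and widen operators, and the CountdownAnalyzer class are assumptions made for illustration; they are not the code that the TSL system generates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical interval domain; bottom means "no value seen yet".
struct Interval {
    bool bottom = true;
    std::int64_t lo = 0;
    std::int64_t hi = 0;
};

const std::int64_t NEG_INF = std::numeric_limits<std::int64_t>::min();
const std::int64_t POS_INF = std::numeric_limits<std::int64_t>::max();

Interval makeInterval(std::int64_t lo, std::int64_t hi) {
    Interval r;
    r.bottom = false;
    r.lo = lo;
    r.hi = hi;
    return r;
}

bool same(const Interval& a, const Interval& b) {
    if (a.bottom || b.bottom) return a.bottom == b.bottom;
    return a.lo == b.lo && a.hi == b.hi;
}

Interval join(const Interval& a, const Interval& b) {
    if (a.bottom) return b;
    if (b.bottom) return a;
    return makeInterval(std::min(a.lo, b.lo), std::max(a.hi, b.hi));
}

// Interval widening: any bound that moved jumps to infinity.
Interval widen(const Interval& old, const Interval& next) {
    if (old.bottom) return next;
    if (next.bottom) return old;
    return makeInterval(next.lo < old.lo ? NEG_INF : old.lo,
                        next.hi > old.hi ? POS_INF : old.hi);
}

Interval decrement(const Interval& x) {
    return makeInterval(x.lo == NEG_INF ? NEG_INF : x.lo - 1,
                        x.hi == POS_INF ? POS_INF : x.hi - 1);
}

// Abstract version of: countdown(n) = (n == 0) ? 0 : countdown(n - 1)
struct CountdownAnalyzer {
    Interval global_n;    // widened argument seen so far
    Interval global_ret;  // join of all return values seen so far

    Interval run(const Interval& n) {  // plays the role of the outer function
        global_n = Interval();
        global_ret = Interval();
        return aux(n);
    }

    Interval aux(const Interval& arg) {  // plays the role of the Aux function
        Interval widened = widen(global_n, arg);
        if (same(widened, global_n)) return global_ret;  // fixed point reached
        global_n = widened;
        const Interval n = global_n;
        if (n.lo <= 0 && 0 <= n.hi)       // the base case may be taken
            global_ret = join(global_ret, makeInterval(0, 0));
        if (!(n.lo == 0 && n.hi == 0)) {  // the recursive case may be taken
            Interval r = aux(decrement(n));
            global_ret = join(global_ret, r);
        }
        return global_ret;
    }
};
```

Starting from the argument [3, 3], the second call widens the argument to an interval with an infinite lower bound, the third call finds no further change, and the recursion stops after a bounded number of steps even though the abstract argument never becomes exactly zero.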
Two-Level CIR
The examples given in Figs. 3, 5, and 7 show slightly simplified versions of CIR code. The TSL system actually generates CIR code in which all the basetypes, basetype-operators, and access/update functions are appended with one of two predefined namespaces that define a two-level interpretation [29, 22]

Fig. 8 (excerpt):
[9]  let . . .
[10]     cia = RegValue32(S, CIA()); // current address
[11]     new_ia = (AA ? target // direct: BCA/BCLA
[12]                  : cia + target); // relative: BC/BCL
[13]     lr = RegValue32(S, LR()); // linkage address
[14]     new_lr =
[15]       (LK ? cia + 4 // change the link register: BCL/BCLA
[16]           : lr); // do not change the link register: BC/BCA
[17] . . .

[1] AddSubInstr(op, dstOp, srcOp): // ADD or SUB
[2]   let dstVal = interpOp(S, dstOp);
[3]       srcVal = interpOp(S, srcOp);
[4]       ans = (op == ADD() ? dstVal + srcVal
[5]                          : dstVal - srcVal); // SUB()
[6]   in ( . . . ),
[7] . . .
Fig. 9. An example of factoring in TSL.
The reason for using a two-level CIR is that the specification of an instruction set often contains some manipulations of values that should always be treated as concrete values. For example, an ISS developer could follow the approach taken in the PowerPC manual [2] and specify variants of the conditional branch instruction (BC, BCA, BCL, BCLA) of PowerPC by interpreting some of the fields in the instruction (AA and LK) to determine which of the four variants is being executed (Fig. 8) .
Another reason that this issue arises is that most well-designed instruction sets have many regularities, and it is convenient to factor the TSL specification to take advantage of these regularities when specifying the semantics. Such factoring leads to shorter specifications, but leads to the introduction of auxiliary functions in which one of the parameters holds a constant value for a given instruction. Fig. 9 shows an example of factoring. The IA32 instructions ADD and SUB both have two operands and can share the code for fetching the values of the two operands. Lines 4-5 are the instruction-specific operations; the equality expression "op == ADD()" on line 4 can be (and should be) interpreted in concrete semantics.
In both cases, the precision of an abstract transformer can sometimes be improved (and is never made worse) by interpreting subexpressions associated with the manipulation of concrete values in concrete semantics. For instance, consider a TSL expression let v = (b ? 1 : 2) that occurs in a context in which b is definitely a concrete value; v will get a precise value (either 1 or 2) when b is concretely interpreted. However, if b is not expressible precisely in a given abstract domain, the conditional expression "(b ? 1 : 2)" will be evaluated by joining the two branches, and v will not hold a precise value; it will hold the abstraction of {1, 2}.
To address this issue, we perform binding-time analysis [21] on the TSL code, the outcome of which is that expressions associated with the manipulation of concrete values in an instruction are annotated with C, and others with A. 
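The difference between interpreting a conditional such as "(b ? 1 : 2)" at the concrete level and at the abstract level can be sketched in C++ as follows. The types AbsInt and AbsBool and the functions condConcrete, condAbstract, and joinSets are hypothetical stand-ins chosen for illustration; they do not reproduce the namespaces that the TSL system actually generates.

```cpp
#include <cassert>
#include <set>

using AbsInt = std::set<int>;               // hypothetical abstract integer: a finite set of values
enum class AbsBool { False, True, Maybe };  // hypothetical abstract Boolean

AbsInt joinSets(const AbsInt& a, const AbsInt& b) {
    AbsInt out = a;
    out.insert(b.begin(), b.end());
    return out;
}

// Concrete-level conditional: the condition is an ordinary bool, so exactly one branch is chosen.
AbsInt condConcrete(bool b, const AbsInt& thenVal, const AbsInt& elseVal) {
    return b ? thenVal : elseVal;
}

// Abstract-level conditional: an imprecise condition forces a join of both branches.
AbsInt condAbstract(AbsBool b, const AbsInt& thenVal, const AbsInt& elseVal) {
    switch (b) {
        case AbsBool::True:  return thenVal;
        case AbsBool::False: return elseVal;
        case AbsBool::Maybe: return joinSets(thenVal, elseVal);
    }
    return joinSets(thenVal, elseVal);  // not reached for valid enum values
}
```

With a concrete condition the result is {1} or {2}; with the abstract condition Maybe the result is {1, 2}, which is the loss of precision that annotating expressions as concrete is meant to avoid.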
Generation of Static Analyzers
In this section, we explain how various analyses are created using our system, and illustrate this process with some specific analysis examples.
As illustrated in Fig. 4, a version of the interface function interpInstr is created for each analysis. Each analysis engine calls interpInstr at appropriate moments to obtain a transformer for an instruction being processed. Analysis engines can be categorized as follows:

For each analysis, the CIR is instantiated with an interpretation by an analysis developer. This mechanism provides wide flexibility in how one can couple the system to an external package. One approach, used with VSA, is that the analysis engine (written in C++) calls interpInstr directly. In this case, the instantiated CIR serves as a transformer evaluator: interpInstr is prepared to receive an instruction and an abstract state, and return an abstract state. Another approach, used in DUA, is used when interfacing to an analysis component that has its own input language for specifying abstract transformers. In this case, the instantiated CIR serves as a transformer generator: interpInstr is prepared to receive an instruction and a default abstract state⁸ and return a transformer specification in the analysis component's input language.
The following subsections discuss how the CIR is instantiated for various analyses.

⁸ In the case of transformer generation for a TC analyzer, the default state is the identity function.
Creation of a TA Transformer Evaluator for VSA
VSA is a combined numeric-analysis and pointer-analysis algorithm that determines a safe approximation of the set of numeric values and addresses that each register and memory location holds at each program point [10] . A memory region is an abstract quantity that represents all runtime activation records of a procedure. To represent a set of numeric values and addresses, VSA uses value-sets, where a value-set is a map from memory regions to strided intervals. A strided interval represents a set of numbers with a lower bound, an upper bound, and a stride [35] . VSA uses this transformer evaluator to create an output abstract state, given an instruction and an input abstract state. For example, row 1 of Tab. 3 shows the generated VSA transformer for the instruction "add ebx,eax". The VSA evaluator returns a new abstract state in which ebx is updated with the sum of the values of ebx and eax from the input abstract state and the flags are updated appropriately.
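To make the strided-interval representation concrete, the following C++ sketch defines a strided interval as a triple (stride, lower bound, upper bound) and gives a sound addition operator. It is a simplification under stated assumptions: it uses 64-bit integers and ignores the 32-bit wraparound that an analysis of machine code has to account for, and the names StridedInterval, contains, gcd64, and add are hypothetical rather than taken from the VSA implementation.

```cpp
#include <cassert>
#include <cstdint>

// stride[lb, ub] denotes { lb + k*stride | k >= 0, lb + k*stride <= ub }.
// A stride of 0 denotes the singleton { lb }.
struct StridedInterval {
    std::int64_t stride;
    std::int64_t lb;
    std::int64_t ub;
};

std::int64_t gcd64(std::int64_t a, std::int64_t b) {
    while (b != 0) {
        std::int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

bool contains(const StridedInterval& si, std::int64_t v) {
    if (v < si.lb || v > si.ub) return false;
    if (si.stride == 0) return v == si.lb;
    return (v - si.lb) % si.stride == 0;
}

// Over-approximates { a + b | a in x, b in y }: every such sum is congruent to
// x.lb + y.lb modulo gcd(x.stride, y.stride) and lies between the summed bounds.
StridedInterval add(const StridedInterval& x, const StridedInterval& y) {
    StridedInterval r;
    r.stride = gcd64(x.stride, y.stride);
    r.lb = x.lb + y.lb;
    r.ub = x.ub + y.ub;
    return r;
}
```

For example, adding 4[0, 8] = {0, 4, 8} and 2[1, 5] = {1, 3, 5} gives 2[1, 13], which contains every pairwise sum and excludes all even numbers.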
Creation of a TC Transformer Generator for ARA
An affine relation is a linear-equality constraint between integer-valued variables. ARA finds all affine relationships that hold in the program, for a given set of variables. This analysis is used to find induction-variable relationships between registers and memory locations; these help in increasing the precision of VSA when interpreting conditional branches [8, §7.2] .
The principle that is used to create a TC transformer generator is as follows: by interpreting the TSL expression that defines the semantics of an individual instruction using an abstract domain in which values represent transformers, each call to interpInstr will residuate a transformer for the instruction. In the case of ARA, the CIR is instantiated so that for each instruction, the generated transformer operates on an abstract domain whose values are sets of matrices that represent affine transformations on registers and memory locations of the state [28] .
Interpretation of Basetypes and Basetype-Operators. The abstract domain for the integer basetypes is a set of linear expressions in which variables are either a register or an abstract memory location; the actual representation of the domain is a set of columns that consist of an integer constant and an integer coefficient for each program variable. Each column represents an affine expression over the values that the variables hold at the beginning of the instruction. The basetype operations are defined so that only a set of linear expressions can be generated; any operation that leads to a non-linear expression, such as Times(eax, ebx), returns TOP, which means that no affine relationship is known to hold.
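The behavior of the basetype-operators on linear expressions can be sketched in C++ as follows. The sketch simplifies the domain to a single linear expression (one column) rather than a set of columns, and it ignores modular arithmetic; the names LinExpr, lit, var, topExpr, and scale are hypothetical and are used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// A linear expression: constant + sum of coeff[v] * v, or TOP (nothing known).
struct LinExpr {
    bool top = false;
    long constant = 0;
    std::map<std::string, long> coeff;
    bool isConstant() const { return !top && coeff.empty(); }
};

LinExpr lit(long c) { LinExpr e; e.constant = c; return e; }
LinExpr var(const std::string& name) { LinExpr e; e.coeff[name] = 1; return e; }
LinExpr topExpr() { LinExpr e; e.top = true; return e; }

// Multiplies every term of e by the constant k.
LinExpr scale(const LinExpr& e, long k) {
    LinExpr r;
    r.constant = e.constant * k;
    if (k != 0)
        for (const auto& kv : e.coeff) r.coeff[kv.first] = kv.second * k;
    return r;
}

// Addition keeps the result linear: constants and coefficients add up.
LinExpr Plus(const LinExpr& a, const LinExpr& b) {
    if (a.top || b.top) return topExpr();
    LinExpr r = a;
    r.constant += b.constant;
    for (const auto& kv : b.coeff) {
        long sum = r.coeff[kv.first] + kv.second;
        if (sum == 0) r.coeff.erase(kv.first);
        else r.coeff[kv.first] = sum;
    }
    return r;
}

// Multiplication stays linear only when one operand is a constant.
LinExpr Times(const LinExpr& a, const LinExpr& b) {
    if (a.top || b.top) return topExpr();
    if (a.isConstant()) return scale(b, a.constant);
    if (b.isConstant()) return scale(a, b.constant);
    return topExpr();  // e.g., Times(eax, ebx) is non-linear
}
```

In this sketch, Plus(var("eax"), lit(4)) yields the expression eax + 4, while Times(var("eax"), var("ebx")) yields TOP.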
Interpretation of Map-Basetypes and Access/Update Functions.
The abstract domain of the maps for ARA is a set of matrices of size (N + 1) × (N + 1), where N is the number of program variables. This abstraction, which is able to find all affine relationships in an affine program, was defined by Müller-Olm and Seidl [28]. Each access function extracts a set of columns associated with the variable it takes as an argument, from the set of matrices for its map argument (e.g., memmap or var32Map in Tab. 1). Each update function creates a new set of matrices that reflects the affine transformation associated with the update to the variable in question.
For each instruction, the ARA transformer relates linear-equality relationships that hold before the instruction to those that hold after execution of the instruction.
Creation of a UB Transformer Generator for ASI
ASI is a unification-based, flow-insensitive algorithm to identify the structure of aggregates in a program [11]. For each instruction, the transformer generator generates a set of ASI commands, each of which is either a command to split a memory region or a command to unify some portions of memory (and/or some registers). At analysis time, a client analyzer typically applies the transformer generator to each of the instructions in the program, and then feeds the resulting set of ASI commands to an ASI solver to refine the memory regions.
Abstract Domain for Basetypes and Basetype-Operators.
The abstract domain for the basetypes is a dataref, which is either a memory access or a register access. The arithmetic, logical, and bit-vector operations on datarefs convert datarefs to unassignable datarefs, which means that they will only be used to generate splits.
Abstract Domain for Map-Basetypes and Access/Update Functions.
The abstract domain of the maps for ASI is a set of splits and unifications. The access functions generate a dataref associated with a memory location or register. The update functions create a set of unifications or splits according to the dataref of the data argument.
For example, for the instruction "mov [ebx],eax", when ebx holds the abstract address AR_foo − 12, where AR_foo is the memory-region for some procedure foo's activation records, the ASI transformer generator emits the ASI unification command "AR_foo [-12:-9

The DUA results (e.g., row 3 of Tab. 3) are used to create transformers for several additional analyses, such as GMOD analysis [15], which is an analysis to find modified variables for each function f (including variables modified by functions transitively called from f), and live-flag analysis, which is used in our version of VSA to perform trace-splitting/collapsing (§4.5).
Quantifier-Free Bit-Vector (QFBV) Semantics
QFBV semantics provides a way to obtain a symbolic representation of an instruction's semantics, as a formula in first-order quantifier-free bit-vector logic.
The Interpretation of Basetypes and Basetype-Operators.
The abstract domain for the integer basetypes is a term, and each operator on it constructs a term that reflects the operation. The abstract domain for BOOL is a formula, and each operator on it constructs a formula that reflects the operation.
The Interpretation of Map-Basetypes and Access/Update Functions.
The abstract domain for the state components is a dictionary that maps a storage component to a term (or a formula in the case of VARBOOLMAP). The access/update functions retrieve from and update the dictionaries, respectively.
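As a rough illustration of this interpretation, the following C++ sketch keeps terms and formulas as SMT-LIB-style strings and builds the ZF and SF formulas for a compare instruction. The types Term, Formula, RegMap, and FlagMap and all function names are assumptions made for the sketch; a real implementation would more likely build a term data structure than concatenate strings.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical QFBV interpretation: an INT32 value is a term and a BOOL
// value is a formula, both kept here as SMT-LIB-style strings.
struct Term    { std::string text; };
struct Formula { std::string text; };

// Each basetype-operator constructs syntax that mirrors the operation.
inline Term constant32(unsigned v) {
    return Term{"(_ bv" + std::to_string(v) + " 32)"};
}
inline Term subtract(const Term& a, const Term& b) {
    return Term{"(bvsub " + a.text + " " + b.text + ")"};
}
inline Formula equals(const Term& a, const Term& b) {
    return Formula{"(= " + a.text + " " + b.text + ")"};
}
inline Formula signedLess(const Term& a, const Term& b) {
    return Formula{"(bvslt " + a.text + " " + b.text + ")"};
}

// State components are dictionaries from storage components to terms
// (registers) or formulas (flags).
using RegMap  = std::map<std::string, Term>;
using FlagMap = std::map<std::string, Formula>;

// Sketch of the flag effects of "cmp reg, imm"; only ZF and SF are modeled.
inline FlagMap cmpFlags(const RegMap& regs, const std::string& reg, unsigned imm) {
    Term diff = subtract(regs.at(reg), constant32(imm));
    FlagMap flags;
    flags.emplace("ZF", equals(diff, constant32(0)));
    flags.emplace("SF", signedLess(diff, constant32(0)));
    return flags;
}
```

With eax bound to the symbolic term "eax", cmpFlags(regs, "eax", 10) yields the formula (= (bvsub eax (_ bv10 32)) (_ bv0 32)) for ZF.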
QFBV semantics is useful for a variety of purposes. One use is as auxiliary information in an abstract interpreter, such as the VSA analysis engine, to provide more precise abstract interpretation of branches in low-level code. The issue is that many instruction sets provide separate instructions for (i) setting flags (based on some condition that is tested) and (ii) branching according to the values held by flags. To address this problem, we use a trace-splitting/collapsing scheme [26]. The VSA analysis engine partitions the state at each flag-setting instruction based on live-flag information (which is obtained from an analysis that uses the DUA transformers); a semantic reduction [16] is performed on the split VSA states with respect to a formula obtained from the transformer generated by the QFBV semantics. The resulting VSA states are propagated to the appropriate successors at the branch instruction that uses the flags.
The cmp instruction (A), which is a flag-setting instruction, has SF and ZF as live flags since those flags are used at the branch instructions js (B) and jz (E): js and jz jump according to SF and ZF, respectively. After interpretation of (A), the state S is split into four states, S1, S2, S3, and S4, which are reduced with respect to the formulas ϕ1: (eax − 10 < 0) associated with SF, and ϕ2: (eax − 10 == 0) associated with ZF.
Because ϕ1 ∧ ϕ2 is not satisfiable, S1 becomes ⊥. State S2 is propagated to the true branch of js (i.e., just before (C)), and S3 and S4 to the false branch (i.e., just before (D)). Because no flags are live just before (C), the splitting mechanism maintains just a single state, and thus all states propagated to (C) (here there is just one) are collapsed to a single abstract state. Because ZF is still live until (E), the states S3 and S4 are maintained as separate abstract states at (D).
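The splitting and reduction steps in this example can be mimicked with a toy interval domain. The sketch below is only an illustration under strong assumptions: the abstract state tracks a single interval for eax over mathematical integers (32-bit wrap-around is ignored), the names State, reduce, and splitAtCmp are hypothetical, and the assignment of S3 to (¬SF, ZF) and S4 to (¬SF, ¬ZF) is a guess, since the text does not say which is which.

```cpp
#include <algorithm>
#include <cassert>

// Toy abstract state: an interval for eax, or bottom (unreachable).
struct State { bool bottom; long long lo, hi; };

inline State bottomState() { return State{true, 0, 0}; }

// Reduce s with respect to (eax - k < 0) == sf and (eax - k == 0) == zf.
inline State reduce(const State& s, long long k, bool sf, bool zf) {
    if (s.bottom) return bottomState();
    long long lo = s.lo, hi = s.hi;
    if (sf) hi = std::min(hi, k - 1);   // eax - k < 0
    else    lo = std::max(lo, k);       // eax - k >= 0
    if (zf) {                           // eax == k
        lo = std::max(lo, k);
        hi = std::min(hi, k);
    } else {                            // eax != k: trim k from an end point
        if (lo == k) ++lo;
        if (hi == k) --hi;
    }
    if (lo > hi) return bottomState();
    return State{false, lo, hi};
}

// Split S at "cmp eax, k" into four states indexed by the values of (SF, ZF).
struct SplitStates { State s1, s2, s3, s4; };

inline SplitStates splitAtCmp(const State& s, long long k) {
    return SplitStates{ reduce(s, k, true,  true),     // S1: SF and ZF
                        reduce(s, k, true,  false),    // S2: SF only
                        reduce(s, k, false, true),     // S3: ZF only
                        reduce(s, k, false, false) };  // S4: neither
}
```

For an initial state in which eax lies in [0, 20] and k = 10, the sketch yields ⊥ for S1, [0, 9] for S2, [10, 10] for S3, and [11, 20] for S4, which matches the propagation described above.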
Paired Semantics
Our system allows easy instantiations of reduced products [16] by means of paired semantics. The TSL system provides a template for paired semantics as shown in Fig. 10.

[5]   DUA::INTERP1::INT32 addr1 = addr.GetFirst();
[6]   DUA::INTERP2::INT32 addr2 = addr.GetSecond();
[7]   DUA::INT32 answer = interact(mem1, mem2, addr1, addr2);
[8]   return answer;
[9] }
Fig. 11. An example of C++ explicit template specialization to create a reduced product.
The CIR is instantiated with a paired semantic domain defined with two interpretations, INTERP1 and INTERP2 (each of which may itself be a paired semantic domain), as shown on line 1 of Fig. 11. The communication between interpretations may take place in basetype-operators or access/update functions; Fig. 11 is an example of the latter. The two components of the paired-semantics values are deconstructed on lines 3-6 of Fig. 11, and the individual INTERP1 and INTERP2 components from both inputs can be used (as illustrated by the call to interact on line 7 of Fig. 11) to create the paired-semantics return value, answer. Such overridings of basetype-operators and access/update functions are done by C++ explicit specialization of members of class templates (this is specified in C++ by "template<>"; see line 2 of Fig. 11).
We also found this method of CIR instantiation to be useful to perform a form of reduced product when analyses are split into multiple phases, as in a tool like CodeSurfer/x86. CodeSurfer/x86 carries out many analysis phases, and the application of its sequence of basic analysis phases is itself iterated. On each round, CodeSurfer/x86 applies a sequence of analyses: VSA, DUA, and several others. VSA is the primary workhorse, and it is often desirable for the information acquired by VSA to influence the outcomes of other analysis phases. We can use the paired-semantics mechanism to obtain desired multiphase interactions among our generated analyzers, typically by pairing the VSA interpretation with another interpretation. For instance, with DUA_INTERP alone, the information required to get abstract memory location(s) for addr is lost because the DUA basetype-operators (+ and * on line 3 of Fig. 12) just return the union of the arguments' use sets. With the pairing of VSA_INTERP with DUA_INTERP (line 1 of Fig. 11), DUA can use the abstract address computed for addr2 (line 6 of Fig. 11) by VSA_INTERP, which uses VSA_INTERP::Add and VSA_INTERP::Mult; the latter operators operate on a numeric abstract domain (rather than a set-based one).
Note that during the application of the paired semantics, VSA interpretation will be carried out on the VSA component of paired intermediate values. In some sense, this is duplicated work; however, a paired semantics is typically used only in a phase of transformer generation (for a TC-style or UB-style evaluator), in which a single pass over the interprocedural CFG generates a transformer for each instruction. Thus, only a limited amount of VSA evaluation is performed (equal to what would be performed to check that the VSA solution is a fixed point).
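The paired-semantics mechanism of this section can be sketched in self-contained C++. This is not the TSL system's template: PairedSemantics here is a simplified stand-in, ConstInterp and UseInterp are hypothetical toy interpretations (a numeric one in the spirit of VSA and a set-based one in the spirit of DUA), and the "mem[...]" naming of memory locations is invented for the example. The point is only to show how an explicit specialization of one member lets the set-based component use the address computed by the numeric component.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Toy numeric interpretation: a value is a known constant or unknown.
struct ConstInterp {
    struct INT32 { bool known; int value; };
    static INT32 Add(INT32 a, INT32 b) {
        return INT32{a.known && b.known, a.value + b.value};
    }
    static INT32 MemAccess(INT32) { return INT32{false, 0}; }  // contents unknown
};

// Toy set-based interpretation: a value is the set of names used to compute it.
struct UseInterp {
    using INT32 = std::set<std::string>;
    static INT32 Add(INT32 a, const INT32& b) { a.insert(b.begin(), b.end()); return a; }
    static INT32 MemAccess(INT32 addr) { addr.insert("mem[?]"); return addr; }
};

// Generic pairing: by default each component is evaluated independently.
template <typename I1, typename I2>
struct PairedSemantics {
    using INTERP1 = I1;
    using INTERP2 = I2;
    using INT32 = std::pair<typename I1::INT32, typename I2::INT32>;
    static INT32 Add(const INT32& a, const INT32& b) {
        return INT32{I1::Add(a.first, b.first), I2::Add(a.second, b.second)};
    }
    static INT32 MemAccess(const INT32& addr) {
        return INT32{I1::MemAccess(addr.first), I2::MemAccess(addr.second)};
    }
};

using Paired = PairedSemantics<ConstInterp, UseInterp>;

// Explicit specialization of one member: the use-set component names the
// accessed location from the address computed by the numeric component.
template <>
Paired::INT32 PairedSemantics<ConstInterp, UseInterp>::MemAccess(const Paired::INT32& addr) {
    ConstInterp::INT32 addr1 = addr.first;
    UseInterp::INT32 uses = addr.second;
    uses.insert(addr1.known ? "mem[" + std::to_string(addr1.value) + "]"
                            : std::string("mem[?]"));
    return Paired::INT32{ConstInterp::MemAccess(addr1), uses};
}
```

With UseInterp alone, a memory access through a computed address could only report "mem[?]"; in the paired version, adding the constants 100 and 8 and then accessing memory reports "mem[108]".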
Instruction Sets
In this section, we discuss the quirky characteristics of some instruction sets, and various ways these can be handled in TSL.
IA32
To provide compatibility with 16-bit and 8-bit versions of the instruction set, IA32 provides overlapping register names, such as AX (the lower 16 bits of EAX), AL (the lower 8 bits of AX), and AH (the upper 8 bits of AX). There are two possible ways to specify this feature in TSL. One is to keep three separate maps for 32-bit registers, 16-bit registers, and 8-bit registers, and specify that updates to any one of the maps affect the other two maps. Another is to keep one 32-bit map for registers, and obtain the value of a 16-bit or 8-bit register by masking the value of the 32-bit register.
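The second option (a single 32-bit register map plus masking) amounts to the following bit manipulation, shown here as a C++ sketch on concrete values. The function names are hypothetical; in TSL the same masks and shifts would be written with the language's own bit-vector operators.

```cpp
#include <cassert>
#include <cstdint>

// Reads: mask (and shift) the 32-bit value held for EAX.
inline std::uint16_t readAX(std::uint32_t eax) { return static_cast<std::uint16_t>(eax & 0xFFFFu); }
inline std::uint8_t  readAL(std::uint32_t eax) { return static_cast<std::uint8_t>(eax & 0xFFu); }
inline std::uint8_t  readAH(std::uint32_t eax) { return static_cast<std::uint8_t>((eax >> 8) & 0xFFu); }

// Writes: replace only the selected bits and keep the rest of EAX.
inline std::uint32_t writeAX(std::uint32_t eax, std::uint16_t v) {
    return (eax & 0xFFFF0000u) | v;
}
inline std::uint32_t writeAL(std::uint32_t eax, std::uint8_t v) {
    return (eax & 0xFFFFFF00u) | v;
}
inline std::uint32_t writeAH(std::uint32_t eax, std::uint8_t v) {
    return (eax & 0xFFFF00FFu) | (static_cast<std::uint32_t>(v) << 8);
}
```

For EAX = 0x12345678, readAX gives 0x5678, readAH gives 0x56, and writeAH with 0xAB gives 0x1234AB78.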
Another characteristic to note is that IA32 keeps condition codes in a special register, called EFLAGS. One way to specify this feature is to declare "reg32:Eflags();", and make every flag manipulation fetch the bit value from an appropriate bit position of the value associated with Eflags in the register-map. Another way is to have symbolic flags, as in our examples, and have every manipulation of EFLAGS affect the individual flags.
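The first option, in which each flag is fetched from its bit position in the EFLAGS value, can be sketched in C++ as follows. The bit positions are those of the IA32 status flags (CF 0, PF 2, AF 4, ZF 6, SF 7, OF 11); the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions of the IA32 status flags within EFLAGS.
enum class Flag : unsigned { CF = 0, PF = 2, AF = 4, ZF = 6, SF = 7, OF = 11 };

// Fetch one flag from the 32-bit EFLAGS value.
inline bool getFlag(std::uint32_t eflags, Flag f) {
    return ((eflags >> static_cast<unsigned>(f)) & 1u) != 0;
}

// Return EFLAGS with one flag set or cleared; other bits are unchanged.
inline std::uint32_t setFlag(std::uint32_t eflags, Flag f, bool value) {
    std::uint32_t bit = std::uint32_t{1} << static_cast<unsigned>(f);
    return value ? (eflags | bit) : (eflags & ~bit);
}
```

The symbolic-flag alternative avoids these shifts in every flag access, at the cost of translating instructions that read or write EFLAGS as a whole into updates of the individual flags.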
ARM
Almost all ARM instructions contain a condition field that allows an instruction to be executed conditionally, dependent on condition-code flags. This feature reduces branch overhead and compensates for the lack of a branch predictor. However, it may worsen the precision of an abstract analysis because in most instructions' specifications, the abstract values from two arms of a TSL conditional expression would be joined.
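The precision issue can be seen in a small C++ sketch that contrasts two abstract interpretations of a conditional expression over an interval domain. All names are hypothetical, and the three-valued selection shown second is a generic technique used here for illustration, not the mechanism that TSL itself uses.

```cpp
#include <algorithm>
#include <cassert>

// Three-valued abstraction of a Boolean condition.
enum class Bool3 { False, True, Maybe };

// Toy numeric abstraction of a register value.
struct Interval { int lo, hi; };

inline Interval join(Interval a, Interval b) {
    return Interval{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Interpretation that ignores the condition: both arms are always joined.
inline Interval condJoin(Bool3, Interval a, Interval b) {
    return join(a, b);
}

// Interpretation that consults the condition: join only when it is unknown.
inline Interval condSelect(Bool3 c, Interval a, Interval b) {
    if (c == Bool3::True)  return a;
    if (c == Bool3::False) return b;
    return join(a, b);
}
```

If the updated value is [5, 5] and the original value is [0, 0], condJoin always yields [0, 5], whereas condSelect yields [5, 5] when the condition is known to be true.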
MOVEQ(destReg, srcOprnd):
  let cond = VarBoolAccess(flagMap, EQ());
      src = interpOperand(curState, srcOprnd);
      a = Var32Update(regMap, destReg, src);
      b = regMap;
      answer = cond ? a : b;
  in ( answer )
Fig. 13. An example of the specification of an ARM conditional-move instruction in TSL.
For example, MOVEQ is one of ARM's conditional instructions; if the flag EQ is true when the instruction starts executing, it executes normally; otherwise, the instruction does nothing. Fig. 13 shows the specification of the instruction in TSL. In many abstract semantics, the conditional expression "cond ? a : b" will be interpreted as a join of the original register map b and the updated map a, i.e., join(a,b). Consequently, destReg would receive the join of its original value and src, even when cond is known to have a definite value (TRUE or FALSE) in VSA semantics. The paired-semantics mechanism presented in §4.6

A specific platform will have some total number of registers, which are organized as a circular buffer; when the buffer becomes full, registers are spilled to the stack to free up a sufficient number for the called procedure. Fig. 14 shows a way to accommodate this feature. The syntactic register (OutReg(n) or InReg(n), defined on line 2) in an instruction is used to obtain a semantic register (Reg(m), defined on line 1, where m represents the register's global index), which is the key used for accesses on and updates to the register map. The desired index of the semantic register is computed with the index of the syntactic register, the value of CWP (the current window pointer) from the current state, and the platform-specific value NWINDOWS.
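One possible form of the index computation is sketched below in C++. The formula is an assumption, not a transcription of Fig. 14: it supposes 16 fresh registers per window, overlap between one window's out registers and the adjacent window's in registers, and the SPARC V8 convention in which a procedure call's SAVE decrements CWP modulo NWINDOWS. Global registers are left out, and the function names are hypothetical.

```cpp
#include <cassert>

// Global index of out register n (0..7) in the window selected by cwp.
inline int outRegIndex(int n, int cwp, int nwindows) {
    return (cwp * 16 + n) % (nwindows * 16);
}

// Global index of in register n (0..7) in the window selected by cwp.
// The in registers of window cwp coincide with the out registers of window cwp + 1.
inline int inRegIndex(int n, int cwp, int nwindows) {
    return (cwp * 16 + 16 + n) % (nwindows * 16);
}

// CWP after a SAVE (assumed to decrement CWP modulo nwindows).
inline int cwpAfterSave(int cwp, int nwindows) {
    return (cwp + nwindows - 1) % nwindows;
}
```

Under these assumptions, a value that the caller writes to OutReg(3) is read by the callee as InReg(3), because both map to the same global index in the circular buffer.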
Related Work
There are many specification languages for instruction sets and many purposes to which they have been applied. Some were designed for hardware simulation, such as cycle simulation and pipeline simulation [30, 27]. Others have been used to generate an emulator for compiler-optimization testing [17, 23]. TDL [23] is a hardware-description language that supports the retargeting of analyses and optimizations relevant to instruction scheduling, register assignment, and functional-unit binding. The New Jersey machine-code toolkit [33] addresses concrete syntactic issues (instruction decoding, instruction encoding, etc.). Harcourt et al. used ML to specify the semantics of instruction sets [20]. LISAS [14] is an instruction-set-description language that was developed based on their experience using ML. The latter two approaches particularly influenced the design of the TSL language.
TSL shares some of the same goals as λ-RTL [32] (i.e., the ability to specify the semantics of an instruction set and to support multiple clients that make use of a single specification). The two languages were both influenced by ML, but different choices were made about what aspects of ML to retain: λ-RTL is higher-order, but without datatype constructors and recursion; TSL is first-order, but supports both datatype constructors and recursion. The choices made in the design and implementation of TSL were driven by the goal of being able to define multiple abstract interpretations of an instruction-set's semantics.
Some systems for representing and analyzing programs are (mainly) targeted for a single language. For instance, SOOT [4] is a powerful and flexible analysis/optimization framework that supports analysis and transformation of Java bytecode.
One method to support the retargeting of analyses to different languages is to create a package that supports a family of program analyses that different front ends can use to create analysis components. Examples include BDDBDDB [39], Banshee [25], the PPL [3], and WPDS++ [24]. The writer of each client front end needs to encode the semantics of his language by creating appropriate transformers for each statement and condition in the language's IR, using the package's API (or input language).
WALA [6] supports a common intermediate form (Common Abstract Syntax Tree), from which multiple additional IRs (e.g., CFGs and SSA-form) can be generated, and multiple analyses can be performed that use these IRs. Thus, this is similar to the package approach, but supports a multiplicity of analyses.
In contrast to the package approach, TSL provides a domain-specific language for instruction-set specification. With this approach, the ISS developer concentrates on specifying the concrete operational semantics of his language, using TSL, and a multiplicity of analyzers are then created automatically. Analysis developers can incorporate different analysis packages into the TSL framework by implementing appropriate abstract operations that overapproximate the semantics of a fixed set of TSL operations (that have a well-defined semantics). (Any of the aforementioned packages could be used for creating TSL-based analyses; currently, WPDS++ is used for all of the TC-style analyzers that have been developed for use with TSL so far.)
There are two analysis systems, TVLA [5] and the optimizer flow-function inference system developed by Rice et al. [36], in which sound analysis transformers are generated automatically from the concrete operational semantics, plus a specification of the abstraction (either via the abstraction function (TVLA) or the concretization function (Rice et al.)). In our system, we rely on the analysis developer to supply sound abstract operations. While this places an additional burden on developers, once an analysis is developed it can be used with each instruction set specified in TSL. Moreover,
- The analyses that we support are much more efficient than those that can be created with TVLA and apply to our intended domain of application (low-level code).
- Some of the analyses that we use, such as ARA [28], appear to be beyond the power of the heuristics-based transformer-generation methods developed by Rice et al.
The development of methods for creating abstract transformers from a specification of the abstraction or concretization function (à la [5] and [36]) is left for future research.
