Application of static analyses for state-space reduction to the microcontroller binary code

Bastian Schlich *, Jörg Brauer, Stefan Kowalewski
Embedded Software Laboratory, RWTH Aachen University, Ahornstr. 55, 52074 Aachen, Germany

Abstract
This paper describes the application of two abstraction techniques, namely dead variable reduction and path reduction, to the microcontroller binary code in order to tackle the state-explosion problem in model checking. These abstraction techniques are based on static analyses, which have to cope with the peculiarities of the binary code such as hardware dependencies, interrupts, recursion, and globally accessible memory locations. An interprocedural static analysis framework is presented that handles these peculiarities. Based on this framework, extensions of dead variable reduction and path reduction are detailed. A case study using several microcontroller programs is presented in order to demonstrate the efficiency of the described abstraction techniques.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Microcontrollers are often used in safety-critical systems for which correctness of software is crucial. Exhaustive testing of such systems in certain domains is not always performed due to fast time-to-market, uncertain environments, or the complexity of the systems. Model checking [10] is a formal verification technique that can be used to prove the correctness of software. In order to verify software for microcontrollers, we have developed a model checker for microcontroller binary code called [mc]square.

Model checking of binary code has some advantages compared to model checking of intermediate representations such as C code [4,22,25,26,36,38]. Binary code is the code that is finally deployed to the microcontroller and not an intermediate representation. Therefore, all errors introduced during the complete development process such as compiler errors, reentrance errors, and errors in handling features of the microcontroller can possibly be found. Furthermore, the model checker does not have to account for the behavior of the compiler. All constructs in binary code have clean and well-documented semantics and are easier to handle than some C constructs such as dynamic memory allocation and pointer arithmetic. Moreover, only the binary file of the program is needed and not the complete source code, which allows us to check complete applications including external libraries.

Model checking of binary code, however, has two disadvantages. First, it is hardware dependent, and hence, has to be adapted for every microcontroller to be supported. Second, as more details are involved, the state-explosion problem [9] tends to be worse than when model checking C code. To apply model checking of binary code effectively, these two disadvantages have to be addressed.

* Corresponding author.
E-mail addresses: schlch@embedded.rwth-aachen.de (B. Schlich), brauer@embedded.rwth-aachen.de (J. Brauer), kowalewski@embedded.rwth-aachen.de (S. Kowalewski).
URL: http://www.embedded.rwth-aachen.de (B. Schlich).
1 http://www.embedded.rwth-aachen.de/mc_square.

doi:10.1016/j.scico.2010.03.006

This paper describes how to tackle the state-explosion problem by applying two abstraction techniques, namely dead variable reduction (DVR) and path reduction (PR). While DVR merges states that only differ in values that are not read before they are overwritten, PR stores states only in program locations of interest. Both abstraction techniques are based on static analyses that annotate the program prior to model checking. These analyses have to cope with the peculiarities of the microcontroller binary code such as hardware dependencies, interrupts, recursion, and globally accessible memory locations. This makes the application of intraprocedural approaches infeasible, and hence, an interprocedural approach that takes the underlying hardware into account is required. Both abstraction techniques were previously used in other model checkers (cp. Section 2), but due to the peculiarities of the binary code, they could not be transferred to [mc]square one-to-one. This paper, which is an extended version of a previously published paper [37], describes the application of these abstraction techniques for model checking programs written for the ATMEL ATmega16 microcontroller.

The remainder of this paper is structured as follows. First, related work is presented in Section 2. Then, Section 3 details the challenges of applying static analysis to the microcontroller binary code. In Section 4, a short introduction of [MC]Square is given. Our interprocedural static analysis framework that is needed to cope with the peculiarities of the binary code is detailed in Section 5. The described analyses form the basis of the two abstraction techniques DVR and PR, which are detailed in Section 6. The effectiveness of DVR and PR is demonstrated in a case study using several microcontroller programs in Section 7.

2. Related work

Regehr et al. [32] describe an abstract interpretation for the analysis of stacks in the microcontroller assembly code, for which a bit-wise representation is used. An analysis similar to our stack analysis described in Section 5.3.1 is described by Schwartz et al. [39]. Their approach does not check for physical addresses of the runtime stack but for certain properties, which are summarized as so-called well-behavedness in their approach. A particular challenge in the analysis of software for microcontrollers is the presence of interrupt handlers, which significantly differ from regular threads. A description of the main differences between threads in high-level programs and interrupt handlers is given by Regehr and Cooprider [31].

Yorav and Grumberg [45] describe DVR and PR for a parallel version of the while language, which is implemented for the model-checking tool Murphi. In this language, every process has its own local variables, but global variables do not exist. Communication between processes is performed at fixed program locations by means of send and receive statements. For DVR, function calls are handled by inlining the body of the method at each call location. Hence, the static analysis can be performed intraprocedurally. So-called breaking points (cp. Section 6.2) used for PR can be determined completely statically since the language does not contain indirect control statements.

Spin [17] uses both DVR and PR. It works on a language called Promela, which is similar to the parallel while language described before with respect to these two reduction techniques. This means that function calls are handled by inlining, communication is conducted at certain program locations, and indirect control is not present. Both analyses are performed statically via an intraprocedural approach prior to model checking.

Quiros [30] adapts the approach described by Yorav and Grumberg to a bytecode language used in a virtual machine. This bytecode language is similar to the parallel while language as it has no indirect control and communication between processes is performed at fixed program locations. The main difference for static analysis is that the bytecode language used by Quiros has local and global variables, but they are easily distinguishable as different instructions are used to access global and local variables, respectively. DVR is only applied to local variables because the static analysis in this approach is performed intraprocedurally. The breaking points used in PR are determined statically as well.

Apart from Spin and Murphi, static DVR is integrated into numerous other model checkers such as Bandera [11], IF [7], or BEBOP [5]. In particular, Bozga et al. [7] describe state-space reductions based on live variable analysis for the model checker IF. Furthermore, they show that abstractions preserving liveness establish an equivalence relation that is stronger than bisimulation. Their process algebraic specification of liveness is comparable to our definition of live variables for binary programs described later.

Another approach for DVR is used in the model checking tool Estes as described by Lewis and Jones [23]. DVR is performed dynamically during state-space creation to exploit runtime information. Due to the dynamic nature of the approach, the results are more accurate in certain situations, but dynamic DVR increases the runtime. In order to use DVR, the user has to provide some information such as a description of the behavior of the environment and the addresses of the main function, interrupt handler starting points, and interrupt handler ending points. PR is not used in the Estes model checker. An improved version of the dynamic algorithm of Lewis and Jones [23] is described by Self and Mercer [40]. Their approach eliminates the need for user interaction and produces maximal reductions for deterministic models, but is limited to single-procedure programs. This restriction makes their approach unsuitable for the analysis of binary code.

3. Applying static analysis to the microcontroller binary code

This section presents specifics of static analysis for the microcontroller binary code. First, the microcontroller used in this paper is detailed, and then, challenges that arise when applying static analysis to the microcontroller binary code are described. Solutions to these challenges are presented in Sections 5 and 6.

3.1. ATMEL ATmega16 Microcontroller

The ATMEL ATmega16 microcontroller is an 8-bit microcontroller, which uses a Harvard architecture. In Harvard architectures, memory and buses for data and program are separated. A detailed description of the ATmega16 is given in its documentation [2,1].

The ATMEL ATmega16 features 1120 bytes of data memory, 16 kB of in-system programmable flash memory, and 512 bytes of EEPROM memory. Fig. 1 shows the organization of the data memory. The first 32 memory locations address the general-purpose registers, the I/O registers are addressed by the following 64 memory locations, and the internal data SRAM is addressed by the last 1024 memory locations. The general-purpose working registers are used, for example, in computations and to temporarily store values, whereas the I/O registers are used to control peripherals of the microcontroller and to communicate with the environment. The ATmega16 features peripherals such as two 8-bit timers/counters, a 16-bit timer/counter, an analog-to-digital converter, and a watchdog timer. Communication is done by means of 32 I/O lines, which are organized in four 8-bit I/O ports. The internal data SRAM stores variables and the stack. The flash memory is used to store the program and the EEPROM is used to permanently store values.
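The data-memory layout described above can be made concrete with a small sketch. The following Python fragment is only an illustration of the address ranges given in the text; the function and constant names are hypothetical and are not taken from [mc]square.

```python
# Sizes of the three regions of the data address space, as stated in the text.
REGISTER_COUNT = 32        # general-purpose registers
IO_REGISTER_COUNT = 64     # I/O registers
SRAM_SIZE = 1024           # internal data SRAM
DATA_MEMORY_SIZE = REGISTER_COUNT + IO_REGISTER_COUNT + SRAM_SIZE  # 1120 bytes


def classify_data_address(address: int) -> str:
    """Return the data-memory region ("register", "io", or "sram") of an address."""
    if not 0 <= address < DATA_MEMORY_SIZE:
        raise ValueError(f"address {address:#06x} is outside the data memory")
    if address < REGISTER_COUNT:
        return "register"
    if address < REGISTER_COUNT + IO_REGISTER_COUNT:
        return "io"
    return "sram"
```

Under this layout, the three region sizes add up to the 1120 bytes of data memory, and any address at or beyond that bound is rejected.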

The ATmega16 supports 131 instructions. As common for Harvard architectures, each of these instructions can only address one of the three memory types, that is, there are different instructions to access the data memory, the flash memory, and the EEPROM memory. The ATmega16 supports direct and indirect addressing modes. The indirect addressing mode depends on the instruction and uses either one of three 16-bit pointer registers (X, Y, and Z) or the stack pointer, which is located in two adjacent 8-bit registers SPL and SPH.

Interrupts are an important feature of microcontrollers. The ATmega16 supports 21 different interrupts. Operation of the interrupts is controlled by certain I/O registers. The global interrupt flag, which is located in the status register (SREG), controls whether interrupts are enabled. Each interrupt has an extra flag, which is located in an interrupt control register, that determines whether the specific interrupt is enabled. Additionally, many interrupts have an interrupt source, for example, timer interrupts depend on the corresponding timer/counter. That is, these interrupts can only occur if the corresponding source is active. Interrupts have fixed priorities, which are only important if two interrupts occur at the same time. In this case, the interrupt with higher priority is handled first. There are two modes for interrupt handlers, which can either be interruptible or non-interruptible. In the interruptible case, interrupt handlers can be interrupted by all other interrupts including interrupts of lower priority. In the non-interruptible case, interrupt handlers are not interruptible at all. The standard behavior is that interrupt handlers are non-interruptible. Developers have to enable interrupts within interrupt handlers in order to make them interruptible.

For each interrupt there is an interrupt vector, which points to the corresponding interrupt handler. All interrupt vectors are combined in the interrupt vector table, which can be placed at different locations in memory depending on the configuration of two bit-fuses. By default, the interrupt vector table is located at the lowest addresses of the program memory. Interrupt vectors are ordered from higher priorities to lower priorities in the interrupt vector table.
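The conditions under which an interrupt can occur, and the role of priorities, can be summarized in a short sketch. The Python fragment below is a simplified illustration under the assumptions stated in the text (global flag, individual flag, active source, priority given by the position in the vector table); the class and function names as well as the vector numbers used in any example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interrupt:
    vector: int          # position in the vector table; lower index = higher priority
    enabled: bool        # individual enable flag in an interrupt control register
    source_active: bool  # True if the interrupt source (e.g. a timer) is active


def may_occur(global_flag: bool, irq: Interrupt) -> bool:
    """An interrupt can occur only if all three conditions hold."""
    return global_flag and irq.enabled and irq.source_active


def next_to_handle(global_flag: bool, pending: Iterable[Interrupt]) -> Optional[Interrupt]:
    """Among simultaneously pending interrupts, pick the one with the highest priority."""
    candidates = [irq for irq in pending if may_occur(global_flag, irq)]
    return min(candidates, key=lambda irq: irq.vector, default=None)
```

Note that priorities only matter in `next_to_handle`, that is, when several interrupts are pending at the same time, which matches the description above.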

3.2. Challenges

Applying static analysis to the binary code involves some challenges because of binary code constructs that make a generic intraprocedural static analysis approach infeasible. The binary code is hardware dependent as each microcontroller architecture has its own instruction set and hardware features. Additionally, in the microcontroller binary code there are instructions to access specific registers that change the behavior of the microcontroller or influence other registers. For example, interrupt control registers enable or disable interrupts, timer control registers enable or disable timers, and certain output registers change values of specific input registers. These hardware dependencies have to be accounted for in the static analysis.

All memory locations are globally accessible in the binary code. This includes registers used for indirect accesses and indirect control. An important feature in the binary code is the stack. It is used for different purposes such as exchanging values, saving the contents of the status register, and storing return addresses resulting from function calls or the execution of interrupt handlers. The stack is accessed using specific instructions push and pop, which utilize the stack pointer, but it can also be accessed directly.

Functions are not explicitly defined in the binary code. All program locations can be the target of a call, rcall, or icall instruction. Hence, all instructions that are targets of a call instruction have to be handled as entries to functions. Often, different functions share some common code fragments. Functions are left using ret instructions. Interrupt handlers are implicitly defined in the binary code. They are entered via jmp instructions from the interrupt vector table and left via reti instructions. The reti instruction differs from ret in that it enables interrupts upon execution while ret leaves the global interrupt flag unchanged. Interrupt handlers are similar to functions, but in contrast to functions, interrupts can occur at all program locations where they are enabled. Therefore, an analysis is required that determines program locations where interrupts are enabled.

Interrupts introduce pseudo-parallelism because they can interrupt the main program at all program locations where they are enabled, but the main program cannot interfere with interrupt handlers. Moreover, interrupt handlers exchange information with many program locations as all memory locations are globally accessible and interrupts can occur at many program locations. Interrupts, as well as recursion (which is not recommended but frequently found in the microcontroller binary code), render the application of interprocedural approaches that use inlining or call-strings useless. Handling functions by assuming that they change all variables is also not appropriate as this over-approximation is too coarse to obtain meaningful results. These challenges require an interprocedural approach that has to cope with globally accessible memory locations, interrupts, and recursion. This approach has to incorporate hardware dependencies in order to handle the specifics of the respective microcontroller.

4. [mc]square

[mc]square stands for model checking microcontrollers. It is a discrete Computation Tree Logic (CTL) [14] model checker for the verification of the microcontroller binary code. It supports binary code for several microcontrollers including the ATMEL ATmega16, the ATMEL ATmega128, the Infineon XC167, the Intel MCS-51, and the Renesas R8C/23. Moreover, [mc]square supports model checking software for Programmable Logic Controllers (PLCs) written in IL [35] and abstract state machines [6]. The CTL model checking algorithm used in [mc]square is a local model checking algorithm that was first proposed by Vergauwen and Lewi [43] and later adapted by Heljanko [16]. This algorithm can be applied on-the-fly during model checking. Thereby, [mc]square can locate errors in programs that are too large to be checked completely.

[mc]square takes as input a binary file, the corresponding C code if it is available, and a specification given in CTL. It supports different binary file formats such as Executable and Linking Format (ELF), Intel Hex, and Motorola S file format. For some of these formats, it processes debug information to relate the binary code and C code. Within the CTL formula, users can make propositions about registers, I/O registers, memory locations, C variables, and the program counter. Additionally, [mc]square checks for stack collisions, stack overflows, and unintended use of microcontroller features such as write accesses to reserved registers.

The fundamental concept of the approach implemented in [mc]square is to use tailored simulators to build state spaces for model checking. These simulators utilize domain-dependent information during state-space building. This information is used to minimize state spaces, to allow propositions about all features of the system to be verified, and to present counterexamples in a representation that is understood by users. Within these simulators, our main interest is the development of both domain-dependent and domain-independent abstraction techniques in order to tackle the state-explosion problem.

[mc]square allows state spaces to be stored in main memory and on a hard disk. State spaces can be built using single processors, multiple processors, or multiple computers. Counterexamples and witnesses are presented in the disassembled binary code, in the control flow graph of the disassembled binary code, in the C code, and as a state-space graph. A detailed description of [mc]square and its theoretical foundations is given by Schlich [34].

Fig. 2 shows the model checking process applied in [mc]square. The general process is the same for all supported microcontroller architectures. In this process, first the binary program file, the C file, and the CTL formula are read and transformed into their internal representations. Then, the static analyzer is executed and the program is annotated. The annotations are used by abstraction techniques implemented in the simulator to reduce the state space during model checking. For the ATMEL ATmega16, several static analyses such as live variables analysis, reaching definitions analysis, and analysis of interrupt registers are implemented. Details are presented in Sections 5 and 6.

In the next step, model checking is started. The model checker obtains the initial state from the state space and evaluates the validity of certain subformulas of the current formula in this state. Then, depending on the result, it requests successors of this state from the state space and continues checking subformulas. As the model checker uses a local algorithm, it does not determine the truth values of all subformulas in all states. It only determines the truth values that are needed to determine the truth value of the overall formula in the initial state. If the model checker requires successors of a state that are not created yet, the state space uses the simulator to create successors on-the-fly. In order to create successor states, the simulator conducts the following four steps.

1. Load state into the microcontroller model
2. Determine all possible assignments for nondeterministic values
3. For each such assignment
   (a) Simulate the effect of the next instruction
   (b) Evaluate truth values of atomic propositions
4. Return resulting states.

First, the state is loaded into the model of the microcontroller used within the simulator. Then, the simulator determines which nondeterministic values have to be instantiated. Nondeterministic values are introduced by the microcontroller model. In the model of the ATmega16, for example, accessing a timer, reading input from the environment, or handling of external interrupts introduces nondeterminism. Modelling timers with deterministic behavior leads to state explosion, and thus, [mc]square resorts to nondeterminism whenever timer values are accessed.

In order to correctly handle nondeterminism and to guarantee the validity of the model checking results, the simulator has to create an over-approximation of the behavior shown by the real microcontroller. This is achieved by assigning all possible combinations to nondeterministic values accessed in the instruction to be executed next.

For example, executing an add instruction, which sums up two deterministic values, creates a single successor as no value assignment is needed. If in the same program location an external interrupt is enabled, two successors are created. In one successor the interrupt handler is entered, and in the other successor the add instruction is executed. Another example is the execution of an instruction that reads input from the environment by means of an I/O port. If an 8-bit port is used for input, 256 successors are created as all 256 possible values have to be assigned. For each of these assignments, the simulator executes the effect of the next instruction and evaluates the truth values of atomic propositions for the resulting states. This leads to a state explosion, but [mc]square uses several automatic abstraction techniques such as delayed nondeterminism [29] that tackle this problem within the simulator. Thus, the actual model checking algorithm does not have to account for hardware-specific information as this information is kept within the simulator. When all successors are created, they are added to the state space. After model checking is finished, the counterexample generator is invoked and presents the counterexample in the different representations available.
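The successor counts in the examples above follow directly from enumerating all assignments to the nondeterministic values. The following Python fragment is a minimal sketch of this enumeration, not the simulator of [mc]square; the state representation (a dictionary), the function name `successors`, and the encoding of an enabled interrupt as a one-bit choice are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, int]
Assignment = Dict[str, int]


def successors(state: State,
               nondet_bits: Dict[str, int],
               step: Callable[[State, Assignment], State]) -> List[State]:
    """Create one successor per assignment of the nondeterministic values.

    nondet_bits maps each nondeterministic value accessed by the next
    instruction to its bit width; step simulates the instruction for one
    concrete assignment.
    """
    names = sorted(nondet_bits)
    ranges = [range(2 ** nondet_bits[name]) for name in names]
    # product() over zero ranges yields exactly one empty assignment, so a
    # fully deterministic instruction produces a single successor.
    return [step(state, dict(zip(names, values))) for values in product(*ranges)]
```

With an empty `nondet_bits` this yields one successor, with a single 8-bit input port it yields 256, and with a one-bit interrupt choice it yields two, mirroring the cases discussed in the text.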

The counterexample generator takes the result and builds a counterexample or a witness depending on the formula and the outcome of the model checking process. It presents the counterexample in all existing representations, and users can choose the representation that fits their needs best. In all these representations, users can step forward and backward through the counterexample and analyze the values of registers, I/O registers, general memory locations, variables, and the program counter. This makes it easy for users to locate the source of errors found. The capability of stepping both forward and backward renders [mc]square a useful tool for debugging.

5. Static analysis

This section explains the static analysis framework implemented in [mc]square. Due to the nature of the binary code, some preliminary analyses are required to make DVR and PR applicable. These preliminary analyses are stack analysis (STA), reaching definitions analysis (RDA), global interrupt flag analysis (GIFA), and live variable analysis (LVA).

STA detects dependencies between values stored on the stack and values read from the stack by tracking corresponding pairs of push and pop instructions. Often, the value of a register is not changed in a called function because it is stored on the stack and restored later. RDA determines for each program location and each memory location where the memory location may have been assigned a value. Together with RDA, an extended constant propagation analysis is performed. GIFA applies an abstract interpretation to infer the value of the interrupt flag at each program location. LVA determines for each program location the memory locations that may be read on some path through the program before they are overwritten.
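To illustrate why the results of LVA are useful for DVR, the following Python fragment sketches how states that differ only in dead memory locations collapse into one stored state. It is a simplified illustration and not the implementation used in [mc]square; the function `reduce_state`, the dictionary-based state, and the example live sets are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Memory = Dict[str, int]


def reduce_state(pc: int, memory: Memory,
                 live: Dict[int, FrozenSet[str]]) -> Tuple[int, Tuple[Tuple[str, int], ...]]:
    """Project a state onto the memory locations that are live at its program location.

    live[pc] is assumed to be the LVA result for program location pc. Two
    states with the same key are indistinguishable for the remaining
    execution and can therefore be merged.
    """
    kept = sorted((loc, val) for loc, val in memory.items() if loc in live[pc])
    return pc, tuple(kept)
```

For instance, if only `r16` is live at some program location, two states that differ only in the value of `r17` obtain the same key, whereas two states that differ in `r16` remain distinct.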

STA evaluates each function on its own, and hence, it is executed as an intraprocedural analysis. In contrast, the other three analyses are executed using interprocedural fixed point iterations. In the following, first the general approach for intra- and interprocedural fixed point algorithms used in [mc]square is described. Then, the hardware model implemented in [mc]square is presented before the static analyses are detailed in their order of execution.

5.1. Approach

In the context of [mc]square, both intra- and interprocedural analyses are required, which, depending on the property of interest, are executed as forward or backward data-flow analyses [28]. This section focuses on forward analyses, where information about the program execution is propagated along the edges of the control flow graph (CFG) of a program. A backward analysis can be seen as the dual as it operates on the reversed CFG. Information about the program is represented using finite lattices [13]. In the following, we refer to elements of the respective lattices as data-flow facts. This section first gives an overview of our general approach. It then explains a standard intraprocedural fixed point algorithm before detailing our approach to interprocedural analysis in the presence of recursion and global variables.

5.1.1. Overview

In [mc]square, the different static analyses interact as depicted in Fig. 3. First, STA is executed as it forms the basis of all other analyses. The major dependency is between RDA and GIFA. The constant propagation executed together with RDA influences the precision of GIFA and the status of the global interrupt flag is required in order to take the execution of interrupt handlers into account during RDA. In the end, LVA is executed as it depends on the results of GIFA.

In order to handle the mutual dependencies between RDA and GIFA, these two analyses are executed in alternating order until the results of one of the analyses remain unchanged as depicted in Fig. 4. That is, a fixed point iteration of different analyses is conducted. In the first iteration, the execution of RDA is based on the assumption that interrupts are active in all program locations, generating a coarse over-approximation. These results are then used in GIFA in order to obtain a more precise over-approximation of the status of the global interrupt flag. In the second iteration, these more precise results of GIFA are used to obtain more precise RDA results, which are then used in GIFA, and so forth. Whenever RDA is executed, the results of RDA from the previous iteration are stored to detect the fixed point, but they are not used for the computation, that is, all data-flow facts are reset in the beginning of each iteration. After each iteration, the new results are compared to the results of the previous iteration. If they are equal, a fixed point is reached and the iteration terminates. Otherwise, another iteration of GIFA is executed with more precise results from RDA. The same applies for GIFA, that is, GIFA uses results of previous iterations only for detecting fixed points. Thus, the analysis results become more precise with each iteration of RDA and GIFA and eventually remain unchanged.

The following describes this approach formally. Here, the results of RDA and GIFA after the ith iteration are denoted by $\text{RDA}^i$ and $\text{GIFA}^i$, respectively. Then $\text{RDA}^{i+1} \subseteq \text{RDA}^i$ and $\text{GIFA}^{i+1} \subseteq \text{GIFA}^i$. Since the results are monotonically decreasing with each iteration and the domains are finite, there exists $n \in \mathbb{N}$ such that $\text{RDA}^n = \text{RDA}^{n+1}$ and $\text{GIFA}^n = \text{GIFA}^{n+1}$. That is, the analysis results eventually stabilize and the iteration terminates. From our experience, this is typically the case after the third or fourth iteration.
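The alternating scheme can be sketched as a small driver loop. The Python fragment below is an illustration only: the two analyses are passed in as opaque callables `run_rda` and `run_gifa` (hypothetical names), results are modeled as sets, and the loop stops as soon as the result of one analysis remains unchanged.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple

Facts = FrozenSet[Hashable]


def alternate_until_stable(run_rda: Callable[[Facts], Facts],
                           run_gifa: Callable[[Facts], Facts],
                           all_locations: Iterable[Hashable]) -> Tuple[Facts, Facts, int]:
    """Alternate RDA and GIFA until the result of one of them stops changing.

    Returns the final RDA result, the final GIFA result, and the number of
    RDA runs that were needed.
    """
    # First iteration: assume interrupts are enabled at every program location.
    gifa = frozenset(all_locations)
    rda = run_rda(gifa)
    rda_runs = 1
    while True:
        new_gifa = run_gifa(rda)
        if new_gifa == gifa:      # GIFA result unchanged: fixed point reached
            return rda, gifa, rda_runs
        gifa = new_gifa
        new_rda = run_rda(gifa)   # recomputed from scratch with more precise input
        rda_runs += 1
        if new_rda == rda:        # RDA result unchanged: fixed point reached
            return rda, gifa, rda_runs
        rda = new_rda
```

Previous results are only compared against the new ones and never fed back into the computation, which corresponds to resetting all data-flow facts at the beginning of each iteration.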

5.1.2. Intraprocedural analysis

Given the CFG $G = (V, E)$ of a program and a lattice $L$ representing data-flow facts, a data-flow analysis can be expressed in terms of an equation system over program locations $p \in V$. Here, a monotone transfer function $\omega : V \times L \rightarrow L$ computes the effects of executing an instruction on the data-flow facts. That is, given a program location $p$ and $l, l' \in L$, it holds that $l \subseteq l' \Rightarrow \omega(p, l) \subseteq \omega(p, l')$. The monotonicity of $\omega$ in combination with finite lattices ensures termination of the fixed point iteration. The following equation system expresses a forward data-flow analysis, that is, information is propagated along the edges in the CFG:

$$
\begin{align*}
\text{dfa}_{\text{entry}}(p) &= \begin{cases} 
\bot & : p \text{ is initial} \\
\bigcup \{ \text{dfa}_{\text{exit}}(p') \mid (p', p) \in E \} & : \text{otherwise}
\end{cases} \\
\text{dfa}_{\text{exit}}(p) &= \omega(p, \text{dfa}_{\text{entry}}(p))
\end{align*}
$$

Such an equation system can, for instance, be solved using a worklist-based fixed point iteration. This is a well-known approach, which was extensively studied in the past [18,15,24,28].
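A worklist-based solver for a forward analysis can be sketched as follows. This Python fragment is a generic textbook-style illustration, not the solver of [mc]square. It assumes a powerset lattice ordered by set inclusion, with union as join and the empty set as bottom; the names `solve_forward`, `entry`, and `exit_` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Tuple

Facts = FrozenSet[Hashable]


def solve_forward(nodes: List[int],
                  edges: Iterable[Tuple[int, int]],
                  initial: int,
                  transfer: Callable[[int, Facts], Facts]
                  ) -> Tuple[Dict[int, Facts], Dict[int, Facts]]:
    """Compute entry and exit facts for every program location of a CFG."""
    preds: Dict[int, List[int]] = {p: [] for p in nodes}
    succs: Dict[int, List[int]] = {p: [] for p in nodes}
    for source, target in edges:
        succs[source].append(target)
        preds[target].append(source)

    entry: Dict[int, Facts] = {p: frozenset() for p in nodes}  # bottom everywhere
    exit_: Dict[int, Facts] = {p: frozenset() for p in nodes}
    worklist = deque(nodes)
    while worklist:
        p = worklist.popleft()
        if p != initial:
            # Join the exit facts of all predecessors.
            entry[p] = frozenset().union(*(exit_[q] for q in preds[p]))
        new_exit = transfer(p, entry[p])
        if new_exit != exit_[p]:
            exit_[p] = new_exit
            # Successors depend on the changed exit facts and must be revisited.
            for s in succs[p]:
                if s not in worklist:
                    worklist.append(s)
    return entry, exit_
```

Termination relies on the two properties named in the text: the transfer function must be monotone, and the lattice must be finite, so each exit set can grow only finitely often.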

5.1.3. Interprocedural analysis

As argued before, the analysis of the binary code requires interprocedural analyses (cp. Section 3). While it is possible to encode call-edges in the equation system in order to perform interprocedural analysis, this approach leads to loss of context sensitivity. That is, it is not possible to distinguish different behaviors for different call sites, and the analysis results are unified, leading to loss of precision. Other techniques such as inlining and the call-string approach [41,44] are unsuitable due to the presence of unbounded recursion.

In order to deal with recursive function calls and to implement a context-sensitive analysis, we developed an interprocedural fixed point algorithm. In this algorithm, summaries are used to embody the effects of function calls. The summary of a function is called its behavior because it summarizes the visible behavior of the function. This means that the behavior consists of the data-flow facts in the final instruction of a function. The algorithm consists of the following four steps.

Step 1 The behavior of each function and each interrupt handler is determined using a data-flow analysis. In this step, an intraprocedural analysis is performed and function calls are ignored. This means that no data-flow facts are added through function calls and each function is analyzed only once.

Step 2 In the following step, the data-flow facts at each call site are combined with the behavior of the callee. That is, a data-flow analysis is executed, where for each call instruction in a function $f$, the available data-flow facts are joined with the behavior of the called function. In this step, the behavior of $f$ is extended and callers of $f$ are reanalyzed in case the behavior of $f$ has changed.

Step 3 An interprocedural fixed point iteration is conducted, starting from the main function of the program. Data-flow facts are propagated from each call instruction into the called function. Additionally, data-flow facts are propagated from all program locations where interrupts are enabled into each interrupt handler. Callees and interrupt handlers are then reanalyzed with new data-flow facts available at the beginning of the function.

Step 4 Superfluous data-flow facts are removed at call sites.

In each of the steps, all functions and interrupt handlers are analyzed using intraprocedural fixed point iterations. The steps mainly differ in the ways call instructions are handled and functions need to be reanalyzed.

In Step 1, each function and interrupt handler is analyzed exactly once in order to determine its behavior. The behavior is computed using an intraprocedural data-flow analysis as described before, but function calls are ignored. Then, in Step 2, each function and each interrupt handler is evaluated using an intraprocedural analysis. In this step, each function $f$ is analyzed at least once, but the data-flow facts at each call instruction in $f$ are joined with the behavior of the called function. Consequently, the results in the call instruction in $f$ become larger. If more data-flow information is available in the final instruction of $f$, which corresponds to the behavior, all functions calling $f$ have to be reanalyzed. Data-flow facts from interrupt handlers are propagated into all program locations where the corresponding interrupt is enabled. Hence, an interprocedural fixed point iteration is performed.

In Step 3, the analysis starts with the main function of the program. Analysis results are propagated from each call instruction of the main function into a called function $f$. If the data-flow facts present at the entry of $f$ have changed, then $f$ needs to be reanalyzed based on the new data-flow facts using an intraprocedural fixed point iteration. As before, data-flow facts are propagated into all functions called from $f$. All in all, this leads to an interprocedural fixed point iteration. In like manner, data-flow facts are propagated from an instruction into each interrupt handler if interrupts are enabled in this instruction. This execution order has the advantage that no definitions are propagated from a calling function into another calling function, and hence, context sensitivity is preserved. The same applies for the propagation of data-flow facts through interrupt handlers.

So far, no analysis results at call sites have been overwritten using the behavior, even though the callee could, for instance, overwrite a register on every execution path. This is due to the join operation at call instructions in Step 2, which joins the data-flow facts at the entry of a call instruction with the data-flow facts coming from the called function. From the behavior of a function after Step 1 or Step 2, it is not visible whether there exists a path through the function on which a data-flow fact is generated or whether it is generated on all paths through that function. Step 4 is executed in order to tackle this source of imprecision. This step is basically a repetition of Step 2, but a modified join operation is executed at call sites. If a data-flow fact was propagated into a called function in Step 3 but is not available at the exit node of the respective function, then this observation implies that the data-flow fact was overwritten on all paths through the called function. Consequently, superfluous data-flow facts are removed at the corresponding call instruction and are replaced with the behavior of the function, leading to increased precision.

In Step 1, Step 2, and Step 3, only additional information is collected. That is, after each iteration of each step, the analysis results become larger or remain unchanged. This is in contrast to Step 4, where the analysis results become smaller than at the end of Step 3, while still an over-approximation is preserved. Step 4 is only required if the data-flow fact also contains information where it was generated, which is required only in RDA. For GIFA and LVA, only the first three steps are executed.
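The interplay of Step 1 and Step 2 can be illustrated with a small sketch. It is a strong simplification and not the implementation used in the tool: behaviors are plain sets of facts per function, the intraprocedural analysis is replaced by a precomputed set, and the names `compute_behaviors`, `local_facts`, and `calls` are assumptions made for this illustration only. The sketch shows how callers are rescheduled whenever the behavior of a callee grows, which also terminates for recursive functions because the sets can only grow.

```python
from collections import deque


def compute_behaviors(local_facts, calls):
    """Sketch of Steps 1 and 2 on the level of whole functions.

    local_facts: function name -> facts found when calls are ignored (Step 1).
    calls:       function name -> list of called function names.
    """
    # Step 1: the behavior starts as the purely intraprocedural result.
    behavior = {f: set(facts) for f, facts in local_facts.items()}

    # Invert the call relation so that callers can be rescheduled.
    callers = {f: set() for f in local_facts}
    for f, callees in calls.items():
        for g in callees:
            callers[g].add(f)

    # Step 2: join callee behaviors until nothing changes any more.
    worklist = deque(local_facts)
    while worklist:
        f = worklist.popleft()
        joined = set(behavior[f])
        for g in calls.get(f, []):
            joined |= behavior[g]
        if joined != behavior[f]:
            behavior[f] = joined
            worklist.extend(callers[f])  # callers of f must be reanalyzed
    return behavior


local_facts = {"main": {"a"}, "f": {"b"}, "g": {"c"}}
calls = {"main": ["f"], "f": ["g"], "g": ["f"]}  # f and g are mutually recursive
behaviors = compute_behaviors(local_facts, calls)
```

In this example, the mutually recursive functions `f` and `g` end up with the same behavior, and `main` additionally keeps its own fact.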

5.2. Representation of hardware dependencies

As explained in Section 3, I/O registers control the behavior of the microcontroller. Reading or writing an I/O register often influences the behavior of the microcontroller. For instance, if bit 0, called TOIE0, of the timer interrupt mask register TIMSK is set to one and interrupts are enabled, the timer/counter 0 overflow interrupt is enabled. Moreover, reserved bits should be written to zero if accessed in order to ensure compatibility with future devices. Another example can be seen in dependencies between different I/O registers when using I/O ports. Each port has three associated registers: a data register PORTx, a data direction register DDRx, and a port input register PINx, where x is the name of the I/O port. If the nth bit of DDRx is one, then the nth bit of PINx is configured as an output pin. If it is zero, the nth bit of PINx is configured as an input pin, and holds a nondeterministic value when it is read. Changing PORTx can, for instance, activate the pull-up resistors connected to the corresponding pin, depending on the configurations of PINx and DDRx.

In order to account for such details of the microcontroller, hardware dependencies are represented using a so-called dependency map in [mc]square. The dependency map contains for each I/O register a list of effects caused by accessing the corresponding register. The entries in the dependency map are used during static analysis in order to precisely model the influence of instructions on the behavior of the microcontroller. This information is stored once and reused whenever an instruction accesses an I/O register, which provides a generic model and simplifies the implementation of the analyses. All dependencies are described in the ATMEL ATmega16 datasheet [2].
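As an illustration, a dependency map could be organized as a simple lookup table. The following sketch is hypothetical: the structure, the effect descriptions, and the names `DEPENDENCY_MAP`, `effects_of`, and `nondeterministic_pin_bits` are assumptions and do not reflect the actual data structure of the tool. It only encodes the port example given above for port A.

```python
# Hypothetical dependency map: I/O register -> effects caused by accessing it.
# The effect descriptions paraphrase the port example from the text.
DEPENDENCY_MAP = {
    "DDRA": ["selects for each bit whether the pin of port A is an input or an output"],
    "PORTA": ["may activate pull-up resistors, depending on PINA and DDRA"],
    "PINA": ["reading a bit configured as input yields a nondeterministic value"],
}


def effects_of(register):
    """Return the recorded effects of accessing an I/O register (empty if none)."""
    return DEPENDENCY_MAP.get(register, [])


def nondeterministic_pin_bits(ddr_value):
    """Bit mask of the PINx bits that are inputs, i.e., whose DDRx bit is zero."""
    return ~ddr_value & 0xFF


mask = nondeterministic_pin_bits(0x0F)  # the lower four pins are outputs
```

With `DDRA = 0x0F`, only the upper four bits of `PINA` have to be treated as nondeterministic by an analysis.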

5.3. Analyses

In the following, the different static analyses implemented in [mc]square are detailed. These analyses use the intraprocedural and interprocedural approaches detailed before. Moreover, they use the dependency map in order to account for behavior caused by accessing certain I/O registers.

5.3.1. Stack analysis

The only analysis independent of the results of all other analyses is STA. In the binary code, the stack is frequently used to temporarily store the contents of working registers used in a function. At the beginning of the function, the contents of these registers are pushed onto the stack, and at the end of the function, they are taken back from the stack and written into the corresponding registers. Hence, for a data-flow analysis it looks as if this function depends on the values of these registers, although the function does not use the values. Moreover, the stack is used to store return addresses from function calls or interrupt handlers. STA is an intraprocedural analysis used to check two conditions. First, it is used to check whether a function is well-behaved with respect to the stack, that is, whether the local stack configuration is unchanged when the function is exited (cp. the well-behavedness analysis of runtime stacks described by Linn et al. [39]). Second, it is used to check whether the original values of the working registers are restored at the end of the function.

Due to the dynamic nature of stacks, the size and contents of the stack at a specific program location can only be determined during runtime. In Fig. 5, an example is depicted that demonstrates the stack usage for storing working registers. Here, register r1 is only used as a temporary variable, and at the end of the function it contains the same value as in the beginning of the function. A standard data-flow analysis that does not model the stack cannot recognize this. To solve this problem, and hence obtain more accurate results for other data-flow analyses, an abstract interpretation [12] is used to determine for each program location the set of possible stack configurations. Here, an abstraction of all possible stack configurations is propagated through each function. In each function, a check is performed whether the local stack configuration has not changed when the function is exited. The abstract interpretation observes all accesses to the stack such as push, pop, changing the stack pointer, and write accesses into the memory area of the stack. Moreover, it determines if at the end of the function the original values of the working registers are restored. The outcome of STA is a set of triples, which consist of the address of the push instruction, the address of the pop instruction, and the register. If STA fails due to an infinite number of possible stack configurations, for example, caused by loops or manual changes of the stack pointer, it is assumed that this function changes the contents of the complete SRAM and all working registers used within the function.
0x23: push r1 ; store r1 on the stack
0x24: in r1 PORTA ; read value from input port
0x25: out PORTB r1 ; write value to output port
0x26: pop r1 ; restore value of r1
0x27: ret ; return from function

Fig. 5. Store intermediate values on the stack.

(a) Register-wise. (b) Bit-wise.

Fig. 6. Value representation using lattices.

In the example shown in Fig. 5, STA correctly recognizes that at location 0x26 the original value of r1 is restored and the value of r1 is not modified. That is, the outcome of the analysis of this code fragment is (0x23, 0x26, r1). If the instruction at address 0x26 were replaced by pop r0, the stack analysis would fail because the value stored on the stack would not be written back into the original source register. That is, the register configuration would have changed.
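The check described for Fig. 5 can be sketched for straight-line code as follows. This is a minimal illustration under strong assumptions: it handles neither branches, loops, direct changes of the stack pointer, nor writes into the stack area, all of which the abstract interpretation in the tool has to consider. The function name `stack_analysis` and the tuple encoding of instructions are hypothetical.

```python
def stack_analysis(instructions):
    """Return the set of (push address, pop address, register) triples,
    or None if the analysis fails for this (straight-line) function."""
    stack = []      # abstract stack: (address of push, pushed register)
    triples = set()
    for address, mnemonic, operands in instructions:
        if mnemonic == "push":
            stack.append((address, operands[0]))
        elif mnemonic == "pop":
            if not stack:
                return None  # pops a value not pushed by this function
            push_address, register = stack.pop()
            if register != operands[0]:
                return None  # value is not restored into its source register
            triples.add((push_address, address, register))
        elif mnemonic == "ret" and stack:
            return None      # local stack configuration has changed
    return triples


# The code fragment of Fig. 5.
fig5 = [
    (0x23, "push", ["r1"]),
    (0x24, "in", ["r1", "PORTA"]),
    (0x25, "out", ["PORTB", "r1"]),
    (0x26, "pop", ["r1"]),
    (0x27, "ret", []),
]
result = stack_analysis(fig5)

# Variant discussed in the text: pop r0 instead of pop r1.
variant = fig5[:3] + [(0x26, "pop", ["r0"])] + fig5[4:]
failed = stack_analysis(variant)
```

For the original fragment, the sketch yields the triple (0x23, 0x26, r1). For the variant, it reports a failure.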

5.4. Reaching definitions analysis

RDA computes for each program location and each memory location the set of program locations that may have written the value of the given memory location. Formally speaking, given a CFG \(G = (\mathcal{V}, \mathcal{E})\), the analysis produces for each \(p \in \mathcal{V}\) a set of pairs consisting of program locations \(p' \in \mathcal{V}\) and memory locations \(m \in \mathbb{N}\). RDA can be defined as an equation system using a transfer function \(\omega^{RDA}\) as follows:

\[
RDA_{\text{entry}}(p) = \begin{cases} \bot & : p \text{ is initial} \\
\bigsqcup \{RDA_{\text{exit}}(p') | (p', p) \in \mathcal{E}\} & : \text{otherwise} 
\end{cases}
\]

\[
RDA_{\text{exit}}(p) = \omega^{RDA}(p, RDA_{\text{entry}}(p))
\]

\[
\omega^{RDA}(p, l) = (l \setminus \text{kill}^{RDA}(p)) \cup \text{gen}^{RDA}(p)
\]

Here, \(\text{kill}^{RDA}(p)\) denotes the set of reaching definitions overwritten in instruction \(p\), while \(\text{gen}^{RDA}(p)\) denotes the reaching definitions generated in \(p\). Intuitively speaking, if an incoming definition is overwritten in \(p\), then it is removed from the data-flow facts and a new definition is generated. The analysis is conducted as an interprocedural fixed point iteration based on the algorithm described in Section 5.1.3.
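The equation system can be solved by a simple round-robin iteration. The following sketch covers only the intraprocedural case and omits value annotations, function calls, and interrupts. The function name `reaching_definitions` and the encoding of the CFG are assumptions made for this illustration. A definition is a pair of the writing program location and the written memory location.

```python
def reaching_definitions(nodes, edges, writes, initial):
    """Solve the RDA equations for one function by round-robin iteration.

    nodes:   list of program locations
    edges:   set of CFG edges (p, p')
    writes:  program location -> set of memory locations written there
    initial: the initial program location
    """
    preds = {p: [q for (q, r) in edges if r == p] for p in nodes}
    entry = {p: set() for p in nodes}
    exit_ = {p: set() for p in nodes}
    changed = True
    while changed:
        changed = False
        for p in nodes:
            if p == initial:
                new_entry = set()  # bottom element at the initial location
            else:
                new_entry = set().union(*(exit_[q] for q in preds[p]))
            written = writes.get(p, set())
            kill = {(q, m) for (q, m) in new_entry if m in written}
            gen = {(p, m) for m in written}
            new_exit = (new_entry - kill) | gen
            if new_entry != entry[p] or new_exit != exit_[p]:
                entry[p], exit_[p] = new_entry, new_exit
                changed = True
    return entry, exit_


# Location 0 and location 2 write r0; locations 1 and 2 form a loop.
entry, exit_ = reaching_definitions(
    nodes=[0, 1, 2],
    edges={(0, 1), (1, 2), (2, 1)},
    writes={0: {"r0"}, 2: {"r0"}},
    initial=0,
)
```

At the loop head (location 1), both definitions of `r0` reach the entry, whereas only the definition from location 2 leaves location 2.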

Moreover, our approach applies an abstract interpretation to directly perform an extended form of constant propagation. Reaching definitions are annotated with the respective value. In case instructions are observed that write a fixed value into a register, the reaching definitions are annotated with this value. For instance, if an instruction \(\text{eor } r0\ r0\) is executed, then \(r0\) always contains the value 0 afterwards. In a similar way, if an instruction \(\text{add } r0\ r1\) is executed and the reaching definitions of \(r0\) and \(r1\) are annotated with exact values, then RDA also infers the precise value of \(r0\) after this instruction. In case the results are ambiguous, however, it is assumed that any possible value can be written in order to generate an over-approximation.

In this analysis, all registers except the SREG are represented using the lattice depicted in Fig. 6(a). The SREG is modelled bit-wise using the lattice depicted in Fig. 6(b). Many instructions of the ATMEL ATmega16 alter only single bits of the SREG. Instructions such as \(\text{cli}\) and \(\text{sei}\), for instance, only set or clear the global interrupt flag, but no other bit of the SREG is touched. Arithmetic instructions such as \(\text{add}\) set certain bits of the SREG, for example, the negative flag or the zero flag, depending on the outcome of the operation, but the global interrupt flag remains unchanged. While it is often not possible to infer the exact value of all bits, it can be done for certain bits of the SREG.

In Fig. 7, the transfer functions used in the abstract interpretation are given for some instructions of the ATMEL ATmega16. Here, the value of a register \(r\) in the corresponding program location is denoted by \(\|r\|\). The instruction \(\text{ldi } r c\) loads a constant value \(c\) into register \(r\). The instruction \(\text{mov } r s\) copies the value of register \(s\) into register \(r\). For an \(\text{add } r s\) instruction, which sums up the values of \(r\) and \(s\) and stores the result in \(r\), the precise value of the destination register can be computed if both the values of \(r\) and \(s\) are known at this location. Similarly, instructions such as \(\text{eor } (\text{exclusive-or})\) are handled. In case \(\text{eor } r r\) is executed, the value can be directly inferred, even if the exact value of register \(r\) itself is unknown.
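The transfer functions described above can be sketched as follows. The sketch is an illustration and not the implementation of the tool. The names `TOP`, `BOT`, and `transfer`, the tuple encoding of instructions, and the choice to treat registers without a known value as `TOP` are assumptions. Where the cases for the top and bottom elements overlap, the sketch checks the top element first. Sums are reduced to 8 bits because the working registers are 8 bits wide.

```python
TOP, BOT = "top", "bot"  # any value possible / no value available


def add(r_val, s_val):
    if r_val == TOP or s_val == TOP:
        return TOP
    if r_val == BOT or s_val == BOT:
        return BOT
    return (r_val + s_val) & 0xFF  # working registers are 8 bits wide


def eor(r, s, r_val, s_val):
    if r == s:
        return 0  # x xor x is 0, even if x itself is unknown
    if r_val == TOP or s_val == TOP:
        return TOP
    if r_val == BOT or s_val == BOT:
        return BOT
    return r_val ^ s_val


def transfer(instr, values):
    """Apply one instruction (op, destination, operand) to a register valuation."""
    op, r, operand = instr
    new = dict(values)
    if op == "ldi":
        new[r] = operand
    elif op == "mov":
        new[r] = values.get(operand, TOP)
    elif op == "add":
        new[r] = add(values.get(r, TOP), values.get(operand, TOP))
    elif op == "eor":
        new[r] = eor(r, operand, values.get(r, TOP), values.get(operand, TOP))
    return new


values = {}
for instr in [("eor", "r0", "r0"), ("ldi", "r1", 200), ("ldi", "r2", 100),
              ("add", "r1", "r2"), ("mov", "r4", "r1"), ("add", "r0", "r3")]:
    values = transfer(instr, values)
```

After this sequence, `r1` and `r4` hold the exact value 44 (300 modulo 256), while `r0` becomes unknown again because `r3` was never assigned a known value.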

RDA uses the results of STA in order to obtain more precise analysis results than possible without knowledge about stack usage. Consider the example depicted in Fig. 5. Here, a function stores the value of \(r1\) on the stack by executing \(\text{push } r1\) and restores its value using \(\text{pop } r1\) before the function is exited. In this case, the reaching definition generated through the \(\text{pop } r1\) instruction can be removed and safely be replaced by the reaching definition for \(r1\) in the \(\text{push } r1\) instruction because the value of \(r1\) was not altered in-between. Consequently, STA leads to more concise reasoning about the origins of values because it allows for filtering of definitions that stem from storing intermediate values, which are not changed.

\[
\text{ldi } r\ c = c
\]

\[
\text{mov } r\ s = \|s\|
\]

\[
\text{add } r\ s = \begin{cases}
\|r\| + \|s\| & : \|r\| \neq \bot \neq \|s\| \land \|r\| \neq \top \neq \|s\| \\
\top & : \|r\| = \top \lor \|s\| = \top \\
\bot & : \|r\| = \bot \lor \|s\| = \bot
\end{cases}
\]

\[
\text{eor } r\ s = \begin{cases}
0 & : r = s \\
\top & : \|r\| = \top \lor \|s\| = \top \\
\bot & : \|r\| = \bot \lor \|s\| = \bot \\
\|r\| \oplus \|s\| & : \text{otherwise}
\end{cases}
\]

Fig. 7. Instruction-specific transfer functions used in RDA.

(a) Program using two functions. (b) Corresponding CFGs.

Fig. 8. Example program to depict interprocedural RDA.

Furthermore, RDA also depends on GIFA. If interrupts are enabled at a given program location, all reaching definitions stemming from interrupt handlers have to be added to this program location. This means that increased precision in GIFA leads to smaller results of RDA. This exemplifies the dependencies between different static analyses.

In the following, an example for RDA using the program given in Fig. 8 is described. Here, two functions are used, which are located at program locations 0x40 and 0x60. The function 0x60 is called by the function located at address 0x40. In both functions, nondeterministic values are read from the environment through the input pins PINA, PINB, and PINC. The results of RDA are depicted in Table 1. For clarity, only the reaching definitions for r0 are presented and the memory location r0 is omitted. That is, an entry 0x42 in the table represents a reaching definition (0x42, r0). Furthermore, as the values of r0 depend on nondeterministic input, the value analysis always infers \(\top\), and hence, the value annotations are omitted as well. Each column presents the results after executing each step of the interprocedural fixed point iteration (cp. Section 5.1.3). In each row, the incoming data-flow facts for the respective instruction are depicted.

The behaviors of the functions generated in Step 1 are {0x42} for function 0x40 and {0x61} for function 0x60, respectively. In Step 2, the results at the call instruction at program location 0x45 are combined with the local behavior of function 0x60, which leads to definitions 0x42 and 0x61 at the exit of the call instruction. In the following Step 3, data-flow facts from call sites are propagated into called functions. Since only a single function call is present in this example, only the results of function 0x60 are extended. In Step 4, the analysis results are reduced. A check is performed whether the definition 0x42, which is present at the entry of instruction 0x45, is also included in the return statement of the called function. As this is not the case, the definition is removed, which leads to smaller results. Consequently, register r0 has exactly one reaching definition at each program location.
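The difference between the join at call sites in Step 2 and the modified join in Step 4 can be made concrete with the numbers of this example. The sketch below is an illustration only. The function names `join_step2` and `join_step4` and their parameters are hypothetical, and reaching definitions are abbreviated to the defining program location, as in Table 1.

```python
def join_step2(entry_facts, callee_behavior):
    """Step 2: facts at the call are joined with the behavior of the callee."""
    return entry_facts | callee_behavior


def join_step4(entry_facts, propagated, callee_exit):
    """Step 4: facts that were propagated into the callee (Step 3) but do not
    reach its exit were overwritten on all paths and are removed."""
    overwritten = {f for f in entry_facts
                   if f in propagated and f not in callee_exit}
    return (entry_facts - overwritten) | callee_exit


entry_of_call = {0x42}   # incoming facts of the call instruction 0x45
callee_exit = {0x61}     # facts at the final instruction of function 0x60

after_step2 = join_step2(entry_of_call, callee_exit)
after_step4 = join_step4(entry_of_call, propagated={0x42},
                         callee_exit=callee_exit)
```

The results correspond to the entries for instruction 0x46 in Table 1: both definitions after Step 2, and only 0x61 after Step 4.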

5.5. Global interrupt flag analysis

The global interrupt flag, which is stored in the highest bit of the SREG, defines whether interrupts are globally enabled or disabled. Without an analysis that determines the value of the global interrupt flag, it is assumed that interrupts are enabled
Table 1
Results of RDA for register r0 and the program given in Fig. 8.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Step 1</th>
<th>Step 2</th>
<th>Step 3</th>
<th>Step 4</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x40</td>
<td>0x42</td>
<td>0x42, 0x61</td>
<td>0x42, 0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x41</td>
<td>0x42</td>
<td>0x42, 0x61</td>
<td>0x42, 0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x42</td>
<td>0x41</td>
<td>0x41</td>
<td>0x41</td>
<td>0x41</td>
</tr>
<tr>
<td>0x43</td>
<td>0x42</td>
<td>0x42</td>
<td>0x42</td>
<td>0x42</td>
</tr>
<tr>
<td>0x44</td>
<td>0x42</td>
<td>0x42</td>
<td>0x42</td>
<td>0x42</td>
</tr>
<tr>
<td>0x45</td>
<td>0x42</td>
<td>0x42</td>
<td>0x42</td>
<td>0x42</td>
</tr>
<tr>
<td>0x46</td>
<td>0x42</td>
<td>0x42, 0x61</td>
<td>0x42, 0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x47</td>
<td>0x42</td>
<td>0x42, 0x61</td>
<td>0x42, 0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x60</td>
<td>∅</td>
<td>∅</td>
<td>0x42</td>
<td>0x42</td>
</tr>
<tr>
<td>0x61</td>
<td>∅</td>
<td>∅</td>
<td>0x42</td>
<td>0x42</td>
</tr>
<tr>
<td>0x62</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x63</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x64</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x65</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
</tr>
<tr>
<td>0x66</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
<td>0x61</td>
</tr>
</tbody>
</table>

0x23: in r0 SREG ; write SREG into r0
0x24: push r0 ; store value on the stack
...
0x3c: pop r0 ; read value of SREG from stack
0x3d: out SREG r0 ; write back into SREG
...
0x44: reti ; return from interrupt handler

Fig. 9. Restore SREG in an interrupt handler.

0x40: eor r0 r0 ; r0 = 0x00
0x41: out SREG r0 ; sreg = 0x00, i.e., interrupt flag cleared

Fig. 10. Initialization sequence for the SREG in compiler-generated code.

at every program location in order to generate an over-approximation. If an interrupt handler reads a certain register and
interrupts are active at any program location, this register can never be reset by DVR.

The status of the global interrupt flag is represented using the lattice depicted in Fig. 6(b). In order to obtain precise
results, an abstract interpretation is applied that determines the status of the global interrupt flag for each program location.
This abstract interpretation observes all accesses to the SREG done via instructions cli and sei and direct/indirect write
accesses.

With respect to dependencies between different static analyses, first of all, GIFA depends on STA. Consider the example
given in Fig. 9. This code fragment depicts parts of an interrupt handler. When an interrupt handler is entered, the global
interrupt flag is automatically cleared by the hardware. This means that the global interrupt flag is cleared in instruction
0x24, where the SREG is stored on the stack. When the value of the SREG is read from the stack in instruction 0x3c and
written back into the SREG in instruction 0x3d, the global interrupt flag is still cleared. Without the information that the
value pushed onto the stack and the value read from the stack are equal, the static analysis would have to assume that
interrupts are possibly enabled.

Moreover, sequences of instructions such as the program fragment depicted in Fig. 10 are frequently found in the
compiler-generated code in order to reset the SREG. Here, first r0 is set to 0 before the value is written into the SREG.
Consequently, the global interrupt flag is cleared in program location 0x41. The value written into the SREG in instruction
0x41 is extracted from value annotations computed using RDA.
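A transfer function for the global interrupt flag along these lines could look as follows. This is a hedged sketch, not the implementation of the tool: the name `gifa_transfer`, the tuple encoding of instructions, and the dictionary `values`, which stands in for the value annotations provided by RDA, are assumptions. Indirect writes to the SREG are not covered. The flag is taken from bit 7 of the written value because it is stored in the highest bit of the SREG.

```python
TOP = "top"  # flag may be set or cleared


def gifa_transfer(flag, instr, values):
    """Return the status of the global interrupt flag after one instruction.

    flag:   0, 1, or TOP before the instruction
    instr:  tuple of mnemonic and operands, e.g. ("out", "SREG", "r0")
    values: register -> known value or TOP (stand-in for RDA annotations)
    """
    op = instr[0]
    if op == "cli":
        return 0
    if op == "sei":
        return 1
    if op == "out" and instr[1] == "SREG":
        value = values.get(instr[2], TOP)
        if value == TOP:
            return TOP               # written value unknown: over-approximate
        return (value >> 7) & 1      # highest bit of the SREG
    return flag                      # all other instructions keep the flag


# The sequence of Fig. 10: r0 is known to be 0 after eor r0 r0.
flag = TOP
flag = gifa_transfer(flag, ("eor", "r0", "r0"), {})
flag = gifa_transfer(flag, ("out", "SREG", "r0"), {"r0": 0x00})
```

For the sequence of Fig. 10, the flag is known to be cleared after location 0x41. Without the value annotation for `r0`, the result would remain unknown.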

5.6. Live variable analysis

LVA is an analysis that determines for each program location the set of variables that may be read on some execution path through the program before they are overwritten [28]. These variables are called alive because their value may be required in some execution. Its results are influenced by the results of GIFA and its precision strongly influences the effectiveness of DVR.

In contrast to the analyses described before, LVA is a backward data-flow analysis. That is, the analysis traverses the CFG
of the program in reverse order. Once an instruction is visited that reads a certain variable, this variable is added to the set
of live variables. In like manner, a data-flow fact for a variable is removed in a certain program location if it is overwritten
in the corresponding instruction. Given a CFG $G = (V, E)$ and $p \in V$, LVA can be expressed using the following equation

Table 2
Results of LVA for the program given in Fig. 8.

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Step 1</th>
<th>Step 2</th>
<th>Step 3</th>
<th>Instruction</th>
<th>Step 1</th>
<th>Step 2</th>
<th>Step 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>0x40</td>
<td>∅</td>
<td>∅</td>
<td>∅</td>
<td>0x60</td>
<td>[r0]</td>
<td>[r0]</td>
<td>[r0]</td>
</tr>
<tr>
<td>0x41</td>
<td>[r1]</td>
<td>[r1]</td>
<td>[r1]</td>
<td>0x61</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
</tr>
<tr>
<td>0x42</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
<td>0x62</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
</tr>
<tr>
<td>0x43</td>
<td>[r0]</td>
<td>[r0]</td>
<td>[r0]</td>
<td>0x63</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
</tr>
<tr>
<td>0x44</td>
<td>∅</td>
<td>[r0]</td>
<td>[r0]</td>
<td>0x64</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
<td>[r0, r1]</td>
</tr>
<tr>
<td>0x45</td>
<td>∅</td>
<td>[r0]</td>
<td>[r0]</td>
<td>0x65</td>
<td>[r1]</td>
<td>[r1]</td>
<td>[r1]</td>
</tr>
<tr>
<td>0x46</td>
<td>∅</td>
<td>∅</td>
<td>∅</td>
<td>0x66</td>
<td>∅</td>
<td>∅</td>
<td>∅</td>
</tr>
<tr>
<td>0x47</td>
<td>∅</td>
<td>∅</td>
<td>∅</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

system:

\[
LVA_{\text{exit}}(p) = \begin{cases} 
\bot & : p \text{ is final} \\
\bigsqcup \{LVA_{\text{entry}}(p') \mid (p, p') \in E \} & : \text{otherwise} 
\end{cases}
\]

\[
LVA_{\text{entry}}(p) = \omega^{LVA}(p, LVA_{\text{exit}}(p))
\]

\[
\omega^{LVA}(p, l) = (l \setminus \text{kill}^{LVA}(p)) \cup \text{gen}^{LVA}(p)
\]

Note the different ordering compared to the equation system described in Section 5.1.2 in order to propagate data-flow facts back-to-front. Here, \(\text{gen}^{LVA}(p)\) generates a data-flow fact for a variable \(m\) if \(m\) is read in \(p\). Similarly, \(\text{kill}^{LVA}(p)\) contains the set of variables that are written in \(p\).
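The backward equation system can be solved analogously to a forward analysis, with the roles of entry and exit swapped. The following sketch handles a single function without calls and interrupts. The function name `live_variables` and the encoding of the CFG are assumptions made for this illustration. Locations without successors are treated as final.

```python
def live_variables(nodes, edges, reads, writes):
    """Solve the LVA equations for one function by round-robin iteration.

    nodes:  list of program locations
    edges:  set of CFG edges (p, p')
    reads:  program location -> set of variables read there (gen)
    writes: program location -> set of variables written there (kill)
    """
    succs = {p: [r for (q, r) in edges if q == p] for p in nodes}
    entry = {p: set() for p in nodes}
    exit_ = {p: set() for p in nodes}
    changed = True
    while changed:
        changed = False
        for p in reversed(nodes):  # back-to-front converges faster
            # Final locations have no successors, so the exit set stays empty.
            new_exit = set().union(*(entry[s] for s in succs[p]))
            new_entry = (new_exit - writes.get(p, set())) | reads.get(p, set())
            if new_entry != entry[p] or new_exit != exit_[p]:
                entry[p], exit_[p] = new_entry, new_exit
                changed = True
    return entry, exit_


# Straight-line code: 0 writes r1, 1 writes r0, 2 computes r0 from r0 and r1,
# 3 reads r0.
entry, exit_ = live_variables(
    nodes=[0, 1, 2, 3],
    edges={(0, 1), (1, 2), (2, 3)},
    reads={2: {"r0", "r1"}, 3: {"r0"}},
    writes={0: {"r1"}, 1: {"r0"}, 2: {"r0"}},
)
```

In this example, no register is alive at the entry of location 0, `r1` is alive at the entry of location 1, and both registers are alive at the entry of location 2.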

As described before, functions are all program fragments reachable via call statements. Additionally, interrupts are also handled like functions. In the binary code, functions do not have formal parameter values. Communication with functions is done via global variables, globally accessible registers, the runtime stack, or indirect loads and stores from and to a memory area indicated by a pointer register. The latter case is seldom used and leads to an over-approximation in this approach as indirect loads and stores can access all memory locations. The most common case is the usage of global variables, registers, and the stack. To handle functions and interrupt handlers, a local behavior is defined for them (cp. Section 5.1). The behavior of a function regarding LVA is a set containing all memory locations alive at the beginning of the respective function. In the worst case, this approach leads to an over-approximation of the behavior of a function by assuming that all memory locations are alive due to indirect reads, but in most cases few memory locations are alive. This analysis benefits from all analyses described before because without these analyses, the results obtained during LVA would be too inaccurate to be of use for DVR. The different steps of our interprocedural approach are conducted as detailed before.

Similar to the way RDA depends on STA, LVA depends on STA. If a value of a register is only stored on the stack and then read from the stack, the value is never really used. Consequently, pairs of push and pop instructions that have been identified using STA are not considered for LVA. Moreover, LVA also depends on GIFA. A memory location is alive in a program location if interrupts are enabled and the memory location is read in an interrupt handler. Consequently, limiting the set of program locations where interrupts are enabled leads to smaller LVA results.

As an example, consider the code fragment given in Fig. 8. LVA leads to the results shown in Table 2, which details the sets of live variables at the entry of each instruction. After Step 1, register \(r0\) is included in the local behavior of function 0x60 since it is alive at the entry of the function. In the first step, \(r0\) is not alive in the instructions located at 0x44 and 0x45. In Step 2, the local behavior of function 0x60 is combined in the call instruction located at address 0x45. Consequently, \(r0\) is now alive in instructions 0x44 and 0x45. Step 3 does not lead to different results and execution of Step 4 is not required for LVA.

Step 3 in our interprocedural approach is required in order to obtain an over-approximation. In case the instruction \(\text{mov} \ r0 \ \text{PINB} \) was replaced with an instruction that also reads \(r0\), such as \(\text{sub} \ r0 \ r1\), the register \(r0\) would also be alive in 0x40 and 0x46. In this case, the alive register \(r0\) would have to be propagated into the exit of function 0x60 in order to prevent function 0x60 from resetting \(r0\).

6. Abstraction techniques

This section details two abstraction techniques, namely DVR and PR, for the ATMEL ATmega16 microcontroller. Each of these abstraction techniques leads to state-space reductions in model checking. Moreover, they can be combined. While DVR can always be applied, PR only preserves CTL*-\(X\), that is, validity of the \(X\) (next) operator is lost.

6.1. Dead variable reduction

DVR copes with the state-explosion problem by reducing the number of states generated during state-space construction. If two states differ only in the value of a dead variable, that is, a variable whose value is never read again, then both states can be seen as equivalent, and hence, can be merged into a single state. In order to preserve the validity of the model checking results, variables used within the specification are never reset.

6.1.1. Algorithm

Yorav and Grumberg [45] define a variable to be fully dead at a given program location \( p \) if on every execution path starting from \( p \), the variable is overwritten before it is read again. In contrast to their approach, we do not consider partially dead variables, which are variables that are dead on some execution paths. Following from the definition, DVR can be seen as the dual of LVA. That is, a variable that is not alive is dead, and hence, can be reset. Executing LVA is needed in order to compute the set of live variables. During model checking, however, variables need to be reset only once at program locations where they die. Dying locations are those program locations where the variable was alive in a preceding location and then becomes dead. Resetting dying variables instead of dead variables reduces runtime requirements during model checking as there is no need to reset a variable twice.

After the sets of live variables are determined, for each program location \( p \) the set \( D_p \) of dying variables has to be identified. This is done by successively comparing the sets of live variables of two consecutive program locations \( p \) and \( p' \). The variables that are alive at \( p \) and are no longer alive at \( p' \) die at \( p' \). Let LVA\((p)\) and LVA\((p')\) denote the sets of live variables at the exit of program locations \( p \) and \( p' \), respectively. Then, we have \( D_{p'} = \left( \bigcup \{ \text{LVA}(p) : (p, p') \in E \} \right) \setminus \text{LVA}(p') \). Furthermore, variables that are assigned a value in \( p' \) and not alive in any succeeding instruction are added to \( D_{p'} \), thus considering the case that a value is never read. Afterwards, the variables accessed in atomic propositions used in the specification have to be removed from the set of dying variables. Finally, every program location \( p \) is annotated with its corresponding set \( D_p \). This set indicates the variables that have to be reset during state-space construction.
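The comparison of live sets of consecutive program locations can be sketched as follows. The sketch follows the sentence that variables alive at a predecessor and no longer alive at the current location die at the current location. It omits the additional case of values that are written but never read. The function name `dying_variables` and its parameters are assumptions made for this illustration.

```python
def dying_variables(nodes, edges, live, spec_vars=frozenset()):
    """Compute for each program location the set of variables dying there.

    nodes:     list of program locations
    edges:     set of CFG edges (p, p')
    live:      program location -> set of live variables (result of LVA)
    spec_vars: variables used in the specification, which are never reset
    """
    dying = {}
    for current in nodes:
        # Variables alive in at least one predecessor of the current location.
        alive_before = set().union(
            *(live[p] for (p, q) in edges if q == current))
        dying[current] = (alive_before - live[current]) - set(spec_vars)
    return dying


dying = dying_variables(
    nodes=[0, 1, 2],
    edges={(0, 1), (1, 2)},
    live={0: {"r0", "r1"}, 1: {"r0"}, 2: set()},
    spec_vars={"r1"},
)
```

Here, `r1` is no longer alive at location 1, but it is not reset because it is used in the specification, and `r0` dies at location 2.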

6.1.2. Example

Consider the example program given in Fig. 8. Program locations where register \( r0 \) is alive are depicted in Fig. 11(a). Similarly, the dying locations of \( r0 \) are depicted in Fig. 11(b). The program locations where \( r0 \) dies are 0x46, 0x47, and 0x65. In these instructions, \( r0 \) was alive in a preceding location and then becomes dead. Hence, \( r0 \) is reset whenever one of these instructions is visited during model checking.

6.2. Path reduction

PR was first described by Yorav and Grumberg [45]. It is used to collapse single successor chains, which are computational paths consisting of states having only single successors, into a single step. This means that only the first and the last state of this path are stored in order to reduce state spaces and memory consumption. Furthermore, states are stored at program locations where the validity of the specification may be influenced. The disadvantage of this method is that it only preserves a divergence-sensitive stuttering bisimulation [42,3] between the concrete and the abstract transition system. Consequently, it preserves CTL*-\( X \), that is, validity of the next operator is lost. A formal proof is given by Yorav and Grumberg [45]. This restriction, however, is negligible because the X operator is rarely used in specifications when model checking the binary code. As specifications are often based on the C code of the program and a single C statement is typically compiled into a sequence of different instructions, the X operator is barely used in practice.
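The effect of collapsing single successor chains can be sketched on an explicit state graph. The sketch below is an illustration under simplifying assumptions and not the algorithm of the tool: the complete state graph is given in advance instead of being built on the fly, and the function name `collapse_chains` and its parameters are hypothetical. States are kept if they are initial, marked as breaking, or do not have exactly one successor.

```python
def collapse_chains(succ, initial, breaking=frozenset()):
    """Collapse single successor chains of a state graph.

    succ:     state -> list of successor states (every state must be a key)
    initial:  the initial state
    breaking: additional states that must be stored
    Returns a graph that contains only the stored states.
    """
    stored = {s for s in succ
              if s == initial or s in breaking or len(succ[s]) != 1}
    reduced = {s: set() for s in stored}
    for s in stored:
        for t in succ[s]:
            seen = set()
            while t not in stored:  # skip states with a single successor
                if t in seen:
                    raise ValueError("loop without a stored state")
                seen.add(t)
                t = succ[t][0]
            reduced[s].add(t)
    return reduced


# 0 -> 1 -> 2 -> 3, state 3 branches to 4 and 5, 4 leads back to 0.
succ = {0: [1], 1: [2], 2: [3], 3: [4, 5], 4: [0], 5: []}
reduced = collapse_chains(succ, initial=0)
```

In the example, six states are reduced to three. The error raised for a loop without a stored state corresponds to the situation in which revisits could not be detected.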

6.2.1. Algorithm

Our algorithm for PR consists of a static and a dynamic part in order to determine program locations of interest. These program locations are called breaking points by Yorav and Grumberg [45]. During state-space building, only states that are generated in program locations marked as breaking points are stored. First, program locations which satisfy certain conditions are determined by means of a static analysis. Yorav and Grumberg handle a parallel while language, and hence, all breaking points can be identified statically. In [mc]square, however, some of these breaking points have to be determined dynamically during state-space building due to indirect data accesses and nondeterminism. Yorav and Grumberg define the following program locations \( p \) to be breaking.

(i) \( p \) is the initial or terminating program location
(ii) \( p \) is associated with the program location of an assignment that changes a variable used within the formula
(iii) \( p \) is associated with the program location of a nondeterministic assignment
(iv) \( p \) is the head of a \texttt{while} statement
(v) \( p \) is labeled by a procedure call, or is the statement immediately following a procedure call
(vi) \( p \) is labeled by a communication statement (\texttt{send} or \texttt{receive}), or is the statement following inter-process communication

Condition (i) can be checked statically in [mc]square. Condition (ii) could be detected statically in the binary code as well. Using a static approach, all instructions that indirectly write a memory location have to be marked as breaking, in addition to instructions that directly write memory locations used in the CTL formula. This includes frequently found constructs such as accesses to the stack using \texttt{push} instructions. Thus, detecting condition (ii) entirely statically leads to a coarse over-approximation and many instructions are unnecessarily marked as breaking. Consequently, direct writes are statically marked as breaking and indirect writes are resolved dynamically during state-space building in [mc]square.

For similar reasons, condition (iii) cannot be checked statically in [mc]square because nondeterminism is not indicated by certain statements in the binary code. Different memory locations can introduce nondeterminism and are accessed through various instructions. Furthermore, a memory location can change back and forth between nondeterministic and deterministic behavior. For instance, an input port may be switched to output or a timer may be disabled. Such situations cannot be detected statically. Hence, a static analysis would lead to an overly coarse over-approximation. Therefore, the third condition is checked dynamically during state-space building. This check is implemented by observing whether more than one successor is generated for the respective state.

Condition (iv) can be detected statically in the binary code, but it requires special treatment because no explicit instructions for implementing loops exist. Loops are implemented using combinations of branching instructions and unconditional jumps. Detecting loops is required in order to guarantee termination during state-space building. If there is a nonterminating loop in the program under verification without any breaking points on this loop, it would not be possible to detect revisits, and hence, the state-space building would not terminate. Termination of loops, however, cannot be detected.

In order to ensure termination of state-space building, at least one program location on each loop is required to be a breaking point. Hence, all unconditional jumps targeting addresses lower than the address of the respective instruction and all indirect jumps are marked as breaking as well. Moreover, all branching instructions with negative offset are marked as breaking as these can lead to a loop. An instruction such as \texttt{brne -16}, for example, branches backwards in the program in case the zero flag in the SREG is cleared.
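The marking rule for loops described above can be illustrated with a minimal sketch. The instruction tuples `(address, mnemonic, operand)`, the function name `loop_breaking_points`, and the example program are assumptions made for illustration only; they do not reflect the internal data structures of [mc]square. Only a subset of the AVR conditional branches is listed.

```python
# Hypothetical instruction tuples: (address, mnemonic, operand).
# For "jmp" the operand is an absolute target address; for "rjmp" and the
# conditional branches it is a signed offset relative to the instruction.
CONDITIONAL_BRANCHES = {"breq", "brne", "brcs", "brcc", "brlt", "brge"}


def loop_breaking_points(program):
    """Return the addresses marked as breaking so that every loop is cut."""
    breaking = set()
    for address, mnemonic, operand in program:
        if mnemonic == "ijmp":
            # The target is unknown statically, so it may lie backwards.
            breaking.add(address)
        elif mnemonic == "jmp" and operand <= address:
            # Absolute jump to a lower address (a self-jump is included
            # here as a conservative assumption).
            breaking.add(address)
        elif mnemonic in CONDITIONAL_BRANCHES | {"rjmp"} and operand < 0:
            # Relative jump or branch with a negative offset.
            breaking.add(address)
    return breaking


program = [
    (0x40, "in", None),
    (0x41, "dec", None),
    (0x42, "brne", -2),    # backward branch: may close a loop
    (0x43, "breq", 3),     # forward branch: not marked
    (0x44, "jmp", 0x40),   # jump to a lower address
    (0x46, "ijmp", None),  # indirect jump
    (0x47, "rjmp", 1),     # forward relative jump: not marked
]
print(sorted(hex(a) for a in loop_breaking_points(program)))
# ['0x42', '0x44', '0x46']
```

The rule is deliberately conservative: a backward branch need not close a loop, but marking it guarantees that every cycle in the control flow contains at least one breaking point.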

For condition (v), all call instructions including indirect and relative calls are statically marked as breaking. The targets of the respective call instructions are not breaking. Moreover, all \texttt{ret} instructions, which correspond to return statements in high-level programming languages, are marked as breaking. Marking \texttt{ret} instructions is required because the return addresses stored on the stack may have changed on the execution path from the function entry to the \texttt{ret} instruction. Hence, instead of the instruction immediately following a \texttt{call} instruction, the \texttt{ret} instruction that returns control is marked, because the called function may return to an instruction other than the one following the original \texttt{call} instruction. Such behavior is not possible in the parallel while language considered by Yorav and Grumberg.

Condition (vi) is not directly applicable to the binary code as it does not contain parallel processes. Parallelism is implemented by means of interrupt handlers, which show a comparable behavior (cp. Section 3). The interleaving of the main program and interrupt handlers, however, is asymmetric. That is, interrupt handlers can interrupt the main program but not vice versa. Moreover, there are no explicit communication statements that control the communication between the main program and interrupt handlers. Communication is performed using global variables. Whenever an interrupt handler is active, it can communicate with the main program by writing memory locations that are read by the main program. To represent this behavior, each location where interrupts may occur has to be breaking. Similar to condition (v), which handles \texttt{call} instructions, all \texttt{reti} instructions—return from interrupt handler—are statically marked as breaking points. Instructions where interrupts are enabled could be statically marked because GfA delivers an over-approximation of the status of the interrupt flag. The exact status of the global interrupt flag, however, is always known during model checking. Hence, locations where interrupts are enabled are marked at runtime.

At first sight, it appears that PR would not provide significant benefit because every location where interrupts may occur is breaking. In practice, however, interrupts are only enabled within certain parts of the program and are disabled in most interrupt handlers. Often, interrupt handlers are implemented as long single successor chains, which particularly benefit from PR. This is shown in a case study presented in Section 7.
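The interplay of statically marked breaking points and the dynamic check for nondeterminism can be sketched as follows. This is a simplified illustration and not the algorithm implemented in [mc]square: states and program locations are conflated into plain integers, and the names `build_reduced_state_space`, `successors`, and `static_breaking` are hypothetical. The sketch assumes that every loop contains a statically marked breaking point, as required above; otherwise the inner loop would not terminate.

```python
def build_reduced_state_space(initial, successors, static_breaking):
    """Store only states at breaking points; count every state created."""
    stored = {initial}
    created = 1
    worklist = [initial]
    while worklist:
        state = worklist.pop()
        for nxt in successors(state):
            created += 1
            # Follow the single successor chain. A state is breaking if it
            # is marked statically or, dynamically, if it does not have
            # exactly one successor (nondeterminism or termination).
            while nxt not in static_breaking and len(successors(nxt)) == 1:
                nxt = successors(nxt)[0]
                created += 1
            if nxt not in stored:
                stored.add(nxt)
                worklist.append(nxt)
    return stored, created


# Toy transition relation: state 0 is nondeterministic, both branches
# merge in the chain 3 -> 4 -> 5, and 5 jumps back to 0.
succ = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [5], 5: [0]}
stored, created = build_reduced_state_space(0, lambda s: succ[s], {0, 5})
print(sorted(stored), created)  # [0, 5] 10
```

In this toy example only two of the six states are stored, but ten states are created because the chain 3, 4, 5 is traversed once per branch. Revisits can only be detected at stored states, which is the reason why PR tends to increase the number of states created.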

6.2.2. Example

Consider the program given in Fig. 8. Only those program locations emphasized in Fig. 12(a) are marked as breaking points. The conditions that apply at each program location are detailed in Fig. 12(b). In this program, we assume \texttt{0x40} to be the initial program location. Moreover, the instruction \texttt{0x40} contains a nondeterministic assignment because it reads a value from an input port, and thus, 256 successors are created. The same applies to instruction \texttt{0x41}. Instruction \texttt{0x45} is marked due to the \texttt{call} instruction. Instruction \texttt{0x46} is marked due to condition (iv) in order to detect loops. Instruction \texttt{0x47} is the final location in the program, and hence, condition (i) applies. In the other function, instruction \texttt{0x60} reads a nondeterministic value from the environment and is marked. Moreover, instruction \texttt{0x64} is marked due to condition (iv). A \texttt{ret} instruction is located at \texttt{0x66}, and hence, this instruction is marked due to condition (v). No interrupt handlers are used, and thus, condition (vi) has no effect. Condition (ii) is not applied as no specification is used in this example.

7. Case study

This section describes a case study using seven different programs for the ATMEL ATmega16 microcontroller in order to show the effect of DVR and PR on state spaces. The case study was conducted on a SUN Fire x4600 M2 server equipped with eight dual-core AMD Opteron 8220 processors and 256 GB main memory. Although [mc]square supports parallel state-space building algorithms as described by Brauer et al. [8], only one of the processors was used in this case study in order to generate unbiased results. The seven programs chosen for this case study were all written by students in lab courses, during diploma theses, or in exercises. These programs were written in C and compiled using avr-gcc. None of the programs was written with model checking in mind.

The results of the case study are shown in Table 3. For every program, the first row shows the results without applying any abstraction techniques. The second and the third row show the results using DVR and PR, respectively. The last row shows the results when both abstraction techniques are applied. The column states stored reflects the number of states that are stored in the state space. The column states created presents the number of states that are created during model checking. This number is typically higher than the number of states stored due to revisits. The column size (MB) shows the size of the state space. The last column shows the runtime needed for applying static analysis and building the complete state space. The invariant AG true was used in order to build the complete state space. Only small programs, for which state spaces could be built without applying any abstraction techniques, were used in order to compare the effects of the described techniques.

The first program called light_switch is a simple program utilized to demonstrate basic microcontroller functions. It consists of 162 lines of code. It uses two timers and no interrupts. In this program, DVR lowered the number of states stored by 22.26%. PR lowered the number of states stored by 78.65%, but it increased the number of states created by 164.01%. This is because of long single successor chains of which only the last state was stored. In order to detect revisits, the complete chain had to be visited again. For this small example, the runtime was increased because all static analyses were executed for DVR, which was not the case for the default configuration. Using DVR and PR together led to 84.65% fewer states stored compared to the application of no abstraction techniques. The reductions caused by both abstraction techniques did not add up completely, but their combination had a noticeable effect. The reduction in the number of states stored did not directly carry over to a reduction in terms of memory requirements on this small example. The hash table used to store the state space is always initialized with a default size larger than the state space of the complete program. Consequently, 0.5 MB is the minimal memory requirement of [mc]square during model checking.
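The percentages quoted for light_switch follow directly from the absolute numbers in Table 3. The short computation below reproduces three of them; the helper `percent_change` is introduced here for illustration only.

```python
def percent_change(before, after):
    """Signed change from before to after in percent, two decimals."""
    return round(100.0 * (after - before) / before, 2)


stored_none, created_none = 4268, 6296      # light_switch, no abstraction
print(percent_change(stored_none, 3318))    # DVR, states stored:  -22.26
print(percent_change(created_none, 16622))  # PR, states created:  164.01
print(percent_change(stored_none, 655))     # both, states stored: -84.65
```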

The next program called plant controls a fictive chemical plant. It consists of 225 lines of code, and it uses one timer and two interrupts. DVR had no effect in this program because the same variables are used throughout the complete program including the interrupt handlers. Therefore, this program has no location where a variable becomes dead, and hence, no variables were ever reset. PR lowered the number of states stored by 90.29% and increased the number of states created by 13.96%. This means that either the number of revisits was smaller compared to the program light_switch or the length of single successor chains was shorter. Applying PR led to a significant decrease in memory usage, that is, it dropped from 33.6 MB to 3.0 MB.

The program called reentrance is used to demonstrate a reentrance problem. The program itself is rather small. It consists of only 148 lines of code and uses one interrupt handler. A 16-bit variable i is accessed both in the main program and in the interrupt handler. Values are assigned using different instructions that access only 8 bits of i, but access to i is not protected. This lack of protection of a shared variable leads to invalid values of i with respect to the specification. As with the program plant, and for the same reasons, DVR had no effect. Again, PR significantly reduced the state space and saved 90.77% of the states stored. The number of states created was only increased by 10.85% due to few revisits. The memory requirements dropped from 23.5 MB to 2.2 MB.
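The kind of reentrance problem described for this program can be illustrated by a small simulation. The scenario is hypothetical and not taken from the program itself: a 16-bit variable is written by two 8-bit stores, and an interrupt handler reads it between the two stores. The names `write16_nonatomic` and `read16` are assumptions made for this sketch.

```python
def read16(mem):
    """Read the 16-bit variable from its two 8-bit halves."""
    return (mem["hi"] << 8) | mem["lo"]


def write16_nonatomic(mem, value, interrupt=None):
    """Write a 16-bit value with two 8-bit stores, as on an 8-bit CPU."""
    mem["lo"] = value & 0xFF           # first 8-bit store
    if interrupt is not None:
        interrupt(mem)                 # interrupt fires between the stores
    mem["hi"] = (value >> 8) & 0xFF    # second 8-bit store


mem = {"lo": 0xFF, "hi": 0x00}         # i == 0x00FF
observed = []
write16_nonatomic(mem, 0x0100, interrupt=lambda m: observed.append(read16(m)))

print(hex(observed[0]))   # 0x0: neither the old value 0xff nor the new 0x100
print(hex(read16(mem)))   # 0x100
```

The handler observes a value that was never assigned to i. A model checker that explores every interleaving of the main program and the interrupt handler at the instruction level visits this intermediate state as well.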
Table 3
Effects of DVR and PR on seven microcontroller programs.

<table>
<thead>
<tr>
<th>Program</th>
<th>Options used</th>
<th>States stored</th>
<th>States created</th>
<th>Size (MB)</th>
<th>Time (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>light_switch</td>
<td>None</td>
<td>4,268</td>
<td>6,296</td>
<td>1.2</td>
<td>0.11</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>3,318</td>
<td>5,119</td>
<td>1.0</td>
<td>0.33</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>911</td>
<td>16,622</td>
<td>0.5</td>
<td>0.55</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>655</td>
<td>13,521</td>
<td>0.5</td>
<td>0.52</td>
</tr>
<tr>
<td>plant</td>
<td>None</td>
<td>130,524</td>
<td>135,949</td>
<td>33.6</td>
<td>1.22</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>130,524</td>
<td>135,949</td>
<td>33.6</td>
<td>2.11</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>12,679</td>
<td>154,921</td>
<td>3.0</td>
<td>1.62</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>12,679</td>
<td>154,921</td>
<td>3.0</td>
<td>4.85</td>
</tr>
<tr>
<td>reentrance</td>
<td>None</td>
<td>107,649</td>
<td>110,961</td>
<td>23.5</td>
<td>3.0</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>107,649</td>
<td>110,961</td>
<td>23.5</td>
<td>1.62</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>9,935</td>
<td>123,003</td>
<td>2.2</td>
<td>1.91</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>9,935</td>
<td>123,003</td>
<td>2.2</td>
<td>4.85</td>
</tr>
<tr>
<td>traffic_light</td>
<td>None</td>
<td>107,649</td>
<td>110,961</td>
<td>23.5</td>
<td>1.22</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>107,649</td>
<td>110,961</td>
<td>23.5</td>
<td>1.91</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>9,935</td>
<td>123,003</td>
<td>2.2</td>
<td>1.91</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>9,935</td>
<td>123,003</td>
<td>2.2</td>
<td>4.85</td>
</tr>
<tr>
<td>window_lift</td>
<td>None</td>
<td>2,342,564</td>
<td>2,589,665</td>
<td>573</td>
<td>30.11</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>307,176</td>
<td>335,724</td>
<td>73</td>
<td>9.32</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>318,626</td>
<td>3,818,060</td>
<td>77</td>
<td>40.43</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>40,048</td>
<td>494,140</td>
<td>9.5</td>
<td>10.73</td>
</tr>
<tr>
<td>can</td>
<td>None</td>
<td>147,259,483</td>
<td>154,271,836</td>
<td>34,552</td>
<td>2,317</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>147,259,483</td>
<td>154,271,836</td>
<td>33,994</td>
<td>2,410</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>21,037,058</td>
<td>55,584,435</td>
<td>754</td>
<td>670</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>21,037,058</td>
<td>55,584,435</td>
<td>754</td>
<td>10.73</td>
</tr>
<tr>
<td>vector</td>
<td>None</td>
<td>47,477,797</td>
<td>48,419,003</td>
<td>11,522</td>
<td>812</td>
</tr>
<tr>
<td></td>
<td>DVR</td>
<td>43,306,447</td>
<td>44,247,653</td>
<td>10,632</td>
<td>784</td>
</tr>
<tr>
<td></td>
<td>PR</td>
<td>3,131,994</td>
<td>55,584,435</td>
<td>754</td>
<td>670</td>
</tr>
<tr>
<td></td>
<td>Both</td>
<td>3,131,031</td>
<td>55,583,472</td>
<td>754</td>
<td>720</td>
</tr>
</tbody>
</table>

The program traffic_light was written by students during a lab course. As the name suggests, it is used to control the operation of a traffic light. It comprises 155 lines of code. Moreover, two interrupts and one timer are used. Once again, DVR had no effect. PR showed a performance similar to the programs described before with an increase in the number of states created of 19.68%. The number of states stored was reduced by 89.62% and the memory requirements were reduced from 2.3 MB to 0.5 MB. Again, this is the lower bound in terms of memory requirements when using [mc]square.

A controller for a powered window lift used in a car was implemented in the program window_lift. This program was inspired by a real automotive task. The program we chose for this case study contains 289 lines of code, and it uses three interrupts and two timers. DVR alone had a significant effect on the state space as the number of states stored dropped from 2,342,564 to 307,176, which is a reduction of 86.89%. The required runtime dropped from 30.11 s to just 9.32 s. PR produced comparable results as the number of states stored dropped to 318,626. The number of states created, however, was increased by 47.43%. This explains the increased runtime when only PR was used. In combination, both analyses led to a state space consisting of only 40,048 states, which is a reduction of 98.29% compared to the original state space. When DVR and PR were applied, 494,140 states had to be created, which is a reduction of 80.91% with respect to the original values. Runtime requirements dropped from 30.11 s to 10.73 s. The memory requirements were reduced significantly; only 9.5 MB of memory were needed compared to 573 MB using the default configuration.

A four-channel speed measurement, where communication with peripherals is performed using a CAN bus interface, was implemented in the program can. The program consists of 383 lines of code and uses two interrupt handlers. Using the default configuration, 147,259,483 states were stored. DVR did not cause any reduction because interrupts are active in every program location and indirect reads are used in the interrupt handlers. Using PR, the number of states stored dropped to 21,037,058, which is a reduction of 85.71%, and 4,724 MB of main memory were required. The number of states created increased by 17.16%, but the runtime was slightly decreased.

The program vector reads various inputs from the environment and performs some geometric computations on these inputs. It consists of 930 lines of code but does not use interrupt handlers. Using DVR led to a minor reduction of the number of states stored as indirect reads are often used in this program. This prevented DVR from resetting more memory locations. Using PR, the number of states stored decreased from 47,477,797 to 3,131,994, which is a reduction of 93.40%. Using PR resulted in no major reductions in terms of runtime, but memory requirements dropped from more than 11 GB to 754 MB. Consequently, this program could also be checked using a conventional desktop computer, which is not possible without PR. The number of states created increased by 14.80%. Applying both DVR and PR had almost no additional effect.

Summarizing, it can be seen that PR strongly reduces the number of states stored, and hence memory consumption, for every program. On the other hand, PR does not lower the runtime because of the larger number of states created due to revisits. For some programs, the runtime was even significantly increased. As each state stored using [mc]square requires up to 2 kbytes of memory, space is our main concern. Consequently, PR should be used whenever possible.

The efficiency of DVR strongly depends on the structure of the analyzed program. Whenever there is a tight coupling between data variables across functions and interrupt handlers, DVR has no effect. Moreover, the presence of indirect reads strongly reduces the number of program locations where memory locations can be reset in order to ensure an over-approximation. This is a known problem of static DVR techniques. While the runtime overhead caused by the static analysis increases the runtime for small programs such as light_switch or plant, this overhead is more than offset by the reduced runtimes due to smaller state spaces, as can be observed for the program window_lift. For many programs, state spaces are reduced using DVR.

Overall, these results show that abstraction techniques based on static analysis can significantly reduce state spaces during model checking. Furthermore, they reduce memory requirements. The runtime overhead caused through static analysis is moderate and pays off when model checking large programs.

8. Conclusion & future work

This paper describes two abstraction techniques employed to tackle the state-explosion problem, and the underlying static analysis framework. Both abstraction techniques were previously used in other model checkers such as SPIN, ESTES, and MURPHI, but could not be transferred one-to-one to [mc]square due to the peculiarities of the binary code. Interprocedural analyses that cope with these peculiarities are employed in [mc]square. While DVR is performed entirely statically, the preparation of programs for PR combines static and dynamic techniques due to interrupts and nondeterminism.

The results of DVR are comparable to the results achieved in other model checking tools, although the analysis has to be performed interprocedurally. On the other hand, PR has a higher impact on state spaces compared to other model checkers due to the structure of binary code. Similar results were observed by Quiros [30]. The binary code tends to have long single successor chains, which need not be stored completely. For instance, a single C statement could be compiled into a sequence of six instructions. Another source of single successor chains is interrupts. In most cases an interrupt handler cannot be interrupted by interrupts, and hence, it can be reduced efficiently. Both reduction techniques can be used to lower the size of state spaces. DVR can be used in any case as it does not have any effect on the validity of specifications. PR can only be used if the X operator in CTL is not needed.

A negative effect of using PR is an increased number of created states due to revisits. From our point of view, it is preferable to trade time for space because memory requirements are a bigger problem. In summary, reduction techniques using static analysis can be used to tackle the state-explosion problem in explicit-state model checking. Significant improvements can be observed when using DVR and PR for model checking of the binary code. The impact PR has in this specific domain is even higher than in model checkers working on intermediate languages.

Judging from the size of a program alone, estimating sizes of state spaces is infeasible due to several effects. State spaces strongly depend on the number of input values and their domains. Concurrency introduced by interrupt handlers aggravates the state-explosion problem as well. The effectiveness of DVR and PR is difficult to estimate by inspecting the binary code manually because the program locations where states are stored and variables are reset strongly influence state spaces. In the past, we have successfully verified programs consisting of up to 5000 instructions using [mc]square. On the other hand, we had to stop the verification process for some programs consisting of only 100 instructions after approximately 6,000,000,000 (partly symbolic) states had been created. From our point of view, a further combination of static and dynamic state-space reduction methods appears to be a feasible approach in order to pave the way for the industrial application of binary code model checkers such as [mc]square.

Apart from the development and implementation of the described algorithms for the ATMEL ATmega16, we have also implemented these techniques for other platforms such as the Intel MCS-51 [33]. Our experiences show that, in order to make such techniques applicable to the binary code, details of the specific target platform have to be taken into account and special analyses have to be developed in order to handle certain architectural features. For instance, the Intel MCS-51 supports register bank switching in contrast to the ATmega16. Four register banks can be selected using the register selection bits. In order to compute precise results for RDA, LVA, and DVR, a register bank analysis has to be performed in order to compute a safe approximation of the register selection bits. Without register bank analysis, no reaching definitions could ever be removed because it would not be clear which register bank is active, and hence, which register is overwritten.

In order to compute an over-approximation, the analysis would have to assume that all register banks could be written. In like manner, LVA and DVR are influenced by register banks.
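The effect of register banks on the analyses can be made concrete with a small sketch. On the MCS-51, register R\(n\) of bank \(b\) is located at internal RAM address \(8b + n\). The function names and the representation of the analysis result as a set of possibly active banks are assumptions made for illustration; they are not taken from the implementation.

```python
def possible_targets(reg_index, possible_banks):
    """Addresses that a write to R<reg_index> may hit, given the set of
    register banks that may be active at this program location."""
    return {8 * bank + reg_index for bank in possible_banks}


def kills_definition(reg_index, possible_banks):
    """A reaching definition may only be removed if the write certainly
    overwrites one specific memory location."""
    return len(possible_targets(reg_index, possible_banks)) == 1


# Without register bank analysis, all four banks have to be assumed.
print(sorted(possible_targets(3, {0, 1, 2, 3})))  # [3, 11, 19, 27]
print(kills_definition(3, {0, 1, 2, 3}))          # False

# With a precise result, e.g. only bank 1 can be active.
print(sorted(possible_targets(3, {1})))           # [11]
print(kills_definition(3, {1}))                   # True
```

The more precisely the set of possibly active banks is known, the more often a register write can be treated as a definite overwrite, which in turn makes RDA, LVA, and DVR more precise.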

Different hardware platforms, however, pose different challenges to static analysis and model checking. In particular, any verification argument for the ATmega16 must pay special attention to the fact that GPRs and IORs can be overwritten using indirect stores, which is not possible on the MCS-51. All in all, the MCS-51 requires more analyses, which are simple in their structure, while the ATmega16 requires fewer analyses using more sophisticated techniques.

One of the remaining challenges in static analysis of the microcontroller binary code concerns indirect reads, writes, jumps, and calls. In order to compute the target locations of such constructs, a pointer analysis is required. Knowing the values of pointers renders DVR more precise. For example, an indirect read can access all memory locations, which prevents these locations from being reset. We believe that the performance of DVR can be significantly improved by incorporating a pointer analysis. In contrast to high-level languages, inferring pointer values in the binary code strongly depends on the ability to precisely analyze loops. In compiler-generated code on the ATMEL ATmega16, for example, all global variables are initialized in a loop during startup, where the corresponding memory locations are accessed using indirect reads and writes. Compared to the high-level code, less semantic information about types and loop conditions is available in the binary code. Few approaches have been developed that are suitable for computing loop conditions and loop invariants for low-level and binary code [21,19,20,27].

References


B. Schlich et al./Science of Computer Programming 76 (2011) 100–118