Java applets run on a Virtual Machine that checks code integrity and correctness before execution using a module called the Bytecode Verifier. Java Card technology allows Java applets to run on smart cards. The large memory requirements of the verification process do not allow the implementation of an embedded Bytecode Verifier in the Java Card Virtual Machine. To address this problem, we propose a verification algorithm that optimizes the use of system memory by imposing an ordering on the verification of the instructions. This algorithm is based on control flow dependencies and immediate postdominators in control flow graphs.
INTRODUCTION
Java is a safe and portable programming language: compiled programs, namely applets, run on the Java virtual machine (JVM), which interprets applets' code (called bytecode).
The JVM checks security constraints using a security module called the Bytecode Verifier, which performs a static analysis of the bytecode in order to ensure the type-correctness of the code before its execution. Bytecode verification is a key point in the security chain of the Java Platform. For example, it ensures that the program does not forge pointers, e.g. by using integers as object references. Together with runtime checks, this ensures that bytecode execution cannot compromise the memory of the system where the JVM is running.
Java Card technology allows Java applets to run on smart cards. A Java Card is a smart card running a JVM, the Java Card Virtual Machine (JCVM), and it is becoming a secure token in various fields, such as banking and public administration. Java Cards represent an interesting research challenge since they have limited resources but require high security features. The Java Card applets are compiled in a standardized compact code called Java Card bytecode and an off-card verifier checks the correctness of the code before it is downloaded and executed on the JCVM. The large memory space requirements of the verification process do not allow the implementation of a bytecode verifier embedded in the JCVM itself. Nevertheless, on-card bytecode verification would be a useful feature, since it would enable the card user to trust applets that were downloaded and installed after the issuance of the card.
To perform verification on-card, many approaches have been proposed in the literature which modify the standard verification process or apply other techniques such as proof carrying code or cryptography. Nevertheless, on-card bytecode verification is still an open research field. All on-card verifiers currently developed need a preprocessing of the bytecode: either a code transformation to satisfy some constraints or the output of a certifying compiler. These proposals are reviewed in Section 6.
The standard verifier statically allocates all the data structures needed for the verification process and de-allocates them when the verification ends. In this paper, we propose a verification algorithm that optimizes the use of memory by associating a lifetime with the data structures. The notion of immediate postdominator (ipd) in control flow graphs (CFGs) is used to partition the code into verification units. Using the units, the verification process is applied locally to a subset of instructions. This algorithm has the potential to save space since it can de-allocate some data structures each time it has completed a verification unit. The algorithm is based on the work in [1], where the authors propose a formal proof system for space-aware data flow analysis. The rules in the proof system relate sets of paths to other (smaller) sets of paths using the notion of ipd and deduce sentences about sets of paths, based on the validity of sentences on sub-paths. The algorithm saves execution states of conditional instructions. In Section 4, a variant of the algorithm that saves execution states of jump targets is presented. The algorithms have been used to develop prototype verifiers based on an open-source project of the Apache Foundation, called Bytecode Engineering Library (BCEL) [2]. Many class files were tested, in particular the ones included in the Java Card Development Kit. Statistics about the decrease of memory space requirements with respect to the standard verifier are reported in Section 5. Both algorithms require noticeably less memory than the standard verification (a reduction of up to 58%). This reduces the number of unverifiable classes for a given amount of RAM. Furthermore, the speed penalty appears to be acceptable. These measurements suggest that these algorithms are suitable for an on-card implementation of the bytecode verifier.
OVERVIEW
The result of the compilation of a Java program is a set of class files: a class file is generated by the Java Compiler for each class defined in the program. A class file is composed of the declaration of the class and of the JVM Language (JVML) bytecode for the methods of each class. The JVM is a stack machine manipulating an operand stack and a set of registers (local variables) for each method, and a heap containing object instances. Registers are accessed through load and store instructions that push the value of a given register onto the stack or store the top of the stack in a given register. Instructions of JVML are partially typed: for example, iload loads an integer onto the stack, while aload loads a reference; istore requires an integer on top of the stack, while astore requires a reference. JVML includes simple arithmetic instructions, conditional/unconditional jumps and instructions operating on objects.
A Java Card applet is obtained by transforming the class files into cap files in order to perform name resolution and initial address linking [3] . Hereafter, we will refer to the class files only, since cap files and class files conceptually contain the same information.
The bytecode of a method is a sequence of JVML instructions. When an instance method is invoked (invokevirtual instruction) it executes with a new empty stack and with an initial memory where all registers are undefined except for the first one, register 0 that contains the reference to the object instance on which the method is called, and the following k registers, numbered by 1, . . . , k, that contain the actual parameters. If the called method is static (invokestatic instruction) there is no object to reference and then the actual parameters are placed in the registers starting from register 0. When the method returns, control is transferred to the calling method: the caller's execution environment (operand stack and local registers) is restored and the returned value, if any, is pushed onto the operand stack.
The Java source code for a simple method is shown in Figure 1 . Given an array of integers, the method scans the first h positions and saves in the Pair structure the last index in the array whose value is an odd integer and the last index whose value is an integer multiple of 3. The bytecode corresponding to this method is shown in Figure 2a .
The Standard Verifier
The bytecode verifier checks code correctness in two passes. First, it checks some static constraints (e.g. code containment), then it performs a data-flow analysis. This is the most resource consuming pass. During this step, the verifier executes the instructions abstractly, using types instead of actual values, and checks that the operands of each instruction match the required type. The abstract execution state of the JVM consists of the type information for the operand stack and the local variables of the VM. The information needed by the analysis is the abstract execution state of the instructions.
Types are partially ordered and form a lattice $(L, \sqsubseteq)$, where $\top$ is the top element and $\bot$ is the bottom element. We denote by $\sqcup$ the least upper bound between two types (the least common super-type).
$L$ includes primitive types (int, double, ...), object reference types represented by the names of the corresponding classes, and array types ([t denotes the type of an array of elements of type t). Moreover, null is a special type representing the type of the null reference. Finally, $\top$ represents either an undefined type or an incorrect type. Primitive types are unrelated. Class types C are related as specified by the class hierarchy. Figure 3 shows an example. In the figure, E, F and G are user-defined classes, with F and G extending E. Not all types are shown.
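As a minimal illustration of the least upper bound on such a lattice, the sketch below hard-codes the small hierarchy mentioned above (F and G extending E). The class name TypeLattice, the string encoding of types, the constants TOP, BOTTOM and NULL, and the placement of Object above E are assumptions made for this example only; array types and interfaces are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy type lattice: primitives are unrelated, classes follow the hierarchy. */
final class TypeLattice {
    static final String TOP = "TOP";       // undefined or incorrect type
    static final String BOTTOM = "BOTTOM"; // below every other type
    static final String NULL = "null";     // type of the null reference

    // Hypothetical class hierarchy: class -> direct superclass.
    private static final Map<String, String> SUPER = new HashMap<>();
    static {
        SUPER.put("E", "Object");
        SUPER.put("F", "E");
        SUPER.put("G", "E");
    }

    static boolean isClass(String t) {
        return t.equals("Object") || SUPER.containsKey(t);
    }

    /** True if class a equals b or is a (transitive) subclass of b. */
    static boolean isSubclass(String a, String b) {
        for (String c = a; c != null; c = SUPER.get(c)) {
            if (c.equals(b)) return true;
        }
        return false;
    }

    /** Least upper bound of two types. */
    static String lub(String a, String b) {
        if (a.equals(b)) return a;
        if (a.equals(BOTTOM)) return b;
        if (b.equals(BOTTOM)) return a;
        if (a.equals(TOP) || b.equals(TOP)) return TOP;
        if (a.equals(NULL) && isClass(b)) return b;
        if (b.equals(NULL) && isClass(a)) return a;
        if (isClass(a) && isClass(b)) {
            // Walk up from a until a superclass of b is found.
            for (String c = a; c != null; c = SUPER.get(c)) {
                if (isSubclass(b, c)) return c;
            }
        }
        return TOP; // unrelated types, e.g. int and a class
    }
}
```

With this encoding, lub("F", "G") yields "E", while lub("int", "F") yields TOP because primitive and reference types are unrelated.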
An abstract execution state is a tuple $\langle S, R \rangle$, where $S \in L^*$ is a stack of types and $R : \mathbb{N} \to L$ is a function assigning a type to every register. The $\sqsubseteq$ relation between types is extended pointwise to register types and stack types. Two stack types are in the $\sqsubseteq$ relation only if they have the same size. Let $F$ be the domain of abstract execution states. Given $f = \langle S_f, R_f \rangle$, [...]
[Figure 1: Java source code for a simple method (classes Pair and Sample).]
Control Dependencies for Space-Aware Bytecode Verification 235
The least upper bound operation between types is extended pointwise to stacks and memories. This requires that the stacks have the same size.
A Java bytecode verification algorithm is presented in [4]. Almost all existing bytecode verifiers implement that algorithm, so we refer to it as the Standard Verifier. Essentially, the Standard Verifier performs a type-level Abstract Interpretation [5] of the code.
The verification is performed one method at a time, assuming that all other methods are well-typed when verifying a method. A coinductive argument shows that if all methods are well-typed, the program (a collection of methods) is well-typed.
Every instruction $n$ is assigned an abstract state $\langle S_n, R_n \rangle$, where $S_n$ is the stack and $R_n$ are the registers before the instruction (before-state, hereafter). The verifier abstractly executes the instructions at the level of types. The rules of the verifier for some instructions are shown in Figure 4. We assume that $M_{reg}$ and $M_{stack}$ denote the maximum number of registers used by the method and the maximum size of the operand stack. The stack is represented by a sequence whose leftmost element corresponds to the top, $\cdot$ is the concatenation operator and $\varepsilon$ denotes the empty stack. Given a stack $S$, $\#S$ denotes the number of its elements. Moreover, given a function $R$, [...] Every kind $k$ of instruction is described as a transition function $I_k : \langle S, R \rangle \to \langle S', R' \rangle$. The transition models the effect of the execution of the corresponding instruction on the abstracted program execution state. If the transition holds, $\langle S', R' \rangle$ is the after-state of the instruction.
For example, among the rules in Figure 4, an iload $j$ instruction requires a non-full stack and the int type to be associated with register $j$; its effect is to push int onto the stack. The after-state is $\langle S', R' \rangle$ with $S' = \mathrm{int} \cdot S$ and $R' = R$. The rule for invoke checks that the types of the parameters are compatible with the signature of the method; in this case the method invocation leaves the type of the object returned by the method on the stack. Errors are denoted by the absence of a transition.
The after-state of an instruction is assumed as a possible before-state of any successor of the instruction. The before-state of every successor is updated to the least upper bound between the original before-state and $\langle S', R' \rangle$. For example, the after-state of instruction ifeq L at address $i$ is merged into the before-states of instructions $i + 1$ and L.
The rules are used in a standard fixpoint iteration using a worklist algorithm [6]: an instruction $n$ is taken from the worklist and the state for the successor program points is computed. If the computed state for a successor program point $m$ changes (if the state at $m$ was not computed or if the new state is $\not\sqsubseteq$ the state already computed), $m$ is added to the worklist. The fixpoint is reached when the worklist is empty. In this case the bytecode verification terminates successfully (the verifier accepts the bytecode). Initially, the worklist contains only the node corresponding to the first instruction of the bytecode. The initial stack and register types represent the state on method entrance: the stack is empty and the types of registers corresponding to the parameters are set as specified by the signature of the method. The other registers hold the undefined type $\top$. As a consequence of the algorithm, the state at instructions representing a merge point between control paths, i.e. having more than one predecessor, is the least upper bound of the states after all predecessor instructions. If, for example, register $j$ has type int on one path and type $\top$ on another path, the type of $j$ at the merge point is $\top$. Figure 2b shows the typing assignment to the instructions of the given method produced by the verifier. This bytecode is accepted by the verifier.
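The transition rules for register loads and stores can be sketched as follows. This is only an illustration under stated assumptions: the class AbstractState, its fields and the string encoding of types are hypothetical, the state is updated in place rather than returned as a new pair, and the absence of a transition is modelled by an exception.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Abstract execution state: a stack of types plus one type per register. */
final class AbstractState {
    final Deque<String> stack = new ArrayDeque<>(); // head of the deque = top of the stack
    final String[] regs;
    final int maxStack;

    AbstractState(int maxRegs, int maxStack) {
        this.regs = new String[maxRegs];
        Arrays.fill(regs, "TOP"); // registers start with the undefined type
        this.maxStack = maxStack;
    }

    /** iload j: needs a non-full stack and int in register j; pushes int. */
    void iload(int j) {
        if (stack.size() >= maxStack || !"int".equals(regs[j])) {
            throw new IllegalStateException("no transition for iload " + j);
        }
        stack.push("int");
    }

    /** istore j: needs int on top of the stack; pops it and types register j as int. */
    void istore(int j) {
        if (!"int".equals(stack.peek())) { // peek() is null on an empty stack
            throw new IllegalStateException("no transition for istore " + j);
        }
        stack.pop();
        regs[j] = "int";
    }
}
```

Loading from a register that still holds the undefined type has no transition, so the sketch rejects it by throwing.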
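The worklist iteration can be sketched as below. To keep the example short, it tracks register types only (no operand stack), uses a four-element toy lattice, and omits the operand checks, so it shows only how states are propagated and merged. The names WorklistSketch, analyze, succ and def are assumptions of this sketch, not part of the Standard Verifier.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Worklist fixpoint iteration over register types only. */
final class WorklistSketch {
    static final int BOT = 0, INT = 1, REF = 2, TOP = 3; // toy lattice: BOT < INT, REF < TOP

    static int lub(int a, int b) {
        if (a == b) return a;
        if (a == BOT) return b;
        if (b == BOT) return a;
        return TOP;
    }

    /**
     * succ[n] lists the successors of node n; def[n] = {register, type}
     * written by n, or null. Returns the before-state of every node
     * (null for nodes that were never reached). Node 0 is the entry.
     */
    static int[][] analyze(int[][] succ, int[][] def, int[] entry) {
        int[][] before = new int[succ.length][]; // null = not computed yet
        before[0] = entry.clone();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.add(0);
        while (!worklist.isEmpty()) {
            int node = worklist.remove();
            int[] after = before[node].clone();
            if (def[node] != null) after[def[node][0]] = def[node][1];
            for (int m : succ[node]) {
                int[] merged = after.clone();
                if (before[m] != null) {
                    for (int r = 0; r < merged.length; r++) {
                        merged[r] = lub(merged[r], before[m][r]);
                    }
                }
                if (!Arrays.equals(merged, before[m])) { // state at m changed
                    before[m] = merged;
                    worklist.add(m);
                }
            }
        }
        return before;
    }
}
```

On a diamond-shaped CFG where one branch stores a reference and the other an int into the same register, the before-state of the merge point assigns TOP to that register, as described above.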
The Java Card Platform
A Java Card is a smart card running a JVM, the Java Card Virtual Machine (JCVM). Commercial Java Cards (as of 2005) typically provide 1-4 KB of RAM, 32-64 KB of persistent writable memory and 128-256 KB of ROM. Only the RAM should be used to store temporary data structures because the persistent memory is too slow and allows only a limited number of write cycles.
Bytecode verification for Java Card applets is identical to standard verification. However, on-card bytecode verification requires special optimizations, since cards have limited memory resources. Bytecode verification is expensive in space because a representation of the program abstract execution state must be stored for each instruction, for the entire duration of the analysis. As an optimization, Sun's bytecode verifier only maintains the state of each program point that is the target of a branch [7]. Hereafter we call the set of stored abstract execution states the dictionary, $D$ for short, while a state, also called a frame, is a dictionary entry.
The natural representation of Java Card types may be contained in 3 bytes. One byte indicates the kind of the type and two bytes of payload contain, for instance, a class reference. In [7] it has been shown that, if $H$ is the number of branch targets in the method, the dictionary size is $(3 M_{stack} + 3 M_{reg} + 3) \cdot H$ bytes (the 3 bytes of overhead per dictionary entry are needed to store the program counter and the stack height). Therefore, even for methods that are not very complex, the dictionary is too large to fit in the RAM available on Java Cards.
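To give a feeling for the orders of magnitude involved, the following snippet evaluates the dictionary size formula. The class name DictionarySize and the sample figures (stack depth 5, 15 registers, 50 branch targets) are hypothetical and not taken from any measured method.

```java
/** Evaluates the dictionary size formula (3*Mstack + 3*Mreg + 3) * H. */
final class DictionarySize {
    /** Bytes needed to store one frame per branch target. */
    static int bytes(int maxStack, int maxRegs, int branchTargets) {
        return (3 * maxStack + 3 * maxRegs + 3) * branchTargets;
    }

    public static void main(String[] args) {
        // Hypothetical method: stack depth 5, 15 registers, 50 branch targets.
        System.out.println(bytes(5, 15, 50)); // prints 3150
    }
}
```

For these assumed figures the dictionary alone takes 3150 bytes, which is of the same order as the whole RAM of the cards considered here.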
THE IF-BASED ALGORITHM
In this section, we introduce an algorithm for bytecode verification that exploits the control flow dependencies of the instructions in the CFG to save the space needed by the representation of the program abstract execution states used throughout the analysis.
The CFG of a method is a directed graph, containing the control dependences among the instructions of the method [8] .
DEFINITION 1. (CFG). A CFG is a directed graph $G = (V, E)$ such that $V$ is the set of instructions and $E \subseteq V \times V$ represents the control dependences among the instructions. There is an edge $(i, j) \in E$ if $j$ can be directly executed after $i$.
We enrich the CFG with two extra nodes, start and end, representing the entry node and the final node respectively. We assume that there is an edge from start to the first instruction of the bytecode and an edge from each return instruction of the bytecode to end.
DEFINITION 2. (Path). Given a CFG $G = (V, E)$, a path from $n$ to $m$ is a sequence of nodes $n$
The CFG of the bytecode in Figure 2a is shown in Figure 5 . For simplicity, nodes of the CFG represent basic blocks instead of simple instructions [8] . In the figure, basic blocks of the bytecode are numbered according to the first instruction of the block.
The ipd of a node $n$ in a directed graph is the first node, other than $n$ itself, that is common to all the paths from $n$ to end. The ipd calculation can be implemented using bit vectors with a space complexity of $O(n^2)$, where $n$ is the number of basic blocks in the method [8, 9]. The RAM currently available on Java Cards suffices for ipd calculation for methods with about 100 basic blocks. Once calculated, the function that maps each conditional instruction to its ipd can be stored in persistent memory, where it will be accessed only for reading during the algorithm. In the following, we assume that end is reachable from any other node of the CFG. As a consequence, the immediate postdominator always exists for each node different from end. Note that a node does not reach the end node only if it belongs to an infinite loop. However, current Java Cards are single-threaded, and infinite loops are not acceptable in this context. Thus, we can reasonably reject bytecodes where end is not reachable from some node. This condition can be checked prior to the execution of the verification algorithm proper, as a by-product of the ipd calculation. In fact, the calculation requires the reverse CFG of the program (from successors to predecessors). In visiting the reverse CFG from end to start, all visited nodes can be marked. In the end, if any unmarked node is found the code is rejected. Finally, we always assume that there is an arc from start to end. This ensures that $end = ipd(start)$ and simplifies the algorithms.
The remainder of this section illustrates the approach; the algorithm is then given in pseudo-code (Section 3.1). The main idea is that, using the notion of ipd, the paths from a node $n$ to a node $m$ that postdominates $n$ are split into the paths from node $n$ to its ipd and the paths that go from $ipd(n)$ to $m$. The paths that go from a node $n$ to its ipd are the union of the paths that go from every successor of $n$ to $ipd(n)$. Accordingly, the code is partitioned into verification units. A verification unit starts at a control instruction $n$ and ends at $ipd(n)$. We also say that $n$ is the control instruction of the unit and that $ipd(n)$ is the ipd of the unit. Starting with an initial frame, instructions are verified in sequence, by repeatedly applying the transition function, until a control instruction is reached. Whenever a control instruction is reached, a new verification unit is entered. The current frame (the before-state of the control instruction of the unit) becomes the input frame for the unit. Then all paths in the verification unit are visited, starting with the input frame, and the frame calculated at the end of each path is accumulated (by merging) at the ipd node. Finally, the verification unit is exited and the visit continues at the ipd node, starting with the accumulated frame.
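A bit-vector computation of immediate postdominators can be sketched as follows. This is one standard iterative formulation and is not claimed to be the exact procedure of [8, 9]; the class and method names are assumptions of the sketch. It keeps one bit vector of $n$ bits per node, hence the quadratic space, and it relies on the assumption stated above that end is reachable from every node.

```java
import java.util.BitSet;

/** Iterative postdominator computation with one bit vector per node. */
final class PostDominators {
    /**
     * succ[v] lists the successors of node v; node 'end' has none.
     * Returns ipd[v] for every node, with ipd[end] = -1.
     * Assumes that 'end' is reachable from every node.
     */
    static int[] immediatePostDominators(int[][] succ, int end) {
        int n = succ.length;
        BitSet[] pdom = new BitSet[n];
        for (int v = 0; v < n; v++) {
            pdom[v] = new BitSet(n);
            if (v == end) pdom[v].set(end);
            else pdom[v].set(0, n); // start from "all nodes"
        }
        boolean changed = true;
        while (changed) { // shrink the sets until nothing changes
            changed = false;
            for (int v = 0; v < n; v++) {
                if (v == end) continue;
                BitSet next = new BitSet(n);
                next.set(0, n);
                for (int s : succ[v]) next.and(pdom[s]); // common to all successors
                next.set(v);                             // a node postdominates itself
                if (!next.equals(pdom[v])) {
                    pdom[v] = next;
                    changed = true;
                }
            }
        }
        int[] ipd = new int[n];
        for (int v = 0; v < n; v++) {
            ipd[v] = -1;
            BitSet strict = (BitSet) pdom[v].clone();
            strict.clear(v);
            // The ipd is the strict postdominator whose own set equals 'strict'.
            for (int p = strict.nextSetBit(0); p >= 0; p = strict.nextSetBit(p + 1)) {
                if (pdom[p].equals(strict)) {
                    ipd[v] = p;
                    break;
                }
            }
        }
        return ipd;
    }
}
```

For a diamond 0 -> {1, 2} -> 3 -> 4 with end = 4, the sketch gives ipd(0) = 3, the node where the two branches join.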
Verification units are built at run time through the use of a stack that records the open verification units, each with its input frame. The verification unit on the top of the stack is called the current unit. The same verification unit can appear on the stack at most once, at any time.
Due to cycles in the CFG, the conditional instruction of an open verification unit can be reached again, with a newly calculated frame. If the new frame is $\sqsubseteq$ the input frame saved on the stack for that unit, a local fixpoint in the current path is reached and another path is chosen for the current unit. Otherwise, the corresponding verification unit is verified again, with an input frame equal to the least upper bound between the new frame and the input frame saved on the stack.
C. Bernardeschi et al.
The algorithm
Algorithm 1, written in pseudo-code, needs some explanation. The type Node is used to indicate instructions (nodes in the control flow graph) of the program being verified. The type Frame describes abstract states for the verification of an instruction. Given a Node $n$, nsucc($n$) returns the number of successors of $n$, succ($n$, $i$) returns the $i$-th successor of $n$ and ipd($n$) returns the immediate postdominator of $n$.
Every Record is a structure composed of four fields named source, paths, beforeFrame and acc respectively. Access to the fields is denoted by the usual dot syntax.
Given a Record t, t.source is a conditional node, t.beforeFrame is a Frame that saves the before-state of node t.source, t.paths is the number of paths not verified yet and t.acc is a Frame in which the before-state of the ipd(t.source) node will be accumulated.
The algorithm uses a stack of Record, named $s$ in the following, to save information on the entered and not yet exited verification units. The last entered verification unit is exited first since ipds are reached in the opposite order of the corresponding conditionals [10]. The operations on the stack are the usual push and pop plus some additional functions: find($n'$) applied to the stack $s$ returns a reference to the Record $r$ such that $r.source = n'$; $s.top$ is a reference to the Record that is on the top of $s$ (null if $s$ is empty).
The function verify(n, f ) implements the transition function. It performs the verification of instruction n when the before-state is given by the Frame f. If the verification is successful, the after-state is returned. Otherwise the call fails.
Algorithm 1 is composed of a loop that terminates either when s is empty (in this case success is returned) or when one of the calls to verify fails.
The analysis begins at the first instruction of the code, node $n_0$, with the initial frame $f_0$ that assigns types to the registers according to the method parameters, and assumes an empty operand stack. Moreover, $s$ contains the special record $\langle start, 1, \top, \bot \rangle$. For the entire analysis, $t$ is the current unit (see instruction $t \leftarrow s.top$), $t.beforeFrame$ is the input frame of such a unit, $n_c$ is the node reached during the verification of the unit (current node) and $f_c$ (current frame) is the before-state of $n_c$.
Inside the loop two main cases are distinguished depending on whether the current node is the ipd of the current unit or not. In the first case (see label 1) we have two sub-cases. If all paths of the verification unit have been verified (that is, $t.paths = 1$), the verification unit is exited (see instruction s.pop), the current node remains $n_c$ and the new current frame is the least upper bound between the frame accumulated at $n_c$ ($t.acc$) and the current frame (see instruction $f_c \leftarrow f_c \sqcup t.acc$). If, instead, not all paths of the verification unit have been verified (label 2), the current frame is accumulated at the ipd by executing the instruction $t.acc \leftarrow t.acc \sqcup f_c$ and the number of not yet verified paths is decreased. Finally, the verification continues starting from the next successor of node $t.source$, with a frame equal to the after-state of the conditional instruction of the current unit. Instruction $n_c \leftarrow succ(t.source, t.paths)$ sets the current node, while instruction $f_c \leftarrow verify(t.source, t.beforeFrame)$ sets the current frame.
The part at label 3, inside the loop, is executed if the current node is not the ipd of the current unit. In this situation, if $n_c$ is not a conditional, verification continues with the current node and the current frame set to the successor of $n_c$ (instruction $n_c \leftarrow succ(n_c, 1)$) and the after-state of $n_c$ (instruction $f_c \leftarrow verify(n_c, f_c)$), respectively. Alternatively, if $n_c$ is a conditional, the part at label 4 of the loop is executed. This is the part of the code that implements the strategy for handling cycles in the CFG presented above. If the verification unit is not open, i.e. there does not exist a record in $s$ with node $n_c$ (instruction $r \leftarrow s.find(n_c)$ [...]
[Algorithm 1: Algorithm that records execution states of conditional instructions and ipds. Verification fails when a call to verify fails.]
The last case applies when the current frame $f_c \not\sqsubseteq r.beforeFrame$. The code removes records from $s$ up to and including the record referenced by $r$. Node $n_c$ is left unchanged, while frame $f_c$ is assigned the least upper bound of its value and the frame stored for node $n_c$ in the stack. In the next cycle, the function $s.find(n_c)$ will return null and the verification unit will be re-verified with the new frame.
For simplicity, the algorithm deals with start and end as any other node. Nevertheless, frames need not be saved at these special nodes.
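The loop described above can be sketched in Java as follows. This is a reconstruction from the prose description under several simplifying assumptions, not the authors' Algorithm 1: frames carry register types only, over a flat toy lattice (INT and REF below TOP), with a null frame standing for $\bot$; every node is either a conditional (two or more successors) or has exactly one successor; the ipd table is assumed to be precomputed; and the names IfBasedVerifier, Rec, use and def are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of the if-based verification loop on a toy type lattice. */
final class IfBasedVerifier {
    static final int INT = 1, REF = 2, TOP = 3; // toy register types (TOP = undefined)
    static final int START = -1;                // pseudo-node of the outermost unit

    /** An open verification unit; a null frame stands for bottom. */
    static final class Rec {
        final int source; int paths; final int[] beforeFrame; int[] acc;
        Rec(int source, int paths, int[] beforeFrame) {
            this.source = source; this.paths = paths; this.beforeFrame = beforeFrame;
        }
    }

    final int[][] succ, use, def; // use[n] / def[n] = {register, type} or null
    final int[] ipd;              // precomputed immediate postdominators
    final int end;

    IfBasedVerifier(int[][] succ, int[] ipd, int[][] use, int[][] def, int end) {
        this.succ = succ; this.ipd = ipd; this.use = use; this.def = def; this.end = end;
    }

    static int[] lub(int[] a, int[] b) {
        if (a == null) return b;
        if (b == null) return a;
        int[] r = new int[a.length];
        for (int i = 0; i < r.length; i++) r[i] = (a[i] == b[i]) ? a[i] : TOP;
        return r;
    }

    static boolean leq(int[] a, int[] b) {
        if (a == null) return true;
        if (b == null) return false;
        for (int i = 0; i < a.length; i++) if (a[i] != b[i] && b[i] != TOP) return false;
        return true;
    }

    /** Transition function: checks the operand type, then applies the store. */
    int[] verify(int n, int[] f) {
        if (f == null) return null; // bottom: nothing to check on an aborted path
        if (use[n] != null && f[use[n][0]] != use[n][1])
            throw new IllegalStateException("type error at node " + n);
        int[] out = f.clone();
        if (def[n] != null) out[def[n][0]] = def[n][1];
        return out;
    }

    static Rec find(Deque<Rec> s, int n) {
        for (Rec r : s) if (r.source == n) return r;
        return null;
    }

    /** Returns true if the method is accepted, false if some verify call fails. */
    boolean run(int entry, int[] f0) {
        Deque<Rec> s = new ArrayDeque<>();
        s.push(new Rec(START, 1, null));
        int nc = entry;
        int[] fc = f0;
        try {
            while (!s.isEmpty()) {
                Rec t = s.peek();
                int unitIpd = (t.source == START) ? end : ipd[t.source];
                if (nc == unitIpd && t.paths == 1) {        // label 1: exit the unit
                    s.pop();
                    fc = lub(fc, t.acc);
                } else if (nc == unitIpd) {                 // label 2: start the next path
                    t.acc = lub(t.acc, fc);
                    t.paths--;
                    nc = succ[t.source][t.paths - 1];
                    fc = verify(t.source, t.beforeFrame);
                } else if (succ[nc].length == 1) {          // label 3: non-conditional node
                    fc = verify(nc, fc);
                    nc = succ[nc][0];
                } else {                                    // label 4: conditional node
                    Rec r = find(s, nc);
                    if (r == null) {                        // open a new unit
                        s.push(new Rec(nc, succ[nc].length, fc));
                        fc = verify(nc, fc);
                        nc = succ[nc][succ[nc].length - 1];
                    } else if (leq(fc, r.beforeFrame)) {    // local fixpoint: abort the path
                        nc = unitIpd;
                        fc = null;
                    } else {                                // re-verify with a larger frame
                        while (s.pop() != r) { /* drop units opened after r */ }
                        fc = lub(fc, r.beforeFrame);
                    }
                }
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

In this sketch only the records on the stack hold frames, so the space used at any time is bounded by the nesting depth of the open units rather than by the number of branch targets.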
In the following, we show the application of Algorithm 1 to the example in Figure 2. Each line of Figure 6 shows the state of the relevant data structures at the beginning of each loop iteration: the current node, $n_c$, the current frame, $f_c$ (representing the before-state of node $n_c$), and the stack of Records, $s$. Verification starts at node 1. Basic blocks 1 and 3 are (abstractly) executed and then node 5 is visited with frame $f_5$. Node 5 is a conditional instruction, so a new verification unit is opened, pushing a new Record on the stack (the resulting stack is shown in the line corresponding to $n_c = 6$ in Figure 6). The new verification unit has node 5 as conditional instruction, 2 (sets of) paths to visit (corresponding to the two successors of node 5) and an input frame equal to $f_5$. The initial before-state for node ipd(5) is set to $\bot$. Then, basic block 6 is executed and node 11 is reached, with before-state $f_{11}$. Again, node 11 is a conditional instruction, so a new verification unit is opened and, then, basic block 12 is executed. Now ($n_c = 15$), node ipd(11) is reached, with before-state $f_{15}$. One of the two (sets of) paths originating from node 11 is finished: frame $f_{15}$ is accumulated on the stack and another path is started from 11, using the input frame $f_{11}$ (stored on the stack $s$) as before-state. This second path leads directly to node 15, with frame $f$ [...] Figure 6). Note that verification of the units started by instructions 11 and 20 can reuse the same memory, since they are not 'nested': this is where the algorithm saves memory. On the other hand, note that, when execution of node 25 is completed, basic block 3 must be executed again, even if it had already been executed with the same before-state. In fact, the algorithm does not store any before-state for node 3, and can never stop the iteration at this point. This example should make it clear that the algorithm is trading time for space.
The iteration stops when node 5 is visited again, with frame $f'_5$, since node 5 is stored in the stack. Frame $f'_5$ (the current frame) is compared with frame $f_5$ (the input frame stored in the stack when the verification unit was opened). Since $f'_5 \sqsubseteq f_5$, we can assume that the rest of this path has already been verified (or will be verified at a later time) with a frame at least as high (in the lattice of types) as $f'_5$. Thus, we can abort the current path and jump directly to $ipd(5) = 26$. Note that $f_c$ is set to $\bot$, since this aborted path must not contribute to the calculation of the before-state of ipd(5). Finally, the other path starting from node 5 is taken. The path leads directly to node $26 = ipd(5)$ and closes the verification unit. Node end is then visited, the verification unit opened by the start node is closed and the verification as a whole is completed.
Correctness
We say that an algorithm is safe if it rejects all methods that are rejected by the Standard Verifier. However, the safety requirement could be easily fulfilled by an algorithm that rejects every method, so it is also desirable that the algorithm be precise, i.e. that it accept all methods accepted by the Standard Verifier. A safety and preciseness proof for (a recursive version of) Algorithm 1 can be found in [1]. Both safety and preciseness are proved with respect to the Meet-Over-all-Paths (MOP) solution of the Data Flow Analysis, rather than the Standard Verifier solution. The MOP solution maps each instruction (or basic block) $n$ to the $\sqcup$ of all frames with which $n$ is visited, in all possible execution paths that lead to $n$. The Standard Verifier computes an approximation of the MOP solution, using a fixpoint iteration. It is known that the fixpoint solution is equal to the MOP solution if the analysis is distributive, i.e. if the transition function $I_k$ of each kind of instruction $k$ is such that, for all frames $f_1$, $f_2$, we have $I_k(f_1 \sqcup f_2) = I_k(f_1) \sqcup I_k(f_2)$.
The proof in [1] proceeds in two main steps.
First, a finite representation of all paths in the CFG is built. This representation takes the form of a control tree, where each node $i$ of the tree is marked with a source node $n_i$, a target node $m_i$ (with $n_i$ and $m_i$ nodes in the CFG and $m_i$ postdominating $n_i$) and a subset $g_i$ of nodes in the CFG. The idea is that the subtree below each node $i$ contains a (finite) representation of all paths between $n_i$ and $m_i$ in the CFG, truncated at nodes in $g_i$. This representation is built by first splitting paths from $n_i$ to $m_i$ into two sets of sub-paths: sub-paths from $n_i$ to $ipd(n_i)$ and sub-paths from $ipd(n_i)$ to $m_i$. Then, sub-paths from $n_i$ to $ipd(n_i)$ are split into the paths from each successor of $n_i$ to $ipd(n_i)$. This splitting is repeated recursively, until either the source node is equal to the target node or the source node is found in $g_i$. The subtrees where the root $r$ is such that $m_r = ipd(n_r)$ contain the paths of the verification unit of node $n_r$. Moreover, we set $n_r \in g_j$ for all nodes $j$ in the subtree below node $r$. Indeed, $g_j$ represents the set of open verification units. It is proved that, for any CFG, we can always build a finite control tree. Second, the control tree built in the previous step is visited, in order to assign a frame to each source node $n_i$, target node $m_i$ and all nodes in $g_i$, according to the transition function. This visit may involve backtracking whenever an assignment of a frame to a source node $n_i$ is not consistent with the assignment given to the same node in $g_i$ (if any). It is proved that the algorithm always terminates and produces an assignment that is safe with respect to the MOP solution. Moreover, if the analysis is distributive, the assignment is also precise.
Algorithm 1 can be understood as an iterative version of the recursive algorithm presented in [1], where the Stack of Record $s$ implements both the stack of recursive calls and a global instance of the sets $g_i$.
THE TARGET-BASED ALGORITHM
In this section we introduce Algorithm 2 as a variant of Algorithm 1. In Algorithm 2, after-states of conditional nodes are stored as before-states of their target nodes. By target nodes, we mean the successors of conditional instructions, other than the instruction that follows sequentially in the bytecode. In the normal case, each conditional instruction has just one target, the exception being multi-target instructions such as tableswitch. The algorithm uses a global dictionary, $D$, that maps a subset of the CFG nodes to their before-states. We use $D[n]$ to denote the before-state mapped to node $n$ in $D$. Mappings from target nodes to frames are added to and removed from the dictionary as verification units are entered and left. The rationale behind this variant algorithm is that targets are often shared among different conditional nodes, so memory can be saved by storing a single frame for them. Moreover, the algorithm performs the accumulation of before-states at ipd nodes in the global dictionary as well. This also saves memory in the frequent case when the ipd node is also a target of some conditional node. In Algorithm 2, instructions that are added or modified with respect to Algorithm 1 have been numbered. The stack of open verification units is still used (instruction 1), but each stack element only stores the number of the corresponding conditional node and the number of the still-to-be-visited paths. Instruction 2 introduces the global dictionary as a map from nodes to frames. The global dictionary is manipulated by means of the functions update and flush. The update function takes two parameters: a node $n$ and a frame $f$.
The function expands the dictionary to include all targets of node $n$, mapping them to frame $f$. If some node $m$ was already present in the dictionary, the function sets $D[m] \leftarrow D[m] \sqcup f$. If $ipd(n)$ is not in the dictionary, the function adds it, with a mapping to $\bot$. Conversely, the flush function takes a single parameter, $n$, and removes from the dictionary all mappings that were added in a previous call to update($n$, $f$).
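A possible shape for the dictionary with its update and flush operations is sketched below. Several points are assumptions of this sketch rather than facts from the paper: update merges with the least upper bound into entries that already exist, it creates a $\bot$ entry (represented by null) for the ipd of the unit, the targets and the ipd are passed explicitly instead of being looked up from the node, and frames are arrays of toy register types treated as immutable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Global dictionary D with per-unit bookkeeping of the entries it created. */
final class FrameDictionary {
    static final int TOP = 3; // toy "undefined" type used when two types differ

    private final Map<Integer, int[]> frames = new HashMap<>();          // D; null value = bottom
    private final Map<Integer, List<Integer>> addedBy = new HashMap<>(); // entries created per unit

    static int[] lub(int[] a, int[] b) {
        if (a == null) return b;
        if (b == null) return a;
        int[] r = new int[a.length];
        for (int i = 0; i < r.length; i++) r[i] = (a[i] == b[i]) ? a[i] : TOP;
        return r;
    }

    /** update(n, f): map every target of n to f, merging with existing entries. */
    void update(int n, int[] targets, int ipd, int[] f) {
        List<Integer> mine = addedBy.computeIfAbsent(n, k -> new ArrayList<>());
        for (int m : targets) {
            if (frames.containsKey(m)) {
                frames.put(m, lub(frames.get(m), f)); // shared target: merge
            } else {
                frames.put(m, f);
                mine.add(m);
            }
        }
        if (!frames.containsKey(ipd)) { // accumulator of the unit, initially bottom
            frames.put(ipd, null);
            mine.add(ipd);
        }
    }

    /** flush(n): drop the entries that update(n, ...) created. */
    void flush(int n) {
        List<Integer> mine = addedBy.remove(n);
        if (mine != null) {
            for (int m : mine) frames.remove(m);
        }
    }

    int[] get(int m) { return frames.get(m); }

    boolean contains(int m) { return frames.containsKey(m); }
}
```

Recording which unit created each entry lets flush leave untouched the frames of targets shared with an enclosing unit, which is where the variant is expected to save memory.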
When the ipd of the current verification unit is visited, and there are no other paths to visit, the current frame $f_c$ is updated with the frame accumulated at the ipd node, and the verification unit is closed, as in Algorithm 1. However, as instruction 3 shows, the frame accumulated at the ipd node is taken from the global dictionary, as anticipated, and not from the stack. Moreover, instruction 4 has been added to remove all frames mapped to nodes that belonged exclusively to the closing verification unit. Instruction 5 is executed when we are visiting the ipd of the current verification unit, but there are still other paths to visit. The instruction accumulates the frames at the ipd node in the dictionary. Then, another path is started, whose starting frame is taken from the dictionary (instruction 6).
Further differences from Algorithm 1 are to be found in the visit of conditional nodes. When a new conditional unit is entered, a new Pair is pushed on the stack (instruction 7) and the dictionary is updated (instruction 8). If a conditional node n is visited, and the verification unit of node n is already open (instruction 9), the after frame of node n is compared with the before frames of all targets of node n, as stored in the dictionary (these nodes were added to the dictionary by instruction 8, when the verification unit of node n was entered). If $\forall n \in \mathit{Targets}(n_c),\ \mathit{verify}(n_c, f_c) \sqsubseteq D[n]$, the current path is terminated. Otherwise, all verification units that have been opened after the verification unit of node n are aborted, and the verification restarts at node n, with the current frame. Instruction 10 has been added: it removes from the dictionary all nodes that had been added when opening the aborted verification units. Note that the targets of node n itself are not removed from the dictionary. This causes instruction 8 (which will be executed on re-entering the verification unit of node n) to merge the newly calculated frame with the previous ones. This is required to ensure termination.
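The termination test performed when an already open conditional node is revisited can be illustrated with a minimal sketch. It assumes a toy flat type lattice in which a frame is a tuple of type names and a hypothetical `TOP` element stands for an unusable value; the names `type_leq`, `frame_leq` and `path_can_terminate` are illustrative and do not come from Algorithm 2 itself.

```python
TOP = "TOP"  # hypothetical top element of the toy lattice (unusable value)


def type_leq(a, b):
    """Ordering on single types: a is below b if they are equal or b is TOP."""
    return a == b or b == TOP


def frame_leq(f, g):
    """Pointwise ordering on frames of equal length."""
    return len(f) == len(g) and all(type_leq(a, b) for a, b in zip(f, g))


def path_can_terminate(after_frame, targets, dictionary):
    """True if the after-state of the revisited node is below D[m]
    for every target m of that node (assumed to be present in D)."""
    return all(frame_leq(after_frame, dictionary[m]) for m in targets)
```

When the test fails, the verifier would unwind the stack and restart at the conditional node, as described above; that part is not modelled here.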
From the previous description, it should be clear that flush operations on the dictionary are always called in reverse order with respect to update operations. Thus, the dictionary can be implemented as a pair of stacks: a stack $s_1$ of pairs $\langle n, f \rangle$, where $f = D[n]$, and a stack $s_2$ of pairs $\langle m, p \rangle$, where m is a node and p is a pointer to a position in stack $s_1$. Each update(m, f) operation does the following: The flush(m) operation pops $\langle m, p \rangle$ from $s_2$ (we know from the algorithm that m will always be on the top of $s_2$) and sets the top of $s_1$ to p. The D[n] operator can be implemented in the following way:
(i) starting from the bottom of $s_2$, the first pair $\pi = \langle m, p \rangle$ is found such that n is a target of m. If no such pair exists, n is not in the dictionary; otherwise the search can continue;
(ii) let $\pi' = \langle m', p' \rangle$ be the pair above $\pi$ in $s_2$ (if $\pi$ was the top of $s_2$, let $p'$ be the top of $s_1$); a linear search for n is performed in stack $s_1$, between positions p (excluded) and $p'$.
Figure 7 shows the execution of Algorithm 2 applied to the example of Figure 2.
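The two-stack implementation of the dictionary can be sketched as follows. This is only an illustration under stated assumptions: the steps of update are not listed in the text, so the version below is inferred from the descriptions of flush and of the D[n] lookup. Pointers are stored as the length of `s1` at the time of the update, and frames are merged with a toy function. The names `StackDictionary`, `merge_frames` and `lookup` are hypothetical.

```python
TOP = "TOP"  # hypothetical top element of a toy type lattice


def merge_frames(f, g):
    """Toy merge: keep a type where both frames agree, otherwise TOP."""
    return tuple(a if a == b else TOP for a, b in zip(f, g))


class StackDictionary:
    def __init__(self, targets, merge=merge_frames):
        self.targets = targets  # conditional node -> list of its target nodes
        self.merge = merge
        self.s1 = []            # entries [node, frame]
        self.s2 = []            # entries (conditional node m, pointer p into s1)

    def _find(self, n):
        """Index of n in s1, or None, following steps (i) and (ii)."""
        for i, (m, p) in enumerate(self.s2):          # (i) scan s2 bottom-up
            if n in self.targets[m]:
                # (ii) the segment ends at the next pointer, or at the top of s1
                end = self.s2[i + 1][1] if i + 1 < len(self.s2) else len(self.s1)
                for k in range(p, end):
                    if self.s1[k][0] == n:
                        return k
                return None
        return None

    def lookup(self, n):
        """D[n], or None if n is not in the dictionary."""
        k = self._find(n)
        return None if k is None else self.s1[k][1]

    def update(self, m, f):
        """Assumed behaviour: record the current top of s1, then add or merge
        the frame f for every target of m."""
        self.s2.append((m, len(self.s1)))
        for n in self.targets[m]:
            k = self._find(n)
            if k is None:
                self.s1.append([n, f])
            else:
                self.s1[k][1] = self.merge(self.s1[k][1], f)

    def flush(self, m):
        """Pop <m, p> from s2 and reset the top of s1 to p."""
        top, p = self.s2.pop()
        assert top == m, "flush must mirror the most recent update"
        del self.s1[p:]
```

In this sketch a target shared by two conditional nodes occupies a single slot in `s1`, which is the memory saving that motivates the dictionary; a frame merged into an older slot survives the flush of the newer node.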
Comparison of the Target-based and If-based algorithms
Algorithms 2 and 1 behave similarly: they generally visit the same nodes, and in the same order. The two algorithms may behave differently when visiting cycles, which cause a conditional node n to be visited twice. This can be clarified by an example. Suppose node n is the only conditional node that appears in a cycle in the CFG. Then, the two algorithms visit the same nodes, with the same frames, until they reach node n for the first time, with frame f. Now, both algorithms push a new record on the stack. However, Algorithm 1 stores frame f (as a field of the newly created stack record), which is the before-state of node n, while Algorithm 2 stores $f' = \mathit{verify}(n, f)$ (mapped to all targets of node n in the dictionary), which is the after-state of node n. Then, both algorithms continue visiting the same nodes, with the same frames, until they visit node n for the second time. Both algorithms find n already in the stack and must decide whether the current path can be terminated, or whether the stack must be unwound. Algorithm 1 terminates the current path if the current frame, $f_c$, is less than or equal to the frame f stored on the stack on the first visit to n, while Algorithm 2 checks that $f'_c = \mathit{verify}(n, f_c)$ is less than or equal to all frames $D[m_i]$ currently mapped to the targets $m_1, \ldots, m_s$ of node n in the dictionary. These dictionary entries were inserted when n was visited for the first time, and cannot have been deleted and re-inserted in the meantime (since the verification unit started by node n is still open). So, they all hold frame $f'$, at least. We must say 'at least', since one or all targets $m_i$ may be shared with other conditional nodes $n'_j$ inside the verification unit of n; so, on visiting nodes n 
EXPERIMENTAL RESULTS
The Byte Code Engineering Library (BCEL) [2] was used to develop our prototype verifiers. BCEL is a set of open-source APIs for reading, writing and modifying (binary) class files. It also contains a bytecode verifier called JustIce. We have implemented the Target-based algorithm and have run it on the applications shown in Table 1. The first five applications are small, and could fit in current Java Cards: mobileRPG is a role-playing game engine for PDAs and Java-enabled cellphones; RPNCalc is a Reverse Polish Notation calculator; JCSA is the collection of the Java Card Sample Applets from the Java Card specification; JGraphT is a graph library and Pacap is an electronic purse applet. We have also examined four bigger applications/packages to collect information on a large set of real-world Java class files (even if, clearly, each complete package cannot be downloaded on a Java Card): jcdk is the Java Card Development Kit; Marauroa is a server for a multi-player on-line game; Azureus is a BitTorrent client and jre is the Sun Java Runtime Environment, version 1.5.0.
For each method, we have recorded the frame size $s$, the number of targets $t$, the maximum height, $h_{max}$, reached by the stack of open verification units, and the maximum number of frames, $n_{max}$, allocated in the dictionary during the verification. The memory requirement of the Target-based algorithm is approximated as $n_{max} \cdot s$, since the size of the other data structures is negligible. The memory requirement of the Sun Verifier is estimated as $t \cdot s$, while the If-based algorithm requires approximately $2 \cdot h_{max} \cdot s$, since the If-based algorithm uses the same stack as the Target-based algorithm, but it stores two frames in each stack position (the before frames of a conditional node and of its ipd).
Figure 8 shows the estimated memory requirements for the three algorithms, for each of the five small applications. We can see that the Target-based algorithm achieves memory reductions ranging from 58% (JGraphT) to 27% (JCSA) with respect to the Standard Verifier. We can also note that in two cases (mobileRPG and RPNCalc) the If-based algorithm performs worse than the Standard Verifier: this is mainly due to the need to accumulate frames at the ipd of each conditional instruction, which doubles the size of the stack used by the algorithm. For the JCSA collection, the Target-based algorithm requires more memory than the If-based one. To understand why, note that the Target-based algorithm may allocate more frames than the If-based one when instructions with more than one target are found in the bytecode. In the JCSA collection, this is the case for the 'process' methods, which mainly consist of a moderately big switch statement.
Figure 9 shows the same information for the four bigger packages. Here, we can see that, again, the Target-based algorithm is always better than the Standard Verifier.
Note that, in bigger packages, there is a higher probability of finding a method that uses large multi-way jumps, causing the Target-based algorithm to require more memory than the If-based one. The high memory requirement of the Target-based algorithm in the jcdk package is due to a method that contains a huge switch statement, with 190 targets.
Nevertheless, we think that the Target-based algorithm may be preferable to the If-based one even for packages such as jcdk. In fact, the problems are generally limited to a few methods, and the applet programmer may decide to split those methods into smaller ones, so that they can be verified.
Clearly, this splitting can be carried out independently of the algorithm used, so further investigation is required. Figure 10 shows the number of classes, in the jcdk package, that contain methods that cannot be verified, as a function of the available memory, for the three algorithms. These are the classes that the programmer should rewrite. We can see that the number of unverifiable classes decreases very rapidly as more memory is made available, confirming that the problems are limited to a few methods. We can also see that the Target-based algorithm is able to verify a higher number of classes for nearly all sizes of memory, thus demanding less work from the applet programmer. Figure 11 shows the same information for the JCSA collection. Again, the Target-based algorithm performs better, under this metric, for nearly all sizes of memory. Similar results hold for the other packages.
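The per-method memory estimates used in this section can be written down as a small sketch. The function name `estimate_memory` and the numbers in the usage example are hypothetical and are not measurements from Table 1; only the three formulas come from the text.

```python
def estimate_memory(s, t, h_max, n_max):
    """Estimated number of frame slots times frame size, for one method.

    s     -- frame size
    t     -- number of targets
    h_max -- maximum height of the stack of open verification units
    n_max -- maximum number of frames allocated in the dictionary
    """
    return {
        "standard": t * s,            # one frame per target
        "if_based": 2 * h_max * s,    # two frames per stack position
        "target_based": n_max * s,    # frames held in the dictionary
    }


# Hypothetical method: frame size 10, 6 targets, stack height 2, 3 dictionary frames.
estimates = estimate_memory(s=10, t=6, h_max=2, n_max=3)
```

A method dominated by a large switch would have a large `n_max`, which is the situation in which the Target-based estimate can exceed the If-based one.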
To obtain a rough estimate of the running time required by the algorithms, we have run the JustIce and the Target-based verifiers on all the applications in Table 1. We have used JustIce as an approximation of the real Sun Verifier. The two verifiers do not behave identically, even if they both use the same algorithm, since JustIce stores a frame for each instruction, rather than for each basic block. For each method, we counted the number of times that the two algorithms had to 'verify' an instruction (i.e. compute an after-state). To compare the Standard algorithm and the Target-based one, we use the ratio of this number to the number of instructions in the corresponding method. The main difference in the running time between the two algorithms lies in the fact that, once the Target-based algorithm has deallocated the frames of some instructions, it may be forced to compute them again at a later time, if those instructions belong to another path. Table 2 shows, for each application and for each algorithm, the average value of the ratio, its worst-case value and its 95th percentile. We can see that, in most cases, the Standard algorithm verifies each instruction only once. The Target-based algorithm causes a small penalty, on average, as expected. Unexpectedly, the Target-based algorithm can also perform slightly better, in some cases, than the algorithm implemented in JustIce (this is the case for JCSA and Pacap). However, this effect is due to the specific implementation of the Standard algorithm in JustIce. We also note that the Target-based algorithm may perform very badly on some methods, as shown by the worst-case value of the ratio. However, the fact that the 95th percentiles are generally acceptable shows that the worst-case scenarios are sufficiently rare.
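The running-time metric described above can be sketched as follows. The text does not state how the percentile is computed, so the nearest-rank definition used here is an assumption, as are the names `percentile` and `summarize` and the sample data.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (assumed definition), for a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(methods):
    """methods: non-empty list of (verify_count, instruction_count) per method.

    Returns the mean, worst-case and 95th percentile of the ratio
    verify_count / instruction_count.
    """
    ratios = [v / n for v, n in methods if n > 0]
    return {
        "mean": sum(ratios) / len(ratios),
        "worst": max(ratios),
        "p95": percentile(ratios, 95),
    }


# Hypothetical counts for four methods of ten instructions each.
summary = summarize([(10, 10), (12, 10), (30, 10), (10, 10)])
```

A ratio of 1 means that every instruction was verified exactly once; larger values indicate recomputation of after-states.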
RELATED WORK
Many approaches have been presented to ensure type-correctness of the applets executed by a Java Card. First, a cryptography-based approach could be applied: the applet issuer sends the CAP file to a trusted third party (TTP), which digitally signs it with its private key. To install an applet, one must check, using the TTP public key, that the downloaded code is identical to the code the producer sent to the TTP. This mechanism assumes trust relations between code producer and code consumer, and between the code consumer and the TTP.
Rose [11] proposes a solution based on a certification system (inspired by the PCC, 'Proof Carrying Code', work by Necula [12]). The verification process is split into two phases: lightweight bytecode certification (LBC) and lightweight bytecode verification (LBV). LBC is performed off-card and produces a certificate that must be distributed with the bytecode. LBV is performed on-card, and it is a linear verification process that uses the certificate to ensure both that the bytecode is safe and that the certificate has not been altered. LBV is currently used in the KVM of Sun's Java 2 Micro Edition. A major drawback of this solution is that all code must be distributed with its certificate, thus increasing the size of the data that must be downloaded to the card.
Leroy, in [13] , proposes to reduce memory requirements with an off-card code transformation, also known as code normalization. The transformed code complies with the following constraints: every register contains the same type for all method instructions and the stack is empty at the merge points. The verification of a normalized code is not expensive: only one global state is required since the type of the registers never changes. As with PCC, the off-card elaboration may raise the cost of deployment and decrease interoperability.
Deville and Grimaud [14] suggest using persistent memory to store the data structures needed for the verification process. Their strategy holds all data structures in RAM as long as possible and swaps them out to persistent memory when RAM space runs out. Since persistent memory cells have a limited number of write cycles, special care is taken both to reduce the number of writes and to reduce the impact of those writes, using a special encoding for types. In contrast, our approach requires fewer writes to persistent memory (namely, only those required to store the ipd table at the beginning of the verification), thus extending the lifetime of the card.
In [15] , an approach that reduces the use of memory by performing the verification during multiple specialized passes is presented. Each pass is dedicated to a single type. The algorithm reduces the space necessary to encode a type since, during each pass, the abstract interpreter only needs to know whether the type (saved in a register or in a stack location) is compatible with the pass or not. The compatibility among types is given by the type hierarchy. This approach reduces the size of each frame, but still statically allocates a frame for each basic block. This solution is completely orthogonal to the one presented in this paper, and the two approaches may be combined to further reduce the memory requirements of the verification.
Naccache et al., in [16] , propose a different method to reduce the size of frames. This size is normally equal for all frames of a given method, and is computed using the maximum values for the stack height and for the number of local variables used throughout the method (these values are provided by the Java compiler). Naccache et al. [16] propose to preprocess each method, before verification, to compute a different frame size for each basic block. Since it is generally the case that the stack is empty at the merge points, this results in a significant reduction of the memory requirements. Again, this approach is orthogonal to ours, in that it reduces the size of frames, but not their number. However, a combination of the two approaches would require dynamic allocation of variable length frames, which is complex.
CONCLUSION
Experimental results show that the Target-based algorithm leads to significant benefits in terms of reduced memory overhead, with respect to the Standard Verifier algorithm. The Target-based algorithm trades time for space, so it is expected that the Standard Verifier algorithm should be faster. Experimental results show that the time penalty of the Target-based algorithm is generally acceptable.
Compared with much of the related work, the approach proposed in this paper has the advantage that the algorithm can be applied to unmodified CAP files. The memory reduction is obtained solely by applying a special strategy to instruction verification. The code is split into verification units and verification is applied locally to subsets of instructions.
Future work includes the extension of the tool to the full JCVM instruction set: currently, subroutines and exceptions are not supported. As far as exceptions are concerned, we plan to save a frame at the entry point of each exception handler, similarly to Sun's solution. Instructions in the method bytecode, protected by a given exception handler, update the exception handler's frame according to the standard verifier rules. Then, exception handlers are verified separately on the CFG whose initial node is the entry point of the exception handler, using our algorithm.
Subroutines are fragments of code shared by different execution paths. They are used by compilers for generating more compact code and for translating the try-finally statement [4]. The jsr instructions are goto instructions that also push a return address on the stack. This return address is then used by the ret instruction for transferring control to the instruction immediately after the jsr. Since many jsrs in a method can refer to the same subroutine, the verification should be polymorphic over the types of the registers that are not touched by the subroutine [17]. Many approaches have been suggested to cope with this polymorphism (see [18, Section 4] for a complete reference). All these techniques complicate the verifier and/or require more space for type indicators and thus are not suitable for an on-card implementation. Regardless, subroutines are not heavily used in Java code [19]. This is even more true for Java Card code, since compilers tend to expand simple finally blocks inline, without using subroutines. For instance, in all of the small applications we tested (mobileRPG, RPNCalc, JCSA, JGraphT and Pacap) no subroutines were found, and in the bigger applications only 103 out of 120,575 methods contained subroutines (<0.1%).
Therefore, a simple solution could be to statically calculate the return addresses of subroutines and then regard ret instructions as branches to any instruction that follows a jsr. That is, for each ret instruction r called by jsr instructions $\{j_1, \ldots, j_k\}$, the CFG will contain the edges $(r, j_i + 1)$, with $1 \le i \le k$. Note that, during the verification of the method, jsr instructions not reachable from the entry point of the method are not considered (since they create paths in the CFG that do not correspond to any real execution path), and similarly for exception handlers. Even though this choice rejects some correct programs, the authors believe that these cases are extremely rare in the context of Java Card. A preliminary version of the work presented in this paper appeared in [20].
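The proposed treatment of ret instructions can be illustrated with a minimal sketch. It assumes that instructions are identified by their index in the method, so that the instruction following a jsr at index j is j + 1, and that the mapping from each ret to the jsr instructions that call its subroutine has already been computed statically. The function name `add_ret_edges` and the sample indices are hypothetical.

```python
def add_ret_edges(edges, callers_of_ret, reachable):
    """Add the edges (r, j + 1) for every ret r and every reachable jsr j
    that calls the subroutine ending in r.

    edges          -- set of CFG edges (source index, destination index)
    callers_of_ret -- dict: ret index -> iterable of jsr indices (assumed given)
    reachable      -- set of instruction indices reachable from the method entry
    """
    for r, jsrs in callers_of_ret.items():
        for j in jsrs:
            if j in reachable:  # unreachable jsr instructions are not considered
                edges.add((r, j + 1))
    return edges


# Hypothetical method: a ret at index 20 called from jsrs at 3, 8 and 15,
# where the jsr at index 15 is unreachable from the entry point.
cfg_edges = add_ret_edges(set(), {20: [3, 8, 15]}, reachable={0, 3, 4, 8, 9, 20})
```

With these edges in place, a ret behaves like a multi-target branch, so it can be handled in the same way as the other conditional nodes of the CFG.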
