ABSTRACT
Introduction
Traditionally, a program's execution is said to be correct only if its architectural state is numerically perfect on a cycle-by-cycle basis. A similar (though slightly looser) notion of correctness requires a program's visible architectural state, i.e. its output state, to be numerically perfect. In both cases, correctness requires precise numerical integrity at the architecture level, which is a very strict requirement. We refer to this traditional notion of correctness as architecture-level correctness. Such strict assumptions regarding program output have led to conventional error detection and recovery techniques, such as triple or dual modular hardware/software redundancy (TMR, DMR), that suffer from large performance and energy overheads. In many applications, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective [3] [21]. Although such numerically faulty executions do not pass the test of architecture-level correctness, they may be completely acceptable at the user or application level. We refer to this notion of correctness as application-level correctness. Good examples of such error-tolerant applications are programs from the multimedia domain, where a few bit errors in an image or video stream may still be acceptable. However, even applications that exhibit a high degree of error tolerance contain certain instructions (and program segments) that must be numerically correct for the program output to be acceptable. In this paper, such instructions are called application-level critical instructions, or critical instructions for short. We focus on determining the critical program segments which, when erroneous, will affect application-level correctness. We assume that the application source code and the target hardware are correct, and that transient errors (such as single event upsets due to high-energy particle strikes) are the source of errors at run-time.
Identifying critical instructions allows us to selectively replicate and verify critical program segments while executing a single, unchecked copy of the remainder of the program, thus greatly reducing overhead (in terms of time and energy) for transient error detection and recovery by reducing the number of instructions that need to be duplicated and checked at runtime.
Related Work and Our Contributions
Several researchers have proposed techniques to distinguish between critical and non-critical instructions in programs. We briefly survey the existing techniques in this section:
Monte-Carlo based techniques:
The approaches proposed in [3] [4] [5] are based on extensive random fault injection into program code, followed by observation of the effect on program execution. In [3], based on these results, the program stack, register file contents and the PC, along with certain manually identified application-specific data structures, are marked as part of the critical state. A detailed analysis of the frequency and type of abnormal program symptoms that are caused by errors is presented in [4]. The authors of [4] identify such symptom-generating instructions and show that the probability of an error showing up as a symptom within a relatively small instruction window is high. In [5], likely program invariants are detected using Monte-Carlo simulation and checks are inserted to verify these invariants during program execution. Fault injection and analysis are performed at the register transfer level in [8]. The advantage of Monte-Carlo techniques is that they are general and can be applied to any application; the downside is a potentially high running time and the possibility of missing some critical instructions.
Program analysis techniques:
In [29] , the authors mark instructions that affect global variables and arguments to functions as high-value. Approaches proposed in [7] [9] provide a simple static analysis technique wherein instructions that affect control flow and memory address computation are tagged critical and are marked for protection. However, such an approach might not be safe as some data-flow critical instructions might be missed. In [6] , the authors analyze multimedia workloads that can tolerate errors, and propose exploiting this to address manufacturing defects. In [10] [11], the authors use dynamic dependence graphs (DDG) to identify critical instructions and perform static analysis to determine all instructions that affect such instructions. However, similar to Monte-Carlo based techniques, the above mentioned approaches are not guaranteed to identify all critical instructions (because the DDGs are input dependent). In [12] , the authors use formal methods with symbolic expressions to obtain exhaustive error propagation and coverage metrics. But results are reported for small programs and it is unclear whether this technique can scale to large programs.
Using program invariants and patterns
The compiler research community has proposed several approaches for detecting errors in programs based on static analysis [16] [17] using approximated fault models. Runtime error detectors are specified by the designer using rule-based templates in [19]. Daikon [15] and DIDUCE [14] are systems which dynamically detect program invariants. It is unclear whether all critical data and instructions can be protected by using such invariants. In [18], the authors try to learn common program patterns from the source code. Deviations from these patterns are tagged with a warning for possible errors. However, it would be very time consuming, if not impossible, for the user to specify all possible invariants for all critical instructions in a program. In failure-oblivious computing platforms such as [13], the target platform is modified so that the faulty application can recover from a checkpoint even in the case of an instruction that causes a fatal error (such as program termination). However, if the faulty instruction is critical in terms of application-level correctness, the output error can be arbitrarily large.
In contrast to previous work, our contribution is a highly efficient, profiling-guided static program analysis technique and runtime monitoring approach that is guaranteed to identify all critical instructions in a program. In particular, our approach includes:
(1) A scalable program analysis phase that conservatively classifies instructions into two sets, static critical (SC) and static non-critical (SNC), based on the number of instructions that each instruction affects. The analysis is conservative because it might classify certain non-critical instructions as critical.
(2) A profiling phase that further divides the instructions in set SC into two subsets, likely critical (LC) (typically a small subset) and likely non-critical (LNC), based on the results of profiling.
(3) A lightweight runtime monitoring mechanism that tracks instructions that were classified as LNC by the static analysis to ensure that corrective actions are taken if these instructions become critical at runtime (for example, in corner cases or for non-profiled input sequences). This makes our approach different from Monte-Carlo based techniques, because profiling is used only to identify instructions that are likely non-critical (LNC) but cannot be proven non-critical by static analysis. Runtime monitoring is used to detect when such instructions might become critical (in rare cases) and to take corrective actions if necessary.
(4) We use the results of our analysis to ensure application-level correctness in the presence of soft errors. Instructions belonging to the set LC are duplicated and checked using the software-based approach proposed in [20], instructions in LNC are monitored at runtime (if such instructions become critical, remedial actions need to be taken as described in Section 7.3), while instructions in set SNC are neither duplicated nor monitored.
Put together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, when compared to a recent work [9], which cannot guarantee application-level correctness. Compared to the technique proposed in [20], which guarantees both application-level and numerical correctness, our method achieves a 79% energy reduction. The remainder of this paper is organized as follows. Section 3 describes certain preliminaries associated with our method. Section 4 provides a brief overview of our technique. Sections 5 and 6 describe our static analysis technique. Section 7 describes our lightweight runtime monitoring technique to detect whether any LNC instruction becomes critical at runtime. Section 8 describes experiments and results, and we conclude the paper in Section 9.
Preliminaries
Programs can exhibit enhanced error resilience at the application-level when multiple valid output values are permitted. In this paper, we say such programs have "elastic outputs." Elastic outputs commonly occur in programs computing results that are interpreted qualitatively by the user, such as multimedia applications and heuristic-based algorithms (such as genetic algorithms, loopy belief propagation and support vector machines) that attempt to solve complex problems for which absolute optimal solutions are too costly to compute. Programs with elastic outputs have application-dependent fidelity metrics associated with them to mathematically characterize the quality of the solution. Examples of fidelity metrics include PSNR (peak signal to noise ratio) for the multimedia applications, bit error rate for error correcting codes, etc. Application-level correctness can be defined in terms of the value of such fidelity metrics that estimate the overall quality of solution. Application-level correctness -Given an application A with:
• A set of outputs O c that require numerical correctness.
• A vector of elastic outputs O.
• A fidelity metric F A (I, O) associated with its input I and the corresponding elastic output O.
• A user-specified threshold T for acceptable output.
An output instance (O, O c ) obtained by executing application A with input instance I in the presence of soft errors is defined to be application-level correct if F A (I, O) ≥ T and O c is numerically correct. We believe that a domain expert would define these parameters for a given application and that the application writer (by whom our analysis technique would be used) would be able to tune the acceptable threshold based on the end user's requirements. In our technique, it is the responsibility of the application writer to mark the outputs which can tolerate errors before running our analysis technique; any outputs not marked as error-tolerant are assumed to require numerical correctness. We assume that if an error in instruction x can propagate to output element o i , then the magnitude of the error of o i can be arbitrarily large and hence the number of incorrect outputs determines whether the program execution satisfies application-level correctness. We also introduce the following concepts.
• N min : Given the maximum possible value of error for one output element (E max ), N min is the minimum number of output elements that must be erroneous (each with error E max ) so that F A (I, O) falls below the specified threshold T.
• Basic block: A basic block in a program is a sequence of consecutive instructions that has one entry point, one exit point and no other branch instructions.
• Instruction instance: Instance of an instruction x in the program refers to a dynamic execution of the instruction at run-time.
• α-AFFECTER: An instruction x is said to be an α-AFFECTER of instruction y if an error in one instance of x can propagate to at least α instances of y.
• Static instruction id (SID): Each static instruction is given a unique id at compile time.
... would need to be erroneous to cause large degradation in output quality.
On the other hand, a single error in the computation of variable i (line 3 in Figure 1 ) might cause significant degradation (especially if the error occurs in early iterations). 
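As a rough illustration of the definitions above, the following Python sketch computes N min for a PSNR-style fidelity metric and checks the application-level correctness condition. The function names, the 8-bit peak value of 255, the output size and the 35 dB threshold are all assumptions chosen for the example; they do not come from the paper.

```python
import math

def psnr(num_bad, n_outputs, e_max, peak=255.0):
    """PSNR in dB when num_bad of n_outputs elements each carry error e_max."""
    if num_bad == 0:
        return math.inf
    mse = num_bad * e_max ** 2 / n_outputs
    return 10.0 * math.log10(peak ** 2 / mse)

def n_min(n_outputs, e_max, threshold_db, peak=255.0):
    """Smallest number of erroneous elements that pushes PSNR below the threshold."""
    for k in range(1, n_outputs + 1):
        if psnr(k, n_outputs, e_max, peak) < threshold_db:
            return k
    return None  # even corrupting every element keeps PSNR at or above the threshold

def application_level_correct(fidelity, threshold, strict_outputs_ok):
    """F_A(I, O) >= T and the non-elastic outputs O_c are numerically correct."""
    return fidelity >= threshold and strict_outputs_ok

# Assumed example: 10,000 output pixels, worst-case error 255, threshold 35 dB.
print(n_min(10_000, 255.0, 35.0))
```

With these assumed values, three worst-case pixel errors still leave the PSNR slightly above 35 dB, while a fourth pushes it below, so an instruction only matters for the elastic outputs if it can corrupt at least that many elements.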
Program representation
We implement all our analysis techniques in the LLVM compiler framework [22] -however, our technique is not restricted to LLVM and can be applied easily to other compiler infrastructures. The LLVM intermediate representation (IR) is a static single assignment (SSA) based representation that essentially models a RISC processor with infinite registers where accesses to pointers (and arrays) are only through load/store instructions while all other instructions operate on register operands. Starting from the LLVM IR, we construct a weighted program dependence graph (PDG) G(V, E, W) as follows:
[Figure 2: example program fragment (entry: X = call sqrt(Y); bb: ...)]
Overview of the proposed method
Our proposed method, named CIAP (Critical Instruction Analysis and Protection), consists of the following steps:
1. Construction of the weighted PDG from the LLVM IR (Section 5).
2. Using the PDG to compute N min -AFFECTER for all outputs. As per the definition given above, the weighted PDG estimates the effect of an erroneous instruction on its immediate successors. Section 6 describes how this information is propagated through the PDG to estimate the effect on instructions that write to the final outputs. This step classifies instructions into two sets: static critical (SC) and static non-critical (SNC). The classification is conservative in the sense that certain non-critical instructions might be marked SC.
3. Profiling using given sample inputs is used to further refine the instructions in set SC. Instructions that are frequently seen to be critical are classified as likely critical (LC) while others are classified as likely non-critical (LNC).
4. Runtime monitoring is used to detect whether any instructions in set LNC become critical at runtime. The monitoring mechanism and the corrective measures to take when an instruction in LNC becomes critical are described in Section 7.2.
5. Reliable execution is achieved by duplicating the instructions that are in set LC after step 3. Checks are inserted in the program for error detection and recovery (Section 7). Together with the runtime monitoring system, errors (if any) in critical instructions are detected and corrected by re-execution.
Constructing PDG and computing weights
We first introduce some terminology that we use in this section:
Loop vector: $L_{vec}(v) = (L_1, \ldots, L_{l_v})$ for node v represents the loop nest inside which node v resides. The outermost loop is the first element of the loop vector, and $l_v$ is the index of the innermost loop. Trip count: the trip count of loop L, TC(L), is the number of iterations executed by loop L. If this value cannot be determined statically, it is assumed to be a large value TC max .
Constructing PDG
The construction of the PDG, apart from edges connecting store and load operations, is trivial given the SSA-based IR. For each LLVM instruction v, an edge e(u→v) is added from every operand u of v to v. For load and store instructions, we use LLVM's alias analysis to determine whether the addresses of the load and the store can overlap. If the alias analysis reports a possible overlap, the object accessed is an array, and the array index is an affine function of the loop index variables, we further check for dependence using the Omega library [23]. If a true dependence exists, a new edge is added in the PDG between the nodes corresponding to the load-store pair. If the array access is not an affine expression of the loop index variables or the object being accessed is not an array, then we conservatively assume both loop-carried and loop-independent dependences between the load-store pair over all common loops in $L_{vec}(s)$ and $L_{vec}(l)$. Figure 3 shows part of the weighted PDG for the example in Figure 2. The dotted edge indicates a loop-carried dependence edge between the store (node sc) and the load instruction (node c_i_1), and the solid edges represent loop-independent dependences.
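The edge construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of instructions, the `build_pdg` helper and the `may_alias` callback (a stand-in for LLVM alias analysis plus the Omega-based dependence test) are hypothetical, and the five-instruction program merely borrows node names from the text without reproducing Figure 2.

```python
from collections import defaultdict

def build_pdg(instructions, may_alias):
    """Build PDG edges from SSA-style instructions.

    instructions: list of (name, opcode, operands) tuples.
    may_alias(store, load) -> bool stands in for alias/dependence analysis.
    Returns {producer: set of consumers}.
    """
    edges = defaultdict(set)
    defined = {name for name, _, _ in instructions}
    # Operand edges: one edge from every defining instruction to its user.
    for name, _, operands in instructions:
        for op in operands:
            if op in defined:
                edges[op].add(name)
    # Memory edges: connect each store to every load it may feed.
    stores = [ins for ins in instructions if ins[1] == "store"]
    loads = [ins for ins in instructions if ins[1] == "load"]
    for s in stores:
        for l in loads:
            if may_alias(s, l):
                edges[s[0]].add(l[0])
    return edges

# Hypothetical program; Y, c_addr and out_addr are inputs, not instructions.
program = [
    ("X", "call", ["Y"]),
    ("c_i_1", "load", ["c_addr"]),
    ("out_i", "mul", ["X", "c_i_1"]),
    ("so", "store", ["out_i", "out_addr"]),
    ("sc", "store", ["out_i", "c_addr"]),
]
# Assumed aliasing rule: a store and a load alias if they name the same address.
same_address = lambda s, l: s[2][-1] == l[2][0]
pdg = build_pdg(program, same_address)
```

In this toy encoding the store sc feeds the load c_i_1 because both use the same address operand, which produces the store-to-load edge that the weighting step later treats specially.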
Computing edge weights -Static method
We identify three cases for computing the weight of edge e(u, v) based on the type of the nodes u and v. 
In Case 1, the weight is $$w(e) = \prod_{L \in L_{vec}(v) \setminus L_{vec}(u)} TC(L),$$ which is the product of the trip counts of the loops that contain v but not u. Note that if v is not part of any loop nest, or is part of an outer loop relative to u, this value evaluates to 1. For the example in Figure 3, node X is outside the loop nest that contains node out_i. Hence, w(X, out_i) is set to N, because any error in the computation of X will affect N instances of the node out_i.
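The trip-count product used for this case can be illustrated with a short Python sketch. The loop names, the dictionary of trip counts and the numeric value chosen for TC max are assumptions made for the example.

```python
from math import prod

TC_MAX = 1_000_000  # hypothetical stand-in for the paper's TC_max

def case1_weight(loops_u, loops_v, trip_count):
    """Product of the trip counts of the loops that contain v but not u.

    loops_u, loops_v: loop vectors (outermost loop first) of the two nodes.
    trip_count: {loop: statically known trip count}; unknown loops use TC_MAX.
    """
    enclosing_u = set(loops_u)
    return prod(trip_count.get(loop, TC_MAX)
                for loop in loops_v if loop not in enclosing_u)

# u outside the loop, v inside a loop with 64 iterations: weight 64.
print(case1_weight([], ["Li"], {"Li": 64}))
# v sits in an outer loop relative to u: the empty product evaluates to 1.
print(case1_weight(["Li", "Lj"], ["Li"], {"Li": 64, "Lj": 8}))
```

The empty product returning 1 matches the remark that the weight is 1 when v is not deeper in the loop nest than u.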
... from which data is loaded, edge e is handled in the same way as Case 1. If u is a store instruction which may alias with the address of the load, then w(e) is set to Max_live_reads(u, v), which is defined and computed as follows. Max_live_reads(s, l) is the maximum number of instances of load instruction l that read the live value produced by store instruction s. Essentially, this number estimates the number of instances of l that are impacted by an error in store instruction s. In general, this is a difficult quantity to compute; we describe two scenarios in which it can be computed efficiently. If neither scenario occurs, we conservatively set this value to a large number MLR max .
Scenario 1: instructions s and l are in the same basic block, have identical address operands, and s is before l in the basic block. In this case, Max_live_reads(s, l) is 1 because a new instance of s is always executed before l and both access the same address.
...). In this case, the barvinok library [24] can be used to get an upper bound on the number of integer points within this polytope, each of which is a legal solution to the above constraints; this provides an upper bound for Max_live_reads. For the example in Figure 3, for one instance of node sc, there will be one instance of node c_i_1 in the next iteration which reads the value of the store. Hence, Max_live_reads(sc, c_i_1) is set to 1 and e(sc, c_i_1) is a loop-carried dependence edge.
Computing α-AFFECTER from weighted PDG
In this section, we present an algorithm, Propagation, to determine whether an instruction x is an α-AFFECTER of instruction y given a weighted PDG G. We first assume that G is a directed acyclic graph (DAG) and then relax this assumption.
Acyclic PDG
The value propagate(x→v) represents the maximum number of instances of node v to which an error in a single instance of x can propagate. Algorithm Propagation iterates over the nodes in topological order, starting at x. Initially, propagate(x→n) is set to 0 for all nodes n; propagate(x→x) is then set to 1 (because in a DAG one instance of a node affects only one instance of itself). When node v is visited, the value of propagate(x→u) has already been computed for all predecessors u of v (since we traverse G in topological order). By definition, propagate(x→u) is the number of instances of u to which an error from a single instance of x can propagate, and w(u, v) is the number of instances of v to which an error from a single instance of u can propagate. Hence, the maximum number of instances of v to which an error from a single instance of x can propagate through node u is simply the product w(u, v)*propagate(x→u). We take the maximum over all predecessors of v to get an upper bound on the number of instances of v to which an error in a single instance of x can propagate.
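$$\mathrm{propagate}(x \to v) = \max_{u \in \mathrm{pred}(v)} \; w(u, v) \cdot \mathrm{propagate}(x \to u)$$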
In the example in Figure 3, let us consider a part of the graph which does not contain any cycles: the subgraph containing nodes X, out_i and so. To compute propagate(X→so), we first compute propagate(X→out_i), which is simply the edge weight N. We then multiply this by w(out_i, so), which is 1, giving propagate(X→so) the value N.
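The propagation over an acyclic PDG can be sketched as follows. This is a minimal Python illustration, assuming the graph is given as a dictionary of edge weights; the function name `propagate_from`, the `init` parameter and the concrete value N = 8 are assumptions for the example, and `graphlib` (Python 3.9+) supplies the topological order.

```python
from graphlib import TopologicalSorter

def propagate_from(x, weights, init=1):
    """Upper bound on how many instances of each node an error in x can reach.

    weights: {(u, v): w(u, v)} for an acyclic PDG.
    Returns {node: propagate(x -> node)}.
    """
    # Map every node to its set of predecessors, as TopologicalSorter expects.
    preds = {x: set()}
    for (u, v) in weights:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    prop = {n: 0 for n in preds}
    prop[x] = init
    # static_order() yields each node only after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        if v == x:
            continue
        for u in preds[v]:
            prop[v] = max(prop[v], weights[(u, v)] * prop[u])
    return prop

# Acyclic part of the running example, with N assumed to be 8.
N = 8
result = propagate_from("X", {("X", "out_i"): N, ("out_i", "so"): 1})
print(result["so"])
```

Nodes that x cannot reach keep the value 0, and `TopologicalSorter` raises `CycleError` on cyclic input, so cycles have to be removed before this routine is called.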
PDG with cycles
Our goal is to modify the PDG by adding new edges and updating some edge weights so that all cycles can be removed. The existence of a cycle C implies that an error propagated to any single node instance in C will, in the worst case, propagate to all node instances in C. Hence, any node that can affect one node of the cycle must affect all the nodes of the cycle. This information must be reflected in the modified PDG. Since the PDG has cycles, the original program must have loops. This implies that for each cycle C in the PDG G, there exists an edge be which corresponds to the back-edge of the loop L associated with the cycle C. We first add edges to all nodes in C from nodes which are not part of C but affect at least one node in C. More precisely, for every node u ∉ C that has an edge to some node in C, we add edges e'(u→v) for all v ∈ C and set w(e') to TC(L)*w(be). This ensures that any error from a node outside the cycle will be propagated to all nodes in the cycle, and the number of instances affected is bounded from above by the product of the trip count of the loop L and the weight of the back-edge. We can then delete be. By repeating this process for all cycles, we end up with a DAG, for which we can reuse algorithm Propagation. The only difference is the initial value of propagate(x→x) for nodes x that are part of a cycle in G: the initial value is set to TC(L)*w(be). This implies that, in the worst case, any error in one instance of x will propagate to all instances of x in the cycle of which x is part. In Figure 3, the cycle involving nodes sc and c_i_1 (both loop-carried and loop-independent dependences between store and load) results in propagate(sc→sc) and propagate(c_i_1→c_i_1) being initialized to N.
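The cycle-removal step can be sketched in Python as below, for a single cycle whose nodes and back-edge are already known. The function name `break_cycle`, the node names and the weights are hypothetical; taking the maximum when an edge into the cycle already exists is also an assumption, since the text only says that the new edge weight is set to TC(L)*w(be).

```python
def break_cycle(weights, cycle, back_edge, trip_count):
    """Remove one cycle from a weighted PDG.

    weights: {(u, v): w}; cycle: set of nodes on the cycle;
    back_edge: the (u, v) edge of the loop back-edge; trip_count: TC(L).
    Returns (new_weights, init), where init[x] is the starting value of
    propagate(x -> x) for every node x on the cycle.
    """
    bound = trip_count * weights[back_edge]
    new_w = {e: w for e, w in weights.items() if e != back_edge}
    # Nodes outside the cycle that affect at least one node inside it.
    outside = {u for (u, v) in weights if v in cycle and u not in cycle}
    for u in outside:
        for v in cycle:
            # Assumption: never lower an existing weight.
            new_w[(u, v)] = max(new_w.get((u, v), 0), bound)
    init = {x: bound for x in cycle}
    return new_w, init

# Hypothetical graph: 'a' feeds a two-node cycle c1 <-> c2 in a loop with 8 trips.
w = {("a", "c1"): 1, ("c1", "c2"): 1, ("c2", "c1"): 1}
dag, init = break_cycle(w, {"c1", "c2"}, ("c2", "c1"), trip_count=8)
```

The returned `init` values are meant to replace the default starting value of 1 when the propagation is rerun on the resulting DAG from a node that was on the cycle.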
Identification of critical instructions:
Let NumC = (i 1 , i 2 , ..., i n ) be the instructions that directly write to outputs that are required to be numerically correct and J = (j 1 , j 2 , ..., j m ) be those that directly write to the vector of elastic outputs. An instruction x is marked static critical (SC) if: Rule 1: x is a 1-AFFECTER of any instruction in NumC, i.e. propagate(x→i k ) ≥ 1 for some i k in NumC.
Proof is omitted due to page limit.
Control flow optimization
We define a critical basic block as a basic block that contains at least one critical instruction. Figure 4 shows the outline of our analysis technique C-Opt. In the 'Initialize' step (line 1), basic blocks are marked critical based on the conditions:
• All basic blocks containing at least one critical instruction (section 6.2) are marked critical.
• All basic blocks that exit from loops that contain at least one critical basic block are marked critical.
• All basic blocks that can cause program termination (exit function etc.) are marked critical.
Our algorithm then iterates over all basic blocks (lines 4-6) and for each block A tries to find whether control flow from A can reach a critical basic block before ipostdom(A) (line 7). This step is a simple depth-first search in the CFG starting at A. If yes, then A is marked as a critical block and A's branch instruction (A br ) is also marked critical (lines 9-11). All instructions that are 1-affecters of A br (denoted by DS(A br )) are also marked critical, as are the basic blocks that contain these instructions (line 12). The algorithm terminates when no more basic blocks are marked critical (line 15). The algorithm is guaranteed to terminate because in the worst case all basic blocks are marked critical and the outer loop exits. The complexity is O(n^3), where n is the number of basic blocks.
Theorem 2:
Procedure C-Opt does not remove any static critical instructions.
Proof omitted due to page limit.
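The fixed-point iteration of C-Opt described above can be sketched in Python as follows. This is an illustration under assumptions, not the paper's implementation: the CFG is a dictionary of successor lists, immediate post-dominators are supplied by the caller, the initial critical set is assumed to be already computed from the three initialization conditions, and `affecter_blocks` is a hypothetical map from a block to the blocks holding the 1-affecters of its branch.

```python
def reaches_critical(a, succ, stop, critical):
    """Depth-first search from a's successors that does not pass ipostdom(a)."""
    stack = list(succ.get(a, []))
    seen = set()
    while stack:
        b = stack.pop()
        if b == stop or b in seen:
            continue
        seen.add(b)
        if b in critical:
            return True
        stack.extend(succ.get(b, []))
    return False

def c_opt(succ, ipostdom, initial_critical, affecter_blocks):
    """Return (critical blocks, blocks whose branch is critical)."""
    critical = set(initial_critical)
    critical_branches = set()
    changed = True
    while changed:                      # repeat until no new marks appear
        changed = False
        for a in succ:
            if a in critical_branches:
                continue
            if reaches_critical(a, succ, ipostdom.get(a), critical):
                critical_branches.add(a)
                critical |= {a} | affecter_blocks.get(a, set())
                changed = True
    return critical, critical_branches

# Hypothetical CFG: two diamonds in sequence; 'then' holds a critical instruction
# and 'end' terminates the program.
succ = {"entry": ["then", "else"], "then": ["join"], "else": ["join"],
        "join": ["p", "q"], "p": ["end"], "q": ["end"], "end": []}
ipostdom = {"entry": "join", "then": "join", "else": "join",
            "join": "end", "p": "end", "q": "end", "end": None}
blocks, branches = c_opt(succ, ipostdom, {"then", "end"}, {})
```

In this example only the branch in entry is marked critical: it decides whether the critical block then executes, whereas the branch in join merely selects between two non-critical blocks that reconverge at its immediate post-dominator.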
Assuring application-level correctness: profiling and runtime monitoring
Profiling edge weights
We identify two main sources of conservatism in static analysis that might lead to high edge weights resulting in several instructions being marked critical conservatively.
Conservative edge weights -Conservative alias and control flow analysis techniques might set certain edge weights much higher than they occur during program execution.
Correlated edge weights -It is possible that two dependence edges are never active simultaneously during program execution due to control flow, data-dependent address computation etc.
In our approach, we track edge weights that are overestimated by static analysis. Determining correlations between edge weights requires path tracking, which we do not investigate in this work. Profiling is used to estimate the weights of two kinds of edges: edges between a store and a load instruction, and control-dependent edges with a phi node as their endpoint. We profile only these two types of edges because static analysis is overly conservative for them. We compare the edge weights obtained during profiling with those estimated by static analysis. Any edge whose profiled weight is at most 25% of the value estimated by static analysis is identified as a conservative edge (the choice of the 25% threshold does not affect the result significantly; other thresholds can be used). We then substitute the edge weights obtained from profiling for the conservative edges and re-run the Propagation algorithm described in Section 6. It is possible that certain instructions which were marked critical earlier by static analysis are now no longer critical. We mark these instructions as likely non-critical (LNC), to be monitored at runtime.
Figure 4: Outline of algorithm C-Opt.
 1. Initialize
    ...
 4.   for(i=0; i<#blocks; ++i)
    ...
 9.       Mark A as critical basic block;
10.       Abr ← A's branch
11.       Mark Abr as critical;
12.       Mark DS(Abr) as critical;
13.     }
14.   }
15. }while ( new critical blocks found );
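The profiling-based refinement of edge weights described in this section can be sketched as a small Python helper. The function name, the edge names and the sample weights are hypothetical; the 25% ratio is the one quoted in the text.

```python
def refine_weights(static_w, profiled_w, ratio=0.25):
    """Replace overly conservative static edge weights with profiled ones.

    static_w, profiled_w: {(u, v): weight}.
    An edge is conservative if its profiled weight is at most
    ratio * its static weight.
    Returns (refined weights, set of conservative edges).
    """
    refined = dict(static_w)
    conservative = set()
    for edge, observed in profiled_w.items():
        if edge in static_w and observed <= ratio * static_w[edge]:
            refined[edge] = observed
            conservative.add(edge)
    return refined, conservative

# Assumed weights: the store->load edge was given a large default statically
# but was read only three times per store during profiling.
static_w = {("s", "l"): 1000, ("a", "b"): 8}
profiled_w = {("s", "l"): 3, ("a", "b"): 8}
refined, conservative = refine_weights(static_w, profiled_w)
```

Instructions that were static critical under `static_w` but are no longer critical when the propagation is rerun with `refined` would form the LNC set, and the edges in `conservative` are the ones that need a runtime monitor.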
Runtime monitoring of edge weights
At runtime, certain edges may have higher weights for certain corner-case inputs, which would invalidate the conclusions of the profiling-guided criticality analysis. Hence, we propose a lightweight runtime scheme that monitors edge weights and triggers a signal when an edge weight rises above a threshold (which might make certain instructions critical). It is straightforward to track control edge weights given the basic blocks in the program. We explain how we track the weights of edges for dependences between a store and a load instruction. Each memory location that is accessed by the load or store has a tracker structure associated with it. The tracker has two fields: a src field that stores the static instruction id (SID) of the instruction writing to the address, and a counter field that records the number of times the memory location is read before being overwritten by a new value. When a store instruction writes to an address, it sets the src field of the tracker to its SID value and resets the counter to 0. When a load instruction reads from an address, it reads the src field associated with the tracker and increments the value of the counter. The src SID and the load instruction's SID together uniquely determine the dependence edge, and the value of the counter represents the edge weight. As long as the edge weight is below the threshold, the results of the profiling-guided analysis hold and all instructions in set LNC are still non-critical. If the edge weight crosses the threshold value, then there is a chance that certain instructions in the set LNC may become critical. In our implementation, the threshold is set to 80% of the edge weight obtained during profiling. Accessing the counter location given the source and destination SIDs at runtime needs to be efficient. We use a perfect hashing scheme [27] to achieve this; a perfect hash is feasible since all the likely source ids for a given destination id are already known after profiling.
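The tracker mechanism can be illustrated with the following Python sketch. It is a software model only: the class name `EdgeMonitor`, the SID values and the threshold are assumptions, and ordinary dictionaries stand in for the perfect hash table used in the paper.

```python
class EdgeMonitor:
    """Per-address trackers with a src SID and a read counter."""

    def __init__(self, thresholds):
        self.thresholds = thresholds  # {(store_sid, load_sid): edge-weight limit}
        self.trackers = {}            # address -> [src_sid, counter]

    def on_store(self, sid, addr):
        # A store records its SID and resets the read counter.
        self.trackers[addr] = [sid, 0]

    def on_load(self, sid, addr):
        """Count the read; return True if the edge weight exceeds its limit."""
        tracker = self.trackers.get(addr)
        if tracker is None:
            return False              # no monitored store has written here yet
        tracker[1] += 1
        limit = self.thresholds.get((tracker[0], sid))
        return limit is not None and tracker[1] > limit

# Assumed edge: store SID 7 feeds load SID 9, and at most 2 live reads are allowed.
monitor = EdgeMonitor({(7, 9): 2})
monitor.on_store(7, 0x100)
signals = [monitor.on_load(9, 0x100) for _ in range(3)]
print(signals)
```

A `True` return value corresponds to the signal that triggers the corrective action, i.e. the point at which the LNC classification can no longer be trusted for this run.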
Ensuring application-level correctness
In our method two versions of the code are compiled -(i) version V 1 which assumes that all instructions are critical and uses the method in [20] to guarantee application-level correctness (in fact, also complete numerical correctness) through software-based duplication and check of all instructions, and (ii) version V 2 which assumes that only instructions classified as likely critical (LC) after the profiling phase are critical. In version V 2 , all instructions in the set LC are duplicated and their results are compared before writing to memory (similar to [20] ). If any errors are detected, then execution is rolled back to the start of the basic block and instructions are re-executed. Version V 2 also contains runtime monitors to track dependence edges associated with likely non-critical (LNC) instructions. Note that the instructions marked as static noncritical (SNC) are neither checked nor monitored (since any errors in SNC instructions will not affect application-level correctness).
To begin with, version V 2 is executed. If an edge weight exceeds its threshold value, an LNC instruction may have become critical at runtime. Hence, whenever any edge weight crosses its threshold value, we switch to version V 1 , as we can no longer guarantee the non-criticality of instructions in version V 2 due to the unexpected dependence. Note that since [20] is a software-only fault tolerance scheme, there are certain kinds of hardware errors which cannot be detected by such a scheme; for more details, we refer the reader to [34]. While we choose to use the scheme in [20] for our experiments, it is important to note that any improvements to software-only fault-tolerant schemes (such as hardware support for correct update of the PC) are orthogonal to our technique and hence can be used jointly with it.
Experiments and Results
We assume our program will run on a commercial off-the-shelf (COTS) processor platform where the memory structures (caches, external memory) are protected by ECC. We measure the energy overhead of achieving reliable execution by simulation, using Wisconsin's GEMS simulation infrastructure [25] and McPAT [26] for power estimation. We simulate a 4-issue processor with 64 KB L1 caches and a 4 MB L2 cache.
Error injection methodology:
For each instruction in IR-level representation, based on the given error probability (0.01% in our case) and the number of times the instruction is executed (obtained using a profiling run), we use a pseudo-random number generator to determine whether that particular instruction should be impacted by a single-event upset. If yes, a new function call is inserted into the program -the new function call invokes a pseudo-random number generator and determines whether to inject an error or not at run-time. Using real soft error rates for applications would take a long time for errors to appear; hence we use 0.01% as a sample value. For actual soft-error rates, we refer the reader to [35] .
Illegal memory accesses:
In [4], the authors showed that when an error at the RTL level of a processor is not masked, it appears in software primarily as an illegal memory address leading to a program crash. However, since many instructions are non-critical, an application-level correctness aware system would ignore memory accesses with faulty address values. Fortunately, current processor ISAs such as SPARC v9 [36] provide non-faulting loads (for optimized compilation): load instructions which execute correctly when the memory address is legal and are ignored when it is illegal. Such non-faulting loads have no overhead relative to normal load instructions when the address is legal.
Profiling helps reduce the weights of certain edges, which leads to many instructions being marked as LNC, particularly for the larger benchmarks. Table 2 shows the overhead (in terms of number of instructions executed at run-time and energy) of duplicating critical instructions to achieve application-level correctness. For all applications, the maximum allowed fall in quality of solution is set to 5%. Columns 2-5 show the number of instructions (relative to the single correct copy) executed at run-time. As can be seen, for the random error injections, the overhead of recovery is small, primarily because of the relatively small number of critical instructions. Columns 6 and 7 show the energy savings achieved by our method compared to the approach in [9]: column 6 shows the impact of static analysis only, while column 7 considers both static analysis and run-time monitoring. Column 8 shows the energy savings over the technique proposed in [20], which guarantees numerical correctness. It should be noted that the denominator used in computing the percentages is the corresponding value of the single error-free execution (without duplication); hence, some of the percentages are above 100%. On average, our technique provides a 21% energy reduction compared to [9] while ensuring application-level correctness (which [9] may not guarantee), and 79% energy savings compared to [20] (which ensures numerical correctness and hence is a stricter form of checking). Columns 9 and 10 of Table 2 show the overhead associated with runtime monitoring (for the applications that benefited from runtime monitoring). The main source of overhead is the write and read of the source and destination SIDs in memory.
Hashing is relatively fast. For our set of applications, we did not observe any edge weights crossing their threshold valuesthis might be a characteristic of media benchmarks.
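The non-faulting load behaviour described above can be illustrated with a minimal software sketch. This is a toy model and not the SPARC v9 hardware semantics: memory is modelled as a dictionary, and the names `non_faulting_load` and `sum_non_critical` are hypothetical. It only shows the idea that a non-critical access with a corrupted address is skipped rather than crashing the program.

```python
from typing import Dict, Iterable, Optional


def non_faulting_load(memory: Dict[int, int], address: int) -> Optional[int]:
    """Toy model: return the stored word for a legal address.

    For an illegal (unmapped) address, return None instead of raising,
    so the caller can ignore the access.
    """
    return memory.get(address)


def sum_non_critical(memory: Dict[int, int], addresses: Iterable[int]) -> int:
    """Accumulate values, ignoring loads whose address is illegal."""
    total = 0
    for address in addresses:
        value = non_faulting_load(memory, address)
        if value is None:
            continue  # illegal address: the access is ignored, no crash
        total += value
    return total


# Hypothetical memory image with two legal addresses.
MEMORY = {0x1000: 7, 0x1004: 5}
```

With a legal address the load costs nothing extra in this model; with a corrupted address such as `0xDEAD`, the element is simply dropped from the non-critical computation.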
Analysis of results
For applications such as DCT and YCC2RGB in JPEG, the data flow is non-critical and only loops form the critical program segments. These applications have affine accesses to arrays, and hence static analysis performs well. GSM and G721 operate on small chunks of data (16-130 bytes), and almost all dependences that are not well analyzed by static techniques affect the control flow. A few control statements are marked non-critical by our control-flow analysis technique. The kernels in Susan have affine accesses with a large number of instructions affecting control flow; however, each control decision affects only one output element and each decision is independent. Thus, our control-flow analysis can avoid marking a significant number of instructions as critical. Static analysis performs adequately for these applications.
Static analysis performs poorly for the remaining benchmarks. LDPC is pointer-heavy, with linked lists used to represent a sparse matrix. Denoising, on the other hand, is an iterative algorithm with a check for convergence. Although it shows strong loop dependence, the number of iterations before convergence is rarely large enough (even in the presence of errors) for an error to propagate to a large number of elements. H264 and libmad (an mp3 decoder) are streaming applications; however, static analysis is not able to perform effective alias analysis because of the complexity of the code involved. For these applications, the run-time monitoring scheme is useful, as it helps overcome the overly conservative nature of static analysis.
Our analysis does not show much difference in the number of critical instructions detected when the error tolerance limit is varied. The primary reason is that our technique is predominantly a worst-case static analysis technique: if the analysis determines that even a single dynamic instance of a given instruction might be critical, the instruction is marked critical and all dynamic instances of the instruction are duplicated.
Table 3 shows the sensitivity of the results to the threshold used for edge weights, which is used to distinguish between critical and likely non-critical (LNC) instructions. The first row of the table shows the threshold values used for edge weights when comparing weights provided by static analysis and profiling. For each threshold value, we report the number of dynamic LNC instructions (column titled #LNC) and the run-time overhead of monitoring these LNCs (column titled RTMO), normalized to the number of dynamic instructions executed by a single, correct instance of the application. In general, as the threshold increases, more nodes are marked LNC and are therefore monitored at run-time. Thus, the overhead associated with achieving application-level reliability is a trade-off between the number of instructions duplicated (marked critical or likely critical) and the number of instructions monitored at run-time (because they are marked LNC). It can also be observed that the number of instructions classified as likely critical (LC) or LNC does not change uniformly with the threshold value. Broadly speaking, the number of LNC and LC instructions remains more or less constant over a certain interval of the threshold value, after which there is a sudden drop. The reason is that, although several edge weights may fall below a threshold, each node may have multiple dependence paths to the outputs; hence, for a node to be marked as LNC, every path must contain an edge whose weight falls below the set threshold. The number of LNC and LC nodes changes suddenly when every path has an edge whose weight falls below the threshold.
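The path condition behind this step-like behaviour can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the implementation used in the paper: the dependence graph, node names, edge weights, and the functions `has_uncut_path` and `classify` are all hypothetical. A node is treated as LNC only if every dependence path from it to an output contains at least one edge whose weight is below the threshold.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependence graph: node -> list of (successor, edge weight).
Graph = Dict[str, List[Tuple[str, float]]]


def has_uncut_path(graph: Graph, start: str, outputs: Set[str],
                   threshold: float) -> bool:
    """True if some path from start to an output uses only edges
    whose weight is at or above the threshold."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node in outputs:
            return True
        for successor, weight in graph.get(node, []):
            # Edges below the threshold are treated as cut.
            if weight >= threshold and successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return False


def classify(graph: Graph, outputs: Set[str],
             threshold: float) -> Dict[str, str]:
    """Mark a node LNC only if every path to an output has a cut edge."""
    return {
        node: "critical" if has_uncut_path(graph, node, outputs, threshold)
        else "LNC"
        for node in graph
    }


# Node "a" reaches the output along two paths: a-b-out and a-c-out.
GRAPH: Graph = {
    "a": [("b", 0.9), ("c", 0.4)],
    "b": [("out", 0.2)],
    "c": [("out", 0.6)],
}
```

In this toy graph, raising the threshold from 0.1 to 0.3 cuts the path through `b` but leaves `a` critical, because the path through `c` is still intact. Node `a` becomes LNC only once the threshold also exceeds 0.4, which mirrors the plateau-then-jump pattern described above.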
Conclusion
In this paper, we presented a hybrid analysis technique that analyzes programs with elastic outputs to determine critical instructions. Our approach consists of a static analysis phase that identifies critical instructions and marks those that are identified as critical only because of the conservative nature of static analysis. This is followed by a profiling phase that determines whether instructions conservatively marked critical by static analysis do become critical at run-time; if not, these instructions are monitored by a lightweight run-time system. We use the results of our analysis to reduce the number of replicated instructions and show significant benefits over previous approaches while guaranteeing application-level correctness. To the best of our knowledge, this is the first scalable method with small overhead (in terms of performance and energy) that can guarantee application-level correctness in the presence of soft errors.
