Data-dependent branches constitute the single biggest source of remaining branch mispredictions. Typically, data-dependent branches are associated with program data structures and follow a store-load-branch execution sequence. A set of memory locations is written at an earlier point in a program. Later, these locations are read and used to evaluate a branch condition. The branch outcome depends on the data values stored in the data structure, which typically do not have a repeatable pattern. Therefore, in addition to a history-based dynamic predictor, we need a different kind of predictor for handling such branches.
Introduction
Most branch prediction techniques rely on branch history information to predict future branches. Some use a short history [11] [21] [36], while others use a longer history [16] [25] [30] [32]. History-based dynamic branch prediction schemes have been shown to reach high prediction accuracy for all but a few hard-to-predict branches. Figures 1 and 2 show branch mispredictions per 1K instructions (MPKI) for the EEMBC and SPECint benchmark suites using several history-based branch predictors. As the figures show, several benchmarks still have high branch misprediction rates even with a long-history branch predictor.
This work is based on the following observation: hard-to-predict data-dependent branches are commonly associated with program data structures such as arrays, linked lists, and trees, and follow a store-load-branch execution sequence similar to the one shown in Listing 1. A set of memory locations is written while building and updating the data structure (line 2, Listing 1). During data structure traversal, these locations are read and used to evaluate the branch condition (line 7, Listing 1).
Figure 1. MPKI for EEMBC benchmark suite (mispredictions per 1K instructions; Gshare, YAGS, BiMode, TAGE)
Figure 2. MPKI for SPECint benchmark suite (mispredictions per 1K instructions; Gshare, YAGS, BiMode, TAGE)
This paper proposes the Store-Load-Branch (SLB) predictor, a compiler-assisted dynamic branch prediction scheme for data-dependent branches that uses data value correlation.

Listing 1. Store-Load-Branch execution sequence
 1  for (node = head; node != NULL; node = node->next) {
 2      node->key = ...;
 3  }
 4  ...
 5  node = head;
 6  while (node != NULL) {
 7      if (node->key <condition>) {
 8          ...
 9      }
10      node = node->next;
11  }
The compiler identifies all program points where the data structure associated with a hard-to-predict data-dependent branch is referenced and modified. At run time, hardware tracks the marked store instructions that modify the data structure, computes branch condition flags ahead of time using the store data values, and buffers the flags in a structure indexed by store address. Later, during data structure traversal, the precomputed flags are read using the predicted load address and used to predict the branch outcome.
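As a rough illustration of this store-time/fetch-time split, the C sketch below models the T/NT prediction table as a small direct-mapped array keyed by data address. The table size, the index function, the full-address tag, and the helper names slb_on_store and slb_predict are assumptions made for illustration only; the branch condition is fixed to value < 0 here, whereas the hardware evaluates a condition code supplied by the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical T/NT prediction table size (power of two). */
#define SLB_ENTRIES 1024u

typedef struct {
    uintptr_t addr;   /* full data address kept as tag (assumption) */
    bool      valid;
    bool      taken;  /* branch flag computed at store time */
} SlbEntry;

static SlbEntry slb_table[SLB_ENTRIES];

/* Word-granular index: low 2 bits ignored for 4-byte data (assumption). */
static unsigned slb_index(uintptr_t addr) {
    return (unsigned)((addr >> 2) & (SLB_ENTRIES - 1u));
}

/* Store time: a marked store writes `value` to `addr`. The branch
 * condition (fixed to "value < 0" in this sketch) is evaluated on the
 * store data, and the flag is buffered at the store address. */
void slb_on_store(uintptr_t addr, int32_t value) {
    SlbEntry *e = &slb_table[slb_index(addr)];
    e->addr  = addr;
    e->valid = true;
    e->taken = (value < 0);
}

/* Fetch time: read the flag with the (predicted) load address. Returns
 * false when no flag is buffered for that address, in which case the
 * caller falls back to its default history-based prediction. */
bool slb_predict(uintptr_t predicted_addr, bool *taken) {
    assert(taken != NULL);
    const SlbEntry *e = &slb_table[slb_index(predicted_addr)];
    if (!e->valid || e->addr != predicted_addr)
        return false;
    *taken = e->taken;
    return true;
}
```

In a combined configuration, a false return from slb_predict simply means the default predictor's direction is used.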
Typically, the instruction that loads data structure values is quickly followed by the branch instruction that evaluates the branch condition using the loaded values. Therefore, the actual load address is usually not available before the branch instruction is fetched. Hence, we use the predicted load address to read the precomputed branch flags. Addresses for simple data structures such as arrays are easy to predict with a stride-based address predictor. Bekerman et al. proposed a load address predictor for irregular data structures such as linked lists and trees [4], which we adapted to our requirements.
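For the array case, a conventional per-static-load stride predictor is enough. The following is a minimal C sketch of such a predictor, assuming a 2-bit confidence counter and a confidence threshold of 2; the struct and function names are hypothetical and the paper does not specify these details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-static-load entry of a stride address predictor. */
typedef struct {
    uintptr_t last_addr;   /* address of the most recent instance */
    intptr_t  stride;      /* last observed address difference */
    unsigned  confidence;  /* 2-bit saturating counter */
} StrideEntry;

/* Predicted address for the next dynamic instance of the load. */
uintptr_t stride_predict(const StrideEntry *e) {
    return e->last_addr + (uintptr_t)e->stride;
}

/* Use the prediction only after the same stride repeated a few times. */
int stride_confident(const StrideEntry *e) {
    return e->confidence >= 2;
}

/* Train with the actual address once the load has executed. */
void stride_update(StrideEntry *e, uintptr_t actual) {
    assert(e != NULL);
    intptr_t new_stride = (intptr_t)(actual - e->last_addr);
    if (new_stride == e->stride) {
        if (e->confidence < 3) e->confidence++;
    } else {
        e->stride = new_stride;
        e->confidence = 0;
    }
    e->last_addr = actual;
}
```

After a few iterations of a linear array walk, the entry locks onto the element size and stride_predict returns the address of the next element.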
We compared our design with the state-of-the-art TAGE branch predictor [32]. Results show that, for several benchmarks, the top mispredicting branches of the TAGE predictor are accurately predicted by the SLB predictor. On average, compared to the standalone TAGE predictor, the combined TAGE+SLB predictor reduces branch mispredictions per 1K instructions (MPKI) by 21% for the SPECint [34] benchmark suite. Similarly, for the EEMBC [12] benchmark suite, MPKI is reduced by 51%. This paper makes the following contributions:
1. We investigate program patterns that manifest hard-to-predict, data-dependent branches.
2. We propose the SLB prediction scheme for data-dependent branches, which predicts the branch outcome using precomputed branch flags.
3. Our implementation of SLB requires adding a couple of hint instructions to the ISA, and lightweight hardware structures.
The rest of the paper is organized as follows: Section 2 presents motivating examples from real benchmarks. Section 3 describes the SLB prediction scheme. Simulation methodology and results are presented in Sections 4 and 5, respectively. Section 6 discusses related work. Finally, Section 7 concludes the paper.
We use the cjpeg benchmark from the EEMBC consumer suite as a motivating example. The benchmark performs standard JPEG compression on a given image. The input image is broken into blocks of 8x8 pixels, and each block goes through discrete cosine transform (DCT), quantization, and entropy coding steps. As shown in Listing 2, the do_dct function (line 10) populates an array, workspace, with DCT coefficients whose values range from -1024 to 1023. During quantization, each DCT coefficient is read (line 18), compared (line 20) to determine whether it is positive or negative, and quantized accordingly. The branch 'if (temp < 0)' (line 20) is an example of a hard-to-predict data-dependent branch. Table 1 (row 3) shows the branch characteristics and prediction accuracy of this branch using short- and long-history predictors: the longer-history TAGE predictor does not improve prediction accuracy for this branch. Instead, if branch condition flags are computed and buffered while populating the array workspace in the do_dct function (footnote 1; see Listing 3, lines 8-9), a simple buffer lookup can yield perfect branch prediction.
Listing 2 (excerpt, lines 8-14)
 8
 9  /* Perform the DCT */
10  (*do_dct) (workspace);
11
12  register DCTELEM temp, qval;
13  register int i;
14  register JCOEFPTR output_ptr = coef_blocks ...
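To see why history does not help this branch, the following self-contained C experiment contrasts a single 2-bit saturating counter (a deliberately simple stand-in for a history-based predictor, not TAGE) with flags precomputed at store time. The coefficients come from a toy linear congruential generator rather than a real DCT, and the array neg_flag plays the role of the buffered flags; populate, traverse, and next_coeff are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for DCT output: values in [-1024, 1023] from an LCG
 * (assumption: real DCT coefficients are not modeled). */
static uint32_t lcg_state = 12345u;
static int16_t next_coeff(void) {
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (int16_t)((int32_t)((lcg_state >> 16) & 0x7FFu) - 1024);
}

/* Store phase: fill workspace and, as an SLB hint would, record the
 * flag for 'if (temp < 0)' for each stored coefficient. */
void populate(int16_t *workspace, bool *neg_flag, size_t n) {
    assert(workspace != NULL && neg_flag != NULL);
    for (size_t i = 0; i < n; i++) {
        workspace[i] = next_coeff();       /* the store */
        neg_flag[i]  = workspace[i] < 0;   /* flag computed at store time */
    }
}

/* Load-branch phase: count correct predictions of 'if (temp < 0)' for a
 * single 2-bit counter and for the precomputed flags. */
void traverse(const int16_t *workspace, const bool *neg_flag, size_t n,
              size_t *counter_hits, size_t *flag_hits) {
    unsigned ctr = 1;                      /* weakly not-taken */
    *counter_hits = *flag_hits = 0;
    for (size_t i = 0; i < n; i++) {
        bool taken = workspace[i] < 0;     /* actual outcome */
        if ((ctr >= 2) == taken) (*counter_hits)++;
        if (neg_flag[i] == taken) (*flag_hits)++;
        if (taken && ctr < 3) ctr++;
        if (!taken && ctr > 0) ctr--;
    }
}
```

The flag-based lookup matches every outcome by construction, while the counter mispredicts whenever the sign changes against its current state.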
Example 2: Data-Dependent Indirect Branches
Indirect branches are generally harder to predict than direct branches because a single static indirect branch may have multiple targets.
Listing 4 shows a ray tracer program, Eon, taken from the SPECint2000 suite. Each ray must be tested for intersection with all the objects in the scene. A scene consists of several different types of objects with a common base class mrSurface, as shown in Listing 4 (lines 1-8). While reading the scene, these objects are stored in a 3D data structure called grid using its mrGrid::insert method (line 10). The grid is later traversed in the mrGrid::viewingHit method (line 21) to check whether the incoming ray hits any object in the grid. During each iteration of the while loop, the next object's pointer (oPtr) is read from the grid (line 36), and depending on the type of the object, the corresponding viewingHit() method is invoked (line 41). Since the sequence of objects stored in the grid does not have a repeatable pattern, predicting the target address of the virtual function call (line 41) with a history-based indirect branch predictor results in low prediction accuracy (see Table 1, row 4). Instead, if the viewingHit() function addresses corresponding to the different object types in the grid are buffered while inserting objects into the grid (lines 15 and 18), a buffer lookup during grid traversal yields the correct function address for the object read. Note that, in addition to the target address of the 'oPtr->viewingHit()' call (line 41), another hard-to-predict branch at line 37, 'if (oPtr)', can also be accurately predicted, since it depends on the same object read from the grid.
Footnote 1: do_dct is a function pointer, assigned as: fdct->do_dct = jpeg_fdct_islow;
Listing 4 (excerpt, lines 35-47)
35
36      while (iterator.Next(oPtr, tCellMin, tCellMax)) {
37          if (oPtr) {
38              tCellMax = ggMin(tmax, tCellMax);
39              tCellMin = ggMax(tCellMin, tmin);
40
41              if (oPtr->viewingHit(r, time, tCellMin,
42                                   tCellMax, VHR, MR))
43                  return ggTrue;
44          }
45      }
46      return ggFalse;
47  }

Store-Load-Branch (SLB) Predictor
Overview
The SLB predictor is a compiler-assisted dynamic branch prediction technique specifically targeted at improving the prediction accuracy of data-dependent branches. Most data-dependent branches are associated with program data structures such as arrays, linked lists, and trees. During traversal, these branches operate on elements of the data structure. The branch outcome depends on the data values stored in the structure, which typically do not have repeatable patterns. Therefore, instead of relying on branch history information, we compute branch flags and use them to predict branch direction. Because of the deep processor pipeline, data values (and the resulting branch flags) are often not available before the branch instruction is fetched. Therefore, instead of using load values, we compute branch flags ahead of time using store values while the data structure is being updated.
Implementation Details

Compiler and Architecture Support
For a data-dependent branch, the SLB scheme relies on the compiler to identify its store-load-branch (ST-LD-BR) sequence. Starting with the branch instruction, the compiler identifies the load instruction(s) on which the branch depends. It then identifies the store instruction(s) feeding those load instruction(s). The ST-LD-BR sequence for a branch is encoded and passed down to hardware using special load hint (HLD) and store hint (HST) instructions, shown in Figure 3. The compiler inserts an HLD/HST instruction for every static load/store associated with the branch. The lower 9 bits (bits 8:0) encode the load/store pc offset from the HLD/HST instruction. HLD bits 20:9 specify the branch pc offset from the HLD instruction. On seeing an HLD/HST instruction, hardware uses the pc offset values to compute the absolute address of the load or store instruction associated with the branch instruction. HLD bits 24:21 specify the load stride value. HST bits 24:21 specify the condition code for evaluating the branch outcome at store time.
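The bit fields described above can be packed and unpacked as in the following C sketch. Only the field positions come from the text; the opcode bits (31:25) are left at zero, and the offsets are assumed to be unsigned, forward, and counted in 4-byte instructions. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* HLD fields as described in the text (opcode bits not modeled). */
typedef struct {
    uint32_t ld_pc_offset;  /* bits 8:0   */
    uint32_t br_pc_offset;  /* bits 20:9  */
    uint32_t ld_stride;     /* bits 24:21 */
} HldFields;

/* HST fields: store pc offset and 4-bit branch condition code. */
typedef struct {
    uint32_t st_pc_offset;  /* bits 8:0   */
    uint32_t br_cond;       /* bits 24:21 */
} HstFields;

/* Extract bits hi..lo (inclusive) of a 32-bit word. */
static uint32_t bits(uint32_t word, unsigned hi, unsigned lo) {
    return (word >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

uint32_t hld_encode(HldFields f) {
    assert(f.ld_pc_offset < (1u << 9));
    assert(f.br_pc_offset < (1u << 12));
    assert(f.ld_stride    < (1u << 4));
    return f.ld_pc_offset | (f.br_pc_offset << 9) | (f.ld_stride << 21);
}

HldFields hld_decode(uint32_t word) {
    HldFields f = { bits(word, 8, 0), bits(word, 20, 9), bits(word, 24, 21) };
    return f;
}

HstFields hst_decode(uint32_t word) {
    HstFields f = { bits(word, 8, 0), bits(word, 24, 21) };
    return f;
}

/* Absolute addresses (assumption: forward offsets in 4-byte units). */
uint32_t hld_load_pc(uint32_t hint_pc, HldFields f)   { return hint_pc + 4u * f.ld_pc_offset; }
uint32_t hld_branch_pc(uint32_t hint_pc, HldFields f) { return hint_pc + 4u * f.br_pc_offset; }
```

For example, an HLD at 0x8000 with load offset 3 and branch offset 7 would point at a load at 0x800C and a branch at 0x801C under these assumptions.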
Hardware Support
To identify the ST-LD-BR sequence at run time, hardware provides three main tables: the store table, the load table, and the branch table, as shown in Figure 4.
Populating Load Table: On executing an HLD instruction, an entry is created in the load table if one does not already exist. The 'Tag' field in the load table is populated with the lower 12 bits of the load instruction address, which is computed using the 'ld pc offset' in the HLD instruction. The 'Br pc' field is populated with 12 bits of the branch instruction address, which is computed using the 'br pc offset' in the HLD instruction. The remaining fields in the load table belong to the load address predictor and are explained later.
Populating Store Table: Similar to the load table, on executing an HST instruction, an entry is created in the store table. The 'Tag' field is populated with the lower 12 bits of the store instruction address, which is computed using the 'st pc offset' in the HST instruction. The 'Br cond' field is populated with the 4-bit branch condition code specified in the HST instruction.
Populating Branch Table: The 'Tag' field in the branch table is populated with the lower 12 bits of the branch instruction address, which is computed using the 'br pc offset' in the HLD instruction.
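The table-population rules above can be sketched in C as follows. The direct-mapped organization, the 16-entry size, and the slot index (bits above the 4-byte instruction alignment) are assumptions; only the 12-bit tags, the 'Br pc' field, and the 4-bit condition code come from the text. Because tags are 12 bits, instructions 4 KB apart alias in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG12(pc)   ((uint16_t)((pc) & 0xFFFu))      /* 12-bit tag */
#define TABLE_SIZE  16u                              /* hypothetical */
#define IDX(pc)     ((TAG12(pc) >> 2) % TABLE_SIZE)  /* direct-mapped slot */

typedef struct { bool valid; uint16_t tag; uint16_t br_pc; } LoadEntry;
typedef struct { bool valid; uint16_t tag; uint8_t br_cond; } StoreEntry;
typedef struct { bool valid; uint16_t tag; } BranchEntry;

static LoadEntry   load_table[TABLE_SIZE];
static StoreEntry  store_table[TABLE_SIZE];
static BranchEntry branch_table[TABLE_SIZE];

/* HLD executed: load_pc and branch_pc were computed from the hint offsets. */
void on_hld(uint32_t load_pc, uint32_t branch_pc) {
    LoadEntry *l = &load_table[IDX(load_pc)];
    if (!(l->valid && l->tag == TAG12(load_pc))) {  /* create only if absent */
        l->valid = true;
        l->tag   = TAG12(load_pc);
        l->br_pc = TAG12(branch_pc);
    }
    BranchEntry *b = &branch_table[IDX(branch_pc)];
    b->valid = true;
    b->tag   = TAG12(branch_pc);
}

/* HST executed: store_pc from 'st pc offset', cond from bits 24:21. */
void on_hst(uint32_t store_pc, uint8_t cond) {
    assert(cond < 16);
    StoreEntry *s = &store_table[IDX(store_pc)];
    s->valid   = true;
    s->tag     = TAG12(store_pc);
    s->br_cond = cond;
}

/* Fetch time: does this branch use SLB (otherwise the default predicts)? */
bool is_slb_branch(uint32_t pc) {
    const BranchEntry *b = &branch_table[IDX(pc)];
    return b->valid && b->tag == TAG12(pc);
}

/* Execute time: is this store marked, and with which condition code? */
bool is_marked_store(uint32_t pc, uint8_t *cond) {
    const StoreEntry *s = &store_table[IDX(pc)];
    if (!(s->valid && s->tag == TAG12(pc))) return false;
    *cond = s->br_cond;
    return true;
}
```

The is_slb_branch check is also the natural place for the TAGE/SLB choice: a branch-table hit selects the SLB prediction.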
Once ST-LD-BR sequence is populated in the tables, computing and consuming branch flags can begin.
Computing Branch Flags at Store Time: During code generation, the compiler inserts a compare instruction before the store instruction to evaluate the store value against the branch condition. This compare instruction sets the flag register. When the store instruction executes and matches a store table tag, the branch flag is computed from the flag register value and the corresponding branch condition code in the store table. The computed branch flag is written to the T/NT prediction table at the store address. For more information on condition codes, the flag register, and compare/branch instructions, see the ARMv7-A architecture reference manual [2].
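The store-time flag computation can be modeled in C by emulating the N, Z, C, V flags set by an ARM-style CMP and evaluating the 4-bit ARMv7 condition code held in the store table. The condition-code semantics follow the ARMv7 architecture; treating code 0xF as "always" and the helper names are simplifications of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Nzcv;

/* Flags set by an ARM-style CMP a, b (computes a - b). */
Nzcv cmp_flags(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, r = ua - ub;
    Nzcv f;
    f.n = (r >> 31) & 1u;
    f.z = (r == 0);
    f.c = (ua >= ub);                           /* carry = NOT borrow */
    f.v = (((ua ^ ub) & (ua ^ r)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* ARMv7 condition codes (the 4-bit 'Br cond' field). */
bool cond_holds(uint8_t cond, Nzcv f) {
    switch (cond) {
    case 0x0: return f.z;                 /* EQ */
    case 0x1: return !f.z;                /* NE */
    case 0x2: return f.c;                 /* CS/HS */
    case 0x3: return !f.c;                /* CC/LO */
    case 0x4: return f.n;                 /* MI */
    case 0x5: return !f.n;                /* PL */
    case 0x6: return f.v;                 /* VS */
    case 0x7: return !f.v;                /* VC */
    case 0x8: return f.c && !f.z;         /* HI */
    case 0x9: return !f.c || f.z;         /* LS */
    case 0xA: return f.n == f.v;          /* GE */
    case 0xB: return f.n != f.v;          /* LT */
    case 0xC: return !f.z && f.n == f.v;  /* GT */
    case 0xD: return f.z || f.n != f.v;   /* LE */
    default:  return true;                /* AL; 0xF also treated as always here */
    }
}

/* Store time: compare the value being stored with the branch operand,
 * then evaluate the condition code from the store table entry. */
bool branch_flag_at_store(int32_t store_value, int32_t cmp_operand, uint8_t cond) {
    assert(cond < 16);
    return cond_holds(cond, cmp_flags(store_value, cmp_operand));
}
```

For the cjpeg branch 'if (temp < 0)', the stored coefficient would be compared with 0 under code 0xB (LT), and the resulting flag written to the T/NT prediction table at the store address.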
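Consuming Branch Flags at Fetch Time: During data structure traversal, when a data-dependent branch matches a branch table tag, the T/NT prediction table is read using the predicted load address, and the flag read from it is used as the branch prediction. The predicted load address is generated using the load instruction, which appears earlier in program order than the branch instruction; therefore, load address prediction is not on the critical path of making the branch prediction.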
Generating Predicted Load Address: Since the branch prediction is made early in the pipeline, and data structure traversal often occurs in a tight loop, the 'predicted load address' is used to access the T/NT prediction table. Figure 4 (dotted box) shows the load address predictor, adapted from Bekerman's load address predictor [4]. Address prediction was originally proposed for reducing load instruction latency. In this paper, we propose an alternate use of load address predictors: using predicted load addresses to predict data-dependent branches. For regular data structures (e.g., arrays) that are traversed linearly, a stride-based load address predictor is sufficient [3] [8]. Bekerman et al. proposed an advanced load address predictor for recursive data structures (e.g., linked lists and trees) [4]. It uses a two-level scheme for predicting the next load address. The first level is a per-static-load table, the load table, where each entry records the history of recent addresses seen by the associated load. The history is then used to index a second-level table, the link table, which provides the predicted address. Typically, recursive data structure addresses can be accurately predicted by keeping the last two addresses in the history. See [4] for more details on the load address predictor.
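The two-level idea can be sketched in C as below: a per-load history of the last two addresses is folded into an index into a link table that remembers which address followed that history. This is a simplified model in the spirit of [4], not Bekerman's exact design; the XOR-fold, the 256-entry link table, and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_ENTRIES 256u   /* hypothetical link-table size */

typedef struct {            /* first level: per-static-load entry */
    uintptr_t hist[2];      /* hist[0] = most recent address, hist[1] = previous */
} LoadHistory;

typedef struct {            /* second level: history -> next address */
    bool      valid;
    uintptr_t next_addr;
} LinkEntry;

static LinkEntry link_table[LINK_ENTRIES];

/* XOR-fold of the two most recent addresses (assumed hash). */
static unsigned link_index(const LoadHistory *h) {
    uintptr_t x = (h->hist[0] >> 3) ^ (h->hist[1] >> 5);
    return (unsigned)(x % LINK_ENTRIES);
}

/* Predict the next address of this load; false if nothing learned yet. */
bool lap_predict(const LoadHistory *h, uintptr_t *pred) {
    assert(h != NULL && pred != NULL);
    const LinkEntry *e = &link_table[link_index(h)];
    if (!e->valid) return false;
    *pred = e->next_addr;
    return true;
}

/* Train: remember that this history was followed by `actual`, then shift. */
void lap_update(LoadHistory *h, uintptr_t actual) {
    LinkEntry *e = &link_table[link_index(h)];
    e->valid = true;
    e->next_addr = actual;
    h->hist[1] = h->hist[0];
    h->hist[0] = actual;
}
```

Once a linked list has been walked a couple of times, each node address is predicted from the two nodes before it, even though the addresses themselves have no stride.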
Support for Indirect Branches
As opposed to a direct branch, an indirect branch can have multiple targets. Therefore, predicting an indirect branch requires predicting the branch target address rather than the branch direction. Most indirect branch prediction schemes maintain a history of recent targets taken by an indirect branch and use this history to index into a 'target cache' for predicting the next target address [6] [9] [10] [17]. As with direct branches, data-dependent indirect branches typically do not follow a repeatable history.
The SLB indirect branch prediction scheme uses a traditional BTB to store multiple targets of an indirect branch at different BTB indices. Figure 5 shows the block diagram for SLB indirect branch target address prediction (only the necessary changes from Figure 4 are shown).
Populating Store Data Array: When a store instruction executes and matches a store table tag, the lower 12 bits of the store data value are buffered in the store data array at the store address.
Predicting Branch Target Address: When an indirect branch is fetched, it reads the stored data value from the store data array using the 'predicted load address', hashes it with the branch pc, and indexes into the branch target buffer (BTB) to predict the branch target address. Different store data values correspond to different targets of an indirect branch, which are stored and subsequently accessed at different BTB indices.
Updating the BTB: If an indirect branch mispredicts, either because the target address is seen for the first time or because it was replaced by another branch target, the BTB is updated. The same index computed at BTB lookup time is used for updating the BTB.
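The three steps above (buffer store data, hash it with the branch pc at fetch, update the BTB at the same index on a misprediction) are sketched below in C. The table sizes, the plain XOR hash, and the function names are assumptions, and the sketch passes in the exact load address where hardware would use the predicted one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SDA_ENTRIES 1024u   /* hypothetical store data array size */
#define BTB_ENTRIES 512u    /* hypothetical BTB size */

static uint16_t store_data[SDA_ENTRIES];  /* lower 12 bits of store data */
static uint32_t btb_target[BTB_ENTRIES];
static bool     btb_valid[BTB_ENTRIES];

static unsigned sda_index(uint32_t addr) { return (addr >> 2) % SDA_ENTRIES; }

/* Marked store executes: buffer the lower 12 bits of its data at its address. */
void sda_on_store(uint32_t addr, uint32_t value) {
    store_data[sda_index(addr)] = (uint16_t)(value & 0xFFFu);
}

/* Read buffered data via the load address and hash with the branch pc
 * (plain XOR here; the actual hash is not specified). */
static unsigned slb_btb_index(uint32_t br_pc, uint32_t predicted_load_addr) {
    uint32_t d = store_data[sda_index(predicted_load_addr)];
    return ((br_pc >> 2) ^ d) % BTB_ENTRIES;
}

/* Fetch time: returns the BTB index used, and the target if one is stored. */
bool slb_indirect_predict(uint32_t br_pc, uint32_t predicted_load_addr,
                          unsigned *idx, uint32_t *target) {
    assert(idx != NULL && target != NULL);
    *idx = slb_btb_index(br_pc, predicted_load_addr);
    if (!btb_valid[*idx]) return false;
    *target = btb_target[*idx];
    return true;
}

/* On a misprediction, update the BTB at the index computed at lookup time. */
void slb_indirect_update(unsigned idx, uint32_t actual_target) {
    btb_valid[idx]  = true;
    btb_target[idx] = actual_target;
}
```

In the Eon example, the buffered value would stand in for the object's type (for instance, low bits of its vtable pointer), so each object type maps to its own BTB entry regardless of the order of objects in the grid.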
Choosing Between TAGE and SLB Predictor
In a combined TAGE+SLB predictor configuration, not every branch is predicted using the SLB predictor. If a branch address matches a branch table tag in Figure 4, the prediction is taken from the SLB predictor; otherwise, the prediction is taken from the default TAGE predictor.
Implementation Cost
The load, store, and branch tables shown in Figure 4 are all per-static-instruction tables, while the link table and the T/NT prediction table hold dynamic values. We sized the various structures in the SLB predictor based on the number of SLB branches observed in the benchmarks. Table 4 shows the number of SLB branches identified in each benchmark, along with their associated load and store instructions.
Experimental Framework
Simulation Methodology
Results presented in this paper are collected from an ARM performance simulator running benchmarks from the EEMBC and SPECint2000 suites [12] [34]. In addition, the top mispredicting benchmark from the SPECint2006 suite, 473.astar, is included in the experiment. Benchmarks are compiled with the ARM RealView compilation tool (RVCT 4.1) [28] with the -O3 optimization flag. Our detailed cycle-accurate simulator models a superscalar out-of-order processor core with a 4-wide, 15-stage integer pipeline, 32KB L1 I and D caches, and a 1MB L2 cache. Table 3 shows simulation parameters for the front-end pipeline. We used the Bi-Mode branch predictor [21] as our baseline. We then compared the baseline with 1) the standalone TAGE predictor [32], and 2) the combined TAGE+SLB predictor. We obtained the TAGE predictor code from [31] and integrated it with our timing simulator.
Footnote 2: We assume 32 bits for data and address, and 12 bits for tag.
Identifying Relevant Load/Store Instructions
The SLB predictor relies on the compiler to identify the program points where the data structure associated with a data-dependent branch is referenced and modified. For the experiments in this paper, we wrote a binary profiler to identify the load/store instructions associated with a data-dependent branch. Listing 5 shows pseudo code for our binary profiler.
Listing 5 (excerpt)
    ...
    if (i is a hard-to-predict branch)
        mark load instruction(s) feeding into
        the branch (through register ID matching)
    endif
    if (i is marked load instruction)
        foreach (store_array.read_store_address()
                 == load_address) {
            mark the store instruction
        }
    endif
}
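The marking logic of the profiler excerpt can be approximated by the following runnable C sketch over a toy instruction trace. It is not the authors' profiler: the trace format, the hard flag (assumed to come from an earlier misprediction profile), the fixed-size sets, and register tracking limited to "the last load that wrote this register" are all assumptions. Because a load is only marked after its branch has been seen, stores are matched on later dynamic instances of the load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_LOAD, OP_STORE, OP_BRANCH, OP_OTHER } Op;

/* One dynamic instruction of a toy trace (format is an assumption). */
typedef struct {
    Op       op;
    uint32_t pc;
    int      reg;   /* LOAD/OTHER: destination reg; BRANCH: source reg */
    uint32_t addr;  /* LOAD/STORE: effective address */
    bool     hard;  /* BRANCH: hard-to-predict according to a prior profile */
} Insn;

#define MAX_MARKS  64
#define MAX_STORES 256
#define NUM_REGS   16

typedef struct {
    uint32_t loads[MAX_MARKS];  size_t n_loads;   /* marked load pcs */
    uint32_t stores[MAX_MARKS]; size_t n_stores;  /* marked store pcs */
} Marks;

static bool contains(const uint32_t *set, size_t n, uint32_t v) {
    for (size_t k = 0; k < n; k++)
        if (set[k] == v) return true;
    return false;
}

static void mark(uint32_t *set, size_t *n, uint32_t v) {
    if (!contains(set, *n, v) && *n < MAX_MARKS) set[(*n)++] = v;
}

void profile(const Insn *trace, size_t n, Marks *m) {
    uint32_t writer_pc[NUM_REGS] = { 0 };     /* reg -> pc of last load writing it */
    bool     from_load[NUM_REGS] = { false };
    uint32_t st_addr[MAX_STORES], st_pc[MAX_STORES];  /* store_array */
    size_t   n_st = 0;

    m->n_loads = m->n_stores = 0;
    for (size_t k = 0; k < n; k++) {
        const Insn *i = &trace[k];
        assert(i->reg >= 0 && i->reg < NUM_REGS);
        switch (i->op) {
        case OP_STORE: {                      /* record unique (address, pc) */
            bool seen = false;
            for (size_t s = 0; s < n_st; s++)
                if (st_addr[s] == i->addr && st_pc[s] == i->pc) seen = true;
            if (!seen && n_st < MAX_STORES) {
                st_addr[n_st] = i->addr;
                st_pc[n_st++] = i->pc;
            }
            break;
        }
        case OP_LOAD:
            writer_pc[i->reg] = i->pc;
            from_load[i->reg] = true;
            if (contains(m->loads, m->n_loads, i->pc))      /* marked load */
                for (size_t s = 0; s < n_st; s++)
                    if (st_addr[s] == i->addr)
                        mark(m->stores, &m->n_stores, st_pc[s]);
            break;
        case OP_BRANCH:                       /* register ID matching */
            if (i->hard && from_load[i->reg])
                mark(m->loads, &m->n_loads, writer_pc[i->reg]);
            break;
        case OP_OTHER:                        /* non-load overwrites register */
            from_load[i->reg] = false;
            break;
        }
    }
}
```

On a trace shaped like Listing 1 (a store loop followed by a load-branch traversal), the traversal load and the key-writing store end up marked, while unrelated loads do not.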
Results and Analysis
Store to Branch Delay
The SLB predictor uses store data to compute branch flags; therefore, the store data must be available before the branch instruction is fetched. Table 4 shows the cycle count between the store data becoming available and the branch flags computed from that data being consumed. The following two program characteristics explain the high cycle count between store-branch instruction pairs. First, data structure updates and traversals typically happen in different program phases. Second, even when update and traversal are adjacent to each other, the dynamic instructions created by the update loop generate the necessary cycles between the first store operation of the update loop and the first load-branch operation of the traversal loop.
Figure 6. MPKI (EEMBC) (mispredictions per 1K instructions; BiMode, TAGE, TAGE+SLB)
Table 4 shows that only a handful of static branch instructions are predicted using the SLB predictor. Yet these hard-to-predict, data-dependent branches are top contributors to overall mispredictions (see Tables 7 and 8 for the top mispredicting branches in each benchmark). Figures 6 and 8 compare the MPKI of the baseline Bi-Mode predictor with 1) the standalone TAGE predictor, and 2) the combined TAGE+SLB predictor. As shown in these figures, MPKI remains high for several benchmarks even with the state-of-the-art TAGE predictor. On average, the combined TAGE+SLB predictor reduces MPKI for the EEMBC suite from 4.48 to 2.21, a reduction of 51%. Similarly, for the SPECint suite, MPKI is reduced from 9.56 to 7.50, a reduction of 21%. Tables 7 and 8 show up to 10 top mispredicting branches in the benchmarks we studied.
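The quoted percentages follow directly from the average MPKI values, taking the standalone TAGE predictor as the reference (consistent with the comparison stated in the introduction):

```latex
\begin{aligned}
\text{MPKI reduction} &= \frac{\mathrm{MPKI}_{\mathrm{TAGE}} - \mathrm{MPKI}_{\mathrm{TAGE+SLB}}}{\mathrm{MPKI}_{\mathrm{TAGE}}} \\
\text{EEMBC:}\quad \frac{4.48 - 2.21}{4.48} &= \frac{2.27}{4.48} \approx 0.507 \;(\approx 51\%) \\
\text{SPECint:}\quad \frac{9.56 - 7.50}{9.56} &= \frac{2.06}{9.56} \approx 0.215 \;(\approx 21\%)
\end{aligned}
```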
In most cases, except coremark and text01, the SLB predictor accurately predicts the top mispredicting branches of the Bi-Mode and TAGE predictors. Branches not handled by the SLB predictor are marked as 'x'. In coremark, the top mispredicting branch is an indirect branch resulting from a switch-case statement inside a for loop. In some case statements, the control variable on which the indirect branch depends is also updated. This update is not available before the next iteration of the for loop. As a result, the SLB predictor reads a stale value from the store data array and therefore mispredicts the branch target address.
Branch MPKI
There are several benchmarks where the combined TAGE+SLB predictor has the same MPKI as the standalone TAGE predictor. This is because we excluded these benchmarks from SLB analysis for one of the following three reasons: 1) the benchmark MPKI is already low (e.g., EEMBC benchmarks on the left), 2) the mispredicting branches do not follow the ST-LD-BR execution sequence (e.g., EEMBC/dither01, SPECint/164.gzip, SPECint/256.bzip2), and 3) identifying the ST-LD-BR sequence without compiler analysis is hard (e.g., SPECint/176.gcc, SPECint/197.parser). Figures 7 and 9 compare the performance improvement of the standalone TAGE predictor, the combined TAGE+SLB predictor, and a perfect predictor. For the EEMBC suite, the combined TAGE+SLB predictor doubled the speedup of the standalone TAGE predictor over the baseline Bi-Mode predictor. A perfect predictor shows a potential for another 7% performance improvement over the baseline. Similarly, for the SPECint suite, the combined TAGE+SLB predictor shows a performance improvement of 11% over the baseline Bi-Mode predictor. This is almost double the speedup of the standalone TAGE predictor over the same baseline. A perfect predictor shows a potential for 29% performance improvement.
Table 4. Number of SLB-enabled branches, and associated load/store instructions
SLB Performance Impact
SLB Area and Power Analysis
The area and power overhead of the SLB scheme was estimated using McPAT [22], a framework for modeling processor area and power. Parameters of a generic McPAT RISC core (Alpha 21364) were adjusted to match the simulated ARM core. TAGE and SLB predictor components were added to estimate their area and power overhead. Table 5 shows the area and power overhead of the SLB predictor for a 32 nm process technology node. Figure 10 shows critical timing paths in the fetch unit. The first two paths are through the TAGE and SLB branch direction predictors, while the third path is through the BTB for branch target address computation. Table 6 shows the access time for various structures in the fetch unit for a 32 nm process technology, obtained using CACTI [19].
Figure 10. Comparing critical timing paths in fetch unit
SLB Timing Analysis
• Access Time for BTB (ns) = BTB delay = 0.458
Critical Timing Path in Fetch Unit
Computing the next instruction address after a branch instruction involves a) determining whether the branch is taken or not taken, i.e., branch direction prediction, and b) if the branch is indeed taken, computing the branch target address. Therefore, the critical path of the fetch unit is the maximum of the branch direction prediction delay and the branch target address prediction delay, as shown below:
• Critical Timing Path in Fetch Unit (ns) = Max{Br. dir. pred. delay, Br. target pred. delay} = Max{Max{TAGE, SLB}+mux delay, BTB delay} = Max{Max{0.384, 0.39} + 0.041, 0.458} = Max{0.431, 0.458} = 0.458
Related Work
Branch prediction research can be categorized into three classes: static branch prediction, dynamic branch prediction, and compiler-assisted dynamic branch prediction.
Static Branch Predictors: In static branch prediction, the branch direction is predicted before the program is executed, and the same prediction is used for all dynamic instances of that branch. Profiling is used in [13]. Simple heuristics, such as predicting all backward branches taken and all forward branches not-taken, were proposed in [33]. Machine learning was used in [5] to infer the branch behavior of a new program from existing programs.
Dynamic Branch Predictors: Dynamic branch prediction techniques propose hardware that attempts to learn branch behavior at run time. While most dynamic schemes use branch taken/not-taken history information to train the branch predictor, a few schemes have explored adding other information to the prediction process.
(i) Short History Predictors: James E. Smith first presented the bimodal branch prediction scheme [33]: repeatedly taken branches are predicted taken, and repeatedly not-taken branches are predicted not-taken. A two-level adaptive branch prediction scheme was presented in [37]. It is based on the observation that a branch can have multiple repetitive patterns. The two-level scheme differentiates among these patterns by recording the directions taken by the last m instances of each branch and using this record to index into a table of k-bit counters. A combining branch predictor, which combines and takes advantage of different predictor types, was proposed in [24].
(ii) Anti-aliasing Predictors: Several schemes proposed different indexing mechanisms for reducing the branch aliasing effect. The gselect predictor concatenates branch history and branch address and uses the result to index into the counter table [27]. The gshare predictor uses the exclusive OR of the branch address and the branch history to index into the counter table [24]. The gskewed predictor uses multiple counter tables indexed by different hash functions [26]. The Bi-Mode [21] and YAGS [11] predictors partition the counter table into two halves: taken and not-taken. This reduces negative interference by keeping branches biased towards the 'taken' direction in the taken array, and those biased towards the 'not-taken' direction in the not-taken array. The Agree predictor [35] updates the prediction counter based on whether or not the branch bias matches the branch outcome, irrespective of branch direction.
(iii) Long History Predictors: Recent research has shown that prediction accuracy can be further improved by using a longer branch history, and by using different history lengths for predicting different branches. Using perceptrons instead of two-bit saturating counters was proposed in [16]. It is based on the observation that not all branches in the history are important.
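As a concrete reference point for the anti-aliasing schemes above, the following C sketch shows gshare-style indexing: the branch address XORed with a global history register selects a 2-bit saturating counter. The 12-bit table and history length and the pc >> 2 alignment shift are assumptions of this sketch; it also illustrates why such predictors learn repeatable patterns (e.g., alternation) but not data-dependent outcomes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GSHARE_BITS 12u                   /* hypothetical index/history length */
#define GSHARE_SIZE (1u << GSHARE_BITS)

typedef struct {
    uint8_t  ctr[GSHARE_SIZE];  /* 2-bit saturating counters */
    uint32_t ghr;               /* global history, newest outcome in bit 0 */
} Gshare;

/* Index = branch address XOR global history (gshare). */
static unsigned gshare_index(const Gshare *g, uint32_t pc) {
    return ((pc >> 2) ^ g->ghr) & (GSHARE_SIZE - 1u);
}

bool gshare_predict(const Gshare *g, uint32_t pc) {
    return g->ctr[gshare_index(g, pc)] >= 2;
}

/* Train the selected counter, then shift the outcome into the history. */
void gshare_update(Gshare *g, uint32_t pc, bool taken) {
    assert(g != NULL);
    uint8_t *c = &g->ctr[gshare_index(g, pc)];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    g->ghr = ((g->ghr << 1) | (taken ? 1u : 0u)) & (GSHARE_SIZE - 1u);
}
```

A strictly alternating branch is learned after a few dozen outcomes because each history value maps to its own counter; a branch whose outcome depends on unpatterned data offers no such history to exploit.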
O-GEHL [29], PPM-like [25] and TAGE [32] use multiple predictor tables, each indexed by an increasing length of branch history. In O-GEHL, the final prediction is computed by summing the predictions read from each predictor table.
The Available Register Value Information (ARVI) predictor [7] hashes register values with the branch pc to index into a prediction table. Branch prediction through value prediction was proposed in [15]. The Address-Branch Correlation (ABC) predictor [14] is based on the observation that values inside a data structure tend to be stable; therefore, the branch outcome can be correlated simply with the address of the data structure instead of the value inside it. We argue that if values inside a data structure are stable, multiple iterations over the data structure should generate the same branch outcome every time, and should therefore be predictable with a history-based predictor. Correlating load address with branch outcome is also used in [1]; however, unlike [14], they update the predictor when there are stores to those addresses.
Compiler-Assisted Dynamic Branch Predictors: Compiler-assisted dynamic branch prediction combines the strengths of static and dynamic approaches: the low overhead of compile-time analysis and the effectiveness of dynamic prediction. Wish branches were proposed in [18]. They combine the strengths of conditional branches and predication. The compiler generates code for predicated execution but leaves conditional branches intact. At run time, if a branch turns out to be easy to predict, branch prediction is used; otherwise, the predicated code is executed. In [23], the compiler defines a prediction function for each branch and inserts instructions for computing the prediction function.
Conclusion and Future Work
Data-dependent branches are the single biggest source of remaining branch mispredictions. These branches are commonly associated with program data structures such as arrays, linked lists, and trees, and follow a store-load-branch execution sequence. A set of memory locations is written while building and updating the data structure. During data structure traversal, these locations are read and used to evaluate the branch condition. The branch outcome depends on the data values stored in the data structure, which typically do not have a repeatable pattern. Therefore, in addition to history-based predictors, we need a different kind of predictor for data-dependent branches.
Taking advantage of the store-load-branch execution sequence, we propose a compiler-assisted Store-Load-Branch (SLB) predictor. For every data-dependent branch, the compiler identifies all store instructions that modify the data structure associated with the branch. These store instructions are dynamically tracked, and the stored values are used to compute branch flags ahead of time. These branch flags are temporarily buffered in a hardware structure and later used for making predictions. Section 5.2 shows that, compared to the standalone TAGE predictor, a hybrid TAGE+SLB predictor reduces branch MPKI by 21% for the SPECint benchmark suite and by 51% for the EEMBC benchmark suite. The estimated power and area overheads of the SLB predictor are 0.28% and 0.5%, respectively (Section 5.4), with no timing overhead (Section 5.5).
Finally, we are working on a compiler implementation for identifying relevant load/store instructions using LLVM's Data Structure Analysis (DSA) [20]. In contrast to other alias analyses that operate on individual memory references, DSA operates at the level of entire data structure instances and provides context-sensitive mod/ref analysis.
