Increasing speed and size demands on computer systems have resulted in corresponding demands on storage systems. Since it has been generally recognized that the speed and capacity requirements of storage systems cannot be fulfilled at an acceptable cost-performance level within any single technology, storage hierarchies that use a variety of technologies have been investigated.
Several previous papers describe the general concepts of hierarchy design and evaluation, whereas others deal with specific hierarchy systems, such as the core-drum combination on the ICT Atlas computer and the cache-core combination on the IBM System/360, Model 85 (References 10 and 11). This paper introduces an efficient technique called "stack processing" that can be used in the cost-performance evaluation of a large class of storage hierarchies. The technique depends on a classification of page replacement algorithms as "stack algorithms" for which various properties are derived. These properties may be of use in the general areas of program modeling and system analysis, as well as in the evaluation of storage hierarchies. For a better understanding of storage hierarchies, we briefly review some basic concepts of their design.
The purpose of a storage system is to hold information and to associate the information with a logical address space known to the remainder of the computer system. For example, the Central Processing Unit (CPU) may present a logical address to the storage system with instructions to either retrieve or modify the information associated with that address. If the storage system consists of a single device, then the logical address space corresponds directly to the physical address space of the device. Alternatively, a storage system with the same address space can be realized by a hierarchy of storage devices ranging from fast but expensive to slower but relatively inexpensive devices. In such storage hierarchies, the logical address space is often partitioned into equal-size pages (or unequal-size segments) that represent the blocks of information being moved between devices in the hierarchy.
A hierarchy management facility is included to control the movement of pages and to effect the (generally dynamic) association between the logical address space and the physical address space of the hierarchy. When the CPU references a logical address, the hierarchy management facility first determines the physical location of the corresponding logical page and may then move the page to a fast storage device where the reference is effected. Since these actions are "transparent" to the remainder of the computer system (except for timing), the logical operation of the hierarchy is indistinguishable from that of a single-device system.
The goal of the hierarchy management facility is to maximize the number of times logical information is in the faster devices when being referenced. As this goal is approached, most references are directed to the fast, small stores whereas most of the logical address space is distributed over the slower, large stores.
The storage system then acquires the approximate speed of the fast stores while maintaining the approximate cost-per-bit of the slower and less expensive stores. This increase in cost-performance is the primary justification for storage hierarchies.
Clearly, many factors can affect the cost-performance of a storage hierarchy. On the performance side, one must consider the capacity and characteristics of each storage device, the physical structure of the hierarchy, the way in which information is moved by the hierarchy management facility, and the expected pattern of storage references. On the cost side, the hardware and/or software required to find and move logical information must be considered, as well as the cost-per-bit and capacity of each device. Because of these factors, it is quite difficult to design an "optimal" hierarchy.
The typical approach to hierarchy evaluation employed by computer designers has been to simulate as many hierarchy systems as possible, at various levels of detail. During the first stages of design, a large number of relatively simple simulations may be run with fixed, standard address traces. These traces are assumed to be "typical" sequences of storage references obtained from existing computer systems, and they are used to approximate the reference behavior of future systems. The purpose of these simulations is to measure such statistics as data flow and frequency of access to each device in order to estimate the overall performance of an actual system. The resulting performance estimates can then be used to narrow the field of possible designs, which then receive more detailed examination.
Alternatively, one may try to develop analytical techniques that avoid point-by-point simulation but still yield accurate statistics for data flow and access frequencies. Several papers deal with such techniques for hierarchy evaluation. In general, the approach here is to run a relatively small number of simulations and extrapolate the measured statistics to a larger class of hierarchies. The difficulty with this approach is the need for various assumptions about the statistical properties of address traces and data flows required to formulate the analytical equations. Moreover, it is difficult to include a quantitative dependence on such factors as data path structure, page replacement algorithm, and address mapping scheme, so that many simulations may still be necessary.
This paper presents a technique that can be used to circumvent much of the simulation effort required in hierarchy evaluation. Specifically, we present an efficient procedure that determines, for a given address trace, the exact frequency of access to each level of a hierarchy as a function of page size, replacement algorithm, number of levels, and capacity at each level. In the following, we consider a class of multilevel, demand-paging hierarchies (Reference 14) with the same replacement algorithm at every level. The procedures developed here are applicable to a large class of well-known replacement algorithms having certain inclusion properties defined later. These algorithms, which we call stack algorithms, include "least frequently used," "least recently used," "optimal," and a "random" replacement algorithm.
The system model
An $H$-level paged storage hierarchy consists of a collection of storage devices $M_1, M_2, \ldots, M_H$, a network of data paths connecting the devices, and a hierarchy management facility. Each device is partitioned into physical blocks called page frames. For convenience, the highest-level store $M_1$ is called the local store and the lowest-level store $M_H$ is the backing store, as shown in Figure 1. The hierarchy management facility controls page movement between the devices and associates each logical page with a physical page frame. Special storage and processing hardware may be required, but they are not included in our model.
References to the storage hierarchy are presented by a single device called the generator, and they are serviced sequentially in the order in which they are presented. References from the generator may represent the requests of several devices, such as the CPU and the channel, in an actual system. The time sequence of logical-address references $X = x_1, x_2, \ldots, x_L$ is called an address trace, where each address consists of $n$ bits as shown in Figure 2. The set of $2^n$ possible addresses is partitioned into $2^k$ pages of $2^{n-k}$ logical addresses each. The high-order $k$ bits of each address represent the number of the page containing the address, and the low-order $n - k$ bits represent the location or displacement of the address within the page. Since information movement in the hierarchy is accomplished by transferring pages between levels, we can analyze space allocation and data movement for a trace $X$ by considering a corresponding page trace $X^k = x_1^k, x_2^k, \ldots, x_L^k$, where each $x_t^k$ is the number of the page containing address $x_t$.
When we consider a given fixed page size, we omit the superscript $k$ and denote pages by $x_t$.
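The split of an address into page number and displacement can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `page_trace` and `displacement` and the sample 8-bit addresses are introduced here and do not come from the model itself.

```python
def page_trace(addresses, n, k):
    """Map n-bit logical addresses to page numbers (the high-order k bits)."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    shift = n - k  # number of low-order displacement bits
    return [address >> shift for address in addresses]


def displacement(address, n, k):
    """Return the low-order n - k bits, the location within the page."""
    return address & ((1 << (n - k)) - 1)


# Hypothetical 8-bit addresses (n = 8) with 2**3 pages of 2**5 addresses each.
example_addresses = [0b00000001, 0b00100010, 0b00111111, 0b11100000]
example_pages = page_trace(example_addresses, n=8, k=3)
```

With $n = 8$ and $k = 3$, the four sample addresses fall in pages 0, 1, 1, and 7; rerunning `page_trace` with a different `k` gives the page trace for another page size from the same address trace.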
A reference from the generator can be serviced only from the local store $M_1$. Thus if the desired page resides in a lower-level device $M_i$, where $i > 1$, the hierarchy management facility must bring that page up to $M_1$ for servicing. The hierarchy provides a path for bringing pages up to $M_1$, which may or may not require staging through intermediate levels. Any temporary storage required for bringing a page up to $M_1$ is included in the hierarchy management hardware, and is therefore not represented in our model. In this paper we restrict our attention to linear storage hierarchies in which the only paths for moving pages down the hierarchy are direct ones from each level $M_i$ to level $M_{i+1}$, where $i = 1, 2, \ldots, H - 1$. The reasons for this restriction are discussed later in this paper. Note that the four-level hierarchy in Figure 1 is a linear hierarchy.
The capacity of the backing store is assumed to be at least $2^k$ page frames, and all logical pages initially reside in the backing store. At any time, each logical page resides in exactly one page frame of the hierarchy. A mapping function is associated with each hierarchical level, and specifies for each logical page the page frames it may occupy in that level. The mapping function is further defined as:
Unconstrained if any page can occupy any page frame of the storage device.
Fully constrained if each page can occupy only a single page frame.
Partially constrained in all other cases.
In a later section, we define a technique called "congruence mapping" that generates a whole spectrum of mapping functions.
For simplicity in developing techniques for analyzing storage hierarchies, we first consider a two-level, demand-paged hierarchy with unconstrained mapping. Later, our results are extended to certain classes of multilevel linear hierarchies employing the three types of mapping functions. The local store or buffer has a capacity of $C$ pages, and is directly connected to the backing store as shown in Figure 3. At time $t$, the generator presents a request for page $x_t$ to the hierarchy. Under demand paging, if $x_t$ is in the buffer, the reference proceeds and no page movement occurs. Otherwise, $x_t$ is brought to the buffer from the backing store.
If the buffer is already full, $x_t$ replaces some page $y_t$ in the buffer. The selection of the particular page $y_t$ is performed by the buffer replacement algorithm. This operation is a key element of storage management.
In the two-level hierarchy shown in Figure 3, a reference to a page residing either at level $M_1$ or at $M_2$ is called an access to that level.
For a given hierarchy and page trace, we define the access frequencies $F_1$ and $F_2$, where $F_i$ is the relative number of accesses to level $M_i$ during the processing of the trace. Thus, if $N_1$ accesses are made to level $M_1$, and $N_2 = L - N_1$ accesses are made to level $M_2$, we obtain $F_1 = N_1/L$ and $F_2 = N_2/L$.
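As a point of reference for the techniques developed later, the access frequencies for one fixed capacity can be obtained by direct simulation. The sketch below assumes least-recently-used replacement purely for concreteness; the function name `access_frequencies_lru` and the five-reference trace are hypothetical.

```python
def access_frequencies_lru(trace, capacity):
    """Simulate a two-level demand-paged hierarchy for one buffer capacity.

    Returns (F1, F2): the fraction of references that find the page in the
    buffer and the fraction that must go to the backing store.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least one page")
    buffer = []  # most recently referenced page first
    n1 = 0       # accesses to the buffer (level M1)
    for page in trace:
        if page in buffer:
            n1 += 1
            buffer.remove(page)
        elif len(buffer) == capacity:
            buffer.pop()  # replace the least recently used page
        buffer.insert(0, page)
    length = len(trace)
    return n1 / length, (length - n1) / length


example_trace = "a b b c a".split()
f1, f2 = access_frequencies_lru(example_trace, capacity=2)
```

Each call covers a single capacity, so a full success function requires one such simulation per capacity; this is the cost that stack processing is meant to avoid.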
Some important measures of storage hierarchy performance can be obtained from these access frequencies. For example, one can combine access frequencies with a set of effective access times $\{T_i\}$ to obtain an effective (or average) hierarchy access time
$$T = \sum_{i=1}^{H} F_i T_i$$
In general, access times depend on the access paths, device access times, and characteristics of the hierarchy management facility. The access frequencies depend only on the page trace, capacity of the buffer, and replacement algorithm.
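A small numerical sketch of the effective access time follows. It assumes the usual weighted-sum form $T = \sum_i F_i T_i$; the function name and the sample access times (in arbitrary units) are hypothetical.

```python
def effective_access_time(frequencies, access_times):
    """Weighted average of per-level access times, T = sum of F_i * T_i."""
    if len(frequencies) != len(access_times):
        raise ValueError("one access time is needed per level")
    return sum(f * t for f, t in zip(frequencies, access_times))


# Hypothetical two-level hierarchy: half of the references hit the buffer.
example_time = effective_access_time([0.5, 0.5], [1.0, 9.0])
```

With these assumed values the effective access time is 5.0 units, halfway between the two device times because the access frequencies are equal.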
For a two-level hierarchy, accesses to the buffer are called successes; the relative frequency of successes as a function of capacity is given by the success function $F(C)$. For a given capacity $C$, page trace $X = x_1, x_2, \ldots, x_L$, replacement algorithm, and arbitrary time $t$ (where $1 \le t \le L$), the set of pages in the buffer just after the completed reference to $x_t$ is denoted by $B_t(C)$. The initial buffer contents are represented by $B_0(C)$. By convention, $B_0(C) = \emptyset$ for all $C$, where $\emptyset$ is the empty set. The set of distinct pages referenced in $x_1, x_2, \ldots, x_t$ is denoted by $\Gamma_t$, and the number of pages in $\Gamma_t$ is denoted by $\gamma_t$.
The inclusion property, $B_t(C) \subseteq B_t(C + 1)$, can be observed in Figure 4, for example at time $t = 5$.
Because of the inclusion property, the buffer contents at any time and for all capacities can be represented in the following compact and useful way. We order the set of pages $\Gamma_t$ into a list $S_t = s_t(1), s_t(2), \ldots, s_t(\gamma_t)$, called the stack at time $t$, such that the first $C$ entries of the list are the buffer contents $B_t(C)$.
The stack $S_0$ at time $t = 0$ has no entries and is therefore called a null stack. The entire sequence of least recently used (LRU) stacks corresponding to Figure 4 is included in Figure 5.
Besides representing the buffer contents for all capacities, the LRU stack can be used to efficiently determine the success function $F(C)$. Let us suppose that at time $t$, page $x_t$ has been previously referenced and thus is a member of at least one set $B_{t-1}(C)$, where $1 \le C \le \gamma_{t-1}$. By the inclusion property, there is a smallest such capacity, the critical capacity $C_t$, and $x_t$ occupies position $C_t$ on the stack $S_{t-1}$.
We call this page position the stack distance $\Delta_t$, since $\Delta_t$ is essentially the "distance" from the top of the stack to $x_t = s_{t-1}(\Delta_t)$. (Note that here $\Delta_t = C_t$. When constrained mapping functions are considered, the stack distance may not always equal the critical capacity.) If $x_t$ has not been previously referenced, then $\Delta_t$ is set to infinity. The sequence of stack distances for our example is included in Figure 5.
The significance of stack distances is that they lead directly to the success function. To see this, let $n(\Delta)$ be the number of times the stack distance $\Delta$ is observed in processing a trace. Since the stack distance equals the critical capacity, the number of times that the referenced page is found in a buffer of capacity $C$ is
$$\sum_{\Delta=1}^{C} n(\Delta) \tag{4}$$
and the success function is given by the expression
$$F(C) = \frac{1}{L} \sum_{\Delta=1}^{C} n(\Delta) \tag{5}$$
In practice, the set $\{n(\Delta)\}$ can be determined from a set of distance counters, as shown in Figure 5. All counters are set initially to zero, and the counter for each distance $\Delta$ is incremented whenever that distance occurs. For $k$-bit page numbers, we need at most $2^k + 1$ counters, corresponding to $1 \le \Delta \le 2^k$ and $\Delta = \infty$. At the conclusion of a page trace, the final values of the distance counters are the values $\{n(\Delta)\}$, and $F(C)$ is obtained from Equations 4 and 5.
We now calculate the value of the success function in a numerical example. For $\Delta$'s of 1, 2, 3, 4, and $\infty$, the corresponding final counter values in Figure 5 are 2, 1, 2, 1, and 4. This distribution is shown in Figure 6A. Dividing by $L = 10$, the trace length in Figure 5, and summing cumulatively, we obtain the success function shown in Figure 6B. One can verify that the $F(C)$ values for the curve in Figure 6B agree with those obtained in the simulations of Figure 4.
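The cumulative summation in this numerical example can be reproduced with a few lines of Python. The counter values are the ones quoted above for Figure 5; the function name `success_function` and the dictionary layout are assumptions made for this sketch.

```python
import math


def success_function(counters, trace_length, max_capacity):
    """Return [F(1), ..., F(max_capacity)] from stack-distance counters."""
    values = []
    hits = 0
    for capacity in range(1, max_capacity + 1):
        hits += counters.get(capacity, 0)  # references with distance <= capacity
        values.append(hits / trace_length)
    return values


# Final counter values for distances 1, 2, 3, 4 and infinity.
counters = {1: 2, 2: 1, 3: 2, 4: 1, math.inf: 4}
L = sum(counters.values())          # every reference has exactly one distance
F = success_function(counters, L, max_capacity=4)
```

The counters sum to $L = 10$, and the resulting values are $F(1) = 0.2$, $F(2) = 0.3$, $F(3) = 0.5$, and $F(4) = 0.6$. The counter for infinite distance never contributes, since a first reference to a page is a miss for every capacity.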
To find the access frequencies $F_1$ and $F_2$ for a given buffer capacity $C$, we take $F_1 = F(C)$ and $F_2 = 1 - F_1$. As an example, for $C = 3$ pages, the counter values above give $F_1 = F(3) = 0.5$ and $F_2 = 0.5$.
Let us suppose that page $x_t$ has been previously referenced and appears at position $\Delta$ on stack $S_{t-1}$. For time $t$, we know that $x_t$ must be the top entry in $S_t$, because it is the most recently referenced page. Consider now a page b at some position $j$ on $S_{t-1}$, where $1 \le j < \Delta$. At time $t - 1$, page b is the $j$th most recently referenced page, and the more recently referenced pages do not include $x_t$. At time $t$, page $x_t$ is added to this set so that page b must now be at position $j + 1$ on stack $S_t$. If $j$ is greater than $\Delta$, page b must remain at position $j$ at time $t$, since the set of more recently referenced pages is unchanged from time $t - 1$.
The net effect of this page motion is shown in Figure 7A. Page $x_t$ is moved to the top of the stack, pages previously above $x_t$ are down-shifted one position, and all other pages retain the same position. If $x_t$ were not previously referenced, $x_t$ would be placed on the top and all other pages would be down-shifted one position as shown in Figure 7B.
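The LRU stack update and the distance counters combine into a single pass over the page trace. The following is a minimal sketch of that pass, using a plain Python list as the stack (a linear search per reference, chosen for clarity rather than speed); the function names and the short trace are hypothetical.

```python
import math
from collections import Counter


def lru_stack_distances(trace):
    """Return the LRU stack distance of every reference in the trace."""
    stack = []       # stack[0] is the top (most recently referenced page)
    distances = []
    for page in trace:
        if page in stack:
            position = stack.index(page)      # 0-based position on the stack
            distances.append(position + 1)    # stack distance is 1-based
            del stack[position]               # pages above shift down by one
        else:
            distances.append(math.inf)        # page not previously referenced
        stack.insert(0, page)                 # referenced page moves to the top
    return distances


def lru_success_function(trace):
    """Return [F(1), ..., F(g)] where g is the number of distinct pages."""
    counts = Counter(lru_stack_distances(trace))
    hits, values = 0, []
    for capacity in range(1, len(set(trace)) + 1):
        hits += counts[capacity]
        values.append(hits / len(trace))
    return values


example_trace = "a b b c a".split()
example_distances = lru_stack_distances(example_trace)
example_F = lru_success_function(example_trace)
```

For the trace a b b c a the distances are $\infty, \infty, 1, \infty, 3$, so one pass yields $F(1) = F(2) = 0.2$ and $F(3) = 0.4$ without simulating each capacity separately.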
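Suppose that, at each time $t$, a replacement algorithm induces a priority list $P_t = p_t(1), p_t(2), \ldots, p_t(\gamma_{t-1})$ on the pages in $\Gamma_{t-1}$, where $p_t(i)$ has a higher priority than $p_t(i + 1)$ for $1 \le i < \gamma_{t-1}$. The algorithm then selects for replacement the page in $B_{t-1}(C)$ that has the lowest priority.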
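A convenient notation for working with priorities is $\min(A)$, where $A$ is an arbitrary set of pages in $\Gamma_{t-1}$, and $\min(A)$ is the unique page in $A$ having lowest priority on the list $P_t$. If $B_{t-1}(C) \subset B_{t-1}(C + 1)$ and $x_t \notin B_{t-1}(C + 1)$, we can express the replaced pages $y_t(C)$ and $y_t(C + 1)$ as follows:
$$y_t(C) = \min[B_{t-1}(C)] \tag{7}$$
and
\begin{align}
y_t(C + 1) &= \min[B_{t-1}(C + 1)] \tag{8} \\
&= \min[B_{t-1}(C) \cup (B_{t-1}(C + 1) - B_{t-1}(C))] \tag{9} \\
&= \min[\{\min[B_{t-1}(C)]\} \cup (B_{t-1}(C + 1) - B_{t-1}(C))] \tag{10} \\
&= \min[\{y_t(C)\} \cup (B_{t-1}(C + 1) - B_{t-1}(C))] \tag{11}
\end{align}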
Equations 7 through 9 are based on the definition of the replacement algorithm, whereas Equation 10 is based on the properties of minimization.
We conclude from Equation 11 that any replacement algorithm that induces a priority list $P_t$ for every time $t$ satisfies Equation 6 and is therefore a stack algorithm. For example, the priority list for LRU is just the ordering of pages in $\Gamma_{t-1}$ by most recent reference.
The priority list for "least frequently used" (LFU) replacement is the ordering of referenced pages by most frequent reference together with a scheme to break ties. 
Equations 12, 14, and 15 are based on the constraints of demand paging, whereas Equation 13 is derived from Equation 11.
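If $x_t$ has not been previously referenced, the defining equations for stack $S_t$ are the following:
$$s_t(1) = x_t \tag{16}$$
$$s_t(i) = \max[s_{t-1}(i),\, y_t(i - 1)] \quad \text{for } 1 < i \le \gamma_{t-1} \tag{17}$$
$$s_t(\gamma_t) = y_t(\gamma_{t-1}) \tag{18}$$
where $\max[a, b]$ denotes whichever of the pages $a$ and $b$ has the higher priority on $P_t$.
In this case, Equations 16 and 17 express the fact that replacements are required for all buffer capacities in the range $1 \le C \le \gamma_{t-1}$.
Equation 18 corresponds to the new page $x_t$ being added to the stack, with the result that a buffer of capacity $\gamma_t = \gamma_{t-1} + 1$ is now full. If $x_t$ is not found on stack $S_{t-1}$, as shown in Figure 8B, then $\Delta_t = \infty$, and we use Equation 18. In either case, the replacement algorithm does not have to be applied to all the pages for stack updating. Only a sequence of pairwise decisions between pages $s_{t-1}(i)$ and $y_t(i - 1)$ is required.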
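The sequence of pairwise decisions can be sketched as follows. This is an illustration built from the prose description only, not a transcription of the numbered equations: the function names are hypothetical, and the priority list is assumed to be supplied as a dictionary `rank` in which a smaller number means a higher priority.

```python
import math


def update_stack(stack, page, rank):
    """Return (new_stack, distance) after a reference to `page`.

    stack : pages on the stack at time t-1, top of stack first
    rank  : position of each stack page on the priority list
            (smaller number = higher priority); an assumed representation
    """
    distance = stack.index(page) + 1 if page in stack else math.inf
    if not stack:
        return [page], distance
    if distance == 1:
        return list(stack), distance       # top page referenced again
    new_stack = [page]                     # referenced page goes on top
    pushed = stack[0]                      # page leaving a buffer of capacity 1
    last = len(stack) if distance == math.inf else distance - 1
    for i in range(2, last + 1):           # 1-based stack positions
        resident = stack[i - 1]
        if rank[resident] < rank[pushed]:
            new_stack.append(resident)     # resident keeps position i
        else:
            new_stack.append(pushed)       # pushed page settles at position i
            pushed = resident              # resident is pushed further down
    new_stack.append(pushed)               # takes the vacated or new last slot
    if distance != math.inf:
        new_stack.extend(stack[distance:]) # entries below the page are unchanged
    return new_stack, distance


def lru_rank(stack):
    """Priority list for LRU: the previous stack order itself."""
    return {page: position for position, page in enumerate(stack)}


previous = ["c", "b", "a"]
lru_result = update_stack(previous, "a", lru_rank(previous))
other_result = update_stack(previous, "d", {"a": 0, "b": 1, "c": 2})
```

With LRU priorities the update reduces to moving the referenced page to the top (here giving a, c, b). With the second, arbitrary priority list the lowest-priority page c sinks to the bottom while d enters on top, giving d, b, a, c.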
Comparing our stack updating procedure with the one for LRU shown in Figure 7, we see that page $y_t(C)$ under LRU is always $s_{t-1}(C)$. In fact, the priority list $P_t$ is exactly equal to stack $S_{t-1}$.
For an arbitrary stack algorithm, the stack updating is more complex than for LRU, and the order of stack elements at time $t - 1$ may be very different from that at time $t$.
Let us now examine several examples of stack algorithms. In general, any replacement algorithm that bases its decisions on some page usage quantity, whether measured or predicted, naturally induces a priority list and is, therefore, a stack algorithm. One example, of course, is LRU, and another example previously mentioned is least frequently used (LFU) replacement. Under LFU, the page chosen for replacement is the one that has been referenced the fewest number of times over the interval $1 \le \tau \le t$, or perhaps over some "backward window" interval $t - h \le \tau \le t$, where $0 < h \le t$. If two or more pages are tied for least frequency of use, then some arbitrary rule is used to break the tie. As long as the rule is consistent for all pages and all capacities (e.g., if the tied pages are numerically ordered), a priority list $P_t$ is induced, and LFU is a stack algorithm.
Other examples of stack algorithms may arise in analytical studies of program behavior. If an address trace is generated from some random process, it may be desirable to study the behavior of replacement algorithms that base their decisions on the parameters of the random process. One such process is a time-invariant, first-order Markov chain (Reference 16), where any page c is referenced immediately after page b with a fixed transition probability $\pi_{bc}$. The process is specified by the matrix $\Pi$ of transition probabilities $\pi_{bc}$ (where b and c range over all referenced pages) and by the page referenced at time $t = 1$.
One replacement algorithm for such a process is called "least transition probability" (LTP), since the page chosen for removal is the one that minimizes the transition probability over those pages in the buffer. Supplying an appropriate rule for breaking ties, we see that LTP induces a priority list and is a stack algorithm.
Another replacement algorithm is to remove the page with the largest expected time until next reference. We call this strategy LNR for "longest next reference." The expected times until next reference can be obtained from the matrix $\Pi$ by standard techniques. As with LTP, LNR induces a priority list if we supply an appropriate tie-breaking rule.
When the matrix $\Pi$ is not known in advance (for example, when testing a Markov model of the program), page reference statistics may be used to estimate it. For example, the observed transition frequencies over some interval $t - h$ to $t$ can be used to estimate the transition probabilities. A priority list can then be constructed for each time $t$ according to the estimated probabilities, so that the replacement algorithm remains a stack algorithm.
Other stack algorithms may base their decisions on information from the programmer or compiler, or on properties of the computer system. For example, the programmer or compiler may indicate which pages should be given high priorities in the immediate future. Another case is where the operating system assigns priorities to program pages in a multiprogrammed system, based perhaps on the position of the program in a task queue. If all the pages in the address space can be ordered in a priority list $P_t$ for each time $t$, the resulting replacement algorithm is a stack algorithm.
In the examples given, we see that priority lists can arise in a variety of ways. We now consider a replacement algorithm called "first-in/first-out" (FIFO) that is not a stack algorithm. Under FIFO, the page that has remained in the buffer for the longest (continuous) time up to time t is removed.
A peculiarity of FIFO is illustrated by the following page trace
X = a b c d a b e a b c d e
As shown in Reference 18, the success function for this trace is not monotonic, and takes the form shown in Figure 9. Since stack algorithms have monotonic success functions, we conclude that FIFO is not a stack algorithm and does not induce a priority list $P_t$ at every time $t$. In amplifying this conclusion, we note that the relative priorities between pages in $\Gamma_{t-1}$ may depend on the buffer capacity $C$. Thus in the example, one can verify that page d has lowest priority of all pages in $B_6(3)$ in the sense that d has been in the buffer longest. However, page d has highest priority in $B_6(4)$, since it was brought into the buffer latest.
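The non-monotonic behavior of FIFO on this trace can be checked by direct simulation. The sketch below is a straightforward FIFO buffer model; the function name `fifo_hits` is an assumption, while the trace is the one given above.

```python
from collections import deque


def fifo_hits(trace, capacity):
    """Count successes for first-in/first-out replacement at one capacity."""
    buffer = deque()  # left end holds the page that entered the buffer first
    hits = 0
    for page in trace:
        if page in buffer:
            hits += 1          # a hit does not change the FIFO order
            continue
        if len(buffer) == capacity:
            buffer.popleft()   # remove the page resident for the longest time
        buffer.append(page)
    return hits


trace = "a b c d a b e a b c d e".split()
hits_by_capacity = {c: fifo_hits(trace, c) for c in (3, 4, 5)}
```

The simulation gives 3 successes for $C = 3$ but only 2 for $C = 4$, out of 12 references, so $F(4) < F(3)$. With $C = 5$ all pages fit and there are 7 successes.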
Whenever the priorities among pages depend on the capacity of the buffer, we cannot define a single priority list that applies to every capacity. One instance of this is when priorities depend on the frequency of reference to pages after their entering the buffer. Another case is when priorities depend on total time spent in the buffer.
As long as priorities are independent of capacity, and as long as one can order the referenced pages to reflect these priorities, then stack-processing techniques can be used to find the success function.
An optimum replacement algorithm
We now discuss a replacement algorithm that yields the maximum value for the success frequency over the space of all replacement algorithms, for every page trace and every buffer capacity. Such an algorithm is said to be an optimum replacement algorithm. Belady (Reference 13) describes an optimum replacement algorithm called MIN, and shows how to evaluate the success frequency for a given page trace and a given buffer capacity. In the following discussion, we describe a stack algorithm called OPT that is also an optimum replacement algorithm. Using certain properties of LRU and OPT, the entire success function for OPT can be determined in two passes of a page trace.
The replacement algorithm OPT has the following characteristics. Whenever a page must be pushed from the buffer, the chosen page is the one whose next reference is farthest in the future. If a tie results because two or more buffer pages are never referenced again, the tie is broken by an arbitrary rule $R$ that pushes the page with the latest alphabetical or numerical order. An example of OPT replacement is shown in Figure 10, for the buffer capacity $C = 3$. As an illustration, notice that at time $t = 5$ page c is pushed from the buffer, since the other buffer pages a and b are referenced sooner. At time $t = 9$, page b is pushed from the buffer, because page d is referenced again (at time $t = 10$), and page a has priority over page b by our rule $R$.
A formal proof that OPT is an optimal replacement algorithm is given in the Appendix. We note here that OPT is not realizable in an actual computer system because it requires knowledge of future page references. However, OPT does serve as a useful benchmark for any replacement algorithm, including stack-type algorithms. To show that OPT is a stack algorithm, observe that a priority list $P_t$ can be constructed for OPT at each time $t$. Specifically, $P_t$ is the list of the pages referenced again, ordered by their time of next reference, followed by the list of the pages not referenced again, as ordered by the tie-breaking rule $R$.
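For a single capacity, OPT replacement can be simulated directly when the whole trace is available. The following sketch assumes pages are comparable values (such as single letters) so that the tie-breaking rule can pick the latest in alphabetical order; the function names are hypothetical, and the trace reused here is the one from the FIFO discussion, not the trace of Figure 10.

```python
import math


def next_reference(trace, start, page):
    """Index of the first reference to `page` at or after `start`, else infinity."""
    for index in range(start, len(trace)):
        if trace[index] == page:
            return index
    return math.inf


def opt_hits(trace, capacity):
    """Count successes under OPT replacement for one buffer capacity."""
    buffer = set()
    hits = 0
    for t, page in enumerate(trace):
        if page in buffer:
            hits += 1
            continue
        if len(buffer) == capacity:
            # Push the page referenced farthest in the future; among pages
            # never referenced again, push the latest in alphabetical order.
            victim = max(buffer, key=lambda p: (next_reference(trace, t + 1, p), p))
            buffer.remove(victim)
        buffer.add(page)
    return hits


trace = "a b c d a b e a b c d e".split()
opt_results = {c: opt_hits(trace, c) for c in (3, 4)}
```

On this trace OPT yields 5 successes for $C = 3$ and 6 for $C = 4$, so the success count grows with capacity, in contrast to FIFO on the same trace.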
The stack processing technique for OPT is illustrated in Figure 11. Priority lists are ordered as described above, and curly brackets denote the pages ordered under the rule $R$. For example, at time $t = 8$ the priority list is $P_8 = c, d, a, b$, because c is the next page referenced, d is referenced after c, and a and b are not referenced again and are ordered by the rule $R$.
The forward distance to a page a at time $t$ is the number of distinct pages referenced in $x_{t+1}, \ldots, x_{t'}$ (where $x_{t'}$ is the first reference to page a after time $t$). If page a is not referenced again, the forward distance is defined as infinity. Note that the priority list under OPT is a listing of the pages in $\Gamma_{t-1}$ according to their increasing forward distances. An illustrative example of forward distance determination is given in Figure 12.
If the forward distances to all pages in $\Gamma_{t-1}$ are known at time $t - 1$, the new forward distances at time $t$ can be determined iteratively from a single forward distance.
These results form the basis of a two-pass stack processing technique for determining the success function for OPT replacement. The technique is illustrated by Figure 13. The first pass is a backward scan of the page trace $X$ using LRU replacement, denoted by the left-pointing arrow. The LRU stack distances are stored, in reverse order, on a "distance tape." The second pass is a forward scan using OPT replacement. Note that the LRU success functions $F_{LRU}(C, X^R)$ and $F_{LRU}(C, X)$ are equal, where $X^R$ denotes the trace $X$ in reverse order.
Another result, which is proved in the Appendix, is that $F_{OPT}(C, X)$ is equal to $F_{OPT}(C, X^R)$, where $F_{OPT}(C, X)$ is the OPT success function for trace $X$ and $X^R$ is the trace $X$ in reverse order. Thus, our two-pass technique can be implemented with forward-backward scans as well as with backward-forward scans. During the first scan, the success function for LRU is obtained, and the distance tape generated. During the second scan the success function for OPT is obtained.
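The equality of the LRU success functions for a trace and its reversal can be checked numerically by comparing the stack-distance counts of the two scans. The sketch below does this for one sample trace; it is a spot check under that assumption rather than a proof, and the function name is hypothetical.

```python
import math
from collections import Counter


def lru_distance_counts(trace):
    """Count how often each LRU stack distance occurs in the trace."""
    stack, counts = [], Counter()
    for page in trace:
        if page in stack:
            counts[stack.index(page) + 1] += 1
            stack.remove(page)
        else:
            counts[math.inf] += 1   # first reference to this page
        stack.insert(0, page)
    return counts


trace = "a b c d a b e a b c d e".split()
forward_counts = lru_distance_counts(trace)
backward_counts = lru_distance_counts(trace[::-1])
```

Both scans give the same counters (two references at distance 3, two at distance 4, three at distance 5, and five first references), so the cumulative sums, and hence the LRU success functions, coincide for this trace.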
Random replacement
In the stack algorithms considered thus far, a unique success function is associated with each trace. We now extend stack-processing techniques to cover a "random replacement" algorithm (RAND) that does not always yield a unique success function. With RAND, if the buffer has a capacity of $C$, any given page is chosen for replacement with a probability of $1/C$. In analyzing RAND, one might perform a Monte Carlo simulation for each buffer capacity to obtain a RAND success function. Repeating these simulations would yield a set of sample success functions to characterize RAND. The sample success functions could then be used to estimate an "average" success function.
A question that arises is whether stack processing can be used to generate a sample success function for RAND or any other algorithm that bases a replacement choice on the value of some random variable. We observe that RAND is not a stack algorithm, because there certainly exists a trace and a time $t$ for which the inclusion property fails to hold with a nonzero probability.
Our approach is to define a replacement algorithm RR, which is a stack algorithm having the same statistical properties as RAND for each capacity $C$. The algorithm RR is defined as follows: at each time $t$, the priority list $P_t$ is obtained by randomly ordering the set of pages in $\Gamma_{t-1}$ (each of the $\gamma_{t-1}!$ possible orderings is equally likely to be chosen). Observe that RR is a stack algorithm, since it induces a priority list.
To establish that RR is statistically equivalent to RAND, assume that a replacement is necessary in a buffer of capacity $C$ at time $t$. Since $y_t(C) = \min[B_{t-1}(C)]$, and $P_t$ is randomly chosen, the probability that any given page is $y_t(C)$ is $1/C$, the same as for RAND.
One difficulty in implementing RR is the generation of the random priority list $P_t$. Fortunately, it is possible to update the stack without actually constructing the entire priority list. Assuming that $\Delta_t > j$, let $q_j(t)$ denote the probability that page $s_{t-1}(j)$ has priority over page $y_t(j - 1)$ at time $t$. If $s_{t-1}(j)$ does not have priority over $y_t(j - 1)$, we know that $s_{t-1}(j) = \min[B_{t-1}(j)]$.
Since this occurs with probability $1/j$, we obtain
$$q_j(t) = 1 - \frac{1}{j} = \frac{j - 1}{j} \tag{20}$$
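Using Equation 20, the stack can be updated at time $t$ for RR replacement by choosing page $s_t(j) = s_{t-1}(j)$ with probability $(j - 1)/j$, for $2 \le j < \Delta_t$ and $j \le \gamma_{t-1}$. As a check, let us compute the probability $Q$ that an arbitrary page b is pushed from a buffer of capacity $C$ at time $t$. Assuming that page b occurs at some position $k$ on stack $S_{t-1}$, where $1 \le k \le C$, then $Q$ is given by the following expression:
$$Q = \Pr\{s_t(k) \ne s_{t-1}(k),\ s_t(k+1) = s_{t-1}(k+1),\ \ldots,\ s_t(C) = s_{t-1}(C)\} \tag{21}$$
The events in the joint probability in Equation 21 are independent, so that we obtain
$$Q = \Pr\{s_t(k) \ne s_{t-1}(k)\} \cdot \Pr\{s_t(k+1) = s_{t-1}(k+1)\} \cdot \Pr\{s_t(k+2) = s_{t-1}(k+2)\} \cdots \Pr\{s_t(C) = s_{t-1}(C)\}$$
$$= \left(\frac{1}{k}\right)\left(\frac{k}{k+1}\right)\left(\frac{k+1}{k+2}\right) \cdots \left(\frac{C-1}{C}\right) = \frac{1}{C}$$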
Since $Q = 1/C$ holds for any page b and capacity $C$, we have verified that the stack updating for RR can be accomplished using Equation 20, and that RR has the same statistical properties as RAND for each buffer capacity. Note that although a particular value of a point on the success function, for example $F(4) = 0.3$, is equally likely to occur under both RAND and RR, the occurrence of a particular success function is not equally likely.
As the example with RR illustrates, stack processing techniques can be extended to cover probabilistic replacement algorithms. In fact, a replacement algorithm can have a mixture of probabilistic and nonprobabilistic aspects. For instance, the arbitrary rule used to break ties in LFU and other algorithms may choose a page at random. Another possibility is for a replacement algorithm to favor some pages probabilistically in the construction of the priority list, thereby realizing a so-called "biased replacement" algorithm (Reference 12). In any case, the only requirement is that the priority list be constructed to reflect the probabilistic properties of the desired replacement algorithm for every capacity $C$.
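The telescoping product behind $Q = 1/C$, and the RR stack update itself, can be sketched as follows. Exact rational arithmetic is used for the check; the function names, and the choice of Python's `random` module as the source of randomness, are assumptions made for this illustration.

```python
import random
from fractions import Fraction


def push_probability(k, capacity):
    """Probability that the page at stack position k leaves a buffer of this capacity."""
    probability = Fraction(1, k)               # page at position k loses its place
    for j in range(k + 1, capacity + 1):
        probability *= Fraction(j - 1, j)      # page at position j keeps its place
    return probability


def rr_update(stack, page, rng=random):
    """One RR stack update; each resident at position j stays with probability (j-1)/j."""
    if not stack:
        return [page]
    if page == stack[0]:
        return list(stack)                     # top page referenced again
    found = page in stack
    last = stack.index(page) if found else len(stack)  # last position decided pairwise
    new_stack, pushed = [page], stack[0]
    for j in range(2, last + 1):               # 1-based stack positions
        resident = stack[j - 1]
        if rng.random() < (j - 1) / j:
            new_stack.append(resident)         # resident keeps position j
        else:
            new_stack.append(pushed)
            pushed = resident                  # resident is pushed further down
    new_stack.append(pushed)
    if found:
        new_stack.extend(stack[last + 1:])     # entries below the page are unchanged
    return new_stack


check = all(push_probability(k, c) == Fraction(1, c)
            for c in range(1, 9) for k in range(1, c + 1))
```

The value of `check` is true for every position $k \le C$ with $C$ up to 8, matching the telescoping product. Passing a seeded `random.Random` instance as `rng` makes a sample run of `rr_update` repeatable.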
Congruence mapping
Up to now, we have restricted our attention to two-level storage hierarchies with unconstrained mapping at the first level. Under this type of mapping, any page in the buffer may be replaced by the referenced page. The advantages of unconstrained mapping are that all available page frames in the buffer can be used, and also that seldom used pages cannot become "locked" into the buffer by mapping constraints. A disadvantage with unconstrained mapping is that extensive associative searches may be necessary to locate pages in the buffer. Moreover, the implementation overhead of the replacement algorithm may be excessive, since relative priority information must be maintained for all pages in the buffer. To offset these disadvantages, a constrained mapping scheme can be employed whereby each page is restricted to occupy a member of only a subset of the buffer page frames.
One such mapping technique is called congruence mapping, by which the $2^k$ distinct pages in the address space are partitioned into $2^\alpha$ disjoint congruence classes, where $0 \le \alpha \le k$, and each class contains $2^{k-\alpha}$ pages. The classes are numbered consecutively from 0 to $2^\alpha - 1$, and class membership is determined from the $\alpha$ low-order bits of the page number. In this case, the $\alpha$ low-order bits constitute the class number $[x]$ of a page, and the remaining $k - \alpha$ bits are called the page prefix, as shown in Figure 15. The quantity $\alpha$ is called the class length. For a class length equal to zero, we set $[x] = 0$ for all pages.
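The split of a page number into class number and prefix can be sketched in a few lines of Python. The function name `split_page_number` and the choice of k = 4, class length 2 are illustrative assumptions, not part of the original text.

```python
def split_page_number(page, alpha):
    """Return (class_number, prefix) for a page number under class length alpha.

    The class number is taken from the alpha low-order bits; the prefix is
    whatever remains. With alpha == 0 every page falls into class 0.
    """
    if alpha < 0:
        raise ValueError("class length must be non-negative")
    class_number = page & ((1 << alpha) - 1)  # alpha low-order bits
    prefix = page >> alpha                    # remaining high-order bits
    return class_number, prefix


# Illustrative values only: k = 4 address bits, class length alpha = 2.
k, alpha = 4, 2
classes = {}
for page in range(1 << k):
    class_number, _ = split_page_number(page, alpha)
    classes.setdefault(class_number, []).append(page)
```

Under these assumed values the 16 pages fall into 4 classes of 4 pages each, which matches the counts $2^\alpha$ and $2^{k-\alpha}$ given in the text.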
In a two-level hierarchy with congruence mapping, every congruence class is assigned an equal number of page frames in the buffer, to be used exclusively by members of that class. This number is called the class capacity and is denoted by $D$. (The total capacity of the buffer in pages is thus $C = 2^\alpha \cdot D$.) When a page $x$ is referenced, it may appear in any of the $D$ page frames reserved for class $[x]$. If the referenced page is not in the buffer, and if the $D$ page frames are all occupied by other members of class $[x]$, a replacement algorithm selects one of these pages for removal. We assume that the same replacement algorithm is used separately for each of the classes.
Note that when the class length $\alpha$ is zero, all pages are in the same class, and the mapping is unconstrained. When the buffer capacity $C$ is a power of 2, and when $C = 2^\alpha$, only one page frame is allocated to each class, and the mapping function is fully constrained. Thus for a fixed buffer capacity $C = 2^h$, where $0 \le h \le k$, we can vary the mapping function from unconstrained to partially and fully constrained simply by varying the value of $\alpha$ from 0 to $h$. If we also view the backing store as $2^\alpha$ individual backing stores, as shown in Figure 16, the two-level hierarchy partitions into a collection of $2^\alpha$ distinct subhierarchies, each with a buffer capacity of $D$ page frames. When the replacement algorithm is a stack algorithm, these subhierarchies can be evaluated separately using stack processing techniques. In practice, $2^\alpha$ stacks (one for each subhierarchy) can be maintained as the trace is processed. Each page reference $x$ causes only the stack for class $[x]$ to be updated, and a stack distance $\Delta$ to be determined from that stack.
In congruence mapping, to calculate the success function for a given trace and a given class length $\alpha$, the stack distances must be carefully interpreted. Whenever a stack distance $\Delta$ is measured, the corresponding critical capacity of the entire buffer is $2^\alpha \cdot \Delta$, since this is the minimum buffer capacity necessary to contain the referenced page. Therefore, the success function $F^\alpha(C)$ for the set of capacities $C = 2^\alpha \cdot D$, where $D = 1, 2, \ldots$, is given by

$$F^\alpha(C) = \frac{1}{L} \sum_{\Delta=1}^{D} n^\alpha(\Delta),$$

where $n^\alpha(\Delta)$ is the distance counter for class length $\alpha$ and $L$ is the length of the trace.

However, if page $x$ has not been previously referenced, the bottom of stack $S_{t-1}$ is reached and $\Delta^\alpha$ is set equal to infinity for all class lengths $\alpha$. In either case, each distance $\Delta^\alpha$ is used to increment the appropriate distance counter for class length $\alpha$.
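As a minimal sketch of the per-class bookkeeping, the following Python keeps one LRU stack per congruence class, counts stack distances, and scales each distance by $2^\alpha$ to obtain the success function at the capacities $C = 2^\alpha \cdot D$. LRU is assumed as the replacement algorithm, and the function names and the small trace are hypothetical.

```python
import math
from collections import Counter, defaultdict


def lru_class_distances(trace, alpha):
    """Count LRU stack distances, keeping one stack per congruence class."""
    mask = (1 << alpha) - 1
    stacks = defaultdict(list)  # class number -> LRU stack, top at index 0
    counts = Counter()          # stack distance -> number of references
    for page in trace:
        stack = stacks[page & mask]
        if page in stack:
            distance = stack.index(page) + 1
            stack.remove(page)
        else:
            distance = math.inf  # first reference: bottom of stack reached
        stack.insert(0, page)    # LRU: referenced page moves to the top
        counts[distance] += 1
    return counts


def success_function(counts, trace_length, alpha, max_class_capacity):
    """Return {C: F(C)} for C = 2**alpha * D, D = 1 .. max_class_capacity."""
    result = {}
    hits = 0
    for d in range(1, max_class_capacity + 1):
        hits += counts.get(d, 0)
        result[(1 << alpha) * d] = hits / trace_length
    return result


trace = [0, 1, 2, 0, 1, 3, 0, 2]  # hypothetical page trace
constrained = success_function(lru_class_distances(trace, 1), len(trace), 1, 2)
unconstrained = success_function(lru_class_distances(trace, 0), len(trace), 0, 4)
```

One pass over the trace per class length yields the success function for every class capacity at once, which is the point of the stack processing approach.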
An example of this procedure is indicated in Figure 17. In Figure 17A, the right match functions are found by scanning down the stack. In Figure 17B, the right match frequencies $\{p(r)\}$ are plotted in reverse order as a function of $r$. Cumulative summation, according to Equation 23, then yields the desired LRU stack distances $\{\Delta^\alpha\}$.
Note that the stack distance for class length zero is the same stack distance $\Delta$ as obtained for LRU replacement with unconstrained mapping.

In previous sections of this paper, stack processing techniques are developed to obtain the success function for a two-level hierarchy. For each buffer capacity, this success function represents the relative number of accesses to the buffer for a given page trace.
We now show that the same success function can be used to find the access frequencies for all levels of a multilevel, linear hierarchy for any number of levels and any capacity at each level. Recall that in a linear hierarchy, the only downward data path from each level $M_i$ is to the next level $M_{i+1}$. Displaced pages are pushed down from level to level until a vacant page frame is found. Note that positions of pages in the hierarchy, and therefore the access frequencies, do not depend on the structure of upward data paths to the local store, but depend only on the replacement algorithm and the capacity at each level.
We have shown that when a stack replacement algorithm is used for a two-level hierarchy, the top $C_1$ pages of the stack are the contents of a buffer of capacity $C_1$, as shown in Figure 18A. Let us now assume that the replacement algorithm for a multilevel hierarchy induces a priority list at every time and that this list determines the replacement decisions at every level of the hierarchy. If this is true, then for any number of levels and any set of capacities $C_1, C_2, \ldots, C_{H-1}$ (with the backing store as level $M_H$), the contents of each level at any time can be determined from the stack for this replacement algorithm. More precisely, let $B_t^i$ denote the contents of level $M_i$ at time $t$, and let $U_i$ denote the sum $C_1 + C_2 + \cdots + C_i$. We then claim that

$$B_t^i = B_t(U_i) - B_t(U_{i-1}), \qquad (24)$$

where $B_t(C)$ is the set of the top $C$ entries of stack $S_t$ and $U_0 = 0$, or equivalently that $B_t^1$ can be identified as the first $C_1$ entries of stack $S_t$, and $B_t^2$ can be identified as the next $C_2$ entries, etc. This result is illustrated for a four-level hierarchy in Figure 18B.

At level $M_i$, the page $y_t(U_{i-1})$ that has been pushed from $M_{i-1}$ finds a vacant page frame, and all lower levels remain unchanged.
Thus we have shown that Equation 24 is satisfied at time t.
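The significance of this result is that a stack distance $\Delta$, where $C_1 + \cdots + C_{i-1} < \Delta \le C_1 + \cdots + C_i$, corresponds to an access to hierarchy level $M_i$, and the relative number of such $\Delta$'s is simply the access frequency $F_i$ to that level. Thus

$$F_i = F(C_1 + \cdots + C_i) - F(C_1 + \cdots + C_{i-1}). \qquad (25)$$

As with two-level hierarchies, all other accesses are directed to the backing store $M_H$, so that

$$F_H = 1 - F(C_1 + \cdots + C_{H-1}). \qquad (26)$$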
The determination of access frequencies is illustrated graphically in Figure 19 for a four-level hierarchy. Note that the technique illustrated in the figure cannot be used for an arbitrary hierarchy or success function. However, the technique can be used for any linear hierarchy as long as the replacement algorithm always induces a single priority list for all hierarchy levels.
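The graphical procedure amounts to differencing the success function at the cumulative capacities. A minimal Python sketch follows, assuming the success function is available as a callable; the function name `access_frequencies` and the numeric values are made up for illustration and do not come from the paper.

```python
def access_frequencies(success, capacities):
    """Return the access frequency of each level of a linear hierarchy.

    success    -- callable giving the two-level success function F(C)
    capacities -- capacities of the levels above the backing store, top first
    The last entry of the result is the frequency of backing store accesses.
    """
    frequencies = []
    cumulative = 0
    previous = 0.0  # F(0) = 0: a buffer with no page frames never succeeds
    for capacity in capacities:
        cumulative += capacity
        current = success(cumulative)
        frequencies.append(current - previous)
        previous = current
    frequencies.append(1.0 - previous)  # everything else goes to backing store
    return frequencies


# Hypothetical success-function values at the cumulative capacities 2, 6, 14.
table = {2: 0.5, 6: 0.75, 14: 0.875}
levels = access_frequencies(table.__getitem__, [2, 4, 8])
```

With these assumed values the four-level hierarchy receives the fractions 0.5, 0.25, 0.125 and 0.125 of all references, and the fractions sum to one by construction.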
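Our treatment of multilevel linear hierarchies can be extended to include hierarchies with congruence mapping functions. We assume that the same class length $\alpha$ is used for every level and that $D_i$ page frames are allocated to each congruence class at level $M_i$. The total capacity of level $M_i$ is then

$$C_i = 2^\alpha \cdot D_i.$$

Using the success function $F^\alpha(C)$ and Equations 25 and 26, we obtain the access frequency $F_i^\alpha$ for each level as follows, where $M_H$ denotes the backing store:

$$F_i^\alpha = F^\alpha(C_1 + \cdots + C_i) - F^\alpha(C_1 + \cdots + C_{i-1}), \qquad F_H^\alpha = 1 - F^\alpha(C_1 + \cdots + C_{H-1}).$$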
... updated, but a store distance $\Delta^s$ is recorded. The distributions $\{n^f(\Delta^f)\}$ and $\{n^s(\Delta^s)\}$ can then be used to determine the fetch and store access frequencies to each level of the hierarchy. It should be clear that this technique also works if congruence mapping is included. We can also consider a modified fetch-store design where the page usage statistics are updated for a store operation even though no page motion occurs. This change is incorporated by updating the priority list for both fetches and stores. Thus, for modified fetch-stores, the net change in our model is that the stack is not updated for store operations.
Besides distinguishing fetches from stores, a computer system may also distinguish the various sources of store requests. For example, a "call-back" feature can be used by which a page in the buffer is moved to the backing store if the page is stored into by an I/O device. The motivation here is to free the buffer of pages not needed by the CPU, and to service all I/O stores from the backing store.
For a call-back hierarchy, the generator must specify at least two kinds of references: CPU references and stores from the I/O channel. Stack processing techniques can then be modified as follows. When a CPU store or fetch occurs, the stack is updated in the normal way (except for special entries to be described later), and a distance counter $n^{\mathrm{CPU}}(\Delta)$ is incremented. When an I/O store occurs, say at time $t$, a counter $n^{\mathrm{I/O}}(\Delta)$ is incremented. If page $x_t$ does not occur on stack $S_{t-1}$, then $S_t$ is equal to $S_{t-1}$. If page $x_t$ does occur on stack $S_{t-1}$, then $S_t = S_{t-1}$ except that $x_t$ is replaced by the special entry "#". This entry, counted for all stack distance measurements, represents the empty page frame caused by page $x_t$ returning to the backing store. To ensure that empty page frames are filled as soon as possible, all #-entries are assigned the lowest priority in replacement decisions.
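The I/O store step can be sketched as below. This covers only the handling of an I/O store described in the text; the treatment of #-entries during a CPU reference is not shown. The names `io_store` and `EMPTY`, and the choice to count an I/O store to a page absent from the stack under an infinite distance, are assumptions made for the sketch.

```python
import math
from collections import Counter

EMPTY = "#"  # special entry marking a page frame vacated by a call-back


def io_store(stack, page, io_counts):
    """Process one I/O store against a stack (top of stack at index 0).

    The stack distance of the page is counted in io_counts, with #-entries
    above it counted like any other entry. If the page is on the stack, its
    entry is replaced in place by EMPTY; otherwise the stack is unchanged.
    """
    if page in stack:
        position = stack.index(page)
        io_counts[position + 1] += 1
        stack[position] = EMPTY
    else:
        io_counts[math.inf] += 1  # assumption: absent page counted as infinite
    return stack


stack = ["a", "b", "c"]  # hypothetical stack contents
io_counts = Counter()
io_store(stack, "b", io_counts)  # "b" returns to the backing store
io_store(stack, "c", io_counts)  # distance 3: the "#" above "c" is counted
io_store(stack, "z", io_counts)  # not on the stack: stack left unchanged
```

Replacing the entry in place, rather than deleting it, keeps the positions of all lower entries unchanged, so later distance measurements still correspond to buffer capacities.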
The call-back feature can be used in conjunction with the fetchstore or modified fetch-store schemes. In all cases, the correctness of the modified stack processing techniques can be established.
Since stack processing allows a large sample of "typical" address traces to be analyzed for many hierarchy models, the efficiency gained at the early stages of hierarchy design may be great enough to affect the whole design process. More of these traces can be processed in a given time, and more hierarchy designs can be evaluated for a given number of traces. The availability of this data may help justify the "typical"-trace approach to design, or may help in the development of other models for system requirements. As an example, program models can be more deeply investigated by evaluating both a program and its model under a very large number of address traces. Improvement in program modeling, in turn, may enhance the success of analytical disciplines that use these models, such as storage interference studies for multiprogrammed systems.
paging) another replacement algorithm exists that uses demand paging and causes the same or a fewer total number of pages to be loaded into the buffer. This result is used to show that OPT is an optimal replacement algorithm and, in fact, that OPT causes the minimum total number of pages to be loaded into the buffer. Finally, it is shown that the success function under OPT for any trace is identical to the success function under OPT for the reverse of the trace.
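The reverse-trace property can be checked numerically on small examples. The sketch below counts misses under demand paging with OPT replacement (the page whose next reference lies furthest in the future is removed), assuming an initially empty buffer and a capacity of at least one page frame. The function name `opt_misses` and the sample trace are hypothetical.

```python
def opt_misses(trace, capacity):
    """Count pages loaded under demand paging with OPT replacement.

    Assumes capacity >= 1 and an initially empty buffer.
    """
    # next_use[t] is the time of the next reference to trace[t], or infinity.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for t in range(len(trace) - 1, -1, -1):
        next_use[t] = last_seen.get(trace[t], float("inf"))
        last_seen[trace[t]] = t

    buffer = {}  # page -> time of its next reference
    misses = 0
    for t, page in enumerate(trace):
        if page not in buffer:
            misses += 1
            if len(buffer) >= capacity:
                # Remove the page whose next reference is furthest away.
                victim = max(buffer, key=buffer.get)
                del buffer[victim]
        buffer[page] = next_use[t]
    return misses


trace = list("abcadbeabcde")  # hypothetical page trace
forward = [opt_misses(trace, c) for c in range(1, 6)]
backward = [opt_misses(trace[::-1], c) for c in range(1, 6)]
```

The two lists agree entry by entry, which is the statement that the success function under OPT is the same for a trace and for its reverse. A check like this illustrates the result on one trace; it does not replace the proof.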
Definition
$|S|$ denotes the number of elements in a set $S$.
$|a|_X$ denotes the number of occurrences of a symbol $a$ in a sequence $X$.
$A = \{a, b, \ldots\}$ is a finite set of $N$ page addresses or pages.
... an I-policy.
A policy is a particular realization of a replacement algorithm for a given trace. For such a trace and initial buffer state $B_0$, an I-policy and an O-policy together determine the sequence of buffer states that will occur during the trace. An I-policy gives the set of pages loaded into the buffer, and an O-policy gives the set removed. If $p_t = \phi$, no page is removed, and if $q_t = \phi$, no page is loaded in. Note that only certain pairs of O- and I-policies are meaningful. For example, a page cannot be removed if it is not in the buffer.
We consider only meaningful policies, where $q_{t+1} \not\subset B_t$ and $p_{t+1} \subseteq B_t$.

Under demand paging, single pages are loaded when necessary until the buffer fills; subsequently, page swaps occur only when necessary.
One measure of goodness for a policy pair $P$ and $Q$ is the total number of pages loaded into the buffer, $\sum_{t=1}^{L} |q_t|$, under the policy pair. The following theorem supports the usefulness of demand paging.
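To make the measure concrete, the sketch below applies an O-policy and an I-policy step by step and totals the pages loaded. It assumes that each buffer state is formed by removing the set $p_t$ and then adding the set $q_t$, that loaded pages must not already be in the buffer, and that a valid pair keeps the referenced page in a buffer of at most `capacity` pages. These rules, the function name `pages_loaded`, and the example policies are assumptions for illustration.

```python
def pages_loaded(trace, removes, loads, capacity, initial=frozenset()):
    """Apply an O-policy (removes) and an I-policy (loads); return sum |q_t|.

    Raises ValueError if a step is not meaningful or not valid under the
    assumptions stated above.
    """
    buffer = set(initial)
    total = 0
    for x, p, q in zip(trace, removes, loads):
        p, q = set(p), set(q)
        if not p <= buffer:
            raise ValueError("cannot remove a page that is not in the buffer")
        if q & buffer:
            raise ValueError("cannot load a page that is already in the buffer")
        buffer = (buffer - p) | q
        if len(buffer) > capacity or x not in buffer:
            raise ValueError("policy pair is not valid for this trace")
        total += len(q)
    return total


trace = "abac"  # hypothetical trace, buffer capacity 2, empty initial buffer

# A demand policy pair: one page is loaded only when it is referenced.
demand = pages_loaded(trace, [(), (), (), ("b",)],
                      [("a",), ("b",), (), ("c",)], 2)

# A non-demand pair that loads "c" early and has to reload it later.
eager = pages_loaded(trace, [(), ("c",), (), ("b",)],
                     [("a", "c"), ("b",), (), ("c",)], 2)
```

In this small case the demand pair loads three pages and the eager pair loads four, in line with the claim that demand paging never needs to load more.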
Theorem 1
Let $P$ and $Q$ be a valid policy pair for $X$ and $B_0$. There exists a valid demand policy pair $P^D$ and $Q^D$ for $X$ and $B_0$ such that

$$\sum_{t=1}^{L} |q_t^D| \le \sum_{t=1}^{L} |q_t|.$$

Proof. $P^D$ and $Q^D$ will be constructed by forming a sequence of valid policy pairs $(P^0, Q^0), (P^1, Q^1), (P^2, Q^2), \ldots, (P^R, Q^R)$, where $P^0 = P$, $Q^0 = Q$, $P^R = P^D$, $Q^R = Q^D$, and ...
... is not a function of the policies, $\sum_{t=1}^{L} |q_t|$ is a constant and ...

For a given trace $X$ and initial state $B_0$, let us define an optimum replacement policy pair $P$ and $Q$ as a pair that is valid and minimizes $\sum_{t=1}^{L} |q_t|$ over the class of valid policies. From Theorem 1 there always exists an optimum policy pair which is also a demand policy pair. Since (A3) holds for all demand policies, we can find an optimum demand policy pair if we can find a demand policy $P^o$ such that

$$|\phi|_{P^o} \ge |\phi|_P,$$

where $P$ is any demand policy.
Definition
Let $X$ be a trace, and let $a \in A$ be a page.
which again can be treated as in Case 3A.
Note that the situation where $i_b = \infty$ cannot arise in Case 3B, since $b \in B_{i_b - 1}$. We have therefore successfully exhausted the possible cases, and Lemma 1 is proved. Thus we see that an OPT policy results in a minimum number of pages being loaded into the buffer over the class of all valid policies. After giving preliminary Lemmas 2 and 3, we present a final theorem concerning OPT policies.
Lemma 2
For a trace $X$, let the set $B_C$ represent the first $C$ distinct pages referenced in $X$. For a buffer of capacity $C$, if $P$ is a valid demand policy for $X$ and some $B_0 \subseteq B_C$, then $P$ is a valid demand policy for $X$ and any $B_0' \subseteq B_C$.
Lemma 3
For a trace $X$, let the set $E_C$ represent the last $C$ distinct pages referenced in $X$. For a buffer of capacity $C$, if $P$ is a valid demand policy for $X$ and $B_0$, there exists a valid demand policy $P'$ with a state sequence $B_0, B_1', B_2', \ldots, B_L'$ such that $B_L' = E_C$ and $|\phi|_{P'} \ge |\phi|_P$.
Proof. Let $i$ be the smallest integer such that $x_i, \ldots, x_L$ contains $C$ distinct pages. Suppose, under policy $P$, that $B_{i-1}$ contains $n$ elements of $E_C$, i.e., $|B_{i-1} \cap E_C| = n$. It follows that at least $C - n$ pages will be loaded into the buffer following time $i - 1$. Setting $p_k' = p_k$ for $1 \le k \le i - 1$, we will specify the remainder of $P'$ in such a way that exactly $C - n$ pages are loaded into the buffer following time $i - 1$. We observe that, since at most $C$ distinct pages are referenced following time $i - 1$, we never need remove a page $b$ from the buffer where $b \in E_C$. Thus, if a page must be removed at time $\ell$ for $i \le \ell \le L$, there always exists a page $c$, where $c \notin E_C$, in the buffer, and we set $p_\ell' = c$. If $P'$ is constructed in this manner, then from Equation A3 we have $|\phi|_{P'} \ge |\phi|_P$. Furthermore, since no page in $E_C$ is ever removed from the buffer following time $t = i$ and $|E_C| = C$, we see that $B_L' = E_C$.

... for the demand policy $P^o$. Thus our original assumption is false, and it must be the case that $|\phi|_{P^{ro}} = |\phi|_{P^o}$.
