In this paper, we present aggressive, proactive mechanisms that tailor file system resource management to the needs of I/O-intensive applications. In particular, we show how to use application-disclosed access patterns (hints) to expose and exploit I/O parallelism, and to dynamically allocate file buffers among three competing demands: prefetching hinted blocks, caching hinted blocks for reuse, and caching recently used data for unhinted accesses. Our approach estimates the impact of alternative buffer allocations on application execution time and applies cost-benefit analysis to allocate buffers where they will have the greatest impact. We have implemented informed prefetching and caching in Digital's OSF/1 operating system and measured its performance on a 150 MHz Alpha equipped with 15 disks running a range of applications. Informed prefetching reduces the execution time of text search, scientific visualization, relational database queries, speech recognition, and object linking by 20-83%. Informed caching reduces the execution time of computational physics by up to 42% and contributes to the performance improvement of the object linker and the database. Moreover, applied to multiprogrammed, I/O-intensive workloads, informed prefetching and caching increase overall throughput.
Introduction
Traditional disk and file buffer cache management is reactive; disk accesses are initiated and buffers allocated in response to application demands for file data. In this paper, we show that proactive disk and buffer management based on application-disclosed hints can dramatically improve performance. We show how to use these hints to prefetch aggressively, thus eliminating the I/O stalls
This work was supported in part by Advanced Research Projects Agency contract DABT63-93-C-0054, in part by National Science Foundation grant ECD-8907068, and in part by donations and scholarships from Data General, Symbios Logic, IBM, Digital, and Seagate. The United States government has certain rights in this material. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any of the funding agencies.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
Storage parallelism is increasingly available in the form of disk arrays and striping device drivers. These hardware and software arrays promise the I/O throughput needed to balance ever-faster CPUs by distributing the data of a single file system over many disk arms [Salem86]. Trivially parallel I/O workloads benefit immediately; very large accesses benefit from parallel transfer, and multiple concurrent accesses benefit from independent disk actuators. Unfortunately, many I/O workloads are not at all parallel, but instead consist of serial streams of non-sequential accesses.
In such workloads, the service time of most disk accesses is dominated by seek and rotational latencies. Moreover, these workloads access one disk at a time while idling the other disks in an array.
Disk arrays, by themselves, do not improve I/O performance for these workloads any more than multiprocessors improve the performance of single-threaded programs. Prefetching strategies are needed to "parallelize" these workloads.
The second factor encouraging our proactive I/O management is that ever-faster CPUs are processing data more quickly and encouraging the use of ever-larger data objects. Unless file-cache miss ratios decrease in proportion to processor performance, Amdahl's law tells us that overall system performance will increasingly depend on I/O-subsystem performance [Patterson88].
Unfortunately, simply growing the cache does not decrease cache-miss ratios as much as one might expect. For example, the Sprite group's 1985 caching study led them to predict higher hit ratios for larger caches. But in 1991, after larger caches had been installed, hit ratios were not much changed: files had grown just as fast as the caches [Ousterhout85, Baker91]. This suggests that new techniques are needed to boost I/O performance.
The problem is especially acute for read-intensive applications. Write performance is less critical because the writing application generally does not wait for the disk to be written. In this common case, write-behind can exploit storage parallelism even when the application's writes are serial and non-sequential [Rosenblum91, Solworth90]. Examples of read-intensive applications include text search, 3D scientific visualization, relational database queries, speech recognition, object code linkers, and computational physics. In general, these programs process large amounts of data relative to file-cache sizes, exhibit poor access locality, perform frequent non-sequential accesses, and stall for I/O for a significant fraction of their total execution time.
Yet, all of these applications' access patterns are largely predictable. This predictability could be used directly by the application to initiate asynchronous I/O accesses. But this sort of explicit prefetching can cripple resource management. First, the depth to which an application needs to prefetch depends on the throughput of the application, which varies as other applications place demands on the system. Second, asynchronously fetched data may eject useful data from the file cache. Third, asynchronously fetched file blocks end up indistinguishable from any other block in virtual memory, requiring the programmer to be explicitly aware of virtual image size to avoid losing far more to paging than is gained from parallel I/O. Finally, the specializations a programmer puts into overcoming these problems may not be appropriate when the program is ported to a different system.
Instead, we recommend using the predictability of these applications to inform the file system of future demands on it. Specifically, we propose that applications disclose their future accesses in hints to the file system. We show how to use this information to exploit storage parallelism, balance caching against prefetching, and distribute cache buffers among competing applications.
The rest of this paper explains and justifies proactive I/O management based on informed prefetching and caching. Sections 2 and 3 review related work and describe disclosure-based hints. Section 4 develops our cost-benefit model and Section 5 describes its implementation in Digital's OSF/1 v2.0A
to minimize the stall that has already started. Informed prefetching would like a buffer to initiate a read and avoid disk latency. To respond to these buffer requests, the buffer allocator compares their estimated benefit to the cost of freeing the globally least-valuable buffer. To identify this buffer, the allocator consults the two types of buffer suppliers. The LRU queue uses the traditional rule that the least recently used block is least valuable. In contrast, informed caching identifies as least valuable the block whose next hinted access is furthest in the future. The buffer allocator takes the least-valuable buffer to fulfill a buffer demand when the estimated benefit exceeds the estimated cost. In the following sections, we define our system model and then develop each estimator's strategy for valuing buffers.
System model
We assume a modern operating system with a file buffer cache running on a uniprocessor with sufficient memory to make available a substantial number of cache buffers. With respect to our workload, consistent with our emphasis on read-intensive applications, we assume that all application I/O accesses request a single file block that can be read in a single disk access. Further, we assume that system parameters such as disk access latency, $T_{disk}$, are constants. Lastly, as mentioned above, we assume enough disk parallelism for there never to be any congestion (that is, there is no disk queueing). As we shall see, distressing as these assumptions may seem, the policies derived from this simple system model behave well in a real system, even one with a single congested disk.
The execution time, $T$, for an application is given by
$$T = N_{I/O}\,(T_{CPU} + T_{I/O}), \tag{1}$$
where $N_{I/O}$ is the number of I/O accesses, $T_{CPU}$ is the inter-access application CPU time, and $T_{I/O}$ is the time it takes to service an I/O access. Figure 3 diagrams our system model.
In our model, the I/O service time, $T_{I/O}$, includes some system CPU time. In particular, an access that hits in the cache experiences time $T_{hit}$ to read the block from the cache. In the case of a cache miss, the block needs to be fetched from disk before it may be delivered to the application. In addition to the latency of the fetch, $T_{disk}$, these requests suffer the computational overhead, $T_{driver}$, of allocating a buffer, queuing the request at the drive, and servicing the interrupt when the disk operation completes. The total time to service an I/O access that misses in the cache, $T_{miss}$, is the sum of these times:
$$T_{miss} = T_{hit} + T_{driver} + T_{disk}. \tag{2}$$
In the terms of this model, allocating a buffer for prefetching can mask some disk latency. Deallocating an LRU cache buffer makes it more likely that an unhinted access misses in the cache and must pay a delay of $T_{miss}$ instead of $T_{hit}$. Ejecting a hinted block from the cache means an extra disk read will be needed to prefetch it back later. In the next sections, we quantify these effects.
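The system model above can be written down as a few small functions. This is a minimal sketch, not the authors' code: it assumes execution time is the number of accesses times the sum of inter-access CPU time and I/O service time, and the function names, the `average_io_time` helper, and the millisecond values in the usage note are illustrative assumptions rather than measurements from the paper.

```python
def miss_time(t_hit, t_driver, t_disk):
    """Service time of an access that misses in the cache (Equation 2)."""
    return t_hit + t_driver + t_disk


def execution_time(n_io, t_cpu, t_io):
    """Total time for n_io accesses, each costing CPU time plus I/O service time."""
    return n_io * (t_cpu + t_io)


def average_io_time(hit_ratio, t_hit, t_miss):
    """Assumed helper: mean unhinted service time for a given cache-hit ratio."""
    return hit_ratio * t_hit + (1.0 - hit_ratio) * t_miss
```

With illustrative values of 1 ms for a hit, 2 ms of driver overhead, and 15 ms of disk latency, `miss_time(1, 2, 15)` is 18 ms, so an application making 1000 accesses with 5 ms of computation between them takes 6000 ms if every access hits and 23000 ms if every access misses.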
The benefit of allocating a buffer to a consumer
The two consumers of buffers are demand accesses that miss in the cache and prefetches of hinted blocks. Since any delay in servicing a demand miss adds to I/O service time, we treat requests from demand misses as undeniable and assign them infinite value.
Computing the benefit of prefetching, explained below, is a bit harder.
Prefetching a block according to a hint can mask some of the latency of a disk read, $T_{disk}$. Thus, in general, an application accessing such a prefetched block will stall for less than the full $T_{disk}$. Suppose we are currently using $x$ buffers to prefetch $x$ accesses into the future. Then, stall time is a function of $x$, $T_{stall}(x)$, and the service time for a hinted read, also a function of $x$, is
$$T_{pf}(x) = T_{hit} + T_{driver} + T_{stall}(x). \tag{3}$$
The benefit of using an additional buffer to prefetch one access deeper is the change in the service time,
$$\Delta T_{pf}(x) = T_{pf}(x+1) - T_{pf}(x) \tag{4}$$
$$= T_{stall}(x+1) - T_{stall}(x). \tag{5}$$
Evaluating this expression requires an estimate of $T_{stall}(x)$.
A key observation is that the application's data consumption rate is finite. Typically, the application reads a block from the cache in time $T_{hit}$, does some computation, $T_{CPU}$, and pays an overhead, $T_{driver}$, for future accesses currently being prefetched.
Thus, even if all intervening accesses hit in the cache, the soonest we might expect a block $x$ accesses into the future to be requested is $x(T_{CPU} + T_{hit} + T_{driver})$. Under our assumption of no disk congestion, a prefetch of this $x$th future block would complete in $T_{disk}$ time. Thus, the stall time when requesting this block is at most
$$T_{stall}(x) \le T_{disk} - x\,(T_{CPU} + T_{hit} + T_{driver}). \tag{6}$$
There is no benefit from prefetching further ahead than the distance at which this bound reaches zero. We call this distance the prefetch horizon, $P(T_{CPU})$, recognizing that it is a function of a specific application's inter-access CPU time. Because there is no benefit from prefetching more deeply than the prefetch horizon, we can easily bound the impact of informed prefetching on effective cache size; prefetching a stream of hints will not lead informed prefetching to acquire more than $P(T_{CPU})$ buffers.
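The bound and the horizon can be made concrete with a short sketch. It assumes the worst-case stall is the disk latency minus the time the application needs to consume the $x$ intervening blocks, clamped at zero, and it takes the prefetch horizon to be the smallest depth at which that bound vanishes. The function names and the rounding up to a whole number of buffers are assumptions made for this illustration.

```python
import math


def stall_bound(x, t_disk, t_cpu, t_hit, t_driver):
    """Worst-case stall for a block prefetched x accesses ahead (never negative)."""
    return max(0.0, t_disk - x * (t_cpu + t_hit + t_driver))


def prefetch_horizon(t_disk, t_cpu, t_hit, t_driver):
    """Smallest prefetch depth at which the worst-case stall reaches zero."""
    return math.ceil(t_disk / (t_cpu + t_hit + t_driver))
```

For example, with a 15 ms disk access and 3 ms consumed per access (2 ms of computation plus 0.5 ms each for the cache read and the driver overhead), the horizon is 5 accesses: prefetching 3 ahead still leaves up to 6 ms of stall, while prefetching 5 or more ahead leaves none.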
Equation (6) is an upper bound on the stall time experienced by the $x$th future access assuming that the intervening accesses are cache hits and do not stall. Unfortunately, it overestimates stall time in practice. In steady state, multiple prefetches are in progress and a stall for one access masks latency for another so that, on average, only one in $x$ accesses experiences the stall in Equation (6).
Having estimated the benefit of giving a buffer to a demand miss or prefetch consumer, we now consider the cost of freeing a buffer that could be used to obtain these benefits. We estimate the cost first of taking a buffer from the LRU queue and then of ejecting a hinted block to take the buffer it occupies.
The cost of shrinking the LRU cache
Over time, the portion of demand accesses that hit in the cache is given by the cache-hit ratio, $H(n)$, a function of the number of buffers in the cache, $n$.
This figure illustrates informed prefetching as a pipeline. In this example, three prefetch buffers are used, prefetches proceed in parallel, $T_{CPU}$ is fixed, and $P(T_{CPU}) = 5$. At time T=0, the application gives hints for all its accesses and then requests the first block. Prefetches for the first three accesses are initiated immediately. The first access stalls until the prefetch completes at T=5, at which point the data is consumed and the buffer is reused to initiate the fourth prefetch. Accesses two and three proceed without stalls because the latency of prefetches for those accesses is overlapped with the latency of the first prefetch. But the fourth access stalls for $T_{stall} = T_{disk} - 3(T_{CPU} + T_{hit} + T_{driver})$. The next two accesses don't stall, but the seventh does. The application settles into a pattern of stalling every third access. In general, when $x$ buffers are used for prefetching, a stall occurs once every $x$ accesses.
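The pipeline behaviour described in the figure caption can be reproduced with a small simulation. This is a sketch under the model's assumptions (no disk congestion, constant disk latency, a buffer reused for the next prefetch as soon as its block is consumed); the function name and the single `t_step` parameter, which lumps together $T_{CPU} + T_{hit} + T_{driver}$, are simplifications introduced here.

```python
def simulate_stalls(n_accesses, x, t_disk, t_step):
    """Return the stall experienced by each access when x buffers prefetch ahead.

    t_step is the time the application spends per access when it does not stall.
    """
    ready = [0.0] * n_accesses          # completion time of each block's prefetch
    for i in range(min(x, n_accesses)):
        ready[i] = t_disk               # the first x prefetches start at time 0
    now = 0.0
    stalls = []
    for i in range(n_accesses):
        stall = max(0.0, ready[i] - now)
        stalls.append(stall)
        now += stall
        if i + x < n_accesses:
            # The buffer just consumed is reused to prefetch x accesses ahead.
            ready[i + x] = now + t_disk
        now += t_step
    return stalls


stalls = simulate_stalls(9, 3, 5.0, 1.0)
```

With three buffers, a disk latency of 5 time units, and 1 time unit per access, the first access stalls for the full disk latency and afterwards every third access stalls for 5 - 3 = 2 units, matching the caption. Raising `x` to the prefetch horizon of 5 removes every stall after the first.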
Figure: Average stall time vs. prefetch depth.
Though the stall time, $T_{stall}(x)$, is zero when $x$ is greater than the prefetch horizon, $T_{driver}$ represents the constant CPU overhead of ejecting a block no matter how far into the future the block will be accessed.
The cost of ejecting a block, $\Delta T_{eject}(x)$, does not affect every access; it only affects the next access to the ejected block. Thus, to express this cost in terms of the common currency, we must average this change in I/O service time over the accesses for which a buffer is freed. If the hint indicates the block will be read in $y$ accesses, and the prefetch happens $x$ accesses in advance, then ejection frees one buffer for a total of $y-x$ buffer-accesses. Conceptually, if the block is ejected and its buffer lent where it accrues a savings in average I/O service time, then it will have $y-x$ accesses to accrue a total savings that exceeds the cost of ejecting the block.
Averaging over $y-x$ accesses, the increase in service time per buffer-access is
$$\frac{\Delta T_{eject}(x)}{y - x} = \frac{T_{driver} + T_{stall}(x)}{y - x},$$
where $T_{stall}(x)$ is given by Equation (8).
The globally least-valuable buffer is the one whose maximum valuation is minimal over all buffers. Hence, our replacement policy employs a global min-max valuation of buffers. While the overhead of this estimation scheme might seem high, in practice, as we shall see in Section 5, the value of only a small number of buffers needs to be determined to find the globally least-valuable.
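Two ideas from this passage lend themselves to a short sketch: the per-buffer-access cost of ejecting a hinted block, and the global min-max choice of a victim. The sketch assumes the ejection cost is the driver overhead plus any unmasked stall of the later re-prefetch, spread over the $y-x$ accesses for which the buffer is freed. The function names and the dictionary layout are hypothetical.

```python
def eject_cost(x, y, t_driver, stall_at_x):
    """Cost per buffer-access of ejecting a block needed in y accesses
    and prefetched back x accesses in advance (requires y > x)."""
    if y <= x:
        raise ValueError("the block must be needed later than it is prefetched back")
    return (t_driver + stall_at_x) / (y - x)


def least_valuable(valuations):
    """Global min-max: a buffer is worth the most any estimator assigns it,
    and the victim is the buffer whose worth is smallest.

    valuations maps a buffer id to the list of values estimators assign it.
    """
    return min(valuations, key=lambda buf: max(valuations[buf]))
```

For instance, a block that is next needed 105 accesses away and can be prefetched back 5 accesses ahead with no stall costs only `0.5 / 100` per buffer-access to eject when the driver overhead is 0.5 ms. In `least_valuable({"a": [0.2, 0.9], "b": [0.4, 0.1], "c": [0.5]})`, buffer `"b"` is chosen because its highest valuation, 0.4, is the lowest among the three.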
An example: emulating MRU replacement
As an aid to understanding how informed caching 'discovers' good caching policy, we show how it exhibits MRU (most-recently-used) behavior for a repeated access sequence. Figure 8 illustrates an example.
At the start of the first iteration through a sequence that repeats every N accesses, the cache manager prefetches up to the prefetch horizon. After the first block is consumed, it becomes a candidate for replacement either for further prefetching or to service demand misses. However, if the hit-ratio function, $H(n)$, indicates that the least-recently-used blocks in the LRU queue don't get many hits, then these blocks will be less valuable than the hinted block just consumed. Prefetching continues, replacing blocks from the LRU list and leaving the hinted blocks in the cache after consumption.
As this process continues, more and more blocks are devoted to caching for the repeated sequence and the number of LRU
Figure 8. MRU behavior of the informed cache manager on repeated access sequences. The number of blocks allocated to caching for a repeated access pattern grows until the caching benefit is not sufficient to hold an additional buffer for the N accesses before it is reused. At that point, the least-valuable buffer is the one just consumed because its next access is furthest in the future. This block is recycled to prefetch the next block within the prefetch horizon.
A wave of prefetching, consumption, and recycling moves through the accesses until it joins up with the blocks still cached from the last iteration through the data.
Because the prefetch horizon limits prefetching, there are never more than the prefetch horizon, P(TCPU), buffers in this wave.
Even if a disk array delivers blocks faster than the application consumes them, there is no risk that the cache manager will use the cached blocks to prefetch further into the future. Thus, the MRU behavior of the cache manager is assured. Further, the cache manager strikes a balance in the number of buffers used for prefetching, caching hinted blocks, and LRU caching.
The informed cache manager discovers MRU caching without being specifically coded to implement this policy. This behavior is a result of valuing hinted, cached blocks and ejecting the block whose next access is furthest in the future when a buffer is needed. These techniques will improve cache performance for arbitrary access sequences where blocks are reused with no particular pattern. All that is needed is a hint that discloses the access sequence.
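The effect is easy to see in a toy comparison. The sketch below is a deliberate simplification, not the cost-benefit mechanism itself: it ignores prefetching and valuation entirely and only contrasts plain LRU replacement with ejecting the block whose next hinted access is furthest in the future, on a sequence that loops over more blocks than the cache holds. All names are illustrative.

```python
import math
from collections import OrderedDict


def lru_hits(sequence, capacity):
    """Count cache hits under least-recently-used replacement."""
    cache = OrderedDict()
    hits = 0
    for block in sequence:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = True
    return hits


def hinted_hits(sequence, capacity):
    """Count hits when the whole sequence is disclosed in advance and the
    victim is the cached block whose next access is furthest in the future."""
    next_use = [math.inf] * len(sequence)
    upcoming = {}
    for i in range(len(sequence) - 1, -1, -1):
        next_use[i] = upcoming.get(sequence[i], math.inf)
        upcoming[sequence[i]] = i
    cache = {}                              # block -> index of its next access
    hits = 0
    for i, block in enumerate(sequence):
        if block in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)
            del cache[victim]
        cache[block] = next_use[i]
    return hits


two_passes = list(range(10)) * 2            # read 10 blocks sequentially, twice
```

With a 4-block cache, `lru_hits(two_passes, 4)` finds no hits at all, because every block is evicted before it is reused, while `hinted_hits(two_passes, 4)` keeps part of the file resident and hits on the second pass. During the first pass the hinted policy repeatedly ejects the block it has just consumed, which is the MRU-like behavior described above.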
Implementation of informed caching and prefetching
Our implementation of informed prefetching and caching, which we call TIP, replaces the unified buffer cache (UBC) in version 2.0A of Digital's OSF/1 operating system. To service unhinted demand accesses, TIP creates an LRU estimator to manage the LRU queue and estimate the value of its buffers. In addition, TIP creates an estimator for every process that issues hints to manage its hint sequence and associated blocks.
To find the globally least-valuable buffer, it is sufficient that each estimator be able to identify its least-valuable buffer and declare its estimated value. From the LRU estimator's perspective, the least-recently-used buffer is least valuable. For a hint estimator, because all disk accesses are assumed to take the same amount of time, the least-valuable buffer contains the block whose next access is furthest in the future. TIP takes these declared estimates, normalizes them by the relative access rates, and ranks the estimators by these normalized declared values.
When there is a demand for a buffer, TIP compares the normalized benefit of servicing the demand to the normalized declared cost of the lowest-ranked estimator. If there are multiple consumers with outstanding requests, TIP considers the requests in order of their expected normalized benefit. If the benefit exceeds the cost, TIP asks the lowest-ranked estimator to give up its least-valuable buffer. After doing so, the estimator stops tracking this buffer. As far as it is concerned, the buffer is gone. It identifies a new least-valuable buffer from among the buffers it is still tracking and declares its value. TIP then reranks the estimators if necessary.
Before the block is actually ejected, TIP checks to see if any other estimator would value the buffer more than the cost of the lowest-ranked estimator. If so, that estimator starts tracking the buffer, including it when identifying its least-valuable buffer. The request for a buffer is then reconsidered from the start. At some later time, when this new estimator picks this almost-ejected buffer for replacement, the first estimator will get a chance to revalue the buffer and resume tracking it. A data structure keeps track of which estimators value a buffer at all to make this search for another estimator fast.
Once TIP is sure that no estimator values the buffer more than the current global minimal amount, the block is ejected and the buffer reallocated.
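The arbitration protocol described above can be sketched as follows. This is an illustration of the control flow only, under several assumptions: values are taken to be already normalized, each estimator's valuations are supplied as a fixed dictionary instead of being computed from LRU position or hints, and the class and function names are hypothetical rather than taken from the TIP sources.

```python
class Estimator:
    """Tracks a set of buffers and declares the least valuable one."""

    def __init__(self, name, valuation, tracked):
        self.name = name
        self.valuation = dict(valuation)   # every buffer this estimator values at all
        self.tracked = set(tracked)        # buffers it is currently tracking

    def declare(self):
        """Return (value, buffer) for the least-valuable tracked buffer."""
        buf = min(self.tracked, key=self.valuation.__getitem__)
        return self.valuation[buf], buf


def request_buffer(benefit, estimators):
    """Return the buffer to reallocate, or None if the demand is not worth it."""
    while True:
        declared = [(e.declare(), e) for e in estimators if e.tracked]
        if not declared:
            return None
        (cost, buf), owner = min(declared, key=lambda d: d[0][0])
        if benefit <= cost:
            return None
        owner.tracked.discard(buf)          # the owner gives the buffer up
        rescuer = next((e for e in estimators
                        if e is not owner and e.valuation.get(buf, 0.0) > cost),
                       None)
        if rescuer is None:
            for e in estimators:            # nobody values it more: eject
                e.valuation.pop(buf, None)
                e.tracked.discard(buf)
            return buf
        rescuer.tracked.add(buf)            # reconsider the request from the start


lru = Estimator("lru", {"a": 0.1, "b": 0.5}, {"a", "b"})
hinted = Estimator("hinted", {"a": 0.9, "c": 0.3}, {"c"})
victim = request_buffer(0.4, [lru, hinted])
```

In this example the LRU estimator first offers buffer `a`, but the hint estimator values `a` more highly and starts tracking it, so the request is reconsidered and buffer `c` is ejected instead. A second request with the same benefit is denied, because the cheapest remaining buffer is worth more than the benefit on offer.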
Since only tracked blocks are ever picked for replacement, all
The LRU list is broken into segments, $S_1, S_2, S_3, \ldots$ Each buffer is tagged to indicate which segment it is in. The tag is updated when a buffer passes from one segment to the next. When there is a cache hit in segment $i$, the segment hit count, $h_i$, is incremented. That segment's contribution to the hit ratio is then $h_i/A$, where $A$ is the total number of accesses to the LRU cache.
often a large jump in the hit ratio when the entire working set of an application fits into the buffer cache. TIP's LRU estimator uses a simple mechanism to avoid being stuck in a local minimum that ignores the benefit of a much larger cache: $\Delta H(n)$ is modified to be $\max_{i \ge n}\{\Delta H(i)\}$; that is, the value of the marginal hit ratio is rounded up to the value of any larger marginal hit ratio occurring deeper in the LRU stack. Thus, if the LRU cache is currently small, but a larger cache would achieve a much higher hit ratio, this mechanism encourages the cache to grow.
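The segment counters and the rounding-up rule can be illustrated with a few lines of code. This sketch works at the granularity of whole segments and treats each segment's share of hits as its marginal hit ratio, which is a simplification; the function names and the sample counts are assumptions for this example.

```python
def marginal_hit_ratios(segment_hits, total_accesses):
    """Each segment's contribution to the hit ratio, h_i / A."""
    return [hits / total_accesses for hits in segment_hits]


def round_up_to_deeper_max(marginals):
    """Replace each marginal ratio with the largest one at the same depth or deeper."""
    rounded = []
    best = 0.0
    for value in reversed(marginals):
        best = max(best, value)
        rounded.append(best)
    rounded.reverse()
    return rounded


marginals = marginal_hit_ratios([50, 5, 1, 30], 100)
rounded = round_up_to_deeper_max(marginals)
```

Here the second and third segments see few hits, so a purely local estimate would suggest that growing the cache past the first segment is not worthwhile. After rounding up, both inherit the 0.3 contribution of the fourth segment, which reflects the jump in hit ratio that a cache large enough to reach that segment would deliver.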
To simplify the prefetcher's estimate of the value of acquiring a buffer, we recognize that it will obtain at least a few buffers and use the following variant of Equation
hinted block in terms of $y$, the number of accesses until the hinted read, and $x$, how far in advance the block will be prefetched back.
To eliminate the overhead of determining the value of $x$ dynamically, we simplify this expression by assuming that the prefetch will occur at the (upper bound) prefetch horizon. If the block is already within this horizon, that is, $y$ is smaller than the horizon, we assume that the prefetch will occur at the next access.
If the decision to prefetch a block has already been made, then the cost, $T_{driver}$, of performing a disk read will be paid. Any blocks that could piggyback on this read avoid most of the disk-related CPU costs. If there are hinted blocks that can cluster with the required block, and they are not prefetched now in such a cluster, their later prefetch will incur the full overhead of performing a disk access and possibly the cost of any unmasked disk latency.
These are exactly the costs considered when deciding whether to eject a hinted block. Thus, the decision to include an additional hinted contiguous block in a cluster is the same as the decision not to eject this additional hinted block once the prefetch is complete.
If the informed cache would decide not to eject the block if it were in cache, then a buffer is allocated and the additional block is included in the pending cluster read.
Figure (a) shows the performance of the Davidson algorithm applied to a computational-physics problem. The algorithm repeatedly reads a large file sequentially. OSF/1's aggressive readahead algorithm performs about the same as TIP-1 with hints for this access pattern. Informed caching in TIP-2 reduces elapsed time by more than 30% on one disk by avoiding disk latency. On more disks, prefetching masks disk latency, but informed caching still reduces execution time by more than 15% by avoiding the overhead of going to disk. Figure (b) shows that informed caching in TIP-2 discovers an MRU-like policy which uses additional buffers to increase cache hits and reduce execution time. TIP-2 takes advantage of a 16 MB cache to reduce execution time by 42%. In contrast, LRU caching derives no benefit from additional buffers until there are enough of them to cache the entire dataset, which is 16.3 MB (2089 8 KB blocks).
It is often assumed that because disks are so slow, good performance is only possible when data is in main memory. Thus, many applications, including XDS, require that the entire dataset reside in memory. Because memory is still expensive, the amount available often constrains scientists who would like to work with higher resolution images and therefore larger datasets. Informed prefetching invalidates the slow-disk assumption and makes out-of-core computing practical, even for interactive applications. To demonstrate this, we added an out-of-core capability to XDS.
To render a slice through an in-core dataset, XDS iteratively determines which data point maps to the next pixel, reads the datum from memory, applies false coloring, and writes the pixel in the output pixel array. To render a slice from an out-of-core dataset, XDS splits this loop in two. Both to manage its internal cache and to generate hints, XDS first maps all of the pixels to data-point coordinates and stores the mappings in an array. Having determined which data blocks will be needed to render the current slice, XDS ejects unneeded blocks from its cache, gives hints to TIP, and reads the needed blocks from disk. In the second half of the split loop, XDS reads the cached pixel mappings, reads the corresponding data from the cached blocks, and applies the false coloring [Patterson94].
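The loop-splitting idea can be sketched independently of XDS. In the illustration below, the first half of the loop only computes where each pixel's datum lives, the block list is disclosed before any data is read, and the second half consumes the data. The callback parameters `locate`, `read_block`, and `give_hints` are hypothetical stand-ins, since the actual XDS routines and TIP hint interface are not shown in this section, and false coloring and internal cache ejection are omitted.

```python
def render_slice(pixels, locate, read_block, give_hints):
    """Render a slice in two passes so that all reads can be hinted up front.

    locate(pixel)      -> (block_id, offset) of the datum for that pixel
    read_block(block)  -> sequence holding the block's data
    give_hints(blocks) -> discloses the blocks that are about to be read
    """
    # First half of the split loop: map every pixel to its data location.
    mappings = [locate(pixel) for pixel in pixels]
    # Each needed block is listed once, in first-use order.
    needed = list(dict.fromkeys(block for block, _ in mappings))
    give_hints(needed)
    blocks = {block: read_block(block) for block in needed}
    # Second half: replay the stored mappings against the fetched blocks.
    return [blocks[block][offset] for block, offset in mappings]


disclosed = []
values = render_slice(
    pixels=[9, 1, 8, 2],
    locate=lambda p: divmod(p, 4),                       # 4 data points per block
    read_block=lambda b: [b * 10 + i for i in range(4)],
    give_hints=disclosed.append,
)
```

In this toy run the four pixels touch only blocks 2 and 0, so a single hint listing those two blocks is issued before the first read, and each block is fetched once even though it is used twice.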
Our test dataset consists of $512^3$ 32-bit floating point values requiring 512 MB of disk storage. The dataset is organized into 8 KB blocks of 16x16x8 data points and is stored on the disk in Z-major order. Our test renders 25 random slices through the dataset. While OSF/1 readahead is effective for the sequential access pattern of Davidson, it is detrimental for XDS. XDS frequently reads a short sequential run, which triggers an equal amount of readahead by OSF/1. Only slices closely aligned with the Z-axis read long runs of sequential blocks for which the readahead is effective. Consequently, for this set of 25 slices, the nonhinting version of XDS reads 1.86 times as much data from disk as the application actually consumes. This combination of false readahead and lack of I/O parallelism causes XDS to take about 12 seconds to render an arbitrary slice without hints, leading to unacceptable interactive performance.
In contrast, informed prefetching both avoids false readahead and exploits the concurrency of a disk array. TIP-1 eliminates 70% of the I/O stall time on four disks, and 92% on 10 disks. On 10 disks, TIP-1 reduces the time to render a random slice by a factor of 6 to about 2 seconds, resulting in a much more tolerable interactive latency.
TIP-1 and TIP-2 perform similarly. However, because TIP-2 can use hints to coalesce into one disk read blocks that are contiguous on disk but widely separated in the access sequence, TIP-2 reduces the number of distinct disk reads from 18,700 to 15,000.
slice through a 512 MB dataset. Without hints, OSF/1 makes poor use of the disk array. But, informed by hints, TIP is able to prefetch in parallel and mask the latency of the many seeks. There is very little data reuse, so the informed caching does not decrease elapsed time relative to the simple prefetching in TIP-1. Figure (b) shows the benefits of informed prefetching for the Sphinx speech-recognition program. Sphinx is almost CPU-bound, so the improvements are less dramatic. As for XDataSlice, there is little data reuse so informed caching provides no benefit over TIP-1, and, in fact, incurs some additional overhead.
Sphinx, like XDS, came to us as an in-core only system.
Since it was commonly used with a dictionary containing 60,000 words, the language model was several hundred megabytes in size.
With the addition of its internal caches and search data structures, virtual-memory paging occurs even on a machine with 512 MB of memory. We modified Sphinx to fetch from disk the language model's word-pairs and word-triples as needed. This enables Sphinx to run on our 128 MB test machine 90% as fast as on a 512 MB machine.
We additionally modified Sphinx to disclose the word-pairs and word-triples that will be needed to evaluate each of the potential words offered at the end of each frame. Because the language model is sparsely populated, at the end of each frame there are about 100 byte ranges that must be consulted, of which all but a few are in Sphinx's internal cache. However, there is a high variance on the number of pairs and triples consulted and fetched, so storage parallelism is often employed.
given by replicating the loop that opens input files. The read of the secondary header, whose location is data dependent, is not hinted.
Its contents provide the location and size of the symbol and string tables for that file. A loop splitting technique similar to that in XDataSlice is used to hint the symbol and string table reads.
After verifying that it has all the data needed to produce a fully linked executable, Gnuld makes a pass over the object files to read and process debugging symbol information. This involves up to nine small, non-sequential reads from each file. Fortunately, the previously read symbol tables determine the addresses of these accesses, so Gnuld loops through these tables to generate hints for its second pass.
During its second pass, Gnuld constructs up to five shuffle lists which specify where in the executable file object-file debugging information should be copied. When the second pass completes, Gnuld finalizes the link order of the input files, and thus the organization of non-debugging ECOFF segments in the executable file. Gnuld uses this order information and the shuffle lists to give hints for the final passes.
Our test links the 562 object files of our TIP-1 kernel. These object files comprise approximately 64 MB, and produce an 8.8 MB kernel. Figure 13(
To disclose these inner-relation accesses, we employ a loop-splitting technique similar to that used in XDS. In the precomputation phase, Postgres reads the outer relation (disclosing its sequential access), looks up each outer-relation tuple address in the index (unhinted), and stores the addresses in an array. Postgres then discloses these precomputed block addresses to TIP. In the second pass, Postgres rereads the outer relation but skips the index lookup and instead directly reads the inner-relation tuple whose address is stored in the array.
A more dramatic example is a non-hinting Agrep running with Gnuld shown in Figure 16(
For example, the second bar from the left in any quartet of (a) is Gnuld not hinting and Agrep hinting. Compare bars one and two or three and four to see the impact of giving hints when the other application is respectively hinting or non-hinting. Compare bars one and three or two and four to see the impact of the second application giving hints.
Gnuld gives hints, it runs longer than Agrep and so never gets out of the way.
Future work
Together, informed caching and informed prefetching provide a powerful resource management scheme that takes advantage of available storage concurrency and adapts to an application's use of buffers.
Although the results reported in this paper are taken from a running system, there remain many interesting related questions.
The key to achieving these goals is to strike a balance between the desire to prefetch and the desire to cache.
We present a framework for informed caching based on a cost-benefit model of the value of a buffer. We show how to make independent local estimates of the value of caching a block in the LRU queue, prefetching a block, and caching a block for hinted reuse. We define a basis for comparing these estimates: the time gained or lost per buffer per I/O-access interval, and we develop a global min-max algorithm to arbitrate among these estimates and maximize the global usefulness of every buffer.
Our results are taken from experiments with a suite of six I/O-intensive applications executing on a Digital 3000/500 with an array of 10 disks. Our applications include text search, data visualization, database join, speech recognition, object linking, and computational physics. With the exception of computational physics, none of these applications, without hints, exploits the parallelism of a disk array well. Informed prefetching with at least four disks reduces the elapsed time of the other five applications by 20% to 85%. For the computational physics application, which repeatedly reads a large file sequentially, OSF/1's aggressive readahead does as well as informed prefetching. However, informed caching's adaptive policy values this application's recently used blocks lower than older blocks and so "discovers" an MRU-like policy that improves performance by up to 42%. Finally, our experimental multiprogramming results show that introducing hints always increases throughput.
Instructions for obtaining access to the code in our TIP prototype can be found in our Internet World Wide Web pages:
http://www.cs.cmu.edu/afs/cs/Web/Groups/PDL.
Acknowledgments
We wish to thank a number of people who contributed to this work including: Charlotte Fischer and the Atomic Structure Calcu-
