Suppose that the cache block size is 2 words, that the subblock size is 1 word, that a program writes the first word in a memory block, and that the write misses. In subblock placement, the word will be written to the cache, and the second word in the cache block will be invalidated.
However, the simplified model will mark both words as valid after the write. Subsequently, if the program reads the second word, the read will incorrectly hit. Thus the CPI reported for caches with subblock placement can be less than the actual CPI. These incorrect hits, however, occur rarely since SML programs tend to do few assignments (see Section 4.3) and since most writes are to sequential locations.
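The discrepancy described above can be reproduced with a toy model. The following is a minimal sketch, not the simulator used in this study: it assumes a cache with a single line, a block of 2 words, and a subblock of 1 word, and the class name `OneLineCache` and its `simplified` flag are hypothetical names introduced only for illustration.

```python
WORDS_PER_BLOCK = 2  # block of 2 words, subblock of 1 word, as in the example


class OneLineCache:
    """Toy single-line cache (hypothetical helper, for illustration only).

    simplified=False models subblock placement with per-word valid bits.
    simplified=True models the simplified simulator, which marks the whole
    block valid on a write miss.
    """

    def __init__(self, simplified):
        self.simplified = simplified
        self.tag = None
        self.valid = [False] * WORDS_PER_BLOCK

    def _split(self, addr):
        # Word address -> (block tag, word index within the block).
        return divmod(addr, WORDS_PER_BLOCK)

    def write(self, addr):
        tag, word = self._split(addr)
        hit = self.tag == tag and self.valid[word]
        if self.tag != tag:
            # Write miss on a new block: allocate the line without fetching.
            self.tag = tag
            if self.simplified:
                # Simplified model: every word is (wrongly) treated as valid.
                self.valid = [True] * WORDS_PER_BLOCK
            else:
                # Subblock placement: the other words are invalidated.
                self.valid = [False] * WORDS_PER_BLOCK
        self.valid[word] = True  # the written word itself is always valid
        return hit

    def read(self, addr):
        tag, word = self._split(addr)
        hit = self.tag == tag and self.valid[word]
        if not hit:
            # Read miss: fetch the whole block from memory.
            self.tag = tag
            self.valid = [True] * WORDS_PER_BLOCK
        return hit
```

Writing word 0 and then reading word 1 misses when `simplified=False` but hits when `simplified=True`, which is the incorrect hit discussed above.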
(2) Ignoring the effects of context switches and system calls.
Amer Diwan et al.
since Section 5.4 shows that write-buffer costs are small.
(3) Assuming CPU cycle time does not vary with memory organization. This may not be true, since the CPU cycle time depends on the cache access time, which may differ across cache organizations.
For example, a 128K cache may take longer to access than an 8K cache.
thus there are few write misses, the benefit of subblock placement will be reduced.
Benchmarks

Cache and TLB Configurations Simulated
The design space for memory systems is enormous. There are many variables involved, and the dependencies among them are complex. Therefore we could study only a subset of the memory system design space. In this study, we restrict ourselves to features found in currently popular RISC workstations [Cypress 1990; DEC 1990a; 1990b; Slater 1991]. Each breakdown graph breaks down the memory system overhead into its components for one configuration in a summary graph. The write-buffer depth in these graphs is fixed at six entries.
In this section we present only the summary graphs for VLIW (Figure 2).
The data for other programs are similar and are given in the Appendix. A six-deep write buffer coupled with page-mode writes is sufficient to absorb the bursty writes. As expected, memory system features which reduce the number of misses (such as higher associativity and larger cache sizes) also reduce the write-buffer overhead.
Write-Buffer Depth
In Section 5.3.5 we showed that a six-deep write buffer coupled with page-mode writes was able to absorb the bursty writes in SML/NJ programs. In this section we explore the impact of write-buffer depth on the write-buffer contribution to CPI. Since the speed at which the write buffer can retire writes depends on whether or not the memory system has page-mode writes, we conducted two sets of experiments: one with and the other without page-mode writes.
We varied the write-buffer depth from 1 to 6. We conducted this study for two of the larger benchmarks: CW and VLIW. We fixed the block size at 16 bytes and the write miss policy at write-allocate/subblock placement. Figure 6 gives the write-buffer costs for VLIW with caches of associativity one and two and in a memory system with page-mode writes; Figure 7 does the same in a memory system without page-mode writes. The graphs plot the CPI contribution of the write buffer against cache size; there is one curve for each write-buffer depth. Increasing the cache size or associativity reduces the number of read and instruction-fetch misses, and thus reduces the number of main-memory transactions. Reducing the number of main-memory transactions increases the effectiveness of the write buffer since the write buffer fills up less frequently and has more cycles in which to retire its writes (Section 2.1).
3 For Lexgen this region extends a little beyond 512K.
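The interaction between write-buffer depth and retirement speed can be illustrated with a toy model. The following is a minimal sketch under simplifying assumptions, not the simulator used in this study: memory retires buffered writes one at a time in FIFO order, reads and instruction fetches never compete for memory, and the trace is one flag per CPU cycle. The function name `write_buffer_stalls` and the parameter `retire_cycles` are hypothetical, and the retirement costs used with it are illustrative values rather than the ones in Table V.

```python
from collections import deque


def write_buffer_stalls(trace, depth, retire_cycles):
    """Count CPU stall cycles caused by a full write buffer.

    trace         -- iterable of booleans, one per CPU cycle (True = a write)
    depth         -- number of write-buffer entries
    retire_cycles -- cycles memory needs to retire one write (assumed constant)
    """
    pending = deque()  # completion times of buffered writes, oldest first
    now = 0            # current cycle, including stall cycles
    stalls = 0
    for is_write in trace:
        # Free the entries that memory has already retired.
        while pending and pending[0] <= now:
            pending.popleft()
        if is_write:
            if len(pending) == depth:
                # Buffer full: the CPU stalls until the oldest write retires.
                wait = pending[0] - now
                stalls += wait
                now += wait
                pending.popleft()
            # Writes retire in order, so this one starts after the previous one.
            start = max(now, pending[-1]) if pending else now
            pending.append(start + retire_cycles)
        now += 1
    return stalls
```

For a burst of back-to-back writes, a slow retirement rate makes the stall count depend strongly on `depth`, whereas a retirement rate of one write per cycle leaves almost nothing for a deeper buffer to absorb. This is the qualitative effect that page-mode writes have in the experiments described here.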
In memory systems with page-mode writes (Figure 6), the difference between the CPI contribution of a one-deep write buffer and a six-deep write buffer is less than 0.05. This difference is surprisingly small considering the burstiness of the writes, and is due to the effectiveness of page-mode writes. At the next write one cycle later the write buffer is full, and the CPU stalls. After four cycles (see Table V)
APPENDIX. SUMMARY TABLES
