




Bachelor of Engineering (Hons)
Birla Institute of Technology and Science
Rajasthan, INDIA
1990
Submitted to the faculty of the
Graduate College of the
Oklahoma State University










Dean of the Graduate College
ACKNOWLEDGEMENTS
I thank my graduate advisor, Dr. Mansur H. Samadzadeh, for his advice, assistance, and guidance. His constructive criticism helped me gain confidence, and his constant guidance gave me inspiration and motivation throughout my graduate studies. My sincere thanks to Drs. Blayne Mayfield and Mitch Neilsen for serving on my graduate committee.
I also want to thank Mr. Jim McGee and Mr. Andy Adsit, my supervisors at the University Computer Center, OSU, for allowing flexible working hours.
I would like to thank my husband Srikanth for his strong encouragement at times of difficulty, and for his love and understanding throughout this whole process. Finally, I would like to express my gratitude to my parents, brother, and sisters. Without their support and





I. INTRODUCTION
II. LITERATURE REVIEW
2.1 Introduction
2.2 Definitions
2.3 Storage Hierarchy
2.4 Cache Memory
2.5 Cache Design Parameters
2.5.1 Cache Size
2.5.2 Block Size
2.5.3 Cache Organization
2.5.4 Misses in Prefetch
2.5.5 Misses Occurring in Clumps
2.5.6 Cache Coherence
2.5.7 Cache Consistency
2.5.8 Replacement Algorithms
III. DESIGN AND IMPLEMENTATION ISSUES
3.1 Implementation Platform and Environment
3.1.1 Sequent Symmetry S/81
3.2 Objective
3.3 Input Parameters
3.3.1 Trace Collection Method
3.3.2 Cache Organization
3.3.3 Replacement Policies
3.3.4 Scheduling
3.4 Design of the Simulation
3.4.1 Page Map Table
















































APPENDICES
APPENDIX A - GLOSSARY AND TRADEMARK INFORMATION





1. Different address tracing techniques
2. Associative mapping using a page map table, given the virtual address
3. Organization of cache and main memory
4. Data structure of cache
5. Data structure of main memory
6. The data structure used for page map table
7. Demand page algorithm
8. Pagefault_handler algorithm
9. Hit ratio vs. cache size for gcc (LRU policy)
10. Miss ratio vs. cache size for gcc (LRU policy)
11. Miss ratio vs. delay due to a miss for gcc (LRU policy)
12. Cache size vs. effective access time for gcc (LRU policy)
13. Miss ratio vs. cache size for spice (LRU policy)
14. Hit ratio vs. cache size for spice (LRU policy)
15. Miss ratio vs. delay due to a miss for spice (LRU policy)
16. Cache size vs. effective access time for spice (LRU policy)
17. Miss ratio vs. cache size for espresso (LRU policy)
18. Hit ratio vs. cache size for espresso (LRU policy)
19. Cache size vs. effective access time for espresso (LRU policy)
20. Hit ratio vs. cache size for GNU chess (LRU policy)
21. Miss ratio vs. cache size for GNU chess (LRU policy)
22. Cache size vs. effective access time for GNU chess (LRU policy)
LIST OF TABLES




Cache memory is used in most computer systems. An important goal in the design of a computer system is that it should behave according to the expectations of its designer. The performance of a system can be captured and evaluated using various techniques, and trace-driven simulation is one of the techniques used to study the performance of a computer system.
Nowadays most small, medium, and large machines have cache memories to improve their performance. Information located in cache can generally be retrieved in less time than information located in main memory [Smith82]. Trace-driven simulation is a technique in which a model of a proposed system, e.g., a cache memory, is evaluated using actual address traces as the external stimuli.
Several address tracing techniques have been developed over the last ten years, each with its own merits and drawbacks [Stunkel91]. These techniques are typically analyzed with respect to issues such as speed, flexibility, completeness, reduction in execution time, and accuracy. Address tracing techniques can be classified into five categories, as shown in Figure 1 (adapted from [Stunkel91]). A brief description of these five techniques is given below.












Figure 1. Different address tracing techniques
off-processor memory requests when they are sent to off-chip caches and memory chips. In the microcode alteration technique, commonly known as ATUM (address tracing using microcode), the traces are obtained by making minor changes to the existing microcode of a machine. This technique has been employed to obtain address traces on VAX architectures [Agarwal88]. In the interrupt-based technique, every instruction generates a CPU interrupt, and the interrupt routine analyzes the opcode, calculates the memory addresses, if any, and stores them in a buffer. Most architectures provide a trap bit that can be enabled and a corresponding interrupt routine that can be modified to acquire the traces. In the instrumented program technique, the application program is instrumented at specific points. During run time, these extra instructions log the trace information, which, when postprocessed, gives the actual trace. Code insertion can be carried out at various levels, such as the source code level, the assembly level, or the binary (object) level [Stunkel91]. Software simulation methods can model processor execution and simultaneously provide user traces. Pixie is one of the software simulation methods used to capture traces. This method of generating address traces was initially developed on SPARC systems at Berkeley [Lovett93].
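As a rough illustration of the instrumented program technique, the C sketch below logs every load into an in-memory trace buffer through hand-inserted macros. A real instrumentation tool inserts equivalent code automatically at the source, assembly, or object level. The names `TRACE_READ`, `TRACE_WRITE`, `trace_log`, and `sum_array`, the record layout, and the buffer size are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record: access type plus the referenced address. */
enum access_type { ACCESS_READ = 0, ACCESS_WRITE = 1 };

struct trace_record {
    enum access_type type;
    uintptr_t address;
};

#define TRACE_CAPACITY 4096              /* assumed buffer size */

static struct trace_record trace_buf[TRACE_CAPACITY];
static size_t trace_len = 0;

/* Appends one record; references beyond the capacity are dropped. */
static void trace_log(enum access_type type, const volatile void *addr)
{
    if (trace_len < TRACE_CAPACITY) {
        trace_buf[trace_len].type = type;
        trace_buf[trace_len].address = (uintptr_t)addr;
        trace_len++;
    }
}

/* Instrumentation points inserted before each memory access. */
#define TRACE_READ(lvalue)  trace_log(ACCESS_READ, &(lvalue))
#define TRACE_WRITE(lvalue) trace_log(ACCESS_WRITE, &(lvalue))

/* An "application" routine instrumented by hand: sums an array. */
static long sum_array(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        TRACE_READ(a[i]);                /* log the load of a[i] */
        total += a[i];
    }
    return total;
}
```

Post-processing `trace_buf`, for example by writing it to a file after the run, yields the address trace that a simulator consumes.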
The most accurate way of studying cache performance before a machine is actually built is through simulation [Marcovitz88]. By changing the parameters of a simulation model, it is possible to simulate a cache of any size. With this approach, one can design a cache model for a required behavior, and if the performance analysis of the model reveals discrepancies, the cache can be redesigned.
Chapter II of this thesis reviews the current literature on the trace-driven simulation technique and on memory management in general. Chapter III discusses the design and implementation details of the software that was developed as part of this thesis. The testing and evaluation of the software are discussed in Chapter IV. The last chapter, Chapter V, provides a summary of this thesis,




The most accurate method of determining the performance of a specific computer design, or the validity of a new architectural approach, is to build it [Lilja93]. A complete implementation is time consuming and expensive, and it generally precludes using the performance evaluation to tune the system. Therefore, the details of a design must be explored before the system is built, using mathematical analysis or simulation. A primary goal in modeling a system before constructing it is to reduce the memory access time, which in turn reduces execution time and improves the performance of the system. Since cache memories are often used in modern computer systems, the study of cache size, mapping, and replacement algorithms is an important field in computer system performance evaluation.
2.2 Definitions
This section contains some of the basic definitions about cache memory that are




• A trace is an address sequence obtained by executing a program and recording every memory location referenced by the program during its execution.
• Locality of reference is a property exhibited by running processes: processes tend to reference storage in nonuniform, highly localized patterns.
• Non-cacheable data: all data that is written by at least one processor, and read or written by at least one other processor, is marked as non-cacheable.
• Clumpiness means occurring close together. In this thesis, misses refer to cache misses, and clumpiness of misses refers to misses occurring close together or almost overlapping.
• To prefetch is to get the data or instructions required by a program before they are actually needed.
• Block size, or line size, is the amount of storage associated with an address tag.
• A cache miss occurs whenever the desired information is not available in the cache.
• A cache hit occurs whenever the desired information is available in the cache and the processor does not have to wait for it.
• A block is a group of words that can be read from or written to a device. A block in a cache can be divided into words, and a block can have any number of words. Whenever there is a miss, a whole block, rather than a single word, is brought into the cache.
• When the CPU executes instructions that modify the contents of the current address space, those changes must be reflected in main memory. Applying the modifications to main memory immediately is called write-through.
• When the CPU executes instructions that modify the contents of the current address space, those changes can initially be made in the cache and reflected in main memory later. This is called copy-back.
• A page map table is a table used to map virtual addresses onto physical addresses.
• Multiprogramming is defined as a collection of processes running logically in parallel, where the CPU switches from one process to another.
• When more than one process is requesting the CPU, the operating system must decide which one to run first. The part of the operating system concerned with this decision is called the scheduler, and the process of assigning the CPU to jobs is called scheduling.
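The write-through and copy-back definitions above can be contrasted with a one-word cache line. In this minimal sketch, the `cache_line` struct and the function names are hypothetical. Under copy-back, main memory stays stale until the modified (dirty) line is evicted.

```c
#include <stdbool.h>

/* Hypothetical one-word cache line used to contrast the two write policies. */
struct cache_line {
    int value;
    bool dirty;            /* meaningful only under copy-back */
};

/* Write-through: update the cache and main memory immediately. */
static void write_through(struct cache_line *line, int *memory_word, int v)
{
    line->value = v;
    *memory_word = v;
}

/* Copy-back: update only the cache and mark the line as modified. */
static void copy_back_write(struct cache_line *line, int v)
{
    line->value = v;
    line->dirty = true;
}

/* On replacement, a copy-back cache must first write a modified line back. */
static void copy_back_evict(struct cache_line *line, int *memory_word)
{
    if (line->dirty) {
        *memory_word = line->value;
        line->dirty = false;
    }
}
```

Copy-back saves memory traffic when a line is written many times before eviction, at the cost of tracking the dirty bit.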
2.3 Storage Hierarchy
Storage hierarchy refers to arranging storage devices on the basis of access speed and cost. Only the most important information, i.e., the programs and data referenced directly by the CPU, is kept on expensive fast devices, and the rest of the information is kept on inexpensive slow devices [Leung82]. The principal reason for having a hierarchical memory system is to improve the effective memory access time and accordingly increase the processing speed [Smith82]. For example, in a two-level memory hierarchy with a main memory and an auxiliary memory, information must first be moved to primary storage before the CPU can reference it. Thus the auxiliary memory has a copy of all the information stored in main memory. When a copy of data is modified in main memory, the copy in auxiliary memory must also be modified using a write-through or copy-back scheme. In a two-level system, data is referenced from main memory. If the referenced data is not available in main memory, it must be transferred from auxiliary memory to main memory. If main memory is already full, some page in main memory must be replaced using one of the replacement policies such as LRU, FIFO, or MRU.
The conventional storage hierarchy, consisting of main and auxiliary memory, was extended in the early 1960s with an additional level called cache memory, which is high-speed storage with a much faster access time than main memory [Smith82]. Cache storage is extremely expensive compared to main storage, and therefore only small caches are typically used.
The address space is divided into equal blocks called pages, and main memory is divided into blocks of the same size called page frames. A page of data resides in a page frame of memory, and the typical size of such a block is 512 to 1K words [Leung82]. Data transfers in the memory hierarchy are usually done by pages rather than by individual words or bytes, because locality of reference makes it likely that other words in a page will be referenced soon after the first one.
2.4 Cache Memory
Cache memory, as used in most computer systems, is a high-speed buffer memory interposed between main memory and the CPU. The operation of the cache starts with the arrival of a logical address from the CPU [Smith82]. At any time, the cache is intended to contain most of the information that the processor needs. Whenever a reference is made to new data that is not present in the cache, old data in the cache has to be replaced to make room for the new data brought from main memory. In this context, the issues of data traffic between cache and main memory are analogous to the issues of data traffic between main memory and auxiliary memory.
2.5 Cache Design Parameters
In uniprocessor computers, the main reason for employing a cache is to reduce the effective memory access time. If the miss ratio is reduced, the execution time can also be reduced, the execution time being "the sum of the time to service each cache hit plus the sum of the time to service each cache miss" [Marcovitz88]. If the misses occur close together (referred to as clumpiness of misses), their service times can overlap, so the time to service each cache miss can be less. Thus the cache miss ratio can be a good performance metric in a single-processor, single-cache computer.
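Writing the quoted definition with explicit symbols shows why the miss ratio is a useful single metric. The symbols are introduced here only for illustration: $N_h$ and $N_m$ are the numbers of hits and misses, and $t_h$ and $t_m$ are the average times to service a hit and a miss.

```latex
\begin{align}
T &= N_h\, t_h + N_m\, t_m, \qquad N = N_h + N_m, \\
m &= \frac{N_m}{N}, \qquad h = 1 - m, \\
\frac{T}{N} &= (1 - m)\, t_h + m\, t_m = t_h + m\,(t_m - t_h).
\end{align}
```

Because $t_m$ is normally much larger than $t_h$, the average time per reference grows almost linearly with the miss ratio $m$. When clumped misses overlap, the effective $t_m$ is lowered.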
There are four important aspects to be considered in designing a cache memory [Smith82]:
1) Maximizing the probability of finding a memory reference's target in the cache (the hit ratio).
2) Minimizing the miss ratio.
3) Minimizing the delay due to a miss.
4) Minimizing the overhead of updating main memory, i.e., choosing between write-through and copy-back to reflect modifications.
The following subsections describe the design parameters of a cache memory system: cache size, block size, cache organization, misses in prefetch, misses occurring in clumps, cache coherence, cache consistency, and replacement algorithms.
2.5.1 Cache Size
The size of the cache is an important design decision that affects the performance and cost of a cache memory system. The larger the cache, the higher the probability of finding the required information in it [Smith82]. However, a cache cannot be expanded without limit because of its cost and physical size.
2.5.2 Block Size
A block is a group of words that can be read from or written to a device. Selecting the block size is another important decision in memory system design. Kaplan and Winder [Kaplan73] indicated that there are a number of trade-offs in selecting the block size. The transmission time for moving a small block from main memory to cache is less than that for a bigger block. Locality of reference plays an important role in deciding the block size. If the block size is large, the transmission time may be large, but the process is likely to refer to other words in the same block. If the block size is small, main memory may have to be accessed twice instead of just once. The designer therefore has to choose a block size that balances these effects to improve the performance of the system.
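The block size trade-off can be made concrete with a simple cost model. The model is an assumption for illustration and is not taken from [Kaplan73]: every block fetch pays a fixed latency plus a per-word transfer time, and a program reads `n_words` consecutive words starting at a block boundary in an initially empty cache.

```c
/* Assumed cost model: each block fetch costs `latency` plus `per_word`
 * for every word in the block, used or not. Returns the total time to
 * read n_words consecutive words starting at a block boundary. */
static long sequential_fetch_cost(long n_words, long block_words,
                                  long latency, long per_word)
{
    long fetches = (n_words + block_words - 1) / block_words;  /* ceiling */
    return fetches * (latency + block_words * per_word);
}
```

Take a latency of 10 and 1 time unit per word. Reading 16 consecutive words then costs 176 units with 1-word blocks, 56 with 4-word blocks, and 26 with 16-word blocks. With 64-word blocks the cost rises to 74, because most of each fetched block is never used.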
2.5.3 Cache Organization
Cache organization is one of the design parameters that influence the performance and cost of a cache memory system. To locate an element in the cache, it is necessary either to have some kind of mapping from a main memory address to a cache location or to search the cache associatively.
Various cache organizations, such as fully associative, direct mapped, and set associative, are used in computer systems [Leung82]. The fully associative cache organization allows any page from main memory to be assigned to any page frame in the cache. Figure 2 illustrates associative mapping. For each page of data stored, the corresponding main memory address is also stored. Whenever a reference is made, all the stored addresses are searched to find a match for the referenced address. In the direct mapped cache organization, each page in memory can be mapped to only one particular location in the cache, which makes direct mapping more restrictive than the fully associative organization. The set associative organization divides the cache into S sets of E elements per set, so the page frames in a set associative cache are grouped into a number of sets [Smith82]. Each page in main memory is mapped onto a page frame that belongs to a particular set in the cache. If a particular page is in the cache, it must be stored in one of the elements of the corresponding set. In this kind of cache organization, replacement decisions are confined to the elements of that set.
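The three organizations differ only in which frames a lookup must examine, as the following sketch shows. It assumes that frames hold page numbers in a `tags` array, with the hypothetical marker `NO_PAGE` for an empty frame. It also assumes the usual modulo placement: frame `page mod n_frames` for direct mapping and set `page mod S` for set associative mapping. Frames of set `s` occupy indices `s*E` through `s*E + E - 1`.

```c
#include <stddef.h>

#define NO_PAGE (-1L)   /* marks an empty frame; page numbers are >= 0 */

/* Fully associative: any frame may hold the page, so search every tag. */
static long fa_lookup(const long *tags, size_t n_frames, long page)
{
    for (size_t f = 0; f < n_frames; f++)
        if (tags[f] == page)
            return (long)f;
    return NO_PAGE;
}

/* Direct mapped: the page can only live in frame (page mod n_frames). */
static long dm_lookup(const long *tags, size_t n_frames, long page)
{
    size_t f = (size_t)page % n_frames;
    return tags[f] == page ? (long)f : NO_PAGE;
}

/* Set associative: S sets of E elements; search only set (page mod S). */
static long sa_lookup(const long *tags, size_t n_sets, size_t ways, long page)
{
    size_t set = (size_t)page % n_sets;
    for (size_t e = 0; e < ways; e++) {
        size_t f = set * ways + e;
        if (tags[f] == page)
            return (long)f;
    }
    return NO_PAGE;
}
```

Each function returns the frame index on a hit and `NO_PAGE` on a miss. The search cost grows from one comparison (direct mapped) to E comparisons (set associative) to one comparison per frame (fully associative).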
2.5.4 Misses in Prefetch
Prefetching is one of the popular strategies used to bring pages into the cache before they are required. Prefetching fetches data or instructions before a program actually needs them, in anticipation that the program will use them in the near future. In prefetching, the data that may be required in the near future is brought into the prefetch buffer.
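One common prefetch strategy is one-block lookahead: a reference to page p also brings in page p + 1. It is used here only as an example and is not necessarily the strategy discussed in the sources above. This sketch counts demand misses with and without it. It assumes an idealized cache with no replacement and page numbers below a hypothetical `MAX_PAGES`.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64   /* assumed range of page numbers: 0 .. MAX_PAGES-1 */

/* Counts demand misses (references that must wait for main memory) for a
 * page trace in an idealized cache that never evicts. With prefetch
 * enabled, each reference to page p also brings in p + 1. */
static size_t count_demand_misses(const int *trace, size_t n, bool prefetch)
{
    bool resident[MAX_PAGES] = {false};
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        int p = trace[i];
        if (!resident[p]) {            /* processor waits for this page */
            misses++;
            resident[p] = true;
        }
        if (prefetch && p + 1 < MAX_PAGES)
            resident[p + 1] = true;    /* fetched ahead of demand */
    }
    return misses;
}
```

For a sequential trace 0, 1, 2, 3, 4, lookahead reduces the demand misses from 5 to 1. For a scattered trace such as 0, 10, 20, it removes none.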
There are two situations that can cause misses to occur when using prefetch buffers:
1) When the processor requests data or instructions from main memory that are not available in the cache, the processor has to wait until it gets the data; and
















Figure 2. Associative mapping using a page map table, given the virtual address (p: virtual page number; p': page frame number in main memory; d: displacement)
2.5.5 Misses Occurring in Clumps
Marcovitz discussed the clumpiness of misses, i.e., misses occurring close together, for a shared memory multiprocessor with prefetching [Marcovitz88]. When misses are close together, their service times can be overlapped. It is therefore better for misses to occur in clumps, because the total service time for those misses can be reduced. Hence the prefetch buffer has to wait for more than one miss to occur. Thus the number of misses that occur close together can be a good performance metric for a uniprocessor computer in designing a cache [Marcovitz88].
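One simple way to quantify clumpiness from a simulation is to group miss times into clumps. This grouping rule and its `window` parameter are assumptions for illustration, not the definition used in [Marcovitz88]. Under the rule, a miss joins the current clump if it occurs within `window` references of the previous miss.

```c
#include <stddef.h>

/* Groups miss times (in reference-count units, sorted ascending) into
 * clumps. A miss within `window` references of the previous miss joins
 * the current clump; otherwise it starts a new one. */
static size_t count_miss_clumps(const long *miss_times, size_t n, long window)
{
    if (n == 0)
        return 0;
    size_t clumps = 1;
    for (size_t i = 1; i < n; i++)
        if (miss_times[i] - miss_times[i - 1] > window)
            clumps++;            /* gap too large: start a new clump */
    return clumps;
}
```

For the same number of misses, fewer and larger clumps mean more opportunities to overlap miss service. For example, misses at times 3, 4, 5, 20, 21, and 50 form three clumps with a window of 2.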
2.5.6 Cache Coherence
Cache coherence must be maintained in multiprocessor computers with shared memory and private caches. In such systems, each cache works like a uniprocessor's cache, keeping copies of recently used locations, as long as its processor accesses data that is not shared with any other processor. In a uniprocessor environment, memory locations are used by only a single processor, so cache coherence need not be maintained; the processor always reads the correct value. In a multiprocessor environment, the data in a particular location disappears from a processor's cache when another processor writes into that location [Hill90]. When memory locations are shared among processors, cache coherence must be maintained so that each processor sees the correct value for the same variable. Marcovitz discusses cache coherence using non-cacheable marking [Marcovitz88]. Non-cacheable marking can help in maintaining cache coherence in a multiprocessor environment.
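The non-cacheable rule from Section 2.2 can be checked per page from a trace. This sketch assumes a hypothetical `sharing_info` record with one reader bitmask and one writer bitmask, supporting at most 32 processors. A page is non-cacheable if some processor writes it and some other processor reads or writes it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page sharing record: bit i is set when processor i
 * (i < 32) has read or written the page. */
struct sharing_info {
    uint32_t readers;
    uint32_t writers;
};

static void note_access(struct sharing_info *s, unsigned cpu, bool is_write)
{
    uint32_t bit = UINT32_C(1) << cpu;
    if (is_write)
        s->writers |= bit;
    else
        s->readers |= bit;
}

/* Non-cacheable: written by at least one processor and read or written
 * by at least one *other* processor. */
static bool is_non_cacheable(const struct sharing_info *s)
{
    uint32_t accessors = s->readers | s->writers;
    for (unsigned cpu = 0; cpu < 32; cpu++) {
        uint32_t bit = UINT32_C(1) << cpu;
        if ((s->writers & bit) && (accessors & ~bit))
            return true;
    }
    return false;
}
```

Read-only sharing never triggers the rule, so such pages can stay in every private cache. Only write-shared pages bypass the caches.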
are typically used to replace data in the cache [Smith82]. The LRU policy can be implemented with the stack model to replace information in the cache. In the LRU stack model algorithm, the addresses referenced by the processor are placed in a stack, with the most recently used address at the top of the stack and the least recently used address at the bottom. When a particular address is referenced, the stack is searched for the referenced block. The referenced address is then placed on top of the stack, and the addresses that were above it are shifted down by one position [Wang90].
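The LRU stack model above can be sketched in C as follows. The search position of a referenced page is its stack distance. A fully associative LRU cache holding C pages hits exactly when that position is within the top C entries, so the same stack serves every cache size. The stack bound `STACK_MAX` and the function name are assumptions.

```c
#include <stddef.h>

#define STACK_MAX 256   /* assumed bound on distinct pages kept in the stack */

/* LRU stack model with the most recently used page at stack[0].
 * Returns the number of misses that a fully associative LRU cache of
 * `cache_pages` entries would incur on the trace. */
static size_t lru_stack_misses(const long *trace, size_t n, size_t cache_pages)
{
    long stack[STACK_MAX];
    size_t depth = 0, misses = 0;

    for (size_t i = 0; i < n; i++) {
        size_t pos = 0;
        while (pos < depth && stack[pos] != trace[i])
            pos++;                        /* search from the top */

        if (pos == depth || pos >= cache_pages)
            misses++;                     /* not among the top cache_pages */

        if (pos == depth) {               /* first reference to this page */
            if (depth == STACK_MAX)
                pos = depth - 1;          /* drop the bottom entry */
            else
                depth++;
        }
        for (size_t k = pos; k > 0; k--)  /* shift entries above it down */
            stack[k] = stack[k - 1];
        stack[0] = trace[i];              /* referenced page goes on top */
    }
    return misses;
}
```

For the trace 1, 2, 3, 1, 2, 3, a 2-page cache misses on all six references, while a 3-page cache misses only on the first three.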
CHAPTER III
DESIGN AND IMPLEMENTATION ISSUES
3.1 Implementation Platform and Environment
3.1.1 Sequent Symmetry S/81
The Symmetry S/81 is a powerful mainframe-class multiprocessor system developed by Sequent Computer Systems, Inc. Its shared-memory, multiprocessing architecture consists of the following elements [Sequent90]:
• A parallel architecture using multiple industry-standard microprocessors.
• The DYNIX/ptx or DYNIX V3.0 operating system, both ports of the UNIX system.
• Standard interfaces including Ethernet, MULTIBUS, VMEbus, and SCSI.
The operating systems of the Symmetry S/81 have been engineered to incorporate parallel processing features. Even so, UNIX-compatible software can run on the Symmetry S/81 without modification or with only slight modification. In multi-user applications, tasks are automatically distributed to multiple processors, which generally increases system throughput and reduces response times [Sequent90].
DYNIX V3.0 supports both the Berkeley UNIX and UNIX System V command sets, whereas DYNIX/ptx is compatible with AT&T System V3.2 only [Sequent90]. The




The main purpose of this thesis was the performance analysis of cache memory using a trace-driven simulation technique. The simulation was run on address traces while varying the cache size, the size of a page in cache, the replacement algorithm, and the cache access time. The simulation runs provided experimental results showing the performance changes (see Section 4.2) due to variations in those parameters.
3.3 Input Parameters
3.3.1 Trace Collection Method
Traces can be collected using certain UNIX utilities, such as the prof command and UiIDumpSymborrable, available on the Sequent Symmetry S/81 machine under DYNIX/ptx. These address traces serve as input to the simulation. The prof command is used to generate the addresses referenced by programs during execution. The prof command interprets a profile file produced by the monitor function. Profiling is a three-step process: first the program is compiled with the -p option, then the program is executed, and finally prof is run to analyze the profile data. In DYNIX/ptx, the -p option to the C compiler command cc arranges for calls to monitor at the beginning and at the end of execution, so that the profile file is written [Sequent90].
Some of the traces used as input to the simulation were developed at the "Parallel Architecture Research Laboratory" of New Mexico State University [Spice94]. Gcc, spice, espresso, and eqntott are some of the traces that were developed on the DLX architecture machine and are kept in the public directory of the ftp site tracebase@nmsu.edu [Spice94].
3.3.2 Cache Organization
A fully associative cache organization (see Section 2.5.3 for the definitions of the various cache organizations) with page map tables and pages is used in this thesis to study the performance of the cache. At any time, the cache contains the page map tables and pages of the active jobs only. Other cache organizations could also be used for this kind of performance analysis.
3.3.3 Replacement Policies
The LRU and FIFO replacement policies using a time-stamp are used in replacing
the pages in cache in order to give room to new pages. The resident bit in the page map
table plays an important role in the implementation of replacement policies.
3.3.4 Scheduling
Round-robin scheduling with time slicing was used in this thesis to simulate a multiprogramming environment. The choice of a particular scheduling algorithm can play an important role in the performance of a computer system.
3.4 Design of the Simulation
A trace-driven simulation has been developed in the C programming language on the Sequent Symmetry S/81 machine running the DYNIX/ptx operating system. The input to the simulation is a reference string of five jobs, stored in a reference file called REFILE. Each reference in the reference string contains two fields: the first field is the reference type and the second field is the memory address. The reference type takes one of three values: 0, 1, or 2. The value 0 or 1 indicates that a read operation needs to be performed, and the value 2 indicates that a write operation has to be performed.
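Reading REFILE reduces to parsing one reference per line. This sketch assumes each line has the form `<type> <address>`, with a decimal type and a hexadecimal address separated by whitespace. That exact layout is an assumption. The type meanings follow the description above: 0 or 1 is a read and 2 is a write. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* One entry of the reference string. */
struct reference {
    int type;                  /* 0 or 1: read; 2: write */
    unsigned long address;
};

/* Parses one REFILE line of the assumed form "<type> <hex address>".
 * Returns false for malformed lines or unknown reference types. */
static bool parse_reference(const char *line, struct reference *out)
{
    int type;
    unsigned long address;
    if (sscanf(line, "%d %lx", &type, &address) != 2)
        return false;
    if (type < 0 || type > 2)
        return false;
    out->type = type;
    out->address = address;
    return true;
}

static bool is_write(const struct reference *r)
{
    return r->type == 2;
}
```

A driver would call `fgets` in a loop over the opened REFILE and pass each line to `parse_reference`, skipping lines that fail to parse.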
An array of records is used to simulate the cache. Once the user inputs the size of the cache, the array of records is dynamically allocated according to the input value. Figure 3 shows the organization of the cache and main memory used in the simulation. A certain amount of space in the cache is allotted for page map tables and a certain amount for pages. At any time, the cache contains the page map tables and the pages of the active jobs. The size of the page map tables is fixed, and virtual memory is achieved through page traffic between the main memory and the cache. Figure 4 gives the data structure used in simulating the cache. Main memory contains the page map tables of all the jobs in the system. Whenever a job becomes active, a copy of its page map table is brought from the main memory into the cache. Main memory also contains the global free frame table, which records whether each page frame in main memory is allotted or available. Once a job terminates, all the page frames allotted to that job are made available for the other




















Figure 3. Organization of cache and main memory (main memory holds the page map tables and pages of the jobs in the system)
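The global free frame table described above can be kept as one owner entry per main memory frame. This minimal sketch uses a hypothetical frame count and hypothetical function names. A frame is free when its owner is `FRAME_FREE`, so releasing a terminated job's frames is a single scan.

```c
#include <stddef.h>

#define N_FRAMES 16          /* assumed main-memory size in page frames */
#define FRAME_FREE (-1)      /* job ids are assumed to be non-negative */

/* Global free frame table: owning job id per frame, or FRAME_FREE. */
static int frame_owner[N_FRAMES];

static void init_frames(void)
{
    for (size_t f = 0; f < N_FRAMES; f++)
        frame_owner[f] = FRAME_FREE;
}

/* Allots the first available frame to `job`; returns its index or -1. */
static int alloc_frame(int job)
{
    for (size_t f = 0; f < N_FRAMES; f++) {
        if (frame_owner[f] == FRAME_FREE) {
            frame_owner[f] = job;
            return (int)f;
        }
    }
    return -1;
}

/* When a job terminates, all of its frames become available again. */
static void release_job_frames(int job)
{
    for (size_t f = 0; f < N_FRAMES; f++)
        if (frame_owner[f] == job)
            frame_owner[f] = FRAME_FREE;
}
```

A first-fit scan is enough for a table of this size. A larger simulation might keep an explicit free list instead.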
3.4.1 Page Map Table
A page map table is used to map virtual addresses to physical addresses. A virtual address contains a virtual page number and an offset (in general, a virtual address can also contain a segment number, but segmentation is beyond the scope of this thesis). The virtual page number is used as an index into the page table, and the page frame number is found in the page table entry. The page frame number is then concatenated with the offset to form the physical address. The exact layout of each entry in the page map table is highly machine dependent, but the kind of information stored is almost the same from machine to machine. A typical page table entry has 32 bits, of which 21 bits are allotted for the frame number, 1 bit for the modified or dirty bit (to indicate whether the referenced page has been modified), 1 bit for the resident bit (to indicate whether the page is in cache), and












Figure 5. Data structure of main memory
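The translation described in Section 3.4.1 can be sketched with the 32-bit entry format given there: a 21-bit frame number, a dirty bit, and a resident bit. The bit positions and the 1024-unit page size below are assumptions, and the remaining bits of the entry are left unused. The lookup returns false when the resident bit is clear, which the simulation treats as a page fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed entry layout (positions are illustrative):
 *   bits 0-20  page frame number (21 bits)
 *   bit  21    modified (dirty) bit
 *   bit  22    resident bit
 * Remaining bits are unused in this sketch. */
#define PTE_FRAME_MASK  UINT32_C(0x1FFFFF)
#define PTE_DIRTY       (UINT32_C(1) << 21)
#define PTE_RESIDENT    (UINT32_C(1) << 22)

#define OFFSET_BITS 10  /* assumed page size of 1024 addressable units */
#define OFFSET_MASK ((UINT32_C(1) << OFFSET_BITS) - 1)

/* Splits vaddr into page number and offset, indexes the page table, and
 * concatenates frame number and offset. Returns false (a page fault) if
 * the page number is out of range or the page is not resident. */
static bool translate(const uint32_t *page_table, uint32_t n_entries,
                      uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> OFFSET_BITS;        /* virtual page number */
    uint32_t offset = vaddr & OFFSET_MASK;      /* displacement */
    if (vpn >= n_entries || !(page_table[vpn] & PTE_RESIDENT))
        return false;
    uint32_t frame = page_table[vpn] & PTE_FRAME_MASK;
    *paddr = (frame << OFFSET_BITS) | offset;
    return true;
}
```

A 21-bit frame number shifted left by 10 bits still fits in 32 bits, so the concatenation cannot overflow.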
3.4.2 Process Control Block
The PCB is a central store of information that allows the operating system to locate all key information about a process. When the operating system switches the CPU among processes, it uses the save areas in the PCB to hold information such as the identification number of the process, its current state, and its priority. Whenever a process gets the CPU again, the information stored in its PCB is used to restart it.
The PCB typically contains the following information for each job:
1) The job id.
2) When the job entered the system.
3) The starting address of the page map table of the job in the main memory.
4) The number of pages allotted for the job.
5) The starting address of the job in cache.
In this simulation program, a free PCB is obtained and allotted to a job whenever the job enters the system, and the job's identification number is stored in the PCB. Whenever the CPU switches among jobs, each job's current status is stored in its PCB so that, when the job gets the CPU back, the operating system can use the information stored in the PCB to restart the process.
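The PCB fields listed above map naturally onto a C struct. In this sketch the field names, the `job_state` enumeration, and the saved trace position are hypothetical additions used for illustration. The table size of five matches the five jobs in REFILE. A free PCB is found by scanning for an unused slot.

```c
#include <stddef.h>

#define MAX_JOBS 5   /* REFILE holds reference strings for five jobs */

enum job_state { JOB_FREE, JOB_READY, JOB_RUNNING, JOB_DONE };

/* Process control block: the five fields listed above, plus an assumed
 * state field and an assumed resume point in the job's trace. */
struct pcb {
    int job_id;
    long entry_time;         /* when the job entered the system */
    size_t pmt_address;      /* page map table start in main memory */
    int pages_allotted;      /* cache page frames allotted to the job */
    size_t cache_start;      /* start of the job's pages in cache */
    enum job_state state;    /* assumed: current status */
    size_t next_reference;   /* assumed: next reference to process */
};

static struct pcb pcb_table[MAX_JOBS];   /* zero-initialized: all JOB_FREE */

/* Allots a free PCB to a job entering the system; NULL if none is free. */
static struct pcb *allocate_pcb(int job_id, long now)
{
    for (size_t i = 0; i < MAX_JOBS; i++) {
        if (pcb_table[i].state == JOB_FREE) {
            pcb_table[i] = (struct pcb){ .job_id = job_id,
                                         .entry_time = now,
                                         .state = JOB_READY };
            return &pcb_table[i];
        }
    }
    return NULL;
}
```

On a context switch, the simulator would store the job's current position in `next_reference` and later resume the job from that point.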
3.5 Implementation Details
The main inputs to the simulation program are a reference string (also referred to as a trace) and the cache size. The reference file (REFILE) consists of reference strings for five jobs. Each reference in the reference file is processed separately. Some of the references used for this thesis are actual memory traces [Spice94]. To simulate a multi-user environment, the individual traces were interleaved.
The simulation program is menu driven. A user can input design parameters such as the cache size and the replacement policy, and can obtain performance graphs generated by the system. The simulation uses the round-robin scheduling algorithm. In round-robin scheduling, a job runs until its time slice expires, the job terminates, the job requests I/O, the job triggers a page fault, or the job requests interprocess communication. The next job in the queue is then given the CPU.
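The round-robin policy, together with the maximum degree of multiprogramming of four described below, can be sketched as follows. For brevity, only time-slice expiry and termination end a turn in this sketch; I/O, page faults, and interprocess communication are omitted. Time is measured in references, and all names are hypothetical.

```c
#include <stddef.h>

#define MAX_ACTIVE 4   /* maximum degree of multiprogramming */

/* Appends a job to the circular ready queue. */
static void enqueue(int *queue, size_t head, size_t *count, int job)
{
    queue[(head + *count) % MAX_ACTIVE] = job;
    (*count)++;
}

/* remaining[j] is the number of references job j still has to issue.
 * Records each dispatch in order[] and returns the number of dispatches. */
static size_t round_robin(long *remaining, size_t n_jobs, long slice,
                          int *order, size_t max_order)
{
    int queue[MAX_ACTIVE];
    size_t head = 0, count = 0, next_job = 0, dispatches = 0;

    while (count < MAX_ACTIVE && next_job < n_jobs)  /* admit up to four */
        enqueue(queue, head, &count, (int)next_job++);

    while (count > 0 && dispatches < max_order) {
        int job = queue[head];                      /* front of the queue */
        head = (head + 1) % MAX_ACTIVE;
        count--;

        order[dispatches++] = job;
        long run = remaining[job] < slice ? remaining[job] : slice;
        remaining[job] -= run;                      /* run for one slice */

        if (remaining[job] > 0)
            enqueue(queue, head, &count, job);      /* slice expired: requeue */
        else if (next_job < n_jobs)
            enqueue(queue, head, &count, (int)next_job++); /* admit waiting job */
    }
    return dispatches;
}
```

Take five jobs needing 3, 1, 2, 1, and 2 references with a slice of 2. The dispatch order is 0, 1, 2, 3, 0, 4: job 4 is admitted only after job 1 terminates and frees a slot.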
The cache used in this simulation contains the page map tables and pages of active jobs only. The size of the page map table and the number of pages allotted for each job are fixed, and the maximum degree of multiprogramming is four. Thus, when four jobs are active, copies of the four jobs' page map tables are brought into the cache from main memory. A fixed number of cache page frames are allotted to the active jobs when the cache is loaded. Each page of each job is mapped onto a distinct page frame in main memory via the page map table. Figure 6 gives the data structure used for the page map table in the simulation. The first page referenced by a job is always loaded into the cache, and the resident bit for that page in the page map table is set to 1. The rest of the pages allotted for a job in cache are loaded upon request, using the demand page algorithm given in Figure 7. Each time a page is loaded into the cache, the resident bit for that page in the page map table is set to 1.
Once all the pages allotted for a job are full and a new page has to be brought in, one of the job's pages needs to be replaced using one of the time-stamp-based replacement policies, LRU or FIFO. A variable called clock is used to indicate when a page was last referenced. A time-stamp is associated with each entry in the page map table and is used for implementing the replacement policies. When a reference is made to a particular page and that page is not available in cache (i.e., a cache miss), the desired page has to be brought into the cache. The main memory page frame number is obtained from the page map table and the page is brought from the main memory and








Figure 6. The data structure used for page map table
In the LRU (least recently used) policy, the entries in the page map table whose resident bit is set to 1 are checked to find the entry with the lowest time-stamp, and that page is replaced. In the FIFO (first in, first out) policy, the entries in the page map table whose resident bit is set to 1 are checked to find the entry with the highest time-stamp value, and that page is replaced. Figure 8 gives the pseudocode of the pagefault_handler algorithm used in the simulation. Once a job terminates, the cache is flushed and the main memory free frame table is adjusted accordingly.
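The demand paging and page fault handling steps above can be sketched as follows. The sketch assumes each page map table entry keeps two stamps: `last_ref`, the clock value at the most recent reference, used by LRU, and `load_time`, the clock value when the page was loaded, used by FIFO. Because it stores load times, FIFO here evicts the smallest stamp. If a stamp instead records a page's age, the highest value, as described above, identifies the same page. All names are hypothetical, dirty-page write-back is omitted, and each job is assumed to have at least one cache slot.

```c
#include <stdbool.h>
#include <stddef.h>

enum policy { POLICY_LRU, POLICY_FIFO };

struct pmt_entry {          /* hypothetical page map table entry */
    bool resident;          /* resident bit: page is in cache */
    int cache_slot;         /* which of the job's cache pages holds it */
    long last_ref;          /* clock at most recent reference (LRU) */
    long load_time;         /* clock when loaded into cache (FIFO) */
};

struct job {
    struct pmt_entry *pmt;
    size_t n_pages;         /* virtual pages of the job */
    int slots_allotted;     /* cache pages allotted to the job (>= 1) */
    int slots_used;
};

static long sim_clock = 0;

/* Scans resident entries for the smallest relevant time-stamp. */
static size_t choose_victim(const struct job *j, enum policy pol)
{
    size_t victim = j->n_pages;
    long best = 0;
    for (size_t p = 0; p < j->n_pages; p++) {
        if (!j->pmt[p].resident)
            continue;
        long stamp = pol == POLICY_LRU ? j->pmt[p].last_ref
                                       : j->pmt[p].load_time;
        if (victim == j->n_pages || stamp < best) {
            victim = p;
            best = stamp;
        }
    }
    return victim;
}

/* Brings `page` into one of the job's cache slots, evicting if needed. */
static void pagefault_handler(struct job *j, size_t page, enum policy pol)
{
    int slot;
    if (j->slots_used < j->slots_allotted) {
        slot = j->slots_used++;              /* a free slot remains */
    } else {
        size_t victim = choose_victim(j, pol);
        slot = j->pmt[victim].cache_slot;
        j->pmt[victim].resident = false;     /* clear victim's resident bit */
    }
    j->pmt[page].cache_slot = slot;
    j->pmt[page].resident = true;
    j->pmt[page].load_time = sim_clock;
}

/* Demand paging: returns true on a cache hit, false on a miss. */
static bool reference_page(struct job *j, size_t page, enum policy pol)
{
    bool hit = j->pmt[page].resident;
    sim_clock++;
    if (!hit)
        pagefault_handler(j, page, pol);
    j->pmt[page].last_ref = sim_clock;
    return hit;
}
```

With two cache slots and the page sequence 0, 1, 0, 2, 1, LRU hits once and FIFO hits twice. The re-reference to page 0 protects it under LRU but not under FIFO.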
calculated as the number of statements executed to get the page from main memory, and the effective access time is calculated as the sum of the time to service each cache hit plus the sum of the time to service each cache miss.
if (page is not in cache)









Figure 7. Demand page algorithm







Figure 8. Pagefault_handler algorithm
CHAPTER IV
EVALUATION OF THE SIMULATION
In this chapter, the simulation is evaluated and some observations based on it are presented. The results obtained through the simulation are compared against the results obtained by Marcovitz [Marcovitz88], Smith [Smith82], and Agarwal [Agarwal93].
4.1 Test Programs
Several traces obtained from the Parallel Architecture Research Laboratory of New
Mexico State University were used to drive the simulation [Spice94]. The test programs
that were used are gcc, spice, espresso, eqntott, and matrix. These traces were captured
in real time from ten SPEC89 programs running on a Sun 3/00 under SunOS 4.0.3
[Spice94]. TABLE I gives the nature and characteristics of the traces used. Several
graphical user interface application programs written in C were also used to collect
address traces, and these reference strings were also used to drive the simulation. The
programs used were GNU chess, lander, xboard, xpaint, and CTWM. The miss
ratios, hit ratios, delays due to cache misses, and execution times were obtained, and
graphs were plotted to evaluate the simulation. A brief description of the programs whose
traces were used to drive the simulation is given below.















Gcc Program: Gcc is the trace obtained from the GNU C compiler, which
is written in C. This benchmark "measures the time it takes for the GNU C compiler to
convert a number of its pre-processed source files into optimized Sun-3 assembly
language output" [Johnson94].
Spice Program: Spice is the trace of the analog circuit simulator written in FORTRAN
with a C interface to UNIX. This benchmark is a "general purpose circuit simulation
program for nonlinear dc, nonlinear transient, and linear ac analyses" [Johnson94].
Espresso: Espresso trace is the trace obtained from a program used to minimize logic
equations in computer design. This program is written in C.
Eqntott: Eqntott trace is the trace of a program that converts logic equations to truth
tables.
Matrix: Matrix trace is a trace obtained from a matrix multiplication program written in
C. This "benchmark also performs transposes using Linpack routines on matrices of order
300" [Johnson94].
GNU Chess: GNU chess is an ANSI C version of a chess program developed by Stuart
Cracraft.
Lunar Lander Game: The lunar lander game is a C implementation of the old
"lunar lander" game seen in amusement arcades. This program was developed using
curses.
Xboard: Xboard is an X11R4-based user interface for GNU chess.
Xpaint: Xpaint is also a graphical user interface program developed in X Windows. The
program was developed by David Koblas and is used for drawing and editing figures, similar
to MacPaint.
CTWM: CTWM (Claude's Tab Window Manager) is a window manager for X Windows.
4.2 Graphs
Graphs have been plotted using Harvard Graphics (Harvard Graphics for Windows
Ver 3.0, a software package developed by the Software Publishing Corporation), which
is an interactive graphics package. This tool was used to plot graphs
with the hit ratio on the Y axis vs. the cache size on the X axis, or the miss ratio on the
Y axis vs. the cache size on the X axis. Several graphs were plotted with different cache
sizes and a fixed page size of 512 words per page, using the two different replacement
policies, LRU and FIFO. Graphs were also plotted with the delay due to a miss on the
X axis and the miss ratio on the Y axis for all the test programs. From the graphs
obtained, it can be observed that the miss ratio can be a good performance metric in
designing a cache. From the graphs, it can also be observed that the performance of a
system can be improved by including page map tables and pages in cache, because the
effective access time is less.
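The cache-size sweep behind these graphs can be illustrated with a small trace-driven LRU simulation. This is a sketch under stated assumptions, not the thesis's simulator: the trace is already a string of page numbers (with 512-word pages, an address would first be divided by 512), the cache is fully associative, and `MAX_FRAMES` and the function names are hypothetical.

```c
#include <stddef.h>

#define MAX_FRAMES 64   /* hypothetical upper bound on simulated cache pages */

/* Count misses for a fully associative cache of `frames` pages managed
   with LRU, driven by a page reference string of length n. */
size_t lru_misses(const int *refs, size_t n, size_t frames)
{
    int    page[MAX_FRAMES];  /* page held in each cache frame */
    size_t last[MAX_FRAMES];  /* time of that frame's most recent reference */
    size_t used = 0, misses = 0;

    if (frames == 0)
        return n;             /* an empty cache misses on every reference */
    if (frames > MAX_FRAMES)
        frames = MAX_FRAMES;

    for (size_t t = 0; t < n; t++) {
        size_t i = 0;
        while (i < used && page[i] != refs[t])
            i++;
        if (i == used) {                 /* miss */
            misses++;
            if (used < frames) {
                i = used++;              /* fill a free frame first */
            } else {
                i = 0;                   /* evict the least recently used frame */
                for (size_t k = 1; k < used; k++)
                    if (last[k] < last[i])
                        i = k;
            }
            page[i] = refs[t];
        }
        last[i] = t;
    }
    return misses;
}

double lru_miss_ratio(const int *refs, size_t n, size_t frames)
{
    return n ? (double)lru_misses(refs, n, frames) / (double)n : 0.0;
}
```

Calling `lru_miss_ratio` for increasing `frames` over one trace yields a miss-ratio curve. On the cyclic string 0, 1, 2, 3 repeated three times, every reference misses with three or fewer frames, while four or more frames leave only the four compulsory misses (a miss ratio of 4/12); beyond that point, extra cache pages no longer help.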
4.3 Observations
The graphs were plotted for all the test programs, and the graphs obtained were
compared with comparable graphs from the literature [Agarwal93] [Marcovitz88]. The
graph in Figure 9 for the gcc trace (using the LRU policy) shows that as the cache size
increases, the miss ratio decreases; but beyond a certain point, the miss ratio is no longer
affected by further increases in the cache size. The graph in Figure 10 for the gcc trace (using the
LRU policy) shows that as the cache size increases, the hit ratio also increases. The graph in
Figure 11, plotted for the gcc trace, shows that the delay due to cache misses decreases
as the miss ratio decreases, because there are fewer misses and hence less time is spent
servicing them. The graph in Figure 12 shows that as the miss ratio decreases, the
effective access time also decreases, because main memory is accessed less often
to service cache misses. The graphs (the delay due to cache miss vs. the
miss ratio and the miss ratio vs. the effective access time) were compared with the graphs
obtained by Marcovitz [Marcovitz88]. The graph in Figure 13 for the spice trace (with
the LRU policy) does not show much difference in the miss ratio, even after increasing
the cache size, mainly because of the reference pattern. The graph in Figure 17 for the
espresso trace (with the LRU policy) shows that sometimes the miss ratio decreases and
sometimes the miss ratio remains unchanged even after increasing the cache size,
depending on the behavior of the program in execution. The same changes can be observed
in the hit ratio vs. the cache size. The graphs plotted were also compared with the
results obtained by Agarwal [Agarwal93]. The graphs in Figures 14, 15, and 16 plotted
for the spice trace, in Figures 18 and 19 plotted for the espresso trace, and in Figures 20,
21, and 22 plotted for the GNU chess trace can be analyzed in a similar way. So, by
having the page map tables and the pages in cache, the effective access time is reduced.
If the effective access time is reduced, the overall execution time is also reduced and these











[Figures 9 through 21 (captions lost in extraction): plots with cache size (16 to 88) or miss ratio on the X axis; the plotted data did not survive extraction.]
Figure 22. Cache size vs. effective access time for GNU chess (LRU policy)
CHAPTER V
SUMMARY AND FUTURE WORK
5.1 Summary
In Chapter I, the significance of the simulation, the introduction, and the main
objective of the thesis were stated. Chapter II presented a general introduction to cache
memory. The topics covered in this chapter consisted of the basic definitions needed to
understand cache design, the storage hierarchy, and some of the important cache design
parameters such as cache size, block size, cache organization, replacement policy, cache
coherence, the snoopy cache mechanism, and cache consistency. Chapter III discussed the
implementation issues and trace-driven simulation. Section 1 of Chapter III addressed the
implementation platform and the run-time environment. Chapter III also described the trace
collection method, gave a brief description of page map tables, and presented other implementation
details. Chapter IV discussed the evaluation of the simulation, the test programs used, and
the graphs obtained.
The main objective of this thesis was to develop a simulation package for cache
memory using a trace-driven simulation technique. This package can be used to design
a system and to improve the performance of an existing system. The results of this
simulation were compared with the results obtained by Marcovitz [Marcovitz88], Agarwal
[Agarwal93], and Smith [Smith82].
5.2 Future Work
Future versions of this package should remove one or more of the restrictions
described below. The sizes of the page map tables used in cache and main memory are fixed
in the current implementation; the page map table size could instead be varied and allocated
dynamically. A fixed number of page frames is allotted to each active job in cache;
the number of page frames allotted to each job could be varied. Several other replacement
algorithms, such as second chance replacement, most recently used (MRU), and least
frequently used (LFU), could also be used as page replacement policies. Other
scheduling algorithms, such as FIFO (first in, first out), SJF (shortest job first), and priority
scheduling, could also be used.
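As an illustration of one of these alternatives, second chance (clock) replacement can be sketched as follows. This is a hypothetical sketch, not part of the package: the frame count and names are assumptions, and this version sets a page's reference bit when the page is loaded, whereas some variants leave it clear.

```c
#include <stdbool.h>
#include <stddef.h>

#define CLOCK_FRAMES 4   /* hypothetical number of cache page frames */

typedef struct {
    int    page[CLOCK_FRAMES];  /* page held by each frame, -1 if free */
    bool   ref[CLOCK_FRAMES];   /* reference bit per frame */
    size_t hand;                /* next frame the clock hand inspects */
} clock_cache;

void clock_init(clock_cache *c)
{
    for (size_t i = 0; i < CLOCK_FRAMES; i++) {
        c->page[i] = -1;
        c->ref[i] = false;
    }
    c->hand = 0;
}

/* Reference page p. Returns true on a hit, false on a miss (p is loaded). */
bool clock_access(clock_cache *c, int p)
{
    for (size_t i = 0; i < CLOCK_FRAMES; i++)
        if (c->page[i] == p) {
            c->ref[i] = true;          /* recently used: earns a second chance */
            return true;
        }
    /* Miss: advance the hand, clearing reference bits, until a frame whose
       bit is already clear is found; free frames start with a clear bit. */
    while (c->ref[c->hand]) {
        c->ref[c->hand] = false;       /* second chance used up */
        c->hand = (c->hand + 1) % CLOCK_FRAMES;
    }
    c->page[c->hand] = p;
    c->ref[c->hand] = true;
    c->hand = (c->hand + 1) % CLOCK_FRAMES;
    return false;
}
```

The loop always terminates: after at most one full sweep every reference bit is clear. Compared with the time-stamp LRU used in the package, this needs only one bit per page instead of a scan for the minimum time stamp.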
REFERENCES
[Agarwal88] Anant Agarwal, John Hennessy, and Mark Horowitz, "Cache
Performance of Operating System and Multiprogramming Workloads", ACM
Transactions on Computer Systems, Vol. 6, No. 4, pp. 393-431, November
1988.
[Agarwal93] Anant Agarwal and Steven D. Pudar, "Column-Associative Caches: A
Technique for Reducing the Miss Rate of Direct-Mapped Caches", Proceedings
of the 20th Annual International Symposium on Computer Architecture, Los
Alamitos, CA, USA, pp. 179-190, 1993.
[Hill90] Mark D. Hill and James R. Larus, "Cache Considerations for Multiprocessor
Programmers", Communications of the ACM, Vol. 33, No. 8, pp. 97-102,
August 1990.
[Johnson89] Eric E. Johnson, "Working Set Prefetching for Cache Memories", ACM
Computer Architecture News, Vol. 17, No. 6, pp. 37-141, December 1989.
[Johnson94] Collen S. Schieber and Eric E. Johnson, "RATCHET: Real-time Address
Trace Compression Hardware for Extended Traces", ACM Performance
Evaluation Review, Vol. 21, No. 3, pp. 22-32, April 1994.
[Kaplan73] K. R. Kaplan and R. O. Winder, "Cache-Based Computer Systems",
IEEE Computer, Vol. 6, No. 3, pp. 30-36, March 1973.
[Leung82] Yuk-Hoi Leung, "A Variable Cache Simulation System", Project Report for
Master's Degree, University of Southwestern Louisiana, 57 pages, May 1982.
[Lilja93] David J. Lilja, "Cache Coherence in Large-Scale Shared-Memory Multiprocessors:
Issues and Comparisons", ACM Computing Surveys, Vol. 25, No. 3, pp. 303-
338, September 1993.
[Lovett93] Tom Lovett, Sequent Computer Systems, Inc., Personal Communication,
June 1993.
[Marcovitz88] David Michael Marcovitz, "A Multiprocessor Cache Performance
Metric", Technical Report CSRD Rpt. No. 813 (UILU-ENG-88-8011), Center
for Supercomputing Research and Development, University of Illinois, Urbana,
IL, August 1988.
[Sequent90] DYNIX/ptx User's Guide, Sequent Computer Systems, Inc., 1990.
[Smith82] Alan Jay Smith, "Cache Memories", ACM Computing Surveys, Vol. 14,
No. 3, pp. 228-270, September 1982.
[Spice94] An International Trace Archive, NMSU TraceBase, New Mexico State
University, Las Cruces, NM, 1994.
[Stenstrom90] Per Stenstrom, "A Survey of Cache Coherence Schemes for
Multiprocessors", IEEE Computer, Vol. 23, No. 6, pp. 12-24, June 1990.
[Stunkel91] Craig B. Stunkel, Bob Janssens, and W. Kent Fuchs, "Address Tracing for
Parallel Mechanisms", IEEE Computer, Vol. 24, No. 1, pp. 31-38, January
1991.
[Wang90] Wen-Hann Wang and Jean-Loup Baer, "Efficient Trace-Driven Simulation
Methods for Cache Performance Analysis", ACM SIGMETRICS: Performance















One of the code insertion techniques in which a program is
modified at the assembly level to generate addresses.
Coherence is correctness; cache coherence means that caches must be
able to see the same value for the same variable when a memory
location is shared by different processors, so as to maintain the
correct execution of programs.
One of the code insertion techniques in which a program is modified
during compile time to generate addresses referenced by the
processor.
An interactive, command-driven function plotting program.
The time between two misses on a single processor.
One of the code insertion techniques (sometimes called link-time
code modification) in which a program is modified during
link time to generate address traces.
A program used to capture traces referenced by the processes
during program execution.
TRADEMARK INFORMATION
DEC is a registered trademark of Digital Equipment Corporation.
DYNIX, DYNIX/ptx, Sequent, and Symmetry are registered trademarks of Sequent
Computer Systems, Inc.






This program is used to study the performance of cache. A cache with
page map tables and pages has been used in the simulation. At any
instant, the cache contains page map tables and pages of active jobs only.
A certain amount of storage in cache is reserved for page map tables and
a certain amount of storage is used for pages. The jobs table gives the
starting address of each job in cache. Each entry in
the page map table contains the page frame number, resident bit,
modified bit, location of the first page in cache, and time stamp. The
replacement policies used are LRU and FIFO, which replace a page to make
room for the incoming page. The following information is obtained from
the simulation: the cache miss ratio, the hit ratio, effective access























/* size of each page */
/* the storage space for page map table in
cache */
/* the storage space for page map table in
memory */
/* size of the page map table */
/* to check if the jobs are done */
/* maximum number of jobs */
/* a global variable to check if it is a
cache miss */
/* a global variable to check if the
page map table of the job is loaded
successfully */
/* a check to find if the cache is full */
/* variable used for random number
generator */
/* a check to find if the job has
terminated */
/* check to see if the time has expired */
/* a boolean variable for true or false */
/* a boolean variable for true or false */
/* a variable used for random number
generator */





/* a variable used for random number
generator */




The structures used for cache, main memory, page map table, pcb, list of jobs
in the system, list of jobs in the active queue.
***********************************************************************/














/* structure declaration for the word */
/* each entry in page map table contains the following fields
1. The main memory page frame number.
2. The resident bit to indicate whether the page is present
in cache.
3. The modified bit to indicate that the page has been modified
since it was last referenced.
4. The time stamp used for the replacement policy. */



















































































struct perf perf[MAXJOBS];
struct arr arry[1024];
struct pmt ca_pmtarry[10];
struct jobs_table list[MAXJOBS];
struct pg_list list_pges[1200];
struct temperory buffer[MAXJOBS];
struct pmt ma_pmtarry[MAXJOBS];
/* some of the global variables used */
int temp[25];
int cachesize = 0;
int numpages_cache = 0;
int perfcnt = 0;
double seed = 1.0;
int buffcnt = 0;
int clock_tick = 1;
int numpg_frames = 0, blksize = 0;
int pg_size = 0, pmtendaddr_ca = 0, pmtendaddr_ma = 0;






int choice, i = 0, j = 0;
int time_slice = 0;
int base_addrca = 0, base_addrma = 0, pmt_cntma = 0;
int statcnt = 0, pmt_cntca = 0, numpmt = 0, config_no = 1;
float exec_time = 0.0;
int memsize = 0;
char policy[6], scheduling[10];






printf(" * MENU *\n");
printf(" * ---------------------- *\n");
printf(" * *\n");
printf(" * ENTER 0 -> ENTER THE CACHE SIZE *\n");
printf(" * ENTER 1 -> PERFORMANCE ANALYSIS *\n");
printf(" * ENTER 2 -> TO END THE SESSION *\n");
printf(" * *\n");
printf(" ****************************************\n");











pg_size = 512;
%d", config_no);
time_slice = 0;
cachesize = 0;
base_addrma = 0;
base_addrca = 0;
blksize = 512;
numpages_cache = 0;
statcnt = 0;
numpg_frames = 0;
printf("\n ENTER THE CACHE SIZE NOW");
scanf("%d", &cachesize);
/* memset takes an int fill value, so 0 is used rather than NULL */
memset(pcb, 0, MAXJOBS * sizeof(struct pcb_info));
memset(arry, 0, 1024 * sizeof(struct arr));
memset(buffer, 0, MAXJOBS * sizeof(struct temperory));
initialize_performance();
memset(ma_pmtarry, 0, MAXJOBS * sizeof(struct pmt));
memset(ca_pmtarry, 0, 10 * sizeof(struct pmt));
memset(list, 0, MAXJOBS * sizeof(struct jobs_table));
memset(list_pges, 0, 1200 * sizeof(struct pg_list));




numpages_cache = (cachesize)/(pg_size) - PMTSIZECA;
cachesize = cachesize/512;
printf("\n CACHE SIZE IS %d", cachesize);
printf("K");
printf("\n NUMBER OF WORDS/PAGE IN CACHE %d", blksize);







/* main memory size is in bytes */
numpg_frames = ((memsize)/pg_size) - PMTSIZEMEM;
pmtendaddr_ca = (PMTSIZECA) * 512;
pmtendaddr_ma = (PMTSIZEMEM) * 512;
for (pmt_cntma = 0; pmt_cntma <=
(PMTSIZEMEM)/2; pmt_cntma++)
{
ma_pmtarry[pmt_cntma].pmt_flag = 0;
ma_pmtarry[pmt_cntma].base_pmtaddr =
base_addrma;
base_addrma = base_addrma + 1024;
numpmt = (PMTSIZECA / 2) - 1;





base_addrca = base_addrca + 1024;
mycache = NULL;
mem = NULL;
/* allocating memory to cache dynamically */
mycache = (struct cache *)malloc(sizeof(struct cache));
if (mycache == NULL)
{
printf("\n MEMORY ALLOCATION ERROR");
exit(1);
}
/* allocating memory and initialising members in cache
*/
allocatemem_initialise_cache();
/* allocating memory for main memory */
mem = (struct mamem *)malloc(sizeof(struct mamem));
if (mem == NULL)
{
printf("\n MEMORY ALLOCATION ERROR");
exit(1);
}
allocatemain_initialise();
/* making page map tables for the jobs in the system
*/
memory_module();





printf("\n JOBS STARTED EXECUTION");
if (strcmp(scheduling, "RR") == 0)
{
printf("\n ENTER THE TIME SLICE");
scanf("%d", &time_slice);




printf("\n ERROR IN TYPE OF SCHEDULING");
break;
case 1: printf("\n PERFORMANCE STATISTICS");
statcnt = 1;
for (i = 0; i <= perfcnt; i++)
{
exec_time = (float)(perf[i].cache_hit) +
perf[i].page_time;
printf("\n JOB ID : %d", perf[i].job_id);
printf("\n CACHE HITS : %4.3f",
(float)(perf[i].cache_hit)/(float)
(perf[i].cache_hit + perf[i].cache_miss));
printf("\n CACHE MISS : %4.3f",
(float)(perf[i].cache_miss)/(float)
(perf[i].cache_hit + perf[i].cache_miss));
printf("\n DELAY TIME : %4.3f", perf[i].page_time);
printf("\n EXECUTION TIME : %4.3f", exec_time);
printf("\n UPDATE TIME : %f", perf[i].update_time);

















memset(mycache->pmpt, 0, 4096 * sizeof(struct word));
free(mycache);





memset(mem->pmt, 0, 5120 * sizeof(struct word));















int i = 0;
for (i = 0; i < MAXJOBS; i++)
{





perf[i].hit_time = 0.0;
perf[i].miss_time = 0.0;
perf[i].update_time = 0.0;
perf[i].cupage_time = 0.0;






FUNCTION : allocatemem_initialise_cache()
PURPOSE : This function is used to allocate memory dynamically to




int i = 0, k = 0;
/* initialising page map tables */




mycache->pmpt[i].modifibit = 0;
mycache->pmpt[i].time_stamp = -1;
mycache->pmpt[i].oper = ' ';
mycache->pmpt[i].lpg_ca = -1;
/* allocating memory dynamically to the page */
mycache->pg = (PAGE **)malloc(numpages_cache * sizeof(PAGE *));
if (mycache->pg == NULL)
{
printf("\n MEMORY ALLOCATION ERROR");
exit(1);
}
for (i = 0; i < numpages_cache; i++)
{
mycache->pg[i] = (PAGE *)malloc(sizeof(PAGE));
if (mycache->pg[i] == NULL)
{
printf("\n MEMORY ALLOCATION ERROR");
exit(1);
}











FUNCTION : allocatemain_initialise()




int i = 0, k = 0;





mem->pmt[i].time_stamp = -1;
mem->pmt[i].oper = ' ';
mem->pmt[i].lpg_ca = -1;
}
/* allocating memory dynamically to the page */
mem->mpg = (PAGE **)malloc((numpg_frames) * sizeof(PAGE *));
if (mem->mpg == NULL)
{
printf("\n MEMORY ALLOCATION ERROR");
exit(1);
}
for (i = 0; i < numpg_frames; i++)
{
mem->mpg[i] = (PAGE *)malloc(sizeof(PAGE));
if (mem->mpg[i] == NULL)
{
printf("\n MEMORY ALLOCATION ERROR");
exit(1);
}
/* allocate memory dynamically to the word */










FUNCTION : get_random_no()
PURPOSE : This function returns a pseudo random number
greater than or equal to zero and less than 1. The maximum




seed = test + MM;
double hi, lo, test;
hi = (int)(seed/QQ);
lo = seed - QQ * hi;
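The get_random_no() fragment computes hi = seed/QQ and lo = seed - QQ * hi, and sets seed = test + MM, which matches the shape of Schrage's method for a Park-Miller multiplicative congruential generator. The listing does not show the constants, so the sketch below assumes the "minimal standard" values a = 16807 and m = 2^31 - 1 (with q = m / a and r = m % a); the function name is hypothetical and this is not the thesis's own code.

```c
/* Assumed Park-Miller "minimal standard" constants; the listing's QQ and MM
   appear to play the roles of q and m, but their values are not shown. */
#define PM_A 16807L        /* multiplier a */
#define PM_M 2147483647L   /* modulus m = 2^31 - 1 */
#define PM_Q 127773L       /* q = m / a */
#define PM_R 2836L         /* r = m % a */

/* One step of seed = (a * seed) mod m using Schrage's method, which keeps
   every intermediate value within a 32-bit long. The seed must start in
   1..m-1; the return value seed / m lies in (0, 1). */
double park_miller_next(long *seed)
{
    long hi   = *seed / PM_Q;
    long lo   = *seed % PM_Q;
    long test = PM_A * lo - PM_R * hi;
    *seed = (test > 0) ? test : test + PM_M;
    return (double)*seed / (double)PM_M;
}
```

Starting from seed 1, the first call sets the seed to 16807, and after 10,000 calls the seed is 1043618065, the check value usually quoted for this generator.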


















m = get_random_no();




FUNCTION : memory_module()
PURPOSE : This function is used to make the page map tables of the





int pmt_index = 0, pcb_index = 0;
int flag, rnd = 0, lpgno = 0;
char lopgno[5], type[5];
int typ, count = 0, j_id = 0, ent = 0;
int max = 0, loca = 0, tempaddr = 0;




printf("\n ERROR OPENING INPUT FILE");
exit(1);
/* obtain the page map table */
pmt_index = obtain_pmtma();
















/*numpg_frames = (numpg_frames - PMTSIZEMEM) + 1;*/
while (!feof(fpl))
{
memset(arry, 0, 1024 * sizeof(struct arr));
ma_pmtarry[pmt_index].pmt_flag = 1;
pcb[pcb_index].pcb_flg = 1;





perf[perfcnt].job_id = j_id;
max = 0;
tempaddr = pcb[pcb_index].base_addrmem;
flag = TRUE;
while ((strcmp(type, "JID") != 0) && (flag == TRUE))
{































loca = tempaddr + lpgno;
}




/* setting the flag of the occupied page to 1 */





























pmt_index = obtain_pmtma();















PURPOSE : This function is used to load the page map table into cache,
and the page frames are allocated to the jobs depending on






int i = 0, prid = 0, loca = 0;
int jd = 0, ref = 0, pcin = 0, addr_ma = 0;
int main_no = 0, page_no, fr_pg = 0, num = 0;
int locind = 0, locca = 0, sum = 0, locca_addr = 0;
prid = id;





/* finding the pcb for the job given the job id */
pcin = i;






/* address of the job in the main memory */
addr_ma = pcb[pcin].base_addrmem;




/* fixed number of page frames being allotted to each job */
pcb[pcin].nopages = numpages_cache/4;
num = pcb[pcin].nopages;
num = num + fr_pg;
/* setting the pages that have been allotted to each job as
occupied */





/* if there is no free page map table available, then
the cache is reported as full */
if (locind == -1)
return(CACHEFULL);
loca = addr_ma + i;




/* if a free page map table is available, then the page map
table is loaded into cache */
locca_addr = ca_pmtarry[locind].base_pmtaddr;
pcb[pcin].base_addrcac = locca_addr;
ca_pmtarry[locind].pmt_flag = 1;








mycache->pmpt[locca].lpg_ca = mem->pmt[loca].lpg_ca;
mycache->pmpt[locca].time_stamp =
mem->pmt[loca].time_stamp;
mycache->pmpt[locca].oper = mem->pmt[loca].oper;
ref = buffer[jd].wrdbf[0].ref_no;
main_no = mycache->pmpt[pcb[pcin].base_addrcac +
ref].mapageno;
mycache->pmpt[pcb[pcin].base_addrcac + ref].residbit = 1;
mycache->pmpt[pcb[pcin].base_addrcac + ref].lpg_ca = fr_pg;
mycache->pmpt[pcb[pcin].base_addrcac + ref].time_stamp =
clock_tick;
clock_tick++;


















FUNCTION : round_robin_scheduling()







int cac_reply = 0, cpu_reply = 0;
int actIcnt = 0, flg, actcnt = 0;
int i = 0, jb_ind = 0, jb_id = 0, jld = 0;
int j_id = 0, jobs_cnt = 0, NOMOREJOBS, numjobs = 0, ALLJOBSLOADED;
int ALLJOBSDONE, deq_multi = 0;









jb_id = list[jb_ind].list_id;
ALLJOBSLOADED = FALSE;
while (NOMOREJOBS == TRUE)
{
/* loading the jobs until cache is full */
cac_reply = loading_cache(jb_id);
while ((cac_reply != CACHEFULL) && (ALLJOBSLOADED != TRUE))





















/* the job is run until the time slice expires */
cpu_reply =
run_job_timeslice(rrslice, j_id, repolicy, active_que);
active_que[actcnt].nopagesallt = active_que[0].nopagesallt;
while (cpu_reply != TERMINATED)
{







active_que[actcnt].jbflg = 0;







printf("\n TERMINATED JID : %d ", j_id);
active_que[0].list_id = 0;
active_que[0].jbflg = 0;









jld = alljobs_loaded();







jb_ind = get_job(numjobs);
jb_id = list[jb_ind].list_id;





/* jobs being sent to CPU */
cpu_reply =
run_job_timeslice(rrslice, j_id, repolicy, active_que);
if (cpu_reply == TERMINATED)
{
printf("\n TERMINATED JID : %d", j_id);
active_que[0].list_id = 0;
active_que[0].jbflg = 0;







/* if the job has not terminated, then the next job in
the active queue is given the CPU */
active_que[actcnt].list_id = j_id;
for (i = 0; i < actcnt; i++)
{
}





PURPOSE : This function is used to adjust the number of jobs in the






int i = 0, j = 0;












FUNCTION : obtain_ind()
PURPOSE : This function is used to obtain the correct job id when the





int i = 0;







FUNCTION : get_job()
PURPOSE : This function is used to get the next job in the system
**********************************************************************/
int get_job(njbs)
int njbs;
{
int i = 0;













PURPOSE : This function is used to check if all the jobs in the system





int i = 0;
/* finding out the number of jobs in the system */













FUNCTION : run_job_timeslice()
PURPOSE : This function is used to run the jobs given the time slice;
each job is run until its time slice expires. Once the time
slice expires, if the job has not terminated, its status
is kept in the program counter so that the next time
the job becomes active, it can resume execution from




struct jobs_table actlist[MAXJOBS];
{
int main_no = 0, jbpages = 0, rnd_no = 0;
int pgin_cache = 0;
int pcb_id = 0, jb_run = 0, i = 0, j = 0, k = 0;
int time = 0, prescnt = 0, ref = 0, main_frno = 0;
int maddre = 0, pmtaddr = 0;
int TERMFLG, check = 0, perfct = 0;
if (pcb[i].jid == rjid)
break;





pcb_id = i;
jbpages = pcb[pcb_id].nopages;
pgin_cache = pcb[pcb_id].lfirst_pgca;
maddre = pcb[pcb_id].base_addrcac;





jb_run = i;
jbpages = pgin_cache + jbpages;







prescnt = pcb[pcb_id].pccounter;
TERMFLG = 0;




ref = buffer[jb_run].wrdbf[prescnt].ref_no;
pmtaddr = maddre + ref;
if (buffer[jb_run].wrdbf[prescnt].termflg == 1)
TERMFLG = 1;
}









j = pgin_cache + check;











main_no = mycache->pmpt[pmtaddr].mapageno;
delay++;




















































FUNCTION : pagefault_handler_LRUtime_stmp()
PURPOSE : This function is used to replace a page in the cache to
give room to the incoming page, using a least recently used
policy based on the time stamp.
**********************************************************************/
pagefault_handler_LRUtime_stmp(replno, perfjd, pgref, perfl)
int replno, perfjd, pgref, perfl;
{
int i = 0, minimum = 0;
int baddr = 0, j = 0, k = 0;
int page_being_replaced = 0, rpgl = 0, rpg_frame = 0, pgno_cache = 0;
int tempindex = 0;
struct arr temp[30];
/* Here an LRU replacement policy is used to replace the page
using the time stamp */




for (i = baddr; i <= (pcb[perfjd].jb_size + baddr); i++)
delay++;
if (mycache->pmpt[i].residbit == 1)














for (k = 1; k < j; k++)
{
delay++;
if (temp[k].no < minimum)
{


















if (mycache->pmpt[rpg_frame].oper == 'w')
(













mem->mpg[rpgl]->wrd[i].lpg_ca =
mycache->pg[pgno_cache]->wrd[i].lpg_ca;
}
perf[perfl].update_time += 6 * (512 * 0.0005);
delay++;














delay = delay + 6 * 512;
/* here the page map tables are updated */
mycache->pmpt[baddr + pgref].residbit = 1;
delay++;
mycache->pmpt[baddr + pgref].lpg_ca = pgno_cache;
delay++;












int i = 0, k = 0, m = 0;
int addrca = 0, pmtind = 0, addrma = 0, loc = 0, pges = 0;
int main_no = 0, totca = 0, pge = 0, totmem = 0;
addrca = pcb[caind].base_addrcac;
addrma = pcb[caind].base_addrmem;
loc = pcb[caind].lfirst_pgca;
pges = pcb[caind].nopages;
totca = addrca + 1024;





pge = loc + pges;









mycache->pg[i]->wrd[k].time_stamp = -1;
mycache->pg[i]->wrd[k].oper = ' ';
mycache->pg[i]->wrd[k].lpg_ca = -1;
/* free frame table is set */
for (i = addrca; i < totca; i++) {
main_no = mycache->pmpt[i].mapageno;
if (main_no != -1)
{





mem->mpg[main_no]->wrd[m].time_stamp = -1;
mem->mpg[main_no]->wrd[m].oper = ' ';







mycache->pmpt[i].oper = ' ';
mycache->pmpt[i].lpg_ca = -1;





mem->pmt[i].time_stamp = 0;





PURPOSE : This function is used to get the paqe map table so as to





int i = 0, numpmts = 0;
numpmts = PMTSIZECA / 2;








FUNCTION : pagefault_handler_FIFOtimestamp()
PURPOSE : This function is used to replace the page in cache so as to






int i = 0, maximum = 0;
int j = 0, k = 0, baddr = 0;
int replaced_page = 0, rpgl = 0, rpg_frame = 0, pgno_cache = 0;
int tempindex = 0;
float upd_time = 0.0, pg_time = 0.0;
struct arr temp[30];
/* Here a FIFO replacement policy is used to replace the page
using the time stamp */












































mycache->pmpt[rpg_frame].time_stamp = -1;
delay++;
pgno_cache = mycache->pmpt[rpg_frame].lpg_ca;
delay++;
mycache->pmpt[rpg_frame].lpg_ca = -1;
delay++;
if (mycache->pmpt[rpg_frame].oper == 'w')















perf[perff].update_time += 6 * (512 * 0.0005);
delay++;















delay = delay + 6 * 512;
/* here the page map tables are being updated */
mycache->pmpt[baddr + fpgref].residbit = 1;
delay++;
mycache->pmpt[baddr + fpgref].lpg_ca = pgno_cache;
delay++;







PURPOSE : This function is used to get the page map table in the memory




int i = 0;









PURPOSE : This function is used to get the page map table in cache so as




int i = 0;
int numpmts = 0;
numpmts = PMTSIZECA/2;












FUNCTION : obtain_pcb()




int i = 0;
for (i = 0; i < MAXJOBS; i++)





PURPOSE : This function is used to get the free page in cache to allot




int i = 0;










Candidate for the degree of
Master of Science
Thesis: CACHE PERFORMANCE ANALYSIS: A TRACE-DRIVEN SIMULATION
Major Field: Computer Science
Biographical:
Personal Data: Born in Hyderabad, INDIA, on December 18, 1968, daughter
of N. Sreemm and N. Chandra Leela.
Education: Graduated from St. Ann's Junior College, Hyderabad, INDIA, in
May 1985; received Bachelor of Engineering (Hons) degree in Chemical
Engineering from Birla Institute of Technology and Science, Pilani,
Rajasthan, INDIA, in June 1990. Completed the requirements for the Master
of Science degree in Computer Science at the Computer Science
Department at Oklahoma State University in July 1994.
Experience: Worked as design engineer for Gwalior Rayon Industries; employed
by Oklahoma State University, University Computer Center as a piuate
research assistant from October 1992 to June 1994.
