Research on computer systems benchmarking by Smith, Alan Jay
NASA-CR-201398
Final Report
February, 1996
NASA - Ames Grant NCC2-550
Research on Computer Systems Benchmarking
Principal Investigator:
Professor Alan Jay Smith
Computer Science Division
EECS Department, Electronic Research Laboratory
University of California
Berkeley, CA 94720-1776
smith@cs.berkeley.edu 510-642-5290
This grant addresses the topic of Research on Computer Systems Benchmarking and is
more generally concerned with performance issues in computer systems. This report reviews
work in those areas during the period of NASA support under this grant.
The bulk of the work performed under this grant was done by a graduate student, Rafael
Saavedra-Barrera, who received his Ph.D. during the period of funding. (Ph.D., February, 1992,
"CPU Performance Evaluation and Execution Time Prediction Using Narrow Spectrum Bench-
marking", (Computer Science Division Technical Report UCB/CSD 92/684), MS, September,
1988, "Machine Characterization and Benchmark Performance Prediction" (UC Berkeley Com-
puter Science Division Technical Report 88/437). Saavedra-Barrera was the winner of the David
Sakrison award for the best Ph.D. thesis in the EECS Department in 1991-92.) His research
concerned benchmarking and analysis of CPUs, compilers, caches and benchmark programs. The
first part of this work concerned the issue of benchmark performance prediction. ("Machine
Characterization Based on an Abstract High Level Language Machine", Rafael Saavedra-Barrera,
Alan Jay Smith and Eugene Miya, IEEE Transactions on Computers, special issue on Perfor-
mance Evaluation, December, 1989, 38, 12, pp. 1659-1679.) Runs of a benchmark or a suite of
benchmarks are inadequate to either characterize a given machine or to predict the running time
of some benchmark not included in the suite. Further, the observed results are quite sensitive to
the nature of the benchmarks, and the relative performance of two machines can vary greatly
depending on the benchmarks used. In this first paper, we reported on a new approach to
benchmarking and machine characterization. The idea is to create and use a machine characterizer,
which measures the performance of a given system in terms of a Fortran abstract machine. For-
tran is used because of its relative simplicity and its wide use for scientific computation. The ana-
lyzer yields a set of parameters which characterize the system and spotlight its strong and weak
points; each parameter provides the execution time for some primitive operation in Fortran. We
presented measurements for a large number of machines ranging from small workstations to
supercomputers. We then combined these measurements into groups of parameters which relate
to specific aspects of the machine implementation, and used these groups to provide overall
machine characterizations. We also defined the concept of pershapes, which represent the level of
performance of a machine for different types of computation. We introduced a metric based on
pershapes that provides a quantitative way of measuring how similar two machines are in terms of
their performance distributions. This metric was related to the extent to which pairs of machines
have varying relative performance levels depending on which benchmark is used.
In another paper to come out of his research, ("Performance Characterization of Optimizing
Compilers", Rafael Saavedra-Barrera and Alan Jay Smith, IEEE TSE, July, 1995, vol. 21, no.
7, pp. 615-628), Saavedra analyzed compiler performance. Optimizing compilers have become
an essential component in achieving high levels of performance. Various simple and
sophisticated optimizations are implemented at different stages of compilation to yield significant
improvements, but little work has been done in characterizing the effectiveness of optimizers, or
in understanding where most of this improvement comes from. In this paper we studied the
performance impact of optimization in the context of our methodology for CPU performance
characterization based on the abstract machine model. The model considered all machines to be
different implementations of the same high level language abstract machine; in previous research, the
model has been used as a basis to analyze machine and benchmark performance. In this paper,
we: 1) showed that our model can be extended to characterize the performance improvement
provided by optimizers and to predict the run time of optimized programs; 2) measured the
effectiveness of several compilers in implementing different optimization techniques; and 3) analyzed
the optimization opportunities present in the Fortran SPEC and other benchmarks.
Benchmark programs are analyzed in another paper by Saavedra ("Analysis of Benchmark
Characteristics and Benchmark Performance Prediction", Rafael Saavedra-Barrera and Alan Jay
Smith, Technical Report UCB/CSD-92-715, December, 1992, submitted for publication).
Standard benchmarking provides the run times for given programs on given machines, but fails to
provide insight as to why those results were obtained (either in terms of machine or program
characteristics), and fails to provide run times for that program on some other machine, or some other
programs on that machine. We have developed a machine-independent model of program
execution to characterize both machine performance and program execution. By merging these
machine and program characterizations, we can estimate execution time for arbitrary
machine/program combinations. Our technique allows us to identify those operations, either on
the machine or in the programs, which dominate the benchmark results. This information helps
designers in improving the performance of future machines, and users in tuning their applications
to better utilize the performance of existing machines. Here we applied our methodology to char-
acterize benchmarks and predict their execution times. We presented extensive run-time statistics
for a large set of benchmarks including the SPEC and Perfect Club suites. We showed how these
statistics can be used to identify important shortcomings in the programs. In addition, we gave
execution time estimates for a large sample of programs and machines and compared these against
benchmark results. Finally, we developed a metric for program similarity that makes it possible
to classify benchmarks with respect to a large set of characteristics.
Saavedra also considered the effect of the memory hierarchy in: ("Measuring Cache and
TLB Performance and Their Effect on Benchmark Run Times", Rafael Saavedra-Barrera and
Alan Jay Smith, IEEE TC, October, 1995, 44, 10, pp. 1223-1235.) In previous research, we
developed and presented a model for measuring machines and analyzing programs, and for accu-
rately predicting the running time of any analyzed program on any measured machine. That work
is extended in this paper by: (a) developing a high level program to measure the design and per-
formance of the cache and TLB for any machine; (b) using those measurements, along with pub-
lished miss ratio data, to improve the accuracy of our run time predictions; (c) using our analysis
tools and measurements to study and compare the design of several machines, with particular ref-
erence to their cache and TLB performance. As part of this work, we described the design and
performance of the cache and TLB for ten machines. The work presented in this paper extends a
powerful technique for the evaluation and analysis of both computer systems and their workloads;
this methodology is valuable both to computer users and computer system designers.
A summary of some of the early work on this project appears in "Performance Prediction
by Benchmark and Machine Analysis", Rafael Saavedra-Barrera and Alan Jay Smith, Computer
Science Division Technical Report UCB/CSD 90/607, December, 1990. That paper has been
revised and updated, and should appear shortly as a book chapter. In this paper, we present a new
methodology for CPU performance evaluation based on the concept of an abstract machine model
and contrast it with benchmarking. The model consists of a set of abstract parameters representing
the basic operations and constructs supported by a particular programming language. The
model is machine-independent, and is thus a convenient medium for comparing machines with
different instruction sets. A special program, called the machine characterizer, is used to measure
the execution times of all abstract parameters. Frequency counts of parameter executions are
obtained by instrumenting and running programs of interest. By combining the machine and pro-
gram characterizations we can and do obtain accurate execution time predictions. This abstract
model also permits us to formalize concepts like machine and program similarity. A wide variety
of computers, from low-end workstations to high-end supercomputers, have been analyzed, as
have a large number of standard benchmark programs, including the SPEC scientific benchmarks.
We present many of these results, and use them to discuss variations in machine performance and
weaknesses in individual benchmarks. We also explain how the basic model can be extended to
account for the effects of compiler optimization, memory hierarchy, and vectorization. We also
indicate, where appropriate, how these factors affect our ability to predict the execution time of
programs. More details are given in the appropriate references.
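The prediction step described above amounts to a weighted sum: the measured execution time of
each abstract parameter on a machine is multiplied by the number of times the program executes
that operation, and the products are added. The following is a minimal sketch in Python under
that reading. The parameter names and every timing and count value are hypothetical, chosen
only for illustration; they are not measurements from the papers, and the real model uses a far
larger parameter set.

```python
# Sketch of execution time prediction with an abstract machine model.
# All parameter names and numbers below are hypothetical illustrations.

def predict_time(machine_times, program_counts):
    """Return the predicted run time: sum over parameters of count * time.

    machine_times  -- seconds per abstract operation (as a characterizer would report)
    program_counts -- dynamic execution count of each abstract operation
    """
    missing = set(program_counts) - set(machine_times)
    if missing:
        raise KeyError(f"machine lacks measurements for: {sorted(missing)}")
    return sum(count * machine_times[op] for op, count in program_counts.items())


# Hypothetical characterization of one machine (seconds per operation).
machine_a = {"fp_add": 2e-7, "fp_mul": 3e-7, "array_ref": 1e-7, "branch": 5e-8}

# Hypothetical dynamic counts obtained by instrumenting a program.
program = {"fp_add": 1_000_000, "fp_mul": 500_000, "array_ref": 2_000_000, "branch": 400_000}

predicted = predict_time(machine_a, program)
print(f"predicted run time: {predicted:.3f} s")
```

Because the machine and program characterizations are kept separate, the same count table can
be combined with the timing table of any other measured machine without rerunning the program.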
The work by Saavedra was continued into the domain of parallel and vector machines by
Stephen Von Worley. (Stephen Von Worley, MS, May, 1995, "Microbenchmarking and Performance
Prediction for Parallel Computers"; "Microbenchmarking and Performance Prediction for
Parallel Computers", (Stephen Von Worley and Alan Jay Smith), Technical Report
UCB/CSD-95-873, May, 1995, submitted for publication.) In that work, we extended the earlier
work to parallel computers. We described a portable benchmarking suite and performance pre-
diction methodology which accurately predicts the run times of Fortran 90 programs running
upon supercomputers. The benchmarking suite measures the optimization capabilities of a given
Fortran 90 compiler, execution rates of abstract Fortran 90 operations, and the processing charac-
teristics of the underlying architecture as exposed by compiler-generated code. To predict the run
time of an arbitrary program, we combine our benchmark results with dynamic execution mea-
surements, and augment the resulting prediction with simple factors which account for overhead
due to architecture-specific effects, such as remote reference latencies. We measured two super-
computers: a dedicated 128-node TMC CM-5, a distributed memory multiprocessor, and a 4-node
partition of a Cray YMP-C90, a tightly-integrated shared memory multiprocessor. Our measure-
ments show that the performance of the YMP-C90 far outstrips that of the CM-5, due to the qual-
ity of the compilers available and the architectural characteristics of each machine. To validate
our prediction methodology, we predicted the run time of five interesting kernels on these
machines; nearly all of the predicted run times are within 50 percent of actual run times, much
closer than might be expected.
In addition to the work described above, related work has also benefited to some extent from
NASA support. A student named Jeff Gee finished his Ph.D. in 1993 ("Analysis of Cache Perfor-
mance in Vector Processors and Multiprocessors"). The first part of that work addressed the
issue of the use of caches in vector processors. In the paper "The Performance Impact of Vector
Caches", (Gee and Smith, Proc. 25th Hawaii Intl. Conf. on System Sciences, January, 1992,
Hawaii, Volume I, pp. 437-448), we considered whether vector supercomputers should have
caches. Cache memories have not been used for vector supercomputers, as far as we know,
because of a belief that program behavior in relevant workloads was such as to preclude efficient
cache operation. It has been possible to make efficient use of such machines by carefully pro-
gramming around the resulting long memory delays, although unmodified, "dusty-deck" code
usually performs poorly. In related research, we have found that hit ratios are high for large
caches in processors with vector workloads. In this paper, we addressed the specific issue of the
direct effect of cache memory on vector processor performance. The issue in processor design is
machine performance, of which the hit ratio of the cache is only one determinant. In this paper,
we simulated three vector processors, the designs for which are derived from expected technology
changes applied to the Ardent Titan. Our simulator was an accurate timing model incorporating
the necessary aspects of the design of the cache and memory system. We found that current
trends in memory and processor performance will lead to increasingly severe memory speed and
bandwidth limitations. Either of two designs using large cache memories (2MB, 4MB) on
average doubles processor performance relative to the design without a cache. Hit ratios for
almost all of the programs used for trace driven simulation, drawn from real Ardent workloads,
are over 99%. Based on the work presented here and elsewhere, we recommend that future
supercomputers incorporate cache memories.
The second vector cache study with Gee ("The Effectiveness of Caches for Vector Proces-
sors", Jeffrey Gee and Alan Jay Smith, Proc. Int. Conf. on Supercomputing, Manchester, Eng-
land, July 11-15, 1994, pp. 333-343.) more directly studied the behavior of vector workloads.
Vector processors have typically used vector registers, interleaved memory, and pipelined access
to data to provide sufficient memory system performance. Caches have been used mainly for
instructions and scalar data, while vectors are usually uncached, presumably partially because of
the belief that there is insufficient vector locality in these workloads. In this study we used mem-
ory address traces from an Ardent Titan to examine both reference locality and cache perfor-
mance in a vector processing environment. Many of the Titan traces are from real vectorized
applications which reference large amounts of data. We found that vector references contain
somewhat less temporal locality, but large amounts of spatial locality compared to instruction and
scalar references. Cache miss ratios were found to be comparable to those measured and pub-
lished previously for various non-vectorized workloads. We provided analyses of trace behavior
with regard to parameters of interest to cache designers. Calculations based on our measured
miss ratios indicated that caches will improve average access times, which in turn can be expected
to translate into significant improvements in machine performance. Arguments suggesting other-
wise were discussed and considered.
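The kind of calculation referred to above can be illustrated with the standard average access
time relation, in which the average cost of a reference is the hit time plus the miss ratio times
the miss penalty. The sketch below is a simplified illustration, not the calculation from the
paper: the cycle counts are hypothetical, and it ignores the pipelined and interleaved memory
access that vector processors use to hide latency.

```python
# Average access time as a function of cache miss ratio.
# Cycle counts are hypothetical and chosen only for illustration.

def average_access_time(hit_time, miss_ratio, miss_penalty):
    """Average cycles per reference: hit_time + miss_ratio * miss_penalty."""
    if not 0.0 <= miss_ratio <= 1.0:
        raise ValueError("miss_ratio must lie in [0, 1]")
    return hit_time + miss_ratio * miss_penalty


MEMORY_LATENCY = 20   # assumed cost of a reference served by main memory
CACHE_HIT_TIME = 1    # assumed cost of a reference that hits in the cache

for miss_ratio in (0.01, 0.05, 0.20):
    cached = average_access_time(CACHE_HIT_TIME, miss_ratio, MEMORY_LATENCY)
    print(f"miss ratio {miss_ratio:.2f}: {cached:.2f} cycles per reference "
          f"(uncached: {MEMORY_LATENCY})")
```

Under these assumed numbers the cache helps at every miss ratio shown; how much of that gain
survives in a real vector machine depends on how well the uncached design already overlaps
memory accesses.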
The second part of Gee's research concerned cache memory design for multiprocessors. In
the first paper with Gee, ("Analysis of Multiprocessor Memory Reference Behavior", Jeffrey
Gee, Alan Jay Smith, Proc. ICCD'94 (IEEE Intl. Conf. on Computer Design: VLSI in Computers
and Processors), Cambridge, MA, October 10-12, 1994, pp. 53-59), we analyzed multiprocessor
memory reference behavior. Shared-memory multiprocessors can provide impressive perfor-
mance at reasonable costs, although private caches are usually needed to alleviate the potential
bottleneck at shared memory. These private caches in turn require the use of cache-consistency
(coherency) protocols, whose performance is a strong function of the reference behavior within
multiprocessor applications. In this paper we characterized the memory reference behavior in a
wide variety of scalar and vector multiprocessor address traces from production workloads. This
analysis was for the purpose of estimating and improving the performance of cache-consistency
protocols. Our analysis extended previous results in the literature by performing a wider variety
of analyses, and analyzing a larger and more diverse set of multiprocessor traces, including a pro-
duction vector workload. We found wide differences between the sharing behavior observed in
vector and scalar applications. Compared to scalar programs, vector programs reference shared
data more frequently and contain larger amounts of processor locality, the tendency for shared
data to be used by only one processor over periods of time. Write sharing by different processors
over short intervals is infrequent in one workload but frequent in another. This implies that
sequentially-consistent programming models will remain necessary unless applications are
recoded to avoid such reference patterns.
The second MP-cache paper with Gee, ("Evaluation of Cache Consistency Algorithm
Performance", Jeffrey Gee and Alan Jay Smith, Proc. Mascots'96 (Intl. Workshop on Modeling,
Analysis and Simulation of Computer and Telecommunications Systems) Conference, pp.
236-249, February 1-3, 1996, San Jose, CA.), presented the results of extensive simulations of
multiprocessor cache consistency algorithms. As more computer systems turn to multiprocessing
for improved performance, additional research is needed to evaluate and improve the performance
of cache consistency protocols. In this study, we used trace-driven simulation to examine the
performance of several consistency protocols, including some new adaptive protocols which have not
been examined in prior research. This study used a wider variety of traces than have been
previously analyzed, including some production applications from a vector mini-supercomputer
system, and presented a wider variety of analyses than have been previously shown for a given
workload. We found that the sharing characteristics of application programs have a large bearing on
the relative performance of the different protocols. Update-based protocols outperform
invalidate-based protocols when accesses to shared data are highly interleaved among different
processors (fine-grain sharing), while invalidate-based protocols are superior if one processor performs
all accesses to shared data over long periods of time (coarse-grain sharing). Adaptive protocols
provide the best overall performance across all applications; we present a new protocol called
Update-Once, which yields the highest average performance. In even the best cases, however,
estimated processor utilizations are unacceptably low due to the overhead to maintain consistent
caches. To extract good performance from multiprocessor systems, existing application programs
must be recoded to reduce sharing between processors.
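The contrast between update-based and invalidate-based protocols can be seen in a toy model that
counts bus transactions for a single shared block. The sketch below is a deliberately simplified
illustration and is not the simulator or any of the protocols evaluated in the paper: it assumes
infinite caches, one block, one bus transaction per miss, invalidation or update broadcast, and
no timing. Both traces are hypothetical.

```python
# Toy bus-transaction count for one shared block under two consistency policies.
# Simplifying assumptions: infinite caches, a single block, unit cost per transaction.

def bus_transactions(trace, protocol):
    """Count bus transactions for a trace of (processor, op) pairs, op in {'R', 'W'}."""
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")
    holders = set()   # processors that currently hold a valid copy
    cost = 0
    for proc, op in trace:
        has_copy = proc in holders
        if op == "R":
            if not has_copy:          # read miss fetches the block
                cost += 1
                holders.add(proc)
        elif op == "W":
            if protocol == "invalidate":
                if holders != {proc}:  # miss, or other copies must be invalidated
                    cost += 1
                holders = {proc}       # the writer becomes the only holder
            else:
                others = holders - {proc}
                if not has_copy or others:  # miss, or an update must be broadcast
                    cost += 1
                holders.add(proc)
        else:
            raise ValueError(f"unknown operation: {op}")
    return cost


N = 100
# Fine-grain sharing: processor 0 writes, processor 1 reads, repeatedly.
fine = [(0, "W"), (1, "R")] * N
# Coarse-grain sharing: both read once, then each writes N times in a long run.
coarse = [(0, "R"), (1, "R")] + [(0, "W")] * N + [(1, "W")] * N

for name, trace in (("fine-grain", fine), ("coarse-grain", coarse)):
    inv = bus_transactions(trace, "invalidate")
    upd = bus_transactions(trace, "update")
    print(f"{name}: invalidate={inv} update={upd}")
```

In this toy model the update policy needs roughly half the transactions on the fine-grain trace,
while the invalidate policy needs only a handful on the coarse-grain trace, which matches the
qualitative finding stated above.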
Gee also was a collaborator in a paper in which we measured the miss ratios for the SPEC
benchmarks: ("Cache Performance of the SPEC Benchmark Suite" (Jeffrey Gee, Mark Hill,
Dionisios Pnevmatikatos, Alan Smith), IEEE MICRO, 13, 4, August, 1993, pp. 17-27.) The SPEC
benchmark suite consists of a set of public-domain, non-trivial programs that are widely used to
measure the performance of computer systems, particularly those in the Unix workstation market.
These benchmarks were expressly chosen to represent real-world applications and were intended
to be large enough to stress the computational and memory system resources of current-
generation machines. The extent to which the SPECmark (the figure of merit obtained from run-
ning the SPEC benchmarks under certain specified conditions) accurately represents performance
with live real workloads is not well established; in particular, there has been some question
whether the memory referencing behavior (cache performance) is appropriate. In this paper, we
presented measurements of miss ratios for the entire set of SPEC benchmarks for a variety of
CPU cache configurations; this study extends earlier work that measured only the performance of
the integer (C) SPEC benchmarks. We found that instruction cache miss ratios were generally
very low, and that data cache miss ratios for the integer benchmarks were also quite low. Data
cache miss ratios for the floating point benchmarks were more in line with published measure-
ments for real (i.e. non-benchmark, non-synthetic) workloads. We believe that the discrepancy
between the SPEC benchmark miss ratios and those observed elsewhere is partially due to the fact
that the SPEC benchmarks are all almost exclusively user state CPU benchmarks run until com-
pletion as the single active user process. We therefore believe that SPECmark performance levels
may not reflect system performance when there is multiprogramming, time sharing and/or signifi-
cant operating systems activity.
Another student, John Tse, looked at the issue of prefetching into CPU cache memories.
(John Tse, MS, June, 1995, "Performance Evaluation of Cache Prefetching Strategies"; "Perfor-
mance Evaluation of Cache Prefetch Implementation", John Tse and Alan Jay Smith, Technical
Report UCB/CSD-95-873, June, 1995, submitted for publication.) Prefetching into CPU caches
has long been known to be effective in reducing the cache miss ratio, but implementations of
prefetching have been unsuccessful in improving CPU performance. The reasons for this are that
prefetches interfere with normal cache operation by making cache address and data ports busy,
the memory bus busy and the memory banks busy, and by not necessarily being complete by the
time that the prefetched data is actually referenced. In this paper, we presented the results of a
very detailed cycle by cycle trace driven simulation of a uniprocessor memory system, in which
we varied several relevant architectural parameters in order to determine when and if prefetching is
useful. We found that in order for prefetching to actually improve performance, the address array
needs to be double ported, and the data array needs to either be double ported or fully buffered. It
is also very helpful for the bus to be reasonably wide, bus transactions to be split and main
memory to be interleaved. Under the best circumstances, i.e. with a significant investment in extra
hardware, prefetching can significantly improve performance.
A student named Jeff Rothman has implemented a parallel program tracer, and is collecting
traces from a variety of workloads. He is currently concentrating on the design of sector caches
for uni- and multiprocessor machines. Our plans are to do the following: (a) Characterize the
performance of sector caches, and determine optimal sector cache designs; such designs are
particularly applicable for microprocessor based systems with on-chip tag storage and off-chip data
storage. (b) Extend the study of sector caches to the multiprocessor case. (c) Evaluate the
effectiveness of a new sector cache design that I have invented. (d) Study the effect of changes in the
source code on sharing patterns and the effectiveness of consistency algorithms. Rothman spent
fall of 1993 visiting at Siemens (Munich), where he developed a code analyzer which can be used
for code reorganization to minimize bus traffic in MP systems.
In work by Chris Perleberg ("Branch Target Buffer Design and Optimization", Chris
Perleberg and Alan Jay Smith, IEEE TC, 42, 4, April, 1993, pp. 396-412), we studied Branch Target
Buffers. A Branch Target Buffer (BTB) can reduce the performance penalty of branches in
pipelined processors by predicting the path of the branch and caching information used by the
branch. This paper discussed two major issues in the design of BTBs, with the goal of achieving
maximum performance with a limited number of bits allocated to the BTB implementation. First
is the issue of BTB management - when to enter and discard branches from the BTB. Higher per-
formance can be obtained by entering branches into the BTB only when they experience a branch
taken execution. A new method for discarding branches from the BTB was examined. This
method discards the branch with the smallest expected value for improving performance, outper-
forming the LRU strategy by a small margin, at the cost of additional complexity. The second
major issue discussed was the question of what information to store in the BTB. A BTB entry
can consist of one or more of the following: branch tag (i.e. the branch instruction address), pre-
diction information, the branch target address, and instructions at the branch target. A variety of
BTB designs, with one or more of these fields, were evaluated and compared. This study was
then extended to multilevel BTBs, in which different levels in the BTB have different amounts of
information per entry. For the specific implementation assumptions used, multi-level BTBs
improved performance over single level BTBs only slightly, also at the cost of additional com-
plexity. Multi-level BTBs may, however, provide significant performance improvements for other
machine implementations. Design target miss ratios for BTBs were developed, so that the
performance of BTBs for real workloads may be estimated.
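The BTB management question discussed above (when to enter a branch) can be illustrated with a
small simulation. The sketch below is an illustration under stated assumptions, not the design
or simulator from the paper: it models a tiny fully associative BTB with LRU replacement, stores
only the last outcome of each branch as its prediction, and predicts not-taken on a BTB miss. The
branch addresses and the trace are hypothetical.

```python
# Toy BTB simulation comparing two entry policies.
# Assumptions: fully associative, LRU replacement, last-outcome prediction,
# and a not-taken prediction whenever the branch misses in the BTB.
from collections import OrderedDict


def simulate_btb(trace, capacity, enter_on_taken_only):
    """Return the number of mispredictions for a trace of (address, taken) pairs."""
    btb = OrderedDict()   # branch address -> last outcome (True means taken)
    mispredictions = 0
    for addr, taken in trace:
        if addr in btb:
            predicted = btb[addr]
            btb[addr] = taken
            btb.move_to_end(addr)            # mark as most recently used
        else:
            predicted = False                # BTB miss: fall through
            if taken or not enter_on_taken_only:
                if len(btb) >= capacity:
                    btb.popitem(last=False)  # evict the least recently used entry
                btb[addr] = taken
        if predicted != taken:
            mispredictions += 1
    return mispredictions


# Hypothetical loop body: one always-taken loop branch and two never-taken branches.
trace = [(0x100, True), (0x200, False), (0x300, False)] * 50

always = simulate_btb(trace, capacity=2, enter_on_taken_only=False)
taken_only = simulate_btb(trace, capacity=2, enter_on_taken_only=True)
print(f"enter always: {always} mispredictions")
print(f"enter on taken only: {taken_only} mispredictions")
```

With this trace, entering every branch lets the never-taken branches evict the loop branch on
each iteration, whereas entering branches only when taken keeps the loop branch resident after
its first execution.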
A new graduate student, Winston Hsu, has just started work on branch prediction. In previ-
ous work (with John K-F Lee and Chris Perleberg), we studied prediction algorithms for
branches, and the design of branch target buffers. That work, and other work in the field, has not
sufficiently considered two items that we plan to study. First, we would like to look at the effec-
tiveness of branch prediction if one has access to the source code. In particular, one may be able
to compute the branch direction earlier in the source code and/or compute an effective hint. Sec-
ond, with superscalar or VLIW machines, extra hints or precomputations can often be done for
free, using otherwise unoccupied slots in the pipelines. Given the much higher penalty for unpre-
dicted branches in such machines, we believe that there is a large potential payoff to such a study.
Another student, Ricki Blau, also finished her Ph.D. during the period of this grant. (Ricki
Blau, Ph.D., December, 1992, "Performance Evaluation for Computer Image Synthesis Systems").
Her dissertation applied performance analysis to the problem of computing complex
three dimensional images. First, it identifies factors that affect the cost of image synthesis and
characterizes the complexity of realistic images. Four categories of performance factors are
defined: scene characteristics, viewing specifications, rendering parameters and the computing
environment. This classification provides a framework for discussing image complexity and
designing performance experiments. The complexity of several complex images from an actual
animation workload is described in detail. A methodology is presented for the construction of
reproducible and controllable performance measurement experiments. To measure the perfor-
mance of a rendering system, an experimenter provides a set of test data, including image specifi-
cations. The dissertation describes a portable tool that generates test cases, varying the scene
characteristics and viewing specifications under the control of a set of parameters. This model
generator has been implemented for two different rendering systems. Its test cases have been
used to detect performance differences between the two systems and to evaluate the effects of
varying the scene characteristics. The last part of the dissertation addresses the workload parti-
tioning problem for MIMD rendering systems. A simple, low-overhead adaptive algorithm bal-
ances the workload effectively on a 16-node rendering accelerator. The algorithm uses the ren-
dering time observed for one frame to predict costs for the next frame. The resulting cost esti-
mates can be used by a second algorithm to divide the work among the available processing
nodes. The cost estimates are approximate, but are obtained with little overhead. The net result
is an improvement of thirty to eighty percent over the previous load balancing schemes for pro-
duction quality rendering. An analysis of several competing schemes demonstrates that tradeoffs
between balancing the load and preserving locality are a key consideration in the design of a par-
allel rendering system.
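The two-step scheme described above (estimate costs from the previous frame, then divide the
work) can be sketched as follows. This is a hypothetical illustration, not the algorithm from the
dissertation: it assumes the image is split into independent tiles, uses a greedy
largest-cost-first assignment as a stand-in for the partitioning step, and ignores the locality
concerns the dissertation identifies as a key tradeoff. All tile names and times are invented.

```python
# Adaptive load balancing sketch: use per-tile times from the previous frame
# as cost estimates, then assign tiles greedily to the least loaded node.
import heapq


def partition(costs, num_nodes):
    """Assign tiles in decreasing estimated cost to the currently least loaded node."""
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    for tile in sorted(costs, key=costs.get, reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(tile)
        heapq.heappush(heap, (load + costs[tile], node))
    return assignment


def max_load(assignment, costs):
    """Frame time is bounded by the most heavily loaded node."""
    return max(sum(costs[t] for t in tiles) for tiles in assignment.values())


# Hypothetical rendering times (seconds) observed for the previous frame.
prev_frame = {"t0": 8.0, "t1": 4.0, "t2": 1.0, "t3": 1.0, "t4": 2.0, "t5": 4.0}
# Hypothetical actual times for the next frame: close to, but not equal to, the estimates.
next_frame = {"t0": 7.5, "t1": 4.5, "t2": 1.0, "t3": 1.0, "t4": 2.5, "t5": 3.5}

adaptive = partition(prev_frame, num_nodes=2)
static = {0: ["t0", "t1", "t2"], 1: ["t3", "t4", "t5"]}   # fixed contiguous split

print("adaptive frame time:", max_load(adaptive, next_frame))
print("static frame time:  ", max_load(static, next_frame))
```

The estimates need only be approximately right for the assignment to stay balanced, which is
why reusing the previous frame's observed times is attractive: they cost almost nothing to
collect.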
Jay Lorch finished an MS on a study of power consumption in the Apple Powerbook.
(Jacob Lorch, MS, December, 1995, "A Complete Picture of the Energy Consumption of a
Portable Computer"). A paper on this work is in preparation: High battery lifetime is important to
the usability and acceptance of portable computers. In order to develop strategies to minimize
power consumption, designers need a good picture of the total power consumption of a system.
For this purpose, we indicate the power consumptions of a set of portable computers and how
they are broken down among the components of those computers. Then, we use user profiles to
show how the use of power-saving features currently implemented serves to reduce these power
consumptions by 41-66%. We also show how these power-saving features affect the breakdown of
overall power consumption, so that we can evaluate how successful certain new software tech-
niques and hardware changes would be at reducing power consumption. The results of this paper
point out the most promising avenues for further work in the reduction of power consumption,
and indicate some strategies that can provide an immediate power benefit to the class of machines
studied.
In work principally carried out by a graduate student, Vigyan Singhal, we collected and
studied traces from database systems. We were mainly concerned with the issue of locking and
concurrency control. Concurrency control is essential to the correct functioning of a database due
to the need for correct, reproducible results. For this reason, and because concurrency control is a
well formulated problem, there has developed an enormous body of literature studying the perfor-
mance of concurrency control algorithms. Most of this literature uses either analytic modeling or
random number driven simulation, and explicitly or implicitly makes certain assumptions about
the behavior of transactions and the patterns by which they set and unset locks. Because of the
difficulty of collecting suitable measurements, there have been only a few studies which use trace
driven simulation, and still less study directed toward the characterization of concurrency control
behavior of real workloads. In a paper written with Singhal ("Characterization of Contention in
Real Relational Databases" Technical Report UCB/CSD-94-801, Computer Science Division,
UC Berkeley, March, 1994, to appear, VLDB Journal), we present a study of three database work-
loads, all taken from IBM DB2 relational database systems running commercial applications in a
production environment. This study considers topics such as frequency of locking and unlocking,
deadlock and blocking, duration of locks, types of locks, correlations between applications of
lock types, two-phase vs. non-two-phase locking, when locks are held and released, etc. In each
case, we evaluated the behavior of the workload relative to the assumptions commonly made in
the research literature, and discussed the extent to which those assumptions may or may not lead to
erroneous conclusions. We also presented a simple mathematical model which predicts the
frequency of blocking to be expected in these workloads, and compared those predictions to the
observed frequency. Singhal received his M.S. degree (under my supervision) in Spring, 1994.
He finished his Ph.D. (in the area of CAD) in 1995.
In work principally carried out by a graduate student, Barbara Tockey Zivkov, an extensive
study of disk caching performance is in progress. This study presents extensive analysis of disk
traces taken from a variety of real, production computer systems, including a bank (Security
Pacific), a transportation company (Crowley Maritime), a telecommunications company, an oil
company (Gulf Oil), Monarch Marking company, and two other large corporations (who prefer to
remain anonymous). The systems traced include both IBM and Honeywell mainframe systems,
and include both normal operating systems traces and database systems traces. Some of the latter
were obtained with the help of researchers at IBM Almaden research laboratory. The paper in
preparation (which should be finished by spring, 1996) characterizes the traces, and then studies
disk caching performance under a variety of algorithms. Zivkov is expected to finish her MS this
spring.
Currently in progress is an effort (by graduate students Min Zhou and Jay Lorch) to collect
traces from PC systems (both Intel X86 based PCs and Apple computers). These traces are being
collected for two reasons. First, we are in the early stages of a research project in the area of
algorithms for power management in portable computers. The traces collected will contain
information on user input, disk activity, and application program activity. With regard to I/O systems,
we expect the traces to be used for two purposes. First, we will be using them to study algorithms
for minimizing disk power consumption. We can do this by determining when the disk is likely
to be idle and thus turning it off. We can also improve performance by successfully improving
disk caching performance (fewer disk I/Os) and by prefetching and retaining in semiconductor
storage those portions of the disk address space that are likely to be needed in the near future. In
addition, we expect to use the traces to study the design of disk caches in the PC environment.
This is a subject of great industrial interest, but it doesn't seem to have been addressed by the disk
or PC industries.
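One simple way to use such traces for studying disk power management is to replay the request
times against a candidate spin-down policy. The sketch below evaluates a fixed idle timeout and
is only an assumed example of the kind of algorithm that could be studied; it is not a policy
proposed in this report. It treats requests as instantaneous, ignores spin-up delay and energy,
and uses a hypothetical list of request timestamps.

```python
# Trace-driven evaluation of a fixed-timeout disk spin-down policy.
# Assumptions: sorted timestamps in seconds, instantaneous requests,
# and no cost model for spin-up delay or energy.

def spin_down_stats(request_times, timeout):
    """Return (seconds spent spun down, number of spin-ups) for a given idle timeout."""
    off_time = 0.0
    spin_ups = 0
    for prev, nxt in zip(request_times, request_times[1:]):
        gap = nxt - prev
        if gap > timeout:              # disk was turned off after `timeout` idle seconds
            off_time += gap - timeout
            spin_ups += 1              # the next request forces a spin-up
    return off_time, spin_ups


# Hypothetical disk request timestamps from a trace.
trace = [0.0, 1.0, 2.0, 62.0, 63.0, 200.0]

for timeout in (5.0, 30.0, 100.0):
    off_time, spin_ups = spin_down_stats(trace, timeout)
    print(f"timeout {timeout:5.1f} s: off for {off_time:.1f} s, {spin_ups} spin-ups")
```

A shorter timeout keeps the disk off longer but risks more spin-ups, each of which delays the
user; quantifying that tradeoff on real usage is exactly what collected traces make possible.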
