Event-based Re-training of Statistical Contention Models for Heterogeneous Multiprocessors by Alex Bobrek et al.
Alex Bobrek
ECE Department
Carnegie Mellon University
Pittsburgh, PA 15213 USA
abobrek@ece.cmu.edu
JoAnn M. Paul
ECE Department
Virginia Tech
Blacksburg, VA 24061 USA
jmpaul@vt.edu
Donald E. Thomas
ECE Department
Carnegie Mellon University
Pittsburgh, PA 15213 USA
thomas@ece.cmu.edu
ABSTRACT
Embedded single-chip heterogeneous multiprocessor (SCHM)
systems experience frequent system events such as task
preemption, power-saving voltage/frequency scaling, or arrival
of new events/data from the outside world. Traditionally,
designers model these events by explicitly coupling them
to corresponding simulation events within environments such
as SystemC. However, this approach places a burden on the
designer to identify which events are important enough to
be captured by simulation, resulting in an overly conservative
selection of events to model. This work presents
a technique for de-coupling system events from simulation
events, removing the burden from the designer to
determine which events significantly affect the system
performance model, while decreasing simulation runtime.
Enabling this de-coupling is a prediction model that quantifies
the magnitude of changes introduced by system events and
identifies those important enough to be considered simulation
events. The prediction model is evaluated on over 4000
separate scenarios featuring dynamic event changes to
processor speed, bus speed, application type, and application
input data, finding that almost 70% of tested events impacted
contention modeling by less than 10%. Without resorting
to detailed simulation, the prediction model captures the
system event effects to within 5% of actual measured error
on average, while staying within 18% in 95% of all tests.
Categories and Subject Descriptors: C.4 [Performance
of Systems]: Modeling techniques, Performance attributes;
I.6.5 [Simulation and Modeling]: Model Development – Mod-
eling methodologies
General Terms: Performance, Design
Keywords: Performance Modeling, Simulation, Statistical
Contention Modeling, Heterogeneous Multiprocessors
1. INTRODUCTION
Single chip heterogeneous multiprocessor (SCHM) systems
contain tens of heterogeneous processors, a variety of
communication and memory architectures, and are often focused
on certain workload classes. Tens to hundreds of times per
second SCHMs experience system events such as preemptive
scheduling, scaling frequency/voltage or shutting down
parts of the chip to save power, and responding to outside
events and data. To model the performance effects of these
events, designers manually identify and couple each system
event to a simulation event within a system simulation
environment such as SystemC. However, this one-to-one coupling
of system and simulation events places the burden on
the designer to identify which system events are important
enough to be represented in the simulation. Conservatively
specifying too many events reduces simulation performance,
while specifying too few affects accuracy.
CODES+ISSS'07, September 30–October 3, 2007, Salzburg, Austria.
Copyright 2007 ACM 978-1-59593-824-4/07/0009 ...$5.00.
[Figure 1: Change in model assumptions due to system events. Within the entire assumption space, a region marks the initial assumptions for the pre-event performance model; vectors A and B point to new assumptions, changed due to a system event, and their lengths show the magnitude of system change.]
This paper presents a technique for de-coupling system
events from simulation events, removing the burden from
the designer to determine which events significantly affect
the system performance model, while decreasing simulation
runtime. By monitoring the assumptions made by the
performance model prior to the system event, and comparing
them to the state of those assumptions after the event, our
approach estimates the introduced model error. Figure 1 shows
a simple 2D view of the assumption space, with a small gray area
representing the acceptable assumption region for the current
performance model. Two consecutive system events are
illustrated, with their changes shown by the magnitudes of vectors
A and B. These magnitudes are proportional to the modeling
error expected due to the change in the system; because of the
smaller deviation in assumptions, system event A is expected
to have a smaller impact on model error than B. The key
challenge tackled by this paper is how to quantify the
magnitude of vectors like A and B by isolating the parameters
reflecting the changes in model assumptions. Identifying the
magnitude of system changes quantifies the impact of each
system event, so that large-magnitude events like B
can be coupled to a simulation event.
As a basis for this work, we will focus on the statistical
contention modeling presented in [3], which uses short
training runs of a cycle-accurate (CA) simulation to train a
high level model for estimating contention within SCHMs.
This statistical approach captures application and
system-specific information during training and is able to apply it
to estimate contention within 1% of CA simulation while
operating 40X faster. However, when a system event results
in a significant deviation in model assumptions (large
vector magnitude in Figure 1), a simulation event must be
specified, resulting in re-training of the statistical model via
detailed simulation.
The contribution of this paper is a quantitative approach
that, by comparing the distributions of key shared resource
access statistics, can determine whether an SCHM system
event should be considered a simulation event without
resorting to detailed CA simulation. Figure 2 illustrates the
simulation advantages gained by identifying which system
events are important simulation events. Timeline A represents
a situation where every event is treated as a simulation
event, requiring re-training of the statistical contention
model (shown as dark blocks). Since re-training must be
done at the CA level, the penalty for model switching is
significant. However, by identifying important simulation
events (thick lines in Timeline B), the number of re-trainings
can be reduced, increasing overall simulation performance
without significantly affecting accuracy.
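The trade-off between Timelines A and B can be illustrated with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the workload size and the number of events are hypothetical, and it assumes (as a simplification) that wall-clock cost is proportional to simulated cycles, with cycles covered by the statistical model costing 1/40 of a CA cycle. Only the 40X speedup and the roughly 10 million cycle training run are figures quoted in this paper.

```python
def wall_clock_cost(total_cycles, retrain_events,
                    retrain_cycles=10_000_000, speedup=40.0):
    """Estimated cost in CA-cycle equivalents (simplified cost model).

    Re-training cycles run at cycle-accurate (CA) speed; all remaining
    cycles run `speedup` times faster under the statistical model.
    """
    ca_cycles = min(total_cycles, retrain_events * retrain_cycles)
    fast_cycles = total_cycles - ca_cycles
    return ca_cycles + fast_cycles / speedup


# Hypothetical workload: 4 billion target cycles with 100 system events.
TOTAL = 4_000_000_000
timeline_a = wall_clock_cost(TOTAL, retrain_events=100)  # re-train on every event
timeline_b = wall_clock_cost(TOTAL, retrain_events=30)   # re-train on ~30% of events

print(f"Timeline A: {timeline_a:.3e} CA-cycle equivalents")
print(f"Timeline B: {timeline_b:.3e} CA-cycle equivalents")
print(f"B is {timeline_a / timeline_b:.2f}x faster than A")
```

Under these assumed numbers the re-training periods dominate the cost in both timelines, which is why pruning them pays off even though the fast-forwarded portion grows slightly.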
In the experiments presented in this paper, featuring over
4000 separate SCHM scenarios with dynamic event changes
to processor speed, bus speed, application type, and application
input data, we found that almost 70% of system event
changes tested had less than 10% impact on the statistical
contention model accuracy. By observing the changes in
key statistics of shared resource access patterns, we identify
the remaining 30% of simulation events with high impact
on contention, ensuring that only the events that matter to
the overall accuracy are simulated in detail. This prediction
model can assess the impact of a system change to within
5% of the actual measured error on average, while staying
within 18% in 95% of all tests.
This paper will first give the necessary background on the
statistical contention model and its simulation framework.
We will review access attribute-based contention modeling
introduced in [3], and show how statistics related to those
attributes can be used to evaluate which system-changing
events affect contention the most. Finally, we will describe
the experiments featuring system changes due to runtime
events, while drawing conclusions about their effect on
contention and the resulting accuracy of the statistical
contention model.
2. STATISTICAL CONTENTION MODELING
The purpose of this section is to provide some background
on the statistical contention modeling approach introduced
in [3] by describing the underlying simulation framework
along with the access attribute-based model used to capture
contention behavior.
2.1 Simulation Framework
The statistical contention model is based on the Modeling
Environment for Software and Hardware (MESH) framework.
MESH [4] allows designers to answer questions about
how the numbers and types of processors and communications
resources, the scheduling decisions, and the software
tasks (mapping and complexity) affect the overall performance
of SCHM systems. MESH increases simulation performance
by executing application code natively on the host
platform to capture data dependent execution, but emulates
target system performance through annotations inserted
within the code. This approach, called execution-driven
simulation with cross-profiling or back-annotation of
target information, has been commonly used for traditional
multiprocessor simulation [6], as well as simulation of SCHM
systems [1] [4].
Figure 3 shows a simple application control flow graph
executing natively on the host simulator. At the end of each
basic block, an annotation is inserted estimating the target
processor performance. The annotation, shown within a
box in the detailed area of Figure 3, is not a simple delay
value, but instead captures the computational complexity of
the natively-executing code. In this example, each loop
iteration is annotated as one add and one multiply operation,
which are then in turn "consumed" by hardware resources to
determine timing. This level of indirection allows one set of
annotations to capture code performance on multiple
heterogeneous processors, speeding up the design exploration
process. Note that annotations do not have to contain
instruction types, and can be made at different levels of
abstraction.
[Figure 2: Performance impact of identifying crucial simulation events from a group of system events. Timelines A and B are plotted against wall clock time, marking system events, simulation events, re-training periods (CA simulation), and fast simulation using the statistical contention model (40X faster than CA).]
[Figure 3: Execution-based simulation with back-annotation of target system information. An annotated control flow graph runs on the host simulator; the detailed area shows the loop below, whose annotation is consumed by the ARM and DSP target processor models (labeled 3 cycles and 2 cycles).]
    for (i=0;i<MAX_COLS;i++) {
      result += matrixA[i][row] *
        matrixB[i];
      consume(ADD=1:MUL=1);
    }
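The indirection between annotations and processor models can be sketched as follows. This is a minimal illustration and not the MESH API: the `consume` function, the dictionary layout, and the per-operation cycle costs are all hypothetical, with the costs chosen only so that the totals match the "3 cycles" (ARM) and "2 cycles" (DSP) labels that appear in Figure 3.

```python
# Hypothetical per-operation cycle costs; chosen only so that the totals
# match the "3 cycles" (ARM) and "2 cycles" (DSP) labels in Figure 3.
PROCESSOR_MODELS = {
    "ARM": {"ADD": 1, "MUL": 2},
    "DSP": {"ADD": 1, "MUL": 1},
}


def consume(annotation, processor):
    """Convert an operation-count annotation into cycles on one processor."""
    costs = PROCESSOR_MODELS[processor]
    return sum(count * costs[op] for op, count in annotation.items())


# One loop iteration from Figure 3: one add and one multiply.
loop_body = {"ADD": 1, "MUL": 1}

for name in PROCESSOR_MODELS:
    print(name, consume(loop_body, name), "cycles per iteration")
```

The same `loop_body` annotation yields a different timing on each processor model, which is the property that lets one set of annotations serve multiple heterogeneous targets.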
Target system information within annotations and processor
models can be extracted from a CA simulation of the
target processor, inferred by the compiler, or manually
inserted by the designer. Because most of the instructions
execute natively on the host system, this approach results
in a speedup of about two orders of magnitude over CA
simulation and can be very accurate (< 1% error), depending
on the quality of annotations. More detail on the MESH
simulator, including its features, accuracy, and performance,
is available in [4].
2.2 Access Attribute Models
Although promising, the back-annotated simulation strategy
experiences problems when simulating concurrent shared
resource (SR) accesses. Since annotations containing SR
accesses may have their timing affected by contention, only
blocks with timing localized to a single processing element
can be accurately annotated. Therefore, once an SR access
is reached, an annotation must be inserted, forcing an
expensive context switch to the other processors to determine
contention. To solve this problem, the access attribute-based
statistical model summarizes the impact of SR access
contention at a high level of abstraction, thus removing the
negative effects of frequent SR accesses on simulator
performance.
[Figure 4: Fast-forwarding of shared resource accesses while estimating contention delays. A timeline for Processor 1 and Processor 2 shows fast-forwarded S.R. accesses between annotation points; access attribute information is passed to the contention model, and contention penalties are added.]
Figure 4 illustrates the statistical contention model's role
within the MESH simulation via a timeline of two concurrent
applications running on separate processors while sharing
memory. Multiple consecutive accesses to the shared
resource are assumed to be serviced with no contention,
speeding up execution. Once a designer-inserted annotation is
reached, the distribution of skipped accesses during the
preceding block is included with other annotation information.
By combining SR information from multiple concurrently
executing threads, the MESH simulator calculates the three
access attributes used to estimate contention:
• Average Requested Utilization (ρ): Captures the
overall busyness of the SR.
• Access Balance (B): Quantifies how much requested
utilization for each thread varies with regard to the
average.
• Thread Concurrency (T): Average number of threads
making SR accesses.
A key distinction between annotation information and access
attributes must be made: annotations contain information
only about SR accesses within a single thread of execution;
they can be created from a compiler or a single-threaded
simulation without the need to know how the application
interacts with others. Access attributes, on the other hand,
summarize the SR access behavior for multiple concurrent
threads, capturing the current interleaving of applications
within an SCHM system. Complete details about how access
attributes are calculated from annotation information
are included in [3].
Once the simulation reaches an annotation point, the three
access attributes are passed back to the statistical contention
model, which estimates contention during the block and
applies it after the block in the form of a penalty (black boxes
in Figure 4). The statistical contention model trains the
relationship between the three access attributes and the
expected contention delay by sampling a cycle-accurate
representation of the system. Tests have shown that a
representative CA simulation of about 10 million cycles is enough to
statistically describe the relationship between contention
delay and access attributes for most systems.
Nonparametric regression (e.g. curve fitting) is used to
describe the contention-attribute relationship in the
following form:
DPT = f1(ρ) + f2(B) + β × T
where DPT represents contention delay per unit time, and
f1(·) and f2(·) are the nonparametric fits to the ρ and B access
attributes. In general, as the average requested utilization (ρ)
or the concurrency level (T) increases, so does the DPT.
Higher values of balance (B) decrease contention delay, since
they indicate that the system is unbalanced, i.e. one of the
threads makes the majority of SR accesses, making
contention less likely.
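To make the structure of the DPT relationship concrete, the sketch below evaluates it with piecewise-linear interpolation standing in for the trained nonparametric fits. Everything numeric here is an assumption made for illustration: the knot points of `f1` and `f2`, the value of `BETA`, and the value ranges of ρ and B are hypothetical, and the actual fitting procedure is described in [3], not here.

```python
from bisect import bisect_right


def piecewise_linear(xs, ys):
    """Return a function that linearly interpolates through (xs, ys).

    Stand-in for a trained nonparametric fit; clamps outside the range.
    """
    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return f


# Hypothetical "trained" curves: f1 rises with utilization, f2 falls with balance.
f1 = piecewise_linear([0.0, 0.5, 1.0], [0.00, 0.05, 0.30])
f2 = piecewise_linear([0.0, 1.0], [0.10, 0.00])
BETA = 0.02  # hypothetical coefficient for thread concurrency


def delay_per_unit_time(rho, balance, threads):
    """DPT = f1(rho) + f2(B) + beta * T."""
    return f1(rho) + f2(balance) + BETA * threads


print(delay_per_unit_time(rho=0.25, balance=0.5, threads=2))
```

With these assumed curves, raising ρ or T increases the estimate and raising B decreases it, matching the qualitative behavior described above.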
Thus, by leveraging the information gathered during training,
the contention model is capable of estimating contention
during large blocks of annotated MESH execution to within
1% of the CA simulation. Skipping large numbers of SR
accesses allows the MESH simulator to perform 40X faster
than CA simulation, while training the contention model
takes a negligible fraction of the overall runtime. More detail
on the access attribute-based statistical contention model,
including how the three access attributes were selected and
how the model was created and tested, is included in [3].
3. SYSTEM EVENT PRUNING
Deviations in SR access patterns due to system events
can invalidate the assumptions set by the statistical
contention model during the training process and introduce
significant modeling errors. Therefore, every significant
deviation from the trained model assumptions must result in
statistical re-training, a computationally expensive process.
Using a prediction model that estimates the contention
modeling error in the presence of system events (shown as vector
magnitudes in Figure 1), only the events with a significant
impact on model error are followed by re-training,
allowing the contention model to retain its simulation speed
advantage in the presence of frequent system events. This
section will describe how the necessary statistics for contention
model error prediction are collected and how they are used
to create the prediction model.
[Figure 5: Comparing access attributes to determine need for re-training. Along the wall clock time axis, the ρ, B, and T attributes of the trained model (collected at event i) and the new attributes (collected in the sampling area after event j) feed the prediction model, which answers "Re-train @ j?" with Yes/No.]
3.1 Collecting Comparison Statistics
Any change in SR access distributions is reflected within
the distributions of ρ, B, and T attributes. Therefore, by
comparing the populations of access attributes collected from
the baseline (i.e. well-trained) model to the populations
after the system event, it is possible to capture the magnitude
of system change.
Figure 5 illustrates this concept. At the start of the
simulation (event i), the initial contention model is trained (dark
box on the timeline). During this period, samples of the ρ, B,
and T access attributes are collected, and their distributions
measured (illustrated as the small probability density
function graphs in Figure 5). When event j is reached, the
MESH simulator continues monitoring the same access
attributes for a period following the event (shown as a dashed
border box). Note that this sampling can be performed
much more quickly in high speed simulation mode than during
detailed training, as illustrated by the relative sizes of the
colored boxes in Figure 5. Upon the completion of the
sampling area, the access attributes for both the trained model
and the newly acquired data are passed to the prediction
model.
3.2 The Prediction Model
Much like the contention model itself, the prediction model
uses regression to estimate contention model error based on
the populations of ρ, B, and T. Since there are many
statistical methods to compare two populations of data, we
considered a multitude of different options. For all values of
ρ, B, and T, we compared common statistics such as
medians and standard deviations. Additionally, we looked at
the number of outliers that the new data had outside the
range defined by the trained model. Finally, we ran the
Kolmogorov-Smirnov (KS) test, which is used to determine
whether the shapes of two (non-normal) distributions differ
significantly.
To capture these relationships, we ran over 4000 different
situations (described in the Experiments section) in which
a system is slightly changed from the baseline, either by
changing the speed of one processor, changing the speed of
a shared bus, or changing the application and/or its input
data. For each of these situations, the error of the
contention model due to mis-training was found by comparing
the results to a detailed CA simulation.
Unlike the contention model itself, which uses a nonparametric
regression technique, we found that a simpler linear
regression is sufficient to describe the relationship between
model error and access attribute deviations. Although
we tried over 10 different combinations of medians, standard
deviations, KS-tests, and outliers for all the access
attributes, the two statistics that showed the best correlation
to model error were the Average Requested Utilization
Difference (Dρ) and the KS Test of Utilization/Balance Ratio
(KSratio). Table 1 gives more details about these two
prediction model explanatory variables (i.e. variables used to
explain model error) and shows how they were calculated.
Table 1: Prediction Model Explanatory Variables
Model Parameter: Average Requested Utilization Difference (Dρ)
  Definition: Dρ = |median(ρT) − median(ρN)|
  Comments: Captures the overall busyness of the shared resource.
Model Parameter: KS Test of Utilization/Balance Ratio (KSratio)
  Definition: KSratio = KS(ρT/BT, ρN/BN)
  Comments: Quantifies how much requested utilization for each thread varies with regard to the average.
Intuitively, the Dρ variable captures how the system change
has impacted the overall utilization of the SR. For example,
any system event that results in slowing down one of the
processors would be reflected in the Dρ variable, since the
median of the ρ access attribute would be lower in the new
data than the baseline. The KSratio variable captures more
subtle changes in the distributions of both ρ and B access
attributes, especially as they relate to each other. For
example, a system change that substitutes one application for
another would be reflected in the KSratio variable because
the changed access patterns affect the distributions of the
collected ρ and B access attributes.
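The two explanatory variables can be computed from attribute samples along the following lines. This is a sketch under stated assumptions rather than the authors' implementation: it takes the subscripts T and N to denote the trained and new sample populations, assumes that KS(·,·) refers to the two-sample KS statistic (the maximum gap between the empirical CDFs) rather than a p-value, and assumes all balance samples are nonzero. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right
from statistics import median


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def explanatory_variables(rho_t, bal_t, rho_n, bal_n):
    """Return (D_rho, KS_ratio) for trained (t) and new (n) attribute samples."""
    d_rho = abs(median(rho_t) - median(rho_n))
    ratio_t = [r / b for r, b in zip(rho_t, bal_t)]
    ratio_n = [r / b for r, b in zip(rho_n, bal_n)]
    return d_rho, ks_statistic(ratio_t, ratio_n)


# Hypothetical samples: utilization roughly halves after the system event.
rho_trained, bal_trained = [0.40, 0.42, 0.38, 0.41], [0.5, 0.5, 0.5, 0.5]
rho_new, bal_new = [0.20, 0.22, 0.18, 0.21], [0.5, 0.5, 0.5, 0.5]

d_rho, ks_ratio = explanatory_variables(rho_trained, bal_trained, rho_new, bal_new)
print(f"D_rho = {d_rho:.3f}, KS_ratio = {ks_ratio:.3f}")
```

In this made-up case the two ratio populations do not overlap at all, so the KS statistic reaches its maximum value; identical populations would give zero.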
Using the two explanatory variables described above, the
best regression format describing the relationship between
the variables and model error was found to be:
$\sqrt{M_{error}} = D_{\rho} + KS_{ratio}$
Since the variance of Merror increases with higher values of Dρ
and KSratio, a square root transformation of the Merror
variable is used to even out the variance across the entire range
of explanatory variables. Transformations of response
variables in this manner are common in the regression model
building process when increasing variances are encountered.
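Applying a regression of this form to decide on re-training could look like the sketch below. The intercept, the coefficients, and the 10% threshold are hypothetical placeholders; this excerpt does not list the fitted values, and the function names are not taken from MESH. The prediction is squared to undo the square root transformation of the response variable.

```python
# Hypothetical regression coefficients; in practice they would be fit to
# measured (D_rho, KS_ratio, M_error) triples obtained from CA simulation.
B0, B_DRHO, B_KS = 0.5, 12.0, 3.0
RETRAIN_THRESHOLD = 10.0  # percent model error; hypothetical cut-off


def predict_model_error(d_rho, ks_ratio):
    """Predict M_error by inverting the square-root transformation."""
    sqrt_error = B0 + B_DRHO * d_rho + B_KS * ks_ratio
    return max(sqrt_error, 0.0) ** 2


def should_retrain(d_rho, ks_ratio, threshold=RETRAIN_THRESHOLD):
    """Treat the system event as a simulation event if predicted error is large."""
    return predict_model_error(d_rho, ks_ratio) > threshold


# A small deviation is ignored; a large one triggers re-training.
print(should_retrain(d_rho=0.02, ks_ratio=0.1))
print(should_retrain(d_rho=0.30, ks_ratio=0.6))
```

Keeping the threshold as a parameter lets the designer trade accuracy against the number of CA re-training periods.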
Figure 6 shows the collected model error data and its
relationship to one of the explanatory variables, Dρ. Note that
as the utilization difference between the trained and the new
system increases, so does the range of model errors (i.e. the
variance increase that prompted the square root
transformation). The circles in Figure 6 show what the regression model
described above would predict each of the errors to be (i.e.
every collected model error has a corresponding predicted
error). Note that the prediction model captures the upward
error trend as Dρ increases, but cannot capture all the
variance in data found for high Dρ values. Although usage of
the KSratio variable helps (seen as widening of the fit as Dρ
increases), the prediction model describes only about 70%
of the variance in the observed model errors. This is not
unexpected, since Dρ and KSratio cannot capture all the
behavior resulting in increases in model error.
The prediction model described in this section enables
the MESH simulator to selectively re-train only the system
events with potentially large impact on contention modeling
error. Although the prediction model does not capture all
the variables affecting model error, the results in the
Experiments section will show that it can make the correct decision
on which system events to ignore in the great majority of
cases.
[Figure 6: Measured vs. predicted model error as fit with relation to the Dρ explanatory variable. Scatter plot; x-axis: Dρ (0.0 to 0.5); y-axis: Merror; series: Measured Error Values, Predicted Error Values.]
4. PRIOR WORK
The SCHM performance modeling space is populated with
many simulation and analytical approaches. Approaches
operating at the CA level explicitly define a simulation event for
every cycle, while the Transaction Level Modeling (TLM) [5]
[9] approaches explicitly define simulation events at
transaction boundaries. Both of these approaches operate at a
high level of detail and can easily capture impacts of system
events, albeit at the cost of long simulation times, interfering
with the designer's ability to rapidly traverse the SCHM
design space.
Analytical contention models, such as queuing theory-
based models [2], are popular for modeling data network
contention and have been used in embedded systems as well.
However, they assume purely exponential access inter-arrival
times, an assumption that breaks down when loop-based
applications create SR accesses with more periodic patterns.
Queuing models are not designed to respond to temporal
changes in the system; instead, they are optimized to find
average throughputs and wait times. Thus, they are not well
suited to capturing the impact of system events.
Statistical simulation and statistical sampling approaches
[7] [10] extrapolate commonly recurring program behavior
by statistical sampling in a way similar to our contention
model. However, these approaches focus on estimating
system throughput and are less appropriate for capturing
system response to outside stimuli, which is crucial for
modeling embedded systems. These statistical sampling
approaches do monitor the quality of their statistical estimate
by periodically reverting to detailed CA simulation, similar to
our re-training approach. However, this “re-training” is done
at regular intervals, not after important system events as in
our approach. Therefore, the simulation (re-training) events
are not pruned at all, but are explicitly and periodically
defined, just as in CA simulation but at a higher level
of abstraction.
By introducing a method to estimate the impact of run-
time system events on contention, our work selects system
events of importance, reducing the overhead of model re-
training, and allowing a high-level contention model to op-
erate in a dynamic SCHM environment. To our knowledge,
our model is the only contention modeling approach oper-
ating above the CA/TLM level that can adjust to system
changes during execution.
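
The selection step described above can be sketched as a simple filter over system events. This is only an illustration under stated assumptions: the `SystemEvent` type, the `select_simulation_events` function, the event names, and the predicted error values are all hypothetical, and the 10% threshold is borrowed from the abstract's observation that most tested events changed contention modeling by less than 10%. The actual prediction model is the regression described in Section 3 and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SystemEvent:
    name: str
    # Hypothetical: contention-model error predicted for this event,
    # expressed as a fraction (0.10 means 10%).
    predicted_error: float


def select_simulation_events(events, threshold=0.10):
    """Keep only events whose predicted error reaches the threshold.

    Events below the threshold are assumed not to require re-training.
    """
    return [e for e in events if e.predicted_error >= threshold]


# Hypothetical system events with made-up predicted errors.
events = [
    SystemEvent("dataset_change", 0.02),
    SystemEvent("bus_speed_50pct", 0.40),
    SystemEvent("cpu_speed_200pct", 0.17),
    SystemEvent("coprocessor_off", 0.05),
]

selected = select_simulation_events(events)
print([e.name for e in selected])
```

Only the events that pass the filter would trigger a re-training pass; the rest continue to use the previously trained contention model.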
5. EXPERIMENTS
This section presents a set of experiments featuring an
SCHM system experiencing a variety of system event changes.
The goal of these experiments is to determine how well the
prediction model from Section 3 tracks the actual contention
model error, while identifying system events with the most
impact. The test system executes several concurrent ap-
plications running on ARM processors while contending for
shared memory through a shared bus. Since applications are
independent, the only contention in the system is contention
for resources, not data. For the evaluation of the contention
model, we assume that each application runs on its own
processor; therefore, there is no contention for execution
resources, only for shared memory blocks. The memory
and bus models assume a constant service delay for each
uncontended access. Only one outstanding memory access is
allowed per processor, meaning that processors stall on
contention. The impact of caches is not modeled, although we
believe that, with some adjustments, the statistical regression
training approach could be used to model caches as well.
Note that none of these assumptions are limitations of the
MESH simulator, but are instead limiting factors chosen by
us to isolate the statistical model’s accuracy.
To capture a wide variety of design perturbations, access
patterns, and contention levels, we ran tests with groups of
2, 3, 4, and 5 single-threaded applications executing
concurrently. We selected several multimedia, encryption,
compression, and signal processing applications from the
SPEC2000 and MiBench [8] benchmark suites: adpcm (adaptive
differential pulse code modulation), FFT, jpeg, gzip, rijndael
(encryption), rsynth (speech synthesis), and crc (cyclic
redundancy check). To select these applications from the greater
set of SPEC and MiBench benchmarks, each benchmark's
memory access utilization and coefficient of variation were
measured and used as the selection criteria, ensuring the
widest variety of memory access patterns. The 7 applications,
executed in all combinations of 2, 3, 4, and 5 threads at a
time, provide us with over 100 different test runs for each
experiment data point.
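
The "over 100" figure follows from counting the unordered groups of 2, 3, 4, and 5 applications drawn from the 7 benchmarks. A minimal sketch of that enumeration is given below; the function name `application_groups` is ours, and the code only counts combinations, not the simulations themselves.

```python
from itertools import combinations

# The seven benchmark applications named in the text.
APPS = ["adpcm", "fft", "jpeg", "gzip", "rijndael", "rsynth", "crc"]


def application_groups(apps, sizes=(2, 3, 4, 5)):
    """Enumerate every unordered group of the given sizes."""
    return [group for k in sizes for group in combinations(apps, k)]


groups = application_groups(APPS)
# C(7,2) + C(7,3) + C(7,4) + C(7,5) = 21 + 35 + 35 + 21
print(len(groups))  # 112
```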
System events and their impact on system behavior are
modeled through ﬁve representative experiments:
• Application Change: Simulates a preemptive
scheduling switch to a different application.
• Data set Change: Simulates a change in external
input; new data is delivered to an already running
application.
• Processor Speed Change: Simulates voltage/frequency
scaling, resulting in relative speedup or slowdown of
one of the processors.
• Bus Speed Change: Simulates voltage/frequency
scaling of the bus, resulting in relative speedup or slow-
down of access time to memory.
• Coprocessor Change: Simulates shutting down a
section of the chip by removing a ﬂoating point and/or
multiply-accumulate unit.
The experiments summarized above capture a wide variety,
albeit not all, of possible system-level events at a variety
of data points. For example, the bus speed change experiment
alters the memory access time from 50% to 150% at 10%
intervals. Taking into account the five separate experiment
types, their multitude of data points, and over 100 different
application combinations for each data point, the experiments
in this section include over 4000 separate SCHM operating
situations. In the following sections, we discuss these
individual system changes separately, analyze their impact on
contention modeling, and evaluate the prediction model's
ability to anticipate the modeling error. Due to space
limitations, we focus most of our analysis on the application
change and data set change experiments and summarize the
data from the remaining three experiments.
5.1 Application Change
To simulate a switch from one application to another
as a result of a scheduling decision, we replaced the adpcm
and jpeg applications with others from the group of 7.
Figure 7 shows the model error when the adpcm application
is replaced by one of the other 6 applications labeled on
the x-axis. Each data set is also separated according to
thread concurrency (e.g., 2th represents all data where 2
applications ran concurrently). The data is presented in a
standard box plot format, where the thick line in the middle
of each box represents the median of the collected data, and
the edges of the box represent the first and third quartiles.
As the data shows, replacing adpcm with the crc, gzip, or
jpeg applications results in the highest errors. This is not a
surprising result, since the adpcm benchmark has very low
memory utilization and burstiness, while crc, gzip, and jpeg
access memory often and in frequent bursty patterns. Also,
note that as the concurrency level rises, the model error
decreases. This result is also intuitive, since a change in one
application affects system operation more when only two
applications are present than when five are.

Figure 7: Model error as the adpcm application is
swapped out by another application.
[Box plot; x-axis: replacement application (jpeg, crc, fft,
gzip, rijndael, rsynth), each at 2th, 3th, 4th, and 5th
concurrency; y-axis: Model Error Compared to CA Simulation,
0 to 80; markers: 95% Prediction Interval, Predicted Mean Error.]
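
For readers less familiar with the box plot format used in Figure 7, the three values drawn for each box can be computed as sketched below. The sample values are hypothetical placeholders, not data from the experiments, and the quartile convention shown (Python's default "exclusive" method) is an assumption; the paper does not state which convention its plots use.

```python
import statistics


def box_plot_summary(errors):
    """Return the first quartile, median, and third quartile of a sample."""
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return {"q1": q1, "median": median, "q3": q3}


# Hypothetical model-error samples (in percent) for one box.
sample_errors = [1, 2, 3, 4, 5, 6, 7]
summary = box_plot_summary(sample_errors)
print(summary)
```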
The red diamonds in Figure 7 show the mean error estimated
by the prediction model. Additionally, the upper bound of
the 95% prediction interval is shown by a downward-pointing
triangle. Whereas the predicted mean error tells the designer
what average error can be expected from a design change,
the 95% prediction interval suggests an approximate upper
bound for the error estimate. The prediction model is on
average within 5.01% of the actual error, while 95% of all
prediction errors are below 12.79%. More importantly, the
model is effective at separating low-error substitutions like
adpcm-fft from high-error substitutions like adpcm-gzip.
5.2 Data Set Change
To simulate the effect of data set changes on contention,
this experiment replaces the input data sets of two data-
dependent applications from the benchmark set: gzip and
jpeg. Figure 8 shows the model error when the input data for
the gzip application is changed. In this case, the gzip
application was trained with a long text file of Charles
Dickens' classic “A Christmas Carol”. The data sets labeled
“great exp.”, “moby dick”, and “jane eyre” are alternative
long files of English text. The “bmp img” data set is a
bitmapped (uncompressed) image, while “random” and “zero”
provide completely randomized and completely uniform
data sets, respectively. Since the gzip application uses
Huffman trees for compression, the frequency of encountered
tokens significantly impacts the behavior of the application.
Therefore, data sets with a single token (zero) or evenly
distributed tokens (random) produce much different responses
than English text. Figure 8 reflects this observation, and the
prediction model clearly distinguishes the random and zero
data sets from the text-based ones. Due to the relatively low
model errors associated with data set changes, the prediction
model was very accurate in estimating contention model error
in this experiment, staying within 1.84% on average, with
95% of the cases predicted to within 6.21%.
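
One rough way to see why the zero and random inputs differ so sharply from English text is to compare how skewed their token frequencies are. The sketch below uses the Shannon entropy of the byte distribution as a stand-in measure; this is our own illustration, not a metric used in the experiments, and it ignores the rest of gzip's algorithm. The `uniform` buffer is a deterministic substitute for random data, and the `text` sample is only a one-sentence excerpt.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# "zero": a single token repeated throughout the input.
zero = bytes(4096)
# Stand-in for "random": every byte value occurs equally often.
uniform = bytes(range(256)) * 16
# Short excerpt of English text, standing in for the long text files.
text = b"Marley was dead: to begin with. There is no doubt whatever about that."

for label, data in (("zero", zero), ("uniform", uniform), ("text", text)):
    print(label, round(byte_entropy(data), 2))
```

The single-token input sits at one extreme (0 bits per byte) and the evenly distributed input at the other (8 bits per byte), with English text in between, which is consistent with the two synthetic data sets standing apart in Figure 8.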
5.3 Other Experiments
To capture the effects of frequency scaling on contention, the
processor speed change experiment consists of a group of
processors operating at approximately the same clock
frequency, while the operating frequency of one processor is
varied. As expected, increasing deviation from the trained
baseline increases model error, and the error is higher for
systems with less concurrency, since a change to one
processor represents a higher fraction of the system. Doubling
the operating frequency of one of the processors produced
an average 17% error for the 2-processor situation and an
average 7% error for the 4- and 5-processor situations.
Increasing the processor frequency to 300% of the original
resulted in 25% error for the 2-processor case and 10% error
otherwise. The prediction model was effective in predicting
this error to within an average of 3.65%.

Figure 8: Model error as input data for the gzip
application is changed.
[Box plot; x-axis: input data set (random, zero, great exp.,
moby dick, jane eyre, bmp img.), each at 2th, 3th, 4th, and 5th
concurrency; y-axis: Model Error Compared to CA Simulation,
0 to 30; markers: 95% Prediction Interval, Predicted Mean Error.]
Scaling the bus speed aﬀects latency in the path to mem-
ory equally for all processors in the system. To capture this
eﬀect on contention model error, bus speed was reduced to
50% and increased to 150% of the trained value. Since mem-
ory latency change has a much higher eﬀect on the system
as a whole than scaling the speed of one processor, the con-
tention model error was appropriately higher: scaling down
to 50% speed introduced an average of 40% error, while
speeding the bus up to 150% introduced 18% error. Due
to higher errors encountered, the prediction model was less
eﬀective at predicting bus speed model error, estimating it
to within an average of 8.37%.
By shutting down specialized hardware functional units,
SCHM systems can save power at the expense of lower
performance for certain types of workloads. The coprocessor
change experiment examines these cases by comparing
systems with and without floating point and multiply-
accumulate (MAC) units. For the floating point test, the
experiment focuses on the FFT and jpeg applications, which
both use floating point arithmetic. Since jpeg's DCT algorithm
can take advantage of MAC functionality, the jpeg application
is used to evaluate the MAC unit addition. These tests
produced surprisingly low contention modeling errors of about
5%. Although the addition of a floating point and a MAC unit
changed the contention patterns significantly, only a
relatively small number of instructions was affected.
Therefore, for most of the runtime, the baseline-trained
contention model was sufficient.
5.4 Results Summary
Table 2 summarizes the prediction model accuracy. In
addition to the mean prediction error for each experiment, the
table shows the 95th percentile of the collected prediction
errors, while the third column gives the percentage of
tests that fell outside the prediction model's 95% prediction
interval (triangles in Figures 7 and 8). Across all experiments,
the prediction model was on average within 5.13% of the
actual model error, 95% of all predictions were within 18.28%,
and 6.08% of the data was outside of the 95% interval
provided by the prediction model. The percentage of values
over the bound is higher than calculated due to a slight
non-normality in the distribution of model error residuals
after the fit, whereas a normal distribution is assumed when
calculating prediction intervals.
Table 2: Prediction Model Accuracy Summary
Experiment           Mean Prediction Error   95th Percentile   Over Bound
Application Change   5.01%                   12.79%            3.09%
Dataset Change       1.84%                   6.21%             0.00%
PE Speed Change      3.65%                   11.54%            12.10%
Bus Speed Change     8.37%                   29.48%            5.80%
Coprocessor Change   3.85%                   9.67%             0.00%
All                  5.13%                   18.28%            6.08%
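
The three columns of Table 2 can be read as summary statistics over per-test results. The sketch below shows one plausible way to compute them, under assumptions that are ours rather than the paper's: the prediction error is taken as the absolute difference between measured and predicted model error, the 95th percentile uses the nearest-rank method, and "over bound" counts tests whose measured error exceeds the upper prediction bound. The input values are made up for illustration.

```python
import math


def accuracy_summary(measured, predicted, upper_bound):
    """Return (mean prediction error, 95th percentile, fraction over bound)."""
    abs_err = sorted(abs(m - p) for m, p in zip(measured, predicted))
    mean_err = sum(abs_err) / len(abs_err)
    # Nearest-rank percentile: smallest value with at least 95% of the
    # samples at or below it.
    rank = math.ceil(0.95 * len(abs_err))
    p95 = abs_err[rank - 1]
    over = sum(m > u for m, u in zip(measured, upper_bound)) / len(measured)
    return mean_err, p95, over


# Hypothetical per-test values, in percent.
measured = [4.0, 10.0, 22.0, 7.0]
predicted = [5.0, 8.0, 15.0, 7.5]
upper_bound = [12.0, 16.0, 20.0, 14.0]

mean_err, p95, over = accuracy_summary(measured, predicted, upper_bound)
print(mean_err, p95, over)
```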
6. CONCLUSIONS
This paper introduced an event-based re-training tech-
nique for statistical contention modeling within SCHM sys-
tems, allowing frequent system events to be pruned down
to a set of simulation events that truly aﬀect contention be-
havior. The backbone of this technique includes a prediction
model for estimation of contention modeling error without
relying on detailed CA simulation. The prediction model
is on average within 5.13% of the actual contention model
error, allowing the simulator to identify which system events
do not have a significant impact on contention and removing
around 50% of the simulation overhead (or about 70% of the
events) due to frequent contention model re-training.
By removing the burden of abstracting system events from
the designer, this work enables a high level statistical con-
tention model to simulate a broader range of SCHM sce-
narios. Although we showed that de-coupling system events
from simulation events is beneﬁcial for statistical contention
models, any modeling method for dynamic SCHM systems
operating above the CA level can improve its accuracy and
speed by identifying the events that do not signiﬁcantly im-
pact model assumptions.
7. ACKNOWLEDGMENTS
This work was supported in part by an NSF Graduate Re-
search Fellowship, SRC contract 2005-HJ-1312, and the Na-
tional Science Foundation, under grants 0509193, 0607934,
and 0606675. Any opinions, ﬁndings, and conclusions or
recommendations expressed in this material are those of the
authors and do not necessarily reﬂect the views of the NSF.
8. REFERENCES
[1] J. R. Bammi, W. Kruijtzer, L. Lavagno, E. Harcourt, and
M. T. Lazarescu. Software performance estimation strategies in
a system-level design tool. In CODES ’00, pages 82–86, 2000.
[2] D. Bertsekas and R. Gallager. Data Networks. Prentice Hall,
1992.
[3] A. Bobrek, J. M. Paul, and D. E. Thomas. Shared resource
access attributes for high-level contention models. In DAC ’07,
pages 720–725, 2007.
[4] A. Bobrek, J. J. Pieper, J. E. Nelson, J. M. Paul, and D. E.
Thomas. Modeling shared resource contention using a hybrid
simulation/analytical approach. In DATE ’04, pages
1144–1149, 2004.
[5] L. Cai and D. Gajski. Transaction level modeling: an overview.
In CODES+ISSS ’03, pages 19–24, 2003.
[6] R. Covington, J. Jump, and J. Sinclair. Cross-proﬁling as an
eﬃcient technique in simulating parallel computer systems. In
Computer Software and Applications Conference, pages
75–80, 1989.
[7] L. Eeckhout, S. Nussbaum, J. E. Smith, and K. De Bosschere.
Statistical simulation: adding eﬃciency to the computer
designer’s toolbox. IEEE Micro, 23(5):26–38, 2003.
[8] M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge,
and R. Brown. MiBench: A free, commercially representative
embedded benchmark suite. In IEEE Workshop on Workload
Characterization, pages 3–14, 2001.
[9] S. Pasricha, N. Dutt, and M. Ben-Romdhane. Extending the
transaction level modeling approach for fast communication
architecture exploration. In DAC ’04, pages 113–118, 2004.
[10] T. F. Wenisch, R. E. Wunderlich, et al. SimFlex:
Statistical Sampling of Computer System Simulation. IEEE
Micro, pages 2–15, July-August 2006.