Pint, a simulation tool based on Pin traces, by Izquierdo Riera, Francisco Blas
Escola Tècnica Superior d’Enginyeria Informàtica






Author: Francisco Blas Izquierdo Riera
Director: Julio Sahuquillo Borrás - UPV




In the course of this project we have developed a set of programs to
improve the correctness and execution time of the gem5 simulator.
To this end, we moved the functional simulation step out of gem5 into an
independent instrumented process. This ensures correctness in the functional
stage and provides good execution speed, since the code is then executed
natively. The instrumentation is done with Pin.
Also, to allow efficient communication between the processes despite the
limitations Pin imposes on its tools, we developed an IPC framework for
message passing between the processes. This framework uses lockless FIFO
queues over shared memory, so the resulting slowdown is minimal.
Keywords: hardware, simulator, x86, Pin, Pintool, gem5, ipc, fifo, C++
Contents
1 Introduction
1.1 Project rationale
1.2 Project objectives
1.3 Project strengths
1.4 Document structure
2 State of the art
2.1 Architectural simulators
2.1.1 Graphite
2.1.2 Multi2Sim
2.1.3 gem5
2.2 Instrumentation systems
2.2.1 gprof
2.2.2 Pin
2.3 Benchmarks
2.3.1 SPEC CPU2006
2.3.2 SPLASH-2
3 System description
3.1 Mead: a message passing framework
3.2 Pint: a Pin based trace generator
3.3 Schnapps: a simple consumer of the traces
3.4 Gin5: a gem5 trace player
4 System design
4.1 Mead: a message passing framework
4.2 Pint: a Pin based trace generator
4.3 Schnapps: a simple consumer of the traces
4.4 Gin5: a gem5 trace player
5 Results
6 Conclusions
6.1 Improvements for next release
A User manual
A.1 Building
A.2 Pint
A.3 Schnapps
A.4 Gin5






Chapter 1
Introduction
1.1 Project rationale
Despite the vast number of hardware simulators that exist nowadays, most of
them either lack flexibility in the simulations they allow or are slow, since
they simulate code execution instead of instrumenting a natively executed
binary. As a related issue, since the code is simulated rather than executed,
it is common to find bugs where the simulator does not set the processor
state properly. In these corner cases the simulator does not behave as the
real processor would, which causes execution issues with some programs.
Also, many simulators lack support for parallel execution, and those that
support it tend to add large overheads when running the simulation on a
single machine and do not support instruction level simulation granularity.
Finally, current simulators tend to add large overheads to the functional
simulation step, which makes it infeasible to run large tests even when
simulating simple systems.
Program instrumentation solves all these shortcomings by running the
code natively (modified so it also executes the instrumentation code). This
allows the code to run in parallel and, since it is executed directly by the
processor, guarantees that its behavior is that of a native execution.
Given the limitations of current simulators, we consider that the community
needs a flexible and fast instrumentation based tracer that can be used with
a broad range of programming languages and that handles local simulations in
parallel, with small overheads and with instruction level granularity.
1.2 Project objectives
Our main objective is to provide an instrumentation based tracer that can be
used with other simulators. Given how large these traces can become, we feed
them to the simulator live instead of storing them.
To see whether these objectives are met, we measure the slowdown with respect
to the non-instrumented program using a simple trace consumer (to ensure the
consumer is not the bottleneck). Our objective is to obtain slowdowns at
least similar to those of Graphite [9], while removing the caveats it has, at
least on single core processors.
We also intend to design an architecture that can later be expanded to
support multiple simultaneous execution threads. Due to timing constraints,
this will be done in a later version of the project.
1.3 Project strengths
The biggest problem with instrumentation-based systems is that the
instrumentation code is heavily limited by the API of the instrumentation
system (for example, the POSIX thread API cannot be used with Pin). This also
vastly reduces the number of languages that can be used. To overcome these
limitations, we use FIFO queues placed in shared memory to move the data from
one process to another, relying on the process separation provided by the
operating system to run the simulation on a different processor. As a result,
the simulator is free of the restrictions caused by the instrumentation
framework, since these only apply to the process where the program is being
instrumented.
As a side effect of this approach, the resulting simulators are segmented
(pipelined), since the functional simulation can be done on a different
processing unit than the one running the simulation itself. As the number of
processor cores increases, it is likely that hardware simulators will use
this segmentation approach more extensively to increase performance. Also,
when the cache accesses caused by the shared memory communication cause
slowdowns, hyperthreading processors can be used: with a proper processor
affinity, the critical simulation stage (i.e. the one responsible for the
bottleneck) can be placed along with the preceding stage on the different
hardware threads of a single core, so the reads from the FIFO queue are
likely to hit in the level 1 cache.
To ensure real parallelism, each thread of the instrumented program can use a
different FIFO queue to extract its traces (the completely multithreaded
implementation will be finished with the next release of Pint). This also
allows the user to limit instruction granularity by setting an appropriate
queue size, since the instrumentation stops execution once the queue is full.
For this, a simulation started event is added to ensure the first instruction
is not executed until the simulator wants it.
As an example of how Pint can be used, we also created Gin5, a slightly
modified version of the gem5 simulator which uses Pint’s instrumentation as
the source of the memory access information during the simulation.
Another known problem is that simulators tend to be very good at simulating
a specific part of the system whilst having issues with others. We consider
that in the future this FIFO system may be useful to interconnect simulators
so the best parts of each can be combined.
1.4 Document structure
In this introduction we presented the problem we are trying to solve, our
objectives and the strengths of our approach.
In the next chapter we explain the state of the art, at the time of
publication, in the topics of architectural simulators, instrumentation
systems and benchmarks.
Afterwards we describe the four modules we developed for the project and
then continue with the design decisions.
Finally we present our benchmarking results and our conclusions.
The annex contains a brief user manual, in case you want to try our system,
followed by the referenced bibliography.
The Annex folder contains the sources we developed in this project.
Chapter 2
State of the art
Of the many simulators currently available, we have chosen three to
illustrate the current state of the art, since they are the ones on which
most work is being done nowadays: gem5, Multi2Sim and Graphite.
As instrumentation tools we cover gprof based profiling and Pin.
Finally, as benchmarks we cover SPEC CPU2006 and SPLASH-2.
2.1 Architectural simulators
Architectural simulators are tools used to see how a proposed processor
design would work without the need to build the processor itself.
They share some similarities with virtual machines, in that both execute
programs and both care about the correct execution of the program. However,
virtual machines focus on executing the program quickly, whilst architectural
simulators focus on providing good statistics about the program execution and
on executing the program in the same way the simulated architecture would.
Architectural simulators tend to be structured as a set of stages:
disassembly, functional simulation and cycle by cycle simulation.
During the disassembly stage the machine code to be executed is transformed
into a set of structures that can be understood by the simulator. The set of
structures used is critical for an efficient simulation.
During the functional simulation the resulting set of structures is
interpreted by the simulator to modify the internal state of the processor
structures and the representation of the simulated program’s memory space.
Finally, during the cycle by cycle simulation the represented architecture
and system are simulated on a cycle by cycle basis so the timing results are
precise.
Simulators may or may not have these stages clearly differentiated, but all
of them do have these stages.
Also, some simulators that emulate only the memory system (and further
processor structures) are based on memory traces. A memory trace is a
description of the memory accesses made by a particular program when run,
which is then replayed on the simulated memory system.
In general, trace based simulators tend to be fast, since they not only
remove the execution step but also use a simplified model of the processor.
But traces have a few problems. On one side, the programs need to be run in a
way in which the traces will be generated, for example as a dynamically
instrumented program. On the other, when big enough they can take a lot of
space: for example, a trace of the SPLASH-2 LU kernel with contiguous blocks
would take around 1.5GiB even if each access could be stored in only 32 bits.
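As a rough check of the size figure quoted above, the following sketch computes how many accesses a trace of that size would hold. Only the 1.5 GiB and 32 bit figures come from the text; the helper name and the constant are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t GiB = 1ull << 30;

// Number of fixed-size access records that fit in `bytes` bytes of trace.
constexpr std::uint64_t accesses_in(std::uint64_t bytes,
                                    std::uint64_t bytes_per_access) {
    return bytes / bytes_per_access;
}

// 1.5 GiB of 32 bit (4 byte) records holds 402,653,184 accesses.
static_assert(accesses_in(3 * GiB / 2, 4) == 402653184ull,
              "1.5 GiB / 4 B per access");
```

Any additional field per access (type, size, thread identifier) grows the stored trace proportionally, which is the space problem that feeding the trace live avoids.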
In any case, there is some good work on trace generation with Pin for
simulators like Dinero IV [7], an example of which is the dinerotool [1] by
Kenneth Barr.
A trace based simulator cannot avoid the requirement of generating the
traces, but it can overcome the space limitation by having them fed live into
the memory simulator. This was the approach we chose.
2.1.1 Graphite
Graphite [18] [9] is a multicore simulator, also written over Pin, designed
to provide real multithreading both when run locally and when run over a
large number of computers. To do this, Graphite hijacks some of the syscalls,
which are then sent to the local kernel, to the central kernel or to both. A
similar procedure is used to track memory accesses, and an internal "MMU"
tracks which machine has which copy of the memory.
To synchronize threads Graphite provides a few different synchronization
methods, of which the fastest is the lax synchronization method.
Given the popularity of this simulator nowadays, Graphite was chosen as the
reference against which we compare the speed of our system.
Sadly, one of the major caveats of Graphite is that it is very system
specific and, as a result, it was impossible for us to run it on our testing
equipment.
2.1.2 Multi2Sim
Multi2Sim [20] [3] is a simulator supporting a large set of targets in order
to emulate different architectures, both CPU and GPU.
As a simulator it is split into different components: a disassembler that
converts the input programs into something the simulator can understand and
use, a functional simulator which maintains the CPU and memory state and runs
the code, and a cycle by cycle simulator which does the execution.
It also provides some visual tools for checking how the simulation is run.
2.1.3 gem5
gem5 [16] [2] is the result of the merge of two powerful simulators: M5 and
GEMS. gem5 is able to emulate several architectures both in Full System mode
(that is, running the kernel as part of the simulation) and in Syscall
Emulation mode (like the two aforementioned simulators, by emulating the
kernel for the provided binary).
As a full system simulator it is known for the flexibility it has for
emulating different systems, not only in the number of architectures it
supports but also in the number of devices it can emulate and the flexibility
it provides in doing so.
The main problem it has is that, although the modules are written in C++,
they are usually driven by a Python script, which complicates the system.
The flexibility of gem5 was the reason for choosing this simulator as the
target for implementing our system.
2.2 Instrumentation systems
Instrumentation systems provide ways to know how the code is running,
either for later statistics generation and performance checking or for other
uses like memory trace generation.
Instrumentation is dynamic if the code that monitors how the program is
running is added when the program is executed, and static if this code is
interleaved when building the program with the compiler. Normally dynamic
instrumentation is preferable, since it also allows us to instrument
proprietary programs and does not require a modified compiler.
2.2.1 gprof
gprof [17] [6] is a profiling system used along with programs compiled with
special flags by gcc [14] [13]. For this, gcc embeds the profiling code and
mixes it with the compiled sources before assembly. This technique is called
static instrumentation since it is done at compilation time.
Programs compiled with profiling flags generate, when run, a binary file
called gmon.out containing the execution statistics. Afterwards a call to
gprof can be used to interpret the generated file.
Although traces could also be generated with these techniques, the
requirement of compiling the programs with a particular compiler is an
impediment in some cases, so the ideas provided by this system were
discarded.
2.2.2 Pin
Pin [21] [10] [8], on the other hand, is a dynamic instrumentation framework,
which means that the instrumentation code is added dynamically. For this, Pin
hijacks the program to be run with ptrace, as a debugger like gdb [15] [12]
would do. It then loads the Pintools’ code and the Pin framework into the
running program and modifies the process so it runs the code produced by the
JIT generator provided by Pin.
For this to work, Pin provides a modified version of the C++ runtime which
has some features stripped down in order to prevent incompatibilities with
the program being run. Most C++ features can still be used by the tools, and
for those that cannot Pin provides alternatives (for example locks).
The main problem with Pin is that running it on hardened systems is
complicated. The default method used by Pin to attach to the program via
ptrace is considered dangerous by these kernels (since it is not a parent
attaching to its child but the other way around). Also, the JIT compiler
provided by Pin causes problems because it tries to create mappings which are
both writable and executable, another technique restricted by hardened
systems.
Despite these issues, Pin was the system chosen to provide the
instrumentation framework.
2.3 Benchmarks
Benchmarks are programs with standardized inputs that are used to measure
and compare the performance of different systems running them. Depending on
the component being measured different metrics can be used: power
consumption, execution time, number of frames per second generated, etc. Of
these, in this project we care the most about execution time.
Benchmarks can be synthetic, when they emulate the load caused by typical
programs of a particular type (examples are Dhrystone [25] and Whetstone
[5]), or application benchmarks, when they run one or more real world
programs, like the two we have analyzed. In general real world benchmarks
provide more meaningful results since they let you see how real applications
will behave.
2.3.1 SPEC CPU2006
The SPEC CPU2006 [22] [11] benchmark is a set of real world programs provided
along with some inputs to test the speed of a system, with a main focus on
the CPU execution speed. Despite dating from 2006, these benchmarks are
widely used and understood in the academic and real world.
Most of the programs provided with the benchmark are licensed under GPL style
licenses and are well known in the free software world, for example gcc or
perl, whilst others come from different research projects. It is because of
this that the copyright is held over the input files of these benchmarks.
The main problem with these benchmarks is that they focus on single threaded
processes.
2.3.2 SPLASH-2
The SPLASH-2 [23] [26] benchmark was developed by the Flash research group at
Stanford University to provide a set of benchmarks that could be used on
shared memory multiprocessor systems. Although the benchmarks are quite old
and require modifications to work properly, they can still be used and have
the advantage of running in a short time.
The applications provided are related to the scientific world, with examples
such as 3 body gravity simulators or kernels like the LU decomposition of a
matrix.
Since the original tests will not run, we used a modified version of the
SPLASH-2 benchmark [19]. Even more modifications were required for the null
macro to work properly and for the tests to be able to run with Pin on
hardened systems. These modifications are provided as a patch file in the
source distribution.
The main reason for choosing these benchmarks was that the relative
performance results of these tests (although with more than one processor)
were provided in [9], so we did not need to run the benchmarks again for
Graphite and thus




Chapter 3
System description
Our application is divided into four modules: Mead, a framework providing an
efficient message passing interface between different processes; Pint, a Pin
based trace generator; Schnapps, a simple consumer of the traces; and Gin5, a
gem5 trace player for the memory system.
The traces are generated by Pint and then fed through Mead to either Schnapps
or Gin5, which process them and provide some simulation statistics.
3.1 Mead: a message passing framework
The message passing pattern is not new, and it forms the basis of some
object oriented models. Mead provides a fast and simple way of passing the
traces around as messages stating that something has happened (for example,
that the program made an execution memory access of size x at position y).
These messages may contain the identifier of the thread that caused them and
also attached data, for example, in the case of a memory write, the data
present before the write and the data being written.
Although the API provided by Mead is quite agnostic of the message passing
system being used, we have chosen producer-consumer FIFO queues. FIFOs are
used since they are a well known pattern which allows for easy implementation
and for easy migration to other interprocess communication systems, like
POSIX message queues or datagram sockets, if interprocess shared memory is
not an option.
Our FIFO model differs slightly from the standard model since it allows for
two communication types. On one hand there is the event communication system,
which can queue many events for later handling by the receiving side. On the
other there is a command interface able to hold a single command. The command
interface requires acknowledging the sent command and is used to indicate
important events which require specific handling by the queue system, like
the death of the FIFO or the beginning and ending of the simulation
procedures.
The main difference between events and commands is that events are
unidirectional (from the producer to the consumer) whilst commands can be
used bidirectionally (as long as the absence of collisions is guaranteed by
the programmer). Commands are also more easily handled with the futex
syscall, which makes them very useful for events that require really heavy
processing on the other side, since it lets another thread take over the CPU
while this is done.
The FIFO architecture is based on a central FIFO (called the main FIFO) which
is used to send global events that are supposed to stall the simulation until
attended (so the simulator can decide whether or not it should clear the per
thread queues before processing the event). This queue handles at least the
thread creation and deletion events, where a new FIFO queue is negotiated
between both sides, but it can also be used to process events like the
creation and deletion of mappings, amongst others.
Given the importance of the main FIFO in the architecture, both processes
must know beforehand where to access it and must be able to negotiate its
creation independently of who arrived first (since synchronization is
impossible before the FIFO is created).
The framework also features a per thread FIFO which can be used to send
events that are not of global significance to the listener on the other side.
This lets the programmer communicate information quickly, since the queues
can then be lockless and, as a result, as long as there are at least two
processors available, the kernel will not switch out the current process,
which avoids expensive context switches. The creation of these FIFOs should
be negotiated over the main FIFO when implementing the multithreaded version.
3.2 Pint: a Pin based trace generator
Pint by itself is not a simulator but a framework providing efficient ways
to extract data from the instrumented program through Mead. The version
presented with this project is single threaded (although designed to be
multithreaded, with part of the work for that already done) and relies on
Mead for communicating with the simulator itself. As an example, the provided
instrumentation studies the memory accesses made by the program (of any type,
ranging from prefetches to execution fetches) and sends them out with Mead,
so the simulator can avoid the issues associated with Pin tracing tools.
The code here is heavily focused on speed, and thus the user must have the
option to choose the features that should be used.
The granularity of the execution can be easily tuned by setting an
appropriate queue size. For example, for instruction by instruction execution
the queue must have size one.
Also, a simulation started event must be the first one to be queued, so you
can discard old elements when you want the execution to take place.
Since every instruction starts with (and contains) a single fetch event, it
is possible to use this event as the delimiter between instructions. Even so,
it is a good idea to include at least the number of events the instruction
will cause, to make tracing easier. This may be done in future versions.
Pint also provides a way to specify the number of instructions that must be
executed (summed over all threads) before switching to the next simulation
mode.
The mode automaton allows for three simulation modes, which are switched in
the following order: fast forward, warm up and simulation, after which it
goes back to fast forward.
In the fast forward mode instructions are just counted and executed but no
data is generated, which allows for near native speed execution. In the warm
up and simulation modes instructions generate events, which during warm up
are used for filling the caches; the entrance to and exit from the simulation
mode are notified to the consumer so it can handle statistics properly.
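The mode cycle described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name, the method names and the per mode instruction budgets are hypothetical and are not taken from the Pint sources.

```cpp
#include <cassert>
#include <cstdint>

enum class Mode { FastForward = 0, WarmUp = 1, Simulation = 2 };

// Hypothetical automaton: each mode lasts a fixed number of instructions and
// the modes cycle fast forward -> warm up -> simulation -> fast forward.
class ModeAutomaton {
public:
    ModeAutomaton(std::uint64_t fast_forward, std::uint64_t warm_up,
                  std::uint64_t simulation)
        : budget_{fast_forward, warm_up, simulation} {
        assert(fast_forward > 0 && warm_up > 0 && simulation > 0);
    }

    Mode mode() const { return mode_; }

    // Events are only produced outside fast forward.
    bool emits_events() const { return mode_ != Mode::FastForward; }

    // Account one executed instruction; returns true when the mode changed,
    // which is the point where the consumer would be notified.
    bool retire_instruction() {
        if (++executed_ < budget_[static_cast<int>(mode_)]) return false;
        executed_ = 0;
        mode_ = static_cast<Mode>((static_cast<int>(mode_) + 1) % 3);
        return true;
    }

private:
    std::uint64_t budget_[3];
    std::uint64_t executed_ = 0;
    Mode mode_ = Mode::FastForward;
};
```

With budgets of 2, 1 and 3 instructions, the second call to retire_instruction switches to warm up, the third to simulation, and the sixth back to fast forward.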
3.3 Schnapps: a simple consumer of the traces
Schnapps is intended mainly for analyzing the performance of the
instrumentation code, by consuming the generated events whilst trying to
avoid causing any bottlenecks in execution, and also as an example program
of how to extract the generated traces.
Schnapps reads the traces generated by Pint and outputs the map changes as
they happen (in a diff like format) together with some statistics for the
current simulation and for the total run. In particular, it reports the
amount of data read or written by the different memory access types, the
number of said accesses that have happened, and an execution mark made by
xoring the addresses of the different accesses together. The idea is that
different marks imply that different traces were generated, but the same mark
does not necessarily imply that the same trace was generated.
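A minimal sketch of such statistics, including the xor based execution mark, could look as follows. The struct and its member names are hypothetical; only the idea of xoring the access addresses together comes from the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per access type statistics kept by a trace consumer.
struct AccessStats {
    std::uint64_t accesses = 0;  // number of accesses seen
    std::uint64_t bytes = 0;     // amount of data read or written
    std::uint64_t mark = 0;      // xor of all the access addresses

    void record(std::uint64_t address, std::uint32_t size) {
        ++accesses;
        bytes += size;
        mark ^= address;
    }
};

inline void mark_example() {
    AccessStats a, b, c;
    a.record(0x1000, 4); a.record(0x2000, 8);
    b.record(0x2000, 8); b.record(0x1000, 4);  // same accesses, reordered
    c.record(0x1000, 4); c.record(0x3000, 8);
    assert(a.mark == b.mark);  // equal marks although the traces differ
    assert(a.mark != c.mark);  // different marks: the traces must differ
}
```

Since xor is commutative, a reordered trace produces the same mark, which is why an equal mark is only a hint and not a proof that two runs generated the same trace.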
3.4 Gin5: a gem5 trace player
Although previous versions of gem5 came with a trace player supporting
different formats, these modules stopped being maintained long ago and no
longer build. As a result, and despite being a good starting point, a
different base was necessary given the large number of changes the memory
system has undergone since then.
We have therefore written again a generic CPU for playing traces (missing the
TLB), which asks the queues for memory access requests via a clear interface
so it can also be used with other types of trace formats including




Chapter 4
System design
4.1 Mead: a message passing framework
Mead has a macro of particular interest: USE_YIELD, which enables the use of
the yield system call to let other processes use the processor when waiting.
In Mead, we have chosen to implement a lockless ring buffer over shared
memory for our FIFOs since it is a well known pattern [4] [24].
The other reasons for choosing such a structure were speed and independence.
By being lockless we avoid expensive spinlocks that would hinder performance,
whilst also avoiding having to either use the ones provided by Pin everywhere
or build our own. Also, having the data in shared memory saves us from making
expensive system calls to pass the messages and allows the cache to be used
for that.
It should be taken into account that the lockless queue only works properly
if a single thread acts as reader and a single thread acts as writer. If
there are more threads at either side, they require a lock to work properly.
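The following is a minimal sketch of a single producer, single consumer lockless ring buffer of the kind described above. It is not the Mead implementation: the class name, the back and front accessors, the use of std::atomic and the choice of leaving one slot unused to tell a full queue from an empty one are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical SPSC ring: only one thread may call the producer methods and
// only one thread may call the consumer methods. Holds up to N - 1 elements.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot is kept free to tell full from empty");

public:
    bool empty() const {
        return head_.load(std::memory_order_acquire) ==
               tail_.load(std::memory_order_acquire);
    }
    bool full() const {
        return next(tail_.load(std::memory_order_acquire)) ==
               head_.load(std::memory_order_acquire);
    }

    // Producer: slot to fill in place before calling push().
    T& back() { return buf_[tail_.load(std::memory_order_relaxed)]; }

    // Producer: publish the slot returned by back(); fails when full.
    bool push() {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t n = next(t);
        if (n == head_.load(std::memory_order_acquire)) return false;
        tail_.store(n, std::memory_order_release);
        return true;
    }

    // Consumer: oldest element; only valid while the queue is not empty.
    T& front() {
        assert(!empty());
        return buf_[head_.load(std::memory_order_relaxed)];
    }

    // Consumer: release the slot returned by front(); fails when empty.
    bool pop() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        head_.store(next(h), std::memory_order_release);
        return true;
    }

private:
    static std::size_t next(std::size_t i) { return (i + 1) % N; }

    T buf_[N];
    // Each index lives on its own (assumed 64 byte) cache line.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Each index is written by exactly one side, so no lock or read-modify-write operation is needed. This is also why the structure stops being safe as soon as a second reader or a second writer is added.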
Our implementation is based on templates so it can be used with different
classes, although it should be kept in mind that the same class (i.e. no
inheritance) should be used on the whole queue.
Also, one of the major caveats is that the shared memory address is
currently hardcoded and, as a result, only a single instance of the program
can run at a time. We expect to fix this in future versions by providing a
launcher that allocates an anonymous shared memory segment and passes its
identifier to both Pint and the trace reader being used.
The command types are defined by the shmstatus enum. Since newly allocated
shared memory is filled with 0s, we take the 0 value as the initial state
(NONE). The server then writes a SERVER_STARTED command and waits for a
CLIENT_ACK. Finally, when dying, the server is expected to send the
SERVER_DIED command so the client will not wait forever for data.
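The handshake just described can be sketched as follows. The four command names come from the text; the numeric values other than NONE = 0, the CommandSlot struct and the helper functions are assumptions made for the example and do not reproduce the Mead API.

```cpp
#include <atomic>
#include <cassert>

// Command values named in the text; the numbering after NONE is assumed.
enum shmstatus : int { NONE = 0, SERVER_STARTED, CLIENT_ACK, SERVER_DIED };

// Hypothetical single command slot placed in the shared segment. A zero
// filled segment reads as NONE without any initialization handshake.
struct CommandSlot {
    std::atomic<int> status{NONE};
};

// Server: announce that it is ready to produce data.
inline void server_start(CommandSlot& s) {
    s.status.store(SERVER_STARTED, std::memory_order_release);
}

// Client: acknowledge the start command; fails if none is pending.
inline bool client_ack(CommandSlot& s) {
    int expected = SERVER_STARTED;
    return s.status.compare_exchange_strong(expected, CLIENT_ACK,
                                            std::memory_order_acq_rel);
}

// Server: has the client acknowledged the last command?
inline bool server_sees_ack(const CommandSlot& s) {
    return s.status.load(std::memory_order_acquire) == CLIENT_ACK;
}

// Server: tell the client that no more data will arrive.
inline void server_die(CommandSlot& s) {
    s.status.store(SERVER_DIED, std::memory_order_release);
}

// Client: should it stop waiting for data?
inline bool server_is_dead(const CommandSlot& s) {
    return s.status.load(std::memory_order_acquire) == SERVER_DIED;
}

// Single threaded walk through the states, in the order given in the text.
inline void handshake_walkthrough() {
    CommandSlot slot;
    assert(slot.status.load() == NONE);
    assert(!client_ack(slot));  // nothing to acknowledge yet
    server_start(slot);
    assert(client_ack(slot));
    assert(server_sees_ack(slot));
    server_die(slot);
    assert(server_is_dead(slot));
}
```

In a real deployment both sides would poll or block on the slot from different processes; the walkthrough only shows the order of the state changes.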
Of the many methods provided, those of special relevance for the programmer
are the gethead and gettail methods, used to access the data we want to
insert into or extract from the queue; the push and pop methods, used for
adding or removing an element from the queue; and the full and empty methods,
used to check for these states.
Some methods for waiting while the queue is empty or full are also provided,
but these must be used carefully, since a single threaded consumer could hang
forever waiting for the queue to match the condition. In this case special
waits that also monitor the main queue status are recommended instead. A
wait_push method is also provided, which waits until a push can be done.
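A wait that also monitors a second condition, as recommended above, can be sketched like this. The wait_until helper and its parameters are hypothetical, and std::this_thread::yield stands in for the yield system call enabled by USE_YIELD; the actual Mead waits may be implemented differently.

```cpp
#include <cassert>
#include <thread>

// Spin until ready() holds, giving up as soon as aborted() holds (for
// example because the main queue reports that the peer has died).
// Returns true if the condition was met and false if the wait was abandoned.
template <typename Ready, typename Aborted>
bool wait_until(Ready ready, Aborted aborted) {
    while (!ready()) {
        if (aborted()) return false;
#ifdef USE_YIELD
        std::this_thread::yield();  // let another process use the core
#endif
    }
    return true;
}

inline void wait_example() {
    int polls = 0;
    // The condition becomes true on the third poll.
    bool ok = wait_until([&] { return ++polls >= 3; }, [] { return false; });
    assert(ok && polls == 3);

    // The condition never holds, but the abort check ends the wait.
    bool peer_dead = true;
    ok = wait_until([] { return false; }, [&] { return peer_dead; });
    assert(!ok);
}
```

Passing the queue check as ready and the main queue check as aborted gives a wait that cannot hang forever when the other side disappears.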
For control handling we provide the send_control, receive_control and
ack_control methods. In particular, send_control waits until an ACK is sent
back stating that the condition was taken care of.
Finally, the wait_start and tell_Start methods are provided for
initialization. Instead of yields they use calls to the futex syscall to
block the thread until they have been attended, which reduces the processor
load in some situations.
4.2 Pint: a Pin based trace generator
To ensure that unwanted features do not hinder performance, preprocessor
based switches can be used to disable those you are not interested in. Some
other options can also be set in this way.
The macros of interest here are PADSIZE, which defines the number of bytes
of the cache line in order to prevent false sharing; MAXMEMSIZE, which
defines the maximum size a single memory access may have (used, amongst
other things, for setting the size of the buffers); USE_DATA, which enables
the infrastructure for fetching and sending the accessed data in the events;
MULTITHREADED, which enables the still incomplete multithreaded code;
USE_STATES, which enables the fast forward, warm up and simulation state
machine; and DTRACE, which makes Pint output some debugging information.
Given the impact these features can have, we decided to let the user
disable them at compile time. Some of these features can also be disabled at
run time, although they will still have some impact on the execution. In
particular, USE_STATES still causes the slowdown of the conditionals
introduced before the instrumentation calls to handle the state machine, and
USE_DATA makes the events, and thus the queues, larger.
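The effect of these switches on the data layout can be illustrated with a short sketch. The macro names PADSIZE, MAXMEMSIZE and USE_DATA come from the text; the default values, the Event and QueueIndices structs and their fields are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>

#ifndef PADSIZE
#define PADSIZE 64      // assumed cache line size in bytes
#endif
#ifndef MAXMEMSIZE
#define MAXMEMSIZE 64   // assumed maximum size of a single memory access
#endif

// Hypothetical event layout: the payload only exists when USE_DATA is set,
// so disabling it at compile time shrinks every element of the queue.
struct Event {
    std::uint64_t address;
    std::uint32_t size;
    std::uint32_t type;
#ifdef USE_DATA
    std::uint8_t data[MAXMEMSIZE];
#endif
};

// Padding each queue index to a full cache line keeps the producer and the
// consumer from invalidating each other's line (false sharing).
struct alignas(PADSIZE) PaddedIndex {
    std::uint64_t value;
};

struct QueueIndices {
    PaddedIndex head;  // written only by the consumer
    PaddedIndex tail;  // written only by the producer
};

static_assert(sizeof(PaddedIndex) == PADSIZE, "one index per cache line");
static_assert(offsetof(QueueIndices, tail) % PADSIZE == 0,
              "tail starts on its own cache line");
```

With these assumed values an event takes 16 bytes without USE_DATA and 80 bytes with it, so the same shared memory segment holds five times fewer events when the data is sent.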
A final option that can be disabled is mapping tracing after a context
switch (disabled by default) and after a syscall. The reason for this is the
great slowdown caused by this operation, since it requires at least 3 system
calls and parsing a large text file.
In Pint the instrumentation is added by the Instruction function. When
given the choice between adding complexity here or in the instrumentation
functions, we should add it here, since this function is executed much less
frequently than the instrumentation code. As can be seen, this function just
tells Pin to add calls to the proper instrumentation functions, either
preceded by conditionals if the state machine is being used or without them
otherwise.
Here we should consider all the parsemaps functions, which are wrappers
around the original parsemaps function that takes care of generating the
events when maps are added or deleted. As can be seen, this function
consumes quite a lot of resources given the way in which it works. Sadly,
the Pin framework on which Pint is based does not provide any API to
distinguish the mappings made by the instrumentation (including the
JIT caches and the instrumentation code itself); as a result, a lot of events
are generated on the simulation status queue. To reduce this
overhead we assume mappings may only change after coming back
from a context change (as is the case when the application is being ptraced
by a debugger) or after coming back from a system call. This reduces the
overhead greatly but still generates a lot of spurious mapping changes that
may pollute the simulator's assumptions. We expect this issue to be fixed with
the addition of a proper API in future versions of Pin. Unlike the memory
access information, and given its importance, the mapping information is sent
as it is generated, independently of the simulation mode.
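The detection of added and removed mappings boils down to two set differences between the previous and the current snapshot of the process maps. The following sketch shows only that step; the Range ordering, the MapDiff struct and the diffMappings name are assumptions made for the example, and reading /proc/self/maps and pushing the events to the queue are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// A mapped address range [b, e). The ordering is an assumption for this
// sketch: ranges are sorted by start address, then by end address.
struct Range {
    unsigned long b;
    unsigned long e;
    bool operator<(const Range& r) const {
        return b < r.b || (b == r.b && e < r.e);
    }
};

struct MapDiff {
    std::vector<Range> removed;  // present before, gone now
    std::vector<Range> added;    // new in the current snapshot
};

// Compare two snapshots of the mappings and report what changed.
MapDiff diffMappings(const std::set<Range>& prev, const std::set<Range>& cur) {
    MapDiff d;
    // Removals are computed first so a consumer can process them before
    // the additions and never sees two intersecting ranges.
    std::set_difference(prev.begin(), prev.end(), cur.begin(), cur.end(),
                        std::back_inserter(d.removed));
    std::set_difference(cur.begin(), cur.end(), prev.begin(), prev.end(),
                        std::back_inserter(d.added));
    return d;
}
```

A mapping that grows or shrinks shows up as one removal plus one addition, which is one of the reasons so many events can be generated.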
To keep track of the memory accesses, the RecordMemExec,
RecordMemRead, RecordMemPrefetch and RecordMemPreWrite functions
are used. When the user is interested in the data involved in these
accesses, RecordMemPreWrite changes its behavior so it can read the
memory contents before the access, and a new function called
RecordMemWrite, executed after the instruction finishes, is added; the
reason for this is that the written data cannot be known otherwise.
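Why a write needs two hooks can be shown with a small sketch. The WriteRecord struct and both function names are hypothetical and plain memcpy replaces the safe copy routine offered by Pin; the point is only the ordering of the two captures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kMaxMemSize = 32;  // assumed upper bound, like MAXMEMSIZE

// Hypothetical record holding both sides of a write.
struct WriteRecord {
    const void* ea;                     // effective address of the write
    std::size_t size;                   // size of the access in bytes
    unsigned char before[kMaxMemSize];  // contents before the instruction
    unsigned char after[kMaxMemSize];   // contents after the instruction
};

// Called before the instruction: remember the target and its old contents.
void recordPreWrite(WriteRecord& r, const void* ea, std::size_t size) {
    assert(size <= kMaxMemSize);
    r.ea = ea;
    r.size = size;
    std::memcpy(r.before, ea, size);
}

// Called after the instruction: only now does memory hold the written value.
void recordWrite(WriteRecord& r) {
    std::memcpy(r.after, r.ea, r.size);
}
```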
To handle the state machine we have an enum called state, which
contains the current simulation state; a function called nextState, which
updates that variable and the instruction counter and is called only when
the instruction counter reaches zero; the StateCounter method, which
decreases the instruction counter by one and reports whether we have
processed the last instruction or not; CounterDone, which sends the events
for starting or ending a simulation; and finally the Instrument function,
which checks whether the instrumentation code should be run in the current
state.
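The interplay between these pieces can be sketched as below. This is an illustration only: the state names, the three fixed phase lengths (all assumed to be greater than zero) and the choice to instrument in every state except fast forward are assumptions; the real tool takes its counts from the -f, -w and -s options, and the event sending done by CounterDone is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the three phases described in the text.
enum State { FAST_FORWARD = 0, WARM_UP = 1, SIMULATION = 2 };

class StateMachine {
public:
    StateMachine(std::uint64_t ff, std::uint64_t warm, std::uint64_t sim)
        : state_(FAST_FORWARD), counter_(ff) {
        len_[FAST_FORWARD] = ff;
        len_[WARM_UP] = warm;
        len_[SIMULATION] = sim;
    }
    // Decrease the instruction counter; true when the phase is exhausted.
    bool stateCounter() { return --counter_ == 0; }
    // Move to the following phase and reload the counter. Only meant to be
    // called when the counter has reached zero.
    void nextState() {
        state_ = static_cast<State>((state_ + 1) % 3);
        counter_ = len_[state_];
    }
    // Should the analysis code run in the current state?
    bool instrument() const { return state_ != FAST_FORWARD; }
    State state() const { return state_; }
    // What a per-instruction hook would do.
    void onInstruction() {
        if (stateCounter()) nextState();
    }
private:
    State state_;
    std::uint64_t counter_;
    std::uint64_t len_[3];
};
```

Keeping stateCounter to a decrement and a comparison matters because it runs once per executed instruction, while nextState runs only at phase boundaries.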
Finally, we have a few callbacks: ThreadStart, used to notify the creation
of new threads; ThreadFini, used to notify their destruction; and Fini,
which is called before the instrumented program exits and generates the
SERVER_DIED event.
4.3 Schnapps: a simple consumer of the traces
Given its simplicity, the code of Schnapps is all written in the main function.
First, the queues are negotiated with Pint. Afterwards, the variables holding
the stats are initialized to 0 and we record that we are not simulating anything.
With that done we enter the main loop, which processes information
until the trace generator reports that it has died. In this loop, the data from
the thread queue is extracted and added to the statistics. Afterwards, the
main queue is checked for events such as mappings being added or removed, and
these changes are printed. Finally, control signals are handled,
including the beginning of a simulation (by setting the stats to 0) and its
end (by printing the simulation stats).
Finally, once outside of the loop and with the simulation finished, we
print the total stats.
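The structure of that loop can be sketched as follows. To keep the example self-contained the shared memory queues are replaced by a pre-recorded vector of steps, the mapping events are left out, and the per-simulation stats are collected instead of printed; all the names (Control, Access, Stats, Step, consume) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the control values and access events.
enum Control { CTL_NONE, CTL_SIM_START, CTL_SIM_END, CTL_SERVER_DIED };
enum Access { EXEC, READ, WRITE };

struct Stats {
    unsigned long execs, reads, writes;
    Stats() : execs(0), reads(0), writes(0) {}
    void add(Access a) {
        if (a == EXEC) ++execs;
        else if (a == READ) ++reads;
        else ++writes;
    }
};

// One loop iteration's worth of input: queued accesses plus a control signal.
struct Step {
    std::vector<Access> accesses;
    Control control;
};

// Returns the total stats; per-simulation stats are appended to `finished`.
Stats consume(const std::vector<Step>& steps, std::vector<Stats>& finished) {
    Stats total, current;
    bool simulating = false;
    for (std::size_t i = 0; i < steps.size(); ++i) {
        // 1. Drain the access queue into the statistics.
        for (std::size_t j = 0; j < steps[i].accesses.size(); ++j) {
            total.add(steps[i].accesses[j]);
            if (simulating) current.add(steps[i].accesses[j]);
        }
        // 2. Handle the control signal, if any.
        switch (steps[i].control) {
        case CTL_SIM_START: current = Stats(); simulating = true; break;
        case CTL_SIM_END: finished.push_back(current); simulating = false; break;
        case CTL_SERVER_DIED: return total;
        case CTL_NONE: break;
        }
    }
    return total;
}
```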
4.4 Gin5: a gem5 trace player
Most of the code was likely written in these classes,
since we had to revamp the trace readers and the trace CPUs so they would
work with the current memory system used by gem5.
The MemTraceReader class is a very simple class providing a single
method called getNextRequest, which returns either a pointer to the next
request to be played on the memory system or a NULL pointer along with
the reason why no request was provided.
The memory requests are represented by the MemTraceRequest class, which
returns packets through the getNextPkt method.
The PinReader class is derived from the MemTraceReader class and, aside
from handling the Pint queues, also adds a callback to delete the queues
when done.
Finally, the TraceCPU class provides the MemPort and TickEvent classes
required by the simulator and is responsible for requesting the data from
the reader when necessary and sending the requests to the memory system
through the proper port. From a CPU point of view it emulates a system
without a TLB (we basically take the LSBs of the address to convert the
virtual addresses we get into physical addresses) with ports for an
instruction cache and a data cache.
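The TLB-less translation amounts to masking off the upper bits of the address. A minimal sketch, assuming a hypothetical physBits parameter for the width of the simulated physical address space (the text does not say how many bits are kept):

```cpp
#include <cassert>
#include <cstdint>

// Keep only the physBits least significant bits of a virtual address.
// physBits is an assumption of this sketch, not a value from the thesis.
constexpr std::uint64_t toPhysical(std::uint64_t vaddr, unsigned physBits) {
    return physBits >= 64
               ? vaddr
               : vaddr & ((std::uint64_t(1) << physBits) - 1);
}
```

Two virtual addresses that differ only in the discarded bits map to the same physical address, which is one reason proper TLB handling is desirable.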





The benchmark results can be seen in the following table (extracted from the
annexed .ods file).
It surprises us to get a slowdown as high as 804x in the case of LU, and
also that fmm only got a 94x slowdown in the Graphite benchmarks.
In any case, if we discard the fmm benchmark, our system performs
better than Graphite in all cases when using a single processor.
Chapter 5. Results




















The development of the project has taken a long time given its research
components; yet it has given us good insight into how to improve
simulator speed.
Also, given the promising results obtained with the benchmarks run during
the development and testing of this fairly limited version (a worst-case
slowdown of 804x when simulating, a best case of 199x, with a mean of 360.5
and an average of 419), we think that ideas like simulation segmentation and
instrumentation-based simulation in an independent process may help the
development of faster and more powerful simulators, and we will continue
with its development.
We expect to see in the future heavily multithreaded simulators where
each processor has its own group of threads, each handling a different stage
independently, in order to speed up execution on multiprocessor machines.
We also expect more simulators in the future to use process-based
separation between the data collection routines responsible for the execution
of the program and the simulation itself, allowing the usage of higher-level
languages with fewer restrictions whilst still providing high performance
and native execution of the simulated programs.
6.1 Improvements for next release
In the next release we intend to have a fully parallel instrumentation
framework. We will also reimplement the trace simulator as a full gem5 CPU so
it can have proper TLB handling and can be extended internally with more
complex models. Finally, we will change the queuing system so the simulator
knows how many events will be generated by the instruction being executed
before these events are handed down. We will also interconnect Multi2Sim
with gem5 in order to prove the power of Mead.
Once we release the next version we intend to publish a paper on a pub-





To build the sources, it suffices to run the make command in
the sources directory.
A.2 Pint
Running the Pint pintool is quite easy; it is enough to run:
./pin -t source/tools/SimpleExamples/obj-intel64/pinatrace.so -- command arguments
Options can be set by placing the desired switches between pinatrace.so and
the --
Currently the following options are available:
-f number : adds the given number of instructions to be run in the fast forward
state (if used several times, each further value sets the instruction count to
be run the next time we go back to said state)
-w number : adds the given number of instructions to be run in the warm up
state (if used several times, each further value sets the instruction count to
be run the next time we go back to said state)
-s number : adds the given number of instructions to be run in the simulation
state (if used several times, each further value sets the instruction count to
be run the next time we go back to said state)
-syscallmap {0,1} : disables, if 0, or enables, if 1, the checking of process
mappings after returning from a syscall
-ctxchangemap {0,1} : disables, if 0, or enables, if 1, the checking of process
mappings after a context change
-values {0,1} : disables, if 0, or enables, if 1, the copying of data along
with the memory events
A.3 Schnapps
To run Schnapps, just run ./consumer
A.4 Gin5
Gin5 requires a Python file describing the system to be emulated. An example
of such a system can be found in the pintrace.py file. Once you have set
up your system in a Python file, you just need to run the gem5.fast binary
followed by the file containing the system definition.
Scripts that set up systems may take arguments from the command line if
these are given after the script file. Our example file does not make use of this







#include <linux/futex.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <new>

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

// Compile time configuration
#define PADSIZE 64 // 64 byte line size as per Intel specs
#define MAXMEMSIZE 32 // 256 bits as per AVX, needs to be increased in the future
//#define USE_DATA 1 // Add code paths to obtain the data on the memory accesses
//#define MULTITHREADED 1 // Add code paths to allow for multithreaded applications
//#define USE_STATES 1 // Use a state machine to allow for fast forward and warm up states
//#define DTRACE 1 // Add debugging output
//#define USE_YIELD 1 // Use yield() or the Pin equivalent when waiting for the other thread
#define QSIZE_ 1024 // Default FIFO queue size

#ifdef USE_YIELD
#ifdef PIN_H
#define YIELD PIN_Yield
#else
#include <sched.h>
#define YIELD sched_yield
#endif








typedef void VOID;
typedef u_int32_t UINT32;
typedef u_int8_t UINT8;
#endif

enum DataType { INVALDATA, STARTTH, ACCMEM };
Appendix B. Relevant source code
enum AccessType { ACCEXEC, ACCREAD, ACCWRITE, ACCPREFETCH };


// #define PAD(n) ((((n) + (PADSIZE - 1)) / PADSIZE) * PADSIZE)

#include <cassert>

#ifdef DTRACE
#define dcprintf(c, ...) if (c) fprintf(stderr, __VA_ARGS__)
#define dprintf(...) fprintf(stderr, __VA_ARGS__)
#define dcputs(c, a) if (c) fputs((a), stderr)
#define dputs(a) fputs((a), stderr)
#else
#define dcprintf(...)
#define dprintf(...)
#define dcputs(c, a)
#define dputs(a)
#endif

//TODO: use alignments instead of paddings
//TODO: use other padded struct for the data from read to write
class MemAccess {
private:
    //We are not going to use derived classes here for efficiency
    AccessType type;
    VOID *ea; // Effective address of the access
#ifdef USE_DATA
    char data[MAXMEMSIZE]; // Contains either the data executed/read or the data contained before writing
    char wdata[MAXMEMSIZE]; // This is valid only when the data access is a write; contains the written data
#endif
    UINT32 size; // Size of the access
#ifdef MULTITHREADED
    UINT32 tid; // The ID of the thread generating the access
#endif
#ifdef USE_DATA
    // NOTE: the copy happens inside assert(), so it is skipped if NDEBUG is defined
    inline void setData() {
        assert(PIN_SafeCopy(data, ea, size) == size);
    }
    inline void copyData(const MemAccess &ma) {
        memcpy(data, ma.data, size);
        if (type == ACCWRITE)
            memcpy(wdata, ma.wdata, size);
    }
#endif
public:
#ifdef USE_DATA
    inline void setWdata() {
        assert(PIN_SafeCopy(wdata, ea, size) == size);
    }
#endif
    inline void MemAccessSet(AccessType type, VOID *ea, UINT32 size
#ifdef MULTITHREADED
        , UINT32 tid
#endif
        ) {
        this->type = type;
        this->ea = ea;
        this->size = size;
#ifdef MULTITHREADED
        this->tid = tid;
#endif
#ifdef USE_DATA
        if (type != ACCPREFETCH) {
            setData();
        }
#endif
    }
    inline void MemAccessSet(const MemAccess &ma) {
        type = ma.type;
        ea = ma.ea;
        size = ma.size;
#ifdef MULTITHREADED
        tid = ma.tid;
#endif
#ifdef USE_DATA
        if (type != ACCPREFETCH) {
            copyData(ma);
        }
#endif
    }
    void show(FILE *f); // Requires the C LOCK
    inline AccessType getType() const { return type; }
    inline void* getEA() const { return ea; }
#ifdef USE_DATA
    inline const void* getData() const { return data; }
    inline const void* getWData() const { return wdata; }
#endif
    inline UINT32 getSize() const { return size; }
#ifdef MULTITHREADED
    inline UINT32 getTid() const { return tid; }
#endif
} cachealigned;

union SimDataU {
    class MemAccess ma;
};

class SimData {
private:
    DataType type;
    SimDataU data;
public:
    SimData() : type(INVALDATA) {
    }
    SimData(DataType _type) : type(_type) {
    }
    inline DataType getType() const {
        return type;
    }
    inline void setType(DataType _type) {
        type = _type;
    }
    inline MemAccess & getMa() {
        type = ACCMEM;
        return data.ma;
    }
    inline const MemAccess & getCMa() const {
        assert(type == ACCMEM);




enum InstEventType {
    ADDMAPPING, //A mapping was added during the last context change/syscall
    REMOVEMAPPING, //A mapping was removed during the last context change/syscall
    ADDTHREAD, //A new execution thread has been spawned
    REMOVETHREAD, //An execution thread has ceased existing
};

struct range {
    unsigned long int b; // begin
    unsigned long int e; // end
    inline bool operator < (const struct range &r) const {
        // There shouldn't be overlapping ranges (at least in theory);




union InstEventData {
    range r;




class InstEvent {
private:
    InstEventType type;
    InstEventData data;
public:
    inline InstEvent() { }
    inline void SetInstEvent(InstEventType _type, range _r) {
        type = _type;
        data.r = _r;
    }
    inline InstEventType getType() const {
        return type;
    }
    inline range getRange() {
        assert(type == ADDMAPPING || type == REMOVEMAPPING);
        return data.r;
    }
} cachealigned;
// This is a class implementing lockless single producer single consumer queues.
// They are very useful for fast efficient IPC through shared memory though you
// need to ensure the structure being queued has all the necessary data inside
// i.e. doesn't use references.
// Currently we use them for two purposes, passing events related to memory
// mappings and threads between the instrumentation and the simulator and
// passing around the memory accesses of each thread.

//We only use QSIZE-1 thus there is always one element free for processing before queueing.
#define NEXTQELEM(v) (((v) + 1) % QSIZE)

enum shmstatus {NONE = 0, // Initial state
    CLIENT_ACK=1, //The client confirms reception of previous state
    SERVER_STARTED=2, //The server has just started
    SERVER_DIED=3, //The server has died
    // These ones refer to the next instruction pushed to the queue (so they include up until the ACCEXEC after that)
    SERVER_SIM_START=4, //We are going to jump into simulation, reset stats




// A lockless single producer single consumer queue, with more than 1 you will need locks
template <class T, int QSIZE=QSIZE_> class SHMQ {
    T queue[QSIZE] cachealigned;
    volatile sig_atomic_t qhead cachealigned;
    volatile sig_atomic_t qtail cachealigned;
    // Elements are inserted on the head and removed from the tail like a snake.
    volatile sig_atomic_t control cachealigned;
public:
    inline SHMQ() : qhead(0), qtail(0) {
    }
    inline T & gethead() { return queue[qhead]; }
    inline T & gettail() {
        assert(!empty());
        return queue[qtail];
    }
    inline bool full() { return NEXTQELEM(qhead) == qtail; }
    inline bool empty() { return qtail == qhead; }
    // Wait for the queue not to be full
    inline void wait_full() {
        while (unlikely(full())) YIELD();
    }
    // Wait for the queue not to be empty (If the server dies it will never be)
    inline bool wait_empty_cond() {
        return (empty() && control == CLIENT_ACK);
    }
    inline void wait_empty() {
        while (unlikely(wait_empty_cond())) YIELD();
    }
    // Despite its name, this spins while the queue is NOT empty,
    // i.e. it returns once the consumer has drained the queue
    inline void wait_not_empty() {
        while (unlikely(!empty())) YIELD();
    }
    inline void push() {
        assert(!full());
        qhead = NEXTQELEM(qhead);
    }
    // Wait if necessary then push
    inline void wait_push() {
        wait_full();
        push();
    }
    inline void pop() {
        assert(!empty());
        qtail = NEXTQELEM(qtail);
    }
    inline enum shmstatus receive_control() {
        if (control == CLIENT_ACK) return NONE;
        return (enum shmstatus) control;
    }
    inline void ack_control() {
        while (unlikely(control == CLIENT_ACK)) YIELD();
        control = CLIENT_ACK;
    }
    inline void send_control(enum shmstatus st) {
        assert(st != CLIENT_ACK); // For this we should use ack_control instead
        control = st;
        // Wait for the ACK
        while (unlikely(control != CLIENT_ACK)) YIELD();
    }
    inline void wait_start() {
        sig_atomic_t control_;
        while ((control_ = control) != SERVER_STARTED) syscall(SYS_futex, &control, FUTEX_WAIT, control_, NULL, NULL, 0);
        control = CLIENT_ACK;
        syscall(SYS_futex, &control, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
    inline void tell_start() {
        sig_atomic_t control_;
        control = SERVER_STARTED;
        syscall(SYS_futex, &control, FUTEX_WAKE, 1, NULL, NULL, 0);
        // Wait for the ACK




typedef SHMQ<SimData> SimDataq;
typedef SHMQ<InstEvent> InstEventq;

SimDataq * server_init2();
SimDataq * client_init2();
void server_fini2(SimDataq *q);
void client_fini2(SimDataq *q);

//TODO: Fix the case where the client is the one doing the finalization

//TODO: access queues should be created dynamically and passed through the event queue
SimDataq * get_q2(int &shmid) {
    SimDataq * q;
    if ((shmid = shmget(2684, sizeof(SimDataq), IPC_CREAT | 0666)) < 0) {
        perror("shmget");
        exit(1);
    }
    void *shm;
    if ((shm = shmat(shmid, NULL, 0)) == (void *) -1) {
        perror("shmat");
        exit(1);
    }
    q = static_cast<SimDataq*>(shm);




SimDataq * server_init2() {
    int shmid;
    SimDataq * q = get_q2(shmid);
    new (q) SimDataq(); //We use a placement new so we have the SimDataq in the shared memory
    q->tell_start();
    return q;
}

SimDataq * client_init2() {
    int shmid;
    SimDataq * q = get_q2(shmid);
    q->wait_start();
    // Since we are connected we tell the OS the segment can be deleted
    if (shmctl(shmid, IPC_RMID, NULL) < 0)
        perror("shmctl");
    return q;
}

void server_fini2(SimDataq *q) {
    q->send_control(SERVER_DIED);
}

void client_fini2(SimDataq *q) {
    q->ack_control();
    q->~SimDataq();
}

//TODO: with proper template usage this could get prettier
InstEventq * server_init();
InstEventq * client_init();
void server_fini(InstEventq *q);
void client_fini(InstEventq *q);

InstEventq * get_q(int &shmid) {
    InstEventq * q;
    if ((shmid = shmget(2687, sizeof(InstEventq), IPC_CREAT | 0666)) < 0) {
        perror("shmget");
        exit(1);
    }
    void *shm;
    if ((shm = shmat(shmid, NULL, 0)) == (void *) -1) {
        perror("shmat");
        exit(1);
    }
    q = static_cast<InstEventq*>(shm);





InstEventq * server_init() {
    int shmid;
    InstEventq * q = get_q(shmid);
    new (q) InstEventq(); //We use a placement new so we have the InstEventq in the shared memory
    q->tell_start();
    return q;
}

InstEventq * client_init() {
    int shmid;
    InstEventq * q = get_q(shmid);
    q->wait_start();
    // Since we are connected we tell the OS the segment can be deleted
    if (shmctl(shmid, IPC_RMID, NULL) < 0)
        perror("shmctl");
    return q;
}

void server_fini(InstEventq *q) {
    q->send_control(SERVER_DIED);
}

void client_fini(InstEventq *q) {
    q->ack_control();






pinatrace.cpp
/*BEGIN_LEGAL
 * Intel Open Source License
 *
 * Copyright (c) 2002-2011 Intel Corporation. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met:
 *
 * Redistributions of source code must retain the above copyright notice,
 * this list of conditions and the following disclaimer. Redistributions
 * in binary form must reproduce the above copyright notice, this list of
 * conditions and the following disclaimer in the documentation and/or
 * other materials provided with the distribution. Neither the name of
 * the Intel Corporation nor the names of its contributors may be used to
 * endorse or promote products derived from this software without
 * specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE INTEL OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 * END_LEGAL */
/* ===================================================================== */
/*
 * @ORIGINAL_AUTHOR: Robert Cohn
 */

/* ===================================================================== */
/*! @file
 * This file contains an ISA-portable PIN tool for tracing memory accesses.
 */

#include "pin.H"






// Use when calling C and C++ library functions
PIN_LOCK c_lock;

// FILE * TraceFile;
FILE * StatsFile;

// KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool",
//     "o", "pinatrace.out", "specify trace file name");
KNOB_COMMENT fcomment("pintool:trace", "Options for the tracing behaviour");
#ifdef USE_STATES
KNOB<UINT64> Knobf(KNOB_MODE_APPEND, "pintool:trace",
    "f", "", "Number of instructions to fast forward. Must be used as many times as -w and -s. If not set the simulator will trace the whole program.");
KNOB<UINT64> Knobw(KNOB_MODE_APPEND, "pintool:trace",
    "w", "", "Number of instructions to use for structure warming. Must be used as many times as -f and -s. If not set the simulator will trace the whole program.");
KNOB<UINT64> Knobs(KNOB_MODE_APPEND, "pintool:trace",
    "s", "", "Number of instructions to use for simulation. Must be used as many times as -f and -w. If not set the simulator will trace the whole program.");
#endif
KNOB_COMMENT mcomment("pintool:trace", "Options for map tracing");
KNOB<BOOL> KnobSyscallMap(KNOB_MODE_WRITEONCE, "pintool:trace",
    "syscallmap", "1", "Check mappings after returning from a syscall");
KNOB<BOOL> KnobCtxChangeMap(KNOB_MODE_WRITEONCE, "pintool:trace",
    "ctxchangemap", "0", "Check mappings after a context change");
#ifdef USE_DATA
KNOB<BOOL> KnobValues(KNOB_MODE_WRITEONCE, "pintool:trace",
    "values", "1", "Output memory values reads and written");
#endif

#ifdef MULTITHREADED
PIN_LOCK h_lock;
static TLS_KEY wMemAccess;
#define GetQLock(tid) GetLock(&(h_lock), tid+1)
#define ReleaseQLock() ReleaseLock(&(h_lock))
#else
#define GetQLock(tid)
#define ReleaseQLock()
#endif

static SimDataq *q;
static InstEventq *iq;

static FILE *mout;

void parsemaps(void) {
    FILE *f;
    static set<range> prev;
    set<range> s, rem, add;
    range r;
    //TODO: we'll need to have a lock here to ensure threads don't collide
    f = fopen("/proc/self/maps", "r");
    while (fscanf(f, "%lx-%lx", &(r.b), &(r.e)) == 2) {
        s.insert(r);
        // Use this when wanting to copy the contents for verification
        // printf("%lx-%lx", r.b, r.e);
        // int c;
        // while ((c = fgetc(f)) != '\n') putchar(c);
        // putchar('\n');
        while (fgetc(f) != '\n');
    };
    //We don't need the file any more so release the FD
    fclose(f);

    // Notify removals first to prevent intersections
    set_difference(prev.begin(), prev.end(), s.begin(), s.end(),
        inserter(rem, rem.begin()));
    for (set<range>::iterator it = rem.begin(); it != rem.end(); it++) {
        iq->gethead().SetInstEvent(REMOVEMAPPING, *it);
        iq->wait_push();
        // printf("- %lx-%lx\n", it->b, it->e);
    }

    //Now notify additions
    set_difference(s.begin(), s.end(), prev.begin(), prev.end(),
        inserter(add, add.begin()));
    for (set<range>::iterator it = add.begin(); it != add.end(); it++) {
        iq->gethead().SetInstEvent(ADDMAPPING, *it);
        iq->wait_push();
        // printf("+ %lx-%lx\n", it->b, it->e);
    }

    //Ok finally we'll make the old set this one
    prev = s;
}

void parsemaps1(THREADID _1, CONTEXT *_2, SYSCALL_STANDARD _3, VOID *_4) {
    fputs("Syscall!\n", mout);
    parsemaps();
}

void parsemaps2(THREADID _1, CONTEXT_CHANGE_REASON _2, const CONTEXT *_3, CONTEXT *_4, INT32 _5, VOID *_6) {
    fputs("Context change!\n", mout);
    parsemaps();
}

void parsemaps3(void) {
    fputs("Initial!\n", mout);




static char AccessType2Char(AccessType at) {
    switch (at) {
    case ACCEXEC: return 'X';
    case ACCREAD: return 'R';
    case ACCWRITE: return 'W';
    case ACCPREFETCH: return 'P';




#ifdef USE_DATA
static VOID EmitMem(FILE *f, VOID *data, UINT32 size)
{
    switch (size)
    {
    case 0:
        break;

    /* TODO: Here we do some assumptions about sizes, a proper program should fill them properly */
    case 1:
        fprintf(f, "0x%02hhx", (*static_cast<UINT8*>(data)));
        break;

    case 2:
        fprintf(f, "0x%04hx", (*static_cast<UINT16*>(data)));
        break;

    case 4:
        fprintf(f, "0x%08x", (*static_cast<UINT32*>(data)));
        break;

    case 8:




        if (size > 0) {
            fprintf(f, "0x%02hhx", (*static_cast<UINT8*>(data)));
            for (UINT32 i = 1; i < size; i++)
            {








void MemAccess::show(FILE *f) {
    fprintf(f,
#ifdef MULTITHREADED
        "%10u"
#endif
        " %c %#016lx% 3d ",
#ifdef MULTITHREADED
        (UINT32) tid,
#endif
        AccessType2Char(type), (unsigned long int) ea, size);
#ifdef USE_DATA
    if (KnobValues) {
        if (type != ACCPREFETCH) {
            EmitMem(f, data, size);
            if (type == ACCWRITE) {
                fputs(" -> ", f);




#endif
    fputs("\n", f);
}

static INT32 Usage()
{
    fputs(
        "This tool produces a memory address trace.\n"
        "For each memory access (execute/read/write/prefetch) the ea is recorded\n"
        "\n", stderr);

    fputs(KNOB_BASE::StringKnobSummary().c_str(), stderr);





static VOID PIN_FAST_ANALYSIS_CALL RecordMemExec(VOID *ip, UINT32 size
#ifdef MULTITHREADED
    , THREADID tid
#endif
    )
{
    // TODO: use a per-thread lockless queue
    GetQLock(tid + 1);
    q->gethead().getMa().MemAccessSet(ACCEXEC, ip, size
#ifdef MULTITHREADED
        , tid
#endif
        );
    iq->wait_not_empty();
    q->wait_push();
    ReleaseQLock();
}

static VOID PIN_FAST_ANALYSIS_CALL RecordMemRead(VOID *ea, UINT32 size
#ifdef MULTITHREADED
    , THREADID tid
#endif
    )
{
    // TODO: use a per-thread lockless queue
    GetQLock(tid + 1);
    q->gethead().getMa().MemAccessSet(ACCREAD, ea, size
#ifdef MULTITHREADED
        , tid
#endif
        );
    iq->wait_not_empty();
    q->wait_push();
    ReleaseQLock();
}

static VOID PIN_FAST_ANALYSIS_CALL RecordMemPrefetch(VOID *ea, UINT32 size
#ifdef MULTITHREADED
    , THREADID tid
#endif
    )
{
    // TODO: use a per-thread lockless queue
    GetQLock(tid + 1);
    q->gethead().getMa().MemAccessSet(ACCPREFETCH, ea, size
#ifdef MULTITHREADED
        , tid
#endif
        );
    iq->wait_not_empty();
    q->wait_push();




static VOID PIN_FAST_ANALYSIS_CALL RecordMemPreWrite(VOID *ea, UINT32 size
#ifdef MULTITHREADED
    , THREADID tid
#endif
    )
{
#ifdef USE_DATA
#ifdef MULTITHREADED
    MemAccess *ma = static_cast<MemAccess *>(PIN_GetThreadData(wMemAccess, tid));
    ma->MemAccessSet(ACCWRITE, ea, size, tid);
#else
    q->gethead().getMa().MemAccessSet(ACCWRITE, ea, size);
#endif
#else
    GetQLock(tid + 1);
    q->gethead().getMa().MemAccessSet(ACCWRITE, ea, size
#ifdef MULTITHREADED
        , tid
#endif
        );
    iq->wait_not_empty();
    q->wait_push();
    ReleaseQLock();





#ifdef USE_DATA
static VOID PIN_FAST_ANALYSIS_CALL RecordMemWrite(
#ifdef MULTITHREADED
    THREADID tid
#endif
    )
{
#ifdef MULTITHREADED
    MemAccess *ma = static_cast<MemAccess *>(PIN_GetThreadData(wMemAccess, tid));
#ifdef USE_DATA
    ma->setWdata();
#endif
#else
#ifdef USE_DATA
    q->gethead().getMa().setWdata();
#endif
#endif

    // TODO: use a per-thread lockless queue
    GetQLock(tid + 1);
#ifdef MULTITHREADED
    q->gethead().getMa().MemAccessSet(*ma);
#endif
    iq->wait_not_empty();
    q->wait_push();




#ifdef USE_STATES
static UINT64 inscount = -1;
static enum state { FASTFORWARD = 0, WARMING = 1, SIMULATION = 2 } state = SIMULATION;
static UINT32 fIndex = 0;
static UINT32 wIndex = 0;
static UINT32 sIndex = 0;

inline VOID nextState() {
    do {
        switch (state) {
        case FASTFORWARD:
            inscount = Knobw.Value(wIndex);
            wIndex++;
            state = WARMING;
            break;
        case WARMING:
            inscount = Knobs.Value(sIndex);
            sIndex++;
            state = SIMULATION;
            break;
        case SIMULATION:
            // If we are done simulating, stop
            if (Knobf.NumberOfValues() == fIndex)
                PIN_ExitApplication(0);
            inscount = Knobf.Value(fIndex);
            fIndex++;
            state = FASTFORWARD;
            break;
        default:
            fputs("Unknown state\n", stderr);
            PIN_ExitApplication(1);
            break;
        }
    } while (inscount == 0);
}

static ADDRINT PIN_FAST_ANALYSIS_CALL StateCounter(VOID *ip
#ifdef MULTITHREADED
    , THREADID tid




    dprintf("ins %p!\n", ip);
    if (inscount == 0)
        dputs("switch!\n");
    return inscount == 0;
}

static ADDRINT PIN_FAST_ANALYSIS_CALL Instrument(
#ifdef MULTITHREADED








static VOID PIN_FAST_ANALYSIS_CALL CounterDone(
#ifdef MULTITHREADED
    THREADID tid
#endif
    )
{
    enum state orig = state;
    if (orig == SIMULATION) {
        q->send_control(SERVER_SIM_END);
    }
    nextState();
    if (state == SIMULATION) {
        q->send_control(SERVER_SIM_START);
    }
    dprintf("state:%d->%d\n", orig, state);
    // We only want to change the instrumentation when switching from any state to the fastforward state or vice versa
    // if ((orig == FASTFORWARD && state != FASTFORWARD) || (orig != FASTFORWARD && state == FASTFORWARD))
    // TODO: this isn't working as expected :( (yet)




// Instrumentation
static VOID Instruction(INS ins, VOID *v)
{
    // Also, using the If-Then callback system makes Pin more likely to inline the counter code
#ifdef USE_STATES
    INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR) StateCounter, IARG_FAST_ANALYSIS_CALL,
        IARG_INST_PTR,
#ifdef MULTITHREADED
        IARG_THREAD_ID,
#endif
        IARG_END);
    INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR) CounterDone, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
        IARG_THREAD_ID,
#endif
        IARG_END);
#endif
    // if (state != FASTFORWARD) {
#ifdef USE_STATES
    INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR) Instrument, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
        IARG_THREAD_ID,
#endif
        IARG_END);
    INS_InsertThenCall
#else
    INS_InsertCall
#endif
        (ins, IPOINT_BEFORE, (AFUNPTR) RecordMemExec, IARG_FAST_ANALYSIS_CALL,
        IARG_INST_PTR,
        IARG_UINT32, INS_Size(ins),
#ifdef MULTITHREADED
        IARG_THREAD_ID,
#endif
        IARG_END);

    if (INS_IsMemoryRead(ins))
    {
#ifdef USE_STATES
        INS_InsertIfPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR) Instrument, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
            IARG_THREAD_ID,
#endif
            IARG_END);
        INS_InsertThenPredicatedCall
#else
        INS_InsertPredicatedCall
#endif
            (ins, IPOINT_BEFORE,
            (AFUNPTR) (INS_IsPrefetch(ins) ? RecordMemPrefetch : RecordMemRead), IARG_FAST_ANALYSIS_CALL,
            IARG_MEMORYREAD_EA,
            IARG_MEMORYREAD_SIZE,
#ifdef MULTITHREADED
            IARG_THREAD_ID,




    if (INS_HasMemoryRead2(ins))
    {
#ifdef USE_STATES
        INS_InsertIfPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR) Instrument, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
            IARG_THREAD_ID,
#endif
            IARG_END);
        INS_InsertThenPredicatedCall
#else
        INS_InsertPredicatedCall
#endif
            (ins, IPOINT_BEFORE, (AFUNPTR) (INS_IsPrefetch(ins) ? RecordMemPrefetch : RecordMemRead), IARG_FAST_ANALYSIS_CALL,
            IARG_MEMORYREAD2_EA,
            IARG_MEMORYREAD_SIZE,
#ifdef MULTITHREADED
            IARG_THREAD_ID,




    // Instruments stores using a predicated call, i.e.
    // the call happens iff the store will actually be executed
    if (INS_IsMemoryWrite(ins))
    {
#ifdef USE_STATES
        INS_InsertIfPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR) Instrument, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
            IARG_THREAD_ID,
#endif
            IARG_END);
        INS_InsertThenPredicatedCall
#else
        INS_InsertPredicatedCall
#endif
            (ins, IPOINT_BEFORE, (AFUNPTR) RecordMemPreWrite, IARG_FAST_ANALYSIS_CALL,
            IARG_MEMORYWRITE_EA,
            IARG_MEMORYWRITE_SIZE,
#ifdef MULTITHREADED
            IARG_THREAD_ID,
#endif
            IARG_END);
#ifdef USE_DATA
#ifdef USE_STATES
        INS_InsertIfPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR) Instrument, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
            IARG_THREAD_ID,
#endif
            IARG_END);
        INS_InsertThenPredicatedCall
#else
        INS_InsertPredicatedCall
#endif
            (ins,
            (!INS_HasFallThrough(ins) ? IPOINT_TAKEN_BRANCH : IPOINT_AFTER),
            (AFUNPTR) RecordMemWrite, IARG_FAST_ANALYSIS_CALL,
#ifdef MULTITHREADED
            IARG_THREAD_ID,







// Multithread stuff:


#ifdef MULTITHREADED
static VOID ThreadStart(THREADID tid, CONTEXT *ctxt, INT32 flags, VOID *v)
{
    MemAccess *ma = new MemAccess();




static VOID ThreadFini(THREADID tid, const CONTEXT *ctxt, INT32 code, VOID *v)
{





// TODO: maybe integrate this into the queue class and the socket per thread protocol
// static bool ending = false;
// static THREADID processor;
// static PIN_THREAD_UID processoruid;
//
// static VOID ProcessQueue(VOID *nothing) {
//     THREADID tid = PIN_ThreadId();
//     while (!ending || !q->empty()) {
//         GetLock(&c_lock, tid);
//         while (!q->empty()) {
//             q->gettail();
//             q->pop();
//         }
//         ReleaseLock(&c_lock);
//         // Let others fill the queue




static VOID Fini(INT32 code, VOID *v)
{
    // ending = true;
    // PIN_WaitForThreadTermination(processoruid, PIN_INFINITE_TIMEOUT, NULL);
    if (KnobSyscallMap || KnobCtxChangeMap)
        fclose(mout);
    server_fini2(q);
    server_fini(iq);
}

int main(int argc, char *argv[])
{
    if (PIN_Init(argc, argv))
    {
        return Usage();
    }

#ifdef USE_STATES
    if (!(Knobf.NumberOfValues() == Knobw.NumberOfValues() && Knobf.NumberOfValues() == Knobs.NumberOfValues()))
    {
        fputs("The number of occurrences of -f -h and -s must be the same.", stderr);




    iq = server_init();
    q = server_init2();
    q->gethead().setType(STARTTH); // TODO: move to the thread start callbacks
    q->wait_push();

#ifdef USE_STATES
    if (Knobf.NumberOfValues() >= 1)
        nextState();
    // This one is done due to the way instrumentation works
    inscount++;
    // Send the simulation start command if necessary
    if (state == SIMULATION)
#endif
        q->send_control(SERVER_SIM_START);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniUnlockedFunction(Fini, 0);

    // Open the output file
    if (KnobSyscallMap || KnobCtxChangeMap)
        mout = fopen("maptrace.txt", "w");


    // Monitor syscalls and the like for mapping changes
    if (KnobSyscallMap)
        PIN_AddSyscallExitFunction(parsemaps1, NULL);
    // Also monitor after context changes, since if we are ptraced the mappings may have changed
    if (KnobCtxChangeMap)
        PIN_AddContextChangeFunction(parsemaps2, NULL);
    // Although Pin uses code caches, it hides these details from the instrumentation code, so our instruction caches don't break.
    // This means the instruction addresses we get are mapped to the mappings corresponding to the libraries and not to the JIT
    // code, so we don't have to worry about changes to these mappings. But since we still can't discern them from application
    // mappings, we still have to reserve space for them in the simulator space. This also means we'll almost always see some
    // movement in the map space until Pin provides an API to discern pin/tool mappings from application ones.

    // Thread callbacks
    InitLock(&c_lock);

#ifdef MULTITHREADED
    InitLock(&h_lock);
    wMemAccess = PIN_CreateThreadDataKey(0);
    PIN_AddThreadStartFunction(ThreadStart, 0);
    PIN_AddThreadFiniFunction(ThreadFini, 0);
#endif

    // Start queue processor thread
    // processor = PIN_SpawnInternalThread(ProcessQueue, NULL, 0, &processoruid);
    // if (processor == INVALID_THREADID) return 1;
    // Initial map loading
    if (KnobSyscallMap || KnobCtxChangeMap)
        parsemaps3();
    PIN_StartProgram();

    return 0;
}
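
The phase switching done by nextState() in the listing above can be tried outside Pin. The following is a minimal sketch, not the tool's code: PhasePlan and PhaseCycler are hypothetical names, the three vectors stand in for the Knobf, Knobw and Knobs value lists, and a Done phase replaces the call to PIN_ExitApplication. As in the listing, the cycle starts in the simulation phase and phases with a budget of zero instructions are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class Phase { FastForward, Warming, Simulation, Done };

// Hypothetical stand-in for the three knob lists (instruction budgets per phase).
struct PhasePlan {
    std::vector<std::uint64_t> fastForward;
    std::vector<std::uint64_t> warming;
    std::vector<std::uint64_t> simulation;
};

class PhaseCycler {
  public:
    explicit PhaseCycler(PhasePlan plan) : plan_(std::move(plan)) {
        // Same sanity check as main(): one value of each kind per round.
        assert(plan_.fastForward.size() == plan_.warming.size() &&
               plan_.fastForward.size() == plan_.simulation.size());
    }

    // Move to the next phase that has a nonzero budget.
    // Returns false once every round has been consumed.
    bool next() {
        do {
            switch (phase_) {
            case Phase::FastForward:
                budget_ = plan_.warming[round_];
                phase_ = Phase::Warming;
                break;
            case Phase::Warming:
                budget_ = plan_.simulation[round_];
                ++round_;
                phase_ = Phase::Simulation;
                break;
            case Phase::Simulation:
                if (round_ == plan_.fastForward.size()) {
                    phase_ = Phase::Done;  // the tool exits the application here
                    budget_ = 0;
                    return false;
                }
                budget_ = plan_.fastForward[round_];
                phase_ = Phase::FastForward;
                break;
            case Phase::Done:
                return false;
            }
        } while (budget_ == 0);
        return true;
    }

    Phase phase() const { return phase_; }
    std::uint64_t budget() const { return budget_; }

  private:
    PhasePlan plan_;
    Phase phase_ = Phase::Simulation;  // initial state, as in the listing
    std::size_t round_ = 0;
    std::uint64_t budget_ = 0;
};
```

A driver would call next() each time the instruction budget reaches zero, and would emit the start and end control messages whenever the phase enters or leaves Phase::Simulation.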
consumer.cpp
#include <pinatrace.h>
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
    InstEventq *iq;
    iq = client_init();
    SimDataq *q;
    q = client_init2();
    uint64_t nins = 0;
    uint64_t nins2 = 0;
    uint64_t nrea = 0;
    uint64_t nrea2 = 0;
    uint64_t nwri = 0;
    uint64_t nwri2 = 0;
    uint64_t npre = 0;
    uint64_t npre2 = 0;
    uint64_t sins = 0;
    uint64_t sins2 = 0;
    uint64_t srea = 0;
    uint64_t srea2 = 0;
    uint64_t swri = 0;
    uint64_t swri2 = 0;
    uint64_t spre = 0;
    uint64_t spre2 = 0;
    uint64_t mark = 0;
    uint64_t mark2 = 0;
    bool simulating = false;
    while (q->receive_control() != SERVER_DIED) {
        while (!q->empty()) {
            assert(q->gettail().getType() != INVALDATA);
            if (q->gettail().getType() == ACCMEM) {
                const MemAccess &ma = q->gettail().getCMa();
                mark ^= (uint64_t) ma.getEA();
                mark2 ^= (uint64_t) ma.getEA();




                    sins += ma.getSize();





                    srea += ma.getSize();





                    swri += ma.getSize();





                    spre += ma.getSize();
                    spre2 += ma.getSize();
                    break;
                default:
                    puts("Unexpected access type!");
                }
            }
            q->pop();
        }
        // Have we just emptied the buffer or has an event happened?
        while (!iq->empty()) {
            if (iq->gettail().getType() == REMOVEMAPPING) {
                range r = iq->gettail().getRange();
                printf("- %lx-%lx\n", r.b, r.e);
            } else if (iq->gettail().getType() == ADDMAPPING) {
                range r = iq->gettail().getRange();
                printf("+ %lx-%lx\n", r.b, r.e);
            }
            iq->pop();
        }
        switch (q->receive_control()) {
        case SERVER_DIED:




            simulating = false;
            puts("Simulation statistics:");
            puts("Number of accesses:");
            printf("  instructions: %" PRIu64 "\n", nins);
            printf("  reads       : %" PRIu64 "\n", nrea);
            printf("  writes      : %" PRIu64 "\n", nwri);
            printf("  prefetches  : %" PRIu64 "\n", npre);
            printf("  total       : %" PRIu64 "\n", nins + nrea + nwri + npre);
            puts("Total accessed memory by type (bytes):");
            printf("  instructions: %" PRIu64 "\n", sins);
            printf("  reads       : %" PRIu64 "\n", srea);
            printf("  writes      : %" PRIu64 "\n", swri);
            printf("  prefetches  : %" PRIu64 "\n", spre);
            printf("  total       : %" PRIu64 "\n", sins + srea + swri + spre);
            printf("Execution mark: %" PRIx64 "\n", mark);
            q->ack_control();
            break;
        case SERVER_SIM_START:
            nins = 0;
            nrea = 0;
            nwri = 0;
            npre = 0;
            sins = 0;
            srea = 0;
            swri = 0;
            spre = 0;
            mark = 0;
            simulating = true;
            q->ack_control();
            break;
        }
        // Wait for buffer to refill
        while (q->wait_empty_cond() && iq->wait_empty_cond()) YIELD();
    }
    puts("Total statistics:");
    puts("Number of accesses:");
    printf("  instructions: %" PRIu64 "\n", nins2);
    printf("  reads       : %" PRIu64 "\n", nrea2);
    printf("  writes      : %" PRIu64 "\n", nwri2);
    printf("  prefetches  : %" PRIu64 "\n", npre2);
    printf("  total       : %" PRIu64 "\n", nins2 + nrea2 + nwri2 + npre2);
    puts("Total accessed memory by type (bytes):");
    printf("  instructions: %" PRIu64 "\n", sins2);
    printf("  reads       : %" PRIu64 "\n", srea2);
    printf("  writes      : %" PRIu64 "\n", swri2);
    printf("  prefetches  : %" PRIu64 "\n", spre2);
    printf("  total       : %" PRIu64 "\n", sins2 + srea2 + swri2 + spre2);
    printf("Execution mark: %" PRIx64 "\n", mark2);
    client_fini2(q);
    client_fini(iq);
    return 0;
}
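
The consumer above drains queues that the Pintool fills in place (gethead() followed by wait_push()) and reads them through gettail() and pop(). The queue implementation itself is not part of this listing, so the following is only a sketch of how a lockless single-producer, single-consumer FIFO with that calling pattern could look. SpscQueue and try_push are hypothetical names, std::atomic is assumed to be usable from both sides, and one slot is deliberately kept free so that the producer can always fill gethead() without touching an unread element.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer (capacity N - 1).
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

  public:
    // Consumer side: true when there is nothing to read.
    bool empty() const {
        return head_.load(std::memory_order_acquire) ==
               tail_.load(std::memory_order_acquire);
    }

    // Producer side: slot to fill in place before publishing it.
    T &gethead() {
        return slots_[head_.load(std::memory_order_relaxed) & (N - 1)];
    }

    // Producer side: publish the filled slot; returns false while the queue is full.
    bool try_push() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N - 1)
            return false;  // one slot always stays free for gethead()
        head_.store(h + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: oldest unread element.
    const T &gettail() const {
        assert(!empty());
        return slots_[tail_.load(std::memory_order_relaxed) & (N - 1)];
    }

    // Consumer side: release the oldest element.
    void pop() {
        assert(!empty());
        tail_.store(tail_.load(std::memory_order_relaxed) + 1,
                    std::memory_order_release);
    }

  private:
    T slots_[N];
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

A blocking push in the style of wait_push() would simply spin on try_push(), yielding between attempts. Because each index has a single writer, no lock or compare-and-swap is needed.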
mem_trace_reader.hh
/*
 * Copyright (c) 2004-2005 The Regents of The University of Michigan
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *










#include "mem/packet.hh"
#include "mem/request.hh"
#include "params/MemTraceReader.hh"
#include "sim/sim_object.hh"

/**
 * This class contains the info of the trace request and some useful methods to
 * split it
 */
class MemTraceRequest : public FastAlloc {
    Addr _paddr;
    unsigned _size;
    Request::Flags _flags;
    Tick _time;
    int _asid;
    Addr _vaddr;
    int _contextId;
    int _threadId;
    Addr _pc;
    MemCmd _cmd;
  public:
    MemTraceRequest() {}
    MemTraceRequest(Addr paddr, int size, Request::Flags flags,
                    MemCmd::Command cmd)
        : _paddr(paddr), _size(size), _flags(flags), _time(curTick()), _cmd(cmd)
    { }

    MemTraceRequest(Addr paddr, int size, Request::Flags flags, Tick time,
                    MemCmd::Command cmd)
        : _paddr(paddr), _size(size), _flags(flags), _time(time), _cmd(cmd)
    { }

    ~MemTraceRequest() {} // for FastAlloc

    /**
     * Are we scheduled to run already
     */
    inline bool mustRun() {
        return _time <= curTick();
    }

    inline Tick time() {




    inline bool isInstFetch() {
        return _flags.isSet(Request::INST_FETCH);
    }

    inline bool lastPacketSent() {




     * Get the next packet with proper bounds for this block size
     * Will return NULL when done
     */
    PacketPtr getNextPkt(int bsize, Packet::NodeID dest, MasterID mid) {
        if (lastPacketSent()) {
            return NULL;
        }
        // Base address of the block
        Addr base = (_paddr & ~(bsize - 1));
        // Current block max size
        int msize = bsize - (_paddr - base);
        // Minimum
        if (msize > _size) msize = _size;
        // Generate the request and the packet
        RequestPtr req = new Request(_paddr, msize, _flags, mid);
        PacketPtr pkt = new Packet(req, _cmd, dest);
        pkt->dataDynamicArray(new char[msize]);
        // Calculate the new base address and size
        _paddr += msize;
        _size -= msize;








 * Pure virtual base class for memory trace readers.
 */
class MemTraceReader : public SimObject
{
  public:
    enum reason {EOT, STAT_RESET, STAT_DUMP};
    /** Construct this MemoryTrace reader. */
    MemTraceReader(const MemTraceReaderParams *p) : SimObject(p) {}

    // TODO: redo doc; pkt should contain time, request, command and data.
    /**
     * Read the next request from the trace. Returns the request in the
     * provided RequestPtr and the cycle of the request in the return value.
     * @param req Return the next request from the trace.
     * @return The cycle of the request, 0 if none in trace.
     */






pin_reader.hh
/*
 * Copyright (c) 2004-2005 The Regents of The University of Michigan
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *




 * @file






#include "cpu/trace/reader/mem_trace_reader.hh"
#include "cpu/trace/reader/pin_atrace.hh"
#include "params/PinReader.hh"

/**
 * A memory trace reader for a Pin memory trace.
 */
class PinReader : public MemTraceReader
{
    friend class DeleteQueuesCallback;
    /** The trace. */
    SimDataq *q;
    /** Information on mapping changes */
    InstEventq *iq;
    bool simulating; // Whether we are in simulation state or not
    bool drop; // Should we drop the next element (has it been processed)

  protected:
    void removeQueues();
  public:
    /**
     * Construct a Pin memory trace reader.
     */
    PinReader(const PinReaderParams *p);

    ~PinReader();


    // TODO: redo doc; pkt should contain time, request, command and data.
    /**
     * Read the next request from the trace. Returns the request in the
     * provided RequestPtr and the cycle of the request in the return value.
     * @param req Return the next request from the trace.
     * @return The cycle of the request, 0 if none in trace.
     */
    virtual MemTraceRequestPtr getNextRequest(MemTraceReader::reason &reason);
};

#endif // __PIN_READER_HH__
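
The requests returned by getNextRequest() are later cut into packets that never cross a block boundary, which is what getNextPkt() in mem_trace_reader.hh does one packet at a time. The arithmetic can be sketched independently of gem5 as follows. Chunk and splitByBlock are hypothetical names, and the block size is assumed to be a power of two, as the mask in getNextPkt() implies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: one piece of an access that fits inside a block.
struct Chunk {
    std::uint64_t addr;
    std::uint32_t size;
};

// Split the access [addr, addr + size) at multiples of blockSize.
std::vector<Chunk> splitByBlock(std::uint64_t addr, std::uint32_t size,
                                std::uint32_t blockSize) {
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    std::vector<Chunk> chunks;
    while (size > 0) {
        // Base address of the block that contains addr
        const std::uint64_t base = addr & ~static_cast<std::uint64_t>(blockSize - 1);
        // Bytes left in that block, starting at addr
        const std::uint32_t room = static_cast<std::uint32_t>(blockSize - (addr - base));
        const std::uint32_t n = room < size ? room : size;
        chunks.push_back({addr, n});
        // Advance to the start of the next piece
        addr += n;
        size -= n;
    }
    return chunks;
}
```

For example, a 16-byte access at address 60 with 64-byte blocks yields a 4-byte piece at 60 followed by a 12-byte piece at 64.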
pin_reader.cc
/*
 * Copyright (c) 2004-2005 The Regents of The University of Michigan
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *




 * @file
 * Definition of a memory trace reader for a Pin memory trace.
 */

#include "base/callback.hh"
#include "cpu/trace/reader/pin_reader.hh"
#include "sim/sim_exit.hh"
#include <set>

// TODO: look into why the user interrupt received event doesn't call the Callback

/** Callback to clean the queues */
class DeleteQueuesCallback : public Callback {
  public:
    DeleteQueuesCallback();
    void process();
};
static DeleteQueuesCallback dqc;


/** List of PinReader elements for the queue deleting callback **/
static std::set<PinReader *> readers;

DeleteQueuesCallback::DeleteQueuesCallback() {
    registerExitCallback(this);
}

void DeleteQueuesCallback::process() {
    for (std::set<PinReader *>::iterator it = readers.begin(); it != readers.end(); it++) {




// TODO: Send client finalization events if necessary

void PinReader::removeQueues() {
    if (q) {
        client_fini2(q);
        q = NULL;
    }
    if (iq) {
        client_fini(iq);
        iq = NULL;
    }
    warn("Done");
}

PinReader::PinReader(const PinReaderParams *p) : MemTraceReader(p), simulating(false) {
    iq = client_init();
    q = client_init2();
    // Wait for the initial event
    while (q->empty()) { YIELD(); }
    drop = true;
    readers.insert(this);
}

PinReader::~PinReader() {
    removeQueues();




MemTraceRequestPtr PinReader::getNextRequest(MemTraceReader::reason &reason)
{
    MemCmd::Command cmd;
    MemTraceRequestPtr req;
    Request::Flags flags;
    if (drop) {
        assert(!q->empty());
        q->pop(); // Drop previous data
    }
    while (true) {
        // Wait for new traces; if the server died just send NULL
        while (q->wait_empty_cond() && iq->wait_empty_cond()) YIELD();
        // TODO: this still needs some cleaning; the CPU must end any accesses before the reset, same before the dump
        switch (q->receive_control()) {
        case SERVER_DIED:
            // The last dump should be made by m5 itself
            reason = MemTraceReader::EOT;
            drop = false;
            return NULL;
        case SERVER_SIM_END:
            simulating = false;
            q->ack_control();
            reason = MemTraceReader::STAT_DUMP;
            drop = false;
            return NULL;
        case SERVER_SIM_START:
            simulating = true;
            q->ack_control();
            reason = MemTraceReader::STAT_RESET;





                warn("State not supported!");
        }
        if (!q->empty()) {
            switch (q->gettail().getType()) {
                case ACCMEM: {
                    const MemAccess &ma = q->gettail().getCMa();
                    switch (ma.getType()) {
                        case ACCEXEC:
                            flags.set(Request::INST_FETCH);
                            cmd = MemCmd::ReadReq;
                            break;
                        case ACCREAD:
                            cmd = MemCmd::ReadReq;
                            break;
                        case ACCWRITE:
                            cmd = MemCmd::WriteReq;
                            break;
                        case ACCPREFETCH:
                            flags.set(Request::PREFETCH);
                            cmd = MemCmd::ReadReq;
                            break;
                        default:
                            panic("Access type unknown");
                    }
                    Addr ea = (Addr)ma.getEA();
                    ea &= (Addr)134217727; // 128Mb -1 :P
                    //By default time is set to 0
                    req = new MemTraceRequest((Addr)ea, (int)ma.getSize(), flags, cmd);
                    drop = true;





                    panic("Unexpected data type");
            }
        }
        while (!iq->empty()) {
            // Process mapping changes
            if (iq->gettail().getType() == REMOVEMAPPING) {
                // range r=iq->gettail().getRange();
                //TODO: remove mapping from TLB
            } else if (iq->gettail().getType() == ADDMAPPING) {
                // range r=iq->gettail().getRange();
                //TODO: add mapping to TLB
            }






PinReaderParams::create()
{
    return new PinReader(this);
}
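
The reader above polls its queues through calls such as `empty()`, `gettail()` and `pop()`. The following is a minimal sketch of a single-producer, single-consumer ring queue that offers the same three calls, assuming the "always keep one slot open" circular-buffer scheme. The class name `SpscRing`, the `push()` and `full()` members and the choice of `std::atomic` indices are assumptions made for illustration; this is not the queue implementation used by the project, and the sketch does not cover placing the buffer in shared memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring queue (illustration only).
// One slot is always left unused, so head == tail can only mean "empty".
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least one usable slot");

  public:
    // Consumer side: true when there is nothing to read.
    bool empty() const {
        return tail_.load(std::memory_order_relaxed) ==
               head_.load(std::memory_order_acquire);
    }

    // Producer side: true when no free slot is left.
    bool full() const {
        return next(head_.load(std::memory_order_relaxed)) ==
               tail_.load(std::memory_order_acquire);
    }

    // Producer side: append an element; returns false if the queue is full.
    bool push(const T &value) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t n = next(h);
        if (n == tail_.load(std::memory_order_acquire)) return false;
        slots_[h] = value;
        head_.store(n, std::memory_order_release);  // publish the element
        return true;
    }

    // Consumer side: oldest element; only valid while !empty().
    const T &gettail() const {
        return slots_[tail_.load(std::memory_order_relaxed)];
    }

    // Consumer side: discard the oldest element; only valid while !empty().
    void pop() {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        tail_.store(next(t), std::memory_order_release);  // free the slot
    }

  private:
    static std::size_t next(std::size_t i) { return (i + 1) % N; }

    T slots_[N];
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```

With `N` slots the queue holds at most `N - 1` elements. Because each index is written by only one side, neither side needs a lock as long as there is exactly one producer and one consumer.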
trace_cpu.hh
/*
 * Copyright (c) 2004-2005 The Regents of The University of Michigan
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *




 * @file
 * Declaration of a memory trace CPU object. Uses a memory trace to drive the






#include <string>

#include "mem/mem_object.hh"
#include "mem/packet.hh" // for RequestPtr
#include "mem/port.hh"
#include "params/TraceCPU.hh"
#include "sim/eventq.hh" // for Event
#include "sim/sim_object.hh"

// Forward declaration.
class MemTraceReader;

enum CMD { Invalid, Read, Write, Writeback };

/**
 * A cpu object for running memory traces through a memory hierarchy.
 */
class TraceCPU : public MemObject
{
  private:
    class MemPort : public Port
    {
        TraceCPU *tcpu;
        PacketPtr retryPkt;
        bool accessRetry;
      public:
        MemPort(const std::string &_name, TraceCPU *_tcpu)
            : Port(_name, _tcpu), tcpu(_tcpu)
        { accessRetry = false; }

        bool locked() {
            return accessRetry;
        }
        void sendPkt(PacketPtr pkt);
      protected:

        virtual bool recvTiming(PacketPtr pkt);

        virtual Tick recvAtomic(PacketPtr pkt);

        virtual void recvFunctional(PacketPtr pkt);

        virtual void recvRangeChange();

        virtual void recvRetry();
    };
    /** Port for instruction trace requests, if any. */
    MasterID _instMasterId;
    MemPort icache;
    /** Port for data trace requests, if any. */
    MasterID _dataMasterId;
    MemPort dcache;

    /** Data reference trace. */
    MemTraceReader *dataTrace;

    /** Number of outstanding requests. */
    int outstandingRequests;

    /** Next packet containing data, time, request, command, etc */
    MemTraceRequestPtr nextRequest;

    /** Reason for the packet to be NULL */
    MemTraceReader::reason reason;

    /** Next request. */
    MemCmd::Command nextCmd;

    /**
     * Event to call the TraceCPU::tick
     */
    class TickEvent : public Event
    {
      private:
        TraceCPU *cpu;

      public:
        TickEvent(TraceCPU *c) : Event(CPU_Tick_Pri), cpu(c) {}
        void process() { cpu->tick(); }
        virtual const char *description() const { return "TraceCPU tick"; }
    };

    TickEvent tickEvent;




     * Construct a TraceCPU object.
     */
    TraceCPU(const TraceCPUParams *p);

    inline Tick ticks(int numCycles) { return numCycles; }

    /**
     * Perform all the accesses for one cycle.
     */
    void tick();

    /**
     * Handle a completed memory request.
     */
    void completeRequest(PacketPtr req);

    virtual Port *getPort(const std::string &if_name, int idx = -1);
};

#endif // __CPU_TRACE_TRACE_CPU_HH__
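
The `MemPort` class declared above keeps a `retryPkt` pointer and an `accessRetry` flag and exposes `locked()`, `sendPkt()` and `recvRetry()`. The sketch below models the retry handshake that such members suggest: a packet refused by the peer is parked, and it is resent when the peer signals a retry. `FakePeer` and `RetryPort` are hypothetical stand-ins written for this illustration, packets are reduced to plain integers, and nothing here uses the gem5 API.

```cpp
#include <cassert>

// Hypothetical stand-in for a memory-side peer that can refuse packets.
struct FakePeer {
    bool busy = false;
    int accepted = 0;

    // Returns false while busy, mimicking a refused timing request.
    bool sendTiming(int /*pkt*/) {
        if (busy) return false;
        ++accepted;
        return true;
    }
};

// Hypothetical port model: at most one refused packet is parked at a time.
class RetryPort {
  public:
    explicit RetryPort(FakePeer &peer) : peer_(peer) {}

    // True while a refused packet is waiting for a retry.
    bool locked() const { return accessRetry_; }

    // Try to send; on refusal remember the packet and lock the port.
    void sendPkt(int pkt) {
        assert(!accessRetry_);  // callers should check locked() first
        if (!peer_.sendTiming(pkt)) {
            retryPkt_ = pkt;
            accessRetry_ = true;
        }
    }

    // Called when the peer can accept packets again.
    void recvRetry() {
        if (accessRetry_ && peer_.sendTiming(retryPkt_)) {
            accessRetry_ = false;
        }
    }

  private:
    FakePeer &peer_;
    int retryPkt_ = 0;
    bool accessRetry_ = false;
};
```

In this model a caller would consult `locked()` before issuing a new packet, so that a parked packet is never overwritten.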
trace_cpu.cc
/*
 * Copyright (c) 2004-2005 The Regents of The University of Michigan
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met: redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer;
 * redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution;
 * neither the name of the copyright holders nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *




 * @file
 * Declaration of a memory trace CPU object. Uses a memory trace to drive the
 * provided memory hierarchy.
 */

#include <algorithm> // For min

#include "cpu/trace/reader/mem_trace_reader.hh"
#include "cpu/trace/trace_cpu.hh"
// #include "mem/base_mem.hh" // For PARAM constructor
// #include "mem/mem_interface.hh"
//#include "params/TraceCPU.hh"
#include "base/statistics.hh"
#include "mem/packet.hh"
#include "sim/eventq.hh"
#include "sim/sim_events.hh"
#include "sim/sim_exit.hh"
#include "sim/system.hh"

using namespace std;

TraceCPU::TraceCPU(const TraceCPUParams *p)
    : MemObject(p),
      _instMasterId(p->sys->getMasterId(name() + ".inst")), icache("instructions", this),
      _dataMasterId(p->sys->getMasterId(name() + ".data")), dcache("data", this),
      dataTrace(p->trace), outstandingRequests(0), tickEvent(this)
{
    nextRequest = dataTrace->getNextRequest(reason);
    schedule(&tickEvent, curTick() + ticks(1));
}

//TODO: fix unaligned accesses out of block boundaries
void
TraceCPU::tick()
{
    assert(outstandingRequests >= 0);
    assert(outstandingRequests < 1000);
    int instReqs = 0; //TODO convert to stats
    int dataReqs = 0; //TODO convert to stats
    while (!nextRequest) {
        if (outstandingRequests) return;
        switch (reason) {
            case MemTraceReader::EOT:
                // No more requests to send. Finish trailing events and exit.
                //TODO: fix this
                // if (queue()->empty()) {
                exitSimLoop("end of memory trace reached");
                // } else {
                //     if (!tickEvent.scheduled())
                //         schedule(&tickEvent, queue()->nextTick() + ticks(1));
                // }
                return;
            case MemTraceReader::STAT_RESET:
                nextRequest = dataTrace->getNextRequest(reason);
                Stats::reset();
                break;
            case MemTraceReader::STAT_DUMP:
                nextRequest = dataTrace->getNextRequest(reason);




    if (nextRequest->mustRun()) {
        int bsize = 0;
        if (nextRequest->isInstFetch()) {
            bsize=icache.peerBlockSize();
        } else {
            bsize=dcache.peerBlockSize();
        }

        // Rest of the request: get the new address and the new size
        if (nextRequest->isInstFetch()) {
            PacketPtr nextPkt = nextRequest->getNextPkt(bsize, 0, _instMasterId);
            // assert(nextPkt->req->thread_num < 4 && "Not enough threads");
            nextPkt->setSrc(0);
            ++instReqs;
            // DPRINTF("id %d initiating %s read at addr %x (blk %x) expecting %x\n",
            //         id, do_functional ? "functional " : "", req->getPaddr(),
            //         blockAddr(req->getPaddr()), *result);
            icache.sendPkt(nextPkt);
        } else {
            PacketPtr nextPkt = nextRequest->getNextPkt(bsize, 0, _dataMasterId);
            // assert(nextPkt->req->thread_num < 4 && "Not enough threads");
            nextPkt->setSrc(0);
            ++dataReqs;
            if (nextPkt->cmd.isRead()) {
                // DPRINTF("id %d initiating %s read at addr %x (blk %x) expecting %x\n",
                //         id, do_functional ? "functional " : "", req->getPaddr(),
                //         blockAddr(req->getPaddr()), *result);
            } else if (nextPkt->cmd.isWrite()) {
                // DPRINTF(MemTest, "initiating %s write at addr %x (blk %x) value %x\n",
                //         do_functional ? "functional " : "", req->getPaddr(),
                //         blockAddr(req->getPaddr()), data & 0xff);
            } else panic("CMD not implemented");
            dcache.sendPkt(nextPkt);
        }
        // If we are done with the current packet we go for the next.
        if (nextRequest->lastPacketSent()) {
            delete nextRequest;
            nextRequest = dataTrace->getNextRequest(reason);
        }
    } else if (!tickEvent.scheduled())




TraceCPU::getPort(const std::string &if_name, int idx)
{
    if (if_name == "data")
        return &dcache;
    else if (if_name == "instructions")
        return &icache;
    else





TraceCPU::MemPort::recvTiming(PacketPtr pkt)
{
    if (pkt->isResponse()) {
        tcpu->completeRequest(pkt);
    } else {
        // must be snoop upcall
        assert(pkt->isRequest());
        assert(pkt->getDest() == Packet::Broadcast);
    }




TraceCPU::MemPort::recvAtomic(PacketPtr pkt)
{
    panic("Atomic accesses not supported");
    // must be snoop upcall
    assert(pkt->isRequest());
    assert(pkt->getDest() == Packet::Broadcast);




TraceCPU::MemPort::recvFunctional(PacketPtr pkt)
{










TraceCPU::MemPort::recvRetry()
{
    if (sendTiming(retryPkt)) {
        // DPRINTF(MemTest, "accessRetry setting to false\n");
        accessRetry = false;






TraceCPU::MemPort::sendPkt(PacketPtr pkt) {
    // if (atomic) {
    //     cachePort.sendAtomic(pkt);
    //     completeRequest(pkt);
    // }
    // else
    tcpu->outstandingRequests++;
    if (!sendTiming(pkt)) {
        // DPRINTF(MemTest, "accessRetry setting to true\n");

        accessRetry = true;




// TODO: handle stats
// void
// TraceCPU::regStats()
// {
//     using namespace Stats;
//
//     numReadsStat
//         .name(name() + ".num_reads")




//         .name(name() + ".num_writes")




//         .name(name() + ".num_exec")





TraceCPU::completeRequest(PacketPtr pkt)
{
    Request *req = pkt->req;

    outstandingRequests--;
    // DPRINTF(MemTest, "completing %s at address %x (blk %x) %s\n",
    //         pkt->isWrite() ? "write" : "read",
    //         req->getPaddr(), blockAddr(req->getPaddr()),
    //         pkt->isError() ? "error" : "success");

    //Remove the address from the list of outstanding

    if (pkt->isError()) {
        warn("Access failed for %s at %x\n",
             pkt->isWrite() ? "write" : "read", req->getPaddr());
    } else {
        //TODO: handle stats
        // if (pkt->isRead()) {
        //     numReads++;
        //     numReadsStat++;
        // } else {






    pkt->deleteData();
    delete pkt->req;
    delete pkt;
    if (!tickEvent.scheduled())




TraceCPUParams::create()
{
    return new TraceCPU(this);
}

/* To convert */


// void
// MemTest::completeRequest(PacketPtr pkt)
// {
//     Request *req = pkt->req;
//
//     if (issueDmas) {
//         dmaOutstanding = false;
//     }
//
//     DPRINTF(MemTest, "completing %s at address %x (blk %x) %s\n",
//             pkt->isWrite() ? "write" : "read",
//             req->getPaddr(), blockAddr(req->getPaddr()),
//             pkt->isError() ? "error" : "success");
//
//     MemTestSenderState *state =
//         dynamic_cast<MemTestSenderState *>(pkt->senderState);
//
//     uint8_t *data = state->data;
//     uint8_t *pkt_data = pkt->getPtr<uint8_t>();
//
//     //Remove the address from the list of outstanding
//     std::set<unsigned>::iterator removeAddr =
//         outstandingAddrs.find(req->getPaddr());
//     assert(removeAddr != outstandingAddrs.end());
//     outstandingAddrs.erase(removeAddr);
//
//     if (pkt->isError()) {
//         if (!suppress_func_warnings) {
//             warn("Functional Access failed for %x at %x\n",
//                  pkt->isWrite() ? "write" : "read", req->getPaddr());
//         }
//     } else {
//         if (pkt->isRead()) {
//             if (memcmp(pkt_data, data, pkt->getSize()) != 0) {
//                 panic("%s: read of %x (blk %x) @ cycle %d "
//                       "returns %x, expected %x\n", name(),
//                       req->getPaddr(), blockAddr(req->getPaddr()), curTick(),






//             if (numReads == (uint64_t)nextProgressMessage) {
//                 ccprintf(cerr, "%s: completed %d read, %d write accesses @%d\n",
//                          name(), numReads, numWrites, curTick());
//                 nextProgressMessage += progressInterval;
//             }
//
//             if (maxLoads != 0 && numReads >= maxLoads)
//                 exitSimLoop("maximum number of loads reached");
//         } else {
//             assert(pkt->isWrite());
//             funcPort.writeBlob(req->getPaddr(), pkt_data, req->getSize());
//             numWrites++;





//     noResponseCycles = 0;
//     delete state;
//     delete [] data;
//     delete pkt->req;
//     delete pkt;
//     if (!tickEvent.scheduled())
//         schedule(tickEvent, curTick() + ticks(1));
// }

// void
// MemTest::tick()
// {
//
//     //make new request
//     /* unsigned cmd = random() % 100;
//      * unsigned offset = random() % size;
//      * unsigned base = random() % 2;
//      * uint64_t data = random();
//      * unsigned access_size = random() % 4;
//      * bool uncacheable = (random() % 100) < percentUncacheable;
//      *
//      * unsigned dma_access_size = random() % 4; */
//     unsigned cmd = 0;
//     offset++;
//     unsigned base = 0;
//     uint64_t data = random();
//     unsigned access_size = 0;
//     bool uncacheable = false;
//
//     unsigned dma_access_size = random() % 4;
//
//     //If we aren't doing copies, use id as offset, and do a false sharing
//     //mem tester
//     //We can eliminate the lower bits of the offset, and then use the id
//     //to offset within the blks
//     // offset = blockAddr(offset);
//     // offset += id;
//     // access_size = 0;
//     // dma_access_size = 0;
//
//     Request *req = new Request();
//     Request::Flags flags;
//     Addr paddr;
//
//     if (uncacheable) {
//         flags.set(Request::UNCACHEABLE);
//         paddr = uncacheAddr + offset;
//     } else {
//         paddr = ((base) ? baseAddr1 : baseAddr2) + offset;
//     }
//     bool do_functional = false;
//
//     if (issueDmas) {
//         paddr &= ~((1 << dma_access_size) - 1);
//         req->setPhys(paddr, 1 << dma_access_size, flags);
//         req->setThreadContext(id, 0);
//     } else {
//         paddr &= ~((1 << access_size) - 1);
//         req->setPhys(paddr, 1 << access_size, flags);
//         req->setThreadContext(id, 0);
//     }
//     assert(req->getSize() == 1);
//
//     uint8_t *result = new uint8_t[8];
//
//     if (cmd < percentReads) {
//         // read
//
//         // For now we only allow one outstanding request per address
//         // per tester. This means we assume CPU does write forwarding
//         // to reads that alias something in the cpu store buffer.
//         if (outstandingAddrs.find(paddr) != outstandingAddrs.end()) {
//             delete [] result;
//             delete req;
//             return;
//         }
//
//         outstandingAddrs.insert(paddr);
//
//         // ***** NOTE FOR RON: I'm not sure how to access checkMem. - Kevin
//         funcPort.readBlob(req->getPaddr(), result, req->getSize());
//
//         ccprintf(cerr,
//                  "id %d initiating %s read at addr %x (blk %x) expecting %x\n",
//                  id, do_functional ? "functional " : "", req->getPaddr(),
//                  blockAddr(req->getPaddr()), *result);
//
//         PacketPtr pkt = new Packet(req, MemCmd::ReadReq, Packet::Broadcast);
//         pkt->setSrc(0);
//         pkt->dataDynamicArray(new uint8_t[req->getSize()]);
//         MemTestSenderState *state = new MemTestSenderState(result);
//         pkt->senderState = state;
//
//         if (do_functional) {
//             assert(pkt->needsResponse());
//             pkt->setSuppressFuncError();
//             cachePort.sendFunctional(pkt);
//             completeRequest(pkt);
//         } else {
//             sendPkt(pkt);
//         }
//     } else {
//         // write
//
//         // For now we only allow one outstanding request per address
//         // per tester. This means we assume CPU does write forwarding
//         // to reads that alias something in the cpu store buffer.
//         if (outstandingAddrs.find(paddr) != outstandingAddrs.end()) {
//             delete [] result;
//             delete req;
//             return;
//         }
//
//         outstandingAddrs.insert(paddr);
//
//         DPRINTF(MemTest, "initiating %s write at addr %x (blk %x) value %x\n",
//                 do_functional ? "functional " : "", req->getPaddr(),
//                 blockAddr(req->getPaddr()), data & 0xff);
//
//         PacketPtr pkt = new Packet(req, MemCmd::WriteReq, Packet::Broadcast);
//         pkt->setSrc(0);
//         uint8_t *pkt_data = new uint8_t[req->getSize()];
//         pkt->dataDynamicArray(pkt_data);
//         memcpy(pkt_data, &data, req->getSize());
//         MemTestSenderState *state = new MemTestSenderState(result);
//         pkt->senderState = state;
//
//         if (do_functional) {
//             pkt->setSuppressFuncError();
//             cachePort.sendFunctional(pkt);
//             completeRequest(pkt);
//         } else {
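
The `tick()` method in this file asks each request for its next packet given the peer block size (`getNextPkt(bsize, ...)`), and a TODO notes that unaligned accesses crossing block boundaries still need fixing. As a minimal sketch of the arithmetic involved, the function below splits an access into pieces that never cross a block boundary. `Chunk` and `splitByBlock` are hypothetical names introduced for this illustration, the block size is assumed to be a power of two, and this is not the behaviour of `MemTraceRequest::getNextPkt`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one block-local piece of an access.
struct Chunk {
    std::uint64_t addr;  // start address of the piece
    unsigned size;       // number of bytes in the piece
};

// Split [addr, addr + size) into pieces that stay inside a single block.
// Assumes blockSize is a non-zero power of two.
inline std::vector<Chunk> splitByBlock(std::uint64_t addr, unsigned size,
                                       unsigned blockSize) {
    assert(blockSize != 0 && (blockSize & (blockSize - 1)) == 0);
    std::vector<Chunk> out;
    while (size > 0) {
        // Bytes left between addr and the end of its block.
        const unsigned offset = static_cast<unsigned>(addr & (blockSize - 1));
        const unsigned room = blockSize - offset;
        const unsigned n = size < room ? size : room;
        out.push_back(Chunk{addr, n});
        addr += n;
        size -= n;
    }
    return out;
}
```

For example, an 8-byte access at address 60 with 64-byte blocks yields two pieces: 4 bytes at address 60 and 4 bytes at address 64.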





pintrace.py
# Copyright (c) 2006-2007 The Regents of The University of Michigan
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer;
# redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution;
# neither the name of the copyright holders nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#






from m5.objects import *

parser = optparse.OptionParser()

#parser.add_option("-m", "--maxtick", type="int", default=m5.MaxTick,
#                  metavar="T",
#                  help="Stop after T ticks")

(options, args) = parser.parse_args()

if args:
    print "Error: script doesn't take any positional arguments"
    sys.exit(1)

# define prototype L1 cache
proto_l1 = BaseCache(size = '32kB', assoc = 4, block_size = 128,
                     latency = '1ns', tgts_per_mshr = 1)

proto_l1.mshrs = 1

pr = PinReader()

tcpu = TraceCPU(trace = pr)

# next comes L1 cache, if any
#prototypes.insert(0, proto_l1)

# system simulated

system = System(physmem = PhysicalMemory(latency = "100ns"))

new_bus = Bus(clock="500MHz", width=16)
system.physmem.cpu_side_bus = new_bus
system.physmem.port = new_bus.master

data_l1 = BaseCache(size = '32kB', assoc = 4, block_size = 64,
                    latency = '1ns', tgts_per_mshr = 8)
data_l1.mshrs = 1

ins_l1 = BaseCache(size = '32kB', assoc = 4, block_size = 64,
                   latency = '1ns', tgts_per_mshr = 8)
ins_l1.mshrs = 1

new_bus.cache = [data_l1, ins_l1]
new_bus.slave = data_l1.mem_side
new_bus.slave = ins_l1.mem_side

data_l1.cpu = tcpu

tcpu.data = data_l1.cpu_side
tcpu.instructions = ins_l1.cpu_side
#def make_level(spec, prototypes, attach_obj, attach_port):
#    fanout = spec[0]
#    parent = attach_obj # use attach obj as config parent too
#    if len(spec) > 1 and (fanout > 1 or options.force_bus):
#        new_bus = Bus(clock="500MHz", width=16)
#        new_bus.port = getattr(attach_obj, attach_port)
#        parent.cpu_side_bus = new_bus
#        attach_obj = new_bus
#        attach_port = "port"
#    objs = [prototypes[0]() for i in xrange(fanout)]
#    if len(spec) > 1:
#        # we just built caches, more levels to go
#        parent.cache = objs
#        for cache in objs:
#            cache.mem_side = getattr(attach_obj, attach_port)
#            make_level(spec[1:], prototypes[1:], cache, "cpu_side")
#    else:
#        # we just built the MemTest objects
#        parent.cpu = objs
#        for t in objs:
#            t.test = getattr(attach_obj, attach_port)
#            t.functional = system.funcmem.port

#make_level(treespec, prototypes, system.physmem, "port")

# -----------------------
# run simulation
# -----------------------

root = Root(full_system = False, system = system)
root.system.mem_mode = 'timing'

root.system.system_port = root.system.physmem.port

# Not much point in this being higher than the L1 latency
m5.ticks.setGlobalFrequency('1ns')

# instantiate configuration
m5.instantiate()

# simulate until program terminates
exit_event = m5.simulate(m5.MaxTick)

print 'Exiting @ tick', m5.curTick(), 'because', exit_event.getCause()
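
The configuration above gives both L1 caches a size of 32kB, an associativity of 4 and 64-byte blocks. As a small worked example of what those parameters imply, the sketch below computes the number of sets and the set an address would map to under a conventional set-associative layout. The helper names `cacheSets` and `setIndex` are hypothetical, "kB" is assumed to mean 1024 bytes, and the index function is the textbook one rather than anything taken from the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of sets in a set-associative cache (illustrative helper).
constexpr std::size_t cacheSets(std::size_t sizeBytes, std::size_t assoc,
                                std::size_t blockBytes) {
    return sizeBytes / (assoc * blockBytes);
}

// Textbook set index: block number modulo the number of sets.
constexpr std::size_t setIndex(std::uint64_t addr, std::size_t blockBytes,
                               std::size_t sets) {
    return static_cast<std::size_t>((addr / blockBytes) % sets);
}

// 32 KiB, 4-way, 64-byte blocks: 32768 / (4 * 64) = 128 sets.
static_assert(cacheSets(32 * 1024, 4, 64) == 128, "L1 geometry");
```

With these numbers, addresses that differ by a multiple of 128 * 64 = 8192 bytes fall into the same set, so at most four such blocks can be resident at once.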
66
Bibliography
[1] Kenneth Barr. Dinerotool. Oct. 2005. url: http://kbarr.net.
[2] Nathan Binkert et al. “The gem5 simulator”. In: SIGARCH Comput.
Archit. News 39.2 (Aug. 2011), pp. 1–7. issn: 0163-5964. doi: 10.1145/
2024716.2024718. url: http://doi.acm.org/10.1145/2024716.
2024718.
[3] Zhongliang Chen et al. The Multi2Sim Simulation Framework. url:
http://www.multi2sim.org/files/multi2sim-r277.pdf.
[4] Circular buffer. Nov. 2012. url: http : / / en . wikipedia . org / w /
index.php?title=Circular_buffer&oldid=522370238#Always_
Keep_One_Slot_Open.
[5] H. J. Curnow and B. A. Wichmann. “A synthetic benchmark”. In: The




[6] Susan L. Graham, Peter B. Kessler, and Marshall K. Mckusick. “Gprof:
A call graph execution profiler”. In: SIGPLAN Not. 17.6 (June 1982),




[7] Mark Hill and Jan Edler. Dinero IV Trace-Driven Uniprocessor Cache
Simulator. Feb. 1998. url: http://www.cs.wisc.edu/~markhill/
DineroIV/.
[8] Chi-Keung Luk et al. “Pin: building customized program analysis tools
with dynamic instrumentation”. In: Proceedings of the 2005 ACM SIG-
PLAN conference on Programming language design and implementa-
tion. PLDI ’05. Chicago, IL, USA: ACM, 2005, pp. 190–200. isbn:
1-59593-056-6. doi: 10.1145/1065010.1065034. url: http://doi.
acm.org/10.1145/1065010.1065034.
[9] J.E. Miller et al. “Graphite: A distributed parallel simulator for mul-
ticores”. In: High Performance Computer Architecture (HPCA), 2010
IEEE 16th International Symposium on. Jan. 2010, pp. 1 –12. doi:
10.1109/HPCA.2010.5416635. url: http://groups.csail.mit.
edu/carbon/docs/graphite_hpca2010_preprint.pdf.
[10] Vijay Janapa Reddi et al. “PIN: a binary instrumentation tool for com-
puter architecture research and education”. In: Proceedings of the 2004
workshop on Computer architecture education: held in conjunction with
the 31st International Symposium on Computer Architecture. WCAE
’04. Munich, Germany: ACM, 2004. doi: 10.1145/1275571.1275600.
url: http://doi.acm.org/10.1145/1275571.1275600.
[11] Cloyce D. Spradling. “SPEC CPU2006 Benchmark Tools”. In: SIGARCH
Computer Architecture News 35 (1 Mar. 2007).
[12] Richard M. Stallman. GDB manual: the GNU source-level debugger.
2nd, GDB version 2.5. Free Software Foundation, Inc. 51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301, USA, Tel: (617) 876-3296,
Feb. 1988, pp. ii + 63.
68
Bibliography Bibliography
[13] Richard M. Stallman. Using and Porting GNU CC. Tech. rep. 51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301, USA, Tel: (617) 876-3296:
Free Software Foundation, Inc., 1988.
[14] The gcc website. url: http://gcc.gnu.org/.
[15] The gdb website. url: http://www.gnu.org/software/gdb/.
[16] The Gem5 website. url: http://www.gem5.org/.
[17] The gprof website. url: http://sourceware.org/binutils/docs/
gprof/.
[18] The Graphite website. url: http://groups.csail.mit.edu/carbon/
?page_id=111.
[19] The modified SPLASH-2 website. url: www.capsl.udel.edu/splash/.
[20] The Multi2Sim website. url: http://www.multi2sim.org/.
[21] The Pin website. url: http://software.intel.com/en-us/articles/
pintool/.
[22] The SPEC CPU2006 website. url: http://www.spec.org/cpu2006/.
[23] The SPLASH-2 website. url: http://web.archive.org/web/http:
//www-flash.stanford.edu/apps/SPLASH/.
[24] vanDooren. Creating a thread safe producer consumer queue in C++
without using locks. Jan. 2007. url: http://msmvps.com/blogs/
vandooren / archive / 2007 / 01 / 05 / creating - a - thread - safe -
producer-consumer-queue-in-c-without-using-locks.aspx.
[25] Reinhold P. Weicker. “Dhrystone: a synthetic systems programming
benchmark”. In: Commun. ACM 27.10 (Oct. 1984), pp. 1013–1030.




[26] S.C. Woo et al. “The SPLASH-2 Programs: Characterization and Method-
ological Considerations”. In: Proc. of the 22nd International Sympo-
sium on Computer Architecture. June 1995.
70
