Time-Sharing Time Warp via Lightweight Operating System Support by Pellegrini, Alessandro & Quaglia, Francesco
Time-Sharing Time Warp
via Lightweight Operating System Support
Alessandro Pellegrini and Francesco Quaglia
DIAG – Sapienza, University of Rome
Via Ariosto 25, 00185 Rome, Italy
{pellegrini, quaglia}@dis.uniroma1.it
ABSTRACT
The order in which the different tasks are carried out
within a Time Warp platform has a direct impact on
performance, given that event processing is speculative and
is thus subject to being rolled back. It is typically
recognized that not-yet-executed events having lower
timestamps should be given higher CPU-schedule priority,
since this helps keep the number of rollbacks low. However,
common Time Warp platforms usually execute events as atomic
actions. Hence control is bounced back to the underlying
simulation platform only at the end of the current event
processing routine. In other words, CPU-scheduling of events
resembles classical batch-multitasking scheduling, which is
recognized not to react promptly to variations of the
priority of pending tasks (e.g. associated with the
injection of new events in the system). In this article we
present the design and implementation of a time-sharing Time
Warp platform, to be run on multi-core machines, where the
platform-level software is allowed to take back control on a
periodic basis (with a fine-grain period), and to possibly
preempt any ongoing event processing activity in favor of
dispatching (along the same thread) any other event that is
revealed to have higher priority. Our proposal is based on
an ad-hoc kernel module for Linux, which implements a
fine-grain timer-interrupt mechanism with lightweight
management, which is fully integrated with the modern
top/bottom-half timer-interrupt Linux architecture, and
which does not induce any bias in terms of relative
CPU-usage planning across Time Warp vs non-Time Warp threads
running on the machine. Our time-sharing architecture has
been integrated within the open source ROOT-Sim optimistic
simulation package, and we also report some experimental
data for an assessment of our proposal.
Categories and Subject Descriptors
D.4.1 [Operating Systems]: Process Management - Threads;
I.6.8 [Simulation and Modeling]: Types of Simulation -
Discrete Event, Parallel
SIGSIM-PADS’15, June 10–12, 2015, London, United Kingdom.
Copyright © 2015 ACM ISBN 978-1-4503-3583-6/15/06...$15.00.
DOI: http://dx.doi.org/10.1145/2601381.2601398.
General Terms
Algorithms, Performance
Keywords
PDES; Speculative Processing; Preemptive Scheduling
1. INTRODUCTION
Time Warp [10] is the reference synchronization protocol
for optimistic parallel processing of discrete event simula-
tion models. It allows the worker threads operating within
the simulation platform to process simulation events specu-
latively, with no preliminary assessment of their safety. On
the other hand, in case processed events are eventually
revealed as non-causally consistent, their effects on the
simulation model execution trajectory are undone via rollback
schemes based on state recovery techniques exploiting either
checkpointing [19, 21, 18] or reverse computing [5, 24]. The
relevance of Time Warp (and of simulation platforms based
on this synchronization paradigm) lies in that it allows for
extremely high scalability. In fact, as recently shown by
the results provided in [12], Time Warp systems exhibit the
potential for scaling up to millions of processing units.
As for software development, besides some historical
proposals along the path of developing Time Warp as a
special-purpose operating system oriented at supporting
discrete event applications [11], the common trend is to
build Time Warp systems as user-space platforms, to be
hosted on top of general-purpose operating systems (see,
e.g., [4, 17, 14, 6]). As a consequence, the platform-level
software within the whole Time Warp architecture is seen by
the application-level code (namely, the simulation model) as
a library offering a specific API (e.g., for injection and
CPU-dispatching of simulation events) and, in the most
advanced cases (see [18]), providing application-transparent
support for recoverability (see footnote 1).
Footnote 1: As an example, in some proposals compile/link
time (instrumentation) directives are used to allow the
platform-level library to intercept memory updates issued by
the application code so as to transparently support
log/recovery of the application-level data structures.
The major consequence of the library-based approach is that
control is bounced back to the platform-level software
(along any worker thread) only upon the occurrence of
specific run-time events, such as the end of the execution
of the last CPU-dispatched simulation event or the
interception of, e.g., memory update operations that trigger
platform-level recoverability capabilities, ultimately
allowing a previous state of the simulation model to be
restored. In other words, all the simulation events
processed within conventional Time Warp systems are actually
CPU-dispatched according to a classical batch-multitasking
scheme, where the platform-level software is not allowed to
take back control independently of the activities that are
carried out by the application code. As a consequence, the
platform-level software is not allowed to re-evaluate CPU
assignment until the completion of the last-dispatched
simulation event. Therefore, it is not able to CPU-dispatch
any other simulation event that may have been produced in
the system in the meantime, and which may have a higher
priority (e.g., a lower timestamp) than the one currently
being processed by the CPU [22].
Regaining control frequently, with re-evaluation of the
assignment of the CPU, can be a relevant means for improving
performance (thanks to the reduction of the incidence of
rollback), given that the CPU capacity can be dynamically
assigned to simulation events that currently require more
prompt execution, e.g. due to their lower timestamps, and
that are thus more likely to give rise to dependencies
affecting the virtual time window currently covered by the
worker threads' processing activities. We note that this
aspect is also related to improving the energy efficiency of
Time Warp platforms, given that reducing the amount of
rollbacks in the parallel run means reducing the overall
energy used for any individual productive unit of work done
(namely, eventually committed events) [20].
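To make the re-evaluation step concrete, the following is a minimal Python sketch, under stated assumptions, of a worker thread that regains control periodically and compares the timestamp of the in-progress event with the lowest pending one. The names Worker, inject, dispatch and on_tick are hypothetical, and re-queuing the preempted event is only an illustrative choice; nothing here is taken from the ROOT-Sim code base.

```python
import heapq


class Worker:
    """Toy model of one worker thread with a timestamp-ordered pending queue."""

    def __init__(self):
        self.pending = []    # min-heap of (timestamp, event_name)
        self.current = None  # (timestamp, event_name) in progress, or None

    def inject(self, timestamp, name):
        """A new event is produced in the system for this worker."""
        heapq.heappush(self.pending, (timestamp, name))

    def dispatch(self):
        """Batch-style dispatch: take the lowest-timestamp pending event."""
        self.current = heapq.heappop(self.pending) if self.pending else None
        return self.current

    def on_tick(self):
        """Periodic regain of control: preempt if a lower timestamp is pending.

        Returns True when the in-progress event was preempted.
        """
        if self.current is None or not self.pending:
            return False
        if self.pending[0][0] < self.current[0]:
            preempted = self.current
            self.current = heapq.heappop(self.pending)
            # Assumption of this sketch: the preempted event is re-queued.
            heapq.heappush(self.pending, preempted)
            return True
        return False
```

With a batch scheme, an event with timestamp 5.0 injected while an event with timestamp 10.0 is running would have to wait for the latter to complete; in the sketch, the next call to on_tick() switches to it.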
Clearly, the period according to which the platform-level
software regains control needs to be fine grain, especially
in contexts where CPU requirements for processing simulation
events are on the order of (tens of) microseconds. Hence
classical timer-interrupt settings supported by conventional
operating systems do not suffice for this purpose. As an
example, common Linux configurations lead the timer to
interrupt the current thread running on any CPU-core with a
period on the order of 1 to 4 milliseconds (higher values
are typically used on machines with larger numbers of
CPU-cores, given that CPU preemption and reassignment is
less critical on these systems compared to those with a
reduced number of cores). Also, bouncing control back to the
platform-level software via standard timed signals, such as
the POSIX SIGALRM signal, would be infeasible because of the
overhead, given that this approach would require the whole
chain of signal management mechanisms to be passed through
at kernel level.
In this article we present the design and implementation of
a time-sharing Time Warp system, to be hosted on top of
multi-core machines running Linux, where the platform-level
software is allowed to take back control on a periodic
basis, with a very fine-grain period (e.g. on the order of
tens of microseconds), and is allowed to re-evaluate CPU
assignment and to dynamically schedule higher priority
simulation events. This is achieved with minimal overhead
thanks to the capabilities of an ad-hoc Linux module for
timer management that we have developed, which allows for
(periodic) control flow variations along any running thread
with no intervention by the chain of kernel mechanisms used
for supporting POSIX signals. Our proposal is fully
compliant with the conventional and scalability-oriented
top/bottom-half timer-interrupt management offered by Linux,
and does not create any bias in terms of actual
CPU-assignment across the threads (including kernel-level
threads) operating in the system. In other words, Time Warp
threads are allowed to see their original CPU ticks (those
natively assigned by the operating system) as partitioned
into sub-intervals (with proper control flow management at
the end of each sub-interval), while any other active thread
in the system (which is not a Time Warp worker thread) is
not subject to any partitioning of its assigned ticks. This
avoids impairing fairness, which is an essential
prerequisite given that the time-sharing Time Warp platform
might run on a conventional multi-user platform. This aspect
is clearly relevant also when considering fairness across
Time Warp worker threads and kernel-level threads used by
the operating system for housekeeping.
Besides the ability to optimize CPU assignment depending on
the (dynamic) priority of the tasks to be performed within
the Time Warp system, our proposal also has the capability
to address some specific liveness problems related to the
speculative nature of Time Warp, such as application-level
infinite loops that may arise when reaching a state that is
not admissible for the application due to out-of-order event
executions [15]. These loops can be broken in a timely
manner thanks to our time-sharing approach, which can be
exploited for supporting preemptive rollback operations
leading to the squash of the non-admissible state
trajectory.
The fully featured time-sharing architecture we have
developed has been integrated within the open source
ROOT-Sim package (see footnote 2) [17, 9], and operates in a
fully transparent way to the overlying application code.
Hence, its benefits come with no explicit intervention by
the application programmer. We also report experimental data
for an assessment of the time-sharing Time Warp proposal in
terms of both the overhead due to the extra-ticks and the
final delivered performance with a real-world case study
application in the area of simulation of wireless systems.
The remainder of the article is structured as follows. In
Section 2 we discuss related work. The whole time-sharing
architecture is presented in Section 3. Experimental data
are provided in Section 4.
2. RELATED WORK
In the wide area of High-Performance-Computing (HPC)
systems, some literature studies exist on the relation
between performance and timer-interrupt frequency. The
common idea underlying most of the resulting performance
optimization proposals is that the lower the timer-interrupt
frequency, the better the final delivered performance [7, 8,
25]. Taken to the extreme, this approach led to defining
tick-less operating systems (namely, with extremely reduced
frequency of the timer-interrupt) as the best configuration
for hosting HPC applications. However, these studies have
been tailored to the case of non-speculative processing,
where any work carried out by the thread running on
whichever CPU-core is actually useful, hence there is no
actual need to change the software execution flow (e.g.
periodically) in order to optimize synchronization dynamics
in terms of reduction of wasted computation, which is
instead the case for optimistic Time Warp systems. Also, the
above studies have been tailored to evaluate the effects of
the variation of the timer-interrupt frequency in contexts
where the actual management of the timer-interrupt is still
based on the native rules applied by the operating system
kernel. In other words, the above proposals have been aimed
at simply configuring the timer-interrupt behavior (limited
to its frequency) in HPC contexts, not at introducing ad-hoc
software modules for exploiting timer events, which is
instead the path we followed. In fact, our proposal puts in
place a special (and lightweight) mechanism for handling
timer-interrupts. Overall, we follow an approach which is
completely different from the one dealt with by those
literature studies in terms of both reference scenario
(speculative vs non-speculative processing) and
architectural impact on the system organization.
Footnote 2: Available at http://github.com/HPDCS/ROOT-Sim.
In the context of optimistic PDES systems, the only work we
are aware of which deals with the relation between
performance and timer-interrupt configuration is the one in
[3]. Here the author proposes an approach which is opposite
to ours, where the Time Warp threads are allowed to take CPU
control for longer periods (thus not being interrupted for a
while) in order to be able to fully execute a simulation
model with no interference by other workload on the system,
and to deliver the output in real-time. This solution is
still along the path of tick-less operating systems, with
the difference that the tick-less behavior is triggered
on-demand (namely whenever a time-critical parallel
simulation needs to be executed), hence it is not a static
configuration of the underlying operating system. Our
approach is fully orthogonal to this one because our target
is the reduction of wasted time, thanks to an appropriate
periodic variation of the control flow along Time Warp
threads. Also, while the proposal in [3] is based on
reserving the computing capacity for Time Warp programs
(thus excluding the possibility for other tasks to be run on
the system for a while), in our approach we do not create
any bias in the usage of the computing system across Time
Warp threads and other kinds of threads. We only allow the
Time Warp threads to see their own ticks as partitioned into
sub-intervals (as hinted, with proper control flow
management at the end of each sub-interval). This makes our
proposal suited also for contexts where the computing
platform is shared across different users and applications.
As pointed out, our time-sharing Time Warp proposal also
entails the capability of supporting preemptive rollback of
the currently (incorrectly) executed simulation event. The
topic of preemptive rollback in optimistic discrete event
simulation has been studied in the literature, mainly in [6,
23]. The Time Warp platform presented in [6] supports event
preemptive rollback for optimistic parallel simulation on
shared memory machines, and is based on direct manipulation
of the event list of the recipient simulation object by the
thread along which the generation of a new event is handled.
With this solution, the sender thread is able to determine
the current simulation time of the recipient simulation
object and whether any message/anti-message being sent to
that object violates causality. If this is the case, the
sender thread notifies the violation to the thread handling
the recipient object, so that the latter timely interrupts
any in-progress activity in order to execute rollback
operations. Our solution is different since it does not rely
on cross-thread signaling. Also, in our approach, any Time
Warp thread is allowed to change its current flow (and
dynamically dispatch a different simulation event after
preemption of the last dispatched one) independently of the
actual materialization of a causality violation, namely also
when higher priority simply must be given to a different
simulation event, possibly to be executed at a simulation
object different from the currently running one but still
bound to the same worker thread (e.g., in order to reduce
the likelihood of future rollback generation). This is
basically due to the fact that our time-sharing Time Warp
system is not limited to the support for preemptive
rollback. As for the preemptive rollback approach in [23],
it is suited for distributed memory systems (based on
Myrinet interconnection) while we deal with shared memory
multi-core machines. Also, the solution in [23] is based on
polling, and the polling code that periodically verifies the
causal consistency of the current event needs to be inserted
by the application programmer at proper points of the
application native code. Instead, our proposal is based on
interrupts, and is fully transparent to the application code
(and thus to the developer). Finally, similarly to [6], the
solution in [23] does not cope with control flow variations
associated with the dynamic generation of higher-priority
events (namely with timestamps lower than that of the event
currently executed along the thread) that are not currently
giving rise to a rollback operation.
As a matter of fact, our time-sharing Time Warp approach is
based on a kind of dual-mode execution, where control is
periodically pushed back to platform mode via the
kernel-level extra-tick management system. Dual-mode
execution in Time Warp systems has already been studied in
[16]. However, that work is focused on the management of
different memory views (via ad-hoc operating system-level
facilities) so that, when running in application mode, only
a sub-portion of the whole address space is accessible by
the Time Warp worker thread, namely the sub-portion keeping
the memory layout of the dispatched simulation object. Any
access to the state of another object generates a trap that
gives control back to the platform code, which actuates
proper thread synchronization mechanisms so as to allow the
access to any valid memory location by any event processing
routine (as for classical sequential style coding of DES
models). Rather, the present proposal is tailored to
variations of the control flow in order to react to the
generation of higher priority simulation events or tasks
(such as rollbacks to be processed) and to the dynamic
assignment of CPU to these events. Still, like the proposal
in [16], we retain application transparency.
Finally, the present proposal is clearly related to the
seminal work in [11], where Time Warp is instantiated as a
special-purpose operating system (although operating in user
space), destined to host discrete event applications
according to a speculative processing paradigm. The core
difference between what we are presenting and the proposal
in [11] lies in that such an approach is still based on
batch-processing of the events, with preemption only used in
case of causality errors affecting the currently-dispatched
simulation event. Rather, our approach is a truly
time-sharing one, which is achieved thanks to the employment
of ad-hoc (and lightweight) kernel-level modules
specifically oriented to the management of control flow
variations (hence preemption in its widest sense) on a
fine-grain periodic basis.
3. THE TIME-SHARING ARCHITECTURE
3.1 Basics on Linux Timer-Interrupts
As for our target machines, namely x86 ones, modern
processors are equipped with various timer facilities, among
which one is ultimately exploited to drive the passage of
time on each individual CPU-core. This is the LAPIC-timer
component supported by APIC (Advanced Programmable Interrupt
Controller), which is a timer-component local to any
CPU-core in the system. The general hardware organization on
x86 architectures regarding interrupt management is depicted
in Figure 1.
Figure 1: x86 interrupt system. [Diagram labels: 8259A PIC;
CPU 0 to CPU n-1, each with a Local APIC (Timer, NMI, LINT0,
LINT1); internal bus carrying interrupt messages and IPIs;
I/O APIC in the system chipset, receiving external
interrupts.]
The LAPIC-timer can be configured to operate in different
modes, among which the one used by the Linux kernel is the
periodic-interrupt mode. More specifically, at kernel
startup, a so-called calibration procedure is executed such
that the LAPIC-timer is set up (in terms of its internal
hardware counter, upon the expiration of which the interrupt
is issued towards the associated CPU-core) so as to
periodically generate interrupts according to the frequency
established by the CONFIG_HZ parameter defined at kernel
compile time. As hinted, classical interrupt periods for
entry-level to medium-end machines range from 1 to 4
milliseconds, which corresponds to values of CONFIG_HZ
ranging from 1000 to 250. Once set up at startup, the
configuration of the LAPIC-timer is never changed, thus the
same interrupt period is always used at system steady state.
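As a worked example of the relation between CONFIG_HZ and the tick period, the Python sketch below computes the interrupt period and, for an assumed per-event CPU cost, how many back-to-back events fit in one classical tick. The helper names and the 20-microsecond event cost mentioned in the usage note are illustrative assumptions, not values taken from the experiments.

```python
def tick_period_us(config_hz):
    """Timer-interrupt period in microseconds for a given CONFIG_HZ value."""
    return 1_000_000 / config_hz


def events_per_tick(config_hz, event_cost_us):
    """Number of back-to-back events of the given CPU cost that fit in one tick."""
    return int(tick_period_us(config_hz) // event_cost_us)
```

For instance, tick_period_us(250) gives 4000 microseconds, so with an assumed event cost of 20 microseconds a worker thread could process 200 events between two consecutive classical timer-interrupts without the platform ever regaining control.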
As for software-side management of the interrupt, the actual
management scheme supported by Linux is based on the
top/bottom-half paradigm. More specifically, upon the
receipt of the LAPIC-timer interrupt, a very minimal piece
of code is executed by the CPU-core, which is only used to
update timing information and to possibly flag the current
thread in such a way that eventually the scheduler is called
and a context switch (that switches this thread off the CPU)
can take place. Actually, a thread that has been dispatched
on CPU is typically allowed to run for various timer-ticks
before being flagged for re-schedule. The general scheme for
top/bottom-half management, for the sake of clarity, is
reported in Figure 2.
All the above (fine grain) actions are representative of the
top-half portion of the timer-interrupt handler. On the
other hand, the call to the kernel schedule() function for
performing actual context switches (if requested) is
actuated right prior to leaving kernel mode (see footnote
3). Therefore, the schedule() function, which represents the
core part of the bottom-half of the timer-interrupt manager,
is executed only in case no kernel-level critical task is
being executed by the thread. This allows for scalability on
multi-core processors, given that de-scheduling a thread
during the execution of any kernel-level critical task, such
as a spinlock-protected kernel-level critical section, would
lead the critical section to be locked up to the point in
time where this same thread will be CPU-rescheduled, an
operation that may occur after an unpredictable amount of
time (also depending on system workload and relative thread
priorities).
Footnote 3: A minor variation is in place for the case of
kernel threads, which never leave kernel mode operations.
Figure 2: Top/bottom-half general scheme. [Diagram steps,
top half: upon interrupt request, verify which device needs
attention; send an acknowledgement to the device without
firing any other interrupt while managing the current one;
check the device status to determine what caused the request
and how best to handle it; schedule the bottom half as a
task. Bottom half: check the device status to determine what
processing should be done; perform all needed operations
with the device, which might involve transferring data and
updating status; check device conditions and send signals to
user applications in case they are needed.]
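The split between the minimal top-half work and the deferred call to the scheduler can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions: the class CoreModel, its fields, and the fixed timeslice expressed in ticks are hypothetical and do not reproduce the actual Linux data structures or scheduling policy.

```python
class CoreModel:
    """Toy model of timer top half and deferred rescheduling on one CPU-core."""

    def __init__(self, timeslice_ticks):
        self.jiffies = 0                  # timing information kept by the top half
        self.timeslice_ticks = timeslice_ticks
        self.remaining = timeslice_ticks  # ticks left before the thread is flagged
        self.need_resched = False         # flag set by the top half
        self.context_switches = 0

    def top_half(self):
        """Minimal work at interrupt time: timekeeping and flagging only."""
        self.jiffies += 1
        self.remaining -= 1
        if self.remaining <= 0:
            self.need_resched = True

    def leave_kernel_mode(self):
        """Deferred part: the scheduler runs only on the way out of kernel mode."""
        if not self.need_resched:
            return False
        self.context_switches += 1
        self.need_resched = False
        self.remaining = self.timeslice_ticks
        return True

    def timer_interrupt(self, in_kernel_critical=False):
        """One timer-interrupt; returns True if a context switch took place."""
        self.top_half()
        if in_kernel_critical:
            # The interrupted critical section resumes; the switch is deferred.
            return False
        return self.leave_kernel_mode()
```

In this model a thread with a timeslice of three ticks is switched off the CPU only at the third interrupt, and a thread interrupted inside a kernel-level critical section keeps running until it eventually leaves kernel mode.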
3.2 The Extra-Tick Logic
As hinted, our time-sharing architecture is based on a
kernel-level differentiation between Time Warp threads and
other kinds of threads (running generic applications), since
only the former ones need to be managed according to the
lightweight extra-tick scheme. To this end, we have
developed a Linux module offering support for a special
device file called dev_extra_tick such that:
• this special device file is single instance, hence no two
different concurrently-opened I/O sessions on it are
allowed. This is compliant with the idea that a single
process (namely the multi-thread Time Warp platform
running on the multi-core machine) needs to use the
facilities offered by the special device file for
supporting the execution of all of its worker threads;
• a thread can register itself as a Time Warp worker-thread
by issuing a simple ioctl call towards the device file.
Registering a thread on the special device file allows the
kernel to know that the thread needs to undergo the
extra-tick policy. Actually, registration means that the
thread identifier (as seen by the kernel, not by the pthread
library, since the two are typically different) is
registered into a fast access hash table, which is installed
as part of the kernel module data structures implementing
the special device file driver.
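The registration record can be pictured with a small user-space stand-in. The sketch below is an assumption-laden illustration, not the module's code: the class ExtraTickRegistry is hypothetical, and a Python set replaces the kernel-side hash table. It relies on threading.get_native_id(), which returns the kernel-assigned thread identifier, as opposed to threading.get_ident(), which returns the library-level one. The actual ioctl command codes of the device file are not given in the text, so no ioctl call is shown.

```python
import threading


class ExtraTickRegistry:
    """User-space stand-in for the module's hash table of registered threads."""

    def __init__(self):
        self._tids = set()             # hash-based set: O(1) average lookup
        self._lock = threading.Lock()

    def register_current_thread(self):
        """Record the kernel-visible id of the calling thread and return it."""
        tid = threading.get_native_id()  # kernel thread id, not the pthread handle
        with self._lock:
            self._tids.add(tid)
        return tid

    def is_registered(self, tid):
        """Lookup of the kind the scheduler and top-half hooks would perform."""
        with self._lock:
            return tid in self._tids
```

Each worker thread would call register_current_thread() once at startup; the lookup is then keyed on the identifier the kernel itself uses when it picks the next thread to run.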
At this point, the portions of the whole kernel architecture
that need to know whether some thread is registered and
needs ad-hoc tick management are two: the kernel scheduler
and the top-half of the timer-interrupt. The external module
implementing the management logic for the dev_extra_tick
device file is also in charge of redefining the actual
behavior of the kernel scheduler and of the top-half of the
timer-interrupt, so that their logic becomes compliant with
extra-tick management requirements.
Rather than providing a recompiled version of the kernel
with the aforementioned changes already implemented, our
module adopts a dynamic patching approach where parts of
the executable image of the kernel are rewritten at startup
of the external module implementing the device file.
To patch the kernel schedule() function, we retrieve the
memory position of the corresponding machine instructions
block from the system-map (typically available in Linux
installations from the /boot directory of the root file
system), and we inject into this routine an execution flow
variation such that control goes to a schedule_hook()
routine offered by the external module right before
schedule() would execute its finalization part (e.g. stack
realignment and return). A scheme of this patching approach,
which has been tested on Linux kernels from 2.6 to 3.2, is
shown in Figure 3. The red block of code implementing the
finalization part of the original schedule() function is
initially sampled and copied at the end of the
schedule_hook() function, also adjusting relative memory
references (if present) in the copy. Then we replace the red
block of code in the original version of schedule() with a
block of machine code (the yellow part) which passes control
to schedule_hook(), so that the actual final part of the
scheduling process is under the control of our external
module. In the end, the schedule_hook() function will simply
execute the same return actions originally planned by the
kernel's schedule() function. However, patching the original
scheduler in this way allows the hook to take control when
the decision about what thread needs to take control of the
CPU-core (see footnote 4) is already finalized. Hence, we
know what thread will have control of the CPU-core for the
current set of ticks.
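The two patching steps can be mimicked on a toy instruction list. The Python sketch below is purely illustrative: instructions are plain strings with hypothetical names, and the adjustment of relative memory references performed on real machine code is not modeled.

```python
def patch_schedule(schedule_code, final_len, hook_body):
    """Return (patched_schedule, hook) for a toy instruction list.

    schedule_code: instruction strings of the original routine.
    final_len:     number of trailing instructions forming the finalization
                   part (must be positive).
    hook_body:     instructions the hook runs before the copied finalization.
    """
    finalization = schedule_code[-final_len:]    # Step 1: sample the final block
    hook = list(hook_body) + list(finalization)  # and copy it at the end of the hook
    # Step 2: overwrite the final block with a jump to the hook.
    patched = schedule_code[:-final_len] + ["jmp schedule_hook"]
    return patched, hook


def run(patched, hook):
    """Flatten the executed instruction stream, following the injected jump."""
    trace = []
    for insn in patched:
        if insn == "jmp schedule_hook":
            trace.extend(hook)
            break
        trace.append(insn)
    return trace
```

Running the patched routine executes the original scheduling decision first, then the hook logic, and finally the copied finalization, so the return path is the same one the unpatched routine would have taken.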
As a consequence, the hook is able to check whether the
thread is a registered one (so that it needs to be extra-ticked)
by consulting the aforementioned fast access hash table im-
plementing the registration record, and in the positive case
it executes the following additional steps:
(A) It changes the LAPIC-timer period by scaling it on the
basis of a configuration parameter supported by our
kernel module. The scaling factor is what determines
the length of the extra-tick interval.
(B) It records in a per CPU-core entry of a proper control
table (still managed by the module) that the current
CPU-core is working in extra-tick mode.
(C) It records in a proper per registered-thread entry of a
control table (again managed by the module) a counter
of extra-ticks not yet consumed by such a thread within
the current tick period.
Clearly, the information recorded in point B is also used in
order to revert the LAPIC-timer configuration to the
original one. More in detail, if the scheduler passes
control to a non-registered thread, and the current CPU-core
is recorded as operating in extra-tick mode, then the
LAPIC-timer is restored to its initially configured counter
value, thus the scheduled thread will run with a classical
tick length, and the control record associated with the
CPU-core is reset in order to reflect that the CPU-core is
no longer operating in extra-tick mode. Note that this
approach works also in scenarios where the thread registered
within the dev_extra_tick device file loses control of the
CPU-core because of a passage into a sleep state (e.g. for
an I/O interaction). Overall the above scheme allows
restoring the LAPIC-timer configuration to the original one
each time a non-registered thread is (re)scheduled,
independently of any state-transition of registered (hence
extra-ticked) threads in the operating system state diagram.
Footnote 4: It has actually already taken control of the
CPU-core, since we are returning from the scheduling
process.
Figure 3: Dynamic patching of the kernel scheduler. [Diagram
labels: original kernel image and external module; the
schedule() address is taken from the System-map; Copy (Step
1) and Copy (Step 2); redirection code (based on jumps)
injected in the original schedule() function to pass control
to the scheduler hook; schedule_hook() function with ad-hoc
tick management for registered (Time Warp) threads.]
Figure 4: Stack and CPU context management by the
LAPIC-timer top-half hook. [Diagram labels: CPU-context (SP,
IP) saved/restored upon entering/exiting the LAPIC-timer
top-half; kernel stack area; user stack area; added element;
address of the extra-tick management callback function; Step
1 and Step 2.]
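The bookkeeping carried out by the scheduler hook (steps A to C, plus the revert path for non-registered threads) can be sketched as follows. This is a minimal Python model under explicit assumptions: the class and field names are hypothetical, the scaling is modeled as an integer division of the calibrated counter value, and the per-thread counter is assumed to start at the scaling factor.

```python
class ExtraTickModule:
    """Toy model of the scheduler hook's bookkeeping (steps A-C and the revert)."""

    def __init__(self, original_count, scale, registered_tids):
        self.original_count = original_count  # LAPIC-timer counter set at calibration
        self.scale = scale                    # assumed: extra-ticks per original tick
        self.registered = set(registered_tids)
        self.lapic_count = {}                 # per-core programmed counter value
        self.core_in_extra_tick = {}          # per-core control table (step B)
        self.extra_ticks_left = {}            # per-thread counter (step C)

    def schedule_hook(self, core, next_tid):
        """Called once the scheduler has decided that next_tid runs on core."""
        if next_tid in self.registered:
            self.lapic_count[core] = self.original_count // self.scale  # (A)
            self.core_in_extra_tick[core] = True                        # (B)
            self.extra_ticks_left[next_tid] = self.scale                # (C)
        elif self.core_in_extra_tick.get(core, False):
            # Non-registered thread on a core left in extra-tick mode:
            # restore the classical tick length and reset the record.
            self.lapic_count[core] = self.original_count
            self.core_in_extra_tick[core] = False
```

Note that the revert is driven only by the per-core record, so it also covers the case in which the registered thread left the CPU-core by going to sleep rather than by exhausting its ticks.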
Let us now analyze how the original top-half of the
LAPIC-timer interrupt has been patched in our architecture,
so that the extra-ticks can actually be exploited for
control flow variations of the dev_extra_tick registered
threads, namely the ones that in our overall architectural
organization run within the Time Warp platform. The patch
has been developed by targeting kernel version 3.16.7, but
it is of general use (except for a few minor modifications
that might be required for other kernel versions depending
on the exact path of execution of, e.g., very basic actions
in the preamble of the actual timer-interrupt management
logic; details on this aspect will come shortly).
Top-half modules in conventional Linux configurations are
made up of two different code blocks, a launcher and an
actual top-half procedure. The launcher takes control when
the CPU-core firmware accepts the interrupt. It is in charge
of aligning the kernel-level stack of the interrupted thread
to a proper snapshot and then of calling the actual top-half
module. Such a snapshot also includes the CPU-context to be
restored once the interrupt top-half procedure ends. This
includes the stack pointer (SP) and the instruction pointer
(IP) associated with the interrupted execution flow, which
will play a major role in our time-sharing architecture.
In our patching approach of the LAPIC-timer interrupt
management logic, we have still exploited the system-map to
locate the launcher code block in the kernel memory image,
and then we patched it by replacing the call to the original
top-half with one to a top-half hook function offered by the
external module that we have developed, which therefore
fully replaces the original top-half procedure. This top-half
hook is in charge of executing the same identical basic ac-
tions as those executed by the original top-half procedure
(such as acknowledging the accepted interrupt). However, it
discriminates if the interrupted thread is a dev_extra_tick
registered one (namely, one subject to extra-tick manage-
ment), and in the positive case it executes the following ac-
tions:
(i) It decreases the extra-tick counter associated with the
thread (as pointed out, this is the counter that is set
upon the reschedule of any thread registered on the
dev_extra_tick device file).
(ii) If the counter reaches zero, a whole originally-sized
tick-period has expired (hence the thread has consumed all
the extra-ticks granted to it in its current tick period).
In this case, the top-half hook calls the actual kernel
function used to update kernel-level timing information
(in most recent kernel versions this work is carried out
via the local_apic_timer_interrupt() kernel function).
This mimics the behavior of the original top-half manager
execution path, which would trigger the timing information
update function exactly at the end of each originally-sized
tick-period, hence upon any LAPIC-timer interrupt when
using the classical timer calibration.
(iii) The top-half hook changes the IP kept in the processor
image registered on the system stack upon interrupt
acceptance, so that the interrupted thread will gain
control in a proper machine code block upon the restore of
that image onto the CPU-core (namely, when returning from
the LAPIC-timer interrupt). Consequently, the top-half
hook also changes the application-level stack layout of
the thread by adding a program-counter return value that
will allow that code block to return control exactly to
the instruction interrupted by the extra-tick (namely, the
original IP value logged in the CPU-context snapshot on
the system stack). This is done by exploiting the SP value
from the logged CPU-context, which is then also modified
in order to reflect the insertion of a new element at the
top of the user-level stack. A schematization of the
performed operations is provided in Figure 4.
(iv) Finally, in case the extra-tick counter of the thread
registered within the dev_extra_tick device file reached
zero (see point (ii)), the counter is again filled with
the number of extra-ticks (say N) that the thread is
allowed to receive in the next tick period.
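Actions (i) to (iv) can be illustrated with a small user-space model. The sketch below is not the kernel module's code: the structures saved_context and extra_tick_thread, their fields, and the function top_half_hook_model() are hypothetical names introduced only for illustration, the call to local_apic_timer_interrupt() is replaced by a counter, and the check on whether the thread was already running in kernel mode is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the CPU-context snapshot logged on the system stack. */
struct saved_context {
    uintptr_t ip;    /* instruction pointer restored on interrupt return */
    uintptr_t *sp;   /* user-level stack pointer restored on interrupt return */
};

/* Hypothetical per-thread record kept for a registered thread. */
struct extra_tick_thread {
    int counter;                  /* extra-ticks left in the current tick period */
    int ticks_per_period;         /* N: extra-ticks granted per tick period */
    uintptr_t callback;           /* callback address posted at registration */
    unsigned long timing_updates; /* stand-in for local_apic_timer_interrupt() calls */
};

void top_half_hook_model(struct extra_tick_thread *t, struct saved_context *ctx)
{
    assert(t->ticks_per_period > 0);

    /* (i) consume one extra-tick */
    t->counter--;
    int period_expired = (t->counter == 0);

    /* (ii) a whole original tick period elapsed: update kernel timing info */
    if (period_expired)
        t->timing_updates++;

    /* (iii) push the interrupted IP on the user stack (which grows downward
     * on x86), then make the thread resume inside the callback */
    ctx->sp--;
    *ctx->sp = ctx->ip;
    ctx->ip = t->callback;

    /* (iv) refill the counter for the next tick period */
    if (period_expired)
        t->counter = t->ticks_per_period;
}
```

When the callback eventually executes a plain return, the value pushed in step (iii) acts as its return address, so the thread continues from the interrupted instruction.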
In our time-sharing Time Warp architecture, the address
of the code block that will take control thanks to the
instruction pointer variation in point (iii) is that of an
ad-hoc callback function of the Time Warp platform, which
will periodically (namely, at each extra-tick expiration)
bring control to the platform-level software along any Time
Warp worker thread. This address is posted to the kernel when
calling the same ioctl system call that is used for
registering the thread in the dev_extra_tick device file as
one to be extra-ticked. Overall, a Time Warp thread can
atomically register itself for being subject to extra-ticks
and post the address of the callback function whose execution
is activated thanks to the actions by the top-half hook of
the LAPIC-timer interrupt we provide within our module.

Figure 5: Behavior of the top-half hook for the LAPIC-timer
interrupt. [Flowchart boxes: LAPIC-timer interrupt; basic
actions; interrupted thread is registered?; decrease thread
extra-tick counter; counter is zero?; set counter to N;
local_apic_timer_interrupt(); kernel or interrupt context
already on?; 1: post IP onto the stack (and update SP);
2: post callback address onto IP; return from interrupt.]
The actual scheme according to which our top-half hook
for the LAPIC-timer interrupt works is depicted in Figure 5.
As one may observe, the hook version of the LAPIC-timer
interrupt manager is still lightweight given that the addi-
tional actions it performs (compared to the original version
of the top-half procedure) have constant time and are mostly
related to decrementing and (possibly) setting a counter (the
per-registered-thread extra-tick counter) and setting a few
memory locations, one in the application-level thread stack
(see point 1 in the bottom-right box of Figure 5, where the
IP present in the snapshot of the user level CPU-context,
which is already logged in the system level stack upon en-
tering the top-half hook, is saved onto the user level stack),
and the other ones in the user level CPU-context logged in
the stack (namely IP and SP values, that will be then re-
stored upon exiting the interrupt procedure).
As an additional note, care must be taken when receiving
an extra-tick along a thread which is already working at
kernel level. This may happen in case the extra-ticked thread
invoked a system call, or if it entered kernel mode because
of the receipt of a generic interrupt from some device (see
Figure 1) or even because of a generic trap (e.g. an
empty-zero memory access requiring the physical allocation of
the requested page). In this case, the control flow variation
must not be actuated, given that it would violate, e.g.,
atomicity rules for the execution of already activated
kernel-level code. This is reflected in our scheme by having
the control flow variation actuated only if the interrupted
thread was not already working in kernel mode and was not
already subject to an interrupt. Such a check has been
implemented in our top-half hook by simply checking whether
the IP to be restored contains a kernel-level address, and
by also checking whether an interrupt context was already
active on the current CPU-core (a check that is anyhow
already performed by the basic actions that are carried out
also by the original top-half procedure).
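The guard described above can be sketched as a simple predicate. This is only an illustration under stated assumptions: it assumes the usual x86-64 layout in which kernel addresses lie in the upper canonical half of the address space, and the names KERNEL_SPACE_START and may_redirect_control_flow() are hypothetical, not taken from the module.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64, where kernel-level addresses start at the upper
 * canonical half of the virtual address space. */
#define KERNEL_SPACE_START UINT64_C(0xffff800000000000)

/* Returns true only when it is safe to redirect the interrupted thread to
 * the platform callback: the IP to be restored must be a user-level address
 * and no interrupt context must already be active on this CPU-core. */
bool may_redirect_control_flow(uint64_t saved_ip, bool interrupt_context_active)
{
    if (saved_ip >= KERNEL_SPACE_START)
        return false;   /* thread was already running kernel-level code */
    if (interrupt_context_active)
        return false;   /* nested inside another interrupt */
    return true;
}
```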
The periodic execution flow variation induced via extra-tick
management, although similar in spirit to the one adopted for
handling POSIX signals, is much more lightweight, given that
on-demand usage of timer-based signals would require passing
through the whole scheduling process (and also through system
calls for requesting the activation of each individual
alarm). Also, conventional signal handlers for timer-based
signals cannot operate with the same time granularity we can
impose in our architecture (namely the extra-tick period),
because, in the worst case, a whole tick period might be
required in order for the kernel to take control back and
determine that some alarm has expired for the thread.
As a final note, we ensure execution safety while mounting
the kernel module (see footnote 5) by synchronizing all the
CPU-cores during the mounting operation via the
smp_call_function() kernel function, which can be used to
trigger the execution of the same code block on all the
cores. We have used it to implement a master/slave protocol,
similar in spirit to the one used while booting the Linux
kernel, where a single CPU-core executes the actual patching
of the kernel code (by also temporarily disabling write
protection on the kernel image) during module startup, while
the other CPU-cores wait for the master to finish. In this
way, no critical race can take place, since no CPU-core can
access the not yet finalized image of the kernel.
3.3 Detection and Management of Event and
Task Priority Variations
The extra-tick architecture presented in the previous
section is of general applicability, in terms of its ability
to periodically bring control back to a specific portion of
the platform-level software, and to dynamically (re)schedule
on CPU higher-priority tasks. On the other hand, how to
exploit it within an optimistic PDES platform depends on
proper platform internals.
In this section we discuss the integration we performed
between the extra-tick manager and the ROOT-Sim open source
optimistic simulation platform. However, given that this
platform implements some relevant reference architectural
solutions specifically tailored for multi-core environments
(see [27]), the presented integration, beyond giving rise to
a specific time-sharing Time Warp solution, can also be seen
as one providing reference guidelines for time-sharing
organizations of optimistic PDES.
The core ROOT-Sim aspect that is of interest in this
discussion is the management of the message (or anti-message)
exchange across different worker threads operating within
the platform. In more detail, given that ROOT-Sim manages any
subset of simulation objects, say S, by (temporarily) binding
them to a specific worker thread, say t, and the adopted
CPU-dispatching rule is the classical Lowest-Timestamp-First
(LTF), the CPU-dispatched event associated with the objects
in S is always the one with highest priority.
The only exception is when messages, or anti-messages,
produced while concurrently running other simulation objects
along other worker threads, and destined to some object
belonging to the set S handled by t, carry a timestamp which
is lower than the one associated with the last-dispatched
lowest-timestamp event. Clearly, this cannot be known before
the CPU-schedule operation along thread t.
Footnote 5: We recall that during the mount operation, the
module initialization function rewrites some parts of the
kernel code at run-time, which is a critical procedure.
In ROOT-Sim, the exchange of messages/anti-messages
across different worker threads does not take place by
directly incorporating the corresponding information into the
destination object event queue. Rather, messages are
exchanged according to a top/bottom-half approach, again
oriented to scalability. Particularly, each worker thread
manages a set of bottom-half queues (one for each simulation
object it is currently handling) such that any other worker
thread in the system can notify the presence of new data
to be ultimately incorporated into the destination object's
event queue via the corresponding bottom-half queue. Checking
whether some new data is present in a bottom-half queue, and
actual processing of the data with incorporation into the
destination event queue, is carried out exclusively by the
worker thread in charge of (currently) handling the
destination object. This scheme is complemented by a
constant-time management of the critical section for
inserting/deleting elements into/from any bottom-half queue,
which allows actual thread synchronization to scale.
The above scheme has been extended in order to perform
the integration with the extra-tick management architecture
along the following lines. First, each worker thread t
operating in the Time Warp platform has been associated with
a BH_min_t record, which represents at any time instant the
minimum timestamp of any message/anti-message that has been
recorded in any of the bottom-half queues associated with the
simulation objects that t is currently managing, since the
last flush operation of these queues. In other words,
BH_min_t represents the minimum value among the timestamps of
information in transit (if any), which is destined to some
simulation object handled by t.
This record is initialized to the special macro INFTY (via a
single atomic assignment operation) when the worker thread
t accesses its bound bottom-half queues and flushes the data
into the corresponding event queues. Whenever a different
worker thread inserts a bottom-half record into any of the
bottom-half queues associated with the simulation objects
managed by t, the reduction BH_min_t = Min(BH_min_t, T)
is performed, where T represents the timestamp of the
message/anti-message that is being placed into the
destination bottom-half queue. In our implementation, this
reduction is performed via an atomic Compare-And-Swap (CAS)
instruction. This allows manipulating BH_min_t while not
requiring worker threads that concurrently access two
distinct bottom-half queues bound to t to execute a
conflicting critical section (see footnote 6).
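A CAS-based minimum reduction of this kind can be sketched with C11 atomics. The sketch is an illustration rather than ROOT-Sim code: it assumes that timestamps are stored as signed 64-bit integers, and the type atomic_timestamp and the functions bh_min_reduce() and bh_min_reset() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

#define INFTY INT64_MAX   /* assumption: "no message in transit" marker */

/* Assumption: timestamps are int64_t values for the sake of the sketch. */
typedef _Atomic int64_t atomic_timestamp;

/* Performs bh_min = Min(bh_min, ts) without taking any lock. */
void bh_min_reduce(atomic_timestamp *bh_min, int64_t ts)
{
    int64_t seen = atomic_load(bh_min);
    while (ts < seen) {
        /* Try to install ts; on failure 'seen' is refreshed with the value
         * written by a concurrent thread and the comparison is repeated. */
        if (atomic_compare_exchange_weak(bh_min, &seen, ts))
            break;
    }
}

/* Executed by the owner thread when it flushes its bottom-half queues. */
void bh_min_reset(atomic_timestamp *bh_min)
{
    atomic_store(bh_min, INFTY);
}
```

The loop exits either because the new timestamp has been installed or because another thread has already recorded an equal or lower one, so concurrent inserters never need a shared critical section for this record.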
Footnote 6: In fact, each of them needs to temporarily lock a
different bottom-half queue for data insertion, which helps
avoid hampering concurrency [27].

Another record, called current_time_t, is associated with
each worker thread t. It is used to keep track of the
timestamp of the current simulation event, if any, that has
been CPU-dispatched along t (this is the lowest-timestamp
event according to LTF). The value of current_time_t is set
to the special value -1 in case thread t is not currently
processing any event (e.g. it is running housekeeping
operations within the Time Warp platform).
The values of current_time_t and BH_min_t are used by
the callback function that takes control via the extra-tick
mechanism in order to determine whether some higher priority
task (compared to the one currently processed by the CPU
along thread t) needs to be CPU-dispatched. Particularly, the
callback function executing along t has the following simple
structure:

void tick_manager()
    if (current_time_t <= BH_min_t)
        return;
    else
        switch_to_platform_context();

The above structure allows changing the current execution
flow along thread t in case:
1) The simulation object currently dispatched for event
execution along t needs to roll back, since it is the
recipient of a message or anti-message in its past (namely
BH_min_t corresponds to the timestamp of a message/
anti-message destined to the currently running simulation
object). In this case the rollback operation will take
place in a preemptive mode, relying on the time-sharing
organization and on the (periodic) regain of control at
the Time Warp platform level.
2) Some generic simulation object managed by t dynamically
gains a priority higher than that of the currently running
one, since it becomes the recipient of some message or
anti-message with a timestamp lower than that of the last
lowest-timestamp CPU-scheduled event. The case of an
incoming anti-message is again representative of a
causally inconsistent execution at the destination
simulation object, given the adopted LTF rule for the
CPU-dispatching of the events by any worker thread t.
Overall, in either case, control must return to the Time
Warp platform layer, so that the higher priority task
(whether a rollback operation or not) is promptly executed.
In the pseudo-code this is achieved via the invocation of the
switch_to_platform_context() function, whose actual support,
as well as the support for correct management of individual
(and separate) simulation object contexts, is described in
the next section.
On the other hand, in case no higher priority task needs
to be executed, the forward execution of the last
CPU-dispatched event is immediately resumed, given that the
tick_manager callback function simply returns, thus taking
control back to the point where the original execution flow
was interrupted by the extra-tick at the heart of the
time-sharing architecture.
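The decision taken by the callback can be written as a pure C function, which makes the two special values easy to check. This is a sketch under assumptions: timestamps are modeled as signed 64-bit integers, the callback is assumed to return whenever the current event timestamp does not exceed the bottom-half minimum, and the names tick_action, NO_EVENT and tick_manager_decision() are hypothetical.

```c
#include <stdint.h>

#define INFTY    INT64_MAX   /* no message/anti-message in transit */
#define NO_EVENT (-1)        /* thread is not processing any event */

enum tick_action { RESUME_EVENT, SWITCH_TO_PLATFORM };

/* Assumption: the interrupted event is resumed unless some in-transit
 * message carries a timestamp strictly lower than the current event's. */
enum tick_action tick_manager_decision(int64_t current_time, int64_t bh_min)
{
    if (current_time <= bh_min)
        return RESUME_EVENT;
    return SWITCH_TO_PLATFORM;
}
```

With this encoding, a thread running housekeeping code (NO_EVENT) or a thread with empty bottom-half queues (INFTY) always resumes, so only a message or anti-message in the past of the running event triggers the switch.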
3.4 Support for Context Switches
The management of different per-simulation-object contexts
(as well as the platform context) is based on the ROOT-Sim
support that has been introduced in [16] to create stack
separation across the different simulation objects. This is
achieved by locating the stack of each object in a portion
of memory destined for object usage (e.g. when memory
chunks are dynamically requested while executing events at
that object) via proper (and application transparent)
allocation layers [16, 26]. On the other hand, each worker
thread t also has a platform context associated with it,
which is in turn associated with a proper stack area located
in a different, and disjoint, memory region.
Execution resume in the different stacks, such as when
switch_to_platform_context() is executed, has been supported
via the setjmp and longjmp APIs. They have also been used as
the support for, e.g., squashing the stack image of the
currently-executing simulation object in case a preemptive
rollback occurs within the time-sharing Time Warp system
(which eventually leads the object to resume execution with
a different context, namely a logged one that is then put
back in place).
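The control transfer itself can be sketched with the standard setjmp and longjmp functions. The sketch below is a simplification, not ROOT-Sim code: it runs the platform and the event on the same stack rather than on separate per-object stacks, and the names platform_context, events_completed, process_event() and dispatch_event() are hypothetical.

```c
#include <setjmp.h>

jmp_buf platform_context;   /* hypothetical saved platform context */
int events_completed;       /* events that ran to completion */

/* Abandons the running event and resumes in the platform context. */
void switch_to_platform_context(void)
{
    longjmp(platform_context, 1);
}

/* Hypothetical event handler: 'preempt' models an extra-tick callback that
 * found a higher priority task while the event was in progress. */
void process_event(int preempt)
{
    if (preempt)
        switch_to_platform_context();   /* does not return */
    events_completed++;
}

/* Returns 1 if the event was preempted, 0 if it ran to completion. */
int dispatch_event(int preempt)
{
    if (setjmp(platform_context) != 0)
        return 1;   /* resumed here through longjmp */
    process_event(preempt);
    return 0;
}
```

A preempted event never reaches its completion point, which mirrors the squashing of the stack image of the running simulation object described above.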
3.5 Support for Safe Platform Mode Execution
A final core aspect deals with the fact that an advanced
Time Warp platform supports applications by allowing the
actual application code to live in a piece-wise-deterministic
environment which is able, transparently to the application
code, to support recovery of any incorrect trajectory of the
application state possibly caused by causality errors in the
speculative execution path.
A classical way to achieve this is the one where any
interaction between the application code and external
services, such as those offered by standard third-party
programming libraries, is intercepted by the Time Warp
environment and is handled according to proper rules.
Classical examples are the ones where recoverable dynamic
memory services and/or recoverable I/O services are exposed
to the application via standard interfaces (e.g. malloc or
printf) but are handled (to make them actually recoverable)
via proper logic at the level of the Time Warp platform. Such
a kind of application-transparent intervention by the
platform-level software may reach a granularity so fine that
even a single machine instruction can be intercepted and made
recoverable, as when relying on binary code instrumentation
to track memory writes at run-time, and to dynamically log
recoverability data so as to retain the possibility to undo
the update [18].
According to this view, independently of the presence of
any time-sharing support like the one we provide, an advanced
Time Warp platform can already be seen as a system working
according to a dual-mode execution model, where the
application can trap into platform mode just because of some
(seamless) access to one of the above mentioned services. On
the other hand, the actual implementation of such
platform-level services, in order to be correct, may require
atomicity. Just to mention an example, locks on specific data
structures or memory regions might be acquired by some
worker thread once the application has trapped into platform
mode along that thread in order to correctly manage the
triggered service [1].
This atomicity must be guaranteed also in case of the
time-sharing architecture we provide. Hence, if the
tick-management callback is triggered while the running
thread has already trapped into platform mode in a seamless
manner on behalf of the application code processing the
current simulation event, our choice is to avoid any
variation of the control flow (independently of the actual
presence of higher priority tasks), so as not to interfere
with the already entered platform-level code block. In order
to achieve this target in ROOT-Sim, we have reorganized the
platform-level software in such a way that any entry point
for actual platform operations has been augmented by a
wrapper that atomically sets a flag indicating that the
thread is running in platform mode. The reverse action, which
resets the flag to application mode, is actuated via wrappers
intercepting the return of any platform-level service
possibly activated while processing some event via the
aforementioned trap/interception mechanism. In case a
callback for managing some extra-tick is received while
running a platform-mode phase during the processing of an
event in the application software, then the callback simply
returns control to the interrupted execution point.

Figure 6: Management of extra-ticks in the interleave between
application and platform code blocks within an event
processing wall-clock-time window. [Timeline along
wall-clock-time: start of event processing; execution of pure
application code; printf(); platform code execution
(recoverable printf()); execution of pure application code.
At time T1 the tick-manager callback immediately returns to
the already running platform code block.]
A schematization of this behavior is provided in Figure 6,
where we show the arrival of an extra-tick at wall-clock-
time T1, with consequent activation of the extra-tick man-
agement callback function, and where the callback simply
returns given that at the same time instant the platform-
level software was already handling, possibly via proper re-
coverability rules, some I/O operation invoked by the event
processing code implemented at the application level.
We also note that the approach where no control flow
variation is actuated in case the platform mode has been
already entered, e.g., because of an interception of some
third-party library call while processing the current event
by some worker thread, also allows safety of the execution
of any platform-level facility ultimately relying on external
user space libraries, such as libc-xx.so, given that the
control flow in these libraries is never changed by any
extra-tick arrival.
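The interplay between the platform-mode flag and the extra-tick callback can be sketched as follows. This is a minimal single-thread model under stated assumptions, not the ROOT-Sim implementation: the structure worker and the functions tick_manager(), recoverable_output() and platform_service_wrapper() are hypothetical, the context switch is replaced by a counter, and the extra-tick is simulated by calling the callback from inside the service.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-worker-thread state. */
struct worker {
    atomic_bool platform_mode;  /* set while platform-level code is running */
    int64_t current_time;       /* timestamp of the event in progress */
    int64_t bh_min;             /* minimum timestamp in the bottom-half queues */
    int switches;               /* stand-in for switch_to_platform_context() */
    int service_calls;
};

/* Extra-tick callback: never diverts control while in platform mode. */
void tick_manager(struct worker *w)
{
    if (atomic_load(&w->platform_mode))
        return;                 /* resume the interrupted platform code */
    if (w->current_time <= w->bh_min)
        return;                 /* no higher priority task is pending */
    w->switches++;
}

/* Hypothetical platform service; the call to tick_manager() simulates an
 * extra-tick delivered while the service is in progress. */
void recoverable_output(struct worker *w)
{
    tick_manager(w);
    w->service_calls++;
}

/* Wrapper placed at a platform entry point: sets the flag on entry and
 * resets it to application mode on return. */
void platform_service_wrapper(struct worker *w, void (*service)(struct worker *))
{
    atomic_store(&w->platform_mode, true);
    service(w);
    atomic_store(&w->platform_mode, false);
}
```

In this model an extra-tick delivered inside the wrapped service leaves the control flow untouched even though a lower timestamp is pending, while the same callback invoked in application mode requests the switch.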
4. EXPERIMENTAL RESULTS
We have executed experiments with the time-sharing version
of ROOT-Sim by running this platform on top of a 32-core
64-bit NUMA machine, namely an HP ProLiant server, equipped
with four 2GHz AMD Opteron 6128 processors (each one equipped
with 8 CPU-cores) and 64 GB of RAM. The operating system is
Linux SUSE, kernel version 3.16.7, augmented with our
extra-tick management architecture. In the original
configuration of the kernel, the LAPIC-timer was set to issue
an interrupt every 1 millisecond. When running in extra-tick
mode we configured the LAPIC-timer to send an interrupt 10
times more frequently, thus every 100 microseconds.
As the benchmark for assessing the effectiveness of our
time-sharing proposal, we used a real-world cellular system
simulator, which has already been used as a reference
benchmark application in a number of other studies oriented
to optimistic PDES (see, e.g., [18]). In this application,
each simulation object models a wireless cell, and we
selected a total number of 1024 cells (represented as
hexagons), each one managing 1000 wireless channels, which
provide coverage to mobile devices in a squared region. The
model is high fidelity in terms of how interference across
different channels within the same cell, and power
management upon call setup/handoff, are captured/actuated.
Particularly, the application handles power management
simulation according to the results in [13]. The application
is also highly parameterizable, allowing the recalculation
of fading coefficients and actual Signal-to-Interference
Ratio (SIR) both on the occurrence of specific events (e.g.
the startup of a call) and periodically (so as to account
for, e.g., changes of conditions in the coverage area). Also,
the inter-arrival time of calls to mobile devices residing in
the coverage area can be configured, thus leading to
different values for the wireless channels' utilization
factor. This, in turn, affects both memory and CPU demand by
the simulation, given that higher utilization factors lead to
the need for keeping more records for simulating the
concurrently active calls in any cell, and also to more
costly operations for scanning and (possibly) updating these
records. As a final preliminary note, the interaction across
the different simulation objects takes place upon the
occurrence of a handoff of a mobile device involved in an
on-going communication, in which case the wireless channel
at the source cell is released, and an attempt is made to
reserve a new one in the destination cell.
In our experiments we set the average residual residence
time in the current cell for a mobile device involved in an
on-going call to 5 min, while the average call duration was
set to 2 min. Both quantities follow exponential
distributions. Also, we have run this model with three
different settings for the channel utilization factor, namely
25%, 50% and 75%, determined by different call inter-arrival
rates, with balanced workload on all the simulation objects
(fairly distributed on 32 worker threads operating within the
ROOT-Sim platform on top of the 32 CPU-core machine), and
with periodic recalculations of the fading coefficients of
active channels. These settings give rise to variations of
the simulation event's average CPU requirement from about
70/80 microseconds to about 150 microseconds. This way we
achieved differentiated configurations in terms of the
relation between the event execution granularity and the
granularity of the extra-ticks' interval (recall this has
been set to 100 microseconds). In other words, the adopted
settings allowed us to determine different actual likelihoods
for an extra-tick to interrupt an on-going event (in fact
such a likelihood is higher when the event granularity is
greater), which gave us the possibility to assess our
time-sharing architecture when changing the likelihood that a
higher priority task can be detected as pending while the
execution of an event is in progress. Further, the
configuration with finer granularity of the events (namely
the one with 25% channel utilization factor) is also well
suited for assessing the overhead of our proposal, given the
reduced likelihood for the extra-tick to occur while an event
is in progress rather than while platform-level housekeeping
operations are running, which leads to a case where the
extra-tick simply returns control to the original point (but
a cost has anyhow been paid for delivering it, hence for
delivering control to the callback offered by the Time Warp
system).
In this experimentation we compare the original execution
dynamics of ROOT-Sim, namely those based on the classical
batch-multitask paradigm for processing the events (where
control is returned to the underlying platform for
dispatching other events/tasks only at the end of the
event-handler processing routine) with those achieved via the
integration of the time-sharing support (which allows for
passing control to events/tasks that are dynamically revealed
to be higher priority ones). Particularly, in Figure 7 we
report the execution time (computed as the average over 5
different samples, each referring to a different setting of
the pseudo-random seeds used for stochastically driving model
execution) for simulating a specific virtual time interval in
the different configurations of the benchmark application and
with either batch-multitask or time-sharing support. Also, we
decided to run with the checkpoint interval fixed to the
value 10 for all the simulation objects in order not to
introduce fluctuations and/or variations in performance
possibly caused by some adaptive mechanism to select the
checkpoint frequency, which might interfere with the actual
performance variations natively imputable to the two
different execution schemes we are comparatively studying,
namely time-sharing vs batch-multitask. From the data we can
draw the following main observations. First, the results
related to the configuration with 25% channel utilization
factor, which as hinted before is essentially useful for
overhead assessment, actually show a minimal overhead by the
time-sharing support, given that the execution times for the
two different supports are essentially the same. On the other
hand, as soon as the event granularity is increased (namely
for higher values of the channel utilization factor) we get
an actual reduction of the execution time achieved with the
time-sharing support (as compared to the traditional
batch-multitask support). Specifically, the gain in
performance is of the order of 8% for the case of utilization
factor set to 50%, and of the order of 15% when the
utilization factor is further increased up to 75%. For
completeness, we also report in Table 1 the corresponding
execution times for the case of a serial execution of the
same identical application code on top of a sequential
scheduler based on the Calendar-Queue data structure [2],
which allows determining the speedup of the parallel runs,
and hence whether this study refers to competitive parallel
performance.

Figure 7: Execution time results. [Bar chart: execution time
(seconds, axis from 0 to 300) versus channel utilization
factor (25%, 50%, 75%), for batch-multitask and
time-sharing.]

Channel Utilization Factor    25%     50%     75%
execution time (sec)          3500    5145    7610
Table 1: Performance of the serial simulator.
The data in Figure 8 explain the actual source of the
performance gain by the time-sharing support compared to the
batch-multitask one. Specifically, we report in this figure
the variation of the ratio between the optimistic run
efficiency observed with the time-sharing support and the one
observed with the batch-multitask support. We recall that the
efficiency of an optimistic PDES run represents the
percentage of productive work executed (processed simulation
events that are not eventually rolled back). Hence it is an
expression (and a derivation) of the actual rollback pattern
(and amount). From the data we see that, for the
configuration with 25% channel utilization factor, the two
different supports provide in practice the same efficiency
values. This is aligned with our previous observations in
relation to the application configuration with finer event
granularity, which exhibits a reduced likelihood of actual
interruption of an event processing phase by the extra-ticks,
with consequent reduction of the possibility to change the
rollback pattern in the time-sharing support by passing
control to some higher priority task (e.g. a simulation event
with smaller timestamp) dynamically injected in the system.
On the other hand, increasing the event granularity leads to
scenarios with increased ability by the time-sharing support
to actually track (while an event is already being processed
on the CPU) whether higher priority activities need to be
carried out along the same worker thread and to dynamically
reschedule them on the CPU, which limits the negative impact
of, e.g., out-of-timestamp-order processing. This, in turn,
leads the relative efficiency of the time-sharing support to
increase, compared to the batch-multitask one, by up to 9%.
Overall, the time-sharing configuration shows higher
likelihood to perform useful (not eventually rolled back)
work, especially for the case of larger granularity events,
and this is achieved with negligible overhead (by the
time-sharing support), with positive effects on performance.

Figure 8: Time-sharing efficiency over batch-multitask
efficiency. [Plot: efficiency ratio
(time-sharing/batch-multitask, axis from 0.94 to 1.1) versus
channel utilization factor (25%, 50%, 75%).]
In Figure 9 we provide an additional plot where we show
the variation of the amount of actual event preemptions
(hence execution flow variations) per execution time unit
achieved while running in time-sharing mode for the three
different configurations of the channel utilization factor we
have considered in this study. Aligned with the previous
results, the data reported in this plot show how the
configurations with larger event granularity (namely, higher
channel utilization factor) manifest a larger number of
actual preemptions per wall-clock-time unit. On the other
hand, the interesting point in this plot is that it does not
scale linearly, which is a reflection of the fact that
greater event granularity on the one hand allows for more
opportunities of preemption (since extra-ticks will more
likely be delivered while simulation-event processing is in
progress), but on the other hand, parallel runs with larger
grain events are typically (at least for balanced models like
the ones we are considering) less subject to divergence of
the simulation clocks of the different concurrent simulation
objects, which leads to reduced likelihood that the
extra-tick callback function actually finds some higher
priority task (e.g. the rollback of the currently running
simulation object) to be carried out. However, this somewhat
non-linear shape of the curve in Figure 9 looks intrinsic to
the nature of Time Warp dynamics, hence it is not a specific
limitation of the time-sharing Time Warp approach we have
presented.

Figure 9: Number of preemptions per wall-clock-time unit when
running in time-sharing mode. [Plot: number of preemptions
per wall-clock-time unit (axis from 0 to 350) versus channel
utilization factor (25%, 50%, 75%).]
5. CONCLUSIONS
In this article we have presented a lightweight support for
allowing time-shared execution of Time Warp platforms on
top of conventional operating systems, such as Linux, and
multi-core machines.
In our proposal, any individual worker thread operating
within the Time Warp system can be interrupted with high
frequency (a frequency much higher than conventional timer
interrupts in classical operating systems' configurations) and
with low interrupt-management overhead, in order to
determine whether any higher priority task (e.g. a rollback to
be promptly processed) or event (e.g. with lower timestamp
compared to the currently executed one) needs to be
CPU-dispatched. This allows for reacting to actual changes of the
priorities of the activities to be carried out within the Time
Warp run, with consequent (possible) advantages in terms
of reduction of the amount of wasted work.
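The check performed at each extra tick, as summarized above, can be sketched in user-space terms. This is a minimal illustration under stated assumptions, not the platform's code: `WorkerState`, `inject`, and `on_extra_tick` are hypothetical names, and the actual mechanism runs as a callback driven by the kernel module rather than as a plain function call.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkerState:
    """Hypothetical per-thread view consulted at each extra tick."""
    current_timestamp: float            # timestamp of the event being executed
    rollback_pending: bool = False      # a rollback hit the running simulation object
    pending: List[float] = field(default_factory=list)  # min-heap of event timestamps


def inject(state: WorkerState, timestamp: float) -> None:
    """Record a newly arrived event, keeping the lowest timestamp on top."""
    heapq.heappush(state.pending, timestamp)


def on_extra_tick(state: WorkerState) -> str:
    """Decide what to do when control returns to the platform.

    A pending rollback takes precedence; otherwise the running event is
    preempted only if some pending event has a lower timestamp.
    """
    if state.rollback_pending:
        return "rollback"
    if state.pending and state.pending[0] < state.current_timestamp:
        return "preempt"
    return "resume"
```

In this sketch, a worker executing an event with timestamp 10.0 keeps running after an event with timestamp 12.0 is injected, and is preempted once an event with timestamp 7.5 arrives.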
Our operating-system-based support for time-sharing has
been integrated into an open source Time Warp platform,
and we have also reported experimental data for an assessment
of both the overhead of our proposal and its final effectiveness
in terms of improvements of the execution speed of Time
Warp simulations on multi-core machines.
6. REFERENCES
[1] F. Antonacci, A. Pellegrini, and F. Quaglia.
Consistent and ecient output-streams management
in optimistic simulation platforms. In Proceedings of
the ACM SIGSIM Conference on Principles of
Advanced Discrete Simulation, (SIGSIM-PADS),
Montreal, QC, Canada, May 19-22, 2013, pages
315{326, ACM Press, 2013.
[2] R. Brown. Calendar queues: a fast O(1) priority queue
implementation for the simulation event set problem.
Communications of the ACM, 31(10):1220–1227, 1988.
[3] C. D. Carothers. Xsim: real-time analytic parallel
simulations. In Proceedings of the Workshop on
Parallel and Distributed Simulation (PADS),
Washington, D.C., USA, May 12-15, 2002, pages
27–34, IEEE Computer Society, 2002.
[4] C. D. Carothers, D. W. Bauer, and S. Pearce. ROSS:
A high-performance, low-memory, modular time warp
system. Journal of Parallel and Distributed
Computing, 62(11):1648–1669, 2002.
[5] C. D. Carothers, K. S. Perumalla, and R. M. Fujimoto.
Ecient optimistic parallel simulations using reverse
computation. ACM Transactions on Modeling and
Computer Simulation, 9(3):224{253, 1999.
[6] S. R. Das, R. M. Fujimoto, K. Panesar, D. Allison,
and M. Hybinette. GTW: a time warp system for
shared memory multiprocessors. In Proceedings of the
26th Winter Simulation Conference, pages 1332–1339.
Society for Computer Simulation International, 1994.
[7] P. De, R. Kothari, and V. Mann. Identifying sources
of operating system jitter through fine-grained kernel
instrumentation. In Proceedings of the IEEE
International Conference on Cluster Computing, 17-20
September 2007, Austin, Texas, USA, pages 331–340,
IEEE Computer Society, 2007.
[8] K. B. Ferreira, P. Bridges, and R. Brightwell.
Characterizing application sensitivity to OS
interference using kernel-level noise injection. In
Proceedings of the 2008 ACM/IEEE Conference on
Supercomputing (SC), November 15-21, 2008, Austin,
Texas, USA, pages 19:1–19:12, ACM Press, 2008.
[9] HPDCS Research Group. ROOT-Sim: The ROme
OpTimistic Simulator - v 1.0.
http://www.dis.uniroma1.it/hpdcs/ROOT-Sim/ (last
accessed: May 2015).
[10] D. R. Jeerson. Virtual Time. ACM Transactions on
Programming Languages and System, 7(3):404{425,
1985.
[11] D. R. Jeerson, B. Beckman, F. Wieland, L. Blume,
M. D. Loreto, P. Hontalas, P. Laroche, K. Sturdevant,
J. Tupman, L. V. Warren, J. J. Wedel, H. Younger,
and S. Bellenot. Distributed simulation and the Time
Warp operating system. In Proceedings of the Eleventh
ACM Symposium on Operating System Principles
(SOSP), Austin, Texas, November 8{11, pages 77{93,
ACM Press, 1987.
[12] P. D. Barnes Jr., C. D. Carothers, D. R. Jefferson, and
J. M. LaPre. Warp speed: executing Time Warp on
1,966,080 cores. In Proceedings of the ACM SIGSIM
Conference on Principles of Advanced Discrete
Simulation, (SIGSIM-PADS), Montreal, QC, Canada,
May 19-22, 2013, pages 327–336, ACM Press, 2013.
[13] S. Kandukuri and S. Boyd. Optimal power control in
interference-limited fading wireless channels with
outage-probability specications. IEEE Transactions
on Wireless Communications, 1(1):46{55, 2002.
[14] D. E. Martin, T. J. McBrayer, and P. A. Wilsey.
WARPED: A Time Warp simulation kernel for
analysis and application development. In Proceedings
of the 29th Hawaii International Conference on System
Sciences (HICSS), Volume 1: Software Technology
and Architecture, January 3-6, 1996, Maui, Hawaii,
USA, pages 383–386. IEEE Computer Society, 1996.
[15] D. M. Nicol and X. Liu. The dark side of risk (what
your mother never told you about Time Warp). In
Proceedings of the Eleventh Workshop on Parallel and
Distributed Simulation (PADS), Lockenhaus, Austria,
June 10-13, 1997, pages 188–195, IEEE Computer
Society, 1997.
[16] A. Pellegrini and F. Quaglia. Transparent multi-core
speculative parallelization of DES models with event
and cross-state dependencies. In Proceedings of the
ACM SIGSIM Conference on Principles of Advanced
Discrete Simulation, (SIGSIM-PADS), Denver, CO,
USA, May 18-21, 2014, pages 105–116, ACM Press,
2014.
[17] A. Pellegrini, R. Vitali, and F. Quaglia. The ROme
OpTimistic Simulator: Core internals and
programming model. In Proceedings of the 4th
International ICST Conference on Simulation Tools
and Techniques (SIMUTools), Barcelona, Spain,
March 22-24, pages 96–98, ICST, 2011.
[18] A. Pellegrini, R. Vitali, and F. Quaglia. Autonomic
state management for optimistic simulation platforms.
IEEE Transactions on Parallel and Distributed
Systems (preprint), May 2014,
doi:10.1109/TPDS.2014.2323967.
[19] B. R. Preiss, W. M. Loucks, and D. MacIntyre. Effects
of the checkpoint interval on time and space in Time
Warp. ACM Transactions on Modeling and Computer
Simulation, 4(3):223–253, 1994.
[20] P. Putnam, P. A. Wilsey, and K. V. Manian. Core
frequency adjustment to optimize Time Warp on
many-core processors. Simulation Modelling Practice
and Theory, 28:55–64, November 2012.
[21] F. Quaglia. A cost model for selecting checkpoint
positions in Time Warp parallel simulation. IEEE
Transactions on Parallel and Distributed Systems,
12(4):346–362, 2001.
[22] F. Quaglia and V. Cortellessa. On the processor
scheduling problem in Time Warp synchronization.
ACM Transactions on Modeling and Computer
Simulation, 12(3):143–175, 2002.
[23] A. Santoro and F. Quaglia. Software supports for
event preemptive rollback in optimistic parallel
simulation on Myrinet clusters. Journal of
Interconnection Networks, 6(4):435–457, 2005.
[24] S. K. Seal and K. S. Perumalla. Reversible parallel
discrete event formulation of a TLM-based radio signal
propagation model. ACM Transactions on Modeling
and Computer Simulation, 22(1):4:1–4:23, 2011.
[25] S. Seelam, L. L. Fong, A. N. Tantawi, J. Lewars,
J. Divirgilio, and K. Gildea. Extreme scale computing:
Modeling the impact of system noise in multicore
clustered systems. In Proceedings of the 24th IEEE
International Symposium on Parallel and Distributed
Processing (IPDPS), Atlanta, Georgia, USA, 19-23
April 2010, IEEE Computer Society, pages 1–12, 2010.
[26] R. Toccaceli and F. Quaglia. DyMeLoR: Dynamic
Memory Logger and Restorer library for optimistic
simulation objects with generic memory layout. In
Proceedings of the Workshop on Principles of
Advanced and Distributed Simulation (PADS), Roma,
Italy, June 3-6, pages 163–172. IEEE Computer
Society, 2008.
[27] R. Vitali, A. Pellegrini, and F. Quaglia. Towards
symmetric multi-threaded optimistic simulation
kernels. In Proceedings of the Workshop on Principles
of Advanced and Distributed Simulation (PADS),
Zhangjiajie, China, July 15-19, pages 211–220. IEEE
Computer Society, Aug. 2012.
