Alpaca: Intermittent Execution without Checkpoints by Maeng, Kiwan et al.
Alpaca: Intermittent Execution without Checkpoints
KIWAN MAENG, Carnegie Mellon University, USA
ALEXEI COLIN, Carnegie Mellon University, USA
BRANDON LUCIA, Carnegie Mellon University, USA
The emergence of energy harvesting devices creates the potential for batteryless sensing and computing
devices. Such devices operate only intermittently, as energy is available, presenting a number of challenges for
software developers. Programmers face a complex design space requiring reasoning about energy, memory
consistency, and forward progress. This paper introduces Alpaca, a low-overhead programming model for
intermittent computing on energy-harvesting devices. Alpaca programs are composed of a sequence of user-
defined tasks. The Alpaca runtime preserves execution progress at the granularity of a task. The key insight in
Alpaca is the privatization of data shared between tasks. Updates of shared values in a task are privatized and
only committed to main memory on successful execution of the task, ensuring that data remain consistent
despite power failures. Alpaca provides a familiar programming interface and a highly efficient runtime model.
We also present an alternate version of Alpaca, Alpaca-undo, that uses undo-logging and rollback instead of
privatization and commit. We implemented a prototype of both versions of Alpaca as an extension to C with
an LLVM compiler pass. We evaluated Alpaca and directly compared it to three systems from prior work. Alpaca
consistently improves performance compared to the previous systems, by up to 23.8x, while also improving
memory footprint in many cases, by up to 17.6x.
1 INTRODUCTION
The emergence of extremely energy-efficient processor architectures creates the potential for
computing and sensing systems that operate entirely using energy extracted from their environment.
Such energy-harvesting systems can use energy from radio waves [Sample et al. 2008; Zhang et al.
2011a], solar energy [Lee et al. 2012; Zac Manchester 2015], and other environmental sources. An
energy-harvesting system operates only intermittently when energy is available in the environment
and experiences a power failure otherwise. To operate, a device slowly buffers energy into a
storage element (e.g., a capacitor). Once sufficient energy accumulates, the device begins operating
and quickly consumes the stored energy. Energy depletes more quickly during operation (e.g.
milliseconds) than it accumulates during charging (e.g., seconds). When energy is depleted and the
device powers off, volatile state, e.g. registers and stack memory, is lost, while non-volatile state,
e.g., ferroelectric memory (FRAM), persists. The charge/discharge cycle of an energy-harvesting
device forces software to execute according to the intermittent execution model [Colin and Lucia
2016; Lucia and Ransford 2015; Van Der Woude and Hicks 2016]. An intermittent execution includes
periods of activity perforated by power failures. The key distinction between intermittent execution
and continuously-powered execution is that in the intermittent model a computation may execute
only partially before power fails and must be resumed after the power is restored. Correct and
efficient intermittent execution requires a system to meet a set of correctness requirements (C1-3)
and performance goals (G1-3).
C1: A program must preserve progress despite losing volatile state on power failures.
C2: A program must have a consistent view of its state across volatile and non-volatile memory.
C3: A program must respect atomicity constraints (e.g., sampling related sensors together).
G1: Applications should place as few restrictions on the hardware as possible.
Authors’ addresses: Kiwan Maeng, Electrical and Computer Engineering Department, Carnegie Mellon University, USA,
kmaeng@andrew.cmu.edu; Alexei Colin, Electrical and Computer Engineering Department, Carnegie Mellon Univer-
sity, USA, acolin@andrew.cmu.edu; Brandon Lucia, Electrical and Computer Engineering Department, Carnegie Mellon
University, USA, blucia@andrew.cmu.edu.
, Vol. 1, No. 1, Article . Publication date: September 2019.
arXiv:1909.06951v1 [cs.DC] 13 Sep 2019
G2: Applications should be tunable at design time to use the energy storage capacity efficiently.
G3: Applications should minimize runtime overhead and memory footprint.
Recent work made progress toward several of these goals, but necessarily compromised on
others. This paper develops Alpaca 1, a programming and execution model that allows software
to execute intermittently. Like state-of-the-art systems, Alpaca preserves progress despite power
failures (C1) and ensures memory consistency (C2). Alpaca uses a static task model that can adhere
to programmer-provided atomicity constraints and energy availability (G2, C3). Memory updates
made by an Alpaca task commit atomically only when the task completes. By discarding memory
updates on power failure, Alpaca can restart a task with negligible cost, without checkpointing
the volatile state as in prior work [Lucia and Ransford 2015; Ransford et al. 2011a; Van Der Woude
and Hicks 2016] (G3). Unlike prior work that requires the entire memory to be non-volatile [Van
Der Woude and Hicks 2016], Alpaca can leverage both volatile and non-volatile memory (G1). We
present two different versions of Alpaca with different design choices. Alpaca’s design differences,
relative to state-of-the-art systems, translate into performance gains of 4-5.2x on average (up to 23.8x
in some cases). Also, Alpaca shows smaller memory footprints compared to most of the previous
systems.
Section 2 provides background on intermittent computing. Sections 3 and 4 describe the Alpaca
programming model and its implementation. Section 5 presents an alternate design choice of Alpaca.
Section 6 discusses key design decisions. Sections 7 and 8 describe our benchmarks and evaluation.
We conclude with a discussion of related (Section 9) and future (Section 10) work.
2 BACKGROUND AND MOTIVATION
Energy-harvesting systems operate intermittently, losing power frequently and unexpectedly.
Intermittent operation compromises forward progress and leads to inconsistent device and memory
states, with unintuitive consequences that demand abstraction by new programming models.
2.1 Energy-Harvesting Devices and Intermittent Operation
Energy-harvesting devices operate using energy extracted from their environment, such as solar
power [Lee et al. 2012; Zac Manchester 2015], radio waves (RF) [Sample et al. 2008], or mechanical
interaction [Karagozler et al. 2013; Paradiso and Feldmeier 2001]. As the processor on such a device
executes software to interact with sensors and actuators or communicate via radio, it manipulates
both volatile and non-volatile memory. An energy-harvesting device can operate only intermittently,
when energy is available. Common energy-harvesting platforms [Sample et al. 2008] use a power
system that charges a capacitor slowly to a threshold voltage. At the threshold, the device begins
operating, draining the capacitor’s stored energy much more quickly than it can recharge. The
system eventually depletes the capacitor, and the device turns off and waits to again recharge to
its operating voltage. These power cycles can occur frequently: RF-powered devices may reboot
hundreds of times per second [Sample et al. 2008].
2.2 Device Model and Hardware Assumptions
Our work makes few assumptions about device hardware. A device’s memory system can include an
arbitrary mixture of volatile and non-volatile memory, unlike prior work that requires all memory
to be non-volatile [Ma et al. 2015; Ransford et al. 2011b; Van Der Woude and Hicks 2016]. Alpaca
works on devices with non-volatile memories that support atomic read and write operations, e.g.
Ferroelectric RAM [TI Inc. 2017] and Flash. In commercially available FRAM implementations
that rely on destructive reads (i.e., rewrite-on-read), access atomicity is satisfied by means of an
internal capacitor that buffers sufficient energy to complete an in-progress access. Our model allows
arbitrary peripheral (I/O) devices as detailed in Section 6.
1 Alpaca: Adaptive Lightweight Programming Abstraction for Consistency and Atomicity
2.3 Intermittent Execution and Memory Consistency
Fig. 1. RSA code with intermittent execution. (a) Sample code from RSA. (b) Intermittent execution.
Software on an energy-harvesting device operates
intermittently: an intermittent execution does not
end when power fails; instead the execution al-
ternates between active periods and inactive pe-
riods. On each power failure, the register file and
volatile memory (i.e., stack and globals) are erased.
Variables in non-volatile memory persist. Prior
work [Balsamo et al. 2016, 2015; Mirhoseini et al.
2013; Ransford et al. 2011a,b] checkpoints volatile
state periodically and restores a checkpoint after a
power failure. Other prior work [Colin and Lucia
2016; Lucia and Ransford 2015; Van Der Woude and
Hicks 2016] found that if an application directly ma-
nipulates non-volatile memory, checkpointing only
the volatile state is not enough to guarantee consis-
tency. The problem exists because some memory
operations may repeat after restarting from a check-
point. Non-volatile state written before a power
failure persists after a restart, and if re-executing
code reads the non-volatile state without first over-
writing it, the code may operate using inconsistent
values. The resulting program behavior would be impossible
if the device were powered continuously. Precisely,
a non-volatile value that may be read and
later written (i.e., a “write-after-read”, or W-A-R)
between two consecutive checkpoints can become
inconsistent [De Kruijf and Sankaralingam 2013;
Lucia and Ransford 2015; Van Der Woude and Hicks 2016].
Figure 1 illustrates how the combination of a W-A-R dependence and volatile-only checkpointing
can leave data inconsistent. The code, excerpted from our implementation of RSA [Rivest et al.
1978], multiplies two numbers in1 and in2 digit by digit, accounting for carries. A task boundary
or a checkpoint is denoted uniformly by TaskBoundaryOrCheckpoint(). The NV prefix denotes
non-volatile data. The code preserves per-digit progress using non-volatile variables d, carry, and
prod[], the output digit index, most recent carry value, and output product. In the execution,
carry is updated, power fails, and after restarting, mult() uses the already-updated value of carry,
producing the wrong result (Figure 1b). The code first reads, then writes carry (a W-A-R), putting
it at risk of inconsistency. While the figure shows a problem with carry only, d is also read, then
written, presenting another potential consistency problem.
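The hazard can be reproduced on a host machine with a minimal C sketch. It is not the code from Figure 1: the base-10 digit arithmetic, the names nv_carry, mult_step, and run_step, and the fail_after_write hook that models a power failure are all assumptions made for illustration. The sketch shows that re-executing a read-then-write of a non-volatile variable after a failure yields a result that no continuously-powered run could produce.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated non-volatile variable: it survives a simulated power failure. */
static int nv_carry;

/* One step of a digit-wise multiply that reads the carry and then writes it
 * (a W-A-R). If fail_after_write is true, the step "loses power" right after
 * updating nv_carry and before reaching the next checkpoint.
 * Returns true if the step reached the checkpoint. */
static bool mult_step(int a, int b, int *digit_out, bool fail_after_write)
{
    int p = a * b + nv_carry;   /* read of nv_carry */
    nv_carry = p / 10;          /* write of nv_carry (W-A-R) */
    if (fail_after_write)
        return false;           /* power fails: the volatile value p is lost */
    *digit_out = p % 10;
    return true;
}

/* Runs one step, optionally with a single power failure followed by
 * re-execution from the last checkpoint. Returns the digit produced. */
int run_step(int a, int b, int initial_carry, bool inject_failure)
{
    int digit = -1;
    nv_carry = initial_carry;
    if (inject_failure) {
        bool done = mult_step(a, b, &digit, true);
        assert(!done);          /* the first attempt never completes */
        (void)done;
    }
    mult_step(a, b, &digit, false);
    return digit;
}

/* Carry left in non-volatile memory after the last run_step call. */
int final_carry(void)
{
    return nv_carry;
}
```

With a = 7, b = 8, and an initial carry of 3, an uninterrupted run computes 59 and stores a carry of 5. With one injected failure, the second attempt starts from the already-updated carry of 5, computes 61, and stores a carry of 6.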
2.4 Overhead of Existing Approaches
Intermittent programming systems that handle volatile and non-volatile memory consistency
preserve progress across power failure by either taking checkpoints [Van Der Woude and Hicks
2016] or bounding tasks [Colin and Lucia 2016; Lucia and Ransford 2015]. Compiler-automated
checkpointing approaches [Ransford et al. 2011a; Van Der Woude and Hicks 2016] are limited by
their static analysis, often resulting in much more frequent checkpointing than necessary [Van Der Woude
and Hicks 2016]. They also often copy the entire memory, wastefully copying even what has not
been updated [Ransford et al. 2011a], or copy much more than necessary due to the limitation of
pointer aliasing [Lucia and Ransford 2015]. Moreover, they cannot adhere to high-level atomicity
constraints. Systems that ask the programmer to place boundaries [Colin and Lucia 2016; Lucia
and Ransford 2015] do not incur frequent checkpointing overhead. However, to make memory
consistent across power failure, previous work relied on a custom data structure with high space
overhead (i.e., a “channel”) [Colin and Lucia 2016], or a compiler analysis which is prone to high
overhead due to conservatism of the static analysis [Lucia and Ransford 2015].
3 ALPACA PROGRAMMING MODEL
Alpaca is a programming interface that allows programmers to write software that behaves cor-
rectly under an intermittent execution model. Alpaca aims to overcome the limitations of prior
work described in Section 2 and to meet design requirements C1–C3 and design optimization goals
G1–G3 from Section 1. The Alpaca programming model consists of two core concepts, tasks and
privatization. A task is a programming abstraction that is useful for preserving progress, imple-
menting atomicity constraints, and controlling an application’s energy requirements. Privatization
is a language feature that guarantees that any volatile or non-volatile memory accessed by a task
remains consistent, regardless of power conditions.
3.1 Task-Based Programming
A task in Alpaca is a user-defined region of code that executes on a consistent snapshot of memory
and produces a consistent set of outputs. An Alpaca task that eventually has sufficient energy
to execute to completion is guaranteed to have behavior (i.e., control-flow and memory reads
and writes) that is equivalent to some continuously-powered execution regardless of arbitrarily-
timed power failures. As Section 4 describes, if power fails during a task’s execution, Alpaca
effectively discards intermediate results and execution starts again from the beginning of the task.
Consequently, a programmer can reason as though tasks are atomic, like transactions in a TM
system. Computations that consume more energy than the hardware can provide between two
consecutive power failures must be split into multiple tasks.
To program in Alpaca, the programmer decomposes application code into tasks, each marked
with the task keyword. Each task explicitly transfers control to another task (or to itself). A
program’s control flow is defined by the execution of tasks in the sequence specified by the transfer
statements. To transfer control from a task to one of its successors, the programmer uses the
transition_to keyword, which takes the name of a task as its argument and immediately jumps
to the beginning of that task. transition_to statements are valid along any control-flow path
within a task, and all paths through a task must end in a transition_to statement or program
termination. The programmer specifies which task should run when the system powers on for
the first time using the entry keyword. Figure 2 shows a sensing application written using Alpaca.
Alpaca tasks are syntactically similar to Chain tasks [Colin and Lucia 2016], but the memory model
for task interactions differs completely.
Alpaca guarantees to the programmer that a task executes atomically even if power fails during
its execution. When the task completes and the next task begins, changes to memory made by the
completed task are guaranteed to be visible and control never flows backward to the completed
task again, unless an explicit transition_to statement executes. Conversely, if a task does not
complete due to a power failure, control does not advance to any other task, which prevents the
partially updated state from becoming visible. Alpaca allows only a single task sequence and does
not support parallel task execution. This design choice is reasonable because parallel hardware is
extremely rare in intermittent devices due to its relatively high power consumption. Alpaca does
not support concurrent (i.e., interleaved as threads) task sequencing. Concurrency is limited to I/O
routines only, which are addressed in Section 6.3.
Fig. 2. An application written in Alpaca. The program samples a sensor, calculates an average, and transmits
via radio.
Task atomicity guarantees correctness by ensuring that if any of a task execution’s effects become
visible, then all of them are visible, and by ensuring that a completed task’s execution takes effect
only once. Moreover, task-based execution preserves progress, assuming that eventually the system
buffers sufficient energy to complete any task. Alpaca’s atomicity property derives from its memory
model and data privatization mechanism.
3.2 Alpaca Memory Model and Data Privatization
Alpaca’s memory model provides a familiar programming interface allowing tasks to share data via
global variables. At the same time, the memory model design allows an efficient implementation
of the task-atomicity guarantee. The Alpaca memory access model divides data into task-shared
and task-local data. Multiple tasks or multiple different executions of the same task may share
data using task-shared variables. Task-shared variables are named in the global scope and are
allocated in non-volatile memory. Task-shared variables have a typical load/store interface: once
a task writes a value to a task-shared variable, that same task or another task may later read the
value by referencing the variable name. Task-local variables are scoped only to a single task, must
be initialized by that task, and are allocated in the efficient volatile memory.
As discussed in Section 2.3, directly manipulating non-volatile memory in an intermittent
execution can leave data inconsistent due to W-A-R dependencies. To prevent these inconsistencies,
Alpaca privatizes task-shared variables to a task during compilation. Privatization creates a task-
local copy of a task-shared variable in a privatization buffer. As the task executes, it manipulates the
copies in the privatization buffer. When the task completes it copies data to a commit list that the
task uses to atomically commit all updates buffered in the privatization buffer. Section 4.2 describes
how privatization works and why it is sufficient to keep data consistent. We emphasize, however,
that from the programmer’s perspective, privatization is invisible. To support our privatization
analysis, the programmer need only specify (1) tasks and (2) task-shared variables. With this
information alone, Alpaca provides its consistency guarantee automatically and efficiently.
The new syntactic elements that Alpaca introduces are summarized in Table 1.
4 ALPACA IMPLEMENTATION
Our prototype implements the programming model defined in Section 3 using a compiler analysis
and a runtime library. The key requirements for an Alpaca implementation are (1) preserving
progress at the granularity of tasks, (2) ensuring that task-shared and task-local data are consistent,
and (3) doing so efficiently.
Table 1. Summary of Alpaca keywords.
Keyword         Description
task            Identifies a function as an Alpaca task.
transition_to   Ends a task and starts another task.
TS              Identifies a variable as task-shared.
entry           Task that executes when the device boots for the first time.
init            Function that executes on every reboot, to reinitialize peripherals.
To meet these requirements, our Alpaca implementation uses two techniques. The first technique
is data privatization, which ensures that data remain consistent by transparently copying selected
values into temporary buffers and redirecting the task’s accesses to the buffer. The second technique
is two-phase commit, which both preserves progress and guarantees that a completed task’s updates
to its privatized values are all rendered consistently in memory. Alpaca’s use of task-based execution
is the foundation of its efficient support for privatization and two-phase commit.
4.1 Task-Based Execution
Alpaca tasks are void functions with arbitrary code identified with the task keyword. Alpaca
maintains a global cur_task pointer in non-volatile memory that records the address of the task
that began executing at the last successful task transition. Alpaca also maintains a global non-
volatile 16-bit counter, cur_version, which is initially 1, is incremented on each reboot or task
transition, and is reset to 1 when it reaches its maximum value. The counter is used to privatize
arrays efficiently (Section 4.4). To transition from one task to the next at a transition_to statement,
Alpaca assigns cur_task to the address of the next task and jumps to the start of that task. When
task execution resumes after a power failure, control transfers to the start of cur_task.
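The bookkeeping described above can be sketched as a small host-side simulation. This is an illustration under stated assumptions, not Alpaca's runtime: tasks are identified by a hypothetical enum rather than by code addresses, a power failure is modeled by an integer energy budget, and the helper names (bump_version, run_task, power_cycle, task_runs, version) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum task_id { TASK_SAMPLE, TASK_AVERAGE, TASK_DONE, TASK_COUNT };

/* Simulated non-volatile runtime state (it would live in FRAM on a device). */
static enum task_id cur_task = TASK_SAMPLE;   /* the entry task */
static uint16_t cur_version = 1;

/* Simulated volatile state and instrumentation for the example. */
static int energy;               /* energy budget of the current power cycle */
static int runs[TASK_COUNT];     /* how many times each task body started */

/* Increment the version counter, resetting it to 1 at the maximum value. */
static void bump_version(void)
{
    cur_version = (cur_version == UINT16_MAX) ? 1 : (uint16_t)(cur_version + 1);
}

/* Record the next task; on a device, control then jumps to its start. */
static void transition_to(enum task_id next)
{
    cur_task = next;
    bump_version();
}

/* Runs one task body. Returns false if power failed before its transition. */
static bool run_task(enum task_id t)
{
    assert(t < TASK_COUNT);
    runs[t]++;
    if (energy <= 0)
        return false;            /* power fails mid-task; cur_task unchanged */
    energy--;
    switch (t) {
    case TASK_SAMPLE:  transition_to(TASK_AVERAGE); break;
    case TASK_AVERAGE: transition_to(TASK_DONE);    break;
    default: break;
    }
    return true;
}

/* One power cycle: reboot, then resume at cur_task until power fails or the
 * program finishes. Returns the task that the next boot would resume. */
enum task_id power_cycle(int budget)
{
    energy = budget;
    bump_version();              /* the counter is incremented on each reboot */
    while (cur_task != TASK_DONE && run_task(cur_task))
        ;
    return cur_task;
}

int task_runs(enum task_id t) { return runs[t]; }
uint16_t version(void) { return cur_version; }
```

With a budget of one task per power cycle, the first cycle completes TASK_SAMPLE and fails inside TASK_AVERAGE; the second cycle resumes at TASK_AVERAGE rather than at the entry task, so the completed task never runs again.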
4.2 Privatization
Alpaca privatizes a subset of task-shared variables in a task to keep them consistent in case power
fails as the task executes. We describe privatization of scalar (i.e., non-array) data first. Privatization
of arrays is described later in Section 4.4. To privatize a variable, Alpaca statically allocates a
privatization buffer and copies the variable that may become inconsistent to its local privatization
buffer. Alpaca re-writes subsequent memory access instructions to refer to the privatization buffer
instead of the original memory location of the variable. At the end of the task, right before the
transition to the following task, Alpaca commits any changes made to the privatized copy to
its original location, using the two-phase commit procedure (Section 4.3). Privatization ensures
that tasks execute idempotently because updates to memory are committed only after a task has
completed. Idempotent execution ensures that a task’s effects are atomic, which is one of Alpaca’s
main language-level guarantees.
The correctness and efficiency of Alpaca’s privatization analysis rely on several key properties of
Alpaca’s design. For efficiency, Alpaca does not privatize all task-shared variables. Instead, Alpaca
detects W-A-R dependencies during compilation and privatizes only the variables involved in
the dependencies (Section 2). To identify affected variables, Alpaca performs an inter-procedural,
backward traversal of each task’s control-flow graph, tracking accesses to each task-shared variable
along each path. If a write and then a read to the same task-shared variable are encountered along
any path in the backward traversal, Alpaca privatizes that task-shared variable.
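The detection rule can be illustrated on a single straight-line path. The sketch below is a deliberate simplification and not Alpaca's LLVM pass: it scans one path forward instead of traversing a control-flow graph backward, it assumes every access names exactly one variable (no aliasing, no calls), and the types and names (struct access, find_war, MAX_VARS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 32

enum access_kind { ACC_READ, ACC_WRITE };

struct access {
    enum access_kind kind;
    unsigned var;            /* index of a task-shared variable, < MAX_VARS */
};

/* Scans one straight-line path of a task and marks every task-shared variable
 * that is written after having been read earlier on the path.
 * Returns the number of W-A-R variables found. */
size_t find_war(const struct access *path, size_t n, bool war[MAX_VARS])
{
    bool was_read[MAX_VARS] = { false };
    size_t count = 0;

    for (size_t v = 0; v < MAX_VARS; v++)
        war[v] = false;

    for (size_t i = 0; i < n; i++) {
        unsigned v = path[i].var;
        assert(v < MAX_VARS);
        if (path[i].kind == ACC_READ) {
            was_read[v] = true;
        } else if (was_read[v] && !war[v]) {
            war[v] = true;   /* write after an earlier read: privatize v */
            count++;
        }
    }
    return count;
}

/* Usage: variable 0 is read then written (W-A-R), variable 1 is written then
 * read, and variable 2 is only read. Returns 1 if only variable 0 is flagged. */
int example_only_var0_flagged(void)
{
    const struct access path[] = {
        { ACC_READ, 0 }, { ACC_WRITE, 0 },
        { ACC_WRITE, 1 }, { ACC_READ, 1 },
        { ACC_READ, 2 },
    };
    bool war[MAX_VARS];
    size_t n = find_war(path, sizeof path / sizeof path[0], war);
    return n == 1 && war[0] && !war[1] && !war[2];
}
```

Variables that are only read, or written before any read, need no privatization buffer, which is where the selective instrumentation saves work.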
Alpaca’s compiler generates the instructions for privatizing a variable. The compiler first allocates
a privatization buffer in non-volatile memory for each variable that needs to be privatized. At the be-
ginning of the task, the compiler inserts code that copies the variable value from its original location
to its privatization buffer. Then, the compiler replaces each reference to the original value inside the
task with a reference to the privatization buffer. Before each transition_to statement, the com-
piler invokes the first phase of the two-phase commit operation, pre_commit (Section 4.3), passing
as arguments the addresses of the original variable and its privatization buffer along with its size.
Fig. 3. Privatization and commit. transition_to calls
commit.
Figure 3 shows a sketch of Alpaca’s instru-
mentation for an example task code. Compiler-
inserted privatization code is in green and code
deleted by the compiler is struck through. As
in Line 1, the user defines a task-shared variable
by annotating it as TS. TS variables are saved in
non-volatile memory. The code in this example
requires only c to be privatized because it is the
only W-A-R variable; code accessing all other
data requires no instrumentation. Variable c
is privatized on Line 3, and the access to it on
Line 6 is re-written to refer to the private copy
c_priv (Line 7). After privatization, only the commit operation can modify the location c. Selective
instrumentation avoids runtime overhead and is the key to Alpaca’s high performance.
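The effect of this instrumentation can be sketched on a host machine. The code below is not the listing from Figure 3: the task body (a single increment of c), the fail_before_commit hook that models a power failure, and the function names are assumptions, and the commit is reduced to a single copy instead of the two-phase procedure of Section 4.3.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated non-volatile task-shared variable and its privatization buffer. */
static int c;        /* declared as a TS variable in the task's source */
static int c_priv;   /* privatization buffer allocated for c */

/* Instrumented form of a task whose source body is `c = c + 1;`.
 * If fail_before_commit is true, power "fails" after the update but before
 * the commit: the function returns false and c itself is untouched. */
static bool task_increment(bool fail_before_commit)
{
    c_priv = c;              /* privatize: copy c into its buffer */
    c_priv = c_priv + 1;     /* access redirected from c to c_priv */
    if (fail_before_commit)
        return false;
    c = c_priv;              /* commit, reduced here to a single copy */
    return true;
}

/* Runs the task with the given number of power failures before it finally
 * completes, and returns the committed value of c. */
int run_increment(int initial, int failures)
{
    c = initial;
    for (int i = 0; i < failures; i++) {
        bool done = task_increment(true);
        assert(!done);       /* interrupted attempts never commit */
        (void)done;
    }
    task_increment(false);
    return c;
}
```

Because every attempt re-reads the untouched c, the committed result is the same whether the task was interrupted zero times or several times.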
Our implementation of the compiler analysis privatizes variables in functions called from multiple
tasks, assuming the variable requires privatization in some of its callers. During analysis of a task
that calls a function that accesses such a variable, the compiler rewrites the function’s body to refer
to the variable’s privatized copy. Consequently, the variable will remain privatized for any other
caller of the same function, even if that caller does not involve the variable in a W-A-R dependency.
This “contagious” privatization is safe, conservative, and could be eliminated by replicating the
function body, creating a version for each combination of privatized and non-privatized variables
that the function refers to. We accept contagious privatization in preference to the code bloat of
replication. In practice, redundant privatization is rare in the benchmarks that we studied.
Algorithms 1–3 depict Alpaca’s privatization analysis. The analysis identifies variables potentially
involved in W-A-R dependences, adds code to privatize those variables, and adds code to atomically
commit privatized copies when a task completes. The code at the end of Algorithm 1 identifies the
largest possible number of variables that may need to be committed by a single task and statically
allocates a commit list that accommodates them all. Section 4.3 explains in detail how Alpaca uses
its commit_list to commit privatized data.
Algorithm 1 Pseudo-code for Alpaca Compiler.
function AlpacaCompiler(Module M)
    for t ∈ M.tasks do
        warSet ← AlpacaFindWAR(t)                ▷ Find W-A-R variables
        AlpacaTransform(t, warSet)               ▷ Modify code for W-A-R variables
        maxCommitListSize ← Max(maxCommitListSize, warSet.size)
    SetCommitListSize(maxCommitListSize)         ▷ Determine commit_list size
Algorithm 2 Function Finding W-A-R Variables for Each Task.
function AlpacaFindWAR(Task t)
    warSet ← ∅
    for i ∈ t.instructions do
        for v ∈ i.possibleWriteAddress do        ▷ Find writes
            if v ∈ taskSharedVariables then
                i.writeSet ← i.writeSet ∪ v
        for v ∈ i.possibleReadAddress do         ▷ Find reads
            if v ∈ taskSharedVariables then
                i.readSet ← i.readSet ∪ v
    for i ∈ t.instructions do                    ▷ Detect W-A-R
        for j ∈ i.possiblePreviousInst do
            for v ∈ i.writeSet ∩ j.readSet do
                warSet ← warSet ∪ v
        if i.isFunctionCall then                 ▷ For function call (See Section 4.2)
            f ← i.getCalledFunction
            for v ∈ f.usedTaskSharedVariables do
                warSet ← warSet ∪ v
    return warSet
Algorithm 3 Function Inserting Privatization and Pre-commit Code When Needed.
function AlpacaTransform(Task t, Set warSet)
    for v ∈ warSet do
        if v.isPrivatizationBufferAbsent then    ▷ Create privatization buffer
            CreateBuffer(v)
        InsertPrivatizationCode(t, v)            ▷ Insert privatization code
        for i ∈ t.instructions do
            if v ∈ i.usedOperands then           ▷ Redirect accesses
                RedirectUsageToBuffer(i, v)
            if i.isTransitionTo then             ▷ Insert pre-commit code
                InsertPrecommitBefore(i, v)
4.3 Committing Privatized Data
At the end of a task’s execution (i.e., upon reaching a transition_to statement) Alpaca performs a
two-phase commit of updates made to privatized data by that task. The commit operation atomically
applies all updates to variables’ original locations. The operation is divided into two phases: pre-
commit and commit. The pre-commit operation is implemented by the pre_commit function in
the Alpaca runtime library. This function takes the variable information as an argument and records it
in an entry in the commit_list table, depicted in Figure 4a. The commit_list is a table with exactly
one entry for each privatized variable. A variable’s commit_list entry contains the variable’s
original address, privatization buffer’s address, and size. Calls to pre_commit are inserted by the
compiler at transition_to statements, as was described in Section 4.2.
Fig. 4. Making progress in Alpaca. (a) Executing task 1. (b) Committing task 1. (c) Executing task 2.
Each panel shows the execution at left and the system state at right. The
current phase is shaded. We omit privatization instructions for clarity. The system state shows that a in
Task 1 and b in Task 2 are privatized into privatization buffers marked pb. Initially, a=1 and b=0. (a) Task
1 writes to b directly and writes to a’s privatization buffer because a is involved in a W-A-R dependence.
Updates to privatized variables are written to the commit_list during the pre-commit phase of the task. A
power failure during execution or pre_commit restarts at the beginning of the task. (b) Task 1 proceeds to the
commit phase, where Task 1 applies its update to a. A power failure during commit restarts in commit. (c) The
transition_to operation atomically begins Task 2, which privatizes b because Task 2 reads then writes it.
The commit_list generated in the first phase records updates to privatized data that must be
committed in the second phase. Alpaca stores an end-index that always points to the entry after the
last valid entry in the commit_list. The commit_list must be stored in non-volatile memory since
its contents must persist if a failure happens during the second phase. As seen in Algorithm 1, our
implementation statically allocates a region of memory large enough to fit the maximum number
of entries that may be required by any task in the program (i.e., the maximum number of calls to
pre_commit at any transition_to statement in any task). After the last pre_commit call before
each transition_to, the compiler inserts an instruction to set a non-volatile commit_ready bit
that marks the task ready for the second phase, as shown in Figure 4b. The Alpaca runtime checks
commit_ready on boot. If commit_ready is unset, the previously executing task was either in
progress or had completed only a partial pre-commit, so that task is re-executed from its start,
discarding the partial execution or the partial pre-commit. Otherwise, the second phase is invoked.
The second phase, commit, is implemented in the Alpaca runtime library by a void function,
commit. The function iterates over entries in the commit_list from the first up to end-index. For
each entry, the variable value is copied from its privatization buffer to its original memory location.
The commit operation succeeds when it copies all entries in the commit_list and sets end-index
to zero. After a successful commit, the runtime clears the commit_ready bit and proceeds to the
following task (Figure 4c). If power fails during commit, commit_ready remains set. Since the
runtime checks the bit on boot, it will retry the commit operation until it completes successfully.
If power fails after commit but before transition_to completes the transition to the next task,
then commit will re-execute on next boot and will trivially complete since end-index is zero. The
transition_to that failed to complete will then run again.
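The two phases can be sketched as a host-side simulation. It is a sketch under stated assumptions rather than Alpaca's runtime library: commit takes a fail_after parameter and returns a status purely as a test hook (the function described above is void), boot_resumes_commit and the demo functions are hypothetical names, and ordinary static variables stand in for non-volatile memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMMIT_LIST_MAX 4

struct commit_entry {
    void       *orig;   /* variable's original address */
    const void *priv;   /* privatization buffer's address */
    size_t      size;   /* size in bytes */
};

/* Simulated non-volatile runtime state. */
static struct commit_entry commit_list[COMMIT_LIST_MAX];
static size_t end_index;        /* entry after the last valid entry */
static bool   commit_ready;

/* Phase 1: record one privatized variable in the commit list. */
void pre_commit(void *orig, const void *priv, size_t size)
{
    assert(end_index < COMMIT_LIST_MAX);
    commit_list[end_index].orig = orig;
    commit_list[end_index].priv = priv;
    commit_list[end_index].size = size;
    end_index++;
}

/* Phase 2: copy every recorded buffer back to its original location.
 * fail_after is a test hook (an assumption, not part of Alpaca): if it is
 * non-negative, power "fails" once that many entries have been copied. */
bool commit(int fail_after)
{
    for (size_t i = 0; i < end_index; i++) {
        if (fail_after >= 0 && i == (size_t)fail_after)
            return false;                 /* commit_ready stays set */
        memcpy(commit_list[i].orig, commit_list[i].priv, commit_list[i].size);
    }
    end_index = 0;
    commit_ready = false;
    return true;
}

/* Boot-time check: finish an interrupted commit, or report that the
 * interrupted task must be re-executed from its start. */
bool boot_resumes_commit(void)
{
    if (!commit_ready) {
        end_index = 0;                    /* discard a partial pre-commit */
        return false;
    }
    return commit(-1);
}

static int a, a_priv, b, b_priv;

/* Power fails in the middle of commit; the next boot finishes the commit. */
int demo_commit_sum(void)
{
    a = 1; b = 0; end_index = 0; commit_ready = false;
    a_priv = a + 10;
    b_priv = b + 20;
    pre_commit(&a, &a_priv, sizeof a);
    pre_commit(&b, &b_priv, sizeof b);
    commit_ready = true;
    bool done = commit(1);                /* fails after copying a only */
    assert(!done && a == 11 && b == 0);
    done = boot_resumes_commit();         /* re-copies a, then copies b */
    assert(done);
    (void)done;
    return a + b;
}

/* Power fails after a partial pre-commit, before commit_ready is set. */
bool demo_partial_precommit_discarded(void)
{
    a = 1; end_index = 0; commit_ready = false;
    a_priv = 99;
    pre_commit(&a, &a_priv, sizeof a);    /* power fails right after this */
    bool resumed = boot_resumes_commit();
    return !resumed && a == 1 && end_index == 0;
}
```

Re-running commit after a failure is harmless because copying a buffer to its original location a second time leaves the same value, which is why a set commit_ready bit is enough to resume safely.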
4.4 Privatizing and Committing Arrays
Alpaca privatizes and commits array variables differently from scalar variables because naively
privatizing an entire array (i.e., copying the entire array to a privatization buffer as a task starts) is
unnecessary if the task accesses only part of the array. Alpaca statically pre-allocates a privatization
buffer for each array that may be read then written (i.e., may be involved in a W-A-R dependence).
The array’s privatization buffer contains the same number of entries as the original array. Privati-
zation takes place at the granularity of an array element. In the example in Figure 5, to privatize
array C, the compiler allocates C_priv buffer (Line 2) and inserts the instrumentation code that is
highlighted in green (and explained below).
Fig. 5. Privatization and commit for arrays.
Like a scalar variable, privatizing an array ele-
ment involves initializing a copy in the privatiza-
tion buffer (Line 11), redirecting accesses to the
buffer (Lines 12-13), and adding the variable to the
commit_list via a call to pre_commit (Line 16).
Alpaca uses the compiler to redirect array element
accesses to their privatization buffers the same as
for scalars, but initializing privatization buffers and
pre-commit for arrays are different. Alpaca initial-
izes an array element’s privatization buffer the first
time an execution accesses the element: either ex-
plicit instrumentation inserted by Alpaca initial-
izes the buffer before the element’s first read or the
element’s first write directly writes to the buffer.
Alpaca does pre-commit for an array element only
once after the first write to it.
One key design choice in Alpaca was to decide
when instrumentation on a read operation should
initialize an array element’s privatization buffer.
Read instrumentation should not initialize the pri-
vatization buffer after a previous write in the task
because the initialization would overwrite the writ-
ten value. Instead, the read instrumentation can
initialize the privatization buffer either once before the first read that happens before the first write
or (possibly redundantly) at every read before the first write. We chose the latter option because
dynamically tracking the first read would incur a high runtime overhead.
We avoid invoking pre-commit unconditionally after every write because multiple writes to the
same element would append duplicate entries to the commit_list, which is inefficient and precludes
a statically sized commit_list. Furthermore, pre-commits cannot be batched and executed before
a task transition (like for scalar variables), because the set of elements dynamically accessed is
unknown statically. Batching would require dynamically tracking the set of modified elements
in a data structure that supports efficient insertion and traversal, which is complex. Executing
pre-commit after the first write obviates the complexity of batching and only requires Alpaca to
identify the first write to an array element.
Correctly handling array privatization and pre-commit requires some instrumentation to execute
conditionally, only on an element’s first write. To identify an element’s first write, Alpaca must track
the set U of array elements that have been written since the beginning of the task in the current
execution attempt. A write of an element is first if and only if the element is not in this set U at the
time of the access. The data structure that represents U needs only to provide efficient insertion
and lookup, which our version-backed bitmask data structure does. A version-backed bitmask is a
bitmask that supports a constant-time clear operation using a versioning mechanism described later
in this section. We represent U by setting logical bits (i.e., “entries”) in a version-backed bitmask
that is statically allocated for each array being privatized. In Figure 5, the version-backed bitmask
for C is C_vbm allocated on Line 3.
Each version-backed bitmask entry is a 16-bit integer version. To set an entry (vbm_set), Alpaca
copies the global cur_version counter into the entry. To test an entry (vbm_test), Alpaca compares
the version stored in that entry to the global cur_version counter; equality indicates the entry
is set, inequality indicates unset. Consequently, when the global cur_version counter changes,
all version-backed bitmasks are implicitly cleared. When the cur_version counter overflows and
rolls over, the runtime explicitly resets all entries in all version-backed bitmasks to zero.
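The set, test, and clear operations described above can be sketched in a few lines of C. This is a minimal illustration, not the prototype's runtime code: the array length N, the pointer-plus-index signatures, and the helper name vbm_clear_all are hypothetical. The sketch also assumes that the counter restarts at 1 after a rollover, so that entries reset to zero cannot match the current version.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 8                        /* hypothetical array length */

static uint16_t cur_version = 1;   /* global version counter */
static uint16_t C_vbm[N];          /* one 16-bit version per array element */

/* An entry is set if and only if it holds the current version. */
static int vbm_test(const uint16_t *vbm, int i) {
    assert(i >= 0 && i < N);
    return vbm[i] == cur_version;
}

/* Setting an entry copies the current version into it. */
static void vbm_set(uint16_t *vbm, int i) {
    assert(i >= 0 && i < N);
    vbm[i] = cur_version;
}

/* Advancing the version implicitly clears every entry in constant time.
 * On rollover, all entries are explicitly reset to zero. Restarting the
 * counter at 1 is an assumption of this sketch: it keeps the zeroed
 * entries from comparing equal to the current version. */
static void vbm_clear_all(void) {
    cur_version++;
    if (cur_version == 0) {
        memset(C_vbm, 0, sizeof C_vbm);
        cur_version = 1;
    }
}
```

The common-case clear is a single increment regardless of the array size; only the rare rollover pays for a full pass over the entries.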
To track the set U of array elements updated in the current task execution attempt, the Alpaca
compiler instruments reads and writes to array elements with code to set and test entries in the
array’s version-backed bitmask. When reading from an array element that has not been modified
yet, i.e. its entry in U is unset (Line 10), then the runtime initializes the element copy in the
privatization array (Line 11). When writing to an array element for the first time, after checking
that its entry in U is not set (Line 14), it inserts the element into U by setting its entry (Line 15),
and appends the written array element to the commit_list by calling pre_commit (Line 16). The
set U is cleared at the next task transition or reboot, since the cur_version counter increments on
each task transition and reboot (Section 4.1), which implicitly clears the version-backed bitmask.
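The read and write instrumentation described in this section can be sketched as ordinary C functions. This is a simplified simulation under stated assumptions, not the code that the Alpaca compiler emits: the helpers read_C, write_C, commit_and_transition, and reboot are hypothetical names, the commit is treated as atomic (the two-phase commit with its commit bit is omitted), version rollover is ignored, and a reboot is assumed to discard uncommitted commit_list entries.

```c
#include <assert.h>
#include <stdint.h>

#define N 8                         /* hypothetical array length */

static int C[N];                    /* task-shared array (non-volatile) */
static int C_priv[N];               /* privatization buffer, same length */
static uint16_t C_vbm[N];           /* version-backed bitmask for C */
static uint16_t cur_version = 1;

struct commit_entry { int *dst; const int *src; };
static struct commit_entry commit_list[N];
static int end_index = 0;

/* Record that *src must be copied to *dst when the task commits. */
static void pre_commit(int *dst, const int *src) {
    assert(end_index < N);
    commit_list[end_index].dst = dst;
    commit_list[end_index].src = src;
    end_index++;
}

/* Instrumented read: initialize the private copy (possibly redundantly)
 * on every read that precedes the first write, then read the copy. */
static int read_C(int i) {
    assert(i >= 0 && i < N);
    if (C_vbm[i] != cur_version)
        C_priv[i] = C[i];
    return C_priv[i];
}

/* Instrumented write: on the first write only, mark the element as
 * written and pre-commit it; every write goes to the private copy. */
static void write_C(int i, int v) {
    assert(i >= 0 && i < N);
    if (C_vbm[i] != cur_version) {
        C_vbm[i] = cur_version;
        pre_commit(&C[i], &C_priv[i]);
    }
    C_priv[i] = v;
}

/* Task completion: copy private values back, then clear the set U. */
static void commit_and_transition(void) {
    for (int k = 0; k < end_index; k++)
        *commit_list[k].dst = *commit_list[k].src;
    end_index = 0;
    cur_version++;
}

/* Simulated power failure before commit: private updates are dropped. */
static void reboot(void) {
    end_index = 0;
    cur_version++;
}
```

In this sketch, two increments of the same element followed by a reboot leave C unchanged, and a later successful attempt commits exactly one task's worth of updates.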
5 ALPACA WITH UNDO-LOGGING
The design of Alpaca described up to this point relies on privatization and commit, which is a
redo-logging approach [Maeng et al. 2017] to keeping memory consistent across intermittent failures.
We also developed a more efficient Alpaca design variant that relies instead on undo-logging. Both
design variants use the same programming interface, differing only in how they manage memory.
Section 8 compares undo- and redo-logging, showing that Alpaca-undo is on average 1.53x faster
than its redo-logging counterpart. This text refers to the Alpaca design using privatization and
commit as Alpaca-redo, refers to the undo-logging variant as Alpaca-undo, and refers to both
generally as Alpaca.
5.1 Undo-Logging Instead of Privatization
Redo-logging and undo-logging each present advantages and disadvantages. Alpaca-redo privatizes
variables involved in W-A-R dependences and commits updates to those data when a task completes.
Redo-logging affords zero-cost recovery, requiring no action before continuing after a power failure.
However, redo-logging pays a cost in its need to first privatize data and later commit them, which
requires two copy operations per variable per completed task. In contrast, undo-logging backs up
a variable and subsequently manipulates the variable in place, requiring no action when a task
completes. However, an undo-logging system must restore values saved in the undo log before
continuing execution after a power failure. While restoring from power failure is more costly than
in a redo-logging system, undo-logging requires only one copy operation (to back a variable up)
per variable per completed task. When successful completion of a task is more common than the
interruption of a task by a power failure, undo-logging will be more efficient than redo-logging. In
Alpaca, successful task completion is usually more common than interruption by a power failure
because a typical task requires much less energy than the maximum energy that the device can
buffer. In the worst case, when every task requires the maximum amount of energy that the device can
buffer, each task will fail once for each completion. Under Alpaca’s assumptions, undo-logging is
appealing because tasks cannot fail more often than they complete.
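The break-even point can be illustrated with a simple copy-count model. The model is an assumption made here for illustration, not a measurement from the evaluation: it charges one copy for each privatization, commit, backup, or restore of a variable, assumes that every failed attempt had already copied the variable, and lets f (a symbol introduced here) denote the average number of failed attempts per completed task.

```latex
\begin{align*}
\text{copies}_{\text{redo}} &= \underbrace{2}_{\text{privatize + commit}}
  + \underbrace{f}_{\text{discarded privatizations}} \\
\text{copies}_{\text{undo}} &= \underbrace{1}_{\text{backup}}
  + \underbrace{2f}_{\text{backup + restore per failure}} \\
\text{copies}_{\text{undo}} \le \text{copies}_{\text{redo}}
  &\iff 1 + 2f \le 2 + f \iff f \le 1
\end{align*}
```

Under these assumptions, undo-logging needs no more copies than redo-logging as long as a task fails at most once per completion, which is the worst case described above.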
5.2 Undo-logging Compiler and Runtime System
Similarly to Alpaca-redo, Alpaca-undo relies on a compiler to transform code and insert calls to the
Alpaca-undo runtime system into the program.
1   TS int A[30]; TS int a; 
2   NV int A_bak[30]; NV int a_bak; 
3   NV int A_vbm[30];
4   task example_1() {
5     backup(&a_bak, &a, sizeof(a)); 
6     a++; 
7     for (int k=0; k<15; k++) { 
8       int r = rand()%30;
9       int tmp = A[r]; 
10      if (!vbm_test(A_vbm[r])) { 
11        vbm_set(A_vbm[r]); 
12        backup(&A_bak[r], &A[r],  
            sizeof(A[r])); 
13      } 
14      A[r] = tmp + 1; 
15    } 
16    transition_to(example_2) 
17  } 
#def vbm_test(v) v == cur_version 
#def vbm_set(v)  v = cur_version
Fig. 6. Alpaca-undo’s compiler transforma-
tion.
Figure 6 shows how Alpaca-undo transforms code
to implement undo-logging. Instead of allocating a pri-
vate copy, the compiler allocates a static undo log (Line
2). Alpaca-undo selectively backs up non-array W-A-R
variables at the start of the task (Line 5). For arrays,
Alpaca-undo uses Alpaca’s version-backed bitmask
scheme to detect the first write, as discussed in
Section 4.4 (Line 10). Alpaca-undo backs up an array
value before its first write (Line 12). Unlike Alpaca-
redo, Alpaca-undo need not redirect memory accesses
to a copy because operations manipulate data in place
(Lines 6, 9, and 14). Additionally, Alpaca-undo does not need
instrumentation before an array read (Line 9). Alpaca-undo
detects variables involved in W-A-R dependences
using Algorithms 1-3, identically to Alpaca-redo.
Figure 7 shows how Alpaca-undo backs up and re-
stores data to keep memory consistent. At the begin-
ning of a task, Alpaca-undo backs up the task’s W-A-R
variables (Figure 7a). Alpaca-undo maintains a list of
backed-up variables (backup_list). Alpaca-undo also
sets the need_rollback flag, indicating that there are
variables backed up. Then, Alpaca-undo manipulates variables in place (Figure 7b). Even after the
update to a, its original value remains in the backed-up copy. When a task successfully completes,
Alpaca-undo clears the need_rollback flag and backup_list (Figure 7c). The runtime system
clears the list efficiently by resetting the list’s iterator, without zeroing its contents. After a power
failure, Alpaca-undo rolls back changes by iterating through the backup_list if the need_rollback
flag is set. For each entry, the runtime system writes the backed-up value into its corresponding
memory location. After processing all entries, the runtime unsets the need_rollback flag and
continues. Section 4.2 discusses sizing the backup list.
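The backup, completion, and rollback steps can be sketched as a small C simulation. This is a minimal sketch under stated assumptions rather than the prototype's runtime: the names task_done and on_boot, the entry layout, and the list capacity BACKUP_MAX are hypothetical, and resetting the list iterator after a rollback is an assumption made so that the re-executed task starts with an empty list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BACKUP_MAX 16               /* hypothetical list capacity */

struct backup_entry {
    void *var;                      /* variable that is updated in place */
    const void *bak;                /* its statically allocated backup */
    size_t size;
};

/* In Alpaca-undo these would all reside in non-volatile memory. */
static struct backup_entry backup_list[BACKUP_MAX];
static size_t backup_end = 0;       /* the list's iterator */
static int need_rollback = 0;

/* Save the original value, then record the entry in backup_list. */
static void backup(void *bak, void *var, size_t size) {
    assert(backup_end < BACKUP_MAX);
    memcpy(bak, var, size);
    backup_list[backup_end].var = var;
    backup_list[backup_end].bak = bak;
    backup_list[backup_end].size = size;
    backup_end++;
    need_rollback = 1;
}

/* Successful completion: clear the flag and reset the iterator.
 * The list contents are not zeroed. */
static void task_done(void) {
    need_rollback = 0;
    backup_end = 0;
}

/* Boot after a power failure: restore every backed-up value if needed. */
static void on_boot(void) {
    if (need_rollback) {
        for (size_t k = 0; k < backup_end; k++)
            memcpy(backup_list[k].var, backup_list[k].bak,
                   backup_list[k].size);
        backup_end = 0;             /* assumption: start the retry empty */
        need_rollback = 0;
    }
}
```

With a initially 1, as in Figure 7, a backup followed by a++ and a simulated reboot returns a to 1, while a completed task leaves a at 2 across later reboots.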
6 ALPACA DISCUSSION
Alpaca’s programming model guarantees that tasks will execute atomically. Our Alpaca imple-
mentation efficiently provides this atomicity guarantee by selectively privatizing data. Besides
programmability, efficiency, and consistency, Alpaca supports I/O operations and allows modular
re-use of code. This section discusses these aspects of Alpaca and characterizes its main limitations.
6.1 Low Overhead
A key contribution of this work is that our Alpaca implementation has low overhead compared
to existing systems to which we can directly compare (we quantify the difference in Section 8).
Alpaca’s overhead is low, because privatization is simple and because Alpaca privatizes variables
selectively. Privatization has a low cost, primarily because it rarely occurs: most variables are not
privatized because they are either local to a task or shared but not involved in W-A-R dependences.
Furthermore, Alpaca’s task-based execution avoids all checkpointing cost. Alpaca needs to retain
only the information about which task was last executing. Alpaca does not incur the cost of tens of
writes to non-volatile memory to save registers, like Ratchet [Van Der Woude and Hicks 2016], nor
the even higher additional cost to save the stack, like DINO [Lucia and Ransford 2015]. By reducing
copying and privatizing only when necessary, Alpaca saves time and energy.
[Figure 7, three panels: (a) Backup of variable a, with cur_task = Task 1, need_rollback = 1, cur_version = 1;
(b) Executing task 1, with the same state and a updated in place; (c) Finishing task 1, with cur_task = Task 2,
need_rollback = 0, cur_version = 2.]
Fig. 7. Making progress in Alpaca-undo. Each panel shows the execution at left and the system state at right.
The current phase is shaded. Initially, a=1 and b=0. (a) Before executing Task 1, Alpaca-undo copies the value
of a to its backup copy. Backed up variables are marked in backup_list. (b) Task 1 gets executed, updating
variables in-situ. (c) When Task 1 is finished, Alpaca-undo simply clears the backup_list and related flags.
6.2 Memory Consistency
Alpaca preserves memory consistency despite arbitrarily-timed power failures by making each
attempt to execute a task idempotent. Task idempotence guarantees that if any attempt has sufficient
energy to complete, the effects of a single, atomic execution of the task are made visible in memory.
The memory state immediately after a task transition is equivalent to the corresponding state
in execution on continuous power. Alpaca guarantees idempotence by privatizing non-volatile
variables involved in W-A-R dependences and requiring volatile state to be task-local.
6.2.1 Non-volatile Memory Consistency. Taking a cue from prior work [De Kruijf and Sankaralingam
2013; de Kruijf et al. 2012; Lucia and Ransford 2015; Van Der Woude and Hicks 2016], Alpaca
privatizes only non-volatile variables involved in W-A-R dependencies. We show that privatizing
only this subset is sufficient by proving that only memory accesses related by W-A-R can cause a
value written by the task before a power failure to be read by the same task after the power failure.
Consider one task and assume that control flows along the same path each time the task re-
executes, which is true of all code that does not perform I/O operations (we discuss I/O later
in this section). Consider one memory location and let $R_i^j$ and $W_i^j$ respectively denote the $i$th
memory read and write to that location during the $j$th attempt to execute the task. If power fails
in attempt $j$ after $k$ accesses and the task re-executes, then the sequence of memory accesses is:
$X_0^j, \ldots W_p^j \ldots X_k^j$ - [power failure] - $X_0^{j+1} \ldots R_q^{j+1} \ldots$, where $X$ stands for either read or write and
our hypothesis postulates a write $W_p^j$ before the power failure and a read $R_q^{j+1}$ that returns the same
value. The hypothesis implies that $q < p$, otherwise, $W_p^{j+1}$ would overwrite the value written by
$W_p^j$ before $R_q^{j+1}$ reads it. The order $q < p$ implies that $R_q^j$ precedes $W_p^j$ in the task code, which is the
definition of a W-A-R dependence.
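The argument can be illustrated with a small C simulation of one task that is interrupted once and then re-executed. The sketch is hypothetical: nv_a and nv_b stand in for non-volatile variables, an early return mimics a power failure, and the final commit in the privatized version is treated as atomic. The variable nv_a is read and then written (a W-A-R dependence), while nv_b is only written.

```c
#include <assert.h>

static int nv_a;    /* read, then written: involved in a W-A-R dependence */
static int nv_b;    /* only written: no W-A-R dependence */
static int a_priv;  /* private copy of nv_a */

/* Task body operating directly on non-volatile memory. */
static void attempt_in_place(int power_fails) {
    nv_b = 7;
    int t = nv_a;          /* read of nv_a */
    nv_a = t + nv_b;       /* write after read */
    if (power_fails)
        return;            /* simulated power failure after the write */
    /* the task would transition here */
}

/* Same task body with nv_a privatized and committed at the end. */
static void attempt_privatized(int power_fails) {
    nv_b = 7;
    a_priv = nv_a;         /* privatize */
    a_priv = a_priv + nv_b;
    if (power_fails)
        return;            /* nv_a is still untouched */
    nv_a = a_priv;         /* commit (assumed atomic in this sketch) */
}
```

Starting from nv_a = 0, one failed and one successful in-place attempt leave nv_a at 14 rather than 7, because the second attempt reads the value written by the first. The privatized version yields 7, and nv_b is 7 in both cases without any privatization.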
6.2.2 Volatile Memory Consistency. In Alpaca, the only volatile data are task-local variables.
Since all local variables must be initialized before use in a task, local reads after a power failure
will never access uninitialized memory. Since volatile memory clears on reboot, local reads will
never observe a value written before the power failure.
Like prior work [Van Der Woude and Hicks 2016], Alpaca conservatively assumes that compiler
optimizations cannot introduce memory read or write instructions and Alpaca safely interacts with
any compiler optimization that adheres to this assumption.
6.3 I/O
Fig. 8. I/O in Alpaca.
Code that interacts with sensors and actuators poses three
difficulties: (1) some I/O-related actions must execute atom-
ically, (2) external inputs introduce non-determinism, and
(3) actuation or output cannot be undone. Alpaca allows the
programmer to express (1) and (2) through careful coding
patterns that we describe below. Alpaca targets applications
that can tolerate repeated outputs, where (3) is acceptable.
Some applications include I/O-related code that should
execute atomically, such as the code in Figure 8. The
code reads temperature and pressure sensors and sets the
heaterOn or coolerOn flag, based on the sensed data. The
temperature and pressure values should be consistent. Al-
paca lets the programmer ensure that the values will be
consistent by putting the actions in the same task. In con-
trast, a system with dynamic [Balsamo et al. 2015; Ransford
et al. 2011a] or compiler-inserted [Van Der Woude and
Hicks 2016] task boundaries gives the programmer no way
to ensure that the input operations execute atomically.
The code in the example asserts that heaterOn and coolerOn are never both true. The code
misbehaves if a power failure occurs after assigning one of the flags (e.g., heaterOn). If the sen-
sor’s result is different in the following execution attempt, the code could set the other flag (e.g.,
coolerOn), violating the assertion. The core issue is that non-volatile memory updates are con-
ditionally dependent on sensed inputs. If control-flow depends on the input, then conditional
non-volatile memory updates can violate task idempotence. We note that this problem also afflicts
prior efforts [Colin and Lucia 2016; Lucia and Ransford 2015; Van Der Woude and Hicks 2016]. A
programmer can preserve idempotence by using intermittence-safe I/O programming patterns.
Concretely, one programming pattern that avoids the problem in this example is to use a dedicated
task to read and store both temp and pres, and to use another task to do the conditional updates to
heaterOn and coolerOn. Alternatively, a programmer could avoid the problem by ensuring that
both execution paths access the same set of memory locations: inserting coolerOn = false; on
the if branch and inserting heaterOn = false; on the else branch.
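The second pattern can be sketched as follows. The code is a hypothetical stand-in for the example in Figure 8, which is not reproduced here: the threshold TEMP_LOW and the function names are assumptions, only the temperature input is modeled, and the sensed value is passed as a parameter so that two calls can mimic two execution attempts that observe different readings.

```c
#include <assert.h>
#include <stdbool.h>

#define TEMP_LOW 20               /* hypothetical threshold */

static bool heaterOn, coolerOn;   /* non-volatile flags */

/* Each path updates a different flag, so an interrupted attempt followed
 * by an attempt that takes the other path can leave both flags set. */
static void control_unsafe(int temp) {
    if (temp < TEMP_LOW)
        heaterOn = true;
    else
        coolerOn = true;
}

/* Both paths write the same set of memory locations, so the last attempt
 * fully determines the final state of both flags. */
static void control_safe(int temp) {
    if (temp < TEMP_LOW) {
        heaterOn = true;
        coolerOn = false;
    } else {
        heaterOn = false;
        coolerOn = true;
    }
    assert(!(heaterOn && coolerOn));
}
```

If the first attempt senses 10 and the re-executed attempt senses 30, control_unsafe ends with both flags set, whereas control_safe ends with only coolerOn set.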
6.4 Forward Progress
Guaranteeing forward progress in an intermittent, energy-harvesting system is a difficult problem
that is orthogonal to the problems solved by Alpaca. The key challenge is that a system buffers
a fixed amount of energy before it begins operating and if the energy required by a task exceeds
the buffered amount, the task will never complete executing, preventing progress. A task’s energy
cost can be input dependent, adding further complexity. This progress issue is not unique to
Alpaca, afflicting prior task-based systems as well [Colin and Lucia 2016; Lucia and Ransford 2015;
Mirhoseini et al. 2013; Ransford et al. 2011a; Van Der Woude and Hicks 2016].
Prior work has used ad hoc techniques that attempt to ensure progress, to the detriment of other
system characteristics. Ratchet [Van Der Woude and Hicks 2016] inserts a dynamic checkpoint
between static checkpoints after repeatedly failing to make progress. Other systems [Balsamo et al.
2016, 2015; Ransford et al. 2011b] dynamically checkpoint in response to an interrupt when energy
is low. Dynamic checkpointing requires capturing enough state to restart from an arbitrary point,
which can take a prohibitive amount of time [Colin and Lucia 2016], especially with hybrid volatile
and non-volatile memory. Dynamic checkpointing may also violate I/O atomicity (see Section 6.3).
We opted not to include a dynamic checkpointing fall-back in Alpaca. Instead, the programmer
must size tasks so that no task in their program requires more energy than
the target device can buffer. As long as this condition is satisfied, Alpaca always avoids atomicity
violations and guarantees correctness. None of the tasks in our test programs have a forward
progress problem. It would be straightforward to incorporate a dynamic checkpointing fall-back
into our Alpaca prototype.
6.5 Reusability of Tasks
In a task-based programming model for intermittent execution, code reuse via functions is insuffi-
cient, because functionality that uses more energy than the device can buffer cannot be encapsulated
in a single function. An Alpaca programmer can reuse a sequence of tasks as a C programmer
would reuse a function by passing arguments, return address, and return value manually through
task-shared variables.
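A manual calling convention of this kind might look like the sketch below. It is a plain C simulation, not Alpaca code: the transition_to macro, the scheduler loop in run, and all task and variable names are hypothetical stand-ins, and the variables marked as task-shared would be declared with TS in an Alpaca program.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_t)(void);

/* Hypothetical stand-in for the runtime's task transition. */
static task_t next_task;
#define transition_to(t) do { next_task = (t); return; } while (0)

/* Task-shared variables that carry the manual calling convention. */
static int    sq_arg;          /* argument */
static int    sq_ret;          /* return value */
static task_t sq_return_to;    /* return address */
static int    total;

static void task_square(void);
static void task_after_first(void);
static void task_after_second(void);
static void task_end(void);

static void task_start(void) {
    total = 0;
    sq_arg = 3;
    sq_return_to = task_after_first;
    transition_to(task_square);
}

/* Reusable task: it returns to whichever task the caller recorded. */
static void task_square(void) {
    assert(sq_return_to != NULL);
    sq_ret = sq_arg * sq_arg;
    transition_to(sq_return_to);
}

static void task_after_first(void) {
    total += sq_ret;
    sq_arg = 4;
    sq_return_to = task_after_second;
    transition_to(task_square);
}

static void task_after_second(void) {
    total += sq_ret;
    transition_to(task_end);
}

static void task_end(void) { next_task = NULL; }

/* Minimal scheduler loop: run tasks until none is scheduled. */
static int run(void) {
    next_task = task_start;
    while (next_task != NULL) {
        task_t t = next_task;
        t();
    }
    return total;
}
```

Here task_square is invoked from two call sites, and the constant function addresses stored in sq_return_to play the role of a return address.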
6.6 Prototype Limitations
Our Alpaca prototype supports a useful subset of the C language, handling most uses of pointers
and complex data structures. Our prototype has a few implementation-specific limitations, which
we emphasize are not fundamental limitations of Alpaca.
We implemented a limited pointer alias analysis and our prototype requires that a TS pointer
only ever be assigned the address of a TS variable if that address is constant. Allowing TS pointers
to constant variables permits the especially important case of function pointers.
Our prototype requires the programmer to refer to array elements directly, i.e., writing A[30]
instead of *(p + 30). Our prototype statically inserts code to maintain version-backed bitmasks
on array accesses. Array indirection would require our prototype to use instrumentation that
dynamically disambiguates pointers to arrays, to determine which bitmask to update. Our prototype
makes the calculated choice to avoid this additional dynamic analysis cost by requiring direct array
access. We note that this strategy is similar to DINO [Lucia and Ransford 2015].
7 BENCHMARKS AND METHODOLOGY
We evaluated Alpaca using a collection of applications taken from prior work running on real,
energy-harvesting hardware. Our evaluation ran on a WISP5 [Sample et al. 2008] energy-harvesting
platform that runs a TI MSP430FR5969 microprocessor with harvested RF energy. We used a Saleae
Digital Logic Analyzer to measure the execution time by timing GPIO pulses generated at the end
of each application. To power the WISP, we used the ThingMagic Astra-EX RFID reader as an RF
energy source with its power parameter set to -X 50, and a distance between the WISP and the
power source of 20cm.
We evaluated Alpaca using applications ported to run on harvested energy using DINO [Lucia
and Ransford 2015], Chain [Colin and Lucia 2016], Alpaca-redo, Alpaca-undo, and Ratchet [Van
Der Woude and Hicks 2016], allowing for a thorough direct comparison. DINO and Chain versions
of four applications were provided by the authors of the Chain [Colin and Lucia 2016] paper: activity
recognition (AR), cuckoo filter (CF), RSA encryption (RSA), and cold-chain equipment monitoring
(CEM). We ported two additional applications from MIBench [Guthaus et al. 2001] to run with
DINO, Chain, and the two versions of Alpaca.
[Figure 9, two bar charts of normalized run time for CEM, CF, RSA, AR, BF, BC, and their geometric mean.
(a) On continuous power: plain C, Alpaca-undo, Alpaca-redo, Chain, DINO, Ratchet. (b) On harvested energy:
Alpaca-undo, Alpaca-redo, Chain, DINO, Ratchet.]
Fig. 9. Run time performance. Data are normalized to performance of (a) plain C and (b) Alpaca-undo.
DINO assumes precise pointer aliasing to work correctly [Lucia and Ransford 2015], and performs
very poorly on conservative, practical pointer aliasing. Thus, the DINO code we obtained from the
authors of Chain [Colin and Lucia 2016] was hand-annotated assuming perfect pointer aliasing. Our
evaluation shows that Alpaca outperforms even the hand-annotated oracle DINO. Unlike DINO, Alpaca
does not assume perfect pointer aliasing.
We ported Ratchet [Van Der Woude and Hicks 2016], which originally targeted the ARM
architecture, to run on the TI MSP430 series. Our port lacks some of the ARM-specific
optimizations suggested by the original work. According to the evaluation by the authors, omitting
the optimizations can lead to a slowdown of around 1.6x [Van Der Woude and Hicks 2016].
We studied six applications, summarized in Table 2.
Table 2. Summary of benchmarks.
App  Description
CEM  LZW-compresses a random number stream using a 512-entry dictionary.
CF   Stores and retrieves a sequence of random numbers using a 128-entry filter.
RSA  Encrypts an 11-byte string with a 64-bit key using RSA encryption.
AR   Collects 128 accelerometer samples and uses nearest-neighbor classification to detect movement.
BF   Encrypts a 32-byte string using Blowfish encryption.
BC   Counts the number of set bits in a given input stream.
To ensure a fair comparison, applications use identical task definitions for Chain, Alpaca-redo,
and Alpaca-undo, and we inserted task boundaries at equivalent code points for DINO. Since
inserting boundaries automatically is part of the system for Ratchet, we did not manually insert
boundaries for Ratchet.
8 EVALUATION
Our evaluation compares directly to Chain, DINO, and Ratchet and illustrates several findings
about Alpaca. The data show that both Alpaca-undo and Alpaca-redo outperform existing systems
while running natively on existing hardware, both on harvested energy and on continuous
power. Our evaluation characterizes these findings, showing that Alpaca avoids the costliest
time and memory overheads of prior approaches. We qualitatively and quantitatively show that
programming with Alpaca is simple compared to other approaches. We also contrast our Alpaca-
redo implementation with an alternative Alpaca-redo design that privatizes data to volatile memory,
showing that our baseline design is usually more efficient because of additional overheads required
by volatile privatization.
8.1 Run Time Performance
[Figure 10: stacked bars of normalized run time for Alpaca-undo, Alpaca-redo, Chain, DINO, and Ratchet on
CEM, CF, RSA, AR, BF, and BC, broken down into checkpointing, channeling, logging, task transition, and app code.]
Fig. 10. Breakdown of overheads. Bars are normalized to the run time of Alpaca-undo.
Figure 9 shows Alpaca’s run time performance,
measured on real hardware on both continuous
power and on harvested RF energy. Performance
on continuous power is an upper bound on per-
formance because it avoids reboot-related over-
head. Performance on harvested energy includes
all reboot-related overheads and is representative
of a real-world deployment.
Figure 9a shows performance on continuous
power for each system, normalized to a plain C
implementation that implements each application
without considering intermittence. As expected,
Alpaca has an overhead compared to plain C code,
with an average slowdown of 1.55x (Alpaca-undo)
and 2.31x (Alpaca-redo). However, both Alpaca
variants outperform the previous state-of-the-art
systems Chain, DINO, and Ratchet. Also, Alpaca-undo
consistently outperforms Alpaca-redo, by
1.49x on average, confirming that the optimization discussed in Section 5.1 is useful. Compared
with the previous state-of-the-art systems, Alpaca-undo outperforms Chain, DINO, and Ratchet by
5.44x, 4.22x, and 2.99x on average, respectively.
Figure 9b shows performance on harvested energy. Here, we omit the plain C variant because
it does not handle intermittence and cannot run correctly on harvested energy. Alpaca-undo
again outperforms Alpaca-redo by 1.53x, and both Alpaca variants outperform all the other systems:
Alpaca-undo outperforms Chain, DINO, and Ratchet by 5.19x, 4.63x, and 4.00x on average, respectively. The
performance gap is mostly larger on harvested energy because power failures introduce reboot-
related overheads and Alpaca’s reboot overhead is extremely low.
8.2 Characterizing Alpaca’s Runtime Overhead
To better understand Alpaca’s performance, we made detailed measurements of each system’s
major overheads. The main overheads of the two Alpaca variants are logging (undo-logging for Alpaca-undo,
and redo-logging or privatization for Alpaca-redo) and task transitioning. Chain’s major overheads
are channel manipulation and task transitioning. Task transitioning differs across the three systems:
different: Alpaca-undo’s task transitioning simply clears the index of the backup list and some
flags, whereas Alpaca-redo’s task transitions commit privatized state and Chain’s task transitions
commit all data written to “self” channels. DINO and Ratchet’s major overheads are checkpointing
and restoring the checkpoint on reboot.
We measured each system’s overheads by toggling a GPIO pin at the beginning and at the end of each
overhead and summing up the durations using the Saleae logic analyzer. When measuring the overhead,
we experimented on continuous power instead of harvested energy, since frequent GPIO toggling
consumes a non-negligible amount of energy. On continuous power, we used the microcontroller’s
internal timer to periodically mimic power failure. Our measurements are not exact, and may
over-estimate overheads whose durations are shorter than what can be precisely captured by our
method, such as Alpaca’s logging overhead, which is only a few instructions. Nonetheless, we expect
the result to show the rough scale of each overhead without deviating too much from the truth.
Figure 10 shows overheads of each system. The data show that Alpaca has high performance
because it imposes few overheads. Alpaca’s logging requires many fewer operations than Chain’s
channel manipulations and Ratchet and DINO’s checkpointing. Also, the task transition overhead
of Alpaca-undo is shown to be much less than the task transition overhead of Alpaca-redo. This is
because Alpaca-redo needs to commit privatized values on transition, and is the main reason for
the speedup of Alpaca-undo against Alpaca-redo.
8.3 Non-Volatile Memory Consumption
[Figure 11: bar chart of normalized NVM usage for Alpaca-undo, Alpaca-redo, Chain, DINO, and Ratchet on
CEM, CF, RSA, AR, BF, BC, and their geometric mean; two off-scale bars are labeled 17.6 and 12.1.]
Fig. 11. Non-volatile memory use.
We measured the non-volatile memory con-
sumption by inspecting each application binary.
For Ratchet which uses FRAM as its main mem-
ory, we measured the size of the stack to mea-
sure the non-volatile memory consumption. For
DINO which reserves double-buffered check-
pointing space equal to twice the maximum
stack size of 2KB, i.e., 4KB total, we added 4KB
to the number from the binary. None of these
applications dynamically allocates non-volatile
memory, as is typical in embedded systems.
Figure 11 shows that Alpaca-undo and Alpaca-redo
use a moderate amount of non-volatile memory, using
slightly more than Ratchet, but much less than
Chain and DINO. Alpaca uses less non-volatile memory than Chain mainly because Chain creates
multiple versions of variables that exist in different channels. Alpaca uses less non-volatile memory
than DINO because DINO checkpoints all volatile state and versions some non-volatile state, while
Alpaca never checkpoints and only selectively privatizes non-volatile state.
8.4 Privatizing Data to Volatile Memory
We evaluated an alternative implementation of Alpaca-redo, called Alpaca-VM, that uses volatile
memory to store privatized values, motivated by the fact that volatile memory accesses require less
energy than non-volatile memory accesses. Unlike Alpaca-undo whose updates are made in-situ in
FRAM, Alpaca-redo privatizes the W-A-R variable, making privatizing to volatile memory possible.
To ensure that volatile values commit atomically despite failures, Alpaca-VM must make a full
copy of all privatized values to a non-volatile commit buffer during pre-commit. Privatizing data to
volatile memory is only a net benefit if the time and energy saved by using volatile memory in the
task are more than the time and energy consumed by copying to the commit buffer.
We experimented with a microbenchmark that performs a fixed number of read-modify-write (RMW)
operations to measure how many accesses to volatile privatized data are required to amortize the
increased pre-commit cost of using volatile privatization buffers. The results indicate that
when a task contains more than around 110 RMWs, Alpaca-VM begins to outperform Alpaca-redo.
We quantified the number of reads and writes per task in our real applications. Our tasks
had 2.1 reads and 1.05 writes to privatized variables on average. The numbers are much smaller
than the “tipping point” which was around 110 RMWs, suggesting that volatile privatization is
unlikely to pay off.
We implemented Alpaca-VM and the result showed that the performance is often worse than, or
negligibly different from Alpaca-redo’s performance, which is consistent with our “tipping point”
characterization. Alpaca-VM is only likely to be viable and beneficial in a system with a much
larger energy buffering capacitor that accommodates more (i.e., hundreds of) reads and writes in
each task.
8.5 Comparing Programmer Effort
We compared the programming effort required by Alpaca to the effort required by Chain, DINO,
and Ratchet and found that Alpaca requires reasonable code changes compared to Ratchet and
DINO code, but requires less change than writing Chain code. Like Alpaca, Chain also requires
the programmer to decompose code into tasks, which is different from writing typical C code
and we characterize task sizing next. Unlike Alpaca, Chain also requires additional effort to rewrite
memory access code in terms of channel operations, which is different from typical C
programming. Alpaca instead allows code to manipulate task-shared variables like ordinary C
variables using loads and stores.
Table 3. Lines of code and number of keywords.
App | Alpaca: LoC  # Bnd.  # Decl. | Chain: LoC  # Bnd.  # Decl.  # R/W | DINO: LoC  # Bnd. | Ratchet: LoC
CEM |         372      19       28 |        721      19       40     63 |       338      13 |          325
CF  |         397      19       29 |        707      19       41     72 |       335      11 |          324
RSA |         765      27       40 |       1197      27       53    123 |       722      35 |          687
AR  |         466      19       26 |        713      19       34     57 |       439       8 |          431
BF  |         614      18       24 |        740      18       29     75 |       556       9 |          547
BC  |         313      23       26 |        588      23       26     57 |       276      10 |          266
8.5.1 Quantifying Programmer Effort. We quantified the difference in programmer effort between
systems by comparing the differences in the number of lines of code (LoC) and the number of
keywords required by each system. Keywords are divided into three types: boundary, declaration, and
read/write. Boundary keywords (Bnd) represent task boundaries (i.e., transition_to) in Alpaca
and Chain, and checkpoints in DINO. Declaration keywords (Decl) modify function and data
declarations: task and TS for Alpaca, and task and channel declaration for Chain. Read/Write
keywords (R/W) access memory and only occur in Chain (channel in and channel out), because
Alpaca and DINO use a standard C read/write memory interface. Ratchet does not require any
additional keywords.
Table 3 summarizes the data. On average, the number of lines of Alpaca code is 11% higher than
DINO code and 14% higher than Ratchet, but Alpaca requires 39% fewer lines than Chain. The
number of keywords used by Alpaca code is 240% more than the number used by DINO code, but
is only 27% of the number used by Chain code. Although these data are only a rough indicator
of programming complexity, the data suggest that Alpaca’s complexity lies somewhere between
Chain and DINO.
[Figure 12. Panel (a): one charge cycle run time for various distances; x-axis: distance from power source (cm), 5 to 80; y-axis: run time (ms), 0 to 35. Panel (b): run time for various task sizes; x-axis: number of NVM writes inside a task (log scale, 10^0 to 10^3); y-axis: run time (x10^4 cycles), 2 to 12.]
Fig. 12. Choosing task size. (a) Energy availability does not vary with input power. (b) Task overhead varies
with task size.
8.5.2 Choosing a Task’s Size. Dividing a program into tasks is a key part of Alpaca development,
and we experimentally characterize the process to show that it is reasonable. Alpaca preserves
forward progress at the granularity of a task, assuming the system eventually buffers sufficient
energy to complete each task. However, a real energy-harvesting system with a fixed-size energy
buffer may never be able to buffer sufficient energy for a very long task to complete, preventing
progress. Conversely, if a task is too short, its privatization, commit, and transition overhead will
be high relative to its useful work, impeding performance. The programmer must therefore assign
work to each Alpaca task based on knowledge of the device and the energy cost of program tasks.
While it is a non-trivial programming task, defining the extents of Alpaca tasks requires only
modest programmer effort. We observed that on today’s energy-harvesting hardware, the task
decomposition problem is independent of input power and depends only on the device’s energy
buffer size. Figure 12a shows data for a microbenchmark that runs a loop on a WISP5 [Sample et al.
2008] device harvesting energy from an RF power supply. The x-axis shows the distance to the RF
power supply, which corresponds to input power. The y-axis shows the time to the first brown
out, at which point the device has exhausted energy accumulated in its capacitor and must slowly
recharge. Except for distances so small that the RF supply effectively continuously powers the
device (~10cm), the amount of work that the system can execute before browning out is invariant
to input power; the energy buffer is constant. Forming Alpaca tasks is thus a reasonable (albeit
non-trivial) task because the programmer need only reason about the total energy cost of a task.
The programmer need not reason about instantaneous input power, nor the power envelope of
particular hardware operations, which would be difficult.
We also experimentally observed that choosing a task size that amortizes privatization, commit,
and transition costs is not overly challenging. On a WISP5 device, we studied the effect of task size
on the run time of a microbenchmark that executes a fixed amount of work divided across tasks of
varying size. The microbenchmark executes a fixed total number of read-modify-write operations on
entries in an array. We varied the number of accesses per task, and Figure 12b shows the relationship
between task size and total run time. Run time decreases as task size grows because larger tasks
better amortize commit and transition cost. However, the effect saturates as tasks grow, revealing that
even relatively small tasks of around 100 read-modify-write operations amortize Alpaca’s overheads well.
The data suggest that choosing a task size that amortizes task overheads will not be prohibitively
challenging to a programmer.
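The amortization effect can be illustrated with a simple cost model. The sketch below is an assumption made for illustration and is not derived from the measurements: it charges a fixed cost per operation plus a fixed overhead per task (standing in for privatization, commit, and transition), and separately checks whether one task fits within a fixed energy budget. All function names and parameter values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost model; the costs are illustrative, not measured. */

/* Total cycles to run total_ops operations split into tasks of
 * ops_per_task operations, each task paying a fixed overhead. */
double total_cycles(long total_ops, long ops_per_task,
                    double cycles_per_op, double overhead_per_task)
{
    assert(total_ops > 0 && ops_per_task > 0);
    /* Number of tasks, rounded up so leftover work gets its own task. */
    long tasks = (total_ops + ops_per_task - 1) / ops_per_task;
    return (double)total_ops * cycles_per_op
         + (double)tasks * overhead_per_task;
}

/* Share of the total run time spent on per-task overhead. */
double overhead_fraction(long total_ops, long ops_per_task,
                         double cycles_per_op, double overhead_per_task)
{
    double total = total_cycles(total_ops, ops_per_task,
                                cycles_per_op, overhead_per_task);
    double useful = (double)total_ops * cycles_per_op;
    return (total - useful) / total;
}

/* A task can only ever complete if its whole energy cost fits in the
 * energy buffer; otherwise it restarts forever. */
bool task_fits_buffer(long ops_per_task, double energy_per_op,
                      double overhead_energy, double buffer_energy)
{
    assert(ops_per_task > 0);
    double cost = (double)ops_per_task * energy_per_op + overhead_energy;
    return cost <= buffer_energy;
}
```

With made-up costs of 10 cycles per operation and 100 cycles of overhead per task, moving from 1 to 100 operations per task cuts the overhead share from about 91% to about 9%, while moving from 100 to 1000 only reduces it to about 1%. This mirrors the saturation described above, and the buffer check captures the opposing constraint that a task must not grow beyond what one charge can complete.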
9 RELATED WORK
Alpaca relates to prior work in several areas. Most related are prior efforts studying intermittent
computing, some of which we discussed in Section 2. We also relate Alpaca to work on idempotent
compilation, systems with non-volatile memory, and transactions and transactional memory.
9.1 Energy-Harvesting and Intermittent Computing
There is a large body of work on intermittent execution and other support for intermittent systems.
Some work [Maeng and Lucia 2018; Mirhoseini et al. 2013; Ransford et al. 2011a; Van Der Woude and
Hicks 2016] preserves progress and keeps memory consistent by automatically placing checkpoints
that copy volatile state. Alpaca avoids the overhead of volatile state checkpointing and
the conservatism of compiler-placed checkpoints.
Other work [Colin and Lucia 2016; Lucia and Ransford 2015] versions non-volatile memory
either manually [Colin and Lucia 2016] or automatically [Lucia and Ransford 2015] to support
systems with mixed-volatility [Ransford and Lucia 2014; Sample et al. 2008; Zhang et al. 2011a].
Alpaca’s non-volatile memory protection is more efficient than the prior systems (see Section 8).
Some systems [Baghsorkhi and Margiolas 2018; Bhatti and Mottola 2017; Colin and Lucia 2018]
try to statically estimate the energy use of code and optimize checkpoint placement. However,
estimating energy use in arbitrary code is difficult and error prone. Alpaca instead asks the
programmer to place task boundaries.
QuickRecall [Ransford et al. 2011b], Hibernus [Balsamo et al. 2015], and Hibernus++ [Balsamo
et al. 2016] do on-demand checkpointing of volatile state when supply voltage is below a threshold.
This approach is effective, but requires continuous supply voltage measurement hardware, which
is not typically available [Sample et al. 2008; Zhang et al. 2011a]. Also, choosing a threshold
voltage is not straightforward. Too high a threshold makes the system checkpoint and wait for
energy, even if there is ample energy to continue. Too low a threshold may fail to guarantee
that checkpointing completes, which is especially problematic with a variable size call stack and
arbitrary global variables. Alpaca is energy agnostic, avoiding hardware requirements and threshold
voltage assignment issues.
Non-volatile processors [Ma et al. 2015] and Clank [Hicks 2017] propose architectural support
that makes intermittent software simple, but that precludes the use of existing hardware and imposes
a performance and complexity overhead. Dewdrop [Buettner et al. 2011] runs small, “one-shot”
tasks on intermittent hardware, optimizing task scheduling to maximize task completion likelihood
given limited energy. Dewdrop, however, does not support computations that span failures.
Other work addresses intermittent computation, like Alpaca, but unlike Alpaca, these efforts are
not programming or execution models. Incidental computing [Ma et al. 2017] and NEOFog [Ma et al.
2018] optimize specific applications on top of the non-volatile processor. Wisent [Aantjes et al. 2017;
Tan et al. 2016] addresses intermittence, but is not a computing model, instead enabling reliable
software updating of in situ intermittent devices. Ekho [Zhang et al. 2011b] helps test intermittent
devices with support to collect and replay representative power traces from a realistic environment.
EDB [Colin et al. 2016] is a hardware/software tool that allows programmers to profile and debug
intermittent devices without interfering with their energy level. Federated energy [Hester et al.
2015] is a disaggregated energy buffering mechanism that decouples the energy storage of different
hardware components. Flicker [Hester and Sorber 2017] eases the design of an energy-harvesting
hardware platform by modularizing peripherals and harvesters. TARDIS and CusTARD [Hester et al.
2016] keep time across power failures, and Mayfly [Hester et al. 2017] ensures the timeliness of data.
Capybara [Colin et al. 2018] enables changing the energy buffer size on-the-fly to support a variety
of application demands. Some earlier work addresses computing using harvested energy, but unlike
Alpaca, these systems do not explicitly address intermittent computation. Eon [Sorber et al. 2007]
is one of the earliest efforts to target harvested-energy computation, scheduling prioritized tasks
based on energy availability. ZebraNet [Juang et al. 2002] dealt with the challenges of solar energy
in an adversarial environment.
9.2 Idempotent Code Compilation
Several prior efforts [De Kruijf and Sankaralingam 2013; de Kruijf et al. 2012; Zhang et al. 2013] noted
that a program decomposed into idempotent sections is robust to a number of failure modes because
idempotent sections can be safely re-executed. Idempotence systems break W-A-R dependences by
dividing dependent operations with a checkpoint (or section boundary). Like these systems, Alpaca
leverages the fact that eliminating W-A-R dependences makes tasks idempotently re-executable.
Unlike other systems, however, Alpaca does not make code sections idempotent by inserting
checkpoints. Instead, Alpaca ensures task atomicity by using task-based execution to avoid the need
for volatile state checkpoints, and by privatizing non-volatile data involved in W-A-R dependences
to make tasks idempotently restartable.
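The role of a W-A-R dependence can be shown with a small simulation. The sketch below is a simplified illustration under stated assumptions, not Alpaca's implementation: nv_x stands in for a non-volatile variable, a power failure is simulated by returning early from the task, and the commit is a single store that is assumed to be atomic. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated non-volatile state: survives the simulated power failures. */
int nv_x = 0;
int nv_x_priv = 0;   /* private copy used by the privatized task */

/* Task with a W-A-R dependence on nv_x: it reads nv_x, then writes it.
 * If power fails after the write, re-execution reads the updated value. */
bool task_direct(bool fail_midway)
{
    int tmp = nv_x;          /* read */
    nv_x = tmp + 1;          /* write after read */
    if (fail_midway)
        return false;        /* power lost before the task completed */
    return true;
}

/* Same task, but all accesses go to a private copy that is committed
 * only when the task reaches its end. */
bool task_privatized(bool fail_midway)
{
    nv_x_priv = nv_x;            /* privatize from the committed value */
    nv_x_priv = nv_x_priv + 1;   /* work on the private copy only */
    if (fail_midway)
        return false;            /* nv_x was never touched */
    nv_x = nv_x_priv;            /* commit (assumed atomic in this sketch) */
    return true;
}

/* Restart the task after each of `failures` simulated power failures,
 * then let it run to completion. Returns the final value of nv_x. */
int run_with_failures(bool (*task)(bool), int failures)
{
    assert(failures >= 0);
    nv_x = 0;
    for (int i = 0; i < failures; i++)
        (void)task(true);
    (void)task(false);
    return nv_x;
}
```

After two simulated failures, the direct version leaves nv_x at 3 because each partial execution increments it again, while the privatized version leaves it at 1, the same result as a failure-free run. A real runtime must additionally make the commit step itself safe to interrupt, which this sketch does not model.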
As discussed in Section 2, Ratchet [Van Der Woude and Hicks 2016] uses compiler idempotence
analysis to insert checkpoints to make inter-checkpoint regions idempotent, assuming main memory
is entirely non-volatile. Alpaca makes no assumption about memory volatility, making it applicable
to more varied hardware, and its tasks’ sizes are free from idempotence analysis, unlike Ratchet.
9.3 Memory Persistency and Non-Volatile Memory Systems
The increasing availability of non-volatile memory creates a need for models defining the
allowable reorderings of non-volatile memory updates and persist actions, which ensure data become
persistent [Pelley et al. 2014, 2015]. Relaxing the ordering of updates and persist actions to different
locations may expose a re-ordering to code resuming execution after a failure, and persistency
models describe which of these re-orderings are valid. Other, earlier work developed mechanisms
for managing data structures in non-volatile memory, and for building consistent memory and
file systems out of byte-addressable non-volatile memory [Coburn et al. 2011; Condit et al. 2009;
Doshi and Varman 2012; Dulloor et al. 2014; Moraru et al. 2013; Narayanan and Hodson 2012;
Venkataraman et al. 2011; Volos et al. 2011]. Alpaca relates to these efforts because both aim to
keep non-volatile memory consistent across power failures. The prior work differs from Alpaca,
however, in purpose and mechanism. Alpaca is a programming model and run-time implementation
that keeps data consistent across extremely frequent failures in intermittent executions. These prior
efforts focused on large-scale systems and are only peripherally applicable to intermittent devices.
9.4 Transactions and Transactional Memory
Transactions [Gray and Reuter 1992] and, in particular, transactional memory [Hammond et al. 2004;
Harris et al. 2005; Herlihy and Moss 1993; Shavit and Touitou 1995] (TM) systems are also related
to Alpaca. Transactional memory targets multi-threaded systems. A transaction speculatively
updates memory until a (usually) statically defined atomic region ends. A transaction either commits
when it completes execution, updating globally visible state, or aborts its speculative updates due
to a conflicting access in another thread and begins execution again. Transactions are similar to
Alpaca because Alpaca buffers a task’s updates privately, committing them to global memory when
a task ends. Moreover, when a power failure interrupts a task, its privatized updates are aborted
and it begins again from its start. However, Alpaca differs in that it targets intermittent systems
with potentially extremely frequent failures. Unlike TM, Alpaca does not target multi-threaded
programs, instead aiming to keep memory consistent between re-executions across power failures.
10 CONCLUSION AND FUTURE WORK
This work proposed Alpaca, a programming model for low-overhead intermittent computing that
does not require checkpointing, using a task-based execution model and a logging scheme built on
idempotence analysis. Our Alpaca prototype achieves significant performance improvement over a
variety of competitive systems from prior work.
Looking to the future, Alpaca emphasizes a need raised by Chain and DINO for a system to aid
or automate the decomposition of a program into tasks, which is currently a reasonable but
mostly manual process.
ACKNOWLEDGMENTS
Thanks to the anonymous reviewers for their valuable feedback and to Vignesh Balaji and Emily
Ruppel for contributing to discussions about the work. This work was supported by National
Science Foundation Award CNS-1526342, a Google Faculty Research Award, and a gift from Disney
Research. Kiwan Maeng was partially supported by a scholarship from the Korea Foundation for
Advanced Studies. Visit http://intermittent.systems.
REFERENCES
Henko Aantjes, Amjad Y Majid, Przemyslaw Pawelczak, Jethro Tan, Aaron Parks, and Joshua R Smith. 2017. Fast Downstream
to Many (Computational) RFIDs. IEEE INFOCOM 2017 - The 36th Annual IEEE International Conference on Computer
Communications (2017).
Sara S Baghsorkhi and Christos Margiolas. 2018. Automating efficient variable-grained resiliency for low-power IoT systems.
In Proceedings of the 2018 International Symposium on Code Generation and Optimization. ACM, 38–49.
Domenico Balsamo, Alex S Weddell, Anup Das, Alberto Rodriguez Arreola, Davide Brunelli, Bashir M Al-Hashimi, Geoff V
Merrett, and Luca Benini. 2016. Hibernus++: a self-calibrating and adaptive system for transiently-powered embedded
devices. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35, 12 (2016), 1968–1980.
Domenico Balsamo, Alex S Weddell, Geoff V Merrett, Bashir M Al-Hashimi, Davide Brunelli, and Luca Benini. 2015. Hibernus:
Sustaining computation during intermittent supply for energy-harvesting systems. IEEE Embedded Systems Letters 7, 1
(2015), 15–18.
Naveed Anwar Bhatti and Luca Mottola. 2017. HarvOS: Efficient code instrumentation for transiently-powered embedded
sensing. In Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks. ACM,
209–219.
Michael Buettner, Ben Greenstein, and David Wetherall. 2011. Dewdrop: An Energy-aware Runtime for Computational
RFID. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI’11). USENIX
Association, Berkeley, CA, USA, 197–210.
Joel Coburn, Adrian M. Caulfield, Ameen Akel, Laura M. Grupp, Rajesh K. Gupta, Ranjit Jhala, and Steven Swanson. 2011.
NV-Heaps: Making Persistent Objects Fast and Safe with Next-generation, Non-volatile Memories. In Proceedings of the
Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS
XVI). ACM, New York, NY, USA, 105–118. DOI:http://dx.doi.org/10.1145/1950365.1950380
Alexei Colin, Graham Harvey, Brandon Lucia, and Alanson P. Sample. 2016. An Energy-interference-free Hardware-Software
Debugger for Intermittent Energy-harvesting Systems. In Proceedings of the Twenty-First International Conference on
Architectural Support for Programming Languages and Operating Systems (ASPLOS ’16). ACM, New York, NY, USA,
577–589. DOI:http://dx.doi.org/10.1145/2872362.2872409
Alexei Colin and Brandon Lucia. 2016. Chain: Tasks and Channels for Reliable Intermittent Programs. In Proceedings of
the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
(OOPSLA 2016). ACM, New York, NY, USA, 514–530. DOI:http://dx.doi.org/10.1145/2983990.2983995
Alexei Colin and Brandon Lucia. 2018. Termination checking and task decomposition for task-based intermittent programs.
In Proceedings of the 27th International Conference on Compiler Construction. ACM, 116–127.
Alexei Colin, Emily Ruppel, and Brandon Lucia. 2018. A Reconfigurable Energy Storage Architecture for Energy-harvesting
Devices. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages
and Operating Systems (ASPLOS ’18). ACM, New York, NY, USA.
Jeremy Condit, Edmund B Nightingale, Christopher Frost, Engin Ipek, Benjamin Lee, Doug Burger, and Derrick Coetzee.
2009. Better I/O through byte-addressable, persistent memory. In Proceedings of the ACM SIGOPS 22nd symposium on
Operating systems principles. ACM, 133–146.
Marc De Kruijf and Karthikeyan Sankaralingam. 2013. Idempotent code generation: Implementation, analysis, and evaluation.
In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE Computer
Society, 1–12.
Marc A. de Kruijf, Karthikeyan Sankaralingam, and Somesh Jha. 2012. Static Analysis and Compiler Design for Idempotent
Processing. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation
(PLDI ’12). ACM, New York, NY, USA, 475–486. DOI:http://dx.doi.org/10.1145/2254064.2254120
Kshitij Doshi and Peter Varman. 2012. WrAP: Managing byte-addressable persistent memory. In Memory Architecture and
Organization Workshop (MeAOW).
Subramanya R Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran, and Jeff Jackson.
2014. System software for persistent memory. In Proceedings of the Ninth European Conference on Computer Systems.
ACM, 15.
Jim Gray and Andreas Reuter. 1992. Transaction Processing: Concepts and Techniques (1st ed.). Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA.
Matthew R Guthaus, Jeffrey S Ringenberg, Dan Ernst, Todd M Austin, Trevor Mudge, and Richard B Brown. 2001. MiBench:
A free, commercially representative embedded benchmark suite. In Workload Characterization, 2001. WWC-4. 2001 IEEE
International Workshop on. IEEE, 3–14.
Lance Hammond, Vicky Wong, Mike Chen, Brian D Carlstrom, John D Davis, Ben Hertzberg, Manohar K Prabhu, Honggo
Wijaya, Christos Kozyrakis, and Kunle Olukotun. 2004. Transactional memory coherence and consistency. In ACM
SIGARCH Computer Architecture News, Vol. 32. IEEE Computer Society, 102.
Tim Harris, Simon Marlow, Simon Peyton-Jones, and Maurice Herlihy. 2005. Composable Memory Transactions. In
Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’05). ACM,
New York, NY, USA, 48–60. DOI:http://dx.doi.org/10.1145/1065944.1065952
Maurice Herlihy and J Eliot B Moss. 1993. Transactional memory: Architectural support for lock-free synchronization. In
Proc. of the 20th Annual International Symposium on Computer Architecture. 289–300.
Josiah Hester, Lanny Sitanayah, and Jacob Sorber. 2015. Tragedy of the Coulombs: Federating Energy Storage for Tiny,
Intermittently-Powered Sensors. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems
(SenSys ’15). ACM, New York, NY, USA, 5–16. DOI:http://dx.doi.org/10.1145/2809695.2809707
Josiah Hester and Jacob Sorber. 2017. Flicker: Rapid Prototyping for the Batteryless Internet-of-Things. In Proceedings of the
15th ACM Conference on Embedded Network Sensor Systems. ACM, 19.
Josiah Hester, Kevin Storer, and Jacob Sorber. 2017. Timely Execution on Intermittently Powered Batteryless Sensors. In
Conference on Embedded Networked Sensor Systems (SenSys 2017). ACM, New York, NY, USA.
Josiah Hester, Nicole Tobias, Amir Rahmati, Lanny Sitanayah, Daniel Holcomb, Kevin Fu, Wayne P. Burleson, and Jacob
Sorber. 2016. Persistent Clocks for Batteryless Sensing Devices. ACM Trans. Embed. Comput. Syst. 15, 4, Article 77 (Aug.
2016), 28 pages. DOI:http://dx.doi.org/10.1145/2903140
Matthew Hicks. 2017. Clank: Architectural Support for Intermittent Computation. In Proceedings of the 44th Annual
International Symposium on Computer Architecture. ACM, 228–240.
Philo Juang, Hidekazu Oki, Yong Wang, Margaret Martonosi, Li Shiuan Peh, and Daniel Rubenstein. 2002. Energy-efficient
Computing for Wildlife Tracking: Design Tradeoffs and Early Experiences with ZebraNet. In Proceedings of the 10th
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS X). ACM,
New York, NY, USA, 96–107. DOI:http://dx.doi.org/10.1145/605397.605408
Mustafa Emre Karagozler, Ivan Poupyrev, Gary K Fedder, and Yuri Suzuki. 2013. Paper generators: harvesting energy from
touching, rubbing and sliding. In Proceedings of the 26th annual ACM symposium on User interface software and technology.
ACM, 23–30.
Yoonmyung Lee, Gyouho Kim, Suyoung Bang, Yejoong Kim, Inhee Lee, Prabal Dutta, Dennis Sylvester, and David Blaauw.
2012. A modular 1mm³ die-stacked sensing platform with optical communication and multi-modal energy harvesting.
In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International. IEEE, 402–404.
Brandon Lucia and Benjamin Ransford. 2015. A Simpler, Safer Programming and Execution Model for Intermittent Systems.
In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015).
ACM, New York, NY, USA, 575–585. DOI:http://dx.doi.org/10.1145/2737924.2737978
Kaisheng Ma, Xueqing Li, Mahmut Taylan Kandemir, Jack Sampson, Vijaykrishnan Narayanan, Jinyang Li, Tongda Wu,
Zhibo Wang, Yongpan Liu, and Yuan Xie. 2018. NEOFog: Nonvolatility-Exploiting Optimizations for Fog Computing.
In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and
Operating Systems. ACM, 782–796.
Kaisheng Ma, Xueqing Li, Jinyang Li, Yongpan Liu, Yuan Xie, Jack Sampson, Mahmut Taylan Kandemir, and Vijaykrishnan
Narayanan. 2017. Incidental Computing on IoT Nonvolatile Processors. In Proceedings of the 50th Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO-50 ’17). ACM, New York, NY, USA, 204–218.
DOI:http://dx.doi.org/10.1145/3123939.3124533
Kaisheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, Jack Sampson, Yuan Xie, and
Vijaykrishnan Narayanan. 2015. Architecture exploration for ambient energy harvesting nonvolatile processors. In High
Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on. IEEE, 526–537.
Kiwan Maeng, Alexei Colin, and Brandon Lucia. 2017. Alpaca: intermittent execution without checkpoints. Proceedings of
the ACM on Programming Languages 1, OOPSLA (2017), 96.
Kiwan Maeng and Brandon Lucia. 2018. Adaptive Dynamic Checkpointing for Safe Efficient Intermittent Computing. In
OSDI.
Azalia Mirhoseini, Ebrahim M Songhori, and Farinaz Koushanfar. 2013. Idetic: A high-level synthesis approach for enabling
long computations on transiently-powered ASICs. In Pervasive Computing and Communications (PerCom), 2013 IEEE
International Conference on. IEEE, 216–224.
Iulian Moraru, David G Andersen, Michael Kaminsky, Niraj Tolia, Parthasarathy Ranganathan, and Nathan Binkert. 2013.
Consistent, durable, and safe memory management for byte-addressable non volatile main memory. In Proceedings of the
First ACM SIGOPS Conference on Timely Results in Operating Systems. ACM, 1.
Dushyanth Narayanan and Orion Hodson. 2012. Whole-system Persistence. In Proceedings of the Seventeenth International
Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVII). ACM, New York,
NY, USA, 401–410. DOI:http://dx.doi.org/10.1145/2150976.2151018
Joseph A Paradiso and Mark Feldmeier. 2001. A compact, wireless, self-powered pushbutton controller. In International
Conference on Ubiquitous Computing. Springer, 299–304.
Steven Pelley, Peter M. Chen, and Thomas F. Wenisch. 2014. Memory Persistency. In Proceeding of the 41st Annual International
Symposium on Computer Architecture (ISCA ’14). IEEE Press, Piscataway, NJ, USA, 265–276.
Steven Pelley, Peter M Chen, and Thomas F Wenisch. 2015. Memory Persistency: Semantics for Byte-Addressable Nonvolatile
Memory Technologies. IEEE Micro 35, 3 (2015), 125–131.
Benjamin Ransford and Brandon Lucia. 2014. Nonvolatile Memory is a Broken Time Machine. In Proceedings of the
Workshop on Memory Systems Performance and Correctness (MSPC ’14). ACM, New York, NY, USA, Article 5, 3 pages.
DOI:http://dx.doi.org/10.1145/2618128.2618136
Benjamin Ransford, Jacob Sorber, and Kevin Fu. 2011a. Mementos: System Support for Long-running Computation on
RFID-scale Devices. (2011), 159–170. DOI:http://dx.doi.org/10.1145/1950365.1950386
Benjamin Ransford, Jacob Sorber, and Kevin Fu. 2011b. Mementos: System Support for Long-running Computation on RFID-
scale Devices. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages
and Operating Systems (ASPLOS XVI). ACM, New York, NY, USA, 159–170. DOI:http://dx.doi.org/10.1145/1950365.1950386
Ronald L Rivest, Adi Shamir, and Leonard Adleman. 1978. A Method for Obtaining Digital Signatures and Public-key
Cryptosystems. Commun. ACM 21, 2 (Feb. 1978), 120–126. DOI:http://dx.doi.org/10.1145/359340.359342
Alanson P Sample, Daniel J Yeager, Pauline S Powledge, Alexander V Mamishev, and Joshua R Smith. 2008. Design of an
RFID-based battery-free programmable sensing platform. IEEE Transactions on Instrumentation and Measurement 57, 11
(2008), 2608–2615.
Nir Shavit and Dan Touitou. 1995. Software Transactional Memory. In Proceedings of the Fourteenth Annual ACM Symposium
on Principles of Distributed Computing (PODC ’95). ACM, New York, NY, USA, 204–213.
DOI:http://dx.doi.org/10.1145/224964.224987
Jacob Sorber, Alexander Kostadinov, Matthew Garber, Matthew Brennan, Mark D. Corner, and Emery D. Berger. 2007. Eon:
A Language and Runtime System for Perpetual Systems. In Proceedings of the 5th International Conference on Embedded
Networked Sensor Systems (SenSys ’07). ACM, New York, NY, USA, 161–174.
DOI:http://dx.doi.org/10.1145/1322263.1322279
Jethro Tan, Przemysław Pawełczak, Aaron Parks, and Joshua R Smith. 2016. Wisent: Robust downstream communication and
storage for computational RFIDs. In Computer Communications, IEEE INFOCOM 2016-The 35th Annual IEEE International
Conference on. IEEE, 1–9.
TI Inc. 2017. Products for MSP430FRxx FRAM.
http://www.ti.com/lsds/ti/microcontrollers-16-bit-32-bit/msp/ultra-low-power/msp430frxx-fram/products.page. (2017). Accessed: 2017-04-08.
Joel Van Der Woude and Matthew Hicks. 2016. Intermittent Computation Without Hardware Support or Programmer
Intervention. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI’16).
USENIX Association, Berkeley, CA, USA, 17–32.
Shivaram Venkataraman, Niraj Tolia, Parthasarathy Ranganathan, and Roy H. Campbell. 2011. Consistent and Durable Data
Structures for Non-volatile Byte-addressable Memory. In Proceedings of the 9th USENIX Conference on File and Storage
Technologies (FAST’11). USENIX Association, Berkeley, CA, USA, 5–5.
Haris Volos, Andres Jaan Tack, and Michael M. Swift. 2011. Mnemosyne: Lightweight Persistent Memory. In Proceedings
of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems
(ASPLOS XVI). ACM, New York, NY, USA, 91–104. DOI:http://dx.doi.org/10.1145/1950365.1950379
Zac Manchester. 2015. KickSat. http://zacinaction.github.io/kicksat/. (2015).
Hong Zhang, Jeremy Gummeson, Benjamin Ransford, and Kevin Fu. 2011a. Moo: A batteryless computational RFID and
sensing platform. Department of Computer Science, University of Massachusetts Amherst., Tech. Rep (2011).
Hong Zhang, Mastooreh Salajegheh, Kevin Fu, and Jacob Sorber. 2011b. Ekho: Bridging the Gap Between Simulation and
Reality in Tiny Energy-harvesting Sensors. In Proceedings of the 4th Workshop on Power-Aware Computing and Systems
(HotPower ’11). ACM, New York, NY, USA, Article 9, 5 pages. DOI:http://dx.doi.org/10.1145/2039252.2039261
Wei Zhang, Marc de Kruijf, Ang Li, Shan Lu, and Karthikeyan Sankaralingam. 2013. ConAir: Featherweight Concurrency
Bug Recovery via Single-threaded Idempotent Execution. In Proceedings of the Eighteenth International Conference on
Architectural Support for Programming Languages and Operating Systems (ASPLOS ’13). ACM, New York, NY, USA,
113–126. DOI:http://dx.doi.org/10.1145/2451116.2451129