Decentralized On-line Task Reallocation on Parallel Computing
  Architectures with Safety-Critical Applications by Khamvilai, Thanakorn et al.
Thanakorn Khamvilai
School of Aerospace Engineering
Georgia Institute of Technology
Atlanta, GA, USA
tkhamvilai3@gatech.edu
Philippe Baufreton
Safran Electronics & Defense
Massy, France
philippe.baufreton@safrangroup.com
Louis Sutter
School of Aerospace Engineering
Georgia Institute of Technology
Atlanta, GA, USA
lsutter6@gatech.edu
François Neumann
Safran Electronics & Defense
Massy, France
francois.neumann@safrangroup.com
Eric Feron
School of Aerospace Engineering
Georgia Institute of Technology
Atlanta, GA, USA
eric.feron@aerospace.gatech.edu
Abstract—This work presents a decentralized allocation algorithm
for safety-critical applications on parallel computing architectures,
where individual Computational Units can be affected
by faults.
The described method consists in representing the architecture
by an abstract graph where each node represents a Computa-
tional Unit. Applications are also represented by the graph of
Computational Units they require for execution. The problem
is then to decide how to allocate Computational Units to appli-
cations to guarantee execution of the safety-critical application.
The problem is formulated as an optimization problem, with the
form of an Integer Linear Program. A state-of-the-art solver is
then used to solve the problem.
Decentralizing the allocation process is achieved through
redundancy of the allocator executed on the architecture. No
centralized element decides on the allocation of the entire
architecture, thus improving the reliability of the system.
An experimental reproduction of a multi-core architecture is also
presented. It is used to demonstrate the capability of the proposed
allocation process to maintain the operation of a physical
system in a decentralized way while individual components fail.
Index Terms—parallel computing, multi-core, reconfigurable,
safety-critical, fault tolerance, decentralized, integer linear pro-
gramming
I. INTRODUCTION AND PRIOR ART
The onset of multi-core processors appeared as a golden
opportunity for the embedded systems industry to improve
the efficiency of embedded computers. Multi-core processors
carry several benefits over single-core ones: they bring
more computational power through parallelization without
increasing the chip's internal frequency, energy consumption,
or heating. They now pervade cellular communication devices
and mass-market embedded electronics, for example, and many
other industries are now taking advantage of such processors,
such as the automotive industry [1], the biotechnology industry [2]
and the circuit industry [3]. However, as far as critical systems
are concerned, these benefits come with great certification
challenges [4], [5], since parallel applications on a multi-core
processor may interfere with one another. The aerospace industry
is nevertheless working to take up this challenge [6].
This effort has been funded in part by SAFRAN and by the National Science
Foundation, Grants CNS 1544332 and 1446758.
A reconfigurable multi-core architecture that could host
safety critical applications, e.g. [7], [8], [9], can become an
example of a safe multi-core processor by taking advantage
of the inherent redundancy of such processors that enables
graceful degradation [10]: when some core fails, we can
use the multiple remaining ones by reallocating affected
applications to a healthy area of the chip.
The inherent redundancy in such parallel architecture can
also be seen as an opportunity to increase the reliability of
computing systems, be it in safety critical embedded systems
or for computing centers requiring guarantees of continuity of
service.
For example, several attempts have been made to increase
the reliability of safety-critical systems using multi-core
processors. In [11], a “hypervisor” is used to organize access
to shared resources for applications, including safety-critical
ones. However, a failure of this hypervisor is not taken
into account in this patent. Therefore, such a technique merely
moves the problem, since the reliability of the whole system rests
on the reallocation decision organ, which constitutes a single
point of failure: the most complex and efficient reallocator is
pointless if the system it executes on fails. In [12], backup
allocations are pre-calculated for each failure case and
stored by individual Computational Units (CUs). For a
small architecture with only a few CUs, this solution is
satisfactory and ensures continuous fault tolerance of the
system without requiring a centralized allocator. However,
storing backup configurations can require a lot of memory
when the architecture becomes bigger. Also, the proposed
approach does not consider applications that can themselves
be parallelized and executed on several CUs at the same time.
arXiv:1910.06313v2 [cs.DC] 19 Nov 2019
Our approach differs from these two solutions by providing
an on-line and decentralized reallocation algorithm for a
general architecture that can be represented by a graph and
for parallelized applications requiring several CUs to execute.
Even though this work is motivated by a multi-core
architecture, it presents a decentralized task allocation
algorithm for an abstract parallel computing architecture
made of a set of CUs connected together and forming a
network. Such an architecture can represent for example
a multi-core processor, with each CU standing for one
core, a cluster of high-performance computers, or a team
of mobile robots. The aim of the algorithm is to find the
optimal allocation of an a priori defined set of tasks on the
architecture while taking into account the faults affecting the
CUs. The faults are assumed to be detected by the algorithm
when they occur either via a timeout mechanism or a voter,
but this work does not provide details of those fault detection
mechanisms. As described later, two types of fault will be
considered, the first one completely stopping the operation of
the CU, and a second one considered to modify the computed
output of the CU.
The second main feature of this work is the decentralized
aspect of the allocation process. Decentralized means here
that there is no central element deciding alone on the
allocation for the rest of the architecture. Instead, we use
redundant copies of the allocation algorithm executed on the
architecture itself, meaning that the copies must reallocate
themselves. This is achieved by using majority voting systems.
This work also presents an experimental setup reproducing
several aspects of a parallel computing architecture and used
to implement the proposed decentralized allocation algorithm.
The setup uses a network of Raspberry Pi single board
computers [18] to represent the CUs of the architecture.
II. THEORETICAL ASPECT
A. Mathematical description of the allocation problem
This section describes the mathematical formulation of
the general allocation problem that is considered in this
work. The idea is to use this mathematical formulation in
an Integer Linear Program (ILP), whose solution is the best
allocation of the tasks on the parallel computing platform
(multi-core processors, network of computers in a computing
center, etc), according to criteria described in Section III-C2,
taking into account the number of applications running,
their priority, the number of reallocated applications and the
length of communication paths between allocators and other
applications.
The considered parallel computing platform is represented
by a directed simple graph G = (V,E), where V is the set of
vertices and $E \subseteq \{(x, y) \in V^2 \mid x \neq y\}$ is the set of edges
[13]. Each vertex of G represents a CU, for example one core
in a multi-core processor or at a different scale, one computer
in a massively parallel supercomputer, and each edge of G
represents a physical communication link between two CUs.
The communication links are considered bidirectional, and
therefore the orientation of edges can be chosen arbitrarily: we
choose them to be oriented only to write more conveniently
further constraints on the communication flow.
The graph G therefore represents the topology of the
platform. For example, the platform can have a simple square
mesh topology, as represented in Fig. 1.
Figure 1: Example of square mesh topology. The orientation of the edges is
arbitrary.
From G, we define parameters that will be used later in this
work.
Definition 1. NCUs is defined as the number of CUs in the
computing platform, that is the number of vertices of G.
Definition 2. Npaths is defined as the number of Physical
Communication Links, or physical paths, in the platform, that
is the number of edges of G.
Let $N_{\text{apps}} \in \mathbb{N}$ and $A = \{\text{app}_k,\ k \in \llbracket 1, N_{\text{apps}} \rrbracket\}$ be a set
of applications to be executed on the parallel computing
platform. The applications in A are ranked by priority, $\text{app}_1$
having the highest priority and $\text{app}_{N_{\text{apps}}}$ having the lowest one.
The ranking is established a priori and represents the tolerated
order in which we stop applications in case of computing
resource failures. In the context of a commercial aircraft, an
example of such applications with different priorities would
be the engine controller, with the highest priority, and a health
monitoring application, with a lower priority, which is in
charge of analyzing data from the engine in order to estimate
its wear and to predict when maintenance operations are
required. In case of computing resource failures, it would
be tolerated in this context to stop the health monitoring
application in order to maintain the execution of the engine
controller.
For $k \in \llbracket 1, N_{\text{apps}} \rrbracket$, we assume that the compiler for the
considered architecture decomposes the application $\text{app}_k$
into an undirected simple graph $G_k = (V_k, E_k)$, where each
vertex, that we will call Application Node, represents a
sub-task of appk that must be executed by a CU, and each
edge represents a required communication link between two
Application Nodes, that we will call an Application Link. Fig.
2 gives an example of such application graphs.
Figure 2: Example of application graphs. Each application node is identified
with a unique index.
app1 has highest priority, app3 has the lowest.
From each graph $G_k$ for $k \in \llbracket 1, N_{\text{apps}} \rrbracket$, we define the
following parameters.
Definition 3. $N^k_{\text{nodes}}$ is defined as the number of Application
Nodes in application k and $N^k_{\text{links}}$ is defined as the number of
Application Links in application k.
Definition 4. $N_{\text{nodes}} := \sum_{k=1}^{N_{\text{apps}}} N^k_{\text{nodes}}$ is the total number of
Application Nodes, and $N_{\text{links}} := \sum_{k=1}^{N_{\text{apps}}} N^k_{\text{links}}$ is the total
number of Application Links.
Each application node is given a global index $j \in \llbracket 1, N_{\text{nodes}} \rrbracket$
with the following procedure: the nodes of $\text{app}_1$ keep the
same indices as in the local numbering of vertices in $G_1$;
then the global indices for nodes of $\text{app}_2$ are obtained
by increasing their local indices by $N^1_{\text{nodes}}$; and so on for
the nodes of $\text{app}_k$, by increasing the local numbering by
$\sum_{l=1}^{k-1} N^l_{\text{nodes}}$. The result of the global numbering of the
nodes can be seen in Fig. 2. An identical process is applied
to obtain a global numbering of the edges of the application
graphs.
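The global numbering procedure can be illustrated with a minimal Python sketch. The function names `global_offsets` and `global_index` and the example application sizes are assumptions made for this illustration; they are not taken from the authors' implementation.

```python
from itertools import accumulate


def global_offsets(nodes_per_app):
    """Offset added to the local node indices of each application.

    nodes_per_app[k-1] plays the role of N^k_nodes; the offset for app_k
    is the sum of N^l_nodes over l < k.
    """
    return [0] + list(accumulate(nodes_per_app))[:-1]


def global_index(app, local_index, nodes_per_app):
    """Global index of a node, given its application and local index (both 1-based)."""
    return global_offsets(nodes_per_app)[app - 1] + local_index


# Hypothetical example: three applications with 4, 2 and 3 Application Nodes.
nodes_per_app = [4, 2, 3]
print(global_offsets(nodes_per_app))      # offsets of app_1, app_2, app_3
print(global_index(3, 1, nodes_per_app))  # global index of the first node of app_3
```

The same helper could be applied to the per-application link counts to number the Application Links.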
The problem that we tackle here is to assign applications to
CUs of the architecture while faults affect some CUs, taking
into account the priority of the applications and specific
constraints of the architecture. A solution will look like Fig.
3. The approach that we take here to solve the problem is to
formulate the allocation problem as an Integer Linear Program
(ILP) and use a state-of-the-art ILP solver such as the “GNU
Linear Programming Kit” (GLPK) [14].
Figure 3: Example of a solution with a fault on CU 11.
An additional aspect of the problem that we propose to
solve is to make the allocation process decentralized, in the
sense detailed in the introduction and in Section IV, with no
central computing element allocating the tasks according to
the solution of the ILP problem. The way this decentralized
allocation is achieved is specifically described in Section IV:
it involves several copies of the task computing the allocation
and being executed on the platform itself. The number of such
copies is the last parameter of our problem.
Definition 5. Nrealloc is defined as the number of copies of the
Allocator Application.
The next section details how the allocation problem is
formulated as an ILP problem.
III. ILP FORMULATION OF THE TASK ALLOCATION
PROBLEM
A. Matrix representation of graphs
Definition 6. From the graph representation G = (V,E) of
the parallel computing platform, the NCUs × Npaths incidence
matrix G associated with G is defined as:
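$$[G]_{ij} := \begin{cases} -1 & \text{if } e_j \in E \text{ leaves } v_i \in V \\ 1 & \text{if } e_j \in E \text{ enters } v_i \in V \\ 0 & \text{otherwise} \end{cases}$$
The $N_{\text{CUs}} \times N_{\text{paths}}$ NoC unoriented incidence matrix $\hat{G}$
associated with G is defined as:
$$[\hat{G}]_{ij} := \left| [G]_{ij} \right|$$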
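As a hedged illustration of Definition 6, the following Python sketch builds the incidence matrix of a 4 × 4 square mesh and its unoriented counterpart. The row-by-row, 0-based CU numbering, the left-to-right and top-to-bottom edge orientation, and all function names are assumptions of this sketch.

```python
def square_mesh_edges(rows, cols):
    """Directed edges (tail, head) of a rows x cols mesh.

    CUs are numbered row by row from 0; the orientation is an arbitrary choice.
    """
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))      # horizontal link
            if r + 1 < rows:
                edges.append((v, v + cols))   # vertical link
    return edges


def incidence_matrix(n_vertices, edges):
    """Oriented incidence matrix: -1 where edge j leaves v_i, +1 where it enters."""
    G = [[0] * len(edges) for _ in range(n_vertices)]
    for j, (tail, head) in enumerate(edges):
        G[tail][j] = -1
        G[head][j] = 1
    return G


def unoriented(G):
    """Entry-wise absolute value of the incidence matrix."""
    return [[abs(g) for g in row] for row in G]


edges = square_mesh_edges(4, 4)
G = incidence_matrix(16, edges)
G_hat = unoriented(G)
print(len(G), len(edges))  # number of CUs and number of physical links
```

Each column of `G` contains exactly one −1 and one +1, so every column sums to zero, while every column of `G_hat` sums to two.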
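Definition 7. From the graph $G_k = (V_k, E_k)$ representing
the k-th application, the $N^k_{\text{nodes}} \times N^k_{\text{links}}$ application unoriented
incidence matrix $H^k$ associated with $G_k$ is defined as:
$$[H^k]_{ij} := \begin{cases} 1 & \text{if } v^k_i \in V_k \text{ and } e^k_j \in E_k \text{ are incident} \\ 0 & \text{otherwise} \end{cases}$$
Furthermore, the $N_{\text{nodes}} \times N_{\text{links}}$ overall application unoriented
incidence block-diagonal matrix H is defined as
$$H := \begin{bmatrix} H^1 & & \\ & \ddots & \\ & & H^{N_{\text{apps}}} \end{bmatrix}$$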
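The construction of the per-application matrices and of the block-diagonal matrix H can be sketched as follows. The two small application graphs and the helper names `app_incidence` and `block_diag` are hypothetical and only serve this example.

```python
def app_incidence(n_nodes, links):
    """Unoriented incidence matrix of one application graph.

    links is a list of (u, v) pairs of local, 0-based node indices.
    """
    H = [[0] * len(links) for _ in range(n_nodes)]
    for j, (u, v) in enumerate(links):
        H[u][j] = 1
        H[v][j] = 1
    return H


def block_diag(blocks):
    """Place the given matrices along the diagonal of a zero matrix."""
    n_rows = sum(len(b) for b in blocks)
    n_cols = sum(len(b[0]) if b else 0 for b in blocks)
    H = [[0] * n_cols for _ in range(n_rows)]
    r0 = c0 = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, val in enumerate(row):
                H[r0 + i][c0 + j] = val
        r0 += len(b)
        c0 += len(b[0]) if b else 0
    return H


H1 = app_incidence(2, [(0, 1)])           # two nodes joined by one link
H2 = app_incidence(3, [(0, 1), (1, 2)])   # three nodes in a chain
H = block_diag([H1, H2])
for row in H:
    print(row)
```

The row and column offsets used by `block_diag` coincide with the global numbering of Application Nodes and Application Links.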
B. Definition of the Decision Variables
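Definition 8. The $N_{\text{CUs}} \times N_{\text{nodes}}$ decision matrix $X^{\text{CUs}\to\text{nodes}}$,
mapping Application Nodes to CUs, is defined as:
$$X^{\text{CUs}\to\text{nodes}}_{ij} = \begin{cases} 1 & \text{if the CU } i \text{ is allocated to the Application Node } j \\ 0 & \text{otherwise} \end{cases}$$
Definition 9. The $N_{\text{paths}} \times N_{\text{links}}$ decision matrix $X^{\text{paths}\to\text{links}}$,
mapping Application Links to Physical Links, is defined as:
$$X^{\text{paths}\to\text{links}}_{ij} = \begin{cases} 1 & \text{if the Physical Link } i \text{ is allocated to the Application Link } j \\ 0 & \text{otherwise} \end{cases}$$
Definition 10. The $N_{\text{apps}} \times 1$ decision vector r, representing
which applications are executed, is defined as:
$$r_i = \begin{cases} 1 & \text{if the application } i \text{ is running} \\ 0 & \text{if it is dropped} \end{cases}$$
Definition 11. The $N_{\text{nodes}} \times 1$ decision vector M, representing
which application nodes are reallocated, is defined as:
$$M_i = \begin{cases} 1 & \text{if the Application Node } i \text{ is moved from its previously allocated CU} \\ 0 & \text{otherwise} \end{cases}$$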
Definition 12. For $k \in \llbracket 1, N_{\text{realloc}} \rrbracket$, the $N_{\text{paths}} \times N_{\text{CUs}}$ decision
matrix $X^{\text{Comm},k}$, representing communication paths between
the k-th allocator application and every CU of the platform,
is defined as:
$$X^{\text{Comm},k}_{ij} = \begin{cases} -1 & \text{if the Physical Link } i \text{ is used to communicate between the allocator } k \text{ and the CU } j \text{ in the negative direction} \\ 1 & \text{if the Physical Link } i \text{ is used to communicate between the allocator } k \text{ and the CU } j \text{ in the positive direction} \\ 0 & \text{otherwise} \end{cases}$$
Positive (respectively negative) direction means that the com-
munication takes place in the same (respectively opposite)
direction as the edge of the directed graph G, as in Fig. 1.
C. Formulation of the optimization model
This section gives the detail of the formulation of the
optimization problem that is solved each time a new fault is
detected. This formulation includes the detail of the chosen
objective function and constraints.
1) General form of the optimization model: The allocation
problem is formulated as an Integer Linear Program (ILP) of
the form [15]:
$$\begin{aligned} \text{maximize} \quad & f(x) = c^T x \\ \text{subject to} \quad & M_1 x \le b_1 \\ & M_2 x = b_2 \\ & x \text{ is a vector of integers.} \end{aligned} \tag{1}$$
x is the global vector of decision variables derived from
the vectorization and the aggregation of the decision matrices
from Section III-B. We define it formally as:
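$$x = \begin{bmatrix} \operatorname{vec}(X^{\text{CUs}\to\text{nodes}}) \\ \operatorname{vec}(X^{\text{paths}\to\text{links}}) \\ r \\ M \\ \operatorname{vec}(X^{\text{Comm},1}) \\ \vdots \\ \operatorname{vec}(X^{\text{Comm},N_{\text{realloc}}}) \end{bmatrix} \tag{2}$$
where vec is the common vectorization function for matrices:
$$\forall\, Q = (q_{i,j})_{1 \le i \le m,\ 1 \le j \le n}, \quad \operatorname{vec}(Q) = [q_{1,1}, \ldots, q_{m,1}, q_{1,2}, \ldots, q_{m,2}, \ldots, q_{1,n}, \ldots, q_{m,n}]^T.$$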
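The column-by-column ordering used by vec can be made concrete with a short Python sketch. The list-of-rows matrix representation is an assumption of this example.

```python
def vec(Q):
    """Column-major vectorization of a matrix given as a list of rows."""
    if not Q:
        return []
    m, n = len(Q), len(Q[0])
    return [Q[i][j] for j in range(n) for i in range(m)]


Q = [[1, 2, 3],
     [4, 5, 6]]
print(vec(Q))  # [1, 4, 2, 5, 3, 6]
```

Under this convention, the global vector x of equation (2) would be obtained by concatenating the vectorized decision matrices with r and M in the stated order.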
c is the vector of coefficients of the objective function and M1, M2,
b1 and b2 are parameters derived from the aggregation of the
constraints of the problem that are described in the following
sections. For example, for each scalar inequality constraint,
after arranging the inequality with all decision variables on
the left-hand side in the same order as in x and constant
terms on the right-hand side, a row containing the coefficients
of the decision variables is added to M1 and the constant
term is added in the vector b1. The same is done for equality
constraints to build M2 and b2.
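2) Objective function: Given that the applications are ranked
by priority, i.e. the first application has the highest
priority and the $N_{\text{apps}}$-th application has the lowest one, the
objective function is used to maximize the number
of executed applications while minimizing the number of
reallocations and the length of communication paths. The
chosen objective function, in terms of x as defined above in
equation (2), is:
$$\max \left\{ f(x) = \sum_{k=1}^{N_{\text{apps}}} \alpha_k \cdot r_k - (\beta + 1) \sum_{j=1}^{N_{\text{nodes}}} M_j - \sum_{k=1}^{N_{\text{realloc}}} \sum_{j=1}^{N_{\text{CUs}}} \sum_{i=1}^{N_{\text{paths}}} \left| X^{\text{Comm},k}_{ij} \right| \right\}, \tag{3}$$
where
$$\begin{aligned} \beta &= N_{\text{realloc}} \times N_{\text{CUs}} \times N_{\text{paths}}, \\ \alpha_{N_{\text{apps}}} &= (\beta + 1) \times N_{\text{nodes}} + \beta + 1, \\ \alpha_k &= \sum_{l=k+1}^{N_{\text{apps}}} \alpha_l + (\beta + 1) \times N_{\text{nodes}} + \beta + 1 \quad \forall k < N_{\text{apps}}. \end{aligned} \tag{4}$$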
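To make the recursion in (4) concrete, the following Python sketch computes β and the weights α_k from the last application backwards. The function name and the numerical values (3 applications, 9 Application Nodes, 3 allocators, 16 CUs, 24 physical links) are assumptions chosen for illustration only.

```python
def objective_weights(n_apps, n_nodes, n_realloc, n_cus, n_paths):
    """Return beta and the list [alpha_1, ..., alpha_Napps] following equation (4)."""
    beta = n_realloc * n_cus * n_paths
    base = (beta + 1) * n_nodes + beta + 1
    alpha = [0] * n_apps
    tail_sum = 0  # running sum of alpha_l for l > k
    for k in range(n_apps - 1, -1, -1):
        alpha[k] = tail_sum + base
        tail_sum += alpha[k]
    return beta, alpha


beta, alpha = objective_weights(n_apps=3, n_nodes=9, n_realloc=3, n_cus=16, n_paths=24)
print(beta)   # 1152
print(alpha)  # [46120, 23060, 11530]
```

In this example, each α_k exceeds by exactly one the sum of all lower-priority weights plus the largest possible reallocation penalty, (β + 1) × N_nodes, and the largest possible communication term, β.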
The coefficients of the objective function are chosen to
prioritize the different aspects that are optimized in this
function.
1) The first priority is to execute each application, even if
it means more reallocations and longer communication
paths.
2) Then, minimizing the number of reallocations is more
important than having shorter communication paths,
since a reallocation temporarily interrupts the execution
of the application.
3) When running all applications is not feasible, the
priorities of the applications are enforced and executing
any given application is more important than running
any number of applications with a lower priority.
However, if because of its geometry, a given application
cannot be executed anyway, nothing prevents lower-
priority applications from being executed.
These requirements motivated the choice for the coefficients
in the objective function. The proof that these coefficients
allow the objective function to meet these requirements is
given in the Appendix.
Note that the problem of minimizing or maximizing the
absolute value of the $X^{\text{Comm},k}_{ij}$ variables, which is a nonlinear
program, can be reformulated as a linear program by introducing
additional variables and constraints [15], which were
not presented in the previous section for conciseness. For
each entry $X^{\text{Comm},k}_{ij}$ of $X^{\text{Comm},k}$, an auxiliary variable $\hat{X}^{\text{Comm},k}_{ij}$
is introduced to represent its absolute value, and two extra
constraints are added:
$$+X^{\text{Comm},k}_{ij} \le \hat{X}^{\text{Comm},k}_{ij}, \qquad -X^{\text{Comm},k}_{ij} \le \hat{X}^{\text{Comm},k}_{ij}.$$
$\hat{X}^{\text{Comm},k}_{ij}$ is then used instead of $\left| X^{\text{Comm},k}_{ij} \right|$ in the objective
function. Because the objective function tends to maximize
$-\left| X^{\text{Comm},k}_{ij} \right|$, and thus to minimize $\hat{X}^{\text{Comm},k}_{ij}$, one of the two
previous constraints will be binding: the stricter one,
whose left-hand side is the greater and equal to
$\max(+X^{\text{Comm},k}_{ij}, -X^{\text{Comm},k}_{ij})$, which is exactly $\left| X^{\text{Comm},k}_{ij} \right|$.
The other constraint will be non-binding and therefore does
not affect the optimal point. This ensures that $\hat{X}^{\text{Comm},k}_{ij}$ is
equal to $\left| X^{\text{Comm},k}_{ij} \right|$.
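The effect of the two auxiliary constraints can be checked by brute force on the three admissible values of a communication variable. This is only a small sanity check written for this text, with a hypothetical helper name and an arbitrary candidate range; it is not part of the ILP model itself.

```python
def smallest_feasible_aux(x, candidates=range(-1, 3)):
    """Smallest integer x_hat satisfying both +x <= x_hat and -x <= x_hat."""
    feasible = [x_hat for x_hat in candidates if x <= x_hat and -x <= x_hat]
    return min(feasible)


for x in (-1, 0, 1):
    # Minimizing the auxiliary variable drives it down to |x|.
    print(x, smallest_feasible_aux(x))
```

For each value of x, the minimum feasible auxiliary value coincides with abs(x), which is the behavior the objective function relies on.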
3) Constraints:
a) Domain of decision variables: The decision variables
XCUs→nodes, Xpaths→links, r and M are binary i.e. the value of
their entries must be either 0 or 1.
The entries of $X^{\text{Comm},k}$ for $k \in \llbracket 1, N_{\text{realloc}} \rrbracket$ must belong
to $\{-1, 0, 1\}$.
b) Resource allocation and partitioning: Several
equations express the constraints of allocating the resources
of the CUs to applications while enforcing partitioning on the
platform.
• Each CU can be allocated to at most one Application Node, as
a way to enforce spatial partitioning of applications on
the platform, i.e.
$$\forall i \in \llbracket 1, N_{\text{CUs}} \rrbracket, \quad \sum_{j=1}^{N_{\text{nodes}}} X^{\text{CUs}\to\text{nodes}}_{ij} \le 1. \tag{5}$$
• Each running Application Node must be assigned to
exactly one CU, i.e.
$$\forall i \in \llbracket 1, N_{\text{nodes}} \rrbracket, \quad \sum_{j=1}^{N_{\text{CUs}}} X^{\text{CUs}\to\text{nodes}}_{ji} = r_{N(i)}. \tag{6}$$
N(i) is the application number corresponding to
Application Node i.
• A physical communication link of the platform can be
allocated to at most one Application Link¹, i.e.
$$\forall i \in \llbracket 1, N_{\text{paths}} \rrbracket, \quad \sum_{j=1}^{N_{\text{links}}} X^{\text{paths}\to\text{links}}_{ij} \le 1. \tag{7}$$
• Each running Application Link must be assigned to
exactly one physical communication link of the platform,
i.e.
$$\forall i \in \llbracket 1, N_{\text{links}} \rrbracket, \quad \sum_{j=1}^{N_{\text{paths}}} X^{\text{paths}\to\text{links}}_{ji} = r_{L(i)}. \tag{8}$$
L(i) is the application number corresponding to
Application Link i.
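Constraints (5) and (6) can be checked on a candidate allocation with a few lines of Python. The 0-based indexing, the function name and the small example (3 CUs, two applications, the second one dropped) are assumptions of this sketch; constraints (7) and (8) could be checked in the same way on the link allocation.

```python
def check_node_allocation(X, r, app_of_node):
    """Check constraints (5) and (6) for a CUs-to-nodes allocation.

    X[i][j] is 1 if CU i hosts Application Node j, r[a] is 1 if application a
    runs, and app_of_node[j] plays the role of N(j) (0-based here).
    """
    n_cus, n_nodes = len(X), len(app_of_node)
    # (5): each CU hosts at most one Application Node.
    c5 = all(sum(X[i]) <= 1 for i in range(n_cus))
    # (6): a node sits on exactly one CU if its application runs, on none otherwise.
    c6 = all(
        sum(X[i][j] for i in range(n_cus)) == r[app_of_node[j]]
        for j in range(n_nodes)
    )
    return c5 and c6


app_of_node = [0, 0, 1]          # nodes 0 and 1 belong to app 0, node 2 to app 1
r = [1, 0]                       # app 1 is dropped
X_ok = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
X_bad = [[1, 1, 0], [0, 0, 0], [0, 0, 0]]   # CU 0 hosts two nodes
print(check_node_allocation(X_ok, r, app_of_node))
print(check_node_allocation(X_bad, r, app_of_node))
```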
¹ This does not mean that this communication link cannot be used for other
communication purposes on the architecture, but only one of the Application
Links computed by the compiler for the applications can be allocated to that
physical communication link.
c) Compliance with the platform: An Application Link
adjacent to an Application Node that has been mapped to
a given CU must be allocated to a Physical Link that is
adjacent to that CU, i.e.
$$X^{\text{CUs}\to\text{nodes}} H = \hat{G} X^{\text{paths}\to\text{links}}. \tag{9}$$
Equation (9) is equivalent to the scalar equations (10):
$$\forall i \in \llbracket 1, N_{\text{CUs}} \rrbracket,\ \forall j \in \llbracket 1, N_{\text{links}} \rrbracket, \quad \sum_{k=1}^{N_{\text{nodes}}} X^{\text{CUs}\to\text{nodes}}_{ik} H_{kj} = \sum_{l=1}^{N_{\text{paths}}} \hat{G}_{il} X^{\text{paths}\to\text{links}}_{lj}. \tag{10}$$
The left-hand side is equal to one if and only if the CU i has
been allocated to an Application Node k that is adjacent to
Application Link j. The right-hand side is
equal to one if and only if the CU i is adjacent to a Physical
Link l that is allocated to Application
Link j, which proves the correctness of the constraint.
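The matrix form (9) can be tested numerically on a toy case. The sketch below assumes a platform of three CUs in a line (physical link 0 between CUs 0 and 1, physical link 1 between CUs 1 and 2) and a two-node application with a single Application Link; these data and the helper `matmul` are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


X_nodes = [[1, 0], [0, 1], [0, 0]]   # node 0 on CU 0, node 1 on CU 1
H = [[1], [1]]                       # the single Application Link joins both nodes
G_hat = [[1, 0], [1, 1], [0, 1]]     # unoriented incidence matrix of the line
X_links_good = [[1], [0]]            # Application Link on physical link 0
X_links_bad = [[0], [1]]             # Application Link on physical link 1

print(matmul(X_nodes, H) == matmul(G_hat, X_links_good))  # constraint (9) holds
print(matmul(X_nodes, H) == matmul(G_hat, X_links_bad))   # constraint (9) is violated
```

The second mapping is rejected because physical link 1 is not adjacent to CU 0, which hosts one end of the Application Link.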
d) Reallocating several applications: A given Application
Node can either remain assigned to the same CU,
be moved, or be dropped:
$$\forall i \in \llbracket 1, N_{\text{CUs}} \rrbracket,\ \forall j \in \llbracket 1, N_{\text{nodes}} \rrbracket \ \text{s.t.}\ X^{\text{CUs}\to\text{nodes}}_{\text{old},ij} = 1, \quad \left(1 - r_{N(j)}\right) + M_j + X^{\text{CUs}\to\text{nodes}}_{ij} = X^{\text{CUs}\to\text{nodes}}_{\text{old},ij}, \tag{11}$$
where $X^{\text{CUs}\to\text{nodes}}_{\text{old}}$ is the parameter containing the mapping
between CUs and Application Nodes computed during the
previous allocation.
Constraint (11) is ignored for the initial allocation.
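Constraint (11) states that, for every node that previously had a CU, exactly one of three events occurs: its application is dropped, it is moved, or it stays in place. A minimal Python check, with hypothetical names and a three-CU example in which CU 1 is assumed to have failed, could look as follows.

```python
def satisfies_reallocation(X_old, X_new, r, M, app_of_node):
    """Check constraint (11) for every pair (i, j) with X_old[i][j] == 1."""
    for i, row in enumerate(X_old):
        for j, was_here in enumerate(row):
            if was_here == 1:
                lhs = (1 - r[app_of_node[j]]) + M[j] + X_new[i][j]
                if lhs != X_old[i][j]:
                    return False
    return True


app_of_node = [0, 0]                    # both nodes belong to the same application
r = [1]                                 # the application keeps running
X_old = [[1, 0], [0, 1], [0, 0]]        # node 0 on CU 0, node 1 on CU 1
X_new = [[1, 0], [0, 0], [0, 1]]        # node 1 moved from CU 1 to CU 2

print(satisfies_reallocation(X_old, X_new, r, [0, 1], app_of_node))  # node 1 flagged as moved
print(satisfies_reallocation(X_old, X_new, r, [0, 0], app_of_node))  # move not flagged
```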
e) Faults: We assume the parallel computing platform is
equipped with a fault detection system that can detect faults
and inform the allocators when a CU fails. From this information,
we can add constraints to take into account faults in the platform.
For a CU i:
• If the CU is healthy, any Application Node can be mapped
on the CU.
• If the CU is faulty, then no Application Node can be
mapped on the CU i:
$$\sum_{k=1}^{N_{\text{nodes}}} X^{\text{CUs}\to\text{nodes}}_{ik} = 0. \tag{12}$$
This fault is either assumed to be detected in the model
or detected by the voter using the majority rule described in
Section V-A2.
f) Communication constraints: Since the allocators are
executed on the platform, we must ensure that they will be
able to send the allocation they computed to the other CUs
of the platform, given that the communication links allow
each CU to send a message only through its neighbors.
Therefore, we must make sure that there exists a path from
each allocator to the other CUs:
$$\forall k \in \llbracket 1, N_{\text{realloc}} \rrbracket, \quad G X^{\text{Comm},k} = S^k \tag{13}$$
where $S^k$ is the $N_{\text{CUs}} \times N_{\text{CUs}}$ source-sink matrix, depending
on $X^{\text{CUs}\to\text{nodes}}$ and defined by:
$$[S^k]_{ij} := \begin{cases} 0 & \text{if } \deg(v_i) = 0 \text{ in } G \\ 0 & \text{if CU } i \text{ is faulty} \\ -X^{\text{CUs}\to\text{nodes}}_{i,\ \text{node\_of\_alloc}(k)} + [I_{N_{\text{CUs}}}]_{ij} & \text{otherwise} \end{cases}$$
where the degree $\deg(v_i)$ of a vertex is the number of edges
connected to it, and node_of_alloc(k) is the Application
Node corresponding to allocator k. When the CU is neither
faulty nor without any neighbor, in each path between an
allocator and a given CU i, the allocator is the source (−1)
and the CU i is the sink (+1).
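Constraint (13) is a flow-conservation condition, which can be illustrated on a line of three healthy CUs with edges oriented 0 → 1 and 1 → 2. The platform, the helper names and the chosen paths are assumptions of this sketch; each vector x stands for one column of the communication matrix of a given allocator.

```python
def mat_vec(G, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def source_sink_column(n_cus, allocator_cu, target_cu):
    """Column of the source-sink matrix for a healthy, connected platform:
    -1 on the CU hosting the allocator, +1 on the target CU."""
    return [
        (-1 if i == allocator_cu else 0) + (1 if i == target_cu else 0)
        for i in range(n_cus)
    ]


G = [[-1, 0], [1, -1], [0, 1]]   # oriented incidence matrix of the line

# Allocator on CU 0 reaching CU 2: both links are used in the positive direction.
print(mat_vec(G, [1, 1]), source_sink_column(3, 0, 2))
# Allocator on CU 2 reaching CU 0: both links are used in the negative direction.
print(mat_vec(G, [-1, -1]), source_sink_column(3, 2, 0))
# Allocator reaching its own CU: no link is used and source and sink cancel out.
print(mat_vec(G, [0, 0]), source_sink_column(3, 0, 0))
```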
g) Constraints specific to the architecture: Additional
constraints can be added to respect specific aspects of the
considered architecture.
For example, in some multi-core architectures [7], where
intra-application communication between CUs can happen
only in a specific way, as illustrated in Fig. 4, the orientation of
the applications on the architecture matters because nodes
that can communicate in a given orientation will not be able
to do so if they are rotated on the architecture. Therefore,
the orientation as computed by the compiler must be enforced.
To ensure correct orientation of applications, another
set of constraints is needed. To enforce it,
the numbering of the CUs on the platform is used. For
example, as illustrated in Fig. 4, a CU always has a number
difference of −1 with its right neighbor and $+N_{\text{row}}$ with its
top neighbor, where $N_{\text{row}}$ is the number of Tiles per row of
the NoC ($N_{\text{row}} = 4$ in our example).
The difference between the numbers of the contiguous
pairs of CUs allocated to an application must match the
orientation computed by the compiler.
Let $j_k$ be the index of the top-left node of the k-th
application:
$$\forall i \in \llbracket 1, N_{\text{CUs}} \rrbracket,\ \forall k \in \llbracket 1, N_{\text{apps}} \rrbracket, \quad \begin{aligned} X^{\text{CUs}\to\text{nodes}}_{i j_k} &= X^{\text{CUs}\to\text{nodes}}_{(i+1)(j_k+1)} \\ X^{\text{CUs}\to\text{nodes}}_{i j_k} &= X^{\text{CUs}\to\text{nodes}}_{(i+N_{\text{row}})(j_k+N^k_{\text{row}})} \end{aligned} \tag{14}$$
where $N^k_{\text{row}}$ is the number of nodes per row of the k-th
application.
IV. DECENTRALIZATION OF THE ALLOCATION SYSTEM
In this paper, we use the word decentralized to qualify
a system where no single CU has control over all the
other ones in the parallel architecture: there is no central
CU whose failure jeopardizes the operation of the whole
parallel architecture. In safety terms, this means that no CU
constitutes a single point of failure.
Figure 4: By equating the difference between two CUs’ indices allocated to
an application to a specific number, the spatial orientation of the application
can be enforced.
We focus here on CUs, but there are other elements that
may be a single point of failure and that we do not take
into account in this work. For example, electrical power may
be provided by one unique and central power supply unit,
which is an obvious single point of failure if not designed
carefully. To mitigate the effect of other single points of failure,
safety assessment methods may be applied [16].
A. N-modular redundancy and majority voting system
To develop a decentralized allocation system for the
considered parallel computing platform, we chose to use the
concept of N-modular redundancy with a majority voting
system [17].
In this approach, $N_{\text{realloc}}$ is an odd number greater than or equal
to 3, and $N_{\text{realloc}}$ copies of the same sub-task are executed in
parallel. $N_{\text{realloc}}$ is taken odd to avoid the case where equal
numbers of copies agree on two different results. The copies
are fed with the same inputs and their outputs are then sent
to a majority voting system. As illustrated in Fig. 5, the
voting system compares the outputs of the redundant copies
and filters them: only the result that has been computed by
the majority of the redundant copies will be transmitted, i.e.
the result computed by at least $(N_{\text{realloc}} + 1)/2$ redundant copies.
The voting system is also used to report the failure of the
redundant copies that do not match the majority result.
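The voting rule can be sketched in a few lines of Python. The function name, the representation of each output as a hashable value (here a tuple) and the choice to return no result when there is no strict majority are assumptions of this illustration.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (majority_result, indices_of_dissenting_copies).

    If no result reaches a strict majority, return (None, []) because no
    copy can be singled out as faulty.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count < len(outputs) // 2 + 1:
        return None, []
    dissenting = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenting


# Three redundant copies; the third one returns a different allocation.
outputs = [(0, 1, 2), (0, 1, 2), (2, 1, 0)]
print(majority_vote(outputs))  # ((0, 1, 2), [2])
```

For an odd number of copies, the threshold `len(outputs) // 2 + 1` equals (N + 1)/2, the value stated in the text.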
B. Decentralized implementation
The proposed idea to decentralize the allocation system is
to execute Nrealloc modular redundant copies of the allocator
application on the architecture itself, with a voting system
implemented on each CU. For further examples, Nrealloc will
be taken equal to 3.
In normal conditions, the Nrealloc copies of the allocator
compute the same allocation, since they solve an identical
ILP problem, with the same inputs and constraints, and because
GLPK is a deterministic solver. This allocation is then
broadcast to every CU, including the ones executing the
allocators.
Figure 5: Illustration of the voting process with 3 redundant copies.
If a CU not running an allocator fails, all 3 allocators
compute the same new allocation, in which the affected
application is assigned to a new CU, according to the algorithm
described previously in Section III. This new allocation is
then broadcast and received by all CUs. Since the 3 signals
that the CUs receive are coherent, they all comply with it and
therefore, the affected application is reallocated.
On the other hand, as illustrated in Fig. 6, if a CU that was
running a copy of the allocator is affected by a fault, the 2
other ones will compute the same new allocation, where the
affected copy is assigned to a new healthy CU. Regardless
of what the faulty allocator computes, only the two coherent
allocations sent by the two healthy allocators will be taken
into account by the CUs, and the faulty allocator will be
reallocated.
V. PRACTICAL EXAMPLE
This section describes the experimental setup that is used
as a representation of the parallel computing platform as well
as the result of reallocating safety-critical applications using
the previously-described optimization problem.
A. Representation of a parallel computing platform
1) Hardware components: To illustrate and demonstrate
the capabilities of the new formulation of the allocation
algorithm in operational conditions, we choose to implement
it on a cluster of Raspberry Pi single-board computers [18], in
order to control and maintain operation of a physical system
despite the presence of faults.
a) Platform description: In this setup, a cluster of
4 × 4 parallel CUs is replicated with a network of
16 Raspberry Pi computers. All of them are connected to
a common routing switch in a local area network (LAN).
Although this common routing switch is a single
point of failure, it serves the purpose of showing that these
Raspberry Pi computers are grouped as a single parallel
computing platform. Also, for simplicity of visualization, the
network is considered to be a square mesh, instead of a toroidal mesh.
An alternative to a wired LAN is to use a
routing protocol for multi-hop mobile ad hoc networks such
as [19]. This ad hoc network is implemented on the data link
layer, which allows the data transport protocol to operate in
a wireless and decentralized fashion as if there were a common
routing switch.
Figure 6: Fault affecting a CU running an allocator.
(a) Layout of the allocators on the computing architecture.
(b) Information flow between allocators and CUs.
The correct allocation includes the instruction for some node i to run
allocator 3.
The goal of this parallel computing platform is to show the
possibility to decentralize the allocation process; therefore,
there is no central computing unit outside the network and
three copies of the allocator are executed on the network, as
described in Section IV.
b) Faults: Two types of faults are considered in
this experiment. The first type is a computational fault,
which randomly affects the computations performed by
the Raspberry Pi. We detect this kind of fault by using
redundant copies of the considered application combined
with a voting system that is described below in Section V-A2.
The second type of fault is assumed to stop the operation
of the computing unit it affects. We also assume that this
fault can be detected by the network. In practice, each time
one of these faults affects a Raspberry Pi, the status signal
sent by this Raspberry Pi to the allocators is changed to
a signal identifying it as faulty. Each of these two kinds
of fault can be manually triggered or recovered using
a breadboard, shown in Fig. 7, connected to each Raspberry Pi.
Figure 7: Hardware associated with each Raspberry Pi Tile.
The RGB-LED (bottom-left corner) represents the LED application. The red
LED (right side) indicates a healthy Tile when turned on. Each switch is
used to trigger one type of fault.
c) Controlled system: The physical system we chose to
control with this parallel computing platform is a propulsion
system, made of an electric fan mounted on a thrust stand. The
fan is commanded using pulse-width modulation (PWM).
The thrust measurement is used by a simple proportional
controller, executed as a safety-critical application on the
platform, to compute the value of the PWM command
required to maintain the thrust at a constant value.
Figure 8: Electric fan mounted on the thrust stand.
The delivered thrust is measured thanks to a load cell on the stand,
indicated by the orange circle.
An extra Raspberry Pi is used as the micro-controller of
the fan: it converts the value measured by the load cell, sends
it to the Controllers where the appropriate control value
is computed, and generates the corresponding PWM signal
controlling the fan. It must therefore be noted that although
the same hardware representation is used, this Raspberry Pi
does not correspond to the same components as the ones used
for the CUs of the platform.
2) Software components: When a controller is affected by
a fault, it is reallocated to healthy Tiles. However, because
of the time required to compute the new allocation and
to actually reallocate the set of tasks, the operation of the
fan may be temporarily altered during the reallocation process.
To avoid interruptions in the operation of the fan during
reallocations, we also use a standard Triple Modular
Redundancy (TMR) architecture [17]. Three copies of the
controller are executed on the parallel computing platform.
Each one separately computes the duty-cycle value of the
PWM signal that should be sent to the fan, given the thrust
value that they all receive from the sensor. The three values
are sent to the Raspberry Pi representing the micro-controller
of the fan, where a voting system decides which control
output should be used. The vote outputs the result that has
been computed by the majority of the controllers, in this
case 2 out of 3. Signals are here considered equal if their
difference is smaller than a given tolerance. In the case of a
fault affecting the output of one of the controllers, the two
remaining healthy controllers ensure that the correct value
is sent to the fan. The voting system also identifies which
controller is not coherent with the two others and informs
the allocators of the fault. The reallocation process that we
implemented can then take place while providing continuity
of service with the two healthy controllers. To make
the reallocation task harder, each copy of the controller has been
arbitrarily assigned 2 Application Nodes. In practice, only
one of them is responsible for the actual computations.
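The voting step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the return convention, and the choice of averaging the agreeing outputs are ours, while the comparison within a tolerance and the majority rule follow the description above.

```python
def majority_vote(outputs, tolerance):
    """Majority vote on controller outputs (2 out of 3 in the nominal case).

    Returns (value, faulty): `value` is the agreed command, or None when no
    majority of at least two coherent controllers exists; `faulty` lists the
    indices of the controllers that disagree with the majority.
    """
    n = len(outputs)
    for i in range(n):
        # Controllers whose output is within `tolerance` of controller i.
        agreeing = [j for j in range(n)
                    if abs(outputs[i] - outputs[j]) < tolerance]
        if len(agreeing) >= 2 and 2 * len(agreeing) > n:
            # Assumption: the agreeing outputs are averaged.
            value = sum(outputs[j] for j in agreeing) / len(agreeing)
            faulty = [j for j in range(n) if j not in agreeing]
            return value, faulty
    # No coherent majority: no controller is trusted.
    return None, list(range(n))
```

For example, `majority_vote([0.50, 0.51, 0.90], 0.05)` selects the first two controllers and reports index 2 as faulty, whereas two incoherent outputs yield no command at all.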
Three copies of the allocator execute the allocation
algorithm itself. They have second rank priority immediately
below the controllers, which represent the safety-critical
application in this case. Giving the allocators only the second
rank in the priority list can be justified when considering
the case where only a controller or an allocator can be
executed on the platform: the resource must be allocated to
the safety-critical application, in this case the controller, that
maintains the operation of the system, whereas the allocator
is only a protection against further faults, but cannot alone
ensure operation of the controlled system.
In addition to these six applications, one dummy application
is considered in this experiment: it occupies 2 Tiles of the
Fabric, but does not perform actual computation except
changing the voltage in the RGB LED to display its
corresponding color. It has the lowest priority.
Figure 9 sums up the list of considered applications for
the experiment, their relative priority and the resources they
require in terms of number of CUs. The initial allocation of
these applications on the model is given in Fig. 10.
B. Results
Starting from the initial allocation given in Fig. 10, faults
are triggered on the model. After each fault, the allocators
detect the faulty Raspberry Pi and compute a new allocation
that is then broadcast on the network. They maintain the
execution of the safety-critical application as long as enough
resources are available for it.
Figure 9: Considered applications for the experiment, their priority and the
number of Tiles they require.
Figure 10: Initial allocation of the applications on the model.
The orange circle identifies the extra Raspberry Pi for interactions with the
stand.
CUs surrounded by faulty neighbors are isolated from the
rest of the platform and cannot communicate. As enforced
by the communication constraints described in paragraph
III-C3f, such a CU is not given any task to execute and is as
good as faulty, as seen in Fig. 11a.
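The isolation rule used here (a healthy CU whose neighbors are all faulty) can be sketched as below. The 4-connected mesh built by `grid_neighbors` is an assumption made for illustration, as are the function names; any adjacency map could be supplied instead.

```python
def grid_neighbors(rows, cols):
    """Adjacency map of a 4-connected mesh (assumed topology)."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[(r, c)] = [(i, j) for i, j in candidates
                                 if 0 <= i < rows and 0 <= j < cols]
    return neighbors


def isolated_units(neighbors, faulty):
    """Return the healthy CUs whose neighbors are all faulty."""
    return {cu for cu, adjacent in neighbors.items()
            if cu not in faulty
            and adjacent
            and all(n in faulty for n in adjacent)}
```

On a 3 x 3 mesh, declaring `(0, 1)` and `(1, 0)` faulty isolates the corner `(0, 0)`, which is then treated like a faulty CU.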
When a CU recovers from a fault, an application can be
allocated back to it as seen in Fig. 11b. Applications are
dropped according to their priority when more computing
units become faulty. However, as illustrated in Fig. 11d, when
no space is available for all 1st priority applications, lower
priority ones are still allowed to be executed.
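This behaviour can be illustrated with a deliberately simplified sketch that only counts healthy Tiles and ignores the topology and communication constraints handled by the actual Integer Linear Program. The application names and the number of Tiles required by each allocator are assumptions made for illustration.

```python
def select_applications(apps, healthy_tiles):
    """Pick applications in priority order while Tiles remain.

    `apps` is a list of (name, tiles_required) sorted by decreasing
    priority. An application that does not fit is skipped, and
    lower-priority applications are still considered afterwards.
    """
    running, free = [], healthy_tiles
    for name, tiles in apps:
        if tiles <= free:
            running.append(name)
            free -= tiles
    return running


# Hypothetical resource requirements (the allocator size is assumed).
APPS = [("controller_1", 2), ("controller_2", 2), ("controller_3", 2),
        ("allocator_1", 1), ("allocator_2", 1), ("allocator_3", 1),
        ("led", 2)]
```

With 5 healthy Tiles, `select_applications(APPS, 5)` keeps two controllers and one allocator: the third controller no longer fits, but a lower-priority allocator still does.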
The voting system implemented on the fan needs at least
two functioning and coherent controllers to run the fan (Fig.
11d), as previously explained in section V-A2: in case the
signals received from the controllers are incoherent, it decides
not to trust any of them and the engine stops, as in Fig. 11e.
Since the controllers have the highest priority, they are
the last remaining applications to be executed in Fig. 11g.
After this step, further faults will affect the controllers, but no
reallocation can happen because no allocator is executed any longer.
VI. CONCLUSION
This work presented a decentralized allocation algorithm
for parallel computing architectures, where individual
Computational Units can be affected by faults. The described
method consisted in representing the architecture by an
abstract graph and formulating the allocation problem as
an optimization problem, in the form of an Integer Linear
Program.
Decentralizing the allocation process has been achieved
through redundancy of the allocator executed on the
architecture. That way, no centralized element decides on the
allocation of the entire architecture.
An experimental reproduction of a parallel computing
architecture has also been built. It has been used to
demonstrate the capabilities of the proposed allocation
process to maintain operation of a physical system in a
decentralized way while individual components fail.
The proposed work assumed that faults affecting the
Computational Units of the architecture were automatically
detected by the allocation algorithm, so that it is able
to compute a new allocation every time a fault affects
a Computational Unit. This work can be improved by
defining a more precise model of the considered faults and
a method to detect them. A first approach to identifying
dead Computational Units would be a simple heartbeat that
each CU would send to the allocators. A CU not sending its
heartbeat would be considered faulty. One challenge to
tackle in this approach is the fact that the allocators do not
have a fixed position in the architecture, and therefore, the
heartbeat of each CU would have to be broadcast through the
entire architecture to be sure to reach all allocators. Another
solution would be to include the position of the allocators
in the allocation message that they broadcast, so that the
CUs know where to send back their heartbeat. In both cases,
the number of communication packets transmitted through the
architecture increases drastically.
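A heartbeat-based detector of this kind could be sketched as follows on the allocator side. This is not part of the implemented system: the class name, the timeout parameter and the convention that a CU never heard from is considered faulty are assumptions made for illustration. Time stamps are passed explicitly so that the logic does not depend on a particular clock or network layer.

```python
class HeartbeatMonitor:
    """Track the last heartbeat of each CU and flag the silent ones."""

    def __init__(self, cu_ids, timeout):
        self.timeout = timeout
        # None means that no heartbeat has been received yet.
        self.last_seen = {cu: None for cu in cu_ids}

    def beat(self, cu_id, now):
        """Record a heartbeat received from `cu_id` at time `now`."""
        self.last_seen[cu_id] = now

    def faulty(self, now):
        """Return the CUs whose heartbeat is missing or too old."""
        return {cu for cu, t in self.last_seen.items()
                if t is None or now - t > self.timeout}
```

Each allocator copy would run its own monitor, which is why every heartbeat has to reach all allocators in either of the two schemes discussed above.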
The second lead for improvement is the way communication
isolation is taken into account in the allocation problem.
For now, only individual nodes with all of their neighbors
being faulty were considered isolated. However, an entire
area of the architecture can be isolated from the allocators.
The problem becomes quite tricky when the architecture is
split into two halves that are isolated from each other: a
decision must be made as to which area is considered isolated
from the other. It seems that the area with the highest number of
allocators should be favored, since the allocators are the ones that
will send the new allocation to the other CUs, and the CUs
in the other area will not be able to receive this
new allocation. Those CUs should therefore be considered lost.
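One possible way to implement this rule, given here only as a sketch of the suggested improvement, is to compute the connected areas of healthy CUs and keep the one hosting the most allocators. The function names and the adjacency-map representation are assumptions; ties between areas are resolved arbitrarily here and would need a proper rule in practice.

```python
from collections import deque


def healthy_components(neighbors, faulty):
    """Connected areas of healthy CUs, found by breadth-first search."""
    seen, components = set(), []
    for start in neighbors:
        if start in faulty or start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            cu = queue.popleft()
            component.add(cu)
            for n in neighbors[cu]:
                if n not in faulty and n not in seen:
                    seen.add(n)
                    queue.append(n)
        components.append(component)
    return components


def retained_area(neighbors, faulty, allocators):
    """Keep the area with the most allocators; other healthy CUs are lost."""
    components = healthy_components(neighbors, faulty)
    if not components:
        return set(), set()
    best = max(components, key=lambda c: len(c & allocators))
    lost = set().union(*components) - best
    return best, lost
```

For a chain of five CUs `0-1-2-3-4` with CU 2 faulty and allocators on CUs 0, 1 and 3, the area `{0, 1}` is retained and CUs 3 and 4 are considered lost.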
Also, it should be considered that not only the
Computational Units can fail, but also the communication
links between them. The effect of such faults would be
the same as isolating the Computational Units from their
neighbors and would make more of them unavailable. It would
also change the communication paths usable to connect the
allocators to other applications and would affect their position
since minimizing these paths is a part of the optimization
problem.
APPENDIX
Coefficients of the objective function
This appendix provides the proof that the coefficients in
the objective function from equation (3) make it possible to meet the
requirements stated in Section III-C. For convenience, this
objective function is rewritten here:
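$$
\max \left\{ f(x) = \sum_{k=1}^{N_{apps}} \alpha_k \cdot r_k - (\beta + 1) \sum_{j=1}^{N_{nodes}} M_j - \sum_{k=1}^{N_{realloc}} \sum_{j=1}^{N_{CUs}} \sum_{i=1}^{N_{paths}} \left| X^{Comm,k}_{ij} \right| \right\}, \tag{3}
$$
where
$$
\beta = N_{realloc} \times N_{CUs} \times N_{paths},
$$
$$
\alpha_{N_{apps}} = (\beta + 1) \times N_{nodes} + \beta + 1,
$$
and $\forall k < N_{apps}$:
$$
\alpha_k = \sum_{l=k+1}^{N_{apps}} \alpha_l + (\beta + 1) \times N_{nodes} + \beta + 1. \tag{4}
$$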
The requirements of Section III-C are also rewritten below.
1) When solving the optimization problem, the objective
function (3) favors executing any given application,
even if it implies more reallocations and longer
communication paths.
2) When solving the optimization problem, the objective
function (3) favors minimizing the number of
reallocations, even if it implies longer communication paths.
3) When solving the optimization problem, the objective
function (3) favors executing a given application
over running any number of applications with
a lower priority.
The following theorems prove that these requirements are
met.
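Theorem 1. $\forall\, \tilde{k} \in \llbracket 1, N_{apps} \rrbracket$:
$$
\alpha_{\tilde{k}} > (\beta + 1) \sum_{j=1}^{N_{nodes}} 1 + \sum_{k=1}^{N_{realloc}} \sum_{j=1}^{N_{CUs}} \sum_{i=1}^{N_{paths}} 1,
$$
that is, the contribution to the value of the objective function
for executing application $app_{\tilde{k}}$ is greater than the maximum
contribution for reducing the number of reallocations and the
length of the communication paths.
Proof. $\forall\, \tilde{k} \in \llbracket 1, N_{apps} \rrbracket$:
$$
\alpha_{\tilde{k}} \geq (\beta + 1) \times N_{nodes} + \beta + 1 > (\beta + 1) \times N_{nodes} + \beta,
$$
by definition of $\alpha_{\tilde{k}}$ in (4), since every $\alpha_l$ is positive.
Now,
$$
(\beta + 1) \sum_{j=1}^{N_{nodes}} 1 + \sum_{k=1}^{N_{realloc}} \sum_{j=1}^{N_{CUs}} \sum_{i=1}^{N_{paths}} 1 = (\beta + 1) \times N_{nodes} + \beta.
$$
So
$$
\alpha_{\tilde{k}} > (\beta + 1) \sum_{j=1}^{N_{nodes}} 1 + \sum_{k=1}^{N_{realloc}} \sum_{j=1}^{N_{CUs}} \sum_{i=1}^{N_{paths}} 1.
$$
Theorem 1 proves that requirement 1 is met.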
Theorem 2.
$$
(\beta + 1) \times 1 > \sum_{k=1}^{N_{realloc}} \sum_{j=1}^{N_{CUs}} \sum_{i=1}^{N_{paths}} 1,
$$
that is, the contribution to the value of the objective function
for not reallocating one Application Node is greater than the
maximum contribution for reducing the length of communication
paths.
Proof.
$$
\beta + 1 > \beta = N_{realloc} \times N_{CUs} \times N_{paths} = \sum_{k=1}^{N_{realloc}} \sum_{j=1}^{N_{CUs}} \sum_{i=1}^{N_{paths}} 1.
$$
Theorem 2 proves that requirement 2 is met.
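Theorem 3. $\forall\, \tilde{k} \in \llbracket 1, N_{apps} - 1 \rrbracket$:
$$
\alpha_{\tilde{k}} > \sum_{l=\tilde{k}+1}^{N_{apps}} \alpha_l,
$$
that is, the contribution to the value of the objective function
for executing application $app_{\tilde{k}}$ is greater than the contribution
for executing all applications with lower priority than $app_{\tilde{k}}$,
namely $app_{\tilde{k}+1}$ to $app_{N_{apps}}$.
Proof. $\forall\, \tilde{k} \in \llbracket 1, N_{apps} - 1 \rrbracket$:
$$
\alpha_{\tilde{k}} = \sum_{l=\tilde{k}+1}^{N_{apps}} \alpha_l + (\beta + 1) \times N_{nodes} + \beta + 1 > \sum_{l=\tilde{k}+1}^{N_{apps}} \alpha_l,
$$
since $(\beta + 1) \times N_{nodes} + \beta + 1 > 0$.
Theorem 3 proves that requirement 3 is met.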
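As a numerical illustration of the definitions in (4), the sketch below computes $\beta$ and the coefficients $\alpha_k$ by backward recursion. The function name and the sizes used in the example are arbitrary choices made for illustration; they are not those of the experiment.

```python
def objective_coefficients(n_apps, n_nodes, n_realloc, n_cus, n_paths):
    """Return (beta, [alpha_1, ..., alpha_Napps]) as defined in (4)."""
    beta = n_realloc * n_cus * n_paths
    base = (beta + 1) * n_nodes + beta + 1
    alpha = [0] * n_apps
    tail = 0  # running sum of alpha_l for l > k
    for k in range(n_apps - 1, -1, -1):
        alpha[k] = tail + base
        tail += alpha[k]
    return beta, alpha


# Arbitrary example sizes.
beta, alpha = objective_coefficients(n_apps=3, n_nodes=4,
                                     n_realloc=2, n_cus=3, n_paths=2)
```

The recursion doubles the coefficient at each priority level, so that $\alpha_k = 2^{N_{apps}-k} \left( (\beta + 1) \times N_{nodes} + \beta + 1 \right)$. In the example above, $\beta = 12$ and the coefficients are 260, 130 and 65, and the inequalities of Theorems 1 to 3 can be checked directly on these values.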
REFERENCES
[1] A. Monot, N. Navet, B. Bavoux, and F. Simonot-Lion, “Multisource
software on multicore automotive ECUs—combining runnable sequencing
with task scheduling,” IEEE Transactions on Industrial Electronics,
vol. 59, no. 10, pp. 3934–3942, Oct 2012.
[2] N. Neves, N. Sebastião, D. Matos, P. Tomás, P. Flores, and N. Roma,
“Multicore SIMD ASIP for next-generation sequencing and alignment
biochip platforms,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 23, no. 7, pp. 1287–1300, July 2015.
[3] Y. Lu, H. Zhou, L. Shang, and X. Zeng, “Multicore parallelization of
min-cost flow for CAD applications,” IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, vol. 29, no. 10, pp.
1546–1557, Oct 2010.
[4] J. Nowotsch and M. Paulitsch, “Leveraging multi-core computing
architectures in avionics,” in Dependable Computing Conference (EDCC),
2012 Ninth European. IEEE, 2012, pp. 132–143.
[5] F. Reichenbach and A. Wold, “Multi-core technology–next evolution
step in safety critical systems for industrial applications?” in Digital
System Design: Architectures, Methods and Tools (DSD), 2010 13th
Euromicro Conference on. IEEE, 2010, pp. 339–346.
[6] U. Durak and F. Bapp, “Introduction to special issue on multi-core
architectures in avionics systems,” 2019.
[7] M. Alle, K. Varadarajan, A. Fell, C. R. Reddy, J. Nimmy, S. Das,
P. Biswas, J. Chetia, A. Rao, S. K. Nandy, and R. Narayan, “REDEFINE:
runtime reconfigurable polymorphic ASIC,” ACM Transactions
on Embedded Computing Systems, vol. 9, no. 2, 2009.
[8] M. A. Watkins and D. H. Albonesi, “Remap: A reconfigurable
heterogeneous multicore architecture,” in 2010 43rd Annual IEEE/ACM
International Symposium on Microarchitecture, Dec 2010, pp. 497–508.
[9] R. Gerard, “Network on chip (noc) for many-core system on chip in
space applications,” Dec 2017.
[10] L. M. Kinnan, “Use of multicore processors in avionics systems and its
potential impact on implementation and certification,” in Digital Avionics
Systems Conference, 2009. DASC’09. IEEE/AIAA 28th. IEEE, 2009,
pp. 1–E.
[11] “Symmetric multi-processor arrangement, safety critical system, and
method therefor,” Patent US 2015/0 254 123 A1, Sep. 10, 2015.
[12] M. Oriol, T. Gamer, T. de Gooijer, M. Wahler, and E. Ferranti, “Fault-
tolerant fault tolerance for component-based automation systems,” in
Proceedings of the 4th international ACM Sigsoft symposium on
Architecting critical systems, June 2013, pp. 49–58.
[13] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent
networks. Princeton University Press, 2010, vol. 33.
[14] “GLPK reference manual,” GNU Linear Programming Kit, 2012,
https://www.gnu.org/software/glpk/TOCdocumentation.
[15] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research,
10th ed. New York, NY, USA: McGraw-Hill, 2015.
[16] SAE, “Guidelines and methods for conducting the safety assessment
process on civil airborne systems and equipment,” SAE ARP4761,
1996.
[17] I. Koren and C. M. Krishna, Fault-Tolerant Systems. Morgan Kaufmann
Publishers, 2007.
[18] Raspberry Pi Foundation, “Raspberry Pi 3 Model B+,”
https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/, Accessed June 2018.
[19] A. Neumann, C. Aichele, M. Lindner, and S. Wunderlich, “Better
approach to mobile ad-hoc networking (batman),” IETF draft, pp. 1–
24, 2008.
(a) After 4 faults, Application 3 has to be dropped. The isolated Tile
cannot be used.
(b) When a Tile recovers from a fault, an application can be
reallocated to it.
(c) After more faults, Application 3 and a controller have to be
dropped for lack of resources.
(d) One controller is dropped. The 3 allocators can still be executed,
even if all 1st priority applications are not.
(e) One of the two controllers is affected by a computational fault:
the fan stops since the fan voter does not trust either of the two
incoherent controllers.
(f) The computational faults disappear: the fan starts again.
(g) All allocators have been dropped: no further reallocation is
possible.
Figure 11: Result of the task allocation algorithm. A full video of the demo is available at the link below.
https://gtvault-my.sharepoint.com/:v:/g/personal/lsutter6_gatech_edu/EQbz60ttNU1KkzFo0l0tycIBDyboI9KU0SHs4ntq8lPwAw?e=glyKtj
