Message scheduling to reduce AFDX jitter in a mixed NoC/AFDX architecture by Ermont, Jérôme et al.
Message scheduling to reduce AFDX jier
in a mixed NoC/AFDX architecture
Jérôme Ermont
Université de Toulouse/IRIT/Toulouse INP-ENSEEIHT
Toulouse
jerome.ermont@enseeiht.fr
Sandrine Mouysset
Université de Toulouse/IRIT/UPS
Toulouse
sandrine.mouysset@irit.fr
Jean-Luc Scharbarg
Université de Toulouse/IRIT/Toulouse INP-ENSEEIHT
Toulouse
jean-luc.scharbarg@enseeiht.fr
Christian Fraboul
Université de Toulouse/IRIT/Toulouse INP-ENSEEIHT
Toulouse
christian.fraboul@enseeiht.fr
ABSTRACT
Currentavionicsarchitecturearebasedonanavionics fulldu-
plexswitchedEthernetnetwork(AFDX)thatinterconnectsend
systems.AvionicsfunctionsexchangedatathroughVirtualLinks
(VLs),whicharestaticflowswithboundedbandwidth.Thejitterfor
eachVLatAFDXentrancehastobelessthan500µs .Thisconstraint
ismet,thankstoendsystemscheduling.Theinterconnectionof
many-coresbyanAFDXbackboneisenvisionedforfutureavionics
architecture.Theprincipleistodistributeavionicsfunctionson
thesemany-cores.Many-coresarebasedonsimplecoresintercon-
nectedbyaNetwork-on-Chip(NoC).Theallocationoffunctions
ontheavailablecoresaswellasthetransmissionofflowsonthe
NoChastobeperformed insuchawaythatthe jitterforeach
VLatAFDXentranceisstilllessthan500µs .Afirstsolutionhas
beenproposed,whereeach functionmanages the transmission
ofitsVLs.Theideaofthissolutionistodistributefunctionson
eachmany-coreinordertominimizecontentionsforVLswhich
concernfunctionsallocatedondifferentmany-cores.Inthispaper,
weconsiderthatVLtransmissionsaremanagedbyasingletask
ineachmany-core.Wepropose toconstructascheduling table
executedbythistaskusinganIntegerLinearProgram.Theaccess
totheEthernetinterfaceisthenonlyallowedtooneVLleadingto
asignificantreductionofthejitter.
JérômeErmont,SandrineMouysset, Jean-LucScharbarg,andChristian
Fraboul.2018.MessageschedulingtoreduceAFDXjitterinamixedNoC/AFDX
architecture.In26thInternationalConferenceonReal-TimeNetworksand
Systems(RTNS’18),October10–12,2018,Chasseneuil-du-Poitou,France.ACM,
NewYork,NY,USA,9pages.https://doi.org/10.1145/3273905.3273929
1 INTRODUCTION
Aircraftsareequippedwithnumerouselectronicequipment.Some
ofthem,likeflightcontrolandguidancesystems,provideflight
critical functions, while others may provide assistance services
that are not critical to maintain airworthiness. Current avionics
architecture is based on the integration of numerous functions with
different criticality levels into single computing systems (mono-
core processors) [10]. Such an architecture is depicted in Figure 1.
Computing systems are interconnected by an AFDX (Avionics Full
Duplex Switched Ethernet) [1]. The End System (ES) provides an
interface between a processing unit and the network.
Figure 1: An AFDX network.
The continuous need for increased computational power has fu-
eled the on-goingmove to multi-core architectures in hard real-time
systems. However, multi-core architectures are based on complex
hardware mechanisms, such as advanced branch predictors whose
temporal behavior is difficult to master. Many-core architectures
are based on simpler cores interconnected by a Network-on-Chip
(NoC). These cores are more predictable [17]. Thus, many-cores are
promising candidates for avionics architecture.
A typical many-cores architecture provides Ethernet interfaces
and memory controllers. For instance, Tilera Tile64 has 3 Ethernet
interfaces and 4 memory controllers [18], while Kalray MPPA has
8 Ethernet interfaces and 2 memory controllers [8].
An envisioned avionics architecture is depicted in Figure 2. A
set of many-cores are interconnected by an AFDX backbone, lead-
ing to a mixed NoC/AFDX architecture. Avionics functions are
distributed on these many-cores. Communications between two
	







Figure 2: A mixed NoC/AFDX architecture
functions allocated on the same many-cores use the NoC, while
the communications between two functions allocated on different
many-cores use both the NoC and the AFDX. Main constraints on
this communication are the following:
(1) end-to-end transmission delay has to be upper-bounded by
an application defined value,
(2) frame jitter at the ingress of the AFDX network has to be
smaller than a given value (typically 500 µs).
In single core architectures the latter constraint is enforced by
the scheduling implemented in the End System. In many-cores
architectures, frame jitter mainly depends on the delay variation
between the source core and the source Ethernet interface. This
variation is due to two factors. First, the frame can be delayed on
the NoC by other frames transmitted between avionic functions.
Second, the Ethernet controller can be busy, transmitting another
frame. [2] proposes a mapping strategy which minimizes the first
factor, i.e. the variation of this NoC delay. Each core is allocated at
most one function. Each VL is managed by its source function.
In this paper, we mainly address the second factor. We propose
a static scheduling of Ethernet transmissions, based on a table.
Each transmission is allocated a periodic slot. The scheduling is
managed by a dedicated core. The periodic slots are allocated to the
applications in order to guarantee that the access to the Ethernet
interface is allowed to one application. We formulate the mapping
of the applications to the slots as an Integer Linear Program (ILP).
The objectives of the approach are:
(1) to allocate the slots for the transmission of the outgoing
flows within the respect of their periodic constraint ;
(2) to oversample the slots for outgoing flows with period > n
ms, with n = 1, 2, 4, . . .
The remainder of the paper is as follows. Section 2 introduces cur-
rent AFDX and NoC architectures. Section 3 explains the addressed
problem. The new approach is described in Section 4. Section 5
shows the proposed scheduling solution on a case study and give
some results compared to the distributed solution of [2]. Finally,
Section 6 concludes with some future works.
2 SYSTEM MODEL
We summarize the main features of both a AFDX flows and many-
cores considered in this paper.
2.1 AFDX flows
A VL defines a unidirectional connection between one source func-
tion and one ormore destination functions. Each VL is characterized
by two parameters:
• Bandwidth Allocation Gap (BAG).Minimal time interval
separating two successive frames of the same VL. The value
of the BAG is ranging from 1 to 128 ms.
• Lmin and Lmax . Smallest and largest Ethernet frame, in
bytes, transmitted on the VL.
In current architectures, each ES performs a traffic shaping for
each VL to control that frames are transmitted in accordance with
BAG and authorized frame size. The queued frames, which are
ready to be transmitted, are then selected depending on a strategy
configured in the VL scheduler. Therefore, it is possible that more
than one VL has a packet ready and eligible for transmission. In this
case, a queuing delay (jitter) is introduced. This jitter, computed
at the transmitting ES, is the time between the beginning of BAG
interval and the first bit of the frame to be transmitted in that BAG.
This jitter must not be greater than 500µs.
2.2 NoC Architecture and Assumptions
In this paper, we consider a Tilera-like NoC architecture, i.e. a 2D-
Mesh NoC with bidirectional links interconnecting a number of
routers. Each router has five input and output ports. Each input
port consists of a single queuing buffer. The routers at the edge of
the NoC are interconnected to the DDR memory located north and
south of the NoC via dedicated ports. The first and last columns
of the NoC are not connected directly to the DDR. Besides, the
routers at the east side connects the cores to the Ethernet interfaces
via specific ports. Many applications can be allocated on a NoC.
Each application is composed of a number of tasks, where one core
executes only one task. These tasks do not only communicate with
each other (core-to-core flows), but also with the I/O interfaces,
i.e. the DDR memory and Ethernet interfaces (core-to-I/O flows).
These flows are transmitted through the NoC following wormhole
routing [16], an XY policy and a Round-Robin arbitration. Besides,
a credit-based mechanism is applied to control the flows. A flow
consists of a number of packets, corresponding to the maximal
authorized flow size on the NoC. Indeed, a packet is divided into a
set of flits (flow control digits) of fixed size (typically 32-bits). The
maximal size of a NoC packet is of 19 flits as in Tilera NoC. The
wormhole routing makes the flits follow the first flit of the packet
in a pipeline way creating a worm where flits are distributed on
many routers. The credit-based mechanism blocks the flits before
a buffer overflow occurs. The consequence of such a transmission
modelisthatwhentwoflowssharethesamepath,ifoneofthem
isblocked,theotheronecanalsobeblocked.Thus,thedelayofa
flowcanincreaseduetocontentionsontheNoC.TheWorst-case
TraversalTime(WCTT)ofaflowcanbecomputedusingdifferent
methodsproposedintheliterature[5,13].Inthispaper,wechoose
RCNoC [3]tocomputetheWCTTasitleadstotightestboundsof
delayscomparedtotheexistingmethodsonaTilera-likeNoC.This
methodconsidersthepipelinetransmission,andthuscomputesthe
maximalblockingdelayaflowcansufferduethecontentionswith
blockingflows.
3 PROBLEMSTATEMENT
Actually,anoutgoingI/Oflowistransmittedfollowingthreesteps:
(1) A core sends data to the nearest port of DDR memory,
(2) then it sends a DMA command to the Ethernet interface on
a separate network. This DMA command indicates that all
the data are in the DDR memory. This command is stored
into a DMA command FIFO queue.
(3) When the Ethernet interface executes the DMA command,
data packets are sent from the same port of DDR memory
to the same Ethernet interface. The packets of an outgo-
ing I/O flow will incur a contention with different types of
communications on the NoC which could lead to a jitter.
Let us illustrate the delays of these steps with the example in
Figure 3. Two VLs VL1 and VL2 are respectively generated by tasks
t1DDR and t2DDR . At the beginning of VL1 first BAG period, t1DDR
sends VL1 data to the nearest port of DDR memory. This transmis-
sion can take up to WCTTtoDDR . Step 2 (transmission of the DMA
command to the Ethernet interface) is done after this worst-case
delay. Thus step 1 duration is constant and does not generate any
jitter. Similarly, step 2 duration is assumed to be constant, since the
DMA command is sent on a separate network. Thus all the jitter
comes from step 3 (transmission of the data from DDR memory to
the Ethernet interface). Considering VL1 first BAG period in Figure
3, the jitter is the delay d1 of this transmission, which is between 0
and its worst-case duration. The jitter can be much higher. Indeed,
for VL1 second BAG period, the Ethernet interface is busy with VL2
when it receives VL1 DMA command. The delay due to VL2 has to
be added to the jitter, leading to an overall value of d + dV L2 + d2.
The goal is to avoid that the Ethernet interface is busy when it
receives a DMA command (like for VL1 second period in Figure 3).
4 A DEDICATED CORE FOR OUTGOING I/O
FLOWS
We propose to dedicate a specific core of the NoC to the transmis-
sion of VL through the Ethernet interface. The idea is to execute
only one task tDDR for all the applications. This task can be exe-
cuted on a core of the NoC that do not execute any other task. The
transmission of the command of the DDR uses another internal
network and does not impact the communications of other tasks.
The behaviour of the task tDDR is as follow:
(1) Reception of a message from the final task of the function:
the data that need to be sent are in the DDR. In such a way,
we do not change the behaviour of the functions presented
in section 2.2. The transmission of the message is constant
and can be considered as a part of the execution time of the
sending task of the function.
(2) Transmission of a DDR command to the DDR. The corre-
sponding data are then transmitted from the DDR to the
Ethernet interface.
In this paper, we consider that the schedule of the DDR com-
mands, and so the VLs sent by the functions, can be done by using
a scheduling table.
4.1 A scheduling table
The goal of this method is to reduce the jitter induced by the trans-
mission of other VLs from the memory to the Ethernet interface
through the NoC and the transmission of these VLs through the
Ethernet interface. The considered scheduling table is composed
of slots of 31.25µs. The global duration of the table is 128 ms. So
the number of slots is 4096. The table is composed of 128 lines of
1 ms, each line contains 32 slots. A set of slots is allocated to each
VL sent by the applications by considering the BAG duration: a
VL will obtain a slot at exactly each BAG. Such a scheduling is
represented in Table 1. In this example, we denoted by the name
of the application the slot when the corresponding VL should be
sent. As an example, a VL from the application App1 has a BAG
duration equal to 2 ms. It can be located in column 1 of lines 0, 2,
. . . , 126. In the same manner, VL from the application App4 has a
BAG of 8 ms and is located in column 10 of lines 1, 9, 17, . . . , 121.
For each application Appi which send a VLi , the number of
allocated slots ωi is defined as:
ωi =
⌈
WCTT(VLi ) + ft(VLi )
sd
⌉
(1)
where WCTT(VLi) is the WCTT of VLi from the DDR to the Ether-
net interface, ft(VLi) is the transmission delay of the frame through
the Ethernet interface and sd = 31.25µs is the slot duration. The
WCTT(VLi) depends on the contention on the NoC, while ft(VLi)
depends on the size of the frame that is transmitted by the Ethernet
interface. For example, if we consider that the WCTT of VL2, sent
by App2, from the memory to the Ethernet interface is 30 µs and the
transmission of the VL2 by the interface needs 120 µs, the resulting
value of ω2 is 5 as depicted in Table 1.
This scheduling guarantees that the VLs are sent from the mem-
ory to the Ethernet interface at different times, leading to a reduc-
tion of the jitter induced by the transmission of other VLs. In such
a way, the only delay that a VL can suffer when it is transmitted
from the DDR is due to the interferences from the transmission
of internal communications through the NoC that share the same
path as the VL.
The goal is now to define this scheduling table, i.e. the allocation
of the slots to the applications.
4.2 Allocation of the slots to applications
We propose an allocation solution with two objectives:
(1) The main objective of the allocation of the slots is to guaran-
tee that the VLs sent by the applications respect their BAG
constraints. The approach needs to guarantee that an appli-
cation can send a data at exactly every BAG period and will
find the next transmission slot at exactly one BAG from the
t1DDR
t2DDR
DDR
Ethernet
Interface
WCTTToDDR
command
executed
D
M
A
c
o
m
m
a
n
d
VL1
d1
BAG
Jitter
VL2 VL1
WCTTToDDR
Jitter
d2d dVL2
Figure 3: A possible transmission on a given VL.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
App2 App1 App3 App5
1
App2 App4
2
App2 App1
3
App2
4
App2 App1 App3
5
App2
6
App2 App1
7
App2
8
App2 App1 App3 App5
9
App2 App4
10
App2 App1
11
App2
12
App2 App1 App3
13
App2
14
App2 App1
15
App2 App6
127
App2 App6
………………………………………………………………………………………………………………………………
Table 1: Scheduling table of the VLi sent by each application Appi (VLs are represented by their sender applications).
current transmission. For example, if App1 starts its trans-
mission at slot located in column 5 of line 2, then App1 can
send a new data 2ms (the BAG of VL1 is 2ms) later, at slot
located in column 5 of line 4, as depicted in Table 1.
(2) In Table 1, App6 can send VL6 at column 20 in line 15. Con-
sidering the scheduling and the execution of App6 in the
many-core, VL6 can be ready to be sent at 100µs, for exam-
ple. So, the transmission of this VL through the Ethernet
interface will start 15.5 ms later (15 lines of the table is 15
ms, 16 slots of 31.2µs is 0.5 ms). Thus, we propose to reduce
this delay by allocating more transmission slots for each VLs.
Our second objective is then to allocate slots every n ms for
VLs, with n = 1, 2, 4, . . . In our approach, we propose to fix
n to the minimum value of the BAG of the VLs sent by the
applications. This is what we call oversampling.
We have then formulated these objectives into an ILP. The pro-
posed solution is a sequence per line of slots allocated to the trans-
mission of VLs that is repeated every n ms, with n = 1, 2, 4, . . . The
mathematical definition is given in the following.
4.2.1 Formulation. With the notations in the table 2, we con-
siderm different possible applications and their associated BAG,
the scheduling table provides free slots that can be addressed for
applications.
We have to treat the special case of BAG equal to 1ms because
it impacts all the lines of the scheduling table and so the capacity
of each line is reduced by the number of slots dedicated to the
Constants
m Number of applications
BAG j period of the application j noted Appj
ωj number of slots for application j
C Capacity of each line of scheduling table by removing 1ms BAG
N Number maximum of lines considered for allocation
Boolean Variables
yi equal 1 if line i of the table is used in the solution, 0 otherwise
xi j equal 1 if application j is packed into line i of the table, 0 otherwise
Objective Function
N∑
i=1
yi minimize the number of lines to pack all the applications
Table 2: Notations for Bin Packing Problem
applications. So the capacity, notedC , in terms of free slots for each
line of the scheduling table is defined as follows:
C = 32 −
∑
j ∈J
ωj , (2)
where J = {j ∈ {1, ..,m}|BAG j = 1}. The capacity C should be
in [|0,32|]. So negative values will indicate that the configuration
cannot ensure the transmission.
From the computation of the real capacity of each BAG by taking
into account the 1ms period of application, the allocation of each
application can be formulated as an Integer Linear Program (ILP)
called Bin Packing Problem [15]. The bin packing problem can
be described informally as follows: by consideringm applications,
each having an integer number of slots ωj (j=1,..,m) and a limited
number of lines N of integer capacityC . The objective is to pack all
the applications into the minimum number of lines in the table so
that the total number of slots per application packed in any line of
the table does not exceed its capacity. This will provide an optimal
way to assign slots in the scheduling table by the applications and
thus the filling of the table will permit transmitting applications
with long BAG several times within its own BAG.
We have to fix the limited number of lines N to allocate all the
applications. In fact, in order to repeat the most the allocations in
the table, we take into account all the BAG j (j=1..m), once. So the
number N is equal to the minimum BAG among all the applications
except 1ms (which are already considered by equation (1)) and is
defined as follows:
N = min
j=1...m, BAG j,1
BAG j . (3)
Formally, let’s introduce two binary decision variables yi for
i ∈ {1, ..,N } which indicates the considered line of the table and
xi j , for i ∈ {1, ..,N } and j ∈ {1, ..,m} which indicates the place of
the application j in the line i of the table as defined in the table 2.
From these notations, as the objective is to pack all the applications
into the minimum number of lines in the scheduling table, the
objective function will be formulated as:
min
N∑
i=1
yi
defined in Table 2. This function has to be minimized so that the
total number of slots per application, noted
∑m
j=1 ωjxi j for a line
i ∈ {1, ..,N }, packed in any line i ∈ {1, ..N } of the table does not
exceed its capacity C and this constraint can be expressed for all
the lines i ∈ {1, ..,N } as:
m∑
j=1
ωjxi j ≤ C yi ,∀i = 1, ..,N .
Finally, as the aim is to fill the table by considering all them appli-
cations, this leads to define these following constraints:
N∑
i=1
xi j = 1, ∀j = 1, ..,m.
Thus, we can formulate the problem of filling the scheduling
table by applications as the following ILP problem [14]:
min
N∑
i=1
yi (4)
s .t
m∑
j=1
ωjxi j ≤ C yi ,∀i = 1, ..,N (5)
N∑
i=1
xi j = 1,∀j = 1, ..,m (6)
yi ∈ {0, 1} ,∀i = 1, ..,N (7)
xi j ∈ {0, 1} ,∀i = 1, ..,N ,∀j = 1, ..,m. (8)
Objective function (3) means that the applications should be stored
in aminimumof lines of tableN defined by equation (2). Constraints
(4) impose that the capacity of each line defined by equation (1) is
not exceeded while constraints (5) ensure that all the applications
are packed. This ILP problem is solved by using CPLEX [7].
4.2.2 Example. Let’s consider the application configuration of
Table 1. It is composed of 6 applications. App2 is the first mapped
into the table as the BAG is equal to 1ms. With equation (1), the
capacity C is equal to 27 free slots. Equation (2) computes the
number of lines in which the applications has to be allocated is
2. The resulting scheduling table is given in Table 3. As all the
applications can be packed in two lines, lines 0 and 1 are repeated
in the table. So, applications App3, App4, App5 and App6, for which
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0 App2 App1 App5
1 App2 App4 App3 App6
2 App2 App1 App5
3 App2 App4 App3 App6
4 App2 App1 App5
5 App2 App4 App3 App6
6 App2 App1 App5
7 App2 App4 App3 App6
8 App2 App1 App5
9 App2 App4 App3 App6
10 App2 App1 App5
11 App2 App4 App3 App6
12 App2 App1 App5
13 App2 App4 App3 App6
14 App2 App1 App5
15 App2 App4 App3 App6
127 App2 App4 App3 App6
………………………………………………………………………………………………………………………………
Table 3: Scheduling table obtained by the mapping method.
the BAG values are 4, 8, 8 and 16 respectively, have a communication
slot every 2ms. The waiting delay of these applications if they miss
the first communication slot is then 2ms.
5 CASE STUDY
In this section, we evaluate the approach to realistic case studies.
5.1 Considered applications
The considered case study is composed of critical and non critical
applications:
• Full Authority Digital Engine (FADEC) application: It
controls the performance of the aircraft engine. It receives
30 KBytes of data from the engine sensors via an Ethernet
interface and sends back 1500 Bytes of data to the engine
actuators. The application FADECn is composed of n tasks de-
noted tf 0 to tf n−1. tf n−1 is dedicated to send the commands
to the engine actuators via the Ethernet interface. Except
tf n−1, all other tasks exchange 5 KBytes of data through the
NoC. They also send 5 KBytes of data to tf n−1. Figure 4a
shows the tasks graph of FADEC7. This graph illustrates the
core-to-core and core-to-I/O communications between the
tasks of the FADEC application.
• Health Monitoring (HM) application: It is used to rec-
ognize incipient failure conditions of engines. It receives
through an Ethernet interface, a set of frames of size 130 KBytes
and sends back 1500 bytes of data actuators. The application
HMn is composed of n tasks, denoted th0 to thn−1. The last
Figure 4: Task graph of core-to-core and core-to-I/O commu-
nications of the: (a) FADEC application, (b) HM application.
task thn−1 is dedicated to send the data actuators to the Eth-
ernet interface. The task thi sends 2240 bytes of data to thi+1
through the NoC, with i ∈ [0,n − 2]. All these tasks finish
their processing by storing their frames into the memory.
Figure 4b shows the tasks graph of HM6.
FADECapplicationsarecritical,whileHMapplicationsarenon-
critical.Inthiscasestudy,weconsider2configurations:onecom-
posedof8applications(3FADECand5HM)andonecomposedof
9applications(3FADECand6HM).BAGsofthesentdatabythese
applicationsareindicatedinTable4.HM7 istheaddedapplication
betweenthetwoconfigurations.
5.2 Mappingapplicationsintothemanycore
Thedescribedapplicationsare themapped into themany-core.
Differentstrategiescanbeusedsuchas:
• Smart Hill Climbing (SHiC) [11]: this approach maps the
applications without fragmented regions, generated by the
methods in the literature (as in [9], [19],[12]). This method
searches a region of size equal to the size of the application
to be allocated. The tasks of this application are allocated
in the selected region in such a way to reduce the distance
between the communicated tasks. Thus, SHiC allocates the
task with the maximum number of communications at the
center of this region and around it the tasks communicating
with it to form a square shape.
• MapIO [4]: this approach performs the mapping into two
steps. The first step splits the NoC into regions and then
allocates primarily critical applications in a dedicated region
close to memory and Ethernet controllers by following a
circular direction and using rectangular shapes. The second
step consists to allocate the tasks within each application
where some rules are used to minimize the contentions on
the path of the core-to-I/O flows. These rules are based on
the principle of allocating the tasks which generate perpen-
dicular communications on the path of the core-to-I/O flows.
• ex_MapIO [2]: this approach is an extension of MapIO in
order to minimize the delay of outgoing I/O flows on the
NoC (such as d, d1 and d2 in Figure 3). Several mapping rules
have been defined. As an example, one rule minimizes the
number of flows that can delay an outgoing flow on its path
to the Ethernet interface.
As an example, Figure 5 shows the mapping of the two configu-
rations (8 applications and 9 applications) on a 10x10 NoC using
ex_MapIO.
The number of occupied cores is 89 for an 8 applications config-
uration and 96 for a 9 applications configuration. This means that
there are 11 free cores for the first configuration and 4 in the other
one, as we can see in Figure 5. These free cores can execute a tDDR.
As an example, tDDR can be located in column 2 of line 4 in Figure
5a and can be located in column 3 of line 3 in Figure 5b.
Once the applications are mapped, we can compute the WCTT
of the transmission of the VLs from the memory to the Ethernet in-
terface. The results for the two considered configurations are given
in Table 4. In this case study, we consider that for each application
a maximum size frame is sent through the AFDX network. So, the
frame transmission duration is 123 µs for all the applications of the
case study. The number of slots allocated for each application can
be computed and is given in Table 4, by considering Equation ??.
Note that there are no value for SHiC considering 9 applications
since mapping 9 applications using SHiC is not possible.
5.3 Obtained scheduling table
Figure 6 shows the resulting scheduling tables for the different
mapping approaches. As explain in section 4, the number of lines in
which all the applications have to be allocated is 2. This is why we
only represent the 2 first lines of the scheduling tables in Figure 6.
The oversampling of slots is then 2ms. This means that applications
with a BAG greater than 2ms will get a transmission slot every
2ms. If they miss the first transmission slot, they will have to wait
a maximum of 2ms before the data will be transmitted.
5.4 Results
The results are shown in Table 5. Distributed tDDR corresponds
to the solution described in Section 3. In this case, SHiC can not
respect the jitter constraint while MapIO and ex_MapIO allows a
reduction of the jitter leading to a jitter value less than 500µs for
an 8 applications mapping. But, the jitter is still greater than 500µs
when mapping 9 applications.
As expected, the solution proposed in the paper significantly
reduces the jitter of the VLs. The jitter becomes lower than 200µs
when mapping both 8 and 9 applications on the NoC for all the
mapping strategies.
6 CONCLUSION
In this paper, we proposed to replace the mono-core processors
in avionics architecture by a NoC-based many-cores architecture.
Thus, End Systems are replaced by many-cores.
The main contribution of the paper is that it proposes a new
VL transmission strategy which considers one dedicated node in
the many-core architecture to shape the traffic and schedules the
outgoing I/O flows. This dedicated node executes a scheduling
table. The applications are allocated into the table using an ILP
formulation. The two objectives of the strategy is to guarantee the
BAG regulation and to oversample the slots to applications with a
BAG greater than 2ms.
The results show that the jitter is significantly reduced and only
depends on the interferences that can occur on the transmission
path between the DDR and the I/O interface.
The ILP formulation imposes a strong constraint that can be
relaxed: all the applications should be stored in the minimum period
(different to 1ms). As an example, we can notice that it remains a
very little amount of free slots in Figures 6d and 6e (respectively
3 and 6). But applications with longer BAG can be allocated after
this minimum period. This leads to consider variable capacity size
and to formulate a variant of bin packing [14] or a cutting stock
problem with periodic items and thus defining a more complex
strategy for filling the scheduling table.
The worst case transmission delay of a VL from one application
from amany-core to another application executed on another many-
core should be mastered in order to certify the avionic system.
AFDX network uses Network Calculus to compute the maximum
worst case delay [6]. For a NoC used in processors like Tilera,
the method is Recursive Calculus [5]. The problem that could be
addressed now is how to compute the worst case end-to-end delay
of the global communication.
port 1port 2port 3port 4port 5
4567 3 2 1 0
5
6
7
0
1
2
3
4
(10,0) (0,0)
8910
8
9
10
port 11port 12port 13port 14port 15port 16port 17port 18
port 6port 7port 8
FADEC7
tf3
tf2
tf0
tf4
tf6
tf5
tf1th6 th7
th8th4
th1 th0th2th3
th5
th9
HM10
(0,5)
th11
th6th7
th10
th9 th8
th4
th1
th0
th2
th3 th5
th12th13 th14
th15
HM16
th6
th7
th8th4
th1 th0
th2th3
th5
HM9
tf6
tf7
tf1
tf0
tf4
tf3
tf5
tf2
tf8
tf10
tf9
FADEC11
FADEC13
tf6
tf7 tf1tf0
tf4 tf3
tf5 tf2
tf8
tf10
tf9
tf11
tf12
th9
th8 th7
th10
th11
th6
th4 th1
th2
th0
th5
th3
HM12
th9th8th7
th10
th6th4
th1th2 th0
th5
th3
HM11
tDDR
port1port2port3port4port5
4567 3 2 1 0
5
6
7
0
1
2
3
4
(10,0) (0,0)
8910
8
9
10
port11port12port13port14port15port16port17port18
port6port7port8
FADEC7
tf3
tf2
tf0
tf4
tf6
tf5
tf1th6 th7
th8th4
th1 th0th2th3
th5
th9
HM10
(0,5)
th11
th6th7
th10
th9 th8
th4
th1
th0
th2
th3 th5
th12th13 th14
th15
HM16 th6
th7
th8th4
th1 th0
th2th3
th5
HM9
FADEC13
tf6
tf7 tf1tf0
tf4 tf3
tf5 tf2
tf8
tf10
tf9
tf11
tf12
th9
th8 th7
th10
th11
th6
th4 th1
th2
th0
th5
th3
HM12
th9th8th7
th10
th6th4
th1th2 th0
th5
th3
HM11
tf6
tf7
tf1tf0
tf4
tf3
tf5
tf2
tf8
tf10tf9
FADEC11
HM7
th3
th4
th5
th6
th0
th1
th2
tDDR
Figure 5: Mapping 8 (left) and 9 (right) applications on a 10x10 many-core using ex_MapIO .
8 applications 9 applications
SHiC MapIO ex_MapIO MapIO ex_MapIO
Applications BAG WCTT # slots WCTT # slots WCTT # slots WCTT # slots WCTT # slots
FADEC7 4 107 8 52 6 51 6 52 6 51 6
FADEC11 8 183 10 117 8 77 7 105 8 68 7
FADEC13 16 111 8 52 6 51 6 52 6 51 6
HM7 4 - - - - - - 105 8 68 7
HM9 2 107 8 17 5 16 5 37 6 33 5
HM10 16 18 5 52 6 51 6 52 6 51 6
HM11 32 20 5 23 5 17 5 43 6 34 6
HM12 16 20 5 23 5 17 5 43 6 34 6
HM16 32 23 5 157 9 139 9 157 9 139 9
Table 4: WCTT on the NoC of the transmission of VL flows of each application from the memory to the Ethernet interface,
(in µs)
8 applications 9 applications
SHiC MapIO ex_MapIO MapIO ex_MapIO
distributed tDDR 590.5 492.9 419.5 645.9 528.9
one tDDR 183 157 139 157 139
Table 5: Maximum jitter experienced by the transmission of
VL, in µs
REFERENCES
[1] Aeronautical Radio Inc. ARINC 664. 2005. Aircraft Data Network, Part 7: Avionic
Full Duplex Switched Ethernet (AFDX) Network.
[2] Laure Abdallah, Jérôme Ermont, Jean-Luc Scharbarg, and Christian Fraboul.
2017. Towards a mixed NoC/AFDX architecture for avionics applications. In
IEEE 13th International Workshop on Factory Communication Systems, WFCS 2017,
Trondheim, Norway, May 31 - June 2, 2017. 1–10. https://doi.org/10.1109/WFCS.
2017.7991950
[3] Laure Abdallah, Mathieu Jan, Jérôme Ermont, and Christian Fraboul. 2015. Worm-
hole networks properties and their use for optimizing worst case delay analysis
of many-cores. In 10th IEEE International Symposium on Industrial Embedded
Systems (SIES). Siegen, Germany, 59–68.
[4] Laure Abdallah, Mathieu Jan, Jérôme Ermont, and Christian Fraboul. 2016. Reduc-
ing the contention experienced by real-time core-to-I/O flows over a Tilera-like
Network on Chip. In Real-Time Systems (ECRTS), 2016 28th Euromicro Conference
on. IEEE, 86–96.
[5] H. Ayed, J. Ermont, J. l. Scharbarg, and C. Fraboul. 2016. Towards a unified
approach for worst-case analysis of Tilera-like and KalRay-like NoC architectures.
In 2016 IEEE World Conference on Factory Communication Systems (WFCS). 1–4.
[6] H. Charara, J. Scharbarg, J. Ermont, and C. Fraboul. 2006. Methods for bounding
end-to-end delays on an AFDX network. In Proc of the 18th Euromicro Conf. on
Real-Time Systems (ECRTS). Dresde, Germany, 193–202.
[7] IBM ILOG CPLEX. 2009. V12. 1: User’s Manual for CPLEX. International Business
Machines Corporation 46, 53 (2009), 157.
[8] Benoît Dupont de Dinechin, Duco van Amstel, Marc Poulhiès, and Guillaume
Lager. 2014. Time-critical Computing on a Single-chip Massively Parallel Pro-
cessor. In Proc. of the Conf. on Design, Automation & Test in Europe (DATE’14).
97:1–97:6.
[9] Ewerson Luiz de Souza Carvalho, Ney Laert Vilar Calazans, and Fernando Gehm
Moraes. 2010. Dynamic task mapping for MPSoCs. Design & Test of Computers
27, 5 (2010), 26–35.
[10] RTCA DO. 2011. 178C. Software considerations in airborne systems and equipment
certification (2011).
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
1
FADEC7
HM9 HM13HM12HM11
FADEC13FADEC11
HM10
(a) SHiC
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
1 HM13HM12HM10
FADEC11FADEC7
HM11
FADEC13
HM9
(b) MapIO with 8 applications
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
1 HM9 HM11
FADEC7 FADEC13
HM13HM12HM10
FADEC11
(c) Ex_MapIO with 8 applications
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
1
HM10HM9
HM7 HM13
FADEC13
FADEC11 HM12
HM11FADEC7
(d) MapIO with 9 applications
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0
1
FADEC7 HM11HM9 HM10
HM13HM12HM7
FADEC13
FADEC11
(e) Ex_MapIO with 9 applications
Figure 6: Two first lines of the resulting scheduling tables
[11] Mohammad Fattah, Masoud Daneshtalab, Pasi Liljeberg, and Juha Plosila. 2013.
Smart hill climbing for agile dynamic mapping in many-core systems. In Proc. of
the 50th Annual Design Automation Conference. 39.
[12] Mohamamd Fattah, Marco Ramirez, Masoud Daneshtalab, Pasi Liljeberg, and
Juha Plosila. 2012. CoNA: Dynamic application mapping for congestion reduction
in many-core systems. In 30th Intl. Conf. on Computer Design (ICCD). 364–370.
[13] Thomas Ferrandiz, Fabrice Frances, and Christian Fraboul. 2011. Using Network
Calculus to compute end-to-end delays in SpaceWire networks. SIGBED Review
8, 3 (2011), 44–47.
[14] Jangha Kang and Sungsoo Park. 2003. Algorithms for the variable sized bin
packing problem. European Journal of Operational Research 147, 2 (2003), 365–
372.
[15] Silvano Martello and Paolo Toth. 1990. Knapsack Problems, J.
[16] Prasant Mohapatra. 1998. Wormhole routing techniques for directly connected
muti-computer systems. ACM Computer Survey (CSUR) 30, 3 (September 1998),
374–410.
[17] Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, José Carlos Fonseca,
Marko Bertogna, Eduardo Quiñones, Roberto Vargas, and Andrea Marongiu.
2014. The Challenge of Time-Predictability in Modern Many-Core Architectures.
In 14th Intl. Workshop on Worst-Case Execution Time Analysis. Madridr, Spain,
63–72.
[18] David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards,
Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant
Agarwal. 2007. On-Chip Interconnection Architecture of the Tile Processor. IEEE
Micro 27, 5 (2007), 15–31.
[19] Christopher Zimmer and Frank Mueller. 2012. Low contention mapping of
real-time tasks onto tilepro 64 core processors. In 18th Real-Time and Embedded
Technology and Applications Symposium (RTAS). 131–140.
