Modeling contention interference in crossbar-based systems via Sequence-Aware Pairing (SeAP) by Giesen, Jeremy et al.
Modeling Contention Interference in Crossbar-based
Systems via Sequence-Aware Pairing (SeAP)
Jeremy Giesen∗,†, Pedro Benedicte∗, Enrico Mezzetti∗, Jaume Abella∗, Francisco J. Cazorla∗
∗Barcelona Supercomputing Center (BSC)
†Universitat Politècnica de Catalunya
Abstract—The Infineon AURIX TriCore family of microcon-
trollers has consolidated as the reference multicore computing
platform for safety-critical systems in the automotive domain.
As a distinctive trait, AURIX microcontrollers are designed to
promote high timing predictability as witnessed by the presence
of large scratchpad memories and a crossbar interconnect. The
latter has been introduced to reduce inter-core interference
in accessing the memory system and peripherals. Nonetheless,
the crossbar does not prevent requests from different cores to
the same target resource from suffering contention. Applications are,
therefore, inherently exposed to inter-core timing interference,
which needs to be taken into account in the determination of
reliable execution time bounds. In this paper we propose a
contention modeling technique for crossbar-based systems, and
hence suitable for bounding contention effects in the AURIX
family. Unlike state-of-the-art techniques that build on total
request counts, we exploit the sequence of requests to the
different target resources produced by each core to produce
tighter bounds by discarding contention scenarios that cannot
occur in practice. To that end, we adapt existing techniques from
the pattern matching domain to derive the worst-case contention
effects from the sequences of requests each core sends over the
crossbar. Results on a wide set of synthetic and real scenarios
and benchmarks on an AURIX TC297TX show that our technique
outperforms other contention modeling approaches.
I. INTRODUCTION AND MOTIVATION
The AURIX TC29x and its TC39x evolution are two fam-
ilies of Infineon’s microcontrollers with a dominant position
in the automotive market. Both are intended to capture ever-
increasing performance needs while adhering to the highest
requirements of automotive safety standards (ASIL-D) [17].
Processors in these families have been designed to reduce
software execution time variability (jitter) by, for instance, in-
corporating large on-core scratchpad memories, and a crossbar
interconnect. Yet, the fact that each processor in the family is
a multicore opens the door to contention interference.
While being capable of serving multiple requests in parallel,
the crossbar cannot handle more than one simultaneous request
to the same hardware shared resource (HSR). When multiple
cores try to access the same HSR at the same time, an arbiter
prioritizes requests causing tasks to experience variable access
latencies. This translates into variability in their execution time
that complicates the determination of trustworthy time bounds.
The contention suffered by a task depends on several aspects
that need to be modeled conservatively but also as precisely
as possible to avoid unnecessary pessimism. First, the number
of requests generated by applications, which can typically be
derived by either static [32, 36, 9, 21] or measurement-
based [8, 30] techniques. Second, the HSR arbiter behavior,
including its internal buffering and policy [32, 36, 9, 21].
For instance, for an arbiter implementing round-robin arbitration
and buffering a single request per contending core,
an absolute upper bound on the maximum contention delay
(mcd) that a request from the task under analysis can suffer is given by
$mcd = (N_c - 1) \times l_{max}$, where $N_c$ is the number of cores, one
of which runs the task under analysis, and $l_{max}$ is the longest time a
request can hold the HSR. Third, the subset of applications
that can potentially execute in parallel with the application
under analysis, which can be conservatively modeled with
existing system-level timing analysis frameworks, based on
the classical concept of activation window [19], and can be
exploited to determine the maximum number of requests that
can generate contention on the application under analysis when
accessing a given HSR [36, 8]. And fourth, an aspect that
contention analysis approaches cannot practically model is
how the requests of the application under analysis and the
contenders align in time. In this case, contention modeling can
only conservatively assume that requests from the application
under analysis always arrive exactly at the same cycle as the
potential contender requests and always get lower arbitration
priority, hence suffering maximum contention.
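As a small worked illustration of the round-robin bound above, the following sketch computes mcd for one request and the fully pessimistic bound obtained when every request of the task under analysis is assumed to suffer it. The function names and the numeric values (3 cores, a longest hold time of 8 cycles, 1000 requests) are hypothetical and chosen only for illustration.

```python
def max_contention_delay(num_cores: int, l_max: int) -> int:
    """Upper bound on the delay of one request: (N_c - 1) * l_max."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return (num_cores - 1) * l_max


def fully_pessimistic_bound(num_requests: int, num_cores: int, l_max: int) -> int:
    """Assume every request of the task under analysis suffers the full mcd."""
    return num_requests * max_contention_delay(num_cores, l_max)


# Hypothetical values: 3 cores, longest hold time of 8 cycles.
print(max_contention_delay(3, 8))           # 16
print(fully_pessimistic_bound(1000, 3, 8))  # 16000
```

The second figure ignores how many requests the contenders actually issue and to which targets, which is the pessimism that count-based and sequence-based refinements aim to remove.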
Contention analysis approaches exploiting the number of
requests of applications (either total [31, 36, 9, 21, 28] or per-type
[12, 10, 30]) are accurate for modeling HSRs with mutually
exclusive access, such as buses. However, access counts alone
cannot properly model the behavior
of devices that support some degree of parallelism, such
as the crossbar. Access counts fail to capture whether the
target application and its contenders access different devices
in parallel, thus suffering no contention.
In this work, we contend that this unnecessary source
of pessimism in contention modeling can be avoided by
exploiting information on the sequence of accesses generated
in each core. Like access counts, sequences carry
no timing semantics other than the order in which requests are
sent, and hence are not integration-dependent: values
computed for an application in isolation hold in general during
operation regardless of the contenders. Sequences of requests
in isolation are relatively easy to derive building on modern
off-band tracing support [1, 11]. With sequences we can
exclude infeasible request conflict scenarios. We show this
with a simple illustrative example.
Motivating Example. Figure 1 shows a block diagram of
a dual-core processor in which cores (core0 and core1) are
connected to three HSR (HSRA, HSRB , and HSRC) via a
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current 
or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective 




at hand is a so-called TriCore emulation device (ED) that
fully supports the Multi-Core Debug System (MCDS) also
known as OCDS Level3 (Tracing and Calibration). The MCDS
functionalities are accessible by an external tool via the debug
port using JTAG or Debug & Trace Active Probe (DAP) connectors.
The emulation device also supports the Aurora Gigabit
Trace (AGBT) interface, which allows real-time tracing
information to be collected at up to 3.125 Gbit/s. The configuration we used
includes an external debugger and a communication device
connected to the target via the DAP connector. Further details
on the experimental setting will be provided in Section V.
A. Generalization
In this work, we conduct an empirical evaluation on top of
the TC297TX processor and analytically assess our approach
for up to six sequences (cores), as many as in the TC39x.
Cores in the TC39x are organized in two clusters (with 4 and
2 cores, respectively) with homogeneous behavior across cores
inside a given cluster. The TC39x includes support for lock-
step execution in 4 cores. The platform enhancements with
respect to the previous generation also include larger SRAM
memories (up to 6MB) and 6 independent PFlashes.
Our focus on the TC2xx [14] and the TC3xx [15] families
is motivated by the wide diffusion of these models
and their current market opportunities. However, our approach is
not specific to those TriCore families of microcontrollers:
it can be applied to any other architecture building on
crossbar interconnects, or on cheaper implementations such as
one bus connecting each target to all masters [26]. This is because
sequence awareness is useful whenever the interconnect can
serve requests with some form of parallelism.
IV. SEQUENCE AWARE PAIRING
We build on the concept of request pairing [30] to derive
the worst-case contention delay among a concrete set of tasks.
Request pairing ultimately consists in the definition of a
mapping between the multicore requests potentially happening
simultaneously and contending for a given shared resource.
Requests from the task under analysis that are coupled (paired)
with requests from contender tasks running on other cores are
assumed to incur contention delay, contributing to the overall
inter-core interference. This allows precise modeling of the
contention happening across cores: in a system with n cores,
under predictable arbitration policies, such as round-robin and
FIFO, only one-to-n pairings are allowed, in the sense that
one access from a core cannot be interfered with by more than
one access from each of the remaining cores. For a precise
contention analysis, a preliminary system-level analysis is required
to derive the set of potentially overlapping tasks and the
worst-case contending requests they can potentially generate.
This analysis should take into account the specific scheduling
policy as well as other execution-model concerns [8, 30].
Request pairing is effective for modeling (worst-case) con-
tention happening in multiple resources in bus-based sys-
tems [30]. However, it is essentially based on access count
information only and, as seen in the illustrative example
in Section I, it cannot model with sufficient tightness the
contention arising in crossbar-based systems. To overcome
this, we propose a Sequence-Aware Pairing (SeAP) approach
that focuses on sequence-level information to derive the worst-
case pairing of requests while excluding infeasible scenarios.
The problem of finding the worst-case overlapping of a
sequence of requests to a set of target resources over the
crossbar is reducible to a well-known family of problems
in pattern matching theory, dealing with the derivation of
common sub-sequences among strings of data [41, 13, 24].
Intuitively, the pairing of requests coming from n cores over
the crossbar includes one request from the core under analysis
and at least one request from the remaining n−1 cores. In
other words, the pairing consists of a fixed association of
a subsequence of requests from the core under analysis with
the subsequences of requests triggered by the remaining n−1
cores. Considering two cores, the pairing consists in a common
subsequence that generates the worst-case contention impact.
Across all cores, however, subsequences are not necessarily a
common subsequence as they only need to be a subsequence
with respect to the sequence from the core under analysis.
Before describing our approach in detail, we provide the
necessary background on relevant pattern matching techniques.
A. Background on pattern matching
Detecting common patterns between two or more sequences
of elements is a recurrent problem in computer science.
A specific instance of this problem consists in detecting
the Longest Common Subsequence (LCS) between two or
more sequences from an alphabet of symbols. Unlike a
substring, a subsequence may include non-consecutive
elements from the original sequence. This problem, which
has been considered in [41, 7], has practical uses in different
fields. For instance, LCS algorithms are at the basis of data
comparisons and revision-control systems, such as SVN or
GIT. The first work addressing the LCS problem [41] proposes
a solution for comparing two sequences based on dynamic
programming. The algorithm computes both the
length of the LCS and an example sequence of that
length, but at the cost of a discouraging quadratic complexity (both in
time and space) in the input size and alphabet. By discarding
information on the example sequence, the space complexity of
the algorithm can be reduced from quadratic to linear while
maintaining the quadratic time complexity [13].
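The space reduction mentioned above can be sketched as follows. This is a minimal illustration of the general two-row idea (only the LCS length is kept, no example subsequence), not the exact algorithm of [13]; the function name is hypothetical.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y.

    Only the previous and current rows of the dynamic-programming table
    are kept, so space is linear in len(y) while time stays quadratic.
    """
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]  # column 0 represents the empty prefix of y
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(y)]


print(lcs_length("AABBCCA", "AABCA"))   # 5
print(lcs_length("AABBCCA", "CCCBBA"))  # 3
```

Because earlier rows are discarded, the matched elements themselves cannot be recovered from this variant, which mirrors the trade-off described in the text.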
A different incarnation of the same problem, known as the
Heaviest Common Subsequence (HCS) problem, looks instead
for the common subsequence with the highest cumulative cost,
according to a weight function over the alphabet symbols. A
first solution with quadratic complexity for the two-sequence
problem has been proposed in [18]. More recently, the same
optimization in space complexity used in [13] has been
adapted to HCS in [24]. This solution only computes the
weight of the HCS (no example subsequence is returned) and
exhibits quadratic time complexity and linear space complexity.
It is worth noting that the optimized algorithms
proposed in [13, 24] for the LCS and HCS problems have
been instantiated to find a common subsequence between
two original sequences. While, in principle, they can also
be applied for obtaining the LCS or HCS of k sequences,
the entailed complexity can be unsustainable with realistically
long sequences (e.g. in the order of 10,000 symbols and
above). Non-exact solutions are often deployed to cope with
large sets of sequences [38].
B. Introduction to SeAP
SeAP is presented with a set of traces describing the
sequence of requests each core sends through the crossbar
to find the request overlapping that generates the maximum
contention impact. Debug support to obtain the sequence of
requests over the interconnect without affecting the program
execution is typically available for high-end embedded tar-
gets [1, 11] to enable efficient verification and validation. Find-
ing the worst-case overlapping of requests over the crossbar
exhibits strong similarities with the problem of finding the
HCS. The solution we propose builds on the optimized HCS
algorithm in [24] but with a critical distinction, which becomes
evident when more than two sequences are considered.
Given a set of k sequences, both LCS and HCS conventional
formulations attempt to find a sequence (longest or heaviest)
that is a subsequence of all sequences in the set. The concept
of common subsequence implies a transitive relation with the
original sequences. For example, the HCS is also a pairwise
common subsequence of all the input sequences. This means
that each and every ordered element in HCS(q1, q2, . . . , qk)
appears in all q1, . . . , qk.
SeAP relaxes this requirement as not all sequences have
the same importance: we are interested in finding the pairing
of requests that generates the worst-case contention delay on
a given core, not on all cores altogether. The worst-case
(cumulative) pairing may happen when the sequence from
the core under analysis is paired differently with each of the
other sequences (i.e. different subsequences are considered).
This means that, in SeAP, there is no concept of common
subsequence other than at pairwise level.
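The difference between a subsequence common to all sequences and pairwise subsequences can be illustrated with a short sketch. It uses three illustrative sequences, unit weights (so the heaviest and longest subsequences coincide), and assumes that contention adds up across contenders; all function names are hypothetical and this is not the authors' implementation.

```python
def lcs2(x: str, y: str) -> int:
    """Longest common subsequence length of two sequences (two-row DP)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs3(x: str, y: str, z: str) -> int:
    """Longest subsequence common to all three sequences (cubic DP)."""
    m = [[[0] * (len(z) + 1) for _ in range(len(y) + 1)] for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            for k in range(1, len(z) + 1):
                if x[i - 1] == y[j - 1] == z[k - 1]:
                    m[i][j][k] = m[i - 1][j - 1][k - 1] + 1
                else:
                    m[i][j][k] = max(m[i - 1][j][k], m[i][j - 1][k], m[i][j][k - 1])
    return m[len(x)][len(y)][len(z)]


q1, q2, q3 = "AABBCCA", "AABCA", "CCCBBA"  # q1: core under analysis

# Common-subsequence view: each element pairs one request from every core,
# so q1 is delayed by two contender requests per element.
common_view = 2 * lcs3(q1, q2, q3)
# Pairwise view: q1 is paired separately with each contender.
pairwise_view = lcs2(q1, q2) + lcs2(q1, q3)

print(common_view, pairwise_view)  # 4 8
```

With these inputs the pairwise view accounts for twice as many contending requests as the common-subsequence view, which would therefore underestimate the contention suffered by q1.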
Relating to our problem, each element in the HCS represents
a pairing of exactly k requests to a given target resource
through the crossbar, whereas in our case, we can already
have contention with just 2 requests to the same target.
This is illustrated in Figure 3 where q1 belongs to the core
under analysis and the pairing induced by SeAP derives
two distinct subsequences for q2 and q3, AABCA and CCA,
whereas HCS considers only the common subsequence CA.
The cumulative weight (contention effect) obtained with the
SeAP subsequences is necessarily larger than that caused by




                    −  A  A  B  B  C  C  −  −  A
HCS(q1, q2, q3)  =  −  A  A  B  −  C  −  −  −  A
                    C  −  −  −  −  C  C  B  B  A

                    −  A  A  B  B  C  C  −  −  A
SeAP(q1, q2, q3) =  −  A  A  B  −  C  −  −  −  A
                    C  −  −  −  −  C  C  B  B  A
Fig. 3: SeAP and LCS/HCS difference.
Algorithm 1: SEAP with 2 sequences
Input: Two sequences of requests over the crossbar
Output: The worst-case contention delay caused by any feasible pairing.
1 for i ← 0 to LEN(x) + 1 do
2     for j ← 0 to LEN(y) + 1 do
3         M[i][j] ← 0
4 for i ← 1 to LEN(x) + 1 do
5     for j ← 1 to LEN(y) + 1 do
6         M[i][j] ← max(M[i-1][j], M[i][j-1], M[i-1][j-1] + W(x[i], y[j]))
7 return M
As a further minor difference with respect to HCS (and
LCS), in SeAP the symbol alphabet comprises targets acces-
sible via the crossbar as well as the admissible request types
(e.g., read or write). We do so as read/write requests to a
given resource can suffer and generate different contention,
and hence need to be associated with different weights.
C. SeAP for 2 cores
In a two-dimensional scenario, considering only two sequences
of requests, the SeAP algorithm closely resembles the
HCS solution. As already discussed, in the two-dimensional
problem the peculiarity of SeAP does not emerge, as the HCS
is symmetric (it holds whichever core is under analysis). The
solution adapted from [24] is shown in Algorithm 1. The main
input structures are the two sequences of requests x, y and
the weight function, in our case a matrix W with the worst-
case contention delay incurred by a request to each target in
the crossbar. As an output, the algorithm produces the result
matrix M , from which the heaviest common subsequence
can be derived. Lines 1 to 3 in Algorithm 1 take care of
the initialization of the result matrix. Note that the size of M is
determined by the lengths of x and y incremented by 1,
since the algorithm exploits one additional row and column
to represent the null position in the sequences. In lines 4
to 6, the dynamic programming algorithm iterates through
each element of x and y in a nested-loop fashion. Starting
from a subsequence that only accounts for the first element of
each input vector, the solution builds up taking into account
the worst possible outcomes. The result matrix M contains
the HCS of all possible ordered subsequences. For a more
detailed understanding of the baseline solution, we refer the
reader to the original paper [24]. This implementation yields
both the worst-case contention delay and
the subsequence of requests (pairing) that determines that
maximum delay. The baseline algorithm works in quadratic
time and space. The space complexity can be reduced to be
linear by using only two rows of the solution matrix M , as
proposed in [13] for LCS and in [24] for HCS. With this
optimization, however, it is not possible to reconstruct the
resulting subsequence.
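A minimal Python sketch of Algorithm 1 follows, using 0-based indexing and a full result matrix so that the pairing can be reconstructed. It is an illustration under stated assumptions rather than the authors' implementation: the weight of a pairing is taken to be the latency of the contender's request when both requests address the same target and 0 otherwise, the latency values are those of the synthetic setup in Section V-A, and the names `weight`, `seap_two_sequences` and `traceback` are hypothetical.

```python
from typing import List, Tuple

LATENCY = {"A": 2, "B": 4, "C": 4, "D": 8}  # per-target delays (assumed)


def weight(a: str, b: str) -> int:
    """Delay that contender request b causes to request a (0 if targets differ)."""
    return LATENCY[b] if a == b else 0


def seap_two_sequences(x: str, y: str) -> List[List[int]]:
    """Fill the (len(x)+1) x (len(y)+1) result matrix M of Algorithm 1."""
    rows, cols = len(x) + 1, len(y) + 1
    m = [[0] * cols for _ in range(rows)]  # row/column 0: null position
    for i in range(1, rows):
        for j in range(1, cols):
            m[i][j] = max(m[i - 1][j], m[i][j - 1],
                          m[i - 1][j - 1] + weight(x[i - 1], y[j - 1]))
    return m


def traceback(m: List[List[int]], x: str, y: str) -> List[Tuple[int, int]]:
    """Recover one worst-case pairing as (index in x, index in y) tuples."""
    pairing = []
    i, j = len(x), len(y)
    while i > 0 and j > 0:
        gain = weight(x[i - 1], y[j - 1])
        if gain > 0 and m[i][j] == m[i - 1][j - 1] + gain:
            pairing.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif m[i][j] == m[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairing.reverse()
    return pairing


x, y = "AABBCCA", "CCCBBA"
M = seap_two_sequences(x, y)
pairs = traceback(M, x, y)
print(M[-1][-1])  # worst-case contention delay: 10
print(pairs)      # one pairing that reaches it
```

Keeping only two rows of `M` would yield the same bound in linear space, at the cost of losing the information that `traceback` relies on.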
We illustrate Algorithm 1 over two example sequences x
and y and a simplified weight function W : req type → N:










core is relatively simple to define as it will consist of the
concatenation of the sequences generated by the (ordered)
set of tasks potentially running in the same scheduling
window as the task under analysis. This set of tasks can
be derived with conventional analysis techniques [19].
• In preemptive scenarios, SeAP will behave as existing
approaches [9, 34, 30, 35, 4] and will be able to deliver
contention bounds under reasonably restrictive assumptions.
For example, a limited-preemption scenario [6]
would provide a trade-off between responsiveness (for
high priority tasks) and analysability. Limited preemption
will have a two-fold impact on SeAP:
(i) Conceptually, preemption points divide tasks into sub-tasks,
each one associated with a sub-sequence, as a result
of splitting the original sequence at preemption points.
(ii) Contention will happen between the task under analysis
and the sub-sequences generated by all tasks poten-
tially executing in a given contender core within the
scheduling window of the task under analysis, which
can be derived based on conventional analyses [19].
Fig. 7: Combining sequences under limited preemption.
Figure 7 shows a simple example of combination of sub-
sequences from a contender core n. We assume two tasks τi
and τj in n can potentially execute in parallel with the task
under analysis, with τi having higher priority than τj . We
assume no other task or interrupt can preempt these tasks.
In particular, interrupts that are non-deferrable may require a
special treatment. Fixed preemption points have been defined
for both tasks, splitting the tasks’ sequences into sub-sequences
S1i, . . . , S3i and S1j, . . . , S3j. As a result of all possible
overlaps (and preemptions) between τi and τj, core n
can generate a set of combined sequences of requests qn,
as reported in the figure. Finding the worst-case contention
impact will translate into an optimization problem to find
the combined sequence in qn that leads to the worst-case
contention impact, under a fixed set of preemption-points.
As part of this search, SeAP – as we presented it in this
work – would be invoked several times to evaluate each
possible combined sequence. A conceptually similar search-
based approach has been deployed in [34, 30].
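The search described above can be sketched as a loop that evaluates a two-sequence bound on every candidate combined sequence and keeps the maximum. The candidate generator below rests on a simplifying assumption that is not taken from the paper: the higher-priority task runs to completion once started and can only start at a preemption point of the lower-priority task. All names and latency values are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

LATENCY: Dict[str, int] = {"A": 2, "B": 4}  # hypothetical per-target delays


def seap_bound(x: str, y: str, latency: Dict[str, int] = LATENCY) -> int:
    """Two-row weighted common-subsequence bound between x and y."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            gain = latency[yj] if xi == yj else 0
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + gain))
        prev = curr
    return prev[-1]


def combined_sequences(subs_hi: List[str], subs_lo: List[str]) -> Iterator[str]:
    """Insert the whole higher-priority sequence at each preemption point
    of the lower-priority task (assumed preemption model)."""
    whole_hi = "".join(subs_hi)
    for cut in range(len(subs_lo) + 1):
        yield "".join(subs_lo[:cut]) + whole_hi + "".join(subs_lo[cut:])


def worst_case_bound(q_analysis: str, candidates: Iterable[str]) -> int:
    """Invoke the two-sequence bound once per candidate and keep the maximum."""
    return max(seap_bound(q_analysis, c) for c in candidates)


candidates = list(combined_sequences(["A"], ["B", "A"]))
print(candidates)                          # ['ABA', 'BAA', 'BAA']
print(worst_case_bound("AB", candidates))  # 6
```

A different preemption model would only change `combined_sequences`; the outer maximization stays the same.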
V. EXPERIMENTAL EVALUATION
The proposed SeAP solutions have been evaluated with a
twofold objective in mind. First, we wanted to understand
how the approach compares against a baseline pairing solution
only relying on access counts (nSeAP), such as those proposed
in [9, 10, 30]. In particular we were interested in assessing
whether the quality – in terms of tightness – of the results
obtained with SeAP depends on the characteristics of both
platform and inputs, such as number of contenders, latencies,
as well as length of sequences and distribution of request types.
And second, we also wanted to experiment with SeAP on a
real platform to provide evidence of the practical applicability
of the approach and to assess it against a representative
application in the automotive domain. The evaluation is mainly
focused on the compositional SeAP approach outlined in
Section IV-E, which benefits from a reduced complexity both
in time and space and allows exploring scenarios with larger
core counts. Therefore, from here on, we refer to the
compositional variant of SeAP unless explicitly stated otherwise.
A. Synthetic Design Space Exploration
In this section we evaluate SeAP by exploring how the
obtained contention bounds are affected by different factors,
namely, the number of contending cores and the particular
sequence of requests by each contender. Without loss of
generality we consider 4 shared resources (rA, rB , rC , rD) that
can be accessed via a crossbar, with each resource accepting
one type of request (A, B, C, and D). Each resource has
its own access latency (lA = 2, lB = 4, lC = 4, and lD = 8).
We conservatively assume that the maximum contention a
request can generate on another request to the same target
resource in the crossbar is equal to its own maximum duration.
Contention is assumed to be compositional so that, in the
pairing scenario AAA, with 3 cores simultaneously sending a
request to rA, the maximum contention suffered by the core
under analysis would be 2 × lA. This scenario enables the use
of the compositional SeAP approach.
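The compositional assumption can be made concrete with a few lines of code: the delay suffered by one request of the core under analysis is the sum of the latencies of the contender requests paired with it on the same target. The latency table is the one given above; the function name is hypothetical.

```python
from typing import Dict, Sequence

LATENCY: Dict[str, int] = {"A": 2, "B": 4, "C": 4, "D": 8}


def pairing_contention(request: str, contender_requests: Sequence[str]) -> int:
    """Delay suffered by `request` when paired with one request per contender.

    Only contender requests to the same target contribute, each with its
    own maximum duration.
    """
    return sum(LATENCY[r] for r in contender_requests if r == request)


print(pairing_contention("A", ["A", "A"]))  # pairing AAA: 2 * lA = 4
print(pairing_contention("A", ["A", "B"]))  # only one contender hits rA: 2
print(pairing_contention("D", ["A", "B"]))  # different targets: 0
```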
We explore 3-core and 6-core setups matching the core
count in the AURIX TC29x and TC39x microcontroller families.
We refer to the cores as c0, c1, ..., c5. In each
experiment we analyze the contention delay and execution
time increase of the task in c0 (core under analysis) due to
the contention generated from the tasks in the remaining cores.
The TC39x is equipped with two crossbars and exhibits heterogeneous
latencies for the different targets depending on the core issuing
the request, which can be modeled in SeAP by considering
different latencies per request source/target. In our evaluation
for 6 cores, the real (not fully homogeneous) TC39x latencies
are accounted for with slightly higher pessimism by
assuming that, given a core under analysis in one cluster (e.g.,
clus0), requests from cores in the other cluster (clus1) arrive
at a higher frequency than in reality (as if they had lower
latency). This is equivalent to considering that the cores in clus1 sit in
the local cluster, clus0, rather than in the remote one, clus1,
from which, in reality, they would send requests at a lower frequency,
thus creating less contention. Requests of the core under
analysis are restricted to cluster-local targets.
The evaluation is conducted through a set of synthetically
generated execution profiles, where each core generates a fixed





This work has been partially supported by the Spanish
Ministry of Economy and Competitiveness (MINECO) under
grant TIN2015-65316-P, the SuPerCom European Research
Council (ERC) project under the European Union’s Hori-
zon 2020 research and innovation programme (grant agree-
ment No. 772773), and the HiPEAC Network of Excellence.
MINECO also partially supported Enrico Mezzetti under
Juan de la Cierva-Incorporación postdoctoral fellowship (IJCI-
2016-27396) and Jaume Abella under Ramon y Cajal postdoc-
toral fellowship (RYC-2013-14717).
REFERENCES
[1] Nexus 5001. IEEE-ISTO 5001-2012, The Nexus 5001
Forum Standard for a Global Embedded Processor Debug
Interface. https://bit.ly/2MIoJY1.
[2] Ahmed Alhammad, Saud Wasly, and Rodolfo Pellizzoni.
Memory efficient global scheduling of real-time tasks. In
21st IEEE Real-Time and Embedded Technology and Ap-
plications Symposium, pages 285–296. IEEE Computer
Society, 2015.
[3] Matthias Becker, Dakshina Dasari, Borislav Nikolic,
Benny Akesson, Vincent Nélis, and Thomas Nolte.
Contention-free execution of automotive applications on
a clustered many-core platform. In 28th Euromicro
Conference on Real-Time Systems, ECRTS, pages 14–24.
IEEE Computer Society, 2016.
[4] Alessandro Biondi and Marco Di Natale. Achieving
predictable multicore execution of automotive applica-
tions using the LET paradigm. In IEEE Real-Time
and Embedded Technology and Applications Symposium,
pages 240–250. IEEE Computer Society, 2018.
[5] Sergey Blagodurov, Sergey Zhuravlev, and Alexandra
Fedorova. Contention-aware scheduling on multicore
systems. ACM Trans. Comput. Syst., 28(4):8:1–8:45,
2010.
[6] Giorgio C. Buttazzo, Marko Bertogna, and Gang Yao.
Limited preemptive scheduling for real-time systems. A
survey. IEEE Trans. Industrial Informatics, 9(1):3–15,
2013.
[7] Václav Chvátal and David Sankoff. Longest common
subsequences of two random sequences. Journal of
Applied Probability, 12(2):306–315, 1975.
[8] Dakshina Dasari, Björn Andersson, Vincent Nélis, Stefan
M. Petters, Arvind Easwaran, and Jinkyu Lee. Response
time analysis of COTS-based multicores considering
the contention on the shared memory bus. In IEEE
10th International Conference on Trust, Security and
Privacy in Computing and Communications, TrustCom,
pages 1068–1075. IEEE Computer Society, 2011.
[9] Dakshina Dasari and Vincent Nélis. An Analysis of the
Impact of Bus Contention on the WCET in Multicores.
In IEEE 14th International Conference on High Perfor-
mance Computing and Communication & 2012 IEEE
9th International Conference on Embedded Software and
Systems, HPCC ’12, pages 1450–1457, 2012.
[10] Enrique Díaz, Enrico Mezzetti, Leonidas Kosmidis,
Jaume Abella, and Francisco J. Cazorla. Modelling multicore
contention on the AURIX™ TC27x. In Proceedings
of the 55th Annual Design Automation Conference, DAC,
pages 97:1–97:6. ACM, 2018.
[11] Boris Dreyer and Christian Hochberger. Non-intrusive
online timing analysis of large embedded applications.
In 19th International Workshop on Worst-Case Execution
Time Analysis, WCET, volume 72 of OASICS, pages 2:1–
2:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
2019.
[12] Gabriel Fernandez, Javier Jalle, Jaume Abella, Eduardo
Quiñones, Tullio Vardanega, and Francisco J. Cazorla.
Computing safe contention bounds for multicore re-
sources with round-robin and FIFO arbitration. IEEE
Trans. Computers, 66(4):586–600, 2017.
[13] Daniel S. Hirschberg. A linear space algorithm for
computing maximal common subsequences. Commun.
ACM, 18(6):341–343, June 1975.








[16] Infineon. AURIX™ TC29x B-Step 32-Bit Single-Chip
Microcontroller - Users Manual V1.3 2014-12. 2019.
[17] International Organization for Standardization. ISO/DIS
26262. Road Vehicles – Functional Safety, 2009.
[18] Guy Jacobson and Kiem-Phong Vo. Heaviest increas-
ing/common subsequence problems. In Proceedings of
the Third Annual Symposium on Combinatorial Pattern
Matching, CPM, pages 52–66. Springer-Verlag, 1992.
[19] Mathai Joseph and Paritosh K. Pandya. Finding response
times in a real-time system. Comput. J., 29(5):390–395,
1986.
[20] Hyoseung Kim, Dionisio de Niz, Björn Andersson,
Mark H. Klein, Onur Mutlu, and Ragunathan Rajkumar.
Bounding memory interference delay in COTS-based
multi-core systems. In 20th IEEE Real-Time
and Embedded Technology and Applications Symposium,
RTAS, pages 145–154. IEEE Computer Society, 2014.
[21] Hyoseung Kim, Dionisio de Niz, Björn Andersson,
Mark H. Klein, Onur Mutlu, and Ragunathan Rajkumar.
Bounding and reducing memory interference in COTS-based
multi-core systems. Real-Time Systems, 52(3):356–
395, 2016.
[22] Namhoon Kim, Bryan C. Ward, Micaiah Chisholm,
James H. Anderson, and F. Donelson Smith. Attacking
the one-out-of-m multicore problem by combining hard-
ware management with mixed-criticality provisioning.
Real-Time Systems, 53(5):709–759, 2017.
[23] Namhoon Kim, Bryan C. Ward, Micaiah Chisholm,
Cheng-Yang Fu, James H. Anderson, and F. Donelson
Smith. Attacking the one-out-of-m multicore problem by
combining hardware management with mixed-criticality
provisioning. In 2016 IEEE Real-Time and Embedded
Technology and Applications Symposium (RTAS), pages
149–160. IEEE Computer Society, 2016.
[24] Rao Li. A linear space algorithm for the heaviest
common subsequence problem. Utilitas Mathematica,
75, 03 2008.
[25] Renato Mancuso, Rodolfo Pellizzoni, Neriman Tokcan,
and Marco Caccamo. WCET derivation under single core
equivalence with explicit memory budget assignment.
In 29th Euromicro Conference on Real-Time Systems,
ECRTS, volume 76 of LIPIcs, pages 3:1–3:23. Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2017.
[26] Manish Shah, J. Barreh, J. Brooks, R. Golla, G. Gro-
hoski, N. Gura, R. Hetherington, P. Jordan, M. Luttrell,
C. Olson, Bikram Saha, D. Sheahan, L. Spracklen, and
A. Wynn. UltraSPARC T2: A highly-threaded, power-efficient,
SPARC SoC. In 2007 IEEE Asian Solid-State
Circuits Conference, pages 22–25, 2007.
[27] Sébastien Martinez, Damien Hardy, and Isabelle Puaut.
Quantifying WCET reduction of parallel applications by
introducing slack time to limit resource contention. In
Proceedings of the 25th International Conference on
Real-Time Networks and Systems, RTNS, pages 188–197.
ACM, 2017.
[28] Enrico Mezzetti, Luca Barbina, Jaume Abella, Stefa-
nia Botta, and Francisco J. Cazorla. AURIX TC277
multicore contention model integration for automotive
applications. In Design, Automation & Test in Eu-
rope Conference & Exhibition, DATE, pages 1202–1203,
2019.
[29] Jan Nowotsch, Michael Paulitsch, Daniel Buhler, Henrik
Theiling, Simon Wegener, and Michael Schmidt. Multi-
core interference-sensitive WCET analysis leveraging
runtime resource capacity enforcement. In 26th Eu-
romicro Conference on Real-Time Systems, ECRTS 2014,
Madrid, Spain, July 8-11, 2014, pages 109–118. IEEE
Computer Society, 2014.
[30] Xavier Palomo, Enrico Mezzetti, Jaume Abella, Reinder
J. Bril, and Francisco J. Cazorla. Accurate ILP-based
contention modeling on statically scheduled multicore
systems. In 25th IEEE Real-Time and Embedded Tech-
nology and Applications Symposium, RTAS, pages 15–28.
IEEE, 2019.
[31] Rodolfo Pellizzoni, Emiliano Betti, Stanley Bak, Gang
Yao, John Criswell, Marco Caccamo, and Russell Kegley.
A predictable execution model for COTS-based embedded
systems. In 17th IEEE Real-Time and Embedded Tech-
nology and Applications Symposium, RTAS, pages 269–
279. IEEE Computer Society, 2011.
[32] Rodolfo Pellizzoni, Bach Duy Bui, Marco Caccamo, and
Lui Sha. Coscheduling of CPU and I/O transactions in
COTS-based embedded systems. In Proceedings of the 29th
IEEE Real-Time Systems Symposium, RTSS, pages 221–
231. IEEE Computer Society, 2008.
[33] PLS Programmierbare Logik & Systeme GmbH. Univer-
sal access devices uad. https://bit.ly/2nPLtwO.
[34] Benjamin Rouxel, Steven Derrien, and Isabelle Puaut.
Tightening contention delays while scheduling parallel
applications on multi-core architectures. ACM Trans.
Embedded Comput. Syst., 16(5s):164:1–164:20, 2017.
[35] Benjamin Rouxel, Stefanos Skalistis, Steven Derrien,
and Isabelle Puaut. Hiding communication delays in
contention-free execution for SPM-based multi-core architectures.
In 31st Euromicro Conference on Real-Time
Systems, ECRTS, volume 133 of LIPIcs, pages 25:1–
25:24. Schloss Dagstuhl - Leibniz-Zentrum für Infor-
matik, 2019.
[36] Simon Schliecker, Mircea Negrean, and Rolf Ernst.
Bounding the shared resource load for the performance
analysis of multiprocessor systems. In Design, Automa-
tion and Test in Europe, DATE, pages 759–764. IEEE
Computer Society, 2010.
[37] Andreas Schranzhofer, Rodolfo Pellizzoni, Jian-Jia Chen,
Lothar Thiele, and Marco Caccamo. Worst-case response
time analysis of resource access models in multi-core
systems. In Proceedings of the 47th Design Automation
Conference, DAC, pages 332–337. ACM, 2010.
[38] Shyong Jian Shyu and Chun-Yuan Tsai. Finding the
longest common subsequence for multiple biological
sequences by ant colony optimization. Comput. Oper.
Res., 36(1):73–91, 2009.
[39] Stefanos Skalistis and Alena Simalatsar. Worst-case
execution time analysis for many-core architectures with
NoC. In Formal Modeling and Analysis of Timed Systems -
14th International Conference, FORMATS, volume 9884
of Lecture Notes in Computer Science, pages 211–227.
Springer, 2016.
[40] Rohan Tabish, Renato Mancuso, Saud Wasly, Rodolfo
Pellizzoni, and Marco Caccamo. A real-time scratchpad-
centric OS with predictable inter/intra-core communi-
cation for multi-core embedded systems. Real-Time
Systems, 55(4):850–888, 2019.
[41] Robert A. Wagner and Michael J. Fischer. The string-
to-string correction problem. J. ACM, 21(1):168–173,
January 1974.
[42] Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Marco
Caccamo, and Lui Sha. MemGuard: Memory bandwidth
reservation system for efficient performance isolation in
multi-core platforms. In 19th IEEE Real-Time and Em-
bedded Technology and Applications Symposium, RTAS,
pages 55–64. IEEE Computer Society, 2013.
