136 research outputs found
221202
This PhD dissertation has resulted in six
publications in which one of the papers received Best Paper
Award at ICESS 2021. All the papers were published in
reputed venues for real-time systems research, i.e., RTSS
2020, RTNS 2021, ICESS 2021, RTCSA 2022, RTSS 2022,
Elsevier’s Journal of System Architecture. Two more papers
are expected to be published soon.Multicore platforms share the hardware resources such as caches, interconnects, and main memory among all the cores. Due to such sharing, tasks running on different cores compete to access these shared resources which can potentially result in shared resource contention. This shared resource contention can increase the execution times of tasks in a non-deterministic manner. Consequently, the shared resource contention is problematic for hard real-time systems, i.e., systems that run tasks with stringent timing requirements. To address this issue, this PhD dissertation builds novel solutions to model and analyze the shared resource contention that can be suffered by tasks executing on a multicore system. The shared resource contention aware schedulability analysis is then derived by integrating the maximum shared resource contention that can be suffered by the tasks.This work was supported by the CISTER Research
Unit (UIDP/UIDB/04234/2020), financed by National Funds
through FCT/MCTES (Portuguese Foundation for Science and
Technology); by project ADACORSA (ECSEL/0010/2019 -
JU grant nr. 876019) financed through National Funds from
FCT and European funds through the EU ECSEL JU. The JU
receives support from the European Union’s Horizon 2020 research and innovation programme and Austria, Sweden, Spain,
Italy, France, Portugal, Ireland, Finland, Slovenia, Poland,
Netherlands, Turkey - Disclaimer: This document reflects only
the author’s view and the Commission is not responsible for
any use that may be made of the information it contains. This
work is also a result of the work developed under project
Aero.Next Portugal (nÂş C645727867-00000066) and FLY-PT
(grant nÂş 46079, POCI-01-0247-FEDER-046079), also funded
by FCT under PhD grant 2020.09532.BD.info:eu-repo/semantics/publishedVersio
A survey of techniques for reducing interference in real-time applications on multicore platforms
This survey reviews the scientific literature on techniques for reducing interference in real-time multicore systems, focusing on the approaches proposed between 2015 and 2020. It also presents proposals that use interference reduction techniques without considering the predictability issue. The survey highlights interference sources and categorizes proposals from the perspective of the shared resource. It covers techniques for reducing contentions in main memory, cache memory, a memory bus, and the integration of interference effects into schedulability analysis. Every section contains an overview of each proposal and an assessment of its advantages and disadvantages.This work was supported in part by the Comunidad de Madrid Government "Nuevas TĂ©cnicas de Desarrollo de Software de Tiempo Real Embarcado Para Plataformas. MPSoC de PrĂłxima GeneraciĂłn" under Grant IND2019/TIC-17261
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version
211004
Today multicore processors are used in most modern systems that require computational logic. However, their applicability in systems with stringent timing requirements is still an ongoing research. This is due to the difficulty of ensuring the timing correctness of tasks executing on a multicore platform that comprises a number of shared hardware resources, e.g., caches, memory bus and the main memory. Concurrent accesses to any of these shared resources can generate uncontrolled interference, which complicates the estimations of tasks' worst-case execution time (WCET) and the worst-case response time (WCRT).
The use of the 3-phase task execution model helps in upper bounding the contention due to the sharing of bus/main memory in multicore systems. It divides the execution of tasks into distinct memory and execution phases, where tasks can only access the bus/main memory during their memory phases. This makes bus/memory access patterns of tasks more predictable, enabling a preciser computation of bus/memory contention.
In this work, we show how the bus contention can be computed for the 3-phase task model considering a work-conserving, i.e., round-robin (RR) based, arbitration policy at the memory bus. This is different from existing works that analyze the time-division multiple access (TDMA) and first-come-first-serve (FCFS) based bus arbitration policies. First, we present a solution to model the bus contention that can be suffered/caused by tasks executing on the same/remote cores of a multicore system under an RR-based bus arbitration scheme. We then evaluate the impact of resulting bus contention on taskset schedulability. Experimental results show that our proposed RR-based bus contention analysis can improve taskset schedulability by up to 100 percentage points than the TDMA-based analysis and up to 40 percentage points than the FCFS-based bus contention analysis.This work was partially supported by National Funds through FCT/MCTES (Portuguese Foundation for
Science and Technology), within the CISTER Research Unit (UIDB-UIDP/04234/2020); also by the Operational Competitiveness Programme and Internationalization (COMPETE 2020) under the PT2020 Partnership Agreement, through the European Regional Development Fund (ERDF), and by national funds through
the FCT, within project POCI-01-0145-FEDER-029119 (PREFECT); also by the European Union’s Horizon 2020 - The EU Framework Programme for Research and Innovation 2014-2020, under grant agreement
No. 732505. Project “TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER000020” financed by the North Portugal Regional Operational
Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement; also by FCT, under PhD
grant 2020.09532.BD.info:eu-repo/semantics/publishedVersio
220801
Multicore platforms are being increasingly adopted in Cyber-Physical Systems (CPS) due to their advantages over single-core processors, such as raw computing power and energy efficiency. Typically, multicore platforms use a shared memory bus that connects the cores to the off-chip main memory. This sharing of memory bus may cause tasks running on different cores to compete for access to the main memory whenever data/instructions are need to be read/written from/to the main memory. Such competition is problematic, as it may cause variations in the execution time of tasks in a non-deterministic way. To reduce the complexity of analysing this problem, the 3-phase task model was proposed that divides tasks' executions into distinct memory and execution phases. The distinctive memory phases are then scheduled to eliminate/minimize main memory contention between concurrently executing tasks. However, 3-phase tasks running on different cores may still compete to access the shared memory bus/main memory in order to execute memory phases. This paper presents a partitioned scheduling-based approach that allows one to derive memory bus contention-aware worst-case response time of tasks that follow the 3-phase task model. In particular, the bus-contention analysis is derived by considering two memory access models, i.e., (i) dedicated memory access model, where a core having allowed to access the main memory via memory bus is permitted to execute more than one memory phase, and (ii) fair memory access model, that restrict each core to execute only one memory phase in its allocated bus access. Both these models represent different system and application requirements, and the resulting bus contention of tasks may vary depending on the considered model. To evaluate the effectiveness of the proposed bus contention analysis, we compare its performance against an existing analysis in the state-of-the-art by performing (i) case-study experiments, using benchmarks from the Mälardalen Benchmark suite, and (ii) empirical evaluation using synthetic task sets. Results show that our proposed analysis can improve task set schedulability of 3-phase tasks by up to 88 percentage points.This work was partially supported by European Union’s Horizon 2020 -The EU Framework Programme
for Research and Innovation 2014-2020, under grant agreement No. 732505. Project “TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER000020”
845 financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL
2020 Partnership Agreement; also by National Funds through FCT/MCTES (Portuguese Foundation for
Science and Technology), within the CISTER Research Unit (UIDP/UIDB/04234/2020); by FCT and the
Portuguese National Innovation Agency (ANI), under the CMU Portugal partnership, through the European
Regional Development Fund (ERDF) of the Operational Competitiveness Programme and Internationaliza850 tion (COMPETE 2020), under the PT2020 Partnership Agreement, within project FLOYD (POCI-01-0247-
FEDER-045912), also by FCT under PhD grant 2020.09532.BD.info:eu-repo/semantics/publishedVersio
A Survey on Cache Management Mechanisms for Real-Time Embedded Systems
© ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for
increased computing power. However, multicore processors have shared resources that affect the predictability
of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of
the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many
research works have proposed different techniques to deal with caches in multicore processors in the context
of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and
would be very useful for the real-time community. In this article, we present a survey of cache management
techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research
published in 2014. We categorize the main research works and provide a detailed comparison in terms of
similarities and differences. We also identify key challenges and discuss future research directions.King Saud University
NSER
SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures
Near-Data-Processing (NDP) architectures present a promising way to alleviate
data movement costs and can provide significant performance and energy benefits
to parallel applications. Typically, NDP architectures support several NDP
units, each including multiple simple cores placed close to memory. To fully
leverage the benefits of NDP and achieve high performance for parallel
workloads, efficient synchronization among the NDP cores of a system is
necessary. However, supporting synchronization in many NDP systems is
challenging because they lack shared caches and hardware cache coherence
support, which are commonly used for synchronization in multicore systems, and
communication across different NDP units can be expensive.
This paper comprehensively examines the synchronization problem in NDP
systems, and proposes SynCron, an end-to-end synchronization solution for NDP
systems. SynCron adds low-cost hardware support near memory for synchronization
acceleration, and avoids the need for hardware cache coherence support. SynCron
has three components: 1) a specialized cache memory structure to avoid memory
accesses for synchronization and minimize latency overheads, 2) a hierarchical
message-passing communication protocol to minimize expensive communication
across NDP units of the system, and 3) a hardware-only overflow management
scheme to avoid performance degradation when hardware resources for
synchronization tracking are exceeded.
We evaluate SynCron using a variety of parallel workloads, covering various
contention scenarios. SynCron improves performance by 1.27 on average
(up to 1.78) under high-contention scenarios, and by 1.35 on
average (up to 2.29) under low-contention real applications, compared
to state-of-the-art approaches. SynCron reduces system energy consumption by
2.08 on average (up to 4.25).Comment: To appear in the 27th IEEE International Symposium on
High-Performance Computer Architecture (HPCA-27
- …