6 research outputs found

Work-in-Progress: Extending Buffer-Aware Worst-Case Timing Analysis of Wormhole NoCs

Author: Giroudot Frédéric
Mifdaoui Ahlem
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Worst-case timing analysis of Networks-on-Chip (NoCs) is crucial to designing safe real-time systems based on manycore architectures. In this paper, we present some potential extensions of our previously published buffer-aware worst-case timing analysis approach to cope with bursty traffic such as real-time audio and video streams. A first promising lead is to improve the algorithm analyzing backpressure patterns to capture the consecutive-packet queueing effect while keeping the information about the dependencies between flows. Furthermore, the improved algorithm may also decrease the inherent complexity of computing the indirect blocking latency due to backpressure.

Open Archive Toulouse Archive Ouverte

Worst-Case Latency Analysis for the Versal Network-on-Chip

Author: Elmor Lang Ian
Publication venue: 'University of Waterloo'
Publication date: 10/01/2022

The recent line of Versal FPGA devices from Xilinx Inc. includes a hard Network-on-Chip (NoC) embedded in the programmable logic, designed to be a high-performance system-level interconnect. While the target markets for Versal devices include applications with real-time constraints, such as automotive driver assist, the associated development tools only provide figures for "structural latencies" of data packets, which assume that the network is otherwise idle. In a realistic setting, this information is not enough to ensure deadlines are met, as different packets can contend for NoC switch outputs, which causes packet contents to be buffered while in transit, increasing their latency. In this work, we develop an approach for calculating upper bounds for such worst-case latencies (WCLs), assuming a model where system tasks release packets into the NoC periodically. In order to develop an accurate model for latencies in the network, we review the architecture and operation of the Versal NoC. We focus on a formal description of the NPS switches that compose the NoC from a flit arbitration perspective, based on a study of the available cycle-accurate switch simulation code. Working with the presented model, we propose an adaptation of an existing approach for WCL analysis in NoCs, Recursive Calculus (RC), in order to apply it to the arbitration policy implemented in the Versal NoC. To evaluate the proposed approach, we implement a simulation experiment for the Versal NoC, with custom endpoints that allow for injecting packets programmatically and measuring their latencies over the NoC. We simulate both a single NPS module and a complete NoC routing periodic workloads, in order to compare with the values given by the WCL approach and identify sources of pessimism.

University of Waterloo's Institutional Repository

Worst-case delay analysis of core-to-IO flows over many-cores architectures

Author: Abdallah Laure
Publication date: 05/04/2017

Many-core architectures are more promising hardware for designing real-time systems than multi-core systems, as they should enable a more easily mastered integration of a higher number of applications, potentially of different criticality levels. In embedded real-time systems, these architectures will be integrated within backbone Ethernet networks, as they mostly provide Ethernet controllers as Input/Output (I/O) interfaces. Thus, a number of applications of different criticality levels could be allocated on the Network-on-Chip (NoC) and required to communicate with sensors and actuators. However, the worst-case behavior of the NoC for both inter-core and core-to-I/O communications must be established. Several NoCs targeting hard real-time systems, made of specific hardware extensions, have been designed. However, none of these extensions are currently available in commercially available NoC-based many-core architectures, which instead rely on wormhole switching with round-robin arbitration. Using this switching strategy, interference patterns can occur between direct and indirect flows on many-cores. Besides, the mapping over the NoC of both critical and non-critical applications has an impact on the network contention these core-to-I/O communications exhibit. These core-to-I/O flows (coming from the Ethernet interface of the NoC) cross two networks of different speeds: NoC and Ethernet. On the NoC, the size of allowed packets is much smaller than the size of Ethernet frames. Thus, once an Ethernet frame is transmitted over the NoC, it will be divided into many packets. When all the data corresponding to this frame are received by the DDR-SDRAM memory on the NoC, the frame is removed from the buffer of the Ethernet interface. In addition, the congestion on the NoC, due to wormhole switching, can delay these flows. Besides, the buffer in the Ethernet interface has a limited capacity. This behavior may therefore cause Ethernet frames to be dropped.
The idea is therefore to analyze the worst-case transmission delays on the NoC and reduce the delays of the core-to-I/O flows. In this thesis, we show that the pessimism of the existing Worst-Case Traversal Time (WCTT) computing methods and the existing mapping strategies lead to dropped Ethernet frames due to internal congestion in the NoC. Thus, we demonstrate properties of such NoC-based wormhole networks to reduce the pessimism when modeling flows in contention. Then, we propose a mapping strategy that minimizes the contention of core-to-I/O flows in order to solve this problem. We show that the WCTT values can be reduced by up to 50% compared to current state-of-the-art real-time packet schedulability analysis. These results are due to the modeling of the real impact of the flows in contention in our proposed computing method. Besides, experimental results on real avionics applications show significant improvements of core-to-I/O flow transmission delays, up to 94%, without significantly impacting transmission delays of core-to-core flows. These improvements are due to our mapping strategy, which allocates the applications in such a way as to reduce the impact of non-critical flows on critical flows. These reductions in the WCTT of the core-to-I/O flows avoid the drop of Ethernet frames.
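The abstract above notes that an Ethernet frame entering the NoC is divided into many smaller packets. As a rough illustration only, that split is a ceiling division. The function name and the byte sizes below are hypothetical and are not taken from the thesis.

```python
def packets_per_frame(frame_bytes: int, noc_payload_bytes: int) -> int:
    """Number of NoC packets needed to carry one Ethernet frame.

    Both sizes are assumptions chosen for illustration; the thesis
    abstract does not state the actual values.
    """
    if frame_bytes <= 0 or noc_payload_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partially filled last packet still counts.
    return -(-frame_bytes // noc_payload_bytes)


if __name__ == "__main__":
    # Hypothetical sizes: a 1500-byte frame and 64-byte NoC packet payloads.
    print(packets_per_frame(1500, 64))
```

Every one of these packets must reach memory before the frame leaves the Ethernet interface buffer, which is why NoC congestion affecting any of them can contribute to frame drops.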

Open Archive Toulouse Archive Ouverte

NoC-based Architectures for Real-Time Applications : Performance Analysis and Design Space Exploration

Author: Giroudot Frédéric
Publication date: 13/12/2019

Monoprocessor architectures have reached their limits in regard to the computing power they offer vs the needs of modern systems. Although multicore architectures partially mitigate this limitation and are commonly used nowadays, they usually rely on intrinsically non-scalable buses to interconnect the cores. The manycore paradigm was proposed to tackle the scalability issue of bus-based multicore processors. It can scale up to hundreds of processing elements (PEs) on a single chip, by organizing them into computing tiles (holding one or several PEs). Intercore communication is usually done using a Network-on-Chip (NoC) that consists of interconnected on-chip routers allowing communication between tiles. However, manycore architectures raise numerous challenges, particularly for real-time applications. First, NoC-based communication tends to generate complex blocking patterns when congestion occurs, which complicates the analysis, since computing accurate worst-case delays becomes difficult. Second, running many applications on large Systems-on-Chip such as manycore architectures makes system design particularly crucial and complex. On one hand, it complicates Design Space Exploration, as it multiplies the implementation alternatives that will guarantee the desired functionalities. On the other hand, once a hardware architecture is chosen, mapping the tasks of all applications on the platform is a hard problem, and finding an optimal solution in a reasonable amount of time is not always possible. Therefore, our first contributions address the need for computing tight worst-case delay bounds in wormhole NoCs. We first propose a buffer-aware worst-case timing analysis (BATA) to derive upper bounds on the worst-case end-to-end delays of constant bit-rate data flows transmitted over a NoC on a manycore architecture. We then extend BATA to cover a wider range of traffic types, including bursty traffic flows, and heterogeneous architectures.
The introduced method is called G-BATA, for Graph-based BATA. In addition to covering a wider range of assumptions, G-BATA improves the computation time and thus increases the scalability of the method. In a second part, we develop a method addressing design and mapping for applications with real-time constraints on manycore platforms. It combines model-based engineering tools (TTool) and simulation with our analytical verification technique (G-BATA) and tools (WoPANets) to provide an efficient design space exploration framework. Finally, we validate our contributions on (a) a series of experiments on a physical platform and (b) two case studies taken from the real world: an autonomous vehicle control application, and a 5G signal decoder application.

Open Archive Toulouse Archive Ouverte

Partitioning and Analysis of the Network-on-Chip on a COTS Many-Core Platform

Author: Becker Matthias
Behnam Moris
Dasari Dakshina
Nikolic Borislav
Nolte Thomas
Nélis Vincent
Åkesson Benny
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

24th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2017). Pittsburgh, U.S.A. Many-core processors can provide the computational power required by future complex embedded systems. However, their adoption is not trivial, since several sources of interference on COTS many-core platforms have adverse effects on the resulting performance. One main source of performance degradation is the contention on the Network-on-Chip (NoC), which is used for communication among the compute cores via the off-chip memory. Available analysis techniques for the traversal time of messages on the NoC do not consider many of the architectural features found on COTS platforms. In this work, we target a state-of-the-art many-core processor, the Kalray MPPA. A novel partitioning strategy for reducing the contention on the NoC is proposed. Further, we present an analysis technique dedicated to the proposed partitioning strategy, which considers all architectural features of the COTS NoC. Additionally, it is shown how to configure the parameters for flow regulation on the NoC, such that the Worst-Case Traversal Time (WCTT) is minimal and buffers never overflow. The benefits of our approach are evaluated based on extensive experiments that show that contention is significantly reduced compared to the unconstrained case, while the proposed analysis outperforms a state-of-the-art analysis for the same platform. An industrial case study shows the tightness of the proposed analysis.

Repositório Científico do Instituto Politécnico do Porto

Crossref

Graph-based Approach for Buffer-aware Timing Analysis of Heterogeneous Wormhole NoCs under Bursty Traffic

Author: Giroudot Frédéric
Mifdaoui Ahlem
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/11/2019

This paper addresses the problem of worst-case timing analysis of heterogeneous wormhole NoCs, i.e., routers with different buffer sizes and transmission speeds, when consecutive-packet queuing (CPQ) occurs. The latter means that there are several consecutive packets of one flow queuing in the network. This scenario happens in the case of bursty traffic but also for non-schedulable traffic. Conducting such an analysis is known to be a challenging issue due to the sophisticated congestion patterns when enabling backpressure mechanisms. We tackle this problem by extending the applicability domain of our previous work for computing maximum delay bounds using Network Calculus, called Buffer-aware worst-case Timing Analysis (BATA). We propose a new Graph-based approach to improve the analysis of indirect blocking due to backpressure, while capturing the CPQ effect and keeping the information about dependencies between flows. Furthermore, the introduced approach improves the computation of indirect-blocking delay bounds in terms of complexity and ensures the safety of these bounds even for non-schedulable traffic. We provide further insights into the tightness and complexity issues of worst-case delay bounds yielded by the extended BATA with the Graph-based approach, denoted G-BATA. Our assessments show that the complexity has decreased by up to 100 times while offering an average tightness ratio of 71%, with reference to the basic BATA. Finally, we evaluate the yielded improvements with G-BATA for a realistic use case against a recent state-of-the-art approach. This evaluation shows the applicability of G-BATA under more general assumptions and the impact of such a feature on the tightness and computation time.
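The abstract above builds on Network Calculus delay bounds. The sketch below is not BATA or G-BATA. It only shows the textbook single-node bounds for a flow with a leaky-bucket arrival curve (burst plus rate times t) crossing a rate-latency server, to give a sense of what such a bound looks like. All names, units (flits and cycles), and values are assumptions for illustration.

```python
import math


def delay_bound(burst: float, rate: float,
                service_rate: float, latency: float) -> float:
    """Textbook Network Calculus delay bound for one node.

    Arrival curve: burst + rate * t (leaky bucket).
    Service curve: service_rate * max(0, t - latency) (rate-latency).
    This is NOT the BATA/G-BATA analysis, only the basic building block.
    """
    if burst < 0 or rate < 0 or latency < 0 or service_rate <= 0:
        raise ValueError("invalid curve parameters")
    if rate > service_rate:
        return math.inf  # the server cannot keep up: no finite bound
    return latency + burst / service_rate


def backlog_bound(burst: float, rate: float,
                  service_rate: float, latency: float) -> float:
    """Matching backlog (buffer occupancy) bound for the same curves."""
    if burst < 0 or rate < 0 or latency < 0 or service_rate <= 0:
        raise ValueError("invalid curve parameters")
    if rate > service_rate:
        return math.inf
    return burst + rate * latency


if __name__ == "__main__":
    # Hypothetical flow: 4-flit burst, 0.5 flit/cycle, served at
    # 1 flit/cycle after at most 3 cycles of latency.
    print(delay_bound(4.0, 0.5, 1.0, 3.0))
    print(backlog_bound(4.0, 0.5, 1.0, 3.0))
```

Buffer-aware analyses of wormhole NoCs go well beyond this single-node case, since backpressure couples the service seen at one router to congestion at downstream routers.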

arXiv.org e-Print Archive

Open Archive Toulouse Archive Ouverte