
    Modeling high-performance wormhole NoCs for critical real-time embedded systems

    Manycore chips are a promising computing platform to cope with the increasing performance needs of critical real-time embedded systems (CRTES). However, manycore adoption by the CRTES industry requires understanding tasks' timing behavior when their requests use the manycore's network-on-chip (NoC) to access hardware shared resources. This paper analyzes contention in wormhole-based NoC (wNoC) designs, widely implemented in the high-performance domain, for which we introduce a new metric: the worst-contention delay (WCD), which captures the wNoC impact on worst-case execution time (WCET) more tightly than the existing metric, worst-case traversal time (WCTT). Moreover, we provide an analytical model of the WCD that requests can suffer in a wNoC, and we validate it against wNoC designs resembling those in the Tilera-Gx36 and the Intel-SCC 48-core processors. Building on top of our WCD analytical model, we analyze the impact that different design parameters, such as the number of virtual channels, have on WCD, and we make a set of recommendations on what wNoC setups to use in the context of CRTES.
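    The abstract does not reproduce the analytical model itself. Purely as an illustration of the kind of quantity being bounded (a naive sketch, not the paper's WCD model), one can charge, at every router a request traverses, one maximum packet service time per competing flow sharing the output link:

        # Illustrative only: a naive upper bound on the contention delay suffered by a
        # request crossing a wormhole NoC. At each hop we pessimistically charge one
        # maximum packet transfer time per interfering flow sharing that link. This is
        # NOT the paper's WCD model, just a sketch of the quantity such models tighten.

        def naive_contention_bound(competitors_per_hop, max_packet_cycles):
            """competitors_per_hop: number of interfering flows at each hop of the path.
            max_packet_cycles: worst-case cycles needed to forward one packet over a link."""
            return sum(c * max_packet_cycles for c in competitors_per_hop)

        # Example: a 4-hop path with 3 interfering flows at every hop and 32-cycle packets.
        print(naive_contention_bound([3, 3, 3, 3], max_packet_cycles=32))  # 384 cycles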

    Network on Chip: a New Approach of QoS Metric Modeling Based on Calculus Theory

    A NoC is composed of IP (Intellectual Property) cores and switches connected among themselves by communication channels. Communication is accomplished by the exchange of data among IP cores, and its end-to-end delay (EED) is a key concern. Often, the structure of particular messages is not adequate for communication purposes, which leads to the concept of packet switching. In the context of NoCs, packets are composed of a header, a payload, and a trailer, and are divided into small pieces called flits. To meet the required performance, NoC hardware resources should be specified at an early step of the system design; particular attention should be given to the choice of network parameters such as the physical buffer size in each node. The EED and the packet loss rate are among the critical QoS metrics: some real-time and multimedia applications place bounds on these parameters and require specific hardware resources and particular management approaches in the NoC switch. A traffic contract (SLA, Service Level Agreement) specifies the ability of a network or protocol to give guaranteed performance, throughput, or latency bounds based on mutually agreed measures, usually by prioritizing traffic. A defined Quality of Service (QoS) may be required for some types of real-time network traffic or multimedia applications. The main goal of this paper is to define a QoS metric using a Network-on-Chip modeling architecture, focusing on the network delay bound and packet losses. The approach is based on Network Calculus, a mathematical theory for representing the behavior of data flows between IPs interconnected over the NoC. We propose a QoS-metric approach based on QoS-parameter prioritization factors for multi-application services using the calculus model.
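    As a point of reference for the network-calculus machinery the paper builds on (classical textbook bounds, not the paper's specific metric): a flow constrained by a token-bucket arrival curve alpha(t) = sigma + rho*t that crosses a rate-latency service curve beta(t) = R*(t - T)+ has delay bounded by T + sigma/R and backlog bounded by sigma + rho*T:

        # Minimal network-calculus sketch (classical bounds, not the paper's metric):
        # delay and backlog bounds for a token-bucket flow (sigma, rho) crossing a
        # rate-latency server (R, T).

        def delay_bound(sigma, rho, R, T):
            """Classical bound D <= T + sigma/R, valid when rho <= R (stability)."""
            assert rho <= R, "sustained flow rate must not exceed the service rate"
            return T + sigma / R

        def backlog_bound(sigma, rho, R, T):
            """Classical bound B <= sigma + rho*T."""
            assert rho <= R
            return sigma + rho * T

        # Example with illustrative numbers: a 64-flit burst at 0.2 flits/cycle crossing
        # a switch that serves 1 flit/cycle after at most 8 cycles of latency.
        print(delay_bound(sigma=64, rho=0.2, R=1.0, T=8))    # 72.0 cycles
        print(backlog_bound(sigma=64, rho=0.2, R=1.0, T=8))  # 65.6 flits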

    An enhanced worst-case end-to-end evaluation method for SpaceWire networks

    The SpaceWire network is scheduled to be used as the sole on-board network for future ESA satellites. However, at the moment, network designers do not have tools to ensure that critical temporal deadlines are met when using best-effort wormhole networks like SpaceWire. In a previous paper, we presented a first method to compute an upper bound on the worst-case end-to-end delay of flows traversing such networks; however, its scope was limited by restrictive assumptions on the traffic patterns. In this paper, we therefore propose a new network model that removes those limitations and allows worst-case delay analysis on SpaceWire networks with any traffic pattern.

    High-Integrity Performance Monitoring Units in Automotive Chips for Reliable Timing V&V

    As software continues to control more system-critical functions in cars, its timing is becoming an integral element in functional safety. Timing validation and verification (V&V) assesses software's end-to-end timing measurements against given budgets. The advent of multicore processors with massive resource sharing reduces the significance of end-to-end execution times for timing V&V and requires reasoning on (worst-case) access delays on contention-prone hardware resources. While Performance Monitoring Units (PMUs) support this finer-grained reasoning, their design has never been a prime consideration in the high-performance processors from which automotive chips' PMU implementations descend, since the PMU does not directly affect performance or reliability. Given the PMU's instrumental importance for timing V&V, we advocate for PMUs in automotive chips that explicitly track activities related to worst-case (rather than average) software behavior, are recognized as an ISO-26262 mandatory high-integrity hardware service, and are accompanied by detailed documentation that enables their effective use to derive reliable timing estimates. This work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by MINECO under Ramón y Cajal postdoctoral fellowship RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporación postdoctoral fellowship IJCI-2016-27396.
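    As an illustration of how such counters feed timing V&V (a hypothetical sketch; the counter names, budgets, and interface are not from the paper or any vendor API), event counts gathered during measurement runs can be checked against per-resource contention budgets fixed at analysis time:

        # Hypothetical sketch: checking measured PMU event counts against contention
        # budgets derived at timing-analysis time. Counter names and budgets are made up.

        CONTENTION_BUDGETS = {           # worst-case events allowed per task activation
            "bus_stall_cycles": 12_000,
            "l2_miss_contention": 800,
        }

        def check_budgets(measured_counts):
            """Return the counters whose measured count exceeds the allotted budget."""
            return [name for name, budget in CONTENTION_BUDGETS.items()
                    if measured_counts.get(name, 0) > budget]

        # Example measurement run: the bus-stall budget is exceeded.
        print(check_budgets({"bus_stall_cycles": 13_450, "l2_miss_contention": 512}))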

    Limits on Fundamental Limits to Computation

    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, and compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.
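    One concrete instance of an energy limit of the kind surveyed (standard physics, not a result of this review) is Landauer's bound: erasing a single bit dissipates at least k_B * T * ln 2, roughly 2.9e-21 J at room temperature:

        # Landauer's bound: the minimum energy dissipated when erasing one bit of
        # information is k_B * T * ln(2). Shown only as a worked example of one
        # fundamental limit of the kind the survey reviews.
        import math

        K_B = 1.380649e-23  # Boltzmann constant, J/K

        def landauer_joules_per_bit(temperature_kelvin=300.0):
            return K_B * temperature_kelvin * math.log(2)

        print(landauer_joules_per_bit())   # ~2.87e-21 J per bit erased at 300 K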

    Performance evaluation of non-prefiltering vs. time reversal prefiltering in distributed and uncoordinated IR-UWB ad-hoc networks

    Time Reversal (TR) is a prefiltering scheme mostly analyzed in the context of centralized and synchronous IR-UWB networks, in order to leverage the trade-off between communication performance and device complexity, in particular in the presence of multiuser interference. Several strong assumptions have typically been adopted in the analysis of TR, such as the absence of Inter-Symbol / Inter-Frame Interference (ISI/IFI) and of multipath dispersion due to complex signal propagation. The main goal of this work is to compare the performance of TR-based systems with traditional non-prefiltered schemes, in the novel context of a distributed and uncoordinated IR-UWB network, under more realistic assumptions that include the presence of ISI/IFI and multipath dispersion. Results show that lack of power control and imperfect channel knowledge affect the performance of both non-prefiltered and TR systems; in these conditions, TR prefiltering still guarantees a performance improvement in sparse/low-loaded and overloaded network scenarios, while the opposite is true for less extreme scenarios, calling for the development of an adaptive scheme that enables/disables TR prefiltering depending on network conditions.
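    For reference, a minimal sketch of the TR idea under idealized assumptions (perfect channel knowledge, single user, no ISI/IFI; not the system model evaluated in the paper): the prefilter is the time-reversed complex conjugate of the channel impulse response, so the effective channel becomes the channel's autocorrelation and the received energy focuses on a single tap:

        # Minimal time-reversal (TR) prefiltering sketch under idealized assumptions.
        # The TR prefilter is the time-reversed complex conjugate of the channel impulse
        # response h; the effective channel prefilter*h is then the autocorrelation of h,
        # which peaks sharply at its center tap (temporal focusing).
        import numpy as np

        rng = np.random.default_rng(0)
        h = rng.normal(size=8) + 1j * rng.normal(size=8)   # toy multipath channel

        prefilter = np.conj(h[::-1])                        # time-reversed conjugate
        prefilter /= np.linalg.norm(prefilter)              # keep transmit power fixed

        h_eff = np.convolve(prefilter, h)                   # effective channel at the receiver
        print(np.abs(h_eff).round(2))                       # energy concentrates at the peak tap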

    Slow Learners are Fast

    Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design, which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
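    A minimal sketch of the delayed-update idea on a toy least-squares problem (not the paper's exact algorithm or analysis): each step applies a gradient computed from parameters that are several updates old, mimicking asynchronous parallel workers, and the iterates still converge for modest delays:

        # Toy delayed-update SGD sketch (illustrative, not the paper's algorithm):
        # gradients are computed on parameters that are `delay` steps stale.
        import numpy as np

        rng = np.random.default_rng(1)
        d, n, delay, lr = 5, 2000, 4, 0.05
        w_true = rng.normal(size=d)
        X = rng.normal(size=(n, d))
        y = X @ w_true + 0.01 * rng.normal(size=n)

        w = np.zeros(d)
        history = [w.copy()] * (delay + 1)          # stale parameter snapshots
        for t in range(n):
            w_stale = history[0]                    # parameters from `delay` steps ago
            grad = (X[t] @ w_stale - y[t]) * X[t]   # gradient of 0.5*(x.w - y)^2 at stale w
            w = w - lr * grad
            history = history[1:] + [w.copy()]

        print(np.linalg.norm(w - w_true))           # small: delayed updates still converge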

    Power-Aware Speed Scaling in Processor Sharing Systems

    Energy use of computer communication systems has quickly become a vital design consideration. One effective method for reducing energy consumption is dynamic speed scaling, which adapts the processing speed to the current load. This paper studies how to optimally scale speed to balance mean response time and mean energy consumption under processor sharing scheduling. Both bounds and asymptotics for the optimal speed scaling scheme are provided. These results show that a simple scheme that halts when the system is idle and uses a static rate while the system is busy provides nearly the same performance as the optimal dynamic speed scaling. However, the results also highlight that dynamic speed scaling provides at least one key benefit: significantly improved robustness to bursty traffic and mis-estimation of workload parameters.
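    As a back-of-the-envelope illustration of the gated-static scheme (a toy M/M/1-style model with mean job size 1 and power s**alpha while busy; not the paper's full processor-sharing analysis), the per-job cost of running at a static speed s while busy and halting when idle can be scanned numerically:

        # Toy gated-static speed-scaling sketch (illustrative M/M/1-style model, not the
        # paper's analysis): cost per job = mean response time + energy per job / beta.
        def cost(s, lam, alpha=2.0, beta=1.0):
            assert s > lam, "speed must exceed the arrival rate for stability"
            mean_response = 1.0 / (s - lam)        # mean response time at service speed s
            energy_per_job = s ** (alpha - 1.0)    # busy fraction lam/s times power s**alpha, per job
            return mean_response + energy_per_job / beta

        lam = 0.5
        speeds = [lam + 0.05 * k for k in range(1, 60)]
        best = min(speeds, key=lambda s: cost(s, lam))
        print(best, cost(best, lam))               # near-optimal static speed for this toy model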

    Submicron Systems Architecture: Semiannual Technical Report

    [No abstract]