9,799 research outputs found
Modeling high-performance wormhole NoCs for critical real-time embedded systems
Manycore chips are a promising computing platform to cope with the increasing performance needs of critical real-time embedded systems (CRTES). However, the adoption of manycores by the CRTES industry requires understanding tasks' timing behavior when their requests use the manycore's network-on-chip (NoC) to access hardware shared resources. This paper analyzes contention in wormhole-based NoC (wNoC) designs - widely implemented in the high-performance domain - for which we introduce a new metric: worst-contention delay (WCD), which captures the wNoC's impact on worst-case execution time (WCET) more tightly than the existing metric, worst-case traversal time (WCTT). Moreover, we provide an analytical model of the WCD that requests can suffer in a wNoC, and we validate it against wNoC designs resembling those in the Tilera-Gx36 and the Intel-SCC 48-core processors. Building on top of our WCD analytical model, we analyze the impact that different design parameters, such as the number of virtual channels, have on WCD, and we make a set of recommendations on which wNoC setups to use in the context of CRTES.
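As a rough intuition for the kind of accounting such an analysis performs, the sketch below sums a coarse per-hop blocking bound along a packet's route. This is a toy illustration only, not the paper's WCD model; the route length, interfering-flow counts, and packet service time are made-up inputs.

```python
# Toy upper bound on contention delay along a wormhole route.
# NOT the paper's WCD model: it coarsely assumes that at each hop
# every competing flow can block us for one full packet service time.

def toy_contention_bound(route_hops, competing_flows, packet_service_time):
    """route_hops: number of links traversed.
    competing_flows: list with the number of interfering flows per hop.
    packet_service_time: worst-case time (cycles) to forward one packet."""
    assert len(competing_flows) == route_hops
    return sum(n * packet_service_time for n in competing_flows)

# Example: a 4-hop route with 2/1/3/1 interfering flows and 20-cycle packets.
print(toy_contention_bound(4, [2, 1, 3, 1], 20))  # -> 140 cycles
```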
Network on Chip: a New Approach of QoS Metric Modeling Based on Calculus Theory
A NoC is composed of IP cores (Intellectual Property) and switches connected
among themselves by communication channels. Communication is accomplished by
the exchange of data among IP cores. Often, the structure of particular
messages is not adequate for communication purposes, which leads to the
concept of packet switching. In the context of NoCs, packets are composed of
a header, a payload, and a trailer, and are divided into small pieces called
flits. To meet the required performance, NoC hardware resources must be
specified at an early step of the system design, with particular attention to
the choice of network parameters such as the physical buffer size in each
node. End-to-End Delay (EED) and packet loss are among the critical QoS
metrics: some real-time and multimedia applications place bounds on these
parameters and require specific hardware resources and particular management
approaches in the NoC switch. A traffic contract (SLA, Service Level
Agreement) specifies the ability of a network or protocol to give guaranteed
performance, throughput, or latency bounds based on mutually agreed measures,
usually by prioritizing traffic. A defined Quality of Service (QoS) may be
required for some types of real-time network traffic or multimedia
applications. The main goal of this paper is to define a QoS metric using the
Network on Chip modeling architecture, focusing on the network delay bound
and packet losses. The approach is based on Network Calculus theory, a
mathematical model of the behavior of data flows between IPs interconnected
over the NoC. We propose a QoS metric based on QoS-parameter prioritization
factors for multi-application services using the calculus model.
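To make the network calculus idea concrete, the sketch below computes the classical delay bound for a flow with a token-bucket arrival curve alpha(t) = b + r*t crossing a rate-latency server beta(t) = R*(t - T)+, for which the bound is T + b/R when r <= R. This is the textbook result, not the specific metric the paper derives; the numbers are illustrative.

```python
# Classical network-calculus delay bound: token-bucket arrival curve
# alpha(t) = b + r*t served by a rate-latency curve beta(t) = R*(t - T)+.
# For r <= R the worst-case delay is the horizontal deviation T + b/R.

def delay_bound(b, r, R, T):
    """b: burst (flits), r: sustained rate (flits/cycle),
    R: guaranteed service rate (flits/cycle), T: service latency (cycles)."""
    if r > R:
        raise ValueError("unstable: arrival rate exceeds service rate")
    return T + b / R

# Example: 16-flit bursts at 0.2 flits/cycle through a switch that
# guarantees 0.5 flits/cycle after a 10-cycle latency.
print(delay_bound(b=16, r=0.2, R=0.5, T=10))  # -> 42.0 cycles
```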
An enhanced worst-case end-to-end evaluation method for SpaceWire networks
The SpaceWire network is scheduled to be used as the sole on-board network for future ESA satellites. However, at the moment, network designers do not have tools to ensure that critical temporal deadlines are met when using best-effort wormhole networks like SpaceWire. In a previous paper, we presented a first method to compute an upper bound on the worst-case end-to-end delay of flows traversing such networks. However, its scope was limited by restrictive assumptions on the traffic patterns. Thus, in this paper, we propose a new network model that removes those limitations and allows worst-case delay analysis on SpaceWire networks with any traffic pattern.
High-Integrity Performance Monitoring Units in Automotive Chips for Reliable Timing V&V
As software continues to control more system-critical functions in cars, its timing is becoming an integral element in functional safety. Timing validation and verification (V&V) assesses software's end-to-end timing measurements against given budgets. The advent of multicore processors with massive resource sharing reduces the significance of end-to-end execution times for timing V&V and requires reasoning on (worst-case) access delays on contention-prone hardware resources. While Performance Monitoring Units (PMUs) support this finer-grained reasoning, their design has never been a prime consideration in high-performance processors - from which automotive chips' PMU implementations descend - since the PMU does not directly affect performance or reliability. To meet the PMU's instrumental importance for timing V&V, we advocate for PMUs in automotive chips that explicitly track activities related to worst-case (rather than average) software behavior, are recognized as an ISO 26262 mandatory high-integrity hardware service, and are accompanied by detailed documentation that enables their effective use to derive reliable timing estimates.
This work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporación postdoctoral fellowship number IJCI-2016-27396.
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to
industries and governments. Steady improvements in computer hardware have been
supported by periodic doubling of transistor densities in integrated circuits
over the last fifty years. Such Moore scaling now requires increasingly heroic
efforts, stimulating research in alternative hardware and stirring controversy.
To help evaluate emerging technologies and enrich our understanding of
integrated-circuit scaling, we review fundamental limits to computation: in
manufacturing, energy, physical space, design and verification effort, and
algorithms. To outline what is achievable in principle and in practice, we
recall how some limits were circumvented, and compare loose and tight limits. We
also point out that engineering difficulties encountered by emerging
technologies may indicate yet-unknown limits.
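As one concrete instance of the energy limits such a survey covers, Landauer's principle puts a floor of kT*ln(2) on the energy dissipated per irreversibly erased bit. The snippet below evaluates that floor at room temperature; treating it as representative of the paper's energy discussion is our assumption.

```python
# Landauer's principle: erasing one bit dissipates at least k*T*ln(2).
import math

BOLTZMANN = 1.380649e-23  # J/K (exact, 2019 SI definition)

def landauer_limit(temperature_kelvin):
    """Minimum energy (joules) to erase one bit at the given temperature."""
    return BOLTZMANN * temperature_kelvin * math.log(2)

# At room temperature (300 K) the floor is about 2.87e-21 J per bit,
# orders of magnitude below what today's switching devices dissipate.
print(landauer_limit(300.0))
```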
Performance evaluation of non-prefiltering vs. time reversal prefiltering in distributed and uncoordinated IR-UWB ad-hoc networks
Time Reversal (TR) is a prefiltering scheme mostly analyzed in the context of centralized and synchronous IR-UWB networks, in order to leverage the trade-off between communication performance and device complexity, in particular in the presence of multiuser interference. Several strong assumptions have typically been adopted in the analysis of TR, such as the absence of Inter-Symbol / Inter-Frame Interference (ISI/IFI) and of multipath dispersion due to complex signal propagation. The main goal of this work is to compare the performance of TR-based systems with traditional non-prefiltered schemes, in the novel context of a distributed and uncoordinated IR-UWB network, under more realistic assumptions including the presence of ISI/IFI and multipath dispersion. Results show that lack of power control and imperfect channel knowledge affect the performance of both non-prefiltered and TR systems; in these conditions, TR prefiltering still guarantees a performance improvement in sparse/low-loaded and overloaded network scenarios, while the opposite is true for less extreme scenarios, calling for the development of an adaptive scheme that enables/disables TR prefiltering depending on network conditions.
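For intuition on what TR prefiltering does, the sketch below builds the standard time-reversal prefilter - the time-reversed (conjugated) channel impulse response - and shows how it concentrates received energy into one dominant tap. It illustrates the generic TR principle under an assumed toy channel, not the paper's IR-UWB simulation setup.

```python
# Time-reversal (TR) prefiltering: transmit the time-reversed conjugate
# of the channel impulse response so the channel itself acts as a
# matched filter, focusing energy into one dominant tap at the receiver.
import numpy as np

rng = np.random.default_rng(0)

# Toy multipath channel: 20 taps with exponentially decaying power.
h = rng.normal(size=20) * np.exp(-0.2 * np.arange(20))

# TR prefilter: time-reversed conjugate, normalized to unit energy.
g = np.conj(h[::-1]) / np.linalg.norm(h)

# Effective channel seen by the receiver = prefilter convolved with channel.
effective = np.convolve(g, h)

peak = np.max(np.abs(effective))
sidelobes = np.sum(np.abs(effective)) - peak
print(f"peak tap {peak:.3f} vs. summed sidelobes {sidelobes:.3f}")
```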
Slow Learners are Fast
Online learning algorithms have impressive convergence properties when it
comes to risk minimization and convex games on very large problems. However,
they are inherently sequential in their design, which prevents them from taking
advantage of modern multi-core architectures. In this paper we prove that
online learning with delayed updates converges well, thereby facilitating
parallel online learning.
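A minimal sketch of the idea, assuming plain SGD on a convex quadratic: each update applies a gradient computed several steps earlier, mimicking workers whose gradients arrive late. The convergence claim for delayed updates is the paper's; the objective, step size, and delay here are illustrative.

```python
# Delayed-update SGD on a 1-D quadratic f(w) = 0.5 * (w - 3)**2.
# The gradient applied at step t was computed `delay` steps earlier,
# as if it arrived late from a parallel worker.
from collections import deque

def delayed_sgd(steps=200, lr=0.05, delay=5, w0=0.0):
    grad = lambda w: w - 3.0               # gradient of the quadratic
    w = w0
    in_flight = deque([grad(w0)] * delay)  # gradients still "in transit"
    for _ in range(steps):
        in_flight.append(grad(w))          # launch a gradient on the current w
        w -= lr * in_flight.popleft()      # apply the one from `delay` steps ago
    return w

print(delayed_sgd())  # converges close to the optimum w* = 3.0
```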
Power-Aware Speed Scaling in Processor Sharing Systems
Energy use of computer communication systems has quickly become a vital design consideration. One effective method for reducing energy consumption is dynamic speed scaling, which adapts the processing speed to the current load. This paper studies how to optimally scale speed to balance mean response time and mean energy consumption under processor sharing scheduling. Both bounds and asymptotics for the optimal speed scaling scheme are provided. These results show that a simple scheme that halts when the system is idle and uses a static rate while the system is busy provides nearly the same performance as the optimal dynamic speed scaling. However, the results also highlight that dynamic speed scaling provides at least one key benefit - significantly improved robustness to bursty traffic and mis-estimation of workload parameters
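The gist of the static scheme can be reproduced in a few lines: with unit-mean jobs arriving at rate lam, an M/M/1-PS server at constant speed s has mean response time 1/(s - lam), and a server that sleeps when idle spends s**(alpha - 1) energy per job when power grows as s**alpha. Minimizing the weighted sum over s gives the static rate. The cost form and numbers are our simplified assumptions, not the paper's exact model.

```python
# Pick the best static speed for an M/M/1-PS server that sleeps when
# idle: cost(s) = mean response time + beta * energy per job, with
# power s**alpha and unit-mean jobs arriving at rate lam (needs s > lam).

def static_speed_cost(s, lam=1.0, alpha=2.0, beta=1.0):
    response = 1.0 / (s - lam)         # M/M/1 mean response time
    energy = beta * s ** (alpha - 1)   # energy spent per job while serving
    return response + energy

def best_static_speed(lam=1.0, alpha=2.0, beta=1.0):
    # Coarse scan; the cost is convex in s, so this is good enough here.
    candidates = [lam + 0.01 * k for k in range(1, 1000)]
    return min(candidates, key=lambda s: static_speed_cost(s, lam, alpha, beta))

s_star = best_static_speed()
print(f"static speed {s_star:.2f}, cost {static_speed_cost(s_star):.3f}")
# With alpha = 2, beta = 1, lam = 1 the optimum is s = 2 with cost 3.
```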
- …