pTNoC: Probabilistically time-analyzable tree-based NoC for mixed-criticality systems
The use of networks-on-chip (NoC) in real-time safety-critical multicore systems challenges deriving tight worst-case execution time (WCET) estimates. This is due to the complexities in tightly upper-bounding the contention in the access to the NoC among running tasks. Probabilistic Timing Analysis (PTA) is a powerful approach to derive WCET estimates on relatively complex processors. However, so far it has only been tested on small multicores comprising an on-chip bus as communication means, which intrinsically does not scale to high core counts. In this paper we propose pTNoC, a new tree-based NoC design compatible with PTA requirements and delivering scalability towards medium/large core counts. pTNoC provides tight WCET estimates by means of asymmetric bandwidth guarantees for mixed-criticality systems with negligible impact on average performance. Finally, our implementation results show the reduced area and power costs of the pTNoC.
The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Mladen Slijepcevic is funded by the Obra Social Fundación la Caixa under grant Doctorado “la Caixa” - Severo Ochoa. Carles Hernández is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer Reviewed. Postprint (author's final draft)
Distributed QoS Guarantees for Realtime Traffic in Ad Hoc Networks
In this paper, we propose a new cross-layer framework, named QPART (QoS Protocol for Adhoc Realtime Traffic), which provides QoS guarantees to real-time multimedia applications for wireless ad hoc networks. By adapting the contention window sizes at the MAC layer, QPART schedules packets of flows according to their unique QoS requirements. QPART implements priority-based admission control and conflict resolution to ensure that the requirements of admitted realtime flows are smaller than the network capacity. The novelty of QPART is that it is robust to mobility and variances in channel capacity and imposes no control message overhead on the network.
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
is necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to the massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems. Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
SARA: Self-Aware Resource Allocation for Heterogeneous MPSoCs
In modern heterogeneous MPSoCs, the management of shared memory resources is
crucial in delivering end-to-end QoS. Previous frameworks have either focused
on singular QoS targets or the allocation of partitionable resources among CPU
applications at relatively slow timescales. However, heterogeneous MPSoCs
typically require instant response from the memory system where most resources
cannot be partitioned. Moreover, the health of different cores in a
heterogeneous MPSoC is often measured by diverse performance objectives. In
this work, we propose a Self-Aware Resource Allocation (SARA) framework for
heterogeneous MPSoCs. Priority-based adaptation allows cores to use different
target performance and self-monitor their own intrinsic health. In response,
the system allocates non-partitionable resources based on priorities. The
proposed framework meets a diverse range of QoS demands from heterogeneous
cores. Comment: Accepted by the 55th annual Design Automation Conference 2018
(DAC'18)
Study on QoS support in 802.11e-based multi-hop vehicular wireless ad hoc networks
Multimedia communications over vehicular ad hoc networks (VANET) will play an important role in the future intelligent transport system (ITS). QoS support for VANET therefore becomes an essential problem. In this paper, we first study the QoS performance in multi-hop VANET by using the standard IEEE 802.11e EDCA MAC and our proposed triple-constraint QoS routing protocol, Delay-Reliability-Hop (DeReHQ). In particular, we evaluate the DeReHQ protocol together with EDCA in highway and urban areas. Simulation results show that end-to-end delay performance can sometimes be achieved when both 802.11e EDCA and DeReHQ extended AODV are used. However, further studies on cross-layer optimization for QoS support in multi-hop environments are required.
Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels
concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO
policy to schedule their thread blocks. We show that FIFO leaves performance to
chance, resulting in significant loss of performance and fairness. To improve
performance and fairness, we propose using the preemptive Shortest Remaining
Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime
of GPU kernels, we show that such an estimate of the runtime can be easily
obtained using online profiling and exploiting a simple observation on GPU
kernels' grid structure. Specifically, we propose a novel Structural Runtime
Predictor. Using a simple Staircase model of GPU kernel execution, we show that
the runtime of a kernel can be predicted by profiling only the first few thread
blocks. We evaluate an online predictor based on this model on benchmarks from
ERCBench, and find that it can estimate the actual runtime reasonably well
after the execution of only a single thread block. Next, we design a thread
block scheduler that is both concurrent kernel-aware and uses this predictor.
We implement the SRTF policy and evaluate it on two-program workloads from
ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared
to MPMax, a state-of-the-art resource allocation policy for concurrent kernels,
SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also
propose SRTF/Adaptive which controls resource usage of concurrently executing
kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by
2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of
SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an
oracle optimal scheduling policy), bridging 49% of the gap between FIFO and
SJF. Comment: 14 pages, full pre-review version of PACT 2014 poster
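The scheduling idea in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it assumes the whole GPU runs one kernel per time unit, it assumes a staircase estimate of the form "number of waves times per-block time" (the abstract does not give the formula), and it uses the common definitions of STP (sum of per-kernel speeds relative to running alone) and ANTT (mean slowdown). All function names and the two-kernel workload are hypothetical.

```python
from math import ceil


def staircase_runtime(total_blocks, resident_blocks, block_time):
    """Assumed staircase estimate: blocks execute in waves of `resident_blocks`,
    and each wave takes roughly `block_time` (measured on an early block)."""
    return ceil(total_blocks / resident_blocks) * block_time


def simulate(kernels, policy):
    """Toy scheduler. kernels: list of (name, arrival, runtime) with integer times.
    Returns {name: finish_time}. One kernel owns the device per time unit."""
    remaining = {name: runtime for name, _, runtime in kernels}
    arrival = {name: arr for name, arr, _ in kernels}
    finish = {}
    t = 0
    while remaining:
        ready = [n for n in remaining if arrival[n] <= t]
        if not ready:
            t += 1  # idle until the next arrival
            continue
        if policy == "srtf":
            # Preemptive: re-pick the kernel with the least remaining time each step.
            pick = min(ready, key=lambda n: (remaining[n], arrival[n]))
        else:
            # FIFO: the earliest arrival keeps the device until it completes.
            pick = min(ready, key=lambda n: arrival[n])
        remaining[pick] -= 1
        t += 1
        if remaining[pick] == 0:
            del remaining[pick]
            finish[pick] = t
    return finish


def metrics(kernels, finish):
    """Return (STP, ANTT) from per-kernel slowdown = turnaround / runtime alone."""
    slowdowns = [(finish[name] - arr) / runtime for name, arr, runtime in kernels]
    stp = sum(1.0 / s for s in slowdowns)
    antt = sum(slowdowns) / len(slowdowns)
    return stp, antt


if __name__ == "__main__":
    # Hypothetical workload: a long kernel arrives first, a short one just after.
    workload = [("long", 0, 8), ("short", 1, 2)]
    for policy in ("fifo", "srtf"):
        stp, antt = metrics(workload, simulate(workload, policy))
        print(f"{policy}: STP={stp:.3f} ANTT={antt:.3f}")
```

In this toy workload SRTF lets the short kernel preempt the long one, which lowers ANTT (lower is better) and raises STP (higher is better) relative to FIFO. In a real scheduler the runtimes would not be known in advance and would come from a predictor such as the staircase estimate.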