Improving performance guarantees in wormhole mesh NoC designs

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Hernández Carles
Panic Milos
Quiñones Eduardo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Wormhole-based mesh Networks-on-Chip (wNoCs) are deployed in high-performance many-core processors because of their physical scalability and low cost. Their distributed nature, however, makes it hard to deliver the tight, time-composable Worst-Case Execution Time (WCET) estimates that applications in safety-critical real-time embedded systems need. We propose a bandwidth control mechanism for wNoCs that enables the computation of tight, time-composable WCET estimates with low average performance degradation and high scalability. Our evaluation with the EEMBC automotive suite and an industrial real-time parallel avionics application confirms this. The research leading to these results is funded by the European Union Seventh Framework Programme under grant agreement no. 287519 (parMERASA) and by the Ministry of Science and Technology of Spain under contract TIN2012-34557. Milos Panic is funded by the Spanish Ministry of Education under the FPU grant FPU12/05966. Carles Hernández is jointly funded by the Spanish Ministry of Economy and Competitiveness and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella is partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Peer Reviewed. Postprint (author's final draft)

Crossref

UPCommons. Portal del coneixement obert de la UPC

Packet Transactions: High-level Programming for Line-Rate Switches

Author: Alizadeh Mohammad
Balakrishnan Hari
Budiu Mihai
Cheung Alvin
Kim Changhoon
Licking Steve
McKeown Nick
Sivaraman Anirudh
Varghese George
Publication date: 29/01/2016

Many algorithms for congestion control, scheduling, network measurement, active queue management, security, and load balancing require custom processing of packets as they traverse the data plane of a network switch. To run at line rate, these data-plane algorithms must be implemented in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built. This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chipsets. The key challenge is that these algorithms create and modify algorithmic state. The key idea for achieving line-rate programmability for stateful algorithms is the notion of a packet transaction: a sequential code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language for expressing data-plane algorithms. We show with many examples that Domino provides a convenient and natural way to express sophisticated data-plane algorithms, and that these algorithms can be run at line rate with modest estimated die-area overhead. Comment: 16 pages
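
As a rough illustration of the packet-transaction idea described in the abstract above, the Python sketch below models a transaction as a function that reads and updates switch state for one packet at a time, so no other packet can observe intermediate state. This is not Domino syntax and is not taken from the paper; the names `SwitchState`, `count_and_mark`, `run_pipeline`, and the threshold value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SwitchState:
    """Hypothetical per-switch state touched by the transaction."""
    packets_per_flow: Dict[int, int] = field(default_factory=dict)


def count_and_mark(state: SwitchState, packet: dict, threshold: int = 3) -> dict:
    """One packet transaction: read, modify and write state for a single packet.

    The body runs to completion before the next packet is processed, which is
    how this sketch models atomicity and isolation.
    """
    flow = packet["flow_id"]
    seen = state.packets_per_flow.get(flow, 0) + 1
    state.packets_per_flow[flow] = seen
    # Mark packets once their flow exceeds the (assumed) threshold.
    return {**packet, "marked": seen > threshold}


def run_pipeline(state: SwitchState, packets: List[dict]) -> List[dict]:
    """Apply the transaction to packets strictly one after another."""
    return [count_and_mark(state, p) for p in packets]


state = SwitchState()
packets = [{"flow_id": 7} for _ in range(5)] + [{"flow_id": 9}]
result = run_pipeline(state, packets)
```

In software the sequential loop makes atomicity trivial; the difficulty the abstract points to is keeping the same semantics when the code is compiled to a hardware pipeline that must process packets at line rate.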

arXiv.org e-Print Archive

DSpace@MIT

A Scalable and Adaptive Network on Chip for Many-Core Architectures

Author: Heißwolf Jan
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault tolerance concept makes it possible to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs are introduced.

KITopen

Worst-case end-to-end delays evaluation for SpaceWire networks

Author: Ferrandiz Thomas
Fraboul Christian
Frances Fabrice
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2011

SpaceWire is a standard for on-board satellite networks chosen by the ESA as the basis for multiplexing payload and control traffic on future data-handling architectures. However, network designers need tools to ensure that the network is able to deliver critical messages on time. Current research fails to address this need for SpaceWire networks. On the one hand, many papers only seek to determine probabilistic results for end-to-end delays on wormhole networks like SpaceWire, which does not provide sufficient guarantees for critical traffic. On the other hand, a few papers give methods to determine maximum latencies on wormhole networks that, unlike SpaceWire, have dedicated real-time mechanisms built in. Thus, in this paper, we propose an appropriate method to compute an upper bound on the worst-case end-to-end delay of a packet in a SpaceWire network.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Just Queuing: Policy-Based Scheduling Mechanism for Packet Switching Networks

Author: Miaji Yaser Shamsulhak A
Publication date: 01/01/2011

The pervasiveness of the Internet and its applications is leading to a potential increase in users' demand for more services at economical prices. The diversity of Internet traffic requires some classification and prioritisation, since some traffic deserves more attention, with less delay and loss, than other traffic. Current scheduling mechanisms face a trade-off among three major properties: fairness, complexity and protection. The question therefore remains how to improve fairness and protection with a less complex implementation. This research aims to enhance the scheduling mechanism by sustaining the fairness and protection properties while keeping the implementation simple, and hence to deliver higher service quality, particularly for real-time applications. Extra elements are applied to the main fairness equation to improve the fairness property. This research adopts the restricted charge policy, which enforces the protection of normal users. In terms of the complexity property, a genetic algorithm has the advantage of holding the fitness score of the queue in a separate storage space, which potentially minimises the complexity of the algorithm. The integration of conceptual, analytical and experimental approaches verifies the efficiency of the proposed mechanism. The proposed mechanism is validated using emulation, and the validation experiments involve real router flow data. The results of the evaluation showed fair bandwidth distribution similar to that of the popular Weighted Fair Queuing (WFQ) mechanism. Furthermore, the results exhibited better protection than WFQ and two other scheduling mechanisms. The complexity of the proposed mechanism reached O(log(n)), which is considered potentially low. This mechanism is limited to wired networks, so future work could adapt it for mobile ad-hoc networks or other wireless networks.
Moreover, further improvements could be applied to the proposed mechanism to enhance its deployment in virtual-circuit switching networks such as asynchronous transfer mode networks.

Universiti Utara Malaysia: UUM eTheses

On packet switch design

Author: Minkenberg C.J.A.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2001

Repository TU/e

Pure OAI Repository

Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming

Author: Caragea George Constantin
Publication date: 01/01/2011

Faced with nearly stagnant clock speed advances, chip manufacturers have turned to parallelism as the source for continuing performance improvements. But even though numerous parallel architectures have already been brought to market, a universally accepted methodology for programming them for general purpose applications has yet to emerge. Existing solutions tend to be hardware-specific, rendering them difficult to use for the majority of application programmers and domain experts, and not providing scalability guarantees for future generations of the hardware. This dissertation advances the validation of the following thesis: it is possible to develop efficient general-purpose programs for a many-core platform using a model recognized for its simplicity. To prove this thesis, we refer to the eXplicit Multi-Threading (XMT) architecture designed and built at the University of Maryland. XMT is an attempt at re-inventing parallel computing with a solid theoretical foundation and an aggressive scalable design. Algorithmically, XMT is inspired by the PRAM (Parallel Random Access Machine) model and the architecture design is focused on reducing inter-task communication and synchronization overheads and providing an easy-to-program parallel model. This thesis builds upon the existing XMT infrastructure to improve support for efficient execution with a focus on ease-of-programming. Our contributions aim at reducing the programmer's effort in developing XMT applications and improving the overall performance. 
More concretely, we: (1) present a work-flow guiding programmers to produce efficient parallel solutions starting from a high-level problem; (2) introduce an analytical performance model for XMT programs and provide a methodology to project running time from an implementation; (3) propose and evaluate RAP -- an improved resource-aware compiler loop prefetching algorithm targeted at fine-grained many-core architectures; we demonstrate performance improvements of up to 34.79% on average over the GCC loop prefetching implementation and up to 24.61% on average over a simple hardware prefetching scheme; and (4) implement a number of parallel benchmarks and evaluate the overall performance of XMT relative to existing serial and parallel solutions, showing speedups of up to 13.89x vs. a serial processor and 8.10x vs. parallel code optimized for an existing many-core (GPU). We also discuss the implementation and optimization of the Max-Flow algorithm on XMT, a problem which is among the more advanced in terms of complexity, benchmarking and research interest in the parallel algorithms community. We demonstrate better speed-ups compared to a best serial solution than previous attempts on other parallel platforms.

CiteSeerX

Digital Repository at the University of Maryland


Scheduling and Fluid Routing for Flow-Based Microfluidic Laboratories-on-a-Chip

Author: Brisk Philip
Madsen Jan
McDaniel Jeffrey
Minhass Wajid Hassan
Pop Paul
Raagaard Michael Lander
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Microfluidic laboratories-on-a-chip (LoCs) are replacing conventional biochemical analyzers and can integrate the functions needed for biochemical analysis on-chip. There are several types of LoCs, each with its own advantages and limitations. In this paper we are interested in flow-based LoCs, in which a continuous flow of liquid is manipulated using integrated microvalves. By combining several microvalves, more complex units, such as micropumps, switches, mixers, and multiplexers, can be built. We assume that the architecture of the LoC is given, and we are interested in synthesizing an implementation, consisting of the binding of operations in the application to the functional units of the architecture, the scheduling of operations, and the routing and scheduling of the fluid flows, such that the application completion time is minimized. To solve this problem, we propose a list scheduling-based application mapping (LSAM) framework and evaluate it using real-life as well as synthetic benchmarks. When biochemical applications contain fluids that may adsorb on the substrate on which they are transported, the solution is to use rinsing operations for contamination avoidance. Hence, we also propose a rinsing heuristic, which has been integrated in the LSAM framework.
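
To make the binding and scheduling steps mentioned in the abstract above more concrete, here is a minimal Python sketch of generic greedy list scheduling. It is not the LSAM framework: it ignores fluid routing and rinsing entirely, and the priority rule (longest operation first), the function name `list_schedule`, and the example operations and units are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def list_schedule(
    durations: Dict[str, int],
    deps: Dict[str, List[str]],
    op_type: Dict[str, str],
    units: Dict[str, str],
) -> Dict[str, Tuple[str, int, int]]:
    """Return {operation: (unit, start, finish)} using greedy list scheduling."""
    unit_free = {u: 0 for u in units}  # time at which each unit becomes idle
    schedule: Dict[str, Tuple[str, int, int]] = {}
    remaining = set(durations)
    while remaining:
        # Ready operations: every predecessor has already been scheduled.
        ready = [op for op in remaining
                 if all(d in schedule for d in deps.get(op, []))]
        if not ready:
            raise ValueError("dependency cycle in the application graph")
        # Assumed priority: longest operation first, ties broken by name.
        ready.sort(key=lambda op: (-durations[op], op))
        op = ready[0]
        # Earliest time at which all inputs of the operation are available.
        data_ready = max((schedule[d][2] for d in deps.get(op, [])), default=0)
        # Bind to the compatible unit that allows the earliest start.
        candidates = [u for u, kind in units.items() if kind == op_type[op]]
        if not candidates:
            raise ValueError(f"no unit of type {op_type[op]!r} for {op!r}")
        unit = min(candidates, key=lambda u: (max(unit_free[u], data_ready), u))
        start = max(unit_free[unit], data_ready)
        finish = start + durations[op]
        schedule[op] = (unit, start, finish)
        unit_free[unit] = finish
        remaining.remove(op)
    return schedule


# Hypothetical application: two mixes feed a heating step, then detection.
durations = {"mix1": 4, "mix2": 3, "heat": 2, "detect": 1}
deps = {"heat": ["mix1", "mix2"], "detect": ["heat"]}
op_type = {"mix1": "mixer", "mix2": "mixer", "heat": "heater", "detect": "detector"}
units = {"M1": "mixer", "H1": "heater", "D1": "detector"}

schedule = list_schedule(durations, deps, op_type, units)
completion_time = max(finish for _, _, finish in schedule.values())
```

With a single mixer the two mix operations are serialized, so the completion time here is driven by unit contention rather than by the dependency chain alone; adding a second mixer to `units` would let them run in parallel.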

eScholarship - University of California

Online Research Database In Technology