Search CORE

928 research outputs found

Static virtual channel allocation in oblivious routing

Author: Cho Myong Hyon
Devadas Srinivas
Kinsy Michel A.
Lis Mieszko
Shim Keun Sup
Wen Tina
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Most virtual channel routers have multiple virtual channels to mitigate the effects of head-of-line blocking. When there are more flows than virtual channels at a link, packets or flows must compete for channels, either in a dynamic way at each link or by static assignment computed before transmission starts. In this paper, we present methods that statically allocate channels to flows at each link when oblivious routing is used, and ensure deadlock freedom for arbitrary minimal routes when two or more virtual channels are available. We then experimentally explore the performance trade-offs of static and dynamic virtual channel allocation for various oblivious routing methods, including DOR, ROMM, Valiant and a novel bandwidth-sensitive oblivious routing scheme (BSORM). Through judicious separation of flows, static allocation schemes often exceed the performance of dynamic allocation schemes

DSpace@MIT

Crossref

Application-Aware Deadlock-Free Oblivious Routing

Author: Kinsy Michel A.
Cho Myong Hyon
Wen Tina
Suh Edward
Van Dijk Marten
Devadas Srinivas
Publication venue: Association for Computing Machinery
Publication date: 01/01/2001
Field of study

Conventional oblivious routing algorithms are either not application-aware or assume that each flow has its own private channel to ensure deadlock avoidance. We present a framework for application-aware routing that assures deadlock-freedom under one or more channels by forcing routes to conform to an acyclic channel dependence graph. Arbitrary minimal routes can be made deadlock-free through appropriate static channel allocation when two or more channels are available. Given bandwidth estimates for flows, we present a mixed integer-linear programming (MILP) approach and a heuristic approach for producing deadlock-free routes that minimize maximum channel load. The heuristic algorithm is calibrated using the MILP algorithm and evaluated on a number of benchmarks through detailed network simulation. Our framework can be used to produce application-aware routes that target the minimization of latency, number of flows through a link, bandwidth, or any combination thereof

Trinity University

DSpace@MIT

Crossref

Macquarie University ResearchOnline

Application-Aware Deadlock-Free Oblivious Routing

Author: Cho M. H.
Cormen Thomas H.
Craig
Dally William J.
Edward Suh
Galles Mike
Gross Thomas
Hu J.
Kavaldjiev N. K.
Kleinberg Jon Michael
Li-Shiuan
Marten van Dijk
Michel A. Kinsy
Murali Srinivasan
Myong Hyon Cho
Peh Li-Shiuan
Srinivas Devadas
Tina Wen
William
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009
Field of study

CiteSeerX

DSpace@MIT

Crossref

Guaranteed in-order packet delivery using Exclusive Dynamic Virtual Channel Allocation

Author: Cho Myong Hyon
Devadas Srinivas
Lis Mieszko
Shim Keun Sup
Publication venue
Publication date: 18/08/2009
Field of study

In-order packet delivery, a critical abstraction for many higher-level protocols, can severely limit the performance potential in low-latency networks (common, for example, in network-on-chip designs with many cores). While basic variants of dimension-order routing guarantee in-order delivery, improving performance by adding multiple dynamically allocated virtual channels or using other routing schemes compromises this guarantee. Although this can be addressed by reordering out-of-order packets at the destination core, such schemes incur significant overheads, and, in the worst case, raise the specter of deadlock or require expensive retransmission. We present Exclusive Dynamic VCA, an oblivious virtual channel allocation scheme which combines the performance advantages of dynamic virtual allocation with in-network, deadlock-free in-order delivery. At the same time, our scheme reduces head-of-line blocking, often significantly improving throughput compared to equivalent baseline (out-of-order) dimension-order routing when multiple virtual channels are used, and so may be desirable even when in-order delivery is not required. Implementation requires only minor, inexpensive changes to traditional oblivious dimension-order router architectures, more than offset by the removal of packet reorder buffers and logic

DSpace@MIT

Scalable, accurate multicore simulation in the 1000-core era

Author: Cho Myong Hyon
Devadas Srinivas
Fletcher Christopher Wardlaw
Khan Omer
Lis Mieszko
Ren Pengju
Shim Keun Sup
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2011
Field of study

We present HORNET, a parallel, highly configurable, cycle-level multicore simulator based on an ingress-queued worm-hole router NoC architecture. The parallel simulation engine offers cycle-accurate as well as periodic synchronization; while preserving functional accuracy, this permits tradeoffs between perfect timing accuracy and high speed with very good accuracy. When run on 6 separate physical cores on a single die, speedups can exceed a factor of over 5, and when run on a two-die 12-core system with 2-way hyperthreading, speedups exceed 11 ×. Most hardware parameters are configurable, including memory hierarchy, interconnect geometry, bandwidth, crossbar dimensions, and parameters driving power and thermal effects. A highly parametrized table-based NoC design allows a variety of routing and virtual channel allocation algorithms out of the box, ranging from simple DOR routing to complex Valiant, ROMM, or PROM schemes, BSOR, and adaptive routing. HORNET can run in network-only mode using synthetic traffic or traces, directly emulate a MIPS-based multicore, or function as the memory subsystem for native applications executed under the Pin instrumentation tool. HORNET is freely available under the open-source MIT license at http://csg.csail.mit.edu/hornet/

DSpace@MIT

Crossref

Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System

Author: Kinsy Michel
Pellauer Michael
Publication venue
Publication date: 08/12/2010
Field of study

Heracles is an open-source complete multicore system written in Verilog. It is fully parameterized and can be reconfigured and synthesized into different topologies and sizes. Each processing node has a 7-stage pipeline, fully bypassed, microprocessor running the MIPS-III ISA, a 4-stage input-buffer, virtual-channel router, and a local variable-size shared memory. Our design is highly modular with clear interfaces between the core, the memory hierarchy, and the on-chip network. In the baseline design, the microprocessor is attached to two caches, one instruction cache and one data cache, which are oblivious to the global memory organization. The memory system in Heracles can be configured as one single global shared memory (SM), or distributed shared memory (DSM), or any combination thereof. Each core is connected to the rest of the network of processors by a parameterized, realistic, wormhole router. We show different topology configurations of the system, and their synthesis results on the Xilinx Virtex-5 LX330T FPGA board. We also provide a small MIPS cross-compiler toolchain to assist in developing software for Heracles

DSpace@MIT

Bandwidth-sensitive oblivious routing

Author: Wen Tina
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2009
Field of study

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Includes bibliographical references (p. 81-83).Traditional oblivious routing algorithms either do not take into account the bandwidth demand, or assume that each flow has its own private channel to guarantee deadlock freedom. Though adaptive routing schemes can react to varying network traffic, they require complicated router designs. In this thesis, we present a polynomial-time heuristic routing algorithm that takes bandwidth requirements of each flow into account to minimize maximum channel load. The heuristic algorithm has two variants. The first one produces a deadlock-free route. The second one produces a minimal route, and is deadlock-free with two or more virtual channels assuming proper VC allocation. Both routing algorithms are oblivious, and need only simple router designs. The performance of each bandwidth-sensitive routing algorithm is evaluated against dimension-order routing and against the other on a number of benchmarks.by Tina Wen.M.Eng

DSpace@MIT

Optimal and Heuristic Application-Aware Oblivious Routing

Author: G. Edward Suh
Keun Sup Shim
Michel A. Kinsy
Mieszko Lis
Myong Hyon Cho
Srinivas Devadas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref