1,586 research outputs found

    OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks

    Full text link
    We present a new, deadlock-free, routing scheme for toroidal interconnection networks, called OutFlank Routing (OFR). OFR is an adaptive strategy which exploits non-minimal links, both in the source and in the destination nodes. When minimal links are congested, OFR deroutes packets to carefully chosen intermediate destinations, in order to obtain travel paths which are only an additive constant longer than the shortest ones. Since routing performance is very sensitive to changes in the traffic model or in the router parameters, an accurate discrete-event simulator of the toroidal network has been developed to empirically validate OFR, by comparing it against other relevant routing strategies, over a range of typical real-world traffic patterns. On the 16x16x16 (4096 nodes) simulated network OFR exhibits improvements of the maximum sustained throughput between 14% and 114%, with respect to Adaptive Bubble Routing.Comment: 9 pages, 5 figures, to be presented at ICPADS 201

    Master of Science

    Get PDF
    thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects

    Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems

    Full text link
    We present Swallow, a scalable many-core architecture, with a current configuration of 480 x 32-bit processors. Swallow is an open-source architecture, designed from the ground up to deliver scalable increases in usable computational power to allow experimentation with many-core applications and the operating systems that support them. Scalability is enabled by the creation of a tile-able system with a low-latency interconnect, featuring an attractive communication-to-computation ratio and the use of a distributed memory configuration. We analyse the energy and computational and communication performances of Swallow. The system provides 240GIPS with each core consuming 71--193mW, dependent on workload. Power consumption per instruction is lower than almost all systems of comparable scale. We also show how the use of a distributed operating system (nOS) allows the easy creation of scalable software to exploit Swallow's potential. Finally, we show two use case studies: modelling neurons and the overlay of shared memory on a distributed memory system.Comment: An open source release of the Swallow system design and code will follow and references to these will be added at a later dat

    A Scalable & Energy Efficient Graphene-Based Interconnection Framework for Intra and Inter-Chip Wireless Communication in Terahertz Band

    Get PDF
    Network-on-Chips (NoCs) have emerged as a communication infrastructure for the multi-core System-on-Chips (SoCs). Despite its advantages, due to the multi-hop communication over the metal interconnects, traditional Mesh based NoC architectures are not scalable in terms of performance and energy consumption. Folded architectures such as Torus and Folded Torus were proposed to improve the performance of NoCs while retaining the regular tile-based structure for ease of manufacturing. Ultra-low-latency and low-power express channels between communicating cores have also been proposed to improve the performance of conventional NoCs. However, the performance gain of these approaches is limited due to metal/dielectric based interconnection. Many emerging interconnect technologies such as 3D integration, photonic, Radio Frequency (RF), and wireless interconnects have been envisioned to alleviate the issues of a metal/dielectric interconnect system. However, photonic and RF interconnects need the additional physically overlaid optical waveguides or micro-strip transmission lines to enable data transmission across the NoC. Several on-chip antennas have shown to improve energy efficiency and bandwidth of on-chip data communications. However, the date rates of the mm-wave wireless channels are limited by the state-of-the-art power-efficient transceiver design. Recent research has brought to light novel graphene based antennas operating at THz frequencies. Due to the higher operating frequencies compared to mm-wave transceivers, the data rate that can be supported by these antennas are significantly higher. Higher operating frequencies imply that graphene based antennas are just hundred micrometers in size compared to dimensions in the range of a millimeter of mm-wave antennas. Such reduced dimensions are suitable for integration of several such transceivers in a single NoC for relatively low overheads. In this work, to exploit the benefits of a regular NoC structure in conjunction with emerging Graphene-based wireless interconnect. We propose a toroidal folding based NoC architecture. The novelty of this folding based approach is that we are using low power, high bandwidth, single hop direct point to point wireless links instead of multihop communication that happens through metallic wires. We also propose a novel phased based communication protocol through which multiple wireless links can be made active at a time without having any interference among the transceiver. This offers huge gain in terms of performance as compared to token based mechanism where only a single wireless link can be made active at a time. We also propose to extend Graphene-based wireless links to enable energy-efficient, phase-based chip-to-chip communication to create a seamless, wireless interconnection fabric for multichip systems as well. Through cycle-accurate system-level simulations, we demonstrate that such designs with torus like folding based on THz links instead of global wires along with the proposed phase based multichip systems. We provide estimates that they are able to provide significant gains (about 3 to 4 times better in terms of achievable bandwidth, packet latency and average packet energy when compared to wired system) in performance and energy efficiency in data transfer in a NoC as well as multichip system. Thus, realization of these kind of interconnection framework that could support high data rate links in Tera-bits-per-second that will alleviate the capacity limitations of current interconnection framework

    Poloidal-toroidal decomposition in a finite cylinder. II. Discretization, regularization and validation

    Full text link
    The Navier-Stokes equations in a finite cylinder are written in terms of poloidal and toroidal potentials in order to impose incompressibility. Regularity of the solutions is ensured in several ways: First, the potentials are represented using a spectral basis which is analytic at the cylindrical axis. Second, the non-physical discontinuous boundary conditions at the cylindrical corners are smoothed using a polynomial approximation to a steep exponential profile. Third, the nonlinear term is evaluated in such a way as to eliminate singularities. The resulting pseudo-spectral code is tested using exact polynomial solutions and the spectral convergence of the coefficients is demonstrated. Our solutions are shown to agree with exact polynomial solutions and with previous axisymmetric calculations of vortex breakdown and of nonaxisymmetric calculations of onset of helical spirals. Parallelization by azimuthal wavenumber is shown to be highly effective

    Partial parallelization of VMEC system

    Get PDF

    Evaluating kernels on Xeon Phi to accelerate Gysela application

    Get PDF
    This work describes the challenges presented by porting parts ofthe Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of somegeneric micro-benchmark on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela application are analyzed and the performance are shown. Some memory-bound and compute-bound kernels are accelerated by a factor 2 on the Phi device compared to Sandy architecture. Nevertheless, it is hard, if not impossible, to reach a large fraction of the peek performance on the Phi device,especially for real-life applications as Gysela. A collateral benefit of this optimization and tuning work is that the execution time of Gysela (using 4D advections) has decreased on a standard architecture such as Intel Sandy Bridge.Comment: submitted to ESAIM proceedings for CEMRACS 2014 summer school version reviewe
    • …
    corecore