201 research outputs found

    SMART: A Single-Cycle Reconfigurable NoC for SoC Applications

    Get PDF
    As technology scales, SoCs are increasing in core counts, leading to the need for scalable NoCs to interconnect the multiple cores on the chip. Given aggressive SoC design targets, NoCs have to deliver low latency, high bandwidth, at low power and area overheads. In this paper, we propose Single-cycle Multi-hop Asynchronous Repeated Traversal (SMART) NoC, a NoC that reconfigures and tailors a generic mesh topology for SoC applications at runtime. The heart of our SMART NoC is a novel low-swing clockless repeated link circuit embedded within the router crossbars, that allows packets to potentially bypass all the way from source to destination core within a single clock cycle, without being latched at any intermediate router. Our clockless repeater link has been proven in silicon in 45nm SOI. Results show that at 2GHz, we can traverse 8mm within a single cycle, i.e. 8 hops with 1mm cores. We implement the SMART NoC to layout and show that SMART NoC gives 60% latency savings, and 2.2X power savings compared to a baseline mesh NoC.United States. Defense Advanced Research Projects Agency. The Ubiquitous High-Performance Computing Progra

    OrthoNoC: a broadcast-oriented dual-plane wireless network-on-chip architecture

    Get PDF
    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other worksOn-chip communication remains as a key research issue at the gates of the manycore era. In response to this, novel interconnect technologies have opened the door to new Network-on-Chip (NoC) solutions towards greater scalability and architectural flexibility. Particularly, wireless on-chip communication has garnered considerable attention due to its inherent broadcast capabilities, low latency, and system-level simplicity. This work presents ORTHONOC, a wired-wireless architecture that differs from existing proposals in that both network planes are decoupled and driven by traffic steering policies enforced at the network interfaces. With these and other design decisions, ORTHONOC seeks to emphasize the ordered broadcast advantage offered by the wireless technology. The performance and cost of ORTHONOC are first explored using synthetic traffic, showing substantial improvements with respect to other wired-wireless designs with a similar number of antennas. Then, the applicability of ORTHONOC in the multiprocessor scenario is demonstrated through the evaluation of a simple architecture that implements fast synchronization via ordered broadcast transmissions. Simulations reveal significant execution time speedups and communication energy savings for 64-threaded benchmarks, proving that the value of ORTHONOC goes beyond simply improving the performance of the on-chip interconnect.Peer ReviewedPostprint (author's final draft

    Addressing Manufacturing Challenges in NoC-based ULSI Designs

    Full text link
    Hernández Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1669

    MorphIC: A 65-nm 738k-Synapse/mm2^2 Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning

    Full text link
    Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86mm2^2 in 65nm CMOS, achieving a high density of 738k synapses/mm2^2. MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while having no penalty in the energy-accuracy tradeoff.Comment: This document is the paper as accepted for publication in the IEEE Transactions on Biomedical Circuits and Systems journal (2019), the fully-edited paper is available at https://ieeexplore.ieee.org/document/876400

    Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

    Get PDF
    Locality has always been a critical factor in on-chip data placement on CMPs as accessing further-away caches has in the past been more costly than accessing nearby ones. Substantial research on locality-aware designs have thus focused on keeping a copy of the data private. However, this complicatesthe problem of data tracking and search/invalidation; tracking the state of a line at all on-chip caches at a directory or performing full-chip broadcasts are both non-scalable and extremely expensive solutions. In this paper, we make the case for Locality-Oblivious Cache Organization (LOCO), a CMP cache organization that leverages the on-chip network to create virtual single-cycle paths between distant caches, thus redefining the notion of locality. LOCO is a clustered cache organization, supporting both homogeneous and heterogeneous cluster sizes, and provides near single-cycle accesses to data anywhere within the cluster, just like a private cache. Globally, LOCO dynamically creates a virtual mesh connecting all the clusters, and performs an efficient global data search and migration over this virtual mesh, without having to resort to full-chip broadcasts or perform expensive directory lookups. Trace-driven and full system simulations running SPLASH-2 and PARSEC benchmarks show that LOCO improves application run time by up to 44.5% over baseline private and shared cache.Semiconductor Research CorporationUnited States. Defense Advanced Research Projects Agency (Semiconductor Technology Advanced Research Network

    TechNews digests: Jan - Nov 2008

    Get PDF
    TechNews is a technology, news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends and issues. TechNews focuses on emerging technologies and other technology news. TechNews service : digests september 2004 till May 2010 Analysis pieces and News combined publish every 2 to 3 month

    Networks on Chips: From Research to Products

    Get PDF
    Research on Networks on Chips (NoCs) has spanned over a decade and its results are now visible in some products. Thus the seminal idea of using networking technology to address the chip-level interconnect problem has been shown to be correct. Moreover, as technology scales down in geometry and chips scale up in complexity, NoCs become the essential element to achieve the desired levels of performance and quality of service while curbing power consumption levels. Design and timing closure can only be achieved by a sophisticated set of tools that address NoC synthesis, optimization and validation

    Software-based and regionally-oriented traffic management in Networks-on-Chip

    Get PDF
    Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steady growing and workload applications have been adapted to exploit the increasing parallelism. This changed the importance of efficient on-chip communication significantly and the infrastructure has to keep step with these new requirements. The work at hand makes significant contributions to the state-of-the-art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures

    Design space exploration of photonic interconnects

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 109-113).As processors scale deep into the multi-core and many-core regimes, bandwidth and energy-efficiency of the on-die interconnect network have become paramount design issues. Recognizing potential limits of electrical interconnects, emerging nanophotonic integration has been recently proposed as a potential technology option for both on-chip and chip-to-chip applications. As optical links avoid the capacitive, resistive and signal integrity limits imposed upon electrical interconnects, the introduction of integrated photonics allows for efficient realization of physical connectivity that are costly to accomplish electrically. While many recent works have since cited the potential benefits of optics, inherent design tradeoffs of photonic datapath and backend components remain relatively unknown at the system-level. This thesis develops insights regarding the behavior of electrical and hybrid optoelectrical networks and systems. We present power and area models that capture the behavior of electrical interface circuits and their interactions with optical devices. To animate these models in the context of a full system, we contribute DSENT, a novel physical modeling framework capable of estimating the costs of generalized digital electronics, mixed-signal interface circuitry, and optical links. With DSENT, we enable fast power and area evaluation of entire networks to connect the dynamics of an underlying photonics interconnect to that of an otherwise electrical system. Using our methodolody, we perform a technology-driven design space exploration of intra-chip networks and highlight the importance of thermal tuning and parasitic receiver capacitances in network power consumption. We show that the performance gains enabled by photonics-inspired architectures can enable savings in total system energy even if the network is more costly. Finally, we propose a photonically interconnected DRAM system as a solution to the core-to-DRAM bandwidth bottleneck. By attacking energy consumption at the DRAM channel, chip, and bank level with integrated photoncis, we cut the power consumption of the DRAM system by 10x while remaining area neutral when compared to a projected electrical baseline.by Chen Sun.S.M
    • …
    corecore