7 research outputs found

    SWIFT: A Low-Power Network-On-Chip Implementing the Token Flow Control Router Architecture With Swing-Reduced Interconnects

    A 64-bit, 8 × 8 mesh network-on-chip (NoC) is presented that uses both new architectural and circuit design techniques to improve on-chip network energy-efficiency, latency, and throughput. First, we propose token flow control, which enables bypassing of flit buffering in routers, thereby reducing buffer size and buffer power consumption. We also incorporate reduced-swing signaling in on-chip links and crossbars to minimize datapath interconnect energy. The 64-node NoC is experimentally validated with a 2 × 2 test chip in 90 nm, 1.2 V CMOS that incorporates traffic generators to emulate the traffic of the full network. Compared with a fully synthesized baseline 8 × 8 NoC architecture designed to meet the same peak throughput, the fabricated prototype reduces network latency by 20% under uniform random traffic when both networks are run at their maximum operating frequencies. When operated at the same frequencies, the SWIFT NoC reduces network power by 38% and 25% at saturation and low loads, respectively.
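
For a sense of scale behind the uniform random traffic figure, the sketch below computes the mean source-to-destination hop count in a k × k mesh. It assumes minimal routing, so the hop count equals the Manhattan distance, and it excludes packets addressed to their own source. Neither assumption is stated in the abstract, and `average_hops` is a hypothetical helper name used only for illustration.

```python
from itertools import product


def average_hops(k: int) -> float:
    """Mean Manhattan distance between distinct nodes of a k x k mesh.

    Assumes minimal routing and uniformly random source/destination
    pairs, with self-addressed traffic excluded (illustrative assumptions).
    """
    nodes = list(product(range(k), repeat=2))
    total = 0
    pairs = 0
    for sx, sy in nodes:
        for dx, dy in nodes:
            if (sx, sy) == (dx, dy):
                continue  # skip self-addressed packets
            total += abs(sx - dx) + abs(sy - dy)
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    # 8 x 8 is the full network size; 2 x 2 is the fabricated test chip size.
    print(average_hops(8))
    print(average_hops(2))
```

Under these assumptions the mean works out to 2k/3 hops, which is why per-hop router latency and buffering energy weigh heavily on a 64-node mesh.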

    Design and implementation of in-network coherence

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Title as it appears in MIT Commencement Exercises program, June 2013: Design and implementation of in-network coherence. Cataloged from PDF version of thesis. Includes bibliographical references (p. 101-104).

    CMOS technology scaling has enabled increasing transistor density on chip. At the same time, multi-core processors that provide increased performance, vis-à-vis power efficiency, have become prevalent in a power-constrained environment. The shared memory model is a predominant paradigm in such systems, easing programmability and increasing portability. However, with memory being shared by an increasing number of cores, a scalable coherence mechanism is imperative for these systems. Snoopy coherence has been a favored coherence scheme owing to its high performance and simplicity. However, there are few viable proposals to extend snoopy coherence to unordered interconnects, specifically the modular packet-switched interconnects that have emerged as a scalable solution to the communication challenges in the CMP era. This thesis proposes a distributed in-network global ordering scheme that enables snoopy coherence on unordered interconnects. The proposed scheme is realized on a two-dimensional mesh interconnection network, referred to as OMNI (Ordered Mesh Network Interconnect). OMNI is an enabling solution for the SCORPIO processor prototype developed at MIT, a 36-core chip multi-processor supporting snoopy coherence and fabricated in a commercial 45 nm technology. OMNI is shown to be effective, reducing runtime by 36% in comparison to directory and Hammer coherence protocol implementations. The OMNI network achieves an operating frequency of 833 MHz post-layout, occupies 10% of the chip area, and consumes less than 100 mW of power.

    by Suvinay Subramanian. S.M.

    Design of platform for exploring application-specific NoC architecture.

    Liu, Zhouyi. Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. Includes bibliographical references (leaves 110-114). Abstracts in English and Chinese.

    Contents:
    ABSTRACTS
    ABSTRACT (in Chinese)
    CONTENTS
    LIST OF FIGURE
    LIST OF TABLE
    ACKNOWLEDGEMENT
    CHAPTER 1: INTRODUCTION
        1.1 NETWORK-ON-CHIP
        1.2 RELATED WORKS
        1.3 PLATFORM OVERVIEW
        1.4 AUTHOR'S CONTRIBUTION
    CHAPTER 2: NOC LIBRARY
        2.1 NETWORK TERMINOLOGY
        2.2 BASIC STRUCTURE
        2.3 LOW-POWER ORIENTED ARCHITECTURE
            2.3.1 Low-Cost Allocator Design
            2.3.2 Clock Gating
            2.3.3 Express Virtual Channel Insertion
        2.4 LOW-LATENCY ORIENTED ARCHITECTURE
            2.4.1 Lookahead Bypass Scheme
            2.4.2 Lookahead Bypass Router Architecture
    CHAPTER 3: BENCHMARK AND MEASUREMENT
        3.1 BENCHMARK GENERATION
            3.1.1 Types of Traffic Patterns
            3.1.2 Traffic Generator
        3.2 MEASUREMENT SETTING
            3.2.1 Warming-up Period
            3.2.2 Latency Definition
            3.2.3 Throughput Definition
            3.2.4 Virtual Channel Utilization
    CHAPTER 4: PLATFORM STRUCTURE
        4.1 FILE TREE
            4.1.1 System Files
            4.1.2 Low-Power NoC Related
            4.1.3 Low-Latency NoC Related
            4.1.4 Project Related
        4.2 PROCESSES
        4.3 GUI ACCESS
            4.3.1 Section 1: Project Setup
            4.3.2 Section 2-a: Low-Power Router Structure
            4.3.3 Section 2-b: Low-Latency Router Structure
            4.3.4 Section 3: Benchmark & Measurement
            4.3.5 Section 4: View Result
            4.3.6 Low-Power NoC Example
    CHAPTER 5: OPTIMIZATION AND COMPARISON
        5.1 OPTIMIZATION TECHNIQUE
            5.1.1 Optimization Phase 1: Inactive Buffer Removal
            5.1.2 Optimization Phase 2: Infighting Analysis
            5.1.3 Over-Optimization
            5.1.4 Optimization Example
        5.2 NOCS COMPARISON
        5.3 LOW-POWER IMPLEMENTATION CODE EXPORT
    CHAPTER 6: SUMMARY AND FUTURE WORK
        6.1 SUMMARY
        6.2 FUTURE WORK
    REFERENCES

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200 mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35 mW along with 0.167 mm² area at 333 MHz.

    Software-based and regionally-oriented traffic management in Networks-on-Chip

    Since the introduction of chip-multiprocessor systems, the number of integrated cores has been steadily growing and workload applications have been adapted to exploit the increasing parallelism. This has significantly raised the importance of efficient on-chip communication, and the infrastructure has to keep pace with these new requirements. The work at hand makes significant contributions to the state of the art of the latest generation of such solutions, called Networks-on-Chip, to improve the performance, reliability, and flexible management of these on-chip infrastructures.