SWIFT: A Low-Power Network-On-Chip Implementing the Token Flow Control Router Architecture With Swing-Reduced Interconnects
A 64-bit, 8 × 8 mesh network-on-chip (NoC) is presented that combines new architectural and circuit design techniques to improve on-chip network energy efficiency, latency, and throughput. First, we propose token flow control, which enables flits to bypass buffering in routers, thereby reducing buffer size and the associated buffer power consumption. We also incorporate reduced-swing signaling in on-chip links and crossbars to minimize datapath interconnect energy. The 64-node NoC is experimentally validated with a 2 × 2 test chip in 90 nm, 1.2 V CMOS that incorporates traffic generators to emulate the traffic of the full network. Compared with a fully synthesized baseline 8 × 8 NoC architecture designed to meet the same peak throughput, the fabricated prototype reduces network latency by 20% under uniform random traffic when both networks run at their maximum operating frequencies. When operated at the same frequency, the SWIFT NoC reduces network power by 38% at saturation and by 25% at low loads.
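The abstract names buffer bypassing without describing the mechanism. As a rough illustration only, the following Python sketch models one cycle of switch allocation in a router where a lookahead signal, sent one hop ahead of its flit, lets the flit skip its input buffer. This is a hypothetical simplification, not the SWIFT design: the names `Lookahead` and `allocate`, the port names, the rule that lookaheads are served before buffered flits, and the use of a downstream free-slot count as a stand-in for token information are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lookahead:
    """Hypothetical lookahead signal sent one hop ahead of its flit."""
    in_port: str   # input port the flit will arrive on next cycle
    out_port: str  # output port the flit wants to leave on


def allocate(lookaheads, free_slots, buffered_out_ports):
    """Toy one-cycle switch allocation for a bypass-capable router.

    free_slots: output port -> free buffer slots at the downstream router,
        used here as a stand-in for the availability hint a token carries.
    buffered_out_ports: output ports requested by flits already buffered.
    Returns (bypassed, must_buffer, buffered_winners).
    """
    claimed = set()
    bypassed, must_buffer = [], []
    # Assumed policy: lookaheads are served first, in arrival order.
    for la in lookaheads:
        if la.out_port not in claimed and free_slots.get(la.out_port, 0) > 0:
            claimed.add(la.out_port)
            bypassed.append(la)     # crossbar pre-set: no buffer write/read
        else:
            must_buffer.append(la)  # conflict or no downstream space
    # Buffered flits compete for whatever output ports remain.
    buffered_winners = []
    for port in buffered_out_ports:
        if port not in claimed and free_slots.get(port, 0) > 0:
            claimed.add(port)
            buffered_winners.append(port)
    return bypassed, must_buffer, buffered_winners


# Example: two lookaheads want "east"; only the first can bypass.
las = [Lookahead("west", "east"), Lookahead("north", "east"),
       Lookahead("south", "local")]
bypassed, must_buffer, winners = allocate(
    las, {"east": 2, "local": 1, "north": 0}, ["north", "local"])
print([la.in_port for la in bypassed])  # ['west', 'south']
```

The losing lookahead falls back to the ordinary buffered pipeline, which is why bypassing can shrink buffer activity without eliminating buffers altogether.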
Design and implementation of in-network coherence
Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Title as it appears in MIT Commencement Exercises program, June 2013: Design and implementation of in-network coherence. Cataloged from PDF version of thesis. Includes bibliographical references (p. 101-104). By Suvinay Subramanian.
CMOS technology scaling has enabled increasing transistor density on chip. At the same time, multi-core processors, which offer higher performance for a given power budget, have become prevalent in a power-constrained environment. The shared memory model is the predominant paradigm in such systems, easing programmability and increasing portability. However, as memory is shared by an increasing number of cores, a scalable coherence mechanism becomes imperative. Snoopy coherence has been a favored scheme owing to its high performance and simplicity, yet there are few viable proposals to extend it to unordered interconnects, specifically the modular packet-switched interconnects that have emerged as a scalable solution to the communication challenges of the chip multiprocessor (CMP) era. This thesis proposes a distributed in-network global ordering scheme that enables snoopy coherence on unordered interconnects. The scheme is realized on a two-dimensional mesh interconnection network, referred to as OMNI (Ordered Mesh Network Interconnect). OMNI is the enabling interconnect for the SCORPIO processor prototype developed at MIT, a 36-core chip multiprocessor that supports snoopy coherence and is fabricated in a commercial 45 nm technology. OMNI is shown to be effective, reducing runtime by 36% compared with directory and Hammer coherence protocol implementations. The OMNI network achieves an operating frequency of 833 MHz post-layout, occupies 10% of the chip area, and consumes less than 100 mW of power.
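The abstract does not say how the distributed ordering is computed. One way such a scheme can work, assumed here purely for illustration, is for every router to learn, after a fixed time window, the same set of nodes that issued broadcast requests in that window, and then for every node to apply one shared deterministic rule to that set. Because the inputs and the rule are identical everywhere, all nodes agree on the order without a central arbiter. The function names, the bit-vector encoding, and the rotating-priority rule below are assumptions, not details taken from the thesis.

```python
from functools import reduce
from operator import or_


def merge_notifications(bit_vectors):
    """OR the notification bit vectors injected in one time window.

    Bit i set means node i issued a broadcast coherence request in the window.
    """
    return reduce(or_, bit_vectors, 0)


def order_window(merged_bits, window, num_nodes):
    """Deterministic ordering rule applied identically at every node.

    Rotating priority (assumed policy): in window w, node w % num_nodes goes
    first, so no node is permanently favored.
    """
    start = window % num_nodes
    sources = [i for i in range(num_nodes) if (merged_bits >> i) & 1]
    return sorted(sources, key=lambda s: (s - start) % num_nodes)


# Window 0: nodes 1 and 3 each inject a notification.
w0 = merge_notifications([0b0010, 0b1000])
print(order_window(w0, window=0, num_nodes=4))  # [1, 3]
# Window 1: nodes 0, 2 and 3 inject; priority now starts at node 1.
w1 = merge_notifications([0b0001, 0b0100, 0b1000])
print(order_window(w1, window=1, num_nodes=4))  # [2, 3, 0]
```

Since every node sees the same merged vector and runs the same rule, each node can process snoop requests in this agreed order even though the requests themselves travel over an unordered mesh.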
Design of platform for exploring application-specific NoC architecture
Liu, Zhouyi. Thesis (M.Phil.), Chinese University of Hong Kong, 2011. Includes bibliographical references (leaves 110-114). Abstracts in English and Chinese.
Contents:
  Abstract; Abstract (Chinese); Contents; List of Figures; List of Tables; Acknowledgement
  Chapter 1: Introduction
    1.1 Network-on-Chip
    1.2 Related Works
    1.3 Platform Overview
    1.4 Author's Contribution
  Chapter 2: NoC Library
    2.1 Network Terminology
    2.2 Basic Structure
    2.3 Low-Power Oriented Architecture
      2.3.1 Low-Cost Allocator Design
      2.3.2 Clock Gating
      2.3.3 Express Virtual Channel Insertion
    2.4 Low-Latency Oriented Architecture
      2.4.1 Lookahead Bypass Scheme
      2.4.2 Lookahead Bypass Router Architecture
  Chapter 3: Benchmark and Measurement
    3.1 Benchmark Generation
      3.1.1 Types of Traffic Patterns
      3.1.2 Traffic Generator
    3.2 Measurement Setting
      3.2.1 Warming-up Period
      3.2.2 Latency Definition
      3.2.3 Throughput Definition
      3.2.4 Virtual Channel Utilization
  Chapter 4: Platform Structure
    4.1 File Tree
      4.1.1 System Files
      4.1.2 Low-Power NoC Related
      4.1.3 Low-Latency NoC Related
      4.1.4 Project Related
    4.2 Processes
    4.3 GUI Access
      4.3.1 Section 1: Project Setup
      4.3.2 Section 2-a: Low-Power Router Structure
      4.3.3 Section 2-b: Low-Latency Router Structure
      4.3.4 Section 3: Benchmark & Measurement
      4.3.5 Section 4: View Result
      4.3.6 Low-Power NoC Example
  Chapter 5: Optimization and Comparison
    5.1 Optimization Technique
      5.1.1 Optimization Phase 1: Inactive Buffer Removal
      5.1.2 Optimization Phase 2: Infighting Analysis
      5.1.3 Over-Optimization
      5.1.4 Optimization Example
    5.2 NoCs Comparison
    5.3 Low-Power Implementation Code Export
  Chapter 6: Summary and Future Work
    6.1 Summary
    6.2 Future Work
  References
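The outline lists traffic patterns, a traffic generator, a warm-up period, and a latency definition without further detail. The Python sketch below shows three synthetic patterns that are common in the NoC literature (uniform random, transpose, bit-complement) and a latency average that discards packets injected during warm-up. Whether the thesis uses exactly these patterns, this node numbering (`id = y * k + x` on a k × k mesh), or injection-to-reception latency is an assumption; all function names are hypothetical.

```python
import random


def uniform_random(src, k, rng):
    """Pick any node other than src with equal probability (k x k mesh)."""
    dst = rng.randrange(k * k - 1)
    return dst if dst < src else dst + 1


def transpose(src, k):
    """Node (x, y) sends to node (y, x); node id = y * k + x."""
    x, y = src % k, src // k
    return x * k + y


def bit_complement(src, k):
    """Destination id is the bitwise complement of the source id."""
    n = k * k
    if n & (n - 1):
        raise ValueError("bit-complement needs a power-of-two node count")
    return (n - 1) ^ src


def mean_latency(packets, warmup_cycles):
    """Average latency in cycles over (inject_cycle, receive_cycle) pairs.

    Packets injected before warmup_cycles are excluded so that start-up
    transients (empty buffers) do not bias the measurement.
    """
    measured = [rx - tx for tx, rx in packets if tx >= warmup_cycles]
    if not measured:
        raise ValueError("no packets injected after the warm-up period")
    return sum(measured) / len(measured)


rng = random.Random(0)
print(transpose(1, 8), bit_complement(5, 8), uniform_random(3, 8, rng))
print(mean_latency([(0, 10), (100, 120), (150, 160)], warmup_cycles=50))  # 15.0
```

Passing an explicit `random.Random` instance keeps generated traffic reproducible across runs, which matters when comparing router configurations on the same benchmark.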
Energy efficient communication across on-chip wires in digital CMOS
For the past half century, CMOS process scaling has followed Moore's law, approximately doubling transistor density every 18 months. While locally routed wires have generally scaled with transistor size, longer wires have scaled more slowly and in some cases have grown longer as chip size and complexity have increased. Wires routed for non-local communication now consume a large and increasing portion of the power, thermal, and area budgets in CMOS designs. Additionally, the dynamic energy expended in driving locally routed wires has become comparable to that expended in logic. The goal of this research is to investigate methods of reducing the energy required for on-chip communication, primarily through low-voltage swing signaling. A network-on-chip routing architecture is presented that uses complementary architectural and low-voltage swing signaling techniques to significantly improve the latency, throughput, and power of an on-chip network. On-chip signaling circuits are then presented that make low-voltage swing signaling better suited to short wire lengths and reduced supply voltages. Finally, a procedure is presented for improving the energy efficiency of wire loads in digital CMOS through the automated insertion of low-voltage swing signaling circuits.
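The abstract does not give an energy model, but the benefit of low-voltage swing signaling follows from a standard first-order estimate: each 0→1 transition draws charge C·V_swing from a supply at V_supply, so the energy per transition is C·V_swing·V_supply. The Python sketch below applies that estimate. The 0.2 pF wire capacitance and the 1.2 V / 0.3 V levels are illustrative assumptions, not figures from the dissertation, and driver, receiver, and supply-generation overheads are ignored.

```python
def wire_energy_per_bit(c_wire, v_swing, v_supply, activity=0.25):
    """First-order energy (J) drawn from the supply per transmitted bit.

    Each 0->1 transition pulls charge C * V_swing from a supply at V_supply,
    costing C * V_swing * V_supply. `activity` is the probability that a bit
    causes a 0->1 transition (0.25 for random, uncorrelated data).
    Driver, receiver and supply-generation overheads are ignored.
    """
    return activity * c_wire * v_swing * v_supply


C = 0.2e-12  # assumed wire capacitance: roughly a 1 mm global wire
full = wire_energy_per_bit(C, 1.2, 1.2)          # full swing from a 1.2 V rail
low_from_vdd = wire_energy_per_bit(C, 0.3, 1.2)  # 0.3 V swing, charge from 1.2 V
low_rail = wire_energy_per_bit(C, 0.3, 0.3)      # 0.3 V swing from a 0.3 V rail
print(f"{full / low_from_vdd:.1f}x, {full / low_rail:.1f}x")  # 4.0x, 16.0x
```

The estimate shows why the choice of supply matters: reducing only the swing gives a linear saving, while also drawing the charge from a low-voltage rail gives a quadratic one, before accounting for the cost of generating that rail.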
Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design
This dissertation presents three design solutions that address key system-on-chip (SoC) challenges in achieving low power and high performance: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) an efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) a data processing architecture, and its implementation, for low-power SoCs in distributed sensor network (DSS) systems.
The first part presents a low-power embedded low-density parity-check (LDPC) code/H.264 joint decoding architecture that lowers the baseband energy consumption of the channel decoder by using joint source decoding and dynamic voltage and frequency scaling (DVFS). It also presents a low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder that minimizes energy for portable, wireless embedded systems.
The second part proposes a link-level quality-of-service (QoS) scheme that uses unequal error protection (UEP) for low-power networks-on-chip (NoCs), together with low-latency on-chip network designs for MPSoCs. These include WaveSync, a low-latency NoC architecture for globally-asynchronous locally-synchronous (GALS) designs, and a simultaneous dual-path routing (SDPR) scheme that exploits the path diversity present in typical mesh-topology NoCs. SDPR is akin to having a wider link, but without the significant hardware overhead of simply scaling the bus width.
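The abstract says SDPR exploits mesh path diversity but does not describe the mechanism. One plausible reading, assumed here only for illustration, is that a packet's flits are split across two minimal paths, the dimension-ordered XY route and the YX route, which share no intermediate routers when source and destination differ in both X and Y. The function names and the alternating split policy are hypothetical, and a real design would also need to handle reordering at the destination and deadlock avoidance.

```python
def xy_route(src, dst):
    """Dimension-ordered route on a 2-D mesh: all X hops, then all Y hops."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        path.append((x, y))
    return path


def _swap(p):
    return (p[1], p[0])


def yx_route(src, dst):
    """Y-first route, obtained by XY-routing in swapped coordinates."""
    return [_swap(p) for p in xy_route(_swap(src), _swap(dst))]


def split_packet(flits, src, dst):
    """Alternate a packet's flits between the XY and YX paths (assumed policy)."""
    paths = (xy_route(src, dst), yx_route(src, dst))
    return [(flit, paths[i % 2]) for i, flit in enumerate(flits)]


xy, yx = xy_route((0, 0), (2, 1)), yx_route((0, 0), (2, 1))
print(xy)  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(yx)  # [(0, 0), (0, 1), (1, 1), (2, 1)]
# Here the two paths share no intermediate router, so both can carry flits at once.
print(set(xy[1:-1]) & set(yx[1:-1]))  # set()
```

With both paths active, a packet sees roughly twice the link bandwidth between source and destination, which matches the abstract's comparison to a wider link, while each physical link keeps its original width.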
The last part presents data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system that generates data at or above the petabit-per-second level. Implementation results show that the target clock rate is achieved within a power budget of less than 200 mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs; it consumes 12.35 mW and occupies 0.167 mm² at 333 MHz.
Software-based and regionally-oriented traffic management in Networks-on-Chip
Since the introduction of chip multiprocessor systems, the number of integrated cores has been growing steadily, and workload applications have been adapted to exploit the increasing parallelism. This has significantly raised the importance of efficient on-chip communication, and the communication infrastructure has to keep pace with these new requirements.
This work makes significant contributions to the state of the art of the latest generation of such infrastructures, called Networks-on-Chip, improving their performance, reliability, and flexibility of management.