Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks
Simulators are very important in computer architecture research because they enable the exploration of new architectures and detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are also proposed and evaluated. The first is a novel non-uniform fat-mesh network structure, similar in idea to the fat-tree. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to connections with lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by presenting the expressions for the traffic requirements of personalized all-to-all traffic. Performance improvements over the uniform mesh are demonstrated in the results from the simulator. The second enhancement is a hybrid network consisting of one packet switching plane and multiple circuit switching planes. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment.
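To make the fat-mesh motivation concrete, the following minimal sketch counts how many source-destination paths cross each directed link of a k x k mesh under all-to-all traffic. It assumes dimension-order (XY) routing and one unit of traffic per ordered node pair; the abstract does not name the routing algorithms used in the thesis, so the routing choice and the function names `xy_route` and `all_to_all_link_load` are illustrative assumptions only.

```python
from collections import Counter


def xy_route(src, dst):
    """Yield the directed links on the XY (dimension-order) path from src to dst.

    Assumption: packets travel along X first, then along Y.
    """
    x, y = src
    dx, dy = dst
    while x != dx:
        nx = x + (1 if dx > x else -1)
        yield (x, y), (nx, y)
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        yield (x, y), (x, ny)
        y = ny


def all_to_all_link_load(k):
    """Count paths per directed link when every node sends one unit to every other node."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    load = Counter()
    for s in nodes:
        for d in nodes:
            if s != d:
                for link in xy_route(s, d):
                    load[link] += 1
    return load


load = all_to_all_link_load(4)
edge_link = load[((0, 0), (1, 0))]    # link leaving the corner node: 12
center_link = load[((1, 0), (2, 0))]  # link crossing the middle of the row: 16
```

Under these assumptions the eastbound link between columns i and i+1 carries (i+1)(k-i-1)k paths, so links crossing the middle of the mesh are the most heavily loaded. This is the kind of imbalance that a non-uniform link allocation is meant to address.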
Design of platform for exploring application-specific NoC architecture.
Liu, Zhouyi. Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. Includes bibliographical references (leaves 110-114). Abstracts in English and Chinese.
Contents:
Abstracts
Abstract (Chinese)
Contents
List of Figure
List of Table
Acknowledgement
Chapter 1. Introduction
  1.1 Network-on-Chip
  1.2 Related Works
  1.3 Platform Overview
  1.4 Author's Contribution
Chapter 2. NoC Library
  2.1 Network Terminology
  2.2 Basic Structure
  2.3 Low-Power Oriented Architecture
    2.3.1 Low-Cost Allocator Design
    2.3.2 Clock Gating
    2.3.3 Express Virtual Channel Insertion
  2.4 Low-Latency Oriented Architecture
    2.4.1 Lookahead Bypass Scheme
    2.4.2 Lookahead Bypass Router Architecture
Chapter 3. Benchmark and Measurement
  3.1 Benchmark Generation
    3.1.1 Types of Traffic Patterns
    3.1.2 Traffic Generator
  3.2 Measurement Setting
    3.2.1 Warming-up Period
    3.2.2 Latency Definition
    3.2.3 Throughput Definition
    3.2.4 Virtual Channel Utilization
Chapter 4. Platform Structure
  4.1 File Tree
    4.1.1 System Files
    4.1.2 Low-Power NoC Related
    4.1.3 Low-Latency NoC Related
    4.1.4 Project Related
  4.2 Processes
  4.3 GUI Access
    4.3.1 Section 1: Project Setup
    4.3.2 Section 2-a: Low-Power Router Structure
    4.3.3 Section 2-b: Low-Latency Router Structure
    4.3.4 Section 3: Benchmark & Measurement
    4.3.5 Section 4: View Result
    4.3.6 Low-Power NoC Example
Chapter 5. Optimization and Comparison
  5.1 Optimization Technique
    5.1.1 Optimization Phase 1: Inactive Buffer Removal
    5.1.2 Optimization Phase 2: Infighting Analysis
    5.1.3 Over-Optimization
    5.1.4 Optimization Example
  5.2 NoCs Comparison
  5.3 Low-Power Implementation Code Export
Chapter 6. Summary and Future Work
  6.1 Summary
  6.2 Future Work
References
Design and Optimization of Networks-on-Chip for Future Heterogeneous Systems-on-Chip
Due to tight power budgets and reduced time-to-market, Systems-on-Chip (SoC) have emerged as a power-efficient solution that provides the functionality required by target applications in embedded systems. To support a diverse set of applications such as real-time video/audio processing and sensor signal processing, SoCs consist of multiple heterogeneous components, such as software processors, digital signal processors, and application-specific hardware accelerators. These components offer different trade-offs among flexibility, power, and performance, so SoCs can be designed by mixing and matching them.
With the increasing number of heterogeneous cores, however, the traditional interconnects in an SoC exhibit excessive power dissipation and poor performance scalability. As an alternative, Networks-on-Chip (NoC) have been proposed. NoCs provide modularity at design-time because communications among the cores are isolated from their computations via standard interfaces. NoCs also exploit communication parallelism at run-time because multiple data items can be transferred simultaneously.
In order to construct an efficient NoC, the communication behaviors of the various heterogeneous components in an SoC must be considered together with the large number of NoC design parameters. Therefore, providing an efficient NoC design and optimization framework is critical to reduce the design cycle and address the complexity of future heterogeneous SoCs. This is the thesis of my dissertation.
Some existing design automation tools for NoCs support very limited degrees of automation that cannot satisfy the requirements of future heterogeneous SoCs. First, these tools only support a limited number of NoC design parameters. Second, they do not provide an integrated environment for software-hardware co-development.
Thus, I propose FINDNOC, an integrated framework for the generation, optimization, and validation of NoCs for future heterogeneous SoCs. The proposed framework supports software-hardware co-development, an incremental NoC design-decision model, SystemC-based NoC customization and generation, and fast system prototyping with FPGA emulations.
Virtual channels (VC) and multiple physical (MP) networks are the two main alternative methods to provide better performance, support quality-of-service, and avoid protocol deadlocks in packet-switched NoC design. To examine the effect of using VCs and MPs with other NoC architectural parameters, I completed a comprehensive comparative analysis that combines an analytical model, synthesis-based designs for both FPGAs and standard-cell libraries, and system-level simulations.
Based on the results of this analysis, I developed VENTTI, a design and simulation environment that combines a virtual platform (VP), a NoC synthesis tool, and four NoC models characterized at different abstraction levels. VENTTI facilitates an incremental decision-making process with four NoC abstraction models associated with different NoC parameters. The selected NoC parameters can be validated by running simulations with the corresponding model instantiated in the VP.
I augmented this framework to complete FINDNOC by implementing ICON, a NoC generation and customization tool that dynamically combines and customizes synthesizable SystemC components from a predesigned library. Thanks to its flexibility and automatic network interface generation capabilities, ICON can generate a rich variety of NoCs that can then be integrated into any Embedded Scalable Platform (ESP) architecture for fast prototyping with FPGA emulations.
I designed FINDNOC in a modular way that makes it easy to augment with new capabilities. This, combined with the continuous progress of the ESP design methodology, will provide a seamless SoC integration framework, where the hardware accelerators, software applications, and NoCs can be designed, validated, and integrated simultaneously, in order to reduce the design cycle of future SoC platforms.
Improving Packet Predictability of Scalable Network-on-Chip Designs without Priority Pre-emptive Arbitration
The quest for improving processing power and efficiency is spawning research into many-core systems with hundreds or thousands of cores. With communication forecast to be the foremost performance bottleneck, Networks-on-Chip (NoCs) are the favoured communication infrastructure in this context, mainly because of their scalability and power efficiency. However, contention between non-preemptive NoC packets can cause variation in packet latencies, potentially limiting the overall utilisation of the many-core system. Typical techniques for enhancing latency predictability, such as Virtual Channels or Time Division Multiplexing, are usually hardware-expensive, non-scalable, or both. This research explores the use of dynamic and scalable techniques in Network-on-Chip routers to improve packet predictability by countering Head-of-line blocking (a blocked low priority packet blocking a high priority packet) and tailbacking (a low priority packet utilising the link that is required by a high priority packet) of non-preemptive packets.
The Priority forwarding and tunnelling technique introduced is designed to detect Head-of-line blocking situations so that internal arbitration parameters can be altered (by forwarding packet parameters down the line) to resolve them. The Selective packet splitting technique resolves tailbacking by splitting packets, a low-overhead alternative that emulates the effect of packet preemption. Finally, the thesis presents an architecture that gives the routers a notion of timeliness in data packets, enabling packet arbitration based on application-supplied priority and timeliness and thereby improving the quality of service given to lower priority packets. Furthermore, the techniques presented in the thesis do not require additional hardware as the size of the NoC increases. This makes the techniques scalable, as neither the size of the NoC nor the number of packet priorities it has to handle affects their functionality and operation.
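The Head-of-line blocking problem and the general idea of forwarding a blocked packet's priority can be illustrated with a small sketch. This is not the mechanism from the thesis, whose details the abstract does not give. It is a simplified interpretation, similar to priority inheritance, in which the head of a FIFO arbitrates with the highest priority found anywhere in its queue. The names `Packet`, `effective_head_priority`, and `arbitrate`, and the convention that a larger number means higher priority, are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    priority: int  # assumed convention: larger value means more urgent


def effective_head_priority(queue):
    """Priority the head arbitrates with when priorities are forwarded.

    Hypothetical rule: the head inherits the highest priority queued behind it.
    """
    return max(p.priority for p in queue)


def arbitrate(queues, forwarding):
    """Return the index of the non-empty input queue whose head wins the output port."""
    def key(i):
        q = queues[i]
        return effective_head_priority(q) if forwarding else q[0].priority

    candidates = [i for i, q in enumerate(queues) if q]
    return max(candidates, key=key)


# Input 0: a low priority head with a high priority packet stuck behind it.
q0 = deque([Packet("low", 1), Packet("high", 9)])
# Input 1: a medium priority packet competing for the same output port.
q1 = deque([Packet("mid", 5)])

winner_plain = arbitrate([q0, q1], forwarding=False)     # input 1 wins; "high" stays blocked
winner_forwarded = arbitrate([q0, q1], forwarding=True)  # input 0 wins; "low" drains first
```

Without forwarding, the medium priority packet keeps winning the port and the high priority packet waits behind the low priority head. With the inherited priority, the low priority head is let through so that the high priority packet can reach the front of its queue.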