5 research outputs found

    Simulation Of Multi-core Systems And Interconnections And Evaluation Of Fat-Mesh Networks

    Simulators are very important in computer architecture research because they enable the exploration of new architectures and detailed performance evaluation without building costly physical hardware. Simulation is even more critical for studying future many-core architectures, as it provides the opportunity to assess computer systems that do not yet exist. In this thesis, a multiprocessor simulator is presented based on a cycle-accurate architecture simulator called SESC. The shared L2 cache system is extended into a distributed shared cache (DSC) with a directory-based cache coherency protocol. A mesh network module is extended and integrated into SESC to replace the bus for scalable inter-processor communication. While these efforts complete an extended multiprocessor simulation infrastructure, two interconnection enhancements are also proposed and evaluated. The first is a novel non-uniform fat-mesh network structure, similar in idea to the fat-tree. This non-uniform mesh network takes advantage of the average traffic pattern, typically all-to-all in DSC, to dedicate additional links to connections with heavy traffic (e.g., near the center) and fewer links to connections with lighter traffic (e.g., near the periphery). Two fat-mesh schemes are implemented based on different routing algorithms. Analytical fat-mesh models are constructed by deriving expressions for the traffic requirements of personalized all-to-all traffic. Results from the simulator demonstrate performance improvements over the uniform mesh. The second enhancement is a hybrid network consisting of one packet switching plane and multiple circuit switching planes. The circuit switching planes provide fast paths between neighbors with heavy communication traffic. A compiler technique that abstracts the symbolic expressions of benchmarks' communication patterns can be used to help facilitate the circuit establishment.
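
The claim that all-to-all traffic loads central mesh links more heavily than peripheral ones can be checked with a small count. The sketch below is only an illustration, not the thesis's analytical model: it assumes dimension-ordered (XY) routing, one packet per ordered node pair, and a hypothetical 4x4 mesh. The names `xy_route` and `all_to_all_link_load` are made up for this example.

```python
from collections import Counter
from itertools import product


def xy_route(src, dst):
    """Return the directed links on the XY path from src to dst.

    Assumption: dimension-ordered routing (X first, then Y). The abstract
    does not say which routing algorithms the thesis uses.
    """
    x, y = src
    dst_x, dst_y = dst
    links = []
    while x != dst_x:  # travel along the source row first
        next_x = x + (1 if dst_x > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst_y:  # then along the destination column
        next_y = y + (1 if dst_y > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def all_to_all_link_load(k):
    """Count packets crossing each directed link of a k x k mesh when
    every node sends one packet to every other node."""
    nodes = list(product(range(k), repeat=2))
    load = Counter()
    for src in nodes:
        for dst in nodes:
            if src != dst:
                load.update(xy_route(src, dst))
    return load


load = all_to_all_link_load(4)
edge_load = load[((0, 0), (1, 0))]    # eastward link at the periphery
center_load = load[((1, 0), (2, 0))]  # eastward link crossing the middle
print(edge_load, center_load)         # 12 16
```

Under these assumptions the middle link carries 16 packets against 12 for the peripheral one, which is the kind of imbalance a fat-mesh would address by giving the heavier connections more links.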

    Design of platform for exploring application-specific NoC architecture.

    Liu, Zhouyi. Thesis (M.Phil.)--Chinese University of Hong Kong, 2011. Includes bibliographical references (leaves 110-114). Abstracts in English and Chinese.

    Contents:
    ABSTRACTS
    摘要 (Abstract in Chinese)
    CONTENTS
    LIST OF FIGURE
    LIST OF TABLE
    ACKNOWLEDGEMENT
    CHAPTER 1 INTRODUCTION
        1.1 NETWORK-ON-CHIP
        1.2 RELATED WORKS
        1.3 PLATFORM OVERVIEW
        1.4 AUTHOR'S CONTRIBUTION
    CHAPTER 2 NOC LIBRARY
        2.1 NETWORK TERMINOLOGY
        2.2 BASIC STRUCTURE
        2.3 LOW-POWER ORIENTED ARCHITECTURE
            2.3.1 Low-Cost Allocator Design
            2.3.2 Clock Gating
            2.3.3 Express Virtual Channel Insertion
        2.4 LOW-LATENCY ORIENTED ARCHITECTURE
            2.4.1 Lookahead Bypass Scheme
            2.4.2 Lookahead Bypass Router Architecture
    CHAPTER 3 BENCHMARK AND MEASUREMENT
        3.1 BENCHMARK GENERATION
            3.1.1 Types of Traffic Patterns
            3.1.2 Traffic Generator
        3.2 MEASUREMENT SETTING
            3.2.1 Warming-up Period
            3.2.2 Latency Definition
            3.2.3 Throughput Definition
            3.2.4 Virtual Channel Utilization
    CHAPTER 4 PLATFORM STRUCTURE
        4.1 FILE TREE
            4.1.1 System Files
            4.1.2 Low-Power NoC Related
            4.1.3 Low-Latency NoC Related
            4.1.4 Project Related
        4.2 PROCESSES
        4.3 GUI ACCESS
            4.3.1 Section 1: Project Setup
            4.3.2 Section 2-a: Low-Power Router Structure
            4.3.3 Section 2-b: Low-Latency Router Structure
            4.3.4 Section 3: Benchmark & Measurement
            4.3.5 Section 4: View Result
            4.3.6 Low-Power NoC Example
    CHAPTER 5 OPTIMIZATION AND COMPARISON
        5.1 OPTIMIZATION TECHNIQUE
            5.1.1 Optimization Phase 1: Inactive Buffer Removal
            5.1.2 Optimization Phase 2: Infighting Analysis
            5.1.3 Over-Optimization
            5.1.4 Optimization Example
        5.2 NOCS COMPARISON
        5.3 LOW-POWER IMPLEMENTATION CODE EXPORT
    CHAPTER 6 SUMMARY AND FUTURE WORK
        6.1 SUMMARY
        6.2 FUTURE WORK
    REFERENCES

    Improving Packet Predictability of Scalable Network-on-Chip Designs without Priority Pre-emptive Arbitration

    The quest for improving processing power and efficiency is spawning research into many-core systems with hundreds or thousands of cores. With communication forecast to be the foremost performance bottleneck, Networks-on-Chip (NoCs) are the favoured communication infrastructure in this context, mainly because of their scalability and power efficiency. However, contention between non-preemptive NoC packets can cause variation in packet latencies, potentially limiting the overall utilisation of the many-core system. Typical techniques for enhancing latency predictability, such as Virtual Channels or Time Division Multiplexing, are usually hardware expensive, non-scalable, or both. This research explores the use of dynamic and scalable techniques in Network-on-Chip routers to improve packet predictability by countering Head-of-line blocking (a blocked low priority packet blocking a high priority packet) and tailbacking (a low priority packet utilising the link that is required by a high priority packet) of non-preemptive packets. The Priority forwarding and tunnelling technique introduced is designed to detect Head-of-line blocking situations so that the router's internal arbitration parameters can be altered (by forwarding packet parameters down the line) to resolve such issues. The Selective packet splitting technique resolves tailbacking by splitting packets, a low-overhead alternative that emulates the effect of packet preemption. Finally, the thesis presents an architecture that gives routers a notion of timeliness in data packets, enabling packet arbitration based on application-supplied priority and timeliness and thereby improving the quality of service given to lower priority packets. Furthermore, the techniques presented in the thesis do not require additional hardware as the size of the NoC increases. This makes the techniques scalable, as neither the size of the NoC nor the number of packet priorities the NoC has to handle affects their functionality and operation.
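
To make the tailbacking problem concrete, the toy model below compares finish times on a single link with and without splitting the low priority packet. It is a sketch under loud assumptions and not the thesis's router design: the link sends one flit per cycle, the low priority packet starts at cycle 0, the high priority packet arrives at least one cycle later, and resuming a split packet costs a hypothetical `header_flits` of overhead. The function name `link_finish_times` is made up for this illustration.

```python
def link_finish_times(low_flits, high_flits, high_arrival,
                      split=False, header_flits=1):
    """Return (high_finish, low_finish) in cycles for one shared link.

    Toy assumptions (not taken from the thesis): one flit per cycle, the
    low priority packet starts at cycle 0, and a split remainder needs
    `header_flits` extra flits for its new header.
    """
    if high_arrival < 1:
        raise ValueError("high priority packet must arrive after cycle 0")

    if high_arrival >= low_flits:
        # The link is already free when the high priority packet arrives.
        return high_arrival + high_flits, low_flits

    if not split:
        # Non-preemptive link: the high priority packet waits for the
        # whole low priority packet to drain (tailbacking).
        return low_flits + high_flits, low_flits

    # Splitting: cut the low priority packet at the current flit boundary,
    # send the high priority packet, then resume the remainder.
    high_finish = high_arrival + high_flits
    remaining = low_flits - high_arrival
    low_finish = high_finish + header_flits + remaining
    return high_finish, low_finish


no_split = link_finish_times(16, 4, 2)
with_split = link_finish_times(16, 4, 2, split=True)
print(no_split, with_split)  # (20, 16) (6, 21)
```

In this example the high priority packet finishes at cycle 6 instead of cycle 20, while the low priority packet finishes 5 cycles later than it otherwise would (the 4 flits it yielded plus the 1 assumed header flit).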