34 research outputs found
A Time-Predictable Memory Network-on-Chip
To derive safe bounds on worst-case execution times (WCETs), all components of a computer system need to be time-predictable: the processor pipeline, the caches, the memory controller, and memory arbitration on a multicore processor. This paper presents a solution for time-predictable memory arbitration and access for chip-multiprocessors. The memory network-on-chip is organized as a tree with time-division multiplexing (TDM) of accesses to the shared memory. The TDM based arbitration completely decouples processor cores and allows WCET analysis of the memory accesses on individual cores without considering the tasks on the other cores. Furthermore, we perform local, distributed arbitration according to the global TDM schedule. This solution avoids a central arbiter and scales to a large number of processors
A Time-predictable Memory Network-on-Chip
To derive safe bounds on worst-case execution times (WCETs), all components of a computer system need to be time-predictable: the processor pipeline, the caches, the memory controller, and memory arbitration on a multicore processor. This paper presents a solution for time-predictable memory arbitration and access for chip-multiprocessors. The memory network-on-chip is organized as a tree with time-division multiplexing (TDM) of accesses to the shared memory. The TDM based arbitration completely decouples processor cores and allows WCET analysis of the memory accesses on individual cores without considering the tasks on the other cores. Furthermore, we perform local, distributed arbitration according to the global TDM schedule. This solution avoids a central arbiter and scales to a large number of processors
Recommended from our members
Architectures and Design Automation for Photonic Networks On Chip
Chip-scale photonics has emerged as an exciting field which can potentially solve many of the problems plaguing the high-performance computing industry, from large-scale to embedded. In theory, photonics is a superior communication medium because of its higher bandwidth density using wave-division multiplexing and bandwidth-power translucency to distance traveled. In practice, physical-layer design and engineering issues such as optical loss, crosstalk, and packaging have slowed its entry into widespread adoption at the chip and board scale. In this work, we present these issues and potential design improvements. The major contributions, however, are the tools and methods we have developed for the design of photonic interconnection networks, including a system-level simulator and CAD and modeling environment for layout, both of which are publicly available to the research community
On Energy Efficient Computing Platforms
In accordance with the Moore's law, the increasing number of on-chip integrated transistors has enabled modern computing platforms with not only higher processing power but also more affordable prices. As a result, these platforms, including portable devices, work stations and data centres, are becoming an inevitable part of the human society. However, with the demand for portability and raising cost of power, energy efficiency has emerged to be a major concern for modern computing platforms.
As the complexity of on-chip systems increases, Network-on-Chip (NoC) has been proved as an efficient communication architecture which can further improve system performances and scalability while reducing the design cost. Therefore, in this thesis, we study and propose energy optimization approaches based on NoC architecture, with special focuses on the following aspects.
As the architectural trend of future computing platforms, 3D systems have many bene ts including higher integration density, smaller footprint, heterogeneous integration, etc. Moreover, 3D technology can signi cantly improve the network communication and effectively avoid long wirings, and therefore, provide higher system performance and energy efficiency.
With the dynamic nature of on-chip communication in large scale NoC based systems, run-time system optimization is of crucial importance in order to achieve higher system reliability and essentially energy efficiency. In this thesis, we propose an agent based system design approach where agents are on-chip components which monitor and control system parameters such as supply voltage, operating frequency, etc. With this approach, we have analysed the implementation alternatives for dynamic voltage and frequency scaling and power gating techniques at different granularity, which reduce both dynamic and leakage energy consumption.
Topologies, being one of the key factors for NoCs, are also explored for energy saving purpose. A Honeycomb NoC architecture is proposed in this thesis with turn-model based deadlock-free routing algorithms. Our analysis and simulation based evaluation show that Honeycomb NoCs outperform their Mesh based counterparts in terms of network cost, system performance as well as energy efficiency.Siirretty Doriast
Hybrid Router Design for High Performance Photonic Network-On-Chip
With rising density of cores in Chip-Multiprocessors, traditional metallic interconnects won't be able to cater to the high demand in communication bandwidth at lower power consumption. Photonic interconnects are emerging as a very competitive and promising alternative to address these bottlenecks in recent times. The infrastructure for realizing such a communication architecture comprises of Micro Ring-resonator based silicon nano-photonic routers and waveguides. We propose a novel 5x5 photonic router microarchitecture employing mode-division-multiplexing along with wavelength-division-multiplexing and time-division-multiplexing. It increases the aggregate bandwidth almost four times in a network consuming almost 30% less power as compared to other recent photonic routers and laying the foundation of a high performance photonic network-on-chip(PNoC). We validated the feasibility of the proposed architecture and developed a new circuit switched based network simulator(PhotoNoxim) based on the router microarchitecture proposed by us to validate it under various synthetic traffics and benchmark applications
Distributed Time-Predictable Memory Interconnect for Multi-Core Architectures
Multi-core architectures are increasingly adopted in emerging real-time applications where execution time is required to be bounded in the worst case (i.e., time predictability) and low. Memory access latency is the main part forming the overall execution time. A promising approach towards time predictability is to employ distributed memory interconnects, either locally arbitrated interconnects or globally arbitrated interconnects, with arbitration schemes, and the pipelined tree-based structure can break the critical path of multiplexing into short steps with small logic size. It scales to a large number of processors that high clock frequency can be synthesised. This research explores timing behaviour of multi-core architectures with shared distributed memory interconnects and improves distributed time-predictable memory interconnects for multi-core architectures. The contributions are mainly threefold. First, the generic analytical flow is proposed for time-predictable behaviour of memory accesses across multi-core architectures with locally arbitrated interconnects. It guarantees time predictability and safely bound the worst case without exact memory access profiles. Second, the root queue modification with the root queue management is proposed for multi-core architectures with locally arbitrated interconnects that variation of memory access latency is reduced and timing behaviour analysis is facilitated. Third, Meshed Bluetree is proposed as the distributed time-predictable multi-memory interconnect, enabling multiple processors to simultaneously access multiple memory modules
Recommended from our members
Next Generation Silicon Photonic Transceiver: From Device Innovation to System Analysis
Silicon photonics is recognized as a disruptive technology that has the potential to reshape many application areas, for example, data center communication, telecommunications, high-performance computing, and sensing. The key capability that silicon photonics offers is to leverage CMOS-style design, fabrication, and test infrastructure to build compact, energy-efficient, and high-performance integrated photonic systems-on- chip at low cost. As the need to squeeze more data into a given bandwidth and a given footprint increases, silicon photonics becomes more and more promising. This work develops and demonstrates novel devices, methodologies, and architectures to resolve the challenges facing the next-generation silicon photonic transceivers. The first part of this thesis focuses on the topology optimization of passive silicon photonic devices. Specifically, a novel device optimization methodology - particle swarm optimization in conjunction with 3D finite-difference time-domain (FDTD), has been proposed and proven to be an effective way to design a wide range of passive silicon photonic devices. We demonstrate a polarization rotator and a 90â—¦ optical hybrid for polarization-diversity and phase-diversity communications - two important schemes to increase the communication capacity by increasing the spectral efficiency. The second part of this thesis focuses on the design and characterization of the next- generation silicon photonic transceivers. We demonstrate a polarization-insensitive WDM receiver with an aggregate data rate of 160 Gb/s. This receiver adopts a novel architecture which effectively reduces the polarization-dependent loss. In addition, we demonstrate a III-V/silicon hybrid external cavity laser with a tuning range larger than 60 nm in the C-band on a silicon-on-insulator platform. A III-V semiconductor gain chip is hybridized into the silicon chip by edge-coupling to the silicon chip. The demonstrated packaging method requires only passive alignment and is thus suitable for high-volume production. We also demonstrate all silicon-photonics-based transmission of 34 Gbaud (272 Gb/s) dual-polarization 16-QAM using our integrated laser and silicon photonic coherent transceiver. The results show no additional penalty compared to commercially available narrow linewidth tunable lasers. The last part of this thesis focuses on the chip-scale optical interconnect and presents two different types of reconfigurable memory interconnects for multi-core many-memory computing systems. These reconfigurable interconnects can effectively alleviate the memory access issues, such as non-uniform memory access, and Network-on-Chip (NoC) hot-spots that plague the many-memory computing systems by dynamically directing the available memory bandwidth to the required memory interface
Recommended from our members
Photonic Interconnects Beyond High Bandwidth
The extraordinary growth of parallelism in high-performance computing requires efficient data communication for scaling compute performance. High-performance computing systems have been using photonic links for communication of large bandwidth-distance product during the last decade. Photonic interconnection networks, however, should not be a wire-for-wire replacement based on conventional electrical counterparts. Features of photonics beyond high bandwidth, such as transparent bandwidth steering, can implement important functionalities needed by applications. In another aspect, application characteristics can be exploited to design better photonic interconnects. Therefore, this thesis explores codesign opportunities at the intersection between photonic interconnect architectures and high-performance computing applications. The key accomplishments of this thesis, ranging from system level to node level, are as follows.
Chapter 2 presents a system-level architecture that leverages photonic switching to enable a reconfigurable interconnect. The architecture, called Flexfly, reconfigures the inter-group level of the widely-used Dragonfly topology using information about the application’s communication pattern. It can steal additional direct bandwidth for communication-intensive group pairs. Simulations with applications such as GTC, Nekbone and LULESH show up to 1.8x speedup over Dragonfly paired with UGAL routing, along with halved hop count and latency for cross-group messages. To demonstrate the effectiveness of our approach, we built a 32-node Flexfly prototype using a silicon photonic switch connecting four groups and demonstrated 820 ns interconnect reconfiguration time. This is the first demonstration of silicon photonic switching and bandwidth steering in a high-performance computing cluster.
Chapter 3 extends photonic switching to the node level and presents a reconfigurable silicon photonic memory interconnect for many-core architectures. The interconnect targets at important memory access issues, such as network-on-chip hot-spots and non-uniform memory access. Integrated with the processor through 2.5D/3D stacking, a fast-tunable silicon photonic memory tunnel can transparently direct traffic from any off-chip memory to any on-chip interface – thus alleviating the hot-spot and non-uniform access effects. We demonstrated the operation of our proposed architecture using a tunable laser, a 4-port silicon photonic switch (four wavelength-routed memory channels) and a 4x4 mesh network-on-chip synthesized by FPGA. The emulated system achieves a 15-ns channel switching time. Simulations based on a 12-core 4-memory model show that for such switching speeds the interconnect system can realize a 2x speedup for the STREAM benchmark in the hot-spot scenario and a reduction of execution time for data-intensive applications such as 3D stencil and K-means clustering by 23% and 17%, respectively.
Chapters 4 explores application-level characteristics that can be exploited to hide photonic path setup delays. In view of the frequent reuse of optical circuits by many applications, we proposed a circuit-cached scheme that amortizes the setup overhead by maximizing circuit reuses. In order to improve circuit “hit” rates, we developed a reuse-distance based replacement policy called “Farthest Next Use”. We further investigated the tradeoffs between the realized hit rate and energy consumption. Finally, we experimentally demonstrated the feasibility of the proposed concept using silicon photonic devices in an FPGA-controlled network testbed.
Chapter 5 proceeds to develop an application-guided circuit-prefetch scheme. By learning temporal locality and communication patterns from upper-layer applications, the scheme not only caches a set of circuits for reuses, but also proactively prefetches circuits based on predictions. We applied this technique to communication patterns from a spectrum of science and engineering applications. The results show that setup delays via circuit misses are significantly reduced, showing how the proposed technique can improve circuit switching in photonic interconnects