65 research outputs found

    Accelerating Fully Connected Neural Network on Optical Network-on-Chip (ONoC)

    Fully Connected Neural Network (FCNN) is a class of Artificial Neural Networks widely used in computer science and engineering, but training can take a long time with large datasets on existing many-core systems. Optical Network-on-Chip (ONoC), an emerging chip-scale optical interconnection technology, has great potential to accelerate FCNN training thanks to its low transmission delay, low power consumption, and high throughput. However, existing methods based on Electrical Network-on-Chip (ENoC) do not carry over to ONoC because of the unique properties of ONoC. In this paper, we propose a fine-grained parallel computing model for accelerating FCNN training on ONoC and derive the optimal number of cores for each execution stage, with the objective of minimizing the total time to complete one epoch of FCNN training. To allocate the optimal number of cores for each execution stage, we present three mapping strategies and compare their advantages and disadvantages in terms of hotspot level, memory requirement, and state transitions. Simulation results show that the average prediction error for the optimal number of cores in NN benchmarks is within 2.3%. We further carry out extensive simulations which demonstrate that FCNN training time can be reduced by 22.28% and 4.91% on average using our proposed scheme, compared with traditional parallel computing methods that either allocate a fixed number of cores or allocate as many cores as possible, respectively. Compared with ENoC, simulation results show that under batch sizes of 64 and 128, ONoC reduces training time by 21.02% and 12.95% and saves 47.85% and 39.27% of energy on average, respectively.
    Comment: 14 pages, 10 figures. This paper is under second review at IEEE Transactions on Computers.

    Pulsar: Design and Simulation Methodology for Dynamic Bandwidth Allocation in Photonic Network-on-Chip Architectures in Heterogeneous Multicore Systems

    As the computing industry moved toward faster and more energy-efficient solutions, multicore computers proved to be dependable. Soon after, the Network-on-Chip (NoC) paradigm made headway as an effective method of connecting multiple cores on a single chip. These on-chip networks have been used to relay communication between homogeneous and heterogeneous sets of cores and core clusters. However, the variation in bandwidth requirements of heterogeneous systems is often neglected. As a result, at a given moment bandwidth may be in excess at one node while it is insufficient at another, leading to lower performance and higher energy costs. This work proposes and examines dynamic schemes for the allocation of photonic channels in a Photonic Network-on-Chip (PNoC) as an alternative to their static-provision counterparts. It also proposes a method of simulating and selecting the characteristics of a dynamic system at design time so as to achieve maximum system performance in a PNoC for a given application type.

    Min/max time limits and energy penalty of communication scheduling in ring-based ONoC

    Recent advances in the integration of photonic devices make ONoC a promising future communication medium in the MPSoC domain. As ONoC can support the Wavelength Division Multiplexing (WDM) technique, communications between cores can be improved by allocating one or several wavelengths to each communication. However, WDM introduces wavelength crosstalk, which requires increasing the laser power to ensure accurate communication between cores. For the designer, exploring this design space (execution time vs. power consumption) is therefore not an easy task because of the large number of wavelength allocation combinations. The contribution presented in this paper evaluates the two extreme bounds of this design space for different communication scenarios. To address this problem, we model the wavelength allocation with two different objective functions to compute the bounds in terms of execution time. Furthermore, from an accurate model of crosstalk between the wavelengths, we compute the energy penalty for each communication scenario. The results presented in this paper highlight the tradeoff between execution time and energy consumption, and the opportunity for communication optimisation through efficient use of the WDM technique.

    High-Performance and Wavelength-Reused Optical Network on Chip (ONoC) Architectures and Communication Schemes for Manycore Processor

    Optical Network on Chip (ONoC) is an emerging chip-scale optical interconnection technology for high-performance and power-efficient inter-core communication in many-core processors. By using silicon photonic interconnects to transmit data packets as optical signals, it can achieve ultra-low communication delay, high bandwidth capacity, and low power dissipation. With Wavelength Division Multiplexing (WDM), multiple optical signals can be transmitted simultaneously in the same optical interconnect on different wavelengths. WDM-based ONoC has therefore recently become an active research topic. However, the maximum number of available wavelengths is restricted if optical communication in ONoC is to remain reliable and power-efficient. Hence, designing a high-performance and power-efficient ONoC architecture with a limited number of wavelengths is an important and challenging problem. In this thesis, the design methodology of wavelength-reused ONoC architecture is explored. With the wavelength reuse scheme in optical routing paths, high-performance and power-efficient communication is realized for many-core processors using only a small number of available wavelengths. Three wavelength-reused ONoC architectures and communication schemes are proposed to address different communication requirements, i.e., network scalability, multicast communication, and dark silicon. Firstly, WRH-ONoC, a wavelength-reused hierarchical Optical Network on Chip architecture, is proposed to achieve high network scalability, namely low communication delay and high throughput capacity for hundreds of thousands of cores, by reusing the limited number of available wavelengths with modest hardware cost and energy overhead. WRH-ONoC combines the advantages of non-blocking communication in each lambda-router and wavelength reuse across all lambda-routers through hierarchical networking. Both theoretical analysis and simulation results indicate that WRH-ONoC achieves a marked improvement in communication performance and scalability (e.g., a 46.0% reduction in zero-load packet delay and a 72.7% improvement in network throughput for 400 cores, with small hardware cost and energy overhead) in comparison with existing schemes. Secondly, DWRMR, a dynamic wavelength-reused multicast scheme based on the optical multicast ring, is proposed for the multicast communications that are widespread in many-core processors. In DWRMR, an optical multicast ring is dynamically constructed for each multicast group, and the multicast packets are transmitted in a single-send-multi-receive manner requiring only one wavelength. All the cores in the same multicast group can reuse the established multicast ring through an optical token arbitration scheme for interactive multicast communications, thereby avoiding the frequent construction of dedicated multicast routing paths for each core. Simulation results indicate that DWRMR can reduce end-to-end packet delay by more than 50% at slight hardware cost, or require only half the number of wavelengths to achieve the same performance as existing schemes. Thirdly, Dark-ONoC, a dynamically configurable ONoC architecture, is proposed for many-core processors with dark silicon. Dark silicon is an inevitable phenomenon in which only a small number of cores can be activated simultaneously while the other cores must stay in the dark state (power-gated) because of the restricted power budget. Dark-ONoC periodically allocates non-blocking optical routing paths only between the active cores, using as few wavelengths as possible. Thus, it can obtain high-performance communication and low power consumption at the same time. Extensive simulations are conducted with dark silicon patterns from both synthetic distributions and real data traces. The simulation results indicate that the number of wavelengths is reduced by around 15% and the overall power consumption is reduced by 23.4% compared to existing schemes. Finally, this thesis draws several important principles for the design of wavelength-reused ONoC architecture and summarizes some prospective issues for future research.

    Resource and thermal management in 3D-stacked multi-/many-core systems

    Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, thus providing a large amount of on-chip resources and shortening communication latency. These benefits, however, are limited by the challenges of high power densities and temperatures. 3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead. This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing. This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory. At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity, the memory object level, together with a page allocation policy that leverages the heterogeneity of available memory modules and caters to the diverse memory requirements of workloads. On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies, coupled with an adaptive thermal tuning policy, minimize the required thermal tuning power for PNoC and in this way help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.
    2018-03-09