2,527 research outputs found

    A survey on scheduling and mapping techniques in 3D Network-on-chip

    Full text link
    Network-on-Chips (NoCs) have been widely employed in the design of multiprocessor system-on-chips (MPSoCs) as a scalable communication solution. NoCs enable communications between on-chip Intellectual Property (IP) cores and allow those cores to achieve higher performance by outsourcing their communication tasks. Mapping and Scheduling methodologies are key elements in assigning application tasks, allocating the tasks to the IPs, and organising communication among them to achieve some specified objectives. The goal of this paper is to present a detailed state-of-the-art of research in the field of mapping and scheduling of applications on 3D NoC, classifying the works based on several dimensions and giving some potential research directions

    CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5 D, and 3D Processor-Memory Systems

    Get PDF
    Processing cores and the accompanying main memory working in tandem enable the modern processors. Dissipating heat produced from computation, memory access remains a significant problem for processors. Therefore, processor thermal management continues to be an active research topic. Most thermal management research takes place using simulations, given the challenges of measuring temperature in real processors. Since core and memory are fabricated on separate packages in most existing processors, with the memory having lower power densities, thermal management research in processors has primarily focused on the cores. Memory bandwidth limitations associated with 2D processors lead to high-density 2.5D and 3D packaging technology. 2.5D packaging places cores and memory on the same package. 3D packaging technology takes it further by stacking layers of memory on the top of cores themselves. Such packagings significantly increase the power density, making processors prone to heating. Therefore, mitigating thermal issues in high-density processors (packaged with stacked memory) becomes an even more pressing problem. However, given the lack of thermal modeling for memories in existing interval thermal simulation toolchains, they are unsuitable for studying thermal management for high-density processors. To address this issue, we present CoMeT, the first integrated Core and Memory interval Thermal simulation toolchain. CoMeT comprehensively supports thermal simulation of high- and low-density processors corresponding to four different core-memory configurations - off-chip DDR memory, off-chip 3D memory, 2.5D, and 3D. CoMeT supports several novel features that facilitate overlying system research. Compared to an equivalent state-of-the-art core-only toolchain, CoMeT adds only a ~5% simulation-time overhead. The source code of CoMeT has been made open for public use under the MIT license.Comment: https://github.com/marg-tools/CoMe

    Resource and thermal management in 3D-stacked multi-/many-core systems

    Full text link
    Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures. 3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead. This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing. This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory. At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads. On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.2018-03-09T00:00:00

    Architectural Impacts of RFiop: RF to Address I/O Pad and Memory Controller Scalability

    Get PDF
    Despite power boundaries, Moore's law is still present via scaling the number of cores, which keeps adding demands for more memory bandwidth (MBW) requested by these cores. To obtain higher MBW levels, it is fundamental to address memory controller (MC) scalability. However, MC scalability growth is limited by I/O pin counts scaling. To underline MC and pin scaling, a radio frequency(RF) I/O pad-scalable package-based (RFiop) memory organization is further investigated. In RFiop, a RF pad (RFpad) is defined as a quilt-packaging (QP) coplanar waveguide employed at RF ranges. An RFpad connects a rank to an RFMC which is formed by coupling MCs to RF transmitter/receivers. By using QP package to explore the architectural benefits of laying out ranks, RFiop replaces the traditional memory path with an RF-based one, while exploring the scalability of RFpads/RFMCs via RF signaling. When evaluating RFiop, our findings show that MBW/performance are enhanced by around 4.3x which can be viewed as a diminution in transaction queue occupancy/latency as well as using a reduced and scalable 4-8 RFpads per RFMC. RFiop architectural area benefits allow MBW/performance improvements of around 3.2x, while reducing interconnection energy up to 78%

    Walter: Wide I/O Scaling of Number of Memory Controllers Versus Frequency and Voltage

    Get PDF
    Computational application demands do push the scaling of the number of cores, which themselves further increase the demand for more bandwidth. The use of larger rank widths and/or scaling the number of memory controllers (MCs) is a straightforward way to increase memory bandwidth. Connecting wide ranks and MCs via low-capacitance Through Silicon Vias (TSVs) favors high-bandwidth 3DStacking systems (e.g. Wide I/O). Given that voltage and frequency scaling (VFS) lower power utilization but the use of lower clock frequencies reduces bandwidth, this article proposes Walter as a W ide I/O technique that trades off sc al ing of the number of memory con t roll e rs (MCs) versus clock fr equency and voltage (VFS) to mitigate low bandwidth and improve energy-per-bit usage. Our findings show that Walter ’s Wide I/O architectural benefits of using a larger number of MCs coupled with wider ranks when combined to VFS are promising: compared to the baseline for a 75% frequency/voltage reduction, MC scalability improved memory bandwidth by 2.4x and energy-per-bit reduced by 20% (most benchmarks for up to 16 MCs). Walter ’s architectural replacement of ranks set at specification frequencies with ones set at lower frequencies allows temperature reduction thus likely allowing further rank stacking
    corecore