2,527 research outputs found
A survey on scheduling and mapping techniques in 3D Network-on-chip
Network-on-Chips (NoCs) have been widely employed in the design of
multiprocessor system-on-chips (MPSoCs) as a scalable communication solution.
NoCs enable communications between on-chip Intellectual Property (IP) cores and
allow those cores to achieve higher performance by outsourcing their
communication tasks. Mapping and Scheduling methodologies are key elements in
assigning application tasks, allocating the tasks to the IPs, and organising
communication among them to achieve some specified objectives. The goal of this
paper is to present a detailed state-of-the-art of research in the field of
mapping and scheduling of applications on 3D NoC, classifying the works based
on several dimensions and giving some potential research directions
CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5 D, and 3D Processor-Memory Systems
Processing cores and the accompanying main memory working in tandem enable
the modern processors. Dissipating heat produced from computation, memory
access remains a significant problem for processors. Therefore, processor
thermal management continues to be an active research topic. Most thermal
management research takes place using simulations, given the challenges of
measuring temperature in real processors. Since core and memory are fabricated
on separate packages in most existing processors, with the memory having lower
power densities, thermal management research in processors has primarily
focused on the cores.
Memory bandwidth limitations associated with 2D processors lead to
high-density 2.5D and 3D packaging technology. 2.5D packaging places cores and
memory on the same package. 3D packaging technology takes it further by
stacking layers of memory on the top of cores themselves. Such packagings
significantly increase the power density, making processors prone to heating.
Therefore, mitigating thermal issues in high-density processors (packaged with
stacked memory) becomes an even more pressing problem. However, given the lack
of thermal modeling for memories in existing interval thermal simulation
toolchains, they are unsuitable for studying thermal management for
high-density processors.
To address this issue, we present CoMeT, the first integrated Core and Memory
interval Thermal simulation toolchain. CoMeT comprehensively supports thermal
simulation of high- and low-density processors corresponding to four different
core-memory configurations - off-chip DDR memory, off-chip 3D memory, 2.5D, and
3D. CoMeT supports several novel features that facilitate overlying system
research. Compared to an equivalent state-of-the-art core-only toolchain, CoMeT
adds only a ~5% simulation-time overhead. The source code of CoMeT has been
made open for public use under the MIT license.Comment: https://github.com/marg-tools/CoMe
Resource and thermal management in 3D-stacked multi-/many-core systems
Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures.
3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead.
This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing.
This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory.
At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads.
On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.2018-03-09T00:00:00
Recommended from our members
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In additional to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technology as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects
Architectural Impacts of RFiop: RF to Address I/O Pad and Memory Controller Scalability
Despite power boundaries, Moore's law is still present via scaling the number of cores, which keeps adding demands for more memory bandwidth (MBW) requested by these cores. To obtain higher MBW levels, it is fundamental to address memory controller (MC) scalability. However, MC scalability growth is limited by I/O pin counts scaling. To underline MC and pin scaling, a radio frequency(RF) I/O pad-scalable package-based (RFiop) memory organization is further investigated. In RFiop, a RF pad (RFpad) is defined as a quilt-packaging (QP) coplanar waveguide employed at RF ranges. An RFpad connects a rank to an RFMC which is formed by coupling MCs to RF transmitter/receivers. By using QP package to explore the architectural benefits of laying out ranks, RFiop replaces the traditional memory path with an RF-based one, while exploring the scalability of RFpads/RFMCs via RF signaling. When evaluating RFiop, our findings show that MBW/performance are enhanced by around 4.3x which can be viewed as a diminution in transaction queue occupancy/latency as well as using a reduced and scalable 4-8 RFpads per RFMC. RFiop architectural area benefits allow MBW/performance improvements of around 3.2x, while reducing interconnection energy up to 78%
Walter: Wide I/O Scaling of Number of Memory Controllers Versus Frequency and Voltage
Computational application demands do push the scaling of the number of cores, which themselves further increase the demand for more bandwidth. The use of larger rank widths and/or scaling the number of memory controllers (MCs) is a straightforward way to increase memory bandwidth. Connecting wide ranks and MCs via low-capacitance Through Silicon Vias (TSVs) favors high-bandwidth 3DStacking systems (e.g. Wide I/O). Given that voltage and frequency scaling (VFS) lower power utilization but the use of lower clock frequencies reduces bandwidth, this article proposes Walter as a W ide I/O technique that trades off sc al ing of the number of memory con t roll e rs (MCs) versus clock fr equency and voltage (VFS) to mitigate low bandwidth and improve energy-per-bit usage. Our findings show that Walter ’s Wide I/O architectural benefits of using a larger number of MCs coupled with wider ranks when combined to VFS are promising: compared to the baseline for a 75% frequency/voltage reduction, MC scalability improved memory bandwidth by 2.4x and energy-per-bit reduced by 20% (most benchmarks for up to 16 MCs). Walter ’s architectural replacement of ranks set at specification frequencies with ones set at lower frequencies allows temperature reduction thus likely allowing further rank stacking
- …