A distributed interleaving scheme for efficient access to WideIO DRAM memory
Achieving the main memory (DRAM) required bandwidth at acceptable
power levels for current and future applications is a major
challenge for System-on-Chip designers for mobile platforms.
Three dimensional (3D) integration and 3D stacked DRAM memories
promise to provide a significant boost in bandwidth at low
power levels by exploiting multiple channels and wide data interfaces.
In this paper, we address the problem of efficiently exploiting
the multiple channels provided by standard (JEDEC’s WIDEIO)
3D-stacked memories, to extract maximal effective bandwidth
and minimize latency for main memory access. We propose a new
distributed interleaved access method that leverages the on-chip interconnect
to simplify the design and implementation of the DRAM
controller, without impacting performance compared to traditional
centralized implementations. We perform experiments on a realistic
workload for a mobile communication and multimedia platform
and show that our proposed distributed interleaving memory access
method improves the overall throughput while minimally impacting
the performance of latency-sensitive communication flows.
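As an illustration of what interleaving accesses across multiple memory channels means, the following is a minimal sketch of a block-interleaved address map. It is not the scheme proposed in the paper: the function names, the default of four channels, and the 64-byte interleaving granule are all assumptions chosen for the example.

```python
def channel_for_address(addr: int, num_channels: int = 4, granule_bytes: int = 64) -> int:
    """Return the index of the channel that serves a byte address.

    Consecutive granules are assigned to consecutive channels, so a
    linear burst of traffic is spread over all channels.
    The parameter defaults are illustrative assumptions.
    """
    return (addr // granule_bytes) % num_channels


def channel_local_address(addr: int, num_channels: int = 4, granule_bytes: int = 64) -> int:
    """Return the byte offset of the address inside its own channel."""
    block = addr // granule_bytes          # global granule index
    offset = addr % granule_bytes          # byte offset inside the granule
    return (block // num_channels) * granule_bytes + offset


if __name__ == "__main__":
    for addr in (0, 64, 128, 192, 256):
        print(addr, channel_for_address(addr), channel_local_address(addr))
```

In a distributed arrangement, a mapping of this kind could be evaluated at the network interfaces of the interconnect rather than inside a single centralized controller; whether the paper does exactly this is not stated in the abstract.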
A DRAM Centric NoC Architecture and Topology Design Approach
Most communication traffic in today's Systems on Chips (SoCs) is DRAM centric. The NoC should be designed to efficiently handle the many-to-one communication pattern, funneling to and from the DRAM controller. In this paper, we motivate the use of a separate network for the DRAM traffic and justify the power overhead and performance improvement obtained, when compared to traditional solutions. We also show how the topology of this DRAM network can be designed and optimized to account for the funnel-shaped pattern. Our experiments on a realistic SoC multimedia benchmark show a large reduction in power consumption and improvement in performance when compared to existing solutions.
CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers
Manycore chips are emerging as the architecture of choice to provide power efficiency and improve performance, while riding Moore’s Law. In these architectures, on-chip interconnects play a pivotal role in ensuring power and performance scalability. As supply voltages begin to level off in future technologies, chip designs in general and interconnects in particular will require specialization to meet power and performance objectives. In this work, we make the observation that cache-coherent manycore server chips exhibit a duality in on-chip network traffic. Request traffic largely consists of simple control messages, while response traffic often carries cache-block-sized payloads. We present Cache-Coherence Network-on-Chip (CCNoC), a design that specializes the NoC to fit the demands of server workloads via a pair of asymmetric networks tuned to the type of traffic traversing them. The networks differ in their datapath width, router microarchitecture, flow control strategy, and delay. The resulting heterogeneous CCNoC architecture enables significant gains in power efficiency over conventional NoC designs at similar performance levels. Our evaluation reveals that a 4x4 mesh-based chip multiprocessor with the proposed CCNoC organization running commercial server workloads is 15-28% more energy efficient than various state-of-the-art single- and dual-network organizations.
A method to remove deadlocks in Networks-on-Chips with Wormhole flow control
Networks-on-Chip (NoCs) are a promising interconnect paradigm to address the communication bottleneck of Systems-on-Chip (SoCs). Wormhole flow control is widely used as the transmission protocol in NoCs, as it offers high throughput and low latency. To match the application characteristics, customized irregular topologies and routing functions are used. With wormhole flow control and custom irregular NoC topologies, deadlocks can occur during system operation. Ensuring deadlock-free operation of custom NoCs is a major challenge. In this paper, we address this important issue and present a method to remove deadlocks in application-specific NoCs. Our method can be applied to any NoC topology and routing function, and the potential deadlocks are removed by adding a minimal number of virtual or physical channels. Experiments on a variety of realistic benchmarks show that our method results in a large reduction in the number of resources needed (88% on average), as well as in NoC power consumption and area (66% area savings on average), when compared to the state-of-the-art deadlock removal methods.
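To make the deadlock condition concrete, the sketch below checks a set of routes for potential deadlock using the classical channel dependency graph criterion (a wormhole network with fixed routes can deadlock only if the graph of "holds one channel while requesting the next" dependencies has a cycle). This illustrates detection only and is not the removal method of the paper; the function names and the representation of a route as a list of channel labels are assumptions made for the example.

```python
from collections import defaultdict


def build_cdg(routes):
    """Build a channel dependency graph from routes.

    Each route is a list of channel labels in traversal order. A packet
    holding one channel may wait for the next one on its route, which
    adds a directed edge between the two channels.
    """
    cdg = defaultdict(set)
    for route in routes:
        for held, wanted in zip(route, route[1:]):
            cdg[held].add(wanted)
    return cdg


def has_cycle(cdg):
    """Return True if the dependency graph contains a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on DFS stack, finished
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in cdg.get(node, ()):
            if color[nxt] == GREY:    # back edge closes a cycle
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(cdg))


if __name__ == "__main__":
    ring = [["a", "b"], ["b", "c"], ["c", "a"]]
    print(has_cycle(build_cdg(ring)))       # cyclic dependencies
    # Moving the last hop onto an extra (virtual) channel "a2" breaks the cycle.
    fixed = [["a", "b"], ["b", "c"], ["c", "a2"]]
    print(has_cycle(build_cdg(fixed)))
```

The second case hints at why adding virtual or physical channels can remove a deadlock: re-routing one dependency onto a new channel breaks the cycle. Choosing a minimal set of such additions is the harder problem the abstract refers to.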
NoC topology synthesis for supporting shutdown of voltage islands in SoCs
In many Systems on Chips (SoCs), the cores are clustered into voltage islands. When the cores in an island are unused, the entire island can be shut down to reduce leakage power consumption. Today, however, the interconnect architecture is a bottleneck in allowing the shutdown of the islands. In this paper, we present a synthesis approach to obtain customized application-specific Networks on Chips (NoCs) that can support the shutdown of voltage islands. Our results on realistic SoC benchmarks show that the resulting NoC designs have only a negligible overhead in SoC active power consumption (average of 3%) and area (average of 0.5%) to support the shutdown of islands. The shutdown support provided can lead to significant leakage power savings and hence total power savings.
SunFloor 3D: A Tool for Networks on Chip Topology Synthesis for 3-D Systems on Chips
Three-dimensional integrated circuits (3D-ICs) are
a promising approach to address the integration challenges
faced by current systems on chips (SoCs). Designing an efficient
network on chip (NoC) interconnect for a 3-D SoC that meets
not only the application performance constraints but also the
constraints imposed by the 3-D technology is a significant
challenge. In this paper, we present a design tool, SunFloor
3D, to synthesize application-specific 3-D NoCs. The proposed
tool determines the best NoC topology for the application,
finds paths for the communication flows, assigns the network
components to the 3-D layers, and places them in each layer. We
perform experiments on several SoC benchmarks and present a
comparative study between 3-D and 2-D NoC designs. Our studies
show large improvements in interconnect power consumption
(average of 38%) and delay (average of 13%) for the 3-D NoC
when compared to the corresponding 2-D implementation. Our
studies also show that the synthesized topologies result in large
power (average of 54%) and delay savings (average of 21%) when
compared to standard topologies.
Comparative Analysis of NoCs for Two-Dimensional Versus Three-Dimensional SoCs Supporting Multiple Voltage and Frequency Islands
In many of today's system-on-chip (SoC) designs, the
cores are partitioned into multiple voltage and frequency islands
(VFIs), and the global interconnect is implemented using a packet-switched
network on chip (NoC). In such VFI-based designs,
the benefits of 3-D integration in reducing the NoC power or
delay are unclear, as a significant fraction of power is spent in
link-level synchronization, and stacked designs may impose many
synchronization boundaries. In this brief, we show the quantitative
benefits of the 3-D technology on NoC power and delay values
for such application-specific designs. We show a design flow for
building application-specific NoCs for both 2-D and 3-D SoCs with
multiple VFIs. We present a detailed case study of NoCs designed
using the flow for a mobile platform. Our results show that power
savings strongly depend on the number of VFIs used (up to 32%
reduction). This motivates the need for an early architectural
space exploration, as allowed by our flow. Our experiments also
show that the reduction in delay is only marginal when moving
from 2-D to 3-D systems (up to 11%), if both are designed
efficiently.
A Floorplan-aware Interactive Tool Flow for NoC Design and Synthesis
In this paper we present a floorplan-aware toolchain for NoC
design and synthesis integrated with a graphical front-end. The
resulting design methodology is highly automated yet entails rich
interaction with the user, spanning across traffic flow specification,
topology synthesis and physical floorplanning, with back-annotation
capabilities and opportunities for incremental design. We exploit the
proposed tool to implement some NoC-based case studies. We show
that the easy-to-use environment not only saves a great amount of
time and effort, but also improves the quality of the final netlist,
thanks to the optimizations unlocked by early-stage interaction
between the designer and the toolchain.