Extending 2-D planar topologies in integrated circuits (ICs) 
Introduction
Device integration on a single die has broadly kept pace with the rate predicted by Moore's law over the past four decades, owing to concerted industry efforts. This trend is set to continue in the near future, with the ITRS setting a DRAM half-pitch target of 18-20 nm for 2018 [1]. For true system-level integration of heterogeneous elements, including specialised and general-purpose digital processing blocks, analogue, mixed-signal and RF functions, and storage, the IC industry has increasingly been looking at 3-D integration. Multi-chip module arrangements, where packaged chips are situated on a single substrate or across a board, and package-level options such as System-in-Package, where stacked dies are connected by bond wires, have been augmented by technologies capable of die-to-die and die-to-wafer integration using through-silicon vias (TSVs) (see Fig. 1). These options generally trade off cost, performance, functionality and footprint [2].
The generalised network-on-chip (NoC) architecture has been proposed as suitable for integrating heterogeneous elements and managing the communication between them efficiently. Sophisticated system-level analyses identify a memory-related bottleneck for demanding, high-throughput applications such as multimedia. A 3-D stacked memory system provides an interesting storage option for a general-purpose NoC, potentially easing the memory bottleneck.
Memory Integration Technology
The main potential benefits of 3-D integration include a reduced overall footprint and a reduction in overall wiring length for a given system configuration, with the associated improvements in propagation delay and energy consumption [4]. It also allows disparate technologies to be integrated, which for a general-purpose NoC can mean the chip-level integration of DRAM and flash with CMOS logic. The reduced interconnect parasitics can further significantly simplify the circuit and power distribution network design for high-performance applications.
Fig. 1. (a) System-in-Package (die stacking); (b) wafer-level integration with die stacking.
In particular, the massive amounts of inexpensive storage currently supplied by magnetic hard disks could potentially be supplied by flash memory located within the same chip, eliminating the multiple communication hierarchies from the chip to off-chip caches to board-level traces to cables and back. This can potentially improve the raw data bandwidth by several orders of magnitude. Turning this gain into an improvement in system-level functionality that is discernible to the end user will require a shift away from the traditional architecture with multiple hierarchical caches.
Challenges in 3-D Integration
Reliability of such a system is a concern, since flash cells have an activity-dependent lifespan: they are subject to a higher voltage stress during programming and, for multi-level cells, also during reading [5]. Wear-levelling algorithms that spread the load evenly across cells, together with error correction and redundancy to account for manufacturing variances, are common in flash memories, but may need to be augmented with innovative techniques.
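The idea behind wear-levelling can be illustrated with a minimal sketch. It is a toy model and not the scheme of any particular flash controller: the class name, the block granularity and the policy of always writing to the least-erased free block are assumptions made for illustration, and read-induced stress, error correction and redundancy are not modelled.

```python
import heapq


class WearLeveller:
    """Toy dynamic wear-levelling (hypothetical model): every write of a
    logical block is redirected to the free physical block that has been
    erased the fewest times."""

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        # Min-heap of (erase_count, physical_block) for all free blocks.
        self.free = [(0, p) for p in range(num_physical_blocks)]
        heapq.heapify(self.free)
        self.mapping = {}  # logical block -> physical block

    def write(self, logical_block):
        if not self.free:
            raise RuntimeError("no free physical blocks")
        # Place the new data in the least-worn free block.
        _, new_phys = heapq.heappop(self.free)
        old_phys = self.mapping.get(logical_block)
        self.mapping[logical_block] = new_phys
        if old_phys is not None:
            # The stale copy is erased and returned to the free pool.
            self.erase_counts[old_phys] += 1
            heapq.heappush(self.free, (self.erase_counts[old_phys], old_phys))
        return new_phys


# Repeatedly rewriting one logical block spreads erases over all blocks.
leveller = WearLeveller(num_physical_blocks=4)
for _ in range(100):
    leveller.write(0)
```

In this sketch, rewriting a single logical block 100 times causes 99 erases, and the erase counts of the four physical blocks differ by at most one, whereas a fixed mapping would place all of the wear on one block.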
The main obstacle to 3-D integration is poor thermal conductivity and heat dissipation, and the resultant temperature rise due to the high power density [6]. Thermal vias can alleviate this problem, but again innovative techniques are called for, such as micro-channel cooling [7] and dynamic activity management with built-in sensors and sensing functions to monitor the operating temperature.
A major consideration in 3-D integration is fabrication cost. Yield is very sensitive to chip stacking, and can decrease exponentially with the number of layers in wafer-to-wafer integration. This is because, even if the wafers individually have acceptable yield, when they are combined in a vertical stack the probability of a good die on one layer being paired with a bad die on another layer is high. Mechanical stress during the assembly process, and combined mechanical and thermal stress, can cause failures as well as parametric shifts. Testing is significantly more complicated, and the turn-around time is longer. However, these issues are currently being heavily researched; integrated test functions and an inexpensive test insert that sits between dies in the stack may be one solution [8].
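The compounding of yield in wafer-to-wafer stacking can be made concrete with a short sketch. It assumes, purely for illustration, that die defects on different wafers are statistically independent and that the bonding step itself causes no additional loss; the function name and the 90% per-layer figure are hypothetical and are not taken from the cited work.

```python
def stack_yield(layer_yields):
    """Probability that every die in a vertical stack is good, assuming
    defects on different layers are independent (illustrative model)."""
    result = 1.0
    for y in layer_yields:
        if not 0.0 <= y <= 1.0:
            raise ValueError("each yield must lie in [0, 1]")
        result *= y
    return result


# Hypothetical example: four layers, each with a 90% die yield.
four_layer = stack_yield([0.9] * 4)  # 0.9 ** 4 = 0.6561
```

Under these assumptions, a stack of n layers with identical per-layer yield y has a yield of y to the power n, which is the exponential decrease described above. Die-to-wafer integration can avoid part of this loss when only known-good dies are stacked.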
Improvements in Performance
Three principal operations of a digital system can be identified: the binary switching transfer, communication of a bit and storage of a bit. Each of these operations can be characterised by a metric, such as gate delay for the binary switching transfer or interconnect latency for communication. High-level performance metrics such as MIPS, bandwidth and throughput can be calculated from these metrics. By considering the limitations that physical considerations impose on each of these operations at different hierarchical levels, ranging from the fundamental to the system level, an exploration space for performance can be defined. The key metrics relating to each of these operations for the 3-D stacked memory system will identify its location within the performance space and highlight potential improvements.
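As a rough illustration of how high-level metrics follow from the low-level ones, the sketch below derives two simple upper bounds. The models are deliberately simplistic assumptions and are not the analysis proposed here: the clock bound treats a critical path as a chain of identical gates, and the bandwidth bound treats a link as unpipelined, moving one word per interconnect latency. All names and figures are hypothetical.

```python
def max_clock_hz(gate_delay_s, logic_depth):
    """Upper bound on clock frequency for a critical path made of
    logic_depth identical gates (assumed model)."""
    return 1.0 / (gate_delay_s * logic_depth)


def peak_bandwidth_bits_per_s(link_width_bits, interconnect_latency_s):
    """Upper bound on the bandwidth of an unpipelined link that moves
    link_width_bits once per interconnect latency (assumed model)."""
    return link_width_bits / interconnect_latency_s


# Hypothetical figures: 10 ps gates, 20 gates per path, 64-bit link, 1 ns latency.
clock = max_clock_hz(10e-12, 20)
bandwidth = peak_bandwidth_bits_per_s(64, 1e-9)
```

In this simple model, shortening the interconnect, as vertical stacking aims to do, lowers the latency term and raises the bandwidth bound in direct proportion.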
Such an analysis can be used to identify future opportunities for NoC based systems in demanding applications.
