Abstract-In this paper, we present 3-D-DATE, a circuit-level dynamic random access memory (DRAM) area, timing, and energy model that models both the front and back end of 3-D integrated DRAM designs from 90 to 16 nm, across a broader range of emerging transistor devices and through-silicon vias. This paper improves upon previous studies by providing detailed process models all the way down to the 16-nm technology node and by incorporating DRAMs implemented with emerging gate transistor devices. Finally, we validate the model against several commodity planar and 3-D DRAMs, from 80- to 30-nm process nodes, with the following metrics: energy with a mean error of 5%-1% and a standard deviation up to 9.8%, speed with a mean error of 13%-27% and a standard deviation up to 24%, and area within 3%-1% and a standard deviation up to 4.2%.
3-D-DATE: A Circuit-Level Three-Dimensional DRAM Area, Timing, and Energy Model
I. INTRODUCTION
THREE-DIMENSIONAL die stacking has emerged as a promising way to satisfy the continually increasing demand for high-density, low-power dynamic random access memory (DRAM) in a small form factor [1]. The DRAM industry has proposed and implemented 3D die-stacked DRAMs for off-chip and on-chip stacked memory applications [2], [3]. Many studies have shown that 3D DRAM provides higher bandwidth with lower power consumption and have proposed methods to utilize 3D DRAM in memory hierarchies [4]-[8]. However, most of these studies have been limited in scope to vendor-provided DRAM data from datasheets and vendor-published documents. This has led to a growing gap in knowledge of the area, timing, and energy modeling of 3D DRAMs for use in the design of processor architectures that could benefit from 3D DRAMs.
To facilitate architecture-level DRAM research, a few memory models have been proposed that calculate the power, area, and access latency of a DRAM design in place of vendor-provided data. CACTI is the most widely known of these models [9]. CACTI models caches, SRAMs, and DRAMs. Its architectural and circuit-level model includes assumptions for optimizing caches and SRAMs and is suitable for modeling embedded DRAM. Rambus has also proposed a circuit-level DRAM power model [10]. The Rambus model provides a detailed roadmap of DRAM physical dimensions, but it cannot calculate latency because it lacks a roadmap of resistance and transistor turn-on current. DArT provides a component-based DRAM model [11]. DArT models DRAM at the 68 nm node and validates at the 45 nm node by using results scaled from the 68 nm design. CACTI, the Rambus model, and DArT target planar DRAMs and are therefore unable to model 3D DRAMs. CACTI-3DD inherits the CACTI framework and was published to model 3D DRAMs [12]. However, the model does not support DRAMs implemented in or below the 21 nm technology node. In addition, CACTI, the Rambus model, DArT, and CACTI-3DD are all designed around the conventional transistor. Thus, existing models are unable to cope with DRAMs implemented with emerging gate transistor devices and the related architectural changes.
The authors are with the Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695-7911 USA (e-mail: jongbeom.park@ncsu.edu; rhett_davis@ncsu.edu; paulf@ncsu.edu).
Digital Object Identifier 10.1109/TCSI.2018.2868901
In this work, we present a DRAM area, timing, and energy (3D-DATE) model built with the same empirical modeling method as CACTI-3DD and the Rambus DRAM model. By addressing the modeling issues above with the following features, 3D-DATE offers architecture researchers a more concrete model.
• 3D-DATE provides transistor-level accuracy across various DRAM process nodes, from 90 nm to 16 nm process nodes.
• 3D-DATE presents four different transistor models for modeling DRAM. Recessed channel array transistor (RCAT) [13] and sphere-shaped-RCAT (SRCAT) [14] models are provided in 3D-DATE for modeling traditional commodity DRAMs. 3D-DATE also provides an emerging gate transistor device, the vertical channel access transistor (VCAT) [15] to reflect the future DRAM layout trend and its effect on the area, energy, and speed. To support modeling of general transistor models in DRAM peripheral circuits, a conventional metal-oxide-semiconductor field-effect transistor (MOSFET) model is provided in 3D-DATE.
• 3D-DATE demonstrates a new core design to support the emerging VCAT based cell array layout as depicted in [15], including sense-amplifier (SA) rotation and hybridization, and wordline (WL) strapping.
• 3D-DATE is validated against 22 planar and 3D DRAMs from 80 nm to 30 nm technology.
Fig. 1. (a) RCAT [13]. (b) SRCAT [14]. (c) VCAT [15].
The remainder of this paper is organized as follows: Section II describes DRAM process node characterization and roadmap. Transistor, wire, and through silicon via (TSV) models, modeled from the 90 nm to the 16 nm technology node, are discussed. Section III describes the circuit-level and architectural-level model of 3D DRAM. Section IV verifies the energy, latency, and area results of the proposed model against datasheets and published papers. Finally, Section V provides the conclusion for this work.
II. DRAM PROCESS MODELING AND ROADMAP
In this section, we provide details of the transistor and interconnect modeling methods included in 3D-DATE. Roadmaps for transistor and interconnect have been documented. Our goal is to show the framework of creating a DRAM roadmap.
A. Transistors
In DRAM, a gate transistor is required to reduce the leakage current and to retain the stored data in the cell capacitor during the required data retention time. As feature size decreases, conventional planar transistors suffer from higher leakage current, mainly due to the higher electric field across the channel, since supply voltage does not linearly scale with the channel length. Increasing channel doping suppresses the subthreshold current with a counter effect of an increase in the electric field across the device junction to the storage capacitor. This increases the junction leakage in the storage node.
Researchers have proposed several different devices for the gate transistor to reduce leakage [13]-[22]. Samsung proposed an RCAT with 88 nm DRAM technology and scaled it down to a 50 nm process [13], [16]. The recessed gate structure increases the effective channel length of the gate, which helps reduce the leakage current. Because the channel doping density can then be reduced, RCAT reduces junction leakage and overall leakage current [23]. Samsung also proposed an SRCAT with the 70 nm process and expected it to scale down to sub-50 nm processes. SRCAT provides a stronger recessed-channel effect than RCAT [14]. As depicted in Fig. 1, SRCAT has a longer channel length than RCAT or a planar MOSFET. The unit F denotes the minimum feature size (half pitch of the first metal layer). RCAT and SRCAT support an 8F² or 6F² cell layout, as shown in Fig. 2.
FinFET or its hybrids have also been studied as a bitcell transistor in DRAMs [19]-[22]. FinFETs have a wider channel than a planar transistor, which helps to suppress the short channel effect [20], [23]. However, a FinFET needs a negative wordline scheme or work function engineering to satisfy the off-leakage current requirement [21]. These limitations make FinFET less attractive than RCAT and SRCAT.
The vertical channel access transistor (VCAT) is another device that has been proposed as a bitcell transistor alternative for DRAMs [15], [18]. The major benefit of VCAT is area efficiency: the VCAT is a three-dimensional structure in which the channel exists vertically, surrounded by the gate. The cross-sectional view of VCAT is depicted in Fig. 1c. This allows the bitcell transistor to be placed at the intersection of the bitline and wordline. Thus, VCAT reduces the cell area from 8F² or 6F² to 4F², as shown in Fig. 2c. Since a 4F² cell array layout could increase the gross die count by about 1.35 times compared to a 6F² cell array layout, the industry expected VCAT to be the next gate device [18], [26].
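The link between the cell-area factor and the gross-die gain can be illustrated with a small calculation. The sketch below is not taken from [18] or [26]: it assumes a hypothetical `array_fraction` (the share of the 6F² die occupied by the cell array) and assumes that the rest of the die does not shrink. Under those assumptions, an array share of roughly 78% would be consistent with a gain of about 1.35.

```python
def cell_area_nm2(area_factor: float, feature_nm: float) -> float:
    """Cell area in nm^2 for a layout of area_factor * F^2."""
    return area_factor * feature_nm ** 2


def gross_die_gain(array_fraction: float,
                   old_factor: float = 6.0,
                   new_factor: float = 4.0) -> float:
    """Gross-die gain when only the cell array shrinks.

    array_fraction is a hypothetical share of the original die area
    taken by the cell array; the remaining area is assumed unchanged.
    """
    shrunk_area = (1.0 - array_fraction) + array_fraction * (new_factor / old_factor)
    return 1.0 / shrunk_area


if __name__ == "__main__":
    # Example at a hypothetical 30 nm feature size.
    print(cell_area_nm2(6, 30), cell_area_nm2(4, 30))
    print(round(gross_die_gain(0.78), 2))
```

At the cell level alone the ratio is 6/4 = 1.5; the lower die-level figure in this sketch comes entirely from the assumed non-shrinking periphery.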
For all other supporting circuits, satisfying the speed margin of the DRAM standard (e.g., DDR, DDR2) is the driving force that underlies most design and technology choices [16]. Studies show that commodity DRAM uses conventional planar transistors [27] or mixed devices (a bitcell transistor for NMOS and a planar transistor for PMOS [16]) for these supporting circuits. High voltage (HV) driving transistors, required to drive the poly wordline in the bitcell array area, are described by Vogelsang [10]. Table I and Fig. 1. As shown in Fig. 1c, the top part of the VCAT pillar is assumed to have a diameter of 0.5 F due to the etching process. For the evaluation, the general feature size and scaling are taken from the Rambus DRAM process roadmap.
We assume RCAT would dominate in 90 nm and 75 nm process, and SRCAT would dominate from 65 nm to 16 nm [14] , [17] . The trench depth of the recessed device correlates with the threshold voltage. A deeper trench would result in lower threshold voltage even while all the other conditions are unchanged [28] . The trench depth would follow the results published in [14] , which examined 110 nm to 60 nm process. Below 60 nm process, we assume that the trench depth would remain as it is in the 60 nm process.
For the gate material, we assume tungsten silicide from 75 nm and tungsten from 55 nm [27]. Their work functions are 4.82 eV and 5.12 eV, respectively [29], [30]. Asymmetric channel doping (ASC) is assumed from 55 nm to reduce junction leakage between the storage capacitor and the gate transistor [27], [31]. VCAT would be used as a gate transistor from approximately 28 nm and below according to the ITRS roadmap [26]. However, 3D-DATE provides a VCAT roadmap from 90 nm for comparison, since vendors fabricate test chips in larger technology nodes [15]. The pillar height is assumed to remain at 250 nm, as shown in [15]. The gate material is assumed to be polysilicon [18], i.e., a work function of 4.15 eV.
Low leakage current (I_off) is the primary decision criterion for the gate-transistor design parameters. The JEDEC standard requires a 64 ms data retention time at 85 °C for the storage node [32]. When the bitline is precharged to half of V_array, the relationship between the storage node retention time (t_REF) and I_off is described by the equation [33], [34]:
V_BL is the bitline sensing voltage and is given by the equation, where C_B is the bitline capacitance, V_array is the bit array operating voltage, V_MAX is the maximum allowable data loss for reading data with a sense amplifier, and C_S is the storage capacitance.
The sensing delay of a latch-based sense amplifier depends on the input voltage. To avoid performance degradation, the input signal must be above 60% of V_array [35]. 3D-DATE calculates a bitline capacitance of 158 fF for 512 bitcells at the 90 nm node. To be conservative, we assume a bitline capacitance of 192 fF for 512 bitcells, as in the Rambus roadmap for the 90 nm node. We also adopt the Rambus bitline capacitance roadmap values for the smaller nodes. From the calculation, the bitcell capacitor could lose about 20% to 33% of its data without a sensing speed penalty. V_MAX is set to 10% of V_array for a conservative calculation. Based on the results of equation (1), 5 fA/cell is a good criterion to satisfy in all target technology nodes, even when t_REF is 220 ms. DRAM vendors set the I_off criterion to less than 1 fA/cell [27], [36], [37]. However, based on our calculation, we assume 5 fA as the requirement for the gate transistor leakage current in our TCAD device simulations.
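The retention and sensing arguments above can be made concrete with a short calculation. The sketch below does not reproduce the paper's equations (1) and (2), which are not available in this text; it uses the generic charge-balance relation I_off ≤ C_S·V_MAX / t_REF and the textbook charge-sharing expression for a bitline precharged to V_array/2. The storage capacitance of 25 fF and the array voltage of 1.5 V are hypothetical values chosen only for illustration.

```python
def max_leakage_fa(c_s_ff: float, v_max: float, t_ref_ms: float) -> float:
    """Largest leakage (fA) that loses at most v_max volts within t_ref_ms.

    Generic charge balance: I = C_S * V_MAX / t_REF (an assumed form).
    fF * V = fC, and 1 fC/ms = 1 pA = 1000 fA.
    """
    charge_fc = c_s_ff * v_max
    return charge_fc / t_ref_ms * 1e3


def bitline_swing_v(c_s_ff: float, c_b_ff: float,
                    v_cell: float, v_array: float) -> float:
    """Textbook charge-sharing swing on a bitline precharged to V_array/2."""
    return (v_cell - v_array / 2.0) * c_s_ff / (c_s_ff + c_b_ff)


if __name__ == "__main__":
    V_ARRAY = 1.5          # hypothetical array voltage (V)
    C_S = 25.0             # hypothetical storage capacitance (fF)
    C_B = 192.0            # bitline capacitance for 512 cells quoted in the text (fF)
    V_MAX = 0.10 * V_ARRAY # 10% of V_array, as stated in the text

    for t_ref in (64.0, 220.0):
        print(t_ref, "ms ->", round(max_leakage_fa(C_S, V_MAX, t_ref), 2), "fA")
    print("swing:", round(bitline_swing_v(C_S, C_B, V_ARRAY, V_ARRAY) * 1e3, 1), "mV")
```

With these illustrative inputs the allowed leakage stays above 5 fA at both retention times, which is in line with the qualitative conclusion in the text; the exact margins depend on the real C_S and V_array roadmap values.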
For the RCAT, threshold voltage projections have been provided in references [14], [16], [21], [23], [38]. The RCAT threshold voltages collected from these empirical data are shown in Fig. 3. For the 3D-DATE model, it is assumed that the RCAT threshold trend is best fit by a straight line through the mean values of the threshold data. Thus, the trend follows equation (3).
The standard deviation of the data from the trend line is 0.0664. For the SRCAT, the threshold voltage is assumed to be 200 mV lower than for the RCAT when all other conditions are kept constant [14]. For the recessed gate transistors, 3D-DATE accepts a result into the roadmap when I_off is less than 5 fA/cell and the threshold voltage lies within the standard deviation range of the threshold projection. For the VCAT, the leakage current is the only criterion in this study.
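The acceptance rule for recessed gate devices can be sketched as a straight-line fit followed by a two-part check. The code below is an illustration only: the data points are hypothetical placeholders rather than the values plotted in Fig. 3, and the ordinary least-squares fit is an assumed stand-in for the trend line of equation (3).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_std(xs, ys, slope, intercept):
    """Root-mean-square deviation of the data from the fitted line."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


def admit(vth, node_nm, slope, intercept, sigma, ioff_fa, ioff_limit_fa=5.0):
    """Accept a device when leakage is below the limit and Vth is within
    one standard deviation of the trend line."""
    trend = slope * node_nm + intercept
    return ioff_fa < ioff_limit_fa and abs(vth - trend) <= sigma


if __name__ == "__main__":
    nodes = [50.0, 70.0, 90.0]   # hypothetical technology nodes (nm)
    vths = [0.90, 1.00, 1.10]    # hypothetical threshold voltages (V)
    slope, intercept = fit_line(nodes, vths)
    print(admit(0.95, 60.0, slope, intercept, 0.0664, ioff_fa=3.0))
```

For an SRCAT, the same check could be applied after shifting the trend line down by 200 mV, following the assumption stated above.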
2) High-Voltage and Peripheral Transistor Model:
Peripheral and HV transistor roadmaps are developed with ITRS MASTAR [39]. MASTAR calculates expected transistor characteristics (e.g., on/off current, threshold voltage, mobility) from several transistor geometry values such as gate length and oxide thickness. We assume peripheral and high voltage transistors to be planar bulk devices, and that the additional fabrication process for peripherals is optimized for speed with low leakage current. Under this assumption, we rely upon MASTAR process assumptions along with Rambus size projections. ITRS provides a saturation current roadmap for supporting transistors [26], and we adopt the ITRS projection for adjusting the channel doping concentration.
3) Transistor Roadmap: Fig. 4 shows the turn-on current and capacitance roadmap of the gate transistor together with the Rambus and CACTI-3DD roadmaps. For the turn-on current, CACTI-3DD assumes an ideal I_on of 20 μA for every node. In 3D-DATE, I_on of the recessed gate transistor scales from 24.1 μA at the 90 nm node to 3.4 μA at the 16 nm node, and I_on of VCAT is above 20 μA for every technology node. For the device capacitance, Rambus and CACTI-3DD expect a similar scaling trend below the 65 nm node. At the 90 nm node, 3D-DATE expects a difference of 3.4 and 3.8 times for VCAT and RCAT, respectively, compared to Rambus predictions. At the 16 nm node, 3D-DATE expects a difference of 13.0 and 8.6 times for VCAT and RCAT, respectively, compared to Rambus predictions. Fig. 5 and Fig. 6 show the capacitance roadmaps of HV and peripheral transistors. In both cases, 3D-DATE exhibits the most conservative capacitance projection because it adopts the most conservative side-wall capacitance from ITRS MASTAR and assumes the most conservative gate capacitance, mainly due to the longer gate length expected by the Rambus roadmap.
B. Wire
1) Wire Model:
As depicted in Fig. 7b, commodity DRAM in general has capacitors between the poly layer and metal layer one (M1) in the cell region and uses fewer metal layers (two to four layers overall [10], [40]) than microprocessor technology. 3D-DATE assumes three copper metal layers with a polysilicon wordline and a polysilicon bitline. 3D-DATE mostly adopts the physical dimensions from the cross-sectional report [41] to construct the DRAM wire roadmap. In that technical report, polysilicon serves as the inter-cell routing layer. 3D-DATE uses M1 for inter-cell routing within peripherals. Since the technical report lacks the physical dimensions of M1 for the inter-cell routes, the width, pitch, aspect ratio, and other properties of the M1 layer follow the ITRS M1 layer projection. Table II shows the detailed physical dimensions.
For the detailed material properties, 3D-DATE adopts copper wire properties from the ITRS roadmap. For the wordline and bitline, 3D-DATE assumes that tungsten silicide is used as the poly-wire material. Tungsten silicide can have different resistivity depending on the process recipe: a higher temperature and a longer annealing time give lower resistivity [42]. For calculating resistance, 3D-DATE uses 80 μΩ·cm. 3D-DATE adopts the Horowitz wire model [43] for calculating the resistance and capacitance of a wire.
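As a rough illustration of how a resistivity value translates into a per-length wire resistance, the sketch below applies the basic relation R/L = ρ / (width × thickness). This is a simplification and not the Horowitz wire model [43] itself; the 16 nm width and 35 nm thickness are hypothetical dimensions, not entries from Table II.

```python
def resistance_per_um(resistivity_uohm_cm: float,
                      width_nm: float,
                      thickness_nm: float) -> float:
    """Wire resistance per micrometer of length, in ohms.

    Uses R/L = rho / (w * t) with a uniform rectangular cross section.
    """
    rho_ohm_m = resistivity_uohm_cm * 1e-8           # 1 uOhm*cm = 1e-8 Ohm*m
    area_m2 = (width_nm * 1e-9) * (thickness_nm * 1e-9)
    return rho_ohm_m / area_m2 * 1e-6                # per meter -> per micrometer


if __name__ == "__main__":
    # 80 uOhm*cm poly-wire with hypothetical 16 nm x 35 nm cross section.
    print(round(resistance_per_um(80.0, 16.0, 35.0), 1), "Ohm/um")
```

With these assumed dimensions the result is on the order of 1.4 kΩ/μm, the same order of magnitude as the poly-wire resistance reported for the 16 nm node.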
2) Wire Roadmap: Fig. 8 shows the 3D-DATE wire roadmap. Poly-wire has the highest resistance, reaching 1420.5 Ω/μm at 16 nm. Resistance decreases in the order of poly-wire, M1, M2, and M3, and this ordering holds from the 90 nm node onward. Wire capacitance decreases as technology advances. The poly-wire has the highest capacitance at all nodes, mainly because its wire pitch is the smallest. M2 and M3 have similar aspect ratios of 1.5 and 1.75, respectively, the same distance ratio (i.e., half of the wire pitch), and a similar dielectric between wires. Thus, M2 and M3 have similar capacitance at all nodes.
For comparison, we choose three commodity logic design processes. The wire capacitance and resistance of these three anonymous processes, normalized to those of 3D-DATE, are presented in Fig. 9. For poly-wire, 3D-DATE expects about three- to four-fold greater capacitance, with 16% to 42% less resistance than the anonymous logic processes, i.e., 3D-DATE has larger physical dimensions for the poly-wire. In the M1 layer, 3D-DATE exhibits about 1.3- to 1.6-fold greater capacitance with about 1.8- to 3.4-fold greater resistance. From this, we can infer that 3D-DATE assumes a smaller geometry and a higher-permittivity dielectric than the anonymous logic processes in the M1 layer. In M2 and M3, 3D-DATE expects about 14.5% to 80% less resistance than the anonymous logic processes. On the other hand, the capacitance is about 1.5-fold greater. When the dielectric material between the M2 and M3 layers is similar to that of the anonymous logic processes, the assumption of larger physical dimensions results in smaller resistance with larger capacitance.
C. Through Silicon Via (TSV)
1) TSV Model: A TSV is mainly made by etching or laser drilling. When the TSV is formed by etching, it is hard to achieve high etch rates, smooth sidewalls with a controllable sidewall angle, and minimal mask undercut. When a TSV is made with a laser, the masking and etching steps are not needed. Although this method has the advantage of reducing the number of process steps, it causes debris or splatter due to laser ablation [40]. Because of these challenges, the ITRS predicts the scaling of TSVs conservatively. Table III shows the ITRS TSV roadmap by year. Despite the later publication year and the corresponding technology advances, the latest ITRS roadmap expects the TSV size to remain similar to or larger than in previous years. CACTI-3DD, on the other hand, assumes that the physical dimensions of the TSV decrease as technology advances.
TABLE III ITRS GLOBAL INTERCONNECT TSV SIZE ROADMAP
TABLE IV 3D-DATE TSV PHYSICAL DIMENSION
3D-DATE adopts CACTI-3DD TSV physical dimension roadmap since we assume TSV size would scale due to technology advancement. Table IV shows the 3D-DATE TSV physical dimension roadmap. 3D-DATE also adopts CACTI-3DD TSV model and material properties for calculating resistance and capacitance of TSV [12] .
2) TSV Roadmap: Fig. 10 shows the 3D-DATE TSV roadmap compared to ITRS TSV roadmap. 3D-DATE TSV roadmap exhibits a larger area and smaller capacitance projection mainly due to the larger pitch. The 3D-DATE TSV roadmap also exhibits larger resistance due to smaller diameter projection. This makes 3D-DATE TSV area predictions more conservative and latency predictions faster than they would be if the ITRS TSV roadmap were used.
III. CIRCUIT AND ARCHITECTURE LEVEL MODELING
Fig. 11 shows the program flow of the 3D-DATE circuit level model. After 3D-DATE reads the user configuration and technology roadmap, the physical dimensions and properties of the subarray are established and calculated. With the subarray geometry, the bank size is calculated along with the speed and energy of other bank components such as the wordline driver and column select decoder. After calculating the bank properties, 3D-DATE computes the positions of the banks with the floor plan received from the user. TSVs are also inserted at this time according to the user configuration. This produces the die size and signaling length of the DRAM. After these calculations, 3D-DATE obtains the DRAM area, energy, and speed by adding the delays and energy of each part.
In this section, we first present a generic model of the area, timing, and energy for the peripheral circuit. Driver, repeater, address decoder, and sense amplifier models are also presented. Along with the components, architectural level, subarray, bank and die level layout models are presented.
A. Generic Area, Timing and Energy Model
3D-DATE assumes that a buffer following the digital logic drives the subsequent logic or wire, as shown in Fig. 12.
Fig. 12. 3D-DATE general logic assumption.
Fig. 13. Nine bit row address decoding path [10], [46].
To calculate the size of the transistors which are utilized to build the logic and buffers, 3D-DATE utilizes the logical effort method that calculates the best number of stages in a multistage logic network with the balanced transistor size of each stage [44] . To calculate the latency of the circuit, 3D-DATE uses Horowitz approximation. Appendix A shows a detailed description of the logical effort and Horowitz circuit delay model used in 3D-DATE.
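The stage-count selection at the heart of the logical effort method can be sketched for the simplest case, an inverter chain. The code below is a minimal illustration and not the Appendix A formulation: it assumes delay in units of the technology time constant τ, a parasitic delay `p_inv` of 1 per stage, and a brute-force search over the stage count instead of a closed-form expression.

```python
def best_stage_count(path_effort: float, p_inv: float = 1.0, max_stages: int = 12):
    """Pick the stage count N that minimizes N * F**(1/N) + N * p_inv.

    Returns (N, per-stage effort, normalized delay). Assumes an inverter
    chain with equal effort per stage, as in the logical effort method.
    """
    best_n, best_delay = None, None
    for n in range(1, max_stages + 1):
        delay = n * path_effort ** (1.0 / n) + n * p_inv
        if best_delay is None or delay < best_delay:
            best_n, best_delay = n, delay
    stage_effort = path_effort ** (1.0 / best_n)
    return best_n, stage_effort, best_delay


if __name__ == "__main__":
    # A driver whose load is 64 times its input capacitance.
    n, f, d = best_stage_count(64.0)
    print(n, round(f, 3), round(d, 3))
```

For a path effort of 64 this selects three stages with a stage effort of about 4, the familiar rule of thumb; each stage is then sized so that its input capacitance is its load divided by that stage effort.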
For the energy calculation of the gate, 3D-DATE accounts for the consumed charge by adding the capacitance of every node since the dissipated energy is given by the equation [45] :
where C_L is the sum of the intrinsic capacitance of the gate and the load capacitance at the output, and P_0→1 is the probability that the device consumes energy. 3D-DATE fully accounts for power dissipation when a capacitor is charged and ignores the discharge event, as CACTI does [9].
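The bookkeeping described above can be sketched as follows. The equation itself is not reproduced in this text, so the code assumes the standard textbook form E = C_L · V_DD² · P_0→1 for a charging event and simply sums it over nodes; the capacitance and voltage values in the example are hypothetical.

```python
def switching_energy_j(c_load_f: float, v_dd: float, p_0_to_1: float = 1.0) -> float:
    """Energy drawn from the supply for a 0->1 transition (assumed form).

    c_load_f is intrinsic plus load capacitance in farads.
    """
    return c_load_f * v_dd ** 2 * p_0_to_1


def total_energy_j(node_caps_f, v_dd: float, p_0_to_1: float = 1.0) -> float:
    """Sum charging energy over all nodes; discharge events are ignored."""
    return sum(switching_energy_j(c, v_dd, p_0_to_1) for c in node_caps_f)


if __name__ == "__main__":
    caps = [2e-15, 5e-15, 40e-15]   # hypothetical node capacitances (F)
    print(total_energy_j(caps, v_dd=1.2))
```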
B. Design Components
1) Repeater Model:
Rabaey's approach [45] is adopted for the repeater model in 3D-DATE. A detailed description of Rabaey's repeater model as used in 3D-DATE is presented in Appendix B. 3D-DATE supports a delay penalty that saves energy by sacrificing speed: after calculating the delay of the interconnect with optimum-size repeaters, the wire length and repeater size are adjusted to match the user-specified delay penalty by slightly decreasing the repeater size and slightly increasing the wire length.
2) Address Decoder Model: 3D-DATE provides a two-stage address decoder for both row and column address decoding, as shown in CACTI5.1 [46] and described in the Rambus model [10]. Fig. 13 shows the nine-bit row address decoding path from the input to each main wordline (MWL) as an example. The predecoder stage includes two predecoder blocks that generate output signals in response to the input address signals. Each predecoder block consists of two-level decoding logic. In the first level of the predecoder, addresses are decoded using up to three 2-to-4 or 3-to-8 base decoders in parallel. The outputs of these base decoders are combined by NAND gates to generate the final predecoder output signals. Two-input NAND gates are used when there are two base decoder blocks, and three-input NAND gates are used when there are three base decoder blocks.
The second decoder stage in the row address path consists of multiple decoding logic blocks, each comprising a NOR gate and an inverter, as shown in Fig. 13. The inputs of the NOR gate are connected to the output signals of the two predecoder blocks, and the following inverter driver drives the MWL. In the row address path, the NOR gate drains the stored internal charge when the driving output is not selected. To reduce the drained charge at the second stage, 3D-DATE assumes that the NOR gate and inverter driver pairs of the second stage are grouped by subarray and enabled per subarray.
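The two-stage structure can be illustrated at the logic level with a small functional model. The sketch below is not the circuit of Fig. 13: it ignores gate types and signal polarity, and the split of the nine address bits into a 4-bit and a 5-bit predecoder block is a hypothetical choice made only for the example.

```python
def one_hot(value: int, width: int) -> list:
    """One-hot decode of a width-bit value into 2**width lines."""
    return [1 if i == value else 0 for i in range(1 << width)]


def predecode(address: int, low_bits: int = 4, high_bits: int = 5):
    """First stage: two predecoder blocks, each producing one-hot lines.

    The 4/5 bit split is an assumption for illustration.
    """
    low = address & ((1 << low_bits) - 1)
    high = (address >> low_bits) & ((1 << high_bits) - 1)
    return one_hot(low, low_bits), one_hot(high, high_bits)


def select_mwl(address: int, low_bits: int = 4, high_bits: int = 5) -> list:
    """Second stage: a main wordline is active only when both of its
    predecoder lines are active."""
    low_lines, high_lines = predecode(address, low_bits, high_bits)
    return [h & l for h in high_lines for l in low_lines]


if __name__ == "__main__":
    lines = select_mwl(300)
    print(len(lines), sum(lines), lines.index(1))
```

The benefit of this arrangement is that 16 + 32 predecoder lines are routed instead of 512 fully decoded lines, and each second-stage gate needs only two inputs.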
The column address path also has two decoding stages to generate column select signals. It differs from the row address decoder in that the driving buffer of the predecoder block is an inverter driver and the second stage uses a static NAND gate. Since static gates are used in the column address decoding path, it consumes less power than the row path. In each address path, the first and second stages are assumed to be located at the side of a bank. The address decoding delay is taken to be the delay of the critical path.
3) Sense Amplifier Model: We adopt a latch-based sense amplifier for 3D-DATE. Fig. 14 shows the schematic of the bitline sense amplifier. The bitline and complement bitline are precharged by an equalizer to half of the full voltage of the storage capacitor. When the wordline is enabled during a read operation, a voltage difference develops on the bitline pair due to the current flowing from the capacitor. This voltage difference acts as the input to the cross-coupled inverter sense amplifier. For a write operation, data is written from the write-back driver via the I/O port through the sense amplifier. We assume the write-back driver is located on the other side of the subarray, as shown in [15]. 3D-DATE adopts the method used by CACTI5.1 and Horowitz to analyze the delay of the bitline, sense amplifier, and write-back driver [46], [48].
To estimate the sense amplifier area, 3D-DATE adopts the sense amplifier layout of the Samsung DRAM depicted in [15]. In that study, the cross-coupled inverter sense amplifier is derived to be about 60 F on the long side and 6 F on the short side. Based on this derivation, 3D-DATE adds 30 F on the long side for the equalizer and the column select transistor. Therefore, the length of the entire long side is estimated to be 90 F.
Fig. 15. (a) Floor plan of a typical 8-bank DDR DRAM [10], [49]. (b) Subarray and related peripheral circuits [50].
C. DRAM Architecture
1) Die and Bank Architecture:
The schematic diagram in Fig. 15a shows the floor plan of a typical 8-bank double data rate (DDR) DRAM [10] , [49] . The bank comprises a group of subarrays. Row logic is placed in between banks to decode row addresses and to drive the main wordline (MWL). Column logic is also placed at the other edge of the bank to decode column address and to drive column select signal.
Data, address, and control I/O pads exist on I/O pad area along with the transceiver circuits. The center stripe has control logic and the power system such as a charge pump and voltage regulator to support bank operation. When TSVs are needed to support a 3D floor plan, 3D-DATE assumes that the TSV is located on the center stripe as introduced in the study [1] .
3D-DATE allows the user to determine the placement of the banks within the die. The user is required to input how many banks are arranged in the wordline direction and in the bitline direction, i.e., along the top and the side of the die, respectively. Based on the bank size and the bank layout entered by the user, 3D-DATE places the center stripe in the center of the die where the die length is shorter, so that the die shape approaches a square.
2) Subarray Architecture: 3D-DATE supports three different subarray layouts, as shown in Fig. 2. The subarray with a recessed gate transistor follows the subarray architecture shown in Fig. 15b. In a typical subarray with a recessed gate transistor, the MWL, column select, and master array data lines serve as the interface between the cell array and the outside of the bank. During a read operation, the MWL is selected by the row decoder. One of the sub-wordline (SWL) drivers is then selected by a wordline control signal and drives the wordline selected by the row address. The conceptual schematics of an SWL driver and a wordline control signal driver are shown on the right side of Fig. 15b. The column select signal is enabled by the column address and selects the data to read. The bitline sense amplifiers, which are aligned with the bitlines, sense the voltage difference and drive out the selected data through the master array data lines. The size of the subarray is determined by the performance and density of the memory. SWLs and bitlines are typically connected to 256 to 512 cells, and the column select line and MWL run across 16 to 32 subarrays [10], [15].
3D-DATE supports an additional subarray floor plan for the emerging device, VCAT. Fig. 16 shows the schematic diagram of the subarray for the 4F² DRAM. There are two major changes compared to the recessed gate transistor based subarray layout. First, we assume a sense amplifier layout that is rotated by 90 degrees [15]. Second, a wordline strapping model is implemented: the SWL is assumed to use the M1 layer, and an additional poly gate wordline is connected to the SWL using vias. The maximum gate-WL length is 88 bits. The right side of Fig. 16 represents the 3D wordline strapping model shown in [15].
3) General Area, Energy and Timing Calculation: To calculate the area, energy, and speed of the entire DRAM, 3D-DATE first performs calculations on the subarray and calculates the cell area of the bank based on the subarray. After calculating the energy and speed of the required address decoders from the row and column addresses input by the user, the area of the bank is calculated by adding the width of each decoder to the edge of the bank. 3D-DATE then estimates the floor plan of the entire die from the user input. The total energy usage and latency of the DRAM are the sums of all charge and discharge events and of the latency of each component along the path shown in Fig. 15a. The area of the DRAM die is also calculated as the sum of the areas of all components. 3D-DATE does not include an accurate model of I/O pads, power networks, or control logic. Thus, along with the calculated bank area, 3D-DATE accounts for an estimated center-stripe area derived from several documents and publications covering 80 nm to 30 nm technology [1], [15], [49], [51]. In a commodity planar DRAM, the estimated additional center stripe area is about 20% of the total bank area.
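The aggregation step can be summarized in a few lines. The sketch below is a simplified illustration, not the 3D-DATE implementation: the function names and the example numbers are hypothetical, the 20% center-stripe overhead is the planar-DRAM estimate quoted above, and each component on the access path is reduced to a (latency, energy) pair.

```python
def die_area_mm2(bank_area_mm2: float, num_banks: int,
                 center_stripe_ratio: float = 0.20) -> float:
    """Die area as total bank area plus an estimated center-stripe overhead."""
    total_bank_area = bank_area_mm2 * num_banks
    return total_bank_area * (1.0 + center_stripe_ratio)


def path_totals(components):
    """Sum latency and energy over the components along an access path.

    components is a list of (latency_ns, energy_pj) pairs.
    """
    latency = sum(lat for lat, _ in components)
    energy = sum(en for _, en in components)
    return latency, energy


if __name__ == "__main__":
    # Hypothetical numbers for an 8-bank die and a four-component path.
    print(die_area_mm2(5.0, 8))
    path = [(1.2, 3.0), (4.5, 20.0), (2.0, 8.5), (1.1, 2.5)]
    print(path_totals(path))
```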
IV. VALIDATION
In this section, 3D-DATE model results are compared to the energy and speed published in the datasheets of several commodity DRAMs across 80 nm to 30 nm technology nodes and different DRAM generations. Because area information is not available in the datasheets, the area calculated by 3D-DATE is compared with the estimated area derived from die photos in several documents covering 80 nm to 30 nm technology [49], [51]. For validating 3D DRAM and VCAT based DRAM, the 3D-DATE model results are compared to the area, energy, and speed from several state-of-the-art publications [1], [15], [52].
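The accuracy figures reported in this section are mean percentage errors and standard deviations. A minimal sketch of how such figures can be computed is given below; the sign convention (model minus reference) and the use of the population standard deviation are assumptions, and the numbers in the example are placeholders rather than values from the validation tables.

```python
from statistics import mean, pstdev


def percent_errors(model, reference):
    """Signed percentage error of each model value against its reference."""
    return [100.0 * (m - r) / r for m, r in zip(model, reference)]


def summarize(model, reference):
    """Return (mean percentage error, standard deviation of the errors)."""
    errors = percent_errors(model, reference)
    return mean(errors), pstdev(errors)


if __name__ == "__main__":
    modeled = [12.8, 13.1, 14.9]     # placeholder model outputs
    datasheet = [13.5, 13.5, 15.0]   # placeholder reference values
    print(summarize(modeled, datasheet))
```

With this convention, a negative error means the model predicts a smaller value than the reference, for example a shorter latency than the datasheet specification.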
A. Energy Validation
Released DRAM datasheets do not directly report the operating energy of the device, but vendors provide currents for various operating conditions according to the JEDEC standard [32], [53]. Among the various operation scenarios, the key commands are the active, read, write, and precharge commands. In this work, we focus on four DRAM operation energies: active, burst read, burst write, and precharge. To calculate the precise operating energy from the datasheet, we used the system level model DRAMPower, proposed by Chandrasekar et al. [54]. Table V shows the comparison of 3D-DATE energy results with the energy calculated from the specification. The listed commodity DRAMs are planar DRAMs.
The 3D-DATE energy calculations achieve mean percentage errors of −1.39% to 1.74% with a standard deviation of less than 8.30%.
From the results, 3D-DATE appears to predict each latency to be faster than the actual DRAM speed. However, the measured shmoo plots in recent publications [1], [15] show that the measured latency of each timing parameter is much faster than the timing specification in DRAM manuals. The timing values provided in DRAM specifications are slower than the actual latencies of the implemented DRAM. Thus, the comparison between the specification and the 3D-DATE calculation in Table V is not sufficient to verify that the latency calculated by the 3D-DATE model is the same as or similar to the actual DRAM latency.
To obtain more accurate latencies, it is necessary to know the exact wire materials used in each DRAM. Using a more elaborate delay model than the Elmore model would also help. To understand the standard deviation, we need to take the DRAM I/O circuit into consideration. To receive and send data accurately, each vendor adopts a different high-speed I/O circuit design for each chip. Because of these design differences, each DRAM chip has a different internal delay budget right before the I/O circuit. However, we could not model the I/O circuit design of each DRAM. Reducing the standard deviation would also require knowing the exact I/O circuits of each DRAM from each vendor.
For the Micron 1Gb DDR3 DRAM, when comparing the two models, the difference in error rate is at most about 5.6%.
Table VII compares the 3D-DATE area results with the derived areas of the target planar DRAM designs. The bank area is composed of subarray areas, including the bitcells, the sense amplifiers, and the sub-wordline drivers. The mean area error for the bank area is approximately −3% with a standard deviation within 4.4%.
The mean area error for the die area is about −0.6% with a standard deviation of 2.49%.
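The validation metrics quoted above (mean percentage error and standard deviation) can be sketched as follows. This is a minimal sketch: the sign convention (model minus reference) and the use of the sample standard deviation are assumptions, since the text does not state them, and the sample numbers are hypothetical.

```python
from statistics import mean, stdev


def percent_errors(model, reference):
    """Signed percentage error of each model value relative to its reference."""
    return [100.0 * (m - r) / r for m, r in zip(model, reference)]


def summarize(model, reference):
    """Return (mean percentage error, sample standard deviation)."""
    errs = percent_errors(model, reference)
    return mean(errs), stdev(errs)


if __name__ == "__main__":
    # Hypothetical modeled vs. reference values (e.g., die areas in mm^2).
    modeled = [29.1, 41.8, 60.3]
    derived = [30.0, 42.0, 59.5]
    mpe, sd = summarize(modeled, derived)
    print(f"mean error = {mpe:.2f}%, std dev = {sd:.2f}%")
```

With this sign convention, a negative mean error indicates that the model predicts a smaller (or faster) value than the reference.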
B. Latency Validation

C. Area Validation
D. VCAT Based and 3D-DRAM Validation
To verify the 3D design and the VCAT-based design, we used data from several papers and commercial 3D DRAM specifications [1], [15], [52]. The VCAT-based design has a buried bitline, as shown in [15]. Table VIII compares the timing parameters of the VCAT-based DRAM and the 3D DRAMs. The target latencies of the 8 Gb 3D DRAM [1] and the Micron TwinDie™ DRAM (MT41K1G8TRF) [52] are set by the DDR3 specification [55]. For the same reason shown in Table V, most timing comparisons result in negative errors. Only the latency obtained from the VCAT design is a measured value and not a specification [15]. For the measured values in the VCAT design, 3D-DATE yields errors of 0.37% and −0.85% in $t_{RCD}$ and $t_{RC}$, respectively. Table IX shows the validation of the area calculation for the VCAT-based and 3D DRAMs.
TABLE VIII: VALIDATION OF TIMING PARAMETER OF VCAT BASED AND 3D DRAM
TABLE IX: VALIDATION OF AREA CALCULATION OF VCAT BASED AND 3D DRAM
E. Propagated Error Due to Process Variation
3D-DATE is a circuit-level model based on technology predictions, as shown in Fig. 11. Errors in the technology projection therefore propagate to the circuit modeling results. To examine the impact of this propagated error, we varied the $L_{gate}$, channel doping, and $T_{ox}$ values by ±10% each. Table X shows the change in the timing parameters under each process change. Increasing the gate length, channel doping, or gate oxide thickness reduces the turn-on current of the transistor when all other conditions are unchanged. Compared to the nominal case, a 10% increase in gate length, channel doping, or oxide thickness decreases the latency mean error, since these changes result in a slower device than the nominal case. Except for $t_{RCD}$ in the 10% gate length increase case and $t_{RP}$ in the 10% $T_{ox}$ reduction case, the standard deviation in most cases also changes by no more than ±4%.
Table X also shows the energy change under each process change. Increasing the gate length by 10% increases the gate capacitance by about 10%. When the channel doping is changed, the gate capacitance is almost unchanged and the depletion capacitance changes by less than ±10%, according to calculations with MASTAR [39]. Reducing the oxide thickness by 10% increases the gate capacitance by about 10%. 3D-DATE adjusts the size of the driving buffers in response to these capacitance changes to achieve optimum speed. Overall, the energy changes as follows: a 10% increase in gate length increases the overall mean error, but a 10% decrease in gate length also changes the buffer sizes for optimum speed, which results in a slight increase of the mean error by 0.59%; a 10% decrease in gate oxide thickness increases the overall mean error, and a 10% increase in gate oxide thickness decreases the mean error by 12.54%; and a 10% increase in channel doping increases the energy due to the increased depletion charge, but a 10% decrease in channel doping leads to a slight increase in energy because the buffer sizes change. Except for the 10% increase in $T_{ox}$ case, the standard deviation in all cases also changes by no more than 0.8%.
Table VI compares CACTI-3DD and 3D-DATE. For the Micron 1Gb DDR3 DRAM, the CACTI-3DD energy results yield errors within −3.9% to 9.4%, while the 3D-DATE results yield −3.1% to 7.7%. For the latency, CACTI-3DD has a maximum error of −10.9% and 3D-DATE has an error of 13.0%. For the area, CACTI-3DD and 3D-DATE show errors of −9.8% and −5.8%, respectively. For the 8 Gb 3D DRAM, CACTI-3DD and 3D-DATE show errors of 2.2% and 8.3%, respectively, in the die area calculation. For the latency, CACTI-3DD has an error rate of −0.9%, while 3D-DATE has an error rate of −30.4%.
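The gate-capacitance shifts quoted in the process-variation discussion (about 10% for a 10% change in gate length or oxide thickness) are consistent with a first-order parallel-plate estimate. The sketch below assumes an SiO2 dielectric and ignores fringing, overlap, and depletion effects; the function names and device dimensions are hypothetical.

```python
EPS_0 = 8.854e-12   # vacuum permittivity (F/m)
EPS_R_SIO2 = 3.9    # relative permittivity of SiO2 (assumed gate dielectric)


def gate_capacitance(width, length, t_ox, eps_r=EPS_R_SIO2):
    """First-order parallel-plate gate capacitance (F)."""
    return eps_r * EPS_0 * width * length / t_ox


def relative_change(base, changed):
    """Fractional change of `changed` relative to `base`."""
    return (changed - base) / base


if __name__ == "__main__":
    w, l, t = 1e-6, 30e-9, 2e-9  # hypothetical device dimensions (m)
    base = gate_capacitance(w, l, t)
    longer = gate_capacitance(w, 1.1 * l, t)    # +10% gate length
    thinner = gate_capacitance(w, l, 0.9 * t)   # -10% oxide thickness
    print(f"+10% L_gate: {100 * relative_change(base, longer):+.1f}%")
    print(f"-10% T_ox:   {100 * relative_change(base, thinner):+.1f}%")
```

Under this estimate the capacitance scales linearly with gate length and inversely with oxide thickness, so a 10% thinner oxide gives roughly an 11% increase.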
F. Comparison With CACTI-3DD
V. CONCLUSION
We have presented 3D-DATE, a three-dimensional DRAM area, timing, and energy model. We have shown the 3D-DATE process roadmap from the 90 nm to the 16 nm node. Drawing on the latest device articles, we have proposed transistor roadmaps, obtained by simulation and calculation, for the VCAT, the RCAT, the high-voltage transistor, and the peripheral transistor. The 3D-DATE roadmap has more conservative values than those already presented in other models. We have also proposed a wire roadmap using the material parameters provided by the ITRS roadmap and the physical dimensions presented in the ITRS roadmap and the cross-sectional die report. Compared to the anonymous logic design processes, poly and metal layer 1 are conservatively predicted. Metal layers 2 and 3 are predicted to be larger; therefore, the resistance values of the 3D-DATE roadmap are smaller than those of the logic processes, and the capacitance values are larger. We adopted the physical dimension projection and the TSV model presented in CACTI-3DD. We have also implemented and verified a circuit-level model, as in empirical models such as CACTI-3DD and the Rambus model. The layout and arrangement of subarrays and banks and the placement of peripheral logic were based on the DRAM architecture introduced in the Rambus model. The model was successfully validated against 22 published and commodity DRAMs. We verified 3D-DATE down to the 30 nm node.
APPENDIX A GENERIC AREA AND TIMING MODEL
A. Area Model
Since the study [56] showed that the optimum stage effort is 3.98 for buffer stages driving large loads, 3D-DATE uses an optimum stage effort of 4. This simplifies the optimum number of stages to
$$N = \log_4 F$$
where the path effort $F$ is equal to $C_{load}/C_{in}$. At stage $i$, the input capacitance $C_{in,i}$ is simply derived as
$$C_{in,i} = \frac{C_{in,i+1}}{4}$$
where $C_{in,i+1}$ represents the input capacitance of the next stage. The size of each inverter stage can be derived recursively from the load capacitance of the last stage, since the next input capacitance of the last stage is equal to $C_{load}$. Therefore, we can calculate the size of each stage and derive the area of the logic block.
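The recursive sizing described above can be sketched as follows. This is a minimal sketch, assuming the stage count is rounded to the nearest integer (at least one stage) and that each stage's input capacitance is one quarter of the next stage's; the function name and example values are hypothetical.

```python
import math


def size_buffer_chain(c_load, c_in, stage_effort=4.0):
    """Return (number of stages, per-stage input capacitances, first to last).

    c_load : capacitance driven by the last stage (F)
    c_in   : input capacitance presented to the driver of the chain (F)
    """
    path_effort = c_load / c_in
    # Assumption: round log_4(F) to an integer and use at least one stage.
    n_stages = max(1, round(math.log(path_effort, stage_effort)))
    caps = []
    c_next = c_load  # the last stage sees C_load as its "next" input capacitance
    for _ in range(n_stages):
        c_next = c_next / stage_effort  # C_in,i = C_in,i+1 / stage_effort
        caps.append(c_next)
    caps.reverse()  # order from first stage to last stage
    return n_stages, caps


if __name__ == "__main__":
    n, caps = size_buffer_chain(c_load=256e-15, c_in=1e-15)
    print(n, [f"{c * 1e15:.1f} fF" for c in caps])
```

Because the stage count is rounded, the first-stage input capacitance only matches the given `c_in` exactly when the path effort is a power of the stage effort.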
B. Timing Model
A gate can be simplified as shown in Fig. 17. At node A, there is an intrinsic capacitance in the X direction. To calculate the intrinsic capacitance, we account for every drain capacitance connected to node A. In the Y direction, there is a load capacitance. Since the load capacitance is the sum of the gate capacitances of the transistors connected to node A, we add every gate capacitance connected to node A. The intrinsic gate delay is then given by $\tau_f = R_d (C_{intrinsic} + C_{load})$. The drain and gate capacitances and the drain resistance are obtained from TCAD simulation and MASTAR model results.
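The intrinsic delay above can be sketched in a few lines, together with a Horowitz-style correction for a finite input rise time. The delay expression in `horowitz_delay` follows the form commonly used in CACTI-like models and is an assumption here, not necessarily the exact expression used by 3D-DATE; the switching threshold `v_s`, the function names, and all numeric values are hypothetical.

```python
import math


def intrinsic_delay(r_d, drain_caps, gate_caps):
    """tau_f = R_d * (C_intrinsic + C_load) for a single node."""
    c_intrinsic = sum(drain_caps)  # every drain capacitance on the node
    c_load = sum(gate_caps)        # every gate capacitance driven by the node
    return r_d * (c_intrinsic + c_load)


def horowitz_delay(tau_f, tau_in, beta=0.5, v_s=0.5):
    """Ramp-input gate delay (assumed CACTI-style Horowitz form).

    alpha = tau_in / tau_f is the input rise time normalized to tau_f;
    v_s is the normalized switching threshold (assumed 0.5).
    """
    alpha = tau_in / tau_f
    return tau_f * math.sqrt(math.log(v_s) ** 2 + 2.0 * alpha * beta * (1.0 - v_s))


if __name__ == "__main__":
    tau_f = intrinsic_delay(1e3, [1e-15, 2e-15], [3e-15])  # hypothetical values
    print(f"tau_f = {tau_f * 1e12:.2f} ps")
    print(f"delay with 10 ps input ramp = {horowitz_delay(tau_f, 10e-12) * 1e12:.2f} ps")
```

For a step input (`tau_in = 0`) this form reduces to `tau_f * ln(2)`, and the delay grows as the input ramp slows.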
For the speed calculation, 3D-DATE uses the Horowitz approximation as seen in [48, where $\beta$ is the gate turn-on voltage, normalized to the range 0 to 0.5. 3D-DATE adopts $\beta = 0.5$ as a conservative approximation. $\alpha$ is the normalized rise time constant when the input changes from zero to logical one, and is defined as
where $\tau_{in}$ is the input rise time. The total delay of the gate and buffer chain shown in Fig. 12 is the sum of the delays of each node, each calculated using Equation 7.
Fig. 18 shows an interconnect line with repeaters. $R_d$, $C_{out}$, and $C_{in}$ represent the resistance, output capacitance, and input capacitance of a minimum-size inverter, respectively. $r$ and $c$ represent the resistance and capacitance per unit length, respectively. The optimum inverter size ratio, $S_{opt}$, relative to the minimum-size inverter is given by
$$S_{opt} = \sqrt{\frac{R_d c}{r C_{in}}} \quad (9)$$
as seen in [45, eq. (9.10)]. The optimum wire length segmented by the inverters is derived as:
APPENDIX B REPEATER MODEL
By using the Elmore delay approach [57] , the unit length interconnect propagation delay with the optimized repeater is derived as: 
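A minimal sketch of such a repeater optimization is given below, assuming a textbook Elmore segment model. Only the repeater size follows the $S_{opt}$ expression of the repeater model; the segment-length and delay expressions are assumptions derived from that textbook model, not taken from 3D-DATE, and all names and values are hypothetical.

```python
import math


def optimal_repeater(r_d, c_in, c_out, r, c):
    """Return (S_opt, l_opt) under the assumed Elmore segment model."""
    s_opt = math.sqrt(r_d * c / (r * c_in))
    # Assumption: segment length that minimizes the delay expression below.
    l_opt = math.sqrt(2.0 * r_d * (c_in + c_out) / (r * c))
    return s_opt, l_opt


def delay_per_unit_length(r_d, c_in, c_out, r, c, size, seg_len):
    """Elmore delay of one repeated segment divided by its length (s/m)."""
    # Driver resistance R_d/size charges its own output capacitance,
    # the wire capacitance, and the next repeater's input capacitance.
    driver = (r_d / size) * (size * (c_out + c_in) + c * seg_len)
    # Distributed wire resistance charges half the wire capacitance
    # plus the next repeater's input capacitance.
    wire = r * seg_len * (0.5 * c * seg_len + size * c_in)
    return (driver + wire) / seg_len


if __name__ == "__main__":
    # Hypothetical values: 10 kOhm, 1 fF, 1 fF, 1 Ohm/um, 0.2 fF/um.
    params = dict(r_d=10e3, c_in=1e-15, c_out=1e-15, r=1e6, c=2e-10)
    s_opt, l_opt = optimal_repeater(**params)
    d = delay_per_unit_length(size=s_opt, seg_len=l_opt, **params)
    print(f"S_opt = {s_opt:.1f}, l_opt = {l_opt * 1e6:.0f} um, delay = {d * 1e9:.2f} ps/mm")
```

Perturbing either the size or the segment length away from the returned values increases the delay per unit length, which offers a quick numerical check on the closed-form optimum.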
