Abstract-Scaling challenges with NAND Flash have forced manufacturers to consider monolithic 3-D process and device architectures as potential successor technologies. Those that involve a vertical cylindrical channel are regarded as favorites. These include bit-cost scalable (BiCS) NAND, pipe-shaped bit-cost scalable (p-BiCS) NAND, and terabit cell array transistor (TCAT) NAND. It has been assumed that their manufacturing costs decrease monotonically with the number of additional device layers. This paper presents a rigorous analysis of this assumption based on recently reported challenges associated with the construction of these architectures. It is shown that there is a minimum in die cost after which costs increase with increasing device layers. Also, achievable die sizes using these approaches may not even reach those of existing production NAND Flash. An important consequence is that monolithic 3-D approaches that involve more lithography-intensive steps may actually result in lower total cost provided that these scale appropriately.
Index Terms-3-D, memory, nonvolatile (NV).
I. Introduction

NAND Flash scaling problems in 2-D have given rise to several monolithic 3-D process and device architectures that involve vertical cylindrical channels. The first to make an appearance was the Bit-Cost Scalable (BiCS) approach [1], [2]. This architecture evolved into the pipe-shaped Bit-Cost Scalable (p-BiCS) approach [3], [4] to increase cell reliability. A similar vertical structure known as Terabit Cell Array Transistor (TCAT) NAND was introduced with the main differences being a "gate-last" approach and a bulk tunneling erase [5], [6]. In all these cases, the relative bit cost is projected to decline monotonically with increasing memory density, where the latter is achieved by ever-lengthening vertical NAND strings. Recently, however, certain challenges have been reported in the fabrication of these structures, the most important being the need to increase the unit cell size as the vertical string lengthens [7]. This increase has been quantified in terms of the angled nature of the vertically etched hole and slit inherent in these structures [8], [9].
This paper constructs a die cost model based on the structure of the vertical cylindrical NAND Flash architectures. As such, it extends the author's published model [10] to be able to compare such vertical approaches with the more evolutionary but lithography-intensive monolithically stacked architectures [11]-[16]. This allows a more complete cost comparison than simply considering the total number of cells in a single die. The assumption is that such lithography-intensive approaches can indeed scale. Preliminary data in this regard looks promising since classic electrostatic control measures can be implemented, such as ultra-thin channels [17], [18] and dual-gate structures [15], [16].
II. Physical Structure
The pitch in the X-direction is made up of the channel hole diameter and the hole-to-hole space, both at the top of the stack. The pitch in the Y-direction includes the slit-to-slit space plus the slit dimension, both at the top of the stack. The slit-to-slit space includes the channel hole positioned within the horizontal gate. This will involve a lithography misalignment that will have to be accounted for in the cell size. However, this registration error will be put aside in the following analysis so that the results cannot be attributed to poor lithography.
An important realization is that the cell area at the top of the stack defines the total memory array footprint on the die. For a fixed memory capacity, fewer cells take up each layer as the stack heightens. Therefore, the array footprint should decrease. However, the cell area at the top of the taller stack increases and offsets the expected array area reduction. We may indeed reach the point where any array area reduction through stacking is completely offset by the cell area growth. This paper analyses this effect and determines the main factors affecting it.
0894-6507 © 2013 IEEE
Equations (1) and (2) describe the memory cell pitches at the top of the stack in the X- and Y-directions, respectively. In these equations, P_X is the cell pitch in the X-direction, P_Y is the cell pitch in the Y-direction, ϕ is the hole diameter at the top of the stack including the gate dielectric, F is the minimum feature size at the top of the stack, D_B is the slit's short side dimension at its bottom, N_L is the number of device layers, L_g is the vertical gate length, L_s is the vertical gate space, and θ is the taper angle from the normal of both the channel hole and the slit. In this analysis, we assume that both hole and slit can be described by the same taper angle. This is probably a simplification since the hole and slit structures will be determined by different process requirements. It will be left to the interested reader to build an extension to this model using two different taper angles.
The hole diameter ϕ is given by (3), following [8], where R_B is the bottom radius of the channel hole excluding the gate dielectric, T_ONO is the thickness of the gate dielectric, and all the other symbols have been defined above. The memory cell area in each layer (A) is given by (4) and the effective cell area of the 3-D vertical NAND (A_eff) is given by (5).
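Relations (1) to (5) are not reproduced in this text. As a minimal sketch, one plausible reading of the definitions above is ϕ = 2(R_B + T_ONO + N_L(L_g + L_s) tan θ), P_X = ϕ + F, P_Y = ϕ + 2F + D_B + 2 N_L(L_g + L_s) tan θ, A = P_X P_Y and A_eff = A / N_L. These forms, the assumption of one hole row between adjacent slits, and every default dimension in the code are illustrative assumptions, not values taken from this paper or from [8].

```python
import math


def cell_geometry(n_layers, theta_deg, r_b=15.0, t_ono=15.0,
                  f=20.0, d_b=20.0, l_g=20.0, l_s=20.0):
    """Return (phi, p_x, p_y, area, a_eff) at the top of the stack.

    Lengths are in nm and areas in nm^2. All defaults are hypothetical
    placeholders, and the pitch formulas are an assumed reading of the text.
    """
    # Lateral widening of one sidewall over the full stack height.
    widen = n_layers * (l_g + l_s) * math.tan(math.radians(theta_deg))
    phi = 2.0 * (r_b + t_ono + widen)          # top hole diameter incl. dielectric
    p_x = phi + f                              # hole plus hole-to-hole space
    p_y = phi + 2.0 * f + d_b + 2.0 * widen    # slit-to-slit space plus top slit width
    area = p_x * p_y                           # cell area per layer
    return phi, p_x, p_y, area, area / n_layers


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        phi, p_x, p_y, area, a_eff = cell_geometry(32, theta)
        print(f"theta={theta:.1f} deg  P_X={p_x:.1f} nm  P_Y={p_y:.1f} nm  "
              f"A_eff={a_eff:.1f} nm^2")
```

With zero taper the top-of-stack pitches do not depend on N_L, so A_eff falls as 1/N_L; any non-zero taper makes both pitches grow linearly with N_L.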
III. Process and Die Cost
To lay the groundwork, an approach similar to that taken in reference [10] is adopted here to estimate wafer costs. Equation (6) gives the wafer process cost for a monolithically stacked 3-D NAND process, whether it be the vertical cylindrical channel type described above or the more lithography-intensive approaches discussed elsewhere. In (6), C^3D_W is the process cost of the 3-D wafer, C_0 is the base wafer cost with all the support circuitry but without the memory cells (assumed to be very similar for any of the 3-D approaches), C^3D_crit-mask is the cost of a critical mask layer, and N^3D_crit-mask is the number of such layers. The total number of dice per wafer (DPW) using any approach can be approximated by (7),
where d is the wafer diameter and A_die is the die area. The latter in 3-D (A^3D_die) is given by (8),
where N_cell is the total number of cells in the chip, A is the area of a single memory cell on each layer (given by (4) in the case of the vertical channel NAND), and AE_0 is the array efficiency, defined as the memory array area "as seen from above" divided by the total chip area and expressed in %. Unlike reference [10], we shall not consider multiple levels per cell since it is unlikely to be a differentiating factor between different 3-D approaches.
IV. Die Size Examples
Fig. 5 shows the die size of a 128 Gbit NAND Flash as a function of the number of device layers when using a vertical channel approach with the taper angle used as a varying parameter. The physical structure constants used are given in the figure and follow those used in reference [8]. Additional constants used here are D_B, the slit's short side dimension at its bottom, and AE_0, the array efficiency. Clearly, any of these parameters may be altered to optimize area and cost but the fundamental behavior remains the same.
A few key points can be made based on Fig. 5. First, a change of taper angle by half a degree has a rather dramatic impact on the die size, namely an increase of about 60 mm². Second, taper angles of one degree and above lead to minima in die sizes as a function of device layers. This is of course contrary to the original claims about such vertical channel NAND. Third, for a taper angle of one degree, about 50 device layers would be needed to reach about the same die size as the latest (at the time of writing this paper) classic 128 Gbit NAND Flash that uses 20 nm design rules and 3 bits per cell (146.5 mm² given in [19]).
Fig. 6 shows the die size of a 128 Gbit NAND Flash as a function of the number of device layers when using a vertical channel approach with the vertical gate pitch (L_g + L_s) used as a varying parameter. The physical structure constants used are again given in the figure and follow those used in Fig. 5 and reference [8]. The most important one is the taper angle, which is held at one degree.
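The kind of sweep behind such die-size curves can be sketched as follows. This is only an illustration: the pitch formulas (P_X = ϕ + F, P_Y = ϕ + 2F + D_B plus slit widening, with ϕ = 2(R_B + T_ONO + N_L(L_g + L_s) tan θ)), the form A_die = N_cell · A / (N_L · AE_0), and all numeric constants are assumptions made for this sketch and are not the constants used for the figures.

```python
import math

NM2_TO_MM2 = 1e-12           # 1 nm^2 expressed in mm^2
N_CELLS_128G = 128 * 2**30   # 128 Gbit with one bit per cell


def a_eff_nm2(n_layers, theta_deg, pitch_nm=40.0,
              r_b=15.0, t_ono=15.0, f=20.0, d_b=20.0):
    """Effective cell area per bit (nm^2); geometry constants are hypothetical."""
    widen = n_layers * pitch_nm * math.tan(math.radians(theta_deg))
    phi = 2.0 * (r_b + t_ono + widen)
    p_x = phi + f
    p_y = phi + 2.0 * f + d_b + 2.0 * widen
    return p_x * p_y / n_layers


def die_area_mm2(n_cells, n_layers, theta_deg, array_eff=0.8, **geom):
    """Die area assuming the array occupies a fraction array_eff of the chip."""
    return n_cells * a_eff_nm2(n_layers, theta_deg, **geom) * NM2_TO_MM2 / array_eff


def best_layer_count(n_cells, theta_deg, max_layers=200, **geom):
    """Layer count in 1..max_layers that gives the smallest die."""
    return min(range(1, max_layers + 1),
               key=lambda n: die_area_mm2(n_cells, n, theta_deg, **geom))


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0, 2.0):
        n = best_layer_count(N_CELLS_128G, theta)
        area = die_area_mm2(N_CELLS_128G, n, theta)
        print(f"theta={theta:.1f} deg: smallest die {area:.1f} mm^2 at {n} layers")
```

Under these assumptions a zero taper angle gives a die size that keeps falling up to the largest layer count tried, while a non-zero taper produces an interior minimum that moves to fewer layers as the vertical gate pitch grows.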
Several important conclusions follow from Fig. 6. First, die size is a very strong function of vertical gate pitch when the taper angle is non-zero. Indeed, in this example, a 20 nm change in vertical gate pitch results in an increase of about 70 mm² in the achievable minimum die size. Second, clear minima in die size are apparent at vertical gate pitches equal to or larger than 40 nm. Third, a vertical gate pitch of 40 nm at this taper angle is needed to reach the die size of the 128 Gbit NAND Flash presented in reference [19]. This is equivalent to 20 nm vertical gate length and 20 nm vertical gate space. Clearly, the originally envisaged freedom to lengthen the vertical gate length in such NAND architectures becomes severely restricted whenever there is a non-zero taper on the hole and slit.
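The joint role of taper angle and vertical gate pitch can be made explicit with a short worked derivation. It is a sketch only: it assumes that both top-of-stack pitches grow linearly with N_L, using the illustrative forms P_X = ϕ + F and P_Y = ϕ + 2F + D_B + 2 N_L p tan θ with ϕ = 2(R_B + T_ONO + N_L p tan θ) and p = L_g + L_s. These forms, and the shorthand a, b, c, d, are assumptions introduced here, not the paper's equations.

```latex
\begin{align}
P_X &= a + b\,N_L, & a &= 2(R_B + T_{ONO}) + F, & b &= 2p\tan\theta,\\
P_Y &= c + d\,N_L, & c &= 2(R_B + T_{ONO}) + 2F + D_B, & d &= 4p\tan\theta,\\
A_{eff}(N_L) &= \frac{P_X P_Y}{N_L} = \frac{ac}{N_L} + (ad + bc) + bd\,N_L,\\
\frac{dA_{eff}}{dN_L} &= -\frac{ac}{N_L^{2}} + bd = 0
\;\Rightarrow\; N_L^{*} = \sqrt{\frac{ac}{bd}} = \frac{1}{p\tan\theta}\sqrt{\frac{ac}{8}},\\
A_{eff}(N_L^{*}) &= ad + bc + 2\sqrt{abcd}.
\end{align}
```

For a fixed cell count and array efficiency, die area is proportional to A_eff, so the die-size minimum sits at the same N_L*. In this simplified picture N_L* scales as 1/(p tan θ): doubling either the vertical gate pitch or tan θ halves the layer count at which stacking stops paying off, and as θ approaches zero the minimum recedes to arbitrarily large layer counts.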
V. Die Cost Examples
The die size behavior with various parameters can be extended to die cost using (6), (7) and (8). The same values for C_0 and C^3D_crit-mask are used as in reference [10], namely $2800 and $200 respectively, which, of course, can be varied by the interested reader. A value of 2 is used for N^3D_crit-mask, which is the main advantage of such a vertical channel approach. Any additional cost associated with the various process steps to form the memory stack is regarded as negligible for this analysis. This is most probably a very optimistic approach in favor of the vertical channel architecture since these steps are certainly not similar to any standard 2-D NAND process. Included in these are multi-conductor and insulator depositions, deep multi-material reactive ion etching to form the channel hole and slit, memory gate dielectric deposition into the hole, channel silicon deposition and removal from everywhere else, refractory metal deposition or silicide formation to enhance the gate conductivity within the slit, and removal of unwanted metal in the slit.
Another simplification concerns yield, namely that we assume no yield degradation as we form the memory stack. Again, this is optimistic in favor of the vertical channel architecture but can be regarded as providing an asymptotic target that manufacturers tend to approach with time and process learning. Indeed, a similar path is taken below when various monolithic 3-D architectures are compared. Fig. 7 shows the die cost of a 128 Gbit chip as a function of the number of device layers in the vertical channel stack with the taper angle as a parameter. The model constants are shown in the inset and include the base wafer cost, the cost per critical masking layer and the number of such layers. As would be expected from the above die size analysis, cost minima are apparent. Indeed, half a degree increase in taper angle at a 40 nm vertical gate pitch has about $5 impact on die cost. Fig. 8 shows die cost of a 128 Gbit chip as a function of the number of device layers with the vertical gate pitch as a parameter with the taper angle held at one degree. Besides the cost minima, it can be seen that at this taper angle the vertical gate pitch has a dramatic impact on die cost. Every 20 nm increase in gate pitch increases the die cost by about $4.
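The step from die size to die cost can be sketched in a few lines. The dice-per-wafer expression below is one widely used gross-die approximation (wafer area over die area minus an edge-loss term) and may differ from the paper's (7); the 300 mm wafer diameter is an assumption, and the wafer cost form C_0 + N_crit-mask · C_crit-mask is an assumed reading of (6). Yield is taken as 100%, in line with the simplification stated above.

```python
import math


def dice_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Gross dice per wafer: area ratio minus an edge-loss correction.

    The formula and the 300 mm default are assumptions for this sketch.
    """
    d = wafer_diameter_mm
    return (math.pi * (d / 2.0) ** 2 / die_area_mm2
            - math.pi * d / math.sqrt(2.0 * die_area_mm2))


def wafer_cost(base_cost=2800.0, mask_cost=200.0, n_crit_masks=2):
    """Wafer cost in $: base wafer plus the critical mask layers."""
    return base_cost + n_crit_masks * mask_cost


def die_cost(die_area_mm2, **wafer_kwargs):
    """Cost per die in $, assuming every die on the wafer is good."""
    return wafer_cost(**wafer_kwargs) / dice_per_wafer(die_area_mm2)


if __name__ == "__main__":
    for area in (80.0, 100.0, 150.0, 200.0):
        print(f"{area:6.1f} mm^2 -> {dice_per_wafer(area):7.1f} dice, "
              f"${die_cost(area):.2f} per die")
```

Because the vertical channel wafer cost is fixed at the base cost plus two critical masks in this sketch, die cost tracks die area directly, which is why minima in die size carry over to minima in die cost.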
VI. Discussion and Comparisons
The ability of vertically oriented channel NAND to deliver decreasing cost per bit as more layers are added is clearly highly dependent on two interlocking factors, namely hole-and-slit taper angle and vertical gate pitch. These two factors interact to form a relatively large cell pitch in the topmost layer of the memory stack. The problem, of course, is that it is this layer projected onto the silicon wafer that defines the memory footprint of the chip. Adding more layers in an attempt to increase memory capacity also enlarges the cell footprint at the top of the stack. This tradeoff eventually leads to larger and more expensive dice as more layers are added. In this way, cost minima arise as we have seen above. Clearly, vertical gate pitch and hole-and-slit taper angle have become the prime scaling parameters in these vertical channel implementations, taking over from classic NAND's wordline and bitline pitches. This seems to be yet another example of the "conservation of misery" principle that abounds in the semiconductor industry. Instead of relying on the ultimate lithography tools, these vertical cylindrical channel NAND approaches appear to place the spotlight on etch and deposition as the keys to smaller dice. In fact, advanced lithography may even be unable to shrink these dice any further since the hole size, taper angle and vertical gate pitch will be defined by other factors such as memory retention, etching a high aspect ratio hole and slit, deposition and removal of material within these structures, and wordline resistance of thin gate sheet conductors.
The fact that die size and cost minima appear for the vertical channel NAND architectures makes a comparison with the more lithography-intensive 3-D Flash approaches a sensible strategy. To do this, we shall look at making a 256 Gbit single chip NAND Flash and compare doing so using, on the one hand, the vertical cylindrical approach as described above, and, on the other hand, a generic lithography-intensive approach that involves the use of three critical masking layers per device layer. A 256 Gbit chip is chosen with the thought that classic 2-D NAND may not reach this goal resulting in the most cost-effective way being some form of monolithic 3-D approach. This, of course, may be a hostage to fortune given the incredible shrinking path that the former has taken up until now. Nevertheless, it illustrates the point of comparison.
If classic 2-D NAND does in fact manage to reach this product level, we can do the comparison at 512 Gbit at a later date.
Further assumptions for this comparison include: yield effects are not taken into account for either 3-D approach; the control block circuitry that resides in the bulk silicon is similar for both approaches and can be described by the same array efficiency factor; the minimum possible die size that can be reached by any 3-D approach will be about the area of existing 2-D NAND Flash peripheral circuitry since this will remain in the bulk (about 30 mm² for a 150 mm² die with an array efficiency of 80%); the same base wafer cost and critical mask cost are used for both; both 3-D approaches use single level cells; and device layers can be stacked until the minimum possible die area for each is reached. This last assumption is an over-simplification since, as more 3-D cells are stacked, more support circuitry will be needed. However, this will be so for both approaches so neither will gain an advantage in the comparison by assuming no block growth. In this regard, the author's reference [10] was too conservative in assuming a smaller number of stacked device layers to approach this minimum die size.
Before entering into the comparison in earnest, we can consider the main factors that will affect the result. In the vertical cylindrical approach, increasing the number of device layers to increase capacity will result in a larger memory array footprint if the taper angle is non-zero. The cost tradeoff is then between increasing memory capacity to lower cost per bit versus needing a larger die to do so and therefore increasing the total die cost and therefore cost per bit. In any lateral lithography-intensive approach, increasing the number of device layers to increase capacity results in a smaller memory array and therefore die, and so pushes cost per bit down. However, this is done using increasing wafer costs due to the expensive lithography that will push cost per bit up. The following comparison tests these tradeoffs between the two 3-D approaches.
Fig. 9 is the equivalent of Fig. 5 but for a 256 Gbit die, showing the die size as a function of the number of added device layers in the vertical cylindrical NAND approach. The same model parameters as in Fig. 7 are used and the taper angle is varied from zero to two degrees. In all cases, a fairly aggressive 20 nm vertical gate length and 20 nm vertical gate space are used. Minimum possible die area is only achievable with a taper angle at or close to zero degrees.
As a comparison, Fig. 10 shows the 256 Gbit die size as a function of the number of added device layers in a generic lateral lithography-intensive approach where the cell size per layer is taken as 5F^2 with F, defined by lithography, as the variable parameter starting at 20 nm and increasing to 40 nm. Also, each memory array size is multiplied by 1.1 to account for the overhead of bitline contacts and string select devices. The main thing to notice is that the memory array footprint can continue to shrink by adding device layers. It is the cost of doing so that must be compared with the vertical channel approach. Such an approach reaches the minimum possible die area as long as enough device layers are added.
The main thing to notice from a comparison of Fig. 9 and Fig. 10 is the ability of a lithographically expensive approach to reach die sizes that are smaller than all but the close-to-zero taper angle vertical channel approach. The question is, can this be done such that the total cost of the resultant die is lower?
Fig. 11 shows the die cost of a 256 Gbit chip as a function of the number of device layers with the taper angle as a parameter when using a vertical channel NAND approach. The model constants are shown in the inset and include the base wafer cost, the cost per critical masking layer and the number of such layers, and are the same as used for Fig. 7. Fig. 12 shows the die cost of a 256 Gbit chip as a function of device layers when using a lithography-intensive layered 3-D approach. The minimum feature size, F, is the parameter. The base wafer cost and cost per critical lithography step, $2800 and $200 respectively, are the same as used for the vertical channel NAND. The main difference here is that we have assumed three critical lithography steps per device layer. This adds $600 to the wafer cost for each additional device layer.
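The comparison can be sketched end to end under explicit assumptions. The $2800 base wafer cost, $200 per critical mask, two masks in total for the vertical approach, three masks per device layer for the layered approach, the 5F^2 cell with a 1.1 overhead factor, and the 30 mm² die-size floor come from the text. Everything else is assumed for illustration only: the vertical cell geometry and its constants, the 80% array efficiency, the 300 mm wafer, the dice-per-wafer approximation, and treating the floor as a simple clamp.

```python
import math

N_CELLS = 256 * 2**30        # 256 Gbit, single-level cells
NM2_TO_MM2 = 1e-12           # 1 nm^2 expressed in mm^2
BASE, MASK = 2800.0, 200.0   # $ per base wafer and per critical mask (from the text)
MIN_DIE = 30.0               # mm^2 floor set by peripheral circuitry (from the text)
AE = 0.8                     # array efficiency (assumed)


def dpw(area_mm2, d_mm=300.0):
    """Gross dice per wafer; formula and wafer diameter are assumptions."""
    return (math.pi * (d_mm / 2.0) ** 2 / area_mm2
            - math.pi * d_mm / math.sqrt(2.0 * area_mm2))


def die_area(a_eff_nm2):
    """Die area in mm^2, clamped at the assumed minimum die size."""
    return max(N_CELLS * a_eff_nm2 * NM2_TO_MM2 / AE, MIN_DIE)


def vertical_cost(n_layers, theta_deg, pitch=40.0,
                  r_b=15.0, t_ono=15.0, f=20.0, d_b=20.0):
    """Die cost ($) for the vertical channel approach (hypothetical geometry)."""
    widen = n_layers * pitch * math.tan(math.radians(theta_deg))
    phi = 2.0 * (r_b + t_ono + widen)
    a_eff = (phi + f) * (phi + 2.0 * f + d_b + 2.0 * widen) / n_layers
    return (BASE + 2 * MASK) / dpw(die_area(a_eff))


def layered_cost(n_layers, f_nm=20.0):
    """Die cost ($) for the layered approach: 5F^2 cell, 3 masks per layer."""
    a_eff = 1.1 * 5.0 * f_nm ** 2 / n_layers
    return (BASE + 3 * MASK * n_layers) / dpw(die_area(a_eff))


def cheapest(cost_fn, max_layers=200, **kw):
    """Return (lowest die cost, layer count) over 1..max_layers."""
    return min((cost_fn(n, **kw), n) for n in range(1, max_layers + 1))


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0, 2.0):
        cost, n = cheapest(vertical_cost, theta_deg=theta)
        print(f"vertical, theta={theta:.1f} deg: ${cost:.2f} at {n} layers")
    for f_nm in (20.0, 30.0, 40.0):
        cost, n = cheapest(layered_cost, f_nm=f_nm)
        print(f"layered, F={f_nm:.0f} nm: ${cost:.2f} at {n} layers")
```

With these illustrative constants the zero-taper vertical stack is cheapest, while a one-degree taper leaves the vertical approach more expensive than the layered approach at F = 20 nm. The absolute dollar values depend entirely on the assumed constants and should not be read as the values plotted in the figures.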
Figs. 11 and 12 encapsulate the crux of this paper. It appears that ever-decreasing cost NAND Flash using a vertical channel approach can only really be achieved when the taper angle is zero or within a few tenths of a degree from zero and when the vertical gate pitch is around 40 nm or below. Any taper angle larger than this combined with a vertical gate pitch larger than 40 nm results in a die cost that can be undercut by a lithography-intensive layered 3-D approach even when the latter has three critical layers per device level.
It is difficult to know what levels of taper angle are achievable by manufacturers but now and then, they include cross section micrographs using scanning electron microscopy (SEM) or transmission electron microscopy (TEM) in their publications. Table 1 shows measurements of these taper angles from the literature. The method involved taking a simple protractor to magnified SEM/TEM images so will be rough and ready. Nevertheless, it does give a fairly good indication of what taper angles have been reached in the vertical channel architectures. The main result is that getting below even one degree is turning out to be quite difficult. With this, it appears that such vertical channel approaches remain susceptible to being undercut in cost by the stalwart of the semiconductor industry, namely photolithography.
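As an alternative to the protractor, the taper angle can be estimated from three lengths read off a micrograph. The sketch below assumes a symmetric hole with straight sidewalls, so that tan θ = (top width - bottom width) / (2 × depth); the function name and the example numbers are hypothetical and are not measurements from the literature.

```python
import math


def taper_angle_deg(top_width, bottom_width, depth):
    """Sidewall angle from the normal, in degrees.

    Assumes a symmetric, straight-walled profile; all three lengths must
    be in the same unit (for example nm, or pixels on the image).
    """
    return math.degrees(math.atan((top_width - bottom_width) / (2.0 * depth)))


if __name__ == "__main__":
    # Hypothetical reading: 100 nm wide at the top, 60 nm at the bottom,
    # 1000 nm deep.
    print(f"{taper_angle_deg(100.0, 60.0, 1000.0):.2f} degrees")
```

Because only ratios of lengths enter, the estimate does not depend on the image scale, although it is still limited by how well the hole edges can be located on the image.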
VII. Conclusion
This paper extends the work of others by putting vertical channel NAND Flash architectures on a firm cost footing. The results are quite intriguing, namely that two key parameters, taper angle and vertical gate pitch, conspire to produce cost minima as a function of device layers. Comparisons with a layered lithography-intensive approach show that there is a small parameter space, defined by taper angle and vertical gate pitch, within which the vertical channel architecture can maintain a lower total chip cost. A taper angle above a few tenths of a degree opens these architectures up to being undercut in cost by the lithography-intensive approaches. A simple way of looking at this is that any non-zero taper angle projects a large cell pitch at the top of the stack. This cell pitch forms a large array at the top that itself gets projected onto the surface of the die and defines the footprint of the memory array.
A general principle can be gleaned from this study. It is that any high density 3-D Flash approach that exchanges lithography-intensive processing per device layer for a stack deposition followed by deep hole and/or trench etching must result in taper angles of zero or close to zero degrees. Otherwise, its total cost can be undercut by any 3-D process that uses lithography per device layer to minimize cell areas on all layers.
