Abstract: The prospects of three-dimensional (3D) integration for Terabyte large scale integration using bumpless interconnects with low-aspect-ratio TSVs and ultra-thinning are discussed. Bumpless (no bump) interconnects between wafers are a second-generation alternative to the use of microbumps for Wafer-on-Wafer (WOW) technology. Ultra-thinning of wafers down to 4 µm provides the advantage of a small form factor, not only in terms of the total volume of 3D ICs, but also the aspect ratio of Through-Silicon-Vias (TSVs). Our bumpless interconnects technology is classified into Via-Last, which is performed from the front side after thinning, and stacking Back-to-Front, in which any number of thinned 300 mm wafers and/or heterogeneous dies can be integrated. From an economic point of view, in many situations WOW is the leading 3D process because stacking at the wafer level drastically increases the processing throughput, and using multi-level bumpless interconnects, with individual wiring die-to-die, provides an appropriate yield that is equivalent to or greater than that achievable with 2D processes when scaling down to 22 nm nodes and beyond.
Introduction
Prior to discussing 3D integration for high-volume manufacturing, it is necessary to investigate the future prospects of semiconductor technology development. Conventional two-dimensional (2D) scaling will face a severe economic crisis due to the expensive lithography processes and facilities required [1, 2]. Reducing costs requires the adoption of advanced lithography technologies, which, together with peripheral support facilities, account for one-third to one-fourth of the total cost of a manufacturing line. In short, while useful for reducing chip size, scaling is extremely burdensome in terms of capital investment. Large-scale investments have so far been made considering the technologies that will be available two to three generations ahead, e.g., 10 nm technology should be applied to nodes of <5 nm, which will also face physical limitations. This is based on the empirical rule that profits are made several generations after investments for reasons involving the tradeoffs between product sales and facility depreciation.
According to this empirical rule, an investment in 22 nm technology needs to be made in consideration of its applicability to 10-14 nm technologies. However, the price of extreme ultraviolet (EUV; λ = 13.5 nm) lithography machines is about 100 million USD, which is more than twice that of ArF immersion lithography machines, and their current throughput is around one-tenth or less. When converted into the processing capacity of a current large-scale fabrication facility (e.g., 50,000 incoming wafers per month), based on this system performance, an investment of approximately 2 billion USD will be required for EUV technology. Assuming, as in the past, that the lifetime sales for each generation are approximately 10 times the corresponding business investment, the market size necessary for this investment is more than 20 billion USD. Based on the 300 billion USD total worldwide semiconductor market, this expected market size for one product and one manufacturer is not realistic. In short, this is the limit of 2D scaling in light of the economics of the industry, and it is difficult to find a scenario of victory at present.
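The investment argument above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the function name and constants are illustrative and not part of the original analysis.

```python
# Figures quoted in the text, in USD.
FAB_INVESTMENT = 2e9        # approx. EUV investment for a 50,000 wafer/month fab
SALES_TO_INVESTMENT = 10    # empirical rule: lifetime sales ~10x the investment
WORLD_MARKET = 300e9        # total worldwide semiconductor market


def required_market(investment, multiple=SALES_TO_INVESTMENT):
    """Lifetime sales needed to justify an investment under the 10x rule."""
    return investment * multiple


market = required_market(FAB_INVESTMENT)
share = market / WORLD_MARKET

print(f"required market: {market / 1e9:.0f} billion USD")
print(f"share of the worldwide market: {share:.1%}")
```

Under these assumptions a single product line from a single manufacturer would need to capture roughly one-fifteenth of the entire worldwide market, which is the sense in which the text calls the expected market size unrealistic.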
Extension into vertical space, such as 3D stacking, in combination with conventional 2D integration, is anticipated to overcome these problems [1, 2, 3, 4, 5, 6]. Fig. 1 shows a comparison of bump and bumpless interconnects using through-silicon vias (TSVs) for 3D logic/memory stacked structures, assuming six dies for a memory stack and one multi-core microprocessor. It is possible to make a roadmap to achieve high-density integration backed up by production costs. A stack containing three 300 mm wafers provides a total silicon surface larger than that of a single 450 mm wafer. Moreover, retaining the standard 300 mm wafer size for stacking ensures compatibility with existing manufacturing facilities in front-end processing and helps utilize the mature process technology that has been developed for wafer processing. This paper reviews wafer-level 3D integration and compares its manufacturability with conventional 2D scaling. For vertical wiring, bumpless interconnects using TSVs, ultra-thinning technologies for Wafer-on-Wafer, and Terascale generation devices are also described.
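The claim that three stacked 300 mm wafers offer more silicon surface than one 450 mm wafer follows from simple geometry. The sketch below assumes ideal circular wafers and ignores edge exclusion and notches; the helper name is illustrative.

```python
import math


def wafer_area_cm2(diameter_mm):
    """Area of an ideal circular wafer in cm^2 (edge exclusion ignored)."""
    radius_cm = diameter_mm / 20.0  # diameter in mm -> radius in cm
    return math.pi * radius_cm ** 2


stack_area = 3 * wafer_area_cm2(300)   # three stacked 300 mm wafers
single_450 = wafer_area_cm2(450)       # one 450 mm wafer

print(f"three 300 mm wafers: {stack_area:.0f} cm^2")
print(f"one 450 mm wafer:    {single_450:.0f} cm^2")
print(f"ratio: {stack_area / single_450:.2f}")
```

Because area scales with the square of the diameter, the ratio is 3 × (300/450)² = 4/3, so the three-wafer stack provides about one-third more surface.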
Wafer-level 3D integration process
Bumpless interconnects using TSVs are a second-generation alternative to the use of TSVs with micro-bumps [7, 8, 9]. Our bumpless interconnects process involves a Thinning-First process before bonding wafers, followed by a Via-Last process, meaning that interconnects are formed after bonding the wafers, as shown in Figs. 2 and 3. Via-hole etching is carried out on a silicon substrate having a dielectric layer of multilevel interconnects after wafer thinning. Since bumpless Wafer-on-Wafer (WOW), including Chip-on-Wafer (COW), technologies use a Back-to-Front stack, in principle any number of thinned 300 mm wafers can be stacked to fabricate large-capacity memory and logic devices. This wafer stacking method is similar to multilevel metallization in the Back-End-of-Line (BEOL), as if dielectric deposition were replaced by thinned wafers, and Al and/or Cu metallization by bumpless interconnects using TSVs.
The development of WOW has proceeded through four modules, classified along the process flow. The modules include a thinning module for thinning the wafer substrates in which devices are implemented, a stacking module for bonding and stacking the wafers, a TSV interconnects module for forming Cu interconnects embedded in upper and lower wafers with TSVs, and a packaging module for singulating the stacked wafers. Dual-Damascene interconnects form a so-called redistribution layer (RDL) and also serve as a counter electrode for the subsequent stacked wafer.
The thickness of the thinned wafer is a critical dimension for the aspect ratio of TSVs because the aspect ratio is determined by the diameter and the wafer thickness. However, since the thinned wafer is bonded on a base wafer, there is no need to take measures for handling ultrathin wafers. In our recent study, the typical thickness of a thinned wafer ranged from 5 to 10 µm. When the thicknesses of the device layers in a DRAM and an MPU are assumed to be approximately 5 µm and 10 µm, respectively, the aspect ratio (depth-to-diameter ratio) of a TSV is only 2-4 for a TSV diameter ranging from 5 to 10 µm, whereas conventional TSVs with bumps have aspect ratios ranging from 5 to 10. With decreasing aspect ratio in the TSV processes, such as etching, thin film deposition, and metal filling, the process time decreases to about 1/5 at most, and the step coverage significantly improves.
Fig. 1. A comparison of bump and bumpless interconnects using TSVs for 3D logic/memory stack structures, assuming six dies for a memory stack and one multicore microprocessor. Bumpless interconnects can be formed with higher density (narrower pitch) compared with TSVs and bumps due to the limitations of bump size and pitch. Since Back-to-Front stacking and a TSV-Last process have no limit on the number of stacks, high density and Tera-scale bandwidth can be anticipated using mature existing devices.
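The aspect ratios quoted above can be illustrated with a short calculation. The text does not state the via depth explicitly, so the 20 µm used below is purely an assumption (for example, thinned Si plus device layer plus adhesive), chosen only to show how a depth of this order yields the quoted range of 2-4.

```python
def aspect_ratio(depth_um, diameter_um):
    """Depth-to-diameter ratio of a via."""
    return depth_um / diameter_um


# Assumption (not from the text): total via depth through Si, device
# layer, and adhesive taken as 20 um for illustration.
ASSUMED_DEPTH_UM = 20.0

for diameter_um in (5.0, 10.0):   # TSV diameter range quoted in the text
    ratio = aspect_ratio(ASSUMED_DEPTH_UM, diameter_um)
    print(f"diameter {diameter_um:>4.1f} um -> aspect ratio {ratio:.1f}")
```

A different assumed depth shifts the numbers proportionally, so the sketch should be read as an order-of-magnitude illustration rather than a reconstruction of the authors' geometry.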
With the use of small TSVs, stress induced by a mismatch in the Coefficients of Thermal Expansion (CTE) between Cu and Si decreases with decreasing aspect ratio of the TSV, as shown in Fig. 4 [10]. Stress at the center of the Cu plug decreases in proportion to the thickness of the Si wafer. The small aspect ratio provided by an ultrathin wafer also has the advantage of reducing stresses generated in the silicon itself, in the bottom and top Cu-TSV, and in interface regions having different CTEs [11, 12, 13].
Details of WOW processes
Thinning module
A wafer is bonded to a support substrate (glass or Si wafer) in advance with a temporary adhesive. Thinning is performed by grinding (Back Grind, or BG) to within several micrometers of the target thickness, followed by polishing until the final thickness is achieved. The thinned wafer is permanently bonded to the device surface of another wafer, and then the support substrate is removed. In the case of WOW employing bumps between wafers, simply thinning the wafer causes its rigidity to decrease, and thinning is limited to 50-100 µm. By using bumpless WOW processes, a 300 mm wafer on which 32 nm-node SDRAM devices (2GB DDR3) are fabricated can be thinned down to 4 µm, which is just 0.5% of the initial thickness, with a total thickness variation (TTV) of around 1 µm [14]. With a wafer thickness of 4 µm, visible light starts to pass through the wafer. Remarkably, no degradation of retention time before and after thinning was observed, even in a Si wafer with a thickness of 4 µm.
Stacking module
An organic adhesive such as benzo-cyclo-butene (BCB) polymer [15] with a thickness of approximately 5 µm was used to permanently bond the wafer. BCB adhesives start to polymerize with increasing temperature and are solidified at temperatures of 200-250°C. For WOW, the wafers are aligned just before being permanently bonded. To ensure appropriate alignment, infrared light passing through the silicon substrate is used. Wafers to be bonded to one another in WOW are originally thin and are therefore highly transmissive. With a low-temperature bonding process and an optimized curing duration, the average misalignment between wafers can be made as small as several micrometers. On the other hand, because any gaseous solvent escaping from the adhesive after the bonding process would form a cavity (void) in the adhesive layer, measures should be taken to prevent this, such as preheating after applying the adhesive or performing the bonding process under a reduced pressure.
TSV (Through-Silicon-Via) module
For bumpless interconnects including RDLs, the Damascene method is employed to simplify the processes. For TSV processing, dry etching through the dielectrics (device layer), Si, and adhesive layer is carried out. TSVs with a small aspect ratio, for example, <3, have the advantage of shortening the process time for both etching and metal filling compared with conventional large TSVs. Assuming that the etching rate follows a mass-transport-limited reaction, the etching times, $t$ and $t_1$, at different TSV diameters, $D$ and $D_1$, and depths, $d$ and $d_1$, [...]
Fig. 5 shows the via structure after etching and the leakage current, comparing the Bosch and direct etching methods [16, 17]. Since Bosch etching was conducted by cyclic isotropic etching and deposition, micro-steps called scalloping were formed at the sidewall of the via. The scalloping causes cracks in the dielectric layer and poor step coverage for thin films deposited by CVD and Physical Vapor Deposition (PVD). In contrast, anisotropic direct etching resulted in a smooth surface profile along the sidewall. The leakage current in Bosch etching was one order of magnitude higher than that in direct etching. The leakage current was caused by Cu diffusion at the sidewall of the TSV that took place at a thinner part of the dielectrics containing cracks. Thus, direct etching is suitable for TSV processing and enables the use of low-aspect-ratio vias. In addition, low-aspect-ratio Cu causes lower stress concentration in thermal processing compared with high-aspect-ratio Cu. Low stress reduces Cu deformation and stress propagation to the device regions.
[...] subjected to heat stress testing at temperatures of −65°C to 150°C. Scanning acoustic tomography (SAT) was adopted for internal observation, and after up to 100 repeated heat stress tests, no delamination was found at the interfaces between the molding compound and chips, nor at the chip stack interfaces.
Thinning and device characteristics
The effect of wafer thinning on device performance was examined to evaluate the thinning limits for 300 mm silicon wafers. In the case of DRAM devices, following the WOW process, the wafer was thinned to a final thickness of 4 µm, which is about 0.5% of the thickness of the bulk wafer (725 µm) and thinner than the device layer shown in Fig. 6 [14]. The total thickness variation (TTV) within the 300 mm wafer was low enough to realize multi-stacking: a TTV of 1.02 µm was achieved at an average thickness of 4 µm. No significant change in retention time (refresh time) across the entire wafer before and after thinning was observed for Si thicknesses of 40, 20, 8, and 4 µm. This suggests that the thinning process developed in this study did not affect the junction leakage current, which degrades the retention time more sensitively than other leakage phenomena such as subthreshold leakage, capacitor dielectric leakage, and gate induced drain leakage (GIDL). Since an ultra-thinned wafer in the WOW process is bonded onto the bottom wafer in a so-called Bonding-First process, TSVs are processed after bonding (TSV-Last process), and there are none of the issues seen in the TSV-First process and/or the TSV-Middle process.
Fig. 7 shows a schematic diagram of the grinding process. The wafer thickness uniformity after grinding was determined by the contact angle between the wheel and wafer surface [18]. The wafer was very slightly bowed after bonding due to deformation at the wafer edge where temporary adhesive was applied, and this uniformity was also reflected in the contact angle. By adjusting the contact angle to follow the geometric shape of the wafer, the TTV decreased to as low as 0.5 µm within the 300 mm wafer. Wafer thinning was carried out as follows: coarse grinding (#320 grit size) down to ∼50 µm, fine grinding (#2000 grit size) to <20 µm, and post-thinning using chemical mechanical polishing (CMP), as shown in Fig. 8.
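The thinning figures quoted above can be put in proportion with a short calculation. This is a sketch using the thicknesses stated in the text; the fine-grinding target is given only as "<20 µm", so 20 µm is used here as an upper bound, and the variable names are illustrative.

```python
BULK_UM = 725.0    # bulk wafer thickness quoted in the text
FINAL_UM = 4.0     # final Si thickness after thinning
TTV_UM = 1.02      # total thickness variation at 4 um average thickness

# Thickness reached after each step (fine grinding: upper bound of "<20 um").
steps = [
    ("coarse grinding (#320)", 50.0),
    ("fine grinding (#2000)", 20.0),
    ("CMP post-thinning", FINAL_UM),
]

remaining_fraction = FINAL_UM / BULK_UM
ttv_relative = TTV_UM / FINAL_UM

thickness = BULK_UM
removed = []
for name, target_um in steps:
    removed.append((name, thickness - target_um))  # material removed in this step
    thickness = target_um

print(f"remaining Si: {remaining_fraction:.2%} of the bulk wafer")
print(f"TTV relative to final thickness: {ttv_relative:.1%}")
for name, amount in removed:
    print(f"{name}: removes about {amount:.0f} um")
```

The calculation makes explicit that almost all of the material is removed by coarse grinding, while the final polishing step accounts for only a small fraction of the removed silicon.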
With these thinning processes, the thickness of the damaged layer, including point defects such as vacancy-type defects, was decreased from micrometer level to several nanometers, as evaluated by TEM and Positron Annihilation Spectroscopy analyses [19, 20].
Manufacturing for Terabit generation
Manufacturing roadmap toward the next generation of devices
Because our method allows thinning of silicon wafers down to 4 µm without any degradation of the device characteristics, the total wafer thickness, including the device layer and the adhesive layer, becomes 10 to 20 µm, which is 1/10th to 1/100th the thickness of conventional bump interconnects using TSVs. Even if the number of stacked wafers is 100, assuming that the wafer thickness is 10 µm, the total thickness after stacking is only 1 mm, which satisfies current packaging standards. By following these multilevel stacking processes, with a conventional memory device fabricated with 22 nm technology and having a memory density of 30 Gb/cm², when four, eight, sixteen, etc. of these devices are stacked, the total capacity of the memory device can be increased to 120 Gb, 240 Gb, 480 Gb, etc., respectively. Terabit-capacity memory can be realized by stacking only 40 wafers; in contrast, to achieve equivalent capacity with a single wafer using extreme scaling would require 1 nm processing technology, as shown in Fig. 9. Considering the technology roadmap, the issues of scaling technology and technology for fabricating three-dimensional structures are often discussed separately. However, these two technologies are not always mutually exclusive. Scaling would be relieved of the stringent requirements by using three-dimensional high-density integration technology combined with mass-production technology. In other words, a sufficiently long learning period would be ensured, and further cost reductions could be expected by concentrating on the control of variations among generations and shortening the process.
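The stack capacity and thickness figures above follow from linear scaling with the number of layers. The sketch below assumes a die area of 1 cm² (implied by the quoted capacities but not stated explicitly) and the 10 µm per-layer thickness used in the text; the function names are illustrative.

```python
DENSITY_GBIT_PER_CM2 = 30   # 22 nm memory density quoted in the text
LAYER_THICKNESS_UM = 10     # thinned wafer incl. device and adhesive layers


def stack_capacity_gbit(n_layers, area_cm2=1.0):
    """Total capacity of n stacked layers; 1 cm^2 die area is an assumption."""
    return n_layers * DENSITY_GBIT_PER_CM2 * area_cm2


def stack_thickness_mm(n_layers, layer_um=LAYER_THICKNESS_UM):
    """Total stack thickness in millimetres."""
    return n_layers * layer_um / 1000.0


for n in (4, 8, 16, 40):
    print(f"{n:>3} layers: {stack_capacity_gbit(n):.0f} Gbit, "
          f"{stack_thickness_mm(n):.2f} mm")

print(f"100 layers: {stack_thickness_mm(100):.1f} mm")
```

With these assumptions a 40-layer stack exceeds 1 Tbit while remaining well under 1 mm thick, consistent with the figures given in the text.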
Considering the yield in wafer stacking
In the case of die-to-die series connections, the total yield in wafer stacking can be estimated by $Y^n$, where $Y$ and $n$ are the yield of one wafer and the number of stacked wafers, respectively. With bumpless interconnects using high-density multi-TSVs and a controller, output signals from one die and/or one memory bank (channel) can be connected to the upper and lower stacked chips independently, as shown in Fig. 10. Hence, an unreliable device and/or bank unit in the stack can be ignored and thus counted as an independent probability event, resulting in a total yield in wafer stacking of $Y_{3D} \gg Y^n$.
Fig. 8. Cross-sectional TEM images of Si wafer after thinning. Thinning was carried out by coarse grinding (#320 grit size), fine grinding (#2000 grit size), and stress relief (post-thinning) using Chemical Mechanical Polishing (CMP).
In the case of a simplified die-level yield, the yield is estimated using the value $Y = 0.64$ for a case where four wafers are stacked. The probabilities of four good dies and three good dies in such a stack are 0.17 and 0.86, respectively. Stacking four dies and three dies with a memory density of 30 Gbit/cm² (equivalent to 22 nm technology) achieves capacities of 120 Gbit and 90 Gbit, respectively. If the number of effective chips is 1700 per 300 mm wafer at a chip size of ∼0.4 cm² at 8 Gbit/die, the expected number of chip sets per unit memory capacity is 289/32 Gbit (4 stack), 1462/24 Gbit (3 stack), and 1088/8 Gbit (single). To realize a capacity of over 24 Gbit with a single chip would require technology two or three generations ahead, such as 10-14 nm technologies. The defects, however, are not proportional to the reduction in area. This is because, as scaling proceeds, so-called stealth defects (unobservable defects) increase, and the control of process variations approaches its limit. When stealth defects become dominant, variations cannot be improved statistically, and thus die yield decreases with scaling. According to an empirical profit model, the investment in 22 nm technology needs to be made in consideration of future 8-10 nm technologies because of the tradeoffs between product sales and facility depreciation after huge investments. The price of extreme ultraviolet (EUV) lithography machines is approximately twice that of ArF lithography machines, and their current throughput is around one-tenth. This is the limit of scaling in light of the economics of the industry, and it is difficult to paint a scenario of victory at present.
Fig. 9. Trends toward DRAM density using 2D conventional scaling and 3D multi stacking using existing DRAM. DRAM capacity in the 3D case corresponds to the number of stacked dies, assuming that redundancy is eliminated by cell-blocks at each layer.
Fig. 10. Schematic diagram of DRAM stack structure. Here, one memory die has 16 channels (CH0 to CH15) in total, stacked following DRAM2, DRAM1, DRAM0, and then controller to DRAM3 (base wafer), using bumpless interconnects and a WOW process. Bumpless interconnects are connected independently to the controller die from each channel of the DRAM layers.
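The chip-set counts in the yield discussion above can be reproduced from the quoted probabilities. The sketch below treats dies as independent events, which is an assumption made here for illustration. The value 0.17 follows directly from the series model Y^4, whereas 0.86 is used as quoted in the text. Numerically it coincides with the binomial probability of at least two good dies out of four, but that is only an observation and not a statement of how the authors derived it.

```python
from math import comb

Y = 0.64                 # single-die yield used in the text
CHIPS_PER_WAFER = 1700   # effective chips per 300 mm wafer
GBIT_PER_DIE = 8


def series_yield(y, n):
    """Yield when all n stacked dies must be good (series connection)."""
    return y ** n


def at_least_k_good(y, n, k):
    """Binomial probability that at least k of n independent dies are good."""
    return sum(comb(n, i) * y ** i * (1 - y) ** (n - i) for i in range(k, n + 1))


# Probabilities as quoted in the text, keyed by usable dies per stack.
quoted = {4: 0.17, 3: 0.86, 1: 0.64}

chip_sets = {dies: round(CHIPS_PER_WAFER * p) for dies, p in quoted.items()}
for dies, count in chip_sets.items():
    print(f"{count} chip sets at {dies * GBIT_PER_DIE} Gbit")

print(f"series model Y^4: {series_yield(Y, 4):.2f}")
print(f"at least 2 of 4 good: {at_least_k_good(Y, 4, 2):.2f}")
```

The helper `at_least_k_good` can be used to explore other redundancy assumptions, for example how the usable fraction changes if a different number of defective layers can be bypassed by the controller.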
Conclusions
WOW technology and bumpless interconnects using TSVs for three-dimensional stacking in wafer form have been described. It was found that an optimized wafer thinning process for determining the stack thickness does not cause degradation of the device characteristics in advanced commercial devices, even with the smallest thickness of 4 µm that we achieved. Because bumpless interconnects using TSVs can be connected directly to the upper and lower substrates by self-alignment, the package thickness can be reduced by an amount equivalent to the height of electrodes such as bumps, which are not necessary when bumpless interconnects are used in combination with wafer thinning. Whereas the design pitch of TSVs with bumps is determined by the bump size, high-density TSVs can be formed in bumpless interconnects by following TSV patterning processes. At the same time, size reduction of the finished shape allows the wiring between the upper and lower chips to be made shorter, which reduces the total wiring impedance and makes it easier to ensure high bandwidth with higher energy efficiency. Furthermore, by stacking wafers, high-density integration and system block arrangements become more flexible, and the design space is extended.
Fig. 11. Trends in two-dimensional (2D) scaling and wafer size including total Si surface of wafer stack. Conventional scaling will face difficulties such as physical limits and inability to minimize costs, whereas 3D integration will become superior to scaling. By combining conventional two-dimensional integration (2DI) with three-dimensional stacking to overcome such problems associated with device scaling and increasing wafer size, it is possible to make a roadmap towards high-density integration backed up by production costs. In volume production, 3D wafer stacking (WOW) enables a lower cost than Chip-on-Chip (COC) and high-density integration, reaching Terabit level. Bumpless interconnects using TSVs and ultra-thinning provide high-density I/Os connecting top and bottom device layers and achieve a small form factor 1/10th that of bump structures.
In combination with three-dimensional stacking for overcoming the problems associated with scaling, a roadmap towards high-density integration backed up by production costs can be formulated. The ability to stack chips while keeping the wafer shape unchanged ensures compatibility with existing manufacturing facilities in front-end processing and makes use of the technologies that have been nurtured for wafer processing. If processes up to three-dimensional stacking can be handled as units in the manufacturing line, the throughput will be one hundred times greater than stacking starting with chips. Therefore, future semiconductor manufacturing is expected to advance with a roadmap in which the number of stacked wafers, the wafer thickness, and the number of TSV interconnects serve as indices, as shown in Fig. 11.
Acknowledgments
This study was carried out based on the three-dimensional integration development program by the WOW alliance of Tokyo Institute of Technology, and the authors thank the alliance members, Nagoya University, Tsukuba University, the University of Tokyo, and WOW Research Center Ltd., for their cooperation. 
