Abstract
Introduction
Today, more than 100 million transistors can be fabricated on a single die, and the cost of a transistor is approaching "zero". Every year the semiconductor industry makes almost as many transistors as in all previous years combined. The new challenge is not how many transistors can be built on a single chip, but rather how to integrate diverse technologies together, predictably and cost-effectively.
System-in-a-Package (SiP), a generalization of System-on-a-Chip (SoC), overcomes formidable integration barriers without compromising individually optimized chip technologies. By preserving the on-chip electrical environment, SiP matches or exceeds SoC performance at lower cost. SiP should be viewed as a giant chip rather than a miniaturized circuit board.
As the feature size of current IC fabrication technology approaches 90 nm, DRAM technology and logic technology diverge more and more, even though both are CMOS. Embedded DRAM cannot provide a cost-effective solution because of its low yield and high fabrication complexity. Memory/logic integration based on SiP is a feasible alternative to embedded memory. Unlike the SoC approach, which compromises between different chip fabrication technologies, the SiP approach unlocks the full potential of IC technology by integrating conventional ASIC and memory technologies using existing, individually optimized ICs. Therefore, memory and logic can be integrated at lower cost and reduced size, while the performance remains competitive with the SoC counterpart.
Several technologies have been proposed to develop SiP modules. Stacked-chip SiP technology, which does not require any extra step in chip design, is most commonly used to build an SiP [11]. However, bonding wires have inferior electrical properties, including high parasitic inductance, and the stacked structure results in poor thermal conductivity. It is therefore difficult to achieve better electrical or thermal performance than a conventional design. To avoid these problems, Chip-on-Chip SiP technology (CoC) has been proposed [1] [2] [3]. One of the modules developed on CoC is an FPGA/DRAM SiP module, which integrates a large-scale FPGA and multi-bank DRAM in a single package to provide high-bandwidth memory access. By exploiting solder bumping and flip-chip assembly, CoC enables the integration of different chips with shorter interconnection length and larger IO density. However, CoC can only be applied when the substrate chip is large enough to hold all of the other chips. This is a serious limitation of CoC, and to solve this problem, Chip-Laminate-Chip SiP technology (CLC) has been introduced [9] [10]. In a CLC module, a thin-film laminate serves as the package substrate, wiring resource, and decoupling capacitor for the power supply. Chips are solder-bumped on both sides of the laminate, which allows heat dissipation from the top and bottom sides of the package. Thus, CLC can achieve better electrical and thermal performance than CoC.
In this paper, we analyze the electrical and thermal performance of CLC-based SiP and compare it with other SiP technologies. The paper is organized as follows: Section 2 introduces CLC technology. Section 3 analyzes the electrical performance of chip-to-chip connections in CLC-based SiP and compares it to stacked-chip SiP. Section 4 examines the thermal performance of CLC-based SiP by developing its thermal model and simulating it with FLOTHERM, a commercial thermal analysis tool. Section 5 concludes the paper with some remarks.
Chip-Laminate-Chip Technology
Chip-laminate-chip technology employs one thin laminate film between the top and bottom chips to provide a better electrical environment and robust power/ground distribution. Figure 1 illustrates the CLC technology. In the CLC module, the laminate is part of the BGA package; top-side chips and bottom-side chips are flip-chip mounted on the laminate. Decoupling capacitors are built in, which provides a better power/ground structure than the CoC architecture.
[Figure 1: CLC technology. Labels: Laminate, Logic, DRAM.]
Some CLC package characteristics are [5]:
Maximum off-chip delay << IO buffer delay (3.5 ns).
Signal round-trip time < rise time (500 ps).
Inter-chip skew < board skew (500 ps).
No terminating resistors required.
Smaller buffer size and minimized ESD protection.
When logic and memory chips are assembled with CLC, they are electrically on the same chip, even though they are physically fabricated on different chips. The memory/logic interface can achieve over 500 MHz for double data rate (DDR). The elimination of terminating resistors dramatically reduces the per-pin power consumption. Figure 2 is an example of a CLC memory/logic integration module, which consists of one S3 graphics chip and eight Micron 8 MB DDR SDRAMs [10]. This tighter integration offers much higher memory access bandwidth than on-board graphics memory at little cost premium. CLC technology offers the potential for a low-end PC CPU with DDR to compete with the performance of current high-end server systems by further improving memory access time, which is the system bottleneck. The speed-up of DDR SDRAM inside the CLC module balances the core logic and memory access speeds. Furthermore, one logic chip integrating the CPU, graphics chip, and chipset on one side, and 500 MHz DDR SDRAMs on the other side, make a single-package computer. Unique features such as low cost, low power, small area, and light weight create great opportunities in consumer products.
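The round-trip criterion in the list above can be turned into a rough length budget: an interconnect behaves as a lumped load, and needs no termination, when a signal can travel out and back within the rise time. The following is a minimal sketch under stated assumptions. The function name, the homogeneous-dielectric propagation model, and the relative permittivity of 4.0 are illustrative assumptions, not values from the source; only the 500 ps rise time comes from the text.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def max_unterminated_length(rise_time_s, eps_r):
    """Longest one-way length whose round-trip delay stays below the rise time.

    Assumes the signal propagates in a homogeneous dielectric with relative
    permittivity eps_r (an assumption, not a value from the source).
    """
    velocity = C0 / math.sqrt(eps_r)      # propagation speed in the dielectric
    return velocity * rise_time_s / 2.0   # divide by 2: out and back


# 500 ps rise time is from the text; eps_r = 4.0 is a hypothetical value.
length_m = max_unterminated_length(500e-12, 4.0)
print(f"max one-way length: {length_m * 1e3:.1f} mm")
```

Under these assumptions the budget is a few tens of millimeters, far longer than a solder bump plus a short laminate route, which is consistent with the claim that terminating resistors are unnecessary inside the package.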
Electrical Performance Analysis of CLC-based SiP
In this section, we analyze the electrical characteristics of CLC-based SiP by modeling the internal IO performance inside the CLC package. In this study, we focus on a DRAM and FPGA integration module, which contains one FPGA and two DRAM chips. The IO path from FPGA to DRAM includes the FPGA IO (buffer and pad), FPGA chip IO rerouting, the solder bump on the FPGA side, laminate routing, the solder bump on the DRAM side, DRAM chip IO rerouting, and the DRAM IO (buffer and pad). Since the designer can easily reroute the FPGA IOs to match the footprint of the DRAM solder bumps, connections between FPGA solder bumps and DRAM solder bumps are mostly vertical (vias). We include those parasitics in the model of the solder bumps. The solder bump pitch is 500 µm [7]. Some specifications of the FPGA chip and DRAM chip are listed in Table 1.
Modeling of rerouting wire length
The rerouting wire length is estimated from the chip geometry, the IO floorplan, and technological parameters such as the IO pad pitch and the solder bump pitch. The chip IO placements of the FPGA and DRAM are illustrated in Figure 3. We studied the rerouting layouts produced by automatic routing tools with two metal layers and found that the rerouting wire lengths always fall in a certain range, namely the shortest distance from the solder bump to the IO boundary plus a few solder pitches. Assuming the origin is the center of the chip, the chip size is M x N, the solder pitch is P, and the solder bump is located at (x, y) in the upper-right quarter of the chip, the wire length can be calculated as follows:
For IOs on the chip boundary only,
The worst case is when the solder bumps are located at the center of the chip (for IOs on the centerline, the worst case is a solder bump at the middle of the left/right chip boundary). The worst-case wire length is:
The comparison between the measurement results and the calculated data is listed in Table 2. The average wire lengths are slightly overestimated because P is closer to an upper bound on the offset between the chip IO and the solder bump.
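The verbal description of the estimate can be sketched in code. The following is one plausible reading, assuming boundary IOs and taking the estimate to be the shortest distance from the solder bump to the chip boundary plus n solder pitches, with n = 1. The function names, the choice n = 1, and the 10 mm x 8 mm chip are hypothetical; only the 500 µm pitch comes from the text, and none of this reproduces the values in Table 2.

```python
def reroute_length_estimate(x, y, chip_m, chip_n, pitch, n_pitches=1):
    """Assumed estimate for boundary IOs: distance from the bump at (x, y)
    to the nearest chip edge, plus n_pitches solder pitches.

    The origin is the chip center; the chip is chip_m wide and chip_n high.
    """
    dist_to_edge = min(chip_m / 2.0 - abs(x), chip_n / 2.0 - abs(y))
    return dist_to_edge + n_pitches * pitch


def worst_case_length(chip_m, chip_n, pitch, n_pitches=1):
    """Worst case for boundary IOs: the bump sits at the chip center."""
    return reroute_length_estimate(0.0, 0.0, chip_m, chip_n, pitch, n_pitches)


# Hypothetical 10 mm x 8 mm chip; 0.5 mm pitch as stated in the text.
typical = reroute_length_estimate(3.0, 1.0, 10.0, 8.0, 0.5)
worst = worst_case_length(10.0, 8.0, 0.5)
print(f"typical: {typical} mm, worst case: {worst} mm")
```

Because the pitch term acts as an upper bound on the offset between IO and bump, an estimate of this form would tend to overestimate the average length, which matches the remark about Table 2.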
RLC equivalent circuit
The equivalent circuit of the IO path can be obtained by approximating the R, L, and C of the IO pad, rerouting metal, and solder bumps.
The pad is mainly a capacitive load for the output driver. Its capacitance can be found using the approximate formula given in [11], where $A_{pad}$ is the bonding pad area, $P_{pad}$ the bonding pad periphery, H the height of the bonding pad above the conductive silicon substrate, and T the thickness of the metallization of the bonding pad.
Because the spacing between rerouting wires is usually large enough, the coupling capacitance between rerouting wires is negligible. The wire capacitance can be derived as in [11], where W is the width of the rerouting wire, H the height of the rerouting layer above the conductive silicon substrate, T the thickness of the rerouting metal, and $W_{reroute}$ the wire length.
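To give a feel for the magnitudes involved, the sketch below estimates the capacitance of an isolated wire over a ground plane from the same quantities (W, H, T, and the wire length). It uses the Sakurai-Tamaru closed-form approximation as a stand-in; this is an assumption, and it is not necessarily the formula taken from [11]. The dielectric constant and the example dimensions are hypothetical.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_capacitance(width, height, thickness, length, eps_r=3.9):
    """Capacitance to substrate of a single isolated wire, in farads.

    Uses the Sakurai-Tamaru approximation (a stand-in, not necessarily the
    source's formula). All dimensions are in meters; eps_r = 3.9 (silicon
    dioxide) is an assumed dielectric constant.
    """
    per_unit_length = eps_r * EPS0 * (
        1.15 * (width / height) + 2.80 * (thickness / height) ** 0.222
    )
    return per_unit_length * length


# Hypothetical rerouting wire: 10 um wide, 1 um thick, 1 um above the
# substrate, 2 mm long.
c_wire = wire_capacitance(10e-6, 1e-6, 1e-6, 2e-3)
print(f"wire capacitance: {c_wire * 1e12:.2f} pF")
```

The first term is the parallel-plate contribution and the second accounts for fringing fields from the sidewalls; coupling to neighboring wires is ignored, in line with the assumption stated in the text.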
We simplify the estimation of the equivalent resistance by treating the rerouting interconnect as a single metal line. This is reasonable because, with adequate routing resources, the rerouting usually does not change routing layers, or changes only once.
$$R_{reroute} = \rho \frac{W_{reroute}}{W \cdot T}$$
where $\rho$ is the resistivity of the metal line, $W_{reroute}$ the wire length, W the interconnect width, and T the thickness of the metal layer.
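As a worked instance of the single-metal-line model, the sketch below evaluates the resistance as resistivity times wire length divided by the cross-sectional area (width times thickness). The resistivity (a textbook value for aluminum) and the wire dimensions are hypothetical illustration values, not parameters from the source.

```python
def reroute_resistance(resistivity, wire_length, width, thickness):
    """Resistance of a uniform rectangular metal line, in ohms.

    resistivity is in ohm*m; all lengths are in meters.
    """
    return resistivity * wire_length / (width * thickness)


# Hypothetical values: aluminum-like resistivity, 2 mm long wire,
# 10 um wide, 1 um thick.
r_wire = reroute_resistance(2.65e-8, 2e-3, 10e-6, 1e-6)
print(f"rerouting resistance: {r_wire:.2f} ohm")
```

With these numbers the wire contributes a few ohms, so widening the rerouting wire is the most direct way to reduce its resistance, at the price of more capacitance to the substrate.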
The solder bump can be viewed as a cylindrical conductor. Its resistance can be estimated as:
where $H_{solder}$ is the height of the solder bump, D the solder diameter, and $H_{sl}$ the distance from the solder bump to the laminate ground plate. In addition, each solder bump with the underlying via contributes approximately 0.5 nH of inductance [12]. Therefore, we have the equivalent RLC values listed in Table 3.
For comparison, we also analyzed the corresponding implementation using stacked-chip (SC) technology, as illustrated in Figure 4. The stacked-chip package mainly exploits wire-bonding technology. By eliminating additional rerouting layers, the stacked-chip package offers lower cost and lower design complexity than the CLC package. However, the large parasitic inductance introduced by bonding wires limits its application in the high-frequency domain.
We model the bonding wire as a 25 µm diameter copper line; the bonding wire lengths for the FPGA and DRAM are 3 mm and 5 mm, respectively. The parasitic inductance can be approximated as 1.1 nH/mm [11]. The routing on the substrate is also negligible because flexibility in the netlist can minimize the distance between corresponding FPGA and DRAM bonding pads. Therefore, the equivalent RLC values are 0.162 ohms, 0.321 pF, and 3.3 nH for the FPGA bonding wire, and 0.270 ohms, 0.401 pF, and 5.5 nH for the DRAM bonding wire. The equivalent circuits of the CLC IO path and the SC IO path are illustrated in Figure 5.
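The bonding wire resistance and inductance quoted above can be checked with a cylindrical-conductor model. This is a minimal sketch; the source does not state the resistivity it used, so the value of 2.65e-8 ohm*m below is an assumption, chosen because it reproduces the quoted resistances (it is closer to the textbook resistivity of aluminum than of copper). The diameter, lengths, and 1.1 nH/mm figure are from the text; the function name is illustrative.

```python
import math

RHO = 2.65e-8          # ohm*m; assumed, not stated in the source
WIRE_DIAMETER = 25e-6  # m, from the text
L_PER_MM = 1.1e-9      # H per mm of wire, from the text


def bond_wire_rl(length_mm):
    """Return (resistance in ohms, inductance in henries) of a bonding wire."""
    area = math.pi * (WIRE_DIAMETER / 2.0) ** 2     # circular cross-section
    resistance = RHO * (length_mm * 1e-3) / area    # R = rho * length / area
    inductance = L_PER_MM * length_mm
    return resistance, inductance


r_fpga, l_fpga = bond_wire_rl(3.0)  # FPGA bonding wire, 3 mm
r_dram, l_dram = bond_wire_rl(5.0)  # DRAM bonding wire, 5 mm
print(f"FPGA: {r_fpga:.3f} ohm, {l_fpga * 1e9:.1f} nH")
print(f"DRAM: {r_dram:.3f} ohm, {l_dram * 1e9:.1f} nH")
```

The two wires together add several nanohenries in series with the signal, compared with roughly 0.5 nH per solder bump in the CLC path, which is the difference the simulation in the next section examines.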
Simulation result
We simulate the above circuits using HSPICE. The goal is to analyze how signals are transmitted from chip to chip inside the package, and how this new packaging technology impacts chip design. Figure 6 illustrates the simulation results for the CLC IO path and the SC IO path. The upper waveform is the signal transferred from the FPGA IO to the DRAM IO in the CLC package; the lower one is the corresponding signal in the SC package. Chip design, especially IO design, should be optimized for the superior electrical environment in the CLC package. The buffer size of the output driver can be minimized to achieve smaller area and lower power. We analyzed how the delay, rise time, and fall time change with driving strength, as illustrated in Figure 7. The delay, rise time, and fall time do not increase much when the buffer size is reduced to 50%, which means significant savings in chip area and power consumption. In SiP, internal IOs are not connected to external pins and are not exposed to electrostatic discharge (ESD). ESD protection on these IOs therefore becomes redundant and can be minimized [4]. Figure 8 compares the delay, rise time, and fall time against the input capacitance. We can see that the time constant decreases linearly with smaller load capacitance, which means less delay, smaller IO area, and lower power. In addition, the output driver can be further minimized with a smaller load on the input end.
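The linear dependence of timing on load capacitance can be illustrated with a first-order model. The sketch below treats the driver as a resistor charging a lumped capacitor, which ignores the path inductance that the HSPICE simulation captures, so it is only a rough approximation and not the method used in the source. The driver resistance and load values are hypothetical.

```python
import math


def rc_timing(r_driver_ohm, c_load_f):
    """Return (50% delay, 10%-90% rise time) of a single-pole RC stage, in seconds."""
    tau = r_driver_ohm * c_load_f      # RC time constant
    delay_50 = math.log(2.0) * tau     # time to reach 50% of the final value
    rise_10_90 = math.log(9.0) * tau   # time from 10% to 90% of the final value
    return delay_50, rise_10_90


# Hypothetical 100 ohm driver with three load capacitances.
for c_pf in (0.5, 1.0, 2.0):
    delay, rise = rc_timing(100.0, c_pf * 1e-12)
    print(f"C = {c_pf} pF: delay = {delay * 1e12:.1f} ps, rise = {rise * 1e12:.1f} ps")
```

In this model, halving the load capacitance halves both the delay and the rise time, so a smaller input capacitance on the receiving chip allows a proportionally weaker, and therefore smaller, output driver for the same timing.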
Thermal Analysis of CLC-based SiP
We analyzed the thermal performance of the CLC-based SiP module using FLOTHERM [13], a commercial thermal simulation tool, and compared it with a stacked-chip package and a system-on-a-chip (SoC) implementation. We assume that the SoC implementation has a die area and power consumption equal to the sum of those of the individual chips. This is a reasonable first-order approximation: an SoC implementation usually has lower power consumption and smaller chip area than a CLC or SC implementation because inter-chip connections and IOs are removed, and these two factors have opposite effects on junction temperature, which makes the thermal performance of SoC close to that of the CLC implementation. The test packages were simulated in a standard JEDEC test environment with different airflows. The ambient temperature is set to 30 °C. Table 4 lists some technology parameters used to model the three implementations. Table 5 shows the simulation results. The CLC package has a significant thermal advantage over the stacked-chip package. In the stacked-chip package, silicon is not a good thermal conductor, and the heat of the lower chip (FPGA) has to dissipate through the upper chip (DRAM); therefore, the FPGA junction temperature is much higher than in the CLC module. In addition, the "back-heat" effect on the DRAM chips makes the DRAM junction temperature higher than that of the DRAMs in the CLC module. The junction temperatures of the CLC module are suitable for portable devices, which do not allow a large heat sink. Airflow also has a large impact on junction temperature: with 1 m/s airflow from left to right, the temperature can be reduced by as much as 14 °C. Figure 9 compares the temperature distributions of the CLC, SC, and SoC modules. It shows that in the CLC and SoC modules, heat is dissipated very well from the top of the package. The SC package, on the other hand, presents greater thermal resistance due to the low conductivity of the encapsulant, so its junction temperature is significantly higher.
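The qualitative argument about stacked chips can be expressed with a one-dimensional thermal resistance model, in which the junction temperature equals the ambient temperature plus the dissipated power times the sum of the thermal resistances along the heat path. This is a minimal sketch and not the FLOTHERM model used in the source: the power and resistance values are hypothetical, and only the 30 °C ambient temperature is taken from the text.

```python
def junction_temperature(ambient_c, power_w, resistances_k_per_w):
    """Junction temperature in degrees C for a series chain of thermal resistances."""
    return ambient_c + power_w * sum(resistances_k_per_w)


AMBIENT_C = 30.0  # from the text
POWER_W = 2.0     # hypothetical chip power

# Hypothetical heat paths: a chip that dissipates directly to the package
# surface, and a lower chip in a stack whose heat must also cross the upper
# chip and the encapsulant.
direct_path = [20.0]        # K/W, package to ambient
stacked_path = [20.0, 5.0]  # K/W, same path plus the extra layers

t_direct = junction_temperature(AMBIENT_C, POWER_W, direct_path)
t_stacked = junction_temperature(AMBIENT_C, POWER_W, stacked_path)
print(f"direct: {t_direct} C, stacked: {t_stacked} C")
```

Every additional layer in series raises the junction temperature in proportion to the power flowing through it, which is why a chip buried under another chip and an encapsulant runs hotter than one with a direct path to the package surface.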
Conclusion
CLC-based SiP can serve as an implementation platform for giga-scale systems by giving designers opportunities to explore their hardware architecture and physical implementation at an early stage and by simplifying the physical package design task. We analyzed the electrical characteristics and thermal performance of CLC-based SiP and compared them with other implementation platforms, such as stacked-chip SiP and SoC. The results demonstrate that CLC technology has a significant performance advantage over conventional SiP and is a cost-effective alternative to system-on-a-chip.
Acknowledgement
The author would like to thank Professor Andrew B. Kahng of the University of California, San Diego, Dr. King L. Tai of SyChip Inc., and Mr. Ethan Warner of Flomerics Limited for their valuable help and suggestions on this research. This work is funded in part by the DARPA/MARCO Gigascale Silicon Research Center.
