Abstract
Introduction
The transition to 0.25 µm or even smaller technologies allows the integration of up to 128 Mbit of embedded DRAM and 500 Kgates of logic on the same piece of silicon. Figure 1 shows memory and logic complexities for various die sizes (excluding pads) in an advanced 0.25 µm embedded DRAM process. This possibility makes embedded DRAM¹ technology (eDRAM) very attractive for real "system-on-silicon" implementations [1, 2, 3, 4, 5]. Hence the market for eDRAM, estimated at 100-200 million in 1997, is projected to reach more than 4 billion in 2000. Additionally, eDRAM offers DRAM vendors the possibility to escape the current DRAM price disaster and to set up eDRAM IP.
The possibility of integrating large memory and logic on the same die has a large impact on system integration and performance, memory sizes, on-chip memory interfaces and memory structures. The main advantages of embedded DRAM are higher memory bandwidth, lower power consumption, customized memory sizes and higher system integration. However, embedded DRAM also brings disadvantages and challenges in technology and fabrication, testing, and design methodologies.
With embedded DRAM, the system designer faces a new design parameter which tremendously enlarges the architectural design space. He/she is no longer restricted to commodity DRAMs, which imply standard sizes, interfaces and access protocols. The capacity of DRAMs increases by a factor of four with every new generation. As bandwidth requirements have grown in step with memory size requirements, the interface width of DRAMs should have grown as fast as the size of single DRAM devices. This has not happened, for packaging reasons. As a consequence, in many applications commodity DRAMs lack memory granularity and/or bandwidth.
¹ We speak of "embedded DRAM" or "embedded logic" depending on whether the master process is a logic or a memory process. Note that some authors use the terms in exactly the opposite way.
Comparison of embedded DRAM versus external DRAM
Most important with embedded DRAM is that the designer can adjust the bandwidth and memory size to the application. Consider a system which needs a total of $C_{system}$ memory bits and a memory bandwidth of $T_{system}$. Let $C_{device}$ and $T_{device}$ denote the memory size and memory bandwidth of a single commodity DRAM, respectively. There are two cases in which memory is wasted when the memory system is composed of commodity devices:
Case 1: the granularity of the memory devices forces more memory. This is the case if
For example, the size of PC memory systems has grown at only about half the rate of single DRAM devices (DRAM growth: 60%/year; Windows NT software growth: 33%/year).
Case 2: the memory bandwidth forces parallel access to several memory devices. Unnecessary memory is induced
Thus the wasted memory for a given system $(C_{system}, T_{system})$ which has to be composed of memory devices $(C_{device}, T_{device})$ is:
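One way to make the two cases concrete is sketched below. It assumes that devices are used whole, that the bandwidth of parallel devices adds linearly, and that the device count is therefore the larger of $\lceil C_{system}/C_{device} \rceil$ and $\lceil T_{system}/T_{device} \rceil$. These are assumptions of this sketch, not formulas quoted from the text, and the function names and numbers are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def devices_needed(c_system: int, t_system: int,
                   c_device: int, t_device: int) -> int:
    """Number of commodity devices under the assumed model.

    Capacities are in Mbit and bandwidths in MB/s (any consistent
    integer units work). The count must satisfy both the capacity
    requirement (case 1) and the bandwidth requirement (case 2).
    """
    n_capacity = ceil_div(c_system, c_device)    # granularity-driven count
    n_bandwidth = ceil_div(t_system, t_device)   # bandwidth-driven count
    return max(n_capacity, n_bandwidth)


def wasted_memory(c_system: int, t_system: int,
                  c_device: int, t_device: int) -> int:
    """Installed capacity minus required capacity (assumed model)."""
    n = devices_needed(c_system, t_system, c_device, t_device)
    return n * c_device - c_system


# Hypothetical case 1: 24 Mbit needed, only 16 Mbit devices available.
print(wasted_memory(24, 100, 16, 200))   # 8 (Mbit wasted)

# Hypothetical case 2: 16 Mbit needed, but four devices are required
# in parallel to reach 800 MB/s.
print(wasted_memory(16, 800, 16, 200))   # 48 (Mbit wasted)
```

In the second call the capacity requirement alone would be met by one device; the other three are there only for bandwidth.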
The ratio $T_{system}/C_{system}$ characterizes an application requirement. A low ratio means that the application demands relatively little bandwidth compared to its large memory size (e.g. workstation applications); a high ratio means that a large bandwidth must be provided by a small amount of memory (e.g. 3D graphics applications). This ratio is called the fill frequency [6]; it gives the number of times per second a given memory can be completely filled with new data. It is important to note that in the past $T_{system}/C_{system}$ has been roughly constant or has increased across the different applications, while the DRAM device fill frequencies $T_{device}/C_{device}$ have declined steadily [6, 7]. The consequence (see case 2 above) is unwanted memory, especially in applications where $T_{system}/C_{system}$ increases (e.g. graphics applications).
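The fill frequency is simple to compute once bandwidth and capacity are expressed in matching units. The sketch below uses hypothetical device figures, chosen only to illustrate why the device fill frequency falls when capacity quadruples and the interface stays the same.

```python
def fill_frequency(bandwidth_bits_per_s: float, capacity_bits: float) -> float:
    """Number of times per second the memory can be completely rewritten."""
    return bandwidth_bits_per_s / capacity_bits


MBIT = 2 ** 20  # bits per Mbit, binary convention assumed here

# Hypothetical devices: same 200 MB/s interface, capacity grows 4x.
bandwidth = 200e6 * 8                      # 200 MB/s expressed in bit/s
ff_16mbit = fill_frequency(bandwidth, 16 * MBIT)
ff_64mbit = fill_frequency(bandwidth, 64 * MBIT)

print(f"16 Mbit device: {ff_16mbit:.1f} fills/s")
print(f"64 Mbit device: {ff_64mbit:.1f} fills/s")
print(f"ratio: {ff_16mbit / ff_64mbit:.1f}")
```

If an application's fill frequency exceeds that of the available device, more devices are needed for bandwidth than for capacity (ignoring rounding to whole devices), which is the situation described as case 2.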
Let us take a more detailed look at the memory bandwidth, which can be calculated as $T_{device} = IO_{width} \cdot f_{IO}$.
$IO_{width}$ is the width of the memory device interface and $f_{IO}$ the data IO frequency. Due to the page-miss penalty of DRAMs, $f_{IO}$ is not a constant value. The access time to a data word in another page differs by one order of magnitude from the access time to a data word in the same page. Thus $f_{IO}$ can vary by one order of magnitude, and the sustainable $f_{IO}$ is very application dependent. To maximize this value at the memory level, eDRAM offers the possibility to adapt the page size to the application, to integrate cache lines directly into the eDRAM macro, to apply multibank structures [8, 9] or to use an access-sequence control scheme as proposed in [10].
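The dependence of the sustainable $f_{IO}$ on the access pattern can be illustrated with a simple average-access-time model. The model (a weighted mean of page-hit and page-miss access times) and the timing values are assumptions of this sketch; the text only states that the two access times differ by about one order of magnitude.

```python
def sustained_f_io_mhz(page_hit_rate: float,
                       t_hit_ns: float,
                       t_miss_ns: float) -> float:
    """Sustained data IO frequency in MHz for a given page-hit rate.

    Assumed model: each access costs t_hit_ns on a page hit and
    t_miss_ns on a page miss, and accesses do not overlap.
    """
    if not 0.0 <= page_hit_rate <= 1.0:
        raise ValueError("page_hit_rate must be between 0 and 1")
    avg_ns = page_hit_rate * t_hit_ns + (1.0 - page_hit_rate) * t_miss_ns
    return 1000.0 / avg_ns  # 1 / ns expressed in MHz


# Hypothetical timings: 10 ns within a page, 100 ns across pages.
for hit_rate in (1.0, 0.9, 0.5, 0.0):
    f = sustained_f_io_mhz(hit_rate, t_hit_ns=10.0, t_miss_ns=100.0)
    print(f"hit rate {hit_rate:.1f}: {f:.1f} MHz")
```

Under this model even a 90% hit rate roughly halves the sustained frequency relative to the all-hit case, which suggests why page size, on-macro cache lines and multibank organization matter.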
A second factor which influences $f_{IO}$ is the load capacitance which has to be driven by the memory buffers. Obviously, lowering this load increases $f_{IO}$. Typically there is a difference of a factor of 10-50 between on- and off-chip driver loads. In addition, the inductance caused by the package and the board lines is eliminated if the DRAM/logic connection is made on-chip; thus system noise immunity is enhanced. However, the most important factor which influences the memory bandwidth is $IO_{width}$. In commodity devices $IO_{width}$ is limited to 16-32 pins for packaging reasons. Embedded DRAM provides bus widths of up to 512 bits or even more. Since the memory interface is on-chip, the total pin count of the chip is reduced, and pad-limited designs may be transformed into non-pad-limited ones.
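The effect of the interface width on peak bandwidth follows directly from $T_{device} = IO_{width} \cdot f_{IO}$. The sketch below compares a 16-bit commodity interface with a 512-bit embedded interface; the 100 MHz IO frequency is a hypothetical value held equal for both, so that only the width differs.

```python
def peak_bandwidth_bytes_per_s(io_width_bits: int, f_io_hz: int) -> int:
    """Peak bandwidth T = IO_width * f_IO, converted from bit/s to byte/s."""
    return io_width_bits * f_io_hz // 8


F_IO_HZ = 100_000_000  # hypothetical 100 MHz data IO frequency

commodity = peak_bandwidth_bytes_per_s(16, F_IO_HZ)    # 16-bit device
embedded = peak_bandwidth_bytes_per_s(512, F_IO_HZ)    # 512-bit eDRAM bus

print(commodity)              # 200000000  (200 MB/s)
print(embedded)               # 6400000000 (6.4 GB/s)
print(embedded // commodity)  # 32
```

At equal frequency the gain is just the width ratio; any increase in $f_{IO}$ from the lower on-chip load would come on top of it.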
Obviously, eDRAM can offer a finer granularity in memory sizes (steps of 256 Kbit or 1 Mbit) and a much wider bandwidth range than commodity DRAMs. Thus the fill frequency of an embedded DRAM module can be tuned towards the fill frequency of the application. Figure 2 illustrates this advantage. Peak bandwidths and fill frequencies of commodity DRAMs (EDO, SDRAM, SGRAM, DDR, Rambus) and embedded DRAM cores are depicted on a logarithmic scale for 16 Mb and 64 Mb, respectively.
