OVER THE LAST 25 YEARS, the storage capacity of monolithic dynamic random-access memories (DRAMs) has increased by six orders of magnitude through a combination of design innovations and improvements of existing technology. 1, 2 Design innovations include the folded bit-line cell array arrangement and compact, high-capacitance, three-dimensional cell designs such as the stacked capacitor and trench capacitor cells. Technology improvements include increases in die area and, most importantly, reductions in minimum feature size. 1, 2 Further increases in DRAM storage density through the use of still smaller feature sizes face serious challenges from increasingly high fabrication facility costs. Monolithic digital systems increasingly employ embedded DRAMs; however, some of the density advantage of DRAMs over static RAMs is lost because conventional logic processes support only planar cell capacitor designs.
An often-proposed design innovation for achieving higher storage density without further reductions in feature size or complex 3D cell structures is to hold more than one bit per storage cell. The potential for storing more than two resolvable analog voltages on the capacitance of a storage cell facilitates this idea in DRAMs.
In this article, we describe a fault model for a 2-bit-per-cell MLDRAM proposed by Gillingham. 3 We derived the fault model using manual analysis and confirmed it with analog SPICE simulation. We modified accurate circuit models obtained from MOSAID Technologies to produce the effects of physical defects similar to those reported for 1-bit-per-cell DRAMs with similar cell array layouts. We also propose an efficient test for the fault model and possible MLDRAM design-for-testability enhancements.
MLDRAM design problems
Three fundamental problems in multilevel DRAM (MLDRAM) design are 1) creating the different analog voltages to be stored, 2) creating the various reference voltages necessary to interpret previously stored signals, and 3) managing the effects of the reduced noise margins resulting from more closely spaced signal levels.
Researchers have proposed many MLDRAM designs. 2-6 The scheme described by Gillingham has several attractive characteristics. For example, it solves problems 1 and 2 with charge-sharing techniques that locally generate analog reference signals for both the writing and sensing operations. Cell signal sensing involves comparing signals on matched sub-bit lines. Also, the same cell that originally held the stored signal is used to generate the reference signal required to sense the least significant bit. These features provide robustness against process variation effects. A further advantage is that the MLDRAM reuses many elements of a conventional 1-bit-per-cell DRAM, including most of the cell array layout, the basic sense amplifier design, and the precharge to mid-voltage (V pre = 1/2 V DD).
Random cell accesses, however, are slower in Gillingham's MLDRAM than in a conventional DRAM because of the multiple steps required in the writing and sensing operations. For example, an MLDRAM's random cell read time t RAC is roughly twice that of a DRAM implemented in a similar semiconductor technology. However, after the random-access time penalty, accesses to additional data on the same internal cell page can be as fast in an MLDRAM as in a conventional DRAM. Thus, the maximum data bandwidths for MLDRAMs and DRAMs should be similar. The reduced noise margins of all MLDRAMs potentially increase their vulnerability to process variations, soft errors, and excessive cell leakage current. Possible solutions include higher-capacitance storage cells, 2 careful attention to cell layout to eliminate potential noise sources, and an increased refresh rate.
MLDRAM operation
The operation of Gillingham's MLDRAM is detailed elsewhere, 3 so we briefly summarize it here. Figure 1 illustrates the encoding of the four 2-bit Boolean pairs 00, 01, 10, and 11 as equally spaced voltages in the range ground to V DD . We refer to the leftmost bit as the most significant bit (MSB) and the rightmost bit as the least significant bit (LSB). Figure 1 shows the positive signal encoding; to obtain the negative signal encoding, we complement the MSB and LSB values. Note that the noise margins of this four-level encoding are only one third those of a two-level encoding using voltages 0 and V DD .
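The four-level encoding and its noise-margin penalty can be worked through numerically. The following is a minimal sketch, not taken from Gillingham's design: it assumes that the positive encoding maps 00 to ground and 11 to V DD (Figure 1 is not reproduced here), expresses voltages as fractions of V DD, and uses the hypothetical names `nominal_level` and `MARGIN_RATIO`.

```python
from fractions import Fraction


def nominal_level(msb, lsb, positive=True):
    """Nominal stored voltage as a fraction of VDD.

    Assumption: in the positive encoding, 00 maps to ground and 11 to VDD.
    The negative encoding complements both bits before mapping.
    """
    if not positive:
        msb, lsb = 1 - msb, 1 - lsb
    return Fraction(2 * msb + lsb, 3)


# The four equally spaced levels: 0, 1/3, 2/3, and 1 (times VDD).
LEVELS = {(m, l): nominal_level(m, l) for m in (0, 1) for l in (0, 1)}

# Spacing between adjacent levels, compared with a two-level cell (0 and VDD).
FOUR_LEVEL_SPACING = nominal_level(0, 1) - nominal_level(0, 0)
TWO_LEVEL_SPACING = Fraction(1) - Fraction(0)
MARGIN_RATIO = FOUR_LEVEL_SPACING / TWO_LEVEL_SPACING
```

Under these assumptions `MARGIN_RATIO` evaluates to 1/3, which matches the one-third noise margin noted in the text.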
Sensing the voltage signal stored in a cell requires connecting the cell capacitance C c to a sense amplifier via a bit line. A bit-line capacitance C b is typically 8 to 13 times larger than a single cell's C c . 1 Thus, charge sharing reduces the cell signal's strength by a factor of C c /(C c + C b ). Recovering the 2-bit logical value from the attenuated cell signal is a two-step destructive read process: First, a sense amplifier compares the cell signal with the mid-reference level 1/2 V DD to determine the MSB. Then, depending on the outcome of the first comparison, the cell signal is compared with suitably attenuated copies of either 5/6 V DD or 1/6 V DD to determine the LSB (see Figure 1). After a cell's contents have been sensed, the cell's analog voltage signal must be restored to the nominal level for the data just read, or written with the nominal voltage level for newly supplied data.

Figure 2 shows the circuit diagram for one bit-line pair containing a column of storage cells (in the figure, the cell column runs horizontally, following common practice). One can view the circuit as two sub-bit-line pairs (left and right) that interact via a switch matrix controlled by signals C, C*, X, and X*. The left (right) sub-bit-line pair comprises true BL (BR) and complement BL* (BR*) sub-bit lines connected to the noninverting and inverting terminals, respectively, of the left (right) sense amplifier. All sub-bit lines have the same capacitance C b . Typically, many bit-line pairs are grouped to form left and right cell arrays that share peripheral circuitry. Each cell in an array consists of a capacitor connected from a common biased plate node to the source node of an access transistor; each access transistor's drain node connects to a sub-bit line.
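The charge-sharing attenuation and the two-step read described earlier in this section can be sketched behaviorally. This is an idealized illustration rather than a model of the actual circuit: it assumes the positive encoding with 00 at ground, a bit line precharged to 1/2 V DD, ideal comparators, and voltages expressed as fractions of V DD. The function names and the default capacitance ratio of 10 are assumptions chosen for the example.

```python
from fractions import Fraction

VPRE = Fraction(1, 2)  # bit-line precharge level, as a fraction of VDD


def share(v, cb_over_cc):
    """Bit-line voltage after a cell at voltage v shares charge with a bit line at VPRE.

    The deviation from VPRE is scaled by C_c / (C_c + C_b) = 1 / (1 + C_b/C_c).
    """
    return VPRE + (v - VPRE) * Fraction(1, 1 + cb_over_cc)


def two_step_read(v_cell, cb_over_cc=10):
    """Recover (MSB, LSB) from a stored level, assuming positive encoding."""
    signal = share(v_cell, cb_over_cc)
    # Step 1: compare with the mid-reference level to obtain the MSB.
    msb = int(signal > VPRE)
    # Step 2: compare with an equally attenuated copy of 5/6 or 1/6 VDD.
    reference = share(Fraction(5, 6) if msb else Fraction(1, 6), cb_over_cc)
    lsb = int(signal > reference)
    return msb, lsb
```

For example, `two_step_read(Fraction(1, 3))` returns `(0, 1)` in this sketch. Because the MSB selects the LSB reference, an error in step 1 also corrupts step 2, whereas the reverse does not hold.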
As shown in Figure 2, we assume that the MLDRAM uses PMOS access transistors. (Although using PMOS rather than NMOS access transistors changes the polarity of the wordline signals, it has no other effect on our analysis.) When a cell in the left (right) cell array is read, the MSB is recovered at a left (right) sense amplifier, and the LSB is recovered at the corresponding right (left) sense amplifier. With respect to an addressed cell in either the left or right arrays, MSB side refers to the array containing the cell, whereas LSB side refers to the array on the other side of the switch matrix.

Figure 2. MLDRAM circuit schematic. 3

IEEE DESIGN & TEST OF COMPUTERS
Word lines connected to the gate nodes of cell access transistors control access to a row of cells (in Figure 2, the rows run vertically). We assume the word lines are numbered WL 1

Figure 3 shows the control waveforms that cause a write (or restore) operation to the cell at WL i and BL* followed by a read operation. This cell uses the negative signal encoding because it is accessed via the complement sub-bit line BL*. The word line signals are active low because of the PMOS access transistors. This waveform sequence proved more convenient for studying the external effects of faults than the sense-and-restore waveform sequence given by Gillingham. 3 We summarize the corresponding events for the bit-line pair here.
Write a new 2-bit value into the addressed cell:
1. The column address decoder (not shown in Figure 2 
Fault model development
To develop tests for a new DRAM, one usually starts with a superset of the tests found effective for earlier parts. 8 But this strategy is unlikely to be as effective for an MLDRAM, whose circuit schematic and operation differ from those of a conventional 1-bit-per-cell DRAM. We therefore decided to develop a new model of the expected faulty behaviors. Then we could design tests that efficiently target faults in this model. Inductive fault analysis is a defect simulation method that infers a fault model from the circuit layout and knowledge of the process defect distributions. 7 We chose to avoid this method's computational expense by using manual analysis to predict faulty behaviors directly from the circuit schematic. We confirmed the resulting behaviors, whenever practical, by SPICE simulations. This is a reasonable MLDRAM strategy since many elements of the cell array layout are identical or similar to those in a conventional DRAM. The main differences in behavior should follow from the differences in the control signals and the schematic.
Several assumptions underlie our MLDRAM fault model. First, we assume that the cell array's physical layout is very similar to that of a conventional folded bit-line DRAM. Thus, we assume that sub-bit lines are laid out in parallel on the same metal layer, 9 and that word lines are laid out orthogonally to the bit lines on one or more separate layers. (Word lines are usually implemented in gate polysilicon with parallel metal shunts or straps.) Second, we assume that physical defects in an MLDRAM will be similar to those known to occur in DRAMs. 9-11 Thus, we assume that stuck nodes, missing contacts, interconnect breaks, and short circuits between adjacent or crossing interconnections will be relatively frequent. For example, word-line-to-word-line and bit-line-to-bit-line shorts should be relatively common. To simplify our analysis, we consider both stuck-at-0 and stuck-at-1 faults affecting the control signals; this allows us to abstract away details of the circuits that produce the control signals. Finally, we assume that defects affecting peripheral circuitry (row and column decoders, control logic, voltage generation and regulation circuits, and so on) will cause catastrophic failures readily detected by any reasonably thorough test of the cell arrays. 8 Therefore, we do not directly model defects in the peripheral circuits.
We call a physical defect clean if it produces faulty behavior that can propagate deterministically to a circuit output as an observable Boolean error. 12 Unclean defects often arise when contention exists at a circuit node joining two conducting paths to different power supply voltages. As with DRAMs, many physical defects in MLDRAMs are unclean. In an MLDRAM, the recovered MSB's state determines which reference voltage is used to sense the LSB. Thus, a defect that randomizes the MSB also randomizes the recovered LSB. The converse is not true; a random LSB will not affect the recovered MSB's state.
Results of manual analysis. For each physical defect, we predicted how the circuit would appear to behave at the external Boolean MLDRAM data interface. We identified eight classes of distinct erroneous behavior. Table 1 shows the resulting mapping from defects to fault classes. Now, we informally describe and justify the eight fault class definitions: 13 
Isolated stuck bits: cells contain fixed values.
A short between a word line and a cell capacitor's storage node causes the word line signal to overwhelm the cell signal. This causes the cell's MSB to appear stuck and the LSB to appear fixed at some random value (the word line signal drives both the LSB reference and the cell signal). Excessive leakage current from a cell capacitor causes both MSB and LSB to settle at fixed implementation-specific values. Several defects null out the cell signal, leaving the precharge voltage to be sensed; the observable result is a stuck MSB and LSB. These defects include a short between a sub-bit line and a cell capacitor, no connection between a sub-bit line and a cell, a cell access transistor stuck open, and a cell access transistor stuck on.
Multiple stuck bits: clusters or alignments of cells contain fixed values.
A stuck (or an interrupted) word line causes the precharge voltage to be sensed for all cells connected to that word line (to the floating word-line segment). The observable result is that the MSB and LSB of those cells are independently random (they are likely to be fixed states in real devices). A short from a word line to a sub-bit line produces stuck MSBs and LSBs along the shorted sub-bit-line pair, since the word-line signal is sensed instead of the cell signal during MSB sensing and LSB reference-level generation. (As in conventional DRAMs, 9 a short between a word line and a sub-bit line can also cause complex faulty behaviors if the sense amplifier's write signal communicates to the shorted word line and thus causes additional cells to be unexpectedly connected to sub-bit lines.) Similar arguments show that the following other defects also can cause multiple stuck bits affecting MSBs and/or LSBs: 14 1) MSB-side or LSB-side sense amplifier isolation control stuck at off, 2) C (C*) stuck at on, 3) sense amplifier isolation control stuck at on, 4) interrupted sub-bit line, 5) short between adjacent sub-bit lines with the same sense amplifier, 6) stuck-at-on bit-line precharge control or equalize control, 7) stuck-at-on sense amplifier precharge control, and 8) stuck-at-on switch matrix signal X or X*.
State coupling within a pair of adjacent cells on different word lines.
A short between adjacent word lines causes pairs of cells on the two word lines to be coupled as follows: Let cells i and j be two cells on the two shorted word lines that share the same sub-bit-line pair. Without loss of generality, assume that cell i was written more recently than cell j. If cells i and j connect to different sub-bit lines in a pair, the MSB of cell i is forced into the LSB of cell i and both the MSB and LSB of cell j (see Redeker 13 for details). If cells i and j connect to the same sub-bit line, the 2-bit (MSB, LSB) state of cell i is forced into cell j. If such a short interconnects the storage nodes of cells i and j on complementary sub-bit lines, writing (MSB, LSB) into cell i forces cell j into the state $(\overline{\mathrm{MSB}}, \overline{\mathrm{LSB}})$, and vice versa.
State coupling from the MSB to the LSB of the same cell.
If switch control C (C*) is stuck at off, the LSB of all cells on a true (complement) sub-bit line is forced to be the complement of their MSB. Likewise, the LSB of all cells on a complement (true) sub-bit line is forced to be identical to their MSB. If switch control X or X* is stuck at off, the open switch prevents the LSB signal from being charge-shared with the MSB during writing. The result is that the LSB of all cells appears identical to the MSB.
Pattern sensitivities within a pair of cells on the same word line.
A short between adjacent sub-bit lines connecting to different sense amplifiers can produce several kinds of pattern sensitivities. What kind depends on where the short is with respect to the addressed cells and on the logic encoding used for the two shorted sub-bit lines. During sensing, cell signals and reference levels interfere with each other. As a result, the contents of cell pairs appear to interact so that fewer than 16 different state combinations can be stored and read back from the two cells (see Redeker 13 for details). Similar pattern sensi-

Suppose a sense amplifier precharge control signal is stuck at off and an addressed cell is on the same (opposite) side of the switch matrix as the affected sense amplifier. The capacitance of the sense amplifier nodes holds the last MSB (LSB) data written to that sub-bit-line pair. A stuck-at-off bit-line precharge control signal prevents erasure of the cell signal last written to a cell on the same bit-line pair. The residual cell signal interferes with correct sensing of the next accessed cell. The next sensed MSB appears to be the last written MSB or its complement; the LSB is the same as the MSB or its complement, or is a random value.

Selected circuit simulation results. Figure 4 shows the SPICE simulation waveforms of the MSB and LSB sub-bit lines for a cell on a complement sub-bit line in a defect-free MLDRAM sequenced with the control waveforms from Figure Gillingham. 3 We made no attempt to minimize MLDRAM access time.) The initial 40% of the plots shows superimposed bit-line signals corresponding to initial writes of 00, 01, 10, and 11. Immediately after the write operation, the sub-bit-line signals are equalized and precharged in preparation for a sensing operation. The middle 30% of the plots shows the MSB sensing, and the final 30% shows the LSB sensing.
Our manual analysis predicts that state coupling (fault class 4) will occur from the MSB to the LSB of the same cell if a switch control signal is stuck at off. Figure 5 shows the corresponding sub-bit-line signals from a SPICE simulation in which control signal C was fixed at 0 (off). Superficially the waveforms may not appear erroneous, but there is a problem affecting LSB sensing (note that the middle two signals no longer cross over). When we activate signal C to create and hold the LSB reference voltage on the appropriate sub-bit line (BL or BR), the voltage remains incorrectly at 1/2 V DD if C is stuck at 0. Thus, the LSB that is read will always have the same value as the MSB; in other words, there is an MSB-to-LSB state-coupling fault.
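The MSB-to-LSB coupling just described can be reproduced in a purely behavioral sketch. This is an abstraction for illustration, not the SPICE model: it assumes the positive encoding with 00 at ground, ideal comparators, and voltages as fractions of V DD, and it ignores sub-bit-line polarity. Charge-sharing attenuation is omitted because it scales the signal and the reference equally. The names `sense` and `c_stuck_off` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Nominal stored levels under the assumed positive encoding.
WRITTEN = {
    (0, 0): Fraction(0),
    (0, 1): Fraction(1, 3),
    (1, 0): Fraction(2, 3),
    (1, 1): Fraction(1),
}


def sense(v, c_stuck_off=False):
    """Two-step read; with C stuck off, the LSB reference stays at 1/2 VDD."""
    msb = int(v > HALF)
    if c_stuck_off:
        reference = HALF  # the LSB reference level is never generated
    else:
        reference = Fraction(5, 6) if msb else Fraction(1, 6)
    lsb = int(v > reference)
    return msb, lsb


# Read every stored value back through the faulty sensing path.
readback = {bits: sense(v, c_stuck_off=True) for bits, v in WRITTEN.items()}
```

In this sketch the values 01 and 10 read back as 00 and 11, respectively, so any test that writes and verifies either of those two values in a cell would expose the fault.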
Figures 6 and 7 show simulated bit-line signals that illustrate state-coupling behavior involving two cells (fault class 3). This behavior is the result of two adjacent word lines, say WL i and WL j , being shorted together. Let i and j be two cells on the same sub-bit-line pair, accessed using WL i and WL j , respectively. In Figure 6, we assume that cells i and j connect to the same sub-bit line; in Figure 7, we assume the two cells connect to different sub-bit lines. Both figures show the bit-line signals for cell i when the value (MSB, LSB), where MSB, LSB ∈ {0,1}, is written into cell i, then the complementary value $(\overline{\mathrm{MSB}}, \overline{\mathrm{LSB}})$ is written into cell j, and, finally, cell i is read back to verify its contents. The signals are identified with cell i values 00, 01, 10, and 11. Note that we assume negative signal encoding. The waveforms in Figure 6 may at first look correct until we remember that during sensing (the last half), the recovered MSB and LSB values are those stored in cell j, not those stored in cell i. Thus, this simulation represents the case in which the state of one cell is forced into a second cell. Figure 7 shows that while the correct four voltages are formed and stored in cell i, the write to cell j runs into problems. On the MSB side, the MSB overwrites the LSB for cell j; thus, only 00 (the upper two curves) and 11 (the lower two curves) can be stored in cell j. Moreover, when cell i is sensed by its MSB-side and LSB-side sense amplifiers, both its MSB and LSB are identical to the MSB value written to cell j. So this simulation represents the case in which cell j's MSB transfers into cell j's LSB and both the MSB and LSB of cell i.
MLDRAM testing strategies
We now consider the problem of testing an MLDRAM for the expected faults. First we describe an efficient Boolean test for the cell array of an MLDRAM in its original form. Then we outline possible design-for-testability enhancement opportunities for Gillingham's MLDRAM.
Testing the unmodified MLDRAM.
A useful first step in test design is to determine a lower bound on the length of any test that could detect all the modeled faults. For the MLDRAM fault model, the possibility of class 5 pattern sensitivities implies that the tester must write and then read back and verify all 16 possible joint states of pairs of adjacent cells. Hence, each cell must be written and read at least four times, once for each possible 2-bit state. Consequently, any test that detects all single or multiple faults in the model must contain at least 8m operations, m denoting the number of 2-bit storage cells. We found a Boolean test of precisely this minimum length that detects all singly occurring faults from the first six fault classes, that is, all the clean faults in the model. This test also happens to provide worst-case sensing conditions to check for degraded noise margins.
Our test involves alternately writing into the MLDRAM and then reading back and verifying four different complete data loads. The test thus requires 8m memory accesses (read or write). If the MLDRAM stores n bits, the test contains only 4n operations because n equals 2m for a 2-bit-per-cell design. This rather short test length is an important benefit of the ability to simultaneously read and verify 2 bits per cell. We specify each memory load as a tiling 8 using the 2 × 16 rectangular tiles shown in Table 2 . The tiles' short dimension must align along adjacent paired sub-bit lines (memory array columns), and the long dimension along groups of 16 adjacent word lines (memory array rows). In addition, the tiles must abut so as to completely cover all cells in the array. During each pass through memory, cells must be addressed in the same order. Cells must be accessed sequentially along each bit line, bit line by bit line, in what is often called fast-Y address order (X and Y typically denote row and column address fields, respectively). We assume that physically adjacent word lines have Y addresses that differ by one. Otherwise, we assume that the tester descrambles Y addresses to ensure that physically adjacent word lines are accessed consecutively.
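The overall structure of the test (four complete loads, each written and then read back in the same address order) can be sketched as follows. This is an illustration under stated assumptions only. The background function `placeholder_load` is hypothetical and is not the Table 2 tiling; it merely guarantees that every cell sees all four 2-bit values. `SimpleMemory` is a toy cell-array model with optional stuck cells, and addresses are a flat index standing in for the fast-Y order.

```python
class SimpleMemory:
    """Toy model of m two-bit cells; `stuck` maps an address to a fixed value."""

    def __init__(self, num_cells, stuck=None):
        self.cells = [0] * num_cells
        self.stuck = dict(stuck or {})

    def write(self, addr, value):
        self.cells[addr] = self.stuck.get(addr, value)

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])


def placeholder_load(load_index, addr):
    """Hypothetical data background (NOT Table 2): cycles each cell through 0..3."""
    return (addr + load_index) % 4


def run_test(memory, num_cells, load=placeholder_load):
    """Write and verify four loads; return (operation count, list of failures)."""
    operations = 0
    failures = []
    for load_index in range(4):
        for addr in range(num_cells):  # write pass
            memory.write(addr, load(load_index, addr))
            operations += 1
        for addr in range(num_cells):  # read-and-verify pass, same order
            operations += 1
            if memory.read(addr) != load(load_index, addr):
                failures.append((load_index, addr))
    return operations, failures
```

For m cells the loop performs 4 × (m + m) = 8m accesses, which equals 4n for n = 2m stored bits. Detecting the coupling and pattern-sensitivity classes depends on the specific tile contents, which this placeholder background does not reproduce.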
Theorem. The test detects all the considered defects that cause clean faults in the MLDRAM fault model.
Proof outline. We prove this claim by showing that the test detects single instances of all faults of classes 1-6 or their underlying defects. Inspection verifies that all 32 cell positions in a tile region are loaded with the four values 00, 01, 10, and 11 after all four tiles have been written and read back. Hence, the test detects all defects that cause faults of classes 1, 2, and 4.
We can verify that for all tile positions that are horizontal neighbors (including those going from the 16th to the first columns), adjacent MSB values will be written and verified with all four combinations: 0 and 0, 0 and 1, 1 and 0, and 1 and 1. Hence, the test detects all class 3 faults. Further, the test's fast-Y address ordering, with the four MSB transitions just noted, ensures that all class 6 faults are detected.
We also can verify that all 16 possible ordered pairs from $\{0, 1, 2, 3\}^2 = \{00, 01, 10, 11\}^2$ are present in the columns of each of the four data tiles in Table 2. Thus, all 16 different state combinations will be written to and read back from cell pairs connected to the same word line and to sub-bit lines connected to adjacent sense amplifiers. Hence, our test detects all sub-bit-line shorts that cause class 5 faults. So ends our proof.
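The pair-coverage property used in this step of the proof is easy to check mechanically for any candidate tile. The sketch below assumes a tile is represented as a sequence of 16 (left, right) value pairs, one per word line, with values 0 to 3 standing for 00 to 11. The example tiles are made up for illustration and are not the Table 2 data; `covers_all_pairs` is a hypothetical helper name.

```python
from itertools import product

ALL_PAIRS = set(product(range(4), repeat=2))  # the 16 ordered pairs over {0, 1, 2, 3}


def covers_all_pairs(tile):
    """Return True if the tile's rows contain every ordered pair of 2-bit values."""
    return {tuple(pair) for pair in tile} == ALL_PAIRS


# Illustrative tile (not from Table 2): one row per ordered pair.
example_tile = list(product(range(4), repeat=2))

# Counterexample: both cells always hold the same value, so only 4 pairs appear.
degenerate_tile = [(v, v) for v in range(4)] * 4
```

Here `covers_all_pairs(example_tile)` holds while `covers_all_pairs(degenerate_tile)` does not, which shows why simply cycling identical data through neighboring cells would miss class 5 pattern sensitivities.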
Testing should detect degraded noise margins most easily during LSB sensing rather than MSB sensing because of the smaller differential signals involved. The LSBs of the tile data values form alternating word-line-aligned bars of 0's and 1's (both true and complement), as well as checkerboards (both true and complement). (Checkerboard patterns are frequently used to detect array noise and reduced noise margins in conventional DRAMs. 15) This observation supports our claim that the proposed test provides worst-case LSB sensing conditions and should therefore be effective at testing for degraded noise margins. One could readily enhance the test to detect excessive cell leakage currents by inserting an appropriate delay (four delays in all) between the times when new data is written and read back from memory. This delay must be chosen so as to give marginally excessive leakage currents sufficient time, typically tens of milliseconds, to cause observable errors. We would need characterization experiments and detailed process data to establish the most effective delay time.
MLDRAM testability enhancements. How might we modify an MLDRAM to enhance its testability? If MLDRAM technology is to be used in a commodity design in competition with conventional 1-bit-per-cell DRAM, we could provide standard DRAM fast access modes, such as page mode. Using essentially the same peripheral circuits as in a DRAM, these fast modes would speed the writing and reading of test patterns to the cell array. In addition, parallel test modes similar to those available in many commercial DRAM parts could be implemented in an MLDRAM. We can also consider massively parallel test modes that turn the sense amplifiers into parallel test pattern generators, since our test uses only four different test patterns. However, such test modes have not appeared economically viable for conventional DRAMs, and it is unlikely that the situation would be significantly different for most MLDRAM implementations.
For test time reductions beyond those resulting from conventional page modes and parallel test modes, we could exploit the inherent structural parallelism of the two MLDRAM cell arrays. A first testing phase using standard patterns could test the left and right cell arrays in parallel as conventional DRAMs. Test time would decrease by roughly a factor of four because of the two-fold parallelism of the cell arrays and the simpler, single-step DRAM sensing operation. Modified control waveforms that keep the switch matrix pass transistor switches open would be a convenient way of isolating the two cell arrays.
After the cell arrays are tested, a second phase would test the switch matrix. To verify the operation of the switches controlled by the X, X*, C, and C* signals, this phase would perform write and verify cycles to cells on true and complement sub-bit lines. The test mode just outlined would require changes to the timing chains and control logic, but it would avoid much more costly changes to the larger cell arrays.
The simple structure of the proposed test is conducive to implementation in a built-in self-test circuit. Redeker 13 describes a BIST MLDRAM implementation using our 4n test. The test pattern generator's complexity was only about 100 gates. Simple Boolean operations on the row address, column address, and tile number form the test data.
THE WORK DESCRIBED HERE is only the first step in an ongoing detailed characterization of Gillingham's MLDRAM. Our future work will include a more detailed simulation-based sensitivity analysis of the basic design. We are fabricating a test chip, and the resulting devices will permit much more detailed and realistic characterization studies. Uncovered design weaknesses may motivate the development of new tests. We plan to investigate further the design-for-testability enhancements mentioned here to better understand the tradeoffs among test time, fault coverage, fault diagnosability, design complexity, and silicon area increase.
