In order to improve the read performance at a low $V_{DD}$, a new self-bias bitline voltage sensing scheme is described. This circuit greatly reduces the delay's dependence on bitline capacitance and achieves a 19-ns reduction of the sense delay at low voltages. Multilevel storage sensing with this circuit is also discussed.
I. INTRODUCTION

Flash memories are used as storage devices for a wide variety of equipment. In particular, Flash memories are suitable for portable electronic equipment, such as personal digital assistants (PDA's), cellular phones, and palm-top computers, because they are smaller, lighter in weight, and have greater shock immunity than hard-disk devices. For portable equipment applications, low power operation is important for long battery life. In order to decrease the power consumption of a whole system, demand for a low supply voltage $V_{DD}$ is increasing, and, in fact, a $V_{DD}$ of 3.3 V is now the standard voltage for CPU's and RAM's in the market instead of 5 V. Several 3.3-V operation Flash memories have been reported to meet this demand [1]-[5].
The transistors are generally aggressively scaled to improve performance at this lower voltage. However, unlike other chips, such as CPU's and RAM's, Flash memories inherently require high voltage operation for their data program and erase. The core circuitry operates with another voltage supply, which must be set to a high voltage (often over 10 V) in the program and the erase operations. Therefore, the transistor parameters, such as the channel length, the oxide thickness, the threshold voltage, and the drain breakdown voltage, have to be tuned to sustain this high-voltage operation, and the usual device scaling cannot be adopted for them. In order to solve this problem, a multitransistor process has been introduced for 3.3-V operation Flash memories [1], [2], [4], [5]. While a low-voltage transistor is optimized with device scaling for a $V_{DD}$ of 3.3 V, a high-voltage transistor is designed to allow the high voltage operation for the program and the erase. Therefore, the oxide thickness and threshold voltage of the high-voltage transistor are different from those of the low-voltage transistor. The peripheral circuitry is designed with low-voltage transistors to improve the power and performance of the chip. On the other hand, the core circuitry is designed with high-voltage transistors, and in order to keep good read performance, an internally generated supply which is higher than $V_{DD}$ is used to drive the core circuitry even in the read mode.
Publisher Item Identifier S 0018-9200(97)05301-8.
In order to improve the portability of equipment, single-battery operation is desirable. In the case of single-battery operation, a $V_{DD}$ of 1.5 V is desirable. In fact, several 1.5-V chips have been reported [6], [7]. Therefore, a 1.5-V operation Flash memory is also desirable to simplify the design of a system. However, a Flash memory is inherently ill-suited to low-voltage operation because of the constraints of the Flash memory cell. In order to realize low-voltage operation, several problems must be addressed.
1) Interface between peripheral and core circuitry:
In the program and the erase modes, the core circuitry must be driven with a higher voltage than the peripheral circuitry. In addition, to achieve good read performance, the core supply should be set to a higher level than $V_{DD}$ in the read mode to help compensate for the different transistor characteristics. Therefore, a level shifter circuit is necessary as an interface between the peripheral and the core circuitry. In a conventional level shifter circuit, some high-voltage transistors, whose threshold voltage is not scaled down, must be driven with $V_{DD}$-level signals. When $V_{DD}$ is lowered to 1.5 V, the threshold voltage of the high-voltage transistor is comparable to half of $V_{DD}$. As a result, the level shifter has significant delay and poor operation margin, spoiling the read performance.
2) Sense circuitry speed and operation margin: As a Flash memory cell transistor has a stacked gate structure and its channel doping is optimized for its program and erase characteristics, its threshold voltage cannot be scaled in the same manner as that of the low-voltage transistor. Therefore, the bias levels for the gate and drain of the selected memory cell in the read mode should not be scaled. In other words, these bias levels become higher relative to $V_{DD}$ as $V_{DD}$ decreases. This results in a longer time for charging the selected bitline and makes the sense speed more sensitive to $V_{DD}$, process fluctuations, and temperature.
TABLE I: TRANSISTOR PARAMETER
TABLE II: BIAS CONDITION FOR FLASH MEMORY CELL
3) Internal high voltage generation: [...] is now standard. In the case of a single-voltage-supply chip, the high-voltage supply must be internally generated by a charge pump circuit. Its level is not scaled with $V_{DD}$, because the Flash memory cell characteristics determine the level. Furthermore, the threshold voltage of the transistors and their back bias characteristics, which cause energy loss in the charge pump circuit, are not lowered with $V_{DD}$, because the charge pump circuit must be composed of high-voltage transistors. Therefore, lowering $V_{DD}$ lowers the efficiency of the charge pump circuit, which makes low power operation even more difficult.
For the third problem concerning the internal high voltage generation, several efficient charge pump circuits have been reported [8], [9]. Furthermore, this problem can be solved by using an off-chip dc-to-dc converter. Therefore, we focus on the other two problems in this paper.
In the next section, we describe issues related to the interface between peripheral and core circuitry. In Section III, a new sensing scheme for low-$V_{DD}$ operation is proposed. Throughout this paper, we assume that the multitransistor process is used to design circuits. The main transistor parameters are summarized in Table I.
II. LEVEL SHIFTER CIRCUIT
A Flash memory cell utilizes hot-electron injection and electron tunneling mechanisms for its data program and erase. High voltage biases at the selected wordlines, bitlines, and source lines are necessary to selectively activate these phenomena on memory cells. An example of the bias conditions for a selected memory cell transistor is shown in Table II [1]. As already mentioned, in the case of the multitransistor process, the core supply should be set at a higher voltage than $V_{DD}$, while signals in the peripheral circuitry have the $V_{DD}$ level in all operation modes. Therefore, a level shifter circuit which converts a $V_{DD}$-level signal into a core-supply-level signal is necessary for the row decoder, column selection, and source selection circuits. Since the delay and power of the level shifter circuit can be large, it is an important circuit to optimize.
While the most difficult voltage levels occur in the program and erase modes, the delay of the level shifter at these voltages is not critical. In fact, the inputs to the level shifters switch while the core supply is at the read level. After the logic change in the circuit is completed, the core supply is set to the program or erase level. Consequently, the performance of the level shifter only needs to be optimized for a read.
Recently, many Flash memories utilize a negative-gate erase scheme [1], [10]. In particular, this erase scheme is suitable for a single-voltage-supply Flash memory, because it realizes lower power consumption in the erase mode than a source erase scheme. In the case of a negative-erase type Flash memory, wordlines must selectively be set at a negative voltage, as shown in Table II. Therefore, for the row decoders, the shift circuits must not only translate the high level of a signal from $V_{DD}$ to the core supply level, but must also shift its low level from the ground level to the negative voltage.
Conventional level shifter circuits are shown in Fig. 1 , (a) feedback pMOS type, (b) cross-coupled pMOS type; and (c) represents the dependence of the speed and the power consumption on . For simulations, we assume that for the read operation is 5 V. In Fig. 1 , and the level gate input is insufficient to drive the pull-down drivers. This also means these shifters have poor operation margin for the threshold voltage fluctuation of the transistor. It is possible to increase the drivability of the pull-down drivers by increasing their size, but that also increases the parasitic capacitances on their drain nodes and the switching noise caused by their gate-drain coupling. Consequently, that would increase the power consumption without improving the switching delay. As is indicated in Fig. 1(c) , the power consumption of supply is much larger than that on supply, especially in the low region. In the case of a single voltage supply Flash memory, the read voltage of 5 V is generated with an on-chip charge pump circuit, and the actual power consumption on is estimated at times that shown in Fig. 1(c) , assuming that the power efficiency of the charge pump circuit is . Therefore, the power consumption on is dominant at the level shifters. The rapid increase of the power consumption at under 1.5 V is caused by dc current leakage through to during the transient switching. At of 1.5 V, 28% of the total power on is consumed by the dc current leakage current. Lowering causes an increase of the transient time, as shown in Fig. 1(c) , and this leads to an increase of the power consumption on . This fact suggests that decreasing the transient time can improve not only the switching delay, but also the total power consumption for the level shifter circuit. 
A. High-Level Shifter
We propose a new level shifter circuit with a bootstrapping switch, shown in Fig. 2 . The pull-down drivers and are driven by bootstrapped signals BOOT and BOOT to enhance their drivability. Since this circuit is basically a Fig. 1(a) , BOOT must be switched between a bootstrapped level and , while BOOT can be switched between a bootstrapped level and . level input is not necessary to turn off , because its source, , rises to . In order to generate these waveforms, a new bootstrapping circuit with a CMOS switch is designed. The circuit is shown in Fig. 2(b) . and are bootstrapping capacitors, and are precharge transistors, and and form a CMOS inverter to switch the output, BOOT . Normally-on type is inserted to relax the maximum stress applied in the gate oxide of . and are fabricated in an n-well which is isolated from other n-wells and is biased to . While IN is high, turns on, pulling the BOOT low, and is precharged to through . When IN goes low, is boosted up above by . Therefore, sufficiently turns on although it is a transistor. Thus, BOOT is driven to the bootstrapped level above through , and BOOT is precharged to through . In the transient state of switching, slightly discharges the bootstrapped level of , because remains on until BOOT rises high enough to cut off . The charge lost through is less than 10% of the total charge on , so the power loss is small. When IN goes high, BOOT is switched to and boosts up BOOT above . In order to generate the bootstrapped level signal, the junction of the transistor must be tuned to sustain voltage of 2 . But, the maximum stress applied in the gate oxide does not exceed at any transistors in this switch circuit. Therefore, the gate oxide thickness of the transistor can be optimized for operation. The simulated waveforms are shown in Fig. 3 . In this way, the necessary switching of the gate input is realized for the both pull-down drivers, and their current drivability is enhanced. Fig. 
4 represents the performance comparisons between the feedback type conventional level shifter and the proposed one. As shown in Fig. 4(a) , the proposed shifter circuit improves the switching delay and also the power consumption under of 1.5 V, because the improvement of the switching speed by the bootstrapping reduces the dc current leakage between and during the transient switching. Since this decrease of the power consumption on is larger than the increase of the power consumption on caused by the bootstrapping scheme, the total power consumption decreases. In particular, the bootstrapping scheme improves the operation margin of the level shifter. Fig. 4 (b) represents corner simulation results of the switching delay. The simulation conditions are summarized in Table III . Under the typical and the best condition, the differences of the delay between the conventional shifter and the proposed shifter are small, less than 0.1 ns. But, under the worst condition the delays of the conventional shifters increase by 3.5 ns while the delay of the new shifter increases by only 1.3 ns. This means that the proposed level shifter has much better operation margin for with reasonable operation down to of 1 V. Compared with the conventional shifters, the proposed shifter needs additional layout area for bootstrapping circuit, including bootstrapping capacitors and an isolated n-well. However, this level shifter is used only at the interface circuits between peripheral circuitry and the core circuitry, for example, predecoding circuits, and so the area penalty is negligible compared with the total chip area.
B. High/Low-Level Shifter
As already mentioned, a row decoder circuit for a negative-erase type Flash memory requires both the high-level and low-level shifts of the input signal. This low-level shift is also desirable for the read operation. The subthreshold leakage current of a stacked-gate transistor such as a Flash memory cell is inherently larger than that of a normal transistor, because the voltage of its floating gate is raised by the capacitive coupling with its drain node. Furthermore, the subthreshold current of an erased cell fluctuates because of the threshold voltage fluctuations after the erase. As a result, the total leakage on a bitline caused by the subthreshold leakage current of the unselected cells can be large enough to spoil the read operation when the number of cells per bitline is large. This problem can be solved by driving the unselected wordlines to a negative level, because the subthreshold current decreases exponentially as the gate voltage decreases.
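The effect of a negative wordline level on the total bitline leakage can be illustrated with a simple exponential subthreshold model. The sketch below is only an illustration: the per-cell leakage at 0 V, the subthreshold swing, the number of cells per bitline, and the negative level are hypothetical values chosen for the example, not figures from this paper.

```python
def subthreshold_current(i_at_zero, v_gate, swing):
    """Exponential subthreshold model: the current changes by one decade
    for every `swing` volts of gate voltage (hypothetical parameters)."""
    return i_at_zero * 10.0 ** (v_gate / swing)


def bitline_leakage(n_unselected, i_at_zero, v_gate, swing):
    """Total leakage of all unselected cells sharing one bitline."""
    return n_unselected * subthreshold_current(i_at_zero, v_gate, swing)


# Hypothetical example values (assumptions, not from the paper).
N_UNSELECTED = 1023      # unselected cells on the bitline
I_AT_ZERO = 10e-9        # leakage per cell with the wordline at 0 V [A]
SWING = 0.1              # subthreshold swing [V/decade]

leak_grounded = bitline_leakage(N_UNSELECTED, I_AT_ZERO, 0.0, SWING)
leak_negative = bitline_leakage(N_UNSELECTED, I_AT_ZERO, -0.5, SWING)

print(f"wordlines at  0.0 V: {leak_grounded * 1e6:.3f} uA")
print(f"wordlines at -0.5 V: {leak_negative * 1e6:.6f} uA")
```

Under these assumed values, each 0.1 V of negative wordline bias removes one decade of leakage, which is why even a modest negative level is effective when many cells share a bitline.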
This function can be realized by the serially connected two conventional shifter circuits shown in Fig. 5 . For the negativeerase scheme, a double or triple well structure is introduced to realize a negative bias for wordlines in the erase mode [1] , [3] , and and are transistors fabricated in a p-well isolated from a p-type substrate by an n-well. The first level shifter converts a 0-signal to a differential 0-output, while the second level shifter takes the 0-signal and generates a -output. However, this circuit naturally has the same problem as the conventional shifter and, in particular, the power consumption significantly increases. The switching speed of the input signals for the second shifter, and , is slow because of the switching delay of the first shifter. This increases the dc current leakage, and the total power consumption becomes significantly large. A feedbacktype level shifter circuit which converts both high/low-level of the input signal has also been reported for DRAM's [11] . This shifter, however, is not acceptable for a Flash memory, because it requires a very low threshold voltage transistor and high voltage operation is impossible.
In order to solve this problem, we propose a new level shifter circuit in Fig. 6(a) . Apart from an inverter, this circuit is entirely composed of transistors. The source of nMOS driver, , is connected to while the source of pMOS load, , is connected to . Therefore, a input is sufficient to turn on , and is sufficient to turn on , as shown in Fig. 6(b) . On the other hand, level shifts of the input signal are needed to turn them off, for example, IN must be set to to turn off . In order to realize these level shifts, feedback circuits composed of , and an inverter, INV, are implemented. IN and IN , respectively , and the output is driven to or according to the 0-level input. The simulated waveforms are shown in Fig. 7 . Fig. 8 represents the performance comparison between the proposed high/low-level shifter and the conventional one. In the proposed circuit, the power consumption on is approximately one third of a conventional two-stage level shifter. Since the power consumption on increases with , the total power consumption has minimum value around of 1.5 V. At of 1.5 V, the power consumption decreases to 40% of that of the conventional scheme. Furthermore, as is shown in Fig. 8(b) , the switching delay and the operation margin of the level shifter are also improved. The switching delay in the worst-case corner improves by 3.3 ns.
In the row decoder circuit, the geometric constraints on the layout work must be considered to fit the circuit to a tight pitch of the memory cells in an array. Unlike the high-level shifter proposed in Fig. 2 , this level shifter does not need an isolated n-well which requires wide layout margin. Therefore, this circuit is suitable for a tight-pitch design and a conventional well layout.
III. SENSE CIRCUIT
A conventional sensing circuit, including memory cell arrays, is shown in Fig. 9 . The selected bitline level is clamped by source follower bias transistors and whose gates are biased to . The bitline level is set to the source level of and , because the transistors in the -selector and the isolation transistor are sufficiently large in size, and they are driven by level signals. is inserted to isolate the sense amplifier from the high voltage applied to the bitline in the program mode. This bitline clamp is necessary to avoid a soft write problem, which is unintentional data programming caused by read stress. Although the drain and the gate voltages for a selected cell in the read mode are much lower than those in the program mode, the total stress time can be much longer. Therefore, the bitline level in the read mode must be low enough to eliminate the soft write. Furthermore, the bitline clamp scheme improves the data sense delay in the core area, because it limits the swing of the bitline which has a large parasitic capacitance. In case of 0-read, the memory cell current is not enough to discharge the bitline, and the bitline is charged through and . After the bitline reaches a level equal to minus a threshold voltage, and turn off and the bitline level does not rise any more. As a result, the sense amplifier input is isolated from the bitline and can be pulled up quickly by the load, . In case of 1-read, the cell current is large enough to discharge the bitline and is pulled down to almost the same level as that of the bitline. If the bitline clamp level is lowered, the sense speed for 1-read is affected because of the decrease of the cell current. Consequently, the bitline should be biased at an optimum level, which is about 1 V for the current Flash cell [1] . 
Since the time for the bitline charge is dominant in the sense delay and it depends on the bitline capacitance, a memory cell array architecture with bitline divisions is usually implemented to reduce the bitline capacitance. Although the bitline division greatly reduces the delay in the core area, it results in significant area penalty.
Since a Flash memory cell transistor has a stacked gate structure, and its effective gate bias is reduced by the capacitive coupling of the stacked gates, its threshold voltage is inherently higher than that of a normal transistor. Furthermore, it is difficult to scale down its threshold voltage in the same manner as that of a normal transistor, because its channel doping must be optimized for programming characteristics. As a result, even if $V_{DD}$ is lowered, it is desirable to keep the bias condition for a memory cell in the read operation unchanged. In fact, for 3.3-V operation chips, an internally generated supply of 5 V is used for the gate bias [1]. When the bitline level is kept at the same level, it becomes higher relative to $V_{DD}$ as $V_{DD}$ decreases. A bitline clamp level of 0.8 V is over 50% of $V_{DD}$ for a $V_{DD}$ of 1.5 V, while it is only 16% of $V_{DD}$ for a $V_{DD}$ of 5 V. This means that lowering $V_{DD}$ increases the time for the bitline charge, which is the dominant component of the memory core delay. Furthermore, it requires more accurate control of the bitline clamp level, because the same amount of fluctuation in the clamp level results in greater delay variation in the sensing. Therefore, the conventional sensing scheme is not suitable for a low $V_{DD}$ such as 1.5 V.
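The percentages quoted above follow directly from the ratio of the clamp level to the supply voltage. A minimal check, in which the function name is an illustrative choice and the 3.3-V case is added only for comparison:

```python
def clamp_fraction(v_clamp, v_dd):
    """Return the bitline clamp level as a fraction of the supply voltage."""
    return v_clamp / v_dd


V_CLAMP = 0.8  # bitline clamp level quoted in the text [V]

for v_dd in (5.0, 3.3, 1.5):
    fraction = clamp_fraction(V_CLAMP, v_dd)
    print(f"VDD = {v_dd} V: clamp level is {fraction:.0%} of VDD")
```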
A. Self-Bias Bitline Sensing Scheme
A new sense circuit based on a self-bias bitline sensing scheme and its timing chart are shown in Fig. 10(a) and (b), respectively. A dummy bitline is implemented for each I/O block. This dummy bitline has the same bitline capacitance as the normal bitlines, because it is located in a memory cell array and has the same geometric shape as the normal bitlines, except that the source nodes of the memory cells connected to the dummy bitline are floated. The dummy bitline is used to charge the selected bitline and to control the bias voltage. When a new set of addresses is asserted, an address detection circuit triggers a precharge pulse PRE to precharge the dummy bitline to $V_{DD}$. During this precharge period, all the bitlines and the sense amplifier input are reset to the ground level by a reset signal, RST. After the precharge and the column selection are completed, RST is disabled and an equalize signal EQ is asserted to equalize the newly selected bitline and the dummy bitline. During the equalization, both lines are set to half of $V_{DD}$ by charge sharing. More precisely, the bitline level is slightly lower than half of $V_{DD}$ because of the parasitic capacitance in the column selectors. However, it is not necessary for the bitline to be precisely at half of $V_{DD}$ in this self-bias bitline scheme. In the case of a $V_{DD}$ of 1.5 V, the bitline level is about 0.75 V, which is sufficiently low to avoid the soft write problem. In the conventional sensing scheme, the selected bitline is charged through source-follower bias transistors in order to limit the bitline level by their gate level. Therefore, the bitline charge speed is limited by the bias voltage level. In this scheme, however, the bitline level is limited not by the signal level, but by the charge sharing. Therefore, the bitline charge can be accelerated by driving EQ to the high-voltage level with the level shifter circuit shown in Fig. 2.
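The equalized level can be estimated from charge conservation. The sketch below assumes that the dummy bitline is precharged to the supply voltage while the selected bitline and the selector parasitics start from 0 V. The 5-pF bitline capacitance is taken from the estimate given later in the text, and the 0.2-pF parasitic value is purely hypothetical.

```python
def equalized_level(v_precharge, c_dummy, c_bitline, c_parasitic=0.0):
    """Voltage after charge sharing between a precharged dummy bitline and
    a selected bitline (plus selector parasitics) that start from 0 V."""
    total_charge = c_dummy * v_precharge
    total_capacitance = c_dummy + c_bitline + c_parasitic
    return total_charge / total_capacitance


V_DD = 1.5        # supply voltage [V]
C_BL = 5e-12      # bitline and dummy bitline capacitance [F]
C_PAR = 0.2e-12   # hypothetical selector parasitic capacitance [F]

ideal = equalized_level(V_DD, C_BL, C_BL)
with_parasitic = equalized_level(V_DD, C_BL, C_BL, C_PAR)

print(f"ideal equalized level:     {ideal:.3f} V")
print(f"with selector parasitics:  {with_parasitic:.3f} V")
```

Because the dummy and selected bitlines have equal capacitance, the result is independent of the absolute capacitance value; only the small parasitic term moves the level below half of the supply.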
As the dummy bitline has the same capacitance as the bitline, the time for the bitline charge does not depend on the bitline capacitance if the equalize transistor has enough current drivability. Thus, this scheme realizes fast bitline charge independently of the bitline capacitance. A sensing scheme using charge transfer has been reported [13], but it is suited to differential sensing and cannot be utilized for the single-ended sensing of Flash memories. During the equalization, the sense amplifier input is also set at the same level as the bitline through the bias transistor. This does not affect the charge sharing on the bitlines, because the parasitic capacitance on the sense amplifier input is negligibly small compared with the bitline capacitance, and the load transistor is off. During the equalization period, the wordline selection is activated. Although the cell current begins to discharge the bitline and the dummy bitline, the level drop caused by the cell current is estimated at only 0.03 V, assuming that the bitline capacitance is 5 pF, the cell current is 60 µA, and the time for the discharge is 5 ns. This level drop can be completely compensated by the self-bias bitline voltage, as explained later. Since the equalization time is overlapped with the RC delay of the wordlines by driving the wordlines during the equalization period, the time overhead for the equalization can be minimized.
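The 0.03-V estimate can be reproduced with the relation ΔV = I·t/C. The sketch below reads the cell current as 60 µA and assumes that the cell discharges the bitline and the dummy bitline together while they are equalized, so that the total capacitance is twice the 5-pF bitline capacitance. That interpretation is an assumption, but it matches the quoted figure.

```python
def level_drop(cell_current, discharge_time, capacitance):
    """Voltage drop caused by a constant current discharging a capacitance."""
    return cell_current * discharge_time / capacitance


I_CELL = 60e-6       # cell current [A]
T_DISCHARGE = 5e-9   # discharge time [s]
C_BL = 5e-12         # capacitance of one bitline [F]

# Assumption: bitline and dummy bitline are tied together during equalization.
drop = level_drop(I_CELL, T_DISCHARGE, 2 * C_BL)
print(f"estimated level drop: {drop:.3f} V")
```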
Unlike the conventional sensing scheme, the bitline is charged independently of in this scheme. Therefore, must be controlled to match the bitline level for fast sensing. In the bias voltage generator, the dummy bitline level is used to monitor the bitline level, because the dummy bitline has the same level as the selected bitline after the completion of the equalization. In the equalization period, , precharged to , is pulled down through , which is inserted to cancel the threshold voltage drop between and the bitline voltage at . Since operates in its saturation region, is expressed as where is the drain current, is the gate oxide capacitance per unit area, is the mobility, is the threshold voltage including the back bias effect.
is the dummy bitline voltage. Since the bias transistor, , is also in its saturation region, is also expressed as
After the equalization Therefore, and have the same back bias voltage From these equations, , which is the current that flows through , is expressed as
In this way, can be controlled by and the transistor width ratio between and independent of the bitline level. By tuning these parameters, can be set at an optimum level with which can quickly move in the sense period which follows next. This means is self-controlled according to the net bitline level in every read cycle and is set at the optimum level even if the bitline level fluctuates. In the sense period, remains constant, because the dummy bitline is isolated from the bitline and stabilized by , which is controlled by and . At the beginning of the sense period, has the same level as the bitline, and the load transistor turns on. In the case of 1-read, remains at the same level as the bitline, because the cell current discharges the bitline and keeps on. On the other hand, in the case of 0-read, quickly rises, because is self-controlled and the bitline charge is not necessary to make turn off.
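The displayed equations for this derivation are missing from the text above. The following is only a hedged sketch of the argument, built from the standard square-law model of a MOS transistor in saturation. All symbol names ($I_1$, $I_2$, $W_1$, $W_2$, $L$, $V_{BIAS}$, $V_{DBL}$, $V_{BL}$, $V_T$) are assumptions introduced here, as are the equal channel lengths and the circuit connections implied by the description (the monitoring transistor has its source on the dummy bitline, and the bias transistor has its source on the selected bitline).

```latex
% Monitoring transistor in saturation, source at the dummy bitline:
I_1 = \frac{\mu C_{ox}}{2}\,\frac{W_1}{L}\,\bigl(V_{BIAS} - V_{DBL} - V_T\bigr)^2

% Bias transistor in saturation, source at the selected bitline:
I_2 = \frac{\mu C_{ox}}{2}\,\frac{W_2}{L}\,\bigl(V_{BIAS} - V_{BL} - V_T\bigr)^2

% After the equalization V_{BL} = V_{DBL}, so both transistors see the same
% back bias and hence the same V_T. Dividing the two expressions gives:
I_2 = \frac{W_2}{W_1}\, I_1
```

Under these assumptions the bitline level cancels out of the current ratio, which is consistent with the statement that the current through the bias transistor is set by the reference current and the transistor width ratio, independent of the bitline level.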
In this way, level is determined by a competition between the cell current and the load current. Therefore, the load current is designed to match a cell current in order to compensate cell characteristic fluctuations. In the load control circuit, the load current is controlled by a cell current, , with a current mirror circuit and is optimized by the transistor width ratio between and is compared with a reference voltage, , by a differential sense amplifier. In the conventional sensing scheme, must be generated by using a dummy cell to compensate cell characteristic fluctuations between and . But, in the proposed scheme, the cell characteristic fluctuations are compensated by the load current. In addition, does not need to have high current drivability, because it is not equalized with to accelerate the bitline charge. Therefore, can be generated by using charge sharing between capacitors, as shown in the reference voltage generator. As a result, dc current flow is eliminated and power consumption for generation greatly decreases. After the sensing is completed, the output of the sense amplifier is latched, and the sense circuit is disabled for power saving. In the proposed sensing scheme, dynamic circuit operations are utilized for the bitline charge and the generation. Therefore, each read cycle must begin with the precharge period, followed by other periods in a proper timing. In order to cope with address skew, after every address change, the current state is disabled and the precharge period is restarted by the address detection signal.
B. Estimation by Simulation
The simulated waveforms of the self-bias bitline sensing scheme are shown in Fig. 11 . For 1-read, the sense amplifier output can immediately switch after the sensing begins, because initially stays at a 1-read level. For 0-read, starts rising within 1 ns after the sensing begins. This indicates that the self-bias bitline realizes fast sensing for both 0-read and 1-read. Fig. 12 represents the sense delay for corner simulations. The sense delay is determined by the time from an address input to a sense amplifier output, and this does not include an output circuit delay. The proposed sensing scheme realizes less dependence on the conditions than the conventional one and improves the sense delay by 19 ns under the worst condition. This scheme has better noise immunity than the conventional scheme, because is automatically matched with the bitline level. Under the noise condition where the peak of a bump on is about 30% of , the sense delays increase by 5 ns and 8 ns for the proposed scheme and the conventional scheme, respectively, according to simulation results. In Fig. 10 , we assume that and have the same threshold voltages, because they are both nMOS transistors with the same back bias voltage. But, a threshold voltage mismatch between them caused by process fluctuations can still occur in an actual chip. The mismatch can be minimized by layout techniques, such as drawing them close to each other and in the same direction. Assuming that the mismatch is 0.05 V, the sense speed is affected by only 4 ns according to simulation results. The dependence of the sense delay on the bitline capacitance is shown in Fig. 13 . In the conventional scheme, the sense delay depends on the bitline capacitance, because the time for the bitline charge linearly increases with the increase of the bitline capacitance. In the proposed scheme, however, the selected bitline is charged by the charge sharing with the dummy bitline which has the same capacitance as the bitline. 
In addition, the sense amplifier input can immediately be isolated from the bitline in the sensing, because the bias voltage is automatically controlled to the optimum level. As a result, the bitline capacitance has a smaller effect on the sense delay. Consequently, the bitline division for memory cell arrays is not required or the number of divisions can be reduced, so that the chip area can be greatly reduced. The chip performance can also be improved, because the memory cell array architecture is simplified. Assuming that the total delay at the signal bus and the output circuit is 10 ns, a typical access time is estimated to be 35 ns with a bitline capacitance of 10 pF, which corresponds to a megabit-class Flash memory.
For a Flash memory, data is stored in a memory cell as a threshold voltage change accomplished by injecting electrons into or extracting electrons from its floating gate. In other words, the cell current of the selected memory cell changes according to the stored data. Although recent Flash memory chips have an on-chip controller to control the threshold voltages of memory cells during the program and the erase, the threshold voltages fluctuate because of differences in the program and erase characteristics among the cells. Therefore, it is desirable for a sense amplifier to have a wide operation margin for the cell current distribution. Fig. 14 represents the operation margin for the cell current, where the threshold cell current between 1-data and 0-data is designed to be 40 µA under the typical condition. The cell current for 0-read represents the total leakage current on the bitline. The shaded regions represent read fail regions. In the conventional scheme, the sense delays for both 0-read and 1-read strongly depend on the cell current. This is because the leakage current disturbs the bitline charge, spoiling the 0-read sense speed, and for 1-read, the bitline discharge is delayed as the cell current decreases. Furthermore, the operation margin for the cell current varies widely according to the conditions. On the other hand, in the proposed scheme, the sense delay has much lower sensitivity to the cell current. For 0-read, this scheme has a wide margin for the leakage current, because the bitline is charged by the charge sharing, and the self-bias bitline voltage completely compensates the voltage drop caused by the leakage current. Moreover, the 1-read delay does not depend on the cell current at all, because in every read cycle the sense amplifier input is reset to the 1-read level before the sensing, and the sense amplifier initially outputs 1-data. As a result, the total read fail region is greatly narrowed and fast sensing is also realized.
Consequently, the restriction for the threshold voltage distribution and the retention characteristics of memory cell can be relaxed. Furthermore, it has much less dependence on the different simulation conditions. This means that the selfbias bitline sensing scheme greatly improves the operations margin and noise immunity in the sense circuits. Fig. 15 represents the comparison of the current consumption for the sensing schemes. The current consumption is estimated as an average value per I/O in one read cycle and this includes the current for the sense circuit, the memory cell core, and the clock generators used to control the sense circuit. In the proposed scheme, the current for circuit increases, because the load current must be controlled with a cell current to compensate cell characteristic fluctuations. On the other hand, can be generated with a small current, because is not used for equalization with , and the compensation for cell characteristic fluctuations is not needed. The current consumption for also decreases, because is controlled with the bitline level. The current consumed for the bitline charge increases, because the dummy bitline charge is also necessary. However, the value for the bitline charge in Fig. 15 is for the worst case, and it can be reduced by half when a read cycle is short, because the dummy bitline retains the half level before the next precharge. As a result, even in the worst case, the total current consumption for the proposed scheme is almost the same as that for the conventional scheme. From the point of view of energy (power time) consumption per cycle, the selfbias bitline sensing scheme can reduce energy consumption by more than 35% compared with the conventional scheme at the minimum read cycle. Since the current for bitline charge is reduced by half at minimum read cycle, the current consumption is reduced to 85%. In addition, the cycle time itself can be shorter than that for the conventional scheme. 
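Assuming that the total delay not included in the sense delay shown in Fig. 12 is 10 ns, the read cycle can be reduced to 75% of that for the conventional scheme. Therefore, the energy consumption per cycle, which scales with the product of the current and the cycle time (0.85 × 0.75 ≈ 0.64), is reduced by over 35%.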
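The energy estimate above can be checked with a few lines of arithmetic. The sketch below assumes that both schemes run from the same supply voltage, so that the energy per cycle scales with the product of the average current and the cycle time. The two ratios are the ones quoted in the text, and the variable names are illustrative only.

```python
# Relative energy per read cycle = (relative average current) x (relative cycle time),
# assuming the same supply voltage for the proposed and conventional schemes.
current_ratio = 0.85  # proposed / conventional average current at the minimum read cycle (from the text)
cycle_ratio = 0.75    # proposed / conventional read cycle time (from the text)

energy_ratio = current_ratio * cycle_ratio  # fraction of the conventional energy per cycle
energy_reduction = 1.0 - energy_ratio       # fractional saving per cycle

print(f"relative energy per cycle: {energy_ratio:.4f}")
print(f"energy reduction per cycle: {energy_reduction:.1%}")
```

The product is about 0.64, which is consistent with the stated reduction of over 35%.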
C. Multilevel Storage Sensing with Self-Bias Bitline
Multilevel storage is very desirable for mass data storage because multiple bits of data are stored in a memory cell. A Flash memory cell is suitable for multilevel storage, because data is stored in a cell as a threshold voltage shift, and this shift can be flexibly controlled. Several multilevel storage Flash memories have already been reported [4], [5], [12]. In these chips, multilevel storage is realized by tightly controlling the distribution of the memory cell threshold voltage, and their basic sensing schemes are similar to the conventional one shown in Fig. 9. The conventional sensing scheme inherently needs a longer sensing time and has a poor margin for the cell current at VDD of 1.5 V, as already explained. Consequently, the conventional sensing for multilevel storage requires more precise control of the memory cell threshold voltage and a longer sensing time than the conventional 1-b storage at a low VDD. For multilevel sensing, the self-bias bitline sensing scheme is suitable because it realizes a much wider margin for the cell current fluctuations and a faster sense speed than the conventional scheme at VDD of 1.5 V. In order to sense multiple levels in a cell, the threshold value of the cell current that distinguishes adjacent levels must be changeable. For example, storing 2 b per cell requires three threshold cell current values, because four different levels are necessary to encode 2-b data. In the self-bias bitline sensing, the data is determined by the competition between the load current and the selected cell current. Therefore, the threshold cell current value can be changed by controlling the load current. Furthermore, the bias voltage for the bias transistor should also be tuned to set the bias transistor at the optimum state for each threshold current. As already explained, the load current and the level can be controlled by the current mirror transistor width ratio between and and between and in Fig. 10, respectively.
These ratios can be changed without any changes at and by splitting and and switching them according to the necessary threshold value. Therefore, the current threshold value is changeable without any speed penalty. The four different values can be detected by two sequential sensings with one sense amplifier [12]. Fig. 16 represents the dependence of the sense delay on the cell current for the four-level sensing. Its three threshold current values are designed to be 20 µA, 40 µA, and 60 µA under the typical condition. Although the threshold current level varies slightly according to the conditions, the read fail regions are much narrower than the read pass regions, and a cell current margin of over 10 µA is obtained for each level at a sense delay of over 35 ns. This suggests that the self-bias bitline sensing scheme is much more suitable for sensing multilevel storage, and that the restriction on the memory cell threshold voltage distributions can be greatly relaxed compared with the conventional scheme.
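The four-level detection by two sequential sensings can be sketched as follows. This is a minimal behavioral model, not the circuit itself. It assumes an idealized comparison of the cell current against a selectable threshold (set in the real circuit by the switched load current), a binary-search ordering of the two sensings, and a numbering of the levels from 0 to 3 in order of increasing cell current. The ordering and the numbering are assumptions made for illustration. The threshold values of 20, 40, and 60 µA are those of the typical condition in the text.

```python
# Threshold cell currents in microamperes (typical condition, from the text).
THRESHOLDS_UA = (20.0, 40.0, 60.0)

def sense(cell_current_ua, threshold_ua):
    """Idealized single sensing: True if the cell current exceeds the threshold.

    In the circuit, the threshold would be selected by switching the load current.
    """
    return cell_current_ua > threshold_ua

def read_level(cell_current_ua, thresholds=THRESHOLDS_UA):
    """Resolve one of four levels (0..3) with exactly two sequential sensings."""
    low, mid, high = thresholds
    if sense(cell_current_ua, mid):  # first sensing: upper or lower half
        return 3 if sense(cell_current_ua, high) else 2  # second sensing
    return 1 if sense(cell_current_ua, low) else 0       # second sensing

for current_ua in (5.0, 30.0, 50.0, 75.0):
    print(current_ua, read_level(current_ua))
```

With this ordering, every read uses the middle threshold first and then only one of the two outer thresholds, so two sensings are enough for any of the four levels.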
As already mentioned, the subthreshold leakage current of the unselected cells spoils the read performance. In the case of multilevel storage in particular, the subthreshold leakage current can be larger, because multilevel storage requires a lower threshold voltage for an erased cell than normal storage does. If this leakage occurs, the cell current margin in the level-0 pass region in Fig. 16 is spoiled, and tighter control of the threshold voltage distribution is required. However, by biasing the unselected wordlines to a negative level with the high/low-level shifter shown in Fig. 6, the leakage current of the unselected cells can be eliminated, and the cell current margin can be improved even for multilevel sensing.
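The effect of the negative wordline bias on the level-0 margin can be illustrated with a simple first-order model. The sketch below assumes that the subthreshold leakage of each unselected cell falls by one decade for every fixed step of wordline voltage below 0 V, and that the leakage of all unselected cells on a bitline simply adds up. The number of cells per bitline, the leakage at 0 V, the subthreshold swing, and the negative bias level are hypothetical values chosen only for illustration. Only the 20 µA lowest threshold comes from the text.

```python
def subthreshold_leakage_ua(i0_ua, v_wordline, swing_v_per_decade):
    """Per-cell leakage, dropping one decade per `swing_v_per_decade` volts below 0 V."""
    return i0_ua * 10.0 ** (v_wordline / swing_v_per_decade)

def level0_margin_ua(lowest_threshold_ua, cells_per_bitline, i0_ua, v_wordline, swing):
    """Margin between the summed bitline leakage and the lowest threshold current."""
    unselected = cells_per_bitline - 1  # every cell on the bitline except the selected one
    total_leakage = unselected * subthreshold_leakage_ua(i0_ua, v_wordline, swing)
    return lowest_threshold_ua - total_leakage

# Hypothetical parameters (not from the paper).
CELLS_PER_BITLINE = 512
I0_UA = 0.02   # per-cell leakage with the wordline at 0 V
SWING = 0.1    # volts per decade of leakage current

margin_grounded = level0_margin_ua(20.0, CELLS_PER_BITLINE, I0_UA, 0.0, SWING)
margin_negative = level0_margin_ua(20.0, CELLS_PER_BITLINE, I0_UA, -0.5, SWING)

print(f"level-0 margin, unselected wordlines at 0 V:    {margin_grounded:.2f} uA")
print(f"level-0 margin, unselected wordlines at -0.5 V: {margin_negative:.2f} uA")
```

Under these assumed values, the summed leakage consumes about half of the lowest threshold current when the unselected wordlines are grounded, and becomes negligible when they are biased negatively.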
In this way, the combination of the self-bias bitline sensing scheme and the negative bias for unselected wordlines with the high/low-level shifter circuit is suitable for multilevel storage Flash memories. This relaxes the restriction on the memory cell threshold voltage control in program and erase, and also realizes fast multilevel sensing even at VDD of 1.5 V.
IV. CONCLUSION
We have proposed circuit techniques for a 1.5-V Flash memory. The proposed bootstrapping level shifter circuits realize a fast switching speed and a wide operation margin for 1.5-V operation without any fabrication process changes. The high/low-level shifter circuit for the negative erase also improves the read operation margin by eliminating the leakage current on a bitline. The self-bias bitline sensing scheme greatly improves both the sense speed and the operation margin at a low VDD. This scheme is suitable for multilevel storage sensing because of its high sensitivity to the cell current, and it is also applicable to other chips that use single-ended sensing, such as EPROM's and multiport SRAM's. By employing these techniques, a fast and low-cost 1.5-V Flash memory chip can be designed, and its wide operation margin is desirable for use in battery-operated equipment. A typical access time for a megabit-class chip is estimated to be 35 ns without bitline divisions. The remaining issues for a low-power Flash memory, including the program and erase operations, are the design of efficient internal voltage generator circuits and fabrication process improvements that reduce the program and erase current of a Flash memory cell.
