Memory Design for Centimeter-Scale Organic and Nanometer-Scale Silicon Technologies by Zhang, Wei
  
 
 
 
 
Memory Design for Centimeter-Scale Organic and 
Nanometer-Scale Silicon Technologies 
 
 
 
 
A DISSERTATION 
SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL 
OF THE UNIVERSITY OF MINNESOTA 
BY 
 
 
 
 
WEI ZHANG 
 
 
 
 
 
 
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS 
FOR THE DEGREE OF 
DOCTOR OF PHILOSOPHY 
 
 
 
CHRIS H. KIM 
 
 
 
 
JULY 2012 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
© WEI ZHANG 2012 
Acknowledgements 
 
First and foremost, I would like to express my deepest gratitude to my advisor, 
Professor Chris H. Kim. His passion, patience and insights have guided me all the 
way towards my Ph.D. degree. I feel fortunate to have had him as my academic 
advisor, and I am indebted to him for all his insightful comments and advice. He has 
been not only an advisor on my academic work, but also a great teacher who led me 
to grow into an expert, and a close friend who helps us out when we are in need.  
Second, I would like to thank all my academic committee members, Professor Sachin 
Sapatnekar, Professor C. Daniel Frisbie, Professor Kia Bazargan, and Professor Gerald 
Sobelman. Their insightful comments helped me to improve my work and this 
dissertation. 
Third, I would like to thank my colleagues in the VLSI research lab at the University 
of Minnesota. Their comments and efforts in various collaboration projects are precious 
to my achievements. They are: Dr. Jie Gu, Dr. Tony Kim, Dr. John Keane, Dr. Dong 
Jiao, Dr. Kichul Chun, Pulkit Jain, Seung-hwan Song, Ayan Paul, Xiaofei Wang, Bongjin 
Kim, Weichao Xu, Dan Liu, Won-ho Choi, Jongyeon Kim, Saroj Satapathy and Ed 
Pataky. 
Fourth, I would like to thank my colleagues in Professor C. Daniel Frisbie’s and 
Professor Jianping Wang’s groups, who made great contributions to our collaborative 
work. They are: Dr. Yu Xia, Mingjing Ha, Dr. Daniele Braga, and Hui Zhao. 
I would also like to thank the Graduate School of the University of Minnesota for the 
Doctoral Dissertation Fellowship, which provided financial support for my 
research during the final year of my doctoral program. 
Last but not least, I would like to thank Broadcom® Corporation and its engineers for 
providing me with the internship opportunities, which enhanced my expertise and helped 
me in achieving my success. 
Dedication 
 
This dissertation is dedicated to my parents and my wife, Lan Huang. Their consistent 
and unconditional support has equipped me with confidence and courage.  
Finally, I dedicate the merit and virtue of my work to all living beings. May all 
become compassionate and wise. 
 
Abstract 
Low power memory is always desired because of its significance in many large-scale 
applications. It is important to emerging technologies such as organic electronics, since it 
is an indispensable component for extending the technology towards a larger application 
scope with complicated functionalities. It is also a hot topic in mature silicon technology, 
because device scaling makes memory design challenging through increasing leakage 
currents and process variations. 
Organic electronics deals with conductive polymers and plastics, and is capable of 
realizing large area flexible applications, which cannot be fulfilled by modern silicon 
technology. Conventional organic devices require a high operation voltage due to their low 
carrier mobility. Ion-gel gated OTFTs (gel-OTFTs), however, deliver unusually high gate 
capacitance through an electrolyte-gated structure, and therefore offer sufficient drive 
currents under a low voltage. Because the technology is still emerging, few attempts have 
been made at organic memory design. In this dissertation, we first propose an improved 
design-fabrication-testing flow that boosts design efficiency and fabrication yield and 
thus enables the implementation of complex circuits such as memory arrays. An organic 
process design kit (OPDK) with various modeling approaches allows designers to design 
organic circuits in much the same way as in silicon technology. Various circuit 
components including logic gates, ring oscillators and a D-flipflop were demonstrated, 
and a general purpose organic dynamic memory cell was proposed for the first time. The 
cell, known as a DRAM gain cell, achieves a sub-10nW-per-cell refresh power with a 
retention time of over 1 minute, which is 5 orders of magnitude longer than that in 
silicon designs. 
The same DRAM gain cell architecture also shows potential as an embedded memory 
in modern silicon technology, where the prevailing 6T SRAM suffers from leakage 
power and poor low voltage margin as devices keep scaling down. In this dissertation 
we report the first variation-aware performance analysis of the silicon gain cell and 
reveal that conventional corner simulations are no longer valid for capturing the worst 
cases of gain cells. The analysis approaches described in the dissertation provide 
insights that can benefit future memory design strategy and device optimization. 
With innovations in cell structure and peripheral circuitry, the silicon gain cell 
performance can be further enhanced to compete with the mainstream 6T SRAM. In this 
dissertation, we experimentally demonstrate, for the first time, a gain cell design with 
write-back-free read operations, utilizing its non-destructive read nature to improve the 
read speed into the GHz regime without sacrificing retention time. Various circuit 
techniques including a local-sense-amplifier architecture are proposed to eliminate the 
need for a complex current-sensing scheme, and a dual-row-access mode is proposed for 
further power saving in half-utilization scenarios. The test chip in a 65nm low power 
process achieves a 23.9% power saving compared to a 6T SRAM at 0.6V retention voltage 
and an additional 27.8% power saving when only half of the array is needed. 
Table of Contents 
List of Tables 
List of Figures 
Chapter 1 INTRODUCTION 
1.1 Low Power Memory 
1.2 Organic Device Platform Overview 
1.3 Ion-Gel Gated OTFT Technique 
Chapter 2 ORGANIC PROCESS DESIGN KIT (OPDK) 
2.1 The New Efficient Design – Fabrication – Testing Flow 
2.2 Organic Device Modeling 
2.2.1 Behavior Modeling 
2.2.2 Compact Modeling 
2.2.3 Advanced Modeling Through Sub-Circuit 
2.3 Organic Process Design Kit (OPDK) in Cadence® Virtuoso Environment 
2.3.1 Schematic Design 
2.3.2 Layout Design 
2.3.2.1 Design Rule Check (DRC) 
2.3.2.2 Layout Versus Schematic (LVS) 
2.4 Automatic Printing 
2.5 Testing 
2.6 Summary 
Chapter 3 BASIC LOGIC CIRCUIT DESIGN 
3.1 Combinational Logic Circuit Design 
3.2 Sequential Logic Circuit Design 
3.3 Summary 
Chapter 4 ORGANIC GAIN CELL MEMORY DESIGN 
4.1 Memory Design Discussion 
4.2 Gain Cell DRAM 
4.3 Test Vehicle Design Details 
4.4 Measurement Results 
4.5 Summary 
Chapter 5 SILICON GAIN CELL MEMORY ANALYSIS 
5.1 Introduction 
5.2 Silicon Gain Cell EDRAM 
5.3 Bitline Delay Variation 
5.3.1 Corner Simulation Pitfalls 
5.3.2 Cell Node Voltage and Read Path Strength 
5.4 Practical Issues with Monte Carlo Simulations 
5.5 Statistical Simulation Results 
5.5.1 Comprehensive Analysis 
5.5.2 Contour Plot Analysis 
5.6 Summary 
Chapter 6 SILICON GAIN CELL MEMORY DESIGN 
6.1 Introduction 
6.2 Write-Back-Free Read Operation 
6.3 Voltage Sensing w/ Local Sense-Amplifier Architecture 
6.3.1 Read Disturbance Issue 
6.3.2 Effectiveness Evaluation 
6.3.3 Design Conclusion 
6.4 Dual-Row-Access Low Power Mode 
6.4.1 Multi-Row-Access Analysis 
6.4.2 Dual-Row-Access Mode 
6.5 Measurement Results 
6.6 Summary 
Chapter 7 CONCLUSION 
REFERENCE 
 
List of Tables 
Table 1. Comparison of recent OTFT techniques on the four most desired features. 
Table 2. Comparison of various embedded memory options. 
Table 3. Summary of the results from Figs. 5.5 and 5.6. 
Table 4. Simulated voltage window improvement for PCOU coupling and regulated 
WBL schemes, respectively. 
Table 5. Simulated voltage window improvement for LSA architecture with different 
number of cells per read bitline. 
List of Figures
Fig. 1.1. Process layers and materials for a gel-OTFT. 
Fig. 1.2. Basic Operations of a gel-OTFT. 
Fig. 1.3. IDS-VGS curve of a sample gel-OTFT (W=500µm, L=25µm). 
Fig. 2.1. Typical design-fabrication-testing flow of gel-OTFT circuits. 
Fig. 2.2. Overall design-fabrication-testing flow for higher design efficiency and yield. 
Fig. 2.3. (a) Simulated (behavior and compact modeling) vs. measured current-voltage 
curves, (b) ring oscillator waveforms at 2.2V supply using behavior modeling, 
and (c) waveforms showing Monte Carlo simulation capability using compact 
modeling. 
Fig. 2.4. Hysteresis modeling sub-circuit. 
Fig. 2.5. Simulated (a) Ids-Vgs curves and (b) inverter transfer curves with hysteresis. 
Fig. 2.6. Aerosol-jet printing illustration [22]. 
Fig. 3.1. Measured transfer curve of an all organic inverter [9]. 
Fig. 3.2. Measured waveforms of a 1.0V NAND gate [9]. 
Fig. 3.3. Measured 36Hz 5-stage ring oscillator waveform at 2.2V supply voltage. 
Fig. 3.4. Circuit (above) and measured waveform (below) of a 5-stage transistor-loaded 
ring oscillator at 2.0V operation voltage [9]. 
Fig. 3.5. (a) Circuit and layout of a D-flipflop implemented with gel-OTFT and (b) 
simulated and measured waveforms operating at 5Hz clock frequency under 
1.5V [9]. 
Fig. 4.1. An SRAM cell in a p-type-only technology suffers from significant static power 
consumption. 
Fig. 4.2. Three operation modes of gain cell DRAM: write, hold and read. 
Fig. 4.3. Simulated data retention characteristics showing a retention time of 37 seconds. 
Fig. 4.4. Conceptual diagram of a flexible display and sensor array system. 
Fig. 4.5. Basic testing structure with the read current indicated in blue. 
Fig. 4.6. Measured write and read waveforms. 
Fig. 4.7. (a) Measured retention time map for WWL=1.25V and (b) percentage of failing 
cells versus retention time for WWL of 1.25V and 1.30V, respectively. 
Fig. 4.8. Retention time versus (a) WWL pulse width and (b) supply voltages. 
Fig. 4.9. Measured power consumption of an OTFT SRAM (static power only) and an 
OTFT DRAM for different WWL voltages and different operation scenarios. 
Fig. 4.10. Chip photograph of the 8x8 printed organic DRAM and performance summary. 
Fig. 5.1. (a) Schematic and (b) layout of a 3T PMOS gain cell.  A PMOS cell has roughly 
an order of magnitude lower gate tunneling leakage than its NMOS counterpart 
and hence significantly improves the data retention time. 
Fig. 5.2. Cell retention characteristics with process variations. 
Fig. 5.3. Cell retention characteristics using Monte Carlo simulations.  The two criteria 
for determining the maximum cell retention time are (1) VD0<0.6V for meeting 
a target read speed and (2) VD1-VD0>0.2V to ensure sufficient read current 
difference. 
Fig. 5.4. Bitline delay dependency on TOX and VTH after a hold period of (a) 0µsec (b) 
300µsec. 
Fig. 5.5. VD0 after 100µsecs of hold mode for different (a) TOX and (b) VTH values.  
The VD0 voltage primarily depends on the TOX of the STORAGE and WRITE 
devices. 
Fig. 5.6. Bitline delay at VD0=0.15V for different (a) TOX and (b) VTH values.  Bitline 
delay at a fixed VD0 depends on the VTH of the STORAGE and READ 
devices. 
Fig. 5.7. (a) One step and (b) two step Monte Carlo simulation methods for estimating 
gain cell eDRAM performance using standard circuit simulator features. 
Fig. 5.8. Bitline delay results of the one step and two step Monte Carlo methods at a hold 
mode time of (a) 10µsec and (b) 300µsec. 
Fig. 5.9. Scatter plot shows the correlation between bitline delays at different hold 
periods.  (a) 10µsec vs. 0µsec and (b) 300µsec vs. 0µsec. 
Fig. 5.10. Equal performance contours for a hold period of 300µsec. The cumulative 
probability of the highlighted area is 0.20%. 
Fig. 5.11. Bitline delay contour plot in the TOX and VTH space for hold times of (a) 
0µsec (b) 10µsec (c) 100µsec and (d) 300µsec. 
Fig. 6.1. Random cycle time comparison between SRAM and eDRAM. 
Fig. 6.2. 2T1D gain cell with preferential boosting [38]. 
Fig. 6.3. Cell retention characteristics without write-back for different read access rates. 
Coupling strength increases with cell voltage. 
Fig. 6.4. Cell voltage w/o write-back vs. read access rate. 
Fig. 6.5. Read disturbance mitigation of 2T1D cell using (i) beneficial PCOU coupling, 
(ii) regulated WBL, and (iii) short local read bitlines. 
Fig. 6.6. Schematic and timing of local and global sense amplifier architecture. 
Fig. 6.7. (a) Bitline voltage window and (b) sensing delay versus area overhead. 
Fig. 6.8. Power comparison for different row numbers (a) without and (b) with additional 
timing adjustment. 
Fig. 6.9. Dual-row access mode illustration (WL0 and WL64 selected). 
Fig. 6.10. Failure percentiles for a 1kb sub-array (left) and detailed view of tail cells 
(right). 
Fig. 6.11. Measured retention maps for single (left) and dual (right) row access modes. 
No significant bitline dependency is observed in either case. 
Fig. 6.12. Power consumption comparison between a power gated SRAM with a 0.6V 
retention voltage and various 2T1D eDRAM power down modes. 
Fig. 6.13. (a) Measured VDD shmoo and (b) static power comparison. 
Fig. 6.14. Microphotograph and summary of test chip characteristics. 
Chapter 1 
INTRODUCTION 
1.1 Low Power Memory 
Memory is an essential component in most electronic systems today, providing a 
low power and low cost solution for data storage and processing. In the silicon industry, 
SRAM and DRAM have prevailed for years. 6T SRAM has dominated as the 
mainstream embedded memory due to its high performance and logic compatibility, 
while DRAM has mainly served as a standalone memory because of its high bit-cell 
density. 
Recently, along with the rapid development of electronics technology, there is 
increasing demand for novel memory designs. One hot topic is implementing memory 
in emerging device platforms instead of mature silicon technology, since innovative 
device technologies, including organic electronics, are proceeding towards the 
application-level stage to achieve features that are absent in traditional silicon 
technology. Memory, as an indispensable component of any complex system, is 
therefore urgently needed. The other topic is investigating novel low power memory in 
current silicon technology. Despite its mature design and fabrication techniques, the 
silicon industry now suffers from leakage currents that increase with scaling, which 
will make low power SRAM challenging in the future. 
We focus on these two topics in this dissertation: low power memory 
implementation in an emerging organic device platform, and emerging circuit 
techniques for realizing a low power embedded DRAM (eDRAM) in silicon technology. 
Although the organic and silicon device platforms target two different application 
scopes, i.e. large-area flexible versus small-size high performance applications, low 
power is always a desired feature in both.  
In this dissertation, we will first evaluate and solve the design and fabrication 
challenges in the organic device platform, and demonstrate an organic memory 
implementation. Then we will extend the low power memory investigation into silicon 
technology. In both platforms, we find the gain cell DRAM to be a promising low power 
solution. 
1.2 Organic Device Platform Overview 
Large area flexible electronics technology has been drawing much attention recently, 
as it has the potential to revolutionize the way human society interacts with 
responsive electronic systems. One can imagine a smart bandage that not only protects 
an injury but also detects whether infection is occurring, and then notifies 
medical-care personnel. One could also wear a dress that provides not only fashion but 
also physiological monitoring [1]. Research on a blast dosimeter is already underway [2]; 
it will be a low-cost, disposable flexible attachment for helmets that monitors pressure, 
acceleration, and sound, and thus helps identify traumatic brain injury in soldiers. Other 
applications include flexible electronic readers or photo frames that can be folded and 
rolled like today’s newspaper, and low-cost computers that can be folded into pockets or 
worn around the wrist. 
Modern silicon technology, in spite of its impressive performance, mature design 
and fabrication process, and well-developed circuit techniques, is unable to fulfill the 
requirements of large area flexible electronics. The rigid crystal lattice and the high 
temperatures involved in fabrication make it incompatible with flexible substrates, 
while the complicated process steps and expensive masks prevent silicon chips from 
being scaled up in area. Therefore, a novel solution is needed to provide the desired 
properties of large area flexible electronics. 
Organic electronics, a branch of electronics dealing with conductive polymers and 
plastics, is a promising candidate. Organic Thin-Film-Transistors (OTFTs) are 
devices built primarily from organic rather than silicon-based materials. They are 
drawing increasing attention because of attractive attributes such as structural 
flexibility, low temperature processing, large area coverage, and low cost [3]. Printable 
OTFTs have also been reported [4], which further lowers the cost of large 
area flexible applications. OTFTs cannot match the performance of silicon-based 
transistors, but can complement them by enabling flexible electronic systems that do 
not have to operate at high speed.  Although various OTFT devices have been proposed 
with significantly different characteristics, most of them share similar electrical 
behavior with silicon transistors. 
Typical organic applications include passive RFID tags [5] and active display 
matrices [6]. The introduction of a viable n-type material has been delayed for 
most proposed organic techniques, since existing materials have unstable 
characteristics, poor contacts, and significantly lower mobility than their p-type 
counterparts. Nevertheless, p-type-only logic has been demonstrated to deliver practical 
performance in many circuits. Organic electronics has made rapid progress mainly in 
digital applications; for instance, an 8-bit organic microprocessor on plastic foil was 
recently reported [7]. Achievements in analog applications have also been presented, 
including various digital-to-analog converter (DAC) designs [8].  
Despite this rapid progress, organic electronics is still in its research 
stage. As an emerging interdisciplinary topic spanning electrical engineering and 
chemical engineering, it still faces several practical problems. Few studies have 
examined the special circuit design strategies needed for organic transistor 
characteristics, and no efficient design and fabrication flow has been reported, even 
though these are critical aspects that must be addressed for this emerging technology to 
take off. It is generally accepted that the most urgent need is to verify the capabilities of 
the numerous OTFT variants and identify the best candidates to lead the flexible 
electronics area; in addition, researchers need to find solutions for the design and 
fabrication challenges.  
1.3 Ion-Gel Gated OTFT Technique 
Organic transistors proposed so far suffer from low carrier mobility, which usually 
results in a high operation voltage [5][7]. Few attempts have been made to develop 
low-voltage, printable and flexible organic transistors. An ion-gel gated OTFT 
(gel-OTFT), however, is capable of providing sufficient drain current under a low supply 
voltage by utilizing the high gate capacitance of an electrolyte gated structure [4][9]. 
Moreover, the materials used in a gel-OTFT can be printed on flexible substrates such 
as plastic foils. Table 1 compares various recently proposed organic devices with respect 
to the four major desired features. The gel-OTFT achieves most of the features; the 
exception is complementary logic, which can be worked around by implementing 
p-type-only logic circuits. 
Table 1. Comparison of recent OTFT techniques on the four most desired features. 
 
 
Fig. 1.1. Process layers and materials for a gel-OTFT. 
A gel-OTFT [3]-[4][9] is fabricated using an ion-gel dielectric material and relies on 
an electrolyte gating mechanism. Here, ion gel refers to an electrolyte formed by 
gelation of a triblock copolymer in an ionic liquid, which acts as the gate dielectric. 
Fig. 1.1 shows the basic structure and Fig. 1.2 shows the electrolyte mechanism [3]. 
Poly(3-hexylthiophene) (P3HT) is printed between the two gold electrodes (source and 
drain) as the channel layer. An ion-gel layer covers the entire channel layer, and a 
conducting polymer, poly(3,4-ethylenedioxythiophene) : poly(styrenesulfonate) 
(PEDOT:PSS), is printed on top as the gate electrode. Although other materials, e.g. 
carbon nanotubes, can be used as the channel layer, in this dissertation we refer to 
P3HT-channel transistors as gel-OTFTs. Under zero bias, the ions in the ion gel are 
randomly distributed; when the gate is biased with a negative voltage, the ions are 
polarized by the electric field and interfaces are formed near both surfaces of the ion 
gel, creating two equivalent capacitors in series. Since the distance between charges 
and ions at each interface is extremely small, the equivalent gate capacitance is high. 
Moreover, the ions can penetrate into the polymers, e.g. the channel layer, leading to a 
3-D capacitor structure, which further boosts the gate capacitance. The resulting gate 
capacitance is tens of µF/cm², which is one to two orders of magnitude higher than that 
of a modern silicon transistor (around 1.4µF/cm² in 65nm CMOS) [9]. Therefore, in spite 
of the relatively low carrier mobility, the high gate capacitance enables a considerable 
current under a low operation voltage. Measurements have shown that the drain current 
is on the order of mA for a device with W/L = 500µm/25µm under a 1V supply voltage. 
Fig. 1.3 [3] shows the current-voltage curve of a sample transistor when sweeping the 
gate-to-source voltage (VGS). Moreover, this capacitance is essentially independent of 
the gel thickness (except during the polarization period), eliminating the challenge of 
sub-100nm thickness control in the printing process [9]. 
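
To make the capacitance argument concrete, the following is a minimal sketch rather than the device model used in this work. It assumes a simple square-law saturation current, and the mobility, overdrive voltage, per-interface capacitance and the "conventional" dielectric capacitance are all hypothetical values chosen for illustration; only the W/L ratio comes from the text. The point is how the drain current scales with gate capacitance, not the absolute numbers.

```python
# Illustrative sketch only: the square-law model and every value marked
# "assumed" are hypothetical and not taken from the dissertation.

def series_capacitance(c1, c2):
    """Equivalent capacitance of two capacitors in series (same units as inputs)."""
    return c1 * c2 / (c1 + c2)


def saturation_current(mobility, cap_per_area, w_over_l, v_overdrive):
    """Square-law estimate |I_D| = 0.5 * mu * C * (W/L) * Vov^2.

    mobility in cm^2/(V*s), cap_per_area in F/cm^2, v_overdrive in V.
    Returns the current magnitude in amperes.
    """
    return 0.5 * mobility * cap_per_area * w_over_l * v_overdrive ** 2


W_OVER_L = 500 / 25        # W = 500 um, L = 25 um, as quoted in the text
MOBILITY = 1.0             # cm^2/(V*s), assumed
V_OVERDRIVE = 1.0          # V, assumed |VGS| - |VTH|

# Two interface capacitors of 40 uF/cm^2 each (assumed) in series
c_gel = series_capacitance(40e-6, 40e-6)
# Hypothetical conventional thin-film gate dielectric for comparison (assumed)
c_conventional = 10e-9     # F/cm^2

i_gel = saturation_current(MOBILITY, c_gel, W_OVER_L, V_OVERDRIVE)
i_conv = saturation_current(MOBILITY, c_conventional, W_OVER_L, V_OVERDRIVE)

print(f"gel-OTFT estimate:     {i_gel * 1e3:.3f} mA")
print(f"conventional estimate: {i_conv * 1e6:.3f} uA")
print(f"current ratio:         {i_gel / i_conv:.0f}x")
```

Because the current is linear in the capacitance per unit area in this model, the ratio of the two estimates equals the ratio of the two capacitances, regardless of the assumed mobility or overdrive.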
 
Fig. 1.2. Basic Operations of a gel-OTFT. 
[Fig. 1.3 legend: VD = -0.1 V and VD = -1.0 V]
 
Fig. 1.3. IDS-VGS curve of a sample gel-OTFT (W=500µm, L=25µm). 
The rest of the dissertation is organized as follows. Chapter 2 discusses the issues 
in the current design and fabrication flow of organic electronics, and proposes 
solutions, mainly involving a specifically developed Organic Process Design Kit 
(OPDK), to significantly improve design efficiency and fabrication yield. Chapter 3 
demonstrates successful implementations of various organic circuit components, 
including an organic D-flipflop. Chapter 4 describes in detail the design strategies and 
measurement results for an organic dynamic memory (DRAM) cell, which achieves a 
sub-10nW-per-cell refresh power. Chapter 5 evaluates the same cell structure in silicon 
technology and shows that conventional corner simulations fail to capture its worst 
cases. Chapter 6 proposes a novel silicon DRAM design which, with a write-back-free 
read feature and a local-sense-amplifier (LSA) architecture, achieves 1GHz random read 
frequency and lower power consumption than 6T SRAM. Chapter 7 concludes the 
dissertation. 
Chapter 2 
ORGANIC PROCESS DESIGN KIT (OPDK) 
Designs in organic electronics are typically done in a fully customized manner, and 
an efficient design and fabrication flow has so far been absent. Fig. 2.1 shows the 
typical flow for gel-OTFT circuit implementations, which has several major design and 
fabrication challenges. First, there is no software kit to improve the overall design 
efficiency; second, circuit simulation, which is an important step for verifying design 
quality before a real implementation, is not supported; third, the correctness of the 
pattern (layout) can only be examined by naked-eye checks, which are prone to human 
error; and finally, the lack of automatic printing prevents the fabrication approach 
from realizing its potential for high throughput. These challenges significantly reduce 
the overall design quality and delay research progress towards more complicated 
systems. 
   
 
Fig. 2.1. Typical design-fabrication-testing flow of gel-OTFT circuits. 
In the silicon domain, computer-aided-design (CAD) tools have significantly improved 
the efficiency and yield of designs through performance simulations and various layout 
checks. It is critical for organic electronics to adopt the same strategy in order to 
scale up in system complexity. Fortunately, because organic thin-film-transistors 
(OTFTs) share much in common with silicon transistors, they can adopt these existing 
techniques. For example, among mainstream CAD software, the Cadence® [12] Virtuoso 
design environment can be used for designing both circuits and layouts of OTFTs. 
Circuit simulation software such as Synopsys® [13] HSPICE is also an excellent 
candidate for verifying design functionality and predicting performance.  
These existing tools, however, cannot be used directly, since the major 
characteristics, materials and structures of silicon and organic devices are still quite 
different. The Organic Process Design Kit (OPDK) [14] therefore aims to build a 
design package that follows the actual properties of OTFTs. With the assistance of 
Cadence® Virtuoso and Synopsys® HSPICE, it enables computer-aided design of both 
circuit schematics and layouts, significantly reducing the design effort for large-scale 
systems. 
2.1 The New Efficient Design – Fabrication – Testing Flow  
In our proposed flow, industry standard Computer-Aided-Design (CAD) tools from 
Cadence® and Synopsys® are used throughout the circuit design and implementation 
stages. This approach could facilitate the adoption of organic transistor technology in the 
future, as it allows chip designers to use the existing design environment built 
around silicon technology for organic transistors. In this dissertation, our discussion 
focuses on gel-OTFT technology, although the presented flow and methodologies apply 
to other organic devices and techniques as well. The entire design flow is shown in 
Fig. 2.2. 
 
 
 
Fig. 2.2. Overall design-fabrication-testing flow for higher design efficiency and yield. 
The flow starts from device modeling. A device compact model is developed based 
on the current-voltage curves obtained from individual devices as well as the measured 
data from a ring oscillator. Channel mobility and threshold voltage can be extracted from 
the former results while the capacitance can be estimated from the latter. The HSPICE 
circuit simulator from Synopsys® is used to replicate the hardware data to ensure the 
fidelity of the compact model. Once the compact model has been verified with measured 
data, we design basic logic gates including inverters and NAND gates, which are the 
basic components of larger complex integrated circuits such as flip-flops, adders, etc. 
Such a hierarchical design approach based on a library of basic logic gates is typical in 
the development of any large-scale digital system. At the schematic level, each logic gate 
is optimized for speed, output voltage swing, and power consumption by adjusting the 
dimensions of the device, namely channel length and channel width, as well as the load 
resistance value. The optimized schematic is then converted to a layout view which is 
used to create patterns for the actual printing process. Both the schematic and layout 
designs are performed using Cadence® software which can also automatically check 
whether the layout meets fabrication constraints and corresponds to the original 
schematic of the design. This capability is crucial when designing complex circuits as a 
manual check by the designer will be prone to human errors. Using the basic logic gates 
and the compact model, schematic and layout of larger circuits can be built in a 
hierarchical fashion. The final layout of the integrated circuit is then converted to real 
printing files for automated fabrication. Finally, measurements are performed and results 
are compared with simulations to validate the design flow; LabVIEW™ [15] is used 
to facilitate the data collection. It is worth noting that 
the compact model and the design flow can be further refined to account for any device 
variation or aging-related degradation.  The details of individual steps are described in the 
following sections. 
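
As an illustration of the first step of the flow, the sketch below extracts a channel mobility and a threshold voltage from a saturation-regime transfer curve by fitting sqrt(|ID|) against VGS. This is a minimal sketch under stated assumptions: the square-law device model, the function name `extract_mobility_vth`, and all numerical values are hypothetical and are not taken from the measured gel-OTFT data; the text does not specify which extraction procedure was used.

```python
import numpy as np

def extract_mobility_vth(vgs, i_d, w, l, c_gate):
    """Fit sqrt(|ID|) = sqrt(mu*C*W/(2L)) * |VGS - VTH| (square law, saturation).

    vgs, i_d : arrays of gate-source voltage (V) and drain current (A)
    w, l     : channel width and length (m)
    c_gate   : gate capacitance per unit area (F/m^2)
    Returns (mobility in m^2/Vs, threshold voltage in V).
    """
    slope, intercept = np.polyfit(vgs, np.sqrt(np.abs(i_d)), 1)
    vth = -intercept / slope                      # x-axis intercept of the fit
    mobility = 2.0 * l * slope**2 / (w * c_gate)
    return mobility, vth

# Synthetic p-type data generated from assumed (hypothetical) parameters.
W, L, C = 500e-6, 25e-6, 1.0        # m, m, F/m^2 (1 F/m^2 = 100 uF/cm^2)
MU_TRUE, VTH_TRUE = 1e-4, -0.2      # m^2/Vs, V
vgs = np.linspace(-1.0, -0.4, 13)   # p-type device is on for VGS < VTH
i_d = -0.5 * MU_TRUE * C * (W / L) * (vgs - VTH_TRUE) ** 2

mu_fit, vth_fit = extract_mobility_vth(vgs, i_d, W, L, C)
print(f"mobility = {mu_fit:.3e} m^2/Vs, VTH = {vth_fit:.3f} V")
```

With real measurements, only the points that are clearly in saturation should be passed to the fit; the choice of that range is left to the user in this sketch.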
 
2.2 Organic Device Modeling 
Modeling of organic transistors, an essential prerequisite for circuit simulation, has 
received little research attention so far. Here we introduce three modeling methodologies 
that are easy to apply to most organic technologies and maintain sufficient accuracy in many 
design scenarios. 
2.2.1  Behavior Modeling 
 An OTFT can be easily modeled via behavior modeling [16], which builds a look-up 
table from measured current-voltage data. This method requires little modeling effort 
and provides high accuracy, and is therefore most suitable for evaluating new device 
behaviors. The major issue is that outliers caused by measurement errors are usually 
inevitable; they introduce discontinuities in the data curve and create a risk of convergence 
failures during circuit simulations. The data therefore need to be processed in advance 
through an averaging algorithm to generate a smooth curve. In addition, each device 
dimension requires its own model, which limits flexibility in practical use. 
The blue plot in Fig. 2.3 (a) displays the behavior model of a single transistor after 
the smoothing algorithm is applied. 
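
The look-up-table idea can be sketched as follows. This is a minimal illustration, assuming a centered moving average as the smoothing step and linear interpolation for table look-up; the text only states that an averaging algorithm is used, so both choices, the function names, and the synthetic data are hypothetical.

```python
import numpy as np

def smooth_moving_average(values, window=5):
    """Centered moving average; edges are padded by repeating the end values."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(np.asarray(values, dtype=float), pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def make_lookup(v_points, i_points):
    """Return a function that linearly interpolates the tabulated I-V data."""
    v = np.asarray(v_points, dtype=float)
    i = np.asarray(i_points, dtype=float)
    order = np.argsort(v)               # np.interp needs increasing x values
    v, i = v[order], i[order]
    return lambda vq: np.interp(vq, v, i)

# Hypothetical measurement: a smooth curve with one injected outlier.
v_meas = np.linspace(-1.0, 0.0, 21)
i_true = -2e-3 * v_meas**2
i_meas = i_true.copy()
i_meas[10] += 1e-3

i_smooth = smooth_moving_average(i_meas, window=5)
i_of_v = make_lookup(v_meas, i_smooth)   # behavior model for one device size
```

A separate table is needed for each device dimension, which reflects the flexibility limitation noted above.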
Capacitances are not involved in behavior modeling; therefore it does not support 
timing analysis. However, this can be fixed by attaching an additional capacitor at the 
gate in the device model. This equivalent capacitance can be estimated from the 
measured frequency of a ring oscillator, as demonstrated in Fig. 2.3 (b). The equivalent 
capacitance value estimated for a gel-OTFT with a W/L of 500µm/25µm is around 1µF, 
which includes all the OTFT capacitance components as well as parasitic capacitance. 
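
The capacitance estimate described above can be sketched with a first-order model. The relation f = 1/(2·N·t_d) for an N-stage ring oscillator is standard; the assumption that each stage delay equals ln(2)·R_eff·C, the effective resistance R_eff, and the example numbers are hypothetical and are not the values used to obtain the figure quoted in the text.

```python
import math

def stage_delay(f_osc, n_stages):
    """Average per-stage delay of an N-stage ring oscillator, f = 1/(2*N*t_d)."""
    return 1.0 / (2.0 * n_stages * f_osc)

def equivalent_capacitance(f_osc, n_stages, r_eff):
    """Lumped load capacitance, assuming t_d = ln(2) * R_eff * C (first-order RC)."""
    return stage_delay(f_osc, n_stages) / (math.log(2.0) * r_eff)

# Hypothetical inputs: 50 Hz oscillation, 5 stages, 2 kOhm effective resistance.
t_d = stage_delay(f_osc=50.0, n_stages=5)
c_eq = equivalent_capacitance(f_osc=50.0, n_stages=5, r_eff=2e3)
print(f"stage delay = {t_d * 1e3:.2f} ms, equivalent C = {c_eq * 1e6:.2f} uF")
```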
[Fig. 2.3 image. Legend: Measured Data, Behavior Model, Compact Model. Axis labels: IDS (mA), V_OUT (V).]
Fig. 2.3. (a) Simulated (behavior and compact modeling) vs. measured current-voltage 
curves, (b) ring oscillator waveforms at 2.2V supply using behavior modeling, and (c) 
waveforms showing Monte Carlo simulation capability using compact modeling. 
2.2.2  Compact Modeling  
Compact modeling is the standard solution in the silicon industry. A compact model 
can easily handle transistors with various dimensions. Moreover, because capacitance 
values are defined inside the model, timing analysis is supported natively. The 
flexibility of specifying device parameters also makes it possible to analyze device 
aging and to perform Monte Carlo simulations, which will become more 
essential as transistor counts increase in larger-scale designs. Reusing existing 
silicon models saves much development work because the current-voltage 
relationships of silicon and organic transistors are similar, and it is sufficiently accurate for 
many designs. However, this method requires more modeling effort than 
behavior modeling, including a considerable number of data-fitting iterations.  
In OPDK, an HSPICE Level-61 amorphous-silicon Thin-Film-Transistor (TFT) 
model serves as the model prototype, because it shares several critical 
features with OTFTs, including the absence of a body effect and of junction current. The 
key parameters to modify include the channel mobility, the dielectric thickness, the 
dielectric constant and the threshold voltage.  The red plot in Fig. 2.3 (a) represents the 
compact modeling of a single transistor and Fig. 2.3(c) shows the simulated ring 
oscillator output waveform of 500 Monte Carlo runs with a 25mV standard deviation in 
the threshold voltage. 
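
The Monte Carlo capability can be illustrated outside the circuit simulator with a short sketch. It draws 500 threshold-voltage samples with a 25mV standard deviation, as in the text, and evaluates the spread of the on-current. The square-law current expression, the nominal threshold voltage, and the transconductance parameter `k` are assumptions made for illustration only; they do not represent the Level-61 model or the fitted gel-OTFT parameters.

```python
import numpy as np

def on_current(vgs, vth, k=1e-3):
    """Square-law saturation current magnitude of a p-type device (A).

    k = mu*C*W/L in A/V^2 is a hypothetical value.
    """
    overdrive = np.maximum(vth - vgs, 0.0)   # p-type conducts when VGS < VTH
    return 0.5 * k * overdrive**2

rng = np.random.default_rng(seed=0)
N_RUNS, VTH_NOMINAL, VTH_SIGMA = 500, -0.2, 0.025   # 500 runs, 25 mV sigma
vth_samples = rng.normal(VTH_NOMINAL, VTH_SIGMA, size=N_RUNS)
i_on = on_current(vgs=-1.0, vth=vth_samples)

print(f"mean I_on = {i_on.mean():.3e} A, std = {i_on.std():.3e} A")
```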
2.2.3  Advanced Modeling Through Sub-Circuit 
More advanced modeling based on behavior or compact modeling can provide even 
higher accuracy in organic device models. A typical solution is to group the primary 
model with other active or passive devices. For example, gate leakage current is 
important in some circuits, such as dynamic random access memory (DRAM) cells. This, 
however, is not included in the two modeling methods introduced above. A simple 
solution is to attach a resistor between the gate and the source/drain, creating a leakage 
path that reproduces the measured leakage current. In gel-OTFTs, this resistor is typically on the 
order of GΩ, and it provided a reasonable estimate during the feasibility study of the 
organic DRAM cell in Chapter 4. 
Another example is hysteresis modeling. Hysteresis was observed in gel-OTFT 
measurements between increasing and decreasing VGS sweeps. Although the detailed 
theoretical explanation is still under investigation, this phenomenon may be due to 
ions temporarily trapped in the channel when bias voltages are applied. The hysteresis is 
important in certain robustness evaluations. To model this behavior, a more complicated 
sub-circuit modeling approach is adopted. It incorporates a decision circuit [18] that selects 
between two transistor models representing the two hysteresis branches. 
A derivative circuit inside the decision circuit 
detects the direction in which VGS is changing. The overall modeling scheme of a prototype model 
is shown in Fig. 2.4. 
 
Fig. 2.4. Hysteresis modeling sub-circuit. 
The simulated and measured waveforms with hysteresis for a single gel-OTFT and an 
inverter are shown in Fig. 2.5.  The data were extracted from a transient simulation and 
re-plotted, because a DC sweep cannot capture the history dependence. Although there is 
still room for improvement, this prototype serves well for a first-pass performance 
evaluation. 
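
The selection principle of the hysteresis sub-circuit can be sketched behaviorally. The code below picks one of two I-V branches from the sign of the discrete VGS derivative, which mirrors the role of the derivative and decision circuits. It is a simplified illustration, not the sub-circuit itself: the two square-law branches, their threshold voltages, and the parameter `k` are hypothetical.

```python
import numpy as np

def hysteretic_current(vgs_trace, model_rising, model_falling):
    """Pick one of two I-V models per sample from the sign of dVGS/dt.

    The previous choice is kept while VGS is flat, so the result depends on
    the sweep history, which a DC sweep cannot reproduce.
    """
    vgs_trace = np.asarray(vgs_trace, dtype=float)
    current = np.empty_like(vgs_trace)
    rising = True                                  # assumed initial state
    for n, v in enumerate(vgs_trace):
        if n > 0:
            dv = v - vgs_trace[n - 1]              # discrete derivative
            if dv > 0:
                rising = True
            elif dv < 0:
                rising = False
        current[n] = model_rising(v) if rising else model_falling(v)
    return current

def branch(vth, k=1e-3):
    """Hypothetical p-type square-law branch with threshold voltage vth."""
    return lambda v: -0.5 * k * max(vth - v, 0.0) ** 2

# Sweep VGS from 0 V to -1 V and back, as in a transient simulation.
sweep = np.concatenate([np.linspace(0, -1, 11), np.linspace(-1, 0, 11)])
i_ds = hysteretic_current(sweep, branch(-0.15), branch(-0.25))
```

At the same VGS, the two sweep directions return different currents, which produces the loop seen when the transient data are re-plotted against VGS.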
[Fig. 2.5 image. Axis labels: VGS (V), VINPUT (V), Inverter Output. Annotations: W/L=500µm/25µm; inverter inset labeled INPUT, OUTPUT, 5k, 1V.]
Fig. 2.5. Simulated (a) Ids-Vgs curves and (b) inverter transfer curves with hysteresis. 
2.3 Organic Process Design Kit (OPDK) in Cadence® Virtuoso Environment 
OPDK, the core component of the proposed design-fabrication-testing flow, is a 
design kit developed for the OTFT platform based on the framework of the 
NCSU-FreePDK-45nm kit [19]. It is the first reported design kit for organic electronics. It includes 
the gel-OTFT models described in the previous section for design and simulation, 
and supports the major built-in functions of Cadence® Virtuoso, e.g., hierarchical 
schematic design and simulation file generation, to facilitate organic circuit design. 
Organic layers are defined so that designers can draw layouts in the same way as they 
fabricate, and various checking tools help ensure high design quality. Although the 
current OPDK focuses on gel-OTFTs, it can be easily adapted to other organic devices 
or techniques.  
2.3.1 Schematic Design  
Schematic views in OPDK display circuit designs in a straightforward way, and 
the circuit information can be extracted directly through built-in functions into complete 
simulation inputs. As in modern silicon technology, the hierarchical 
schematic design interface provides a systematic way to design and examine large-scale 
organic circuits. It facilitates design-simulation iterations, which provide a 
preliminary evaluation of circuit functionality and performance and are critical for 
ensuring high design efficiency and success rate. 
2.3.2 Layout Design 
Without a computer-aided tool, organic-circuit designers have to design 
layouts manually and examine them by eye. This becomes 
inefficient, impractical and error-prone as circuits grow 
larger or more complex. OPDK provides a complete set of layout 
resources for gel-OTFTs, including the layer definitions and layout checking tools. The 
layout can be designed with computer assistance in exactly the same way as in the silicon 
industry, and can be directly turned into fabrication patterns. The layer definitions 
provide all the layers required for the gel-OTFT technique and are highly extensible to 
support other organic device techniques. The layout design environment supports 
hierarchical design and array generation, providing an efficient way to create large-scale 
layouts with high precision. The checking tools, including Design Rule Check 
(DRC) and Layout Versus Schematic (LVS) implemented with Mentor® Calibre [20], 
are a key feature of OPDK: they perform a complete and efficient examination of 
layout designs and thus significantly improve layout quality.  
2.3.2.1 Design Rule Check (DRC)  
Design Rule Check (DRC) examines whether the layout meets the pre-defined 
fabrication constraints, including pattern width, spacing, enclosure and overlap. It 
improves yield by avoiding risky layout designs, and verifies the 
completeness and correctness of device structures by checking for missing or 
misplaced layers. Each layer of the organic device is examined in a way similar to that 
in silicon technology, with additional dummy layers employed to accommodate the actual 
fabrication details.  A substrate layer, which is represented by a 
dummy body contact in the device model and has no influence on device properties, 
provides the future capability of supporting organic transistors with back-gates. 
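
To make the width and spacing checks concrete, the following sketch applies two such rules to axis-aligned rectangles. It is a toy illustration of the principle and not the Calibre rule deck used in OPDK: the `Rect` type, the layer name, and the rule values (20µm minimum width, 30µm minimum spacing) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Rect:
    layer: str
    x0: float
    y0: float
    x1: float
    y1: float   # coordinates in um, with x0 < x1 and y0 < y1

# Hypothetical rule deck: minimum width and same-layer spacing in um.
RULES = {"channel": {"min_width": 20.0, "min_spacing": 30.0}}

def gap(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5

def run_drc(shapes, rules=RULES):
    """Return a list of ("width", shape) and ("spacing", a, b) violations."""
    violations = []
    for s in shapes:
        rule = rules.get(s.layer)
        if rule and min(s.x1 - s.x0, s.y1 - s.y0) < rule["min_width"]:
            violations.append(("width", s))
    for a, b in combinations(shapes, 2):
        rule = rules.get(a.layer)
        if a.layer == b.layer and rule and 0.0 < gap(a, b) < rule["min_spacing"]:
            violations.append(("spacing", a, b))
    return violations

layout = [
    Rect("channel", 0, 0, 25, 500),
    Rect("channel", 40, 0, 65, 500),     # only 15 um away from the first shape
    Rect("channel", 200, 0, 210, 500),   # only 10 um wide
]
errors = run_drc(layout)
```

Enclosure and overlap rules follow the same pattern, with the geometric predicate changed accordingly.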
2.3.2.2 Layout Versus Schematic (LVS)  
The correctness of the entire circuit layout, including interconnections, is verified 
by a Layout-Versus-Schematic (LVS) check. The software automatically compares the 
layout-extracted circuit information with the original design. This technique, widely used 
in silicon designs, ensures that the layout matches the schematic 
and significantly reduces debugging effort by providing a detailed diagnostic report.  
The existing LVS toolset for silicon transistors cannot be used directly because of the 
differences between organic and silicon devices. However, thanks to their similarity, LVS 
in OPDK is achieved by imitating the silicon device structure with the provided organic layers in 
the LVS algorithm. Appropriate selection and combination of different layers allow the 
generation of the correct recognition pattern, while the structure completeness 
check is accomplished by the DRC process. Other organic devices, such as 
capacitors, can be recognized by the LVS algorithm in a similar way, and the same 
concept can be used to support devices of other organic techniques as well. 
2.4 Automatic Printing 
Printability is a desired feature for large scale flexible circuits because it significantly 
lowers the fabrication cost compared to other approaches including photolithography. 
Ink-jet printing has been demonstrated [21] to implement circuits on plastic foils. 
However, this approach typically allows only low-density material ink. It also suffers 
from random directionality in the relatively large ink droplets, which makes resolution 
improvement challenging. 
In our work, we utilize the commercial aerosol-jet printing [4] provided by 
Optomec® Corporation [22]. Inks are first turned into a dense aerosol by ultrasonic 
atomizers and then pushed through the printing nozzle, creating a dense and focused 
stream as shown in Fig. 2.6. High-density inks are therefore accommodated, and the 
continuous and tightly focused micro-droplet stream formed through the nozzle generates 
patterns with high uniformity and a resolution as fine as 10µm [22]. It also offers an 
adjustable stand-off, which is the distance between the nozzle and the substrate, allowing 
even more flexibility in quality control. 
[Fig. 2.6 image. Labels: gas in; dense aerosol; focused beam to < 10 µm; 2 to 5 mm standoff; substrate; various nozzle sizes for line width control; fine line capability (10 micron line width, 30 micron pitch).]
Fig. 2.6. Aerosol-jet printing illustration [22].  
Despite the advantages of aerosol-jet printing, the printing conditions may still vary 
over the course of the printing process due to environmental disturbances, which introduce 
variations and degrade circuit performance. Automatic printing is therefore crucial for 
large-scale organic circuit fabrication to enhance fabrication 
efficiency, lower cost, and improve yield. Fortunately, the aerosol-jet printer is 
able to accept computerized instructions and execute them as printing operations, 
providing the opportunity to print organic circuits in an automated fashion. 
A conversion flow, including scripts and a parsing tool, enables direct translation 
from the original Cadence Graphic-Database-System (GDS) file to a complete printing 
instruction list, which is then recognized by the printer for automatic printing. The 
fabrication time for each layer of an 8x8 organic DRAM array, described in detail in 
Chapter 4, is thereby reduced to minutes. This minimizes fluctuations in 
printing conditions and leads to higher yield as well as significantly smaller 
device-to-device variations. 
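
The idea of translating layout shapes into a printing instruction list can be sketched as follows. The code fills each rectangle of a layer with serpentine passes. It is only an illustration of the concept: the actual scripts, the GDS parsing step, and the printer's instruction format are not described in the text, so the `MOVE`/`PRINT` instruction names, the function names, and the 10µm line width are assumptions.

```python
def rectangle_to_passes(x0, y0, x1, y1, line_width):
    """Fill an axis-aligned rectangle (um) with horizontal serpentine passes.

    Returns a list of ("MOVE" | "PRINT", x, y) tuples. The instruction names
    are hypothetical and do not reflect any real printer command set.
    """
    if line_width <= 0 or x1 <= x0 or y1 <= y0:
        raise ValueError("invalid rectangle or line width")
    n_passes = max(1, round((y1 - y0) / line_width))
    pitch = (y1 - y0) / n_passes
    instructions = []
    for k in range(n_passes):
        y = y0 + (k + 0.5) * pitch                 # centre line of pass k
        start, end = (x0, x1) if k % 2 == 0 else (x1, x0)
        instructions.append(("MOVE", start, y))    # travel without printing
        instructions.append(("PRINT", end, y))     # deposit ink along the pass
    return instructions

def layout_to_program(shapes_by_layer, line_width=10.0):
    """Emit one instruction list per layer, in the order the layers are given."""
    return {
        layer: [step for rect in rects
                for step in rectangle_to_passes(*rect, line_width)]
        for layer, rects in shapes_by_layer.items()
    }

# One hypothetical 500 um x 30 um shape on a layer named "channel".
program = layout_to_program({"channel": [(0.0, 0.0, 500.0, 30.0)]})
```

Alternating the pass direction keeps the travel distance between passes short, which is one simple way to reduce the printing time per layer.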
2.5 Testing 
After circuit fabrication, device characteristics are first examined to evaluate the 
fabrication quality including current-voltage behaviors, device-to-device variations and 
stability. The target circuit is then tested to examine the functionality and performance, 
and the measurement results are compared with the simulated data. The entire testing 
process is assisted by LabVIEW™ to facilitate data collection and minimize 
human errors. 
2.6 Summary 
A new design-fabrication-testing flow for the organic device platform, involving various 
modeling approaches and an organic process design kit (OPDK), facilitates the entire 
implementation of organic circuits and systems. Together with a LabVIEW™-equipped 
test interface, the flow incorporates automated computer support and handles the major 
challenges in designing and fabricating large-scale gel-OTFT-based systems, namely the 
design environment, performance simulation, layout examination and printing 
automation. The same concept and software kit can be extended to other organic 
techniques without much difficulty. In the next two chapters, we demonstrate several 
successful organic circuit implementations, which verify the 
feasibility of standard circuit functions. 
Chapter 3 
BASIC LOGIC CIRCUIT DESIGN 
In principle, any circuit system, no matter how complicated, can be built by 
combining basic circuit components. These components include combinational 
logic such as inverters and NAND gates, sequential logic such as D-flipflops, and special 
units such as memory bit-cells. In this chapter, we demonstrate the successful 
implementation of these key components in the gel-OTFT platform, except for the memory 
bit-cell (described in the next chapter), delivering practical performance under a low operating 
voltage. 
3.1 Combinational Logic Circuit Design  
Combinational logic provides the control functionality and is indispensable in any 
electrical system. The logic functions NOT, AND and OR and their numerous 
combinations deliver the flexibility to realize the major logic expressions. Therefore, a 
successful demonstration of combinational logic gates is crucial to verify the potential of 
gel-OTFT technology. 
An all-organic inverter was implemented using Poly(3-hexylthiophene) (P3HT) as the 
channel layer with an ion-gel gated structure [9]. Due to the lack of n-type devices, a 20kΩ 
PEDOT:PSS-based resistor was printed as the pull-down load. Fig. 3.1 clearly shows 
that the inverter output switches from digital ‘1’ (1.5V) to ‘0’ (0V) when the input 
sweeps from 0V to 1.5V, and vice versa. The maximum gain during switching is around 
7. Although in this sample circuit the trip point is close to VDD (1.5V) due to the weak 
pull-down resistor in the p-only logic, it could be moved towards 0V by 
reducing the strength ratio between the on-state transistor and the resistor, at the cost of 
inverter gain. 
 
Fig. 3.1. Measured transfer curve of an all organic inverter [9].  
 
Fig. 3.2. Measured waveforms of a 1.0V NAND gate [9].  
Our colleagues also constructed a 1.0V organic NAND gate by printing two identical 
gel-OTFTs and one 20kΩ resistor. The NAND gate shows correct NAND logic 
responses to two square-wave inputs switching at 100Hz and 200Hz, respectively, as 
shown in Fig. 3.2. With inverters and NAND gates successfully implemented, other 
combinational logic can be achieved through appropriate combinations of these logic 
units. 
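
The statement that other combinational functions follow from these gates can be checked with a short truth-table sketch. It models gates as ideal Boolean functions, which is an abstraction of the printed circuits, and builds NOT, AND, OR and XOR from NAND alone; the function names are chosen for illustration.

```python
from itertools import product

def nand(a, b):
    return int(not (a and b))

def not_(a):
    return nand(a, a)                 # inverter from a NAND with tied inputs

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))     # De Morgan: a + b = NOT(NOT a AND NOT b)

def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))   # four-NAND XOR

# Verify every derived gate against Python's bitwise operators.
for a, b in product((0, 1), repeat=2):
    assert and_(a, b) == (a & b)
    assert or_(a, b) == (a | b)
    assert xor_(a, b) == (a ^ b)
```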
3.2 Sequential Logic Circuit Design  
Sequential logic circuits are crucial in more complicated circuit systems where 
clocked operation is involved. They are widely used for state transitions, counting, etc. Ring 
oscillators were implemented to demonstrate clock generation, and a D-flipflop 
demonstrates the feasibility of sequential logic functions in ion-gel gated technology [9].  
Fig. 3.3. Measured 36Hz 5-stage ring oscillator waveform at 2.2V supply voltage.  
A ring oscillator composed of five inverters and one output buffer shows practical 
oscillation behavior in Fig. 3.3. It achieves an oscillation frequency of 36Hz at a 2.2V 
supply voltage, with a 2.0V output voltage swing. Note that other operating frequencies 
at the same supply voltage can be realized by changing the number of stages and the detailed 
design parameters, including the transistor dimensions and resistor value. At the cost of 
an additional bias voltage, one can also implement the ring oscillator using 
transistor-loaded inverters [9], which enables a tunable frequency, as shown in Fig. 3.4. A maximum 
frequency of 150Hz was achieved with an output swing of around 1.4V at a 2.0V supply. 
 
Fig. 3.4. Circuit (above) and measured waveform (below) of a 5-stage transistor-loaded 
ring oscillator at 2.0V operation voltage [9]. 
Our colleagues also implemented a D-flipflop, which to our knowledge was the first 
reported flipflop built with electrolyte-gated transistors, using gel-OTFTs of 
W/L=500µm/25µm and 2kΩ resistors [9]. Fig. 3.5(a) shows the circuit and layout of the 
D-flipflop. It is composed of eight NAND gates and three inverters, and incorporates an 
asynchronous reset port [5]. Data is sampled at the falling edge of the clock signal. 
Assuming that all NAND gates and inverters are identical, the setup time of this D-flipflop 
is therefore 
$t_{setup}^{LowToHigh} = 2\,[\ldots] + [\ldots]$ 
$t_{setup}^{HighToLow} = [\ldots] + [\ldots] + [\ldots]$ 
where “LowToHigh” and “HighToLow” indicate the direction of the data transition, 
tNAND-rise and tNAND-fall are the rising and falling delays of the NAND gates, respectively, 
and the inverter delays are defined similarly. 
The clock-to-output delay (c-to-q) is 
$t_{c\text{-}to\text{-}q}^{LowToHigh} = [\ldots] + [\ldots] + [\ldots]$ 
$t_{c\text{-}to\text{-}q}^{HighToLow} = [\ldots] + [\ldots] + [\ldots]$ 
where tlatch is the delay for the SR-latch at the far right of the circuit schematic, 
composed of two NAND gates and two inverters, to stabilize. 
The hold time is 
$t_{hold}^{LowToHigh} = [\ldots]$ 
$t_{hold}^{HighToLow} = [\ldots] + [\ldots]$ 
Fig. 3.5(b) shows the simulated and measured waveforms of the D-flipflop operating 
at a 5Hz clock frequency, with measured setup time and clock-to-output delay both less 
than 50ms and hold time less than 10ms. Device variations and modeling inaccuracy 
contribute to the difference in timing parameters between simulation and measurement. 
 
[Fig. 3.5 image. Axis labels: Input (V), Q (V). Layout scale bar: 2 mm.]
Fig. 3.5. (a) Circuit and layout of a D-flipflop implemented with gel-OTFT and (b) 
simulated and measured waveforms operating at 5Hz clock frequency under 1.5V [9].  
With ring oscillators and D-flipflops, clocked operations are now feasible in the 
gel-OTFT platform. More advanced sequential functionality, including counters, shift 
registers, state machines and other flip-flop variants, could be realized by combining 
D-flipflops with combinational logic. This significantly extends the application scope of the 
gel-OTFT technique.  
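
As an example of building sequential functions from D-flipflops, the sketch below chains behavioral flip-flop models into a shift register. The model samples data on the falling clock edge and has an asynchronous reset, matching the description of the printed D-flipflop; everything else (class names, zero-delay behavior, the 3-bit width) is an assumption, and gate-level timing is not modeled.

```python
class DFlipFlop:
    """Falling-edge-triggered D flip-flop with asynchronous reset (behavioral)."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, d, clk, reset=0):
        if reset:
            self.q = 0                        # reset overrides the clock
        elif self._prev_clk == 1 and clk == 0:
            self.q = d                        # sample D on the falling edge
        self._prev_clk = clk
        return self.q

class ShiftRegister:
    def __init__(self, n_bits):
        self.stages = [DFlipFlop() for _ in range(n_bits)]

    def step(self, serial_in, clk, reset=0):
        # Capture all inputs first so every stage sees the pre-edge values.
        inputs = [serial_in] + [ff.q for ff in self.stages[:-1]]
        return [ff.step(d, clk, reset) for ff, d in zip(self.stages, inputs)]

sr = ShiftRegister(3)
state = None
for bit in (1, 0, 1):
    sr.step(bit, clk=1)           # clock high: nothing is captured
    state = sr.step(bit, clk=0)   # falling edge: shift by one position
```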
3.3 Summary  
The successful implementation of basic combinational and sequential logic units 
demonstrates the potential of gel-OTFTs to achieve complicated circuit 
functionality in practical applications. Although the operating speed is not comparable 
with that of modern silicon technology, it is sufficient for many envisioned applications 
such as RFID tags, which require the large-area and flexible properties unique to this technology. 
Memory circuits, however, present additional design challenges. Appropriate design 
strategies are needed to fit the properties of the OTFT platform. In the next chapter, we 
report the first DRAM cell in a printable, flexible and low-voltage ion-gel gated OTFT 
technology [1], which has no DC static power and achieves considerable performance 
improvement at operating voltages below 1V, overcoming the fundamental limitations of 
OTFT based SRAMs. 
Chapter 4 
ORGANIC GAIN CELL MEMORY DESIGN 
4.1 Memory Design Discussion  
Memory is a key component in many OTFT applications, but few attempts have been 
made to build memory circuits suitable for OTFTs [23]-[26]. Non-volatile memory 
has been the major topic among these few efforts; it usually requires 
additional process steps or materials, however, such as floating gates, ferroelectric materials, and 
fuses [23]-[25]. Volatile memory, on the other hand, serves well for many applications, 
and can be built using standard transistors at no additional cost, so it is easy to 
implement. Nevertheless, little effort has so far been made on organic volatile memory 
designs [26]. 
SRAM, the mainstream embedded volatile memory in silicon chips, is not a 
practical low-power memory option in today’s OTFT technologies, which only allow 
p-type devices to be fabricated.  Although research on n-type OTFTs is currently underway 
[27], introduction of a viable n-type material has been delayed since existing materials 
have unstable characteristics, poor contacts, and significantly lower mobility than their 
p-type counterparts.  Resistors or diode-connected p-type transistors are therefore required 
to serve as a leaky pull-down network for SRAM functionality, as shown in Fig. 4.1, 
introducing considerable static power consumption. Moreover, unless the memory read 
operation can be sacrificed [26], the pull-down network must be strong enough to provide 
sufficient read noise margin, which rules out further power savings with an 
SRAM cell architecture.  
 
Fig. 4.1. An SRAM cell in a p-type-only technology suffers from significant static power 
consumption. 
Therefore, novel low-power memory solutions are needed for p-type-only 
OTFT platforms.  Dynamic random access memory (DRAM), which can be 
implemented with p-type devices without introducing additional leakage paths, is thus a 
practical candidate.  
4.2 Gain Cell DRAM  
DRAM falls mainly into two categories: 1T1C DRAM and gain cell 
DRAM. 1T1C DRAM, although prevalent in standalone silicon memory 
today, is not suitable for the current OTFT platform because it requires a 
dedicated storage capacitor process. Gain cell DRAM, in contrast, can be implemented 
using three transistors, or even two transistors, in a standard logic process [28]. Despite 
using a single type of device for the cell, this circuit has no DC static current, making it 
an ideal memory candidate for OTFT technologies where only p-type devices are 
available. The main design challenge for gain cells is the short retention time, which is 
typically less than 100µsec in modern silicon platforms due to the small gate 
capacitance used for storing the cell data [28]. 
Gel-OTFTs serve as an ideal device platform for DRAMs as their gate capacitances 
can exceed 100 µF/cm², which is roughly 70x higher than that in 65nm CMOS 
[9].  The unusually high capacitance is derived from the ultra-thin electrical double layers 
formed upon electrical polarization as discussed in Chapter 1, and can be used to enhance 
DRAM retention time and simultaneously achieve a much higher drive current at low 
supply voltages compared to other OTFT devices.   
 
Fig. 4.2. Three operation modes of gain cell DRAM: write, hold and read. 
 
Fig. 4.3. Simulated data retention characteristics showing a retention time of 37 seconds. 
Fig. 4.2 shows a gain cell DRAM implemented with three transistors and its three 
operation modes.  The leakage components during hold mode are also illustrated in Fig. 
4.2, namely the gate currents of the storage and write devices and the subthreshold 
leakage of the write device. A boosted write wordline (WWL) is preferred to suppress the 
subthreshold leakage, which is the common practice in all silicon DRAM designs. This 
boosted voltage can be efficiently generated using a single-stage charge pump consisting 
of a p-type gate capacitor and a diode connected p-type device. These leakage currents 
surrounding the storage node lead to a loss in the cell voltage over time as depicted in 
Fig. 4.3.  Note that for a 3T p-type cell, the retention characteristic of data ‘0’ (D0) is 
more critical than data ‘1’ (D1) since the storage node is surrounded by high voltages 
resulting in all leakage currents in the pull up direction. Moreover, the read current for 
driving the read bitline (RBL) is determined by the data ‘0’ voltage because it is 
equivalent to the gate overdrive of the p-type read device.  
The degradation in the D0 voltage affects the bitline delay, so a periodic refresh 
operation is required to restore the cell voltage; the refresh frequency and the resulting power 
consumption are determined by the cell retention time.  In general, the following two 
criteria determine the maximum retention time of a gain cell: (1) the cell node voltage 
for D0 should be lower than a specified value (or D1 should be high enough for an n-type 
cell) so that a target read speed can be met, and (2) the cell voltage difference between 
D0 and D1 should be greater than a certain value for sufficient sensing margin.  In our 
case, the first criterion is more stringent and hence determines the overall retention time.  
Fig. 4.3 shows a simulated retention time of 37 sec based on the criteria VD0<0.5V and 
VD1-VD0>0.2V as annotated in the figure, thanks to the large gate capacitance and 
relatively small leakage currents. Note that although device area also contributes to the 
high capacitance, merely increasing the device size in a silicon design will not improve 
the retention time (typically less than 100µs) because the dominant leakage 
components, namely the gate and junction leakages, are also proportional to the area. 
Therefore the retention time of a silicon gain cell, which is essentially determined by the 
ratio between the capacitance and the leakage currents, would remain the same regardless of 
the size increase.  
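
The area argument can be made explicit with a first-order estimate, t_ret ≈ C·ΔV / I_leak, in which the storage capacitance and the leakage current both scale with device area. The sketch below is a minimal illustration under that constant-leakage assumption; the per-area capacitance and leakage values are hypothetical and were chosen only so that the result falls in the sub-100µs range quoted for silicon gain cells.

```python
import math

def retention_time(c_storage, delta_v, i_leak):
    """First-order retention time: the time a constant leakage current needs
    to move the storage node by delta_v, t = C * dV / I."""
    return c_storage * delta_v / i_leak

def scaled_retention(area_scale, c_per_area, i_leak_per_area, area, delta_v):
    """Capacitance and leakage both scale with area, so the scale cancels."""
    a = area * area_scale
    return retention_time(c_per_area * a, delta_v, i_leak_per_area * a)

# Hypothetical silicon-like values: 1.5e-2 F/m^2, 100 A/m^2, 1 um^2, 0.5 V.
PARAMS = dict(c_per_area=1.5e-2, i_leak_per_area=1.0e2, area=1e-12, delta_v=0.5)
base = scaled_retention(1.0, **PARAMS)
big = scaled_retention(10.0, **PARAMS)     # ten times larger device
print(f"retention: {base * 1e6:.1f} us (1x area), {big * 1e6:.1f} us (10x area)")
```

Only a change in the ratio of capacitance to leakage per unit area, as provided by the electrical double layer in gel-OTFTs, lengthens the retention time in this model.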
4.3 Test Vehicle Design Details 
An 8x8 DRAM array was fabricated using an aerosol-jet printer to demonstrate the 
feasibility of gain-cell DRAMs in a p-type only OTFT technology. Automatic printing, as 
introduced in Chapter 2, significantly reduces fabrication time and enhances yield. Our 
design philosophy was to implement a general-purpose memory array with full read and 
write capability that can be utilized for a range of applications including electrochromic 
displays and/or sensor sheet arrays as shown in Fig. 4.4. The envisioned system uses the 
logic units demonstrated in Chapter 3 as the control module, while memory, 
which cannot be easily built by combining logic gates, is needed as the core to 
store data for driving the display pixels or to collect data from the sensor array. 
Fig. 4.4. Conceptual diagram of a flexible display and sensor array system. 
[Fig. 4.5 image. Labels: WWL, WBL, RBL, RWL, Cell Node, VDD, 23.5K load. RBL target: VDD - 0.1V.]
Fig. 4.5. Basic testing structure with the read current indicated in blue.  
  The basic testing structure for single cells in the 8x8 array is shown in Fig. 4.5. Data 
is written into the cell by driving WWL low. When read is enabled, a cell storing 
data ‘0’ will have a large pull-up current and drive the RBL high. A pull-down resistor 
attached to the RBL resets the RBL signal after the read is completed. WBL is kept high 
after the write access to test the worst-case data ‘0’ retention condition. The 
steady-state RBL voltage level is used to calculate the voltage stored in the cell at the time of 
access, based on current-voltage characteristics from sample transistors. We define 
retention time as the time it takes for the data ‘0’ voltage inside the cell to increase to 
0.5×VDD, and the resistor load is therefore selected so that a 0.5×VDD voltage level at 
the data storage node translates into an RBL level of (VDD-0.1V). 
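
The resistor selection can be sketched with a simple steady-state relation, V_RBL = I_read·R. The code below is a hedged illustration only: it uses a hypothetical square-law model for the p-type read device, ignores the series read-access device, and assumes values for the threshold voltage and `k`. The actual load was chosen from measured current-voltage characteristics of sample transistors.

```python
def read_current(v_cell, v_rbl, vdd=1.0, vth_mag=0.2, k=1.5e-3):
    """Square-law p-type read device: source at VDD, gate at the cell node,
    drain at RBL. The series read-access device is ignored (assumption)."""
    v_ov = (vdd - v_cell) - vth_mag          # overdrive |VGS| - |VTH|
    v_sd = vdd - v_rbl
    if v_ov <= 0 or v_sd <= 0:
        return 0.0
    if v_sd < v_ov:                          # triode region
        return k * (v_ov * v_sd - 0.5 * v_sd**2)
    return 0.5 * k * v_ov**2                 # saturation region

def load_resistor(vdd=1.0, margin=0.1, cell_fraction=0.5, **device):
    """Resistor that puts RBL at VDD - margin when the cell node sits at
    cell_fraction * VDD, using the steady state V_RBL = I_read * R."""
    v_rbl = vdd - margin
    i_read = read_current(cell_fraction * vdd, v_rbl, vdd=vdd, **device)
    return v_rbl / i_read

r_load = load_resistor()
print(f"load resistor = {r_load / 1e3:.1f} kOhm")
```

The same relation can be inverted during testing: a measured steady-state RBL level gives the read current, from which the stored cell voltage is inferred.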
4.4 Measurement Results  
Measurements were automated with LabVIEW™ software to collect reliable 
retention time statistics.  Measured waveforms in Fig. 4.6 verify full read and write 
functionality at a 1V supply voltage. 
[Fig. 4.6 image. Axis label: Voltage (V).]
Fig. 4.6. Measured write and read waveforms. 
The retention time statistics of the DRAM array at 1V are summarized in Fig. 4.7. 
With a boosted WWL of 1.25V, the worst-case cell has a retention time of 30 seconds 
while 72% of the cells have retention times longer than a minute. With a slightly higher 
boosted WWL of 1.30V, all cells in the array have a retention time exceeding one minute. 
This is roughly five orders of magnitude longer than the retention time of a gain-cell 
design in 65nm CMOS [28], and is sufficiently long even for the slow OTFT circuits to 
support a periodic refresh operation.  
Fig. 4.7.  (a) Measured retention time map for WWL=1.25V and (b) percentage of failing 
cells versus retention time for WWL of 1.25V and 1.30V, respectively.  
Fig. 4.8(a) shows the impact of WWL pulse width on retention time indicating that a 
20 ms write pulse is needed to achieve a retention time greater than one minute.  This can 
be further improved if a negative WWL voltage is available. The read delay, defined by 
the 50% rise time of the RBL signal when reading from a data ‘0’ cell, was 12 ms or less. 
The memory cells show robust read and write operation down to 0.8V, as indicated in Fig. 
4.8(b). 
 
Fig. 4.8. Retention time versus (a) WWL pulse width and (b) supply voltages. 
Our printed DRAM design also shows significant potential in terms of 
power consumption. Due to the extremely long retention time, the refresh power of the 
proposed memory cell is on the order of nanowatts. Fig. 4.9 compares the measured power 
consumption of an SRAM and a gain-cell DRAM implemented using the same kind of 
gel-OTFT. Individual SRAM memory structures were tested for this comparison. The results 
show that the overall power consumption of an OTFT DRAM (including active, static, 
and refresh power) at a 50Hz operating frequency is 12X lower than the static power 
consumption alone of an OTFT SRAM. In standby mode, where all RBLs are left 
floating except when the cells are refreshed, the DRAM power consumption per cell 
is reduced to below 10nW. Fig. 4.10 shows the photograph and performance summary of 
the organic DRAM chip. 
[Figure 4.9: bar chart on a logarithmic power axis (ticks from 0.001 to 1000).
SRAM* (Pstatic only): 98µW
DRAM (active**): 8µW and 8µW
DRAM (stdby***): 11nW and 5.5nW
Annotated ratios relative to the SRAM: 1/12 and 1/17,818
Inset: 4T SRAM schematic with labels BL, BLB, WL, IDC, ‘1’ and ‘0’
* 4T SRAM using ion-gel TFT devices with 10k pull-down resistors for an optimized SNM of 80mV @ 1.0V, 25ºC
** Operating frequency = 50Hz
*** Refresh power]
 
Fig. 4.9. Measured power consumption of an OTFT SRAM (static power only) and an 
OTFT DRAM for different WWL voltages and different operation scenarios.  
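The ratios annotated in Fig. 4.9 follow directly from the per-cell power values shown in the figure. The short Python check below reproduces them; the constant and function names are illustrative, and the only inputs are the 98µW, 8µW and 5.5nW values from the figure.

```python
# Per-cell power values read from Fig. 4.9, expressed in nanowatts.
SRAM_STATIC_NW = 98_000.0   # 98 uW, static power only
DRAM_ACTIVE_NW = 8_000.0    # 8 uW at a 50Hz operating frequency
DRAM_STANDBY_NW = 5.5       # 5.5 nW refresh power (lower of the two standby bars)

def reduction(reference_nw, value_nw):
    """How many times smaller value_nw is than reference_nw."""
    return reference_nw / value_nw

active_ratio = reduction(SRAM_STATIC_NW, DRAM_ACTIVE_NW)
standby_ratio = reduction(SRAM_STATIC_NW, DRAM_STANDBY_NW)
print(f"active:  1/{active_ratio:.2f}")
print(f"standby: 1/{standby_ratio:.0f}")
```

The active ratio of about 12 matches the 12X reduction quoted in the text, and the standby ratio rounds to the 1/17,818 annotation.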
[Chip photograph, edge label: 24 mm]
Process             Ion-gel organic TFT
Channel material    Poly(3-Hexylthiophene)
Array size          64 bits (8 WLs, 8 BLs)
Array dimension     24 x 25 mm²
Operating voltage   0.8V - 1.2V
Read delay          < 12 msec
Write delay         < 20 msec
Retention time*     30 sec @ 1.25V WWL
                    > 60 sec @ 1.30V WWL
Active power**      8 µW/cell @ 1.25V WWL
                    8 µW/cell @ 1.30V WWL
Standby power**     11 nW/cell @ 1.25V WWL
                    <5.5 nW/cell @ 1.30V WWL
* Measured @ 1.0V, 25 ºC
** Measured @ 1.0V, 25 ºC, refresh period of 30 sec (WWL=1.25V) and 60 sec (WWL=1.30V)
 
Fig. 4.10. Chip photograph of the 8x8 printed organic DRAM and performance summary. 
 
 
4.5 Summary 
An 8x8 DRAM array was successfully implemented on a gel-OTFT platform as the first 
general-purpose organic DRAM design reported to date. The design delivers practical 
performance under a 1V supply voltage and a retention time five orders of magnitude 
longer than that in silicon technology, thanks to the large gate capacitance induced 
by the electrolyte-gated structure. The retention time is sufficiently long even for low-speed 
OTFT circuits, which results in significantly lower power consumption than 
an SRAM cell. The gain cell DRAM design is therefore a desirable low-power 
solution for general-purpose memory in the organic device platform.  
In the next chapter, we explore the potential of gain cell DRAM on the silicon 
platform, where modern devices are typically nanometer-scale. Although SRAM and 
1T1C DRAM have dominated silicon embedded memory, targeting high 
speed with practical retention time, technology scaling is making these designs challenging, 
especially for low-power applications. Gain cell DRAMs, with innovative circuit 
enhancements, may be a promising solution to these design challenges. 
Chapter 5 
SILICON GAIN CELL MEMORY ANALYSIS 
5.1 Introduction  
6T SRAM has been the mainstream embedded memory in the silicon industry because of its 
high density and performance. However, increasing leakage and the contention 
between the read and write paths make it difficult to scale to future technology nodes. In recent 
years, embedded DRAMs (eDRAMs) have been drawing interest as a potential 
alternative.  1T1C type eDRAMs have been adopted for last level caches in several high 
performance server chips, delivering 5X+ higher bit-cell densities and a random cycle 
time of 2.0nsec [29]-[33].  Nevertheless, despite successful deployment of 1T1C eDRAM 
in recent server products, the complicated process steps involved in building the storage 
capacitor and the special access transistor, coupled with the limited signal swing at low 
supply voltages, make the scaling of this eDRAM technology unfavorable.  
Gain cells, which are typically 2X denser than 6T SRAM cells, are logic compatible 
and have good low voltage margin thanks to the decoupled read and write paths.  They 
also have a read port capable of driving long bitlines, making them competitive at low 
voltages.  Attaining practical retention times of 100’s of µsecs and improving the random 
access speed remain formidable challenges for future gain cell eDRAM designs. Table 
2 compares the circuit parameters of interest for the three types of embedded memory, 
where the two major categories of gain cell eDRAMs, namely 2T and 3T, are shown. 
Table 2. Comparison of various embedded memory options. 
 
In this chapter, we present, for the first time, a general simulation analysis of a 
conventional 3T gain cell eDRAM under process variation. Variation matters because the 
cells with the shortest retention times (tail cells) determine the array retention time. The 
actual chip design, with innovative circuit techniques to enhance low-power eDRAM 
performance, is described in the next chapter. 
5.2 Silicon Gain Cell EDRAM 
As with the organic cell in Chapter 4, the conventional gain cell eDRAM in Fig. 5.1(a) is 
implemented with three silicon transistors (i.e. WRITE, READ, and STORAGE devices).  
Note that although the silicon platform supports complementary logic, a PMOS cell 
has roughly an order of magnitude lower gate leakage than its NMOS counterpart and 
hence improves the data retention time by a factor of 3~4.  Fig. 5.1(b) shows the layout of 
a single cell in an industrial 1.2V, 65nm low power CMOS logic process.  The Write 
WordLine (WWL) of the PMOS gain cell is either driven to a negative voltage (typically 
around -0.5V) during write access to eliminate the threshold voltage drop, or held at a 
boosted voltage (typically VDD+0.5V) during hold mode to cut off the sub-threshold 
leakage.  Due to the positive and negative out-of-rail voltages, the operating voltage of an 
eDRAM cell must be carefully determined based on the gate dielectric reliability.  The 
W/L of each device is optimized for minimum cell area, longer retention time, and 
tolerance against process variation. In this analysis, we use a sizing of 150nm/60nm, 
265nm/80nm and 150nm/90nm, for the READ, the STORAGE, and the WRITE devices, 
respectively.     
 
Fig. 5.1. (a) Schematic and (b) layout of a 3T PMOS gain cell.  A PMOS cell has roughly 
an order of magnitude lower gate tunneling leakage than its NMOS counterpart and 
hence significantly improves the data retention time.  
Fig. 5.2 shows the simulated retention characteristics of this silicon gain cell 
including the impact of device mismatches. For a 3T PMOS cell, the steady state voltage 
is close to VDD as most of the leakage currents in a PMOS device are inherently in the 
pull up direction.  Note that data ‘0’ (D0) is again more critical than data ‘1’ (D1), as 
explained in Chapter 4. Two criteria determine the cell retention time: the cell node 
voltage for D0 must stay below a specified value (or, for an NMOS cell, D1 must stay 
high enough), and the difference between the D1 and D0 voltages must remain large enough. 
In many silicon gain cell designs the first criterion is more stringent, although 
this could change depending on the target operating frequency.  Fig. 5.2 shows a 
maximum retention time of around 300µsec based on the criteria VD0<0.6V and 
VD1-VD0>0.2V as annotated in the figure, targeting a 4ns random cycle time.  
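[Figure 5.2: Data 1 and Data 0 cell node voltage (ticks 0, 0.5, 1.0) versus Time (µsec, 0 to 2000) at 1.1V, 65nm, 85ºC. Annotated criteria: VD1-VD0 >0.2V and VD0<0.6V. Inset cell schematic labels: WWL, WBL, RBL, RWL, with leakage components IJUNC, IGOV, IGATE.]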
 
Fig. 5.2. Cell retention characteristics with process variations. 
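The way the two criteria bound the retention time can be sketched in a few lines of Python. This is a toy illustration, not the simulation behind Fig. 5.2: the exponential D0 pull-up, the time constant `TAU_US` and the constant D1 level `VD1` are all assumptions chosen so that the crossing lands near the 300µsec quoted above.

```python
import math

VDD = 1.1          # supply voltage in volts (matches the simulation condition)
TAU_US = 380.0     # ASSUMED time constant of the D0 node drifting toward VDD
VD1 = 1.0          # ASSUMED constant data '1' level in volts

def vd0(t_us):
    """Toy D0 node voltage: exponential pull-up toward VDD (assumption)."""
    return VDD * (1.0 - math.exp(-t_us / TAU_US))

def retention_time_us(t_max_us=2000, vd0_max=0.6, min_window=0.2):
    """Last sampled time (1 us steps) at which both criteria still hold."""
    last_ok = 0
    for t in range(t_max_us + 1):
        if vd0(t) < vd0_max and VD1 - vd0(t) > min_window:
            last_ok = t
        else:
            break
    return last_ok

print(retention_time_us())
```

In this sketch the VD0<0.6V limit is reached before the 0.2V window closes, which mirrors the statement that the first criterion is usually the more stringent one.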
Fig. 5.3 displays the read-writeback timing of a 256 WL x 128 BL 3T gain cell 
eDRAM array, with more dedicated read/write schemes for high performance compared 
to the organic DRAM cell presented in Chapter 4.  The bitline voltages are pre-
discharged and the RWL is asserted.  Depending on the cell data, the bitline voltage 
either rises (D0) or remains low (D1).  Unlike SRAMs, DRAMs do not have a 
complementary bitline and therefore require a reference bitline voltage for the sense 
amplifier. A dummy cell produces a reference bitline level that is in between the D1 and 
D0 bitline levels.  A small bitline voltage difference is amplified to a full swing signal by 
a sense amplifier. This is followed by a write-back operation to restore the cell data.  The 
full read-writeback cycle consists of the precharge delay, the bitline delay, the sense-
amplifier delay, and the write-back delay.  Among these delay components, the precharge 
delay, the sense-amplifier delay and the write-back delay are relatively immune to 
process variations and can be optimized separately.  Bitline delay, on the other hand, is 
much more sensitive to process variations, as is the case in most memory designs.  For 
this reason, this study focuses on the bitline delay rather than the full access cycle.  Note 
that the dummy bitline circuit can be made tolerant to variation through optimal 
sizing or post-silicon tuning, which is practical because such circuits are few in number.  Hence, we 
only consider variation in the accessed cell for our simulations. 
 
Fig. 5.3. Cell retention characteristics using Monte Carlo simulations.  The two criteria 
for determining the maximum cell retention time are (1) VD0<0.6V for meeting a target 
read speed and (2) VD1-VD0>0.2V to ensure sufficient read current difference. 
5.3 Bitline Delay Variation 
5.3.1 Corner Simulation Pitfalls  
Designers use corner parameters such as SS (slow NMOS, slow PMOS) to estimate 
worst case circuit performance. To examine whether standard process corners capture the 
worst case performance for gain cells, we simulated the bitline delay while varying the 
TOX and VTH in the same direction for all devices in a 3T gain cell.  The σ(VTH) values 
were calculated according to the gate areas of each individual device.  We used 3σ values 
of 0.11V, 0.07V and 0.09V, for the VTHs of the READ, STORAGE, and WRITE 
devices, respectively.  The 3σ value of TOX was 0.45nm for all three devices.  Since we are 
considering a PMOS cell, we are interested in the read access time for D0, which 
determines the PMOS drive current and is more vulnerable to process variations due to 
the leakage in the pull-up direction.   
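The text states only that σ(VTH) was calculated from the gate area of each device. One common way to do this is the Pelgrom-style scaling σ(VTH) ∝ 1/sqrt(W·L); the Python sketch below assumes that relationship (it is not named in the text) and anchors it to the READ device's 3σ value of 0.11V, using the device sizes from Section 5.2. The function name is hypothetical.

```python
import math

# Drawn W/L in nanometers, from the sizing given in Section 5.2.
SIZES_NM = {"READ": (150, 60), "STORAGE": (265, 80), "WRITE": (150, 90)}

def three_sigma_vth(sizes_nm, ref_device="READ", ref_3sigma_v=0.11):
    """Scale 3-sigma VTH as 1/sqrt(W*L), anchored to one reference device."""
    ref_w, ref_l = sizes_nm[ref_device]
    ref_root_area = math.sqrt(ref_w * ref_l)
    return {
        name: ref_3sigma_v * ref_root_area / math.sqrt(w * l)
        for name, (w, l) in sizes_nm.items()
    }

for name, value in three_sigma_vth(SIZES_NM).items():
    print(f"{name}: {value:.3f} V")
```

Under this assumption the STORAGE and WRITE values round to the 0.07V and 0.09V used in the text, which suggests an area-based scaling of this kind.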
 
Fig. 5.4. Bitline delay dependency on TOX and VTH after a hold period of (a) 0µsec (b) 
300µsec.   
Fig. 5.4(a) shows the bitline delay at a hold time of 0µsec, which is equivalent to the 
cell being read immediately after being written.  In this case, the SS and FF corner delays 
are close to the worst and best case delays.  However, after a 300µsec hold time, neither 
process corner captures the worst or best case, as shown in Fig. 5.4(b).  This is because 
a fast corner device has both larger leakage and higher read current. 
Even though the read device is stronger, the larger leakage current makes the 
cell voltage degrade faster, resulting in an overall increase in read delay.  Similarly, for 
the slow corner, the weak read device is “compensated” by the smaller loss in the cell 
voltage due to the lower leakage current which leads to a non-worst case delay.  Fig. 
5.4(b) also implies that after a long hold period, a thin TOX high VTH combination will 
result in the worst case, which is not a standard process corner included in device models.  
For this reason, it is necessary to examine the impact of TOX and VTH separately. 
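The compensation effect described above can be illustrated with a toy model. The Python sketch below is not the simulation setup of this chapter: it assumes an exponential TOX dependence of the leakage time constant, an alpha-power-law read current that depends only on the overdrive voltage, and made-up constants throughout. It only shows how a thin TOX, high VTH point can be slower than both standard corners after a long hold period.

```python
import math

VDD, VTH0 = 1.1, 0.40        # supply and ASSUMED nominal |VTH| in volts
SIGMA_VTH = 0.03             # ASSUMED 1-sigma VTH shift in volts
TAU0_US = 3000.0             # ASSUMED nominal leakage time constant
K_TOX = 0.5                  # ASSUMED leakage sensitivity per sigma of TOX
ALPHA = 1.3                  # ASSUMED alpha-power-law exponent

def vd0(hold_us, n_tox):
    """D0 node voltage after hold; thinner oxide (n_tox < 0) leaks faster."""
    tau = TAU0_US * math.exp(K_TOX * n_tox)
    return VDD * (1.0 - math.exp(-hold_us / tau))

def bitline_delay(hold_us, n_tox, n_vth):
    """Relative delay: inverse of the read-path overdrive to the power ALPHA."""
    overdrive = VDD - vd0(hold_us, n_tox) - (VTH0 + SIGMA_VTH * n_vth)
    return 1.0 / overdrive ** ALPHA

def worst_corner(hold_us, n=2):
    """Name of the slowest of the four +/- n sigma (TOX, VTH) combinations."""
    corners = {"SS": (n, n), "FF": (-n, -n),
               "thin TOX / high VTH": (-n, n), "thick TOX / low VTH": (n, -n)}
    return max(corners, key=lambda c: bitline_delay(hold_us, *corners[c]))

print(worst_corner(300))
```

In this sketch the SS and FF points fall between the two mixed combinations at a 300µsec hold time, which is the qualitative behavior reported for Fig. 5.4(b).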
5.3.2 Cell Node Voltage and Read Path Strength  
Bitline delay of a 3T PMOS cell is determined by two circuit parameters: (1) the D0 
cell node voltage at the time the cell is accessed, and (2) the read path drive strength.  The 
former is determined by the various leakage components surrounding the storage node, 
namely the gate tunneling leakage of the STORAGE and WRITE transistors, and the 
junction leakage of the WRITE transistor.  The latter factor (i.e. the read path drive 
strength) simply depends on the VTH of the STORAGE and READ devices that 
comprise the read path.  Note that the sub-threshold leakage of the WRITE device with a 
boosted WWL must be negligible compared to the other leakage components.  This is 
imperative for any type of DRAM cell that has to maximize the cell retention time.   
Fig. 5.5 shows the D0 cell node voltage, namely VD0, after a 100µsec hold period.  It 
indicates that the TOX of the STORAGE and WRITE devices has the strongest impact on 
VD0.  Other parameters such as the TOX of the READ device or the VTH of any device 
have little impact on VD0.  On the other hand, the bitline delay at a fixed VD0 voltage in 
Fig. 5.6 indicates that the VTH of the STORAGE and READ devices determines the read path 
strength.  Note that TOX variation in the STORAGE and READ devices does affect the 
read path drive current through the change in COX but this effect is negligible compared 
to the VTH effect as shown in Fig. 5.6. 
[Figure 5.5: vertical axis VD0 (D0 cell node voltage, mV)]
 
Fig. 5.5. VD0 after 100µsecs of hold mode for different (a) TOX and (b) VTH values.  
The VD0 voltage primarily depends on the TOX of the STORAGE and WRITE devices.  
 
Fig. 5.6. Bitline delay at VD0=0.15V for different (a) TOX and (b) VTH values.  Bitline 
delay at a fixed VD0 depends on the VTH of the STORAGE and READ devices. 
Table 3 summarizes the findings from Figs. 5.5 and 5.6.  TOX affects only the cell 
node voltage and not the read path strength while VTH affects only the read path 
strength.  This explains why standard corner simulations cannot capture the worst case 
cells.  Since both TOX and VTH are moving in the same direction for standard corners 
(e.g. FF corner has a thin TOX and low VTH, SS corner has a thick TOX and high VTH, 
and so on), the TOX and VTH effects compensate each other resulting in a non-worst 
case delay.  In reality however, a device with a thin TOX and high VTH combination 
gives the worst case delay due to the detrimental effect on both the cell node voltage and 
the read path strength.   
Table 3. Summary of the results from Figs. 5.5 and 5.6. 
                                  TOX Variation        VTH Variation
VD0 @ fixed hold time             WRITE, STORAGE       Negligible impact
Read path strength @ fixed VD0    Negligible impact    READ, STORAGE
 
5.4 Practical Issues with Monte Carlo Simulations  
This section discusses the simulation strategies for analyzing gain cell performance 
under process variation.  The following two strategies (see Fig. 5.7) can be considered 
when using the standard Monte Carlo features of circuit simulators.  More elaborate 
simulation strategies can be devised but are not discussed in this chapter, as they 
involve a considerable amount of programming effort.  
 
Fig. 5.7.  (a) One step and (b) two step Monte Carlo simulation methods for estimating 
gain cell eDRAM performance using standard circuit simulator features. 
- One step Monte Carlo:  This straightforward method simulates the entire hold mode 
and read access sequence for each Monte Carlo parameter set.  This can be considered as 
the golden method, since it most closely represents the real hardware.  The main drawback of this 
method is that, depending on the type of circuit simulator, the designer may not be able to 
view accurate signal waveforms.  Such anomalies can occur when a single simulation 
spans widely different time scales.  For example, in our studies, the hold mode 
simulation requires a time frame of 100’s of µsecs while the read access requires only a 
few nsecs.   
 
- Two step Monte Carlo: Here, the hold mode and read access Monte Carlo 
simulations are run separately.  That is, the cell node voltage distribution is obtained in 
the first Monte Carlo run, and the resultant distribution is fed into the second Monte 
Carlo in which the read access time is simulated using a fresh set of device parameters.  
Since the time scale and simulation time steps can be determined separately for the two 
Monte Carlo runs, the simulator doesn’t introduce any anomalies and provides accurate 
signal waveforms.  However, this method uses different parameter sets for the two Monte 
Carlo runs, which is not realistic.  Moreover, the cell voltage distribution from the first 
Monte Carlo may deviate from a pure Gaussian which could introduce further errors. 
Despite the potential inaccuracy in the two step Monte Carlo method, simulation 
results in Fig. 5.8 indicate that the bitline delay distributions estimated by the two 
methods are extremely similar with a mean and sigma difference of only 0.00136nsec and 
0.0137nsec, respectively (Fig. 5.8(b)).  This can be attributed to the fact that the impacts 
of TOX and VTH variation on bitline delay are independent, as discussed in Section 5.3.  
In other words, the TOX variation impacts only the first of the two Monte Carlo runs 
while VTH affects the second run, so the fact that the two Monte Carlo runs use different 
variation data does not affect the final results.  Approximating the cell voltage 
distribution to an ideal Gaussian did not introduce a noticeable error.  Therefore, it can be 
concluded that the two step Monte Carlo can prevent simulator anomalies at the cost of a 
minute error compared to the golden method.  Although we chose to use the golden 
method (i.e. one step Monte Carlo) for this work, one can consider using the two step 
Monte Carlo if the detailed waveforms are needed for further circuit inspection.   
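The difference between the two strategies can be made concrete with a toy model. The Python sketch below is not the circuit-simulator flow used in this work: it assumes, in line with Section 5.3, that TOX affects only the cell node voltage and VTH affects only the read path, and it uses made-up constants and hypothetical function names. The two step variant fits a Gaussian to the VD0 samples from the first run and draws a fresh parameter set for the second run.

```python
import math
import random
import statistics

VDD, VTH0, SIGMA_VTH = 1.1, 0.40, 0.03   # volts; VTH values are ASSUMED
TAU0_US, K_TOX, ALPHA = 3000.0, 0.5, 1.3  # ASSUMED toy-model constants

def hold_vd0(hold_us, n_tox):
    """Hold phase: D0 node voltage set only by TOX-dependent leakage."""
    tau = TAU0_US * math.exp(K_TOX * n_tox)
    return VDD * (1.0 - math.exp(-hold_us / tau))

def read_delay(vd0, n_vth):
    """Read phase: relative bitline delay set only by VTH and VD0."""
    overdrive = VDD - vd0 - (VTH0 + SIGMA_VTH * n_vth)
    overdrive = max(overdrive, 0.05)  # floor keeps the toy model finite
    return 1.0 / overdrive ** ALPHA

def one_step(hold_us, n, rng):
    """The same parameter set drives hold and read for each sample."""
    return [read_delay(hold_vd0(hold_us, rng.gauss(0, 1)), rng.gauss(0, 1))
            for _ in range(n)]

def two_step(hold_us, n, rng):
    """Run 1 yields a VD0 distribution; run 2 redraws parameters and VD0."""
    vd0_samples = [hold_vd0(hold_us, rng.gauss(0, 1)) for _ in range(n)]
    mu, sigma = statistics.mean(vd0_samples), statistics.stdev(vd0_samples)
    return [read_delay(rng.gauss(mu, sigma), rng.gauss(0, 1)) for _ in range(n)]

rng = random.Random(0)
d1 = one_step(300, 5000, rng)
d2 = two_step(300, 5000, rng)
print(f"one step: mean={statistics.mean(d1):.3f} stdev={statistics.stdev(d1):.3f}")
print(f"two step: mean={statistics.mean(d2):.3f} stdev={statistics.stdev(d2):.3f}")
```

Because the two parameters act on separate phases in this model, the mean delays of the two methods stay close; any remaining gap comes mainly from approximating a skewed VD0 distribution with a Gaussian, which is the caveat raised in the two step description.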
[Figure 5.8: histograms of Bitline Delay (nsec) for One step MC and Two step MC.
(a) 1.1V, 65nm, 85°C, hold time = 10µsec: Mean difference = 0.00074 nsec, Stdev difference = 0.0002 nsec
(b) 1.1V, 65nm, 85°C, hold time = 300µsec: Mean difference = 0.00136 nsec, Stdev difference = 0.0137 nsec]
 
Fig. 5.8. Bitline delay results of the one step and two step Monte Carlo methods at a hold 
mode time of (a) 10µsec and (b) 300µsec.  
5.5 Statistical Simulation Results 
Based on the gain cell variation behavior discussed in section 5.3, and using the 
golden simulation methodology described in section 5.4, this section investigates the 
statistical performance of gain cell eDRAM.  The test vehicle is a 256 WL x 128 BL 3T 
PMOS gain cell eDRAM array built in a 1.2V, 65nm LP CMOS process but operated at 
1.1V to incorporate the potential fluctuation in supply voltage.  
5.5.1 Comprehensive Analysis  
First we show results from a comprehensive Monte Carlo simulation including both 
global and local TOX and VTH variations for each device.  This method can effectively 
capture the worst case cells and give an accurate estimate of the amount of redundancy 
needed for a target yield.  The simulation setup incorporates both the systematic and 
random variation components.  We chose a reference bitline level that equalizes the 3σ 
bitline delays for D0 and D1 which in turn maximizes the 3σ point performance.  Care 
has been taken to ensure that the simulation setup and the bitline/peripheral delays are in 
good agreement with real eDRAM systems [35][38].  Even though the cell retention time 
according to Fig. 5.2 is around 300µsec, we also evaluate the gain cell performance at 
shorter hold periods (e.g. 10µsec) to fully understand the variation effect.  Fig. 5.9 plots 
the correlation between bitline delays at different hold 
periods.  The bitline delay at 0µsec reflects only the read path strength since the 
cell voltage has not degraded, so the x-axes in Figs. 5.9(a) and 5.9(b) are equivalent to the 
VTH variation of the read path.  Results show that after a 300µsec hold period, the TOX 
variation starts to influence the bitline performance.  The correlation coefficient between 
the bitline delay at 300µsec and that at 0µsec (i.e. Fig. 5.9(b)) is only 0.5073 which is 
substantially reduced from the value at 10µsec (i.e. 0.9993).   
[Figure 5.9: scatter plots; vertical axes are (a) Bitline Delay @ 10µsec (nsec) and (b) Bitline Delay @ 300µsec (nsec)]
  
Fig. 5.9. Scatter plot shows the correlation between bitline delays at different hold 
periods.  (a) 10µsec vs. 0 µsec and (b) 300µsec vs. 0µsec. 
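The loss of correlation with increasing hold time can be reproduced qualitatively with a toy model. The Python sketch below is an illustration under stated assumptions, not the Monte Carlo setup of this section: TOX is assumed to affect only the leakage time constant, VTH only the read overdrive, and every constant is made up. It is not expected to match the coefficients reported above.

```python
import math
import random

VDD, VTH0, SIGMA_VTH = 1.1, 0.40, 0.03    # volts; VTH values are ASSUMED
TAU0_US, K_TOX, ALPHA = 3000.0, 0.5, 1.3   # ASSUMED toy-model constants

def delay(hold_us, n_tox, n_vth):
    """Relative bitline delay after a hold period (toy model)."""
    tau = TAU0_US * math.exp(K_TOX * n_tox)
    vd0 = VDD * (1.0 - math.exp(-hold_us / tau))
    overdrive = max(VDD - vd0 - (VTH0 + SIGMA_VTH * n_vth), 0.05)
    return 1.0 / overdrive ** ALPHA

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
# Each cell is one (TOX, VTH) deviation pair in units of sigma.
cells = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
d0 = [delay(0, t, v) for t, v in cells]
corr = {h: pearson(d0, [delay(h, t, v) for t, v in cells]) for h in (10, 300)}
print(corr)
```

At a short hold time the delay is still governed by VTH alone, so the correlation with the 0µsec delay stays near one; at a long hold time the TOX-driven spread in VD0 adds an independent component and the correlation drops.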
Tracking the tail cells (i.e. cells with the longest bitline delays) at different hold 
periods can give insight into how different process parameters affect the overall chip 
performance.  The data points in the highlighted areas in Fig. 5.9 represent the tail cells.  
At a hold time of 0µsec, cells with a higher VTH become tail cells.  The tail cells do not 
change at 10µsec as the hold time is too short to cause any significant degradation in the 
cell node voltage.  However at 300µsec, cells with thinner TOX emerge as new tail cells 
as they suffer more from the gate tunneling leakage during the long hold period.  Fig. 
5.9(b) suggests that the tail cells switch from the ones with a higher VTH at short hold 
periods to the ones with thinner TOX at longer hold periods.  It is worth noting that 
although the TOX variation effect becomes more prominent at longer hold times, VTH 
remains equally influential on bitline delay.  This can be seen from the correlation 
coefficient value of 0.5073 at 300µsec.   
5.5.2 Contour Plot Analysis  
Another way to understand how the influence of TOX and VTH shifts with increasing 
hold periods is through a contour plot such as the one shown in Fig. 5.10.  Equal bitline 
delay contours (blue curves) and equal probability contours (black concentric circles) are 
shown in the TOX and VTH space.  The figure shows an example for a 300µsec hold 
time.  Since we are concerned with the worst case only, and since we would like to be able 
to plot the contours in a 2D space, it is assumed that the TOX (or VTH) of all three transistors 
moves in the same direction.   
 
Fig. 5.10. Equal performance contours for a hold period of 300µsec. The cumulative 
probability of the highlighted area is 0.20%.   
In case a more thorough evaluation is needed, for example for studying the random 
mismatch between the three devices in a gain cell, the same methodology can be 
extended to an N-dimensional space by independently sweeping the TOX and VTH for 
each individual device.  The blue contours in Fig. 5.10 have a positive slope indicating 
that the bitline delay is increasing as we move to the high VTH and thin TOX corner.  
Suppose we consider the concentric circle which passes through the (TOX, VTH) = (-2σ, +2σ) 
point.  It can be seen that the red dot has the worst delay of all the TOX and 
VTH combinations on that circle.  The arrow indicates the direction of maximum bitline delay 
increase.  The cumulative probability of the dark grey area, wherein the bitline delay is 
greater than that of the blue contour, is 0.20%. 
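A quick sanity check on the size of this probability is possible if the delay contour is approximated by a straight line tangent to the circle. The Python sketch below makes that simplification explicitly and also assumes that the TOX and VTH deviations are independent and normally distributed; the real contour in Fig. 5.10 is curved, so the result is only an order-of-magnitude comparison with the 0.20% quoted above.

```python
import math

def tail_probability(radius_sigma):
    """P(point lies beyond a straight line at distance radius_sigma from the
    origin) for two independent standard-normal variables."""
    return 0.5 * math.erfc(radius_sigma / math.sqrt(2.0))

# The circle through (TOX, VTH) = (-2 sigma, +2 sigma) has radius 2*sqrt(2).
radius = math.hypot(-2.0, 2.0)
p = tail_probability(radius)
print(f"radius = {radius:.3f} sigma, tail probability = {100 * p:.2f}%")
```

The straight-line estimate comes out slightly above 0.2%, the same order as the probability obtained from the actual contour.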
Using the contour plots in Fig. 5.11, we can evaluate how the sensitivity of bitline 
delay with respect to TOX and VTH changes at different hold times.  For example, for a 
0µsec hold time, the bitline delays are relatively insensitive to the TOX as shown in the 
vertical contours in Fig. 5.11(a).  As the hold time increases, the blue contours gradually 
exhibit a positive slope.  This becomes most apparent in the lower right corners of Figs. 
5.11 (c) and (d).  For hold times longer than 300µsec in which the TOX variation effect is 
even more severe, it can be anticipated that the red dot will traverse towards the bottom 
of the outer concentric circle.  The contour plot method can be useful in guiding the 
device optimization for improving performance and predicting the amount of redundancy 
needed to meet a target yield. 
[Figure 5.11: four contour panels, (a) 0µsec, (b) 10µsec, (c) 100µsec, (d) 300µsec. Horizontal axis DVTH and vertical axis both run from -3σ to 3σ.]
 
Fig. 5.11. Bitline delay contour plot in the TOX and VTH space for hold times of (a) 
0µsec (b)10µsec (c)100µsec and (d) 300µsec. 
5.6 Summary 
Gain cell eDRAMs can be implemented in a generic logic process achieving roughly 
2X higher bit cell densities compared to 6T SRAMs.  Furthermore, gain cells have a 
wider read/write margin than 6T SRAMs since there is no contention between the read 
and write paths.  Despite the recent advances in eDRAM circuit technology, there has not 
been any in-depth study on gain cell retention time or performance considering process 
variation.  In this chapter, we carry out a simple variation study that can offer a better 
understanding of gain cell eDRAM operation under process variation.  The investigation 
begins on the premise that traditional corner parameters cannot capture the worst case 
delay and makes the case for a device level statistical simulation framework.  Next, we 
show that TOX variation affects only the cell node voltage while VTH affects only the 
read path drive strength.  As a result, the performance at short hold times is impacted 
more by the VTH, while at longer hold times, both TOX and VTH have significant 
impact on performance.  Two Monte Carlo simulation strategies are compared for 
obtaining the eDRAM delay distribution.  Finally, we utilize the correlation between 
bitline delays at different hold times as well as the delay contour plots to explain how the 
bitline delay’s sensitivity to TOX and VTH varies with respect to hold time.  These two 
methods for analyzing eDRAM performance can help optimize the sizing, build circuits 
and redundancy techniques for variation tolerance, and guide the device optimization 
process.    
In the next chapter, we move on to actual chip designs of gain cell eDRAM, 
targeting a practical retention time with enhanced performance and robustness. 
Chapter 6 
SILICON GAIN CELL MEMORY DESIGN 
6.1 Introduction 
The conventional 3T design discussed in Chapter 5 does not meet the requirements 
of the modern silicon industry. Its array retention time falls below 100 µsecs for a 2 
ns random cycle time, which is not sufficient to compete with the mainstream 6T SRAM 
in performance or power. Innovations in bit-cells as well as peripheral circuits are 
required to make gain cells viable for industry use.  
Several recent gain cell eDRAM designs based on 2T or 3T cells have demonstrated 
practical retention times beyond 100 µsecs [36]-[38], SRAM-like performance [28][38], 
and true logic-compatibility by eliminating boosted voltages [38]. Despite the rapid 
progress, one interesting feature of gain cells that has been largely overlooked in the past 
is the potential for write-back-free operation by taking advantage of the non-destructive 
read.  
In this chapter, we experimentally demonstrate, for the first time, a gain cell eDRAM 
without write-back operation. By removing the write-back from a read operation, the read 
access speed can be significantly improved. We also apply a local voltage Sense 
Amplifier (S/A) scheme to overcome the design complexities and variability issues 
prevalent in the existing current-sensing schemes used for 2T gain cells. Finally, a low-
overhead low-power mode based on a dual-row-access scheme extends cell retention 
time by 3X to save refresh power during periods when only a fraction of the cache 
memory is being used. 
6.2 Write-Back-Free Read Operation  
[Figure 6.1: vertical axis Random Cycle Time (Norm.)]
 
Fig. 6.1. Random cycle time comparison between SRAM and eDRAM. 
A write-back-free read operation offers a system-level speedup opportunity that has 
been neglected in previous gain cell designs [28][36]-[38]. In conventional 1T1C 
eDRAMs, the read operation relies on charge sharing between the storage capacitance 
and the bitline capacitance. Due to the destructive read nature, a write-back is needed to 
reinforce the cell data after the charge sharing operation. Gain cells on the other hand, 
have a non-destructive read, eliminating the need for a write-back. Fig. 6.1 compares the 
cycle time between a 6T SRAM and various 2T eDRAMs implemented in the same 65nm 
low-power process. It shows that, by eliminating the write-back delay that accounts for 
approximately 30% of the read cycle, a significant improvement in both operating 
frequency and active power can be achieved. Although these benefits are only applicable 
to the read cycle, it is sufficient to improve the overall system performance significantly, 
because in general there are more reads than writes in a processor cache, and a read may 
stall the system and therefore degrade the system level performance, while a write will 
not. 
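The read-cycle benefit of removing the write-back can be estimated with simple arithmetic. The Python sketch below uses only the approximate 30% figure quoted above and assumes that the remaining phases of the read cycle are unchanged; the function name is illustrative.

```python
WRITE_BACK_FRACTION = 0.30   # approximate share of the read cycle, from the text

def read_speedup(write_back_fraction):
    """Cycle-time ratio (old / new) when the write-back phase is removed."""
    return 1.0 / (1.0 - write_back_fraction)

speedup = read_speedup(WRITE_BACK_FRACTION)
print(f"read cycle shrinks to {100 * (1 - WRITE_BACK_FRACTION):.0f}% of its length")
print(f"maximum read frequency gain: {speedup:.2f}x")
```

Under that assumption the read cycle could run roughly 1.4 times faster, which is an upper bound since other phases may not scale identically.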
A 2T1D cell, shown in Fig. 6.2, was used in this work to demonstrate the write-back-
free operation. The gain cell design is different from the previous 2T1C cell [38] in that 
the P-type coupling device shared between adjacent cells is replaced by a separate N-type 
diode in each cell. This provides a similar beneficial coupling-up effect [36]-[37] during 
read while minimizing any coupling noise from the adjacent cells. The PCOU signal 
preferentially couples up the cell node voltage for increased read current and beneficially 
couples down the cell voltage after sensing to restore the full voltage levels without 
requiring a boosted negative voltage [38]. Further details on the circuit operation of the 
2T1C cell can be found in [38]. Another major difference from [38] is that PCOU 
switching occurs within a single read cycle to allow consecutive reads without a write-
back phase. Even though the write cycle contains a read operation, which is required due 
to the row-wise write nature of eDRAMs to avoid overwriting the data of unaccessed 
cells on the same row, we kept the write cycle timing intact so that the beneficial write 
feature described in [38] can be preserved.  
 
Fig. 6.2. 2T1D gain cell with preferential boosting [38]. 
Simulation results in Fig. 6.3 examine the impact of write-back-free reads for access 
rates from 0.01% to 10%. The voltage window between data ‘1’ and data ‘0’ remains 
unchanged for access rates up to 1%. For access rates greater than 1%, the cell voltage 
window slightly improves. Although the detailed reasons are still under investigation, this 
may be due to the minute difference between the couple up and couple down voltages 
that gets accumulated over a long retention period (e.g. 200 µsec), which compensates for 
the pull-up leakages in data ‘0’. Another potential reason is that the PCOU coupling 
temporarily raises the data ‘0’ voltage during a read cycle, which reduces the entire 
leakage profile and extends the retention time, especially with frequent reads. Data 
‘1’ behaves similarly; however, the leakage profile change is negligible compared 
to that in data ‘0’ because the leakage in the data ‘1’ case is significantly smaller.  
Fig. 6.4 shows that for practical access rates, getting rid of the write-back does not 
have an adverse effect on the data ‘1’ and data ‘0’ levels. Our test chip design focuses on 
experimentally verifying the impact of write-back-free reads on overall eDRAM 
performance for practical access rates. 
[Figure 6.3: four panels of Cell Voltage (V) versus Time After Write (µsec, 0 to 300), each showing Data ‘1’ and Data ‘0’; 65nm, 1.1V, 85ºC, simulation.
Read Access Rate = 0.01%: ∆V(0%) = 0.577V, ∆V(0.01%) = 0.577V
Read Access Rate = 0.1%: ∆V(0%) = 0.577V, ∆V(0.1%) = 0.578V
Read Access Rate = 1%: ∆V(0%) = 0.577V, ∆V(1%) = 0.588V
Read Access Rate = 10%: ∆V(0%) = 0.577V, ∆V(10%) = 0.693V]
 
Fig. 6.3. Cell retention characteristics without write-back for different read access rates. 
Coupling strength increases with cell voltage. 
[Figure 6.4: vertical axis Cell Voltage w/o Write-Back (V)]
 
Fig. 6.4. Cell voltage w/o write-back vs. read access rate. 
6.3 Voltage Sensing w/ Local Sense-Amplifier Architecture  
6.3.1 Read Disturbance Issue 
For the 2T-based gain cell, despite its faster speed and more compact cell size compared 
to its 3T counterpart, read disturbance has been a common issue [28], as shown in Fig. 6.5 
(above). Current from the unselected cells storing data ‘1’ limits the RBL swing to 
around 0.2V, which is not sufficient for reliable voltage sensing. Common practice has 
been to implement a current-sensing scheme, which keeps the RBL level close to VDD 
during read [28][38]. However, a current-sensing scheme suffers from variation in 
the dummy cell due to its single-ended sensing, and it cannot utilize dummy averaging 
techniques because of the small input impedance requirement. This introduces 
significant design overhead and scalability challenges in future technology nodes.  
 
Fig. 6.5. Read disturbance mitigation of 2T1D cell using (i) beneficial PCOU coupling, 
(ii) regulated WBL, and (iii) short local read bitlines. 
To get around this problem, we apply three circuit techniques that allow a more 
robust voltage-sensing scheme to be used. First, the beneficial PCOU coupling in the 
accessed 2T1D cell provides stronger pull-down read current. Second, a regulated WBL 
scheme [36][38] lowers the fresh data ‘1’ voltage by 0.2V (1.1V to 0.9V), significantly 
reducing the disturbance current. This has proven to have little impact on data retention 
since data ‘1’ quickly stabilizes to around 0.85V due to the cell leakage profile [28]. 
Finally, a Local-Sense-Amplifier (LSA) scheme with short read bitlines [32] limits the 
maximum number of unselected cells to 63 which in turn reduces the worst case read 
disturbance current and provides a sufficient signal margin for reliable voltage sensing.  
 
Fig. 6.6. Schematic and timing of local and global sense amplifier architecture. 
Fig. 6.6 shows the detailed architecture with local and global voltage S/A’s. The 
reduced load of the local RBLs ensures a fast voltage development while a simple global 
voltage S/A further speeds up the signal propagation. Dummy cell averaging, which was 
not possible in previous current-sensing schemes, can now be implemented to enhance 
the robustness under PVT variations. The area overhead compared to a conventional 
current-sensing scheme is only 1.6% due to the simpler voltage S/A circuit that partially 
offsets the LSA area overhead.    
6.3.2 Effectiveness Evaluation 
For the three circuit techniques proposed to allow voltage sensing in 2T-based gain 
cells, namely the beneficial PCOU coupling, the regulated WBL and the local sense 
amplifier (LSA) architecture, it is meaningful to evaluate the effectiveness of each 
technique in mitigating the read disturbance issue. 
Tables 4 and 5 summarize the simulated bitline voltage window improvement in the 
worst read disturbance scenario by incorporating individual schemes. Compared to the 
conventional scheme [28], the beneficial PCOU coupling contributes around 20mV 
increase while the regulated WBL provides the most significant improvement of around 
180mV. The LSA architecture is examined with various numbers of cells per bitline. 
Around 30mV improvement is observed for each 2X reduction in cell numbers.  
 
 
 
 
Table 4. Simulated voltage window improvement for PCOU coupling and regulated 
WBL schemes, respectively. 
 
Table 5. Simulated voltage window improvement for LSA architecture with different 
number of cells per read bitline. 
 
The logarithmic relationship between the voltage window and the number of cells per 
bitline can be derived by assuming that the read current is constant, because the accessed 
data ‘1’ cell is typically in its saturation region. At steady state, the total subthreshold 
leakage from the adjacent unaccessed cells equals the read current, as depicted in Fig. 6.5, 
which therefore determines the available bitline voltage window. Assuming there are N cells 
per (local) read bitline, in the worst case we have: 

$$I_{READ} = (N-1)\,I_{LEAK} \tag{1}$$

where $I_{READ}$ is the read current of the accessed cell and $I_{LEAK}$ is the subthreshold 
leakage of one unaccessed cell. Due to the exponential relationship between subthreshold 
leakage and gate voltage, the equation can be expressed as: 

$$I_{READ} = (N-1)\,C\exp\!\left(\frac{V_{GS}}{D}\right) = (N-1)\,C\exp\!\left(\frac{0.9 - V_{BL}}{D}\right) \tag{2}$$

where C and D are constants given by the device characteristics, $V_{GS}$ is the 
gate-to-source voltage of an unaccessed cell, $V_{BL}$ is the read bitline voltage, and 0.9V 
is the fresh data ‘1’ voltage in the unaccessed cells in the worst case. Based on our voltage 
setup at a 1.1V supply, Equation (2) can be rewritten as: 

$$I_{READ} = (N-1)\,C\exp\!\left(\frac{0.9 - (1.1 - V_{WINDOW})}{D}\right) = (N-1)\,C\exp\!\left(\frac{V_{WINDOW} - 0.2}{D}\right) \tag{3}$$

where $V_{WINDOW}$ is the available bitline voltage window. From (3) we get: 

$$V_{WINDOW} = -D\ln(N-1) + D\ln\!\left(\frac{I_{READ}}{C}\right) + 0.2 = -D\ln(N-1) + \text{const.} \approx -D\ln N + \text{const.} \tag{4}$$

Equation (4) clearly reveals the logarithmic relationship between the voltage window 
($V_{WINDOW}$) and the cell number per bitline (N). A more detailed derivation based on the 
given device subthreshold swing further provides the estimated slope value. For a 
subthreshold swing of 100mV/dec, 

$$I_{LEAK} = C \times 10^{V_{GS}/100\,\text{mV}} = C\exp\!\left(\frac{\ln 10 \times V_{GS}}{100\,\text{mV}}\right) = C\exp\!\left(\frac{V_{GS}}{43.4\,\text{mV}}\right) \tag{5}$$

Therefore, 

$$D = \frac{\text{subthreshold\_swing}}{\ln 10} = 43.4\,\text{mV} \tag{6}$$

For a 2X reduction in the cell number per bitline, 

$$\Delta V_{WINDOW} = -D\ln(N) - \left(-D\ln(2N)\right) = D\ln 2 = 30.1\,\text{mV} \tag{7}$$

The result in (7) matches the simulated value (around 30mV, as shown in Table 5) 
and gives theoretical insight into the observed trend.  
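
The arithmetic behind Equations (6) and (7) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the 100mV/dec subthreshold swing is the value assumed in the text, and the last function evaluates the window gain without the N − 1 ≈ N approximation used in Equation (4).

```python
import math


def slope_constant_mv(swing_mv_per_dec):
    """Constant D of Equation (6): subthreshold swing divided by ln(10)."""
    return swing_mv_per_dec / math.log(10.0)


def window_gain_per_halving_mv(swing_mv_per_dec):
    """Window gain of Equation (7) for a 2X cell reduction: D * ln(2)."""
    return slope_constant_mv(swing_mv_per_dec) * math.log(2.0)


def window_gain_exact_mv(n_before, n_after, swing_mv_per_dec):
    """Window gain when keeping the (N - 1) term instead of approximating by N."""
    d = slope_constant_mv(swing_mv_per_dec)
    return d * math.log((n_before - 1) / (n_after - 1))


D = slope_constant_mv(100.0)
print(f"D = {D:.1f} mV")
print(f"gain per 2X reduction = {window_gain_per_halving_mv(100.0):.1f} mV")
print(f"gain for 128 -> 64 cells, exact = {window_gain_exact_mv(128, 64, 100.0):.1f} mV")
```

Keeping the (N − 1) term raises the estimated gain only slightly for bitlines of this length, so the approximation in Equation (4) has little effect on the predicted slope.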
6.3.3 Design Conclusion 
According to the previous section, the regulated WBL is the most effective technique in 
mitigating the read disturbance issue, while the PCOU coupling is the least effective. 
Although a reduced cell number per bitline is beneficial, it has to be carefully selected 
due to the trade-off between performance and area overhead. 
[Figure 6.7 plots (65nm LP, 1.1V, 85C). (a) (L)RBL Window (mV), axis 400 to 600, and Area Overhead, axis 0% to 8%, versus Cells per (L)RBL (32, 64, 128, 256). (b) Sensing Delay (ps), axis 0 to 1600, and Area Overhead, axis 0% to 8%, versus Cells per (L)RBL (32, 64, 128, 256).]
  
Fig. 6.7. (a) Bitline voltage window and (b) sensing delay versus area overhead. 
Fig. 6.7 shows the performance and area overhead for different numbers of cells per 
(local) bitline. Using 32 cells per bitline introduces a 7% area overhead and is not 
preferable; 64 cells per bitline is the optimized choice, offering a sufficient voltage 
window and sensing speed at a cost of 3.5% area overhead. Note that this overhead is 
reduced to 1.6% once the simpler global voltage S/A is taken into account, relative to a 
current-sensing solution. 
6.4 Dual-Row-Access Low Power Mode  
For applications that do not utilize the entire cache memory space, shutting down 
parts of the array is a practical way to save power. For gain cell eDRAMs, however, 
activating multiple rows at the same time can yield greater power savings because the 
refresh power is determined by the worst case retention time of the tail cells, which can 
be repaired by enabling additional strong cells at the same time due to spatial 
randomness.  
6.4.1 Multi-Row-Access Analysis 
  
Fig. 6.8. Power comparison for different row numbers (a) without and (b) with additional 
timing adjustment. 
Fig. 6.8(a) compares the power consumption of the simple power-down and 
multi-row-access approaches for different numbers of accessed rows, with no adjustment 
in timing or reference voltage. Because the LSAs are not shared, a multi-row access 
requires no timing or reference adjustment for local sensing, and it multiplies the sensing 
currents to speed up voltage development on the global RBLs, where the voltage swing limit 
is VDD. Calculating from the sensing currents alone, regardless of the swing limit, the 
multi-row-access approach is estimated to be beneficial with up to 4 rows. However, this 
number is limited to 2 with a 1.1V swing, because the data ‘0’ in the 4- or 8-row case 
needs to be strong enough to prevent the quickly discharging local read bitline from 
reaching 0V by the time of global sensing.  
The swing issue can be resolved by scaling GRBL sensing time accordingly. 
Nevertheless, in this case the benefit of larger sensing currents is not available, and 
additional timing adjustment circuitry is required as design overhead. Fig. 6.8(b) 
indicates that 2 is the beneficial row number while the power saving is smaller than that 
in Fig. 6.8(a). 
Although the actual power savings would vary in real chips, Fig. 6.8 provides a good 
estimate of the performance trend. Activating more than two rows at a time (e.g. 4-row-
access, 8-row-access) has diminishing returns while incurring significant design overhead 
in the form of dedicated timing and reference circuitry. Therefore, we propose to use a 
simple dual-row-access mode. 
6.4.2 Dual-Row-Access Mode 
Fig. 6.9 shows our dual-row-access mode, where two wordlines with respective LSAs 
are enabled at the same time without any changes in the read reference circuit. The weak 
cell is thus repaired by a stronger one according to spatial randomness, and in addition, 
mismatch between the LSAs themselves gets averaged out. Moreover, the effective 
sensing current of the global read bitline (GRBL) is doubled, which improves the 
retention time even further under the same sensing window requirement and timing 
constraints. Therefore, the worst retention time can be improved by more than 2X using a 
dual-row-access scheme, while a simple powering-down approach may still suffer from 
the tail cell’s retention time.  
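
The repair effect of spatial randomness can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration rather than a model taken from this work: retention times are drawn from a log-normal distribution in arbitrary units, the array is taken as 128 rows by 8 columns, row r is paired with row r + 64 (mirroring WL0 and WL64 in Fig. 6.9), and a pair is assumed to retain data as long as its stronger cell does. The doubled GRBL sensing current and the averaging of LSA mismatch are not modeled.

```python
import random


def worst_case_retention(rows=128, cols=8, sigma=0.5, seed=0):
    """Return (worst single-row, worst dual-row) retention in arbitrary units.

    Hypothetical model: independent log-normal retention times per cell; in
    dual-row mode each pair is limited only by its stronger cell.
    """
    rng = random.Random(seed)
    cells = [[rng.lognormvariate(0.0, sigma) for _ in range(cols)]
             for _ in range(rows)]

    # Single-row access: the weakest cell in the array sets the refresh period.
    worst_single = min(min(row) for row in cells)

    # Dual-row access: row r and row r + rows/2 store the same bit.
    half = rows // 2
    worst_dual = min(max(cells[r][c], cells[r + half][c])
                     for r in range(half) for c in range(cols))
    return worst_single, worst_dual


single, dual = worst_case_retention()
print(f"worst single-row retention: {single:.3f}")
print(f"worst dual-row retention:   {dual:.3f}")
print(f"improvement: {dual / single:.2f}x")
```

Under this model the dual-row worst case can never be shorter than the single-row worst case, because the weakest cell limits the array only if its partner is weaker still, which is unlikely when the variation is spatially random.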
[Figure 6.9 schematic. Signal labels: WL0, PCOU0, RWL0; WL63, PCOU63, RWL63; WL64, PCOU64, RWL64; WL127, PCOU127, RWL127.]
 
Fig. 6.9. Dual-row access mode illustration (WL0 and WL64 selected). 
6.5 Measurement Results 
A 64kb eDRAM test chip was implemented in a 1.2V, 65nm logic CMOS process to 
demonstrate the proposed circuit techniques. The chip achieves a 99.9% retention time of 
325µsec and a refresh power of 234.1µW/Mb at 1.1V, 85ºC. Figs. 6.10 and 6.11 show the 
failure percentile and retention map from a 1kb sub-array, respectively, for a worst case 
read disturbance pattern and a 1.0 ns read cycle time. No noticeable changes in the 
retention time were observed across different access rates. Although our measurement 
setup supports only up to a 1% access rate, this is sufficient for real-world applications as 
indicated in Fig. 6.4. For a 99.99% bit yield, the extrapolated retention time was 90µsec. 
Due to the dummy averaging feature of our proposed design, the measured retention map 
in Fig. 6.11 shows no significant bitline dependency in the failure pattern, in contrast 
to the results presented in our prior work [38]. 
 
Fig. 6.10. Failure percentiles for a 1kb sub-array (left) and detailed view of tail cells 
(right). 
 
 
 
 
 
Fig. 6.11. Measured retention maps for single (left) and dual (right) row access modes. 
No significant bitline dependency is observed in either case. 
Fig. 6.11 also shows that the worst case retention time improves from 325µsec to 
900µsec using the dual-row-access mode. Fig. 6.12 compares the power dissipation of 
SRAM and various eDRAM configurations. The proposed 2T1D design (single row 
access) achieves a 23.9% power saving compared to that of a power gated 6T SRAM 
with a 0.6V retention voltage. Compared to a simple power down mode where half the 
eDRAM is unused, a 27.8% power reduction was achieved using the dual-row-access 
mode. 
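
A first-order estimate helps relate the retention times above to the power saving of the dual-row-access mode. The sketch below assumes, purely for illustration, that refresh power is proportional to the fraction of rows being refreshed and inversely proportional to the refresh period, with the period set by the worst case retention time; peripheral and leakage power are ignored. The function name and the normalization are hypothetical.

```python
def relative_refresh_power(active_fraction, retention_us, reference_us=325.0):
    """Refresh power normalized to a full array refreshed every reference_us.

    Assumes power scales with the fraction of rows refreshed and with the
    refresh rate (1 / retention time). This is a simplification.
    """
    return active_fraction * reference_us / retention_us


# Simple power-down: half the rows are refreshed at the single-row retention time.
half_array = relative_refresh_power(0.5, 325.0)

# Dual-row access: all rows are refreshed, but only every 900 usec.
dual_row = relative_refresh_power(1.0, 900.0)

saving = 1.0 - dual_row / half_array
print(f"half array: {half_array:.3f}, dual row: {dual_row:.3f}, saving: {saving:.1%}")
```

With the measured 325µsec and 900µsec retention times this estimate gives a saving of about 27.8%, in line with the reported figure. The agreement is only a plausibility check and does not imply that the reported number was obtained this way.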
[Figure 6.12 bar chart. Y-axis: Power Consumption (µW/1Mb). Bars: 6T SRAM (0.6V Retention Voltage), 2T1D eDRAM, 2T1D Half Array, 2T1D 2-row.]
 
Fig. 6.12. Power consumption comparison between a power gated SRAM with a 0.6V 
retention voltage and various 2T1D eDRAM power down modes. 
[Figure 6.13 plots. Axis labels: Random Cycle Time (nsec), Retention Time (sec), Current Consumption (µA/Mb).]
 
Fig. 6.13. (a) Measured VDD shmoo and (b) static power comparison. 
The measured VDD shmoo for retention time and cycle time is plotted in Fig. 6.13(a), 
and a static power comparison between 6T SRAM and the proposed eDRAM design for 
different supply voltages is shown in Fig. 6.13(b). The longer retention time at higher 
supply voltages makes the eDRAM refresh power lower than the static power of an 
SRAM. Note that the optimal supply voltage of an eDRAM is usually higher than that of 
an SRAM due to the refresh power dominating the overall static power consumption. Fig. 
6.14 shows the die microphotograph and summarizes the key features. 
 
Fig. 6.14. Microphotograph and summary of test chip characteristics. 
6.6 Summary 
We have presented several circuit techniques to enhance the performance and 
robustness of gain cell eDRAM. The proposed design is the first to experimentally 
verify write-back-free read operation in gain cells, which is enabled by their 
non-destructive read. No noticeable retention time difference was observed across a wide 
range of access rates. We also proposed various circuit techniques for mitigating read 
disturbance issues, including a local-sense-amplifier scheme. In addition, a 
dual-row-access low power mode was introduced to further reduce static power in scenarios 
where less than half the cache is being utilized. Test chip measurements were presented 
from a 64kb eDRAM array implemented in a 1.2V, 65nm CMOS process. The resulting power 
consumption is 23.9% lower than that of an SRAM design, which indicates the 
potential of gain cell eDRAM as a low power solution for future silicon memory. 
Chapter 7 
CONCLUSION 
Low power memory is indispensable in many large-scale applications, on both organic 
and silicon platforms. Few attempts have been made at memory design on the organic 
platform, due to immature technology, including the lack of an n-type device. On the 
silicon platform, although memory is well supported and is equipped with various 
advanced circuit techniques for higher performance and robustness, technology scaling is 
making memory design challenging, especially for low power purposes. In this 
dissertation, we find that gain cell DRAMs serve as a promising low power memory 
candidate on both organic and silicon platforms, with desirable properties such as logic 
compatibility, decoupled read and write, and non-destructive read. 
 For the organic platform, we utilize printable and flexible ion-gel gated organic thin-
film-transistors (gel-OTFTs) with Poly(3-hexylthiophene) (P3HT) channel material. Gel-
OTFTs provide unusually high gate capacitance, which is desired for DRAM data storage 
and meanwhile allows low voltage operation. We presented the first organic process 
design kit (OPDK), which introduces various computer-aided design (CAD) techniques 
including a complete set of design examination tools and various modeling strategies, to 
significantly facilitate the design and fabrication process. This eliminates major design 
and fabrication challenges such as high error rates in circuit and layout design and long 
fabrication times. We then demonstrated several successful circuit 
implementations, including an inverter, a NAND gate, a ring oscillator and a D flip-flop, 
which verify the potential of gel-OTFTs for large-scale circuit systems.  
With the assistance of the OPDK, we successfully demonstrated the first organic DRAM 
array, delivering very low power and practical performance. Gain cell DRAMs 
can be implemented with p-type devices without any penalty and at the same time 
benefit from the high gate capacitance, which makes the gain cell a desirable memory cell 
structure for gel-OTFTs. An 8x8 organic DRAM array was implemented using aerosol jet 
printing. It achieved a sub-10nW-per-cell refresh power through a practically long 
retention time (over 30 seconds), which is 5 orders of magnitude longer than that in 
silicon designs and is sufficient even for low speed organic circuits to perform a 
periodic refresh operation. 
We then extended our investigation of gain cell DRAMs to the silicon platform. Our 
variation-aware simulation analysis revealed that corner simulations, commonly 
adopted for worst-case evaluations, are no longer effective in capturing the tail cells in 
silicon gain cell DRAMs. This is due to the relative independence between the two major 
variation sources, namely the dioxide thickness and threshold voltage. We proposed ways 
to understand this behavior, which also provide insights for future device optimization as 
well as simulation strategy improvement. 
Finally, we implemented a 64kb gain cell embedded DRAM array in a 1.2V 65nm 
low power technology. For the first time, we experimentally demonstrated a 
write-back-free read operation in gain cells, which significantly improves the read speed. 
We also proposed three circuit techniques, including a local sense amplifier (LSA) 
architecture, to allow voltage sensing in 2T-based gain cell designs. This avoids the 
challenging current-sensing design and enhances robustness through dummy averaging. A 
retention time of 325µs was achieved at a 1ns read cycle time. This results in a 23.9% 
power saving compared to a 6T SRAM with a 0.6V retention voltage. A dual-row-access mode 
was proposed to achieve a further 27.8% power reduction in scenarios where only half of 
the array is needed. 
  
REFERENCE 
[1] D. Gamota, “Near-Term Opportunities for Large-Area Flexible Electronics”, Printed 
Circuit Design & Fab, April 2009.  
[2] J. Daniel, T. Ng, S. Garner, et al., “Pressure Sensors for Printed Blast Dosimeters”, 
IEEE Sensors, 2010. 
[3] W. Zhang, M. Ha, D. Braga, et al., "A 1V Printed Organic DRAM Cell Based on Ion-
Gel Gated Transistors with a Sub-10nW-per-Cell Refresh Power", International 
Solid-State Circuits Conference, pp. 326-328, February 2011. 
[4] J. H. Cho, J. Lee, Y. Xia, et al., “Printable Ion-Gel Gate Dielectrics for Low-Voltage 
Polymer Thin-Film Transistors on Plastic”, Nature Materials, Volume 7, Issue 11, 
pages 900 – 906, October 2008. 
[5] E. Cantatore, T. Geuns, G. Gelinck, et al., “A 13.56-MHz RFID System Based on 
Organic Transponders”, IEEE Journal of Solid-State Circuits, Volume 42 , Issue 1, 
Pages 84 – 92, January 2007. 
[6] E. Huitema, G. Gelinck, B. van der Putten, et al., “Plastic Transistors in Active-
Matrix Displays”, International Solid-State Circuit Conference, February 2003. 
[7] K. Myny, E. van Veenendaal, G. Gelinck, et al., “An 8b Organic Microprocessor on 
Plastic Foil”, International Solid-State Circuit Conference, pp. 322-324, February 
2011. 
[8] T. Zaki, F. Ante, U. Zschieschang, et al., “A 3.3V 6b 100kS/s Current-Steering D/A 
Converter Using Organic Thin-Film Transistors on Glass”, International Solid-State 
Circuit Conference, pp. 324-325, February 2011. 
[9] Y. Xia, W. Zhang, M. Ha, et al., “Printed, Sub-2V Gel Electrolyte-Gated Polymer 
Transistors and Circuits”, Advanced Functional Materials, Volume 20, Issue 4, pages 
587 - 594, February 2010. 
[10] I. Nausieda, K. Ryu, I. Kymissis, et al., “An Organic Imager for Flexible Large Area 
Electronics”, International Solid-State Circuits Conference, pp. 72-73, February 2007. 
[11] W. Xiong, U. Zschieschang, H. Klauk, B. Murmann, “A 3V 6b Successive-
Approximation ADC Using Complementary Organic Thin-Film Transistors on Glass”, 
International Solid-State Circuits Conference, pp. 134-135, February 2010. 
[12]  http://www.cadence.com 
[13] http://www.synopsys.com 
[14] http://opdk.umn.edu 
[15] http://ni.com/labview 
[16] Synopsys® HSPICE User Manual. 
[17] M. Ha, Y. Xia, A. Green, et al., “Printed, Sub-3V Digital Circuits on Plastic from 
Aqueous Carbon Nanotube Inks”, ACS Nano, Volume 4, Issue 8, pages 4388 - 4395, 
June 2010. 
[18] J. Harms, F. Ebrahimi, X. Yao, J. Wang, "SPICE Macromodel of Spin-Torque-
Transfer-Operated Magnetic Tunnel Junctions",  IEEE Transactions on Electron 
Devices, Volume 57, Issue 6, pp. 1425-1430, 2010.  
[19] http://www.eda.ncsu.edu/wiki/FreePDK 
[20] http://www.mentor.com 
[21] M. Jung, J. Kim, J. Noh, et al., "All-Printed and Roll-to-Roll-Printable 13.56-MHz-
Operated 1-bit RF Tag on Plastic Foils", IEEE Transactions On Electron Devices, 
Vol. 57, No. 3, pp. 571-580, March 2010. 
[22]  http://www.optomec.com 
[23] T. Sekitani, T. Yokota, U. Zschieschang, et al., "Organic Nonvolatile Memory 
Transistors for Flexible Sensor Arrays", Science, Vol. 326, no. 5959, pp. 1516-1519, 
December 2009. 
[24] S. Kang, Y. Park, I. Bae, et al., "Printable Ferroelectric PVDF/PMMA Blend Films 
with Ultralow Roughness for Low Voltage Non-Volatile Polymer Memory", 
Advanced Functional Materials, Volume 19, Issue 17, pp. 2812-2818, 2009. 
[25] B. de Brito, E. Smits, P. van Hal, et al., "Ultralow Power Microfuses for Write-Once 
Read-Many Organic Memory Elements", Advanced Materials, Volume 20, Issue 19, 
pp. 3750–3753, October 2008. 
[26] M. Takamiya, T. Sekitani, Y. Kato, et al., “An Organic FET SRAM With Back Gate 
to Increase Static Noise Margin and Its Application to Braille Sheet Display”, IEEE 
Journal of Solid-State Circuits, Volume 42, Issue 1, pp. 93-100,  January 2007. 
[27] R. Blache, J. Krumm, W. Fix, “Organic CMOS Circuits for RFID Applications”, 
International Solid-State Circuits Conference, pp. 208-109, February 2009. 
[28] K. Chun, P. Jain, T. Kim, C. H. Kim, “A 1.1V, 667MHz Random Cycle, 
Asymmetric 2T Gain Cell Embedded DRAM with a 99.9 Percentile Retention Time 
of 110µsec”, VLSI Circuits Symposium, pp. 192-192, June 2010. 
[29] P. Klim, J. Barth, W. Reohr, et al., “A 1 MB Cache Subsystem Prototype With 1.8 
ns Embedded DRAMs in 45 nm SOI CMOS,” IEEE Journal of Solid-State Circuits, 
vol. 44, no. 4, pp. 1216-1226, 2009. 
[30] M. Nakayama, H. Sakakibara, M. Kusunoki, et al., "A 16 MB cache DRAM LSI 
with internal 35.8 GB/s memory bandwidth for simultaneous read and write 
operation," International Solid-State Circuits Conference, pp. 398-399, 2000. 
[31] S. Romanovsky, A. Katoch, A. Achyuthan, et al., "A 500MHz Random-Access 
Embedded 1Mb DRAM Macro in Bulk CMOS," International Solid-State Circuits 
Conference, pp. 270-612, 2008. 
[32] J. Barth, et al., “A 500 MHz random cycle, 1.5 ns latency, SOI embedded DRAM 
macro featuring a three-transistor micro sense amplifier,” IEEE Journal of Solid-State 
Circuits, Vol. 43, Issue 1, pp. 86-95, 2008. 
[33] J. Barth, et al., “A 45 nm SOI embedded DRAM macro for POWER7™ processor 
32 MB on-chip L3 cache,” IEEE Journal of Solid-State Circuits, Vol. 46, Issue 1, pp. 
64-75, 2011. 
[34] K. Zhang, et al., “SRAM design on 65-nm CMOS technology with dynamic sleep 
transistor for leakage reduction,” IEEE Journal of Solid-State Circuits, Vol. 40, Issue 
4, pp. 895-901, 2005. 
[35] D. Somasekhar, et al., “2 GHz 2 Mb 2T gain cell memory macro with 128 
GBytes/sec bandwidth in a 65 nm logic process technology,” IEEE Journal of Solid-
State Circuits, Vol. 44, Issue 1, pp. 174-185, 2009. 
[36] K. Chun, P. Jain, J. Lee, et al., "A sub-0.9V logic-compatible embedded DRAM 
with boosted 3T gain cell, regulated bit-line write scheme and PVT-tracking read 
reference bias," VLSI Circuits Symposium, pp. 134-135, 2009. 
[37] W. Luk, J. Cai, R. Dennard, M. Immediato, S. Kosonocky, “A 3-transistor DRAM 
cell with gated diode for enhanced speed and retention time”, VLSI Circuits 
Symposium, pp. 184-185, 2006. 
[38] K. Chun, W. Zhang, P. Jain, C. Kim, “A 700MHz 2T1C embedded DRAM macro in 
a generic logic process with no boosted supplies,” International Solid-State Circuits 
Conference, pp. 506-507, 2011. 
