Asynchronous memory design. by Sit, Vincent Wing-Yun. & Chinese University of Hong Kong Graduate School. Division of Electronic Engineering.
Asynchronous Memory Design 
A Thesis 
submitted to 
The Department of Electronic Engineering
of
The Chinese University of Hong Kong
in
partial fulfillment of the requirements for the degree of
Master of Philosophy
by 
Vincent Wing-Yun SIT 
June 1997 
TABLE OF CONTENTS 
TABLE OF CONTENTS 
LIST OF FIGURES 
LIST OF TABLES 
ACKNOWLEDGEMENTS 
ABSTRACT 
1. INTRODUCTION 1 
1.1 ASYNCHRONOUS DESIGN 2 
1.1.1 POTENTIAL ADVANTAGES 2 
1.1.2 DESIGN METHODOLOGIES 2 
1.1.3 SYSTEM CHARACTERISTICS 3 
1.2 ASYNCHRONOUS MEMORY 5 
1.2.1 MOTIVATION 5 
1.2.2 DEFINITION 9
1.3 PROPOSED MEMORY DESIGN 10 
1.3.1 CONTROL INTERFACE 10
1.3.2 OVERVIEW 11 
1.3.3 HANDSHAKE CONTROL PROTOCOL 13 
2. THEORY 16 
2.1 VARIABLE BIT LINE LOAD 17
2.1.1 DEFINITION 17
2.1.2 ADVANTAGE 17 
2.2 CURRENT SENSING COMPLETION DETECTION 18 
2.2.1 BLOCK DIAGRAM 19 
2.2.2 GENERAL LSD CURRENT SENSOR 21 
2.2.3 CMOS LSD CURRENT SENSOR 23 
2.3 VOLTAGE SENSING COMPLETION DETECTION 28 
2.3.1 DATA READING IN MEMORY CIRCUIT 29 
2.3.2 BLOCK DIAGRAM 30 
2.4 MULTIPLE DELAYS COMPLETION GENERATION 32 
2.4.1 ADVANTAGE 32 
2.4.2 BLOCK DIAGRAM 33 
3. IMPLEMENTATION 35 
3.1 1M-BIT SRAM FRAMEWORK 36
3.1.1 INTRODUCTION 36 
3.1.2 FRAMEWORK 36 
3.2 CONTROL CIRCUIT 40 
3.2.1 CONTROL SIGNALS 40 
3.2.1.1 EXTERNAL CONTROL SIGNALS 40 
3.2.1.2 INTERNAL CONTROL SIGNALS 41
3.2.2 READ / WRITE STATE TRANSITION GRAPHS 42 
3.2.3 IMPLEMENTATION 43 
3.3 BIT LINE SEGMENTATION 45 
3.3.1 FOUR REGIONS SEGMENTATION 46 
3.3.2 OPERATION 50 
3.3.3 MEMORY CELL 51
3.4 CURRENT SENSING COMPLETION DETECTION 52 
3.4.1 ONE BIT DATA BUS 53 
3.4.2 EIGHT BITS DATA BUS 55 
3.5 VOLTAGE SENSING COMPLETION DETECTION 57 
3.5.1 ONE BIT DATA BUS 57 
3.5.2 EIGHT BITS DATA BUS 59 
3.6 MULTIPLE DELAYS COMPLETION GENERATION 60 
4. SIMULATION 63 
4.1 SIMULATION ENVIRONMENT 64 
4.1.1 SIMULATION PARAMETERS 64 
4.1.2 MEMORY TIMING SPECIFICATIONS 64 
4.1.3 BIT LINE LOAD DETERMINATION 67 
4.2 BENCHMARK SIMULATION 69 
4.2.1 CIRCUIT SCHEMATIC 69 
4.2.2 RESULTS 71 
4.3 CURRENT SENSING COMPLETION DETECTION 73 
4.3.1 CIRCUIT SCHEMATIC 73 
4.3.2 SENSE AMPLIFIER CURRENT CHARACTERISTICS 75 
4.3.3 RESULTS 76 
4.3.4 OBSERVATIONS 80 
4.4 VOLTAGE SENSING COMPLETION DETECTION 82 
4.4.1 CIRCUIT SCHEMATIC 82 
4.4.2 RESULTS 83 
4.5 MULTIPLE DELAYS COMPLETION GENERATION 89 
4.5.1 CIRCUIT SCHEMATIC 89 
4.5.2 RESULTS 90 
5. TESTING 97 
5.1 TEST CHIP DESIGN 98
5.1.1 BLOCK DIAGRAM 98 
5.1.2 SCHEMATIC 100 
5.1.3 LAYOUT 102 
5.2 HSPICE POST-LAYOUT SIMULATION RESULTS 104 
5.2.1 GRAPHICAL RESULTS 105 
5.2.2 VOLTAGE SENSING COMPLETION DETECTION 108
5.2.3 MULTIPLE DELAYS COMPLETION GENERATION 114 
5.3 MEASUREMENTS 117 
5.3.1 LOGIC RESULTS 118 
5.3.1.1 METHOD 118 
5.3.1.2 RESULTS 118 
5.3.2 TIMING RESULTS 119 
5.3.2.1 METHOD 119 
5.3.2.2 GRAPHICAL RESULTS 121 
5.3.2.3 VOLTAGE SENSING COMPLETION DETECTION 123
5.3.2.4 MULTIPLE DELAYS COMPLETION GENERATION 125 
6. DISCUSSION 127 
6.1 CURRENT SENSING COMPLETION DETECTION 128 
6.1.1 COMMENTS AND CONCLUSION 128 
6.1.2 SUGGESTION 128 
6.2 VOLTAGE SENSING COMPLETION DETECTION 129 
6.2.1 RESULTS COMPARISON 129 
6.2.1.1 GENERAL 129 
6.2.1.2 BIT LINE LOAD 132 
6.2.1.3 BIT LINE SEGMENTATION 133
6.2.2 RESOURCE CONSUMPTION 133 
6.2.2.1 AREA 133 
6.2.2.2 POWER 134 
6.2.3 COMMENTS AND CONCLUSION 134 
6.3 MULTIPLE DELAY COMPLETION GENERATION 135 
6.3.1 RESULTS COMPARISON 135 
6.3.1.1 GENERAL 135 
6.3.1.2 BIT LINE LOAD 136
6.3.1.3 BIT LINE SEGMENTATION 137
6.3.2 RESOURCE CONSUMPTION 138 
6.3.2.1 AREA 138 
6.3.2.2 POWER 138 
6.3.3 COMMENTS AND CONCLUSION 138 
6.4 GENERAL COMMENTS 139 
6.4.1 COMPARISON OF THE THREE TECHNIQUES 139
6.4.2 BIT LINE SEGMENTATION 141 
6.5 APPLICATION 142 
6.6 FURTHER DEVELOPMENTS 144 
6.6.1 INTERFACE WITH TWO-PHASE HCP 144
6.6.2 DATA BUS EXPANSION 146 
6.6.3 SPEED OPTIMIZATION 147 
6.6.4 MODIFIED WRITE COMPLETION METHOD 150 
7. CONCLUSION 152 
7.1 PROBLEM DEFINITION 152 
7.2 IMPLEMENTATION 152 
7.3 EVALUATION 153 
7.4 COMMENTS AND SUGGESTIONS 155 
8. REFERENCES R-1 
9. APPENDIX A-1
9.1 HSPICE SIMULATION PARAMETERS A-1 
9.1.1 TYPICAL SIMULATION CONDITION A-1 
9.1.2 FAST SIMULATION CONDITION A-3 
9.1.3 SLOW SIMULATION CONDITION A-4 
9.2 SRAM CELL LAYOUT AND NETLIST A-5 
9.3 TEST CHIP SPECIFICATIONS A-8 
9.3.1 GENERAL SPECIFICATIONS A-8 
9.3.2 PIN ASSIGNMENT A-9 
9.3.3 TIMING DIAGRAMS AND SPECIFICATIONS A-10
9.3.4 SCHEMATICS AND LAYOUTS A-11 
9.3.4.1 STANDARD MEMORY COMPONENTS A-12 
9.3.4.2 DVSCD AND MDCG COMPONENTS A-20 
9.3.5 MICROPHOTOGRAPH A-25 
LIST OF FIGURES 
Figure 1-1: Micropipelines Memory Interface 8 
Figure 1-2: Proposed Asynchronous Memory Interface 11 
Figure 1-3: Block Diagram of Proposed Asynchronous Memory 12
Figure 1-4: Timing Diagram for Read Operation 14 
Figure 1-5: Timing Diagram for Write Operation 15 
Figure 2-1: A CMOS Circuit with CSCD 20 
Figure 2-2: CMOS LSD Current Sensor 22 
Figure 2-3: Circuit Diagram of CMOS LSD Current Sensor 24
Figure 2-4: Circuit Diagram of SRAM Read Cycle 29
Figure 2-5: Block Diagram of DVSCD 31
Figure 2-6: MDCG Write Completion Circuit 34
Figure 3-1: 1M-bit SRAM Framework 37
Figure 3-2: Data Sense Amplifiers 39 
Figure 3-3: Read / Write State Transition Graphs 43 
Figure 3-4: Four-Phase HCP Control Circuit 45 
Figure 3-5: Conventional Bit Line Connection 49 
Figure 3-6: Segmented Bit Line Connection 50 
Figure 3-7: Six Transistors Static Memory Cell 52 
Figure 3-8: One Bit Data Bus Read Completion Using CSCD 54 
Figure 3-9: Eight Bits Data Bus Read Completion Using CSCD 55 
Figure 3-10: One Bit Data Bus DVSCD Read Completion 58 
Figure 3-11: Eight Bits Data Bus DVSCD Read Completion 59 
Figure 3-12: MDCG Write Completion (Four Regions) 61 
Figure 4-1: Timing Diagrams of Critical Read / Write Control Signals 65 
Figure 4-2: Bit Line Load Verification 68 
Figure 4-3: Benchmark Memory Simulation Circuit 70 
Figure 4-4: Benchmark Memory Graphical Results 71 
Figure 4-5: Memory Simulation Circuit with CSCD and MDCG Circuits 74 
Figure 4-6: Modified Sense Amplifier Current Characteristics 76 
Figure 4-7: Region 3 Read Signals Graphical Results 77 
Figure 4-8: Read Acknowledge Signals Graphical Results 78 
Figure 4-9: Graph of Read Completion Time vs. Bit Line Load 79
Figure 4-10: Graph of Read Acknowledge Time vs. Bit Line Load 79
Figure 4-11: Relationship Between Data, Read Completion Signal and Current 81 
Figure 4-12: Memory Simulation Circuit with DVSCD and MDCG Circuits 84 
Figure 4-13: Region 3 Read Signals Graphical Results 85 
Figure 4-14: Read Acknowledge Signal Graphical Result 85 
Figure 4-15: Graph of Read Completion Time vs. Bit Line Load 86
Figure 4-16: Graph of Read Acknowledge Time vs. Bit Line Load 87 
Figure 4-17: Region 3 Write Signals Graphical Results 90 
Figure 4-18: Write Acknowledge Signal Graphical Results 91 
Figure 4-19: Graph of Actual Write Time vs. Bit Line Load 93 
Figure 4-20: Graph of Write Completion Time vs. Bit Line Load 93 
Figure 4-21: Graph of Write Acknowledge Time vs. Bit Line Load 94 
Figure 5-1: Block Diagram of Asynchronous Memory Test Chip 99 
Figure 5-2: Schematic of Asynchronous Memory Test Chip 101 
Figure 5-3: Floor Plan of Asynchronous Memory Test Chip 102 
Figure 5-4: Layout of Asynchronous Memory Test Chip 103 
Figure 5-5: Modified Sense Amplifier Benchmark Results 106 
Figure 5-6: Modified Sense Amplifier DVSCD and MDCG Results 106 
Figure 5-7: Conventional Sense Amplifier Benchmark Results 107 
Figure 5-8: Conventional Sense Amplifier DVSCD and MDCG Results 107 
Figure 5-9: Graph of Read Acknowledge Time vs. Bit Line Load 109 
Figure 5-10: Graph of Read Acknowledge Time vs. Bit Line Load 112
Figure 5-11: Graph of Write Acknowledge Time vs. Bit Line Load 115 
Figure 5-12: Typical Results Observed On Logic Analysis System 119 
Figure 5-13: Typical Waveforms Observed On CRO 122 
Figure 5-14: Graph of Read Acknowledge Time vs. Bit Line Load 124
Figure 6-1: Graph of DVSCD Results Comparison 131
Figure 6-2: Graph of MDCG Results Comparison 136
Figure 6-3: 8M-bit Asynchronous Memory System 143 
Figure 6-4: Interface with Two-Phase HCP 145 
Figure 6-5: N bits Data Bus DVSCD Read Completion Circuit 147 
Figure 6-6: Memory Matrix Allocations for Different Cases 148 
Figure 6-7: Block Diagram of Alternative Write Completion Method 151 
Figure 9-1: Six Transistors SRAM Cell Layout A-5 
Figure 9-2: Test Chip Read / Write Timing Diagrams A-10 
LIST OF TABLES 
Table 3-1: The Approximated Results for One to Ten Regions of Segmentation 48 
Table 3-2: Memory Location vs. Address Pins Status, Transmission Gates Status and Equivalent Bit Line Load 51
Table 4-1: Definitions of Memory Timing Specification Parameters 66
Table 4-2: Verification of Bit Line Load Value 69
Table 4-3: Benchmark Memory Numerical Results 72 
Table 4-4: CSCD Numerical Results 79 
Table 4-5: DVSCD Numerical Results 86 
Table 4-6: Comparison Between DVSCD and Benchmark Results 88 
Table 4-7: MDCG Numerical Results 92 
Table 4-8: Comparison Between MDCG and Benchmark Results 96 
Table 5-1: Modified Sense Amplifier DVSCD Numerical Results 108 
Table 5-2: Comparison Between DVSCD and Benchmark Results 110 
Table 5-3: Conventional Sense Amplifier DVSCD Numerical Results 111 
Table 5-4: Comparison Between DVSCD and Benchmark Results 113 
Table 5-5: MDCG Numerical Results 115 
Table 5-6: Comparison Between MDCG and Benchmark Results 117 
Table 5-7: DVSCD Measured Results 123 
Table 5-8: Comparison Between DVSCD and Benchmark Results 124 
Table 5-9: MDCG Results 125 
Table 6-1: DVSCD Results Comparison 130 
Table 6-2: Effect of Bit Line Load and Segmentation on DVSCD Method 132
Table 6-3: MDCG Results Comparison 135 
Table 6-4: Effect of Bit Line Load and Segmentation on MDCG Method 137
Table 6-5: Comparison between the CSCD, DVSCD and MDCG methods 140 
Table 6-6: The Comparison of Read Acknowledge Times for Three Cases 149
Table 9-1: Test Chip General Specifications A-8 
Table 9-2: Test Chip Pin Assignment A-9 
Table 9-3: Test Chip Timing Specifications A-11 
ACKNOWLEDGEMENTS
My work would not have been made possible without the help and support of 
many individuals. I would like to take this opportunity to thank my supervisor, Prof. 
Choy Chiu-Sing for his full support, guidance and suggestions on my work 
throughout my period of study. Also, I would like to thank Prof. Chan Cheong-Fat, 
Prof. Chang Fung-Yuel and Prof. Xu Jian-Bin. I have learned a lot from the courses they
taught. Moreover, I would like to thank all my peers and colleagues in the ASIC
/ VLSI Laboratory. This includes (names in random order) Eva Pang Yuk-Wah, 
Johnson Pang Tin-Chak, Mark To Hon-Sun, Timothy Chung Kai-Cheung, Frankie 
Cheng King-Sun, Kelvin Cheung Ka-Wai, Lau Yuen-Pat, Pun Kong-Pang, Thomas 
Chan Chung-Kei, To Cheuk-Him, Or Chung-Yuk, Soo Wing-Yiu, Eliza Yang Chi-
Shan, Vincent Siu Chun-Wah, Winnie Chan Suk-Fong, Johnson Leung Tsz-Chung, 
our research assistants / associates Dr. Juraj Povazanec, Mr. Li Fong, Mr. Long Hu-
Qiang, and our laboratory technician Mr. Yeung Wing-Yee. Of course, I would like 
to thank my family members and friends for their continuous support and 
encouragement. 
ABSTRACT 
Asynchronous system design has been a hot topic over the past decades. It
has potential advantages over the synchronous approach and, if implemented
sensibly, an asynchronous design can perform much better than the equivalent
synchronous design in some applications. The motivation for designing asynchronous
memory arises from the recent development of various types of asynchronous
processors. Unlike the conventional design, the proposed asynchronous
memory system is able to communicate with other asynchronous systems based on
an asynchronous handshaking control protocol. Also, to increase average speed
performance, the variable bit line load concept is employed so that the memory read /
write access time varies with the memory location.
We propose to implement the circuit using static RAM and the Four-Phase
Handshaking Control Protocol. For read completion signal generation, two
asynchronous techniques are investigated: Current Sensing Completion
Detection (CSCD) and Dual-Rail Voltage Sensing Completion Detection (DVSCD).
For write completion signal generation, the Multiple Delays
Completion Generation (MDCG) technique is investigated. To evaluate the
performance of these techniques, we take a 1M-bit implementation of the circuit as
an example. In this case, each bit line is segmented into four regions. The CSCD
method is evaluated by the pre-layout simulation results and found not suitable for
read completion signal generation. The DVSCD and MDCG methods are evaluated
by the pre-layout and post-layout simulation results, as well as the testing results of
the test chip. The DVSCD method is found suitable for the purpose, whereas the
MDCG method is conditionally suitable for the purpose.
We will illustrate how to use the proposed asynchronous memory system as a 
general large memory block. For future development, the proposed asynchronous 
memory system can be modified to communicate with other systems according to 
Two-Phase Handshaking Control Protocol and to handle more bits in the data bus. 
Moreover, the speed of the proposed asynchronous memory system can be optimized 
if special techniques are developed for smart memory allocation. Also, the write 
completion signal generation can be improved by a new method based on sensing the 
content of an extra row of memory cells. 
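ABSTRACT (CHINESE VERSION, TRANSLATED)
[...] communicates with other asynchronous systems according to an asynchronous
handshaking control protocol. In addition, we use the variable bit line load concept
to increase the average speed.
We propose to implement the whole circuit using static memory (SRAM) and the
Four-Phase Handshaking Control Protocol. To generate the read completion signal, we
investigated the Current Sensing Completion Detection method and the Dual-Rail
Voltage Sensing Completion Detection method. In addition, to generate the write
completion signal, we investigated the multiple delays completion generation method [...]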
1. INTRODUCTION
Asynchronous design has been a hot topic since the 1950s. Over the past few
decades, numerous researchers have worked in this field. Why is asynchronous
design so attractive? The title of this research is Asynchronous Memory Design.
What is asynchronous memory, and what are the differences between the
conventional design and the proposed design?
In this chapter, first of all we will describe the background of asynchronous 
design in Section 1.1. This includes the potential advantages, design methodologies 
and system characteristics of asynchronous design. Afterwards, we will explain the 
motivation of designing asynchronous memory and its definition in Section 1.2. 
Finally, we will present the control interface, overview and handshake control 
protocol of the proposed memory design in Section 1.3. 
1.1 ASYNCHRONOUS DESIGN 
1.1.1 POTENTIAL ADVANTAGES
Since the mid 1950s, asynchronous design has been a hot topic in research.
Many researchers are refining and evaluating the techniques that can be applied in
this area [1]-[5]. This is motivated by the potential advantages of asynchronous
design over the conventional synchronous design. These potential advantages,
according to Hauck [1], include no clock skew, low power consumption, average-case
instead of worst-case performance, easing of global timing issues, better
technology migration potential, automatic adaptation to physical properties, and
robust mutual exclusion and external input handling. Whether these advantages can
really be obtained depends on the application, design and implementation of the
asynchronous system.
1.1.2 DESIGN METHODOLOGIES
Asynchronous design is such a rich area of research that there are many
different approaches to synthesizing a circuit. The more common asynchronous
design methodologies include Huffman asynchronous circuits [6], burst-mode
circuits [7]-[10], micropipelines [11], template-based [12] and trace theory-based
[13], [14] delay-insensitive circuits, signal transition graphs [15]-[17], change
diagrams [18], and compilation-based [19] quasi-delay-insensitive circuits. A
rigorous comparison between these methodologies, especially on the critical issues of
speed, power consumption and area, is difficult since each method suits a
certain type of application. For example, the micropipeline structure is powerful for
implementing general computations in which the delay between each processing
stage is readily known; a delay-insensitive circuit, on the other hand, uses
completion detection circuitry and is therefore suitable for circuits with arbitrary
delay between each processing stage.
1.1.3 SYSTEM CHARACTERISTICS 
Hauck [1] differentiates between synchronous and asynchronous designs
by the following assumptions. In general, conventional synchronous design is
based on the assumptions that all signals are binary and that time is discrete. By
the binary signals assumption, simple Boolean logic can be used to describe and
manipulate logic constructs. By the discrete time assumption, hazards and feedback
can largely be ignored. Asynchronous design keeps the binary signals assumption
but drops the discrete time assumption.
Basically, in an asynchronous system there is no global clock governing the
timing of state changes. Sub-systems exchange information at mutually negotiated
times with no external timing regulation. This significantly simplifies the
circuitry responsible for data processing. On the other hand, the difference in
operating speeds of the sub-systems complicates the circuitry responsible for
generating the control signals. Therefore, when compared to an equivalent synchronous
system, the asynchronous one tends to have simpler data processing circuitry but more
complex control circuitry.
As described previously in Section 1.1.2, there are many methods to
design an asynchronous system. According to Brackenbury, Furber and Kelly [20],
however, the control schemes of the synthesized circuits differ only in using dual-rail
encoding or the bundled data approach, with either transition or level sensitive
signals. In dual-rail encoding, each signal is represented by two wires which carry the
true and complementary states of the signal. The validity of the signal is indicated
by whether the two wires carry complementary states. In the bundled data
approach, each signal is represented by one wire, and the validity of the signal is
indicated by an additional timing wire.
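The two validity conventions just described can be sketched in a few lines of Python. This is only an illustrative model, not circuitry from this work: the function names are hypothetical, and wire levels are assumed to be modelled as the integers 0 and 1.

```python
from typing import Optional, Tuple


def dual_rail_encode(bit: int) -> Tuple[int, int]:
    """Return (true_wire, complement_wire) for one logical bit."""
    return (bit, 1 - bit)


def dual_rail_decode(wires: Tuple[int, int]) -> Optional[int]:
    """Return the bit if the two wires are complementary, else None (not valid)."""
    true_wire, complement_wire = wires
    if true_wire != complement_wire:
        return true_wire
    return None


def bundled_decode(data_wire: int, timing_wire: int) -> Optional[int]:
    """Return the bit only while the separate timing wire marks the data as valid.

    Assumption: a HIGH (1) timing wire means valid data.
    """
    if timing_wire == 1:
        return data_wire
    return None
```

In this model the dual-rail receiver needs no extra wire to learn that the data is valid, whereas the bundled data receiver relies entirely on the timing wire.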
Comparing the two schemes, dual-rail encoding uses more wires than the bundled
data approach. However, the bundled data approach needs extra circuits to generate
the signals that indicate data validity. Dual-rail encoding is used for delay
insensitive circuits, in which the delay is unbounded. On the other hand, the bundled
data approach is used for speed insensitive circuits, in which the delay can be
arbitrary but bounded. For either scheme, the signals may be transition or level
sensitive. For transition sensitive signals, the states of the signals are denoted by
rising or falling edges. For level sensitive signals, the states of the signals are
denoted by HIGH or LOW voltage levels. In general, transition sensitive control is more
complex than level sensitive control because the internal control logic is usually
level sensitive, so each transition must be converted to a level. However, level
sensitive circuits need to be reset at the end of each operation before the next
operation can begin, which wastes time and energy.
To design an asynchronous system, one has to be very careful about the race 
and hazard problems [21]. The race problem is caused by the simultaneous change of 
value of two or more state variables within a single state transition. The hazard 
problem is caused by the unequal propagation delays in gates so that there are 
unintended transitions in the logic levels of signals. Possible hazards include static, 
dynamic and essential hazards. Usually, the race and hazard problems can be solved 
by suitable state assignments and by adding redundant states. 
1.2 ASYNCHRONOUS MEMORY 
1.2.1 MOTIVATION
The motivation for designing asynchronous memory originates from the recent
development of various types of asynchronous processors. Here are some examples.
First of all, Garside [22] developed a CMOS self-timed ALU as part of an
asynchronous implementation of the ARM microprocessor, which is a 32-bit RISC
architecture developed by Advanced RISC Machines Limited. According to Garside,
self-timing lets the majority of operations complete quickly whilst
allowing rare worst-case operations to take longer, maintaining a high average
throughput. Also, Muscato and Albicki [23] developed a CMOS locally clocked
sequential microprocessor for a self-timed environment. The processor is synchronized
by a pulse that runs through the whole system, and the pulse is channeled through a
delay block that simulates the processing time of the instruction to be executed. They
claimed that since the system accrues the average delay, rather than the longest delay,
of the instructions executed, this processor is expected to have better throughput than
the synchronous one. Moreover, Tierno, Martin, Borkovic and Lee [24]
implemented a 100-MIPS GaAs asynchronous microprocessor with new circuits
including a sense amplifier and a completion detection circuit. They also
employed a small asynchronous static RAM as cache and fast memory for
testing the microprocessor. In addition, Chang and Lu [25] implemented a
static MIMD data flow processor using micropipelines which is wholly asynchronous
at both the architectural and the implementation levels. Asynchronous design,
particularly of microprocessors, is now a hot topic of research,
and there is a need to design and develop asynchronous memory blocks for these
systems.
One may ask why we need to design a special memory block for these
asynchronous systems. Is it possible to use the conventional memory block? The
answer is no, because asynchronous systems communicate with each other based
on a handshaking control protocol, for example, the Two-Phase Handshaking
Control Protocol or the Four-Phase Handshaking Control Protocol described
later in Section 1.3.3, and the conventional memory block cannot communicate by
these protocols. The conventional memory block is operated by a global clock,
and the read and write operations are executed within a certain number of clock
cycles. Therefore, to design the asynchronous memory block, we need to modify the
control circuit of the conventional one. Other parts of the conventional memory block
can still be used. In addition, as will be described later, we will demonstrate how to
modify the memory matrix to improve average speed performance.
There are many ways to design asynchronous memory. In the Turing Award
lecture "Micropipelines" presented by I. E. Sutherland [11], micropipelines were
introduced as an asynchronous alternative to synchronous elastic pipelines, in which
the input and output data rates may vary. Micropipelines have been demonstrated to be
a powerful method for implementing general computation. One example is given by Pang
and Choy [26], who implemented a matrix multiplier by applying this structure. Apart
from this type of application, Sutherland also proposed the use of micropipelines in
memory design. In this case, the throughput of the memory may be larger than
that of the conventional memory, because the memory array can be accessed
concurrently with driving the retrieved data and with decoding the next address.
The block diagram of the memory interface proposed by Sutherland is shown
in Figure 1-1. It consists of two micropipeline structures since the data flow in
opposite directions during the read and write operations. The Control Unit may be an
asynchronous processor capable of communicating with the
Asynchronous Memory through the dual micropipeline structure. An
asynchronous cache system may also be inserted between the Control Unit and the
Asynchronous Memory. The Asynchronous Memory consists of the conventional
memory matrix plus control circuitry modified for the micropipeline structure. The
system operates as follows. During the write
cycle, the request signal R1 is initiated by the Control Unit, and the write data
together with the address are placed on the data bus D1. When the operation is
finished, the acknowledge signal A1 is generated. During the read cycle, R1 is
initiated by the Control Unit together with the address placed on D1. When the read
data are ready on D2, R2 is generated by the asynchronous memory to signify
read completion. This is how micropipelines are applied to implement asynchronous
memory.
Figure 1-1: Micropipelines Memory Interface
However, this method has some disadvantages. First of all, since a
single micropipeline structure cannot handle a bi-directional data bus, two
micropipeline structures are required, together with two request and two
acknowledge signals. Therefore, the asynchronous interface control circuitry is
complicated. Secondly, the read / write completion signals, which are R2 and A1
respectively, are generated by inserting delays in the micropipelines which
match the worst-case memory read / write times. This limits the potential speed
improvement of asynchronous designs. The asynchronous memory system that we have
developed addresses both of these disadvantages.
1.2.2 DEFINITION
Before explaining the memory system that we have developed in detail, we
should define the topic first. Generally, an Asynchronous Memory is defined as a
static or dynamic memory system that can communicate with other asynchronous
systems based on a certain control protocol. Conventionally, the asynchronous
memory generates the completion signals by inserting worst-case delays which
match the worst-case read / write access times of the memory matrix. This method
is simple since the delay circuitry is very simple. It is also effective as long as
the memory cell access time is almost the same at different locations in the memory
matrix. However, we have devised a method that differentiates the time to access
memory cells at different locations. To increase the average speed performance of the
asynchronous memory system, worst-case delay insertion is then no longer a
good option. As a result, the definition of asynchronous memory in our research
is, in addition to the former definition, a memory system that is capable of generating
true completion signals as soon as the read / write operation is finished.
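The benefit of true completion signals over worst-case delay insertion can be illustrated with a small calculation. The sketch below is a hedged illustration only: the function name and the per-region access times are hypothetical placeholders (not results from this work), and every region is assumed to be accessed equally often.

```python
def mean_access_time(access_times_ns, true_completion):
    """Average time per access over a set of equally likely locations.

    With worst-case delay insertion every access waits for the slowest
    location; with true completion each access takes only its own time.
    """
    if true_completion:
        per_access = list(access_times_ns)
    else:
        per_access = [max(access_times_ns)] * len(access_times_ns)
    return sum(per_access) / len(per_access)


# Hypothetical access times for four regions of a memory matrix (ns).
region_times_ns = [4.0, 6.0, 8.0, 10.0]

worst_case_mean = mean_access_time(region_times_ns, true_completion=False)
true_completion_mean = mean_access_time(region_times_ns, true_completion=True)
```

With these placeholder values the worst-case scheme averages 10.0 ns per access, while the true completion scheme averages 7.0 ns; the gain disappears when all locations have the same access time.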
1.3 PROPOSED MEMORY DESIGN 
1.3.1 CONTROL INTERFACE
As shown in Figure 1-1, micropipelines can be used to implement the control
circuitry for asynchronous memory. However, as described previously in Section
1.2.1, the control interface consists of two pairs of request and acknowledge
signals, together with two data buses. It is therefore not only complicated but also
not readily usable by other asynchronous or self-timed systems, since these systems
have only one pair of request and acknowledge signals, together with one data bus.
To make the asynchronous memory control interface compatible with other
systems, the new control interface depicted in Figure 1-2 is proposed. It consists of
one pair of request and acknowledge signals, the read / write select signal, one
bi-directional data bus and one address bus. It operates as follows. During the read
operation, when the address is ready, the Control Unit generates the request signal as
well as the R/W signal to indicate a read operation. The required data is then read from
the memory matrix, and the acknowledge signal is generated. During the write
operation, when the address and data are ready, the request and R/W signals are
generated. The required data is then written to the memory matrix, and the
acknowledge signal is generated. This control interface is simple and can easily be
used by other asynchronous or self-timed systems.
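The read and write sequences above can be mimicked by a minimal behavioural model. This is only a sketch under stated assumptions: the class and method names are hypothetical, the R/W select is modelled as a boolean whose polarity (True for read) is assumed, and the acknowledge line is assumed to idle LOW. It models the order of events, not timing or circuitry.

```python
class AsyncMemoryModel:
    """Behavioural sketch: one request/acknowledge pair, one R/W select,
    one bi-directional data bus and one address bus."""

    def __init__(self, num_words):
        self.cells = [0] * num_words  # stands in for the memory matrix
        self.ack = 0                  # acknowledge line, idle LOW (assumed)

    def request(self, address, read, data=None):
        """Serve one request and return the value left on the data bus."""
        if read:
            bus = self.cells[address]   # data read from the memory matrix
        else:
            self.cells[address] = data  # data written to the memory matrix
            bus = data
        self.ack = 1                    # acknowledge once the access is done
        return bus

    def release(self):
        """Return the acknowledge line to idle before the next request."""
        self.ack = 0
```

A control unit in this model writes with `request(addr, read=False, data=value)`, waits for `ack`, calls `release()`, and later reads the value back with `request(addr, read=True)`.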
Figure 1-2: Proposed Asynchronous Memory Interface
1.3.2 OVERVIEW
The block diagram of the proposed asynchronous memory is shown in Figure
1-3. The system is speed insensitive, and the bundled data approach is used.
Apart from the control circuit and the read / write completion circuits,
everything is the same as in the conventional static RAM (SRAM) architecture
[27]. The control circuit communicates with other asynchronous systems
and generates the internal asynchronous control signals. The read / write completion
circuits generate the realistic read / write completion signals. SRAM
architecture is employed in our research since it is fast, simple and can easily be
implemented in the asynchronous way. Although Dynamic RAM (DRAM) is more
area efficient than SRAM, it is not employed since internal clocking is needed to
refresh the memory cell contents, which makes it difficult and complicated to implement
DRAM in the asynchronous way. Moreover, SRAM is commonly used in designs in
which the memory speed is critical to the circuit performance, which is not the case for
DRAM. Therefore, SRAM rather than DRAM architecture is chosen in our research.
Figure 1-3: Block Diagram of Proposed Asynchronous Memory
1.3.3 HANDSHAKE CONTROL PROTOCOL 
Speed insensitive systems communicate with each other based on two 
common handshake control protocols (HCP). The first one is Two-Phase HCP 
introduced by Sutherland [11]. For this protocol, an event is triggered by a signal 
transition, whether it is a rising edge or a falling edge. It is termed two-phase since 
only two phases, the sender's active phase and the receiver's active phase, are allowed 
in each operation cycle. In each cycle, the sender's active phase is terminated by the 
request signal, whereas the receiver's active phase is terminated by the acknowledge 
signal. During the receiver's active phase, the transmitted data must be held steady
until the acknowledge event is triggered. The data is termed bundled since the delay
in data transmission must be less than the delay in transmitting the request event, so
that the data are ready before the request event arrives. Another common asynchronous
HCP is Four-Phase HCP as described by Furber and Day [28]. For this protocol, an 
event is triggered by the signal logic level, but there is a choice as to which logic 
level (either HIGH or LOW) is used to trigger the sender's active phase and the 
receiver's active phase. The other logic level is inactive and is part of the recovery 
phase during which the circuit prepares for the next cycle. Therefore, it is termed 
Four-Phase since apart from the two active phases, two recovery phases are also 
required. In our memory design, since most of the internal control signals are level-
sensitive, Four-Phase HCP is used and each request or acknowledge event is 
triggered by the logic HIGH. 
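The difference between the two protocols can be illustrated with a minimal Python sketch. It is not the control circuit of this thesis; the function names and the event representation are assumptions made only for illustration. Each transfer is listed as an ordered sequence of (signal, level) events, with active-HIGH request and acknowledge as chosen above.

```python
def four_phase_cycle():
    """Return the ordered (signal, level) events of one four-phase cycle."""
    return [
        ("req", 1),  # sender's active phase ends: request goes HIGH
        ("ack", 1),  # receiver's active phase ends: acknowledge goes HIGH
        ("req", 0),  # recovery phase: request returns LOW
        ("ack", 0),  # recovery phase: acknowledge returns LOW
    ]


def two_phase_cycle(req, ack):
    """Return the events of one two-phase cycle, given the current levels.

    Every transition is an event, so each wire simply toggles once.
    """
    return [("req", req ^ 1), ("ack", ack ^ 1)]


def run(cycles, protocol):
    """Collect the events produced by `cycles` consecutive transfers."""
    req = ack = 0
    trace = []
    for _ in range(cycles):
        if protocol == "four":
            events = four_phase_cycle()
        else:
            events = two_phase_cycle(req, ack)
        for name, level in events:
            if name == "req":
                req = level
            else:
                ack = level
        trace.extend(events)
    return trace


print(run(1, "four"))
print(run(2, "two"))
```

In this sketch one four-phase transfer needs four signal transitions, whereas a two-phase transfer needs only two, at the cost of edge-sensitive rather than level-sensitive control logic.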
The simplified event-driven timing diagrams for read and write operations are 
shown in Figure 1-4 and Figure 1-5 respectively. For the read operation, when the
read address is ready, the external control unit generates the request signal. This 
terminates the sender's active phase. The circuit then starts to read from the memory 
matrix. When the read data is ready, the read completion signal triggers the 
acknowledge signal. This terminates the receiver's active phase. After the two 
recovery phases, the cycle starts again. The write operation is very similar to the read 
operation. For the write operation, when the write address and data are ready, the 
external control unit generates the request signal. The circuit then starts to write to 
the memory matrix. When the process is completed, the write completion signal 
triggers the acknowledge signal. 
One may ask why the acknowledge signal is asserted differently in the two
operations: for the read operation, it is triggered by the falling edge of the
completion signal; for the write operation, it is triggered by the rising edge.
This is due to the difference in the techniques used for generating the read and
write completion signals. The difference is not important since the completion
signals are internal control signals.
Figure 1-4: Timing Diagram for Read Operation (Request, Acknowledge, Read completion, Read enable, Address, Data)
Figure 1-5: Timing Diagram for Write Operation (Request, Acknowledge, Write Completion, Read enable, Data)
2. THEORY 
For the conventional synchronous or asynchronous memory systems, the 
worst-case memory access time is specified for the read / write cycle times since the 
bit line load is constant. In our proposed memory design, in order to demonstrate the 
increased average speed performance of asynchronous designs, the bit line load is 
assumed to be variable. What is a Variable Bit Line Load? Assuming that the bit line
load is variable, how can realistic read / write completion signals be generated?
In this chapter, we will first define and explain the advantages of variable bit 
line load in Section 2.1. Afterwards, we will present the theory of the two techniques 
that we have investigated to generate the read completion signal. These include the 
Current Sensing Completion Detection method in Section 2.2 and the Voltage 
Sensing Completion Detection method in Section 2.3. Finally, we will introduce the 
Multiple Delays Completion Generation method, which generates the write completion signal, in Section 2.4.
2.1 VARIABLE BIT LINE LOAD
2.1.1 DEFINITION
The bit line load is the parasitic capacitance, resistance and inductance
associated with the long bit lines of the memory cell matrix. Among the three parasitic
effects, the capacitance dominates in CMOS VLSI design. The bit line
capacitance can be as high as several tens of picofarads (pF), and 20 pF is a typical
value [29], [30]. Variable Bit Line Load refers to a bit line whose load can
be made variable by the addition of suitable circuits controlled by external signals.
2.1.2 ADVANTAGE 
With reference to the SRAM architecture described in Figure 1-3, during the 
read operation, the bit line load is being driven by the memory cell. On the other 
hand, during the write operation, the bit line load is being driven by the write buffer. 
Since the drive strength of the write buffer is much greater than that of the memory
cell, the effect of the bit line load is more significant for the read operation than
for the write operation. Hence, the advantage of variable bit line load is more
significant for the read operation.
If the bit line load can be made variable by breaking the bit line into
segments, the time to access memory cells at different locations can be varied [31].
For conventional synchronous or asynchronous memory systems, since the worst-case
memory cell access time is specified for the read / write cycle times, there is no
advantage in making the bit line load variable. For a self-timed system [1], since the
control signals are asynchronous in nature, or event-driven, the time delay between
the request signal and the acknowledge signal can vary. Therefore, if the bit
line load is made variable, the time to access memory cells at different locations
varies, and the read / write completion signal generated can be used as the
acknowledge signal for the memory system to communicate with other self-timed
systems. Since the time delay between the request signal and the acknowledge signal
depends on the value of the bit line load associated with the target memory cell, the
average speed performance of the memory system is increased when compared to
conventional synchronous or asynchronous memory systems.
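The average-speed argument can be made concrete with a small numerical sketch. The per-segment access times below are hypothetical values chosen only for illustration, and the sketch assumes that every segment is accessed equally often; neither is a result of this thesis.

```python
def average_access_time(segment_times_ns):
    """Mean access time when every segment is accessed equally often."""
    return sum(segment_times_ns) / len(segment_times_ns)


# Hypothetical read access times per bit line segment, in ns
# (the nearest segment sees the smallest bit line load).
segment_times_ns = [10.0, 14.0, 18.0, 22.0]

# A conventional memory specifies the worst case for every access.
worst_case_ns = max(segment_times_ns)

# A self-timed memory acknowledges as soon as the actual access finishes.
self_timed_ns = average_access_time(segment_times_ns)

speedup = worst_case_ns / self_timed_ns
print(f"worst case {worst_case_ns} ns, self-timed average {self_timed_ns} ns")
```

With these assumed numbers the self-timed average is lower than the worst case because only accesses to the farthest segment pay the full bit line load.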
2.2 CURRENT SENSING COMPLETION DETECTION
For our memory design, when the read / write operation of the required 
memory cell is finished, the completion signal is generated immediately as described 
previously in Section 1.2.2. The completion signal generated can then be used as the 
acknowledge signal for the memory system to communicate with other self-timed 
systems. We have investigated two completion signal generation methods for the 
read operation in our research based on current sensing and voltage sensing, the 
theory of which will be described in this section and Section 2.3 respectively. 
The Current Sensing Completion Detection (CSCD) method was proposed by
Dean, Dill and Horowitz [32]. This method is suitable for asynchronous systems 
employing the bundled data approach in the HCP. Because of this, the advantages of
CSCD include those of the bundled data approach over dual-rail encoding
[32], namely, reduced parasitic capacitance, removal of spacer tokens in the data
stream, and computation state similarity of consecutive data variables. Moreover, for
asynchronous systems using CSCD, the Boolean function blocks used in
synchronous designs can also be used, so that the conventional methods for
combinational circuit design and optimization can be applied. However, our main reason for
applying this method is that it is tailor-made for CMOS designs and it can be
implemented by simple circuits, as we will demonstrate in the following sections.
2.2.1 BLOCK DIAGRAM
When the outputs of a CMOS circuit are switching due to activity at the inputs,
that is, the circuit is in the switching state, the supply (VDD) or ground (VSS) current
increases dramatically. When all the inputs and outputs reach their corresponding
final logic levels, that is, the circuit is in the steady state, the VDD or VSS current
decreases to approximately zero. This relationship between the VDD or
VSS current and the circuit operation state for CMOS designs makes the CSCD
method possible. In fact, the CSCD method is specially designed for CMOS circuits.
The block diagram of a CMOS circuit with CSCD is shown in Figure 2-1.
Referring to the diagram, apart from the conventional CMOS logic block, the CSCD
implementation includes two current sensors, an input latch, a minimum delay
generator and a control output NOR gate. The function of the current sensor is to
detect a predefined current threshold and generate the sensor completion signal as
long as the current is below this threshold. Ideally, two current sensors, one
connected to VDD and the other connected to VSS, are required for accurate
completion detection. The input latch is used to minimize the variations in input
arrival times so that valid input combinations are evaluated. To guarantee
correct completion signal generation, the minimum delay generator is needed to
allow time for the current sensor to respond to the input changes. This signal,
together with the two current sensor completion signals, is combined to form the
final completion signal by the NOR gate.
Figure 2-1: A CMOS Circuit with CSCD
The circuit operates as follows. Assume that initially the circuit is in the 
steady state. When the input data are ready, the control signal triggers the input latch 
and the data are latched. The CMOS logic block then starts to switch, and the circuit 
enters the evaluation or switching state. The current sensors then monitor the VDD 
and VSS currents. The completion signals of the sensors will not be generated as 
long as the current is higher than a predefined threshold. When the VDD and VSS 
currents are lower than the threshold, the sensor completion signals are generated. 
These signals are combined with the minimum delay to form the final completion 
signal. 
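The way the NOR gate combines the three signals can be sketched behaviourally in Python. The signal polarity is an assumption made for this sketch: each sensor output is taken to be HIGH while its supply current is above the threshold, and the minimum delay output is taken to be HIGH until the delay has elapsed. The function names and numerical values are hypothetical.

```python
def sensor_busy(current_a, threshold_a):
    """Current sensor output: True while the current exceeds the threshold."""
    return current_a > threshold_a


def cscd_complete(i_vdd_a, i_vss_a, threshold_a, elapsed_ns, min_delay_ns):
    """NOR of the two sensor outputs and the 'delay still pending' flag."""
    delay_pending = elapsed_ns < min_delay_ns
    return not (
        sensor_busy(i_vdd_a, threshold_a)
        or sensor_busy(i_vss_a, threshold_a)
        or delay_pending
    )


# Just after the inputs are latched: currents have not risen yet, but the
# minimum delay keeps the completion signal from being generated too early.
print(cscd_complete(0.0, 0.0, 1e-4, elapsed_ns=0.5, min_delay_ns=1.0))
# Logic block switching: the VDD current is above the threshold.
print(cscd_complete(2e-3, 0.0, 1e-4, elapsed_ns=2.0, min_delay_ns=1.0))
# Steady state reached: both currents are below the threshold.
print(cscd_complete(1e-6, 1e-6, 1e-4, elapsed_ns=2.0, min_delay_ns=1.0))
```

The first case shows why the minimum delay generator is needed: without it, the low supply current immediately after latching would be mistaken for completion.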
2.2.2 GENERAL LSD CURRENT SENSOR 
In Section 2.2.1, we discussed how to implement CSCD for a CMOS
circuit. In the implementation, one of the key components is the current sensor.
Actually, the current sensor is specially designed for CSCD. According to Dean, Dill 
and Horowitz [32], there are two possible configurations for sensing the current flow 
between VDD and the logic block, which are Low Supply Drop (LSD) and Zero 
Supply Drop (ZSD). The voltage drop across the current sensor should be minimized 
to reduce the influence of the sensor on the operation and performance of the logic 
function being monitored. The LSD configuration is used in systems where the 
supply voltage and the logic voltage are the same (for example, CMOS technology). 
On the other hand, the ZSD configuration is used in systems where the supply
voltage is greater than the logic voltage (for example, 3.2 V BiCMOS technology
with a 5 V supply). Since we are using CMOS technology for our design, the LSD
configuration is chosen. As mentioned in Section 2.2.1, in the ideal case, two current
sensors are required; however, in practice, since the signal transitions in most
circuits are evenly distributed over the logic block, one current sensor is enough.
This is particularly important for the LSD configuration since the voltage drop across
the current sensors should be minimized.
The block diagram of a general LSD current sensor for connection to VDD is
shown in Figure 2-2. It consists of a sensing load L1, a voltage clamping transistor
T1 and a sense amplifier A1. The sensing load L1 is used to convert the supply
current proportionally to a voltage, which is one of the inputs to the sense amplifier.
The voltage clamping transistor T1 is used to limit the maximum sense voltage to a
value which has virtually no effect on the operation of the logic function, for
example, 0.7 V. Otherwise, if the supply current is large, the voltage drop across L1
will be large enough to affect the correct functioning of the CMOS logic block. The
voltage of the sensing load $V_{LOAD}$ is compared with a reference voltage $V_{REF}$ by the
sense amplifier, the output of which is the completion signal $V_{COMP}$ of the current
sensor. When the circuit is in the steady state, the supply current is virtually zero, and
$V_{COMP}$ is LOW. When the circuit is in the switching state, the supply current
increases dramatically. If the current is larger than a predefined threshold controlled
by $V_{REF}$, $V_{COMP}$ is pulled HIGH.
Figure 2-2: CMOS LSD Current Sensor
2.2.3 CMOS LSD CURRENT SENSOR 
To implement the LSD current sensor in CMOS technology, Pang et al. [5]
proposed a design with simple circuitry. The proposed CMOS LSD current
sensor for connection to VDD is shown in Figure 2-3. The transistor TP1 acts as the
sensing load and voltage clamp, which correspond to L1 and T1 in Figure 2-2. The
transistors TP2 and TN2 form a voltage divider, which generates the reference
voltage $V_{REF}$. The transistors TP3 and TN3 form an inverter or amplifier, which
corresponds to A1 in Figure 2-2. For this current sensor, the principle of the current
mirror is applied to convert the VDD current $I_{VDD}$ into a proportional current $I_{MIRROR}$.
$I_{MIRROR}$ is then used to charge up the inverter output $V_{COMP}$ if it is larger than a
predefined threshold controlled by the voltage divider output $V_{REF}$. Basically, the
operation of this circuit is the same as that described in Section 2.2.2 and will not be
repeated here.
Figure 2-3: Circuit Diagram of CMOS LSD Current Sensor
Now we are going to derive the relationship between $I_{VDD}$ and $V_{COMP}$
analytically. For the transistor TP1, since the gate and drain are connected together,
(2-1)    $V_{DS(TP1)} = V_{GS(TP1)}$
and TP1 is always in saturation. Applying the current equation of a MOSFET in the
saturation region:
(2-2)    $I_{DS(TP1)} = \frac{\beta_{(TP1)}}{2} \left( V_{GS(TP1)} - V_{T(TP1)} \right)^2$
(2-3)    $\therefore V_{DS(TP1)} = V_{T(TP1)} + \sqrt{\frac{2 I_{DS(TP1)}}{\beta_{(TP1)}}}$
Hence TP1 acts as a MOS diode or an active load in which $V_{DS(TP1)}$ increases
with the square root of $I_{DS(TP1)}$, and the voltage drop across the diode is
clamped at a $V_{DS(TP1)}$ value a bit greater than $V_{T(TP1)}$. The supply voltage to the CMOS
logic block is thus reduced to
(2-4)    $V_{DD(BLOCK)} = V_{DD} - V_{DS(TP1)}$
This may not affect the CMOS logic block functionally since $V_{DS(TP1)}$ is small, but the
timing response is affected.
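To give a feeling for the size of this supply drop, the following Python sketch evaluates the voltage across the diode-connected TP1, taken as the threshold voltage plus the square root of twice the drain current divided by the transconductance parameter, and the resulting supply voltage of the logic block. The device values are hypothetical and chosen only for illustration; they are not the values used in this thesis.

```python
import math


def diode_drop(i_ds, beta, v_t):
    """Voltage across diode-connected TP1: V_DS = V_T + sqrt(2 * I_DS / beta)."""
    return v_t + math.sqrt(2.0 * i_ds / beta)


# Hypothetical device values (assumptions for illustration only).
BETA_TP1 = 2e-3  # transconductance parameter, A/V^2
V_T_TP1 = 0.7    # threshold voltage magnitude, V
V_DD = 5.0       # supply voltage, V

for i_ds in (0.0, 1e-4, 1e-3):
    v_ds = diode_drop(i_ds, BETA_TP1, V_T_TP1)
    v_block = V_DD - v_ds  # supply voltage seen by the CMOS logic block
    print(f"I_DS = {i_ds:.0e} A: V_DS = {v_ds:.3f} V, V_DD(BLOCK) = {v_block:.3f} V")
```

Because of the square-root dependence, quadrupling the supply current only doubles the part of the drop above the threshold voltage, which is what keeps the drop small over a wide current range.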
Since transistors TP1 and TP3 form a current mirror,
(2-5)    $\frac{I_{DS(TP1)}}{I_{DS(TP3)}} = \frac{\left( 1 + \lambda_{(TP1)} V_{DS(TP1)} \right) (W/L)_{(TP1)}}{\left( 1 + \lambda_{(TP3)} V_{DS(TP3)} \right) (W/L)_{(TP3)}}$
Assuming that the channel length modulation terms $\lambda_{(TP1)}$ and $\lambda_{(TP3)}$ are small,
(2-6)    $\frac{I_{DS(TP1)}}{I_{DS(TP3)}} = \frac{(W/L)_{(TP1)}}{(W/L)_{(TP3)}}$
Substituting for $I_{VDD}$ and $I_{MIRROR}$ in (2-6), we have
(2-7)    $I_{MIRROR} = \frac{(W/L)_{(TP3)}}{(W/L)_{(TP1)}} I_{VDD}$
Transistors TP2 and TN2 are active loads, and because they are both in the
saturation region, applying KCL, we have
(2-8)    $\frac{\beta_{(TP2)}}{2} \left( V_{DD} - V_{REF} - V_{T(TP2)} \right)^2 = \frac{\beta_{(TN2)}}{2} \left( V_{REF} - V_{T(TN2)} \right)^2$
(2-9)    $\therefore V_{REF} = \frac{V_{T(TN2)} \sqrt{\beta_{(TN2)} / \beta_{(TP2)}} + V_{DD} - V_{T(TP2)}}{1 + \sqrt{\beta_{(TN2)} / \beta_{(TP2)}}}$
Hence TP2 and TN2 form a voltage divider with $V_{REF}$ as the output voltage.
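The divider output can be checked numerically. The sketch below computes the reference voltage from the closed-form solution of the KCL balance between the saturated TP2 and TN2, and then verifies that the two drain currents are indeed equal at that voltage. All device values are hypothetical and threshold voltages are taken as magnitudes.

```python
import math


def v_ref(v_dd, v_t_p, v_t_n, beta_p, beta_n):
    """Divider output with TP2 and TN2 both saturated (closed form of the KCL balance)."""
    r = math.sqrt(beta_n / beta_p)
    return (v_t_n * r + v_dd - v_t_p) / (1.0 + r)


def kcl_residual(v, v_dd, v_t_p, v_t_n, beta_p, beta_n):
    """TP2 saturation current minus TN2 saturation current at output voltage v."""
    i_p = 0.5 * beta_p * (v_dd - v - v_t_p) ** 2
    i_n = 0.5 * beta_n * (v - v_t_n) ** 2
    return i_p - i_n


# Hypothetical values: TN2 four times stronger than TP2 pulls V_REF down.
params = dict(v_dd=5.0, v_t_p=0.7, v_t_n=0.7, beta_p=1e-3, beta_n=4e-3)
v = v_ref(**params)
print(f"V_REF = {v:.3f} V, KCL residual = {kcl_residual(v, **params):.2e} A")
```

Since the threshold of the sensor is controlled by the reference voltage, the ratio of the two transconductance parameters is the design knob in this sketch: equal devices place the reference at mid-supply, and a stronger TN2 lowers it.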
Let the loading capacitance at the completion signal output of the current
sensor be $C_{OUT}$. When $V_{COMP} < V_{REF} - V_{T(TN3)}$, transistor TN3 is in the linear region, and
we have
(2-10)   $I_{DS(TN3)} = \beta_{(TN3)} \left[ \left( V_{REF} - V_{T(TN3)} \right) V_{COMP} - \frac{V_{COMP}^2}{2} \right]$
The equation for charging up the capacitor at the current sensor output node is
(2-11)   $C_{OUT} \frac{dV_{COMP}}{dt} = I_{MIRROR} - I_{DS(TN3)}$
Integrating both sides
(2-12)   $\frac{1}{C_{OUT}} \int_0^t dt = \int_0^{V_{COMP}} \frac{dV_{COMP}}{\frac{\beta_{(TN3)}}{2} V_{COMP}^2 - \beta_{(TN3)} \left( V_{REF} - V_{T(TN3)} \right) V_{COMP} + I_{MIRROR}}$
Substituting (2-7) and after simplifications, we get
(2-13)   $V_{COMP}(t) = \frac{\left[ \beta_{(TN3)} \left( V_{REF} - V_{T(TN3)} \right) - A(I_{VDD}) \right] e^{B(t, I_{VDD})} - \left[ \beta_{(TN3)} \left( V_{REF} - V_{T(TN3)} \right) + A(I_{VDD}) \right]}{\beta_{(TN3)} \left( e^{B(t, I_{VDD})} - 1 \right)}$
where
(2-14)   $A(I_{VDD}) = \sqrt{\beta_{(TN3)}^2 \left( V_{REF} - V_{T(TN3)} \right)^2 - 2 \beta_{(TN3)} \frac{(W/L)_{(TP3)}}{(W/L)_{(TP1)}} I_{VDD}}$
(2-15)   $B(t, I_{VDD}) = \frac{t}{C_{OUT}} A(I_{VDD}) + \ln \left( \frac{\beta_{(TN3)} \left( V_{REF} - V_{T(TN3)} \right) + A(I_{VDD})}{\beta_{(TN3)} \left( V_{REF} - V_{T(TN3)} \right) - A(I_{VDD})} \right)$
and $V_{REF}$ is given by (2-9). From this expression, we get $V_{COMP}(0) = 0$ and $V_{COMP}(t)$
increases with the inverse of the subtracted exponential of t if $I_{VDD}$ is constant.
When $V_{COMP} > V_{REF} - V_{T(TN3)}$, transistor TN3 is in the saturation region.
Performing analysis similar to the above, we get
(2-16)   $I_{DS(TN3)} = \frac{\beta_{(TN3)}}{2} \left( V_{REF} - V_{T(TN3)} \right)^2$
(2-17)   $C_{OUT} \frac{dV_{COMP}}{dt} = I_{MIRROR} - I_{DS(TN3)}$
and finally
(2-18)   $V_{COMP}(t) = \frac{t - t_0}{C_{OUT}} \left[ \frac{(W/L)_{(TP3)}}{(W/L)_{(TP1)}} I_{VDD} - \frac{\beta_{(TN3)}}{2} \left( V_{REF} - V_{T(TN3)} \right)^2 \right] + \left( V_{REF} - V_{T(TN3)} \right)$
where $t_0$ is calculated by substituting $V_{COMP} = V_{REF} - V_{T(TN3)}$ in
(2-13), which is the time when the transistor TN3 is just saturated. From this
expression, we get $V_{COMP}(t_0) = V_{REF} - V_{T(TN3)}$ and $V_{COMP}(t)$ increases linearly with t if
$I_{VDD}$ is constant.
Equations (2-13) and (2-18) show how the completion signal $V_{COMP}$ charges up when
$I_{VDD}$ is above the threshold set by the voltage divider. The $V_{COMP}$ discharge equations are
similar to the above and will not be derived here. Since $V_{COMP}$ depends on $I_{VDD}$, if we
want to plot $V_{COMP}$ against time, we have to know how $I_{VDD}$ increases with time.
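For a constant mirrored current, the charging behaviour can also be explored numerically instead of analytically. The following Python sketch integrates the output node equation with the forward Euler method, using the linear-region expression for the TN3 current below the overdrive voltage and the saturation expression above it. The device values are hypothetical, `V_OV` stands for the reference voltage minus the TN3 threshold voltage, and the sketch ignores the clamping of the output at the supply voltage.

```python
def tn3_current(v_comp, v_ov, beta):
    """TN3 drain current: linear region below v_ov, saturation at or above it."""
    if v_comp < v_ov:
        return beta * (v_ov * v_comp - 0.5 * v_comp ** 2)
    return 0.5 * beta * v_ov ** 2


def charge_vcomp(i_mirror, v_ov, beta, c_out, t_end, steps=10000):
    """Forward Euler integration of C_OUT * dV/dt = I_MIRROR - I_DS(TN3)."""
    dt = t_end / steps
    v = 0.0
    trace = [v]
    for _ in range(steps):
        v += dt * (i_mirror - tn3_current(v, v_ov, beta)) / c_out
        trace.append(v)
    return trace


# Hypothetical values, for illustration only.
BETA_TN3 = 1e-4  # A/V^2
V_OV = 1.0       # V, stands for V_REF - V_T(TN3)
C_OUT = 100e-15  # F

# TN3 saturation current here is 0.5 * BETA_TN3 * V_OV**2 = 50 uA.
above = charge_vcomp(100e-6, V_OV, BETA_TN3, C_OUT, t_end=5e-9)
below = charge_vcomp(25e-6, V_OV, BETA_TN3, C_OUT, t_end=5e-9)
print(f"mirror current above threshold: final V_COMP = {above[-1]:.2f} V")
print(f"mirror current below threshold: final V_COMP = {below[-1]:.2f} V")
```

In this sketch the output only rises past the overdrive voltage, and then keeps rising at a constant slope, when the mirrored current exceeds the saturation current of TN3; a smaller current leaves the output settled at a low level.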
2.3 VOLTAGE SENSING COMPLETION DETECTION
In Section 2.2, we have introduced the CSCD method which is suitable for 
generating the completion signal for general CMOS circuits. If implemented suitably, 
this method can be used to generate the completion signal for a memory system when 
the read operation of the required memory cell is finished. In this section, we
introduce another method based on voltage sensing. We propose the Dual-Rail
Voltage Sensing Completion Detection (DVSCD) method. This method is
chosen because the data are dual-rail encoded internally in memory circuits, and this
method suits the purpose. In fact, it is specially designed for memory circuits.
Also, unlike the CSCD method, in which the completion detection circuit consists of
a special CMOS circuit (the current sensor), this method requires only standard logic
components to form the completion detection circuit, which is relatively easy to
implement.
2.3.1 DATA READING IN MEMORY CIRCUIT
To understand how the DVSCD method works, the data reading mechanism 
of a general memory circuit should be introduced first since the DVSCD method is 
specially designed for memory systems. 
The simplified circuit diagram during the SRAM read cycle for one cell is 
shown in Figure 2-4. For memory circuits, data is read by sensing the memory cell 
content through the use of a sense amplifier [27], [29]-[30], [33]-[36]. The SRAM 
read cycle is briefly described as follows. 
First of all, after decoding the address, the word line signal WL of the
selected memory cell is enabled, and the bit
line pair B and BB are charged up by the precharge buffer to a predefined voltage
level, for example VDD. When the bit line loads are fully charged, the precharge
signal is disabled. The memory output node with a logic LOW stored then starts to pull
down one of the bit lines. Since the drive strength of the memory cell is
small and the value of the bit line load is large, it takes a long time to pull down that
particular bit line. Therefore, a sense amplifier is used to sense the memory cell
content. It generates a logic HIGH when the inverting input is sensed to be
pulling down; on the other hand, it generates a logic LOW when the non-inverting
input is sensed to be pulling down.
Figure 2-4: Circuit Diagram of SRAM Read Cycle
2.3.2 BLOCK DIAGRAM
The DVSCD method generates the completion signal when the sense
amplifier has finished sensing the memory cell content. To understand how it works,
refer to the block diagram of DVSCD shown in Figure 2-5. It consists of two sense
amplifiers and one NOR gate, which are standard CMOS design components. Each
sense amplifier is identical to that for sensing the memory cell content, that is, the
data sense amplifier shown in Figure 2-4. The non-inverting inputs of the two sense
amplifiers $SA_B$ and $SA_{BB}$ are tied to VDD, whereas the inverting inputs are connected
to the two bit lines B and BB. The completion signal $V_{COMP}$ is obtained at the NOR
gate output.
Figure 2-5: Block Diagram of DVSCD
It operates as follows. When the bit line pair (B and BB) is precharged to
VDD, the outputs of $SA_B$ and $SA_{BB}$ are pulled LOW. This can be preset by
scaling the transistor sizes used in the sense amplifiers properly. At this point, $V_{COMP}$
is pulled HIGH. When the precharge signal is disabled, the memory cell content is
sensed not only by the data sense amplifier, but also by $SA_B$ and $SA_{BB}$. When
the data sense amplifier has just finished sensing the memory cell content, the output
of either $SA_B$ or $SA_{BB}$ will also be pulled HIGH since one of the bit lines will
be pulled down. At this time, $V_{COMP}$ will be pulled LOW, which signifies the
completion of the read operation.
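The logic of this operation can be summarised in a behavioural Python sketch. The sense amplifiers are idealised as comparators with a fixed switching margin, which stands in for the transistor sizing that keeps their outputs LOW at precharge; the margin, the supply value and the function names are assumptions for illustration only.

```python
def sense_amp(non_inverting_v, inverting_v, margin_v):
    """Idealised sense amplifier: HIGH once the inverting input has fallen
    at least `margin_v` below the non-inverting input."""
    return (non_inverting_v - inverting_v) >= margin_v


def dvscd_vcomp(v_b, v_bb, v_dd=5.0, margin_v=0.2):
    """Completion signal: NOR of the two sense amplifiers watching B and BB."""
    sa_b = sense_amp(v_dd, v_b, margin_v)    # non-inverting input tied to VDD
    sa_bb = sense_amp(v_dd, v_bb, margin_v)
    return not (sa_b or sa_bb)


print(dvscd_vcomp(5.0, 5.0))  # both bit lines precharged: completion HIGH
print(dvscd_vcomp(4.9, 5.0))  # B has only started to fall: still HIGH
print(dvscd_vcomp(4.5, 5.0))  # B pulled down far enough: completion LOW
```

Because exactly one bit line of the dual-rail pair is pulled down in every read, the falling edge of the completion signal occurs regardless of whether a logic 0 or a logic 1 is stored.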
2.4 MULTIPLE DELAYS COMPLETION GENERATION 
In the last two sections, we have introduced the CSCD method and the 
DVSCD method for generating the completion signal for the read operation. Now we 
are going to introduce the method for generating the write completion signal. 
To generate the completion signal, we have to sense the
signals of the circuit that are critical to the completion of the operation. To achieve
the appropriate sensing, a sensor is needed, which is either added to the circuit or
stands alone to sense the signals involved. For the read operation, since the critical
circuit for the read completion is the sense amplifier, the number of which is small in 
the whole memory system, it is both possible and practical to add a sensor to each 
sense amplifier and perform either current or voltage sensing. However, for the write 
operation, since the critical circuit for the write completion is the memory cell, the 
number of which is very large in the entire memory system, it is difficult and 
impractical to add a sensor to each memory cell and perform the sensing. Therefore, 
in our research, the write completion signal will not be generated in a way similar to 
the read completion signal. 
2.4.1 ADVANTAGE 
To generate the write completion signal, we proposed the Multiple Delays 
Completion Generation (MDCG) method. The advantage of this method is based on 
the variable bit line load assumption described in Section 2.1. If the bit line load can 
be made variable by breaking the bit line into segments, the time to write memory 
cells at different locations can be varied. Therefore, if for each segment, the worst 
case memory write time is known, a unique worst case delay for that particular 
segment can be used to generate the write completion signal. In this way, different 
worst case delays are used for different segments, and the write completion signal is 
generated by one of those delays during each write cycle. Therefore, the average 
memory write cycle time is reduced when compared to that required by the 
conventional method in which one worst case delay is used for the entire memory 
matrix. 
Alternatively, it is possible to generate the write completion signal by sensing
the bit line voltage level. During the write operation, the bit lines in use are sensed
by sense amplifiers. When the bit line voltage difference increases to a certain value,
the data should have been written to the target memory cell and the write completion signal
is generated by the sense amplifiers. However, the reliability of this method is
questionable since the threshold to write a memory cell varies over the entire
memory matrix. Moreover, the bit lines would have to be precharged before the data is
written, which lengthens the write operation time. Also, the sense amplifiers consume
more area and power than the delay circuits. As a result, this is not a good alternative.
2.4.2 BLOCK DIAGRAM
The block diagram of the write completion circuit employing the MDCG 
method is shown in Figure 2-6. It consists of a segment decoder and a delay 
generator. The delay generator consists of all the different worst case delays for each 
segment. It operates as follows. At the beginning of each write cycle before the 
request signal is enabled, the address bits critical in deciding the segment 
corresponding to the target memory cell are input to the segment decoder. The 
segment decoder then outputs the appropriate signals to control the delay generator 
such that the worst case delay for that particular segment is chosen. Afterwards, the 
request signal is enabled and the write completion signal is generated by the 
appropriately delayed request signal. 
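The operation just described can be sketched in Python as follows. The sketch assumes four segments selected by the two most significant bits of a 10-bit row address, and the worst-case delays per segment are hypothetical values; which address bits are actually used, and the real delays, depend on the implementation.

```python
# Hypothetical worst-case write delays per bit line segment, in ns.
SEGMENT_DELAYS_NS = [6.0, 8.0, 10.0, 12.0]
ROW_ADDRESS_BITS = 10  # 1024 rows


def segment_of(row_address):
    """Segment decoder: the two most significant row bits pick one of four segments."""
    return row_address >> (ROW_ADDRESS_BITS - 2)


def write_completion_time(request_time_ns, row_address):
    """Delay generator: the request, delayed by the selected segment's worst case."""
    return request_time_ns + SEGMENT_DELAYS_NS[segment_of(row_address)]


for row in (0, 300, 700, 1023):
    done = write_completion_time(100.0, row)
    print(f"row {row}: segment {segment_of(row)}, write completion at {done} ns")
```

A conventional design would use the largest entry of the delay table for every write, whereas here only writes to the last segment pay that delay.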
Figure 2-6: MDCG Write Completion Circuit
3. IMPLEMENTATION
In Chapter 2, the concept of variable bit line load and the techniques for
generating the read / write completion signals are described. How can these
concepts and techniques be implemented in the memory circuit? In this chapter, we propose
a design for the asynchronous memory system. We will take the 1M-bit SRAM as
an example. First of all, we will describe the 1M-bit SRAM framework in Section
3.1. The implementation of the control circuit will be described in Section 3.2. After
that, we will focus on the implementation of the techniques presented in Chapter 2.
These implementations include the variable bit line load in Section 3.3, the CSCD
method in Section 3.4, the DVSCD method in Section 3.5 and the MDCG method in
Section 3.6.
3.1 1M-BIT SRAM FRAMEWORK
3.1.1 INTRODUCTION
In our research, we have chosen to design the asynchronous memory system
based on the 1M-bit SRAM framework. There are two reasons for using the 1M-bit or
128k-byte memory size. One reason is that this memory size is large enough to
demonstrate the advantage of variable bit line load, that is, the read / write access
time difference between the memory cells at extreme positions is significant. Another
reason is that this memory size is in fact a basic unit in building SRAM systems
nowadays. To build an 8M-bit or 1M-byte SRAM system, one can either
connect eight 1M-bit SRAM chips together with external control circuitry added or
incorporate eight 1M-bit SRAM layouts inside a single chip with the internal control
circuitry modified. The reason for using SRAM, as mentioned previously in Section
1.3.2, is that SRAM is easier and simpler to implement in an asynchronous way
than DRAM.
3.1.2 FRAMEWORK
The 1M-bit SRAM framework is shown in Figure 3-1. It follows
the SRAM block diagram shown in Figure 1-3. The components drawn with solid
lines are those which can be found in a conventional SRAM. These components
include the row decoder, column decoder, pass transistor array, address buffer, read
buffer, write buffer, precharge buffer and data sense amplifier, which will be
described in the coming paragraphs.
Figure 3-1: 1M-bit SRAM framework
The row decoder and the column decoder are used to select one of the rows and
one of the columns respectively during each read / write cycle. For the 1M-bit memory
system, since there are 1024 rows and 1024 columns, both the row decoder and the
column decoder are 10-to-1024 decoders. Each decoder is mainly composed of two
stages of NAND gates cascaded together in order to maximize the speed.
The pass transistor array is controlled by the column decoder and is used to 
connect the bit line pair of the selected memory cell with the bit line pair outside the 
memory matrix. Since there are 1024 columns, the array consists of 1024 pass 
transistors which can be NMOS pass transistors, PMOS pass transistors or 
transmission gates. Usually, NMOS pass transistors are used due to their small size.
The address buffer is used to drive the NAND gates in the row and column
decoders. It is composed of large inverters.
The read and write buffers, which are composed of tri-state buffers, are used
to control data flow. For a one-bit data bus, the read buffer is simply a single
tri-state buffer. Since the write buffer is connected to the bit line pair, which is
dual-rail, two tri-state buffers are required for a one-bit data bus.
The precharge buffer is used to precharge the bit line load during the read
operation. It is composed of two large PMOS transistors and one small PMOS
transistor. The sources of the two large PMOS transistors are connected to VDD
and their drains are connected to the bit line pair. The source and drain of the small
PMOS transistor are connected to the two lines of the bit line pair. The gates of the
three PMOS transistors are connected together to form the enable pin of the buffer.
In this way, during the precharge period, the precharge signal is pulled LOW so that
the three PMOS transistors are turned on. The two large PMOS transistors then start to
precharge the bit line load, while the small PMOS transistor equalizes the voltage
level of the two bit lines.
The data sense amplifier is used to detect the memory cell content by sensing
the voltage difference of the bit line pair. The conventional data sense amplifier [27],
[29], [30], [34], [36], [37], shown on the left of Figure 3-2, is composed of
five transistors. The B and BB pins are connected to the bit line pair, the D pin is the
output and the E pin is the enable. However, in order to increase speed, the data
sense amplifier can be modified as shown on the right of Figure 3-2. The modified
data sense amplifier connects the inputs of two conventional data sense
amplifiers together and connects their two outputs to the two nodes of a ring
formed by two inverters. The modified sense amplifier is more than two times faster
than the conventional one, but its power consumption is doubled. Either sense
amplifier can be used in the proposed memory system.
The components drawn with dashed lines in Figure 3-1 are either newly designed
circuits or conventional circuits modified for our proposed memory system. These
components include the control circuit, segment decoder, modified memory matrix, read
completion circuit and write completion circuit. They are described
in more detail in the following sections.
[Figure 3-2 schematics; panel titles: Conventional Data Sense Amplifier, Modified Data Sense Amplifier; legible pin labels: B, BB, D, E]
Figure 3-2: Data Sense Amplifiers
3.2 CONTROL CIRCUIT
3.2.1 CONTROL SIGNALS
As described previously in Section 1.3.3, the proposed asynchronous memory
system is able to communicate with other asynchronous systems based on the
Four-Phase HCP. In other words, the control circuit of the system should be able to
communicate with other asynchronous systems according to the read / write timing
diagrams depicted in Figure 1-4 and Figure 1-5 respectively. To achieve this, a
control circuit different from the conventional one should be used. Before
implementing the control circuit, we first have to decide how many control
signals are needed and what the relationships between them are.
3.2.1.1 EXTERNAL CONTROL SIGNALS 
Externally, the control circuit should be able to handle two input signals. The
first one is the request signal (REQ), which triggers the read / write event. The
second one is the read enable signal (RW), which differentiates between the read and
write operations. For a read operation, RW is pulled HIGH; for a write operation, RW is
pulled LOW. On the other hand, the control circuit should be able to generate one
external output signal, the acknowledge signal (ACK), to signify the end of the
current operation. It should be noted that only REQ and ACK are required for
communication according to the Four-Phase HCP.
3.2.1.2 INTERNAL CONTROL SIGNALS 
Internally, for the read operation, the control circuit should be able to handle
one input signal and two output signals. When the read operation is triggered
by REQ, the bit line load is first precharged by the precharge buffer. Therefore, the
control circuit should be able to generate the precharge signal (PC). Afterwards, the
control circuit monitors the read completion through the read completion input
signal (RC), which is generated by the read completion circuit. When the read data is
ready, the control circuit should generate the enable signal for the read buffer (CR) to
output the read data.
For the write operation, the control circuit should be able to handle one input
signal and one output signal internally. When the write operation is triggered
by REQ, the control circuit should generate the enable signal for the write buffer
(CW) to input the write data. The control circuit then monitors the write
completion through the write completion input signal (WC), which is generated by
the write completion circuit.
3.2.2 READ / WRITE STATE TRANSITION GRAPHS 
With the external and internal control signals introduced, we are now in
a position to describe the signal transitions during the read and write operations.
The signal transition graphs (STG) for the read and write cycles are shown in Figure 3-3.
The "+" sign denotes a transition from LOW to HIGH whereas the "-" sign denotes the
opposite. For the read operation, when REQ rises, RC also rises and PC falls
to enable the precharge buffer. When the precharge period ends, PC rises and the
memory cell content is sensed by the sense amplifier. When the data has been read, RC
falls and CR rises to output the read data. ACK then rises to signify the end
of the operation. When the external control unit has finished getting the read data, it
resets REQ. CR and ACK then fall and the memory system is ready for the
next cycle.
For the write operation, when REQ rises, CW rises to input the write
data. When the data has been written, WC rises and ACK rises to signify the end of
the operation. The external control unit then resets REQ; CW, WC and ACK
fall and the memory system is ready for the next cycle.
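The transition orderings described above can be checked mechanically. The following Python sketch is illustrative only and is not part of the original design. It linearizes each cycle into one possible event order (the text allows RC+ and PC- to occur together after REQ+). The idle levels (every signal LOW except PC) and the helper names replay and is_four_phase are assumptions made for this example.

```python
# Event orderings transcribed from the prose description of the two cycles.
# "+" is a LOW-to-HIGH transition, "-" is a HIGH-to-LOW transition.
READ_CYCLE = ["REQ+", "PC-", "RC+", "PC+", "RC-", "CR+", "ACK+",
              "REQ-", "CR-", "ACK-"]
WRITE_CYCLE = ["REQ+", "CW+", "WC+", "ACK+", "REQ-", "CW-", "WC-", "ACK-"]

# Assumed idle levels: PC is an inverted pulse, so it rests HIGH.
IDLE = {"REQ": False, "ACK": False, "PC": True, "RC": False,
        "CR": False, "CW": False, "WC": False}


def replay(trace, idle):
    """Apply each transition in order and return the final signal levels.

    Raises ValueError if a transition would not change the signal, for
    example a "+" event on a signal that is already HIGH.
    """
    levels = dict(idle)
    for event in trace:
        name, edge = event[:-1], event[-1]
        target = edge == "+"
        if levels[name] == target:
            raise ValueError(f"{event} does not change {name}")
        levels[name] = target
    return levels


def is_four_phase(trace):
    """Check the four-phase ordering REQ+ < ACK+ < REQ- < ACK-."""
    order = [trace.index(e) for e in ("REQ+", "ACK+", "REQ-", "ACK-")]
    return order == sorted(order)


if __name__ == "__main__":
    for label, trace in (("read", READ_CYCLE), ("write", WRITE_CYCLE)):
        back_to_idle = replay(trace, IDLE) == IDLE
        print(label, "returns to idle:", back_to_idle,
              "| four-phase order:", is_four_phase(trace))
```

Both traces return every signal to its idle level, which is what allows the memory system to accept the next request.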
[Figure 3-3 signal transition graphs; panel titles: READ CYCLE, WRITE CYCLE; legible transitions: PC-, RC+, CR-, ACK- (read) and CW+, WC-, ACK- (write)]
Figure 3-3: Read / Write State Transition Graphs
3.2.3 IMPLEMENTATION
To implement the control circuit, we have to express all the output signals in
terms of the input signals. As described previously in Sections 3.2.1.1 and 3.2.1.2,
the input signals are REQ, RW, RC and WC. In principle, the output signals can be
generated by Boolean combinations of these input signals. In practice, however,
two delay elements should be used. The first delay element
allows for the worst-case address decoding time: REQ is delayed and
REQ_WL is the delayed output. This ensures that if the requesting device asserts the
request signal simultaneously with the address, the address can still be decoded and
the required memory cell selected before the read / write operation. The second delay
element is used to generate PC, which is an inverted pulse: REQ_WL is
further delayed and REQ_PC is the delayed output.
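The role of the two delay elements can be illustrated with a small time-domain model. This is a sketch under stated assumptions rather than the circuit itself: REQ is taken to rise at t = 0, the delay values d_wl and d_pc are hypothetical numbers, and PC is assumed to follow the form PC = NOT(REQ_WL AND NOT REQ_PC AND RW), which is how expression (3-4) is read here.

```python
def delayed_step(t, rise_time):
    """Level of a signal that steps from LOW to HIGH at rise_time."""
    return t >= rise_time


def pc_level(t, d_wl, d_pc, rw=True):
    """Level of the precharge signal PC at time t, with REQ rising at t = 0.

    d_wl : assumed delay of the address-decoding delay element (REQ -> REQ_WL)
    d_pc : assumed delay of the precharge delay element (REQ_WL -> REQ_PC)
    rw   : True for a read, False for a write
    """
    req_wl = delayed_step(t, d_wl)
    req_pc = delayed_step(t, d_wl + d_pc)
    # Assumed reading of expression (3-4): PC is LOW only while REQ_WL has
    # risen, REQ_PC has not yet risen, and the operation is a read.
    return not (req_wl and not req_pc and rw)


if __name__ == "__main__":
    d_wl, d_pc = 2.0, 3.0  # hypothetical delays in ns
    for t in (0.0, 1.0, 2.0, 4.0, 5.0, 6.0):
        level = "HIGH" if pc_level(t, d_wl, d_pc) else "LOW"
        print(f"t = {t:3.1f} ns  PC = {level}")
```

Under these assumptions PC is LOW for exactly d_pc after REQ_WL rises, so the precharge duration is set by the second delay element alone, and PC never pulses during a write.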
As a result, the Boolean expressions for CW, CR, ACK and PC are derived 
as follows: 
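(3-1)  $CW = REQ_{WL} \cdot \overline{RW} = \overline{\overline{REQ_{WL}} + RW}$
(3-2)  $CR = REQ_{WL} \cdot RW \cdot \overline{RC} \cdot PC = \overline{\overline{REQ_{WL}} + \overline{RW} + \overline{\overline{RC} \cdot PC}}$
(3-3)  $ACK = CR + CW \cdot WC = \overline{\overline{CR} \cdot \overline{CW \cdot WC}}$
(3-4)  $PC = \overline{REQ_{WL}} + REQ_{PC} + \overline{RW} = \overline{REQ_{WL} \cdot \overline{REQ_{PC}} \cdot RW}$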
These expressions follow directly from the descriptions in Sections 3.2.1 and 3.2.2.
The implementations are race free since, for each
expression, the corresponding inputs do not change their states simultaneously.
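As a sanity check, the control expressions can be evaluated in software. The sketch below is not the author's implementation. It assumes the expressions read as CW = REQ_WL AND NOT RW, CR = REQ_WL AND RW AND NOT RC AND PC, ACK = CR OR (CW AND WC), and PC = NOT REQ_WL OR REQ_PC OR NOT RW, and it compares each sum-of-products form with the corresponding NOR/NAND gate form over all input combinations. The function names are hypothetical.

```python
from itertools import product


def control_outputs(req_wl, req_pc, rw, rc, wc):
    """Control outputs in product/sum form (assumed reading of 3-1 to 3-4)."""
    pc = (not req_wl) or req_pc or (not rw)          # (3-4)
    cw = req_wl and not rw                           # (3-1)
    cr = req_wl and rw and (not rc) and pc           # (3-2)
    ack = cr or (cw and wc)                          # (3-3)
    return {"CW": cw, "CR": cr, "ACK": ack, "PC": pc}


def control_outputs_gate_form(req_wl, req_pc, rw, rc, wc):
    """The same outputs written as the NOR/NAND forms of each expression."""
    pc = not (req_wl and (not req_pc) and rw)                        # NAND
    cw = not ((not req_wl) or rw)                                    # NOR
    cr = not ((not req_wl) or (not rw) or (not ((not rc) and pc)))   # NOR
    ack = not ((not cr) and (not (cw and wc)))                       # NAND
    return {"CW": cw, "CR": cr, "ACK": ack, "PC": pc}


# Exhaustive comparison of the two forms over all 32 input combinations.
FORMS_AGREE = all(
    control_outputs(*bits) == control_outputs_gate_form(*bits)
    for bits in product([False, True], repeat=5)
)

if __name__ == "__main__":
    print("forms agree:", FORMS_AGREE)
    # Read cycle after precharge, with read completion signalled (RC LOW).
    print(control_outputs(req_wl=True, req_pc=True, rw=True, rc=False, wc=False))
    # Write cycle with write completion signalled (WC HIGH).
    print(control_outputs(req_wl=True, req_pc=True, rw=False, rc=False, wc=True))
```

The exhaustive comparison is only a De Morgan check on the assumed expressions. It says nothing about timing, which is where the two delay elements matter.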
According to equations (3-1) - (3-4), the control circuit for the proposed
asynchronous memory system is implemented as shown in Figure 3-4. A large part of
the circuit can be implemented by standard digital logic gates with proper drive
strengths. The two delay circuits can be implemented by inverter chains if the exact
delay values are known for the address decoding and bit line precharging. However,
before the proposed memory circuit is fabricated, we can only estimate these values.
Therefore, to be safe, voltage-controlled delays [38], [39] are used instead of inverter
chains for the first test chip. For this kind of delay circuit, the delay value is
controlled by a voltage, so it can be tuned after fabrication. In that
case, two more control pins are added to the control circuit to control the two
delay circuits. This is acceptable in research since the test chip is used only to test
the theory.
[Figure 3-4 schematic; legible block labels: Delay (WL), Delay (PC)]
Figure 3-4: Four-Phase HCP Control Circuit
3.3 BIT LINE SEGMENTATION
The modified memory matrix introduced in Section 3.1 differs from the
conventional memory matrix in that its bit line load is variable. To implement the
concept of variable bit line load described in Section 2.1, one possible method is to
quantize the bit line load value. This can be done by breaking the bit line into
segments through the insertion of pass transistors or transmission gates.
At this point, one may ask why we segment the bit line to reduce the average bit
line load for the read and write operations. Alternatively, the bit line load can be
reduced by breaking the memory matrix into sub-matrices whose bit
lines are combined by a multiplexer. We want to make clear
that we are already discussing memory sub-matrices, as stated in Section 3.1.1;
the bit line segmentation technique is applied inside each memory sub-matrix.
One may also argue that a single memory sub-matrix can itself be
divided into smaller sub-matrices with multiplexed bit lines. However, this
method may not be better than bit line segmentation, because much
more area and power are consumed: the multiplexer used for combining the bit
lines is far more complicated than the pass transistors or transmission gates used for
bit line segmentation. Although the average bit line load can be further decreased, it
is believed that the decrease will be small since the multiplexer adds extra
parasitic load.
3.3.1 FOUR REGIONS SEGMENTATION
Theoretically, the more segments the bit line pair is broken into, the
more realistic the generated completion signal is. In practice, however,
more segments require more transmission gates, which consume more area and
power. Also, more parasitic capacitance and resistance are added to the bit line
pair, which affects the signal propagation. Therefore, if the bit line pair is broken
into more segments, the completion signal becomes more realistic (that is, the
time resolution of the completion signal increases), but the average
memory access time increases due to the added parasitic effects when compared to
a design with fewer segments. In light of this, we will not segment the bit line pair into
many regions. How many regions should be used?
As will be described in Chapter 4, we have performed pre-layout simulation
on the 1M-bit SRAM framework and compared the results without and
with bit line segmentation. To find the optimum number of regions,
we make use of the DVSCD simulation results because, as
described previously in Section 2.1.2, the advantage of variable bit line load is more
significant for the read operation. From Table 4-3, the typical benchmark Read
Acknowledge Time is 15.81 ns. This is the read time when the bit line is not
segmented. From Table 4-6, we can calculate the delay introduced by the three
transmission gates used for bit line segmentation, which is 1.28 ns per gate. Assume
that the read time varies linearly with the bit line load. From these data, we can
approximate the Read Acknowledge Time at an arbitrary memory location when the
bit line is segmented into a given number of regions. Take the Read Acknowledge
Time for Region 2 when the bit line is segmented into three regions as an example.
The time to access one bit line segment is 15.81 ns / 3 = 5.27 ns. For three regions of
segmentation, two transmission gates are used. For Region 2, one of the transmission
gates is ON; together with the time to access two bit line segments, the Read
Acknowledge Time is approximated as 5.27 ns x 2 + 1.28 ns = 11.82 ns. The
approximated results for one to ten regions of segmentation are shown in Table 3-1.
The average Read Acknowledge Times are also shown. It is observed that the
average result for four regions of segmentation is the best. Although the result for
three regions of segmentation is comparable to that for four regions, it is not a good
idea to segment the bit line into three regions, because the transmission gate
control circuit would be more complicated than the one for four regions. Also, the
time resolution of the completion signal is lower for three regions of
segmentation. Therefore, we conclude that segmenting the bit line into four
regions is a good compromise between increasing the average speed by limiting the
added parasitic load and increasing the time resolution of the
completion signal.
Read Acknowledge Time (ns) for each number of regions
Location    One    Two    Three  Four   Five   Six    Seven  Eight  Nine   Ten
Region 1    15.81  17.09  18.37  19.65  20.93  22.21  23.49  24.77  26.05  27.33
Region 2    -      7.91   11.82  14.44  16.48  18.32  19.96  21.54  23.04  24.46
Region 3    -      -      5.27   9.20   12.04  14.40  16.42  18.28  20.00  21.60
Region 4    -      -      -      3.96   7.60   10.48  12.88  15.02  16.96  18.74
Region 5    -      -      -      -      3.16   6.56   9.34   11.76  13.92  15.88
Region 6    -      -      -      -      -      2.64   5.80   8.50   10.88  13.02
Region 7    -      -      -      -      -      -      2.26   5.24   7.84   10.16
Region 8    -      -      -      -      -      -      -      1.98   4.80   7.30
Region 9    -      -      -      -      -      -      -      -      1.76   4.44
Region 10   -      -      -      -      -      -      -      -      -      1.58
Average     15.81  12.50  11.82  11.81  12.04  12.44  12.88  13.39  13.92  14.45
Table 3-1: The Approximated Results for One to Ten Regions of Segmentation
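The approximation behind Table 3-1 can be reproduced with a few lines of Python. This is a sketch of the linear model stated in the text, using the two quoted figures (15.81 ns unsegmented read time and 1.28 ns per transmission gate); the function names are hypothetical. The code does not round the per-segment time before multiplying, so some entries may differ from the table in the second decimal place.

```python
BASE_READ_NS = 15.81   # unsegmented Read Acknowledge Time quoted from Table 4-3
GATE_DELAY_NS = 1.28   # delay per transmission gate derived from Table 4-6


def read_ack_time(n_regions, region):
    """Approximate Read Acknowledge Time (ns) for one region.

    Region 1 is the region that sees the whole bit line (all gates ON);
    region n_regions sees a single segment and no gate.
    """
    if not 1 <= region <= n_regions:
        raise ValueError("region must lie between 1 and n_regions")
    segments_driven = n_regions - region + 1
    gates_on = n_regions - region
    segment_time = BASE_READ_NS / n_regions  # read time assumed linear in load
    return segments_driven * segment_time + gates_on * GATE_DELAY_NS


def average_read_ack_time(n_regions):
    """Average over all regions, assuming each is accessed equally often."""
    times = [read_ack_time(n_regions, r) for r in range(1, n_regions + 1)]
    return sum(times) / n_regions


if __name__ == "__main__":
    for n in range(1, 11):
        print(f"{n:2d} regions: average {average_read_ack_time(n):.2f} ns")
```

Under this model the average works out to B(N+1)/(2N) + g(N-1)/2 for N regions, with B the unsegmented read time and g the gate delay. Treating N as continuous, the minimum lies at N = sqrt(B/g), about 3.5 for the quoted figures, which is consistent with three and four regions giving nearly the same average.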
For comparison, the conventional bit line connection is shown in Figure 3-5,
and the bit line segmentation connection employed in our asynchronous memory test
chip is shown in Figure 3-6. For the segmentation connection, the bit line pair is
broken into four segments by the insertion of three pairs of transmission gates.
In this way, the memory matrix is broken into four memory sub-matrices, that
is, four regions of memory access along the direction of the bit line. If
the parasitic capacitance of the entire bit line is 4C, and assuming that the parasitic
capacitance associated with the transmission gates is small, the parasitic bit line
capacitance of each segment is C, since the parasitic capacitance is directly
proportional to the bit line length.
[Figure 3-5 diagram; legible labels: B, BB, Memory Matrix, bit line capacitance 4C]
Figure 3-5: Conventional Bit Line Connection
[Figure 3-6 diagram; legible labels: 2-to-4 Segment Decoder, four Memory Sub-Matrix blocks, REGION 1, REGION 2, REGION 3, REGION 4]
Figure 3-6: Segmented Bit Line Connection
3.3.2 OPERATION 
For the modified memory matrix, the four regions are differentiated by the A0
and A1 address pins as shown in Table 3-2. These two address pins are input to the
2-to-4 segment decoder shown in Figure 3-6, and the decoder outputs are used to segment
the memory matrix. The decoder outputs are also used for write completion signal
generation, which will be described in Section 3.6.
According to the connection between the segment decoder outputs and the
transmission gate controls shown in Figure 3-6, the ON / OFF status of the
transmission gates and the equivalent bit line load for read / write
access to each memory sub-matrix are shown in Table 3-2. Take the memory
sub-matrix in Region 1 as an example. When it is accessed, address pins A0 and
A1 are LOW; therefore all the transmission gates T1, T1B, T2, T2B, T3 and T3B
are ON and the equivalent bit line load is 4C.
Memory      Address   T1, T1B   T2, T2B   T3, T3B   Equivalent Bit
Location    Pins                                    Line Load
Region 1    0  0      ON        ON        ON        4C
Region 2    0  1      OFF       ON        ON        3C
Region 3    1  0      X         OFF       ON        2C
Region 4    1  1      X         X         OFF       C
Note: X = Don't Care Status
Table 3-2: Memory Location vs. Address Pins Status, Transmission Gates Status and
Equivalent Bit Line Load
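The relationship between the gate states and the equivalent load in Table 3-2 can be expressed as a short Python sketch. It is illustrative only and rests on assumptions inferred from the table rather than stated in the text: gate pair Tn is taken to sit between Region n and Region n+1, and the data path is taken to connect at the Region 4 end of the bit line. The names GATE_STATES and connected_segments are hypothetical.

```python
# Gate states (T1, T2, T3) per accessed region, transcribed from Table 3-2.
# "X" marks a don't-care gate that is isolated behind an OFF gate.
GATE_STATES = {
    1: ("ON", "ON", "ON"),
    2: ("OFF", "ON", "ON"),
    3: ("X", "OFF", "ON"),
    4: ("X", "X", "OFF"),
}


def connected_segments(states):
    """Count bit line segments seen from the assumed Region 4 end.

    The segment of Region 4 is always connected. Walking from T3 back to T1,
    each ON gate adds one more segment; the first gate that is not ON
    isolates everything beyond it.
    """
    count = 1
    for state in reversed(states):
        if state != "ON":
            break
        count += 1
    return count


def equivalent_load(region, segment_cap=1.0):
    """Equivalent bit line load, in units of the per-segment capacitance C."""
    return connected_segments(GATE_STATES[region]) * segment_cap


if __name__ == "__main__":
    loads = [equivalent_load(r) for r in (1, 2, 3, 4)]
    print("loads in units of C:", loads)
    print("average load in units of C:", sum(loads) / len(loads))
```

If every region is accessed equally often, the average load is 2.5C, compared with 4C for the unsegmented bit line.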
3.3.3 MEMORY CELL 
The memory cell used in the modified memory matrix is the conventional
six-transistor static memory cell [27], [36], [37], [40], [41] shown in Figure 3-7.
Each memory cell consists of two cross-coupled inverters, with the bit stored on the
internal nodes SB and SBB. To access the memory cell, WL is pulled HIGH and the
cell is accessed through the two bit lines B and BB.
[Figure 3-7 schematic; legible labels: A1, A2, SB, SBB, WL]
Figure 3-7: Six Transistors Static Memory Cell
3.4 CURRENT SENSING COMPLETION DETECTION
In Section 2.2, we have described how the CSCD method works. We have
also introduced the CMOS LSD current sensor in Section 2.2.3. This technique is
used in the read completion circuit described in Section 3.1.2. Now, the question is
how to implement the current sensor in the proposed asynchronous memory design. In
other words, which component's current flow in the memory system should be sensed by
the current sensor in order to generate the read completion signal?
Refer to the block diagram of the proposed asynchronous memory design
shown in Figure 1-3. During the read operation, as described in Section 2.3.1, the
target memory cell is enabled and the memory cell content is sensed by the sense
amplifier. The output of the sense amplifier goes through a transition when it has
finished sensing the memory cell content. Therefore, the sense amplifier is the
component in the memory system that is critical to the read completion. Hence, we
propose that the current sensor be added to the sense amplifier to generate
the read completion signal.
Section 3.4.1 describes how we apply the current sensor to the 1M-bit SRAM
system with a one-bit data bus. The implementation for an eight-bit data bus is
described in Section 3.4.2 as an example of how the CSCD technique can be applied
to a larger data bus.
3.4.1 ONE BIT DATA BUS
The block diagram for read completion signal generation assuming a one-bit
data bus is shown in Figure 3-8. In the diagram, the CMOS LSD current sensor is
shown as two blocks: the voltage divider, and the load and amplifier. For
a one-bit data bus, the implementation is straightforward because only one sense
amplifier is used. The sense amplifier is connected to the bit line pair and the read
buffer in the conventional way. However, the VDD pin of the sense amplifier is
connected to the VDD_OUT pin of the current sensor described in Section 2.2.3. In this
way, the VDD current of the sense amplifier is sensed.
[Figure 3-8 block diagram; legible labels: Precharge, CMOS LSD Current Sensor (Voltage Divider, Load & Amplifier), V_REF(OUT), V_COMP, VDD_OUT, READ_COMP, VDD, VSS, Bit Line Pair, To Read Buffer]
Figure 3-8: One Bit Data Bus Read Completion Using CSCD
Theoretically, V_COMP can be used as the read completion signal. In
practice, however, due to the design of the sense amplifier, V_COMP is pulled
HIGH only a certain period of time after the bit line precharge is initiated. To
guarantee that the read completion signal stays HIGH during the entire precharge
period, V_COMP is connected to an inverter, the output of which is then combined with
the precharge signal by a NAND gate, which generates the read completion signal
READ_COMP. In this way, while the bit lines are being precharged, the precharge
signal is already LOW so that READ_COMP is pulled HIGH independent of
V_COMP. When the bit lines have been precharged, the precharge signal is pulled HIGH
while V_COMP is HIGH since the sense amplifier is switching, so that
READ_COMP is still HIGH; READ_COMP then depends on V_COMP. When the read
operation is completed, V_COMP is pulled LOW so that READ_COMP is pulled LOW.
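The gating described above is a two-gate function and can be tabulated directly. The following Python sketch models only the logic levels (an inverter on V_COMP followed by a two-input NAND with the precharge signal), not the analogue behaviour of the current sensor; the function name read_comp is hypothetical.

```python
def read_comp(v_comp, pc):
    """READ_COMP = NAND(NOT V_COMP, PC), with True meaning HIGH."""
    inverted = not v_comp
    return not (inverted and pc)


if __name__ == "__main__":
    print("PC    V_COMP  READ_COMP")
    for pc in (False, True):
        for v_comp in (False, True):
            row = ["HIGH" if x else "LOW" for x in (pc, v_comp, read_comp(v_comp, pc))]
            print("{:<5} {:<7} {}".format(*row))
```

The function reduces to READ_COMP = V_COMP OR NOT PC, so READ_COMP can only go LOW once precharge has ended (PC HIGH) and the sensor reports completion (V_COMP LOW).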
3.4.2 EIGHT BITS DATA BUS 
The block diagram for read completion signal generation assuming an eight-bit
data bus is shown in Figure 3-9. For an eight-bit data bus, eight sense amplifiers are
required, which are connected to the bit line pairs and the read buffer in the
conventional way. For each sense amplifier, one current sensor is needed to sense its
VDD current, and the connection between the load and amplifier of the current
sensor and the sense amplifier is the same as that for the one-bit data bus. However,
since the thresholds of the eight current sensors are the same, only one voltage divider
is required, connected as shown in the diagram. In this way, the circuit
complexity is decreased and the area and power are reduced compared to
using eight copies of the circuit used for one-bit data bus read completion.
This is because the function of the voltage divider is to supply a
constant voltage level, so its size need not be eight times that of the one used for
the one-bit data bus shown in Figure 3-8. In fact, the size of the voltage divider may
be the same as in the one-bit case.
[Figure 3-9 block diagram; legible labels: Loads & Amplifiers, V_COMP, V_DIV, VDD_OUT, V_REF(OUT)]
Figure 3-9: Eight Bits Data Bus Read Completion Using CSCD
The V_COMP pins of the current sensors are input to three NOR gates, the
outputs of which are combined by a three-input NAND gate. Together, the three
NOR gates and the three-input NAND gate combine the eight V_COMP signals as
a logical OR, so that the read completion is generated only when all eight V_COMP
signals are pulled LOW. Two stages are used in order to avoid a large number of
transistors (at least eight) connected in series, which would increase the gate delay.
The three-input NAND gate output is then combined with the precharge signal in the
same way as for the one-bit data bus, and the read completion signal READ_COMP is
obtained at the two-input NAND gate output.
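The two-stage combination can be checked with a small logic model. This Python sketch is illustrative and makes two assumptions not stated in the text: the eight V_COMP signals are split 3 + 3 + 2 across the three NOR gates, and the combined signal passes through an inverter before the final NAND with the precharge signal, as in the one-bit case. The helper names are hypothetical.

```python
def nor(*inputs):
    """NOR gate: HIGH only when every input is LOW."""
    return not any(inputs)


def nand(*inputs):
    """NAND gate: LOW only when every input is HIGH."""
    return not all(inputs)


def read_comp_8bit(v_comp, pc):
    """READ_COMP for eight current sensors, with True meaning HIGH.

    v_comp : sequence of eight V_COMP levels
    pc     : precharge signal level
    """
    if len(v_comp) != 8:
        raise ValueError("expected eight V_COMP signals")
    # Stage 1: three NOR gates (assumed 3 + 3 + 2 input split).
    stage1 = (nor(*v_comp[0:3]), nor(*v_comp[3:6]), nor(*v_comp[6:8]))
    # Stage 2: the three-input NAND is HIGH while any V_COMP is still HIGH.
    any_pending = nand(*stage1)
    # Final gating with the precharge signal, as for the one-bit data bus.
    return nand(not any_pending, pc)


if __name__ == "__main__":
    all_done = [False] * 8
    one_pending = [False] * 7 + [True]
    print("all sensors done, PC HIGH:", read_comp_8bit(all_done, True))
    print("one sensor pending, PC HIGH:", read_comp_8bit(one_pending, True))
    print("all sensors done, PC LOW:", read_comp_8bit(all_done, False))
```

The NOR and NAND stages together compute the OR of the eight inputs while keeping each gate to at most three inputs, which is the point the text makes about avoiding long series transistor stacks.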
3.5 VOLTAGE SENSING COMPLETION DETECTION
In Section 2.3, we have described how the DVSCD method works. This
technique is used in the read completion circuit described in Section 3.1.2. We are
now in a position to describe how the method is implemented in the memory system for
a one-bit data bus in Section 3.5.1. The implementation for an eight-bit data bus is
described in Section 3.5.2 to demonstrate how the DVSCD method can be applied
to a larger data bus.
3.5.1 ONE BIT DATA BUS
The block diagram of read completion signal generation using the DVSCD
method, assuming a one bit data bus, is shown in Figure 3-10. The two inverting inputs
of the DVSCD circuit shown in Figure 2-5 are connected to the data sense amplifier
via the bit line pair. As described previously in Section 2.3.2, the VCOMP pin of the
DVSCD circuit is pulled HIGH when the bit lines are precharged to VDD, and is then
pulled LOW when the read operation is completed. However, while the bit lines are
still being precharged, the VCOMP pin may be HIGH or LOW depending on its initial
condition. Therefore, the VCOMP signal alone cannot be used as the read completion
signal. As shown in the diagram, VCOMP is connected to an inverter, the output of
which is then combined with the precharge signal by a NAND gate which generates
the read completion signal READCOMP. In this way, while the bit lines are being
precharged, the precharge signal is pulled LOW so that READCOMP is pulled
HIGH independent of VCOMP. When the bit lines are precharged, the precharge signal
is pulled HIGH and VCOMP is also HIGH, so that READCOMP is still HIGH.
READCOMP is then dependent on VCOMP. When the read operation is completed,
VCOMP is pulled LOW so that READCOMP is pulled LOW.

Figure 3-10: One Bit Data Bus DVSCD Read Completion
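The behaviour of READCOMP in the three phases can be tabulated with a short Boolean model. This is a sketch of the inverter-plus-NAND combination described above; the function name and the use of True for HIGH are conventions chosen here, not taken from the design.

```python
def read_comp(vcomp, precharge):
    """READCOMP = NAND(NOT VCOMP, precharge); True = HIGH."""
    return not ((not vcomp) and precharge)


phases = [
    ("precharging, VCOMP LOW", False, False),
    ("precharging, VCOMP HIGH", True, False),
    ("precharged, read in progress", True, True),
    ("read completed", False, True),
]
for name, vcomp, precharge in phases:
    level = "HIGH" if read_comp(vcomp, precharge) else "LOW"
    print(f"{name}: READCOMP = {level}")
```

Only the last case drives READCOMP LOW, so the undefined VCOMP level during precharging cannot produce a false completion.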
3.5.2 EIGHT BITS DATA BUS 
The block diagram of read completion signal generation by the DVSCD
method, assuming an eight bits data bus, is shown in Figure 3-11.

Figure 3-11: Eight Bits Data Bus DVSCD Read Completion

For eight bits data bus,
eight sense amplifiers are required, which are connected to the bit line pairs and the
read buffer in the conventional way. Theoretically, eight DVSCD circuits are
required to generate the read completion signal. However, for a normal memory
matrix with an eight-bit word, the time needed to read the second to the
seventh bit should be less than the read time of either the first bit or the last bit
because of their physical positions in the matrix. Therefore, only two DVSCD circuits, one for the first bit and the
other for the last bit, are required. The outputs of the DVSCD circuits are then
combined with the precharge signal in a way similar to the one bit data bus case,
and the read completion signal READCOMP is obtained from the three-input NAND
gate output.
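A minimal sketch of the eight bits case follows. It assumes that each of the two DVSCD outputs is inverted before the three-input NAND gate, by analogy with the one bit circuit; the text only says the combination is "similar", so this detail and the names used are assumptions.

```python
from itertools import product


def read_comp_8bit(vcomp_first, vcomp_last, precharge):
    """Three-input NAND of NOT VCOMP (first bit), NOT VCOMP (last bit) and precharge."""
    return not ((not vcomp_first) and (not vcomp_last) and precharge)


# READCOMP goes LOW only when precharge is HIGH and both monitored bits are done.
low_cases = [
    case for case in product((False, True), repeat=3) if not read_comp_8bit(*case)
]
print(low_cases)
```

Because both monitored bits must finish, READCOMP follows the slower of the first and last bit, which is the point of monitoring only the two outermost bits.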
3.6 MULTIPLE DELAYS COMPLETION GENERATION 
In Section 2.4, we have described the MDCG method for generating the write
completion signal. Referring to the MDCG block diagram shown in Figure 2-6, the
components involved are the decoder and the delay generator. In this section, we will
describe how we implement the method as the write completion circuit for the
1M-bit SRAM described in Section 3.1.2.
The MDCG circuit diagram for four regions is shown in Figure 3-12. As
explained in Section 3.3.1, since we are going to segment the memory matrix into
four regions, the 2-to-4 segment decoder is used to control the delay generator.
Therefore, the decoder output is connected to the control bus of the delay generator.
As described in Section 2.4.1, the delay generator should consist of four worst-case
delays to generate the write completion signal, one for each region. The delay circuit
being used for each write cycle is selected by the four corresponding transmission
gates. The enable signals of the four transmission gates constitute the control bus of
the delay generator, which is connected to the decoder output. In this way, during
each write cycle, the appropriate transmission gate and hence the respective delay
circuit is selected by the control bus. When the request signal is generated, the write
completion signal is generated by the selected delay circuit.

Figure 3-12: MDCG Write Completion (Four Regions)
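The selection mechanism can be sketched behaviourally as follows. The delay values, the mapping of address bits to regions and all names are hypothetical, chosen only to illustrate how a one-hot decoder output picks exactly one worst-case delay per write cycle.

```python
from typing import List

# Hypothetical worst-case write delays in ns for Regions 1 to 4 (illustrative only).
REGION_DELAY_NS = [20.0, 16.0, 12.0, 8.0]


def decode_2_to_4(a1: int, a0: int) -> List[int]:
    """One-hot outputs of a 2-to-4 decoder; output 0 is assumed to enable Region 1."""
    index = (a1 << 1) | a0
    return [1 if i == index else 0 for i in range(4)]


def write_completion_time(request_ns: float, a1: int, a0: int) -> float:
    """Time at which write completion is issued for a request at request_ns."""
    enables = decode_2_to_4(a1, a0)
    # Each enable drives one transmission gate, so exactly one delay is selected.
    selected = [d for d, en in zip(REGION_DELAY_NS, enables) if en]
    assert len(selected) == 1
    return request_ns + selected[0]


print(write_completion_time(100.0, 0, 0))  # Region 1 delay applied
print(write_completion_time(100.0, 1, 1))  # Region 4 delay applied
```

In the test chip the fixed values would be replaced by tunable delays, since the worst-case figures are only estimates before fabrication.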
The delay circuit can be implemented by the conventional inverter chain if
the worst-case delay value for each region is known. However, before the test chip is
fabricated, we can only estimate the delay values. Therefore, a voltage-controlled delay
[38], [39] is used instead of an inverter chain for the first test chip. This method is
acceptable since the test chip is used to test the theory only.
4. SIMULATION
In the previous chapters, we have described the theory and implementation of
various asynchronous techniques that can be applied to the proposed asynchronous
memory system. In this chapter, we will evaluate the performance of these
techniques by simulations. However, before presenting the simulation results, we
will describe the simulation environment in Section 4.1. This includes the simulation
parameters, the memory timing specification parameters and the bit line load
determination. After that, we will present the benchmark memory simulation results,
which are compared with the results of the proposed techniques, in Section 4.2.
Afterwards, we will present the simulation circuits and results for the proposed
techniques. This includes the CSCD method in Section 4.3, the DVSCD method in
Section 4.4 and the MDCG method in Section 4.5.
4.1 SIMULATION ENVIRONMENT
4.1.1 SIMULATION PARAMETERS
Before presenting the simulation results, the environment under which the
circuits are being simulated should be described first. In order to obtain more
accurate timing response of the circuit, the HSPICE analog simulator is used. Also,
since we are going to fabricate the test chip using ATMEL ES2 CMOS 0.7 µm
technology, the corresponding HSPICE Level 6 simulation parameters are used,
which are included in Appendix Section 9.1. These parameters allow us to simulate
the circuits under fast, typical and slow conditions. Also, CADENCE SCHEMATIC
CAPTURE is used to capture the circuits so as to generate the respective netlists for
the simulations.
4.1.2 MEMORY TIMING SPECIFICATIONS
Since the control protocol and mechanism of the proposed memory system are
different from the conventional ones, the conventional memory timing specifications
cannot be applied. Instead, new timing specifications are defined and are applied for
interpreting the simulation results.
The simplified read / write timing diagrams are shown in Figure 1-4 and
Figure 1-5 respectively. Based on the read / write STGs shown in Figure 3-3, the
idealized timing diagrams including all the critical signals are shown in Figure 4-1.
(a) Read Operation (signals REQ, REQWL, PC, RC and ACK)
(b) Write Operation (signals REQ, REQWL, CW, WC and ACK)
Figure 4-1: Timing Diagrams of Critical Read / Write Control Signals
The definitions of the critical memory timing specification parameters are shown in
Table 4-1.

Parameter                 Definition
Read Completion Time      The time from the end of bit line precharge to the start
                          of read completion generation (PC+ → RC-)
Read Acknowledge Time     The time from the end of bit line precharge to the start
                          of read acknowledge generation (PC+ → ACK+)
Write Completion Time     The time from the start of writing to the start of write
                          completion generation (CW+ → WC+)
Write Acknowledge Time    The time from the start of writing to the start of write
                          acknowledge generation (CW+ → ACK+)
Actual Write Time         The time from the start of writing to the time when
                          the data is written to the memory cell

Table 4-1: Definitions of Memory Timing Specification Parameters
This table includes the four parameters described in the timing diagrams, plus 
one more parameter: the Actual Write Time. Actually, this parameter is very 
important for the write operation. It can be simulated but cannot be measured 
accurately from the test chip. All of the above parameters will be applied to the 
simulation results in the coming paragraphs. 
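As an illustration of how these definitions are applied, the sketch below derives the four measurable parameters from the times of the signal transitions named in Table 4-1. The dictionary format and the example timestamps are hypothetical; they are not simulation results.

```python
# Each parameter is the interval between two signal transitions (Table 4-1).
PARAMETER_EDGES = {
    "Read Completion Time": ("PC+", "RC-"),
    "Read Acknowledge Time": ("PC+", "ACK+"),
    "Write Completion Time": ("CW+", "WC+"),
    "Write Acknowledge Time": ("CW+", "ACK+"),
}


def timing_parameters(events):
    """Compute parameters from a dict of transition name -> time in ns.

    Parameters whose start or end transition is absent are skipped, so the
    same function serves a read cycle or a write cycle.
    """
    return {
        name: events[end] - events[start]
        for name, (start, end) in PARAMETER_EDGES.items()
        if start in events and end in events
    }


# Hypothetical read cycle: precharge ends at 100 ns, RC falls at 115 ns,
# ACK rises at 116 ns.
print(timing_parameters({"PC+": 100.0, "RC-": 115.0, "ACK+": 116.0}))
```

The Actual Write Time is not included because it depends on the internal storage nodes of the memory cell rather than on a control signal transition.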
4.1.3 BIT LINE LOAD DETERMINATION
To simulate the memory circuit, one way is to draw the entire layout of the
1M-bit SRAM using CADENCE LAYOUT and then extract the netlist for
simulation. However, since the extracted netlist would be very large, enormous time
and resources would be needed for each simulation run. Because this method is so
inefficient, we will not simulate in this way. Actually, during each
read / write operation, since only one memory cell in the memory matrix is in
operation, the entire memory matrix can be modeled as one memory cell driving the
bit line loads. In light of this, we will simulate the memory circuit according to the
following method. First of all, the circuit of a 1-bit SRAM is captured using
CADENCE SCHEMATIC CAPTURE. Then the netlist is extracted. After that, the
bit line loads are added to the netlist for simulation, since this loading is very critical
to the circuit performance. Hence, we have to determine the bit line load before
simulation.
The bit line load is dependent on the geometry of the memory cell layout as
well as the number of cells in a column. To determine the bit line load, first of all we
have drawn the layout of the SRAM cell using ATMEL ES2 CMOS 0.7 µm
technology, which is going to be used in the test chip. Then, based on the ES2
technology file, the parasitic capacitors associated with the memory cell are
extracted. The bit line load for one memory cell is calculated to be 0.02 pF. As
described in the previous chapter, we are targeting the 1M-bit SRAM with the bit
line segmented into four regions, each region consisting of 256 rows of memory
cells. Therefore, the load of one bit line segment is equal to 0.02 pF × 256 = 5.12 pF.
To make the simulation results more conservative, the bit line load is taken to be 6
pF for each region. The SRAM cell layout as well as the extracted netlist are
included in Appendix Section 9.2 for reference.
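The arithmetic above can be restated as a short calculation. The figures are those quoted in the text; the function names and the margin calculation are added here for illustration.

```python
CELL_LOAD_PF = 0.02       # extracted bit line load of one memory cell
ROWS_PER_REGION = 256     # memory cells per bit line segment
REGIONS = 4               # segments per bit line
SIMULATED_LOAD_PF = 6.0   # conservative value adopted per region


def segment_load_pf(cell_pf=CELL_LOAD_PF, rows=ROWS_PER_REGION):
    """Extracted load of one bit line segment in pF."""
    return cell_pf * rows


def margin_percent(adopted_pf=SIMULATED_LOAD_PF):
    """How far the adopted load exceeds the extracted load, in percent."""
    return (adopted_pf / segment_load_pf() - 1.0) * 100.0


print(f"extracted load per region: {segment_load_pf():.2f} pF")
print(f"margin of the 6 pF figure: {margin_percent():.1f} %")
print(f"unsegmented bit line:      {REGIONS * SIMULATED_LOAD_PF:.0f} pF")
```

Rounding 5.12 pF up to 6 pF therefore leaves a margin of roughly 17 %, and an unsegmented bit line carries four times the per-region figure.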
To verify the determined bit line load value (6 pF / region), we have used the
method depicted in Figure 4-2. First of all, we draw the layout of 256 SRAM cells in
a column and drive one of the bit lines with an inverter as shown in Figure 4-2(a).
The bit line voltage level is initially set to LOW and the rise time and propagation
delay are recorded. Afterwards, we replace the SRAM column with a capacitor as
shown in Figure 4-2(b) and perform the same simulation. The results are shown in
Table 4-2. The two sets of results are very similar and the bit line load of 6 pF per
region can be concluded to be a good approximation.
(a) Inverter driving a bit line of a column of SRAM cells
(b) Inverter driving a capacitor
Figure 4-2: Bit Line Load Verification
                           Rise Time    Propagation Delay
Result of Figure 4-2(a)    8.00         3.70
Result of Figure 4-2(b)    [illegible]  [illegible]
Table 4-2: Verification of Bit Line Load Value
4.2 BENCHMARK SIMULATION
4.2.1 CIRCUIT SCHEMATIC
The schematic of the benchmark memory simulation circuit is shown in
Figure 4-3. Since the benchmark memory is used as a control to compare with the
proposed asynchronous memory, the simulation circuit is identical to that of the
proposed asynchronous memory except that the bit line pair is not segmented by
transmission gates and the 2-to-4 segment decoder is deactivated. As shown in the
figure, it consists of some of the components described in Section 3.1.2, including the
memory cell (6t_memory_cell), read buffer (data_out_buffer), write buffer
(data_in_buffer), precharge buffer (ix4_precharge), data sense amplifier
(l4t_SA), control circuit (control), segment decoder (2-to-4_decoder_reg), read
completion circuit (l4t_rc) and write completion circuit (write_comp). The row
decoder, column decoder, pass transistor array and address buffer are not included
since the word line and address signals can be easily simulated by voltage sources.
The read completion signal is generated by the DVSCD method and the write [...]

[Figure 4-3: schematic of the benchmark memory simulation circuit]
To complete the picture, the bit line load is added to the bit line pair for 
simulation. As described previously in Section 4.1.3, the bit line load is taken as 6 pF 
per region. For the benchmark memory, since we are going to simulate the bit line 
without segmentation, the bit line load is taken as the total of the four regions, that is 
24 pF. 
4.2.2 RESULTS 
The graphical results for the benchmark memory under the typical simulation
condition are shown in Figure 4-4. It is observed that all the critical control
signals can be generated satisfactorily following the right sequence without glitches
or hazards. The graphical results for the fast and slow conditions are very
similar to the typical one except for the timing difference. Also, it is found that the
results for reading and writing a '1' and a '0' are very similar.

Figure 4-4: Benchmark Memory Graphical Results
(waveforms of REQ, CW, WC, RC, CR and ACK against time, 0 to 300 ns)
The numerical results are tabulated in Table 4-3. The simulated times under
various conditions are tabulated against the memory timing specification parameters.
The percentage increase or decrease of the fast and slow results relative to
the respective typical results is shown inside the brackets. One may
wonder how the Actual Write Time is simulated. This is done by initially setting the
target memory cell to store a '0' and the bit line pair to the worst initial conditions,
and then measuring the time to store a '1' to the memory cell. This is possible in
simulation since we can probe the memory storage nodes. In this way, the worst-case
Actual Write Time is measured. In fact, this parameter is very important since the
delays for generating the Write Completion Time and the Write Acknowledge Time are
tuned according to this parameter.
                               Fast              Typical       Slow
Read Completion Time (ns)      5.41 (-63.3%)     14.73         35.93 (+143.9%)
Read Acknowledge Time (ns)     6.44 (-59.3%)     15.81         37.37 (+136.4%)
Write Completion Time (ns)     12.63 (-22.1%)    [illegible]   21.08 (+30.0%)
Write Acknowledge Time (ns)    13.40 (-21.6%)    [illegible]   22.21 (+30.0%)
Actual Write Time (ns)         8.98 (-32.1%)     [illegible]   20.82 (+57.4%)
Table 4-3: Benchmark Memory Numerical Results
Referring to the table, it is found that in general, the differences between the
results for fast, typical and slow conditions are very significant. This is because the ES2
HSPICE Level 6 simulation model file takes the very extreme cases for the fast and
slow conditions as shown in Section 9.1. Also, it is found that the differences for the
read operation results are much larger than those for the write operation results. This is because
the equivalent circuit of the read operation (the small memory cell driving the bit line
load) is much more sensitive to changes than the equivalent circuit of the write
operation (the large write buffer driving the bit line load).
4.3 CURRENT SENSING COMPLETION DETECTION
4.3.1 CIRCUIT SCHEMATIC
The simulation circuit is shown in Figure 4-5, which is similar to the
benchmark memory simulation circuit shown in Figure 4-3 except that the bit line
pair is segmented by the transmission gates (tran) into four regions. Also, the 2-to-4
segment decoder is activated so that the MDCG circuit is used to generate the write
completion signal. The LSD Current Sensor (p_current_sensor) is used to
generate the read completion signal to simulate the CSCD method.
Figure 4-5: Memory Simulation Circuit with CSCD and MDCG Circuits
4.3.2 SENSE AMPLIFIER CURRENT CHARACTERISTICS
For proper implementation of the CSCD method, the current characteristics
of the data sense amplifier are very important. Before going on to the CSCD
simulations, the current characteristics of the sense amplifier in use should be
observed and verified.
We have described the conventional sense amplifier and the modified sense
amplifier shown in Figure 3-2. First of all, we have investigated the conventional
sense amplifier [31]. It is found that the current characteristics of the conventional
sense amplifier are not symmetrical for pulling HIGH and pulling LOW. This
makes it difficult to set the threshold of the current sensor. We have tried to make the
current characteristics symmetrical by modifying the W/L ratios of the conventional
sense amplifier. However, this affects the speed of the sense amplifier. Therefore,
the conventional sense amplifier is not used in the CSCD simulations.
We have also investigated the modified sense amplifier. The current
characteristics are shown in Figure 4-6. The sense amplifier is first precharged
and then allowed to sense the memory cell content. The current characteristics for
sensing a '1' and a '0' are obtained. These characteristics are symmetrical, so that the
CSCD method can be applied. The current consumed by the sense amplifier should
be sensed within the sensing period, and the read completion signal should be
generated when the data is ready. The sensing period and the time when the data is ready
are shown in the figure. One may wonder why there is still current flow when the data
is ready. This is because the bit line capacitor is still discharging. The current
characteristics will reach the final steady state when the bit line capacitor is fully
discharged.

Figure 4-6: Modified Sense Amplifier Current Characteristics
(Pull High and Pull Low current against time, with the precharge period, sensing
period and data ready point marked)
For the reasons explained above, the modified sense amplifier is used for
the CSCD simulations. Actually, this sense amplifier is much faster than the
conventional one. Also, this sense amplifier is used in the benchmark simulation as
well as the DVSCD simulation so that the results can be easily compared.
4.3.3 RESULTS 
The graphical results for the read control signals are shown in Figure 4-7. In
this case, the results for Region 3 under the typical condition are shown. It is observed
that all the read control signals can be generated satisfactorily following the right
sequence. The results for the fast and slow cases are similar to the typical ones except
for the timing difference. Also, it is found that the results for reading a '0' or a '1' are
the same.
Figure 4-7: Region 3 Read Signals Graphical Results
The graphical results for the read acknowledge signals generated for different
regions are shown in Figure 4-8. Recall from Table 3-2 that the bit line load for Region
1 is the largest, whereas that for Region 4 is the smallest. The results show that the
read acknowledge time increases with the bit line load, as expected.
The numerical results for each region are shown in Table 4-4. The average 
results are also shown which assume that the probability to access the memory cell at 
each region is the same. The Read Completion Time and the Read Acknowledge 
Time are plotted against the bit line load in Figure 4-10 and Figure 4-9 respectively. 
As observed from the graphs, these timing results decrease from Region 1 to Region 
4. This is expected since the bit line load decreases from Region 1 to Region 4. 
Figure 4-8: Read Acknowledge Signals Graphical Results
(REQ and the ACK waveforms for Region 1 to Region 4 against time)
                                         Fast      Typical   Slow
Region 1   Read Completion Time (ns)     143.84    181.58    351.30
           Read Acknowledge Time (ns)    144.18    182.71    352.88
Region 2   Read Completion Time (ns)     107.88    136.37    264.66
           Read Acknowledge Time (ns)    108.22    137.50    266.23
Region 3   Read Completion Time (ns)     71.92     90.41     176.44
           Read Acknowledge Time (ns)    72.76     91.54     177.23
Region 4   Read Completion Time (ns)     36.30     46.34     89.01
           Read Acknowledge Time (ns)    36.64     47.47     89.79
Average    Read Completion Time (ns)     90.00     113.68    220.35
           Read Acknowledge Time (ns)    90.45     114.81    221.53
Table 4-4: CSCD Numerical Results
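The averages in Table 4-4 can be reproduced from the per-region rows under the stated assumption that every region is accessed with equal probability. The sketch below uses the tabulated values; the variable and function names are illustrative.

```python
# Rows are (fast, typical, slow) in ns for Regions 1 to 4, taken from Table 4-4.
READ_COMPLETION = [
    (143.84, 181.58, 351.30),
    (107.88, 136.37, 264.66),
    (71.92, 90.41, 176.44),
    (36.30, 46.34, 89.01),
]
READ_ACKNOWLEDGE = [
    (144.18, 182.71, 352.88),
    (108.22, 137.50, 266.23),
    (72.76, 91.54, 177.23),
    (36.64, 47.47, 89.79),
]


def equal_probability_average(rows):
    """Column-wise mean, i.e. each region weighted by 1/4."""
    return tuple(sum(column) / len(column) for column in zip(*rows))


for label, rows in (("Read Completion", READ_COMPLETION),
                    ("Read Acknowledge", READ_ACKNOWLEDGE)):
    fast, typical, slow = equal_probability_average(rows)
    print(f"{label}: fast {fast:.3f}, typical {typical:.3f}, slow {slow:.3f}")
```

The computed means agree with the tabulated averages to within about 0.02 ns; the small differences are consistent with the table entries having been rounded to two decimal places.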
Figure 4-10: Graph of Read Completion Time vs. Bit Line Load
(fast, typical and slow curves; bit line load 6, 12, 18 and 24 pF on the horizontal axis)

Figure 4-9: Graph of Read Acknowledge Time vs. Bit Line Load
(fast, typical and slow curves; bit line load 6, 12, 18 and 24 pF on the horizontal axis)
The CSCD numerical results are compared with the benchmark memory
results. The average Read Completion Time is 90.00 ns, 113.68 ns and 220.35 ns for
the fast, typical and slow simulation conditions respectively, which is 84.59 ns
(1563.6 %), 98.95 ns (671.8 %) and 184.42 ns (513.3 %) slower than the benchmark results.
The average Read Acknowledge Time is 90.45 ns, 114.81 ns and 221.53 ns for the fast,
typical and slow simulation conditions respectively, which is 84.01 ns (1304.5 %),
99.00 ns (626.2 %) and 184.16 ns (492.8 %) slower than the benchmark results. The
CSCD results are found to be several times slower than the benchmark results. This
is explained as follows.
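The quoted differences follow directly from the CSCD averages and the benchmark figures in Table 4-3. The sketch below recomputes them for the Read Completion Time; the function name is illustrative, and the percentages are expressed relative to the benchmark value.

```python
def slowdown(cscd_ns, benchmark_ns):
    """Return (extra time in ns, extra time as a percentage of the benchmark)."""
    extra = cscd_ns - benchmark_ns
    return extra, 100.0 * extra / benchmark_ns


# (CSCD average, benchmark) Read Completion Times in ns.
cases = {
    "fast": (90.00, 5.41),
    "typical": (113.68, 14.73),
    "slow": (220.35, 35.93),
}
for name, (cscd, benchmark) in cases.items():
    extra, percent = slowdown(cscd, benchmark)
    print(f"{name}: {extra:.2f} ns slower ({percent:.1f} %)")
```

The relative penalty is largest under the fast condition because the benchmark time shrinks faster than the CSCD time does.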
4.3.4 OBSERVATIONS 
The relationships between the sense amplifier data output, the read completion
signal and the sense amplifier switching current for the four regions are shown in
Figure 4-11. The sense amplifier is simulated to read a '0' from the memory cell so
that the data output is pulled LOW. As observed from the figure, the current is still
switching when the data is ready, and the current switching response becomes
progressively slower from Region 4 to Region 1 due to the increase in equivalent bit line load.
For the CSCD simulations, the threshold of the current sensor is set such that the read
completion signal RC will be generated when the current drop is about 4 mA (from
16 mA to 12 mA). However, from the simulation results, when the data is ready, the
current drop is only 1 mA (from 16 mA to 15 mA). Therefore, RC is not generated
close enough to data read completion.
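The effect of the threshold mismatch can be illustrated with a purely synthetic waveform. In the sketch below the current falls linearly from 16 mA at 0.1 mA per ns; this ramp is a hypothetical stand-in, not the simulated sense amplifier current, and only the 1 mA and 4 mA drops are taken from the text.

```python
def first_time_drop_reaches(times_ns, current_ua, drop_ua):
    """First sample time at which the current has fallen by drop_ua from its
    initial value, or None if it never does."""
    start = current_ua[0]
    for t, i in zip(times_ns, current_ua):
        if start - i >= drop_ua:
            return t
    return None


# Hypothetical waveform in microamps (integers avoid rounding issues):
# 16 mA initially, falling by 100 uA every nanosecond.
times_ns = list(range(0, 61))
current_ua = [16000 - 100 * t for t in times_ns]

data_ready_ns = first_time_drop_reaches(times_ns, current_ua, 1000)  # 1 mA drop
rc_ns = first_time_drop_reaches(times_ns, current_ua, 4000)          # 4 mA drop
print(f"data ready at {data_ready_ns} ns, RC generated at {rc_ns} ns")
print(f"RC lags data ready by {rc_ns - data_ready_ns} ns")
```

With a slower current response, as in Region 1, the same 3 mA gap between the two thresholds translates into a proportionally longer lag, which matches the trend seen in Table 4-4.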
Figure 4-11: Relationship Between Data, Read Completion Signal and Current
(DATA, RC and CURRENT waveforms for Region 1 to Region 4 against time)
One may wonder why the current sensor cannot be made more sensitive so that
it detects the 1 mA change. However, for the present current sensor circuit, we found
that if the current sensor is scaled to be more sensitive, RC cannot be generated
properly. There is a trade-off. To make the current sensor more sensitive, the
pull-down part is scaled up, so that the voltage level of the sensor output is
decreased when it is busy. Ideally, when the current sensor is busy, the sensor output
is pulled to VDD. As shown in the figure, the sensor output is only about 2.7 V (which
is a bit higher than the threshold voltage of the next stage) when the current sensor
is busy, as a result of the scaling used to increase sensitivity. Therefore, it is
hard to increase the sensor sensitivity further while maintaining correct operation.
In conclusion, the CSCD results are much slower than the benchmark results
due to the slow current switching response. To maintain correct operation, the
sensitivity of the current sensor cannot be increased to the desired level. Therefore,
the CSCD method in its present form is not suitable for generating the read
completion signal in the proposed asynchronous memory system. The CSCD method could
be made practical for generating the read completion signal only if the problems
stated above are solved by proper circuit modifications.
4.4 VOLTAGE SENSING COMPLETION DETECTION
4.4.1 CIRCUIT SCHEMATIC
The simulation circuit is shown in Figure 4-12. It is similar to that of the
benchmark memory simulation shown in Figure 4-3, except that the bit line pair is
segmented by the transmission gates (tran) into four regions. Also, the 2-to-4
segment decoder is activated so that the MDCG circuit is used to generate the write
completion signal. The DVSCD circuit is used to generate the read completion signal
so that the DVSCD method is simulated.
4.4.2 RESULTS 
The graphical results for the read control signals are shown in Figure 4-13. In
this case, the results for Region 3 under the typical condition are shown. It is
observed that all the read control signals are generated satisfactorily, following the
right sequence without glitches or hazards. The results for the fast and slow cases are
similar to the typical ones except for the timing difference. Also, the results for
reading a '0' and a '1' are found to be the same.
The graphical results for the read acknowledge signals generated for different 
regions are shown in Figure 4-14. It is found that the read acknowledge time 
increases with the bit line load as expected. 
The numerical results for each region are shown in Table 4-5. The average 
results are also shown. The Read Completion Time and the Read Acknowledge Time 
are plotted against the bit line load in Figure 4-15 and Figure 4-16 respectively. As 
observed from the graphs, these timing results decrease from Region 1 to Region 4 as 
expected. 
[Circuit schematic]
Figure 4-12: Memory Simulation Circuit with DVSCD and MDCG Circuits
[Waveform plot: REQ, PC, RC, CR and ACK signals against time]
Figure 4-13: Region 3 Read Signals Graphical Results 
[Waveform plot: REQ and the ACK signals for Region 1 to Region 4 against time]
Figure 4-14: Read Acknowledge Signal Graphical Result 
                                        Fast     Typical   Slow
Region 1  Read Completion Time (ns)     8.39     18.68     40.41
          Read Acknowledge Time (ns)    9.42     19.64     42.06
Region 2  Read Completion Time (ns)     5.73     13.78     30.28
          Read Acknowledge Time (ns)    6.85     14.86     31.51
Region 3  Read Completion Time (ns)     3.51      8.51     19.59
          Read Acknowledge Time (ns)    4.54      9.83     21.10
Region 4  Read Completion Time (ns)     0.07      4.31      9.59
          Read Acknowledge Time (ns)    0.89      5.75     11.10
Average   Read Completion Time (ns)     4.43     11.32     24.97
          Read Acknowledge Time (ns)    5.43     12.52     26.44
Table 4-5: DVSCD Numerical Results
[Plot: Read Completion Time against Bit Line Load /pF for the Fast, Typical and Slow conditions]
Figure 4-15: Graph of Read Completion Time vs. Bit Line Load
[Plot: Read Acknowledge Time against Bit Line Load /pF for the Fast, Typical and Slow conditions]
Figure 4-16: Graph of Read Acknowledge Time vs. Bit Line Load
The DVSCD numerical results are compared with the benchmark memory
simulation results in Table 4-6. The table shows the differences in time as well as
the percentage change, with the benchmark results as the reference. The average
Read Completion Time is 4.43 ns, 11.32 ns and 24.97 ns for the fast, typical and slow
simulation conditions respectively, which is 0.98 ns (18.1 %), 3.41 ns (23.2 %) and
10.96 ns (30.5 %) faster than the benchmark results. The average Read Acknowledge
Time is 5.43 ns, 12.52 ns and 26.44 ns for the fast, typical and slow simulation
conditions respectively, which is 1.01 ns (15.7 %), 3.29 ns (20.8 %) and 10.93 ns
(29.2 %) faster than the benchmark results.
                                   Fast                Typical              Slow
Region 1  Read Completion Time    +2.98 ns (+55.1%)   +3.95 ns (+26.8%)    +4.48 ns (+12.5%)
          Read Acknowledge Time   +2.98 ns (+46.3%)   +3.83 ns (+24.2%)    +4.69 ns (+12.6%)
Region 2  Read Completion Time    +0.32 ns (+5.9%)    -0.95 ns (-6.4%)     -5.65 ns (-15.7%)
          Read Acknowledge Time   +0.41 ns (+6.4%)    -0.95 ns (-6.0%)     -5.86 ns (-15.7%)
Region 3  Read Completion Time    -1.90 ns (-35.1%)   -6.22 ns (-42.2%)    -16.34 ns (-45.5%)
          Read Acknowledge Time   -1.90 ns (-29.5%)   -5.98 ns (-37.8%)    -16.27 ns (-43.5%)
Region 4  Read Completion Time    -5.34 ns (-98.7%)   -10.42 ns (-70.7%)   -26.34 ns (-73.3%)
          Read Acknowledge Time   -5.55 ns (-86.2%)   -10.06 ns (-63.6%)   -26.27 ns (-70.3%)
Average   Read Completion Time    -0.98 ns (-18.1%)   -3.41 ns (-23.2%)    -10.96 ns (-30.5%)
          Read Acknowledge Time   -1.01 ns (-15.7%)   -3.29 ns (-20.8%)    -10.93 ns (-29.2%)
Table 4-6: Comparison Between DVSCD and Benchmark Results
It is noted that the Read Completion Time and Read Acknowledge Time for
Region 1 are larger than the benchmark results although the bit line load is the same
(24 pF). This is due to the delay of the transmission gates which are used to segment
the bit lines. However, from Region 2 to Region 4, the decrease in equivalent bit line
load gradually compensates for the delay of the transmission gates, so that the
overall read times decrease.
The average Read Completion Time and the average Read Acknowledge
Time under all simulation conditions are faster than the benchmark results. This
implies that the Variable Bit Line Load together with the DVSCD method is faster
than the conventional one.
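The averages referred to above can be cross-checked against the per-region entries of Table 4-5. The sketch below uses only the slow-condition values and assumes, as the simple mean implies, that the four regions are weighted equally; the helper name is illustrative.

```python
# Cross-check of the slow-condition averages against the per-region
# entries of Table 4-5 (all times in ns, keyed by region number).
slow_read_completion_ns = {1: 40.41, 2: 30.28, 3: 19.59, 4: 9.59}
slow_read_acknowledge_ns = {1: 42.06, 2: 31.51, 3: 21.10, 4: 11.10}

def average(values_by_region):
    """Unweighted mean over the regions (equal weighting is an assumption)."""
    return sum(values_by_region.values()) / len(values_by_region)

print(f"Read Completion Time:  {average(slow_read_completion_ns):.2f} ns")
print(f"Read Acknowledge Time: {average(slow_read_acknowledge_ns):.2f} ns")
```

Rounded to two decimal places, the means agree with the 24.97 ns and 26.44 ns averages quoted for the slow condition.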
4.5 MULTIPLE DELAYS COMPLETION GENERATION
4.5.1 CIRCUIT SCHEMATIC
The simulation circuit is shown in Figure 4-12; it includes the MDCG
circuit for write completion signal generation. The differences between this
simulation circuit and the benchmark simulation circuit were described in
Section 4.4.1.
4.5.2 RESULTS 
The graphical results for the write control signals are shown in Figure 4-17.
In this case, the results for Region 3 under the typical condition are shown. It is
observed that all the write control signals are generated satisfactorily, following the
right sequence without glitches or hazards. The results for the fast and slow cases are
similar to the typical ones except for the timing difference. Also, the results for
writing a '0' and a '1' are found to be the same. To determine the Actual Write Time,
the SB and SBB signals are probed. These are the signals at the storage nodes of the
memory cell, and the simulation is set up such that the toggling of these signals
signifies that the data are being written.
[Waveform plot: write control signals, including CW, SB, SBB, WC and ACK, against time]
Figure 4-17: Region 3 Write Signals Graphical Results
The graphical results for the write acknowledge signals generated for
different regions are shown in Figure 4-18. The write acknowledge signals
are generated based on the Actual Write Time. The results show that the write
acknowledge time increases with the bit line load as expected.
The numerical results for each region are shown in Table 4-7. The average
results are also shown. The Actual Write Time, Write Completion Time and the
Write Acknowledge Time are plotted against the bit line load in Figure 4-19, Figure
4-20 and Figure 4-21 respectively. As observed from the graphs, these timing results
decrease from Region 1 to Region 4. This is expected since the bit line load
decreases from Region 1 to Region 4.
[Waveform plot: REQ and the ACK signals for Region 1 to Region 4 against time]
Figure 4-18: Write Acknowledge Signal Graphical Results 
                                         Fast     Typical   Slow
Region 1  Actual Write Time (ns)         13.04    21.46     31.51
          Write Completion Time (ns)     16.74    22.55     38.95
          Write Acknowledge Time (ns)    16.95    23.37     39.73
Region 2  Actual Write Time (ns)          8.88    13.58     20.98
          Write Completion Time (ns)     11.09    14.33     25.00
          Write Acknowledge Time (ns)    11.76    15.09     26.11
Region 3  Actual Write Time (ns)          5.39     7.71     12.67
          Write Completion Time (ns)      7.19     8.58     14.13
          Write Acknowledge Time (ns)     7.91     9.50     15.24
Region 4  Actual Write Time (ns)          2.70     3.51      6.08
          Write Completion Time (ns)      4.31     4.54      7.11
          Write Acknowledge Time (ns)     5.03     5.36      8.30
Average   Actual Write Time (ns)          7.50    11.57     17.81
          Write Completion Time (ns)      9.83    12.50     21.30
          Write Acknowledge Time (ns)    10.41    13.33     22.35
Table 4-7: MDCG Numerical Results
[Plot: Actual Write Time against Bit Line Load /pF for the Fast, Typical and Slow conditions]
Figure 4-19: Graph of Actual Write Time vs. Bit Line Load
[Plot: Write Completion Time against Bit Line Load /pF for the Fast, Typical and Slow conditions]
Figure 4-20: Graph of Write Completion Time vs. Bit Line Load
[Plot: Write Acknowledge Time against Bit Line Load /pF for the Fast, Typical and Slow conditions]
Figure 4-21: Graph of Write Acknowledge Time vs. Bit Line Load
The MDCG numerical results are compared with the benchmark memory
simulation results in Table 4-8. The average Actual Write Time is 7.50 ns, 11.57 ns
and 17.81 ns for the fast, typical and slow simulation conditions respectively, which
is 1.48 ns (16.5 %), 1.66 ns (12.5 %) and 3.01 ns (14.5 %) faster than the benchmark
results. The average Write Completion Time is 9.83 ns, 12.50 ns and 21.30 ns for
the fast, typical and slow simulation conditions respectively, which is 2.80 ns
(22.2 %) and 3.71 ns (22.9 %) faster than the benchmark results for the fast and
typical conditions, but 0.22 ns (1.0 %) slower for the slow condition. The average
Write Acknowledge Time is 10.41 ns, 13.33 ns and 22.35 ns for the fast, typical and
slow simulation conditions respectively, which is 2.99 ns (22.3 %) and 3.76 ns
(22.0 %) faster than the benchmark results for the fast and typical conditions, but
0.14 ns (0.63 %) slower for the slow condition.
                                    Fast                Typical              Slow
Region 1  Actual Write Time        +4.06 ns (+45.2%)   +8.23 ns (+62.2%)    +10.69 ns (+51.3%)
          Write Completion Time    +4.11 ns (+32.5%)   +6.34 ns (+39.1%)    +17.87 ns (+84.8%)
          Write Acknowledge Time   +3.55 ns (+26.5%)   +6.28 ns (+36.7%)    +17.52 ns (+78.9%)
Region 2  Actual Write Time        -0.1 ns (-1.1%)     +0.35 ns (+2.6%)     +0.16 ns (+0.8%)
          Write Completion Time    -1.54 ns (-12.2%)   -1.88 ns (-11.6%)    +3.92 ns (+18.6%)
          Write Acknowledge Time   -1.64 ns (-12.2%)   -2.00 ns (-11.7%)    +3.9 ns (+17.6%)
Region 3  Actual Write Time        -3.59 ns (-40.0%)   -5.52 ns (-41.7%)    -8.15 ns (-39.1%)
          Write Completion Time    -5.44 ns (-43.1%)   -7.63 ns (-47.1%)    -6.95 ns (-33.0%)
          Write Acknowledge Time   -5.49 ns (-41.0%)   -7.59 ns (-44.4%)    -6.97 ns (-31.4%)
Region 4  Actual Write Time        -6.28 ns (-69.9%)   -9.72 ns (-73.5%)    -14.74 ns (-70.8%)
          Write Completion Time    -8.32 ns (-65.9%)   -11.67 ns (-72.0%)   -13.97 ns (-66.3%)
          Write Acknowledge Time   -8.37 ns (-62.5%)   -11.73 ns (-68.6%)   -13.91 ns (-62.6%)
Average   Actual Write Time        -1.48 ns (-16.5%)   -1.66 ns (-12.5%)    -3.01 ns (-14.5%)
          Write Completion Time    -2.80 ns (-22.2%)   -3.71 ns (-22.9%)    +0.22 ns (+1.0%)
          Write Acknowledge Time   -2.99 ns (-22.3%)   -3.76 ns (-22.0%)    +0.14 ns (+0.62%)
Table 4-8: Comparison Between MDCG and Benchmark Results
The average Actual Write Time under all simulation conditions is faster than
the benchmark results. This implies that the Variable Bit Line Load together with the
MDCG method is faster than the conventional one. The average Write Completion
Time and the average Write Acknowledge Time under the fast and typical conditions
are faster than the benchmark results, but the results for the slow condition are a bit
slower than the benchmark results. This is because the increase in the delay values of
the variable delays is large when the simulation condition changes from typical to
slow, so that the Write Completion signal is not generated close enough to the
actual write completion. However, the Write Completion Time and the Write
Acknowledge Time can be carefully tuned in the test chip to be as close as possible
to the Actual Write Time.
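How far the generated completion signal trails the actual write can be read off as a simple margin. The sketch below uses the average MDCG values quoted in the text; the function name and data layout are illustrative assumptions, and a positive margin simply means the completion signal arrives after the data has been written.

```python
# Margin between the generated Write Completion Time and the Actual Write Time,
# using the average MDCG values quoted in the text (all times in ns).
average_times_ns = {
    "fast": {"actual": 7.50, "completion": 9.83},
    "typical": {"actual": 11.57, "completion": 12.50},
    "slow": {"actual": 17.81, "completion": 21.30},
}

def completion_margin_ns(entry):
    """Time by which the completion signal follows the actual write."""
    return entry["completion"] - entry["actual"]

for condition, entry in average_times_ns.items():
    print(f"{condition}: margin = {completion_margin_ns(entry):.2f} ns")
```

The margin is largest for the slow condition, which is consistent with the observation that the variable delays grow substantially from typical to slow.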
5. TESTING
To evaluate the performance of the proposed asynchronous techniques in a
practical situation, an asynchronous memory test chip is designed, fabricated and
tested. In this chapter, Section 5.1 first explains how the test chip is designed step
by step, from block diagram to schematic and finally to layout. Section 5.2 then
presents the HSPICE post-layout simulation results for the DVSCD and MDCG
techniques. Finally, the measured results of the test chip, comprising the logic
results and the timing results, are described in Section 5.3.
5.1 TEST CHIP DESIGN
5.1.1 BLOCK DIAGRAM
In Section 3.1, we proposed to implement the asynchronous techniques
for a 1M-bit SRAM. However, in order to simulate the performance of these
techniques, there is no need to simulate the entire memory matrix. As explained in
Section 4.1.3, the simulated performance of the proposed asynchronous techniques
can be fully observed by simulating one column of the memory matrix. This
idea is also applied to the asynchronous memory test chip, which consists of several
columns of memory cells.
Also, in order to simplify the simulation calculations, we modeled the
memory column by one memory cell connected to the estimated equivalent bit line
load. In this way, the benchmark simulation results as well as the CSCD, DVSCD
and MDCG simulation results were obtained. In the asynchronous memory test chip,
however, we can implement the entire memory column, which consists of 1024
memory cells, without any difficulty to obtain more convincing results.
The block diagram of the asynchronous memory test chip is shown in Figure
5-1. Basically, the design of the test chip follows the SRAM framework depicted in
Figure 3-1.
[Block diagram: 2-to-4 segment decoder, write completion circuit, control signals (REQ, ACK, PC, CR, CW, WC), precharge buffer, and four memory columns with read completion circuits using conventional and modified sense amplifiers]
Figure 5-1: Block Diagram of Asynchronous Memory Test Chip 
In order to obtain the testing results for the modified sense amplifier, two
memory columns are needed: one to test the proposed asynchronous techniques and
one to serve as a benchmark. To obtain more results for comparison, the
conventional sense amplifier is also tested, so two more memory columns are
required. In the figure, the four memory columns and the corresponding read
completion circuits are drawn with dashed lines. For Column 1 and Column 4, the bit
line is not segmented. For Column 2 and Column 3, the bit line is segmented into
four regions. For Column 1 and Column 2, the modified sense amplifier is used for
data sensing as well as read completion signal generation. For Column 3 and
Column 4, the conventional sense amplifier is used. The four read completion
signals are input to the 4-to-1 multiplexer so that only one control circuit is needed
to handle the four read completion signals.
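The column arrangement described above can be summarised as a small behavioural model. This is only a sketch: the class, the dictionary layout and the way a column is selected are assumptions made for illustration, and the real multiplexer is of course a circuit rather than a function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnConfig:
    segmented: bool        # True if the bit line is split into four regions
    sense_amplifier: str   # "modified" or "conventional"

# Configuration of the four test columns as stated in the text.
COLUMNS = {
    1: ColumnConfig(segmented=False, sense_amplifier="modified"),
    2: ColumnConfig(segmented=True, sense_amplifier="modified"),
    3: ColumnConfig(segmented=True, sense_amplifier="conventional"),
    4: ColumnConfig(segmented=False, sense_amplifier="conventional"),
}

def select_read_completion(rc_signals, column):
    """Behavioural 4-to-1 multiplexer: forward the read completion (RC)
    signal of the selected column to the single control circuit.

    rc_signals maps column number -> RC level (hypothetical encoding)."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    return rc_signals[column]

# Example: only Column 2 has completed its read.
rc = {1: False, 2: True, 3: False, 4: False}
print(select_read_completion(rc, 2))
```

Each sense amplifier type thus appears once with a segmented bit line and once with an unsegmented one, and all four share one control circuit through the multiplexer.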
5.1.2 SCHEMATIC 
The schematic of the asynchronous memory test chip is shown in Figure 5-2.
In the figure, the more important blocks are labeled. The 1024 x 4 bits memory
matrix is broken into eight blocks (M1-M8), each block with 128 x 4 bits. The
memory blocks are connected by the bit lines in a zig-zag way. Each memory block
is connected to four blocks on its left, which are the decoder circuits. P1-P4 are the
precharge buffers. T1-T3 are the transmission gates for bit line segmentation. R1
and W1 are the read and write buffers respectively. SA is the sense amplifier block.
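The organisation above implies a simple relation between a row, its memory block and its bit line region. The following sketch assumes that rows are numbered consecutively along the bit line starting from Region 1, and that each region contributes 6 pF of bit line load so that Region 1 sees the full 24 pF; the numbering scheme and all names are assumptions for illustration only.

```python
ROWS_PER_COLUMN = 1024
ROWS_PER_BLOCK = 128     # eight blocks, M1-M8
ROWS_PER_REGION = 256    # four regions separated by T1-T3
SEGMENT_LOAD_PF = 6.0    # estimated bit line load contributed by one region

def locate_row(row):
    """Return (block, region), both 1-based, for a 0-based row index.

    Assumes rows are numbered consecutively starting in Region 1."""
    if not 0 <= row < ROWS_PER_COLUMN:
        raise ValueError("row out of range")
    return row // ROWS_PER_BLOCK + 1, row // ROWS_PER_REGION + 1

def equivalent_load_pf(region):
    """Equivalent bit line load seen when accessing a region, assuming
    Region 1 sees all four segments and Region 4 sees only one."""
    if region not in (1, 2, 3, 4):
        raise ValueError("region must be 1..4")
    return (5 - region) * SEGMENT_LOAD_PF

print(locate_row(300), equivalent_load_pf(2))
```

Under these assumptions each region spans two memory blocks, and the equivalent loads of 24, 18, 12 and 6 pF match the bit line load values used in the simulation graphs.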




[Circuit schematic]
Figure 5-2: Schematic of Asynchronous Memory Test Chip
5.1.3 LAYOUT 
The floor plan of the asynchronous memory test chip is shown in Figure 5-3.
It shows how the different components are arranged in the layout but is not drawn to
scale. In order to minimize the layout size, the 1024 x 4 bits memory matrix is broken
into eight blocks, each block with 128 x 4 bits, connected by bit lines in a zig-zag way.
Apart from the memory circuit, an inverter chain (50 inverters) and a
voltage-controlled delay chain (10 variable delays) are included in the test chip. These
circuits allow us to directly measure the inverter delay value and the variable delay
value for analysis.
[Floor plan diagram; legible labels include Address Buffer, Transmission Gates and Precharge Buffer]
Figure 5-3: Floor Plan of Asynchronous Memory Test Chip
The full-custom asynchronous memory test chip layout is shown in Figure 5-4.
It is drawn using CADENCE LAYOUT. The chip is fabricated in ATMEL ES2
0.7 µm CMOS technology with a die size of 3710 x 5870 µm². In this run, 10 test
chips are fabricated. The specifications of the asynchronous memory test chip as well
as the chip microphotograph are included in Appendix Section 9.3.
[Layout plot]
Figure 5-4: Layout of Asynchronous Memory Test Chip
5.2 HSPICE POST-LAYOUT SIMULATION RESULTS
In Chapter 4, we have performed the simulations with bit line load added so 
that we can approximate the performance of the proposed asynchronous techniques. 
After drawing the layout of the asynchronous memory test chip, a netlist can be 
extracted which includes all the parasitic devices. Based on this netlist, more accurate 
simulations can be performed. The layout is simulated by HSPICE which will be 
described in this section. 
It is very time and resource consuming to perform HSPICE simulation on the 
entire layout of the test chip because the netlist extracted is very large. To perform 
the simulation, the layout is simplified. In each region, only four memory cells are 
used (one for each column) so that only 16 memory cells are present in the simplified 
layout. All the other memory cells in each region are replaced by a large capacitor. 
The bit lines are not deleted so that the 16 remaining memory cells are connected in a 
way similar to the original layout. To determine the parasitic capacitance of the 
deleted memory cells, the bit line capacitance is subtracted from the estimated bit 
line load per region. This is because the estimated bit line load is the sum of the bit
line capacitance and the parasitic capacitance of the memory cells. From Appendix
Section 9.2, for each region, the bit line capacitance is 0.0167 x 256 = 4.3 pF.
Therefore, for each region, the parasitic capacitance of the deleted memory cells is
6 - 4.3 = 1.7 pF. This capacitance is added to the bit lines for simulation. Based on the
netlist extracted from the simplified layout, the HSPICE results are obtained.
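The capacitance bookkeeping above can be checked with a few lines of arithmetic.
The sketch below assumes that 0.0167 is the bit line capacitance in pF per memory
cell row and that 6 pF is the estimated bit line load per region, as the text
suggests; the constant and function names are illustrative only.

```python
# Lumped capacitance that replaces the deleted memory cells in one region.
# Assumption: 0.0167 pF is the bit line wire capacitance per cell row and
# 6 pF is the estimated bit line load per region (values quoted in the text).

BIT_LINE_CAP_PER_CELL_PF = 0.0167   # quoted from Appendix Section 9.2
CELLS_PER_REGION = 256              # rows attached to one bit line segment
ESTIMATED_LOAD_PER_REGION_PF = 6.0  # bit line wire plus memory cell parasitics


def deleted_cell_capacitance_pf():
    """Return (wire capacitance, lumped cell capacitance) in pF, one decimal."""
    wire = round(BIT_LINE_CAP_PER_CELL_PF * CELLS_PER_REGION, 1)
    cells = round(ESTIMATED_LOAD_PER_REGION_PF - wire, 1)
    return wire, cells


wire_pf, cells_pf = deleted_cell_capacitance_pf()
print(f"bit line wire capacitance: {wire_pf} pF")
print(f"lumped cell capacitance  : {cells_pf} pF")
```

The rounding to one decimal place mirrors the precision used in the text.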
5.2.1 GRAPHICAL RESULTS 
The graphical results for the modified sense amplifier under the typical condition
are shown in Figure 5-5 (benchmark results) and Figure 5-6 (DVSCD and MDCG
results). The graphical results for the conventional sense amplifier under the typical
condition are shown in Figure 5-7 (benchmark results) and Figure 5-8 (DVSCD and
MDCG results). It is observed that all the read and write completion and
acknowledge signals are generated as expected, without glitches or hazards. The
results for the fast case and the slow case are similar to the typical ones except for the
timing difference. Also, it is found that the results for reading or writing a '0' or a '1' are
the same. The read access times for the modified sense amplifier are found to be
much shorter than those of the conventional sense amplifier, as expected.
[HSPICE waveform plot: REQ, WC, RC and ACK signals; voltage (linear) versus
time (linear), 20 ns to 300 ns]
Figure 5-5: Modified Sense Amplifier Benchmark Results 
[HSPICE waveform plot: ACK signals for Region 1 to Region 4, with REQ shown in
the Region 1 panel; voltage (linear) versus time (linear), 20 ns to 300 ns]
Figure 5-6: Modified Sense Amplifier DVSCD and MDCG Results
[HSPICE waveform plot: REQ, WC, RC and ACK signals; voltage (linear) versus
time (linear), 20 ns to 600 ns]
Figure 5-7: Conventional Sense Amplifier Benchmark Results 
[HSPICE waveform plot: ACK signals for Region 1 to Region 4, with REQ shown in
the Region 1 panel; voltage (linear) versus time (linear), 20 ns to 600 ns]
Figure 5-8: Conventional Sense Amplifier DVSCD and MDCG Results
5.2.2 VOLTAGE SENSING COMPLETION DETECTION
The DVSCD numerical results for the modified sense amplifier are shown in
Table 5-1. The average results are also shown; they assume that each region is
accessed with equal probability. The Read Acknowledge Time is
plotted against the bit line load in Figure 5-9. As observed from the graph, the Read
Acknowledge Time decreases from Region 1 to Region 4 as expected.
                                         Fast    Typical    Slow
Benchmark  Read Completion Time (ns)     9.08    19.73     42.67
           Read Acknowledge Time (ns)   11.90    23.01     47.23
Region 1   Read Completion Time (ns)    12.76    24.38     49.14
           Read Acknowledge Time (ns)   15.75    27.81     53.70
Region 2   Read Completion Time (ns)     9.59    19.18     40.03
           Read Acknowledge Time (ns)   12.41    22.60     44.59
Region 3   Read Completion Time (ns)     7.02    13.56     27.81
           Read Acknowledge Time (ns)    9.85    16.85     32.36
Region 4   Read Completion Time (ns)     0.86     8.63     14.62
           Read Acknowledge Time (ns)    3.51    11.92     19.18
Average    Read Completion Time (ns)     7.56    16.44     32.90
           Read Acknowledge Time (ns)   10.38    19.80     37.46
Table 5-1: Modified Sense Amplifier DVSCD Numerical Results
[Plot: Read Acknowledge Time (0 to 60) versus Bit Line Load /pF (6, 12, 18, 24);
curves for Fast, Typical and Slow conditions]
Figure 5-9: Graph of Read Acknowledge Time vs. Bit Line Load 
The numerical results for each region are compared with the benchmark
results in Table 5-2. The table shows the differences in time as well as the
percentage changes, with the benchmark results as the reference. The average Read
Completion Times are 7.56 ns, 16.44 ns and 32.90 ns for the fast, typical and slow
simulation conditions respectively, which are 1.52 ns (16.7 %), 3.29 ns (16.7 %) and
9.77 ns (22.9 %) faster than the benchmark results. The average Read Acknowledge
Times are 10.38 ns, 19.80 ns and 37.46 ns for the fast, typical and slow simulation
conditions respectively, which are 1.52 ns (12.8 %), 3.21 ns (14.0 %) and 9.77 ns
(20.7 %) faster than the benchmark results.
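As an illustration of how the averages and percentages quoted above are formed, the
following sketch recomputes one case: the slow-condition Read Completion Time of
the modified sense amplifier. The function name is an assumption made for this
example; the four region values and the benchmark value are the slow-condition
figures of Table 5-1.

```python
# Equal-probability average over the four regions and comparison with the
# benchmark, using the slow-condition Read Completion Times of Table 5-1.

def average_and_saving(region_times_ns, benchmark_ns):
    """Return (average, time saved, percentage saved) versus the benchmark."""
    average = sum(region_times_ns) / len(region_times_ns)
    saved = benchmark_ns - average
    percent = 100.0 * saved / benchmark_ns
    return average, saved, percent


slow_read_completion_ns = [49.14, 40.03, 27.81, 14.62]  # Regions 1 to 4
slow_benchmark_ns = 42.67

avg, saved, pct = average_and_saving(slow_read_completion_ns, slow_benchmark_ns)
print(f"average = {avg:.2f} ns, saved = {saved:.2f} ns ({pct:.1f} %)")
```

The plain arithmetic mean corresponds to the assumption that every region is
accessed with the same probability.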
                                         Fast                Typical             Slow
Region 1   Read Completion Time (ns)    +3.68 ns (+40.5%)   +4.65 ns (+23.6%)   +6.47 ns (+15.2%)
           Read Acknowledge Time (ns)   +3.85 ns (+32.4%)   +4.80 ns (+20.9%)   +6.47 ns (+13.7%)
Region 2   Read Completion Time (ns)    +0.51 ns (+5.6%)    -0.55 ns (-2.8%)    -2.64 ns (-6.2%)
           Read Acknowledge Time (ns)   +0.51 ns (+4.3%)    -0.41 ns (-1.8%)    -2.64 ns (-5.6%)
Region 3   Read Completion Time (ns)    -2.06 ns (-22.7%)   -6.17 ns (-31.1%)   -14.86 ns (-34.8%)
           Read Acknowledge Time (ns)   -2.05 ns (-17.2%)   -6.16 ns (-26.8%)   -14.87 ns (-31.5%)
Region 4   Read Completion Time (ns)    -8.22 ns (-90.5%)   -11.10 ns (-56.3%)  -28.05 ns (-65.7%)
           Read Acknowledge Time (ns)   -8.39 ns (-70.5%)   -11.09 ns (-48.2%)  -28.05 ns (-59.4%)
Average    Read Completion Time (ns)    -1.52 ns (-16.7%)   -3.29 ns (-16.7%)   -9.77 ns (-22.9%)
           Read Acknowledge Time (ns)   -1.52 ns (-12.8%)   -3.21 ns (-14.0%)   -9.77 ns (-20.7%)
Table 5-2: Comparison Between DVSCD and Benchmark Results
The DVSCD numerical results for the conventional sense amplifier are shown
in Table 5-3. The average results are also shown. The Read Acknowledge Time is
plotted against the bit line load in Figure 5-10. As observed from the graph, the Read
Acknowledge Time decreases from Region 1 to Region 4 as expected.
                                         Fast    Typical    Slow
Benchmark  Read Completion Time (ns)    73.29   105.82    159.04
           Read Acknowledge Time (ns)   76.37   109.25    163.97
Region 1   Read Completion Time (ns)    84.25   117.81    180.00
           Read Acknowledge Time (ns)   87.33   121.23    184.52
Region 2   Read Completion Time (ns)    67.12    91.44    139.73
           Read Acknowledge Time (ns)   70.21    95.21    144.25
Region 3   Read Completion Time (ns)    46.92    66.78     98.22
           Read Acknowledge Time (ns)   50.00    70.55    102.74
Region 4   Read Completion Time (ns)    26.03    39.04     55.48
           Read Acknowledge Time (ns)   29.11    42.81     60.41
Average    Read Completion Time (ns)    56.08    78.77    118.36
           Read Acknowledge Time (ns)   59.16    82.45    122.98
Table 5-3: Conventional Sense Amplifier DVSCD Numerical Results
[Plot: Read Acknowledge Time (0 to 200) versus Bit Line Load /pF (6, 12, 18, 24);
curves for Fast, Typical and Slow conditions]
Figure 5-10: Graph of Read Acknowledge Time vs. Bit Line Load
The numerical results for each region are compared with the benchmark
results in Table 5-4. The average Read Completion Times are 56.08 ns, 78.77 ns and
118.36 ns for the fast, typical and slow simulation conditions respectively, which are
17.21 ns (23.5 %), 27.05 ns (25.6 %) and 40.68 ns (25.6 %) faster than the benchmark
results. The average Read Acknowledge Times are 59.16 ns, 82.45 ns and 122.98 ns for
the fast, typical and slow simulation conditions respectively, which are 17.21 ns
(22.5 %), 26.80 ns (24.5 %) and 40.99 ns (25.0 %) faster than the benchmark results.
                                         Fast                 Typical              Slow
Region 1   Read Completion Time (ns)    +10.96 ns (+15.0%)   +11.99 ns (+11.3%)   +20.96 ns (+13.2%)
           Read Acknowledge Time (ns)   +10.96 ns (+14.4%)   +11.98 ns (+11.0%)   +20.55 ns (+12.5%)
Region 2   Read Completion Time (ns)    -6.17 ns (-8.4%)     -14.38 ns (-13.6%)   -19.31 ns (-12.1%)
           Read Acknowledge Time (ns)   -6.16 ns (-8.1%)     -14.04 ns (-12.9%)   -19.72 ns (-12.0%)
Region 3   Read Completion Time (ns)    -26.37 ns (-36.0%)   -39.04 ns (-36.9%)   -60.82 ns (-38.2%)
           Read Acknowledge Time (ns)   -26.37 ns (-34.5%)   -38.70 ns (-35.4%)   -61.23 ns (-37.3%)
Region 4   Read Completion Time (ns)    -47.26 ns (-64.5%)   -66.78 ns (-63.1%)   -103.56 ns (-65.1%)
           Read Acknowledge Time (ns)   -47.26 ns (-61.9%)   -66.44 ns (-60.8%)   -103.56 ns (-63.2%)
Average    Read Completion Time (ns)    -17.21 ns (-23.5%)   -27.05 ns (-25.6%)   -40.68 ns (-25.6%)
           Read Acknowledge Time (ns)   -17.21 ns (-22.5%)   -26.80 ns (-24.5%)   -40.99 ns (-25.0%)
Table 5-4: Comparison Between DVSCD and Benchmark Results
5.2.3 MULTIPLE DELAYS COMPLETION GENERATION 
The MDCG numerical results are shown in Table 5-5. The average results are
also shown. The Write Acknowledge Time is plotted against the bit line load in
Figure 5-11. As observed from the graph, the Write Acknowledge Time decreases
from Region 1 to Region 4 as expected.
                                          Fast    Typical    Slow
Benchmark  Actual Write Time (ns)         9.55    13.90     23.63
           Write Completion Time (ns)    13.56    17.47     29.26
           Write Acknowledge Time (ns)   14.59    19.64     31.32
Region 1   Actual Write Time (ns)        14.18    20.96     35.19
           Write Completion Time (ns)    19.01    25.75     39.71
           Write Acknowledge Time (ns)   20.14    27.33     41.67
Region 2   Actual Write Time (ns)        10.07    14.32     25.09
           Write Completion Time (ns)    13.56    17.47     29.26
           Write Acknowledge Time (ns)   14.59    19.64     31.32
Region 3   Actual Write Time (ns)         5.65     8.90     16.61
           Write Completion Time (ns)     9.04    10.34     19.41
           Write Acknowledge Time (ns)   10.07    11.78     21.47
Region 4   Actual Write Time (ns)         2.88     4.93     10.10
           Write Completion Time (ns)     6.47     7.33     13.56
           Write Acknowledge Time (ns)    7.60     8.84     15.79
Average    Actual Write Time (ns)         8.20    12.28     21.75
           Write Completion Time (ns)    12.02    15.22     25.49
           Write Acknowledge Time (ns)   13.10    16.90     27.56
Table 5-5: MDCG Numerical Results
[Plot: Write Acknowledge Time (0 to 45) versus Bit Line Load /pF (6, 12, 18, 24);
curves for Fast, Typical and Slow conditions]
Figure 5-11: Graph of Write Acknowledge Time vs. Bit Line Load 
The numerical results for each region are compared with the benchmark
results in Table 5-6. The average Actual Write Times are 8.20 ns, 12.28 ns and 21.75 ns
for the fast, typical and slow simulation conditions respectively, which are 1.35 ns
(14.1 %), 1.62 ns (11.7 %) and 1.88 ns (8.0 %) faster than the benchmark results. The
average Write Completion Times are 12.02 ns, 15.22 ns and 25.49 ns for the fast, typical
and slow simulation conditions respectively, which are 1.54 ns (11.4 %), 2.25 ns
(12.9 %) and 3.77 ns (12.9 %) faster than the benchmark results. The average Write
Acknowledge Times are 13.10 ns, 16.90 ns and 27.56 ns for the fast, typical and slow
simulation conditions respectively, which are 1.49 ns (10.2 %), 2.74 ns (14.0 %) and
3.76 ns (12.0 %) faster than the benchmark results.
                                          Fast                Typical              Slow
Region 1   Actual Write Time (ns)        +4.63 ns (+48.5%)   +7.06 ns (+50.8%)    +11.56 ns (+48.9%)
           Write Completion Time (ns)    +5.45 ns (+40.2%)   +8.28 ns (+47.4%)    +10.45 ns (+35.7%)
           Write Acknowledge Time (ns)   +5.55 ns (+38.0%)   +7.69 ns (+39.2%)    +10.35 ns (+33.0%)
Region 2   Actual Write Time (ns)        +0.52 ns (+5.4%)    +0.42 ns (+3.0%)     +1.46 ns (+6.2%)
           Write Completion Time (ns)     0.00 ns (0.0%)      0.00 ns (0.0%)       0.00 ns (0.0%)
           Write Acknowledge Time (ns)    0.00 ns (0.0%)      0.00 ns (0.0%)       0.00 ns (0.0%)
Region 3   Actual Write Time (ns)        -3.90 ns (-40.8%)   -5.00 ns (-36.0%)    -7.02 ns (-29.7%)
           Write Completion Time (ns)    -4.52 ns (-33.3%)   -7.13 ns (-40.8%)    -9.85 ns (-33.7%)
           Write Acknowledge Time (ns)   -4.52 ns (-31.0%)   -7.86 ns (-40.0%)    -9.85 ns (-31.4%)
Region 4   Actual Write Time (ns)        -6.67 ns (-69.8%)   -8.97 ns (-64.5%)    -13.53 ns (-57.3%)
           Write Completion Time (ns)    -7.09 ns (-52.3%)   -10.14 ns (-58.0%)   -15.70 ns (-53.7%)
           Write Acknowledge Time (ns)   -6.99 ns (-47.9%)   -10.80 ns (-55.0%)   -15.53 ns (-49.6%)
Average    Actual Write Time (ns)        -1.35 ns (-14.1%)   -1.62 ns (-11.7%)    -1.88 ns (-8.0%)
           Write Completion Time (ns)    -1.54 ns (-11.4%)   -2.25 ns (-12.9%)    -3.77 ns (-12.9%)
           Write Acknowledge Time (ns)   -1.49 ns (-10.2%)   -2.74 ns (-14.0%)    -3.76 ns (-12.0%)
Table 5-6: Comparison Between MDCG and Benchmark Results
5.3 MEASUREMENTS 
A test board is assembled for testing the asynchronous memory test chip.
Two types of tests are performed on the chip. First, the chip is tested to see whether it
operates according to the designed logic. This includes the generation of the read and
write completion signals and the correct functioning of the memory cells.
Afterwards, the timing results of the chip are measured. These timing results include
the read and write acknowledge times.
5.3.1 LOGIC RESULTS 
5.3.1.1 METHOD 
To test for the correct functioning of the test chip, the following simple and
basic tests are applied. The 10 test chips are first tested by writing a '0' and
checking that the read data is '0' when the acknowledge signal is generated, for all the
bits. Afterwards, they are tested by writing a '1' and checking that the read data is '1'
for all the bits.
To perform these tests, the Logic Analysis System is used. Several modules
incorporated in the system are used for the tests, including the pattern generator
and the logic analyzer. All the input signals are generated by the pattern generator,
and all the results are observed by the logic analyzer.
5.3.1.2 RESULTS 
Some typical results observed on the Logic Analysis System are shown in
Figure 5-12. All the 10 test chips are tested, and we found that 8 of them function
properly but 2 of them cannot generate the acknowledge signal. This is
probably due to a fabrication problem. For the 8 functioning test chips, the read and
acknowledge signals are generated as expected. Moreover, all the memory cells of
those 8 test chips function properly, as observed from the logic analyzer.
[Logic analyzer timing waveform displays, 100 ns/div, including the D I/P and
D O/P signals:
(a) Writing and Reading a '0'
(b) Writing and Reading a '1']
Figure 5-12: Typical Results Observed On Logic Analysis System 
5.3.2 TIMING RESULTS
5.3.2.1 METHOD 
According to the control circuit design described in Section 3.2, for the read
operation, the measured time difference is the sum of the word line delay, the precharge
delay and the Read Acknowledge Time. For the write operation, the measured time
difference is the sum of the word line delay and the Write Acknowledge Time.
Therefore, to find the Read Acknowledge Time and the Write Acknowledge Time,
the delay values are subtracted from the measured results. The remaining question is
how to find these delay values. As described in Section 5.1.3, a voltage-controlled delay
chain is included in the test chip. Therefore, we can approximate the word line and
precharge delays by applying suitable control voltages to the delay chain and
measuring the resulting delay values.
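The subtraction described above can be written out as a minimal sketch. The
function names and all numeric values in the usage lines are hypothetical
placeholders chosen for illustration; they are not measurements from the test chip.

```python
# De-embedding the acknowledge times from the measured request-to-acknowledge
# time differences. All numeric values below are hypothetical placeholders.

def read_acknowledge_time(measured_ns, word_line_delay_ns, precharge_delay_ns):
    """Read: measured = word line delay + precharge delay + Read Acknowledge Time."""
    return measured_ns - word_line_delay_ns - precharge_delay_ns


def write_acknowledge_time(measured_ns, word_line_delay_ns):
    """Write: measured = word line delay + Write Acknowledge Time."""
    return measured_ns - word_line_delay_ns


# Hypothetical example: delays approximated from the on-chip delay chain.
print(read_acknowledge_time(measured_ns=40.0, word_line_delay_ns=5.0,
                            precharge_delay_ns=6.0))
print(write_acknowledge_time(measured_ns=30.0, word_line_delay_ns=5.0))
```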
For the read operation, the benchmark as well as the DVSCD timing results
are measured for both the modified and conventional sense amplifiers. To
obtain more representative results, we take the average of the results for the farthest and
the nearest memory cell from the sense amplifier within the entire bit line (for the
benchmark case) or within the same bit line segment (for the DVSCD case).
For the write operation, the benchmark as well as the MDCG timing results
are measured. To obtain more representative results, we take the average of the
results for the farthest and the nearest memory cell from the write buffer within the
entire bit line (for the benchmark case) or within the same bit line segment (for the
MDCG case). As described in Section 3.6, the write acknowledge signal is generated
by one of the voltage-controlled delays in the write completion circuit. For each
region, the voltage-controlled delay is tuned so that the write acknowledge signal is
generated when the write operation is finished. The remaining question is how to tune
these delays. For each region, the delay is designed so that its minimum
delay value is smaller than the respective Actual Write Time. The performance of
these delays is verified by HSPICE simulation. The delays are then tuned
according to the following method. First, for each region, the delay is tuned to
its minimum value so that there is not enough time for the write operation to take
place. The memory cell is then used to write and read a '1' and then write and read a
'0' alternately, and the data output signal is monitored. At first, the data output signal
will stay either HIGH or LOW. The delay is then increased step by step.
When the data output signal can just follow the sequence of
the data input signal, there is just enough time for writing the data to
the memory cell, and the tuning of the delay is done.
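The tuning method above amounts to a linear search for the smallest delay setting
at which the read-back data follows the written data. The sketch below models it in
software. The chip is replaced by a hypothetical stand-in (`make_chip_model`) in
which a write succeeds only if the acknowledge delay is at least the actual write
time; the delay range, the step size and the 12.3 ns write time are likewise
assumptions for illustration.

```python
# Step-by-step tuning of one voltage-controlled delay, as a software model.
# The chip is replaced by a hypothetical stand-in: a write succeeds only if
# the acknowledge delay is at least the (unknown) actual write time.

def make_chip_model(actual_write_time_ns):
    """Return a function that writes a bit and reads it back after a delay."""
    state = {"cell": 0}

    def write_then_read(bit, delay_ns):
        if delay_ns >= actual_write_time_ns:  # enough time: cell is updated
            state["cell"] = bit
        return state["cell"]                  # otherwise the old data is read

    return write_then_read


def tune_delay(write_then_read, delay_settings_ns, pattern=(1, 0, 1, 0)):
    """Return the smallest delay setting for which the output follows the input."""
    for delay in sorted(delay_settings_ns):
        outputs = [write_then_read(bit, delay) for bit in pattern]
        if outputs == list(pattern):
            return delay
    return None  # no available setting is long enough


chip = make_chip_model(actual_write_time_ns=12.3)    # hypothetical value
settings = [4.0 + 0.5 * step for step in range(40)]  # hypothetical range
print(tune_delay(chip, settings))
```

With too short a delay the modelled output stays at one level, which corresponds
to the data output signal staying HIGH or LOW at the start of the procedure.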
The Cathode Ray Oscilloscope (CRO) is used for accurate timing 
measurements. The input signals are still generated by the pattern generator 
incorporated in the Logic Analysis System. Using the CRO, the time differences 
between the request and acknowledge signals for the read and write operations are 
measured. 
5.3.2.2 GRAPHICAL RESULTS 
Some typical waveforms observed on the CRO are shown in Figure 5-13 as
examples. All the 10 test chips are measured, and we found that 8 of them function
properly but 2 of them cannot generate the acknowledge signal. In the
following paragraphs, the average values of the 8 functioning test chips are used. The
time differences between the request and acknowledge signals are measured from the
CRO.
[Oscilloscope screen captures, 500 MS/s sample mode:
(a) Modified Sense Amplifier Benchmark Results
(b) Modified Sense Amplifier DVSCD Region 4 Results
(c) Conventional Sense Amplifier Benchmark Results
(d) Conventional Sense Amplifier DVSCD Region 4 Results
(e) MDCG Benchmark and Region 4 Results]
Figure 5-13: Typical Waveforms Observed On CRO
5.3.2.3 VOLTAGE SENSING COMPLETION DETECTION
The measured DVSCD results for the modified and conventional sense
amplifiers are tabulated in Table 5-7. These results are also plotted in Figure 5-14.
The DVSCD results for the modified and conventional sense amplifiers are
compared with the respective benchmark results in Table 5-8. For the modified sense
amplifier, the average Read Acknowledge Time is 22.31 ns, which is 2.73 ns (10.9 %)
faster than the benchmark result. For the conventional sense amplifier, the average
Read Acknowledge Time is 38.62 ns, which is 6.25 ns (13.9 %) faster than the
benchmark result.
                                         Modified    Conventional
Benchmark  Read Acknowledge Time (ns)    25.04       44.86
Region 1   Read Acknowledge Time (ns)    26.00       50.38
Region 2   Read Acknowledge Time (ns)    24.58       42.71
Region 3   Read Acknowledge Time (ns)    22.14       35.14
Region 4   Read Acknowledge Time (ns)    16.51       26.28
Average    Read Acknowledge Time (ns)    22.31       38.62
Table 5-7: DVSCD Measured Results
[Plot: Read Acknowledge Time (0 to 60) versus Bit Line Load Ratio (1, 2, 3, 4);
curves for the Modified and Conventional sense amplifiers]
Figure 5-14: Graph of Read Acknowledge Time vs. Bit Line Load
                                         Modified             Conventional
Region 1   Read Acknowledge Time (ns)    +0.96 ns (+3.8%)     +5.52 ns (+12.3%)
Region 2   Read Acknowledge Time (ns)    -0.46 ns (-1.8%)     -2.15 ns (-4.8%)
Region 3   Read Acknowledge Time (ns)    -2.90 ns (-11.6%)    -9.72 ns (-21.7%)
Region 4   Read Acknowledge Time (ns)    -8.53 ns (-34.1%)    -18.63 ns (-41.5%)
Average    Read Acknowledge Time (ns)    -2.73 ns (-10.9%)    -6.25 ns (-13.9%)
Table 5-8: Comparison Between DVSCD and Benchmark Results
When compared to the post-layout simulated results, we found that for both
amplifiers, the differences in the measured Read Acknowledge Times between
different regions are smaller than in the simulated results. This is because the actual bit
line load is smaller than the estimated one. Also, we found that the measured results
for the modified sense amplifier are faster than the slow simulation results but slower
than the typical simulation results, while the measured results for the conventional
sense amplifier are faster than the fast simulation results. These discrepancies are
caused by the difference between the simulation models and the performance of the
fabricated transistors.
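The two observations above can be reproduced from the tabulated averages. The
sketch below places each measured average Read Acknowledge Time relative to the
fast, typical and slow simulation averages, and compares the Region 1 to Region 4
spread of the measurements with that of the typical-condition simulation for the
modified sense amplifier. The function and dictionary names are assumptions made
for this example; the numbers are taken from the tables and text of this chapter.

```python
# Position of the measured average Read Acknowledge Times relative to the
# post-layout simulation corners, and the Region 1 to Region 4 spread.

SIMULATED_AVG_NS = {  # (fast, typical, slow) simulated averages
    "modified": (10.38, 19.80, 37.46),
    "conventional": (59.16, 82.45, 122.98),
}
MEASURED_AVG_NS = {"modified": 22.31, "conventional": 38.62}


def classify(measured_ns, corners_ns):
    """Describe where a measured time lies relative to the simulation corners."""
    fast, typical, slow = corners_ns
    if measured_ns < fast:
        return "faster than fast"
    if measured_ns < typical:
        return "between fast and typical"
    if measured_ns < slow:
        return "between typical and slow"
    return "slower than slow"


def spread(region1_ns, region4_ns):
    """Difference between the slowest and fastest region, in ns."""
    return round(region1_ns - region4_ns, 2)


for amplifier, measured in MEASURED_AVG_NS.items():
    print(amplifier, "->", classify(measured, SIMULATED_AVG_NS[amplifier]))

# Modified sense amplifier: measured spread versus typical simulated spread.
print(spread(26.00, 16.51), "ns measured;", spread(27.81, 11.92), "ns simulated")
```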
5.3.2.4 MULTIPLE DELAYS COMPLETION GENERATION 
The MDCG results are tabulated in Table 5-9. When obtaining the MDCG
results according to the method described in Section 5.3.2.1, we found that, in all
cases, the write operation can still take place even if the voltage-controlled delay
is tuned to its minimum value. The Write Acknowledge Time is found to be 9.52 ns
for all cases. This implies that the actual write time for all cases is less than 9.52 ns.
Benchmark  Write Acknowledge Time (ns)   9.52
Region 1   Write Acknowledge Time (ns)   9.52
Region 2   Write Acknowledge Time (ns)   9.52
Region 3   Write Acknowledge Time (ns)   9.52
Region 4   Write Acknowledge Time (ns)   9.52
Table 5-9: MDCG Results
At this point, there are two questions. Firstly, why is the measured Write
Acknowledge Time 9.52 ns when the voltage-controlled delay is tuned to its minimum
delay value? It should be much less than 9.52 ns. From the measurements of the
voltage-controlled delay chain, the average minimum delay value is 3.68 ns.
Therefore, the minimum Write Acknowledge Time should be around 4 - 5 ns. The
measured result of 9.52 ns is probably due to the propagation delays of other
parts of the circuit being larger than the simulated values. Secondly, the post-layout
simulated Write Acknowledge Times for some regions are much larger than 9.52 ns, so
why can the write operation be completed in less than 9.52 ns in the test chip? To
account for this speed-up of the write operation, it is believed that the actual bit line
load is smaller than the estimated one. These discrepancies are caused by the difference
between the simulation models and the performance of the fabricated transistors.
6. DISCUSSION 
We are now in a position to summarize, discuss and conclude the simulation
and testing results of the three asynchronous techniques presented in the previous
chapters. The CSCD, DVSCD and MDCG methods are discussed in Sections 6.1, 6.2
and 6.3 respectively. Afterwards, we compare these three asynchronous
techniques and evaluate the effect of bit line segmentation on them
in Section 6.4. Then, the general application of the proposed memory system as a
large memory block is presented in Section 6.5. Finally, Section 6.6 explains possible
further developments of the proposed memory system, which include a
two-phase HCP interface, data bus expansion, speed optimization and a modified write
completion method.
6.1 CURRENT SENSING COMPLETION DETECTION
6.1.1 COMMENTS AND CONCLUSION 
The conventional sense amplifier is not used in the CSCD implementation
because of its asymmetrical switching current characteristics. Instead, the modified
sense amplifier, which has symmetrical switching current characteristics, is used. It is
much faster than the conventional one but consumes more power. In simulation, the
CSCD method can generate the read completion signal. However,
the average results for fast, typical and slow simulation conditions are all several
times slower than the benchmark results. This is because the current sensor is not
sensitive enough, so the read completion signal cannot be generated close enough
to data read completion. For the present current sensor, it is hard to increase the
sensitivity while maintaining correct operation. Therefore, we conclude that,
with the present current sensor design, the CSCD method is not suitable for
generating the read completion signal.
6.1.2 SUGGESTION 
To use the CSCD method for read completion generation, the present current
sensor should be modified. The modified current sensor should be able to sense a small
current change (for example, 1 mA) while generating the proper read completion
signal. This can be done by adding an extra stage to the current sensor. The first stage
is responsible for sensing the switching current, whereas the second stage amplifies
the completion signal from the first stage output. In this way, the read completion
signal is expected to be generated closer to data read completion, at the
expense of consuming more static and dynamic power.
If the current sensor can be modified as above, the CSCD method will
produce better results. However, whether this method is suitable for practical use is
still an open question. First, the current characteristics of the CMOS logic
block may not be very stable. Second, the difference between the simulation models and
the fabricated transistors makes the current characteristics even less reliable.
These two problems make it very difficult to set a suitable threshold for the current
sensor. Therefore, one should take note of these problems when using the CSCD
method.
6.2 VOLTAGE SENSING COMPLETION DETECTION
6.2.1 RESULTS COMPARISON
6.2.1.1 GENERAL
For comparison, we choose the typical condition simulation results to
represent the simulation results. The pre-layout DVSCD simulation results,
post-layout DVSCD simulation results and the DVSCD testing results for the modified
sense amplifier are compared in Table 6-1. The time difference and the percentage
change relative to the respective benchmark results are shown in brackets.
These results are plotted in Figure 6-1.
Read Acknowledge Time (ns), modified sense amplifier

            Pre-layout Simulation    Post-layout Simulation   Testing
Benchmark   15.81                    23.01                    25.04
Region 1    [illegible]              [illegible]              26.00
            (+3.83 ns, +24.2%)       (+4.80 ns, +20.9%)       (+0.96 ns, +3.8%)
Region 2    [illegible]              22.60                    24.58
            (-0.95 ns, -6%)          (-0.41 ns, -1.8%)        (-0.46 ns, -1.8%)
Region 3    [illegible]              [illegible]              [illegible]
            (-5.98 ns, -37.8%)       (-6.16 ns, -26.8%)       (-2.90 ns, -11.6%)
Region 4    [illegible]              [illegible]              16.51
            (-10.06 ns, -63.6%)      (-11.09 ns, -48.2%)      (-8.53 ns, -34.1%)
Average     12.52                    19.80                    22.31
            (-3.29 ns, -20.8%)       (-3.21 ns, -14.0%)       (-2.73 ns, -10.9%)
Table 6-1: DVSCD Results Comparison
[Plot: horizontal axis Bit Line Load Ratio (1 to 4); vertical axis scaled from 0 to 30;
series: Pre-layout Simulation, Post-layout Simulation, Testing]
Figure 6-1: Graph of DVSCD Results Comparison
As observed from the graph, in general, the pre-layout simulation results are
faster than the post-layout simulation results. This is reasonable since all the
parasitic effects are considered in the latter. The testing results are found to be
slower than the post-layout simulation results. As mentioned in Section
5.3.2.3, the testing results lie somewhere between the typical and slow post-layout
simulation results. Therefore, the testing results are reasonable.
Also, as observed from the graph, the pre-layout and post-layout simulation
results increase linearly with the bit line load. Due to discrepancies in
fabrication, the testing results are not perfectly linear, but we can still draw a
best-fit straight line through the data points. The testing results form a line with a
smaller slope than the simulation results. This is explained in the next section.
6.2.1.2 BIT LINE LOAD
The maximum variation of the Read Acknowledge Time is obtained by
calculating the difference between the Region 1 and Region 4 results. It varies
proportionally with the magnitude of the bit line load. These results are tabulated in
Table 6-2. Recall that in the pre-layout simulation we approximated
the bit line load as 6 pF per region, and that in the post-layout
simulation the bit line load was also kept at 6 pF per region. Therefore, it is
reasonable for the pre-layout and post-layout simulation results to be very
similar (13.89 ns and 15.91 ns respectively), with the latter being larger.
However, the testing result (9.49 ns) is much smaller than both simulation results.
This implies that either the bit line load in the test chip is smaller than the value we
estimated, or the drive strength of the memory cell is stronger in the test chip, or
both.
Read Acknowledge Time (ns)

                           Pre-layout Simulation   Post-layout Simulation   Testing
Benchmark                  15.81                   23.01                    25.04
Region 1                   [illegible]             [illegible]              26.00
Region 4                   [illegible]             [illegible]              16.51
Maximum Variation          13.89                   15.91                    9.49
(Region 1 - Region 4)
Transmission Gates Delay   3.83                    4.80                     0.96
(Region 1 - Benchmark)
Table 6-2: Effect of Bit Line Load and Segmentation on DVSCD Method
6.2.1.3 BIT LINE SEGMENTATION 
The segmentation transmission gates delay can be obtained by calculating the
difference between the Region 1 and Benchmark results. This difference is
the delay sum of the three segmentation transmission gates. These results are also
shown in Table 6-2. The pre-layout and post-layout simulation results are very
similar (3.83 ns and 4.80 ns respectively), with the latter being larger, which is
reasonable. However, the testing result (0.96 ns) is significantly smaller than both
simulation results. This implies that the segmentation transmission gates delay in the
test chip is smaller.
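The two derived quantities used in Sections 6.2.1.2 and 6.2.1.3 are simple differences. As a minimal sketch, they can be reproduced from the testing results quoted in this chapter for the modified sense amplifier (Benchmark 25.04 ns, Region 1 26.00 ns, Region 4 16.51 ns). The function names are illustrative only and do not come from the thesis.

```python
def max_variation(region1_ns, region4_ns):
    """Maximum variation of the Read Acknowledge Time: Region 1 minus Region 4."""
    return round(region1_ns - region4_ns, 2)


def gates_delay(region1_ns, benchmark_ns):
    """Delay sum of the segmentation transmission gates: Region 1 minus Benchmark."""
    return round(region1_ns - benchmark_ns, 2)


# Testing results for the modified sense amplifier, in ns.
benchmark, region1, region4 = 25.04, 26.00, 16.51

print(max_variation(region1, region4))  # 9.49
print(gates_delay(region1, benchmark))  # 0.96
```

Both values agree with the testing column of Table 6-2.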
6.2.2 RESOURCE CONSUMPTION 
6.2.2.1 AREA 
The DVSCD circuit shown in Figure 3-10 consists of two sense
amplifiers, one NOR gate, one inverter and one NAND gate. The layout size of one
DVSCD circuit is 14342 µm², which is mainly contributed by the sense amplifiers. If
we generate the read completion signal by the conventional method (using only one
delay circuit), the layout size of the read completion circuit will be 2880 µm².
Although the DVSCD circuit is 4.98 times the area of the conventional read
completion circuit, it is still not a large circuit.
6.2.2.2 POWER 
Basically, the power consumption of the DVSCD circuit arises from the two
sense amplifiers. The power consumption of the other components in the DVSCD circuit
is small. For each modified sense amplifier, the simulated static current is 5 mA and
the dynamic current can reach as high as 15 mA. There is a trade-off
between the speed and the power consumption of the sense amplifier: the faster the
speed, the larger the power consumption.
6.2.3 COMMENTS AND CONCLUSION 
The DVSCD method has been proved functional by simulation and testing.
Moreover, it is shown that we can obtain average speed gains of 3.29 ns (20.8%),
3.21 ns (14.0%) and 2.73 ns (10.9%) over the benchmark results for the pre-layout
simulation, post-layout simulation and testing results respectively.
Although the maximum variation in Read Acknowledge Time in the test chip is
smaller than the simulated value, it is still large enough that the DVSCD
method is worth applying. The segmentation transmission gates delay in the test chip
is even smaller than in the simulation results, so the bit line segmentation method is
worth applying together with the DVSCD method. The DVSCD method does not consume
much area, but its power consumption is significant. When applying the
DVSCD method, the trade-off between power and speed should be kept in mind.
6.3 MULTIPLE DELAY COMPLETION GENERATION
6.3.1 RESULTS COMPARISON 
6.3.1.1 GENERAL 
The pre-layout MDCG simulation results, post-layout MDCG simulation
results and the MDCG testing results are compared in Table 6-3. The time difference
and the percentage change relative to the respective benchmark results are
shown in brackets. These results are plotted in Figure 6-2.
Write Acknowledge Time (ns)

            Pre-layout Simulation    Post-layout Simulation   Testing
Benchmark   17.09                    19.64                    9.52
Region 1    23.37                    27.33                    [illegible]
            (+6.28 ns, +36.7%)       (+7.69 ns, +39.2%)
Region 2    [illegible]              19.64                    [illegible]
            (-2.00 ns, -11.7%)       (0.00 ns, 0.0%)
Region 3    [illegible]              [illegible]              [illegible]
            (-7.59 ns, -44.4%)       (-7.86 ns, -40.0%)
Region 4    5.36                     8.84                     [illegible]
            (-11.73 ns, -68.6%)      (-10.80 ns, -55.0%)
Average     [illegible]              [illegible]              [illegible]
            (-3.76 ns, -22.0%)       (-2.74 ns, -14.0%)
Table 6-3: MDCG Results Comparison
[Plot: horizontal axis Bit Line Load Ratio (1 to 4); vertical axis scaled from 0 to 30;
series: Pre-layout Simulation, Post-layout Simulation, Testing]
Figure 6-2: Graph of MDCG Results Comparison
As observed from the graph, in general, the pre-layout simulation results are
faster than the post-layout simulation results. This is reasonable since all the
parasitic effects are considered in the latter. As discussed previously in Section
5.3.2.4, the Write Acknowledge Time is found to be 9.52 ns for all cases, which
implies that the actual write time is less than 9.52 ns. The write operation is
faster in the test chip than in simulation. This implies that either the bit line load
in the test chip is smaller than the value we estimated, or the drive strength of the
write buffer is stronger in the test chip, or both.
6.3.1.2 BIT LINE LOAD 
The maximum variation of the Write Acknowledge Time, which is obtained
by calculating the difference between the Region 1 and Region 4 results, is tabulated
in Table 6-4. Since we cannot measure the MDCG results from the test chip, only the
simulation results are tabulated. It is reasonable for the pre-layout and
post-layout simulation results to be very similar (18.01 ns and 18.49 ns respectively),
with the latter being larger.
Write Acknowledge Time (ns)

                           Pre-layout Simulation   Post-layout Simulation
Benchmark                  17.09                   19.64
Region 1                   23.37                   27.33
Region 4                   5.36                    8.84
Maximum Variation          18.01                   18.49
(Region 1 - Region 4)
Transmission Gates Delay   6.28                    7.69
(Region 1 - Benchmark)
Table 6-4: Effect of Bit Line Load and Segmentation on MDCG Method
6.3.1.3 BIT LINE SEGMENTATION 
The segmentation transmission gates delay, which is obtained by calculating
the difference between the Region 1 and Benchmark results, is also shown in Table
6-4. The pre-layout and post-layout simulation results are very similar (6.28 ns and
7.69 ns respectively), with the latter being larger, which is reasonable.
6.3.2 RESOURCE CONSUMPTION 
6.3.2.1 AREA 
The MDCG circuit shown in Figure 3-12 consists of four delay
circuits and four transmission gates. The layout size of one MDCG circuit is 21879
µm². If we generate the write completion signal by the conventional method (using
only one delay circuit), the layout size of the write completion circuit will be 2880
µm². Although the MDCG circuit is 7.60 times the area of the conventional write
completion circuit, it is still not very large.
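The area ratios quoted in Sections 6.2.2.1 and 6.3.2.1 follow directly from the layout sizes. The short sketch below recomputes them; the constant and function names are illustrative and not taken from the thesis.

```python
# Layout size of the conventional completion circuit (one delay circuit), in µm².
CONVENTIONAL_AREA_UM2 = 2880


def area_ratio(circuit_area_um2, reference_area_um2=CONVENTIONAL_AREA_UM2):
    """Return how many times larger a completion circuit is than the reference."""
    return round(circuit_area_um2 / reference_area_um2, 2)


print(area_ratio(14342))  # DVSCD circuit: 4.98
print(area_ratio(21879))  # MDCG circuit: 7.6
```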
6.3.2.2 POWER 
Unlike the DVSCD circuit, which consumes both static and dynamic power,
the MDCG circuit only consumes dynamic power. The dynamic power consumption
is small since the W/L ratios of all the transistors used in the MDCG circuit are not
large.
6.3.3 COMMENTS AND CONCLUSION 
The MDCG method has been proved functional by simulation. Moreover, it is
shown that we can obtain average speed gains of 3.76 ns (22.0%) and 2.74 ns (14.0%)
over the benchmark results for the pre-layout and post-layout simulation results
respectively. The MDCG method can generate the write completion signal in the test chip.
However, due to the discrepancies between the simulation models and the fabricated
transistors, we could not measure the MDCG results. The segmentation
transmission gates delay is acceptable in the simulation results, so the bit line
segmentation method is worth applying together with the MDCG method. The area
consumption of the MDCG method is not very large, and the power consumption is
small. The maximum variation in Write Acknowledge Time in the simulation results
is large, so the MDCG method appears to be worth applying.
However, from the measured results, we can estimate that the maximum variation
will be less than 9.52 ns; in that case, the MDCG method may not be as beneficial as
expected. The MDCG method will be more beneficial if the bit line load is large.
6.4 GENERAL COMMENTS 
6.4.1 COMPARISON OF THE THREE TECHNIQUES 
In designing and implementing the proposed asynchronous memory system,
we have investigated three completion signal generation techniques: the CSCD
method (Current Sensing Completion Detection), the DVSCD method (Dual-Rail
Voltage Sensing Completion Detection) and the MDCG method (Multiple Delays
Completion Generation). These techniques are compared in Table 6-5.
                     CSCD                 DVSCD                MDCG
Application          Read completion      Read completion      Write completion
                     signal generation    signal generation    signal generation
[illegible]          Current sensing      Voltage sensing      Delay insertion
Simulation Results   Unacceptable         Good                 Good
Testing Results      N/A                  Good                 Cannot be determined
Transistor Count     Small                Medium               Large
Area Consumption     Small                Medium               Large
Power Consumption    Medium dynamic       Large dynamic        Small dynamic power
                     and static power     and static power
Effect of Bit Line   N/A                  Small                Small
Segmentation
Portability          Bad                  Good                 Fair
Suggested            General CMOS         Memory circuits      General asynchronous
Application          circuits                                  circuits
Table 6-5: Comparison between the CSCD, DVSCD and MDCG methods
The first four items in the table have been explained previously. For transistor
count and area consumption, the CSCD method is the smallest because it consists of
only a current sensor and a NAND gate; the DVSCD method is in the middle; and the
MDCG method is the largest since it consists of several delay elements, which are
complicated circuits. For power consumption, the MDCG method is the smallest
since it consumes only a small amount of dynamic power; the CSCD method is in the
middle since the current sensor consumes a certain amount of dynamic and static
power; and the DVSCD method is the largest since the sense amplifiers consume a
significant amount of dynamic and static power. The effect of bit line
segmentation is small for both the DVSCD and MDCG methods.
By portability, we mean the ease of applying these techniques to another memory system
with a similar framework but a different size. Here, the DVSCD method is the
best since it can be readily applied to the new system by simply replacing the sense
amplifiers with the new ones; the MDCG method is in the middle since we have to
find the respective delay values of the new system and then change the values of
the delay elements in use; and the CSCD method is the worst since the sense amplifier
used in the new memory system may not be the same, so the threshold of the
current sensor has to be reset, which is time consuming and tricky.
After investigation, we make the following suggestions on the application of these
techniques. The CSCD method is suitable for general CMOS circuits, but may not be
suitable for special circuits like the memory circuit; the DVSCD method is
tailor-made for read completion signal generation in the memory circuit; and the MDCG
method is suitable for general asynchronous circuits with significant timing
differences between different modes of operation.
6.4.2 BIT LINE SEGMENTATION
As mentioned in the previous section, the effect of bit line segmentation on
the DVSCD and MDCG methods is small. That the bit line can be segmented is shown
by the differences in access times in both the simulation and the testing results. The
layout size of each transmission gate pair is about 4 times the size of one
memory cell. Therefore, segmenting into four regions adds the equivalent of 12 rows
of memory cells to the memory matrix. If there are 1024 rows in the
memory matrix, the segmentation transmission gates increase the matrix size by
about 1.2%, which is small. Therefore, we conclude that bit line segmentation
is functional and effective, and that its area and power consumption are small.
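The area overhead estimate above can be checked with a few lines of arithmetic. The sketch below assumes, as the figures in the text imply, one transmission gate pair per boundary between adjacent regions (three boundaries for four regions), each occupying about 4 rows of memory cells. The function name and parameters are illustrative only.

```python
def segmentation_overhead(rows, regions=4, rows_per_gate_pair=4):
    """Estimate the extra rows and percentage growth caused by segmentation.

    Assumes one transmission gate pair per boundary between adjacent regions.
    """
    boundaries = regions - 1
    extra_rows = boundaries * rows_per_gate_pair
    percent = 100.0 * extra_rows / rows
    return extra_rows, percent


extra_rows, percent = segmentation_overhead(rows=1024)
print(extra_rows, round(percent, 1))  # 12 1.2
```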
6.5 APPLICATION 
The proposed 1M-bit asynchronous memory system can be used as one of the
basic building blocks for various asynchronous applications. However, for some
applications, the memory requirement is more than 1M bits. How can several
1M-bit systems be connected together to form a larger memory block?
Several 1M-bit systems can be connected together easily through the use of
multiplexers and decoders. To illustrate this, the connection for an 8M-bit
asynchronous memory system, shown in Figure 6-3, is taken as an example.
Apart from the eight 1M-bit systems, a decoder and a multiplexer are required. All
the data buses (1 bit) and address buses (20 bits) of the 1M-bit systems are connected
together. One may wonder how the eight pairs of request and acknowledge signals of
the 1M-bit systems are combined. For an 8M-bit system, the total
number of address bits required is 23. Apart from the 20 address bits connected to the
address bus, the remaining 3 address bits are used as inputs to a 3-to-8 decoder, the
outputs of which control an 8-to-1 multiplexer. This multiplexer
combines the eight pairs of request and acknowledge signals into one pair
of request and acknowledge signals. In this way, the 8M-bit asynchronous memory
system can be used in exactly the same way as the 1M-bit system, except that the
memory size is eight times as large.
[Figure: eight 1M-bit Asynchronous SRAM blocks, each with Req, R/W, Ack, Address and
Data ports, connected through an 8-to-1 multiplexer (2-bit input/output bus) that is
controlled by a decoder; external signals R/W, REQ and ACK]
Figure 6-3: 8M-bit Asynchronous Memory System
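The addressing scheme described for the 8M-bit system can be sketched behaviourally as follows. This is only an illustration under stated assumptions: the text does not say which 3 of the 23 address bits feed the decoder, so the sketch assumes the upper 3 bits, and all function names are hypothetical.

```python
LOCAL_ADDRESS_BITS = 20  # address bus shared by all eight 1M-bit systems
BLOCK_SELECT_BITS = 3    # remaining bits, fed to the 3-to-8 decoder


def split_address(address):
    """Split a 23-bit address into (block number, 20-bit local address).

    Assumption: the upper 3 bits select the block.
    """
    if not 0 <= address < (1 << (LOCAL_ADDRESS_BITS + BLOCK_SELECT_BITS)):
        raise ValueError("address must fit in 23 bits")
    block = address >> LOCAL_ADDRESS_BITS
    local = address & ((1 << LOCAL_ADDRESS_BITS) - 1)
    return block, local


def decode_3_to_8(block):
    """One-hot output of a 3-to-8 decoder."""
    return [1 if i == block else 0 for i in range(8)]


def route_request(req, select):
    """Forward the single external request to the selected 1M-bit system only."""
    return [req & s for s in select]


def merge_acknowledge(acks, select):
    """Return the acknowledge of the selected 1M-bit system as the external one."""
    return int(any(a & s for a, s in zip(acks, select)))


block, local = split_address((5 << 20) | 0x12345)
select = decode_3_to_8(block)
print(block, hex(local), route_request(1, select))
```

With this split, the external interface still presents one request and one acknowledge signal, which matches the description of the 8M-bit system being used in the same way as the 1M-bit one.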
6.6 FURTHER DEVELOPMENTS 
6.6.1 INTERFACE WITH TWO-PHASE HCP
The proposed asynchronous memory system is designed to communicate with
other asynchronous systems based on four-phase HCP. However, some asynchronous
systems use two-phase HCP for communication. How can the proposed
system be modified to communicate with these systems?
To communicate with these systems, the proposed asynchronous memory
system should also be able to communicate using two-phase HCP. One way to achieve
this is to re-design the memory system as a two-phase system. All the control
signals described in Section 3.2.1 would have to be modified to two-phase. However, in
the present memory architecture, the internal control signals are level sensitive and
can only be generated in four-phase. To allow these signals to be generated in
two-phase, the entire memory architecture would have to be re-designed. This is not a
good option since the two-phase memory architecture is expected to be more
complex than the present one. The new architecture may consume more area and
power, yet it may not be faster than the present one. It would also be hard to design.
Therefore, we will not re-design the memory system as
a two-phase system for two-phase communication. Instead, we propose that an
interface circuit be designed to allow the memory system to communicate
with other systems based on two-phase HCP.
There is no need to modify the address bus, the data bus or the internal
control signals described in Section 3.2.1.2. Only the two external control signals,
request and acknowledge, should be modified. These signals are converted from
four-phase HCP signaling to two-phase HCP signaling by connecting them to the interface
circuit shown in Figure 6-4.
As shown in the circuit diagram, the four-phase request signal is
generated as follows. The two-phase request and acknowledge
signals are input to the Muller C-element, the output of which is XOR-ed with the
two-phase acknowledge signal. Whenever there is an event on the
two-phase request signal, its voltage level will be opposite to that of the two-phase
acknowledge signal. The C-element will then output the present voltage level of the
two-phase request signal. When the C-element output is combined with the two-phase
acknowledge signal by the XOR gate, the XOR gate output will go HIGH. When the
read / write operation is finished, the two-phase acknowledge signal will toggle its
voltage level so that the XOR gate output will go LOW. In this way, the four-phase
request signal is obtained at the XOR gate output.
[Figure: Interface Circuit containing a C-element and a flip-flop (Q, CLR), connected
between the external REQ and ACK signals and the Control Circuit]
Figure 6-4: Interface with Two-Phase HCP
The four-phase acknowledge signal is connected to the positive-edge
triggered D-type flip-flop. Whenever the four-phase acknowledge signal is
activated, a positive edge occurs and the D-type flip-flop output
toggles. The output then stays either HIGH or LOW (depending on its previous
state) during the entire active phase of the four-phase acknowledge signal. When the
four-phase acknowledge signal is deactivated, a negative edge occurs. However,
since the flip-flop is positive-edge triggered, its output remains unchanged during
the entire recovery phase of the four-phase acknowledge signal. As a result, the
two-phase acknowledge signal is obtained at the flip-flop output.
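The behaviour of the interface circuit described above can be illustrated with a small behavioural model. It is a sketch only, not the actual circuit: it assumes that the acknowledge input of the C-element is inverted (so that the C-element passes the new request level when request and acknowledge differ, as the text describes) and that the D-type flip-flop is wired as a toggle. The class and method names are hypothetical.

```python
class TwoPhaseInterface:
    """Behavioural model of a two-phase to four-phase handshake interface."""

    def __init__(self):
        self.c_out = 0       # state held by the Muller C-element
        self.ack2 = 0        # two-phase acknowledge (flip-flop output)
        self._prev_ack4 = 0  # previous four-phase acknowledge level

    def _c_element(self, a, b):
        # A Muller C-element copies its inputs when they agree, else holds.
        if a == b:
            self.c_out = a
        return self.c_out

    def req4(self, req2):
        """Four-phase request derived from the two-phase request level."""
        # Assumption: the acknowledge input of the C-element is inverted.
        c = self._c_element(req2, 1 - self.ack2)
        return c ^ self.ack2

    def clock_ack4(self, ack4):
        """Apply the four-phase acknowledge; toggle ack2 on its rising edge."""
        if ack4 == 1 and self._prev_ack4 == 0:
            self.ack2 ^= 1
        self._prev_ack4 = ack4
        return self.ack2


iface = TwoPhaseInterface()
print(iface.req4(1))        # event on two-phase request -> four-phase request HIGH
print(iface.clock_ack4(1))  # memory acknowledges -> two-phase acknowledge toggles
print(iface.req4(1))        # four-phase request returns LOW
print(iface.clock_ack4(0))  # recovery phase -> two-phase acknowledge unchanged
```

A second event on the two-phase request (a HIGH to LOW transition) produces another four-phase request pulse in the same way, which is the property the interface needs.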
6.6.2 DATA BUS EXPANSION 
The proposed asynchronous memory system has been simulated and tested with a
one-bit data bus. For applications in which the data bus width is more than one bit,
the completion circuits of the system should be modified. The write completion
circuit need not be modified since the generation of the write completion signal is
independent of the data bus width. Therefore, the only circuit that needs to be
modified is the read completion circuit. In Section 3.5.2, we described how the
read completion circuit is implemented for an eight-bit data bus. The same idea can
also be applied to the general case of an N-bit data bus. The block diagram of read
completion signal generation by the DVSCD method for an N-bit data bus is shown in
Figure 6-5, which is similar to Figure 3-11. For a normal memory matrix, since the
longest read access time among the N data bits occurs at either the first bit or the
last bit, the DVSCD method is applied to these two bits only. The outputs of the DVSCD
circuits are then combined with the precharge signal, and the final READCOMP signal is
obtained from the NAND gate.
[Figure: DVSCD Circuit blocks combined with the Precharge signal; output to the Read
Buffer]
Figure 6-5: N bits Data Bus DVSCD Read Completion Circuit
6.6.3 SPEED OPTIMIZATION
Since the memory access time of the proposed asynchronous memory system
depends on the memory location (Region 1 to Region 4), the average speed
performance of the proposed system depends on how memory cells in different
regions are accessed. In practical situations, only a certain percentage of the full
capacity of the system is required most of the time. Therefore, the speed of the
proposed system can in general be optimized by allocating the nearest memory cells
first (Region 4) and progressing towards the farthest memory cells (Region 1).
For example, suppose the system is used up to 256k bits for a certain
application. For comparison, different cases are considered, and the ways in which
the corresponding memory matrix is allocated are shown in Figure 6-6. In the figure,
the allocated memory blocks are filled with the 'x' pattern. For Case (a) (Conventional
method), since the access times for memory cells at different locations are the same,
the data are allocated at random locations. For Case (b) (DVSCD and MDCG
methods), the memory access time varies with memory location, but since the
asynchronous processor or the control unit does not manage how the memory
cells of the proposed system are allocated to optimize the speed, the data are still
allocated at random locations. For Case (c) (DVSCD and MDCG with speed
optimization), the memory access time varies with memory location, and the
asynchronous processor or control unit allocates the nearest memory cells first
(Region 4) to optimize the speed. The Read Acknowledge Times for the three cases
(a) Conventional Method (b) DVSCD and MDCG Methods (c) DVSCD and MDCG Methods with Speed Optimization
Figure 6-6: Memory Matrix Allocations for Different Cases
are compared in Table 6-6 based on the DVSCD measured results. The Write
Acknowledge Times are not compared since we cannot measure accurate results
for different regions, and the measured results imply that the difference between the
Region 1 and Region 4 Write Acknowledge Times is small. In the table, the results
for Case (b) and Case (c) are compared with those of Case (a), and the decreases in
access time and the respective percentages are shown in brackets. For the modified
sense amplifier, the time saved per read operation increases from 2.73 ns (10.9 %) to
8.53 ns (34.1 %) when speed is optimized. For the conventional sense amplifier, the
time saved per read operation increases from 6.25 ns (13.9 %) to 18.63 ns (41.5 %)
when speed is optimized. Therefore, the speed of the proposed system can be greatly
improved after optimization.
                                                     Read Acknowledge Time (ns)
Case                                                 Modified Sense Amplifier     Conventional Sense Amplifier
(a) Conventional Method                              25.04                        44.86
(b) DVSCD and MDCG Methods                           22.31 (-2.73 ns, -10.9 %)    38.62 (-6.25 ns, -13.9 %)
(c) DVSCD and MDCG Methods with Speed Optimization   16.51 (-8.53 ns, -34.1 %)    26.28 (-18.63 ns, -41.5 %)
Table 6-6: The Comparison of Read Acknowledge Times for Three Cases
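The modified sense amplifier figures for Case (b) and Case (c) can be reproduced from the per-region measurements. The following is a minimal sketch, not the procedure used in the thesis: it takes the region times from Table 9-3 and assumes that accesses are spread evenly over the regions in use and that the 256k bits of Case (c) fit entirely in Region 4. The function names are illustrative only.

```python
# Measured Read Acknowledge Times (ns) for the modified sense amplifier,
# taken from Table 9-3 (Region 1 is the farthest, Region 4 the nearest).
REGION_READ_ACK_NS = {1: 26.00, 2: 24.58, 3: 22.14, 4: 16.51}
BENCHMARK_NS = 25.04  # Case (a): conventional method, same time everywhere


def mean_read_ack(regions_used):
    """Average read acknowledge time, assuming accesses are spread evenly
    over the given regions (an assumption of this sketch)."""
    times = [REGION_READ_ACK_NS[r] for r in regions_used]
    return sum(times) / len(times)


def saving(avg_ns, benchmark_ns=BENCHMARK_NS):
    """Return (time saved in ns, saving as a percentage of the benchmark)."""
    saved = benchmark_ns - avg_ns
    return saved, 100.0 * saved / benchmark_ns


case_b = mean_read_ack([1, 2, 3, 4])  # random allocation over all regions
case_c = mean_read_ack([4])           # data kept in the nearest region only

for name, avg in (("Case (b)", case_b), ("Case (c)", case_c)):
    saved_ns, saved_pct = saving(avg)
    print(f"{name}: {avg:.2f} ns, saves {saved_ns:.2f} ns ({saved_pct:.1f} %)")
```

Under these assumptions the averages agree with the modified sense amplifier column of Table 6-6 to within rounding.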
The above example demonstrates the importance of speed optimization by the
control unit when only part of the memory matrix is required. What about the case
in which the entire 1M-bit memory matrix is needed? In this case, the speed of the
system can still be optimized. In general, this is achieved by first dividing the
data to be allocated to the proposed system into several classes according to how
frequently the data are used. Then, speed can be optimized by allocating the more
frequently used data to nearer memory cells and the less frequently used data to
farther memory cells. In conclusion, the way to optimize the speed of the proposed
system depends on the type of application. It can be achieved by modifying the
computer program that runs the control unit. The gain in speed after optimization
may vary according to the application.
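The frequency-based allocation described above can be illustrated with a short sketch. It is only an example under stated assumptions: the four data classes and their access fractions are hypothetical, each class is assumed to fill exactly one region, and the region times are the modified sense amplifier values from Table 9-3.

```python
# Measured Read Acknowledge Times (ns), modified sense amplifier (Table 9-3).
REGION_READ_ACK_NS = {1: 26.00, 2: 24.58, 3: 22.14, 4: 16.51}


def allocate_by_frequency(class_freq):
    """Map each data class to a region: most-used class -> fastest region.

    class_freq maps a class name to its fraction of all read accesses.
    Assumes one class per region (four equally sized classes).
    """
    fastest_first = sorted(REGION_READ_ACK_NS, key=REGION_READ_ACK_NS.get)
    busiest_first = sorted(class_freq, key=class_freq.get, reverse=True)
    return dict(zip(busiest_first, fastest_first))


def expected_read_ack(class_freq, placement):
    """Access-frequency-weighted average read acknowledge time in ns."""
    return sum(freq * REGION_READ_ACK_NS[placement[name]]
               for name, freq in class_freq.items())


# Hypothetical usage profile; the fractions sum to 1.
usage = {"hot": 0.60, "warm": 0.25, "cool": 0.10, "cold": 0.05}
optimized = allocate_by_frequency(usage)
expected = expected_read_ack(usage, optimized)
uniform = sum(REGION_READ_ACK_NS.values()) / len(REGION_READ_ACK_NS)

print(optimized)
print(f"frequency-ordered placement: {expected:.2f} ns")
print(f"uniform access over regions: {uniform:.2f} ns")
```

The size of the gain depends entirely on how skewed the usage profile is, which is consistent with the remark that the gain varies with the application.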
6.6.4 MODIFIED WRITE COMPLETION METHOD
We have investigated the MDCG method for write completion signal
generation. As described in Section 2.4, we generate the write completion signal by
delaying the request signal instead of sensing the target memory cell contents. It is
both difficult and impractical to add a sensor to each memory cell in the matrix and
perform the sensing. However, there is still one practical method to generate the
write completion signal based on a similar principle.
The block diagram of the modified write completion method is shown in
Figure 6-7. As shown in the figure, assuming a one-bit data bus, one extra memory cell
is added to the bit lines. The extra memory cell is modified so that we can
perform current or voltage sensing on the cell to generate the write completion
signal. This method also applies to an N-bit data bus, in which N extra modified
memory cells are added, and the write completion signal is generated by combining
the sensing results of the N cells. Theoretically, this method can generate the write
Figure 6-7: Block Diagram of Alternative Write Completion Method
completion signal accurately. In practice, however, the sensing circuitry will affect
the performance of the extra memory cells. Therefore, to apply this method
with good performance, one should consider how to increase the sensitivity of
the sensing circuit while minimizing its effect on the extra modified memory cell.
7. CONCLUSION
In this chapter, we present the achievements of the research. These include
problem definition, implementation, evaluation, comments and suggestions.
7.1 PROBLEM DEFINITION
We have explained the motivation for designing an asynchronous memory
system. The proposed asynchronous memory system can be used together with
various types of asynchronous processors. It is defined as a memory system that
can communicate with other asynchronous systems and is capable of generating
true completion signals once the read / write operation is finished.
7.2 IMPLEMENTATION
In general, we have proposed to implement the circuit using SRAM and the
Four-Phase Handshaking Control Protocol (HCP). To generate the true read / write
completion signals, we first introduced the concept of variable bit line
load, which is made possible by breaking the bit line into segments. In this way, the
memory access time varies with the memory location and the
average access time is reduced. Based on this concept, the asynchronous techniques
investigated are applied. We have selected and explained the Current Sensing
Completion Detection (CSCD) method, the Dual-Rail Voltage Sensing Completion
Detection (DVSCD) method and the Multiple Delays Completion Generation
(MDCG) method, the first two for read completion signal generation and the
last one for write completion signal generation.
To evaluate the performance of the proposed asynchronous memory system,
we have tried to implement the system based on the 1M-bit SRAM framework as an
example. In order to implement the variable bit line load concept, the bit line is
segmented into four regions. All the memory components, including the
implementations of the CSCD, DVSCD and MDCG methods and the Four-Phase
HCP control circuit, are realized at the gate level.
7.3 EVALUATION 
The asynchronous techniques investigated are first evaluated by pre-layout
simulation results. We have performed the pre-layout HSPICE simulation
using the ATMEL ES2 CMOS 0.7 µm simulation model. New memory timing
specifications are defined for the proposed memory system. The bit line load used is
determined to be 6 pF per region. We have performed the benchmark memory
simulation for comparison with the simulation results of the proposed techniques. The
CSCD simulation results are several times slower than the benchmark results. This is
because the current sensor in use is not sensitive enough to generate the read
completion signal once the data is read. However, if its sensitivity is increased, the read
completion signal cannot be generated properly. The DVSCD and MDCG methods
are shown to be functional by the pre-layout simulations. For the DVSCD typical
simulation results, the average Read Acknowledge Time is 12.52 ns, which is 3.29 ns
(20.8 %) faster than the benchmark results. For the MDCG typical simulation results,
the average Write Acknowledge Time is 13.33 ns, which is 3.76 ns (22.0 %) faster
than the benchmark results.
The DVSCD and MDCG methods are also evaluated by post-layout
simulation results and the asynchronous memory test chip results. The test chip
consists of 1024 x 4 bits and is fabricated in ATMEL ES2 0.7 µm CMOS
technology with a die size of 21.8 mm². The post-layout HSPICE simulation is
performed using the ATMEL ES2 CMOS 0.7 µm simulation model, and the DVSCD
and MDCG methods are shown to be functional. For the DVSCD typical simulation
results, the average Read Acknowledge Time for the modified sense amplifier is
19.80 ns, which is 3.21 ns (14.0 %) faster than the benchmark results. The average
Read Acknowledge Time for the conventional sense amplifier is 82.45 ns, which is
26.80 ns (24.5 %) faster than the benchmark results. For the MDCG typical
simulation results, the average Write Acknowledge Time is 16.90 ns, which is 2.74 ns
(14.0 %) faster than the benchmark results. The test chip is shown to be functional by the
Logic Analysis System. For the DVSCD measured results, the average Read
Acknowledge Time for the modified sense amplifier is 22.31 ns, which is 2.73 ns
(10.9 %) faster than the benchmark results. The average Read Acknowledge Time for
the conventional sense amplifier is 38.62 ns, which is 6.25 ns (13.9 %) faster than the
benchmark results. For the MDCG measured results, the Write Acknowledge Time is
9.52 ns for all cases. The actual results are smaller than the minimum write
completion delay value. This is due to the discrepancies between the simulation
model and the fabricated transistors.
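The percentages quoted in this section can be cross-checked from the averages and time savings. The sketch below assumes that each percentage is the time saved divided by the benchmark time, and that the benchmark time equals the quoted average plus the quoted saving; the dictionary labels are illustrative only.

```python
# (average acknowledge time in ns, time saved versus the benchmark in ns,
#  percentage quoted in the text), all copied from Section 7.3.
RESULTS = {
    "pre-layout DVSCD read": (12.52, 3.29, 20.8),
    "pre-layout MDCG write": (13.33, 3.76, 22.0),
    "post-layout DVSCD read, modified SA": (19.80, 3.21, 14.0),
    "post-layout DVSCD read, conventional SA": (82.45, 26.80, 24.5),
    "post-layout MDCG write": (16.90, 2.74, 14.0),
    "measured DVSCD read, modified SA": (22.31, 2.73, 10.9),
    "measured DVSCD read, conventional SA": (38.62, 6.25, 13.9),
}


def improvement_percent(avg_ns, saved_ns):
    """Saving relative to the benchmark, where benchmark = average + saving
    (an assumption of this sketch)."""
    benchmark_ns = avg_ns + saved_ns
    return 100.0 * saved_ns / benchmark_ns


for label, (avg_ns, saved_ns, quoted) in RESULTS.items():
    computed = improvement_percent(avg_ns, saved_ns)
    print(f"{label}: computed {computed:.2f} %, quoted {quoted:.1f} %")
```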
7.4 COMMENTS AND SUGGESTIONS 
The asynchronous techniques investigated can be summarized as follows. The
CSCD method is not suitable for generating the read completion signal using
the present current sensor design. Even if the current sensor is modified, it is difficult
to set its threshold to operate correctly in the test chip, since the simulation
results may not be accurate enough. The DVSCD method is suitable for
generating the read completion signal. It works well with the bit line segmentation
method. It does not consume much area, but the trade-off between power and speed
should be noted. The MDCG method is suitable for generating the write
completion signal only if the bit line load is large. It works well with the bit line
segmentation method. It does not consume much area or power.
The proposed memory system can be used in various applications. In general,
several systems can be connected together easily to form a larger memory system.
The proposed memory system can be further developed to communicate
with other asynchronous systems based on Two-Phase HCP. Also, the number of bits
in the data bus can be expanded easily. The speed performance of the proposed
memory system can be optimized if special techniques are developed for smart
memory allocation. More work can be done on the modified write completion
method to improve the write completion signal generation.
References Asynchronous Memory Design 
8. REFERENCES 
[1] S. Hauck, "Asynchronous Design Methodologies: An Overview", Proceedings
of IEEE, Vol. 83, No. 1, pp. 69-93, January 1995.
[2] C. L. Seitz, "System Timing", Introduction to VLSI Systems, Mead and Conway,
Addison Wesley, Chapter 7, pp. 218-262, 1980.
[3] J. Sparsø, C. D. Nielsen, L. S. Nielsen and J. Staunstrup, "Design of Self-timed
Multipliers: A Comparison", Workshop on Asynchronous Design
Methodologies, Manchester, March 1993.
[4] Y. K. Tan and Y. C. Lim, "Self-timed System Design Technique", Electronic
Letters, Vol. 25, No. 5, pp. 284-286, March 1990.
[5] T. C. Pang, Y. W. Pang, C. S. Choy, C. F. Chan and W. K. Cham, "Self-timed
Design Methodologies - A Comparison", under review.
[6] S. H. Unger, Asynchronous Sequential Switching Circuits, New York NY:
Wiley-Interscience, 1969.
[7] S. M. Nowick, D. L. Dill, "Automatic Synthesis of Locally-Clocked
Asynchronous State Machines", Proceedings of ICCAD, pp. 318-321, 1991.
[8] S. M. Nowick, D. L. Dill, "Synthesis of Asynchronous State Machines Using a
Local Clock", Proceedings of ICCD, pp. 192-197, 1991.
[9] K. Yun, D. Dill, "Automatic Synthesis of 3D Asynchronous State Machines",
Proceedings of ICCAD, pp. 576-580, 1992.
[10] K. Yun, D. Dill, S. M. Nowick, "Synthesis of 3D Asynchronous State
Machines", Proceedings of ICCD, pp. 346-350, 1992.
[11] I. E. Sutherland, "Micropipelines", Communications of the ACM, Vol. 32, No.
6, pp. 720-738, June 1989.
[12] E. Brunvand, R. F. Sproull, "Translating Concurrent Programs into Delay-
Insensitive Circuits", Proceedings of ICCAD, pp. 262-265, 1989.
[13] J. C. Ebergen, Translating Programs into Delay-Insensitive Circuits, Centre
for Mathematics and Computer Sciences, Amsterdam CWI Tract 56, 1989.
[14] J. C. Ebergen, "A Formal Approach to Designing Delay-Insensitive Circuits",
Distributed Computing, Vol. 5, No. 3, pp. 107-119, July 1991.
[15] T. A. Chu, C. K. C. Leung, T. S. Wanuga, "A Design Methodology for
Concurrent VLSI Systems", Proceedings of ICCD, pp. 407-410, 1985.
[16] T. A. Chu, "Synthesis of Self-timed VLSI Circuits from Graph-Theoretic
Specifications", M.I.T. Tech. Rep. MIT/LCS/TR-393, June 1987.
[17] L. Y. Rosenblum, A. V. Yakovlev, "Signal Graphs: From Self-timed to Timed
Ones", International Workshop on Timed Petri Nets, Torino, Italy, pp. 199-
206, 1985.
[18] M. A. Kishinevsky, A. Y. Kondratyev, A. R. Taubin, V. I. Varshavsky, "On
Self-timed Behavior Verification", Proceedings of TAU '92, March 1992.
[19] J. Martin, "Programming in VLSI: From Communicating Processes to Delay-
Insensitive Circuits", UT Year of Programming Institute on Concurrent
Programming, C. A. R. Hoare, Ed. MA: Addison-Wesley, pp. 1-64, 1989.
[20] L. E. M. Brackenbury, S. B. Furber and R. Kelly, "Transforming Architectural
Models Into High Performance Concurrent Implementations", Department of
Computer Science, Manchester University.
[21] H. Taub, "Sequential Circuits", Digital Circuits and Microprocessors,
McGraw-Hill International Edition, Chapter 7, pp. 310-316, 324-325, 1985.
[22] J. D. Garside, "A CMOS VLSI Implementation of an Asynchronous ALU",
Department of Computer Science, Manchester University.
[23] S. J. Muscato and A. Albicki, "Locally Clocked Microprocessor", Proceedings
of IEEE, pp. 47-51, 1993.
[24] J. A. Tierno, A. J. Martin, D. Borkovic and T. K. Lee, "A 100-MIPS GaAs
Asynchronous Microprocessor", Proceedings of IEEE, pp. 43-49, 1994.
[25] C. M. Chang and S. L. Lu, "Design of a Static MIMD Data Flow Processor
Using Micropipelines", Proceedings of IEEE, Vol. 3, No. 3, pp. 370-378,
September 1995.
[26] Y. W. Pang and C. S. Choy, "An Asynchronous Matrix Multiplier",
Proceedings of IEEE, pp. 315-318, 1995.
[27] B. Prince, "Static RAM Architecture", Semiconductor Memories, Wiley,
Chapter 5, pp. 149-166, 1992.
[28] S. B. Furber and P. Day, "Four-Phase Micropipeline Latch Control Circuits",
Department of Computer Science, Manchester University.
[29] E. Seevinck, "A Current Sense-Amplifier for Fast CMOS SRAMs", VLSI
Circuits Symposium Digital Technical Papers, pp. 71-72, 1990.
[30] P. Y. Chee, P. C. Liu and L. Siek, "High-Speed Hybrid Current-Mode Sense
Amplifier for CMOS SRAMs", Electronic Letters, Vol. 28, No. 9, pp. 871-873,
April 1992.
[31] W. Y. Sit, C. S. Choy and C. F. Chan, "A Study of Current Sensing Technique
in Designing Asynchronous Static RAM for Self-Timed Systems", Electronic
Letters, Vol. 33, No. 8, pp. 667-668, April 1997.
[32] M. E. Dean, D. L. Dill and M. Horowitz, "Self-Timed Logic Using Current-
Sensing Completion Detection (CSCD)", Proceedings of IEEE, pp. 187-191,
1991.
[33] M. Izumikawa et al., "A 400MHz, 300mW, 8KB, CMOS SRAM Macro with a
Current Sensing Scheme", IEEE 1994 Custom Integrated Circuits Conference,
pp. 595-598, 1994.
[34] K. Ishibashi et al., "A 12.5ns 16MB CMOS SRAM with Common-Centroid-
Geometry-Layout Sense Amplifiers", IEEE Journal of Solid-State Circuits,
Vol. 29, No. 4, pp. 411-416, April 1994.
[35] P. H. Voss et al., "A 14ns 256Kx1 CMOS SRAM with Multiple Test Modes",
IEEE Journal of Solid-State Circuits, Vol. 24, No. 4, pp. 874-880, August
1989.
[36] J. P. Uyemura, "Design of Basic Circuits", Circuit Design for CMOS VLSI,
Kluwer Academic Publishers, Chapter 8, pp. 345-394, 1993.
[37] C. J. Nicol and A. G. Dickinson, "A Scalable Pipelined Architecture for Fast
Buffer SRAMs", IEEE Journal of Solid-State Circuits, Vol. 31, No. 3, pp. 419-
429, March 1996.
[38] Y. W. Pang, "A Novel Asynchronous Cell Library for Self-timed System
Design", Master of Philosophy Thesis, Department of Electronic Engineering,
The Chinese University of Hong Kong, December 1994.
[39] Y. W. Pang, W. Y. Sit, C. S. Choy, C. F. Chan and W. K. Cham, "An
Asynchronous Cell Library for Self-timed System Designs", IEICE
Transactions, Vol. E80-D, No. 3, March 1997.
[40] K. J. Schultz et al., "Low-supply-noise low-power embedded modular", IEE
Proc.-Circuits Devices Syst., Vol. 143, No. 2, April 1996.
[41] H. Okamura et al., "A 1ns, 1W, 2.5V, 32 Kb NTL-CMOS SRAM Macro Using
a Memory Cell with PMOS Access Transistors", IEEE Journal of Solid-State
Circuits, Vol. 30, No. 11, November 1995.
[42] "ECPD 07 Dual Layer Metal 0.7 Micron Electrical Rules", ES2, AGl-DR17
Rev. A., October 1994.
Appendix Asynchronous Memory Design
9. APPENDIX
9.1 HSPICE SIMULATION PARAMETERS
The HSPICE simulation parameters under typical, fast and slow simulation
conditions are obtained from the ES2 0.7 µm Electrical Rules [42].
9.1.1 TYPICAL SIMULATION CONDITION
**************************************************************************
*** ECPD07 *** HSPICE Lev 6 Rev 4.00 ***** F. JEULAND *** 15-Oct-93 ***
**************************************************************************
* *
** Improvements :
** . Update on more recent silicon **
** . Better compromise between analog and digital requirements **
**************************************************************************
* Warnings : 1/ Those parameters have been determined and validated *
* in the following range: *
* W > 1 um ; L > 0.8 um *
* |Vds| < 5.5 V *
* |Vgs| > 1.2 V *
* |Vbs| < 5.5 V (best fit for Vbs < 2V) *
* 25C < T < 125C *
* *
* 2/ The weak inversion mode is included in this model. *
* The subthreshold parameters do not take into account the *
* process spreads. *
* *
* 3/ Because of a particular geometrical dependence of *
* threshold voltage, VSH, NWM and UTRA parameters have *
* negative values flagged by HSPICE as a "warning". Please *
* do not take those "warnings" into account. *
* *
**************************************************************************
**************** TYPICAL CASE ********************
.LIB TYP 
.OPTIONS TNOM = 27.0 SCALM = 1 ASPEC = 0
.MODEL T_NMOS NMOS
+ LEVEL = 6.0 UPDATE = 1.0
+ NSUB = 1.965e+16 TOX = 150.0 BETA = 97e-6
+ XJ = 0.25u
+ RSH = 65
+ MOB = 2
+ F1 = 372k F2 = 200m UTRA = 563m
+ XL = 0.04u LD = 0.1u
+ XW = 0.9u WD = 0.45u
+ VTO = 815m
+ NSS = 0.0
+ NWE = 193.7n
+ UFDS = 99.5m VFDS = 0.2 FDS = 84m
+ VBO = 1.50 GAMMA = 764m LGAMMA = 705m
+ VSH = 650m NWM = -197m SCM = 1.733
+ WIC = 2 NFS = 5E11 WEX = 1 7 
+ LAMBDA = 10.63u 
+ NU = 1 
+ KU = 1.405 ECRIT = 87k MBL = 555m 
+ KA = 974m MAL = 295m 
+ CLM = 3
+ KCL = 1.08 MCL = 4.63 
+ TLEV = 1 
+ BEX =-1.5 TCV =-2m 
+ CAPOP = 2.0 ACM = 2.0 
+ CJ = 503u MJ = 0.43 
+ CJSW = 109p MJSW = 0.43 PB = 0.675 
+ CGDO = 200p CGSO = 200p JS = 2u 
.MODEL T_PMOS PMOS 
+ LEVEL = 6.0 UPDATE = 1.0 
+ NSUB = 2.5e+16 TOX = 150.0 BETA = 30.16e-6 
+ XJ = 0.5u 
+ RSH = 80 
+ MOB = 2 
+ F1 = 483k F2 = 320m UTRA =-197m 
+ XL = 0.042u LD = 0.1u
+ XW = 0.9u WD = 0.45u
+ VTO = -1
+ NWE = 56.1n
+ UFDS = 331m VFDS = 0.5 FDS = 286m
+ VBO = 1.5 GAMMA = 587m LGAMMA = 653m 
+ VSH = -116m NWM = -442m SCM = 1.01 
+ WIC = 2 NFS =6E11 WEX = 14.3 
+ LAMBDA = 14u 
+ NU = 1 
+ KU = 9.871 ECRIT =486k MBL = 803m 
+ KA = 1.082 MAL =0 
+ CLM = 3 
+ KCL = 41.91m MCL =6.97 
+ TLEV = 1 
+ BEX =-1.0 TCV = 1.52m 
+ CAPOP = 2.0 ACM = 2.0 
+ CJ = 776u MJ = 0.51
+ CJSW = 572p MJSW = 0.51 PB = 0.7
+ CGDO = 200p CGSO = 200p JS = 20u
.MODEL ND D
+ CJA = 503u CJP = 109p
+ EXA = 0.43 EXP = 0.43 PB = 0.675
.MODEL PD D
+ CJA = 776u CJP = 572p
+ EXA = 0.51 EXP = 0.51 PB = 0.7
**************************************************************************
.ENDL TYP 
9.1.2 FAST SIMULATION CONDITION
**************************************************************************
*** ECPD07 *** HSPICE Lev 6 Rev 4.00 ***** F. JEULAND *** 15-Oct-93 ***
**************************************************************************
**************** FAST CASE ********************
.LIB FAST 
.OPTIONS TNOM = 27.0 SCALM = 1 ASPEC = 0 
.MODEL F_NMOS NMOS 
+ LEVEL = 6.0 UPDATE = 1.0 
+ NSUB = 1.965e+16 TOX = 135.0 BETA = 107e-6 
+ XJ = 0.25u 
+ RSH = 55 
+ MOB = 2 
+ F1 = 372k F2 = 200m UTRA = 563m 
+ XL = -0.08u LD = 0.1u
+ XW = 1.08u WD = 0.45u
+ VTO = 715m
+ NSS = 0.0
+ NWE = 193.7n
+ UFDS = 99.5m VFDS = 0.2 FDS = 84m
+ VBO = 1.50 GAMMA = 714m LGAMMA = 655m
+ VSH = 650m NWM = -197m SCM = 1.733
+ WIC = 2 NFS = 5E11 WEX = 1 7
+ LAMBDA = 10.63u 
+ NU = 1 
+ KU = 1.405 ECRIT = 87k MBL = 555m 
+ KA = 974m MAL = 295m 
+ CLM = 3 
+ KCL = 1.08 MCL = 4.63 
+ TLEV = 1 
+ BEX =-1.5 TCV =-2m 
+ CAPOP = 2.0 ACM = 2.0 
+ CJ = 453u MJ = 0.43 
+ CJSW = 90p MJSW = 0.43 PB = 0.675
+ CGDO = 220p CGSO = 220p JS = 2u 
.MODEL F_PMOS PMOS 
+ LEVEL = 6.0 UPDATE = 1.0 
+ NSUB = 2.5e+16 TOX = 135.0 BETA = 33.4e-6 
+ XJ = 0.5u 
+ RSH = 70 
+ MOB = 2 
+ F1 = 483k F2 = 320m UTRA = -197m 
+ XL = -0.078u LD = 0.1u
+ XW = 1.08u WD = 0.45u
+ VTO = -0.9
+ NWE = 56.1n
+ UFDS = 331m VFDS = 0.5 FDS = 286m
+ VBO = 1.5 GAMMA = 537m LGAMMA = 603m
+ VSH = -116m NWM = -442m SCM = 1.01
+ WIC = 2 NFS = 6E11 WEX = 14.3
+ LAMBDA = 14u
+ NU = 1
+ KU = 9.871 ECRIT = 486k MBL = 803m
+ KA = 1.082 MAL =0 
+ CLM = 3 
+ KCL = 41.91m MCL =6.97 
+ TLEV = 1 
+ BEX =-1.0 TCV = 1.52m 
+ CAPOP = 2.0 ACM = 2.0 
+ CJ = 706u MJ = 0.51
+ CJSW = 522p MJSW = 0.51 PB = 0.7
+ CGDO = 220p CGSO = 220p JS = 20u
.MODEL ND D
+ CJA = 453u CJP = 90p
+ EXA = 0.43 EXP = 0.43 PB = 0.675
.MODEL PD D
+ CJA = 706u CJP = 522p
+ EXA = 0.51 EXP = 0.51 PB = 0.7
**************************************************************************
.ENDL FAST 
9.1.3 SLOW SIMULATION CONDITION
**************************************************************************
*** ECPD07 *** HSPICE Lev 6 Rev 4.00 ***** F. JEULAND *** 15-Oct-93 ***
**************************************************************************
**************** SLOW CASE ********************
.LIB SLOW 
.OPTIONS TNOM = 27.0 SCALM = 1 ASPEC = 0 
.MODEL S_NMOS NMOS 
+ LEVEL = 6.0 UPDATE = 1.0 
+ NSUB = 1.965e+16 TOX = 165.0 BETA = 88e-6 
+ XJ = 0.25u 
+ RSH = 75 
+ MOB = 2
+ F1 = 372k F2 = 200m UTRA = 563m
+ XL = 0.16u LD = 0.1u
+ XW = 0.72u WD = 0.45u
+ VTO = 915m
+ NSS = 0.0
+ NWE = 193.7n
+ UFDS = 99.5m VFDS = 0.2 FDS = 84m
+ VBO = 1.50 GAMMA = 814m LGAMMA = 755m
+ VSH = 650m NWM = -197m SCM = 1.733 
+ WIC = 2 NFS = 5E11 WEX = 1 7 
+ LAMBDA = 10.63u 
+ NU = 1 
+ KU = 1.405 ECRIT = 87k MBL = 555m 
+ KA = 974m MAL = 295m 
+ CLM = 3 
+ KCL = 1.08 MCL =4.63 
+ TLEV = 1 
+ BEX =-1.5 TCV =-2m 
+ CAPOP = 2.0 ACM = 2.0
+ CJ = 553u MJ = 0.43
+ CJSW = 130p MJSW = 0.43 PB = 0.675
+ CGDO = 180p CGSO = 180p JS = 2u
.MODEL S_PMOS PMOS
+ LEVEL = 6.0 UPDATE = 1.0
+ NSUB = 2.5e+16 TOX = 165.0 BETA = 27.4e-6
+ XJ = 0.5u
+ RSH = 90
+ MOB = 2
+ F1 = 483k F2 = 320m UTRA = -197m
+ XL = 0.162u LD = 0.1u
+ XW = 0.72u WD = 0.45u 
+ VTO = -1.1 
+ NWE = 56.1n 
+ UFDS = 331m VFDS = 0.5 FDS = 286m 
+ VBO = 1.5 GAMMA = 637m LGAMMA = 703m
+ VSH = -116m NWM = -442m SCM = 1.01
+ WIC = 2 NFS = 6E11 WEX = 14.3
+ LAMBDA = 14u
+ NU = 1
+ KU = 9.871 ECRIT = 486k MBL = 803m
+ KA = 1.082 MAL = 0
+ CLM = 3
+ KCL = 41.91m MCL = 6.97
+ TLEV = 1
+ BEX = -1.0 TCV = 1.52m
+ CAPOP = 2.0 ACM = 2.0
+ CJ = 846u MJ = 0.51
+ CJSW = 622p MJSW = 0.51 PB = 0.7
+ CGDO = 180p CGSO = 180p JS = 20u
.MODEL ND D
+ CJA = 553u CJP = 130p
+ EXA = 0.43 EXP = 0.43 PB = 0.675
.MODEL PD D
+ CJA = 846u CJP = 622p




9.2 SRAM CELL LAYOUT AND NETLIST 
The layout of the six-transistor SRAM cell is shown in Figure 9-1. Its height
is 27.6 µm and its effective width is 13.6 µm, giving an area of 375.36 µm².
Its area is not as small as that of cells in commercial use nowadays. However, its area is
almost the minimum achievable if ATMEL ES2 CMOS 0.7 µm technology is employed. This
cell can still be used in our research since if the proposed asynchronous methods
Figure 9-1: Six Transistors SRAM Cell Layout
work with this SRAM cell, the methods can also work with other SRAM cells.
Below is the netlist extracted from the SRAM cell layout. To determine the
bit line loads $C_B$ and $C_{BB}$ for each cell, we add all the parasitic capacitances
associated with them. From the netlist:
$C_B = (0.0003 + 0.0028 + 0.0002) + (0.0028 + 0.0139) = 0.0200$ pF / cell
$C_{BB} = (0.0003 + 0.0028 + 0.0002) + (0.0028 + 0.0139) = 0.0200$ pF / cell
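The bit line load per cell can be cross-checked against the capacitor entries of the extracted netlist. The sketch below sums the five capacitors attached to bit line B (net 7) that correspond to the five terms of the sum above; the very small coupling capacitor c30 (net 7 to net 3) is left out to match those terms. The helper name is illustrative only.

```python
# Parasitic capacitors attached to bit line B (net 7) in the extracted
# netlist: (name, other net, value in farads).
BIT_LINE_B_CAPS = [
    ("c26", 5, 3.11407e-16),  # to the word line /WL
    ("c27", 0, 2.77819e-15),  # to gnd!
    ("c28", 6, 2.7684e-15),   # to /vdd!
    ("c29", 1, 2.11222e-16),  # to the complementary bit line /BB
    ("c36", 0, 1.38676e-14),  # to gnd!
]


def total_picofarads(caps):
    """Sum the capacitor values and convert from farads to picofarads."""
    return sum(value for _name, _net, value in caps) * 1e12


c_b = total_picofarads(BIT_LINE_B_CAPS)
print(f"C_B = {c_b:.4f} pF per cell")
```

The unrounded total is slightly below the figure obtained by adding the rounded terms, and the two agree to three decimal places.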
* net 0 = gnd! 
* net 1 = /BB
* net 2 = /SB
* net 3 = /S
* net 5 = /WL
* net 6 = /vdd!
* net 7 = /B
* pd(0) = /+42
D0 6 6 pd AREA=8e-12 PJ=1.2e-05
* pd(1) = /+41
D1 2 6 pd AREA=4.4e-12 PJ=7.06e-06
* pd(2) = /+40
D2 6 6 pd AREA=4.8e-12 PJ=6.12e-06
* pd(3) = /+39
D3 3 6 pd AREA=4.4e-12 PJ=7.06e-06
* pd(4) = /+38
D4 6 6 pd AREA=8e-12 PJ=1.2e-05
* nd(5) = /+37
D5 0 0 nd AREA=8e-12 PJ=1.2e-05
* nd(6) = /+36
D6 0 2 nd AREA=6.4e-12 PJ=1e-05
* nd(7) = /+35
D7 0 1 nd AREA=4.8e-12 PJ=8.8e-06
* nd(8) = /+34
D8 0 0 nd AREA=4.8e-12 PJ=5.6e-06
* nd(9) = /+33
D9 0 7 nd AREA=4.8e-12 PJ=8.8e-06
* nd(10) = /+32
D10 0 3 nd AREA=6.4e-12 PJ=1e-05
* nd(11) = /+31
D11 0 0 nd AREA=8e-12 PJ=1.2e-05
* capacitor(12) = /+30
c12 0 5 6.59465e-15
* capacitor(13) = /+29
c13 6 0 2.38056e-14
* capacitor(14) = /+28
c14 1 5 3.11407e-16
* capacitor(15) = /+27
c15 1 0 2.77819e-15
* capacitor(16) = /+26
c16 1 6 2.7684e-15
* capacitor(17) = /+25
c17 2 5 6.48027e-18
* capacitor(18) = /+24
c18 2 0 2.32935e-15
* capacitor(19) = /+23
c19 2 6 2.86783e-15
* capacitor(20) = /+22 
c20 2 1 7.65333e-18 
* capacitor(21) = /+21
c21 3 5 7.13333e-18 
* capacitor(22) = /+20 
c22 3 0 2.97383e-15 
* capacitor(23) = /+19 
c23 3 6 2.22335e-15 
* capacitor(24) = /+18 
c24 3 1 1.89e-19 
* capacitor(25) = /+17 
c25 3 2 7.19329e-16 
* capacitor(26) = /+16 
c26 7 5 3.11407e-16 
* capacitor(27) = /+15 
c27 7 0 2.77819e-15 
* capacitor(28) = /+14 
c28 7 6 2.7684e-15 
* capacitor(29) = /+13 
c29 7 1 2.11222e-16 
* capacitor(30) = /+12 
c30 7 3 8.35415e-18 
* capacitor(31) = /+11
c31 0 5 8.12544e-15 
* capacitor(32) = /+10 
c32 6 0 5.08096e-14 
* capacitor(33) = /+9 
c33 1 0 1.38676e-14 
* capacitor(34) = /+8 
c34 2 0 2.46634e-14 
* capacitor(35) = /+7 
c35 3 0 2.46634e-14 
* capacitor(36) = /+6 
c36 7 0 1.38676e-14 
.model model4 pmos level=2 vto=-0.7 gamma=0.4 kp=1.5e-05 
+lambda=0.03 tox=6e-07
* pfet(37) = /+5
m37 2 3 6 6 model4 w=2u l=0.8u 
* pfet(38) = /+4 
m38 6 2 3 6 model4 w=2u l=0.8u 
.model model5 nmos level=2 vto=0.7 gamma=0.2 kp=3e-05 
+lambda=0.02 tox=6e-07 
* nfet(39) = /+3 
m39 1 5 2 0 model5 w=1u l=0.8u
* nfet(40) = /+2
m40 2 3 0 0 model5 w=2u l=0.8u
* nfet(41) = /+1
m41 0 2 3 0 model5 w=2u l=0.8u
* nfet(42) = /+0
m42 7 5 3 0 model5 w=1u l=0.8u
9.3 TEST CHIP SPECIFICATIONS
9.3.1 GENERAL SPECIFICATIONS
Die Size      21.8 mm²          -
Technology    CMOS 0.7 µm       ATMEL ES2
Pin Number    46 pins           -
Package       DIL 48            -
Voltage       0 - 5 V           Standard CMOS voltage level
Current       40 - 60 mA        Depending on types of sense amplifiers in use
Power         200 - 300 mW      Depending on types of sense amplifiers in use
Temperature   0 - 70 °C         -
Frequency     14.5 - 22.2 MHz   Read Operation: Conventional Sense Amplifier
              22.2 - 27.8 MHz   Read Operation: Modified Sense Amplifier
              41.7 MHz          Write Operation
Table 9-1: Test Chip General Specifications 
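The power range in Table 9-1 is consistent with the product of the supply voltage and the current range. The sketch below checks this and also converts an operating frequency into the corresponding period; treating each frequency as the reciprocal of one operation period is an assumption of this sketch, and the function names are illustrative only.

```python
SUPPLY_V = 5.0  # upper end of the 0 - 5 V range in Table 9-1


def power_mw(current_ma, supply_v=SUPPLY_V):
    """Power in mW from the supply voltage (V) and current (mA)."""
    return supply_v * current_ma


def period_ns(freq_mhz):
    """Period in ns for a frequency in MHz (assumes period = 1 / frequency)."""
    return 1000.0 / freq_mhz


print(f"Power: {power_mw(40):.0f} - {power_mw(60):.0f} mW")
print(f"Write operation period at 41.7 MHz: {period_ns(41.7):.2f} ns")
```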
9.3.2 PIN ASSIGNMENT
Pin   Name   Description
1     TD3    Voltage-controlled delay chain input
2     TD4    Voltage-controlled delay chain control
3     TD5    Voltage-controlled delay chain output
4     VSS    VSS
5     VSS    VSS
6     PVSS   Peripheral VSS
7     -      -
8     VPC    Precharge delay control
9     VWL    Word line delay control
10    VWC3   Write completion delay control 3
11    VWC2   Write completion delay control 2
12    VWC1   Write completion delay control 1
13    VWC0   Write completion delay control 0
14    SA1    Sense amplifier selection 1
15    SA0    Sense amplifier selection 0
16    VDD    VDD
17    VDD    VDD
18    PVDD   Peripheral VDD
19    ACK    Acknowledge
20    R/W    Read / Write selection
21    REQ    Request
22    DO3    Data output 3 (Conventional sense amplifier benchmark)
23    DO2    Data output 2 (Conventional sense amplifier)
24    DO1    Data output 1 (Modified sense amplifier)
25    DO0    Data output 0 (Modified sense amplifier benchmark)
26    DI3    Data input 3 (Conventional sense amplifier benchmark)
27    DI2    Data input 2 (Conventional sense amplifier)
28    DI1    Data input 1 (Modified sense amplifier)
29    DI0    Data input 0 (Modified sense amplifier benchmark)
30    PVSS   Peripheral VSS
31    VSS    VSS
32    VSS    VSS
33    A9     Address input 9
34    A8     Address input 8
35    A7     Address input 7
36    A6     Address input 6
37    A5     Address input 5
38    A4     Address input 4
39    A3     Address input 3
40    A2     Address input 2
41    A1     Address input 1
42    A0     Address input 0
43    PVDD   Peripheral VDD
44    VDD    VDD
45    VDD    VDD
46    TI     Inverter chain input
47    TO     Inverter chain output
48    -      -
Table 9-2: Test Chip Pin Assignment 
9.3.3 TESTING DIAGRAMS AND SPECIFICATIONS
[Timing diagram, read operation: waveforms (signal names not legible) marking the completion time and the acknowledge time. Phases along the time axis: Sender's Active Phase, Receiver's Active Phase, Sender's and Receiver's Recovery Phases, Sender's Active Phase.]
(a) Read Operation
[Timing diagram, write operation: waveforms REQ, REQWL, CW, WC and ACK, marking the Write Completion Time and the Write Acknowledge Time. Phases along the time axis: Sender's Active Phase, Receiver's Active Phase, Sender's and Receiver's Recovery Phases, Sender's Active Phase.]
(b) Write Operation
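The phase sequence in the diagrams above corresponds to one cycle of a four-phase handshake. A minimal Python sketch of such a cycle follows. It assumes REQ and ACK are active high, takes the acknowledge time as the delay from the rising edge of REQ to the rising edge of ACK, and uses hypothetical recovery delays for the sender and the receiver. The function name and the default delay values are illustrative only and do not come from the test chip.

```python
def four_phase_cycle(acknowledge_ns, sender_recovery_ns=1.0, receiver_recovery_ns=1.0):
    """List the (time_ns, signal, level) edges of one four-phase handshake cycle."""
    t = 0.0
    edges = [(t, "REQ", 1)]        # sender's active phase: request raised
    t += acknowledge_ns
    edges.append((t, "ACK", 1))    # receiver's active phase: operation complete
    t += sender_recovery_ns        # hypothetical delay before the sender responds
    edges.append((t, "REQ", 0))    # sender's recovery phase: request withdrawn
    t += receiver_recovery_ns      # hypothetical delay before the receiver responds
    edges.append((t, "ACK", 0))    # receiver's recovery phase: ready for next request
    return edges


# Example: the 9.52 ns write acknowledge time listed in Table 9-3.
for time_ns, signal, level in four_phase_cycle(acknowledge_ns=9.52):
    print(f"{time_ns:6.2f} ns  {signal} -> {level}")
```

The next cycle can begin only after the final edge, when both REQ and ACK have returned low.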
[Column headings not legible]
Region 1 Read Acknowledge Time (ns)          26.00   50.38
Region 2 Read Acknowledge Time (ns)          24.58   42.71
Region 3 Read Acknowledge Time (ns)          22.14   35.14
Region 4 Read Acknowledge Time (ns)          16.51   26.28
Regions 1 - 4 Write Acknowledge Time (ns)    9.52
Table 9-3: Test Chip Timing Specifications 
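To compare the regions numerically, the read acknowledge times of Table 9-3 can be tabulated as in the following Python sketch. The two value columns are referred to only by index (0 and 1), without assuming what each column represents. The mean assumes that accesses are spread uniformly over the four regions; that access pattern is an assumption for illustration, not a measured result, and the function names are hypothetical.

```python
# Read acknowledge times in ns from Table 9-3: region -> (first column, second column).
READ_ACK_NS = {
    1: (26.00, 50.38),
    2: (24.58, 42.71),
    3: (22.14, 35.14),
    4: (16.51, 26.28),
}


def mean_read_ack_ns(column):
    """Mean read acknowledge time, assuming uniform access over the four regions."""
    return sum(times[column] for times in READ_ACK_NS.values()) / len(READ_ACK_NS)


def saving_vs_region1(region, column):
    """Fractional reduction of the acknowledge time relative to Region 1."""
    return 1.0 - READ_ACK_NS[region][column] / READ_ACK_NS[1][column]


for column in (0, 1):
    print(f"column {column}: mean = {mean_read_ack_ns(column):.2f} ns, "
          f"Region 4 saving = {saving_vs_region1(4, column):.1%}")
```

Under the uniform-access assumption, the mean lies below the Region 1 figure in both columns, since Regions 2 to 4 each acknowledge sooner than Region 1.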
9.3.4 SCHEMATICS AND LAYOUTS 
In this section, the schematics and layouts of the main blocks that constitute
the asynchronous memory system are shown. The standard memory components,
which include the memory cell, address decoders, sense amplifiers and buffers, are
depicted in Section 9.3.4.1. The DVSCD and MDCG components, which include
the transmission gates, control circuit, read / write completion generation circuits and
voltage-controlled delay, are depicted in Section 9.3.4.2.
9.3.4.1 STANDARD MEMORY COMPONENTS 
[Schematic and layout: Six Transistors SRAM Cell. Title block: Vincent Sit, CUHK EE, Jun 12 22:32:26 1997. The layout marks the N-well implant.]
[Schematic and layout: 2nd Stage Decoder. Title block: Vincent Sit, CUHK EE, Jun 12 22:26:30 1997. Legible input labels: A1, A2, A5, CS.]
[Schematic and layout: Read Buffer. Title block: Vincent Sit, CUHK EE, Jun 12 23:56:10 1997. Four buffer stages with inputs A1 to A4 and outputs Y1 to Y4, each annotated wpA=80/lpA=0.8, wnA=40/lnA=0.8, wpE=10/lpE=0.8, wnE=5/lnE=0.8.]
[Schematic and layout: Write Buffer. Title block: Vincent Sit, CUHK EE, Jun 12 23:57:48 1997. Eight stages, each with an ENB enable input.]
[Schematic and layout: Precharge Buffer. Title block: Vincent Sit, Jun 12 1997. Legible labels: Precharge, PC.]
[Schematic and layout: Four-Phase HCP Control Circuit. Title block: Vincent Sit, CUHK EE, Jun 12 23:50:07 1997. Legible labels: REQ, VPC, VWL, WC_out, RC, CW, CR, delay5 blocks.]
[Schematic and layout: DVSCD Circuit. Title block: Vincent Sit, CUHK EE, Jun 12 23:54:20 1997. Legible labels include B0 to B3, BB0 to BB3, PC, RC, RC3, CS2 and CS3.]
[Schematic and layout: MDCG Circuit. Title block: Vincent Sit, CUHK EE, Jun 12 23:52:41 1997. Legible labels include WC_out, VWC0 to VWC3, WB0, WB2, WB3, W2 and delay5 blocks.]
