Formal Design and Verification of an Asynchronous SRAM Controller by Khomenko V et al.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Newcastle University ePrints - eprint.ncl.ac.uk 
 
Khomenko V, Mokhov A, Sokolov D, Yakovlev A.  
Formal Design and Verification of an Asynchronous SRAM Controller.  
In: 17th International Conference on Application of Concurrency to System 
Design (ACSD 2017). 2017, Zaragoza, Spain: IEEE Computer Society.
 
Copyright: 
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all 
other uses, in any current or future media, including reprinting/republishing this material for advertising 
or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or 
reuse of any copyrighted component of this work in other works. 
DOI link to article: 
https://doi.org/10.1109/ACSD.2017.12  
Date deposited:   
30/03/2017 
 

varies. (The delay is expressed in terms of the length of the
inverter chain needed to match it, with the inverters operating
at the same voltage as the SRAM cell.)
An asynchronous write-acknowledge circuit was proposed
in [6], that adds two transistors to a memory cell implemented
as cross-coupled NOR2 gates to generate an acknowledgement
signal (alternative embodiments with NAND2 gates were also
proposed there). Even if this idea could be adapted to
the 6T memory cell, it would require two extra transistors
per cell plus an extra acknowledgement line per column.
Another memory cell design with completion detection was
developed in [4]. It provides true completion detection for
both reading and writing, and is speed-independent and free
from voltage references. Unfortunately, it doubles the number
of transistors per SRAM cell as well as the number of bit-
lines, and so is costly in terms of area. Several asynchronous
SRAM controllers utilising the standard 6T memory cell were
proposed and successfully implemented over the past two
decades. [7] presents a partially speed-independent solution
with dual-rail voltage sensing completion detection in the read
mode and different bundled delays in the write mode. [8] relies
exclusively on bundled delays (with a Schmitt trigger as a
variable delay element) in both read and write modes. [9] relies
on timing assumptions in the write mode as well as during the
bit-line pre-charge operation. [10] uses a duplicated SRAM
column and dummy memory cells, and requires adjustable
voltage references to accommodate variations. None of these
solutions is fully speed-independent: each relies on bundled
delays, other timing assumptions or voltage references.
Note that the completion detection in the read mode is
relatively simple – one can pre-charge the bit-lines to 1 before
asserting WL, and then wait for one of the bit-lines to switch
to 0, which indicates completion. However, in the
write mode completion detection is not always possible: if
the new value coincides with the one already stored in the
memory cell, no signal changes to indicate completion.
The approach proposed in [4]
(based on the original idea in [11]) copes with this problem
by observing that:
• Completion detection is possible when the new value
differs from the old one (both bit-lines will then flip).
• One can first read the stored value to check if it needs
flipping.
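The two observations above can be sketched as a toy behavioural model. This is only an illustration: the class `Cell6T`, the function `write` and the encoding of the bit-line pair as a tuple are hypothetical names introduced here, and analogue behaviour is ignored.

```python
class Cell6T:
    """Toy model of one memory cell together with its pair of bit-lines."""

    def __init__(self, value=0):
        self.value = value
        self.bitlines = (1, 1)  # pre-charged state

    def read(self):
        """Pre-charge, assert WL, and wait for one bit-line to fall."""
        self.bitlines = (1, 1)
        # The cell discharges the bit-line on the side that stores 0.
        self.bitlines = (self.value, 1 - self.value)
        assert 0 in self.bitlines  # completion: one line has switched to 0
        return self.value


def write(cell, new_value):
    """Read first; completion is only observable when the value must flip."""
    old_value = cell.read()
    before = cell.bitlines
    if old_value == new_value:
        # No bit-line would change, so there is nothing to wait for.
        return "no flip"
    cell.value = new_value
    cell.bitlines = (new_value, 1 - new_value)
    # Completion: both bit-lines differ from their state after the read.
    assert all(b != a for b, a in zip(before, cell.bitlines))
    return "flip"
```

For example, writing 0 into a cell that already stores 0 returns "no flip", while a subsequent write of 1 returns "flip" and leaves the bit-lines at (1, 0).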
The controller design¹ proposed in [4] is shown in Fig. 3.
This design was produced with the help of PETRIFY [12] and
then optimised manually, and hence is not guaranteed to be
correct (asynchronous circuits are known to be very difficult to
design correctly without formal methods support). As a minor
contribution, we formally verified that the circuit in Fig. 3 is
indeed speed-independent. However, its interface to bit-lines is
not delay-insensitive, and hazards on the output x4 of gate 1
are possible, especially in the low-power mode and when the
bit-lines are buffered or augmented with a sense amplifier,
which is usually the case in practice. To illustrate this problem,
consider flipping the value stored in the memory cell in the
write mode. In such a case the bit-lines are initially in the
state (0,1) or (1,0), and both will flip, with the one changing
from 1 to 0 flipping first. However, if the bit-lines are buffered,
the controller may observe these two events in either order; in
particular, the transient state (1,1) is possible, resulting in a
hazard.

Figure 3. Asynchronous SRAM controller from [4].

¹ [4] first proposes a 12T cell with completion detection, but then goes on
to develop a controller for the usual 6T cells.
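The ordering problem can be made concrete by enumerating the orders in which the two bit-line events may reach the controller. The sketch below is illustrative only; `observed_states` and the event encoding `(line index, new value)` are assumptions made for this example.

```python
from itertools import permutations


def observed_states(start, events):
    """Return one trace of bit-line states per arrival order of `events`.

    start:  initial (bit-line, complementary bit-line) pair
    events: list of (line index, new value) pairs
    """
    traces = []
    for order in permutations(events):
        state = list(start)
        trace = [tuple(state)]
        for line, value in order:
            state[line] = value
            trace.append(tuple(state))
        traces.append(trace)
    return traces


# Flip from (1,0) to (0,1): line 0 falls and line 1 rises.
traces = observed_states((1, 0), [(0, 0), (1, 1)])
transients = {trace[1] for trace in traces}
```

When the falling event is seen first the transient state is (0,0); when buffering reorders the events the transient state is (1,1), which is the case that causes the hazard.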
In this paper we propose a new SRAM controller design. It
is based on the same idea, but improves on the design in [4]
in several respects:
• It was systematically developed, synthesised and formally
verified.
• Its interface to bit-lines is delay-insensitive, which solves
the above problems and makes the controller more robust.
• The reset phase of the controller is more concurrent and
overlaps with the actions of the environment.
Speed-independence and Signal Transition Graphs
The SRAM controller presented in this paper belongs to
an important class of speed-independent (SI) asynchronous
circuits, where, following Muller’s classical approach [13],
each gate is regarded as an atomic evaluator of a Boolean
function, with a delay associated with its output. In the SI
framework this delay is positive but unbounded and variable,
i.e. the circuit must work correctly regardless of its gates’
delays, and the wires are assumed to have negligible delays.
Alternatively, one can regard wire forks as isochronic (the
Quasi-Delay-Insensitive (QDI) circuit class [14]); wire delays
can then be added to the delays of their driving gates, i.e. SI≈QDI.
The SI assumptions are reasonable inside a small block,
but often one cannot rely on the block interface wires to
have negligible delays or be isochronic: sometimes a part
of the block interface (e.g. the interface to bit-lines in our
case) should be delay-insensitive (DI), i.e. the circuit should
work correctly regardless of delays in some external wires
(but if such a wire is forked internally, these internal forks are
considered isochronic). Fortunately, one can easily simulate a
delay on a particular wire in the SI setup simply by adding a
buffer on that wire.
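A minimal sketch of this idea, under stated assumptions: every signal is modelled as a gate with an unbounded delay, all firing orders are explored, and a hazard is reported when an excited gate is disabled before it fires. The gate `h` (an AND of the two bit-lines) and all signal names are hypothetical and do not reproduce any gate of the actual controller; the write driver is modelled as a gate pulling `bl` low and the memory cell as an inverter from `bl` to `blb`.

```python
from collections import deque


def excited(gates, state):
    """Signals whose gate function disagrees with their current value."""
    return {s for s, f in gates.items() if f(state) != state[s]}


def find_hazard(gates, initial):
    """Explore all firing orders; return (state, fired, disabled) or None."""
    start = tuple(sorted(initial.items()))
    seen = {start}
    queue = deque([start])
    while queue:
        state = dict(queue.popleft())
        enabled = excited(gates, state)
        for s in sorted(enabled):
            nxt = dict(state)
            nxt[s] = 1 - nxt[s]
            # Gates that were excited, did not fire, and no longer are.
            lost = enabled - {s} - excited(gates, nxt)
            if lost:
                return state, s, sorted(lost)
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen.add(key)
                queue.append(key)
    return None


# Bit-lines wired directly to the hypothetical gate h.
direct = {
    "bl": lambda s: 0,
    "blb": lambda s: 1 - s["bl"],
    "h": lambda s: s["bl"] & s["blb"],
}
# Same circuit with a buffer inserted on each bit-line wire.
buffered = {
    "bl": lambda s: 0,
    "blb": lambda s: 1 - s["bl"],
    "bl_buf": lambda s: s["bl"],
    "blb_buf": lambda s: s["blb"],
    "h": lambda s: s["bl_buf"] & s["blb_buf"],
}

no_hazard = find_hazard(direct, {"bl": 1, "blb": 0, "h": 0})
hazard = find_hazard(
    buffered, {"bl": 1, "blb": 0, "bl_buf": 1, "blb_buf": 0, "h": 0}
)
```

Without buffers the search finds no hazard, because `blb` can rise only after `bl` has fallen. With the buffers it reports that `h` becomes excited in the transient state where both buffered bit-lines are 1 and is then disabled when `bl_buf` falls.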





Table I
SIMULATION RESULTS.

              the developed controller                        the controller from [4]
              write time (ps)                 read time (ps)  write time (ps)                 read time (ps)
              no flip         flip                            no flip         flip
voltage (mV)     set   reset     set   reset     set   reset     set   reset     set   reset     set   reset
1,000            866      60   1,063      60     683     197     687      63     923      49     595     255
900            1,005      68   1,247      69     792     229     792      73   1,083      57     687     297
800            1,214      83   1,537      83     954     274     950      86   1,339      69     824     360
700            1,558     106   2,057     107   1,223     351   1,215     113   1,798      87   1,050     465
600            2,204     150   3,154     149   1,724     492   1,709     159   2,782     122   1,473     662
500            3,696     254   6,053     241   2,881     819   2,858     259   5,413     200   2,453   1,119
400            8,741     554  16,058     549   6,760   1,897   6,751     574  14,487     462   5,749   2,634
350           16,665   1,007  30,576   1,006  12,823   3,553  12,943   1,040  27,561     790  10,940   4,954
300           37,669   2,123  68,519   2,119  28,875   7,903  29,605   2,153  61,667   1,677  24,878  10,991
275           60,266   3,500 162,735   3,265  45,980  12,720  47,557   3,350 152,395   2,570  39,720  17,350
Overhead (%)      22      -3      11      19      14     -35
B. Conformation
We have formally verified that each of the four blocks of
the SRAM controller conforms [27] to the environment given by
its STG specification, i.e. these circuits will not produce any
outputs that are unexpected by the environment. Moreover,
the overall circuit conforms to the model of the environment
in Fig. 12. Note that the restrictiveness of this model of the
environment gives higher confidence in the correctness of
the circuit, as any deviation of the circuit from the restricted
set of permitted behaviours would have been reported: if the
circuit could produce an output that is not expected by this
model, it would have been flagged as an error, whereas a less
strict model might have tolerated it.
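The output-side check described above can be sketched on explicit transition systems. This is a simplified illustration, not the verification procedure used in the paper: the function name, the state names and the assumed four-phase order wr+, wa+, wr-, wa- of the write handshake are all assumptions made for the example.

```python
def find_unexpected_output(circuit, env, outputs, start):
    """Explore circuit and environment in lock-step.

    circuit, env: dict mapping state -> {event: next state}
    outputs:      events produced by the circuit (the rest are inputs)
    Returns (circuit state, env state, event) for the first output that
    the environment does not allow, or None if there is none.
    """
    seen = {start}
    stack = [start]
    while stack:
        c, e = stack.pop()
        for event, c_next in circuit[c].items():
            if event in env[e]:
                nxt = (c_next, env[e][event])
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
            elif event in outputs:
                return c, e, event
            # An input that the environment never offers does not occur.
    return None


# Environment: expects wr+ wa+ wr- wa- in this order (assumed protocol).
ENV = {
    "e0": {"wr+": "e1"},
    "e1": {"wa+": "e2"},
    "e2": {"wr-": "e3"},
    "e3": {"wa-": "e0"},
}
GOOD = {
    "c0": {"wr+": "c1"},
    "c1": {"wa+": "c2"},
    "c2": {"wr-": "c3"},
    "c3": {"wa-": "c0"},
}
# Faulty variant: may withdraw the acknowledgement before wr- arrives.
BAD = dict(GOOD, c2={"wr-": "c3", "wa-": "c0"})

OUTPUTS = {"wa+", "wa-"}
good_result = find_unexpected_output(GOOD, ENV, OUTPUTS, ("c0", "e0"))
bad_result = find_unexpected_output(BAD, ENV, OUTPUTS, ("c0", "e0"))
```

The faulty variant is flagged because it can produce wa- in a state where the environment only expects wr-. An environment model that also allowed wa- at that point would hide the error, which is why a restrictive model gives more confidence.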
C. Deadlock-freeness
For each of the four blocks of the SRAM controller we have
formally verified that its STG specification is deadlock-free,
and that its circuit implementation is deadlock-free in the
environment given by its STG. Moreover, the overall circuit
is deadlock-free in the model of the environment in Fig. 12.
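As an illustration of what the deadlock check looks for, the sketch below explores the reachable markings of a small safe Petri net by brute force and returns a marking in which no transition is enabled. The net, the place names and the function `find_deadlock` are hypothetical, and explicit enumeration is used only for clarity; it is not a claim about how the verification in the paper was carried out.

```python
def find_deadlock(transitions, initial_marking):
    """Return a reachable marking with no enabled transition, or None.

    transitions: dict name -> (preset, postset), each a set of place names
    The net is assumed to be safe, so a marking is a set of marked places.
    """
    start = frozenset(initial_marking)
    seen = {start}
    stack = [start]
    while stack:
        marking = stack.pop()
        enabled = [t for t, (pre, _) in transitions.items() if pre <= marking]
        if not enabled:
            return marking
        for t in enabled:
            pre, post = transitions[t]
            nxt = (marking - pre) | post
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return None


# A cyclic handshake: every marking enables exactly one transition.
handshake = {
    "wr+": ({"p0"}, {"p1"}),
    "wa+": ({"p1"}, {"p2"}),
    "wr-": ({"p2"}, {"p3"}),
    "wa-": ({"p3"}, {"p0"}),
}
# Broken variant: wa- does not return the token to p0.
broken = dict(handshake)
broken["wa-"] = ({"p3"}, set())

ok = find_deadlock(handshake, {"p0"})
stuck = find_deadlock(broken, {"p0"})
```

The cyclic net has no deadlock, whereas the broken variant reaches the empty marking after one handshake and can make no further progress.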
V. SIMULATION
Both the developed SRAM controller and the one from [4]
were mapped to the FARADAY gate library for the UMC 90nm
technology process. The parameters of the SRAM were derived
from a standard UMC 90nm 6T cell. Several SPICE
simulations of these controllers were then performed over a
range of supply voltages using Cadence Spectre and Analogue
Design Environment. The voltage was reduced in 100mV
steps from the nominal 1V, with smaller steps at
sub-threshold voltages (below 400mV). The minimum voltage
at which we could get reliable operation was 275mV. The
simulation results are summarised in Table I, where write time
is the delay between the corresponding events of the wr and wa
signals and read time is the delay between the corresponding
events of the rr and ra signals. Signal events are “registered”
at half of the supply voltage level. The set and reset phases of
the write/read handshakes are measured separately. In the write
mode two scenarios are simulated: one in which the bit-lines do
not flip and one in which they flip.
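The measurement convention can be sketched as follows. This is a generic post-processing illustration, not the procedure of the simulator: the function `crossing_time`, the use of linear interpolation between samples, and the sample waveforms are all assumptions made for the example.

```python
def crossing_time(times, volts, threshold):
    """Time of the first crossing of `threshold`, by linear interpolation."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        crosses = (v0 - threshold) * (v1 - threshold) <= 0
        if crosses and v0 != v1:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None


# Hypothetical waveforms (time in ps, voltage in V) at a 1V supply.
vdd = 1.0
wr_event = crossing_time([0, 10, 20], [0.0, 0.0, 1.0], vdd / 2)
wa_event = crossing_time([0, 100, 120], [0.0, 0.0, 1.0], vdd / 2)
write_set_time = wa_event - wr_event
```

With these made-up samples the request crosses half of the supply voltage at 15 ps and the acknowledgement at 110 ps, so the set phase of the write handshake is measured as 95 ps.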
On average, the complete read cycle (both set and reset
phases) of the decomposed circuit is ~3% slower compared to
the design in [4]. In write mode the overhead is ~11% if the
bit-lines flip and ~20% otherwise. Interestingly, the reset of
the read phase is 35% faster in the developed circuit, which is
due to the concurrent de-assertion of ra with the reset of the
internal signals.
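The Overhead row of Table I can be recomputed from the table entries. The formula is not stated in the text; the sketch below assumes that the overhead is the difference between the two controllers divided by the time of the developed controller, averaged over all supply voltages, because this choice reproduces the published row. The names `TABLE_I` and `mean_overhead_percent` are introduced here for illustration.

```python
# Supply voltage (mV) -> (developed controller, controller from [4]).
# Each tuple holds set/reset times in ps for write without flip,
# write with flip, and read, in the column order of Table I.
TABLE_I = {
    1000: ((866, 60, 1063, 60, 683, 197), (687, 63, 923, 49, 595, 255)),
    900: ((1005, 68, 1247, 69, 792, 229), (792, 73, 1083, 57, 687, 297)),
    800: ((1214, 83, 1537, 83, 954, 274), (950, 86, 1339, 69, 824, 360)),
    700: ((1558, 106, 2057, 107, 1223, 351),
          (1215, 113, 1798, 87, 1050, 465)),
    600: ((2204, 150, 3154, 149, 1724, 492),
          (1709, 159, 2782, 122, 1473, 662)),
    500: ((3696, 254, 6053, 241, 2881, 819),
          (2858, 259, 5413, 200, 2453, 1119)),
    400: ((8741, 554, 16058, 549, 6760, 1897),
          (6751, 574, 14487, 462, 5749, 2634)),
    350: ((16665, 1007, 30576, 1006, 12823, 3553),
          (12943, 1040, 27561, 790, 10940, 4954)),
    300: ((37669, 2123, 68519, 2119, 28875, 7903),
          (29605, 2153, 61667, 1677, 24878, 10991)),
    275: ((60266, 3500, 162735, 3265, 45980, 12720),
          (47557, 3350, 152395, 2570, 39720, 17350)),
}


def mean_overhead_percent(table):
    """Average of (developed - reference) / developed per column, in %."""
    sums = [0.0] * 6
    for developed, reference in table.values():
        for i, (d, r) in enumerate(zip(developed, reference)):
            sums[i] += (d - r) / d  # assumed normalisation, see lead-in
    return [round(100 * s / len(table)) for s in sums]


overhead = mean_overhead_percent(TABLE_I)
```

Negative entries mean that the developed controller is faster in that phase, as is the case for the reset phase of the read handshake.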
Our experiments also showed that the delay-insensitive
implementation of the WRITECOMPLETION modules shown in
Fig. 10 is the optimal design choice: the non-DI implementation
of Fig. 9 improves the write cycle by only ~3.5% when there
is no flip of the bit-lines. We believe this marginal speedup
does not justify the reduced robustness. The alternative DI
implementation shown in Fig. 11 slows down the write cycle
by ~3% due to bigger gates on the critical path.
Note that simulations of this kind may not be reliable,
especially at sub-threshold voltages, and are intended only
to give a rough idea of the controller’s performance; we leave
a more detailed analysis for future work.
VI. CONCLUSIONS
We designed an asynchronous SRAM controller. It was
inspired by the design in [4], but was systematically developed,
synthesised and formally verified. It is more robust than the
design in [4] due to its delay-insensitive interface to bit-lines;
in particular, it fixes the hazard caused by the possible
transient state (1,1) on buffered bit-lines during write operations.
In future work we plan to fine-tune the circuit to push the
minimum operating voltage further down, and to fabricate it
in silicon to confirm that it operates reliably
at sub-threshold voltages.
ACKNOWLEDGEMENTS
This research was supported by EPSRC grants
EP/L025507/1 “A4A: Asynchronous design for Analogue
electronics” and EP/K001698/1 “UNCOVER: UNderstanding
COmplex system eVolution through structurEd behaviouRs”.
REFERENCES
[1] S. Priya and D. J. Inman, Energy harvesting technologies. Springer,
2009, vol. 21.
[2] I. Poliakov, D. Sokolov, and A. Mokhov, “WORKCRAFT: a static data
flow structure editing, visualisation and analysis tool,” in Petri Nets and
Other Models of Concurrency. Springer, 2007, pp. 505–514.
[3] “WORKCRAFT homepage, URL: http://www.workcraft.org.”
[4] A. Baz, D. Shang, F. Xia, and A. Yakovlev, “Self-timed SRAM for
energy harvesting systems,” Journal of low power electronics, vol. 7,
no. 2, pp. 274–284, 2011.
[5] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, “A variation-tolerant
sub-200 mV 6-T subthreshold SRAM,” Solid-State Circuits, IEEE
Journal of, vol. 43, no. 10, pp. 2338–2348, 2008.
[6] C. van Berkel and R. Saeijs, “Write-acknowledge circuit including
a write detector and a bistable element for four-phase handshake
signalling,” 1994, US Patent US 5280596 A.
[7] V. W.-Y. Sit, C.-S. Choy, and C.-F. Chan, “A four-phase handshaking
asynchronous static RAM design for self-timed systems,” Solid-State
Circuits, IEEE Journal of, vol. 34, no. 1, pp. 90–96, 1999.
[8] T. Soon-Hwei, L. Poh-Yee, and M. S. Sulaiman, “A 160-MHz 45-mW
asynchronous dual-port 1-Mb CMOS SRAM,” in Electron Devices and
Solid-State Circuits, 2005 IEEE Conference on. IEEE, 2005, pp. 351–354.
[9] J. Dama and A. Lines, “GHz asynchronous SRAM in 65nm,” in
Asynchronous Circuits and Systems, 2009. ASYNC’09. 15th IEEE Symposium
on. IEEE, 2009, pp. 85–94.
[10] M.-F. Chang, S.-M. Yang, and K.-T. Chen, “Wide embedded
asynchronous SRAM with dual-mode self-timed technique for dynamic voltage
systems,” Circuits and Systems I: Regular Papers, IEEE Transactions
on, vol. 56, no. 8, pp. 1657–1667, 2009.
[11] V. Varshavsky, et al., “A self-timed random access memory,” 1988,
USSR Patent.
[12] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev, “PETRIFY: a tool for manipulating concurrent
specifications and synthesis of asynchronous controllers,” IEICE Transactions
on Information and Systems, vol. E80-D, no. 3, pp. 315–325, 1997.
[Online]. Available: citeseer.ist.psu.edu/cortadella96petrify.html
[13] D. Muller and W. Bartky, “A Theory of Asynchronous Circuits,” in Proc.
Int. Symp. of the Theory of Switching, 1959, pp. 204–243.
[14] A. Martin, “Compiling communicating processes into delay-insensitive
VLSI circuits,” Distributed computing, vol. 1, no. 4, pp. 226–234, 1986.
[15] T.-A. Chu, “Synthesis of self-timed VLSI circuits from graph-theoretic
specifications,” Ph.D. dissertation, Dept. of Electrical Engineering and
Computer Science, MIT, 1987.
[16] L. Rosenblum and A. Yakovlev, “Signal graphs: from self-timed to timed
ones,” in International Workshop on Timed Petri Nets, Torino, Italy,
1985.
[17] T. Murata, “Petri nets: Properties, analysis and applications,”
Proceedings of the IEEE, vol. 77, no. 4, pp. 541–580, 1989.
[18] V. Khomenko, M. Koutny, and A. Yakovlev, “Logic synthesis for
asynchronous circuits based on Petri net unfoldings and incremental
SAT,” Fundamenta Informaticae, vol. 70, pp. 49–73, 2006, special Issue
on Best Papers from ACSD’04.
[19] A. Valmari, “The state explosion problem,” in Lectures on Petri Nets I:
Basic Models, Advances in Petri Nets. Springer, 1998, pp. 429–528.
[20] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and
A. Yakovlev, “Automatic handshake expansion and reshuffling using
concurrency reduction,” in HWPN’98, 1998, pp. 86–110.
[21] V. Varshavsky, Ed., Self-Timed Control of Concurrent Processes.
Kluwer Academic Publishers, 1990.
[22] W. Reisig, Petri Nets: An Introduction, ser. EATCS Monographs on
Theoretical Computer Science. Springer-Verlag, 1985, vol. 4.
[23] A. Alekseyev, V. Khomenko, A. Mokhov, D. Wist, and A. Yakovlev,
“Improved parallel composition of labelled Petri nets,” in Proceedings
of ACSD’11. IEEE Computer Society Press, 2011, pp. 131–140.
[24] V. Khomenko, “Model checking based on prefixes of Petri net
unfoldings,” Ph.D. dissertation, University of Newcastle upon Tyne, School of
Computing Science, 2003.
[25] ——, “Efficient automatic resolution of encoding conflicts using STG
unfoldings,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 17, pp. 855–868, 2009, special Section on Asynchronous
Circuits and Systems.
[26] ——, “Logic decomposition of asynchronous circuits using STG un-
foldings,” in Proceedings of the IEEE International Symposium on
Asynchronous Circuits and Systems (ASYNC). IEEE Computer Society
Press, 2011, pp. 3–12.
[27] D. L. Dill, Trace Theory for Automatic Hierarchical Verification of
Speed-Independent Circuits. Cambridge, MA, USA: MIT Press, 1989.
