Functional synthesis of genetic systems by Vaidyanathan, Prashant
Boston University
OpenBU http://open.bu.edu
Theses & Dissertations Boston University Theses & Dissertations
2019







FUNCTIONAL SYNTHESIS OF GENETIC SYSTEMS
by
PRASHANT VAIDYANATHAN
B.E., Birla Institute of Technology and Science - Pilani, 2012
M.S., Boston University, 2014
Submitted in partial fulfillment of the





All rights reserved except for
Chapter 2 and Sections 3.5, 3.6.
Sections 2.6, 2.7, and 2.8 adapted
from [Nielsen et al., 2016]
Reprinted with permission from
AAAS. Sections 2.1–2.5, and
2.8 adapted from [Vaidyanathan
et al., 2015]. Section 3.5 adapted
from [Vaidyanathan et al., 2017].




Douglas M. Densmore, Ph.D.
Associate Professor of Electrical and Computer Engineering
Associate Professor of Biomedical Engineering
Second Reader
Calin A. Belta, Ph.D.
Professor of Mechanical Engineering
Professor of Systems Engineering
Professor of Electrical and Computer Engineering
Third Reader
Wenchao Li, Ph.D.
Assistant Professor of Electrical and Computer Engineering
Assistant Professor of Systems Engineering
Fourth Reader
Andrew Phillips, Ph.D.
Head of Biological Computation Group
Microsoft Research




First and foremost, I would like to thank my advisor and friend Doug Densmore.
When it comes to mentors, one can’t ask for someone better than Doug. He has always
believed in me and has given me countless opportunities to pursue interesting research
ideas, meet new people, and attend interesting talks and conferences. From teaching
me to restore old arcade cabinets, to witty life lessons embedded with references
to ‘The Simpsons’, to guiding me to become a better scientist, Doug has been an
incredible influence in my life. I’m grateful for all his advice and for inspiring me to
always aim high.
I would also like to thank my thesis committee members. Calin Belta, has been
incredibly supportive and gave me the opportunity to work on some really interesting
ideas as part of the bio-CPS project. His guidance really helped shape the Phoenix
project into what it is today. He also introduced the Friday Sichuan Gourmet group
(myself, Nicholas, and Curtis) to some very interesting dishes, for which we will
eternally be grateful. Wenchao Li provided interesting insights and helped me think
about approaching some of the challenges in my Ph.D. in a very unique manner.
His suggestions helped make the Phoenix framework viable and practical. Andrew
Phillips, is the nicest and most compassionate person I have ever worked with. He has
been an incredible source of advice, wisdom, and positivity. He gave me an incredible
opportunity to work as a research intern under his guidance at Microsoft Research
which helped me grow as a researcher.
I was fortunate enough to work alongside some truly brilliant people. I would
particularly like to thank Bryan Der, Kevin Leshane, Cristian-Ioan Vasile, and Evan
Appleton who were not only amazing friends and collaborators, but also excellent
mentors who guided me and helped me understand synthetic biology, formal methods,
v
and many other concepts essential to my graduate work. I would also like to thank
my collaborators from Boston University: Swapnil Bhatia, Nicholas Roehner, Ryan
Silva, Curtis Madsen, Sadra Sadraddini, Rachael Ivison, Tim Jones, and Giuseppe
Bombara, whose contributions were essential to this thesis. I’d like to give a special
shout-out to my gifted collaborators in the Voigt Lab at MIT: Alec Nielsen, Bryan
Der, and Jonghyeon Shin for all their hard work and efforts in building and charac-
terizing the biological parts and circuits for the Cello project. It has truly been an
honour and pleasure working with you folks. I’d also like to specially thank Nicholas
Delateur and Brian Teague from the Weiss Lab at MIT. Nicholas has been tirelessly
working to build and characterize parts, modules, and circuits for the Phoenix project
for over two years now. I really appreciate his hard work and dedication. I also want
to thank Chris Voigt, Ron Weiss, and Chris Myers for all their guidance and help.
It has been an honour to work with giants in the field of synthetic biology. I’d also
like to acknowledge all my lab mates at CIDAR Lab for being an integral part of my
social circle in Boston.
A huge thank you to my friends and family. First, I’d like to thank my amazing
and loving parents. They have always been there for me and have worked extremely
hard to give me every opportunity to pursue my dreams and ambitions. To Amma, the
most caring and loving person I know, you are my pillar of strength. You encouraged
me to be inquisitive and always pushed me to achieve my potential. To Appa, my
source of motivation and perseverance, you have been a constant source of inspiration
for me. You have worked tirelessly to help me dream big. Everything I am today, is
because of my parents. I would like to thank my uncle and aunt: Harish Ganapathy
and Ranjani Harish, who have been guiding me at every step of my life. Harish
mama, you have always been my role model, and every thing I’ve achieved in my
life, I owe it to you. I want to thank both my grandfathers: Car thatha, for being
vi
someone I could look up to in the field of mathematics, and Ladoo thatha, the first
academic and scholar in the family, for always inspiring me to pursue knowledge.
I want to thank my Super Saiyan younger brother Balaji, who showed me how to
be courageous, cheerful, and strong even in the face of adversity, you will always be
missed. I’d also like to give a special shout out to my best friends: Ameya, Nikhil,
Omkar, and Tejas. You guys are awesome and I look forward to our next adventure.
Finally, I want to thank my amazing and wonderful wife Namratha, who has been
incredibly patient, understanding, loving, and has always supported me. You inspire
me and I love you.
vii
FUNCTIONAL SYNTHESIS OF GENETIC SYSTEMS
PRASHANT VAIDYANATHAN
Boston University, College of Engineering, 2019
Major Professor: Douglas M. Densmore, Ph.D.
Associate Professor of Electrical and Computer
Engineering
Associate Professor of Biomedical Engineering
Boston University, College of Engineering
ABSTRACT
Synthetic genetic regulatory networks (or genetic circuits) can operate in complex bio-
chemical environments to process and manipulate biological information to produce a
desired behavior. The ability to engineer such genetic circuits has wide-ranging appli-
cations in various fields such as therapeutics, energy, agriculture, and environmental
remediation. However, engineering multilevel genetic circuits quickly and reliably is a
big challenge in the field of synthetic biology. This difficulty can partly be attributed
to the growing complexity of biology. But some of the predominant challenges in-
clude the absence of formal specifications – that describe precise desired behavior of
these biological systems,as well as a lack of computational and mathematical frame-
works – that enable rapid in-silico design and synthesis of genetic circuits. This thesis
introduces two major frameworks to reliably design genetic circuits.
The first implementation focuses on a framework that enables synthetic biologists
to encode Boolean logic functions into living cells. Using high-level hardware descrip-
tion languages to specify the desired behavior of a genetic logic circuit, this framework
describes how, given a library of genetic gates, logic synthesis can be applied to syn-
viii
thesize a multilevel genetic circuit, while accounting for biological constraints such
as ‘signal matching’, ‘crosstalk’, and ‘genetic context effects’. This framework has
been implemented in a tool called Cello, which was applied to design 60 circuits
for Escherichia coli, where the circuit function was specified using Verilog code and
transformed to a DNA sequence. Across all these circuits, 92% of the output states
functioned as predicted.
The second implementation focuses on a framework to design complex genetic sys-
tems where the focus is on how the system behaves over time instead of its behavior
at steady-state. Using Signal Temporal Logic (STL) – a formalism used to specify
properties of dense-time real-valued signals, biologists can specify very precise tempo-
ral behaviors of a genetic system. The framework describes how genetic circuits that
are built from a well characterized library of DNA parts, can be scored by quantify-
ing the ‘degree of robustness’ of in-silico simulations against an STL formula. Using
formal verification, experimental data can be used to validate these in-silico designs.
In this framework, the design space is also explored to predict external controls (such
as approximate small molecule concentrations) that might be required to achieve a





1.1 Engineering Biological Systems . . . . . . . . . . . . . . . . . . . . . 2
1.2 Functional Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Genetic Logic Synthesis 9
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Synthesis challenges for genetic logic . . . . . . . . . . . . . . . . . . 11
2.3 Framework for Genetic Logic Synthesis . . . . . . . . . . . . . . . . . 14
2.3.1 Logic Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3.2 Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.3 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3.4 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.5 Problem Complexity . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 General Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.4.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.3 Build . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.4 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Current Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 Cello - An Implementation of Genetic Logic Synthesis . . . . . . . . . 37
2.6.1 Cello Design Environment - Overview . . . . . . . . . . . . . 37
x
2.6.2 Specification: Verilog hardware description language . . . . . 41
2.6.3 Parsing Verilog to generate a truth table . . . . . . . . . . . . 48
2.6.4 Logic synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.6.5 Response functions and cytometry data for insulated gates . . 55
2.6.6 Repressor assignment . . . . . . . . . . . . . . . . . . . . . . . 58
2.6.7 Combinatorial design of circuit layouts . . . . . . . . . . . . . 64
2.6.8 Predictions of circuit performance . . . . . . . . . . . . . . . . 72
2.7 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3 Automated Design of Genetic Systems with Temporal Verification 85
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.2 Performance Specifications for Genetic Systems . . . . . . . . . . . . 86
3.2.1 Signal Temporal Logic . . . . . . . . . . . . . . . . . . . . . . 86
3.3 Framework for Genetic Temporal Logic Synthesis . . . . . . . . . . . 96
3.3.1 Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.3.2 Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.3.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . 105
3.3.4 Problem Complexity . . . . . . . . . . . . . . . . . . . . . . . 106
3.4 General Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.4.1 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.5 Temporal Logic Inference for Time Series Data . . . . . . . . . . . . 112
3.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.5.3 Problem Description . . . . . . . . . . . . . . . . . . . . . . . 115
3.5.4 Grid-Based Temporal Logic Inference . . . . . . . . . . . . . . 115
3.5.5 Using GridTLI in Biological Networks . . . . . . . . . . . . . 122
xi
3.5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.6 Metrics for Signal Temporal Logic . . . . . . . . . . . . . . . . . . . . 129
3.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.6.2 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.6.3 Pompeiu-Hausdorff Distance . . . . . . . . . . . . . . . . . . . 137
3.6.4 Symmetric Difference . . . . . . . . . . . . . . . . . . . . . . . 139
3.6.5 Quantification of design quality . . . . . . . . . . . . . . . . . 143
3.7 Phoenix - An Implementation of Genetic Temporal Logic Synthesis . 147
3.7.1 Phoenix Design Environment - Overview . . . . . . . . . . . . 147
3.7.2 Structural Specification - miniEugene . . . . . . . . . . . . . . 147
3.7.3 Structural Specification to Rooted Tree . . . . . . . . . . . . . 151
3.7.4 Part and Module Assignment . . . . . . . . . . . . . . . . . . 152
3.8 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . 165
3.8.1 Constitutive Expression . . . . . . . . . . . . . . . . . . . . . 168
3.8.2 “Low” to “High” . . . . . . . . . . . . . . . . . . . . . . . . . 174
3.8.3 Pulse Circuit . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
3.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
4 Conclusions and Future Work 184
A Application of Grid TLI to a Fuel Control System 186
B Application of STL Metrics in Loss functions for TLI 191





2.1 Genetic circuit realizing multiple Boolean functions. . . . . . . . . . . 24
2.2 Quantifying how well a genetic circuit realizes a Boolean function . . 25
2.3 Genetic Logic Synthesis Implementations . . . . . . . . . . . . . . . . 36
2.4 Insulated gate response function parameters . . . . . . . . . . . . . . 57
3.1 Directed PH Distances . . . . . . . . . . . . . . . . . . . . . . . . . . 136
3.2 PH and SD Distances . . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.3 List of miniEugene identifiers used in Phoenix. . . . . . . . . . . . . . 149
3.4 Candidate Assignment for DNA components in Phoenix . . . . . . . 154
3.5 Circuit configurations found using Exhaustive search. . . . . . . . . . 158
3.6 Parameters for Constitutive Promoter–RBS Composite DNA compo-
nents in Phoenix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
3.7 Parameters for Activatable and Repressible Promoter–RBS Composite
DNA components in Phoenix . . . . . . . . . . . . . . . . . . . . . . 167
3.8 Parameters for CDS components in Phoenix . . . . . . . . . . . . . . 167
3.9 Parameters for Small Molecule Inducers in Phoenix . . . . . . . . . . 167
A.1 Comparison of GridTLI and Decision Tree TLI . . . . . . . . . . . . 189
xiii
List of Figures
1·1 Computation in a cell . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1·2 Building reliable and robust Genetic Systems. . . . . . . . . . . . . . 4
1·3 Example of Functional Synthesis . . . . . . . . . . . . . . . . . . . . 6
1·4 Functional Synthesis of Genetic Systems . . . . . . . . . . . . . . . . 7
2·1 Genetic implementation of combinational logic . . . . . . . . . . . . . 10
2·2 Crosstalk in Genetic Circuits . . . . . . . . . . . . . . . . . . . . . . 12
2·3 Signal Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2·4 Genetic context effects . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2·5 Genetic Logic Synthesis workflow . . . . . . . . . . . . . . . . . . . . 34
2·6 Overview of Cello. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2·7 The physical specification for the Eco1C1G1T1 UCF. . . . . . . . . . 38
2·8 Overview of Cello - Block Diagrams . . . . . . . . . . . . . . . . . . . 40
2·9 Assignment of genetic gates to the circuit diagram. . . . . . . . . . . 41
2·10 Flow of Verilog parsed to a truth table . . . . . . . . . . . . . . . . . 50
2·11 Nested Verilog modules . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2·12 Logic synthesis workflow in Cello . . . . . . . . . . . . . . . . . . . . 53
2·13 Logic synthesis workflow in Cello . . . . . . . . . . . . . . . . . . . . 56
2·14 Cello circuit score calculation . . . . . . . . . . . . . . . . . . . . . . 59
2·15 Toxic circuits and gate measurements in Cello . . . . . . . . . . . . . 61
2·16 Tradeoff between circuit score and predicted cell growth . . . . . . . 62
2·17 Simulated annealing search algorithm for repressor assignment in Cello 63
xiv
2·18 Example sets of Eugene rules for the Majority circuit. . . . . . . . . . 71
2·19 Circuit distribution calculation. . . . . . . . . . . . . . . . . . . . . . 74
2·20 Automated design of circuits by Cello. . . . . . . . . . . . . . . . . . 77
2·21 Automated design of circuits by Cello. . . . . . . . . . . . . . . . . . 78
2·22 Automated design of circuits by Cello. . . . . . . . . . . . . . . . . . 80
2·23 Automated design of circuits by Cello. . . . . . . . . . . . . . . . . . 81
2·24 Circuits with 1 failed output state. . . . . . . . . . . . . . . . . . . . 82
2·25 Circuits with 2 failed output states. . . . . . . . . . . . . . . . . . . . 83
2·26 Circuits with 3 or more failed output states. . . . . . . . . . . . . . . 83
2·27 Analysis of circuit failures and the design of multiple constructs by
combinatorial design. . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3·1 Robustness of Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3·2 Visual Representation of Example 2 . . . . . . . . . . . . . . . . . . . 93
3·3 Visual Representation of Example 3 . . . . . . . . . . . . . . . . . . . 94
3·4 Visual Representation of Example 4 . . . . . . . . . . . . . . . . . . . 95
3·5 Biological Modules used in the GTLS framework. . . . . . . . . . . . 100
3·6 Biologically Invalid/Redundant Circuits . . . . . . . . . . . . . . . . 104
3·7 Workflow for Genetic Temporal Logic Synthesis . . . . . . . . . . . . 111
3·8 Time-series measurements and simulations of biological networks. . . 112
3·9 Overview of GridTLI . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3·10 Grid TLI used to characterize the levels of protein expression for a
biological network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3·11 Scatter plot of mean robustness of false negative traces against average
number of operators in the STL formula generated by GridTLI for the
biological traces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3·12 Impact of thresholds on the STL formula generated by GridTLI . . . 126
xv
3·13 GTLS Workflow using GridTLI . . . . . . . . . . . . . . . . . . . . . 128
3·14 Symmetric Difference between two STL formulae . . . . . . . . . . . 140
3·15 Biological traces of constitutive expression and an induction experiment.145
3·16 Boxes for Union of two STL formulae . . . . . . . . . . . . . . . . . . 146
3·17 Circuit topologies produced by miniEugene . . . . . . . . . . . . . . 151
3·18 Vertex Decomposition and Model Composition in Phoenix . . . . . . 153
3·19 Part Assignment in Phoenix using Exhaustive Search . . . . . . . . . 155
3·20 Abstract circuit topologies . . . . . . . . . . . . . . . . . . . . . . . . 158
3·21 Design Constraint Example for Simulated Annealing . . . . . . . . . 161
3·22 Finding a biologically valid neighbor. . . . . . . . . . . . . . . . . . . 162
3·23 Hamming Distance between two circuit configurations. . . . . . . . . 163
3·24 Acceptance probabilities for various values of ∆. . . . . . . . . . . . . 163
3·25 Performance of Simulated Annealing in Phoenix. . . . . . . . . . . . 164
3·26 Library of DNA components, composite DNA components, and mod-
ules designed for Phoenix. . . . . . . . . . . . . . . . . . . . . . . . . 166
3·27 Specifications for the Phoenix Constitutive Expression case study. . . 169
3·28 SBOL visual representations of candidate circuit assignments for the
Phoenix Constitutive Expression case study. . . . . . . . . . . . . . . 170
3·29 Deterministic Simulations for the Phoenix Constitutive Expression
case study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
3·30 Stochastic Simulations for the Phoenix Constitutive Expression case
study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
3·31 Stochastic Simulations for the Phoenix Constitutive Expression case
study against the region of satisfaction of the performance specification.173
3·32 Experimental validation for Phoenix Constitutive Expression case study.174
3·33 Specifications for the Phoenix “Low” to “High” case study. . . . . . . 176
xvi
3·34 Candidate circuit assignment designed by Phoenix for the “Low” to
“High” case study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
3·35 Deterministic Simulation and Experimental Traces for the Phoenix
“Low” to “High” case study. . . . . . . . . . . . . . . . . . . . . . . . 177
3·36 Deterministic Simulations and Experimental Traces for the Phoenix
“Low” to “High” case study checked against the performance specifi-
cation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
3·37 Specifications for the Phoenix Pulse case study. . . . . . . . . . . . . 180
3·38 All candidate circuit assignments designed by Phoenix for the Pulse
case study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
3·39 Candidate Circuit assignment for Phoenix for the Pulse case study. . 181
3·40 Stochastic simulations of a candidate circuit assignment for the pulse
case study with different configurations. . . . . . . . . . . . . . . . . 182
A·1 Example testing set for case study A. . . . . . . . . . . . . . . . . . . 186
A·2 Scatter plot of mean MCR against mean number of operators in the
STL formula generated by GridTLI for the fuel control system data. . 188
B·1 Using STL Metrics in loss functions for TLI . . . . . . . . . . . . . . 193
xvii
List of Abbreviations
AGRN . . . . . . Abstract Genetic Regulatory Network
BDA . . . . . . Biodesign Automation
CAD . . . . . . Computer-Aided Design
CDS . . . . . . Coding Sequence
DAG . . . . . . Directed Acyclic Graph
FN . . . . . . False Negative
FNR . . . . . . False Negative Rate
FP . . . . . . False Positive
FPR . . . . . . False Positive Rate
GLS . . . . . . Genetic Logic Synthesis
GRN . . . . . . Genetic Regulatory Network
GTLS . . . . . . Genetic Temporal Logic Synthesis
MCR . . . . . . Misclassification Rate
MEFL . . . . . . Molecules of Equivalent Fluorescence
NP . . . . . . Non-deterministic Polynomial-time
RBS . . . . . . Ribosome-binding Site
ROBDD . . . . . . Reduced Ordered Binary Decision Diagram
SBML . . . . . . Systems Biology Markup Language
SBOL . . . . . . Synthetic Biology Open Language
STL . . . . . . Signal Temporal Logic
TLI . . . . . . Temporal Logic Inference
TN . . . . . . True Negative





Genetic Engineering has come a long way from being a wishful thought in science fic-
tion [Stableford, 2004] to being the foundation for a new field of science that can solve
some of the most challenging obstacles that humanity faces [Endy, 2005]. Biologists
have discovered the structure of DNA [Watson et al., 1953], understood the flow of
genetic information in biological systems [Crick, 1958,Crick, 1970], and cracked the
genetic code [Nirenberg and Matthaei, 1961]. These breakthroughs helped shape the
field of genetic engineering where, for decades, biologists have been able to read and
write DNA, cut and paste DNA sequences, and model the behavior of DNA to better
understand biochemical phenomena that govern our lives.
More recently, with science becoming more collaborative, a new interdisciplinary
field called Synthetic Biology emerged. Synthetic Biology aims to apply principles
of engineering and use genetic engineering, to design and build synthetic biological
modules, biological systems, and biological constructs, that have been inspired by
naturally occurring biological systems, for useful purposes. The ability to engineer
robust biological systems has tremendous applications such as, genetically engineering
cells to artificially produce key pharmaceutical drugs like artemisinin [Paddon and
Keasling, 2014], using biosensors for environmental environmental remediation [Karig,
2017], encoding logic in cells to detect malignant cancer cells [Xie et al., 2011], and
engineering microbes to produce synthetic materials [Charoonnart et al., 2018]. How-
ever, engineering such sophisticated systems quickly and reliably is still a big challenge
2
due to the complexity of biology. Lack of proper strategies, efficient frameworks, and
use of computational methods, could impede the ability to expand the scope of vari-
ous breakthroughs in synthetic biology to practical applications [Purnick and Weiss,
2009].
1.1 Engineering Biological Systems
The ultimate goal of synthetic biology is to encode logic within biological systems so
that they can process inputs to produce a desired behavior. A cell can be treated as a
complex system that is capable of sensing environmental conditions, such as detecting
secondary metabolites, identifying change in pH, or using quorum sensing. The cell
can then process these conditions (inputs) based on the logic that is encoded in the
DNA, and produce molecules to balance the environment (outputs). In many ways,
the cell is analogous to a computer that can run a program that is encoded in the
DNA.
Figure 1·1: Computation in a cell
Fig. 1·1 shows a high level abstraction of the engineering principles that drive
synthetic biology. The desired behavior of a biological system can be written as
a “biological program” using DNA as the coding language. This “genetic code” can
then be inserted into a cell, and the cellular machinery can then execute the program.
Like any other programming language that is structured around syntax and semantics,
3
genetic code also has syntax and semantics that are determined by fundamentals of
biology and chemistry. And just like any other computer program or software, it is
important to write this genetic code in a systematic and efficient manner, since this
greatly impacts how the cell runs the biological program. Incorrect “syntax” could
result in the “biological code” to not work as intended, or to not work at all. In
some cases, certain biological elements and phenomenon can also act like malicious
software and damage the cellular machinery.
Keeping this analogy in mind, the goal of this thesis is to present methods and
frameworks that can help design and build genetic systems in a formal, robust, and
reliable way. The workflow shown in Fig. 1·2 has been inspired by the method of
building robust software and hardware systems in the field of computer engineering
and systems engineering. This process or workflow typically has the following key
components:
• A system specification: which formally describes the requirements, constraints,
and description of the system to be built.
• A design methodology/process: this is the step that synthesizes a system from
a library of available components based on the specifications provided.
• Verification of the system: which includes a metric that quantifies how well a
system satisfies its specification.
This technique is consistent with the Specify - Design - Build - Test methodology
which is a core concept in Biodesign Automation [Densmore and Hassoun, 2012], a
design approach used to synthesize genetic constructs.
While there has been significant progress in formulating these principles to de-
sign genetic circuits [Appleton, 2016, Roehner, 2014], reliably synthesizing robust
4
Figure 1·2: Building reliable and robust Genetic Systems.
genetic circuits still remains one of the biggest challenges in the field of synthetic
biology [Nielsen, 2017]. These challenges can be summarized as follows:
• Lack of formal specifications: in many cases, the description of the desired
behaviors of genetic circuits are vague and mathematically ambiguous. This is
compounded by the fact that many of the experiments are often hard to repeat
and the results are hard to reproduce.
• Complexity of synthesis algorithms: in general, many synthesis problems (like
Logic synthesis) are computationally intractable [Umans et al., 2006]. This
implies that when these techniques are used to design biological circuits, the
only algorithm that can provide the best solution is brute-force. This may not
be the primary obstacle currently due to the lack of well characterized biological
components, which results in fewer building blocks for genetic circuits, thereby
resulting in a smaller solution space. However, this can change very quickly
as biologists have been making great strides in characterizing new parts and
modules [Nielsen et al., 2016]. Another issue is that biology poses its own
unique set of constraints which does not make it trivial to directly apply well-
5
established techniques to synthesize genetic circuits.
• Lack of clear metrics to quantify behavior: just stating that a biological module
behaves as intended is not enough. There have to be clearly defined metrics
that can either qualitatively or quantitatively define how well a system or its
behavior satisfies a formally defined specification. This is important to be able
to design genetic circuits with reproducible behavior.
This thesis addresses these challenges by using the concept of “Functional Syn-
thesis” to formulate frameworks that formally describe how genetic circuits can be
synthesized.
1.2 Functional Synthesis
The goal of functional synthesis is to design a system for a given set of formal spec-
ifications using well characterized components from a given library, such that the
system satisfies the constraints of the specifications.
Formal specifications are mathematically unambiguous descriptions of admissible
behaviors of a system. Specifications can be categorized into 3 main categories:
• Structural Specification: constrains the structural or topological properties of
a system.
• Functional Specification: describes intended capabilities of a system, or ‘what’
a system must be capable of doing.
• Performance Specification: describes the operational capabilities of a system,
or ‘how well’ a system must operate.
The goal of functional synthesis is to design or synthesize a system that can satisfy
all the constraints posed in the specifications. The system and its behavior is then
6
checked against the specifications and assigned a score (using a set of rules defined
to quantify the behavior/properties of the system). This score is compared against
scores of other possible system designs using a metric. The system with the best score
is then returned as the solution.
Figure 1·3: Example of Functional Synthesis
Fig. 1·3 shows an example of using functional synthesis to design a car. The
structural specification can include dimensions for the car, number of doors, etc. The
functional specification would list the capabilities of the car, for instance: “the car
must be able to move forward or move in reverse, or must be able to turn left and
right”. The performance specification can include details like acceleration for the car,
or fuel mileage, etc. The car can then be built using the parts that are available. The
car’s quality can then be judged based on how well it satisfies the constraints in the
specifications.
Fig. 1·4 shows an example of using functional synthesis to design genetic circuits.
In this example, the genetic circuit can only have two promoters (structural specifi-
cation), where the production of GFP must decrease (functional specification) from
1000 MEFL to 200 MEFL within 2 hours (performance specification). The genetic
7
Figure 1·4: Functional Synthesis of Genetic Systems
circuit must be built using a library that has 3 promoters, 1 ribosome binding site, 3
coding sequences, and 1 terminator.
1.3 Contributions
This thesis uses the concept of functional synthesis to define two frameworks that
address the synthesis challenges in synthetic biology. The main contributions of this
thesis can be summarized as follows:
• Formally define two frameworks that can synthesize genetic circuits based on a
given specification and a library of DNA components/modules.
1. The first framework specified in Chapter 2 describes a framework for Ge-
netic Logic Synthesis, where biologists can design genetic circuits encoded
with Boolean logic functions.
2. The second framework specified in Chapter 3 describes a framework for
Genetic Temporal Logic Synthesis, which enables biologists to specify func-
tionally rich temporal behaviors of biological systems, and helps biologists
8
identify configurations for their genetic circuits that can achieve a desired
temporal behavior.
• Identify biological limitations while using well established engineering principles
in these frameworks.
• Identify the computational complexity of both the frameworks.
• Clearly define qualitative and quantitative metrics that indicate how well a
genetic system satisfies a desired behavior.
• Develop heuristics to identify genetic circuits that can satisfy the specifications.
• Provide implementations for both frameworks along with biologically validated





Boolean algebra is a foundation for understanding and defining logic that has en-
abled robust engineering of many of today’s large scale electronic systems. In the
field of synthetic biology, Boolean logic functions are now being engineered into liv-
ing cells [Lu et al., 2009, Purnick and Weiss, 2009, Nielsen et al., 2013, Brophy and
Voigt, 2014]. Genetically encoded logic operates within the central dogma of molec-
ular biology, which describes the flow of information from DNA to RNA to protein.
Transcription is the biological process by which DNA is decoded to RNA by a poly-
merase (RNAP), while translation is the process by which RNA is decoded to protein.
High/low levels of RNAP flux can be abstracted to ON/OFF logic states, which is
the basis for transcriptional genetic logic.
While genetic logic can be implemented in many cell types at the transcriptional,
translational, or post-translational stages of the central dogma, the synthesis frame-
work presented in this thesis is primarily focused on transcriptional genetic logic
circuits. Transcriptional genetic logic circuits are made up of one or more transcrip-
tional units. A transcriptional unit contains several different types of DNA compo-
nents (genetic parts), but promoters and coding sequences are sufficient for a basic
understanding of a unit’s function. Promoters are start sites for transcription. One












    input
promoters
transcriptional 
   logic gate
transcription
factor (diffusing)
















































Figure 2·1: Genetic implementation of combinational logic. A) Pro-
moter and coding sequence symbols (bent arrows and block arrows) are
derived from standardized iconography [Galdzicki et al., 2014]. Promot-
ers are transcriptional start sites, which can be ON or OFF. Coding
sequences encode transcription factors that can diffuse and regulate
promoter state (repression arc). When a coding sequence encodes a
protein-based transcription factor, transcription and translation occur,
but this illustration does not explicitly show the RNA transcript. A
NOR gate requires two input promoters, a coding sequence encoding
a transcriptional repressor, and an output promoter. B) A multilevel
XOR circuit has a unique transcriptional state for each row of the
truth table as indicated by the presence/absence of specific repression
arcs. The output (white coding sequence) is transcribed when either
upstream promoter is ON.
11
activation or repression of RNAP flux at a different promoter. These genetic regula-
tory relationships, illustrated using directed arcs from coding sequences to promoters
(Figure 2·1), enable the implementation of transcriptional genetic logic gates, such
as BUFFER, NOT, OR, NOR, AND, and NAND. For example, an input promoter,
coding sequence for a transcriptional repressor, and output promoter implement a
genetic NOT gate. A NOR gate can be built from the same parts, but two input
promoters are required (Figure 2·1A). Furthermore, an XOR circuit can be built by
connecting two NOT gates, two NOR gates, and one OR output (Figure 2·1B). In
this circuit, promoters are ON unless repressed by a transcription factor, and the
conditional presence/absence of repression arcs results in XOR logic for the output
coding sequence.
2.2 Synthesis challenges for genetic logic
This section describes three engineering challenges for genetic circuit design (Fig-
ure 2·2, Figure 2·3, and Figure 2·4, with new terms defined in Section III).
Crosstalk
Unlike metal wires, the signal carriers in genetic circuits are regulatory molecules
that can diffuse and mix within the liquid medium of a cell. Sometimes this dif-
fusion also occurs between cells, for example, if the molecule is involved in cell-to-cell
communication [Saeidi et al., 2013, Tamsir et al., 2011]. Due to diffusion within
and/or between cells, each gate must encode unique regulatory proteins or RNA
to avoid unintended logic computations. In synthetic biology, this phenomenon is
known as crosstalk. Crosstalk limits the scalable design of genetic logic circuits and
requires significant experimental effort to identify and quantify regulators that have
unique binding partners [Moon et al., 2012,Stanton et al., 2014]. For example, reusing
12
the same promoter-coding sequence pair for each gate in the XOR circuit results in
multiple undesired regulatory arcs (Figure 2·2).
Crosstalk
No crosstalk
in1 in2 in2 in1
out
out
in1 in2 in2 in1
Figure 2·2: Crosstalk. Signal carriers are transcription factors, which
are diffusible molecules that mix within a cell. Crosstalk occurs when
the same signal carrier is reused in more than one logic gate in a cell.
To avoid crosstalk, each gate must use a unique signal carrier.
Signal matching
In contrast to electronic logic gates, for which input/output signals have definable
system-wide high/low levels, genetic logic gates have unique response functions that
describe experimentally measured input-output relationships and determine whether
an input/output signal is high or low [Stanton et al., 2014, Yaman et al., 2012].
Thus, assigning particular Boolean functions to genetic logic gates does not guarantee
successful genetic logic, as signal mismatches can result in logic errors (Figure 2·3).
Context effects
The physical properties of biological systems are challenging to quantify and under-
stand. Even after extensive efforts to characterize a series of genetic logic gates and
avoid both crosstalk and signal mismatches, logic errors can occur due to unantici-
pated non-modularity that arises when genetic components are used in new genetic,
cellular, and environmental contexts [Cardinale and Arkin, 2012,Brophy and Voigt,
13








1 2 3 4 5 6 7 8 9 10 11 12
the genetic XOR circuit has twelve new part junctions
versions of the same XOR circuit with different part junctions:
emphasis on the context of one coding sequece (red)
in1 in2 in2 in1
out
out












in2 NOT gate output in2 NOT gate output
Figure 2·3: Signal matching. The dynamic range of the blue NOT
gate matches the ON/OFF dynamic range of the input promoter. The
dynamic range of the green NOT gate does not match and results in a
logic error where both outputs are high.
2014]. For example, the performance of a component can be altered by its neighbor-
ing components [Lou et al., 2012,Mutalik et al., 2013], and unintended components
can form at component junctions [Yao et al., 2013]. The risk of local component
interference increases as circuit size i creases (Figure 2·4). Context effects can also
act at a distance, such as retroactivity when a transcription factor binds multiple
instances of a promoter [Del Vecchio et al., 2008]. Other context effects are not as
well understood, such as resource competition within the host cell, growth rate of
the host cell [Cardinale et al., 2013], and environmental factors that affect host cell
physiology [Cardinale and Arkin, 2012,Arkin, 2013].
Despite these challenges, the field of genetic circuit design continues to mature.
Crosstalk is being mitigated by programmable transcription factors that can bind
specifiable DNA sequences, thus expanding the number of orthogonal molecular ‘wires’
(RNA-guided dCas9 [Qi et al., 2013, Bikard et al., 2013, Nielsen and Voigt, 2014],
zinc fingers [Lohmueller et al., 2012], TAL effectors [Lienert et al., 2013], and small







1 2 3 4 5 6 7 8 9 10 11 12
the genetic XOR circuit has twelve new part junctions
versions of the same XOR circuit with different part junctions:





in1 in2 in2 in1
Figure 2·4: Genetic context effects. While there are several differ-
ent types of genetic context effects, this figure illustrates new DNA
component junctions. A larger circuit will have more junctions and
carries a higher risk for non-modular component behavior. In circuits
with the same genetic gates, variations in order and orientation of the
transcriptional units result in different neighboring DNA components.
more precise and reproducible experimental measurement and characterization tech-
niques [Beal et al., 2012b,Kelly et al., 2009]. Context effects are also being identified
and insulated [Cardinale and Arkin, 2012,Arkin, 2013]: genetic components have been
developed to flank promoters [Mutalik et al., 2013,Lou et al., 2012], and retroactiv-
ity can be quantitatively modeled [Del Vecchio et al., 2008]. As genetic components
increase in both number and modularity, computer-aided design will play a vital role
in genetic circuit design. With this goal in mind, this thesis will define a framework
for genetic logic synthesis.
2.3 Framework for Genetic Logic Synthesis
Genetic logic synthesis is the process by which an abstract functional description is
mapped to a design for a genetic logic circuit, one that includes a complete DNA
15
sequence to inform its physical construction. Genetic circuits broadly include combi-
national logic circuits for decision-making, sequential circuits for switches and coun-
ters [Gardner et al., 2000,Friedland et al., 2009], and analog circuits for linear sensing
and ratio calculation [Daniel et al., 2013, Sarpeshkar, 2014]. These are all active ar-
eas of research, but this framework focuses on combinational logic circuits. In order
to define a framework for genetic logic synthesis, first, the general process of logic
synthesis has been described, followed by the definition of terms and concepts for
formulating the genetic logic synthesis problem.
2.3.1 Logic Synthesis
Logic synthesis can be defined by this workflow: specification, design, build (see also
Section V). The input to the system is any abstract logic description that is able
to be specified using a description language or description tool. The output is a
technology-specific mapping of the input logic.
Specification is the process of creating a correct representation of the description
that is canonical for optimization, such as a truth table or a reduced ordered binary
decision diagram (ROBDD) [Bryant, 1986]. This point in the workflow is closest
to the abstract functional description of an electronic logic circuit and is the step
with the least concern for the target technology. An example of specification for
electronic circuits is the act of accepting or rejecting a description written in Verilog
for RTL synthesis, as per the IEEE Standard for Verilog RTL Synthesis (1364.1-
2002) [iee, 2002]. The standard enables reproducibility among different synthesis
tools and defines a specific subset of Verilog that can be reliably instantiated in
hardware.
Next, the design step of the synthesis process takes the output of specification
and produces an optimized gate-level schematic. This optimization weighs several
16
cost functions, such as reducing the total number of components in the schematic or
the total amount of component fan-in [Chu, 2006] (examples of genetic cost functions
in Table 2.3). The functional characteristics of the gate-level components, but not
necessarily their physical attributes, must be defined using an external library, which
usually consists of logic gates at the level of NOT, AND, OR, and NOR.
Build is the final step in the process of converting an abstract design into a realiz-
able form. This step translates the optimized gate-level schematic into structures that
can be found in a specific target technology. Since this step is technology specific, it
marks the farthest point in the workflow from the abstract functional description.
2.3.2 Terms
• Graphs: In this framework, gate-level schematics are represented using directed
acyclic graphs (DAGs).
1. A graph G = (V ,E) is a data structure consisting of a set of vertices V and
a set of edges E connecting these vertices [Rosen, 1999]. In this framework,
each vertex in a graph represents a gate in a gate-level schematic or an
input/output component for the schematic as a whole.
2. A directed graph is a graph in which each edge is an ordered pair of
elements (vi ,vt) in V . An edge eit is said to start with its initial vertex vi
and end with its terminal vertex vt.
3. A path from v0 to vn is an ordered set of edges
(v0,v1) , (v1,v2) , (v2,v3) , . . . , (vn−1,vn) ,
such that the terminal vertex of each edge is the same as the initial vertex
of the next edge in the path.
4. In an acyclic graph, there exists no path that starts with a vertex vi and
17
ends with the same vertex vi .
5. A colored graph is a graph in which a color is assigned to each vertex
of the graph, such that no two vertices are assigned the same color. This
restriction of color uniqueness is necessary to avoid crosstalk during genetic
logic synthesis.
• Synthetic Biology: The following section defines the terms from synthetic biol-
ogy that are necessary for a basic understanding of genetic logic synthesis.
1. A DNA component (also referred to as a genetic part in synthetic biology)
is a finite string of characters (p1,p2,p3,p4, . . . ,pn), where pi ∈ {A,T ,G,C}.
A, T , G, and C represent the nucleotides adenine, thymine, guanine, and
cytosine, respectively, which are the basic building blocks of DNA. A new
junction of two DNA components results in a new genetic context for each
component.
2. Transcription is the biological process by which DNA is “read” to produce
RNA.
3. Translation is the biological process by which RNA is “read” to produce
protein.
4. A promoter is a type of DNA component (bent arrows in Figure 2·1) at
which transcription starts. Promoter locations determine fan-in and fan-
out for a gate.1
1Fan-in is determined by the number of physically adjacent, non-identical promoters driving
expression of a coding sequence. Note that genetic logic gates with fan-in greater than two have
not yet been reported. Limitations to fan-in include a high OFF level due to accumulation of leaky
expression from each promoter, and physical interference between RNAP and downstream promoters
bound by transcription factors. Fan-out of a primitive gate is determined by the number of disjoint
instances of the same promoter. Homologous recombination can delete DNA between repeated
sequences, which may place restrictions on fan-out for certain gates. Recombination can occur at 20-
30 base pairs of identical sequence, but larger sequences will be more susceptible. Additionally, high
fan-out might affect the response function due to transcription factor titration effects (retroactivity).
18
5. A coding sequence is a type of DNA component (box-arrows in Figure 2·1)
that encodes and can be transcribed to RNA. RNA typically encodes and
can be translated to protein, though RNA-based regulators do not require
translation.
6. RNAP (RNA polymerase) binds a promoter in the ON state and tran-
scribes the downstream coding sequences to RNA. Levels of transcription
at a promoter can be expressed in terms of RNAP flux, the number of
polymerases per second that start transcription at that promoter.
7. A transcription signal is a measure of transcription based on RNAP flux
at a promoter that serves as a Boolean input or output in transcriptional
genetic logic.
8. A transcription factor is a diffusible signal carrier (RNA or protein) that
regulates transcription at a specific promoter. This regulation is the basis
for implementing transcriptional genetic logic.
9. A genetic module is a set of DNA components and any of their encoded
or associated biochemical components (such as RNA and protein) that
interact to perform some biological function.
10. A transcriptional genetic logic gate is a genetic module comprised of a
set of one or more coding sequences that regulate a promoter via their
encoded transcription factor(s). This regulation is abstracted to a logical
relationship between the input transcription signals from any promoters
composed with the gate’s coding sequences and the output transcription
signal from the gate’s promoter.




The response function specifies how the combination of input transcription
signals of gate k is mapped to the output transcription signal of gate k.
Examples of response functions for NOR gates, AND gates, and an OR
gate are shown as heatmaps in Figure 2·5 (white is low, black is high).
12. A transcriptional unit is a partially ordered set of DNA components that
is capable of undergoing transcription, typically including one or more
adjacent promoters followed by a coding sequence. The set is partially
ordered because the input promoters must precede the coding sequence,
but the order of the input promoters can vary. Transcriptional units form
when transcriptional genetic logic gates are connected to form a circuit
and a promoter from one gate is composed with a coding sequence from
another gate.
13. A transcriptional genetic logic library is a set of promoters for generat-
ing transcription signals, a set of coding sequences for reading/displaying
transcription signals, and a set of transcriptional genetic logic gates for
transforming input transcription signals to output transcription signals in
accordance with a set of response functions.
14. A transcriptional genetic logic circuit is a subset of a transcriptional ge-
netic logic library with a colored DAG to indicate the connectivity of its
elements. In general, each color in the DAG represents a transcriptional
genetic logic gate that encodes a unique transcription factor. The colors
of the input and output vertices of the graph (Section 2.3.3), however,
represent individual promoters and coding sequences, respectively. This
20
exception is necessary for complete transcriptional units to be formed at
the boundaries of the graph (Figure 2·5).
15. The DNA sequence S produced by genetic logic synthesis is an ordered
set of DNA components. This ordered set must conform to a ligation or
physical concatenation of the set of transcriptional units that arise from
the connectivity of the synthesized transcriptional genetic logic circuit.
2.3.3 Concepts
Types of Vertices
In this framework for genetic logic synthesis, a graph can contain three types of
vertices.
• Input: A vertex is an input when it is not a terminal vertex for any edge in
the graph. A colored input vertex represents a single promoter and generates
an output transcription signal on the outgoing edges for which it is an initial
vertex.
• Output: A vertex is an output when it is not an initial vertex for any edge in the
graph. A colored output vertex represents a coding sequence and reads/displays
an input transcription signal from the incoming edges for which it is a terminal
vertex.
• Primitive: A vertex is a primitive when it is a terminal vertex for at least
one edge and an initial vertex for at least one other edge in the graph. A
colored primitive vertex represents a transcriptional genetic logic gate. Hence, a
primitive vertex reads one or more input transcription signals from its incoming
edges and generates an output transcription signal on its outgoing edges. In
21
the case of digital logic synthesis, a primitive vertex would map n input signals
to one output signal using a completely specified Boolean function:
f : {0,1}n→ {0,1} .
In the case of genetic logic synthesis, however, a primitive vertex must map n
input signals to one output signal using the response function of its genetic logic
gate.
Boolean Signal Representation
In an electronic logic circuit, the discrete state of each electrical signal is determined
by measuring an electrical property, such as the voltage or current in a wire. An
electrical signal can be high or low, depending on whether the value of the measured
electrical property is above or below a specified threshhold.
For example, in a typical complementary metal-oxide-semiconductor (CMOS) cir-
cuit that is supplied with a voltage of VDD , a wire with a voltage ranging from 0 to
VDD /2 transmits a low signal and is represented in Boolean logic as 0. A wire with
a voltage ranging from VDD /2 to VDD , transmits a high signal and is represented in
Boolean logic as 1. Accordingly, the state of an electronic logic gate depends on the
properties of the gate and the states of its input signals in the context of the circuit
as a whole.
In a genetic logic circuit, however, it is the response function of each genetic logic
gate that determines the Boolean representation of signals within the circuit (Figure
2·3). Like an electronic logic gate, the state of a genetic logic gate depends on the
properties of the gate and the states of its input signals. The state of each input
signal, however, is determined by comparing the value of the signal to the response
function of the gate, rather than to a circuit-wide threshold . This poses a challenge
22
for defining discrete high and low states for genetic logic circuits.
Response Functions
Transcriptional genetic logic circuits transmit data via transcription signals. A tran-
scription signal provides a measure r ∈ R≥0 of RNAP flux within a genetic logic
circuit. In the colored DAG that represents such a circuit, each primitive vertex re-
lates the values of its k input transcription signals
(
ri1, ri2, . . . , rik
)
to the value of its
output transcription signal (ro) using a response function φ.
ro = φ
(
ri1, ri2, . . . , rik
)
.
Generally, each response function φ of a primitive vertex captures how its input
transcription signals are combined over the coding sequences of its corresponding gate
and subsequently transformed to an output transcription signal via genetic regulation.
Unlike primitive vertices, the input vertices of a graph do not possess response
functions of the sort defined here. Instead, an input vertex supplies its output tran-
scription signal with a value ro from a closed interval in R≥0. Hence, this output
signal has both maximum and minimum values ON and OFF.
An output vertex calculates the combined display value rd of its input transcription
signals
(
ri1, ri2, . . . , rik
)
in accordance with the response function φout.
rd = φout
(
ri1, ri2, . . . , rik
)
.
In general, φout represents how input transcription signals are combined over a
coding sequence that encodes the final molecular output of a transcriptional genetic
logic circuit. For instance, this output could be a fluorescent protein for experimental
measurement, an enzyme for biochemical production, or any natural or recombinant
cellular function.
23
Finally, a response function can also relate the input signals and output signal of
a transcriptional genetic logic circuit containing multiple gates. Figure 2·5 shows an
example of this type of circuit: an XOR gate composed of two NOT gates, two NOR
gates, and one OR gate. In order to reuse such a circuit as a library gate, its response
function φC would likely need to be determined empirically (beyond composing the
function from that of each individual gate in the circuit). In addition, this composite
gate would have multiple colors that would need to be taken into account to avoid
introducing crosstalk during genetic logic synthesis.
How a genetic circuit implements a Boolean function
Since transcription signals cannot be defined as having high or low states in the
context of a transcriptional genetic logic circuit as a whole, it is challenging to define
how such a genetic logic circuit implements a Boolean function.
For example, consider a Boolean function:
f : {0,1}n→ {0,1} . (2.1)
and a colored DAG G that has:
• n input vertices that generate transcription signals with maximum and mini-
mum values ON and OFF.
• m primitive vertices that map their transcription signals in accordance with m
response functions (φ1,φ2, . . . ,φm).
• a single output vertex that displays a transcription signal calculated with φout.
The value of the function φC represented by G can be determined by calculating
the value of φout, which depends in turn on the values of the m response functions
and the n transcription signals generated by the input vertices. By setting the values
24
of these generated input signals to either ON or OFF and mapping them to the input
set {0,1}n of (2.1), 2n values for φC are obtained. Let ONmin = {φC | f (x) = 1}min
and OFFmax = {φC | f (x) = 0}max, where f (x) is the Boolean function and x ∈ {0,1}n.
It can then be said that G realizes the Boolean function specified in (2.1) if
ONmin > OFFmax. (2.2)
This conservative definition for a Boolean function allows the condition specified
in (2.2) to realize multiple Boolean functions. For instance, consider the following
possible values of φC : {16,0.2,0.4,0.5} for inputs a and b. These values can realize
all Boolean functions specified in Table 2.1.
a b (a+ b)′ (a⊕ b)′ (a+ b′) 1 0 V alue
0 0 1 1 1 1 0 16
0 1 0 0 0 1 0 0.2
1 0 0 0 1 1 0 0.4
1 1 0 1 1 1 0 0.5
Table 2.1: A Genetic circuit realizing multiple Boolean functions.
To solve this ambiguity, this thesis introduces a score α, to define the extent to













where rdmax is the maximum value of the codomain of φC and rdmin is the minimum
value of the codomain of φC .









and for Boolean Contradictions: f : {0,1}n→ {0}







The following section demonstrates how α defined in equations (2.3) (2.4) (2.5)
can be used to quantify the Boolean functions that realized the values φC : {16.0, 0.2,
0.4, 0.5} using the condition specified in (2.2). Let rdmin = 0.1 and rdmax = 17. Note
that these are hypothetical numbers that do not reflect simulation results.
Table 2.2 illustrates that the Boolean function (a+ b)′ can best realize the values
φC : {16,0.2,0.4,0.5}.
Function rdmax rdmin ONmin OFFmax α
(a+ b)′ 17.0 0.1 16.0 0.5 0.9586
(a⊕ b)′ 17.0 0.1 0.5 0.4 0.5030
(a+ b′) 17.0 0.1 0.4 0.2 0.5059
1 17.0 0.1 0.2 N/A 0.0059
0 17.0 0.1 N/A 16.0 0.0592




Within the established framework, the problem of genetic logic synthesis can be
defined as follows:
Given a Boolean function
f : {0,1}n→ {0,1,x}m . (2.6)
(where x denotes a don’t care)
and a finite transcriptional genetic logic library, with uniquely colored input promoters
with known ON/OFF levels, logic gates with known response functions, output coding
sequences and a well defined α, synthesize a DNA sequence S which can realize the
logic function specified by (2.6) (Figure 2·5).
Objectives
• Minimize the number of primitive vertices.
• Represent each vertex with a unique color from the genetic gates library (Fig-
ure 2·2).
• Color the sequence S to maximize the value of α (Figure 2·3).
2.3.5 Problem Complexity
This section will now remark briefly on the complexity of the Genetic Logic Synthesis
(GLS) problem. Consider the special case when α is given to be 1, since if GLS for
any α were easy, it could be used to solve for α = 1 too.
Consider the following Subset Sum Problem (SSP): given a set S of positive inte-
gers and a positive integer k, decide whether a subset of S sums to k. Via a reduction
from the Knapsack problem, SSP is known to be NP-complete [Lagarias and Odlyzko,
1985]. A polynomial time reduction exists from SSP to the Genetic Logic Synthesis
27
problem (GLS). Each element si ∈ S is translated into a Buffer gate — one that leaves
its input logically unchanged — with a piecewise linear response function





resulting in a gate library L of |S | gates. Let the boolean function to be synthesized
be f = {(0 7→ 1), (1 7→ 0)} — the unary negation function. Let the input promoter
operating points for low and high inputs be 0 and k +1 respectively.
First, it can be argued that any circuit synthesized using Buffer gates alone will




for the set of Buffer gates B ⊆ L used in the synthesized circuit. For f (1), the
output
OFFmax = (k +1) +
∑
s∈B s.
But, for the final gate,
rdmin = si , if the final gate is gate i, and (2.9)




It must be noted, that as per the definition of α, no circuit whose OFFmax , rdmin
or whose ONmin , rdmax can achieve an α = 1, since α measures the relative dynamic
range of the output gate utilized by the circuit. Thus, for the circuit comprising solely
buffer gates, α < 1.
Next, consider an additional gate g with
28
φg(x) = x if x ≤ k,
= 0 otherwise.
Note that a circuit comprising g alone will also have α < 1 since φg(0) = 0 and
φg(k + 1) = 0, which implies that ONmin = 0 and OFFmax = 0. But rdmax = k and
rdmin = 0. Thus,






Also note that g cannot act as an input gate at the given operating points.
With the gate library L∪ {g}, consider an instance of SSP where the answer to
the SSP is “yes.” Thus, there is a subset of S that sums to k. It must shown that a
corresponding circuit with α = 1 exists. Consider a circuit C comprising buffer gates




s = k, and
φC(k +1) = (k +1) +
∑
s∈B
s = 2k +1.
Now consider C plugged into g; call the full circuit C′. In this case,
φC′ (0) = k, rdmax = k, ONmin = k,
φC′ (k +1) = 0, rdmin = 0, OFFmax = 0,




Now consider an SSP where the answer to the SSP is “no.” Thus, all subsets
of S sum to a value either greater or smaller than k. As it has been shown above
independent of the status of the SSP, the buffer gates alone and the gate g alone both
result in a circuit with α < 1. So, consider the case where some circuit D comprising
buffer gates alone is plugged into gate g. (Call this circuit D ′.) Since this is a “no”
instance of SSP, φD(0) , k. Therefore, φD ′ (0) ∈ {0,φD(0)}. (If g were an internal gate,
the argument still holds.) Hence ONmin , rdmax = k. Thus, α(D) < 1.
It has now been shown that an instance of SSP can be translated into an instance
of GLS such that a circuit with α = 1 is obtained if and only if the SSP instance is a
positive one. But it is well known that SSP is complete for NP. Thus, it must be the
case that GLS is also NP-hard. Furthermore, the decision version of GLS – “is there
a circuit C implementing f with library L with α(C) = 1?” – is clearly in NP, since
given C, its α can be computed and its gates checked to be in L in time polynomial
in |L| and |C|. Thus, the decision version of GLS is NP-complete.
Thus, this proof reveals a fundamental challenge in designing multi-stage genetic
circuits: the selection of a response function matched subset of dynamic range-
maximizing gates from an arbitrary collection of gates.
2.4 General Approach
A major goal of both genetic and electronic circuit synthesis is to translate an ab-
stract functional description into a technology-specific representation that can be
directly implemented [iee, 2002]. This section discusses similarities and differences
between synthesis in the domains of living technology versus electronic technology in
the context of the generalized, three-step synthesis process presented in this thesis.
30
2.4.1 Specification
Specification is the process of describing abstract logic, vetting the description for
synthesis, and transforming it to a canonical form. Each of these terms are described
in this section.
Abstract logic requires a language for circuit description. For genetic circuits, this
language could be Verilog, VHDL [Gendrault et al., 2014], or more genetic-specific
languages, such as the Genetic Engineering of Living Cells (GEC) [Pedersen and
Phillips, 2009], Proto [Beal et al., 2011], or the Synthetic Biology Open Language
(SBOL) [Galdzicki et al., 2014] standard. Currently, SBOL describes the structure
but not the higher-order function of a genetic design. With proper extension [Roehner
et al., 2015], SBOL could form the basis for a standard genetic synthesis language: a
standard language originally used for circuit description can become a powerful tool
for circuit synthesis, as did VHDL for electronic circuit design.
All of the previous languages potentially contain instructions that may or may
not be synthesizable. As a result, the chosen language for circuit description must be
vetted for constructibility via a standard, although the requirements for “appropri-
ateness” of a genetic circuit will be different than that of an electronic circuit. For
example, genetic circuits lack a clocking function, rendering edge-sensitive Verilog
models as not appropriate for genetic construction. Additionally, genetic synthesis
currently lacks a standard for reproducibility and appropriateness as the IEEE stan-
dards for VHDL and Verilog provide for electronic circuit synthesis.
Finally, the vetted description is transformed into a correct representation of the
description that is canonical for optimization, such as a truth table or a reduced
ordered binary decision diagram (ROBDD) [Bryant, 1986].
31
2.4.2 Design
The design step takes the output of specification and uses an optimized number of
generic primitive gates to implement its logic operation. For example, ABC [Brayton
and Mishchenko, 2010] uses an AND-Inverter graph library to heuristically optimize
the specified logic operation. The gates are considered generic because they have no
detailed information regarding their physical characteristics. In approaches to genetic
logic synthesis, generic logic gates often take the form of abstract genetic modules,
DNA components, and/or biochemical components (such as RNA and protein) that
fix the types of biological mechanisms implementing the logic, but not the exact
physical structures of the components involved in these mechanisms.
Optimization during the design step must be defined by at least one cost function.
For electronic circuits, wires could be considered cheap, as in the case of an optimal
construct for a programmable logic array. This construct is achieved via two-level
synthesis and results in a circuit expressed in sum-of-products form, thus having a
higher fan-in than a circuit produced by multilevel synthesis (i.e. synthesis using mul-
tiple levels of gates) [Chu, 2006]. For genetic circuits, additional wires are frequently
considered expensive and thus multilevel synthesis is preferred.
Other costs are challenging to quantify. First, unknown genetic context effects
that can degrade circuit performance carry a risk factor that increases with circuit
size. Secondly, since genetic circuits are run by a living host cell, each additional signal
competes for the same metabolic energy and resources the cell needs to survive. These
resource limitations and other modes of circuit toxicity are challenging to predict, so
the size of the genetic program needed to implement the logic function should be
minimized (see Table 2.3 for examples of cost functions). This can be done at the
generic gate-level during the design step or a merged design-and-build step. Since
genetic logic circuits have orders of magnitude fewer gates (less than 20 promoters
32
or coding sequences [Moon et al., 2012, Purnick and Weiss, 2009]) than electronic
circuits, more exhaustive methods for gate-level optimizations might prove useful.
2.4.3 Build
The build step takes the optimized generic gate structure and composes a functionally
identical structure that is specific to a particular technology. Whereas a gate compo-
nent is defined primarily by its function, gates and components at the build layer are
further characterized by their physical attributes [Chu, 2006]. The build process for
both electronic and genetic domains begins by finding physical constructs that behave
reliably and making these constructs modular. These physical constructs can draw
from multiple biological mechanisms for the same logic function [Brophy and Voigt,
2014]. For example, an AND gate can be constructed from RNA toehold switches,
riboswitches, RNA-aptamers, metabolic enzymes [Miyamoto et al., 2012], coregula-
tory transcriptional activators [Moon et al., 2012], recombinases [Siuti et al., 2013],
split polymerases [Shis and Bennett, 2013], or by combining NOR and NOT gates
(also having multiple biological mechanisms) [Nielsen and Voigt, 2014,Stanton et al.,
2014]. The problem becomes a matter of routing intermediate signals between reliable
modules, taking care to match signal types. Not all biological mechanisms of logic
share the same type of signal, which in this framework is a measure of transcription
from a promoter.
The output of the build step is a fully mapped structural representation of the op-
timized logic construct expressed solely in components found within the list of reliable
modules. This list of reliable modules can be thought of as an Application Specific
Integrated Circuit (ASIC) standard cell library in electronics and a transcriptional
genetic logic library in genetics. Synthesis targeting an ASIC further behaves as an
analog to synthesis targeting genetic circuits in that both require post-synthesis fabri-
33
cation. In contrast, there is not currently a genetic counterpart to a pre-manufactured
Field Programmable Gate Array (FPGA).
Though genetic circuit mapping resembles that for ASICs, the genetic equivalent
to the standard cell library also has a finite number of unrepeatable intermediate
signal carriers from which to draw. Graph coloring from a finite library is required
to avoid crosstalk from transcriptional factors that coexist and mix within the same
intracellular medium. Furthermore, genetic gates cannot currently conform to a single
static discipline that defines a standard noise margin and system-wide thresholds
for input/output signal strengths. Instead of using a static discipline, each gate
interaction in genetics requires signal matching. Finally, even a correctly colored and
tuned circuit could reliably produce failures during experimentation. These proven
failures should be captured in a library and avoided during the build process.
2.4.4 Workflow
Figure 2·5 highlights a workflow to solve Genetic Logic Synthesis. The Genetic gates
library. consists of four NOR/NOT gates, two AND gates, and an OR output.
Dashed gray input promoters will be replaced by promoters from other gates in a
connected circuit. In the response function heatmaps showing ro = φ(ri1, ri2), white
indicates low output and black indicates high output. Regulation arcs ending with
a flat line indicate repression, arcs ending with an arrow indicate activation. The
specification is the logic function of interest and the library of experimentally charac-
terized DNA components are required inputs. In the design step, the logic function
is mapped to an uncolored graph, where gates are represented as generic functional
primitives with no physical characteristics. The type of gates and number of gates
must be available in the genetic gates library. Five functionally equivalent Boolean
circuits are shown, but only three can be fully colored by the genetic gates library.
34
if( A!= B)
  out = 1;
else


























































Figure 2·5: Genetic logic synthesis workflow.
In the build phase, the graph is colored using available DNA components from the
library. Black and gray promoters are the input vertices, the white coding sequence
is the output vertex, and other colors represent a transcription factor, which can only
be used once per circuit to avoid crosstalk. The goal is to color the graph to maximize
circuit score α. Once the graph is colored, all DNA components are concatenated
into a final DNA sequence S, which can be ordered from a company or constructed
in a molecular biology lab.
35
2.5 Current Approaches
There are many methodologies that can be implemented to solve the genetic logic
synthesis problem. While the field of genetic logic synthesis is very new, previously
published methodologies for solving the genetic logic synthesis problem have been im-
plemented in various software tools and approaches, including BioJADE [Goler, 2004],
GEC [Pedersen and Phillips, 2009], AutoBioCad [Rodrigo and Jaramillo, 2013], Opt-
Circuit [Daika and Maranas, 2008], TASBE [Beal et al., 2012a] (includes the Proto
BioCompiler [Beal et al., 2011] and MatchMaker [Yaman et al., 2012]), SBROME [Huynh
et al., 2012,Huynh et al., 2013,Huynh and Tagkopoulos, 2014], iBioSim [Roehner and
Myers, 2014], Parts & Pools [Marchisio, 2014], Cello [Nielsen et al., 2016] (which will
be discussed in detail in Section 2.6), and Phoenix (which will be discussed in detail
in Chapter 3). Table 2.3 briefly summarizes a selection of these approaches.
36
Concept GEC AutoBioCad OptCircuit TASBE SBROME iBioSim P&P Cello Phoenix
Specification
Languages
GEC SBML n/a Proto SBOL SBOL ProMoT Verilog STL




Synthesis Categories ML ML ML ML ML ML TL ML ML
Cost Functions n/a n/a n/a # of TU # of GM n/a # of TF # of GLG # of DNAC
Build
Technology Libraries
DNAC DNAC DNAC DNAC DNAC GLG AGLG GLG DNAC
GM GM GM
Mapping Considerations
CT CT CT CT CT CT CT CT Sim
Sim Sim Sim SM LC SL SM, Tox SM
Sim Sim Sim TT
Computational Approaches
ES MINLP MINLP GS GS DP, B&B ES ES ES
DP, B&B SA SA
MINLP
Table 2.3: Genetic Logic Synthesis Implementations. Key: Specifica-
tion: GEC = Genetic Engineering of Living Cells, P&P = Parts&Pools,
LBS = Langauge for Biochemical Systems, SBML = Systems Biol-
ogy Markup Language, SBOL = Synthetic Biology Open Language,
ProMoT = Process Modeling Tool, STL = Signal Tempoeral Logic.
Design: GM = Genetic Module, TF = Transcription Factor, TU =
Transcriptional Unit. Build: AGLG = Abstract Genetic Logic Gate,
B&B = Branch&Bound, DNAC = DNA Component, DP = Dynamic
Programming, ES = Exhaustive Search, GLG = Genetic Logic Gate,
GS = Greedy Search, LC = Ligation Count, MINLP = Mixed Integer
Nonlinear Programming, SL = Sequence Length, SM = Signal Match-
ing, SA = Simulated Annealing, Tox=Toxicity, TT = Truth Table.
37
2.6 Cello - An Implementation of Genetic Logic Synthesis
Electronic design automation (EDA) was developed to aid engineers in the design
of semiconductor-based electronics. In an effort to accelerate genetic circuit design,
principles from EDA was applied to enable increased circuit complexity and simplify
the incorporation of synthetic gene regulation into genetic engineering projects. This
section will now describe a software implementation for Genetic Logic Synthesis -
Cello (www.cellocad.org), which uses the hardware description language Verilog to
enable a user to describe a circuit function. The user also specifies the sensors, actua-
tors, and “user constraints file” (UCF), which defines the organism, gate technology,
and valid operating conditions. Cello uses this information to automatically design a
DNA sequence encoding the desired circuit. This is done via a set of algorithms that
parse the Verilog text, create the circuit diagram (or circuit topologies), assign gates,
balance constraints to build the DNA, and simulate performance.
Section 2.6.1 gives a high level overview of Cello while Sections 2.6.2– 2.6.8 present
the various implementation details of Cello.
2.6.1 Cello Design Environment - Overview
Verilog is a commonly used hardware description language for electronic system de-
sign [Thomas and Moorby, 2008]. It is hardware-independent, meaning that a circuit
can be described by abstract textual commands and then transformed into different
physical implementations (i.e., chip types). Verilog is often accompanied by a simula-
tion package that aids the evaluation of a design in silico before building the system.
Verilog code has a hierarchical organization centered on modules that communicate
through wires to propagate signals. In this implementation, circuit function can be de-
fined by Case, Assign, or Structural statements within modules (fig. 2·10). The initial
focus of Cello is to create asynchronous combinational logic without feedback. This is
38
Figure 2·6: Overview of Cello.
Figure 2·7: The physical specification for the Eco1C1G1T1 UCF.
39
useful in the design of genetic circuits that can process multiple environmental sensors
in order to choose among different cellular functions. However, Verilog provides the
framework to extend the designs to include more complex circuits, including those
with specified timing and signal strengths as well as analog (Verilog-AMS) functions.
The philosophy behind Cello is to generate circuits for highly specified physical
systems and operating conditions. This paradigm is captured by the UCF, which
specifies (i) the gate technology, including DNA sequences and functional data; (ii)
defined physical locations for the circuit (e.g., plasmid or genomic locus); (iii) the
organism, strain, and genotype; (iv) operating conditions where the circuit design is
valid; (v) architectural rules to constrain the part arrangement; and (vi) preferred
logic motifs to be incorporated during logic synthesis.
The UCF follows the JSON (JavaScript Object Notation) standard [Crockford,
2006], which is both human and machine-readable and is convertible with SBOL
(Synthetic Biology Open Language) [Galdzicki et al., 2014]. The Eco1C1G1T1 UCF
(data file S1) was developed for E. coli (NEB 10-beta) and gate technology based on a
set of 12 prokaryotic repressors [Stanton et al., 2014]. The development of additional
UCFs would enable a circuit design to be transferred to other organisms, conditions,
or gate technologies.
When a user selects a UCF and synthesizes a circuit from Verilog code, the cor-
responding DNA sequence is designed in three steps (Fig. 2·6). First, the textual
commands are converted to a circuit diagram. Algorithms parse the Verilog code
and derive a truth table (fig. 2·10), which is converted to an initial circuit diagram
by the logic synthesis program ABC [Brayton and Mishchenko, 2010] and subse-
quently modified to only contain logic operations for gates available in the UCF
(fig. 2·12). The second step is to assign specific regulators to each gate in the di-
agram. Functionally connecting gates requires that the outputs from the first gate
40
Figure 2·8: Overview of Cello - Block Diagrams
span the input threshold of the second gate (Fig. 2·9B). Because gates based on dif-
ferent regulators have different response functions, not all pairs can be functionally
connected (Fig. 2·9C). Identifying the optimal assignment is an NP-complete prob-
lem (Fig. 2·9D) as discussed in Section 2.3.5. A Monte Carlo simulated annealing
algorithm was implemented to rapidly identify an assignment that produces the de-
sired response (Fig. 2·9E and fig. 2·17). The third step is to create the linear DNA
sequence based on the circuit diagram and gate assignment. The assignment is con-
verted into a set of parts and constraints between the parts (written with the Eugene
language [Oberortner et al., 2014]). The UCF can also include additional constraints
on the genetic architecture—for example, to forbid a particular combination of parts.
A combinatorial design algorithm [Smanski et al., 2014] is used to build a genetic
construct that conforms to the constraints (fig. 2·18). This allows a user to design
multiple constructs containing the same circuit function and genetic constraints while
varying unconstrained design elements to build a library that can be screened.
Cello then simulates the performance of the genetic circuit. When flow cytometry
data are provided in the UCF for the gates, this provides the cell-to-cell variation in
the response for a population of cells. A computational approach was developed to
quantify how population variability propagates from the sensors through the gates to
41
Figure 2·9: Assignment of genetic gates to the circuit diagram.
the output promoters (fig. 2·19). Cello applies a simple algorithm to determine how
signals propagate from the sensors through the gates to the output promoters. This
generates predicted cytometry distribution for all combinations of input states, which
can be directly compared to experiments. Finally, for each gate, the load on the cell
for carrying the gates is estimated on the basis of their impact on growth [percent
reduction of optical density at 600 nm(OD600)] as a function of the activity of the
input promoter [Nielsen et al., 2016]. For any combination of inputs, if the predicted
growth reduction exceeds a threshold, this information can guide multiobjective cir-
cuit optimization or be provided as a warning to the user (Fig. 2·16).
2.6.2 Specification: Verilog hardware description language
A subset of Verilog is synthesizable, meaning the program can be directly mapped to a
physical implementation in hardware. Synthesizable Verilog is transformed to a netlist
(a list of connected primitive gates that can be mapped to a hardware technology),
which is functionally equivalent to the Verilog code. The subset of Verilog used in
Cello is described in this section.
42
Verilog module
Verilog is written using modules, where each module has a name, and the line defining
the module name also requires input definitions and output definitions. The following
box defines a module named “example” with output “x”, and inputs “a” and “b”.
Keywords are shown in blue.
module example(output x, input a, b);
endmodule
Assign Statement
Within a Verilog module, Cello accepts and parses assign statements, case statements,
and structural statements. An assignment provides a concise way to specify a combi-
national logic function. Assign statements use an = operator to set the value of a wire
on the left-hand side based on the wire values and logic operators on the right-hand
side.
module example(output x, input a, b);
assign x = a & b;
endmodule
Additional Verilog operators that can be used in assign statements are: (a & b)
which equals (a AND b), (a |b) which equals (a OR b), and (∼ a) which equals (NOT
a).
The following statement uses multiple operators.
module example(output x, input a, b, c);
assign x = a & b | ~c;
endmodule
The order of operations proceeds from left to right. Parenthesis can be used to specify
a different order of operations to implement a different function.
43
module example(output x, input a, b, c);
assign x = a & (b | ~c);
endmodule
More complex assign statements can use internal wires to carry values within
the module. To use internal wires, the names must be defined, and they must be
assigned (appearing on the left-hand side of the equation) before they can be used
as an operand on the right-hand side. The function above can also be implemented
using internal wires.
module example(output x, input a, b, c);
wire w1, w2;
assign w1 = ~c;
assign w2 = b | w1;
assign x = w1 & w2;
endmodule
Case statement
case statement provides a way to specify a truth table in Verilog. Since all combina-
tional logic functions can be represented as a truth table, the case statement can be
used to specify any combinational logic function as input to Cello. A case statement is
placed within an “always” block. An always block contains a “sensitive list”, meaning
the always block executes the code within the begin/end keywords whenever a value
changes for a member of the sensitive list. The sensitive list below contains in1 and
in2.






The case statement is placed within the begin/end lines within the always block.
The line case(in1,in2) indicates that the argument of the case statement is in1,in2.
In Verilog, the brackets indicate concatenation, meaning the argument for the case
statement is one value that is the concatenation of in1 and in2. If in1 is 0 and in2 is
1, the argument would be 01.







The actual cases within the case statement are specified using a bit-wise numbering
system: 2’b01: out = 1’b0. This individual case executes when the argument is a
2-bit number in binary notation (2’b) equal to 01. When this case executes, the value
0 for a 1-bit number in binary notation (1’b) is assigned to the wire named “out”.
By specifying all combinations of input values as individual cases, a complete truth
table can be specified. The following example specifies the truth table for a 2-input
AND operation.




2'b00: {out} = 1'b0;
2'b01: {out} = 1'b0;
2'b10: {out} = 1'b0;





More complex case statements can be used to specify n-input m-output truth tables,
such as the multiple output truth table for the priority circuit(Figure 2·20).




3'b000: {x,y,z} = 3'b000;
3'b001: {x,y,z} = 3'b001;
3'b010: {x,y,z} = 3'b010;
3'b011: {x,y,z} = 3'b010;
3'b100: {x,y,z} = 3'b100;
3'b101: {x,y,z} = 3'b100;
3'b110: {x,y,z} = 3'b100;




The names of multiple output wires are concatenated within brackets, so the
concatenated value of xyz equals 000 in the first case, equals 001 in the second case,
and so on. Due to concatenation within brackets, the order of names matters in a,b,c
and x,y,z. However, the order of names in the sensitive list does not matter. For
instance:
• always@(a, b, c) is the same as always@(c, b, a)
• x, y, z is different than z, y, x
• case(a, b, c) is different than case(c, b, a)
Structural Statement
Assign statements and case statements are forms of “behavioral” Verilog, meaning
that the function is specified without considering a gate-level schematic. Structural
46
Verilog can be used to directly specify the desired circuit topology using the same
form as a netlist, which specifies a gate type, the output wire name, followed by the
input wire names.
nor (x, a, b);
The above example specifies the function x = a nor b. Note that gate types are
specified in lowercase in Verilog. Each gate can only have a single output, but can
have multiple inputs. Allowed gate types include: not, or, nor, and, nand, xor, xnor,
and buf. For example, the following specifies the function x = a and b and c.
and (x, a, b, c);
To use structural elements within a Verilog module, the above lines just need to
be written within a Verilog module:
module example (output x, input a, b, c);
and (x, a, b, c);
endmodule
Internal wires can be used to build up more complex structural statements. The
next example also implements 3-input AND logic, but uses a combination of four
NOT gates and two NOR gates:
module example (output x, input a, b, c);




nor (w3, w4, w5);
not (w2, w3);
nor (x, w1, w2);
endmodule
47
Even though structural Verilog can be used to specify a wiring diagram, logic
synthesis is used to convert certain primitive gate types might not be available in the
genetic gates library and to minimize number of gates in the circuit, if possible.
Combining Verilog statements
Explanations and examples of Verilog case statements, assign statements, and struc-
tural statements provided above were limited to one type of statement per module.
However, these forms can also be combined in a module to build more complex pro-
grams. An example is provided below that combines the following commands:
Define a module name, input wire names, and output wire names:
module example(output out, input a, b, c);
Initialize the internal wire names that will be required to carry values within the
module:
wire w1, w2, w3, w4;
Assign: Let w1 carry the value of the logical operation a AND c:
assign w1 = a & c;
Assign: Let w2 carry the value of the logical operation (NOT a) AND (NOT c):
assign w2 = ∼a & ∼c;
Structural: Define a NOR gate with output wire w3 and input wires w1 and w2:
nor (w3, w1, w2);
Structural: Define a NOT gate with output wire w4 and input wire w3:
not (w4, w3);
Case: Use a case statement to define a truth table for a 2-input AND function with
inputs w4 and b, and output out. Members of the sensitive list are w4 and b, so the
begin/end block will execute when w4 or b changes value. The argument for the case
statement is the concatenated value of w4 and b. Only set the value of wire out to 1
48
when the concatenated value of w4 and b equals 11. End the Verilog module:
endmodule
module example(output out, input a, b, c);
wire w1, w2, w3, w4;
assign w1 = a & c;
assign w2 = ~a & ~c;





2'b00: {out} = 1'b0;
2'b01: {out} = 1'b0;
2'b10: {out} = 1'b0;




2.6.3 Parsing Verilog to generate a truth table
Section 2.6.2 explained the syntax for writing Verilog code. All combinational logic
functions can be expressed in the form of a truth table, which is the entry point to
logic synthesis. This section describes how the Verilog program is parsed to a naïve
netlist (list of connected gates), and how the naïve netlist is used to generate a truth
table.
The first Verilog line is the module definition, which is parsed to obtain the in-
put names and output name(s). From there, individual assign, structural, and case
statements are parsed from the Verilog file. Each individual statement is converted
to a logic node that can contain a single gate, multiple gates, or a truth table.
49
Assign
A line starting with the assign keyword indicates an assign statement, which is parsed
to a tree data structure in which input wire names are the leaf nodes, the output wire
name is the root node, logic operators (∼, |, &) are the internal nodes, and parentheses
inform the branching. This tree is functionally equivalent to a circuit diagram, which
is used as a logic node with one or more logic gates.
Structural
A line starting with the lowercase name of a gate type (not, nor, or, and, nand, xor,
xnor) indicates a structural statement, which is parsed to a single-gate logic node of
that type, where the first argument indicates the node’s output wire name, and all
subsequent arguments indicate input wire names.
Case
An always block containing a case keyword is parsed to a truth table, where the wire
name within curly brackets (for example, out) is the output wire name of the node,
and the case argument (for example, w4,B) indicates the input wire names. Gate
types are not used in the logic node parsed from a case statement; instead, the truth
table is used to relate output values to the input values of the node. Connecting all
nodes according to the input/output wire names results in a graph that can be used
to propagate logic through each node to calculate the truth table specified by the
input Verilog(Fig. 2·10).
Nested Verilog modules
The example shown in Fig 2·10 used different types of Verilog statements in the same
module. However, Verilog modules can also be nested to form more complex pro-
50
Figure 2·10: Flow of Verilog parsed to a truth table. A Verilog file
is parsed into individual assign, structural, and case statements. Each
statement is converted into a logic node, which can contain one or
more gates, or a truth table. Logic nodes are connected by matching
input/output wire names, and Boolean logic is propagated through each
node to compute the truth table of the circuit output.
grams. In the module hierarchy, the referencing module is called the parent module,
and the referenced modules are called child modules (Figure 2·11). This nesting im-
plements the reuse of previously written modules, which is helpful when scaling up
to larger logic programs.
2.6.4 Logic synthesis
The previous section describes how the Verilog code is parsed to create a truth table.
This section focuses on the next step, which is to convert the truth table to a circuit
diagram. This is a process known as logic synthesis and the approach presented in
this thesis relies on algorithms that are typically applied to electronic circuits, with
additional steps to incorporate constraints that arise from working with a limited set
of genetic gates.
Logic synthesis is performed in several steps (Figure fig:S25). First, the truth table
is converted to a NOR-Inverter Graph (NIG). Second, logic motifs can be swapped for
51
Figure 2·11: Nested Verilog modules for module reuse. The same logic
function shown in Figure S23 is rewritten using a parent module that
references child modules. In the same flow, each statement is converted
to logic nodes, which are used to generate the truth table specified by
the full program. In Cello, the parent and child modules would appear
as a single long file.
52
equivalent subcircuits to reduce circuit size. Logic motifs can be stored and retrieved
from the UCF to biasing the circuit toward particular motifs that are desirable given
the biochemistries of the gates in the library.
To convert a truth table to a NOR-Inverter Graph (NIG), an intermediate step
uses the logic synthesis tool ABC [Brayton and Mishchenko, 2010] to generate an
AND-Inverter Graph (AIG) built exclusively from 2-input AND and NOT gates.
ABC minimizes the number of gates (nodes) and layers (longest path) in the AIG.
The AIG is converted to an NIG containing 2-input NOR and NOT gates. This
conversion is done by replacing (A AND B) with the equivalent (NOT A) NOR
(NOT B) according to DeMorgan’s rule.
As an alternative to ABC, a path to the circuit diagram was developed using
Espresso [Brayton et al., 9845], another commonly used tool for logic synthesis. This
approach differs in that it first converts a truth table to a minimized Product of
Sums (POS), which is then then converted to an NIG. Both the ABC and Espresso
routes are implemented in Cello and the one that produces the circuit diagram with
the minimum number of gates is selected. In approximately 95% of the cases, the
number of gates after the ABC route is less than or equal to the number of gates
after the Espresso route.
The user may have preferred logic motifs that they would like to have in the circuit
diagram, if possible. These could represent optimized combinations of gates (both
ABC and Espresso are not guaranteed to find the global minimum) or motifs that are
particularly robust for a given biochemistry. For example, the UCF developed, has a
list of small 3-input 1-output motifs generated from brute-force enumeration (Meth-
ods). Additionally, this is a simple mechanism to introduce non-NOR logic functions
for which genetic gates may be available. The Eco1C1G1T1 UCF motif library con-
tains: (1) a 2-input OUTPUT_OR motif to replace a NOR-NOT subcircuit at an
53
Figure 2·12: Logic synthesis workflow. The starting point is a truth
table. The AIG is converted to an NIG using DeMorgan’s rule: (A
AND B) equals (NOT A) NOR (NOT B), and removing double NOT
gates. Subcircuits in the initial circuit diagram can be substituted for
user-defined logic motifs specified in the UCF. The black dashed box
highlights one subcircuit from the initial circuit, and the red dashed
box indicates a functionally equivalent motif from the library, which is
substituted into the circuit. This process is done iteratively until no
more substitutions are identified. The logic constraints are determined
by the gate types available in the UCF, and the number of instances of
each gate type in the UCF. In this example, there are a maximum of
12 NOR/NOT gates, and any number of OUTPUT_OR gates.
54
output, (2) a 2-input 1-output optimal XNOR motif, and (3) small 3-input 1-output
NOR/NOT motifs. An attempt to incorporate the user-defined circuit architecture
motifs into circuit diagrams occurs during the final step of logic synthesis. Starting
with an initial NOR/NOT circuit diagram, subcircuits are replaced with a set of
user-defined motifs, if possible. This process has been inspired by the DAG aware
rewriting approach [Mishchenko et al., 2006]. This is performed by the following
steps.
1. First, all 4-feasible cuts are enumerated [Pan and Lin, 1998]. All possible sub-
circuits in the initial circuit diagram with 4 input wires and 1 output wire are
enumerated. This is done by visiting each gate’s output wire, then performing
a breadth-first search on the incoming wires and gates, proceeding until the
circuit inputs are reached. During this search, unique subcircuits are added to
a list.
2. Next, the subcircuit and each-user defined motif is checked for NPN equiva-
lence [Huang et al., 2013]. The truth table for each subcircuit and each user-
defined motif is evaluated. If a subcircuit and a motif have Boolean equivalence
(also checking permuted input wire order), then the motif is substituted in place
of the subcircuit. If multiple subcircuit/motif matches are found, the motif that
reduces the number of circuit gates the most is used.
3. Finally, each time a motif replacement is made, the replacement algorithm is
performed again until no more replacements can be made [Mishchenko et al.,
2006].
Motifs in the library can use gate types other than NOR/NOT, such as AND,
NAND, OR, XOR, or XNOR. To constrain the logic gates according to the number
55
and types of gates available in the genetic gates library, a cost function is used during
subcircuit substitution. The cost is the total number of gates in the circuit that
exceed the gates available in the library. For example, if there are 6 NOR/NOT gates
and 1 AND gate in the library, and the circuit has 7 NOR/NOT gates and 2 AND
gates, the cost would evaluate to (7-6) + (2-1) = 2. If there are enough available
gates in the library to cover the gates in the circuit, the cost is 0. A substitution is
rejected if the cost increases, and is accepted if the cost decreases or does not change.
This cost evaluation guides logic synthesis to produce a circuit that can be covered
by the gates library. However, after subcircuit substitution converges and no more
substitutions are possible, if the cost is still greater than 0, the circuit is reported as
“not synthesizable”.
2.6.5 Response functions and cytometry data for insulated gates
Production of YFP from the insulated gates’ outputs were measured at various in-
ducer concentrations by cytometry and converted to output relative expression units
(RPU) [Nielsen et al., 2016]. For each of these gates, IPTG was used to induce
gate expression from the PTac promoter. Additionally, inducer concentrations were
converted to input promoter activity by measuring expression of YFP from PTac at
those inducer concentrations (Figure 2·13a). The median input and output RPU val-
ues were plotted for each inducer concentration to create the experimental response
function (Figure 2·13b).
These empirical response functions were fit to a Hill equation:




where y is the output promoter activity, x is the input promoter activity, ymax
is the maximum observed promoter output value, ymin is the minimum observed
56
Figure 2·13: Distributions and response functions for insulated gates.
(A) Representative YFP fluorescence histograms for each gate are each
normalized to RPU (see Supplementary Section VI.C). IPTG concen-
trations used were: 0, 5, 10, 20, 30, 40, 50, 70, 100, 150, 200, and 1000
M. (B) The response functions are fit to Equation S1 (black lines).
Error bars are one standard deviation of the median for three experi-
ments performed on different days. Hill equation parameters are given
in Table reftable:S4.
57
Repressor RBS y∗min y
∗
max K
∗ n Toxicity (RPU )#
AmeR F1 0.2 3.8 0.09 1.4 -
AmtR A1 0.06 3.8 0.07 1.6 4.1
BetI E1 0.07 3.8 0.41 2.4 -
BM3R1 B1 0.004 0.5 0.04 3.4 -
B2 0.005 0.5 0.15 2.9 -
B3 0.01 0.8 0.26 3.4 -
HlyIIR H1 0.07 2.5 0.19 2.6 -
IcaRA I1 0.08 2.2 0.10 1.4 1.7
LitR L1 0.07 4.3 0.05 1.7 0.2
LmrA N1 0.2 2.2 0.18 2.1 -
PhlF P1 0.01 3.9 0.03 4.0 -
P2 0.02 4.1 0.13 3.9 -
P3 0.02 6.8 0.23 4.2 -
PsrA R1 0.2 5.9 0.19 1.8 -
QacR Q1 0.01 2.4 0.05 2.7 N.D.
Q2 0.03 2.8 0.21 2.4 1.7
SrpR S1 0.003 1.3 0.01 2.9 -
S2 0.003 2.1 0.04 2.6 -
S3 0.004 2.1 0.06 2.8 -
S4 0.007 2.1 0.10 2.8 -
Table 2.4: Insulated gate response function parameters. * – In units of
RPU. # – Highest input RPU achieved before cell growth was reduced
>20% compared to a control (Methods). Dashes indicate no toxicity
observed at the highest inducer levels. N.D. means no data collected.
58
promoter output value, K is the repression threshold (the input value at which the
output is half maximum), and n is the Hill coefficient. The Hill equation is overlayed
with the experimental response function (Figure 2·13b), and the parameters for the
Hill equation fits are provided in Table 2.4.
2.6.6 Repressor assignment
Section 2.6.4 describes how the circuit diagram is generated. The next step is to
assign genetic regulators to the gates in the diagram. Each gate is based on a unique
biochemistry and thus generates a different response function. The assignment prob-
lem is to identify the optimal way to select and connect these gates to generate the
maximum overall dynamic range for the circuit. This section describes how a par-
ticular repressor assignment is scored. Next, the search algorithm is described that
optimizes the assignment.
One approach to the assignment problem would be to permute all possible com-
binations of gates and identify the one that generates the best circuit. This would
guarantee the identification of the global optimum. However this method becomes
intractable as circuit size and library size grow. The number of unique assignments
(with a single RBS variant per gate) is given by r!/(r-g)!, where g is the number of
gates in the circuit and r is the number of repressors in the library. With the Cello
library of repressors (including RBS variants), a 9-gate circuit has ∼ 1011 permu-
tations. A search algorithm needs to be implemented to scale to larger circuits and
libraries, but often comes with the tradeoff of introducing stochasticity into the search
and can converge on local optima.
59
Figure 2·14: Cello circuit score calculation.
60
Calculating the circuit score
Unlike the score (α) described in Equation 2.3 in Section 2.3, the Cello framework
adopts a much simpler score S. The circuit score S captures how closely the logic
function generated by a repressor assignment matches the desired truth table for
the circuit. Because the output of genetic circuits is not digital, the ON and OFF
states have numerical values and a larger difference between these values (the dynamic
range) is desirable. Calculating S requires two steps. First, the output is calculated
for all combinations of input states. The activity of the sensors feeds into the gates
and their response functions (Equation 2.13) are used to calculate how the signal
propagates through the circuit. Then, S is calculated by comparing the lowest output





An example is shown in Figure 2·14, where there are two sensors and four input
states. Figure 2·14a shows the circuit diagram for an XOR circuit with gate assign-
ments AmtR (blue), IcaRA (magenta), and PhlF (orange). Figure 2·14b shows the
visualization of signal propagation for each of four input states. Colored curves are
gate response functions (Equation 2.13) with the same coloring scheme from (A).
Dashed vertical lines represent promoter input levels for the gate. Dashed horizontal
lines represent promoter output levels. The “+” symbol indicates promoter outputs
from the IcaRA and PhlF gate are summed at the terminal OR gate. Figure 2·14c
shows the predicted output levels for each of the four input combinations. The 0s and
1s at the top of the graph indicate the desired truth table behavior for each output.
The lowest ON state and highest OFF state are marked, and the ratio of these values
is the circuit score, S.
61
Figure 2·15: Toxic circuits and gate measurements. Expression of each
repressor from the tandem promoter with seven IPTG concentrations:
0, 9.5, 19, 47.5, 95, 285, and 950 M was induced; for an additional
five samples, 950 M IPTG was induced along with aTc at concentra-
tions: 0.0095, 0.095, 0.285, 0.95, and 1.9 ng/mL. After a period of
growth, the cultures’ absorbances at 600 nm was measured and the
values were normalized to the uninduced sample. For x-axis values,
YFP was measured from the same tandem promoter at the same in-
ducer concentration and fluorescence was converted to RPU. Error bars
are one standard deviation of absorbance (y-axis) and the median (x-
axis) for three experiments performed on the same day. Plasmids used
were pJS0101-pJS0109.
Calculation of predicted circuit toxicity
For each gate, normalized cell growth is measured as a function of input promoter
activity (Figure 2·15). For a circuit, certain input states can lead to the expression of
multiple repressors and this can lead to toxicity. For each gate in a circuit, the input
RPU is calculated, and the cell growth value is interpolated from the two nearest
experimentally-measured normalized cell growth values from the UCF. The toxicity
of the whole circuit for a particular input combination is calculated as the product of
normalized cell growth for each of the individual gates. There is no theoretical basis
for this; rather, it was chosen to strongly bias against circuits where any repressors
are expressed beyond their empirical toxicity threshold. After the toxicities of all the
input states are calculated, the toxicity of the circuit as a whole (“growth score”) is
62
Figure 2·16: Tradeoff between circuit score and predicted cell growth.
For the Majority circuit, each assignment has a circuit score (S) and a
growth score (represented as a point on the scatter plot). The Pareto
frontier is shown as a red line. A threshold is defined to eliminate toxic
assignments from consideration (shaded region in center plot). Left (as-
signment highlighted yellow in center plot): Prediction of assignment
with high S but toxic expression of IcaRA. Assigned gates: P2-PhlF,
H2-HlyIIR, A2-AmtR, B3-BM3R1, I1-IcaRA, S4-SrpR. Right (assign-
ment highlighted yellow in center plot): Prediction of assignment with
normal cell growth but low S. Assigned gates: B3-BM3R1, F1-AmeR,
S4-SrpR, A2-AmtR, P2-PhlF, H2-HlyIIR.
taken as the worst input state.
As shown in Figure 2·16 for the Majority circuit, there is a trade-off between the
circuits with the highest circuit score (S) and those that are at risk of reducing growth,
creating a Pareto-optimal curve. The current algorithm applies a cutoff (0.75) with
respect to the growth score and only allows circuit assignments that fall above the
cutoff.
Simulated Annealing Assignment Algorithm
The goal of repressor assignment is to find the combination of gates that maximizes the
circuit score, S (Equation 2.14). The repressor assignment problem has a large discrete
search space for which a Monte Carlo simulated annealing algorithm [Metropolis
63
Figure 2·17: Simulated annealing search algorithm for repressor as-
signment. Each plot shows 100 trajectories, and as the number of steps
increases for each trajectory, the highest S up to that point is plot-
ted (black lines). The temperature factor annealed according to the
schedule listed above.
and Ulam, 1949] was implemented to identify an optimum assignment. The search
initializes with gates from the library being randomly chosen and assigned to a gate
in the circuit. Any gate can be assigned to any position in the circuit. Each iteration
of the Monte Carlo algorithm swaps the assignments of two gates. This is done by
randomly selecting one gate in the circuit, randomly selecting a second gate either in
the circuit or in the gate library, and then performing the swap. After the swap, the
circuit score for the new assignment S’ is calculated and the move is accepted with a




After calculating the probability, a random number R between 0 and 1 is gener-
ated: if R < P , the swap is accepted, and if R > P , the swap is rejected. If the swap
improved S, then P > 1, and the move is always accepted. After the first assignment is





where i is the current iteration, Tmax is the starting temperature, and C is a
constant that determines the rate of cooling. After reaching the end of annealing, the
run continues at T = 0 until 10,000 steps progress with no additional improvement.
The simulated annealing results in Figure 2·17 show convergent solutions for the
circuits ranging from 5 to 9 gates, where Tmax = 100, and C is 5× 10−5.
Several modifications were made to the basic algorithm described above to allow
for additional constraints that do not appear in the S calculation. Some gates have
multiple RBS options, but a repressor cannot be used more than once in a circuit. To
prevent illegal swaps that reuse a repressor, a list of gates that can be legally swapped
is generated. Gates with the same repressor as the selected gate are allowed in the
list, because this swap simply replaces the RBS for the gate. Gates with a different
repressor are only allowed in the list if another gate from the circuit does not use the
same repressor. To avoid repressor reuse, gate group names are also specified in the
UCF. The group name will typically be the repressor name, but different repressors
can also be grouped if they exhibit cross-talk.
Additional constraints can be applied to reject assignments that would otherwise
be accepted based on S. For example, assignments whose growth score is below a
threshold or when two “roadblocking” promoters have to be connected as inputs to
a gate [Nielsen et al., 2016], are rejected in this implementation.
2.6.7 Combinatorial design of circuit layouts
After a gate assignment has been found for a circuit diagram, a linear DNA sequence
that contains the complete circuit is generated. This is done using combinatorial
design [Oberortner et al., 2014,Smanski et al., 2014], which has been applied to build
DNA sequences using a set of parts, constraints between parts, and organizational
rules. The assignment algorithm leads to a list of parts in the circuit as well as
65
constraints between parts (e.g., due to roadblocking). The UCF can also contain
additional organization rules, such that the repressors have to appear in a specified
order and orientation. After the assignment algorithm, the parts and rules for a circuit
are automatically used to build a Eugene file. From this Eugene file, combinatorial
algorithms described previously [Oberortner et al., 2014, Smanski et al., 2014] are
used to build the DNA sequence, which is the output of Cello. The Eugene file itself
is also an output of Cello so that it can be run at a later time to generate additional
constructs (https://eugene.cidarlab.org/).
One of the advantages of using combinatorial design is that many constructs can
be built that preserve the same underlying rules–and therefore produce the same
circuit function–but unconstrained aspects of the design are allowed to vary. Before
the user runs Cello, an option is available to specify the number of desired constructs.
Building and testing a library of designs instead of a single design can help identify
a functional variant. Additionally, identifying failed and successful designs provides
a data set for learning new organizational rules [Smanski et al., 2014]. An example
of this are the Majority circuits, where multiple constructs are shown.
This section describes how gates and their component parts are organized in Eu-
gene as well as the impact of adding organizational constraints to the UCF. A hier-
archical design is used to describe circuits in Eugene (Level 1: Parts, Level 2: Gates,
and Level 3: Circuit).
In Level 1, part types are defined, and individual parts with those types are
defined. Parts in Eugene require a type and a name, while other attributes such as
DNA sequence can be added optionally. Below are examples of part definitions for a
promoter, ribozyme, RBS, CDS and terminator that make up a gate.
66












In Level 2, parts are assembled into gate devices (a device is defined as a collection
of parts). The gate device contains the ribozyme insulator, RBS (sometimes multiple
variants are allowed), repressor, and terminator. For a NOR gate, there can be two
additional undefined promoters. For example, the device for the PhlF gate (with the
P3 RBS) is as follows.
//Level 2: Gate device
Device PhlF_device(
Promoter, Promoter, RiboJ53, P3, PhlF, ECK120033737
);
Next, a set of rules have been defined that act on the P3_PhlF device. These rules
define the promoters that drive PhlF according to the circuit diagram, and an enforced
order of those promoters (e.g., to avoid roadblocking). The ALL_FORWARD rule
just orients all parts in the forward direction (this gate will be allowed in the reverse
direction in a later step). Note that rules on different lines must be joined with the
AND keyword.
67





pBM3R1 BEFORE pHlyIIR AND
ALL_FORWARD
);
Given the devices and rules for each gate, an enumeration of variants for each
device is performed using the ‘product’ function. The PhlF device shown above only
allows a single device variant.







In Level 3, gate device variants will be combined into the circuit device. First,
the circuit device and each gate device must be initialized, where each gate is named
by the repressor to allow rules from the UCF to be applied according to that name.









Rules are applied to the circuit device before enumerating circuit variants. The
EXACTLY 1 counting rule ensures that each gate appears once and only once in the
circuit. Additional rules can also be specified, for example requiring the PhlF gate
to be in the first position and in the forward orientation, and requiring each gate to
alternate orientation, as follows.
//Level 3: Circuit device rules
Rule circuit_rules
(ON circuit:
gate_PhlF EXACTLY 1 AND
gate_SrpR EXACTLY 1 AND
gate_BM3R1 EXACTLY 1 AND
gate_HlyIIR EXACTLY 1 AND
gate_BetI EXACTLY 1 AND





Now that the circuit rules have been specified, the combinatorial design step can
be performed. Nested for-loops iterate through all gate device variants from Level
2 (there might only be a single variant for each gate, or there might be variants
with different promoter orders). In the innermost loop, the current set of gate device
variants is used to build the circuit device in Level 3 in the ‘permute’ function. Each
set of designs in the inner loop is appended to an array called ‘allResults’.
69
//Level 3: Design of circuit variants
Array allResults;
for(num i0=0; i0<sizeof(PhlF_devices); i0=i0+1) {
for(num i1=0; i1<sizeof(BetI_devices); i1=i1+1) {
for(num i2=0; i2<sizeof(SrpR_devices); i2=i2+1) {
for(num i3=0; i3<sizeof(HlyIIR_devices); i3=i3+1) {
for(num i4=0; i4<sizeof(AmtR_devices); i4=i4+1) {
















allResults = allResults + result;
}}}}}}
Figure 2·18 explains how Eugene rules specify the design space of allowed circuit
layouts. At the gate device level (Level 2), the only degree of freedom is promoter
order within each gate. NOT gates can only have 1 variant. NOR gates can have
two variants, where either promoter order is allowed. This degree of freedom al-
lows 2N variants, where N is the number of 2-input gates. Enforcing roadblocking
70
rules (STARTSWITH) constrains promoter order, but for tandem non-roadblocking
promoters, either promoter order is still allowed (Figure 2·18, Gate devices).
In addition to promoter order, gate order/orientation is the other degree of free-
dom. At the circuit device level (Level 3), if the repressor order is constrained, and
all gates are in the forward orientation (as specified in Eco1C1G1T1), then the only
degree of freedom is promoter order from Level 2. In the example circuit assignment,
repressor order and all forward rules result in 4 solutions (Figure 2·18, Panel 1).
Additional variants can be generated by removing order/orientation rules. For
example, removing the FORWARD gate_PhlF rule allows the reverse orientation of
the PhlF gate and results in 8 solutions ((Figure 2·18, Panel 2). Removing a second
rule, gate_PhlF BEFORE gate_BetI, now allows the PhlF gate in any position and
results in 40 solutions ((Figure 2·18, Panel 3). Removing all remaining gate order
rules (a BEFORE b) allows unconstrained shuffling of gate order and results in 960
solutions ((Figure 2·18, Panel 4). Removing all remaining FORWARD rules allows
all gate orders and orientations and results in 15,360 solutions ((Figure 2·18, Panel
5). The Eugene rules specified in the UCF include roadblocking rules in Level 2 gate
devices, and repressor order and all forward rules in the Level 3 circuit device. As
described above, these rules can be removed to unconstrain the layout design space.
Furthermore, any additional rules from the Eugene language [Oberortner et al., 2014]
can be added to the UCF for user-specified constraints to the design space. The
number of desired variants can be specified in the Options tab of the Cello web
application. After layout design using Eugene, the Cello output has three forms. The
first output is a file containing an ordered list of part names and part orientations
(+, -) for each variant. The second output is a file containing an ordered list of gate
names and gate orientations for each variant. The third output is a separate plasmid
file for each variant in which the circuit module is inserted into the specified genetic
71
location.
Figure 2·18: Example sets of Eugene rules for the Majority circuit.
72
2.6.8 Predictions of circuit performance
Qualitative predictions of the circuit output distributions can be computed for each
input combination. This is performed as a final step in Cello, after the gate assignment
search has converged. In order to perform this step, each gate in the circuit must
have experimental cytometry distributions added to the UCF, with fluorescence values
converted to RPU [Nielsen et al., 2016].
As a first step, the experimentally-measured gate output RPU histograms are
normalized to have a total of 10,000 events in evenly log-spaced bins (one bin ev-
ery 100.024x RPU). At least 8 experimentally-measured output RPU histograms at
various input RPU levels are required (the gates in this work use 12 input levels).
Histograms Y(x) are generated at intermediate input levels x by positioning their
medians, <Y(x)>, on the gate’s response function (Equation 2.17, with parameters
fit). Once the medians are in place, the counts f for each bin are interpolated from the
counts (fL and fR) of the experimentally-measured histograms that lie to either side
of x (Equation 2.18). The experimentally measured histograms have input values, xL
and xR. The parameter m is the bin relative to the median, y/<Y>.








In this way, histograms at intermediate inputs are generated with medians on the
response function, and shapes interpolated from the nearest experimental histograms.
Once all the gate distribution response functions are computed, the qualitative predic-
tions for output distributions can be computed. For a particular input combination,
sensor values feed into the first layer of gate distribution response functions (dashed
73
vertical lines, Figure 2·19). This input value takes a vertical “slice” of the distribution
response function to create the output histogram. Next, those gate output histograms
become input histograms for the second layer of gates.
Input histograms can be viewed as 10,000 individual input events, each of which
produces its own output histogram—all of which are averaged to produce a histogram
containing 10,000 events. At NOR and OR gates, input histograms are combined by
first summing the histogram medians (or summing the histogram median and the
sensor input value if a sensor promoter is an input), then shifting both histograms to
be centered at that new median, and then averaging the counts in each bin to create
a histogram with 10,000 events.
Sensor input signals are propagated through gate distribution response functions
in the circuit until the output histograms are produced for each input row in the truth
table (Figure 2·19c).
2.7 Experimental Validation
Cello was applied to the design of 60 circuits for Escherichia coli, where the circuit
function was specified using Verilog code and transformed to a DNA sequence. The
DNA sequences were built as specified with no additional tuning, requiring 880,000
base pairs of DNA assembly. Of these, 45 circuits performed correctly in every output
state (up to 10 regulators and 55 parts). Across all circuits, 92% of the 412 output
states functioned as predicted.
This section highlights 52 circuits (Figure 2·20 out of the 60 that were designed
using Cello and built using insulated gates [Nielsen et al., 2016]. These circuits include
a Priority Detector (that prioritizes the inputs and selects which output is ON based
on the highest-priority input that is ON), well-known functions (e.g., a multiplexer),
and logic underlying cellular automaton pattern formation (e.g., “Rule 30”) [Wolfram,
74
Figure 2·19: Circuit distribution calculation.
75
2002]. Additional three input, one-output logic circuits were built that demonstrate
the ability to integrate inputs in different ways. These circuits could be used to turn
a cellular function on or off in response to an environment defined by multiple signals.
Each of the 52 circuits was specified either by using behavioral Verilog (with Cello
performing the logic minimization step) or by performing a separate enumeration to
identify the global minimum number of gates and specifying the circuit diagram using
structural Verilog. Subsequently, the global minimum three-input logic gates were
included in the UCF so that they could be incorporated as motifs in larger circuits
in future designs (fig. 2·12). For each circuit, the sensor promoters and ON/OFF
values were specified, the Eco1C1G1T1 UCF was selected, and a DNA sequence was
automatically generated by Cello. DNA synthesis [Kosuri and Church, 2014] and
assembly were used to build each sequence; sequences contained up to 10 regulators
and 55 parts. The output states of each circuit were measured by flow cytometry and
compared with the Cello predictions. No additional tuning was done to diverge from
the Cello-predicted sequence.
Of these 52 circuits, 37 functioned as predicted, such that all of the output states
matched desired ON and OFF levels (Figure 2·20). Further, the predicted cytometry
distributions closely matched those measured experimentally. Of 412 output states
across all circuits that were built, 92% were correct. The Consensus circuit (output is
ON only when all three inputs agree) is the largest, containing 10 regulatory proteins
(seven repressors from NOT/NOR gates and three from the inducible systems) and
55 genetic parts. Two of the circuits with four layers (0x3D and 0x8E) were selected
to characterize the switching dynamics between states (Figure 2·23) Interestingly,
0x8E shows a transient incorrect state, known as a “fault” in electronics, when the
inputs are changed from −/− /− to +/− /+. This is consistent with the last NOR gate
transiently receiving ON/OFF inputs until one of the signals transits two layers.
76
Of the 52 circuits, seven were incorrect in one state, two were incorrect in two
states, and five had ≥3 failures (figs. 2·24 – 2·26). As more gates were included in
a design, there was a higher probability of failure (Fig. 2·27A). Two circuits were
found to cause a growth defect (Fig. 2·27B). The circuits that failed in a few states
tended to match the remaining states closely, so the initial design could be used as
the basis for further rounds of optimization. To this end, debugging experiments
were performed to determine which gates failed. This was done by creating a set
of plasmids that contained each gate’s output promoter fused to yellow fluorescent
protein (YFP). These plasmids were transformed with the circuit in lieu of the output
plasmid, and the response of the internal gate was measured for all combinations of
inputs. An example of this is shown in Fig. 2·27C. From this analysis, most of
the circuit failures point to unexpected behavior from the anhydrotetracycline (aTc)
sensor (seven circuits) or AmtR gate (two circuits).
Screening design variants has the potential to increase the probability of success,
particularly for larger circuits. To do this, Cello outputs a Eugene file that contains
architectural rules from the UCF as well as constraints to enforce the circuit diagram
and repressor assignments (fig. 2·18). The user can specify the size of the library, and
a combinatorial design algorithm [Smanski et al., 2014] generates the target number
of constructs. Although all of the systems should be functionally equivalent, subtle
changes in their composition may affect circuit function through hidden effects (e.g.,
transcriptional read-through or promoter interference). This approach was tested
by designing a Majority circuit (Fig. 2·27D), whose output is ON when a majority
of its inputs are ON. A small library of six constructs that maintained the same
circuit diagram and repressor assignments, but in which the order and orientation of
genes was allowed to vary (Fig. 2·27E), was built. Several of these circuits functioned
correctly; the response of the best of these is shown in Fig. 2·27F.
77
Figure 2·20: Automated design of circuits by Cello.
78
Figure 2·21: Automated design of circuits by Cello.
79
2.8 Discussion
This chapter has detailed some of the unique challenges facing genetic logic synthesis
and identified potential approaches to meet these challenges. The approach presented
has explicitly separated functional specification from physical implementation. This
separation will allow for this framework to persist as genetic logic technologies advance
[Brophy and Voigt, 2014].
This chapter has also presented a novel implementation that addresses the issue
of design of synthetic regulatory networks being dominated by manual trial-and-error
tinkering at the nucleotide level. Cello automates the selection and concatenation of
parts and balancing the associated constraints. By doing this, it enables more rapid
design of larger multipart systems; the circuits that were presented in this thesis are
larger and more complex than most that have been built by hand. Of 60 circuits
designed automatically, 45 functioned as designed. The DNA sequence output rep-
resents a testable prediction that either validates the underlying theory or reveals
failure modes that can be addressed in the gate design. Previous experiments in
which repressors were used to build synthetic logic gates showed that this often led to
nonsensical functions that could not be predicted from the known interactions [Cox
et al., 2007]. Quantifying why predictions fail, where systems break, and how the
host evolves provides failure modes that can be addressed through engineering. This
chapter showed how iterative co-development of robust gates and software can con-
verge on genetic systems that are highly repetitive and modular, in stark contrast to
the encoding of natural networks.
80
Figure 2·22: Automated design of circuits by Cello.
81
Figure 2·23: Automated design of circuits by Cello. (A) All circuits
designed by Cello that had correct output states are shown along with
their genetic schematics, output predictions, and experimental mea-
surements. The inputs A, B, and C correspond to the PT ac, PT et, and
PBAD sensor promoter activities; their corresponding regulators (LacI,
TetR, and AraC*) are not shown in the schematics. The outputs (X, Y,
and Z) correspond to YFP driven from output promoters in separate
experiments. Solid black distributions are experimental data; blue and
red line distributions are computational predictions from Cello. (B)
Time-course data are included for two circuits.
82
Figure 2·24: Circuits with 1 failed output state.
83
Figure 2·25: Circuits with 2 failed output states.
Figure 2·26: Circuits with 3 or more failed output states.
84
Figure 2·27: Analysis of circuit failures and the design of multiple con-
structs by combinatorial design. (A) For the library of 60 circuits, the
fraction of correct states (black) and the fraction of fully correct circuits
(gray) are shown versus the number of repressors in the circuit. (B)
Impact on cell growth for the two circuits that failed because of toxicity.
The control bar is for cells containing the RPU standard plasmid only.
Data are average values from three experiments performed on differ-
ent days; error bars denote SD. (C) An example of circuit debugging.
All combinations of inputs for all wires were tested; for clarity, only a
subset of debugging for the failed state (+/+/+) is shown. The data
are normalized to [0, 1] to correct for the dynamic range across gates.
In this case, the failure originated when the AmtR gate produced an
intermediate response that then propagated through the circuit. (D)
The circuit diagram for a Majority circuit. Colors correspond to repres-
sors. (E) Six layouts for this circuit were designed that maintain the
same repressor assignments but allow the order and orientation of the
gates to vary. The circuit score (S) is defined as the ratio of the lowest
ON state’s median to the highest OFF state’s median. Error bars are
one SD for two experiments performed on different days. The dashed
line marks the lowest circuit score in the library. (F) Predictions and
cytometry distributions for the final design. The format and inducer




Automated Design of Genetic Systems with
Temporal Verification
3.1 Background
Synthetic biologists create new genetic circuits in informal, iterative design-build-test
cycles. As such, the field currently relies on ad-hoc methods, sometimes aided by
simulations and mathematical modeling. While recent advances have led to soft-
ware tools for computationally designing functional genetic circuits for Boolean logic
(described in Chapter 2), this is only one type of functionality that is desired of
genetic circuits. Biologists are often concerned with how their systems behave spa-
tially and temporally. Furthermore, most existing tools are primarily focused on the
steady-state function of the circuit and fail to capture performance specifications.
The ability to formally describe genetic circuit behavior over time with performance
parameters represents a powerful move towards building more complex and robust
genetic circuits.
This chapter presents a framework for Genetic Temporal Logic Synthesis (GTLS)
which incorporates the fundamentals of systems engineering and formal methods into
the design-build-test cycle of synthetic biology. This framework enables biologists to
specify precise temporal behaviors for biological systems and provides a method to
quantify the degree of satisfaction of system behavior against a performance specifi-
cation. The goal of the framework is to design “genetic systems” - which are genetic
86
circuits along with configurations that tell a biologist the conditions required for a
genetic circuit to satisfy a performance specification.
3.2 Performance Specifications for Genetic Systems
A formal specification is a mathematically unambiguous description of system [Baier
and Katoen, 2008]. Specifications are often used to describe various aspects of a
system, for instance, structural constraints, desired behavior, undesired behavior,
etc. While structural and functional specifications have been used to describe desired
behavior in previous efforts, performance specifications offer an additional richness to
describe very specific expectations of system behavior. Temporal logic is often used
in systems engineering to describe desired behaviors of complex systems. The use of
temporal logic to specify time-dependant behaviors of genetic circuits is a powerful
step forward towards building robust genetic systems that can be used in various
applications such as pattern formation and amorphous computing.
3.2.1 Signal Temporal Logic
Using performance specification to describe the behavior of a genetic circuit is a
novel approach used in the GTLS framework. It is important to note that a perfor-
mance specification differs from a functional specification. A functional specification
is a formal description that describes the intended behavior, capabilities, and interac-
tions of a system. A performance specification however formally describes a detailed
and specific requirement for an intended behavior of a system (throughput, latency,
etc.), as well as conditions a system must meet to satisfy a desired behavior. For
instance, a functional specification for a genetic circuit might specify that a genetic
circuit must produce GFP. However, a performance specification would be more spe-
87
cific: the genetic circuit must produce at-least 100 MEFLs of GFP within 60 minutes.
The main goal of the performance specification in GTLS is to capture both the tempo-
ral constraints as well as real-valued thresholds of various signals in a genetic circuit.
Signal Temporal Logic (STL) is a formalism used to specify dense-time real-valued
signals [Maler and Nickovic, 2004, Donzé et al., 2012]. As a specification language
often used to describe continuous-time systems, STL allows users to specify func-
tionally rich behaviors of genetic circuits [Vaidyanathan et al., 2017]. This section
describes the syntax and semantics of STL.
Notation
Let R,R≥0, N denote the set of real, non-negative real, and natural numbers, re-
spectively. |r | is used to denote the absolute value of r ∈ R. Given x ∈ Rn, ‖x‖∞ :=
maxi∈{1,··· ,n} |xi | - where xi is the i’th component of x - is its infinity-norm. A scalar-
valued function f : Rn → R is rectangular if for some i ∈ {1, · · · ,n}, f (x) = xi . A
metric space is an ordered pair (M,d), where M is a set and d :M×M→ R≥0 is a
distance function such that i) d(x,y) = 0⇔ x = y; ii) d(x,y) = d(y,x),∀x,y ∈ M; iii)
d(x,z) ≤ d(x,y) + d(y,z),∀x,y,z ∈ M. If (M,d1) and (M,d2) are metric spaces, then
(M,λd1 + (1−λ)d2), is also a metric space for any 0 ≤ λ ≤ 1.
This thesis uses discrete notion of time. Time intervals in the form I = [t1, t2] ⊂N,
t1, t2 ∈N, t1 ≤ t2, are interpreted as {t1, t1+1, · · · , t2}. [τ+t1, τ+t2] is denoted by τ+I ,
τ ∈N. The continuous interval {r |0 ≤ r ≤ 1} is denoted by U. An n-dimensional, real,
infinite-time, discrete-time signal s is defined as a string of real values s : s0s1s2 · · · ,
where st ∈ S,S ⊂ Rn, t ∈ N. The suffix of s at t, denoted by s[t], is a signal such
that s[t]τ = st+τ for all t,τ ∈N. s[t1, t2] := st1st1+1 · · ·st2 is used to refer to a particular
portion of a signal. The set of all signals with values taken in S is denoted by S . The
set of all signal prefixes with time bound T is defined as ST := {s[0 : T ] | s ∈ S} . For
88
the convenience of notation, s ∈ ST is used to say s[0 : T ] ∈ ST . The distance between
two signals s, s′ ∈ ST is defined as d(s, s′) := supt∈[0,T ] {‖st − s′t‖∞} .
Signal Temporal Logic
The syntax of STL is recursively defined as follows [Maler and Nickovic, 2004]:
φ ::=> | π | ¬φ | φ1 ∧φ2 | φ1UIφ2 ,
where > is the Boolean true constant; π is a predicate over Rn in the form of f (x) ∼ µ,
f : S→R, µ ∈R, and ∼∈ {≤,≥}; ¬ is the Boolean operator for negation; the Boolean
operator for conjunction is denoted by ∧ or &&; and UI is the temporal operator
until over bounded interval I . A predicate f (s) ∼ µ is rectangular if f is rectangular.
Other Boolean operations are defined in the usual way (for instance, the Boolean
operator for disjunction is denoted by ∨ or ||). Additional temporal operators include
eventually and globally.
Eventually (denoted by ♦ or F) is defined as ♦Iφ ≡ >UIφ, where I is an interval.
In other words, ♦Iφ is said to be true if sometime in the interval I , φ is true at-least
once.
Globally (denoted by  or G) is defined as Iφ ≡ ¬♦I¬φ, where I is an interval.
In other words, Iφ is true only if in the time interval I , φ is always true.
The set of all STL formulae over signals in S is denoted by ΦS .
The GTLS framework also uses G and F symbols to denote Globally and Even-
tually, respectively. This thesis also uses || and && to denote disjunction and con-
junction, respectively. It should also be noted that while STL does not place any
restriction on the format of a free variable that denotes a signal, in the implemen-
tation of the GTLS framework (Phoenix), all free variables must always have the
format:^out[0-9]+$ i.e. each free variable must start with the sequence out followed
89
by a sequence of numbers ranging from 0 to 9. The unit of time (specified in the
time intervals) is always minutes and the unit of any genetic circuit signal is always
Molecules of Equivalent Fluorescein (MEFL).
Robustness
(a) STL Specification: [0,6]x ≥ 2 (b) Robustness = -1
(c) Robustness = 1 (d) Robustness = 2
Figure 3·1: This figure shows the robustness of 3 signals against an
STL specification: [0,6]x ≥ 2. It is clear that Signal 0 in Fig. 3·1b does
not satisfy the STL formula, while Signal 1 and 2 (Fig. 3·1c and 3·1d
satisfy the STL formula. However, this also shows that Signal 2 satisfies
the STL specification better than Signal 1.
The STL score, also known as robustness degree is a function ρ : S ×ΦS ×N→R,
90
which is recursively defined as [Maler and Nickovic, 2004]:
ρ(s, (f (s) ∼ µ), t) =

µ− f (st) ∼=≤
f (st)−µ ∼=≥
,
ρ(s,¬φ,t) = −ρ(s,φ, t),
ρ(s,φ1 ∨φ2, t) = max(ρ(s,φ1, t),ρ(s,φ2, t)),
ρ(s,φ1 ∧φ2, t) = min(ρ(s,φ1, t),ρ(s,φ2, t)),









ρ(s,♦I φ,t) = max
t′∈t+I
ρ(s,φ, t′),




As one can inspect from (3.1), a signal satisfies an STL specification at a certain time
if and only if its corresponding STL score is positive: s[t] |= φ⇔ ρ(s,φ, t) > 0, where
|= is read as “satisfies”. The case of ρ = 0 is usually left ambiguous - this is never a
concern in practice due to issues with numerical precision. This framework considers
ρ = 0 as satisfaction, but by doing so, the principle of contradiction is sacrificed:
s[t] |= φ and s[t] |= ¬φ if ρ(s,φ, t) = 0.
The horizon of an STL formula is defined as the minimum length of the time




∥∥∥φ∥∥∥ = ∥∥∥¬φ∥∥∥∥∥∥φ1 ∧φ2∥∥∥ = ∥∥∥φ1 ∨φ2∥∥∥ =max{∥∥∥φ1∥∥∥ ,∥∥∥φ2∥∥∥}∥∥∥φ1U[t1,t2]φ2∥∥∥ = t2 +max{∥∥∥φ1∥∥∥ ,∥∥∥φ2∥∥∥}∥∥∥♦[t1,t2]φ∥∥∥ = ∥∥∥[t1,t2]φ∥∥∥ = t2 + ∥∥∥φ∥∥∥
(3.2)
The set of all STL formulae over signals in S such that their horizons are less than
T is denoted by ΦST . Note that computing ρ(s,φ, t) requires s[t : t +
∥∥∥φ∥∥∥], and the
rest of the values are irrelevant.




s ∈ ST | ρ(s,φ,0) ≥ 0
}
. (3.3)
Note that L(φ) ⊂ Rn(T+1). When the predicates are rectangular, the bounded-time
language becomes a finite union of hyper-rectangles in Rn(T+1).
Example 1 Let S =U. Consider the following six STL formulae in ΦS20:
φ1 = [0,20]θ1,φ2 = [0,20]θ2,
φ3 = ♦[0,20]θ1,φ4 = [0,20]θ1 ∧♦[0,20]θ2,
φ5 = ([0,10]θ1)∧ ([12,20]θ2),φ6 = [0,16]♦[0,4]θ1,
(3.4)
where θ1 = (x ≥ 0.2) ∧ (x ≤ 0.4), and θ2 = (x ≥ 0.2) ∧ (x ≤ 0.44).
∥∥∥φi∥∥∥ = 20, i =
1, · · · ,6. Two examples of bounded-time languages are: L(φ2) =
⋂20
τ=0{0.2 ≤ xt ≤
0.44},L(φ3) =
⋃20
τ=0{0.2 ≤ xt ≤ 0.4}. Consider two constant signals s1 and s2, where
s1t = 0.3, s
2
t = t/20, t = 0,1, · · · ,20. The STL scores are computed from (3.1). For
instance, ρ(s1,φ1,0) = 0.1, ρ(s2,φ1,0) = −0.6 (minimizer at t = 20), and ρ(s2,φ3,0) =
0.1 (maximizer at t = 6).
92
Example 2 This performance specification states that between the time intervals 30
minutes and 120 minutes, the output signal must always be greater than 500 MEFL
and less than or equal to 1000 MEFL.
G[30,120] (out0 <= 1000 && out0 > 500)
Example 3 This performance specification states that between the time intervals 30
minutes and 120 minutes, the output signal must greater than 500 MEFL and less
than or equal to 1000 MEFL at-least once.
F[30,120] (out0 <= 1000 && out0 > 500)
Example 4 This performance specification states that between the time intervals 30
minutes and 60 minutes, the output signal must greater than 500 MEFL and less
than or equal to 1000 MEFL and must remain between 500 and 1000 MEFL for 60
minutes.




Figure 3·2: Visual Representation of Example 2. Fig. 3·2a shows the
region of satisfaction of the STL formula specified in Example 2. The
area shaded in blue implies that a signal must always lie within the
region for the time bound that it covers. The trace shown in Fig. 3·2b
does not satisfy the STL specification since it between time 30 and
120 minutes, it is never less than or equal to 1000 MEFL. Although
the trace in Fig. 3·2c lies between 500 and 1000 MEFL from 30 to 80
mins, it does not satisfy the specification since it is greater than 1000
MEFL from time 80 to 120 mins. The trace in Fig. 3·2d satisfies the
specification since it always lies between 500 and 1000 MEFL from time
30 to 120 mins. Hence the trace in Fig. 3·2d has positive robustness
while the other two traces have negative robustness. The black dotted




Figure 3·3: Visual Representation of Example 3. Fig. 3·3a shows the
region of satisfaction of the STL formula specified in Example 3. The
area in green implies that as long as a trace comes within the green
region at-least once, the specification is satisfied. In this example, the
trace in Fig. 3·3b does not satisfy the specification while the traces in




Figure 3·4: Visual Representation of Example 4. Fig. 3·3a shows the
region of satisfaction of the STL formula specified in Example 4. In this
example, The blue box can be shifted left or right between time 30 and
120 mins. As long as a trace lies between 500 and 1000 MEFL for at-
least 60 mins between time 30 and 120 mins, it satisfies the specification.
Fig. 3·4d shows a trace that satisfies the specification while the traces
in Fig. 3·4b and Fig. 3·4c do not satisfy the specification.
96
3.3 Framework for Genetic Temporal Logic Synthesis
This section formally defines the terms and concepts required to formulate a frame-
work for Genetic Temporal Logic Synthesis (GTLS). The approach used to define
GTLS is similar to the one used to formulate the Genetic Logic Synthesis framework
specified in Section 2.3. In fact, many of the terms in this framework have been
derived from the GLS framework.
3.3.1 Terms
Synthetic Biology
• DNA Component (also referred to as a genetic part) - is an ordered set of nu-
cleotides that has a specific role during transcription or translation. A DNA
Component is typically represented by a finite string of characters (p1,p2,p3,. . .,pn),
where pi ∈ {A,T ,G,C} [Vaidyanathan et al., 2015]. A, T , G, and C represent the
nucleotides adenine, thymine, guanine, and cytosine, respectively. Promoters,
Ribosome Binding Sites (RBS), Coding Sequences (CDS), and Terminators are
examples of DNA components.
• Composite DNA Component (also referred to as a composite part) - is an
ordered set of two or more DNA components. Typically Composite DNA com-
ponents are DNA components whose role during transcription or translation
can be characterized or modelled together. In this framework, Promoters and
RBSs are clubbed together (represented as Promoter-RBS), and characterized
as a single unit.
• Biochemical Component - is a chemical compound or molecule that not a DNA
component but plays a specific role during transcription, translation, or any
biological interaction. Small molecules, Proteins, and RNA are examples of
97
Biochemical components.
• Genetic Module - is a set of DNA components and Composite DNA components
and any of their encoded or associated biochemical components (such as RNA,
protein, and small molecules) that interact to perform some biological function
[Vaidyanathan et al., 2015].
• Transcriptional Unit - is a partially ordered set of DNA components and Com-
posite DNA components that is capable of transcription [Vaidyanathan et al.,
2015]. A transcriptional unit has one or more terminators preceded by a CDS,
preceded by an RBS, preceded by one or more promoters. The set is partially
ordered since if a transcriptional unit has more than one promoter, the order
of the promoters can vary. It is important to note however that in the mod-
elling framework used in the implementation of GTLS, a transcriptional unit
has exactly one promoter followed by one RBS (both of which form a composite
Promoter-RBS component), followed by one CDS, followed by one Terminator.
• Library of parts and modules - is a set of well characterized composite parts
(Promoter-RBS), DNA components (CDS and terminators), and biochemical
components (proteins and small molecules).
• Genetic Circuit (or genetic circuit assignment) - is an ordered subset of the
library of parts and modules that is biologically valid (see section 3.3.2).
• Genetic Circuit Configuration (or circuit configuration) - is the specific def-
inition of a genetic circuit along with instructions for external controls (e.g.
time at which a small molecule is added to a genetic circuit, concentration of
small molecule that is added to the genetic circuit).
98
• Genetic Circuit Signal - is the transcript level of a transcription factor that
is produced by a CDS over time. The transcript level is measured in MEFL
(Molecules of Equivalent Fluorescein). The transcript level of a fluorescent
CDS (often referred to as an output or reporter CDS) is generally considered
an output signal of the genetic circuit. The output signal is used to quantify
the performance of genetic circuit against a performance specification.
Graphs
• Graph G = (V ,E) - is a data structure that consists of a set of vertices V and a
set of edges E where an edge connects two vertices [Vaidyanathan et al., 2015].
In this framework, each vertex represents a set of DNA Components.
• Directed Graph - is a graph where each edge is an ordered pair of elements
(vi ,vt) in V . An edge eit starts from the initial vertex vi and ends with the
terminal vertex vt [Vaidyanathan et al., 2015].
• Path is an ordered set of edges between two vertices such that the terminal
vertex of each edge is the same as the initial vertex of the next edge in the
path [Vaidyanathan et al., 2015].
• Tree - is a special type of graph where any two vertices are connected by exactly
one path. A tree is always acyclic - there exists no path that starts with a vertex
vi and ends with the same vertex vi .
• Rooted Tree - is a special tree where a specific vertex is designated as the root




A tree in this framework can contain the following types of vertices:
• Root Vertex - is a vertex which is not the terminal vertex for any edge in E.
This vertex represents an abstract circuit topology (genetic circuit template).
• Transcriptional Unit - vertices are children of the root vertex Each such vertex
consists of a sequence of DNA components and composite DNA components
that forms a single transcriptional unit.
• Leaf Vertex - is a vertex which is not the initial vertex for any edge in E. In
this framework, the leaf vertex consists of abstract DNA components (CDS or
Terminator) or composite DNA components (Promoter-RBS). Leaf vertices are
children of a transcriptional unit vertex.
The process of creating a tree by breaking down an abstract circuit topology is called
Vertex decomposition (Section 3.7.3) as shown in Figure 3·18. The tree structure is
used in the exhaustive search algorithm to explore the solution space of all possible
genetic circuit assignments for a given circuit topology.
Types of Modules
In this framework, modules are broadly classified into input, connector, and output
modules.
• Input Module - is a composite part which consists of a Constitutive Promoter
and an RBS.
• Connector Module - is a genetic module that is composed of a CDS that reg-
ulates a promoter via it’s encoded transcription factor, as well as the promoter
100
TFiTFo TFo
Input Module Connector Module
(a) (b)
Figure 3·5: Biological Modules used in the GTLS framework. Fig-
ure 3·5a shows an input module while Figure 3·5b shows a connector
module. The dotted box indicates the DNA components (and the as-
sociated biochemical components) that make up the module.
that is regulated by the transcription factor. In the implementation of the GTLS
framework, since promoters and RBS are modelled together, a connector mod-
ule is composed of a CDS component as well as the Promoter-RBS composite
component that is regulated by the encoded transcription factor of the CDS. A
connector module is categorized based on the type of interaction between the
transcription factor and promoter.
– Activating Module - is a connector module where the transcription fac-
tor activates (or stimulates) the promoter. In some cases, the activating
module requires a small molecule for the promoter to be activated. This
is known as an induced activating module.
– Repressing Module - is a connector module where the transcription factor
represses (or inhibits) the promoter. In some cases, the repressing module
requires a small molecule for the promoter to be repressed. This is called
an induced repressing module.
• Output Module - is a fluorescent CDS. The output module reads/displays the
output signal of the genetic circuit.
101
Response Functions
As stated in Section 3.3.1, a genetic circuit signal is a measure (s ∈ R≥0) of the
transcript level over time. A response function provides the relation between the
input transcript level and output transcript level of a genetic module at any time t.
To understand the relation between genetic circuit signals and modules, consider
a simple connector module (ci ,p) where ci is a CDS and p is a promoter. For the
sake of simplicity, only the promoter that is part of the Promoter-RBS composite
part is considered. Let the transcription factor encoded by ci be T Fi . T Fi regulates
the promoter p. Now, let co be the CDS whose expression is driven by p, and T Fo
be the transcription factor encoded by co. The input transcript level of the connector
module is the measure of T Fi while the output transcript level is the measure of T Fo.
It can be assumed that genetic modules are modular enough such that the output
transcript level does not vary based on co. A connector module relates the value of
the input transcript levels (si) to the value of the output value of the transcript level
(so) at any given time t, using a response function ψ
so[t] = ψ(si[t])
Input modules on the other hand, do not have any input transcript level since
constitutive promoters are not regulated by any transcription factor. Consider an
input module which consists of the promoter p. Let p drive the expression of the
CDS co and let T Fo be the transcription factor encoded by co. The value of the
output transcript level (so) of an input module at any given time t is given by:
so[t] = ψ(t)
The output module is typically a (fluorescent CDS) which displays/reads the
102
output transcript level of the input module or connector module, the promoter of
which drives the expression of the output module. The response function of an output
module consisting of the CDS cout, with an output response function ψout. At any
time t, the output module displays the value (sout) of the transcript level si and is
given by:
sout[t] = ψout(si[t])
Each unique output CDS has an output response function associated with it. For
any genetic circuit C, with n unique output CDS, the function of the circuit is given
by ΨC . At any time t, ΨC produces an n-tuple (s1[t], s2[[t]], . . . , sn[t]), where each
value of the tuple represents the value of an output signal at time t.
Biologically Valid Circuit Topologies
A biologically valid circuit topology is a template for a genetic circuit which consists
of an ordered set templates consisting of DNA components and composite DNA com-
ponents. For the circuit topology to be biologically valid, the sequence of elements










A biologically valid circuit topology does not guarantee the validity of a genetic circuit
that is designed by assigning parts and modules from a library to the abstract DNA
component template of the genetic circuit topology. In some cases, the assigned
genetic circuit may be biologically invalid or redundant, while in other cases, it might
complicate the verification step in the GTLS framework. A valid circuit assignment
103
is a genetic circuit that satisfies all the following conditions:
1. Number of unique ouput CDS must match the number of unique output vari-
ables in the STL formula. The mapping between the set of unique output CDS
in the genetic circuit and unique output variables in the STL formula should be
bijective (one-to-one and onto). This mapping is used in the verification step
to check if the genetic circuit satisfies the performance specification.
2. Circuit assignment must not be redundant. Every connector CDS assigned
to the genetic circuit, must be directly or indirectly involved in the biological
regulation of a promoter that drives the expression of an output (fluorescent)
CDS.
3. The circuit assignment must not have any incomplete connector modules. An
incomplete connector module is a module where either the CDS component or
the Promoter-RBS composite component of the connector module is assigned
but not both. The case where the CDS component of a connector module is
present but not the Promoter-RBS composite component, is a subset of the
previous rule - “Redundant circuit assignment”. An argument can be made
to allow an incomplete connector module where the Promoter-RBS component
has been assigned but not the CDS that regulates the promoter, since the
Promoter-RBS component could function as an input module. However, the
implementation of the GTLS framework (Phoenix) does not permit this case in
the interest of reducing the size of the solution space.
4. Circuit assignment must not violate any structural constraints. In the GTLS
framework, users can specify additional structural restrictions for DNA compo-
nents that typically address genetic context effects. A circuit assignment that
violates these rules is not valid.
104
GFP YFP luxR luxRpLux pLacIpLacI pLacI lasR luxR GFPpLacI pLas pLux




Figure 3·6: Biologically Invalid/Redundant Circuits.
Score of a Genetic Circuit Assignment
To quantify the extent to which a genetic circuit satisfies a performance specification,
this thesis introduces a score α. Consider a genetic circuit C with n unique outputs.
Let the response function for C be ΨC and let S = (s1, s2, . . . , sn) be the set of all output
signals of the genetic circuit. Let the performance specification for each signal si ∈ S
be φsi and let the set of all STL formulae over signals in S be Φ
S . The score of a
genetic circuit can be computed using two techniques depending on the number of
traces for each signal.
• Robustness - If each signal has exactly one trace (as in the case of determinis-
tic simulations or measuring the average of the signal value in an experimental
setup), the robustness (ρi) of each signal si is computed against its correspond-
ing performance specification φi . The score of a genetic circuit C for a given
specification ΦS is the arithmetic mean of the robustness of each signal:
αC =
ρ1 + ρ2 + . . .+ ρn
n
(3.6)
• Statistical Model Checking - For stochastic simulation results where many
traces are produced, statistical model checking techniques [Legay et al., 2010,
105
Younes et al., 2006] are utilized to determine the likelihood that a circuit will
satisfy the specification. When performing this type of analysis, a confidence













Here, c represents the desired confidence interval between 0 and 1, p is the
percentage of simulation runs that satisfied the STL formula, n is the total
number of simulation runs performed, ICDF(x) computes the inverse cumulative
probability of x, and d is the amount of error that the satisfying percentage of
simulation runs varies from the true satisfying percentage [Kuwahara and Mura,
2008]. The circuit designs are ordered based on the lowest possible satisfying
probability (p − d) for each circuit. The score of a genetic circuit C for a given




d1d2 . . .dn (3.8)
3.3.3 Problem Statement
The problem of Genetic Temporal Logic Synthesis can be defined as follows:
Given a performance specification ΦS which defines the parameters of desired
behaviors over time of a set of genetic circuit signals S, a structural specification E
which contains a set of biological structural constraints, a library M of genetic mod-
ules with known response functions, and a well defined scoring metric α, design a
genetic circuit C, that satisfies the specifications ΦS and E.
Remark 1 In the implementation of the GTLS framework (Phoenix), S is a unit set.
Hence the specification φ (for the signal s ∈ S) and ΦS can be used interchangeably.
106
Remark 2 Since the circuit C can have only one unique output CDS, ΨC (the function
of the genetic circuit) and ψout (the response function of the output module which
reads the signal s) can be used interchangeably.
Objectives
• minimize the number of DNA components used.
• maximize the score α.
3.3.4 Problem Complexity
This section will remark on the complexity of the Genetic Temporal Logic Synthesis
(GTLS) problem by using an approach similar to the one used to prove the complexity
of Genetic Logic Synthesis (GLS) (Section 2.3.5. Consider a special case where the
score αC of a circuit C is given to be 0. As stated in section 3.2.1, α = 0 is considered
as a case of satisfaction. If GTLS for any α was easy, it could be used to solve for
α = 0. In other words, if GTLS for any score αC was easy, it can be used to find a
genetic circuit C that can satisfy the performance specification Φ.
Consider SSP: given a set N of positive integers and a positive integer k, decide
whether a subset of N sums to k. SSP is known to be NP-complete via reduction
from the Knapsack problem [Lagarias and Odlyzko, 1985]. There exists a polynomial
time reduction from SSP to GTLS. Each element ni ∈N is translated to an activating
genetic module – with a linear response function:
ψi(x, t) = x+nit (3.9)
which results in a library M with |N | modules. Consider an additional module g
which is an input module. Let the response function for g be:
ψg(t) = c (3.10)
107
where c is a positive integer constant. Let the output module be h with the response
function ψout which reads/displays the transcript level of the output genetic circuit
signal s.
Assume that S is a singleton, which implies that the performance specification for
the circuit is ΦS = φ. Hence, for the sake of simplicity, φ can be considered as the
performance specification of the circuit.
Let φ be defined as:
φ = [ta,tb]
(
(out0 ≤ (c+ kt))∧ (out0 ≥ (c+ kt))
)
(3.11)
which states that for any t ∈ [ta, tb], the value of out0 must always be greater than
or equal to c+ kt and lesser than or equal to c+ kt. It is easy to surmise from Eq 3.1
that any signal s[t] |= φ if and only if ∀t ∈ [ta, tb], the value of s[t] is equal to c+kt. If
at any time t ∈ [ta, tb] the value of s[t] is greater than or less than c + kt, α < 0. The
maximum robustness value(αmax) that can be achieved for φ by the signal s is 0.
With the module library M∪{g,h}, consider an instance of SSP where the answer
to the SSP is “yes”. In this instance, it is known that there is a subset of N that
sums to k. Hence it must be shown that there is a corresponding circuit with α = 0.
Consider a circuit C comprising of input module g and activating modules A ⊆M,
and an output module h that reads the signal.




at = c+ kt
Hence ∀t ∈ [ta, tb], the value of s is c+kt which implies that αC = 0. In this case, the
circuit C satisfies the performance specification φ.
Now consider an SSP where the answer to the SSP is “no”. In this case, all subsets
108
of N sum to a value that is either greater than or less than k. Hence at any time t,
the output of C will be either greater than or less than c+kt which makes α < 0 and
hence the circuit C does not satisfy the performance specification φ.
This shows that an instance of SSP can be translated to GTLS such that a circuit
with α = 0 is obtained if and only if the SSP instance is positive. However since SSP
is complete for NP, GTLS must also be NP-hard. Moreoever, the decision version of
the GTLS: “is there a circuit C built using a library of modules M that satisfies φ?”
is clearly in NP since, given C, it’s robustness α can be computed and the modules
used to build C can be checked to be in M in time polynomical in |M | and |C|. Thus,
the decision version of GTLS is NP-complete.
It is easy to show that the proof holds true even if S is not a singleton since each
signal si ∈ S could be checked against the corresponding performance specification
φsi ∈ Φ
S in linear time. Hence if GTLS is hard when S is a singleton, it is also hard
when S is non-singleton.
109
3.4 General Approach
In many ways, the general approach towards implementing GTLS is quite similar to
the approach used for GLS (Section 2.4).
Specification to Design
The two main types of specifications that can be used in GTLS are structural and
performance specifications. While an argument can be made to include functional
specifications as well, the description in performance specifications often encapsulates
the intended functional behavior of a system.
The Specification to Design (or synthesis) step can be divided into two steps:
• Generating an Abstract Genetic Regulatory Network (AGRN): This step in-
volves exploring the design space of abstract circuit topologies to generate a list
of topologies that satisfy the structural specification. These abstract topologies
can then be assigned specific DNA components create a genetic circuit. The Eu-
gene framework [Oberortner et al., 2014] provides a formal language to specify
structural constraints and also provides a solver that evaluates the constraints
and explores the design space to generate a list of AGRNs.
• Generating a Genetic Regulatory Network(GRN)/Genetic Circuit Assignment:
In this step, a GRN is generated from the AGRN using search algorithms to
identify a genetic circuit that can satisfy the performance specification. This
can done by the following steps:
– assigning genetic parts to the abstract circuit topology
– generating a model for the circuit assignment (typically an ODE)
– simulating the model to generate time-series traces
110
– quantifying the score of the circuit assignment based on how well the time-
series traces satisfy the performance specification (Section 3.3.2).
Design to Test
In this step, the circuits produced by the synthesis step can be built and tested in
the lab. Time series measurements will yield time-series data that can be validated
against the performance specification. To analyze the accuracy of the predictions
of the simulations, the temporal properties can be learned from the experimental
data (using Temporal Logic Inference, explained in detail in Section 3.5) and can be
compared against the temporal properties of the simulation data by using metrics for
Signal Temporal Logic (which have been explained in detail in Section 3.6).
3.4.1 Workflow
Fig. 3·7 shows a general implementation of the workflow for GTLS. The parts, mod-
ules, and models of the genetic components can be represented in data standard
formats like SBOL [Galdzicki et al., 2014] and SBML [Hucka et al., 2003] and stored
in repositories like Synbiohub [McLaughlin et al., 2018]. Languages and frameworks
like Eugene [Oberortner et al., 2014] can be used to specify structural constraints and
identify abstract topologies that satisfy the structural specification. Models of candi-
date circuit assignments are then simulated and checked against the STL performance
specification and assigned a score.
111
Figure 3·7: Workflow for Genetic Temporal Logic Synthesis
112
3.5 Temporal Logic Inference for Time Series Data
While STL is a functionally rich language that can be used to specify temporal be-
havior, it might not always be easy to identify and specify temporal properties of
genetic circuits. For instance, biologists may have time-series measurements of nat-
urally occurring systems or time-series simulations of biochemical models(Fig. 3·8),
which they would then like to design. To address this issue, this section introduces a
technique to learn temporal properties from time-series data.
Figure 3·8: Time-series measurements and simulations of biological
networks.
3.5.1 Introduction
The challenge of inferring temporal logic properties from data procured from a system
has been approached in many ways, including mining properties [Jin et al., 2015a] and
temporal logic inference (TLI) [Kong et al., 2014,Bombara et al., 2016]. The latter
has been specifically designed to solve the two-class classification problem where a
classifier can distinguish and categorize an object that belongs to one of two classes.
In this setting, the learning is supervised, and the two classes are positive or negative.
Positive traces represent normal working conditions, expected behaviors, or de-
sired behaviors of a system, while negative traces indicate anomalies or non-conforming
113
patterns of a system [Bombara et al., 2016,Jones et al., 2014,Chandola et al., 2009].
For some cases in which TLI can be used, labeled data is hard or impossible to
procure. One such case is in complex biological systems–more specifically, genetic
logic circuits in synthetic biology–where the ability to describe biological networks’
behaviors over time represents a powerful move forward to building complex and ro-
bust genetic circuits. The synthetic biology field currently relies on manual design
of genetic circuits, sometimes aided by simulations, with recent advances to com-
putationally designed genetic circuits that can encode Boolean logic [Nielsen et al.,
2016,Vaidyanathan et al., 2015]. While the biological network construction paradigms
Biobrick [Knight, 2003], MoClo [Weber et al., 2011], and Gateway-Gibson [Guye et al.,
2013] allow for rapid genetic circuit assembly, they make no statement or guarantee to
the behavior of the assembled DNA. These behaviors, captured empirically, describe
specific desired properties of the genetic circuit in precise biochemical environments.
While these environments could potentially be varied to identify undesired behaviors,
the negative conditions for biological systems are so diverse that it is impractical
to generate negative traces. In such situations, the two-class classification approach
cannot be applied.
This thesis proposes a grid-based approach, GridTLI, where the space of the finite-
time system traces in the dataset is discretized, temporal properties are inferred and
expressed as a formula in STL [Maler and Nickovic, 2004]. The proposed algorithm
has some advantages over conventional two-class TLI approaches. First, GridTLI
only requires positive traces and can guarantee zero false negatives for any trace in
the training dataset. Second, the input parameters of the algorithm can be used to
tune the complexity of the formula to fit the context of the data.
114
3.5.2 Related Work
Two major tracks can be identified in the field of temporal logic inference. The first
track addresses the so-called Parameter Mining problem [Jin et al., 2015a, Hoxha
et al., 2018,Bartocci et al., 2015]: given a formula template and a set of traces gen-
erated by a normally-behaving system, find the optimal parameters for the template
such that the resulting formula satisfies the input traces. Generally, the parameters
are the time intervals for the temporal operators and the thresholds for the inequality
predicates in the template formula.
The approaches in these references differ in the way the underlying optimization
problem is formulated and solved. It is also worth mentioning that in [Jin et al.,
2015a] and in [Hoxha et al., 2018], the parameter optimization problem is cast within
a more general active learning framework, where the original system is queried for
new traces when deemed necessary.
The second track is focused on constructing a formula–both the structure and
its parameters–within the context of the supervised two-class classification prob-
lem [Kong et al., 2014, Bombara et al., 2016, Jones et al., 2014, Bartocci et al.,
2014a, Bufo et al., 2014, Grosu et al., 2009]. In this setting, given a set of labeled
traces, e.g., either normal or anomalous, the goal is to build a formula that can
distinguish between them. In [Kong et al., 2014, Jones et al., 2014], the formula is
constructed by exploring a fragment of STL that admits a partial ordering, whereas
the parameter optimization is performed using an SVM-like (Support Vector Machine)
optimization. [Bartocci et al., 2014a,Bufo et al., 2014] also tackled this problem by
first building two generative probabilistic models, one for each class, and later ob-
taining a discriminative classifier by searching for a formula that maximizes the odds
of being true for the first model and false for the other model. The formula structure
is constructed through heuristics, while the parameter space is explored through Sta-
115
tistical Model Checking. Recently, [Bombara et al., 2016] proposed a decision-tree
based approach to solve the two-class problem. In this approach, a binary tree is
constructed with simple formulae at each node, called primitives. The optimal primi-
tives, along with their parameters, are chosen by optimizing some impurity measures
which quantify how well a primitive partitions the traces by class at each node. A
tree with this structure is in a one-to-one mapping with an STL formula, which can
be used for classification of new traces or other purposes.
3.5.3 Problem Description
Given a training set of finite time signals S, which represents the desired behaviors of
the system, find an STL formula φ such that all the signals in S satisfy the formula.
The STL formula should fit the signals, and the fit should be neither too tight nor
too loose. A loose fit results in an STL formula that is not very useful, since it might
not grasp the peculiar characteristics of the signals. In contrast, a tight fit results in
a long STL formula that is excessively specific to the training signals. Ideally, the
granularity of the fit of the STL formula must be tunable based on the requirements
and properties of the system.
3.5.4 Grid-Based Temporal Logic Inference
Grid-based temporal logic inference, or GridTLI, is an algorithm that generates an
STL formula from a given set of training signals. The algorithm uses three thresholds
which affect the tightness of the produced formula to the training data. GridTLI op-
erates on one-dimensional signals, for multi-dimensional signals the algorithm should
be executed on each dimension separately (see Remark 4).
116
Figure 3·9: Sample grid with subgrids and clusters. The five signals in
this grid are split into two clusters (shaded in green and purple) based
on the cluster threshold.
Notation and Definitions
R is a closed rectangular region in the 2-dimensional Euclidean space R×R≥0 given
by the value codomain and time domain of one-dimensional signals. The region R is
bounded by [xmin,xmax]× [0, tmax], where xmin,xmax ∈R, xmin < xmax, and tmax ∈R>0.
This region R is partitioned into a grid made of rectangular cells (called subgrids).
A subgrid g is a rectangular bounding box in R ×R≥0 represented with a tuple
(x, t,xinc, tinc), where (x, t) ∈ R ×R≥0 is the Cartesian coordinate of the bottom left
corner of the box, and xinc ∈R>0 and tinc ∈R>0 are the lengths of the sides of the box
along the value and time dimensions, respectively. Therefore, a subgrid g is bounded
between [x,x+xinc] along the value dimension and between [t, t+ tinc] along the time
dimension. A set of subgrids is indicated with Γ .
The object oriented dot notation (“.”) is used to reference the properties of the
region and the subgrids. The properties of the region R are its bounding values, which
are R.xmin and g.xmin for the value dimension and R.tmax for the time dimension. The
117
Algorithm 1 GRID-TLI
Input: S, xt, tt, ct
Output: φ
1: Γ = ConstructGrid(S, xt, tt)
2: (S1, . . . ,Sk) := ClusterSignals(S, ct)
3: Γi := FindCoveredSubgrids(Γ , Si), i = 1, . . . , k
4: φi := ClusterToFormula(Γ , Si), i = 1, . . . , k φ :=
∨k
i=1φi
properties of a subgrid g are the elements of the tuple, which are g.x, g.t, g.xinc, g.tinc,
and g.IT which indicates the time interval [t, t + tinc] of g.
Given a subgrid g and a set of signals S, a Boolean function ĝ(S) is defined such
that ĝ(S) = > if and only if at least one signal in s ∈ S has a non-empty intersection
with g. Such subgrid is informally referred as covered by S.
GridTLI uses three thresholds xt,tt, and ct, to determine its fitting and clustering
behavior. Specifically, xt is the signal threshold and it defines the minimum value of
g.xinc for any subgrid. tt is the temporal threshold and defines the minimum value
of g.tinc for any subgrid. ct is the cluster threshold and it is used to determine the
number of clusters.
Algorithm
GridTLI produces an STL φ formula from a set S, made of one-dimensional positive-
only signals, and the three thresholds xt, tt, and ct. The formula is generated in
four major steps: 1) the signal space is partitioned with a grid; 2) the signals are
distributed into separate clusters; 3) simple STL primitives are fit to each cluster;
and 4) the clusters are then mapped to the final STL formula. An high level overview
of the algorithm is reported in Alg. 1.
The first step is implemented by the function ConstructGrid() in Alg. 2. This
function first devises the bounds xmin, xmax, and tmax of the region of interest R by
118
Algorithm 2 ConstructGrid()
Input: S, xt , tt
Output: Γ
1: R := FindBoundingRegion(S)
2: for i:=0 to R.tmax step tt do
3: for j:=0 to R.xmax step xt do
4: Γ .addSubgrid((i, j, xt, tt))
5: end for
6: for j:=0 to R.xmin step −xt do




looking at signals in S (specifically, where they are located in the 2D space R×R≥0).1
Then, the region R is partitioned in a set of subgrids R using a 2D orthogonal grid
with a granularity given by thresholds xt and tt along the value and time dimensions,
respectively.
The second step is implemented in function ClusterSignals(). This function
takes the input signals S and clusters them into k subsets (S1,S2, . . . ,Sk) using the
cluster threshold ct (k can be at most be | S |.). Specifically, two signals si , sj ∈ S
are in the same cluster if for all t ∈ [0, tmax], there exist a gp and a gq, with gp , gq,
such that the following conditions hold: 1) t ∈ gp.IT and ĝp(si) =>, 2) t ∈ gq.IT and
ĝq(sj) =>, and 3) 
gp.x − (gq.x+ gq.xinc) ≤ ct if gp.x > gq.x,
gq.x − (gp.x+ gp.xinc) ≤ ct if gq.x > gp.x.
(3.12)
The function FindCoveredSubgrids(), simply maps every cluster Si with a set
of subgrids Γi that are covered by the signals in Si , that is, for all g ∈ Γi , ĝ(Si) =>.
1The values of xmin and xmax are approximated to multiples of xt. The value of tmax is approxi-
mated to a multiple of tt.
119
In the fourth step, the function ClusterToFormula() maps each set of subgrids
Γi to an STL formula φi . The formula is constructed by finding the top-most and
bottom-most covered subgrids at each time step t, going from 0 to tmax in increments
of tt. This process is described in Alg. 3. Some simplifications to the formula are
applied during this transformation process (see Example 5). Finally, the complete
formula is assembled by disjunction of the formulae obtained for each cluster:
φ = φ1 ∨φ2 ∨ . . .∨φk .
Algorithm 3 ClusterToFormula()
Input: Γi
Output: STL formula (φi)
1: φi :=>
2: for i:=0 to tmax step tt do
3: find gm s.t. ∀g ∈ Γi where g.t,gm.t = i, gm.x ≥ g.x
4: φi := φi ∧[i,i+gm.tinc)(x ≤ (gm.x+ gm.xinc))
5: find gn s.t. ∀g ∈ Γi where g.t,gn.t = i, gn.x ≤ g.x
6: φi := φi ∧[i,i+gn.tinc)(x ≥ gn.x)
7: end for
8: Return φi
Example 5 Consider the clusters in Fig. 3·9. In the cluster shaded in green for t ∈
[0,1], the covered subgrid with the minimum x value is g(0,0,1,1). The covered subgrid
with the maximum x value is g(1,0,1,1). These two subgrids provide a lower and an
upper bound for the satisfaction region of the two signals in the green cluster from
t ∈ [0,1]. This is formally represented in STL as φa = [0,1](x > 0)∧[0,1](x ≤ 2).
Similarly, the STL formula for t ∈ [1,2] is φb = [1,2](x > 1)∧[0,1](x ≤ 2). The STL
formula representing the green cluster for t ∈ [0,2] is φa ∧φb.
To simplify and reduce the length of the formula for a cluster, the temporal oper-
ators with adjacent time horizons and similar linear predicates can be combined. For
example, [0,1](x ≤ 2)∧[1,2](x ≤ 2) can be transformed to [0,2](x ≤ 2).
120
The simplified formula for the green cluster is
φ1 =[0,7](x ≤ 2)∧[0,1](x > 0)
∧[1,5](x > 1)∧[5,7](x > 0)
Similarly, the STL formula for the cluster shaded in purple is
φ2 =[0,7](x ≤ 6)∧[0,1](x > 0)∧[1,5](x > 4)
∧[5,6](x > 3)∧[6,7](x > 0)
Remark 3 The clustering of signals can only be as precise as xt. In fact, for any
z ∈N and for any ct ∈
[
(z − 1)xt, zxt
)
, all other c′t ∈
[
(z − 1)xt, zxt
)
will result in the
same clustering as that for ct.
Remark 4 Algorithm 1 operates on one-dimensional signals. If the training set S
contains multi-dimensional signals with dimensionality n, the sets Sx, x ∈ 1, . . . ,n, are
derived from S by considering only the component sx of each signal. For each set Sx,
GridTLI is used to obtain a formula φx and the final formula φ is constructed by
conjunction of all φx, that is
φ = φ1 ∧φ2 ∧ · · · ∧φn
Properties of the Algorithm
The formula φ for S constructed using xt, tt, and ct, will have the following properties:
Lemma 1 ∀s ∈ S, s |= φ.
Lemma 2 For any subformula φa in φ, where φa ∈ {[t1,t2)pf (x)≤µ1 ,[t1,t2)pf (x)>µ1},
then t2 − t1 ≥ tt.
Lemma 3 For any two subformulae φa and φb within an STL formula φi for a
cluster of subgrids Γi, if φa ∈ {[t1,t2)pf (x)≤µ1 ,[t1,t2)pf (x)>µ1} and φb ∈ {[t1,t2)pf (x)≤µ2 ,
[t1,t2)pf (x)>µ2}, then | µ1 −µ2 | ≥ xt.
Lemma 4 For any two signals si , sj ∈ S such that si |= φ1 and sj |= φ2, where φ1 and
φ2 are STL formulae for two separate clusters, then there exists t ∈ [0, tmax] such that
no gp, gq satisfy Eq. 3.12.
121
Complexity
This section discusses the worst-case complexity to construct φ for a set of one-
dimensional signals S using GridTLI. Since the steps of the algorithm are sequential,
the overall asymptotic complexity can be obtained as the sum of the complexities of
each step.
The complexity of the first step (function ConstructGrid()) is the same as the
number of elements in Γ , that is:
| Γ |= tmax
tt
· xmax − xmin
xt
(3.13)
For the second step (function ClusterSignals()), the complexity is | Γ | · | S | since
the clustering condition is checked for all signals and for all subgrids. The worst case
complexity of the third step (function FindCoveredSubgrids()) is | Γ | · | S |, since all
the subgrids in Γ could be covered and there can be at most | S | clusters. The worst
case complexity of the final step (function ClusterToFormula()) is | S | since, each
signal could form a unique cluster if the signals in S are too diverse or if ct is too
conservative.
The overall worst-case complexity of all the steps is
(
(| Γ |) + (| Γ | · | S |) + (| Γ | · | S |) + (| S |)
)
(3.14)
which can be simplified to (| Γ | · | S |), since this value dwarfs the other terms in











3.5.5 Using GridTLI in Biological Networks
This case study, aims to characterize the levels of protein expression for a biological
network, such as the one seen in Fig. 3·10, over time using STL. However, two-class
classification cannot be performed in this context because negative traces do not exist.
This is largely because there is no practical way to define what data should be labeled
as negative. Many biological networks have similar, or overlapping, behaviors, hence,
while characterizing one circuit, the behavior of all other circuits cannot be defined as
negative. The utility of GridTLI comes from its lack of reliance on negatively labeled
data.
(a) SBOL Visual diagram (b) Block diagram
Figure 3·10: A biological network. 3·10a is the SBOL Visual repre-
sentation [Quinn et al., 2013] of a genetic transcriptional unit capable
of producing a protein in the presence of a protein complex formed by
the small molecule AHL and protein LasR. 3·10b shows a block dia-
gram representation of the same biological network. By varying the
initial concentrations of AHL and LasR, various output traces can be
generated for the expression of pLas.
The dataset consists of traces of the amount of fluorescence in individual cells
measured by flow cytometry. These cells were transformed with the genetic circuit
shown in Fig. 3·10a. This circuit has the input/output structure shown in Fig. 3·10b,
where there are two inputs, the protein LasR and small molecule AHL, and one
output, which can be any gene. In this case, mCherry, a gene encoding for a red flu-
orescent protein was used. Experiments were conducted over several hours, and cells
were harvested for measuring at regular intervals. Traces were generated by repeat-
123
edly sampling from the fluorescence measurements and calculating their geometric
mean. Initial conditions of the inputs were varied to obtain a spectrum of potential
expression levels.
GridTLI was performed on these data using a variety of values for the thresholds
(xt, tt, ct). The values of the thresholds were restricted by using the following intervals






















For these data, xmax−xmin2 = 1843.5 and
tmax−tmin
2 = 3.45, and every 10% was sampled
from these intervals for xt and tt. The values for ct are chosen based on the given
value for xt, following from Remark 3. For convenience, the lower bound of each
interval
[










In order to test the STL output from GridTLI, k-fold cross-validation (CV) was
used, with k = 10. Since traces in this dataset were gathered from physically dif-
ferent experiments, using different initial conditions, a stratified cross-validation was
performed by partitioning the data from each experiment evenly among the folds.
The complete dataset contains 960 traces, so each training set has 864 traces, while
the corresponding testing sets contain the remaining 96.
After fitting the training data, each trace in the dataset will either satisfy or not
satisfy the STL formula. By design, every trace in the training set of a given formula
will satisfy that formula. Since only positive traces exist, each trace in the testing
124
set will either be a true positive (TP) or a false negative (FN), respective of whether
it satisfies the formula or not. The false negative rate (FNR) and robustness degree
of testing traces was used to assess the quality of the STL formulae. The FNR is
defined as
FNR = | FN |
| TP | + | FN |
.
The minimum FNR over all threshold chosen was 0.0, and the maximum was 0.0115.
The mean robustness values of the false negative traces were compared against
the number of operators for each set of thresholds with the aim of identifying ideal
threshold values. One of the primary challenges of fitting an STL formula to a dataset
with only positive traces is that it is hard to quantify the quality of the fit. When
very large thresholds are chosen, the entire region might be marked covered. This
leads to an STL formula with fewer operators, higher robustness values, and a lower
FNR for the testing set. Yet the formula might not specify any useful information
about the characteristics of the behavior of the system that can be captured from
the traces. In contrast, very conservative threshold values result in a tight fit, which
leads to a higher FNR, lower robustness values, and a larger number of operators in
the formula.
Fig. 3·11 shows a scatter plot of the mean robustness values against the number of
temporal operators of the STL formula for the different threshold values used in this
case study. The left most point marked in red corresponds to the STL formula with the
fewest number of temporal operators (shortest formula) and highest mean robustness
value. The threshold values for this point were (xt = 1843.5, tt = 3.45, ct = 1843.5),
which are the highest threshold values used in this case study. Fig. 3·12a shows an
execution of GridTLI using these threshold values on a training set. The STL formula
generated was: [0,6.9](x ≤ 3687)∧[0,6.9](x ≥ 0). The testing dataset yielded 0 false
125
0 10 20 30 40 50























Figure 3·11: Scatter plot of mean robustness of false negative traces
against average number of operators in the STL formula generated by
GridTLI for the biological traces. The data circled in red correspond
to the cross-validation sets showcased in Fig. 3·12.
negative rates (and the highest robustness) as illustrated in Fig. 3·12c. This STL
formula does not convey any useful information about the system though. It merely
states the upper and lower bounds of the traces over time.
Similarly, for the rightmost point marked in red in Fig.3·11 which corresponds to
the STL formula with the highest number of temporal operators (longest formula)
and lowest mean robustness value, the threshold values correspond to the smallest
values used in the case study: (xt = 184.35, tt = 0.345, ct = 0). Fig. 3·12b shows an
execution of GridTLI using these low threshold values. It is evident that the covered
subgrids in the region form a very tight fit. However, this leads to a longer STL
formula and low robustness values for some signals in the testing set (as shown by
the red trace in Fig. 3·12d
126
(a) Large Thresholds - Training (b) Small Thresholds - Training
(c) Large Thresholds - Testing (d) Small Thresholds - Testing
Figure 3·12: 3·12a & 3·12b show covered subgrids for different thresh-
olds using GridTLI on the same training data of pLas Expression. 3·12c
& 3·12d show cross-validation for the testing data. The trace colored
red in 3·12d violates the STL formula learned from the training set in
3·12b.
As stated previously, it is hard to quantify these trade-offs due to the absence
of negative traces in the system. However, the ability to tune the tightness of the
fit for an STL formula will allow biologists to create specifications that match the
requirements of their experimental setup.
127
3.5.6 Discussion
This thesis presents a simple algorithm (GridTLI) that can infer the temporal prop-
erties from time series data using only positive traces. The algorithm discretizes the
domain and codomain of the signals in the training dataset and uses user specified
threshold values to fit the data into satisfaction areas and formally expresses these
areas as a Signal Temporal Logic formula.
The thesis also presents a case study to show how temporal properties can be
learned from biological networks where only positive traces are available. GridTLI
generated efficient STL formulae with acceptable misclassification rates for the testing
data. The case study in Appendix A shows that GridTLI is an effective algorithm
that works well, especially when only positive data is available, and is fast (over
3000 times faster than a previous two-class approach) with results comparable to an
inference algorithm that uses a decision-tree approach.
But more importantly, the GridTLI framework, helps extend the GTLS worflow
described in Section 3.4.1. Users can now draw time series data, or use simulations
generated by biochemical models, or use traces from preexisting genetic systems to
define a desired performance specification. The temporal properties of these time
series data can be learned using GridTLI to generate STL formulae that can be
plugged into the GTLS framework as a performance specification. Fig. 3·13 shows a
more powerful workflow for GTLS that uses GridTLI.
Future work for GridTLI includes optimizing, and automating, the choice of
threshold values by dividing available data into two sets: one for optimization and
another for testing. Future work can also include exploring ways to convert one STL
formula (or temporal operator) to another, to reduce the length of the STL formulae
while minimizing the change in the FNR. This work can also be extended to solve the
two class problem by using the positive data as mentioned in this thesis and using
128
Figure 3·13: GTLS Workflow using GridTLI.
the negative data to optimize the covered cells to obtain a better MCR.
129
3.6 Metrics for Signal Temporal Logic
Using model checking [Baier and Katoen, 2008] techniques, signals or traces can be
checked to determine whether or not they satisfy a specification. For STL in particu-
lar, the degree of satisfaction or robustness is a quantitative measure to characterize
how far a signal is from satisfaction [Fainekos and Pappas, 2009,Donzé and Maler,
2010,Donzé et al., 2013] of an STL formula. There is currently, however, no formal
way to directly compare specifications against each other. The ability to compare
specifications against each other would help compare performance specifications of
two biological systems. It can also be used to compare the accuracy of the in-silico
simulations of the models against empirical data, using GridTLI to learn temporal
properties from time-series data.
3.6.1 Introduction
Previous related approaches in planning and control have looked into specification
relaxation, where the goal is to minimally enlarge the specification language to include
a satisfying control policy for the system model. Various specification relaxations
have been defined including minimum violation [Tumova et al., 2013b,Tumova et al.,
2013a,Vasile et al., 2017b] for self-driving cars, temporal relaxation of deadlines [Vasile
et al., 2017a], minimum revision of Büchi automata [Kim et al., 2015], and diagnosis
and repair in reactive synthesis [Ghosh et al., 2016]. While language inclusion and
equivalence problems are of paramount importance in computer science and control
theory, they are only qualitative measures while the focus of this thesis is quantitative
metrics.
This thesis presents two metrics that can be used to compute the distance between
two STL specifications. Under mild assumptions, the metrics have been proposed
based on the languages of STL formulae. Two distance functions have been proposed
130
in this thesis. The first is based on the Pompeiu-Hausdorff (PH) distance [Munkres,
2000], which captures how much the language of one formula must be enlarged to
include the other, and the second is based on the symmetric difference (SD) [Conway,
2012], which characterizes how much overlap there is between the two formulae. The
theoretical contributions of the metrics presented in this thesis are:
1. formalization of STL formulae metrics based on the PH and the SD distances,
and
2. methods for computing the PH using mixed-integer linear programming (MILP),
and the SD using a recursive algorithm based on the area of satisfaction.
This thesis discusses the comparison of the two metrics in detail and provides examples
that highlight their differences.
This section additionally presents applications of the proposed metrics to a behav-
ioral synthesis problem and to the evaluation of temporal logic inference (TLI) [Bar-
tocci et al., 2014b, Jin et al., 2015b, Bombara et al., 2016, Vaidyanathan et al.,
2017,Hoxha et al., 2018] methods. The first case focuses on generating designs that
exhibit desired behaviors specified in STL by studying synthetic genetic circuits. Pos-
sible circuit designs were constructed and measured in laboratory experiments, and
the resulting traces were abstracted into STL specifications using TLI. These formu-
lae were compared quantitatively against the desired design specification using the
proposed metrics. The second setup considers the fundamental problem of evaluating
TLI methods themselves. Under the assumption that data used for inference can be
characterized by ground truth STL formulae, the question is, how well do the TLI
algorithms perform. As opposed to empirical evaluation used in previous work, this
thesis proposes the use of metrics as loss functions as the first step in establishing
theoretical foundations for TLI (Appendix B). The related contributions are:
131
3. a design quality measure for evaluating proposed implementations against an
STL specification, and
4. a loss function to quantify errors in TLI as a first step in establishing formal
performance guarantees of TLI algorithms (Appendix B).
3.6.2 Metrics
This section, introduces two functions dST L : ΦST × ΦST → R≥0 that quantify the
dissimilarity between the properties captured by the two STL formulae. However, it
is possible that different formulae may describe the same properties. For example,
φ1 and φ4 in (3.4) are describing the same behavior, since any signal that satisfies
φ1 already satisfies φ4 and vice versa. The key idea is to define the distance between
two STL formulae as the distance between their time-bounded languages.
Assumption 1 The set S ⊂Rn is compact.
Assumption 2 All of the predicates are rectangular.
Note that bounded-time languages are constructed in finite-dimensional Euclidean
spaces. Also, since all inequalities in the predicates are non-strict, bounded-time
languages are compact sets. Assumption 2 is theoretically restrictive, but not in
most applications - usually it is the case that all predicates are rectangular as they
describe thresholds for state components of a system.
Definition 2 The two STL formulae φ1 and φ2 are semantically equivalent, denoted
by φ1 ≡ φ2, if both induce the same language: L(φ1) = L(φ2).
The set of equivalence classes of ΦST induced by ≡ is denoted by ΦST / ≡. Distance














class associated with φ, and φ1 and φ2 are formulae in the two equivalence classes.
132
Note that, by definition, there is a one-to-one map between the equivalence classes of




, dST L(φ1,φ2) = 0.
The following two common metrics between sets have been adapted: (a) the
Pompeiu-Hausdorff (PH) distance based on the underlying metric between signals,
and (b) a measure of Symmetric Difference (SD) between sets. As it will be clarified
in this section, the choice of T , as long as it is larger than the horizons of the formulae
that are considered, does not affect the fundamental properties of the defined metrics.
In the case of the PH distance, it does not have any effect at all. For the SD metric,
the computed distances are scaled with respect to the inverse of T . These details are
explained in Section 3.6.2.
Pompeiu-Hausdorff Distance
Definition 3 The (undirected) PH distance is defined as:
dPH (φ1,φ2) = max
{
~dPH (φ1,φ2), ~dPH (φ2,φ1)
}
, (3.16)
where ~dPH denotes the directed PH distance:








Note that the directed PH distance is obviously not a metric as it is possible to have
~dPH (φ1,φ2) , ~dPH (φ2,φ1). ~dPH (φ1,φ2) = 0 if and only if L(φ1) ⊆ L(φ2). Another
way to interpret the PH distance is as follows [Munkres, 2000]:
~dPH (φ1,φ2) = min{ε | L(φ1) ⊆ L(φ2) + εBST }, (3.18)
where BST is the unit ball in ST : {s[0 : T ] | ‖st‖∞ ≤ 1, t ∈ [0,T ]}, and addition of
sets is interpreted in the Minkowski sense. In words, ~dPH (φ1,φ2) is the radius of the
minimum ball that should be added to L(φ2) such that it contains L(φ1).
133
Proposition 1 The (ΦST / ≡,dPH ) is a metric space.
Proof 1 Note that dPH (φ1,φ2) is effectively defined as dPH (L(φ1),L(φ2)) - remember
that languages are compact subsets of finite dimensional Euclidean space, for which
it is known that the PH distance is a metric [Munkres, 2000]. Moreover, there is a
one to one map between an equivalency class formula of a formula in ΦST / ≡ and its
language.
It is possible to interpret (3.17) as the distance between an STL formula and a
signal: ~dPH (s,φ) := mins′∈L(φ)d(s, s′). It is easy to see that ~dPH (s,φ) = 0 if and only
if s ∈ L(φ). The following result is a reformulation of Definition 23 in [Fainekos and
Pappas, 2009], which establishes a connection between the STL score and the notion
of signed distance.




−~dPH (s,φ) ~dPH (s,φ) > 0,
~dPH (s,¬φ) ~dPH (s,φ) = 0.
(3.19)
The following results are extensions of classical results for signed distances [Kraft,
2015].
Corollary 1 For any given two formulae φ1,φ2 ∈ ΦST and a signal s ∈ ST , the fol-
lowing inequalities apply:∣∣∣|ρ(s,φ1,0)| − |ρ(s,φ2,0)|∣∣∣ ≤ dPH (φ1,φ2)∣∣∣ρ(s,φ1,0)− ρ(s,φ2,0)∣∣∣ ≤ dPH (φ1,φ2) + dPH (¬φ1,¬φ2)
Corollary 2 Given ε > 0, define ε-neighborhood of an STL formula φ as:
{φ}ε ={
φ′ ∈ ΦST |dPH (φ,φ′) ≤ ε
}
Then, ρ(s,φ,0) ≥ ε implies that s |= φ′,∀φ′ ∈ {φ}ε.
134
Symmetric Difference
The SD is denoted by 4, and defined as X4Y = (X \ Y ) ∪ (Y \ X), where X and
Y are two sets. It induces a distance between compact sets as the measure of the
SD [Conway, 2012].





where | · | is the Lebesgue measure.
Proposition 3 The (ΦST / ≡,dSD) is a metric space.
Proof 2 Follows immediately from the definition. The metric is well-defined since the
formulae have time horizons bounded by T over discrete-time signals. Their languages
are compact subsets of the Euclidean space Rn(T+1). Thus, the Lebesgue measure is
defined.
The coverage of signal have been defined as sets in the space-time value set.




t∈[0,T ]{(st, τ) | t ≤ τ ≤
t +1}, where S ⊆ S . For an STL formula φ, P (φ) = P (L(φ)).
Let S =Un, and p = xi ≤ µ be a rectangular predicate with µ ∈U and i ∈ {1, . . . ,n}.




∪ (Un × {1 ≤ t ≤ T }).
Theorem 1 If φ1 and φ2 are two STL formulae with the same language, then they
cover the same space. Formally, L(φ1) = L(φ2) implies P (φ1) = P (φ2).
Proof 3 Immediately follows from the fact that P is a projection of L(φ) onto space-
time S× TU.




While the PH distance and the SD difference are both metrics, they have quite dif-
ferent behaviors. This section elaborates on these differences and shows an illustrative
example.
Informally, the PH distance has a stronger spatial notion, and it is closely con-
nected to STL score, as stated in Corollary 1 and Corollary 2. The PH distance,
captures the worst-case spatial difference between formulae. On the other hand, the
SD is a more temporal notion, as the areas also capture the length of temporal op-
erators. It is possible that two STL formulae have a large PH distance, but a small
SD distance, and vice versa. In applications, the choice is dependent on the user.
The most useful may be a convex combination - which is a metric by itself - with a
user-given convex coefficient.
Example 6 Consider the six STL formulae in (3.4). The PH and the SD distances
have been computed between all pairs of formulae using the methods proposed in
Section 3.6.3 and Section 3.6.4. The directed PH distances have also been reported.
1) Directed PH distance: The results are shown in Table 3.1, where the value in
the i’th row and j’th column is ~dPH (φi ,φj). Each maximizer (the PH distance) is
bolded.
The truth constant has also been included in the distance table. It is observed that
~dPH (φi ,>) = 0,∀i ∈ {1, · · · ,6}, which implies the fact that the language of each formula
is contained within the language of >, which is the set of all signals. The opposite
direction, ~dPH (>,φ) = dPH (φ,>), is, informally, the quantification of how restrictive
φ is.
Note that dPH (φ1,φ2) = 0.04. It is observed that most values are either 0.6 =
max(1−0.4,0.2−0) or 0.56 = max(1−0.44,0.2−0), which correspond to the extreme
signal that one language contains but the other does not, or it is 0, indicating that
one language is a subset of another. For instance, φ3 is a “weak” specification in
the sense that its language is broad - any signal with some value in θ1 at some time
satisfies it - so the directed distances from other formulae to φ3 are zero. Another
notable example is the relation between φ1 and φ4. The directed PH distance is zero
136
Table 3.1: Directed PH Distances for Example 6
~dPH > φ1 φ2 φ3 φ4 φ5 φ6
> 0 0.6 0.56 0.6 0.6 0.6 0.6
φ1 0 0 0 0 0 0 0
φ2 0 0.04 0 0.04 0.04 0.04 0.04
φ3 0 0.6 0.56 0 0.6 0.6 0.6
φ4 0 0 0 0 0 0 0
φ5 0 0.6 0.56 0 0.6 0 0.04
φ6 0 0.6 0.56 0 0.6 0.6 0
in both directions - the two formulae are equivalent. This is due to the fact that any
signal that satisfies φ4, already satisfies φ1. The other direction also trivially holds.
Note that some pairs, like φ2 and φ3, have non-zero PH distances in both directions.
2) SD distance: The results are shown in Table 3.2 along with the PH distances
for comparison. Here, it can be seen that in most cases, the SD distance is either a lot
larger or a lot smaller than the PH distance. This is largely due to the fact that this
metric is based on area which is particularly highlighted when comparing any of the
formulae to >. Since each formula’s satisfaction space is very small in comparison
to the entire bounded signal space, each of these values is quite large. In contrast, the
SD distance between φ1 and φ5 is fairly small since the satisfaction regions for each
of these formulae cover a similar area. The SD distance between φ2 and φ3 is on the
larger side as their areas of satisfaction are quite different; however, these areas are
still much closer to each other than they are to the entire bounded satisfaction area
represented by >. Similar to the PH distance, the SD distance between φ1 and φ4 is
zero as they have completely overlapping areas of satisfaction.
The results in Table 3.2 illustrate that there are different situations when the PH
distance might be favored over the SD distance and vice versa. In cases where one
cares about the area covered by the satisfaction region of a formula, the SD distance
should be used. For instance, the SD could be used to find a formula close to one that
requires a signal to be held at a particular value for a long time interval. However, if
one only cares about how close the signal bounds of the formulae are to each other,
the PH distance should be used.
137
Table 3.2: PH and SD Distances for Example 6
dPH\dSD > φ1 φ2 φ3 φ4 φ5 φ6
> 0 0.8 0.76 0.99 0.8 0.804 0.84
φ1 0.6 0 0.04 0.19 0 0.036 0.03
φ2 0.56 0.04 0 0.23 0.04 0.044 0.07
φ3 0.6 0.56 0.56 0 0.19 0.186 0.16
φ4 0.6 0 0.04 0.6 0 0.036 0.03
φ5 0.6 0.56 0.56 0.6 0.6 0 0.066
φ6 0.6 0.56 0.56 0.6 0.6 0.6 0
3.6.3 Pompeiu-Hausdorff Distance
This section, proposes an optimization-based method to compute the PH distance
between two STL formulae.
Definition 5 Given an STL formula ϕ that contains no negation, ϕε+ is defined with
the same logical structure as ϕ with predicates replaced as follows:
• f (x) ≥ µ replaced with f (x) ≥ µ− ε;
• f (x) ≤ µ replaced with f (x) ≤ µ+ ε.
Intuitively, ϕε+ is a relaxed version of ϕ. It is easy to verify from (3.1) that if
ρ(s,ϕε+,0) = ρ(s,ϕ,0) + ε,∀s ∈ ST .
Lemma 5 The following relation holds:
~dPH =min{ε ≥ 0 | L(ϕ1) ⊆ L(ϕε+2 )} (3.20)
Proof 4 (sketch) The result is a direct consequence of (3.18) as L(ϕε+2 ) = L(ϕ2) +
εBST .
The following statement provides the main result, and the base for the computational
method of this section.
138
Theorem 2 Given ϕ1,ϕ2 ∈ ΦST , L(ϕ1) , ∅, define ε∗ as the following optimum:
ε∗ = max ε,
subject to s |= ϕ1, s 6|= ϕε+2 ,
ε ≥ 0, s ∈ ST .
(3.21)
Then the following holds:
~dPH (ϕ1,ϕ2) =
{
ε∗ (3.21) is feasible,
0 otherwise. (3.22)
Proof 5 First, consider the case that (3.21) is infeasible or its value is 0. Then, it
means that the constraints s ∈ L(ϕ1), s < L(ϕε+2 ) are infeasible for all ε > 0, which im-
plies that for any s ∈ L(ϕ1), ρ(s,L(ϕ2),0) ≥ 0. Thus, L(ϕ1) ⊂ L(ϕ2) and consequently
~dPH (ϕ1,ϕ2) = 0.
Now consider the case (3.21) is feasible and ε∗ > 0. Then, s 6|= ϕε∗+2 is an active
constraint, which implies ρ(s,ϕε∗+2 ,0) = 0. Note that s is also optimized in (3.21).
Thus ρ(s,ϕ2,0) < 0, or s < L(ϕ2). (3.20) can be rewritten as:
~dPH = sup{ε ≥ 0 | L(ϕ1) * L(ϕε+2 )}. (3.23)
Note that sup has been used instead of max as L(ϕ1) * L(ϕε+2 ) is a strict relation.
Also note that such the supremum exists as i) the condition is satisfied for ε = 0 and
ii) the language sets are bounded. (3.21) captures (3.23). If L(ϕ1) * L(ϕε+2 ), then it
means that ∃s |= ϕ1 but ρ(s,ϕε
+
2 ,0) < 0. This is what is captured by the constant in
(3.21), with the difference that ρ(s,ϕε+2 ,0) < 0 is replaced by a non-strict inequality
and sup is replaced by max.
(3.21) is converted into a MILP problem. The procedure for converting STL into
MILP constraints is straightforward, see, e.g., [Raman et al., 2014]. The encoding
details are omitted here. By solving two MILPs, the PH distance can be obtained.
Two MILPs can be aggregated into a single MILP, but that usually more than doubles
the computation time due to larger branch and bound trees. Moreover, it is often
useful to have the knowledge of the directed PH distances.
Theorem 2 requires that formulae do not contain negation. Negation elimination is
straightforward: first, the formula is brought into its Negation Normal Form (NNF),
139
where all negations appear before the predicates. Next, the predicates are negated.
For example, ¬(x ≤ µ) is replaced by (x ≥ µ). It is important to remember that strict
inequalities are not considered, hence ¬(x ≤ µ) and (x ≤ µ) are both true if x0 = µ.
Finally, observe that the choice of T does not effect the values of PH distance, as
long as it is larger than the horizons of two formulae that are compared. Given
φ1,φ2 ∈ ΦST , the values of st for t > max{
∥∥∥φ1∥∥∥ ,∥∥∥φ2∥∥∥} do not have any associated
constraints in (3.21).
Complexity
The complexity of (3.21) is exponential in the number of integers, which grows with
the number of predicates and horizons of the formulae. However, since signal values
do not have any dynamical constraints, it was found that solving (3.21) was orders
of magnitudes faster than comparable STL control problems, such as those studied
in [Raman et al., 2014]. All the values obtained in Table 3.1 were evaluated almost
instantaneously using Gurobi MILP solver on a personal computer.
3.6.4 Symmetric Difference
This section presents an algorithm for computing boxes representing the area of sat-
isfaction of a formula as well as a method for determining the SD between two sets of
boxes. Each set of boxes approximates the projection (P ) of the formula and repre-
sents the valid value-space that a time-varying signal can take such that traces that
are contained entirely within the boxes satisfy the formula.
Computing the set of boxes representing the area of satisfaction is a recursive
process that takes as input an STL formula, φ, a set of max values, Xmax, for each
signal, x ∈ X (used to normalize the signal values to a unit space), and a discretization




Figure 3·14: a and b show the area of satisfaction boxes for φ1 and φ5
from Example 6, respectively. The blue regions represent the boxes that
are computed for globally () operators. In c, the red regions represent the
non-overlapping area and the purple regions represent the overlapping area
between φ1 and φ5. The SD distance for this example is the area of the red
regions ((2× 0.2) + (8× 0.04) = 0.72) divided by the maximum time horizon
which is 0.7220 = 0.036.
Here, box(t1, t2,x1,x2, i) = P ([t1,t2]x1 ≤ x
i ≤ x2) ⊆ S×TU creates a new box with
minimum and maximum times t1 and t2, minimum and maximum values x1 and x2,
and spatial dimension i ∈ {1, . . . ,n}, respectively; overlap determines the time window
of the overlap between two boxes; combine takes two boxes and produces a set of
boxes representing the intersection of the overlapping time window region; b.lt, b.ut,
b.lv ∈ S, and b.uv ∈ S return the lower time window, upper time window, lower
variable, and upper variable values for box b, respectively; ∗ in the definition of a box
141
denotes that it may restrict multiple spatial dimensions; and the ‖ operator is used
to create a “choice” set representing that either of the two sets separated by it can
be selected as the set of boxes representing the area of satisfaction.
Algorithm 4 Convert to Area of Satisfaction Boxes (AoS)
Input: STL formula φ, max value set Xmax, discretization threshold δ.
Output: Set of boxes B for each signal in φ.
1: if φ := xi ≤ π then
2: Return box(0,0,0,π/xmax, i)
3: else {φ := xi > π}
4: Return box(0,0,π/xmax,1, i)
5: else {φ := φ1 ∧φ2}
6: Create a new set B
7: for each b1 ∈ AoS(φ1) and each b2 ∈ AoS(φ2) do
8: if overlap(b1,b2) then
9: Add combine(b1,b2) to B
10: else




15: else {φ := φ1 ∨φ2}
16: Return AoS(φ1) ‖ AoS(φ2)
17: else {φ := [t1,t2](φ1)}
18: Create a new set B
19: for each b ∈ AoS(φ1) do
20: b′ = box(b.lt + t1,b.ut + t2,b.lv,b.uv,∗)
21: Add b′ to B
22: end for
23: Return B







To address the problem of projection of formulae containing disjunction (the con-
verse to Theorem 1), AoS utilizes the ‖ operator. If this algorithm instead generated
boxes representing the projection of all formulae, it would be possible for the satis-
142
faction space represented by the boxes to capture signals that the original formula
does not allow. The application in Section 3.6.5 highlights this problem and presents
a way of dealing with it for that particular example.
For operators such as globally (), AoS is exact and produces boxes bound by the
time bounds of the operator that represent the projection of the primitive. However,
operators such as eventually (♦) do not immediately lend themselves to conversion into
a set of boxes. In order to deal with this operator, it is approximated by converting
it into a disjunction of globally predicates. Each globally predicate is generated using
a small threshold value (δ) for its time window width. The new formula requires
that the expression be true in at least one of the smaller time windows essentially
introducing a mandatory δ “hold” time for eventually operators. The tunability of δ
allows for a user to give up some accuracy for gains in performance of box computation
and ultimately distance comparison. Examples of computing the area of satisfaction
boxes for some of the formulae in Example 6 are shown in Figure 3·14.
The SD between two sets of boxes is computed by calculating the area of the sum
of the non-intersected area for each box set.















Figure 3·14c visually illustrates how the SD between φ1 and φ5 is computed.
Note that the SD distance is scaled by T+1T ′+T+1 if the maximum horizon is increased
by T ′. This again shows the temporal nature of the SD as opposed to the PH distance
which does not change.
143
Complexity
The complexity of Algorithm 4 depends on the complexity of the ‖ operation which
may be exponential depending on how it is implemented. Otherwise, the algorithm is
polynomial due to the box combination operations carried out whenever a conjunction
predicate is encountered. In practice, computing the SD distance using this method
for formulae with a few dozen predicates typically takes only a few seconds.
3.6.5 Quantification of design quality
This section shows an example of how the proposed metrics can be used in behav-
ioral synthesis. Behavioral synthesis is an important process in design automation
where the description of a desired behavior is interpreted and a system is created
that implements the desired behavior. The main goal is to check if the characterized
implementations satisfy the specifications of a system. Implementations include sim-
ulations and execution traces of a system. These implementations are characterized
into formal specifications using TLI. This thesis shows that the proposed metrics can
be used in the synthesis step to choose a design from the solution space that can best
implement the desired specification. The specific example chosen to highlight this
application is the synthesis of genetic circuits in synthetic biology.
Synthetic Genetic Circuit Synthesis
This example shows a set of desired behaviors (each formally represented by STL)
which describe the various behaviors expected of a genetic circuit. This set of behav-
iors is referred to as a performance specification: Sφ. Sφ consists of 2 STL formulae:
φlow and φhigh which describe the desired amount of output produced by the genetic
144
circuit over time:
φlow = [0,300](x < 40)∧[0,300](x > 0)
φhigh = [0,125](x < 200)∧[125,300](x < 320) ∧
[0,150](x > 0)∧[150,200](x > 100) ∧
[200,300](x > 150)
In this case, the output of the circuit corresponds to the expression of a fluorescent
protein. φlow specifies that the output must consistently be below 40 units from time
0 to 300, and φhigh specifies that that output must gradually increase over time and
must end up between 150 and 320 units between time 200 and 300. The solution
space consists of two genetic circuits. The first circuit has a constitutive promoter
as shown in Figure 3·15a. Constitutive expression removes flexibility for consistency
allowing constant protein production independent of the state or inputs of the system,
which is highlighted in Figure 3·15c. The second circuit has an inducible promoter: a
sugar detecting transcription factor AraC*, which will turn on the protein production
if and only if it is in the presence of a specific input molecule (arabinose) as shown in
Figure 3·15b. Figure 3·15d shows the output of the circuit for various concentrations
of arabinose. Both of these synthetic genetic circuits were built in Escherichia coli.
The traces were obtained from biological experiments by measuring fluorescence.
The goal is to choose the circuit that can “satisfy as many behaviors as possible”
in Sφ. It is important to note here that it is difficult to express the term “satisfy as
many behaviors as possible” using the syntax and semantics of STL. For instance,
expressing the desired specification as a disjunction of all the formulae in Sφ would
imply that satisfying any one specification is sufficient for the genetic circuit to satisfy
the performance specification. Similarly, expressing the desired specification as a
145
(a) Constitutive Expression (b) Induction Circuit
(c) Output of Constitutive Expression (d) Output of Induction Circuit
Figure 3·15: a and b show SBOL Visual representations of the genetic
circuits with a constitutive promoter and an inducible promoter, respectively.
Biological traces in c and d were obtained by evaluating geometric mean
fluorescence at regular intervals by flow cytometry.
conjunction of all the formulae in Sφ would imply that at any point in time, the
output of a genetic circuit must have multiple distinct values, which is physically
impossible.
This conundrum is highlighted in the current example. The output of constitutive
expression satisfies φlow but cannot satisfy φhigh. The induction circuit produces
traces that can satisfy both φlow and φhigh. However, traditional model checking
techniques may not help a designer choose the desired circuit. Using statistical model
checking, for example, the circuit with constitutive expression yields a satisfaction
likelihood of 1.0 and the induction circuit yields a satisfaction likelihood of 0.83
when checked against φlow ∨ φhigh. With these results, one might think that the
146
circuit with constitutive expression best satisfies the performance specification.
To address the issue of satisfying as many behaviors as possible, the performance
specification’s region of satisfaction is treated as the union of the regions of satisfaction
of all the formulae in Sφ as shown in Figure 3·16a. This region is computed by
taking the union of the generated boxes for each formula that are computed using
Algorithm 4. The union of the box sets of all STL formulae in Sφ is represented as
Bφlow ∪Bφhigh = BSφ and is shown in Figure 3·16b.
(a) P (φlow ∨φhigh) (b) Bφlow ∪Bφhigh = BSφ
Figure 3·16: a shows the union of the areas of satisfaction for φlow and
φhigh. The vertical stripe and horizontal stripe areas represent P (φlow) and
P (φhigh), respectively. b shows the boxes created for Bφlow ∪Bφhigh .
Grid TLI [Vaidyanathan et al., 2017], was used to produce STL formulae, φcon
and φind , for each circuit using the traces shown in Figures 3·15c and 3·15d, respec-
tively. The SD metric was used to get the following values: dSD(BSφ ,Bφcon) = 0.636
and dSD(BSφ ,Bφind ) = 0.304. The PH metric yields: dPH (BSφ ,Bφcon) = 0.067 and
dPH (BSφ ,Bφind ) = 0. These results imply that the behavior of the induction circuit is
closer to the desired specification than the circuit with constitutive expression, and
thus, it should be selected as the desired circuit.
147
3.7 Phoenix - An Implementation of Genetic Temporal Logic Syn-
thesis
3.7.1 Phoenix Design Environment - Overview
The Phoenix tool is a software implementation of the GTLS framework shown in
Fig. 3·7. The Phoenix software provides users with a design automation environ-
ment (similar to the Cello software) in the form for a web-based interface. This
section describes how the Phoenix software converts structural specification (Eugene
code), performance specification (STL), and a library represented in the SBOL for-
mat [Galdzicki et al., 2014] and stored in a Synbiohub collection [McLaughlin et al.,
2018] into a genetic circuit configuration with the best score (α).
3.7.2 Structural Specification - miniEugene
Eugene is a design specification language that enables the integration of biological
rules while designing genetic circuits [Oberortner et al., 2014]. These rules (cate-
gorized as Counting, Pairing, Positioning, Orientation, and Interactions) can help
specify structural constraints of a genetic circuit topology.
Phoenix, uses the miniEugene framework, which is a simplified version of Eugene.
The syntax of miniEugene has been extended to include identifiers that specify a dis-
tinct type of DNA Component (Promoter, RBS, CDS, Terminator) as shown in Table
3.3. This extended syntax is referred to as the structural specification in Phoenix.
The miniEugene software was used to enumerate a design space of circuit topologies
that satisfy the rules specified in the structural specification.
The DNA Components in the structural specification can also broadly be cate-
gorized as Specific or Generic. The identifiers for specific components end with a
numeric value (e.g. ap0, f c1, r5, etc). Every instance of a specific component with
148
the same identifier must have the same DNA component assigned from the library.
Any two specific components with different identifiers must be assigned unique DNA
components.
The structural specification must also start with the following comment:
//Size=n
where n is a positive integer which indicates the size (number of DNA components) of
the genetic circuit. If this line is absent, Phoenix iterates through values ranging from
4 to |L|, (where |L| is the number of DNA components in the library) until miniEugene
can generate designs. If there are no designs found for the structural specification,
the software returns an error.
Examples
















^ap[0-9]+$ Specific Activatable Promoter - activated by a
transcription factor.
^rp[0-9]+$ Specific Repressible Promoter - repressed by a
transcription factor.
^cp[0-9]+$ Specific Constitutive Promoter - not regulated by
a transcription factor
p Generic Promoter - can be any promoter (Activat-
able, Repressible or Constitutive)
^r[0-9]+$ Specific RBS.
r Generic RBS.
^ac[0-9]+$ Specific Activating CDS.
^rc[0-9]+$ Specific Repressing CDS.
^fc[0-9]+$ Specific Fluorescent CDS.




Table 3.3: List of miniEugene identifiers used in Phoenix. The first col-
umn displays the regular expression that describes the the miniEugene
identifiers while the second columns describes each identifier. The reg-
ular expression ^[0-9]+$ denotes a sequence of numbers ranging from
0 to 9.
This specification yields exactly 1 design as shown in Figure 3·17a.




sequence cp0, r, fc0, t
ALL_FORWARD
This specification equivalent to the specification shown in Example 7 (Figure 3·17a).
Example 9 - Two transcriptional units where the first component can either be a
specific activatable promoter or a specific repressible promoter and both the RBS must
be the same.
//Size=8
sequence [ap0|rp0], r0, c, t, p, r0, c, t
ALL_FORWARD
This structural specification yields exactly two designs shown in Figure 3·17b and
3·17c.
Example 10 - Four transcriptional units with functional constraints.
//Size=16






This structural specification yields exactly one design shown in 3·17d. Note that
in this design, the biological interactions are also constrained. For instance, when a
promoter is assigned to ap0, it must be an activatable promoter and must be activated




cp0 fc0 ap0 rp0r0 r0 r0 r0
cp0 ac0 ap0 rc0 rp0 ac1 ap1 fc0
Figure 3·17: Circuit topologies produced by miniEugene
3.7.3 Structural Specification to Rooted Tree
Phoenix uses the miniEugene solver to enumerate all circuit topologies that can sat-
isfy the structural specification specified using the extended miniEugene language
(Section 3.7.2). Each topology generated by miniEugene is first checked for biological
validity (Section 3.3.2).
Each biologically valid abstract circuit topology is then converted to a root vertex
(Section 3.3.2). This is done by creating a Vertex object which contains an ordered
list of abstract DNA components. The root vertex is then decomposed using Vertex
decomposition.
Vertex Decomposition
Vertex decomposition is the process by which a directed rooted tree is created from
the root vertex. The root vertex describes an abstract circuit topology while the
children of the root vertex either specify a transcriptional unit, DNA components, or
composite DNA components. This helps in constructing models for the circuit using
a bottom up approach.
First, all DNA components of the root vertex are read from the first element to
152
the last element and two arrays are created - “forward components” and “reverse
components”. If the DNA component is oriented from 5’ to 3’, it is added to the
forward components array and if a DNA component is oriented from 3’ to 5’, it is
added to the reverse components array. Once this step is completed, the order of the
reverse components array is flipped. Both arrays are then scanned from left to right
(first element to the last element) to identify complete transcriptional units. Each
transcriptional unit vertex is added as a child of the root vertex.
The components of the transcriptional unit are then scanned from left to right.
The Promoter and RBS components are combined to form a Composite DNA compo-
nent. These components make up a leaf vertex which is added as the first child of the
transcriptional unit. The second leaf component comprises of the CDS component.
Since the terminator is not modelled separately, the terminator component has been
appended to the Promoter-RBS leaf vertex (for the sake of simplicity). This process
has been highlighted in Fig.3·18.
It is important to note that the decomposition method used in this framework was
dictated by the modelling paradigms that were adopted in the Phoenix framework.
However, this could potentially change if a different modelling approach was used.
For instance, if terminators also had separate models, then a separate leaf vertex
consisting of the terminator could be created. Similarly, if ribozyme insulators were
used, promoters and RBS would not be coupled to form a composite DNA component.
3.7.4 Part and Module Assignment
This section describes how parts (DNA components and composite DNA compo-
nents) and modules are assigned to abstract circuit topologies and scored against a
performance specification to identify the circuit configuration with the highest score.
The Phoenix software employs two search algorithms to explore the solution space of
153
TF0 TF1
Promoter-RBS CDS Promoter-RBS CDS














kexpress kloss0 basal, max, kloss1
n, Kd
Figure 3·18: Vertex Decomposition and Model Composition in
Phoenix. Vertex decomposition helps identify modular transcriptional
units for which models can be built based on the type of DNA com-
ponents in the transcriptional unit. The model for the circuit is then
built using a bottom-up approach.
genetic circuit configurations.
Exhaustive Search
The exhaustive search algorithm takes a directed rooted tree T , a library L, and a
performance specification Φ as inputs, searches the entire solution space, and returns
154
the genetic circuit configuration with the highest score (α) as the result.
For each leaf vertex in T , a set of candidates are assigned from L. Candidates are
assigned based on the component type as shown in Table 3.4.
Component Type Candidates
Specific Activatable Promoter Activatable Promoters and Induced Activatable
Promoters.
Specific Repressible Promoter Repressible Promoters and Induced Repressible
Promoters.
Specific Constitutive Promoter Constitutive Promoters.
Generic Promoter Activatable Promoters, Repressible Promoters, In-
duced Activatable Promoters, Induced Repressible
Promoters, and Constitutitve Promoters.
Specific RBS RBS.
Generic RBS RBS.
Specific Activating CDS Activating CDS.
Specific Repressing CDS Repressing CDS.
Specific Fluorescent CDS Fluorescent CDS.




Table 3.4: Candidate Assignment for DNA components.
In case of composite DNA components, candidates are assigned only if there ex-
ists a composite DNA component in the library such that each component type in
the candidate matches the component type in the abstract component. For instance,
155
consider a composite component: cp0-r which represents a composite DNA compo-
nent with a Specific Constitutive Promoter followed by a Generic RBS. In this case,
all candidates must be a composite DNA component with a Constitutive Promoter
followed by an RBS.
Once candidates are assigned to leaf vertices, candidates for the genetic circuit are
assigned by computing the Cartesian product of all the sets of candidates for the leaf
vertices as shown in Fig.3·19. Let this set of candidate circuit assignments formed by
computing the Cartesian product of all the candidates of the leaf vertices be X. Each
candidate x ∈ X is then checked for validity using the rules in Section 3.3.2. A set
Y ⊆ X of all valid candidate circuit assignments is created. A model is then created
for each circuit assignment in Y using model composition.
Figure 3·19: Part Assignment in Phoenix using Exhaustive Search
Each candidate circuit assignment y ∈ Y is then scanned to see if it contains any
induced activating module or induced repressing module. If an induced module is
detected, then for each unique small molecule (inducer) present in the system, a set
of concentrations are created by sampling values on a log scale between the min-
imum and maximum concentrations of the small molecule specified in the library
L. The concentrations are sampled from between 10−6 to 1 times the maximum
value as long as each concentration value is greater than or equal to the minimum
value and lesser than or equal to the maximum value. For instance, consider a small
156
molecule inducer “luxAHL” where the minimum value specified is 0nM and the max-
imum value is 1000nM. In this case, the small molecule concentrations considered are
= {0,0.001,0.01,0.1,1,10,100,1000}. Next, a Cartersian product of all sets of small
molecule concentrations for a specific circuit assignment is computed.
Finally, for each circuit assignment in y ∈ Y , parameters for the circuit model
are assigned based on the candidate components. If the circuit assignment has small
molecules, then for each combination of small molecule concentration (in the Cartesian
product computed earlier), initial concentrations of small molecules are assigned and
simulated as a separate circuit configuration. If the circuit assignment does not have
any small molecules, then the model of the circuit assignment is simulated as is.
Using iBioSim [Myers et al., 2009] either deterministic or stochastic (using gillespie’s
algorithm) simulations are performed. These simulations are then checked against
the performance specification Φ and the corresponding score α is computed. The
circuit assignment or circuit configuration with the highest score (α) is returned as
the solution. These steps have been highlighted in the Algorithm 5.
Performance
Exhaustive search depends completely upon the size of the library and the number
of DNA components in the abstract circuit topology. Table 3.5 shows the number
of circuit configurations found using exhaustive search for various abstract circuit
topologies in Fig. 3·20 using the library (Fig. 3·26 designed for the case studies in
Section 3.8.
157
Algorithm 5 Exhaustive Search
Input: Rooted Tree T , Library L, and STL Φ
Output: Circuit Assignment/Configuration C, Score α
1: for each leaf vertex vi ∈ T do
2: Add candidate components from L to a set Xvi
3: end for
4: Compute X =

vi=leaf vertexXvi
5: Create a new Set Y
6: for each x ∈ X do
7: if x is biologically valid then
8: Add x to Y
9: end if
10: end for
11: Create a new Set Z
12: for each y ∈ Y do
13: if y has inducers then
14: Identify set of unique inducers I in y
15: for each i ∈ I do
16: Create a set of concentrations Ji
17: end for
18: Compute J =

i∈I Ji
19: for each j ∈ J do
20: Create a circuit configuration z and add it to Z
21: end for
22: else
23: Add y to Z
24: end if
25: end for
26: Let C = z0 ∈ Z and let α = αC
27: for each z ∈ Z do
28: Compute αz against the specification Φ.
29: if αz > α then
30: C = z and α = αz
31: end if
32: end for
33: Return C and α
158
Topology Candidates Valid Can-
didates
0 SM 1 SM (x8) 2 SM (x64) Configurations
Fig. 3·20a 45 5 5 - - 5
Fig. 3·20b 2,025 39 21 18 - 165
Fig. 3·20c 91,125 299 76 195 28 3,428
Fig. 3·20d 455,625 2,035 235 1,290 510 43,195
Fig. 3·20e 20 20 - - 20 1,280
Table 3.5: Circuit configurations found using Exhaustive search. Col-
umn Candidates indicates the number of initial circuit assignments gen-
erated for the abstract circuit topology in Column Topology. Column
Valid Candidates shows the number of biologically valid candidates in
all the circuit assignments designed. Column 0 SM indicates the num-
ber of circuit assignments with no small molecule inducers. Column
1 SM (x8) indicates the number of circuit assignments with 1 small
molecule inducer which indicates that 8 configurations will be created
for that assignment. Column 2 SM (x64) indicates the number of cir-
cuit assignments with 2 small molecule inducers which indicates that
64 configurations will be created for that assignment.
(a) Topology 1
(b) Topology 2 (c) Topology 3
(d) Topology 4 (e) Topology 5
Figure 3·20: Abstract circuit topologies. All unlabeled DNA compo-
nents are generic DNA components. Any DNA component with a label
is a specific DNA component.
159
Simulated Annealing
Simulated annealing is a very popular approach for search algorithms, especially when
the solution space is very huge. When circuit configurations are also taken into
account, the design space for GTLS can explode very quickly. This section explains
the details of the simulated annealing approach along with the modifications made
to the standard algorithm to make it effective.
Algorithm 6 gives a high level overview of the simulated annealing algorithm.
Algorithm 6 Simulated Annealing
Input: Rooted Tree T , Library L, and STL Φ
Output: Circuit Assignment/Configuration C, Score α
1: Let temp = 10000
2: Let coolingRate = 0.045
3: Let vr be the root vertex of T .
4: Create a circuit configuration C by assigning random DNA components to vr
5: while C is Biologically invalid do
6: Re-assign random DNA components to vr
7: end while
8: Find α for C
9: while temp > 1 do
10: Find a neighbor Cn for C
11: Compute the score αn for Cn
12: Compute the Acceptance Probability (prob) for Cn and C using Eq. 3.24
13: Randomly choose a value (random) between 0 and 1.
14: if prob > random then
15: C = Cn
16: α = αn
17: end if
18: temp = temp * (1 - coolingRate)
19: end while
20: return C and α
Finding a biologically valid Neighbor
Finding a biologically valid neighbor is a non-trivial step. Typically, in problems
with similar structure, a random element from the current node can be swapped with
160
another random element either within the current node or from the library. How-
ever, in the current framework, swapping a DNA component in the current node with
any random DNA component cannot guarantee a biologically valid circuit assign-
ment. Hence additional swaps might be required to ensure that the neighbor is also
biologically valid. Algorithm 7
Algorithm 7 Finding a biologically valid Neighbor
Input: Current node C, Library L
Output: Neighboring node Cn which is biologically valid.
1: while Cn is biologically valid do
2: Step 1: Randomly choose DNA component or biochemical component c ∈ C
3: if c is a Small Molecule Inducer then
4: RETURN Cn = C, with new small molecule concentration for c.
5: else if c is a Small Molecule Inducer then
6: if c is part of a Composite DNA component then
7: Create Cn by swapping c with another DNA component only if the com-
posite DNA component exists in L.
8: else if c is part of a Module then
9: if Swapping c causes the module to be incomplete then
10: Create Cn by swapping the entire module
11: else
12: Create Cn by swapping c with another DNA component
13: end if
14: end if
15: if Cn is biologically valid then
16: Return Cn
17: else




This is highlighted by the following example. Consider a simple circuit topology
described by the following miniEugene code (visually represented in Fig. 3·21):
//Size=12




According to this specification, the second and third RBS component must be exactly
the same, while the first RBS component must be different from the second and third
RBS component.
Figure 3·21: Design Constraint Example for Simulated Annealing
Fig. 3·22 highlights two examples of finding two neighbors of the same current
node. The example uses the parts created for the Phoenix case studies (Fig. 3·26.
Calculating the acceptance probability





where prob is the acceptance probability, ∆ is the difference between the scores (αCn−
αC), T is the current temperature, and H is the hamming distance.
If each circuit configuration were represented as a string (κ), where each DNA
component and each small molecule inducer concentration for each inducer was rep-
resented by a unique character, the hamming distance H is the number of positions
at which corresponding letters of κC and κCn are different (Fig. 3·23.
Including the hamming distance in the acceptance probability reduces the chance
of the simulated annealing algorithm behaving like a random walk algorithm. Fig. 3·24
shows various values of ∆ based on certain fixed parameters.
162
(a) This shows a straightforward swap to find
a neighbor.
(b) In this case, additional elements of the current node have to be
swapped to ensure that the neighbor is biologically valid.
Figure 3·22: Finding a biologically valid neighbor.
163
(a) Hamming Distance = 2 (b) Hamming Distance = 5
Figure 3·23: Hamming Distance between two circuit configurations.
(a) Acceptance probabilities for various val-
ues of ∆ when H=1
(b) Acceptance probabilities for various val-
ues of ∆ when T=100
(c) Acceptance probabilities for various val-
ues of ∆ when T=1000
Figure 3·24: Acceptance probabilities for various values of ∆.
164
Performance
Simulated annealing generally performs well. Within a few steps, the algorithm usu-
ally reaches the best circuit configuration, or a circuit configuration with a score
pretty close to the best score. It is also a lot faster than exhaustive search and typ-
ically runs in a constant amount of time due to the fixed number of nodes that are
visited for each run. Fig. 3·25 highlights the performance of simulated annealing.
Figure 3·25: Performance of Simulated Annealing in Phoenix. The
blue line indicates the best score found using exhaustive search. The
black line indicates scores of each neighbor visited while running Alg. 6.




The Phoenix software was used to design genetic circuits for three case studies which
highlight the impact of using the GTLS framework in designing genetic circuits with
complex behaviors.
A library of DNA components, composite DNA components, and modules was
designed for these case studies, which is illustrated in Fig. 3·26 using visual SBOL
notation [Quinn et al., 2013]. The model chosen was a standard biochemical hill
equation model (as shown in Fig. 3·18). The parameters for the parts were obtained
using parameter inference from the data collected from time series experiments. The
following list explains the parameters used in the models:
• kloss: is the sum term of dilution and degradation of proteins in the cell.
• kexpress: is the sum term of all expression for a given constitutive promoter.
Modules that consist of non-inducible repressors have the following 4 parameters:
• basal: is the amount of protein expressed from the promoter even with no input
(or 0 input).
• max: is the maximum amount of protein expressed (low input for repressors,
high input for activators)
• n is the Hill Co-effecient, and represents the cooperativity.
• Kd : is the 1/2 activation threshold; at this value of input the output is 1/2 the
maximum.
If the module as an inducible element (inducible activator or inducible repressor),
there are two hill terms: the hill function that represents the amount of protein
166
activated by the small molecule (where SM) is added to the term (e.g. Kd_SM and
nSM), and the hill function that represents the amount of activated protein that binds
the the promoter (where T F) is added to the term (e.g. Kd_T F and nT F). The max
and min parameters for small molecule inducers indicate the maximum and minimum










RBS30 RBS31 RBS33 ter
Composite Parts







Figure 3·26: Library of DNA components, composite DNA compo-
nents, and modules designed for Phoenix.
The parameters for various parts have been shown in Tables 3.6, 3.7, 3.8, and
3.9.
167
Table 3.6: Parameters for Constitutive Promoter–RBS Composite







Table 3.7: Parameters for Activatable and Repressible Promoter–RBS
Composite DNA components in Phoenix
Part basal max kd n Kd_SM nSM maxT F Kd_T F nT F
pPhl-RBS30 0.969 928.567 57.191 1.337 - - - - -
pLas-RBS30 0.881 15.653 10199.337 1.719 75.976 1.831 895.850 92.818 0.776
pLux-RBS30 1.810 4.622 4023.220 1.890 76.658 1.428 622.359 257.013 0.880
pRhl-RBS31 2.527 1.558 862.809 0.940 370260.028 2.667 209.984 0.000 0.421














This case study is intended to be a very simple “Hello World” application that illus-
trates how Phoenix operates. The goal is to design a genetic circuit which has just
1 transcriptional unit and is encoded with constitutive expression over a period of
time.
Performance and Structural Specifications
The performance specification is given by the following STL formula:
G[0,300] (out0 >= 2000 && out0 < 5000)
This specification states that between time 0 mins and 300 mins, the output CDS
must display expression values between 2000 and 5000 MEFL.











This specification states that the circuit topology must contain exactly one generic
promoter, one generic RBS, one generic CDS, and one generic terminator each. It
also states that all components must be in the forward orientation (5’ to 3’) and the
generic promoter must drive the expression of the generic CDS.
169
Fig. 3·27 shows a visual representation of the structural and performance specifi-
cations.
(a) (b)
Figure 3·27: Specifications for the Phoenix Constitutive Expression
case study. Fig. 3·27a shows the region of satisfaction for the perfor-
mance specification, while Fig. 3·27b shows the abstract circuit topol-
ogy generated by miniEugene.
Design
This case study uses an exhaustive search since the solution space is quite small.
The design step produces 45 candidate circuit assignments out of which only 5 are
biologically valid candidate circuit assignments (Fig. 3·28).
Both deterministic (Fig. 3·29) and stochastic (Fig. 3·30) simulations were per-
formed for all 5 candidates circuit assignments. These simulations were then com-
pared against the performance specification and Candidate 0 - (pLacIq - RBS30 -
GFP - ter) yielded positive robustness (for deterministic simulations) and a statisti-
cal model checking score of 1 (for stochastic simulations). All other candidates yielded
negative robustness scores and statistical model checking scores equalling 0. Fig. 3·31
highlights how Candidate 0 satisfies the specification while all other candidates do
not satisfy the specification.
170
(a) Candidate 0: pLacIq-RBS30-GFP-Ter (b) Candidate 1: pLacI-RBS31-GFP-Ter
(c) Candidate 2: pLacI-RBS33-GFP-Ter (d) Candidate 3: pLacIq-RBS31-GFP-Ter
(e) Candidate 4: pLacI-RBS30-GFP-Ter
Figure 3·28: SBOL visual representations of candidate circuit assign-
ments for the Phoenix Constitutive Expression case study.
Test
All 5 candidates were tested in the lab. Time series measurements were taken at the
following time points (in mins): 0, 10, 20, 30, 45, 60, 150, 300, to produce a single trace
for each candidate. As predicted, Candidate 0: pLacIq-RBS30-GFP-Ter satisfied the
performance specification while all other candidates had a negative robustness value
(as shown in Fig. 3·32) The robustness of the deterministic trace of candidate 0 was
2109.16 while the robustness of the experimental trace of candidate 0 was 1952.
171
(a) Candidate 0 (b) Candidate 1
(c) Candidate 2 (d) Candidate 3
(e) Candidate 4
Figure 3·29: Deterministic Simulations for the Phoenix Constitutive
Expression case study.
172
(a) Candidate 0 (b) Candidate 1
(c) Candidate 2 (d) Candidate 3
(e) Candidate 4
Figure 3·30: Stochastic Simulations for the Phoenix Constitutive Ex-
pression case study.
173
(a) Candidate 0 (b) Candidate 1
(c) Candidate 2 (d) Candidate 3
(e) Candidate 4
Figure 3·31: Stochastic Simulations for the Phoenix Constitutive Ex-
pression case study against the the performance specification. The
region shaded in blue indicates the region of satisfaction of the perfor-
mance specification.
174
Figure 3·32: Experimental validation for Phoenix Constitutive Expres-
sion case study.
3.8.2 “Low” to “High”
One of the challenges in designing robust genetic circuits with specific functions is
the use of generic and ambiguous terms while specifying design constraints or system
specifications. One such commonly used example is the term “low” expression or
“high” expression which typically describes an arbitrarily defined level of expression.
This issue is compounded by the fact that measurements in biological experiments
are typically looked at in the log-scale. Hence, circuits with vastly different behav-
iors could be construed as similar in behavior. This case study highlights a specific
example and shows how Phoenix can be used to solve this problem.
Performance and Structural Specifications
The structural specification is given by the following miniEugene code.
//Size=8




This specification states that the circuit topology must have 8 DNA components (all
in the forward orientation) in the following order: Generic Promoter, Generic RBS,
Generic CDS, Generic Terminator, Generic Promoter, Generic RBS, Generic CDS,
Generic Terminator.
The performance specification is given by the following STL formula:
G[10,20](out > 1000 && out < 2500) && G[150,300](out > 7500 && out <
15000)
This specification states that the output CDS expression must initially be between
1000 and 2500 MEFL from time 10 to 20 mins and then must increase to values
between 7500 and 15000 MEFL from time 150 to 300mins. This specification provides
a very precise description of the “low” and “high” levels expected of the genetic circuit.
Fig. 3·33 shows a visual representation of the structural and performance specifi-
cations.
Design and Test
The design step produces 2,025 candidate circuit assignments out of which only 39 are
biologically valid. Out of these 39 candidate circuit assignments, 21 assignments have
no small molecule inducers involved while 18 have exactly 1 unique small molecule
inducer involved in the induction of a promoter. Hence by sampling 8 unique small
molecule concentrations, there are (18x8) + 21 = 165 circuit configurations.
Of these circuit configurations, consider the circuit assignment shown in Fig. 3·34
with the following two configurations:
• 10nM of luxAHL is added at time 0 mins.
176
(a) (b)
Figure 3·33: Specifications for the Phoenix “Low” to “High” case study.
Fig. 3·33a shows the region of satisfaction for the performance specifi-
cation, while Fig. 3·33b shows the abstract circuit topology generated
by miniEugene.
• 1000nM of luxAHL is added at time 0 mins.
Figure 3·34: Candidate circuit assignment designed by Phoenix for the
“Low” to “High” case study.
Fig.3·35a and 3·35b show deterministic simulations produced by Phoenix for 10nM
and 1000nM of AHL (respectively) added to the candidate circuit assignment. A
cursory glance indicates that both simulations seem pretty similar and the expression
levels seem to rise from a “low” level to a “high” level. The experimental results
177
in Fig.3·35c and 3·35d for 10nM and 1000nM of AHL (respectively) added to the
candidate circuit assignment, also match the simulations produced by Phoenix.
(a) luxAHL = 10nm (b) luxAHL = 1000nm
(c) luxAHL = 10nm (d) luxAHL = 1000nm
Figure 3·35: Deterministic Simulation and Experimental Traces for the
Phoenix “Low” to “High” case study. Fig. 3·35a and 3·35b show deter-
ministic simulation traces while Fig. 3·35c and 3·35d show experimental
traces.
However, when the experimental traces are checked against the performance spec-
ification (Fig. 3·36, the circuit configuration with luxAHL=10nM yields a positive
robustness value of 70 while the circuit configuration with luxAHL=1000nM yields
a negative robustness value of -20160. Hence Phoenix accurately predicted the cir-
cuit configuration (along with specifics including the concentration of small molecule
178
inducer to be added) that can satisfy the performance specification.
Figure 3·36: Deterministic Simulations and Experimental Traces for
the Phoenix “Low” to “High” case study checked against the perfor-
mance specification. The black line indicates the deterministic simu-
lation trace produced by Phoenix when 10nM of luxAHL is added at
time 0 min. The green line indicates the experimental trace measured
when 10nM of luxAHL is added at time 0 min and the red line indicates




This case study illustrates the functional richness of STL as a performance specifica-
tion to describe interesting temporal behaviors. It also highlights the feature shown
in the previous case study, where Phoenix can identify exact circuit configurations
to help biologists design very specific experiments. The goal of this case study is to
design a circuit where the expression of the output CDS resembles a pulse (i.e. the
expression starts low, and then increases and eventually decreases again).
Performance and Structural Specifications
The performance specification chosen for this case study is represented by the follow-
ing STL formula:
G[0,10](out > 0 && out < 2000 )
&&
G[100,180](out > 7000 && out < 8000)
&&
G[400,600](out > 4000 && out < 5500)
This formula has 3 distinct parts. In the first one, the output CDS must display
expression levels between 0 and 200 MEFL from 0 to 10 mins. Next, the expression
levels must increase and be between 7000 and 8000 MEFL from 100 to 180 mins.
Finally, the expression levels must drop to values between 4000 and 5500 from 400 to
600 mins.
The structural specification for the pulse circuit is given by the following miniEu-
gene code:
//Size=16








This states that the circuit topology must have exactly 16 DNA components all in
the forward orientation the components in the following order: Constitutive Pro-
moter (cp1), Generic RBS, Activating CDS (ac1), Generic Terminator, Activatable
Promoter (ap1), Generic RBS, Repressing CDS, Generic Terminator, Repressible Pro-
moter (rp1), Generic RBS, Activating CDS (ac2), Generic Terminator, Activatable
Promoter (ap2), Generic RBS, Output CDS (fc1), Generic Terminator. Specification
also states that ac1 must activate ap1, ac2 must activate ap2, and rc1 must repress
rp1.
Fig. 3·37 shows a visual representation of the structural and performance specifi-
cations.
(a) (b)
Figure 3·37: Specifications for the Phoenix Pulse case study. Fig. 3·37a
shows the region of satisfaction for the performance specification, while
Fig. 3·37b shows the abstract circuit topology generated by miniEugene.
181
Design
The design step in Phoenix generates 20 candidate circuit assignments. All 20 circuits
assignments(fig. 3·38 are biologically valid (due to the nature of the library and the
structural constraints imposed). Each of the 20 circuit assignments have 2 unique
small molecule inducers each. Sampling 8 concentrations for each small molecule
inducer, yields 64 configurations for each circuit assignment. Hence the total number
of circuit configurations are 64x20 = 1280.
Figure 3·38: All candidate circuit assignments designed by Phoenix for
the Pulse case study.
However, the most interesting candidate assignment is: pLacIq - RBS30 - LuxR
- Ter - pLux - RBS30 - PhlF - Ter - pPhl - RBS30 - RhlR - Ter - pRhl - RBS31 -
GFP - Ter shown in Fig. 3·39. This circuit has 2 small molecule inducers: luxAHL
and rhlAHL. By varying the small molecule concentrations, the circuit exhibits varied
types of behaviors.
Figure 3·39: Candidate Circuit assignment for Phoenix for the Pulse
case study.
182
Fig. 3·40 shows 4 different circuit configurations:
• At time 0 min, 0.01nM of luxAHL (no rhlAHL is added).
• At time 0 min, 0.01nM of luxAHL and 10000nM of rhlAHL is added.
• At time 0 min, 0.01nM of luxAHL and 1000nM of rhlAHL is added.
• At time 0 min, 0.01nM of luxAHL and 10000nM of rhlAHL is added.
(a) luxAHL = 0.01nM (b) luxAHL = 0.01nM, rhlAHL = 10000nM
(c) luxAHL = 10nM, rhlAHL = 1000nM (d) luxAHL = 10nM, rhlAHL = 10000nM
Figure 3·40: Stochastic simulations of a candidate circuit assignment
for the pulse case study with different configurations.
This case study is again interesting because Phoenix can go through the entire
183
solution space, and design a circuit configuration that can realize a complex temporal
behavior specification.
3.9 Discussion
This chapter presents a new framework: Genetic Temporal Logic Synthesis, that uses
performance specifications to reliably describe accurate behaviors of genetic systems.
Such an approach is vital towards being able to build robust genetic circuits. It is
also a stepping stone for exciting breakthroughs in stem cell research and pattern
formation.
This chapter also presented an implementation of GTLS: Phoenix. With Phoenix,
biologists can create a formal, performance-bound specification for complex biolog-
ical systems like genetic circuits, and run finite-time simulations for modular de-
sign components to determine genetic circuits with high likelihood of satisfying their
specification. After DNA assembly and genetic circuit characterization, Phoenix can
verify whether the collected data satisfies the performance specification. Phoenix
seamlessly connects existing and novel bio-design automation tools to create a closed-
loop, algorithm-driven design workflow for one possible solution to a complete genetic
circuit design-build-test cycle where the only inputs are design specifications and
collected data and the only outputs are DNA assembly and genetic circuit testing




Conclusions and Future Work
Synthetic biology continues to see technological gains in its ability to “write” DNA
(gene synthesis) and “read” DNA (gene sequencing). As this capacity grows it will
be crucial that the level of abstraction at which new designs are created be raised.
This phenomenon parallels the increasing silicon resources seen in computing. Simi-
larly, this abstraction will be eased if logical expressions of required behavior can be
expressed at a high level and automatically transformed to physical realities. This
thesis has presented outlines of two frameworks that will allow for such formalisms
to exist.
Chapter 2 introduced a framework for Genetic Logic synthesis which showed how
genetic circuits encoded with Boolean logic could be designed. This chapter also pre-
sented an implementation for GLS: Cello, which is a highly successful implementation
of a well established engineering concept (digital logic synthesis and circuit design)
in the field of synthetic biology.
Chapter 3 introduced the idea of using performance specifications to specify be-
havior of genetic circuits over time within a framework for Genetic Temporal Logic
Synthesis. This chapter presented a biologically validated implementation for GTLS
called Phoenix. This chapter also presented two concepts - GridTLI: A temporal
logic inference concept that can learn temporal properties from data without requir-
ing ‘negative traces’, and metrics for STL which will enable the comparison of two
STL formulae. An interesting extension of this work would be the application of
185
Phoenix with other tools and frameworks that produce models of genetic circuits and
simulate time-series data [Pedersen and Phillips, 2009,Mısırlı et al., 2018].
As the design complexity of the genetic circuits increases, it is important to plan
ahead and look into ways to automate the process of designing and testing the genetic
circuits produced by Cello and Phoenix in the lab. An extension to the frameworks
presented in this thesis would be to use techniques such as automated robotic liq-
uid handling assembly [Ortiz et al., 2017] to build modular parts and circuits in the
lab. Another exciting area is the use of microfluidics and other ‘lab on a chip’ tech-
niques [Lashkaripour et al., 2018, Silva, 2017, Crites et al., 2018] to automate the
process of characterizing parts and modules, and testing genetic circuits and systems.
Another interesting extension to the GLS and GTLS frameworks is - parameter
tuning. Parameters used in models are based on properties of the DNA components.
Tuning these components using techniques like Genetic Engineering (strengthening
or weakening a promoter or RBS) and Protein Engineering can yield better param-
eters that can satisfy a specification better. Techniques such as machine learning or
optimization can be used to identify the specific DNA part that must be tuned as
well as the extent to which it must be tuned such that a genetic circuit or genetic
system can better satisfy its specification.
The future of engineering biology will require integrated design across many sub-
cellular systems, including the creation of sensors that can process many stimuli,
management of resources and metabolites, and control over multiple cellular func-
tions (communication, stress response, chemotaxis, etc.). Integrating across these
computer-aided design (CAD) tools in a way that automates design choices and bal-
ances constraints will be critical to advancing the complexity of genetic engineering
projects. One can hope, that the application of the frameworks presented in this
thesis can help spark the third wave of synthetic biology.
186
Appendix A
Application of Grid TLI to a Fuel Control
System













(a) Trajectories from EGO sensor













(b) Trajectories from MAP sensor
Figure A·1: Example testing set for case study A. Every trace in the
set consists of a trajectory in (a) and a corresponding trajectory in (b).
Trajectories colored green are from positively labeled traces, and all
trajectories in red are labeled negative.
This dataset was constructed by modifying a built-in model from Simulink for the
fuel control system of a gasoline engine. The goal of this automotive subsystem is
to maintain the air-to-fuel ratio, i.e. keep the ratio between the air mass and the
fuel mass constant at all times. Two core sensors provide the feedback information
necessary to control the system: the EGO sensor, which reports the amount of residual
oxygen present in the exhaust gas, and the MAP sensor, which reports the (intake)
manifold absolute pressure. The dataset is composed of 1200 traces, with 600 normal
187
traces and 600 anomalous traces. The normal traces were obtained while the system
was operating properly, whereas the anomalous traces were obtained when a fault was
artificially injected into one or both the sensors. Every trace contains 200 samples
from each of the two sensors, and examples of traces are shown in Fig. A·1.
As in the first Grid-TLI case study(section 3.5.5), this case study uses 10-fold
CV to evaluate the output of GridTLI. Stratified sampling was used to insure each
partition contained an equal number of positive and negative traces. Although the
training sets contain negatively labeled traces, GridTLI uses only the positive traces.
The negative traces in the training sets will be used for training in the comparison.
Each training set has 1080 traces with a corresponding testing set of 120 traces.
Since these traces have two dimensions, GridTLI is run once for each. The resulting
STL formula has the form
φ = φEGO ∧φMAP.





t ) and (xMAPt , tMAPt , cMAPt ), respectively. Again the possible values
are restricted for the thresholds using the data:
xEGOt ∈ (0,0.5512] , xMAPt ∈ (0,0.35365] ,
tEGOt ∈ (0,29.85] , tMAPt ∈ (0,29.85] ,
cEGOt ∈ [0,0.5512] , cMAPt ∈ [0,0.35365] ,
and are sampled uniformly from these intervals in the same fashion as the first case
study.
In contrast with the first case study, traces in the testing set will now have four
possible characterizations: TP or FP (false positive), for positive or negative traces,
188
respectively, that satisfy the formula; or FN or TN (true negative), for positive or
negative traces, respectively, that do not satisfy the formula. This also allows for the
use of the misclassification rate (MCR) of testing traces to evaluate the STL formulae.
The MCR is defined as
MCR = | FP | + | FN |
| TP | + | FP | + | FN | + | TN |
.
The minimum MCR over all threshold chosen was 0.0183, and the maximum was
0.0342. Fig. A·2 shows all MCRs against the number of temporal operators in the
corresponding formula.
0 20 40 60




















Figure A·2: Scatter plot of mean MCR against mean number of op-
erators in the STL formula generated by GridTLI for the fuel control
system data.
In an attempt to find threshold values that can give formulae with acceptable
number of operators and effective MCR values, the threshold values associated with
189
the lowest MCRs and smallest number of temporal operators was investigated. In this
case, the number of operators was arbitrarily chosen to be limited to 15 and MCR to
0.05. There are 323 points from Fig. A·2 within these limits. The average threshold
values for these points are (xEGOt = 0.22116, xMAPt = 0.1419, tEGOt = 18.215,, tMAPt =
18.215, cEGOt = 0.25137, cMAPt = 0.16128). These threshold values were used and
the results of GridTLI was compared with a previous approach.
Decision Tree TLI from [Bombara et al., 2016] was used to learn STL formulae
for the same 10-fold CV training/testing sets. Table A.1 shows statistics–in the form
‘mean (std)’–used for comparison. Testing MCR is the mean MCR over all testing
sets, and while the mean MCR for the testing sets of the STL formulae from GridTLI
is higher than that of Decision Tree TLI, it is still reasonably low. Training FPR is the
false positive rate over all training sets. These are interesting numbers to consider
since GridTLI is guaranteed to have zero false negatives among the traces in the
training set but provides no guarantee or bound on false positives. Operators is the
mean number of temporal operators over all learned STL formulae, and it is obvious
that the mean number of temporal operators used in GridTLI is significantly higher
than that of Decision Tree TLI. Runtime is the mean algorithm runtime in seconds
over all training sets, where it can be observed that GridTLI drastically outperforms
Decision Tree TLI.
Table A.1: Comparison of GridTLI and Decision Tree TLI
Testing MCR Operators Runtime (seconds)
Grid-based 0.0242 (0.0186) 7.1 (0.32) 0.12 (0.015)
Decision tree 0.0217 (0.0131) 4.40 (0.70) 379.80 (33.17)
A fair comparison between the method proposed in [Bombara et al., 2016] and
GridTLI is not trivial because the two algorithms were designed to solve different
problems. In particular, [Bombara et al., 2016] assumes the availability of labeled
190
traces, both positives and negatives, whereas GridTLI is designed to work when only
positive traces are available. Another important difference is that Decision Tree TLI
can learn nested temporal operators. With this difference in mind, even though
Decision Tree TLI is slightly more accurate in terms of sheer classification accuracy
and is able to induce slightly shorter formulae, GridTLI can construct an effective
formula much faster using only positive examples.
191
Appendix B
Application of STL Metrics in Loss functions
for TLI
Loss functions play a fundamental role in statistical inference and learning theory.
This framework, usually has a pair (X ,Y ) of real vector spaces corresponding to
the state and observation spaces, respectively. Three ingredients are used in the
formalization: (a) a model of the states pX(x) – the prior distribution, (b) a model
of the observations given the state pY |X(y | x), and (c) a real-valued loss function
L : X ×X → R. Let h : Y → X be a decision rule, which can also be interpreted as a
partition of the state space X based on observations, and H be the set of all decision
rules or the hypothesis space. The frequentist and Bayesian risks are defined based
on the loss functions, and induce optimal decision rules. This general framework is
the basis for the study and design of decision algorithms in statistical inference and
learning. For more details, see [Vapnik, 2013].
This section, shows how this framework for TLI was adapted, where the proposed
metrics, PH and SD, were used as loss functions. This thesis, only focuses on the loss
functions, while characterization of optimal decision rules, their computation, and
regularization are left for future work.
For TLI, the state space X = ΦST is the set of all time-bounded STL formulae,
while the observation space Y = 2ST is the set of all languages. The hypothesis space
H is composed of decision rules that map languages to STL formulae. Lastly, the loss
192
functions are defined as LST L : ΦST ×ΦST →R such that LST L(φ,h(S)) represents the
dissimilarity between the ground truth formula φ and the STL formula obtained by
the decision rule using the signal set S ⊆ ST . In this case, the PH and the SD metrics
can be used as loss functions LST L.
The performance of the two decision rules from TLI: TreeTLI [Bombara et al.,
2016] based on decision trees, and GridTLI [Vaidyanathan et al., 2017] based on
minimum covers of signals in space-time S × [0,T ], have been assessed with respect
to the two metrics as shown in Figure B·1b.
The ground truth STL formula that was used to generate the signals in Figure B·1a
is
φGT =[0,1]((x1 ≤ 0.1)∧ (x2 ≤ 0.6)∧ (x2 ≥ 0.4))
∧[7,10]((x1 ≥ 0.7)∧ ((x2 ≥ 0.8)∨ (x2 ≤ 0.2)))
where S = U2 and T =
∥∥∥φGT ∥∥∥ = 10. Note that TreeTLI requires both positive
and negative examples, while GridTLI only needs positive ones. For brevity, the
formalization for rules that require both types of examples have been omitted.
The results in Figure B·1b show the distances between the ground truth formula
φGT and the iterations of TreeTLI (lower plot) as the decision tree grows [Bombara
et al., 2016]. For GridTLI (upper plot), the discretization thresholds [Vaidyanathan
et al., 2017] were varied from rougher to finer grids, δs ∈ {0.5,0.45, . . . ,0.1} and δt ∈
{5,4.5, . . . ,1} for space and time, respectively. The upper plot for GridTLI highlights
the over-fitting phenomenon in the PH metric (red), where reducing the discretization
thresholds helps reducing the error, but further reduction leads to over-fitting. For
the SD (blue), the loss has a decreasing trend which can be hypothesized as due to a
better temporal fitting that the PH distance does not capture. In the case of TreeTLI
(lower plot), the PH distance (red) is constant. This masking behavior might be due
193
(a) Positive and Negative Signals (b) Results
Figure B·1: a shows the blue positive, and red and orange negative example
signals used by the two TLI algorithms. The positive signals start in the
gray region, and end in one of the black regions. The red negative signals
do not start in the gray region, while the orange ones do not end in the
black regions. b shows the PH and the SD distances between the ground
truth formula φGR and the learned formulae using GridTLI and TreeTLI,
respectively.
to the compounding effect of i) the primitives used do not match the structure of
φGT , and ii) the incremental and local nature of TreeTLI. Thus, the first step of the
decision tree is heavily penalized by the PH metric. The SD metric (blue), which
shows an increasing trend, is consistent with this conclusion.
Thus, the statistical learning approach to TLI gives insight into the ability of
algorithms to recover temporal logic rules assumed to underlie data. It also provides
a formal framework to study TLI methods. A detailed account of problems GridTLI
and TreeTLI are appropriate for based on the insights provided by the proposed
metrics is left for future work.
194
Appendix C
User Constraint File (UCF) for Phoenix
The user constraint file is an easy to read, detailed description of the genetic com-
ponent library. Phoenix enables users to convert the contents of the UCF to the
Synthetic Biology Open Standard(SBOL) format [Cox et al., 2018] which can then
be uploaded to a Synbiohub [McLaughlin et al., 2018] registry. The UCF designed
for Phoenix is an adaptation of the user constraint file used in Cello [Nielsen et al.,
2016]. This section describes the format of the UCF as well as various components
of the UCF.
Format
Like the Cello UCF, the Phoenix UCF is also specified using JavaScript Object No-
tation (JSON). JSON is a very popular data-interchange format since it is language
independent, human readable, and can be very easily parsed by machines. JSON
format consists of attribute-value pairs and array data types. The values in a JSON
attribute-value pair are restricted to strings, numbers, arrays, and objects, while at-
tributes are always strings.
The Phoenix UCF is structured as one JSON Object with the following attributes:
header, parts, composite-parts, modules, and eugene-rules. In this framework, based on
the modeling paradigm used in Phoenix, eugene-rules is the only optional attribute.
All other attributes are required to design genetic circuits.
195
“header”
The value of the header attribute describes the operating conditions for testing the
genetic circuit, as well as meta-data such as version and authors of the UCF. The
value of the header attribute is a JSON object with the following attributes:
version: An alphanumeric string that can be used to document versions of the
UCF.
Required : Yes
author: An array of strings, where each string represents an authors who has
contributed to or modified the UCF.
Required : No
description: A brief description of the UCF. This could also be notes associated
with the UCF.
Required : No
organism: The organism, species, and strain in which the genetic circuits were
built and tested.
Required : Yes
genome: The genotype of the organism.
Required : Yes
temperature: The temperature at which the genetic circuit behave as intended.
Required : Yes





The value of parts is an array of JSON objects. Each JSON object desribes a DNA
component or a Biochemical component (Section 3.3.1). Each part object has the
following attributes:
id: An alphanumeric string that is unique for each DNA component.
Required : Yes
name: The name of the DNA component. If this is omitted, the id value is assigned
as the name of the DNA component.
Required : No
sequence: The string representing sequence of the DNA component (in forward
orientation). The only characters allowed are ATGCatgc.
Required : Yes (except if the part is a small Molecule)
part: A string that identifies the type of DNA component or Biochemical compo-
nent. The only values allowed are: “promoter”, “rbs”, “cds”, “terminator”, and
“smallMolecule”.
Required : Yes
Additionally, based on the type of part, certain additional attributes are required. If
the part is a promoter, the following additional attribute is required.
type: A string that indicates the type of promoter. The only values allowed
are: “constitutive”, “activated”, “activated-induced”, “repressed”, and “repressed-
induced”.
Required : Yes
If the part is a CDS, the following additional attributes are required:
197
type: A string that indicates the type of CDS. The only values allowed are: “con-
nector”, “reporter”.
Required : Yes
k_loss: A float that specifies the degradation and dilution of the protein encoded
by the CDS.
Required : Yes
If the part is a small molecule, the following additional attributes are required:
min: A float that specifies the minimum small molecule concentration.
Required : Yes






































The value of composite-parts is an array of JSON objects. Each JSON object describes
a composite DNA component (Section 3.3.1). In the Phoenix framework, a composite
DNA component is always consists of only a single promoter followed by an RBS. Each
composite-parts object has following attributes:
id: An alphanumeric string that is unique for each composite DNA component.
Required : Yes
199
type: A string that specifies the type of composite DNA component. In the current
modelling paradigm, the value can only be “promoter-rbs”.
Required : Yes
parts: An ordered array of strings where each string represents the id of the DNA
component which is part of the composite DNA component.
Required : Yes
Additionally, composite-parts could also have the following attributes based on the
type of promoter in the composite DNA component:
id: An alphanumeric string that is unique for each composite DNA component.
Required : Yes
k_express: A float that specifies the sum term of all expression for a given con-
stitutive promoter
Required : Only if the composite DNA component contains a constitutive pro-
moter.
basal: A float that specifies the amount of protein expressed from the promoter
even with 0 input
Required : If the promoter in the DNA component is either of type “activated”,
“activated-induced”, “repressed”, or “repressed-induced”.
max: A float that specifies the maximum amount of protein expressed (low input
for repressors, high input for activators)
Required : If the promoter in the DNA component is either of type “activated”,
“activated-induced”, “repressed”, or “repressed-induced”.
K_d: A float that specifies the 1/2 activation threshold, at this value of input the
output is 1/2 the maximum.
200
Required : If the promoter in the DNA component is either of type “activated”,
“activated-induced”, “repressed”, or “repressed-induced”.
n: A float that specifies the Hill Co-effecient, and represents the cooperativity.
Required : If the promoter in the DNA component is either of type “activated”,
“activated-induced”, “repressed”, or “repressed-induced”.
K_d_SM: Same as K_d but for the hill function that represents the amount of
protein activated by the small molecule.
Required : If the promoter in the DNA component is either of type “activated-
induced” or “repressed-induced”.
n_SM: Same as n but for the hill function that represents the amount of protein
activated by the small molecule..
Required : If the promoter in the DNA component is either of type “activated-
induced” or “repressed-induced”.
K_d_TF: Same as K_d but for the hill function that represents the amount of
activated protein that binds the promoter.
Required : If the promoter in the DNA component is either of type “activated-
induced” or “repressed-induced”.
n_TF: Same as n but for the hill function that represents the amount of activated
protein that binds the promoter.











































The value of modules is an array of JSON objects. Each object represents a biological
module (Section 3.3.1). Each object contains the following attributes:
parts: An ordered array of strings where each string represents the id of a DNA
component or a composite DNA component. The ids are ordered such that the id
of the CDS precedes the id of the Promoter-RBS composite DNA component.
Required : Yes
interaction: A string that indicates the type of biological interaction. The string










The value of eugene-rules contains an array of strings where each string is a Eugene
rule that specifies structural constraints of the genetic circuit.




(2002). IEEE standard for verilog register transfer level synthesis. IEEE Std 1364.1-
2002.
Appleton, E. M. (2016). A design-build-test-learn tool for synthetic biology. PhD
thesis, Boston University.
Arkin, A. P. (2013). A wise consistency: engineering biology for conformity, reliabil-
ity, predictability. Current opinion in chemical biology, 17(6):893–901.
Baier, C. and Katoen, J. (2008). Principles of model checking. MIT Press.
Bartocci, E., Bortolussi, L., Nenzi, L., and Sanguinetti, G. (2015). System design of
stochastic models using robustness of temporal properties. Theoretical Computer
Science, 587:3–25.
Bartocci, E., Bortolussi, L., and Sanguinetti, G. (2014a). Data-driven statistical
learning of temporal logic properties. In Formal Modeling and Analysis of Timed
Systems, pages 23–37. Springer.
Bartocci, E., Bortolussi, L., and Sanguinetti, G. (2014b). Data-driven statistical
learning of temporal logic properties. In Legay, A. and Bozga, M., editors, Formal
Modeling and Analysis of Timed Systems, pages 23–37, Cham. Springer Interna-
tional Publishing.
Beal, J., Lu, T., and Weiss, R. (2011). Automatic compilation from high-level
biologically-oriented programming language to genetic regulatory networks. PLoS
ONE, 6.
Beal, J., Weiss, R., Densmore, D., Adler, A., Appleton, E., Babb, J., Bhatia, S.,
Davidsohn, N., Haddock, T., Loyall, J., Schantz, R., Vasilev, V., and Yaman, F.
(2012a). An end-to-end workflow for engineering of biological networks from high-
level specifications. ACS Synthetic Biology, 1(8):317–331.
Beal, J., Weiss, R., Yaman, F., Davidsohn, N., and Adler, A. (2012b). A method for




Bikard, D., Jiang, W., Samai, P., Hochschild, A., Zhang, F., and Marraffini, L. A.
(2013). Programmable repression and activation of bacterial gene expression using
an engineered crispr-cas system. Nucleic Acids Research, 41(15):7429–7437.
Bombara, G., Vasile, C.-I., Penedo, F., Yasuoka, H., and Belta, C. (2016). A Decision
Tree Approach to Data Classification Using Signal Temporal Logic. In Proc.
International Conference on Hybrid Systems: Computation and Control, pages
1–10, New York, NY, USA. ACM.
Brayton, R., Hachtel, G., McMullen, C., and Sangiovanni, A. (1984/5). Espresso-
ii: A new logic minimizer for programmable logic arrays. Proc. 1984 Custom
Integrated Circuits Conference.
Brayton, R. and Mishchenko, A. (2010). Abc: An academic industrial-strength
verification tool. In Computer Aided Verification, LNCS Computer Science, volume
6174, pages 24–40.
Brophy, J. A. and Voigt, C. A. (2014). Principles of genetic circuit design. Nature
Methods, 11(5):508–520.
Bryant, R. E. (1986). Graph-based algorithms for boolean function manipulation.
IEEE Transactions on Computers, C-35, No. 8.
Bufo, S., Bartocci, E., Sanguinetti, G., Borelli, M., Lucangelo, U., and Bortolussi,
L. (2014). Temporal Logic Based Monitoring of Assisted Ventilation in Intensive
Care Patients. In Leveraging Applications of Formal Methods, Verification and
Validation, number 8803 in Lecture Notes in Computer Science, pages 391–403.
Springer.
Cardinale, S. and Arkin, A. P. (2012). Contextualizing context for synthetic biology–
identifying causes of failure of synthetic biological systems. Biotechnology Journal,
7(7):856–866.
Cardinale, S., Joachimiak, M. P., and Arkin, A. P. (2013). Effects of genetic variation
on the e. coli host-circuit interface. Cell reports, 4(2):231–237.
Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey.
ACM computing surveys (CSUR), 41(3):15.
Chappell, J., Takahashi, M. K., and Lucks, J. B. (2015). Creating small transcription
activating RNAs. Nature Chemical Biology, 11(3):214.
Charoonnart, P., Purton, S., and Saksmerprome, V. (2018). Applications of microal-
gal biotechnology for disease control in aquaculture. Biology, 7(2):24.
205
Chu, P. P. (2006). RTL Hardware Design Using VHDL: Coding for Efficiency,
Portability, and Scalability. Wiley-IEEE Press.
Conway, J. B. (2012). A course in abstract analysis, volume 141. American Mathe-
matical Society.
Cox, R. S., Madsen, C., McLaughlin, J. A., Nguyen, T., Roehner, N., Bartley, B.,
Beal, J., Bissell, M., Choi, K., Clancy, K., et al. (2018). Synthetic biology open
language (sbol) version 2.2. 0. Journal of integrative bioinformatics, 15(1).
Cox, R. S., Surette, M. G., and Elowitz, M. B. (2007). Programming gene expression
with combinatorial promoters. Molecular systems biology, 3(1):145.
Crick, F. (1970). Central dogma of molecular biology. Nature, 227(5258):561.
Crick, F. H. (1958). On protein synthesis. In The Symposia of the Society for
Experimental Biology, volume 12, page 8.
Crites, B., Sanka, R., Lippai, J., McDaniel, J., Brisk, P., and Densmore, D. (2018).
Parchmint: A microfluidics benchmark suite. In 2018 IEEE International Sympo-
sium on Workload Characterization (IISWC), pages 78–79. IEEE.
Crockford, D. (2006). The application/json media type for javascript object notation
(json). https://www.rfc-editor.org/info/rfc4627.
Daika, M. S. and Maranas, C. D. (2008). OptCircuit: An optimization based method
for computational design of genetic circuits. BMC Systems Biology, 2(24).
Daniel, R., Rubens, J. R., Sarpeshkar, R., and Lu, T. K. (2013). Synthetic analog
computation in living cells. Nature, 497(7451):619–623.
Del Vecchio, D., Ninfa, A. J., and Sontag, E. D. (2008). Modular cell biology:
retroactivity and insulation. Molecular Systems Biology, 4:161.
Densmore, D. and Hassoun, S. (2012). Design automation for synthetic biological
systems. IEEE Design & Test of Computers, 29(3):7–20.
Dokhanchi, A., Hoxha, B., and Fainekos, G. (2014). 5th International Conference
on Runtime Verification, Toronto, ON, Canada., chapter On-Line Monitoring for
Temporal Logic Robustness, pages 231–246. Springer.
Donzé, A., Ferrere, T., and Maler, O. (2013). Efficient robust monitoring for STL.
In Computer Aided Verification, pages 264–279. Springer.
Donzé, A. and Maler, O. (2010). Robust satisfaction of temporal logic over real-valued
signals. Springer.
206
Donzé, A., Maler, O., Bartocci, E., Nickovic, D., Grosu, R., and Smolka, S. (2012).
On temporal logic and signal processing. In International Symposium on Auto-
mated Technology for Verification and Analysis, pages 92–106. Springer.
Endy, D. (2005). Foundations for engineering biology. Nature, 438(7067):449–453.
Fainekos, G. E. and Pappas, G. J. (2009). Robustness of temporal logic specifications
for continuous-time signals. Theoretical Computer Science, 410(42):4262–4291.
Friedland, A. E., Lu, T. K., Wang, X., Shi, D., Church, G., and Collins, J. J. (2009).
Synthetic gene networks that count. Science, 324(5931):1199–1202.
Galdzicki, M. et al. (2014). The Synthetic Biology Open Language (SBOL) provides
a community standard for communicating designs in synthetic biology. Nature
Biotechnology, 32:545–550.
Gardner, T. S., Cantor, C. R., and Collins, J. J. (2000). Construction of a genetic
toggle switch in Escherichia coli. Nature, 403(6767):339–342.
Gendrault, Y., Madec, M., Lallement, C., and Haiech, J. (2014). Modeling biol-
ogy with HDL languages: A first step toward a genetic design automation tool
inspired from microelectronics. IEEE Transactions on Bio-medical Engineering,
61(4):1231–1240.
Ghosh, S., Sadigh, D., Nuzzo, P., Raman, V., Donzé, A., Sangiovanni-Vincentelli,
A. L., Sastry, S. S., and Seshia, S. A. (2016). Diagnosis and repair for synthesis
from signal temporal logic specifications. In Proc. International Conference on
Hybrid Systems: Computation and Control, pages 31–40. ACM.
Goler, J. (2004). BioJADE: A design and simulation tools for synthetic biological
systems. Master’s thesis, MIT, Cambridge, MA.
Green, A. A., Silver, P. A., Collins, J. J., and Yin, P. (2014). Toehold switches:
de-novo-designed regulators of gene expression. Cell, 159(4):925–939.
Grosu, R., Smolka, S. A., Corradini, F., Wasilewska, A., Entcheva, E., and Bar-
tocci, E. (2009). Learning and detecting emergent behavior in networks of cardiac
myocytes. Communications of the ACM, 52(3):97–105.
Guye, P., Li, Y., Wroblewska, L., Duportet, X., and Weiss, R. (2013). Rapid,
modular and reliable construction of complex mammalian gene circuits. Nucleic
acids research, page gkt605.
Hoxha, B., Dokhanchi, A., and Fainekos, G. (2018). Mining parametric temporal
logic properties in model-based design for cyber-physical systems. International
Journal on Software Tools for Technology Transfer, 20(1):79–93.
207
Huang, Z., Wang, L., Nasikovskiy, Y., and Mishchenko, A. (2013). Fast boolean
matching based on npn classification. In 2013 International Conference on Field-
Programmable Technology (FPT), pages 310–313. Citeseer.
Hucka, M. et al. (2003). The Systems Biology Markup Language (SBML): a medium
for representation and exchange of biochemical network models. Bioinformatics,
19(4):524–531.
Huynh, L., Kececioglu, J., Koppe, M., and Tagkopoulos, I. (2012). Automatic design
of synthetic gene circuits through mixed integer non-linear programming. PloS
ONE, 7(4).
Huynh, L. and Tagkopoulos, I. (2014). Optimal part and module selection for syn-
thetic gene circuit design automation. ACS Synthetic Biology, 3(8):556–564.
Huynh, L., Tsoukalas, A., Koppe, M., and Tagkopoulos, I. (2013). SBROME: a
scalable optimization and module matching framework for automated biosystems
design. ACS Synthetic Biology, 2(5):1073–1089.
Jin, X., Donzé, A., Deshmukh, J., and Seshia, S. A. (2015a). Mining Requirements
from Closed-Loop Control Models. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, PP(99):1–1.
Jin, X., Donzé, A., Deshmukh, J. V., and Seshia, S. A. (2015b). Mining requirements
from closed-loop control models. IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 34(11):1704–1717.
Jones, A., Kong, Z., and Belta, C. (2014). Anomaly detection in cyber-physical
systems: A formal methods approach. In 2014 IEEE 53rd Annual Conference on
Decision and Control (CDC), pages 848–853. IEEE.
Karig, D. K. (2017). Cell-free synthetic biology for environmental sensing and reme-
diation. Current opinion in biotechnology, 45:69–75.
Kelly, J. R., Rubin, A. J., Davis, J. H., Ajo-Franklin, C. M., Cumbers, J., Czar, M. J.,
de Mora, K., Glieberman, A. L., Monie, D. D., and Endy, D. (2009). Measuring
the activity of biobrick promoters using an in vivo reference standard. Journal of
biological engineering, 3(1):4.
Kim, K., Fainekos, G., and Sankaranarayanan, S. (2015). On the minimal revision
problem of specification automata. The International Journal of Robotics Research.
Knight, T. (2003). Idempotent vector design for standard assembly of biobricks.
Technical report, DTIC Document.
208
Kong, Z., Jones, A., Medina Ayala, A., Aydin Gol, E., and Belta, C. (2014). Temporal
Logic Inference for Classification and Prediction from Data. In Proceedings of
the 17th International Conference on Hybrid Systems: Computation and Control,
HSCC ’14, pages 273–282, New York, NY, USA. ACM.
Kosuri, S. and Church, G. M. (2014). Large-scale de novo dna synthesis: technologies
and applications. Nature methods, 11(5):499.
Kraft, D. (2015). Computing the Hausdorff distance of two sets from their signed
distance functions. Computational Geometry.
Kuwahara, H. and Mura, I. (2008). An efficient and exact stochastic simulation
method to analyze rare events in biochemical systems. The Journal of Chemical
Physics, 129(16):165101.
Lagarias, J. C. and Odlyzko, A. M. (1985). Solving low-density subset sum problems.
Journal of the ACM (JACM), 32(1):229–246.
Lashkaripour, A., Silva, R., and Densmore, D. (2018). Desktop micromilled microflu-
idics. Microfluidics and Nanofluidics, 22(3):31.
Legay, A., Delahaye, B., and Bensalem, S. (2010). Statistical Model Checking: An
Overview, pages 122–135. Springer Berlin Heidelberg, Berlin, Heidelberg.
Lienert, F., Torella, J. P., Chen, J.-H., Norsworthy, M., Richardson, R. R., and Silver,
P. A. (2013). Two-and three-input tale-based and logic computation in embryonic
stem cells. Nucleic Acids Research, 41(21):9967–9975.
Lohmueller, J. J., Armel, T. Z., and Silver, P. A. (2012). A tunable zinc finger-
based framework for boolean logic computation in mammalian cells. Nucleic Acids
Research, 40(11):5180–5187.
Lou, C., Stanton, B., Chen, Y. J., Munsky, B., and Voigt, C. A. (2012). Ribozyme-
based insulator parts buffer synthetic circuits from genetic context. Nature Biotech-
nology, 30(11):1137–1142.
Lu, T. K., Khalil, A. S., and Collins, J. J. (2009). Next-generation synthetic gene
networks. Nature Biotechnology, 27(12):1139–1150.
Maler, O. and Nickovic, D. (2004). Monitoring temporal properties of continuous
signals. In Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant
Systems, pages 152–166. Springer.
Marchisio, M. A. (2014). Parts & Pools: A framework for modular design of synthetic
gene circuits. Frontiers in Bioengineering and Biotechnology, 2(42).
209
McLaughlin, J. A., Myers, C. J., Zundel, Z., Mısırlı, G., Zhang, M., Ofiteru, I. D.,
Goni-Moreno, A., and Wipat, A. (2018). Synbiohub: A standards-enabled design
repository for synthetic biology. ACS synthetic biology, 7(2):682–688.
Metropolis, N. and Ulam, S. (1949). The monte carlo method. Journal of the
American statistical association, 44(247):335–341.
Mishchenko, A., Chatterjee, S., and Brayton, R. (2006). Dag-aware aig rewriting
a fresh look at combinational logic synthesis. In Proceedings of the 43rd annual
Design Automation Conference, pages 532–535. ACM.
Mısırlı, G., Nguyen, T., McLaughlin, J. A., Vaidyanathan, P., Jones, T., Densmore,
D., Myers, C. J., and Wipat, A. (2018). A computational workflow for the auto-
mated generation of models of genetic designs. ACS Synthetic Biology.
Miyamoto, T., Razavi, S., DeRose, R., and Inoue, T. (2012). Synthesizing biomolecule-
based boolean logic gates. ACS Synthetic Biology, 2(2):72–82.
Moon, T. S., Lou, C., Tamsir, A., Stanton, B. C., and Voigt, C. A. (2012). Genetic
programs constructed from layered logic gates in single cells. Nature, 491:249–253.
Munkres, J. R. (2000). Topology. Prentice Hall.
Mutalik, V. K., Guimaraes, J. C., Cambray, G., Lam, C., Christoffersen, M. J., Mai,
Q.-A., Tran, A. B., Paull, M., Keasling, J. D., Arkin, A. P., et al. (2013). Precise
and reliable gene expression via standard transcription and translation initiation
elements. Nature Methods, 10(4):354–360.
Myers, C. J., Barker, N. A., Jones, K. R., Kuwahara, H., Madsen, C., and Nguyen,
N.-P. D. (2009). iBioSim: a tool for the analysis and design of genetic circuits.
Bioinformatics, 25(21):2848–2849.
Nielsen, A. A. (2017). Biomolecular and computational frameworks for genetic circuit
design. PhD thesis, Massachusetts Institute of Technology.
Nielsen, A. A., Der, B. S., Shin, J., Vaidyanathan, P., Paralanov, V., Strychalski,
E. A., Ross, D., Densmore, D., and Voigt, C. A. (2016). Genetic circuit design
automation. Science, 352(6281):aac7341.
Nielsen, A. A., Segall-Shapiro, T. H., and Voigt, C. A. (2013). Advances in ge-
netic circuit design: novel biochemistries, deep part mining, and precision gene
expression. Current Opinion in Chemical Biology, 17(6):878–892.
Nielsen, A. A. and Voigt, C. A. (2014). Multi-input CRISPR/Cas genetic circuits
that interface host regulatory networks. Molecular Systems Biology, 10(11).
210
Nirenberg, M. W. and Matthaei, J. H. (1961). The dependence of cell-free pro-
tein synthesis in e. coli upon naturally occurring or synthetic polyribonucleotides.
Proceedings of the National Academy of Sciences, 47(10):1588–1602.
Oberortner, E., Bhatia, S., Lindgren, E., and Densmore, D. (2014). A rule-based
design specification language for synthetic biology. ACM Journal on Emerging
Technologies in Computing Systems (JETC), 11(3):25.
Ortiz, L., Pavan, M., McCarthy, L., Timmons, J., and Densmore, D. M. (2017).
Automated robotic liquid handling assembly of modular dna devices. Journal of
visualized experiments: JoVE, (130).
Paddon, C. J. and Keasling, J. D. (2014). Semi-synthetic artemisinin: a model
for the use of synthetic biology in pharmaceutical development. Nature Reviews
Microbiology, 12(5):355.
Pan, P. and Lin, C.-C. (1998). A new retiming-based technology mapping algorithm
for lut-based fpgas. In Proceedings of the 1998 ACM/SIGDA sixth international
symposium on Field programmable gate arrays, pages 35–42. ACM.
Pedersen, M. and Phillips, A. (2009). Towards programming languages for ge-
netic engineering of living cells. Journal of the Royal Society, Interface, 6(Suppl
4):S437–S450.
Purnick, P. E. and Weiss, R. (2009). The second wave of synthetic biology: from
modules to systems. Nature reviews Molecular cell biology, 10(6):410–422.
Qi, L. S., Larson, M. H., Gilbert, L. A., Doudna, J. A., Weissman, J. S., Arkin,
A. P., and Lim, W. A. (2013). Repurposing crispr as an rna-guided platform for
sequence-specific control of gene expression. Cell, 152(5):1173–1183.
Quinn, J., Beal, J., Bhatia, S., Cai, P., Chen, J., Clancy, K., Hillson, N., Galdzicki,
M., Maheshwari, A., Pocock, M., et al. (2013). Synthetic biology open language
visual (SBOL Visual), version 1.0. 0. Technical report.
Raman, V., Donzé, A., Maasoumy, M., Murray, R. M., Sangiovanni-Vincentelli, A.,
and Seshia, S. A. (2014). Model predictive control with signal temporal logic
specifications. In 2014 IEEE 53rd Annual Conference on Decision and Control
(CDC), pages 81–87. IEEE.
Rodrigo, G. and Jaramillo, A. (2013). AutoBioCAD: Full biodesign automation of
genetic circuits. ACS Synthetic Biology, 2(5):230–236.
Roehner, N. (2014). Technology mapping of genetic circuit designs. PhD thesis, The
University of Utah.
211
Roehner, N. and Myers, C. J. (2014). Directed acyclic graph-based technology map-
ping of genetic circuit models. ACS Synthetic Biology, 3(8):543–555.
Roehner, N., Oberortner, E., Pocock, M., Beal, J., Clancy, K., Madsen, C., Misirli,
G., Wipat, A., Sauro, H., and Myers, C. J. (2015). Proposed data model for the
next version of the Synthetic Biology Open Language. ACS Synthetic Biology,
4:57–71.
Rosen, K. H. (1999). Discrete Mathematics and Its Applications. WCB/McGraw-
Hill.
Saeidi, N., Arshath, M., Chang, M. W., and Poh, C. L. (2013). Characterization of
a quorum sensing device for synthetic biology design: Experimental and modeling
validation. Chemical Engineering Science, 103:91–99.
Sarpeshkar, R. (2014). Analog synthetic biology. Philosophical Transactions of the
Royal Society A, 372(2012):20130110.
Shis, D. L. and Bennett, M. R. (2013). Library of synthetic transcriptional and
gates built with split T7 RNA polymerase mutants. Proceedings of the National
Academy of Sciences of the United States of America, 110(13):5028–5033.
Silva, R. J. (2017). MakerFluidics: low cost microfluidics for synthetic biology. PhD
thesis, Boston University.
Siuti, P., Yazbek, J., and Lu, T. K. (2013). Synthetic circuits integrating logic and
memory in living cells. Nature Biotechnology, 31(5):448–452.
Smanski, M. J., Bhatia, S., Zhao, D., Park, Y., Woodruff, L. B., Giannoukos, G.,
Ciulla, D., Busby, M., Calderon, J., Nicol, R., et al. (2014). Functional optimiza-
tion of gene clusters by combinatorial design and assembly. Nature biotechnology,
32(12):1241.
Stableford, B. M. (2004). Historical dictionary of science fiction literature, volume 1.
Scarecrow press.
Stanton, B. C., Nielsen, A. A., Tamsir, A., Clancy, K., Peterson, T., and Voigt,
C. A. (2014). Genomic mining of prokaryotic repressors for orthogonal logic gates.
Nature Chemical Biology, 10(2):99–105.
Tamsir, A., Tabor, J. J., and Voigt, C. A. (2011). Robust multicellular computing us-
ing genetically encoded NOR gates and chemical ’wires’. Nature, 469(7329):212–215.
Thomas, D. and Moorby, P. (2008). The Verilog® Hardware Description Language.
Springer Science & Business Media.
212
Tumova, J., Hall, G. C., Karaman, S., Frazzoli, E., and Rus, D. (2013a). Least-
violating control strategy synthesis with safety rules. In International Conference
on Hybrid Systems: Computation and Control, pages 1–10, Philadelphia, PA, USA.
Tumova, J., Reyes-Castro, L., Karaman, S., Frazzoli, E., and Rus, D. (2013b).
Minimum-violating planning with conflicting specifications. In American Control
Conference (ACC).
Umans, C., Villa, T., and Sangiovanni-Vincentelli, A. L. (2006). Complexity of
two-level logic minimization. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, 25(7):1230–1246.
Vaidyanathan, P., Der, B. S., Bhatia, S., Roehner, N., Silva, R., Voigt, C. A., and
Densmore, D. (2015). A framework for genetic logic synthesis. Proceedings of the
IEEE, 103(11):2196–2207.
Vaidyanathan, P., Ivison, R., Bombara, G., DeLateur, N. A., Weiss, R., Densmore,
D., and Belta, C. (2017). Grid-based temporal logic inference. In 2017 IEEE 56th
Annual Conference on Decision and Control (CDC), pages 5354–5359. IEEE.
Vapnik, V. (2013). The nature of statistical learning theory. Springer.
Vasile, C. I., Aksaray, D., and Belta, C. (2017a). Time Window Temporal Logic.
Theoretical Computer Science, 691(Supplement C):27–54.
Vasile, C.-I., Tumova, J., Karaman, S., Belta, C., and Rus, D. (2017b). Minimum-
violation scLTL motion planning for mobility-on-demand. In IEEE International
Conference on Robotics and Automation, pages 1481–1488, Singapore, Singapore.
Watson, J. D., Crick, F. H., et al. (1953). Molecular structure of nucleic acids.
Nature, 171(4356):737–738.
Weber, E., Engler, C., Gruetzner, R., Werner, S., and Marillonnet, S. (2011). A
modular cloning system for standardized assembly of multigene constructs. PloS
one, 6(2):e16765.
Wolfram, S. (2002). A new kind of science, volume 5. Wolfram media Champaign,
IL.
Xie, Z., Wroblewska, L., Prochazka, L., Weiss, R., and Benenson, Y. (2011). Multi-
input rnai-based logic circuit for identification of specific cancer cells. Science,
333(6047):1307–1311.
Yaman, F., Bhatia, S., Adler, A., Densmore, D., and Beal, J. (2012). Automated
selection of synthetic biology parts for genetic regulatory networks. ACS Synthetic
Biology, 1(8):332–344.
213
Yao, A. I., Fenton, T. A., Owsley, K., Seitzer, P., Larsen, D. J., Sit, H., Lau, J., Nair,
A., Tantiongloc, J., Tagkopoulos, I., et al. (2013). Promoter element arising from
the fusion of standard biobrick parts. ACS synthetic biology, 2(2):111–120.
Younes, H., Kwiatkowska, M., Norman, G., and Parker, D. (2006). Numerical vs.
statistical probabilistic model checking. International Journal on Software Tools
for Technology Transfer (STTT), 8:216–228.
CURRICULUM VITAE
215
216
217
218
219
220
