Algorithms for White-box Obfuscation Using Randomized Subcircuit Selection and Replacement by Norman, Kenneth E.
Air Force Institute of Technology 
AFIT Scholar 
Theses and Dissertations Student Graduate Works 
3-2008 
Algorithms for White-box Obfuscation Using Randomized 
Subcircuit Selection and Replacement 
Kenneth E. Norman 
Recommended Citation 
Norman, Kenneth E., "Algorithms for White-box Obfuscation Using Randomized Subcircuit Selection and 
Replacement" (2008). Theses and Dissertations. 2756. 
https://scholar.afit.edu/etd/2756 
Algorithms for White-box Obfuscation
Using Randomized
Subcircuit Selection and Replacement
THESIS
Kenneth E. Norman, Major, USAF
AFIT/GCS/ENG/08-17
DEPARTMENT OF THE AIR FORCE
AIR UNIVERSITY
AIR FORCE INSTITUTE OF TECHNOLOGY
Wright-Patterson Air Force Base, Ohio
APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.
The views expressed in this thesis are those of the author and do not reflect the
official policy or position of the United States Air Force, Department of Defense, or
the United States Government.
Presented to the Faculty
Department of Electrical and Computer Engineering
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University
Air Education and Training Command
In Partial Fulfillment of the Requirements for the
Degree of Master of Science in Computer Science
Kenneth E. Norman, B.E.E., M.S.Eng.Mgt.
Major, USAF
27 March 2008
Approved:
/signed/ 27 Feb 2008
Lt Col J. Todd McDonald, Ph.D. (Chairman) Date
/signed/ 27 Feb 2008
Dr. Yong C. Kim (Member) Date
/signed/ 27 Feb 2008
Lt Col Stuart H. Kurkowski, Ph.D. (Member) Date
Abstract
Software protection remains an active research area with the goal of preventing
adversarial software exploitation such as reverse engineering, tampering, and piracy.
Heuristic obfuscation techniques lack strong theoretical underpinnings while current
theoretical research highlights the impossibility of creating general, efficient, and in-
formation theoretically secure obfuscators. In this research, we consider a bridge
between these two worlds by examining obfuscators based on the Random Program
Model (RPM). Such a model envisions the use of program encryption techniques
which change the black-box (semantic) and white-box (structural) representations of
underlying programs.
In this thesis we explore the possibilities for white-box transformation. Under an
RPM formulation, if an adversary cannot distinguish an original program from either
its obfuscated version (whose black-box behavior has been strategically altered) or
a randomly generated program of comparable size, then the white-box intent of the
original program has been sufficiently protected. One proposed method of creating
such random indistinguishability is by choosing (at random) a program from a size-
bounded set of all semantically equivalent possibilities.
Since full enumeration of reasonably sized programs is not possible, in this
work we focus on obfuscators which introduce random white-box structural variation
based on iterative selection and replacement. We design and develop an obfuscation
framework for programmatic logic expressed as combinatorial Boolean circuits and
compare six unique approaches for sub-circuit selection. We analyze the relative
behavior of random and guided-random sub-circuit selection algorithms while showing
their utility in producing random white-box structural variation.
Acknowledgements
To my wife and son: Thank you for your love and support. My success is
equally yours, and for your sacrifices, I owe you more than I can ever repay. I love
you both very much.
Professionally, I owe a debt of gratitude to my thesis advisor, Lt Col Todd
McDonald, and my research partner, Capt Moses James. As an electrical engineer in
a computer science program, I know I taxed their patience with my many questions.
Thank you.
Kenneth E. Norman
Table of Contents
Abstract
Acknowledgements
List of Figures
List of Tables
I. Introduction
  1.1 Problem area
    1.1.1 Motivating scenario
    1.1.2 Context
  1.2 Research objectives
II. Literature Review
  2.1 What is obfuscation?
    2.1.1 Preliminary definitions
    2.1.2 Classifications of obfuscation
    2.1.3 Theoretical definitions
      2.1.3.1 Virtual Black Box Obfuscation
      2.1.3.2 Indistinguishability Obfuscation
      2.1.3.3 Best-Possible Obfuscation
    2.1.4 Practical applications
  2.2 Shortfalls of current theoretical work
  2.3 Random Program Security Model
    2.3.1 Program encryption
    2.3.2 Intent protection
III. Methodology
  3.1 Notation
  3.2 Assumptions
    3.2.1 Programs represented as circuits
      3.2.1.1 Combinational circuits
      3.2.1.2 Directed acyclic multi-graphs
    3.2.2 Iterative randomization
    3.2.3 Circuit library exists
  3.3 Obfuscation toolkit
    3.3.1 CORGI: the circuit randomizer
      3.3.1.1 Development environment
      3.3.1.2 Subcircuit selection and replacement
    3.3.2 CXL: the circuit library
  3.4 Empirical Approach
    3.4.1 Key concepts
    3.4.2 Properties of obfuscated circuits
    3.4.3 White-box obfuscation algorithms
IV. Results
  4.1 Overview
  4.2 Limitations
    4.2.1 Smart strategies
    4.2.2 Introduced cycles
  4.3 Analysis of subcircuit selection algorithms
    4.3.1 Common functions
    4.3.2 RandomSingleGate
    4.3.3 RandomTwoGates
    4.3.4 RandomLevelTwoGates
    4.3.5 FixedLevelTwoGates
    4.3.6 LargestLevelTwoGates
    4.3.7 OutputLevelTwoGates
  4.4 Runtime performance analysis
V. Conclusions
  5.1 Contributions
  5.2 Future work
Appendix A. CORGI software
  A.1 CORGI architecture
    A.1.1 Functionality
      A.1.1.1 JGraphT
  A.2 Non-selection algorithms
  A.3 Selection algorithm behavior
  A.4 Selection algorithm results
    A.4.1 C17 with all algorithms
    A.4.2 C880 with OutputLevelTwoGates
Bibliography
Vita
Index
List of Figures
1.1 Program Encryption
2.1 The Random Program Model
2.2 RPM obfuscation
2.3 Black box obfuscated program
3.1 The Random Program Model
3.2 ISCAS Benchmark Circuit C17
3.3 Graph examples
3.4 Iterative randomization
3.5 Circuit hierarchy example
3.6 Example histogram
4.1 Improper subcircuit selection creates cycles
4.2 Introduced control flow in ISCAS C17
4.3 Diffusion of replacements in ISCAS C17
4.4 How a replacement subcircuit creates a new control flow
4.5 Runtime data for R1G on C17
4.6 Runtime data for R1G on C880
4.7 Runtime data for R2G on C17
4.8 Runtime data for R2G on C880
4.9 Runtime data for RL2G on C17
4.10 Runtime data for RL2G on C880
4.11 Runtime data for FL2G on C17
4.12 Runtime data for FL2G on C880
4.13 Runtime data for LL2G on C17
4.14 Runtime data for LL2G on C880
4.15 Runtime data for OL2G on C17
4.16 Runtime data for OL2G on C880
A.1 CORGI UML class diagram
A.2 Behavior data for all six selection algorithms
A.3 Chart of behavior data: circuit height
A.4 Chart of behavior data: circuit width
A.5 Sample results of R1G and OL2G algorithms
A.6 Sample results of R2G and FL2G algorithms
A.7 Sample results of RL2G and LL2G algorithms
A.8 Sample result 1 of OL2G applied to C880
A.9 Sample result 2 of OL2G applied to C880
A.10 Sample result 3 of OL2G applied to C880
List of Tables
3.1 Notation for the Random Program Model
3.2 Features and benefits of JGraphT
3.3 Candidate circuit properties
3.4 Candidate subcircuit selection algorithms
4.1 Summary of runtime data
Algorithms for White-box Obfuscation
Using Randomized
Subcircuit Selection and Replacement
I. Introduction
Across the Department of Defense, it is increasingly difficult to find a weapon system that does not rely upon software to perform its intended function.
The United States Air Force in particular is reliant on software across every facet of its
mission: air, space, and cyberspace. The ubiquity of software-based systems, and the
interconnectedness of such systems, demands we protect them from our adversaries’
prying eyes. In many cases, physical security is sufficient to thwart anyone who
seeks to gain access to our systems. When physical security fails to protect our
critical software, we must turn to alternate means. One such alternative is software
obfuscation.
1.1 Problem area
Software obfuscation is not a new concept, but neither is it a well-defined dis-
cipline in practice. The concept of software obfuscation is in many ways the un-
raveling of sound development principles. The objective in software engineering is
to produce systems which are defect-free, modular, maintainable, and extensible. A
well-engineered system will function as efficiently as possible, and perform the job the
user expects, in the manner he expects it. The objective in software obfuscation is to
produce highly coupled, difficult-to-understand, complex systems which, nevertheless,
perform the job the user expects, in the manner he expects it (though perhaps with
less efficiency by comparison).
1.1.1 Motivating scenario. In early 2001, the world watched as the US
and China found themselves at odds after what became known as the Hainan Island
incident. In brief, a US EP-3 reconnaissance plane and a Chinese Shenyang J-8
collided, and the EP-3 was forced to make an emergency landing on Hainan Island
off the south coast of China. According to a 2 April 2001 UPI press release [11],
“[t]he EP-3 could not have landed in a better place for China or a worse
one for U.S. military intelligence. Hainan island is host to one of China’s
largest electronic signals intelligence complexes and is manned by experts
who can glean critical information on the aircraft’s capabilities if they gain
access to the Navy’s EP-3” . . . Pentagon sources said.
The crew was held hostage for 12 days before being released. The plane, how-
ever, remained on Hainan Island for a total of 94 days, during which time China had
unfettered access to the equipment on board. If the EP-3 crew was unable to entirely
destroy all information storage devices (and the software they contain) before they
landed, then the Chinese had ample opportunity to learn about US collection methods
and targets of interest during the time the plane was in their control. Even if their
examination would have taken more than 94 days, it would have been easy enough to
copy the code (from undamaged equipment) and analyze it after they returned the
aircraft to US custody.
1.1.2 Context. This research augments earlier work initiated by Lt Col
Todd McDonald for his doctoral degree. In his dissertation, McDonald described
software obfuscation as protecting program intent [12]. The concept of intent
protection stands in contrast to traditional definitions of obfuscation, all of which
require that a program’s functionality remain unchanged (allowing for some acceptable
degradation of time and/or space efficiency). Instead, McDonald takes inspiration
from the field of cryptography and likens intent protection to data encryption. The
idea is to transform a program in two ways: structurally and functionally. If
functionality (that is, input/output behavior) must change, then it must also be possible
to recover the original behavior (see Figure 1.1). McDonald further requires that an
intent-protected program be indistinguishable from any other randomly selected
program that has a similar number of inputs and outputs and is of similar size. This he
calls the Random Program Model (RPM).
Figure 1.1: Program encryption under the Random Program Model
The difficult question is how to devise a random selection scheme. Clearly, for
any but the most basic of programs, software can be written in almost limitless ways
to accomplish the same function. If the set of all such equivalent programs is
impossible (or at least infeasible) to create, an alternate means of “selection” is required.
Rather than attempt to enumerate entire sets of programs, then select a re-
placement in toto, we consider an alternate approach of iterative randomization. This
process obfuscates a program by changing the structure of only a small portion of the
program per iteration, but many iterations produce a randomized program.
For this nascent research, we narrow our focus to combinational Boolean cir-
cuits. This simplifies the problem domain by avoiding non-terminating programs and
program state (memory). Additionally, circuits can be modeled using constructs from
the mathematical discipline of graph theory.
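As a minimal sketch of what modeling a circuit with graph-theoretic constructs can look like, the Python fragment below stores a small combinational circuit as a directed acyclic graph (each gate records its type and its fan-in) and evaluates it over every input assignment. The circuit, the gate basis, and all names (GATES, CIRCUIT, evaluate, truth_table) are assumptions made for illustration; none of them are taken from CORGI or JGraphT.

```python
from itertools import product

# Illustrative two-input gate basis (an assumption, not the thesis's library).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

# Hypothetical circuit. Each entry is a node of the DAG: gate name ->
# (gate type, fan-in nodes). Entries are listed in topological order, so
# every gate appears after the nodes that feed it.
INPUTS = ["a", "b", "c"]
CIRCUIT = {
    "g1": ("AND", ("a", "b")),
    "g2": ("XOR", ("b", "c")),
    "g3": ("OR", ("g1", "g2")),
}
OUTPUTS = ["g3"]


def evaluate(circuit, inputs, outputs, bits):
    """Propagate one input assignment through the DAG."""
    values = dict(zip(inputs, bits))
    for name, (kind, fanin) in circuit.items():
        values[name] = GATES[kind](*(values[w] for w in fanin))
    return tuple(values[o] for o in outputs)


def truth_table(circuit, inputs, outputs):
    """Black-box behavior: outputs for all 2^n input assignments."""
    return [evaluate(circuit, inputs, outputs, bits)
            for bits in product((0, 1), repeat=len(inputs))]
```

Because the circuit has no memory and no cycles, a single pass in topological order always terminates, which is the simplification the paragraph above relies on.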
1.2 Research objectives
We seek to accomplish two objectives with this research.
1. Develop a software architecture for developing and testing random selection
schemes for obfuscating a circuit’s structure.
2. Develop an initial set of selection algorithms and characterize their behavior
with regard to white-box obfuscation.
The first objective above is a means to an end. In other words, to develop
and analyze selection algorithms, we need an architecture which will import, export,
and manipulate combinational Boolean circuits. No complete application is available
to perform the operations we seek to employ, so we developed a software package
(CORGI [footnote 1]) to fill the void. Although CORGI is all new, it integrates an existing Java
library (JGraphT) to represent the circuits as directed acyclic graphs.
For the second objective, we devised candidate algorithms which demonstrate
the concept of random selection and replacement. The algorithms each produce an
obfuscated version of an original circuit. Each circuit produced in this way is a
randomly “selected,” semantically equivalent version of the original, with the selection
occurring as a sequence of steps rather than a single-step selection from a large set.
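The idea of selecting an equivalent circuit through a sequence of small steps can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not CORGI's implementation: it selects one gate at random per iteration and swaps in a functionally equivalent subcircuit from a tiny hand-written table. The LIBRARY table, the template names ("x", "y", "t0", "t1", "out"), and every function name are hypothetical.

```python
import random
from itertools import count, product

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}

# Hypothetical replacement library. "x"/"y" stand for the selected gate's
# fan-in, "t0"/"t1" for fresh internal nodes, "out" for the gate's own name.
LIBRARY = {
    "AND": [[("t0", "NAND", ("x", "y")), ("out", "NAND", ("t0", "t0"))]],
    "OR": [[("t0", "NAND", ("x", "x")), ("t1", "NAND", ("y", "y")),
            ("out", "NAND", ("t0", "t1"))]],
    "NAND": [[("t0", "AND", ("x", "y")), ("out", "NOR", ("t0", "t0"))]],
    "NOR": [[("t0", "OR", ("x", "y")), ("out", "NAND", ("t0", "t0"))]],
}


def evaluate(circuit, inputs, outputs, bits):
    values = dict(zip(inputs, bits))
    for name, kind, fanin in circuit:  # circuit is in topological order
        values[name] = GATES[kind](*(values[w] for w in fanin))
    return tuple(values[o] for o in outputs)


def truth_table(circuit, inputs, outputs):
    return [evaluate(circuit, inputs, outputs, bits)
            for bits in product((0, 1), repeat=len(inputs))]


def replace_random_gate(circuit, rng, fresh):
    """One iteration: select a gate, splice in an equivalent subcircuit."""
    index = rng.randrange(len(circuit))
    name, kind, (x, y) = circuit[index]
    template = rng.choice(LIBRARY[kind])
    rename = {"x": x, "y": y, "out": name}
    new_gates = []
    for t_name, t_kind, t_fanin in template:
        if t_name not in rename:  # internal node: give it a fresh name
            rename[t_name] = f"n{next(fresh)}"
        new_gates.append((rename[t_name], t_kind,
                          tuple(rename[w] for w in t_fanin)))
    # Splicing in place keeps the list topologically ordered.
    return circuit[:index] + new_gates + circuit[index + 1:]


def obfuscate(circuit, iterations, seed=0):
    rng, fresh = random.Random(seed), count()
    for _ in range(iterations):
        circuit = replace_random_gate(circuit, rng, fresh)
    return circuit
```

Each step changes only a small part of the structure and preserves the overall function, so the final circuit is semantically equivalent to the original while its gate count and wiring drift with every iteration.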
Although this research is based on a new obfuscation paradigm, the next chapter
explores the current theoretical understanding of obfuscation and how it relates to
our current work.
[footnote 1] CORGI stands for Circuit Obfuscation via Randomization of Graphs Iteratively, and is
discussed in more detail in Section 3.3.1.
II. Literature Review
Several key papers have been published which provide theoretical bases for why obfuscation is both impossible and, indeed, possible. Practical applications of
these theories, however, do not appear in the literature. As such, one approach,
the Random Program Security Model, proposes that practical obfuscation is indeed
possible and that a program’s intent can be protected even if the adversary has access
to the obfuscated version of the program. The Random Program Security Model is
fundamentally an analog to data encryption, but applied to programs rather than
data.
2.1 What is obfuscation?
2.1.1 Preliminary definitions. Before delving into the finer details of ob-
fuscation, it is instructive to understand how the word obfuscation is used in several
contexts. In generic speech, to obfuscate means to “make obscure” or “confuse” [13].
As applies to computing, to obfuscate means “to alter code while preserving its
behavior but conceal its structure and intent” [19]. Alternately, obfuscation is “any
efficient semantic-preserving transformation of computer programs aimed at bringing
a program into such a form, which impedes the understanding of its algorithm and
data structures or prevents the extracting of some valuable information from the
plaintext of a program” [18]. These two definitions provide the context for our review
of current theory and techniques for program obfuscation.
2.1.2 Classifications of obfuscation. Program development and execution
involves several steps, and program obfuscation can be applied at one or more of these
steps. Fundamentally, there are three classifications of program obfuscation: layout,
data, and control [3]. Layout obfuscation involves such techniques as scrambling
identifier names and removing layout formatting. Both of these techniques operate
on the source code, and do nothing to alter control flow of the program.
Data obfuscation is also primarily focused on altering the source code. Tech-
niques include (a) storage and encoding transformations, which alter the way data
is encoded or manipulated, (b) aggregation transformations, which operate on data
structures, and (c) ordering transformations, which change the order of variables and
methods (within classes) and parameters (within methods). To some extent, these
techniques can have an impact on control flow within a program, but it is not the
primary intent. Like layout obfuscation, many of the specific transformations do not
change control flow (although some introduce new control mechanisms).
The final classification is control obfuscation, and its techniques include (a) con-
trol aggregation transformations, which break up computations that logically belong
together or merge computations that do not, (b) control ordering transformations,
which randomize the order in which computations are carried out, and (c) control
computation transformations, which insert new (redundant or dead) code, or make
algorithmic changes to the source application. Control obfuscation techniques, as
described in [3], are not strictly limited to source code, which means they have more
generic applicability (e.g., to assembly language and machine code).
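To make the idea of a control computation transformation concrete, the Python sketch below inserts dead code behind a condition whose value is fixed but not obvious at a glance (often called an opaque predicate). Both functions are hypothetical examples written for this illustration; they are not drawn from [3] or from any tool discussed in this thesis.

```python
def price_with_tax(amount_cents, rate_percent):
    """Original computation."""
    return amount_cents + amount_cents * rate_percent // 100


def price_with_tax_obfuscated(amount_cents, rate_percent):
    """Same input/output behavior with an added, never-taken branch."""
    n = amount_cents
    # n * (n + 1) is a product of consecutive integers, hence always even,
    # so this condition is always true.
    if (n * (n + 1)) % 2 == 0:
        total = amount_cents + amount_cents * rate_percent // 100
    else:
        # Dead code: unreachable, present only to add a spurious control path.
        total = amount_cents - rate_percent
    return total
```

The transformation adds a control-flow path without changing what the function computes, which is why it still counts as semantics-preserving.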
Among the three broad categories described above, general program (circuit)
obfuscation must account for control flow. This becomes clear as we look at additional
definitions of obfuscation.
2.1.3 Theoretical definitions. The first formalized theoretical definition of
program (or circuit) obfuscation was introduced by Barak et al. in [1]. This was a
watershed publication because it formally proved that universal obfuscators do not
exist. It also had the effect of spawning alternate theoretically-based definitions of
obfuscation in several publications which followed. We will look at several of these
definitions here.
2.1.3.1 Virtual Black Box Obfuscation. “Informally, an obfuscator O
is an (efficient, probabilistic) compiler that takes as input a program P (or circuit C) [footnote 1]
and produces a new program O(P) that has the same functionality as P yet is
unintelligible in some sense” [1]. In lay terms, virtual black box (VBB) obfuscation
can be thought of as some transformation to a program which completely hides all
information about the program except input/output (i.e., black box) behavior, even
though the obfuscated program is itself observable. In that sense, the obfuscated ver-
sion provides virtually equivalent information as could be obtained with only black
box access to the program.
Although informal, the definition above makes no distinction of what constitutes
a program. No mention is made of “source code,” “assembly language,” or “machine
code” anywhere in the paper (save one quote in a footnote). Thus, while there are
clear differences between the three levels of a program, their fundamental nature is
the same. Indeed, their equivalence is evidenced by the fact that programs can be
viewed as boolean (specifically, combinational) logic circuits, and the Barak paper
uses the terms program and circuit almost interchangeably. This is not to imply that
obfuscated source code will necessarily yield object code that is obfuscated to the
same degree (however measured). This remains an open question which, in part, will
be addressed by this thesis.
[footnote 1] Since this concept applies equally to programs and circuits, and since this thesis will
specifically explore obfuscation of circuits, we will limit further discussion to circuit obfuscation.
Therefore, substituting C for P does not alter the definition.
Barak et al. formally define a (circuit) obfuscator as having these three properties:
1. Functionality property: For every circuit C, O(C) describes a circuit that
computes the same function as C.
2. Polynomial slowdown property: There is a polynomial p such that for every
circuit C, |O(C)| ≤ p(|C|). This property may apply to size, run time, or both.
3. “Virtual black box (VBB)” property: For any probabilistic polynomial-time
Turing machine (PPT) A, there is a PPT S and a negligible function α such that
for all circuits C,
\[ \left| \Pr[A(O(C)) = 1] - \Pr[S^{C}(1^{|C|}) = 1] \right| \le \alpha(|C|) \tag{2.1} \]
The obfuscator O is efficient if it runs in polynomial time.
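The first two properties lend themselves to simple spot-checks on small circuits, sketched below in Python. The example circuits, their hand-counted gate sizes, and the bound p(n) = 2n are assumptions chosen for illustration. The VBB property is a statement about all polynomial-time adversaries and cannot be tested this way, and the slowdown property requires one polynomial to hold for every circuit, so a single check like this one is only a sanity test.

```python
from itertools import product


def same_function(c, oc, n_inputs):
    """Functionality property, checked exhaustively over all 2^n inputs."""
    return all(c(*bits) == oc(*bits)
               for bits in product((0, 1), repeat=n_inputs))


def within_slowdown(size_c, size_oc, p):
    """Polynomial slowdown on sizes: |O(C)| <= p(|C|) for a chosen p."""
    return size_oc <= p(size_c)


def majority(a, b, c):
    """Hypothetical circuit C: 3 AND gates and 2 OR gates (size 5)."""
    return (a & b) | (a & c) | (b & c)


def majority_obfuscated(a, b, c):
    """Hypothetical O(C): 7 two-input gates computing the same function."""
    t1 = 1 - (a & b)        # NAND
    t2 = 1 - (a & c)        # NAND
    t3 = 1 - (b & c)        # NAND
    t4 = t1 & t2            # AND
    t5 = 1 - (t4 & t3)      # NAND: equals the majority function
    t6 = 1 - (t5 & t5)      # NAND used as NOT
    return 1 - (t6 & t6)    # NAND used as NOT (double negation)


def p(n):
    """Illustrative bound; any fixed polynomial would serve."""
    return 2 * n
```

Here same_function(majority, majority_obfuscated, 3) holds and within_slowdown(5, 7, p) holds, while a rewrite that grew the circuit to 11 gates would fail the chosen bound.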
From this definition, Barak, et al. prove that no universal obfuscator exists.
The basis of their proof is to show that, for any given obfuscator, there exists a family
of circuits which cannot be obfuscated. “However, it does not mean that there is no
method of making circuits ‘unintelligible’ in some meaningful and precise sense” [1].
To be clear, the impossibility result still allows for a given obfuscator O to be able to
protect some (though not all) families of circuits C. From this, Barak et al. offer a
weaker notion of obfuscation: indistinguishability obfuscation.
2.1.3.2 Indistinguishability Obfuscation. An indistinguishability ob-
fuscator is defined in the same way as a circuit obfuscator, except that the “virtual
black box” property is replaced with the following:
• Indistinguishability property: For any PPT A, there is a negligible function α
such that for any two circuits C1, C2 which compute the same function and are
of the same size k,
\[ \left| \Pr[A(O(C_1))] - \Pr[A(O(C_2))] \right| \le \alpha(k) \tag{2.2} \]
Observe that the indistinguishability property compares the obfuscations of two
different circuits, unlike the VBB property, which compares an obfuscated circuit to
a simulator which has only black box access to the original circuit. By weakening the
VBB definition in this way, it is provable that obfuscation (however inefficient) is not
impossible.
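One way to build intuition for why the weaker definition is achievable once efficiency is set aside: if an obfuscator maps every circuit to a canonical description of the function it computes, then two equivalent circuits yield identical outputs and no distinguisher can gain any advantage. The Python sketch below uses the truth table as that canonical description. This is our simplification for illustration only; the truth table is exponentially large in the number of inputs, so it violates the polynomial slowdown property and is not an obfuscator in the formal sense above. The function and circuit names are hypothetical.

```python
from itertools import product


def canonical_form(circuit, n_inputs):
    """Map a circuit to a canonical description of its function.

    Any two circuits computing the same function get the same result,
    at a cost that grows as 2^n in the number of inputs.
    """
    return tuple(circuit(*bits) for bits in product((0, 1), repeat=n_inputs))


def xor_direct(a, b):
    """Hypothetical circuit C1: a single XOR gate."""
    return a ^ b


def xor_from_and_or(a, b):
    """Hypothetical circuit C2: XOR built from OR, AND, and NOT."""
    return (a | b) & (1 - (a & b))
```

Since canonical_form(xor_direct, 2) and canonical_form(xor_from_and_or, 2) are the same tuple, the left-hand side of (2.2) is exactly zero for this pair.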
2.1.3.3 Best-Possible Obfuscation. Goldwasser and Rothblum define
an obfuscator as “a compiler that transforms any program (which we will view. . . as a
boolean circuit) into an obfuscated program (also a circuit) that has the same input-
output functionality as the original program, but is unintelligible” [6]. It is clear
that this is the same definition found in [1], but it is nevertheless included because of
the parenthetical comment that programs can be viewed as circuits. This concept is
central to the research presented herein.
2.1.4 Practical applications. Obfuscation software, of varying sophistica-
tion, is widely available from both commercial vendors and open source developers.
Among commercial products, there are several well-known titles. PreEmptive Solu-
tions [16] produces two popular tools: Dotfuscator (for .NET) and DashO (for Java).
Smardec [17] produces Allatori, a Java obfuscator. Yet another company, Semantic
Designs, Inc. [15], has a suite of tools collectively called Thicket™. It provides tools to
obfuscate several languages, including C, C++, C#, Java, JavaScript, Ada, and PHP.
There are, of course, other vendors which offer products that purport to obfuscate
software to some degree, but enumerating them all here is beyond the scope of this
thesis.
On the open source side, projects are as plentiful as on the commercial side.
One in particular, ProGuard Java Optimizer and Obfuscator, is one of
the most popular projects on SourceForge.net [footnote 2].
[footnote 2] From its home page, “SourceForge.net is the world’s largest Open Source software
development web site.” As of 16 Jan 2008, ProGuard was ranked 291 out of 166,996 projects listed.
It is not surprising that these companies and open source developers reveal little
about the inner workings of their obfuscation techniques, except to describe the
results of applying a particular approach (e.g., name obfuscation, flow obfuscation,
string encryption, etc.). Interestingly, however, Semantic Designs’ web site
unequivocally states, “Warning: obfuscators do not stop reverse-engineering efforts by really
determined opponents.” This statement is an acknowledgment of the theoretical work
of Barak et al. described above. Nevertheless, practical obfuscators are not in short
supply, despite this limitation, which raises the question: “Why not?”
2.2 Shortfalls of current theoretical work
To begin to answer the question of why practical software obfuscators are even
available, much less trusted, one must further ask, “what makes them useful despite
the impossibility results asserted (‘proved’) by the theoreticians?” The answer is at
least two-fold.
First, commercial and open source obfuscation tools are typically employed not
to hide the purpose of the target software, but rather to hide the
manner in which that purpose is achieved. For example, Microsoft may choose to
obfuscate all or part of the source code for its spreadsheet program, Excel™. The
obfuscated version would not hide the fact that the application is a spreadsheet. Rather,
it would hide some portion of the code to prevent competitors from learning how part
of the code is implemented, thus protecting Microsoft’s competitive advantage in the
marketplace. In this way, the obfuscation would be useful, even though it necessarily
fails the VBB paradigm of perfectly secure obfuscation.
A second (perhaps more profound) reason may be that the tools do not address
obfuscation from a theoretical perspective. Because the literature offers no work
correlating theoretical results with practical implementations, it is difficult to make this
claim definitively (i.e., “absence of proof is not proof of absence”). It is nonetheless
intriguing that developers do not relate the strength of their obfuscation schemes to
results predicted by the theoretical models.
From a VBB perspective, no obfuscators of any ilk should be useful or benefi-
cial. Although the VBB standard is not achievable in a general, efficient, universal
sense, some amount of obfuscation, as pertains to some as-yet undefined metric of
obfuscation, may be desirable. This is certainly the case with existing obfuscators,
even if not explicitly stated or understood by the developers, because all such tools
both exist and fail the VBB test. Therefore, the VBB standard is not viable as a
measure of practical obfuscation.
The other two theoretical results mentioned before, indistinguishability
obfuscation and best-possible obfuscation, are similar. They both relate obfuscation to
some property of the program, and use that to compare obfuscation results to each
other (whereas VBB relates obfuscation to a black box version of a program). This
distinction is subtle, but it opens the door to finding useful obfuscators even if they
fail VBB scrutiny. Unfortunately, the theory underpinning indistinguishability
obfuscation and best-possible obfuscation does not suggest what property
or properties of a program should be the basis of comparison when deciding if an
obfuscator yields indistinguishable results, or the best-possible level of obfuscation.
The research supporting this thesis was conducted to directly address what
properties of a program might (or might not) be useful measures of obfuscation,
and to provide a framework for empirically testing the efficacy of those properties.
In other words, we seek to produce a “tangible” correlation to the theoretical work
which has preceded this research. This objective is an outgrowth of the doctoral
research conducted by Lt Col Todd McDonald. In his dissertation, he suggests a new
paradigm of program obfuscation, the Random Program Security Model [12].
2.3 Random Program Security Model
Recall from [1] that the theoretical benchmark definition of an obfuscator (the VBB
paradigm) requires that three properties hold: functionality, polynomial slowdown,
and the VBB property. Under the Random Program Security Model (or simply
Random Program Model, RPM), McDonald replaces two of the three properties,
functionality and VBB [12]. Only the polynomial slowdown property is retained.
For the functionality property, McDonald postulates instead that program ob-
fuscation should apply both black-box and white-box obfuscation techniques. The
principle is that neither approach on its own is sufficient to obfuscate a program.
Figure 2.1: The Random Program Model (Program domain)
When combined, however, they act synergistically to overcome the inherent weak-
nesses of each.
For the VBB property, McDonald reasons that if an obfuscated program is
indistinguishable from another program randomly-selected from the same family of
programs (based on inputs, outputs, and size of the program), then the intent of the
original program is protected.
The RPM is similar to, and derived from, data cryptography. RPM models
black-box obfuscation after data encryption, and white-box obfuscation is analogous
to comparing cryptographic data ciphers to random bit strings. Figure 2.1 graphically
depicts the RPM. The obfuscator function, O, uses both black-box and white-box
transforms, as shown in Figure 2.2. These are described below in Sections 2.3.1
and 2.3.2.
2.3.1 Program encryption. Figure 2.3 illustrates the concept of black-box
obfuscation using program encryption. For an input x to program P , the result
P (x) is the unobfuscated output of P . The intermediate result P (x) becomes the input
of another component, E, which encrypts P (x) based on some key k. The output
E(P (x), k) of E is the overall output of P ′′. Since P (x) ≠ E(P (x), k) (i.e., P (x) ≠
P ′′(x)) for a given input x, program P ′′ is said to be a black-box obfuscated
version of P .
Figure 2.2: RPM obfuscation combines both black-box and white-box transforms
Figure 2.3: A black-box obfuscation P ′′ of program P . P and P ′′ are not semantically equivalent because P ′′ includes a program, E, which encrypts the output of P .
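The composition of P and E described in Section 2.3.1 can be sketched in a few lines of Python. This is an illustration only: the program P, the XOR-based stand-in for E, and the key value are all hypothetical, and XOR with a fixed key is not a secure cipher.

```python
def P(x: int) -> int:
    """Hypothetical 8-bit program whose intent we want to protect."""
    return (3 * x + 1) & 0xFF


def E(y: int, k: int) -> int:
    """Placeholder 'encryption' of an 8-bit value: XOR with key k (not secure)."""
    return y ^ (k & 0xFF)


def make_black_box_obfuscation(program, encrypt, k):
    """Return P'' such that P''(x) = E(P(x), k)."""
    return lambda x: encrypt(program(x), k)


KEY = 0x5A  # hypothetical key
P2 = make_black_box_obfuscation(P, E, KEY)

# With a nonzero key, P'' disagrees with P on every 8-bit input.
differs_everywhere = all(P2(x) != P(x) for x in range(256))

# A key holder can still recover P(x), because XOR is its own inverse.
recoverable = all(E(P2(x), KEY) == P(x) for x in range(256))
```

The sketch only shows why P and P ′′ are not semantically equivalent; it does nothing to hide the boundary between P and E, which is the white-box problem taken up next.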
Program encryption might be sufficient to protect a program if an adversary
never obtains white-box access to the obfuscated program, P ′′. If the adversary did
have white-box access, the demarcation between P and E would be discernible, and
P would be revealed independent of E. Thus, RPM adds white-box protection to
program encryption to achieve overall protection of the program’s intent.
2.3.2 Intent protection. As previously stated, perfect, efficient, universal
VBB obfuscators do not exist. If an adversary has access to an obfuscated, seman-
tically equivalent program, the adversary will eventually be able to understand the
intent of the original program. McDonald theorizes that program encryption can be
augmented in such a way as to prevent an adversary from being able to isolate P
from E in an encrypted program P ′′. The goal is to hide the fact that there is a
semantics-altering component E. If this is possible, then even if the adversary is able
to (eventually) predict the output of P ′′, such output will be meaningless with respect
to P (x), and program intent will remain protected.
McDonald proposes that if P ′′ (which is not semantically equivalent to P ) is
replaced with a randomly chosen—or produced—program P ′ (which is semantically
equivalent to P ′′), then P is intent protected if the following hold:
• P ′ is such that the adversary cannot distinguish between the functional program
P and the composite encryption program E
• P ′ is indistinguishable from a random program selected from the set of all pro-
grams the same size as P ′
III. Methodology
The Random Program Model posits that an intent-protected program is indistinguishable from any other program with the same number of inputs and outputs,
and of comparable size. This thesis specifically considers the white-box obfuscation
component of the RPM. In this initial research, a program is modeled as a combi-
national boolean circuit. The circuit is white-box obfuscated by iteratively replacing
random subcircuits with randomly-chosen, semantically-equivalent replacement subcircuits.
Several algorithms are considered for selecting the subcircuits, as well
as candidate metrics with which to quantify the level of obfuscation achieved.
3.1 Notation
Since this research follows earlier work conducted by Lt Col Todd McDonald,
we use his notation for the sake of consistency. Table 3.1 provides the notation used
in the discussion which follows.
3.2 Assumptions
The current experimental environment relies on some simplifying assumptions,
which are discussed here.
3.2.1 Programs represented as circuits. Software functionality, at its most
fundamental level, can be represented as a sequence of Boolean expressions. For typ-
ical programs, which include loops (for, while, etc.), sequential boolean circuits map
most directly to the program structure. In general, sequential (cyclic, in graph theory
parlance) circuits can be converted to combinational (acyclic) circuits. Edwards [4]
offers an algorithm which performs this transformation, but warns it is inefficient for
anything but trivially small circuits (his algorithm ran for 51 seconds when oper-
ating on a 281-gate circuit). Despite potential intractability when converting large
sequential circuits, we choose combinational logic over sequential logic because of its
comparative simplicity.
Table 3.1: Notation used in describing the Random Program Model
Variable | Meaning
C | A combinational Boolean circuit
C ′i | Original circuit C after i iterations of randomization
C ′, C ′n | Original circuit C after n-iteration randomization is finished
Ω | Circuit basis. Ω is a set of Boolean functions such that Ω ⊆ {AND, NAND, OR, NOR, XOR, XNOR, NOT}
CX-Y -S-Ω | The class of a circuit, indicating inputs (X), outputs (Y ), size (S = maximum number of gates), and basis (Ω)
δ, δX-Y -S-Ω | Circuit family, i.e., the set containing all circuits CX-Y -S-Ω
δC | Family of circuits semantically equivalent to C (δC ⊂ δ)
The Random Program Model applies not only to the program domain, but to
the circuit domain as well. Figure 2.1 is given again (with only a notational change)
in Figure 3.1 to show the parallel between the two.
3.2.1.1 Combinational circuits. Combinational circuits have no state,
whereas sequential circuits are temporal, which is to say they have memory and feed-
back loops (cycles). Since sequential circuits can be decomposed into combinational
components, it is sufficient at the outset of this research to forgo the former in favor
of the latter. As an aside, combinational circuits sidestep the issue of non-terminating
programs–another complication of sequential circuits.
Our decision to use combinational circuits is supported by [9] which points
out in Chapter IV that a very simple grammar is all that is needed to compute
everything that can be computed by large languages like C and Java. In particular,
the grammar, in Backus-Naur form, is shown in Equation 3.1, where B represents
any Boolean expression and E represents any integer expression. It is because of this
underlying simplicity that any software can be mapped to combinational logic form.
B ::= true | false | (!B) | (B & B) | (B || B) | (E < E) (3.1)
Figure 3.1: The Random Program Model (Circuit domain)
An obvious benefit of choosing combinational logic is that it is easy to un-
derstand. As demonstrated in Equation 3.1 above, only three logic functions are
necessary: NOT (!), AND (&), and OR (||). There are other commonly used logic
functions (namely NAND, NOR, XOR, and XNOR), but these can be represented
using various combinations of NOT, AND, and OR.
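The claim that NAND, NOR, XOR, and XNOR can be built from NOT, AND, and OR can be checked exhaustively over all two-input combinations. The following Python sketch is one way to do so; the particular constructions chosen for each gate are standard identities, not taken from the thesis.

```python
from itertools import product


def NOT(a):
    return not a


def AND(a, b):
    return a and b


def OR(a, b):
    return a or b


# Derived gates written using only NOT, AND, and OR.
DERIVED = {
    "NAND": lambda a, b: NOT(AND(a, b)),
    "NOR": lambda a, b: NOT(OR(a, b)),
    "XOR": lambda a, b: OR(AND(a, NOT(b)), AND(NOT(a), b)),
    "XNOR": lambda a, b: OR(AND(a, b), AND(NOT(a), NOT(b))),
}

# Reference definitions used only for comparison.
REFERENCE = {
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
    "XOR": lambda a, b: a != b,
    "XNOR": lambda a, b: a == b,
}


def truth_table(f):
    """Outputs of a two-input function for inputs 00, 01, 10, 11."""
    return tuple(bool(f(a, b)) for a, b in product([False, True], repeat=2))


all_match = all(
    truth_table(DERIVED[name]) == truth_table(REFERENCE[name]) for name in DERIVED
)
```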
Combinational logic circuits are used across a broad spectrum of applications,
within both the hardware and software domains. At the 1985 International Symposium
on Circuits and Systems (ISCAS), the IEEE introduced a set of benchmark
circuits, which are collectively referred to as the ISCAS-85 benchmark circuits [8]. They
are particularly useful to our purpose, even though they were initially targeted at the
hardware community. A list of these circuits can be found at [2]. The smallest of
these circuits, C17, is shown in Figure 3.2.
Figure 3.2: ISCAS Benchmark Circuit C17
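For readers without the figure at hand, C17 is small enough to write out and evaluate directly. The Python sketch below assumes the net names of the commonly distributed c17 netlist (five inputs, two outputs, six two-input NAND gates); these names are an assumption here and should be checked against Figure 3.2 or the benchmark file.

```python
from itertools import product

# Assumed c17 netlist: output net -> (input net, input net); every gate is a NAND.
C17_GATES = {
    "10": ("1", "3"),
    "11": ("3", "6"),
    "16": ("2", "11"),
    "19": ("11", "7"),
    "22": ("10", "16"),
    "23": ("16", "19"),
}
C17_INPUTS = ["1", "2", "3", "6", "7"]
C17_OUTPUTS = ["22", "23"]


def evaluate(gates, inputs, outputs, values):
    """Evaluate an all-NAND netlist for one assignment of input values."""
    nets = dict(zip(inputs, values))

    def net(name):
        if name not in nets:
            a, b = gates[name]
            nets[name] = int(not (net(a) and net(b)))
        return nets[name]

    return tuple(net(o) for o in outputs)


def truth_table(gates, inputs, outputs):
    """One output tuple per input assignment, in binary counting order."""
    return [
        evaluate(gates, inputs, outputs, v)
        for v in product([0, 1], repeat=len(inputs))
    ]


c17_table = truth_table(C17_GATES, C17_INPUTS, C17_OUTPUTS)
```

A truth table computed this way is the kind of object against which semantic equivalence of two circuits can be tested.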
3.2.1.2 Directed acyclic multi-graphs. In order to manipulate circuits,
they must be in a format suitable for that purpose. For this research, the discipline of
graph theory provides a suitable application domain. Namely, we represent circuits
as directed acyclic multi-graphs. We turn to Gross and Yellen [7] for a brief
refresher on graph theory terminology to help describe the rationale for choosing
graphs to represent circuits (reference Figure 3.3).
graph: A graph G = (V, E) is a mathematical structure consisting of two finite sets
V and E. The elements of V are called vertices (or nodes), and the elements of
E are called edges. Each edge has a set of one or two vertices associated to it,
which are called endpoints. [Example: All graphs in Figure 3.3.]
The authors correctly allow for edges with only one endpoint, which “is an edge that
joins a single endpoint to itself.” However, such a construct in a circuit would make it
sequential, not combinational. For our purposes, we only consider edges with exactly
two distinct vertices. See the definition for cycle below.
directed edge: A directed edge is an edge, one of whose endpoints is designated as
the tail, and whose other endpoint is designated as the head. An edge is said
to be directed from its tail to its head.
directed graph: A directed graph (or digraph) is a graph each of whose edges is
directed. [Example: Figures 3.3(b), (d), and (f).]
Figure 3.3: Example graphs.
(a) An undirected graph with no cycles.
(b) A directed graph with no cycles.
(c) An undirected graph with one cycle (1-2-3-4-1 and 1-4-3-2-1).
(d) A directed graph with one cycle (1 → 2 → 3 → 4 → 1 only).
(e) An undirected multi-graph with one cycle.
(f) A directed acyclic multi-graph.
We must limit the graphs we use to directed graphs because in a combinational
circuit, a connection between gates is always from the output of one gate to an input
of another gate.
cycle: A cycle is a nontrivial closed path (see footnote 1).
acyclic graph: An acyclic graph is a graph that has no cycles. [Example: Fig-
ures 3.3(a), (b), and (f).]
Combinational circuits do not have any feedback loops or memory, as do sequential
circuits. Therefore, only an acyclic graph can represent a combinational circuit.
multi-edge: A multi-edge is a collection of two or more edges having identical end-
points. The edge multiplicity is the number of edges within the multi-edge.
multi-graph: A multi-graph is a graph that may contain multi-edges. [Example:
Figures 3.3(e) and (f).]
In a combinational circuit, it is permissible for the output of one gate to be connected
to more than one input of another single gate. The analogous construct in graph
theory is a multi-graph.
directed acyclic graph: A directed acyclic graph (DAG) is a graph that is at the
same time a directed graph and an acyclic graph. It may or may not be a
multi-graph. [Example: Figures 3.3(b) and (f).]
For our purposes, we implicitly accept DAGs as also being multi-graphs. In other
words, DAG and directed acyclic multi-graph carry the same meaning, thus Fig-
ures 3.3(b) and 3.3(f) are both DAGs.
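The definitions above can be made concrete with a small Python sketch. It stores a circuit as a list of directed (tail, head) edges, allows repeated pairs to model multi-edges, and uses Kahn's topological-sort algorithm to test for cycles. The example circuit and all names are hypothetical; CORGI itself relies on JGraphT for this purpose.

```python
from collections import defaultdict


def is_acyclic(vertices, edges):
    """Kahn's algorithm. `edges` is a list of (tail, head) pairs; repeated
    pairs are allowed and represent multi-edges."""
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for tail, head in edges:
        successors[tail].append(head)
        indegree[head] += 1

    ready = [v for v in vertices if indegree[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    # Every vertex is removed exactly when the graph has no cycle.
    return visited == len(vertices)


# Hypothetical circuit: the output of gate A feeds both inputs of gate B,
# which is a multi-edge of multiplicity 2.
vertices = ["i0", "i1", "A", "B"]
edges = [("i0", "A"), ("i1", "A"), ("A", "B"), ("A", "B")]

combinational = is_acyclic(vertices, edges)

# Feeding B's output back into A introduces a cycle (a sequential circuit).
has_feedback = not is_acyclic(vertices, edges + [("B", "A")])
```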
Footnote 1: A path does not repeat any vertex (except possibly the initial/final vertex) or edge. Nontrivial
means the path includes more than one vertex. Closed means the initial vertex is the same as the
final vertex.
3.2.2 Iterative randomization. The RPM requires that an intent-protected
circuit, C ′, be indistinguishable from a randomly selected circuit, CR. An interesting
aspect of the RPM is that the comparison itself is not influenced by the choice of orig-
inal circuit, C. Consequently, if the obfuscator O does not encrypt (i.e., semantically
transform) a circuit, the indistinguishability comparison can still be performed. This
fact allows us to segregate the white-box component of O from its black-box compo-
nent as we explore randomization methods for white-box obfuscation of circuits.
To perform white-box obfuscation, we consider the process of subcircuit selection
and replacement. Two reasons drive us to this choice. First, to randomly
select a white-box replacement of C would require enumeration of all circuits in δC .
As circuit size increases, δC becomes prohibitively large, and the obfuscator suffers
greater-than-polynomial slowdown. Second, the separate steps of subcircuit selection
and subcircuit replacement offer opportunities to inject randomness into the white-
box obfuscation process.
Section 3.4.3 describes selection and replacement in greater detail, but we introduce
here the basics of the concept (reference Figure 3.4). Given a circuit C which
is to be white-box obfuscated, select a subcircuit, Csub. Retrieve a randomly chosen
circuit Crep from a library of circuits which contains a set of all circuits semanti-
cally equivalent to Csub (the assumption that such a library exists will be discussed
in Section 3.2.3). Finally, remove Csub from C and insert Crep in its place. As long
as Csub and Crep are semantically equivalent (and the order of inputs and outputs is
preserved), then semantic equivalence exists for C, all C ′i, and C ′n.
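A minimal Python sketch of this loop follows, restricted to single-gate subcircuits. It is not CORGI's implementation: the netlist encoding, the tiny hand-written library, and the naming scheme for new nets are all assumptions made for illustration. The final comparison checks that the truth table of C ′n matches that of C.

```python
import random
from itertools import product

OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOT": lambda a: 1 - a,
}

# Hypothetical library: for a one-gate Csub of the given type, a list of
# equivalent replacements. Each replacement lists (local net, op, args);
# "a" and "b" are the inputs of Csub and "out" is its output.
LIBRARY = {
    "NAND": [
        [("t", "AND", ("a", "b")), ("out", "NOT", ("t",))],
        [("ta", "NOT", ("a",)), ("tb", "NOT", ("b",)), ("out", "OR", ("ta", "tb"))],
    ],
    "AND": [[("t", "NAND", ("a", "b")), ("out", "NOT", ("t",))]],
    "OR": [[("ta", "NOT", ("a",)), ("tb", "NOT", ("b",)), ("out", "NAND", ("ta", "tb"))]],
}


def evaluate(gates, inputs, outputs, values):
    nets = dict(zip(inputs, values))

    def net(name):
        if name not in nets:
            op, args = gates[name]
            nets[name] = OPS[op](*(net(a) for a in args))
        return nets[name]

    return tuple(net(o) for o in outputs)


def truth_table(gates, inputs, outputs):
    return tuple(
        evaluate(gates, inputs, outputs, v)
        for v in product((0, 1), repeat=len(inputs))
    )


def replace_gate(gates, target, rng, tag):
    """Remove gate `target` (Csub) and splice in a random equivalent Crep."""
    op, (a, b) = gates[target]
    crep = rng.choice(LIBRARY[op])
    rename = {"a": a, "b": b, "out": target}
    new_gates = {g: spec for g, spec in gates.items() if g != target}
    for local, rep_op, args in crep:
        name = rename.get(local, f"{target}_{tag}_{local}")
        wired = tuple(rename.get(x, f"{target}_{tag}_{x}") for x in args)
        new_gates[name] = (rep_op, wired)
    return new_gates


# Hypothetical 3-input, 1-output circuit C.
C = {"g1": ("NAND", ("x0", "x1")), "g2": ("NAND", ("g1", "x2"))}
INPUTS, OUTPUTS = ["x0", "x1", "x2"], ["g2"]

rng = random.Random(0)
Cn = C
for i in range(5):  # n = 5 iterations
    candidates = sorted(g for g, (op, _) in Cn.items() if op in LIBRARY)
    Cn = replace_gate(Cn, rng.choice(candidates), rng, i)

equivalent = truth_table(C, INPUTS, OUTPUTS) == truth_table(Cn, INPUTS, OUTPUTS)
```

Because every replacement keeps the input and output nets of the removed gate, equivalence is preserved at each iteration, not only at the end.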
3.2.3 Circuit library exists. A library of replacement circuits must exist
in order for the process of iterative randomization to be possible. However, in Sec-
tion 3.2.2 we said that enumerating all possible replacements for C would violate the
polynomial slowdown condition of RPM. We overcome this apparent contradiction
by developing a library (see footnote 2) whose contents are limited to only small circuits, typically
on the order of 5 or fewer gates. In this way, all semantically equivalent circuits in
a particular family (i.e., all C ∈ δC) can be enumerated. Therefore, in the iterative
replacement process, a given Crep can truly be selected from among all size-bounded
circuits semantically equivalent to Csub.
Figure 3.4: Two representations of iterative white-box randomization.
(a) White-box obfuscation of circuit C by iteratively replacing randomly selected subcircuits (Csub) with a semantically equivalent subcircuit (Crep) chosen randomly from a circuit library. C is the unobfuscated circuit, C ′i is C after the ith iteration of replacement, and C ′n is C after an n-iteration obfuscation is complete.
(b) Depicts the sequential iterations of subcircuit selection and replacement.
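To give a feel for how such a size-bounded library can be enumerated, the Python sketch below generates every 2-input, 1-output circuit of at most two gates over a three-function basis and groups the circuits by truth table. It is a simplification and not the CXL design: the circuit encoding is hypothetical, the last gate is taken as the single output, and circuits containing unused gates are counted as distinct.

```python
from collections import defaultdict
from itertools import product

# Hypothetical basis for the illustration.
BASIS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
}


def enumerate_circuits(num_inputs, max_gates):
    """Yield circuits as tuples of (op, src_a, src_b). Sources index the
    circuit inputs first, then earlier gates. The last gate is the output."""

    def extend(prefix):
        if prefix:
            yield tuple(prefix)
        if len(prefix) == max_gates:
            return
        nets = num_inputs + len(prefix)
        for op, a, b in product(BASIS, range(nets), range(nets)):
            yield from extend(prefix + [(op, a, b)])

    yield from extend([])


def truth_table(circuit, num_inputs):
    rows = []
    for values in product((0, 1), repeat=num_inputs):
        nets = list(values)
        for op, a, b in circuit:
            nets.append(BASIS[op](nets[a], nets[b]))
        rows.append(nets[-1])
    return tuple(rows)


def build_library(num_inputs, max_gates):
    """Map each truth table to the list of circuits that realize it."""
    library = defaultdict(list)
    for circuit in enumerate_circuits(num_inputs, max_gates):
        library[truth_table(circuit, num_inputs)].append(circuit)
    return library


library = build_library(num_inputs=2, max_gates=2)
family_sizes = {table: len(circuits) for table, circuits in library.items()}
total_circuits = sum(family_sizes.values())
```

Under these assumptions there are 12 one-gate circuits and 12 × 27 two-gate circuits, and each additional gate multiplies the count again, which is why the size bound must stay small.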
3.3 Obfuscation toolkit
As this research is empirically based, a software tool was developed to perform
the white-box circuit obfuscation portion of the RPM. Although the RPM calls for
both black-box (program encryption) and white-box (randomization) techniques, they
are performed independently from one another. This allows us to develop software
which only performs the white-box function. The tool has two major components,
CORGI and CXL.
3.3.1 CORGI: the circuit randomizer. CORGI, which stands for Circuit
Obfuscation via Randomization of Graphs Iteratively, was developed to empirically
analyze the RPM. Its development was a major benefit of this research. The inner
workings of the software are described in greater detail in Appendix A. Here, we
briefly discuss the main features of CORGI.
3.3.1.1 Development environment. CORGI is coded entirely in Java.
Several factors influenced this choice. First, there is a strong emphasis on object-
oriented design (OOD) at the Air Force Institute of Technology (AFIT), and Java
is the de facto language of choice for the academic environment. Second, given the
nature of the problem domain (i.e., circuits), OOD is a logical design choice. The
third factor is based on our choice of application domain (i.e., to represent circuits as
graphs), which allowed us to incorporate JGraphT into the development.
Footnote 2: The circuit library used in this research is a product of concurrent research conducted by Capt
Moses James. His research focuses on circuit randomization as a set selection problem.
Table 3.2: The most notable features and benefits JGraphT contributed to the development of CORGI.
Feature | Benefit to CORGI development
Graph package | Model CORGI circuits as graphs. In particular, JGraphT’s graph package included classes for all the types of graphs described in Section 3.2.1.2.
Subgraph class | Manipulate subgraphs without modifying the base graph. This is a critical component of the subcircuit selection and replacement process.
Exporter classes | Export circuits to standard formats used by various graph software packages (e.g., yGraph, GraphVis, prefuse, etc.). Allows user to render circuits visually.
Algorithms package | Contains classes for standard algorithms used in graph theory. In particular, the CycleDetector class is a critical part of CORGI because it enforces the acyclic nature of DAGs.
JGraphT is an open source Java graph library [14]. Its free availability as an
open source project shortened the time to develop CORGI by at least several weeks–
possibly much more. JGraphT provides the means to easily generate graphs and
apply to them many of the common graph theory techniques. It is the crux of what
makes CORGI work. JGraphT not only provides the ability to model the underlying
graph of a circuit, it also has methods and services which make circuit manipulation
and analysis possible. Table 3.2 shows the key features and benefits of JGraphT.
Despite the graph basis for circuit manipulation (implemented by way of the
JGraphT library), CORGI completely hides from the user any references to graphs or
graph behavior. Thus, CORGI is effectively a translator between the two domains.
3.3.1.2 Subcircuit selection and replacement. Subcircuit selection and
replacement is the principal function CORGI performs. From the user perspective, it
is a single action, but as already described, this function is iterative. We describe in
more detail here the mechanics of how CORGI carries out one iteration of the process
(ref. Figure 3.1).
CORGI does not actually select subcircuits. Instead, it selects a subset of the
circuit’s gates based on a selection strategy chosen by the user (see footnote 3). This subset of the
circuit’s gates corresponds to a subset of vertices in the underlying graph, by which
a vertex-induced subgraph (or simply subgraph) is derived. CORGI then copies the
subgraph (leaving the base graph unchanged) and uses it to construct a separate
subcircuit representative of the gates selected.
Next, CORGI uses the new subcircuit’s truth table, along with other user inputs,
to request a replacement from the circuit library (CXL). CXL selects a random,
semantically equivalent, subcircuit replacement (i.e., its truth table is the same). The
original subcircuit is removed from the circuit, and the replacement subcircuit is
inserted in its place.
3.3.2 CXL: the circuit library. CXL is a component of CORGI which
contains a library of circuits. In a sense, CXL is really a library of sets of circuits.
Each set is a circuit family δC where C is characterized by a particular class CX-Y -S-Ω
(ref. Table 3.1).
Because of the various equivalence relationships in Boolean logic, |δC | grows
exponentially with even small increases in S and/or |Ω|. For practical reasons,
we choose S ≤ 3, although we do allow Ω ⊆ {AND, NAND, OR, NOR, XOR,
XNOR, NOT} (i.e., |Ω| ≤ 7).
From a user perspective, CXL is not a separate component from CORGI. Indeed,
CXL is accessed by CORGI via an interface, which is called from within the iterative
function of subcircuit selection and replacement. The user provides parameters which
are used by the interface, but the call itself is not controlled by the user. Because of
this, we consider CXL to be an integrated component of CORGI, and this perspective
is implicit in any further references to CORGI unless otherwise stated.
See [10] for more detailed information on the behavior of CXL.
Footnote 3: The initial implementation of CORGI limits selection to only one or two gates, primarily for
performance reasons, but also due to limitations imposed by the circuit library.
3.4 Empirical Approach
This research is predicated on the notion that we need empirical data to be able
to demonstrate whether practical obfuscation might be possible in light of theoretic
impossibility results. Perhaps there exist imperfect obfuscators that protect circuits
to a useful, measurable degree. Inherent in the preceding conjecture are two questions:
• What properties of circuits are indicators of useful, measurable circuit
protection?
• What methods of obfuscation produce such properties in circuits?
Since our standard of useful is the RPM, we are really asking what properties
of circuits are indistinguishable between an obfuscated circuit, C ′, and a randomly
selected (generated) circuit, CR. If we know which properties relate to indistinguisha-
bility under the RPM, our intuition is we should be able to easily find algorithms which
produce those properties in C ′. On the other hand, if we know that a particular ob-
fuscator will produce a C ′ which meets the RPM definition of indistinguishability, we
can deduce which properties are indicators of well-obfuscated circuits.
In reality, we do not know a priori the answer to either of the two questions
above. Our approach, therefore, is to work the problem incrementally to see where
the results converge. We briefly consider several candidate properties with which
to measure circuit obfuscation under RPM. Then we propose several algorithms for
performing subcircuit selection as part of the iterative randomization process. These
algorithms are applied to a circuit, C, and then the resulting white-box obfuscated
circuit, C ′, is examined for their effect on obfuscation under RPM. Next, we define
some key concepts used in the discussion which follows.
3.4.1 Key concepts. First, a circuit property, as we shall use the term, is
a descriptor of a single circuit. This is an important distinction since the white-box
circuit obfuscation process we employ is iterative (ref. Figure 3.4), creating
many intermediate circuits C ′i before finishing with C ′n (C ′n is the same as C ′ in
Figure 3.1). These intermediate circuits provide us the means to measure how a given
property changes throughout the iterative process, but each C ′i will have its own set
of properties independent of any other circuit.
Second, since combinational circuits are modeled as DAGs, we look initially to
graph theory for properties of graphs which may be candidate measures of circuit
obfuscation. This choice leads us to also use graph terminology to describe some of
the properties. When this occurs, equivalent terminology—if it exists—is included
parenthetically.
Third, our use of the term path is limited to only those paths which begin at
a circuit input and end at a circuit output. The intention is to describe control flow
through a circuit.
Fourth, DAGs are by their nature hierarchical, thus combinational Boolean
circuits are, too. A circuit’s gate hierarchy is dictated by the predecessor or successor
relationships of the various gates in the circuit. By our convention, if a gate precedes
another gate in some path through the circuit, then the preceding gate is at a higher
level. Equivalently, if a gate succeeds another gate along some path through the
circuit, then the succeeding gate is at a lower level. It is possible that a particular
gate could be assigned to any one of several levels, but our convention is to assign the
gate to the lowest level that preserves the hierarchy of the circuit.
Figure 3.5 demonstrates the concept of gate hierarchy. Note that gate B is at
level 2, not level 1, as is gate C. This is because the longest path from inputs of gate B
to the output of gate D is length 2. Similarly, gate A could have been assigned to
a new level 3, but the addition of the extra level breaks the convention that gates
should be assigned to the lowest level that preserves the hierarchy of the circuit.
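One way to compute levels under this convention is to give each gate the length of its longest path to a gate with no successors. The Python sketch below does so for a four-gate circuit. The wiring is an assumption inferred from the description in the text (gate B feeding both C and D), since the figure is not reproduced here, and treating gates without successors as the level-0 output gates is likewise an assumption.

```python
from functools import lru_cache

# Assumed wiring: gate -> the nets (circuit inputs or gates) feeding it.
GATES = {
    "A": ("i0", "i1"),
    "B": ("i2", "i3"),
    "C": ("A", "B"),
    "D": ("B", "C"),
}


def gate_levels(gates):
    """Assign each gate the lowest level that keeps every gate strictly
    above all of its successors; gates with no successors get level 0."""
    successors = {g: [] for g in gates}
    for gate, sources in gates.items():
        for src in sources:
            if src in gates:
                successors[src].append(gate)

    @lru_cache(maxsize=None)
    def level(gate):
        if not successors[gate]:
            return 0
        return 1 + max(level(s) for s in successors[gate])

    return {g: level(g) for g in gates}


levels = gate_levels(GATES)
```

With this wiring, B lands on level 2 because its longest route to the output passes through C, even though it also feeds D directly.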
Finally, certain proposed circuit properties are frequency distributions, represented
graphically as histograms. An example might be the number of unique paths
that transit each gate. In Figure 3.5, for example, gate A has two unique paths:
i0-A-C-D and i1-A-C-D. Similarly, gate B has two, gate C has four, and gate D has
six. The associated histogram is shown in Figure 3.6.
Figure 3.5: A simple example of circuit hierarchy.
(a) A simple circuit (X = 4, Y = 1, S = 4, Ω = {NAND}) without hierarchical levels.
(b) The same circuit with lowest hierarchy level assigned to each gate.
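A distribution of this kind can be computed by multiplying, for each gate, the number of input-to-gate paths by the number of gate-to-output paths. The Python sketch below does this for a small hypothetical three-gate circuit (deliberately not the circuit of Figure 3.5) and then tallies the histogram; all names are illustrative.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical circuit: gate -> the nets feeding it.
GATES = {
    "G1": ("x0", "x1"),
    "G2": ("x1", "x2"),
    "G3": ("G1", "G2"),
}
INPUTS = {"x0", "x1", "x2"}
OUTPUTS = {"G3"}


def paths_through_each_gate(gates, inputs, outputs):
    """Count input-to-output paths that transit each gate."""
    successors = {}
    for gate, sources in gates.items():
        for src in sources:
            successors.setdefault(src, []).append(gate)

    @lru_cache(maxsize=None)
    def paths_in(node):
        # Number of paths from any circuit input to this node.
        if node in inputs:
            return 1
        return sum(paths_in(src) for src in gates[node])

    @lru_cache(maxsize=None)
    def paths_out(node):
        # Number of paths from this node to any circuit output.
        ends_here = 1 if node in outputs else 0
        return ends_here + sum(paths_out(s) for s in successors.get(node, []))

    return {g: paths_in(g) * paths_out(g) for g in gates}


per_gate = paths_through_each_gate(GATES, INPUTS, OUTPUTS)

# Histogram: number of paths -> number of gates with that many paths.
histogram = Counter(per_gate.values())
```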
3.4.2 Properties of obfuscated circuits. A property of a circuit may be a
single value (e.g., average path length), or a distribution of values (see Figure 3.6). In
the latter case, the property will be identified as such. We propose several circuit
properties as candidate measures of circuit obfuscation, without consideration of the
efficacy of each property (see Table 3.3).
To be clear, the properties listed in Table 3.3 serve two purposes. First, they are
objects of the proposed algorithms (Section 3.4.3 below). Second, they are collectively
a starting point for future research on which circuit properties are strong indicators
of effective obfuscation.
Figure 3.6: A simple example of a histogram of a circuit property.
The chart represents the frequency of occurrence of gates having a par-
ticular number of unique paths passing through them. In this example
circuit, two gates have two unique paths (gates A and B), one gate has
four unique paths (gate C), and one gate has six unique paths (gate D).
Table 3.3: A set of candidate circuit properties for measuring circuit obfuscation.
Circuit-level:
• Number of vertices at each hierarchical level [distribution]
• Set of input/output pairs as determined by paths through the circuit
• Number of vertex (gate) types (|Ω|)
• Number of each vertex type (e.g., AND, OR, etc.) [distribution]
Gate-level:
• Number of paths through each gate [distribution]
• Number of unique input/output pairs represented by paths through each gate [distribution]
• Number of successors of each gate (i.e., gate fanout) [distribution]
• Number of predecessors of each gate (i.e., gate fan-in) [distribution]
Table 3.4: A set of candidate subcircuit selection algorithms used to iteratively white-box obfuscate a circuit. Algorithm names are derived from the file name of the Java class which implements the algorithm in CORGI.
Selection Algorithm | Description
RandomSingleGate | Selects a single gate at random
RandomTwoGates | Selects two gates at random
RandomLevelTwoGates | Selects a hierarchical level at random, and limits replacement to two gates selected at random from that level (±1 level)
FixedLevelTwoGates | Same as RandomLevelTwoGates except the hierarchical level is specified
LargestLevelTwoGates | Same as FixedLevelTwoGates except the hierarchical level is the one containing the most gates
OutputLevelTwoGates | Same as FixedLevelTwoGates except the hierarchical level is 0 (level 0 contains all the output gates)
3.4.3 White-box obfuscation algorithms. CORGI was designed to use mul-
tiple, interchangeable subcircuit selection algorithms. Recall that under the RPM,
an obfuscated circuit C ′, which is semantically equivalent to circuit C, is indistinguishable
from a random circuit CR. We would like to be able to select C ′ from a
completely enumerated δC′ , but for large |C|, δC′ is too large to
enumerate. This limitation forces us to choose another method of
random “selection” of C ′: iterative randomized subcircuit selection and replacement.
The process of obfuscating a large circuit by iteratively randomizing small sub-
circuits provides opportunities and introduces challenges as compared to direct selec-
tion from δC′ . An advantage of the process is that a subcircuit selection algorithm can
be chosen such that it optimizes a particular obfuscation metric. A disadvantage, due
to the fact that the process is a metaheuristic, may be that a particular sequence of
iterations will converge on a final C ′ with a suboptimal value for the target property.
Table 3.4 lists a candidate set of randomization algorithms developed for this
research with a brief description of each. In Chapter IV, we analyze these algorithms
and how they were derived.
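The selection strategies of Table 3.4 can be sketched in Python as follows. This is an interpretation rather than the CORGI code: the function names mirror the algorithm names, the gate-to-level map is hypothetical, "±1 level" is read as drawing from the chosen level and its two neighbouring levels, and the check for introduced cycles (Section 4.2.2) is omitted.

```python
import random

# Hypothetical gate -> hierarchical level map; level 0 holds the output gates.
LEVELS = {"A": 2, "B": 2, "C": 1, "D": 0}


def random_single_gate(levels, rng):
    return [rng.choice(sorted(levels))]


def random_two_gates(levels, rng):
    return rng.sample(sorted(levels), 2)


def fixed_level_two_gates(levels, level, rng):
    """Two distinct gates drawn from `level` and its neighbouring levels.
    Raises ValueError if fewer than two gates qualify."""
    pool = sorted(g for g, lvl in levels.items() if abs(lvl - level) <= 1)
    return rng.sample(pool, 2)


def random_level_two_gates(levels, rng):
    level = rng.choice(sorted(set(levels.values())))
    return fixed_level_two_gates(levels, level, rng)


def largest_level_two_gates(levels, rng):
    sizes = {}
    for lvl in levels.values():
        sizes[lvl] = sizes.get(lvl, 0) + 1
    return fixed_level_two_gates(levels, max(sizes, key=sizes.get), rng)


def output_level_two_gates(levels, rng):
    return fixed_level_two_gates(levels, 0, rng)


rng = random.Random(2008)
picks = {
    "RandomSingleGate": random_single_gate(LEVELS, rng),
    "RandomTwoGates": random_two_gates(LEVELS, rng),
    "RandomLevelTwoGates": random_level_two_gates(LEVELS, rng),
    "LargestLevelTwoGates": largest_level_two_gates(LEVELS, rng),
    "OutputLevelTwoGates": output_level_two_gates(LEVELS, rng),
}
```

Writing each strategy as a function with the same shape reflects the interchangeable, modular design described for CORGI.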
IV. Results
CORGI is an architecture for obfuscating combinational Boolean circuits via iterative subcircuit selection and replacement. Six strategies for subcircuit selection
are implemented in CORGI as modular algorithms. When executed, these algorithms
transform a circuit C into a randomized (i.e., white-box obfuscated) but semantically
equivalent circuit C ′. The nature of the transformation is different for each algorithm.
Also, the design of certain CORGI components degrades CORGI performance (run-
time) when some selection algorithms are employed.
4.1 Overview
To perform white-box obfuscation under the RPM on circuit C, we would ideally
like to enumerate all circuits in δC , then select one at random as the semantically
equivalent replacement circuit C ′. Such enumeration is infeasible for large circuits,
which means a replacement circuit cannot be directly selected at random. Instead, it
must be built, but still yield a random C ′ ∈ δC . The process of iterative subcircuit
selection and replacement described in Section 3.2.2 provides two ways for introducing
randomness into the process.
1. Random selection: Select a subcircuit Csub ⊂ C at random.
2. Random replacement: Select a replacement circuit Crep ∈ δCrep at random.
There may also be some intermediate circuit C ′i for which non-random selection
and replacement are preferred. Here, also, there are two such smart choices.
1. Smart selection: Only select subcircuits which have a particular property. If
the subset of allowable selections contains more than one subcircuit, then one
may be selected at random or based on another property.
2. Smart replacement: Similar to smart selection, only select replacement circuits
from the library which have a particular property. If the subset of allowable
replacements contains more than one circuit, then one may be selected at random
or based on another property.
4.2 Limitations
Our research exposed certain limitations on the development of subcircuit se-
lection and replacement algorithms. Smart strategies often impinged upon temporal
or spatial efficiency, and the problem domain (i.e., combinational Boolean circuits)
reduced the randomness of random selection strategies as we seek to avoid creating
sequential circuits.
4.2.1 Smart strategies. There are multiple ways to make smart subcircuit
selections. Some examples include choosing only subcircuits with a particular input
size (Xsub), output size (Ysub), circuit size (Ssub), basis (Ωsub), and/or truth table.
Selection can also be made based on particular subsets of the circuit’s gates. For
example, select only subcircuits which have gates in a particular hierarchical level in
the circuit. Other smart selection strategies require searching the underlying graph
for isomorphic subgraphs, which is an NP-complete problem [5]. These can all pose
intractability problems for our iterative randomization process when we have large
circuit sizes, which limits the efficiency of the search algorithm.
Consider a smart selection strategy which is based on subgraph isomorphism.
Since the search is NP-complete, and the search space can be quite large (circuits
with thousands, perhaps millions of gates), the strategy becomes too computationally
intensive to be efficient, as required by the RPM.
Two of the six algorithms (RandomSingleGate and RandomTwoGates) use a
purely random selection strategy; they are discussed in Sections 4.3.2 and 4.3.3.
The other four algorithms (RandomLevelTwoGates, FixedLevelTwoGates,
LargestLevelTwoGates, and OutputLevelTwoGates) use a blend of smart and ran-
dom selection, as is described in Sections 4.3.4–4.3.7. None of the latter four algo-
rithms use NP-complete selection strategies.
As for smart replacement, CXL currently has no means to employ such a strat-
egy. The problem is more a limit on space than on time. Specifically, if all replacement
circuits are stored with sufficient metadata, then finding a particular replacement is
basically a simple lookup in a database. However, as the size bound of candidate
replacement circuits increases, the size of the library increases exponentially, thus
limiting the set from which replacements can be selected.
4.2.2 Introduced cycles. The choice of combinational Boolean circuits places
a particular limitation on which subcircuits may be selected for replacement, as stated
in Axiom 1.
Axiom 1. In order to maintain the combinational structure of circuit C, the set of
gates G(Csub) in a selected subcircuit Csub must not contain any pair of gates (Gi, Gj)
such that (WLOG):
(a) Gi precedes Gj along some directed path in C, and
(b) the longest directed Gi-Gj path in C has length ≥ 2.
The result of improperly selecting Csub is shown in Figure 4.1. Figure 4.1(a)
shows a 4-input, 1-output, 4-gate circuit. In Figure 4.1(b), a 3-input, 2-output, 2-
gate subcircuit Csub is selected for replacement. However, Csub contains a pair of
gates, B and D, which violate Axiom 1. Figure 4.1(c) shows that a cycle is created
if Csub is replaced with any replacement circuit Crep. The problem occurs because
gate C receives an output from Csub but also provides an input to Csub, thus creating
a cycle. If Csub is improperly selected, there exists no Crep such that a cycle is not
created.
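The condition in Axiom 1 can be checked mechanically. The following is a minimal Python sketch, not CORGI's implementation: the adjacency-dictionary representation, the function names, and the four-gate example circuit (in which B feeds C and C feeds D, in the spirit of Figure 4.1) are assumptions made for illustration.

```python
from functools import lru_cache
from itertools import permutations


def longest_path(succ, src, dst):
    """Length in edges of the longest directed src->dst path in a DAG, or -1 if none."""
    @lru_cache(maxsize=None)
    def walk(node):
        if node == dst:
            return 0
        best = -1
        for nxt in succ.get(node, ()):
            sub = walk(nxt)
            if sub >= 0:
                best = max(best, sub + 1)
        return best

    return walk(src)


def violates_axiom_1(succ, selected):
    """True if some ordered pair of selected gates is joined by a longest path of length >= 2."""
    return any(longest_path(succ, gi, gj) >= 2
               for gi, gj in permutations(selected, 2))


# Hypothetical circuit: each gate maps to the list of gates it feeds.
circuit = {"A": ["C"], "B": ["C", "D"], "C": ["D"], "D": []}

print(violates_axiom_1(circuit, {"B", "D"}))  # True: B->C->D has length 2
print(violates_axiom_1(circuit, {"B", "C"}))  # False: the only B-C path has length 1
```

In this sketch the pair (B, D) is rejected because gate C lies on a path between them, which is exactly the situation that produces the cycle in Figure 4.1(c).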
As a result of this limitation, the manner of subcircuit selection in CORGI
requires a sequential selection of gates for those algorithms which select multi-gate
subcircuits. There are differences in how this is performed for each algorithm, which
are discussed below. The important point here is that, once the set of gates is selected,
the subcircuit is defined (induced) by the set of selected gates, as well as all connections
(“wires”) leading into or out of those gates. It is not necessary that selected gates be
connected directly to each other in C.
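To make the notion of an induced subcircuit concrete, the sketch below collects the wires that cross the boundary of a selected gate set. It is an illustration only: the adjacency-dictionary encoding, the function name induced_boundary, and the small example circuit are assumptions, and CORGI itself relies on JGraphT rather than on code like this.

```python
def induced_boundary(succ, selected):
    """Return (incoming, outgoing) wires of the subcircuit induced by `selected`.

    succ maps every node (circuit input or gate) to the nodes it feeds.
    """
    selected = set(selected)
    incoming = sorted((u, v) for u, outs in succ.items() for v in outs
                      if u not in selected and v in selected)
    outgoing = sorted((u, v) for u, outs in succ.items() for v in outs
                      if u in selected and v not in selected)
    return incoming, outgoing


# Hypothetical circuit: three inputs, gates A and B feeding gate C.
circuit = {"in1": ["A"], "in2": ["A", "B"], "in3": ["B"],
           "A": ["C"], "B": ["C"], "C": []}

incoming, outgoing = induced_boundary(circuit, {"A", "B"})
x_sub = len({u for u, _ in incoming})  # distinct sources driving the subcircuit
y_sub = len({u for u, _ in outgoing})  # selected gates whose output leaves it
print(incoming, outgoing, x_sub, y_sub)
```

Here gates A and B are not connected to each other, yet together they induce a 3-input, 2-output subcircuit. A selected gate that is itself a circuit output would also count as a subcircuit output, which this simplified encoding does not model.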
Figure 4.1: An example of an improper subcircuit selection and how
it will create a cycle after replacement.
(a) A circuit before subcircuit selection.
(b) Subcircuit Csub is selected. It is not a valid selection since gate B
is a predecessor of gate D and the longest path from B to D is ≥ 2.
(c) A cycle is created after replacing an improperly selected subcircuit,
regardless of what replacement Crep is used.
4.3 Analysis of subcircuit selection algorithms
For this research, six subcircuit selection algorithms were developed. All algo-
rithms adhere to a standard selection interface in CORGI, which does not actually
select a subcircuit Csub directly from a circuit C ′i. Instead, the interface requires each
algorithm to return a set of gates G. CORGI then uses G, together with JGraphT’s
DirectedSubgraph class to create the subcircuit Csub induced by the selected gates in
set G. Thus, each algorithm returns a set of gates, not a subcircuit. The sections that
follow describe the manner of selection and the “behavior” each algorithm exhibits.
The development of these algorithms was itself an iterative process. As each new
algorithm was developed and tested, the results would suggest alternate strategies for
selection. Therefore, the algorithms are presented below in roughly the same order in
which they were developed.
4.3.1 Common functions. The overall process of randomization is also
presented in Appendix A (Algorithm 7). For the sake of brevity here, we defer to
Appendix A, Section A.2 for the details of two functions used by the six selection
algorithms discussed below: SelectRandomGate (Algorithm 8) and RejectGates (Al-
gorithm 9). It is sufficient to know that SelectRandomGate randomly selects a gate
from a set of gates, and RejectGates populates a set of gates which should not be
part of the input to SelectRandomGate.
A third function, EstablishGateHierarchy, is used only by the so-called level
algorithms (those for which selection is based on a circuit’s hierarchical level—all
have “Level” in their name). The details of EstablishGateHierarchy are presented
in Appendix A (Algorithm 10), but its basic functionality is to assign each gate to
the lowest allowable level in the circuit’s hierarchy. The details of why this function
is required will be presented in Section 4.3.4, where we introduce the first of the level
algorithms, RandomLevelTwoGates.
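As a rough illustration of what a level assignment involves, the sketch below computes one plausible hierarchy. It is not Algorithm 10: it assumes that level 0 holds the gates with no successors and that every other gate sits one level above its highest successor (the lowest level that still places it above everything it feeds). The function name and the example circuit are hypothetical.

```python
from functools import lru_cache


def establish_levels(succ):
    """Map each gate to a level: 0 for gates with no successors,
    otherwise 1 + the highest level among its direct successors."""
    @lru_cache(maxsize=None)
    def level(gate):
        outs = succ.get(gate, ())
        return 0 if not outs else 1 + max(level(nxt) for nxt in outs)

    return {gate: level(gate) for gate in succ}


# Hypothetical circuit: each gate maps to the list of gates it feeds.
circuit = {"A": ["C"], "B": ["C", "D"], "C": ["D"], "D": []}
levels = establish_levels(circuit)
print(levels)  # {'A': 2, 'B': 2, 'C': 1, 'D': 0}
```

Under this assumption a gate's level equals the length of its longest path to a sink, so any two gates joined by a path of length 2 or more are at least two levels apart. The recursion is kept for brevity; very tall circuits would call for an iterative traversal.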
4.3.2 RandomSingleGate. RandomSingleGate was the first selection algo-
rithm developed for CORGI. As the name implies, all subcircuit selections Csub are
of the class CXsub-1-1-Ωsub where
Ωsub ⊂ {AND, NAND, OR, NOR, XOR, XNOR}
Xsub ≥ 2
Ysub = Ssub = |Ωsub| = 1
Since all Csub contain only one gate, any gate can be selected from C ′i and
replaced without creating cycles in C ′i+1 (as long as C ′i is combinational). The selection
procedure is simple, as shown in Algorithm 1.
Algorithm 1 RandomSingleGate(C ′i)
1: Gsub ← ∅ {set of gates (1 in this case) to induce Csub}
2: G(C ′i) ← set of all gates in C ′i
3: gk ← call SelectRandomGate(G(C ′i))
4: Gsub ← Gsub ∪ {gk}
5: return Gsub
RandomSingleGate was developed initially as a simple algorithm by which
CORGI functionality could be tested. The function of removing a subcircuit from
a circuit, then replacing it with a different subcircuit is a non-trivial activity.
Selecting a single-gate subcircuit for replacement, while simple to do, provides multiple
dimensions by which to test the process of iterative randomization. When the
subcircuit is replaced, gate properties such as type (NAND, NOR, etc.), fan-in (number
of adjacent predecessors), fanout (number of adjacent successors), and whether the
selected gate is a circuit output, must all be accounted for.
RandomSingleGate is a purely random (as opposed to smart) selection process.
No knowledge of the target circuit is needed other than the set of gates in the circuit.
The iterative process cannot be guided in any way.
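Algorithm 1 translates almost line for line into code. The Python sketch below is a hedged illustration rather than CORGI's Java implementation: the helper select_random_gate stands in for SelectRandomGate (Algorithm 8, which is not reproduced in this section), and the gate labels are illustrative.

```python
import random


def select_random_gate(gates, rng=random):
    """Pick one gate uniformly at random from a collection of gates."""
    return rng.choice(sorted(gates))  # sorted() makes seeded runs repeatable


def random_single_gate(gates, rng=random):
    """Return the one-element gate set that induces Csub (cf. Algorithm 1)."""
    return {select_random_gate(gates, rng)}


# Illustrative gate labels; any hashable, sortable identifiers would do.
gates = {"g10", "g11", "g16", "g19", "g22", "g23"}
g_sub = random_single_gate(gates, random.Random(0))
print(g_sub)
```

The only knowledge of the target circuit that the function needs is the set of gates, which matches the description of a purely random selection.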
There are three results produced by RandomSingleGate. First, no new external
control flows are introduced in the circuit; second, the size S of C ′i+1 is never smaller
than that of C ′i; and third, the circuit becomes very “tall” (ℓ ∝ n, where ℓ = number of
hierarchical levels, and n = number of replacement iterations).
The first result is contingent on how we use the term control flow. If we have
access to the structure of a circuit, then every unique path through a circuit is a control
flow. If, however, we only have black-box access to a circuit, then no distinction can be
made between unique paths which share a common source (input) and sink (output).
We will refer to the former as internal control flow and the latter as external control
flow. RandomSingleGate will never introduce new external control flows because all
subcircuit inputs connect to a single subcircuit output. However, RandomSingleGate
will always introduce new internal control flows. The only way RandomSingleGate
does not produce new internal control flows is the trivial case where the selected single-
gate subcircuit is replaced with itself. All other semantically equivalent replacements
have more than one gate, with connections between them; thus new internal control
flows are always introduced.
The second result is a function of the replacement subcircuits Crep returned by
CXL. In order for a replacement Crep of a single-gate subcircuit Csub to change C ′i,
Crep must have more than one gate. The reason is that there is no non-trivial
single-gate equivalence between any pair of gates (gi, gj) in Ω = {AND, NAND, OR,
NOR, XOR, XNOR}.
The third result is a natural consequence of the first two. A subcircuit Csub
comprised of a single Boolean logic gate has only one hierarchical level (ℓ = 1). All
nontrivial replacements Crep of Csub have at least two gates. If Crep has n gates, then
it can have 1 ≤ ℓ ≤ n hierarchical levels. If ℓ ≥ 2, then C ′i+1 could “grow” (relative
to C ′i) by as much as ℓ − 1 levels (although it may not grow at all). The rate of
growth over many iterations is a function of which gates are selected and the average
number of levels in each Crep selected from CXL.
Reference Figure A.5(b) for a sample result of applying this algorithm to IS-
CAS benchmark circuit C17. As we see from the data presented in Section A.3,
RandomSingleGate produces the tallest C ′ on average among all the circuits.
4.3.3 RandomTwoGates. RandomTwoGates is meant to be a two-gate version
of RandomSingleGate. All subcircuit selections Csub are of the class CXsub-Ysub-2-Ωsub
where
Ωsub ⊂ {AND, NAND, OR, NOR, XOR, XNOR}
Xsub ≥ 2
Ysub = 1 or Ysub = 2
Ssub = 2
|Ωsub| = 1 or |Ωsub| = 2
The selection of Csub is accomplished by sequentially selecting the two gates.
The first gate, g1, is selected entirely randomly, in exactly the same fashion as the
gate gk was selected by the RandomSingleGate algorithm. The second gate, g2, must
be selected more carefully, however, in order not to introduce cycles after replacement.
Specifically, g2 can only be selected from a specific subset of gates in C ′i that remains
after g1 was selected (ref. Section 4.2.2). The procedure is shown in Algorithm 2.
Algorithm 2 RandomTwoGates(C ′i)
1: Gcand ← ∅ {set of candidate gates to select from randomly}
2: Gcand ← Gcand ∪G(C ′i) {set of all gates in C ′i}
3: g1 ← call SelectRandomGate(Gcand)
4: Gcand ← Gcand − {g1} {g1 cannot also be g2}
5: Gcand ← Gcand − {call RejectGates(g1, true)} {remove predecessors of g1}
6: Gcand ← Gcand − {call RejectGates(g1, false)} {remove successors of g1}
7: g2 ← call SelectRandomGate(Gcand)
8: Gsub ← ∅ {set of gates to induce Csub}
9: Gsub ← Gsub ∪ {g1, g2}
10: return Gsub
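The candidate filtering in Algorithm 2 can be sketched as follows. This is an illustration under stated assumptions, not the thesis's code: RejectGates (Algorithm 9) is not reproduced in this section, so the hypothetical reject_gates below simply applies Axiom 1 and rejects every gate whose longest path to or from g1 has length 2 or more. The adjacency-dictionary encoding and the example circuit are also assumptions.

```python
import random


def longest_from(adj, src):
    """Longest path length from src to every node reachable in a DAG."""
    best = {src: 0}
    stack = [(src, 0)]
    while stack:
        node, depth = stack.pop()
        for nxt in adj.get(node, ()):
            if best.get(nxt, -1) < depth + 1:
                best[nxt] = depth + 1
                stack.append((nxt, depth + 1))
    return best


def reject_gates(succ, g1):
    """Gates that may not be paired with g1 (longest path to or from g1 >= 2)."""
    pred = {gate: [] for gate in succ}
    for u, outs in succ.items():
        for v in outs:
            pred[v].append(u)
    below = longest_from(succ, g1)   # successors of g1, with distances
    above = longest_from(pred, g1)   # predecessors of g1, with distances
    return {g for g, d in {**below, **above}.items() if d >= 2}


def random_two_gates(succ, rng=random):
    """Return a two-gate set that induces Csub (cf. Algorithm 2)."""
    cand = set(succ)
    g1 = rng.choice(sorted(cand))
    cand -= {g1}                      # g1 cannot also be g2
    cand -= reject_gates(succ, g1)    # drop gates that would create a cycle
    g2 = rng.choice(sorted(cand))     # raises IndexError if no candidate remains
    return {g1, g2}


# Hypothetical circuit: each gate maps to the list of gates it feeds.
circuit = {"A": ["C"], "B": ["C", "D"], "C": ["D"], "D": []}
print(random_two_gates(circuit, random.Random(0)))
print(reject_gates(circuit, "D"))  # A and B both reach D through C
```

Directly adjacent gates remain eligible in this sketch, which agrees with the later observation that RandomTwoGates sometimes selects adjacent gates.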
There were two motivations for developing RandomTwoGates. We wanted to
continue testing the capabilities of CORGI to determine if the selection/replacement
process will work for Crep with more than one output. We also had the intuition
that a replacement for a multi-input, multi-output subcircuit would introduce new
external control flows.
RandomTwoGates is (almost) purely a random selection algorithm. The only
caveat is that not every pair of gates (g1, g2) ⊂ G(C ′i) is a “legal” selection, since
some pairs introduce cycles when replaced. Despite the fact that the set of candidates
for g2 is a subset of G(C ′i) − {g1}, RandomTwoGates is in no way a smart selection
algorithm.
There were three results from analyzing the behavior of RandomTwoGates. First,
we confirmed our intuition that new external control flows can indeed be intro-
duced. Second, similar to RandomSingleGate, the circuit becomes very tall, with
few gates in any single hierarchical level. Third, RandomTwoGates runs slower than
RandomSingleGate because CXL must select from a larger store of semantically equiv-
alent replacements as the number of inputs increases. We will discuss each result
separately.
By far the most profound discovery was that new external (and internal) con-
trol flows can be introduced (but it does not always occur). The reason it can is
because the subcircuit selected can be (and often is) comprised of two gates, g1
and g2, which are not adjacent to each other (i.e., g1 is not a predecessor of g2).
If g1 and g2 are adjacent, then the resulting subcircuit Csub will have only one output,
and RandomTwoGates will behave like RandomSingleGate for that single iteration.
The probability P that RandomTwoGates creates a new control flow during any
given replacement iteration i is described by
    P(i) \propto \left( 1 - \frac{n_e}{X \times Y} \right) \times p_c \times p_a \qquad (4.1)
where i, ne, X, Y , pc, and pa are described below:
i      A particular iteration of the algorithm
ne     Number of external control flows in C ′i before selection
X, Y   Number of inputs and outputs, respectively, of C ′i
pc     Probability that CXL returns a replacement subcircuit Crep with
       more control flows than Csub (ref. Figure 4.4 for an example)
pa     Probability that RandomTwoGates will choose two gates adjacent
       to one another (i.e., the output of one gate feeds an input of the
       other; ref. Section 4.3.2)
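The quantity ne in Equation 4.1 can be computed by a reachability search from each circuit input. The sketch below is illustrative only: the function name and the adjacency-dictionary encoding are assumptions, and the C17 wiring is taken from the commonly distributed ISCAS-85 netlist rather than from this thesis. It reports the input/output pairs that are not yet connected, which are the only places where a new external control flow could appear.

```python
def external_control_flows(succ, inputs, outputs):
    """Set of (input, output) pairs joined by at least one directed path."""
    flows = set()
    for src in inputs:
        seen, stack = {src}, [src]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        flows.update((src, out) for out in outputs if out in seen)
    return flows


# Assumed C17 wiring (standard ISCAS-85 netlist): node -> nodes it feeds.
c17 = {1: [10], 2: [16], 3: [10, 11], 6: [11], 7: [19],
       10: [22], 11: [16, 19], 16: [22, 23], 19: [23], 22: [], 23: []}
inputs, outputs = [1, 2, 3, 6, 7], [22, 23]

flows = external_control_flows(c17, inputs, outputs)
n_e = len(flows)
missing = sorted({(i, o) for i in inputs for o in outputs} - flows)
print(n_e, len(inputs) * len(outputs))  # 8 10
print(missing)                          # [(1, 23), (7, 22)]
```

With ne = 8 of X × Y = 10 possible pairs already connected, the first factor of Equation 4.1 is small for C17, and the two unconnected pairs are the same In1–Out23 and In7–Out22 flows that the experiments in this chapter report as newly introduced.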
The foregoing can best be demonstrated with an actual circuit. Figure 3.2
depicts ISCAS benchmark C17, which was the target circuit for an experiment to
demonstrate how selection algorithm RandomTwoGates can introduce new external
control flows. A series of 20-iteration trials were performed until the final circuit C ′
had more external control flows than the original C (i.e., ISCAS benchmark C17).
After only seven trials, a C ′ was found with a path from input 1 to output 23, which
was not present in C. CORGI has the capability to output the results of each iteration
of randomization, and by looking back through the data, we found that the seventh
iteration produced the desired effect. Figure 4.2 shows the transition from C ′6 to C ′7
(iteration #7 in this example).
The second result for RandomTwoGates (the fact that it also makes circuits grow
very tall) was somewhat unexpected. In retrospect, it probably should not have been,
since the same relationship between the hierarchical levels of Csub and Crep described
in Section 4.3.2 exists for RandomTwoGates. As circuit size increases, the probability
that Csub will have two hierarchical levels (ℓ = 2) decreases since the number of
adjacent gate pairs in C ′i is far smaller than the number of all gate pairs
in C ′i. The rate at which a circuit obfuscated with RandomTwoGates grows taller
is, on average, slightly slower than for RandomSingleGate since there is a non-zero
probability that a one-output Csub is selected during a given iteration of subcircuit
selection and replacement.
The third result has to do with a non-intuitive property of circuit families. The
size of a given family δ is a function of several factors, including input quantity,
output quantity, basis, and gate quantity. But it is also a function of the signature
(truth table) of elements of δ. Some families have circuit signatures such that there
are relatively few (sometimes zero) elements. Other families may have thousands of
elements. When a subcircuit Csub is selected such that |δ| is large, the selection of a re-
placement Crep takes longer. RandomTwoGates selects Csub such that δCsub (from which
CXL must choose a Crep) is, on average, larger than it is when RandomSingleGate
is the selection algorithm. See [10] for more details on the relationship of a circuit’s
signature (truth table) to the size of its circuit family.
Reference Figure A.6(b) for a sample result of applying this algorithm to IS-
CAS benchmark circuit C17. Again, from the data presented in Section A.3, Random-
TwoGates produces C ′ which are, on average, about half the height of circuits pro-
duced by RandomSingleGate.
(a) Circuit C ′6 with Csub selected (b) New circuit C ′7 with Crep inserted
Figure 4.2: Subcircuit selection and replacement using RandomTwoGates on ISCAS
C17, which creates a new external control flow in the circuit (input 1 [In1] to
output 23 [Out23]).
(a) Gates 31 and 32 (Csub) will be removed from C ′6. Note there is no control flow
from In1 to Out23.
(b) New circuit C ′7 is created after Csub in circuit C ′6 is replaced with semantically
equivalent Crep (gates 41, 42, and 43). A new control flow now exists from In1
to Out23 (path: In1→35→41→42→29→40→Out23).
(a) Circuit C ′7 with Csub selected (b) New circuit C ′8 with Crep inserted
Figure 4.3: Subcircuit selection and replacement using RandomTwoGates on ISCAS
C17, which replaces two gates, each added during a different iteration.
(a) Gates 39 and 43 (Csub) will be removed from C ′7. Note that gate 39 was not in
the original circuit.
(b) New circuit C ′8 is created after Csub in circuit C ′7 is replaced with semantically
equivalent Crep (gates 44, 45, 46, and 47). Because of the structure of the selected
replacement circuit, gates 38 and 35 are each elevated to the next higher layer in the
circuit hierarchy.
(a) Csub from Figure 4.2(a) (b) Crep from Figure 4.2(b)
     Inputs         Outputs
26   35   39     31/42   32/43
F    F    F      F       T
F    F    T      F       T
F    T    F      F       T
F    T    T      F       F
T    F    F      T       T
T    F    T      T       T
T    T    F      T       T
T    T    T      T       F
(c) Truth table of Csub and Crep
Figure 4.4: An example of how a replacement subcircuit Crep can
introduce a new control flow where none existed in the selected subcir-
cuit Csub (reference Figure 4.2).
(a) No control flow exists between gate 35 and gate 29 in Csub.
(b) Subcircuit Crep has a control flow from gate 35 to gate 29.
(c) The truth table of both circuits.
4.3.4 RandomLevelTwoGates. The RandomLevelTwoGates selection algo-
rithm functions the same as RandomTwoGates except that Gcand only contains gates
which are in at most three contiguous levels of the circuit hierarchy. All subcircuit
selections Csub are of the class CXsub-Ysub-2-Ωsub where
Ωsub ⊂ {AND, NAND, OR, NOR, XOR, XNOR}
Xsub ≥ 2
Ysub = 1 or Ysub = 2
Ssub = 2
|Ωsub| = 1 or |Ωsub| = 2
ℓg2 = ℓg1 or ℓg2 = ℓg1 ± 1
The similarity between RandomLevelTwoGates and RandomTwoGates is in how each
selects the first gate, g1. In both cases, g1 is selected entirely at random. Since gate g1
occupies some hierarchical level ℓg1, level ℓg1 is a de facto random selection.
The difference between these two algorithms is in how gate g2 is selected. With
RandomLevelTwoGates, gate g2 must be selected from within levels ℓg1, ℓg1 + 1, or
ℓg1 − 1. As we saw with RandomTwoGates, its gate g2 selection can be any gate that
does not introduce a cycle. Note that RandomLevelTwoGates selects two gates which
are within one level of each other. Therefore, no call to RejectGates is required since
it is impossible to introduce a cycle (by Axiom 1, a cycle requires a directed path of
length ≥ 2 between the two selected gates).
Having observed that RandomTwoGates increases the number of hierarchical levels
(i.e., the height) at approximately half the rate of RandomSingleGate (see
Section A.3 for discussion), we developed RandomLevelTwoGates to see if we could further
reduce the rate of height increase relative to the number of iterations performed, yet
retain as much randomness as possible otherwise. The conjecture was that, by limiting
subcircuit selection to gates in a single “band” of at most three hierarchical levels,
the propensity of a replacement circuit Crep to increase the circuit’s height should be
further mitigated.
Algorithm 3 RandomLevelTwoGates(C ′i)
1: call EstablishGateHierarchy() {Assigns each gate to a hierarchical level}
2: Gcand ← ∅
3: g1 ← call SelectRandomGate(G(C ′i)) {A random gate from any level}
4: ℓg1 ← hierarchy level of gate g1
5: Gcand ← Gcand ∪ G(ℓg1) {all gates in level ℓg1}
6: Gcand ← Gcand − {g1}
7: if ℓg1 > 0 then
8:   Gcand ← Gcand ∪ G(ℓg1 − 1) {all gates one level below g1}
9: end if
10: if ℓg1 < ℓMAX then
11:   Gcand ← Gcand ∪ G(ℓg1 + 1) {all gates one level above g1}
12: end if
13: g2 ← call SelectRandomGate(Gcand)
14: Gsub ← ∅
15: Gsub ← Gsub ∪ {g1, g2}
16: return Gsub
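The band-limited selection of RandomLevelTwoGates can be sketched compactly once a level map is available. The Python below is an illustration under assumptions: the level map is supplied directly (standing in for the result of EstablishGateHierarchy), and the function name and example values are hypothetical.

```python
import random


def random_level_two_gates(levels, rng=random):
    """levels maps each gate to its hierarchical level (assumed precomputed).

    g1 is drawn from all gates; g2 is drawn from the band of levels
    {level(g1) - 1, level(g1), level(g1) + 1}, excluding g1 itself.
    """
    g1 = rng.choice(sorted(levels))
    band = {levels[g1] - 1, levels[g1], levels[g1] + 1}
    cand = sorted(g for g, lv in levels.items() if lv in band and g != g1)
    g2 = rng.choice(cand)  # raises IndexError if the band holds no other gate
    return {g1, g2}


# Hypothetical level map for a four-gate circuit.
levels = {"A": 2, "B": 2, "C": 1, "D": 0}
pair = random_level_two_gates(levels, random.Random(1))
print(pair)
```

Levels below 0 or above the maximum simply match no gates, so the two explicit bounds checks of the pseudocode are implicit in the filter.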
The fact that we specifically disregard particular levels when choosing g2 makes
RandomLevelTwoGates a smart selection algorithm. Gate g1 is still selected randomly,
but since the subset of gates from which g2 is chosen is dependent on g1, we expect to
be able to better control how RandomLevelTwoGates modifies a circuit. The reason
that g2 is not restricted only to ℓg1 is the nature of subcircuits Crep returned
by CXL. As previously discussed in Section 4.3.2 (page 38), Crep can (and often
does) have more than one hierarchical level. If it occurs that Csub has the same
height as Crep, then the overall circuit C ′i will not grow in height during iteration i.
There were four results from analyzing the behavior of RandomLevelTwoGates.
First, we confirmed our hypothesis that RandomLevelTwoGates produces shorter cir-
cuits than either RandomSingleGate or RandomTwoGates. Second, we demonstrated
that a smart selection strategy can be employed to guide the behavior of a white-box
obfuscator to a particular end. In this case, we took our observations of how single
iterations of random selection strategies impacted circuit growth to develop a smart
algorithm.
The third result is the nature of the internal circuit structure. Unlike the two
random selection algorithms, RandomLevelTwoGates produces circuits whose
connections (edges) span fewer hierarchical levels. This can be observed by comparing
Figures A.5(b) and A.6(b). In the former image, many connections span more than half
the length of the circuit, whereas in the latter image, connections spanning more than
eight levels appear much less frequently. The implication of this finding is that level-based
algorithms could be useful if connection length is a circuit property that correlates to
the degree of obfuscation.
The fourth result has to do with algorithm efficiency. The function
EstablishGateHierarchy is a component of this (indeed, of all four) level-based algorithms. It
must be invoked at the beginning of every iteration, as shown in Algorithms 3, 4, 5,
and 6 (line 1 in each). In its current implementation, EstablishGateHierarchy is
inefficient.¹ For C ′i with small size S, this is not a problem; but as the number of
iterations increases, the circuit size also increases, and EstablishGateHierarchy slows
down the selection algorithm. Future versions of CORGI must take this performance
factor into account so that level-based selection algorithms are efficient for large
circuits.
¹ The details of why this is the case are discussed in Appendix A.
4.3.5 FixedLevelTwoGates. The FixedLevelTwoGates selection algorithm
functions the same as RandomLevelTwoGates except for two differences. Whereas in
RandomLevelTwoGates the target level is based on the selection of gate g1, the opposite
is true here. FixedLevelTwoGates must first have a level ℓF to target (user input),
and from that level, it selects gate g1 (the numerical value of ℓF remains constant for
all iterations). In addition, FixedLevelTwoGates selects gate g2 only from levels ℓF or
ℓF + 1 (not from ℓF − 1). All subcircuit selections Csub are of the class CXsub-Ysub-2-Ωsub
where
Ωsub ⊂ {AND, NAND, OR, NOR, XOR, XNOR}
Xsub ≥ 2
Ysub = 1 or Ysub = 2
Ssub = 2
|Ωsub| = 1 or |Ωsub| = 2
ℓg2 = ℓg1 or ℓg2 = ℓg1 + 1
Algorithm 4 FixedLevelTwoGates(C ′i)
1: call EstablishGateHierarchy()
2: Gcand ← ∅
3: ℓF ← fixed level where 0 ≤ F ≤ ℓMAX {user inputs F}
4: Gcand ← Gcand ∪ G(ℓF) {all gates in level ℓF}
5: g1 ← call SelectRandomGate(Gcand)
6: Gcand ← Gcand − {g1}
7: if ℓF < ℓMAX then
8:   Gcand ← Gcand ∪ G(ℓg1 + 1)
9: end if
10: g2 ← call SelectRandomGate(Gcand)
11: Gsub ← ∅
12: Gsub ← Gsub ∪ {g1, g2}
13: return Gsub
The first three algorithms developed successively improved control over cir-
cuit growth as measured by circuit height, yet they each created a wide range of
height results. In other words, over many trials, the data shows a large standard
deviation (σ) for circuit height (see Figure A.2). Our motivation for developing
FixedLevelTwoGates next was to observe whether targeting a single level for subcir-
cuit selection would cause the circuit to grow wider than it did with
RandomLevelTwoGates.
There were two findings regarding FixedLevelTwoGates, one negative, and
one positive. First, it produces circuits which are (on average) both taller and
narrower than those produced by RandomLevelTwoGates. This is the opposite of
what we expected, but the circuits did exhibit one similarity to those produced by
RandomLevelTwoGates; namely, there are relatively few connections that span more
than 10% of the circuit’s height.
Second, however, FixedLevelTwoGates achieved more predictable behavior relative
to the number of iterations performed (i.e., smaller standard deviation, σ). We
attribute that fact to two restrictions: gate g2 is limited to only two, rather than three,
contiguous levels in C ′i, and gate g1 is limited to the single level ℓF
(the previous three algorithms select gate g1 at random from among
all gates in C ′i). As a result, this smart selection algorithm has much less randomness,
which may be the basis of the tight coupling between circuit height and number of
iterations.
4.3.6 LargestLevelTwoGates. With LargestLevelTwoGates, we combine
the variable level selection of RandomLevelTwoGates with the targeted level selection
of FixedLevelTwoGates. This algorithm is procedurally the same as
FixedLevelTwoGates except that the selected largest (widest) level, ℓW, is calculated for every
iteration.
Algorithm 5 LargestLevelTwoGates(C ′i)
1: call EstablishGateHierarchy()
2: Gcand ← ∅
3: ℓW ← widest (largest) level where 0 ≤ ℓW ≤ ℓMAX {tiebreaker: smallest ℓW}
4: Gcand ← Gcand ∪ G(ℓW) {all gates in level ℓW}
5: g1 ← call SelectRandomGate(Gcand)
6: Gcand ← Gcand − {g1}
7: if ℓW < ℓMAX then
8:   Gcand ← Gcand ∪ G(ℓg1 + 1)
9: end if
10: g2 ← call SelectRandomGate(Gcand)
11: Gsub ← ∅
12: Gsub ← Gsub ∪ {g1, g2}
13: return Gsub
Our objective in developing LargestLevelTwoGates was an algorithm that is agile
enough to “chase” the largest (widest) level as C ′i grows. The nature of subcircuit
replacement, combined with the rigidity of predecessor relationships in a combinational
Boolean circuit, causes gates to migrate to higher levels in the circuit hierarchy.
When a gate moves from one level to the next, the population of the level it originally
occupied decrements by one. To combat this tendency, LargestLevelTwoGates
always selects gates from the largest level. If multiple levels are largest, it chooses the
lowest level among them.
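The widest-level rule with its tiebreaker is easy to state in code. The following is a small sketch under assumptions: the level map is given as a dictionary (as if produced by EstablishGateHierarchy), and the function name and example values are hypothetical.

```python
from collections import Counter


def widest_level(levels):
    """Return the level holding the most gates; ties go to the lowest level."""
    counts = Counter(levels.values())
    return min(counts, key=lambda lv: (-counts[lv], lv))


# Hypothetical level map: levels 1 and 2 both hold two gates.
levels = {"A": 2, "B": 2, "C": 1, "D": 1, "E": 0}
print(widest_level(levels))  # 1
```

Because the level map changes after every replacement, this computation would have to be repeated at the start of each iteration, which is what lets the algorithm follow the widest level as gates migrate upward.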
Two results from LargestLevelTwoGates are clearly evident in Figure A.7(c).
First, the algorithm provides more control over circuit growth than any of the previous
selection algorithms. From the data in Section A.3, we see the average height of C ′
produced by LargestLevelTwoGates is approximately 54% of the average height of
C ′ produced by its nearest competitor, RandomLevelTwoGates. Circuits produced
using LargestLevelTwoGates are also much wider than any of the other circuits.
RandomLevelTwoGates is again the closest competition, but LargestLevelTwoGates
produces C ′ more than twice as wide.
A second result is the fact that LargestLevelTwoGates can introduce external
control flows, just as we first saw with RandomTwoGates. The C ′ circuit represented
in Figure A.7(c) has external control flows In1–Out23 and In7–Out22, whereas the
original circuit C in Figure A.7(a) does not. Thus, LargestLevelTwoGates provides
a high degree of control over circuit growth, yet retains the potential to introduce
control flows.
4.3.7 OutputLevelTwoGates. The last of the six algorithms is another
variation on a theme. OutputLevelTwoGates is, in fact, only a special case of
FixedLevelTwoGates where the target level contains the circuit outputs. In other
words, using FixedLevelTwoGates with ℓF = 0 is the same as using
OutputLevelTwoGates. Algorithm 6 shows this special case.
Algorithm 6 OutputLevelTwoGates(C ′i)
1: call EstablishGateHierarchy()
2: Gcand ← ∅
3: ℓ0 ← level 0 which only contains circuit output gates
4: Gcand ← Gcand ∪ G(ℓ0) {all gates in level ℓ0}
5: g1 ← call SelectRandomGate(Gcand)
6: Gcand ← Gcand − {g1}
7: Gcand ← Gcand ∪ G(ℓ1) {all gates in level ℓ1}
8: g2 ← call SelectRandomGate(Gcand)
9: Gsub ← ∅
10: Gsub ← Gsub ∪ {g1, g2}
11: return Gsub
OutputLevelTwoGates was developed purely out of curiosity, and it proved
to be a worthwhile endeavor. When ℓF > 0 in FixedLevelTwoGates, there is the
possibility that the width of ℓF can increase. Conversely, the width of a circuit’s
output level (ℓ0) is fixed, so any replacement of a subcircuit that contains an output
gate must not increase the number of circuit outputs.² The net effect on circuit growth
is best described by way of analogy, followed by three example C ′ circuits produced
by OutputLevelTwoGates.
The behavior of OutputLevelTwoGates resembles the manufacturing process of
extrusion, which creates long objects of a fixed cross-sectional profile. In this case,
the cross-sectional profile is circuit width. Unlike the random algorithms,
OutputLevelTwoGates produces circuits in which all³ levels are approximately the
same width as output level ℓ0. Regardless of how many outputs the circuit has, or
how many iterations are performed, the widest level will contain only a few more
gates than the output level, ℓ0.
² Under the concept of black-box refinement, adding decoy outputs (and inputs) is desirable.
This research does not address the concept, however, so we restrict ourselves to preserving circuit
input and output quantities.
³ An exception to this is when, prior to iteration 1, the widest level of C is substantially wider
than ℓ0.
To see the extrusion effect, reference C ′ in Figure A.5(c), which has a height of
93 levels and a widest layer of only 4 gates. On average, however, the layers are not 4
gates wide; they are only 2 gates wide, which is equal to the number of outputs.
For another demonstration, we apply OutputLevelTwoGates to a different ISCAS
benchmark circuit, C880, which has 26 outputs. The results, for different numbers of
iterations, are shown in Figures A.8–A.10. Again, the average width of all layers is
approximately 26, which is the same as the width of ℓ0.
From these results, we must revisit our observations for FixedLevelTwoGates.
Basically, FixedLevelTwoGates behaves the same as OutputLevelTwoGates, but the
extrusion occurs at some user-defined level. In essence, the target circuit C will be
“split” at the chosen level ℓF, which has a particular number of gates, nF. All levels
0 through ℓF − 1 will remain unchanged and an extruded subcircuit will connect the
top and bottom of C in the final randomized circuit C ′.
4.4 Runtime performance analysis
We conclude with a brief discussion on the runtime performance of the six
algorithms. This is not intended to be a rigorous examination of CORGI performance,
but instead it will provide an understanding of what factors influence run times as
well as compare the performance of each of the six selection algorithms relative to one
another. Figures 4.5–4.16 contain runtime performance data for the six algorithms as
applied to two different ISCAS BENCH circuits: C17 and C880. Each figure displays
representative results from two trials for each combination (i.e., selection algorithm
and circuit). In all cases, each trial is 1000 iterations. Table 4.1 provides a summary
of the data.
The time required for each iteration of randomization is comprised of the time
needed to perform the selection and the time required for CXL to produce a
replacement subcircuit. For all algorithms, the latter is independent of both the structure
of C ′i and the time required for CORGI to select a subcircuit from C ′i; however, the
runtime of CXL is dependent on whether it generates an equivalent subcircuit at
runtime, or simply selects an equivalent subcircuit from a static store. For the data
presented here, we used the runtime option. Even though that choice is more
time-intensive, the average per-iteration time will remain constant over many iterations.
Therefore, by comparing the results of one selection algorithm to those of another (or
the same selection algorithm applied to different circuits), we can deduce the relative
performance characteristics of CORGI.
                              C17                  C880
Algorithm               Trial 1   Trial 2    Trial 1   Trial 2
RandomSingleGate         38 ms     37 ms      44 ms     44 ms
RandomTwoGates          441 ms    447 ms     476 ms    504 ms
RandomLevelTwoGates     290 ms    282 ms     430 ms    436 ms
FixedLevelTwoGates      404 ms    393 ms     420 ms    438 ms
LargestLevelTwoGates    350 ms    359 ms     400 ms    373 ms
OutputLevelTwoGates     331 ms    359 ms     438 ms    445 ms
Table 4.1: Summary of runtime data for the six selection algorithms.
The data show the average per-iteration time (in milliseconds) after
1000 iterations for two trials on each of two circuits: C17 and C880.
Times are rounded to the nearest millisecond.
RandomSingleGate is the fastest of the six selection algorithms. Each subcircuit
has only one gate, and thus only one output. As a result, CXL can
more quickly return a replacement. The slower times for RandomSingleGate when
applied to C880 vs. C17 arise because C880 initially has 437 gates, compared with
only 6 gates for C17.
The remaining five selection algorithms are substantially slower than Random-
SingleGate primarily because selected subcircuits contain two outputs. Therefore,
the library of equivalent subcircuits in CXL is substantially larger. Since, for our
experiments, CXL generates the equivalent subcircuits at runtime, the per-iteration
times increase substantially as compared to RandomSingleGate.
RandomTwoGates is the slowest of the six selection algorithms. RandomTwoGates
is the only selection algorithm that calls RejectGates (Algorithm 9), which is a
recursive DFS.
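The relative comparisons drawn from Table 4.1 can be reproduced with a few lines of arithmetic. The following Python sketch averages the two trials for each algorithm and circuit and expresses the result as a slowdown relative to RandomSingleGate. The dictionary layout and the function names are illustrative assumptions and are not part of CORGI.

```python
# Average per-iteration times (ms) copied from Table 4.1, in the order
# (C17 trial 1, C17 trial 2, C880 trial 1, C880 trial 2).
TABLE_4_1 = {
    "RandomSingleGate":     (38, 37, 44, 44),
    "RandomTwoGates":       (441, 447, 476, 504),
    "RandomLevelTwoGates":  (290, 282, 430, 436),
    "FixedLevelTwoGates":   (404, 393, 420, 438),
    "LargestLevelTwoGates": (350, 359, 400, 373),
    "OutputLevelTwoGates":  (331, 359, 438, 445),
}


def mean_times(row):
    """Average the two trials for each circuit."""
    c17_a, c17_b, c880_a, c880_b = row
    return {"C17": (c17_a + c17_b) / 2, "C880": (c880_a + c880_b) / 2}


def slowdown_vs_single_gate(table):
    """Ratio of each algorithm's mean time to RandomSingleGate's mean time."""
    base = mean_times(table["RandomSingleGate"])
    return {
        name: {circuit: mean_times(row)[circuit] / base[circuit] for circuit in base}
        for name, row in table.items()
    }


ratios = slowdown_vs_single_gate(TABLE_4_1)
slowest_on_c17 = max(ratios, key=lambda name: ratios[name]["C17"])
```

With these numbers, every two-gate algorithm is more than seven times slower than RandomSingleGate on C17, and RandomTwoGates has the largest ratio on both circuits.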
For the four level-based selection algorithms, average run times over 1000 it-
erations are all less than times for RandomTwoGates. Whereas RandomTwoGates
calls RejectGates, the four level-based selection algorithms all call EstablishGate-
Hierarchy. This, too, is a DFS, but employs pruning. Pruning is a graph theory
technique for limiting the search space, and in part accounts for the relative speedup
of these four algorithms as compared to RandomTwoGates. Another factor that con-
tributes to the increased speed of these four algorithms is the frequency of selecting
single output subcircuits. Specifically, these algorithms select gates in adjacent hi-
erarchical layers, which means the second gate selected by the algorithms is more
likely to be a predecessor or successor of the first gate selected. The result of such a
selection is a one-output subcircuit, which CXL more quickly produces than it does
two-output subcircuits.
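To illustrate why selecting a gate together with its predecessor or successor can yield a one-output subcircuit, the sketch below counts the outputs of a selected gate set in a toy DAG. It assumes that a selected gate is an output of the subcircuit when it drives at least one gate outside the selection or is itself a circuit output. The dictionary-based graph and the name `subcircuit_outputs` are hypothetical and do not reflect CORGI's API.

```python
def subcircuit_outputs(successors, circuit_outputs, selected):
    """Return the selected gates whose value is needed outside the selection."""
    selected = set(selected)
    outputs = set()
    for gate in selected:
        drives_outside = any(s not in selected for s in successors.get(gate, ()))
        if drives_outside or gate in circuit_outputs:
            outputs.add(gate)
    return outputs


# Toy DAG; edges point from a gate to the gates it feeds:
# g1 -> g3, g2 -> g3, g2 -> g4. Gates g3 and g4 are circuit outputs.
successors = {"g1": ["g3"], "g2": ["g3", "g4"], "g3": [], "g4": []}
circuit_outputs = {"g3", "g4"}

adjacent = subcircuit_outputs(successors, circuit_outputs, {"g1", "g3"})   # g1 feeds only g3
unrelated = subcircuit_outputs(successors, circuit_outputs, {"g1", "g4"})  # no edge between them
fanout = subcircuit_outputs(successors, circuit_outputs, {"g2", "g3"})     # g2 also feeds g4
```

Here `adjacent` contains one gate while `unrelated` contains two. The `fanout` case shows the caveat: a predecessor that also feeds a gate outside the selection still produces a two-output subcircuit.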
Figure 4.5: Sample per-iteration runtime data from applying selection algorithm
RandomSingleGate to circuit C17. (a) Trial 1. (b) Trial 2.
Figure 4.6: Sample per-iteration runtime data from applying selection algorithm
RandomSingleGate to circuit C880. (a) Trial 1. (b) Trial 2.
Figure 4.7: Sample per-iteration runtime data from applying selection algorithm
RandomTwoGates to circuit C17. (a) Trial 1. (b) Trial 2.
Figure 4.8: Sample per-iteration runtime data from applying selection algorithm
RandomTwoGates to circuit C880. (a) Trial 1. (b) Trial 2.
Figure 4.9: Sample per-iteration runtime data from applying selection algorithm
RandomLevelTwoGates to circuit C17. (a) Trial 1. (b) Trial 2.
Figure 4.10: Sample per-iteration runtime data from applying selection algorithm
RandomLevelTwoGates to circuit C880. (a) Trial 1. (b) Trial 2.
Figure 4.11: Sample per-iteration runtime data from applying selection algorithm
FixedLevelTwoGates to circuit C17. (a) Trial 1. (b) Trial 2.
Figure 4.12: Sample per-iteration runtime data from applying selection algorithm
FixedLevelTwoGates to circuit C880. (a) Trial 1. (b) Trial 2.
Figure 4.13: Sample per-iteration runtime data from applying selection algorithm
LargestLevelTwoGates to circuit C17. (a) Trial 1. (b) Trial 2.
Figure 4.14: Sample per-iteration runtime data from applying selection algorithm
LargestLevelTwoGates to circuit C880. (a) Trial 1. (b) Trial 2.
Figure 4.15: Sample per-iteration runtime data from applying selection algorithm
OutputLevelTwoGates to circuit C17. (a) Trial 1. (b) Trial 2.
Figure 4.16: Sample per-iteration runtime data from applying selection algorithm
OutputLevelTwoGates to circuit C880. (a) Trial 1. (b) Trial 2.
V. Conclusions
The work described in the foregoing chapters comprises only the beginning of a much larger effort. Going forward, we expect a steep learning curve given
the “obstacle” of the impossibility result presented in [1]. However, this research—
combined with that which will follow—seeks to set intent protection (which alters
structure and function) apart from the common understanding of obfuscation (which
only alters structure). In this research, we focused only on the process of white-box
obfuscation, a necessary but not sufficient component of program intent protection.
We further narrowed our scope to white-box obfuscating combinational Boolean cir-
cuits. We developed an architecture for manipulating circuits, and developed an
initial set of algorithms for white-box obfuscating circuits via subcircuit selection and
replacement.
5.1 Contributions
Perhaps our biggest contribution to our area of study is CORGI, the tool upon
which this and future research is based. As with any new software, its development
was not without difficulty. However, without CORGI, the process of subcircuit selection
and replacement would have been entirely manual, which would have yielded little
data: calculations by hand would simply take too long. On the other hand, the time
spent developing a stable architecture clearly limited the number and complexity of
selection algorithms that were produced. We view this tradeoff as appropriate since
it will allow future research to focus on the process of obfuscation rather than the
tool that performs the task.
The six subcircuit selection algorithms we produced yielded some surprising
results, and they gave us new insights into the heretofore untested process of subcircuit
selection and replacement. The RandomTwoGates algorithm alone provided two
valuable results. First, it demonstrates that the gates of a subcircuit need not be
connected to be selected. Second, it demonstrates how a
circuit library (CXL) can provide replacement subcircuits that introduce new control
flows in the circuit. These results mean that completely disparate portions of a circuit
can be intertwined, both from a black-box (functional) and a white-box (structural)
perspective.
All six of the algorithms revealed that circuit size always increases when only
one or two gates are selected for replacement. For single-gate subcircuits, all re-
placements have at least two gates. For a two-gate subcircuit, if its function is not
semantically equivalent to a basic gate (AND, NAND, OR, NOR, XOR, or XNOR),
then all replacement circuits in the circuit library will be, on average, larger than two
gates. Not until we devise algorithms that select three or more gates can we
expect to reduce circuit size. The ability to either increase or decrease circuit size
is how the process of subcircuit selection and replacement will be able to produce a
truly random circuit from a particular circuit family.
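Whether a two-gate subcircuit is semantically equivalent to a basic gate can be checked by comparing truth tables. The Python sketch below does this for two-input functions. The gate table, the helper names, and the two example subcircuits are assumptions made for illustration; they are not taken from CXL.

```python
from itertools import product

# Two-input basic gates named in the text, modeled on the values 0 and 1.
BASIC_GATES = {
    "AND":  lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR":   lambda a, b: a | b,
    "NOR":  lambda a, b: 1 - (a | b),
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def truth_table(func, n_inputs=2):
    """Evaluate func on every input combination, in the order 00, 01, 10, 11."""
    return tuple(func(*bits) for bits in product((0, 1), repeat=n_inputs))


def equivalent_basic_gate(func):
    """Return the name of a basic gate with the same truth table, or None."""
    table = truth_table(func)
    for name, gate in BASIC_GATES.items():
        if truth_table(gate) == table:
            return name
    return None


nand = BASIC_GATES["NAND"]
# Two NAND gates in series (the second has both inputs tied together).
nand_then_invert = lambda a, b: nand(nand(a, b), nand(a, b))
# An AND gate feeding an XOR gate that also reads input a.
and_feeding_xor = lambda a, b: (a & b) ^ a
```

The first example collapses to a single AND gate, so a one-gate replacement exists for it. The second computes a function that none of the six basic gates implements, so `equivalent_basic_gate` returns `None` for it.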
Finally, the three “smart” algorithms, especially LargestLevelTwoGates, show
how circuit growth can be controlled and predicted, even when the selection algorithm
produces ever-increasing circuit size.
5.2 Future work
As alluded to above, we see 3-gate selection algorithms as the most important
next step in devising an intent protection framework. One approach is to extend
RandomTwoGates to select a third gate at random. This may be the easiest approach, and
our intuition is that it will provide results that guide the development of other
algorithms. As another approach, it may be advantageous during some
iteration of selection to choose only subcircuits for which there is a large population
of replacements in the circuit library.¹ Such a strategy will require the algorithm
to find subcircuits with a particular truth table. In graph theory, this is known as
subgraph isomorphism, and is an NP-complete problem. Depending on circuit size,
it may nonetheless be a feasible approach.
¹ This assumes the library has a cache of metadata on its stores of circuit libraries which can be
quickly and easily searched.
There are at least two ways CORGI can be augmented which have nothing to do
with the algorithms directly. Currently, CORGI maintains no historical log of which
steps were performed, and in what order, to obfuscate a circuit. A future version of
CORGI with this capability would support the notion that an original circuit can be
recovered from an obfuscated version. In a sense, such a log file would be analogous
to a data encryption key for the white-box portion of the obfuscator. It remains to be
seen what advantages might accrue for the cost of this operation, but it's a question
worth exploring.
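One way to picture such a log is as an ordered list of replacement records that can be replayed in reverse. The Python sketch below is a deliberately simplified assumption: it tracks only which gate names were removed and inserted at each step and ignores wiring, which a real log would also have to capture. The record type and function names are hypothetical, since CORGI currently has no such facility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementRecord:
    """One iteration of selection and replacement (gate names only, wiring ignored)."""
    iteration: int
    removed: frozenset
    inserted: frozenset


def apply_step(gates, record):
    """Replay one step forward: drop the selected gates, add the replacement gates."""
    return (gates - record.removed) | record.inserted


def undo_step(gates, record):
    """Reverse one step: drop the replacement gates, restore the selected gates."""
    return (gates - record.inserted) | record.removed


def recover_original(gates, log):
    """Undo every logged step, most recent first."""
    for record in reversed(log):
        gates = undo_step(gates, record)
    return gates


original = frozenset({"10", "11"})
log = [
    ReplacementRecord(0, frozenset({"10"}), frozenset({"a", "b"})),
    ReplacementRecord(1, frozenset({"a", "11"}), frozenset({"c", "d", "e"})),
]
obfuscated = original
for step in log:
    obfuscated = apply_step(obfuscated, step)
recovered = recover_original(obfuscated, log)
```

Because each record stores both the removed and the inserted gates, the log alone suffices to walk back from the final gate set to the original one, which is the sense in which it resembles a key.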
Finally, CORGI is a solid proof-of-concept tool, but to make it better suited
to the research, two major augmentations need to occur. An obvious shortfall is the
need for a better user interface. Although not addressed in this text, the tool func-
tionality was accessed for this research entirely through test cases since the textual
user interface was too cumbersome for repeated experimentation. Ideally, a graphical
user interface will be developed so that rapid selection of input parameters and selec-
tion algorithm(s) will further keep the focus on experimentation rather than coding.
CORGI also needs a review of the efficiency of some of its processes (not the selection
algorithms themselves). Under the hood, there are several methods which employ re-
cursive search algorithms that are not very efficient. They become even less efficient
as circuit size increases. By instituting some optimization techniques, and calling
these methods only when necessary, CORGI will be more likely to achieve, at
worst, polynomial slowdown for large circuits.
Appendix A. CORGI software
A.1 CORGI architecture
A.1.1 Functionality. CORGI is a Java application which employs a model-view-controller
(MVC) architecture. In Figure A.1, the model is the
Circuit, which is composed of Gate objects. The controller is CircuitController.
The view is the UserCommandParser, which provides the user with a text-based
interface.
A.1.1.1 JGraphT. The Java graph library JGraphT, introduced in
Section 3.3.1.1, is the “engine under the hood” of CORGI. Recall, the ‘G’ in CORGI
stands for graphs, and JGraphT is what allows us to manipulate circuits as DAGs,
yet hide that fact from the user. Every circuit has an underlying graph (DAG), so
Circuit is really a façade for a JGraphT DirectedGraph.
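The façade idea can be sketched in a few lines. The Python class below is a stand-in written for illustration only: it uses plain dictionaries instead of JGraphT, and its method names are assumptions rather than CORGI's actual interface. Callers add gates and wires, while the graph bookkeeping, including the check that keeps the graph acyclic, stays private.

```python
class Circuit:
    """Facade: callers work with gates and wires; the DAG bookkeeping stays private."""

    def __init__(self):
        self._successors = {}    # gate name -> set of gates it feeds
        self._predecessors = {}  # gate name -> set of gates feeding it

    def add_gate(self, name):
        self._successors.setdefault(name, set())
        self._predecessors.setdefault(name, set())

    def connect(self, source, sink):
        """Add a wire source -> sink, refusing any wire that would close a cycle."""
        self.add_gate(source)
        self.add_gate(sink)
        if source == sink or self._reaches(sink, source):
            raise ValueError(f"wire {source} -> {sink} would create a cycle")
        self._successors[source].add(sink)
        self._predecessors[sink].add(source)

    def predecessors(self, name):
        return frozenset(self._predecessors[name])

    def successors(self, name):
        return frozenset(self._successors[name])

    def _reaches(self, start, target):
        """Depth-first search along successor edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self._successors[node])
        return False


circuit = Circuit()
circuit.connect("11", "16")
circuit.connect("16", "22")
```

Attempting `circuit.connect("22", "11")` after these two calls raises `ValueError`, because gate 22 is already reachable from gate 11.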
All circuit modifying behavior is contained in Circuit; however, the mechanism
of subcircuit selection and replacement is modularized as a separate class, ... (more
to come)
A.2 Non-selection algorithms
For the sake of brevity in the main text, the discussion of the non-selection al-
gorithms is presented here. The entire process of subcircuit selection and replacement
is given in Algorithm 7. The procedures for removeSubCircuit, fetchReplacement,
and insertReplacement are elided since they are purely “mechanical” in the sense
that they do not impact the selection process. Once a subcircuit Csub is selected from
circuit C ′i, then these three methods will, respectively, remove Csub, get a replacement
circuit Crep from CXL, then insert Crep in place of Csub.
Algorithm 8 (SelectRandomGate) and Algorithm 9 (RejectGates) are helper
methods used by the six selection algorithms discussed in Chapter IV. SelectRandom-
Gate simply selects a single gate at random from among a set of gates. This capability
is needed since subcircuit selection relies on a sequence of random gate selections.
Figure A.1: The UML class diagram which shows the CORGI architecture.
Algorithm 7 performReplacement(Selection(C′i))
  C′i ← circuit C after i iterations of randomization
  Gsub ← ∅ {subset of gates in C′i: Gsub ⊂ G(C′i)}
  Gsub ← call Selection(C′i) {the interface for the selection algorithms}
  Csub ← call RemoveSubCircuit(Gsub)
  Crep ← call FetchReplacement(Csub) {this is the CXL interface}
  C′i+1 ← call InsertReplacement(Crep)
  return C′i+1 {circuit C′i after replacing Csub with Crep}
Algorithm 8 SelectRandomGate(G)
Require: G is a non-empty set of gates
  k ← uniform random number such that 0 ≤ k < |G|
  gk ← the kth gate in G
  return gk
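The control flow of Algorithms 7 and 8 can be sketched in Python as follows. Because the removal, fetch, and insertion procedures are elided in the text, the sketch treats a circuit as a bare set of gate names and ignores wiring entirely. The `selection` and `fetch_replacement` callables are hypothetical stand-ins for a selection algorithm and for CXL.

```python
import random


def select_random_gate(gates, rng=random):
    """Algorithm 8: pick one gate uniformly at random from a non-empty collection."""
    ordered = sorted(gates)          # fix an order so "the k-th gate" is well defined
    if not ordered:
        raise ValueError("G must be non-empty")
    k = rng.randrange(len(ordered))  # uniform integer with 0 <= k < |G|
    return ordered[k]


def perform_replacement(circuit_gates, selection, fetch_replacement):
    """Algorithm 7 with wiring ignored: swap the selected gates for replacement gates."""
    circuit_gates = set(circuit_gates)
    g_sub = set(selection(circuit_gates))      # Selection(C'_i)
    if not g_sub <= circuit_gates:
        raise ValueError("selection must return gates of the circuit")
    remaining = circuit_gates - g_sub          # RemoveSubCircuit
    c_rep = set(fetch_replacement(g_sub))      # FetchReplacement (CXL stand-in)
    return remaining | c_rep                   # InsertReplacement


# Hypothetical stand-ins: select one gate, replace it with two freshly named gates.
rng = random.Random(0)
single_gate = lambda gates: {select_random_gate(gates, rng)}
two_for_one = lambda g_sub: {f"{g}_a" for g in g_sub} | {f"{g}_b" for g in g_sub}

circuit = {"10", "11", "16", "19", "22", "23"}  # the six gate labels of C17
for _ in range(5):
    circuit = perform_replacement(circuit, single_gate, two_for_one)
```

Each iteration removes one gate and adds two, so the gate set grows by one per iteration under these stand-ins. A faithful implementation would also have to reconnect the replacement's inputs and outputs, which this sketch does not attempt.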
RejectGates identifies the set of all gates which lie on all paths through a
particular gate and which are more than one hierarchical level removed from said
gate. RejectGates is the means by which performReplacement prevents cycles from
being introduced in C ′i+1 when replacing a subcircuit that contains more than one
gate.
Algorithm 9 RejectGates(gk, P)
Require: P true for predecessors of gk, false for successors of gk
  Grej ← ∅ {set of rejected gates}
  Gcurr ← ∅ {set of gates being considered for rejection}
  Gprev ← ∅ {set of gates already considered for rejection}
  Gnext ← ∅ {set of gates to be considered for rejection}
  Gadj ← ∅ {set of predecessors (successors) of a gate}
  Gcurr ← Gcurr ∪ {gk}
  if P = true then
    Gadj ← predecessors of gk
  else
    Gadj ← successors of gk
  end if
  for all gates ga in Gadj do
    if difference between hierarchy levels of ga and gk is > 1 then
      Gcurr ← Gcurr ∪ {ga}
    end if
  end for
  Gadj ← ∅
  while Gcurr ≠ ∅ do
    Gnext ← ∅
    for all gates gc in Gcurr do
      if P = true then
        Gadj ← predecessors of gc
      else
        Gadj ← successors of gc
      end if
      Gnext ← Gnext ∪ Gadj
    end for
    Gprev ← Gprev ∪ Gcurr
    Gcurr ← ∅
    Gcurr ← Gcurr ∪ Gnext
  end while
  return Grej
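The following Python sketch follows the prose description of RejectGates rather than transcribing the listing line by line, since the listing as printed initializes Grej but never adds a gate to it. It assumes the intent is to collect every gate reachable from gk in the chosen direction whose hierarchy level differs from that of gk by more than one. The adjacency dictionaries, the `level` map, and the function signature are illustrative assumptions.

```python
def reject_gates(gate, adjacency, level):
    """Collect gates reachable from `gate` via `adjacency` that are more than one level away.

    Pass a predecessor map to search upstream (P = true in the listing)
    or a successor map to search downstream (P = false).
    """
    rejected = set()
    visited = {gate}
    frontier = [gate]                  # explicit stack: iterative depth-first search
    while frontier:
        current = frontier.pop()
        for nxt in adjacency.get(current, ()):
            if nxt in visited:
                continue
            visited.add(nxt)
            if abs(level[nxt] - level[gate]) > 1:
                rejected.add(nxt)
            frontier.append(nxt)
    return rejected


# Toy DAG: a feeds b and c, which both feed out. Levels count upward from the output.
predecessors = {"out": ["b", "c"], "b": ["a"], "c": ["a"], "a": []}
successors = {"a": ["b", "c"], "b": ["out"], "c": ["out"], "out": []}
level = {"out": 0, "b": 1, "c": 1, "a": 2}

upstream = reject_gates("out", predecessors, level)
downstream = reject_gates("a", successors, level)
```

In this toy circuit, pairing `out` with `a` would leave `b` and `c` on paths that exit and re-enter the selected subcircuit, so `a` is rejected as a partner for `out`, and vice versa.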
EstablishGateHierarchy is a circuit function that sets the hierarchy at-
tribute for all gates in the circuit. When there are multiple paths between a particular
pair of gates, and when one path is shorter than the other (in terms of number of
gates along the path), then one or more of the gates on the shorter path could legally
occupy any one of several levels in the hierarchy. We choose to assign gates to the
lowest possible level that adheres to this convention: every gate in the circuit will
always occupy a level that is lower (smaller) than the level of any of its predecessors.
Algorithm 10 EstablishGateHierarchy()
  label all gates as ℓ0
  ℓG ← 0 {initialize global maximum level}
  ℓL ← 0 {initialize local (output) maximum level}
  for all circuit output gates gout do
    ℓL ← call SetGateHierarchies(gout, 0, 0)
    ℓG ← MAX(ℓL, ℓG)
  end for
None of the so-called level-based selection algorithms would function properly
without EstablishGateHierarchy. EstablishGateHierarchy, in turn, relies upon
the recursive function SetGateHierarchies (described in Algorithm 11). It
performs a DFS beginning at each circuit output, explores that output’s
predecessor tree (in the underlying DAG), and sets the hierarchy attribute for all
gates along the way. Some pruning is performed, but there will invariably be gates
that are visited at least twice, which makes EstablishGateHierarchy inefficient.
Since so much of CORGI relies on gates having a correct hierarchy attribute, future
versions of CORGI will benefit greatly from optimizing EstablishGateHierarchy.
Algorithm 11 SetGateHierarchies(gi, ℓL, ℓG)
  for all predecessor gates gj of gate gi do
    if ℓ(gj) ≤ ℓ(gi) then
      ℓ(gj) ← ℓ(gj) + 1
      ℓG ← call SetGateHierarchies(gj, ℓL + 1, ℓG)
    end if
  end for
  return MAX(ℓL, ℓG)
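A compact Python sketch of the level assignment is given below. It departs from the printed listing in two labeled ways: it sets a predecessor's level to one more than the level of the gate it feeds, which is an assumption chosen so that the stated convention (every predecessor sits strictly higher) holds on exit, and it returns the level map instead of threading the ℓL and ℓG arguments. The example netlist is the gate-level structure of ISCAS C17 with primary inputs omitted.

```python
from collections import Counter


def establish_gate_hierarchy(predecessors, output_gates):
    """Assign each gate the lowest level that is strictly above all gates it feeds."""
    level = {gate: 0 for gate in predecessors}

    def set_gate_hierarchies(gate):
        for pred in predecessors[gate]:
            if level[pred] <= level[gate]:      # otherwise prune: pred is already high enough
                level[pred] = level[gate] + 1   # assumption: one above the gate it feeds
                set_gate_hierarchies(pred)      # raising pred may force its predecessors up

    for out in output_gates:
        set_gate_hierarchies(out)
    return level


# C17 as a gate-only DAG: each gate maps to the gates that feed it.
c17_predecessors = {
    "22": ["10", "16"],
    "23": ["16", "19"],
    "16": ["11"],
    "19": ["11"],
    "10": [],
    "11": [],
}
levels = establish_gate_hierarchy(c17_predecessors, ["22", "23"])
height = max(levels.values()) + 1                  # number of hierarchical levels
width = max(Counter(levels.values()).values())     # gates in the widest level
```

For this netlist the sketch yields three levels with three gates in the widest level. The recursion revisits a gate each time its level is raised, which mirrors the repeated visits that the text identifies as the source of inefficiency.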
A.3 Selection algorithm behavior
Figures A.2, A.3, and A.4 give insight into the behavior of the six selection
algorithms.
Algorithm   Hmax    Havg   Hmin     Hσ
R1G          291   183.7    117   61.5
OL2G         103    97.7     90    4.8
R2G          119    89.9     75   15.6
FL2G          78    69.6     62    5.1
RL2G          87    65.6     46   14.0
LL2G          46    35.2     31    4.2
(a)

Algorithm   Wmax    Wavg   Wmin     Wσ
R1G            9     6.3      4    1.5
OL2G           5     4.4      4    0.5
R2G            7     5.3      4    1.1
FL2G           6     5.4      5    0.5
RL2G           8     6.8      5    1.2
LL2G          20    14.8     12    2.4
(b)

Algorithm   Havg/Wavg   Growth (%)
R1G              29.2         90.4
OL2G             22.2         47.4
R2G              17.0         43.5
FL2G             12.9         33.3
RL2G              9.6         31.3
LL2G              2.4         16.1
(c)

Figure A.2: Experimental results from performing ten trials of 200 iterations each
using all six selection algorithms, with ISCAS circuit C17 as the target C. To provide
a common mode of comparison, all three tables are sorted in decreasing order
of Havg.
(a) The number of hierarchical levels in C′ (maximum, average, minimum, and standard
deviation).
(b) The number of gates in the widest hierarchical level of C′ (maximum, average,
minimum, and standard deviation).
(c) Height-to-width ratio and rate at which the number of hierarchy levels increases per
iteration.
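The derived columns in Figure A.2(c) can be recomputed from the averages in parts (a) and (b). The Python sketch below does so under one assumption that the text does not state explicitly: that the growth rate is the increase in average height over the initial height of C17 (3 levels), divided by the 200 iterations. That assumption reproduces the tabulated growth values to within 0.1 percentage point, with the small differences presumably due to rounding of the averages.

```python
# Averages copied from Figure A.2(a) and A.2(b).
H_AVG = {"R1G": 183.7, "OL2G": 97.7, "R2G": 89.9, "FL2G": 69.6, "RL2G": 65.6, "LL2G": 35.2}
W_AVG = {"R1G": 6.3, "OL2G": 4.4, "R2G": 5.3, "FL2G": 5.4, "RL2G": 6.8, "LL2G": 14.8}

INITIAL_HEIGHT = 3   # height of the original C17 in levels
ITERATIONS = 200     # iterations per trial


def height_to_width(algorithm):
    """Ratio of average height to average width after the trial."""
    return H_AVG[algorithm] / W_AVG[algorithm]


def growth_percent(algorithm):
    """Assumed definition: levels gained per iteration, expressed as a percentage."""
    return 100 * (H_AVG[algorithm] - INITIAL_HEIGHT) / ITERATIONS


summary = {
    name: (round(height_to_width(name), 1), round(growth_percent(name), 1))
    for name in H_AVG
}
```

Read this way, a growth value near 90% for R1G means the circuit gains close to one hierarchical level per iteration, while LL2G gains a level roughly once every six iterations.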
Figure A.3: Chart of data from Figure A.2(a).
Figure A.4: Chart of data from Figure A.2(b).
A.4 Selection algorithm results
A.4.1 C17 with all algorithms. Figures A.5, A.6, and A.7 display examples
of the results achieved when each of the six algorithms is applied to a simple ISCAS
benchmark circuit, C17. In each case, the algorithm ran for 200 iterations. The
images are DAGs which represent the various circuits. While they are not strictly
circuits, they demonstrate the behavior of each algorithm. All images are drawn to
relative scale for ease of comparison.¹
A.4.2 C880 with OutputLevelTwoGates. Figures A.8, A.9, and A.10 show
how circuit C880 changes over time when randomized using the OutputLevelTwoGates
selection algorithm. Compare Figure A.8 to Figure A.5(c). Note that C880, which has
26 outputs, grows in height much more slowly than does C17, which has 2 outputs,
when OutputLevelTwoGates is applied for 200 iterations.
¹ When viewing this document electronically in PDF format, the circuit details can be seen by
zooming in to at least 1600% magnification.
Figure A.5: Comparison of original circuit (ISCAS C17) to sample results of R1G
and OL2G algorithms (200 iterations; circuits represented as DAGs).
(a) C = ISCAS benchmark circuit C17 (height = 3 levels, width = 3 gates).
(b) C′ after applying RandomSingleGate to C (height = 189 levels, width = 7
gates).
(c) C′ after applying OutputLevelTwoGates to C (height = 93 levels, width = 4
gates).
Figure A.6: Comparison of original circuit (ISCAS C17) to sample results of R2G
and FL2G algorithms (200 iterations; circuits represented as DAGs).
(a) C = ISCAS benchmark circuit C17 (height = 3 levels, width = 3 gates).
(b) C′ after applying RandomTwoGates to C (height = 93 levels, width = 6 gates).
(c) C′ after applying FixedLevelTwoGates to C (height = 67 levels, width = 6
gates).
Figure A.7: Comparison of original circuit (ISCAS C17) to sample results of RL2G
and LL2G algorithms (200 iterations; circuits represented as DAGs).
(a) C = ISCAS benchmark circuit C17 (height = 3 levels, width = 3 gates).
(b) C′ after applying RandomLevelTwoGates to C (height = 61 levels, width = 8
gates).
(c) C′ after applying LargestLevelTwoGates to C (height = 32 levels, width = 15
gates).
Figure A.8: C′ after applying 200 iterations of OutputLevelTwoGates to ISCAS
benchmark circuit C880 (height = 42 levels, width = 38 gates).
Figure A.9: C′ after applying 400 iterations of OutputLevelTwoGates to ISCAS
benchmark circuit C880 (height = 60 levels, width = 31 gates).
Figure A.10: C′ after applying an additional 800 iterations of
OutputLevelTwoGates to the circuit C′ in Figure A.9 (height = 98 levels,
width = 33 gates).
Bibliography
1. Barak, Boaz, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai,
Salil Vadhan, and Ke Yang. “On the (Im)possibility of obfuscating programs”.
Electronic Colloquium on Computational Complexity, 8(57):1–41, 2001.
2. “Benchmark circuits”. Internet: http://www.fm.vslib.cz/~kes/asic/iscas/,
Jan 2007.
3. Collberg, Christian, Clark Thomborson, and Douglas Low. A Taxonomy of
Obfuscating Transformations. Technical Report 148, University of Auckland, Jul 1997.
URL http://www.cs.arizona.edu/~collberg/Research/Publications/.
4. Edwards, Stephen A. “Making cyclic circuits acyclic”. DAC ’03: Proceedings of
the 40th conference on Design automation, 159–162. ACM, New York, NY, USA,
2003. ISBN 1-58113-688-9.
5. Garey, M. R. and D. S. Johnson. Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman, January 1979. ISBN 0-716-71045-5.
6. Goldwasser, Shafi and Guy N. Rothblum. “On Best-Possible Obfuscation”. 4th
Theory of Cryptography Conference, volume 4392 of Lecture Notes in Computer
Science, 194–213. Springer, 21-24 February 2007. ISBN 3-540-70935-5.
7. Gross, Jonathan L. and Jay Yellen. Graph Theory and its Applications. Chapman
& Hall/CRC, 2nd edition, 2006. ISBN 1-58488-505-X.
8. Hansen, Mark C., Hakan Yalcin, and John P. Hayes. “Unveiling the ISCAS-85
Benchmarks: A Case Study in Reverse Engineering”. IEEE Des. Test, 16(3):72–80,
1999. ISSN 0740-7475.
9. Huth, Michael and Mark Ryan. Logic in computer science: Modelling and
Reasoning about Systems. Cambridge University Press, 2004.
10. James, Moses C. Obfuscation Framework Based on Functionally Equivalent
Combinatorial Logic Families. Master’s thesis, Air Force Institute of Technology,
WPAFB, OH, March 2008.
11. Kukis, Mark and Katherine Arms. “Bush to China: Return Plane, Crew”.
Internet: http://www.military.com/Content/MoreContent1?file=standoff,
April 2001.
12. McDonald, Jeffrey T. Enhanced Security for Mobile Agent Systems. Ph.D. thesis,
Florida State University, 2006.
13. Mish, Frederick C. (editor). Merriam-Webster’s collegiate dictionary.
Merriam-Webster, Incorporated, Springfield, MA, 10th edition, 2001. ISBN 0-87779-710-2.
14. Naveh, Barak. “JGraphT”. Internet: http://jgrapht.sourceforge.net/,
January 2008.
15. “PreEmptive Solutions”. Internet: http://www.preemptive.com/, Jan 2008.
16. “Semantic Designs, Inc.” Internet:
http://www.semdesigns.com/Products/Obfuscators/, Jan 2008.
17. “Smardec”. Internet: http://www.smardec.com/products.html, Jan 2008.
18. Varnovsky, Nikolay P. and Vladimir A. Zakharov. “On the Possibility of Provably
Secure Obfuscating Programs.” Manfred Broy and Alexandre V. Zamulin (editors),
Ershov Memorial Conference, volume 2890 of Lecture Notes in Computer
Science, 91–102. Springer, 2003. ISBN 3-540-20813-5.
19. “Wiktionary”. Internet: http://en.wiktionary.org/, Oct 2007.
Vita
Major Kenneth E. Norman graduated from Fayette County High School in
Fayetteville, Georgia. He entered undergraduate studies at the Georgia Institute of
Technology in Atlanta, Georgia, where he graduated with a Bachelor’s degree in
Electrical Engineering in 1992. He was commissioned through Officer Training School in
1993. In 2002, he earned his first Master’s degree, in Engineering Management, at the
Florida Institute of Technology.
Major Norman was first assigned to HQ Standard Systems Group, Maxwell
AFB, Alabama, in July 1993 as officer in charge of software development. In October
1996, he was assigned to the National Air Intelligence Center, Wright-Patterson
AFB, Ohio, where he served as an intelligence analyst. His third assignment began in
October 1999 when he was selected to stand up a new joint interoperability program
office at the US Army’s Communications-Electronics Command, Fort Monmouth,
New Jersey. Next, he became an assignment officer for the developmental engineer
career field at HQ Air Force Personnel Center, Randolph AFB, Texas, in July 2002.
Maj Norman was next assigned to the National Reconnaissance Office in Chantilly,
Virginia, in August 2004 as a systems engineer. While there, he was selected for
in-residence Intermediate Developmental Education, which precipitated his assignment
to attend the Air Force Institute of Technology in August 2006. Upon graduation,
he will remain at Wright-Patterson AFB for his assignment to the Air Force Research
Laboratory.
Permanent address: Air Force Institute of Technology
2950 Hobson Way
Wright-Patterson AFB, OH 45433-7765
Index
The index is conceptual and does not designate every occurrence of a keyword.
Page numbers in bold represent concept definition or introduction.
acyclic graph, 17
black-box obfuscation, 8
circuit class, 13
CORGI, 20
CXL, 20
digraph, see directed graph
directed acyclic graph, 17
directed graph, 15
graph, 15
multi-graph, 17
Random Program Model, 8
RPM, see Random Program Model
subcircuit replacement, 18
subcircuit selection, 18
VBB, see virtual black box
virtual black box, 4
white-box obfuscation, 8
REPORT DOCUMENTATION PAGE (Form Approved, OMB No. 0704–0188)
1. REPORT DATE (DD–MM–YYYY): 27-03-2008
2. REPORT TYPE: Master’s Thesis
3. DATES COVERED (From – To): Sep 2006–Mar 2008
4. TITLE AND SUBTITLE: Algorithms for White-box Obfuscation Using Randomized Subcircuit Selection and Replacement
5a–5f. CONTRACT, GRANT, PROGRAM ELEMENT, PROJECT, TASK, AND WORK UNIT NUMBERS: 08-183
6. AUTHOR(S): Norman, Kenneth E., Maj, USAF
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES):
Air Force Institute of Technology
Graduate School of Engineering and Management (AFIT/EN)
2950 Hobson Way
WPAFB OH 45433-7765
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT/GCS/ENG/08-17
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES):
Air Force Office of Scientific Research
801 North Randolph Street, Rm 732
Arlington VA 22203-1977
703–696–9544 (DSN: 426)
10. SPONSOR/MONITOR’S ACRONYM(S):
11. SPONSOR/MONITOR’S REPORT NUMBER(S):
12. DISTRIBUTION / AVAILABILITY STATEMENT: Approval for public release; distribution is unlimited.
13. SUPPLEMENTARY NOTES:
14. ABSTRACT:
Software protection remains an active research area with the goal of preventing adversarial software exploitation such as reverse engineering, tampering,
and piracy. Heuristic obfuscation techniques lack strong theoretical underpinnings while current theoretical research highlights the impossibility of creating
general, efficient, and information theoretically secure obfuscators.
In this research, we consider a bridge between these two worlds by examining obfuscators based on the Random Program Model (RPM). Such a model
envisions the use of program encryption techniques which change the black-box (semantic) and white-box (structural) representations of underlying programs.
In this thesis we explore the possibilities for white-box transformation. Under an RPM formulation, if an adversary cannot distinguish an original program
from either its obfuscated version (whose black-box behavior has been strategically altered) or a randomly generated program of comparable size, then the
white-box intent of the original program has been sufficiently protected. One proposed method of creating such random indistinguishability is by choosing (at
random) a program from a size-bounded set of all semantically equivalent possibilities.
Since full enumeration of reasonably sized programs is not possible, in this work we focus on obfuscators which introduce random white-box structural
variation based on iterative selection and replacement. We design and develop an obfuscation framework for programmatic logic expressed as combinatorial
Boolean circuits and compare six unique approaches for sub-circuit selection. We analyze the relative behavior of random and guided-random sub-circuit
selection algorithms while showing their utility in producing random white-box structural variation.
15. SUBJECT TERMS: software tools, software engineering, computer programs, cryptography, obscuration, software obfuscation, randomization,
pseudo random sequences, random functions
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 99
19a. NAME OF RESPONSIBLE PERSON: Lt Col J. Todd McDonald
19b. TELEPHONE NUMBER (include area code): 937–255–3636 x4639; jmcdonal@afit.edu
Standard Form 298 (Rev. 8–98)
Prescribed by ANSI Std. Z39.18
