A genetic parallel programming based logic circuit synthesizer. by Lau, Wai Shing. & Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
A Genetic Parallel
Programming based Logic
Circuit Synthesizer
LAU, Wai Shing
A Thesis Submitted in Partial Fulfilment 
of the Requirements for the Degree of 
Master of Philosophy 
in 
Computer Science and Engineering 
© The Chinese University of Hong Kong
November 2006 
The Chinese University of Hong Kong holds the copyright of this thesis. Any 
person(s) intending to use a part or whole of the materials in the thesis in 
a proposed publication must seek copyright release from the Dean of the 
Graduate School. 
Thesis/Assessment Committee 
Professor Ng Kam Wing (Chair) 
Professor Leung Kwong Sak (Thesis Supervisor) 
Professor Lee Kin Hong (Thesis Supervisor) 
Professor Wu Yu Liang, David (Committee Member) 
Abstract of thesis entitled: 
A Genetic Parallel Programming based Logic Circuit Synthe-
sizer 
Submitted by LAU Wai Shing
for the degree of Master of Philosophy 
at The Chinese University of Hong Kong in November 2006 
Genetic Parallel Programming (GPP) is a novel Genetic Programming
paradigm. This thesis presents a GPP based Logic Circuit Synthesizer
(GPPLCS), which is a combinational logic circuit learning system.
GPPLCS can synthesize (evolve) optimal logic circuits on Field
Programmable Gate Arrays (FPGAs) given the truth table of a circuit
as an input. It employs a Multi Logic Unit Processor (MLP), which is
a multiple instruction-stream multiple data-stream (MIMD),
general-purpose register machine. Based on the parallel architecture
of the MLP, GPPLCS evolves genetic programs in parallel form (MLP
programs).
The GPPLCS has been improved in two different ways. First, we make
use of a hardware accelerator in the GPPLCS. A Multi MLP based GPPLCS
(MMGPPLCS) is proposed so that the whole evolution can be sped up.
MMGPPLCS is designed to speed up both the evolution and the
evaluation of genetic parallel programs that represent combinational
logic circuits. Moreover, a hardware based MLP has been implemented
in FPGAs. Experimental results show that the speedups vary from 10 to
36 depending on the application. The second improvement makes use of
local search operators, FlowMap or DAOMap. By integrating GPPLCS and
FlowMap, a Hybridized GPPLCS (HGPPLCS) is developed. The HGPPLCS
first evolves circuits as 2-input LookUp Table (LUT) circuits and
then relies on FlowMap to give a 4-input LUT mapping solution.
Experimental results show that both the LUT counts and the
propagation LUT delays of the circuits collected are better than
those of the original GPPLCS. In addition, by including DAOMap as a
local search operator, a novel memetic algorithm has been developed
and used in a Memetic GPPLCS (MGPPLCS). GPP is first used for
evolving a population of LUT-based circuits. DAOMap is used for
optimization while GPP searches for the locations (vicinity) of
possible global optima. DAOMap acts as a greedy local search operator
that returns an optimum circuit for each individual in the GPP
population. GPP keeps on evolving and the process continues until
certain stopping criteria are met. Experimental results show that
circuits found using this approach contain a smaller number of LUTs
and LUT


























I would like to express my appreciation to my supervisors Pro-
fessor K.S. Leung and Professor K.H. Lee for their invaluable 
advice and guidance during my research study. Besides, I would 
like to thank Professor K.W. Ng for his suggestions in my re-
search. 
During my Master of Philosophy study, I have also benefited
from a lot of people. My peers in our Evolutionary Computation
study group, Li Gang, Ar Ho, Ar Man, Dr. Liang Yong and Dr.
Ivan Cheang, have given me a lot of invaluable comments and
suggestions over the past 2 years.
Last but not least, I would like to express my gratitude to 
my family for their support and love throughout my life. 





1 Introduction
1.1 Field Programmable Gate Arrays
1.2 FPGA technology mapping problem
1.3 Motivations
1.4 Contributions
1.5 Thesis Organization
2 Background Study
2.1 Deterministic approach to technology mapping problem
2.1.1 FlowMap
2.1.2 DAOMap
2.2 Stochastic approach
2.2.1 Bio-Inspired Methods for Multi-Level Combinational Logic Circuit Design
2.2.2 A Survey of Combinational Logic Circuit Representations in stochastic algorithms
2.3 Genetic Parallel Programming
2.3.1 Accelerating Phenomenon
2.4 Chapter Summary
3 A GPP based Logic Circuit Synthesizer
3.1 Overall system architecture
3.2 Multi-Logic-Unit Processor
3.3 The Genotype of a MLP program
3.4 The Phenotype of a MLP program
3.5 The Evolution Engine
3.5.1 The Dual-Phase Approach
3.5.2 Genetic operators
3.6 Chapter Summary
4 MLP in hardware
4.1 Motivation
4.2 Hardware Design and Implementation
4.3 Experimental Settings
4.4 Experimental Results and Evaluations
4.5 Chapter Summary
5 Feasibility Study of Multi MLPs
5.1 Motivation
5.2 Overall Architecture
5.3 Experimental settings
5.4 Experimental results and evaluations
5.5 Chapter Summary
6 A Hybridized GPPLCS
6.1 Motivation
6.2 Overall system architecture
6.3 Experimental settings
6.4 Experimental results and evaluations
6.5 Chapter Summary
7 A Memetic GPPLCS
7.1 Motivation
7.2 Overall system architecture
7.3 Experimental settings
7.4 Experimental results and evaluations
7.5 Chapter Summary
8 Conclusion
8.1 Future work
Bibliography
List of Figures 
1.1 General Model of an FPGA which consists of Configurable Logic Blocks (CLBs), Input Output Blocks (IOBs) and routing resources
1.2 2-Slice Virtex-E CLB
1.3 Schematic of a SRAM-based 3-LUT
1.4 FPGA mapping example
1.5 The system block diagram of the GPPLCS
2.1 Label Calculation in FlowMap
2.2 Label Calculation in FlowMap (Cont')
2.3 The structure of Programmable Logic Devices
2.4 The phenotype used in Cartesian GP
2.5 Louis's Two-Dimensional Gate Array
2.6 The phenotype proposed by Torresen
2.7 The phenotype of F^PGA
2.8 The framework of a GPP system [12]
3.1 The system block diagram of GPPLCS
3.2 The 2-LUT MLP used by the GPPLCS
3.3 The 4-LUT MLP used by the GPPLCS
3.4 The genotype of an L_max-PI (PI[0]-PI[L_max-1]), 16-SI (SI[*,0]-SI[*,15]) MLP program
3.5 Representations of SIs in evolving 2-LUT and 4-LUT circuits
3.6 Functions b0 - bF used in 2-LUT circuits SI
3.7 The corresponding content of 4-LUT of the "bF6E0 r31 r27 r08 r29 r00" sub-instruction
3.8 Optimized MLP program for 1-bit full adder in 2-LUT format
3.9 A 1-bit full adder in 2-LUT format
3.10 Optimized MLP program for 2-bit full adder in 4-LUT format
3.11 A 2-bit full-adder in 4-LUT format
3.12 PI level crossover on two parents
3.13 An SI swapping in a single MLP program
4.1 The architecture of the MLP core
4.2 A Processing Element
4.3 The speedup ratio versus tournaments for MUX problem
4.4 The speedup ratio versus tournaments for ADD problem
4.5 The speedup ratio versus tournaments for CMP problem
4.6 The speedup ratio versus tournaments for PRI problem
4.7 The speedup ratio versus tournaments for MAJ problem
4.8 The speedup ratio versus tournaments for BCD problem
5.1 The system block diagram of MMGPPLCS
5.2 FIFO design
5.3 Algorithm of MMGPPLCS in simulation
6.1 HGPPLCS
6.2 FlowMap refines the fitness of individuals in GPPLCS
6.3 Average number of 4-LUT count and LUT level collected from HGPPLCS and GPPLCS on the six problems in 50 runs
6.4 Best number of 4-LUT and LUT level collected from HGPPLCS and GPPLCS on the six problems in 50 runs
6.5 The best 3-bit comparator evolved by the HGPPLCS
7.1 The system block diagram of MGPPLCS
7.2 DAOMap refines the fitness of individuals in GPPLCS
7.3 Algorithm of MGPPLCS
7.4 6-bit multiplexer evolved by the MGPPLCS
List of Tables 
3.1 Control-codes in 2-LUT circuits SI
3.2 Control-codes in 4-LUT circuits SI
4.1 Pilchard board features
4.2 Six combinational logic circuit problems used in GPPLCS with the hardware assisted MLP. N_in and N_out denote the numbers of inputs and outputs respectively. N_row (= 2^N_in) denotes the number of rows in the truth tables. N_case (= N_row x N_out) denotes the total number of training cases
4.3 Experimental settings used in GPPLCS with the hardware assisted MLP
4.4 Summary of experimental results in GPPLCS with hardware assisted MLP
5.1 Six combinational logic circuit problems used in the simulation. N_in and N_out denote the numbers of inputs and outputs respectively. N_row (= 2^N_in) denotes the number of rows in the truth tables. N_case (= N_row x N_out) denotes the total number of training cases
5.2 Experimental settings used in MMGPPLCS and GPPLCS
5.3 Number of tournaments (x 10^[?]) needed by MMGPPLCS and GPPLCS in design phase on six problems (Average value)
6.1 Six combinational logic circuit problems used in HGPPLCS. N_in and N_out denote the numbers of inputs and outputs respectively. N_row (= 2^N_in) denotes the number of rows in the truth tables. N_case (= N_row x N_out) denotes the total number of training cases
6.2 Experimental settings used in HGPPLCS
6.3 Best circuits collected from HGPPLCS, GPPLCS and FlowMap algorithm on six problems
6.4 Successful rate of evolving circuit problems in HGPPLCS and GPPLCS
7.1 Six combinational logic circuit problems used in MGPPLCS. N_in and N_out denote the numbers of inputs and outputs respectively. N_row (= 2^N_in) denotes the number of rows in the truth tables. N_case (= N_row x N_out) denotes the total number of training cases
7.2 Experimental settings used in MGPPLCS
7.3 Best circuits collected from MGPPLCS, GPPLCS, DAOMap and FlowMap algorithm on six problems
7.4 Circuits collected from MGPPLCS, GPPLCS, DAOMap and FlowMap on six problems (Average value)
List of Abbreviations 
• ALU: Arithmetic Logic Units
• ACO: Ant Colony Algorithms
• CAD: Computer Aided Design
• CGP: Cartesian Genetic Programming
• CIGA: Case Injected Genetic Algorithms
• CLBs: Configurable Logic Blocks
• CU: Control Unit
• DSW: Dynamic Sample Weighting
• EE: Evolution Engine
• EHW: Evolvable Hardware
• ES: Evolutionary Strategy
• FF: Flip Flop
• F^PGA: Functional-based Field Programmable Gate Array
• FPGA: Field Programmable Gate Array
• GAs: Genetic Algorithms
• GASA: Genetic Algorithms with Simulated Annealing
• GPs: Genetic Programmings
• GPP: Genetic Parallel Programming
• GPPLCS: Genetic Parallel Programming based Logic Circuit Synthesizer
• HGPPLCS: Hybridized Genetic Parallel Programming based Logic Circuit Synthesizer
• ICs: Integrated Circuits
• IOBs: Input Output Blocks
• IORs: Internal Operand Registers
• k-LoUs: k-input logic units
• LUT: Lookup Table
• k-LUT: k-input LookUp Table
• MIMD: Multiple Instruction-streams Multiple Data-streams
• MLP: Multi Logic Unit Processor
• MGPPLCS: Memetic Genetic Parallel Programming based Logic Circuit Synthesizer
• MMGPPLCS: Multi MLP Genetic Parallel Programming based Logic Circuit Synthesizer
• OLMC: Output Logic Macro Cell
• PFU: Programmable floating-point processing units
• PE: Processing Element
• PI: Primary Input
• PIs: Parallel Instructions
• PLD: Programmable Logic Device
• PO: Primary Output
• PSO: Particle Swarm Optimization
• SGA: Simple Genetic Algorithms
• SIs: Sub Instructions
• SIR: Sub Instructions Registers
• VGA: Variable-length Genetic Algorithms
• VHDL: Very High Speed Integrated Circuit Hardware Description Language
List of Symbols 
• d: the propagation delay
• d_max: the maximum value allowed for the propagation delay
• f_raw: raw fitness
• f_dp: fitness in the design phase
• f_op: fitness in the optimization phase
• g: the LookUp Table count (the number of normal sub-instructions (SIs))
• g_max: the maximum value allowed for the LookUp Table count
• L: length of Parallel Instructions (PIs)
• L_max: maximum length of Parallel Instructions
• t_max: maximum tournaments allowed
• P_xover: PI crossover probability
• P_btmut: bit mutation probability
• P_siswp: SI swapping probability




Chapter 1
Introduction
Field Programmable Gate Arrays (FPGAs) have become very popular for
prototyping new designs of digital logic circuits. This is because
the FPGA implementation of a design is relatively easy, which allows
logic verification to be performed early in the design process and
reduces the turnaround time [62]. This has further ramifications for
the manufacturing costs. In implementing a design in FPGAs, the
optimized logic description obtained during logic synthesis must be
mapped onto the modules and routing resources available on a
particular FPGA. The objective is to find the best mapping onto the
FPGA in terms of the number of modules required. Other factors, such
as performance, may also be considered. This thesis presents a
synthesizer that uses Genetic Parallel Programming (GPP) for the FPGA
technology mapping problem: a Genetic Parallel Programming based
Logic Circuit Synthesizer (GPPLCS).
This chapter is organized as follows. An overview of the FPGA is
given in Section 1.1. In Section 1.2, the FPGA technology mapping
problem is described. The motivations and our contributions can be
found in Sections 1.3 and 1.4 respectively. Finally, the thesis
organization is given in Section 1.5.
Figure 1.1: General Model of an FPGA which consists of Configurable Logic
Blocks (CLBs), Input Output Blocks (IOBs) and routing resources
1.1 Field Programmable Gate Arrays 
Field Programmable Gate Arrays (FPGAs) are a class of programmable
hardware devices which consist of an array of Input Output Blocks
(IOBs), Configurable Logic Blocks (CLBs) and routing resources. A
simplified general model of an FPGA is shown in Figure 1.1. IOBs
connect the CLB logic to the outside world. A CLB is the basic unit
for implementing a logic function in an FPGA. Routing resources
interconnect the CLBs and form connections between the CLBs and the
IOBs. Some FPGAs may also contain on-chip RAM. Figure 1.2 shows a
2-Slice Virtex-E CLB [2], which contains two logic cells. Each logic
cell consists of a function generator in the form of a LookUp Table
(LUT), a storage element or Flip Flop (FF), internal carry and
control logic, and registers.
Figure 1.2: 2-Slice Virtex-E CLB
LUT-based FPGAs are a new generation of integrated circuits with an
array of programmable logic blocks placed in an infrastructure of
interconnections. Usually, LUTs of a fixed size are used throughout
the whole FPGA chip, and the size of every LUT is denoted by its
number of inputs (k), which is commonly chosen to be 4 or 5. A
k-input LUT (k-LUT) can be used to implement any Boolean function of
up to k variables. Every LUT is implemented by memory cells with a
k-bit address decoder. The inputs to a Boolean function are taken as
an address to read the corresponding bit pre-loaded inside the memory
cells. This is why a k-LUT can implement any k-variable Boolean
function. Figure 1.3 shows a possible structure of a 3-LUT.
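The address-decoding behaviour described above can be sketched in a few lines of Python. This is only an illustration, not code from the thesis: the helper name `make_lut`, the most-significant-bit-first input ordering and the majority-function contents are all assumptions made for the example.

```python
def make_lut(bits):
    """Build a k-input LUT from its 2**k pre-loaded memory bits.

    bits[addr] is the stored output for the input combination whose
    binary encoding (first input = most significant bit) equals addr.
    The bit ordering is an assumption of this sketch.
    """
    k = len(bits).bit_length() - 1
    if len(bits) != 1 << k:
        raise ValueError("a k-LUT needs exactly 2**k memory bits")

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs")
        addr = 0
        for bit in inputs:  # address decoder: the inputs select one memory cell
            addr = (addr << 1) | (bit & 1)
        return bits[addr]

    return lut


# A 3-LUT loaded with the 3-input majority function (hypothetical contents).
majority = make_lut([0, 0, 0, 1, 0, 1, 1, 1])
print(majority(1, 1, 0))  # address 0b110 = 6, prints 1
```

Because the function is stored as a table, changing the implemented Boolean function only means loading different memory bits; the lookup logic itself never changes.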
1.2 FPGA technology mapping problem 
A typical design flow for FPGAs consists of a number of steps. We
first synthesize the logic circuit from its specification and then
perform logic optimization. This is followed by technology mapping
and, finally, placement and routing. The aim of FPGA technology
mapping is to obtain a functionally equivalent LUT network for a
given Boolean circuit, while placement and routing realize an
implementation of the mapped LUT network. The objective of technology
mapping is therefore either to use a minimal chip area (i.e. area
minimization) or to achieve a minimum circuit delay (i.e. depth
minimization). The area is commonly indicated by the number of LUTs,
while the circuit delay is measured by the number of levels of LUTs.
Figure 1.3: Schematic of a SRAM-based 3-LUT
Figure 1.4: FPGA mapping example
Our definition of the FPGA technology mapping problem is slightly
different from the one used by the Computer Aided Design group. In
their problem definition, the input to the FPGA technology mapping
problem is a Boolean Network which is modeled from a circuit. That
means technology mapping is applied to an existing circuit. We
believe that starting from an existing circuit would hinder our
GPPLCS from reaching a global optimum. Thus, we use a different
definition: the input to GPPLCS is a truth table of a circuit. As the
truth table specifies only the functionality of a circuit, circuits
can be evolved freely in the GPPLCS, which helps to prevent GPPLCS
from being trapped in a local optimum.
The output of our GPPLCS is a network composed of LUTs which performs
the same function as stated in the input truth table. The number of
inputs to each LUT is bounded by a variable k: a network is k-bounded
if every LUT in it has at most k inputs. Clearly, a k-bounded network
can be implemented by an FPGA using k-LUTs as logic blocks. Figure
1.4 shows an example of this problem. This example can be implemented
by 5 LUTs.
The FPGA technology mapping problem is formulated as follows:
• INPUT: A truth table of a circuit
• OUTPUT: A k-bounded network
• Objectives:
1. Minimize the number of LUTs used to map the circuit.
2. Minimize the delay of the circuit mapping result.
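To make the formulation concrete, the following Python sketch checks a candidate LUT network against a truth table and reports the two objective values (LUT count and LUT depth). It is an illustration under stated assumptions, not the evaluation procedure of the GPPLCS: the network encoding, the function names `evaluate_network` and `assess_mapping`, and the full-adder example are hypothetical.

```python
from itertools import product


def evaluate_network(network, outputs, assignment):
    """Evaluate a LUT network for one assignment of the primary inputs.

    network: list of (name, input_names, bits) in topological order, where
    bits[addr] is the LUT content and the first input is the address MSB.
    """
    values = dict(assignment)
    for name, inputs, bits in network:
        addr = 0
        for src in inputs:
            addr = (addr << 1) | values[src]
        values[name] = bits[addr]
    return tuple(values[o] for o in outputs)


def assess_mapping(network, primary_inputs, outputs, truth_table, k):
    """Return (k_bounded, correct, lut_count, depth) for a candidate network."""
    k_bounded = all(len(inputs) <= k for _, inputs, _ in network)
    correct = all(
        evaluate_network(network, outputs, zip(primary_inputs, row)) == truth_table[row]
        for row in product((0, 1), repeat=len(primary_inputs))
    )
    level = {pi: 0 for pi in primary_inputs}  # primary inputs sit at level 0
    for name, inputs, _ in network:
        level[name] = 1 + max(level[src] for src in inputs)
    depth = max(level[o] for o in outputs)
    return k_bounded, correct, len(network), depth


# Hypothetical example: a 1-bit full adder built from two 3-input LUTs.
truth_table = {
    (a, b, c): (a ^ b ^ c, (a & b) | (a & c) | (b & c))
    for a, b, c in product((0, 1), repeat=3)
}
network = [
    ("sum", ("a", "b", "cin"), [0, 1, 1, 0, 1, 0, 0, 1]),
    ("cout", ("a", "b", "cin"), [0, 0, 0, 1, 0, 1, 1, 1]),
]
result = assess_mapping(network, ("a", "b", "cin"), ("sum", "cout"), truth_table, k=4)
print(result)  # (True, True, 2, 1)
```

The third and fourth elements of the result correspond to objectives 1 and 2; the first two are the feasibility conditions (k-boundedness and functional equivalence with the truth table).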
1.3 Motivations 
A Genetic Parallel Programming based Logic Circuit Synthesizer
(GPPLCS) is proposed in this thesis. It is motivated by the following
two observations:
1. Traditionally, technology mapping problems are solved by
deterministic algorithms like FlowMap [21] and DAOMap [13]. Although
mapping solutions can be obtained in a short period of time, the
quality of the solutions is not the best. The application of
stochastic algorithms like Genetic Parallel Programming (GPP), which
are particularly good at finding the global optimum of optimization
problems, should be explored. Moreover, since GPP is a
population-based search approach with a strong optimization
capability, it can find several of the best solutions among the
possible solutions. That means more than one mapping solution can be
found by GPP.
2. Although GPP is good at locating the global optimum in
optimization problems, it usually takes a long time to compute. Some
improvements are necessary to tackle this problem.
A GPPLCS is therefore proposed and implemented to tackle the first
problem. Some further improvements are then made to the GPPLCS to
address the second. A hardware implementation of the GPPLCS in FPGAs
is one feasible way to solve the efficiency problem. The other way is
to include a non-genetic deterministic local search operator in the
GPPLCS. These improvements are shown to be effective in significantly
shortening the computation time.
1.4 Contributions 
Firstly, the major contribution of our work is the design and
implementation of a GPPLCS. The GPPLCS is used to design optimized
combinational logic circuits with LUTs, which are the basic logic
representation components in FPGAs. Designing an optimized
lookup-table network is a non-trivial task. Based on a tailor-made
combinational logic evaluation engine, the Multi Logic Unit Processor
(MLP), and an Evolution Engine (EE) (see Figure 1.5), the GPPLCS
successfully evolved high quality multi-level combinational logic
circuits. The results are superior to those of other existing Genetic
Programming (GP) and Genetic Algorithm (GA) systems.
Figure 1.5: The system block diagram of the GPPLCS
Secondly, we have successfully built a hardware evaluation engine on
FPGAs. Based on the architecture of the MLP, a hardware based MLP on
FPGAs has been designed and implemented so that the evolution speed
can be boosted. A GPPLCS with a software version of the EE and the
hardware based MLP was built to verify the effectiveness.
Thirdly, further improvements have been achieved on the GPPLCS with
the hardware assisted MLP. We have investigated the possibility of a
full-scale hardware implementation of the GPPLCS. As the execution
times of the MLP and the EE are different, a special model of
cooperation between the MLP and the EE is necessary in a hardware
implementation of the GPPLCS in an FPGA. Including multiple MLPs with
a single EE in a GPPLCS reduces the waiting time of the EE while an
evolved combinational logic circuit is being evaluated in the MLP.
The simulation shows that the model works well in evolving logic
circuits and is suitable for the implementation of the GPPLCS in
FPGAs.
Fourthly, we have included a local search operator in our GPPLCS.
Based on existing deterministic algorithms for technology mapping
problems such as FlowMap and DAOMap [13, 21], a Hybridized GPPLCS
(HGPPLCS) and a Memetic GPPLCS (MGPPLCS) have been designed and
implemented. The hybridized GPPLCS makes use of the population-based
Genetic Parallel Programming (GPP) and FlowMap to evolve 4-LUT
circuits. Since GPP is population-based, it has a number of
individuals (circuits) that have the same function (i.e. a
many-to-one genotype¹-phenotype² mapping). Thus, GPP can provide a
number of different circuits as inputs to the FlowMap algorithm. In
this way, FlowMap can return different mapping solutions so that a
better solution can be obtained.
Lastly, algorithms that hybridize a genetic algorithm with a
non-genetic deterministic local search to refine the quality of
solutions are called memetic algorithms [53]. This inspires the idea
of using a local search operator in GPPLCS. By refining the
individuals, local optima can be found more efficiently. During the
process of evolution, DAOMap keeps refining individuals so that more
and more optima can be explored. This new GPPLCS with DAOMap as a
local search operator becomes our memetic GPPLCS. Experimental
results show that the memetic GPPLCS evolves better circuits using a
smaller number of tournaments.
Generally speaking, the memetic GPPLCS is the most efficient and
effective method to generate circuits. It requires fewer evaluations
to identify higher quality solutions than GPP. Both the lookup table
counts and the propagation delays of the circuits collected are
better than those obtained by conventional design or evolved by GPP
alone.
¹ The genotype is the representation which consists of encoded codes
(chromosomes) for the phenotype.
² The phenotype is the representation (as opposed to the genotype)
which exhibits features that can be evaluated. The phenotype is the
visible, behavioral expression of the genotype.
The rest of the thesis is organized as follows: 
Chapter 2 first presents the research background of this thesis.
Then, it gives a thorough review of both deterministic and stochastic
algorithms for the technology mapping problem. Afterwards, a brief
introduction to Genetic Parallel Programming (GPP) is given.
Chapter 3 presents a Genetic Parallel Programming based Logic Circuit
Synthesizer (GPPLCS). GPPLCS is a GPP system which comprises two core
components, a Multi-Logic-Unit Processor (MLP) and an Evolution
Engine (EE). The MLP is an evaluation engine that executes parallel
genetic programs for fitness evaluation. The EE is a population-based
evolutionary process which manipulates the population and performs
genetic operators.
Chapter 4 shows the design and implementation of a Multi Logic Unit
Processor (MLP). The MLP is a hardware implementable evaluation
engine that executes parallel genetic programs for fitness
evaluation. With the cooperation of the software version of the EE
and the hardware based MLP, combinational circuits are evolved at a
faster rate. Experimental results in terms of the actual speedup
ratio on several combinational logic circuits are presented.
In Chapter 5, we describe a new model of cooperation between the MLP
and the EE. This new model is designed for hardware implementation in
FPGAs. The main contribution is to shorten the waiting time of the EE
during the evaluation of logic circuit programs in the MLP, based on
a pipeline concept. Simulation results on several combinational
circuits, compared with the current GPPLCS, are presented.
Chapter 6 presents a hybridized GPPLCS, a system which integrates the
GPPLCS and the FlowMap algorithm. Experiments on several
combinational logic circuits are presented.
Chapter 7 presents a memetic GPPLCS. By including a non-genetic local
search operator, DAOMap, in GPPLCS, better circuits can be evolved
with a smaller number of tournaments. Experimental results on several
combinational logic circuits are given.
Finally, Chapter 8 concludes this thesis with a summary of the issues
addressed in this thesis and their contributions. It also suggests
several directions for future research in our GPPLCS.
• End of chapter. 
Chapter 2 
Background Study 
In the Computer Aided Design (CAD) field, the technology mapping
problem is mainly tackled by deterministic algorithms. They are
mainly network-flow-based algorithms which produce mapping solutions
with optimal depth. Although there are no stochastic algorithms
designed to tackle the technology mapping problem, some stochastic
algorithms are designed for multi-level combinational logic circuit
design.
This chapter is organized as follows. A literature review on two
deterministic network-flow-based algorithms (FlowMap and DAOMap) is
given in Section 2.1. Section 2.2 is a literature review on
stochastic algorithms for multi-level combinational logic circuit
design. Finally, a brief introduction to Genetic Parallel Programming
is presented in Section 2.3.
2.1 Deterministic approach to technology mapping problem
In this section, we introduce two network-flow-based algorithms for
the technology mapping problem. These algorithms are guaranteed to
produce mapping solutions with optimal depth. Therefore, in the later
design process, the wiring delays of the circuit are also optimized.
2.1.1 FlowMap 
A circuit is modeled as a Boolean Network. There is a set of nodes PI
representing the primary inputs (PIs) and another set of nodes PO
representing the primary outputs (POs). All other nodes in the
network are called internal nodes and these nodes are associated with
specific functions. The function type of the internal nodes can be
simple (AND, OR, NOT, XOR) or complex. Every wire in the circuit is
represented by an edge between two nodes. All incoming edges to a
node are called the fanin of this node and all outgoing edges are
called its fanout. Nodes in PI have only fanouts while nodes in PO
have only fanins. If the in-degrees of all nodes are less than or
equal to k, the network is k-bounded. Clearly, a k-bounded network
can be implemented by an FPGA using k-input LookUp Tables (k-LUTs) as
logic blocks.
FlowMap [21] is the first depth-optimal technology mapping 
algorithm developed. The algorithm will first apply Decompose 
Multi-Input Gate (DMIG) [14] to decompose the network into a 
network composed of small gates which have a smaller number 
of inputs (say 2). Experimental results show that small gates 
can be packed and grouped more efficiently than large input 
gates. The depth of the mapped network is the smallest when 
the original network was first decomposed into 2-input gates. 
After gate decomposition, the algorithm enters the labeling phase.
The algorithm calculates a label l(t) for every node t in topological
order. The label l(t) gives the minimum depth of any mapping solution
of the subnetwork rooted at node t, denoted by N_t. Moreover, l(t) is
either equal to the maximum label p of the nodes in the fanin of t or
one more than that maximum. FlowMap first collapses all nodes with
label p in N_t (together with t itself) to get a new network N_t';
it then computes the maximum-volume min-cut of N_t' using the classic
network flow technique. If the cut size is less than or equal to k,
the label l(t) is assigned to be p; otherwise l(t) = p + 1,
indicating that a new LUT is used to map N_t.
Figure 2.1: Label Calculation in FlowMap
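The labeling phase can be sketched in Python as follows. This is a minimal illustration written for this text, not the FlowMap implementation: the function names, the dictionary-based network encoding and the small example network are assumptions, and the sketch only decides whether a cut of size at most k exists (which is all the label needs) rather than extracting the maximum-volume cut used later for LUT generation.

```python
from collections import defaultdict, deque


def has_small_cut(cone, sink_set, fanins, k):
    """True if at most k nodes separate the primary inputs from the collapsed sink."""
    if any(not fanins[v] for v in sink_set):
        return False  # a primary input was collapsed into the sink: no cut exists
    inf = float("inf")
    cap = defaultdict(lambda: defaultdict(int))
    for v in cone - sink_set:
        cap[(v, "in")][(v, "out")] = 1  # node splitting: each node may be cut once
        if not fanins[v]:
            cap["src"][(v, "in")] = inf  # dummy source feeds every primary input
        for u in fanins[v]:
            cap[(u, "out")][(v, "in")] = inf
    for v in sink_set:
        for u in fanins[v]:
            if u not in sink_set:
                cap[(u, "out")]["sink"] = inf
    flow = 0
    while flow <= k:  # more than k augmenting paths means the min-cut exceeds k
        parent = {"src": None}
        queue = deque(["src"])
        while queue and "sink" not in parent:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if "sink" not in parent:
            break
        y = "sink"
        while parent[y] is not None:  # push one unit of flow along the path
            x = parent[y]
            cap[x][y] -= 1
            cap[y][x] += 1
            y = x
        flow += 1
    return flow <= k


def flowmap_labels(fanins, k):
    """fanins maps each node to its fanin list; keys must be in topological order."""
    label = {}
    for t, ins in fanins.items():
        if not ins:
            label[t] = 0  # primary input
            continue
        cone, stack = set(), [t]
        while stack:  # collect the subnetwork N_t rooted at t
            v = stack.pop()
            if v not in cone:
                cone.add(v)
                stack.extend(fanins[v])
        p = max(label[u] for u in ins)
        sink_set = {v for v in cone if v == t or label[v] == p}
        label[t] = p if has_small_cut(cone, sink_set, fanins, k) else p + 1
    return label


# Hypothetical network: g3 combines two 2-input gates over four primary inputs.
fanins = {"a": [], "b": [], "c": [], "d": [],
          "g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
print(flowmap_labels(fanins, 4)["g3"])  # 1: the whole cone fits in one 4-LUT
print(flowmap_labels(fanins, 3)["g3"])  # 2: four inputs do not fit in a 3-LUT
```

Each node is split into an "in" and an "out" half joined by a unit-capacity edge, so the maximum flow equals the minimum number of nodes whose removal separates the primary inputs from the collapsed sink.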
After label calculation, FlowMap starts k-LUT generation with a list
of PO nodes. It iteratively takes a non-PI node from the list and
generates a LUT to implement the function for all the nodes with the
same label. The fanins of this newly generated LUT are then put on
the list.
To illustrate the label calculation, we show the network for the
circuit in Figure 2.1. There are 6 PIs (from a to f) and 1 PO (y1).
For simplicity, we take k = 3 (i.e. 3-input LUTs). Suppose we need to
compute the label for node g8 with p = 2 (i.e. the label of g7 is 2,
l(g7) = 2) during the labeling phase. We collapse the node g7
together with g8 and consider this collapsed node as the sink node.
After adding a dummy source node (src) connected to all 5 PI nodes,
we find a minimum cut of the network using the network flow
technique. Figure 2.2 shows the collapsed network and the graph for
the flow calculation. The min-cut simply separates the sink node from
all the other nodes, which implies that nodes g7 and g8 can be
grouped together and implemented by a 3-LUT. Since the cut size
equals 3, the label of node g8 is 2, the same as that of g7.
Figure 2.2: Label Calculation in FlowMap (Cont')
FlowMap has a polynomial time complexity of O(kmn), where
n and m are the number of nodes and the number of edges in N.
Therefore the algorithm is extremely fast even for large circuits 
with thousands of gates. 
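The labeling phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FlowMap implementation: the three-gate network, the function names and the dictionary-based graph encoding are hypothetical, every gate is assumed to have at most k fanins, and gates are assumed to be listed in topological order. Node capacities are modelled by splitting each node into an "in" and an "out" half joined by a unit-capacity edge, and the flow computation stops as soon as more than k augmenting paths are found.

```python
from collections import defaultdict, deque

INF = 10**9  # stands in for an unbounded edge capacity


def cone_of(t, fanins):
    """Return all nodes of the subnetwork N_t rooted at t (t included)."""
    seen, stack = {t}, [t]
    while stack:
        for x in fanins.get(stack.pop(), []):
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return seen


def capped_max_flow(cap, s, t, limit):
    """Count unit augmenting paths from s to t; stop once limit is exceeded."""
    flow = 0
    while flow <= limit:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search in the residual graph
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        v = t
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] = cap[v].get(u, 0) + 1
            v = u
        flow += 1
    return flow


def flowmap_labels(fanins, k):
    """fanins maps each gate to its fanin list; nodes that are never keys are PIs."""
    label = {}
    for t, ins in fanins.items():
        for x in ins:
            label.setdefault(x, 0)  # primary inputs get label 0
        p = max(label[x] for x in ins)
        if p == 0:  # all fanins are PIs: t needs its own LUT at depth 1
            label[t] = 1
            continue
        cone = cone_of(t, fanins)
        sink_set = {x for x in cone if x == t or label[x] == p}  # collapsed into the sink
        cap = defaultdict(dict)
        for v in cone:
            if v not in sink_set:
                cap[(v, "in")][(v, "out")] = 1  # node capacity of one
                if v not in fanins:
                    cap["src"][(v, "in")] = INF  # dummy source feeds every PI
            for u in fanins.get(v, []):
                if u not in sink_set:
                    head = "sink" if v in sink_set else (v, "in")
                    cap[(u, "out")][head] = INF
        cut = capped_max_flow(cap, "src", "sink", k)
        label[t] = p if cut <= k else p + 1
    return label


network = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"]}
print(flowmap_labels(network, 3))
```

For this toy chain with k = 3, g1 and g2 share one LUT level (label 1), while g3 depends on four primary inputs and therefore receives label 2.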
2.1.2 DAOMap 
DAOMap [13], which stands for Depth-optimal Area Optimization
of FPGA designs, is an extension of FlowMap. The difference
lies in the way node duplications are modeled and controlled
so as to reduce area through the entire mapping process.
First, a cut-enumeration-based method that consists of cut
generation and cut selection is adopted. Cut generation traverses
the network from PIs to POs, and combines subcuts on the
fanin nodes of the target node to generate all the cuts on the
target node (each cut represents one possible LUT implementation
rooted on the target node). After all the cuts are generated,
the network is traversed from POs to PIs and cuts are selected
to produce the LUT mapping result.
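The cut generation step can be illustrated with a short Python sketch. It covers only the generic enumeration of k-feasible cuts by merging the cuts of the fanin nodes; DAOMap's duplication cost modelling and cut selection are not shown. The function name, the toy network and the representation of a cut as a frozenset of node names are assumptions made for this example, and gates are assumed to be listed in topological order.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Return, for every node, the set of its k-feasible cuts (as frozensets)."""
    cuts = {}
    for node, ins in fanins.items():  # gates in topological order, PIs implicit
        for x in ins:
            cuts.setdefault(x, {frozenset([x])})  # a PI has only its trivial cut
        # Combine one cut from each fanin, then keep the unions with at most k nodes.
        merged = {frozenset().union(*combo)
                  for combo in product(*(cuts[x] for x in ins))}
        cuts[node] = {c for c in merged if len(c) <= k} | {frozenset([node])}
    return cuts


network = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"]}
for cut in sorted(enumerate_cuts(network, 3)["g3"], key=sorted):
    print(sorted(cut))
```

Each non-trivial cut of g3 corresponds to one candidate 3-LUT rooted at g3; the cut {a, b, c, d} is discarded because it has more than k = 3 inputs.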
DAOMap uses three novel approaches to effectively model and
control node duplications and thereby reduce area through the
entire mapping process. First, the potential duplications during
the cut generation procedure are considered so that the mapping
solutions encoded in the cuts can take duplication costs into account.
This helps the cut selection procedure to make the right
decisions to cover the circuit with fewer node duplications from a
global optimization point of view. Second, after the timing
constraint is determined (the longest optimal mapping delay of the
network), the noncritical paths are relaxed by searching the
solution space, considering both local and global optimality
information to minimize the mapping area. Third, an
iterative cut selection procedure that further explores and
perturbs the solution space is carried out to improve the solution
quality.
2.2 Stochastic approach 
Although no stochastic algorithms have been designed specifically for
tackling technology mapping problems, there is some related work
on multi-level combinational logic circuit design by bio-inspired
methods. In addition, many different phenotype representations
exist for combinational logic circuits. Both are
described in the following subsections.
2.2.1 Bio-Inspired Methods for Multi-Level Combinational Logic Circuit Design
In this subsection, we summarize the current research on
bio-inspired methods for multi-level combinational logic circuit
design.
• Simple Genetic Algorithms (SGA): It encodes a combinational
logic circuit by using a fixed-length genotype [15, 16,
17, 32, 49, 50, 55, 58, 59]. Standard genetic operators such
as one-point crossover and bit mutation are used.
• Variable-length Genetic Algorithms (VGA). It is an extension
of SGA [33, 34, 35]. A genotype only encodes the
effective part of the architecture bits of a combinational
logic circuit. Compared with SGA genotypes, VGA
genotypes are shorter. Thus, it is possible to grow larger
circuits in a shorter evolution time with VGA. Special
genetic operators such as cut and splice [25] are used.
• Standard GP. It uses a tree structure to represent an
individual combinational logic circuit [4, 40]. Standard GP
operators such as node mutation, sub-tree mutation and
sub-branch crossover are used. The main drawback of this
method is that only single-output combinational logic
circuits can be evolved, because there is only one root
node in each program tree.
• Evolutionary Strategy (ES). ES is used to evolve combinational
logic circuits [37, 52]. It includes five steps: 1) randomly
initialize a population of λ genotypes; 2) evaluate
all genotypes; 3) copy the fittest genotype into a new
population; 4) fill the remaining λ - 1 places in the new
population with mutated versions of the fittest genotype; and
5) replace the old population with the new one. The
algorithm repeats steps 2 to 5 until the termination criterion is
met.
• Ant Colony Algorithms (ACO). ACO is used to evolve logic
circuits [3, 20]. It is a multi-agent system in which
interactions between low-level agents (ants) result in a
meta-heuristic behavior of the whole ant colony [24].
• Particle Swarm Optimization (PSO). PSO is used to evolve
combinational logic circuits [19]. It simulates the movements
of a flock of birds seeking food (a global aim). It is
a distributed algorithm that performs a multi-dimensional
search [38].
• Genetic Algorithms with Simulated Annealing (GASA). It
is a hybridization of a GA with Simulated Annealing (SA)
[18, 39]. In this algorithm, the GA locates good regions of
the search space whereas the SA exploits these good regions
in order to find the optima.
• Case Injected Genetic Algorithms (CIGA). It combines a
GA system with a Case-Based Reasoning (CBR) module
[45, 46]. In the CBR module, a case-base is built during the GA search.
Whenever a new best individual is found, it is stored in
the case-base. The case-base can be reused to solve a new
problem by injecting similar cases into the initial population
of a new GA search.
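As an illustration of the five-step Evolutionary Strategy listed above, the following Python sketch keeps the fittest genotype and refills the population with its mutants. It is a toy under stated assumptions: the genotype is a plain bit list compared directly against a target truth table rather than an encoded circuit, mutation flips exactly one random bit, and the names `evolve`, `lam` and `score` are hypothetical.

```python
import random


def evolve(fitness, length, lam=5, generations=100, seed=0):
    """Elitist ES: keep the fittest genotype and refill with its mutants."""
    rng = random.Random(seed)
    # Step 1: random initial population of lam genotypes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(lam)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)      # steps 2 and 3
        history.append(fitness(best))
        mutants = []
        for _ in range(lam - 1):                 # step 4
            child = best[:]
            child[rng.randrange(length)] ^= 1    # flip one random bit
            mutants.append(child)
        population = [best] + mutants            # step 5
    return max(population, key=fitness), history


# Toy goal: reproduce the truth table of a 3-input XOR.
target = [0, 1, 1, 0, 1, 0, 0, 1]


def score(genotype):
    return sum(a == b for a, b in zip(genotype, target))


best, history = evolve(score, len(target))
print(score(best), "of", len(target), "rows matched")
```

Because the fittest genotype is always copied unchanged into the next population, the best fitness recorded in `history` can never decrease from one generation to the next.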
2.2.2 A Survey of Combinational Logic Circuit Representations in stochastic algorithms
Most of the existing phenotype representations for combina-
tional logic circuits adopt two-dimensional geometric structures. 
This subsection presents five typical geometries proposed and 
used by different groups of researchers. They are: 
• Programmable Logic Device (PLD) Structure. The PLD
structure is used to evolve logic circuits [29]. PLDs are a class
of reprogrammable logic devices, e.g. GAL16V8. Each
PLD consists of a fuse array and an Output Logic Macro
Cell (OLMC) (see Figure 2.3). A fuse array can be
programmed to represent minterms of a Boolean function.
Multiple minterms are connected to an OLMC in which
a multi-input OR gate is configured. This phenotype is
designed to match the architecture bits of PLDs in a
sum-of-products form.
Figure 2.3: The structure of Programmable Logic Devices
• Cartesian GP (CGP) [51]. As shown in Figure 2.4, the
phenotype is a two-dimensional array of cells. Each cell 
contains a logic gate with some inputs and outputs. All 
external inputs and gate outputs can be reused by their 
higher level (right-hand side) cells. The final outputs can 
be connected to any external inputs and/or cell outputs 
in any levels. A levels-back parameter is used to limit the 
maximum number of levels that a cell output can be reused 
by its higher level cells. 
• Louis's Two-Dimensional Gate Array. It is a two-dimensional
gate array proposed by Louis [17, 45] (see Figure 2.5). The
phenotype is a two-dimensional array of two-input logic
gates. Except for the first-level gates (the left-most column in
the figure), a gate G[i,j] gets its upper input from G[i,j-1]
and its lower input from either G[i-1,j-1] or G[i+1,j-1]. The
outputs of the circuit are always connected to the outputs
of the highest-level gates (the right-most column in the
figure). This representation reduces the genotype length by
restricting the connectivity of a circuit.
Figure 2.4: The phenotype used in Cartesian GP
Figure 2.5: Louis's Two-Dimensional Gate Array
• Torresen's Two-Dimensional Gate Array. Another two-
dimensional gate array is proposed by Torresen [58] (see 
Figure 2.6). It relaxes the restrictions imposed on Louis's 
phenotype. A gate's input can be connected to any gate 
output in its previous layer. 
• The Function-Based FPGA (F²PGA). It is a function-level
Evolvable Hardware (EHW) proposed by Murakawa [54]
(see Figure 2.7). It is used to evolve hardware solutions
for calculation-intensive applications such as digital signal
processing and data compression [54]. In an F²PGA,
there are multiple layers of programmable floating-point
processing units (PFUs) that can perform different high-level
mathematical functions (e.g. sine, cosine, etc.). The
architecture of F²PGA is similar to Torresen's. The
main difference is that F²PGA shares all external inputs with
all PFUs in all layers.
Figure 2.6: The phenotype proposed by Torresen
Figure 2.7: The phenotype of F²PGA
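The connectivity rules of three of these geometries can be compared with a small Python sketch that lists the legal input sources of one cell. The function names and the tuple encoding of sources are hypothetical. Two points are assumptions of the sketch rather than statements from the cited work: first-column cells, which read the external inputs, are not modelled for the two gate arrays, and a lower neighbour that falls outside the array in Louis's scheme is simply dropped.

```python
def cgp_sources(col, n_inputs, n_rows, levels_back):
    """CGP: external inputs plus every cell within levels_back columns to the left."""
    first = max(0, col - levels_back)
    cells = [("cell", r, c) for c in range(first, col) for r in range(n_rows)]
    return [("input", i) for i in range(n_inputs)] + cells


def louis_sources(row, col, n_rows):
    """Louis: upper input from G[row, col-1]; lower input from a diagonal neighbour."""
    upper = [("cell", row, col - 1)]
    lower = [("cell", r, col - 1) for r in (row - 1, row + 1) if 0 <= r < n_rows]
    return upper, lower


def torresen_sources(col, n_rows):
    """Torresen: any gate output of the previous layer."""
    return [("cell", r, col - 1) for r in range(n_rows)]


print(len(cgp_sources(col=3, n_inputs=4, n_rows=3, levels_back=2)))
print(louis_sources(row=1, col=2, n_rows=3))
print(torresen_sources(col=2, n_rows=3))
```

The shorter the list of legal sources, the fewer bits the genotype needs per connection, which is the trade-off between genotype length and connectivity mentioned for Louis's array.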
2.3 Genetic Parallel Programming 
In this section, a brief introduction to Genetic Parallel
Programming is given.
Genetic Programming (GP) [31] is a robust method in
Evolutionary Computation. There are many streams in GP, such as
graph-based GP, stack-based GP, Cartesian GP, linear-tree and
linear-graph GP and grammar-based GP. The two main streams
in GP are standard GP [40] and linear-structured GP (linear
GP) [6]. In standard GP, a genetic program is represented as a
tree structure. In linear GP, a genetic program is represented as
a linear list of machine code instructions or high-level language
statements. A linear genetic program can be run on a target
machine directly without performing any translation process.
Figure 2.8: The framework of a GPP system [12]
The Genetic Parallel Programming (GPP) paradigm proposed
by Cheang et al. [43] is developed on the basis of linear
GP. GPP is a novel linear GP paradigm that evolves parallel
programs for a Multiple Instruction-streams Multiple
Data-streams (MIMD) architecture with multiple
Arithmetic-Logic-Units (ALUs). A genetic parallel program consists of a sequence
of parallel-instructions. A parallel-instruction comprises multiple
sub-instructions that can perform multiple operations
simultaneously in one execution step. GPP has been used to evolve
compact parallel programs for different problems, such as
numeric function regression [43] and data classification problems
[9]. Figure 2.8 shows the framework of a GPP system. It
consists of two components, a Multi Logic Unit Processor (MLP)
and an Evolution Engine (EE). The MLP is an execution engine
for genetic program fitness evaluation. The EE manipulates the
population of genetic programs, applies genetic operators such
as mutation and crossover, and decompiles the solution program
into symbolic assembly and high-level language code. The details
of the MLP and the EE are presented in Chapter 3.
2.3.1 Accelerating Phenomenon 
Experimental results show that GPP can evolve wide programs
(more sub-instructions within a parallel-instruction) more
efficiently than narrow programs (fewer sub-instructions within a
parallel-instruction). This is called the GPP accelerating
phenomenon [44]. This phenomenon is particularly important
here. Having more sub-instructions within a parallel-instruction
means that circuits can be evolved by GPP with a smaller
depth and a smaller number of lookup tables. As a result, a
Genetic Parallel Programming based Logic Circuit Synthesizer
(GPPLCS) can be developed based on GPP.
2.4 Chapter Summary 
This chapter has given a literature review of two deterministic
technology mapping algorithms, FlowMap and DAOMap,
which are popular in the Computer-Aided Design community.
It has also reviewed stochastic algorithms for multi-level
combinational logic circuit design, together with five different
phenotype representations of combinational logic circuits.
Finally, a brief introduction to Genetic Parallel Programming
has been presented.
• End of chapter. 
Chapter 3 
A GPP based Logic Circuit 
Synthesizer 
In this chapter, a Genetic Parallel Programming based Logic
Circuit Synthesizer System (GPPLCS) is presented [10, 11, 12].
It has two main cores, the Evolution Engine (EE) and the
Multi Logic Unit Processor (MLP). The EE manipulates the
population of genetic programs and applies genetic operators such
as mutation and crossover. The MLP is an execution engine
for genetic program fitness evaluation. A variable-length parallel
program structure (MLP program) is used to represent
combinational logic circuits in order to preserve introns in the early
stage. Circuits are evolved by a dual-phase approach. The first
phase is called the design phase, in which GPPLCS aims at finding a 100%
functional program. Only the functional correctness of the genetic
programs is taken into consideration in this phase. Other
qualitative factors such as LookUp Table (LUT) count, propagation
delay and program size are not considered. Once the first correct
genetic program is found by the GPPLCS, we proceed to the
second phase, the optimization phase. Another set of genetic
operators together with an optimization-oriented fitness function is
used to improve the quality of the correct program.
This chapter is organized as follows. The overall architecture
of GPPLCS is described in Section 3.1. A detailed description
of the MLP is presented in Section 3.2. Then, the genotype
and phenotype of an MLP program are discussed in Sections 3.3 and
3.4, respectively. This is followed by a detailed description of the EE in Section
3.5. Finally, a chapter summary is given in Section 3.6.
Figure 3.1: The system block diagram of GPPLCS
3.1 Overall system architecture 
Genetic Parallel Programming (GPP) is a linear GP paradigm
that evolves parallel programs based on the MLP. Thus, the
parallel programs evolved are called MLP programs. GPPLCS is
developed based on GPP. It is a logic circuit synthesizer
designed for tackling the technology mapping problem by a stochastic
approach. It takes the truth table of a circuit (the training cases)
as input. The output is a mapping solution for the circuit
in LUT format. Although the number of inputs to a LUT
can be varied, it is chosen to be either 2 or 4. All
combinational digital circuits presented are evolved by a two-stage
(i.e. design and optimization stages) approach. Different sets
of genetic operators, including crossover, bit mutation and
sub-instruction swapping, are used in the two stages. In the design
stage, the GPPLCS system aims at finding a 100% functional
program (correct program). The raw fitness is given by the ratio
of unsolved training cases to the total number of training cases.
In the optimization stage, the raw
fitness puts emphasis on the LUT count, the propagation
delay and the program length. In other words, the major
objective of the optimization stage is to reduce the LUT count and
then the propagation delay.
GPPLCS consists of two components, the EE and the MLP.
The EE manipulates the genetic parallel programs and performs
genetic operations. The MLP evaluates the genetic parallel
programs to determine their fitness. Figure 3.1 shows the system
block diagram of GPPLCS. The details of the EE and the MLP
are presented in the subsequent sections.
Figure 3.2: The 2-LUT MLP used by the GPPLCS
3.2 Multi-Logic-Unit Processor 
The MLP used in the GPPLCS is a general-purpose, tightly coupled
processor. It is used for executing the Boolean circuits evolved
in GPPLCS (i.e. for evaluating the genetic programs in GPPLCS).
Since GPPLCS can evolve circuits in either 2-input LUT (2-LUT)
format or 4-input LUT (4-LUT) format, the architecture
of the MLP is problem specific. The difference lies in the k-input
logic units (k-LoUs).
Figure 3.3: The 4-LUT MLP used by the GPPLCS
The MLP designed for evaluating circuits in 2-LUT format
(2-LUT MLP) is shown in Figure 3.2. It consists of 16 LoUs
(L0-L15), 16 variable registers (reg[0]-reg[15]) and 16 constant
registers (reg[16]-reg[31]).
In the MLP, variable registers store intermediate values and
program outputs, and constant registers store program inputs
and constants. Each variable register can only be modified by
a dedicated LoU (as shown in Figure 3.2, LoU[i] writes to reg[i]
only). Constant registers are preloaded by the EE before execution
of an MLP program. In each processor clock cycle, multiple
LoUs take input values from registers and perform Boolean
operations concurrently. Then, all LoUs write their single-bit results to
their corresponding output variable registers. For example, the
2-LUT MLP shown in Figure 3.2 can perform up to 16 different
operations concurrently, and 16 intermediate results can be
carried forward to the subsequent parallel-instructions through
the variable registers.
Figure 3.3 shows the MLP designed for evaluating circuits in
4-LUT format (4-LUT MLP). Similarly, the MLP consists of 32
registers. R0-R15 are variable registers that store intermediate
values and program outputs, while R16-R31 are read-only
registers that store program inputs and logic constants. A variable
register can only be modified by a dedicated 4-LoU (e.g. L0 can
write to R0 only). Sixteen 4-LoUs (L0-L15) perform logic operations.
The EE preloads the program inputs and the constants into the
read-only registers before a parallel program is executed.

              LoU[0]        LoU[1]        ...  LoU[j]        ...  LoU[15]
PI[0]         SI[0,0]       SI[0,1]       ...  SI[0,j]       ...  SI[0,15]
PI[1]         SI[1,0]       SI[1,1]       ...  SI[1,j]       ...  SI[1,15]
PI[i]         SI[i,0]       SI[i,1]       ...  SI[i,j]       ...  SI[i,15]
PI[LMAX-1]    SI[LMAX-1,0]  SI[LMAX-1,1]  ...  SI[LMAX-1,j]  ...  SI[LMAX-1,15]
Figure 3.4: The genotype of an LMAX-PI (PI[0]-PI[LMAX-1]), 16-SI (SI[*,0]-SI[*,15]) MLP program

Table 3.1: Control-codes in 2-LUT circuits SI
fields            number of bits   encoding
function opcode   5                00000 - 01111 = b0 - bF (see Figure 3.6)
                                   10000 - 11111 = no operation, nop
operand A         5                00000 - 11111 = input[0] - input[31]
operand B         5                00000 - 11111 = input[0] - input[31]
Total             15
3.3 The Genotype of a MLP program 
The individual representation of GPPLCS is a sequence
of up to LMAX parallel-instructions (PIs). In each PI, there are 16
sub-instructions (SIs). Figure 3.4 shows the genotype of an MLP
program. The choice of LMAX depends on the problem difficulty.
Normally, it is set to 25. Figure 3.5 shows the representation of
SIs.
SI used in evolving 2-LUT circuits: 5-bit opcode | 5-bit operand | 5-bit operand
SI used in evolving 4-LUT circuits: 17-bit opcode | 5-bit operand | 5-bit operand | 5-bit operand | 5-bit operand
Figure 3.5: Representations of SIs in evolving 2-LUT and 4-LUT circuits
Theoretically, GPPLCS can evolve circuits with LUTs of any number
of inputs. The difference lies only in the encoding.
Since GPPLCS currently evolves circuits in either 2-LUT or
4-LUT format, the encoding methods of the SIs are slightly different,
as each SI represents one LUT. For 2-LUT circuits, each
SI consists of a 5-bit opcode (encoding at most 32 functions)
and two 5-bit operands (each encoding a choice among 32 inputs)
(see Table 3.1). Since there are 16 SIs in a PI, a total of 240
bits ((5+5+5) × 16) are used to encode a parallel-instruction.
If LMAX is chosen to be 25 (25 PIs), the genotype may contain
up to 6,000 (240 × 25) bits.
For 4-LUT circuits, each SI consists of a 17-bit opcode and
four 5-bit operands (see Table 3.2). The Boolean function of
each SI is denoted by a four-digit hexadecimal number which
represents the 16-bit memory contents of the 4-LUT. For example,
the SI with opcode bF6E0 means loading "0000 0111 0110
1111" (the bits of F6E0, least significant bit first) into the
corresponding 4-LUT, which can be treated as a
16-to-1 multiplexer. The content of the corresponding 4-LUT is
shown in Figure 3.7. Similar to the 2-LUT circuits, if the maximum
program length is 25 parallel-instructions, the genotype
may contain up to 14,800 ((17+5+5+5+5) × 16 × 25) bits.
GPPLCS can further be extended to evolve 6-LUT circuits.
Each SI will then consist of a 65-bit opcode and six 5-bit operands.
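The genotype sizes quoted above follow one pattern, which the Python sketch below makes explicit. It assumes, consistently with the 5-, 17- and 65-bit opcodes in the text, that an opcode holds the 2^k LUT content bits plus one bit that marks a nop. The function names and default arguments are hypothetical. The second function reproduces the bF6E0 example by listing the LUT contents least significant bit first.

```python
def genotype_bits(k, n_si=16, n_pi=25, operand_bits=5):
    """Bits needed for an MLP program made of k-LUT sub-instructions."""
    opcode_bits = 2 ** k + 1  # 2^k LUT content bits plus one nop flag
    return (opcode_bits + k * operand_bits) * n_si * n_pi


def lut_contents(opcode_hex, k=4):
    """LUT memory as a bit string, least significant opcode bit first."""
    bits = format(int(opcode_hex, 16), "0{}b".format(2 ** k))
    return bits[::-1]


print(genotype_bits(2), genotype_bits(4), genotype_bits(6))
print(lut_contents("F6E0"))
```

With the default of 16 SIs and 25 PIs this gives 6,000 bits for 2-LUT and 14,800 bits for 4-LUT programs, matching the figures in the text, and 38,000 bits for the proposed 6-LUT extension.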
function   input addresses   Boolean expression
name       A: 1 1 0 0        (' denotes complement)
           B: 1 0 1 0
b0            0 0 0 0        0
b1            0 0 0 1        (A+B)'
b2            0 0 1 0        A'B
b3            0 0 1 1        A'
b4            0 1 0 0        AB'
b5            0 1 0 1        B'
b6            0 1 1 0        A ⊕ B
b7            0 1 1 1        (AB)'
b8            1 0 0 0        AB
b9            1 0 0 1        (A ⊕ B)'
bA            1 0 1 0        B
bB            1 0 1 1        A'+B
bC            1 1 0 0        A
bD            1 1 0 1        A+B'
bE            1 1 1 0        A+B
bF            1 1 1 1        1
Figure 3.6: Functions b0 - bF used in 2-LUT circuits SI

Table 3.2: Control-codes in 4-LUT circuits SI
fields            number of bits   encoding
function opcode   17               00...0 - 01...1 = b0000 - bFFFF
                                   10...0 - 11...1 = no operation, nop
operand A         5                00000 - 11111 = input[0] - input[31]
operand B         5                00000 - 11111 = input[0] - input[31]
operand C         5                00000 - 11111 = input[0] - input[31]
operand D         5                00000 - 11111 = input[0] - input[31]
Total             37
Figure 3.7: The corresponding content of 4-LUT of the "bF6E0 r31 r27 r08
r29 r00" sub-instruction
3.4 The Phenotype of a MLP program 
MLP programs are presented in parallel assembly form. Figure
3.8 shows an optimized MLP program for a 1-bit full adder in
2-LUT format evolved by GPPLCS. It consists of two sections,
the #data and #program sections. The #data section defines
constant, input and output Boolean variables. Before starting
an execution, an MLP always initializes all variable registers
(reg[0]-reg[15]) to logic 0. The constants: line in the #data
section initializes constant registers reg[16]-reg[21] to logic 0 and
reg[22]-reg[28] to logic 1. The inputs: line defines input
variables (Cin, A and B) and assigns them to constant registers
(reg[29], reg[30] and reg[31]). The outputs: line defines output
variables (Cout and S) and assigns them to variable registers
(reg[0] and reg[1]). The #program section contains
parallel-instructions that perform Boolean operations.
For example, the numbered lines in the #program section
in Figure 3.8 list three parallel-instructions. For easy
interpretation, all nop sub-instructions in the original program
are hidden. Each sub-instruction consists of three parts: 1) a
function name (b0-bF or nop); 2) registers for input operands;
and 3) an output register. For example, the b6 r14 r00 r00
sub-instruction in parallel-instruction 02: performs b6 (XOR)
on reg[14] and reg[0] and then writes the result back to reg[0].
Figure 3.9 shows the corresponding combinational logic circuit
of the MLP program shown in Figure 3.8.

#data
constants: (r16-r21)=0, (r22-r28)=1
inputs   : (r29, r30, r31) <= (Cin, A, B)
outputs  : (r00,r01)=>(Cout,S)
#program
00: b9 r29 r30 r04
01: b8 r04 r30 r00,b2 r04 r31 r14
02: b6 r14 r00 r00,b9 r31 r04 r01
Figure 3.8: Optimized MLP program for 1-bit full adder in 2-LUT format
Figure 3.9: A 1-bit full adder in 2-LUT format
The situation is similar when evolving circuits in 4-LUT format.
Figure 3.10 shows a 2-bit full adder program in 4-LUT format evolved by
the GPPLCS. Figure 3.11 shows the corresponding circuit. Notably,
three of the four 4-LUTs can be replaced by 3-LUTs because
they have one input set to a constant logic 0.
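How an MLP executes such programs can be checked with a small Python simulation. The sketch below is an illustration under stated assumptions, not the GPPLCS implementation. The two programs are transcribed from Figures 3.8 and 3.10, and where the scanned listings are unclear two readings are assumed: the second sub-instruction of step 01 in Figure 3.8 is taken to be b2, and the second operand of bCB9E in Figure 3.10 is taken to be r28. The addressing convention (the first operand is the most significant address bit, and the LUT output is the opcode bit selected by that address) is inferred from Figure 3.6 and the bF6E0 example. All function and variable names are hypothetical.

```python
def run_mlp(program, registers):
    """Execute parallel-instructions on a 32-entry single-bit register file.

    Each sub-instruction is (opcode, operand registers, output register).
    All LoUs read their operands first; the results are written afterwards.
    """
    reg = list(registers)
    for parallel_instruction in program:
        results = []
        for opcode, operands, out in parallel_instruction:
            address = 0
            for r in operands:  # first operand is the most significant bit
                address = (address << 1) | reg[r]
            results.append((out, (opcode >> address) & 1))
        for out, bit in results:
            reg[out] = bit
    return reg


def initial_registers(ones, inputs):
    """All registers start at 0; `ones` are constant 1; `inputs` maps register to bit."""
    reg = [0] * 32
    for r in ones:
        reg[r] = 1
    for r, bit in inputs.items():
        reg[r] = bit
    return reg


# Figure 3.8: 1-bit full adder in 2-LUT format (Cin=r29, A=r30, B=r31).
ADDER_2LUT = [
    [(0x9, (29, 30), 4)],
    [(0x8, (4, 30), 0), (0x2, (4, 31), 14)],
    [(0x6, (14, 0), 0), (0x9, (31, 4), 1)],
]

# Figure 3.10: 2-bit full adder in 4-LUT format (Cin, A1, A0, B1, B0 = r27..r31).
ADDER_4LUT = [
    [(0xF6E0, (31, 27, 8, 29), 0)],
    [(0x3AA4, (0, 28, 6, 30), 0), (0xCB9E, (0, 28, 30, 21), 1),
     (0x849E, (31, 27, 31, 29), 2)],
]


def check_1bit():
    for n in range(8):
        cin, a, b = (n >> 2) & 1, (n >> 1) & 1, n & 1
        reg = run_mlp(ADDER_2LUT, initial_registers(range(22, 29), {29: cin, 30: a, 31: b}))
        if 2 * reg[0] + reg[1] != a + b + cin:  # Cout=r00, S=r01
            return False
    return True


def check_2bit():
    for n in range(32):
        cin, a1, a0, b1, b0 = [(n >> s) & 1 for s in (4, 3, 2, 1, 0)]
        start = initial_registers(range(22, 27), {27: cin, 28: a1, 29: a0, 30: b1, 31: b0})
        reg = run_mlp(ADDER_4LUT, start)
        if 4 * reg[0] + 2 * reg[1] + reg[2] != (2 * a1 + a0) + (2 * b1 + b0) + cin:
            return False
    return True


print(check_1bit(), check_2bit())
```

Under these assumptions both programs reproduce the adder truth tables for every input combination. The simulation also shows why reads must precede writes within a step: in step 01 of the 4-LUT program, two sub-instructions read r00 while a third overwrites it.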
#data
constants: (r16-r21)=0, (r22-r26)=1
inputs   : (r27, r28, r29, r30, r31) <= (Cin, A1, A0, B1, B0)
outputs  : (r00,r01,r02)=>(Cout,S1,S0)
#program
00: bF6E0 r31 r27 r08 r29 r00
01: b3AA4 r00 r28 r06 r30 r00,bCB9E r00 r28 r30 r21 r01,
    b849E r31 r27 r31 r29 r02
Figure 3.10: Optimized MLP program for 2-bit full adder in 4-LUT format
Figure 3.11: A 2-bit full-adder in 4-LUT format
3.5 The Evolution Engine 
The Evolution Engine (EE) is responsible for manipulating the
population, performing genetic operations, loading genetic
programs into an MLP for fitness evaluation, calculating and reporting
statistics, and decompiling the evolved solution program into a
symbolic parallel assembly program (MLP program).
3.5.1 The Dual-Phase Approach 
In order to evolve a solution with GPP, each genetic program must
be given enough spare space (in both parallel-instructions and
sub-instructions) for introns to build up. Introns
are non-effective instructions which do not contribute to
the final output of a genetic program. Research results show
that the existence of introns in genetic programs in the early
and middle stages of a run can benefit evolution [5]. Introns
therefore need to remain in the genetic programs
until the first correct program is found. However, the first
correct program is usually not an optimized solution in terms of
quality measurements such as the LUT count and the propagation
delay. To tackle this problem, GPPLCS uses a dual-phase (design
and optimization phases) approach with a dual-phase fitness
function. The dual-phase fitness function aims to improve
the functionality of genetic programs before the first correct
genetic program is found. Once a correct genetic program
is found, it changes its fitness calculation criteria to incorporate
optimization-oriented measurements. Besides the dual-phase fitness
function, GPPLCS uses different sets of genetic operators in
the two phases. Details can be found in Section 3.5.2.
In the design phase, GPPLCS aims at finding a 100% functional
program (correct program). Its raw fitness is given by
$$ f_{dp} = \frac{U}{T} $$
where U is the number of unmatched training cases and T is the
total number of training cases.
The design phase raw fitness f_dp is used to evaluate the
functional fitness of a genetic program. For a partially correct
genetic program, f_dp is greater than zero. f_dp equals zero
only when all training cases are matched. After finding the first
correct genetic program, the evolution proceeds to the
optimization phase to optimize correct genetic programs based on
optimization-oriented criteria. In the optimization phase,
the raw fitness is given by
$$ f_{op} = \frac{g}{g_{max}} + \frac{d}{d_{max}} \times \frac{1}{g_{max}} + \frac{L}{L_{max}} \times \frac{1}{d_{max}\, g_{max}} $$
The optimization phase raw fitness f_op of a correct genetic
program is calculated from three qualitative indicators: 1) the LUT
count g (the number of normal sub-instructions); 2) the
propagation delay d; and 3) the program length L (the number of
parallel-instructions). Since a genetic program may contain nop
sub-instructions and introns, L represents the number of LUT levels in the logical
circuit diagram rather than the actual LUT delay in hardware:
nops and introns are not placed in real hardware, so
their LUT delays are not counted. g_max, d_max and L_max
are the maximum values allowed for the LUT count, the
propagation delay and the program length respectively. The main
objective of the optimization phase is to reduce the LUT count
and then the propagation delay. The last multiplication term in
f_op guides the evolution to shorten the lengths of correct genetic
programs. Normally, a shorter program has a greater chance of
having smaller g and d values.
By combining the raw fitness functions of the two phases (f_dp and
f_op), the dual-phase fitness function of the whole evolution process
is obtained. In the design phase (f_dp > 0), f_raw is given by
$$ f_{raw} = 1.0 + f_{dp} $$
In the optimization phase (f_dp = 0), f_raw is given by
$$ f_{raw} = f_{op} $$
The constant 1.0 is used to distinguish the two phases. With
this fitness function, a partially correct genetic program has an
f_raw greater than 1.0 whereas a correct genetic program has an
f_raw less than 1.0. In the design phase, whenever GPPLCS finds
the first genetic program with an f_raw equal to 1.0 (i.e. f_dp = 0), it proceeds
to the optimization phase.
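The dual-phase fitness can be written as a single function. The Python sketch below assumes the form of f_op described in this section, in which the delay term is scaled by 1/g_max and the length term by 1/(d_max g_max), so that the LUT count dominates. The function name, the argument names and the limit values used in the example (g_max = 400, d_max = 25, L_max = 25) are hypothetical.

```python
def raw_fitness(unmatched, total, g, d, length, g_max, d_max, l_max):
    """Dual-phase raw fitness; lower is better."""
    f_dp = unmatched / total
    if f_dp > 0:  # design phase: the program is not yet fully correct
        return 1.0 + f_dp
    # Optimization phase: LUT count first, then delay, then program length.
    return (g / g_max
            + (d / d_max) * (1 / g_max)
            + (length / l_max) * (1 / (d_max * g_max)))


limits = dict(g_max=400, d_max=25, l_max=25)
print(raw_fitness(2, 8, g=5, d=3, length=3, **limits))  # partially correct program
print(raw_fitness(0, 8, g=5, d=3, length=3, **limits))  # correct program
print(raw_fitness(0, 8, g=5, d=2, length=3, **limits))  # same LUT count, shorter delay
```

A partially correct program always scores above 1.0, so any correct program with a LUT count below g_max ranks ahead of it. Among correct programs with equal LUT counts, the one with the smaller propagation delay scores lower.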
3.5.2 Genetic operators 
In this subsection, the genetic operators used in GPPLCS are
described.
• Genetic Programs Initialization: GPPLCS uses a binary
string (genotype) to encode an MLP program (phenotype).
Before an evolution process, the EE initializes all genetic
programs in a population randomly. The number of PIs (L: the
program length) of a genetic program is chosen randomly between
one and a predefined value (L_max: the maximum program
length). Each bit in a genotype has an equal chance of being 0
or 1.
• Tournament Selection: GPPLCS uses tournament selection
to choose the parents of its offspring. In each tournament, a fixed
number (the tournament size) of genetic programs are randomly
selected from the population to form a tournament set.
According to their fitness, the two best genetic programs in
the tournament set are selected as parents to produce two
offspring. The tournament size controls the selection
pressure and affects the convergence rate.
• PI level crossover: It is a two-point crossover that exchanges
two segments of PIs between two parent MLP programs (see
Figure 3.12). All sub-instructions in a parallel-instruction
are always kept together. The probability of applying this
operator is P_xover.
Figure 3.12: PI level crossover on two parents
• Bit Mutation: It mutates individual bits in the genotype of 
an M L P program based on a probability Pumut-
• SI swapping: It swaps two sub-instructions inside an MLP
program based on a probability $P_{siswp}$ (see Figure 3.13).
It can pack more normal sub-instructions into fewer
parallel-instructions so as to increase the parallelism of the
MLP program. SI swapping is only used in the optimization
phase since it is intended to improve the performance of a
correct genetic program.
Figure 3.13: An SI swapping in a single MLP program
• SI-Deletion: It simply replaces a normal sub-instruction
with a nop sub-instruction based on a probability $P_{sidel}$. It
can delete inactive sub-instructions (introns) from a correct
genetic program and therefore is only used in the optimization
phase.
• Diversity Maintenance: In order to maintain the diversity
of the population, the EE adopts an individual replacement
technique similar to pre-selection [47]. In each tournament,
two children are bred and evaluated. Then, the better one
is selected and compared with its parents. If its fitness is
different from that of both parents, it replaces the worst
individual in the tournament set. This approach prevents
similar individuals from filling up the population and hence
increases the diversity of the search.
• Dynamic Sample Weighting (DSW): For some problems,
e.g. Boolean functions, the distribution of training samples
in the sample space is biased. These biased samples usually
cause premature convergence in Genetic Algorithms (GAs)
and Genetic Programming (GP). DSW [8] is used to balance
the contributions of training samples so that the diversity
of genetic programs can be increased. This operator
is only used in the design phase.
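The initialization, tournament selection, PI level crossover and bit mutation operators listed above can be sketched as follows. This is a minimal illustration rather than the GPPLCS implementation: the genotype is modelled as a list of PIs with 16 sub-instruction words each, and the word width `SI_BITS`, the truncation of over-long children and all function names are assumptions made for the example.

```python
import random

SI_PER_PI = 16  # sub-instructions per parallel-instruction (alu[0]..alu[15])
SI_BITS = 32    # ASSUMPTION: genotype bits per sub-instruction (illustrative)


def random_program(l_max, rng):
    """Random genotype: 1..l_max PIs, each bit 0 or 1 with equal chance."""
    length = rng.randint(1, l_max)
    return [[rng.getrandbits(SI_BITS) for _ in range(SI_PER_PI)]
            for _ in range(length)]


def run_tournament(population, fitness, size, rng):
    """Indices of a random tournament set, best first (lower fitness wins)."""
    members = rng.sample(range(len(population)), size)
    return sorted(members, key=lambda i: fitness[i])


def pi_crossover(p1, p2, l_max, rng):
    """Two-point crossover that exchanges whole-PI segments of two parents."""
    a1, b1 = sorted(rng.sample(range(len(p1) + 1), 2))
    a2, b2 = sorted(rng.sample(range(len(p2) + 1), 2))
    c1 = p1[:a1] + p2[a2:b2] + p1[b1:]
    c2 = p2[:a2] + p1[a1:b1] + p2[b2:]
    # ASSUMPTION: children longer than l_max are simply truncated.
    return c1[:l_max], c2[:l_max]


def bit_mutation(prog, p_btmut, rng):
    """Flip each genotype bit independently with probability p_btmut."""
    mutated = []
    for pi in prog:
        new_pi = []
        for si in pi:
            for bit in range(SI_BITS):
                if rng.random() < p_btmut:
                    si ^= 1 << bit
            new_pi.append(si)
        mutated.append(new_pi)  # fresh lists, so parents are never modified
    return mutated


rng = random.Random(1)
population = [random_program(25, rng) for _ in range(30)]
fitness = [rng.random() for _ in population]  # placeholder fitness values
ranked = run_tournament(population, fitness, 10, rng)
child1, child2 = pi_crossover(population[ranked[0]], population[ranked[1]], 25, rng)
child1 = bit_mutation(child1, 0.002, rng)
print(len(child1), len(child2))
```

Since crossover cuts only at PI boundaries, every PI in a child still holds exactly 16 sub-instructions, which is the property the text describes as keeping a parallel-instruction "as a whole".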
3.6 Chapter Summary 
This chapter has presented GPPLCS. Two core components of
GPPLCS (MLP and EE) are described. The MLP is a tightly
coupled processor which is used to execute and evaluate genetic
programs produced by the EE. The genotype of an MLP program
is a sequence of control-codes which can be executed on the
corresponding MLP directly. The phenotype of an MLP program
is a parallel assembly program. The EE is an evolutionary process
which performs genetic operators, loads genetic programs to the
MLP, calculates/reports statistics and decompiles the solution
parallel program to a symbolic parallel assembly program.
Furthermore, GPPLCS uses a dual-phase evolutionary approach
which divides the evolution into two sequential phases.
Firstly, the design phase evolves correct genetic programs. Then,
the optimization phase improves the qualities of correct genetic
programs. A dual-phase fitness function is used to guide the
evolution.
• End of chapter. 
Chapter 4 
MLP in hardware 
This chapter presents a hardware-assisted Multi-Logic-Unit
Processor (MLP). It is a hardware processor built on a Field
Programmable Gate Array (FPGA). The purpose is to speed up
the evaluation of genetic parallel programs (MLP programs)
that represent combinational logic circuits. Six combinational
logic circuit problems are presented to show the performance
of the hardware-assisted Genetic Parallel Programming based
Logic Circuit Synthesizer (GPPLCS). Experimental results show
that the hardware MLP speeds up the evolutions by more than
10 times. For difficult problems such as the 7-bit majority
selector, the speedup ratio can be up to 36.
This chapter is organized as follows. Our motivation is
described in Section 4.1. Then, the hardware design and
implementation of the MLP are presented in Section 4.2. They are
followed by experiments. Section 4.3 covers the experimental
settings. The experimental results and evaluations are given in
Section 4.4. Finally, Section 4.5 is a chapter summary.
4.1 Motivation 
In the last decade, advances in FPGAs [2] have made efficient
Evolvable Hardware (EHW) [63] possible. EHW uses Evolutionary
Algorithms to evolve hardware architecture extrinsically
or intrinsically. One of the major usages of EHW is to design
combinational logic circuits [19, 36, 52]. However, the
importance of the scalability of EHW has been recognized by
several researchers [27, 30]. It is a tough problem faced not only
by EHW researchers, but also by other researchers in the fields of
evolutionary computation, artificial neural networks, and
artificial intelligence in general.
Using hardware to increase the speed of evolution is one
possible way to combat the high computational cost. FPGAs
have been adopted to speed up Genetic Algorithms (GAs) and
Genetic Programming (GP) systems [28, 41, 48, 56]. The basic idea
is to put the whole or a part of a GA or GP system in hardware
so as to solve problems in a shorter time than a pure software
system.
A hardware-assisted MLP is designed and implemented to
speed up the evaluation of genetic parallel programs in GPPLCS.
The overall system of the hardware-assisted GPPLCS is exactly the
same as the pure software GPPLCS in Chapter 3. The difference
lies only in the MLP. Experiments on six combinational logic
circuit problems (i.e. a 6-bit multiplexer, a 2-bit full-adder,
a 3-bit comparator, a 6-bit priority selector, a 7-bit majority
selector and a 2-digit binary coded decimal to binary decoder)
were conducted to show the effectiveness of GPPLCS with the
hardware MLP. Experimental results show that the hardware
MLP speeds up the evolution by at least 10 times, even for the
easier problems which are less computation intensive.
4.2 Hardware Design and Implementation 
This section presents the hardware design and implementation
details of the MLP. Fig. 4.1 shows the architecture of the core part
of the MLP. The 16 sub-instruction registers (SIR0-SIR15) store the
individual sub-instructions of the current parallel-instruction.
The 16 processing elements (PE0-PE15) run sub-instructions
and store results in their corresponding variable registers. The
Control Unit (CU) decodes parallel-instructions and gives control
signals to all MLP components. Due to the limited size of
the interface bus between the CU and the host (64 bits only),
more than one bus cycle is needed to transfer the evaluation
results of all rows in a truth table to the host.
Figure 4.1: The architecture of the MLP core
In most cases, GPPLCS only uses the first eight variable
registers (R0-R7) to store program outputs. Thus, the MLP only
needs to transfer the first eight variable registers to the host. In
order to maximize the usage of the 64-bit interface bus, the MLP is
designed to buffer eight sets of program outputs (of eight training
cases), i.e. 8 x 8 = 64 bits. In this way, the evaluation results of
the entire truth table are passed to the host in burst mode. For
example, if there are N rows in a truth table, it takes N/8 clock
cycles to transfer all program outputs to the host.
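The burst-mode arithmetic above can be illustrated with a short Python sketch. It packs the eight output bits (R0-R7) of eight consecutive truth-table rows into one 64-bit word, so N rows need N/8 transfers. The byte ordering inside a word and the name `pack_outputs` are assumptions made for illustration; the thesis does not specify the bus layout.

```python
def pack_outputs(rows):
    """Pack 8-bit program outputs (one per truth-table row) into 64-bit words.

    rows: list of ints, each holding the R0..R7 output bits of one row.
    ASSUMPTION: the k-th buffered row occupies byte k of the bus word.
    """
    words = []
    for start in range(0, len(rows), 8):
        word = 0
        for k, value in enumerate(rows[start:start + 8]):
            word |= (value & 0xFF) << (8 * k)
        words.append(word)
    return words


# A 6-input problem has N = 64 rows, hence 64 / 8 = 8 bus transfers.
outputs = [row & 0xFF for row in range(64)]  # placeholder output values
print(len(pack_outputs(outputs)))
```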
Fig. 4.2 shows a PE (PEi) which receives a sub-instruction
from SIRi. It stores the result in the variable register Ri. The
core of the PE is a 4-LUT. It takes two processor clock cycles for
the PE to execute one sub-instruction. In the first cycle, four
input registers are selected by four multiplexers (M1-M4), and
their values are then latched into an Internal Operand Register
(IOR). In the second cycle, the 4-LUT uses the four latched
operands to look up one bit and stores the result into Ri. The
IOR is used to pipeline the operations, i.e. selecting operands
and looking up results, and to balance the long delay time on
the route from the registers' outputs to the multiplexers' inputs.
Figure 4.2: A Processing Element
Table 4.1: Pilchard board features
Field                  Details
Host interface         DIMM interface (a 64-bit data bus and a 14-bit address bus)
Operating frequency    100 MHz
FPGA device            XCV1000E-HQ240-6
OS supported           GNU/Linux
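The two-cycle operation of a PE described in this section can be modelled in software as below. This is a behavioural sketch only, not the VHDL design: the function name `pe_execute`, the representation of the register file as a list of bits, and the ordering of the four operands within the LUT index are assumptions.

```python
def pe_execute(registers, selects, lut):
    """Behavioural model of one sub-instruction on a processing element.

    registers: list of 32 bits (0 or 1), modelling R0..R31.
    selects:   four register indices chosen by multiplexers M1-M4.
    lut:       16-bit truth table of the 4-LUT (0x0000..0xFFFF).
    """
    # Cycle 1: select four operands and latch them into the IOR.
    ior = [registers[s] for s in selects]
    # Cycle 2: the latched operands address one bit of the truth table.
    # ASSUMPTION: the first operand is the least significant index bit.
    index = ior[0] | (ior[1] << 1) | (ior[2] << 2) | (ior[3] << 3)
    return (lut >> index) & 1


regs = [0] * 32
regs[0], regs[1] = 1, 0
# 0x6666 is the truth table of "operand 0 XOR operand 1" under this ordering.
print(pe_execute(regs, [0, 1, 2, 3], 0x6666))
```

A 16-bit truth table matches the function set b0000 to bFFFF used in the experiments, since each of the 65,536 values names one 4-input Boolean function.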
The MLP is implemented on a Pilchard board [42, 60], which
is a high performance reconfigurable computing development
environment employing an FPGA. The Pilchard board is plugged
into a 133 MHz synchronous dynamic RAM Dual In-line Memory
Module (DIMM) slot of a PC. The Pilchard board can
achieve a very high data transfer rate by making use of the
DIMM RAM interface of the PC. Its efficient interface and low
cost make it suitable for implementing the MLP. Table 4.1 lists
some major features of the Pilchard board.
The FPGA used in the Pilchard board belongs to the Virtex-E
series. The MLP uses only 2,515 slices, which is about 20% of the
12,288 slices available in the FPGA. Moreover, only one (out of
96) BlockRAM is used by the MLP. The critical path delay of
the MLP is 9.965 ns. Hence, it can operate at 100 MHz.
The MLP is coded in Very High Speed Integrated Circuit
Hardware Description Language (VHDL) [57], which is a standard
language for describing the structure and function of
integrated circuits (ICs).
4.3 Experimental Settings 
To investigate the performance of the GPPLCS, we have used
the system to evolve networks for six combinational logic
circuit problems in 4-input LUT format (see Table 4.2). Although
the GPPLCS evolves circuits with a dual-phase approach, all
experiments in this chapter are conducted with the design phase
only. This is because a large proportion of the execution time
used by the GPPLCS in evolving circuits is spent in the design
phase. Moreover, only one independent run is necessary to show
the effectiveness of the hardware-assisted GPPLCS.
Note that the 6-bit priority selector shows the position of the
first bit with value '1', starting from the least significant
bit of the 6-bit input. If none of the bits is set to '1', an
extra output bit indicates this all-zero special case. Since
there are six input bits (Input5 - Input0), three further bits
are needed to indicate the position. Therefore, there are four
output bits.
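The truth table of the priority selector can be generated with a few lines of Python. This is a hedged sketch: the function name is hypothetical, and the position value reported for the all-zero input (0 here) is an assumption, since the text only says that an extra bit flags that case.

```python
def priority_selector(x):
    """Return (position, all_zero) for a 6-bit input x.

    position: index (0-5) of the first '1' counted from the least
              significant bit; fits in three output bits.
    all_zero: 1 when no input bit is set, else 0.
    """
    x &= 0x3F
    if x == 0:
        return 0, 1  # ASSUMPTION: position bits are 0 in the all-zero case
    # x & -x isolates the lowest set bit; its bit length gives the index + 1.
    return (x & -x).bit_length() - 1, 0


truth_table = [priority_selector(x) for x in range(64)]  # N_row = 64
print(priority_selector(0b101000))
```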
The 7-bit majority selector determines the majority value
of its 7 input bits. If at least 4 of the bits have value '1', the
output value is '1'. Otherwise, the output bit has value '0'.
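A sketch of the majority selector's truth table follows. It assumes the usual definition of a 7-input majority function (the output is '1' when at least four of the seven inputs are '1'); the function name is hypothetical.

```python
def majority7(x):
    """Majority of the 7 low bits of x (ASSUMED threshold: at least 4 ones)."""
    return 1 if bin(x & 0x7F).count("1") >= 4 else 0


truth_table = [majority7(x) for x in range(128)]  # N_row = 2**7 = 128
# With an odd number of inputs, exactly half of the rows output '1'.
print(sum(truth_table))
```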
In addition, the 2-digit Binary Coded Decimal (BCD) to binary
decoder decodes two BCD digits into a binary value. BCD is
the most common way of encoding decimal digits in computing
and in electronic systems. In BCD, a digit is usually represented
by four (binary) bits, of which the leftmost (written
conventionally) has value 8, and the remaining three have values
4, 2, and 1. Only the combinations of these bits which, when
summed, have values in the range 0-9 are valid. The decoder takes
two BCD digits. Thus, there is an 8-bit input which corresponds
to a value in the range 0-99. The output range 0-99 then needs
7 bits to be represented.
Table 4.2: Six combinational logic circuit problems used in GPPLCS with
the hardware-assisted MLP. $N_{in}$ and $N_{out}$ denote the numbers of inputs
and outputs respectively. $N_{row}$ ($= 2^{N_{in}}$) denotes the number of rows in
the truth table. $N_{case}$ ($= N_{row} \times N_{out}$) denotes the total number of
training cases.
Name   Description                                       N_in   N_out   N_row   N_case
MUX    6-bit multiplexer                                 6      1       64      64
ADD    2-bit full-adder                                  5      3       32      96
CMP    3-bit comparator                                  6      3       64      192
PRI    6-bit priority selector                           6      4       64      256
MAJ    7-bit majority selector                           7      1       128     128
BCD    2-digit Binary Coded Decimal to Binary decoder    8      7       256     1792
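The $N_{row}$ and $N_{case}$ columns of Table 4.2 follow directly from the definitions in its caption. The short Python check below recomputes them from the input and output counts; the dictionary layout and function name are illustrative only.

```python
# (N_in, N_out) for each problem, as listed in Table 4.2.
problems = {
    "MUX": (6, 1), "ADD": (5, 3), "CMP": (6, 3),
    "PRI": (6, 4), "MAJ": (7, 1), "BCD": (8, 7),
}


def table_sizes(n_in, n_out):
    """Return (N_row, N_case): truth-table rows and total training cases."""
    n_row = 2 ** n_in
    return n_row, n_row * n_out


for name, (n_in, n_out) in problems.items():
    print(name, *table_sizes(n_in, n_out))
```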
All experimental settings are listed in Table 4.3 below.
In order to have a fair comparison of the performance of the
hardware-assisted GPPLCS and its pure software counterpart,
evolutions of combinational logic circuits for the six
combinational logic circuit problems were run on the same host
(i.e. the PC in which the Pilchard board is installed). The host is
a Pentium III 800 MHz PC with an ASUS CUSL2-C motherboard.
The Pilchard board relies on the PC to communicate. Users can
transfer data to the Pilchard board via the DIMM slot in the host
PC. This PC host is chosen because of the low level control
required to manage the Pilchard board.
We tested the problems with both the hardware-assisted GPPLCS
and the pure software GPPLCS. The time for each tournament
was recorded for comparison.
Table 4.3: Experimental settings used in GPPLCS with the hardware-assisted
MLP
Design phase only
maximum program length ($L_{max}$)       25 parallel instructions (PIs)
initialization                           bit random, average 12.5 ($L_{max}/2$) PIs
selection method                         tournament (size = 10)
4-LUT function set                       b0000, ..., bFFFF, nop
inputs                                   ... $R_{31}$
outputs                                  $R_0$ ... $R_{N_{out}-1}$
constants                                logic 0, logic 1
population size                          2000
termination ($t_{max}$)                  40,000,000 tournaments
PI crossover prob. ($P_{xover}$)         0.1
bit mutation prob. ($P_{btmut}$)         0.002
sub-instruction (SI) swapping prob. ($P_{siswp}$)   0.0
SI deletion prob. ($P_{sidel}$)          0.0
Dynamic Sample Weighting (DSW) (weights update freq.)   10,000 tournaments
preselection                             yes
raw fitness                              the ratio of unsolved training cases (= 1.0 + $f_{dp}$)
success predicate                        all training cases solved (= 1.0, i.e. $f_{dp}$ = 0.0)
4.4 Experimental Results and Evaluations 
Promising results are obtained for all six combinational logic
circuit problems. Table 4.4 summarizes the total elapsed times
for the GPPLCS to evolve completely correct solutions with a pure
software MLP and a hardware MLP. The $t_H$ and $t_S$ columns list
the execution times of the hardware-assisted GPPLCS and
the pure software GPPLCS respectively.
It can be seen that the speedup of hardware over software is
significant. For the ADD, MUX and PRI problems, the speedups
are more than 10 times. For the CMP and BCD problems,
the speedups are more than 20 times. For MAJ, the most difficult
problem among the circuits evolved, the speedup can be up
to 36. The CMP problem takes nearly 9 hours to complete
with the pure software GPPLCS, but it takes less than half
an hour with the hardware-assisted GPPLCS. Thus, problems
of different levels of difficulty gain different speedups. This
is to be expected because the more difficult the problem, the
more tournaments (computational effort) are taken to complete it.
Figures 4.3 to 4.8 show the speedup curves for the six tested
problems. In these figures, the X-axis is the number of
tournaments taken while the Y-axis is the speedup ratio ($t_S/t_H$).
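The speedup ratios reported in Table 4.4 can be recomputed from the elapsed times with the short Python sketch below. The times are copied from the table; the variable names are illustrative.

```python
# (t_H, t_S) elapsed times in seconds, taken from Table 4.4.
elapsed = {
    "MUX": (68, 689),
    "ADD": (346, 3497),
    "CMP": (1575, 31983),
    "PRI": (720, 13471),
    "MAJ": (24680, 895581),
    "BCD": (11608, 280269),
}

# Speedup ratio = software time / hardware time.
speedup = {name: t_s / t_h for name, (t_h, t_s) in elapsed.items()}

for name, ratio in speedup.items():
    t_h, t_s = elapsed[name]
    print(f"{name}: {ratio:.2f}x  ({t_s / 3600:.2f} h software, {t_h / 60:.1f} min hardware)")
```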
Figures 4.3 and 4.4 show that the speedup ratios for the MUX
and ADD problems increase steadily to around 10. These two
problems are relatively simple. Thus, the computational effort
required to evolve solutions for them is not so large.
Conversely, in Figures 4.5, 4.6 and 4.8, the speedup ratios are less
than five initially, when the evolution has taken only a few
thousand tournaments. As the evolution completes more
tournaments, the speedup ratio increases rapidly, up to 24 times.
For MAJ, the most difficult problem in our problem set, the
speedup can be up to 36 due to the large computational effort
required. The result is shown in Figure 4.7.
Table 4.4: Summary of experimental results in GPPLCS with hardware-assisted
MLP
Problem   t_H (in sec)   t_S (in sec)   speedup ratio (t_S/t_H)
MUX       68             689            10.13
ADD       346            3,497          10.11
CMP       1,575          31,983         20.30
PRI       720            13,471         18.71
MAJ       24,680         895,581        36.29
BCD       11,608         280,269        24.14
Figure 4.3: The speedup ratio versus tournaments for MUX problem
Figure 4.4: The speedup ratio versus tournaments for ADD problem
Figure 4.5: The speedup ratio versus tournaments for CMP problem
Figure 4.6: The speedup ratio versus tournaments for PRI problem
Figure 4.7: The speedup ratio versus tournaments for MAJ problem
Figure 4.8: The speedup ratio versus tournaments for BCD problem
It is found that the speedup ratio increases with the number
of tournaments taken in the evolution. Each hardware evaluation
is faster than the corresponding software evaluation, which
bounds the achievable speedup by a certain theoretical limit.
However, the observed speedup falls short of this limit because
of the overhead in the communication bus between the software
EE and the hardware MLP. When the number of tournaments
executed is small, this overhead occupies a larger proportion
of the execution time than fitness evaluation, so the speedup
ratio is small. Conversely, the speedup ratio is expected to be
higher for problems that take a larger number of tournaments,
as fitness evaluation then occupies the largest proportion of the
execution time. For example, in the MUX problem, only a 10-time
speedup is obtained due to the small number of tournaments
taken (52,286). However, a 20-time speedup is found in the CMP
problem, which takes 2,398,865 tournaments, and a 36-time
speedup is found in the MAJ problem, which takes 34,006,503
tournaments.
4.5 Chapter Summary 
In this chapter, we have presented the design and implementation
of a hardware-assisted GPP Logic Circuit Synthesizer (GPPLCS)
prototype which uses a 4-LUT Multi-Logic-Unit Processor
(MLP). The MLP uses a generic register machine architecture
which can represent any combinational logic circuit. Moreover,
the architecture of the MLP is so simple that multiple
MLPs can be placed in an FPGA.
The hardware-assisted GPPLCS shows promising results in
terms of speedup. With the help of hardware, GPPLCS achieves a
speedup of up to 36 times in our tested problems. Furthermore,
the speedup ratio increases with the number of tournaments taken
in solving the problems. The approach is therefore particularly
suitable for solving difficult problems.
• End of chapter. 
Chapter 5 
Feasibility Study of Multi MLPs
Although the circuits evolved by the Genetic Parallel Programming
based Logic Circuit Synthesizer (GPPLCS) are of good quality,
the synthesizer is computation intensive. As a result,
implementation of GPPLCS in Field Programmable Gate Arrays
(FPGAs) is proposed. The idea is to speed up the fitness
evaluations. However, the current model is not suitable for such an
implementation. Of the two main components in GPPLCS, the
Evolution Engine (EE) and the Multi Logic Unit Processor (MLP),
one is found to be idle at any time during the evolution. Thus, a
Multi MLP Genetic Parallel Programming based Logic Circuit
Synthesizer (MMGPPLCS) is proposed and presented for
implementation in FPGAs in this chapter. Simulations are done to
evaluate the effectiveness of our proposed architecture.
This chapter is organized as follows. Section 5.1 gives our
motivation. Then, our proposed architecture of MMGPPLCS
is presented in Section 5.2. It is followed by the experimental
settings in Section 5.3. Section 5.4 gives the experimental results
and evaluations. Finally, a chapter summary is given in Section 5.5.
5.1 Motivation 
As introduced in the previous chapters, GPPLCS uses a dual-phase
approach suitable for evolving LookUp Table (LUT) based circuits.
In the design phase, GPPLCS aims at finding a 100% correct
genetic program. Once it is found, GPPLCS proceeds to the
optimization phase. Other factors such as the LUT
count and the LUT level count are taken into consideration in the
optimization phase. It is found that the design phase occupies a
large proportion of the computation time of the whole evolution
process. Thus, we turn to an implementation
of GPPLCS in FPGAs to speed up the whole evolution process,
especially the design phase.
In GPPLCS, two stages are always repeated: breeding and
fitness evaluation. During the breeding stage, the current
population is used to form a new population by selecting the better
programs and using breeding operators such as crossover and
mutation to propagate and modify the programs. This stage is
carried out in the EE. The programs are then evaluated to measure
how fit they are. This stage is carried out in the MLP. The two
stages are repeated until either a pre-determined number of
generations have been processed or an individual meets a
pre-determined level of fitness. It is found that either the
EE or the MLP is idle at any time. Thus, a direct implementation
of this model in FPGAs does not maximize the benefits of the
parallelism in FPGAs.
We propose an MMGPPLCS for implementation in FPGAs
which is based on the pipeline concept in hardware design.
Implementing algorithmic parallelism, or pipelining, is a frequently
used technique in hardware design that reduces the number of
clock cycles needed to perform complex operations. The idea is
to execute the fitness evaluation (held in the MLP) in parallel
with the breeding stage of GPPLCS (done in the EE). In this
way, both the MLP and the EE can be kept operating at full
speed.
Figure 5.1: The system block diagram of MMGPPLCS
5.2 Overall Architecture 
This section presents the design of the new architecture of GPPLCS
for implementation in FPGAs. Figure 5.1 shows the block
diagram of MMGPPLCS. It has one EE and several (up to n)
MLPs (MLP1, MLP2, ..., MLPn). The purpose of having several
MLPs is to execute the fitness evaluation in parallel with the
breeding operations.
The breeding operations and fitness evaluation are iterative
processes and their execution times are different. The time
used in fitness evaluation is much longer than the time used in
the breeding operation. Moreover, with the advances in FPGAs,
it is possible to place more MLPs within an FPGA. Thus, we
propose to implement one EE and n MLPs in MMGPPLCS.
The EE can keep generating new children and pass them to the MLPs
for evaluation. Previous experiments done on circuits evolved
by GPP show that a breeding operation in the EE executes 10 times
faster than a fitness evaluation in the MLP, including overhead.
Thus, in our design, we employ 10 MLPs so that every child
generated in the EE can be evaluated in an MLP with no delay.
This 1:10 pipeline design can maximize the advantage of
implementation in FPGAs. The algorithm can be
found in Figure 5.3.
Figure 5.2: FIFO design (the MEFIFO and EMFIFO connect the Evolution Engine and the Multi Logic Unit Processors)
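The choice of 10 MLPs follows from a simple throughput argument: if one fitness evaluation takes as long as k breeding operations, k evaluators are needed to absorb one new child per breeding step. A minimal sketch of this arithmetic, with a hypothetical function name and illustrative time units, is:

```python
import math


def mlps_needed(t_breed, t_eval):
    """Smallest number of MLPs that keeps pace with one EE.

    t_breed: time of one breeding operation in the EE.
    t_eval:  time of one fitness evaluation in an MLP (including overhead).
    Both are in the same, arbitrary time unit.
    """
    return math.ceil(t_eval / t_breed)


# Evaluation is 10 times slower than breeding, hence the 1:10 design.
print(mlps_needed(t_breed=1.0, t_eval=10.0))
```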
In the hardware design of the MMGPPLCS, we insert two FIFOs
between the EE and the MLPs, the EMFIFO and the MEFIFO.
The purpose is to keep both the EE and the MLPs running. The EE
can keep breeding children from the population. The children
are then placed in the EMFIFO for fitness evaluation. Once one of
the MLPs (evaluation engines) is ready, it pops one child from the
EMFIFO for evaluation and places the fitness evaluation result
in the MEFIFO. The process continues until a solution is found.
Figure 5.2 shows our proposed design. In MMGPPLCS, evolution
is no longer based on an up-to-date fitness evaluation of
the population. Instead, breeding takes place among old and new
generations, because the children being evaluated come from
different generations. Since this differs from the original flow of
GPPLCS, a software simulation is necessary to evaluate the impact
on GPPLCS. The simulation result is presented later.
5.3 Experimental settings 
Simulation of MMGPPLCS was done on six problems. We first
assume that the ratio of the execution time of the EE to that of
the MLP is 1:10. That means there are 10 MLPs and one EE in the
MMGPPLCS. In our software simulation, there are 3 phases.
The first phase is the initialization. The first ten breeding
operations are performed without any fitness evaluations. This is
to model the situation in the MMGPPLCS. Then comes the
pipeline phase. A fitness evaluation is done on the first children
generated. After the first fitness evaluation is done, it is
determined whether the evaluated children are kept or discarded.
If they are fitter than their parents, they replace their parents.
As the breeding operation and fitness evaluation are expected
to execute in parallel in this phase, the first fitness evaluation is
followed by the eleventh breeding operation in our simulation.
In effect, we resume the original flow in the pipeline phase:
a breeding operation is followed by a fitness evaluation.
However, the fitness evaluation is not performed on the children
which have just been generated. Instead, the MLP evaluates
children bred earlier. The pipeline phase continues until a given
number of tournaments have been processed or an individual meets
a pre-determined level of fitness (i.e. $f_{dp} = 0$). The evolution
is finished in the last phase. See Figure 5.3.
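The delayed-evaluation flow described above can be sketched in Python with a queue of children that have been bred but not yet evaluated. This is a toy model, not the thesis simulator: the bit-string genome, the single-bit mutation, the tournament of three and the error function (number of zero bits, standing in for $f_{dp}$) are all assumptions chosen to keep the example self-contained.

```python
import random
from collections import deque


def simulate_pipeline(delay=10, pop_size=20, n_bits=16, t_max=5000, seed=0):
    """Return the tournament count at which a perfect child is found, or None."""
    assert delay >= 1
    rng = random.Random(seed)

    def error(genome):  # toy stand-in for f_dp: number of zero bits
        return n_bits - sum(genome)

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fit = [error(g) for g in pop]
    pending = deque()   # bred children still waiting for an evaluator

    def breed():
        parent = min(rng.sample(range(pop_size), 3), key=lambda k: fit[k])
        child = pop[parent][:]
        child[rng.randrange(n_bits)] ^= 1      # single-bit mutation
        pending.append((parent, child))

    for _ in range(delay):                     # initialization phase:
        breed()                                # breed without evaluating
    for t in range(1, t_max + 1):              # pipeline phase
        parent, child = pending.popleft()      # evaluate a *past* child
        f = error(child)
        if f < fit[parent]:                    # fitter child replaces parent
            pop[parent], fit[parent] = child, f
        if f == 0:                             # success predicate reached
            return t
        breed()                                # next breeding operation
    return None


print(simulate_pipeline(delay=10), simulate_pipeline(delay=1))
```

Comparing a `delay` of 10 with a `delay` of 1 gives a rough feel for how much the stale fitness information costs on this toy problem; it says nothing about the circuit problems studied in the thesis.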
The six problems are the 2-bit full-adder (ADD2), the 3-bit
comparator (CMP3), the 6-bit (4-to-1) multiplexer (MUX6), the 6-bit
priority selector (PSL6), the 3-bit multiplier (MUL3) and the 6-bit
one's counter (OCN6). See Table 5.1.
Table 5.1: Six combinational logic circuit problems used in the simulation.
$N_{in}$ and $N_{out}$ denote the numbers of inputs and outputs respectively.
$N_{row}$ ($= 2^{N_{in}}$) denotes the number of rows in the truth table. $N_{case}$
($= N_{row} \times N_{out}$) denotes the total number of training cases.
Name   Description               N_in   N_out   N_row   N_case
ADD2   2-bit full-adder          5      3       32      96
CMP3   3-bit comparator          6      3       64      192
MUX6   6-bit multiplexer         6      1       64      64
PSL6   6-bit priority selector   6      4       64      256
MUL3   3-bit multiplier          6      6       64      384
OCN6   6-bit one's counter       6      3       64      192
Note that the 6-bit priority selector shows the position of the
first bit with value '1', starting from the least significant
bit of the 6-bit input. If none of the bits is set to '1', an
extra output bit indicates this all-zero case. Since
there are six input bits (Input5 - Input0), three further bits
are needed to indicate the position. Therefore, there are four
output bits.
In addition, the 6-bit one's counter calculates the number
of bits with value '1' in the 6-bit input. Therefore, it requires
3 bits to represent the number in the output.
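As a small worked example, the truth table of the 6-bit one's counter can be generated as follows. The function name is hypothetical; the check confirms that the largest count (6) fits in three output bits, giving 64 x 3 = 192 training cases.

```python
def ones_counter(x):
    """Number of '1' bits among the 6 low bits of x (a value from 0 to 6)."""
    return bin(x & 0x3F).count("1")


truth_table = [ones_counter(x) for x in range(64)]   # N_row = 2**6 = 64
n_out = max(truth_table).bit_length()                 # output bits required
print(n_out, len(truth_table) * n_out)
```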
All experimental settings are listed in Table 5.2 below.
Having investigated the difficulties of the six benchmark problems
shown in Table 5.1, we set the maximum program length to 25
PIs. This provides enough sub-instructions (for both effective
operations and introns) to evolve correct programs. Hence, at
most 400 (25 by 16) operations can be used to build a solution.
As introduced before, the design phase occupies the largest
proportion of the execution time during evolution. Thus,
experiments conducted in the design phase only are sufficient to
show the effectiveness of the MMGPPLCS.
We have also run the six problems on the GPPLCS. The GPPLCS
adopts the same experimental settings as the MMGPPLCS,
which are shown in Table 5.2. To ensure a fair comparison between
MMGPPLCS and GPPLCS, all evolutions of combinational logic
circuits for the six combinational logic circuit problems
were run on the same PC configuration (Pentium 4 CPU
2.80 GHz with 512 MB RAM) with 20 independent runs.
Algorithm MMGPPLCS in simulation
Input: Truth table of circuits
Output: Circuits in 4-LUT format
1. Initialize population
2. Evaluate population
3. Perform 10 breeding operations:
4.     Tournament selection, Bit Mutation with $P_{btmut}$ and PI crossover with $P_{xover}$
5. Evaluate the first children
6. if $f_{children} > f_{parents} \wedge children \neq parents$
7.     then
8.         Replace parents with children
9.     else
10.        Discard children
11. Perform breeding operations:
12.    Tournament selection, Bit Mutation with $P_{btmut}$ and PI crossover with $P_{xover}$
13. Evaluate children
14. if $f_{children} > f_{parents} \wedge children \neq parents$
15.    then
16.        Replace parents with children
17.    else
18.        Discard children
19. if $t < t_{max}$
20.    then
21.        if $f_{dp} > 0$
22.            then






Figure 5.3: Algorithm of MMGPPLCS in simulation 
Table 5.2: Experimental settings used in MMGPPLCS and GPPLCS
Design phase only
maximum program length ($L_{max}$)       25 parallel instructions (PIs)
initialization                           bit random, average 12.5 ($L_{max}/2$) PIs
selection method                         tournament (size = 10)
4-LUT function set                       b0000, ..., bFFFF, nop
inputs                                   ... $R_{31}$
outputs                                  $R_0$ ... $R_{N_{out}-1}$
constants                                logic 0, logic 1
population size                          2000
termination ($t_{max}$)                  40,000,000 tournaments
PI crossover prob. ($P_{xover}$)         0.1
bit mutation prob. ($P_{btmut}$)         0.002
sub-instruction (SI) swapping prob. ($P_{siswp}$)   0.0
SI deletion prob. ($P_{sidel}$)          0.0
Dynamic Sample Weighting (DSW) (weights update freq.)   10,000 tournaments
preselection                             yes
raw fitness                              the ratio of unsolved training cases (= 1.0 + $f_{dp}$)
success predicate                        all training cases solved (= 1.0, i.e. $f_{dp}$ = 0.0)
Table 5.3: Number of tournaments ($\times 10^6$) needed by MMGPPLCS and
GPPLCS in design phase on six problems (average value)
Version     ADD2          CMP3          MUX6   PSL6          MUL3    OCN6
MMGPPLCS    [illegible]   [illegible]   0.09   [illegible]   83.32   9.97
GPPLCS      0.50          1.80          0.08   0.47          81.94   9.84
5.4 Experimental results and evaluations 
The proposed MMGPPLCS neither improves nor worsens the evolution
process. Table 5.3 shows the average number of tournaments required
to evolve six circuits in both the MMGPPLCS and the GPPLCS. The
numbers of tournaments are expressed in units of 10^6.
The objective of the simulation is to show that the pipeline phase
works. Although the MMGPPLCS does not decrease the number of
tournaments used in the whole evolution process, it is a feasible
model for implementation in FPGAs. Multiple MLPs with one EE can
keep both the MLPs and the EE running without being idle. The
performance of the MMGPPLCS is similar to that of the GPPLCS, which
is critical to the success of the MMGPPLCS: fitness evaluation can be
executed in parallel with the breeding operators throughout the
evolution process without increasing the number of tournaments. As a
result, the MMGPPLCS is suitable for implementation in FPGAs.
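
As a rough check on the claim that both systems need a similar number
of tournaments, the sketch below computes the relative difference for
the two hardest problems from the Table 5.3 averages. The dictionary
and function names are illustrative assumptions and are not part of
the thesis software.

```python
# Average design-phase tournaments (x 10^6) for the two hardest problems,
# copied from Table 5.3. All names below are illustrative only.
TOURNAMENTS = {
    "MUL3": {"MMGPPLCS": 83.32, "GPPLCS": 81.94},
    "OCN6": {"MMGPPLCS": 9.97, "GPPLCS": 9.84},
}


def relative_difference(problem: str) -> float:
    """Return (MMGPPLCS - GPPLCS) / GPPLCS for one problem."""
    row = TOURNAMENTS[problem]
    return (row["MMGPPLCS"] - row["GPPLCS"]) / row["GPPLCS"]


for name in TOURNAMENTS:
    print(f"{name}: {relative_difference(name):+.1%}")
```

Both differences are below 2%, which is in line with the statement
that the MMGPPLCS neither improves nor worsens the evolution process.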
5.5 Chapter Summary 
The proposed MMGPPLCS has been shown to be a feasible model for
implementation in FPGAs. Simulation results show that MMGPPLCS does
not increase the number of tournaments during the evolution of
circuits. Since the performance of the MMGPPLCS is nearly the same as
that of GPPLCS, the GPPLCS can benefit from a hardware implementation
by adopting a model like MMGPPLCS to increase the evaluation speed by
orders of magnitude, as shown in the next chapter.
• End of chapter. 
Chapter 6 
A Hybridized GPPLCS 
A Hybridized GPPLCS (HGPPLCS) is developed by integrating the Genetic
Parallel Programming based Logic Circuit Synthesizer (GPPLCS), which
is built on the Genetic Parallel Programming (GPP) paradigm [43],
with a deterministic local search operator, FlowMap [21]. To show the
effectiveness of the proposed HGPPLCS, six combinational logic
circuit problems are used for evaluation. Each problem is run 50
times. Experimental results show that both the lookup table counts
and the propagation delays of the circuits collected are better than
those obtained by conventional design or evolved by GPPLCS alone. For
example, in a 6-bit one's counter experiment, we obtained
combinational digital circuits with 8 four-input lookup tables in 2
LUT levels on average. These use 2 fewer lookup tables and 3 fewer
LUT levels than circuits evolved by GPPLCS alone.
This chapter is organized as follows. Our motivation can be found in
Section 6.1. Section 6.2 presents HGPPLCS. Experimental settings can
be found in Section 6.3. Section 6.4 presents results and
discussions. Finally, Section 6.5 concludes our work.
6.1 Motivation 
Although the qualities of the combinational digital circuits evolved
by GPPLCS are better than those of conventional designs, there is
still room for improvement. Algorithms that hybridize a genetic
algorithm with a non-genetic local search to refine the qualities of
solutions are called memetic algorithms [53]. This inspires the idea
of using a local search operator in GPPLCS. Since GPP is
population-based, it has a number of individuals (circuits) that
perform the same function (i.e. a many-to-one genotype-phenotype
mapping). Thus, GPP can provide a number of different circuits as
inputs to the FlowMap algorithm. In this way, FlowMap can return
different mapping solutions so that a better solution can be
obtained. Since FlowMap obtains a depth-optimal mapping solution when
it is applied to a 2-input lookup table (LUT) Boolean circuit, GPPLCS
must first evolve circuits in 2-input LUT (2-LUT) form and then rely
on FlowMap to give a 4-LUT mapping solution. This new GPPLCS with a
local search operator, FlowMap, is the basis of our HGPPLCS.
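
The many-to-one genotype-phenotype mapping can be illustrated with a
minimal sketch. It assumes that a 2-LUT is configured by a 4-bit
truth table and that bit (2a + b) of the configuration holds the
output for inputs (a, b); this bit ordering and all function names
are assumptions made for the example, not the encoding used by the
thesis software.

```python
from itertools import product


def lut2(config: int, a: int, b: int) -> int:
    """Evaluate a 2-input LUT whose 4-bit truth table is `config`.

    Assumption: bit (2*a + b) of `config` is the output for inputs (a, b).
    """
    return (config >> (2 * a + b)) & 1


# Four of the sixteen possible 2-LUT functions under that bit ordering.
AND, XOR, OR, NAND = 0x8, 0x6, 0xE, 0x7


def xor_single(a: int, b: int) -> int:
    """XOR built from one LUT in one level."""
    return lut2(XOR, a, b)


def xor_three(a: int, b: int) -> int:
    """XOR built as (a OR b) AND (a NAND b): three LUTs in two levels."""
    return lut2(AND, lut2(OR, a, b), lut2(NAND, a, b))


def truth_table(circuit) -> tuple:
    """Outputs of a 2-input circuit for inputs 00, 01, 10, 11."""
    return tuple(circuit(a, b) for a, b in product((0, 1), repeat=2))


print(truth_table(xor_single) == truth_table(xor_three))
```

The two circuits differ in structure but share one truth table, which
is the kind of functional equivalence a population of evolved
circuits can offer to a mapping algorithm.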
6.2 Overall system architecture 
FlowMap [21] is an LUT-based FPGA mapping algorithm for depth
minimization that guarantees a depth-optimal mapping for a given
input Boolean circuit. Since the working principle of the FlowMap
algorithm is not our focus, only a very brief description is given in
this section. Details can be found in Section 2.1.1. A key step in
the FlowMap algorithm is to compute a minimum height K-feasible cut
in a network, which is solved optimally in polynomial time based on
network flow computation. The FlowMap algorithm also effectively
minimizes the number of LUTs by maximizing the volume of each cut and
by several post-processing operations. It should be noted that
FlowMap gets a better mapping solution when a 2-LUT Boolean circuit
is given as an input. As a result, there is an opportunity to adopt
FlowMap in GPPLCS.

Figure 6.1: HGPPLCS
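
To make the notion of a K-feasible cut concrete, the following sketch
checks whether a given set of nodes is a cut for a root node and has
at most K members. It is a direct check on a hand-made network, not
the network flow computation that FlowMap uses, and the network and
function names are hypothetical.

```python
# A small network of 2-input gates: node -> fanin nodes.
# Primary inputs have no fanins. All names here are made up for illustration.
NETWORK = {
    "a": [], "b": [], "c": [], "d": [], "e": [],
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "e"],
}


def is_cut(network, root, cut):
    """True if every path from a primary input to `root` passes through `cut`."""
    def blocked(node):
        if node in cut:
            return True
        fanins = network[node]
        if not fanins:  # reached a primary input without crossing the cut
            return False
        return all(blocked(f) for f in fanins)
    return root not in cut and blocked(root)


def is_k_feasible_cut(network, root, cut, k):
    """A cut is K-feasible if it has at most K nodes, so one K-LUT covers the cone."""
    return len(cut) <= k and is_cut(network, root, cut)


print(is_k_feasible_cut(NETWORK, "g4", {"g1", "g2", "e"}, 4))          # True
print(is_k_feasible_cut(NETWORK, "g4", {"a", "b", "c", "d", "e"}, 4))  # False
```

The first cut has three nodes, so the logic between it and g4 fits
into a single 4-LUT. The second is a valid cut but has five nodes and
is therefore not 4-feasible.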
FlowMap returns a depth-optimal mapping solution for a 2-LUT Boolean
circuit input and is therefore a very suitable tool to help GPPLCS
locate the local optimum. Since GPPLCS can provide a population of
2-LUT Boolean circuits with the same functionality, FlowMap can give
the best mapping solution among all the mapping solutions.
HGPPLCS first evolves 2-LUT Boolean circuits. Then it chooses the
best one among the population of 2-LUT Boolean circuits as the input
for FlowMap. FlowMap generates a 4-LUT mapping solution (see Figure
6.1). The design rests on the synergy between GPPLCS and FlowMap: it
is well established that evolutionary algorithms are not well suited
to fine-tuning search in complex combinatorial spaces and that
hybridization with other techniques, such as greedy local search, can
greatly improve the efficiency of search [22, 23, 26, 61]. FlowMap
can be applied to significantly improve GPPLCS by obtaining locally
optimal circuits efficiently and effectively (see Figure 6.2). The
population-based GPPLCS provides FlowMap with a group of diversified
Boolean circuits with the same functionality, which cannot be
obtained by any deterministic algorithm. In this way, a globally
optimal circuit can be evolved with the aid of the efficient local
search of FlowMap and the global search power of GPP.
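
The step from a 2-LUT circuit to a 4-LUT circuit can be pictured by
collapsing a cone of 2-LUTs with at most four inputs into the truth
table of one 4-LUT. The sketch below does this by exhaustive
evaluation. It is not FlowMap; the circuit, the bit ordering and the
reading of names such as b0000 ... bFFFF as hexadecimal truth tables
are assumptions made for illustration.

```python
from itertools import product


def lut2(config, a, b):
    """2-input LUT; assumed bit order: bit (2*a + b) is the output for (a, b)."""
    return (config >> (2 * a + b)) & 1


# Hypothetical 2-LUT circuit: node -> (4-bit config, fanin0, fanin1).
CIRCUIT = {
    "g1": (0x6, "x0", "x1"),  # x0 XOR x1
    "g2": (0x6, "x2", "x3"),  # x2 XOR x3
    "g3": (0x6, "g1", "g2"),  # parity of the four inputs
}
INPUTS = ("x0", "x1", "x2", "x3")


def evaluate(node, values):
    """Evaluate `node` for the primary input assignment in `values`."""
    if node in values:
        return values[node]
    config, f0, f1 = CIRCUIT[node]
    return lut2(config, evaluate(f0, values), evaluate(f1, values))


def collapse_to_lut4(root):
    """Return the 16-bit truth table of one 4-LUT equivalent to the cone at `root`."""
    config = 0
    for bits in product((0, 1), repeat=4):
        index = bits[0] * 8 + bits[1] * 4 + bits[2] * 2 + bits[3]
        if evaluate(root, dict(zip(INPUTS, bits))):
            config |= 1 << index
    return config


print(f"b{collapse_to_lut4('g3'):04X}")  # b6996
```

Three 2-LUTs in two levels become one 4-LUT in one level, which is
the kind of reduction in LUT count and depth that the mapping step
aims for.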
Figure 6.2: FlowMap refines the fitness of individuals in GPPLCS
6.3 Experimental settings 
HGPPLCS was evaluated on the same six problems used in Chapter 5.
They are the 2-bit full adder (ADD2), 3-bit comparator (CMP3), 4-to-1
multiplexer (MUX6), 6-bit priority selector (PSL6), 3-bit multiplier
(MUL3) and 6-bit one's counter (OCN6) (see Table 6.1). They are all
benchmark Boolean problems that have been tried in other evolvable
hardware approaches.
All experimental settings are listed in Table 6.2. Having
investigated the difficulties of the six benchmark problems shown in
Table 6.1, we set the maximum program length to 25 PIs. This provides
enough sub-instructions (for both effective operations and introns)
to evolve correct programs. Hence, at most 400 (25 x 16) operations
can be used to build a solution. It is important to note that, in the
optimization stage, we force the system to optimize the size of the
correct programs as much as possible. Thus, all runs terminate after
40,000,000 tournaments, which we believe is large enough to evolve
the circuits. Preliminary experiments have shown that circuits can be
evolved within at most 40,000,000 tournaments in our benchmark
problems.

Table 6.1: Six combinational logic circuit problems used in HGPPLCS.
Nin and Nout denote the numbers of inputs and outputs respectively.
Nrow (= 2^Nin) denotes the number of rows in the truth table. Ncase
(= Nrow x Nout) denotes the total number of training cases.

  Name   Description               Nin   Nout   Nrow   Ncase
  ADD2   2-bit full-adder            5      3     32      96
  CMP3   3-bit comparator            6      3     64     192
  MUX6   6-bit multiplexer           6      1     64      64
  PSL6   6-bit priority selector     6      4     64     256
  MUL3   3-bit multiplier            6      6     64     384
  OCN6   6-bit one's counter         6      3     64     192
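
The Nrow and Ncase columns follow directly from Nin and Nout. As a
small worked example, the sketch below builds the truth table of a
6-bit one's counter and derives both values. The function names and
the most-significant-bit-first output order are assumptions made for
the example.

```python
from itertools import product


def sizes(n_in: int, n_out: int) -> tuple:
    """Return (Nrow, Ncase) with Nrow = 2**Nin and Ncase = Nrow * Nout."""
    n_row = 2 ** n_in
    return n_row, n_row * n_out


def ones_counter_table(n_in: int = 6):
    """Truth table of an n-bit one's counter: inputs -> number of 1s in binary."""
    n_out = n_in.bit_length()  # 3 output bits are enough for counts 0..6
    rows = []
    for bits in product((0, 1), repeat=n_in):
        count = sum(bits)
        # Output bits listed most significant first (an assumed convention).
        outputs = tuple((count >> i) & 1 for i in reversed(range(n_out)))
        rows.append((bits, outputs))
    return rows


table = ones_counter_table()
n_row = len(table)
n_case = n_row * len(table[0][1])
print(n_row, n_case)  # 64 192
print(sizes(5, 3))    # (32, 96) for ADD2
```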
In order to show the effectiveness of HGPPLCS, we tried the same six
problems on GPPLCS and FlowMap. However, we have not compared with
other evolvable hardware techniques such as Cartesian GP because the
circuits evolved are of a different type: they are in Boolean gate
form (i.e. 2-LUT) while we focus on 4-LUT circuits. GPPLCS adopts the
same experimental settings as HGPPLCS, which are shown in Table 6.2.
To ensure a fair comparison between HGPPLCS and GPPLCS, all
evolutions for the six combinational logic circuit problems are run
on the same PC configuration (Pentium 4 CPU 2.80 GHz with 512 MB RAM)
with 50 independent runs. In addition, circuits are evolved by the
dual phase approach in both systems. The only difference is in the
type of circuit evolved: GPPLCS evolves circuits with 4-LUTs while
HGPPLCS evolves the 2-LUT type. Since the difficulty of evolving
2-LUT and 4-LUT Boolean circuits differs for each problem, numbers of
tournaments are not compared here.
Results from the FlowMap algorithm are collected from experiments run
on the UCLA RASP FPGA/CPLD Technology Mapping and Synthesis Package
[1]. Firstly, we used ESPRESSO [7] to optimize the truth tables of
the six Boolean problems into optimal (or near optimal) sum of
product (SOP) forms. Then the resulting SOP expressions were passed
to the FlowMap algorithm to produce 4-LUT networks.
6.4 Experimental results and evaluations 
From the 50 runs of the six individual problems, it is shown that
HGPPLCS evolved the best circuits among the three methods (HGPPLCS,
GPPLCS and FlowMap). Table 6.3 shows the best circuits collected from
the three methods and Table 6.4 gives the success rate of evolving
circuits in HGPPLCS and GPPLCS. Since FlowMap depends heavily on the
given input circuit, the mapping solution will not be of good quality
if the input circuit is provided in a bad form (e.g. in SOP form). As
FlowMap is a deterministic algorithm, its mapping solutions are
always the same regardless of the number of times it is run. Thus,
mapping results by FlowMap are not shown in the charts comparing
HGPPLCS and GPPLCS. Fig. 6.3 shows the average values (in terms of
4-LUT count and LUT level) of the circuits collected in the 50
independent runs of HGPPLCS and GPPLCS, while Fig. 6.4 shows the best
circuits evolved in the 50 runs. HGPPLCS successfully improves on
GPPLCS: on the six problems, both the average LUT count and the
average LUT level of the circuits evolved by HGPPLCS are smaller than
those from GPPLCS. In the 3-bit comparator problem (CMP3), the best
circuit evolved by HGPPLCS uses one fewer 4-LUT and one fewer LUT
level than the one from GPPLCS. The circuit is shown in Fig. 6.5.
Table 6.2: Experimental settings used in HGPPLCS

Both design and optimization phases
  maximum program length (Lmax)         25 parallel instructions (PIs)
  initialization                        bit random, average 12.5 (Lmax/2) PIs
  selection method                      tournament (size = 10)
  4-LUT function set                    b0000, ..., bFFFF, nop
  2-LUT function set                    b0, ..., bF, nop
  inputs                                R(32-Nin) ... R31
  outputs                               R0 ... R(Nout-1)
  constants                             logic 0, logic 1
  population size                       2000
  termination (tmax)                    40,000,000 tournaments
  PI crossover prob. (Pxover)           0.1

                                        design phase            optimization phase
  bit mutation prob. (Pbtmut)           0.002                   0.0
  sub-instruction (SI) swapping
    prob. (Psiswp)                      0.0                     0.5
  SI deletion prob. (Psidel)            0.0                     0.1
  Dynamic Sample Weighting (DSW)
    (weights update freq.)              10,000 tournaments      -
  preselection                          yes                     -
  raw fitness                           the ratio of unsolved   the ratio of LUT level
                                        training cases          & LUT count (= fop)
                                        (= 1.0 + fdp)
  success predicate                     all training cases      optimize as much as
                                        solved (= 1.0, i.e.     possible (i.e. fop < 0)
                                        fdp = 0.0)
Table 6.3: Best circuits collected from HGPPLCS, GPPLCS and FlowMap
algorithm on six problems

  Version    Type    ADD2   CMP3   MUX6   PSL6   MUL3   OCN6
  HGPPLCS    LUT     4      5      2      5      15     7
             Level   2      2      2      2      3      2
  GPPLCS     LUT     4      6      2      5      15     6
             Level   2      3      2      3      4      3
  FlowMap    LUT     16     ?      3      11     ?      113
             Level   3      3      2      3      3      4

Table 6.4: Success rate of evolving circuit problems in HGPPLCS and
GPPLCS

  Version    ADD2   CMP3   MUX6   PSL6   MUL3   OCN6
  HGPPLCS    100%   100%   100%   100%   50%    58%
  GPPLCS     100%   100%   100%   100%   54%    100%
The circuits evolved by HGPPLCS may, however, have a greater 4-LUT
count than the ones from GPPLCS. In the 6-bit one's counter problem
(OCN6), although the best circuit evolved by HGPPLCS has one fewer
LUT level than the one from GPPLCS, it uses one more 4-LUT. The
reason lies in the FlowMap algorithm. Since FlowMap only guarantees a
depth-optimal mapping solution for a given input circuit, the number
of 4-LUTs in the solution may not be smaller than that of the circuit
found by GPPLCS. However, the depth of the mapped circuit is always
the smallest achievable for that input circuit.
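
The trade-off can be made explicit with the best OCN6 results from
Table 6.3. The sketch below ranks the two circuits once by depth
first and once by LUT count first. The function names are
illustrative, and the ranking rules are an assumption about how one
might compare circuits rather than the fitness function used in the
thesis.

```python
# (4-LUT count, LUT levels) of the best OCN6 circuits reported in Table 6.3.
BEST_OCN6 = {"HGPPLCS": (7, 2), "GPPLCS": (6, 3)}


def best_by_depth(results):
    """Prefer fewer LUT levels and break ties on LUT count."""
    return min(results, key=lambda name: (results[name][1], results[name][0]))


def best_by_area(results):
    """Prefer fewer LUTs and break ties on LUT levels."""
    return min(results, key=lambda name: (results[name][0], results[name][1]))


print(best_by_depth(BEST_OCN6))  # HGPPLCS
print(best_by_area(BEST_OCN6))   # GPPLCS
```

Which circuit counts as better therefore depends on whether
propagation delay or LUT usage is given priority.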
HGPPLCS shows a perfect synergy between GPPLCS and FlowMap. The
population-based GPPLCS provides FlowMap with a group of diversified
Boolean circuits with the same functionality, while FlowMap returns a
better mapping solution than GPPLCS alone.
The success rates of HGPPLCS and GPPLCS are the same on four of the
six problems. From the rates shown in Table 6.4, it is found that the
MUL3 and OCN6 problems are more difficult than the others. As HGPPLCS
first evolves circuits in 2-LUT form and then relies on FlowMap to
give a 4-LUT mapping, the search space in HGPPLCS is much larger than
that in GPPLCS, which evolves 4-LUT circuits directly. Thus, it is
expected that the success rate of HGPPLCS is lower than or equal to
that of GPPLCS.

Figure 6.3: Average number of 4-LUT count and LUT level collected from
HGPPLCS and GPPLCS on the six problems in 50 runs

Figure 6.4: Best number of 4-LUT and LUT level collected from HGPPLCS
and GPPLCS on the six problems in 50 runs

Figure 6.5: The best 3-bit comparator evolved by the HGPPLCS
(Inputs: A2 [r26], A1 [r27], A0 [r28], B2 [r29], B1 [r30], B0 [r31];
outputs: A<B [r00], A=B [r01], A>B [r02].)
6.5 Chapter Summary 
In this chapter, we have presented a Hybridized Genetic Parallel
Programming based Logic Circuit Synthesizer (HGPPLCS). It makes use
of a Genetic Parallel Programming based Logic Circuit Synthesizer
(GPPLCS) and the FlowMap algorithm. HGPPLCS applies a dual phase
approach to evolve a 2-LUT circuit. The circuit is then passed to
FlowMap for further optimization, and FlowMap returns a depth-optimal
mapping solution based on the given input circuit. Experimental
results show that HGPPLCS improves the performance of GPPLCS in terms
of the quality of circuits. The evolved circuits are the best among
the three methods (HGPPLCS, GPPLCS and FlowMap).
• End of chapter. 
Chapter 7 
A Memetic GPPLCS 
By including a deterministic local search operator, DAOMap [13], in
Genetic Parallel Programming (GPP), a Memetic GPP based Logic Circuit
Synthesizer (MGPPLCS) is developed. To show the effectiveness of the
proposed MGPPLCS, six combinational logic circuit problems are used
for evaluation. Each problem is run 20 times. Experimental results
show that MGPPLCS is both more efficient and more effective than GPP.
On average, MGPPLCS requires one order of magnitude fewer evaluations
to identify higher quality solutions. Both the lookup table counts
and the propagation delays of the circuits collected are better than
those obtained by conventional design or evolved by GPP alone. For
example, in a 6-bit priority selector experiment, we evolved
combinational digital circuits with 5.1 four-input lookup tables in 2
LUT levels on average. These use 2 fewer lookup tables and 1 fewer
LUT level than circuits evolved by GPPLCS alone.
This chapter is organized as follows. Section 7.1 gives our
motivation. MGPPLCS is presented in Section 7.2. The experimental
settings can be found in Section 7.3, followed by experimental
results and evaluations in Section 7.4. Finally, Section 7.5 is a
chapter summary.
7.1 Motivation 
Evolutionary Algorithms (EAs) are a class of search and optimization
techniques that work on a principle inspired by nature: Darwinian
evolution. It is well established that hybridizing EAs with other
techniques can greatly improve the efficiency of search. Algorithms
that hybridize a genetic algorithm with a non-genetic local search to
refine the qualities of solutions are called memetic algorithms [53].
This inspires the idea of using a deterministic local search operator
in GPPLCS.
The DAOMap algorithm [13], proposed by Prof. Jason Cong, is a
technology mapping algorithm for depth minimization in lookup table
(LUT)-based FPGA designs, which is optimal for any K-bounded Boolean
network. DAOMap can return a depth-optimal mapping solution with
possible area optimization based on a given Boolean circuit. Thus,
DAOMap is an ideal local search operator for GPP and can improve GPP
in both efficiency and effectiveness. Any individual found by GPP can
be refined by DAOMap, which saves a large number of the evaluations
needed to locate optima. Moreover, DAOMap can force GPP to explore
more optima by recording the optima found previously. This new GPPLCS
with a local search operator, DAOMap, becomes the MGPPLCS.
7.2 Overall system architecture 
MGPPLCS is a combinational logic circuit design system developed from
GPPLCS. The architectures of GPPLCS and MGPPLCS are basically the
same; the difference is the application of the local search operator,
DAOMap, in MGPPLCS. The core of the MGPPLCS system consists of an
Evolution Engine (EE) and an MLP (see Fig. 7.1). The EE manipulates
the genetic parallel programs and performs genetic and local search
operations. The MLP is responsible for evaluating the genetic
parallel programs.

Figure 7.1: The system block diagram of MGPPLCS
Similar to GPPLCS, all combinational digital circuits are evolved by
a dual phase (i.e. design and optimization phases) approach.
Different sets of genetic operators, including crossover, bit
mutation and sub-instruction swapping, are used in the different
phases. In the design phase, the MGPPLCS system aims at finding a
100% functional program (correct program). The raw fitness is given
by the ratio of unsolved training cases. In the optimization phase,
the raw fitness puts emphasis on the LUT count, the propagation delay
and the program length. In other words, the major objective of the
optimization phase is to reduce the LUT count and then the
propagation delay. Our local search operator is applied in this
phase. DAOMap can be applied to significantly improve MGPPLCS by
obtaining locally optimal circuits efficiently and effectively (see
Fig. 7.2). The population-based MGPPLCS provides DAOMap with a group
of diversified Boolean circuits with the same functionality, which
cannot be obtained by any deterministic algorithm, while DAOMap
returns the refined individuals (optima). In this way, a globally
optimal circuit can be evolved with the efficiency of DAOMap and the
global search power of the EA.
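
The design-phase fitness can be sketched as follows, assuming that
each output bit of each truth-table row counts as one training case
and that the raw fitness is 1.0 + fdp as listed in the settings
table. The target problem (a 1-bit full adder) and all function names
are chosen for illustration only and do not come from the thesis
software.

```python
from itertools import product


def design_phase_fitness(candidate, truth_table):
    """Return (f_dp, raw_fitness) for a candidate circuit.

    f_dp is the ratio of unsolved training cases, where each output bit of
    each truth-table row is one case; raw fitness is taken as 1.0 + f_dp.
    """
    unsolved = total = 0
    for inputs, expected in truth_table:
        produced = candidate(*inputs)
        for got, want in zip(produced, expected):
            total += 1
            unsolved += got != want
    f_dp = unsolved / total
    return f_dp, 1.0 + f_dp


# Toy target: 1-bit full adder with outputs (sum, carry).
TARGET = [((a, b, c), ((a + b + c) & 1, (a + b + c) >> 1))
          for a, b, c in product((0, 1), repeat=3)]


def half_right(a, b, c):
    """Correct sum bit, but the carry ignores the carry-in."""
    return (a ^ b ^ c, a & b)


print(design_phase_fitness(half_right, TARGET))  # (0.125, 1.125)
```

Two of the sixteen training cases are unsolved here, so fdp = 0.125.
A correct program reaches fdp = 0.0, which is the success predicate
of the design phase.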
However, it should be noted that refined individuals are not put back
into the population. Since any introns are removed by the refinement,
refined individuals would not benefit evolution. To some extent,
refined individuals in a population would dominate and trap GPP in a
local optimum. As a result, refined individuals are not placed in the
population.

Figure 7.2: DAOMap refines the fitness of individuals in GPPLCS

Instead, refined individuals serve as a similarity measure. Refined
individuals (optima) are recorded in terms of their number of LUTs
and LUT levels. A newly evolved individual is retained only when it
differs from the previously recorded optima in these two values. In
each tournament, the MGPPLCS generates two new genetic programs
(children), which are refined by DAOMap. If a child is structurally
equivalent to any optimum found before (i.e. its number of LUTs and
LUT levels are found in the list), it is discarded. This maintains a
reasonable diversity in the search: the record serves as a diversity
measure and forces MGPPLCS to evolve new optima. See Figure 7.3.
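
The record of optima described above can be sketched as a set of
(LUT count, LUT level) pairs. The class below is a minimal
illustration under the assumption that a pair is recorded the first
time it is seen; the class and method names are hypothetical.

```python
class OptimaArchive:
    """Records refined optima by their (LUT count, LUT level) pair."""

    def __init__(self):
        self._seen = set()

    def accept(self, lut_count: int, lut_level: int) -> bool:
        """Return True and record the pair if it is new, False if seen before."""
        signature = (lut_count, lut_level)
        if signature in self._seen:
            return False
        self._seen.add(signature)
        return True


archive = OptimaArchive()
print(archive.accept(6, 3))  # True: first optimum with 6 LUTs in 3 levels
print(archive.accept(6, 3))  # False: same two values, child is discarded
print(archive.accept(5, 3))  # True: differs in LUT count
```

Comparing only two integers keeps the check cheap, at the cost of
treating different circuits with the same LUT count and level as
equivalent.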
Algorithm MGPPLCS
Input: Truth table of circuits
Output: Circuits in 4-LUT format
 1. Initialize population
 2. Evaluate population
 3. if fdp > 0 /* design phase */
 4. then
 5.     Perform breeding operations:
 6.     Tournament selection, bit mutation with Pbtmut and PI
        crossover with Pxover
 7. else /* optimization phase */
 8.     Perform breeding operations:
 9.     Tournament selection, SI swapping with Psiswp, SI deletion with
        Psidel and PI crossover with Pxover
10.     Optimize circuits with DAOMap
11. Evaluate children
12. if fchildren > fparents ∧ children ≠ parents
13. then
14.     Replace parents with children
15. else
16.     Discard children
17. if t < tmax
18. then
19.     if Design phase ∧ fdp > 0
20.     then
21.         GOTO Step 3
22.     else
23.         if Optimization phase
24.         then
25.             GOTO Step 3
26.         else

Figure 7.3: Algorithm of MGPPLCS
Table 7.1: Six combinational logic circuit problems used in MGPPLCS.
Nin and Nout denote the numbers of inputs and outputs respectively.
Nrow (= 2^Nin) denotes the number of rows in the truth table. Ncase
(= Nrow x Nout) denotes the total number of training cases.

  Name   Description               Nin   Nout   Nrow   Ncase
  ADD2   2-bit full-adder            5      3     32      96
  CMP3   3-bit comparator            6      3     64     192
  MUX6   6-bit multiplexer           6      1     64      64
  PSL6   6-bit priority selector     6      4     64     256
  MUL3   3-bit multiplier            6      6     64     384
  OCN6   6-bit one's counter         6      3     64     192
7.3 Experimental settings 
MGPPLCS was evaluated on the same six problems as in Chapters 5 and
6. They are the 2-bit full adder (ADD2), 3-bit comparator (CMP3),
4-to-1 multiplexer (MUX6), 6-bit priority selector (PSL6), 3-bit
multiplier (MUL3) and 6-bit one's counter (OCN6) (see Table 7.1).
All experimental settings are listed in Table 7.2. Having
investigated the difficulties of the six benchmark problems shown in
Table 7.1, we set the maximum program length to 25 PIs. This provides
enough sub-instructions (for both effective operations and introns)
to evolve correct programs. Hence, at most 400 (25 x 16) operations
can be used to build a solution. Notably, in the optimization stage,
we force the system to optimize the size of the correct programs as
much as possible. Thus, all runs terminate after 40,000,000
tournaments.
In order to show the effectiveness of MGPPLCS, we tried the six
problems on GPPLCS, DAOMap and FlowMap. GPPLCS adopts the same
experimental settings as MGPPLCS, which are shown in Table 7.2,
except that all runs terminate after 40,000,000 tournaments.
Moreover, no local search operator is used in GPPLCS. To ensure a
fair comparison between MGPPLCS and GPPLCS, all evolutions for the
six combinational logic circuit problems are run on the same PC
configuration (Pentium 4 CPU 2.80 GHz with 512 MB RAM) with 20
independent runs.
Results from the DAOMap and FlowMap algorithms are collected from
experiments run on the UCLA RASP FPGA/CPLD Technology Mapping and
Synthesis Package [1]. Firstly, we used ESPRESSO [7] to optimize the
truth tables of the six Boolean problems into optimal (or near
optimal) sum of product (SOP) forms. Then the resulting SOP
expressions were passed to the DAOMap algorithm as well as the
FlowMap algorithm to produce 4-input LUT networks.
7.4 Experimental results and evaluations 
From the 20 runs of the six individual problems, it is shown that
MGPPLCS evolved the best circuits among the four methods (MGPPLCS,
GPPLCS, DAOMap and FlowMap). Table 7.3 shows the best circuits
collected from the four methods while Table 7.4 shows the average
values. Please note that all runs are successful; that is, a solution
was evolved in every run. Since DAOMap and FlowMap are deterministic
algorithms, their mapping solutions are always the same regardless of
the number of times they are run. Thus, their results are the same in
both tables.
Table 7.2: Experimental settings used in MGPPLCS

Both design and optimization phases
  maximum program length (Lmax)         25 parallel instructions (PIs)
  initialization                        bit random, average 12.5 (Lmax/2) PIs
  selection method                      tournament (size = 10)
  4-input LUT function set              b0000, ..., bFFFF, nop
  inputs                                R(32-Nin) ... R31
  outputs                               R0 ... R(Nout-1)
  constants                             logic 0, logic 1
  population size                       2000
  termination (tmax)                    40,000,000 tournaments
  PI crossover prob. (Pxover)           0.1

                                        design phase            optimization phase
  bit mutation prob. (Pbtmut)           0.002                   0.0
  sub-instruction (SI) swapping
    prob. (Psiswp)                      0.0                     0.5
  SI deletion prob. (Psidel)            0.0                     0.1
  DAOMap local search                   -                       yes
  Dynamic Sample Weighting (DSW)
    (weights update freq.)              10,000 tournaments      -
  preselection                          yes                     -
  raw fitness                           the ratio of unsolved   the ratio of LUT level
                                        training cases          & LUT count (= fop)
                                        (= 1.0 + fdp)
  success predicate                     all training cases      optimize as much as
                                        solved (= 1.0, i.e.     possible (i.e. fop < 0)
                                        fdp = 0.0)

Table 7.3: Best circuits collected from MGPPLCS, GPPLCS, DAOMap and
FlowMap algorithm on six problems

  Version    Type                  ADD2    CMP3   MUX6   PSL6   MUL3    OCN6
  MGPPLCS    LUT                   4       5      2      5      15      6
             Level                 2       2      2      2      3       2
             Tournament (x10^6)    0.097   0.17   0.29   0.12   6.01    5.89
  GPPLCS     LUT                   4       6      2      6      15      6
             Level                 2       3      2      3      4       3
             Tournament (x10^6)    8.72    4.81   5.05   4.51   99.40   13.39
  DAOMap     LUT                   ?       ?      3      10     ?       118
             Level                 3       3      2      3      3       4
  FlowMap    LUT                   16      ?      3      11     ?       113
             Level                 2       3      2      3      4       3

Table 7.4: Circuits collected from MGPPLCS, GPPLCS, DAOMap and
FlowMap on six problems (average value)

  Version            Type                  ADD2   CMP3   MUX6   PSL6   MUL3    OCN6
  MGPPLCS            LUT                   ?      ?      ?      5.1    ?       6.45
                     Level                 2      2.1    2.2    2      3.25    2.2
                     Tournament (x10^6)    0.18   0.72   0.70   0.46   9.32    8.52
  GPPLCS             LUT                   ?      ?      ?      ?      ?       ?
                     Level                 3      4.25   2.85   3      4.85    3
                     Tournament (x10^6)    8.63   4.42   5.10   4.30   98.94   12.84
  DAOMap             LUT                   ?      ?      3      10     ?       118
  (deterministic     Level                 3      3      2      3      3       4
  algorithm)
  FlowMap            LUT                   16     ?      3      11     ?       113
  (deterministic     Level                 2      3      2      3      4       3
  algorithm)

Figure 7.4: 6-bit multiplexer evolved by the MGPPLCS
(Inputs: I0 [r31], I1 [r30], I2 [r29], I3 [r28], S0 [r27], S1 [r26].)

It is shown that MGPPLCS and GPPLCS outperform DAOMap and FlowMap,
since the latter depend heavily on the given input circuits. The
mapping solution will not be of good quality if the input circuits
are provided in a bad form (e.g. in SOP form). MGPPLCS also
successfully improves on GPPLCS. On the six problems, both the
average LUT count and the average LUT level of the circuits evolved
by MGPPLCS are smaller than those from GPPLCS. Moreover, the number
of tournaments used by MGPPLCS is smaller than that used by GPPLCS,
by about one order of magnitude on average. Although MGPPLCS may not
always get a better circuit than GPPLCS in all six problems, MGPPLCS
performs well on average. In the 3-bit comparator problem (CMP3), the
best circuit evolved by MGPPLCS uses one fewer 4-LUT and one fewer
LUT level than the one from GPPLCS, and the same holds for the 6-bit
priority selector (PSL6). Figure 7.4 shows the 6-bit multiplexer
evolved by MGPPLCS.
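
The size of the saving in tournaments can be checked from the average
values in Table 7.4. The sketch below computes the GPPLCS to MGPPLCS
ratio per problem and its geometric mean. It assumes that both rows
of the table are given in the same unit, and the variable names are
illustrative.

```python
import math

# Average tournaments per problem from Table 7.4 (same unit assumed for both rows).
MGPPLCS = {"ADD2": 0.18, "CMP3": 0.72, "MUX6": 0.70,
           "PSL6": 0.46, "MUL3": 9.32, "OCN6": 8.52}
GPPLCS = {"ADD2": 8.63, "CMP3": 4.42, "MUX6": 5.10,
          "PSL6": 4.30, "MUL3": 98.94, "OCN6": 12.84}

# How many times more tournaments GPPLCS needs on each problem.
ratios = {p: GPPLCS[p] / MGPPLCS[p] for p in MGPPLCS}

# Geometric mean, the usual way to average ratios.
geo_mean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

for problem, ratio in ratios.items():
    print(f"{problem}: {ratio:.1f}x")
print(f"geometric mean: {geo_mean:.1f}x")
```

The ratios range from about 1.5 (OCN6) to about 48 (ADD2), with a
geometric mean of roughly 8, so the saving is close to one order of
magnitude on average but varies considerably between problems.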
MGPPLCS shows a perfect synergy between GPPLCS and DAOMap. The
population-based GPPLCS provides DAOMap with a group of diversified
Boolean circuits with the same functionality, while DAOMap returns a
better mapping solution than GPPLCS alone.
7.5 Chapter Summary 
In this chapter, we have presented a Memetic Genetic Parallel
Programming Logic Circuit Synthesizer (MGPPLCS). It makes use of a
Genetic Parallel Programming Logic Circuit Synthesizer (GPPLCS) and
the DAOMap algorithm. MGPPLCS applies a two-stage approach to evolve
a 2-LUT circuit. During the second stage, a local search operator,
DAOMap, is applied to refine individuals. Experimental results show
that MGPPLCS improves the performance of GPPLCS. The evolved circuits
are the best among the four methods compared (MGPPLCS, GPPLCS, DAOMap
and FlowMap).
• End of chapter. 
Chapter 8 
Conclusion 
This thesis has presented a novel Genetic Parallel Programming based
Logic Circuit Synthesizer (GPPLCS) designed for tackling technology
mapping problems in automatic logic circuit synthesis and
optimization. It consists of two core components, an Evolution Engine
(EE) and a Multi Logic Unit Processor (MLP). The EE is responsible
for the genetic operations, the control strategies and the
application-specific processes. The MLP is a general-purpose,
multiple instruction-stream multiple data-stream (MIMD) register
machine which is implementable on modern commercial Field
Programmable Gate Arrays (FPGAs). GPP evolves genetic programs in a
specific parallel format (MLP programs).
Four improvements to the GPPLCS have been proposed and implemented.
In Chapter 4, the hardware design and implementation of a Multi Logic
Unit Processor (MLP) has been shown. The hardware-based MLP was
proposed and implemented in order to execute parallel genetic
programs for fitness evaluation in hardware. Experimental results
show that evolving combinational logic circuits can be sped up by the
cooperation of a software EE and the hardware MLP. Speedup ratios
ranging from 10 to 36 are obtained in the hardware-assisted GPPLCS
compared with the pure software GPPLCS.
In Chapter 5, a new model of cooperation between multiple MLPs and
the EE has been proposed. This new architecture of GPPLCS (MMGPPLCS)
is designed for optimal logic circuit synthesis in FPGAs. It has one
EE and several MLPs. Simulation results show that the performance of
MMGPPLCS is nearly the same as that of the current GPPLCS in terms of
the number of tournaments, but the expected time for each tournament
can be reduced significantly.
In Chapter 6, a Hybridized GPPLCS (HGPPLCS) has been
presented. By integrating the GPPLCS and the FlowMap
algorithm, better circuits can be found. We first evolve circuits
in 2-input lookup table (2-LUT) format and rely on FlowMap to
map them into circuits in 4-LUT format. Experimental results
show that both the lookup table counts and the propagation
delays of the resulting circuits are better than those obtained
by conventional design or evolved by the GPPLCS alone.
We have gone one step further in Chapter 7. A novel Memetic
GPPLCS (MGPPLCS) has been proposed and implemented.
DAOMap is included in the GPPLCS as a non-genetic local search
operator. It is shown that better circuits, with fewer LUTs and
shorter propagation delays, are evolved in a smaller number of
tournaments.
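The idea of adding a non-genetic local search operator to a
tournament-based evolutionary loop can be illustrated with a
minimal sketch. Everything here is an assumption made for
illustration: the toy bit-string objective stands in for circuit
quality, the one-step hill climber stands in for DAOMap, and the
names memetic_search, local_search and mutate are hypothetical
rather than part of the GPPLCS.

```python
import random

def fitness(bits):
    # Toy objective (OneMax): number of 1-bits; higher is better.
    return sum(bits)

def local_search(bits):
    # Hypothetical stand-in for the non-genetic operator:
    # set the first 0-bit to 1, if there is one.
    for i, b in enumerate(bits):
        if b == 0:
            return bits[:i] + [1] + bits[i + 1:]
    return bits

def mutate(bits, rng):
    # Genetic operator: flip one randomly chosen bit.
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

def memetic_search(n_bits=16, pop_size=8, max_tournaments=500,
                   seed=0, use_local_search=True):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for t in range(max_tournaments):
        # Two-individual tournament: the loser is replaced by a child.
        a, b = rng.sample(range(pop_size), 2)
        if fitness(pop[a]) >= fitness(pop[b]):
            winner, loser = a, b
        else:
            winner, loser = b, a
        child = mutate(pop[winner], rng)
        if use_local_search:
            child = local_search(child)  # memetic refinement step
        pop[loser] = child
        if fitness(child) == n_bits:
            return child, t + 1  # solution and tournaments used
    return max(pop, key=fitness), max_tournaments
```

Calling memetic_search with use_local_search set to True and
then False, over several seeds, gives a rough feel for how a
refinement step can change the number of tournaments needed on
this toy problem. It says nothing about the actual circuit
results reported in Chapter 7.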
8.1 Future work 
This work can be improved or extended in two main directions.
With the success of the MMGPPLCS, a hardware implementation
of the GPPLCS is a feasible way to speed up the evolution
process. A full-scale hardware based GPPLCS system can be
implemented in the latest FPGAs to speed up the design phase.
The higher clock rate of the latest generation of FPGAs,
Virtex-5 (550 MHz), compared with the Virtex-E (133 MHz) in
Pilchard, enables a faster hardware design of the MLP. Since
we already have a hardware implementation of the MLP, what
remains is to design and implement a hardware evolution engine
to perform genetic operations.
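As a rough worked figure, the two clock rates quoted above can
be compared directly. The sketch below only computes their
ratio. It assumes, purely for illustration, that an MLP design
could run at the quoted rates, which this thesis does not
establish. The constant and function names are hypothetical.

```python
# Clock rates quoted in the text above, in MHz.
VIRTEX_E_PILCHARD_MHZ = 133
VIRTEX_5_MHZ = 550

def clock_ratio(new_mhz, old_mhz):
    # Ratio of clock rates; a ceiling on the gain from clock rate
    # alone, not a measured speedup.
    return new_mhz / old_mhz

ratio = clock_ratio(VIRTEX_5_MHZ, VIRTEX_E_PILCHARD_MHZ)
print(f"clock-rate ratio: {ratio:.2f}x")  # prints "clock-rate ratio: 4.14x"
```

The ratio is about 4.1. Any real gain would depend on the
achievable clock rate of the synthesized design and on the
host interface, neither of which is modelled here.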
Both the HGPPLCS and the MGPPLCS make it possible to solve
some benchmark technology mapping problems. At present, it
takes a few hours to evolve a solution program for difficult
problems. With speedups in both the design phase and the
optimization phase, some large-scale real-life problems, such as
the five-input XOR function in the MCNC benchmark problems,
can be solved.
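To make the five-input XOR target concrete, the following sketch
builds its truth table and scores a candidate circuit of 2-input
LUTs by the number of rows it reproduces. The netlist format
(tuples of two signal indices and a 4-entry table) and the
names evaluate and fitness are assumptions made for this
example. They are not the MLP program format or the fitness
function used by the GPPLCS.

```python
from itertools import product

def xor5_truth_table():
    # Target output for each of the 32 input rows (binary counting order).
    return [sum(row) % 2 for row in product((0, 1), repeat=5)]

# Hypothetical netlist format: each 2-LUT is (a, b, table), where a and b
# index earlier signals and table[2 * va + vb] is the LUT output.
XOR_LUT = (0, 1, 1, 0)
CANDIDATE = [
    (0, 1, XOR_LUT),  # signal 5 = x0 xor x1
    (5, 2, XOR_LUT),  # signal 6 = signal 5 xor x2
    (6, 3, XOR_LUT),  # signal 7 = signal 6 xor x3
    (7, 4, XOR_LUT),  # signal 8 = signal 7 xor x4 (circuit output)
]

def evaluate(netlist, inputs):
    # Signals 0..4 are the primary inputs; each LUT appends one signal.
    signals = list(inputs)
    for a, b, table in netlist:
        signals.append(table[2 * signals[a] + signals[b]])
    return signals[-1]

def fitness(netlist):
    # Number of truth-table rows the circuit gets right (maximum 32).
    target = xor5_truth_table()
    rows = product((0, 1), repeat=5)
    return sum(evaluate(netlist, row) == want
               for row, want in zip(rows, target))
```

Here fitness(CANDIDATE) returns 32, since a chain of four
2-input XOR LUTs computes the five-input parity exactly. A
synthesizer would be searching for such a netlist rather than
being handed one.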
List of Publications 
1. W.S. Lau, G. Li, K.H. Lee, K.S. Leung and S.M. Cheang: 
Multi-logic-Unit Processor: A Combinational Logic Circuit 
Evaluation Engine for Genetic Parallel Programming, 
Proceedings of the 8th European Conference on Genetic 
Programming, Lecture Notes in Computer Science, Vol. 
3447, pp. 167-177, Springer, 30 March - 1 April 2005. 
2. W.S. Lau, K.H. Lee and K.S. Leung: 
A Hybridized Genetic Parallel Programming Logic Circuit 
Synthesizer, 
Proceedings of the 8th Annual Conference on Genetic and
Evolutionary Computation, pp. 839-846, Seattle, Washington,
USA.