Design synthesis for dynamically reconfigurable logic systems by Vasilko, Milan
DESIGN SYNTHESIS FOR 
DYNAMICALLY RECONFIGURABLE 
LOGIC SYSTEMS 
MILAN VASILKO 
A thesis submitted in partial fulfilment of the requirements 
of Bournemouth University for the degree 
of Doctor of Philosophy 
October 2000 
Bournemouth University 
Copyright © 2000 
Milan Vasilko 
All rights reserved. 
Abstract 
Dynamic reconfiguration of logic circuits has been a research problem for 
over four decades. While applications using logic reconfiguration in prac- 
tical scenarios have been demonstrated, the design of these systems has 
proved to be a difficult process demanding the skills of an experienced re- 
configurable logic design expert. 
This thesis proposes an automatic synthesis method which relieves de- 
signers of some of the difficulties associated with designing partially dy- 
namically reconfigurable systems. A new design abstraction model for re- 
configurable systems is proposed in order to support design exploration 
using the presented method. Given an input behavioural model, a tech- 
nology server and a set of design constraints, the method will generate a 
reconfigurable design solution in the form of a 3D floorplan and a config- 
uration schedule. The approach makes use of genetic algorithms. It facili- 
tates global optimisation to accommodate multiple design objectives com- 
mon in reconfigurable system design, while making realistic estimates of 
configuration overheads and of the potential for resource sharing between 
configurations. A set of custom evolutionary operators has been developed 
to cope with a multiple-objective search space. 
Furthermore, the application of a simulation technique verifying the 
results of such an automatic exploration is outlined in the thesis. 
The qualities of the proposed method are evaluated using a set of bench- 
mark designs taking data from a real reconfigurable logic technology. Finally, 
some extensions to the proposed method and possible research directions 
are discussed. 
To Maria, Dominika, Viktoria 
and my parents, 
for their love, patience and support. 
Contents
List of Figures
List of Tables
List of Algorithms
List of Abbreviations
Acknowledgements
1 Introduction
1.1 Reconfigurable Logic
1.1.1 Software Acceleration
1.1.2 Hardware Virtualisation
1.1.3 Fault Tolerance
1.1.4 In-Field and Remote Hardware Modification
1.2 Reconfigurable Logic in Real-World Applications
1.3 This Thesis
2 Reconfigurable Systems: Background
2.1 Typical Architecture of a Reconfigurable System
2.2 Reconfigurable Logic Technology
2.2.1 Support for Reconfiguration
2.2.2 Configuration Interface
2.2.3 Configuration Data Distribution
2.2.4 Configuration Activation
2.3 Reconfiguration Latency
2.4 Summary
3 Previous Work on Reconfigurable System Design
3.1 Design for Non-Reconfigurable Systems
3.1.1 Synthesis Design Flow
3.1.2 Automatic Design Synthesis
3.2 Design for Reconfigurable Systems
3.2.1 Evolution of Design Methodologies for Reconfigurable Systems
3.2.2 Partitioning at Behavioural Level
3.2.3 Partitioning at Register-Transfer Level
3.2.4 Partitioning at Gate Level
3.2.5 Floorplanning
3.3 Solution Feasibility
3.3.1 Synthesis for Full versus Partial Reconfiguration
3.4 Summary
4 Reconfigurable System Synthesis Problem Formulation
4.1 Fundamental Assumptions
4.1.1 Input to Reconfigurable System Synthesis
4.1.2 Design Goal
4.1.3 Target Architectural Model
4.2 Reconfigurable System Design Synthesis Transformations
4.2.1 Behavioural → Architectural Level
4.2.2 Architectural → Physical Level
4.3 Comparison with a Traditional High-Level Synthesis Formulation
4.4 Summary of the Model Features
5 DYNASTY Framework
5.1 Introduction
5.1.1 Architecture
5.1.2 Design Manipulation and Visualisation
5.1.3 Technology Server
5.1.4 Design Simulation
5.1.5 Third-Party Interfaces
5.1.6 Synthesis of Configuration Controllers and Static Design Modules
5.2 Designing with the DYNASTY Framework
5.3 Design Example
5.4 Conclusions
6 Synthesis of Dynamically Reconfigurable Systems with Evolutionary Algorithms
6.1 Restricted Problem for Synthesis of Reconfigurable Systems
6.2 Synthesis Process Overview (Temporal Floorplanning)
6.2.1 Technology Independence
6.3 Optimisation Algorithm Selection
6.4 Genetic Algorithms
6.5 Implementation of an Automatic Reconfigurable System Synthesis
6.5.1 Problem Representation
6.5.2 Population Initialisation
6.5.3 Selection of Genetic Operators
6.5.4 Crossover Operators
6.5.5 Mutation Operators
6.5.6 Overall Synthesis Procedure
6.5.7 Solution Feasibility
6.5.8 Problem-Specific Fitness Function
6.5.9 Selection of a Genetic Algorithm Procedure and Control Parameters
6.5.10 Implementation
6.5.11 Summary
7 Experimental Results
7.1 Benchmark Problems
7.2 Target Technology
7.3 Experimental Procedure
7.3.1 Design Verification
7.3.2 Design Implementation
7.4 Summary of Results
8 Conclusions
8.1 Summary of the Contribution
8.1.1 Applications of the Proposed Approach
8.2 Areas for Improvement and Future Directions
8.2.1 Composite Cost Function
8.2.2 Evaluation with Large and Multi-cycle Modules
8.2.3 Routing Consideration
8.2.4 Architectural-Level Resource Sharing
8.2.5 Register Allocation, Pipelining and Retiming
8.2.6 Summary
Appendix
A Model Reconfigurable Logic Technology
A.1 Architecture
A.1.1 Device Size
A.1.2 Logic Block
A.1.3 Routing Resources
A.2 Configuration Subsystem
A.3 Library Modules
A.4 Support for Design Verification
Glossary
References
List of Figures
1.1 Typical architecture of a configurable logic device.
2.1 Typical architecture of a reconfigurable logic system.
2.2 Propagation of the configuration data through the reconfiguration subsystem.
2.3 Serial configuration data distribution.
2.4 Random-access configuration memory.
2.5 One-to-one versus many-to-one configuration activation.
2.6 Multiple-context configuration memory.
2.7 4-bit subtractor configuration experiment.
2.8 $C_{\mathrm{sub}} = f(\Delta x, \Delta y)$, subtractor module configuration latency $C_{\mathrm{sub}}$ as a function of the offset $(\Delta x, \Delta y)$ against the adder module (8-bit parallel random access configuration interface).
2.9 X-Y cross-sections for the diagram shown in Fig. 2.8 (8-bit parallel random access configuration interface).
2.10 $C_{\mathrm{sub}} = f(\Delta x, \Delta y)$, subtractor module configuration latency $C_{\mathrm{sub}}$ as a function of the offset $(\Delta x, \Delta y)$ against the adder module (32-bit parallel random access configuration interface).
2.11 X-Y cross-sections for the diagram shown in Fig. 2.10 (32-bit parallel random access configuration interface).
2.12 Subtractor module configuration latency $C_{\mathrm{sub}}$ as a function of the offset $(\Delta x, \Delta y)$ against the adder module (serial column-access configuration interface).
2.13 Subtractor module configuration latency $C_{\mathrm{sub}}$ as a function of the offset $(\Delta x, \Delta y)$ against the adder module (multiple-context configuration memory pre-loaded with the configuration data).
3.1 A typical design flow for non-reconfigurable systems.
3.2 Temporal partitioning at behavioural level.
3.3 Temporal partitioning at RTL.
3.4 Temporal partitioning at gate level.
4.1 Example of a Control/Data Flow Graph model with the corresponding behavioural code fragment.
4.2 Transformation of a reconfigurable design during synthesis.
4.3 Example of a feasible schedule in a reconfigurable system.
5.1 DYNASTY Framework architecture.
5.2 Typical DYNASTY session.
5.3 Laplace operator data-flow graph and 3D floorplan after scheduling.
5.4 Laplace operator 3D floorplan and data-flow graph after scheduling.
6.1 Architectural-level resource sharing controlled by an FSM.
6.2 Architectural-level resource sharing with module $a_0$ shared between behavioural computations $b_0$ and $b_1$.
6.3 Relationship between the system and configuration clock signals.
6.4 A Laplace operator mask 3D floorplan and data-flow graph during temporal floorplanning.
6.5 An example of a chromosome coding in a genetic algorithm (1-dimensional binary string). The binary value encoded in the chromosome is linked to the system variable under optimisation.
6.6 An example of a crossover operator (one-point crossover).
6.7 An example of a mutation operator (random 'flip' mutation).
6.8 Reconfigurable system synthesis problem GA representation.
7.1 Laplace operator data-flow graph.
7.2 Differential equation solver data-flow graph.
7.3 Elliptic wave filter data-flow graph.
7.4 An example of a CM-based simulation model used during verification.
7.5 Design schedule example.
7.6 Solution stability over 10 GA-synthesis runs (Laplace operator benchmark, 24 × 24 array, 8-bit parallel random access configuration subsystem).
7.7 Comparison of a manually constructed design solution with a design obtained automatically (Laplace operator benchmark, 24 × 24 array, 8-bit parallel random access configuration subsystem, no configuration cycles are shown). (a)-(b) show the placement of the design modules, (c)-(d) show the design execution schedule.
A.1 XC6200 logic block.
A.2 Model XC6200 technology logic array.
A.3 4-bit adder (a + b): schematic diagram.
A.4 4-bit adder (a + b): detailed layout.
A.5 4-bit subtractor (a - b): schematic diagram.
A.6 4-bit subtractor (a - b): detailed layout.
List of Tables
7.1 Behavioural benchmarks used in the synthesis evaluation.
7.2 Relative module latencies used during the synthesis of examples.
7.3 Synthesis results for an 8-bit parallel random access configuration subsystem (XC6200).
7.4 Synthesis results for a multiple-context configuration subsystem (DPGA).
7.5 Results for an 8-bit parallel random access configuration subsystem (XC6200) optimised by hand.
A.1 A selection of XC6200 library modules used in experiments described in Chapter 7.
A.2 Characteristics for 4-bit and 8-bit adder modules.
A.3 Characteristics for 4-bit and 8-bit subtractor modules.
A.4 Characteristics for 4-bit and 8-bit 'greater than' comparator modules.
A.5 Characteristics for 4 × 4-bit and 8 × 8-bit multiplier modules.
List of Algorithms
6.1 Simple Genetic Algorithm
6.2 Population initialisation
6.3 First-come first-served allocation and binding
6.4 3D floorplan placement
6.5 Module binding crossover
6.6 2D floorplan crossover
6.7 3D floorplan crossover
6.8 Module binding mutation
6.9 2D floorplan mutation
6.10 3D floorplan mutation
6.11 Floorplan 'shaking' mutation
6.12 Overall GA-based synthesis procedure
6.13 3D floorplan correction and reconfiguration latency calculation
List of Abbreviations 
ASIC Application Specific Integrated Circuit 
CAD Computer Aided Design 
CDFG Control/Data Flow Graph 
CDS Configuration and Data Store 
CM Clock Morphing 
CMOS Complementary Metal Oxide Semiconductor
DFG Data Flow Graph 
DRL Dynamically Reconfigurable Logic 
FPGA Field Programmable Gate Array 
FPL Field Programmable Logic 
FSM Finite-State Machine 
FSMD Finite-State Machine Datapath 
GA Genetic Algorithm 
HDL Hardware Description Language 
ILP Integer Linear Program(ming) 
OTP One-Time Programmable 
RCU Reconfiguration Control Unit 
RL Reconfigurable Logic 
RLU Reconfigurable Logic Unit 
RS Reconfigurable System(s) 
RTL Register Transfer Level 
RTOS Real-Time Operating System 
SLU Swappable Logic Unit 
SRAM Static Random Access Memory 
VLSI Very Large Scale Integration 
VHDL VHSIC HDL
VHSIC Very High Speed Integrated Circuit 
WCS Writeable Control Store 
Acknowledgements 
This thesis would not have existed without the efforts of many kind
individuals. It gives me great pleasure to acknowledge their contribution
here.
I would like to thank my first supervisor, Graham Benyon-Tinker, for 
his advice and encouragement, generous support, and dedication to see 
this project through to its successful end. I am very grateful to my second 
supervisor, David Long, for his help, constructive criticism and support on 
this and other related projects. Special thanks to Jim Roach and the rest of 
the management team of the School of Design, Engineering & Computing 
who have provided generous support throughout my studies. 
My special gratitude goes to Djamel Ait-Boudaoud who was the first to 
suggest a research topic in the field of FPGAs, and later offered his support 
and advice as my first supervisor in the early years of my PhD studies. Pro- 
fessor Sa'ad Medhat also provided encouragement and support through- 
out this period. 
I am grateful to my past and present colleagues, Steve Holloway, Petr 
Voles, Radovan Cemes, David Cabanis, Darrell Gibson for their friendship, 
help and the excellent working environment they have provided over the
years. 
Many individuals from the community of scientists, engineers and
industrialists have provided suggestions and guidance and helped to shape the
ideas presented in this thesis. While I cannot name all here, I would like to 
express my gratitude to Patrick Lysaght from the University of Strathclyde and
Wayne Luk from Imperial College for their encouragement and discussions 
on the topics of dynamically reconfigurable logic from the very beginning. 
I am also indebted to Patrick Schaumont and Serge Vernalde at IMEC, for 
their views and suggestions on modelling aspects of reconfigurable sys- 
tems. 
I am grateful to Tom Kean, John Gray, Jason Feinsmith and Patrick Kane 
and the companies they have been representing, namely Xilinx Develop- 
ment Corporation and Xilinx, Inc., for their advice, support and donations, 
which made work on aspects of this research possible. 
Financial support provided during my PhD studies by the UK CVCP 
Overseas Research Award Scheme, the NATO ASI grant and the EC LSF 
programme grant is gratefully acknowledged. 
My work on this research would never have been possible without the
support from my family. I am grateful to my parents for their continuing 
support and encouragement in my pursuit of university studies and
a scientific career. Above all, I am indebted to my wife Maria and
daughters Dominika and Viktoria, who have firmly supported me throughout my
PhD studies and have patiently suffered all the consequences. 
Chapter 1 
Introduction 
Techniques for the design of computing systems have been attracting interest
since the invention of the first computing machines. Early design techniques
relied on the engineering excellence of computing pioneers
and offered little assistance with the laborious procedures involved in the 
construction of even the simplest system components. 
Advances in microelectronic technologies and system integration in the 
second half of the 20th century have moved computing machines from spe- 
cialised research laboratories into everyday-life products. Many thousands 
of designers are developing embedded computing products today in an 
environment very different from that of fifty years ago. 
Short product life cycles and increased competition are forcing design 
teams to minimise the 'time-to-market' of their products. While only a few 
years ago it was common for the development of an embedded computer 
product to take 1-2 years, the current market situation for many products is 
forcing design teams to deliver increasingly complex and powerful systems 
within a few months.
In this market environment, product design techniques and methodologies
are of paramount importance. Computer-aided design tools and
methodologies have helped to automate the laborious and repetitive de- 
sign tasks, while increasingly higher levels of abstraction are being used to 
cope with system complexity. 'Push-button' design methodologies are be- 
ing used to generate large portions of designed systems automatically, thus 
dramatically reducing the design time. Short development cycles demand 
'first-time right' design methodologies which can guarantee the correctness 
of the automatically generated designs and minimise or completely eliminate
design iterations. Objectives such as design time and performance
now often dominate product development, rather than the previously de- 
sired silicon area efficiency. 
Furthermore, many computing systems today are subjected to modifi- 
cations during their product life cycles. These modifications can be caused 
by changes in protocol or interface standards, changes in user requirements,
correction of product faults, and other factors. The interval between such
changes may range from years to months, for example when products need to be
upgraded with a new version of an embedded operating system. Changes may also
be required within seconds or milliseconds, when systems have to react to
requirements changing in real time; for example, where the type of data to be
processed depends on system locality, or where the coding protocol depends on
the type of data being processed.
This flexibility has traditionally been provided via modifications of system
software components, i.e. programs stored in re-writable memories. Such
programs can be modified or completely replaced without the need to phys- 
ically replace the memory devices. However, this 'software' approach to 
design flexibility is limited to processor-based components. While these 
can achieve high performance, timing-critical system components often 
have to be implemented using custom integrated circuits. 
With the commercial availability of reconfigurable logic (RL) devices, 
a similar level of flexibility can be provided in custom high-performance 
circuits. 
The following sections outline some of the benefits and uses of recon- 
figurable logic technology and illustrate further the motivation for the pre- 
sented work. This chapter concludes with a brief summary of the topics 
covered in the thesis. 
1.1 Reconfigurable Logic 
With field-programmable or configurable logic technology it is possible to
construct logic circuits in the field, after the devices have been
manufactured.
A typical configurable logic architecture is composed of logic blocks sur- 
rounded by the routing wires (Fig. 1.1). Boolean functions and storage ele- 
ments implemented in the individual logic blocks, together with the con- 
nectivity of reconfigurable routing elements, can be configured via a set of 
configuration switches. The on/off state of these switches is controlled by 
individual memory cells contained within a device configuration memory. 
The collection of states stored within the configuration memory is called 
a (hardware or device) configuration. It is the type and organisation of the 
configuration memory cells which determine the level of flexibility avail- 
able in these technologies. 
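To make the role of the configuration memory concrete, the following is a minimal Python sketch of a single look-up table of the kind labelled in Fig. 1.1. It assumes a k-input table that stores one configuration bit per input combination (2^k bits in total); the class and function names are hypothetical and do not model the configuration format of any real device. Rewriting the stored bits changes the implemented Boolean function while the "hardware" object stays the same, which is exactly what a re-writable configuration memory permits.

```python
from typing import Callable, List, Sequence


class LookUpTable:
    """Hypothetical k-input look-up table whose function is held in 2**k configuration bits."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.config: List[int] = [0] * (2 ** k)  # one 1-bit memory cell per input combination

    def configure(self, bits: Sequence[int]) -> None:
        """Load a new configuration, i.e. the truth table of the desired function."""
        if len(bits) != 2 ** self.k or any(b not in (0, 1) for b in bits):
            raise ValueError(f"expected {2 ** self.k} configuration bits")
        self.config = list(bits)

    def evaluate(self, inputs: Sequence[int]) -> int:
        """The inputs act as an address into the configuration memory (inputs[0] is the LSB)."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.config[address]


def truth_table(func: Callable[..., int], k: int) -> List[int]:
    """Configuration bits for a k-input Boolean function, listed in LUT address order."""
    return [func(*[(addr >> i) & 1 for i in range(k)]) & 1 for addr in range(2 ** k)]


lut = LookUpTable(2)
lut.configure(truth_table(lambda a, b: a & b, 2))  # configure the block as an AND gate
print(lut.config, lut.evaluate([1, 1]))            # [0, 0, 0, 1] 1
lut.configure(truth_table(lambda a, b: a ^ b, 2))  # reconfigure the same block as XOR
print(lut.config, lut.evaluate([1, 1]))            # [0, 1, 1, 0] 0
```

In a real device the same principle applies to the routing switches: each switch is driven by its own memory cell, so the whole circuit is defined by the contents of the configuration memory.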
For example, one-time programmable (OTP) technologies store hard- 
ware configurations in distributed programmable read-only memories. 
Once a selected configuration is transferred into an OTP device, the
configuration becomes permanent and cannot be modified¹.

Figure 1.1: Typical architecture of a configurable logic device. [Logic blocks, each implementing a configurable Boolean function in a look-up table, are surrounded by configurable routing wires; the look-up tables and routing switches are controlled by 1-bit memory cells.]
Reconfigurable technologies store hardware configurations in a re- 
writable memory. While many different reconfigurable logic technologies 
have been developed, most of the contemporary RL technologies use dis- 
tributed configuration memories based on conventional 5-transistor static 
CMOS memory cells (Trimberger, 1994). In the remainder of this text we 
will refer to this type of RL technology as 'SRAM reconfigurable logic'. 
The suitability of a reconfigurable logic technology for a specific
application is determined by its characteristics, including its architecture,
the granularity of its logic cells, its interconnection structures and the
availability of special architectural blocks (memories, clock generators,
interface cores, etc.). Detailed discussions of various such reconfigurable
architectures can be found in field-programmable logic textbooks (e.g.
Oldfield and Dorf, 1995; Trimberger, 1994).
With SRAM reconfigurable logic it is possible to change the configuration
in a manner similar to how a program stored in the SRAM of a processor
system can be changed. Thus it is possible to load or modify the circuits
implemented in reconfigurable logic.

¹ In some OTP technologies it is possible to change the state of configuration
memory cells if these were left unprogrammed (in their default states). However,
the state of programmed memory cells cannot be changed.
This flexibility of reconfigurable logic has attracted considerable re- 
search interest in recent years. Much of the excitement is focused around 
the promising application of this technology in the following areas: 
• software acceleration
• hardware virtualisation
• fault tolerance
• in-field and remote hardware modification
1.1.1 Software Acceleration 
In his early work, Estrin (1960) highlighted the potential of computer
systems with 'flexible' architectural structures. If the processor architecture 
could be adapted to each computation executed on it, the speed of each 
computation could be improved. 
Later technological advances based on work by Minninck (1964) and 
others led to the development of reconfigurable technologies capable
of supporting these flexible architectural features. The commercial availability
of reconfigurable devices, such as SRAM Field-Programmable Gate Arrays
(FPGAs), in the 1980s allowed the practical implementation of systems
based on Estrin's ideas.
Software acceleration aims to achieve computational speed-up for a 
specific algorithm/program by using a processor with its architecture tai- 
lored to the computational requirements of the input algorithm. The speed- 
up is measured as the ratio of the algorithm execution time on a processor 
designed for a variety of different algorithms (general-purpose processor) to 
the execution time on a processor with its architecture customised for the 
accelerated algorithm (custom processor). 
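Written as a formula, with $T_{\mathrm{gp}}$ and $T_{\mathrm{cp}}$ (notation introduced here for illustration) denoting the execution time of the algorithm on the general-purpose and on the custom processor respectively, the speed-up is as follows; the execution times in the numerical example are hypothetical.

```latex
S = \frac{T_{\mathrm{gp}}}{T_{\mathrm{cp}}}, \qquad
\text{e.g. } T_{\mathrm{gp}} = 2\,\mathrm{s},\; T_{\mathrm{cp}} = 20\,\mathrm{ms}
\;\Rightarrow\; S = \frac{2}{0.02} = 100
```

A value $S > 1$ means the custom processor executes the algorithm faster than the general-purpose one.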
Processor architectures based on reconfigurable logic technologies can 
be customised for a large variety of algorithms. Some impressive speed-up 
results have been demonstrated for many algorithms running on reconfig- 
urable computing machines. The early Splash-I system (a 32 FPGA linear 
array accelerator) was reported to outperform a standard general-purpose 
processor in a Sun3 workstation by a factor of 2700, while a single Cray-2 
processor was outperformed by a factor of up to 300 (Gokhale et al., 1991). 
High speed-up figures have been demonstrated for many specialised
algorithms; however, the architectural limitations of custom-computer
platforms and the inherent limits on computational speed-up (Amdahl's law;
Amdahl, 1967) prevent significant speed-ups from being achieved for many
other algorithms (Albaharna et al., 1994; Guccione, 1995). Speed-up figures
of the order of tens to hundreds are routinely reported in the custom
computing community.
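Amdahl's law explains why kernel-level figures such as the factor of 2700 quoted above rarely translate into whole-application speed-up. In its classic form, if a fraction f of the original execution time is moved to the custom processor and runs k times faster while the remainder runs unchanged, the overall speed-up is 1 / ((1 - f) + f/k), which can never exceed 1 / (1 - f). The Python sketch below simply evaluates this expression; the function name is illustrative.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speed-up when a fraction f of the run time is accelerated k times (Amdahl's law)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if k <= 0.0:
        raise ValueError("k must be positive")
    # The unaccelerated part (1 - f) keeps its original time; the accelerated part shrinks by k.
    return 1.0 / ((1.0 - f) + f / k)


# How a 2700-fold kernel acceleration translates into overall speed-up
# for different accelerated fractions of the execution time.
for f in (0.5, 0.9, 0.99, 1.0):
    print(f"f = {f:4}: overall speed-up {amdahl_speedup(f, 2700.0):8.2f}")
```

With f = 0.9, for instance, even a 2700-fold acceleration of the kernel yields an overall speed-up just below 10, because the remaining 10% of the original execution time is not accelerated.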
1.1.2 Hardware Virtualisation 
With reconfigurable logic it is possible to implement designs with sizes 
larger than the available hardware logic resources through hardware re- 
configuration. This concept of 'hardware virtualisation' is similar to the 
concept of virtual memory available in most modern operating systems 
(Silberschatz and Galvin, 1998). Hardware virtualisation is also known as 
logic 'time-sharing' (Lautzenheiser, 1986). 
The maximal size of a program which can be executed on a given pro- 
cessor is limited only by external factors, such as program storage size. 
Similarly, the size of a hardware circuit implemented on a reconfigurable 
logic device is limited only by the size of storage for RL configurations. 
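The idea of time-sharing can be illustrated with a deliberately simplified Python sketch. It assumes, hypothetically, that a design is a sequence of operations executed in order, each with a known area in logic cells, and it greedily packs consecutive operations into configurations no larger than the device; each configuration would then be loaded and executed in turn. Data dependencies, storage of intermediate results and reconfiguration time are ignored, so this only illustrates the principle of time-sharing rather than any particular synthesis method.

```python
from typing import List, Sequence, Tuple


def partition_into_configurations(
    operations: Sequence[Tuple[str, int]], device_capacity: int
) -> List[List[str]]:
    """Greedily split ordered (name, area) operations into configurations that fit the device."""
    configurations: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, area in operations:
        if area > device_capacity:
            raise ValueError(f"{name} needs {area} cells; the device has {device_capacity}")
        if used + area > device_capacity:
            configurations.append(current)  # device is full: start the next configuration
            current, used = [], 0
        current.append(name)
        used += area
    if current:
        configurations.append(current)
    return configurations


# A hypothetical 140-cell design time-shared on a 64-cell device.
design = [("mul0", 40), ("mul1", 40), ("add0", 10), ("mul2", 40), ("add1", 10)]
print(partition_into_configurations(design, 64))
# [['mul0'], ['mul1', 'add0'], ['mul2', 'add1']]
```

Only the configuration storage, not the device, has to hold all three configurations, mirroring the analogy with virtual memory drawn above.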
Hardware virtualisation has been explored for the implementation of 
high-performance applications of large sizes. Several implementations of 
neural networks on small reconfigurable logic devices have been
demonstrated (e.g. Lysaght et al., 1994; Eldredge and Hutchings, 1994). In this
approach, the computations in the neural networks are evaluated in small 
incremental configurations. Jones and Lewis (1995) have demonstrated a 
system capable of emulating large logic circuits partitioned over a number 
of configurations on a reconfigurable FPGA emulation platform. 
1.1.3 Fault Tolerance 
Early pioneers of computer science anticipated difficulties with the
operation of complex computing machines. In his theoretical work, von
Neumann (1966) argued that complex computing machines would have to tolerate
unreliability of their individual components as a normal part of their oper- 
ation. He discussed self-repair and self-reproduction mechanisms, which 
could improve the reliability of complex systems. 
With reconfigurable logic, the structure and connectivity of imple- 
mented circuits can be changed during the life-time of a reconfigurable 
system. Provided that it is possible to identify the failure of a circuit compo- 
nent, the functionality of such a component can be 'replaced' via hardware 
reconfiguration. This implies that a significant amount of spare reconfigurable
resources must be available in the reconfigurable logic for such
a system to continue to function correctly. Such redundancy, however, is 
commonly incorporated in the design of systems demanding high reliabil- 
ity (Lala, 2000), although not at such a fine-grained level. 
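A minimal Python sketch of the replacement step follows. It assumes a hypothetical device modelled as a grid of identical logic blocks, modules with rectangular footprints that can be relocated anywhere on the grid, and a fault that has already been located; the functions simply search for a spare region that avoids both faulty blocks and blocks used by other modules, into which the failed module would then be reconfigured. Real devices impose further constraints (routing, heterogeneous blocks, timing) that are ignored here, and the function names are illustrative only.

```python
from typing import Optional, Set, Tuple

Cell = Tuple[int, int]  # (x, y) coordinate of a logic block


def footprint(x: int, y: int, w: int, h: int) -> Set[Cell]:
    """Logic blocks covered by a w x h module whose lower-left corner is at (x, y)."""
    return {(x + dx, y + dy) for dx in range(w) for dy in range(h)}


def find_spare_region(width: int, height: int, w: int, h: int,
                      occupied: Set[Cell], faulty: Set[Cell]) -> Optional[Cell]:
    """First position where a w x h module avoids faulty and occupied blocks, or None."""
    blocked = occupied | faulty
    for x in range(width - w + 1):
        for y in range(height - h + 1):
            if not footprint(x, y, w, h) & blocked:
                return (x, y)
    return None  # not enough spare resources: the fault cannot be tolerated


# 8 x 8 array: a 4 x 4 module at (0, 0) suffers a fault at block (1, 2).
# Another module occupies the lower right quarter; the failed module's own
# healthy blocks are not marked as occupied, so they may be reused.
other_modules = footprint(4, 0, 4, 4)
print(find_spare_region(8, 8, 4, 4, occupied=other_modules, faulty={(1, 2)}))
# (0, 3)
```

The search returns None when the spare resources are insufficient, which corresponds to the requirement above that enough redundant reconfigurable logic must be available.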
Successful implementation of 'defect-tolerant' systems based on recon- 
figurable logic was demonstrated by Culbertson et al. (1997) and others. 
The feasibility of self-reproduction and self-repair mechanisms in a
reconfigurable logic system was demonstrated with a purpose-built
dynamically reconfigurable logic system (Mange et al., 1995).
The applications of systems based on the above principles range from 
mission-critical or life-support systems to computing machines based 
on unreliable computational structures (e.g. chemical nanostructures
(Heath et al., 1998) or biomolecular technologies (McCaskill and Wagler, 
2000)). 
1.1.4 In-Field and Remote Hardware Modification 
The ability to modify reconfigurable logic designs allows errors to be
corrected and new features to be added after the product has been released.
These updates can be performed in the field and even remotely over a
distributed network. Such features become vital in an environment where
short time-to-market and short product life cycles force application devel- 
opers to accommodate new features late in the development cycle or during 
the product life-time (Kean, 1999). 
Slow standardisation efforts fail to keep up with the customer demand 
for new products. This situation is forcing manufacturers to develop prod- 
ucts with incomplete draft standards and then update the final products to 
comply with the final standard documents. 
The above features of reconfigurable logic have gained much importance
with the recent development of highly complex, single-chip systems. In
these systems, reconfigurable logic will provide the flexibility required
to cope with uncertainties in product design, reliability, standards or
feature demand.
Remote changes of reconfigurable logic circuits will provide benefits to 
many systems requiring hardware changes during their life cycle. These 
systems include reconfigurable communication systems, consumer electronics
products (e.g. digital TV), automotive systems and others. Another promising
application area is systems using hardware reconfiguration in remote and
harsh environments (Brebner and Bergmann, 1999).
Although the above features of reconfigurable logic have gained con- 
siderable interest only recently, this concept has been known to processor 
architects for many years. Early implementations of IBM 370 and DEC VAX 
computer architectures developed in the 1970s utilised a Writeable Control
Store (WCS), an SRAM which could hold custom microcode (Hennessy
and Patterson, 1990; Edwards, 2000). Through WCS it was possible to
'replace' faulty instructions with new ones, thus allowing corrections of
hardware bugs after the entire computer system was manufactured. Simi- 
lar features can be found in many contemporary processors today. 
1.2 Reconfigurable Logic in Real-World Applications 
Recent technological advances in the area of reconfigurable logic have produced technologies capable of integrating designs with over one million gates operating at system clock frequencies of hundreds of megahertz. By contrast, other ASIC technologies (gate arrays, standard cells, full-custom) can provide an order of magnitude better performance and capacity. Despite these limitations, reconfigurable logic technologies are becoming increasingly popular for design implementation in low to medium volumes. However, to date, applications using hardware reconfiguration in practical, real-world scenarios are rare.
Let us examine two principal difficulties which contemporary users of reconfigurable logic are confronted with:
• Limited reconfiguration performance. The reconfiguration latency overhead of current reconfigurable technologies is prohibitive for many practical applications. While the system clock period of contemporary general-purpose processors is falling below 1 ns, the reconfiguration time for typical commercial devices remains on the order of tens of microseconds or more.
Furthermore, the flexibility gained from reconfigurable technology can often be outweighed by the advantages of alternative approaches based on microprocessors or non-reconfigurable technologies. Several technological approaches have been proposed to reduce the reconfiguration overheads (e.g. (Brown et al., 1994; Vasilko and Ait-Boudaoud, 1996b)); however, a practical commercial implementation of these technologies is yet to be demonstrated.
Chapter 2 discusses the various approaches used for the construction of reconfigurable logic devices in more detail.
• Lack of suitable CAD tools and methodologies. Most of the design tools and methodologies available for reconfigurable systems are based around static (i.e. non-reconfigurable) design flows.
Using such design flows for the design of reconfigurable systems can lead to a large number of design iterations, because reconfiguration overhead metrics cannot be accommodated early in the design flow. Automation of this process has been limited by its computational complexity and by the difficulty of estimating low-level design metrics at higher abstraction levels. A more detailed discussion of automatic design for reconfigurable logic, and a review of past work in this area, is provided in Chapter 3.
1.3 This Thesis 
This thesis addresses the need for an automatic design flow for dynami- 
cally reconfigurable systems. A new approach is proposed, which allows 
simultaneous searching through the design space of a reconfigurable sys- 
tem at behavioural and layout levels, while attempting to satisfy the overall 
design constraints. 
The presented approach uses a new model for the process of reconfigurable system synthesis, which permits technology-dependent and solution-dependent characteristics (such as position-dependent configuration latency) to be calculated and inserted into the model during the solution search. Therefore, realistic estimates of reconfiguration overheads can be considered during the search. This approach guarantees that the automatically produced design solutions are feasible, i.e. they can be implemented using the target reconfigurable device without resource or dependency conflicts.
This thesis is organised as follows. Chapter 2 provides background on reconfigurable systems, with particular emphasis on reconfigurable logic technologies. The chapter examines the characteristics of several technological approaches used for the construction of reconfigurable logic devices in order to demonstrate how technology-specific features complicate reconfigurable system design. Chapter 3 outlines the various techniques used for the design of reconfigurable systems, summarises the previous work significant to the area of automatic synthesis for reconfigurable systems and highlights some of
the shortcomings of the previously proposed approaches. Chapter 4 intro- 
duces a new model for the process of synthesising a design for a reconfig- 
urable system. The model provides the formalism necessary for the discus- 
sion of the design techniques presented in the following chapters. Chap- 
ter 5 presents a new experimental CAD framework developed to support 
the work presented in this thesis. Chapter 6 presents an automatic synthe- 
sis technique for a restricted instance of this problem based on the use of 
genetic algorithms. The qualities of this approach have been tested using a 
number of benchmark problems. These experimental results are presented 
in Chapter 7. The contribution of this thesis is summarised in Chapter 8, 
which also provides suggestions for future work in this area. 
Chapter 2 
Reconfigurable Systems: 
Background 
Reconfigurable systems combine features which have previously existed 
individually in either software or hardware systems. Reconfigurable sys- 
tems provide flexibility similar to that of processor-based software systems, 
while their performance is close to custom hardware circuits. Reconfig- 
urable systems are therefore often viewed as a 'missing link' between soft- 
ware and hardware systems. 
The previous chapter has discussed the historical motivation for the de- 
velopment of reconfigurable logic systems. This chapter provides a back- 
ground for reconfigurable systems and reconfigurable logic technologies. 
The design difficulties resulting from the availability of different techno- 
logical approaches for supporting reconfiguration are discussed using a 
simple example at the end of this chapter. 
[Figure: block diagram of a RECONFIGURABLE SYSTEM comprising the Reconfigurable Logic Unit (RLU), the Reconfigurable Control Unit (RCU) and the Configuration and Data Store (CDS); links are labelled 'configuration synchronisation', 'configuration data and RLU state' and 'application-specific data'.]
Figure 2.1: Typical architecture of a reconfigurable logic system.
2.1 Typical Architecture of a Reconfigurable System 
A variety of reconfigurable systems and applications have been demonstrated, including those exploiting the less traditional concepts of self-reconfiguration (e.g. (French and Taylor, 1993; Sidhu et al., 1999; McGregor and Lysaght, 1999)) and hardware evolution (e.g. (Thompson, 1996; Tangen, 2000)). The majority of these systems are built around the typical architecture shown in Fig. 2.1.
A typical reconfigurable system architecture includes:
• The Reconfigurable Logic Unit (RLU), which contains reconfigurable logic and routing resources, but may also provide memory blocks, configurable clock generators, etc. The state of all RLU resources can be accessed via the RLU's configuration interface. In many contemporary systems, the RLU is represented by a single FPGA device, while the FPGA configuration interface provides access to its configuration memory.
• The Reconfiguration Controller Unit (RCU), which manages the RLU reconfiguration. The RCU typically allows loading and retrieval of the RLU resource data states and configurations, although the availability of these features depends on the RLU-specific capabilities.
The RCU can be implemented as a hardware controller or in a processor. It can operate either as a unit dedicated to the task of configuration control, or its functionality can be provided by units shared with other system functions (e.g. a processor running a real-time operating system (RTOS)). Furthermore, the RCU may be integrated within the RLU. In such a scenario, the RCU can either control the RLU's reconfiguration via a single configuration interface or provide a distributed reconfiguration control mechanism.
• The Configuration and Data Store (CDS), which provides memory storage for both the RLU configuration and state data, and application-specific data.
There are many options for the CDS implementation, including a sin- 
gle memory block, two memory blocks (one for configuration data 
and one for application data), hierarchical memory structures, and 
others. 
A reconfigurable system can operate either as a self-contained unit or as part of a larger system. The complexity of the individual units and of the overall operation will vary greatly with the application requirements.
Simple reconfigurable systems may implement the RLU on a single FPGA device and the RCU as a dedicated configuration controller with an address counter and control generator, while the CDS stores the RLU configurations in a read-only memory. Reconfiguration of such a system can be initiated by an external control signal, rarely or frequently, depending on the application.
Complex reconfigurable systems may be part of a larger real-time system. They may implement the RLU using several FPGA devices and field-programmable crossbar switches, and the RCU on a processor running a real-time operating system, which manages reconfiguration initiated by real-time events while also servicing other system functions. The CDS may be placed in shared system memory, with the system's RTOS responsible for managing memory sharing between system components.
2.2 Reconfigurable Logic Technology 
Several technological approaches are available for the implementation of the RLU discussed in the previous section. The technological features of reconfigurable technologies can be separated into two main categories:
• Architectural features:
- logic block functionality and granularity
- structure of routing resources
- architectural organisation of logic and routing
- availability of application-specific architectural components (fast local memory, fast carry-chains, etc.)
- availability, characteristics and number of external connections
• Configuration capabilities:
- organisation of configuration memory and data
- reconfiguration interface throughput and characteristics
Reconfigurable logic has evolved from technologies developed for FPGAs. Therefore, the architectural tradeoffs for the implementation of reconfigurable logic are similar to those of FPGAs. A detailed discussion of the architectural tradeoffs for FPGA architectures can be found in a number of FPGA technology textbooks (e.g. (Trimberger, 1994; Oldfield and Dorf, 1995)).
A number of techniques have been developed to provide configuration 
interfaces optimised for reconfigurable logic. The main techniques are sum- 
marised in the following section. 
2.2.1 Support for Reconfiguration 
This section summarises the main techniques used in reconfigurable tech- 
nologies for the construction of circuits supporting reconfiguration. The 
performance of reconfiguration for a specific type of reconfigurable tech- 
nology is determined by the performance of its individual components: 
1. External configuration interface, which transfers configuration data from external sources (e.g. the CDS in Fig. 2.1) to the internal structures of a reconfigurable device (Fig. 2.2(a)).
2. Configuration data distribution network, which transports the con- 
figuration data to the individual configuration memory cells (Fig. 2.2(b)). 
3. Configuration activation scheme, which activates the configuration 
data by connecting the selected configuration memory cells with the 
configurable components of a reconfigurable system (Fig. 2.2(c)). 
[Figure: three panels showing configuration data moving from the configuration interface into an array of logic blocks: (a) external transfer of configuration data through the configuration interface; (b) distribution to local configuration stores; (c) activation of the configuration in the logic blocks.]
Figure 2.2: Propagation of the configuration data through the reconfiguration subsystem.
Although the following discussion concerns the implementation of these components in FPGA technologies, the same considerations apply generally to the implementation of the RLU, regardless of whether the RLU contains a single FPGA device or a combination of FPGA and other reconfigurable technologies.
The configuration performance depends on the throughput of all of the above components. Unbalanced configuration subsystems require configuration buffering between components with different throughput. The overall bandwidth of a reconfiguration subsystem is limited by the throughput of its slowest component.
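The bottleneck argument can be made concrete with a minimal Python sketch. It assumes that each component can be characterised by a single sustained throughput in bits per configuration cycle and that adequate buffering exists between components. The names `Stage`, `subsystem_bandwidth`, `bottleneck` and `config_cycles`, and all throughput figures, are hypothetical and do not describe any particular device.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    bits_per_cycle: int  # sustained throughput of this component (hypothetical unit)


def subsystem_bandwidth(stages):
    """The chain as a whole can sustain no more than its slowest stage."""
    return min(s.bits_per_cycle for s in stages)


def bottleneck(stages):
    """Return the stage that limits the whole reconfiguration subsystem."""
    return min(stages, key=lambda s: s.bits_per_cycle)


def config_cycles(config_bits, stages):
    """Cycles needed to stream config_bits through the chain (ceiling division).

    Pipeline fill latency is ignored and buffering between stages is assumed.
    """
    bw = subsystem_bandwidth(stages)
    return -(-config_bits // bw)


# Hypothetical throughputs for the three components discussed above
stages = [
    Stage("external interface", 8),
    Stage("distribution network", 32),
    Stage("activation", 64),
]
print(bottleneck(stages).name)      # external interface
print(config_cycles(1000, stages))  # 125
```

In this example, raising the throughput of the distribution or activation stages would not change the result; only the external interface limits it.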
2.2.2 Configuration Interface 
The connection of the configuration interface to the external world determines the speed at which the configuration data can be transferred to the reconfigurable device. A high-throughput parallel configuration interface is well suited for fast configuration transfers. However, it requires a number of device I/O ports to be assigned to this function. In implementations where external ports are in demand or where fast reconfiguration is not required, a parallel interface may be undesirable. In such cases, a serial configuration interface provides a low-throughput, low-cost alternative. Most contemporary reconfigurable devices provide a dual parallel/serial interface, allowing users to select the interface best suited to the target application.
After the configuration data has been received from an external source, 
the configuration interface circuitry will arrange the data in the format suit- 
able for distribution within the reconfigurable logic array. 
2.2.3 Configuration Data Distribution 
Similar tradeoffs between serial and parallel access apply for the distribu- 
tion of the configuration data in the reconfigurable logic array. The follow- 
ing are two examples of common serial and parallel distribution mecha- 
nisms. 
2.2.3.1 Serial distribution 
Early commercial reconfigurable logic devices were designed to be configured only at system power-up. This scenario required a simple configuration interface with minimal area and pin overhead in the reconfigurable logic array.
[Figure: configurable logic array of logic blocks with corners (0,0), (9,0), (0,9) and (9,9), backed by a serial configuration memory into which serial configuration data (000101...00010111) is shifted.]
Figure 2.3: Serial configuration data distribution.
Serial configuration data distribution (Fig. 2.3) was used in these technologies. Typically, the configuration memory is arranged as a long shift register. In order to program a device, the configuration data is shifted bit by bit from the configuration interface into the configuration store. The device is activated for normal operation once the entire configuration memory has been filled with configuration data. The Xilinx XC2000/3000/4000 (Xilinx, 1994) and Altera FLEX 8k/10k (Altera, 1995) FPGA families are examples of technologies with serial distribution of the configuration data.
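The shift-register organisation can be sketched as follows. This is a simplified, hypothetical model (`SerialConfigMemory` is not a vendor API) in which the whole configuration memory forms a single chain and one bit is shifted in per cycle. It shows that a full configuration always takes as many cycles as there are configuration bits, and that the first bit shifted in ends up at the far end of the chain.

```python
from collections import deque


class SerialConfigMemory:
    """Configuration memory modelled as a single shift register (simplified sketch)."""

    def __init__(self, n_bits):
        # Index 0 sits next to the configuration interface; index n_bits-1 is the far end.
        self.cells = deque([0] * n_bits, maxlen=n_bits)
        self.cycles = 0

    def shift_in(self, bit):
        # One bit enters at the interface end and every stored bit moves one
        # position further; the bit at the far end is discarded (deque maxlen).
        self.cells.appendleft(bit)
        self.cycles += 1

    def load(self, bitstream):
        """Shift a whole bitstream in and return the cycle count so far."""
        for bit in bitstream:
            self.shift_in(bit)
        return self.cycles

    def ready(self):
        # In this model the device may be activated only once every cell was filled.
        return self.cycles >= len(self.cells)


mem = SerialConfigMemory(8)
print(mem.load([0, 0, 0, 1, 0, 1, 1, 1]))  # 8
print(list(mem.cells))                     # [1, 1, 1, 0, 1, 0, 0, 0]
```

Because the stream arrives reversed in the chain, a cell far from the interface (such as the block at (0,9) in Fig. 2.3) can only be set after data has passed every cell in front of it.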
In order to improve the throughput of the configuration interface these 
families also provide parallel access to their configuration interfaces. These 
configuration interfaces internally reformat the parallel configuration data 
into a serial configuration 'bitstream'. 
Early implementations of configuration distribution mechanisms required the entire configuration memory to be filled with configuration data at each reconfiguration before the new configuration could be used. The term full reconfiguration is used to denote this type of reconfiguration.
Later configuration distribution mechanisms have allowed only a portion of the configuration memory to be modified, while the rest of the system remains unchanged. This type of reconfiguration is called partial reconfiguration.
The term dynamic reconfiguration refers to cases where reconfiguration can be performed while the system remains in operation. In this text, the term 'dynamic reconfiguration' denotes a temporal quality of a reconfigurable system, while the terms 'partial/full reconfiguration' denote its spatial quality¹.
The Atmel AT6000 (Atmel, 1994) is an example of an FPGA technology with serial configuration data distribution which supports both partial and dynamic reconfiguration².
The main drawback of the serial configuration data distribution mechanism is its low configuration data throughput. Although partial reconfiguration can reduce the average configuration time, in the worst case it is still necessary to shift the configuration data through the entire array (e.g. consider the configuration of the block at position (0,9) in Fig. 2.3).
On the other hand, this type of configuration distribution mechanism is simple to implement and does not incur a large area overhead. It provides sufficient performance and flexibility for many applications which do not require rapid reconfiguration.
2.2.3.2 Parallel distribution 
With the development of reconfigurable computing, it became desirable to provide a faster and more direct method of accessing configuration data and the RLU state (e.g. from a coupled microprocessor). A configuration distribution mechanism allowing random access to the RLU configuration memory was developed in response to these needs (introduced by Kean (1988), prototyped in the Algotronix CAL1024 device (Algotronix, 1991) and later improved in the Xilinx XC6200 FPGA family (Churcher et al., 1995; Xilinx, 1997b)).
¹ The use of this terminology is summarised in the Glossary on page 161.
² Atmel use the term Cache Logic™ to refer to this feature.
[Figure: configurable logic array of logic blocks with corners (0,0), (9,0), (0,9) and (9,9), backed by a random-access configuration memory accessed through a column address decoder and a RAM interface with address, data and control lines.]
Figure 2.4: Random-access configuration memory.
This is an example of a parallel configuration distribution mechanism, 
where the internal configuration memory is organised in a structure simi- 
lar to that of a conventional single-port RAM (Fig. 2.4). The configuration 
data and the RLU state can be set and retrieved through addressing the 
appropriate location in a random-access configuration memory. 
Through this configuration distribution mechanism, the configuration 
memory appears as an ordinary RAM, which can be easily interfaced to 
standard processor-based systems. 
The main advantage of this approach is that individual words in the configuration memory can be accessed quickly at any address (e.g. in Xilinx XC6200 technology, a 32-bit configuration data word can be written into the configuration memory within a 30 ns write cycle). This approach also provides a partial reconfiguration capability.
The drawback of this approach is the increased reconfigurable logic array area needed for the configuration address/data routing and for the logic supporting random memory access.
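From the RCU's point of view, a random-access configuration memory behaves like an ordinary addressable memory. The sketch below is a hypothetical model, not the XC6200 programming interface: only the 32-bit word width and the 30 ns write cycle are taken from the text above, while the memory size and the names `RandomAccessConfigMemory`, `write`, `read` and `reconfigure` are assumptions. Partial reconfiguration is modelled by writing only the words whose contents change.

```python
class RandomAccessConfigMemory:
    """Configuration memory organised like a single-port RAM (simplified sketch)."""

    def __init__(self, n_words, word_bits=32, write_cycle_ns=30):
        self.words = [0] * n_words
        self.mask = (1 << word_bits) - 1
        self.write_cycle_ns = write_cycle_ns
        self.writes = 0

    def write(self, addr, value):
        # One addressed write cycle updates one whole configuration word.
        self.words[addr] = value & self.mask
        self.writes += 1

    def read(self, addr):
        # Read-back exposes configuration data (and, in this model, RLU state).
        return self.words[addr]

    def reconfigure(self, new_words):
        """Partial reconfiguration: write only the addresses whose contents change."""
        for addr, value in new_words.items():
            value &= self.mask
            if self.words[addr] != value:
                self.write(addr, value)

    def elapsed_ns(self):
        return self.writes * self.write_cycle_ns


mem = RandomAccessConfigMemory(n_words=1024)
mem.reconfigure({0: 0xA, 1: 0xB, 5: 0x0})  # word 5 already holds 0, so it is skipped
print(mem.writes, mem.elapsed_ns())        # 2 60
```

The reconfiguration time in this model depends only on the number of words that differ, not on their distance from the interface, which is the key contrast with the serial scheme.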
2.2.4 Configuration Activation 
A configuration activation scheme determines how the distributed config- 
uration data is transferred to the programmable switches within the recon- 
figurable array. The following are examples of two approaches used for the 
activation of the configuration data: 
2.2.4.1 Direct one-to-one activation 
Traditional reconfigurable logic devices provide one configuration mem- 
ory cell for each configurable switch in the array (Fig. 2.5(a)). In this case, 
the configuration memory cell is directly connected to the configurable 
switch. The new configuration is activated immediately after the configuration memory cell has been written with the new data. Examples of reconfigurable devices with direct one-to-one activation include the Xilinx XC2000/3000/4000 (Xilinx, 1994), XC6200 (Xilinx, 1997b) and Altera FLEX 8k/10k (Altera, 1995) FPGA families.
[Figure: (a) one-to-one: each configuration switch is driven by a single 1-bit memory cell; (b) many-to-one (n-to-1): each configuration switch is driven by a local configuration memory of n 1-bit memory cells (0 to n-1), one of which is selected.]
Figure 2.5: One-to-one versus many-to-one configuration activation.
2.2.4.2 Many-to-one activation 
Configuration subsystems connecting several configuration cells to one configuration switch were proposed to accelerate reconfiguration (e.g. (Brown et al., 1994; Trimberger et al., 1997)). These systems
provide a multiple-context configuration memory, which contains more than 
one 'layer' of the configuration data storage (Fig. 2.6). Once the configura- 
tion data has been pre-loaded into the context layers, the configuration can 
be activated by selecting configuration cells in one of the memory layers 
(Fig. 2.5(b)). This process is often referred to as a 'context switch' because 
of its similarities to CPU context switching in multi-process operating sys- 
tems. 
The main advantage of this system is the speed of reconfiguration. Provided that all configuration memory contexts can be pre-loaded with the desired configurations, the configuration of the entire device can be changed within a few nanoseconds to a few tens of nanoseconds (e.g. 3 ns in the MIT DELTA prototype (DeHon, 1996) and 30 ns in the Xilinx Time-Multiplexed FPGA (Trimberger et al., 1997)).
[Figure: multiple-context configuration memory with layers context 0 to context n-1 stacked above a reconfigurable logic array of logic blocks with corners (0,0), (9,0) and (9,9); one layer forms the active configuration context.]
Figure 2.6: Multiple-context configuration memory.
The main drawbacks of this approach are a large silicon area overhead 
required for the configuration context memory and its control signals, longer 
routing delays due to the increased spacing between the logic blocks and 
unpredictable power consumption during reconfiguration (Trimberger et al., 
1997). 
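The many-to-one scheme can be sketched as a set of configuration layers with a single active-context pointer. In this hypothetical model, the names, the 8 bits-per-cycle preload rate and the rule that only inactive contexts may be preloaded are all assumptions. Preloading uses the slow external path, while activation costs a single cycle regardless of array size or module placement.

```python
class MultiContextConfig:
    """n-to-1 activation with several configuration layers (simplified sketch)."""

    def __init__(self, n_contexts, n_switches):
        self.n_switches = n_switches
        self.contexts = [[0] * n_switches for _ in range(n_contexts)]
        self.active = 0
        self.load_cycles = 0    # slow path: external configuration interface
        self.switch_cycles = 0  # fast path: context selection

    def preload(self, ctx, bits, bits_per_cycle=8):
        """Fill an inactive layer through the external interface."""
        if ctx == self.active:
            raise ValueError("this model only preloads inactive contexts")
        if len(bits) != self.n_switches:
            raise ValueError("one bit per configurable switch expected")
        self.contexts[ctx] = list(bits)
        self.load_cycles += -(-len(bits) // bits_per_cycle)  # ceiling division

    def switch(self, ctx):
        """Context switch: every switch now takes its setting from layer ctx."""
        self.active = ctx
        self.switch_cycles += 1

    def switch_settings(self):
        return self.contexts[self.active]


dev = MultiContextConfig(n_contexts=4, n_switches=16)
dev.preload(1, [1] * 16)  # 2 cycles at 8 bits/cycle, while context 0 keeps running
dev.switch(1)             # 1 cycle, independent of array size or placement
print(dev.load_cycles, dev.switch_cycles)  # 2 1
```

The extra storage per switch in `contexts` is the software analogue of the silicon area overhead noted above.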
2.3 Reconfiguration Latency
The various reconfiguration mechanisms provide a tradeoff between the configuration throughput and the area overhead required for the implementation of a reconfiguration subsystem. In particular, the time needed to configure a design module will vary with the technology. For partially reconfigurable technologies, the required configuration time will also vary with the current contents of the configuration memory, and therefore depends on the placement of modules in the design solution and on their mapping to primitive device elements. This is demonstrated using the following example.
Consider a reconfigurable logic array of size 31 x 31 with an architec- 
ture similar to that of the Xilinx XC6200 FPGA family (Xilinx, 1997b). The 
reconfigurable logic array has a fine-grained architecture and offers several 
different configuration modes. The relevant details of this technology and 
the modules used in this experiment are summarised in Appendix A. 
In the example, first the content of the configuration memory was cleared, 
and then a 4-bit adder was configured at coordinate (0,0). We will examine 
the configuration latency required for the configuration of a 4-bit subtrac- 
tor onto this array at different coordinates within a 17 x 17 square area 
(Fig. 2.7). The reconfiguration overhead was measured as the minimal number of configuration cycles required to complete the configuration of the entire 4-bit subtractor (Csub).
Reconfiguration overhead routines calculate the differences between the two configurations (i.e. an empty XC6200 device plus the adder configuration versus the subtractor configuration) for each of the modelled XC6200 configuration subsystems, while accommodating the addressing restrictions of their respective configuration interfaces. The configuration differencing routine compares the configuration of XC6200 primitive elements (i.e. logic and routing multiplexors), while ignoring the elements which are not used in the subtractor module circuit.
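The differencing step can be illustrated with a deliberately simplified model. This is an assumption-laden sketch, not the DYNASTY routine: each array cell holds one integer configuration value, cleared memory reads as 0, cells not used by the new module are ignored, and each configuration word is assumed to cover `cells_per_word` vertically adjacent cells within one column. The function names `place` and `reconfig_cycles` and the module contents are hypothetical.

```python
def place(module, dx, dy):
    """Translate a module's per-cell configuration values by the offset (dx, dy)."""
    return {(x + dx, y + dy): v for (x, y), v in module.items()}


def reconfig_cycles(current, new, cells_per_word):
    """Number of word writes needed to turn `current` into `new`.

    Only cells used by `new` are compared; all other cells are don't-care.
    A differing cell forces a write of the whole word containing it, with
    words aligned within a column: word index = (x, y // cells_per_word).
    """
    dirty_words = set()
    for (x, y), v in new.items():
        if current.get((x, y), 0) != v:
            dirty_words.add((x, y // cells_per_word))
    return len(dirty_words)


# Hypothetical one-column modules that differ only in the cell at (0, 3)
adder = {(0, 0): 1, (0, 1): 2, (0, 2): 3, (0, 3): 4}
subtractor = {(0, 0): 1, (0, 1): 2, (0, 2): 3, (0, 3): 5}
current = place(adder, 0, 0)

for dx, dy in [(0, 0), (0, 2), (1, 0), (1, 2)]:
    print((dx, dy), reconfig_cycles(current, place(subtractor, dx, dy), cells_per_word=4))
# (0, 0) 1   only one word differs from the adder
# (0, 2) 2   overlaps the adder but straddles two words
# (1, 0) 1   empty column, word-aligned
# (1, 2) 2   empty column, misaligned
```

Even this toy model reproduces two effects reported below: the cost depends on the overlap with the existing configuration, and it also depends on word alignment when the target area is empty.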
[Figure: placement of the 4-bit subtractor at offsets Δx and Δy within the 17 x 17 test area.]
Figure 2.7: 4-bit subtractor configuration experiment.
The entire experiment was implemented within the DYNASTY Framework (Chapter 5). Reconfiguration using the following configuration subsystems was examined:
1. Partial reconfiguration via an 8-bit parallel random access configuration interface: Figs 2.8 and 2.9 show Csub as a function of the offset from the base coordinate (0,0). Due to similarities between the adder and subtractor modules, the minimal reconfiguration latency occurs at offset (0,0). Depending on the subtractor module position, the configuration latency varies between 15 and 54 configuration cycles.
[Figure: surface plot of Csub (axis 15 to 55 configuration cycles) over offsets Δx and Δy, each from -8 to 8.]
Figure 2.8: Csub = f(Δx, Δy), subtractor module configuration latency Csub as a function of the offset (Δx, Δy) against the adder module (8-bit parallel random access configuration interface).
2. Partial reconfiguration via a 32-bit parallel random access configuration interface (Figs 2.10 and 2.11). Compared to the previous case, this interface offers a higher configuration throughput (32 configuration data bits per cycle). The minimal reconfiguration latency is lower and again varies with the module position. Note that the latency also varies across locations where the configuration memory is empty (e.g. Δy = 2 in Fig. 2.11(b)). This is caused by the column-based alignment of the 32-bit configuration words within the configuration memory.
3. Partial reconfiguration via a serial column access configuration interface (Fig. 2.12). The throughput of the configuration interface is lower than in the previous two cases (less than 1 configuration data bit per cycle), resulting in a much higher absolute reconfiguration latency. The addressing structure of the configuration memory requires the entire column to be reconfigured even if only one bit in that column needs to be changed.
In the case of the subtractor module reconfiguration, the reconfiguration time is reduced only at coordinate (0,0). The reconfiguration latency function is also considerably simpler than in the previous two cases.
[Figure: (a) Csub = f(Δx) for Δy = {-8, -6, -4, -2, 0, 1, 3, 5, 7}; (b) Csub = f(Δy) for Δx = {-2, -1, 0, 1, 2}; offsets from -8 to 8, Csub axis 10 to 55 configuration cycles.]
Figure 2.9: X-Y cross-sections for the diagram shown in Fig. 2.8 (8-bit parallel random access configuration interface).
[Figure: surface plot of Csub (axis 4 to 24 configuration cycles) over offsets Δx and Δy, each from -8 to 8.]
Figure 2.10: Csub = f(Δx, Δy), subtractor module configuration latency Csub as a function of the offset (Δx, Δy) against the adder module (32-bit parallel random access configuration interface).
4. Full reconfiguration with multiple-context configuration memory, pre-loaded with the subtractor module configuration data (Fig. 2.13). With this type of configuration memory, it is possible to switch to the pre-loaded context within one configuration cycle. If a full context switch is performed, the time required for reconfiguration is constant, regardless of module position or the previous contents of the configuration memory.
However, if the new configuration has to be loaded via the external configuration interface at run time, reconfiguration performance will decrease considerably. Such a situation is not considered in this example.
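The column-granular behaviour of the third subsystem can be sketched in the same simplified style. Here any differing cell forces its whole column to be rewritten at a fixed cost of `cycles_per_column` cycles; the cost figure, the module contents and the names `place` and `column_reconfig_cycles` are hypothetical. With two-column modules whose first column is identical, the cost drops only at offset (0,0).

```python
def place(module, dx, dy):
    """Translate a module's per-cell configuration values by the offset (dx, dy)."""
    return {(x + dx, y + dy): v for (x, y), v in module.items()}


def column_reconfig_cycles(current, new, cycles_per_column):
    """Serial column access: any changed cell forces its whole column to be rewritten.

    Cells not used by `new` are ignored; cleared memory reads as 0.
    """
    dirty_columns = {x for (x, y), v in new.items() if current.get((x, y), 0) != v}
    return len(dirty_columns) * cycles_per_column


# Hypothetical two-column modules: column 0 identical, column 1 differs in one cell
adder = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
subtractor = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 5}
current = place(adder, 0, 0)

for dx, dy in [(0, 0), (0, 1), (1, 0), (5, 5)]:
    print((dx, dy), column_reconfig_cycles(current, place(subtractor, dx, dy), 100))
# (0, 0) 100
# (0, 1) 200
# (1, 0) 200
# (5, 5) 200
```

For the multiple-context case, the equivalent function would simply return a constant, independent of (Δx, Δy) and of the previous memory contents.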
[Figure: (a) Csub = f(Δx) for Δy = {-8, -6, -4, -2, 0, 1, 3, 5, 7}; (b) Csub = f(Δy) for Δx = {-2, -1, 0, 1, 2}; offsets from -8 to 8, Csub axis 4 to 22 configuration cycles.]
Figure 2.11: X-Y cross-sections for the diagram shown in Fig. 2.10 (32-bit parallel random access configuration interface).
[Figure: surface plot of Csub (axis 1000 to 2200 configuration cycles) over offsets Δx and Δy, each from -8 to 8.]
Figure 2.12: Subtractor module configuration latency Csub as a function of the offset (Δx, Δy) against the adder module (serial column-access configuration interface).
[Figure: surface plot of Csub (axis 0 to 3 configuration cycles) over offsets Δx and Δy, each from -8 to 8.]
Figure 2.13: Subtractor module configuration latency Csub as a function of the offset (Δx, Δy) against the adder module (multiple-context configuration memory pre-loaded with the configuration data).
The above examples demonstrate that the reconfiguration latency varies 
greatly with the type of reconfiguration sub-system. In the simplest case, 
the reconfiguration latency is constant and does not depend on the mod- 
ule position or previous contents of the configuration memory (Fig. 2.13). 
Therefore, the reconfiguration latency can be calculated accurately at a high level of abstraction.
In the most complex case, the reconfiguration latency is a function of module position and the previous contents of the device configuration memory (Fig. 2.8). The reconfiguration latency then becomes a complex non-linear function, which can be accurately determined only after physical synthesis has been completed. The corollary is that simple reconfiguration latency estimation techniques, used before the physical design layout is accurately known, can be seriously misleading.
The availability of alternative configuration sub-systems has fuelled re- 
search into techniques allowing minimisation of the configuration latency 
for specific technologies. For example, Hauck et al. (1998) and Shirazi et al.
(1998) have investigated various such techniques for Xilinx XC6200 tech- 
nology. 
2.4 Summary 
The approaches presented for the construction of the reconfiguration sup- 
port circuitry provide tradeoffs between the reconfiguration speed and the 
size of reconfigurable array devoted to the reconfiguration circuitry. Recon- 
figurable technologies may combine these approaches to provide a balance 
suitable for a selected application domain. 
For example, a Xilinx Virtex FPGA (Xilinx, 2000b) uses a serial configuration interface allowing partial reconfiguration. The configuration subsystem provides addressed access to the configuration memory at a coarse (column-based) granularity. This solution provides a tradeoff between fast random access to the configuration memory and slow serial reconfiguration.
As many different approaches exist for the implementation of the con- 
figuration interface, configuration data distribution and activation, the speed 
of reconfiguration will depend greatly on the characteristics of each such 
implementation in a specific reconfigurable logic technology. In general, 
the reconfiguration latency is a non-linear and technology-dependent func- 
tion. 
Considering that in practical systems the period of the configuration cycle is often comparable to the period of the system clock cycle, the impact of the selected reconfiguration mechanism on the design execution latency will vary considerably with the technology and, for some technologies, also with the module placement. This dependency on low-level layout characteristics makes it very difficult to consider realistic configuration latency during reconfigurable system design. A solution to this problem is one of the principal results of this thesis.
Chapter 3 
Previous Work on 
Reconfigurable System Design 
Reconfigurable systems have evolved to bridge the gap between flexible 
processor-based programmable systems and high-performance systems with 
'static' hardware functionality. Research in automatic reconfigurable sys- 
tems design therefore has origins in two scientific disciplines: program 
compilation and high-level synthesis. This chapter examines the relevant 
work of researchers in both of these fields to date. 
First, the design process for non-reconfigurable systems is summarised in the following section to facilitate comparison between the reconfigurable and non-reconfigurable design approaches.
3.1 Design for Non-Reconfigurable Systems 
Design methodologies for non-reconfigurable systems have been the sub- 
ject of active research since the invention of electronic circuits. Automatic 
synthesis of digital circuits from high-level descriptions became viable with 
the development of numerous optimisation algorithms combined with de- 
sign flows based on hierarchical design abstraction and hardware descrip- 
tion languages (HDL). 
3.1.1 Synthesis Design Flow 
A typical design flow for non-reconfigurable systems is shown in Fig. 3.1. 
This is typically a hierarchical top-down process, composed of individual 
transformation and optimisation steps performed in a sequence. 
During behavioural (also architectural) synthesis an abstract behaviou- 
ral design model is translated into one of the possible architectural design 
models, while attempting to meet the design constraints. The architectural 
model is a collection of interconnected computational blocks and a system's 
controlling finite state machine(s) (FSM). This model is often referred to as 
register-transfer level (RTL) architecture to emphasise that at this level data 
transfers between each block's registers become visible. 
In the following step, the architectural model is translated into a gate- 
level model. The gate-level model is a netlist capturing the connectivity be- 
tween the instances of cells from the target technology library. This model 
is equivalent to a schematic diagram in a traditional schematic-based de- 
sign flow. Whereas the behavioural and architectural models are indepen- 
dent of the target technology, the gate-level model is target-technology spe- 
cific. The process of automatic transformation of an architectural design 
model into a gate-level model is known as logic or RTL synthesis. This pro- 
cess involves the mapping of all computational design blocks into their 
gate-level representation and the synthesis and optimisation of the design 
FSMs. 
Figure 3.1: A typical design flow for non-reconfigurable systems (design problem and design constraints → behavioural synthesis → architectural model (RTL) → logic synthesis → gate-level model → physical synthesis → device model → configuration bitstream).
The design flow is completed by mapping the gate-level design model onto a target technology physical model. This process is known as the placement & routing (P&R) stage, or physical or layout synthesis. It normally involves placement of the target technology cells and their routing within the
target technology physical model. The exact nature of this process depends 
on the type of the target technology. For programmable logic technologies, 
the gate-level model has to be mapped onto the primitive logic and routing 
elements available in the targeted programmable logic device. 
The result of physical synthesis is a design implemented in the target 
technology. From such a physical design model, it is possible to extract the 
detailed timing characteristics of the design, which can be used to verify the 
overall timing and its conformance to the timing design constraints. The 
contents of the configuration memory needed for the implementation of the 
design in a selected programmable technology can also be determined from the physical model of the design. A tool performing this task is commonly called a configuration bitstream generator.
3.1.2 Automatic Design Synthesis 
All the synthesis steps of the design flow outlined in the previous sec- 
tion have been subjected to design automation. Numerous algorithms and 
methodologies have been proposed for automatic translation between the 
various design abstraction levels. A detailed description of these algo- 
rithms is outside the scope of this thesis, but can be found in several VLSI 
design automation textbooks (e.g. Gajski et al., 1992; De Micheli, 1994; Sherwani, 1995; Gerez, 1999).
Typically, synthesis is performed in small steps at each of the design 
abstraction levels. Accurate design metrics, such as design size, power 
consumption, or detailed timing characteristics, are not known before the 
design flow has been completed. However, these metrics can be estimated 
using a variety of techniques available at each abstraction level. 
Such a design approach does not normally lead to optimal results, and design iterations are often necessary to meet strict timing constraints. However, in many cases the inefficiency of automatic design techniques can be tolerated for a large part of the designed system. Automatic design techniques offer a productivity gain which, in many real-world situations, outweighs the design inefficiency for all but a very small proportion of a designed system.
3.2 Design for Reconfigurable Systems 
With the introduction of dynamically reconfigurable systems, design tech- 
niques capable of supporting the dynamic operation of these systems be- 
came desirable. Initial efforts were focused on techniques known from non- 
reconfigurable system design with the expectation that these techniques 
could be adapted for reconfigurable systems. Later, new approaches to the
design for reconfigurable systems were proposed to address specific prob- 
lems not present in the traditional non-reconfigurable design methodolo- 
gies. 
The design process for reconfigurable systems can also be seen as a 
task with synthesis steps identical to those of non-reconfigurable systems 
(Fig. 3.1). The difference is in the type of the intermediate results produced 
at various abstraction levels. During the design for reconfigurable systems, 
the goal is to partition the design model into temporal segments so that the set of input design constraints can be satisfied.
This problem of temporal partitioning is different from the problem of partitioning into multiple FPGA devices (spatial partitioning). Both problems address the partitioning of design computational and design storage elements. Temporal partitioning, however, must also consider the temporal relationships between the individual design partitions to ensure that no dependency violations or other conflicts occur during execution.
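A small check can make the dependency requirement concrete. The sketch below assumes that temporal partitions are numbered in execution order and that the design model is available as a list of producer/consumer edges. `dependency_violations` is a hypothetical helper, not part of any published tool. It covers only data dependencies, not other conflicts such as resource overlaps.

```python
def dependency_violations(edges, partition_of):
    """Return the dependency edges violated by a temporal partitioning.

    edges: iterable of (producer, consumer) pairs from the design model.
    partition_of: dict mapping each operation to a temporal partition index;
    partitions are assumed to execute in increasing index order.
    An edge is violated when the consumer runs in an earlier partition than
    its producer. Producer and consumer in the same partition are accepted,
    on the assumption that ordering within one configuration is handled by
    conventional scheduling.
    """
    return [(u, v) for u, v in edges if partition_of[v] < partition_of[u]]


edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(dependency_violations(edges, {"a": 0, "b": 1, "c": 1}))  # []
print(dependency_violations(edges, {"a": 0, "b": 1, "c": 0}))  # [('b', 'c')]
```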
Temporal partitioning can be performed at the behavioural level, the register-transfer level or the gate level. The following sections summarise previous work on design techniques for reconfigurable systems, in particular work addressing the automatic design of dynamically reconfigurable systems.
3.2.1 Evolution of Design Methodologies for Reconfigurable Systems
The difficulties with the design of reconfigurable systems have been high- 
lighted by the work of several researchers. Hadley and Hutchings (1995) 
have described a manual design methodology for partially reconfigurable 
systems, noting the difficulties of using the conventional tools designed for 
non-reconfigurable systems. 
The DISC system (Wirthlin and Hutchings, 1995) used a library of custom 'instructions' created using standard FPGA CAD tools. The instructions were required to conform to a dedicated communication and control architecture provided by the DISC system. Encapsulating units of computation in such well-characterised instructions allowed dynamic instruction reconfiguration and placement to be managed at run-time.
The approach has been generalised by Brebner (1997), who introduced 
the Swappable Logic Unit (SLU) as a new computing paradigm to support 
dynamic reconfiguration and placement in reconfigurable computer sys- 
tems. 
This 'library-based' approach has been further advocated in the recon- 
figurable systems design methodologies in order to reduce the difficulties 
in designing reconfigurable systems (e.g. Luk et al., 1996; Luk et al., 1997b).
In such methodologies, a low-level library of target-technology mod- 
ules is provided as a part of the design flow. Using these modules, which 
were pre-placed and pre-routed using the target reconfigurable technology, 
it is possible to estimate both the computational performance and worst- 
case configuration latency of the system with a good degree of accuracy for 
many reconfigurable technologies. 
However, for many partially reconfigurable technologies (e.g. the Xilinx XC6200 or Atmel AT6000 FPGA families), the actual reconfiguration latency is a non-linear function of module size, module shape and the previous contents of the configuration array. This latency is difficult to estimate even with the library-based approach.
For example, consider the experiment from Section 2.3. The worst-case configuration latency for a 4-bit subtractor module configured via the 8-bit random-access interface is 67 configuration cycles (Table A.3 in Appendix A). In the best case, the latency reduces to only 15 cycles (Fig. 2.8). This considerable reduction of 78% can be determined only after the place & route stage. These difficulties force designers to iterate through the entire design flow several times in order to quantify the reconfiguration latency.
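The gap between worst-case and best-case latency arises because, with a random-access interface, only the configuration locations whose contents change need to be written. The sketch below assumes a hypothetical interface that writes one addressed word per configuration cycle. It models configuration memory as a dictionary from address to data word. It illustrates the principle only and does not reproduce the figures from Section 2.3.

```python
def random_access_cycles(current, target):
    """Configuration cycles to load `target` over `current` memory contents.

    Both arguments map configuration addresses to data words. Under the
    assumed one-word-per-cycle interface, only addresses whose stored word
    differs from the required word must be written.
    """
    return sum(1 for address, word in target.items()
               if current.get(address) != word)


module = {0: 0x3C, 1: 0x5A, 2: 0x0F, 3: 0x81}   # hypothetical module words
blank = {}                                       # unknown/empty memory
similar = {0: 0x3C, 1: 0x5A, 2: 0x0F, 3: 0x00}   # e.g. a related module
print(random_access_cycles(blank, module))       # 4 (every word written)
print(random_access_cycles(similar, module))     # 1 (one word differs)
```

`current` depends on what was previously placed at the same location. The same module can therefore have very different latencies in different floorplans, and the latency is known accurately only after placement.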
Luk et al. (1997b) and Robinson et al. (1998) reported design methodologies which attempt to find the individual temporal design partitions in a sequential and iterative design process. Govindarajan and Vemuri (2000) describe SPARCS, a system capable of performing both temporal and spatial (multi-FPGA) partitioning within one design flow.
These sequential design methodologies suffer from their inability to 
consider the strong interdependencies between the design decisions at high 
and low levels. For all but the simplest designs, these problems will lead to 
an excessive number of design iterations, which in turn will require several 
passes through the physical design tools. 
Automatic temporal partitioning has been shown to be possible at var- 
ious abstraction levels, although the quality of such partitioning varies de- 
pending on the applied method. The following sections summarise the 
past achievements in automatic temporal partitioning and run-time floor- 
planning for reconfigurable systems. 
3.2.2 Partitioning at Behavioural Level 
Conceptually, this process is depicted in Fig. 3.2. Starting from a behavioural design model and a set of constraints, temporal partitioning is performed directly on the behavioural model. Because the partitioning is performed at a high level, it is possible to explore the tradeoffs between problem implementations using different architectural options.
The product of such a temporal partitioning after the behavioural syn- 
thesis is a set of reconfigurable system partitions and a configuration con- 
troller for the design. 
Automatic techniques which fall in the category of behavioural level 
temporal partitioning have been reported by several authors. This work 
includes methods based on heuristics and exact combinatorial optimisa- 
tion. 
Figure 3.2: Temporal partitioning at behavioural level (behavioural model → behavioural synthesis → architectural model (RTL) split into temporal partitions).
Ling and Amano (1993a) have implemented a technique based on priority-list scheduling for partitioning data-flow problems into multiple configurations. The technique has been further improved by Takayama et al.
(2000). These heuristic techniques specifically target the WASMII platform 
which offers multiple-context memory for the configuration storage. These 
techniques do not consider partial reconfiguration. 
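To make list-scheduling-based temporal partitioning concrete, the sketch below fills one configuration at a time. It adds ready operations in priority order and starts a new configuration when nothing else fits. It is a generic illustration under simple assumptions: a single additive area metric, a fixed per-configuration capacity, caller-supplied priorities and no partial reconfiguration. It is not the specific algorithm of Ling and Amano or Takayama et al., and `list_partition` and its parameters are hypothetical.

```python
def list_partition(deps, area, capacity, priority):
    """Greedy priority-list temporal partitioning (illustrative sketch).

    deps: dict mapping each operation to the set of operations it depends
          on (every predecessor must itself be a key of deps).
    area: dict mapping each operation to the area it occupies.
    capacity: area available in one configuration.
    priority: dict mapping each operation to a priority (higher first).
    Returns a dict mapping each operation to a configuration index.
    """
    partition_of = {}
    current, used = 0, 0
    while len(partition_of) < len(deps):
        # Ready: unassigned operations whose predecessors are all assigned
        # (hence to the current or an earlier configuration).
        ready = [op for op in deps if op not in partition_of
                 and all(p in partition_of for p in deps[op])]
        if not ready:
            raise ValueError("dependency cycle in the input graph")
        ready.sort(key=lambda op: priority[op], reverse=True)
        placed = False
        for op in ready:
            if used + area[op] <= capacity:
                partition_of[op] = current
                used += area[op]
                placed = True
        if not placed:
            if used == 0:
                raise ValueError("an operation exceeds the configuration capacity")
            current, used = current + 1, 0  # start the next configuration
    return partition_of


deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
area = {"a": 2, "b": 2, "c": 2, "d": 2}
priority = {"a": 3, "b": 2, "c": 1, "d": 0}
print(list_partition(deps, area, capacity=4, priority=priority))
# {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```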
Gokhale and Marks (1995) have proposed a partitioning approach for 
reconfigurable computing platforms, where each program function is par- 
titioned into a separate FSM. This allows large programs to be executed on 
a limited reconfigurable computing platform. Each such FSM was imple- 
mented on a single FPGA. 
A list scheduling based synthesis technique was developed by Vasilko 
and Ait-Boudaoud (1996a), which allows synthesis for partially reconfig- 
urable systems. The technique assumes a constant reconfiguration time, 
while a simple approach was used for the high-level estimation of the available reconfigurable device area. Both of these simplifications limit the practicality of this technique for current reconfigurable technologies. Furthermore, this approach does not consider architectural-level sharing between the design modules.
GajjalaPurna and Bhatia (2000) have presented two different heuristic 
methods for temporal partitioning and scheduling data-flow graphs for re- 
configurable computing. These heuristics partition an input design prob- 
lem into a set of full partitions, while using a simple area metric to express 
the reconfigurable technology resource constraint. The two techniques provide a tradeoff between maximum parallelism and minimum communication cost. A correction factor based on the FSM communication cost is devised to deal with routability problems.
Sels (1996) has addressed the problem of temporal partitioning for a 
reconfigurable device using integer-linear programming (ILP). However, 
only a one-dimensional model of the reconfigurable system was used in 
order to simplify the ILP problem formulation. 
Kaul and Vemuri (1998) have presented an ILP-based temporal partitioning technique which optimises the communication and memory bandwidth between the individual configurations. The technique targets a system with reconfigurable devices of fixed size and does not consider partial reconfiguration.
More recently, Zhang et al. (2000) have proposed a temporal partitioning technique based on Constraint Logic Programming, which permits design modules to be shared between configurations.
Govindarajan and Vemuri (2000) have also considered the problem of temporal and spatial partitioning in the SPARCS system. The SPARCS partitioning process does not consider device-level partial reconfiguration.
Figure 3.3: Temporal partitioning at RTL (architectural model (RTL) → logic synthesis → gate-level model split into temporal partitions).
3.2.3 Partitioning at Register-Transfer Level 
Conceptually, this process is shown in Fig. 3.3. Given a non-reconfigurable
register-transfer level architecture, the RTL temporal partitioning will pro- 
duce a gate-level reconfigurable implementation. This approach explores 
the partitioning of both the design structure and its finite-state machine(s) 
for one RTL design architecture. 
RTL partitioning alone is unlikely to be a practical approach for the synthesis of reconfigurable systems. At RTL, several design decisions have already been fixed (including design module allocation, binding and scheduling) without any consideration for design reconfiguration. The RTL design architecture can be partitioned into separate configurations, but design optimisation at this level is limited to FSM optimisation/partitioning.
However, in combination with other approaches (e.g. reconfigurable system synthesis from the behavioural level), RTL partitioning could provide a valuable technique for optimising both the design architecture and the FSM partitioning at the same level.
While no previous work has been reported to date which directly ad- 
dresses this problem, the use of reconfigurable finite-state machines was 
considered in (Skylarov and de Brito Ferrari, 1998; Oliveira et al., 1998). 
3.2.4 Partitioning at Gate Level 
Once a gate-level design model has been generated, it is not possible to 
modify the architecture or execution schedule of a design. Temporal parti- 
tioning at gate-level becomes attractive if the final gate-level model cannot 
fit into the target device. In such a case, one possibility is to 'fold' the im- 
plementation of the gate-level netlist over multiple design configurations 
(Fig. 3.4). 
This approach has been shown to be beneficial in the field of logic em- 
ulation where hardware emulation resources are limited (Jones and Lewis, 
1995; Trimberger et al., 1997). Both of these techniques are based on prior- 
ity list scheduling heuristics operating on a gate-level design netlist. 
Shirazi et al. (1998) describe an optimisation technique based on graph 
bi-partitioning, capable of optimising the layout in two configurations. The 
algorithm maximises the overlap of similar blocks in two configurations in 
order to minimise the overhead associated with the reconfiguration of the 
partitions. 
Canto et al. (1999) describe a heuristic gate-level bi-partitioning technique. The technique splits the gate-level circuit into two configurations, which can be mapped onto a target device with two configuration context layers.
Figure 3.4: Temporal partitioning at gate level (gate-level model split into temporal partitions → physical synthesis → device model).
Other authors have proposed improved gate-level partitioning techniques, including recent work by Liu and Wong (1999) and Chang and Marek-Sadowska (1998).
The advantage of gate-level partitioning is that the granularity of the 
design structure representation is close to that of the target architecture. It 
is therefore possible to estimate whether the design will fit into the target 
device and predict how successful routing will be. 
The limitation of this approach is that the architectural implementation 
of the system, as captured in the gate-level netlist, cannot be changed to 
match the properties of the target reconfigurable technology. The original 
gate-level netlist was produced while targeting a non-reconfigurable system 
under a set of constraints intended for non-reconfigurable design implementation. Even when the properties of the target reconfigurable system (such as reconfiguration latency dependencies or resource limitations) are taken into account, gate-level temporal partitioning cannot guarantee that these constraints will be satisfied.
3.2.5 Floorplanning 
The problem of automatic placement and dynamic rearrangement of com- 
putational modules in reconfigurable computing systems has been addressed 
by several researchers. Brebner (1996) has discussed SLUs and their use in 
the context of an operating system for a reconfigurable computing plat- 
form. 
A method for automatic placement of SLUs in a 3D floorplan based on 
simulated annealing has been proposed by Bazargan et al. (1999). A model 
reconfigurable system is used to conduct their experiments, while assum- 
ing a constant reconfiguration latency for each of the configured blocks. 
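In a 3D floorplan of this kind, each configured module occupies a rectangle of the device for an interval of time, so it can be viewed as a box in (x, y, t) space. A floorplan is valid only if no two boxes intersect. The sketch below shows such a check. The `Placement` record, its fields and the half-open interval convention are assumptions made for illustration. Each time interval is assumed to already include the module's configuration period.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placement:
    """A module occupying device cells [x, x+w) x [y, y+h) during [t0, t1)."""
    name: str
    x: int
    y: int
    w: int
    h: int
    t0: int
    t1: int


def _intervals_overlap(a0, a1, b0, b1):
    """Half-open intervals [a0, a1) and [b0, b1) share at least one point."""
    return a0 < b1 and b0 < a1


def floorplan_conflicts(placements):
    """Pairs of modules whose boxes intersect in all of x, y and time."""
    return [(p.name, q.name) for p, q in combinations(placements, 2)
            if _intervals_overlap(p.x, p.x + p.w, q.x, q.x + q.w)
            and _intervals_overlap(p.y, p.y + p.h, q.y, q.y + q.h)
            and _intervals_overlap(p.t0, p.t1, q.t0, q.t1)]


a = Placement("A", 0, 0, 4, 4, 0, 10)
b = Placement("B", 2, 2, 4, 4, 5, 15)   # overlaps A in space and in time
c = Placement("C", 2, 2, 4, 4, 15, 20)  # same cells as B, used afterwards
print(floorplan_conflicts([a, b, c]))   # [('A', 'B')]
```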
Diessel et al. (2000) have proposed three different scheduling techniques 
for dynamic placement and rearrangement of tasks in reconfigurable com- 
puter systems. In this approach, the partial reconfiguration latency is as- 
sumed to be a linear function of the module size. 
3.3 Solution Feasibility 
The above approaches provide useful methods for temporal partitioning 
at behavioural and gate levels. However, few of these techniques consider 
partial reconfiguration. Even where partial reconfiguration is considered, its properties are represented by simplified and inaccurate models. In most cases, the above techniques attempt to reduce the complexity of the search space by working with a very simplified model of the reconfigurable architecture. They often over-simplify the impact of reconfiguration and ignore partial reconfiguration and routing implications.
If a partitioning technique for this design problem is to produce prac- 
tical results, it must use a realistic model of the target reconfigurable tech- 
nology. Furthermore, the advanced features of the configuration interfaces 
which allow reduction of the configuration overheads must be considered 
as a part of the partitioning process. For example, Luk et al. (1997b) have 
demonstrated that the latency required for reconfiguration of a 32-bit adder 
to become a 32-bit subtractor can be reduced 8-fold (from 32 to 4 configura- 
tion cycles) if the wildcard feature of the target XC6200 FPGA technology 
is used. 
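The arithmetic behind such savings can be sketched with a simplified model of wildcard addressing. In this model, one configuration write may target every address that matches a pattern with some address bits marked as "don't care". The sketch assumes, hypothetically, that the wildcard covers the `wildcard_bits` least-significant address bits, so one write reaches an aligned block of 2**wildcard_bits addresses. It also assumes that a block can be written at once only if every address in it receives the same word. This is not a description of the actual XC6200 wildcard registers, and the numbers are illustrative rather than those reported by Luk et al.

```python
from collections import defaultdict


def wildcard_write_count(writes, wildcard_bits):
    """Count configuration cycles for a set of addressed writes.

    writes: dict mapping configuration address -> data word.
    Hypothetical model: one cycle writes the same word to an aligned block
    of 2**wildcard_bits addresses, but only if every address in the block
    receives that same word; otherwise the block's addresses are written
    individually.
    """
    block_size = 1 << wildcard_bits
    blocks = defaultdict(dict)
    for address, word in writes.items():
        blocks[address >> wildcard_bits][address] = word
    cycles = 0
    for members in blocks.values():
        full = len(members) == block_size
        uniform = len(set(members.values())) == 1
        cycles += 1 if (full and uniform) else len(members)
    return cycles


regular = {address: 0x5A for address in range(32)}  # 32 identical words
print(wildcard_write_count(regular, wildcard_bits=0))  # 32
print(wildcard_write_count(regular, wildcard_bits=3))  # 4
```

With 32 identical words and a three-bit wildcard, the count drops from 32 writes to 4. This is the same eight-fold ratio as in the example above, but only because this toy input was constructed to be perfectly regular.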
A further difficulty for the design of dynamic systems is that there are 
tight temporal and spatial interdependencies between the design entities 
at all abstraction levels. For example, a small modification of a floorplan 
at the physical level may cause a violation of the behavioural data depen- 
dencies due to an increased configuration latency. The interdependencies 
between the individual design problems for reconfigurable systems are fur- 
ther discussed when a formal model for this design process is introduced 
in Chapter 4. 
From the above discussion it is apparent that the design of reconfigurable systems is much more difficult than that of non-reconfigurable systems. Indeed, non-reconfigurable system design can be seen as a special case of reconfigurable system design in which the entire design solution fits into a single configuration.
3.3.1 Synthesis for Full versus Partial Reconfiguration 
The design of reconfigurable systems is further complicated by the variety 
of available reconfigurable technologies. Section 2.2 discussed the various 
approaches for the implementation of reconfiguration sub-systems (partial 
vs full reconfiguration, random-access vs serial distribution mechanisms, 
various configuration activation techniques) and the resulting tradeoffs. 
When temporal partitioning is performed for a technology offering full 
reconfiguration only, the goal of the partitioning is to split an input prob- 
lem model into a number of configuration 'pages'. Each such page must fit 
within the resources available in the targeted reconfigurable device. Tradi- 
tional design techniques used for non-reconfigurable systems can often be 
used to synthesise each such configuration page. 
While these techniques cannot guarantee that each page can be fully 
placed and routed for the targeted reconfigurable device, simple methods can be used to avoid such problems. For example, temporal partitioning techniques may be permitted to use only a proportion of the total available resources, leaving spare capacity in case of placement or routing difficulties (as in, e.g., Govindarajan and Vemuri (2000) or Takayama et al. (2000)).
If the latency of full reconfiguration is prohibitive, a technology sup- 
porting partial reconfiguration would be a preferred option. 
Temporal partitioning for partially reconfigurable technologies is a much harder problem. Additional constraints and features must be considered. Examples include ensuring that partially reconfigured modules do not overlap, determining whether the state of device flip-flops can be shared between configurations, and positioning the design blocks to minimise the configuration overheads. The design optimisation is further complicated because, in many partially reconfigurable technologies, the partial reconfiguration latency is a non-linear function of the position of a design module and the previous contents of the configuration memory (see Section 2.3 for further discussion of the reconfiguration latency function).
3.4 Summary 
Various technological approaches have been developed for the design of 
reconfigurable systems offering tradeoffs between device area, reconfigu- 
ration speed and other design metrics. The reconfiguration performance of 
reconfigurable systems is dependent on a specific technology and is deter- 
mined by the speed of its reconfiguration sub-system. 
However, the review of the published work to date reveals that there 
is not yet a solution to the problem of automatic design synthesis which 
can reliably exploit the features of partially and dynamically reconfigurable 
logic systems. 
The design of reconfigurable systems differs from the design of non-reconfigurable systems. It must consider the temporal partitioning of the input design problem and the technological dependencies of the design metrics associated with the selected target technology. Partially reconfigurable systems offer many technological advantages over fully reconfigurable systems. However, their features, together with placement-dependent reconfiguration latency, further complicate the design process.
Given the availability of various reconfigurable logic technologies, a synthesis methodology for reconfigurable systems should consider the technology dependency of the reconfiguration latency function already at a high level.
This thesis presents an example of one such approach to reconfigurable system synthesis.
Chapter 4 
Reconfigurable System 
Synthesis Problem Formulation 
High-level synthesis is a multiple-level transformation process involving 
translation of an initial design problem represented by a behavioural model 
into a design implementation model, while considering both design perfor- 
mance constraints and the constraints imposed by the selected target tech- 
nology. 
The success of a solution search for any optimisation problem is determined by the qualities of the model characterising the problem. The quality of any such model is measured by its ability to capture the fundamental problem characteristics while simplifying or neglecting factors that contribute little or nothing to the success of the solution search.
This chapter presents a new formulation of the problem of synthesis for 
reconfigurable systems. The formulation captures the low-level technology- 
dependent characteristics and can therefore guarantee the feasibility of a 
generated solution, while permitting the optimisation algorithm to explore 
the solution search space efficiently. 
The following section summarises the initial assumptions about the re- 
configurable system synthesis problem and models used in its formulation. 
The remainder of this chapter provides the formulation of the synthesis problem for reconfigurable systems. The formulation highlights the impact of the technology-dependent design characteristics on the synthesis process.
4.1 Fundamental Assumptions 
Various techniques can be used to implement circuits using reconfigurable 
logic technologies, working in various operational modes. These range 
from systems implemented without any reconfiguration (non-reconfigurable 
systems), systems which will reconfigure more or less frequently, systems 
with a separate configuration control, self-repairing, self-reproducing or 
self-reconfiguring circuits, circuits constructed and controlled via simu- 
lated run-time natural evolution and possibly many others. 
This section summarises the fundamental assumptions made about the 
problem of high-level synthesis for reconfigurable systems. These assumptions also characterise the target computational model for reconfigurable
systems considered in this thesis. While this formulation is restrictive and 
does not cover all possible applications of reconfigurable logic, the pre- 
sented formulation is feasible for many reconfigurable systems of practical 
interest. 
4.1.1 Input to Reconfigurable System Synthesis 
The process of reconfigurable system synthesis considered here begins with 
a behavioural problem model and a set of design constraints. However, real- 
world design projects rarely start at this level. This level of abstraction 
is normally preceded by a design problem analysis leading to a formula- 
tion of a system specification. From the system specification it is possible 
to derive a system-level model, which can be used to establish and verify 
the desired system functionality at this level. Once the system-level func- 
tionality has been established, a behavioural design model together with 
design constraints can be extracted in a suitable form. If the system is to be 
implemented on a heterogeneous platform, including hardware, software, 
reconfigurable hardware, etc., this step has to be preceded by system-level 
partitioning. This problem is not considered here. 
4.1.1.1 Design Problem Model 
A variety of different design models have been developed in the field of 
high-level synthesis research. A Control/Data Flow Graph (CDFG) model 
(developed by McFarland et al. (1990) and others) was selected here to 
represent the design behaviour. This choice was motivated by the model's 
ability to capture both data and control characteristics of an input design 
problem in a single design model. The CDFG model is popular in high- 
level synthesis tools because of this ability. Similar models capturing both 
control and data characteristics of an input design problem could have been 
used as alternatives. 
Definition 4.1.1 (Control/Data Flow Graph) A CDFG is a directed graph $G(V, E)$, where the set of vertices $V$ represents a set of operations and the set of edges $E$ represents dependencies between pairs of operations.
The vertex set $V$ can be decomposed into a set of data vertices $V_d$ and a set of control vertices $V_c$, where $V_d \cup V_c = V$. Data vertices represent behavioural data operations (multiplication, subtraction, increment, etc.), and control vertices represent control operations (control flow fork/join). Similarly, the edge set $E$ can be decomposed into a set of data edges $E_d$ (those carrying data token values) and a set of control edges $E_c$ (carrying control tokens), where $E_d \cup E_c = E$.
Figure 4.1: Example of a Control/Data Flow Graph model with the corresponding behavioural code fragment. (a) Code fragment:
    d ← a + b
    if (d > c) then
        x ← 2 * d
    else
        x ← 16 * d
    end if
(b) The corresponding CDFG (graph not reproduced).
In the following, where the distinction between $V$ and $E$ is not important, the set $B = V \cup E$ is used to denote all elements of a CDFG model.
An example of a CDFG model and its corresponding behavioural pro- 
cedure is shown in Fig. 4.1. 
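Definition 4.1.1 maps directly onto a simple data structure. The sketch below keeps the four sets $V_d$, $V_c$, $E_d$ and $E_c$ separately and derives $V$, $E$ and $B$ from them. The class and vertex names are hypothetical. The encoding of the Fig. 4.1 code fragment is one plausible reading, not a copy of Fig. 4.1(b). In particular, it routes the comparison result to the fork vertex as a control edge and merges the two values of x at the join vertex.

```python
from dataclasses import dataclass, field


@dataclass
class CDFG:
    """Control/Data Flow Graph G(V, E) following Definition 4.1.1."""
    data_vertices: set = field(default_factory=set)     # V_d
    control_vertices: set = field(default_factory=set)  # V_c
    data_edges: set = field(default_factory=set)        # E_d, (src, dst) pairs
    control_edges: set = field(default_factory=set)     # E_c, (src, dst) pairs

    @property
    def vertices(self):
        """V = V_d ∪ V_c."""
        return self.data_vertices | self.control_vertices

    @property
    def edges(self):
        """E = E_d ∪ E_c."""
        return self.data_edges | self.control_edges

    @property
    def elements(self):
        """B = V ∪ E, used when the vertex/edge distinction is unimportant."""
        return self.vertices | self.edges

    def add_edge(self, src, dst, control=False):
        """Add a data edge, or a control edge if control=True."""
        if src not in self.vertices or dst not in self.vertices:
            raise ValueError(f"unknown vertex in edge {src} -> {dst}")
        target = self.control_edges if control else self.data_edges
        target.add((src, dst))


# One possible encoding of the Fig. 4.1 fragment.
g = CDFG(data_vertices={"add", "gt", "mul2", "mul16"},
         control_vertices={"fork", "join"})
g.add_edge("add", "gt")                    # d feeds the test d > c
g.add_edge("add", "mul2")                  # d feeds x <- 2 * d
g.add_edge("add", "mul16")                 # d feeds x <- 16 * d
g.add_edge("gt", "fork", control=True)     # test result selects a branch
g.add_edge("fork", "mul2", control=True)   # T branch enabled
g.add_edge("fork", "mul16", control=True)  # F branch enabled
g.add_edge("mul2", "join")                 # x from the T branch
g.add_edge("mul16", "join")                # x from the F branch
print(len(g.vertices), len(g.edges), len(g.elements))  # 6 8 14
```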
4.1.1.2 Design Constraints 
Design constraints restrict the set of possible design implementations to a 
set of solutions which can accommodate them. Design constraints fall into 
two categories: 
• Performance constraints, which represent bounds on the desired performance of the implementation, such as execution latency, throughput, size, power consumption and testability. In the following, a set of performance constraints W is used to encapsulate all such constraints imposed on the design.
• Technology constraints, enforced by the selection of the target implementation technology. For reconfigurable logic technologies, these might include propagation delays of reconfigurable logic and routing, device architecture, quantity of available resources, limitations on module placement, and the throughput and capabilities of the reconfiguration interface. A set of technology constraints is referred to as O.
4.1.2 Design Goal 
It is assumed that the goal of reconfigurable system synthesis is to con- 
struct a design implementation (also called design solution) using the se- 
lected target technology, such that all design constraints are satisfied. It is 
further assumed that the aim is to implement the entire input design model 
using the selected reconfigurable logic technology. The problem of hard- 
ware/software partitioning for the input design problem is not considered 
here. 
There is no a priori assumption that the design problem is to be imple- 
mented using dynamic reconfiguration. Given the set of design constraints 
and the selection of the target technology, the synthesis process may result 
in either a reconfigurable or a non-reconfigurable design implementation. 
4.1.3 Target Architectural Model 
The target architecture for the reconfigurable system synthesis problem 
considered is the architecture discussed in Section 2.1. It is assumed that 
each of the main architectural components is represented by a single device 
(spatial partitioning between multiple reconfigurable devices, controllers 
or memories is not considered). No initial assumptions are made about the 
architecture and capabilities of the targeted reconfigurable logic technol- 
ogy. 
4.2 Reconfigurable System Design Synthesis Transformations
This section presents a formal framework for the definition of design prob- 
lems in reconfigurable system design. The definitions presented here are 
derived from the traditional definitions of high-level synthesis problems 
for non-reconfigurable systems (Gajski et al., 1992; De Micheli, 1994; Gerez, 
1999), whilst new or modified definitions are provided for problems which 
exist in the synthesis of reconfigurable systems. The aim is to provide a 
macroscopic view of the entire synthesis process, while highlighting a new 
formulation of problems resulting from the use of reconfigurable systems. 
The following formulations make no assumptions about whether the synthesis is performed at compile time or at run time; rather, they provide a formalism for the problems which need to be addressed by either of these two approaches.
A design solution is constructed by finding associations between the 
elements of a design model, library and target technology device elements 
at several abstraction levels. This process is illustrated in Fig. 4.2. 
There are a number of individual transformation tasks (problems) which need to be solved during reconfigurable system synthesis. The individual problems are inherently interdependent. In the following, the individual transformations are described in hierarchical order.
[Figure 4.2 residue: behavioural, architectural and physical design abstraction levels; behavioural library and architectural resource library; transformations shown: behavioural operator/task allocation, architectural resource allocation, architectural resource instance binding, layout resource binding]
Figure 4.2: Transformation of a reconfigurable design during synthesis.
4.2.1 Behavioural → Architectural Level
During transformation from a behavioural to an architectural abstraction 
level it is necessary to perform the operations of resource allocation, re- 
source binding and scheduling. 
4.2.1.1 Architectural resource allocation 
This step selects a set of resources from the resource types available in the target technology library at the architectural level. This selection must ensure that resources implementing all types of behavioural model elements are available. In the following, the term 'architectural resource library' refers to a collection of (parametrisable) functional resources available for the target technology at the architectural level (Fig. 4.2).
Resource allocation involves allocation of library computational resources for behavioural model vertices (e.g. multipliers, subtractors, comparators, etc.) and allocation of library connectivity resources for behavioural model edges (e.g. buses, permanent and 'virtual' registers, register files, FIFOs and other memory elements, etc.).
Definition 4.2.1 (Resource allocation problem) Given a behavioural input model G(V, E) with a set of behavioural model elements $B$ and a set of resource types $R$ from the architectural resource library, find a set of architectural resource instances $A$ such that the following condition is satisfied for all $b \in B$ (resource availability):
$$\exists a \in A.\ F(b) \subseteq F(a) \tag{4.1}$$
where $F(b)$ and $F(a)$ represent the set of operations performed by the behavioural element $b$ and the set of operations performed by the architectural resource instance $a$, respectively.
The result of resource allocation can be characterised by the mapping $\alpha: A \to R$, which represents the associations between all architectural resource instances and their corresponding library resource types.
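The resource availability condition (Eq. 4.1) and the mapping α can be checked mechanically once each library type is described by its set of operations F. The Python sketch below assumes a toy library; the type names, instance names and operation symbols are hypothetical.

```python
# Hypothetical architectural resource library: resource type r -> F(r)
R = {
    "mult": {"*"},
    "alu": {"+", "-", "<"},
    "reg": {"store"},  # connectivity resource for behavioural edges
}

# alpha: A -> R, associating each allocated instance with its library type
alpha = {"mult0": "mult", "alu0": "alu", "reg0": "reg"}


def instance_ops(alpha: dict, library: dict) -> dict:
    """F(a) for every allocated instance a, taken from its library type."""
    return {a: library[r] for a, r in alpha.items()}


def resource_availability(F_b: dict, F_a: dict) -> bool:
    """Eq. 4.1: for every element b there exists an instance a with F(b) ⊆ F(a)."""
    return all(any(ops_b <= ops_a for ops_a in F_a.values())
               for ops_b in F_b.values())


# Behavioural elements B (vertices v1..v3 and one edge e1) with their F(b)
F_b = {"v1": {"*"}, "v2": {"-"}, "v3": {"<"}, "e1": {"store"}}
print(resource_availability(F_b, instance_ops(alpha, R)))  # True
```

Dropping the multiplier instance from `alpha` makes the check fail, because no remaining instance covers the operation of `v1`.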
The technology-specific architectural resource library provides resources 
specific to the features of the targeted technology. For example, if a recon- 
figurable technology permits the transfer of register states via the config- 
uration interface, the architectural library will provide a 'virtual register'1 
resource available for allocation to behavioural model edges. 
4.2.1.2 Architectural resource binding 
Architectural resource binding creates a mapping between the set of archi- 
tectural resource instances and the behavioural model elements. The prob- 
lem of resource binding at architectural level can be defined as follows: 
Definition 4.2.2 (Architectural resource binding problem) Given an input behavioural model G(V, E), its set of behavioural elements $B$, and the set of resource instances $A$, find a mapping $\beta: B \to A$ such that the following condition is satisfied (functional compatibility):
$$\forall b \in B.\ F(b) \subseteq F(\beta(b)) \tag{4.2}$$
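A binding β is valid only if it satisfies the functional compatibility condition of Eq. 4.2. A minimal Python check, assuming the hypothetical instances and operation sets shown (a dictionary stands in for the mapping β):

```python
def functionally_compatible(beta: dict, F_b: dict, F_a: dict) -> bool:
    """Eq. 4.2: every b in B is bound, and F(b) ⊆ F(β(b))."""
    return all(b in beta and F_b[b] <= F_a[beta[b]] for b in F_b)


F_a = {"mult0": {"*"}, "alu0": {"+", "-", "<"}}  # F(a) per instance
F_b = {"v1": {"*"}, "v2": {"-"}, "v3": {"<"}}    # F(b) per operation

good = {"v1": "mult0", "v2": "alu0", "v3": "alu0"}
bad = {"v1": "alu0", "v2": "alu0", "v3": "alu0"}  # the ALU cannot multiply
print(functionally_compatible(good, F_b, F_a))  # True
print(functionally_compatible(bad, F_b, F_a))   # False
```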
Once both vertices and edges from the behavioural model have been allocated and bound to specific architectural instances, it is possible to determine some of the implementation characteristics associated with the behavioural model elements (e.g. latency, pipeline stages, area, signal quantisation, buffering, etc.), while other characteristics can only be quantified in later stages.
¹ A virtual register is a register whose state is transferred via the configuration interface to/from an external memory storage (CDS in Fig. 2.1). Virtual registers are used in reconfigurable systems to transfer the values stored in hardware registers either between different configurations or between different modules/ports in one configuration. Such registers can also be used in 'virtual pipelines' (Luk et al., 1997c).
Resource allocation may result in some of the architectural resources in the set $A$ being shared between the elements of the behavioural model G(V, E):
Definition 4.2.3 (Architectural-level resource sharing) Resource sharing at the architectural level occurs when, given the behavioural model G(V, E), the set of its behavioural model elements $B$ and the resource binding mapping $\beta$, the following condition is satisfied (architectural-level resource sharing):
$$\exists b \neq b'.\ \beta(b) = \beta(b') \tag{4.3}$$
where $\beta(b) = \beta(b') \in A$ is the shared resource.
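Architectural-level sharing (Eq. 4.3) can be detected by inverting the binding β and looking for instances with more than one behavioural element bound to them. A short Python sketch, with a hypothetical binding:

```python
from collections import defaultdict


def shared_instances(beta: dict) -> dict:
    """Instances a = β(b) = β(b') for some b ≠ b' (Eq. 4.3), with their users."""
    users = defaultdict(list)
    for b, a in beta.items():
        users[a].append(b)
    return {a: sorted(bs) for a, bs in users.items() if len(bs) > 1}


beta = {"v1": "mult0", "v2": "alu0", "v3": "alu0"}
print(shared_instances(beta))  # {'alu0': ['v2', 'v3']}
```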
4.2.1.3 Scheduling 
Scheduling can be performed once the timing characteristics of architec- 
tural resources associated with the behavioural model elements can be es- 
timated. Scheduling can be performed in a variety of scenarios, depending 
on the type of the design constraints. In order for a design implementa- 
tion to be feasible, the behavioural model has to be scheduled such that no 
violations of data or control dependencies occur. 
In the following, the integer delay function $\mathrm{Delay}(\beta(v_j))$ represents the latency² of the resource(s) bound to $v_j$ and the token transport latencies of the edges connecting $v_j$ to $v_i$, where $v_j$ is a predecessor of $v_i$. The integer setup function $\mathrm{Setup}(\beta(v_i))$ represents the latency required to set up the resource bound to $v_i$. This may include configuration of the library resource instances and associated routing resources, and other tasks which need to be completed before the operation $v_i$ can be executed.
² The integer latency represents the latency relative to the system control step period.
A general constrained scheduling problem for reconfigurable systems can then be formulated as follows:
Definition 4.2.4 (General constrained scheduling problem) Given a set of operations $V$ and a partial order on operations $E$, find integer labelings of operations $\sigma, \rho: V \to \mathbb{Z}^+$, representing a design execution schedule $\sigma$ and a design configuration schedule $\rho$, such that the following condition is satisfied for all $i, j$ such that $(v_j, v_i) \in E$ (schedule feasibility condition):
$$\sigma(v_i) \geq \sigma(v_j) + \mathrm{Delay}(\beta(v_j)) \;\wedge\; \sigma(v_i) \geq \rho(v_i) + \mathrm{Setup}(\beta(v_i)) \tag{4.4}$$
and the set of performance constraints $\Psi$ is satisfied.
The execution schedule time $\sigma(v_i)$ represents the execution start time of the resource bound to the behavioural operation $v_i$, while $\sigma(v_j)$ represents the execution start time of the resource bound to $v_j$. The configuration schedule time $\rho(v_i)$ represents the configuration start time of the resource bound to $v_i$.
An example of a feasible schedule for the design behavioural model 
G(V, E) is shown in Fig. 4.3. 
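The schedule feasibility condition (Eq. 4.4) translates directly into a check over the edges of E. The Python sketch below is illustrative: it takes Delay(β(v)) and Setup(β(v)) as precomputed integer tables (in control steps) instead of functions of the binding, and the numbers are hypothetical values loosely modelled on Fig. 4.3, where v3 is configured while v1 and v2 execute. Like Eq. 4.4, it checks the setup term only for operations that have at least one predecessor.

```python
def schedule_feasible(edges, sigma, rho, delay, setup) -> bool:
    """Eq. 4.4 for every edge (v_j, v_i) in E.

    sigma[v]: execution start (control step) of the resource bound to v
    rho[v]:   configuration start (control step) of the resource bound to v
    delay[v]: Delay(beta(v)), including token transport to successors
    setup[v]: Setup(beta(v))
    """
    for v_j, v_i in edges:
        if sigma[v_i] < sigma[v_j] + delay[v_j]:
            return False  # data/control dependency violated
        if sigma[v_i] < rho[v_i] + setup[v_i]:
            return False  # v_i would start before its resource is configured
    return True


edges = [("v1", "v2"), ("v2", "v3")]
sigma = {"v1": 0, "v2": 2, "v3": 4}
delay = {"v1": 2, "v2": 2, "v3": 1}
setup = {"v1": 0, "v2": 0, "v3": 3}
rho = {"v1": 0, "v2": 0, "v3": 0}  # v3 configured during steps 0-2

print(schedule_feasible(edges, sigma, rho, delay, setup))               # True
print(schedule_feasible(edges, sigma, {**rho, "v3": 2}, delay, setup))  # False
```

Setting every Setup() entry to zero reduces the second test to $\sigma(v_i) \geq \rho(v_i)$, i.e. the special case of Eq. 4.5.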
The integer labels in $\sigma$ and $\rho$ represent system 'control steps', which usually correspond to system clock cycles. Thus the labeling $\rho$ represents the reconfiguration schedule in terms of system time units. The detailed timing of the reconfiguration process can be controlled by a separate synchronisation mechanism.
In the case when $\mathrm{Setup}(\beta(v_i)) = 0$, there is either no need to set up the resource associated with the operation $v_i$, or the contribution of the setup latency is considered as a part of the system clock cycle.
[Figure 4.3 residue: timeline of operations $v_1$, $v_2$ and $v_3$, marking $\rho(v_3)$ (configuration start for $v_3$), $\sigma(v_3)$ (execution start for $v_3$), $\sigma(v_2)$ (execution time for $v_2$) and the configuration period for $v_3$]
Figure 4.3: Example of a feasible schedule in a reconfigurable system. The configuration latency of the resources for $v_3$ (represented by $\mathrm{Setup}(\beta(v_3))$) has no temporal impact on the design execution latency, as the configuration is performed in parallel with the execution of operations $v_1$ and $v_2$.
For this special case, Eq. 4.4 becomes:
$$\sigma(v_i) \geq \sigma(v_j) + \mathrm{Delay}(\beta(v_j)) \;\wedge\; \sigma(v_i) \geq \rho(v_i) \tag{4.5}$$
As was demonstrated in Section 2.3, the reconfiguration latency is a technology- and design-dependent function, which may vary greatly with the physical design characteristics (e.g. design module floorplan position or overlap). In traditional design approaches, no physical design characteristics are available at the time of scheduling. If, for a selected target technology, the actual Setup() function is not known at the time of scheduling, then only an estimate of the reconfiguration latency can be made. It is therefore difficult to ensure that the schedule feasibility condition (Eq. 4.4) is satisfied at this stage.
The result of behavioural synthesis for a specific set of performance constraints $\Psi$ is a 4-tuple $(A, \beta, \sigma, \rho)$, representing the design architectural model with the selection and binding of its architectural elements, and their execution and reconfiguration schedules. The target technology in this process is represented by the architectural resource library providing the set of resource types $R$ and their Setup() functions.
4.2.2 Architectural → Physical Level
The transformation process from architectural to physical level has to trans- 
late the architectural abstract model into a physical design model, which 
can be mapped onto the target technology device. The following problems 
need to be solved during this process. 
4.2.2.1 Logic synthesis 
This translates architectural model elements into a set of connected, technology- 
specific primitive cells representing a technology cell netlist. The following 
is a macroscopic definition of the logic synthesis problem for reconfigurable 
systems. 
Definition 4.2.5 (Logic synthesis problem) Given an architectural design model $(A, \beta, \sigma, \rho)$, find a design logic model $(U, N)$ representing the set of target technology primitive logic blocks $U$ and nets $N$ such that the model $(U, N)$ preserves the functionality of the architecture $A$, the execution schedule $\sigma$, and the reconfiguration schedule $\rho$ of the architectural model.
The process of logic synthesis may involve the following tasks:
• synthesis and optimisation of the logic representation for each element of the architecture $A$ (if the logic representation of architectural elements in the target technology is not known)
• synthesis and optimisation of the design state machines which implement the schedule defined by $\sigma$
• synthesis and optimisation of the configuration controller automata implementing the schedule $\rho$. Configuration controller synthesis will produce a cycle-accurate reconfiguration schedule $\rho_c$, indicating the exact activity of the reconfiguration interface in each configuration clock cycle
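To illustrate what $\rho_c$ adds over $\rho$, the sketch below expands a configuration schedule into a per-cycle activity table for a single configuration interface. It is a hypothetical simplification in Python: it assumes one configuration clock cycle per control step, that each Setup() value is exactly the number of interface cycles needed, and that the interface can serve only one operation at a time; a real configuration controller would also have to model addressing and data transfer.

```python
def expand_config_schedule(rho: dict, setup: dict) -> list:
    """Expand rho into rho_c: entry t names the operation whose resources are
    being configured in cycle t, or None if the interface is idle."""
    horizon = max((rho[v] + setup[v] for v in rho), default=0)
    rho_c = [None] * horizon
    for v in sorted(rho, key=rho.get):
        for t in range(rho[v], rho[v] + setup[v]):
            if rho_c[t] is not None:
                raise ValueError(f"interface conflict in cycle {t}: {rho_c[t]} vs {v}")
            rho_c[t] = v
    return rho_c


print(expand_config_schedule({"v3": 0, "v4": 3}, {"v3": 3, "v4": 2}))
# ['v3', 'v3', 'v3', 'v4', 'v4']
```

Two configurations that overlap in time on the single interface raise an error, which corresponds to a configuration schedule that cannot be realised by one controller.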
A description of the individual tasks involved in logic synthesis is outside the scope of this thesis. Details can be found in the logic synthesis literature (e.g. De Micheli, 1994; Murgai et al., 1995; Gerez, 1999).
The reconfiguration controller can be implemented by a dedicated logic circuit using traditional finite-state machine synthesis methods, or its operation can be provided by a processor-based system which implements the reconfiguration schedule $\rho_c$. The implementation of dedicated reconfiguration controller circuits has been studied by Robinson and Lysaght (1999) and others.
Once the design logic model is obtained, the netlist elements can be 
mapped onto a target technology device. 
4.2.2.2 Physical synthesis 
This involves finding a solution to both the placement and routing problems. In the case of reconfigurable systems this can be generally defined as follows:
Definition 4.2.6 (Physical synthesis problem) Given the design logic model $(U, N)$ with the set of logic model elements $L = U \cup N$, the target technology device with a limited set of resources $D$, the design execution schedule $\sigma$ and the cycle-true configuration schedule $\rho_c$, find the mapping $\phi: L \to D$ such that the functionality of the logic model $L$ is preserved and the schedule feasibility condition (Eq. 4.4) is not violated.
If the design implementation is non-reconfigurable, then the device resources are not shared between the elements of the logic model:
$$\forall i \neq j.\ \phi(l_i) \neq \phi(l_j) \tag{4.6}$$
For a reconfigurable design implementation it is possible to share the device resources between the elements of the logic model (physical resource sharing):
$$\exists i \neq j.\ \phi(l_i) = \phi(l_j) \tag{4.7}$$
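Eqs. 4.6 and 4.7 distinguish the two cases by whether the placement mapping φ is injective. A minimal Python sketch, assuming φ is given as a dictionary from logic elements to hypothetical device site names:

```python
from collections import Counter


def shared_device_resources(phi: dict) -> set:
    """Device resources d with d = phi(l_i) = phi(l_j) for some i != j (Eq. 4.7)."""
    counts = Counter(phi.values())
    return {d for d, n in counts.items() if n > 1}


def is_reconfigurable(phi: dict) -> bool:
    # Eq. 4.6 holds (phi injective) exactly when no resource is shared
    return bool(shared_device_resources(phi))


static = {"u1": "CLB_0_0", "u2": "CLB_0_1"}
dynamic = {"u1": "CLB_0_0", "u2": "CLB_0_1", "u3": "CLB_0_0"}  # u3 reuses u1's site
print(is_reconfigurable(static), is_reconfigurable(dynamic))  # False True
print(shared_device_resources(dynamic))  # {'CLB_0_0'}
```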
Once the mapping between the design logic model and the physical device is known, it is possible to construct a configuration controller implementing the cycle-accurate reconfiguration schedule $\rho_c$ for the design.
4.2.2.3 Solution feasibility 
The feasibility of the final design solution is defined as follows: 
Definition 4.2.7 (Solution feasibility) The design solution is feasible 
if and only if it can be implemented on the target technology device 
resources D with behavioural functionality identical to that of an in- 
put behavioural model G(V, E), while the schedule feasibility condi- 
tion (Eq. 4.4) is not violated. 
In the above definition, solution feasibility by itself does not imply that the design solution meets all of the performance constraints $\Psi$.
The result of physical synthesis for the target device is a 4-tuple $(L, \phi, \sigma, \rho_c)$. This physical design model can be analysed for physical timing characteristics and used to generate the device configuration data necessary for implementing the design functionality on the device resources $D$.
4.3 Comparison with a Traditional High-Level Synthesis Formulation
This section compares the problem formulation introduced in Section 4.2 
with the traditional formulation of high-level synthesis for non-reconfigurable systems (e.g. Gajski et al., 1992; De Micheli, 1994; Gerez, 1999).
The main differences between the two formulations are as follows:
• The scheduling problem (Definition 4.2.4) for reconfigurable systems considers the latency required to set up the architectural resources bound to their respective behavioural model elements. The schedule feasibility condition (Eq. 4.4) ensures that the data and control dependencies between the behavioural operations are not violated, and that the architectural resources are set up before they perform any computations. Furthermore, the function Setup() expresses the interdependence between the design schedule (a high-level characteristic) and the setup latency (typically a low-level, technology-dependent characteristic).
The formulation of the scheduling problem for non-reconfigurable systems requires only that the data and control dependencies between the behavioural operations are not violated. When such a formulation is applied to reconfigurable system synthesis, the reconfiguration latency is not considered at the time of scheduling. Therefore the design schedule may be invalidated at a later stage in the design flow, when the technology-specific reconfiguration latency is inserted into the design schedule.
• The presented formulation of the physical synthesis problem (Definition 4.2.6) allows a clear distinction to be made between synthesis for reconfigurable and non-reconfigurable systems (Eq. 4.7 versus Eq. 4.6). In the context of the presented formulation, reconfiguration is viewed as an instance of a resource sharing problem; note the similarities between architectural resource sharing (Eq. 4.3) and device resource sharing (Eq. 4.7). This observation suggests that similar methods could be used to search for solutions to these problems, while working at different abstraction levels.
• Reconfigurable technologies may provide new possibilities for 'connecting' behavioural model operations $V$. For example, it might be possible to connect two operations using 'virtual registers' or using a pair of overlapping registers which share their contents³. The presented formulation allows these special features to be considered as a normal part of the synthesis transformations for $B = V \cup E$ (Definitions 4.2.1-4.2.4). These special features are supported via the availability of specific connectivity resource types in $R$. Timing characteristics of such resources may be considered as a part of the Setup() and Delay() functions.
In the traditional high-level synthesis formulation, the problems of resource allocation and binding are considered only for behavioural model operations $V$. This is because the implementation of behavioural model edges $E$ (i.e. wires, multiplexors and registers) at a later stage is assumed not to introduce a significant timing overhead into the design schedule.
• A separate reconfiguration schedule $\rho_c$ was introduced in the presented formulation to capture the cycle-true activity of the configuration interface. No equivalent of a reconfiguration schedule exists for non-reconfigurable systems.
³ For a pair of overlapping registers, the register state is transferred directly between the overlapping registers, i.e. without the need for the state to be transferred via the configuration interface to the external memory storage (as for virtual registers). For example, by overlapping the count register between up-counter and down-counter configurations, it is possible for the counters in two different configurations to share their count values (Vasilko and Cabanis, 1999).
The presented problem formulation generalises the problem of system 
synthesis. The synthesis problem for non-reconfigurable systems is viewed 
as a special case of this general formulation. 
4.4 Summary of the Model Features 
The following are the main features of the formulation presented for the problem of reconfigurable system synthesis, as compared to other approaches to reconfigurable system synthesis discussed in Chapter 3:
• Only resource sharing at architectural granularity is considered during behavioural synthesis. This type of resource sharing is identical to that of non-reconfigurable behavioural synthesis, where, for example, one functional unit (e.g. an ALU) can be shared between the behavioural operations which require that functionality (it must satisfy the condition of functional compatibility, Eq. 4.2).
• Resource sharing via reconfiguration is considered at a fine-grained physical level. This allows sharing of primitive physical components at the layout level. Therefore an optimisation algorithm working within this framework will be able to evaluate not only sharing between architectural elements with identical architectural functionality, but also sharing between unrelated architectural elements based on the similarities of their configurations.
• Contrary to other approaches discussed in Sections 3.2.2-3.2.4, no 'configuration partitioning' is performed during the behavioural → architectural translation. Rather, the temporal dependencies resulting from reusing the architectural elements via reconfiguration are annotated as a part of the configuration schedule $\rho$.
A cycle-accurate configuration schedule $\rho_c$ is generated based on the mapping of the logic design model to the physical device model generated during physical synthesis. This schedule represents the set of reconfigurations performed at the granularity determined by the capabilities of the target technology reconfiguration sub-system.
• As there is no a priori assumption that the design should be implemented as a reconfigurable system, the optimisation techniques working within this framework are free to explore the tradeoffs between reconfigurable and non-reconfigurable implementations. Therefore, the result of the synthesis is a solution which accommodates the input design constraints and which may be either a reconfigurable or a non-reconfigurable design implementation.
Chapter 5
DYNASTY Framework 
This chapter presents DYNASTY, an experimental CAD framework developed as a part of this project to support research on reconfigurable system design techniques. The Framework was used throughout the work presented in this thesis, not only for the development of the presented algorithms and techniques, but also for experimentation with target technologies, new simulation techniques and methodologies.
The presentation of the Framework in this chapter is based on the au- 
thor's previous publications from this project (Vasilko et al., 1999; Vasilko, 
1999; Vasilko, 2000). 
The following section provides details about the Framework features and its implementation. In Section 5.2 a user's view of the DYNASTY design flow is presented, while Section 5.3 illustrates the Framework capabilities using a simple design example. The chapter concludes with a summary of the DYNASTY Framework's features in Section 5.4.
5.1 Introduction 
The DYNASTY Framework is an extensible, generic CAD tool suite designed to support research into reconfigurable system design techniques and methodologies.
The Framework fully supports the problem formulation presented in Chapter 4. The design methodology currently implemented in the DYNASTY Framework is based around temporal floorplanning (Vasilko, 1999), which allows simultaneous reconfigurable design space exploration at multiple levels of design abstraction in both the spatial and temporal design dimensions.
The Framework implements several novel concepts, including temporal floorplanning, a technology server¹ based design methodology and a variety of design visualisation techniques allowing designers to interact with the design process throughout the entire design flow.
5.1.1 Architecture 
The overall architecture of the DYNASTY Framework is shown in Fig. 5.1. 
The core of the Framework is the internal design representation, providing design model views at the (i) behavioural, (ii) architectural, and (iii) layout abstraction levels.
Design manipulation tools are provided to facilitate construction of the design solution. Design analysis tools help to evaluate the quality of the constructed solution. Design visualisation techniques provide visual feedback about the design structure, characteristics and performance. Interfaces to third-party tools are provided to allow importing designs into the Framework, communication with a VHDL simulator and other technology-specific analysis tools.
¹ Some of the author's previous publications use the term library server to refer to a technology server.
[Figure 5.1 residue: DYNASTY Framework components, including behavioural model, 2D and 3D placement, detailed cell configuration, dependency/DRC violations, layout device models, cell, module and IP core libraries, and technology-specific algorithms (configuration overhead and delay estimation, placement, routing, etc.); 3rd-party tools (VHDL, EDIF): behavioural and timing VHDL simulation, post-layout delay analysis, configuration controller synthesis]
Figure 5.1: DYNASTY Framework architecture.
A selectable technology server provides technology-specific libraries, device models and algorithms (estimation, placement & routing). Other DYNASTY components not shown in Fig. 5.1 include a design and technology server database, and an internal Tcl command interpreter. A typical DYNASTY design session is shown in Fig. 5.2.
The internal design representation allows a combination of design views during reconfigurable design exploration. For example, during temporal floorplanning, various portions of the design could be available in the architectural and behavioural views. Incomplete design representations are also supported to facilitate late insertion of configuration controllers or other static circuits.
[Figure residue: design manipulation (design entry, design import, module/signal allocation, configuration partitioning, 3D placement, constraint specification) and design analysis (design latency, routing feasibility, dependency violations) around the INTERNAL DESIGN REPRESENTATION]
[Figure 5.2 panels: behavioural model (CDFG) viewer, design browser, technology server browser, schedule editor, command console, technology server cell configuration dialog, 2D floorplanner]
Figure 5.2: Typical DYNASTY session (not all tools shown). 
5.1.2 Design Manipulation and Visualisation
Unlike non-reconfigurable system designers, the designers of dynamically 
reconfigurable systems are required to analyse numerous design character- 
istics simultaneously. The search for a good design solution requires anal- 
ysis of various temporal and spatial design properties, including design 
latency, throughput, configuration time, spatial conflicts, sharing of recon- 
figurable resources in different configurations, impact of placement on the 
RLU reconfiguration time, size of configuration data, power consumption, 
etc. 
A set of novel design visualisation techniques has been developed to support design visualisation and manipulation within the DYNASTY Framework. The aim was to achieve visualisation of the following reconfigurable design properties:
• reconfiguration overhead effects
• configuration partitioning
• spatial conflicts (overlaps) between blocks in different configurations
• design execution and configuration schedule
Furthermore, the goal was to support a variety of reconfigurable logic technologies, so the visualisation techniques had to be technology-independent.
The following are the key design manipulation and visualisation tools provided in the DYNASTY Framework (Fig. 5.2):
• Design browser provides a list of designs, with details about their design elements at all abstraction levels. The browser allows manipulation of the individual design elements in order to perform allocation and module placement, invoke specific design algorithms, etc. The following models are used to represent the design at the three abstraction levels:
- Control/Data Flow Graph (CDFG) represents the design behaviour
- Finite-State Machine with Datapath (FSMD) represents the design architecture
- 3D structural netlist represents the design layout
• Technology server browser provides a list of supported technology servers, with details of their libraries, devices and technology-specific algorithms. The most suitable algorithm for a given design stage can be chosen interactively. This is typically used to choose an estimation technique with the desired accuracy/run-time tradeoff.
• Behavioural model viewer provides a CDFG view of the system behaviour.
• Schedule Editor allows tracking of the execution and reconfiguration schedules for the system during design space exploration.
• Floorplanner provides visualisation of the design structure as either a 2D or a 3D spatial floorplan. It provides two views:
- Configuration view (e.g. Fig. 5.3(b)) represents the partitioning of the design into individual configurations. Such a view is useful in the early design stages, when a designer needs to perform this partitioning on a behavioural design model. At this stage, only the sequencing of design block execution is determined, while the cycle-accurate execution schedule is calculated at a later stage.
- System clock view (e.g. Fig. 5.4(b)) is a cycle-true display of the design activity. This view includes visualisation of both execution and configuration processes for all design blocks. The cycle-true schedule can be recalculated by the technology server as the design is being manipulated.
• Cell configuration dialog allows manipulation of the configuration of detailed layout elements, e.g. routing and logic switches.
The structure and parameters of the reconfigurable logic design solu- 
tion can be directly manipulated via the DYNASTY Framework graphical 
user interface. A change in any view will be propagated to the relevant de- 
sign representation in the other design views. For example, when a design 
module is placed in a configuration which results in a violation of data de- 
pendencies in the design CDFG, the execution and configuration schedules 
can be automatically recalculated to reflect the resulting design latency and 
reconfiguration overhead. 
5.1.2.1 Reconfiguration Overhead Effects 
In the DYNASTY Framework, the reconfiguration overhead is calculated 
by a technology-specific algorithm provided by the technology server. In 
the current implementation, the Framework supports reconfigurable logic 
designs with one configuration controller. The period of reconfiguration is identified in the Schedule Editor by a red bar on top of the schedule display (seen as dark grey in Figure 5.2). In the Floorplanner tool, the configuration of individual blocks is indicated using a pyramid (3D Floorplanner) or a triangle (2D Floorplanner). The number of pyramids/triangles in the direction of the z-axis indicates the configuration interface activity during the system clock cycles.
With these techniques designers can assess the configuration overheads for the current placement, partitioning and clock/configuration cycles of a reconfigurable logic design.
5.1.2.2 Execution and Configuration Schedule for the Design
Due to the interdependencies between the execution and configuration de- 
sign scheduling, both schedules have been merged into a single Schedule 
Editor tool. Here the overall execution schedule is displayed, which com- 
bines the execution and configuration latencies of the individual design 
blocks. Schedule steps are identical to the system clock cycles. If the config- 
uration clock is different from the system clock, the configuration latencies 
are scaled to the system clock units. 
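The scaling of configuration latencies to system clock units can be sketched as follows. This is a minimal illustration, not the Framework's actual routine. The function name is hypothetical. The sketch also assumes that a reconfiguration ending part-way through a system clock cycle occupies that whole cycle in the schedule, so the result is rounded up.

```python
import math
from fractions import Fraction
from numbers import Rational


def config_latency_in_system_cycles(config_cycles: int,
                                    t_config: Rational,
                                    t_sys: Rational) -> int:
    """Express a configuration latency in whole system clock cycles (hypothetical helper).

    config_cycles: configuration clock cycles needed for the reconfiguration
    t_config:      configuration clock period
    t_sys:         system clock period, in the same time unit as t_config
    """
    if config_cycles < 0 or t_config <= 0 or t_sys <= 0:
        raise ValueError("cycle count must be non-negative and periods positive")
    # Exact rational arithmetic avoids floating-point rounding at cycle boundaries.
    system_periods = Fraction(config_cycles) * Fraction(t_config) / Fraction(t_sys)
    # Assumption: a partially used system cycle still occupies a whole schedule step.
    return math.ceil(system_periods)
```

For example, 7 configuration cycles of 10 ns inside a 30 ns system cycle give `config_latency_in_system_cycles(7, 10, 30) == 3`.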
5.1.2.3 2D versus 3D Floorplanner 
The Floorplanner has been designed to provide design visualisation in both 
2 and 3 dimensions. While a 3D floorplan view represents the overall design structure and partitions well, its manipulation may become tedious for large designs. With the 2D Floorplanner, designers can examine each of the layers individually, as well as locations which are 'difficult to reach' in a 3D view. The 2D Floorplanner is also better suited for exploring the desired sharing between configuration layers (users can display a selected number of layers to examine their similarities).
5.1.3 Technology Server 
Available reconfigurable technologies support a wide variety of reconfigu- 
ration mechanisms and device architectures as was discussed in Section 2.2.1. 
The use of technology servers in our Framework offers technology inde- 
pendence as the technology-specific features can be provided as a 'plug-in' 
technology server. 
In its basic configuration, the technology server includes the following components:
- A set of target-technology cell and module libraries. These are a common part of modern FPGA design tools and provide a selection of technology-specific components which can be used in the design.
- Reconfigurable architecture device models provide a detailed model for each of the available devices. Such models contain all logic, routing and configurable resources available in the target technology.
- Technology-specific algorithms. These include algorithms for the estimation of configuration overheads at various levels, placement and routing algorithms, delay estimation, and other routines required to support the specific features of the target reconfigurable technology.
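The sketch below shows one way the three kinds of technology server components could be exposed to a framework as a single plug-in object. It is an assumption-laden illustration, not the thesis's interface. Every class, method and field name is hypothetical, and DYNASTY itself defines its server through Tcl commands rather than Python.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class HardMacro:
    """A library module: footprint in cells and execution latency (hypothetical)."""
    name: str
    width: int
    height: int
    latency: int  # system clock cycles


@dataclass(frozen=True)
class DeviceModel:
    """Coarse device model: size of the cell array (hypothetical)."""
    name: str
    columns: int
    rows: int


class TechnologyServer(ABC):
    """Plug-in boundary between a technology-independent framework and one technology."""

    @abstractmethod
    def module_library(self) -> dict[str, HardMacro]: ...

    @abstractmethod
    def device(self) -> DeviceModel: ...

    @abstractmethod
    def reconfiguration_cycles(self, changed_cells: int) -> int:
        """Technology-specific estimate of configuration overhead."""


class ToyServer(TechnologyServer):
    """Illustrative server for a 20 x 20 array; values are not real XC6200 macros."""

    def module_library(self):
        return {"add8": HardMacro("add8", 2, 8, 1), "mul8": HardMacro("mul8", 8, 8, 2)}

    def device(self):
        return DeviceModel("toy20", 20, 20)

    def reconfiguration_cycles(self, changed_cells):
        # Assumption: one configuration clock cycle per rewritten cell.
        return changed_cells
```

Keeping the overhead estimator behind the same interface as the libraries reflects the point made here: the algorithm, not only the data, is technology-specific.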
Compared with other approaches used to implement generic technology support, the technology server is unique in providing technology-specific algorithms along with the technology libraries and device models.
5.1.4 Design Simulation 
The simulation of reconfigurable logic designs in DYNASTY is supported at two abstraction levels:
- A VHDL simulation model can be generated for the design at any stage of the design process. Clock Morphing simulation (Vasilko and Cabanis, 1999) was selected as the primary simulation method in the Framework for its ability to simulate a reconfigurable design at various abstraction levels.
- The completed design can be exported (via the EDIF third-party tools interface) to Xilinx XACT6000, the XC6200 place-and-route tool (Xilinx, 1997a), where a detailed timing model can be generated and then simulated using a third-party VHDL simulator.
5.1.5 Third-Party Interfaces 
A design can be imported into the Framework in the EDIF 200 format (Stan- 
ford and Mancuso, 1990), which can be exported from many popular de- 
sign entry tools. A design behavioural model can be stored in EDIF as a 
structural representation of the design CDFG. 
The design can be exported from the DYNASTY Framework in both EDIF and VHDL formats, which can be used for simulation, synthesis or delay analysis with third-party tools. The configuration data for the reconfigurable design can be generated using a technology-specific bitstream generator, which produces the design configuration files.
5.1.6 Synthesis of Configuration Controllers and Static Design Modules
Automatic synthesis of configuration controllers is not directly supported 
by the DYNASTY Framework. However, the Framework can generate a 
configuration control schedule in a text file, from which such controllers 
can be constructed using standard ASIC/FPGA design tools or processor 
compilation tools. 
The Framework could provide a library of various reconfiguration controllers suitable for the selected target reconfigurable logic technology. Such a library could be used by the technology server's configuration overhead estimation algorithms to estimate non-deterministic overheads, such as those due to random interrupts or memory contention.
DYNASTY's built-in Tcl language command interpreter allows for all 
of the technology server components to be defined using Tcl commands. 
Such a capability allows for the technology server to reside on a network 
and communicate the technology-specific characteristics and algorithms to 
the DYNASTY Framework remotely. 
5.2 Designing with the DYNASTY Framework 
From the user's perspective the DYNASTY Framework provides a collec- 
tion of tools allowing the designer to construct and analyse various recon- 
figurable logic design solutions in an interactive environment. A typical 
design sequence in the Framework includes the following steps: 
1. Design capture using either schematic or HDL design entry. 
2. Selection of the static parts of the design which should not be subjected to reconfigurable logic design exploration. These are marked as static throughout the design flow.
3. Design exploration of a reconfigurable design search space using temporal 
floorplanning. This involves the use of the tools described in Sec- 
tion 5.1.2. Typically, a good candidate solution is created first and 
various implementation and scheduling options are then explored in 
order to meet the design criteria. 
An initial solution can be created by allocating modules from the technology libraries to nodes in the design CDFG (using the Design Browser) and placing these modules in the design floorplan (using the 3D Floorplanner). The design performance can then be estimated (using the Schedule Editor). Design exploration is performed by gradual modification of design parameters (module allocation and placement, execution and reconfiguration schedule, spatial and temporal partitioning, etc.). Execution and configuration schedules are analysed throughout the design exploration in order to (i) verify design performance and (ii) ensure that no data, control or configuration dependencies have been violated.
Once a satisfactory design solution has been created, it can be ex- 
ported from the Framework for a detailed timing analysis. Any vi- 
olations of timing constraints are used to adjust design solution pa- 
rameters until all design constraints are met. 
4. Generation of final solution. The configuration bitstreams are generated 
for the final design. 
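A small sketch can illustrate the dependency checks performed during design exploration in step 3. It assumes a simplified schedule in which each CDFG node has four values, all in system clock cycles:

- an execution start cycle,
- an execution latency,
- a configuration start cycle,
- a configuration latency.

It also assumes a single configuration controller, as in the current DYNASTY implementation. Control dependencies are not modelled. The `Block` record and the `violations` function are hypothetical names, not part of the Framework.

```python
from dataclasses import dataclass


@dataclass
class Block:
    exec_start: int      # first system clock cycle of execution
    latency: int         # execution latency in system clock cycles
    config_start: int    # first system clock cycle of (re)configuration
    config_cycles: int   # configuration latency in system clock cycles (0 if reused)


def violations(blocks: dict[str, Block], edges: list[tuple[str, str]]) -> list[str]:
    """Describe data and configuration dependency violations (hypothetical checker)."""
    found = []
    # Data dependencies: a consumer may start only after its producer has finished.
    for u, v in edges:
        if blocks[v].exec_start < blocks[u].exec_start + blocks[u].latency:
            found.append(f"data: {v} starts before {u} finishes")
    # A block must be fully configured before it starts executing.
    for name, b in blocks.items():
        if b.config_start + b.config_cycles > b.exec_start:
            found.append(f"config: {name} executes before its configuration ends")
    # One configuration controller: configuration intervals must not overlap.
    # After sorting by start time, any overlap shows up between some neighbouring pair.
    busy = sorted((b.config_start, b.config_start + b.config_cycles, n)
                  for n, b in blocks.items() if b.config_cycles > 0)
    for (s1, e1, n1), (s2, e2, n2) in zip(busy, busy[1:]):
        if s2 < e1:
            found.append(f"controller: {n1} and {n2} are configured at the same time")
    return found
```

An empty list means the schedule respects the modelled dependencies. Each message names the blocks involved, so the offending modules can be located in the floorplan.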
5.3 Design Example 
A simple example is used in this Section to demonstrate some of the capa- 
bilities of the DYNASTY Framework. Other examples are provided in the 
core text of this thesis. In the following, only the 3D Floorplanner visualisation tool will be used. The example will be implemented on a model dynamically reconfigurable FPGA architecture, derived from the Xilinx XC6200 FPGA technology (Appendix A).
A Laplace operator mask design is used here as an example to demonstrate the design flow (its data-flow graph is shown in Fig. 5.3(a)).
This design is also used later in the thesis (Chapter 7) to benchmark the 
performance of the developed synthesis technique. 
Let us consider an implementation of the Laplace operator on a resource- 
limited FPGA architecture (20 x 20 array). The size of the reconfigurable 
array does not allow for the entire Laplace operator to be implemented 
in a single configuration. A designer may opt to consider an alternative 
implementation where the data-flow computation is 'folded' over several 
configurations. 
In this case the designer would construct a 3D floorplan from the available design blocks. The main design objective in most cases will be to minimise the execution latency of the entire design. The latency is determined by both the module execution latency and the configuration latency². While the module execution latency is fixed for a given module type, the configuration latency can be reduced if module resources can be shared between configurations. The designer needs to identify design solutions where such resource sharing is maximised.
² In order to maintain the clarity of the presented example, the configuration clock frequency was chosen so that the number of system clock cycles needed for configuration does not exceed four. The selection of the ratio between the system and configuration clocks will normally depend on the design objectives and constraints.
(a) Data-flow graph. (b) 3D floorplan (configuration partition view; configuration partitions 1 and 2).
Figure 5.3: Laplace operator 3D floorplan and data-flow graph after scheduling. Each layer in the 3D floorplan represents one design configuration as partitioned by a designer.
First, the design modules can be partitioned into individual configurations. The 3D Floorplanner tool in a configuration view can be used to visualise such an initial solution (Figure 5.3(b)). Once the initial partitioning has been decided, the designer would aim to minimise the configuration overhead with a module placement which maximises module sharing. The actual execution latency can be measured in the Schedule Editor and seen in the 3D Floorplanner using a system clock view (Figure 5.4(b)).
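A deliberately simple cost model can illustrate why placement-driven sharing reduces configuration overhead. The model is assumed here for illustration only:

- a configuration is a map from (x, y) cell coordinates to a cell setting;
- the overhead of loading a layer is the number of cells whose setting changes.

Real devices such as the XC6200 have their own configuration addressing and overhead rules, which a technology server would supply. The function names are hypothetical.

```python
Cell = tuple[int, int]


def changed_cells(prev: dict[Cell, str], nxt: dict[Cell, str]) -> set[Cell]:
    """Cells that must be rewritten to move from configuration prev to nxt."""
    # A cell needs rewriting if nxt uses it with a different (or no previous) setting.
    # Cells used only by prev are simply left as they are in this simple model.
    return {c for c, setting in nxt.items() if prev.get(c) != setting}


def total_reconfiguration_cost(layers: list[dict[Cell, str]]) -> int:
    """Changed-cell count summed over a sequence of configurations, from a blank array."""
    cost, current = 0, {}
    for layer in layers:
        cost += len(changed_cells(current, layer))
        current = {**current, **layer}  # untouched cells keep their old settings
    return cost
```

Keeping a two-cell adder in the same place across two configurations costs 2 + 1 = 3 cell writes when a one-cell multiplier is added. Moving the adder costs 2 + 3 = 5 writes, which is the kind of difference the designer aims to exploit when placing modules.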
5.4 Conclusions 
The DYNASTY Framework provides a combination of techniques, which 
allow simultaneous exploration of a reconfigurable design search space in 
both temporal and spatial dimensions. Technology-dependent features are provided by a technology server, which is unique in providing device models and technology-specific algorithms along with the cell and module libraries.
(a) Sequenced data-flow graph (nodes n1 to n5 in execution order). (b) 3D floorplan (system clock view; z-axis: system clock cycles).
Figure 5.4: Laplace operator 3D floorplan and data-flow graph after scheduling. Each layer in the 3D floorplan represents one system clock cycle; a pyramid indicates that a block is being reconfigured and a cube denotes its execution.
Experience with using DYNASTY for the design of various partially reconfigurable circuits in the XC6200 technology confirms that temporal floorplanning leads to a considerable reduction of the design time compared to other iterative XC6200 design methodologies (e.g. Robinson et al., 1998).
Such a reduction can be attributed to the capabilities of the Framework, which provides designers with immediate visual feedback about design characteristics throughout the entire design flow and allows design manipulation at multiple abstraction levels simultaneously. Bad design decisions can be identified early, while the feasibility of the final design solution is guaranteed by checking for dependency violations.
The Framework allows an expert human designer to control the entire reconfigurable logic design process, while automatic design and estimation
techniques can provide guidance and acceleration of computationally in- 
tensive tasks. Further 'push-button' techniques for automatic exploration 
of a multiple-level reconfigurable design search space will provide a fast 
design route in scenarios where design time is a primary objective, while 
possible implementation inefficiencies can be tolerated. One such auto- 
matic technique is presented in this thesis. 
Although further development of automatic synthesis and estimation 
algorithms for reconfigurable systems can be expected to reduce the em- 
phasis on manual reconfigurable logic design manipulation, the presented 
visualisation techniques will still be able to provide an intuitive visual frame- 
work for the analysis and manipulation of auto-generated design solutions. 
Chapter 6 
Synthesis of Dynamically Reconfigurable Systems with Evolutionary Algorithms
The theoretical model for the synthesis of reconfigurable systems presented in Chapter 4 provides a framework in which various optimisation algorithms can search for design solutions.
In order to confirm the viability of this formulation, an optimisation technique based on evolutionary algorithms has been developed for the synthesis of reconfigurable systems; it is presented in this chapter.
The following section defines a restricted problem for the synthesis of 
reconfigurable systems, which is considered by the presented evolutionary 
optimisation technique. Section 6.2 presents a newly developed temporal 
floorplanning representation of this problem. Section 6.3 discusses the suit- 
ability of various optimisation algorithms for the solution search using the 
temporal floorplanning representation. Genetic algorithms are briefly introduced in Section 6.4. The implementation of the newly developed reconfigurable system synthesis technique based on genetic algorithms is described in Section 6.5.
6.1 Restricted Problem for Synthesis of Reconfigurable Systems
The formulation of the reconfigurable synthesis problem presented in Chapter 4 defines a generalised and complex set of interdependent transformations. In order to simplify the initial search for algorithms capable of solving this problem, the formulation was restricted to a simplified instance of the original reconfigurable synthesis problem.
This section presents the restricted formulation, which limits the type of reconfigurable system to one that can be synthesised using the presented approach. Furthermore, the assumptions about the design methodology within which this algorithm operates simplify several processing steps, allowing practical verification of the presented technique.
The following are the assumptions and practical considerations which 
restrict the RS synthesis problem considered here: 
1. Only acyclic data-flow problems are considered. With respect to Definition 4.1.1, in the input behavioural model G(V, E) the sets of control vertices and control edges are assumed to be empty:
$$V_c = \emptyset \;\wedge\; E_c = \emptyset \qquad (6.1)$$
and thus for G(V, E) it is assumed that $V = V_d$ and $E = E_d$. In this case G(V, E) represents a data-flow graph.
This assumption ensures that no cycles and no conditional execution branches exist in the input problem. Therefore the design execution and reconfiguration schedules can be determined and fixed during synthesis.
While only acyclic graphs are considered, it should be noted that 
cyclic data-flow graphs can be easily transformed into acyclic graphs 
representing algorithm iterations with non-overlapping schedules. This
approach is commonly used in high-level synthesis (Gerez, 1999, Chap- 
ter 12). 
2. The input design problem is deterministic. It is assumed that the 
input design problem is fully specified prior to synthesis and there- 
fore all behavioural design characteristics and dependencies can be 
determined during the synthesis. 
3. Logic synthesis is performed via direct mapping to 'hard' library 
modules. It is assumed that the architectural resource library for the 
targeted technology provides a set of hard macro modules which are
available during synthesis. 
The term 'hard macro modules' refers to target technology modules, 
in which primitive cells were placed relative to the module origin and 
routed locally. The use of only local routing allows for modules to be 
placed in more than one location within the target technology array. 
With respect to Definition 4.2.5, given the design architectural model $(A, \beta, \sigma, \rho)$, the design logic model (U, N) can be constructed by replacing the architectural resource instances by their corresponding 'hard' macro modules from the target technology library.
4. Architectural modules can be connected only via register transfers. It is assumed that the architectural module input/output register values are transferred between the modules via the RLU's configuration interface. No physical wiring is allowed between the architectural modules.
Panels: behavioural level; architectural level with FSM-controlled sharing.
Figure 6.1: Architectural-level resource sharing controlled by an FSM.
In order to allow for inter-module wiring to exist, the synthesis pro- 
cess would have to address the routing problem for reconfigurable 
systems. This is a complex problem, for which solutions have not 
yet been proposed and therefore the routing problem is not consid- 
ered here. The routing problem in reconfigurable systems is further 
discussed in Section 8.2. 
5. Only a restricted case of architectural-level resource sharing is con- 
sidered. 
Implementation of resource sharing at the architectural level may re- 
quire that a finite-state machine (FSM) is constructed for each in- 
stance where such sharing occurs. This FSM controls the access of 
input/output signals and registers to the ports of the shared architectural resource (Fig. 6.1). The existence of such an FSM would require that both the FSM and its supporting logic are synthesised as a part of the RS synthesis. Such FSM synthesis is not considered here.
Panels: behavioural level; architectural level with configuration interface data transfers.
Figure 6.2: Architectural-level resource sharing with module a0 shared between behavioural computations b0 and b1. The values stored in registers i0, i1 and o0 are transferred via the configuration interface.
However, an architectural module can be shared between two behavi- 
oural computations without any additional logic in the arrangement 
shown in Fig. 6.2. In this case, the shared module remains configured 
in the RLU, while its input/output registers are programmed with 
new values via the configuration interface. 
This type of architectural-level resource sharing is considered by the presented restricted synthesis problem. Resource sharing at the physical level is also considered.
6. Target architecture assumptions. It is assumed that synthesis is targeting the reconfigurable system architecture depicted in Fig. 2.1, with the following restrictions/features:
- the target architecture is a synchronous system. Both the computations in the RLU and the system's RCU are synchronised with one system clock signal. The communication between the RLU and RCU is controlled by a separate configuration clock, which is synchronised with the system clock at the beginning of the system clock cycle, although it may run asynchronously within the system cycle. This ensures that the relationship between the system and configuration clock signals is well defined. Examples of possible timing relationships between the system and configuration clock signals are illustrated in Fig. 6.3.
(a) $T_{\mathrm{sys\_clk}} = 3 \times T_{\mathrm{config\_clk}}$. (b) $T_{\mathrm{sys\_clk}} > T_{\mathrm{config\_clk}}$. (c) $T_{\mathrm{sys\_clk}} < T_{\mathrm{config\_clk}}$.
Figure 6.3: Relationship between the system and configuration clock signals. $T_{\mathrm{config\_clk}}$ and $T_{\mathrm{sys\_clk}}$ are the periods of the configuration clock signal and system clock signal respectively.
- the RLU is implemented on a single reconfigurable logic device, and therefore partitioning between multiple reconfigurable devices is not considered.
- reconfiguration is controlled by a dedicated RCU, which is external to the RLU, and therefore all RLU resources are available for the design implementation.
- the RCU-CDS and RCU-RLU interfaces are implemented as dedicated interfaces which are not shared with other system components. The data transfers across these interfaces are fully deterministic and therefore there is no need to consider delays due to bus sharing and arbitration.
- the configuration and data stored within the CDS can be read/written within one configuration clock cycle. Furthermore, the CDS has sufficient capacity to store the configuration data, the RLU state and application-specific data.
7. Synthesis goals. It is assumed that there are two main goals for reconfigurable logic system synthesis:
- The primary goal is to produce a feasible reconfigurable logic system (Definition 4.2.7).
- The secondary goal is to produce a system which meets the constraint on its execution latency.
These goals ensure that even if a solution violates the secondary design goal (the execution latency constraint), it is still guaranteed to operate correctly in the target reconfigurable system. The approach can therefore be used to evaluate the feasibility of the given design constraints (e.g. the design latency and the targeted technology).
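Assumption 1 can be checked mechanically before synthesis starts. The sketch below uses Kahn's topological-sort algorithm, a standard technique chosen here for illustration rather than one prescribed by this thesis. It confirms that a data-flow graph is acyclic and, if so, returns one valid execution order. Node names are illustrative.

```python
from collections import deque


def topological_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Return a topological order of the DFG, or raise ValueError if it has a cycle."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    # Start from nodes with no predecessors (e.g. primary inputs).
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    # Nodes left unvisited lie on (or behind) a cycle.
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; unroll its iterations first")
    return order
```

A graph that fails this check would first be transformed into an acyclic graph of non-overlapping iterations, as noted under assumption 1.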
The above assumptions restrict the application domain of the presented synthesis technique. However, they are representative of many practical reconfigurable systems implementing real-time applications, such as digital signal and image processing, as well as other applications based on data-flow computations.
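The ordering of the two synthesis goals in item 7 can be expressed as a lexicographic comparison: feasibility is compared first, and latency only breaks ties. The sketch below is one possible encoding of that ordering. The `Candidate` record and the helper names are hypothetical and are not taken from the fitness function developed in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A synthesised solution summarised by the two goals (hypothetical record)."""
    feasible: bool   # primary goal: feasible reconfigurable system
    latency: int     # execution latency in system clock cycles


def rank_key(c: Candidate) -> tuple[bool, int]:
    # False sorts before True, so every feasible solution outranks every
    # infeasible one; latency only decides between solutions of equal feasibility.
    return (not c.feasible, c.latency)


def best(candidates: list[Candidate]) -> Candidate:
    return min(candidates, key=rank_key)


def meets_constraint(c: Candidate, latency_limit: int) -> bool:
    """True only if both the primary and the secondary goal are satisfied."""
    return c.feasible and c.latency <= latency_limit
```

With this ordering, a slow but feasible solution is always preferred to a fast infeasible one. This matches the guarantee that the result operates correctly even when the latency constraint is missed.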
6.2 Synthesis Process Overview (Temporal Floorplanning)
The first problem in the reconfigurable synthesis formulation is resource 
allocation and binding. Then a scheduling problem can be considered. The 
feasibility of a reconfigurable design schedule for the selected type of tar- 
get technology may depend greatly on the feasibility of the design layout, 
where there should be no spatial or temporal conflicts between the design 
configurations. 
The restricted RS synthesis problem assumes that the logic synthesis 
is replaced by direct mapping of the architectural modules onto the target 
technology hard macro modules. Such an approach allows for the physical 
characteristics of the technology modules, e. g. module dimensions and la- 
tency, to be known at a high level. Furthermore, as no wiring is considered, 
physical synthesis translates to a problem of design module placement. 
Due to the interdependent relationship between the above problems, i.e. resource allocation, resource binding, design scheduling and module
placement, it would be useful to consider the problems together. This is 
possible within a 3D floorplan model (see Fig. 6.4(a)). 
In the 3D floorplan the horizontal x/y-coordinates represent the spatial positions of the design modules, while the vertical z-coordinate represents the system clock cycles. In Fig. 6.4(a) the execution of the system proceeds from
the bottom (z = 0) towards the top of the floorplan. The 3D dimensions 
of the design module represent its spatial and temporal characteristics, ex- 
tracted from the library cell linked with the design block as a result of re- 
source allocation and binding. 
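In this model each placed module occupies a box. Its x/y extent is its footprint, and its z extent is the interval of system clock cycles during which it is configured and executed. A spatial or temporal conflict between configurations is then an overlap of two boxes. The sketch below makes two assumptions:

- intervals are half-open integers;
- field names are hypothetical.

It ignores intentional resource sharing, where blocks deliberately reuse the same cells.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A module in the 3D floorplan: origin (x, y, z) and extent (w, h, d)."""
    name: str
    x: int
    y: int
    z: int   # first system clock cycle occupied
    w: int
    h: int
    d: int   # cycles occupied (configuration plus execution)


def _overlap(a0: int, alen: int, b0: int, blen: int) -> bool:
    # Half-open intervals [a0, a0 + alen) and [b0, b0 + blen) intersect.
    return a0 < b0 + blen and b0 < a0 + alen


def conflicts(boxes: list[Box]) -> list[tuple[str, str]]:
    """Pairs of modules that use the same cells in the same clock cycles."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2)
            if _overlap(a.x, a.w, b.x, b.w)
            and _overlap(a.y, a.h, b.y, b.h)
            and _overlap(a.z, a.d, b.z, b.d)]
```

Two modules may occupy the same cells as long as their z intervals do not intersect. That is exactly how a later configuration reuses the array.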
The 3D floorplan can therefore be represented as a 4-tuple $(A, \beta, \sigma, \phi)$. Considering the above interpretation of the 3D floorplan, the 4-tuple can be expressed as $(A, \beta, z, x, y)$, where the z-coordinates follow from the execution schedule $\sigma$ and the x/y-coordinates from the placement $\phi$.
(a) 3D floorplan (z-axis: system clock cycles). (b) Scheduled DFG. (c) Design execution and configuration schedule (device reconfiguration intervals and block/CDFG node execution).
Figure 6.4: A Laplace operator mask 3D floorplan and data-flow graph during temporal floorplanning. Each layer in the 3D floorplan represents one system clock cycle; a pyramid indicates that a block is being reconfigured while a cube denotes its execution.
In the following, the process of design synthesis using the above 3D 
floorplan model is referred to as temporal floorplanning. Temporal floorplan- 
ning involves finding a solution to the following problems: 
- resource allocation and binding, i.e. finding the set $A$ in Eq. 4.1 and $\beta$ in Eq. 4.2
- design execution scheduling, i.e. finding $\sigma$ in Eq. 4.4
- reconfiguration overhead calculation and configuration scheduling, i.e. finding $\rho$ in Definition 4.2.5 and scaling the reconfiguration overhead to the Setup() function in Eq. 4.4
- design block placement, i.e. finding $\phi$ in Definition 4.2.6
while the solution feasibility condition is not violated (Definition 4.2.7) and 
the set of performance constraints b is satisfied. 
Once the final solution has been found, the information contained within 
the 3D floorplan can be used to extract the configuration data necessary 
for the implementation of the design functionality in the target reconfigurable logic technology. Furthermore, the configuration schedule $\rho$ can be extracted and used to construct a reconfiguration controller, which will control the system's reconfiguration.
6.2.1 Technology Independence 
In the 3D floorplan model, the base 2D floorplan array is composed of blocks which encapsulate the primitive components of the target technology. For example, in Fig. 6.4(a) a primitive floorplan block corresponds to a logic block and a set of routing multiplexors in the Xilinx XC6200 technology (see Fig. A.1 in Appendix A, page 149). While different reconfigurable devices provide different primitive physical components, their abstraction
in the 3D floorplan is the same. Therefore, the 3D floorplan abstraction 
allows algorithms to operate within one common model, while targeting 
different technologies. 
6.3 Optimisation Algorithm Selection 
Chapter 4 has formulated the problem of synthesis for reconfigurable sys- 
tems. Although only a restricted case of this problem is considered for auto- 
matic synthesis here, the formulation demonstrates tight interdependence 
between the individual tasks. Most importantly, technology-specific features, such as partial reconfiguration and the sophisticated features of reconfigurable logic configuration interfaces, can further complicate these interdependencies.
In the search for a suitable optimisation algorithm for this problem a 
number of options were examined. 
In non-reconfigurable system synthesis, simple and fast heuristics are 
used to solve the individual synthesis problems. These techniques cannot 
guarantee the feasibility of the generated solution because they operate on 
simplified problem models, which do not consider the low-level design 
characteristics and technology constraints. In the case of reconfigurable 
systems, examples which use simple heuristic search techniques include 
(Ling and Amano, 1993b) and also the author's previous work (Vasilko and 
Ait-Boudaoud, 1996a). 
Given the complex formulation of the problem of synthesis for recon- 
figurable systems, and the tight interdependence between the individual 
synthesis problems, techniques capable of considering all the interdependencies should be more appropriate. The following techniques were considered:
- Integer Linear Programming (ILP) (e.g. Nemhauser and Wolsey, 1988) is a robust optimisation technique capable of finding a globally optimal solution for any problem which can be formulated as a set of linear relations over integer variables. However, the run time of these techniques is prohibitive for large and interdependent problems. Examples of using ILP optimisation on simplified models of reconfigurable system synthesis include Sels (1996) and Kaul and Vemuri (1998).
- Simulated Annealing (Kirkpatrick et al., 1983) is a stochastic optimisation method based on a mathematical model of annealing in physical systems, in which a material is cooled slowly to reach a low-energy state. The technique gained its popularity through its ability to climb out of local minima.
Although the run time of simulated annealing based optimisation can be significantly shorter than that of an ILP method, it can still be prohibitive for large problems. The long run times can be attributed to the technique using relatively small steps to progress towards new solutions during the search. A simulated annealing based algorithm for the synthesis of reconfigurable systems in a simple 3D floorplan was demonstrated by Bazargan et al. (1999).
- Evolutionary Algorithms (Holland, 1975; Goldberg, 1989) are a global optimisation technique based on a model of Darwinian evolution. Like simulated annealing, this technique is capable of 'hill-climbing'. Evolutionary algorithms operate on a large set of solutions rather than on a single solution. This allows a large number of alternative solutions to be examined within a short period of time. The application of evolutionary algorithms to a simplified problem of reconfigurable system synthesis was considered by Zhang et al. (1998).
Evolutionary algorithms were selected as a suitable candidate for the evaluation of reconfigurable system synthesis using the presented problem formulation. One category of evolutionary algorithms, genetic algorithms (GAs), has attracted considerable interest in recent years for its application to complex real-world problems.
Evolutionary algorithms have also been demonstrated to successfully solve multi-objective and interdependent problems in VLSI CAD (e.g. Dodhi et al., 1995; Ohmori, 1995; Morris and Nowrouzian, 1996; Sait et al., 1996).
The ability of genetic algorithms to search a large pool of very differ- 
ent solutions early during the solution search will be beneficial for recon- 
figurable system synthesis. While the overall objective of reconfigurable 
system synthesis is to produce a feasible solution which meets the perfor- 
mance constraints, it is often only necessary to quickly evaluate the feasi- 
bility of the target technology or a performance constraint for a given input 
problem. A genetic algorithm will produce many different solutions early 
during the solution search, allowing designers to assess the suitability of 
the target technology from these early results. 
6.4 Genetic Algorithms 
Genetic algorithms (Goldberg, 1989; Holland, 1975) are a stochastic optimisation technique based on principles of natural evolution. They use a mathematical model of the 'survival of the fittest' strategy thought to operate in nature, which can lead to the selection of the best solution for a given set of environmental conditions.
GA chromosome 100101 (MSB to LSB) with encoded value 37.
Figure 6.5: An example of chromosome coding in a genetic algorithm (1-dimensional binary string). The binary value encoded in the chromosome is linked to the system variable under optimisation.
An optimisation problem which is to be solved by a genetic algorithm must first be encoded so that it can be manipulated by the algorithm. The
individual solutions are coded in a data structure representing chromosomes. 
The values encoded in the chromosome are linked to system variables, the 
values of which are the subject of GA optimisation. For example in Fig. 6.5, 
the solution is represented by a binary string of a fixed size. Its encoded in- 
teger value can be linked to a single one-dimensional system variable, e. g. 
an angle, velocity, position, priority, etc. 
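Decoding the chromosome of Fig. 6.5 is a plain binary-to-integer conversion. The sketch below also shows one common way of linking the integer to a bounded real-valued variable such as an angle: linear scaling onto an interval. That scaling is an assumption here, not something this thesis specifies.

```python
def decode(bits: str) -> int:
    """Interpret a chromosome written MSB first as an unsigned integer."""
    value = 0
    for b in bits:
        if b not in "01":
            raise ValueError(f"not a binary gene: {b!r}")
        value = value * 2 + int(b)
    return value


def to_variable(bits: str, low: float, high: float) -> float:
    """Map the decoded integer linearly onto the interval [low, high]."""
    if not bits:
        raise ValueError("empty chromosome")
    top = 2 ** len(bits) - 1  # largest value the chromosome can encode
    return low + (high - low) * decode(bits) / top
```

For example, `decode("100101")` gives 37, matching Fig. 6.5. With six bits, `to_variable` can represent 64 evenly spaced values between `low` and `high`.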
The chromosomes are grouped to form a population of possible solu- 
tions, which is manipulated by two types of genetic operators during the 
evolutionary process: crossover and mutation. 
The crossover operator selects pairs of chromosomes from the old gen- 
eration ('parents') and performs their 'mating' to generate two new chro- 
mosomes ('children') in the new generation (Fig. 6.6). These children may 
combine some of the 'good' characteristics of their parents. Therefore, there is a likelihood that the crossover operator will produce at least some new chromosomes which have better characteristics than the parents.
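One-point crossover on binary strings, the variant shown in Fig. 6.6, can be sketched as below. The function name is illustrative; the cut point is drawn strictly inside the string so that both parents contribute genes, which requires chromosomes of at least two bits.

```python
import random


def one_point_crossover(parent1: str, parent2: str,
                        rng: random.Random) -> tuple[str, str]:
    """Swap the tails of two equal-length chromosomes after a random cut point."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    cut = rng.randint(1, len(parent1) - 1)      # cut strictly inside the string
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


print(one_point_crossover("100101", "011010", random.Random(1)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing synthesis runs started from different seeds.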
The mutation operator selects a chromosome from the old generation 
and introduces a random change to the chromosome which is then stored 
in the new generation (Fig. 6.7). This allows for the generation of new offspring which do not inherit the entire set of their parents' characteristics.
[Figure: two old individuals (parent 1, parent 2) exchange the parts of their binary strings that follow a single crossover point, producing two new individuals (child 1, child 2).]
Figure 6.6: An example of a crossover operator (one-point crossover).
[Figure: an old individual (parent) is copied to a new individual (child) with one of its bits inverted.]
Figure 6.7: An example of a mutation operator (random 'flip' mutation).
This is the mechanism which allows genetic algorithms to escape the 'lo- 
cal minima' during the evolutionary search process. There is a possibility 
that new characteristics introduced by the mutation operator may provide 
some chromosomes with different and better characteristics than those of 
the parents. 
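A random 'flip' mutation can be sketched as follows. Fig. 6.7 shows a single inverted bit; the sketch uses the common per-bit variant, in which each bit is inverted independently with probability `p_mut` (an assumption, not necessarily the exact operator behind the figure).

```python
import random


def flip_mutation(chromosome: str, p_mut: float, rng: random.Random) -> str:
    """Invert each bit independently with probability p_mut."""
    mutated = []
    for bit in chromosome:
        if rng.random() < p_mut:
            mutated.append("0" if bit == "1" else "1")   # invert this bit
        else:
            mutated.append(bit)
    return "".join(mutated)


print(flip_mutation("101011", 0.2, random.Random(0)))
```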
A generic simple genetic algorithm can be expressed by the procedure 
shown in Algorithm 6.1. 
Initialisation of a population creates a set of individuals whose number (the population size) is one of the parameters of the genetic algorithm. This initialisation is often performed by a simple random assignment of the chromosome values.
Algorithm 6.1 Simple Genetic Algorithm 
initialise a population 
evaluate the population fitness 
while the stopping criterion is not satisfied do
select individuals for the next generation 
apply crossover 
apply mutation 
evaluate the population fitness 
end while 
After initialisation the genetic algorithm enters into an iterative loop 
which simulates natural evolution by performing a sequence of genetic op- 
erations. 
First, the fitness of each individual in the population is evaluated. The 
fitness provides a measure of the individual's quality and is traditionally 
represented by a scalar value. 
The fitness evaluation is followed by a selection procedure, which selects the individuals that will survive into the next generation. This selection is based on the individuals' fitness: individuals with high fitness are more likely to survive than individuals with low fitness. After selection, the surviving individuals are stored as the current population, from which the new generation is produced.
Then both crossover and mutation operators are applied to individuals 
randomly selected from the current population. 
The entire process repeats until one or more stopping criteria are satisfied. Various stopping criteria have been used with genetic algorithms; the most common are a limit on the total number of generations and population convergence (similarity of individuals within the entire population).
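The loop of Algorithm 6.1 can be sketched on fixed-length bit strings as below. This is a generic illustration, not the reconfigurable-synthesis GA described later in this chapter: roulette-wheel selection, one-point crossover, per-bit flip mutation and a fixed generation limit are assumed choices, and the parameter defaults are arbitrary.

```python
import random
from typing import Callable


def simple_ga(fitness: Callable[[str], float], length: int, pop_size: int = 20,
              generations: int = 50, p_cross: float = 0.7, p_mut: float = 0.02,
              seed: int = 0) -> str:
    """Algorithm 6.1 on bit strings (length >= 2); fitness must be non-negative."""
    rng = random.Random(seed)
    # initialise a population of random chromosomes
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    best = max(pop, key=fitness)                      # evaluate the population fitness
    for _ in range(generations):                      # stopping criterion: generation limit
        # select individuals: fitness-proportionate (roulette-wheel) selection
        weights = [fitness(c) + 1e-9 for c in pop]   # offset keeps all weights positive
        selected = rng.choices(pop, weights=weights, k=pop_size)
        # apply crossover: one-point crossover on consecutive pairs
        pop = []
        for a, b in zip(selected[0::2], selected[1::2]):
            if rng.random() < p_cross:
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            pop += [a, b]
        if pop_size % 2:
            pop.append(selected[-1])                 # odd population: copy the last one
        # apply mutation: flip each bit with probability p_mut
        pop = ["".join(("1" if g == "0" else "0") if rng.random() < p_mut else g
                       for g in c) for c in pop]
        best = max(pop + [best], key=fitness)        # remember the best seen so far
    return best


print(simple_ga(lambda c: c.count("1"), length=16))  # maximise the number of 1s
```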
6.5 Implementation of an Automatic Reconfigurable System Synthesis
When genetic algorithms are considered for the optimisation of any given problem, it is necessary to examine several problem-specific issues:
• problem representation
• population initialisation
• selection of genetic operators (crossover and mutation)
• fitness function
• selection of a genetic algorithm procedure and control parameters
6.5.1 Problem Representation 
Conventional GA-based optimisation techniques use simple data structures 
to represent the problem variables. Examples include binary (Fig. 6.7) or 
integer strings, arrays and trees. 
The problem of reconfigurable system synthesis represents a collection 
of complex and interdependent relationships. Although temporal floor- 
planning provides a simplification of this process, it still involves a solution 
search for several interdependent problems. 
In order to use genetic algorithms successfully, the problem representation and the genetic operators must closely match the characteristics of the problem. For the problem of temporal floorplanning, a problem-specific representation was developed together with a set of problem-specific genetic operators.
Algorithm 6.2 Population initialisation 
Require: G(V, E), n (population size)
1. Create n individuals in population P
for all p_i ∈ P do
  2. Perform resource allocation (Algorithm 6.3)
  3. Place the design in a 3D floorplan (Algorithm 6.4)
  4. Calculate the reconfiguration latency and correct the 3D floorplan using a technology-specific procedure (Algorithm 6.13).
end for
A composite chromosome was designed, in which the genes are linked with behavioural model elements and represent the following behavioural element properties (Fig. 6.8):
• resource binding, represented as a link to the library or to a shared design module implementing the desired behavioural functionality
• 2D position in the target technology device, represented as a pair of integer floorplan x/y coordinates
• temporal position, which represents the system's execution schedule time slot
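A possible in-memory form of this composite chromosome is sketched below. The class and field names are hypothetical, and the chromosome is modelled as a mapping from CDFG node names to genes; the thesis implementation (built on GAlib) is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Gene:
    """Properties of one behavioural element (CDFG node)."""
    binding: str  # module instance: a library module or a module shared with other nodes
    x: int        # floorplan x coordinate
    y: int        # floorplan y coordinate
    t: int        # execution schedule time slot (system clock cycle)


@dataclass
class Chromosome:
    genes: Dict[str, Gene] = field(default_factory=dict)  # CDFG node name -> gene

    def module_instances(self) -> Set[str]:
        """Distinct module instances; nodes sharing a module count once."""
        return {g.binding for g in self.genes.values()}


# Illustrative individual: n3 and n4 share one adder instance in different cycles.
c = Chromosome({"n1": Gene("MULT_0", 0, 0, 1),
                "n3": Gene("ADD_0", 8, 0, 5),
                "n4": Gene("ADD_0", 8, 0, 6)})
print(sorted(c.module_instances()))   # ['ADD_0', 'MULT_0']
```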
6.5.2 Population Initialisation 
An individual in a population is initialised by the procedure in Algorithm 6.2. First, a population of the required size is created. Then the behavioural elements of all individuals in the population are allocated and bound to hardware modules using a greedy 'first-come, first-served' allocation and binding algorithm (Algorithm 6.3), followed by a random placement of the design modules in a 3D floorplan (Algorithm 6.4). The placement is followed by a procedure (Algorithm 6.13)
[Figure: the GA population consists of chromosomes; each chromosome holds one group of genes per CDFG node (n1 to n5), e.g. resource binding: ADD, (x, y) floorplan position, and system clock cycle: 7. Resource bindings link to the architectural resource library (ALU, ADD, MULT, ...).]
Figure 6.8: Reconfigurable system synthesis problem GA representation.
Algorithm 6.3 First-come first-served allocation and binding 
Require: G(V, E), B = V ∪ E, R, A = ∅
for all b_i ∈ B do
  for all r_j ∈ R do
    if F(b_i) ⊆ F(r_j) then
      create a_i, A = A ∪ {a_i}
      r_j → a_i
      a_i ← b_i
      break
    end if
  end for
end for
Ensure: Eq. 4.1 and Eq. 4.2 are satisfied.
which checks and corrects any data or configuration dependency conflicts. 
All such conflicts are corrected in order to guarantee that the pool of initial 
solutions is feasible. 
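The first-come, first-served binding of Algorithm 6.3 can be sketched as below. Simplifying assumptions: only CDFG nodes (not edges) are bound, each element carries a single function symbol rather than a function set F(b_i), and instance names such as `ADD_1` are a hypothetical naming scheme. As in Algorithm 6.3, every element receives a new instance, so resource sharing is left to the genetic operators.

```python
from typing import Dict, List, Set, Tuple


def fcfs_bind(elements: Dict[str, str],
              library: List[Tuple[str, Set[str]]]) -> Dict[str, str]:
    """Bind each behavioural element, in the order given, to a new instance of the
    first library module (in library order) that implements its function."""
    binding: Dict[str, str] = {}
    instances: Dict[str, int] = {}              # module type -> instances created so far
    for elem, func in elements.items():        # for all b_i in B
        for module, funcs in library:          # for all r_j in R
            if func in funcs:                  # F(b_i) is a subset of F(r_j)
                k = instances.get(module, 0)
                instances[module] = k + 1
                binding[elem] = f"{module}_{k}"   # create a_i and bind b_i to it
                break
        else:
            raise ValueError(f"no library module implements {func!r} ({elem})")
    return binding


lib = [("ADD", {"+"}), ("SUB", {"-"}), ("MULT", {"*"})]
print(fcfs_bind({"n1": "*", "n2": "+", "n3": "+", "n5": "-"}, lib))
# {'n1': 'MULT_0', 'n2': 'ADD_0', 'n3': 'ADD_1', 'n5': 'SUB_0'}
```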
6.5.3 Selection of Genetic Operators 
A set of genetic operators was developed for the above problem representation. The operators exploit knowledge of the problem to perform operations similar to those a human designer would perform during manual optimisation of such a problem in a 3D floorplan.
The genetic operators were selected to be able to manipulate each of the properties encoded in the composite chromosome described in Section 6.5.1. For each property, both crossover and mutation operators were developed to ensure that good partial solutions are preserved across generations (using crossover), but also that new alternative solutions can be explored
Algorithm 6.4 3D floorplan placement 
Require: G(V, E), A, β
for all a_j ∈ A do
  1. Randomly select the next module a_j = β(v_j) such that the modules of all of v_j's predecessors have already been placed.
  2. Place a_j at a random (x, y) position in the next available system clock cycle such that Eq. 4.4 is satisfied, assuming Setup() = 0, and the module fits within the target device.
end for
to escape local minima (using mutation). 
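The random placement of Algorithm 6.4 can be sketched as follows. Eq. 4.4 is not reproduced in this section, so the sketch substitutes a simplified feasibility test (an assumption): a module may not start before its predecessors finish, and it may not share logic cells with another module during a common clock cycle. The names `random_place`, `overlaps` and `tries` are hypothetical, and the data-flow graph is assumed to be acyclic.

```python
import random
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int, int, int]  # x, y, start cycle, width, height, latency


def overlaps(a: Box, b: Box) -> bool:
    """True if two modules use common logic cells during a common clock cycle."""
    ax, ay, at, aw, ah, al = a
    bx, by, bt, bw, bh, bl = b
    return (ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
            and at < bt + bl and bt < at + al)


def random_place(nodes: Dict[str, Tuple[int, int, int]], preds: Dict[str, List[str]],
                 device: Tuple[int, int], rng: random.Random, tries: int = 50) -> Dict[str, Box]:
    """nodes: name -> (width, height, latency); preds: name -> predecessor names (a DAG)."""
    W, H = device
    placed: Dict[str, Box] = {}
    while len(placed) < len(nodes):
        # randomly select a module whose predecessors have all been placed
        ready = [n for n in nodes
                 if n not in placed and all(p in placed for p in preds.get(n, []))]
        n = rng.choice(ready)
        w, h, lat = nodes[n]
        if w > W or h > H:
            raise ValueError(f"module {n} does not fit within the device")
        # earliest cycle after all predecessors finish (Setup() taken as 0)
        t = max((placed[p][2] + placed[p][5] for p in preds.get(n, [])), default=0)
        while n not in placed:
            for _ in range(tries):
                box = (rng.randint(0, W - w), rng.randint(0, H - h), t, w, h, lat)
                if not any(overlaps(box, other) for other in placed.values()):
                    placed[n] = box
                    break
            t += 1   # no free random position found: try the next clock cycle
    return placed
```

The outer loop always terminates: once `t` lies beyond the end of every placed module, any position inside the device is free.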
A special mutation operator (Section 6.5.5.4) was developed after it was 
observed that many solutions could be improved by slightly changing the 
locations of the design modules. These slight changes would allow mod- 
ules to 'descend' into lower locations in a 3D floorplan, which could create 
new solutions with reduced design latency. 
The operation and implementation of all the genetic operators are discussed further in Sections 6.5.4 and 6.5.5.
The probability of application of the operators is controlled by an evo- 
lution control strategy designed for this algorithm (Section 6.5.9). 
6.5.4 Crossover Operators 
The crossover operator simulates 'mating' between two parent individuals, which produces two child individuals in the new generation.
The following problem-specific crossover operators were developed:
Algorithm 6.5 Module binding crossover 
Require: parent1, parent2, p_x_bind (crossover probability)
for all v_i ∈ V do
  if toss_coin(p_x_bind) then
    exchange β(v_i) between v_i in parent1 and parent2
  end if
end for
Algorithm 6.6 2D floorplan crossover 
Require: parent1, parent2, p_x_2D (crossover probability)
for all v_i ∈ V do
  if toss_coin(p_x_2D) then
    exchange (x, y) coordinates between v_i in parent1 and parent2
  end if
end for
6.5.4.1 Module binding crossover 
This exchanges the modules bound to identical CDFG nodes in the parent solutions (Algorithm 6.5). This operator aims to preserve the binding of modules between generations.
6.5.4.2 2D floorplan crossover 
This exchanges X-Y positions between the modules in one floorplan layer. 
This operator will preserve the relative spatial positions between the mod- 
ules in a 3D floorplan (Algorithm 6.6). 
6.5.4.3 3D floorplan crossover 
This exchanges the positions of a randomly-sized group of modules be- 
tween the two parent 3D floorplans. This will copy the X-Y-Z positions of 
Algorithm 6.7 3D floorplan crossover 
Require: parent1, parent2, p_x_3D (crossover probability)
for all v_i ∈ V do
  if toss_coin(p_x_3D) then
    exchange (x, y, z) coordinates between v_i in parent1 and parent2
  end if
end for
the selected modules (both horizontal and vertical placement). This operator preserves both the spatial and the temporal relationships between the design modules in a 3D floorplan. The relative module position in the Z direction characterises the possibility of physical resource sharing between the blocks (Algorithm 6.7).
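Algorithms 6.5-6.7 share one structure: for every CDFG node, a coin toss decides whether a group of gene fields is exchanged between the two parents. The sketch below captures that structure with a single hypothetical function; the `kind` keys and the `Gene` record are illustrative assumptions. Children produced this way may contain spatial or temporal conflicts, which the correction procedure (Algorithm 6.13) is responsible for removing.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Gene:
    binding: str  # bound module instance
    x: int
    y: int
    t: int        # clock cycle (Z coordinate)


FIELDS = {"bind": ("binding",), "2D": ("x", "y"), "3D": ("x", "y", "t")}


def gene_crossover(p1: Dict[str, Gene], p2: Dict[str, Gene], kind: str, p_x: float,
                   rng: random.Random) -> Tuple[Dict[str, Gene], Dict[str, Gene]]:
    """Exchange the gene fields selected by `kind` between the parents, node by node."""
    c1, c2 = dict(p1), dict(p2)
    names = FIELDS[kind]
    for v in p1:
        if rng.random() < p_x:                          # toss_coin(p_x)
            a = {f: getattr(p1[v], f) for f in names}
            b = {f: getattr(p2[v], f) for f in names}
            c1[v] = replace(p1[v], **b)                 # child 1 takes parent 2's fields
            c2[v] = replace(p2[v], **a)                 # child 2 takes parent 1's fields
    return c1, c2
```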
6.5.5 Mutation Operators 
A mutation operator simulates random changes to one or more individuals. 
The following problem-specific mutation operations were developed: 
6.5.5.1 Module binding mutation 
This changes the module bound to a given CDFG node to a module of a different type, but with the same functionality (e.g. a ripple-carry adder can be swapped for a carry-lookahead adder). Both the architectural module library and the modules already instantiated in the design are searched for new architectural modules with identical functionality. This allows either new module types to be introduced into the design solution or modules with identical functionality to be shared (Algorithm 6.8).
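The module binding mutation can be sketched as below. The candidate set mirrors Algorithm 6.8: fresh instances of any library type implementing the node's function, plus instances already present in the design (which yields sharing). The data layout, the single-function-per-node simplification and the `TYPE_k` instance naming are hypothetical; unused instances are not garbage-collected here.

```python
import random
from typing import Dict, Set


def binding_mutation(binding: Dict[str, str], func: Dict[str, str],
                     library: Dict[str, Set[str]], instance_type: Dict[str, str],
                     p_m: float, rng: random.Random) -> Dict[str, str]:
    """binding: node -> module instance; func: node -> function, e.g. '+';
    library: module type -> functions it implements;
    instance_type: instance -> module type (new instances are added to it)."""
    new = dict(binding)
    for v, f in func.items():
        if rng.random() >= p_m:                         # toss_coin(p_m) failed
            continue
        # candidate set: fresh instances of suitable library types ...
        fresh = [("new", t) for t in sorted(library) if f in library[t]]
        # ... plus instances already in the design that implement f (sharing)
        shared = [("share", m) for m in sorted(set(new.values()))
                  if f in library[instance_type[m]]]
        kind, name = rng.choice(fresh + shared)
        if kind == "new":
            k = 0
            while f"{name}_{k}" in instance_type:     # hypothetical naming scheme
                k += 1
            instance_type[f"{name}_{k}"] = name
            name = f"{name}_{k}"
        new[v] = name                                   # new binding beta(v) = m_j
    return new
```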
Algorithm 6.8 Module binding mutation 
Require: parent, p_m_bind (mutation probability)
for all v_i ∈ V do
  module_selection_set ← lib_modules(F(v_i)) ∪ design_modules(F(v_i))
  if toss_coin(p_m_bind) then
    remove old β(v_i) from parent
    select m_j from module_selection_set at random
    create new binding β(v_i) = m_j
  end if
end for
Algorithm 6.9 2D floorplan mutation 
Require: parent, p_m_2D (mutation probability)
for all v_i ∈ V do
  if toss_coin(p_m_2D) then
    calculate_valid_XY_range(β(v_i))
    change (x, y) coordinates for v_i at random (within the valid range)
  end if
end for
6.5.5.2 2D floorplan mutation 
This randomly changes the X-Y coordinates of the selected module. This 
operator changes the module position(s) relative to the entire design floor- 
plan (Algorithm 6.9). 
6.5.5.3 3D floorplan mutation 
This changes the X-Y-Z coordinates of the selected module(s) in the 3D floorplan (Algorithm 6.10).
Algorithm 6.10 3D floorplan mutation 
Require: parent, p_m_3D (mutation probability)
for all v_i ∈ V do
  if toss_coin(p_m_3D) then
    calculate_valid_XYZ_range(β(v_i))
    change (x, y, z) coordinates for v_i at random (within the valid range)
  end if
end for
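The 2D and 3D floorplan mutations (Algorithms 6.9 and 6.10) can be sketched with one function. The thesis does not define `calculate_valid_XY_range` here; the sketch assumes (hypothetically) that the valid range is simply the device area for X-Y and the cycles 0 to `t_max` for Z. Conflicts introduced by a move are left to the correction procedure.

```python
import random
from typing import Dict, Tuple

Pos = Tuple[int, int, int]  # (x, y, t)


def floorplan_mutation(pos: Dict[str, Pos], size: Dict[str, Tuple[int, int]],
                       device: Tuple[int, int], t_max: int, p_m: float,
                       rng: random.Random, three_d: bool = False) -> Dict[str, Pos]:
    """Move each module, with probability p_m, to a random position in the assumed
    valid range: inside the device in X-Y and, for the 3D variant, cycles 0..t_max."""
    W, H = device
    new = dict(pos)
    for v, (x, y, t) in pos.items():
        if rng.random() < p_m:                      # toss_coin(p_m)
            w, h = size[v]
            x, y = rng.randint(0, W - w), rng.randint(0, H - h)
            if three_d:
                t = rng.randint(0, t_max)           # also move in the Z direction
            new[v] = (x, y, t)
    return new
```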
6.5.5.4 3D floorplan 'shaking' 
This special mutation operator was developed to simulate the effect of ran- 
domised 'shaking' of the entire 3D floorplan. This is a greedy algorithm, 
which generates new X-Y coordinates for randomly selected modules in 
the solution. 
The floorplan 'shaking' may lead to further compaction of the 3D floorplan in the Z direction, because moving modules in the lower layers may free sufficient space for modules from the higher layers to 'descend'.
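One 'shaking' pass, following the inner loop of Algorithm 6.11, can be sketched as below: modules are visited in random order, an X and a Y direction are chosen by coin tosses, and each module moves a random distance in those directions while staying inside the device. The greedy acceptance aspect mentioned above is not modelled, and the valid range (device bounds only) is an assumption. Any Z compaction is then produced by the subsequent correction step.

```python
import random
from typing import Dict, Tuple


def shake(pos: Dict[str, Tuple[int, int, int]], size: Dict[str, Tuple[int, int]],
          device: Tuple[int, int], p_shake: float,
          rng: random.Random) -> Dict[str, Tuple[int, int, int]]:
    """One shaking pass over all modules; Z coordinates are left unchanged."""
    if rng.random() >= p_shake:                         # toss_coin(p_m_shake) failed
        return dict(pos)
    W, H = device
    new = dict(pos)
    pending = list(pos)                                 # V_tmp
    while pending:
        v = pending.pop(rng.randrange(len(pending)))    # choose v_j at random
        x, y, t = new[v]
        w, h = size[v]
        x_up, y_up = rng.random() < 0.5, rng.random() < 0.5   # X_dir, Y_dir
        # random new coordinate within the valid range in the chosen direction
        x = rng.randint(x, W - w) if x_up else rng.randint(0, x)
        y = rng.randint(y, H - h) if y_up else rng.randint(0, y)
        new[v] = (x, y, t)
    return new
```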
6.5.6 Overall Synthesis Procedure 
The overall procedure using the proposed technique is outlined in Algo- 
rithm 6.12. 
6.5.7 Solution Feasibility 
Application of each of the genetic operators introduced above may lead 
to a spatial or temporal conflict. In order to guarantee the feasibility of 
each individual solution in the population, the procedure in Algorithm 6.13 
recalculates the design schedule for the entire 3D floorplan. 
Algorithm 6.11 Floorplan 'shaking' mutation 
Require: parent, p_m_shake (mutation probability)
if toss_coin(p_m_shake) then
  for all v_i ∈ V do
    V_tmp ← V
    while V_tmp ≠ ∅ do
      choose integer j at random from the interval [1, |V_tmp|]
      X_dir = toss_coin()
      Y_dir = toss_coin()
      calculate_valid_XY_range(β(v_j), X_dir, Y_dir)
      change (x, y) coordinates for β(v_j) at random (within the valid range)
      remove v_j from V_tmp
    end while
  end for
end if
Algorithm 6.12 Overall GA-based synthesis procedure. 
Require: G(V, E), λ_max, T, n, p_x_bind, p_x_XY, p_x_XYZ, p_m_bind, p_m_XY, p_m_XYZ, p_m_shake
1. Read G(V, E) and analyse its dependencies
2. Initialise target technology device
3. Initialise GA:
  P_old = initialise(G, T, n) {initial population}
  F = evaluate_chromosome_fitness(P_old)
4. Do the evolution:
  while the stopping criterion is not satisfied do
    P_new = select_individuals(P_old, F)
    P_new = apply_crossover(P_new)
    P_new = apply_mutation(P_new)
    F = evaluate_chromosome_fitness(P_new)
    P_old = P_new
  end while
5. Store the best solution:
  s = select_best_individual(P_new, F)
  extract the 3D floorplan positions and reconfiguration schedule ρ from s
Algorithm 6.13 3D floorplan correction and reconfiguration latency calculation
Require: B = V ∪ E, A, L (3D floorplan model)
1. Perform ASAP resource-constrained scheduling ∀b_i ∈ B, a_i ∈ A so that Eq. 4.4 is satisfied, while assuming Setup(a_i) = 0.
2. Calculate the reconfiguration latency using a technology-specific procedure:
  block_list ← A
  while block_list ≠ ∅ do
    select an available block a_i from block_list (selected at random if 2 or more are available)
    c_{a_i} ← calculate_reconfig_latency(a_i, L) {technology-specific}
    σ, ρ ← update_design_schedules(a_i, c_{a_i}, σ, ρ, L)
    remove a_i from block_list
  end while
First the design blocks are rescheduled using the 'as soon as possible' 
(ASAP) scheduling algorithm constrained to operate within the 3D floor- 
plan. This removes any behavioural dependency violations between the 
modules positioned in the 3D floorplan. The ASAP scheduling affects only 
z-coordinates of the design modules. 
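The first correction step can be sketched as an ASAP rescheduling that keeps every module's X-Y footprint fixed and recomputes only its start cycle. As a stand-in for Eq. 4.4 (assumed here, not reproduced), a module starts after all its predecessors finish and never shares cells with another module in a common cycle. The modules are processed in a given topological order, for example sorted by their old Z coordinate; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height, fixed by the 2D floorplan


def share_cells(a: Rect, b: Rect) -> bool:
    """True if two module footprints overlap in the X-Y plane."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def asap_in_floorplan(order: List[str], rect: Dict[str, Rect], lat: Dict[str, int],
                      preds: Dict[str, List[str]]) -> Dict[str, int]:
    """Recompute only the start cycles (Z coordinates). `order` must be a topological
    order, e.g. modules sorted by their old Z coordinate. Setup() is taken as 0."""
    start: Dict[str, int] = {}
    for v in order:
        # data dependencies: start after every predecessor has finished
        t = max((start[p] + lat[p] for p in preds.get(v, [])), default=0)
        # spatial conflicts: skip past any scheduled module using the same cells
        moved = True
        while moved:
            moved = False
            for u, su in start.items():
                if share_cells(rect[u], rect[v]) and t < su + lat[u] and su < t + lat[v]:
                    t = su + lat[u]
                    moved = True
        start[v] = t
    return start
```

Any cycle skipped while resolving a conflict would still overlap the conflicting module, so each module receives the earliest start cycle that is feasible for the given processing order.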
In the second step, the actual reconfiguration latencies for all modules in the design are calculated. The procedure selects a module from the list of 'available'¹ modules and then calculates the reconfiguration latency of the selected module; a technology-specific function evaluates the actual reconfiguration latency. If two or more modules are available, one of the available modules is selected at random.
In the presented approach, a simple greedy algorithm from the DYNASTY Framework's XC6200 technology server was used to evaluate the module reconfiguration latencies. The algorithm inserts additional system clock cycles for the violating modules until all conflicts in the design schedule are resolved. It uses the intermediate contents of the configuration memory to evaluate the possibility of reusing previous configurations. Alternatively, a more sophisticated approach could have been used to exploit specific features of the target technology (e.g. Hauck et al., 1998).
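The DYNASTY technology server's greedy algorithm is not detailed here, so the sketch below only illustrates the general idea under a deliberately simple, hypothetical cost model: a single configuration port writes one word per clock cycle; only the words that differ from the current configuration memory contents are rewritten (configuration reuse); configuration may begin whenever the port is idle (when the target area is released is ignored); and every inserted stall cycle delays all later modules.

```python
from typing import Dict, List, Tuple

Config = Dict[int, int]  # configuration memory: address -> configuration word


def insert_reconfig_cycles(schedule: List[Tuple[str, int]], configs: Dict[str, Config],
                           memory: Config) -> Tuple[Dict[str, int], int]:
    """schedule: (module, start cycle) pairs sorted by start cycle. Returns the
    corrected start cycles and the number of inserted clock cycles. `memory` is
    updated in place with the intermediate configuration memory contents."""
    port_free = 0   # cycle at which the single configuration port becomes idle
    delay = 0       # cycles inserted so far; shifts every later module
    new_start: Dict[str, int] = {}
    for name, start in schedule:
        # words already present in memory are reused and need not be rewritten
        dirty = {a: w for a, w in configs[name].items() if memory.get(a) != w}
        t = start + delay
        if dirty:
            done = port_free + len(dirty)   # one word per configuration clock cycle
            if done > t:                    # module would start before it is configured
                delay += done - t
                t = done
            port_free = done
            memory.update(dirty)
        new_start[name] = t
    return new_start, delay
```

In the example used for testing, module B needs no new words because module A has already loaded an identical configuration, so B incurs no reconfiguration overhead of its own.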
After the reconfiguration latency has been calculated, the cycle-accurate reconfiguration schedule ρ_c is updated and scaled to the execution schedule σ and the configuration schedule ρ. This ensures that accurate reconfiguration overheads are considered throughout the temporal floorplanning process.
¹ A design module is 'available' if and only if the reconfiguration latencies of all its predecessors have already been calculated.
6.5.8 Problem-Specific Fitness Function 
A simple fitness function extracts the overall execution latency of the gen- 
erated solutions from the 3D floorplan. The fitness is calculated as the ratio 
of the desired execution latency to the latency extracted from the 3D floor- 
plan. 
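The fitness ratio can be written out as below. The assumption that the overall execution latency is the clock cycle in which the last module finishes (with the schedule starting at cycle 0) is ours; the function names are hypothetical.

```python
from typing import Dict


def total_latency(start: Dict[str, int], latency: Dict[str, int]) -> int:
    """Overall execution latency: the clock cycle in which the last module finishes."""
    return max(start[v] + latency[v] for v in start)


def fitness(desired_latency: int, start: Dict[str, int], latency: Dict[str, int]) -> float:
    """Ratio of the desired latency to the achieved one; >= 1 means the constraint is met."""
    return desired_latency / total_latency(start, latency)
```

Because the ratio grows as the extracted latency shrinks, solutions that meet or beat the latency constraint score 1 or more.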
Other design characteristics could be examined during the fitness eval- 
uation, including the size of the configuration data, power consumption, 
device usability, etc. These are not considered here. 
6.5.9 Selection of a Genetic Algorithm Procedure and Control Parameters
The simulated evolution is controlled by a core steady-state genetic algorithm with tournament selection. A supplementary monitoring algorithm controls the frequency of application of the selected genetic operators: the operator probabilities can change dynamically in response to changes in population convergence during the course of the evolution. Both the control function and the individual probabilities can be changed by the designer.
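Tournament selection and one steady-state step can be sketched as follows. Replacing the current worst individual, and only when the child is fitter, is one common steady-state policy chosen here for illustration; the thesis relies on GAlib for this machinery and its exact replacement policy is not restated in this section.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tournament(pop: List[T], fit: Callable[[T], float], k: int, rng: random.Random) -> T:
    """Return the fittest of k individuals drawn at random (with replacement)."""
    return max((rng.choice(pop) for _ in range(k)), key=fit)


def steady_state_step(pop: List[T], fit: Callable[[T], float],
                      make_child: Callable[[T, T], T], k: int, rng: random.Random) -> None:
    """Create one child from two tournament winners; it replaces the worst
    individual (in place) if it is fitter."""
    child = make_child(tournament(pop, fit, k, rng), tournament(pop, fit, k, rng))
    worst = min(range(len(pop)), key=lambda i: fit(pop[i]))
    if fit(child) > fit(pop[worst]):
        pop[worst] = child
```

Unlike a generational GA, only a small part of the population changes per step, so good solutions persist while new candidates are gradually introduced.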
The overall strategy is to apply the operators which produce big changes in the design solution (e.g. 3D floorplan crossover) early during the evolution, when the population divergence is large. As confidence in the generated solutions increases (observed as decreasing population divergence), the probability of the fine-tuning operators (e.g. 2D floorplan mutation) increases at the expense of the operators producing big changes. The evolution terminates once a solution satisfying the design objectives has been generated, or on command from the designer.
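The control strategy described above can be sketched as a mapping from population divergence to operator probabilities. The divergence measure (coefficient of variation of the fitness values, clipped to [0, 1]), the linear interpolation, the bounds `p_lo`/`p_hi` and the operator grouping are all hypothetical stand-ins for the designer-adjustable control function.

```python
from statistics import mean, pstdev
from typing import Dict, List

COARSE = ("x_3D", "m_3D")       # operators producing big changes (assumed grouping)
FINE = ("m_2D", "m_shake")      # fine-tuning operators (assumed grouping)


def divergence(fitnesses: List[float]) -> float:
    """Coefficient of variation of the population fitness, clipped to [0, 1]."""
    m = mean(fitnesses)
    return 0.0 if m == 0 else min(1.0, pstdev(fitnesses) / m)


def operator_probabilities(fitnesses: List[float], p_lo: float = 0.05,
                           p_hi: float = 0.6) -> Dict[str, float]:
    d = divergence(fitnesses)
    coarse = p_lo + (p_hi - p_lo) * d           # large while the population is diverse
    fine = p_lo + (p_hi - p_lo) * (1.0 - d)     # grows as the population converges
    return {**{op: coarse for op in COARSE}, **{op: fine for op in FINE}}
```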
6.5.10 Implementation 
The genetic-algorithm-based synthesis technique presented in this chapter was implemented using the MIT GAlib library (Wall, 1996) and the DYNASTY Framework (Chapter 5).
6.5.11 Summary 
A new genetic-algorithm-based technique has been developed for a specific case of the complex problem of reconfigurable systems design. A problem-specific chromosome representation was constructed together with a set of problem-specific genetic operators.
The presented approach represents a combination of technology-specific 
heuristics responsible for ensuring the solution feasibility, and knowledge- 
based and problem-specific genetic algorithm manipulations within a de- 
sign 3D floorplan. 
In contrast to traditional applications of genetic algorithms, the chro- 
mosome problem representation is 'corrected' after the application of ge- 
netic operators in order to rectify possible conflicts or inefficiencies result- 
ing from a different reconfiguration overhead distribution in the newly cre- 
ated individuals. 
The correction procedure evaluates the reconfiguration overheads and 
reschedules the execution of the problem in order to guarantee the feasibil- 
ity of each such potential design solution. 
The correction procedure can completely change all absolute 3D floorplan coordinates. This, however, does not destroy the main design characteristics, which are determined by the relative placement of the design modules in the 3D floorplan. The relative 3D placement represents not only the module overlaps at a fine-grained level, but also their relative execution dependencies resulting from the schedule feasibility condition (Eq. 4.4).
A number of experiments were conducted using the presented synthe- 
sis technique in order to ascertain its capabilities. The results from these 
experiments are discussed in the following chapter. 
Chapter 7 
Experimental Results 
The previous chapter has outlined a reconfigurable systems synthesis tech- 
nique, which uses genetic algorithms to search for a solution in a complex, 
multi-dimensional search space. This chapter presents a selection of the 
experimental results used to assess the capabilities of the technique. 
7.1 Benchmark Problems 
No standard set of benchmarks exists for the evaluation of synthesis tech- 
niques for dynamically reconfigurable logic. Therefore, different methods 
for the evaluation of the qualities of previously reported techniques have 
been used. For example, Chatha and Vemuri (1999) use a single benchmark 
of a JPEG algorithm to explore the partitioning between software and re- 
configurable hardware, Bazargan et al. (1999) use a set of random graphs 
to illustrate the capabilities of their 3D floorplanning technique, Lysaght 
and Stockwood (1996) and Luk et al. (1997b) use a simple pattern matcher 
circuit to demonstrate the capabilities of the low-level modelling features 
of their respective techniques, while Vasilko and Ait-Boudaoud (1996a) and Zhang et al. (2000) use high-level synthesis benchmarks originally developed for non-reconfigurable systems.
Benchmark                      Graph      Source
Laplace filter operator        Fig. 7.1   (Heron and Woods, 1996)
Differential equation solver   Fig. 7.2   (Paulin et al., 1986)
Elliptic wave filter           Fig. 7.3   (Dewilde et al., 1985)
Table 7.1: Behavioural benchmarks used in the synthesis evaluation.
In the evaluation of the approach presented in this thesis, the following 
requirements were placed on the selection of benchmark problems: 
• all benchmarks should be behavioural design problems which satisfy the restricted formulation presented in Section 6.1
• a reference design implementation, designed by hand or by another near-optimal method, should exist for at least some of the benchmarks used; this allows the quality of the synthesised results to be assessed
• benchmarks should be of various sizes, so that the presented synthesis technique can be exercised on problems of different sizes
• benchmarks should be composed of functions for which modules exist in the target technology library
Table 7.1 shows the set of behavioural design problems selected to demon- 
strate the performance of the presented synthesis technique (their data- 
flow graphs are shown in Figs 7.1-7.3). The implementation of these bench- 
marks using a dynamically reconfigurable system is considered here. 
Figure 7.1: Laplace operator data-flow graph. 
Figure 7.2: Differential equation solver data-flow graph. 
Figure 7.3: Elliptic wave filter data-flow graph. 
7.2 Target Technology 
A model technology for the implementation of the target RLU was used and implemented for this purpose in the DYNASTY Framework (Chapter 5). The technology is based on the Xilinx XC6200 series of partially reconfigurable FPGAs (Xilinx, 1997b), with the original technology enhanced to include several different reconfiguration subsystems. Appendix A summarises the relevant details of the implementation of the model target technology.
The availability of different reconfiguration subsystems (Section A.2) allows comparison of the trade-offs between implementations using different types of subsystem within the same logic array architecture.
A small library of modules suitable for the experiments using the presented synthesis approach was derived from the existing XC6200 module libraries (Section A.3).
Note that because in the presented approach a design solution is constructed only from the modules available in the target technology module library, the efficiency of the design implementation is limited by the implementation of the library modules. For example, the constant multiplication by 4 in the Laplace operator benchmark (Fig. 7.1) could be implemented using a simple shift operation. However, because only a full A*B multiplier is available in the target technology module library (Table A.1), the multiplication will be implemented using this multiplier module.
7.3 Experimental Procedure 
Each of the benchmark problems was synthesised using the algorithm presented in Chapter 6. A selection of target reconfigurable device array sizes was used. Library modules with 8-bit operands were used for all of the presented experiments. All experiments were conducted using the model XC6200 technology with an 8-bit parallel random access configuration subsystem (Section A.2).
Module   Operation   Relative latency [system clock cycles]
ADD      +           1
SUB      -           1
GTN      >           1
MULT     *           3
Table 7.2: Relative module latencies used during the synthesis of examples.
A system clock cycle of 50 ns was used. The functional unit module latencies (shown in Tables A.2-A.5 in Appendix A), scaled to the system clock cycle of 50 ns, are shown in Table 7.2.
The configuration clock cycle was assumed to be equal to the system clock cycle (T_config_clk = T_sys_clk).
The genetic algorithm synthesis technique was run until the population 
converged and no further improvement to the best design solution in the 
population was found. 
The following configuration subsystems of the targeted model XC6200 technology (Section A.2) were used to compare the algorithm performance with different types of configuration mechanisms:
• 8-bit parallel random access (original Xilinx XC6200)
• pre-loaded multiple contexts (similar to MIT DPGA)
After automatic synthesis, the benchmark design implementations were 
sample tested for cycle-accurate functionality using the following method. 
7.3.1 Design Verification 
A simulation model for each benchmark design was created in order to ver- 
ify the correctness of the design reconfiguration schedule. A 'clock morph- 
ing' (CM) simulation model (Vasilko and Cabanis, 1999) has been used for 
this purpose. 
The simulation model provides behavioural models for all design mod- 
ules. These are connected to a global reconfiguration controller (RCU) 
model via separate clock signals (Fig. 7.4). The RCU implements the de- 
sign reconfiguration schedule. During the simulation the module reconfig- 
uration is modelled by assigning special signal values to the module clock 
signals. A direct connection of the system clock signal to the module clock signal indicates that the module is active (n1 and n4 in Fig. 7.4). The clock signals of inactive design modules (n2, n3 and n5 in Fig. 7.4) are driven to special 'V' values, which indicate that these modules are in virtual states.
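The clock-morphing idea can be illustrated with a small behavioural model of the RCU, written in Python only for illustration (the actual simulation model is VHDL built on a modified std_logic_1164 package). The per-cycle activity schedule format and the function names are hypothetical: active modules receive the system clock, and the clocks of inactive modules are driven to 'V'.

```python
from typing import Dict, List, Set

V = "V"  # virtual state driven onto the clock of a module that is not configured


def module_clocks(sys_clk: str, active: Set[str], modules: List[str]) -> Dict[str, str]:
    """RCU behaviour at one instant: active modules see the system clock level,
    inactive modules have their clock input driven to 'V'."""
    return {m: (sys_clk if m in active else V) for m in modules}


def simulate(schedule: List[Set[str]], modules: List[str]) -> List[Dict[str, str]]:
    """Expand a per-cycle set of active modules into clock values, two half-cycles
    ('1' then '0') per system clock cycle."""
    trace = []
    for active in schedule:
        for level in ("1", "0"):
            trace.append(module_clocks(level, active, modules))
    return trace


print(module_clocks("1", {"n1", "n4"}, ["n1", "n2", "n3", "n4", "n5"]))
```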
The CM simulation model was preferred to other approaches to reconfigurable system simulation because of its flexibility and its ability to highlight problems with design reconfiguration scheduling and resource sharing.
Only one VHDL model had to be written for each benchmark design. 
In order to simulate the various design solutions, the design modules in the 
simulation model are annotated with their respective spatial floorplan po- 
sitions during the simulation. There is no need to recompile or regenerate 
a new simulation model for each design solution. 
The functionality of the simulation model required to support the virtual state propagation has been implemented in a modified version of the IEEE std_logic_1164 VHDL package. Details of this implementation can be found in (Vasilko and Cabanis, 1999).
[Figure: an RCU model with the config_clk and sys_clk signals controls the clocks of modules n1 (MULT), n2 (ADD), n3 (ADD), n4 (ADD) and n5 (SUB); modules are positioned at RLU coordinates such as (0,15), (15,15) and (9,8) and transfer register values to and from the RLU register array, while the clocks of inactive modules are driven to 'V'.]
Figure 7.4: An example of a CM-based simulation model used during verification (Laplace operator design). The modules n1 and n4 are active, positioned at specific coordinates in the RLU, and their registers are connected to the RLU array registers. Inactive modules are disconnected from the RLU register array by driving their respective clock inputs to the virtual state 'V'. The operation of the entire system is controlled by an RCU model.
A small set of test vectors was used to confirm the functionality of the 
reconfigurable design implementation. 
7.3.2 Design Implementation 
While the results from the experiments in this chapter were not implemented on real FPGAs, the route from a design solution to a design implementation is straightforward. The following two additional steps are required:
• Provided that the target technology server (Section 5.1.3) provides models and algorithms for the targeted device and its configuration subsystem, it is possible to generate a configuration bitstream file for every configuration in the design. These files can then be stored in a configuration data store for a specific reconfigurable application.
• Design and configuration schedules need to be generated for the design to be implemented. The schedules are provided as human-readable text files, which can be used to implement the design reconfiguration controller in either hardware or software. Configuration controller synthesis is not directly supported by the DYNASTY Framework (see Section 5.1.6).
7.4 Summary of Results 
Tables 7.3-7.4 display a selection of results generated using the benchmarks from Table 7.1. These are the best results out of 10 runs of the synthesis algorithm. The following symbols are used in the tables (Fig. 7.5):
[Figure: an example design schedule of modules v1 and v2 over clock cycles t1 to t11, marking λ_total, λ_exe, λ_com and λ_reconfig.]
Figure 7.5: Design schedule example. λ_exe = 3 (t5, t9, t10), λ_com = 4 (t3, t7, t8, t11), λ_reconfig = 5 (t1, t2, t4, t5, t6).
Laplace operator
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      979       7       75      897
32 x 32      82        7       75      0
48 x 48      82        7       75      0
64 x 64      82        7       75      0
Differential equation
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      1337      23      336     978
32 x 32      1303      23      336     944
48 x 48      1281      23      336     922
64 x 64      359       23      336     0
Elliptic wave filter
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      1584      50      564     970
32 x 32      1747      50      564     1133
48 x 48      1663      50      564     1049
64 x 64      1782      50      564     1168
Table 7.3: Synthesis results for an 8-bit parallel random access configuration subsystem (XC6200).
Laplace operator
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      83        7       75      1
32 x 32      82        7       75      0
48 x 48      82        7       75      0
64 x 64      82        7       75      0
Differential equation
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      364       23      336     5
32 x 32      366       23      336     7
48 x 48      361       23      336     2
64 x 64      360       23      336     1
Elliptic wave filter
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      627       50      564     13
32 x 32      626       50      564     12
48 x 48      620       50      564     6
64 x 64      618       50      564     4
Table 7.4: Synthesis results for a multiple contexts configuration subsystem (DPGA).
λ_exe is the number of system clock cycles spent executing the individual computations in the design.
λ_com is the number of system clock cycles spent transferring RLU register values from/to the RLU (used for the transfer of arguments and retrieval of the results of computations implemented in the RLU). The data transfer can be performed in parallel with execution of the computations.
λ_reconfig is the number of system clock cycles spent reconfiguring the RLU. The RLU configuration can be performed in parallel with execution of computations.
λ_total denotes the total design execution latency as the number of system clock cycles required to complete the entire design computation. As the register and configuration data transfers can be performed in parallel with the computations, the following relationship holds:
$$\lambda_{total} \leq \lambda_{exe} + \lambda_{com} + \lambda_{reconfig} \qquad (7.1)$$
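Eq. 7.1 can be checked arithmetically against the reported results. The sketch below copies four rows from Tables 7.3 and 7.4 (no new data) and computes the slack, i.e. the right-hand side minus the left-hand side of Eq. 7.1; the helper name is illustrative.

```python
# (lambda_total, lambda_exe, lambda_com, lambda_reconfig), copied from Tables 7.3 and 7.4
ROWS = {
    "Laplace operator, 24 x 24, XC6200": (979, 7, 75, 897),
    "Differential equation, 64 x 64, XC6200": (359, 23, 336, 0),
    "Elliptic wave filter, 32 x 32, XC6200": (1747, 50, 564, 1133),
    "Elliptic wave filter, 24 x 24, DPGA": (627, 50, 564, 13),
}


def slack(total: int, exe: int, com: int, reconfig: int) -> int:
    """Right-hand side minus left-hand side of Eq. 7.1 (must be >= 0)."""
    return exe + com + reconfig - total


for name, row in ROWS.items():
    assert slack(*row) >= 0, name
    print(f"{name}: slack {slack(*row)} cycles")
```

For these four rows the slack is 0, i.e. the reported λ_total equals the sum of its three components.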
An example of the results for the Laplace operator benchmark over 10 runs of the synthesis algorithm is shown in Fig. 7.6. For each run, the initial population was generated using a different random seed. This demonstrates that genetic algorithms cannot guarantee that an identical solution will be found for the same design problem and constraints. Close examination of the generated results reveals that the displayed variation is due to the algorithm's inability to share one adder resource among all add operations (some solutions provide 1 adder block while others provide 2), and to the adder and subtractor resources not fully overlapping in some cases (an example of a solution with 2 adder blocks, 1 subtractor and 1 multiplier is shown in Fig. 7.7(b)).
[Figure 7.6 plot: horizontal axis "synthesis run No." (1 to 10); vertical axis from 0 to 1000]
Figure 7.6: Solution stability over 10 GA-synthesis runs (Laplace operator benchmark, 24 x 24 array, 8-bit parallel random access configuration subsystem).
The effect of multiple-level resource sharing can also be observed in Fig. 7.7. The design optimised by hand (Fig. 7.7(a)) requires only 3 modules (adder, subtractor and multiplier), while an example of the automatically generated solution (Fig. 7.7(b)) requires 4 modules (1 additional adder) due to the inability of the synthesis algorithm to fully overlap all 3 add operations.
Architectural-level resource sharing allows 2 behavioural add operations (n3 and n4) to be bound to a single adder module (Fig. 7.7(b)). Physical-level resource sharing allows the subtractor (n5) to share routing and logic resources with this adder module. The effects of resource sharing at both the architectural and the physical level are combined in the design's 3D floorplan.
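The benefit of physical-level sharing can be illustrated by counting only the cells whose configuration actually changes when a new module is loaded. In this Python sketch each module configuration is a hypothetical mapping from cell coordinates to a configuration word; the encoding and the subtractor's differing column are invented for illustration and do not reflect the model XC6200 configuration format.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]    # (column, row) position in the logic array
Config = Dict[Cell, int]  # hypothetical configuration word per cell


def cells_to_reconfigure(current: Config, incoming: Config) -> int:
    """Number of cells that must be rewritten to load `incoming`.

    Cells that already hold the required configuration are shared at the
    physical level and cost nothing.
    """
    return sum(1 for cell, word in incoming.items() if current.get(cell) != word)


# Hypothetical 3 x 4-cell adder; the subtractor reuses it except for column 0.
adder: Config = {(x, y): 10 * x + 1 for x in range(3) for y in range(4)}
subtractor: Config = dict(adder)
for y in range(4):
    subtractor[(0, y)] = 99  # invented differing configuration word

print(cells_to_reconfigure({}, adder))          # nothing shared: 12 cells
print(cells_to_reconfigure(adder, adder))       # adder reused for n3 then n4: 0 cells
print(cells_to_reconfigure(adder, subtractor))  # adder -> subtractor: 4 cells
```

The second call corresponds to architectural-level sharing (the same module reused), the third to physical-level sharing between different modules.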
Some of the designs presented above were optimised by hand to provide the most efficient implementation for the given array size, module library and input design problem. These results are summarised in Table 7.5.
[Figure 7.7 panels: (a) optimised by hand (modules); (b) automatically synthesised (modules); (c) optimised by hand (schedule); (d) automatically synthesised (schedule); operations labelled n1 to n5]
Figure 7.7: Comparison of a manually constructed design solution with a design obtained automatically (Laplace operator benchmark, 24 x 24 array, 8-bit parallel random access configuration subsystem; configuration cycles are not shown). (a)-(b) show the placement of the design modules; (c)-(d) show the design execution schedule.
Laplace operator
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      803       6       75      724
64 x 64      78        5       75      0
Differential equation
Array size   λ_total   λ_exe   λ_com   λ_reconfig
24 x 24      1240      23      336     888
64 x 64      342       8       336     0
Table 7.5: Hand-optimised results for an 8-bit parallel random access configuration subsystem (XC6200).
The results demonstrate that given the size of the target technology de- 
vice (technology resource constraint) and the desired design latency (de- 
sign performance constraint), the synthesis technique is able to find sev- 
eral design solutions. The results are consistent with manually optimised 
results, although it is apparent that the automatic synthesis results are sub- 
optimal. 
Given the performance and target technology constraints, the synthesis algorithm explores both reconfigurable and non-reconfigurable design implementation options. If all of the design blocks can be placed in a single configuration, the design can be treated as non-reconfigurable (represented by λ_reconfig = 0).
The results obtained with the XC6200-compatible reconfiguration interface (Fig. 7.3) demonstrate that the large overheads required for the reconfiguration of the design modules (λ_reconfig) and for the communication of design data (λ_com) result in very inefficient design implementations.
The results obtained with the reconfiguration subsystem using a pre-loaded multiple-context configuration memory (Fig. 7.4) show that although the reconfiguration latency is negligible, the communication latency (λ_com) limits the performance of the generated design implementations.
The imbalance between λ_exe and λ_com / λ_reconfig (Tables 7.3-7.5) suggests that, for a reconfigurable design implementation using the above configuration subsystems to be efficient, the design modules should spend much more time executing useful computation than being configured or communicating data. In this context, a reconfigurable design is more efficient when some design modules can be reconfigured while others are executing useful computations, so that long reconfiguration and communication latencies are amortised over many execution cycles. Alternatively, if the design implementation can operate with a clock period much longer than that of the configuration clock, the reconfiguration and communication overhead could be accommodated within the system clock period.
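Both amortisation strategies reduce to simple cycle arithmetic. The Python sketch below assumes that configuration cycles are issued back-to-back, that a whole number of them fits into each system clock period, and that reconfiguration can overlap fully with the execution of other modules. The clock periods are hypothetical; the 65-cycle figure is the 8-bit XC6200 worst case for a 4-bit adder listed in Table A.2.

```python
import math


def reconfig_in_system_cycles(config_cycles: int, t_sys_ns: float, t_cfg_ns: float) -> int:
    """System clock cycles occupied by `config_cycles` configuration-interface cycles.

    Assumes a whole number of configuration cycles fits into one system
    clock period and that they are issued back-to-back.
    """
    per_system_cycle = math.floor(t_sys_ns / t_cfg_ns)
    if per_system_cycle < 1:
        raise ValueError("system clock period shorter than configuration clock period")
    return math.ceil(config_cycles / per_system_cycle)


def exposed_overhead(reconfig_cycles: int, concurrent_exe_cycles: int) -> int:
    """Reconfiguration cycles left over after hiding them behind other modules' execution."""
    return max(0, reconfig_cycles - concurrent_exe_cycles)


# 65 configuration cycles: worst case for a 4-bit adder over an 8-bit XC6200 interface.
for t_sys in (10.0, 100.0, 1000.0):  # hypothetical system clock periods in ns
    print(t_sys, reconfig_in_system_cycles(65, t_sys, t_cfg_ns=10.0))  # 65, 7, 1
```

A slower system clock shrinks the overhead measured in system cycles, while `exposed_overhead` models the alternative of hiding reconfiguration behind concurrent execution.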
One difficulty with using genetic algorithms to optimise complex problems is that the algorithm runtime may vary considerably between optimisation runs. For the experiments presented in this chapter, the synthesis algorithm usually converged to a final solution within several minutes on a Pentium III/400 MHz PC, with runtime increasing for larger problems. However, in some cases the synthesis runtime exceeded one hour, and the synthesis had to be terminated and the current best solution accepted.
An advantage of genetic algorithms over other global optimisation techniques is that a large pool of different solutions is generated early in the algorithm execution. If it is not necessary to search for the optimal solution (for example, when it is only necessary to assess the suitability of the target technology for the implementation of the design problem), it is possible to terminate the synthesis early and use the best solution generated so far. Although such a solution is of lower quality, it may provide a sufficient indication of implementation feasibility, while the algorithm runtime is greatly reduced.
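The early-termination behaviour described above can be sketched with a minimal genetic algorithm that keeps the best individual seen so far, so that stopping on a generation budget or a wall-clock deadline always returns a usable solution. This Python example is a generic bit-string GA with a toy cost function; it is not the thesis's synthesis algorithm or its custom evolutionary operators, and the parameter names are assumptions.

```python
import random
import time
from typing import Callable, List, Optional, Tuple

Genome = List[int]


def anytime_ga(cost: Callable[[Genome], float], genome_len: int, *,
               pop_size: int = 20, max_generations: int = 1000,
               deadline_s: Optional[float] = None,
               seed: Optional[int] = None) -> Tuple[Genome, float, int]:
    """Minimise `cost` over bit strings; returns (best, best_cost, generations_run).

    The best-so-far individual is kept across generations, so the search can be
    cut short at any point and still return its best solution.
    """
    if genome_len < 2:
        raise ValueError("one-point crossover needs genome_len >= 2")
    rng = random.Random(seed)
    start = time.monotonic()
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    best_cost = cost(best)

    def tournament() -> Genome:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if cost(a) <= cost(b) else b

    generations = 0
    while generations < max_generations:
        if deadline_s is not None and time.monotonic() - start > deadline_s:
            break  # early termination: accept the current best solution
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation with probability 1/genome_len per bit
            child = [bit ^ (rng.random() < 1.0 / genome_len) for bit in child]
            children.append(child)
        pop = children
        generations += 1
        champion = min(pop, key=cost)
        if cost(champion) < best_cost:
            best, best_cost = champion, cost(champion)
    return best, best_cost, generations


# Toy cost: number of zero bits (the optimum is all ones).
best, best_cost, gens = anytime_ga(lambda g: g.count(0), 32, max_generations=50, seed=1)
print(gens, best_cost)
```

Running it with different `seed` values can produce different final solutions, which mirrors the run-to-run variation discussed around Fig. 7.6.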
For practical use of the presented synthesis algorithm, its runtime must be improved. Several opportunities exist here, including the combination of the GA with knowledge-based heuristics and other optimisation techniques (such as simulated annealing). These would allow the genetic algorithm to converge faster once it appears to have located the global minimum, and could also lead to better results.
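One way to realise the suggested GA/simulated-annealing combination is to hand the GA's best individual to a short annealing phase for local refinement. The Python sketch below uses the standard Metropolis acceptance rule with geometric cooling; the `neighbour` move, the temperature schedule and all parameter values are assumptions chosen for illustration, not settings evaluated in this thesis.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def anneal(start: S, cost: Callable[[S], float],
           neighbour: Callable[[S, random.Random], S],
           t0: float = 10.0, alpha: float = 0.95, steps: int = 500,
           seed: int = 0) -> S:
    """Refine `start` by simulated annealing and return the best solution visited."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, accept worse moves with prob. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t = max(t * alpha, 1e-12)  # geometric cooling, kept strictly positive
    return best


# Toy usage: refine an integer "solution" towards the minimum of (x - 7)^2.
refined = anneal(0, lambda x: (x - 7) ** 2, lambda x, r: x + r.choice((-1, 1)))
print(refined)
```

Because the best visited solution is returned, the refinement step can never make the GA's result worse.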
Chapter 8 
Conclusions 
This chapter summarises the contribution of the presented work and outlines areas for further improvement. Possible future directions of this research are outlined at the end of the chapter.
8.1 Summary of the Contribution 
This thesis has presented a new formulation for the problem of synthesis 
for dynamically reconfigurable logic systems. A new synthesis and optimi- 
sation algorithm working on a restricted formulation of this problem was 
presented to demonstrate the feasibility of a special case of the proposed 
model. 
The features provided by the presented problem formulation were discussed at the end of Chapter 4 and are summarised below:
• A generic framework for the construction of various synthesis/compilation techniques. The problem formulation allows a variety of techniques for reconfigurable system synthesis to be encapsulated. Given the constraints and assumptions about the targeted technology and the implementation methodology, a specific problem instance can be characterised using this formulation (as demonstrated in Section 6.1). Synthesis and optimisation techniques can then operate within such a problem instance. For example, design for non-reconfigurable systems can be treated within this framework as a special instance of the problem model (Eq. 4.5). Another example is the technique presented in Chapter 6, which assumes compile-time synthesis and emphasises detailed analysis of the reconfiguration overheads. Other techniques aiming to achieve different synthesis objectives at either compile-time or run-time can be characterised by defining a specific instance of this general problem.
• Multiple-level resource sharing. The presented formulation allows resources in the design to be shared at both the architectural and the fine-grained physical level. It is therefore possible to reduce the reconfiguration overhead not only by sharing architectural modules between identical behavioural operations, but also by sharing primitive physical resource configurations between the modules implementing different behavioural operations.
The synthesis technique presented in Chapter 6 provides an example of a technique operating on an instance of the problem characterised by the assumptions in Section 6.1. The following are the main features of this technique:
• Solution feasibility guarantee. As the approach combines the exploration of low-level design characteristics (layout position, reconfiguration overhead, etc.) with high-level considerations (design scheduling, resource allocation and binding), it can guarantee that the implementation of the synthesised solution will be feasible. This, however, is possible only if the reconfiguration overhead calculation function estimates the reconfiguration latency with sufficient accuracy.
The solution feasibility ensures that no design flow iterations are required in order to generate an implementation which will operate on the selected target technology. However, design iterations with different target technologies or performance constraints may be required if the implementation with the originally chosen target technology cannot satisfy the original performance constraints.
• Tradeoff analysis between reconfigurable and non-reconfigurable design implementation. As there is no a priori assumption that the design implementation has to be reconfigurable, the optimisation technique can freely explore both options. Whether the design solution will operate as a reconfigurable or non-reconfigurable system depends on the selected target technology and the design constraints.
• Technology independence. The temporal floorplanning process, together with a model of a 3D floorplan, provides an abstraction which allows design optimisation for many common reconfigurable logic technologies. The technology architecture abstraction offered by the 3D floorplan model allows complex technology-specific features to be 'hidden' from the synthesis technique, while these features can still be exploited through the availability of a suitable technology server. The technology-specific considerations (such as the availability of specific library modules, specific technology resources or features, reconfiguration overhead calculation procedures, etc.) can then be 'plugged in' to the synthesis technique. It can therefore be expected that many other reconfigurable logic technologies, such as Xilinx Virtex, Atmel AT40K and others, can be used within such a framework.
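The plug-in idea can be pictured as an abstract interface that the synthesis core queries without knowing which technology sits behind it. In this Python sketch the class and method names are hypothetical, and the cost models are deliberate simplifications: the random-access server charges one configuration cycle per changed cell, and the multiple-context server charges a single cycle per context switch, echoing the DPGA entries in Tables A.2-A.4.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Config = Dict[Cell, int]  # hypothetical per-cell configuration words


class TechnologyServer(ABC):
    """Technology-specific knowledge hidden behind a common interface."""

    @abstractmethod
    def reconfiguration_cycles(self, current: Config, incoming: Config) -> int:
        """Configuration cycles needed to move the array from `current` to `incoming`."""


class RandomAccessServer(TechnologyServer):
    """XC6200-like random access (simplified: one cycle per changed cell)."""

    def reconfiguration_cycles(self, current: Config, incoming: Config) -> int:
        return sum(1 for cell, word in incoming.items() if current.get(cell) != word)


class MultiContextServer(TechnologyServer):
    """DPGA-like pre-loaded contexts: switching context costs one cycle."""

    def reconfiguration_cycles(self, current: Config, incoming: Config) -> int:
        return 0 if current == incoming else 1


def schedule_overhead(server: TechnologyServer, sequence: List[Config]) -> int:
    """Technology-independent core: total overhead of a configuration sequence."""
    total, current = 0, {}
    for config in sequence:
        total += server.reconfiguration_cycles(current, config)
        current = config
    return total


a: Config = {(0, 0): 1, (0, 1): 2}
b: Config = {(0, 0): 1, (0, 1): 3}
for server in (RandomAccessServer(), MultiContextServer()):
    print(type(server).__name__, schedule_overhead(server, [a, b, a]))
```

Only `schedule_overhead` would belong to the synthesis core; swapping the server object retargets the cost estimate without touching it.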
Furthermore, the synthesis experiments presented in Chapter 7 appear to produce the expected results. Although no exact analysis of the optimality of these solutions has been conducted, in several known cases the results are consistent with previous hand implementations.
8.1.1 Applications of the Proposed Approach
The synthesis technique presented in Chapter 6 can be used in several applications:
• Synthesis for reconfigurable systems. As was demonstrated in Chapter 7, the developed technique can be used to produce working reconfigurable implementations. The practicality of these implementations depends greatly on the features provided by the target technology. Contemporary commercial reconfigurable logic devices suffer from large reconfiguration overheads, which makes reconfigurable design implementations practical only (i) when system reconfiguration does not occur very often or (ii) when a relatively long system clock cycle can accommodate many configuration cycles, thus reducing the impact of the reconfiguration overhead on the design latency. Future improvements may provide technologies with faster reconfiguration times and therefore make frequent reconfiguration a more practical design option.
• Reconfigurable technology architecture/features analysis. The approach can also be used to evaluate the suitability of a specific reconfigurable logic architecture, reconfiguration subsystem and other technological features for the implementation of design problems from a specific application domain. Given an input set of typical design problems, it would be possible to synthesise their implementations over a set of variations of the target technology. A domain-specific reconfigurable logic technology could then be derived from such experiments.
• Retargetable compilation for reconfigurable computing platforms. Many different reconfigurable computing platforms and technologies have been developed for use in software acceleration. Designing a reconfigurable implementation of a specific algorithm for a specific type of reconfigurable computing platform can be difficult, especially when partial reconfiguration is considered as part of this process. Even if a high-level language such as VHDL, C or Java were used to implement the algorithm on a specific reconfigurable platform, the implementation of the reconfiguration-related functionality would be specific to that platform. Porting such an application to a different platform may involve a complete redesign of the algorithm's hardware implementation (consider, for example, porting an algorithm implementation from a Xilinx XC6200 FPGA to a Xilinx Virtex FPGA-based platform).
The synthesis technique presented in Chapter 6 offers a technology-independent model of the synthesis/compilation process. Technology-specific architecture and reconfiguration capabilities are provided through a technology server. For a compiler targeting a reconfigurable computing platform, the technology server can be represented as part of the target platform architecture model. If such a model is provided for each targeted reconfigurable computing platform, porting an algorithm to these platforms involves only re-compilation (re-synthesis) of the algorithm with a different architectural model. The guaranteed feasibility of the implementation will ensure that the input algorithm operates on the target reconfigurable computing platform, while the actual latency of the computation will depend on the platform's speed and reconfiguration features.
8.2 Areas for Improvement and Future Directions 
The presented synthesis technique limits the category of technologies (as 
detailed in Section 6.1) and applications which can be considered using 
this approach. Some of the possible areas for improvement are identified 
below: 
8.2.1 Composite Cost Function 
While the synthesis technique presented in Chapter 6 uses latency as a mea- 
sure of quality for the generated solutions, the fitness evaluation may in- 
clude further considerations. These might include constraints on the size 
of the storage available for the configuration and application data, con- 
straints on power consumption, constraints on the individual placement 
and scheduling of the design modules, and others. 
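A common way to fold such considerations into a single fitness value is a weighted sum of objectives plus penalty terms for violated constraints. The Python sketch below assumes this formulation; the metric names, weights and penalty factor are hypothetical and would need calibrating against a real technology server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    latency_cycles: int
    config_storage_bits: int
    power_mw: float


@dataclass
class Constraints:
    max_storage_bits: Optional[int] = None
    max_power_mw: Optional[float] = None


def composite_cost(m: Metrics, c: Constraints, *, w_latency: float = 1.0,
                   w_storage: float = 0.0, w_power: float = 0.0,
                   penalty: float = 1e6) -> float:
    """Weighted objectives plus a large penalty per unit of constraint violation.

    With the default weights this reduces to the latency-only cost.
    """
    cost = (w_latency * m.latency_cycles
            + w_storage * m.config_storage_bits
            + w_power * m.power_mw)
    if c.max_storage_bits is not None and m.config_storage_bits > c.max_storage_bits:
        cost += penalty * (m.config_storage_bits - c.max_storage_bits)
    if c.max_power_mw is not None and m.power_mw > c.max_power_mw:
        cost += penalty * (m.power_mw - c.max_power_mw)
    return cost


m = Metrics(latency_cycles=100, config_storage_bits=500, power_mw=10.0)
print(composite_cost(m, Constraints()))                        # latency only
print(composite_cost(m, Constraints(max_storage_bits=400)))    # storage violated
```

Penalties keep infeasible individuals in the GA population but rank them below any feasible solution, which is one design choice among several (repair or rejection are alternatives).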
8.2.2 Evaluation with Large and Multi-cycle Modules 
The synthesis examples presented in Chapter 7 provide only a small set of benchmarks, limited to data-flow problems whose behavioural graph elements are primitive arithmetic computations. It is necessary to benchmark the performance of this technique using larger problems with complexity approaching that of industrial applications. While primitive arithmetic operations in data-flow graphs provide many opportunities for architectural exploration, without the ability to synthesise finite-state machines and route these primitive operations it will not be possible to construct reconfigurable system design implementations at such a fine-grained level.
It is foreseeable that the presented technique will be useful for synthesis from other system-level models. For example, task graphs are being used successfully in hardware/software co-design for reconfigurable systems (e.g. Chatha and Vemuri, 1999). Each task in the model can be represented by a complex hardware module. In this scenario, the presented algorithm should be able to explore any similarities between the configurations in order to produce systems which share primitive reconfigurable technology resources. This remains a topic for further investigation.
8.2.3 Routing Consideration 
In the presented approach all data transfers between the architectural de- 
sign modules are performed via registers. It would be desirable to evaluate 
the opportunities for direct wire routing between the modules in a recon- 
figurable array. Routing may reduce the overheads required for the transfer 
of the computational data and also offer more implementation options. 
While this is a desirable feature, routing for reconfigurable systems is a very difficult problem. Routing algorithms must not only select routing paths using limited reconfigurable routing resources, but also consider the impact of the routing on future reconfiguration and the impact of the overhead due to the configuration of routing switches on the overall execution latency.
Furthermore, direct routing in a reconfigurable system may not always 
be a desirable option. If the design implementation reconfigures frequently, 
it might be preferred to transfer register data via the configuration interface, 
rather than reconfigure many distributed routing resources. For example, 
a 32-bit data word in the Xilinx XC6200 FPGA technology can be transferred
via its configuration interface in only 1 configuration interface cycle (in the 
best case). The number of configuration interface cycles required for the 
configuration of routing resources for a 32-bit wired bus depends on the 
position of the source and destination of the data transfer and the avail- 
ability of routing resources. This latency may be considerably longer than 
that of an equivalent register value transfer via the configuration interface. 
Such tradeoffs must be considered during the routing process. 
The routing problem is a part of physical synthesis and can be viewed 
within the framework presented in Chapter 4. The contribution of the rout- 
ing configuration and the delay is considered as a part of the Setup() func- 
tion in Eq. 4.4. 
In the presented synthesis technique, the routing could be considered as 
a part of the configuration correction procedure presented in Section 6.5.7. 
Routing techniques, however, must provide a worst case estimate of the 
impact of the routing on the execution latency so that the solution feasibility 
could be maintained. 
8.2.4 Architectural-Level Resource Sharing 
The presented technique has considered only one specific type of resource sharing at the architectural level (assumption 5 in Section 6.1). Once routing algorithms suitable for dynamically reconfigurable systems are available, it will be possible to consider architectural-level resource sharing in scenarios where other components (e.g. an FSM and its control logic) need to be placed and routed into the design.
8.2.5 Register Allocation, Pipelining and Retiming 
If the synthesis approach can consider the routing problem, it will also be 
possible to consider different arrangements for the distribution of registers 
and data transfers in the design. This opens possibilities for the exploita- 
tion of different approaches aimed at the improvement of throughput and 
resource usage of the design implementation. 
Furthermore, the possibilities for design module pipelining and their reconfiguration through 'pipeline morphing' (Luk et al., 1997c) could be considered as an extension to the presented approach.
While various design optimisation techniques could be applied to the presented synthesis method, there are several options for how these could be implemented: (i) as new genetic algorithm operators, (ii) as part of the 3D floorplan correction routine (Algorithm 6.13), (iii) as part of the fitness calculation routine, or (iv) as pre- or post-processing algorithms. Further work is needed to establish suitable implementations of various optimisation techniques in the context of the presented synthesis method.
8.2.6 Summary 
The presented technique allows automatic design synthesis for a category 
of reconfigurable systems. The result of such a synthesis is a reconfigurable 
system implementation represented as a set of configuration data for run- 
time reconfiguration of the RLU and a schedule for the control of this re- 
configuration process from the RCU. 
Many problems in design automation for reconfigurable systems remain unsolved. While all areas of improvement listed in Section 8.2 should receive attention in the future, it is the routing problem for reconfigurable systems which demands high priority. An understanding of routing in reconfigurable systems and its impact on system performance will allow future synthesis approaches to consider wire routing as a central part of reconfigurable system synthesis.
The presented synthesis method needs to be improved to increase its efficiency and speed, and to include more advanced optimisation transformations. Furthermore, the interdependence between reconfigurable architectures, technologies, modelling and design tools needs to be studied to improve our understanding of reconfigurable systems and their applications.
The synthesis results using the model XC6200 technology demonstrate some of the limitations of current reconfigurable technologies. While better reconfigurable technologies are needed, these have to be developed with the targeted application domain in mind. Furthermore, reconfigurable technology features should be developed in conjunction with design tools, to ensure that efficient algorithms exploiting these features can be constructed.
In many practical systems where reconfigurable logic is considered as 
one of the implementation options, the system will include a combina- 
tion of software implemented on an embedded processor(s), fixed function 
hardware and reconfigurable hardware. In order to explore the high-level 
partitioning of the system's functionality between these three options, the 
system-level tools need to estimate the expected performance of each target 
technology option for various partitioning scenarios. 
The presented technique for the design of reconfigurable systems should 
integrate with other design tools operating at a system-level. Such an in- 
tegration would allow system-level designers to consider reconfigurable 
logic implementation as one of the options equivalent to those of fixed 
hardware and processors, but providing different implementation trade- 
offs. 
Appendix 
Appendix A 
Model Reconfigurable Logic 
Technology 
A Xilinx XC6200 FPGA (Xilinx, 1997b) based model reconfigurable logic technology was used in the experiments presented in this thesis. To differentiate between this model technology and the original Xilinx XC6200 technology, the former is referred to as the model XC6200 technology, while the original XC6200 technology denotes the technology as originally developed by Xilinx.
The model XC6200 technology was implemented as a technology server in the DYNASTY Framework (Chapter 5). The implementation includes:
• logic arrays of various sizes
• a set of libraries, including primitive technology cell libraries and macro libraries
• various configuration subsystems and supporting configuration latency estimation algorithms
• a low-level device simulation model
The model XC6200 technology does not implement the full set of fea- 
tures available in the original XC6200 technology. While preserving the 
overall architecture and the basic functionality of the original configura- 
tion interface, the model technology was further enhanced to include other 
reconfiguration subsystems, which are not available in the original XC6200 
technology. 
This appendix highlights the features and incompatibilities of the model XC6200 technology as used in this thesis. A more detailed description of the original XC6200 technology and its configuration interface can be found in the relevant literature (Xilinx, 1997b; Churcher et al., 1995).
A.1 Architecture
A compatible subset of the original XC6200 architecture was implemented 
in the model XC6200 technology, thus allowing for many of the circuits 
developed for the original XC6200 technology to be implemented in the 
model technology. The aim of the architectural implementation was to pre- 
serve the architectural features required by the XC6200 module library, but 
also to maintain functionality similar to the original XC6200 technology, 
so that the newly developed design techniques could use a realistic target 
technology model. 
A.1.1 Device Size
The original Xilinx XC6200 technology provides a selection of devices with array sizes of 48 x 48, 64 x 64, 96 x 96 and 128 x 128. The model XC6200 technology implementation provides additional devices of other sizes, including 8 x 8, 16 x 16, 24 x 24 and 32 x 32. While these smaller devices are probably not useful for practical real-world applications (other than small coprocessors or reconfigurable ALUs), they were used in the experiments presented in Chapters 2 and 7 to enforce tighter resource constraints for small design problems.
[Figure A.1 diagram: logic block with N, S, E, W and N4, S4, E4, W4 routing inputs, CS multiplexer, flip-flop output Q and MAGIC connection]
Figure A.1: XC6200 logic block. The configuration multiplexers (shown in grey) are in their default states after device reset.
A.1.2 Logic Block
The Xilinx XC6200 logic block architecture (including its functional unit and the local routing multiplexers) is depicted in Fig. A.1. An identical logic block was implemented in the model XC6200 technology.
The XC6200 logic block provides two 2-input look-up tables with outputs connected to a 2:1 multiplexer, one D flip-flop, and a number of routing multiplexers. The special RP multiplexer can disable access to the D flip-flop from within the array, thus allowing the flip-flop value to be read/written only via the XC6200 configuration interface.
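The behaviour just described can be captured in a small behavioural model. This Python sketch is a deliberate simplification: it assumes that both look-up tables see the same two inputs and that a separate select signal drives the 2:1 multiplexer, which need not match the exact XC6200 wiring; the `rp_protect` flag stands in for the RP multiplexer.

```python
from dataclasses import dataclass
from typing import Tuple

Lut2 = Tuple[int, int, int, int]  # outputs for inputs (a, b) = (0,0), (0,1), (1,0), (1,1)


def lut2(table: Lut2, a: int, b: int) -> int:
    """Evaluate a 2-input look-up table."""
    return table[(a << 1) | b]


@dataclass
class LogicBlockModel:
    lut0: Lut2
    lut1: Lut2
    rp_protect: bool = False  # RP multiplexer: block flip-flop access from the array
    q: int = 0                # D flip-flop state

    def combinational(self, a: int, b: int, sel: int) -> int:
        """2:1 multiplexer choosing between the two LUT outputs."""
        return lut2(self.lut1 if sel else self.lut0, a, b)

    def clock(self, a: int, b: int, sel: int) -> int:
        """Clock edge: load the flip-flop from the array unless RP protection is on."""
        if not self.rp_protect:
            self.q = self.combinational(a, b, sel)
        return self.q

    def write_via_configuration(self, value: int) -> None:
        """The configuration interface can always access the flip-flop."""
        self.q = value & 1


AND, XOR = (0, 0, 0, 1), (0, 1, 1, 0)
blk = LogicBlockModel(AND, XOR)
print(blk.combinational(1, 1, 0), blk.combinational(1, 1, 1))
```

The protected case models how register values of library modules are exchanged only through the configuration interface.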
[Figure A.2 labels: nearest-neighbour routing, length-4 routing, logic blocks]
Figure A.2: Model XC6200 technology logic array (not all length-4 connections are shown).
A.1.3 Routing Resources
Only the nearest-neighbour and length-4 interconnections were provided 
in the model XC6200 technology. While other longer interconnections from 
the original XC6200 technology (length-16 and chip-length) could have been 
easily implemented, this was unnecessary as the modules in the model 
XC6200 parametrised library do not use interconnections other than nearest- 
neighbour and length-4. As no inter-module routing is used in the ap- 
proach presented in this thesis, the longer routing resources would remain 
unused. 
An example of the implemented model XC6200 logic array is shown in Fig. A.2.
The model XC6200 technology does not provide any input/output blocks. The routing lines connecting to these blocks in the original XC6200 technology were left unconnected in the model XC6200 technology. The MAGIC routing connections originating in the logic blocks (Fig. A.1) were also left unconnected.
A.2 Configuration Subsystem
The original XC6200 technology provides a combined serial and parallel configuration interface, with parallel random-access configuration data distribution and a one-to-one activation mechanism¹.
¹ This categorisation of reconfigurable technologies was introduced in Section 2.2.1 on page 17.
The model XC6200 technology was enhanced to provide several configuration subsystems with different configuration speeds. These were provided to facilitate testing of new synthesis algorithms targeting technologies with different configuration subsystems. The following configuration subsystems were implemented as typical representatives of current trends:
1. A subset of the original Xilinx XC6200 configuration subsystem. Only the parallel configuration interface is provided; 8-, 16- and 32-bit configuration data words are supported. The configuration distribution does not support the wildcard feature available in the original XC6200 technology. Also, many of the original configuration memory locations are inactive due to the lack of corresponding configurable resources.
2. A Xilinx Virtex-like configuration subsystem (Xilinx, 2000a) providing frame-based configuration data distribution. The individual frames are aligned with the device columns. There are 3 frames for each column. The frame length depends on the device size (an 8-bit control and address word plus the configuration data):
$$L_{frame} = 8 + 8 \times N_{col} \ \text{[bits]}$$
where $N_{col}$ is the number of logic blocks per column. The subsystem further provides a serial configuration interface and direct one-to-one configuration activation.
3. A subsystem with a multiple-context configuration memory (similar 
to MIT DPGA (Brown et al., 1994)) providing an unlimited number of 
layers (many-to-one configuration activation). The memory contexts 
can be accessed through the original XC6200 parallel random-access 
distribution mechanism and the parallel configuration interface. 
In all of the above cases, the access to the device cell register values is 
assumed to be through the original XC6200 configuration interface. 
Reconfiguration latency calculation algorithms have been provided for 
each of the above configuration subsystems. Given a design solution rep- 
resented as a 3D floorplan these calculate the number of reconfiguration 
cycles required for the design implementation using the selected configu- 
ration subsystem. 
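For the frame-based subsystem, the latency calculation follows from the frame length given in item 2 above (an 8-bit control/address word plus 8 bits per logic block in the column, 3 frames per column). The Python sketch assumes a serial interface transferring one bit per configuration cycle and that every column touched by a module has all 3 of its frames rewritten; both assumptions are illustrative rather than taken from the DYNASTY implementation.

```python
import math
from typing import Iterable

FRAMES_PER_COLUMN = 3  # from the subsystem description


def frame_length_bits(blocks_per_column: int) -> int:
    """8-bit control/address word plus 8 configuration bits per logic block."""
    return 8 + 8 * blocks_per_column


def frame_reconfig_cycles(columns: Iterable[int], blocks_per_column: int,
                          bits_per_cycle: int = 1) -> int:
    """Cycles to rewrite every frame of each distinct column a module touches.

    Assumes all frames of a touched column are rewritten and that the
    interface moves `bits_per_cycle` bits per configuration cycle.
    """
    touched = set(columns)
    bits = len(touched) * FRAMES_PER_COLUMN * frame_length_bits(blocks_per_column)
    return math.ceil(bits / bits_per_cycle)


# A module occupying columns 4-6 of a 32 x 32 array, serial (1-bit) interface.
print(frame_reconfig_cycles(range(4, 7), blocks_per_column=32))  # 2376
```

Under these assumptions a module three columns wide on a 32 x 32 array needs 3 × 3 × 264 = 2376 cycles, which coincides with the frame-based entries in Tables A.2-A.4; the device size behind those entries is not stated here, so the match is suggestive rather than confirmed.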
A.3 Library Modules
The model XC6200 technology uses a library of parametric modules de- 
rived from the original XC6200 macro library (Luk et al., 1997a; Xilinx, 
1998). 
A selection of the arithmetic modules shown in Table A.1 was adopted for the model technology library. All model XC6200 technology library modules support signed numbers (negative numbers are represented in two's complement). The modules were provided with input and output registers to allow data transfer to and from the modules via the configuration interface.
Name   Function   Description
ADD    A+B        signed ripple-carry adder
SUB    A-B        signed ripple-carry subtractor
GTN    A>B        signed greater-than comparator
MULT   A*B        signed multiplier
Table A.1: A selection of XC6200 library modules used in the experiments described in Chapter 7.
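The signed ripple-carry ADD and SUB functions can be mirrored by a bit-level model. This Python sketch is not derived from the module netlists: it implements subtraction as A + ~B + 1 in two's complement, a standard construction but an assumption about how the SUB module works internally, and results wrap modulo 2^width as a fixed-width hardware adder would.

```python
from typing import List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """Two's-complement bits of `value`, least significant first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Interpret LSB-first bits as a signed two's-complement number."""
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def ripple_add(a: int, b: int, width: int, subtract: bool = False) -> Tuple[int, int]:
    """Bit-level ripple-carry A+B, or A-B computed as A + ~B + 1; returns (result, carry_out)."""
    a_bits = to_bits(a, width)
    b_bits = to_bits(b, width)
    if subtract:
        b_bits = [1 - bit for bit in b_bits]  # invert B
    carry = 1 if subtract else 0              # +1 for two's-complement negation
    out = []
    for x, y in zip(a_bits, b_bits):          # one full adder per bit, carry ripples upward
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return from_bits(out), carry


print(ripple_add(3, 4, 4))                 # 4-bit signed addition
print(ripple_add(2, 5, 4, subtract=True))  # 4-bit signed subtraction
```

The model makes the signed-overflow behaviour explicit: for example, 5 + 3 in 4 bits wraps to -8.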
In the original XC6200 macro library, the relative placement of logic 
blocks within each macro module was fixed using the positional constraints, 
while the detailed routing was performed by the Xilinx XACT6000 place & 
route tool. In the model XC6200 technology module library, each module was fully routed when the library was constructed. This ensures that the configuration of all routing multiplexers and look-up tables in the module is known during the synthesis process. Therefore the reconfiguration latency, which considers the configuration of both logic and routing resources, can be calculated accurately.
As only the local and length-4 routing wires were used for module routing, the modules can be positioned at any array location (if only the local routing wires were used) or at locations whose coordinates are multiples of 4 (if the length-4 wires were used).
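The placement rule above reduces to a simple legality check on a module's origin. The Python sketch assumes that the rule applies to the module's origin cell and adds a bounds check against the array size; the function name and coordinate convention are hypothetical.

```python
def legal_placement(x: int, y: int, width: int, height: int,
                    array_cols: int, array_rows: int, uses_length4: bool) -> bool:
    """Check whether a module of size width x height may start at (x, y).

    Modules routed only with local wires may go anywhere they fit; modules
    using length-4 wires must start on coordinates that are multiples of 4.
    """
    fits = 0 <= x and 0 <= y and x + width <= array_cols and y + height <= array_rows
    aligned = not uses_length4 or (x % 4 == 0 and y % 4 == 0)
    return fits and aligned


# Count legal origins for a 3 x 8 module on a 24 x 24 array.
local_only = sum(legal_placement(x, y, 3, 8, 24, 24, False)
                 for x in range(24) for y in range(24))
with_len4 = sum(legal_placement(x, y, 3, 8, 24, 24, True)
                for x in range(24) for y in range(24))
print(local_only, with_len4)
```

The count shows how much the length-4 alignment rule narrows the placement space that the synthesis algorithm has to explore.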
Examples of fully placed and routed 4-bit adder and subtractor modules are
shown in Figs. A.3-A.4 and Figs. A.5-A.6 respectively. Detailed configuration
data for these modules can be derived from the configuration of the logic
and routing resources shown in Figs. A.4 and A.6.
The characteristics of the adder, subtractor, comparator and multiplier
modules are summarised in Tables A.2-A.5.
Figure A.3: 4-bit adder (a + b): schematic diagram.
Figure A.4: 4-bit adder (a + b): detailed layout.
Figure A.5: 4-bit subtractor (a - b): schematic diagram.
Figure A.6: 4-bit subtractor (a - b): detailed layout.
                                  4-bit ADD       8-bit ADD
size                              3x8             3x16
execution latency                 17 ns           37 ns
data retrieval latency (in/out)   4/2 c/cycles    4/2 c/cycles

Worst-case configuration latency (in c/cycles)
Configuration interface           4-bit ADD       8-bit ADD
8-bit XC6200                      65              129
16-bit XC6200                     44              80
32-bit XC6200                     27              45
frame-based (Virtex)              2376            2376
multiple contexts (DPGA)          1               1
Table A.2: Characteristics of the 4-bit and 8-bit adder modules.
                                  4-bit SUB       8-bit SUB
size                              3x8             3x16
execution latency                 22 ns           42 ns
data retrieval latency (in/out)   4/2 c/cycles    4/2 c/cycles

Worst-case configuration latency (in c/cycles)
Configuration interface           4-bit SUB       8-bit SUB
8-bit XC6200                      67              131
16-bit XC6200                     44              80
32-bit XC6200                     27              45
frame-based (Virtex)              2376            2376
multiple contexts (DPGA)          1               1
Table A.3: Characteristics of the 4-bit and 8-bit subtractor modules.
                                  4-bit GTN       8-bit GTN
size                              3x8             3x16
execution latency                 22 ns           44 ns
data retrieval latency (in/out)   4/2 c/cycles    4/2 c/cycles

Worst-case configuration latency (in c/cycles)
Configuration interface           4-bit GTN       8-bit GTN
8-bit XC6200                      64              136
16-bit XC6200                     41              77
32-bit XC6200                     27              45
frame-based (Virtex)              2376            2376
multiple contexts (DPGA)          1               1
Table A.4: Characteristics of the 4-bit and 8-bit 'greater than' comparator
modules.
                                  4x4-bit MULT    8x8-bit MULT
size                              10x9            18x17
execution latency                 62 ns           141 ns
data retrieval latency (in/out)   13/14 c/cycles  25/26 c/cycles

Worst-case configuration latency (in c/cycles)
Configuration interface           4x4-bit MULT    8x8-bit MULT
8-bit XC6200                      204             790
16-bit XC6200                     124             442
32-bit XC6200                     84              246
frame-based (Virtex)              7920            14256
multiple contexts (DPGA)          1               1
Table A.5: Characteristics of the 4x4-bit and 8x8-bit multiplier modules.
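The worst-case figures in Tables A.2-A.5 can be combined into a crude estimate of reconfiguration overhead. The sketch below (with hypothetical interface keys and function names) adds the worst-case per-module latencies for a set of modules configured one after another. It is only a pessimistic upper bound: it assumes each module is configured independently, so any configuration data or frames shared between modules, which a synthesis tool could exploit, are ignored.

```python
from typing import Dict, Iterable

# Worst-case configuration latency in c/cycles, transcribed from Tables A.2-A.5.
# The interface keys are hypothetical names for the table rows.
WORST_CASE_CONFIG_CYCLES: Dict[str, Dict[str, int]] = {
    "ADD4":  {"xc6200_8": 65,  "xc6200_16": 44,  "xc6200_32": 27,  "virtex": 2376,  "dpga": 1},
    "ADD8":  {"xc6200_8": 129, "xc6200_16": 80,  "xc6200_32": 45,  "virtex": 2376,  "dpga": 1},
    "SUB4":  {"xc6200_8": 67,  "xc6200_16": 44,  "xc6200_32": 27,  "virtex": 2376,  "dpga": 1},
    "SUB8":  {"xc6200_8": 131, "xc6200_16": 80,  "xc6200_32": 45,  "virtex": 2376,  "dpga": 1},
    "GTN4":  {"xc6200_8": 64,  "xc6200_16": 41,  "xc6200_32": 27,  "virtex": 2376,  "dpga": 1},
    "GTN8":  {"xc6200_8": 136, "xc6200_16": 77,  "xc6200_32": 45,  "virtex": 2376,  "dpga": 1},
    "MULT4": {"xc6200_8": 204, "xc6200_16": 124, "xc6200_32": 84,  "virtex": 7920,  "dpga": 1},
    "MULT8": {"xc6200_8": 790, "xc6200_16": 442, "xc6200_32": 246, "virtex": 14256, "dpga": 1},
}


def config_upper_bound(modules: Iterable[str], interface: str) -> int:
    """Pessimistic reconfiguration cost in c/cycles: the sum of per-module
    worst-case latencies, assuming nothing is shared between modules."""
    return sum(WORST_CASE_CONFIG_CYCLES[m][interface] for m in modules)


def fastest_interface(modules: Iterable[str], interfaces: Iterable[str]) -> str:
    """Return the interface with the smallest upper bound for the given modules."""
    modules = list(modules)  # allow the module list to be iterated repeatedly
    return min(interfaces, key=lambda i: config_upper_bound(modules, i))
```

For example, loading a 4-bit adder and a 4x4-bit multiplier through the 32-bit XC6200 interface is bounded by 27 + 84 = 111 c/cycles, against 2376 + 7920 = 10296 c/cycles for the frame-based (Virtex) interface.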
A.4 Support for Design Verification
A low-level VHDL model of the device architecture was implemented for 
the model XC6200 technology server. The model can be used to verify the 
validity of the configuration data produced by the design synthesis. 
The VHDL device model provides a model of the configuration interface and
a structural netlist of primitive device blocks (look-up tables, routing
multiplexers, wires, etc.). The individual primitive blocks are modelled at
the behavioural level. The configuration of all primitive elements can be
read from a configuration text file during the VHDL simulation. The
configuration text file can be changed by another program (e.g. another
simulator or a debugger), which allows co-simulation of the software and
reconfigurable hardware components of a system.
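This co-simulation arrangement relies on an external program rewriting the configuration text file while the VHDL simulation runs. The file format used by the device model is not specified here, so the sketch below assumes a hypothetical format with one `element value` pair per line and hypothetical element names. It writes the new configuration to a temporary file in the same directory and then renames it over the old one with `os.replace`, so that, on POSIX file systems, a simulator reading the file sees either the old or the new configuration, never a partially written one.

```python
import os
import tempfile
from typing import Dict


def write_config(path: str, config: Dict[str, str]) -> None:
    """Replace the configuration text file at `path` in one rename step.

    The hypothetical format has one 'element value' pair per line.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for element, value in sorted(config.items()):
                f.write(f"{element} {value}\n")
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_config(path: str) -> Dict[str, str]:
    """Parse the hypothetical 'element value' format back into a dictionary."""
    config: Dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                element, value = line.split(maxsplit=1)
                config[element] = value
    return config
```

A debugger or software simulator could call `write_config` each time the software side triggers a reconfiguration; how and when the VHDL model re-reads the file is left to the device model.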
Glossary 
compile-time refers to the time during program (or design) compilation (or
synthesis). Compile-time is the opposite of run-time.
configurable is used here as a synonym for (in-field) programmable. 
dynamically reconfigurable refers to the ability of a system to be changed
during its own operation, as opposed to requiring the system to be powered
down. Note that the term dynamic is used here to denote a temporal
quality only. Dynamic reconfigurability does not automatically imply
partial reconfigurability, which is a spatial characteristic. Dynamically
reconfigurable is a synonym for run-time reconfigurable.
Note that two interpretations of this terminology are currently in
common use. The above interpretation is consistent with Lautzenheiser
(1986) and other early works in this field.
The other interpretation of dynamically reconfigurable implies both
temporal and spatial characteristics (e.g. Lysaght and Dunlop, 1994), i.e.
systems which can be changed both during their operation and partially.
dynamically reconfigurable logic is a logic system that is dynamically
reconfigurable.
full reconfiguration refers to a type of reconfiguration in which only the
entire reconfigurable system can be modified in a single configuration
step. Full reconfiguration is the opposite of partial reconfiguration.
(in-field) programmable refers to a quality of a system (or a device, circuit,
sub-system, etc.) such that the system's configuration can be changed
outside the system vendor's manufacturing facility.
partial reconfiguration refers to a type of reconfiguration that permits
only a portion of the reconfigurable system to be modified. Depending
on the context in which this term is used, the 'system' may be a single
reconfigurable logic device, or a complex reconfigurable system with
several reconfigurable logic devices and other components. Partial
reconfiguration is the opposite of full reconfiguration.
reconfigurable system is a system which can be configured more than once. 
run-time is the time during the system's operation. Run-time is the
opposite of compile-time.
run-time reconfigurable is a synonym for dynamically reconfigurable.
References 
Albahama, O. T., Cheung, P. and Clarke, T. (1994). Virtual hardware and
the limits of computational speed-up, In: IEEE International Symposium 
on Circuits and Systems, pp. 159-162. 
Algotronix (1991). CAL1024 Datasheet, Algotronix Ltd. Version 004. 
Altera (1995). 1995 Data Book, Altera Corporation, 2610 Orchard Parkway. 
Amdahl, G. M. (1967). Validity of the single processor approach to achiev- 
ing large scale computing capabilities, In: Proc. AFIPS 1967 Spring Joint 
Computer Conference, Atlantic City, NJ, April, pp. 483-485. 
Atmel (1994). Configurable Logic Design and Application Book 1994/1995, At- 
mel Corporation, 2125 O'Nel Drive, San Jose, CA, 95131. 
Bazargan, K., Kastner, R. and Sarrafzadeh, M. (1999). 3-D floorplanning:
Simulated annealing and greedy placement methods for reconfig- 
urable computing systems, In: Proceedings of the IEEE Workshop on 
Rapid System Prototyping (RSP'99), Clearwater, FL, USA, June 16-18. 
Brebner, G. (1996). A virtual hardware operating system for the Xilinx
XC6200, In: R. W. Hartenstein and M. Glesner (editors), Field-Programmable
Logic: Smart Applications, New Paradigms and Compilers (FPL '96
Proceedings), LNCS 1142, Springer-Verlag, pp. 327-336.
Brebner, G. (1997). The swappable logic unit: a paradigm for virtual hard- 
ware, In: IEEE Symposium on FPGAs for Custom Computing Machines, 
Napa Valley, CA, USA, April 16-18, pp. 77-86. 
Brebner, G. and Bergmann, N. (1999). Reconfigurable computing in re- 
mote and harsh environment, In: P. Lysaght, J. Irvine and R. Harten- 
stein (editors), Field-Programmable Logic and Applications, LNCS 1673, 
Springer-Verlag, Glasgow, UK, August 30-September 1, pp. 195-204. 
Brown, J., Chen, D., Eslick, I., Tau, E. and DeHon, A. (1994). DELTA: Pro- 
totype for a first-generation dynamically programmable gate array, 
Transit Note 112, MIT Artificial Intelligence Laboratory. 
Canto, E., Moreno, J. M., Cabestany, J., Faura, J. and Insenser, J. M. 
(1999). A bipartitioning algorithm for dynamic reconfigurable pro- 
grammable logic, In: P. Lysaght, J. Irvine and R. Hartenstein (editors), 
Field-Programmable Logic and Applications, LNCS 1673, Springer-Verlag, 
Glasgow, UK, August 30-September 1, pp. 134-143. 
Chang, D. and Marek-Sadowska, M. (1998). Partitioning sequential circuits 
on dynamically reconfigurable FPGAs, In: International Symposium on 
Field Programmable Gate Arrays, Monterey, CA, February, pp. 161-167.
Chatha, K. S. and Vemuri, R. (1999). Hardware-software codesign for dy- 
namically reconfigurable architectures, In: P. Lysaght, J. Irvine and 
R. Hartenstein (editors), Field-Programmable Logic and Applications, 
LNCS 1673, Springer-Verlag, Glasgow, UK, August 30-September 1, 
pp. 175-184. 
Churcher, S., Kean, T. and Wilkie, B. (1995). The XC6200 FastMap™ processor
interface, In: W. Moore and W. Luk (editors), 5th International Workshop
on Field Programmable Logic and Applications, LNCS 975, Springer-Verlag,
Oxford, UK, August 29-September 1, pp. 36-43.
Culbertson, W. B., Amerson, R., Carter, R. J., Kuekes, P. and Snider, G. 
(1997). Defect tolerance on the Teramac custom computer, In: IEEE
Symposium on FPGAs for Custom Computing Machines, Napa, CA 
(USA), April, pp. 116-123. 
De Micheli, G. (1994). Synthesis and Optimisation of Digital Circuits, McGraw- 
Hill. 
DeHon, A. (1996). E-mail conversation. 
Dewilde, P., Deprettere, E. and Nouta, R. (1985). Parallel and pipelined 
VLSI implementation of signal processing algorithms, In: S. Kung, 
H. Whitehouse and T. Kailath (editors), VLSI and Modern Signal Pro- 
cessing, Prentice Hall, pp. 257-264. 
Diessel, O., ElGindy, H., Middendorf, M., Schmeck, H. and Schmidt,
B. (2000). Dynamic scheduling of tasks on partially reconfigurable 
FPGAs, IEE Proceedings Computer and Digital Techniques 147(3): 181- 
188. 
Dodhi, M. K., Hielschner, F. H., Storer, R. H. and Bhasker, J. (1995). Datap- 
ath synthesis using a problem-space genetic algorithm, IEEE Transac- 
tions on CAD of Integrated Circuits and Systems 14(8): 934-944. 
Edwards, C. (2000). Vax all, folks, Electronics Times, September 11, pp. 42-44.
Eldredge, J. G. and Hutchings, B. L. (1994). RRANN: A hardware
implementation of the backpropagation algorithm using reconfigurable FPGAs,
In: Proceedings of the IEEE World Conference on Computational Intelligence,
IEEE, Orlando, Florida, June, pp. 77-80.
Estrin, G. (1960). Organization of computer systems: the fixed plus
variable structure computer, In: Proc. Western Joint Computer Conference,
San Francisco, CA, USA, May 3-5, pp. 33-40. 
French, P. C. and Taylor, R. W. (1993). A self-reconfigurable processor, 
In: D. A. Buell and K. L. Pocek (editors), IEEE Workshop on FPGAs 
for Custom Computing Machines, IEEE Comput. Soc. Press, Napa, CA, 
USA, April 5-7, pp. 50-59. 
GajjalaPurna, K. M. and Bhatia, D. (2000). Temporal partitioning and 
scheduling data flow graphs for reconfigurable computers, IEEE 
Transactions on Computers 48(6): 579-590. 
Gajski, D. D., Dutt, N. D., Lu, A. C. and Lin, S. Y. (1992). High-level synthesis: 
Introduction to Chip and System Design, Kluwer Academic Publishers.
Gerez, S. H. (1999). Algorithms for VLSI Design Automation, John Wiley & 
Sons. 
Gokhale, M. and Marks, A. (1995). Automatic synthesis of paral- 
lel programs targeted to dynamically reconfigurable logic arrays, 
In: W. Moore and W. Luk (editors), 5th International Workshop on Field 
Programmable Logic and Applications, LNCS 975, Springer-Verlag, Ox- 
ford, UK, August 29-September 1, pp. 399-408. 
Gokhale, M., Holmes, W., Kopser, A., Lucas, S., Minnich, R. and Sweely, 
D. (1991). Building and using a highly parallel programmable logic 
array, IEEE Computer 24(1): 81-89. 
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimisation, and Ma- 
chine Learning, Addison-Wesley. 
Govindarajan, S. and Vemuri, R. (2000). Tightly integrated design space ex- 
ploration with spatial and temporal partitioning in SPARC, In: R. W. 
Hartenstein and H. Grünbacher (editors), Field Programmable Logic and 
Applications (FPL 2000 Proceedings), LNCS 1896, Springer-Verlag, Vil- 
lach, Austria, August 27-30, pp. 7-18. 
Guccione, S. A. (1995). Programming Fine-Grained Reconfigurable Architec- 
tures, PhD thesis, University of Texas at Austin. 
Hadley, J. D. and Hutchings, B. L. (1995). Design methodologies for par- 
tially reconfigured systems, In: Proc. IEEE Symposium on FPGAs for 
Custom Computing Machines, Napa, CA, USA, April 19-21, pp. 78-84.
Hauck, S., Li, Z. and Schwabe, E. (1998). Configuration compression for the 
Xilinx XC6200 FPGA, In: Proceedings of IEEE Symposium on FPGAs for
Custom Computing Machines (FCCM'98), Napa, CA, USA, April 15-17, 
pp. 138-146. 
Heath, J. R., Kuekes, P. J., Snider, G. S. and Williams, R. S. (1998). A de- 
fect tolerant computer architecture: Opportunities for nanotechnol- 
ogy, Science 280: 1716-1721. 
Hennessy, J. L. and Patterson, D. A. (1990). Computer architecture: a quanti- 
tative approach, Morgan Kaufmann Publishers. 
Heron, J. R. and Woods, R. F. (1996). Architectural strategies for
implementing an image processing algorithm on XC6200 FPGA, In: R. W.
Hartenstein and M. Glesner (editors), Field-Programmable Logic: Smart 
Applications, New Paradigms and Compilers (FPL '96 Proceedings), LNCS 
1142, Springer-Verlag, pp. 317-326. 
Holland, J. H. (1975). Adaptation in natural and artificial systems, The Univer- 
sity of Michigan Press, Ann Arbor, Michigan, USA. 
Jones, D. and Lewis, D. M. (1995). A time-multiplexed FPGA architecture 
for logic emulation, In: Proceedings of IEEE Custom Integrated Circuits 
Conference, pp. 495-498. 
Kaul, M. and Vemuri, R. (1998). Optimal temporal partitioning and synthe- 
sis for reconfigurable architectures, In: Design, Automation and Test in 
Europe Conference, Paris, France, February 23-26. 
Kean, T. (1988). Configurable Logic: A Dynamically Programmable Cellular
Architecture and its VLSI Implementation, PhD thesis, University of
Edinburgh, Dept. of Computer Science.
Kean, T. (1999). FPL in the era of system level integration, In: FPL'99 Special 
One-day Seminar, ISLI, The Alba Campus, Scotland, September 2. 
Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P. (1983). Optimisation by sim- 
ulated annealing, Science 220(4598): 671-680. 
Lala, P. K. (2000). Self-Checking and Fault-Tolerant Digital Design, Morgan 
Kaufmann Publishers. 
Lautzenheiser, D. P. (1986). Using dynamic reconfigurable logic in a 
XC2064 logic cell array, Electro'86 and Mini/Micro Northeast Conference 
pp. 26/2/1-10. 
Ling, X. -P. and Amano, H. (1993a). Performance evaluation of WASMII: a
data driven computer on a virtual hardware, In: Proceedings 5th
International PARLE Conference, LNCS 694, pp. 610-621.
Ling, X. -P. and Amano, H. (1993b). WASMII: A data driven computer on 
a virtual hardware, In: D. A. Buell and K. L. Pocek (editors), IEEE 
Workshop on FPGAs for Custom Computing Machines, IEEE Comput. Soc. 
Press, Napa, CA, USA, April 5-7, pp. 33-42. 
Liu, H. and Wong, D. F. (1999). Circuit partitioning for dynamically re- 
configurable FPGAs, In: International Symposium on Field Programmable 
Gate Arrays, Monterey, CA, February, pp. 187-194.
Luk, W. et al. (1997a). Parametrised libraries for Xilinx 6200 FPGAs, Prelimi- 
nary documentation, Dept of Computing, Imperial College, 180 Queen's 
Gate, London SW7 2BZ, United Kingdom. Version 2.1. 
Luk, W., Guo, S., Shirazi, N. and Zhuang, N. (1996). A framework for
developing parametrised FPGA libraries, In: R. W. Hartenstein and 
M. Glesner (editors), Field-Programmable Logic: Smart Applications, New 
Paradigms and Compilers (FPL '96 Proceedings), LNCS 1142, Springer- 
Verlag, pp. 24-33. 
Luk, W., Shirazi, N. and Cheung, P. Y. (1997b). Compilation tools for
run-time reconfigurable designs, In: Proc. IEEE Symposium on FPGAs for
Custom Computing Machines (FCCM'97), Napa, CA, USA, April 16-18, 
pp. 56-65. 
Luk, W., Shirazi, N., Guo, S. R. and Cheung, P. Y. K. (1997c). Pipeline 
morphing and virtual pipelines, In: W. Luk, P. Y. K. Cheung and 
M. Glesner (editors), Field Programmable Logic and Applications (FPL '97 
Proceedings), LNCS 1304, Springer-Verlag, pp. 111-120. 
Lysaght, P. and Dunlop, J. (1994). Dynamic reconfiguration of FPGAs, 
In: W. R. Moore and W. Luk (editors), More FPGAs, Abingdon EE&CS 
Books, pp. 82-94. 
Lysaght, P. and Stockwood, J. (1996). A simulation tool for dynamically 
reconfigurable field programmable gate arrays, IEEE Transactions on 
VLSI Systems 4(3): 381-390. 
Lysaght, P., Stockwood, J., Law, J. and Girma, D. (1994). Artificial neu- 
ral network implementation on a fine-grained FPGA, In: 4th Interna- 
tional Workshop on Field Programmable Logic and Applications, LNCS 849, 
Springer-Verlag, Prague, Czech Republic, September 7-9, pp. 421-431. 
Mange, D., Durand, S., Sanchez, E., Stauffer, A., Tempesti, G., Marchal, P. 
and Piguet, C. (1995). A new self-reproducing automaton based on a 
multi-cellular organization, Technical Report No. 95/114, Logic Synthe- 
sis Laboratory, Dept of Computer Science, Swiss Federal Institute of 
Technology, Lausanne, Switzerland. 
McCaskill, J. and Wagler, P. (2000). From reconfigurability to evolu- 
tion in construction systems-spanning electronic, microfluidic and 
biomolecular domains, In: R. W. Hartenstein and H. Grünbacher (ed- 
itors), Field Programmable Logic and Applications (FPL 2000 Proceedings), 
LNCS 1896, Springer-Verlag, Villach, Austria, August 27-30. 
McFarland, M. C., Parker, A. and Camposano, R. (1990). The high-level 
synthesis of digital systems, IEEE Proceedings pp. 301-318. 
McGregor, G. and Lysaght, P. (1999). Self controlling dynamic reconfigura- 
tion: A case study, In: P. Lysaght, J. Irvine and R. Hartenstein (editors), 
Field-Programmable Logic and Applications, LNCS 1673, Springer-Verlag, 
Glasgow, UK, August 30-September 1, pp. 144-154. 
Minninck, R. (1964). Cutpoint cellular logic, IEEE Transactions on Electronic 
Computers EC-13: 685-698. 
Morris, R. and Nowrouzian, B. (1996). A novel technique for pipelined 
scheduling and allocation of data-flow graphs based on genetic algo- 
rithms, In: T. J. Malkinson (editor), 1996 Canadian Conference on Elec- 
trical and Computer Engineering. Conference Proceedings. Theme: Glimpse 
into the 21st Century (Cat. No. 96TH8157), Vol. 1, Calgary, Alta., Canada, 
May 26-29, pp. 429-432. 
Murgai, R., Brayton, R. K. and Sangiovanni-Vincentelli, A. (1995). Logic 
Synthesis for Field-Programmable Gate Arrays, Kluwer Academic Publishers,
Boston.
Nemhauser, G. L. and Wolsey, L. A. (1988). Integer and Combinatorial Opti- 
mization, Chichester & Wiley. 
Ohmori, K. (1995). High-level synthesis using genetic algorithm, 
In: 1995 IEEE International Conference on Evolutionary Computation 
(Cat. No. 95TH8099), Perth, WA, Australia, November 29-December 1, 
pp. 209-213. 
Oldfield, J. V. and Dorf, R. C. (1995). Field Programmable Gate Arrays: Recon- 
figurable Logic for Rapid Prototyping and Implementation of Digital Sys- 
tems, John Wiley and Sons. 
Oliveira, A., Lau, N. and Sklyarov, V. (1998). Synthesis of VHDL code from
the hierarchical specification of control circuits for dynamically
reconfigurable FPGAs, In: Proceedings of VIUF Fall '98, Orlando, Florida,
October 26-28.
Paulin, P. G., Knight, J. P. and Girczyc, E. F. (1986). HAL: A multi-paradigm
approach to automatic data path synthesis, In: Proc. 23rd IEEE Design 
Automation Conference, Las Vegas, NV, USA, July, pp. 263-270. 
Robinson, D. and Lysaght, P. (1999). Modelling and synthesis of configu- 
ration controllers for dynamically reconfigurable logic systems using 
the DCS CAD framework, In: P. Lysaght, J. Irvine and R. Hartenstein
(editors), Field-Programmable Logic and Applications, LNCS 1673,
Springer-Verlag, Glasgow, UK, August 30-September 1, pp. 41-50.
Robinson, D., McGregor, G. and Lysaght, P. (1998). New CAD frame- 
work extends simulation of dynamically reconfigurable logic, In: R. W. 
Hartenstein and A. Keevallik (editors), Field Programmable Logic and 
Applications (FPL '98 Proceedings), LNCS 1482, Springer-Verlag, pp. 1-8.
Sait, S. M., Ali, S. and Benten, M. S. T. (1996). Scheduling and allocation in
high level synthesis using stochastic techniques, Microelectronics
Journal 27(8): 693-712.
Sels, P. (1996). Scheduling for dynamically reconfigurable FPGAs, Master's the- 
sis, Keble College, Oxford University. 
Sherwani, N. (1995). Algorithms for VLSI Physical Design Automation, 2nd 
edn, Kluwer Academic Publishers.
Shirazi, N., Luk, W. and Cheung, P. Y. (1998). Automating production of
run-time reconfigurable designs, In: Proceedings of 6th IEEE Symposium
on Field-Programmable Custom Computing Machines (FCCM'98), Napa,
CA (USA), April 14-17, pp. 147-156.
Sidhu, R. P. S., Mei, A. and Prasanna, V. K. (1999). Genetic programming us- 
ing self-reconfigurable FPGAs, In: P. Lysaght, J. Irvine and R. Harten- 
stein (editors), Field-Programmable Logic and Applications, LNCS 1673, 
Springer-Verlag, Glasgow, UK, August 30-September 1, pp. 301-312. 
Silberschatz, A. and Galvin, P. B. (1998). Operating System Concepts, 5th edn, 
Addison-Wesley Longman. 
Sklyarov, V. and de Brito Ferrari, A. (1998). Design and implementation of
control circuits based on dynamically reconfigurable FPGA, In: Proc. of
IEEE International Conference on Electronics, Circuits and Systems, Lisbon.
Stanford, P. and Mancuso, P. (editors) (1990). EDIF Electronic Design
Interchange Format Version 200, 2nd edn, Electronic Industries Association.
Takayama, A., Shibata, Y., Iwai, K. and Amano, H. (2000). Dataflow par- 
titioning and scheduling algorithms for WASMII, a virtual hardware, 
In: R. W. Hartenstein and H. Grünbacher (editors), Field-Programmable 
Logic and Applications (FPL 2000 Proceedings), LNCS 1896, Springer- 
Verlag, Villach, Austria, August 27-30, pp. 685-694. 
Tangen, U. (2000). Self-organisation in micro-configurable hardware, 
In: M. A. Bedau et al. (editors), Artificial Life VII: Proceedings of the 7th 
International Conference. 
Thompson, A. (1996). An evolved circuit, intrinsic in silicon, entwined with 
physics, In: Proc. 1st Int. Conf. on Evolvable Systems (ICES 96). 
Trimberger, S., Carberry, D., Johnson, A. and Wong, J. (1997). A
time-multiplexed FPGA, In: Proc. IEEE Symposium on FPGAs for Custom
Computing Machines (FCCM'97), Napa, CA, USA, April 16-18, pp. 22-28.
Trimberger, S. M. (1994). Field-Programmable Gate Array Technology, Kluwer 
Academic Publishers. 
Vasilko, M. (1999). DYNASTY: A temporal floorplanning based 
CAD framework for dynamically reconfigurable logic systems, 
In: P. Lysaght, J. Irvine and R. Hartenstein (editors), Field-Programmable 
Logic and Applications (FPL'99 Proceedings), LNCS 1673, Springer- 
Verlag, pp. 124-133. 
Vasilko, M. (2000). Design visualisation for dynamically reconfigurable 
systems, In: R. W. Hartenstein and H. Grünbacher (editors), Field- 
Programmable Logic and Applications (FPL 2000 Proceedings), LNCS 1896, 
Springer-Verlag, Villach, Austria, August 27-30, pp. 131-140. 
Vasilko, M. and Ait-Boudaoud, D. (1996a). Architectural synthesis tech- 
niques for dynamically reconfigurable logic, In: R. W. Hartenstein and 
M. Glesner (editors), Field-Programmable Logic: Smart Applications, New 
Paradigms and Compilers (FPL '96 Proceedings), LNCS 1142, Springer- 
Verlag, pp. 290-296. 
Vasilko, M. and Ait-Boudaoud, D. (1996b). Optically reconfigurable 
FPGAs: Is this a future trend?, In: R. W. Hartenstein and M. Glesner
(editors), Field-Programmable Logic: Smart Applications, New Paradigms 
and Compilers (FPL '96 Proceedings), LNCS 1142, Springer-Verlag, 
pp. 270-279. 
Vasilko, M. and Cabanis, D. (1999). A technique for modelling dynamic
reconfiguration with improved simulation accuracy, IEICE Transactions
on Fundamentals of Electronics, Communications and Computer Science
E82-A(11): 2465-2474.
Vasilko, M., Gibson, D., Long, D. and Holloway, S. (1999). Towards a consis- 
tent design methodology for run-time reconfigurable systems, In: IEE 
Colloquium on Reconfigurable Systems, Digest No. 99/061, Glasgow,
Scotland, March 10, pp. 5/1-4.
von Neumann, J. (1966). Theory of Self-Reproducing Automata, University of 
Illinois Press. 
Wall, M. (1996). GAlib: A C++ Library of Genetic Algorithm Components,
version 2.4, MIT, available from http://lancet.mit.edu/ga/, August.
Wirthlin, M. J. and Hutchings, B. L. (1995). A dynamic instruction set com- 
puter, In: Proceedings IEEE Symposium on FPGAs for Custom Computing 
Machines, Napa Valley, CA, USA, April 19-21, pp. 99-107. 
Xilinx (1994). The Programmable Logic Data Book, Xilinx, Inc., 2100 Logic
Drive, San Jose, CA 95124. 
Xilinx (1997a). XACTstep Series 6000 User Guide, Xilinx, Inc. 
Xilinx (1997b). XC6200 Field Programmable Gate Arrays, Xilinx, Inc., April 24 
(Version 1.10). Product Description. 
Xilinx (1998). Parametrised Library for XC6200, Xilinx, Inc., January. Velab 
documentation. 
Xilinx (2000a). Virtex Series Configuration Architecture User Guide, Xilinx,
Inc., September 27. XAPP151 (v1.5).
Xilinx (2000b). Virtex™ 2.5 V Field Programmable Gate Arrays, Xilinx, Inc.,
September 19. DS003 (v2.3).
Zhang, X. -j., Ng, K. -w. and Luk, W. (2000). A combined approach to
high-level synthesis for dynamically reconfigurable systems, In: R. W. 
Hartenstein and H. Grünbacher (editors), Field Programmable Logic and 
Applications (FPL 2000 Proceedings), LNCS 1896, Springer-Verlag, Vil- 
lach, Austria, August 27-30, pp. 361-370. 
Zhang, X. -j., Ng, K. -w. and Young, G. H. (1998). High-level synthesis using 
genetic algorithms for dynamically reconfigurable FPGAs (Abstract), 
In: Proceedings of the 1998 ACM/SIGDA 6th International Symposium on 
Field Programmable Gate Arrays (FPGA'98), ACM, Monterey, CA (USA), 
February 22-25, p. 258. 