Synthesis of compositional microprogram control units for programmable devices by Wiśniewski, Remigiusz
Synthesis of Compositional
Microprogram Control Units
for Programmable Devices
Faculty of Electrical Engineering, Computer Science and Telecommunications
University of Zielona Góra
Lecture Notes in Control and Computer Science
Volume 14
Editorial Board:
• Józef KORBICZ – Editor-in-Chief
• Marian ADAMSKI
• Alexander A. BARKALOV
• Krzysztof GAŁKOWSKI
• Roman GIELERAK
• Eugeniusz KURIATA
• Sławomir NIKIEL
• Andrzej OBUCHOWICZ
• Andrzej PIECZYŃSKI
• Dariusz UCIŃSKI
• Marcin WITCZAK
University of Zielona Góra Press, Poland
2009
Remigiusz WIŚNIEWSKI
Institute of Computer Engineering and Electronics
University of Zielona Góra
ul. Podgórna 50
65-246 Zielona Góra, Poland
e-mail: r.wisniewski@iie.uz.zgora.pl
Supervisor:
• Alexander A. BARKALOV, University of Zielona Góra
Referees:
• Dariusz KANIA, Silesian University of Technology
• Marian ADAMSKI, University of Zielona Góra
The text of this book was prepared based on the author’s Ph.D. dissertation
entitled Synthesis of Compositional Microprogram Control Units
for Programmable Devices
ISBN 978-83-7481-293-1
Camera-ready copy prepared in LaTeX 2ε by the author
Copyright © University of Zielona Góra Press, Poland, 2009
Copyright © Remigiusz Wiśniewski, 2009
University of Zielona Góra Press
ul. Licealna 9, 65-417 Zielona Góra, Poland
tel./fax: +48 68 328 78 64, e-mail: oficynawydawnicza@adm.uz.zgora.pl
Printed by the University of Zielona Góra Printing House
Contents
List of the most important symbols
List of the most important abbreviations
List of abbreviations of synthesis methods and CMCU structures
1 Introduction
1.1 Thesis and the main goals
1.2 Structure
2 Related work
2.1 Integrated circuits and programmable devices
2.1.1 Introduction
2.1.2 Programmable logic devices
2.1.3 Complex programmable logic devices
2.1.4 Field programmable gate arrays
2.2 Control units
2.2.1 Single-level control units (finite state machines)
2.2.2 Microprogram control units
2.3 Conclusions
3 Compositional microprogram control units
3.1 Functional decomposition of control units
3.2 Structural decomposition of control units
3.2.1 Main definitions
3.2.2 CMCU with a base structure
3.3 Conclusions
4 Compositional microprogram control units with mutual memory
4.1 CMCU with mutual memory
4.1.1 Main idea of the method
4.1.2 Synthesis of the CMCU with mutual memory
4.1.3 Example of the synthesis of the CMCU with mutual memory
4.1.4 Summary
4.2 CMCU with a function decoder
4.2.1 Main idea of the method
4.2.2 Synthesis of the CMCU with a function decoder
4.2.3 Example of the synthesis of the CMCU with a function decoder
4.2.4 Summary
4.3 CMCU with output identification
4.3.1 Main idea of the method
4.3.2 Synthesis of the CMCU with output identification
4.3.3 Example of the synthesis of the CMCU with output identification
4.3.4 Summary
4.4 CMCU with output identification and a function decoder
4.4.1 Main idea of the method
4.4.2 Synthesis of the CMCU with output identification and a function decoder
4.4.3 Example of the synthesis of the CMCU with output identification and a function decoder
4.4.4 Summary
4.5 Conclusions
5 Compositional microprogram control units with sharing codes
5.1 CMCU with sharing codes
5.1.1 Main idea of the method
5.1.2 Synthesis of the CMCU with sharing codes
5.1.3 Example of the synthesis of the CMCU with sharing codes
5.1.4 Summary
5.2 CMCU with sharing codes and a function decoder
5.2.1 Main idea of the method
5.2.2 Synthesis of the CMCU with sharing codes and a function decoder
5.2.3 Example of the synthesis of the CMCU with sharing codes and a function decoder
5.2.4 Summary
5.3 CMCU with an address converter
5.3.1 Main idea of the method
5.3.2 Synthesis of the CMCU with an address converter
5.3.3 Example of the synthesis of the CMCU with an address converter
5.3.4 Summary
5.4 CMCU with an address converter and a function decoder
5.4.1 Main idea of the method
5.4.2 Synthesis of the CMCU with an address converter and a function decoder
5.4.3 Example of the synthesis of the CMCU with an address converter and a function decoder
5.4.4 Summary
5.5 Conclusions
6 Partial reconfiguration of CMCUs implemented in the FPGA
6.1 Introduction to partial reconfiguration of FPGA devices
6.2 Mechanism of partial reconfiguration of Xilinx FPGAs
6.3 Traditional prototyping flow of control units
6.4 Partial reconfiguration of CMCUs implemented in the FPGA
6.5 Example of partial reconfiguration
6.6 Conclusions
7 CAD Tool for Automatic synthesis of CMCUs (ATOMIC)
7.1 Overview of ATOMIC
7.2 Realisation of ATOMIC
8 Results of experiments
8.1 Results of experiments of prepared methods
8.1.1 Library of test modules
8.1.2 Verification of the prepared methods
8.1.3 Results of experiments
8.2 Analysis of the results of experiments
8.3 Results of experiments of partial reconfiguration
9 Conclusions
A Description of ATOMIC
A.1 Structure of ATOMIC
A.2 Input and output data formats of ATOMIC
A.2.1 Input data format of ATOMIC
A.2.2 Intermediate data formats of ATOMIC
A.2.3 Output data format of ATOMIC
A.3 Arguments of ATOMIC modules
A.3.1 Usage of the fc2olc module
A.3.2 Usage of the olc2mcu module
A.3.3 Usage of the mcu2verilog module
B Detailed results of experiments
B.1 Detailed results of implementation of prepared methods
B.2 Detailed results of partial reconfiguration
List of figures
List of tables
List of listings
Streszczenie (Summary)
List of the most important symbols
$B$ – set of operational vertices of the flow-chart; $B=\{b_1,\ldots,b_K\}$
$X$ – set of conditional vertices of the flow-chart; $X=\{x_1,\ldots,x_L\}$
$Y$ – set of microoperations executed by the controller; $Y=\{y_1,\ldots,y_N\}$
$K$ – number of all operational vertices in the flow-chart
$L$ – number of conditional vertices in the flow-chart
$N$ – number of microoperations that are executed by the controller (the total
number of microoperations is equal to $N+2$ because of two additional
microoperations: $y_0$ and $y_K$)
$C$ – set of operational linear chains; $C=\{\alpha_1,\ldots,\alpha_G\}$
$C'$ – subset of the set $C$; it contains only such OLCs that are not connected with
the final vertex of the flow-chart; $C' \subset C$
$I$ – set of inputs of OLCs; $I=\{I_1,\ldots,I_J\}$
$O$ – set of outputs of OLCs; $O=\{O_1,\ldots,O_G\}$
$G$ – total number of OLCs in the flow-chart
$J$ – number of all OLCs inputs
$\alpha_g$ – operational linear chain $g$
$I_j^t$ – $t$-th input of the operational linear chain $\alpha_j$
$O_g$ – output of the OLC $\alpha_g$
$M_g$ – number of operational vertices that belong to the OLC $\alpha_g$
$M_1$ – number of operational vertices of the longest OLC; $M_1=M_g$ (here $\alpha_g$ is the
OLC that contains the greatest number of operational vertices)
$R_1$ – number of bits required to encode the longest OLC; $R_1=\lceil\log_2 M_1\rceil$
$M_2$ – number of OLCs in the set $C$ (equal to the parameter $G$)
$R_2$ – number of bits required to encode OLCs; $R_2=\lceil\log_2 M_2\rceil$
$M_3$ – number of all microinstructions kept in control memory (equal to the
parameter $K$ and to the number of operational vertices in the flow-chart)
$R_3$ – minimum number of bits required to address microinstructions; $R_3=\lceil\log_2 M_3\rceil$
$M_Z$ – number of all OLCs inputs (equal to the parameter $J$)
$R_Z$ – number of bits required to encode all OLC inputs; $R_Z=\lceil\log_2 M_Z\rceil$
$R_{OI}$ – minimum number of bits required for the recognition of OLC outputs
$T$ – excitation function for the counter; it consists of $R_1$ variables: $T=\{t_1,\ldots,t_{R_1}\}$
$D$ – excitation function for the register; it consists of $R_2$ variables: $D=\{d_1,\ldots,d_{R_2}\}$;
it is not generated in the case of CMCUs with mutual memory
$A$ – function generated by the counter (in the case of CMCUs with sharing codes
it is treated as a minor-part of the microinstruction address, in all other
CMCUs this function directly addresses microinstructions); $A=\{a_1,\ldots,a_{R_1}\}$
$Q$ – feedback function usually generated by the register (except CMCUs with
mutual memory, where $Q$ is generated by the counter and is a sub-function
of $A$: $Q \subset A$); it consists of $R_2$ variables: $Q=\{q_1,\ldots,q_{R_2}\}$
$Z$ – excitation function for the function decoder; consists of $R_Z$ variables:
$Z=\{z_1,\ldots,z_{R_Z}\}$
$V$ – converted microinstruction address generated by the address converter; it
consists of $R_3$ variables: $V=\{v_1,\ldots,v_{R_3}\}$; usually implemented as memory
$y_0$ – additional microoperation, used for organising the addressing mode (it
increments the counter and forbids changing the state of the CMCU when equal
to 0; it loads the counter and changes the state when equal to 1)
$y_K$ – additional microoperation, used for indicating that the final vertex of the
flow-chart will be reached (at the next clock trigger); it terminates the
fetching of microinstructions from control memory
$S_{CM}$ – volume of control memory:
• $S_{CM}=(N+2)\cdot 2^{R_1+R_2}$ – for CMCUs with sharing codes
• $S_{CM}=(N+2)\cdot 2^{R_3}$ – for CMCUs with an address converter
• $S_{CM}=(N+2)\cdot 2^{R_1}$ – for all other CMCUs
$S_{FD}$ – volume of the function decoder:
• $S_{FD}=R_1\cdot 2^{R_Z}$ – for all CMCUs with mutual memory
• $S_{FD}=(R_1+R_2)\cdot 2^{R_Z}$ – for all CMCUs with sharing codes
$S_{CA}$ – volume of the address converter; $S_{CA}=(R_1+R_2)\cdot 2^{R_3}$
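As a quick numerical check of the definitions above, the following Python sketch evaluates the code widths and some of the memory volumes for one set of parameters. The values of N, M1, M2, M3 and MZ are invented for illustration and are not taken from the book; the helper name code_width is likewise an assumption of this sketch.

```python
def code_width(count: int) -> int:
    """Return ceil(log2(count)) for count >= 1, using exact integer arithmetic."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


# Hypothetical flow-chart parameters (illustrative only, not from the book).
N = 10   # microoperations executed by the controller
M1 = 6   # operational vertices in the longest OLC
M2 = 5   # number of OLCs (parameter G)
M3 = 20  # microinstructions in control memory (parameter K)
MZ = 7   # number of OLC inputs (parameter J)

R1 = code_width(M1)  # bits to encode a position inside the longest OLC
R2 = code_width(M2)  # bits to encode an OLC
R3 = code_width(M3)  # bits to address every microinstruction
RZ = code_width(MZ)  # bits to encode every OLC input

# Control memory volume as listed for two of the structures.
S_CM_sharing = (N + 2) * 2 ** (R1 + R2)   # CMCU with sharing codes
S_CM_converter = (N + 2) * 2 ** R3        # CMCU with an address converter

# Function decoder volume as listed for the two families.
S_FD_mutual = R1 * 2 ** RZ                # CMCUs with mutual memory
S_FD_sharing = (R1 + R2) * 2 ** RZ        # CMCUs with sharing codes

print(R1, R2, R3, RZ)
print(S_CM_sharing, S_CM_converter, S_FD_mutual, S_FD_sharing)
```

With these hypothetical values the sharing-codes address needs R1 + R2 = 6 bits although R3 = 5 bits would be enough to address all microinstructions, so the control memory is twice as large (768 bits instead of 384).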
List of the most important abbreviations
ASIC Application Specific Integrated Circuit
BRAM Block Random Access Memory
CLB Configurable Logic Block
CPLD Complex Programmable Logic Device
CU Control Unit
CMCU Compositional Microprogram Control Unit
FDC Flip-flop with Data and asynchronous Clear
FDCE Flip-flop with Data, asynchronous Clear and clock Enable
FPGA Field Programmable Gate Array
FSM Finite State Machine
HDL Hardware Description Language
LUT Look-Up Table
MCU Microprogram Control Unit
PAL Programmable Array Logic
PLA Programmable Logic Array
PLD Programmable Logic Device
PROM Programmable Read Only Memory
SoC System-on-a-Chip
SoPC System-on-a-Programmable-Chip
SPLD Simple Programmable Logic Device
List of abbreviations of synthesis
methods and CMCU structures
MM (synthesis method of a CMCU with) Mutual Memory
FD Function Decoder
OI Output Identification
OD Output identification and a function Decoder
SC Sharing Codes
SD Sharing codes and a function Decoder
CA Converter of an Address
CD Converter of an address and a function Decoder
CMCU UMM CMCU (represented as a unit) with Mutual Memory
CMCU UFD CMCU with a Function Decoder
CMCU UOI CMCU with Output Identification
CMCU UOD CMCU with Output identification and a function Decoder
CMCU USC CMCU with Sharing Codes
CMCU USD CMCU with Sharing codes and a function Decoder
CMCU UCA CMCU with a Converter of an Address
CMCU UCD CMCU with a Converter of an address and a function Decoder
Chapter 1
INTRODUCTION
A control unit (CU) is one of the most important parts of any digital system
(De Micheli, 1994; Clements, 2000; Łuba, 2003; Bolton, 1990; Bursky, 1999). It
can be found in almost all devices that contain microelectronics, such as computer
central processing units (CPUs), cellular phones, cars and even remote controllers.
The control unit is responsible for managing all modules of the designed system:
it issues the microinstructions that are to be executed (Gajski, 1996).
Most control units that are available on the market are created as a single-level
finite state machine (FSM). This means that the control unit is formed as a
simple Moore or Mealy automaton (Mealy, 1955; Moore, 1956). Such a solution
was good for small systems. However, the size of devices grows very fast, and now
complex digital systems can be implemented on a single chip as a
system-on-a-chip (SoC) or a system-on-a-programmable-chip (SoPC). SoPC systems,
where logic functions are realized using programmable logic devices
(PLDs), complex programmable logic devices (CPLDs) or field programmable gate
arrays (FPGAs), are especially popular nowadays. Such devices integrate all elements
of the design on a single chip that contains built-in logic and dedicated memory
blocks (Altera, 2006; Xilinx, 2000). Therefore, traditional methods of control unit
prototyping are evolving. One effective method of CU realization is the application
of the model of the compositional microprogram control unit (CMCU).
The compositional microprogram control unit is a multi-level device, where
the control unit is decomposed into two main units (Łuba, 2005; Baranov, 1994;
Barkalov, 2002). The first one is responsible for addressing microinstructions that
are kept in control memory. It is a simple finite-state machine. The role of the
second part is to hold and generate adequate microinstructions. Such a solution
may lead to the minimization of the number of logic elements that are used for
the implementation of the CU. Therefore, a larger area of the target device remains
available to other modules of the designed system. CMCU memory can be
implemented using either logic elements or dedicated memory blocks of a chip
(Wiśniewski, 2005b; Xilinx, 2004; Altera, 2006).
1.1. Thesis and the main goals
This dissertation is focused on proving that the following claim is true:
Appropriate modification of traditional structures of the compositional
microprogram control unit makes it possible to decrease the number of logic
blocks that are required for the implementation of the controller in the
target FPGA device.
There are two main goals of the dissertation. The first one is to reduce the
number of logic blocks that are required for the implementation of the compositional
microprogram control unit in an FPGA. The reduction is achieved through
the application of additional internal blocks of the control unit.
The second aim is to reduce the size of the bit-stream that is sent to the FPGA
during physical implementation of the compositional microprogram control unit.
This task is solved by means of partial reconfiguration of programmable devices.
1.2. Structure
The dissertation is divided into nine chapters and two appendices. Chapter 1
presents the thesis and the main goals, and outlines the structure of the book.
The work related to the dissertation is reviewed in Chapter 2. Information
about integrated circuits, programmable devices and control units is briefly given.
Functional and structural decomposition of control units is presented in Chapter 3.
Moreover, compositional microprogram control units based on structural
decomposition are presented in greater detail. The most important symbols and
abbreviations related to the CMCU are defined, too.
Chapter 4 introduces new synthesis methods of a CMCU implemented in the
FPGA. All proposed methods are based on a CMCU with mutual memory. Each
design method is presented in detail and illustrated with an example.
Chapter 5 has a structure similar to that of Chapter 4. New synthesis methods
of the CMCU implemented in the FPGA are presented. However, all the proposed
CMCUs are based on the application of the idea of sharing codes. Each method
is shown in detail and illustrated with an example.
The idea of partial reconfiguration of control units implemented in the FPGA
is shown in Chapter 6. First, the mechanism that makes it possible to replace a portion of
the design implemented in the FPGA is briefly introduced. Next, the traditional
design process of controllers is presented. Furthermore, a new prototyping flow
based on the application of partial reconfiguration of CMCUs implemented in the
FPGA is proposed.
Chapter 7 briefly describes a tool that was designed for automatic synthesis
of the CMCU (ATOMIC). The main ideas used during the implementation
of ATOMIC are presented and reviewed. The tool implements all synthesis
algorithms presented in Chapters 4 and 5. Therefore, its functionality aided the
prototyping process of CMCUs and was a very important link during experiments.
The most important results of experiments are presented in Chapter 8. The
analysis of the obtained values was divided into two parts. The first one deals
with the effectiveness of the proposed synthesis methods, while the second one
concentrates on the results achieved during partial reconfiguration. A detailed
analysis of experimental results is concluded with an attempt to select a suitable
method depending on the initial description of the controller.
Chapter 9 summarises the dissertation. Conclusions and plans for future work
are presented.
The structure of ATOMIC is shown in Appendix A. Input and output data
formats are described. Furthermore, all ATOMIC modules are presented in detail.
Appendix B describes detailed results of experiments. Values obtained during
the verification of the effectiveness of the proposed synthesis methods are shown.
Moreover, the results of partial reconfiguration of CMCUs implemented in the
FPGA are presented.
Chapter 2
RELATED WORK
2.1. Integrated circuits and programmable devices
2.1.1. Introduction
In the late 1940s the first transistor was created as a point-contact device formed
from germanium. This invention was the basis for later digital circuits and
programmable devices. Over the following two decades the development of transistors
resulted in the first digital logic gates and circuits classified as TTL (transistor-transistor
logic). The devices had up to 16 pins and each could perform a simple logic function
(for example, the device 7400 contained four 2-input NAND gates, and the 7404 contained
six inverters). Those small circuits were the first application specific integrated circuits
(ASICs), where logic functions were fixed and unchangeable. This means that an
ASIC implements dedicated logic and cannot be reconfigured. Until the early
1970s, ASICs integrated more and more gates on one chip,
but the fixed logic remained the main problem. Once manufactured, a device could not
be changed, so there was no possibility to correct any errors or bugs. The
designer could not verify her/his prototype using a real physical circuit.
This problem was solved in the 1970s, when the first programmable devices
were introduced. Those circuits were named programmable logic devices (PLDs).
The PLD was built as a fixed array of AND (OR) functions driving a programmable
array of OR (AND) functions (Maxfield, 2004). Such a circuit was initially used
to implement simple combinational logic. Later, registered and tri-state outputs
were added. In the early 1980s the first complex PLD (CPLD) was presented.
Basically, the CPLD is an extended version of the PLD; it contains small PLD
blocks linked to interconnect arrays. CPLDs were highly configurable but it was
impossible to implement large and complex functions. Thus, in 1984, the first
field programmable gate array (FPGA) was introduced. The device was made of a
matrix of programmable logic blocks. Each logic block contained a 3-input look-up
table (LUT) that could perform any combinational function. Additionally, there
were a simple multiplexer and a flip-flop inside the logic block.
Nowadays, FPGAs are still the best solution for designers who want to verify
their prototype of the device. However, FPGAs are too slow and too expensive to
compete with ASICs in the case of mass production. On the other hand, features
of FPGAs like partial reconfiguration offer new ideas and new markets for such
devices.
2.1.2. Programmable logic devices
Programmable logic devices can be generally divided into two groups:
• Simple programmable logic devices (SPLDs), which are relatively
small programmable devices. Three types of SPLD (PROM, PLA
and PAL) are briefly described in this subsection.
• Complex programmable logic devices (CPLDs), which consist of multiple
smaller devices (like the SPLD). Because of their structure, CPLDs are
described in greater detail in the next subsection.
2.1.2.1. Programmable read-only memory (PROM)
The first programmable logic device was made as programmable read-only memory
(PROM). Generally, PROM performs the function of a fixed AND-array which
drives the programmed OR gates (array). The idea of PROM is shown in Fig. 2.1
(the device is in the unprogrammed state).
Fig. 2.1: Unprogrammed PROM device (inputs a, b, c; outputs v, x, y, z; legend: predefined link, programmable link)
Here the predefined (fixed) AND array has three inputs (variables a, b, c) and
eight outputs, one for each combination of the inputs. Thus the outputs
of the AND matrix hold all terms of the input function. The OR-matrix is programmable
and its inputs are the outputs of the AND-matrix. Programmable
links of the OR array make it possible to create any sum of logic terms; therefore any
logic function may be programmed using the AND-OR structure. An example of
a programmed PROM device is shown in Fig. 2.2. The device performs simple
combinational functions of three inputs and four outputs, where
v = a · b · c+ a · b · c,
x = a · b · c+ a · b · c, (2.1)
y = a · b · c+ a · b · c+ a · b · c,
z = a · b · c+ a · b · c+ a · b · c.
The AND-array produces the combination of all possible logic terms. The
logic functions v, x, y, z are in fact realised by the OR-array via programmable
lines.
Fig. 2.2: Programmed PROM device
Every combination of inputs is always decoded, which makes
all terms of the input function available. Therefore, PROMs are useful
for circuits that realise functions with a large number of product terms. The main
problem in PROM-based PLDs is the number of inputs, because each additional
input doubles the required memory volume.
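The behaviour described above can be sketched in Python by treating the PROM as a table with one output word per input combination. The two functions programmed below are hypothetical and are not the functions (2.1); program_prom, read_prom and prom_volume are names chosen for this sketch only.

```python
from itertools import product


def program_prom(n_inputs, functions):
    """Return the PROM contents: one output word per input combination.

    The fixed AND array decodes all 2**n_inputs combinations; programming
    only decides which outputs each decoded line drives.
    """
    return [
        tuple(int(bool(f(*bits))) for f in functions)
        for bits in product((0, 1), repeat=n_inputs)
    ]


def read_prom(contents, *bits):
    """Use the input bits (first bit is the most significant) as an address."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return contents[address]


def prom_volume(n_inputs, n_outputs):
    """Number of programmable bits: one word of n_outputs bits per address."""
    return n_outputs * 2 ** n_inputs


# Hypothetical example functions of three inputs a, b, c.
functions = [
    lambda a, b, c: a and b,  # first output
    lambda a, b, c: a ^ c,    # second output
]
prom = program_prom(3, functions)

print(len(prom))                              # number of decoded input lines
print(read_prom(prom, 1, 1, 0))               # output word for a=1, b=1, c=0
print(prom_volume(3, 4), prom_volume(4, 4))   # volume before and after adding an input
```

The last line illustrates the scaling problem: going from three to four inputs with the same four outputs doubles the volume from 32 to 64 bits.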
2.1.2.2. Programmable logic array (PLA)
In the mid-1970s, a new way of realising PLDs was invented. In programmable
logic arrays (PLAs), both matrices, AND and OR, are configurable. Most
importantly, the number of inputs does not determine the number of terms produced
by the AND-array. Therefore, a device based on PLA technology could implement
functions with a relatively higher number of inputs in comparison to PROM. An
example of an unprogrammed PLA device is illustrated in Fig. 2.3.
Fig. 2.3: Unprogrammed PLA device (inputs a, b, c; outputs v, x, y, z; legend: predefined link, programmable link)
The cost of the device can be reduced by minimizing the functions that
are to be realised (Dagless, 1983; Ciesielski and Yang, 1992; Yang and Ciesielski, 1989;
Sasao, 1988). Because both matrices are programmable, the AND-matrix
defines prime implicants while the OR-matrix sums all necessary implicants.
The functions defined in the example (2.1) can be minimized and represented as
v = a · b,
x = a · b, (2.2)
y = a · c+ b · c,
z = a · c+ a · b · c.
Only four prime implicants are needed to represent all the functions.
Therefore, the AND-matrix realises four product terms, unlike the PROM device, where
all eight lines were used. Furthermore, the OR-matrix sums the products generated
by the AND-matrix using four programmable lines (Fig. 2.4). Additionally, the
functions y and z share the product term ac, thus the size of both matrices
is reduced.
Fig. 2.4: Programmed PLA device
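A PLA can be modelled in a few lines of Python as a list of programmable product terms (the AND-matrix) and, for each output, a list of the terms it sums (the OR-matrix). The terms and outputs below are hypothetical and do not reproduce the functions (2.2); they only show how two outputs can share one product term. All names are assumptions of this sketch.

```python
def product_term(literals, inputs):
    """One AND-matrix row: true when every listed input has the required value."""
    return all(inputs[name] == value for name, value in literals.items())


def evaluate_pla(and_plane, or_plane, inputs):
    """Evaluate every output as the OR of the product terms wired to it."""
    terms = [product_term(literals, inputs) for literals in and_plane]
    return {
        output: int(any(terms[index] for index in rows))
        for output, rows in or_plane.items()
    }


# Hypothetical programming of the AND-matrix: each dict maps an input name
# to the value it must have (1 = true literal, 0 = complemented literal).
and_plane = [
    {"a": 1, "b": 1},  # term 0: a AND b
    {"a": 0, "c": 1},  # term 1: (NOT a) AND c
    {"b": 1, "c": 0},  # term 2: b AND (NOT c)
]

# Hypothetical programming of the OR-matrix: term 1 is shared by both outputs.
or_plane = {
    "y": [0, 1],
    "z": [1, 2],
}

print(evaluate_pla(and_plane, or_plane, {"a": 0, "b": 0, "c": 1}))
print(evaluate_pla(and_plane, or_plane, {"a": 1, "b": 1, "c": 1}))
```

Because term 1 is generated once and wired to both outputs, the AND-matrix needs three rows instead of four, which is the kind of saving the sharing of a product term provides.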
The PLA is useful especially for large projects where many common implicants
are present in the design. However, because of its structure, the PLA was relatively
expensive to manufacture. Furthermore, the device was slow due to propagation
delays that appeared in programmable links. Therefore, in the late 1970s, a new
type of PLD was proposed: programmable array logic.
2.1.2.3. Programmable array logic (PAL)
The structure of programmable array logic (PAL) is the opposite of the PROM
structure. There is a programmable AND-matrix and a fixed OR-plane in the PAL
device. An example of an unprogrammed PAL circuit is presented in Fig. 2.5.
In PAL, only the first level of the device is configurable. Thus, any combination of
input values in the AND-matrix may be defined by the designer. Such a combination
creates product terms at the outputs of the AND-plane. The OR-matrix is fixed,
so it can connect only a restricted number of product terms for the realisation of logic
functions. Figure 2.6 shows an example implementation of the functions (2.2). Similarly
to the PLA, the AND-matrix is programmed to generate prime implicants on
its outputs. Furthermore, the OR-matrix generates proper functions using fixed
OR-lines.
Fig. 2.5: Unprogrammed PAL device (inputs a, b, c; outputs v, x, y, z; legend: predefined link, programmable link)
PAL devices are not as flexible in configuration as PLAs. The design
has to be fitted to the structure of the device, and designers very often use dedicated
approaches to work around the restricted number of product terms that can be OR-ed
(Kania, 2004; Kania and Kulisz, 2007; Hrynkiewicz et al., 1997). The main advantage
of PAL devices in comparison to PLAs is their speed. There is only one programmable
matrix in PAL, thus the circuit is much faster in comparison with the PLA device. Another
important benefit is the price. Thanks to their low cost and high speed, PAL
devices became very popular in the late 1970s. However, the development of
prototyped systems led to the emergence of a new branch of programmable
circuits: complex programmable logic devices.
Fig. 2.6: Programmed PAL device
2.1.3. Complex programmable logic devices
The complex programmable logic device appeared at the beginning of the 1980s.
The main idea was to connect several small PLD devices to create a larger area
for programmable logic. The first CPLDs had 100% connectivity to the inputs
and outputs associated with each block (Maxfield, 2004). Therefore, the interconnection
array was huge, which made the whole device slow and expensive to
produce. The solution to this problem was the programmable interconnect array
(PIA). Nowadays, each vendor of CPLD devices has its own technology of
CPLD manufacturing, but the idea is similar: all SPLDs share a common PIA
(Kania, 2004; Łuba, 2003; Maxfield, 2004). Figure 2.7 shows the structure of a
typical CPLD device.
As has been mentioned, a typical CPLD consists of a programmable interconnect
array surrounded by macrocells (Kania, 2004; Łuba, 2003; Maxfield, 2004).
The macrocell is built using AND-OR matrices (usually small PLDs, like PAL),
programmable flip-flops, and additional logic elements like multiplexers, OR and
XOR gates. The most popular CPLDs that are currently available on the market
are devices from Altera, Xilinx, Atmel, Lattice, Lucent and Cypress.
Fig. 2.7: CPLD structure
2.1.4. Field programmable gate arrays
The first field programmable gate array appeared in the mid-1980s. Its structure
was different in comparison to the CPLD (Xilinx, 2001; Altera, 2008; Maxfield,
2004; Jenkins, 1994). The main idea of FPGAs was to use programmable logic
elements for the implementation of simple logic functions. Such elements are called
look-up tables (LUTs) and can perform any logic function with a specific number
of inputs and one output. Early FPGAs contained logic elements that could
perform any logic function of up to three inputs and one output. Additionally, each
LUT was connected with a multiplexer and a register, which created a simplified
programmable logic block. The idea of the early programmable logic block is
illustrated in Fig. 2.8. As has been mentioned, the LUT could be configured to
perform any combinational function with three inputs and one output. Sequential
functions could also be realised thanks to the register connected with the LUT.
Fig. 2.8: Early CLB structure (labels: 3-input LUT, mux, flip-flop; signals a, b, c, d, clock, y, q)
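The early logic block can be approximated in Python as an 8-entry truth table feeding a flip-flop. This is a simplified sketch under stated assumptions: the multiplexer of Fig. 2.8 is omitted, the majority function is an arbitrary choice of configuration, and the class and method names are invented for illustration.

```python
from itertools import product


class LogicBlock:
    """Simplified model of a 3-input LUT followed by a flip-flop."""

    def __init__(self, truth_table):
        if len(truth_table) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 entries")
        self.truth_table = list(truth_table)
        self.q = 0  # registered output, assumed cleared at start-up

    def lut(self, a, b, c):
        """Combinational output: the inputs form the address of the table."""
        return self.truth_table[(a << 2) | (b << 1) | c]

    def clock(self, a, b, c):
        """Model one active clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c)
        return self.q


# Configure the LUT as a majority function of a, b, c (arbitrary example).
majority_table = [int(a + b + c >= 2) for a, b, c in product((0, 1), repeat=3)]

block = LogicBlock(majority_table)
y = block.lut(1, 0, 1)      # combinational result, the register is unchanged
q = block.clock(1, 1, 0)    # sequential result after one clock edge
print(majority_table, y, q)
```

Reconfiguring the block only means loading a different 8-entry table, which is why a single LUT can realise any combinational function of its three inputs.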
To perform functions that require more than one output (or more than three
inputs), more logic blocks were used. Thanks to the structure of the FPGA,
logic blocks are connected via an interconnect matrix. Generally, the FPGA was
created as a "... large number of logic blocks (islands), surrounded by a sea of
programmable interconnections ..." (Maxfield, 2004).
Nowadays, there are many vendors of FPGA devices. The most popular are
Xilinx, Altera, Lattice, Actel and Atmel. The structure of the FPGA depends on
the vendor, although the idea is the same – the usage of the array of logic blocks
based on the LUT and the register. Vendors use different names for logic blocks,
interconnections and other elements of the FPGA because each of them brands its
own inventions in the field. In this dissertation, the structure of the
FPGA will be described based on Xilinx devices. Therefore, all references and
information will concern Xilinx FPGAs.
A typical structure of a Xilinx FPGA is shown in Fig. 2.9. The main
elements of the device include: a matrix of configurable logic blocks (CLBs),
configurable input/output blocks (IOBs) and dedicated memory blocks (BRAMs).
All elements are connected via a programmable interconnect matrix.
Blo
ck

 
R
AM
 CLBs
CLBs
Blo
ck

 
R
AM

CLBs
Block 

R
AM
CLBs
Block 

R
AM
I/O Logic
Fig. 2.9: Structure of a typical FPGA device
The configurable logic block (CLB) consists of four logic cells (LCs). Two logic
cells are organized into a slice. Thus each CLB has two similar slices. Figure 2.10
illustrates the idea of the CLB structure.
Fig. 2.10: Structure of the CLB block (two slices, each with two LEs and a mux; one additional mux)
The main elements of the logic cell include: a four-input and one-output LUT,
a multiplexer and a flip-flop. Each family also has additional logic (like carry logic
for arithmetic operations); however, it is not important for our further discussion.
A simplified structure of the logic element (LE) is presented in Fig. 2.11. The
register can be configured either as a flip-flop or a latch. Importantly,
the clock polarity is selectable, so each LE can be configured to trigger
on either the rising or the falling edge of the clock.
The look-up table has four inputs and one output. Therefore it can realise
any logic function that has at most four input variables. Larger functions ought
to be decomposed and more LUTs (or even slices) have to be used. Additionally,
the look-up table can be configured as 16x1 RAM (random-access memory) or a
16-bit shift register.
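The behaviour of a four-input LUT can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the names make_lut4, lut4_read and parity are hypothetical, and the bit ordering (input a as the most significant address bit) is an assumption made for the example.

```python
def make_lut4(func):
    """Return the 16-bit configuration word of a 4-input LUT for `func`."""
    config = 0
    for index in range(16):
        # Assumed ordering: a is the most significant address bit, d the least.
        a, b, c, d = (index >> 3) & 1, (index >> 2) & 1, (index >> 1) & 1, index & 1
        if func(a, b, c, d):
            config |= 1 << index
    return config


def lut4_read(config, a, b, c, d):
    """Evaluate the LUT: the four inputs address one bit of the configuration."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (config >> index) & 1


def parity(a, b, c, d):
    """Example function: XOR of the four inputs."""
    return a ^ b ^ c ^ d


PARITY_CONFIG = make_lut4(parity)
```

The sixteen stored bits play the same role as the 16x1 RAM mentioned above: any function of four variables is just a different configuration word.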
The main role of the configurable input/output block (IOB) is to ensure the
connectivity between the FPGA and other elements of prototyped systems. It is
therefore essential that IOBs support a wide variety of power supply standards (Maxfield, 2004).
IOBs are organized into banks (the number of banks depends on the FPGA). Each
bank may be configured independently so the designer can connect other devices
to the FPGA, with different input/output standards.
Dedicated memory blocks are very important components of the FPGA. Nowa-
days, a lot of designs contain memory that can be implemented with dedicated
memory blocks (Bursky, 1999). Each vendor has its own concept of such blocks.
In the case of Xilinx devices, there are block-RAMs (BRAMs). Such elements
are synchronous, therefore the clock signal ought to be delivered. The number of
BRAMs and the number of logic cells per each block depend on the particular de-
vice. BRAMs are organized into columns and each BRAM can be used separately.
Fig. 2.11: Structure of the logic element (LE) (blocks: 4-input LUT with 16x1 RAM and 16-bit SR modes, mux, flip-flop; signals: a, b, c, d, e, clock, clock enable, set/reset, y, q)
The main advantage of BRAMs is their size and reconfigurability. Depending on
the device, up to three megabits can be stored in the memory (the
XC2V8000 device from the Virtex-II family). Additionally, BRAMs may be very
easily reconfigured by the process of partial reconfiguration. The content of one
(or more) BRAM is replaced while the rest of the device remains unchanged. Such
a solution reduces the size of the target bit-stream used to configure the FPGA.
Moreover, the design and configuration time is highly reduced as well. Partial
reconfiguration of control units implemented on FPGAs is described in Chapter 6
in greater detail.
2.2. Control units
A digital system may be represented by the composition of the control unit (CU)
and the operational unit (OU), also known as a data-path (Gajski, 1996; De Micheli,
1994; Barkalov and Węgrzyn, 2006b). The idea of such a defined digital system is
illustrated in Fig. 2.12.
Based on the set of input values (I ) and the set of logic conditions (X ), the
CU sends proper microoperations (Y ) to the OU. Additionally, a set of output
values (O) is generated. The set of inputs (I) and the set of outputs (O) are
used for communication with the environment of the digital system (Łuba, 2001;
Molski, 1986).
The operational unit executes microoperations (Y) by processing proper input
(Data) and generating results (Results). Additionally, the OU generates logic
conditions (X) for the control unit.
Fig. 2.12: Model of a digital system (blocks: CU, OU; signals: I, O, Data, Function, Results, Y, X)
2.2.1. Single-level control units (finite state machines)
The most popular realisation of control units nowadays is a finite state machine
(FSM), also known as the finite state automaton (Łuba, 2003; Baranov, 1994;
Barkalov and Węgrzyn, 2006b; Traczyk, 1982; Adamski et al., 2007; Adamski and
Węgrzyn, 2003; Adamski, 1980; Wiśniewska et al., 2007; Bukowiec et al., 2008;
Bukowiec, 2006). The FSM is a model of behavior that consists of the set of states,
the set of transitions between the states, and the set of actions (microoperations).
Formally, the FSM can be described as a 6-tuple vector:
M = 〈S, X, Y, f, h, s0〉, (2.3)
where
• S = {s0, s1, . . . , sK} is a non-empty finite set of states;
• X = {x0, x1, . . . , xL} is a finite set of inputs;
• Y = {y0, y1, . . . , yN} is a finite set of outputs;
• f : S × X → S is the transition function, which determines the next state
ss ∈ S depending on the current state sm ∈ S and on the value of the input
xl ∈ X;
• h : S ×X → Y is the output function, which determines the current output
yn ∈ Y , based on the current state (in the case of a Moore automaton) or
depending on the current state and the current input (in the case of a Mealy
automaton);
• s0 ∈ S is the initial state of an automaton.
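The 6-tuple above can be illustrated with a small Python sketch. The example machine (a detector of a 0-to-1 change on a single input) and all names (run_moore, run_mealy, MOORE_F, MEALY_F, and so on) are hypothetical and chosen only for illustration; the transition function f and the output function h are stored as dictionaries.

```python
def run_moore(f, h, s0, inputs):
    """Moore automaton: the output depends on the current state only."""
    state = s0
    trace = [h[state]]
    for x in inputs:
        state = f[(state, x)]
        trace.append(h[state])
    return trace


def run_mealy(f, h, s0, inputs):
    """Mealy automaton: the output depends on the current state and input."""
    state = s0
    trace = []
    for x in inputs:
        trace.append(h[(state, x)])
        state = f[(state, x)]
    return trace


# Mealy version: two states (s0: last input was 0, s1: last input was 1).
MEALY_F = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s0", ("s1", 1): "s1"}
MEALY_H = {("s0", 0): 0, ("s0", 1): 1, ("s1", 0): 0, ("s1", 1): 0}

# Moore version: an extra state s1 marks "edge just detected".
MOORE_F = {("s0", 0): "s0", ("s0", 1): "s1",
           ("s1", 0): "s0", ("s1", 1): "s2",
           ("s2", 0): "s0", ("s2", 1): "s2"}
MOORE_H = {"s0": 0, "s1": 1, "s2": 0}
```

In this sketch the Moore variant needs three states where the Mealy variant needs two, and its output sequence is the Mealy sequence preceded by the output of the initial state.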
Figure 2.13 shows a typical realization of the finite state machine (Baranov,
1994; Barkalov and Węgrzyn, 2006b). There are two main units in the FSM. The
combinational circuit CC generates proper output values (microinstruction) and
indicates excitation functions for the register RG, which is in charge of holding
the internal state sm ∈ S of an FSM.
Fig. 2.13: Model of a finite state machine (blocks: CC, RG; signals: X, Y, D, Q)
The FSM can be realised as a Moore or a Mealy automaton. If the control
unit is described as a Moore FSM, then outputs depend on the current state of
the automaton (Moore, 1956). A microinstruction is represented as
Yt = f(sm), (2.4)
where Yt means the value of the output and sm ∈ S represents the current state
of the FSM. The main advantage of such a realisation is the simplification of the
behaviour of the control unit. States are clearly tied with the action generat-
ing proper microinstruction while inputs (conditions) influence only transitions
between states.
The second way of implementing the FSM is a Mealy automaton (Mealy,
1955). The value of outputs depends not only on the current state, but also on
input signals:
Yt = f(sm, X), (2.5)
where X is a set of input values (conditions).
The main benefit offered by a Mealy FSM is the reduction of the number
of internal states of the automaton in comparison with a Moore FSM. Both the
Moore and Mealy automata are classified as single-level control units.
The optimization of the FSM is one of the most popular tasks nowadays.
There are many ideas focused on improving the prototyping process and en-
coding internal states of the automaton (Sentovich et al., 1992a; Hrynkiewicz
et al., 1997; Ashar et al., 1990; Ashar et al., 1992; Kubátová, 2005; Perkowski
et al., 2001; Rawski et al., 2003; Barkalov, 1998; Ahmad et al., 2000; Łabiak,
2005a; Łabiak, 2005b; Bubacz, 2008). This research has resulted in the
appearance of computer-aided design systems, like Sequential Circuit Synthesis, SIS
(Sentovich et al., 1992b). It contains algorithms for state assignment (NOVA,
JEDI ) and state minimization (STAMINA).
The next subsection deals with microprogram control units where outputs of
the controller are organized in microinstructions.
2.2.2. Microprogram control units
The idea of microprogramming was introduced by M. V. Wilkes in 1951 as an
intermediate level to execute computer program instructions (Wilkes, 1951; Mol-
ski, 1986; Traczyk, 1982; Husson, 1970; Kravcov and Chernicki, 1976; Misiurewicz,
1982; Papachristou, 1979; Stallings, 1996). Microprograms were organized as a
sequence of microinstructions and stored in special control memory (CM). The
algorithm for the MCU is usually specified by flow-chart (FC) description (Łuba,
2005; Barkalov and Węgrzyn, 2006b; Kołopieńczyk, 2008). Such a flow-chart al-
gorithm consists of four main types of vertices (start, end, the operational vertex,
the conditional vertex) that are interpreted by the control unit. More information
about the flow-chart algorithm can be found in Chapter 3.
RAMID
X
Q
Y
CMASQ
Fig. 2.14: Model of a microprogram control unit (MCU)
A typical structure of the MCU is presented in Fig. 2.14. There are three main
blocks that form the MCU: a sequencer SQ, a register of the microinstruction
address RAMI and control memory CM (Łuba, 2005; Barkalov and Węgrzyn,
2006b). The sequencer is a combinational circuit that forms the excitation function
for RAMI:
D = f(X,T ). (2.6)
Here X is a set of logic conditions of the system and T is a feedback function
generated by control memory. Based on this function, RAMI generates the proper
microinstruction address A. Control memory CM holds a microprogram that is
further executed by the operational part of the system. There are different methods
of microinstruction addressing (Barkalov and Palagin, 1997; Łuba, 2003); however,
in most cases CM also keeps the addresses of the next microinstructions that should
be executed. The feedback function T is analysed by the sequencer, which, based
on the condition from the set X, selects the proper address A.
There are many ways to design the MCU (Łuba, 2005; Adamski and Barkalov,
2006). One of the most popular is to implement the sequencer as a multiplexer
and RAMI as a counter (Łuba, 2005). The structure of the MCU may then be
interpreted as the system shown in Fig. 2.15.
In the MCU presented in Fig. 2.15, CM generates two feedback functions
– T for the counter and C for the multiplexer. Such a realisation is especially
fruitful in the case of long segments (chains) of microinstructions (Łuba, 2001).
Then the chain of microinstructions that are not separated by the condition may
be replaced with one state (block). Chapter 3 deals with the usage of chains of
microinstructions in greater detail.
Fig. 2.15: Model of a microprogram control unit with a counter
The main advantage of the microprogram control unit is the simplicity of
its structure. Outputs of the controller are organized in microinstructions and
they can be easily replaced. Additionally, control memory may be implemented
using dedicated memory blocks of the FPGA, which reduces the number of logic
elements used.
Apart from its benefits, the MCU has some disadvantages. Control memory
holds not only microoperations but also information for the calculation of the ad-
dress of the next microinstruction. Very often the size of control memory exceeds
the size of the dedicated memory block of an FPGA. To eliminate these disad-
vantages, the control unit may be decomposed and designed as a compositional
microprogram control unit (CMCU).
2.3. Conclusions
Two main topics were discussed in this chapter. The basic knowledge of pro-
grammable devices and control units is essential for understanding further analysis
and synthesis methods. The next chapter presents the structure of compositional
microprogram control units in detail.
Chapter 3
COMPOSITIONAL MICROPROGRAM
CONTROL UNITS
Any flow-chart of an algorithm can be interpreted as the compositional micropro-
gram control unit (Barkalov, 2002). In the CMCU, the control unit is decomposed
into two main parts. The first one is responsible for addressing microinstructions
that are kept in control memory. The role of the second part is to hold and
generate adequate microinstructions.
The control unit may be decomposed in two ways. The first one is functional
decomposition. Here the controller is decomposed based on its internal functions
and states. The second method is structural decomposition, where the task of
decomposition is reached thanks to the modification of the structure of the control
unit. Both ideas of the decomposition of the control unit lead to the compositional
microprogram control unit.
3.1. Functional decomposition of control units
Functional decomposition is a process that splits a complex function into smaller
sub-functions (Kania et al., 2005; Łuba, 2005; Devadas et al., 1988; Sasao, 1999;
McCluskey, 1986; Rawski et al., 2005; Scholl, 2001). Such a realisation is often used
as a part of logic synthesis of designs implemented with programmable devices.
Functional decomposition has been developed extensively, especially by academic organizations
(Sentovich et al., 1992b; Kania and Kulisz, 2007; Kania, 2007; Łuba et al., 2002;
Rawski et al., 2001; Tkacz, 2006). This dissertation focuses on the decomposition of
control units implemented on the FPGA. The optimization of SPLDs and CPLDs
can be found in (Kania, 2004; Kania, 1999; Ciesielski and Yang, 1992; Devadas
et al., 1988; Devadas et al., 1989; Muthukumar et al., 2007; Sasao, 1999; Sasao,
1999; Sentovich, 1993; Solovjev, 1996; Barkalov et al., 2007a).
In the FPGA, a limited number of inputs and only one output of LUT elements
make functional decomposition very effective (Scholl, 2001; Łuba, 2005; Rawski
et al., 2003; Łach et al., 2003; Pasierbiński and Zbysiński, 2001). The idea of func-
tional decomposition is widely used by both commercial (Xilinx, Altera, Synplicity,
etc.) and non-commercial (universities) organisations. It should be pointed out
that the best results are achieved by non-commercial projects such as DEMAIN
(Warsaw University of Technology) or SIS (University of California, Berkeley).
Functional decomposition may be realised as serial decomposition or parallel
decomposition. In the first one, the set X of input variables is split into two subsets
U and V (Łuba, 2003).
Fig. 3.1: Idea of serial functional decomposition (blocks: G, H; signals: U, V, Z, Y = F(X))
The set V forms inputs for the function G, which generates the set of outputs
Z = G(V). Of course, the method makes sense only if the number of outputs of the
function G is smaller than the number of its inputs. Furthermore, the outputs Z
generated by G and the set U form inputs for the function H (Fig. 3.1). Finally, the
function F is represented as follows:
F = H(U, G(V)). (3.1)
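A minimal sketch of serial decomposition, assuming a five-input parity function chosen purely as an example: F does not fit into a single four-input LUT, but splitting X into V = {x1, x2, x3, x4} and U = {x5} gives two blocks that each fit. The function names F, G, H and decomposition_holds are illustrative only.

```python
from itertools import product


def F(x1, x2, x3, x4, x5):
    """Original five-input function (parity), too wide for one 4-input LUT."""
    return x1 ^ x2 ^ x3 ^ x4 ^ x5


def G(v):
    """First block: four inputs (the set V), one output Z."""
    x1, x2, x3, x4 = v
    return x1 ^ x2 ^ x3 ^ x4


def H(u, z):
    """Second block: the set U plus the output Z of G."""
    (x5,) = u
    return x5 ^ z


def decomposition_holds():
    """Check F = H(U, G(V)) for every combination of input values."""
    return all(
        F(*x) == H(x[4:], G(x[:4]))
        for x in product((0, 1), repeat=5)
    )
```

Here G has four inputs and one output, so the condition that G has fewer outputs than inputs is satisfied.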
The aim of parallel decomposition is to decompose the initial function F into
two separate sub-functions G and H (Łuba, 2005). The main idea is to split the
set of outputs Y into two subsets YG and YH . Here Y is the set of outputs of
the function F . Similarly, YG is the set of outputs of the function G and YH –
the set of outputs of H. The method makes sense if either of the functions G
or H has fewer input variables than the initial function F . The idea of parallel
decomposition is illustrated in Fig. 3.2.
Serial and parallel decompositions are very often combined. Balanced de-
composition unites both methods (Łuba, 2005; Rawski et al., 2005c; Selvaraj and
Luba, 1995). The whole process may be divided into steps. In each step, either
serial or parallel decomposition is performed. The process is repeated until a
satisfactory result is reached (Łuba, 2005).
Fig. 3.2: Idea of parallel functional decomposition (F with inputs X and outputs Y is replaced by G with inputs XG and outputs YG, and H with inputs XH and outputs YH)
The presented methods of decomposition are fruitful for combinational blocks
of a system. However, they can also be used in the decomposition of control
units (Łuba, 2005; Rawski et al., 2003; Borowik, 2004; Borowik, 2005). The con-
troller may be realised as the sequential circuit shown in Fig. 3.3. The main idea
is to use control memory to hold microinstructions. Such memory is implemented
with dedicated memory blocks of the FPGA, which reduces the usage of logic
resources of the device.
Each microinstruction of the control unit presented in Fig. 3.3 consists of
two fields. The first one holds the code Q of internal states of the automaton,
while the second one contains output variables from the set Y . The next state of
the controller is determined by the input variables X and by the current state of
the control unit. The width of the address of control memory may be calculated
as |A|=n+p, where n means the number of input variables and p represents the
number of bits that are used for encoding internal states of the controller (Łuba,
2005). The volume of control memory depends on the width of its address. Each
additional bit doubles the memory volume. Thus, very often such a volume exceeds
the volume of dedicated memory blocks of the FPGA. A solution to this problem
may be functional decomposition of the control unit (Fig. 3.4).
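The growth of control memory with the address width can be checked with a short calculation. This is a sketch under stated assumptions: the word width is taken as p + m bits (the state code Q plus m output variables, where m is an assumed symbol for the number of outputs), and the function name control_memory_bits is hypothetical.

```python
def control_memory_bits(n, p, m):
    """Volume of control memory in bits.

    n: number of input variables X
    p: number of bits encoding the internal state Q
    m: number of output variables Y (assumed symbol)
    """
    address_width = n + p   # |A| = n + p
    word_width = p + m      # two fields: state code and outputs
    return (2 ** address_width) * word_width
```

For example, with n = 8, p = 4 and m = 16 the memory needs 2^12 words of 20 bits, that is 81920 bits, and adding a single input variable doubles this figure.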
Fig. 3.3: Control unit realised as a sequential circuit
In the system shown in Fig. 3.4, control memory is decomposed into two parts:
the block of address modification (CAM) and smaller memory that may be realised
using the dedicated memory block of the FPGA. There are many variants of such
decomposition (Rawski et al., 2003; Borowik, 2004; Borowik, 2005). The main aim
of all methods is to decrease the size of control memory using a minimum number
of logic blocks of the FPGA.
The main benefit of functional decomposition of the control unit is very high
effectiveness. Memory may be decomposed in such a way that dedicated memory
blocks of the FPGA are used to the maximum. In other words, a minimum number
of logic blocks are used to realise the circuit of address modification. On the other
hand, only a part of a microinstruction is held in memory after decomposition.
Therefore, there is no possibility to apply the idea of partial reconfiguration, which
can significantly accelerate the prototyping process of the control unit (Chapter 6).
3.2. Structural decomposition of control units
This section deals with structural decomposition of control units and it is an
introduction to the main part of the dissertation. Thus, the idea of CMCUs created
as structural decomposition of the controller will be described in detail. The
section presents the main definitions and the basic structure of the CMCU, which
was the inspiration and motivation for the PhD thesis. The following sections
show the author’s methods of CMCU synthesis; however, all structures are based
on the ideas shown below.
Fig. 3.4: Functional decomposition of the control unit (blocks: UMA, Control Memory, Register; signals: X of width n, Q of width p, Y of width m, and w, where w < n + p)
3.2.1. Main definitions
This section introduces some definitions that will be needed later in order to de-
scribe the CMCU more formally.
Let the control algorithm (De Micheli, 1994) of a digital system (Adamski and
Barkalov, 2006; Barkalov and Węgrzyn, 2006b; Gajski, 1996) be represented as the
flow-chart Γ (Baranov, 1994) with a set of operational vertices B = {b1, . . . , bK}
and a set of edges E. Each vertex bk ∈ B contains the microoperations Y (bk) ⊆ Y ,
where Y = {y1, . . . , yN} is the set of microoperations. Each conditional ver-
tex of the flow-chart contains one element from the set of logic conditions X =
{x1, . . . , xL}.
Definition 3.1. The operational linear chain (OLC) of the flow-chart Γ is a
finite sequence of the operational vertices αg = 〈bg1, . . . , bgFg〉 such that for any
pair of adjacent components of the vector αg there is an edge 〈bgi, bgi+1〉 ∈ E,
where i is the number of the component in the vector αg (i = 1, . . . , Fg − 1).
Definition 3.2. The vertex bq ∈ B is called an input of the OLC αg if there
is an edge 〈bt, bq〉 ∈ E, where bt is either an initial or a conditional vertex of the
flow-chart Γ or an operational vertex that does not belong to the OLC αg.
Definition 3.3. The vertex bq ∈ B is named an output of the OLC αg if there is
an edge 〈bq, bt〉 ∈ E, where bt is either a conditional or a final vertex of the flow-chart
Γ or an operational vertex that does not belong to the OLC αg.
Definition 3.4. The parameter M1 is equal to the number of vertices in the
longest OLC αg of the flow-chart Γ.
Definition 3.5. The minimum number of bits required to encode the variable
M1 is represented as R1 and is equal to R1 = ⌈log2 M1⌉.
Definition 3.6. The parameter M2 is equal to the number of all operational
chains present in the flow-chart Γ.
Definition 3.7. The minimum number of bits required to encode the variable
M2 is represented as R2 and is equal to R2 = ⌈log2 M2⌉.
Definition 3.8. The parameter M3 is equal to the number of all operational
vertices in the flow-chart Γ. This parameter also indicates the total number of
microinstructions of the CMCU.
Definition 3.9. The minimum number of bits required to encode the variable
M3 is represented as R3 and is equal to R3 = ⌈log2 M3⌉.
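Definitions 3.1 and 3.4 to 3.9 can be illustrated with a short Python sketch. The chains and edges below are a made-up example, and the helper names (is_olc, bits_to_encode, cmcu_parameters) are hypothetical. The sketch assumes that the given chains already partition the set of operational vertices, so that M3 equals the sum of the chain lengths.

```python
def is_olc(chain, edges):
    """Definition 3.1: every pair of adjacent vertices is joined by an edge."""
    return len(chain) > 0 and all(
        (chain[i], chain[i + 1]) in edges for i in range(len(chain) - 1)
    )


def bits_to_encode(count):
    """ceil(log2(count)) for count >= 1, computed with integers only."""
    return (count - 1).bit_length()


def cmcu_parameters(chains):
    """Return M1..M3 and R1..R3 for a list of operational linear chains."""
    m1 = max(len(chain) for chain in chains)   # longest OLC
    m2 = len(chains)                           # number of OLCs
    m3 = sum(len(chain) for chain in chains)   # all operational vertices
    return {
        "M1": m1, "R1": bits_to_encode(m1),
        "M2": m2, "R2": bits_to_encode(m2),
        "M3": m3, "R3": bits_to_encode(m3),
    }


# Hypothetical flow-chart fragment: only edges between operational vertices.
EDGES = {("b1", "b2"), ("b2", "b3"), ("b5", "b6")}
CHAINS = [["b1", "b2", "b3"], ["b4"], ["b5", "b6"]]
PARAMS = cmcu_parameters(CHAINS)
```

For this example the longest chain has three vertices, there are three chains and six operational vertices in total.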
3.2.2. CMCU with a base structure
Let Dg be a set of operational vertices that are included in the chain αg. Then let
C = {α1, . . . , αG} be a set of OLCs of the flow-chart Γ satisfying the condition
Dg ∩ Dq = ∅ (g ≠ q; g, q ∈ {1, ..., G});
B = D1 ∪ D2 ∪ ... ∪ DG;
Dg ≠ ∅ (g = 1, ..., G). (3.2)
Let natural addressing of microinstructions be executed for each αg:
$A(b_{g_{i+1}}) = A(b_{g_i}) + 1 \quad (i = 1, \ldots, F_g - 1)$, (3.3)
where A(bg) is an address of the microinstruction corresponding to the vertex
bg ∈ B. In this case, the flow-chart Γ can be interpreted as a CMCU with a base
structure, denoted hereafter as the CMCU UBS (Fig. 3.5).
There are four main modules in the CMCU UBS : the combinational circuit
CC, the register RG, the counter CT, and control memory CM. The combinational
circuit and the register represent a simplified FSM of microinstructions addressing
S1. Furthermore, the counter and control memory form the microprogram control
unit S2. RG keeps a code K(sm) of the current state sm ∈ S of the CMCU, where
S = {s1, . . . , sM} is a set of internal states. The register has R2 = ⌈log2 M2⌉ flip-flops and
their outputs qr ∈ Q are used to encode the states sm ∈ S; here Q = {q1, . . . , qR2},
|Q| = R2. The transition from the state sm ∈ S to the state ss ∈ S is executed by
switching the register from the code K(sm) to the code K(ss). Such a switching
is determined by the excitation function Qr ∈ Q. CT keeps the address A(bt)
Fig. 3.5: Compositional microprogram control unit with a base structure (blocks: CC, RG, CT, CM; signals: X, D, T, Q, A, Y, y0)
of the microinstruction Y (bt) that is executed by a data-path (Barkalov, 2002).
The variables ar ∈ A are used for the representation of the addresses A(bk).
Microinstructions are kept in CM having 2^R1 words. Each word (microinstruction)
has N+2 bits in the case of one-hot encoding of microoperations (Barkalov et al.,
2005c; Barkalov and Wiśniewski, 2004a). One of the additional bits is used to
keep a variable y0 to organize the mode of addressing (3.3). The second bit keeps
a variable yK to terminate the fetching of microinstructions from control memory
(to keep the figures clear, yK is not marked in the CMCU structures presented in the
dissertation).
The presented CMCU UBS operates in the following manner: at the begin-
ning, the register is set to the value that corresponds to the initial state of the
FSM. Similarly, the counter is set to the address of the first microinstruction. If
transitions are executed inside the OLC αg ∈ C, then y0 = 0, which causes the
increment of CT and forbids changing the state of the CMCU. When the output
of the OLC αg ∈ C is reached, then y0 = 1 and the combinational circuit forms
the excitation function for the register setting it into the proper state (Barkalov
and Wiśniewski, 2004e; Barkalov et al., 2004c; Barkalov and Wiśniewski, 2004b).
Similarly, the counter is set with the proper value as well:
D = f(Q,X), (3.4)
T = f(Q,X). (3.5)
Here X means the set of conditions, Q is the set of internal variables used to encode
the current state of the CMCU, D is a set of variables that form an excitation
function for the register, D = {d1, . . . , dR2}, and T is a set of variables that form an
excitation function for the counter, T = {t1, . . . , tR1}.
The functions (3.4) and (3.5) form a code K(sm) of the state of the transition
in the register and an address of the input of the next OLC αg ∈ C. If CT
contains the address of the microinstruction Y (bk) such that 〈bk, bE〉 ∈ E, then
yK = 1. In this case, the operation of the CMCU UBS is finished (Barkalov and
Wiśniewski, 2004g; Barkalov and Wiśniewski, 2004d; Barkalov et al., 2005a).
The main benefit coming from the realisation of the controller as the com-
positional microprogram control unit UBS is the possibility of implementing the
circuit CM using dedicated memory blocks (Wiśniewski, 2005a). Other blocks
of the prototyping system UBS are implemented with logic blocks (flip-flops and
LUT elements) of the FPGA (Łuba, 2003; Xilinx, 2005; Altera, 2008; Barkalov
et al., 2008). Such an idea leads to the reduction of the number of logic blocks
in comparison with the realisation of the controller as a traditional finite state
machine, and thus the designer can allocate a wider area of the FPGA for other
blocks of the prototyping system. The effectiveness of the CMCU is especially
high if the controller interprets the linear flow-chart. Such a flow-chart contains
75% of operational vertices or includes long linear chains (segments) of operational
vertices.
The second advantage of the CMCU is the possibility of selecting the imple-
mentation method of control memory. The designer can decide if the circuit CM
should be realised with logic blocks or with dedicated memory blocks. This is
especially important in the case of designs that consume a large amount of memory.
The whole CMCU is then implemented with logic blocks of the FPGA.
In contrast to functional decomposition, structural decomposition of a control
unit permits the application of partial reconfiguration (Wiśniewski, 2005b;
Barkalov et al., 2006a; Mesquita et al., 2003; Doligalski and Węgrzyn, 2007). In
this case, only a part of the controller (control memory) can be replaced while
the rest of the system remains untouched. Partial reconfiguration of control units
implemented in the FPGA is described in detail in Chapter 6.
3.3. Conclusions
There were two ideas of the decomposition of control units presented in this chap-
ter. The aim of both methods is to decompose the controller into two main mod-
ules. The first is in charge of addressing microinstructions that are held in control
memory. Functional decomposition of the CU is based on realising the addressing
circuit through the decomposition of Boolean functions. In structural decomposition,
additional internal blocks are added to the structure of the control unit.
The idea of structural decomposition of the control unit presented in this chap-
ter was the basis for the development of new synthesis methods of the CMCU.
In (Barkalov, 2002), Barkalov introduced two significant new structures of control
units: the CMCU with mutual memory and the CMCU with sharing codes.
The aim of both methods was to reduce the number of logic blocks required
for the implementation of the controller. Experiments showed that implementing
either the CMCU with mutual memory or the CMCU with sharing
codes instead of the traditional CMCU with a base structure resulted in lower
area usage of programmable devices (Barkalov and Węgrzyn, 2006b; Adamski and
Barkalov, 2006; Barkalov, 2002). However, thanks to the modification in the struc-
ture, there is still a possibility to reduce the number of logic blocks required for the
implementation of the controller. Thus, both methods – the CMCU with mutual
memory and the CMCU with sharing codes – were an inspiration for research into
new ways of designing control units. Six new structures and synthesis methods of
the CMCU are shown in the dissertation. All presented methods are divided into
two groups. The first group is based on the CMCU with mutual memory and it
is described in detail in Chapter 4. The second group is based on the CMCU with
sharing codes and it is shown in Chapter 5.
Chapter 4
COMPOSITIONAL MICROPROGRAM CONTROL
UNITS WITH MUTUAL MEMORY
This chapter deals with the compositional microprogram control unit with mutual
memory. The main idea is to recognize each operational linear chain by an address
generated by the counter. Now the code produced by the counter indicates the
current state of the control unit. Therefore, the usage of the register in the CMCU
with mutual memory is unnecessary.
The first section describes the traditional synthesis method of the CMCU
with mutual memory that was initially proposed in (Barkalov and Palagin, 1997).
The next sections show three new synthesis methods of the CMCU with mutual
memory. The main idea was to reduce the number of logic blocks (especially LUT
elements) that are required for the implementation of the system in the FPGA.
4.1. CMCU with mutual memory
The structure of the CMCU UMM with mutual memory is presented in Fig. 4.1.
There are three main blocks in the CMCU UMM : the combinational circuit CC,
the counter CT, and control memory CM (Barkalov and Wiśniewski, 2005d).
Fig. 4.1: Structure of a CMCU with mutual memory (blocks: CC, CT, CM; signals: X, T, A, Y, y0)
Unlike in the CMCU UBS with a base structure, in the CMCU UMM , the
combinational circuit generates the excitation function only for the counter:
T = f(X,A), (4.1)
where X means the set of logic conditions and A means the code that is
determined by the counter. Such a code is also the address of the microinstruction
that is kept in control memory. The number of logic functions is decreased in
comparison with the CMCU UBS , because the circuit CC does not generate the
excitation function for the register. Thus the number of logic blocks of the target
programmable device is reduced (Adamski and Barkalov, 2006; Wiśniewski and
Barkalov, 2007; Barkalov et al., 2006e; Wiśniewski et al., 2007).
4.1.1. Main idea of the method
In the CMCU UMM , transitions between internal states of the controller are per-
formed in a different way than it is done in the CMCU with a base structure. Here
the address generated by the counter is used to recognize the proper state of the
control unit.
The controller operates as follows: at the beginning, the counter is set to the
value that corresponds to the initial state of the FSM which is equal to the address
of the first microinstruction. If transitions are executed inside αg ∈ C, then y0 = 0.
This causes the incrementation of CT and forbids changing the current state of
the control unit. When the output of αg ∈ C is reached, y0 = 1 and the circuit
CC forms the excitation function for the counter (4.1). This function forms the
code K(ss) of the state of transition and the address of the input of the next OLC
αg ∈ C as well. If the controller reaches an address of the microinstruction Y (bk)
such that 〈bk, bE〉 ∈ E, then yK = 1. In this case, the operation of the CMCU
UMM is finished.
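The operation described above can be sketched as a small Python simulation. It is an illustrative model only: the control memory contents, the transition list and the names run_cmcu_umm and next_address are hypothetical, and the logic conditions are assumed to stay constant during a run, whereas in a real system they are produced by the operational unit.

```python
def next_address(transitions, address, conditions):
    """Model of CC: pick the target address T = f(X, A) for an OLC output."""
    for source, required, target in transitions:
        if source == address and all(
            conditions[name] == value for name, value in required.items()
        ):
            return target
    raise KeyError("no transition defined for address %d" % address)


def run_cmcu_umm(control_memory, transitions, conditions, start=0, max_steps=1000):
    """Return the list of (address, microoperations) fetched by the CMCU."""
    address = start
    trace = []
    for _ in range(max_steps):
        microops, y0, yK = control_memory[address]
        trace.append((address, microops))
        if yK:                 # final microinstruction: operation is finished
            return trace
        if y0 == 0:            # inside an OLC: the counter is incremented
            address += 1
        else:                  # output of an OLC: CC loads the counter
            address = next_address(transitions, address, conditions)
    raise RuntimeError("no final microinstruction reached")


# Hypothetical control memory: (microoperations, y0, yK) per address.
CONTROL_MEMORY = [
    (("y1",), 0, 0),         # address 0, first vertex of chain 1
    (("y2",), 1, 0),         # address 1, output of chain 1
    (("y1", "y3"), 0, 0),    # address 2, input of chain 2
    (("y4",), 0, 1),         # address 3, last microinstruction of chain 2
    (("y2", "y4"), 0, 1),    # address 4, chain 3 (single vertex)
]
# Transitions from the output at address 1, selected by condition x1.
TRANSITIONS = [(1, {"x1": 1}, 2), (1, {"x1": 0}, 4)]
```

With x1 = 1 the sketch visits addresses 0, 1, 2, 3, and with x1 = 0 it visits 0, 1, 4. Note that no separate state register appears in the model, since the address itself identifies the state.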
4.1.2. Synthesis of the CMCU with mutual memory
The method of the synthesis of the CMCU with mutual memory includes the
following steps:
1. Formation of the set of OLCs. At the beginning, the set of opera-
tional linear chains is created. For each OLC, all outputs and inputs are
determined. This step is executed according to the definitions 3.2 and 3.3.
2. Formation of the content of control memory. In order to perform the
formation of the content of control memory, microinstructions and their ad-
dresses ought to be encoded. In the case of the CMCU UMM , natural binary
codes will be used. Then control memory is formed. Each microinstruc-
tion consists of all (N) microoperations that belong to the initial flow-chart
and two additional bits: y0 and yK (the value 1 means that microoperation
belongs to the microinstruction). Therefore, the volume of control memory
can be calculated as $S_{CM} = (N+2) \cdot 2^{R_1}$, where $R_1$ is the width of the
microinstruction address generated by the counter.
3. Formation of the transition table of the CMCU UMM ; formation
of the excitation function for the counter. At this stage, the table
of transitions between OLCs is created. The table contains the following
columns: $O_g$, $SA(O_g)$, $X_h$, $I_j^t$, $K(I_j^t)$, $T$, $h$, where
• $O_g$ is the output from which the transition is executed;
• $SA(O_g)$ is the address of the output $O_g$;
• $X_h$ is the input signal causing the transition 〈$O_g$, $I_j^t$〉 and is equal to
the conjunction of the elements from the set X;
• $I_j^t$ is the input of the target chain αj ∈ C to which the transition is
executed;
• $K(I_j^t)$ is the address of the input $I_j^t$;
• $T$ is the set of variables that form the excitation function for the counter;
• $h$ is the number of the transition (h = 1, . . . , H).
Based on this table, the excitation function T for the counter is formed:
$T_r = \bigvee_{h=1}^{H} C_{rh} E_g^h X_h \quad (r = 1, \ldots, R_1)$. (4.2)
Here $C_{rh}$ is a Boolean variable that is equal to 1 if and only if the function
$T_r$ is written in the h-th line of the table of transitions; $E_g^h$ is a conjunction
of internal variables $A_r \in A$ corresponding to the address $SA(O_g)$ of the
output $O_g$ from the h-th line of the table of transitions.
4. Implementation of the CMCU UMM . The controller may be implemented
on the FPGA in two ways. The first one is to realize control memory
with dedicated memory blocks of programmable devices. Such a solution
makes it possible to decrease the number of logic blocks of the FPGA used.
The second way is to implement the whole system using logic blocks of the device.
This option is usually applied if the size of control memory exceeds the size
of available dedicated memory blocks of the FPGA.
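The control memory volume from step 2 can be checked numerically. The sketch below assumes that the address width R1 is the smallest number of bits able to distinguish all microinstructions; this matches the four bits used for the 11 microinstructions of the example in Section 4.1.3, but the helper names are hypothetical.

```python
def address_width(num_microinstructions):
    """Smallest R1 with 2**R1 >= num_microinstructions (assumed rule)."""
    return max(1, (num_microinstructions - 1).bit_length())


def control_memory_volume(num_microoperations, width):
    """S_CM = (N + 2) * 2**R1: N microoperations plus y0 and yK per word."""
    return (num_microoperations + 2) * 2 ** width


r1 = address_width(11)               # 11 microinstructions in the example
s_cm = control_memory_volume(5, r1)  # five microoperations y1..y5
print(r1, s_cm)                      # prints "4 112"
```

For the example controller this gives 7-bit words in a 16-word address space, that is 112 bits, which fits easily into a single dedicated memory block.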
4.1.3. Example of the synthesis method of the CMCU with mutual memory
To illustrate the idea of the CMCU with mutual memory, the synthesis method
of such a controller is demonstrated with a simple example. Figure 4.2 shows
a hypothetical algorithm of the control unit U1. There are 11 operational vertices
B = {b1, . . . , b11} and three conditional vertices X = {x1, x2, x3} in the flow-chart
Γ1. Thus, the circuit should generate 11 microinstructions that consist of five
microoperations Y = {y1, . . . , y5}.
Fig. 4.2: Flow-chart Γ1
In order to design the CMCU with mutual memory, first the set C of operational
linear chains ought to be formed (Fig. 4.3). In the presented example,
there are four OLCs C = {α1, α2, α3, α4}, where α1 = 〈b1, b2〉, α2 = 〈b3, . . . , b7〉,
α3 = 〈b8, b9〉, α4 = 〈b10, b11〉. All OLCs, except for α2, have one input: for α1 it
is the vertex b1, for α3 it is b8 and for α4 it is b10. The OLC α2 has two inputs: the
vertex b3 and the vertex b6. Therefore, the set of inputs contains five elements:
$I = \{I_1^1, I_2^1, I_2^2, I_3^1, I_4^1\}$, where $I_1^1 = b_1$, $I_2^1 = b_3$, $I_2^2 = b_6$, $I_3^1 = b_8$, $I_4^1 = b_{10}$.
Each OLC may have only one output, thus there are four outputs in the set of OLCs:
O = {O1, . . . , O4}, where O1 = b2, O2 = b7, O3 = b9, O4 = b11.
In the next step of the design process, the content of control memory should
be formed. To perform this task, addresses of all microinstructions have to be
encoded. In the case of a control unit with mutual memory, the encoding method
is not important, therefore, according to (3.3), natural binary codes will be used.
There are 11 operational vertices in the flow-chart Γ1, so microinstructions will be
encoded using four bits. In the presented example, microinstructions are addressed
as follows: A(b1) = 0000, A(b2) = 0001, . . ., A(b11) = 1010.
Fig. 4.3: OLC flow-chart of the CMCU U1
Each microinstruction executed in the vertex bk consists of microoperations
that are written in this vertex. There are two additional microoperations that are
necessary for proper functionality of the CMCU: y0 and yK . The first one is set
to 1 (y0 = 1) if the vertex bk belongs to the set of outputs O. Otherwise, y0 = 0. In
the proposed example, y0 will be produced by the vertices b2, b7, b9 and b11. The
microoperation yK is equal to 1 only if the vertex bk is connected with the final
vertex of the flow-chart. For the flow-chart Γ1, yK will be set only in the vertex b7.
Next, microinstructions are encoded and the table of control memory is formed.
Table 4.1 shows the content of CM for the control unit U1.
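The formation of the control memory words can be sketched as follows. The microoperations per vertex are read from Tab. 4.1; the data layout and the function name `build_control_memory` are assumptions made only for illustration.

```python
# Microoperations written in each operational vertex (indices of y1..y5),
# listed in addressing order, as read from Tab. 4.1.
MICROOPS = {
    "b1": {1, 2}, "b2": {3, 4}, "b3": {2, 3}, "b4": {1, 4},
    "b5": {5}, "b6": {1, 3}, "b7": {2, 3, 5}, "b8": {1, 2},
    "b9": {1, 3, 5}, "b10": {3, 4}, "b11": {1, 3},
}
OUTPUTS = {"b2", "b7", "b9", "b11"}   # OLC outputs: y0 = 1
FINAL = {"b7"}                        # connected with the final vertex: yK = 1


def build_control_memory(microops, outputs, final, n=5):
    """Map natural binary addresses to words [y0, y1, ..., yn, yK]."""
    width = max(1, (len(microops) - 1).bit_length())
    memory = {}
    for index, (vertex, ops) in enumerate(microops.items()):
        address = format(index, f"0{width}b")
        word = [int(vertex in outputs)]
        word += [int(k in ops) for k in range(1, n + 1)]
        word.append(int(vertex in final))
        memory[address] = word
    return memory


memory = build_control_memory(MICROOPS, OUTPUTS, FINAL)
```

Each word has N + 2 = 7 bits, and the word stored at address 0110 (vertex b7) is the only one with both y0 and yK set.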
To determine the excitation function T for the counter, the table of transitions
of the CMCU U1 should be formed. This table describes transitions between all
operational linear chains depending on input values (the set of conditional vertices
X). In the presented example, the table of transitions has H = 8 lines (Tab. 4.2).
Based on the address SA(Og) (which is represented by the set of variables
A = {a1, . . . , a4}) and on the set of conditional vertices X, the excitation function
T for the counter is formed:
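$t_4 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 \bar{x}_1 \bar{x}_2$,
$t_3 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 \bar{x}_1 x_2 + a_4 \bar{a}_3 \bar{a}_1 x_3$,
$t_2 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 (x_1 + \bar{x}_1 x_2)$,
$t_1 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 \bar{x}_1 + a_4 \bar{a}_3 \bar{a}_1 x_3$.
(4.3)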
Tab. 4.1: Content of the control memory of the CMCU U1
Vertex  Address  Microinstruction            Comment
                 y0  y1  y2  y3  y4  y5  yK
b1      0000     0   1   1   0   0   0   0   I_1^1
b2      0001     1   0   0   1   1   0   0   O1
b3      0010     0   0   1   1   0   0   0   I_2^1
b4      0011     0   1   0   0   1   0   0   –
b5      0100     0   0   0   0   0   1   0   –
b6      0101     0   1   0   1   0   0   0   I_2^2
b7      0110     1   0   1   1   0   1   1   O2
b8      0111     0   1   1   0   0   0   0   I_3^1
b9      1000     1   1   0   1   0   1   0   O3
b10     1001     0   0   0   1   1   0   0   I_4^1
b11     1010     1   1   0   1   0   0   0   O4
Tab. 4.2: Table of transitions of the CMCU U1
O_g  SA(O_g)      X_h                  I_j^t  K(I_j^t)     T         h
     a4 a3 a2 a1                              t4 t3 t2 t1
O1   0  0  0  1   x_1                  I_2^1  0  0  1  0   t2        1
O1   0  0  0  1   \bar{x}_1 x_2        I_3^1  0  1  1  1   t3 t2 t1  2
O1   0  0  0  1   \bar{x}_1 \bar{x}_2  I_4^1  1  0  0  1   t4 t1     3
O2   0  1  1  0   –                    –      –  –  –  –   –         4
O3   1  0  0  0   x_3                  I_2^2  0  1  0  1   t3 t1     5
O3   1  0  0  0   \bar{x}_3            I_1^1  0  0  0  0   –         6
O4   1  0  1  0   x_3                  I_2^2  0  1  0  1   t3 t1     7
O4   1  0  1  0   \bar{x}_3            I_1^1  0  0  0  0   –         8
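The relation between the table of transitions and system (4.3) can be checked mechanically. The sketch below encodes the rows of Tab. 4.2 and the equations separately and compares them for every input combination. The polarities of x1, x2 and x3 (which rows use the complemented variable) are an assumption of this sketch, since the overbars are not legible in the available text; all function names are illustrative.

```python
from itertools import product

# Rows of Tab. 4.2 as (SA(Og), required condition values, K(target input)).
# ASSUMPTION: the row order is x, then not-x, for each condition.
TRANSITIONS = [
    ("0001", {"x1": 1}, "0010"),
    ("0001", {"x1": 0, "x2": 1}, "0111"),
    ("0001", {"x1": 0, "x2": 0}, "1001"),
    ("1000", {"x3": 1}, "0101"),
    ("1000", {"x3": 0}, "0000"),
    ("1010", {"x3": 1}, "0101"),
    ("1010", {"x3": 0}, "0000"),
]


def address_from_table(address, x):
    """Look up the next address in the table of transitions."""
    for source, condition, target in TRANSITIONS:
        if source == address and all(x[k] == v for k, v in condition.items()):
            return target
    raise ValueError("no transition for this output and input values")


def inv(bit):
    """Logical negation of a 0/1 value."""
    return 1 - bit


def address_from_equations(address, x):
    """Evaluate t4..t1 as sums of products over a4..a1 and x1..x3."""
    a4, a3, a2, a1 = (int(bit) for bit in address)
    x1, x2, x3 = x["x1"], x["x2"], x["x3"]
    at_o1 = inv(a4) & inv(a3) & inv(a2) & a1   # address 0001
    at_o3_o4 = a4 & inv(a3) & inv(a1)          # addresses 1000 and 1010
    t4 = at_o1 & inv(x1) & inv(x2)
    t3 = (at_o1 & inv(x1) & x2) | (at_o3_o4 & x3)
    t2 = at_o1 & (x1 | (inv(x1) & x2))
    t1 = (at_o1 & inv(x1)) | (at_o3_o4 & x3)
    return f"{t4}{t3}{t2}{t1}"


def equations_match_table():
    """Compare both descriptions for all outputs and input combinations."""
    for address in ("0001", "1000", "1010"):
        for x1, x2, x3 in product((0, 1), repeat=3):
            x = {"x1": x1, "x2": x2, "x3": x3}
            if address_from_equations(address, x) != address_from_table(address, x):
                return False
    return True
```

Note that the outputs O3 (1000) and O4 (1010) share one product term because they differ only in a2 and have identical transitions.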
Now the CMCU U1 can be easily prototyped using hardware description lan-
guages like Verilog (Lee, 1999; Thomas and Moorby, 2002) or VHDL (Bibilo,
1999; Brown and Vernesic, 2000; Pecheux et al., 2005; Salcic, 1998; Skahill et al.,
1996; Zwolinski, 2000). Based on this description, the CMCU can be logically
synthesised and finally implemented in the FPGA. Chapters 7 and 8 deal with
the description of the CMCU in HDLs and the implementation of the CMCU in
FPGAs in detail.
4.1.4. Summary
In this section, the synthesis method of the CMCU with mutual memory was
described in detail. The presented example was prepared and implemented using
a real FPGA device (XC2VP30 from the Xilinx Virtex-II Pro family). Figure 4.4
presents a simplified technological diagram of the controller. Initially, the diagram
was generated after logic synthesis by the Xilinx XST tool. It was modified to
clarify the logic structure of the circuit U1. Here 10 LUTs, which correspond to
the combinational circuit, were replaced with one block. Similarly, four LUTs and
four flip-flops that form the counter are represented by two further blocks. The
FDC is a Xilinx primitive, and it represents a D-type flip-flop with asynchronous
reset. Additionally, the main nets were named (in the example T , A) to show
similarity to the logic diagram.
There are two blocks of the CMCU U1 that are synchronous: the counter and
control memory. Therefore, the clock signal Clk ought to be delivered to such
modules. The counter is triggered by the rising edge of the clock signal. However,
because of feedback signals, control memory is active on the falling edge of Clk.
Thus the address of a microinstruction is formed on the positive edge, while outputs
are generated on the negative edge of the clock signal. Of course, critical timing
paths should be checked to avoid timing skews in the circuit (placement and timing
paths are automatically verified by Xilinx tools during logical implementation of
the design).
Fig. 4.4: Technological structure of the CMCU U1 (combinational circuit: 10 LUTs; counter: 4 LUTs and 4 FDCs; control memory: 1 BRAM)
The realization of the CMCU U1 took 14 LUT elements and one dedicated
memory block of FPGA resources. In comparison, the controller prepared as a tra-
ditional finite-state-machine required 14 LUT elements and one dedicated memory
block as well (here microinstructions were also implemented using dedicated mem-
ory). Such a simple example shows that the controller designed as the CMCU with
mutual memory may not give better results than the equivalent FSM-based circuit.
The results of further tests (presented in detail in Chapter 8) showed
that, for controllers that interpret a linear flow-chart, the CMCU with mutual
memory requires fewer logic blocks than the traditional FSM. However, the benefit
is very small (about 3%), and such results were an inspiration for searching for
new design ideas for control units. The aim of the research was to reduce the
number of logic elements that are required in order to implement the controller
using programmable devices. The next sections present the author’s synthesis
methods of compositional microprogram control units. All methods shown in this
chapter are based on a modification of the structure of the CMCU UMM .
4.2. CMCU with a function decoder
The microprogram control unit with a function decoder UFD is an extended struc-
ture of the CMCU with mutual memory (Wiśniewski and Barkalov, 2007; Barkalov
et al., 2007b). In comparison to the controller UMM , an additional circuit,
the function decoder (FD), is introduced. Figure 4.5 illustrates the CMCU with a
function decoder.
Fig. 4.5: Structure of the CMCU with a function decoder
The main idea of the method is to reduce the number of logic blocks of the
target FPGA due to the usage of the additional block (function decoder), which
may be implemented using dedicated memories. Therefore, fewer LUT elements
are used during the realisation of the control unit in comparison with the CMCU
with mutual memory.
4.2.1. Main idea of the method
In the CMCU UFD, variables that form the excitation function for the counter
are encoded with a minimum number of bits. To solve this task, all inputs of
operational linear chains ought to be encoded. Now the circuit CC generates the
function Z:
Z = f(X,A). (4.4)
The function Z contains the encoded addresses E(I) of all inputs in the set
of OLCs. They are further decoded by the circuit FD, which indicates the proper
code for the counter:
T = f(Z). (4.5)
The number of bits that are required to encode all inputs can be calculated
as $R_Z = \lceil \log_2 M_Z \rceil$, where $M_Z = |I|$ is the number of all inputs in the set
of OLCs.
The presented solution makes it possible to reduce the number of outputs generated
by the circuit CC. The additional block of the function decoder is implemented with
dedicated memories of FPGAs. Therefore, the number of logic elements that are
needed to implement the whole controller is reduced.
4.2.2. Synthesis of the CMCU with a function decoder
The design method of the CMCU UFD includes the following steps:
1. Formation of the set of OLCs, encoding their inputs and microin-
struction addresses. The formation of the set of OLCs is executed in the
same manner as was shown during the synthesis of the CMCU with mutual
memory. Next, the addresses A of all microinstructions are calculated. The
encoding style is not important, so natural binary codes may be used. Finally,
the addresses $K(I_j^t)$ of all inputs of the set of OLCs are encoded with
the minimum number of bits, $R_Z$. Each input thus has a unique code $E(I_j^t)$.
2. Formation of the control memory content. According to the addresses
calculated in the previous stage, the content of control memory is formed.
As has been mentioned, the encoding style is not important here.
3. Formation of the table of CMCU transitions and the formation
of the excitation function for the function decoder. This table is
the basis for the formation of the system (4.4) and the synthesis of the
circuit CC. The table contains transitions only for the OLCs αg ∈ C′,
where C′ ⊂ C is the subset of OLCs whose outputs are
not connected with the final vertex of the flow-chart. The transition table
contains the following columns: $O_g$, $SA(O_g)$, $X_h$, $I_j^t$, $E(I_j^t)$, $Z$, $h$. Here
• $O_g$ is the output from which the transition is executed;
• $SA(O_g)$ is the address of the output $O_g$;
• $X_h$ is the input signal causing the transition 〈$O_g$, $I_j^t$〉 and is equal to
the conjunction of the elements from the set X;
• $I_j^t$ is the input of the chain αj ∈ C to which the transition is executed;
• $E(I_j^t)$ is the encoded address of the input $I_j^t$;
• $Z$ is the set of variables that form the excitation function for the function
decoder;
• $h$ is the number of the transition (h = 1, . . . , H).
Based on the transition table, the excitation function Z can be determined.
The system (4.4) is represented as
$Z_r = \bigvee_{h=1}^{H} C_{rh} F_g^h X_h \quad (r = 1, \ldots, R_Z)$, (4.6)
where $C_{rh}$ is a Boolean variable that is equal to 1 if and only if the function
$Z_r$ is written in the h-th line of the table of transitions; $F_g^h$ is a conjunction
of the internal variables $A_r \in A$ corresponding to the address $SA(O_g)$ of the
output $O_g$ from the h-th line of the table of transitions.
4. Formation of the truth table of the function decoder. Based on the
code $E(I_j^t)$, the function decoder generates the proper address $K(I_j^t)$ of the
OLC input. The set of addresses $K(I_j^t)$ forms the excitation function T for the
counter. The table of the function decoder contains the following columns:
$I_j^t$, $E(I_j^t)$, $K(I_j^t)$, $T$, $m$, where
• $I_j^t$ is the input of the chain αj ∈ C;
• $E(I_j^t)$ is the code of the input $I_j^t$;
• $K(I_j^t)$ is the address of the input $I_j^t$;
• $T$ is the set of variables that form the excitation function for the counter;
• $m$ is the number of the line in the truth table of the function decoder
(m = 1, . . . , M).
Based on this table, the circuit of the function decoder can be implemented
with dedicated memory blocks. The code $E(I_j^t)$ forms the inputs and $K(I_j^t)$
forms the outputs of the function decoder. The volume of memory that is
required for the implementation of the function decoder can be calculated as
$S_{FD} = R_1 \cdot 2^{R_Z}$, where $R_1$ is the number of variables that form the
excitation function for the counter and $R_Z$ is the number of bits that are
required for OLC input encoding.
5. Implementation of the CMCU UFD. The memory of the controller
is implemented with dedicated memory blocks. In the case of
the CMCU UFD, the circuit of the function decoder may be realised with
dedicated memory blocks as well. In comparison to the CMCU UMM , the
number of output variables generated by the circuit CC is reduced, and so
is the complexity of its excitation function. Because the circuit FD is implemented
as memory, the total number of logic blocks that are used for the implementation
of the controller may be significantly reduced. The gain depends mainly on the total
number of all inputs in the set of OLCs (see Chapter 8).
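The two quantities used in these steps, the code width R_Z and the memory volume S_FD, can be computed directly. The following is a minimal sketch with hypothetical helper names; the sample values (five OLC inputs and a four-bit counter) are those of the flow-chart Γ1.

```python
def code_width(num_inputs):
    """R_Z = ceil(log2(M_Z)), computed with integer arithmetic."""
    return max(1, (num_inputs - 1).bit_length())


def function_decoder_volume(r1, rz):
    """S_FD = R1 * 2**R_Z: one R1-bit address per possible input code."""
    return r1 * 2 ** rz


rz = code_width(5)                     # five OLC inputs in the example
s_fd = function_decoder_volume(4, rz)  # four-bit microinstruction address
print(rz, s_fd)                        # prints "3 32"
```

In this example the circuit CC has to generate three functions (z1, z2, z3) instead of four (t1, ..., t4), at the cost of a 32-bit decoder memory.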
4.2.3. Example of the designing method of the CMCU with a function decoder
To demonstrate the designing method of the CMCU UFD with a function decoder,
the flow-chart Γ1 will be used as a description of the example controller U2. As
was shown in Section 4.1.3, there are four OLCs: C = {α1, α2, α3, α4}. The
circuit should generate 11 microinstructions that consist of five microoperations
Y={y1, . . . , y5}.
According to the synthesis rules of the CMCU with a function decoder, all
OLC inputs ought to be encoded. There are five inputs, therefore the minimum
number of bits that are needed for encoding is RZ = 3. In the presented example,
a natural binary code will be used and inputs will be encoded as follows: $E(I_1^1) = 000$,
$E(I_2^1) = 001$, $E(I_2^2) = 010$, $E(I_3^1) = 011$, $E(I_4^1) = 100$.
In the second step, the content of control memory is formed. This stage is
performed in the same way as was shown in Section 4.1.3 (the content of CM is
shown in Tab. 4.1).
Next, the table of transitions of the CMCU U2 is prepared. The table is
similar to Tab. 4.2, but now all inputs of OLCs are encoded (Tab. 4.3).
Tab. 4.3: Table of transitions of the CMCU U2
O_g  SA(O_g)      X_h                  I_j^t  E(I_j^t)  Z      h
     a4 a3 a2 a1                              z3 z2 z1
O1   0  0  0  1   x_1                  I_2^1  0  0  1   z1     1
O1   0  0  0  1   \bar{x}_1 x_2        I_3^1  0  1  1   z2 z1  2
O1   0  0  0  1   \bar{x}_1 \bar{x}_2  I_4^1  1  0  0   z3     3
O2   0  1  1  0   –                    –      –  –  –   –      4
O3   1  0  0  0   x_3                  I_2^2  0  1  0   z2     5
O3   1  0  0  0   \bar{x}_3            I_1^1  0  0  0   –      6
O4   1  0  1  0   x_3                  I_2^2  0  1  0   z2     7
O4   1  0  1  0   \bar{x}_3            I_1^1  0  0  0   –      8
Based on the table of transitions, the excitation function Z for the circuit FD
is formed:
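$z_3 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 \bar{x}_1 \bar{x}_2$,
$z_2 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 \bar{x}_1 x_2 + a_4 \bar{a}_3 \bar{a}_1 x_3$,
$z_1 = \bar{a}_4 \bar{a}_3 \bar{a}_2 a_1 (x_1 + \bar{x}_1 x_2)$.
(4.7)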
In order to decode the proper excitation function for the counter, the table
of the function decoder has to be prepared. Table 4.4 shows the content of the
function decoder of the CMCU U2.
Tab. 4.4: Table of the function decoder of the CMCU U2
I_j^t  E(I_j^t)  K(I_j^t)     T         m
       z3 z2 z1  t4 t3 t2 t1
I_1^1  0  0  0   0  0  0  0   –         1
I_2^1  0  0  1   0  0  1  0   t2        2
I_2^2  0  1  0   0  1  0  1   t3 t1     3
I_3^1  0  1  1   0  1  1  1   t3 t2 t1  4
I_4^1  1  0  0   1  0  0  1   t4 t1     5
The circuit FD may be implemented either using dedicated memories or with
logic blocks of the FPGA. In the case of realisation with LUT elements, the exci-
tation function T is additionally formed:
$t_4 = z_3 \bar{z}_2 \bar{z}_1$,
$t_3 = \bar{z}_3 z_2$,
$t_2 = \bar{z}_3 z_1$,
$t_1 = \bar{z}_3 z_2 + z_3 \bar{z}_2 \bar{z}_1$.
(4.8)
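Both realisations of the function decoder can be compared in a few lines: the memory-based variant is a lookup table filled from Tab. 4.4, and the LUT-based variant evaluates system (4.8). The names below are illustrative, and the unused codes 101 to 111 are treated as don't-cares and are not checked.

```python
# Memory realisation: code E(I) -> address K(I), copied from Tab. 4.4.
FD_TABLE = {
    "000": "0000",
    "001": "0010",
    "010": "0101",
    "011": "0111",
    "100": "1001",
}


def inv(bit):
    """Logical negation of a 0/1 value."""
    return 1 - bit


def decode_with_equations(code):
    """LUT realisation: evaluate t4..t1 of system (4.8) for a code z3 z2 z1."""
    z3, z2, z1 = (int(bit) for bit in code)
    t4 = z3 & inv(z2) & inv(z1)
    t3 = inv(z3) & z2
    t2 = inv(z3) & z1
    t1 = (inv(z3) & z2) | (z3 & inv(z2) & inv(z1))
    return f"{t4}{t3}{t2}{t1}"


def realisations_agree():
    """True if the equations reproduce every line of the decoder table."""
    return all(decode_with_equations(code) == address
               for code, address in FD_TABLE.items())
```

With the dedicated-memory realisation, the table itself is the memory content and no equations have to be derived.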
Finally, the CMCU U2 may be implemented in the FPGA. As has been men-
tioned, the additional module of the function decoder may be realized either with
dedicated memory blocks or with LUT elements of the programmable device.
Fig. 4.6: Technological structure of the CMCU U2 (combinational circuit: 7 LUTs; function decoder: 1 BRAM; counter: 4 LUTs and 4 FDCs; control memory: 1 BRAM)
Figure 4.6 shows a schematic of the CMCU U2. The usage of the function
decoder made it possible to decrease the number of LUT elements from 14 to 11.
This means that the area used by logic blocks of the circuit U2 was reduced by 21%
compared to the CMCU U1.
There is an additional synchronous component in the CMCU U2 in comparison
to the CMCU U1. The function decoder is implemented with dedicated memory
blocks of an FPGA and it should be triggered by a clock signal. Therefore, the
FD is active on the positive edge, while the counter is synchronized with the falling
edge of Clk. Finally, control memory is triggered by the positive clock edge. Such
a solution ensures proper functionality of the controller. On the other hand, it
should be emphasised that microinstructions are formed half a clock period later
in comparison to the CMCU U1.
4.2.4. Summary
The CMCU with a function decoder was presented in this section. The excitation
function for the counter is encoded with a minimum number of bits, thus the
number of logic elements that are used to realize the circuit CC is reduced. The
additional block, the function decoder, may be realized with dedicated memory
blocks of the FPGA. Such a solution saves logic elements
of the programmable device.
Detailed experiments showed that the realization of the controller with the
application of the CMCU with a function decoder reduces the number of LUT
elements used on average by 19% (see Chapter 8 for details).
4.3. CMCU with output identification
The structure of the CMCU UOI with output identification is illustrated in Fig. 4.7.
The main idea is to use the part of the address A for the identification of the
internal states of the control unit. Now the set of variables Q (Q ⊂ A) represents
the code of the current state of the controller (Wiśniewski et al., 2007; Barkalov
et al., 2005b; Barkalov et al., 2005i).
Fig. 4.7: Structure of the CMCU with output identification
4.3.1. Main idea of the method
In the CMCU UOI , the set of feedback variables A that are used for the identifica-
tion of the current state of the controller is reduced to the minimum. Outputs of
operational linear chains may be recognized using ROI bits thanks to special encod-
ing of microinstructions (Wiśniewski et al., 2006a; Barkalov et al., 2007c; Barkalov
et al., 2006c; Barkalov and Wiśniewski, 2005h; Wiśniewski, 2006a). Therefore,
the combinational circuit generates the function T for the counter (Wiśniewski
et al., 2006a; Barkalov et al., 2006d):
T = f(X,Q), (4.9)
where Q ⊆ A, |Q| = ROI , Q = {Q1, . . . , QROI}.
4.3.2. Synthesis of the CMCU with output identification
The synthesis of the CMCU with output identification includes the following steps:
1. Formation of the set of OLCs. The set of operational linear chains is
created. For each OLC, its output and all inputs are determined. According
to the definitions 3.4 and 3.6, there are M2 operational linear chains and the
length of the longest one is specified by the M1 value. The total number of
microinstructions is equal to M3 (see the definition 3.8).
2. Addressing microinstructions and encoding OLC outputs. Let Q ⊆
A be a set of variables that are sufficient for one-to-one identification of the
OLC αg ∈ C and let $R_{OI} = |Q|$. Microinstruction addressing of the CMCU is
executed in the following manner:
(a) At the beginning, all microinstructions are encoded using natural binary
codes.
(b) The value of $R_{OI}$ is set to $R_{OI} = R_2$, where $R_2 = \lceil \log_2 M_2 \rceil$.
(c) The table of addressing is created. The table has $2^{R_{OI}}$ columns marked
by the $R_{OI}$ major address bits and $2^{R_3 - R_{OI}}$ lines marked by the $R_3 - R_{OI}$
minor address bits. Here, $R_3 = \lceil \log_2 M_3 \rceil$.
(d) If the outputs of two different OLCs αi, αj ∈ C are situated in the same
column and neither of the outputs is connected with the final vertex of
the flow-chart, then the information is shifted to the right starting from
the first vertex of the OLC αj (j > i). The released cells of the table
are filled with the symbol "∗". This operation is repeated until the
outputs Oi and Oj are in different columns of the table.
(e) If the outputs of all OLCs have one-to-one identification by $R_{OI}$ bits, then
the algorithm moves on to point (g).
(f) If the address of any vertex is beyond the actual addressing space, then
$R_{OI} := R_{OI} + 1$. Next, the algorithm is repeated from point (c).
(g) End.
Finally, all microinstructions are encoded. Now the code of each microin-
struction is formed as a concatenation of major (columns) and minor (lines)
addresses of the created table. Outputs of OLCs are encoded using only ma-
jor bits of the address. Such encoding will be further used in the formation
of the transition table of the CMCU.
For better understanding, the presented algorithm of microinstruction ad-
dressing will be illustrated later with an example.
3. Formation of the control memory content. The content of control
memory is formed. Addresses of microinstructions are created according to
the algorithm presented in the previous step.
4. Formation of the transition table of the CMCU UOI; formation
of the excitation function for the counter. At this stage, the table
of transitions between OLCs is created. The table contains the following
columns: $O_g$, $MA(O_g)$, $X_h$, $I_j^t$, $K(I_j^t)$, $T$, $h$, where
• $O_g$ is the output from which the transition is executed;
• $MA(O_g)$ is the major part of the address of the output $O_g$; this address
was calculated at stage 2;
• $X_h$ is the input signal causing the transition 〈$O_g$, $I_j^t$〉 and is equal to
the conjunction of the elements from the set X;
• $I_j^t$ is the input of the chain αj ∈ C to which the transition is executed;
• $K(I_j^t)$ is the address of the input $I_j^t$;
• $T$ is the set of the variables that form the excitation function for the
counter;
• $h$ is the number of the transition (h = 1, . . . , H).
Based on this table, the excitation function T for the counter is formed:
$T_r = \bigvee_{h=1}^{H} C_{rh} E_g^h X_h \quad (r = 1, \ldots, R_3)$. (4.10)
Here, $C_{rh}$ is a Boolean variable that is equal to 1 if and only if the function
$T_r$ is written in the h-th line of the table of transitions; $E_g^h$ is a conjunction
of the internal variables $Q_r \in Q$ corresponding to the address $MA(O_g)$ of
the output $O_g$ from the h-th line of the table of transitions.
5. Implementation of the CMCU UOI. This step is executed in the same
manner as was shown during the designing process of the CMCU UMM . The
combinational circuit and the counter are implemented using LUT elements
while control memory is realised with dedicated memory blocks of FPGAs.
4.3.3. Example of the synthesis of the CMCU with output identification
To illustrate the idea of OLC encoding, the design process of the CMCU UOI
with output identification will be demonstrated with an example. Once more, the
flow-chart Γ1 will be used as the initial description of the controller U3. As was
shown in Subsection 4.1.3, there are M3 = 11 operational vertices and M2 = 4
operational linear chains. The longest OLC is α2, and it contains M1 = 5 elements.
According to the algorithm of microinstruction addressing, the initial value of the
variable $R_{OI}$ is equal to $R_2 = \lceil \log_2 M_2 \rceil = 2$. Since $R_3 = \lceil \log_2 M_3 \rceil = 4$, the table of
addressing initially has $2^{R_{OI}} = 4$ columns and $2^{R_3 - R_{OI}} = 4$ lines (Fig. 4.8).
Initially, all addresses of microinstructions are encoded in a natural binary
code. In the presented example, the outputs O3 of α3 and O4 of α4 are located in
the same column. Because neither O3 nor O4 is connected with the final vertex
of the flow-chart Γ1, all components that have higher addresses than the output
O3 are shifted. This movement is performed while the output O4 is in the same
column as O3.
Figure 4.9 presents the table after the shift operation. Now each OLC output
is situated in a different column, and there are no vertices beyond the addressing
space. This means that all addresses are encoded and the algorithm is finished.
Fig. 4.8: Initial table of addressing (columns marked by the bits a3 a4, lines by the bits a1 a2)
Fig. 4.9: Table of addressing after shift operations (columns marked by the bits a3 a4, lines by the bits a1 a2)
The table of encoding shows that each OLC output may be recognized using
|Q| = 2 major bits, where the set Q is a subset of A and contains only the
variables Q = {a3, a4}. Now OLC outputs are encoded as follows: K(O1) = 00,
K(O2) = 01, K(O3) = 10, K(O4) = 11. The presented algorithm of encoding is
used during the creation of control memory. Addresses of microinstructions are
formed as concatenations of major (columns) and minor (lines) bits of the table
of addressing. The content of control memory is presented in Table 4.5.
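The addressing procedure can be sketched in code. The version below is one possible greedy reading of the algorithm, not the author's implementation: chains are placed one after another, a chain is shifted cell by cell until its output falls into a column that holds no other output, and R_OI is increased when the addressing space is exceeded. Chains whose output is connected with the final vertex are exempt from the shift, as in the text. The function name and data layout are assumptions.

```python
def address_microinstructions(chains, final_vertices=frozenset()):
    """Assign addresses so that OLC outputs differ in the major address bits.

    chains: list of OLCs, each a list of vertex names in execution order.
    final_vertices: outputs connected with the final vertex of the flow-chart.
    Returns (r_oi, {vertex: address string of R3 bits}).
    """
    total = sum(len(chain) for chain in chains)
    r3 = max(1, (total - 1).bit_length())          # R3 = ceil(log2 M3)
    r_oi = max(1, (len(chains) - 1).bit_length())  # start with R_OI = R2
    while r_oi <= r3:
        lines = 2 ** (r3 - r_oi)   # number of cells in one column
        taken = set()              # columns that already hold an output
        addresses, cell, fits = {}, 0, True
        for chain in chains:
            if chain[-1] not in final_vertices:
                # Shift right until the output lands in a free column.
                while (cell + len(chain) - 1) // lines in taken:
                    cell += 1
                taken.add((cell + len(chain) - 1) // lines)
            if cell + len(chain) > 2 ** r3:
                fits = False       # beyond the addressing space
                break
            for vertex in chain:
                addresses[vertex] = format(cell, f"0{r3}b")
                cell += 1
        if fits:
            return r_oi, addresses
        r_oi += 1                  # step (f): use one more major bit
    raise ValueError("no addressing found")


# OLCs of the flow-chart from the example; b7 leads to the final vertex.
CHAINS = [["b1", "b2"], ["b3", "b4", "b5", "b6", "b7"],
          ["b8", "b9"], ["b10", "b11"]]
r_oi, addresses = address_microinstructions(CHAINS, {"b7"})
```

For the example this sketch shifts only the chain α4 by two cells, which gives b10 = 1011 and b11 = 1100 with R_OI = 2, in agreement with Table 4.5.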
In the next step, the table of transitions of the CMCU U3 is created. The
table is similar to the one that was created for the CMCU with mutual memory,
although now there are only two major bits of the address used as OLC output
identification (Tab. 4.6).
Tab. 4.5: Control memory content of the CMCU U3
Vertex  Address  Microinstruction            Comment
                 y0  y1  y2  y3  y4  y5  yK
b1      0000     0   1   1   0   0   0   0   I_1^1
b2      0001     1   0   0   1   1   0   0   O1
b3      0010     0   0   1   1   0   0   0   I_2^1
b4      0011     0   1   0   0   1   0   0   –
b5      0100     0   0   0   0   0   1   0   –
b6      0101     0   1   0   1   0   0   0   I_2^2
b7      0110     1   0   1   1   0   1   1   O2
b8      0111     0   1   1   0   0   0   0   I_3^1
b9      1000     1   1   0   1   0   1   0   O3
b10     1011     0   0   0   1   1   0   0   I_4^1
b11     1100     1   1   0   1   0   0   0   O4
Tab. 4.6: Transitions table of the CMCU U3
O_g  MA(O_g)  X_h                  I_j^t  K(I_j^t)     T         h
     a4 a3                                t4 t3 t2 t1
O1   0  0     x_1                  I_2^1  0  0  1  0   t2        1
O1   0  0     \bar{x}_1 x_2        I_3^1  0  1  1  1   t3 t2 t1  2
O1   0  0     \bar{x}_1 \bar{x}_2  I_4^1  1  0  1  1   t4 t2 t1  3
O2   0  1     –                    –      –  –  –  –   –         4
O3   1  0     x_3                  I_2^2  0  1  0  1   t3 t1     5
O3   1  0     \bar{x}_3            I_1^1  0  0  0  0   –         6
O4   1  1     x_3                  I_2^2  0  1  0  1   t3 t1     7
O4   1  1     \bar{x}_3            I_1^1  0  0  0  0   –         8
Based on the address $MA(O_g)$ (represented by the set of variables Q =
{a3, a4}) and the set of conditional vertices X, the excitation function T for
the counter is formed:
$t_4 = \bar{a}_4 \bar{a}_3 \bar{x}_1 \bar{x}_2$,
$t_3 = \bar{a}_4 \bar{a}_3 \bar{x}_1 x_2 + a_4 x_3$,
$t_2 = \bar{a}_4 \bar{a}_3$,
$t_1 = \bar{a}_4 \bar{a}_3 \bar{x}_1 + a_4 x_3$.
(4.11)
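As a plausibility check, system (4.11) can be evaluated against the rows of Tab. 4.6. Only the two major bits a4 and a3 enter the equations. As before, the polarities of x1, x2 and x3 are an assumption of this sketch, and the names are illustrative.

```python
from itertools import product

# Rows of Tab. 4.6 as (MA(Og), required condition values, K(target input)).
# ASSUMPTION: the row order is x, then not-x, for each condition.
TRANSITIONS_U3 = [
    ("00", {"x1": 1}, "0010"),
    ("00", {"x1": 0, "x2": 1}, "0111"),
    ("00", {"x1": 0, "x2": 0}, "1011"),
    ("10", {"x3": 1}, "0101"),
    ("10", {"x3": 0}, "0000"),
    ("11", {"x3": 1}, "0101"),
    ("11", {"x3": 0}, "0000"),
]


def inv(bit):
    """Logical negation of a 0/1 value."""
    return 1 - bit


def next_address_u3(major, x):
    """Evaluate t4..t1 from the major bits a4 a3 and the conditions x."""
    a4, a3 = (int(bit) for bit in major)
    x1, x2, x3 = x["x1"], x["x2"], x["x3"]
    at_o1 = inv(a4) & inv(a3)
    t4 = at_o1 & inv(x1) & inv(x2)
    t3 = (at_o1 & inv(x1) & x2) | (a4 & x3)
    t2 = at_o1
    t1 = (at_o1 & inv(x1)) | (a4 & x3)
    return f"{t4}{t3}{t2}{t1}"


def equations_match_table_u3():
    """Check every table row for all values of the unspecified conditions."""
    for major, condition, target in TRANSITIONS_U3:
        for x1, x2, x3 in product((0, 1), repeat=3):
            x = {"x1": x1, "x2": x2, "x3": x3}
            if all(x[k] == v for k, v in condition.items()):
                if next_address_u3(major, x) != target:
                    return False
    return True
```

The term for t2 collapses to the address part alone because all three transitions from O1 lead to addresses with t2 = 1.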
Now the CMCU U3 can be prototyped using HDL languages. The excitation
function T contains fewer variables and shorter equations in comparison to
the excitation function formed for the controller with mutual memory (see (4.3)).
Therefore, it is expected that the CMCU U3 should consume fewer logic elements
than the CMCU U1. The implementation of the controller in the FPGA showed
that, in fact, the CMCU U3 with output identification requires 11 LUT elements
(Fig. 4.10), which means reduction by 21% in comparison to the CMCU U1 with
mutual memory.
Fig. 4.10: Technological structure of the CMCU U3 (combinational circuit: 7 LUTs; counter: 4 LUTs and 4 FDCs; control memory: 1 BRAM)
4.3.4. Summary
The CMCU with output identification was described in this section. The main
advantage of the presented solution is the reduction of the number of variables
that keep the actual code of the state of the controller. Thus, the number of logic
elements of the FPGA is reduced in comparison with the traditional CMCU with
mutual memory. It should be pointed out that special addressing of microinstruc-
tions is required. Therefore, an additional designing step has to be performed in
comparison to the synthesis process of the CMCU UMM .
4.4. CMCU with output identification and a function
decoder
The CMCU UOD with output identification and a function decoder (Fig. 4.11)
is a combination of the two structures presented in the previous sections. Special
addressing of microinstructions is used in the CMCU UOD. Moreover, maximal
encoding of the set of variables A is performed as well.
4.4.1. Main idea of the method
To reduce the number of LUT elements further than either the CMCU UFD or the
CMCU UOI does alone, both methods may be combined. Now the combinational
circuit generates the excitation function Z for the circuit FD:
Z = f(X,Q), (4.12)
where X means the set of input variables of the CMCU (conditional vertices) and
Q ⊆ A is the set of feedback variables generated by the counter. The function decoder
generates proper addresses of microinstructions:
T = f(Z), (4.13)
where T means the set of variables that form the excitation function for the
counter.
Fig. 4.11: Structure of the CMCU with output identification and a function decoder
4.4.2. Synthesis of the CMCU with output identification and a function decoder
The design method of the CMCU UOD includes the following steps:
1. Formation of the set of OLCs and the encoding of their inputs.
This step is executed by the same procedure as that described in the section
regarding the synthesis of the CMCU with a function decoder. All inputs of
OLCs are encoded with the minimum number of bits, $R_Z$, so each input has
a unique code $E(I_j^t)$.
2. Addressing microinstructions and encoding OLC outputs. Addresses
of microinstructions are represented using the algorithm shown in the previ-
ous section. Outputs of OLCs are encoded using only major bits of addresses.
Such encoding will be further used in the formation of the table of transitions
of the CMCU.
3. Formation of the control memory content. According to addresses
calculated in the previous stage, the content of control memory is prepared.
4. Formation of the transition table of the CMCU; formation of the
excitation function for the function decoder. The table of transitions
is the basis for the formation of the system (4.12) and the synthesis of the
circuit CC. This table contains only transitions for such OLCs whose out-
puts are not connected to the final vertex of the flow-chart. The table of
transitions contains the following columns: Og,MA(Og),Xh,Itj ,E(Itj),Z,h.
• Og is the output from which the transition is executed;
• MA(Og) is the major part of an address of the output Og; this address
was calculated at the stage of microinstruction addressing;
• Xh is the input signal causing the transition < Og, Ijt > and is equal to
the conjunction of the elements from the set X;
• Itj is the input of the chain αj ∈ C to which the transition is executed;
• E(Itj) is the address of the input Itj ;
• Z is the set of the variables that form the excitation function for the
function decoder;
• h is the number of the transition (h=1, . . . , H).
Now, according to (4.6), the set of variables Z can be formed.
5. Formation of the table of the function decoder. This step is executed
in the same manner as during the design of the CMCU with a function
decoder (see Section 4.2).
6. Implementation of the CMCU UOD. The main advantage of the CMCU
with output identification and a function decoder is the possibility to im-
plement both blocks (FD and CM) with dedicated memory blocks. More-
over, thanks to output identification, the number of feedback functions for
the combinational circuit decreases in comparison with the CMCU UMM .
Therefore, the implementation of the CMCU UOD consumes the least logic
elements of programmable devices in comparison with the CMCUs UMM ,
UFD and UOI . However, it should also be pointed out that the presented
controller uses at least two dedicated memory blocks of the FPGA.
4.4.3. Example of the synthesis of the CMCU with output identification
and a function decoder
To illustrate the synthesis of the CMCU UOD, the flow-chart Γ1 will be used as
the initial description of the controller. The prototyping process of the CMCU U4
with output identification and a function decoder is a conjunction of the design
of the CMCUs U2 and U3. At the beginning, the set of OLCs is formed and all
OLCs inputs are encoded. As was presented in the previous sections, there are
four OLCs which have five inputs. Thus, OLC inputs may be encoded using |Z|=3
bits. In the presented example, a natural binary code will be used: E(I11)=000,
E(I12)=001, E(I22)=010, E(I13)=011, E(I14)=100.
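The encoding of OLC inputs with a natural binary code can be sketched in Python as follows. This is only an illustration: the function name encode_inputs is an assumption introduced here, and the input labels are written as they appear in Tab. 4.7.

```python
def encode_inputs(inputs):
    """Assign consecutive natural binary codes of minimum width to OLC inputs."""
    # Minimum number of bits needed to distinguish len(inputs) items.
    width = max(1, (len(inputs) - 1).bit_length())
    return {name: format(i, f"0{width}b") for i, name in enumerate(inputs)}

# Five inputs of the four OLCs, ordered by their codes (labels as in Tab. 4.7).
codes = encode_inputs(["I11", "I12", "I22", "I13", "I14"])
for name, code in codes.items():
    print(name, code)
```

Five inputs need three bits, so three of the eight possible codes remain unused.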
At the next stage, the addressing of microinstructions and the encoding of
OLC outputs should be performed. According to the algorithm presented in
Subsection 4.3.2, microinstructions corresponding to the vertices b1,. . . ,b9 are ad-
dressed consecutively in a natural binary code: A(b1)=0000, A(b2)=0001,
A(b3)=0010, . . . , A(b9)=1000. Addresses of the last two microinstructions are
shifted, thus their codes are A(b10)=1011 and A(b11)=1100. Outputs of OLCs
are encoded with |Q|=2 major bits of an address, and therefore MA(O1)=00,
MA(O2)=01, MA(O3)=10, MA(O4)=11. The content of control memory is iden-
tical with the CMCU U3 shown in the previous section (Tab. 4.5).
Next, the transition table of the CMCU is prepared. The table contains
transitions from the output Oi (which is encoded using Q ⊂ A bits) to the input
Itj (which is encoded using Z bits). Table 4.7 shows the transition table for the
CMCU U4.
Tab. 4.7: Table of transitions of the CMCU UOD
Og | MA(Og) (a4 a3) | Xh    | Itj | E(Itj) (z3 z2 z1) | Z     | h
O1 | 0 0            | x1    | I12 | 0 0 1             | z1    | 1
O1 | 0 0            | x1·x2 | I13 | 0 1 1             | z2 z1 | 2
O1 | 0 0            | x1·x2 | I14 | 1 0 0             | z3    | 3
O2 | 0 1            | –     | –   | - - -             | –     | 4
O3 | 1 0            | x3    | I22 | 0 1 0             | z2    | 5
O3 | 1 0            | x3    | I11 | 0 0 0             | –     | 6
O4 | 1 1            | x3    | I22 | 0 1 0             | z2    | 7
O4 | 1 1            | x3    | I11 | 0 0 0             | –     | 8
From the table of transitions, the excitation function Z for the function de-
coder is formed:
z3 = a4· a3·x1·x2,
z2 = a4· a3·x1·x2 + a4·x3,
z1 = a4· a3· (x1 + x1·x2).
(4.14)
In the last step, the table for the function decoder is formed. For the CMCU
U4, this table is identical with that presented during the synthesis of the CMCU U2
(Tab. 4.4). Finally, the controller may be described in an HDL and implemented
in a programmable device. Figure 4.12 shows the technological schematic
of the realisation of the CMCU U4 in the FPGA. As expected, the CMCU U4 took
the fewest logic blocks of the device among all CMCUs presented in this chapter.
The conjunction of OLC output identification and the application of the function
decoder reduced the number of LUT elements used to 10. This means that
the CMCU U4 requires 26% fewer LUTs than the initial CMCU with mutual memory.
4.4.4. Summary
This section presented the design method of the CMCU UOD with output identification
and a function decoder. The main idea was to use special identification
of OLC outputs, which reduced the size of the internal feedback function.
Additionally, the application of the function decoder made it possible to
decrease the number of outputs of the combinational circuit and thus to reduce the
number of LUT elements employed in the target FPGA device. Detailed experiments
showed that the design method of the CMCU UOD always gives better
results than the CMCU with mutual memory, on average by 32% (see Chapter 8
for details).
Fig. 4.12: Technological structure of the CMCU U4 (combinational circuit: 6 LUTs; counter: 4 LUTs and 4 FDCs; function decoder and control memory: 1 BRAM each)
4.5. Conclusions
Four design methods of the CMCU with mutual memory were presented
in this chapter. The first one was initially proposed by Barkalov (Barkalov and
Palagin, 1997; Barkalov, 1998) and was the inspiration for further methods. The
main idea was to reduce the number of logic blocks that are required to implement
the CMCU in the FPGA.
The CMCU UFD includes a new block – the function decoder. Now the
excitation function for the counter is encoded with a minimum number of bits.
The additional circuit is realised with dedicated memory blocks of the FPGA, thus
the application of the function decoder reduces the total number of LUT elements
in comparison with the CMCU UMM . Experiments showed that the application
of the function decoder may be ineffective, which may happen when
the initial flow-chart contains many short OLCs. It should also be pointed out
that the CMCU with a function decoder usually consumes 19% fewer LUT elements
than the traditional CMCU with mutual memory.
The reduction of the number of variables that keep the actual state of the
controller was introduced in the CMCU UOI with output identification. Here,
special encoding of microinstructions is used and each OLC output has the unique
code that usually (but not always – it depends on the structure of the flow-chart
that describes the controller) requires fewer bits than is the case in the traditional
CMCU with mutual memory. Experiments showed that the CMCU with OLC
output identification consumes fewer LUTs than the CMCU with mutual memory
(by 14%).
The last presented solution of CMCU design combines two methods of syn-
thesis. In the CMCU UOD with output identification and a function decoder,
the excitation function for the counter is encoded and special addressing of mi-
croinstructions is performed as well. As expected, such a solution gained the best
results out of all CMCUs presented in this chapter. The CMCU UOD always re-
quires fewer LUT elements than the CMCUs UMM and UFD. It is also almost
always better than the CMCU UOI (in 98% of cases). The area used by the CMCU
with output identification and a function decoder is on average 32% smaller than
the traditional CMCU with mutual memory.
Detailed interpretation of the results of experiments is presented in Chapter 8.
The next chapter shows four alternative methods of CMCU design, where the idea
of sharing codes is used.
Chapter 5
COMPOSITIONAL MICROPROGRAM CONTROL
UNITS WITH SHARING CODES
This chapter refers to synthesis methods of a CMCU with sharing codes. The aim
of the proposed structures is to determine the microinstruction address by codes
generated both by the counter and by the register.
In the first section, the traditional synthesis method of the CMCU USC with
sharing codes is presented. The structure of the CMCU USC was proposed in
(Barkalov and Palagin, 1997). The next sections deal with three new methods of
synthesis of the CMCU with sharing codes. The aims of the proposed methods
differ. The first one – the CMCU with sharing codes and the function decoder –
concentrates on the reduction of logic blocks of the FPGA. The aim of the second
design method (CMCU with an address converter) is to reduce the width of the
control memory address and thus the volume of the memory. This method is very
useful if the volume of control memory exceeds the volume offered by dedicated
memory blocks of the FPGA. The third method mixes the usage of an address
converter and a function decoder.
5.1. CMCU with sharing codes
Figure 5.1 shows the CMCU USC with sharing codes (Barkalov and Wiśniewski,
2004f; Wiśniewski, 2004; Wiśniewski, 2006b; Wiśniewski et al., 2006c; Kołopieńczyk
et al., 2007). The main idea is to use codes generated both by the counter and
by the register to form the microinstruction address. Therefore, the number of
variables that are used for encoding excitation functions for the counter is reduced
in comparison with the CMCU UBS .
In the CMCU with sharing codes, the microinstruction address A (bt) is rep-
resented as a concatenation (Barkalov and Palagin, 1997):
A (bt) = K (αg) ∗K (bt) . (5.1)
Here, K (αg) is a code of the OLC αg ∈ C with R2=⌈log2 M2⌉ bits, where M2
defines the number of OLCs in the initial flow-chart Γ; K (bt) is a code of a
component of the OLC αg ∈ C corresponding to the vertex bt ∈ B. The code
K (bt) has R1=⌈log2 M1⌉ bits, where M1 is equal to the maximum number of
components in the OLC αg ∈ C. The sign (∗) in (5.1) is used for the concatenation
operation.
Fig. 5.1: Structure of the CMCU with sharing codes (blocks CC, CT, RG and CM; signals X, T, D, Q, A, Y and y0)
5.1.1. Main idea of the method
In the CMCU USC , the combinational circuit CC generates excitation functions
for the counter CT and for the register RG:
T = f(X,Q), (5.2)
D = f(X,Q). (5.3)
RG is in charge of holding the code of the current OLC. Additionally, it
generates an upper part of the microinstruction address. CT keeps only the number
of the active component (block) in the current OLC. Therefore, it determines the
lower part of the microinstruction address.
5.1.2. Synthesis of the CMCU with sharing codes
The design method of the CMCU USC includes the following steps:
1. Formation of the set of OLCs; encoding OLCs and their compo-
nents. Based on the definitions 3.2 and 3.3, the set of operational linear
chains is formed. Next, OLCs are encoded. Each OLC forms the code
K (αg). Finally, all components in OLCs are encoded with the code K (bt).
Both the OLC and its components are encoded with a natural binary code.
2. Addressing microinstructions and the formation of control mem-
ory. The microinstruction address is determined as a concatenation of the
code of the OLC and its component, according to (5.1). Then, the content of
control memory is formed. The volume of control memory can be calculated
as SCM=(N+2)∗2^(R1+R2), where R1 is the size of the code generated by the
counter and R2 is the size of the code generated by the register.
3. Formation of the transition table of the CMCU and the formation
of excitation functions for the counter and for the register. This
table is the basis for the formation of system functions (5.2 and 5.3) and the
synthesis of the circuit CC. The table of transitions contains the following
columns: αg,K(αg),Xh,αt,K(αt),bj ,K(bj),D,T ,h. Here
• αg is the OLC from which the transition is executed;
• K(αg) is the code of the OLC αg;
• Xh is the input signal causing the transition and is equal to the con-
junction of elements from the set X;
• αt is the target OLC where the transition is executed;
• K(αt) is the code of the OLC αt;
• bj is the target component in the OLC αt where the transition is exe-
cuted;
• K(bj) is the code of the component bj ;
• D is the set of variables that form the excitation function for the regis-
ter;
• T is the set of variables that form the excitation function for the counter;
• h is the number of the transition (h=1, . . . , H).
Based on this table, the excitation function T for the counter is formed:
T_r = \bigvee_{h=1}^{H} T_{rh} E_g^h X_h \quad (r = 1, \dots, R_1). \qquad (5.4)
Here T_{rh} is a Boolean variable that is equal to 1 if and only if the variable T_r
is written in the h-th line of the table of transitions; E_g^h is the conjunction
of the internal variables Q_r ∈ Q corresponding to the code K(αg) from the
h-th line of the table of transitions.
Similarly, the excitation function D for the register is determined:
D_r = \bigvee_{h=1}^{H} D_{rh} F_g^h X_h \quad (r = 1, \dots, R_2), \qquad (5.5)
where D_{rh} is a Boolean variable that is equal to 1 if and only if the variable
D_r is written in the h-th line of the table of transitions; F_g^h is the conjunction
of the internal variables Q_r ∈ Q corresponding to the code K(αg) from the
h-th line of the table of transitions.
4. Implementation of the CMCU USC. Three modules of the CMCU USC
– the combinational circuit, the counter and the register – are implemented
with logic blocks of the target programmable device. Control memory may
be realized in two ways, with dedicated memory blocks or with logic elements.
5.1.3. Example of the synthesis of the CMCU with sharing codes
To illustrate the idea presented in this section, the design method of the CMCU
U5 with sharing codes will be shown with an example. Let us use the description
of the controller presented in Fig. 5.2. There are 13 operational and two
conditional vertices in the flow-chart Γ2.
Fig. 5.2: Flow-chart Γ2
According to the design rules defined in the previous subsection, initially the
set of operational linear chains ought to be formed. In the presented example,
there are three OLCs: C={α1, α2, α3}. Here α1=〈b1, . . . , b3〉, α2=〈b4, . . . , b7〉,
α3=〈b8, . . . , b13〉. All OLCs, except α2, have one input: for α1 it is the vertex
b1 while for α3 – b8. The OLC α2 has two inputs: the vertex b4 and the vertex
b7. There are three outputs in the set of OLCs O={O1, O2, O3}, where O1=b3,
O2=b7, O3=b13. The set of operational linear chains is shown in Fig. 5.3.
Fig. 5.3: OLC flow-chart of the CMCU U5
Operational linear chains and their components are encoded using natural
binary codes. There are M2=3 OLCs, thus R2=⌈log2 M2⌉=2 bits will be used for
encoding: K(α1)=00, K(α2)=01, K(α3)=10. To encode OLC components, first
the length M1 of the longest OLC ought to be determined. In the presented
example, α1 contains three components, α2 includes four vertices, and α3 has
six blocks. Thus the longest OLC is α3 and M1=6.
This means that OLC components will be encoded using R1=⌈log2 M1⌉=3 bits.
Table 5.1 illustrates the encoding of OLCs and their components.
Tab. 5.1: Encoding of CMCU U5 OLCs and their components
αg | K(αg) | bt  | K(bt) | Address K(αg) ∗ K(bt)
α1 | 00    | b1  | 000   | 00 000
   |       | b2  | 001   | 00 001
   |       | b3  | 010   | 00 010
α2 | 01    | b4  | 000   | 01 000
   |       | b5  | 001   | 01 001
   |       | b6  | 010   | 01 010
   |       | b7  | 011   | 01 011
α3 | 10    | b8  | 000   | 10 000
   |       | b9  | 001   | 10 001
   |       | b10 | 010   | 10 010
   |       | b11 | 011   | 10 011
   |       | b12 | 100   | 10 100
   |       | b13 | 101   | 10 101
The address of each microinstruction is formed as a concatenation of both
codes: A(bt)=K(αg) ∗ K(bt). In the presented example, the vertex b1 has the
address A(b1)=K(α1) ∗ K(b1)=00000. The vertex b9 is encoded as A(b9)=K(α3)∗
K(b9)=10001, and so on. Such encoding is necessary during the formation of the
control memory content (Tab. 5.2).
Tab. 5.2: Content of the control memory of the CMCU U5
Vertex | Address | y0 y1 y2 y3 y4 y5 yK | Comment
b1     | 00000   | 0  1  1  0  0  0  0  | I11
b2     | 00001   | 0  0  0  1  1  0  0  | –
b3     | 00010   | 1  0  0  0  0  1  0  | O1
b4     | 01000   | 0  0  1  0  0  1  0  | I12
b5     | 01001   | 0  1  0  0  0  0  0  | –
b6     | 01010   | 0  0  0  1  0  1  0  | –
b7     | 01011   | 1  1  0  0  1  0  1  | I22 O2
b8     | 10000   | 0  1  0  1  0  0  0  | I13
b9     | 10001   | 0  0  0  0  0  1  0  | –
b10    | 10010   | 0  0  0  0  1  0  0  | –
b11    | 10011   | 0  0  1  0  0  0  0  | –
b12    | 10100   | 0  0  0  1  0  1  0  | –
b13    | 10101   | 1  1  1  0  0  0  1  | O3
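The addressing rule (5.1) used for Tab. 5.1 and Tab. 5.2 can be sketched in Python. This is a minimal illustration, not the author's tool: the names bits_for and sharing_codes_addresses are assumptions introduced here.

```python
def bits_for(count):
    """Minimum number of bits for `count` distinct natural binary codes."""
    return max(1, (count - 1).bit_length())

def sharing_codes_addresses(chains):
    """Return {vertex: address}, where address = K(alpha_g) * K(b_t) as in (5.1)."""
    r2 = bits_for(len(chains))                  # R2: width of the OLC code
    r1 = bits_for(max(len(c) for c in chains))  # R1: width of the component code
    addresses = {}
    for g, chain in enumerate(chains):
        for t, vertex in enumerate(chain):
            addresses[vertex] = format(g, f"0{r2}b") + format(t, f"0{r1}b")
    return addresses

# The three OLCs of the flow-chart Gamma_2.
chains = [
    ["b1", "b2", "b3"],
    ["b4", "b5", "b6", "b7"],
    ["b8", "b9", "b10", "b11", "b12", "b13"],
]
addresses = sharing_codes_addresses(chains)
for vertex, address in addresses.items():
    print(vertex, address)
```

Because every chain restarts its component code from zero, shorter chains leave unused addresses in their part of the memory.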
In order to form excitation functions for the counter and for the register, the
transition table of the CMCU ought to be prepared. It is presented as Tab. 5.3.
Tab. 5.3: Table of transitions of the CMCU U5
αg | K(αg) (q2 q1) | Xh    | αt | K(αt) (d2 d1) | bj | K(bj) (t3 t2 t1) | D  | T     | h
α1 | 0 0           | x1    | α2 | 0 1           | b4 | 0 0 0            | d1 | –     | 1
α1 | 0 0           | x1x2  | α2 | 0 1           | b7 | 0 1 1            | d1 | t2 t1 | 2
α1 | 0 0           | x1 x2 | α3 | 1 0           | b8 | 0 0 0            | d2 | –     | 3
The transition table is the basis for the formation of the excitation function
D for the register. OLCs are encoded with R2=2 bits, thus two variables ought to
be calculated from Tab. 5.3:
d2 = q2· q1·x1·x2,
d1 = q2· q1· (x1 + x1·x2). (5.6)
Similarly, the excitation function T for the counter is formed:
t3 = 0,
t2 = t1 = q2· q1·x1·x2. (5.7)
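The way the systems (5.6) and (5.7) are collected from Tab. 5.3 can be sketched in Python. This is only an illustration under stated assumptions: the function name excitation_functions is hypothetical, X1, X2 and X3 are placeholder labels for the input conjunctions Xh of lines h=1, 2, 3, and E(00) stands for the conjunction of register outputs corresponding to the code K(α1)=00.

```python
def excitation_functions(rows):
    """For each excitation variable, collect the product terms of the lines where it is written."""
    terms = {}
    for source_code, condition, variables in rows:
        for var in variables:
            terms.setdefault(var, []).append((source_code, condition))
    return terms

# Lines of Tab. 5.3: (K(alpha_g), X_h, variables written in columns D and T).
# X1..X3 are placeholder labels for the conditions of lines 1..3 (an assumption).
rows = [
    ("00", "X1", ["d1"]),
    ("00", "X2", ["d1", "t2", "t1"]),
    ("00", "X3", ["d2"]),
]
functions = excitation_functions(rows)
for var in sorted(functions):
    sop = " + ".join(f"E({code})*{cond}" for code, cond in functions[var])
    print(f"{var} = {sop}")
```

The variable t3 is never written in the table, so it gets no product term, which corresponds to t3 = 0.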
Finally, the CMCU U5 is designed using hardware description languages. Con-
trol memory is realised using dedicated memory blocks of the FPGA. Figure 5.4
shows the logic schematic of the prototyped controller. Similarly to the technolog-
ical schematic of CMCUs with mutual memory presented in the previous chapter,
the combinational circuit is implemented with LUTs, the counter is formed from
LUTs and FDC flip-flops, and control memory is realised as BRAM. Additionally,
the register is implemented with FDCE blocks. Such elements are D-type flip-flops
with clock enable and asynchronous clear.
6 LUTs 2 FDCEsD
A
Reset
X
Q
3 LUTs
T
Y
Clk
3 FDCs
y0
Clr
Clk
CE
Clr
Clk
1 BRAM
Clr
Clk
Fig. 5.4: Technological structure of the CMCU U5
5.1.4. Summary
The design method of the CMCU with sharing codes was presented in this section.
The prototyping process of the demonstration CMCU U5 was shown to illustrate
the presented ideas. The implementation of this example controller in the
FPGA revealed two main facts that should be pointed out.
First, the realisation of the CMCU U5 took nine LUT elements. Detailed
experiments showed that the realisation of the controller as a CMCU with sharing
codes is almost always better than the realisation as a traditional FSM. However,
in this particular case, the FSM also requires nine LUT elements of the FPGA.
The achieved results were an inspiration for researching into new methods (modi-
fications) of the presented solution.
Another very important fact is the volume of the control memory of the
CMCU U5. There are 13 microinstructions, thus the minimum width of the address
is equal to 4. Each microinstruction contains seven microoperations (see Tab. 5.2).
Therefore, the expected memory volume is equal to SCM=7∗2^4=112 bits. On the other
hand, the CMCU U5 requires five bits to encode addresses of microinstructions,
thus the total volume of the memory is 7∗2^5=224 bits. Of course, such a small example
does not influence device resources because the area offered by one dedicated
memory block of the FPGA (in this case, the Xilinx Virtex-II Pro family) is much
larger. However, the volume of the dedicated memory block is limited, therefore
the method of sharing codes may be ineffective because control memory ought
to be decomposed. The application of the address converter solves this problem
(Wiśniewski et al., 2006b). The additional block determines the microinstruction
address, which is encoded with the minimum number of bits.
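The two memory volumes compared above can be checked with a short Python sketch. The function name control_memory_volume is an assumption introduced for illustration.

```python
def control_memory_volume(word_bits, address_bits):
    """Volume of control memory in bits: word width times the number of addressable words."""
    return word_bits * 2 ** address_bits

word_bits = 7  # y0, y1..y5 and yK, as in Tab. 5.2
minimal = control_memory_volume(word_bits, 4)  # 13 microinstructions fit in 4 address bits
shared = control_memory_volume(word_bits, 5)   # sharing codes use R1 + R2 = 5 address bits
print(minimal, shared)
```

One additional address bit doubles the volume, which is why the overhead of sharing codes matters once the memory approaches the capacity of a dedicated block.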
The next sections present new structures of the CMCU with sharing codes.
All methods are based on the traditional CMCU USC . The main goal of the
proposed methods is to reduce the number of logic elements that are used for the
implementation of the controller, although in the CMCU with an address converter
the reduction of the volume of control memory is performed as well.
5.2. CMCU with sharing codes and a function decoder
The CMCU USD with sharing codes and a function decoder is shown in Fig. 5.5.
The main idea is to reduce the number of outputs of the combinational circuit
thanks to the encoding of excitation functions for the counter and the register.
Therefore, the number of logic blocks required for the implementation of the
CMCU is reduced. The additional block – the function decoder – decodes and
sends proper values for the counter and for the register. The function decoder can
be implemented with dedicated memory blocks.
Fig. 5.5: CMCU with sharing codes and a function decoder (blocks CC, FD, CT, RG and CM; signals X, Z, T, D, Q, A, Y and y0)
5.2.1. Main idea of the method
In the CMCU USD, the set of variables that form the excitation function T for
the counter and the set of variables that form the excitation function D for the
register are encoded. Similarly to the CMCUs UFD and UOD shown in the previous
chapter, all inputs of the set of OLCs are encoded. The combinational circuit
generates the excitation function Z for the function decoder:
Z = f(X,Q). (5.8)
The function Z contains the encoded addresses Q of all inputs I in the set
of OLCs. They are further decoded by the circuit FD, which indicates the proper
code for the counter and for the register:
T = f(Z), (5.9)
D = f(Z). (5.10)
5.2.2. Synthesis of the CMCU with sharing codes and a function decoder
The method of synthesis of the CMCU USD includes the following steps:
1. Formation of the set of OLCs; encoding OLCs inputs and their
components. First, the set of OLCs is formed. Then, similarly to the
synthesis of the CMCU USC , all OLCs and their components are encoded.
Additionally, each input I is encoded with natural binary codes. Finally,
each input has the unique code K(Itj).
2. Addressing microinstructions and control memory formation. The
microinstruction address is determined as a concatenation of the code of the
OLC and its component, according to (5.1). Then the content of control
memory is formed.
3. Formation of the transition table of the CMCU and the forma-
tion of the excitation function for the function decoder. This ta-
ble is the basis for the formation of the function (5.8) and the synthesis
of the circuit CC. The table of transitions contains the following columns:
αg,K(αg),Xh,αt,Itj ,K(Itj),Z,h. Here
• αg is the OLC from which the transition is executed;
• K(αg) is the code of the OLC αg;
• Xh is the input signal causing the transition and is equal to the con-
junction of the elements from the set X;
• αt is the target OLC where the transition is executed;
• Itj is the input of the chain αj ∈ C to which the transition is executed;
• K(Itj) is the address of the input Itj ;
• Z is the set of variables that form the excitation function of the function
decoder;
• h is the number of transition (h=1, . . . , H).
Based on this table, the excitation function Z for the function decoder is
formed:
Z_r = \bigvee_{h=1}^{H} Z_{rh} E_g^h X_h \quad (r = 1, \dots, R_Z), \qquad (5.11)
where Z_{rh} is a Boolean variable that is equal to 1 if and only if the variable
Z_r is written in the h-th line of the table of transitions; E_g^h is the conjunction
of the internal variables Q_r ∈ Q corresponding to the code K(αg) from the
h-th line of the table of transitions.
4. Formation of the table of the function decoder. Based on the code
of each input, the function decoder generates excitation functions for the
counter and for the register. The table of the function decoder contains the
following columns Itj , K(Itj), αt,K(αt),bj ,K(bj),T ,D,m:
• Itj is the input of the chain αj ∈ C;
• K(Itj) is the code of the input Itj ;
• αt is the target OLC where the transition is executed;
• K(αt) is the code of the OLC αt;
• bj is the target component in the OLC αt where the transition is exe-
cuted;
• K(bj) is the code of the component bj ;
• T is the set of variables that form the excitation function for the counter;
• D is the set of variables that form the excitation function for the regis-
ter;
• m is the consecutive line in the truth-table of the function decoder
(m=1, . . . , M).
Based on this table, the circuit of the function decoder can be implemented
with dedicated memory blocks. Outputs of the circuit FD form excitation
functions T for the counter and D for the register. The volume of mem-
ory required for the realization of the function decoder can be calculated
as SFD=(R1+R2)∗2^RZ, where R1 is the size of the code generated by the
counter, R2 is the size of the code generated by the register, and RZ is the
number of variables used for OLC input encoding.
5. Implementation of the CMCU USD. The circuit of the function decoder
is realized as memory. Thus it can be implemented with dedicated memory
blocks of the target FPGA. The remaining blocks of the CMCU USD are
implemented in the same manner as in the case of the CMCU USC : CC,
CT and RG are realized with logic blocks, while CM is implemented using
dedicated memory blocks.
5.2.3. Example of the synthesis of the CMCU with sharing codes
and a function decoder
To illustrate the synthesis method of the CMCU U6 with sharing codes and a
function decoder, a controller described by the flow-chart Γ2 (Fig. 5.2) will be
designed. As was already shown in the previous section, there are M2=3 OLCs that
have MZ=4 inputs: I11=b1, I12=b4, I22=b7, I13=b8. OLCs are encoded using a
natural binary code, with R2=⌈log2 M2⌉=2 bits: K(α1)=00, K(α2)=01, K(α3)=10.
OLC components are encoded with R1=3 bits, as presented in Tab. 5.1. According
to the synthesis rules shown in the previous subsection, all inputs are encoded
using RZ=⌈log2 MZ⌉=2 bits: K(I11)=00, K(I12)=01, K(I22)=10, K(I13)=11.
In the next step, addresses of microinstructions are encoded and the formation
of the control memory of the CMCU U6 is carried out. This stage is performed
exactly in the same manner as presented during the synthesis of the CMCU U5
(see Tab. 5.2).
In the third step of the designing process, the table of transitions is formed.
Distinct from the table of transitions of the CMCU U5, now inputs of OLCs are
used as the destination of transitions. The transition table of the CMCU U6 is
presented as Tab. 5.4.
Tab. 5.4: Transition table of the CMCU U6
αg | K(αg) (q2 q1) | Xh    | αt | Ijt | K(Ijt) (z2 z1) | Z     | h
α1 | 0 0           | x1    | α2 | I12 | 0 1            | z1    | 1
α1 | 0 0           | x1 x2 | α2 | I22 | 1 0            | z2    | 2
α1 | 0 0           | x1 x2 | α3 | I13 | 1 1            | z2 z1 | 3
The transition table is the basis for the formation of the excitation function
Z for the function decoder. This function includes two variables Z={z1,z2}:
z2 = q2· q1·x1,
z1 = q2· q1· (x1 + x2). (5.12)
The function Z keeps the code of the proper input that is selected by the
combinational circuit. This code is further decoded by the block FD. Therefore,
the function decoder generates the address of microinstruction which is a concate-
nation of the proper code K(αt) of the OLC and the code of its component K(bj).
Table 5.5 presents the table of the function decoder for the CMCU U6.
Tab. 5.5: Table of the function decoder of the CMCU U6
Itj | K(Itj) (z2 z1) | αt | K(αt) (q2 q1) | bj | K(bj) (t3 t2 t1) | D  | T     | h
I11 | 0 0            | α1 | 0 0           | b1 | 0 0 0            | –  | –     | 1
I12 | 0 1            | α2 | 0 1           | b4 | 0 0 0            | d1 | –     | 2
I22 | 1 0            | α2 | 0 1           | b7 | 0 1 1            | d1 | t2 t1 | 3
I13 | 1 1            | α3 | 1 0           | b8 | 0 0 0            | d2 | –     | 4
The circuit of the function decoder is usually implemented using dedicated
memory blocks. However, it can be also realised with logic elements of the FPGA.
In this case, the functions T and D are formed:
t_3 = 0,
t_2 = t_1 = z_2 \cdot \bar{z}_1,
d_2 = z_2 \cdot z_1,
d_1 = \bar{z}_2 \cdot z_1 + z_2 \cdot \bar{z}_1. \qquad (5.13)
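When the function decoder is realised as memory, Tab. 5.5 is simply its content. The following Python sketch treats the table as a small ROM and lists, for every output bit, the input codes for which that bit is set. The names FD_ROM, function_decoder and minterms are assumptions introduced for illustration.

```python
# Content of Tab. 5.5: K(I) = z2 z1 -> (K(alpha_t) = d2 d1, K(b_j) = t3 t2 t1).
FD_ROM = {
    "00": ("00", "000"),  # I11 -> alpha_1, b1
    "01": ("01", "000"),  # I12 -> alpha_2, b4
    "10": ("01", "011"),  # I22 -> alpha_2, b7
    "11": ("10", "000"),  # I13 -> alpha_3, b8
}

def function_decoder(z):
    """Return the stored word d2 d1 t3 t2 t1 for the input code z = z2 z1."""
    d, t = FD_ROM[z]
    return d + t

def minterms(bit_index):
    """Input codes z2 z1 for which bit `bit_index` of the word d2 d1 t3 t2 t1 equals 1."""
    return [z for z in sorted(FD_ROM) if function_decoder(z)[bit_index] == "1"]

# Memory volume S_FD = (R1 + R2) * 2^RZ for this example.
R1, R2, RZ = 3, 2, 2
fd_volume = (R1 + R2) * 2 ** RZ

for name, index in [("d2", 0), ("d1", 1), ("t3", 2), ("t2", 3), ("t1", 4)]:
    print(name, minterms(index))
print("S_FD =", fd_volume, "bits")
```

Each list of input codes is the set of minterms of the corresponding function when the decoder is built from logic elements instead of memory.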
Now the CMCU U6 may be prototyped using HDLs. Both control memory
and the function decoder are implemented with dedicated memory blocks, while
the combinational circuit, the counter and the register are realised with logic blocks
of the FPGA. Figure 5.6 shows the logic schematic of the controller.
4 LUTs 2 FDCEsD
A
Reset
X
Q
3 LUTs
T
Y
Clk
3 FDCs
y0
Clr
Clk
CE
Clr
Clk
1 BRAM
Clr
Clk
1 BRAM
Clr
Clk
Fig. 5.6: Technology structure of the CMCU U6
70 5. Compositional microprogram control units with sharing codes
5.2.4. Summary
The CMCU with sharing codes and a function decoder was introduced in this
section. The additional block reduces the number of variables of logic functions
that are formed by the combinational circuit. Experiments proved the effectiveness
of the proposed method. The CMCU USD consumes on average 6% fewer LUT
elements than the CMCU USC .
5.3. CMCU with an address converter
The method of sharing codes makes sense only if the size of codes generated by the
register RG and by the counter CT is equal to the width of the microinstruction
address (Barkalov, 2002). Then the following condition is fulfilled:
R1 +R2 = R3. (5.14)
In most cases, the total number of bits generated by the register and by the
counter exceeds the width of the microinstruction address. The condition (5.14)
is violated because R1 + R2 > R3, and the volume of control memory grows
drastically. The minimum volume of the memory can be calculated as
SCM = (N + 2) ∗ 2^R3 , (5.15)
where SCM means the total volume of control memory, N + 2 counts the total
number of microoperations kept in control memory (N is the number of micro-
operations while two additional bits are formed by y0 and yK), and R3 defines
the minimum width of the address. It is clear that each additional bit in the
microinstruction address doubles the total volume of the memory.
Sections 5.3 and 5.4 show a new synthesis idea of the CMCU with sharing
codes. The method is based on the application of the additional block (address
converter) in the CMCU structure (Fig. 5.7). Such an approach makes sense only
if the condition (5.14) is violated and the total number of bits generated by the
register and by the counter is greater than the width of the address of control
memory.
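Whether the condition (5.14) holds for a given set of OLCs can be checked with a short Python sketch. The names bits_for and code_widths are assumptions introduced here, and the sketch assumes that every microinstruction belongs to exactly one OLC, so the number of microinstructions equals the sum of the chain lengths.

```python
def bits_for(count):
    """Minimum number of bits for `count` distinct natural binary codes."""
    return max(1, (count - 1).bit_length())

def code_widths(chain_lengths):
    """Return (R1, R2, R3) for OLCs with the given numbers of components."""
    r1 = bits_for(max(chain_lengths))  # counter: position inside the longest OLC
    r2 = bits_for(len(chain_lengths))  # register: code of the OLC
    r3 = bits_for(sum(chain_lengths))  # minimum width of the microinstruction address
    return r1, r2, r3

# OLCs of the flow-chart Gamma_2 have 3, 4 and 6 components.
r1, r2, r3 = code_widths([3, 4, 6])
condition_holds = (r1 + r2 == r3)
print(r1, r2, r3, condition_holds)
```

For the flow-chart Γ2 the condition is violated, so this controller is a candidate for the address converter.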
Fig. 5.7: Structure of the CMCU with an address converter (blocks CC, CT, RG, CA and CM; signals X, T, D, Q, V, A, Y and y0)
5.3.1. Main idea of the method
Let K(αg) be the state code of the register and K(bt) the state code of the counter.
According to (5.1), the microinstruction address A(bt) is calculated as the concate-
nation of these codes:
A(bt) = K(αg) ∗K(bt).
In the CMCU UCA, the address generated by the register and by the counter
is converted by the address converter.
Now the circuit CC forms a system of functions:
T = f(X,Q), (5.16)
D = f(X,Q), (5.17)
and the circuit CA converts the generated addresses, forming the new function V :
V = V (Q,X). (5.18)
Here, V = {v1, . . . , vR3} is the set of addresses of control memory.
The presented solution makes it possible to combine positive features of the traditional
CMCU with a base structure (UBS) and with sharing codes (USC) such as
• minimal number of inputs and outputs of the combinational circuit CC (com-
pared with USC);
• minimal width of an address of control memory (in comparison with UBS).
It is clear that the application of a given method makes sense only if the
implementation of the CMCU with an additional address converter requires fewer
memory blocks of the target FPGA than CMCUs based on the standard structure
USC .
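The conversion idea can be sketched in Python for the flow-chart Γ2. This is only an illustration under two assumptions that are not taken from the text: the converter is indexed by the concatenated code produced by the register and the counter, and natural addresses are assigned consecutively in chain order. The names bits_for and address_converter are hypothetical.

```python
def bits_for(count):
    """Minimum number of bits for `count` distinct natural binary codes."""
    return max(1, (count - 1).bit_length())

def address_converter(chain_lengths):
    """Map each code K(alpha_g) * K(b_t) to a consecutive natural address of R3 bits."""
    r1 = bits_for(max(chain_lengths))
    r2 = bits_for(len(chain_lengths))
    r3 = bits_for(sum(chain_lengths))
    table = {}
    natural = 0  # assumed numbering: chain by chain, component by component
    for g, length in enumerate(chain_lengths):
        for t in range(length):
            shared = format(g, f"0{r2}b") + format(t, f"0{r1}b")
            table[shared] = format(natural, f"0{r3}b")
            natural += 1
    return table

# OLCs of Gamma_2 have 3, 4 and 6 components: 5-bit codes become 4-bit addresses.
table = address_converter([3, 4, 6])
for shared, natural in table.items():
    print(shared, "->", natural)
```

With such a table the control memory needs only R3 address bits, at the cost of an additional block between the register, the counter and the memory.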
5.3.2. Synthesis of the CMCU with an address converter
The design method of the CMCU with sharing codes and an address converter
includes the following steps:
1. Formation of the set of OLCs; encoding OLCs and their compo-
nents. This step is executed in the same manner as presented in the previous
sections.
2. Natural addressing of microinstructions and the formation of the
control memory content. Here microinstructions are simply encoded with
natural binary codes. Then the content of control memory is formed. The
volume of control memory can be calculated as $S_{CM} = (N+2) \cdot 2^{R_1+R_2}$.
72 5. Compositional microprogram control units with sharing codes
3. Formation of the transition table of the CMCU. According to (5.16)
and (5.17), this table is the basis for the formation of excitation functions
for the counter and for the register. Here outputs of the register (function
Q) and of the counter (function T) are inputs of the address converter.
This table contains the following columns: αg, K(αg), αt, K(αt), $I_t^j$, $A(I_t^j)$,
Xh, T, D, h, where
• αg ∈ C is the initial chain of the flow-chart Γ;
• K(αg) is the code of the OLC αg ∈ C;
• αt ∈ C is the target OLC of the transition;
• K(αt) is the code of the target OLC of the transition;
• $I_t^j$ is the j-th input of the OLC αt ∈ C to which the transition from the
output Og of the OLC αg ∈ C leads;
• $A(I_t^j)$ is the address of the component in the OLC αt corresponding to
the input $I_t^j$;
• Xh is the input signal causing the transition ⟨Og, $I_t^j$⟩; it is equal to
the conjunction of the elements from the set X;
• T is the set of variables that form the excitation function for the counter
CT (formed with the code $A(I_t^j)$);
• D is the set of variables that form the excitation function for the register
RG (formed with the code K(αt));
• h is the number of the transition (h=1, . . . , H).
4. Formation of the table of the address converter. In this step the
truth-table for the address converter is created. Based on functions generated
by the counter and by the register, the microinstruction address is formed.
The table contains the following columns: αg, K(αg), bt, K(bt), At, Vt, m.
Here
• αg ∈ C is the chain of the flow-chart Γ;
• K(αg) is the code of the OLC αg ∈ C;
• bt ∈ B is the operational block of the flow-chart Γ;
• K(bt) is the code of the block bt ∈ B;
• At is the address of the microinstruction encoded in a natural binary
code;
• Vt is the column containing the variables vr ∈ V that are equal to 1 in
the address At;
• m is the consecutive line in the truth-table of the address converter.
5. Design and implementation of the logic circuit of the CMCU UCA.
The circuit CC is implemented using the systems (5.16) and (5.17), which
are formed from the table of transitions. Depending on the demands, the
address converter may be implemented using either logic elements or memory
blocks of the FPGA. The volume of the memory that is required for address
converter realization can be calculated as $S_{CA} = (R_1+R_2) \cdot 2^{R_3}$.
5.3.3. Example of the synthesis of the CMCU with an address converter
The synthesis method of the CMCU with an address converter will be illustrated
with an example.
Fig. 5.8: Flow-chart Γ3
The flow-chart Γ3 shown in Fig. 5.8 describes a hypothetical CMCU U7. There
are M2=3 OLCs in the set of operational linear chains, C={α1, . . . , α3}, where
α1=〈b1, . . . , b3〉, α2=〈b4, . . . , b7〉 and α3=〈b8, . . . , b13〉. The OLC flow-chart of the
CMCU U7 is illustrated in Fig. 5.9.
Fig. 5.9: OLC flow-chart of the CMCU U7
The longest operational linear chain is α3, and it contains M1=6 vertices.
Therefore, R1=⌈log2 M1⌉=3 bits are required to implement the counter.
There are M2=3 OLCs, thus R2=⌈log2 M2⌉=2 variables will be used for encoding
the internal states of the controller. The CMCU U7 contains M3=13 operational
vertices. This means that microinstructions can be addressed with a minimum
of R3=⌈log2 M3⌉=4 bits. On the other hand, the total width of an address
generated by the counter and by the register is equal to R1+R2=5. Therefore,
the condition (5.14) is violated (R1+R2>R3) and the application of the address
converter makes sense.
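The arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `code_width` is hypothetical, and the values of M1, M2 and M3 are those of the example.

```python
def code_width(count: int) -> int:
    """Minimum number of bits needed to give `count` items distinct binary codes.

    Equivalent to ceil(log2(count)) for count >= 2, computed with integers only.
    """
    return max(1, (count - 1).bit_length())


# Parameters of the example CMCU U7
M1 = 6   # vertices in the longest OLC
M2 = 3   # number of OLCs
M3 = 13  # number of operational vertices

R1 = code_width(M1)  # width of the counter
R2 = code_width(M2)  # width of the register
R3 = code_width(M3)  # minimum width of the microinstruction address

# The address converter is worth considering only if R1 + R2 > R3
converter_makes_sense = R1 + R2 > R3

print(R1, R2, R3, converter_makes_sense)
```

The integer `bit_length` form is used instead of a floating-point logarithm so that exact powers of two are handled without rounding issues.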
According to the design rules, OLCs and their components should be encoded.
Let us use natural binary codes. Table 5.6 illustrates the encoding of OLCs and
their components.
In contrast to the traditional method with sharing codes, the address of each
microinstruction is encoded with R3=4 bits in a natural binary code. Therefore,
the microinstruction corresponding to the vertex b1 has the address A(b1)=0000.
Similarly, the address of the microinstruction executed in the vertex b9 is encoded
as A(b9)=1000, and so on. Now, the content of control memory can be formed
(Tab. 5.7).
In the next step, the transition table of the CMCU should be formed. This
table is the basis for excitation functions for the counter and for the register. Table
5.8 presents the transition table of the CMCU U7.
Tab. 5.6: Encoding of CMCU U7 OLCs and their components
αg   K(αg)   bt    K(bt)
α1   00      b1    000
α1   00      b2    001
α1   00      b3    010
α2   01      b4    000
α2   01      b5    001
α2   01      b6    010
α2   01      b7    011
α3   10      b8    000
α3   10      b9    001
α3   10      b10   010
α3   10      b11   011
α3   10      b12   100
α3   10      b13   101
Tab. 5.7: Content of the control memory of the CMCU U7
Vertex  Address  y0 y1 y2 y3 y4 y5 yK  Comment
b1      0000     0  1  1  0  0  0  0   $I_1^1$
b2      0001     0  0  0  1  1  0  0   –
b3      0010     1  0  0  0  0  1  0   $O_1$
b4      0011     0  0  1  0  0  1  0   $I_2^1$
b5      0100     0  1  0  0  0  0  0   –
b6      0101     0  0  0  1  0  1  0   –
b7      0110     1  1  0  0  1  0  1   $I_2^2$, $O_2$
b8      0111     0  1  0  1  0  0  0   $I_3^1$
b9      1000     0  0  0  0  0  1  0   –
b10     1001     0  0  0  0  1  0  0   –
b11     1010     0  0  1  0  0  0  0   –
b12     1011     0  0  0  1  0  1  0   –
b13     1100     1  1  1  0  0  0  1   $I_3^2$, $O_3$
Tab. 5.8: Transition table of the CMCU U7
αg   K(αg)   Xh                 αt   K(αt)   bj    K(bj)      D    T       h
     q2 q1                           d2 d1         t3 t2 t1
α1   0  0    x1                 α2   0  1    b4    0  0  0    d1   –       1
α1   0  0    ¬x1·x2             α2   0  1    b7    0  1  1    d1   t2 t1   2
α1   0  0    ¬x1·¬x2·x3         α3   1  0    b8    0  0  0    d2   –       3
α1   0  0    ¬x1·¬x2·¬x3·x4     α3   1  0    b13   1  0  1    d2   t3 t1   4
α1   0  0    ¬x1·¬x2·¬x3·¬x4    α1   0  0    b1    0  0  0    –    –       5
The transition table is the basis for the formation of the excitation functions
D for the register and the functions T for the counter, according to (5.19):
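$$
\begin{aligned}
d_2 &= \bar{q}_2 \bar{q}_1 \bar{x}_1 \bar{x}_2 (x_3 + \bar{x}_3 x_4),\\
d_1 &= \bar{q}_2 \bar{q}_1 (x_1 + \bar{x}_1 x_2),\\
t_3 &= \bar{q}_2 \bar{q}_1 \bar{x}_1 \bar{x}_2 \bar{x}_3 x_4,\\
t_2 &= \bar{q}_2 \bar{q}_1 \bar{x}_1 x_2,\\
t_1 &= \bar{q}_2 \bar{q}_1 \bar{x}_1 (x_2 + \bar{x}_2 \bar{x}_3 x_4).
\end{aligned}
\tag{5.19}
$$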
At the next stage, the truth table for the address converter is prepared. An
address of a microinstruction is formed based on the OLC code and its component
(Tab. 5.9).
Tab. 5.9: Table of the address converter
αg   K(αg)   bt    K(bt)   At     Vt         m
α1   00      b1    000     0000   –          1
α1   00      b2    001     0001   v1         2
α1   00      b3    010     0010   v2         3
α2   01      b4    000     0011   v2 v1      4
α2   01      b5    001     0100   v3         5
α2   01      b6    010     0101   v3 v1      6
α2   01      b7    011     0110   v3 v2      7
α3   10      b8    000     0111   v3 v2 v1   8
α3   10      b9    001     1000   v4         9
α3   10      b10   010     1001   v4 v1      10
α3   10      b11   011     1010   v4 v2      11
α3   10      b12   100     1011   v4 v2 v1   12
α3   10      b13   101     1100   v4 v3      13
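The construction of the address converter table can be sketched in Python as follows. The data structure and the function name `build_converter` are illustrative assumptions; the OLC contents and code widths are taken from the example. Each pair (OLC code, component code) is mapped to the next free natural binary address.

```python
# OLCs of the example, keyed by the OLC code K(αg); blocks are listed in chain order.
olcs = {
    "00": ["b1", "b2", "b3"],
    "01": ["b4", "b5", "b6", "b7"],
    "10": ["b8", "b9", "b10", "b11", "b12", "b13"],
}
R1, R3 = 3, 4  # widths of the component code and of the memory address


def build_converter(olcs, r1, r3):
    """Map (K(αg), K(bt)) to the natural binary address At of the block."""
    table = {}
    address = 0
    for chain_code, blocks in olcs.items():
        for position, _block in enumerate(blocks):
            block_code = format(position, f"0{r1}b")
            table[(chain_code, block_code)] = format(address, f"0{r3}b")
            address += 1
    return table


def active_variables(address):
    """List the variables v_r that are equal to 1 in the address (v1 is the LSB)."""
    width = len(address)
    return [f"v{width - i}" for i, bit in enumerate(address) if bit == "1"]


converter = build_converter(olcs, R1, R3)
for (chain_code, block_code), address in converter.items():
    print(chain_code, block_code, address, " ".join(active_variables(address)) or "-")
```

Only 13 of the 32 possible input combinations are defined; the remaining ones never occur and can be treated as don't-care entries.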
Finally, the CMCU U7 is designed using hardware description languages. Con-
trol memory and the address converter are realised with dedicated memory blocks
of the FPGA. Figure 5.10 shows the logic schematic of the prototyped controller.
There are 13 LUTs required in order to implement the controller.
The main problem of the CMCU U7 is the realisation of the address converter,
which is synchronous. Such a module has to form a proper address based on codes
delivered from the counter and from the register. Both blocks are triggered by the
rising edge of a clock signal Clk. Additionally, the address ought to be prepared
before the falling edge of Clk, which synchronizes control memory. Therefore, to
ensure proper functionality of the CMCU, an additional clock signal Clk2 was
introduced. The address converter should be triggered between the rising and
falling edges of Clk.
[Figure: block diagram of the prototyped controller built from blocks of 10 LUTs, 3 LUTs, 2 FDCE flip-flops, 3 FDC flip-flops and two BRAMs; signals X, Q, D, T, A, V, Y, y0, Reset, Clk and Clk2]
Fig. 5.10: Technological structure of the CMCU U7
5.3.4. Summary
The CMCU with an address converter was presented in this section. The applica-
tion of the additional block makes sense only if the total size of codes generated
by the counter and by the register exceeds the width of the address of control
memory. The results of experiments showed that the proposed method reduces
the number of required dedicated memory blocks of an FPGA on average
by 46% in comparison to the traditional CMCU with sharing codes. It should also
be emphasized that the number of other resources of an FPGA (LUTs, flip-flops,
slices) remains the same.
5.4. CMCU with an address converter and a function decoder
This section presents the last method of the synthesis of the CMCU with sharing
codes that is proposed in the dissertation – the CMCU UCD with an address converter
and a function decoder. Such a controller combines two ideas presented in
the previous sections. The application of the address converter makes it possible to minimize
the volume of control memory if the condition (5.14) is violated, while the
additional function decoder reduces the number of logic elements required for the implementation
of the CMCU.
5.4.1. Main idea of the method
The CMCU UCD with an address converter and a function decoder is shown in
Fig. 5.11. The excitation functions T for the counter and D for the register are
encoded with the minimum number of bits. Now the combinational circuit CC
generates the excitation function Z for the function decoder:
Z = f(X,Q). (5.20)
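[Figure: block diagram with the combinational circuit CC, function decoder FD, counter CT, register RG, address converter CA and control memory CM; signals X, Z, T, D, Q, A, V, Y and y0]
Fig. 5.11: CMCU with an address converter and a function decoder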
The function Z contains the encoded addresses Q of all inputs I in the set
of OLCs. They are further decoded by the circuit FD, which indicates the proper
code for the counter and for the register:
T = f(Z), (5.21)
D = f(Z). (5.22)
Finally, the address indicated by the counter and by the register is converted
via the circuit CA:
V = f(T,D). (5.23)
5.4.2. Synthesis of the CMCU with an address converter
and a function decoder
The synthesis process of the CMCU UCD is a combination of the design flows of
the CMCUs USD and UCA. Therefore, only the most important stages will be
presented (more details were shown in the previous sections).
The design method of the CMCU with an address converter and a function
decoder includes the following steps:
1. Formation of the set of OLCs; encoding OLC inputs and OLC components.
First, the set of OLCs is formed. Then, similarly to the synthesis of
the CMCU USC, all OLCs and their components are encoded. Additionally,
each input I is encoded with a natural binary code, so that each input has
a unique code $K(I_t^j)$.
2. Natural addressing of microinstructions and the formation of the
control memory content. Here microinstructions are encoded with natu-
ral binary codes. Then the content of control memory is formed.
3. Formation of the transition table of the CMCU UCD. According to
(5.20), this table is the basis for the formation of the excitation function for
the function decoder.
4. Formation of the table of the function decoder. Based on the code of
each OLC input, the function decoder generates proper excitation functions
for the counter and for the register.
5. Formation of the table of the address converter. In this step, the
truth-table for the address converter is created. Based on functions generated
by the counter and by the register, an address of the microinstruction is
formed.
6. Design and implementation of the logic circuit of the CMCU UCD.
Three blocks (CC, CT and RG) of the CMCU are implemented with logic
elements of the FPGA, while the circuits CM, FD and CA are realized with
dedicated memory blocks.
5.4.3. Example of the synthesis of the CMCU with an address
converter and a function decoder
To illustrate the synthesis method of the CMCU U8 with an address converter
and a function decoder, the controller described by the flow-chart Γ3 (Fig. 5.8) will
be designed once more. There are M2=3 OLCs. The encoding of OLCs and their
components is performed in the same manner as in the previous section (see Tab.
5.6). There are MZ=5 OLC inputs: $I_1^1$=b1, $I_2^1$=b4, $I_2^2$=b7, $I_3^1$=b8 and $I_3^2$=b13.
Therefore, RZ=⌈log2 MZ⌉=3 bits are required to encode all inputs: $K(I_1^1)$=000,
$K(I_2^1)$=001, $K(I_2^2)$=010, $K(I_3^1)$=011, $K(I_3^2)$=100.
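The encoding of the OLC inputs can be sketched in a few lines of Python. The inputs are identified here by the blocks they correspond to in Tab. 5.7 (b1, b4, b7, b8 and b13); the variable names are illustrative only.

```python
# Blocks that act as OLC inputs, in the order in which they are encoded
input_blocks = ["b1", "b4", "b7", "b8", "b13"]

# Number of bits needed to distinguish all inputs: ceil(log2(MZ))
MZ = len(input_blocks)
RZ = max(1, (MZ - 1).bit_length())

# Natural binary code assigned to every input
input_codes = {block: format(index, f"0{RZ}b")
               for index, block in enumerate(input_blocks)}

for block, code in input_codes.items():
    print(block, code)
```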
At the second stage, microinstructions are addressed with a natural binary
code. Then the content of control memory is formed. This step is executed
in the same way as in the synthesis method of the CMCU U7 with an address converter
(see Tab. 5.7).
Next, the transition table of the CMCU U8 is formed. The table is the basis
for the excitation function for the module FD. Table 5.10 presents the content of
the table of transitions of the CMCU U8.
Tab. 5.10: Transition table of the CMCU U8
αg   K(αg)   Xh                 αt   $I_t^j$   $K(I_t^j)$   Z       h
     q2 q1                                     z3 z2 z1
α1   0  0    x1                 α2   b4        0  0  1      z1      1
α1   0  0    ¬x1·x2             α2   b7        0  1  0      z2      2
α1   0  0    ¬x1·¬x2·x3         α3   b8        0  1  1      z2 z1   3
α1   0  0    ¬x1·¬x2·¬x3·x4     α3   b13       1  0  0      z3      4
α1   0  0    ¬x1·¬x2·¬x3·¬x4    α1   b1        0  0  0      –       5
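From the table of transitions, the excitation function Z for the function decoder
is formed. The function consists of |Z|=3 variables:
$$
\begin{aligned}
z_3 &= \bar{q}_2 \bar{q}_1 \bar{x}_1 \bar{x}_2 \bar{x}_3 x_4,\\
z_2 &= \bar{q}_2 \bar{q}_1 \bar{x}_1 (x_2 + \bar{x}_2 x_3),\\
z_1 &= \bar{q}_2 \bar{q}_1 (x_1 + \bar{x}_1 \bar{x}_2 x_3).
\end{aligned}
\tag{5.24}
$$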
The truth table of the function decoder is prepared at the next stage. Based
on the function Z, the module FD indicates proper values for the counter and for
the register (Tab. 5.11).
Tab. 5.11: Truth table of the function decoder of the CMCU U8
$I_t^j$   $K(I_t^j)$   αt   K(αt)   bj    K(bj)      D    T       h
          z3 z2 z1          d2 d1         t3 t2 t1
$I_1^1$   0  0  0      α1   0  0    b1    0  0  0    –    –       1
$I_2^1$   0  0  1      α2   0  1    b4    0  0  0    d1   –       2
$I_2^2$   0  1  0      α2   0  1    b7    0  1  1    d1   t2 t1   3
$I_3^1$   0  1  1      α3   1  0    b8    0  0  0    d2   –       4
$I_3^2$   1  0  0      α3   1  0    b13   1  0  1    d2   t3 t1   5
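The cooperation of the function decoder and the address converter can be sketched as two chained look-up tables. This is a simplified functional model only: it ignores clocking, and it assumes that the values D and T loaded into the register and the counter appear unchanged at the inputs of the address converter. The dictionary and function names are hypothetical; the table contents follow Tab. 5.11 and Tab. 5.9.

```python
# Function decoder FD (Tab. 5.11): code of the OLC input -> (D, T)
function_decoder = {
    "000": ("00", "000"),
    "001": ("01", "000"),
    "010": ("01", "011"),
    "011": ("10", "000"),
    "100": ("10", "101"),
}

# Address converter CA (Tab. 5.9): (OLC code, component code) -> address
address_converter = {
    ("00", "000"): "0000", ("00", "001"): "0001", ("00", "010"): "0010",
    ("01", "000"): "0011", ("01", "001"): "0100", ("01", "010"): "0101",
    ("01", "011"): "0110",
    ("10", "000"): "0111", ("10", "001"): "1000", ("10", "010"): "1001",
    ("10", "011"): "1010", ("10", "100"): "1011", ("10", "101"): "1100",
}


def entry_address(z):
    """Return the control memory address reached for the input code z."""
    register_code, counter_code = function_decoder[z]
    return address_converter[(register_code, counter_code)]


for z in function_decoder:
    print(z, "->", entry_address(z))
```

The addresses obtained in this way (0000, 0011, 0110, 0111 and 1100) are those of the vertices b1, b4, b7, b8 and b13 in Tab. 5.7.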
At the last step, the content of the address converter should be determined.
The truth table for the module CA is exactly the same as in the previous section
(Tab. 5.9). Finally, the CMCU U8 can be designed. Now three blocks – the
function decoder, the address converter and control memory – are implemented
with dedicated memory blocks of an FPGA. Figure 5.12 shows the technological
structure of the CMCU U8.
[Figure: block diagram of the prototyped controller built from blocks of 7 LUTs, 3 LUTs, 2 FDCE flip-flops, 3 FDC flip-flops and three BRAMs; signals X, Q, D, T, A, V, Y, y0, Reset, Clk and Clk2]
Fig. 5.12: Technological structure of the CMCU U8
The implementation of the CMCU U8 requires 10 LUTs. Therefore, the num-
ber of such elements was reduced by 23% in comparison to the CMCU U7 with an
address converter.
5.4.4. Summary
The design method of the CMCU UCD with an address converter and a function
decoder was presented in this section. The reduction of the number of LUTs is
achieved thanks to the encoding of excitation functions for the counter and for the
register. Additionally, the application of an address converter keeps the
volume of control memory at its minimum. It should be pointed out that the performed
experiments proved the effectiveness of the proposed method. The CMCU UCD
reduces the number of logic blocks of the target FPGA on average by
50% in comparison with the traditional FSM.
5.5. Conclusions
Four CMCU design methods were presented in this chapter. The first one –
the CMCU USC with sharing codes – was initially proposed by Alexander Barkalov
and oriented towards CPLDs (Barkalov, 2002). Therefore, it had to be adapted
to the FPGA. All other synthesis methods are based on the CMCU with sharing
codes. The aim of all methods is to reduce either the number of logic blocks or – in
the case of the application of the address converter – the control memory volume.
In the CMCU USD with sharing codes and a function decoder, excitation
functions for the counter and for the register are encoded with a minimum num-
ber of bits. These functions are further decoded by the function decoder. The
additional circuit is realized with dedicated memory blocks of an FPGA. Therefore,
the implementation of the CMCU USD requires fewer logic blocks than the
CMCU USC. Detailed experiments (presented in Chapter 8) proved the effectiveness
of the application of the function decoder. The realisation of the controller as
a CMCU USD reduces the number of required logic blocks of an FPGA on average
by 13%.
The two remaining synthesis methods refer to the application of an address
converter. The additional circuit is in charge of keeping the minimum volume
of control memory. The application of the address converter makes sense only if
the total length of codes generated by the counter and by the register exceeds
the minimum width of microinstruction addresses. The conducted investigations
showed that in the case of controllers where control memory ought to be decom-
posed (their volume exceeds the volume offered by dedicated memory blocks of
the FPGA), the CMCU UCA with an address converter reduces the number of
dedicated memory blocks of the FPGA on average by 46% in comparison to the
CMCU USC .
The ideas presented in the CMCUs USD and UCA were combined in the last
synthesis method of the compositional microprogram control unit with an address
converter and a function decoder. In the CMCU UCD, both the address converter
and the function decoder are applied. The results of experiments showed that
the CMCU with an address converter and a function decoder requires the fewest
resources of an FPGA out of all methods where the operation of sharing codes was
applied. It should be pointed out that the presented method reduces
the number of logic blocks of the target device on average by 49% in comparison
to the FSM. On the other hand, the realisation of the controller as a CMCU UCD
requires more dedicated memory blocks than the traditional automaton (by 86%).
Therefore, the proposed synthesis method is the best solution if the volume of
the control memory of the CMCU does not exceed the total volume offered by the
dedicated memory blocks of an FPGA. In other cases, one of the synthesis methods
of the CMCU with mutual memory presented in the previous chapter should be
used instead.
Chapter 6
PARTIAL RECONFIGURATION OF
COMPOSITIONAL MICROPROGRAM CONTROL
UNITS IMPLEMENTED IN THE FPGA
This chapter deals with partial reconfiguration of compositional microprogram
control units implemented in the FPGA. In traditional prototyping methods of the
CMCU, the content of control memory is realised with logic elements of the FPGA.
However, the latest FPGAs additionally offer blocks of dedicated memory that are
integrated with the device. Therefore, the content of control memory can be easily
implemented with dedicated memories of the FPGA (Wiśniewski, 2005b; Barkalov
et al., 2005j). The functionality of a CMCU prepared in such a way can easily
be changed. This chapter introduces a new idea of partial implementation of the
CMCU: designers are able to modify only a few microinstructions of the CMCU.
In the case of traditional implementation, the whole content of the FPGA has
to be replaced. Partial reconfiguration of the CMCU makes it possible to change only the
content of control memory while the rest of the system is not modified.
Partial reconfiguration of FPGA devices is a relatively new idea. Therefore,
not all programmable devices offer the reconfiguration of part of their resources.
It is supported in particular by devices from Xilinx, Altera and Atmel.
The XC2VP30 device (Virtex-II Pro family) by Xilinx was selected as the
base FPGA for further analysis (Xilinx, 2007). Such a device permits partial
reconfiguration and it is available at the University of Zielona Góra. All presented
FPGA structures, research and performed experiments refer to this FPGA.
6.1. Introduction to partial reconfiguration of FPGA devices
This section introduces partial reconfiguration of FPGA devices. From the viewpoint
of the functionality of the design, partial reconfiguration can be divided into two
groups:
• Dynamic partial reconfiguration – also known as active partial reconfiguration –
allows part of the device to be changed while the rest of the FPGA is
still running.
• Static partial reconfiguration – the device is not active during the recon-
figuration process. While the partial data is sent into the FPGA, the rest
of the device is stopped (in the shutdown mode) and brought up after the
configuration is completed.
There are two styles of partial reconfiguration of FPGA devices by Xilinx:
module-based and difference-based.
• Module-based partial reconfiguration makes it possible to reconfigure distinct modular
parts of the design. To ensure communication across reconfigurable module
boundaries, special bus macros have to be prepared. They work as fixed
routing bridges that connect the reconfigurable module with the remaining
part of the design. Module-based partial reconfiguration requires following
a set of specific guidelines during the stage of design specification, detailed
in (Xilinx, 2004). Finally, for each reconfigurable module of the design, a
separate bit-stream is created. Such a bit-stream is used to perform partial
reconfiguration of the FPGA.
• Difference-based partial reconfiguration can be used when a small change
is made in the design. It is especially useful in the case of changing LUT
equations or the dedicated memory blocks content. The partial bit-stream
contains only information about differences between the current design struc-
ture (that resides in the FPGA) and the new content of the FPGA. There
are two ways of difference-based reconfiguration, known as front-end and
back-end. The first one is based on the modification of the design in
hardware description languages (HDLs). Such a solution
requires the synthesis and implementation processes to be repeated in full.
Back-end difference-based partial reconfiguration allows changes to be made at the
implementation stage of the prototyping flow. Therefore, there is no need for
the re-synthesis of the design. Both methods (front-end
and back-end) lead to the creation of a partial bit-stream that can be used
for a partial reconfiguration of the FPGA.
All research results and experiments presented in the dissertation are based
on static difference-based partial reconfiguration. This method was chosen because
of the structure of the CMCU. Difference-based partial reconfiguration makes it possible to
change the content of control memory at the implementation stage. Therefore,
most steps of the prototyping flow can be omitted. Moreover, the designer can
prepare more than one partial bit-stream with alternative versions of the content
of control memory. They can be very easily switched in the FPGA (the full bit-stream
is sent only once).
The next section presents the organization of dedicated memory blocks of
Xilinx FPGAs. Such an organization is very important for the understanding of
the mechanism of partial reconfiguration.
6.2. Mechanism of partial reconfiguration of Xilinx FPGAs
Figure 6.1 presents the structure of a typical FPGA device by Xilinx. As was shown
in Chapter 2, the main elements of the device are configurable logic blocks (CLBs),
which create a matrix of connected blocks. Each CLB contains two logic elements
called slices. Furthermore, each slice is built from two look-up tables (LUTs),
which perform all logic functions. Therefore, all logic elements of the CMCU such
as the combinational circuit, the register and the counter are implemented using
CLBs. Moreover, the FPGA contains dedicated memory blocks called block-RAMs
(or just BRAMs).
[Figure: matrix of configurable logic blocks (CLBs) interleaved with columns of BRAMs]
Fig. 6.1: Structure of the FPGA device
Block-RAMs are organized in columns. The number of columns and BRAMs
in each column is different and depends on a particular FPGA. For example, the
device XC2VP30 (Virtex II Pro family) contains 136 dedicated memories which are
grouped in eight columns organized as 2x20 (two columns containing 20 BRAMs),
2x18, 2x16 and 2x14 (Xilinx, 2007). Additionally, each BRAM is divided into
lines (also called INITs). They are used for the initialization, configuration and
partial reconfiguration of the block. There are 64 lines in each BRAM (numbered
in hexadecimal from INIT_00 to INIT_3F).
Both the full and the partial bit-stream used for the configuration of the
device consist of frames. Each frame contains a portion of information about the
implemented design. In the case of partial reconfiguration, only the frames that differ are
sent to the FPGA. Importantly, partial reconfiguration of Xilinx devices
operates on the whole column of BRAMs. This means that the modification of one
microoperation in one BRAM causes the reconfiguration of all dedicated memories
that belong to the same column. In the case of the XC2VP30 device, each column
of BRAMs is divided into 64 frames. One frame corresponds to one line (INIT) in
all BRAMs in the column (for example, the modification of two frames means the
reconfiguration of two lines in all blocks that belong to the column). Therefore,
each frame contains a portion of information about all BRAMs that are organized
in the column (Fig. 6.2).
BR
AM

0
1
2
...
63
...
BR
AM

0
1
2
...
63
...
BR
AM

0
1
2
...
63
SAME FRAME
(FRAME 1)
SAME FRAME
(FRAME 63)
Fig. 6.2: Organization of BRAMs
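The relation between modified BRAM lines and the frames that have to be sent can be illustrated with a simplified Python model. It is an assumption-laden sketch, not a description of the actual bit-stream format: a column is modelled as a list of BRAMs, each BRAM as a list of 64 line values, and the names `changed_frames` and `init_name` are hypothetical.

```python
LINES_PER_BRAM = 64  # INIT_00 ... INIT_3F


def changed_frames(old_column, new_column):
    """Return the sorted indices of the frames that differ between two versions
    of one BRAM column. Frame k covers line k of every BRAM in the column."""
    if len(old_column) != len(new_column):
        raise ValueError("both versions must contain the same number of BRAMs")
    frames = set()
    for old_bram, new_bram in zip(old_column, new_column):
        if len(old_bram) != LINES_PER_BRAM or len(new_bram) != LINES_PER_BRAM:
            raise ValueError("each BRAM must have exactly 64 lines")
        for line, (old_value, new_value) in enumerate(zip(old_bram, new_bram)):
            if old_value != new_value:
                frames.add(line)
    return sorted(frames)


def init_name(line):
    """Name of the BRAM line with the given index, e.g. 63 -> 'INIT_3F'."""
    return f"INIT_{line:02X}"


# A column with three BRAMs; two lines are modified in different BRAMs.
current = [[0] * LINES_PER_BRAM for _ in range(3)]
modified = [list(bram) for bram in current]
modified[0][1] = 0xABCD
modified[2][63] = 0x0001

frames = changed_frames(current, modified)
print(frames, [init_name(f) for f in frames])
```

In this model, changing one line in a single BRAM marks the corresponding frame for the whole column, which reflects the column-wise granularity described above.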
The next section presents the current prototyping flow of CMCUs. Such a
design process does not include the idea of partial reconfiguration. Therefore,
a forthcoming section introduces a modified prototyping flow based on partial
reconfiguration of CMCUs implemented in the FPGA.
6.3. Traditional prototyping flow of control units
In order to show the idea of partial reconfiguration of CMCUs implemented in the
FPGA, the traditional prototyping flow of controllers will be presented. Figure 6.3
illustrates the design process of a typical digital system (Wiśniewski and Węgrzyn,
2005; Parnell and Mehta, 2003), which can be applied in the case of the CMCU
prototyping flow.
[Figure: flow diagram with the stages: control unit specification, structural decomposition, synthesis, implementation, FPGA]
Fig. 6.3: Traditional prototyping flow
At the beginning, the specification and structure of the CMCU ought to be
prepared. The structure of the controller is prepared according to the synthesis
methods presented in previous chapters. Finally, the CMCU may be designed
according to the following steps:
1. Description of the compositional microprogram control unit prepared with
HDLs. At this stage, all modules (combinational circuit, register,
counter and control memory) of the future control system are created. The
specification of the control memory content is not required now, although the
designer can specify initial values for the controller. Next, the description
of the compositional microprogram control unit should be verified in the
software simulator. This helps to avoid most functional errors in the design.
2. Logical synthesis of the design. The synthesis process converts the design
described with HDLs into the gate level. As a result of synthesis, gates, logic
blocks and the connections between them are created (this description is also
known as a "netlist"). This process is the same as in the traditional prototyping flow.
3. Logical implementation of the design. At this stage, logic implementation
of the CMCU is performed. As a result of the implementation process, the
bit-stream is produced. It contains full description of the design that will be
sent to the device for the configuration of the FPGA.
4. Hardware implementation of the design. The FPGA is configured with the
bit-stream produced in the previous step.
It is clear that any modification of the content of control memory requires
repeating the full prototyping flow. Therefore, if there is a need to implement
another version of the CMCU, all steps ought to be performed, even if the designer
wants to change only one bit of control memory.
The next section presents the new idea of the prototyping flow of the CMCU.
The method is based on partial reconfiguration of FPGA devices.
6.4. Partial reconfiguration of CMCUs implemented in the FPGA
The prototyping flow for the control unit that should be prepared for further
reconfiguration is similar to the traditional prototyping process. Therefore, at the
beginning, the design should be described using hardware description languages
(HDL) like Verilog or VHDL. Then it should be verified to avoid further functional
errors. After the verification, the design is synthesized. The difference between
the proposed and traditional prototyping flows lies in the implementation process. At
this stage, the content of the future control memory is prepared. As a result of
the implementation process, the bit-stream is created. It contains full information
about the configuration of the target FPGA device. Therefore, the size of the file
is correspondingly large, which also means a long FPGA configuration time (Barkalov
et al., 2005j; Barkalov and Wiśniewski, 2005e; Barkalov et al., 2005f).
 
Full bit-stream from  
1st prototyping process  
Partial implementation  
FPGA 
Modified memory 
content 
Fig. 6.4: Modified prototyping flow including the operation of partial reconfigu-
ration of CMCUs implemented in the FPGA
The method of partial reconfiguration of the control unit includes the following
steps:
1. Description of the compositional microprogram control unit prepared with
HDLs. This step is performed in the same manner as in the traditional
prototyping flow. Next, the CMCU should be verified in the software
simulator. This helps to avoid most functional errors in the design.
2. Logical synthesis of the design. This process is the same as in the traditional
prototyping flow.
3. Formation of the control memory content. Now the content of control mem-
ory is created. The designer can prepare as many versions of the control
memory content as necessary.
4. Logical implementation of the first version of the design. As the result of
the logic implementation process, the bit-stream is produced. It contains
the first description of the design that will be sent to the device for the
configuration of the FPGA.
5. Hardware implementation of the design. At this stage, the FPGA is config-
ured for the first time. Therefore, the whole description of the device must
be specified in the bit-stream.
6. Modification of the control memory content. At this step, the content of control
memory is replaced with the alternative values that were previously
prepared at Stage 3. The modification is performed during logic implementation.
The content of control memory can be specified in many ways, for example by a
.ucf file or via Xilinx tools like FPGA Editor; see (Xilinx, 2004) for details.
7. Preparation of the difference bit-stream. Now the new bit-stream is created.
It contains only the differences between the new version of the design and
the previous one, which is already implemented in the FPGA. In fact, the
bit-stream will contain only information about modified elements of control
memory (Xilinx, 2007).
Steps 6 and 7 should be repeated for each version of the control memory
content that was prepared at Stage 3.
8. Partial reconfiguration of the device. Using the bit-streams that were
produced in Step 7, the device can be partially reconfigured. The functionality
of the control unit can be changed easily and quickly, because only the frames
that differ between the modified and the already implemented design are sent
to the FPGA.
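Steps 6 and 7 can be illustrated with a minimal sketch that compares two versions of the control memory content and reports which memory lines differ. It is only an illustration of the idea, not the Xilinx flow: the function name, the four-bit word layout (left, straight, right, pedestrians) and the value of words_per_line are assumptions made for this example.

```python
def changed_lines(old_words, new_words, words_per_line):
    """Return the indices of memory lines whose content differs between two versions."""
    if len(old_words) != len(new_words):
        raise ValueError("both versions must describe the same memory size")
    changed = set()
    for address, (old, new) in enumerate(zip(old_words, new_words)):
        if old != new:
            # Hypothetical mapping: consecutive words share one memory line.
            changed.add(address // words_per_line)
    return sorted(changed)


# Hypothetical four-state control memory; bits are (left, straight, right, pedestrians).
version_1 = [0b1000, 0b0110, 0b0010, 0b0001]
# Only the third state differs: pedestrians get the green light instead of right turns.
version_2 = [0b1000, 0b0110, 0b0001, 0b0001]

print(changed_lines(version_1, version_2, words_per_line=2))
```

Only the lines reported by such a comparison would have to appear in the difference bit-stream; everything else stays as configured in Step 5.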
6.5. Example of partial reconfiguration of the CMCU
implemented in the FPGA
The idea of partial reconfiguration of the compositional microprogram control
unit will be shown with the example of the traffic lights driver. It is a simplified
version of the control unit, just to show the benefits of partial reconfiguration of
the CMCU.
The driver controls traffic lights for vehicles and pedestrians at the crossroads.
It is assumed that the CMCU works according to the following rules:
• the crossroads is completely collision free,
• each road has three independent traffic lanes and three independent traffic
lights, one each for vehicles turning left, going straight and turning right,
• to make the design clearer, yellow lights are not considered. There
are only two signals for vehicles and pedestrians. The green light means
"go/walk" and the red one says "wait/stop".
The main goal for designers of such traffic light drivers is to minimize traffic
bottlenecks and jams. Depending on the time of day (morning, before noon,
afternoon, etc.), more priority ought to be given to vehicles or to pedestrians.
In the example, two versions of a traffic driver are proposed. The first one gives
more priority to vehicles. The simplified model of the design can be described
by four main states.
Fig. 6.5: First version of the 3rd state of the traffic light driver
In the first state, the green light is set for vehicles turning left. The "stop"
signal is shown for the other vehicle lanes. Pedestrians also have to wait. In
the second state, vehicles going straight and turning right may pass through the
crossroads. The light for cars turning left is changed to red. Pedestrians still
wait. The third state allows turning right while all other lights are set to red.
This means that neither the drivers turning left or going straight nor the
pedestrians may go. This state is illustrated in Fig. 6.5. Finally, in the last
state, pedestrians can walk safely across the street because all three lights
for vehicles are set to red.
It has to be pointed out that such a solution is very convenient for vehicles,
but pedestrians can cross the street in only one of the four states. Therefore,
the second version of the driver gives more priority to pedestrians. In this
design, only the third state was modified (Fig. 6.6). Now all lights for
vehicles are set to red, so the green light is shown for pedestrians and they
can walk across the street. Such a small modification changes the whole traffic
cycle, because pedestrians may now safely cross the street twice as often as
before.
Fig. 6.6: Second version of the 3rd state of the traffic light driver
The design of the traffic light driver was prepared and implemented using the
XC2VP30 device (Xilinx Virtex-II Pro family). For both versions, full bit-streams
and partial reconfiguration data were prepared. Table 6.1 shows the results that
were achieved during configuration. The size of the bit-stream and approximate
time that is needed for device configuration are presented.
Table 6.1 shows that the size of the original bit-stream was greatly
reduced. During the first configuration of the FPGA, over 1448K bytes have to be
sent. In the case of partial reconfiguration, only 2 696 bytes (about 2.6K bytes)
were required. This means that the size of the bit-stream was reduced by over
99.81%.
Tab. 6.1: Results achieved during the implementation of the traffic light driver
               Full bit-stream   Partial bit-stream
Size [bytes]   1 448 816         2 696
Time [s]       4.5               <0.1
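The size figures in Table 6.1 can be checked with a few lines of arithmetic. The sketch below only restates the two sizes from the table; the function name and the dictionary keys are chosen for this illustration.

```python
def reduction(full_bytes, partial_bytes):
    """Express the partial bit-stream size relative to the full bit-stream."""
    share = partial_bytes / full_bytes
    return {
        "share_percent": 100 * share,        # partial size as % of the full size
        "saved_percent": 100 * (1 - share),  # how much less data is sent
        "factor": full_bytes / partial_bytes,
    }


# Sizes taken from Table 6.1.
result = reduction(1_448_816, 2_696)
print(f"{result['share_percent']:.2f}% of the original size")
print(f"{result['saved_percent']:.2f}% less data sent")
print(f"about {result['factor']:.0f} times smaller")
```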
6.6. Conclusions
The idea of partial reconfiguration of CMCUs implemented in the FPGA was
shown in this chapter. Moreover, a new prototyping flow of CMCUs was proposed.
The modified design method is based on partial reconfiguration of a controller
implemented in the FPGA. Only the control memory content is replaced while the
rest of the system is not modified. In the presented prototyping flow, logic synthesis
and implementation are performed only once. Therefore, such a realisation highly
accelerates the whole prototyping process.
The performed experiments showed that the bit-stream sent to the FPGA can be
reduced by a factor of over 500. Detailed results of investigations into the
effectiveness of partial reconfiguration of CMCUs implemented in programmable
devices are presented in Chapter 8.
Chapter 7
CAD TOOL FOR AUTOMATIC SYNTHESIS OF
CMCUS (ATOMIC)
This chapter introduces the dedicated CAD tool that was prepared to perform
AuTOMatic synthesIs of CMCUs (ATOMIC). Based on the description of the
controller as a flow-chart, ATOMIC produces code in a hardware description
language (Verilog). Such code is ready for logic synthesis and further
implementation in the FPGA. The main features of the tool are shown in this
chapter. A detailed description of input and output data formats, switches and
parameters is presented in Appendix A.
7.1. Overview of ATOMIC
ATOMIC implements all eight methods presented in Chapters 4 and 5. Based
on the description of the controller, ATOMIC generates code in Verilog HDL.
ATOMIC consists of three main modules (Fig. 7.1).
Fig. 7.1: Structure of ATOMIC
The first module (fc2olc) analyses the structure of the flow-chart and produces
the set of operational linear chains (OLCs). This step is common to all
implemented methods. The second module (olc2mcu) takes the description of OLCs
and, using the chosen method, performs the structural decomposition process.
All required data (excitation functions, description of control memory, etc.)
are stored in an intermediate format (see Appendix A). Such a format may be the
basis for various ways of CMCU description; for example, Verilog or VHDL code
may be produced very easily. The last module of ATOMIC (mcu2verilog) generates
a direct description of the CMCU in Verilog HDL. The description is ready for
logic synthesis and implementation.
ATOMIC was prepared as a module-based tool in order to improve its
performance. At each stage, the description of the prototyped controller may be
changed. Furthermore, an OLC description, once prepared, may be used as the
common input for all eight implemented synthesis methods.
A very important feature is the possibility of using external tools for further
analysis. Each excitation function that is produced by the olc2mcu module may
be decomposed with other systems that are based on functional decomposition
like SIS, DEMAIN, etc. Therefore, both structural and functional decomposition
can be used in the prototyping flow of the CMCU. The control unit is initially
decomposed with structural procedures, and then excitation functions produced
for internal blocks of the CMCU are optimized with functional decomposition.
Such a solution preserves the structure of the CMCU, which leaves open the
possibility of partial reconfiguration of the controller (see Chapter 8).
Obviously, to perform this task, the excitation function has to be converted
into a proper format; however, this is a relatively easy process. It also has
to be pointed out that the idea of joint structural and functional decomposition
in the prototyping flow of CMCUs will be investigated in greater detail in the
future and is not within the scope of this dissertation.
7.2. Realisation of ATOMIC
ATOMIC was implemented using the standard C++ language (Kernighan and
Ritchie, 1977; Stroustrup, 1986). It does not include any additional libraries or
external units, thus it can be easily compiled under any system and any platform
that supports the C++ language (Windows, Solaris, Unix, etc.).
From the mathematical point of view, ATOMIC is based on graph theory and
implements methods of searching graphs (Berge, 1973; Wilson, 1979; Harary, 1994;
Aho et al., 1974). Both the fc2olc and olc2mcu modules read the input data and
dynamically form an adequate oriented graph that contains all information about
the flow-chart (or the OLC flow-chart). Such a graph can be easily searched using
standard graph searching methods. ATOMIC uses a modified depth-first search
(DFS) algorithm (Cormen et al., 2001) in both the fc2olc and olc2mcu modules.
The original algorithm was adapted to perform the required operations. In the
case of fc2olc, operational vertices are replaced with the corresponding OLCs
during the search; a traditional stack is used here. In the case of olc2mcu,
the excitation functions are formed during the search. It has to be pointed
out that the functions for all the modules (counter, register, function
decoder) are formed in the same pass, thus the whole process is executed only
once. The complexity of the implemented algorithms is linear, Θ(|V| + |E|),
where |V| is the total number of vertices (this number corresponds to the total
number |B| of all operational vertices in the initial flow-chart) and |E| is
the total number of edges between vertices.
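The stack-based search mentioned above can be illustrated with a plain iterative DFS over an oriented graph. This is the textbook algorithm only, not the modified variant used in ATOMIC; the adjacency-list representation, the function name and the small flow-chart are assumptions made for this sketch.

```python
def dfs_order(graph, start):
    """Visit every vertex reachable from `start` depth-first, using an explicit stack."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
        # Push successors in reverse so that the first successor is visited first.
        for successor in reversed(graph.get(vertex, [])):
            if successor not in visited:
                stack.append(successor)
    return order


# Hypothetical flow-chart: three operational vertices before and after one
# conditional vertex x1, given as an adjacency list.
flow_chart = {
    "start": ["b1"],
    "b1": ["b2"],
    "b2": ["x1"],
    "x1": ["b3", "b4"],
    "b3": ["end"],
    "b4": ["end"],
    "end": [],
}

print(dfs_order(flow_chart, "start"))
```

Each vertex is expanded once and each edge is examined once, which is where the linear bound Θ(|V| + |E|) comes from.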
The presented tool was prepared to aid the prototyping process of CMCUs
and applied to perform all experiments. The next chapter shows detailed results
of the implementation of the control unit that was prepared with all eight methods
shown in the dissertation. The analysis of the results of experiments is presented
as well. Moreover, a detailed description of ATOMIC, including intermediate
formats and command-line parameters, can be found in Appendix A.
Chapter 8
RESULTS OF EXPERIMENTS
This chapter presents the results of experiments that were conducted to check the
effectiveness of the proposed synthesis methods. The benefits of partial
reconfiguration are shown as well. The first section deals with the results
obtained during the implementation of the prepared CMCUs in an FPGA. The
achieved values are analysed in detail, concluding with an attempt to select
the proper synthesis method depending on the initial flow-chart description.
Experiments with partial reconfiguration of CMCUs are shown in the second
section. Finally, the last section summarises the achieved results.
8.1. Results of experiments of investigations with the prepared
synthesis methods
This section presents the results of experiments that were achieved during the im-
plementation of CMCUs designed according to the rules shown in Chapters 4 and
5. First, there is a short introduction to the library of test modules (benchmarks)
that were used for the verification of the prepared synthesis methods. Further,
the simulation of the functionality of the CMCU will be presented. The third
subsection shows the main results of experiments made to check the effectiveness
of the proposed synthesis methods. Finally, all the achieved values are analysed
and concluded.
8.1.1. Library of test modules
All synthesis methods presented in Chapters 4 and 5 were verified with over 100
test modules (benchmarks). Each test module was prepared in a text format that
contains a description of the CMCU as a flow-chart (see Appendix A for details).
The library of test modules was initially created at Doneck University (Ukraine).
Now it is being expanded at the University of Zielona Góra. Most of the prepared
benchmarks describe a hypothetical flow-chart, although some of them contain a
description of real devices (for example, the traffic light controller, systems with
arithmetic operations, etc.).
8.1.2. Verification of the prepared methods
The verification of the functionality of the prepared CMCUs was performed with
a software simulator (here, Active HDL from Aldec and ModelSim from Mentor
Graphics were used). The simulation was performed for each synthesis method.
The verification of each module was similar. First, a Verilog code was generated
for each synthesis method using ATOMIC (see Chapter 7). Next, the controller
was simulated and its functionality was verified.
Fig. 8.1: Flow-chart Γ4
The verification process will be illustrated with an example. An example
flow-chart Γ4 of the CMCU U9 is shown in Fig. 8.1. The flow-chart Γ4
contains |B|=10 operational and |X|=2 conditional vertices. There are |Y|=5
microoperations that are generated by the controller. Let us design U9 as the
CMCU with sharing codes. The set of operational linear chains contains |C|=3
OLCs: α1=〈b1, b2, b3〉, α2=〈b4, . . . , b7〉 and α3=〈b8, b9, b10〉. Therefore,
R2=2 variables Q={q1,q2} are required to keep the state of the controller. The
longest OLC is α2 and it contains M1=4 operational blocks. This means that
R1=2 variables T={t1,t2} will form the excitation function for the counter.
Both the code generated by the counter, A={a1,a2}, and the code generated by
the register, Q={q1,q2}, are encoded using two variables, thus the address of
each microinstruction has the width R3=R1+R2=4. According to (5.1), such an
address is formed as the concatenation of codes generated by the counter and
by the register: A(bt)=K(αg)*K(bt).
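The widths used in this example can be reproduced with a short calculation. The sketch below assumes natural binary codes assigned in the order in which OLCs and blocks are listed, and it reads K(αg) as the OLC code and K(bt) as the position of the block inside its OLC; the actual codes chosen for U9 are not given here, so the printed addresses are illustrative only.

```python
from math import ceil, log2


def code_width(count):
    """Minimum number of bits needed to distinguish `count` items."""
    return max(1, ceil(log2(count)))


# OLCs of the flow-chart from the example above.
olcs = {
    "alpha1": ["b1", "b2", "b3"],
    "alpha2": ["b4", "b5", "b6", "b7"],
    "alpha3": ["b8", "b9", "b10"],
}

r1 = code_width(max(len(chain) for chain in olcs.values()))  # block code width
r2 = code_width(len(olcs))                                   # OLC code width
r3 = r1 + r2                                                 # address width


def address(block):
    """Concatenate the assumed OLC code and the assumed block code."""
    for olc_index, chain in enumerate(olcs.values()):
        if block in chain:
            olc_code = format(olc_index, f"0{r2}b")
            block_code = format(chain.index(block), f"0{r1}b")
            return olc_code + block_code
    raise KeyError(block)


print(r1, r2, r3)
print(address("b5"))
```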
Figure 8.2 shows the results of software verification of U9 designed as the
CMCU with sharing codes. Here, the microoperation y0 increments the counter
and it is a feed-back signal for the counter and for the register. The additional
variable yK is set to 1 if the final vertex of the flow-chart is reached.
Fig. 8.2: Results of the simulation of the CMCU U9
The counter and the register are active on the positive edge of the clock signal.
Dedicated memory blocks that were used for the implementation of control memory
are synchronous and the clock signal ought to be connected as well. Because the
variable y0 drives the counter and the register, microinstructions are generated
when the negative edge of the clock signal appears. Such a solution ensures proper
functionality of the CMCU.
8.1.3. Results of experiments
This section presents the results that were achieved during logic synthesis and
implementation of CMCUs. All synthesis methods were verified with over 70
benchmarks. Additionally, an FSM model was prepared for each test. The
automaton was created according to the rules presented in (IEEE, 2001; Thomas
and Moorby, 2002; Chmielewski and Węgrzyn, 2006). It should be pointed out that
all FSMs were prepared in such a way that during implementation all
microoperations were realised with dedicated memory blocks of the FPGA.
The prototyping process for each benchmark was similar. Based on the flow-chart
description (.fc file), the controller was structurally decomposed with all
eight synthesis methods presented in the dissertation. Additionally, an
equivalent FSM was produced. The resulting Verilog code was finally synthesised
and implemented with the Xilinx XST tool.
Table 8.1 presents the average results of implementing CMCUs designed with
each synthesis method, in comparison to the FSM and to the traditional
CMCU with mutual memory. To keep the presentation clear, the detailed
values that were achieved during the experiments are presented in Appendix B. As
the target device, the FPGA XC2VP30 (Xilinx Virtex-II Pro family) was selected.
The device contains 27392 flip-flops, 27392 LUTs (13696 Slices) and 136 dedicated
memory blocks (block-RAMs). The device was selected because of its structure
(it can be partially reconfigured) and its availability at the University of
Zielona Góra.
8.2. Analysis of the results of experiments
Detailed analysis of the results presented in Tables 8.1 and B.1 (Appendix B)
shows that the number of logic blocks that are required for the implementation of
the controller in the FPGA is strongly tied to the number of microinstructions
that are held in control memory.
From Table B.1 presented in Appendix B we can see that, in the case of rela-
tively small devices where control memory may be implemented with one dedicated
memory block, the realisation of the controller as the CMCU USD with sharing
codes and a function decoder gives the best results. First, it requires on average
the smallest number of logic blocks out of all presented methods. Furthermore,
control memory is implemented with one dedicated memory block, thus there is
no need for the application of the address converter. Obviously, the application
of the function decoder is optional – its usage reduces the number of logic blocks
but increases the number of dedicated memories.
According to (5.14), if the total number of bits generated by the register
and the counter exceeds the width of the microinstruction address, the CMCU
UCD with an address converter and a function decoder ought to be selected (see
Chapter 5). It should be pointed out that results gained during the realisation
of the controller as the CMCU USD are similar to the values achieved for the
CMCU UCD. The number of logic blocks required for the implementation of
both controllers is almost the same. However, in the case of control units that
contain memories that ought to be decomposed (their volume exceeds the volume
of one dedicated memory block), the CMCU with an address converter requires
on average 46% fewer dedicated memory blocks than the CMCU with sharing
codes (see the benchmarks Test031, Test036, TestAW02, further presented in Table
B.1 in Appendix B). These results prove the effectiveness of the application of
the address converter in the case of CMCUs, where the address indicated by the
counter and by the register is wider than the minimum number of bits needed for
microinstruction addressing.
Tab. 8.1: Average results of experiments
                         FPGA                        Designing method
                         resources  FSM   MM    FD    OI    OD    SC    SD    CA    CD
Comparison to the FSM    Slices     100%  91%   73%   76%   60%   57%   51%   53%   50%
                         FF         100%  100%  105%  102%  108%  120%  127%  122%  125%
                         LUTs       100%  91%   71%   78%   60%   57%   50%   54%   49%
                         BRAMs      100%  100%  136%  102%  126%  279%  320%  151%  186%
Comparison to the CMCU   Slices     110%  100%  82%   84%   68%   62%   57%   60%   57%
with mutual memory       FF         100%  100%  105%  102%  108%  120%  127%  122%  125%
                         LUTs       110%  100%  81%   86%   68%   63%   57%   62%   57%
                         BRAMs      100%  100%  136%  102%  126%  279%  320%  151%  186%
FSM – realisation of the controller as the FSM;
MM – realisation of the controller as the CMCU with mutual memory;
FD – realisation of the controller as the CMCU with a function decoder;
OI – realisation of the controller as the CMCU with output identification;
OD – realisation of the controller as the CMCU with output identification and a function decoder;
SC – realisation of the controller as the CMCU with sharing codes;
SD – realisation of the controller as the CMCU with sharing codes and a function decoder;
CA – realisation of the controller as the CMCU with an address converter;
CD – realisation of the controller as the CMCU with an address converter and a function decoder.
The CMCU UCD with an address converter and a function decoder consumes
the smallest number of logic blocks of the target FPGA in the case of controllers
where control memory is decomposed (which means that more than one BRAM
is used). Such a realisation requires only 49% of the LUTs in comparison to the
FSM and 57% in comparison to the CMCU with mutual memory. This means that
the proposed synthesis method with an address converter and a function decoder
reduces the number of logic blocks that are used for the implementation of the
controller by a factor of over two in comparison to the traditional automaton.
On the other hand, more dedicated memory blocks are required for the realisation
of the control unit. The number of dedicated memory blocks increases on average
by 86%, therefore the CMCU UCD is the best solution for the implementation of
the controller in FPGAs that contain enough dedicated memory blocks.
Finally, from Table B.1, we can see that among controllers that produce more
than 150 microinstructions the CMCU UOD with output identification and a func-
tion decoder gives the best results. In this case, such realisation on average re-
quires the smallest number of dedicated memory blocks and usually the fewest
logic blocks as well.
In conclusion, the performed experiments proved the effectiveness of the
proposed synthesis methods. The criterion in all experiments was to reduce the
number of logic blocks that are required for controller implementation. Detailed
analysis of the results showed that the selection of the proper synthesis method
may be tied to the structure of the CMCU. There are three typical situations in
which the proper synthesis algorithm can be proposed:
• In the case of relatively small systems (where the number of microinstruc-
tions does not exceed 150 and control memory can be implemented with
one dedicated memory block), the CMCU with sharing codes and a function
decoder seems to be the best solution. However, it should be pointed out
that such realisation consumes at least two dedicated memory blocks of the
FPGA. Therefore, if the number of available dedicated memory blocks is
limited, the method with output identification should be used.
• In the case of controllers where the volume of control memory exceeds the
volume of one dedicated memory block and the total number of microin-
structions is fewer than 150, the CMCU with an address converter and a
function decoder gives the best results.
• In the case of controllers where the total number of microinstructions exceeds
150, the CMCU with output identification and a function decoder ought to
be selected.
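The three situations listed above can be written down as a small decision function. This is only one reading of the rules: the function name and parameters are hypothetical, "the method with output identification" is taken to mean the OI variant from Table 8.1, and the case of exactly 150 microinstructions with decomposed memory, which the rules leave open, is assigned to the second rule as an assumption.

```python
def select_method(microinstructions, fits_one_bram, brams_limited=False):
    """Suggest a CMCU synthesis method (abbreviations as in Table 8.1)."""
    if microinstructions > 150:
        # Third rule: large controllers.
        return "OD"  # output identification and a function decoder
    if fits_one_bram:
        # First rule: small systems; fall back when BRAMs are scarce.
        return "OI" if brams_limited else "SD"
    # Second rule: control memory exceeds one dedicated memory block.
    return "CD"  # address converter and a function decoder


print(select_method(100, fits_one_bram=True))
print(select_method(100, fits_one_bram=True, brams_limited=True))
print(select_method(100, fits_one_bram=False))
print(select_method(200, fits_one_bram=False))
```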
8.3. Results of experiments of partial reconfiguration of CMCUs
implemented in the FPGA
This section presents the analysis of the results of partial reconfiguration of
CMCUs implemented in the FPGA. Detailed values gained during the experiments
are shown in Appendix B. The partial reconfiguration process of CMCUs was
performed on the XC2VP30 device. As was already mentioned in Chapter 6, this
FPGA contains 136 dedicated memory blocks organized in eight columns. Each
column can be configured independently with 64 frames (one frame configures one
line (INIT) in all BRAMs that belong to the column).
The analysis of the results of the experiments showed that the way the control
memory of the CMCU is realised in the FPGA is very important. Figure 8.3
presents three variants of the implementation of a hypothetical CMCU where
two microinstructions A and B are partially reconfigured. In the first variant,
the microinstructions are implemented in separate BRAMs that are placed in the
same column. Both A and B are located in the line INIT_00 of their respective
BRAMs. Therefore, during partial reconfiguration, only one frame is sent to the
FPGA. This frame covers the lines of both BRAMs, because they are situated in
one column. In the second variant, both A and B are implemented in the same
BRAM. However, two lines are required (A is initialized with INIT_00 and B
with INIT_01), which means that two frames are required for reconfiguration.
In the third variant, A and B are implemented in two BRAMs located in
different columns. Now it does not matter that both microinstructions are
configured with the same line (INIT_00), because the columns differ. Therefore,
two frames are sent during reconfiguration.
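The frame counts of the three variants follow from the rule stated above: one frame configures one INIT line in all BRAMs of one column. A minimal model of that rule, with a hypothetical function name and hypothetical column and BRAM indices, is sketched below.

```python
def frames_needed(modified):
    """Count configuration frames for modified (column, bram, init_line) entries.

    One frame covers the same INIT line in every BRAM of a column, so only
    distinct (column, init_line) pairs matter.
    """
    return len({(column, init_line) for column, _bram, init_line in modified})


# Microinstructions A and B as (column, bram, init_line); indices are illustrative.
variant_1 = [(0, 0, 0), (0, 1, 0)]  # two BRAMs, same column, both INIT_00
variant_2 = [(0, 0, 0), (0, 0, 1)]  # one BRAM, INIT_00 and INIT_01
variant_3 = [(0, 0, 0), (1, 0, 0)]  # two BRAMs in different columns, both INIT_00

for name, variant in (("1st", variant_1), ("2nd", variant_2), ("3rd", variant_3)):
    print(name, "variant:", frames_needed(variant), "frame(s)")
```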
Table 8.2 shows that the best results were achieved for the first variant of
the controller. Despite the fact that two lines are modified, only one frame is
sent to reconfigure the device, and the original bit-stream was reduced over 500
times. Interesting results were achieved for the two remaining variants. Both
required two frames for partial reconfiguration. In the case of the second
variant, where both microinstructions were located in the same BRAM, the
bit-stream was reduced over 400 times. A smaller gain was achieved in the third
variant, where A and B were realised with BRAMs located in different columns.
Detailed analysis of the performed experiments indicates that the reduction
of the size of the original bit-stream strongly depends on the placement of control
memory in dedicated memory blocks of an FPGA. The best gain is reached in the
case of the implementation of control memory with BRAMs located in the same
column. Partial reconfiguration of such an organization requires the fewest
configuration frames. The experiments showed that even replacing the content
of control memory implemented with 13 BRAMs (organized in one column) reduces
the original bit-stream by a factor of over 50. The worst results were achieved
in the case of the implementation of control memory with BRAMs located in
separate columns. Partial reconfiguration of control memory that was realised
with 13 BRAMs placed in eight different columns reduces the size of the
bit-stream by a factor of over eight.
Tab. 8.2: Results of three variants of the reconfiguration of two microinstructions
Variant  Modified  Modified  Modified  Modified  Size of partial     Reduction        Reduction
         BRAMs     lines     columns   frames    bit-stream [bytes]  (% of original)  (times smaller)
1        2         2         1         1         2696                0.19%            527
2        1         2         1         2         3520                0.24%            417
3        2         2         2         2         4360                0.30%            333
Fig. 8.3: Three variants of the reconfiguration of two microinstructions
(1st variant: A and B in INIT_00 of two BRAMs in one column, reduction 527 times;
2nd variant: A in INIT_00 and B in INIT_01 of one BRAM, reduction 417 times;
3rd variant: A and B in INIT_00 of BRAMs in two columns, reduction 333 times)
In conclusion, partial reconfiguration of compositional microprogram control
units implemented in the FPGA can reduce the size of the original bit-stream by
a factor of over 500. In the case of controllers where control memory has to be
decomposed into more than one BRAM, the best gain is reached when the memory is
realised with blocks located in the same column. The placement of each BRAM can
be easily modified with tools supplied by Xilinx, which additionally check
routing and timing paths. Detailed values that were gained during partial
reconfiguration of CMCUs implemented in the FPGA are presented in Appendix B.
Chapter 9
CONCLUSIONS
The development of microelectronics resulted in the appearance of the
system-on-a-programmable-chip (SoPC), which can be used for the implementation
of complex digital systems. The main part of an SoPC is an FPGA. Such a device
contains logic blocks for the implementation of combinational logic and
dedicated memory blocks that offer an additional area for data storage.
Therefore, traditional methods of design prototyping evolve. The aim of such
modifications is the reduction of the number of logic blocks of the FPGA. This
task is very often solved by the application of design decomposition.
One of the main blocks of a digital system is the control unit. It can be
designed as a compositional microprogram control unit, where the controller is
decomposed into two main parts. The first one is in charge of proper address
formation of microinstructions, which are kept in the second part, control
memory. The main advantage of such a realisation is the possibility of
implementing the controller using both logic elements and dedicated memory
blocks offered by FPGAs. Moreover, thanks to its structure, part of the CMCU
that is already implemented in the FPGA can be easily reconfigured.
Structural synthesis of compositional microprogram control units was the
main scope of the dissertation. Six new design methods of the CMCU were
proposed. The aim of all methods is to reduce the number of logic blocks that
are required for the implementation of the controller in the FPGA. The prepared
algorithms were divided into two groups. The first one deals with the CMCU with
mutual memory, where the microinstruction address is used for the recognition
of internal states of the controller. The second group is oriented towards the
formation of the microinstruction address from codes generated by the register
and by the counter. Additionally, the CMCU with an address converter permits
decreasing the volume of control memory to the required minimum.
The effectiveness of the presented methods was proved with the help of the
prepared design aiding system, ATOMIC, which performs automatic structural
decomposition of CMCUs. The output code (generated in Verilog HDL) is ready for
further logic synthesis and implementation. The modular structure of the
designed system permits modifying the structure of the CMCU at any level of the
prototyping process.
The second task of the dissertation was partial reconfiguration of CMCUs
implemented in the FPGA. The designer can replace only the control memory
content of the controller, which already resides in the programmable device.
Therefore, there is no need to repeat the whole CMCU prototyping process for
each version of the control unit. The full design process is done only once.
For further versions of the CMCU, only the reduced prototyping flow has to be
performed. Partial reconfiguration thus reduces the size of the data that are
sent to the FPGA. Additionally, the configuration time is shorter, which
reduces the risk of errors that may occur during FPGA configuration.
The most important innovations introduced in the dissertation include the
following:
• preparation of new synthesis methods of CMCUs focused on the reduction of
the number of logic blocks that are required for implementing the controller
in the FPGA,
• preparation of new synthesis methods of CMCUs focused on the reduction of
the number of dedicated memory blocks that are required for implementing
the controller in an FPGA,
• design of a dedicated tool that aids the prototyping process of CMCUs im-
plemented in the FPGA,
• preparation of a new CMCU prototyping flow, based on partial reconfigura-
tion of controllers implemented in the FPGA,
• verification of the effectiveness of the prepared methods by adequate exper-
iments (implementation of benchmarks in the FPGA).
The performed experiments proved the effectiveness of the proposed synthesis
methods. The criterion of the whole research was to reduce the number of logic
blocks that are required for controller implementation. Detailed analysis of the
results of experiments showed that the proper synthesis method may be chosen at
the specification stage.
There are three main future directions of the presented work. The first
one is an attempt at combining both CMCU decomposition methods. Currently,
excitation functions of internal blocks of CMCUs that are formed during
structural decomposition are further decomposed with commercial synthesis tools.
Therefore, there is an idea to apply external functional decomposition. This
approach has been successfully developed by academic organizations for both
CPLDs (Kania, 2004; Kania, 1999; Devadas et al., 1988) and FPGAs (Selvaraj et
al., 2005; Jóźwiak and Chojnacki, 2003; Rawski et al., 2003). Additionally,
functional decomposition can be applied to the reduction of the volume of
controller memory. This idea has already been developed and was outlined in
(Wiśniewska and Wiśniewski, 2005) and (Wiśniewska et al., 2005).
The second aim of future work is the improvement of the encoding of internal
states in the case of CMCUs based on sharing codes. Such controllers contain a
simplified automaton for microinstruction addressing which can be optimized with
algorithms like NOVA or JEDI (Sentovich et al., 1992b).
Finally, the proposed synthesis methods of the CMCU can be extended to improve
the safety of the controller (Halang and Krämer, 1994; Adamski, 1999; Halang
and Krämer, 1992; Halang and Adamski, 1997; Adamski et al., 2007; Bukowiec
and Węgrzyn, 2005). One idea is to implement the combinational circuit
with dedicated memory blocks of the FPGA. Such a realisation additionally
permits partial reconfiguration of the combinational circuit, so
the functionality of the controller can be easily modified. The presented
idea has already been developed as a joint project with Professor W. A. Halang
(FernUniversität, Hagen).
The presented work resulted in publications in journals (ten international
and three domestic ones). Moreover, the conducted investigations were presented
at various workshops and conferences (sixteen international and five domestic
ones). In total, 34 articles that directly refer to the dissertation were
published. The presented solutions were honoured with four awards (two received
at the OWD and KNWS conferences, and two team awards of the Rector of the
University of Zielona Góra). The work was carried out as a part of the Integrated
Regional Operational Programme and as a part of the grant no. 3 T11C 046 26
of the Polish Ministry of Science and Higher Education.
Appendix A
DESCRIPTION OF ATOMIC
The consecutive sections below present a detailed description of ATOMIC. First,
the structure of ATOMIC is presented. In the second section, formats of input
and output data are shown. Finally, the last section describes ATOMIC command-
parameters and switches. The overview of the tool is shown in Chapter 7.
A.1. Structure of ATOMIC
Figure A.1 presents the idea of the prototyping flow of CMCUs. Such a flow is
performed by ATOMIC’s modules.
Fig. A.1: Detailed structure of ATOMIC
Based on the flow description, the module fc2olc generates the .olc file. Such
a file contains the description of operational linear chains. The operation is per-
formed in five main steps:
1. Reading and analysis of input data.
2. Searching for OLCs.
3. Searching for all inputs of each OLC.
4. Generating output data.
5. Writing the output file.
Next, the main structural decomposition is performed by the olc2mcu unit.
Here one of the eight CMCU prototyping methods is used (command-line switches
are described in Section A.3). Structural synthesis is conducted in five main steps:
1. Reading and analysis of input data.
2. Encoding OLCs and their components (with the selected method).
3. Expanding the memory content (conversion to the binary format).
4. Creating excitation functions.
5. Writing the output file.
Structural synthesis performed by ATOMIC was improved so that the transition
table of the CMCU is not formed. Instead, excitation
functions are generated directly from the graph that represents the OLC flow-chart.
Therefore, the synthesis process is faster and, additionally, less
memory is used (there is no need to represent the transition table in the memory).
Finally, the description of the CMCU is converted to the Verilog language
with the mcu2verilog module. This process is performed in only two stages: the
input file is read, and then it is converted to the Verilog HDL format. The
CMCU is written as Verilog source code, ready for further logic
synthesis and implementation.
It should be pointed out that each operation executed by each of the presented
modules is recorded in a log file. The log is additionally displayed on the screen
(this can be turned off, see Section A.3).
The next section presents a detailed input and output data format of each
ATOMIC module.
A.2. Input and output data formats of ATOMIC
This section describes all data formats that are used by ATOMIC modules. At
the beginning, the input format of ATOMIC is shown. It is also the input for
the module fc2olc. The second subsection presents all intermediate formats that
are exchanged by ATOMIC modules. Finally, the last subsection deals with the
output data of ATOMIC that is generated by the mcu2verilog module.
A.2.1. Input data format of ATOMIC
The input for ATOMIC is specified as a text-file that contains the description of
the flow-chart (.fc) file. Such a description was initially proposed in (Baranov,
1994). The input file is divided into two sections: flow-chart description and
microinstruction definition.
The first part contains the description of the flow-chart structure. Figure A.2
shows the graphic and text version of the CMCU U9. Each line corresponds to
one block of the flow-chart. The line must begin with the number of the vertex.
Next, the symbol of the vertex appears, where
• S – start – initial vertex of the flow-chart. After this symbol there is a
number of the next vertex of the flow-chart.
• O – operational vertex. After the symbol, the name of a pseudo-microinstruction
is declared (e.g., Y1). Such a pseudo-microinstruction is defined in the
second part of the file. Then, there is the number of the next block to which the
transition should be performed.
• X – conditional vertex of the flow-chart. After the symbol there is the
definition of the block name (x1, x2, etc.). Next, the numbers of target
vertices appear – the first number means the vertex where the transition
should be executed if the condition is true while the second one shows the
destination if it is false.
• E – end – final vertex of the flow-chart.
Fig. A.2: Graphic and text description of an example CMCU U9
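The structure part of the .fc file described above can be read with a few lines of code. The following Python sketch is only an illustration: it assumes that the fields of each line are separated by whitespace (the exact separators are defined by Figure A.2 and are not reproduced here), and the names Vertex, parse_fc_structure and the sample flow-chart are hypothetical and not part of ATOMIC.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Vertex:
    number: int            # number of the vertex (first field of the line)
    kind: str              # 'S', 'O', 'X' or 'E'
    name: Optional[str]    # pseudo-microinstruction (O) or condition (X)
    targets: List[int]     # successors; for X: [target_if_true, target_if_false]


def parse_fc_structure(text: str) -> Dict[int, Vertex]:
    """Parse the flow-chart structure part (assumed whitespace-separated)."""
    vertices: Dict[int, Vertex] = {}
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip empty lines
        number, kind, rest = int(tokens[0]), tokens[1].upper(), tokens[2:]
        if kind == "S":      # start: next vertex
            vertex = Vertex(number, kind, None, [int(rest[0])])
        elif kind == "O":    # operational: name, next vertex
            vertex = Vertex(number, kind, rest[0], [int(rest[1])])
        elif kind == "X":    # conditional: name, true target, false target
            vertex = Vertex(number, kind, rest[0], [int(rest[1]), int(rest[2])])
        elif kind == "E":    # end: no successors
            vertex = Vertex(number, kind, None, [])
        else:
            raise ValueError(f"unknown vertex symbol: {kind!r}")
        vertices[number] = vertex
    return vertices


# Hypothetical flow-chart, not the CMCU U9 from Figure A.2.
EXAMPLE_FC = """\
0 S 1
1 O Y1 2
2 X x1 3 4
3 O Y2 4
4 E
"""

if __name__ == "__main__":
    for v in parse_fc_structure(EXAMPLE_FC).values():
        print(v)
```

Keeping the true and false targets of a conditional vertex in a fixed order mirrors the convention of the format, where the first number is the destination for a satisfied condition.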
The second part of the file contains the definition of pseudo-microinstructions
that were already declared by operational vertices of the flow-chart. There is a
name of the pseudo-microinstruction that is followed by the list of microoperations
that are executed in such a pseudo-microinstruction.
The description of microinstructions is compacted, thus two or more different
vertices may define the same pseudo-microinstruction. Of course, such compacted
information is expanded in further steps executed by ATOMIC, and finally each
vertex of the flow-chart will execute one microinstruction. For example, in the
CMCU U9, there are two pseudo-microinstructions named Y1 and Y2. Here Y1 consists
of the microoperations y1 and y3 that are executed in the vertices b1 and b5, while
Y2 includes y2 that is executed in the vertex b4.
A.2.2. Intermediate data formats of ATOMIC
ATOMIC consists of three modules, thus each module has its own data format.
The description of the flow-chart presented in the previous subsection is
the input both for ATOMIC and for the module fc2olc. Such a module generates an
intermediate data format (.olc file) that contains the description of the set of
operational linear chains. Of course, additional information about the content of
control memory is specified. The structure of the .olc file is similar to the .fc
structure. The set of operational linear chains is also described as a flow-chart,
although the meaning of the symbols is now the following:
• S – start – initial vertex of the OLC flow-chart. After this symbol there is
the number of the next vertex of the OLC flow-chart.
• O – main input and description of the OLC. After the symbol there are four
numbers specified, separated by commas: the number g of the OLC αg ∈ C,
the number t of the input Itg in the OLC αg ∈ C, the position of the vertex
bi in the OLC αg ∈ C, the number of the next vertex bj in the OLC flow-chart.
Next, after a colon, there is the number of microinstructions that are
executed inside the chain (the sum of all operational vertices inside the OLC
αg ∈ C). Finally, there are pseudo-microinstructions specified, separated by
commas. The meaning of pseudo-microinstructions is the same as explained
in the .fc file description.
• I – input of the OLC (appears only if the OLC has more than one input).
After the symbol there are four numbers specified: the number of the OLC,
the number of the input in the OLC, the position in the OLC, the number
of the next vertex in the OLC flow-chart.
• X – conditional vertex of the OLC flow-chart. After the symbol there is a
definition of the name of the block (x1, x2, etc.). Next, numbers of target
vertices appear – the first number means the vertex where the transition
should be executed if the condition is true while the second one shows the
destination if the condition is false.
• E – end – final vertex of the OLC flow-chart.
An example .olc file for the CMCU U9 is presented in Fig. A.3.
Fig. A.3: OLC description of the CMCU U9
The .olc file contains all necessary data for the main synthesis of the CMCU.
The file is read by the module olc2mcu, which produces the description of the
CMCU (.mcu file) with one (of eight) selected synthesis method. There are three
main sections specified in the .mcu file: Module name and parameters, Excitation
functions, Control memory content. The first section contains information about
the CMCU (its name, the number of inputs, outputs, etc). The second section de-
fines excitation functions for the counter, the register and/or the function decoder.
The last section contains the description of control memory (specified as a table
of microoperations). In the case of applying the circuit of the function decoder or
the address converter, additional description of such a block is written as well. An
example .mcu file content for the CMCU U9 realised as a controller with mutual
memory is presented in Listing A.1.
Lst. A.1: MCU description of the CMCU U9
//Module name and parameters:
Module=CMCU_U9_mm
Method=1
Inputs=2
Outputs=6
Microinstructions=6
InputNames=x1,x2
OutputNames=y0,y1,y2,y3,y4,yK
//Excitation functions:
Counter=3
t1=0
t2=a1*!a2*!a3*x1
t3=a1*!a2*!a3*!x1*x2
//Control Memory content:
Address=3
O0=010100
O1=101000
O2=010100
O3=000110
O4=001000
O5=100111
DefaultMemoryValue=000000
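Since the .mcu description consists of comment lines that open a section and of key=value lines, it can be processed line by line. The Python sketch below illustrates this on a shortened copy of the listing; the function names parse_mcu and memory_words are hypothetical, and the sketch assumes that every section starts with a line beginning with "//" and that each remaining line holds exactly one key=value pair.

```python
from typing import Dict, List


def parse_mcu(text: str) -> Dict[str, Dict[str, str]]:
    """Group key=value lines of an .mcu description by their section comment."""
    sections: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("//"):
            # A comment such as "//Excitation functions:" opens a new section.
            current = line[2:].strip().rstrip(":").strip()
            sections[current] = {}
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def memory_words(sections: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the control memory words O0, O1, ... ordered by address."""
    memory = sections["Control Memory content"]
    words = {int(k[1:]): v for k, v in memory.items()
             if k.startswith("O") and k[1:].isdigit()}
    return [words[address] for address in sorted(words)]


# Shortened copy of the .mcu description of the CMCU U9.
EXAMPLE_MCU = """\
//Module name and parameters:
Module=CMCU_U9_mm
Method=1
Inputs=2
Outputs=6
//Excitation functions:
Counter=3
t1=0
t2=a1*!a2*!a3*x1
//Control Memory content:
Address=3
O0=010100
O1=101000
DefaultMemoryValue=000000
"""

if __name__ == "__main__":
    parsed = parse_mcu(EXAMPLE_MCU)
    print(parsed["Module name and parameters"]["Module"])
    print(memory_words(parsed))
```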
Finally, the code in Verilog HDL is generated. Such a code is the output file
produced by ATOMIC and it is ready for further logic synthesis and implementa-
tion. Detailed description of the Verilog code generated by ATOMIC is shown in
the next subsection.
A.2.3. Output data format of ATOMIC
ATOMIC produces the code in Verilog HDL as the output. Depending on the
selected method, the output file is different. Listing A.2 presents an example
Verilog code of the CMCU U9 generated for the method with mutual memory.
Lst. A.2: Verilog code for the flow-chart Γ9
//-----------------------------------------------
// Description of CMCU
//-----------------------------------------------
module CMCU_U9_mm (y1,y2,y3,y4,yK,clk,reset,x1,x2);
  output y1,y2,y3,y4,yK;
  input clk;
  input reset;
  input x1,x2;
  wire [3:1] t; // excitation function for counter
  wire [3:1] a; // address generated by counter
  wire y0;
  CC combinational (t,x1,x2,a);
  CT counter (a,clk,t,reset,y0);
  CM memory ({y0,y1,y2,y3,y4,yK},~clk,~yK,reset,a);
endmodule
//-----------------------------------------------
// Description of COMBINATIONAL CIRCUIT
//-----------------------------------------------
module CC (t,x1,x2,a);
  output [3:1] t;
  input [3:1] a;
  input x1,x2;
  assign t[1]=0;
  assign t[2]=a[1]&~a[2]&~a[3]&x1;
  assign t[3]=a[1]&~a[2]&~a[3]&~x1&x2;
endmodule
//-----------------------------------------------
// Description of COUNTER
//-----------------------------------------------
module CT (q,clk,data,reset,load);
  output reg [3:1] q;
  input reset,load,clk;
  input [3:1] data;
  always @(posedge reset or posedge clk)
  begin
    if (reset == 1'b1) q = {3{1'b0}};
    else if (load == 1'b1) q = data;
    else q=q+1;
  end
endmodule
//-----------------------------------------------
// Description of CONTROL MEMORY
//-----------------------------------------------
module CM (y,clk,oe,reset,address);
  output reg [6:1] y;
  input [3:1] address;
  input clk,oe,reset;
  // synthesis attribute bram_map of CM is yes
  always @(posedge clk)
  begin
    if (reset) y=0;
    else if (oe)
      case (address)
        0: y=6'b010100;
        1: y=6'b101000;
        2: y=6'b010100;
        3: y=6'b000110;
        4: y=6'b001000;
        5: y=6'b100111;
        6: y=0;
        7: y=0;
        default: y=6'b000000;
      endcase
  end
endmodule
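The interplay of the counter, the combinational circuit and the control memory in the listing above can be followed with a small behavioural model. The Python sketch below is a simplification made for illustration only: it treats one microinstruction as one step, ignores the reset input and the fact that the memory is clocked with the inverted clock, and stops when yK is read. The names excitation and run are hypothetical.

```python
from typing import List, Tuple

# Control memory content of the CMCU U9 (word = y0 y1 y2 y3 y4 yK).
MEMORY = [0b010100, 0b101000, 0b010100, 0b000110, 0b001000, 0b100111]
OUTPUTS = ["y0", "y1", "y2", "y3", "y4", "yK"]


def excitation(a: int, x1: int, x2: int) -> int:
    """Combinational circuit CC: compute t[3:1] from the address a[3:1]."""
    a1, a2, a3 = a & 1, (a >> 1) & 1, (a >> 2) & 1  # a[1] is the LSB
    base = bool(a1 and not a2 and not a3)
    t1 = 0
    t2 = int(base and bool(x1))
    t3 = int(base and not x1 and bool(x2))
    return (t3 << 2) | (t2 << 1) | t1


def run(x1: int, x2: int, max_steps: int = 16) -> List[Tuple[int, List[str]]]:
    """Return the visited addresses with the outputs active at each of them."""
    address, trace = 0, []
    for _ in range(max_steps):
        word = MEMORY[address] if address < len(MEMORY) else 0
        active = [n for i, n in enumerate(OUTPUTS) if (word >> (5 - i)) & 1]
        trace.append((address, active))
        if "yK" in active:          # end of the microprogram
            break
        if "y0" in active:          # counter loads the excitation functions
            address = excitation(address, x1, x2)
        else:                       # counter is incremented
            address = (address + 1) & 0b111
    return trace


if __name__ == "__main__":
    for address, active in run(x1=1, x2=0):
        print(address, active)
```

In this model, y0 selects between loading the counter and incrementing it, which corresponds to the load input of the CT module.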
A.3. Arguments of ATOMIC modules
This section shows the usage of ATOMIC. All modules are independent of the
platform and the system. Therefore, ATOMIC tools operate under the command-
line interface. Each module ought to be run with at least one argument where
the name of the input file is specified. Furthermore, other arguments separated by
the space character may be specified. Consecutive subsections present available
arguments for each ATOMIC module.
A.3.1. Usage of the fc2olc module
The module is executed by the following command:
fc2olc inputfile.fc -o outputfile -q (A.1)
Meaning of arguments:
• inputfile.fc is the input file name that contains the flow-chart description;
• -o outputfile (optional) is the name of the output file, where results are
written (the default output name is the same as inputname with the .olc
extension);
• -q (optional) runs in the quiet mode (the log is not displayed on the screen).
Example of execution: fc2olc test.fc -q.
A.3.2. Usage of the olc2mcu module
The module is executed by the following command:
olc2mcu inputfile.olc -m[1-8] -o outputfile -q (A.2)
Meaning of arguments:
• inputfile.olc is the name of the input file that contains the OLCs description;
• -m[No], where No is the number of the synthesis method (1 . . . 8):
1. CMCU with mutual memory.
2. CMCU with a function decoder.
3. CMCU with output identification.
4. CMCU with output identification and a function decoder.
5. CMCU with sharing codes.
6. CMCU with sharing codes and a function decoder.
7. CMCU with an address converter.
8. CMCU with an address converter and a function decoder;
• -o outputfile (optional) is the name of the output file where results are written
(the default output name is the same as inputname with the .mcu extension);
• -q (optional) runs in the quiet mode (the log is not displayed on the screen).
Example of execution: olc2mcu test.olc -m7 -o cmcu_conv.mcu.
A.3.3. Usage of the mcu2verilog module
The module is executed by the following command:
mcu2verilog inputfile.mcu -o outputfile -q (A.3)
Meaning of arguments:
• inputfile.mcu is the name of the input file that contains the description of the
CMCU in the .mcu format;
• -o outputfile (optional) is the name of the output file, where results are
written (the default output name is the same as inputname with the .v
extension);
• -q (optional) runs in the quiet mode (the log is not displayed on the screen).
Example of execution: mcu2verilog cmcu_conv.mcu -o cmcu.v.
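The three modules are typically run one after another. The Python sketch below shows one possible way of assembling the command lines from the switches described in this section. The helper names are hypothetical, and run_flow additionally assumes that the executables fc2olc, olc2mcu and mcu2verilog are available on the system path, which is not stated in the text.

```python
import subprocess
from typing import List, Optional


def _common(cmd: List[str], output: Optional[str], quiet: bool) -> List[str]:
    """Append the optional -o and -q arguments shared by all modules."""
    if output is not None:
        cmd += ["-o", output]
    if quiet:
        cmd.append("-q")
    return cmd


def fc2olc_cmd(input_fc: str, output: Optional[str] = None,
               quiet: bool = False) -> List[str]:
    return _common(["fc2olc", input_fc], output, quiet)


def olc2mcu_cmd(input_olc: str, method: int, output: Optional[str] = None,
                quiet: bool = False) -> List[str]:
    if not 1 <= method <= 8:
        raise ValueError("the synthesis method must be in the range 1..8")
    return _common(["olc2mcu", input_olc, f"-m{method}"], output, quiet)


def mcu2verilog_cmd(input_mcu: str, output: Optional[str] = None,
                    quiet: bool = False) -> List[str]:
    return _common(["mcu2verilog", input_mcu], output, quiet)


def run_flow(base: str, method: int) -> None:
    """Run the whole flow for base.fc (assumes the tools are on the PATH)."""
    steps = [
        fc2olc_cmd(f"{base}.fc", f"{base}.olc", quiet=True),
        olc2mcu_cmd(f"{base}.olc", method, f"{base}.mcu", quiet=True),
        mcu2verilog_cmd(f"{base}.mcu", f"{base}.v", quiet=True),
    ]
    for cmd in steps:
        subprocess.run(cmd, check=True)  # stop at the first failing module


if __name__ == "__main__":
    print(" ".join(olc2mcu_cmd("test.olc", 7, "cmcu_conv.mcu")))
```

Passing the output names explicitly with -o keeps the intermediate files predictable, although the default names described above would give the same result here.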
Appendix B
DETAILED RESULTS OF EXPERIMENTS
Detailed results of experiments are presented in this appendix. The first section
analyses the effectiveness of the prepared synthesis methods, while the second
one shows the results of experiments performed with partial reconfiguration of
CMCUs.
B.1. Detailed results of experiments of the effectiveness of the
prepared synthesis methods
Tables B.1–B.6 show the results of implementation. To keep the presentation
clear, only the most important benchmarks were selected. It
should be pointed out that detailed analysis and calculation of the average results
of the whole collection of benchmarks are described in Chapter 8.
The table contains the following columns:
• Benchmark – name of the benchmark;
• X – number of inputs (logic conditions) of the controller;
• Y – number of outputs (microoperations) of the controller;
• M1 – length of the longest OLC (OLC that contains the most operational
vertices);
• M2 – total number of OLCs;
• M3 – number of microinstructions (total number of operational vertices);
• FPGA resources – name of the presented FPGA resource (slice, flip-flop,
look-up table or dedicated block-RAM);
• FSM – results achieved for the implementation of the controller as the FSM;
• MM – results achieved for the implementation of the controller as the CMCU
with mutual memory;
• FD – results achieved for the implementation of the controller as the CMCU
with a function decoder;
• OI – results achieved for the implementation of the controller as the CMCU
with output identification;
• OD – results achieved for the implementation of the controller as the CMCU
with output identification and a function decoder;
• SC – results achieved for the implementation of the controller as the CMCU
with sharing codes;
• SD – results achieved for the implementation of the controller as the CMCU
with sharing codes and a function decoder;
• CA – results achieved for the implementation of the controller as the CMCU
with an address converter;
• CD – results achieved for the implementation of the controller as the CMCU
with an address converter and a function decoder.
The method that gives the best results (the CMCU that requires the fewest
LUT elements of an FPGA) is marked in gray. Additionally, in the case of bench-
marks where the application of the address converter reduces the number of re-
quired dedicated memory blocks in comparison to traditional sharing codes, the
number of BRAMs used by the CMCUs USC and USD is marked in red.
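The selection of the best method for a single benchmark can be illustrated with a short calculation. The Python sketch below uses the LUT counts reported for the benchmark MK_01; the function names best_methods and reduction_vs are hypothetical, and the percentage reduction relative to the FSM is an auxiliary measure introduced here only for illustration.

```python
from typing import Dict, List

# LUT counts of the benchmark MK_01 for each implementation method.
LUTS_MK_01: Dict[str, int] = {
    "FSM": 80, "MM": 83, "FD": 59, "OI": 50, "OD": 40,
    "SC": 34, "SD": 35, "CA": 34, "CD": 35,
}


def best_methods(luts: Dict[str, int]) -> List[str]:
    """Return all methods that require the fewest LUTs."""
    fewest = min(luts.values())
    return [method for method, count in luts.items() if count == fewest]


def reduction_vs(luts: Dict[str, int], reference: str = "FSM") -> Dict[str, float]:
    """Percentage reduction of LUTs relative to the reference implementation."""
    ref = luts[reference]
    return {m: round(100.0 * (1.0 - c / ref), 1) for m, c in luts.items()}


if __name__ == "__main__":
    print(best_methods(LUTS_MK_01))
    print(reduction_vs(LUTS_MK_01)["SC"])
```

For MK_01 the methods SC and CA tie with 34 LUTs, so the number of BRAMs can serve as a secondary criterion.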
Tab. B.1: Results of implementation – Part 1
Benchmark   X   Y  M1  M2  M3  FPGA resources  FSM   MM   FD   OI   OD   SC   SD   CA   CD
MK_01      10   9  15  15  85  Slices           45   48   33   26   22   19   20   19   20
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs             80   83   59   50   40   34   35   34   35
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_02      10   9  10  15  89  Slices           52   47   30   31   24   19   19   19   19
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs             91   85   53   54   43   35   34   35   34
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_03       7  10   8  12  48  Slices           29   24   19   20   17   13   14   13   14
                               FF                6    6    6    6    6    7    7    7    7
                               LUTs             48   41   31   36   30   22   26   22   26
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_04       9   9  10  13  56  Slices           25   27   28   26   15   17   14   17   14
                               FF                6    6    6    6    6    8    8    8    8
                               LUTs             45   45   48   47   25   29   24   29   24
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_06       8  13   7  14  58  Slices           32   31   21   20   16   14   13   14   13
                               FF                6    6    6    6    6    7    7    7    7
                               LUTs             55   53   37   36   29   26   23   26   23
                               BRAMs             1    1    2    1    2    1    2    2    3
Tab. B.2: Results of implementation – Part 2
Benchmark   X   Y  M1  M2  M3  FPGA resources  FSM   MM   FD   OI   OD   SC   SD   CA   CD
MK_07      11  10   8  19  70  Slices           30   52   34   46   31   22   19   22   19
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs             46   90   57   83   55   41   35   41   35
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_08       7   9   7  12  49  Slices           22   25   19   22   16   11   11   11   11
                               FF                6    6    6    6    6    7    7    7    7
                               LUTs             39   44   34   41   28   21   21   21   21
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_09       7   9   7  17  56  Slices           34   33   28   28   23   18   17   18   17
                               FF                6    6    6    6    6    8    8    8    8
                               LUTs             63   58   48   50   39   33   33   33   33
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_10      13  12   7  21  92  Slices           69   55   44   48   30   28   22   28   22
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs            126   94   75   84   54   50   39   50   39
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_11       8   9   6  14  48  Slices           32   24   21   24   19   14   17   14   17
                               FF                6    6    6    6    6    7    7    7    7
                               LUTs             49   43   36   45   32   27   31   27   31
                               BRAMs             1    1    2    1    2    1    2    2    3
Tab. B.3: Results of implementation – Part 3
Benchmark   X   Y  M1  M2  M3  FPGA resources  FSM   MM   FD   OI   OD   SC   SD   CA   CD
MK_12      11  10   6  23  71  Slices           55   56   38   36   27   24   20   24   20
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs            100   96   67   64   49   43   35   43   35
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_14       8   9   5  17  53  Slices           26   33   27   31   26   18   17   18   17
                               FF                6    6    6    6    6    8    8    8    8
                               LUTs             47   59   46   54   47   34   28   34   28
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_15       8  12   8  18  68  Slices           37   41   29   25   22   18   15   18   15
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs             49   71   50   48   40   31   28   31   28
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_17       9   9   7  19  72  Slices           43   50   34   30   27   19   17   19   17
                               FF                7    7    7    7    7    8    8    8    8
                               LUTs             81   88   57   54   48   32   31   32   31
                               BRAMs             1    1    2    1    2    1    2    2    3
MK_19       6   9   6  13  46  Slices           21   28   19   21   13   10   10   10   10
                               FF                6    6    6    6    6    7    7    7    7
                               LUTs             38   49   33   38   24   18   18   18   18
                               BRAMs             1    1    2    1    2    1    2    2    3
Tab. B.4: Results of implementation – Part 4
Benchmark   X   Y  M1  M2  M3  FPGA resources  FSM   MM   FD   OI   OD   SC   SD   CA   CD
Test014     2   7  14   3  30  Slices           10   10    8    5    6    4    5    4    5
                               FF                5    5    5    5    5    6    6    6    6
                               LUTs             21   20   15   12   11   10    9   10    9
                               BRAMs             1    1    2    1    2    1    2    2    3
Test017     2   6   4   3  11  Slices            4    6    6    4    5    4    4    4    4
                               FF                4    4    4    4    4    4    4    4    4
                               LUTs             10   12   10    9    9    8    7    8    7
                               BRAMs             1    1    2    1    2    1    2    2    3
Test024     6  17   3   5   9  Slices           10    9    7    9    7    6    5    6    5
                               FF                4    4    4    4    4    5    5    5    5
                               LUTs             20   18   14   18   14   13   12   13   12
                               BRAMs             1    1    2    1    2    1    2    2    3
Test025    65  18  14  54 153  Slices          366  161  131  174  128  130  122  130  122
                               FF                8    8    8    8    8   10   10   10   10
                               LUTs            642  283  229  309  224  224  211  224  211
                               BRAMs             1    1    2    1    2    1    2    2    3
Test027    35 152  16  29 100  Slices          206   99   79  103   82   67   66   66   66
                               FF                7    7    7    7    7    9    9    9    9
                               LUTs            361  176  138  181  142  123  117  121  117
                               BRAMs             5    5    6    5    6    5    6    6    7
Tab. B.5: Results of implementation – Part 5
Benchmark   X   Y  M1  M2  M3  FPGA resources  FSM   MM   FD   OI   OD   SC   SD   CA   CD
Test031    52 222  11  51 151  Slices          276  120  111  134   95   99  106   99  105
                               FF                8    8    8    8    8   10   10   10   10
                               LUTs            482  208  199  240  172  173  187  173  186
                               BRAMs             7    7    8    8    9   13   14    8    9
Test032     8  12   4   2   7  Slices            5    6    5    6    5    4    5    4    5
                               FF                3    3    5    3    5    3    4    3    4
                               LUTs             11   13   11   13   11    9   10    9   10
                               BRAMs             1    1    2    1    2    1    2    2    3
Test033    52 152  11  51 151  Slices          279  119  112  134   95   99  106   98  105
                               FF                8    8    8    8    8   10   10   10   10
                               LUTs            486  207  200  240  172  173  187  172  186
                               BRAMs             5    5    6    5    6    9   10    6    7
Test036    52 452  11  51 151  Slices          279  119  112  134   95   98  105   98  105
                               FF                8    8    8    8    8   10   10   10   10
                               LUTs            486  207  200  240  172  173  190  172  186
                               BRAMs            13   13   14   14   15   27   26   14   15
TestAW01   11  42  36  17 154  Slices           63   33   26   27   19   21   22   21   22
                               FF                8    8    8    8    8   11   11   11   11
                               LUTs            114   58   49   51   38   41   39   41   39
                               BRAMs             2    2    3    2    3    5    6    3    4
Tab. B.6: Results of implementation – Part 6
Benchmark   X   Y  M1  M2  M3  FPGA resources  FSM   MM   FD   OI   OD   SC   SD   CA   CD
TestAW02   11  42  68  17 188  Slices           67   31   29   28   22   27   27   27   27
                               FF                8    8    8    8    8   12   12   12   12
                               LUTs            119   57   55   51   42   47   49   47   49
                               BRAMs             2    2    3    2    3   11   12    4    5
TestAW03   11  42  36  19 188  Slices           60   43   35   30   28   29   26   29   26
                               FF                8    8    8    8    8   11   11   11   11
                               LUTs            108   76   62   56   52   53   48   53   48
                               BRAMs             2    2    3    2    3    5    6    3    4
TestAW05   12  44  70  17 188  Slices           70   39   29   27   21   28   24   28   24
                               FF                8    8    8    8    8   12   12   12   12
                               LUTs            126   68   56   51   40   49   43   49   43
                               BRAMs             2    2    3    2    3   11   12    4    5
TestAW06   14  44  70  17 188  Slices           77   37   30   36   24   31   23   31   23
                               FF                8    8    8    8    8   12   12   12   12
                               LUTs            141   67   56   67   47   55   42   55   42
                               BRAMs             2    2    3    2    3   11   12    4    5
TestAW07   14  44  70  18 207  Slices           75   40   31   37   25   31   24   31   24
                               FF                8    8    8    8    8   12   12   12   12
                               LUTs            134   71   57   69   50   56   45   56   45
                               BRAMs             2    2    3    2    3   11   12    4    5
B.2. Detailed results of partial reconfiguration of CMCUs
implemented on an FPGA
This section presents results gained during partial reconfiguration of CMCUs. To
verify the effectiveness of partial reconfiguration of controllers implemented in the
FPGA, there were four CMCUs selected:
• Test002, which contains control memory implemented with one BRAM,
• Lights, which contains control memory implemented with four BRAMs,
• Test027, which contains control memory implemented with five BRAMs,
• Test036, which contains control memory implemented with 13 BRAMs.
All benchmarks except Test002 were checked in three different modes. In
the first mode, all BRAMs were organized according to automatic placement and
routing executed by the Xilinx implementation tool. In this case, BRAMs are
usually mixed – some of them are placed in the same column, some of them are
located in different columns.
In the second mode, all BRAMs were placed in one column. Such an opera-
tion was performed with Xilinx FPGA Editor, which additionally checks routing
and timing paths. According to the configuration rules presented in Chapter 6,
the content of BRAMs located in the same column is modified with the same
configuration frames. It means that partial reconfiguration of such implemented
memories should require the fewest frames, and therefore the size of the partial
bit-stream should be reduced to the minimum.
The third mode places all BRAMs in different columns (except for Test036,
which requires 13 BRAMs; this exceeds the total number of eight columns, thus
some memory blocks were located in the same column). Partial reconfiguration
of CMCUs realized in this mode is expected to give the smallest gain:
control memory is implemented with BRAMs located in separate columns, so
different frames are required for partial reconfiguration. Therefore, the reduction
of the bit-stream should be the smallest of all presented modes.
Tables B.7–B.9 present the results of experiments of partial reconfiguration of
CMCUs implemented in the FPGA. There are three main columns in the tables:
• Benchmark – contains information about the benchmark (the name, the
number of microinstructions, the number of microoperations, the number of
BRAMs that are required for implementation and the size of the full bit-
stream).
• Modification – contains information about modifications that were done to
the benchmark (the number of modified microinstructions, the number of
BRAMs where microinstructions were modified, the total number of modified
lines (INITs), the number of columns that contain modified BRAMs).
• Results – contains information about the achieved results (the number of
different frames, the size of the partial bit-stream, the size of the partial
bit-stream as a percentage of the full bit-stream, and the achieved reduction
– the number of times that the original bit-stream was reduced).
As expected, the best results were achieved in the second mode, where all
BRAMs were placed in the same column. Even in the case of Test036, which
requires 13 BRAMs, swapping the memory content is performed using only 32
frames and the size of the partial bit-stream is over 50 times smaller compared to
the bit-stream containing full FPGA configuration data.
The worst results were achieved in the third mode, where BRAMs were located
in the different columns. In the case of the benchmark Test036, 256 frames have
to be sent to the FPGA to replace the whole memory content. However, it should
be pointed out that the bit-stream is still reduced over eight times.
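The two derived measures used in the results, the percentage of the original bit-stream and the reduction factor, follow directly from the sizes of the full and partial bit-streams. The Python sketch below recomputes them for three partial bit-stream sizes reported for the experiments; the function name bitstream_gain is hypothetical, and the sizes are taken in the same unit as in the tables.

```python
from typing import Tuple

FULL_BITSTREAM = 1448816  # size of the full bit-stream reported in the tables


def bitstream_gain(partial: int, full: int = FULL_BITSTREAM) -> Tuple[float, float]:
    """Return (percentage of the original size, reduction factor)."""
    percent = round(100.0 * partial / full, 2)
    reduction = round(full / partial, 2)
    return percent, reduction


if __name__ == "__main__":
    # One frame, Test036 with all BRAMs in one column (32 frames),
    # and Test036 with BRAMs spread over eight columns (256 frames).
    for size in (2696, 28349, 180449):
        print(size, bitstream_gain(size))
```

For the 32-frame case the reduction factor is about 51, and for the 256-frame case it is about 8, which agrees with the statements above.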
Tab. B.7: Results of partial reconfiguration of CMCUs – Part 1
Benchmark                                                   Modification                              Results
Name                     Micro-  Micro-  BRAMs  Full        Modified    Modified  Modified  Modified  Different  Partial     % of      Reduction
                         instr.  oper.          bit-stream  microinstr. BRAMs     lines     columns   frames     bit-stream  original
Test002 (default)        11      13      1      1448816     1           1         1         1         1          2696        0,19%     537,39
                                                            2           1         1         1         1          2696        0,19%     537,39
                                                            2           1         2         1         2          3520        0,24%     411,60
                                                            4           1         2         1         2          3520        0,24%     411,60
                                                            8           1         2         1         2          3520        0,24%     411,60
                                                            11          1         1         1         1          2696        0,19%     537,39
                                                            11          1         2         1         2          3523        0,24%     411,24
Lights (default)         53      32      4      1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 2 columns                                      1           2         2         1         1          2696        0,19%     537,39
                                                            1           3         3         2         2          4360        0,30%     332,30
                                                            1           4         4         2         2          4360        0,30%     332,30
                                                            13          1         2         1         2          3520        0,24%     411,60
                                                            26          2         4         1         2          3520        0,24%     411,60
                                                            39          3         6         2         4          6008        0,41%     241,15
                                                            53          4         8         2         4          6008        0,41%     241,15
Lights (min)             53      32      4      1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 1 column                                       1           2         2         1         1          2696        0,19%     537,39
                                                            1           3         3         1         1          2696        0,19%     537,39
                                                            1           4         4         1         1          2696        0,19%     537,39
                                                            13          1         2         1         2          3520        0,24%     411,60
                                                            26          2         4         1         2          3520        0,24%     411,60
                                                            39          3         6         1         2          3520        0,24%     411,60
                                                            53          4         8         1         2          3520        0,24%     411,60
Lights (max)             53      32      4      1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 4 columns                                      1           2         2         2         2          4360        0,30%     332,30
                                                            1           3         3         3         3          6024        0,42%     240,51
                                                            1           4         4         4         4          7688        0,53%     188,45
                                                            13          1         2         1         2          3520        0,24%     411,60
                                                            26          2         4         2         4          6008        0,41%     241,15
                                                            39          3         6         3         6          8496        0,59%     170,53
                                                            53          4         8         4         8          10984       0,76%     131,90
Tab. B.8: Results of partial reconfiguration of CMCUs – Part 2
Benchmark                                                   Modification                              Results
Name                     Micro-  Micro-  BRAMs  Full        Modified    Modified  Modified  Modified  Different  Partial     % of      Reduction
                         instr.  oper.          bit-stream  microinstr. BRAMs     lines     columns   frames     bit-stream  original
Test027 (default)        100     152     5      1448816     1           1         1         1         1          2699        0,19%     536,80
all BRAMs in 3 columns                                      1           2         2         2         2          4360        0,24%     411,13
                                                            1           3         3         2         2          4360        0,24%     411,60
                                                            1           4         4         3         3          6024        0,30%     333,29
                                                            1           5         5         3         3          6024        0,30%     333,06
                                                            20          1         16        1         16         15060       1,04%     96,20
                                                            40          2         32        2         32         29091       2,01%     49,80
                                                            60          3         48        2         32         29088       2,01%     49,81
                                                            80          4         64        3         48         42488       2,93%     34,10
                                                            100         5         68        3         48         42491       2,93%     34,10
Test027 (min)            100     152     5      1448816     1           1         1         1         1          2699        0,19%     536,80
all BRAMs in 1 column                                       1           2         2         1         1          2696        0,19%     537,39
                                                            1           3         3         1         1          2696        0,19%     537,39
                                                            1           4         4         1         1          2696        0,19%     537,39
                                                            1           5         5         1         1          2696        0,19%     537,39
                                                            20          1         16        1         2          3520        0,24%     411,60
                                                            40          2         32        1         4          5168        0,36%     280,34
                                                            60          3         48        1         8          9292        0,64%     155,92
                                                            80          4         64        1         16         15055       1,04%     96,23
                                                            100         5         68        1         16         15060       1,04%     96,20
Test027 (max)            100     152     5      1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 5 columns                                      1           2         2         2         2          4360        0,30%     332,30
                                                            1           3         3         3         3          6024        0,42%     240,51
                                                            1           4         4         4         4          7688        0,53%     188,45
                                                            1           5         5         5         5          9352        0,65%     154,92
                                                            20          1         16        1         16         15060       1,04%     96,20
                                                            40          2         32        2         32         29088       2,01%     49,81
                                                            60          3         48        3         48         43116       2,98%     33,60
                                                            80          4         64        4         64         54048       3,73%     26,81
                                                            100         5         68        5         68         54984       3,80%     26,35
Tab. B.9: Results of partial reconfiguration of CMCUs – Part 3
Benchmark                                                   Modification                              Results
Name                     Micro-  Micro-  BRAMs  Full        Modified    Modified  Modified  Modified  Different  Partial     % of      Reduction
                         instr.  oper.          bit-stream  microinstr. BRAMs     lines     columns   frames     bit-stream  original
Test036 (default)        151     452     13     1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 5 columns                                      1           2         2         1         1          2696        0,19%     537,39
                                                            1           4         4         2         2          4360        0,30%     332,30
                                                            1           8         8         4         4          7688        0,53%     188,45
                                                            1           13        13        5         5          9352        0,65%     154,92
                                                            30          1         30        1         30         26697       1,84%     54,27
                                                            60          2         60        2         60         52273       3,61%     27,72
                                                            90          4         64        3         48         43351       2,99%     33,42
                                                            120         8         128       4         64         57519       3,97%     25,19
                                                            151         13        416       5         160        118263      8,16%     12,25
Test036 (min)            151     452     13     1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 1 column                                       1           2         2         1         1          2696        0,19%     537,39
                                                            1           4         4         1         1          2696        0,19%     537,39
                                                            1           8         6         1         1          2696        0,19%     537,39
                                                            1           13        13        1         1          2696        0,19%     537,39
                                                            30          1         30        1         30         26701       1,84%     54,26
                                                            60          2         60        1         30         26701       1,84%     54,26
                                                            90          4         64        1         30         26701       1,84%     54,26
                                                            120         8         128       1         32         28349       1,96%     51,11
                                                            151         13        416       1         32         28349       1,96%     51,11
Test036 (max)            151     452     13     1448816     1           1         1         1         1          2696        0,19%     537,39
all BRAMs in 8 columns                                      1           2         2         1         1          2696        0,19%     537,39
                                                            1           4         4         3         3          6143        0,42%     235,85
                                                            1           8         8         5         5          9352        0,65%     154,92
                                                            1           13        13        8         8          14495       1,00%     99,95
                                                            30          1         30        1         30         26702       1,84%     54,26
                                                            60          2         60        2         60         52273       3,61%     27,72
                                                            90          4         64        4         64         57531       3,97%     25,18
                                                            120         8         128       5         80         71427       4,93%     20,28
                                                            151         13        416       8         256        180449      12,45%    8,03
Bibliography
Adamski M. (1980): Programmable asynchronous control units with selfsynchro-
nization. — Proceedings of the 7th National Conference on Automaton,
KKA’80, Szczecin, Poland, pp. 203–208 (in Polish).
Adamski M. (1999): Application specific logic controllers for safety critical sys-
tems. — 14th IFAC World Congress, Beijing, China: Oxford, International
Federation of Automatic Control, Vol. Q: Transportation Systems: Computer
Control, pp. 519–524.
Adamski M. and Barkalov A. (2006): Architectural and Sequential Synthesis of
Digital Devices. — Zielona Góra: University of Zielona Góra Press.
Adamski M. and Węgrzyn M. (2003): Reprogrammable controllers for reactive em-
bedded systems. — Real-Time Programming 2003 (WRTP 2003): A Proceed-
ings Volume from the 26th IFAC/IFIP/IEEE Workshop, Oxford: Elsevier,
pp. 39–44.
Adamski M., Węgrzyn M. and Węgrzyn A. (2007): Safe reconfigurable logic con-
trollers design. — Measurements, Models, Systems and Design, Warsaw:
WKŁ, pp. 343–370.
Ahmad I., Ali F. and Ul-Mustafa R. (2000): An integrated state assignment and
flip-flop selection technique for FSM synthesis. — Microprocessors and Mi-
crosystems.
Aho A. V., Hopcroft J. E. and Ullman J. D. (1974): The Design and Analysis of
Computer Algorithms. — Reading, MA: Addison-Wesley.
Altera (2006): Embedded Memory in Altera FPGAs. — Altera.
http://www.altera.com/technology/memory/embedded
/mem-embedded.html
Altera (2008): Altera Devices Website. — Altera.
http://www.altera.com/products/devices/dev-index.jsp
Ashar P., Devadas S. and Newton A. R. (1990): A unified approach to the decompo-
sition and re-decomposition of sequential machines. — DAC ’90: Proceedings
of the 27th ACM/IEEE Conference on Design Automation, New York, NY,
USA: ACM, pp. 601–606.
Ashar P., Devadas S. and Newton A. R. (1992): Sequential Logic Synthesis. —
Norwell, MA: Kluwer Academic Publishers.
BIBLIOGRAPHY 131
Baranov S. I. (1994): Logic Synthesis for Control Automata. — Boston, MA:
Kluwer Academic Publishers.
Barkalov A. (1998): Principles of optimization of logical circuit of Moore finite-
state-machine. — Cybernetics and System Analysis, Vol. 1, pp. 65–72.
Barkalov A. (2002): Synthesis of Control Units on PLDs. — Donetsk: DonNTU.
Barkalov A., Bukowiec A. and Wiśniewski R. (2005a): Synthesis of control units
with optimization of excitation functions. — Radiotehnika, Vol. 142, pp. 92–
96 (in Russian).
Barkalov A., Efimenko K. and Wiśniewski R. (2006a): Optimization of an OLC
address in the compositional microprogram control unit. — Naukovi Praci
Donec’kogo Nacional’nogo Tehniènogo Universitetu: Problemi Modeljuvan-
nja ta Avtomatizacii Proektuvannja Dinamicnich Sistem, Vol. 5, pp. 156–161
(in Russian).
Barkalov A. and Palagin A. (1997): Synthesis of Microprogram Control Units. —
Kiev: IC NAC of Ukraine.
Barkalov A., Titarenko L. and Bieganowski J. (2008): Synthesis of control unit
using code sharing and chain modifications. — Przegląd Telekomunikacyjny i
Wiadomości Telekomunikacyjne, Vol. 6, pp. 753–755 [CD–ROM].
Barkalov A., Titarenko L. and Chmielewski S. (2007a): Reduction in the number
of PAL macrocells in the circuit of a moore FSM. — International Journal of
Applied Mathematics and Computer Science, Vol. 17, No. 4, pp. 565–675.
Barkalov A., Titarenko L. and Wiśniewski R. (2005b): Optimization of the amount
of LUT-elements in compositional microprogram control unit with mutual
memory. — Proceedings of the IEEE East-West Design & Test Workshop
– EWDTW ’05, Odessa, Ukraine, Kharkov National University of Radioelec-
tronics, pp. 75–79.
Barkalov A., Titarenko L. and Wiśniewski R. (2005c): Synthesis of compositional
microprogram control units with transformation of the numbers of inputs. —
The Experience of Designing and Application of CAD Systems in Micro-
electronics: Proceedings of the 8th International Conference CADSM 2005,
Lviv-Polyana, Ukraine: Publishing House of the Lviv Polytechnic National
University, 2005c, pp. 181–184.
Barkalov A., Titarenko L. and Wiśniewski R. (2007b): Synthesis of compositional
microprogram control units with function decoder for telecommunication sys-
tems. — Radiotehnika: Problems of Telecommunications, Vol. 151, pp. 106–
111.
Barkalov A., Titarenko L. and Wiśniewski R. (2007c): Optimization of the cir-
cuit of compositional microprogram control unit with mutual memory. — The
132 BIBLIOGRAPHY
Experience of Designing and Application of CAD Systems in Microelectron-
ics: Proceedings of the 9th International Conference CADSM 2007, Lviv-
Polyana, Ukraine: Publishing House of the Lviv Polytechnic National Uni-
versity, pp. 251–255.
Barkalov A. and Węgrzyn M. (2006b): Design of Control Units with Programmable
Logic. — Zielona Góra: University of Zielona Góra Press.
Barkalov A., Węgrzyn M. and Wiśniewski R. (2006c): Partial reconfiguration of
compositional microprogram control units implemented on FPGAs. — Pro-
grammable Devices and Embedded Systems, PDeS 2006: Proceedings of the
IFAC Workshop, Brno, Czech Republic, pp. 116–119.
Barkalov A., Węgrzyn M. and Wiśniewski R. (2006d): Optimization of LUT-
elements amount in cotrol unit of system-on-chip. — Discrete-Event System
Design, DESDes ’06: A Proceedings Volume from the 3rd IFAC Workshop,
Rydzyna, Poland: International Federation of Automatic Control by The
University of Zielona Góra Press, pp. 143–146.
Barkalov A. and Wiśniewski R. (2004a): Design of compositional microprogram
control units with maximal encoding of inputs. — Radioelektronika i Infor-
matika, Vol. 3, pp. 79–81.
Barkalov A. and Wiśniewski R. (2004b): Optimization of compositional micropro-
gram control unit with elementary operational linear chains. — Upravljuscije
Sistemy i Masiny, Vol. 5, pp. 25–29.
Barkalov A. and Wiśniewski R. (2004d): Design of control units with transforma-
tion of the number of transaction. — Radiotehnika, Vol. 138, pp. 110–113.
Barkalov A. and Wiśniewski R. (2004e): Design of compositional microprogram
control units with transformation of the number of transactions. — Mixed
Design of Integrated Circuits and Systems, MIXDES 2004, Szczecin, Poland:
Proceedings of the 11th International Conference, pp. 172–175.
Barkalov A. and Wiśniewski R. (2004f): Optimization of compositional mi-
croprogram control units with sharing of codes. — Avtomatizacija proek-
tirovanija diskretnych sistem: Materialy pjatoj mezdunarodnoj konferencii,
Nacional’naja Akademija Nauk Belarusi, Minsk, Belarus, 2004f, Vol. 1,
pp. 16–22.
Barkalov A. and Wiśniewski R. (2004g): Synthesis of compositional microprogram
control units with transformation of the numbers of inputs. — Discrete-Event
System Design, DESDes’ 04: Proceedings of the International Workshop,
Dychów, Poland: University of Zielona Góra Press, pp. 145–148.
Barkalov A. and Wiśniewski R. (2005d): Synthesis of compositional microprogram
control units with optimal encoding of elementary linear chains. — Infor-
matika, Vol. 1, pp. 95–102 (in Russian).
BIBLIOGRAPHY 133
Barkalov A. and Wiśniewski R. (2005e): Implementation of compositional mi-
croprogram control unit on FPGAs. — Proceedings of the IEEE East-West
Design & Test Workshop, EWDTW ’05, Kharkov National University of Ra-
dioelectronics, Odessa, Ukraine: IEEE EWDTW, pp. 80–83.
Barkalov A. and Wiśniewski R. (2005h): Optimization of compositional micropro-
gram control units implemented on system-on-chip. — Informatyka Teorety-
czna i Stosowana, Vol. 5, No. 9, pp. 7–22.
Barkalov A., Wiśniewski R. and Babakov R. (2004c): Optimization of the compo-
sitional microprogram control unit with elementary linear chains. — Naukovi
Praci Donec’kogo Nacional’nogo Technicnogo Universitetu: Obcisljuval’na
Technika ta Avtomatizacija, Vol. 77, pp. 210–216 (in Russian).
Barkalov A., Wiśniewski R. and Efimenko K. (2005i): Designing of the compo-
sitional microprogram control units for FPGAs. — Radioelektronika Infor-
matika Upravlinnja, Vol. 2, pp. 127–131 (in Russian).
Barkalov A., Wiśniewski R. and Greczko E. (2005j): Synthesis of compositional
microprogram control units implemented in FPGAs. — Proceedings of the
Workshop RUC 2005, Szczecin, Poland, pp. 17–23 (in Polish).
Barkalov A., Wiśniewski R., Kovalyov S. and Efimenko K. (2006e): Optimization
of LUT-elements in microprogrammed controllers implemented in an FPGA.
— Mašinostroenie i tehnosfera XXI veka: Sbornik trudov XIII meždunarod-
noj nauèno-tehnièeskoj konferencii, Sevastopol, Ukraine: Doneck, DonNTU,
Vol. 1, pp. 75–80 (in Russian).
Barkalov A., Wiśniewski R. and Titarenko L. (2005f): Synthesis of compositional
microprogram control unit on FPGA. — Mixed Design of Integrated Circuits
and Systems, MIXDES 2005: Proceedings of the 12th International Confer-
ence, Cracow, Poland, Vol. 1, pp. 205–208.
Berge C. (1973): Graphs and Hypergraphs. — Amsterdam, Holland: North Hol-
land.
Bibilo P. (1999): Background of VHDL. — Moscow: Solon-R.
Bolton M. (1990): Digital Systems Design with Programmable Logic. — Boston,
MA: Addison-Wesley Longman Publishing Co., Inc.
Borowik G. (2004): Synthesis of sequential circuits implemented in FPGAs. —
Proceedings of the International PhD Workshop OWD’04, Wisła, Poland,
2004, Vol. 19 of the PTETiS Conference Archive, , pp. 361–366 (in Polish).
Borowik G. (2005): FSM coding for optimal serial decomposition. — Proceedings
of the International PhD Workshop OWD’05, Wisła, Poland, 2005, Vol. 21
of the PTETiS Conference Archive, , pp. 243–248.
134 BIBLIOGRAPHY
Brown S. and Vernesic Z. (2000): Fundamentals of Digital Logic with VHDL De-
sign. — New York, NY: McGraw Hill.
Bubacz P. (2008): Hierarchical designing of control units. — Proceedings of the In-
ternational PhD Workshop OWD 2008, Wisła, Poland, Polish Society of The-
oretical and Applied Electrotechnics: Warsaw, Vol. 25, pp. 507–512 (in Pol-
ish).
Bukowiec A. (2006): Synthesis of mealy FSM with multiple shared encoding of
microinstructions and internal states. — Programmable Devices and Embed-
ded Systems, PDeS 2006: Proceedings of the IFAC Workshop, Brno, Czech
Republic: pp. 95–100.
Bukowiec A., Barkalov A. and Titarenko L. (2008): FSMs implementation into
FPGAs with multiple encoding of states. — Proceedings of the IEEE East-
West Design & Test Symposium, EWDTS’ 08, Kharkov National University
of Radioelectronics, Lviv, Ukraine: Institute of Electrical and Electronics
Engineers, Inc., pp. 72–75.
Bukowiec A. and Węgrzyn M. (2005): Design of logic controllers for safety criti-
cal systems using FPGAs with embedded microprocessors. — Real-Time Pro-
gramming 2004 (WRTP 2004): A Proceedings Volume from the 28th IFAC/I-
FIP Workshop, Oxford: Elsevier Ltd, pp. 97–102.
Bursky D. (1999): Embedded logic and memory find A home in FPGAs. — Elec-
tronic Design, Vol. 47, No. 14, pp. 43–56.
Chmielewski S. and Węgrzyn M. (2006): Modelling and synthesis of automata
in HDLs. — Proceedings of SPIE: Photonics Applications in Astronomy,
Communications, Industry and High-Energy Physics Experiments, Vol. 6347,
pp. 14.
Ciesielski M. and Yang S. (1992): PLA DE: A two-stage PLA decomposition.
— IEEE Transactions on CAD of Integrated Circuits and Systems, Vol. 11,
No. 8, pp. 943–954.
Clements A. (2000): The Principles of Computer Hardware. — Oxford, NJ: Oxford
University Press.
Cormen T. H., Leiserson C. E., Rivest R. L. and Stein C. (2001): Introduction to
Algorithms. — The MIT Press.
Dagless E. L. (1983): PLA and ROM based design. — Semi-Custom IC Design
and VLSI, IEE Digital Electronics and Computing Series 2, Herts: Peter
Peregrinus Ltd., pp. 121–135.
De Micheli G. (1994): Synthesis and Optimization of Digital Circuits. — New
York, NY: McGraw-Hill.
BIBLIOGRAPHY 135
Devadas S., Wang A., Newton R. and Sangiovanni-Vincentelli A. L. (1989):
Boolean decomposition in multilevel logic optimization. — IEEE Journal of
solid-state circuits, April, pp. 399–408.
Devadas S., Wang A. R., Newton A. R. and Sangiovanni-Vincentelli A. L.
(1988): Boolean decomposition of programmable logic arrays. — Proceed-
ings of IEEE Custom Integrated Circuits Conference, CICC’88, California
University, Berkeley, CA.
Doligalski M. and Węgrzyn M. (2007): Partial reconfiguration-oriented design
of logic controllers. — Proceedings of SPIE: Photonics Applications in As-
tronomy, Communications, Industry, and High-Energy Physics Experiments,
Vol. 6937, pp. 10.
Gajski D. (1996): Principles of Digital Design. — Upper Saddle River, NJ: Pren-
tice Hall.
Halang W. A. and Krämer B. (1992): Achieving high integrity of process control
software by graphical design and formal verification. — Software Engineering
Journal, Vol. 7, No. 1, pp. 53–64.
Halang W. A. and Krämer B. J. (1994): Safety assurance in process control. —
IEEE Softw., Vol. 11, No. 1, pp. 61–67.
Halang W. and Adamski M. (1997): A programmable electronic system for safety
related control applications. — Advances in Safety and Reliability: Proceed-
ings of the International Conference, ESREL ’97, Oxford: Pergamon, Vol. 1,
pp. 349–355.
Harary F. (1994): Graph Theory. — Reading, MA: Addison-Wesley.
Hrynkiewicz E., Pucher K. and Kania D. (1997): The input partitioning and coding
problem in PAL-based CPLDs. — 20th National Conference on Circuit Theory
and Electronic Networks, Kołobrzeg, Poland, pp. 145–152.
Husson S. (1970): Microprogramming – Principles and Practices. — New York,
NY: Prentice Hall.
IEEE (2001): IEEE Standard Verilog Hardware Description Language 1364-2001.
— New York, NY: Institute of Electrical and Electronics Engineers.
Jenkins J. (1994): Designing with FPGAs and CPLDs. — Upper Saddle River,
NJ: Prentice Hall.
Jóźwiak L. and Chojnacki A. (2003): Effective and efficient FPGA synthesis
through general functional decomposition. — J. Syst. Archit., Vol. 49, No. 4-6,
pp. 247–265.
Kania D. (1999): Two-level logic synthesis on PAL-based CPLD and FPGA us-
ing decomposition. — Procedings of the 25-th Euromicro Conference, IEEE
Computer Society Press, pp. 278–281.
136 BIBLIOGRAPHY
Kania D. (2004): The Logic Synthesis for the PAL-based Complex Programmable
Logic Devices. — Gliwice: Lecture Notes of the Silesian University of Tech-
nology (in Polish).
Kania D. (2007): A new approach to logic synthesis of multi-output Boolean func-
tions on PAL-based CPLDS. — GLSVLSI ’07: Proceedings of the 17th ACM
Great Lakes Symposium on VLSI, New York, NY, USA: ACM, pp. 152–155.
Kania D. and Kulisz J. (2007): Logic synthesis for PAL-based CPLDs, based on
two-stage decomposition. — J. Syst. Softw., Vol. 80, No. 7, pp. 1129–1141.
Kania D., Kulisz J., Milik A. and Czerwiński R. (2005): Decomposition models for
CPLDs. — Proceedings of the Workshop RUC 2005, Szczecin, Poland, 2005,
pp. 77–84 (in Polish).
Kernighan B. W. and Ritchie D. M. (1977): The C Programming Language. —
Englewood Cliffs, NJ: Prentice-Hall.
Kołopieńczyk M. (2008): Application of Adress Converter for Decreasing Mem-
ory Size of Compositional Microprogram Control Unit with Code Sharing.
— Lecture Notes in Control and Computer Science, Vol. 12, Zielona Góra:
University of Zielona Góra Press.
Kołopieńczyk M., Titarenko L. and Barkalov A. (2007): Design of CMCU with
expanded microinstruction and elementary OLC’s. — Pomiary, Automatyka,
Kontrola, Vol. 5, pp. 72–74.
Kravcov L. and Chernicki G. (1976): Design of Microprogram Control Units. —
Leningrad: Energia (in Russian).
Kubátová H. (2005): Finite State Machine Implementation in FPGAs. — Design
of Embedded Control Systems, New York, NY: Springer, pp. 177–187.
Łabiak G. (2005a): Application of Hierarchical Model of Concurent Automata in
the Digital Controller Design. — Lecture Notes in Control and Computer
Science, Vol. 6, Zielona Góra: University of Zielona Góra Press (in Polish).
Łabiak G. (2005b): Symbolic state exploration of UML statecharts for hardware de-
scription. — Design of Embedded Control Systems, New York, NY: Springer,
pp. 73–83.
Łach J., Sapiecha E. and Zbierzchowski B. (2003): Synthesis of sequential circuits
for FPGAs with embedded memory blocks. — Przegląd Telekomunikacyjny i
Wiadomości Telekomunikacyjne, Vol. 2-3, pp. 81–86 (in Polish).
Lee J. M. (1999): VERILOG QuickStart: A Practical Guide to Simulation and
Synthesis in VERILOG. — Norwell, MA: Kluwer Academic Publishers.
Łuba T. (2001): Synthesis of Digital Devices. — Warsaw: WSISiZ (in Polish).
BIBLIOGRAPHY 137
Łuba T. (2003): Synthesis of Digital Devices. — Warsaw: WKŁ (in Polish).
Łuba T. (2005): Synthesis of Logic Devices. — Warsaw: Warsaw University of
Technology Press (in Polish).
Łuba T., Rawski M. and Jachna Z. (2002): Functional decomposition as a uni-
versal method of logic synthesis for digital circuits. — Proceedings of the 9th
International Conference on Mixed Design of Integrated Circuits and Systems
MixDes’02, Wrocław, Poland, pp. 285–290.
Maxfield C. (2004): The Design Warrior’s Guide to FPGAs. — Orlando, FL:
Academic Press, Inc.
McCluskey E. (1986): Logic Design Principles. — Englewood Cliffs, NJ: Prentice
Hall.
Mealy G. (1955): A method for synthesizing sequential circuits. — Bell System
Technical Journal, Vol. 34, Nr. 5 (1955), pp. 1045–1079.
Mesquita D., Moraes F., Palma J., Möller L. and Calazans N. (2003): Remote and
partial reconfiguration of FPGAs: Tools and trends. — Proceedings of the
International Parallel and Distributed Processing Symposium (IPDPS’03),
Montpellier, France, pp. 177–185.
Misiurewicz P. (1982): Digital Circuits. — Warsaw: WNT (in Polish).
Molski M. (1986): Modular and Microprogramed Digital Devices. —Warsaw: WKŁ
(in Polish).
Moore E. (1956): Gedanken experiments on sequential machines. — Automata
Studies, PUP, pp. 129–153.
Muthukumar V., Bignall R. J. and Selvaraj H. (2007): An efficient variable parti-
tioning approach for functional decomposition of circuits. — J. Syst. Archit.,
Vol. 53, No. 1, pp. 53–67.
Papachristou C. A. (1979): A scheme for implementing microprogram addressing
with programmable logic arrays. — Digital Processes, Vol. 5, No. 3-4, pp. 235–
256.
Parnell K. and Mehta N. (2003): Programmable Logic Design Quick Start Hand
Book. — Xilinx.
Pasierbiński J. and Zbysiński P. (2001): Programmable Devices in Practice. —
Warsaw: WKŁ (in Polish).
Pecheux F., Lallement C. and Vachoux A. (2005): VHDL-AMS and Verilog-AMS
as alternative hardware description languages for efficient modeling of multi-
discipline systems. — IEEE Transactions on Computer-Aided Design of In-
tegrated Circuits and Systems, Vol. 24, No. 2, pp. 204–225.
138 BIBLIOGRAPHY
Perkowski M., Jóźwiak L. and Zhao W. (2001): Symbolic two-dimensional mini-
mization of strongly unspecified finite state machines. — Journal of Systems
Architecture, Vol. 47, pp. 15–28.
Rawski M., Jóźwiak L. and Łuba T. (2001): Functional decomposition with an
efficient input support selection for sub-functions based on information rela-
tionship measures. — Journal of Systems Architecture, Vol. 47, pp. 137–155.
Rawski M., Łuba T., Jachna Z. and Tomaszewicz P. (2005): The influence of func-
tional decomposition on modern digital design process. — Design of Embedded
Control Systems, Boston, MA: Springer, pp. 193–206.
Rawski M., Selvaraj H. and Luba T. (2003): An application of functional de-
composition in ROM-based FSM implementation in FPGA devices. — DSD
’03: Proceedings of the Euromicro Symposium on Digital Systems Design,
Washington, DC, USA: IEEE Computer Society, 2003, p. 104.
Rawski M., Tomaszewicz P., Selvaraj H. and Łuba T. (2005c): Efficient imple-
mentation of digital filters with use of advanced synthesis methods targeted
FPGA architectures. — DSD ’05: Proceedings of the 8th Euromicro Con-
ference on Digital System Design, Washington, DC, USA: IEEE Computer
Society, 2005c, pp. 460–466.
Salcic Z. (1998): VHDL and FPLDs in Digital Systems Design, Prototyping and
Customization. — Boston, MA: Kluwer Academic Publishers.
Sasao T. (1988): Multiple-valued logic and optimization of programmable logic
arrays. — Computer, Vol. 21, No. 4, pp. 71–80.
Sasao T. (1999): Totally undecomposable functions: Applications to efficient
multiple-valued decompositions. — ISMVL ’99: Proceedings of the 29th IEEE
International Symposium on Multiple-Valued Logic, Washington, DC, USA:
IEEE Computer Society, 1999, p. 59.
Scholl C. (2001): Functional Decomposition with Application to FPGA Synthesis.
— Norwell, MA: Kluwer Academic Publishers.
Selvaraj H. and Luba T. (1995): A balanced multilevel decomposition method. —
EDTC ’95: Proceedings of the 1995 European Conference on Design and Test,
Washington, DC, USA: IEEE Computer Society, 1995, p. 594.
Selvaraj H., Tomaszewicz P., Rawski M. and Luba T. (2005): Efficient application
of modern logic synthesis in FPGA-based designing of information and signal
processing systems. — ITCC ’05: Proceedings of the International Conference
on Information Technology: Coding and Computing (ITCC’05) – Volume II,
Washington, DC, USA: IEEE Computer Society, 2005, pp. 22–27.
Sentovich E. M. (1993): Sequential Circuit Synthesis at the Gate Level. — PhD
thesis, University of California, Berkeley.
BIBLIOGRAPHY 139
Sentovich E., Singh K. J., Moon C. W., Savoj H., Brayton R. K. and Sangiovanni-
Vincentelli A. L. (1992a): Sequential circuit design using synthesis and opti-
mization. — ICCD ’92: Proceedings of the 1991 IEEE International Confer-
ence on Computer Design on VLSI in Computer & Processors, Washington,
DC, USA: IEEE Computer Society, 1992a, pp. 328–333.
Sentovich E., Singh K., Lavagno L., Moon C., Murgai R., Saldanha A., Savoj H.,
Stephan P. R., Brayton R. K. and Sangiovanni-Vincentelli A. (1992b): SIS:
A system for sequential circuit synthesis. — Technical Report UCB/ERL
M92/41, University of California, Berkeley.
Skahill K., Legenhausen J., Wade R., Wilner C. and Wilson B. (1996): VHDL
for Programmable Logic. — Redwood City, CA: Addison Wesley Longman
Publishing Co., Inc.
Solovjev V. (1996): Design of the Functional Units of Digital Systems Using Pro-
grammable Logic Devices. — Minsk: Bestprint.
Stallings W. (1996): Computer Organization and Architecture. — Upper Saddle
River, NJ: Prentice Hall.
Stroustrup B. (1986): The C++ Programming Language. — Reading, MA:
Addison-Wesley.
Thomas D. and Moorby P. (2002): The Verilog Hardware Description Language.
— 5th edn, Norwell, MA: Kluwer Academic Publishers.
Tkacz J. (2006): Gentzen system calculus implementation for symbolic minimal-
ization of complicated logical expressions. — Discrete-Event System Design,
DESDes ’06: A Proceedings Volume from the 3rd IFAC Workshop, Rydzyna,
Poland: Zielona Góra, International Federation of Automatic Control by the
University of Zielona Góra Press, pp. 53–56.
Traczyk W. (1982): Digital Devices. Theoretical Basics and Synthesis Methods. —
Warsaw: WNT (in Polish).
Wilkes M. V. (1951): The best way to design an automatic calculating machine.
— Manchester University Inaugural Conference, Manchester.
Wilson R. J. (1979): Introduction to Graph Theory. — New York, NY: Academic
Press.
Wiśniewska M. and Wiśniewski R. (2005): Application of hypergraphs in the mini-
mization of the memory volume of microprogrammed controllers. — Proceed-
ings of the Workshop KNWS 2005, Złotniki Lubańskie, Poland: University
of Zielona Góra Press, pp. 23–32 (in Polish).
Wiśniewska M., Wiśniewski R. and Adamski M. (2005): Reduction of the microin-
struction length in the microprogrammed controllers based on the hypergraph
theory. — Proceedings of the Workshop RUC 2005, Szczecin, Poland, pp. 33–
40 (in Polish).
140 BIBLIOGRAPHY
Wiśniewska M., Wiśniewski R. and Adamski M. (2007): Usage of hypergraph the-
ory in decomposition of concurrent automata. — Pomiary, Automatyka, Kon-
trola, Vol. 7, pp. 66–68.
Wiśniewski R. (2004): Design of compositional microprogram control units with
sharing of the codes. — Proceedings of the International PhDWorkshop OWD
2004, Wisła, Poland: PTETiS, Vol. 19, pp. 217–220.
Wiśniewski R. (2005a): Designing and implementation of microprogrammed con-
trollers with embedded memory blocks of FPGAs. — Proceedings of the Work-
shop KNWS 2005, Złotniki Lubańskie, Poland: Zielona Góra, University of
Zielona Góra Press, 2005, pp. 33–38 (in Polish).
Wiśniewski R. (2005b): Partial reconfiguration of microprogrammed controllers
implemented in FPGAs. — Proceedings of the International PhD Workshop
OWD 2005, Wisła, Poland: PTETiS, Vol. 21, pp. 239–242 (in Polish).
Wiśniewski R. (2006a): Design of compositional microprogram control units with
elementary operational linear chains. — Discrete-Event System Design, DES-
Des ’06: A Proceedings Volume from the 3rd IFAC Workshop, Rydzyna,
Poland, pp. 191–194.
Wiśniewski R. (2006b): Synthesis of microprogrammed controllers with sharing
codes and address decoder. — Pomiary, Automatyka, Kontrola, Vol. 6, pp. 38–
40 (in Polish).
Wiśniewski R. and Barkalov A. (2007): Synthesis of compositional microprogram
control units with function decoder. — International Workshop Control and
Information Technology, IWCIT 2007, Ostrava, Czech Republic: VSB – Tech-
nical University of Ostrava, pp. 229–232.
Wiśniewski R., Barkalov A. and Titarenko L. (2006a): Optimization of address
circuit of compositional microprogram unit. — Proceedings of the IEEE East-
West Design & Test Workshop – EWDTW ’06, Sochi, Russia: Kharkov Na-
tional University of Radioelectronics, pp. 167–170.
Wiśniewski R., Barkalov A. and Titarenko L. (2006b): Synthesis of compositional
microprogram control units with sharing codes and address converter. — Po-
miary, Automatyka, Kontrola, Vol. 7, pp. 121–123 (in Polish).
Wiśniewski R., Barkalov A. and Titarenko L. (2006c): Synthesis of compositional
microprogram control units with sharing codes and address decoder. — Mixed
Design of Integrated Circuits and Systems, MIXDES 2006, Gdynia, Poland:
Proceedings of the International Conference, pp. 397–400.
Wiśniewski R., Barkalov A. and Titarenko L. (2007): Synthesis of compositional
microprogram control units with OLC output identification. — Computer-
Aided Design of Discrete Devices, CAD DD ’07: Proceedings of the 6th In-
ternational Conference, Minsk, Belarus: Vol. 2, pp. 81–86.
BIBLIOGRAPHY 141
Wiśniewski R. and Węgrzyn M. (2005): Hardware acceleration and verification of
systems designed with hardware description languages (HDL). — Proceedings
of SPIE, Vol. 5775, pp. 365–376.
Xilinx (2000): Using block selectRAM+ memory in Spartan-II FPGAs. —
www.xilinx.com/bvdocs/appnotes/xapp130.pdf: .
Xilinx (2001): ASIC alternatives. — Xilinx.
http://www.xilinx.com
Xilinx (2004): Two flows for partial reconfiguration. — Xilinx.
http://direct.xilinx.com/bvdocs/appnotes/xapp290.pdf
Xilinx (2005): Using Block RAM in Spartan-3 Generation FPGAs. — Xilinx.
www.xilinx.com/bvdocs/appnotes/xapp463.pdf
Xilinx (2007): Virtex-II Pro and Virtex-II Pro X FGPA User Guide. — Xilinx.
http://www.xilinx.com/support/documentation
/user_guides/ug012.pdf
Yang S. and Ciesielski M. (1989): PLA decomposition with generalized decoders.
— Proceedings International Conference on Computer-Aided Design, IEEE
CS Press, IEEE Computer Society Press, 1989, pp. 312–315.
Zwolinski M. (2000): Digital system design with VHDL. — Boston, MA: Addison-
Wesley Longman Publishing Co., Inc.
List of figures
2.1 Unprogrammed PROM device
2.2 Programmed PROM device
2.3 Unprogrammed PLA device
2.4 Programmed PLA device
2.5 Unprogrammed PAL device
2.6 Programmed PAL device
2.7 CPLD structure
2.8 Early CLB structure
2.9 Structure of a typical FPGA device
2.10 Structure of the CLB block
2.11 Structure of the logic element (LE)
2.12 Model of a digital system
2.13 Model of a finite state machine
2.14 Model of a microprogram control unit (MCU)
2.15 Model of a microprogram control unit with a counter
3.1 Idea of serial functional decomposition
3.2 Idea of parallel functional decomposition
3.3 Control unit realised as a sequential circuit
3.4 Functional decomposition of the control unit
3.5 Compositional microprogram control unit with a base structure
4.1 Structure of a CMCU with mutual memory
4.2 Flow-chart Γ1
4.3 OLC flow-chart of the CMCU U1
4.4 Technological structure of the CMCU U1
4.5 Structure of the CMCU with a function decoder
4.6 Technological structure of the CMCU U2
4.7 Structure of the CMCU with output identification
4.8 Initial table of addressing
4.9 Table of addressing after shift operations
4.10 Technological structure of the CMCU U3
4.11 CMCU with outputs identification and function decoder
4.12 Technological structure of the CMCU U4
5.1 Structure of the CMCU with sharing codes
5.2 Flow-chart Γ2
5.3 OLC flow-chart of the CMCU U5
5.4 Technological structure of the CMCU U5
5.5 CMCU with sharing codes and a function decoder
5.6 Technological structure of the CMCU U6
5.7 Structure of the CMCU with an address converter
5.8 Flow-chart Γ3
5.9 OLC flow-chart of the CMCU U7
5.10 Technological structure of the CMCU U7
5.11 CMCU with an address converter and a function decoder
5.12 Technological structure of the CMCU U8
6.1 Structure of the FPGA device
6.2 Organization of BRAMs
6.3 Traditional prototyping flow
6.4 Modified prototyping flow including partial reconfiguration
6.5 First version of the 3rd state of the traffic light driver
6.6 Second version of the 3rd state of the traffic light driver
7.1 Structure of ATOMIC
8.1 Flow-chart Γ4
8.2 Results of the simulation of the CMCU U9
8.3 Three variants of the reconfiguration of two microinstructions
A.1 Detailed structure of ATOMIC
A.2 Graphic and text description of an example CMCU U9
A.3 OLC description of the CMCU U9
List of tables
4.1 Content of the control memory of the CMCU U1
4.2 Table of transitions of the CMCU U1
4.3 Table of transitions of the CMCU U2
4.4 Table of the function decoder of the CMCU U2
4.5 Control memory content of the CMCU U3
4.6 Transitions table of the CMCU U3
4.7 Table of transitions of the CMCU UOD
5.1 Encoding of CMCU U5 OLCs and their components
5.2 Content of the control memory of the CMCU U5
5.3 Table of transitions of the CMCU U5
5.4 Transition table of the CMCU U6
5.5 Table of the function decoder of the CMCU U6
5.6 Encoding of CMCU U7 OLCs and their components
5.7 Content of the control memory of the CMCU U7
5.8 Transition table of the CMCU U7
5.9 Table of the address converter
5.10 Transition table of the CMCU U8
5.11 Truth table of the function decoder of the CMCU U8
6.1 Results achieved during the implementation of the traffic light driver
8.1 Average results of experiments
8.2 Results of three variants of the reconfiguration of two microinstructions
B.1 Results of implementation – Part 1
B.2 Results of implementation – Part 2
B.3 Results of implementation – Part 3
B.4 Results of implementation – Part 4
B.5 Results of implementation – Part 5
B.6 Results of implementation – Part 6
B.7 Results of partial reconfiguration of CMCUs – Part 1
B.8 Results of partial reconfiguration of CMCUs – Part 2
B.9 Results of partial reconfiguration of CMCUs – Part 3
List of listings
A.1 MCU description of the CMCU U9
A.2 Verilog code for the flow-chart Γ9
Methods of Hardware Synthesis of the Control Part
in a Digital Microsystem
Summary
The control unit is one of the most important elements of a digital circuit.
The very rapid development of digital technology has brought integrated systems
such as the System-on-a-Chip (SoC) and the System-on-a-Programmable-Chip (SoPC),
in which the functional blocks of the designed system are implemented with
field programmable gate arrays (FPGAs). This approach forces a modification of
the classical methods of designing control units. The main feature of FPGAs is
the use of look-up table (LUT) elements to implement logic functions. The number
of inputs of a LUT element is strictly limited, which entails the use of
decomposition in the designed circuit.
One of the methods of reducing the number of LUT elements used is the structural
decomposition of the control unit. The system is then realized as a multi-level
circuit in which microoperations can be implemented in dedicated memory blocks
of the FPGA. Because the capacity of the dedicated memory blocks of programmable
devices is limited, the memory capacity of the control unit should be as small
as possible. These requirements are met by the compositional microprogram
control unit (CMCU), in which the controller is decomposed into a managing
(addressing) part and a memory that stores the microinstructions of the
controller. It should be noted, however, that using a CMCU makes sense only when
the controller can be interpreted by a so-called linear flow-chart. In such
a flow-chart, operational blocks make up at least 75% of all blocks.
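The 75% criterion for a linear flow-chart can be expressed as a simple check.
The following is a minimal Python sketch, not part of the original work; the
function names are hypothetical, and it is assumed that the caller decides
which vertices count as blocks and supplies both totals.

```python
def operational_share(num_operational: int, num_total: int) -> float:
    """Return the fraction of flow-chart blocks that are operational blocks."""
    if num_total <= 0:
        raise ValueError("the flow-chart must contain at least one block")
    if not 0 <= num_operational <= num_total:
        raise ValueError("operational blocks must be between 0 and the total")
    return num_operational / num_total


def is_linear_flow_chart(num_operational: int, num_total: int,
                         threshold: float = 0.75) -> bool:
    """Check the 'at least 75% operational blocks' criterion from the summary."""
    return operational_share(num_operational, num_total) >= threshold


# Hypothetical flow-charts: 15 of 20 blocks operational, then 14 of 20.
at_threshold = is_linear_flow_chart(15, 20)
below_threshold = is_linear_flow_chart(14, 20)
```

A flow-chart with 15 operational blocks out of 20 sits exactly at the threshold
and qualifies, whereas one with 14 out of 20 does not.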
The dissertation proposes six original methods of synthesis of CMCUs. The aim of
the developed algorithms was to reduce the number of logic elements used in the
target FPGA. With respect to their structure, the presented synthesis methods
are divided into two groups. The first concerns CMCUs with common addressing,
where the microinstruction address is used to recognize the internal states of
the controller. Three new synthesis methods are proposed for the CMCU with
common addressing:
• CMCU with a function decoder, in which an additional function decoder block
is introduced. The idea of the method is to encode the excitation functions
for the counter; they are then decoded by the function decoder. The
additional block is implemented with dedicated memory blocks of the FPGA,
which reduces the number of LUT elements used in comparison with the
traditional realization of a controller with common addressing.
• CMCU with outputs identification, in which the internal states of the
controller are encoded with the minimal necessary number of bits. This
reduces the number of inputs of the combinational circuit and, consequently,
the number of logic blocks needed to implement this module and the whole
controller in an FPGA.
• CMCU with outputs identification and a function decoder, which combines both
solutions described above.
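The function decoder idea can be illustrated with a small Python sketch. It is
one plausible reading of the description, not the author's synthesis algorithm:
it assumes that every distinct excitation vector receives a short binary code,
that the LUT-based circuit produces only this code, and that a memory block
restores the full vector. The function name and the sample vectors are
hypothetical.

```python
def build_function_decoder(excitation_vectors):
    """Give every distinct excitation vector a short binary code.

    Returns (code_width, encode, rom):
      code_width - number of bits the combinational circuit must produce,
      encode     - dict mapping an excitation vector to its integer code,
      rom        - list indexed by code; models the memory-based decoder.
    """
    distinct = sorted(set(excitation_vectors))
    if not distinct:
        raise ValueError("at least one excitation vector is required")
    # ceil(log2(n)) computed with integers; at least one bit is kept.
    code_width = max(1, (len(distinct) - 1).bit_length())
    encode = {vector: code for code, vector in enumerate(distinct)}
    return code_width, encode, distinct


# Hypothetical 4-bit excitation vectors, one per row of a transition table.
rows = [(0, 0, 1, 1), (0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 0)]
width, encode, rom = build_function_decoder(rows)
# Decoding every encoded row must give back the original vector.
decoded = [rom[encode[row]] for row in rows]
```

In this made-up example three distinct vectors need only a 2-bit code, so the
LUT-based part drives two lines instead of four, while the table lookup stands
in for the dedicated memory block.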
The second group of methods is based on the idea of sharing codes. In this case
the microinstruction address is determined from the codes generated by both the
counter and the register. Three new solutions are also proposed for the
controller with sharing codes:
• CMCU with a function decoder, in which an additional function decoder block
is introduced. The idea of the method is to encode the excitation functions
for the counter and for the register. These functions are then decoded by
the additional block. The function decoder is realized with dedicated memory
blocks of the FPGA, which reduces the number of LUT elements used in
comparison with the traditional realization of a controller with common
addressing.
• CMCU with an address converter, in which an additional block converting
microinstruction addresses is used. The method makes sense when the size of
the codes generated by the counter and the register is greater than the
minimal size of the microinstruction address. Every redundant bit doubles
the memory capacity of the CMCU. Using the address converter keeps the
microinstruction address at its minimal size.
• CMCU with an address converter and a function decoder, which combines both
ideas presented above. The address converter makes it possible to apply the
sharing-codes method when the size of the codes generated by the counter and
by the register is greater than the minimal size of the microinstruction
address. In addition, the function decoder reduces the number of LUT blocks
needed to realize the circuit in an FPGA.
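The remark that every redundant address bit doubles the memory capacity can be
checked with a short calculation. The Python sketch below is illustrative only.
It assumes that the register holds the code of an operational linear chain
(OLC) and the counter holds the code of a component inside that chain, so the
shared address is the concatenation of both codes; the function names and the
sample chain lengths are hypothetical.

```python
def bits_for(count: int) -> int:
    """Minimal number of bits giving `count` items distinct codes (ceil(log2))."""
    if count < 1:
        raise ValueError("count must be positive")
    return (count - 1).bit_length()


def address_widths(olc_lengths):
    """Return (minimal_width, shared_width) for the given OLC lengths.

    olc_lengths - number of microinstructions (components) in each OLC.
    """
    minimal = bits_for(sum(olc_lengths))           # one code per microinstruction
    shared = bits_for(len(olc_lengths)) + bits_for(max(olc_lengths))
    return minimal, shared


def memory_growth_factor(minimal: int, shared: int) -> int:
    """Each redundant address bit doubles the number of memory words."""
    return 2 ** max(0, shared - minimal)


def converter_is_useful(olc_lengths) -> bool:
    """The address converter pays off only if the shared code is too wide."""
    minimal, shared = address_widths(olc_lengths)
    return shared > minimal


# Hypothetical controller with three chains of 5, 3 and 2 microinstructions.
minimal, shared = address_widths([5, 3, 2])
growth = memory_growth_factor(minimal, shared)
```

With these made-up chain lengths, ten microinstructions need a 4-bit address,
while the shared code takes 2 + 3 = 5 bits, so the memory would double without
an address converter. Four chains of four microinstructions each give 4 bits in
both cases, and a converter would bring no benefit.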
To verify the effectiveness of the algorithms proposed in the dissertation,
a system for the automatic synthesis of CMCUs (ATOMIC) was developed. The
environment consists of three independent modules that carry out successive
design stages of the control unit. Based on a textual specification of the
controller, ATOMIC performs the structural decomposition automatically. The
result is a description of the CMCU in hardware description languages.
A controller prepared in this way can then be implemented in the target FPGA.
It is worth emphasizing that ATOMIC is independent of the platform and the
operating system. The program can be run under Windows, Unix, and Solaris.
The experiments confirm the effectiveness of the proposed methods of synthesis
of CMCUs. The study compared the developed algorithms with the traditional way
of designing control units, realized as a finite state machine. On average, the
best results were obtained when the circuit was realized as a CMCU with an
address converter and a function decoder. This implementation reduces the
number of LUT blocks by 50% on average in comparison with the realization of
the circuit as a finite state machine.
A detailed analysis of the results made it possible to define criteria for
choosing a method, depending on the specification of the control unit:
• For relatively small controllers (in which the number of microinstructions
does not exceed 150 and the memory can be realized with one FPGA memory
block), the CMCU with sharing codes and a function decoder should be used.
• For controllers in which the number of microinstructions does not exceed
150 but the memory capacity of the control unit is greater than the capacity
of one FPGA memory block, the CMCU with an address converter and a function
decoder should be used.
• For controllers in which the number of microinstructions exceeds 150, the
best solution is the CMCU with outputs identification and a function decoder.
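The three selection criteria can be written down as a small decision function.
This is a minimal Python sketch under stated assumptions: "fits in one memory
block" is modelled as a plain comparison of bit counts, and the function name,
parameter names and the 18432-bit block size used in the example are
hypothetical, not values taken from the experiments.

```python
def choose_cmcu_structure(num_microinstructions: int,
                          memory_bits: int,
                          block_bits: int,
                          small_limit: int = 150) -> str:
    """Pick a CMCU structure following the three criteria from the summary."""
    if num_microinstructions > small_limit:
        # Large controllers: outputs identification plus function decoder.
        return "outputs identification + function decoder"
    if memory_bits <= block_bits:
        # Small controller whose memory fits into a single memory block.
        return "sharing codes + function decoder"
    # Small controller whose memory exceeds a single memory block.
    return "address converter + function decoder"


# Hypothetical controllers, assuming an 18432-bit memory block.
small_fits = choose_cmcu_structure(100, 4096, 18432)
small_overflows = choose_cmcu_structure(100, 32768, 18432)
large = choose_cmcu_structure(200, 4096, 18432)
```

The order of the tests matters: the microinstruction count is checked first,
because for more than 150 microinstructions the recommendation does not depend
on the memory size.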
Research on the partial reconfiguration of CMCUs implemented in FPGAs showed
that the reduction of the bitstream is strongly related to the placement of the
controller's memory in the FPGA. The best results were achieved when the memory
module of the control unit was implemented in memory blocks of the programmable
device that are located in the same column. This solution can reduce the size
of the bitstream sent to the FPGA by a factor of more than 500.
The most important results of the research were presented at conferences and in
journals, including twenty-nine of international scope and five of national
scope. Further measurable outcomes of the research are the cooperation
established with the University of Hagen (the team led by Professor Halang) and
plans to patent the developed methods.
Part of the work was carried out within a research project financed by the
Integrated Regional Operational Programme with the participation of the
European Social Fund, and by KBN grant no. 3 T11C 046 26.
Lecture Notes in Control and Computer Science
Editor-in-Chief: Józef KORBICZ
Vol. 14: Remigiusz Wiśniewski
Synthesis of Compositional Microprogram Control Units for Programmable
Devices
153 p. 2009 [978-83-7481-293-1]
Vol. 13: Arkadiusz Bukowiec
Synthesis of Finite State Machines for FPGA Devices Based on Architectural
Decomposition
102 p. 2009 [978-83-7481-257-3]
Vol. 12: Małgorzata Kołopieńczyk
Application of Address Converter for Decreasing Memory Size of Compositional
Microprogram Control Unit with Code Sharing
88 p. 2008 [978-83-7481-215-3]
Vol. 11: Bartłomiej Sulikowski
Computational Aspects in Analysis and Synthesis of Repetitive Processes
168 p. 2006 [83-7481-033-5]
Vol. 10: Bartosz Kuczewski
Computational Aspects of Discrimination between Models of Dynamic Systems
158 p. 2006 [83-7481-030-0]
Vol. 9: Marek Kowal
Optimization of Neuro-Fuzzy Structures in Technical Diagnostics Systems
116 p. 2005 [83-89712-88-1]
Vol. 8: Wojciech Paszke
Analysis and Synthesis of Multidimensional System Classes Using Linear Matrix
Inequality Methods
188 p. 2005 [83-89712-81-4]
Vol. 7: Piotr Steć
Segmentation of Colour Video Sequences Using the Fast Marching Method
110 p. 2005 [83-89712-47-4]
Vol. 6: Grzegorz Łabiak
Wykorzystanie hierarchicznego modelu współbieżnego automatu
w projektowaniu sterowników cyfrowych
168 p. 2005 [83-89712-42-3]
Vol. 5: Maciej Patan
Optimal Observation Strategies for Parameter Estimation of Distributed Systems
220 p. 2004 [83-89712-03-2]
Vol. 4: Przemysław Jacewicz
Model Analysis and Synthesis of Complex Physical Systems Using Cellular
Automata
134 p. 2003 [83-89321-67-X]
Vol. 3: Agnieszka Węgrzyn
Symboliczna analiza układów sterowania binarnego z wykorzystaniem wybranych
metod analizy sieci Petriego
125 p. 2003 [83-89321-54-8]
Vol. 2: Grzegorz Andrzejewski
Programowy model interpretowanej sieci Petriego dla potrzeb projektowania
mikrosystemów cyfrowych
109 p. 2003 [83-89321-53-X]
Vol. 1: Marcin Witczak
Identification and Fault Detection of Non-Linear Dynamic Systems
124 p. 2003 [83-88317-65-2]
