Hardware emulation board based on field programmable gate arrays (FPGAs) and programmable interconnections. by Lo, Wing-yee. & Chinese University of Hong Kong Graduate School. Division of Electronic Engineering.
A Thesis of 
Hardware Emulation Board Based on Field Programmable Gate 
Arrays (FPGAs) and Programmable Interconnections 
by 
LO Wing-yee 
Presented to 




Partial Fulfilment of the Requirements for Degree of 
Master of Philosophy in Electronic Engineering 
in 











Abstract 
An in-circuit rapid prototyping system can be built from Field Programmable Gate Arrays 
(FPGAs). However, simply hardwiring the FPGAs together is inflexible and may waste IO 
resources. Programmable interconnect switches can be used to enhance the connectivity 
between FPGAs. With the addition of a microprocessor and RAMs, almost any digital system 
can be realized. Moreover, automated software tools are required for circuit path analysis, 
FPGA partitioning, downloading of FPGA configuration data and programming of the 
interconnect switches. This thesis describes the hardware and software aspects of a low-cost, 
reconfigurable and flexible hardware emulation board. Furthermore, different bus 
configurations between the FPGAs are analysed to find the best configuration. This board 




LIST OF TABLES iv 
LIST OF FIGURES v 
1. INTRODUCTION 1 
1.1 Traditional Design Prototyping 1 
1.2 In-Circuit Rapid Prototyping System 2 
1.3 A Summary of Prototyping Systems Available 5 
1.4 Universal Prototyping Board (UPB) 6 
2. HARDWARE DESIGNS 9 
2.1 Bus Interconnection 9 
2.1.1 Fixed buses 9 
2.1.2 Programmable buses 12 
2.2 Architectural Features 15 
2.2.1 Field programmable gate array 15 
2.2.2 Microprocessor 15 
2.2.3 Memory 16 
2.2.4 Buffers 18 
3. SOFTWARE TOOLS 20 
3.1 Critical Path Analysis 20 
3.1.1 Algorithm of critical path analysis 21 
3.1.2 Computation time 21 
3.2 Circuit Partitioning 23 
3.2.1 Partitioning algorithm 24 
3.2.2 Effects of partitioning 36 
3.2.3 Partitioning parameters 38 
3.2.4 Pseudo-code of partitioner 39 
3.3 IO Assignments 40 
3.3.1 Connect 4 FPGAs 40 
3.3.2 Connect 3 FPGAs 42 
3.3.3 Connect 2 FPGAs 44 
3.3.4 System IO (Connect 1 FPGA) 47 
3.4 Other Tools 48 
4. STRUCTURE ANALYSIS 49 
5. RESULTS 52 
6. FUTURE DIRECTION 73 
6.1 Other Possible Configurations 73 
6.2 Programmable Interconnection 73 
6.3 Expandability of UPB 74 




LIST OF TABLES 
1. Paths need to be programmed to connect FPGAs 13 
2. Balanced and Non-Balanced partitioning comparison 38 
3. Benchmark circuits selected from MCNC 50 
4. Non-balanced partitioning, utilization = 0.3 53 
5. Balanced partitioning into 2, utilization = 0.3 54 
6. Balanced partitioning into 3, utilization = 0.3 55 
8. Balanced partitioning into 4, utilization = 0.3 56 
9. Non-balanced partitioning, utilization = 0.4 57 
10. Balanced partitioning into 2, utilization = 0.4 58 
11. Balanced partitioning into 3, utilization = 0.4 59 
12. Balanced partitioning into 4, utilization = 0.4 60 
13. Non-balanced partitioning, utilization = 0.5 61 
14. Balanced partitioning into 2, utilization = 0.5 62 
15. Balanced partitioning into 3, utilization = 0.5 63 
16. Balanced partitioning into 4, utilization = 0.5 64 
17. Non-balanced partitioning, utilization = 0.6 65 
18. Balanced partitioning into 2, utilization = 0.6 66 
19. Balanced partitioning into 3, utilization = 0.6 67 
20. Balanced partitioning into 4, utilization = 0.6 68 
LIST OF FIGURES 
1. A conceptual FPGA 3 
2. A CAD system for FPGA 4 
3. Bus interconnection in AnyBoard 7 
4. Using programmable switch saves IOs in UPB 7 
5. Local bus between any two adjacent FPGAs in UPB 9 
6. Global bus in UPB 10 
7. Board IO in UPB 10 
8. Fixed bus in UPB 11 
9. Programmable buses connect two next adjacent FPGAs in UPB 12 
10. Programmable buses connect three consecutive FPGAs in UPB 12 
11. The programmable buses in UPB 14 
12. Downloading configuration data to FPGAs in UPB 15 
13. Memory in UPB 17 
14. Buffers to prevent address bus contention 18 
15. Buffers to prevent data bus contention 18 
16. Overall structure of UPB 19 
17. All paths in a circuit 22 
18. Post-mapping partitioning 23 
19. BUCKET array structure in K & L algorithm 26 
20. Check critical nets before the move: T(n) = 0 27 
21. Check critical nets before the move: T(n) = 1 27 
22. Check critical nets after the move: F(n) = 0 28 
23. Check critical nets after the move: F(n) = 1 28 
24. Check critical nets before the move: T(n) = 0, 0(n) = 0 31 
25. Check critical nets before the move: T(n) = 0, 0(n) >= 1 31 
26. Check critical nets before the move: T(n) = 1, 0(n) = 0 32 
27. Check critical nets before the move: T(n) = 1, 0(n) >= 1 32 
28. Check critical nets after the move: F(n) = 0, 0(n) = 0 33 
29. Check critical nets after the move: F(n) = 0, 0(n) >= 1 33 
30. Check critical nets after the move: F(n) = 1, 0(n) = 0 34 
31. Check critical nets after the move: F(n) = 1, 0(n) >= 1 34 
32. Assigning cells on critical path to FPGA 36 
20. Connect 4 FPGAs using global bus 40 
33. Connect 4 FPGAs using local & programmable bus (& board IO) 40 
34. Connect 4 FPGAs using local bus (& board IO) 41 
35. Connect 3 FPGAs using global bus 42 
36. Connect 3 FPGAs using programmable bus (& board IO) 42 
37. Connect 3 FPGAs using local bus (& board IO) 43 
38. Connect 3 FPGAs using local & programmable bus (& board IO) 43 
39. Connect 2 FPGAs using local bus (& board IO) 44 
40. Connect 2 next adjacent FPGAs using programmable bus (& board IO) 44 
41. Connect 2 adjacent FPGAs using programmable bus (& board IO) 45 
42. Connect 2 next adjacent FPGAs using programmable bus (& board IO) 45 
43. Connect 2 FPGAs using global bus 46 
44. System IO using board IO 47 
45. System IO using global bus 47 
46. Ideal IO assignment 51 
47. Cut size of benchmark circuits with utilization rate = 0.3 69 
48. Cut size of benchmark circuits with utilization rate = 0.4 69 
49. Cut size of benchmark circuits with utilization rate = 0.5 69 
50. Cut size of benchmark circuits with utilization rate = 0.6 69 
51. Bus allocation according to structure analysis in UPB 71 
52. Final bus allocation in UPB 72 
1. INTRODUCTION 
Circuit design projects can be divided into four stages: design, verification, implementation 
and validation. The design stage includes choosing the best alternative from all the possible 
initial concepts and producing a logical description such as a schematic diagram, a high-level 
description language model or a circuit netlist. In the verification stage, the logical 
description is checked against the specification written by the user. The project then enters 
the implementation stage, which entails tooling and building the prototypes using TTL chips, 
or ASIC chips supplied by the vendors. The final validation stage takes place when the 
prototype, peripherals, PCB and software are all brought together and plugged into the target 
system. This stage is important because it puts together components that were designed and 
tested separately and that must now interact with each other in the target system [1]. 
1.1 Traditional Design Validation 
Traditionally, there have been three choices available for design validation, namely 
breadboarding, software simulation and silicon prototyping. However, each of these 
alternatives has its own disadvantages. 
Breadboards are implemented using TTL, PAL or MSI off-the-shelf logic chips on a 
board. This works quite well only for designs of fewer than a few thousand gates. For large 
designs, the time taken to build such a breadboard is very long, which is hard to justify in a 
competitive market. Since it is very likely that the design will not function the first time, the 
designers need to modify the design. However, even the slightest change in the design requires 
a large amount of rework. 
System-level simulation, although apparently a good choice, cannot cover enough 
real-time operation to ensure correct functionality. It takes months to simulate a few seconds 
of real-time operation of a moderately complex system [2]. Statistics show that over 50% of 
designs fail to work in the target system even though over 90% of test vectors pass. 
The last choice, silicon prototyping, is for ASIC applications. The obvious disadvantage of 
silicon prototyping is that it is very difficult to probe inside the chip for debugging. Once the 
design is in silicon, it cannot be modified to evaluate better alternatives. The long 
turnaround time and high Non-Recurring Engineering (NRE) cost make it no longer a good 
choice for design validation. 
1.2 In-Circuit Rapid System Prototyping 
A new technology, the in-circuit hardware emulation system, can alleviate the above 
disadvantages. It automatically produces hardware prototypes of chip designs from netlists 
and requires no effort in circuit design modification. The underlying key is the use of 
Field Programmable Gate Arrays (FPGAs). An FPGA is a device in which the final logic 
structure is implemented by loading the internal RAM with configuration data, without 
going through the IC fabrication process. It combines the programmability of a PLD with the 
scalable interconnection structure of a mask programmable gate array (MPGA). 
Figure 1 shows a conceptual diagram of a typical FPGA. As depicted, it consists of a 
two-dimensional array of uncommitted Configurable Logic Blocks (CLBs) in which the logic 
design resides. The CLBs contain both combinational and sequential logic and are 
connected by the interconnect resources. The interconnect comprises segments of wire, which 
may be of various lengths. Programmable switches in the interconnect serve to 
connect CLBs to wire segments, or one wire segment to another. Logic circuits 
are implemented in the FPGA by partitioning the logic into the CLBs and then 





Fig. 1. A conceptual FPGA. 
Input/Output Blocks (IOBs). They provide the interface between the external package pins and 
the internal logic. They can be programmed to act as input, output or bidirectional pins. 
Automated CAD tools are provided to ease the implementation of the circuit in the FPGA. 
Figure 2 shows all the steps involved in the implementation process. The first step is to 
produce a logical description. It can be a schematic diagram, a VHDL description or a 
Boolean expression specification. The logical description is then translated into a netlist 
format which the software tools can understand. Afterwards, the logic optimization tools 
optimize the area and speed of the final circuit. Technology mapping transforms the circuit 
into FPGA CLBs and IOBs. Upon completion of mapping, the CAD system decides where 
to place each CLB in the FPGA array such that the total length of interconnect required is 
minimized. Routing involves assigning the FPGA's wire segments and choosing the 
programmable switches to establish the required connections. The final step is to download 














Fig. 2. A CAD system for FPGA. 
hardware gates to emulate the design, it offers real-time operation. Such a system offers lower 
cost and faster implementation. To modify the design, the only thing we have to 
do is to download another set of configuration data into the FPGA. As a result, much more 
innovation becomes possible in designs [3]. 
1.3 A Summary of Prototyping Systems Available 
Although FPGAs have many advantages over the traditional circuit validation 
methods, they still have limitations. First of all, each FPGA offers a limited gate count. 
Designers can prototype small designs of several thousand gates. For designs larger 
than 5,000 gates, they have to use several FPGAs, which requires a lot of manual 
intervention. For instance, it is necessary to manually partition the design among the FPGAs. 
Manual partitioning is error-prone because there are too few pins on most FPGAs 
compared with the number of gates inside. Hence, when a design is partitioned across 
multiple FPGAs, the number of usable gates per FPGA goes down rapidly. Besides, wirewrapping 
makes it very inconvenient to deal with design changes and rework. Another limitation is that, 
as designs grow more complex, many applications require a lot of memory, yet FPGAs 
offer only a small amount of memory. 
In recent years, several companies and an academic institution have developed 
rapid prototyping boards based on FPGAs. They all try to overcome the limitations of 
FPGAs in one way or another. 
One of the products on the market is Quickturn's RPM system. It has a multiplexed 
architectural rack called the Enterprise Emulation System, which consists of logic emulation 
modules and a reprogrammable backplane. In each module, there is an array of Xilinx 
XC3090 FPGAs interspersed with custom Multiplex Interconnect chips that connect the 
FPGAs. Up to eleven modules can be plugged into the rack. With the Interconnect Module, 
the RPM can cluster at most 22 Enterprise Systems, which can emulate up to six million 
gates. An optional component adapter card permits the use of standard components such as 
memory devices. The modular and expandable capabilities of Enterprise are enhanced by the 
Automatic Design Partitioner (ADP). This software partitions logic into netlists that fit within 
a single emulation system and also automates the clustering of multiple Enterprise Systems 
[4]-[7]. 
Another competitor, PiE's Mars II, has a similar architecture, but PiE claims to have 
developed innovative timing-driven partitioning software that automates the 
logic-emulation process [8]. Meanwhile, Intel's ASIC In-Circuit-Emulation (ICE) is 
tailored for verifying ASIC designs, especially those incorporating a micro-controller core and 
peripheral cells [9]-[10]. 
Researchers at North Carolina State University have developed the AnyBoard for 
hardware emulation. A set of FPGAs is built on a single 13-by-4 inch card that can be inserted 
into a PC slot. The FPGAs have local buses between adjacent FPGAs and a global bus which 
is connected to all FPGAs for high fan-out nets. Besides, each FPGA is connected to a RAM 
for memory-intensive designs. The PC interface allows data to pass between the AnyBoard 
and the host PC system for downloading data to and reading data back from the array [11]. 
1.4 Universal Prototyping Board (UPB) 
There is no doubt that Quickturn's RPM system is powerful and flexible. However, it is too 
expensive and too large in gate count for most designs on the market. In the 
AnyBoard, the interconnection between FPGAs is not flexible. As previously mentioned, 
there are too few pins on most FPGAs compared with the number of gates inside. Hence, 
partitioning a design across multiple FPGAs will bring nets in the circuit out across the 
FPGAs. Just hardwiring the FPGAs may waste IOs and is inflexible. To clarify the point, 
in the AnyBoard, only the local bus and the global bus are available. If a net connects two 
CLBs and these CLBs are assigned to FPGA 0 and FPGA 2 (non-adjacent FPGAs), an intermediate 
local data path in FPGA 1 must be connected if the local bus is used (Figure 3). Using the 
global bus will also work, as the global bus is connected to all the FPGAs. However, four IOs 
are committed in both cases, wasting two IOBs in the FPGAs. If an intermediate switch is used 
to connect the CLBs, only two IOs are used (Figure 4). 
Fig. 3. Bus interconnection in AnyBoard. 
Fig. 4. Using programmable switch saves IOs in UPB. 
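The IO count in this example can be worked out explicitly. The following Python sketch is illustrative only: the function name and the route labels are hypothetical, and it assumes a two-terminal net on a four-FPGA board where a local-bus route costs one output pin and one input pin per hop.

```python
NUM_FPGAS = 4  # FPGAs on the board in the example of Figures 3 and 4


def io_pins_committed(route: str, hops: int = 2) -> int:
    """IO pins tied up by one two-terminal net whose end FPGAs are
    `hops` local-bus hops apart (FPGA 0 to FPGA 2 gives hops = 2)."""
    if route == "local":
        # Every hop uses one pin on the sending FPGA and one on the receiver,
        # so the intermediate FPGA spends two pins just passing the signal on.
        return 2 * hops
    if route == "global":
        # A global bus line is wired to a pin on every FPGA.
        return NUM_FPGAS
    if route == "switch":
        # Only the two end FPGAs are wired to the programmable switch.
        return 2
    raise ValueError(f"unknown route: {route}")


for route in ("local", "global", "switch"):
    print(route, io_pins_committed(route))
```

Under these assumptions the local and global routes each commit four pins, while the switched route commits two, in agreement with the counts given above.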
Hence, the interconnectivity between FPGAs should be emphasized in 
designing a rapid prototyping system. The Universal Prototyping Board (UPB) we developed 
has the flexibility of Quickturn's RPM system and the lower cost of the AnyBoard. The 
UPB is based on four FPGAs and cross-point programmable analog switches. In addition, the 
UPB has the following characteristics: 
- FPGAs interconnected by hardwired and programmable buses, 
- RAMs available for memory-intensive designs, 
- microprocessor for downloading configuration data to the FPGAs, for testing, and for use 
as part of the emulation itself, 
- software tools to automate the design process, 
- low cost and a large available gate count, 
- expandable hardware for more complex designs. 
In this chapter, the disadvantages of the traditional design prototyping methods and the new 
technology of in-circuit rapid prototyping systems were discussed. In chapter 2, the 
hardware aspects of the UPB are described. The software tools provided to automate the 
validation process are discussed in chapter 3. In order to find the best configuration of the 
UPB, a structure analysis was designed and is outlined in chapter 4. In chapter 5, the 
results of the analysis and the best configuration of the UPB concluded from the analysis are 
presented. Next, the future direction of this kind of rapid prototyping system is 
discussed in chapter 6. Lastly, a conclusion is drawn in chapter 7. 
2. HARDWARE DESIGNS 
2.1 Bus Interconnection 
A good interconnection structure between FPGAs in the emulation board should contain 
various kinds of buses for different nets so that signals can pass across FPGAs with little 
delay and no IO is wasted. To achieve this, two kinds of buses are available in the 
UPB: fixed buses and programmable buses. The fixed buses are hardwired, 
while the programmable buses are connected via programmable switches. As the 
fixed buses are hardwired, the delay for a signal travelling along them is very small. 
Therefore, they are mainly for critical nets, high fan-out nets, nets with high skew-rate and 
nets connected to CLBs which are assigned to neighbouring FPGAs. On the other hand, the 
programmable buses are used to save IOBs in the FPGAs. Nets connected to CLBs which are 
assigned to two next adjacent FPGAs or three consecutive FPGAs should therefore be mapped 
to the programmable buses; only in this way can the signal avoid passing through an 
intermediate FPGA, which would use more IOs than desirable. 
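The mapping rule above can be summarised as a small decision function. This is a simplified sketch and not the IO assignment tool of the thesis: the function names, the return labels and the `timing_sensitive` flag are hypothetical, and the four FPGAs are assumed to form a ring in which FPGA 3 is adjacent to FPGA 0.

```python
NUM_FPGAS = 4  # the UPB carries four FPGAs, assumed here to form a ring


def ring_distance(a: int, b: int) -> int:
    """Number of local-bus hops between FPGA a and FPGA b on the ring."""
    d = abs(a - b) % NUM_FPGAS
    return min(d, NUM_FPGAS - d)


def choose_bus(fpgas: set, timing_sensitive: bool = False) -> str:
    """Pick a bus type for a net from the set of FPGAs its CLBs sit in."""
    n = len(fpgas)
    if n <= 1:
        return "none"                 # the net never leaves one FPGA
    if n == 2:
        a, b = fpgas
        if ring_distance(a, b) == 1:
            return "local"            # neighbouring FPGAs: hardwired local bus
    if n == NUM_FPGAS or timing_sensitive:
        return "global"               # high fan-out or delay-critical nets
    return "programmable"             # next adjacent pair or three in a row


print(choose_bus({0, 1}), choose_bus({0, 2}), choose_bus({1, 2, 3}))
```

The sketch deliberately ignores bus capacity; it only shows which kind of bus the text prefers for each span of FPGAs.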
2.1.1 Fixed buses 
There are three kinds of fixed buses in the UPB: the local bus, the global bus and 
the board IO. If a design is too large to fit on a single FPGA, it is natural to 
partition it and place it on adjacent chips rather than on two widely 
separated chips, where the signals would have to travel a long way through a third FPGA. 
Hence, the local bus is used to connect any two adjacent FPGAs together (Figure 5). 
Fig. 5. Local bus between any two adjacent FPGAs in UPB. 
The global bus, on the other hand, connects all four FPGAs together. It is mainly for the high 
fan-out nets and the high skew-rate signals (Figure 6). At the same time, this global bus is 
connected to the outside for the system input and output. 
Fig. 6. Global bus in UPB. 
The board IO, as the name implies, is the interface between the emulation board and the 
external world. Several pins of each FPGA are connected to the outside (Figure 7). It should 
be noted that the global bus and the board IO together form the system interface, 
through which the UPB can be expanded by cascading more boards together for more complex 
designs. Combining all these buses gives the overall connection of the fixed buses shown in 
Figure 8. 
Fig. 7. Board IO in UPB. 
Fig. 8. Fixed bus in UPB. 
2.1.2 Programmable buses 
The programmable buses connect signals that span more than two FPGAs. Although we could 
use the reprogrammable interconnection contained in the FPGAs as the connection resource, 
this would leave fewer usable IOBs available for the designs. Instead, we use several 
cross-point programmable analog switches (74HCT22106), which increase the connectivity 
between multiple FPGAs. 
Figure 9 shows one usage of the programmable switches: they connect any two next 
adjacent FPGAs together. Another usage of the switches is to connect any three consecutive 
FPGAs together, as shown in Figure 10. In this method, two of the 
three consecutive FPGAs are hardwired together and the remaining FPGA is connected through 
the programmable switches. 
Fig. 9. Programmable buses connecting two next adjacent FPGAs in UPB. 
Fig. 10. Programmable buses connecting three consecutive FPGAs in UPB. 
If all the FPGAs are connected in this way, a symmetrical interconnection through the 
programmable buses is obtained, as illustrated in Figure 11. There are 8 paths, numbered 1 to 
8, connected to the switches. Paths 1, 3, 5 and 7 (odd paths) come from FPGA 0, 1, 
2 and 3 respectively, while paths 2, 4, 6 and 8 (even paths) are T-connected, from each FPGA 
and its two adjacent FPGAs to the switches. 
Two examples illustrate the use of the programmable buses. If a net 
needs to connect the two next adjacent FPGAs FPGA 1 and FPGA 3, path 3 and path 7 (2 odd 
paths) must be programmed to connect. Similarly, to connect the three consecutive FPGAs 
FPGA 1, FPGA 2 and FPGA 3, either path 3 and path 6 or path 4 and path 7 (1 odd and 1 
even path) have to be programmed. The following table shows which paths should be 
programmed to connect for all possible FPGA connections. 
FPGAs to be connected    Paths need to be programmed to connect 
0, 2                     (1, 5) 
1, 3                     (3, 7) 
0, 1, 2                  (1, 4) or (2, 5) 
1, 2, 3                  (3, 6) or (4, 7) 
2, 3, 0                  (1, 6) or (5, 8) 
3, 0, 1                  (2, 7) or (3, 8) 
Table 1. Paths need to be programmed to connect FPGAs. 
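The entries of Table 1 follow a regular pattern that can be generated programmatically. The sketch below is an illustration rather than the switch-programming tool itself. The odd-path numbering comes from the text; the even-path numbering (path 2 joins FPGA 0 and 1, path 4 joins FPGA 1 and 2, path 6 joins FPGA 2 and 3, path 8 joins FPGA 3 and 0) is inferred from the table and should be treated as an assumption, and all function names are hypothetical.

```python
NUM_FPGAS = 4


def odd_path(f: int) -> int:
    """Paths 1, 3, 5, 7 come from FPGA 0, 1, 2, 3 alone."""
    return 2 * f + 1


def even_path(f: int) -> int:
    """Paths 2, 4, 6, 8: assumed T-connection of FPGA f and FPGA (f + 1) mod 4."""
    return 2 * f + 2


def switch_paths(fpgas):
    """Return the alternative path pairs that connect the given FPGAs."""
    group = frozenset(fpgas)
    if len(group) == 2:
        a, b = sorted(group)
        if (b - a) % NUM_FPGAS != 2:
            raise ValueError("adjacent FPGAs use the local bus instead")
        return [(odd_path(a), odd_path(b))]
    if len(group) == 3:
        # The FPGA left out fixes where the run of three starts on the ring.
        missing = (set(range(NUM_FPGAS)) - group).pop()
        first, mid, last = [(missing + k) % NUM_FPGAS for k in (1, 2, 3)]
        return [
            tuple(sorted((odd_path(first), even_path(mid)))),   # first + (mid, last)
            tuple(sorted((even_path(first), odd_path(last)))),  # (first, mid) + last
        ]
    raise ValueError("the switches serve nets spanning two or three FPGAs")


print(switch_paths({1, 3}))      # two next adjacent FPGAs
print(switch_paths({1, 2, 3}))   # three consecutive FPGAs
```

For every row of Table 1 the function yields the same path pairs, possibly listed in a different order.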
Fig. 11. The programmable buses in UPB. 
2.2 Architectural Features 
2.2.1 Field programmable gate array 
There are four Xilinx XC3042 FPGAs in the UPB as the prototype implementation 
technology. Although not all the gates provided are usable because of the incomplete 
routability of the internal blocks in each FPGA (see 3.2.2), it is estimated that the usable gate 
count is approximately 10,000 gates. To download the configuration data, FPGA 0 is 
configured in the peripheral mode and FPGAs 1 - 3 are configured in the slave mode. A 
microprocessor (see 2.2.2) in the UPB downloads the data to the FPGAs. It gets the data 
from the serial port of a PC and then downloads it to FPGA 0. FPGA 0 acts as 
the leading device in the daisy chain: when it receives excess configuration data, it passes 
the data on to the next slave FPGA, and so on [12] - [13] (Figure 12). 
PC serial port -> processor -> FPGA 0 (peripheral mode) -> FPGA 1 (slave mode) -> FPGA 2 (slave mode) -> FPGA 3 (slave mode) 
Figure 12. Downloading configuration data to FPGAs in UPB. 
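The daisy-chain behaviour can be modelled in a few lines. The sketch below is a behavioural illustration only: it assumes each device keeps a fixed number of bytes and forwards everything beyond that, it does not model the actual Xilinx bitstream format, and the function name and the sizes used are hypothetical.

```python
def daisy_chain_split(stream: bytes, sizes: list) -> list:
    """Distribute one concatenated configuration stream along the chain.

    sizes[i] is the number of bytes device i keeps for itself; whatever
    arrives after a device is full is passed on to the next device.
    """
    loaded = []
    offset = 0
    for index, size in enumerate(sizes):
        chunk = stream[offset:offset + size]
        if len(chunk) < size:
            raise ValueError(f"stream ends before device {index} is configured")
        loaded.append(chunk)      # this device is now full
        offset += size            # the excess flows to the next device
    return loaded


# Hypothetical 8-byte stream shared by four devices, 2 bytes each.
parts = daisy_chain_split(bytes(range(8)), [2, 2, 2, 2])
print([list(p) for p in parts])
```

The microprocessor therefore only needs to send one stream to FPGA 0; the order of the data in the stream decides which FPGA receives which part.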
2.2.2 Microprocessor 
In the UPB, there is a Motorola 68000 microprocessor (68000µP) [14] - [15]. With this 
microprocessor, two modes are provided: testing mode and emulation mode. These two 
modes are selected by a toggling mode-select switch. At any time, the two modes are 
interchangeable by toggling the switch. 
In the testing mode, the operation of the UPB is controlled by the monitor program. The 
microprocessor can download the configuration data into the FPGAs. It can also read 
and write data from and to the RAMs. This is very useful when the design is in its development 
stage, since the microprocessor replaces the part of the design that controls RAM access. 
The user can therefore validate the RAM access control logic after the rest of the system has 
been proved to work as expected. Moreover, apart from the normal operation (called normal 
mode) of the system, the design can be debugged in the trace mode. Designers can trace 
or single-step their designs in this mode for debugging and then verify them in normal mode 
after correcting them. Last but not least, designers can program any switches 
they like. This does not mean that designers are required to program the switches for the 
interconnections between the FPGAs after partitioning; the software tools provided 
program them automatically (see 3.4). Rather, designers can program the switches at any time 
for their own purposes. 
In the emulation mode, the operation of the UPB is controlled by the instructions given by the 
designers. This firmware is located in another set of EPROMs and is selected by toggling the 
mode-select switch. In other words, the microprocessor itself can be a part of the 
emulation system. 
2.2.3 Memory 
Three 32 x 8 RAMs are connected to FPGAs 0 - 2, one per FPGA, to provide storage for 
memory-intensive designs. Figure 13 shows how a memory design may be realized in the 
UPB. In this example, it is assumed that only RAM 1 is used in the design. FPGA 3 contains 
the logic of an address generator, such as a counter, to drive all RAMs, while FPGA 1 contains 
the logic to read, write and control the RAM data. At least one intermediate local path is 
required to connect the two logic blocks so that they can communicate. 
Fig. 13. Memory in UPB. 
It should be noted that some of the programmable buses serve two purposes. Part of the 
address and data buses come from these programmable buses. If the RAMs are used in the 
UPB, these buses serve as the address and data buses for the RAMs. If not, they are 
used as normal programmable buses. This arrangement maximizes the number of IO pins 
on each FPGA available for interconnection between the FPGAs. The buffers between the FPGAs 
and RAMs are provided to prevent bus contention, because the 68000µP can also access the 
RAMs. The data and address buses of the RAMs are therefore physically connected to both the 
FPGAs and the 68000µP. Bus contention would occur if both parties tried to access a RAM at 
the same time and there were no buffers between them (see 2.2.4). 
2.2.4 Buffers 
As stated before, the RAMs can be accessed by both the FPGAs and the 68000µP, so buffers must 
be provided to prevent bus contention. Apart from the RAMs, FPGA 0 can also be accessed by 
both the 68000µP (for downloading configuration data) and RAM 0, so a buffer is required 
between them. Otherwise, bus contention may occur if both parties access FPGA 0 at the 
same time. Figure 14 and Figure 15 show all the buffers required to isolate the address and 
data lines between the two parties connected to them. 
Fig. 14. Buffers to prevent address bus contention. 
Fig. 15. Buffers to prevent data bus contention. 
Fig. 16. Overall structure of UPB. 
Figure 16 shows the overall structure of the UPB, including the FPGAs, RAMs, buffers, 
switches and all kinds of buses [26]. 
3. SOFTWARE TOOLS 
3.1 Critical Path Analysis 
High-speed, high-performance digital systems require a careful timing analysis of all critical 
signal paths to establish the maximum usable system clock frequency in sequential circuit 
designs, or to establish that critical path delays are compatible with the timing specification 
in combinational circuit designs. Very often a design will not work even though the circuit 
passes the functional test. This is mainly because the designers did not consider the worst-case 
component delays and the signal wiring propagation delays [16]. This is important in the UPB 
because there are programmable switches between the FPGAs: the propagation 
delay of a signal which leaves an FPGA, proceeds through the switches and re-enters 
another FPGA is longer than that of a signal which propagates within one FPGA. Hence, a 
critical path analysis is performed before partitioning the circuit across the FPGAs. All the 
components on the critical path found are assigned to a single FPGA. 
For a realistic estimate of the operating speed of a system, the propagation delay of each 
component in each signal path plus the interconnection propagation delays must be 
determined. A data file (delay.dat), which specifies the delays of all primitive gates and the 
path delay, can be modified by the user. The format and content of the delay.dat file are shown 
in Appendix A. The default propagation delays of all primitive gates and paths are one, i.e. a 
unit-delay model. For a combinational circuit, the critical path is the path with the longest 
delay from the UPB input to the UPB output. On the other hand, the operating speed of a 
sequential circuit is determined by the paths between DFFs and the system input and output. So 
the critical path is the longest-delay path among the following: from the UPB input to the 
input of a D flip-flop (DFF, the primitive clocked device in the FPGA), from the input of a 
DFF to the input of another DFF, or from the input of a DFF to the UPB output. 
An option is provided for the critical path analysis to treat all the components in a 
sequential circuit as combinational elements. In other words, if this 
option is chosen, the program finds the longest path from the UPB input to the UPB output 
disregarding the DFFs. 
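The two definitions of the critical path can be illustrated with a small recursive search. This is a minimal sketch and not the thesis program: the data layout, the names `GATE_DELAY`, `WIRE_DELAY` and `critical_path`, and the example circuit are all hypothetical, and it assumes that one wire delay is charged for every gate-to-gate connection under the unit-delay model.

```python
GATE_DELAY = {"AND": 1, "OR": 1, "NOT": 1, "DFF": 1}  # unit-delay defaults
WIRE_DELAY = 1                                         # assumed per connection


def critical_path(gates, inputs, outputs, combinational=False):
    """gates maps a gate name to (type, list of succeeding gates).

    Returns (delay, path) of the longest segment. In sequential mode a
    segment starts at an input gate or a DFF and ends at the input of the
    next DFF or at an output gate; with combinational=True the DFFs are
    treated as ordinary elements.
    """
    best = (0, [])

    def walk(gate, path, delay):
        nonlocal best
        gtype, successors = gates[gate]
        path = path + [gate]
        delay += GATE_DELAY[gtype]
        if gate in outputs and delay > best[0]:
            best = (delay, path)
        for nxt in successors:
            if not combinational and gates[nxt][0] == "DFF":
                # The segment stops at the input of the next DFF.
                if delay + WIRE_DELAY > best[0]:
                    best = (delay + WIRE_DELAY, path + [nxt])
                continue
            if nxt in path:
                continue  # feedback: do not walk round the loop again
            walk(nxt, path, delay + WIRE_DELAY)

    starts = list(inputs)
    if not combinational:
        starts += [g for g, (t, _) in gates.items() if t == "DFF"]
    for start in starts:
        walk(start, [], 0)
    return best


# Hypothetical circuit: a -> b -> ff -> c -> d, with ff a D flip-flop.
circuit = {
    "a": ("AND", ["b"]), "b": ("OR", ["ff"]), "ff": ("DFF", ["c"]),
    "c": ("NOT", ["d"]), "d": ("AND", []),
}
print(critical_path(circuit, {"a"}, {"d"}))
print(critical_path(circuit, {"a"}, {"d"}, combinational=True))
```

In the sequential case the longest segment runs from the DFF to the output; with the combinational option the whole chain from input to output is reported instead.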
3.1.1 Algorithm of critical path analysis 
The algorithm of the critical path analysis is very simple. It iteratively searches all paths in 
the circuit: from the system input to the system output in a combinational circuit, and from 
the system input to the input of a DFF, from the input of a DFF to the input of another DFF, or 
from the input of a DFF to the system output in a sequential circuit. 
First of all, a GATE variable array which records all the primitive gates is constructed. For 
each gate, a linked list of its succeeding gates, the type of the gate 
(such as AND, DFF, etc.), and whether it is an input gate or an output gate are recorded. An 
input gate is a gate whose incident signals are system inputs, while an output gate is a gate 
whose signal is a system output. At any time, two paths are stored in memory: 
one stores the longest path found so far, while the other stores the path currently being 
searched. The path delays corresponding to the two paths are also 
recorded. The program continuously compares the current path with the stored longest path. If 
the current path is longer than the longest path stored, the current path replaces it as the 
longest path. Otherwise, the current path is deleted. 
3.1.2 Computation time 
The algorithm seems quite simple and straightforward. However, if it is not carefully 
implemented, the computation time becomes excessive. One naive approach is to delete the 
whole path after comparing it with the longest path stored. This is not necessary, since within 
a path there may be a gate which has two or more succeeding gates. In such a case, there exists 
another path in the circuit in which the part preceding that gate is the same. In Figure 17, 
gates 1, 3 and 4 are input gates and gate 7 is the output gate. Gates 2 and 4 have two 
succeeding gates. We can see that path 2 preceding gate 2 is the same as path 1 preceding 
gate 2. Similarly, path 3 preceding gate 4 is the same as path 2 preceding gate 4. Therefore, a 
stack (not a queue) is used to store the gates with two or more succeeding gates. Every time 
after comparing with the longest path stored, we delete only the part of the path following 
the gate in the stack. 
Paths in circuit: 
path 1: g1->g2->g3->g7 
path 2: g1->g2->g4->g5->g7 
path 3: g1->g2->g4->g6->g7 
path 4: g3->g7 
path 5: g4->g5->g7 
path 6: g4->g6->g7 
Fig. 17. All paths in a circuit. 
Another time-consuming task is checking for feedback paths. A simple approach is to check
whether the gate just found already exists in the current path. This wastes a lot of time
searching from the start of the path to the current gate if the circuit is very large, because
the path obtained will be very long. The problem can be solved by adding to each entry of the
GATE variable array a pointer to the gate's position in the path. Every time a gate is to be
added to the path, we only check whether its pointer is NULL. If so, the gate is not on the
path. Otherwise, the gate is already on the path, i.e. a feedback path has been found. The
gate is then disregarded and not added to the path, and the search continues with the next
succeeding gate in the current path.
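The search of sections 3.1.1 and 3.1.2 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the thesis program: the names `longest_path`, `succ`, `delay`, `inputs` and `outputs` are hypothetical, every gate is treated as combinational, and a per-gate boolean flag plays the role of the NULL/non-NULL pointer in the GATE array. The explicit stack holds, for each gate on the current path, its successors that have not been tried yet, so only the tail of the path is discarded when backtracking.

```python
def longest_path(succ, delay, inputs, outputs):
    """Return (total_delay, path) of the longest input-to-output path.

    succ:    dict gate -> list of succeeding gates (hypothetical layout)
    delay:   dict gate -> delay of that gate
    inputs:  iterable of input gates, outputs: set of output gates
    """
    best_delay, best_path = -1, []
    on_path = {g: False for g in succ}      # O(1) feedback check per gate
    for start in inputs:
        path, total = [start], delay[start]
        on_path[start] = True
        if start in outputs and total > best_delay:
            best_delay, best_path = total, list(path)
        stack = [iter(succ[start])]         # untried successors, one entry per path gate
        while stack:
            nxt = next(stack[-1], None)
            if nxt is None:
                # All successors tried: drop only the last gate of the path.
                stack.pop()
                gate = path.pop()
                on_path[gate] = False
                total -= delay[gate]
                continue
            if on_path[nxt]:
                continue                    # feedback path: skip this gate
            path.append(nxt)
            on_path[nxt] = True
            total += delay[nxt]
            if nxt in outputs and total > best_delay:
                best_delay, best_path = total, list(path)
            stack.append(iter(succ[nxt]))
    return best_delay, best_path


# The circuit of figure 17 with an assumed delay of 1 per gate.
FIG17_SUCC = {1: [2], 2: [3, 4], 3: [7], 4: [5, 6], 5: [7], 6: [7], 7: []}
FIG17_DELAY = {g: 1 for g in FIG17_SUCC}
```

Calling `longest_path(FIG17_SUCC, FIG17_DELAY, [1, 3, 4], {7})` explores the six paths listed in figure 17 and keeps the first longest one it meets.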
3.2 Circuit Partitioning 
In order to automate the design process, a partitioning tool is provided to partition the
netlist into multiple netlists, each of which fits into one FPGA. There are two kinds of
partitioning techniques: pre-mapping partitioning and post-mapping partitioning. In
pre-mapping partitioning, the circuit is partitioned before the technology mapping process,
whereas in post-mapping partitioning, partitioning is done after the technology mapping
process. Clearly, an advantage of post-mapping partitioning is that it operates on the actual
implementation of the design, so the physical information specific to the FPGA is available
[17]. The partitioner here therefore adopts post-mapping partitioning. It accepts the netlist
file with extension *.map (mapping) and produces partitioned netlists, each of which can be
fitted into an individual FPGA by Xilinx Automatic Place and Route (APR) (see figure 18).
[Flow chart: Design entry -> Translate -> ... -> CLB netlists (*00.map, *01.map, ..., *03.map)
-> Xilinx APR -> Routed netlists (*00.lca, *01.lca, ..., *03.lca)]
Fig. 18. Post-mapping partitioning.
3.2.1 Partitioning algorithm 
Logic partitioning is NP-complete. In practice, the size of the partitioning problems makes it 
impossible to perform an exhaustive search to find an optimal partition. Hence, algorithms 
based on heuristic rationales that give good results in a reasonable amount of time have been 
developed [18]-[20]. 
3.2.1.1 Kernighan and Lin
One of the well-known heuristic methods of circuit partitioning was developed by Kernighan
and Lin (K&L) [21]. It became the basis for most of the iterative improvement partitioning
algorithms generally used. One example is the linear-time heuristic partitioning algorithm
developed by Fiduccia and Mattheyses (F&M) [22] (see 3.2.1.2). Another example is the
level-gain model developed by Balakrishnan Krishnamurthy [23]. The K&L algorithm deals with
the problem of dividing a set of cells into two blocks A and B (bi-partitioning) so that the
cut size between the two blocks is minimized. The cut size is the number of nets connected
simultaneously to cells in both blocks of the partition. The algorithm starts with an initial
partition of cells into A and B, and improves it by choosing one cell from each block and
swapping them. The cell pair chosen is the pair that gives the most improvement in cut size.
The algorithm consists of a number of passes. In each pass, the potential gains of swapping
all possible cell pairs are calculated. The cell pair with the best potential gain is then
swapped and locked in place. The cut size of the current partition is recorded and the
potential gains are updated. The same process continues among the unlocked cells until all the
cells are locked. At the end of the pass, the cut size is exactly the same as at the
beginning, since all the cells in block A have been moved to block B and vice versa. The
algorithm then looks back at the sequence of gains that have been recorded, and undoes the
swaps that were made after the greatest improvement was seen. The same process is repeated in
subsequent passes, and the whole algorithm stops when the gain cannot be improved further. It
was shown that the running time per pass is O(c^2 log c), where c is the total number of cells
in the network, and that the process usually converges after a few passes.
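The "look back and undo" step at the end of a pass amounts to finding the prefix of the recorded gain sequence with the largest running total. The following Python sketch illustrates only that step. The function name `best_prefix` is hypothetical, and the gains are assumed to be recorded in the order in which the swaps were made.

```python
def best_prefix(gains):
    """Return (k, total): keep the first k swaps, whose gains sum to total.

    Swaps after the k-th one are the ones to undo. k == 0 means that no
    prefix improves the cut size, so the whole pass is undone.
    """
    best_k, best_total, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_total:
            best_k, best_total = k, running
    return best_k, best_total
```

For a recorded sequence such as `[3, -1, 2, -4, 0]`, whose gains sum to zero as described above, the running totals are 3, 2, 4, 0, 0, so the first three swaps are kept and the last two are undone.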
3.2.1.2 Fiduccia and Mattheyses
With the proper model from reference [24], Fiduccia and Mattheyses modified the K&L algorithm
such that the worst-case running time per pass grows linearly with the size of the network
(F&M state it as O(P), where P is the total number of pins), and showed that it also converges
in several passes. Instead of swapping the best cell pair from each block, the basic approach
is to move one cell at a time from one block to the other, in an attempt to reduce the number
of nets which have cells in both blocks. Each cell is assigned a gain value, defined as the
number of nets by which the cut size would decrease if the cell were moved from its current
block to the complementary block. The MOVE operation consists of selecting the best cell
(highest gain), moving it, and then adjusting the gains of its free (unlocked) neighbouring
cells. In a pass, all the cells are moved and locked, and then some of the later moves are
undone until the greatest improvement is obtained, as in the K&L algorithm. Since one cell is
moved to the other block at a time, there is a possibility that all the cells in one block may
be moved to the other. A balancing criterion is therefore established to prevent any MOVE that
would unbalance the blocks during a pass.
If this MOVE operation were implemented naively, it would require O(c^2) operations. Two
tricks in implementing this operation are proposed that reduce the total work of one pass to
linear. The first one is the introduction of a data structure, the BUCKET variable array shown
in figure 19, whose kth entry contains a doubly-linked list of free cells with gains currently
equal to k. Two such arrays are needed, one for block A and one for block B. For each BUCKET
array, a MAXGAIN index is maintained to keep track of the bucket holding a cell of highest
gain. This data structure quickly returns a cell of highest gain and allows recomputed cell
gains to be re-entered into the structure.
[Figure: BUCKET array indexed by gain, with the MAXGAIN index pointing to the highest
non-empty entry; each entry holds a doubly linked list of cell numbers.]
Fig. 19. BUCKET array structure in F&M algorithm.
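A minimal Python sketch of such a bucket structure is given below. It is an illustration under assumptions rather than the UPB implementation: the class name `GainBuckets` and its methods are hypothetical, gains are assumed to stay within [-pmax, +pmax], and a Python set stands in for each doubly-linked list (both allow a known cell to be removed in constant time).

```python
class GainBuckets:
    """Free cells of one block, grouped by their current gain."""

    def __init__(self, pmax):
        self.pmax = pmax                                  # largest possible |gain|
        self.buckets = [set() for _ in range(2 * pmax + 1)]
        self.gain = {}                                    # cell -> current gain
        self.maxgain = -pmax - 1                          # sentinel: structure is empty

    def insert(self, cell, gain):
        self.buckets[gain + self.pmax].add(cell)
        self.gain[cell] = gain
        if gain > self.maxgain:
            self.maxgain = gain

    def remove(self, cell):
        gain = self.gain.pop(cell)
        self.buckets[gain + self.pmax].discard(cell)
        # Slide MAXGAIN down to the next non-empty bucket, if necessary.
        while self.maxgain >= -self.pmax and not self.buckets[self.maxgain + self.pmax]:
            self.maxgain -= 1

    def update(self, cell, delta):
        """Re-enter a cell whose gain changed by delta (+1 or -1 in F&M)."""
        new_gain = self.gain[cell] + delta
        self.remove(cell)
        self.insert(cell, new_gain)

    def pop_best(self):
        """Remove and return (cell, gain) of a highest-gain cell, or None."""
        if self.maxgain < -self.pmax:
            return None
        gain = self.maxgain
        cell = next(iter(self.buckets[gain + self.pmax]))
        self.remove(cell)
        return cell, gain
```

One such object would be kept per block, so that selecting the best cell and re-entering an updated gain never require a scan over all cells.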
The second one is to update the gains of the free neighbouring cells by an appropriate
sequence of simple gain increments or decrements, rather than recomputing the gains of all
neighbouring cells each time the best cell moves. Define the distribution of a net n relative
to the blocks A and B as A(n) and B(n), which represent the number of cells the net n has in
blocks A and B respectively. The net n is said to be cut if it has at least one cell in each
block, and uncut otherwise. This is referred to as the cut-state of the net. A net is critical
if it is connected to a cell which, if moved, would change the net's cut-state. It is easy to
see that n is critical iff either A(n) or B(n) is equal to 0 or 1. More importantly, a net
which is not critical either before or after a move cannot possibly influence the gains of any
of its cells. Therefore, only those nets connected to the best cell that are critical before
or after the move have to be considered. Let F(n) (From) be the distribution of a net n
relative to the block in which the best cell resides, and T(n) (To) be the distribution of the
net n relative to the complementary block. We have to check the critical nets before and after
the move. Besides, the distribution of each net to which the best cell is connected has to be
changed to reflect the move. Before the move, the critical nets on the To-side, i.e. those
with T(n) = 0 or T(n) = 1, are checked since the best cell is going to be moved to it. On the
other hand, after the move, the critical nets on the From-side, i.e. those with F(n) = 0 or
F(n) = 1, are checked as the best cell has already been moved from it. During the move, both
F(n) and T(n) are updated. In the following, all cases before and after the move are
illustrated. Assume C1 is the best cell to be moved.
Case 1: Before the move, T(n) = 0, F(n) >= 2 (Figure 20).
Result: gains of all free cells on net n incremented.
[Figure: F(n) >= 2, T(n) = 0. Gains before -> after the move: g2 = -1 -> 0; g3 = -1 -> 0.]
Fig. 20. Check critical nets before the move: T(n) = 0.
Case 2: Before the move, T(n) = 1, F(n) >= 2 (Figure 21).
Result: gain of the only cell in To-side decremented, if it is free.
[Figure: F(n) >= 2, T(n) = 1. Gains before -> after the move: g2 = 0 -> 0; g3 = 0 -> 0;
g4 = 1 -> 0.]
Fig. 21. Check critical nets before the move: T(n) = 1.
Case 3: After the move, F(n) = 0, T(n) >= 2 (Figure 22).
Result: gains of all free cells on net n decremented.
[Figure: F(n) = 0, T(n) >= 2. Gains before -> after the move: g2 = 0 -> -1; g3 = 0 -> -1.]
Fig. 22. Check critical nets after the move: F(n) = 0.
Case 4: After the move, F(n) = 1, T(n) >= 2 (Figure 23).
Result: gain of the only cell in From-side incremented, if it is free.
[Figure: F(n) = 1, T(n) >= 2. Gains before -> after the move: g2 = 0 -> 1; g3 = 0 -> 0;
g4 = 0 -> 0.]
Fig. 23. Check critical nets after the move: F(n) = 1.
Combining all the cases together, updating the gains of the free neighbouring cells of the
best cell requires only an appropriate sequence of gain increments or decrements of the
current gains. The process can be written as follows:
FOR each net n on the best cell DO
    IF T(n) = 0 THEN /* check critical nets before move */
        increment gains of all free cells on n;
    ELSE IF T(n) = 1 THEN
        decrement gain of the only T cell on n, if it is free;
    END IF
    decrement F(n); /* change net distribution to reflect */
    increment T(n); /* the move */
    IF F(n) = 0 THEN /* check critical nets after move */
        decrement gains of all free cells on n;
    ELSE IF F(n) = 1 THEN
        increment gain of the only F cell on n, if it is free;
    END IF
END FOR
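The update procedure can be checked against a direct recomputation of the gains. The Python sketch below does this for a two-block partition. It is a hedged illustration only: the names `compute_gains`, `move_and_update` and the list-of-lists netlist layout are assumptions, and net distributions are recounted on the fly instead of being stored, which a real implementation would avoid.

```python
def count(net, side, block):
    """Number of cells of `net` currently in `block` (0 or 1)."""
    return sum(1 for c in net if side[c] == block)


def compute_gains(nets, side):
    """Gain of every cell by definition: +1 per net it leaves uncut, -1 per net it cuts."""
    gains = {c: 0 for c in side}
    for net in nets:
        for c in net:
            f, t = side[c], 1 - side[c]
            if count(net, side, f) == 1:
                gains[c] += 1
            if count(net, side, t) == 0:
                gains[c] -= 1
    return gains


def move_and_update(nets, side, gains, locked, best):
    """Lock `best`, move it to the other block and update free neighbours' gains."""
    f, t = side[best], 1 - side[best]
    locked.add(best)
    for net in nets:
        if best not in net:
            continue
        free = [c for c in net if c not in locked]
        f_n, t_n = count(net, side, f), count(net, side, t)   # before the move
        if t_n == 0:
            for c in free:
                gains[c] += 1
        elif t_n == 1:
            for c in free:
                if side[c] == t:
                    gains[c] -= 1
        f_n, t_n = f_n - 1, t_n + 1                           # reflect the move
        if f_n == 0:
            for c in free:
                gains[c] -= 1
        elif f_n == 1:
            for c in free:
                if side[c] == f:
                    gains[c] += 1
    side[best] = t
```

After `move_and_update`, the gains of all free cells should agree with what `compute_gains` returns for the new partition; the gain of the locked cell is deliberately left untouched.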




As F&M is much faster than K&L for large circuits, the F&M algorithm is used to partition the
netlist in the UPB. However, both the K&L and F&M algorithms deal with bi-partitioning,
whereas there are four FPGAs (blocks) in the UPB, so it is necessary to adapt the
bi-partitioning F&M algorithm to multi-partitioning. There are two straightforward ways to do
so. One way successively chooses pairs of blocks and applies bi-partitioning to these pairs
until all pairs have been chosen. However, eliminating a net across a given pair does not
necessarily remove it from the cut in the multiple-block partition. The other way uses the
bi-partitioning algorithm hierarchically. It initially uses bi-partitioning to partition the
whole network into two blocks, then partitions each block into two more blocks, and so on.
Obviously, the first-level partitioning limits the scope of improvement in the next-level
partitioning. Besides, the first-level partitioning tries to minimize the number of
connections between the first two blocks. This tends to maximize the connections inside these
blocks, which makes it harder to get a good result in the next-level partitioning. Neither
way is satisfactory. A better approach is to expand the scope at each step in the algorithm
rather than applying the whole bi-partitioning algorithm to block pairs one by one. In this
approach, all possible moves to any of the other blocks are considered and the best move is
chosen and recorded [25].
Multi-partitioning differs from bi-partitioning in that a cell can be moved to any of the
other blocks rather than only to the complementary block. So there are several gains
associated with each cell. Define the gain gft(C) of a cell C as the amount by which the cut
size would decrease if it were moved from f-block to t-block. In the UPB, each cell has 3 gain
values since there are 4 FPGAs in the UPB and each cell can move from its current FPGA to any
of the other 3 FPGAs. A net n has k cuts if it has cells in k+1 blocks, and the cut size of
the partition in the UPB is defined as the total number of cuts of all nets.
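These two definitions translate directly into code. The Python sketch below computes the cut size and a gain by brute force; the function names and the netlist layout (a list of non-empty cell lists plus a cell-to-block dictionary) are assumptions made for the illustration, and no attempt is made to be efficient.

```python
def cut_size(nets, block_of):
    """Total number of cuts: a net spanning k+1 blocks contributes k."""
    return sum(len({block_of[c] for c in net}) - 1 for net in nets)


def gain(nets, block_of, cell, to_block):
    """Decrease in cut size if `cell` were moved to `to_block`."""
    moved = dict(block_of)
    moved[cell] = to_block
    return cut_size(nets, block_of) - cut_size(nets, moved)
```

For a single net with cells 1, 2 and 3 in one block and cell 4 in another, the cut size is 1, and moving cell 2 into a third, empty block gives a gain of -1 because the net would then span three blocks.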
The tricks proposed by F&M were also adopted and modified to keep the running time per pass
linear. First of all, BUCKET variable arrays are also used in the UPB partitioner. However,
4 x 3 = 12 such arrays are kept: for each block, 3 arrays represent the possible moves to the
other 3 blocks. A MAXGAIN index is also maintained in each array to point to the cell with the
maximum gain. In order to ease the process of choosing the best cell among those pointed to by
the MAXGAIN pointers, a HEAP variable array is used to keep a sorted list of the MAXGAIN
pointers and the corresponding move directions. The pointer with the highest gain value can be
found at the bottom of the HEAP, and the corresponding cell is chosen and moved.
Secondly, updating the gains of the free neighbouring cells is also modified such that it can
be done by an appropriate sequence of gain increments or decrements of the current gains when
the best cell is moved. Apart from F(n) and T(n), let O(n) (Other) be the distribution of a
net n relative to a block which the best cell is neither moved from nor moved to. The
following 8 cases illustrate how the process of updating gains can be implemented.
Case 1: Before the move, T(n) = 0, O(n) = 0, F(n) >= 2 (Figure 24).
Result: gains gft of all free cells on n incremented.
[Figure: F(n) >= 2, T(n) = 0, O(n) = 0. Gains before -> after the move:
gft(2) = -1 -> 0; gft(3) = -1 -> 0; gfo(2) = -1 -> -1; gfo(3) = -1 -> -1.]
Fig. 24. Check critical nets before the move: T(n) = 0, O(n) = 0.
Case 2: Before the move, T(n) = 0, O(n) >= 1, F(n) >= 2 (Figure 25).
Results: gains gft of all free cells on n incremented and
gains got of all free cells on n incremented.
[Figure: F(n) >= 2, T(n) = 0, O(n) >= 1. Gains before -> after the move:
gft(2) = -1 -> 0; gft(3) = -1 -> 0; gfo(2) = 0 -> 0; gfo(3) = 0 -> 0;
gof(4) = 1 -> 1; got(4) = 0 -> 1.]
Fig. 25. Check critical nets before the move: T(n) = 0, O(n) >= 1.
Case 3: Before the move, T(n) = 1, O(n) = 0, F(n) >= 2 (Figure 26).
Results: gain gtf of the only cell in To-side decremented, if it is free and
gain gto of the only cell in To-side decremented, if it is free.
[Figure: F(n) >= 2, T(n) = 1, O(n) = 0. Gains before -> after the move:
gft(2) = 0 -> 0; gft(3) = 0 -> 0; gfo(2) = -1 -> -1; gfo(3) = -1 -> -1;
gtf(4) = 1 -> 0; gto(4) = 0 -> -1.]
Fig. 26. Check critical nets before the move: T(n) = 1, O(n) = 0.
Case 4: Before the move, T(n) = 1, O(n) >= 1, F(n) >= 2 (Figure 27).
Results: gain gtf of the only cell in To-side decremented, if it is free and
gain gto of the only cell in To-side decremented, if it is free.
[Figure: F(n) >= 2, T(n) = 1, O(n) >= 1. Gains before -> after the move:
gft(2) = 0 -> 0; gft(3) = 0 -> 0; gfo(2) = 0 -> 0; gfo(3) = 0 -> 0;
gtf(4) = 1 -> 0; gto(4) = 1 -> 0; gof(5) = 1 -> 1; got(5) = 1 -> 1.]
Fig. 27. Check critical nets before the move: T(n) = 1, O(n) >= 1.
Case 5: After the move, F(n) = 0, O(n) = 0, T(n) >= 2 (Figure 28).
Result: gains gtf of all free cells on n decremented.
[Figure: F(n) = 0, T(n) >= 2, O(n) = 0. Gains before -> after the move:
gtf(2) = 0 -> -1; gtf(3) = 0 -> -1; gto(2) = -1 -> -1; gto(3) = -1 -> -1.]
Fig. 28. Check critical nets after the move: F(n) = 0, O(n) = 0.
Case 6: After the move, F(n) = 0, O(n) >= 1, T(n) >= 2 (Figure 29).
Results: gains gtf of all free cells on n decremented and
gains gof of all free cells on n decremented.
[Figure: F(n) = 0, T(n) >= 2, O(n) >= 1. Gains before -> after the move:
gtf(2) = 0 -> -1; gtf(3) = 0 -> -1; gto(2) = 0 -> 0; gto(3) = 0 -> 0;
gof(4) = 1 -> 0; got(4) = 1 -> 1.]
Fig. 29. Check critical nets after the move: F(n) = 0, O(n) >= 1.
Case 7: After the move, F(n) = 1, O(n) = 0, T(n) >= 2 (Figure 30).
Results: gain gft of the only cell in From-side incremented, if it is free and
gain gfo of the only cell in From-side incremented, if it is free.
[Figure: F(n) = 1, T(n) >= 2, O(n) = 0. Gains before -> after the move:
gft(2) = 0 -> 1; gfo(2) = -1 -> 0; gtf(3) = 0 -> 0; gtf(4) = 0 -> 0;
gto(3) = -1 -> -1; gto(4) = -1 -> -1.]
Fig. 30. Check critical nets after the move: F(n) = 1, O(n) = 0.
Case 8: After the move, F(n) = 1, O(n) >= 1, T(n) >= 2 (Figure 31).
Results: gain gft of the only cell in From-side incremented, if it is free and
gain gfo of the only cell in From-side incremented, if it is free.
[Figure: F(n) = 1, T(n) >= 2, O(n) >= 1. Gains before -> after the move:
gft(2) = 0 -> 1; gfo(2) = 0 -> 1; gtf(3) = 0 -> 0; gtf(4) = 0 -> 0;
gto(3) = 0 -> 0; gto(4) = 0 -> 0; gof(5) = 1 -> 1; got(5) = 1 -> 1.]
Fig. 31. Check critical nets after the move: F(n) = 1, O(n) >= 1.
Combining all the cases above, the following code can be developed to update the gains of
the neighbouring cells when the best cell moves:
FOR each net n on the best cell DO
    IF T(n) = 0 THEN /* check critical nets before the move */
        increment gains gft of all free cells on n;
        increment gains got of all free cells on n, if any;
    ELSE IF T(n) = 1 THEN
        decrement gain gtf of the only T cell, if it is free;
        decrement gain gto of the only T cell, if it is free;
    END IF
    decrement F(n); /* change net distribution to */
    increment T(n); /* reflect the move */
    IF F(n) = 0 THEN /* check critical nets after the move */
        decrement gains gtf of all free cells on n;
        decrement gains gof of all free cells on n, if any;
    ELSE IF F(n) = 1 THEN
        increment gain gft of the only cell in From-side, if it is free;
        increment gain gfo of the only cell in From-side, if it is free;
    END IF
END FOR
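For a single net, the multi-way update can be sketched in Python as follows. The sketch is an assumption-laden illustration, not the UPB code: `update_net_gains` is a hypothetical name, gains are kept in a dictionary keyed by (cell, target block) that holds the contribution of this one net, and the caller is expected to change the block of the best cell afterwards.

```python
def update_net_gains(net, block_of, gains, locked, best, t, blocks):
    """Update gains[(cell, target)] of the free cells on `net` when `best`
    moves from its current block f to block t. `block_of` is not modified."""
    f = block_of[best]
    free = [c for c in net if c != best and c not in locked]
    f_n = sum(1 for c in net if block_of[c] == f)      # F(n) before the move
    t_n = sum(1 for c in net if block_of[c] == t)      # T(n) before the move
    if t_n == 0:
        for c in free:                                 # gft and got incremented
            gains[c, t] += 1
    elif t_n == 1:
        for c in free:
            if block_of[c] == t:                       # the only T cell
                for b in blocks:
                    if b != t:
                        gains[c, b] -= 1               # gtf and gto decremented
    f_n, t_n = f_n - 1, t_n + 1                        # reflect the move
    if f_n == 0:
        for c in free:                                 # gtf and gof decremented
            gains[c, f] -= 1
    elif f_n == 1:
        for c in free:
            if block_of[c] == f:                       # the only From-side cell
                for b in blocks:
                    if b != f:
                        gains[c, b] += 1               # gft and gfo incremented
```

With block 0 as the From-side, block 1 as the To-side and block 2 as the Other-side, feeding the function the "before" gains of figures 25 and 30 should reproduce their "after" gains.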
Before generating the initial partition, the CLBs which contain gates on the critical path are
assigned to the FPGA for which the number of interconnections between the critical path and
the fixed neighbouring cells is largest. The fixed neighbouring cells arise because some logic
cells that drive fixed pins, such as the address lines for the RAM chips, must be assigned to
the FPGA connected to those fixed pins. Also, the algorithm allows users to assign any signals
to a specific FPGA for their own purposes. For example, in figure 32, there are three
interconnections between the critical path and the fixed neighbouring cells assigned to FPGA
1, six interconnections between the critical path and the fixed neighbouring cells assigned to
FPGA 2, and two interconnections between the critical path and the fixed neighbouring cells
assigned to FPGA 3. So the cells on the critical path are assigned to FPGA 2. On the other
hand, whenever no fixed neighbouring cells exist, the cells on the critical path are
arbitrarily assigned to an FPGA. Once the critical path and fixed cells are taken care of, the
remaining cells are assigned to the FPGA which currently contains the largest number of cells.
If the design is too large to fit into that FPGA, the cells are assigned to the adjacent FPGA,
then the next adjacent FPGA, and so on until all the cells are assigned.
Fig. 32. Assigning cells on critical path to FPGA.
3.2.2 Effects of partitioning 
As discussed above, a signal routed off one FPGA and onto another incurs extra delay. It is
better if such a signal is mapped to the hardwired bus. If instead it is mapped to the
programmable bus, the extra delay caused by the programmable switches may be very serious.
That is why the critical path analysis is needed before partitioning the circuit. Of course,
assigning all gates on the critical path to a single FPGA may have the disadvantage that the
cut size between FPGAs becomes higher, since it forces some CLBs into a particular FPGA.
Fortunately, this affects only the non-critical part of the circuit.
The fixed gate count of the FPGA always limits the number of CLBs assigned to each FPGA.
Theoretically, each XC3042 FPGA can handle 4,200 gates, 16,800 gates in total in the UPB.
However, locking the IOs by the partitioner can severely restrict the ability of the Xilinx
APR to place and route the design. At most 50% of the CLBs, around 10,000 gates, are usable
if 100% routability of each FPGA is to be obtained. The CLB utilization rate of the UPB is
defined as the maximum fraction of the CLBs in each FPGA that can be used. Hence, the higher
the CLB utilization rate set by the partitioner, the lower the routability of the FPGAs, and
vice versa.
The user can choose between two partitioning methods. The first one is balanced partitioning.
It partitions the circuit into any number (2-4) of FPGAs in a balanced way. It has the
advantage that the logic in each FPGA after partitioning is smaller, so the Xilinx APR can
most probably route the circuit completely. However, balanced partitioning increases the cut
size between the FPGAs. This results in greater signal delay and congestion in interchip
routing. The second one is non-balanced partitioning. It partitions the circuit in such a way
that the lowest number of FPGAs is used to accommodate the circuit. In this case, the cut size
between FPGAs is smaller and hence it is more likely that all IOs between FPGAs can be
assigned. The delay is smaller as fewer signals pass through the switches and the interchip
buses. Yet, the logic assigned to each FPGA increases. This may lead to the APR not being able
to route all the nets in the FPGAs. Table 2 compares the effects of balanced partitioning and
non-balanced partitioning.
                  Balanced partitioning    Non-balanced partitioning
Cut size          higher                   lower
Table 2. Balanced and Non-balanced partitioning comparison.
3.2.3 Partitioning parameters 
As the partitioning influences several aspects of the design implementation, there are several
parameters that the user can vary to trade off various effects. The user can choose not to
perform the critical path analysis for moderately low-speed applications, or can choose the
option that the partitioner treats all primitive gates as combinational gates for analysis
purposes. The user can vary the CLB utilization rate of the FPGAs from 0.1 to 0.9. Besides,
the user can choose between balanced partitioning into any number (2-4) of FPGAs and
non-balanced partitioning. By varying all these parameters, many partitioned sets of netlists
can be obtained. From those sets, one may be able to get a partitioned netlist which has
minimum cut size, minimum interchip propagation delay, 100% IO assignment and 100% routability
of all FPGAs used. The default options of the partitioner are no critical path analysis, CLB
utilization rate = 0.5 and non-balanced partitioning. All the command line options are listed
in Appendix B.
3.2.4 Pseudo-code of partitioning algorithm 
1. Network initialization;
2. All user-defined cells and the dedicated cells which drive fixed pins of the FPGAs are
   assigned to the appropriate FPGAs and locked;
3. Find and return the critical path;
4. Obtain a starting partition after assigning and locking the cells on the critical path and
   the neighbouring fixed cells, if any;
5. REPEAT
   1. Initialize starting partition;
   2. Initialize BUCKET gain array;
   3. Initialize HEAP;
   4. WHILE (a move preserving the 2 constraints is possible && not all the cells are
      locked) DO
      1. Gains of all possible moves of each cell from its home FPGA to all other
         FPGAs are computed and noted. The highest gain move is returned;
      2. The cell returned from above is locked and moved to the assigned FPGA;
      3. Update the gains of its neighbouring cells;
      4. Update the MAXGAIN pointer in BUCKET array;
      5. Update HEAP;
      6. The move and the current system gain are recorded in MOVE record;
   5. 'Unlock' and 'unmove' the cells according to the MOVE record until the highest
      system gain is obtained.
   UNTIL there is no improvement in system gain.
A move is possible only when it satisfies two constraints. First, the finite number of
configurable logic blocks (CLBs) limits the amount of logic that can be assigned to a single
FPGA. This number is limited by the CLB utilization rate (see section 3.2.2). Second, the
number of interconnections (IOs) on each FPGA is limited.
3.3 IO Assignments
Since all types of buses in the UPB are fixed to particular pins of the FPGAs, IO assignment
is needed after partitioning. The pin numbers corresponding to all types of buses are
specified in a data file iosource.dat, which is read in before IO assignment is performed. The
format and the content of iosource.dat are shown in Appendix C. Two kinds of IO need to be
assigned: the system IOs and the interconnection nets between FPGAs. For each system IO and
interconnection net, several assignments are possible. Some assignments use or even waste more
IOs while others use fewer and save IOs. The assignment depends mainly on the number of FPGAs
the net is connected to. The next subsections outline all possible assignments and the number
of IOs used and wasted for different kinds of nets.
3.3.1 Connect 4 FPGAs
3.3.1.1 Global bus: 4 IOs used
Fig. 33. Connect 4 FPGAs using global bus.
3.3.1.2 Local + programmable bus (+board IO): 5 (6) IOs used
Fig. 34. Connect 4 FPGAs using local & programmable bus (& board IO).
3.3.1.3 Local bus (+board IO): 6 (7) IOs used
Fig. 35. Connect 4 FPGAs using local bus (& board IO).
3.3.2 Connect 3 FPGAs 
3.3.2.1 Global bus: 4 IOs used, 1 IO wasted
Fig. 36. Connect 3 FPGAs using global bus.
3.3.2.2 Programmable bus (+board IO): 3 (4) IOs used
Fig. 37. Connect 3 FPGAs using programmable bus (& board IO).
3.3.2.3 Local bus (+board IO): 4 (5) IOs used
Fig. 38. Connect 3 FPGAs using local bus (& board IO).
3.3.2.4 Local + programmable bus (+board IO): 4 (5) IOs used
Fig. 39. Connect 3 FPGAs using local & programmable bus (& board IO).
3.3.3 Connect 2 FPGAs
3.3.3.1 Adjacent FPGAs
- Local bus (+board IO): 2 (3) IOs used.
Fig. 40. Connect 2 adjacent FPGAs using local bus (& board IO).
3.3.3.2 Next adjacent FPGAs
- Programmable bus (+board IO): 2 (3) IOs used.
Fig. 41. Connect 2 next adjacent FPGAs using programmable bus (& board IO).
3.3.3.3 Adjacent FPGAs
- Programmable bus (+board IO): 3 (4) IOs used, 1 IO wasted.
Fig. 42. Connect 2 adjacent FPGAs using programmable bus (& board IO).
3.3.3.4 Next adjacent FPGAs
- Programmable bus (+board IO): 3 (4) IOs used, 1 IO wasted.
Fig. 43. Connect 2 next adjacent FPGAs using programmable bus (& board IO).
3.3.3.5 Global bus: 4 IOs used, 2 IOs wasted
Fig. 44. Connect 2 FPGAs using global bus.
3.3.4 System IO (connect 1 FPGA)
3.3.4.1 Board IO: 1 IO used
Fig. 45. System IO using board IO.
3.3.4.2 Global bus: 4 IOs used, 3 wasted
Fig. 46. System IO using global bus.
Among all the possible IO assignments for each net, the one that uses and wastes the fewest
IOs is chosen first. If the IO resource of the best choice is used up, the next best IO
assignment is chosen, and so on until an assignment with available IO resources is found for
the net. If all the IO resources are used up and there are still nets that are not yet
assigned, then IO assignment fails. The user has to change the partitioning parameters to
obtain an alternative partitioned netlist for IO assignment.
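The selection rule can be sketched as a greedy choice over ranked options. The Python fragment below is a hypothetical illustration only: the function `assign_net`, the option dictionaries, the ranking by (IOs used, IOs wasted) and the way each option consumes bus lines in `needs` are all assumptions, and the real bus capacities come from iosource.dat rather than from the numbers used here.

```python
def assign_net(options, available):
    """Pick the cheapest option whose bus resources are still available.

    options:   list of dicts with keys name, used, wasted, needs (bus -> lines)
    available: dict bus -> number of free lines; updated in place
    Returns the chosen option name, or None if IO assignment fails for the net.
    """
    ranked = sorted(options, key=lambda o: (o["used"], o["wasted"]))
    for opt in ranked:
        if all(available.get(bus, 0) >= n for bus, n in opt["needs"].items()):
            for bus, n in opt["needs"].items():
                available[bus] -= n
            return opt["name"]
    return None


# Options for a net connecting 3 FPGAs (section 3.3.2), without board IO.
# The `needs` entries are assumed resource requirements, not thesis data.
THREE_FPGA_OPTIONS = [
    {"name": "global", "used": 4, "wasted": 1, "needs": {"global": 1}},
    {"name": "programmable", "used": 3, "wasted": 0, "needs": {"programmable": 1}},
    {"name": "local", "used": 4, "wasted": 0, "needs": {"local": 2}},
    {"name": "local+programmable", "used": 4, "wasted": 0,
     "needs": {"local": 1, "programmable": 1}},
]
```

Calling `assign_net` repeatedly with the same `available` dictionary falls back from the programmable bus to the local bus and finally to the global bus as each resource runs out, and returns `None` once nothing is left.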
3.4 Other Tools 
Apart from the partitioning, there are other tools to automate the validation of the circuit
in the UPB. The configuration data for the programmable switch connections is generated
automatically upon successful completion of IO assignment. The format and the content of
<filename>.con are shown in Appendix D. Both the configuration data of the FPGAs and that of
the programmable switch connections are in a format ready for downloading. At any time during
operation, the contents of the RAMs can be read and written to ease the validation process.
4. STRUCTURE ANALYSIS
Having decided the types of buses and IO assignments in the UPB, the number of each kind of
bus has to be determined. These numbers were determined by performing a structure analysis in
order to obtain the best configuration of the UPB. The best configuration is the one with
which the UPB can realize a wide spectrum of different circuits with all the IOs among the
FPGAs ideally assigned and with the least number of IOs used and wasted.
Benchmark circuits selected from the Partitioning benchmark suite of the MCNC Centre for
Microelectronic Systems Technologies were used to carry out the structure analysis. They are
in the form of Xilinx netlists, including both *.xnf and *.map files. Since post-mapping
partitioning was adopted to partition the design, the *.map files of the benchmark circuits
were used. The numbers of CLBs and IOs differ from one circuit to another. However, only some
circuits were selected for the structure analysis. The maximum CLB utilization rate of the UPB
was assumed to be 0.6. Since there are 144 CLBs in each XC3042 FPGA in the UPB, only benchmark
circuits with fewer than 144 x 4 x 0.6 = 345.6 CLBs (i.e. at most 345) were used. Table 3
shows all the benchmark circuits used and their numbers of CLBs and IOs.
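The selection criterion is simple arithmetic, sketched below in Python. The names `clb_budget` and `select_benchmarks` are hypothetical; the constants 144 CLBs per FPGA and 4 FPGAs are taken from the text, and the CLB counts in the usage note are those listed in Table 3.

```python
CLBS_PER_FPGA = 144   # XC3042, as stated in the text
NUM_FPGAS = 4         # FPGAs on the UPB


def clb_budget(utilization):
    """Largest whole number of CLBs usable on the board at this utilization rate."""
    return int(CLBS_PER_FPGA * NUM_FPGAS * utilization)


def select_benchmarks(clb_counts, utilization):
    """Keep the circuits whose CLB count fits within the budget."""
    limit = clb_budget(utilization)
    return [name for name, clbs in clb_counts.items() if clbs <= limit]
```

At a utilization rate of 0.6 the budget is 345 CLBs, so circuits such as c3540xc3.map (283 CLBs) and s1238xc3.map (158 CLBs) qualify, whereas a circuit needing 400 CLBs would be excluded.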
The following outlines the structure analysis method. First of all, the selected benchmark
circuits are partitioned both in the balanced mode, into two to four FPGAs, and in the
non-balanced mode. In each case, the partitioning is performed for CLB utilization rates of
0.3, 0.4, 0.5 and 0.6. After partitioning, all the IOs, including the IOs among the FPGAs and
the system IOs, are assigned. Each IO assignment in the circuit is recorded so that the bus
ratio of board IO, global bus, local bus, programmable bus connecting 2 FPGAs and
programmable bus connecting 3 FPGAs can be found. Two assumptions are made in carrying out
the structure analysis. First, the number of IOs in the FPGAs is assumed to be infinite. This
is acceptable because only the ratio of the various kinds of buses has to be found rather
than the exact number of buses. Second, although several IO assignment methods are available
for each net, only the one with the least number of IOs used and wasted is selected. Figure
47 shows all the ideal IO assignment methods. These assignment methods are stored in a data
file, strana.dat, which is read in each time the structure analysis is performed. The format
and the content of strana.dat are shown in Appendix C.
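As an illustration of how nets could be tallied into bus types, the sketch below classifies each net by the set of FPGAs it touches. The mapping is an assumption inferred from the bus names, not the thesis's actual strana.dat rules: 2 adjacent FPGAs use a local bus, 2 non-adjacent FPGAs use a programmable bus, 3 FPGAs use the 3-way programmable bus, and 4 FPGAs use the global bus. The FPGAs are assumed to form a ring 0-1-2-3-0. System IOs (board IO) are left out, and all function names are hypothetical.

```python
from collections import Counter

NUM_FPGAS = 4  # assumed ring adjacency: 0-1, 1-2, 2-3, 3-0


def classify_net(fpgas):
    """Return the assumed bus type for a net touching the given FPGA indices."""
    touched = set(fpgas)
    if len(touched) <= 1:
        return None  # net stays inside one FPGA, no inter-FPGA bus needed
    if len(touched) == 4:
        return "global bus"
    if len(touched) == 3:
        return "programmable bus (3)"
    a, b = sorted(touched)
    adjacent = (b - a) % NUM_FPGAS in (1, NUM_FPGAS - 1)
    return "local bus" if adjacent else "programmable bus (2)"


def bus_counts(nets):
    """Count how many nets fall on each bus type."""
    return Counter(bus for bus in map(classify_net, nets) if bus is not None)


if __name__ == "__main__":
    # Made-up nets, each given as the set of FPGAs it connects.
    example_nets = [{0, 1}, {1, 2}, {0, 2}, {0, 1, 2, 3}, {3}]
    print(bus_counts(example_nets))
```

Dividing such counts by one another gives a bus ratio of the kind reported in the result tables.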
no.  Benchmark  CLB no.  IO no.
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39
21  s832xc3.map  91  39
22  s838xc3.map  102  39
23  s953xc3.map  107  41
24  s1196xc3.map  143  30
25  s1238xc3.map  158  30
26  s1423xc3.map  112  24
Table 3. Benchmark circuits selected from MCNC.
As the bus ratio depends on the IO assignment methods, which in turn depend on the number
of FPGAs each net connects, it is not accurate to take all the results obtained from the
structure analysis. For example, if a circuit is so small that only two FPGAs are used when
it is non-balanced partitioned, only board IO, local bus and programmable bus (connecting 2
FPGAs) are used, while no global bus or programmable bus (connecting 3 FPGAs) is assigned,
because the maximum number of FPGAs connected to any net is two. This would seriously
distort the bus ratio if the result were taken into account. The same applies to the results
obtained from balanced partitioning into two FPGAs. Hence, only the results of benchmark
circuits which occupy at least three FPGAs in the UPB after non-balanced partitioning are
considered valid.
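The validity rule can be expressed as a small filter on the CLB distribution produced by non-balanced partitioning. This is a minimal sketch with hypothetical function names. The two sample distributions are the c432 and c1908 entries from the first result table.

```python
def fpgas_used(clb_distribution):
    """Count the FPGAs that received at least one CLB."""
    return sum(1 for clbs in clb_distribution if clbs > 0)


def is_valid_result(non_balanced_distribution, min_fpgas=3):
    """A benchmark's results count only if non-balanced partitioning
    spreads the circuit over at least `min_fpgas` FPGAs."""
    return fpgas_used(non_balanced_distribution) >= min_fpgas


if __name__ == "__main__":
    print(is_valid_result((43, 7, 0, 0)))    # two FPGAs used
    print(is_valid_result((36, 41, 0, 39)))  # three FPGAs used
```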
Fig. 47. Ideal IO assignment methods (diagram not recoverable; IO counts per net as labelled).
Net connects  No system interface  System interface
4 FPGAs  4 IOs  4 IOs
3 FPGAs  3 IOs  4 IOs
2 adjacent FPGAs  2 IOs  3 IOs
2 non-adjacent FPGAs  2 IOs  3 IOs
1 FPGA  -  1 IO




Tables 4 to 19 show all the results obtained from the structure analysis. The CLB
distribution is the number of CLBs assigned to FPGA 0, 1, 2 and 3 respectively after
partitioning. The CLB distribution and the cut size are shown for completeness and reference.
The bus ratio is the ratio of board IO to global bus to local bus to programmable bus
(connecting 2 FPGAs) to programmable bus (connecting 3 FPGAs). The bolded bus ratios in the
tables signify the valid results used to determine the best bus configuration of the UPB.
Some entries are empty because the benchmark circuits concerned are so small that they can
be fitted into 1 FPGA to obtain 100% routability. To give a clear view of the effect of the
partitioning methods on the cut size, the cut size is plotted against the benchmark circuits
used in the structure analysis (Figures 48 to 51). The graphs show that the results are
reasonable: the cut size of a circuit is generally smallest when it is non-balanced
partitioned, and it increases with the number of FPGAs used in balanced partitioning, as
explained in the previous section.
Table 4. Non-balanced partitioning, utilization=0.3.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43  {43,7,0,0}  12  43:0:12:0:0
3  c499xc3.map  66  73  {38,28,0,0}  17  72:0:17:0:0
4  c880xc3.map  84  86  {40,43,0,0}  21  81:0:25:0:0
5  c1355xc3.map  70  73  {41,29,0,0}  16  73:0:16:0:0
6  c1908xc3.map  116  58  {36,41,0,39}  54  41:1:52:12:0
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23  {9,41,0,0}  1  21:0:7:0:0
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {25,43,0,0}  33  27:0:33:0:0
18  s526xc3.map  55  11  {42,13,0,0}  11  9:0:11:0:0
19  s526nxc3.map  55  11  {42,13,0,0}  11  9:0:11:0:0
20  s820xc3.map  91  39  {43,5,43,0}  29  31:0:0:19:5
21  s832xc3.map  91  39  {43,13,35,0}  35  32:3:4:15:5
22  s838xc3.map  102  39  {16,43,43,0}  50  35:1:33:15:0
23  s953xc3.map  107  41  {33,43,31,0}  84  28:5:49:17:4
24  s1196xc3.map  143  30  {40,22,41,40}  99  15:13:46:16:4
25  s1238xc3.map  158  30  {43,42,39,34}  110  16:13:48:21:6
26  s1423xc3.map  112  24  {35,43,34,0}  52  20:3:39:3:2

Table 5. Balanced partitioning into 2, utilization=0.3.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43  {22,28,0,0}  15  43:0:15:0:0
3  c499xc3.map  66  73  {29,37,0,0}  18  72:0:18:0:0
4  c880xc3.map  84  86  {41,42,0,0}  21  78:0:21:0:0
5  c1355xc3.map  70  73  {39,31,0,0}  18  73:0:18:0:0
6  c1908xc3.map  116  58
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23  {24,26,0,0}  12  21:0:12:0:0
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {37,31,0,0}  29  25:0:29:0:0
18  s526xc3.map  55  11  {29,26,0,0}  21  8:0:21:0:0
19  s526nxc3.map  55  11  {29,26,0,0}  21  8:0:21:0:0
20  s820xc3.map  91  39
21  s832xc3.map  91  39
22  s838xc3.map  102  39
23  s953xc3.map  107  41
24  s1196xc3.map  143  30
25  s1238xc3.map  158  30
26  s1423xc3.map  112  24

Table 6. Balanced partitioning into 3, utilization=0.3.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43  {15,19,16,0}  34  43:0:19:7:4
3  c499xc3.map  66  73  {22,26,18,0}  31  65:0:19:12:0
4  c880xc3.map  84  86  {30,22,31,0}  32  75:3:13:9:2
5  c1355xc3.map  70  73  {24,24,22,0}  32  73:0:16:16:0
6  c1908xc3.map  116  58  {43,39,34,0}  64  49:2:23:11:13
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23  {15,16,19,0}  22  19:2:13:5:0
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {21,23,24,0}  59  26:0:25:16:9
18  s526xc3.map  55  11  {22,16,17,0}  40  8:3:9:7:9
19  s526nxc3.map  55  11  {22,16,17,0}  40  8:3:9:7:9
20  s820xc3.map  91  39  {34,33,24,0}  40  29:3:20:4:5
21  s832xc3.map  91  39  {31,26,34,0}  41  31:3:16:9:5
22  s838xc3.map  102  39  {35,40,27,0}  39  35:3:25:6:1
23  s953xc3.map  107  41  {43,35,29,0}  90  28:5:41:23:8
24  s1196xc3.map  143  30
25  s1238xc3.map  158  30
26  s1423xc3.map  112  24  {36,36,40,0}  42  18:3:20:12:2

Table 7. Balanced partitioning into 4, utilization=0.3.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43  {9,16,14,11}  32  43:3:13:10:0
3  c499xc3.map  66  73  {19,21,12,14}  47  65:0:22:13:6
4  c880xc3.map  84  86  {25,16,16,26}  51  73:3:26:15:2
5  c1355xc3.map  70  73  {19,16,22,13}  48  62:0:27:5:8
6  c1908xc3.map  116  58  {21,32,26,37}  75  44:3:45:14:5
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23  {15,9,10,16}  28  19:3:8:12:1
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {13,14,21,20}  70  25:3:28:21:6
18  s526xc3.map  55  11  {11,16,11,17}  37  8:3:10:6:6
19  s526nxc3.map  55  11  {11,16,11,17}  37  8:3:10:6:6
20  s820xc3.map  91  39  {24,21,29,17}  59  31:8:21:16:0
21  s832xc3.map  91  39  {28,28,18,17}  63  32:8:23:18:0
22  s838xc3.map  102  39  {25,21,27,29}  43  36:3:21:13:0
23  s953xc3.map  107  41  {34,26,20,27}  113  29:12:46:22:10
24  s1196xc3.map  143  30  {37,32,43,31}  102  15:14:41:18:6
25  s1238xc3.map  158  30  {43,36,36,43}  106  17:10:61:10:4
26  s1423xc3.map  112  24  {21,35,21,35}  46  20:3:19:14:2

Table 8. Non-balanced partitioning, utilization=0.4.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73  {57,9,0,0}  15  65:0:15:0:0
4  c880xc3.map  84  86  {56,27,0,0}  16  83:0:16:0:0
5  c1355xc3.map  70  73  {56,14,0,0}  19  65:0:197:0:0
6  c1908xc3.map  116  58  {2,57,57,0}  46  45:1:41:3:0
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {11,57,0,0}  23  28:0:23:0:0
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {57,34,0,0}  24  33:0:24:0:0
21  s832xc3.map  91  39  {57,34,0,0}  28  32:0:22:0:0
22  s838xc3.map  102  39  {0,47,55,0}  19  36:0:19:0:0
23  s953xc3.map  107  41  {0,57,50,0}  53  29:0:53:0:0
24  s1196xc3.map  143  30  {50,0,56,37}  66  15:3:35:21:2
25  s1238xc3.map  158  30  {54,0,53,51}  75  18:5:32:25:4
26  s1423xc3.map  112  24  {49,57,6,0}  33  20:3:24:3:0

Table 9. Balanced partitioning into 2, utilization=0.4.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73  {29,37,0,0}  18  72:0:18:0:0
4  c880xc3.map  84  86  {41,42,0,0}  21  78:0:21:0:0
5  c1355xc3.map  70  73  {39,31,0,0}  18  73:0:18:0:0
6  c1908xc3.map  116  58
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {37,31,0,0}  29  25:0:29:0:0
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {40,51,0,0}  23  33:0:23:0:0
21  s832xc3.map  91  39  {45,46,0,0}  22  34:0:22:0:0
22  s838xc3.map  102  39  {54,48,0,0}  22  36:0:22:0:0
23  s953xc3.map  107  41  {57,50,0,0}  52  32:0:52:0:0
24  s1196xc3.map  143  30
25  s1238xc3.map  158  30
26  s1423xc3.map  112  24

Table 10. Balanced partitioning into 3, utilization=0.4.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73  {22,26,18,0}  31  65:0:19:12:0
4  c880xc3.map  84  86  {30,22,31,0}  32  75:3:13:9:2
5  c1355xc3.map  70  73  {24,24,22,0}  32  73:0:16:16:0
6  c1908xc3.map  116  58  {39,46,31,0}  58  39:2:29:5:10
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {21,23,24,0}  59  26:0:25:16:9
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {34,33,24,0}  40  29:3:20:4:5
21  s832xc3.map  91  39  {31,26,34,0}  41  31:3:16:9:5
22  s838xc3.map  102  39  {35,40,27,0}  39  35:3:25:6:1
23  s953xc3.map  107  41  {43,35,29,0}  90  28:5:41:23:8
24  s1196xc3.map  143  30  {54,51,38,0}  64  15:5:29:21:2
25  s1238xc3.map  158  30  {53,48,57,0}  84  17:7:40:24:3
26  s1423xc3.map  112  24  {33,34,45,0}  37  19:3:18:11:1

Table 11. Balanced partitioning into 4, utilization=0.4.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73  {19,21,12,14}  47  65:0:32:13:6
4  c880xc3.map  84  86  {25,16,16,26}  51  73:3:26:15:2
5  c1355xc3.map  70  73  {19,16,22,13}  48  62:0:27:5:8
6  c1908xc3.map  116  58  {21,32,26,37}  75  44:3:45:14:5
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26  {13,14,21,20}  70  25:3:28:21:6
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {24,21,29,17}  59  31:8:21:16:0
21  s832xc3.map  91  39  {28,28,18,17}  63  32:8:23:18:0
22  s838xc3.map  102  39  {25,21,27,29}  43  36:3:21:13:0
23  s953xc3.map  107  41  {34,26,20,27}  113  29:12:46:22:10
24  s1196xc3.map  143  30  {42,27,45,29}  96  15:10:39:26:5
25  s1238xc3.map  158  30  {48,33,29,48}  106  16:10:55:16:6
26  s1423xc3.map  112  24  {21,35,21,35}  46  20:3:19:14:2

Table 12. Non-balanced partitioning, utilization=0.5.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86  {11,72,0,0}  15  78:0:15:0:0
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {72,44,0,0}  35  50:0:35:0:0
7  c3540xc3.map  283  72  {69,70,72,72}  126  50:10:50:37:8
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {72,19,0,0}  17  32:0:17:0:0
21  s832xc3.map  91  39  {72,19,0,0}  16  33:0:16:0:0
22  s838xc3.map  102  39  {33,69,0,0}  18  35:0:18:0:0
23  s953xc3.map  107  41  {35,72,0,0}  51  28:0:51:0:0
24  s1196xc3.map  143  30  {71,72,0,0}  40  22:0:40:0:0
25  s1238xc3.map  158  30  {72,14,72,0}  88  17:5:20:54:2
26  s1423xc3.map  112  24  {40,72,0,0}  17  22:0:17:0:0

Table 13. Balanced partitioning into 2, utilization=0.5.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86  {41,42,0,0}  21  78:0:21:0:0
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {66,50,0,0}  34  48:0:34:0:0
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {40,51,0,0}  23  33:0:23:0:0
21  s832xc3.map  91  39  {45,46,0,0}  22  34:0:22:0:0
22  s838xc3.map  102  39  {54,48,0,0}  22  36:0:22:0:0
23  s953xc3.map  107  41  {51,56,0,0}  53  30:0:53:0:0
24  s1196xc3.map  143  30  {71,72,0,0}  40  19:0:40:0:0
25  s1238xc3.map  158  30
26  s1423xc3.map  112  24  {50,62,0,0}  19  20:0:19:0:0

Table 14. Balanced partitioning into 3, utilization=0.5.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86  {30,22,31,0}  32  75:3:13:9:2
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {39,46,31,0}  58  39:2:29:5:10
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {34,33,24,0}  40  29:3:20:4:5
21  s832xc3.map  91  39  {31,26,34,0}  41  31:3:16:9:5
22  s838xc3.map  102  39  {35,40,27,0}  39  35:3:26:6:1
23  s953xc3.map  107  41  {43,35,29,0}  90  28:5:41:23:8
24  s1196xc3.map  143  30  {54,51,38,0}  64  15:5:29:21:2
25  s1238xc3.map  158  30  {63,42,53,0}  76  16:8:31:21:4
26  s1423xc3.map  112  24  {33,34,45,0}  37  19:3:18:11:1

Table 15. Balanced partitioning into 4, utilization=0.5.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86  {25,16,16,26}  51  73:3:26:15:2
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {21,32,26,37}  75  44:3:45:14:5
7  c3540xc3.map  283  72  {69,72,70,72}  132  48:8:64:41:4
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {24,21,29,17}  59  31:8:21:16:0
21  s832xc3.map  91  39  {28,28,18,17}  63  32:8:23:18:0
22  s838xc3.map  102  39  {25,21,27,29}  43  36:3:21:13:0
23  s953xc3.map  107  41  {34,26,20,27}  113  29:12:46:22:10
24  s1196xc3.map  143  30  {42,27,45,29}  96  15:10:39:26:5
25  s1238xc3.map  158  30  {48,33,29,48}  106  16:10:55:16:6
26  s1423xc3.map  112  24  {21,35,21,35}  46  20:3:19:14:2

Table 16. Non-balanced partitioning, utilization=0.6.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {85,31,0,0}  23  57:0:23:0:0
7  c3540xc3.map  283  72  {73,86,86,38}  113  53:10:59:22:5
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {86,5,0,0}  5  39:0:5:0:0
21  s832xc3.map  91  39  {86,5,0,0}  5  39:0:5:0:0
22  s838xc3.map  102  39  {16,86,0,0}  13  37:0:13:0:0
23  s953xc3.map  107  41  {21,86,0,0}  40  32:0:40:0:0
24  s1196xc3.map  143  30  {86,57,0,0}  38  20:0:38:0:0
25  s1238xc3.map  158  30  {86,72,0,0}  46  22:0:0:46:0
26  s1423xc3.map  112  24  {26,86,0,0}  17  20:0:17:0:0

Table 17. Balanced partitioning into 2, utilization=0.6.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {66,50,0,0}  34  48:0:34:0:0
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {40,51,0,0}  23  33:0:23:0:0
21  s832xc3.map  91  39  {45,46,0,0}  22  34:0:22:0:0
22  s838xc3.map  102  39  {54,48,0,0}  22  36:0:22:0:0
23  s953xc3.map  107  41  {51,56,0,0}  53  30:0:53:0:0
24  s1196xc3.map  143  30  {75,68,0,0}  39  22:0:39:0:0
25  s1238xc3.map  158  30  {86,72,0,0}  45  22:0:45:0:0
26  s1423xc3.map  112  24  {50,62,0,0}  19  20:0:19:0:0

Table 18. Balanced partitioning into 3, utilization=0.6.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {39,46,31,0}  58  39:2:46:5:0
7  c3540xc3.map  283  72
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {34,33,24,0}  40  29:3:20:4:5
21  s832xc3.map  91  39  {31,26,34,0}  41  31:3:16:9:5
22  s838xc3.map  102  39  {35,40,27,0}  39  35:3:25:6:1
23  s953xc3.map  107  41  {43,35,29,0}  90  28:5:41:23:8
24  s1196xc3.map  143  30  {54,51,38,0}  64  15:5:29:21:2
25  s1238xc3.map  158  30  {63,42,53,0}  76  16:8:31:21:4
26  s1423xc3.map  112  24  {33,34,45,0}  37  19:3:18:11:1

Table 19. Balanced partitioning into 4, utilization=0.6.
Benchmark no.  Benchmark  CLB no.  IO no.  CLB dist.  Cut size  Bus ratio
1  c17xc3.map  2  7
2  c432xc3.map  50  43
3  c499xc3.map  66  73
4  c880xc3.map  84  86
5  c1355xc3.map  70  73
6  c1908xc3.map  116  58  {21,32,26,37}  75  44:3:45:14:5
7  c3540xc3.map  283  72  {84,67,77,55}  134  53:13:76:24:2
8  s27xc3.map  3  7
9  s208xc3.map  25  15
10  s298xc3.map  26  11
11  s344xc3.map  20  22
12  s349xc3.map  20  22
13  s382xc3.map  31  11
14  s400xc3.map  32  11
15  s420xc3.map  50  23
16  s444xc3.map  32  11
17  s510xc3.map  68  26
18  s526xc3.map  55  11
19  s526nxc3.map  55  11
20  s820xc3.map  91  39  {24,21,29,17}  59  31:8:21:16:0
21  s832xc3.map  91  39  {28,28,18,17}  63  32:8:23:18:0
22  s838xc3.map  102  39  {25,21,27,29}  43  36:3:21:13:0
23  s953xc3.map  107  41  {34,26,20,27}  113  29:12:46:22:10
24  s1196xc3.map  143  30  {42,27,45,29}  96  15:10:39:26:5
25  s1238xc3.map  158  30  {48,33,29,48}  106  16:10:55:16:6
26  s1423xc3.map  112  24  {21,35,21,35}  46  20:3:19:14:2

Fig. 48. Cut size of benchmark circuits with utilization rate = 0.3.
Fig. 49. Cut size of benchmark circuits with utilization rate = 0.4.
Fig. 50. Cut size of benchmark circuits with utilization rate = 0.5.
Fig. 51. Cut size of benchmark circuits with utilization rate = 0.6.
(Each plot shows cut size against benchmark no., with curves labelled by partitioning
method: non-balanced, or balanced into 2, 3 or 4. The plots are not recoverable.)
The average number of each type of bus is calculated from the valid data, and the final
average bus ratio is
7.34 : 1.49 : 8.70 : 4.16 : 1.
More local buses are used than programmable buses for nets connected to 2 FPGAs; the ratio
is 8.70 : 4.16. The reason is that the partitioner assigns circuits which are too large to
fit into a single FPGA to two adjacent chips rather than to two widely separated FPGAs.
Since board IO, global bus, local bus, programmable bus connecting 2 FPGAs and programmable
bus connecting 3 FPGAs require 1, 4, 2, 2 and 3 IOs respectively, the final ratio of IOs is
7.34 : 5.97 : 17.41 : 8.33 : 3.
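The conversion from bus ratio to IO ratio is a term-by-term multiplication by the IOs each bus type needs. The sketch below uses the rounded averages quoted above, and the names are hypothetical. Because the inputs are already rounded, three terms come out 0.01 lower than the printed ratio, which was presumably computed from unrounded averages.

```python
BUS_TYPES = ("board IO", "global bus", "local bus",
             "programmable bus (2)", "programmable bus (3)")
BUS_RATIO = (7.34, 1.49, 8.70, 4.16, 1.0)  # average bus ratio from the text
IOS_PER_BUS = (1, 4, 2, 2, 3)              # IOs required by each bus type


def io_ratio(bus_ratio=BUS_RATIO, ios_per_bus=IOS_PER_BUS):
    """Multiply each bus count by the IOs that bus type consumes."""
    return [round(buses * ios, 2) for buses, ios in zip(bus_ratio, ios_per_bus)]


if __name__ == "__main__":
    for name, value in zip(BUS_TYPES, io_ratio()):
        print(f"{name}: {value}")
```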
The total number of IOs available for the buses in the UPB is 282. According to the ratio
above, the IOs are allocated to the different kinds of buses as follows:
IOs for Board IO = 49.20 ≈ 14 + 14 + 14 + 8 = 50;
IOs for Global bus = 40.07 ≈ 10 x 4 = 40;
IOs for Local bus = 116.75 ≈ 14 x 2 x 4 = 112;
IOs for Prog. bus (2) = 55.86 ≈ 14 x 4 = 56;
IOs for Prog. bus (3) = 20.12 ≈ 2 x (4 x 2) + 2 x (4) = 24.
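The proportional split of the 282 IOs can be reproduced as follows. This is a minimal sketch with hypothetical names. It uses the printed IO ratio, so the shares agree with the figures above only to within a few hundredths. The chosen allocation is entered as the totals stated in the text.

```python
IO_RATIO = {
    "board IO": 7.34,
    "global bus": 5.97,
    "local bus": 17.41,
    "programmable bus (2)": 8.33,
    "programmable bus (3)": 3.0,
}
TOTAL_IOS = 282  # IOs available for buses on the UPB, from the text

# Totals chosen in the text after rounding to realizable bus widths.
CHOSEN_ALLOCATION = {
    "board IO": 14 + 14 + 14 + 8,
    "global bus": 10 * 4,
    "local bus": 112,
    "programmable bus (2)": 14 * 4,
    "programmable bus (3)": 2 * (4 * 2) + 2 * 4,
}


def proportional_shares(ratio, total):
    """Split `total` IOs in proportion to the given ratio."""
    weight_sum = sum(ratio.values())
    return {name: total * weight / weight_sum for name, weight in ratio.items()}


if __name__ == "__main__":
    shares = proportional_shares(IO_RATIO, TOTAL_IOS)
    for name, share in shares.items():
        print(f"{name}: ideal {share:.2f}, chosen {CHOSEN_ALLOCATION[name]}")
    print("chosen total:", sum(CHOSEN_ALLOCATION.values()))
```

The chosen totals add up to exactly 282, so the rounding uses up every available IO.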
The bus allocation is shown in Figure 52. FPGA 3 provides 6 fewer board IOs than FPGA 0 - 2
because a total of 16 IOs in FPGA 3 are used for the address bus and the control bus of the
address generator, while only 10 IOs in each of FPGA 0 - 2 are used for the data and control
buses of RAM 0 - 2. Such a board IO allocation makes the bus structure more symmetrical. It
should also be noted that a programmable bus connecting 3 FPGAs requires 1 odd path and 1
even path, 3 IOs in total. Hence, 2 IOs were allocated for the even paths and 2 IOs for the
odd paths.
Fig. 53. Final bus allocation in UPB.
(Block diagram of FPGA 0 - 3, RAM 0 - 2, the buffers and the switch, annotated with the
widths of the board IO, global bus, local bus and programmable bus; not recoverable.)
6. FUTURE DIRECTION
6.1 Other Possible Configurations 
The UPB uses four Xilinx XC3042 FPGAs as the prototype implementation technology. The
maximum CLB utilization rate is estimated to be 0.6, and around 10,000 gates are available.
Whether 4 is the best number of FPGAs for the UPB has not been investigated. It may be
better to use larger-capacity FPGAs, or more FPGAs, in the UPB, because a higher gate count
would then be available for larger and more complex designs. The structure analysis used
for the UPB can be adopted to find the best bus interconnections for emulation boards with
a different number of FPGAs, provided that the connection method for each type of net is
specified. The structure analysis is suitable for emulation boards which have 4 - 8 FPGAs.
The final bus ratio depends greatly on the structure analysis. During the structure analysis,
the circuit is first partitioned, so the partitioning algorithm has a great effect on the
final bus structure. The F & M algorithm is quite fast (almost linear in the circuit size)
and simple to implement. However, other partitioning algorithms are available. Although
their computation time is higher and they are much more sophisticated, the cut size between
the partitioned components is smaller. A better algorithm might therefore improve the
results considerably.
6.2 Programmable Interconnection 
In the UPB, 8 x 8 cross-point switches are used as the routing resource between the FPGAs.
This imposes a limitation: if the number of buses in either dimension is greater than 8
(such as 15 x 2 in the UPB), the connection is very difficult to realize. Even though it can
be realized by cascading more switches, the delay would be very large. A better approach for
future UPB development is to determine the best bus structure first. Once the dimensions of
the required switches are known, either an intermediate FPGA or custom-made programmable
switches can be used as the routing resource. Any bus structure could then be implemented,
as the FPGA provides a large number of IOs and the custom programmable switches are tailored
for the board designed.
6.3 Expandability of UPB 
The global bus and the board IO in the UPB provide the interface to expand the UPB by
cascading more UPBs together. The UPBs can be interconnected using just the hardwired
buses. Another option is to scale up the UPBs with programmable interconnections that have
the same bus interconnection structure as that between the FPGAs in a single UPB. In other
words, the structure analysis can find the final bus structure not only between the FPGAs
but also between UPBs, and all the buses, including board IO, global bus, local bus and
programmable bus, are scaled up in the same manner. The last option is to connect the UPBs
by both hardwired buses and programmable buses of a different design.







A Universal Prototyping Board (UPB) was developed to overcome the disadvantages of
traditional design prototyping. It provides fast, flexible, reconfigurable, in-circuit
validation for digital designs. Although such emulation boards exist commercially and
academically, they are either too expensive and too large in capacity for general
applications, or too inflexible. The UPB was successfully implemented; it offers the
flexibility for design modification at a lower cost.
The UPB is based on Field Programmable Gate Arrays (FPGAs) and programmable
interconnections. Both hardwired and programmable buses are used between the FPGAs. These
buses were carefully designed so that each net has little delay after net assignment.
Besides, external RAMs are available for memory-intensive designs. Furthermore, a
microprocessor acts both as the controller of the UPB and as part of the emulated hardware.
In order to automate the validation process, a partitioning tool was developed to partition
the circuit into multiple netlists, one targeted at each FPGA. It performs critical path
analysis before partitioning to ensure that the operating speed or timing specification is
met. Different parameters can be adjusted by the user to obtain the most desirable
partitioned netlists.
The ratio of the various kinds of buses found through the structure analysis was shown to
be the optimum. Although the optimum number of FPGAs in the UPB has not been studied, the
structure analysis can still be used to find the optimum bus ratio for UPBs which have 4 to
8 FPGAs.
In short, with this emulation board, the software tools and the structure analysis tools,
almost all digital circuits can be realized in the UPB, and one can design a different sized
UPB with the






















On each line, the name and the delay of a primitive gate are specified. There is no need to
list the gate names in alphabetical order. However, the gate names must be in capital
letters, and the spelling of the primitive gates must be the same as that in the *.map
files. PATH means the wiring path.
APPENDIX B 
Command line argument options 
Particulars             Argument  Options    Description                                   Default
Critical path analysis  -f        --         normal critical path analysis                 no critical path analysis
                        -c        --         critical path analysis, treating all
                                             clocked elements as combinational elements
Partitioning            -b        2, 3 or 4  balanced partitioning into 2, 3 or 4 FPGAs    non-balanced partitioning
Utilization rate        -u        1 - 9      utilization rate of CLBs in each FPGA;        5
                                             1 - 9 means utilization rate = 0.1 - 0.9
                                             respectively
Example 1: parti <filename>[.map] -f -b/2 -u/6 <ENTER>
Meaning: normal critical path analysis, balanced partitioning into 2 FPGAs, utilization rate
= 0.6
Example 2: parti <filename>[.map] -u/4 <ENTER> 
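To illustrate the option syntax, the sketch below parses arguments written in the "-x/value" style into a settings record. It is not the parti source code. The function name, the dictionary keys and the error handling are assumptions. Only the flags, their allowed values and the defaults come from the table above.

```python
def parse_parti_args(args):
    """Parse options such as '-f', '-c', '-b/2' and '-u/6' into a dict.

    Defaults follow the table: no critical path analysis, non-balanced
    partitioning (balanced_parts=None) and a utilization rate of 0.5.
    """
    settings = {"critical_path": None, "balanced_parts": None, "utilization": 0.5}
    for arg in args:
        flag, _, value = arg.partition("/")
        if flag == "-f" and not value:
            settings["critical_path"] = "normal"
        elif flag == "-c" and not value:
            settings["critical_path"] = "combinational"
        elif flag == "-b" and value in ("2", "3", "4"):
            settings["balanced_parts"] = int(value)
        elif flag == "-u" and value in tuple("123456789"):
            settings["utilization"] = int(value) / 10
        else:
            raise ValueError(f"unrecognized option: {arg!r}")
    return settings


if __name__ == "__main__":
    print(parse_parti_args(["-f", "-b/2", "-u/6"]))
    print(parse_parti_args(["-u/4"]))
```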




The content of iosource.dat is shown as follows:
BOARD IO
LCA0..2: 25..27, 30, 33..40, 49, 76;
LCA3: 25..27, 30, 33..35, 76;
END
GBL BUS
LCA0..3: 42, 44..48, 50, 53, 61, 66;
END
PROG INTC
P1: 28, 56..60, 62..63;
P3: 29, 65, 67..71, 75;
P5: 29, 65, 67..71, 75;
P7: 29, 65, 67..71, 75;
P2,LCA0: 29, 65, 67..71, 75;
,LCA1: 77..84;
P4,LCA1: 28, 56..60, 62..63;
,LCA2: 77..84;
P6,LCA2: 28, 56..60, 62..63;
,LCA3: 77..84;




LCA0,1: 2..9, 16, 24, 41, 73:72;
LCA1,2: 10..11, 13..15, 17..21, 23, 73:72;
LCA2,3: 2..9, 16, 24, 41, 73:72;
LCA3,0: 10..11, 13..15, 17..21, 23, 73:72;
END
ADD BUS
LCA3,A0: 36..40, 49, 52;
,S8: 56, 58, 60, 62;
,S7: 65, 67, 70, 75;
END
DATA BUS
LCA0,S2: 75, 70, 67, 65;
,S1: 62, 60, 58, 56;
LCA1,S3: 75, 70, 67, 65;
,S4: 62, 60, 58, 56;
LCA2,S5: 75, 70, 67, 65;




All letters are written in capitals. The iosource.dat file is divided into a number of
records, each delimited by its record name and an END line. The records must appear in the
following fixed order: Board IO, Global bus, Programmable interconnect, Local bus, Address
bus and Data bus. The last record must be the EOF (end of file) record.
C-1 Numerical value
The numerical values comprise the LCA numbers and the pin numbers. They can be listed one
by one, separated by commas ','. Alternatively, two full stops can be used to denote a range
of numbers. For example, both 3, 4, 5, 6 and 3..6 mean pin 3, pin 4, pin 5 and pin 6.
Similarly, both LCA0..2 and LCA0, LCA1, LCA2 mean FPGA 0, FPGA 1 and FPGA 2. Note that a
semicolon, instead of a comma, follows the last pin number or the last pin range.
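
The list-and-range notation can be sketched as follows. This is an illustration only, assuming that white space between tokens is insignificant; the function names skip_ws and expand_pin_list are hypothetical and are not taken from the thesis software.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Advance past any white space. */
static const char *skip_ws(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Expand a list such as "25..27, 30, 33..40, 49, 76;" into out[].
   Returns the number of values stored, or -1 on a syntax error,
   a reversed range, or when more than max values would be needed. */
int expand_pin_list(const char *s, int *out, int max)
{
    int n = 0;

    assert(s != NULL && out != NULL);
    for (;;) {
        char *end;
        long lo, hi, p;

        s = skip_ws(s);
        if (!isdigit((unsigned char)*s))
            return -1;
        lo = hi = strtol(s, &end, 10);
        s = skip_ws(end);
        if (*s == '.') {                    /* range: two full stops */
            s = skip_ws(s + 1);
            if (*s != '.')
                return -1;
            s = skip_ws(s + 1);
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
            s = skip_ws(end);
        }
        if (hi < lo)
            return -1;
        for (p = lo; p <= hi; p++) {
            if (n >= max)
                return -1;
            out[n++] = (int)p;
        }
        if (*s == ';')                      /* semicolon ends the list */
            return n;
        if (*s != ',')
            return -1;
        s++;
    }
}
```

A list that does not end with a semicolon is rejected, in line with the rule that the last pin number or range is terminated by a semicolon.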
C-2 Records
In the following, the formats of all records are illustrated; i, j and k stand for numerical
values.
Board IO
Record name: BOARD IO
Format: LCAi[..j]: <pin number or pin range>;
Description: LCAi means FPGA i.
Global bus
Record name: GBL BUS
Format: LCAi[..j]: <pin number or pin range>;
Programmable bus
Record name: PROG INTC
Format: Pi: <pin number or pin range>; for odd paths,
Pi,LCAj: <pin number or pin range>; for even paths
Description: Pi means path number i. For an even path, the record must state which two
FPGAs the path connects to.
Local bus
Record name: LOG BUS
Format: LCAi,j: <pin number or pin range>;
Description: The record must state which FPGAs (i and j) the local bus connects to. If the
local bus connects to the same pin number on both FPGAs, the pin number is
written like any other numerical value. If not, a colon separates the two pin
numbers. For instance, in the iosource.dat shown before, 73:72 in line 1 of
the LOG BUS record means that the local bus connects to pin 73 on FPGA 0 and
pin 72 on FPGA 1.
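
The colon notation of the local bus record can be sketched as below, assuming one token holds either a single pin number or a pair written as a:b. The function name parse_local_pin is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one local-bus token: "16" (same pin on both FPGAs) or "73:72"
   (pin 73 on FPGA i, pin 72 on FPGA j).
   Returns 1 on success and 0 on a malformed token. */
int parse_local_pin(const char *tok, int *pin_i, int *pin_j)
{
    int a, b;
    char extra;

    assert(tok != NULL && pin_i != NULL && pin_j != NULL);
    /* Pair form: exactly two numbers around a colon, nothing after. */
    if (sscanf(tok, " %d : %d %c", &a, &b, &extra) == 2) {
        *pin_i = a;
        *pin_j = b;
        return 1;
    }
    /* Single form: exactly one number, nothing after. */
    if (sscanf(tok, " %d %c", &a, &extra) == 1) {
        *pin_i = a;
        *pin_j = a;
        return 1;
    }
    return 0;
}
```

With the record line "LCA0,1: 2..9, 16, 24, 41, 73:72;", the token 16 would give the same pin on both FPGAs, while 73:72 would give pin 73 on FPGA 0 and pin 72 on FPGA 1.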
Address bus
Record name: ADD BUS
Format: LCAi,Aj: <pin number or pin range>; or
,Sk: <pin number or pin range>;
Description: In the UPB, 8 address buses are shared with the programmable bus while 4
are not. Sk denotes a shared address bus and Aj denotes the non-shared





Record name: DATA BUS
Format: LCAi,Sj: <pin number or pin range>; or
,Sk: <pin number or pin range>;




The <filename>.con file is the hex-character data file for programming the switch connections
between the FPGAs. In the UPB, the switch addresses run from $140000 to $14007F.
Hence, the file contains 128 data values in total, arranged in 8 lines. The first data corresponds to the












Table of pin number for different buses
Legend:
1. Board IO                            A. Address bus
2. Global bus                          D. Data bus
3. Local bus                           C. Control bus
4. Programmable interconnect (path)    *** Permanent function on such pin

Pin#  Function                   FPGA 0    FPGA 1    FPGA 2    FPGA 3
1     GND                        ***       ***       ***       ***
2     A13 - I/O                  3         3         3         3
3     A6 - I/O                   3         3         3         3
4     A12 - I/O                  3         3         3         3
5     A7 - I/O                   3         3         3         3
6     I/O                        3         3         3         3
7     I/O                        3         3         3         3
8     A11 - I/O                  3         3         3         3
9     A8 - I/O                   3         3         3         3
10    A10 - I/O                  3         3         3         3
11    A9 - I/O                   3         3         3         3
12    PWRDN*                     ***       ***       ***       ***
13    TCLKIN - I/O               3         3         3         3
14    I/O                        3         3         3         3
15    I/O                        3         3         3         3
16    I/O                        3         3         3         3
17    I/O                        3         3         3         3
18    I/O                        3         3         3         3
19    I/O                        3         3         3         3
20    I/O                        3         3         3         3
21    I/O                        3         3         3         3
22    VCC                        ***       ***       ***       ***
23    I/O                        3         3         3         3
24    I/O                        3         3         3         3
25    I/O                        1         1         1         1
26    I/O                        1         1         1         1
27    I/O                        1         1         1         1
28    I/O                        4(1)      4(4)      4(6)      4(8)
29    I/O                        4(2)      4(3)      4(5)      4(7)
30    I/O                        1         1         1         1
31    M1 - RDATA*                ***       ***       ***       ***
32    M0 - RTRIG                 ***       ***       ***       ***
33    M2 - I/O                   1         1         1         1
34    HDC - I/O                  1         1         1         1
35    I/O                        1         1         1         1
36    LDC* - I/O                 1         1         1         1
37    I/O                        1         1         1         A
38    I/O                        1         1         1         A
39    I/O                        1         1         1         A
40    I/O                        1         1         1         A
41    I/O                        3         3         3         3
42    INIT* - I/O                2         2         2         2
43    GND                        ***       ***       ***       ***
44    I/O                        2         2         2         2
45    I/O                        2         2         2         2
46    I/O                        2         2         2         2
47    I/O                        2         2         2         2
48    I/O                        2         2         2         2
49    I/O                        1         1         1         A
50    I/O                        2         2         2         2
51    I/O                        C         C         C         C
52    I/O                        C         C         C         A
53    XTL2(IN) - I/O             2         2         2         2
54    RESET*                     ***       ***       ***       ***
55    DONE/PG*                   ***       ***       ***       ***
56    D7 - I/O                   4(1), D   4(4), D   4(6), D   4(8), A
57    XTL1(OUT) - BCLKIN - I/O   4(1)      4(4)      4(6)      4(8)
58    D6 - I/O                   4(1), D   4(4), D   4(6), D   4(8), A
59    I/O                        4(1)      4(4)      4(6)      4(8)
60    D5 - I/O                   4(1), D   4(4), D   4(6), D   4(8), A
61    CS0* - I/O                 2         2         2         2
62    D4 - I/O                   4(1), D   4(4), D   4(6), D   4(8), A
63    I/O                        4(1)      4(4)      4(6)      4(8)
64    VCC                        ***       ***       ***       ***
65    D3 - I/O                   4(2), D   4(3), D   4(5), D   4(7), A
66    CS1* - I/O                 2         2         2         2
67    D2 - I/O                   4(2), D   4(3), D   4(5), D   4(7), A
68    I/O                        4(2)      4(3)      4(5)      4(7)
69    I/O                        4(2)      4(3)      4(5)      4(7)
70    D1 - I/O                   4(2), D   4(3), D   4(5), D   4(7), A
71    RDY/BUSY* - RCLK* - I/O    4(2)      4(3)      4(5)      4(7)
72    D0 - DIN - I/O             3         3         3         3
73    DOUT - I/O                 3         3         3         3
74    CCLK                       ***       ***       ***       ***
75    A0 - WS* - I/O             4(2), D   4(3), D   4(5), D   4(7), A
76    A1 - CS2* - I/O            1         1         1         1
77    A2 - I/O                   4(8)      4(2)      4(4)      4(6)
78    A3 - I/O                   4(8)      4(2)      4(4)      4(6)
79    I/O                        4(8)      4(2)      4(4)      4(6)
80    I/O                        4(8)      4(2)      4(4)      4(6)
81    A15 - I/O                  4(8)      4(2)      4(4)      4(6)
82    A4 - I/O                   4(8)      4(2)      4(4)      4(6)
83    A14 - I/O                  4(8)      4(2)      4(4)      4(6)




Device                    Address Space ($)    Notes
EPROM (monitor program)   000000 - 007FFF      not user-accessible
EPROM (user program)      040000 - 047FFF      user-accessible
Local RAM                 080000 - 080FFF      080000 - 0807FF not user-accessible
                                               080800 - 080FFF user-accessible
RAM 0, 1, 2               0C0000 - 13FFFF      for memory-intensive designs
Programmable switch       140000 - 14007F      interconnection between FPGAs
MC68681                   180000 - 18001F      serial interface connected to PC
LCA 0                     1C0000               downloading FPGA configuration data
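
As an illustration, the address map above can be written as a lookup table. This is only a sketch: the type region_t and the functions decode_address and region_size are hypothetical, and the LCA 0 entry is modelled as a single address because the table gives only its base address.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous region of the address map (bounds are inclusive). */
typedef struct {
    unsigned long lo, hi;
    const char *device;
} region_t;

/* Regions copied from the table above. */
static const region_t upb_map[] = {
    { 0x000000UL, 0x007FFFUL, "EPROM (monitor program)" },
    { 0x040000UL, 0x047FFFUL, "EPROM (user program)"    },
    { 0x080000UL, 0x080FFFUL, "Local RAM"               },
    { 0x0C0000UL, 0x13FFFFUL, "RAM 0, 1, 2"             },
    { 0x140000UL, 0x14007FUL, "Programmable switch"     },
    { 0x180000UL, 0x18001FUL, "MC68681"                 },
    { 0x1C0000UL, 0x1C0000UL, "LCA 0"                   }  /* base only */
};

/* Return the device that owns addr, or NULL if addr is unmapped. */
const char *decode_address(unsigned long addr)
{
    size_t i;

    for (i = 0; i < sizeof upb_map / sizeof upb_map[0]; i++)
        if (addr >= upb_map[i].lo && addr <= upb_map[i].hi)
            return upb_map[i].device;
    return NULL;
}

/* Number of addresses covered by a region. */
unsigned long region_size(const region_t *r)
{
    assert(r != NULL && r->hi >= r->lo);
    return r->hi - r->lo + 1UL;
}
```

For the programmable switch region, region_size gives $14007F - $140000 + 1 = 128 addresses, which agrees with the 128 data values of the <filename>.con file.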

























































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































APPENDIX H
Program listings
Parti.h
/*****************************************************************************/
/* NET_LMT & CELL_LMT calculated from equations in comments are only suggestions. */
/* CLB_LMT, IO_LMT & CUT_LMT are fixed by emulation board hardware. */
/* UTIL is suggested by Xilinx data book. */
#define SYM_NM_LEN 30 /* max. symbol name length */
#define RCD_NM_LEN 8 /* length of first entry of line in .map */
#define FILENM_LEN 13 /* max. filename length */
#define CHAR_NO 200 /* max. no. of characters on line in .map */
#define NO_CP_ANA 0 /* no critical path analysis */
#define FLIP_FLOP 1 /* critical path analysis */
#define ALL_COMB 2 /* treat all cells in CLB as combinational cells */
#define FPGA_LMT 4 /* max. no. of FPGAs in emulation board */
#define CLB_LMT 100
#define REAL_CLB_LMT_D 80 /* real max. no. to ensure 100% routability=CLB_LMT*0.8 */
#define CLB_IN_FPGA_D 20 /* max. no. of CLB in each FPGA=REAL_CLB_LMT*0.8/4 */
#define NET_LMT 200 /* max. no. of NET incident on CLB in emulation board=CLB_LMT*2 */
#define CELL_LMT 400 /* max. no. of CELL in CLB in emulation board=NET_LMT*2 */
#define IO_LMT 100 /* max. no. of IO in emulation board (=CLB_LMT, for testing only) */
#define CUT_LMT 100 /* max. no. of cut size between FPGAs */
#define NO_NEIBOR_CLB 20 /* max. no. of CLB which has no neighbour CLB */
#define INIT_CLB_LMT_D 64 /* initial max. no. of CLBs for partition=CLB_LMT*0.8*0.8 */
#define INIT_CLB_IN_FPGA_D 20 /* initial max. no. of CLB in each FPGA for initial partitioning */
                              /* CLB_LMT*UTIL*0.8/4 */
#define SYS_LMT 50
#define PROG_PIN_LMT 50 /* PROG_LMT*total_path_no/2 */
#define LCA_IO_LMT 100
struct BLOCK {
    char blk_name[SYM_NM_LEN];
    int fpga_id, final_fpga_id;
    unsigned locked;
    struct LIST *sig_first;
    struct DL_LIST *gain_bukt[3];
};
struct SIGNAL {
    char sig_name[SYM_NM_LEN];
    unsigned ip_net, op_net, bi_net, connected;
    struct LIST *blk_first;
};
struct LIST {
    unsigned lst_id, lst_type;
    char lst_name[SYM_NM_LEN];
    struct LIST *next_lst;
};
struct CELL
{   char cell_name[SYM_NM_LEN], cell_type[SYM_NM_LEN], op_net_name[SYM_NM_LEN];
    unsigned ip_cell, op_cell, d_ff;
    int clb_id;
    struct SUCCESSOR *suc_cell_first, *in_path;
};
struct SUCCESSOR
{
    unsigned suc_cell_id;
    struct SUCCESSOR *next_suc_cell;
};
struct HEAD
{
    struct SUCCESSOR *first, *last;
};
struct DL_LIST {
    int mag;
    struct DL_LIST *prevs, *next;
};
struct NO_NEIBOR
{   char name[SYM_NM_LEN];
    int fpga_id;
    struct LIST *net_first;
};
struct IOB {
    unsigned fpga_id, iob_type, pin_id;
    char net_name[SYM_NM_LEN];
};
struct IO_NET {
    char net_name[SYM_NM_LEN], no_neibor_name[SYM_NM_LEN];
    unsigned net_id, i, o, pin_id, fpga_id, lca_con_no;
    int clb;
    struct IO_NET *next_io_net;
};
struct SYS_INTF
{
    unsigned pin[SYS_LMT], index;
};
struct CON {
    unsigned lca_con_no, lca_id[4], op_clb;
};
struct LCA_IO
{
    unsigned lca_con_no, adj_lca, bd_intf;
};





    unsigned lca_con_no, sw_id, path_id[2], index[2];
};
Parti.c
/*****************************************************************************/
/* cp_ana = 0: no critical path analysis. */
/* 1: critical path analysis. */
/* 2: critical path analysis, treat all cells in CLBs as */
/* combinational cells in sequential circuit. */
/*****************************************************************************/
#include <stdio.h>
#include "parti.h"





GetCmd(argc, argv); /* get command line */
MakeList(); /* make linked list for partition */
switch(cp_ana)
case 0: /* no critical path analysis */
Partition(0); /* partitioning */
break;
case 1:
CPAna(1); /* normal critical path analysis */
Partition(1); /* partitioning */
break;
case 2:
CPAna(0); /* combinational critical path analysis */
Partition(1); /* partitioning */
break;








extern FILE *f_ptr;
unsigned cp_ana, bal_parti, TtlFpgaNo;
unsigned REAL_CLB_LMT, CLB_IN_FPGA, INIT_CLB_LMT, INIT_CLB_IN_FPGA;
float UTIL, LOW_BAL_LMT, HIGH_BAL_LMT;
char infile[FILENM_LEN];
char *c_ptr, *strchr(), *strcat();





unsigned i, cpchked;
unsigned char para;
bal_parti=0; /* initialization */
cpchked=0; /* initialization */
cp_ana=NO_CP_ANA; /* default: no critical path analysis */
UTIL=0.5; /* default: utilization of CLB in FPGA = 0.5 */
TtlFpgaNo=4; /* default: no. of FPGAs used = 4 */
if(argc==1)




printf("\nCritical path analysis (y/n)?");
getchar();
key=getchar();
while(key!='y' && key!='Y' && key!='n' && key!='N');
if(key=='y' || key=='Y') {
do {
printf("\nNormal critical analysis (y/n)?");
getchar();
key=getchar();
} while(key!='y' && key!='Y' && key!='n' && key!='N');
if(key=='y' || key=='Y')
cp_ana=FLIP_FLOP;




printf("\nBalanced partitioning (y/n)?");
getchar();
key=getchar();
while(key!='y' && key!='Y' && key!='n' && key!='N');
if(key=='y' || key=='Y') {
bal_parti=1;
do {
printf("\nHow many FPGAs used (2-4)?");
scanf("%d", &TtlFpgaNo);
} while(TtlFpgaNo<2 || TtlFpgaNo>4);
}
do
{ printf("\nUtilization of FPGA (0.1-0.8)?");
scanf("%f", &UTIL);




















































printf("\nToo many parameters entered!\n");
help();
REAL_CLB_LMT=(unsigned)(CLB_LMT/4*TtlFpgaNo*UTIL);
REAL_CLB_LMT=(unsigned)(REAL_CLB_LMT/TtlFpgaNo*TtlFpgaNo);
CLB_IN_FPGA=(unsigned)(REAL_CLB_LMT/TtlFpgaNo);
INIT_CLB_LMT=(unsigned)(REAL_CLB_LMT*0.8);
INIT_CLB_IN_FPGA=(unsigned)(INIT_CLB_LMT/TtlFpgaNo);
LOW_BAL_LMT=(float)1/(float)TtlFpgaNo-0.07;
HIGH_BAL_LMT=(float)1/(float)TtlFpgaNo+0.07;
}








{ if(strcmp(c_ptr, ".map") && strcmp(c_ptr, ".MAP"))












printf("\nHARDWARE EMULATION BOARD -- PARTITIONING");
printf("\n======= ================== ========== ===============");
printf("\nCommand:");
printf("\nparti <filename[.map]> [-<para> [-<para> [-<para>]]] <ENTER>");
printf("\n");
printf("\nParameter:");
printf("\nf : critical path analysis.");
printf("\nc : critical path analysis, treat all cells in CLBs as ");
printf("\n combinational cells");
printf("\nb[/no.] : balanced partitioning into [no.] of FPGA.");
printf("\nu[/ratio]: utilization ratio (e.g. 2 => ratio=0.2) of FPGA.");
printf("\n ===== ======== ");
printf("\nDefault: no critical path analysis, not balanced partitioning & ");












































if(UTIL<0.1 || UTIL>0.80001)
WrongPara();





Makelist.c
/* clb_id in cell[] = -1: cell embedded in CLB which has no neighbour CLB. */
/* no_neibor_clb[].fpga_id = -1: default value, CLB which has no neighbour */
/* CLBs not assigned to any FPGA. */
/* lst_type in net_list = 1: input net. */
/* =0: output net. */





extern FILE *f_ptr;
extern unsigned bal_parti, CLB_IN_FPGA, REAL_CLB_LMT, TtlFpgaNo, cp_ana;
char line[CHAR_NO], rcd_nm[RCD_NM_LEN], sym_nm[SYM_NM_LEN], pin_nm[3];
char sym_type[SYM_NM_LEN], entry_nm[RCD_NM_LEN], temp_sig_nm[SYM_NM_LEN];
char pin_type[2], sig_nm[SYM_NM_LEN];
unsigned net_no, clb_no, cell_no, same_net_nm, temp_net_no, cell_count;
unsigned non_inv_cell, inv_cell_no, total_cell_no, total_clb_no;
unsigned total_net_no, TtlNoClb, MaxPin, fb_net, misc, blk_nm;
long cur_f_ptr;
struct LIST *net_list, *clb_list;
struct BLOCK clb[REAL_CLB_LMT_D];
struct SIGNAL net[NET_LMT];
struct CELL cell[CELL_LMT];
struct SUCCESSOR *suc_cell_list;
struct IO_NET *head_io_net, *io_net;
struct NO_NEIBOR no_neibor_clb[NO_NEIBOR_CLB];
void MakeList() /* make lists for partitioning */
{
unsigned count, io_no;
printf("\nMaking netlists...\n");
ChkLCAFile(); /* check 1st & 2nd lines of .map file format */
misc=0; /* initialization */
clb_no=0; /* initialization */
net_no=0; /* initialization */
cell_no=0; /* initialization */
io_no=0; /* initialization */
MaxPin=0; /* initialization */
blk_nm=2; /* initialization */
strcpy(entry_nm, "SYM,");
SkipLines(entry_nm, 4, f_ptr); /* skips line until 1st record encountered */
sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);




while(strncmp(sym_type, "IOB", 3)==(int)NULL) /* count IOB records */ {
if(blk_nm==2)






SkipLines(entry_nm, 4, f_ptr);
if(io_no+1>IO_LMT) /* check too many IO in cirkt. */






while(!strncmp(line, "EXT,", 4))
cur_f_ptr=ftell(f_ptr);
ReadSYM();
while(strncmp(sym_type, "CLB", 3)==(int)NULL /* CLB records */
&& strncmp(line, "EOF", 3)!=(int)NULL) /* and not end of file */
CLBRcd();
total_clb_no=clb_no;
if(bal_parti==0 && total_clb_no<CLB_IN_FPGA)
{ printf("\nCircuit can be fit into ONE FPGA, no need to partition.");
printf("\n");
exit(0);
}
if(bal_parti==1 && total_clb_no<TtlFpgaNo)











IONetLst(); /* make list of io net */
DelIONetArray(); /* delete 1-cell net */
if(cp_ana)
WriteInvCell(); /* search for and write INV cells */
SucCellLst(); /* make successor cell list for critical path analysis */
/* printf("\nclb%d, net%d.", total_clb_no, total_net_no); */
DelNoNeiborCLB(); /* delete CLB which has no neighbour CLB */
for(count=0; count<NO_NEIBOR_CLB; count++)
no_neibor_clb[count].fpga_id=-1; /* initialization */
/* for(count=0; count<NO_NEIBOR_CLB; count++)
printf("\nNo_neibor_clb[%d]=%s", count, no_neibor_clb[count].name);
printf("\nclb:%d, net:%d.", total_clb_no, total_net_no); */
}
ChkLCAFile() /* check .map file format */
{
fgets(line, CHAR_NO, f_ptr);
if(strncmp(line, "LCANET,", 7)!=(int)NULL) /* first line -- LCANET, <version> */
NotLCA(); /* not .map file format */
fgets(line, CHAR_NO, f_ptr);
if(strncmp(line, "PROG,", 5)!=(int)NULL) /* second line -- PROG, <prog. name>, <version>... */
NotLCA(); /* not .map file format */
}
NotLCA() /* not .map file format */
{






SkipLines(entry_nm, str_len, t_ptr) /* skip line until record of entry_nm encountered */
char *entry_nm; /* and read entries of 1st line in record */
unsigned str_len;
FILE *t_ptr;
{
while(strncmp(line, entry_nm, str_len)!=(int)NULL)
fgets(line, CHAR_NO, t_ptr);
}
ReadSYM() {
fgets(line, CHAR_NO, f_ptr);




struct LIST *temp_net_list, *temp;
strcpy(clb[clb_no].blk_name, sym_nm); /* write CLB block name */
strcpy(entry_nm, "PIN,"); /* skip all CFG lines */
SkipLines(entry_nm, 4, f_ptr);
sscanf(line, "%s %s %s %s", rcd_nm, pin_nm, pin_type, sig_nm);
while(strncmp(line, "PIN,", 4)==(int)NULL) /* make list of NETS for each CLB */
{
same_net_nm=0;
clb_count=0;
do /* search for same net name */
{ temp_net_list=clb[clb_count].sig_first;
while(temp_net_list && strcmp(temp_net_list->lst_name, sig_nm))
temp_net_list=temp_net_list->next_lst;
if(temp_net_list && !strcmp(temp_net_list->lst_name, sig_nm))
SameNetNm(temp_net_list->lst_id);
clb_count++;
} while(!same_net_nm && clb[clb_count-1].sig_first);
fb_net=0;
temp_net_list=clb[clb_no].sig_first;
while(temp_net_list && strcmp(temp_net_list->lst_name, sig_nm))
temp_net_list=temp_net_list->next_lst;
if(temp_net_list && !strcmp(temp_net_list->lst_name, sig_nm))
fb_net=1;
if(!fb_net)
{ net_list=(struct LIST *)calloc(1, (unsigned)(sizeof(struct LIST)));
ChkMemLst(net_list); /* check memory space */
net_list->lst_id=net_no; /* write net id. no. in net_list */
strcpy(net_list->lst_name, sig_nm); /* write net name in net_list */
if(!strncmp(pin_type, "I", 1))
net_list->lst_type=1; /* input net */
else
net_list->lst_type=0; /* output net */
if(!clb[clb_no].sig_first) /* add net to net_list */
{ /* empty list */
net_list->next_lst=(struct LIST *)NULL;










temp_net_list->lst_type=2;
net[temp_net_list->lst_id].ip_net=0;
net[temp_net_list->lst_id].op_net=0;
net[temp_net_list->lst_id].bi_net=1;
temp=net[temp_net_list->lst_id].blk_first;
while(temp && temp->lst_type!=2) {
if(temp->lst_id==clb_no)
temp->lst_type=2;








printf("\nCircuit too many nets to fit in emulation board!");
printf("\n")； 








if(clb_no>REAL_CLB_LMT-1) /* check too large cirkt. */
{
if(TtlFpgaNo==4)
printf("\nCircuit too large to fit in emulation board.");
else
printf("\nCircuit too large to fit in only %d FPGAs.", TtlFpgaNo);
exit(0);
}
if(!misc) {
WriteCell(); /* write cells for critical path analysis */
else if(!cp_ana)
strcpy(entry_nm, "ENDMOD"); /* skip all MODEL records */
SkipLines(entry_nm, 6, f_ptr);
}




ChkMemLst(list) /* check for sufficient memory allocation */
struct LIST *list;
if(list==(struct LIST *)NULL)




SameNetNm(cell_id) /* search for same NET name */ 





Add1Net() /* add 1 net to net_list */
{
clb[clb_no].sig_first=net_list;
MakeCLBLst(); /* makes list of CLBs for each net */
}
MakeCLBLst() /* make list of CLBs for each net */
{ while(net[net_no].blk_first==(struct LIST *)NULL ||
net[net_no].blk_first->lst_id!=clb_no)
{
clb_list=(struct LIST *)calloc(1, (unsigned)(sizeof(struct LIST)));
ChkMemLst(clb_list); /* check memory space */
clb_list->lst_id=clb_no;
strcpy(clb_list->lst_name, clb[clb_no].blk_name);




if(!net[net_no].blk_first) /* empty list */
{ 
net[net_no].blk_first=clb_list； 










fgets(line, CHAR_NO, f_ptr);
sscanf(line, "%s %s %s %s", rcd_nm, pin_nm, pin_type, sig_nm);
}
WriteCell() /* read in and write cell name for critical path analysis */
{ ReadSYM();
while(strncmp(line, "ENDMOD", 6)!=(int)NULL)
strcpy(cell[cell_no].cell_name, sym_nm); /* writes cell name */
strcpy(cell[cell_no].cell_type, sym_type); /* writes cell type */
if(cell[cell_no].clb_id!=-1)
cell[cell_no].clb_id=clb_no-1; /* writes CLB no. in which the cell resides */
if(strcmp(sym_type, "DFF")==(int)NULL) /* D flip flop cell */
cell[cell_no].d_ff=1;
do




TestCellLmt();
strcpy(entry_nm, "END"); /* skip lines until end of current SYM */
SkipLines(entry_nm, 3, f_ptr);
ReadSYM();
};
}
TestCellLmt() {
if(cell_no>CELL_LMT)
{ printf("\nCircuit too many cells in CLB to fit in emulation board!");
printf("\n");





IONetLst() /* find net with only 1 cell incident on */
{
struct LIST *temp;
unsigned i, o;
net_no=0;
while(net[net_no].sig_name[0]) {
i=0;
o=0;
temp=net[net_no].blk_first;
if(!net[net_no].blk_first->next_lst && !net[net_no].bi_net)
{ io_net=(struct IO_NET *)calloc(1, (unsigned)(sizeof(struct IO_NET)));
ChkMemIONet(io_net);
io_net->net_id=net_no;
io_net->clb=(int)(net[net_no].blk_first->lst_id);
io_net->lca_con_no=1;
strcpy(io_net->net_name, net[net_no].sig_name);
if(net[net_no].ip_net==1)
io_net->i=1;
if(net[net_no].op_net==1)
io_net->o=1;
if(!head_io_net) {
head_io_net=io_net;





io_net->next_io_net=head_io_net;
head_io_net=io_net;














if((i==1 && o==0) || (i==0 && o==1))
io_net=(struct IO_NET *)calloc(1, (unsigned)(sizeof(struct IO_NET)));
ChkMemIONet(io_net);
io_net->net_id=net_no;
io_net->clb=(int)(net[net_no].blk_first->lst_id);
io_net->lca_con_no=0;
strcpy(io_net->net_name, net[net_no].sig_name);
if(i==1 && o==0)
io_net->i=1;
if(i==0 && o==1)
io_net->o=1;
if(!head_io_net) /* empty list */
{
head_io_net=io_net;











ChkMemIONet(ionet) /* check for sufficient memory allocation */
struct IO_NET *ionet;
{ if(ionet==(struct IO_NET *)NULL)




DelIONetArray() /* search & delete io net in net array */
{
unsigned i, j;
int count;
j=0;
for(count=0; count<total_net_no; count++)
{ if(!net[count].blk_first->next_lst)
{ DelIONetLst(count+j); /* delete io net in net_list */
if(count!=total_net_no-1)




net[i-1].blk_first=net[i].blk_first;
}
count--;
};
total_net_no--;
};
ChangeIONetId(); /* renumber net id. after deleting io net */
}
DelIONetLst(io_net_id) /* search & delete io net in net_list */
unsigned io_net_id;
{
unsigned count1;
struct LIST **temp, *temp1;
for(count1=0; count1<total_clb_no; count1++)




while((*temp)->next_lst && (*temp)->lst_id!=io_net_id)
temp=&((*temp)->next_lst);
if((*temp)->lst_id==io_net_id) /* io net encountered */
{
temp1=(*temp)->next_lst; /* delete io net */






ChangeIONetId() /* change net id. in net_list & in io_net_list */
{
unsigned count, count1;
int i;
struct LIST *temp;
struct IO_NET *temp1;
for(count=0; count<total_net_no; count++) {
count1=0;
while(count1!=total_clb_no)














while(temp1 && temp1->net_id!=count)
if(!strcmp(temp1->net_name, net[count].sig_name))
temp1->net_id=count;








fseek(f_ptr, cur_f_ptr, 0);
while(strncmp(line, "EOF", 3)) /* search for and write INV cell */
{
strcpy(entry_nm, "MODEL");
SkipLines(entry_nm, 5, f_ptr);
fgets(line, CHAR_NO, f_ptr);





{ if(strstr(line, "_INV"))
InvCell(); /* INV cell encountered */
ReadPIN();
} while(!strcmp(pin_type, "I,") && strncmp(line, "END", 3));
strcpy(entry_nm, "END"); /* skip lines until end of current SYM */




fgets(line, CHAR_NO, f_ptr);
ReadSYM();
};
total_cell_no=cell_no;
}




while(strcmp(cell[cell_count].cell_name, temp_sig_nm) && 
cell[cell_count].cell_name[0])
cell_count++; 










};
}
SucCellLst() /* make list of successor cell for critical path analysis */
{
fseek(f_ptr, cur_f_ptr, 0);
ReadSYM();
cell_no=0;
while(strncmp(line, "EOF", 3)) {
strcpy(entry_nm, "MODEL");
SkipLines(entry_nm, 5, f_ptr);
fgets(line, CHAR_NO, f_ptr);
while(strncmp(line, "ENDMOD", 6)) {
ReadPIN();
while(!strncmp(line, "PIN,", 4)) {
if(!strcmp(pin_type, "O,"))
{ ChkIOCell(); /* check if the cell connected to output pin is io cell */
fgets(line, CHAR_NO, f_ptr);
if(strncmp(line, "PIN,", 4))
fgets(line, CHAR_NO, f_ptr);
else




if(strncmp(line, "PIN,", 4))
fgets(line, CHAR_NO, f_ptr);
else
sscanf(line, "%s %s %s %s", rcd_nm, pin_nm, pin_type, sig_nm);
}
sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);
cell_no++;
}
fgets(line, CHAR_NO, f_ptr);
ReadSYM();
}
fgets(line, CHAR_NO, f_ptr);
ReadSYM();
}
DelNoNeiborCLB() /* record & delete no-net-on-CLBs */
{
unsigned i;
int count;
struct IO_NET *temp;
struct LIST *temp1;
for(count=0; count<total_clb_no; count++)
{ if(clb[count].sig_first==(struct LIST *)NULL)
{ ChangeBlkId(count); /* change CLB id. in cell & in io net */
strcpy(no_neibor_clb[TtlNoClb].name, clb[count].blk_name);
TtlNoClb++;
if(TtlNoClb>NO_NEIBOR_CLB)
{ printf("\nToo many CLBs which has no neighbour CLBs!\n");
printf("\nType \"parti -h <ENTER>\" for help...");
printf("\n");
exit(0);
}
if(count!=total_clb_no-1) /* shift CLB in CLB array after deleting */
{ for(i=count+1; i<total_clb_no; i++)
{ strcpy(clb[i-1].blk_name, clb[i].blk_name);
clb[i-1].sig_first=clb[i].sig_first;
temp=head_io_net;
while(temp) /* change CLB id. in IO net */
{
if(temp->clb==i)
temp->clb=i-1;





for(i=0; i<total_net_no; i++) /* change CLB id. in CLB list */
{















struct IO_NET *temp;




temp=head_io_net;
while(temp) {
if(temp->clb==j) {
temp->clb=-1;
strcpy(temp->no_neibor_name, clb[j].blk_name);
}
temp=temp->next_io_net;
}
}
ChkIOCell() {
struct IO_NET *temp_io_net;
temp_io_net=head_io_net;
while(temp_io_net->next_io_net && strcmp(temp_io_net->net_name, sig_nm))
temp_io_net=temp_io_net->next_io_net;
if(!strcmp(temp_io_net->net_name, sig_nm)) {
if(!strstr(line, "_INV"))
{ if(temp_io_net->i) /* system input net */
cell[cell_no].ip_cell=1;








if(temp_io_net->i && cell[inv_cell_no].ip_cell==0) 
cell[inv_cell_no].ip_cell=1;
if(temp_io_net->o && cell[inv_cell_no].op_cell==0) 





if(!strstr(line, "_INV")) {
ChkIOCell();
cell_count=0;







strcpy(temp_sig_nm, sig_nm);
strcat(temp_sig_nm, "_INV");
cell_count=non_inv_cell;
while(cell_count<total_cell_no &&
strcmp(cell[cell_count].op_net_name, temp_sig_nm))
cell_count++;
inv_cell_no=cell_count;






if(!strcmp(cell[cell_count].op_net_name, sig_nm))
if(!ChkSucCell(cell[cell_count].suc_cell_first, inv_cell_no))




while(!strcmp(pin_type, "I,") && strncmp(line, "END", 3));
}
AddSucCell(first, suc_cell_no) /* adds successor cell */
struct SUCCESSOR **first;
unsigned suc_cell_no;
suc_cell_list=(struct SUCCESSOR *)calloc(1, (unsigned)(sizeof(struct SUCCESSOR)));
ChkMemSuc(suc_cell_list);
suc_cell_list->suc_cell_id=suc_cell_no;
if(*first==(struct SUCCESSOR *)NULL) {
*first=suc_cell_list;





};
} 
ChkMemSuc(sue) 
struct SUCCESSOR *suc； 
{ 
if(suc==(struct SUCCESSOR *)NULL) 




ChkSucCell(first, cell_id) /* check exist successor cell id. in list *first */
struct SUCCESSOR *first;
unsigned cell_id;
{
struct SUCCESSOR *temp_suc_cell;
temp_suc_cell=first;
if(!temp_suc_cell) /* empty list */
return(0);
else











/* Path cp0 stores the final critical path. */
/* Path cp1 stores the current path. */





#define ADD 1
#define SUB 0
#define PRI_GATE_LMT 30
extern unsigned total_cell_no, cell_count, cp_ana, total_clb_no;
extern char line[CHAR_NO];
extern struct BLOCK clb[REAL_CLB_LMT_D];
extern struct CELL cell[CELL_LMT];
extern struct SUCCESSOR *suc_cell_list;
unsigned cnt0, cnt1, flip_flop, total_gate_no, dff_delay, path_delay;
unsigned long max_delay[2];
FILE *rf_ptr;
struct SUCCESSOR *head_cp1, *cp[2], *suc_cell_ptr[CELL_LMT];
struct SUCCESSOR *head_stack, *stack_lst;
struct HEAD head_cp0;
struct PRI_GATE pri_gate[PRI_GATE_LMT];
CPAna(ff) /* find critical path & lock CLB */
unsigned ff;
printf("\nCritical pass analysis...\n");
flip_flop=ff;
max_delay[0]=0; /* initialization */
CPInit();
for(cell_count=0; cell_count<total_cell_no; cell_count++)
if(cell[cell_count].suc_cell_first)
if(flip_flop==1)
{ if(cell[cell_count].ip_cell==1 || cell[cell_count].d_ff==1)













printf("\nFile delay.dat not found!\n");
exit(0);
fgets(line, CHAR_NO, rf_ptr);
do
{ sscanf(line, "%s %d", pri_gate[i].gate, &(pri_gate[i].gate_delay));
if(!strcmp(pri_gate[i].gate, "DFF"))
dff_delay=pri_gate[i].gate_delay;
else if(!strcmp(pri_gate[i].gate, "PATH"))
path_delay=pri_gate[i].gate_delay;
if(i==PRI_GATE_LMT)
{ printf("\nError! Too much primitive gate type.\n");
exit(0);






















DelShorterPath(); /* delete shorter path after comparing current with stored







DeAllocPath(&(head_cp1), 1); }
while(head_stack); }





cell[cell_id].in_path=suc_cell_list; }
CalMaxDelay(cal, cell_id) /* find cell type for calculating path delay */
unsigned cal, cell_id;
{
unsigned i;













UpdateDelay(cal, cell_delay) /* update path delay */ 




else max_delay[1]=max_delay[1]-cell_delay; }
FindPath(count) /* find whole path from ip. to op. */
unsigned count;
{ if(count!=cell_count) /* check if count is output cell or d_ff */
{ if(flip_flop==1)












if(suc_cell_ptr[cnt0] && !ChkSucCell(head_stack, cnt0))
ISsucSM"head_staclc) , cntO) ； /* put cell in stack which has more than 2 successor cells */ 




{ cnt1=cell[cnt0].suc_cell_first->suc_cell_id; /* cnt1: successor cell */
if(!cell[cnt1].in_path) /* check feedback path */
AddPathLst(cnt1);





DeAllocStackFirst() /* delete 1 cell in stack */
{
struct SUCCESSOR *temp;
unsigned k;
k=head_stack->suc_cell_id;
suc_cell_ptr[k]=cell[k].suc_cell_first;
temp=head_stack->next_suc_cell;
free((struct SUCCESSOR *)head_stack);
head_stack=temp;
}









return(0); /* feedback path */




} ； } 
DeAllocPath(first, x) /* delete path */
struct SUCCESSOR **first;
unsigned x;
{
struct SUCCESSOR *temp;
while(*first) {
if(x)
cell[(*first)->suc_cell_id].in_path=(struct SUCCESSOR *)NULL;
temp=(*first)->next_suc_cell;
free((struct SUCCESSOR *)(*first));
*first=temp;
}; }
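
DeAllocPath walks a singly linked path list, frees each node, and optionally clears the per-cell in_path marker. The same idea can be sketched in ANSI C as follows. The struct layout, the path_push helper, and the in_path array passed as an argument are assumptions made for illustration; the thesis code keeps these in global arrays instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct SUCCESSOR (field names follow the listing). */
struct successor {
    unsigned suc_cell_id;
    struct successor *next_suc_cell;
};

/* Push a cell id on the front of a path list; returns 0 on allocation failure. */
static int path_push(struct successor **first, unsigned cell_id)
{
    struct successor *node = (struct successor *)malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->suc_cell_id = cell_id;
    node->next_suc_cell = *first;
    *first = node;
    return 1;
}

/* Free every node of the list. When clear_marks is non-zero, also reset
   in_path[] for each cell on the path. Returns the number of nodes freed. */
static unsigned path_free(struct successor **first, int clear_marks,
                          struct successor **in_path)
{
    unsigned freed = 0;
    assert(first != NULL);
    while (*first != NULL) {
        struct successor *next = (*first)->next_suc_cell;
        if (clear_marks)
            in_path[(*first)->suc_cell_id] = NULL;
        free(*first);
        *first = next;
        freed++;
    }
    return freed;
}
```

Saving the next pointer before calling free is the important step: the node must not be read after it has been released.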
CopyPath() /* copy path 1 to path 0 */
{ struct SUCCESSOR *temp, *temp_path_lst;
temp=head_cp1;
while(temp)




head_cp0.last=temp_path_lst;
}
else
{ head_cp0.last->next_suc_cell=temp_path_lst; head_cp0.last=temp_path_lst;
}
head_cp0.last->suc_cell_id=temp->suc_cell_id;
temp=temp->next_suc_cell;
};
} 
/* delete all one-successor cells until a cell with more than 2 successors is encountered */
DelSucCells(cell_id)
unsigned cell_id;
{
struct SUCCESSOR *temp;
while(head_cp1->suc_cell_id!=cell_id)
{ CalMaxDelay(SUB, head_cp1->suc_cell_id);
cell[head_cp1->suc_cell_id].in_path=(struct SUCCESSOR *)NULL;
temp=head_cp1->next_suc_cell;






/* fpga_id in clb[i]=-1: default value, not assigned to any FPGA. */
/* clb_id in cell[i]=-1: cell[i] is in CLB which has no neighbour CLB. */
/* locked in cell[i]= 0: free cell. */
/* =1: locked cell, locked within a pass. */





#define INC 1
#define DEC 0
#define CR 0x0d
extern unsigned total_clb_no, total_net_no, real_clb_lmt, total_cell_no;
extern unsigned total_no_net_clb, MaxPin, TtlNoClb, bal_parti, CLB_IN_FPGA;
extern unsigned INIT_CLB_LMT, REAL_CLB_LMT, INIT_CLB_IN_FPGA, TtlFpgaNo;
extern float LOW_BAL_LMT, HIGH_BAL_LMT;
extern struct BLOCK clb[REAL_CLB_LMT_D];
extern struct CELL cell[CELL_LMT];
extern struct SIGNAL net[NET_LMT];
extern struct HEAD head_cp0, *head_cp1, *suc_cell_ptr[CELL_LMT];
extern struct NO_NEIBOR no_neibor_clb[NO_NEIBOR_CLB];
extern struct IO_NET *head_io_net;
struct GAIN {
int clb_id, gain_value;
};
struct MOVE {
unsigned fm_fpga, to_fpga, base_clb;
int move_gain;
struct DL_LIST *best_move;
};
unsigned clb_cnt[FPGA_LMT], total_heap_no, min_cut_size;
unsigned cut_size, total_gain_no, clb_dist[NET_LMT][FPGA_LMT];
unsigned move_no, total_move_no, cnt, no_valid_move, final_cut, pass;
unsigned min_clb_no, max_clb_no, temp_clb_cnt[FPGA_LMT];
struct GAIN gain[FPGA_LMT][FPGA_LMT][CLB_IN_FPGA_D];
struct DL_LIST *gain_lst, gain_ptr[FPGA_LMT][FPGA_LMT];
struct DL_LIST *bucket[FPGA_LMT][FPGA_LMT][NET_LMT];




unsigned count, count1;
int x, y;
float z;
printf("\nPartitioning...\n");
for(count=0; count<TtlFpgaNo; count++) /* initialization */
clb_cnt[count]=0; /* no. of CLBs currently in each FPGA */
final_cut=CUT_LMT; /* initialization */
min_cut_size=CUT_LMT;
pass=0; /* initialization */
































for(x=TtlFpgaNo-1; x>(TtlFpgaNo-1-(int)count1); x--)
clb_cnt[x]=count+1;
for(y=x; y>((int)count1+x-(int)TtlFpgaNo); y--)
clb_cnt[y]=count;
if(cp_ana==0)
NetwkInit(TtlFpgaNo-1, 0); /* obtain initial partition */
else
{ count=LockCP(); /* lock CLBs in critical path in 1 FPGA */
NetwkInit(count, cnt); /* obtain initial partition by assigning other CLBs */
ClearCP(); /* delete all paths created in critical path analysis */
};
do {
final_cut=min_cut_size;
PartiInit(); /* initialization for partitioning */
printf("\nPartitioning pass %d...\n", pass+1);





printf("\nFinal Assignment with cut size %d.", final_cut);
printf("\nCLB distribution: {%d, %d, %d, %d}.\n", clb_cnt[0], clb_cnt[1],
clb_cnt[2], clb_cnt[3]);
AsgnNoNeiborCLB(); /* assign CLB which has no neighbour CLB */
}
NetwkInit(i, j) /* assign CLBs randomly */
unsigned i, j;
{
unsigned count;
for(count=0; count<total_clb_no; count++)









LockCP() /* lock CLBs in critical path in 1 FPGA */
{
unsigned i, k;
int j;







j=cell[temp->suc_cell_id].clb_id;
if(j!=-1) /* -1: CLB which has no neighbour CLB */
if(clb[j].locked==0)
clb[j].fpga_id=(int)i;
clb[j].locked=2; k++;
}; temp=temp->next_suc_cell;
};




/* delete all paths created in critical path analysis */
ClearCP() {
unsigned count;
DeAllocPath(&(head_cp0.first));
for(count=0; count<total_cell_no; count++)







unsigned count, count1, i;
i=0;
for(count=0; count<FPGA_LMT; count++)
temp_fpga[count]=-1;
for(count=0; count<FPGA_LMT; count++)
{
if(clb_cnt[count]>0) {
temp_fpga[i]=(int)count;
if(i!=count) {













PartiInit() /* initialization for partitioning */
{ unsigned n, a, f, t, b, cnt[FPGA_LMT], count, count1, count2;
int i, j, k;
struct LIST *temp;
struct DL_LIST *temp1;
for(count=0; count<TtlFpgaNo; count++)
cnt[count]=0;
for(n=0; n<total_net_no; n++)
for(a=0; a<TtlFpgaNo; a++)
clb_dist[n][a]=0; /* initialize no. of CLBs incident on net i in FPGA j */
for(f=0; f<TtlFpgaNo; f++)
for(t=0; t<TtlFpgaNo; t++)
for(count=0; count<CLB_IN_FPGA; count++)
{ gain[f][t][count].clb_id=-1; /* initialize gain of CLB after moving to another FPGA */
gain[f][t][count].gain_value=0; }
for(n=0; n<total_net_no; n++) /* initialize CLB distribution */
{
temp=net[n].blk_first;




};
}
for(n=0; n<total_net_no; n++) /* calculate cut size between FPGAs */





for(a=0; a<TtlFpgaNo; a++) /* calculate gain of CLB after moving to another FPGA */
for(b=0; b<total_clb_no; b++)
{ if(clb[b].fpga_id==(int)a)






while(temp!=(struct LIST *)NULL) {
n=temp->lst_id;
for(count=0; count<TtlFpgaNo; count++) {
if(count==a)
if(clb_dist[n][count]==1)










}; }
temp=temp->next_lst;
};
cnt[a]++;
}; }
} 
total_gain_no=MaxPin*2+1;
for(count=0; count<TtlFpgaNo; count++)
for(count1=0; count1<TtlFpgaNo; count1++) {
if(count!=count1) {
i=(int)MaxPin;
for(count2=0; count2<total_gain_no; count2++)
{ /* make gain structure */
bucket[count][count1][count2]=(struct DL_LIST *)calloc(1,
(unsigned)(sizeof(struct DL_LIST)));
ChkMemDL(bucket[count][count1][count2]);
bucket[count][count1][count2]->mag=i;
bucket[count][count1][count2]->next=(struct DL_LIST *)NULL;
bucket[count][count1][count2]->prevs=(struct DL_LIST *)NULL;
i--; }
for(count2=0; count2<cnt[count]; count2++)





gain_lst=(struct DL_LIST *)calloc(1, (unsigned)






clb[i].gain_bukt[k]=gain_lst;
gain_lst->mag=i;
Add1Gain(count, count1, a);
}
}
count2=0;
while(count2<total_gain_no && /* find max. gain pointer of each bucket */
bucket[count][count1][count2]->next==(struct DL_LIST *)NULL)
count2++;
if(count2<total_gain_no &&






for(count=0; count<TtlFpgaNo; count++) /* record all max. gain pointers in heap */
for(count1=0; count1<TtlFpgaNo; count1++)
if(count!=count1 && gain_ptr[count][count1].next!=(struct DL_LIST *)NULL)
if(heap[0].best_move==(struct DL_LIST *)NULL)
{ WriteHeap(0, count, count1);
total_heap_no++;
} else InsertHeap(count, count1);
} 
} 
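
PartiInit builds, for every ordered FPGA pair, an array of 2*MaxPin+1 bucket heads whose mag field runs from +MaxPin down to -MaxPin, so slot 0 holds the highest gain. The index arithmetic can be sketched as below. The function names and the occupancy array are hypothetical and only illustrate the mapping; the listing itself stores doubly linked DL_LIST nodes in each slot.

```c
#include <assert.h>

/* Number of bucket slots needed for gains in [-max_pin, +max_pin]. */
static unsigned bucket_count(unsigned max_pin)
{
    return max_pin * 2 + 1;
}

/* Slot that holds a given gain: slot 0 is the highest gain (+max_pin). */
static unsigned bucket_index(unsigned max_pin, int gain)
{
    assert(gain >= -(int)max_pin && gain <= (int)max_pin);
    return (unsigned)((int)max_pin - gain);
}

/* Inverse mapping: the gain value stored in a slot. */
static int bucket_gain(unsigned max_pin, unsigned index)
{
    assert(index < bucket_count(max_pin));
    return (int)max_pin - (int)index;
}

/* Scan from the top for the first non-empty slot, i.e. the maximum gain
   currently available. Returns bucket_count() when every slot is empty. */
static unsigned first_nonempty(const unsigned *occupancy, unsigned max_pin)
{
    unsigned i = 0;
    while (i < bucket_count(max_pin) && occupancy[i] == 0)
        i++;
    return i;
}
```

Because gains are bounded by the maximum pin count, moving a CLB between slots after a gain change is a constant-time index computation rather than a search.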
/* check for sufficient memory for structure for DL_LIST */
ChkMemDL(dl_lst)
struct DL_LIST *dl_lst;
{ if(dl_lst==(struct DL_LIST *)NULL)




Add1Gain(x, y, z) /* add 1 gain in list in gain structure */
unsigned x, y, z;
if(bucket[x][y][z]->next==(struct DL_LIST *)NULL)
bucket[x][y][z]->next=gain_lst;
gain_lst->prevs=bucket[x][y][z];










InsertHeap(j, k) /* search appropriate position & insert 1 heap record */




while(i<total_heap_no && gain_ptr[j][k].mag<heap[i].move_gain)
i++;
if(i==total_heap_no && heap[i].best_move==(struct DL_LIST *)NULL) {
WriteHeap(i, j, k);
total_heap_no++; }
else {
HeapShiftDown(i);
WriteHeap(i, j, k);
total_heap_no++; }
} 
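
Although the listing calls this structure a heap, InsertHeap searches it linearly and shifts records down, so it behaves as an array kept sorted by descending move gain. A minimal sketch of that insertion follows. The struct, the HEAP_CAP capacity, and the function name are assumptions for illustration, not the thesis definitions.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 16 /* hypothetical capacity for this sketch */

/* Hypothetical reduced form of the listing's move record. */
struct move_rec {
    unsigned fm_fpga, to_fpga;
    int move_gain;
};

/* Insert rec so that heap[0..*n-1] stays sorted by descending move_gain.
   Returns the insertion position, or -1 if the array is already full. */
static int insert_sorted(struct move_rec *heap, unsigned *n, struct move_rec rec)
{
    unsigned i = 0, k;
    assert(heap != NULL && n != NULL);
    if (*n >= HEAP_CAP)
        return -1;
    /* Find the first record whose gain is not larger than the new one. */
    while (i < *n && rec.move_gain < heap[i].move_gain)
        i++;
    /* Shift the lower records down by one slot to open position i. */
    for (k = *n; k > i; k--)
        heap[k] = heap[k - 1];
    heap[i] = rec;
    (*n)++;
    return (int)i;
}
```

With this layout the best candidate move is always heap[0], at the cost of a linear-time insertion; that is acceptable here because there is at most one record per ordered FPGA pair.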
WriteHeap(x, y, z) /* write the best gain pointer */ 






heap[x].move_gain=heap[x].best_move->mag; }















no_valid_move=0;
while(total_heap_no!=0 && !no_valid_move)
{








/* move base CLB from own FPGA to another FPGA */
MakeMove()
{
unsigned i, j, k, l, m, n, o, p;
int x, y;





x=heap[0].move_gain;
/* perform at least 1 pass even if cut size > limit initially */
if(pass==0) {
if(!bal_parti)
while(l<total_heap_no && clb_cnt[k]+1>CLB_IN_FPGA)
/* invalid if no. of CLB in FPGA > limit */








else { while(l<total_heap_no && (clb_cnt[j]-1<min_clb_no || clb_cnt[k]+1>max_clb_no))
{ l++; /* test next heap record */
i=heap[l].base_clb; j=heap[l].fm_fpga; k=heap[l].to_fpga; x=heap[l].move_gain;
}
}
} else if(pass>0) {
if(!bal_parti)
while(l<total_heap_no && /* test valid move */
(clb_cnt[k]+1>CLB_IN_FPGA || cut_size-(unsigned)x>CUT_LMT))
{ /* invalid if cut size or no. of CLB in FPGA > limit */
l++; i=heap[l].base_clb; j=heap[l].fm_fpga; k=heap[l].to_fpga; x=heap[l].move_gain;
}
} else
while(l<total_heap_no && (clb_cnt[j]-1<min_clb_no ||
clb_cnt[k]+1>max_clb_no || cut_size-(unsigned)x>CUT_LMT))
{ l++; /* test next heap record */




{ no_valid_move=1; /* no valid move in heap */
return(0);
}
clb[i].locked=1; /* lock base CLB */
clb[i].fpga_id=(int)k;
clb_cnt[j]--;
clb_cnt[k]++;
cut_size=cut_size-(unsigned)x; /* update cut size */
WriteMoveRcd(i, j, k, cut_size); /* record move */
UpdateGainPtrBf(j, k, i, x, 0);
RemoveGainNode(i, j, k); /* remove gain node */
for(l=0; l<TtlFpgaNo; l++) {
if(l!=k && j!=l) {




UpdateGainPtrBf(j, l, i, x, 0);
RemoveGainNode(i, j, l); }
}
while(temp!=(struct LIST *)NULL) /* update CLB distribution, gain & gain structure */




while(temp1!=(struct LIST *)NULL) {
m=temp1->lst_id;
if(clb[m].locked==0)





UpdateAll(j, k, n, m, o, y, 1);
for(o=0; o<TtlFpgaNo; o++)
if(o!=j && o!=k)





y=gain[o][k][n].gain_value;
p=MaxPin-(unsigned)y;
UpdateAll(o, k, n, m, p, y, 1);
} 
} 






while(temp1!=(struct LIST *)NULL) {
m=temp1->lst_id;
if(clb[m].locked==0 && clb[m].fpga_id==(int)k) {
n=0;




UpdateAll(k, j, n, m, o, y, 0);
for(o=0; o<TtlFpgaNo; o++)
if(o!=j && o!=k) {
y=gain[k][o][n].gain_value;
p=MaxPin-(unsigned)y;





clb_dist[l][j]--; /* move */
clb_dist[l][k]++;
if(clb_dist[l][j]==0) /* after move */
{
temp1=net[l].blk_first;









UpdateAll(k, j, n, m, o, y, 0);
}
for(o=0; o<TtlFpgaNo; o++)
if(o!=j && o!=k) if(clb[m].fpga_id==(int)o) {
n=0;











while(temp1!=(struct LIST *)NULL) {
m=temp1->lst_id;





UpdateAll(j, k, n, m, o, y, 1);
for(o=0; o<TtlFpgaNo; o++)
if(o!=j && o!=k)
{ y=gain[j][o][n].gain_value;
p=MaxPin-(unsigned)y;









WriteMoveRcd(i, j, k, l) /* record all moves within a pass */
unsigned i, j, k, l;
{
move_rcd[move_no].base_clb=i;
move_rcd[move_no].fm_fpga=j;
move_rcd[move_no].to_fpga=k;
move_rcd[move_no].move_gain=(int)l;
move_rcd[move_no].best_move=heap[0].best_move;
move_no++;
if(move_no==REAL_CLB_LMT) {




UpdateGainPtrBf(i, j, n, k, o) /* update gain pointer before removing gain node */
unsigned i, j, n, k, o;
{
unsigned l, m;
if(o==DEC)
l=MaxPin-k+1; m=0;
while(heap[m].fm_fpga!=i || heap[m].to_fpga!=j)
m++;
if(gain_ptr[i][j].next->next->next==(struct DL_LIST *)NULL)
{ /* update max. gain pointer */
if(o==DEC) {
HeapShiftUp(m);
if(k+MaxPin!=0) /* max. gain pointer not at bottom of bucket */
{
while(l<total_gain_no &&
bucket[i][j][l]->next==(struct DL_LIST *)NULL)
l++;
if(l<total_gain_no &&
bucket[i][j][l]->next!=(struct DL_LIST *)NULL)
{ gain_ptr[i][j].mag=bucket[i][j][l]->mag;
gain_ptr[i][j].next=bucket[i][j][l];
InsertHeap(i, j); /* update heap */
}
else if(l==total_gain_no)
















{ if(gain_ptr[i][j].next->next->mag==(unsigned)n)
heap[m].base_clb=(unsigned)(gain_ptr[i][j].next->next->next->mag);
}
else





/* delete heap[i] record & move all lower records up */
HeapShiftUp(i)
unsigned i;
{
unsigned l;




heap[l-1].move_gain=heap[l].move_gain; heap[l-1].best_move=heap[l].best_move;
heap[l-1].best_move=(struct DL_LIST *)NULL;
total_heap_no--;
}
RemoveGainNode(i, j, k) /* remove gain node of base CLB */






if(clb[i].gain_bukt[l]->next==(struct DL_LIST *)NULL)
{
clb[i].gain_bukt[l]->prevs->next=(struct DL_LIST *)NULL;
free((struct DL_LIST *)clb[i].gain_bukt[l]);





free((struct DL_LIST *)clb[i].gain_bukt[l]);
clb[i].gain_bukt[l]=(struct DL_LIST *)NULL;
}
} 
UpdateAll(i, j, k, l, m, x, o) /* update CLB distribution, gain, bucket & heap after a move */




if(gain[i][j][k].gain_value==gain_ptr[i][j].mag)
{ if(o==INC)
UpdateGainPtrBf(i, j, l, x, 1);
else
UpdateGainPtrBf(i, j, l, x, 0);
} 




gain[i][j][k].gain_value--;
gain_lst=(struct DL_LIST *)calloc(1, (unsigned)(sizeof(struct DL_LIST)));








Add1Gain(i, j, m-1);
else
Add1Gain(i, j, m+1);
UpdateGainPtrAt(i, j, k);
}
UpdateGainPtrAt(i, j, k) /* update gain pointer & heap after removing gain node */




while(l<total_heap_no && (heap[l].fm_fpga!=i || heap[l].to_fpga!=j))
l++;
if(l<total_heap_no && gain[i][j][k].gain_value>gain_ptr[i][j].mag)
{ gain_ptr[i][j].mag=gain[i][j][k].gain_value;
x=(int)MaxPin-gain[i][j][k].gain_value;
gain_ptr[i][j].next=bucket[i][j][x];
HeapShiftUp(l);
InsertHeap(i, j);
} else if(l<total_heap_no && gain[i][j][k].gain_value==gain_ptr[i][j].mag)
{ heap[l].base_clb=(unsigned)gain[i][j][k].clb_id;
} else if(l==total_heap_no && gain_ptr[i][j].mag==-1-(int)MaxPin) {







/* reverse all moves until min. cut size reached */
ReverseMove()
unsigned i, j, k, l, min_cut_index;
int x;
if(total_move_no!=0)
{ for(i=0; i<total_move_no; i++) /* final min. cut size within a pass */
if(move_rcd[i].move_gain<(int)min_cut_size)
{
min_cut_size=(unsigned)(move_rcd[i].move_gain);
min_cut_index=(int)i;
if(min_cut_size<final_cut) /* if < final cut size, reverse move */
{
for(i=total_move_no-1; i>min_cut_index; i--) {
j=move_rcd[i].base_clb;
k=move_rcd[i].fm_fpga;
l=move_rcd[i].to_fpga;




for(i=0; i<TtlFpgaNo; i++)
temp_clb_cnt[i]=clb_cnt[i];
for(i=0; i<total_clb_no; i++) {
clb[i].final_fpga_id=clb[i].fpga_id;







printf("Too many interconnections between FPGAs, can't be partitioned!");
exit(0);
} 






















for(i=0; i<TtlFpgaNo; i++) {
while(!clb_cnt[i])
for(k=0; k<total_clb_no; k++)
if(clb[k].final_fpga_id>(int)i)
clb[k].final_fpga_id--;









for(i=0； i<4； i++) 
clb_cnt[i]=0; 







{ unsigned sort[FPGA_LMT], total, i, j;
int x; struct IO_NET *temp;
if(TtlNoClb>0) {
total=0;
for(i=0; i<TtlFpgaNo; i++)
sort[i]=0;
for(i=0; i<TtlFpgaNo; i++) {
j=0;
while(j<total && clb_cnt[i]>clb_cnt[sort[j]]) j++;
if(j<total) {








i++; for(j=0； j<TtlNoClb； j++) { 












while(temp && temp->clb==-1) {
i=0;
while(strcmp(temp->no_neibor_name, no_neibor_clb[i].name))
i++;
temp->fpga_id=(unsigned)no_neibor_clb[i].fpga_id;






/* clb_type in con[] = 0 ： neither input nor output CLB found. */ 
/* 1 ： only input CLB found. */ 
/* 2 : only output CLB found. */ 





#define GBL_LMT 10
#define LOC_LMT 12
#define PROG_LMT 8
#define DATA_LMT 8
#define ADD_LMT 15
extern unsigned total_net_no, net_no, TtlNoClb, TtlFpgaNo;
extern char line[CHAR_NO], entry_nm[RCD_NM_LEN], sym_type[SYM_NM_LEN];
extern char rcd_nm[RCD_NM_LEN], sym_nm[SYM_NM_LEN], pin_nm[3], pin_type[2];
extern char sig_nm[SYM_NM_LEN];
extern FILE *f_ptr, *rf_ptr;
extern struct SIGNAL net[NET_LMT];
extern struct BLOCK clb[REAL_CLB_LMT_D];
extern struct IO_NET *head_io_net;
extern struct NO_NEIBOR no_neibor_clb[NO_NEIBOR_CLB];
struct GBL_BUS {
unsigned pin[GBL_LMT], index;
};
struct ADD_BUS {
unsigned pin[ADD_LMT], shared[ADD_LMT], index;
};
struct DATA_BUS {
unsigned pin[DATA_LMT], shared[DATA_LMT], index;
};
struct LOC_BUS
{ unsigned pinx[LOC_LMT], piny[LOC_LMT], index;
};
struct PROG_EVEN
{ unsigned pinx[PROG_LMT], piny[PROG_LMT], index;
};
struct PROG_ODD {
unsigned pin[PROG_LMT], index;
};
unsigned offset, fm_lca, to_lca, fm_pin, to_pin, adj_lca, iob_no, sw_no;
unsigned iob_cnt[FPGA_LMT], lca_io_no;
char pin[2];
struct SYS_INTF sys_intf[FPGA_LMT];
struct GBL_BUS gbl_bus;
struct PROG_EVEN prog_even[FPGA_LMT];
struct PROG_ODD prog_odd[FPGA_LMT];
struct LOC_BUS loc_bus[FPGA_LMT];
struct ADD_BUS add_bus;
struct DATA_BUS data_bus[3];
struct CON con;
struct IOB iob[NET_LMT];
struct SW sw[PROG_PIN_LMT];
struct LCA_IO lca_io[LCA_IO_LMT];
AssignIO() {
unsigned i;
for(i=0; i<GBL_LMT; i++)
gbl_bus.pin[i]=0; /* initialize structure for assigning IO */
AssignInit(); /* initialization */
/* initialization */
sw_no=0; /* initialization */
for(i=0; i<4; i++)
for(i=0; i<LCA_IO_LMT; i++) /* for structure analysis only */
lca_io[i].adj_lca=2;




AssignBiNet();
printf("\nAssigning IOs for interconnection between FPGAs...\n");
AssignInter24(); /* assign interconnection IO between 2 or 4 LCAs */
AssignInter3();











unsigned i, j;
rf_ptr=fopen("iosource.dat", "rt");
if(!rf_ptr) {
printf("\nFile iosource.dat not found!");
printf("\n");
exit(0);
}
SysIntfInit(); /* initialize structure for system interface */
GblBusInit(); /* initialize structure for global bus */
ProgIntcInit(); /* initialize structure for programmable interconnect */
LocBusInit(); /* initialize structure for local bus */
AddBusInit(); /* initialize structure for address bus */
DataBusInit(); /* initialize structure for data bus */
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "EOF", 3)!=0)
NotDatFile();
}
SysIntfInit() /* initialize structure for system interface */
{
unsigned i, j, k;
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "BOARD IO", 8))
NotDatFile();
for(i=0; i<4; i++)
for(j=0; j<SYS_LMT; j++)
sys_intf[i].pin[j]=0; /* initialization */
fgets(line, CHAR_NO, rf_ptr);
while(strncmp(line, "END", 3)) {
FindLCA();


































FindPin() /* find pin no. */
{
offset=offset+3;
if(line[offset]==',' || line[offset]==';') /* i, */
fm_pin=to_pin=(unsigned)atoi(&(line[offset-1]));
else if(line[offset]=='.') {
fm_pin=(unsigned)atoi(&(line[offset-1]));
offset=offset+3;
if(line[offset]==',' || line[offset]==';') /* i..i, */
to_pin=(unsigned)atoi(&(line[offset-1]));
else /* i..ii, */
{
offset++;








if(line[offset]==',' || line[offset]==';') /* i:i, */
to_lca=(unsigned)atoi(&(line[offset-1]));
else /* i:ii, */
{
offset++;






if(line[offset]==',' || line[offset]==';')
{ strncpy(pin, &(line[offset-2]), 2); /* ii, */
fm_pin=to_pin=(unsigned)atoi(pin);
}
else if(line[offset]=='.')
strncpy(pin, &(line[offset-2]), 2);
fm_pin=(unsigned)atoi(pin);
offset=offset+3;
if(line[offset]==',' || line[offset]==';') /* ii..i, */
to_pin=(unsigned)atoi(&(line[offset-1]));
else /* ii..ii, */
{
offset++;





strncpy(pin, &(line[offset-2]), 2);
fm_pin=to_pin=(unsigned)atoi(pin);
offset=offset+2;
if(line[offset]==',' || line[offset]==';')
to_lca=(unsigned)atoi(&(line[offset-1]));
else
{
offset++;






/* initialize structure for global bus */
GblBusInit()
{
unsigned j, k;
k=0;
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "GBL BUS", 7))
NotDatFile();
fgets(line, CHAR_NO, rf_ptr);
while(strncmp(line, "END", 3)) {
offset=7;
while(line[offset]!=';') {
FindPin();
for(j=fm_pin; j<=to_pin; j++)
gbl_bus.pin[k]=j;
k++;
}
fgets(line, CHAR_NO, rf_ptr);
} 
} 
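
FindPin decodes pin fields by fixed character offsets, handling the one- and two-digit forms noted in its comments (i, i..i, i..ii, ii, ii..ii). The sketch below shows an alternative, offset-free way to read one such field with strtol. It is not the thesis method: the function name and the exact field grammar ("N" or "N..M" terminated by ',' or ';') are assumptions inferred from those comments, since the iosource.dat format is not reproduced in the listing.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "N" or "N..M" starting at *p. On success store the bounds in
   *from and *to, leave *p on the ',' or ';' terminator, and return 1.
   Return 0 on malformed input. */
static int parse_pin_range(const char **p, unsigned *from, unsigned *to)
{
    char *end;
    long v;

    assert(p != NULL && *p != NULL);
    v = strtol(*p, &end, 10);
    if (end == *p || v < 0)
        return 0;                      /* no digits, or negative pin */
    *from = *to = (unsigned)v;
    if (end[0] == '.' && end[1] == '.') {
        const char *q = end + 2;       /* first character after ".." */
        v = strtol(q, &end, 10);
        if (end == q || v < 0)
            return 0;
        *to = (unsigned)v;
    }
    if (*end != ',' && *end != ';')
        return 0;                      /* field must be terminated */
    *p = end;
    return 1;
}
```

A caller would loop until the terminator is ';', stepping past each ',' before parsing the next field, which mirrors the while(line[offset]!=';') loops in the listing.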




unsigned i, j, k;
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "PROG INTC", 9))
NotDatFile();
fgets(line, CHAR_NO, rf_ptr);
while(strncmp(line, "END", 3)) {
FindLcaProg();






























if(line[0]=='P' && line[2]==':')
fm_lca=(unsigned)atoi(&(line[1]));
else if(line[0]=='P' && line[2]==',')
fm_lca=(unsigned)atoi(&(line[1]));
to_lca=0;





LocBusInit() /* initialize structure for local bus */
{
unsigned j, k;
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "LOC BUS", 7))
NotDatFile();
fgets(line, CHAR_NO, rf_ptr);







for(j=fm_pin； j<=to_pin； j++) { 












{ if(strncmp(line, "LCA", 3)) 
NotDatFile(); 





AddBusInit() /* initialize structure for address bus */
{
unsigned j, k;
k=0;
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "ADD BUS", 7))
NotDatFile();
fgets(line, CHAR_NO, rf_ptr);
while(strncmp(line, "END", 3)) {
FindLcaOther();
offset=7;
while(line[offset]!=';') {
FindPin();
for(j=fm_pin; j<=to_pin; j++) {





fgets(line, CHAR_NO, rf_ptr);
} 
} 
DataBusInit() {
unsigned j, k;
k=0;
fgets(line, CHAR_NO, rf_ptr);
if(strncmp(line, "DATA BUS", 8))
NotDatFile();





for(j=fm_pin; j<=to_pin; j++) {
k=k%8;
data_bus[fm_lca].pin[k]=j;
data_bus[fm_lca].shared[k]=to_lca;
k++;
} 




{ if(!strncmp(line, "LCA", 3) && line[4]==',')
{ fm_lca=(unsigned)atoi(&(line[3])); to_lca=(unsigned)atoi(&(line[6]));






unsigned i, j, k； 










j=sys_intf[k].index;
if(j==SYS_LMT)







temp->pin_id=sys_intf [k] .pin[j]； 












temp->lca_con_no=con.lca_con_no;
lca_io[lca_io_no].lca_con_no=con.lca_con_no;
if(con.lca_con_no==4 || con.lca_con_no==3)
lca_io[lca_io_no].bd_intf=1;
lca_io_no++;
ChkLcaIOLmt();





printf("\nNo vacancy in Global Bus. Assigning Board IO fails...\n");
exit(0);
}
if(temp->i && !temp->o) {
temp->pin_id=gbl_bus.pin[i];


















if((i==0 && con.lca_id[3]) || con.lca_id[(i+1)%4])
{ /* check adjacent FPGAs */
adj_lca=1;
lca_io[lca_io_no-1].adj_lca=1;


































temp->pin_id=sys_intf [i] .pin[j]； 
temp->clb=(int)i； 









ChkLcaIOLmt() /* for structure analysis only */
{ 
if(lca_io_no==LCA_IO_LMT) 





unsigned i, o； 
struct LIST *temp； 























unsigned i; { 
unsigned j; 
struct LIST *temp； 
jj'Hr二 /* initialization */ 
con.lca_id[j]=0; /* initialization */





if(!con.lca_id[j] || con.op_clb==4) {
if(con.op_clb==4)
{ if(!i && temp->lst_type==0) /* output net */
con.op_clb=j;
if(i && temp->lst_type==2)
con.op_clb=j;
if(!con.lca_id[j])
{ if(!i && temp->lst_type!=1) {
con.lca_con_no++;
con.lca_id[j]=1;










lca_io[lca_io_no].lca_con_no=con.lca_con_no;
lca_io_no++;
ChkLcaIOLmt();














AssignInter24() /* assign interconnection IO between 2 or 4 LCAs */
{
for(net_no=0; net_no<total_net_no; net_no++)
{ if(!(net[net_no].ip_net^net[net_no].op_net))
{
FindLcaNo();
if(con.lca_con_no==2 || con.lca_con_no==4) /* for structure analysis */
{




switch(con.lca_con_no) {
Con4Lca(0); /* connect 4 LCAs together */
break;








/* find no. of LCAs required to connect net */
FindLcaNo()
{ 
unsigned j; 
struct LIST *temp； 
for(j=?'•二'H二： /* initialization */ 
no 0 卜 initialization */ 





if(!con.lca_id[j] || con.op_clb==4) {
if(con.op_clb==4)
{ if(temp->lst_type==0) /* output net */
con.op_clb=j;
if(!con.lca_id[j]) {
con.lca_con_no++;
con.lca_id[j]=1;
}

























if((j==0 && con.lca_id[3]) || con.lca_id[(j+1)%4]) {
adj_lca=1; /* adjacent LCAs */
lca_io[lca_io_no-1].adj_lca=1;
if(j==0 && con.lca_id[3])
LocCon(i, 3);
else
LocCon(i, j); /* use local bus */
}
else
{ adj_lca=0; /* not adjacent LCAs */
lca_io[lca_io_no-1].adj_lca=0;
ProgCon(i, j); /* use programmable bus */
}
if(!net[net_no].connected) {
if(j==0 && con.lca_id[3])
ProgTCon(i, 3);
else
ProgTCon(i, j); /* use programmable bus */
}
if(!net[net_no].connected)




LocCon(i, k) /* local bus connection */ 
unsigned i, k； 
{ 














{ IORcd(k, 0, loc_bus[k].pinx[l]);
if(k==3)
IORcd(0, 1, loc_bus[k].piny[l]);
else IORcd(k+1, 1, loc_bus[k].piny[l]);
}
else
{ IORcd(k, 1, loc_bus[k].pinx[l]);
if(k==3)
IORcd(0, 0, loc_bus[k].piny[l]);
else









} /* middle CLB is output CLB */
if((k+2)%4==con.op_clb)
{
SingleLoc((k+1), 1); SingleLoc((k+2), 0);
}
else
{ if((k+1)%4==con.op_clb) /* starting CLB is output CLB */
{ SingleLoc((k+1), 0); SingleLoc((k+2), 0); }
else /* last CLB is output CLB */ 
{ 
SingleLoc((k+1), 1)； 
SingleLoc((k+2), 1)； } 
} 
loc_bus[(k+1)%4].index++； 




if(loc_bus[k].index==LOC_LMT || loc_bus[(k+3)%4].index==LOC_LMT)
return(0);
m=loc_bus[(k+1)%4].index;
n=loc_bus[(k+2)%4].index;
if(m==LOC_LMT && n==LOC_LMT)












IORcd(((k+l)%4), 0, loc_bus[(k+l)%4].pinx[m])； 




IORcd(((k+2)%4), 1, loc_bus[(k+2)%4] .pinx[n])； 







l^f � /* system interface required */ { 1 } 
} 
IORcd(i, j, k)
unsigned i, j, k;
{
iob[iob_no].fpga_id=i;
iob[iob_no].iob_type=j;
iob[iob_no].pin_id=k;











{ IORcd((i%4), 1, loc_bus[i%4].pinx[k]);
IORcd(((i+1)%4), 0, loc_bus[i%4].piny[k]);
}
else
{ IORcd((i%4), 0, loc_bus[i%4].pinx[k]);
IORcd(((i+1)%4), 1, loc_bus[i%4].piny[k]);
}
} 
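
The local-bus routines rely on the four LCAs forming a ring: local bus k joins LCA k (pinx side) to LCA (k+1)%4 (piny side), and two LCAs are treated as adjacent when they are ring neighbours. The helpers below sketch that arithmetic in isolation. Their names and the RING_SIZE constant are hypothetical; the listing inlines the equivalent tests, for example `(j==0 && con.lca_id[3]) || con.lca_id[(j+1)%4]`.

```c
#include <assert.h>

#define RING_SIZE 4 /* four LCAs, matching the %4 arithmetic in the listing */

/* Two LCAs are adjacent when they are neighbours on the ring 0-1-2-3-0. */
static int lca_adjacent(unsigned a, unsigned b)
{
    assert(a < RING_SIZE && b < RING_SIZE);
    return (a + 1) % RING_SIZE == b || (b + 1) % RING_SIZE == a;
}

/* Index of the local bus joining two adjacent LCAs: bus k runs from
   LCA k to LCA (k+1)%4. Returns -1 when the pair is not adjacent. */
static int local_bus_index(unsigned a, unsigned b)
{
    assert(a < RING_SIZE && b < RING_SIZE);
    if ((a + 1) % RING_SIZE == b)
        return (int)a;
    if ((b + 1) % RING_SIZE == a)
        return (int)b;
    return -1;
}
```

Pairs for which local_bus_index returns -1 (LCA 0 with 2, LCA 1 with 3) are the non-adjacent cases that the listing routes through the programmable bus instead.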
/* programmable bus */
ProgCon(i, k)
unsigned i, k;
{
unsigned l, m, n, o, p, q, r, s, t, u;
/* connect 3 LCAs */
if(con.lca_con_no==3)
{















if(s==LOC_LMT && t==LOC_LMT)
return(0)； 



























{ /* connect 2 LCAs via programmable bus */
case 2:




{ IORcd(k, 0, p);
IORcd(k+2, 1, q); }
else
{ IORcd(k, 1, p); IORcd(k+2, 0, q); }
break; /* connect 3 LCAs via programmable bus */
default:
if(k==con.op_clb)
{ IORcd(k, 0, p);
IORcd(k+2, 1, q); }
else if(k+2==con.op_clb)
{ IORcd(k, 1, p);
IORcd(k+2, 0, q); }
else
if(!u) {
IORcd(k, 1, p);
IORcd(k+2, 0, q);
}
else {
IORcd(k, 0, p);




{ IORcd(r, 0, loc_bus[r].pinx[s]);
IORcd(((r+1)%4), 1, loc_bus[r].piny[s]);
loc_bus[r].index++; }
else
{ IORcd(((r+3)%4), 1, loc_bus[(r+3)%4].pinx[t]);
IORcd(r, 0, loc_bus[(r+3)%4].piny[t]);





IORcd(r, 1, loc_bus[r].pinx[s]);
IORcd(((r+1)%4), 0, loc_bus[r].piny[s]);
loc_bus[r].index++; }
else {
IORcd(((r+3)%4), 0, loc_bus[(r+3)%4].pinx[t]);
IORcd(r, 1, loc_bus[(r+3)%4].piny[t]);







i f ( i ) -{ 
} 
} 
FindPathId(i, j) /* find path id. for programmable bus */
unsigned i, j;
{
unsigned k;
sw[sw_no].lca_con_no=i;
/* connect 2 non-adjacent LCAs via programmable bus */
if(j==0) /* LCA[0] & LCA[2] */
{ if(prog_odd[0].index==PROG_LMT || prog_odd[2].index==PROG_LMT)
sw[sw_no].lca_con_no=0;
else
{ sw[sw_no].path_id[0]=1; /* use path[1] & path[5] */
sw[sw_no].path_id[1]=5;
sw[sw_no].sw_id=1; /* switch 1 */
}
} /* LCA[1] & LCA[3] */
else
{ if(prog_odd[1].index==PROG_LMT || prog_odd[3].index==PROG_LMT)
sw[sw_no].lca_con_no=0;
else
{ sw[sw_no].path_id[0]=3; /* use path[3] & path[7] */
sw[sw_no].path_id[1]=7;
sw[sw_no].sw_id=2; /* switch 2 */
}
}
} /* connect 3 consecutive LCAs via programmable bus */
else {
switch(j)
{ /* LCA[1], LCA[2] & LCA[3] */
k=(unsigned)FindOptPath(3, 6, 7, 4); /* either path[3] & [6] or path[7] & [4] */
if(k==0)
{ sw[sw_no].path_id[0]=3; /* use path[3] & path[6] */
sw[sw_no].path_id[1]=6; }
else if(k==1)
{ sw[sw_no].path_id[0]=7; /* use path[7] & path[4] */
sw[sw_no].path_id[1]=4; }
else sw[sw_no].lca_con_no=0; /* no path available */
break; /* LCA[0], LCA[2] & LCA[3] */
k=(unsigned)FindOptPath(1, 6, 5, 8); /* either path[1] & [6] or path[5] & [8] */
if(k==0)
{ sw[sw_no].path_id[0]=1; /* use path[1] & path[6] */
sw[sw_no].path_id[1]=6;
}
else if(k==1)
{ sw[sw_no].path_id[0]=5; /* use path[5] & path[8] */
sw[sw_no].path_id[1]=8;
}
else sw[sw_no].lca_con_no=0; /* no path available */
break; /* LCA[0], LCA[1] & LCA[3] */
k=(unsigned)FindOptPath(7, 2, 3, 8); /* either path[7] & [2] or path[3] & [8] */
if(k==0)
{ sw[sw_no].path_id[0]=7; /* use path[7] & path[2] */
sw[sw_no].path_id[1]=2;
}
else if(k==1)
{ sw[sw_no].path_id[0]=3; /* use path[3] & path[8] */
sw[sw_no].path_id[1]=8; }
else
sw[sw_no].lca_con_no=0; /* no path available */
break;
default: /* LCA[0], LCA[1] & LCA[2] */
k=(unsigned)FindOptPath(1, 4, 5, 2); /* either path[1] & [4] or path[5] & [2] */
if(k==0)




sw[sw_no].path_id[0]=5; /* use path[5] & [2] */
sw[sw_no].path_id[1]=2; }
else 






FindOptPath(i, j, k, l) /* find optimal path: path[i] & [j] or path[k] & [l]? */
unsigned i, j, k, l; {
unsigned m, n;
if((prog_odd[(i-1)/2].index==PROG_LMT || prog_even[(j/2)-1].index==PROG_LMT) &&
(prog_odd[(k-1)/2].index==PROG_LMT || prog_even[(l/2)-1].index==PROG_LMT))
else if(prog_odd[(i-1)/2].index<PROG_LMT && prog_even[(j/2)-1].index<PROG_LMT &&
prog_odd[(k-1)/2].index<PROG_LMT && prog_even[(l/2)-1].index<PROG_LMT)











/* use global bus to make connection */
GblCon()
{ 





for(j=0; j<TtlFpgaNo; j++) {
if(con.lca_id[j]) {
if(j==con.op_clb)
IORcd(j, 0, k);
else 






unsigned i, j;
unsigned k, l, m, n, o, p, q;
switch(con.lca_con_no) {
case 4:
二 二 讓 一 L M T II ((pro” 二 广 ] 二 二 應 _ L M T 
&& loc_bus[k].index==PROG_LMT) && (prog_even[k].index==PROG_LMT
|| loc_bus[(k+3)%4].index==PROG_LMT)))









else if(prog_odd[(k+2)%4].index<PROG_LMT && prog_even[(k+3)%4].index<PROG_LMT
&& loc_bus[k].index<PROG_LMT) n=0;
else
n=1;
l=prog_odd[(k+2)%4].index;
IORcd(((k+2)%4), 1, prog_odd[(k+2)%4].pin[l]);
sw[sw_no].lca_con_no=3;
sw[sw_no].path_id[0]=((k+2)%4)*2+1; sw[sw_no].index[0]=l;
if(!n)
{
m=prog_even[(k+3)%4].index;
o=loc_bus[k].index;
IORcd(k, 0, loc_bus[k].pinx[o]);
IORcd(((k+1)%4), 1, loc_bus[k].piny[o]);
IORcd(((k+3)%4), 1, prog_even[(k+3)%4].pinx[m]); IORcd(k, 0, prog_even[(k+3)%4].piny[m]);
prog_even[(k+3)%4].index++; loc_bus[k].index++;





m=prog_even[k].index; o=loc_bus[(k+3)%4].index;
IORcd(((k+3)%4), 1, loc_bus[(k+3)%4].pinx[o]);
IORcd(k, 0, loc_bus[(k+3)%4].piny[o]);
IORcd(k, 0, prog_even[k].pinx[m]);
IORcd(((k+1)%4), 1, prog_even[k].piny[m]);
prog_even[k].index++; loc_bus[(k+3)%4].index++;
sw[sw_no].path_id[1]=(k+1)*2;


















if((k-1)/2==con.op_clb) IORcd(((k-1)/2), 0, o);
else
IORcd(((k-1)/2), 1, o);
if((l/2)-1==con.op_clb) IORcd(((l/2)-1), 0, p);
else
IORcd(((l/2)-1), 1, p);
if((l/2)%4==con.op_clb) IORcd(((l/2)%4), 0, q);











{ IORcd(j, 0, prog_even[j].pinx[k]);
IORcd(((j+1)%4), 1, prog_even[j].piny[k]);
}
else
{ IORcd(j, 1, prog_even[j].pinx[k]);










if(k==PROG_LMT || (l==PROG_LMT && m==PROG_LMT))
return(0)； 











IORcd(j+2, 0, prog_odd[j+2].pin[k]);
else 







IORcd(j, 0, prog_even[j].pinx[l]);
else
IORcd(j, 1, prog_even[j].pinx[l]);







IORcd(j, 0, prog_even[(j+3)%4].piny[m]);
else
IORcd(j, 1, prog_even[(j+3)%4].piny[m]);
sw[sw_no].path_id[1]=(((j+3)%4)+1)*2;


















































if(sw[i].path_id[1]==2)
sw[i].sw_id=6;












for(net_no=0; net_no<total_net_no; net_no++)
{ if(!(net[net_no].ip_net^net[net_no].op_net))
{
FindLcaNo();






























{ unsigned i, j[FPGA_LMT], m, k, l, n, o;
for(i=0; i<FPGA_LMT; i++) /* initialization */
/* initialization */
/* initialization */
l=0; n=0;
o=0;
/* printf("\nLCA IO RECORD:");*/
for(i=0; i<lca_io_no; i++) {
m=lca_io[i].lca_con_no;
/* printf("\n%d\tconnect to %d LCAs.", i, lca_io[i].lca_con_no);*/
if(lca_io[i].lca_con_no==2)
{ if(lca_io[i].adj_lca==1)
/* { printf("(adj_lca)");*/
n++;
else if(!lca_io[i].adj_lca)









if(lca_io[i].lca_con_no==4 && lca_io[i].bd_intf)
{
/* printf("(board IO)");*/
k++;
}
if(lca_io[i].lca_con_no==3 && lca_io[i].bd_intf) {




printf("\nBus ratio --");
printf("\nBoard IO : Global bus : Local bus : Programmable bus (2) : ");
printf("Programmable bus (3)");
printf("\n(%d : %d : %d : %d : %d)\n", j[0], j[3]+l, n, o, j[2]-1);










extern unsigned TtlFpgaNo, iob_no, bal_parti, net_no, total_net_no, lca_io_no;
extern unsigned clb_no, clb_cnt[FPGA_LMT], blk_nm, sw_no;
extern long cur_f_ptr;
extern char infile[FILENM_LEN], line[CHAR_NO], sym_type[SYM_NM_LEN];
extern char entry_nm[RCD_NM_LEN], sym_nm[SYM_NM_LEN], rcd_nm[RCD_NM_LEN];
extern char pin_nm[3], pin_type[2], sig_nm[SYM_NM_LEN];
extern FILE *f_ptr;
extern struct IOB iob[NET_LMT];
extern struct BLOCK clb[REAL_CLB_LMT_D];
extern struct SIGNAL net[NET_LMT];
extern struct NO_NEIBOR no_neibor_clb[NO_NEIBOR_CLB];
extern struct IO_NET *head_io_net;
extern struct SYS_INTF sys_intf[FPGA_LMT];
extern struct CON con;
extern struct LCA_IO lca_io[LCA_IO_LMT];
extern struct SW sw[PROG_PIN_LMT];
unsigned no_neibor_no, buf_in_lca;
char outfile[FPGA_LMT][FILENM_LEN], temp_line[CHAR_NO], confile[FILENM_LEN];
FILE *ptr[FPGA_LMT];
struct LIST *clb_ptr;
WriteMisc() /* write netlist & executive programs version no. & power record */
{ unsigned i, j, k;
buf_in_lca=4; /* initialization */
j=(unsigned)(strlen(infile));
if(j==FILENM_LEN-1) j=j-6; else if(j==FILENM_LEN-2) j=j-5; else




{ printf("\nCannot open destination file after partition!\n");
exit(0);
} }
fseek(f_ptr, 0, 0);
while(strncmp(line, "SYM,", 4)) /* copy first several lines up to SYM records */
{ for(i=0; i<TtlFpgaNo; i++)
if(outfile[i][0])
fputs(line, ptr[i]);
cur_f_ptr=ftell(f_ptr);
fgets(line, CHAR_NO, f_ptr);
}
sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);
clb_no=0; /* initialization */
no_neibor_no=0;
while(strncmp(sym_type, "CLB", 3) && strncmp(sym_type, "IOB", 3))








} else














strncpy(outfile[i], infile, j);
switch(i) {
case 0:
strcat(outfile[0], "00.map");
break;
case 1:
strcat(outfile[1], "01.map");
break;
















fgets(line, CHAR_NO, f_ptr);
} 





WriteBoardIO() /* write emulation board IO */
{
unsigned i, j;
struct IO_NET *temp;
printf("\nWrite IOB records of board IOs...\n");
sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);
while(!strncmp(sym_type, "IOB", 3)) /* IOB encountered */
{
strcpy(entry_nm, "PIN,");
SkipLines(entry_nm, 4, f_ptr);
sscanf(line, "%s %s %s %s", rcd_nm, pin_nm, pin_type, sig_nm);
temp=head_io_net;

























for(i=0; i<TtlFpgaNo; i++)









else if(!temp) {
net_no=0;
while(net_no<total_net_no && strcmp(net[net_no].sig_name, sig_nm))
net_no++;
if(net_no<total_net_no) {
FindLcaNo();
lca_io[lca_io_no].lca_con_no=1;
lca_io_no++; ChkLcaioLmt();
i=sys_intf[con.op_clb].index;
if(i==SYS_LMT)
























strncpy(entry_nm, "EXT,", 4); 







unsigned i, j;
{ char block_name[SYM_NM_LEN], temp_blk_nm[SYM_NM_LEN];
strncpy(temp_blk_nm, "", SYM_NM_LEN);
strncpy(block_name, "", SYM_NM_LEN);
fseek(f_ptr, cur_f_ptr, 0);
fgets(line, CHAR_NO, f_ptr);
sscanf(line, "%s %s %s", rcd_nm, block_name, sym_type);
strncpy(temp_blk_nm, block_name, strlen(block_name)-1);
if(blk_nm)
fprintf(ptr[i], "%s %s %s LOC=P%d, BLKNM=%s\n", rcd_nm, block_name,
sym_type, j, temp_blk_nm);
else
fprintf(ptr[i], "%s %s %s, LOC=P%d\n", rcd_nm, block_name, sym_type, j);
fgets(line, CHAR_NO, f_ptr);
while(strncmp(line, "SYM,", 4)) {
fputs(line, ptr[i]);
fgets(line, CHAR_NO, f_ptr);
}
sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);
if(blk_nm)
fprintf(ptr[i], "%s %s %s LOC=P%d, BLKNM=%s\n", rcd_nm, sym_nm, sym_type,
j, temp_blk_nm);
else
fprintf(ptr[i], "%s %s %s, LOC=P%d\n", rcd_nm, sym_nm, sym_type, j);
fgets(line, CHAR_NO, f_ptr);
while(strncmp(line, "EXT,", 4)) {
fputs(line, ptr[i]);
fgets(line, CHAR_NO, f_ptr);
}
sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);
if(blk_nm)
fprintf(ptr[i], "%s %s %s , LOC=P%d, BLKNM=%s\n", rcd_nm, sym_nm, sym_type,
j, temp_blk_nm);
else
fprintf(ptr[i], "%s %s %s , LOC=P%d\n", rcd_nm, sym_nm, sym_type, j);
strncpy(temp_blk_nm, "", SYM_NM_LEN);
}
TestBufInLca() {
unsigned j, k;
if(buf_in_lca==4) {





if(k<SYS_LMT && k<sys_intf[buf_in_lca].index)
buf_in_lca=j;
j++;





unsigned i, j;
printf("\nWrite IOB records of interconnection IOs...\n");
for(i=0; i<iob_no; i++) {
j=iob[i].fpga_id;
if(iob[i].iob_type) /* input IO pad */
{
fprintf(ptr[j], "SYM, %s_CUT, IOB, LOC=P%d", iob[i].net_name,
iob[i].pin_id);
if(blk_nm)
fprintf(ptr[j], ", BLKNM=%s_CUT\n", iob[i].net_name);
else
fprintf(ptr[j], "\n");
fprintf(ptr[j], "CFG, Base IO\n");
fprintf(ptr[j], "CFG, Config IN:I OUT TRI\n");
fprintf(ptr[j], "PIN, I, O, %s,\n", iob[i].net_name);
fprintf(ptr[j], "MODEL\n");
fprintf(ptr[j], "SYM, %s_IB, IBUF, LOC=P%d", iob[i].net_name,
iob[i].pin_id);
if(blk_nm)
fprintf(ptr[j], ", BLKNM=%s_CUT\n", iob[i].net_name);
else
fprintf(ptr[j], "\n");
fprintf(ptr[j], "PIN, O, O, %s,\n", iob[i].net_name);
fprintf(ptr[j], "PIN, I, I, %s_CUT,\n", iob[i].net_name);
fprintf(ptr[j], "END\nENDMOD\nEND\n");
fprintf(ptr[j], "EXT, %s_CUT, I, , LOC=P%d", iob[i].net_name,
iob[i].pin_id);
if(blk_nm)
fprintf(ptr[j], ", BLKNM=%s_CUT\n", iob[i].net_name);
else
fprintf(ptr[j], "\n");
} else /* output IO pad */
{ fprintf(ptr[j], "SYM, %s_CUT, IOB, LOC=P%d", iob[i].net_name,
iob[i].pin_id);
if(blk_nm)
fprintf(ptr[j], ", BLKNM=%s_CUT\n", iob[i].net_name);
else
fprintf(ptr[j], "\n");
fprintf(ptr[j], "CFG, Base IO\n");
fprintf(ptr[j], "CFG, Config IN OUT:O TRI\n");
fprintf(ptr[j], "PIN, O, I, %s,\n", iob[i].net_name);
fprintf(ptr[j], "MODEL\n");
fprintf(ptr[j], "SYM, %s_OB, OBUF, LOC=P%d", iob[i].net_name,
iob[i].pin_id);
if(blk_nm)
fprintf(ptr[j], ", BLKNM=%s_CUT\n", iob[i].net_name);
else
fprintf(ptr[j], "\n");
fprintf(ptr[j], "PIN, O, O, %s_CUT,\n", iob[i].net_name);
fprintf(ptr[j], "PIN, I, I, %s,\n", iob[i].net_name);
fprintf(ptr[j], "END\nENDMOD\nEND\n");
fprintf(ptr[j], "EXT, %s_CUT, O, , LOC=P%d", iob[i].net_name,
iob[i].pin_id);






/* write output .map files with only CLB records */ 
WriteCLBRcd() {
unsigned i, k;
struct IO_NET *temp;
printf("\nWrite CLB records...\n");
while(!strncmp(line, "EXT,", 4))
{ sscanf(line, "%s %s %s %s", rcd_nm, pin_nm, pin_type, sig_nm);
temp=head_io_net;
while(temp && strcmp(temp->net_name, pin_nm))
temp=temp->next_io_net;





sscanf(line, "%s %s %s", rcd_nm, sym_nm, sym_type);
do
{
if(!strcmp(sym_nm, clb[clb_no].blk_name))
{
k=(unsigned)(clb[clb_no].final_fpga_id);
WriteCLB(k);
clb_no++; }
else if(!strcmp(sym_nm, no_neibor_clb[no_neibor_no].name)) {









while(strncmp(line, "EOF", 3)!=(int)NULL);







WriteCLB(i) /* write whole CLB records in the file corresponding to each FPGA */
unsigned i; {
do {
fputs(line, ptr[i]);
fgets(line, CHAR_NO, f_ptr);
} while(strncmp(line, "ENDMOD", 6)!=(int)NULL);
fputs(line, ptr[i]);
fgets(line, CHAR_NO, f_ptr);
fputs(line, ptr[i]); ReadSYM();
}
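The record-copy loop in WriteCLB can be illustrated in isolation. The following is only a simplified sketch, not the thesis code: the name copy_through and the 256-byte line buffer are assumptions, and unlike the listing it also stops cleanly at end of file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy lines from in to out up to and including the
   first line that starts with prefix (e.g. "ENDMOD").
   Returns the number of lines copied, or -1 if end of file came first. */
int copy_through(FILE *in, FILE *out, const char *prefix)
{
    char line[256];               /* assumed maximum record length */
    int copied = 0;

    while (fgets(line, sizeof line, in)) {
        fputs(line, out);
        copied++;
        if (strncmp(line, prefix, strlen(prefix)) == 0)
            return copied;        /* terminating record reached */
    }
    return -1;                    /* prefix never seen */
}
```

Checking the fgets result avoids looping forever on a truncated .map file, which the original loop does not guard against.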
WriteSwCon() {
unsigned i, j, k, l;
short int con_data, shift;
printf("\nWrite switch connection in file.\n");
i=(unsigned)(strlen(infile));
i=i-4;




{ printf("\nCannot open output file %s!\n", confile);
exit(0);
} /* initialization */
shift=0x0001;
for(i=0; i<8; i++) {
for(j=0; j<8; j++)
{ /* initialization */
con_data=0x0000;
for(k=0; k<sw_no; k++)















fprintf(f_ptr, "%0x", con_data); 
} 
fprintf(f_ptr, "\n");
fclose(f_ptr);
printf("\nDone.\n\n");






#define CR 0x0d /* carriage return */
#define ESC 0x1b /* escape */
#define menu_top_x 17 /* menu x co-ordinate */
#define menu_top_y 7 /* menu y co-ordinate */
/* COM 1 used */
#define RBR 0x03f8 /* Receiver Buffer Register */
#define THR 0x03f8 /* Transmitter Hold Register */
#define DLL 0x03f8 /* Divisor Latch (low byte) */
#define IER 0x03f9 /* Interrupt Enable Register */
#define DLM 0x03f9 /* Divisor Latch (high byte) */
#define LCR 0x03fb /* Line Control Register */
#define MCR 0x03fc /* Modem Control Register */
#define LSR 0x03fd /* Line Status Register */
int function_key;
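The DLL and DLM registers defined above hold the 16-bit baud-rate divisor of the COM1 UART. As a hedged illustration only (the function names are hypothetical, and the standard PC UART clock of 1.8432 MHz is assumed, giving divisor = 115200 / baud), the two bytes could be derived as follows:

```c
#include <stdint.h>

/* Assumption: standard PC UART clock (1.8432 MHz / 16 = 115200). */
#define UART_BASE_RATE 115200UL

/* Return the 16-bit divisor for a baud rate, or 0 if the rate cannot be
   produced exactly or does not fit in 16 bits. */
uint16_t uart_divisor(unsigned long baud)
{
    if (baud == 0 || baud > UART_BASE_RATE || UART_BASE_RATE % baud != 0)
        return 0;
    unsigned long d = UART_BASE_RATE / baud;
    return d > 0xFFFFUL ? 0 : (uint16_t)d;
}

/* Byte to write to DLL (low half of the divisor). */
uint8_t divisor_low(uint16_t d)  { return (uint8_t)(d & 0xFF); }

/* Byte to write to DLM (high half of the divisor). */
uint8_t divisor_high(uint16_t d) { return (uint8_t)(d >> 8); }
```

Under this assumption, 38400 baud corresponds to a divisor of 3, i.e. DLL = 0x03 and DLM = 0x00.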





unsigned char key;
printf("\n");
do {
printf("\nEnter the filename of FPGA configuration data:");
scanf("%s", cfilename);
fcptr = fopen(cfilename, "r");
} while(!fcptr); /* file not found */
clrscr();







printf("\nDownloading FPGA configuration data completed.\n");
clrscr();
do {
printf("\nDownload switch connection data (Y/N)?");
key=getch();
}
while(key!='Y' && key!='y' && key!='N' && key!='n');
clrscr();










{ printf("\nEnter the filename of switch connection: ");
scanf("%s", cfilename);
fcptr = fopen(cfilename, "rt");
}











int keypress, command[20], counter;
keypress = 0x00; /* initialization */
counter = 0;
do {
if(kbhit()) {
keypress = getch();
command[counter] = keypress;




if(keypress != 0x1b) /* if not ESC */
send(keypress);
if(keypress == 0x0d) /* if carriage return, determine command */
{
counter = 0;
do
if(command[counter] == 0x20) /* if space, skip it */
counter++;
else if((command[counter] == 0x44) || (command[counter] == 0x64))
{ /* if D/d, check F/f or S/s */
counter++;
if((command[counter] == 0x46) || (command[counter] == 0x66))
{ /* if F/f, check any spaces */
counter++;
while(command[counter] == 0x20) /* skip any spaces */
counter++;
if(command[counter] == 0x0d) /* until carriage return is detected */
config_data(); /* download FPGA configuration data */
if((command[counter] == 0x53) || (command[counter] == 0x73))
{ /* if S/s, check any spaces */
counter++;
while(command[counter] == 0x20) /* skip any spaces */
counter++;
if(command[counter] == 0x0d) /* until carriage return is detected */




command[counter] = 0x0d;
}
while(command[counter] != 0x0d);
counter = 0； 
} 





while(keypress != 0x1b);
} 
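The key-handling loop above recognises the two host-side commands DF and DS (either case, surrounding spaces ignored, terminated by a carriage return). A compact sketch of the same classification is given below; the enum and the function name classify_line are hypothetical and are not part of the thesis program.

```c
#include <ctype.h>

/* Hypothetical result codes for the host-side command check. */
enum host_cmd { HOST_NONE, HOST_DF, HOST_DS };

/* Classify a command line that should end with a carriage return. */
enum host_cmd classify_line(const char *line)
{
    const char *p = line;
    int second;

    while (*p == ' ')                      /* skip leading spaces */
        p++;
    if (toupper((unsigned char)*p) != 'D') /* first letter must be D/d */
        return HOST_NONE;
    p++;
    second = toupper((unsigned char)*p);   /* second letter: F/f or S/s */
    if (second != 'F' && second != 'S')
        return HOST_NONE;
    p++;
    while (*p == ' ')                      /* skip trailing spaces */
        p++;
    if (*p != '\r')                        /* must end at carriage return */
        return HOST_NONE;
    return second == 'F' ? HOST_DF : HOST_DS;
}
```

Any other input, such as a DM command with an address, yields HOST_NONE and would simply be passed through to the board.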
send(i)
int i;
{
outportb(MCR, 0x02);














gotoxy(menu_top_x, menu_top_y-1);
cprintf("===========================");
gotoxy(menu_top_x, menu_top_y);
cprintf("Universal Prototyping Board");
gotoxy(menu_top_x, menu_top_y+1);
cprintf("===========================");
textcolor(CYAN);
gotoxy(menu_top_x, menu_top_y+3);
cprintf("Press <CR> to begin...");
gotoxy(menu_top_x, menu_top_y+5);
cprintf("At any time, press <ESC> to quit...");
textcolor(LIGHTGRAY);
function_key = getch();




while(function_key != ESC);
textcolor(LIGHTGRAY);
Upb.asm 
CPUVEC EQU $00 
CODES EQU CPUVEC+$000500 * program codes * 
ROMDATA EQU CPUVEC+$0079FE * data in ROM * 
INT6 EQU CPUVEC+$007DFE * interrupt request IRQ6 *
RAMDATA EQU CPUVEC+$080000 * volatile data in local RAM *
STACK EQU CPUVEC+$0807FC * program stack *
RAM0 EQU CPUVEC+$0C0000
BUFFER EQU CPUVEC+$180021 * disable RAMRW *
RAMRW EQU CPUVEC+$180023 * enable RAMRW *
MUXTM EQU CPUVEC+$180029 * trace mode *
MUXNM EQU CPUVEC+$18002B * normal mode *
LATCH EQU CPUVEC+$18002D * latch to reprogram/restart FPGAs *
CLEAR EQU CPUVEC+$18002F * 1 short circuit *
LCA0 EQU CPUVEC+$1C0000 * FPGA 0 *
SWFIRST EQU CPUVEC+$140000 * first switch address *
MR1A EQU $180001 * mode register A *
MR2A EQU MR1A
CSRA EQU $180003 * clock select register A *
SRA EQU CSRA * status register A *
CRA EQU $180005 * command register A *
TEA EQU $180007 * transmitter buffer A *
RBA EQU TEA * receiver buffer A *
ACR EQU $180009 * auxiliary control register *
IMR EQU $18000B * interrupt mask register *
IPR EQU $18001B * input port (unlatched register) * 
OPS EQU $18001D * output port set * 
OPR EQU $18001F * output port reset * 
EOS EQU $00 * end of string * 
BELL EQU $07 * bell * 
BS EQU $08 * back space * 
LF EQU $0A * line feed * 
CR EQU $0D * carriage return * 
SPC EQU $20 * space * 
COLN EQU $3A * colon * 
PRMT EQU $3E * prompt * 
* Data Storage *
ORG ROMDATA
MONMSG FCB LF,CR * monitor program message *
FCC /UNIVERSAL PROTOTYPING BOARD/
FCB LF,CR
FCC /Type <H> for HELP /
FCB EOS
PROMPT FCB LF,LF,CR
FCB PRMT,EOS * prompt *
HELPMSG FCB LF,LF,CR
FCC /Universal prototyping Board HELP Menu/
FCB LF,CR
FCC /=======================================================/
FCB LF,CR
FCC /MM <ADDR> <DATA> -- Modify Memory/
FCB LF,CR
FCC /BF <SOURCE ADDR> <DEST. ADDR> <DATA> -- Block Fill Memory/
FCB LF,CR
FCC /DM <ADDR> -- Dump Memory/
FCB LF,CR
FCC /PS <ADDR> <DATA> -- Program switch/
FCB LF,CR
FCC /DF -- Download FPGA Configuration data/
FCB LF,CR
FCC /DS -- Download switch data/
FCB LF,CR
FCC /RS -- Reset FPGA/
FCB LF,CR
FCC /RP -- Reprogram FPGA/
FCB LF,CR
FCC /TM -- Trace Mode/
FCB LF,CR
FCC /NM -- Normal Mode/
FCB LF,CR
FCC /=======================================================/
FCB EOS
HEADING FCB LF,LF,CR * memory table content heading *
FCC / 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F/
FCB LF,CR
FCC /======== ===============================================/
FCB LF,CR,EOS
ENDING FCC /===========================================/
FCB EOS
INVCMD FCB LF,CR,BELL
FCC /Invalid Command!!/
FCB EOS
INVPAR FCB LF,CR,BELL
FCC /Invalid Parameter!!/
FCB EOS
INVDATA FCB LF,CR,BELL
FCC /Invalid Data!!/
FCB EOS
ACCADDR FCB LF,CR,BELL
FCC /Accessible Address:/
FCB LF,LF,CR
FCC / Local RAM : $080800 - $080FFF &/
FCB LF,CR
FCC / Sharing RAM: $0C0000 - $13FFFF./
FCB EOS
INVADDR FCB LF,CR,BELL
FCC /Source > Destination Address!!/
FCB EOS
INVSW FCB LF,CR,BELL
FCC /Address for Programmable Switches:/
FCB LF,LF,CR
FCC / $140000 - $14007F./
FCB EOS
ORG RAMDATA 
NEWDATA BLKB 1 * any new data received? * 
RXDATA BLKB 1 * stored received data * 
COUNTC BLKB 1 * character counter * 
CMDBUF BLKB 20 * command buffer * 
ORG CPUVEC 
SSP LONG STACK 
PC LONG CODES 
ORG CPUVEC+$78
AV6 DC.L INT6 * level 6 interrupt autovector at $78 *
ORG CODES
* Serial Port Initialization *
SERINIT MOVE.B #$20,CRA * reset receiver *
MOVE.B #$30,CRA * reset transmitter *
MOVE.B #$10,CRA * reset MR pointer to MR1 *
MOVE.B #$13,MR1A * 8 data bits, no parity *
MOVE.B #$07,MR2A * 1 stop bit *
MOVE.B #0,ACR
MOVE.B #$CC,CSRA * Tx & Rx clk.=38.4kx16 *
MOVE.B #$02,IMR
MOVE.B #$05,CRA * Tx & Rx enabled *
* Use Supervisor Mode *
***********************
MOVE.W #$2000,SR * supervisor mode used *
* FPGAs & Switches Initialization *
***********************************
MOVE.B #$FF,LATCH * FPGA reprogram & reset pull high *
MOVE.B #$FB,LATCH * initialize switches *
MOVE.B #$FF,LATCH
MOVE.B #$0,CLEAR * initialize 1 short circuit *
MOVE.B #$0,BUFFER * initialize buffer between 68k & sharing RAMs *
MOVE.B #$0,MUXTM * normal mode selected *
MOVE.B #$0,MUXNM
* Main Program *
MOVEA.L #MONMSG,A0
JSR TXSTR 




CLR.B NEWDATA






NEW BCLR.B #0,NEWDATA
BEQ NEW * wait until data received *
MOVE.B D0,RXDATA * receive data from Receiver Buffer *
CMPI.B #CR,RXDATA * if CR *
BEQ CMDANA * determine which function chosen *
CMPI.B #BS,RXDATA
BEQ DELBUF
CMPI.B #20,COUNTC * character entered > 20 *




JSR TXCHAR * loop back to monitor * 
ADDI.B #1,COUNTC
JMP NEW 
* Delete 1 Char in Command Buffer * 
DELBUF CMPI.B #0,COUNTC 
BEQ DELEND 








DELEND JMP NEW 
* Command Buffer Full * 




* Tx A String * 
* A0: Address Pointer *
***********************







* Tx A Char * 
************* 
TXCHAR MOVEM.L D0,-(A7)
BTST.B #2,SRA * tx ready? *
BEQ TXCHAR
MOVE.B D0,TEA




* Command Analysis * 




CMPI.B #CR,(A0,D1.L) * no command entered *
BEQ NOCMD
JSR TESTSPC * test & skip spaces *




CMPI.B #'M',(A0,D1.L) * M/m for modify memory *
BEQ MODMEM
CMPI.B #'m',(A0,D1.L)
BEQ MODMEM
CMPI.B #'B',(A0,D1.L) * B/b for block fill memory *
BEQ BLKFIL
CMPI.B #'b',(A0,D1.L)
BEQ BLKFIL
CMPI.B #'D',(A0,D1.L) * D/d for dump memory or *
BEQ DANA * download FPGA configuration data *
CMPI.B #'d',(A0,D1.L)
BEQ DANA
CMPI.B #'R',(A0,D1.L) * R/r for reset or reprogram FPGA *
BEQ RANA
CMPI.B #'r',(A0,D1.L)
BEQ RANA




CMPI.B #'N',(A0,D1.L) * N/n for normal mode *
BEQ NMD
CMPI.B #'n',(A0,D1.L)




JMP CMDERR * others are invalid *
* No Command Entered *
* start over again *
NOCMD JMP BEGIN
* Test & Skip Spaces Between Parameters In Command *
TESTSPC CMPI.B #SPC,(A0,D1.L) * still space? *
BEQ SKIP
RTS
SKIP ADDI.L #1,D1
JMP TESTSPC









* Display Error Message * 
CMDERR MOVEA.L #INVCMD,A0
JSR TXSTR
JMP BEGIN
PARERR MOVEA.L #INVPAR,A0
JSR TXSTR
JMP BEGIN
ADDRERR MOVEA.L #INVADDR,A0
JSR TXSTR
JMP BEGIN
DATAERR MOVEA.L #INVDATA,A0
JSR TXSTR
JMP BEGIN
NOTACC MOVEA.L #ACCADDR,A0
JSR TXSTR
JMP BEGIN
SWERR MOVEA.L #INVSW,A0
JSR TXSTR
JMP BEGIN
* Block Fill Memory * 
* A2: Source Address *
* A3: Destination Address *




BNE CMDERR * invalid command * 
BFMEM JSR BWTARG1
JSR ASCTHEX * ASCII to hex conversion * 
CMPI.B #$FF,D3 * invalid hex address * 
BEQ PARERR * invalid parameter * 
JSR TESTADD * test accessible address * 





JSR TESTADD




BEQ PARERR
CMPA.L A2,A3 * destination < source? *
BLT ADDRERR * invalid parameter *
CMPI.L #$FF,D0 * data > $FF *
BHI DATAERR * invalid data *
JSR BWTARG3
JSR INITBUF * enable buffer *
FILL MOVE.B D0,(A2)+
CMPA.L A3,A2
BLE FILL
JSR CLRBUF * disable buffer *
JMP BEGIN
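The FILL loop writes the data byte to every location from the source address up to and including the destination address. A minimal C sketch of that inclusive fill is shown here; block_fill is a hypothetical name, and an ordinary array stands in for the board memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill mem[first..last] (inclusive) with value.
   Returns the number of bytes written, or 0 when first > last, which
   corresponds to the "Source > Destination Address" error case. */
size_t block_fill(uint8_t *mem, size_t first, size_t last, uint8_t value)
{
    size_t written = 0;
    size_t i;

    if (first > last)
        return 0;
    for (i = first; ; i++) {
        mem[i] = value;
        written++;
        if (i == last)            /* stop after the last location */
            break;
    }
    return written;
}
```

Testing for the last index inside the loop keeps the bound inclusive without overflowing when last is the largest representable index.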
* Check Valid Command between Arguments 1 * 









* ASCII To Hex Conversion * 
* D3: Invalid Hex if FF *
* Valid Hex if 00 *
ASCTHEX MOVEM.L D2/A0,-(A7)
CLR.L D0
CLR.L D3
GETASC MOVE.B (A0,D1.L),D2
CMPI.B #SPC,D2 * if space or *
BEQ CONEND
CMPI.B #CR,D2 * carriage return or *
BEQ CONEND
CMPI.B #EOS,D2 * end of string, end of conversion *
BEQ CONEND
CMPI.L #6,D3 * address > $FFFFFF *













SUBI.B #$20,D2 * a - f * 
CONVERT SUBI.B #'0',D2
CMPI.B #9,D2 * 0 - 9 * 
BLS GETHEX 
SUBI.B #7,D2 * A - F * 











NOTHEX MOVE.L #$FF,D3
MOVEM.L (A7)+,D2/A0 
RTS 
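The ASCTHEX routine converts an ASCII field of up to six hex digits, ending at a space, carriage return or end of string, into a binary value. The C sketch below follows the same idea under stated assumptions: parse_hex24 is a hypothetical name, and rejecting an empty field is a choice made for this sketch rather than something the listing confirms.

```c
#include <stdint.h>

/* Hypothetical helper: parse up to 6 hex digits from s, stopping at a
   space, carriage return or NUL. Returns 0 and stores the value on
   success; returns -1 for a non-hex character, an empty field, or more
   than 6 digits (a value above $FFFFFF). */
int parse_hex24(const char *s, uint32_t *out)
{
    uint32_t value = 0;
    int digits = 0;

    for (; *s != ' ' && *s != '\r' && *s != '\0'; s++) {
        int c = (unsigned char)*s;
        int nibble;

        if (c >= '0' && c <= '9')
            nibble = c - '0';
        else if (c >= 'A' && c <= 'F')
            nibble = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f')
            nibble = c - 'a' + 10;
        else
            return -1;                    /* not a hex digit */
        if (++digits > 6)
            return -1;                    /* too long for a 24-bit address */
        value = (value << 4) | (uint32_t)nibble;
    }
    if (digits == 0)
        return -1;                        /* assumption: empty field rejected */
    *out = value;
    return 0;
}
```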
* Check Valid Command between Arguments 2 * 
BWTARG2 JSR TESTSPC
CMPI.B #CR,(A0,D1.L)
BEQ PARERR
RTS
*******************************************
* Check Valid Command between Arguments 3 *




* Modify Memory * 
* A1: Memory Location *
MODMEM ADDI.L #1,D1
CMPI.B #'M',(A0,D1.L)
BEQ MODIFY
CMPI.B #'m',(A0,D1.L)
BNE CMDERR
MODIFY JSR BWTARG1
JSR ASCTHEX
CMPI.L #$FF,D3
BEQ PARERR
JSR TESTADD * test accessible address *





CMPI.L #$FF,D0 * data > $FF *
BHI DATAERR * invalid data *
JSR BWTARG3
JSR INITBUF * enable buffer *
MOVE.B D0,(A1) * write data *
JSR CLRBUF * disable buffer *
JMP BEGIN
* Test Accessible Address * 
TESTADD CMPI.L #$807FF,D0 * $0-807ff inaccessible *
BLS NOTACC
CMPI.L #$80FFF,D0 * $80800-80fff accessible *
BLS VALADDR
CMPI.L #$BFFFF,D0 * $81000-bffff inaccessible *
BLS NOTACC
CMPI.L #$13FFFF,D0 * $C0000-13ffff accessible *
BLS VALADDR * initial buffer needed *
JMP NOTACC * others inaccessible *
VALADDR RTS
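The address windows tested by TESTADD can be summarised in a few lines of C. This is only an illustrative sketch; is_accessible is a hypothetical name, and the ranges are taken from the comments in the routine.

```c
#include <stdint.h>

/* Hypothetical helper: return 1 if addr lies in one of the two windows
   the monitor allows the user to touch, 0 otherwise. */
int is_accessible(uint32_t addr)
{
    if (addr >= 0x080800UL && addr <= 0x080FFFUL)
        return 1;                 /* local RAM above the monitor's own data */
    if (addr >= 0x0C0000UL && addr <= 0x13FFFFUL)
        return 1;                 /* sharing RAM */
    return 0;                     /* everything else is rejected */
}
```

The assembly reaches the same result with a ladder of unsigned comparisons against the upper bound of each region in ascending order.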
************************************************
* Initialize Buffer between 68k & sharing RAMs *
************************************************
INITBUF MOVE.B #$0,BUFFER
MOVE.B #$0,RAMRW
RTS
* Clear Buffer between 68k & sharing RAMs *
*******************************************
CLRBUF MOVE.B #$0,BUFFER
RTS
* D: Dump Memory or *
* Download FPGA Configuration Data * 
DANA ADDI.L #1,D1 




CMPI.B #'F',(A0,D1.L) * F/f for download FPGA *
BEQ DOWNLD * configuration data *
CMPI.B #'f',(A0,D1.L)
BEQ DOWNLD




JMP CMDERR * others are invalid *
******************** 
* Dump Memory * 
* D2: Column count *
* D3: Row count *






MOVEA.L D0,A1 * memory location * 
JSR BWTARG3 
JSR INITBUF * enable buffer * 
MOVEA.L #HEADING,A0
JSR TXSTR
DSPMEM MOVE.L #20,D1
JSR PRTHEX











COLUMN MOVE.B (A0,D2),D0 
JSR PRTHEX 
MOVE.B #SPC,D0 * next column *
JSR TXCHAR 














TABLEND MOVEA.L #ENDING,A0
JSR TXSTR
JSR CLRBUF * disable buffer *
JMP BEGIN
* Hex to ASCII Conversion & Display *
* D1: # of Shift *
* D0: HEX *
PRTHEX MOVEM.L D0/D1,-(A7)





BLS PRT * 0 - 9 * 
ADDI.B #7,D0 * A - F *






PRTALL MOVEM.L (A7)+,D0/D1
RTS 
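PRTHEX turns each 4-bit nibble into a printable character by adding the code of '0' and, for values above 9, a further 7 to reach 'A'. A hedged C sketch of that conversion follows; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert the low 4 bits of n to '0'-'9' or 'A'-'F'. */
char nibble_to_ascii(uint8_t n)
{
    n &= 0x0F;
    if (n > 9)
        n += 7;                   /* skip the 7 codes between '9' and 'A' */
    return (char)(n + '0');
}

/* Hypothetical helper: write the two hex digits of b, high nibble first,
   followed by a terminating NUL. */
void byte_to_hex(uint8_t b, char out[3])
{
    out[0] = nibble_to_ascii((uint8_t)(b >> 4));
    out[1] = nibble_to_ascii(b);
    out[2] = '\0';
}
```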
* Downloading FPGA Configuration Data & * 
* switch connection * 
* D0: ASCII *
* D2: HEX *
* D4: FPGA Data if $0 *
* Switch Data if $1 *
*****************************************
DOWNLD ADDI.L #1,D1
JSR BWTARG3
CLR.L D2 * final HEX value *
CLR.L D3 * ASCII count *
CLR.L D4 * initialization *
MOVEA.L #LCA0,A1
MOVE.B #$FE,LATCH * reprogram FPGA *
MOVE.B #$FF,LATCH
NEWCON BCLR.B #0,NEWDATA
















二 I 丄 FPGAE^ * download FPGA data completed * 
MOVE.B #$FD,LATCH * restart FPGAs *
MOVE.B #$FF,LATCH
JMP BEGIN * download switch data completed *
NUMBER SUBI.B #$30,D0
JMP HEX
CAPATF SUBI.B #$37,D0
JMP HEX
SMLATF SUBI.B #$57,D0
HEX ADDI.L #1,D3
LSL.L #4,D2
ANDI.L #$F,D0




DL 二 I 丄 二 . download PPOA data * 
MOVE.B D2,(A1)+ * download switch data *
JMP DLCON
DLDATA MOVE.W D2,(A1)
DLCON CLR.L D2
CLR.L D3
JMP NEWCON
FPGAEND MOVE.B #$0,BUFFER
TESTSW BCLR.B #0,NEWDATA
BEQ TESTSW
ANDI.L #$FF,D0
CMPI.L #$AA,D0 * download switch data signal *
BEQ SW
JMP BEGIN
SW MOVE.L #$1,D4
CLR.L D2
CLR.L D3
MOVEA.L #SWFIRST,A1 * first switch address *




* Download Switch data * 
DOWNSW ADDI.L #1,D1
JSR BWTARG3 
JMP SW 
* R： Reset or * 
* Reprogram FPGA * 
RANA ADDI.L #1,D1 








JMP CMDERR * others are invalid *
* Reset FPGA * 





* Reprogram FPGA * 





**************
* Trace Mode *
**************







TRACE MOVE.B #$0,MUXTM 
JMP BEGIN 
*************** 
* Normal Mode * 







NORMAL MOVE.B #$0,MUXNM 
JMP BEGIN 
* Program Switch * 
* D4: Lower Switch if $0 * 
* Upper Switch if $1 * 





PROG JSR BWTARG1
JSR ASCTHEX
CMPI.L #$FF,D3
BEQ PARERR
JSR TESTVSW * test valid switch position *





CMPI.L #$1,D4
BEQ UPPERSW * upper switch *
CMPI.L #$FF,D0 * lower switch, data > $FF *
BHI DATAERR * invalid data *
JMP PROGCON
UPPERSW CMPI.L #$3,D0 * data > $3 *
BHI DATAERR * invalid data *
PROGCON JSR BWTARG3
MOVE.B D0,(A1) * program switch *
JMP BEGIN
* Test valid switch position * 
TESTVSW CLR.L D4 * initialization *
CMPI.L #$140000,D0 * $0-13FFFF invalid switch positions *
BLT SWERR
CMPI.L #$14007F,D0 * $140000-14007F valid switch positions *
BLS TESTUPP * test upper/lower switch *
JMP SWERR * others invalid *
TESTUPP BTST.L #0,D0
BNE TESTEND * lower switch *
MOVE.L #$1,D4 * upper switch *
TESTEND RTS
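Taken together, TESTVSW and the PROG routine accept a switch write only when the address lies in $140000-$14007F and the data fits the selected half: up to $FF for a lower switch (address bit 0 set) and up to $3 for an upper switch (bit 0 clear). The C sketch below restates that rule; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define SW_FIRST 0x140000UL       /* first programmable switch address */
#define SW_LAST  0x14007FUL       /* last programmable switch address */

/* Hypothetical helper: return 1 if writing data to addr would pass the
   monitor's checks, 0 otherwise. */
int switch_write_ok(uint32_t addr, uint32_t data)
{
    int upper;

    if (addr < SW_FIRST || addr > SW_LAST)
        return 0;                 /* not a switch position */
    upper = (addr & 1u) == 0;     /* bit 0 clear selects the upper switch */
    return data <= (upper ? 0x3u : 0xFFu);
}
```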
* IRQ 6: RS-232C Communication *
********************************
ORG INT6
MOVE.B RBA,D0




UPB Command Line Arguments

Command   Arguments Required                      Description
MM        <ADDR.> <DATA>                          Modify Memory: modify a single memory location ADDR. with DATA
BF        <SOUR. ADDR.> <DEST. ADDR.> <DATA>      Block Fill: modify a range of memory locations from SOUR. ADDR. to DEST. ADDR. with DATA
DM        <ADDR.>                                 Dump Memory: dump 16 x 16 data including memory location ADDR.
DF        <filename> (entered when prompted)      Download FPGA configuration data
DS        <filename>.con (entered when prompted)  Download Switch connection data
RP        none                                    Reprogram FPGAs
RS        none                                    Reset FPGAs
TM        none                                    Trace Mode: single step the design for debugging
NM        none                                    Normal Mode: normal operation after downloading all data
H         none                                    Help menu displayed
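For illustration, the monitor's command set can also be captured as a small lookup table. This is a hypothetical sketch rather than how the monitor is implemented (the assembly listing compares characters one at a time); the argument counts follow the HELP menu text, and the struct and function names are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table entry: command mnemonic and number of arguments. */
struct upb_cmd { const char *name; int nargs; };

static const struct upb_cmd upb_cmds[] = {
    {"MM", 2}, {"BF", 3}, {"DM", 1}, {"PS", 2}, {"DF", 0}, {"DS", 0},
    {"RS", 0}, {"RP", 0}, {"TM", 0}, {"NM", 0}, {"H", 0},
};

/* Case-insensitive comparison of two whole words. */
static int same_word(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return the table entry for word, or NULL for an invalid command. */
const struct upb_cmd *upb_lookup(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof upb_cmds / sizeof upb_cmds[0]; i++)
        if (same_word(word, upb_cmds[i].name))
            return &upb_cmds[i];
    return NULL;
}
```

A NULL result corresponds to the monitor's "Invalid Command!!" message.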
