Improving the performance of an FPGA based model design for sensor monitoring using PlanAhead tool by Arshak, Khalil et al.
Improving the performance of an FPGA based Model
design for sensor monitoring using PlanAhead tool
Khalil Arshak
Dept. Electronic and Computer Eng
University of Limerick
Limerick, Ireland
khalil.arshak@ul.ie
Essa Jafer
Dept. Electronic and Computer Eng
University of Limerick
Limerick, Ireland
Essa.jafer@ul.ie
Christian Ibala
CAE Logic drive
XILINX
Dublin, Ireland
christian.ibala@xilinx.com
ABSTRACT
The study in this paper focuses on the improvement of
a Field Programmable Gate Array (FPGA) based design
using PlanAhead™, a hierarchical analysis tool offered by
XILINX. In this work, PlanAhead software is
used to address problems on the physical side of our
FPGA design flow and to add more visibility and
control.
The target system reads analog information recorded
by a biomedical sensor in a transmitting unit attached to
the patient. The recorded data is digitized by an
analog to digital converter (ADC) and sent to an FSK
transmitter through the FPGA. Verilog HDL has been used to
develop and implement the required functions of the
FPGA, such as bus interfacing, data buffering,
compression and framing. The system performance has
been optimized using PlanAhead in
order to reach and maintain the goals of the design.
1. INTRODUCTION
FPGA devices are now widely used as one of the
most important alternatives for constructing high-speed digital
systems. This technology came to market in the middle of
the 1980s with a simple but strong argument: its
capability to be erased and reconfigured in-house within a few
milliseconds would allow designers to correct errors
or introduce last-minute modifications. This feature
clearly distinguished FPGAs from other alternatives
such as standard cells or gate arrays, and guaranteed the
success of the new devices [1].
As FPGA-based Systems-on-a-Chip (SoCs) become
more popular, many of the issues regarding sensor
monitoring need to be tackled [2,3]. In this paper, the
design of a short-range wireless system prototype used for
biomedical sensor recording and implemented on an
FPGA device is presented. The timing and low-power
constraints of the system drive the optimization, so that
the area requirements of the design are met.
The PlanAhead software provides insight into the data flow
of the design by displaying I/O interconnect as well as
physical block net bundles [4]. Timing constraints can
then be modified within the PlanAhead environment. These
analysis results help to determine what logic should
be grouped together and floorplanned. Paths can be
logically sorted, grouped, and selected for floorplanning.
The TimeAhead environment can also be used with
timing results imported from the timing analyzer tool
within the Xilinx Integrated Synthesis Environment (ISE™)
software [4].
TimeAhead is useful for validating and optimizing the
constraint set before running any ISE implementation
tools. In addition, it provides visual aids for understanding
the physical implementation results. Design rule checks
(DRCs) are provided to catch errors early. It also flags
designs that do not properly take advantage of certain
device resources, such as the dedicated registers of the
XtremeDSP™ slice or RAM within the Virtex™-4 FPGA
device.
Design problems can be addressed quickly by visualizing
area problems, either at the register transfer level (RTL)
or on the physical implementation side, without having to
continue RTL and synthesis iterations.
2. DESIGN FLOW USING PLANAHEAD (PA)
The PlanAhead tool sits between synthesis and the ISE
place and route (P&R) tools as shown in Figure 1.
Figure 1: FPGA design flow using PlanAhead
0-7803-9742-8/06/$20.00 © 2006 IEEE.
To increase the design performance, the HDL code has
been written according to the following
guidelines:
1- The design has been partitioned at the RTL level,
such that critical timing paths are confined to
individual modules. Critical paths that span large
numbers of hierarchical modules can be difficult
to floorplan.
2- The outputs of all modules have been registered
to help limit the number of modules involved in
a critical path.
3- Large hierarchical blocks have been divided into smaller
RTL units to avoid long paths, which make
floorplanning difficult.
Through analysis and floorplanning, physical constraints
are applied to help control the initial implementation of
the design. PlanAhead is also used after implementation
to analyze the placement and timing results in order to
improve the floorplanning and complete the design.
3. SYSTEM SPECIFICATIONS
The wireless system consists of two main units, one at
the transmitter side and one at the receiver side [5]. At the
transmitter, the data recorded by the sensor is first
converted to digital form using an ADC. An
FPGA device then reads the data serially and performs
buffering, compression and
framing before sending it to the FSK transmitter unit. A
second FPGA controls the receiver side and
is responsible for the inverse processing, namely
de-framing and decompression. A Spartan™-3 device from
Xilinx [6] has been used in the design since it meets our
requirements and has enough resources.
3.1. Transmitter Side FPGA
The main blocks of the device are shown in Figure 2.
Figure 2: Building blocks of the transmitter FPGA (legend: control signal, data flow)
As the figure shows, the FPGA consists mainly of an SPI
(Serial Peripheral Interface) unit, an RLE (Run Length Encoding)
compressor and an HDLC (High-level Data Link Control)
framer. The operation of these units and
the flow of data through the system are controlled by a
main FSM (Finite State Machine) controller.
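As a rough software illustration of the compression and framing stages named above, the Python sketch below models an RLE compressor followed by an HDLC-style framer. It is a behavioural model and not the Verilog used in this design. The (count, value) byte-pair RLE format, the LSB-first bit order, the function names, and the omission of the address, control and frame check fields are assumptions made for illustration; the 0x7E flag and the insertion of a zero after five consecutive ones follow the usual HDLC convention.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]  # HDLC flag sequence 0x7E


def rle_encode(data: bytes) -> bytes:
    """Run-length encode as (count, value) byte pairs (assumed format)."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte repeats; cap at one count byte.
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def to_bits(data: bytes) -> list:
    """Serialize bytes to a bit list, LSB first (assumed bit order)."""
    return [(byte >> k) & 1 for byte in data for k in range(8)]


def bit_stuff(bits: list) -> list:
    """Insert a 0 after every run of five consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 5:
            out.append(0)
            ones = 0
    return out


def hdlc_frame(payload: bytes) -> list:
    """Wrap the stuffed payload bits between an opening and a closing flag."""
    return FLAG + bit_stuff(to_bits(payload)) + FLAG
```

For instance, `rle_encode(b"\x05\x05\x05\x07")` yields the pairs (3, 0x05) and (1, 0x07), and a single payload byte 0xFF is framed as 25 bits, because one zero is stuffed after its first five ones.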
3.2. Receiver Side FPGA
At the receiver side, the system units of the FPGA are
organized as shown in Figure 3.
Figure 3: Building blocks of the receiver FPGA (legend: control signal, data flow)
A data recovery unit is needed to extract the clock from
the received bit stream. The HDLC de-framer and the
RLE decompressor blocks reconstruct the
original data bytes sent by the transmitter.
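The inverse processing at the receiver can be sketched in the same spirit. The Python model below removes the flags and stuffed zeros from a received bit list and then expands the run-length pairs. Clock recovery is not modelled; the bit list is taken as already sampled. The (count, value) pair format, the LSB-first bit order and the function names are assumptions for illustration and are not taken from the Verilog design.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]  # HDLC flag sequence 0x7E


def bit_unstuff(bits: list) -> list:
    """Drop the 0 that follows every run of five consecutive 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:            # this bit is the stuffed zero
            skip = False
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 5:
            skip = True
    return out


def from_bits(bits: list) -> bytes:
    """Pack bits into bytes, LSB first (assumed bit order)."""
    whole = len(bits) - len(bits) % 8
    return bytes(
        sum(bit << k for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, whole, 8)
    )


def hdlc_deframe(stream: list) -> bytes:
    """Return the payload between the first two flags.

    Raises StopIteration if an opening or closing flag is missing.
    """
    n = len(FLAG)
    starts = range(len(stream) - n + 1)
    begin = next(i for i in starts if stream[i:i + n] == FLAG) + n
    end = next(i for i in range(begin, len(stream) - n + 1)
               if stream[i:i + n] == FLAG)
    return from_bits(bit_unstuff(stream[begin:end]))


def rle_decode(data: bytes) -> bytes:
    """Expand (count, value) byte pairs back into the original bytes."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)
```

Because the stuffed payload never contains six consecutive ones, the first flag pattern found after the opening flag is the closing flag, which is what makes the simple search in `hdlc_deframe` sufficient for this sketch.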
4. VERILOG HDL CODING
The Verilog code describing the different block units of
the design has been kept compact: it is
based only on the instantiation of basic units that can
be invoked directly from the library. This
removes unnecessary complexity from the
model and makes it easy to understand and debug. In
addition, the synthesis tool does not add unnecessary
resources, which leads to a power-efficient
design. An example of such code for the
ADC chip select is shown in Figure 4.
5. PLANAHEAD IMPLEMENTATION
In this section, the PlanAhead implementation of the
transmitter side FPGA is presented. The on-chip
design partitions are referred to as physical blocks (Pblocks).
Figure 4: Verilog HDL sample code from the SPI design
With PlanAhead software, the utilization estimates of the
device resources can be viewed, as shown in Figure 5.
Figure 5: Device resources used for the target system
design
PlanAhead software has also been used to obtain a
schematic-level view of the key portions of the design.
The schematic of the design top level is presented in
Figure 6. Such views are a valuable
aid in understanding how the modules of the design are
connected to each other.
Figure 6: Design top-level schematic
It is recommended with PlanAhead to view and analyze
the hierarchy of the design. Such a view is useful for
implementing the best floorplan and may also indicate the
location of the longest timing path (critical path). In
Figure 7, the hierarchy of the design is displayed.
Figure 7: Module hierarchy view from PlanAhead
PlanAhead software has an embedded static timing
analysis engine and environment called TimeAhead. With
this feature, timing estimates can be obtained at various
stages of design implementation. The longest path has
been explored and is highlighted in Figure 8. This step is
needed in the following stages to improve the
floorplan for better performance. It is worth mentioning
that in most cases the longest path is associated with
the largest module in a fully synchronous
design. In the hierarchy view shown in Figure 7, the
module top BlockRamMod is the one that
has the longest path.
Figure 8: The longest logic delay path
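The idea behind locating the longest path can be illustrated with a small graph model. The Python sketch below treats the logic as a directed acyclic graph with one delay per node and returns the slowest path. It only illustrates the principle; TimeAhead works on the real netlist with device-specific delays. The node names, the delay values (in picoseconds) and the function name are hypothetical.

```python
from functools import lru_cache


def longest_path(delays: dict, edges: dict):
    """Return (total_delay, path) of the slowest path in an acyclic graph.

    delays: {node: delay}, edges: {node: [successor, ...]}.
    """
    @lru_cache(maxsize=None)
    def best_from(node):
        # Slowest continuation among the successors, or nothing at an endpoint.
        tails = [best_from(nxt) for nxt in edges.get(node, [])]
        tail_delay, tail_path = max(tails, default=(0, ()))
        return delays[node] + tail_delay, (node,) + tail_path

    return max(best_from(node) for node in delays)


# Hypothetical example: a block RAM sits on the slower of two paths.
delays_ps = {"reg_a": 500, "lut_1": 700, "lut_2": 700, "bram": 2100, "reg_b": 300}
edges = {"reg_a": ["lut_1", "bram"], "lut_1": ["lut_2"],
         "lut_2": ["reg_b"], "bram": ["reg_b"]}
total_ps, path = longest_path(delays_ps, edges)
```

In this made-up example the path through `bram` totals 2900 ps against 2200 ps for the path through the two LUTs, so it is the one a floorplan would need to keep physically short.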
6. FLOORPLAN STRATEGIES
6.1. Problem Description
Assume we are given a set of modules, each of them
having an associated resource requirement vector
$\phi = (n_1, n_2, n_3)$, which means the module requires $n_1$ CLBs,
$n_2$ RAMs, and $n_3$ multipliers. The FPGA floorplanning
problem is to place the modules on the chip so that each
rectangular region assigned to a module satisfies its
resource requirements.
For example, suppose we have 6 modules with resource
requirement vectors $\phi_1 = (12, 2, 1)$, $\phi_2 = (30, 4, 4)$,
$\phi_3 = (15, 1, 1)$, $\phi_4 = (24, 4, 4)$, $\phi_5 = (18, 2, 2)$,
and $\phi_6 = (30, 2, 2)$.
Figure 9 is a feasible floorplan for these modules, which
shows a slicing structure [7].
Figure 9: Example of floorplan slicing structure
Figure 10: Logic placement before floorplanning
Table 1: Device utilization before using PlanAhead tool
Device Utilization Summary
Number of BUFGMUXs         2 out of 8       25%
Number of DCMs             1 out of 4       25%
Number of External IOBs    6 out of 173      3%
Number of RAMB16s          2 out of 12      16%
Number of SLICEs         143 out of 1920     7%
Number of SLICEMs          1 out of 960      1%
In the following sections, the use of the PlanAhead tool to
implement our floorplanning strategies is
explained.
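The feasibility condition of Section 6.1 can be written down directly. The Python sketch below checks whether the region assigned to each module offers at least the required number of CLBs, RAMs and multipliers, using the six example vectors from the text. The module names, the function names and any region contents passed in are hypothetical; counting the resources inside a real rectangle depends on the device layout and is not modelled here.

```python
from typing import Dict, Tuple

Vector = Tuple[int, int, int]  # (CLBs, RAMs, multipliers)

# Resource requirement vectors of the six example modules.
MODULES: Dict[str, Vector] = {
    "m1": (12, 2, 1), "m2": (30, 4, 4), "m3": (15, 1, 1),
    "m4": (24, 4, 4), "m5": (18, 2, 2), "m6": (30, 2, 2),
}


def satisfies(region: Vector, required: Vector) -> bool:
    """True if the region offers at least the required amount of each resource."""
    return all(have >= need for have, need in zip(region, required))


def total_requirement(modules: Dict[str, Vector]) -> Vector:
    """Component-wise sum: a lower bound on what the whole device must offer."""
    return tuple(map(sum, zip(*modules.values())))


def feasible(regions: Dict[str, Vector], modules: Dict[str, Vector]) -> bool:
    """True if every module's assigned region satisfies its requirement."""
    return all(satisfies(regions[name], need) for name, need in modules.items())
```

For the example vectors, `total_requirement(MODULES)` gives (129, 15, 14), so any device hosting all six modules needs at least that many CLBs, RAMs and multipliers before the geometry of the regions is even considered.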
6.2. Logic Placement Before Floorplanning
Figure 10 shows how the logic modules of the design are
distributed by the ISE tool inside the chip. The thin
rectangles just outside the
device are the Input/Output (I/O) pads. I/O banks are
displayed as thin rectangles just outside the I/O pads.
Digital clock managers (DCMs) are shown as
rectangles along the I/O ring. The clock I/O pins are
shown as filled rectangles. The interior of the device is
divided into smaller rectangles called tiles. These tiles
contain placement sites for the different types of logic
primitives available in the target architecture.
In Table 1, a summary of the device utilization before
floorplanning is given. The number of
slices has been highlighted since it is the figure that
indicates the device area covered by the design. This
figure is used later for performance comparisons.
To get a closer view of the timing properties of the current
design, a timing report has been generated, as shown in Figure 11.
The following flow is used to floorplan our design.
The netlist file is first generated inside ISE using XST
(Xilinx Synthesis Technology). The netlist file is then
used to create a new project in PlanAhead. After
floorplanning the design, the netlist and the User
Constraint File (UCF) are exported back into the ISE
environment, and the timing report can be read from the
place and route (P&R) results.
Figure 11: Timing report before implementing PlanAhead
The figure shows a large difference
between the requested time for running the design (640
nsec) and the actual time given by the P&R report
(7.834 nsec), so the timing requirement is met with a wide margin.
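The margin between the two figures can be made explicit with a short calculation. The Python sketch below computes the slack and the equivalent clock frequencies, under the assumption that both the requested time (640 nsec) and the actual time (7.834 nsec) are clock periods; the function and field names are hypothetical.

```python
def timing_summary(requested_ns: float, actual_ns: float) -> dict:
    """Slack and equivalent clock frequencies for a period requirement."""
    return {
        "slack_ns": requested_ns - actual_ns,    # positive means the constraint is met
        "requested_mhz": 1e3 / requested_ns,     # 1000 / period in ns gives MHz
        "max_mhz": 1e3 / actual_ns,
        "met": actual_ns <= requested_ns,
    }


report = timing_summary(requested_ns=640.0, actual_ns=7.834)
```

Under that assumption the slack is about 632 nsec: the design is asked to run at 1.5625 MHz while the placed and routed logic could run at roughly 127.6 MHz, which is why the small timing changes introduced by floorplanning later in this section do not threaten the requirement.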
6.3. Floorplanning Techniques Implementation
In this section, PlanAhead software is used to
implement two different floorplanning strategies. The
results of each implementation are presented and
discussed. The main goal is to show how PlanAhead can
be employed to optimize the placement area
of our design logic.
The first strategy pulls the whole design
logic into one Pblock. This is
shown in Figure 12, where all the modules have been
placed in one rectangle at the bottom right corner of the
device.
Figure 12: Logic placement after floorplanning using the
1 Pblock strategy
The summary of the device utilization after implementing
this strategy is given in Table 2.
Table 2: Device utilization using PlanAhead strategy 1
Device Utilization Summary
Number of BUFGMUXs         2 out of 8       25%
Number of DCMs             1 out of 4       25%
Number of External IOBs    6 out of 173      3%
Number of RAMB16s          2 out of 12      16%
Number of SLICEs         128 out of 1920     7%
Number of SLICEMs          1 out of 960      1%
It is clear from the number of slices that the design area
has been compressed by more than 10%. To complete the
picture, a timing report has been generated using
the same procedure described in the previous section;
it is shown in Figure 13.
Figure 13: Clock report after implementing PlanAhead
using the 1 Pblock strategy
One interesting observation from the above
figure is that the actual time has increased compared with the
value before implementing this strategy. This can be explained
by the design now being more compact in a smaller
area, which leads to congestion. Despite this
increase, there is no impact on our
design performance, since the actual time remains far below
the required time.
In the second strategy, two Pblocks are used to place
the design modules, and the necessary connections
between these two blocks are visible in PlanAhead.
The placement of the two Pblocks is shown in Figure 14.
Figure 14: Logic placement after floorplanning using the
2 Pblocks strategy
In the same manner, Table 3 shows the device
utilization after implementing the second strategy. As
expected, the design area is only slightly compressed
with this strategy.
Table 3: Device utilization using PlanAhead strategy 2
Device Utilization Summary
Number of BUFGMUXs         2 out of 8       25%
Number of DCMs             1 out of 4       25%
Number of External IOBs    6 out of 173      3%
Number of RAMB16s          2 out of 12      16%
Number of SLICEs         140 out of 1920     7%
Number of SLICEMs          1 out of 960      1%
Finally, the timing report of this implementation
shows a lower value for the actual time than with the first
strategy, as presented in
Figure 15. With this strategy, a larger area is provided for
the design, which reduces the likelihood of
congestion.
Figure 15: Clock report after implementing PlanAhead
using the 2 Pblocks strategy
Another highlighted field in the utilization summary
tables is the one that shows the locked Input/Output Blocks
(IOBs). Before floorplanning, only two
pins, the clock and reset pins, had been given positions in
the UCF. As the physical locations of these
two pins were fixed, they influenced the
floorplanning. The remaining pins were locked after
floorplanning to the physical locations closest to
the design logic. This is why all pins are
locked in Tables 2 and 3.
Comparing the two
floorplanning techniques, it is clear that the first
strategy gives the smallest design area. On the
other hand, the speed of data transfer might be affected
by the congestion that occurs between the
modules, although this is not a serious issue in our case. The
second floorplanning strategy reduces the area only slightly,
since more space is provided for the design modules.
The data speed is improved compared with the
first strategy, and the design meets the requested
timing requirements in all cases.
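The area comparison can be reproduced from the slice counts in Tables 1 to 3 (143 slices before floorplanning, 128 with one Pblock and 140 with two Pblocks). The short Python sketch below computes the relative reduction for each strategy; the function and variable names are hypothetical.

```python
def area_reduction(before: int, after: int) -> float:
    """Percentage reduction in occupied slices relative to the baseline."""
    return 100.0 * (before - after) / before


BASELINE = 143      # slices before floorplanning (Table 1)
ONE_PBLOCK = 128    # slices with the 1 Pblock strategy (Table 2)
TWO_PBLOCKS = 140   # slices with the 2 Pblocks strategy (Table 3)

reduction_1 = area_reduction(BASELINE, ONE_PBLOCK)
reduction_2 = area_reduction(BASELINE, TWO_PBLOCKS)
```

With these counts the single Pblock saves about 10.5% of the slices and the two Pblocks about 2.1%, which matches the statements that the first strategy compresses the area by more than 10% and the second only slightly.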
7. CONCLUSIONS
In this paper, the PlanAhead tool has been employed as a
hierarchical software environment after synthesis to
analyze, modify, constrain and implement our design. It
has been shown that a significant reduction in both the
number and the length of design iterations can be obtained
when using this tool.
A wireless system designed for sensor
monitoring required some optimization. PlanAhead
has mainly been used to optimize the area occupied
by the entire design, giving better insight into the place
and route process. Two strategies have been adopted in
this work, based on the number of Pblocks used to
contain the different modules of the design. From the
presented comparative results, floorplanning with a single
Pblock represents the best scenario in
terms of area compression.
It is worth mentioning that when the design becomes more
complicated, it would be better to consider the
option of having more than one Pblock if the design timing
is critical.
8. REFERENCES
[1] Sergio, L. B., Javier, G., and Eduardo, I. B., "Dynamically
inserting, operating, and eliminating thermal sensors of
FPGA based system", IEEE Trans. Components and
Packaging Technologies, Vol. 25, 2002, pp. 561-566.
[2] Sagahyroon, A., Al-Khudairi, T., "FPGA-based acquisition
of sensor data", IEEE International Conference Proc. on
Industrial Technology, Vol. 3, Dec. 2004, pp. 1398-1401.
[3] Velusamy, S., Wei Huang, Lach, J., Stan, M., Skadron, K.,
"Monitoring temperature in FPGA based SoCs",
International Conference Proc. on Computer Design, Oct.
2005, pp. 634-637.
[4] XILINX, Inc., "PlanAhead User Guide", Release 8.1, 2006.
[5] Arshak, K., Jafer, E., "Modeling remote system for sensor
monitoring using Verilog HDL and SIMULINK
co-simulation", IEEE Proc. BMAS, Sept. 2005, pp. 64-69.
[6] XILINX, Inc., "Spartan-3 FPGA Family", Complete
data sheet, 2006.
[7] Lei Cheng, Wong, M.D.F., "Floorplan design for
multi-million gate FPGAs", IEEE/ACM International Conference
Proc. on Computer Aided Design, Nov. 2004, pp. 292-299.
ACKNOWLEDGMENTS
This work was supported by the Enterprise Ireland
Commercialization Fund 2003, under technology
development phase, as part of the MIAPS project,
reference no. CFTD/03/425.
We also wish to thank the Xilinx University Program (XUP)
for its valuable support.
