Deep Learning-Based FPGA Function Block Detection Method using an
  Image-Coded Representation of Bitstream by Chen, Minzhen & Liu, Peng
Minzhen Chen and Peng Liu, Member, IEEE
Abstract—Examining a field-programmable gate array (FPGA)
bitstream can help detect known function blocks, which in turn
offers insight into the circuit's system function. Our goal is to
detect one or more function blocks in an FPGA design from a
complete bitstream by using recent deep learning techniques,
which do not require manually designed features. To this end,
we propose a deep learning-based FPGA function block detection
method that transforms the bitstream into a three-channel color
image. Specifically, we first analyze the bitstream format to find
the mapping relationship between the configuration bits and the
configurable logic blocks (CLBs). Next, we propose an image-coded
representation of the bitstream that is suitable for deep learning
processing. This bitstream-to-image transformation takes into
account both the adjacency of the programmable logic and the high
degree of redundancy in the configuration information. With the
color images transformed from bitstreams as the training dataset,
a deep learning-based object detection algorithm is applied to
generate the function block detection results. We also explore how
the EDA tools, the input size of the deep neural network, and the
data arrangement of the representation affect detection accuracy.
Xilinx Zynq-7000 SoCs and Xilinx Zynq UltraScale+ MPSoCs are used
to verify the proposed method, and the results show that the mean
Average Precision (IoU=0.5) for 10 function blocks reaches 97.72%
with the YOLOv3 detector.
Index Terms—Field-programmable gate array, bitstream,
image-coded representation, function block detection.
I. INTRODUCTION
FIELD-PROGRAMMABLE gate arrays (FPGAs) have become popular and are widely applied in various
fields, such as communication, computation, deep learning,
and digital signal processing, owing to their configurability, fast
development cycles, and abundant resources. Because of their
low latency and low power consumption, FPGAs also play an
important role in real-time and embedded systems. With such
widespread use, the security of FPGAs is attracting increasing
attention. Since the function of an FPGA is fully determined by
the bitstream used to configure it, it is possible to retrieve the
design details of a circuit implemented in an FPGA by reverse
engineering its bitstream.
This paper focuses on the reverse analysis of circuit design
function. Detecting known function blocks in a circuit design
can serve as a pre-analysis step that assists the analysis of
the circuit's system function: once the function blocks are
detected, the system function implemented on the circuit can
be further identified.
[Fig. 1 diagram elements: application algorithm, function blocks, FPGA, .bit bitstream, bitstream access, function block detection.]
Fig. 1. An example application scenario in which FPGA function
block detection helps analyze the circuit's system function.
The authors are with the College of Information Science and Electronic
Engineering, Zhejiang University, Hangzhou 310027, China.
E-mail: {chenmz, liupeng}@zju.edu.cn.
We take the scenario shown in Fig. 1 as an example to
demonstrate how FPGA function block detection helps analyze
the circuit's system function. In this scenario, the circuit's system
function refers to the application algorithm implemented on the
FPGA, and this algorithm contains different kinds of function
blocks. After bitstream access, the function blocks contained in
the application algorithm can be identified through FPGA
function block detection on the bitstream. The application
algorithm can then be identified or narrowed down to a few
candidates.
To identify the function blocks in FPGA designs, the
researchers in [1], [2] analyzed bitstreams or netlists, partitioned
the circuits, and compared the partitioned circuits with existing
designs using conventional algorithms. However, the partitioning
process is time-consuming, and imperfect partitioning can lead to
incorrect matching results. Moreover, the features used by
conventional algorithms must be designed manually, and poorly
chosen features degrade their performance.
Deep neural networks (DNNs) have been used to classify
arithmetic operator modules in circuits [3], [4], [5], owing to their
good performance, their ability to learn from large amounts
of data, and the fact that they do not require manually designed
features. Dai et al. [3] and Fayyazi et al. [4] classified arithmetic
operators from gate-level circuits, which requires the bitstream
to be reverse engineered first. Mahmood et al. [5] classified
arithmetic operators from partial bitstreams. However, the work
in [5] does not process raw bitstream data and can only classify
circuits containing a single hardware module.
Inspired by the work mentioned above, the goal of this paper
is to detect one or more function blocks in an FPGA circuit
design from a given complete bitstream by making use of recent
deep learning research results. Since more than one function
block must be detected at the same time, this paper uses object
detection networks instead of classification networks like the
ones mentioned above. Because the configuration bits for one
element are discontinuous and the configuration information is
highly redundant, the raw bitstream data must be pre-processed.
Therefore, an effective representation of the bitstream is needed
so that the DNN can make full use of its feature extraction
capability.
arXiv:2007.11434v1 [cs.OH] 20 Jul 2020
In this paper, we propose an FPGA function block detection
method based on deep learning using an image-coded
representation of the bitstream. To improve the detection
accuracy and reduce the size of the dataset, the representation
should reflect the adjacency of the programmable logic and
remove useless information. The approaches taken are:
1) finding the mapping relationship between the configurable
logic block (CLB) elements and the configuration bits, and
2) using only the configuration bits of CLBs for the
representation. Then, the bitstreams are transformed into images
by the proposed representation, and deep learning is applied by
training a deep learning-based object detection algorithm on the
images. Researchers [6] have implemented on FPGA devices a
variety of application-specific encryption algorithms containing
different cryptographic operators, such as the Advanced
Encryption Standard (AES), Secure Hash Algorithm 1 (SHA-1),
and Message Digest Algorithm 4 (MD4). Our work uses
cryptographic operator detection as an example to verify the
methodology.
In summary, the main contributions of this paper are:
• A three-channel color image-coded representation of the
bitstream, suitable for deep learning processing, is proposed
by analyzing the mapping relationship between the
configuration bits and CLB elements.
• A dataset of images transformed from bitstream files
containing 10 kinds of cryptographic operators is generated
without manual annotation.
• Deep learning techniques are applied to FPGA function
block detection from bitstreams for the first time by
training a deep learning-based object detection algorithm
on the dataset. The mean Average Precision (mAP)
reaches 97.72% for 10 kinds of function blocks when
the Intersection over Union (IoU) threshold is 0.5.
The rest of the paper is organized as follows: Sec. II briefly
describes the background of FPGAs and deep learning-based
object detection algorithms. Sec. III first introduces the
overall process of the detection method and then describes
each step of the method in detail. The experimental results
are presented in Sec. IV. Sec. V discusses related work,
and Sec. VI summarizes the findings.
II. BACKGROUND
In this section, the basics of FPGAs and function block
detection are presented first (Sec. II-A). Then, the
background of deep learning and object detection algorithms
is introduced (Sec. II-B).
A. FPGA and Function Block Detection
FPGA. FPGAs are widely applied in digital designs. Compared
to application-specific integrated circuits (ASICs), FPGAs
have the advantages of reconfigurability and flexibility,
which bring lower costs and shorter development time.
There are several kinds of FPGAs, such as static random-access
memory (SRAM) FPGAs, flash memory FPGAs, and
anti-fuse FPGAs. Although flash memory and anti-fuse FPGAs
offer higher security than SRAM FPGAs, they involve a more
complex manufacturing process and a limited number of
programming cycles. SRAM FPGAs dominate the market and are
more widely used than the other kinds of FPGAs.
As application demands grow, more and more resources are
integrated into FPGAs. For example, the Xilinx Zynq-7000 SoC
includes embedded microprocessors in the FPGA system. A Xilinx
SoC usually combines the processing system, programmable logic,
and many other features in one silicon chip. The processing
system includes microprocessors, interconnection interfaces,
external memory interfaces, and so on. The programmable logic
includes CLBs, Input/Output Blocks (IOBs), Block RAMs (BRAMs),
Digital Signal Processors (DSPs), and others. CLBs are the logic
resources for realizing combinational and sequential logic
circuits. A CLB element contains Look-Up Tables (LUTs) and
Flip-Flops (FFs). The switch matrix connects the CLB to other
resources, allowing flexible routing. CLBs are arrayed in a
two-dimensional matrix in the programmable logic. BRAMs
are used for dense storage, and DSPs are used for high-speed
computing.
Bitstream. An FPGA bitstream is a binary file that contains
the configuration information for an FPGA design. Because the
configuration memory of SRAM FPGAs is volatile, the bitstream
is stored in external non-volatile memory. Programming an FPGA
means loading a bitstream file into the FPGA. The
reconfigurability of FPGAs stems from the bitstream file.
However, the bitstream also raises many security issues, and
many FPGA vendors provide bitstream encryption to improve the
confidentiality of the bitstream.
FPGA security. Since the security of the processing system
in an FPGA SoC corresponds to processor security, FPGA security
here refers to the security of the programmable logic in the
FPGA. Compared to ASICs, FPGAs are more vulnerable to attack
because of the binary file used to configure the FPGA design.
FPGA security threats include cloning, overbuilding, reverse
engineering, tampering, spoofing, and so on. These threats may
lead to intellectual property (IP) theft, circuit design leakage,
privacy issues, and even FPGA system damage.
Reverse engineering of FPGAs refers to analyzing the
bitstream format and transforming the bitstream into a
netlist [7]. The netlist contains the components and the
connections among them. Reverse engineering of FPGAs consists
of the following steps: bitstream access, bitstream
decryption [8], [9], [10], and bitstream reverse
engineering [11], [12], [13], [14]. Bitstream access and
bitstream decryption are not the focus of this paper.
Function block detection. In this paper, a function block in
an FPGA system design refers to a circuit block implementing
a complex function, such as a cryptographic operator like
SHA-1. A function block is more complex and occupies more
hardware resources than a simple hardware module, such as
an adder or a multiplier. Function block detection should not
only pick out one or more function blocks from the FPGA design
but also locate them in the FPGA device diagram. Classification,
however, can only give one prediction for each test sample.
Therefore, function block detection amounts to first partitioning
the design and then classifying each partitioned sample.
Function block detection serves as a pre-analysis step in the
analysis of the circuit's system function.
B. Deep Learning and Object Detection Algorithms
Deep learning. Deep learning has become a popular research
field with many applications, such as image processing, speech
recognition, medical diagnosis, and unmanned ground vehicles.
Deep learning removes the need to design features manually
because it extracts high-level, abstract features from raw data
through multiple layers [15]. Convolutional neural networks
(CNNs) and recurrent neural networks (RNNs) are popular DNNs
that are widely used in various fields. A deep learning model
learns from a large dataset; this learning process is called
training, and the dataset is called the training set. After
training, the performance of the model is usually evaluated on
another dataset, called the test set.
CNN. CNNs are among the most important deep learning
networks and have achieved great success in many applications.
A CNN is a DNN with at least one convolution layer, which uses
convolution instead of general matrix multiplication. CNNs
usually consist of convolution layers, pooling layers, fully
connected layers, and so on. Because convolution shares small
kernels across the whole image, CNNs are able to extract local
features of the image and have fewer parameters than fully
connected neural networks. Therefore, CNNs achieve good
performance in two-dimensional image applications.
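To make the parameter argument concrete, the Python sketch below counts the weights of a 3×3 convolution layer with 32 filters and of a fully connected layer producing 32 outputs, both applied to a 416×416×3 image. The layer sizes are illustrative choices, not taken from any network in this paper. Note that the convolution still produces 32 full feature maps, while the fully connected layer yields only 32 numbers.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Parameters of a 2-D convolution layer: one kernel_h x kernel_w x in_channels
    kernel per filter, plus one bias per filter. Independent of the image size."""
    return kernel_h * kernel_w * in_channels * filters + (filters if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Parameters of a fully connected layer: every input is wired to every output."""
    return in_features * out_features + (out_features if bias else 0)


# A 416x416x3 input mapped to 32 feature channels (conv) or 32 outputs (dense).
conv = conv2d_params(3, 3, 3, 32)        # 3*3*3*32 weights + 32 biases
dense = dense_params(416 * 416 * 3, 32)  # one weight per (input value, output) pair + 32 biases
print(conv, dense)                       # 896 16613408
```

The convolution's cost depends only on kernel and channel sizes, which is why CNNs scale to large inputs such as the bitstream images used in this work.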
Object detection. Object detection algorithms find the
locations of objects of interest in an image and give a
classification probability for each object. Object detection
algorithms generally fall into two categories: conventional
methods based on manually designed features and deep
learning-based methods. Deep learning-based methods do not need
specifically defined features and have achieved good
performance. To date, there are two groups of deep
learning-based object detection algorithms. One is two-stage
algorithms based on region proposal and classification, such as
R-CNN [16], Fast R-CNN [17], and Faster R-CNN [18]. The other
is one-stage algorithms based on regression, such as You Only
Look Once (YOLO) [19], [20], [21] and the Single Shot MultiBox
Detector (SSD) [22]. The biggest advantage of one-stage object
detection algorithms is their fast detection speed with adequate
accuracy.
Some function blocks occupy only a small part of the whole
image transformed from the bitstream by the image-coded
representation, and sometimes there is more than one function
block in an image. Given these characteristics of the function
blocks in the image, object detection algorithms are more
suitable for our work than other kinds of algorithms, such as
classification algorithms. Classification algorithms cannot
handle bitstreams containing more than one function block,
because they can only assign one category to each image.
Object detection is widely applied in various fields, such
as face detection, object tracking, security, autonomous
vehicles, and robotics. However, it has not previously been
applied to FPGA function block detection. Our work combines the
image-coded representation of the bitstream with a deep
learning-based object detection algorithm to detect function
blocks in FPGA bitstreams for the first time.
YOLO. YOLO is the first deep learning-based object detection
algorithm built on the idea of end-to-end training [19]: it has
no region proposal stage and uses a single network. Object
detection is treated as a regression problem. YOLO extracts
features from the image and directly predicts the locations of
the bounding boxes and the class probabilities. Compared to
two-stage object detection algorithms, YOLO greatly improves
running speed and has better generalization ability. From
YOLOv1 [19] through YOLOv2 [20] to YOLOv3 [21], YOLO has
significantly improved its speed, the number of detectable
classes, localization accuracy, and accuracy on small objects.
Starting with YOLOv2, YOLO replaces the last fully connected
layer with convolution layers and uses multi-scale training, so
that input images of different scales can be handled by the
same network. Among deep learning-based object detection
algorithms, YOLOv3 [21] offers one of the best overall
trade-offs, since it achieves high accuracy and fast speed at
the same time.
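For each box, YOLOv3 predicts offsets (tx, ty, tw, th) relative to its grid cell and to a prior (anchor) box. The sketch below decodes one prediction following the formulation in the YOLOv3 paper [21]: a sigmoid keeps the center inside its cell and an exponential scales the anchor. The function name, the stride argument, and the example values are our own illustration; (116, 90) is one of the COCO anchors listed in [21].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov3_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode one YOLOv3 box prediction into (center_x, center_y, width, height) in pixels.

    (cx, cy): column/row index of the grid cell making the prediction.
    (pw, ph): width/height of the anchor (prior) box in pixels.
    stride:   input pixels per grid cell (32, 16, or 8 for the three YOLOv3 scales).
    """
    bx = (sigmoid(tx) + cx) * stride  # center stays inside its own cell
    by = (sigmoid(ty) + cy) * stride
    bw = pw * math.exp(tw)            # anchor width scaled exponentially
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets put the center in the middle of cell (6, 6) and keep the anchor size.
print(decode_yolov3_box(0.0, 0.0, 0.0, 0.0, cx=6, cy=6, pw=116, ph=90, stride=32))
# (208.0, 208.0, 116.0, 90.0)
```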
SSD. SSD [22] is another classic one-stage deep learning-based
object detection algorithm, also built on the idea of end-to-end
training. SSD detects multi-scale objects by combining the
predictions from multiple feature maps of different sizes. When
SSD was presented, both its speed and its accuracy exceeded those
of YOLOv1. However, SSD was soon surpassed by YOLOv2 and
YOLOv3.
III. METHODOLOGY
A. Overview
The function block detection process consists of the
following steps, as shown in Fig. 2. First, the FPGA bitstream
format is analyzed and the mapping relationship used for the
bitstream representation is found. Second, a large number of
bitstreams are transformed into images by the image-coded
representation, and a dataset of images is generated without
manual annotation. Then, a deep network is trained to obtain a
model with good performance. At test time, an FPGA bitstream is
transformed into an image, which is then passed through the
trained network to obtain the detection result. Dataset
generation, deep learning training, and testing can be carried
out automatically by scripts. The rest of this section describes
each step in detail.
[Fig. 2 diagram: three stages. Bitstream format analysis: .bit files → bitstream format analysis → mapping relationship. Dataset generation and training: bitstreams (.bit) → image-coded representation (using the mapping relationship) → images with automatic annotation → dataset → training → deep learning model. Test: a bitstream (.bit) → image-coded representation → an image → inference → function block detection result.]
Fig. 2. The function block detection method, consisting of bitstream
format analysis, image-coded representation of the bitstream, dataset
generation, deep learning training, and deep learning inference.
B. Bitstream Format Analysis
The components of FPGAs are described in Sec. II-A. The
bitstream files are used to configure the programmable logic.
The programmable logic of an FPGA can be divided into several
Clock Regions. Each Clock Region consists of many columns
of CLBs, and each column contains q CLBs. For instance,
Fig. 3 shows the Clock Regions of the Xilinx Zynq-7000 SoC
ZC702 Evaluation Board. The dark blue areas represent columns
of CLBs, and the numbers of columns and rows of CLBs are
marked in Fig. 3. For Xilinx Zynq-7000 SoCs, a CLB element
contains two slices, and each slice consists of 4 LUTs and
8 FFs [23]. For Xilinx Zynq UltraScale+ MPSoCs, a CLB element
contains one slice, and each slice consists of 8 LUTs and
16 FFs [24]. There are two types of slices: SLICEL (logic) and
SLICEM (memory).
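As a quick consistency check of these per-slice figures, the sketch below multiplies them out for the 6,650 CLBs of the ZC702 FPGA (the total stated in the caption of Fig. 3). The result, 53,200 LUTs and 106,400 FFs, agrees with the counts Xilinx lists for the XC7Z020 device on that board. The helper function is hypothetical.

```python
def clb_resources(num_clbs, slices_per_clb, luts_per_slice, ffs_per_slice):
    """Total slices, LUTs, and FFs provided by num_clbs CLB elements."""
    slices = num_clbs * slices_per_clb
    return {"slices": slices,
            "luts": slices * luts_per_slice,
            "ffs": slices * ffs_per_slice}


# Zynq-7000 CLB: 2 slices, each with 4 LUTs and 8 FFs.
zc702 = clb_resources(6650, slices_per_clb=2, luts_per_slice=4, ffs_per_slice=8)
print(zc702)  # {'slices': 13300, 'luts': 53200, 'ffs': 106400}

# Zynq UltraScale+ CLB: 1 slice with 8 LUTs and 16 FFs, i.e. the same LUTs/FFs per CLB.
us = clb_resources(1, 1, 8, 16)
z7 = clb_resources(1, 2, 4, 8)
print(us["luts"] == z7["luts"] and us["ffs"] == z7["ffs"])  # True
```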
The Xilinx FPGA bitstream consists of the Head-of-File, the FDRI
(Frame Data Register Input) data, and the End-of-File. Of these
three parts, the FDRI data contain the configuration information
and form the main content. The configuration memory is arranged
in frames, which are the smallest addressable segments of the
configuration memory space. Each frame contains m 32-bit words
(101 words for Xilinx Zynq-7000 SoCs [25] and 93 words for
Xilinx Zynq UltraScale+ MPSoCs [26]). The length of the FDRI
data, which is determined by the FPGA device, is given in the
Head-of-File.
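The boundary between the Head-of-File and the FDRI data can be located programmatically. The sketch below is a minimal reader based on our understanding of the 7 series configuration packet format described in Xilinx UG470: a sync word 0xAA995566, a type-1 write header addressed to the FDRI register, and, for large writes, a type-2 header carrying the word count. It assumes an uncompressed, unencrypted full bitstream with a single FDRI write, scans words naively, and does not check CRCs; it is an illustration, not the authors' script.

```python
import struct

SYNC_WORD = 0xAA995566         # marks the start of configuration packets
TYPE1_HDR_MASK = 0xFFFFF800    # header type, opcode, and register address (bits 31:11)
TYPE1_WRITE_FDRI = 0x30004000  # type-1 packet, write opcode, register FDRI (address 2)
TYPE1_COUNT_MASK = 0x000007FF  # type-1 word count (bits 10:0)
TYPE2_HDR_MASK = 0xF8000000    # header type (bits 31:29) and opcode (bits 28:27)
TYPE2_WRITE = 0x50000000       # type-2 packet, write opcode
TYPE2_COUNT_MASK = 0x07FFFFFF  # type-2 word count (bits 26:0)


def read_word(data, pos):
    """Read one big-endian 32-bit word, failing cleanly at the end of the data."""
    if pos + 4 > len(data):
        raise ValueError("unexpected end of bitstream")
    return struct.unpack_from(">I", data, pos)[0]


def find_fdri_words(bitstream):
    """Return the 32-bit words of the first FDRI write in an uncompressed bitstream.

    Everything before the sync word (.bit header, padding, bus-width pattern) is
    skipped; packets are then read as words aligned to the sync word.
    """
    start = bitstream.find(struct.pack(">I", SYNC_WORD))
    if start < 0:
        raise ValueError("sync word not found")
    pos = start + 4
    while pos + 4 <= len(bitstream):
        word = read_word(bitstream, pos)
        pos += 4
        if (word & TYPE1_HDR_MASK) == TYPE1_WRITE_FDRI:
            count = word & TYPE1_COUNT_MASK
            if count == 0:  # large writes carry their length in a following type-2 packet
                nxt = read_word(bitstream, pos)
                pos += 4
                if (nxt & TYPE2_HDR_MASK) != TYPE2_WRITE:
                    raise ValueError("expected a type-2 write after the FDRI header")
                count = nxt & TYPE2_COUNT_MASK
            if pos + 4 * count > len(bitstream):
                raise ValueError("FDRI data truncated")
            return list(struct.unpack_from(">%dI" % count, bitstream, pos))
    raise ValueError("no FDRI write found")
```

Dividing the number of returned words by m (101 for Zynq-7000) gives the number of frames written; whether trailing pad frames are included should be checked against the device documentation.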
Since the official public documentation does not describe how
the FDRI data configure CLBs, we determined this through
extensive analysis. We found that every n successive frames of
the FDRI data configure a column of CLBs (q CLBs), as shown in
Fig. 4. Excluding the l words in the middle of each frame, every
p words of the remaining m-l words of a frame correspond to one
CLB, from the bottom to the top of the column. For Xilinx
Zynq-7000 SoCs, a CLB contains two slices, and the p words
configure the two slices of the CLB separately, from left to
right. Therefore, each CLB in a column needs p×n words of
configuration, which are located at the same position in each
of the n frames.
[Fig. 3 diagram: the ZC702 device with the processing system (PS) and six Clock Regions of programmable logic, annotated with the numbers of CLB columns and rows in each Clock Region.]
Fig. 3. As an example, the ZC702 FPGA consists of the programmable logic and
the processing system. The programmable logic can be divided into 6 Clock
Regions. The dark blue areas represent columns of CLBs, and the black digits
show the numbers of columns and rows of CLBs in each Clock Region.
The ZC702 FPGA contains 6,650 CLBs in total.
[Fig. 4 diagram: frames 0 to n-1, each of m words; in every frame, the l middle words (words (m-l)/2 to (m+l)/2-1) configure no CLBs; words 0 to p-1 configure CLB 0 and words m-p to m-1 configure CLB q-1.]
Fig. 4. Every n successive frames of the FDRI data configure a column of
CLBs from bottom to top. The configuration data for one CLB is located
at the same position in each of the n frames.
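The mapping above reduces to index arithmetic on a flat list of FDRI words. In the sketch below, m = 101 and n = 36 come from the text and p = 2 follows from the 2p×n = 144 bytes per slice given in Sec. III-C; l = 1, and hence q = (m-l)/p = 50 CLBs per column, is our assumption. We also assume that, within the p words of a CLB, the first half belongs to the left slice. All function names are hypothetical.

```python
def frame_position(j, m, l):
    """Position inside an m-word frame of the j-th CLB word (0 <= j < m - l).

    The l middle words, (m - l) // 2 .. (m + l) // 2 - 1, configure no CLBs and are skipped.
    """
    half = (m - l) // 2
    return j if j < half else j + l


def clb_config_words(fdri_words, first_frame, clb_index, n, m, p, l):
    """Return the p x n words of CLB `clb_index` (0 = bottom of the column).

    fdri_words:  flat list of FDRI words, m words per frame.
    first_frame: index of the first of the n frames configuring this CLB column.
    Result: n lists (one per frame, in frame order) of p words each.
    """
    q = (m - l) // p  # CLBs per column
    if not 0 <= clb_index < q:
        raise IndexError("CLB index outside the column")
    words = []
    for frame in range(first_frame, first_frame + n):
        base = frame * m
        words.append([fdri_words[base + frame_position(clb_index * p + i, m, l)]
                      for i in range(p)])
    return words


def slice_config_words(clb_words, slice_index, slices_per_clb=2):
    """Words of one slice: its share (left = 0, right = 1) of the p words in each frame."""
    per_slice = len(clb_words[0]) // slices_per_clb
    out = []
    for frame_words in clb_words:
        out.extend(frame_words[slice_index * per_slice:(slice_index + 1) * per_slice])
    return out


# Zynq-7000 parameters: m and n from the text, p derived, l assumed.
M, N, P, L = 101, 36, 2, 1
```

With these values each slice yields 36 words (144 bytes), matching the per-slice size used by the image-coded representation.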
Among the n frames of FDRI data configuring the same column of
CLBs, the first frame has a fixed position in the bitstream. The
position of each first frame can be found by repeatedly
configuring different columns of CLBs. As an example, the
composition of the ZC702 FPGA bitstream is shown in Fig. 5. The
number of frames configuring a column of CLBs (n) is 36 for the
ZC702 FPGA. Sometimes the gap between two consecutive first
frames is 64 frames instead of 36 frames, because a column of
BRAMs lies between the two columns of CLBs, and a column of
BRAMs needs 28 frames to configure (36 + 28 = 64). After the
position of every first frame is found, the configuration bits
for every CLB element can be extracted from the raw data by a
Python script.
[Fig. 5 diagram: the ZC702 bitstream file (Head-of-File, FDRI data, End-of-File); the FDRI data are divided into Clock Regions 1 to 6, and a Clock Region into Columns 1 to 12 of CLBs interleaved with BRAM columns; marked frame positions include Frames 652, 688, 724, 760, 788, 824, 860, 888, 924, 960, 996, 1032, 1068, 1104, 1170, 3218, 3736, 5204, and 6302.]
Fig. 5. The bitstream file is arranged in frames. As an example, the
composition of a ZC702 FPGA bitstream file and the positions of some
first frames are shown. The number of frames configuring a column of CLBs
(n) is 36 for the ZC702 FPGA.
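The 36- and 64-frame gaps between first frames follow from summing the frame counts of the columns in between. The sketch below illustrates that bookkeeping using the 36-frame CLB and 28-frame BRAM counts stated above. The column order and start frame are hypothetical, and a real device interleaves further column types (for example DSP and I/O columns) with their own frame counts; the text above locates the real positions experimentally.

```python
# Frames per column type for the ZC702 FPGA, as stated in the text.
FRAMES_PER_COLUMN = {"CLB": 36, "BRAM": 28}


def clb_first_frames(column_types, start_frame, frames_per_column=FRAMES_PER_COLUMN):
    """First-frame index of each CLB column, walking the columns in bitstream order."""
    firsts = []
    frame = start_frame
    for kind in column_types:
        if kind == "CLB":
            firsts.append(frame)
        frame += frames_per_column[kind]  # skip over this column's frames
    return firsts


# Hypothetical column order and start frame, for illustration only.
layout = ["CLB", "CLB", "CLB", "BRAM", "CLB", "CLB", "BRAM", "CLB"]
print(clb_first_frames(layout, 652))  # [652, 688, 724, 788, 824, 888]
```

A BRAM column between two CLB columns turns the usual 36-frame step into a 64-frame step, as described above.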
C. Image-Coded Representation of Bitstream
When designing an image-coded representation of the
bitstream, we face two challenges. The first challenge is that
the configuration bits of one CLB element are not consecutive in
the bitstream. To extract features from the image-coded
representation, the representation should reflect the adjacency
of the programmable logic, and the discontinuity of the
configuration bits makes this difficult. Our approach is to
analyze the bitstream format and find the mapping relationship
between each CLB element in the device location diagram and its
configuration bits in the bitstream, so that the two-dimensional
location distribution diagram of the programmable logic can be
recovered through the mapping relationship.
The second challenge is that not all configuration information
in the bitstream file is useful for function block detection.
This has two negative effects: 1) Since our work focuses on
logic resources, the utilization information of other resources
may confuse function block detection. For instance, BRAM
utilization changes when an array size in the function block
changes, so BRAM utilization may differ greatly between function
blocks of the same kind, whereas the logic in the function block
does not change with the data size. Therefore, the configuration
bits that are unrelated to logic resources and unhelpful for
function block detection can be dropped. 2) Large images lead to
a large image dataset and slow image reading during deep
learning training. For instance, a bitstream for the ZC702 FPGA
occupies 3.86 MiB, which is fixed for a given FPGA device type.
If all of the configuration bits in a bitstream were transformed
into an image, the image would be as large as 1280×1080×3, which
is too large for deep learning training. Our approach is to use
only the configuration bits of CLBs for the representation.
Since approximately 60% of the configuration bits in a bitstream
configure CLBs, using only the configuration bits of CLBs
effectively compresses the data and drops useless information at
the same time.
[Fig. 6 diagram: the 36 words (144 bytes) of a slice, words 0 to 35, form a 6×8-pixel three-channel image; the six rows of the first channel hold word pairs (1, 0), (3, 2), ..., (11, 10), and the second and third channels start with words 12 and 24.]
Fig. 6. For Xilinx Zynq-7000 SoCs, the 36-word configuration bits of a slice
are transformed into a three-channel color image with 6×8 pixels.
Based on the above bitstream analysis (Sec. III-B), we
propose a three-channel color image-coded representation of the
bitstream. The FPGA device diagram is divided into a×b blocks,
each corresponding to one CLB. Each CLB is represented
separately, and the representations of all CLBs are aggregated
to obtain the entire device location map.
According to the bitstream format analysis (Sec. III-B),
each CLB is allocated 4p×n bytes of configuration memory from
n successive frames. For Xilinx Zynq-7000 SoCs, each slice is
allocated 2p×n=144 bytes. For Xilinx Zynq UltraScale+ MPSoCs,
each slice is allocated 4p×n bytes (n differs between slice
types). The bytes of configuration data for a slice are
transformed into a separate three-channel RGB image of suitable
height and width. For example, the configuration bits of a slice
in Xilinx Zynq-7000 SoCs are transformed into a three-channel
color image with 6×8 pixels, since a pixel with three channels
occupies three bytes (6×8×3=144). As shown in Fig. 6, the
configuration bits are treated as image data arranged in
(channel, height, width) order. For Xilinx Zynq-7000 SoCs, the
image-coded representation of the bitstream has (a×6)×(b×2×8)
pixels with three RGB channels. For Xilinx Zynq UltraScale+
MPSoCs, the size of the image-coded representation is determined
by the numbers of SLICELs and SLICEMs. Once the configuration
bits of every slice have been extracted from the bitstream file,
a Python script transforms them into the three-channel
image-coded representation.
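The slice-to-pixel transformation and the tiling into an (a×6)×(b×2×8) device image can be sketched as follows. We assume big-endian words, plain row-major (channel, height, width) order, and CLB rows indexed from the top of the image; Fig. 6 appears to place word 1 to the left of word 0 within a row, which this sketch does not reproduce. Nested lists are used instead of an array library to keep it dependency-free, and all names are hypothetical.

```python
import struct

SLICE_H, SLICE_W, CHANNELS = 6, 8, 3  # 6 x 8 pixels x 3 channels = 144 bytes


def slice_to_image(slice_words):
    """Turn the 36 configuration words of one Zynq-7000 slice into a 6x8x3 image.

    The 144 bytes are read as a (channel, height, width) array and returned as
    nested lists indexed [row][col][channel].
    """
    if len(slice_words) != 36:
        raise ValueError("expected 36 words per slice")
    data = struct.pack(">36I", *slice_words)  # big-endian byte order (assumption)

    def at(ch, row, col):
        return data[(ch * SLICE_H + row) * SLICE_W + col]

    return [[[at(ch, row, col) for ch in range(CHANNELS)] for col in range(SLICE_W)]
            for row in range(SLICE_H)]


def blank_canvas(a, b):
    """All-zero device image of (a * 6) x (b * 2 * 8) pixels for an a x b CLB grid."""
    return [[[0, 0, 0] for _ in range(b * 2 * SLICE_W)] for _ in range(a * SLICE_H)]


def place_slice(canvas, clb_row, clb_col, slice_index, tile):
    """Copy one slice tile into the canvas; the two slices of a CLB sit side by side.

    clb_row counts CLB rows from the top of the image (assumption).
    """
    top = clb_row * SLICE_H
    left = (clb_col * 2 + slice_index) * SLICE_W
    for r in range(SLICE_H):
        for c in range(SLICE_W):
            canvas[top + r][left + c] = list(tile[r][c])
```

For example, with a = 2 and b = 3 the canvas has 12×48 pixels, and the right slice of the CLB in row 1, column 2 starts at pixel row 6, pixel column 40.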
D. Generation of Dataset
The image-coded representation is used to transform a large
number of bitstream files into images, which are gathered into a
dataset for deep learning training and testing. Each bitstream
file implements an algorithm, and each algorithm contains one or
more function blocks. In practice, one kind of function block
can have different constructions, such as an original one with
no special design and a pipelined one. Therefore, each kind of
function block is implemented in one or two constructions when
the bitstreams are generated.
Training the deep network requires many bitstream files
containing various kinds of function blocks. We generate a large
number of bitstreams with the EDA toolset (Xilinx Vivado).
Constraining the implementation region places the function
blocks at different locations in different bitstreams. The Tcl
(Tool Command Language) [27] interface of Xilinx Vivado is used
instead of its graphical user interface (GUI). When the
bitstreams are generated by Tcl scripts, the categories and
locations of the function blocks in the FPGA device diagram can
be extracted from the EDA toolset. Finally, the bitstream files
are transformed into images by Python scripts, which at the same
time convert the label information into annotation files for
deep learning.
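Because each CLB becomes a 6-pixel-tall, 16-pixel-wide tile in the Zynq-7000 representation, a function block's rectangular placement region maps directly to a bounding box. The sketch below converts such a region into pixel coordinates and writes one annotation line per image in the "path x_min,y_min,x_max,y_max,class_id" layout used, for example, by the keras-yolo3 project. That format, the region tuple, the top-down row indexing, and the partial class list are assumptions, not the authors' files.

```python
CLB_H, CLB_W = 6, 16  # pixels per CLB in the Zynq-7000 representation (6 x (2 x 8))


def clb_region_to_box(row_min, row_max, col_min, col_max):
    """Pixel box (x_min, y_min, x_max, y_max) covering an inclusive CLB region.

    Rows are counted from the top of the image (assumption); x_max/y_max are exclusive.
    """
    return (col_min * CLB_W, row_min * CLB_H, (col_max + 1) * CLB_W, (row_max + 1) * CLB_H)


def annotation_line(image_path, blocks, class_names):
    """One line: image path, then 'x_min,y_min,x_max,y_max,class_id' for each block.

    blocks: list of (class_name, (row_min, row_max, col_min, col_max)) tuples.
    """
    parts = [image_path]
    for name, region in blocks:
        box = clb_region_to_box(*region)
        parts.append(",".join(str(v) for v in box + (class_names.index(name),)))
    return " ".join(parts)


# Hypothetical, partial class list and placement regions.
classes = ["AES", "MD4", "SHA-1"]
print(annotation_line("img_0001.png", [("AES", (10, 29, 3, 7)), ("SHA-1", (0, 4, 0, 1))], classes))
# img_0001.png 48,60,128,180,0 0,0,32,30,2
```

Boxes are expressed in the coordinates of the full device image; resizing to the network input size (Table I) would scale them accordingly.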
E. Architectures of the DNNs and Training for Function Block
Detection
Since deep learning techniques have not yet been applied to
FPGA function block detection, our work combines the image-coded
representation of the bitstream with the image feature
extraction capabilities of DNNs. YOLOv3 and SSD are two classic
one-stage deep learning-based object detection algorithms that
achieve both high speed and high accuracy; in particular, they
are much faster than two-stage deep learning-based object
detection algorithms. We choose YOLOv3 and SSD for their strong
overall performance. The architectures and training processes of
YOLOv3 and SSD used in our work are briefly introduced below.
YOLOv3. YOLOv3 consists of 75 convolution layers. The kernel
sizes/strides include 1×1/1, 3×3/1, and 3×3/2. The backbone of
YOLOv3 is Darknet-53 [21]. There are three output convolution
layers with feature maps of different sizes for detecting objects
of different sizes. The three output convolution layers have the
same number of filters, which depends on the number of classes.
In YOLOv3, the input image is divided into a grid, and each grid
cell predicts a fixed number of bounding boxes (three for
YOLOv3). One objectness score, C class predictions for C classes,
and 4 box offsets are predicted for each bounding box. The
objectness score quantifies how likely it is that the box
contains a generic object [28]. Thus, each output layer has
3×(1+C+4) filters. For example, since we apply YOLOv3 to detect
10 kinds of function blocks, the number of classes C is 10 and
each output layer has 45 filters.
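The filter count and grid sizes of the three YOLOv3 output layers can be checked directly. With the 416×416 input of Table I and YOLOv3's strides of 32, 16, and 8, the sketch below reproduces the 45 filters derived above; the helper name is ours.

```python
def yolov3_output_shapes(input_size, num_classes, boxes_per_cell=3, strides=(32, 16, 8)):
    """(grid_h, grid_w, filters) of the three YOLOv3 output layers for a square input."""
    filters = boxes_per_cell * (1 + num_classes + 4)  # objectness + C class scores + 4 offsets
    return [(input_size // s, input_size // s, filters) for s in strides]


shapes = yolov3_output_shapes(416, 10)
print(shapes)  # [(13, 13, 45), (26, 26, 45), (52, 52, 45)]
# Each grid cell predicts 3 boxes, so a 416x416 input yields 10,647 candidate boxes.
print(sum(h * w * 3 for h, w, _ in shapes))  # 10647
```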
When training YOLOv3, we take the weights pre-trained on the
COCO dataset [29] as initial weights. During the first 50 epochs
of training, all layers except the last three output convolution
layers are frozen to obtain a stable loss value. From the 51st to
the 100th epoch, all layers are unfrozen and trained with a
smaller learning rate. When training is finished, the model with
the smallest validation loss is chosen as the final model and
evaluated on the test set.
SSD. SSD consists of 29 convolution layers and 4 max-pooling
layers. The kernel sizes/strides are the same as in YOLOv3. The
backbone of SSD is VGG-16 [30]. There are six output convolution
layers for objects of different sizes.
TABLE I
PARAMETERS FOR THE TRAINING PROCESS OF YOLOV3 AND SSD.
Deep neural network                          YOLOv3       SSD
Input size(1)                                416×416×3    300×300×3
Optimizer                                    Adam [31]    Adam
First stage of training    Batch size        32           32
                           Learning rate     0.001        0.001
Second stage of training   Batch size        16           32
                           Learning rate     0.0001       0.01
(1) The images are resized to the input size of the DNN before being fed into the DNN.
Similar to YOLOv3, but without the objectness prediction, the filter number of the output layers is box number×(C+4). In SSD, box number can be 4 or 6 for different output layers. Thus, the filter number of the output layers in SSD is 4×14 = 56 or 6×14 = 84 when the class number C is 10.
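The filter counts above follow directly from the per-box outputs. The short Python sketch below reproduces them; the function names are our own, and the SSD formula follows the box number×(C+4) convention used in this paper (some SSD implementations additionally count a background class in C).

```python
def yolov3_output_filters(num_classes: int, boxes_per_cell: int = 3) -> int:
    """Filters in each YOLOv3 output layer: per box, 1 objectness + C classes + 4 offsets."""
    return boxes_per_cell * (1 + num_classes + 4)


def ssd_output_filters(num_classes: int, boxes_per_location: int) -> int:
    """Filters in an SSD output layer: per box, C class scores + 4 offsets (no objectness)."""
    return boxes_per_location * (num_classes + 4)


if __name__ == "__main__":
    C = 10  # 10 kinds of function blocks
    print("YOLOv3:", yolov3_output_filters(C))
    print("SSD:", [ssd_output_filters(C, b) for b in (4, 6)])
```

Running it prints 45 for YOLOv3 and [56, 84] for the two SSD box counts.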
When training SSD, we load the pre-trained weights of VGG-16 as the initial weights of the front layers. In the first stage of training, the front layers are frozen with the weights from the VGG-16 model. Then, the whole network is trainable in the second stage of training. Otherwise, the training process is similar to that of YOLOv3 described above.
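The two-stage schedule can be written down as plain data. The sketch below encodes the YOLOv3 values from TABLE I together with the 50 + 50 epoch split and the smallest-validation-loss model selection described above; the `Stage` structure and the helper names are hypothetical, and SSD is omitted because its epoch counts are not stated. In Keras, freezing would typically be done by setting `layer.trainable = False` on the frozen layers and recompiling the model, but the exact code used by the authors is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    first_epoch: int      # 1-based, inclusive
    last_epoch: int       # 1-based, inclusive
    freeze_front: bool    # True: only the last three output layers are trained
    batch_size: int
    learning_rate: float


# YOLOv3 values from TABLE I; epoch ranges from the text.
YOLOV3_SCHEDULE = (
    Stage(1, 50, True, 32, 1e-3),
    Stage(51, 100, False, 16, 1e-4),
)


def stage_for_epoch(schedule, epoch):
    """Return the stage that covers a given 1-based epoch."""
    for stage in schedule:
        if stage.first_epoch <= epoch <= stage.last_epoch:
            return stage
    raise ValueError(f"epoch {epoch} is not covered by the schedule")


def best_epoch(val_losses):
    """1-based epoch whose checkpoint has the smallest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


if __name__ == "__main__":
    for epoch in (1, 50, 51, 100):
        print(epoch, stage_for_epoch(YOLOV3_SCHEDULE, epoch))
    print("best epoch:", best_epoch([0.92, 0.61, 0.58, 0.60]))
```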
IV. EXPERIMENTAL RESULTS
A. Experimental Setup
We use Xilinx Zynq-7000 SoCs and Xilinx Zynq UltraScale+ MPSoCs to evaluate our proposed methodology. Unless otherwise stated, all experiments in this section use the following setup. The bitstream files are generated by the Xilinx Vivado toolset without encryption. The versions of the Xilinx Vivado toolset used in this work are Vivado 2016.3, Vivado 2017.2, and Vivado 2017.4. The scripts that transform bitstreams into images run in Python 2.7.15. Training and testing of the deep learning models run in Keras 2.2.5 on TensorFlow 1.10.0 for GPUs, with Python 3.5.6. All experiments are performed on a server running CentOS Linux 7.6 with an NVIDIA Tesla P100 GPU.
There are 10 kinds of function blocks to detect. The DNN
YOLOv3 is used as the deep learning-based object detection
algorithm unless otherwise stated. The DNN SSD is only used
in the experiments in Sec. IV-E6. The parameters for the
training process of YOLOv3 and SSD are listed in TABLE I.
To characterize the performance of the object detector quantitatively, mAP (mean Average Precision) under a specific IoU (Intersection over Union) threshold is used as the performance metric, which takes into account both precision and recall. In general, a detected box is considered correct when its IoU with the ground truth exceeds 0.5. Following the COCO evaluation protocol, two metrics are used in our work. One is the mAP at IoU=0.5 (mAP@0.5), which is the metric for the PASCAL VOC dataset [32]. The other is the mAP at IoU=0.75 (mAP@0.75), which is stricter than mAP@0.5.
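For readers who want to reproduce the metric, the sketch below computes the IoU of two axis-aligned boxes and a per-class AP at a given IoU threshold; mAP is then the mean of the per-class APs. It assumes boxes given as (x_min, y_min, x_max, y_max), greedy matching of detections in descending score order, and all-point interpolation as in the later PASCAL VOC protocol. The paper does not state which AP interpolation its evaluation script uses, so this is an illustrative implementation, not the authors' code.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """All-point interpolated AP for one class.

    detections: list of (image_id, score, box)
    ground_truths: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    hits = []  # 1 for a true positive, 0 for a false positive, in score order
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        overlaps = [iou(box, gt) for gt in ground_truths.get(img, [])]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=-1)
        if best >= 0 and overlaps[best] >= iou_threshold and not matched[img][best]:
            matched[img][best] = True
            hits.append(1)
        else:
            hits.append(0)  # low IoU, no ground truth, or duplicate detection
    precision, recall, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precision.append(tp / rank)
        recall.append(tp / n_gt)
    for i in range(len(precision) - 2, -1, -1):  # make precision non-increasing
        precision[i] = max(precision[i], precision[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += (r - prev_recall) * p  # area under the interpolated P-R curve
        prev_recall = r
    return ap
```

With the same detections, raising `iou_threshold` from 0.5 to 0.75 turns loosely localized true positives into false positives, which is why mAP@0.75 is the stricter of the two metrics.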
B. Bitstream Format Information
For the purpose of finding the mapping relationship used for the representation of bitstream, this work analyzes the bitstream format of Xilinx Zynq-7000 SoCs and Xilinx Zynq UltraScale+ MPSoCs. The analysis results for these two families of SoCs are listed in TABLE II. The bitstream format information listed in TABLE II is determined by the FPGA device family.
TABLE II
BITSTREAM FORMAT INFORMATION DETERMINED BY THE FPGA DEVICE FAMILY, TAKING XILINX ZYNQ-7000 SOCS AND XILINX ZYNQ ULTRASCALE+ MPSOCS AS EXAMPLES.
Device family | Xilinx Zynq-7000 SoCs | Xilinx Zynq UltraScale+ MPSoCs
Number of words in a frame (m) | 101 | 93
Number of CLBs in a column (q) | 50 | 60
Number of frames configuring a column of CLBs (n) | 36 | 29 for SLICEL, 79 for SLICEM
Number of words in the middle of a frame not configuring CLBs (l) | 1 | 3
Number of words configuring a CLB (p) | 2 | 1.5
TABLE III
BITSTREAM FORMAT INFORMATION DETERMINED BY THE FPGA DEVICE, TAKING XILINX ZYNQ-7000 SOC Z-7020, XILINX ZYNQ-7000 SOC Z-7030, AND XILINX ZYNQ ULTRASCALE+ MPSOC ZU9EG AS EXAMPLES.
Device name | Xilinx Zynq-7000 SoC Z-7020(1) | Xilinx Zynq-7000 SoC Z-7030(2) | Xilinx Zynq UltraScale+ MPSoC ZU9EG(3)
Number of Clock Regions in programmable logic | 6 | 8 | 25(4)
Number of CLBs in total | 6,650 | 9,825 | 34,260
Number of slices in total | 13,300 | 19,650 | 34,260
Number of frames of the FDRI data in total | 10,008 | 14,796 | 71,260
Number of words of the FDRI data in total | 1,010,808 | 1,494,396 | 6,627,180
(1) Evaluated on Xilinx Zynq-7000 SoC ZC702 Evaluation Board.
(2) Evaluated on Xilinx xc7z030fbg484-3 FPGA.
(3) Evaluated on Xilinx Zynq UltraScale+ ZCU102 Evaluation Board.
(4) The Xilinx Zynq UltraScale+ MPSoC ZU9EG device has 28 Clock Regions in total, 3 of which contain no programmable logic resources.
In order to further analyze the bitstream format information
determined by the FPGA device, this work takes three FPGA
devices, namely Xilinx Zynq-7000 SoC Z-7020, Xilinx Zynq-
7000 SoC Z-7030, and Xilinx Zynq UltraScale+ MPSoC
ZU9EG, as examples. The bitstream format information of
these three FPGA devices is listed in TABLE III. The bitstream
format information listed in TABLE II and TABLE III and the
positions of every first frame are necessary for transforming
the bitstreams into images. Similar bitstream format rules can
be found when analyzing other FPGA devices and other FPGA
device families.
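The entries of TABLE II and TABLE III are mutually consistent, which can be checked with a short script. The sketch below assumes that a frame of m words holds q CLBs of p words each plus the l non-CLB words in the middle, and that the FDRI data is a whole number of frames; this consistency check is our own and is not part of the authors' tooling.

```python
# TABLE II: m = words per frame, q = CLBs per column,
# l = non-CLB words in the middle of a frame, p = words per CLB.
FAMILIES = {
    "Zynq-7000": {"m": 101, "q": 50, "l": 1, "p": 2.0},
    "Zynq UltraScale+": {"m": 93, "q": 60, "l": 3, "p": 1.5},
}

# TABLE III: device -> (family, FDRI frames in total, FDRI words in total).
DEVICES = {
    "Z-7020": ("Zynq-7000", 10_008, 1_010_808),
    "Z-7030": ("Zynq-7000", 14_796, 1_494_396),
    "ZU9EG": ("Zynq UltraScale+", 71_260, 6_627_180),
}


def frame_words(fam):
    """Assumed frame layout: q CLBs of p words each, plus l words in the middle."""
    return fam["q"] * fam["p"] + fam["l"]


def fdri_words(device):
    """FDRI data length in words = number of frames x words per frame."""
    family, frames, _ = DEVICES[device]
    return frames * FAMILIES[family]["m"]


if __name__ == "__main__":
    for name, fam in FAMILIES.items():
        print(name, frame_words(fam), "==", fam["m"])
    for dev, (_, _, words) in DEVICES.items():
        print(dev, fdri_words(dev), "==", words)
```

Both families satisfy m = q·p + l (50×2+1 = 101 and 60×1.5+3 = 93), and every FDRI word count in TABLE III equals its frame count times m.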
TABLE IV
PARAMETERS FOR TRANSFORMING THE BITSTREAMS INTO IMAGES, TAKING XILINX ZYNQ-7000 SOC Z-7020, XILINX ZYNQ-7000 SOC Z-7030, AND XILINX ZYNQ ULTRASCALE+ MPSOC ZU9EG AS EXAMPLES.
Device name | Xilinx Zynq-7000 SoC Z-7020 | Xilinx Zynq-7000 SoC Z-7030 | Xilinx Zynq UltraScale+ MPSoC ZU9EG
Image size of a slice | 6×8×3 | 6×8×3 | 7×9×3 for SLICEL, 7×23×3 for SLICEM
Image size of a CLB | 6×16×3 | 6×16×3 | 7×9×3 for SLICEL, 7×23×3 for SLICEM
Number of blocks into which the device diagram is divided (a×b) | 150×57 | 200×60 | 420×97 (46 for SLICEL, 51 for SLICEM)
Size of the image-coded representation (bytes) | 900×912×3 | 1200×960×3 | 2940×1587×3
Bitstream length (bits) | 32,364,512 | 47,839,328 | 212,086,240
Bitstream length, rounded (MiB) | 3.86 | 5.70 | 25.28
Compression ratio (%) | 60.86 | 57.79 | 52.80
C. Bitstream Representation
According to the bitstream format analysis results in Sec. IV-B, the bitstreams are transformed into images by the proposed image-coded representation of bitstream. Taking Xilinx Zynq-7000 SoC Z-7020, Xilinx Zynq-7000 SoC Z-7030, and Xilinx Zynq UltraScale+ MPSoC ZU9EG as examples, the parameters for transforming the bitstreams into images are listed in TABLE IV.
For instance, the bitstream length of the ZC702 FPGA is 3.86 MiB, and the image-coded representation of its bitstream is 900×912×3 bytes. The representation therefore compresses the data to (900×912×3×8 bits / 32,364,512 bits)×100% = 60.86% of the size of the original bitstream. This compression comes mainly from dropping the configuration bits that are irrelevant to the logic resources. The parameters for transforming the bitstreams into images differ between FPGA devices and are determined by the bitstream format information of each device.
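The image sizes and compression ratios in TABLE IV can be reproduced from the block counts. The sketch below assumes that a counts the rows and b the columns of CLB blocks, that the image height is a times the block height, that the width is the sum of the per-column block widths (ZU9EG mixes 46 SLICEL and 51 SLICEM columns), and that each channel value occupies one byte. The computed ratios agree with TABLE IV to within 0.01 percentage points, and the bitstream lengths (3.858, 5.703, and 25.283 MiB) match the table when rounded to two decimals. For Zynq-7000 a CLB's n·p·32 = 2,304 configuration bits fill its 6×16×3-byte block exactly; for ZU9EG the block sizes leave a few bytes of slack, which this sketch does not model.

```python
def image_shape(clb_rows, block_height, column_widths):
    """Height = rows of CLBs x block height; width = sum of per-column block widths."""
    return clb_rows * block_height, sum(column_widths), 3


def compression_ratio(shape, bitstream_bits):
    """Image size as a percentage of the bitstream size (one byte per channel value)."""
    height, width, channels = shape
    return height * width * channels * 8 / bitstream_bits * 100


def mib(bits):
    """Convert a length in bits to mebibytes."""
    return bits / 8 / 2**20


# a x b and block sizes from TABLE IV.
DEVICES = {
    "Z-7020": (image_shape(150, 6, [16] * 57), 32_364_512),
    "Z-7030": (image_shape(200, 6, [16] * 60), 47_839_328),
    "ZU9EG": (image_shape(420, 7, [9] * 46 + [23] * 51), 212_086_240),
}

if __name__ == "__main__":
    for name, (shape, bits) in DEVICES.items():
        print(name, shape, f"{mib(bits):.3f} MiB", f"{compression_ratio(shape, bits):.3f} %")
    # Zynq-7000: n = 36 frames x p = 2 words x 32 bits per CLB vs. a 6x16x3-byte block.
    print(36 * 2 * 32 == 6 * 16 * 3 * 8)
```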
The Vivado implemented designs and the image-coded representations of bitstream of Xilinx Zynq-7000 SoC Z-7020, Xilinx Zynq-7000 SoC Z-7030, and Xilinx Zynq UltraScale+ MPSoC ZU9EG are shown in Fig. 7. The dark blue areas in the Vivado implemented designs represent the columns of CLBs whose configuration bits are used for the bitstream representation. The correspondence between the Vivado implemented design (Fig. 7 (a), (c), and (e)) and the image-coded representation of bitstream (Fig. 7 (b), (d), and (f)) for the same FPGA device confirms that the mapping relationship between configuration bits and CLBs found by this work is correct and that the image-coded representation of bitstream reflects the adjacency of the programmable logic.
Fig. 7. Vivado implemented designs and the image-coded representation. (a) Vivado implemented design and (b) the image-coded representation of bitstream of Xilinx Zynq-7000 SoC Z-7020. (c) Vivado implemented design and (d) the image-coded representation of bitstream of Xilinx Zynq-7000 SoC Z-7030. (e) Vivado implemented design and (f) the image-coded representation of bitstream of Xilinx Zynq UltraScale+ MPSoC ZU9EG.
In summary, the two challenges mentioned in Sec. III-C
have been overcome by the proposed image-coded represen-
tation of bitstream.
D. Dataset Description
To train and test the DNN models, a large number of bitstreams implementing FPGA designs on the ZC702 FPGA are generated to form the dataset. In total, 15 kinds of application-specific encryption algorithms are chosen to generate 15,104 bitstream files, and these encryption algorithms contain 10 kinds of cryptographic operators. The application-specific encryption algorithms used to generate the dataset and the cryptographic operators they contain are listed in TABLE V. Each encryption algorithm used in this work contains up to 3 kinds of cryptographic operators, and each kind of cryptographic operator is implemented in one or two constructions: a pipeline construction implements the cryptographic operator as a pipelined design, while a module construction implements it without special design techniques. For example, a bitstream implementing the encryption algorithm used for NTLM (NT LAN Manager) contains either an MD4 pipeline or an MD4 module. A bitstream implementing the encryption algorithm used for PDF-R2 contains an MD5 (Message Digest Algorithm 5) pipeline and an RC4 (Rivest Cipher 4) module.
To arrange the experiments, 13 kinds of encryption algorithms in TABLE V, comprising 10,047 bitstreams generated by Xilinx Vivado 2017.4 in total, are chosen to form the training set and the test set. These bitstreams are randomly split into the training set and the test set in a 4:1 ratio. The encryption algorithms used for PDF-R5 and OFFICE 2010, implemented with Xilinx Vivado 2017.4, are used for testing only. In addition, to explore the effect of EDA tools, bitstreams generated by Xilinx Vivado 2016.3 and Xilinx Vivado 2017.2 are used to test the trained model. All of these bitstreams are transformed into images to form the datasets for training and testing in Sec. IV-E.
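The 4:1 random split can be sketched as follows. This is a minimal illustration: the function name and seed are hypothetical, the split is over the pooled images (the text does not say whether it was stratified per encryption algorithm), and the held-out PDF-R5, OFFICE 2010, Vivado 2016.3, and Vivado 2017.2 bitstreams are simply never passed to the function.

```python
import random


def split_4_to_1(items, seed=0):
    """Randomly split items into a training set and a test set in a 4:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 4 / 5)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Stand-ins for the 10,047 images from the 13 training-and-testing algorithms.
    train, test = split_4_to_1(range(10_047))
    print(len(train), len(test))
```

For the 10,047 images this yields 8,038 training and 2,009 test images.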
E. Function Block Detection
The experimental results of function block detection are evaluated on the ZC702 FPGA. First, the evaluation results on the test set with the same distribution as the training set are shown, together with the evaluation results on encryption algorithms not appearing in the training set. Then, the effects of EDA tools, the input size of the DNN, and the data arrangement of the image-coded representation are discussed. Finally, another deep learning-based object detection algorithm, SSD, is applied and its performance is presented.
1) Evaluation results on the test set: The first 13 kinds of encryption algorithms listed in TABLE V are chosen to make up the training set and the test set. Since the images transformed from the bitstreams of these 13 kinds of encryption algorithms are randomly split into the training set and the test set in a 4:1 ratio, the test set has the same distribution as the training set.
In this experiment, we train the detector on the training set for about 16 hours and test it on the test set with the same distribution as the training set. As an example, Fig. 8 shows the function block detection result for a bitstream file implementing the encryption algorithm used for PDF-R2 on the ZC702 FPGA. The function blocks in this image are marked with boxes, each labeled with its category and a classification probability.
TABLE V
COMPONENTS OF THE TRAINING SET AND TEST SET.
Vivado version | Application of encryption algorithm | Number of cryptographic operators contained | Cryptographic operators | Implementation constructions | Number of bitstreams
For training and testing:
2017.4 | LM | 1 | DES | DES pipeline; DES module | 1,294
2017.4 | NTLM | 1 | MD4 | MD4 pipeline; MD4 module | 1,135
2017.4 | PDF-R2 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 2,252
2017.4 | OFFICE 2003 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 419
2017.4 | TrueCrypt | 3 | AES, Serpent, Twofish | AES module, Serpent module, Twofish module | 1,842
2017.4 | OFFICE 2007 | 1 | SHA-1 | SHA-1 pipeline | 429
2017.4 | WPA-2 | 1 | SHA-1 | SHA-1 pipeline | 451
2017.4 | WINZIP | 1 | SHA-1 | SHA-1 pipeline | 542
2017.4 | RAR3 | 1 | SHA-1 | SHA-1 module | 165
2017.4 | RAR5 | 1 | SHA-256 | SHA-256 pipeline | 245
2017.4 | 7ZIP | 1 | SHA-256 | SHA-256 pipeline | 242
2017.4 | OFFICE 2013 | 1 | SHA-512 | SHA-512 module | 303
2017.4 | LINUX-SHA512 | 1 | SHA-512 | SHA-512 module | 728
Total (training and testing) | | | | | 10,047
For testing only:
2016.3 | PDF-R2 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 587
2016.3 | PDF-R5 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 1,107
2017.2 | PDF-R2 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 587
2017.2 | PDF-R5 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 1,407
2017.4 | PDF-R5 | 2 | MD5, RC4 | MD5 pipeline, RC4 module | 1,107
2017.4 | OFFICE 2010 | 1 | SHA-1 | SHA-1 pipeline | 232
Total (testing only) | | | | | 5,057
Fig. 8. The function block detection result of a bitstream file implementing the encryption algorithm used for PDF-R2 on ZC702 FPGA.
TABLE VI shows the evaluation results under the metrics described in Sec. IV-A, which quantify the performance of the detector on the test set. The AP (Average Precision)@0.5 values of the 10 kinds of cryptographic operators are all at least 85.33%, and mAP@0.5 reaches 97.72%. Even under the stricter metric, mAP@0.75 reaches 95.87%. The detector therefore performs well on a test set with the same distribution as the training set.
The image-coded representation of bitstream reflects both the resource utilization of function blocks and the adjacency of the CLBs they use, and different kinds of function blocks differ in these two aspects. For example, the SHA-256 function block takes up nearly all of the LUT resources of the ZC702 FPGA, whereas the RC4 module occupies no more than 1% of the FF resources and approximately 3% of the LUT resources. When implemented in the same construction, two function blocks of the same kind look similar but appear in different locations in different images. Because the image-coded representation of bitstream preserves characteristics that distinguish one kind of function block from another, it is effective for generating a dataset for deep learning. The performance of the detector also shows that YOLOv3 successfully learns to detect the function blocks from the images.
Some kinds of cryptographic operators, such as SHA-1 and SHA-256, are detected well. This is because these cryptographic operators occupy a large area of the images, so it is not difficult to predict a box whose IoU with the ground truth exceeds 0.5 or 0.75.
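A simple calculation illustrates why large function blocks are easy to localize and small ones are not. For an axis-aligned box of width w shifted horizontally by d < w pixels, the IoU with the original box is (w − d)/(w + d), independent of its height, so the same absolute localization error costs far more IoU on a small box. The sketch below is our own illustration with hypothetical box sizes, not an analysis from the experiments.

```python
def shifted_iou(width, height, shift):
    """IoU between a width x height box and a copy shifted horizontally by `shift` pixels."""
    overlap = max(0, width - shift) * height
    union = 2 * width * height - overlap
    return overlap / union


if __name__ == "__main__":
    for w in (400, 40, 20):  # hypothetical box widths in pixels
        print(w, round(shifted_iou(w, w, 8), 3))
```

With an 8-pixel offset, the IoU stays at about 0.96 for a 400-pixel box but falls to about 0.67 for a 40-pixel box (below the 0.75 threshold) and about 0.43 for a 20-pixel box (below 0.5).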
Some kinds of cryptographic operators, such as AES and Serpent, have relatively low AP, for two reasons. 1) These cryptographic operators always occupy a small fraction of the FPGA resources, and it is difficult to distinguish a single function block with high resource utilization from a combination of several function blocks with low resource utilization. 2) There is more than one function block in an image. Detecting a function block in a system with various function blocks is more difficult than in a system with a single function block, because the boundary between two function blocks in an image is hard to distinguish.
2) Effectiveness of detecting cryptographic operators: In this experiment, the trained model is tested on bitstream files implementing encryption algorithms that do not appear in the training set, to confirm its capability to detect cryptographic operators. The training process is the same as in Sec. IV-E1. We choose 1,339 bitstreams for testing in this experiment; they implement the encryption algorithms used for PDF-R5 and OFFICE 2010 and are generated by Vivado 2017.4. The function blocks in these bitstreams have the same implementation constructions as those in the training set.
TABLE VI
EVALUATION RESULTS ON THE TEST SET WITH THE SAME DISTRIBUTION AS THE TRAINING SET.
Function block | mAP | AES | DES | MD4 | MD5 | RC4 | Serpent | SHA-1 | SHA-256 | SHA-512 | Twofish
AP@0.5 (%) | 97.72 | 93.61 | 99.92 | 100.00 | 100.00 | 99.52 | 85.33 | 100.00 | 100.00 | 98.78 | 100.00
AP@0.75 (%) | 95.87 | 77.52 | 99.76 | 99.98 | 99.23 | 99.26 | 83.85 | 100.00 | 100.00 | 99.13 | 100.00
TABLE VII
EVALUATION RESULTS ON THE ENCRYPTION ALGORITHMS NOT APPEARING IN THE TRAINING SET.
Encryption algorithm | Function block | AP@0.5 (%) | AP@0.75 (%)
PDF-R5 | (1) MD5 pipeline | 100.00 | 99.60
PDF-R5 | (2) RC4 module | 97.13 | 94.98
OFFICE 2010 | (3) SHA-1 pipeline | 100.00 | 100.00
TABLE VII shows the results of this experiment. The evaluation results on the encryption algorithms used for PDF-R5 and OFFICE 2010 show that the detector also performs well on encryption algorithms not appearing in the training set, because the cryptographic operators in these encryption algorithms have the same implementation constructions as those in the training set. TABLE VI and TABLE VII thus indicate that the detector can detect function blocks whose constructions appear in the training set, regardless of whether the encryption algorithms themselves appear in the training set.
Although the RC4 module appears in the training set, its small occupied area accounts for its relatively low AP. In contrast, the SHA-1 pipeline occupies a large area, and the detector can distinguish SHA-1 from other cryptographic operators that also occupy a large area, such as SHA-256. Therefore, the SHA-1 pipeline has a very high AP.
3) Effect of EDA tools: In this experiment, the model is trained on the bitstreams generated by Vivado 2017.4 and tested on bitstreams generated by other versions of Xilinx Vivado, namely Vivado 2016.3 and Vivado 2017.2. The training process is the same as in Sec. IV-E1. The test bitstreams implement the encryption algorithms used for PDF-R2 and PDF-R5 and contain function blocks with the same implementation constructions as those in the training set. The encryption algorithm used for PDF-R2 is included in the training set, whereas the encryption algorithm used for PDF-R5 is not. This experiment explores the effect of the EDA tools, which, although provided by the FPGA vendors, are updated continually.
The evaluation results are shown in Fig. 9, where the results on bitstreams generated by Vivado 2016.3 and Vivado 2017.2 are compared with those on bitstreams generated by Vivado 2017.4. The AP of MD5 is hardly affected by the change of EDA tool. The AP of RC4 evaluated on bitstreams from the other EDA tools, namely Vivado 2016.3 and Vivado 2017.2, is lower than that evaluated on Vivado 2017.4. Although placement and routing differ somewhat when the same design is optimized by different versions of the Vivado toolset, the resource utilization of function blocks does not change much, and the effect of EDA tools on the detection accuracy is limited. These results show that the model trained on bitstreams generated by Vivado 2017.4 can detect the function blocks in bitstreams generated by Vivado 2016.3 and Vivado 2017.2, although the detection accuracy of function blocks with very low resource utilization may decrease.
Test bitstreams | (a) MD5 AP@0.5 (%) | (a) RC4 AP@0.5 (%) | (b) MD5 AP@0.75 (%) | (b) RC4 AP@0.75 (%)
Vivado 2016.3 PDF-R2 | 100.00 | 82.21 | 97.43 | 83.55
Vivado 2016.3 PDF-R5 | 100.00 | 88.79 | 98.96 | 85.43
Vivado 2017.2 PDF-R2 | 100.00 | 95.51 | 95.39 | 93.70
Vivado 2017.2 PDF-R5 | 99.64 | 87.38 | 96.56 | 84.88
Vivado 2017.4 test set | 100.00 | 99.52 | 99.23 | 99.26
Vivado 2017.4 PDF-R5 | 100.00 | 97.13 | 99.60 | 94.98
Fig. 9. Evaluation results on the bitstreams generated by different EDA tools. (a) The AP@0.5 and (b) the AP@0.75 of MD5 and RC4 evaluated on the bitstreams generated by different EDA tools.
4) Effect of input size: In this experiment, five models are trained with different input sizes of YOLOv3 to explore the effect of the input size on detection accuracy. The images are resized to the input size of YOLOv3 before being fed into the DNN. Since YOLOv3 is a fully convolutional network without any fully connected layers, changing the input size does not change the number of weight parameters in any layer. The hyperparameters of the models with different input sizes are the same as those of the model trained in Sec. IV-E1, whose input size is 416×416×3. The training set and test set are the same as in Sec. IV-E1 and Sec. IV-E2. The bitstreams in this experiment are all generated by Vivado 2017.4.
Input size of YOLOv3 | 224×224×3 | 288×288×3 | 352×352×3 | 416×416×3 | 608×608×3
(a) mAP@0.5 (%) | 94.23 | 97.15 | 96.07 | 97.72 | 96.45
(a) mAP@0.75 (%) | 92.00 | 95.59 | 94.83 | 95.87 | 91.35
(b) MD5 AP@0.5 (%) | 99.40 | 100.00 | 100.00 | 100.00 | 99.98
(b) RC4 AP@0.5 (%) | 89.80 | 95.89 | 98.97 | 97.13 | 99.62
Fig. 10. Evaluation results of five models with different input sizes on (a) the test set and (b) the encryption algorithm used for PDF-R5.
The mAP of the 10 kinds of function blocks on the test set with the same distribution as the training set is shown in Fig. 10 (a). The five models are also evaluated on the encryption algorithm used for PDF-R5, which does not appear in the training set; these results are shown in Fig. 10 (b). Considering both sets of results in Fig. 10, the model with the input size of 224×224×3 has the worst performance among the five models, while the other four models perform similarly. There is therefore no need to choose an overly large input size for the DNN, because a larger input size increases the computation.
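Although the number of weights is fixed, the size of every feature map, and hence the computation and the number of candidate boxes, grows with the input size. The sketch below assumes the standard YOLOv3 output strides of 32, 16, and 8 (consistent with all tested sizes being multiples of 32) and counts the candidate boxes for each input size used in Fig. 10; the helper names are our own.

```python
STRIDES = (32, 16, 8)  # downsampling factors of the three YOLOv3 output scales
BOXES_PER_CELL = 3


def grid_sizes(input_size):
    """Side length of the output grid at each scale for a square input."""
    if input_size % 32:
        raise ValueError("input size must be a multiple of 32")
    return [input_size // s for s in STRIDES]


def predicted_boxes(input_size):
    """Total number of candidate boxes YOLOv3 predicts for one image."""
    return sum(g * g * BOXES_PER_CELL for g in grid_sizes(input_size))


if __name__ == "__main__":
    for size in (224, 288, 352, 416, 608):
        print(f"{size}x{size}x3", grid_sizes(size), predicted_boxes(size))
```

For 416×416×3 this gives 13×13, 26×26, and 52×52 grids and 10,647 candidate boxes per image, versus 3,087 for 224×224×3; the feature-map area, and with it the convolution cost, scales with the square of the input side.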
5) Effect of the order in which the configuration bits of a
slice correspond to the image: In this experiment, there are
three models trained on different image datasets. The three
image datasets are transformed from the same bitstreams by
different representation methods. The only difference among
the three representation methods is the order in which the
configuration bits of a slice correspond to the image. The first
order is (channel, height, width), as shown in Fig. 6. The
other two orders are (height, width, channel) and (channel,
width, height), respectively. The training processes are the
same as mentioned in Sec. IV-E1. The bitstreams in this
experiment are all generated by Vivado 2017.4. The purpose
of this experiment is to explore the effect of the order in which
the configuration bits of a slice correspond to the image.
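The three orders differ only in how a slice's configuration bytes are laid out inside its image block. The NumPy sketch below assumes the slice's bytes arrive as a flat sequence of 6×8×3 = 144 bytes (the Zynq-7000 slice block in TABLE IV) that is filled in the stated axis order; the labels "chw", "hwc", and "cwh" are our own shorthand, and the exact byte sequence of Fig. 6 is not reproduced here.

```python
import numpy as np

H, W, C = 6, 8, 3  # Zynq-7000 slice block size from TABLE IV


def slice_to_block(slice_bytes, order):
    """Lay out a slice's configuration bytes as an H x W x C image block.

    `order` lists the axes from slowest- to fastest-varying:
    "chw" = (channel, height, width), "hwc" = (height, width, channel),
    "cwh" = (channel, width, height).
    """
    flat = np.asarray(slice_bytes, dtype=np.uint8).ravel()
    if flat.size != H * W * C:
        raise ValueError(f"expected {H * W * C} bytes, got {flat.size}")
    if order == "chw":
        return flat.reshape(C, H, W).transpose(1, 2, 0)
    if order == "hwc":
        return flat.reshape(H, W, C)
    if order == "cwh":
        return flat.reshape(C, W, H).transpose(2, 1, 0)
    raise ValueError(f"unknown order: {order!r}")


if __name__ == "__main__":
    data = np.arange(H * W * C)  # stand-in for real configuration bytes
    for order in ("chw", "hwc", "cwh"):
        block = slice_to_block(data, order)
        print(order, block.shape, block[0, 0])  # the first pixel differs by order
```

Each order is a permutation of the same bytes, so an unused (all-zero) slice stays all-zero under every order, and a used slice stays non-zero.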
The mAP of the 10 kinds of function blocks on the test set with the same distribution as the training set is listed in TABLE VIII. The three models are also evaluated on the encryption algorithm used for PDF-R5, and the evaluation results are listed in TABLE IX. These results indicate that the order in which the configuration bits of a slice correspond to the image has almost no effect on the detection accuracy. We infer that as long as the representation of bitstream reflects whether each slice is used or not, the detection accuracy will not be affected.
TABLE VIII
EVALUATION RESULTS OF THREE MODELS WITH DIFFERENT ORDERS, IN WHICH THE CONFIGURATION BITS OF A SLICE CORRESPOND TO THE IMAGE, ON THE TEST SET.
Order | mAP@0.5 (%) | mAP@0.75 (%)
(channel, height, width) | 97.72 | 95.87
(height, width, channel) | 97.46 | 95.28
(channel, width, height) | 97.64 | 95.71
TABLE IX
EVALUATION RESULTS OF THREE MODELS WITH DIFFERENT ORDERS, IN WHICH THE CONFIGURATION BITS OF A SLICE CORRESPOND TO THE IMAGE, ON THE ENCRYPTION ALGORITHM USED FOR PDF-R5.
Order | Function block | AP@0.5 (%) | AP@0.75 (%)
(channel, height, width) | MD5 | 100.00 | 99.51
(channel, height, width) | RC4 | 97.13 | 94.99
(height, width, channel) | MD5 | 100.00 | 99.66
(height, width, channel) | RC4 | 96.90 | 95.39
(channel, width, height) | MD5 | 100.00 | 99.52
(channel, width, height) | RC4 | 96.44 | 94.38
TABLE X
EVALUATION RESULTS OF YOLOV3 AND SSD ON THE TEST SET WITH THE SAME DISTRIBUTION AS THE TRAINING SET.
Deep neural network (input size) | mAP@0.5 (%) | mAP@0.75 (%)
YOLOv3 (416×416×3) | 97.72 | 95.87
SSD (300×300×3) | 96.34 | 81.54
6) Application of other deep learning-based object detection algorithms to function block detection: In this experiment, SSD is applied to bitstream function block detection to demonstrate that the methodology does not depend on choosing YOLOv3 as the deep learning-based object detection algorithm. The training set and test set are the same as in Sec. IV-E1, and the bitstreams are generated by Vivado 2017.4.
The evaluation results of SSD on the test set are listed in TABLE X, together with the evaluation results of YOLOv3 from Sec. IV-E1. The results show that SSD is also capable of detecting function blocks, so the methodology generalizes to some degree and other high-performance deep learning-based object detection algorithms can be applied. However, SSD performs worse than YOLOv3: its mAP@0.5 is only slightly lower than that of YOLOv3 (96.34% vs. 97.72%), but its mAP@0.75 is much lower (81.54% vs. 95.87%), indicating that the localization accuracy of SSD is poor in this scenario.
F. Processing Time
Taking Xilinx Zynq-7000 SoC Z-7020, Xilinx Zynq-7000 SoC Z-7030, and Xilinx Zynq UltraScale+ MPSoC ZU9EG as examples, the processing time of transforming a bitstream into an image is reported in Fig. 11, together with the bitstream length. The processing time is measured on a single Intel Xeon Gold 5118 CPU@2.30GHz. The processing time of transforming a bitstream into an image is approximately proportional to the bitstream length, which is determined by the FPGA device.
FPGA device | Xilinx Zynq-7000 SoC Z-7020 | Xilinx Zynq-7000 SoC Z-7030 | Xilinx Zynq UltraScale+ MPSoC ZU9EG
Bitstream length (MiB) | 3.86 | 5.70 | 25.28
Processing time (s) | 3.44 | 5.21 | 27.89
Fig. 11. The processing time of transforming a bitstream into an image on a single Intel Xeon Gold 5118 CPU@2.30GHz.
As shown in Fig. 12, the processing time per image of the YOLOv3 inference process varies with the input size of the DNN. Because the images are resized to the input size of the DNN before being fed into it, the processing time of YOLOv3 inference does not depend on the size of the image-coded representation of bitstream. For inference, the confidence threshold is set to 0.5 and the IoU threshold for non-maximum suppression is set to 0.45. The processing time, measured on an NVIDIA Tesla P100 GPU, increases with the input size of YOLOv3. In addition, the processing time of the SSD inference process is 0.0803 s per image, also measured on an NVIDIA Tesla P100 GPU, with the same thresholds as YOLOv3 and an input size of 300×300×3.
Input size of YOLOv3 | 224×224×3 | 288×288×3 | 352×352×3 | 416×416×3 | 608×608×3
Processing time per image (s) | 0.0622 | 0.0696 | 0.0724 | 0.0801 | 0.1123
Fig. 12. The processing time per image of the YOLOv3 inference process on an NVIDIA Tesla P100 GPU.
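The two thresholds correspond to the usual post-processing of detector outputs: boxes whose confidence is below 0.5 are discarded, and greedy non-maximum suppression (NMS) removes any remaining box that overlaps an already kept, higher-scoring box by more than IoU 0.45. The per-class sketch below is a generic illustration under that assumption, not the code of the YOLOv3 or SSD implementations used in the experiments.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.45):
    """Greedy NMS for the boxes of one class; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] < score_threshold:
            break  # all remaining boxes have even lower scores
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30), (0, 0, 10, 10)]
    scores = [0.9, 0.8, 0.7, 0.3]
    print(nms(boxes, scores))  # box 1 overlaps box 0, box 3 is below 0.5
```

Here box 1 is suppressed because its IoU with box 0 is about 0.82, box 3 is dropped by the confidence threshold, and the call returns [0, 2].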
G. Recap
Based on the above experimental results and analysis, the effectiveness of the proposed bitstream function block detection methodology is demonstrated. The following points summarize the insights from the experimental results:
1) Similar bitstream format rules can be found in several
FPGA devices by bitstream format analysis, which is the
basis of the image-coded representation of bitstream. The
image-coded representation can reflect the adjacency of the
programmable logic and the image-coded representation has a
suitable size without losing useful information.
2) The deep learning-based object detection algorithm has
the capability to detect the function blocks with the same
constructions in the training set, no matter whether the system
designs appear in the training set or not.
3) The model trained on the bitstreams generated by one
version of Xilinx Vivado can also detect function blocks from
the bitstreams generated by other versions of Xilinx Vivado.
4) A model with too small an input size has lower detection accuracy, while a model with too large an input size brings no accuracy improvement and increases the computation cost.
5) The order in which the configuration bits of a slice
correspond to the image has almost no effect on the detection
accuracy, as long as the image-coded representation of a slice
can reflect whether the slice is used or not.
6) In the methodology, other deep learning-based object
detection algorithms with high performance can also be chosen
to detect function blocks from the bitstream.
V. RELATED WORKS
In this section, some related works about bitstream format
analysis, bitstream reverse engineering, and deep learning-
based circuit classification are presented.
An FPGA bitstream contains the programming information that configures the programmable logic of an FPGA device. Because FPGA vendors do not disclose the bitstream format, many works have analyzed the format of bitstream [1], [33], [34], [35].
Ziener et al. [1] extracted the content of LUTs in the bitstream
of Xilinx Virtex-II and Virtex-II Pro FPGAs to identify IP
cores in the FPGAs. Le Roux et al. [33] analyzed the bitstream
of Xilinx Virtex-5 FPGAs to manipulate the configuration
bits of LUTs for the purpose of reconfiguring the FPGAs in
real-time. There are also some related works analyzing the
bitstream format of the later Xilinx 7-series FPGAs [34], [35].
Dang Pham et al. [34] provided a tool called BITMAN that
supports bitstream manipulations, such as module placement,
module relocation, and so on. COMET [35] is a tool supporting bitstream analysis, visualization, and manipulation. Such bitstream manipulation provides a means to perform partial reconfiguration or fault injection.
There are many works implementing FPGA bitstream re-
verse engineering based on bitstream analysis. Some of them
reverse the bitstream to a Xilinx Design Language (XDL) level
representation of the netlist or a Native Circuit Description
(NCD) file [11], [12], [13]. Some of them further reverse
the netlist file to Register Transfer Level (RTL) code [14].
These works analyze the bitstream format and gather databases
containing the mapping relationship from the configuration
bits in the bitstream to their related configurable elements.
Reverse engineering is implemented through these databases.
Bitstream reverse engineering aims at performing analysis or detecting hardware Trojans based on the netlist or RTL code.
Dai et al. [3] put forward a CNN method for arithmetic
operator classification and detection from gate-level circuits
rather than bitstream-level, and discussed the importance of
representation of circuits for CNN processing. Fayyazi et
al. [4] also presented a CNN-based gate-level circuit recogni-
tion method and used a vector-based representation for the
CNN processing. The method can be used to detect the
13
hardware Trojans or classify circuits implementing different
arithmetic operators, such as an adder and a multiplier. Mahmood
et al. [5] proposed using neural networks to judge whether a
partial bitstream contains a hardware module implementing an add
operation, but their work does not analyze or process the raw
bitstream data. Neto et al. [36] also took advantage of the latest
progress of DNNs in image classification by proposing a binary
image-like circuit representation of Boolean logic functions.
However, the work in [36] only used DNNs to choose the best
optimization method for the partitioned circuits. Moreover, the
binary image-like representation of Boolean logic functions in
[36] differs from the three-channel color image-coded
representation proposed in this paper, which represents the
configuration bits in the bitstream.
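To make the contrast between a binary image and a three-channel image concrete, the sketch below packs the same bit sequence in two ways: one bit per pixel, giving a single-channel 0/1 image, and 24 bits per pixel spread over three 8-bit channels. The function names, the row-major order, and the 8-bits-per-channel grouping are assumptions chosen for illustration; they are neither the Boolean-function encoding of [36] nor the CLB-oriented bit arrangement defined earlier in this paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def bits_to_binary_image(bits: List[int], width: int) -> List[List[int]]:
    """One bit per pixel: a single-channel 0/1 image (row-major, zero-padded)."""
    padded = bits + [0] * (-len(bits) % width)
    return [padded[r:r + width] for r in range(0, len(padded), width)]


def bits_to_rgb_image(bits: List[int], width: int) -> List[List[Pixel]]:
    """24 consecutive bits per pixel: 8 bits each for R, G, B, most significant bit first."""
    padded = bits + [0] * (-len(bits) % (24 * width))
    pixels: List[Pixel] = []
    for start in range(0, len(padded), 24):
        chunk = padded[start:start + 24]
        channels = []
        for c in range(3):
            value = 0
            for bit in chunk[8 * c:8 * c + 8]:
                value = (value << 1) | bit  # shift in one bit per step
            channels.append(value)
        pixels.append((channels[0], channels[1], channels[2]))
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]
```

For a given bit sequence, the three-channel packing produces one twenty-fourth as many pixels as the binary image, at the cost of each channel value mixing eight bits.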
VI. CONCLUSIONS
In this paper, we have proposed an FPGA bitstream function
block detection method built upon deep learning techniques.
First, we analyze the bitstream format and find the mapping
relationship between the configuration bits and the CLB
elements. Then, we propose a three-channel color image-coded
representation of the bitstream, which reflects the adjacency of
the programmable logic and transforms bitstreams into images
suitable for deep learning processing. A dataset of 15,104
images transformed from bitstreams is generated without manual
annotation, and a deep learning-based object detection algorithm
is trained on this dataset to detect function blocks in FPGA
bitstreams. Dataset generation, deep learning training, and
testing can all be automated with scripts. Experimental results
show that the mAP (IoU=0.5) for 10 kinds of function blocks
reaches 97.72% when YOLOv3 is used as the object detector. The
detector is also shown to detect function blocks in bitstreams
of system designs that do not appear in the training set, as
well as in bitstreams generated by other EDA tools.
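The mAP (IoU=0.5) figure can be read as follows: a detection counts as a true positive only if its box overlaps a not-yet-matched ground-truth box of the same class with Intersection over Union (IoU) of at least 0.5; AP is the area under the resulting precision-recall curve, and mAP averages AP over the function block classes. The sketch below assumes boxes given as (x_min, y_min, x_max, y_max) and uses all-point interpolation, one common variant of the PASCAL VOC [32] metric; the evaluation script behind the reported 97.72% may differ in details such as the interpolation scheme.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, float, Box]       # (image id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Dict[str, List[Box]],
                      thr: float = 0.5) -> float:
    """AP for one class: greedy matching by descending score, all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    precisions, recalls = [], []
    tp = fp = 0
    for img, _score, box in sorted(dets, key=lambda d: d[1], reverse=True):
        overlaps = [iou(box, gt) for gt in gts.get(img, [])]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=-1)
        if best >= 0 and overlaps[best] >= thr and not used[img][best]:
            used[img][best] = True  # each ground-truth box may be matched only once
            tp += 1
        else:
            fp += 1  # low overlap, or a duplicate detection of an already-matched box
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Precision envelope: make it non-increasing from right to left, then sum rectangle areas.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(
        per_class: Dict[str, Tuple[Sequence[Detection], Dict[str, List[Box]]]]) -> float:
    """mAP: the unweighted mean of the per-class AP values."""
    aps = [average_precision(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps)
```

Passing one (detections, ground truths) pair per function block class to `mean_average_precision` yields a single figure with the same definition as the mAP (IoU=0.5) quoted above, though not necessarily computed with identical implementation details.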
REFERENCES
[1] D. Ziener, S. Assmus, and J. Teich, “Identifying FPGA IP-cores based
on lookup table content analysis,” in International Conference on Field
Programmable Logic and Applications (FPL), Aug. 2006, pp. 1–6.
[2] J. Couch, E. Reilly, M. Schuyler, and B. Barrett, “Functional block
identification in circuit design recovery,” in IEEE International
Symposium on Hardware Oriented Security and Trust (HOST), May 2016,
pp. 75–78.
[3] Y.-Y. Dai and R. K. Brayton, “Circuit recognition with deep learning,”
in IEEE International Symposium on Hardware Oriented Security and
Trust (HOST), May 2017, p. 162.
[4] A. Fayyazi, S. Shababi, P. Nuzzo, S. Nazarian, and M. Pedram, “Deep
learning-based circuit recognition using sparse mapping and level-dependent
decaying sum circuit representations,” in Design, Automation & Test in
Europe Conference & Exhibition (DATE), Mar. 2019, pp. 638–641.
[5] S. Mahmood, J. Rettkowski, A. Shallufa, M. Hübner, and D. Göhringer,
“IP core identification in FPGA configuration files using machine
learning techniques,” in IEEE International Conference on Consumer
Electronics (ICCE-Berlin), Sep. 2019, pp. 103–108.
[6] P. Liu, S. Li, and Q. Ding, “An energy-efficient accelerator based
on hybrid CPU-FPGA devices for password recovery,” IEEE Trans.
Comput., vol. 68, no. 2, pp. 170–181, Feb. 2019.
[7] S. E. Quadir, J. Chen, D. Forte, N. Asadizanjani, S. Shahbazmohamadi,
L. Wang, J. A. Chandy, and M. M. Tehranipoor, “A survey on chip
to system reverse engineering,” ACM J. Emerg. Technol. Comput. Syst.,
vol. 13, no. 1, pp. 1–34, Apr. 2016.
[8] A. Moradi, A. Barenghi, T. Kasper, and C. Paar, “On the vulnerability
of FPGA bitstream encryption against power analysis attacks: Extracting
keys from Xilinx Virtex-II FPGAs,” in ACM Conference on Computer
and Communications Security (CCS), Oct. 2011, pp. 111–124.
[9] S. Tajik, H. Lohrke, J. Seifert, and C. Boit, “On the power of optical
contactless probing: Attacking bitstream encryption of FPGAs,” in ACM
Conference on Computer and Communications Security (CCS), Nov.
2017, pp. 1661–1674.
[10] M. Ender, A. Moradi, and C. Paar, “The unpatchable silicon: A full
break of the bitstream encryption of Xilinx 7-Series FPGAs,” in USENIX
Security Symposium, Aug. 2020.
[11] J. Note and É. Rannaud, “From the bitstream to the netlist,” in
International Symposium on Field Programmable Gate Arrays (FPGA), Feb.
2008, p. 264.
[12] F. Benz, A. Seffrin, and S. A. Huss, “BIL: A tool-chain for bitstream
reverse-engineering,” in International Conference on Field Programmable
Logic and Applications (FPL), Aug. 2012, pp. 735–738.
[13] Z. Ding, Q. Wu, Y. Zhang, and L. Zhu, “Deriving an NCD file
from an FPGA bitstream: Methodology, architecture and evaluation,”
Microprocessors and Microsystems, vol. 37, no. 3, pp. 299–312, May 2013.
[14] T. Zhang, J. Wang, S. Guo, and Z. Chen, “A comprehensive FPGA
reverse engineering tool-chain: From bitstream to RTL code,” IEEE
Access, vol. 7, pp. 38379–38389, 2019.
[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press,
2016.
[16] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,” in
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Jun. 2014, pp. 580–587.
[17] R. B. Girshick, “Fast R-CNN,” in IEEE International Conference on
Computer Vision (ICCV), Dec. 2015, pp. 1440–1448.
[18] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards real-
time object detection with region proposal networks,” in Conference on
Neural Information Processing Systems (NIPS), Dec. 2015, pp. 91–99.
[19] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only
look once: Unified, real-time object detection,” in IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 779–
788.
[20] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Jul. 2017, pp. 6517–6525.
[21] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,”
arXiv preprint arXiv:1804.02767, 2018.
[22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. E. Reed, C. Fu, and A. C.
Berg, “SSD: Single shot multibox detector,” in European Conference on
Computer Vision (ECCV), Oct. 2016, pp. 21–37.
[23] Xilinx. (2018) Zynq-7000 SoC technical reference manual (UG585).
[Online]. Available: https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf
[24] Xilinx. (2017) UltraScale architecture configurable logic block user
guide (UG574). [Online]. Available: https://www.xilinx.com/support/documentation/user_guides/ug574-ultrascale-clb.pdf
[25] Xilinx. (2018) 7 series FPGAs configuration user guide (UG470).
[Online]. Available: https://www.xilinx.com/support/documentation/user_guides/ug470_7Series_Config.pdf
[26] Xilinx. (2020) UltraScale architecture configuration user guide (UG570).
[Online]. Available: https://www.xilinx.com/support/documentation/user_guides/ug570-ultrascale-configuration.pdf
[27] Xilinx. (2018) Vivado design suite Tcl command reference guide
(UG835). [Online]. Available: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_4/ug835-vivado-tcl-commands.pdf
[28] B. Alexe, T. Deselaers, and V. Ferrari, “Measuring the objectness of
image windows,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11,
pp. 2189–2202, Nov. 2012.
[29] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan,
P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in
context,” in European Conference on Computer Vision (ECCV), Sep.
2014, pp. 740–755.
[30] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” in International Conference on Learning
Representations (ICLR), May 2015.
[31] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
in International Conference on Learning Representations (ICLR), May
2015.
[32] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisser-
man, “The Pascal Visual Object Classes (VOC) challenge,” International
Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, Jun. 2010.
[33] R. le Roux, G. van Schoor, and P. van Vuuren, “Parsing and analysis
of a Xilinx FPGA bitstream for generating new hardware by direct bit
manipulation in real-time,” South African Computer Journal, vol. 31,
pp. 80–102, Jul. 2019.
[34] K. Dang Pham, E. L. Horta, and D. Koch, “BITMAN: A tool and API for
FPGA bitstream manipulations,” in Design, Automation & Test in Europe
Conference & Exhibition (DATE), Mar. 2017, pp. 894–897.
[35] L. Bozzoli and L. Sterpone, “COMET: A configuration memory tool to
analyze, visualize and manipulate FPGAs bitstream,” in International
Conference on Architecture of Computing Systems (ARCS) Workshop,
Apr. 2018, pp. 1–4.
[36] W. L. Neto, M. Austin, S. Temple, L. G. Amarù, X. Tang, and
P. Gaillardon, “LSOracle: A logic synthesis framework driven by artificial
intelligence: Invited paper,” in International Conference on Computer-
Aided Design (ICCAD), Nov. 2019, pp. 1–6.
