We designed and implemented a deep learning based RF signal classifier on the Field Programmable Gate Array (FPGA) of an embedded software-defined radio platform, DeepRadio™, that classifies the signals received through the RF front end into different modulation types in real time and with low power. This classifier implementation successfully captures complex characteristics of wireless signals to serve critical applications in wireless security and communications systems such as identifying spoofing signals in signal authentication systems, detecting target emitters and jammers in electronic warfare (EW) applications, discriminating primary and secondary users in cognitive radio networks, interference hunting, and adaptive modulation. Empowered by low-power and low-latency embedded computing, the deep neural network runs directly on the FPGA fabric of DeepRadio™ while maintaining classifier accuracy close to the software performance. We evaluated the performance when another SDR (USRP) transmits signals with different modulation types at different power levels and DeepRadio™ receives the signals and classifies them in real time on its FPGA. A smartphone with a mobile app is connected to DeepRadio™ to initiate the experiment and visualize the classification results. With real radio transmissions over the air, we show that the classifier implemented on DeepRadio™ achieves high accuracy with low latency (microseconds per sample) and low energy consumption (microjoules per sample), and that this performance is not matched by other embedded platforms such as an embedded graphics processing unit (GPU).
I. INTRODUCTION
The wireless communication environment is often characterized by high mobility, channel uncertainty, growth in traffic demands, and vulnerability to jamming. New agile technology solutions are much needed to characterize the spectrum and ensure reliable communications in such a fast-paced dynamic environment. Furthermore, spectrum resources are scarce across time, frequency, and space dimensions, and often shared among a variety of users and applications such as sensing, communications, and electronic warfare (EW). Cognitive radio has emerged as the enabling technology to make efficient use of spectrum resources and support adaptation of wireless communications in highly dynamic environments [1]. Supported by flexible software-defined radio (SDR) designs and implementations [2], cognitive radio finds both commercial and tactical applications in conventional and emerging communications systems.
This effort is partially supported by the U.S. Army Research Office under contract W911NF-17-C-0090. The content of the information does not necessarily reflect the position or the policy of the U.S. Government, and no official endorsement should be inferred.
Cognitive radio performs various tasks such as spectrum sensing and dynamic spectrum access (DSA) for situational awareness and spectrum agility. To support these tasks, machine learning provides automated means to learn from and adapt to spectrum dynamics [3], [4]. In particular, deep learning can process raw spectrum data and effectively operate on the latent representations by capturing and analyzing high-dimensional and dynamic spectrum data that conventional feature-based machine learning algorithms fail to grasp.
Various waveform, channel, traffic, and interference effects, each with its own complex structures that quickly change over time, shape the spectrum [5]. One critical step to assess the spectrum in terms of resources or vulnerabilities is to classify wireless signals. Signal classification is a key step of various tasks in wireless security and communications such as jamming/anti-jamming, device/RF fingerprinting, signal authentication, perimeter security, and interference hunting. Therefore, there is a growing interest in applying deep learning to signal classification such as modulation recognition [6]. Deep learning has also been applied to signals collected over the air [7]. However, the processing was confined to a host computer, which results in longer processing delay and limits its portability to embedded platforms.
Wireless systems operate over wide bandwidths and at correspondingly high data rates, ranging from MHz in LTE to GHz in 5G systems. Therefore, high-volume data samples (in I/Q form) arrive at RF receivers to be processed. The latency of moving the data to a host processor for deep learning is prohibitive for real-time operations, such as those that must reach a decision within microseconds. A real-time embedded implementation of deep neural networks on the SDR has been missing so far, yet it is ultimately needed to support embedded applications that run on stand-alone SDRs.
To fill this gap, we designed and implemented a deep neural network based classifier for signal classification on the Field Programmable Gate Array (FPGA) of DeepRadio™. Without using any other host processor or deep learning accelerator (such as [8]), the deep neural network runs directly on the FPGA fabric. This embedded solution provides high accuracy, low latency, and low power. In particular, we show the following novel capabilities for RF signal classification:
• low latency (microseconds), matching the speed of RF spectrum dynamics in terms of channel, interference, and traffic,
• low energy consumption (microjoules), resulting in a longer battery lifetime (which translates to a longer network lifetime) and high portability, and
• high accuracy, approaching the limits of floating-point software operation.
Note that our goal is not to introduce another SDR. We use DeepRadio™ as an FPGA-based RF platform and provide a flexible design that can be ported to any other SDR that is equipped with an FPGA. Our paper has three major discriminators compared to the state of the art, which typically considers offline software implementations operating on simulated or prerecorded data:
• We consider physical data that is generated by hardware and transmitted over the air.
• We consider algorithm implementation on the FPGA of an embedded hardware platform for low latency and low power consumption.
• We consider real-time operation for both data collection and algorithm execution.
The rest of the paper is organized as follows. Section II discusses related work. Section III introduces the system setup. Section IV presents the implementation of the deep neural network on the FPGA. Section V presents the results in terms of classification accuracy, latency, and power efficiency. Section VI concludes the paper.
II. RELATED WORK
Deep learning finds rich applications in wireless communications. Examples include spectrum sensing [10], MIMO detection [11], channel estimation and signal detection [12], physical layer communications [13], jammer detection [14], stealth jamming [15], [16], power control [17], signal spoofing [18], and transmitter-receiver scheduling [19]. RF signal classification can support different applications such as radio fingerprinting [28] that can ultimately be used in cognitive radio systems [29] subject to dynamic and unknown interference and jamming effects [30]. Modulation classification has been extensively studied with deep neural networks [6], [7], [20]-[27], where the goal is to assign a given signal to a known modulation type. Different types of datasets have been used to train deep neural networks for modulation classification. For example, [6] provided a training dataset collected from GNU Radio without any real channel or hardware impairments. On the other hand, [7] provided a training dataset collected from over-the-air measurements of USRP radio transmissions. However, those studies performed modulation classification offline in software environments. Our goal is to run modulation classification on an embedded platform in real time while accounting for latency and power efficiency requirements.
III. SYSTEM SETUP
The system setup is shown in Fig. 1. There are two major components, a transmitter and a receiver.
• As transmitter, one USRP N210 SDR is controlled by a computer and serves as the RF front end. This USRP either waits or transmits narrowband signals at 2.4 GHz. Different transmit powers are employed to generate different signal-to-noise ratio (SNR) effects. The transmissions can be made over the air or over cables depending on spectrum management restrictions. Each transmitted signal is modulated with one of six different modulation types, namely 1) Binary Phase Shift Keying (BPSK), 2) Quadrature Phase Shift Keying (QPSK), 3) Continuous Phase Modulation (CPM), 4) Gaussian Frequency Shift Keying (GFSK), 5) Quadrature Amplitude Modulation (QAM) 16, and 6) Gaussian Minimum Shift Keying (GMSK).
• As receiver and classifier, DeepRadio™ runs a signal classifier that takes the received signals (I/Q samples) as input data and determines whether the received signal is noise (label 0) or a signal transmitted with one of the six modulation types (labels 1-6). Note that the uniqueness of the system setup lies at the receiver side, which makes real-time decisions by running a deep learning classifier on the FPGA of DeepRadio™.
Performance with the FPGA is compared to that of an embedded graphics processing unit (GPU) that connects to DeepRadio™ to receive I/Q data and runs the RF classifier. For that purpose, we use two types of embedded GPU from NVIDIA, namely the Jetson AGX Xavier Developer Kit (512-core Volta GPU with Tensor Cores) and the Jetson Nano Developer Kit (128-core Maxwell GPU).
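For readers without the hardware, the label convention and the effect of varying SNR can be imitated in software. The following is only a minimal sketch under stated assumptions: it generates symbol-rate PSK symbols without pulse shaping and scales additive white Gaussian noise to reach a target SNR, whereas the experiments described here vary the USRP transmit power over a real channel. The helper names `psk_symbols` and `add_awgn` are hypothetical and do not come from the paper.

```python
import cmath
import math
import random

# Label convention from the text: 0 is noise, 1 to 6 are the modulation types.
LABELS = {0: "noise", 1: "BPSK", 2: "QPSK", 3: "CPM",
          4: "GFSK", 5: "16-QAM", 6: "GMSK"}


def psk_symbols(order, count, rng):
    """Draw `count` unit-power M-PSK symbols (order=2: BPSK, order=4: QPSK)."""
    return [cmath.exp(2j * math.pi * rng.randrange(order) / order)
            for _ in range(count)]


def add_awgn(symbols, snr_db, rng):
    """Add complex white Gaussian noise so that the result has the target SNR."""
    signal_power = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # standard deviation per I and Q component
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for s in symbols]


rng = random.Random(0)
clean = psk_symbols(order=4, count=900, rng=rng)  # one QPSK frame (label 2)
noisy = add_awgn(clean, snr_db=10.0, rng=rng)
print(LABELS[2], len(noisy))
```

Lowering `snr_db` plays the role of reducing the transmit power in the setup above. The frequency-modulated types (CPM, GFSK, GMSK) need a modulator with memory and are left out of this sketch.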
IV. IMPLEMENTATION OF DEEP LEARNING BASED RF SIGNAL CLASSIFIER
The FPGA implementation focuses on inference (test) time. Hence, the training data was collected offline and a deep neural network was trained offline. The trained model is a feedforward neural network (FNN). The input layer receives 900 I/Q signal samples (900 in-phase and 900 quadrature values, i.e., 1800 real-valued inputs) that constitute one data sample to which a modulation label is assigned.
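The mapping from a received I/Q stream to classifier inputs can be sketched as follows. This is an illustration under assumptions: the text does not state how the I and Q components are ordered at the input layer, so the interleaved layout below, the non-overlapping framing, and the function names `iq_to_features` and `frames` are hypothetical.

```python
def iq_to_features(iq_frame, frame_len=900):
    """Flatten one frame of complex I/Q samples into 2 * frame_len real values.

    Assumed layout (not specified in the text): [I0, Q0, I1, Q1, ...].
    """
    if len(iq_frame) != frame_len:
        raise ValueError(f"expected {frame_len} I/Q samples, got {len(iq_frame)}")
    features = []
    for sample in iq_frame:
        features.append(sample.real)
        features.append(sample.imag)
    return features


def frames(iq_stream, frame_len=900):
    """Yield consecutive, non-overlapping frames; a trailing partial frame is dropped."""
    for start in range(0, len(iq_stream) - frame_len + 1, frame_len):
        yield iq_stream[start:start + frame_len]


stream = [complex(k, -k) for k in range(2000)]  # placeholder I/Q stream
vectors = [iq_to_features(f) for f in frames(stream)]
print(len(vectors), len(vectors[0]))
```

Each element of `vectors` has 1800 entries, one per input neuron, and receives one label.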
The deep neural network is a function $F(x) = y$ that takes an $n$-dimensional input $x \in \mathbb{R}^n$ (in our case, $x$ is the received signal vector) and produces an $m$-dimensional output $y \in \mathbb{R}^m$ (in our case, $y$ is the vector of likelihood scores corresponding to the modulation types). For an $m$-class classifier, the output vector $y$ satisfies $y_1 + \ldots + y_m = 1$ and $0 \le y_i \le 1$. The classifier assigns the label $C(x) = \arg\max_i F(x)_i$. An activation function, denoted by $\sigma(\cdot)$, is applied at layer $i$ with weights $\theta_i$ and biases $b_i$ to perform the operation $\sigma(\theta_i x + b_i)$. Let $F_i$ denote this operation at layer $i$; then a $k$-layer neural network can be represented as [31]
$$F(x) = F_k(F_{k-1}(\cdots F_1(x) \cdots)).$$
The neural network training minimizes a cost function $J(\theta)$ using the backpropagation algorithm, which computes the gradient of the cost function with respect to the neural network parameters, i.e., $\nabla_\theta J(\theta)$.
After hyperparameter optimization, the deep neural network architecture consists of four layers. The number of neurons is 1800 at the input layer, 100 and 20 at the first and second hidden layers, respectively, and 7 at the output layer. The rectified linear unit (ReLU) activation function is used at the hidden layers. ReLU performs the operation $f(x) = \max(0, x)$ on input $x$. The softmax activation function is used at the output layer. Softmax performs $f(x)_i = e^{x_i} / \sum_j e^{x_j}$ on input $x$. The deep neural network is trained with the backpropagation algorithm [32] in TensorFlow [33] using cross-entropy as the loss function. The cross-entropy function is
$$J(\theta) = -\sum_{i=1}^{m} \beta_i \log(y_i),$$
where $\beta = (\beta_1, \ldots, \beta_m)$ is a binary indicator of the ground truth such that only the entry of $\beta$ corresponding to the correct label is 1 and the others are 0. The outputs predicted by the neural network are denoted by $y_i$. The Adam optimizer [34] is used to update the network weights iteratively based on the training data.
Figure 3 shows the block diagram of the system implementing a layer of neurons in the FNN on the FPGA. Each layer of the FNN performs essentially the same operations, only with different numbers of neurons and synapses. Each neuron in an FNN evaluates the dot product of its inputs and its weights. As the neurons in a layer do not depend on each other for their operations, we perform the operations of each neuron in parallel. Parallel execution of the neurons provides low-latency execution of the FNN on the FPGA. We implemented the ReLU activation on the FPGA using a conditional operation.
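The architecture described above (1800 inputs, hidden layers of 100 and 20 neurons with ReLU, and 7 softmax outputs) can be written as a short floating-point reference model. This is a sketch rather than the trained classifier: the weights are random placeholders because the trained parameters are not given in the paper, and the function names are hypothetical.

```python
import math
import random

LAYER_SIZES = [1800, 100, 20, 7]  # layer sizes reported in the text


def init_params(sizes, seed=0):
    """Random placeholder weights and zero biases (NOT the trained model)."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def affine(weights, biases, x):
    """One dot product per neuron; the neurons are independent of each other."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(z):
    return [max(0.0, v) for v in z]


def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """ReLU on the hidden layers, softmax on the output layer."""
    h = x
    for weights, biases in params[:-1]:
        h = relu(affine(weights, biases, h))
    weights, biases = params[-1]
    return softmax(affine(weights, biases, h))


def cross_entropy(y, label):
    """With a one-hot ground truth, only the correct-label term remains."""
    return -math.log(y[label])


params = init_params(LAYER_SIZES)
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(LAYER_SIZES[0])]
y = forward(params, x)
predicted = max(range(len(y)), key=y.__getitem__)  # argmax over the scores
print(predicted, round(sum(y), 6))
```

Because every neuron in `affine` depends only on the layer input, the per-neuron dot products are the operations that can run in parallel on the FPGA.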
The resulting TensorFlow model is converted to a format readable by the FPGA, and a bit file is generated. This bit file is uploaded to the FPGA. This way, the RF signal classifier runs on the FPGA without incurring any delay due to another host machine. The output label returned by the FPGA is passed to an Android smartphone, where a mobile app displays the predicted label. The graphical user interface (GUI) of the smartphone is shown in Figure 4.
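Converting a floating-point model for FPGA execution typically involves quantizing weights and activations to fixed point. The sketch below models a single neuron in 16-bit fixed-point arithmetic with ReLU realized as a conditional. It is a software illustration under assumptions: the paper reports a 16-bit implementation but does not state the number format, so the choice of 8 fractional bits, the saturation behavior, and all function names are hypothetical.

```python
WORD_BITS = 16  # word length reported for the FPGA implementation
FRAC_BITS = 8   # ASSUMPTION: fractional bits are not stated in the paper
Q_MIN = -(1 << (WORD_BITS - 1))
Q_MAX = (1 << (WORD_BITS - 1)) - 1


def saturate(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, value))


def to_fixed(value):
    """Quantize a float to a signed 16-bit fixed-point integer."""
    return saturate(int(round(value * (1 << FRAC_BITS))))


def to_float(q):
    return q / (1 << FRAC_BITS)


def neuron_fixed(inputs_q, weights_q, bias_q):
    """Multiply-accumulate in a wide accumulator, rescale, then apply ReLU."""
    acc = bias_q << FRAC_BITS         # align the bias with the product scale
    for x_q, w_q in zip(inputs_q, weights_q):
        acc += x_q * w_q              # each product carries 2 * FRAC_BITS fractional bits
    out = saturate(acc >> FRAC_BITS)  # back to FRAC_BITS fractional bits
    return out if out > 0 else 0      # ReLU as a conditional operation


inputs = [to_fixed(v) for v in (0.5, -1.0, 2.0)]
weights = [to_fixed(v) for v in (1.0, 0.5, 0.25)]
print(to_float(neuron_fixed(inputs, weights, to_fixed(0.25))))
```

In hardware, each neuron of a layer would be one instance of this multiply-accumulate path, and all instances of a layer operate on the same inputs at once.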
V. RF CLASSIFICATION PERFORMANCE
We measured that the RF signal classifier implemented on the FPGA achieves high accuracy (> 94% when averaged over different SNRs) with low latency and low energy consumption. We synthesize the FPGA code with the Vivado Design Suite [35], port it to the FPGA, and obtain classification results in real time from the FPGA. The confusion matrix is shown in Figure 5, where label 0 is noise, label 1 is BPSK, label 2 is QPSK, label 3 is CPM, label 4 is GFSK, label 5 is QAM16, and label 6 is GMSK. The average probability of error is shown in Table I for each ground truth label. FPGA resource allocation is measured in Vivado. The resource allocation breakdown for the RF signal classifier is shown in Table II, where LUT denotes lookup table and LUTRAM denotes lookup table RAM.
The deep neural network is implemented with 16-bit arithmetic on the FPGA, and the performance gap from the floating-point implementation in software (on the embedded GPU) is within 1%. When the RF classifier is run on the FPGA, the latency is 24 µs per sample and the energy consumption is 28 µJ per sample. Since each sample comprises 900 I/Q samples, 24 µs per sample translates to operating at more than 37 megasamples per second (900 / 24 µs = 37.5 × 10^6) without any downsampling. It is possible to increase the rate by downsampling, reducing the size of the FNN, or using the available FPGA resources to implement another classifier in parallel on the FPGA. Figure 6 shows the detailed breakdown of power consumption obtained from Vivado.
The FPGA achieves better latency and power performance than other embedded platforms such as the embedded GPU. When the RF classifier is run on the Jetson AGX Xavier GPU, the latency is 3.6 ms per sample and the energy consumption is 36 mJ per sample. When the RF classifier is run on the Jetson Nano GPU, the latency is 4.1 ms per sample and the energy consumption is 21 mJ per sample. Table III shows the performance comparison of the FPGA and the embedded GPUs.
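The throughput and the platform ratios follow from the reported latency and energy figures by simple arithmetic, sketched below. The latency and energy values are taken from the text. The average-power figure is a derived quantity that assumes frames are processed back to back, and it is not a measurement reported in the paper.

```python
FRAME_LEN = 900  # I/Q samples per classified sample, as stated in Section IV

# (latency in seconds, energy in joules) per classified sample, from the text.
PLATFORMS = {
    "FPGA": (24e-6, 28e-6),
    "Jetson AGX Xavier": (3.6e-3, 36e-3),
    "Jetson Nano": (4.1e-3, 21e-3),
}


def throughput_msps(latency_s):
    """Millions of I/Q samples per second if frames are processed back to back."""
    return FRAME_LEN / latency_s / 1e6


def average_power_w(latency_s, energy_j):
    """Energy per frame divided by time per frame (derived, not measured)."""
    return energy_j / latency_s


fpga_latency, fpga_energy = PLATFORMS["FPGA"]
for name, (latency, energy) in PLATFORMS.items():
    print(f"{name}: {throughput_msps(latency):.2f} MS/s, "
          f"{average_power_w(latency, energy):.2f} W, "
          f"latency x{latency / fpga_latency:.0f}, "
          f"energy x{energy / fpga_energy:.0f} relative to FPGA")
```

With these numbers, the FPGA sustains 37.5 megasamples per second. Relative to the FPGA, the GPU latencies are 150 times (Xavier) and about 171 times (Nano) higher, and the GPU energy per sample is about 1286 times (Xavier) and 750 times (Nano) higher.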
The predicted label is displayed in the smartphone GUI. Figure 7 shows how the predicted label changes: first when there is no transmission, second when a signal with BPSK is transmitted, and third when a signal with QPSK is transmitted.
VI. CONCLUSION
The paper presents the embedded implementation of deep learning based RF signal classification for low-latency and low-power applications. While deep learning has started finding different applications in wireless security and communications, real radio implementations have been lacking. This paper fills the gap and opens up new opportunities regarding the use of deep learning in wireless applications. We showed that the FPGA implementation of the RF signal classifier maintains high accuracy and significantly reduces latency (by more than 100 times) and energy consumption (by close to 1000 times) compared to an embedded GPU implementation. With this capability, it is possible to support various wireless applications such as detecting jammers/emitters, authenticating mobile devices, and identifying spectrum opportunities, all in real time (within a microsecond time frame) and with low (microjoule) energy consumption.
