Abstract-Support Vector Machines (SVMs) are popular classification and regression prediction tools that use supervised machine learning theory to maximize predictive accuracy. This paper focuses on the field-programmable gate array (FPGA) implementation of an SVM classification system. Owing to the parallel computation capability of FPGAs, the FPGA-based two-class SVM classifier achieves fast data classification. The classification system works in either linear or non-linear mode, depending on the dimensions of the classification. Simulation results demonstrate that the classification system is effective for fast data classification and that it is a promising technique for strengthening communication security in the Smart Grid.
I. INTRODUCTION
Machine learning is considered a subfield of artificial intelligence and focuses on the development of techniques and methods that enable computers to learn. Its purpose is to solve practical problems using machine learning theory, and many algorithms have been developed that enable machines to learn and perform tasks. As a machine learning method, Support Vector Machines (SVMs) were first introduced in the early 1990s by Boser, Guyon, and Vapnik. SVMs are a set of related supervised learning models with associated learning algorithms that analyze data and recognize patterns. They are usually used for classification and regression analysis. The algorithm constructs a prediction model from labeled data and provides classification and regression predictions for unknown data. Machine learning theory is used to maximize predictive accuracy while automatically avoiding over-fitting to the data [1]. Owing to the powerful learning algorithm and high prediction accuracy, applications of SVMs have greatly increased during the last 10 years, especially in classification and pattern recognition problems, where they provide good generalization performance for a wide range of regression and classification tasks [2].

Xiaohui Song and Lingfeng Wang are with the Department of Electrical Engineering and Computer Science, University of Toledo, Toledo, OH 43606 (e-mails: Xiaohui.Song@rockets.utoledo.edu, Lingfeng.Wang@utoledo.edu).
Hong Wang is with the Engineering Technology Department, University of Toledo, Toledo, OH 43606 (e-mail: Hong.Wang@utoledo.edu).

There exists a fair amount of work on implementing SVMs on custom hardware, mostly on FPGAs. A homogeneous FPGA-based architecture for SVM training was introduced in [3], and the results can potentially be extended to accelerate SVM classification. The work in [4] presents an SVM training architecture on a Xilinx Virtex II device. SVM classification was applied to video shot boundary detection in [5], in which the FPGA device was used for the dot-product mapping of the SVM algorithm and only linear SVMs were targeted. However, most of these works focus on pattern recognition or computational acceleration; the application of FPGA-based SVM classification systems in the cyber security area is a novel and promising research direction.
By optimizing the use of the available computing resources, the performance of SVMs can be maximized. Implementing SVM classifiers on suitable computing devices such as FPGAs can exploit the potential of custom-precision algorithms. Nowadays, FPGA devices offer a vast number of DSP blocks and a hierarchy of memories of different sizes, providing a high level of flexibility and large amounts of parallel computational power. Compared to other computing resources, the reprogrammability of FPGAs offers a significant advantage over application-specific solutions, especially when targeting different classification problems that may vary in size, dimensionality, and dynamic range constraints. Additionally, modern FPGA devices are able to provide equal or superior performance at a lower power cost than general-purpose processing units [6].
In this paper, an FPGA-based SVM classification system is presented in order to achieve a fast two-class data classifier. This work focuses on an FPGA implementation for the two-class

II. SUPPORT VECTOR MACHINE

Support Vector Machines (SVMs) [7] are considered one of the most powerful classification tools due to their state-of-the-art machine learning algorithm based on the Vapnik-Chervonenkis learning theory [8]. SVMs can be defined as supervised learning models which construct a hyperplane or a set of hyperplanes in a high-dimensional feature space and are used for classification and regression analysis. The algorithm is trained with a machine learning theory to maximize predictive accuracy using optimization theory that implements a learning bias derived from statistical learning theory [9]. With their strong regularization properties (which refer to the generalization of the model to new data), SVMs can be an efficient tool for solving classification problems [1].

A classification task usually involves training and testing data which consist of some data instances [10]. Each instance in the training set contains one target value and several attributes. The goal of an SVM is to produce a model which predicts the target value of data instances in the testing set, for which only the attributes are given [11]. In SVMs, a dataset consisting of pairs of input vectors and desired outputs is called the training dataset; it is used to design and construct the decision function of the system, and hence this procedure is usually considered an instance of supervised learning. Known labels help indicate whether the system is performing correctly or not. This information points to a desired response, validating the accuracy of the system or helping the system learn to act correctly [12]. During the training phase, the system identifies the Support Vectors (SVs) [13], which are the data points that can best build a separation model for the classes.
Those vectors are then used to predict the class of any future data point during the classification phase. The classification phase gives predictions for unknown samples. According to the prediction model from the training phase, new data can be classified based on different key features. During the classification phase, the training datasets can be updated with newly obtained data and work as an "online" model to provide the most accurate prediction.
A. Linear SVM
Multi-class classifications can be broken up into two-class classification units and a non-linear classification problem can be solved by replacing inner product calculation with kernel functions.
Given two classes $C_1$ and $C_2$, $T = \{(X_1, y_1), (X_2, y_2), \ldots, (X_N, y_N)\}$ is a training dataset consisting of samples taken from $C_1$ and $C_2$, where $X_n \in \mathbb{R}^M$ and $y_n \in \{1, -1\}$. If $X_n$ belongs to class $C_1$, then $y_n = 1$; if $X_n$ belongs to class $C_2$, then $y_n = -1$. The task is to find a real function $g(X)$ on $\mathbb{R}^M$ such that, for any new sample $X$ with unknown class, the predicted label $y$ is:

$$y = \operatorname{sgn}\big(g(X)\big) \tag{1}$$

When $g(X)$ is a linear function, the classifier is called a linear SVM, and when $g(X)$ is a non-linear function, it is called a non-linear SVM.
As shown in Fig. 1, the goal of the linear SVM is to find a classification line $g(X)$ between $C_1$ and $C_2$. In the high-dimensional case, $g(X)$ is a hyperplane. For linearly separable classes $C_1$ and $C_2$, more than one hyperplane can separate them accurately. Assume that the two classes can be separated by a hyperplane $l$, and that on each side of $l$ lies a parallel hyperplane, $l_1$ and $l_2$ respectively, with no training sample points between them. The region bounded by $l_1$ and $l_2$ is called the "margin". The objective of SVMs is thus to maximize the distance between these hyperplanes or, in other words, to maximize the "margin". The expression of the separating hyperplane is:

$$W \cdot X + b = 0 \tag{2}$$
where $\cdot$ denotes the inner product and $W$ is the normal vector to the hyperplane. The parameter $b/\lVert W \rVert$ determines, up to sign, the offset of the hyperplane from the origin along the normal vector $W$. The hyperplanes $l_1$ and $l_2$ can be described by the expressions

$$W \cdot X + b = 1 \quad \text{and} \quad W \cdot X + b = -1 \tag{3}$$
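The distance between these two hyperplanes is $2/\lVert W \rVert$, so the problem of maximizing the "margin" becomes the problem of minimizing $\lVert W \rVert$. In order to simplify the calculation, $\lVert W \rVert$ is replaced with $\lVert W \rVert^2/2$, and the problem can then be expressed as a constrained optimization problem:

$$\min_{W,\,b} \; \frac{1}{2}\lVert W \rVert^2 \quad \text{subject to} \quad y_n\,(W \cdot X_n + b) \ge 1, \quad n = 1, \ldots, N \tag{4}$$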
Using Lagrange multipliers $\alpha_n$ to solve the constrained optimization problem, we obtain the classification function:

$$g(X) = \sum_{n=1}^{N} \alpha_n\, y_n\, (X_n \cdot X) + b$$
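Applying Wolfe's dual form to (4) transforms the constrained optimization problem into a concise form:

$$\max_{\alpha} \; \sum_{n=1}^{N} \alpha_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m\, y_n y_m\, (X_n \cdot X_m) \quad \text{subject to} \quad \alpha_n \ge 0, \quad \sum_{n=1}^{N} \alpha_n y_n = 0 \tag{7}$$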
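As a sanity check on the linear formulation, a two-point toy problem can be solved by hand. The data values below are illustrative and not taken from the paper; the Lagrange multipliers are written as $\alpha_n$ and the standard Wolfe dual objective is assumed.

```latex
\begin{aligned}
&X_1 = (1,1),\; y_1 = +1, \qquad X_2 = (-1,-1),\; y_2 = -1 \\
&\textstyle\sum_n \alpha_n y_n = 0 \;\Rightarrow\; \alpha_1 = \alpha_2 = \alpha \\
&L(\alpha) = 2\alpha - \tfrac{1}{2}\,\alpha^2\,\lVert X_1 - X_2 \rVert^2 = 2\alpha - 4\alpha^2
  \;\Rightarrow\; \alpha = \tfrac{1}{4} \\
&W = \alpha\,(X_1 - X_2) = \left(\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad b = y_1 - W \cdot X_1 = 0 \\
&\text{margin} = \frac{2}{\lVert W \rVert} = 2\sqrt{2} = \lVert X_1 - X_2 \rVert
\end{aligned}
```

With only two samples, both are support vectors and the margin equals the distance between them, which is what the maximum-margin criterion would be expected to give.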
B. Non-linear SVM
In many real-world classification problems, it is often not feasible to separate the data linearly in the original space. SVMs can overcome this problem by mapping the input space to a higher-dimensional one where a linear separation may be feasible. The non-linear SVM can then be expressed as:

$$g(X) = \sum_{n=1}^{N} \alpha_n\, y_n\, \big(\varphi(X_n) \cdot \varphi(X)\big) + b \tag{8}$$
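where $\varphi(X)$ is a mapping function that transforms the non-linear problem into a linearly separable problem. In fact, there is no need to find $\varphi(X)$ explicitly if we can calculate $\varphi(X_i) \cdot \varphi(X_j)$. Introducing the Kernel Function:

$$K(X_i, X_j) = \varphi(X_i) \cdot \varphi(X_j) \tag{9}$$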
Replacing the dot product with the Kernel Function allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. This is a feasible way to achieve non-linear SVM classification.
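The substitution can be illustrated with a small numerical check. The sketch below uses a homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its feature map is easy to write out; the names `phi` and `poly2_kernel` are hypothetical and not part of the paper's design.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, evaluated in the input space."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)             # same quantity without ever forming phi
print(explicit, implicit)
```

Both values agree up to rounding, so the classifier only ever needs the kernel value and never the mapped vectors.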
Common Kernel Functions are:

Polynomial (with degree $d$ and constant $c \ge 0$):

$$K(X_i, X_j) = (X_i \cdot X_j + c)^d \tag{10}$$

Gaussian radial basis function:

$$K(X_i, X_j) = \exp\!\left(-\frac{\lVert X_i - X_j \rVert^2}{2\sigma^2}\right) \tag{11}$$

Out of many possible Kernel Functions, of special interest are those which satisfy Mercer's condition [14] and can be expressed as an inner product in the high-dimensional space. By applying the kernel, there is no need to explicitly map the data to the higher-dimensional space [15]. Suitable Kernel Functions may vary in practical applications. The choice of Kernel Function for this specific project is based on its response to the classification data used. After a number of classification tests, the Kernel Function with the best recognition rate is applied.
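A minimal sketch of the two kernels follows, together with an empirical check related to Mercer's condition: the Gram matrix of a valid kernel on any finite point set should be positive semidefinite. The width 0.6 is the value the paper states for its design; the degree, the constant `c`, the sample points and all function names are assumptions made for illustration.

```python
import numpy as np

def polynomial_kernel(a, b, degree=2, c=1.0):
    """Polynomial kernel (a.b + c)^degree; degree and c are illustrative."""
    return (np.dot(a, b) + c) ** degree

def gaussian_rbf_kernel(a, b, sigma=0.6):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2))."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    """Matrix of kernel values for every pair of points."""
    n = len(points)
    return np.array([[kernel(points[i], points[j]) for j in range(n)]
                     for i in range(n)])

points = np.array([[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0], [2.0, 2.0]])
min_eigs = {}
for kernel in (polynomial_kernel, gaussian_rbf_kernel):
    G = gram_matrix(points, kernel)
    # eigvalsh is for symmetric matrices; a Mercer kernel gives eigenvalues >= 0
    min_eigs[kernel.__name__] = float(np.linalg.eigvalsh(G).min())
print(min_eigs)
```

A non-negative smallest eigenvalue on sample data does not prove Mercer's condition, but a clearly negative one would rule a candidate kernel out.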
III. FPGA ARCHITECTURE MAPPING

A. Linear SVM
The SVM classification algorithm can be expressed with several concepts and equations, but to implement it in hardware, a formulation that is feasible to compute has to be introduced.
For the training phase of the algorithm, the classification function has to be constructed. To solve for the Lagrange multipliers $\alpha_n$ in (7), let:
Then we can obtain all the Lagrange multipliers $\alpha_n$. As a result, the normal vector of the classification function $g(X)$ is:

$$W = \sum_{n=1}^{N} \alpha_n\, y_n\, X_n$$
To solve the SVM classification problem on an FPGA, we need to design a compatible formulation of the algorithm rather than directly apply the equations and functions mentioned above.
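Once the multipliers are known, the linear classifier reduces to one weighted sum and one dot product per test point. The sketch below assumes the standard relations $W = \sum_n \alpha_n y_n X_n$ and $b = y_s - W \cdot X_s$ for a support vector $X_s$; the toy data, the hand-computed multipliers, and the function names are illustrative and not taken from the paper.

```python
import numpy as np

def linear_svm_from_multipliers(X, Y, alpha):
    """Build W and b of g(X) = W.X + b from given Lagrange multipliers."""
    W = (alpha * Y) @ X            # W = sum_n alpha_n * y_n * X_n
    sv = int(np.argmax(alpha))     # a point with alpha_n > 0 is a support vector
    b = Y[sv] - W @ X[sv]          # from y_sv * (W.X_sv + b) = 1 and y_sv^2 = 1
    return W, b

def classify(x, W, b):
    """Return +1 or -1 according to the sign of g(x)."""
    return 1 if W @ x + b >= 0 else -1

# Toy two-point training set; alpha is the hand-computed dual optimum for it.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
Y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])

W, b = linear_svm_from_multipliers(X, Y, alpha)
print(W, b, classify(np.array([2.0, 0.5]), W, b), classify(np.array([-0.3, -2.0]), W, b))
```

Only the dot product `W @ x` depends on the test point, which is the part a hardware design would want to parallelize.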
The rationale behind the design of the SVM classifier is to exploit, in the most efficient way, the parallel computational power offered by the FPGA resources as well as the high memory bandwidth offered by the FPGA internal memories. As shown in Table I, the computation of $g(X)$ involves matrix-vector operations, which can be complicated and costly to carry out on an FPGA. The problem can be segmented into smaller ones, and parallel units can be instantiated to process each sub-problem. The matrix calculation is therefore performed by a number of processing units, such as adders and multipliers. These processing units form a parallel computation architecture that significantly speeds up the evaluation of the decision function.
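The segmentation described above can be modeled in software as one multiplier per vector element followed by a pairwise adder tree. This is a behavioral sketch only, with hypothetical function names; it says nothing about the actual VHDL structure or timing of the paper's design.

```python
def adder_tree(values):
    """Pairwise reduction, mirroring a hardware adder tree.

    Returns the sum and the number of adder stages used.
    Assumes a non-empty input.
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        if len(level) % 2:          # odd count: pad so every adder has two inputs
            level.append(0.0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth

def parallel_dot(sv, x):
    """Element-wise products (conceptually concurrent), then an adder tree."""
    products = [a * b for a, b in zip(sv, x)]
    return adder_tree(products)

total, stages = parallel_dot([1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0])
print(total, stages)
```

For n products the tree needs about log2(n) adder stages instead of n - 1 sequential additions, which is where the speed-up of a parallel layout comes from.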
The proposed FPGA architecture for the linear SVM classifier is shown in Fig. 2. The SVs (training datasets) are loaded into the internal FPGA memories, while the classification data points are streamed into the FPGA through

B. Non-linear SVM

By replacing the dot product with the Kernel Function, we construct a non-linear SVM classification algorithm. As shown in Table II, besides the matrix calculation, an exponential function calculation is introduced. Neither fixed-point nor floating-point evaluation of the exponential function can be implemented efficiently on an FPGA, so the advantage of parallel computing cannot be fully exploited for massive exp computation. In the proposed algorithm, we designed a table-driven exponential function module to perform the actual exp calculation. The table-driven exponential module saves a large amount of FPGA computing resources, and its accuracy is reliable according to the simulation tests. The matrix calculation is also performed in parallel by processing units, as in the linear algorithm. The parameter $\sigma$ in the Gaussian radial basis function is set to 0.6 in our design.
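A table-driven evaluation of the Gaussian term might look like the sketch below. The table is indexed by the quantized squared distance and stores the whole term, including the kernel width of 0.6. The step size `STEP` and the cut-off `D2_MAX` are hypothetical values picked for illustration; the paper does not state the table size or resolution.

```python
import math

SIGMA = 0.6          # kernel width stated in the text
STEP = 1.0 / 64      # hypothetical quantization step of the squared distance
D2_MAX = 8.0         # hypothetical cut-off; exp(-8 / 0.72) is about 1.5e-5

# Precomputed once; on hardware this would live in on-chip memory.
TABLE = [math.exp(-(i * STEP) / (2.0 * SIGMA ** 2))
         for i in range(int(D2_MAX / STEP) + 1)]

def rbf_lookup(d2):
    """Approximate exp(-d2 / (2 sigma^2)) by nearest-entry table lookup."""
    if d2 >= D2_MAX:
        return 0.0   # the Gaussian has decayed to a negligible value
    return TABLE[int(round(d2 / STEP))]

def rbf_exact(d2):
    """Reference value used only to measure the lookup error."""
    return math.exp(-d2 / (2.0 * SIGMA ** 2))

worst = max(abs(rbf_lookup(k * 0.001) - rbf_exact(k * 0.001)) for k in range(10000))
print(len(TABLE), worst)
```

Because the Gaussian decays quickly, the table only has to cover a bounded range of squared distances; a finer `STEP` trades memory for accuracy.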
The non-linear SVM classifier FPGA architecture is shown in Fig. 3. As in the linear design, training datasets are loaded into the internal FPGA memories through SV units and then, together with the test data, streamed into square-difference units. These units calculate the squared difference between test points and training points. Next, the results are sent to exp units, which perform the exponential function calculation in order to evaluate the Kernel Function. In fact, these exp units, working as table-driven modules, perform not only the exponential function calculation but also the rest of the Gaussian radial basis function, including its parameters. The table-driven module saves a large amount of computing resources by replacing massive floating-point calculations with a fast table look-up mechanism. Due to the mathematical properties of the Gaussian radial basis function, the table can be restricted to an acceptable size. The adder tree and multipliers construct the classification function with parameter b. The class identities of the test data are available after the comparing unit.

Pseudocode of the non-linear SVM algorithm (Table II):

for i=1 to n,
  for j=1 to n,
    calculate classification function parameters using matrix X and the Gaussian radial basis function and give the result to matrix A:
    A(i,j)=exp(-((X(i,1)-X(j,1))^2+(X(i,2)-X(j,2))^2)/(2*(0.6)^2));
  end
end
for k=1 to 10,
  for l=1 to 10,
    multiply A(k,l) by Y(l) and give the result to A(k,l);
  end
end
divide matrix A by matrix B and give the result to matrix C;
for q=1 to 10,
  calculate C(q)*Y(q)*exp(-((X(q,1)-X(1,1))^2+(X(q,2)-X(1,2))^2)/(2*(0.6)^2)) and give the result to h(q);
end
calculate the sum of all the elements in matrix h and give the result to H;
calculate parameter b=Y(1)-H;
for p=1 to 10,
  calculate g(p)=C(p)*Y(p)*exp(-((X(p,1)-x(1))^2+(X(p,2)-x(2))^2)/(2*(0.6)^2));
end
calculate the sum of matrix g and give the result to W;
build the classification function G=W+b
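The listing can be transcribed into a short floating-point reference model. Two points are assumptions rather than facts from the paper: matrix B is not defined in the listing, so the sketch takes it to be the label vector Y, and "divide A by B" is read as solving the linear system A C = B. The hard-coded loop bound of 10 is replaced by the number of training points, and the toy data are illustrative.

```python
import numpy as np

SIGMA = 0.6  # kernel width used in the listing

def rbf(a, b, sigma=SIGMA):
    """Gaussian RBF between points; broadcasts over leading axes."""
    d2 = np.sum((a - b) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train(X, Y, sigma=SIGMA):
    K = rbf(X[:, None, :], X[None, :, :], sigma)  # K(i,j) = kernel of X_i and X_j
    A = K * Y[None, :]                            # A(k,l) = K(k,l) * Y(l)
    # ASSUMPTION: B is the label vector Y (B is not defined in the listing),
    # and "divide A by B" means solving A C = B for C.
    C = np.linalg.solve(A, Y)
    b = Y[0] - np.sum(C * Y * K[:, 0])            # b = Y(1) - H
    return C, b

def decide(x, X, Y, C, b, sigma=SIGMA):
    W = np.sum(C * Y * rbf(X, x, sigma))          # sum of g(p) over training points
    return W + b                                  # G = W + b

# Two small, well-separated clusters (illustrative data, not from the paper).
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [3.0, 3.0], [3.5, 3.0], [3.0, 3.5]])
Y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

C, b = train(X, Y)
labels = [int(np.sign(decide(x, X, Y, C, b))) for x in ([0.2, 0.2], [3.2, 3.2])]
print(labels)
```

Under these assumptions every training point is reproduced exactly by the decision function, so b comes out numerically close to zero; a different choice of B would change that.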
The calculating units in the architecture use integer and 18-bit signed fixed-point binary data, with 1 sign bit, 2 bits before the binary point, and 14 bits after the binary point.
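The effect of this number format can be explored with a small software model. The text lists 1 sign bit, 2 integer bits and 14 fractional bits for an 18-bit word; the sketch takes the 18-bit word length and the 14 fractional bits at face value and saturates at the 18-bit two's-complement limits, which is an assumption about how the remaining bit is used. All names are hypothetical.

```python
FRAC_BITS = 14                       # bits after the binary point, as stated
WORD_BITS = 18                       # word length, as stated
SCALE = 1 << FRAC_BITS
Q_MIN = -(1 << (WORD_BITS - 1))      # assumed two's-complement limits
Q_MAX = (1 << (WORD_BITS - 1)) - 1

def saturate(q):
    """Clamp an integer to the representable word range."""
    return max(Q_MIN, min(Q_MAX, q))

def to_fixed(value):
    """Quantize a real number to a fixed-point integer, saturating on overflow."""
    return saturate(int(round(value * SCALE)))

def to_float(q):
    """Convert a fixed-point integer back to a real number."""
    return q / SCALE

def fixed_mul(a, b):
    """Multiply two fixed-point words and rescale (shift rounds toward -infinity)."""
    return saturate((a * b) >> FRAC_BITS)

sigma_q = to_fixed(0.6)
product = to_float(fixed_mul(to_fixed(1.5), to_fixed(2.0)))
print(sigma_q, to_float(sigma_q), product)
```

With 14 fractional bits the quantization step is 2^-14, so a single rounded value is off by at most 2^-15.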
IV. IMPLEMENTATION RESULTS
Based on the required calculation accuracy and the test data chosen for the implementation, we selected the Altera Cyclone II EP2C70F896 as the target device for the proposed architecture. The results can easily be extended to other target devices by changing the resource constraints of the design flow. The architecture is captured in VHDL, and the fixed-point modules are generated with the Altera tools and packages [16]. The targeted operating frequency is between 200 and 250 MHz. For the testing data, we created 4 random sampling datasets each for the linear and the non-linear SVM classifier.
For the linear SVM design, we built 4 different two-class linear datasets (datasets A, B, C, and D) to test the classifier. The size of each dataset is 400, 20 of which are used for SVM training, with a testing size of 50. According to the cumulative results for each dataset, the test results given by the classification system are accurate and the time consumed meets our design requirements. The accuracy of the results, however, may differ slightly depending on the amount of training data and the density of the support vectors.

As indicated in Table III, based on 4 different linear test models, the calculation error is around 0.39% ($e_1$), and the recognition rate is satisfactory. The recognition rate is related to the training and test datasets chosen, and the classification results can vary with the selection of the testing points. Based on the algorithm proposed above and the calculation error obtained from the test calculations, an expected recognition error ($E_1$) is inevitable. With the algorithm accuracy $\Delta_1 = 1/2^{10}$, the expected recognition error can be calculated as $E_1 = 2(e_1 + \Delta_1) = 0.975\%$. When the classifier is applied to a sufficiently large amount of test data, the actual recognition error will regress towards $E_1$. Although the time required for different classification models is unpredictable, the FPGA implementation is faster than a PC with a 2.27-GHz Intel i5 duo processor and 3 GB of RAM, with the computing time reduced by approximately 30%-50%. Fig. 4 depicts the detailed comparison in terms of time usage.
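The expected recognition error for the linear design can be checked by substituting the stated values, writing the algorithm accuracy term as $\Delta_1$ (an assumed reading of the symbol in the source):

```latex
E_1 = 2\,(e_1 + \Delta_1)
    = 2\left(0.39\% + \frac{1}{2^{10}}\right)
    \approx 2\,(0.39\% + 0.0977\%)
    \approx 0.975\%
```

The quantization term $1/2^{10} \approx 0.0977\%$ is therefore about a quarter of the measured calculation error.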
For the non-linear SVM classification design, 4 different non-linear datasets of size 400 were built to test the non-linear classifier. The training size for each dataset is 20 and the testing size is 40. The cumulative results are also satisfactory. The calculation error is around 0.041%, which is lower than that of the linear design.
As shown in Table IV, high recognition rates are obtained for the test models and datasets we chose. As in the linear classification, the expected error combines the 0.041% calculation error ($e_2$), obtained from the test calculations, with the non-linear algorithm accuracy ($\Delta_2$). The expected recognition error for the non-linear system ($E_2$) can be calculated as $E_2 = 2(e_2 + \Delta_2) = 0.832\%$. Owing to the table-driven exp units introduced into our design, the time consumption of the non-linear SVM classification system decreased significantly; compared to the PC with a 2.27-GHz Intel i5 duo processor and 3 GB of RAM, the computing time is reduced by approximately 60%. Fig. 5 shows the detailed time consumption.
The implementation results show that the designed FPGA implementation of the SVM classification system works adequately as a fast two-class classification system with high accuracy and satisfactory computing time. The performance of the SVM classifier as a fast recognition and classification system fulfills the proposed requirements. As future work, smart meters embedded with SVM classifiers are a promising way to provide fast intrusion detection in order to protect the whole secure communication system. Depending on the new classification data and computing requirements, adjustments to the computing units or to the Kernel Function may be needed when the design is applied in a practical security system.
V. CONCLUSION AND FUTURE WORK
This paper presents an FPGA-based SVM classification system which can be used for fast data classification. The
