ABSTRACT This paper presents results of using a simple bit-serial architecture as a method of designing an extremely low-power and low-cost neural network processor for epilepsy seizure prediction. The proposed concept is based on a novel bit-serial data processing unit (DPU) which implements the functionality of a complete neuron and uses bit-serial arithmetic. Arrays of DPUs are controlled by simple finite state machines. We show that epilepsy detection through such dedicated neural hardware is feasible and may facilitate development of wearable, low-cost and low-energy personalized seizure prediction equipment. The proposed processor extracts epileptic seizure characteristics from electroencephalogram (EEG) waveforms. In order to facilitate the classification of EEG waveforms, we develop a dedicated feature extraction hardware that provides inputs to the neural network. This approach has been tested using various network configurations and has been compared with related work. A complete system which can predict epileptic seizures with high accuracy has been implemented on an ALTERA Cyclone V FPGA using 3931 ALMs which constitutes about 7% of the Cyclone V A7 capacity. The design has a prediction accuracy of 90%.
I. INTRODUCTION
The World Health Organization (WHO) estimates that 50 million people worldwide are afflicted with epilepsy [1]. Approximately 80% of the reported epileptic cases are located in developing countries, where the availability of treatment facilities and of the necessary medications is questionable. It is possible that many epileptic cases go unreported in parts of the world where people still suffer from stigma and discrimination. Epilepsy treatment to date still involves the use of various anti-epileptic drugs (AEDs) across the globe. Accurate seizure prediction is therefore important in order to prevent the recurrence of seizures through timely administration of AEDs. Accurate seizure prediction is based on the analysis of complex electroencephalogram (EEG) signals. State-of-the-art seizure prediction mainly involves complex software methods, which can be categorized as time-domain analysis, frequency-domain analysis, and non-linear dynamics [2]. Unfortunately, there is still no reliable, home-based seizure prediction system to help an epileptic patient with timely administration of AEDs. This paper proposes a novel approach to implementing a low-cost hardware neural network which is primarily intended for use in portable equipment to predict epileptic seizures. This paper is organized as follows. Firstly, the paper presents a brief review of state-of-the-art seizure detection techniques. Secondly, a bit-serial data processing unit (DPU) is introduced. The DPU is extremely small and is capable of implementing the functionality of a biological neuron. It is then demonstrated how a multi-layer neural network can be built using DPUs. Thirdly, simple feature extraction hardware is proposed and implemented to work with the network. The feature extraction hardware is implemented as a dedicated simple processor. A preliminary version of this work has been reported [3].
II. BACKGROUND RESEARCH
In general, an EEG signal is defined as a non-stationary biomedical signal in which epileptic seizures are characterized by recurrent spike patterns. An EEG signal has a few useful characteristics which can be beneficial when detecting a seizure event. Specifically, the delta (0-4 Hz) and theta (4-8 Hz) sub-waves in an EEG signal exhibit low frequency and high magnitude during a seizure event [4]. The traditional procedure for analysing an EEG scan requires expensive manpower, as a specialist is needed to review the whole EEG recording. As part of the ongoing research into epilepsy detection, automatic seizure identification methods have been considered, such as the Wavelet Transform [5] and Autoregressive (AR) modelling [6]. These methods provide better resolution for short data segments, and they can be used when real-time data processing is required. EEG research relies on state-of-the-art waveform analysis methods, which include Short Time Fourier Transforms, Wavelet Transforms, the Lyapunov Exponent, Autoregressive Modelling, etc. [7]. The frequency components can be extracted using the Short Time Fourier Transform (STFT), as the basic Fast Fourier Transform (FFT) method suffers from large noise sensitivity [8]. The average electric potential emitted by a group of neurons is recorded through specific placement of electrodes on the human scalp [9]. With the Rosenstein algorithm, the Lyapunov Exponent of the EEG signals can be used in combination with a fuzzy-logic based system, which allows the detection of an epileptic seizure event [10]. AR modelling can reduce the spectral loss and increase the resolution of the EEG spectrum. The optimum order of an AR model is determined by the Bayesian Information Criterion (BIC) and the AR parameters of the EEG signal [6].
VOLUME 6, 2018. This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
Recent work [5] proposed a new algorithm, the tunable-Q wavelet transform, used in conjunction with fractal dimensions to detect epileptic seizures. This tool decomposes the EEG signal into the various sub-bands previously mentioned. The fractal dimensions of the sub-bands are used as discriminating features for epilepsy detection. A 10-fold cross-validation was used to reduce the possibility of over-fitting. The work achieved an average classification sensitivity of 100% and has many advantages, including an ability to analyse seizures within a short time with no errors. However, this approach requires high computational power and complexity, and would not be suitable for wearable seizure detection. Another work [11] employs a multivariate approach to detecting epilepsy by using an empirical wavelet transform, and has a patient-specific model for EEG seizure detection. The data sets used for testing were obtained from the scalp EEG database of the Children's Hospital Boston and the Massachusetts Institute of Technology (CHB-MIT). The tests evaluated 177 hours of EEG recording, using six classifiers. Evaluations achieved the following averages: accuracy 99%; specificity 100%; sensitivity 98%. The work used oversampling in an attempt to address the class imbalance of the dataset. The same oversampling approach was adopted in the training process of our work.
The main conventional classification techniques for machine learning which can be applied to epilepsy diagnosis include: the Naive Bayes (NB) Classifier [12] , Decision Tree Classifier (DTC) [13] , k-Nearest-Neighbours (k-NNs) Classifier [14] , support vector machines (SVM) [8] , empirical mode decomposition [15] and classifiers based on artificial neural networks [16] .
The NB classifier is a simple probabilistic classifier which utilises the Bayes Theorem. It can also be considered as a conditional probability model. This classifier is often used in data mining applications as well as automated medical diagnosis. Thus, it is suitable for epilepsy detection. The Naive Bayes classifier uses the independence assumption that focuses on each feature independently of each other while ignoring any possible correlation between the features [12] . One of the main advantages of utilising the Naive Bayes classifier is the limited use of training data for classification.
Decision trees are also used in epilepsy detection because they are efficient at classifying different sets of data. As a sample is only tested against a subset of the classes, this method does not require complex computations. It has been suggested in a recent paper [13] to utilise neural networks in the design of a DTC. However, there are a few disadvantages when using a decision tree. They are not as accurate as the other classifiers. Furthermore, DTC performance heavily depends on the effectiveness of the particular DTC implementation [13] . They tend to be less robust than other methods as a very small change in the training datasets might result in a huge change in the output prediction.
The k-NN classifier is a non-parametric, non-linear yet relatively simple classifier. This classifier is effective when dealing with large data sets. It relies on class assignment based on a nearby data set where similarities between the samples used are measured with a distance function. A recent work [14] points out that k-NN is applicable to medical classification problems. The basic algorithm for a k-NN classifier is relatively similar to that of a neural network classifier with training stage and a prediction stage. The training stage of the k-NN classifier involves all the different samples which are stored in some form of memory.
SVMs have also been used to analyse EEG signals. A smart sensor IC was proposed [8] with a CMOS chip for scalp EEG acquisition. This chip with an area of 0.35 µm is integrated with the local processing of the sensor node. Feature vectors of the signal are extracted and classified through machine learning. A number of sensors would have to be worn to achieve spatial correlation in order to produce a functional system for epilepsy detection. The individual outputs of the classifiers could then be combined to detect the onset of an epileptic seizure. SVMs have also been used in lung cancer diagnosis along with image processing techniques [17]. High generalisation and an assurance of global optimisation make SVMs useful for such applications. They have been used successfully as classifiers in many other fields [17]. In a more recent work [15], the proposed method uses empirical mode decomposition (EMD) to distinguish seizure and non-seizure EEG waveforms. The datasets used in that work are the same as those used in our research. The authors combine least squares support vector machines (LS-SVM) with the EMD algorithm. The work achieved an accuracy higher than 90%. However, it uses a software approach that requires complex computations. A very relevant study was conducted by Zhong et al. [18]. In that work, it was proposed to apply Gaussian Process (GP) classification to binary discrimination of motor imagery EEG data. Zhong's approach is also computationally intensive but outperforms SVM and k-nearest neighbours (k-NN) in terms of 0-1 loss class prediction error.
Artificial Neural Networks (ANNs) can solve very complex problems and have been used in biological modelling, where they are an efficient tool that can ease the burden on experts in medical diagnosis [16]. It is possible to use ANNs to build an automatic epilepsy detection system. Prediction of the onset of a seizure can be achieved under the assumption that the system generating the EEG is very complex but linear. However, the brain is nonlinear. By analysing the power spectrum, it is nevertheless possible to continue the analysis through a linear approach. Back propagation neural networks include two stages, a forward propagation stage and a back propagation stage. In normal operation, forward propagation passes the EEG sample from the input layer to the hidden layer, where calculations are performed. The result is in turn passed to the output layer, which produces the output of the neural network and indicates whether a seizure is expected for the given input EEG sample. The back propagation stage is a learning process which reduces the error between the calculated output and the target output, i.e. the possibility of seizure occurrence. This process is performed by adjusting the weights of the neural network in real time [19]. Spiking Neural Networks (SNNs) are third-generation ANNs that have been researched in recent years [20]. SNNs are a distinct form of ANN in that each spiking neuron conveys information through the timing of its spikes, whereas other forms of ANN use the rate of the spikes. SNNs are useful in detecting epilepsy through modelling the brain of an epileptic patient [21]. Hardware implementations of SNNs have been realised using NVIDIA CUDA [20] and SpiNNaker [22]. The latter is capable of simulating and implementing the SNNs used in the brain modelling mentioned above.
In summary, a neural network may prove to be better suited to a dedicated hardware implementation than the other, software-implemented classifiers described in this section. Such a hardware neural network would need to meet the research specification of being a small and power-efficient classifier. Neural networks can be implemented in hardware such that high performance is achieved when processing huge amounts of data. In the next section, a novel bit-serial implementation of a neural network (BSNN) is proposed.
III. IMPLEMENTATION OF BIT-SERIAL HARDWARE NEURAL NETWORKS (BSNN)
Bit-serial architectures, which process data one bit per clock cycle, are largely historic. Most modern processors use bit-parallel data processing for performance. However, when high performance is not a priority and the emphasis is instead on very low power and low cost, bit-serial computing has its advantages. In modern applications, bit-serial processing is still used in digital filters where input samples are processed in a bit-serial manner [23].
Here we consider the classical model of a perceptron that receives a vector input pattern x_i, where i = 1, ..., I and I is the size of the vector. These inputs are weighted by the weight vector of a given perceptron (w_1, w_2, ..., w_I), which is obtained in the off-line learning process. The neuron is a summation unit that performs the sum of products to calculate its output u. The output u is then processed by the activation function used in the output neuron. In our case the activation function is a simple threshold operation converting u into a logic signal y which has the value of '0' or '1'.
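Equations 1a and 1b are cited in this paper but are not reproduced in this text. Under the perceptron model described in the paragraph above, they would plausibly take the following standard form. The threshold symbol \theta is an assumption introduced here for illustration and is not a name used by the authors.

```latex
\begin{align}
u &= \sum_{i=1}^{I} w_i \, x_i
  && \text{(1a, assumed form)} \\
y &= \begin{cases}
       1, & u \ge \theta \\
       0, & u < \theta
     \end{cases}
  && \text{(1b, assumed form; } \theta \text{ is a hypothetical threshold)}
\end{align}
```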
The conventional bit-serial architecture can model this behaviour with ease and complex feed forward neural networks (FNNs) based on such neurons can be created using simple, regular hardware structures controlled by simple state machines. The learning process of such designs can be accomplished off-line by using simulation software.
The proposed Data Processing Unit (DPU) is illustrated in Figure 1. It is designed to calculate Equation 1a. Wmem is a RAM that stores the weight values. The ALU consists of a custom multiplier which utilises bit-serial processing. This custom multiplier is a modified version of a simple multiplier. When the DPUs are used in a vector arrangement, they can be controlled by a single state machine (Figure 2(b)) as they all perform the same operations. In this way, an entire neural network layer can be implemented as a vector processor. The computational complexity of the design is kept to a minimum so as to decrease the cost of the hardware design.
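The internals of the custom multiplier are not given in this text, so the following is only a generic sketch of how a bit-serial sum of products could behave, written as a software model. It assumes unsigned operands and a plain shift-and-add scheme that consumes one weight bit per step. The function names `bit_serial_multiply` and `dpu_neuron`, the `bits` parameter and the `threshold` argument are all hypothetical and do not come from the paper.

```python
def bit_serial_multiply(x: int, w: int, bits: int = 8) -> int:
    """Shift-and-add product of two unsigned integers, one weight bit per step.

    Assumption: operands are unsigned and truncated to `bits` bits; the
    authors' custom multiplier may handle sign and width differently.
    """
    mask = (1 << bits) - 1
    x &= mask
    w &= mask
    acc = 0
    for step in range(bits):          # each iteration models one clock cycle
        if (w >> step) & 1:           # serial bit of the weight for this cycle
            acc += x << step          # add the correspondingly shifted input
    return acc


def dpu_neuron(inputs, weights, threshold, bits=8):
    """Sum of products followed by a threshold activation (hypothetical model)."""
    u = 0
    for x, w in zip(inputs, weights):
        u += bit_serial_multiply(x, w, bits)
    y = 1 if u >= threshold else 0    # logic output '1' or '0'
    return u, y


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    print(dpu_neuron([1, 2, 3, 4], [4, 3, 2, 1], threshold=20))
```

Because every DPU in a layer runs the same loop, one controller stepping through the bit positions could drive all of them in lockstep, which matches the vector-processor arrangement described above.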
A three-layer neural network with layer control FSMs and a central controller is shown in Figure 2(a). In Figure 2, x0 to x3 indicate the inputs, w indicates the weights, and u0 and u1 are separate outputs. The u outputs will later be passed through an activation function to obtain a single output y (Equation 1b). Table 1 shows that an 8-bit DPU requires only 24 Logic Elements (LEs) on an inexpensive Altera Cyclone V FPGA, out of over 300,000 LEs available on a Cyclone V chip. The control path for a network with three layers requires 103 LEs (central control FSM: 3 LEs; 2 layer FSMs: 18 LEs each; 2 counters: 32 LEs each). This compares favourably with the size of the datapaths of the typical bit-serial processors listed in the table. Bearing in mind that the control logic of the proposed approach requires only simple state machines, rather than the fully-fledged program control paths used in general-purpose processors, the expected overall benefits of an ASIC implementation include faster operation and lower power consumption.
FIGURE 1. DPU design (logic element counts are included in the table) [3].
The performance of the proposed hardware is tested on FPGAs. The power performance of FPGAs cannot be directly compared to that of an equivalent ASIC. However, the hardware proposed in this work is much smaller than other equivalent processors, as discussed above in Section III. Therefore, it can be expected that an equivalent ASIC implementation of the proposed system will be more power efficient than existing solutions. As a form of estimation, we addressed the issue of power consumption through a simple comparison between our design and a Cyclone V NIOS general-purpose processor design. It was found that the dedicated hardware neural network design requires less than 10% of the resources needed to implement a NIOS processor executing the same algorithm. With this fact in mind, we can infer that an equivalent ASIC will consume an order of magnitude less energy than a dedicated processor.
TABLE 2. Recognition accuracy for different numbers of inputs in an n-1-1 network against training data.
IV. EEG WAVEFORM CLASSIFICATION
The input data used in the evaluation of the proposed FNN was obtained from an online open source [26] provided by the Epilepsy Center of the University of Bonn, Germany [27]. The source provides sets of EEG waveforms for both seizure-free instances and seizures, taken from the brain (epileptogenic zone) of the same patient. Figure 3 shows samples of an epileptic and a normal EEG. Our results were obtained from a number of implementations of the proposed FNN and were evaluated using standard metrics [28] in seizure detection, namely: the sensitivity (TPR), specificity (TNR), positive predictive value (PPV) and negative predictive value (NPV). The hardware implementations were trained offline in MATLAB and then tested with two sets of 100 EEG waveforms. As part of the validation process, the same input data used for training was used to test the n-1-1 network, i.e. n neurons in the input layer, one neuron in the hidden layer and one output neuron, as shown in Table 2. Then, additional data was used to test the same network and the results obtained are shown in Table 3. The n-1-1 network configuration has a poor recognition rate when additional data is used for testing. From these results it can be concluded that a single multi-input neuron in the hidden layer is not sufficient to detect epilepsy accurately.
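The four metrics named above follow directly from the counts of true and false positives and negatives. The sketch below uses their standard definitions. The function name `seizure_metrics` and the example counts are illustrative assumptions, not values taken from the paper's tables.

```python
def seizure_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard confusion-matrix metrics for a binary seizure classifier.

    tp/fn: seizure waveforms classified correctly / missed
    tn/fp: seizure-free waveforms classified correctly / flagged in error
    """
    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),   # sensitivity
        "TNR": ratio(tn, tn + fp),   # specificity
        "PPV": ratio(tp, tp + fp),   # positive predictive value (precision)
        "NPV": ratio(tn, tn + fn),   # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts for 50 seizure and 50 seizure-free waveforms.
    for name, value in seizure_metrics(tp=40, fn=10, tn=45, fp=5).items():
        print(f"{name}: {value:.3f}")
```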
TABLE 3. Recognition accuracy for different numbers of inputs in an n-1-1 network for additional testing (not training data).
Therefore, other configurations have been tested, for example a 40-n-1 network with n hidden neurons. The DPUs used in these tests had a 12-bit precision to increase the accuracy. Table 4 presents the response of the 40-n-1 network using MATLAB results as a form of comparison. The logic element counts needed for different numbers of neurons are also included. In summary, the network configuration of 40-30-1 provides promising results in terms of detecting epileptic waveforms.
V. FEATURE EXTRACTION HARDWARE AND IMPROVED SYSTEM
A. SLOPE CALCULATOR
In order to complete the wearable seizure detection system, it is imperative to include simple feature extraction hardware that provides the inputs to the BSNN. The proposed hardware uses picoMips as the basis of the design. The data path of the feature extractor, illustrated in Figure 4, consists of a synchronous RAM, a simple subtractor implemented as an ALU, and registers. The data path is controlled by a simple FSM module. The ALU requires only 13 ALMs when synthesised on an Altera Cyclone V chip. This hardware serves as a means of extracting the slope S of the EEG waveform from two adjacent points (x_1 and x_0) of the EEG sample. It is calculated using the simple equation S = x_1 - x_0. Each S value is stored in the registers and used as an input for the BSNN.
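As a software illustration of the subtraction described above, the slope between each pair of adjacent points can be modelled as a first difference. This is a minimal sketch: the function name `slopes` is hypothetical, and how the hardware selects which slope values are forwarded to the network is not specified here, so the sketch simply returns all of them.

```python
def slopes(samples):
    """Return S = x1 - x0 for every pair of adjacent samples (x0, x1)."""
    return [x1 - x0 for x0, x1 in zip(samples, samples[1:])]


if __name__ == "__main__":
    # Illustrative integer samples, not data from the EEG database.
    print(slopes([3, 5, 4, 4, 10]))
```

A waveform of n points yields n - 1 slope values, each of which needs only one subtraction, which is consistent with the very small ALU cost reported above.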
This section presents results of experiments that have been conducted to obtain better accuracy by using the slope of the EEG waveform. The tested network configurations are 11-10-10-1, 11-20-20-1, 11-30-30-1 and 11-40-40-1. The results are evaluated using the same statistical metrics used in the previous section. The metrics are presented in Table 6 and Table 7. With 11 inputs, the best correct recognition rate was obtained with the 11-40-40-1 configuration: 70%, with a precision rate of 100%, when tested using training data. When tested with additional data, this network configuration has a recognition rate of 61% and a precision rate of 80%. Further tests used single-feature inputs, i.e. EEG signal slope values taken across 4 different EEG segments; the results of these experiments are shown in Table 12. Table 8 presents the rates of correct recognition when different numbers of inputs were used within a double-layer network configuration. 40 neurons were used in each hidden layer, as this gave the best recognition and precision rates when tested with 11 inputs.
B. EXPERIMENTS WITH MEAN ENERGY
The energy of a designated EEG signal window was also extracted from the EEG input signals; this is in addition to the slope calculator featured above. Mean energy is calculated by the following equation [29] :
The amplitudes of the EEG signal spikes are represented by a(i); w represents the number of a values used. A new system using this extraction hardware component was implemented on FPGAs and achieved 62% accuracy on 100 EEG samples.
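The mean energy equation itself is missing from this text. A common definition that is consistent with the symbols a(i) and w is the mean of the squared amplitudes over the window, and the sketch below assumes that form. Reference [29] may define the quantity differently (for example with absolute values or a different normalisation), so this should be read as an illustration only. The function name `mean_energy` is hypothetical.

```python
def mean_energy(a):
    """Mean of squared amplitudes over a window of w = len(a) values.

    Assumed form: E = (1 / w) * sum(a(i) ** 2 for i in 1..w).
    """
    w = len(a)
    if w == 0:
        raise ValueError("window must contain at least one amplitude")
    return sum(value * value for value in a) / w


if __name__ == "__main__":
    # Illustrative amplitudes only.
    print(mean_energy([1, -2, 3]))
```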
C. IMPROVED SYSTEM
The improved system uses both the mean energy and the slope values of the EEG signals as inputs to the proposed network. The 100-40-40-1 network configuration, which has a recognition rate of 88%, was tested and used as the basis for comparison. The recognition rate increased by 2% in the improved system. The experimental statistics demonstrate that a 16-bit system has the highest correct recognition rate. A high probability of correctly identifying a seizure would be maintained even if the system were made smaller at the cost of some accuracy. A detailed comparison is shown in Table 9 below.
TABLE 9. Improved system statistics using 100-40-40-1 network configuration.
D. CONCLUSIVE REMARK
In conclusion, since the improved system yields only a 2% improvement, we maintain that the 12-bit network using only EEG slope features can still provide reliable performance when predicting seizure events. A comparison of the three systems is shown in Table 10.
VI. HARDWARE NETWORK TESTING AND COMPARISON WITH RELATED WORK
In this section, the proposed network is tested thoroughly and comparisons are made against related research. A brief work flow is explained here. Firstly, the EEG waveform data is obtained from the open source database published by Andrzejak et al., members of the Department of Epileptology at the University of Bonn, Germany [27]. Secondly, the datasets are segmented using the OAT method proposed in recent work [30]. The training of our neural network is completed off-line using simulation software. The hardware of our design encompasses the feature extraction and the BSNN. The work flow of the dedicated hardware is illustrated in Figure 1 and Figure 2. As mentioned above in Section III, no complex algorithm is involved in the proposed method, so as to minimise the hardware cost and optimise its efficiency. The results of the hardware design are shown in Table 12.
The EEG samples obtained from the University of Bonn [27] are 100-sample single-channel EEG datasets. The experiments in our work use both seizure-free and seizure EEG datasets of a single epileptic human patient. Half of the datasets consist of seizure-free samples and the other half are seizure samples. Each sample consists of up to 800 data points obtained from the dataset mentioned above.
The feature vector used in recent research [30] consists of the following statistical metrics: mean (X_Mean), median (X_Median), mode (X_Mode), standard deviation (X_StdDev), first quartile (X_Q1), third quartile (X_Q3), inter-quartile range (X_IQR), skewness (X_skew), kurtosis (X_kurtosis), minimum (X_Min), and maximum (X_Max) [31]. This feature vector has also been included in the experiments. Using this feature vector, the 11-7-1 hardware neural network with a 12-bit architecture obtained a sensitivity and specificity of 60%. It could recognise 30 out of the 50 waveforms used as training data.
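For illustration, the eleven statistics listed above can be computed for one EEG segment with the Python standard library. This is a sketch under stated assumptions rather than the procedure used in [30] or [31]: population (not sample) moments are assumed for the standard deviation, skewness and kurtosis, the quartiles use the default method of `statistics.quantiles`, and the function name `feature_vector` is hypothetical.

```python
import statistics


def feature_vector(x):
    """Compute 11 summary statistics of a segment (at least 2 values)."""
    n = len(x)
    mean = statistics.fmean(x)
    q1, _, q3 = statistics.quantiles(x, n=4)     # quartile cut points

    # Central moments, assuming population (divide-by-n) definitions.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n

    return {
        "mean": mean,
        "median": statistics.median(x),
        "mode": statistics.mode(x),
        "std_dev": statistics.pstdev(x),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,
        "min": min(x),
        "max": max(x),
    }


if __name__ == "__main__":
    # Illustrative values only, not EEG data.
    for name, value in feature_vector([1, 2, 2, 3, 4, 5, 6, 7]).items():
        print(f"{name}: {value}")
```

Several of these statistics need sorting, division or higher powers, which is considerably more arithmetic than the single subtraction used by the slope calculator.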
Ten other network configurations have also been designed and tested. Table 11 presents the configurations and their recognition rates. The table shows that a single hidden layer with 100 neurons has a similar performance to that of a double-layer network (10 neurons in each layer). It would be more cost-effective to use the double-layer configuration as it requires fewer hidden neurons.
By analysing these results, it can be seen that this simple feature vector may prove lacking in providing a very accurate classification for our dedicated hardware neural network when compared with an input vector consisting of multiple slope values obtained from different EEG samples.
Both optimized hardware neural network systems are tested and compared against several software implementations for epilepsy detection [30], [31]. When compared with the results from another paper [30], it is possible to argue that the design proposed in this paper is more practical than designs using the SVM approach. As it is a simple wearable hardware design, many more input neurons are used than in the design proposed previously [30]. In a software implementation of an epilepsy detection system [30], LMT, MLR and SVM classifiers were used. The table below presents a close comparison between our design and the software implementation [30]. It should be noted that the network used for comparison has a 12-bit architecture.
The dedicated hardware design was implemented and synthesised on an Altera Cyclone V FPGA. Different types of configuration are used as a form of comparison to fully explore the capabilities of the proposed network. Therefore, examples with 2 and 3 hidden layers were used. The hardware costs for the different network configurations are included here. The 100-20-20-1 and 100-40-40-1 configurations cost 2303 and 3931 Adaptive Logic Modules (ALMs) respectively. The configurations with 3 hidden layers are 100-10-10-10-1 and 100-5-5-5-1, which cost 2259 and 1748 ALMs respectively.
VII. CONCLUSION
In conclusion, experiments with bit-serial neurons confirm that an extremely small logic system can successfully implement effective epileptic seizure detection. The key benefit of a dedicated neural processor compared to known, equivalent general-purpose processors is that very small control logic and a low bit-precision are sufficient to obtain correct operation. Multiple tests have been conducted with various network configurations to test the feasibility of detecting epilepsy using the proposed approach. The clinical significance of our work is that it provides a technique for developing wearable and reliable hardware for epileptic patients to use in their daily activities. However, a system conceived as a compromise between performance and cost has limitations. The 90% seizure prediction accuracy is high but mispredictions are still possible. Furthermore, the testing conducted during this research was performed using EEG benchmark waveforms. Future work will involve personalised EEG waveform tests suited to individual patients and further investigation into suitable sizes and accuracies of bit-serial FNNs, followed by the development of a low-power ASIC. The aspect of power consumption can then be fully addressed using an ASIC implementation.
