Smartphone-based human activity recognition (HAR) systems cannot deliver the performance required by challenging applications. We propose a dedicated hardware-based HAR system for smart military wearables that uses a multilayer perceptron (MLP) algorithm to perform activity classification. To achieve a flexible and efficient hardware design, the inherently parallel MLP architecture is implemented on an FPGA. The system performance has been evaluated using the UCI human activity dataset with 7767 feature samples from 20 subjects. Three combinations of the dataset are used to train, validate, and test ten different MLP models with distinct topologies. The MLP design with the 7-6-5 topology is selected on the basis of its classification accuracy and cross-entropy performance. Five versions of the final MLP design (7-6-5) with different data precisions are implemented on the FPGA. The analysis shows that the MLP designed with 16-bit fixed-point data precision is the most efficient implementation in terms of classification accuracy, resource utilization, and power consumption. The proposed MLP design requires only 270 ns for classification and consumes 120 mW of power. The recognition accuracy and hardware performance achieved are better than those of many recently reported works.
I. INTRODUCTION
In the area of ubiquitous sensing, wearable sensors are used to measure human body attributes such as body motion, location, temperature, and ECG. The data received from the sensors are integrated, processed, and analyzed on network-connected devices like smartphones or laptops. Meaningful information such as human activity is also extracted on these devices. In the last decade, Human Activity Recognition (HAR) based on wearable sensors has attracted many researchers [1]. Smartphones have become the most feasible devices for HAR implementations because they are equipped with a variety of low-power, small-size sensors [2], [5]. Many HAR-based applications have been successfully implemented on smartphone platforms [5], [33], [34], such as complex activity recognition [3] and sporting activity detection [4]. However, smartphone-based HAR systems cannot deliver the required performance in challenging applications such as workforce monitoring in a military combat scenario [6], [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Mitra Mirhassani.
Real-time soldier activity information, along with other sensory information, is useful feedback for workforce monitoring, smart backup, rescue operations, and virtual war-field mapping [6], [10]. This inspired us to incorporate soldier activity recognition in smart military wearables. Because of difficult combat conditions, the hardware of smart wearables has to meet stringent system requirements such as fast response time, high performance per watt, small form factor, high reliability, and flexibility [10]. Such requirements are achievable only by using parallel processing hardware solutions like FPGAs [22], [28]. The efficient implementation of an activity recognition algorithm on such devices plays a crucial role in wearable system performance.
A customized hardware implementation can achieve lower latency and lower power consumption than a software implementation. Hence, hardware is a favorable choice for HAR implementation in challenging applications. A few attempts at implementing HAR algorithms on reconfigurable hardware are reported in the literature [22], [29], [30]. These implementations focus on the hardware design and modeling of the activity classifier. Because these classifiers are computationally complex and heavy, they strongly influence the overall hardware performance of a HAR system. Therefore, the efficient and flexible hardware implementation of an activity classifier is a challenging problem in the development of a customized HAR system.
In the presented work, a hardware-based MLP classifier has been developed for activity recognition in the smart wearable gateway (Xilinx Artix-7 FPGA). The activity classification is obtained from a single accelerometer placed on the soldier's waist. The activity is classified into five basic classes: walking, sitting, standing, laying, and activity transitions. Ten MLP models with different numbers of hidden layer perceptrons have been trained and tested on a HAR dataset of 20 subjects. For the final hardware implementation, the MLP topology 7-6-5 is selected from the simulation results. Then, five hardware models of the MLP design with different data precisions are synthesized, implemented, and tested on the Artix-7 FPGA. A detailed comparative analysis of classification accuracy, hardware resource utilization, classification latency, and power consumption is presented for all five models. The main contribution of this work is a real-time and power-efficient intellectual property (IP) core for HAR with adequate classification accuracy. This IP core enables the smart wearable gateway to recognize the current soldier activity in real time. Likewise, the low power consumption of the IP core increases the battery life of the smart wearable. The same IP core and development methodology can be further extended to build smart wearables for firefighters, police professionals, mine workers, etc. In addition, a comparison of this work with other existing HAR systems and FPGA-based MLP implementations is presented.
The remainder of the paper is organized as follows. In Section-II, we review HAR classifier algorithms and hardware implementation of MLP classifiers reported for various applications including HAR. Section-III discusses detailed hardware design architectures of a perceptron and MLP. In Section-IV, we describe the complete hardware implementation and testing procedures. Section-V discusses the implementation and performance analysis of MLP designs. Finally, Section-VI concludes this work with important research findings.
II. RELATED WORK
HAR systems have adopted various classification algorithms such as decision trees, Markov models, domain transforms, fuzzy logic, support vector machines (SVM), regression methods, artificial neural networks (ANN), and K-nearest neighbors (KNN) [1]. Wei et al. [7] proposed a two-layer Hidden Markov Model (HMM) for continuous and long-term daily activity monitoring using wearable body sensors. Andreu and Angelov [8] suggested a fuzzy-based algorithm for real-time human activity recognition. In that work, a rule-based and self-learning fuzzy classifier extracts class information from wearable wireless accelerometers. After an extensive analysis, Janidarmian et al. [31] found that KNN with ensemble methods gives the most robust results among the machine-learning models considered. Rodriguez et al. [23] reported BioHarness and smartphone-based activity recognition using a decision tree for the classification. Zebin et al. [24] compared different HAR algorithms for inertial sensor data. Tang and Sazonov [9] compared ANN and SVM classification algorithms for a smart shoe. From the accuracy results, Tang and Sazonov concluded that the ANN classification algorithm performs well compared with the SVM algorithm. Therefore, we adopted an ANN as the activity classification algorithm for workforce monitoring.
Many applications have adopted ANNs for classification, control, and calibration tasks in biomedical instrumentation [11], [12], control systems [13], [14], non-conventional energy production [15], [16], etc. However, the efficient implementation and real-time execution of ANN models on embedded platforms is still a challenging problem because it involves many nonlinear activation functions and a large number of additions and multiplications. The implementation of these algorithms demands high-performance parallel computing devices like FPGAs. Because ANN algorithms have an inherently parallel architecture, FPGA devices are becoming a favorable choice compared with sequential devices [17], [25]. Modern FPGAs have specialized blocks like DSP and BRAM to handle complex mathematical operations. Soft-core and hard-core processors are also available to enhance sequential capability. This makes FPGA technology well suited to the implementation of soft-computing algorithms (like ANN) for real-time applications [18].
The feed-forward ANN is the most commonly used ANN architecture; it is also generally known as the MLP. Many FPGA-based MLP classifier models are reported in the literature for various applications. Gas classification is one of the popular applications of these classifiers [19]. Recently, Zhai et al. [20] proposed a real-time gas classification system, which takes 540 nanoseconds for the classification. Bahoura [21] designed an MLP-based blue whale call classifier using the Xilinx system generator toolbox. That model is unable to achieve real-time performance, but it requires significantly less development time than traditional design methods. Therefore, we selected a high-level programming methodology for the development of the target MLP design.
Many smartphone-based HAR systems are available [32], but very little work has been reported on dedicated hardware (FPGA) based HAR implementations. In [30], Biswas et al. designed and implemented real-time arm movement recognition that required 41.2 microseconds for recognition. Yan et al. [29] tested an MLP-based HAR design on two different FPGAs and obtained impressive results compared with smartphone implementations. Basterretxea et al. [22] implemented an MLP algorithm on an FPGA (Xilinx XC6SLX45CSG324-2) for a generalized HAR system. These works motivated us to implement efficient hardware-based activity classification for smart military wearables. This study explores an efficient and flexible hardware implementation approach for MLP-based activity recognition.
III. HARDWARE ARCHITECTURE
The hardware design of the MLP classifier is broadly divided into two parts. The elementary unit of the architecture is the perceptron model, and a connected network of these perceptrons forms the MLP model. A group of multiple independent perceptrons is referred to as a perceptron layer. The feature information is processed and forwarded from one layer to the next, fully connected layer. These multiple layers of perceptrons are used to solve complex classification problems like activity recognition. This type of network is also known as a fully connected feed-forward neural network. In this work, the final MLP activity classifier is constructed from elementary perceptrons using the Xilinx system generator toolbox. The detailed mathematical formulation and hardware design of the perceptron and the MLP classifier are discussed in the following two subsections.
A. PERCEPTRON MODEL
A perceptron is the fundamental block of the MLP architecture and is inspired by the brain's neuron cell. A perceptron receives inputs from the previous layer and forwards its output to the next layer after performing some mathematical operations. Equation (1) shows the mathematical operation performed by a perceptron.
$$Y_k = f\left(\sum_{i} w_{ki}\, x_i + b_k\right) \tag{1}$$
where $Y_k$ is the output of the $k$-th perceptron, $w_{ki}$ is the $i$-th element of the pre-trained weight matrix of the $k$-th perceptron in the $l$-th layer, $x_i$ is the $i$-th input of the perceptron, $b_k$ is the bias of the perceptron, and $f$ is the activation function. In this work, we use the sigmoid function as the non-linear activation function, and it is the most computationally intensive part of the perceptron design. Therefore, we used an approximation of this function known as the PLAN function ($f$). The PLAN function is one of the best approximations of the sigmoid function and requires fewer hardware resources than other sigmoid approximations [25]. The mathematical description of the PLAN function is shown in Table 1. The mathematical model of a perceptron designed for the FPGA implementation is presented in Figure 1. As highlighted in Figure 1, the input signals are multiplied by the pre-trained weights and then added together with the pre-trained bias. The pre-trained weights and biases are stored in the distributed memory of the FPGA fabric. The output of the addition block is connected to the PLAN function. The PLAN function ($f$) is efficiently modeled using elementary computational blocks that require minimal FPGA hardware resources [25]. The same design method has been used for all perceptrons.
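The perceptron operation and the PLAN activation described above can be sketched in software as follows. This is a minimal floating-point sketch, not the authors' hardware model: the breakpoints and slopes used here are the commonly published PLAN coefficients and are an assumption that should be checked against Table 1, and the function names `plan`, `perceptron`, and `max_plan_error` are hypothetical.

```python
import math


def plan(x: float) -> float:
    """Piecewise-linear (PLAN) approximation of the sigmoid.

    Coefficients are the commonly published PLAN values (assumed here,
    verify against Table 1). All slopes are powers of two, so hardware
    can realize them with shifts instead of multipliers.
    """
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375
    elif a >= 1.0:
        y = 0.125 * a + 0.625
    else:
        y = 0.25 * a + 0.5
    # Sigmoid symmetry: f(-x) = 1 - f(x)
    return y if x >= 0 else 1.0 - y


def perceptron(inputs, weights, bias, activation=plan):
    """Weighted sum of inputs plus bias, passed through the activation."""
    acc = bias
    for x_i, w_i in zip(inputs, weights):
        acc += w_i * x_i
    return activation(acc)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def max_plan_error(lo=-8.0, hi=8.0, steps=3201):
    """Largest absolute gap between PLAN and the exact sigmoid on a grid."""
    step = (hi - lo) / (steps - 1)
    return max(abs(plan(lo + n * step) - sigmoid(lo + n * step))
               for n in range(steps))
```

With these coefficients, the largest deviation from the exact sigmoid on the sampled grid occurs near $|x| = 1$ and stays below 0.02, which gives a feel for the accuracy cost of replacing the sigmoid in hardware.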
We used a high-level FPGA design methodology for the implementation of the perceptron model. This programming technique significantly reduces the design time and provides sufficient design flexibility. Figure 10 (see Appendix B) shows a perceptron model designed with the Xilinx system generator toolbox.
B. MLP MODEL
An MLP model contains multiple layers of perceptrons connected in a feed-forward manner. The output of the final layer is usually terminated with a soft-max function so that the network can be used as a classifier. As our implementation is limited to the testing phase of the MLP, we used a plain maximum function instead of the soft-max function to reduce the FPGA resource utilization [26].
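The reason a plain maximum can replace soft-max at test time is that soft-max is monotonic: it rescales the output scores but never changes which one is largest. The short sketch below illustrates this; the score values are made up for illustration and the helper names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable soft-max over a list of raw output scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


# Illustrative raw scores for five output perceptrons (not from the paper).
scores = [0.2, 1.7, -0.4, 0.9, 1.1]
winner_without_softmax = argmax(scores)
winner_with_softmax = argmax(softmax(scores))
```

Both winners are the same index, so the exponentials and the division needed by soft-max can be dropped when only the predicted class is required.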
The proposed MLP design uses three layers: an input layer, a hidden layer, and an output layer. As shown in Figure 2, Layer 1 is the input layer, which normalizes all $i$ features. In general, an MLP consists of one or more hidden layers, but in the present work we used only one hidden layer to reduce the classification delay and hardware resource utilization. The complete MLP mathematical model is shown in Figure 2; the hidden layer (Layer 2) consists of $j$ hidden perceptrons. The output matrix of the hidden layer is shown in (2).
$$X_2 = f_1\left(W_2 X_1 + B_2\right) \tag{2}$$
where $W_2$ is the hidden layer weight matrix, $X_1$ is the output matrix of the input layer, $B_2$ is the bias matrix of the hidden layer, and $f_1$ is the sigmoidal approximation. The MLP classifies input features into $k$ classes. Therefore, the output layer (Layer 3) consists of $k$ output perceptrons. The output matrix of this output layer is calculated using equation (3).
$$X_3 = f_2\left(W_3 X_2 + B_3\right) \tag{3}$$
where $W_3$ is the weight matrix of the output layer, $X_2$ is the output matrix of the hidden layer, $B_3$ is the bias matrix of the output layer, and $f_2$ is the pure linear activation function. Figure 11 (see Appendix B) shows the MLP model with the 7-6-5 topology, designed with the Xilinx system generator toolbox. We selected seven input features for soldier activity recognition, which determines the number of normalization blocks in the input layer. Five essential output classes are identified for soldier activity classification, which determines the number of perceptrons in the output layer. The number of hidden perceptrons has been decided from the experimental analysis presented in Section-V (Subsection A).
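The layer computations described above can be put together into a small software model of the 7-6-5 forward pass. This is a sketch under several assumptions: the weights are random placeholders rather than trained values, the input is taken to be already normalized, the PLAN coefficients are the commonly published ones, and the 1-based class coding is inferred from the latency example in the text, where class two corresponds to sitting. All function names are hypothetical.

```python
import random

CLASSES = ["walking", "sitting", "standing", "laying", "activity transition"]


def plan(x):
    """PLAN sigmoid approximation (commonly published coefficients, assumed)."""
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375
    elif a >= 1.0:
        y = 0.125 * a + 0.625
    else:
        y = 0.25 * a + 0.5
    return y if x >= 0 else 1.0 - y


def layer(weights, inputs, biases, activation):
    """One fully connected layer: activation(W @ x + b), one row per perceptron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def classify(x1, w2, b2, w3, b3):
    """Return (1-based class code, output scores) for one feature vector."""
    x2 = layer(w2, x1, b2, plan)           # hidden layer, sigmoid approximation
    x3 = layer(w3, x2, b3, lambda v: v)    # output layer, pure linear activation
    return x3.index(max(x3)) + 1, x3       # plain maximum instead of soft-max


# Placeholder parameters: random values, NOT the trained weights of the paper.
rng = random.Random(0)
w2 = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(6)]
b2 = [rng.uniform(-1, 1) for _ in range(6)]
w3 = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
b3 = [rng.uniform(-1, 1) for _ in range(5)]
features = [rng.uniform(0, 1) for _ in range(7)]

cls, scores = classify(features, w2, b2, w3, b3)
label = CLASSES[cls - 1]
```

In the hardware design every perceptron of a layer is evaluated at the same time, whereas this software model walks through them sequentially; the arithmetic per perceptron is the same.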
We used the inherent MLP architecture to achieve minimum classification latency and maximum design flexibility. As shown in Figure 11 (see Appendix B), the MLP design has eight input signals, of which seven (F1 to F7) are feature inputs and ''ST'' is an enable signal. There are two output signals: ''CLS'' is the output class signal (walking, sitting, standing, laying, and activity transitions) and ''D'' is an acknowledgment signal. All of the perceptrons are connected in a feed-forward manner and work synchronously with a control signal. The internal architecture of all hidden layer perceptrons is the same, as shown in Figure 10 (see Appendix B). The architecture of the output layer perceptrons is the same as that of the hidden layer perceptrons, except that the activation function approximation is omitted.
The MLP mathematical model is inherently parallel, i.e., every perceptron in a layer works independently. Implementing this parallel algorithm on sequential devices requires more computation time and power than on parallel processing devices [17]. Hence, we used a parallel processing device, an FPGA, for the MLP implementation. The proposed hardware design achieves the complete hardware parallelism required for the computation of a single layer. Every perceptron in a layer functions independently and in parallel, with layer-level synchronization. This results in a significant reduction in classification latency for time-sensitive applications like soldier activity recognition.
VOLUME 7, 2019
IV. HARDWARE IMPLEMENTATION AND TESTING
The MLP hardware design is targeted at real-time soldier activity recognition in the smart wearable gateway. The activity classification is based on data from a 3-axis accelerometer located on the soldier's waist. As this work focuses on the hardware design of the MLP classifier, we used an available accelerometer dataset to evaluate the MLP classifier hardware implementation [16]. A sensor node has been simulated in LabVIEW using this dataset. The simulated node is used for the performance evaluation of the MLP design in hardware. A detailed description of the dataset, the FPGA implementation, and the testbed setup is given in the following subsections.
A. DATASET DESCRIPTION
The training and testing performance analysis of the MLP classifier has been done on the UCI transition-aware human activity recognition dataset [36]. A 3-axial accelerometer placed on the subject's waist is used for body motion data acquisition, with a sampling frequency of 50 Hz. The raw data are segmented into 2.56-second segments with 50% overlap. Many features are extracted from these data segments, of which we have used seven. The selected features are the standard deviation of the body acceleration on all three axes, the signal magnitude area of the body acceleration, and the mean of the gravity acceleration on all three axes [27]. Five basic and important output classes (walking, sitting, standing, laying, and activity transition) are selected for soldier activity recognition. This work used 7767 feature vector samples of 20 subjects from the complete dataset. Three different combinations of training, validation, and testing samples are selected from the original dataset. The numbers of training, validation, and testing samples selected for each dataset are given in Table 5 (see Appendix A).
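The segmentation and the seven selected features can be illustrated with a short sketch. At 50 Hz, a 2.56-second segment holds 128 samples, and 50% overlap means a new segment starts every 64 samples. The sketch assumes that the body and gravity components of the acceleration have already been separated, uses the population standard deviation, takes the signal magnitude area as the mean of the summed absolute axis values, and picks one possible feature ordering; all of these are assumptions, as are the function names.

```python
import math

FS_HZ = 50                      # sampling frequency from the text
WINDOW = round(2.56 * FS_HZ)    # 128 samples per segment
STEP = WINDOW // 2              # 50% overlap -> new segment every 64 samples


def windows(samples, size=WINDOW, step=STEP):
    """Split a sample sequence into fixed-size overlapping segments."""
    return [samples[s:s + size]
            for s in range(0, len(samples) - size + 1, step)]


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation (an assumption about the definition)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def features(body, gravity):
    """Seven features for one segment.

    body, gravity: lists of (x, y, z) tuples, assumed to be the already
    separated body and gravity acceleration components.
    """
    bx, by, bz = zip(*body)
    gx, gy, gz = zip(*gravity)
    sma = mean([abs(x) + abs(y) + abs(z) for x, y, z in body])
    return [std(bx), std(by), std(bz), sma, mean(gx), mean(gy), mean(gz)]
```

For example, 256 consecutive samples yield three segments starting at samples 0, 64, and 128.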
B. FPGA IMPLEMENTATION
The present application demands efficient MLP performance in field operations. After appropriate training, the MLP design works only in the testing phase. Thus, there is no need to implement training hardware for actual field operations. Therefore, this work uses off-line training to avoid the unnecessary hardware burden of training. The MLP designs are trained off-line using MATLAB. Ten MLP models per dataset with distinct topologies have been trained and validated to find the optimum MLP topology. Three sets of testing samples (1165, 1941, and 2718) are used for testing all MLP models in MATLAB simulation. A comprehensive analysis of the testing results is presented in the next section.
After training, the calculated weight and bias matrices are included in the hardware model of the MLP design. These parameters are stored in the distributed memory of the FPGA fabric to avoid the time- and power-consuming read cycles required for external memory access. Five variants of the MLP design have been implemented, each with a distinct input-output (IO) data precision for all perceptrons. The data precision of all weights and biases is kept constant at 16-bit fixed point. The complete MLP has been designed with the Xilinx system generator design toolbox, which helps to reduce development time and to generate a flexible design. After design and performance evaluation, the MLP design is packaged into a Xilinx IP core. This flexible IP core can be implemented on any Xilinx FPGA by changing only the design token configurations. In this work, the MLP IP core is implemented on a Xilinx FPGA (Artix-7-35T, xc7a35ticsg324-1L) with an operating frequency of 100 MHz. A soft-core processor (MicroBlaze) based IP test system has been developed for the hardware performance evaluation of the MLP IP core. Figure 12 (see Appendix B) shows the hardware design of the MicroBlaze-based IP test system. The Vivado IP integrator software is used for the complete development of the IP test system. The MicroBlaze is programmed with the IP test application using the Xilinx SDK.
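The effect of storing trained parameters in fixed point can be explored with a small quantization sketch. The text gives only the total word length (for example 16 bits); the split between integer and fractional bits is not stated, so the 12 fractional bits used below are a hypothetical choice, as are the function names. Saturation on overflow is likewise an assumption about the intended behavior.

```python
def to_fixed(value, total_bits=16, frac_bits=12):
    """Convert a real value to a signed fixed-point integer with saturation.

    frac_bits=12 is a hypothetical split; the source only gives word lengths.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    raw = int(round(value * scale))
    return max(lo, min(hi, raw))


def to_float(raw, frac_bits=12):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def quantize(value, total_bits=16, frac_bits=12):
    """Round-trip a value through the fixed-point format."""
    return to_float(to_fixed(value, total_bits, frac_bits), frac_bits)
```

Running trained floating-point weights through `quantize` with different word lengths before evaluating a software model is one way to anticipate the accuracy loss of a narrower hardware format before synthesis.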
C. TESTBED SETUP
Hardware testing is important for evaluating the operational performance of the MLP IP core. Figure 3 shows the hardware testbed setup for MLP IP core testing. The setup is divided into two parts: the first is the MicroBlaze-based IP test design with the MLP IP core, and the second is the LabVIEW-based sensor node simulator and receiver simulator. In the targeted application, the sensor node handles data acquisition, segmentation, and feature extraction, and the extracted features are forwarded to the gateway over a UART communication link. After the output class has been evaluated, the current soldier activity is forwarded to the receiver. As shown in Figure 3, the same practical scenario has been simulated using a LabVIEW-based GUI. The sensor node simulator sends the test features of dataset-A to the gateway (Artix-7 FPGA) through the UART-1 communication link. The activity classification is performed on the gateway by the MLP IP core, and the calculated class is then forwarded to the receiver simulator over the UART-2 link.
V. PERFORMANCE EVALUATION
To meet defense field requirements, the performance evaluation of the MLP IP core focuses on optimum topology selection, classification accuracy, hardware resource utilization, classifier power consumption, and classification latency. A detailed analysis of the simulation and implementation results is given in the following subsections.
A. MLP TOPOLOGY SELECTION
The training and validation of the basic MLP design have been conducted in MATLAB using back-propagation algorithms. This work used an MLP with a single hidden layer, which helps to reduce the classification latency, hardware resources, and power consumption. This configuration also reduces overfitting in the MLP design [40]. Seven features are selected for the classification and five output classes are chosen, which gives an MLP design with seven perceptrons in the input layer and five perceptrons in the output layer. The final topology of the MLP design therefore depends only on the number of hidden layer perceptrons, which is decided from the following analysis.
In this work, an iterative constructive method is used to decide the optimum number of hidden layer perceptrons [41]. Ten candidate MLP designs have been generated for each dataset by changing the number of hidden layer perceptrons. In total, thirty independent MLP models have been trained and validated on the three datasets A, B, and C (see Table 5 in Appendix A). The generated MLP designs are then tested on the test samples of the respective datasets using floating-point data operations. Figure 4 shows the classification accuracy versus the number of hidden layer perceptrons for all datasets. Each accuracy value is the average over 10 repetitions of training, validation, and testing of the corresponding MLP model. As each extra perceptron adds to the hardware resource utilization and power consumption, we allowed a maximum of ten perceptrons in the hidden layer. It is clear from Figure 4 that the MLP design with six hidden layer perceptrons gives the maximum classification accuracy for all three test datasets. Similarly, the validation performance in the form of average cross entropy is shown in Figure 5. The minimum cross entropy is achieved with six hidden layer perceptrons for all datasets. Both results indicate that the hidden layer with six perceptrons performs better than the other configurations. Therefore, we selected the MLP topology (7-6-5) with six hidden layer perceptrons for the FPGA implementations.
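The selection logic described above can be sketched as follows: average the accuracy of repeated runs for each hidden layer size, then pick the size with the best average. The sketch also shows a standard cross-entropy computation for one-of-k targets. It is illustrative only; the actual training was done in MATLAB, the tie-breaking rule (prefer the smaller layer) is an assumption, and the function names are hypothetical.

```python
import math


def cross_entropy(probabilities, target_index):
    """Cross entropy of one sample: -log of the probability of the true class."""
    return -math.log(probabilities[target_index])


def mean_cross_entropy(batch_probabilities, targets):
    """Average cross entropy over a batch of predictions."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probabilities, targets)]
    return sum(losses) / len(losses)


def average(values):
    return sum(values) / len(values)


def pick_hidden_size(runs_by_size):
    """Choose the hidden layer size with the highest mean accuracy.

    runs_by_size maps a hidden layer size to the accuracies of its repeated
    runs. Ties go to the smaller layer, since each extra perceptron costs
    hardware resources and power (tie rule is an assumption).
    """
    means = {n: average(runs) for n, runs in runs_by_size.items()}
    return min(means, key=lambda n: (-means[n], n))
```

For instance, given synthetic placeholder accuracies such as `{4: [0.90, 0.92], 6: [0.95, 0.97], 8: [0.95, 0.97]}`, the function returns 6, because sizes 6 and 8 tie and the smaller layer is preferred.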
B. HARDWARE CLASSIFICATION ACCURACY
The hardware of the finalized MLP model (7-6-5) has been designed with the Xilinx system generator toolbox. Five different MLP designs with distinct data precisions have been synthesized, implemented, and tested. The input and output (IO) bit precision of all the perceptrons is different in each variant of the MLP IP core. The selected precisions are 24, 20, 16, 12, and 8 bits. However, the precision of the weights, biases, and feature inputs is kept constant at 16 bits in all variants of the MLP IP core. All versions have been implemented and tested using the procedure explained in Section-IV (Subsections B and C). The hardware accuracy of each version is tested on the test data with 1165 samples. Figure 6 shows the classification accuracy of the different implemented models. The accuracy is almost constant down to the 16-bit precision model, and below this it starts to decay. The models with 24-bit and 20-bit precision require more power and hardware resources than the 16-bit model, as explained in the following subsections. Therefore, under our application constraints, the MLP IP core with 16-bit perceptron IO precision is the best solution in terms of classification accuracy. Table 2 presents the class-wise sensitivity, positive predictive value, specificity, negative predictive value, and accuracy. The results indicate that the confidence in the sitting output class is lower than in the other classes. In addition, the classification of walking, laying, and activity transition is more accurate than that of the sitting and standing classes.
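The class-wise measures listed for Table 2 follow from a confusion matrix using a one-versus-rest count of true and false positives and negatives. The sketch below shows the standard definitions; the function name is hypothetical, and no values from Table 2 are reproduced.

```python
def class_metrics(confusion, k):
    """One-versus-rest metrics for class k.

    confusion[i][j] is the number of samples of true class i predicted as
    class j. The sketch assumes every denominator is non-zero.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                         # class k, missed
    fp = sum(confusion[i][k] for i in range(n)) - tp    # other class, called k
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),            # positive predictive value
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),            # negative predictive value
        "accuracy": (tp + tn) / total,
    }
```

A low positive predictive value for a class means that many samples assigned to that class actually belong to another one, which is the sense in which the confidence of an output class is lower.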
C. HARDWARE RESOURCE UTILIZATION ANALYSIS
As mentioned in the previous subsection, five different versions of the MLP design are implemented on the Artix-7 FPGA. The implementation reports of all five designs have been analyzed collectively. A change in the perceptron IO precision affects the hardware resource utilization of the MLP IP core design. As shown in Figure 7, the FPGA resource utilization decreases linearly as the IO precision is reduced. However, the DSP (DSP48E1) utilization remains the same for all MLP versions. From Figure 7, it can be observed that the MLP design with 8-bit perceptron IO precision is the most resource-efficient design. However, as shown in Figure 6, the classification accuracy of the 8-bit MLP design is the lowest of all. Therefore, the MLP design with 16-bit IO precision offers high accuracy with moderate resource utilization.
D. POWER CONSUMPTION ANALYSIS
The power consumption of all designs has been estimated with the Xilinx power estimator toolbox using the same parameter settings. The operating frequency of the FPGA is set to 100 MHz for all models.
The power consumption depends significantly on the FPGA resource utilization and the operating frequency. Therefore, the power consumption estimates shown in Figure 8 follow the same trend as the hardware resource utilization. The DSP slices are the major contributor to the total power consumption of the MLP design; they add 44 milliwatts of dynamic power to each design. As the number of DSP slices used is the same for all designs, the power consumption profile changes only gradually. As with resource utilization, the MLP design with 16-bit IO precision emerges as the optimal solution in terms of accuracy and power consumption.
E. CLASSIFICATION LATENCY
The hardware testing of the MLP design is conducted with the MicroBlaze-based test system and LabVIEW-based GUIs. The MLP design is packaged in an IP core, which communicates with the test system through the AXI-Lite interface shown in Figure 12 (see Appendix B). The classification latency is calculated by observing the AXI-Lite bus transitions during a live hardware run. The bus transitions are acquired with the USB-JTAG programming circuitry and the Vivado system debugger toolbox. Figure 9 shows the signal transitions during actual execution of the MLP algorithm with 16-bit IO precision on the FPGA. After the last feature (f7) has been sent to the MLP IP core, the ''st'' signal goes high. At this point, the IP starts the classification task and generates the acknowledgment signal ''d'' on completion. A total of 27 clock cycles are required for classification. The test system and the MLP IP operate at 100 MHz, so a total of 270 nanoseconds is required for classification. As highlighted in Figure 9, the ''cls'' signal is the output class sent by the MLP IP, which is two (sitting) for the present feature vector. The hardware architecture of all MLP versions is the same except for the perceptron IO precision. Therefore, the classification latency of all MLP versions is the same.
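The latency figure follows directly from the cycle count and the clock frequency: at 100 MHz one clock period is 10 ns, so 27 cycles take 270 ns. The small sketch below reproduces this arithmetic and, as an added illustration, relates it to the 1.28-second interval between consecutive 2.56-second segments with 50% overlap. The function name is hypothetical.

```python
def latency_ns(clock_cycles, clock_mhz):
    """Latency in nanoseconds for a given cycle count and clock frequency."""
    period_ns = 1000.0 / clock_mhz      # 100 MHz -> 10 ns per cycle
    return clock_cycles * period_ns


classification_ns = latency_ns(27, 100)          # 27 cycles at 100 MHz
segment_interval_ns = 2.56 * 0.5 * 1e9           # new segment every 1.28 s
busy_fraction = classification_ns / segment_interval_ns
```

Under these figures the classifier is busy for only a tiny fraction of the time between two consecutive feature vectors from a single sensor node.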
F. COMPARISON WITH EXISTING WORKS
The reliability of a HAR system depends on its activity recognition accuracy. Table 3 compares the accuracy with some previous works. Various types of HAR classification algorithms were adopted for accelerometer inputs in these works. In [32] and [34], the classification algorithms are implemented on a smartphone platform with floating-point data precision. The work presented in [22] and this work implement the classification algorithm on customized hardware with fixed bit precision. The comparison shows that the proposed classifier achieves higher accuracy than all of the compared implementations.
Many recently reported works implement the MLP algorithm on various FPGA platforms. We compared this work with similar FPGA-based MLP classifier implementations designed for various applications. Table 4 compares the hardware performance of this MLP IP (16-bit precision) with other MLP implementations. Hardware design simplicity affects the maximum operating frequency, which directly determines the classification latency of the MLP design. To achieve optimum hardware performance, we adopted the inherent MLP architecture in the hardware design and used DSP blocks for multiplications. Owing to this approach, the MLP IP core achieved a classification latency of 270 nanoseconds, which is the lowest among all the listed models. We chose the Artix-7 35T (xc7a35ticsg324-1L), which is specially designed for optimum power performance. This FPGA fabric runs on a 0.95-volt supply, which saves around 30% power compared with normal Artix-7 devices [35]. As shown in Table 4, the total FPGA power consumption of the complete test design is 241 milliwatts, of which 120 milliwatts is consumed by the MLP IP core.
The comparison shows that the MLP IP core performs best among the compared works in terms of time and power.
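The reported latency and power figures can be combined into two simple derived quantities: the share of the test design's power drawn by the MLP IP core, and a rough energy per classification. This is a back-of-the-envelope sketch that assumes the 120 mW estimate applies for the full 270 ns of a classification; the derived values are illustrations and are not results reported in the text. The function names are hypothetical.

```python
def power_share(part_mw, total_mw):
    """Fraction of the total power drawn by one part of the design."""
    return part_mw / total_mw


def energy_per_classification_nj(power_mw, latency_ns):
    """Energy in nanojoules: mW * ns = 1e-12 J = 1e-3 nJ."""
    return power_mw * latency_ns / 1000.0


ip_share = power_share(120, 241)                       # about one half
energy_nj = energy_per_classification_nj(120, 270)     # about 32.4 nJ
```

Such a figure is mainly useful when comparing designs that differ in both latency and power, since either number alone can be misleading.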
VI. CONCLUSION
In this paper, we proposed a hardware implementation of the MLP classifier for activity classification in smart military wearables. The simulation results show that the MLP design with the 7-6-5 topology gives the maximum classification accuracy. Five variants of the final MLP design (7-6-5) with different data precisions were implemented on the Xilinx Artix-7 FPGA. The classification accuracy of the MLP models with 24-, 20-, and 16-bit precision is almost constant; below 16 bits it starts to decay. The collective analysis shows that resource utilization and power consumption can be reduced without compromising the classification accuracy by reducing the perceptron IO precision. A reduction of 0.9% to 3.33% in classification accuracy has been observed between the simulation and hardware testing results. This is mainly because of the change in data format (floating point to fixed point) and the sigmoid function approximation.
The implementation results show that the MLP design with 16-bit fixed-point data precision is the most efficient MLP design in terms of classification accuracy, latency, power consumption, and resource utilization. This MLP design requires only 270 nanoseconds for classification using 120 milliwatts of dynamic power. The classification accuracy of the proposed HAR classifier is higher than that of many HAR implementations reported in the literature. Moreover, the proposed MLP classifier outperforms recently reported FPGA-based MLP designs in terms of classification time and power consumption.
The presented work can be further extended to implement on-chip learning in hardware, which can be integrated into the on-line training hardware of a smart suit. Additional essential output classes can also be incorporated easily into the hardware design because of the flexibility it offers.
APPENDIX A DATASET DESCRIPTION
See Table 5 .
APPENDIX B FPGA IMPLEMENTATION DESIGNS
See Figures 10-12 .
ACKNOWLEDGMENT
The authors would like to express special gratitude to Dr. Kishore Bhurchandi for his guidance and support. The authors would also like to thank Dr. Vipin Milind Kamble for his suggestions for improving the paper.
