This paper presents a novel framework for designing support vector machines (SVMs) that does not require the SVM kernel to be positive-definite and allows the user to define a memory constraint in terms of fixed template vectors. This makes the framework scalable and enables its implementation in low-power, high-density and memory-constrained embedded applications. An efficient hardware implementation is also discussed, which utilizes a novel low-power memtransistor-based cross-bar architecture and is robust to device mismatch and randomness. Using memtransistor measurement data, we show that the designed SVMs can achieve state-of-the-art classification accuracy on both synthetic and real-world benchmark datasets. This framework would be beneficial for the design of SVM-based wake-up systems for internet of things (IoT) and edge devices, where memtransistors can be used to optimize system energy-efficiency and perform in-memory matrix-vector multiplication (MVM).
I. INTRODUCTION
Due to the proliferation of the internet of things (IoT) in the areas of ubiquitous sensing and human-machine interaction, there has been an increased demand for integrating intelligence directly onto IoT hardware platforms [1]. In these embedded platforms, high energy-efficiency and a low computational/memory footprint are the key design requirements due to limited battery resources. In this regard, wake-up systems play an integral role: they turn on the computationally and power-intensive modules only when certain ambient conditions are detected. As shown in Fig. 1, the wake-up system could be a generic signal detector that senses the input signal and turns on the backend feature extraction and classification module only when it detects such a condition. It could be a speech signal detector, a motion detector in a gesture recognition system, a vibration detector in a seismic monitoring system, and so on. Unlike the backend recognition module [2], the wake-up system can have a simpler architecture but must be highly energy-efficient.
In this scenario, support vector machine (SVM) based wake-up detectors are advantageous because they generalize well with few training samples, their performance is directly determined by their energy-efficiency [3], and under general conditions they offer a unique and robust solution. However, evaluating an SVM decision function is computationally intensive [4] since the input features must be matched with several stored templates (or support vectors). Furthermore, SVM architectures require the kernel functions to be positive-definite, which leads to uniqueness of the trained solution. To overcome the computational complexity, the inherent parallelism in SVMs can be mapped onto an array and matrix-based solution that offers a high degree of regularity for computational acceleration [5]. The parallel architecture can also be mapped onto a two-dimensional grid of computing elements interconnected so that shared inputs run along one dimension and shared outputs along the other. A further increase in computational efficiency can be achieved by implementing the array using analog elements, where computations such as multiplications are performed using the physical properties of the devices. In this regard, a memtransistor [6] based cross-bar array provides an attractive and energy-efficient platform to implement in-memory computation and matrix-vector operations [7]. The high-density integration offered by a nanoscale memtransistor array and its non-volatility could be exploited to implement SVMs. However, due to the intrinsic non-linearity in the memtransistor characteristics, a kernel implemented using its crossbar array cannot be guaranteed to be positive-definite, which is the key requirement for conventional SVMs [4]. To remove the need for the SVM kernel to be positive-definite and to reduce computational complexity and power requirements, we present a novel framework for designing SVMs.
This framework does not impose any explicit restrictions on the nature of the kernel and is robust to fabricated device mismatch and randomness, similar to other work that exploits randomness and is tolerant to device mismatch [8]. Additionally, the fixed number of stored template vectors, together with the CMOS-memtransistor cross-bar topology used to perform in-memory computations [9], significantly improves performance, leading to improved power and area efficiency compared to traditional SVM-based hardware implementations. We used device measurement data for the kernel implementation and showed that the proposed framework can achieve state-of-the-art classification accuracy.
II. TEMPLATE BASED SVM FORMULATION
Given a training set $(x_i, y_i)$, $i = 1, \ldots, N$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}^c$, the generic form of the decision function for a multiclass SVM is given by

$$f(x) = \sum_{s=1}^{S} \alpha_s K(x_s, x) + b$$

where $x_s$ and $x$ are the $s$-th support vector and an arbitrary test vector respectively, $\alpha_s$ and $b$ are the trained coefficients and bias, $K(\cdot, \cdot)$ is a positive-definite kernel, $S$ is the number of support vectors obtained after training and $c$ denotes the number of classes in the dataset [4, 10]. In this paper, we propose a novel variant of the kernel function where, instead of computing the similarity of an arbitrary test point with respect to all the support vectors, we precompute the similarity between each support vector and a predetermined set of $P$ template vectors. When a new test point comes in, we compute its similarity only with respect to the template vectors, and synthesize the kernel using the inner product

$$K(x_i, x) = \sum_{p=1}^{P} \Phi(m_p, x_i)\, \Phi(m_p, x)$$

where $\Phi(m_p, x_i)$ is a non-positive-definite function which gives an estimate of the similarity between the $p$-th template vector $m_p$ and the $i$-th training vector $x_i$. The decision function can thus be rewritten as:

$$f(x) = \sum_{p=1}^{P} w_p\, \Phi(m_p, x) + b$$

where $w_p = \sum_{s=1}^{S} \alpha_s \Phi(m_p, x_s)$. In our implementation, we have chosen a probabilistic approach [10] for training, even though the framework is also applicable to other SVM training procedures. The input vector $x$ is processed by an MVM or cross-bar array to estimate the kernel functions $\Phi(m_p, x)$. The kernels are then processed column-wise by the reformulated training weights $w_p$ and calibrated using the memtransistor crossbar module $\Phi(\cdot, \cdot)$. Fig. 3 shows the flowchart summarizing the design flow for the entire process.
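The reformulation above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the absolute dot product as a stand-in for $\Phi$, and uses random placeholder values for the templates, support vectors, coefficients and bias. All function names are hypothetical. It shows that folding the support vectors into the $P$ weights $w_p$ gives the same decision values as summing over all $S$ support vectors.

```python
import random


def phi(m, x):
    """Stand-in similarity Phi(m, x): absolute dot product (an assumption)."""
    return abs(sum(mi * xi for mi, xi in zip(m, x)))


def kernel(templates, a, b):
    """K(a, b) = sum_p Phi(m_p, a) * Phi(m_p, b)."""
    return sum(phi(m, a) * phi(m, b) for m in templates)


def decide_with_support_vectors(templates, support, alpha, bias, x):
    """f(x) = sum_s alpha_s K(x_s, x) + b: touches all S support vectors."""
    k_vals = [kernel(templates, xs, x) for xs in support]
    return [
        sum(a_s[k] * k_s for a_s, k_s in zip(alpha, k_vals)) + bias[k]
        for k in range(len(bias))
    ]


def fold_weights(templates, support, alpha):
    """w_p = sum_s alpha_s Phi(m_p, x_s): computed once, after training."""
    n_classes = len(alpha[0])
    return [
        [sum(a_s[k] * phi(m, xs) for a_s, xs in zip(alpha, support))
         for k in range(n_classes)]
        for m in templates
    ]


def decide_with_templates(templates, weights, bias, x):
    """f(x) = sum_p w_p Phi(m_p, x) + b: touches only the P templates."""
    feats = [phi(m, x) for m in templates]
    return [
        sum(w_p[k] * f_p for w_p, f_p in zip(weights, feats)) + bias[k]
        for k in range(len(bias))
    ]


# Placeholder data (random, for illustration only; not from the paper).
rng = random.Random(0)
d, P, S, c = 4, 3, 6, 2
templates = [[rng.random() for _ in range(d)] for _ in range(P)]
support = [[rng.random() for _ in range(d)] for _ in range(S)]
alpha = [[rng.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(S)]
bias = [0.1, -0.2]
x = [rng.random() for _ in range(d)]

weights = fold_weights(templates, support, alpha)
f_full = decide_with_support_vectors(templates, support, alpha, bias, x)
f_fast = decide_with_templates(templates, weights, bias, x)
print(all(abs(u - v) < 1e-9 for u, v in zip(f_full, f_fast)))
```

At inference time only `templates`, `weights` and `bias` are needed, so the stored state is fixed by $P$ regardless of how many support vectors training produced.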
III. MEMTRANSISTOR IMPLEMENTATION OF THE TEMPLATE BASED SVM
A. Kernel Computation Using Memtransistor Crossbar
Memtransistor crossbar arrays have shown great potential for neuromorphic computing and learning. These non-volatile memories, if arranged in a crossbar pattern as shown in Fig. 2(c), can store the template vectors as memductance (conductance of the memtransistor device). Fig. 2(c) shows an ideal crossbar where the memtransistors are linear and all other circuit parasitics may be ignored. An input voltage vector $x$ is applied to the rows of the crossbar, and the output voltage is captured with trans-impedance amplifiers (TIAs) at all columns. We get

$$|\Phi(m_p, x)| = |m_{1p} x_1 + \ldots + m_{dp} x_d|$$

where $M_P \in \mathbb{R}^{d \times P}$ is the memductance matrix containing all the template vectors. We normalized the input vector to the range 0 to 1. The memtransistor accuracy (number of repeatable and precise resistance levels) is determined by the number of memductance states it can attain and maintain. In our case, it can attain around 86 states while maintaining significant readout separation. Energy consumption in the fabricated memtransistor was found to be 0.7 nJ for potentiation and 0.5 pJ for depression cycles for a device channel area of $0.423 \times 10^{-14}$ m$^2$, both of which are much lower than in traditional CMOS-based architectures [11].

B. Memtransistor Device Fabrication and Characterization
Fig. 4 shows a schematic of a typical memtransistor, which consists of a floating-gate based two-dimensional multi-state memory device for the storage of weights. The operating principle involves the tunneling of charge carriers from the channel through a tunnel barrier into the floating gate [12]. This charging of the floating gate, in turn, creates an electric field which screens the applied back-gate bias, leading to a hysteresis in the transfer characteristics and hence memory action. For the proposed application, we have used a memory device which is fabricated entirely from ultra-thin two-dimensional layers. The channel of the device is a single-layer molybdenum disulphide (MoS$_2$) flake, which is semiconducting in nature, leading to a high on/off ratio. We have chosen exfoliated hexagonal boron nitride (hBN) as the tunnel barrier because of its single-crystalline nature, lack of defects and large band gap (5.97 eV). A layer of graphite acts as the floating-gate electrode. The device is placed on a Si$^{++}$/SiO$_2$ (285 nm) wafer, and contacts to the sample are made using E-beam lithography, followed by packaging in a chip carrier and wire bonding. To obtain the memory action, we apply a pulse at the gate (Si$^{++}$) electrode while simultaneously measuring the conductivity change of the MoS$_2$ channel by applying a very small drain bias ($V_{sd}$). The memory action is robust and repeatable over multiple switching cycles, and the two-dimensional nature of the constituents makes this device geometry immune to short-channel effects due to improved gate coupling.

We can then combine $n$ such fabricated devices in a crossbar arrangement, where each device memductance can be adjusted by applying a pulse of definite width for a definite time. Fig. 4(b,c) shows the memductance ($M_{sd}$) variation of the memtransistor for gate pulses of -2 V and +2 V respectively. Fig. 4(d) shows the characteristic plot of output current versus input voltage (pulse) obtained for negative and positive pulse intervals. It shows that on applying negative pulses the memductance increases in steps, and so does the output current, and vice versa for positive pulses.
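The crossbar read-out of Section III-A can be illustrated with a small simulation. This is a sketch of the ideal (linear, parasitic-free) crossbar only, under several assumptions that are not taken from the measurements: memductance levels are uniformly spaced on a normalized $[0, 1]$ scale, the input is min-max normalized, and the template values are made-up placeholders. The function names are hypothetical. Only the figure of roughly 86 distinguishable states comes from the text.

```python
N_STATES = 86  # approximate number of distinguishable memductance states (from the text)


def quantize(value, n_states=N_STATES):
    """Snap a value in [0, 1] to the nearest of n_states evenly spaced levels.

    Uniform level spacing is an assumption; a real device's levels follow
    its measured programming curve.
    """
    clipped = min(max(value, 0.0), 1.0)
    return round(clipped * (n_states - 1)) / (n_states - 1)


def normalize(x):
    """Min-max scale an input vector to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]


def crossbar_kernel(memductance, x):
    """Ideal crossbar read-out: column p yields |sum_i M[i][p] * x[i]|.

    Rows carry the input voltages; each column sums its device currents.
    """
    n_rows = len(memductance)
    n_cols = len(memductance[0])
    return [
        abs(sum(memductance[i][p] * x[i] for i in range(n_rows)))
        for p in range(n_cols)
    ]


# Placeholder template matrix with d = 3 rows and P = 2 columns.
ideal_templates = [[0.20, 0.90],
                   [0.55, 0.10],
                   [1.00, 0.40]]
M = [[quantize(v) for v in row] for row in ideal_templates]

x = normalize([2.0, 4.0, 6.0])
phi_quantized = crossbar_kernel(M, x)
phi_ideal = crossbar_kernel(ideal_templates, x)
print(phi_quantized, phi_ideal)
```

With uniformly spaced levels, each stored value is off by at most half a level step, so the read-out error per column is bounded by the number of rows times that half step when inputs lie in $[0, 1]$.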
IV. RESULTS AND DISCUSSIONS
We use real physical data obtained from the memtransistor device for mapping the kernel function in order to mimic actual memtransistor crossbar array behaviour. We demonstrate the classification capabilities of the proposed framework on standard multiclass datasets, both synthetic and real, and compare it with a traditional SVM implementation. Fig. 5(a-c) shows the classification results for three synthetic datasets of size 100×2, 100×3 and 1000×9 for verification. Tables I and II show a comparison between the classification accuracies of the traditional SVM and the template-vector based SVM on different benchmark UCI datasets, namely the Statlog Heart, Banknote Authentication, Diabetes, Haberman and Activity Recognition (AReM) datasets [13]. We combined images from the Georgia Tech Face Database (GTFD) [14] and Caltech-101 [15] to generate a face/non-face dataset and used it for classification. It can be seen that even with a fixed number of support vectors (10 in all cases except the face dataset, where 100 support vectors were used), the classification accuracy is on par with the traditional SVM while the computation is power-efficient and memory-efficient. Furthermore, the support vectors are fixed in number and are implemented using memtransistors, whose energy consumption was found to be 0.7 nJ for potentiation and 0.5 pJ for depression cycles for around 90 memory states while occupying a smaller channel area. Additionally, the ability to compute summation through Kirchhoff's current law (KCL) and the inherent dot product of the memtransistor crossbar provide computational efficiency without any extra hardware.
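To make the memory argument concrete, the following back-of-the-envelope sketch counts stored values for a conventional SVM and for the template-based form. The feature dimension, class count and support-vector count are hypothetical and are not taken from Tables I and II. Only the template count of 10 comes from the text. The count is rough: it ignores the bias term and any kernel non-linearity.

```python
def cost(n_vectors, dim, n_classes):
    """Rough count of stored values: one dim-length vector plus one weight
    per class for every stored vector. The same expression approximates the
    multiply-accumulates needed per inference."""
    return n_vectors * (dim + n_classes)


d, c = 4, 2   # hypothetical feature dimension and number of classes
S = 120       # hypothetical support-vector count of a conventional SVM
P = 10        # fixed number of stored vectors used in most experiments above

conventional = cost(S, d, c)
template_based = cost(P, d, c)
print(conventional, template_based, conventional / template_based)
```

Under these assumed numbers the cost scales with the ratio $S/P$, which is why fixing $P$ in advance bounds the memory regardless of how many support vectors training would otherwise produce.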
V. CONCLUSIONS
In this paper, we presented a unified framework for designing support vector machines (SVMs) that does not impose any explicit restriction on the kernel to be positive-definite. We also showed that the proposed framework is able to find an SVM solution where the number of stored templates is always fixed. The architecture utilizes our novel memtransistor crossbar topology, and the proposed framework itself is robust to fabricated device mismatch and randomness. Measurement data from our fabricated prototype device were used as the fixed stored template vectors for the memtransistor kernel. Classification results were presented for both real and synthetic datasets. We also proposed an SVM-based wake-up system that monitors the incoming signal and decides whether it is relevant to a computing task such as activity recognition. Using only a fixed number of support vectors and a memtransistor kernel, classification can be achieved efficiently without spending much power or memory. Future work includes an implementation of the entire classification framework with a complete memtransistor-based memory system for an extremely low power envelope and computational cost.
