Current advances in emerging memory technologies enable novel and unconventional computing architectures for high-performance and low-power electronic systems, capable of carrying out massively parallel operations at the edge. One emerging technology, ReRAM, which belongs to the family of memristors (memory resistors), is gathering attention due to its attractive features for logic and in-memory computing and the benefits which follow from its technological attributes, such as nanoscale dimensions, low-power operation, and multi-state programming. At the same time, CMOS design is quickly reaching its physical and functional limitations, and further research toward novel logic families, such as threshold logic gates (TLGs), is warranted. In this paper, we introduce a physical implementation of a memristor-based current-mode TLG (MCMTLG) circuit and validate its design and operation through multiple experimental setups. We demonstrate two-input, three-input, and four-input MCMTLG configurations and showcase their reconfiguration capability. This is achieved by varying memristive weights arbitrarily to shape the classification decision boundary, thus showing promise as an alternative hardware-friendly implementation of artificial neural networks. By employing real memristor devices as the equivalent of synaptic weights in TLGs, we realize components that can be used toward an in silico classifier.
I. INTRODUCTION
TODAY'S conventional computing paradigm is based on the MOSFET transistor and CMOS technology, two cornerstones which have underpinned the development of digital electronics over the last five decades. Although there is still optimism for future improvement of CMOS, accumulating scientific evidence indicates the need for advances on both the technology and the computation/architecture fronts [1], [2]. The former addresses the increasing difficulty of pursuing further downscaling (with its associated drop in reliability [2]) whilst the latter seeks to address the Von Neumann bottleneck, where increasingly large memories and powerful processors struggle to communicate over a limited interlink whose data transfer capacity does not scale fast enough [2]-[4].
On the technology front, recent advances in emerging memory technologies introduce new tools in electronic system design. One prominent technology, ReRAM devices [5] (part of the memory resistor, memristor, family of devices) can act as nanoscale [6] , finely tunable [7] , electrically programmable [8] , [9] resistive elements. Memristors are capable of storing multi-bit information and retaining their memory state when powered off (non-volatile) while simultaneously their adoption in electronics is accelerated by additional advantages they provide, such as better area scaling, low power consumption and CMOS-compatibility [10] - [12] . Hence, memristor devices are considered one of the most promising candidates for the next generation of computer circuits, systems and architectures [13] - [16] . This enables the development of area and power efficient reconfigurable electronics, which are very important in a wide range of applications, e.g. the embedded computing systems that process the data at the edge, where there is a continuous race toward minimization of chip area and power consumption for neuromorphic edge computing [17] , [18] .
1549-8328 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
On the computation/architecture front, there has been a sustained effort to develop bio-inspired computation concepts, mostly in the guise of artificial neural network (ANN)-enabled systems. Research on artificial neural networks has thus far spanned the entire interval between the first simplified models of all-or-none hardware neurons [19] and the current state-of-the-art GPU-based ANNs [20]-[22]. However, one often overlooked example of ANN-like computation can be found in the form of its quantized, digital counterpart, the so-called threshold logic (TL) [19]. TL is a model for performing a comparison between a threshold value and the weighted sum of an input vector. A basic computational unit in TL is called a threshold logic gate (TLG) and it corresponds to the artificial neuron in ANNs. TLGs were introduced as a method of describing and modeling neural activity in the brain through conventional electronic circuits and systems [19], [23]. Although TL is effectively a simplification of ANNs, TLG-based logic families have been shown to be capable of fast and low-power operation as evaluated by the power-delay trade-off metric [24]-[26]. Many of these implementations suggest the hardware design of ANN-like circuits and systems could greatly benefit from using TLGs as their fundamental logic cell, thus enabling advances toward neuromorphic computer architectures and applications, as some recent work explores [27]-[30].
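As a minimal illustration of the TL model described above (a comparison between a threshold value and the weighted sum of an input vector), the following Python sketch evaluates a TLG on binary inputs. The function name `tlg`, the unit weights and the threshold values are illustrative assumptions and do not come from any of the referenced implementations.

```python
from itertools import product


def tlg(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs reaches the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# Illustrative unit weights (assumed). Only the threshold changes between gates.
weights = (1, 1)
for name, threshold in (("OR", 1), ("AND", 2)):
    table = {x: tlg(x, weights, threshold) for x in product((0, 1), repeat=2)}
    print(name, table)
```

With identical weights, changing only the threshold moves the gate from OR to AND, which is the sense in which a single TLG can be reconfigured.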
In this work, we demonstrate a practical implementation of metal-oxide memristor-based differential TLG (MCMTLG) circuits, thus laying the foundation for building artificial perceptron networks. In Section II, we provide the background that supports the concept of memristive TLGs as the basis for future reconfigurable computing systems. In Section III, we provide the design and operation description of the MCMTLG. In Section IV, we explain the experimental setup, and in Section V, we validate the functionality of the proposed gates through 2-input, 3-input and 4-input experimental setups where tunable memristor weights are used as artificial synaptic weights, thus defining the contribution of each input component to the TLG's comparison function. Notably, this functionality is enabled by the recently introduced multi-bit memristor technology [7], which enables fine, continuous control of the memristive synaptic weights. We show experimentally how changes in memristor resistances affect the shape of the decision boundary of the MCMTLG and comment on key practical factors that affect performance. Also in Section V, we model the proposed MCMTLG circuit in Cadence's Virtuoso Spectre simulation environment and carry out performance testing to estimate its power and delay metrics. Using the modeled MCMTLG, we compare it with existing memristive TL circuits. Finally, our conclusions and a discussion of future work are presented in Section VI.
II. TLG AND MEMRISTOR TECHNOLOGY BACKGROUND
Many competing emerging memory technologies are part of the memristor technology family, such as PCM, ECM, VCM, etc. [31], [32]. A number of these technologies have been studied as novel reconfigurable circuit and system components [13], [33]-[35] (including TLGs [30], [36]-[38], albeit exclusively in simulation). In principle, any memristor technology featuring non-volatile resistive switching, a sufficiently high ON/OFF ratio and resistance levels that are neither excessively high nor excessively low can be introduced into an appropriately designed TLG. In this work, we have used a type of metal-oxide-based memristive device shown to routinely support analogue, non-volatile resistive state resolution of 5 bits [7].
Different designs of TLGs have been proposed with different trade-offs regarding their power-noise ratio performance [26] and different circuit implementations [26], [39], [40]. Recently, differential Current-Mode (CM) TLGs (CM-TLGs) have been gaining ground as one of the fastest and lowest-power TL implementations [26], [41]. More importantly, recent work showcasing optimized differential CM-TLGs [42]-[44] highlights the benefits this logic technology offers in comparison with conventional CMOS logic networks regarding area and power savings [25].
Transistor-based differential CM-TLGs can provide some competitive solutions, but at the same time, their benefits are limited by increased sensitivity to noise and device mismatch, as well as limited fan-in (input vector space dimensionality) [45], [46]. Transistor-based TLG designs regulate the input weights and classify the input space through parallel connections of MOS devices. These designs are limited by the need to use a standard minimum transistor size as the unity input contribution multiplier [42]. To apply different non-unity multiplier factors per input, multiple transistor components need to be selected. At the same time, transistor-based TLGs are incapable of representing non-integer contribution values, as the weights are determined by a specific number of binary weights [44], [47].
To address these limitations, a lot of work has focused on novel and more efficient implementations of the differential synaptic weight circuits by introducing conventional electronics, such as capacitors and resistors [40], [48], while other recent implementations take advantage of emerging nano-electronic devices such as single-electron technology (SET) [49] and negative resistance devices (NRD) [50], [51]. More recently, Rajendran et al. [52] and later Dara et al. [41], among others, have shown that memristor technology can efficiently be incorporated into TLG designs, hence becoming the catalyst for significant reductions in power consumption and noise sensitivity, as well as logic and area scaling in TLGs, compared to conventional Boolean logic gates. Introducing memristor devices as analogue weights in a digital logic gate family has the advantage of bringing highly localized, continuously tunable, low-voltage operated non-volatile memory with a minimal front-end footprint into the TLG, thus providing a potentially decisive advantage in the implementation of memory-heavy ANN accelerators [30], [38], [53]-[56].
From the available memristive TL implementations, the computing schemes of differential memristively-enhanced load comparison TLGs are shown to have advantages over simpler memristor-based TLG designs [53] , [57] , [58] . More specifically, the differential implementations, in general [39] , showcase delay and energy improvements over non-differential memristive TL (MTL) designs [59] - [62] . At the same time, there is a significant trade-off of energy-delay-flexibility and area-complexity between these two main groups of TLGs. While the differential TLG group is optimized for performance and logic-centric features (e.g. positive and negative weight configurations as in [43] ) the non-differential MTL gates provide a simpler gate structure where a resistive network (weighted inputs) is fed into an inverter (thresholding element). Different variants of the MTL scheme are available and can provide novel solutions to memristively-enhanced TLG-based computer architectures, capable of competing against conventional systems in applications such as object recognition, FPGAs, etc. [28] - [30] , [60] , [63] - [65] .
Memristively-enhanced TLGs have a lot of competition from other memristor-based logic circuits (depending on the application). Technologies such as memristor-based Look-Up-Tables (LUTs) [66]-[68] and memristor-based universal logic gates [69]-[71], such as memristor ratioed logic (MRL) NAND gates [71], may be preferable to TLGs in some architectures and/or applications. However, the MCMTLG's requirements for a state-of-the-art memristor technology, such as the one used in this work, favor in critical ways the implementation of non-uniformly behaved programmable analogue resistive elements. At the same time, TL computing schemes do not require frequent switching of the memristive weights, as they are mainly used as a programmed-once-read-many type of reconfigurable logic, and thus do not impose high switching endurance requirements on the memristors. In contrast, LUT techniques require stable and hard-defined memristive states to operate correctly, while other logic techniques that make use of memristive networks to perform state-based logic [71] require total homogeneity of device behavior and high endurance in large crossbar arrays to be viable as a true alternative post-von Neumann solution. While unorthodox by the standards of mainstream conventional systems, the implementation of future computers that make use of non-uniformly behaved components might be the key to a new era of computing. Neuro-inspired logic schemes, such as TLGs, are ideal to 'assimilate' such 'imperfect' technologies, i.e. technologies that do not offer better reliability and controllability compared with existing conventional digital electronics, and use them to build new generations of computers, similar to what biological brains seem to achieve in nature's biological neural networks.
Combining an understanding of TLGs as fundamental computational units in ANN-like post-von Neumann computing schemes with the recently demonstrated multi-bit capabilities and fine tuning of memristances of metal-oxide-based memristive devices raises the prospect of a memristor-based reconfigurable fabric. In the following section, we present our approach to memristively enhanced CM-TLGs (MCMTLG) and initial experimental results using a discrete component-on-breadboard circuit implementation of the proposed design.
III. PROPOSED MCMTLG DESIGN AND OPERATION
A differential CM-TLG design, such as the MCMTLG, consists of two parts (see Fig.1 ), the differential and the sensor part. The differential part consists of the input and threshold branches, handling the input and threshold memristance input vectors respectively. Within each branch, the weight vectors are implemented by a bank of 1T1M (memristively source-degenerated pMOS transistors) ensembles. Each 1T1M ensemble receives a digital input signal controlling the gate of the pMOS transistor; a single element of the branch's input vector, with the accompanying memristor defining the contribution of each such vector element. If the input is low (active), then a memristance-dependent current flows from that 1T1M sub-branch toward the sensor part. Additionally, each of the differential branches is power-gated by a back-to-back (BtB) pMOS circuit.
The sensor part is the thresholding element of the circuit, comparing the differential inflowing currents and settling to a binary output indicating which is greater. It is designed as an SRAM memory cell; a latching element consisting of two BtB connected CMOS inverters, forming a positive feedback loop (Fig. 1). Furthermore, two additional CMOS inverters are added at the outputs of the sensor part, one per output, hence avoiding any voltage level degradation and isolating the sensor part from the circuitry connected further down the logic cascade. The power supply to the sensor part with the isolation inverters is controlled by a pMOS power gate.
Fig. 1. The two basic parts are the memristor-based 1T1M array performing a dot-product multiplication between the memristor state (memristance, i.e. memory resistance) and the binary input vector controlling the accompanying pMOS device, and the sensor determining which 1T1M array outputs greater current. CA and CO nodes, standing for canonical (CA) and complementary (CO), respectively, are the output nodes of the gate. The outputs are available during the evaluation phase when the differential current flows have been compared and a final stable state of the sensor part has been obtained. The clock signal (CLK) and its complement (CLK') control the sensor's power gate and the equalization circuit, respectively. Hence, the CLK signal defines the transition between the two operation phases, equalization (reset) and evaluation (set).
The main design features of our proposed MCMTLG are based on a variety of differential TL circuits, such as [41] and [42] . Similar to Dual Clocked Current Mode Threshold Logic Gate (DCCML) design, presented in [43] , we used a common voltage supply for both sensor and differential parts. Furthermore, the differential 1T1M banks were connected to the outputs of the sensor, thus speeding up the sensor's decision-making operation (differential current comparison) by removing the RC paths introduced by the 1T1M array path, a design feature similar to transistor-based coupled inverters with asymmetrical loads (CIAL) [72] , threshold logic CIAL (CIAL-TL) [73] and their memristor-based counterpart from Dara et al. [41] .
The MCMTLG circuit performs a current comparison operation in two phases. During the equalization phase, the differential part is power-gated on and the sensor part is power-gated off, thus forcing the sensor part into an unstable equilibrium. In that phase, the voltages at CO and CA are forced to be almost equal by the shunting BtB pMOS devices between the branches. Next, in the evaluation phase, the inter-branch shunting is released, the differential part is power-gated off and the sensor part is power-gated on. This allows differences on nodes CO and CA to be amplified by the positive feedback action of the BtB-connected inverters of the sensor part [45]. Notably, the differential part is cut off from the voltage supply during the evaluation phase, thus disabling the current flow through the 1T1M input and threshold arrays and toward the sensor. This leaves only a brief window (the short period of CLK falling and CLK' rising) for the sensor to achieve a stable and correct transition to a binary memory state, based on the small voltage differences settled during the equalization.
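The regenerative behavior of the evaluation phase can be pictured with a deliberately simplified toy model. The sketch below is not a circuit simulation: it reduces the cross-coupled inverters to a single positive-feedback update rule, and the function name `evaluate_latch`, the `gain` value, the step count and the starting node voltages are all assumptions chosen for illustration. Only the 0.65 V rail is taken from the experimental setup reported in this work.

```python
def evaluate_latch(v_ca, v_co, vdd=0.65, gain=0.5, steps=200):
    """Toy discrete-time model of the evaluation phase (assumed, not measured).

    Each step pushes the two nodes apart in proportion to their current
    difference (positive feedback) and clips them to the rails [0, vdd].
    """
    for _ in range(steps):
        diff = v_ca - v_co
        v_ca = min(vdd, max(0.0, v_ca + gain * diff))
        v_co = min(vdd, max(0.0, v_co - gain * diff))
    return v_ca, v_co


# A small imbalance left by equalization is amplified to full logic levels.
print(evaluate_latch(0.33, 0.32))  # CA slightly higher than CO
print(evaluate_latch(0.32, 0.33))  # CO slightly higher than CA
print(evaluate_latch(0.30, 0.30))  # perfectly balanced nodes do not resolve
```

In this toy picture the sign of the initial difference alone decides the final state, which is why the small voltage differences settled during equalization matter.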
IV. EXPERIMENTAL SETUP
For this practical memristor-enhanced circuit implementation we used memristor devices designed and fabricated in-house by our group. All the memristors used in the experimental setups are in 3 × 3 mm² chips that are wire-bonded to PLCC68 packages. Each memristor is a 60 × 60 μm² cross-point of the top and bottom electrodes. All circuits implemented throughout this work rely on the rich dynamics of an in-house metal-oxide ReRAM technology employing metal-insulator-metal (MIM) devices [7].
Originally, the devices were fabricated on a 6-inch SiO₂/Si wafer with the bottom and top electrodes (BEs & TEs) patterned using optical lithography, e-beam evaporation, and lift-off processes. Similar processes were adopted for the active layer patterning, except that sputtering was used for the deposition with a magnetron-sputtering tool. The active layer consists of TiO₂ and Al₂O₃ thin-film metal-oxides. After dicing, 3 × 3 mm² wire-bonded chips containing memristor devices were obtained, with MIM stacks consisting of Pt/Al₂O₃/TiO₂/Pt/Ti (10/4/24/10/5) nm. Fig. 2a shows an example of a chip that contains 32 stand-alone devices. Scalability is another advantage of memristors when used in a cross-bar array configuration. Pi et al. [74] have demonstrated devices down to 2 × 2 nm² with 12 nm pitch.
All hardware experiments presented in this work were carried out on circuits prototyped on breadboard. An external power supply was used to supply the power rail of the implemented circuits. The results were gathered exclusively by an oscilloscope. For these experiments, packaged devices were used, connected to the breadboard discrete component-based circuit using a breakout board. The power supply used for the experiments was 0.65 V, to avoid any unwanted state programming through the trains of reading pulses applied to the differential part of the circuit. For the pMOS devices, we used the NDP5020P (1H10AA) model while for the nMOS devices we used the SUP85N02-03 (T32BAA) model.
The memristor devices used throughout these experiments are detailed above. Prior to use in the practical proof-of-concept MCMTLG case studies, the devices were pre-conditioned separately by electroforming and resistance stabilization in the required functional range. We measured the circuit response through the Rigol MSO4000 Oscilloscope. The input vector and the clock signal were produced through microcontroller programming and converted to circuit-specific voltage levels through custom resistive voltage divider circuits. It is worth noting that for the 3-input case we measured the input vectors and the clock signal through the Logic Analyzer (LA) digital probes, due to the limited number of analogue probes available on the oscilloscope. In each experiment, the memristive devices used were programmed in the required state using an ArC ONE instrument board (ArC Instruments, UK). An example of such programming is provided through Fig. 2b [7]. All devices used for all the experiments were located on the same die, i.e. only one memristive device package containing a total of 32 memristors. Having decided upon the details of the components of our practical implementation, we demonstrate experimental setups to validate the functionality of the circuit and gain insights regarding its real-world constrained operation.
Fig. 2. Characteristic behavior of the memristor devices. In (a) a standalone memristive topology is shown, where each crosspoint device is a memristor. Each device is a stack of Pt/Al₂O₃/TiO₂/Pt/Ti (10/4/24/10/5) nm, with the dimensions per crosspoint device being 60 × 60 μm². In (b) an example of arbitrary programming of select memristive states is presented [8]. The upper trace of (b) presents the memristance programming due to the voltage bias pulse, shown in the lower trace of (b). The memristors are fabricated in-house by our team. Alongside the +2 V bias pulses corresponding to the multi-state memristance programming, we can observe the +0.5 V reading pulses, shown in orange, as well as the −2 V pulses that ensure the memristance is fully reset before each programming phase (Fig. 2 is adapted from [75]).
V. MCMTLG VALIDATION
The proposed MCMTLG was built using discrete components on a breadboard. For our physical implementation, the differential circuit uses two pMOS transistors back-to-back for power-gating (see Fig. 4a). This is done to control the connection of the input and threshold weighted vectors to the voltage supply, thus avoiding logic state degradation of the latching element during the evaluation phase, as well as improving operational stability even with noisy input vectors. The BtB pMOS circuits also enable lower power consumption, because the differential part is cut off and does not consume power during the evaluation phase. The voltage supply V_DD used in the proposed design is 0.65 V, thus ensuring that the memristive devices being used cannot be accidentally programmed during operation. Furthermore, a BtB pMOS circuit was also used for the equalization circuit that resets the sensor part before the evaluation is performed. The high voltage level of V_CLK, which controls the operation cycle of equalization/evaluation, and of the input vector is set to 0.9 V. The low logic level for both V_CLK and the input vector is set to 0 V (Gnd). A microcontroller (Raspberry Pi 3 Model B) is used to generate the clock signal as well as the input vector used for all the experiments (2-, 3- and 4-input MCMTLGs) described below.
A. 2-Input MCMTLG
In the 2-input circuit experiments, the threshold branch consists of a single 1T1M sub-circuit (TH) while the input branch consists of two 1T1M sub-circuits (M1, M2). AND and OR functionality can be interpreted as different flavors of majority-gating, each defining a decision boundary that classifies the input vector (see Fig. 3a,b). For example, a MAJ-1 gate is equivalent to a 2-input OR gate and a MAJ-2 gate is equivalent to an AND gate. The OR function implies memristive weights set in such a way that either input has a larger weight, i.e. a lower memristance, than the threshold branch (M1, M2 < TH). Similarly, for AND: M1, M2 > TH, but M1||M2 < TH. Hence, by altering the memristances of the devices we are able to alter the functionality of the MCMTLG. An LTSpice simulation of a TL using resistors of various values as weights demonstrates the effect of resistance on the decision boundary of the TL (Fig. 3(c)).
At the circuit level, the aforementioned differential impedance enables a majority functionality, where at least one of the input 1T1M sub-circuits has to be conductive in order for the input network conductance to enable higher current flow toward the sensor part, thus enforcing a change in the memory state of the CMOS latching element. At the same time, the values of the input memristive weights define the contribution of each input or threshold; they act as a form of signal multiplier, where a lower memristance results in a higher multiplier factor for an input/threshold signal than a higher memristance. By employing these programmable contribution rates, we are able to enhance the reconfiguration capabilities of the circuit.
For the physical implementation of a 2-input AND MCMTLG we used the weight configuration {M1, M2; TH} = {60.5 kΩ, 60 kΩ; 33 kΩ}, thus satisfying the requirements of the AND TL inequality (majority-2 function). These values are chosen from the available dynamic range of the memristor devices' programmability [7]. From the canonical (CA) output we can measure the AND function circuit response, while from the complementary (CO) output we can obtain the complementary function, NAND. The measured response of the AND/NAND TLG configuration is shown in Fig. 4b and the input vector is presented in Fig. 4d for the first input (IN1) and in Fig. 4e for the second input (IN2). The clock signal that controls operation is shown in Fig. 4f. Due to the use of binary input vectors, the quantized corner points of the 2D area, as seen in Fig. 3a,b, are of interest.
The CA (canonical) output, where the AND and OR functions can be measured, and the CO (complementary) output, where the NAND and NOR functions can be measured, can be seen in Fig. 4a. Fig. 4b shows the complementary (CO) output of the circuit configured to perform the 2-input AND/NAND gate (NAND(CO)), while Fig. 4c shows the measured CO output for the OR/NOR configuration (NOR(CO)). The clock signal determines the equalization/reset phase (clock HIGH) and evaluation/set phase (clock LOW) of the operation cycle. The outputs NAND(CO) and NOR(CO) are valid during the evaluation phase, while during equalization the output signals stay at an intermediate unstable logic level. The supply is V_DD = 0.65 V and V_CLK = V_IN = 0.9 V (for logic '1'). It is worth noting that, due to the use of pMOS devices in the 1T1M sub-circuits of the differential part, a HIGH input voltage makes the corresponding input non-conductive (logic '0') while a LOW input voltage makes it conductive (logic '1').
In the case of the OR functioning MCMTLG the differential part configuration was set as {M1, M2; TH} = {33.8 kΩ, 18.3 kΩ; 41.6 kΩ}. These values are chosen from the available dynamic range of the memristor programmability. As in the AND/NAND case study, the same input vector (Fig. 4d,e) and clock signal (Fig. 4f) are used for this operational configuration. Both the OR and NOR functions, like the AND/NAND functions, are performed simultaneously due to the complementary bi-stable operation of the sensor part.
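The two weight configurations reported above can be checked against an idealized current comparison. The sketch below assumes that the sensor simply compares the summed conductance of the conductive input branches with the threshold-branch conductance; it ignores the pMOS on-resistance, sensor offset and noise, so it is a first-order sanity check and not a model of the measured circuit. The function name `mcmtlg_output` is hypothetical, and a `1` in the input tuple denotes a conductive branch (a LOW gate voltage on the pMOS).

```python
from itertools import product


def mcmtlg_output(conductive, input_ohms, threshold_ohms):
    """Idealized comparison: CA = 1 when the conductive input branches
    together present more conductance than the threshold branch."""
    g_in = sum(1.0 / r for on, r in zip(conductive, input_ohms) if on)
    g_th = 1.0 / threshold_ohms
    return 1 if g_in > g_th else 0


# {M1, M2; TH} values as reported in the text, in ohms.
configs = {
    "AND": ((60.5e3, 60e3), 33e3),
    "OR": ((33.8e3, 18.3e3), 41.6e3),
}

for name, (input_ohms, threshold_ohms) in configs.items():
    for x in product((0, 1), repeat=2):
        ca = mcmtlg_output(x, input_ohms, threshold_ohms)
        print(name, x, "CA =", ca, "CO =", 1 - ca)
```

Under these assumptions both configurations reproduce the intended truth tables, with the CO column giving NAND and NOR respectively.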
To get a detailed view of the inner workings of the physically implemented circuit, we performed a practical experiment to validate a 2-input MCMTLG with sweeping inputs, as shown in Fig. 5. The circuit has been developed for a binary input space (the input vector controls the accompanying transistors of the 1T1M) and responds with binary values, but we are able to capture the full behavior of the MCMTLG by introducing analogue ramps as inputs to its input 1T1M array.
The results of this experiment map the evaluation response of a 2-input MCMTLG. For the four cases of the response to the analogue input signal, we used 3 memristor devices per gate configuration: one for the threshold 1T1M array and two for the input 1T1M array. We programmed the memristors to perform all the possible TL functions given the input space (2 inputs) under test. The possible functions are AND (M1, M2 > TH and M1||M2 < TH) (see Fig. 5a), OR (M1, M2 < TH) (see Fig. 5b), IN1 (M1 < TH, M2 > TH) (see Fig. 5c) and IN2 (M1 > TH, M2 < TH) (see Fig. 5d). Each case shown in Fig. 5 highlights the different response of the MCMTLG for a different memristance weight set, and across them we observe the 2D space separation boundary moving. For the cases in Fig. 5a,b the separation boundary can be understood as a moving 45° line, while for the cases in Fig. 5c,d the separation boundary takes the form of 90° and 180° lines, respectively.
This example clearly showcases the reconfigurability of such systems and how they classify their inputs into a binary set of responses. Through this more detailed measurement approach, we can see more clearly the functional connection of the MCMTLG with perceptron units and ANNs. As in the previous experiment with the 2-input MCMTLG, the 1T1M sub-circuits are conductive for LOW logic inputs and non-conductive for HIGH logic inputs, due to the use of pMOS components for the memristive array.
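The four inequality sets listed above can be turned directly into a small lookup that names the function a given memristance triple would implement. This is a sketch under the same idealization as the inequalities themselves (pure resistance comparison, no transistor or sensor non-idealities). The function names are hypothetical, the "never fires" outcome is an extra case not discussed in the text, and the IN1/IN2 example values are invented for illustration. Exact equalities between resistances are not treated specially.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistances connected in parallel."""
    return r1 * r2 / (r1 + r2)


def classify_two_input(m1, m2, th):
    """Name the 2-input function implied by the memristances M1, M2 and TH."""
    if m1 < th and m2 < th:
        return "OR"    # either input alone out-conducts the threshold branch
    if m1 < th:
        return "IN1"   # only input 1 can win on its own
    if m2 < th:
        return "IN2"   # only input 2 can win on its own
    if parallel(m1, m2) < th:
        return "AND"   # both inputs are needed to win
    return "never fires"  # extra case, not listed in the text


print(classify_two_input(60.5e3, 60e3, 33e3))    # AND configuration from the text
print(classify_two_input(33.8e3, 18.3e3, 41.6e3))  # OR configuration from the text
print(classify_two_input(20e3, 80e3, 40e3))      # hypothetical IN1 example
print(classify_two_input(80e3, 20e3, 40e3))      # hypothetical IN2 example
```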
B. 3-Input & 4-Input MCMTLG
For TL gates with larger input vectors, and thus greater fan-in, an increased number of 1T1M circuits is introduced to the MCMTLG's differential part. By employing memristor memory cells, we are able to define each input weight through a single programmable resistive switch, hence eliminating large parallel pMOS networks from the differential part [25]. Hence, improvements in speed and power can be observed [44], potentially closing the gap between TL technology and low-power computing systems at the edge. To support this concept through measurements on practically realized circuits, MCMTLGs with larger input spaces were implemented and investigated. Fig. 6a shows a 3-input MCMTLG design, where a third 1T1M sub-circuit has been added to the input branch of the differential part and memristors are employed as the analogue weights on the input/bias binary signals, similar to the 2-input MCMTLG.
[...] Fig. 3a,b we also include the input/output mapping on the quantized, binary space. An input activates (deactivates) a specific memristor branch of the 1T1M array by going LOW (HIGH), due to the use of pMOS as the transistors in 1T1M.
The measured results (Fig. 6b,c) are extracted from a 3-element input vector and 1-element threshold setting of the MCMTLG. The upper trace of Fig.3c shows the response for the {M1, M2, M3; TH} = {31.5 kΩ, 30 kΩ, 28.2 kΩ; 68.2 kΩ} memristive weight configuration, which functions as a 3-input OR, where at least one of the input sub-circuits needs to be conductive in order for the current comparison to result in an input-side winning node (majority-1 function). Fig. 6b shows the measured response of the canonical output (CA), while Fig. 6c shows the output of the isolation inverter connected to the CA output. We can observe that during the equalization phase the voltage level of the isolation inverter is 0 V, while it performs a full voltage swing to logic '1' when the corresponding CA output drops to logic '0'. The input vector and clock signal are shown in Fig. 6d-g. With memristor devices used in this configuration, similarly to the 2-input case described above, the same circuit could perform multiple functions just by reprogramming the threshold resistive value, thus configuring differently the winning conditions of the current comparison performed.
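The remark that reprogramming only the threshold memristance changes the function can be illustrated with the reported 3-input weights. The sketch below uses an idealized conductance comparison (no transistor on-resistance, sensor offset or noise) and searches for the majority level k such that the gate fires exactly when at least k inputs conduct. The helper names are hypothetical, and the 20 kΩ and 12 kΩ threshold values are invented for illustration; only the input weights and the 68.2 kΩ threshold come from the text.

```python
from itertools import product

INPUT_OHMS = (31.5e3, 30e3, 28.2e3)  # {M1, M2, M3} as reported in the text


def fires(conductive, input_ohms, threshold_ohms):
    """True when the conductive inputs out-conduct the threshold branch."""
    g_in = sum(1.0 / r for on, r in zip(conductive, input_ohms) if on)
    return g_in > 1.0 / threshold_ohms


def majority_level(input_ohms, threshold_ohms):
    """Return k if the gate behaves as MAJ-k for every input pattern, else None."""
    n = len(input_ohms)
    patterns = list(product((0, 1), repeat=n))
    for k in range(1, n + 1):
        if all(fires(x, input_ohms, threshold_ohms) == (sum(x) >= k) for x in patterns):
            return k
    return None


# 68.2 kOhm is the reported threshold; 20 kOhm and 12 kOhm are hypothetical.
for th in (68.2e3, 20e3, 12e3):
    print(f"TH = {th / 1e3:.1f} kOhm -> MAJ-{majority_level(INPUT_OHMS, th)}")
```

In this idealized picture, lowering the threshold memristance raises the number of conductive inputs required, moving the same input network from a 3-input OR toward a 3-input AND.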
Similarly to the operation of a perceptron or other simple ANN, we can observe that the memristive synaptic weights are responsible for the plane splitting of the 2D or 3D space. Through arbitrary reconfiguration of the memristive values, we are able to shift the decision boundary of neural decision-making functionality and thus alter the system of inequalities that the MCMTLG solves. The MCMTLG and TLGs, in general, are best suited for functions with high input space, thus represented by exploiting larger 1T1M arrays for the input and threshold differential part. Hence, our focus to experimentally test larger than 2-and 3-input MCMTLGs, and gather data from the gate-level testing of a memristively enhanced TL circuit.
In the example shown in Fig. 7, a 4-input MCMTLG is tested, performing a MAJ-2 (majority-2) TL function, implemented by configuring the 1T1M TL arrays as follows: {M1, M2, M3, M4; TH} = {30 kΩ, 21.6 kΩ, 31.2 kΩ, 25.2 kΩ; 19.1 kΩ}. Fig. 7a shows the schematic of the circuit as practically realized, which includes an additional 1T1M cell in the input network of the differential part. Fig. 7b-i show the circuit response and the corresponding control signals (input vector and clock). It is worth noting that the Cab (Fig. 7h) and Cob (Fig. 7i) sensor outputs are the outputs of the isolation inverters, which in our approach are included in the sensor part.
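As a hand check of the MAJ-2 behaviour for these values, consider an idealized current comparison (a simplifying assumption made here, not a measured result): every conducting branch sees the same voltage and transistor on-resistance is neglected, so each branch current is proportional to its conductance G_i = 1/M_i. The gate then acts as MAJ-2 if no single input exceeds the threshold conductance while every pair of inputs does:

```latex
\begin{align*}
G_{TH} &= \frac{1}{19.1\,\mathrm{k\Omega}} \approx 52.4\,\mu\mathrm{S} \\
\max_i G_i &= \frac{1}{21.6\,\mathrm{k\Omega}} \approx 46.3\,\mu\mathrm{S} \;<\; G_{TH} \\
\min_{i \neq j}\,(G_i + G_j) &= \frac{1}{30\,\mathrm{k\Omega}} + \frac{1}{31.2\,\mathrm{k\Omega}}
  \approx 65.4\,\mu\mathrm{S} \;>\; G_{TH}
\end{align*}
```

Under this assumption, the strongest single branch (M2) stays below the threshold and the weakest pair (M1 with M3) already exceeds it, which is consistent with the majority-2 function.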
We emphasize the logic scaling capabilities of memristor-based TLGs, including the MCMTLG design presented in this work, because they make this group of gates a promising candidate for advanced memory-centric reconfigurable fabrics, in which the functionality of a computational fabric is controlled by the ReRAM memory contents distributed across a sea of gates. The importance of the logic scaling capabilities of such circuits, i.e. the capability of a TLG to replace a larger CMOS circuit with a single gate, has been explored previously [44], [47], [76], [77]. For this reason, we considered the experimental demonstration of MCMTLGs with larger 1T1M arrays an important part of our investigation.
The larger differential arrays increase the fan-in of the gate, and thus the complexity of the linearly separable functions that can be represented on the memristive memory components. Replacing multi-stage CMOS logic with an equivalent TLG circuit can significantly reduce hardware cost, power consumption and IC area, and can shorten the critical signal path, bringing further improvements in performance, power and area [45].
C. Comparison With Existing Memristor-Based TLG Designs
Fig. 7. [...] This configuration of the MCMTLG performs the Boolean function F = MAJ2(I1, I2, I3, I4), thus needing at least two conductive inputs for the total input current to be larger than the threshold current.
The main contribution of this work is the practical implementation of the MCMTLG circuit using real memristors, and thus the investigation of the feasibility of such circuits with state-of-the-art memristor technology. However, the MCMTLG circuits we have investigated are not the only embodiments of memristor-enhanced TL. A wealth of other designs have been proposed, albeit only examined through simulations, which makes a direct comparison difficult. To provide comparison data, we have modeled the MCMTLG in the industry-standard Cadence tool using TSMC's 65 nm technology node for the CMOS transistors and our Verilog-A memristor model [78], which is based on measurements taken from real memristor devices [78]. In this way we attempt to predict the performance of the MCMTLG implemented in deep-submicron technologies and, at the same time, provide a better framework for comparison with existing memristor-based TLG designs. For the modeled MCMTLG we used memristors with values of {30 kΩ, 30 kΩ, 30 kΩ; 18 kΩ}, performing a majority-2 (MAJ-2) function, with a voltage supply of 1 V and a clock period of 100 ns. The widths of the pMOS and nMOS devices were set to 400 nm and 200 nm, respectively. The power of the MCMTLG was measured by calculating the product I×V at the circuit's voltage supply over a single clock period, for the case where all three 1T1M branches of the input array are conductive (input case: logic "111"). The delay was measured from the start of the evaluation phase until the voltage difference between the two complementary [...]. We tested the worst-case scenario with respect to the memristor resistive states: exactly two inputs are active, leading to a close comparison between 15 kΩ and 18 kΩ composite memristances.
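The 15 kΩ versus 18 kΩ worst case quoted for the simulated gate follows from the parallel combination of two active 30 kΩ branches. The short worked example below assumes an idealized comparison in which both sides see the same voltage and the transistors contribute no resistance, so the current ratio is only indicative:

```latex
\begin{align*}
R_{\mathrm{in}}^{(2)} &= 30\,\mathrm{k\Omega} \parallel 30\,\mathrm{k\Omega} = 15\,\mathrm{k\Omega}
  \;<\; R_{TH} = 18\,\mathrm{k\Omega}
  &&\Rightarrow\; \frac{I_{\mathrm{in}}}{I_{TH}} \approx \frac{18}{15} = 1.2 \\
R_{\mathrm{in}}^{(1)} &= 30\,\mathrm{k\Omega} \;>\; R_{TH}
  &&\Rightarrow\; \frac{I_{\mathrm{in}}}{I_{TH}} \approx \frac{18}{30} = 0.6 \\
R_{\mathrm{in}}^{(3)} &= 30\,\mathrm{k\Omega} \parallel 30\,\mathrm{k\Omega} \parallel 30\,\mathrm{k\Omega} = 10\,\mathrm{k\Omega}
  &&\Rightarrow\; \frac{I_{\mathrm{in}}}{I_{TH}} \approx \frac{18}{10} = 1.8
\end{align*}
```

Of the input patterns that should evaluate to logic '1', the two-active-input case has the smallest current margin over the threshold, which is why it is the closest comparison.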
Next, we choose a few memristive TLG designs from the literature that represent the major directions in circuit implementation. A performance comparison across different metrics is shown in Table I. The Memristor-based TL (MTL) [59] design was one of the first TLGs to use memristors as input weights. In MTL the memristive weights are isolated from the actual thresholding units (a cascade of inverters) through the use of current mirrors. The Resistive TL (RTL) [62] is similar to the MTL, with the simplification that the memristive network is directly connected to the output inverter. Additionally, the RTL gate uses a ratioed memristive network, i.e. a network with pull-up and pull-down branches (for the specific RTL gate, a single pull-down memristor), which enhances the reconfiguration capabilities of the gate. The CMMTL [41] design was one of the first differential TLG concepts to use memristors. In terms of design, CMMTL is similar to the proposed MCMTLG implementation, having its differential part separated from the sensor part. The CMMTL tests provided a glimpse of some of the advantages that differential implementations have over non-differential TL. More specifically, CMMTL showed better energy and delay metrics than the MTL implementation [41]. The comparison also includes the first-order Threshold Function memristive Threshold Logic Gate (1-TF mTG), a state-of-the-art differential current-mode memristive TLG [42]. In that work the TLGs are optimized for minimal transistor count and can achieve very low power consumption and delay. In terms of design, the 1-TF mTG in effect incorporates the differential part into the sensor part, with the differential arrays forming part of the pull-up network of the gate. The design is similar to the work of Bobba and Hajj [26], which was one of the first to propose CMOS-based CM-TLGs.
For our comparison, we use published data for the metrics of power consumption and delay. For CMMTL, we used the delay reported in [41], obtained with the 45 nm Berkeley Predictive Technology Model (PTM) technology node for a 3-input gate. For the power of the CMMTL we could not gather relevant data from [41], but the comparative study in [53] provides an estimate using a 0.25 µm TSMC technology node for a 2-input case study. The MTL data were gathered from [59], which uses the 45 nm PTM technology node for a 3-input case study. The RTL metrics were gathered from [62], which uses a 0.25 µm TSMC technology node; the data refer to 2-input gates. The 1-TF mTG power and delay were measured in [42] using the 45 nm PTM technology node, and the values used are for an indicative 3-input gate. CMMTL, MTL and RTL used the HP memristor model [41], [53], [59] for their simulations, while the 1-TF mTG simulations were based on the VTEAM memristor model [42]. From Table I we observe that our implementation offers competitive power dissipation. In terms of device count (transistors and memristors), MCMTLG and CMMTL have the largest number of devices, while the much simpler RTL design has the fewest components. In terms of delay, however, MCMTLG is comparable with the state-of-the-art 1-TF mTG implementation. We further observe that, in general, differential implementations have smaller delay than non-differential ones.
VI. DISCUSSION & CONCLUSIONS
In this work, we have presented a physical implementation of a CM-TLG circuit which uses reconfigurable memristive loads as analogue weights. Through the presented experimental results we have practically demonstrated the functionality of 2-, 3-, and 4-input memristor-based TLGs, showcasing the memristor-enabled reconfigurability of the base design for all validated cases. Furthermore, we have investigated how memristor-based TLGs behave when used outside their intended, digital-input operating regime. Experiments show the shapes of the decision boundaries, which approximate beveled L-shapes with lines parallel to the input voltage axes. Further investigation through simulations confirms these shapes and shows graphically how altering the resistive states of the memristive devices affects the locations of the decision boundary (most notably, which points in the binary-quantized input space lie on the 'output = 1' side of the boundary and which on the 'output = 0' side). Finally, we implemented and simulated a 3-input TLG in a TSMC 65 nm technology for comparison with the state of the art. For that purpose, we utilized our own memristor model, which takes into account the non-linearity of the memristive device I-V characteristic (a typical feature of many technologies [11]). The results testify to the robustness of the TLG concept: functionality is not perturbed, and the power and delay figures are comparable, and indeed competitive, with the state of the art. Our work thus provides experimental backing to a considerable body of literature in which simulation work indicates the potential for highly energy- and area-efficient TL implementations exploiting memristive devices.
Finally, the present work suggests some routes for further investigation. Notably, we would expect the capability of memristors to support fine tuning of their resistances to become particularly valuable in TLGs of even higher dimensionality. In a 5-input TLG, the number of possible input points increases to 32 whilst the number of memristive devices only rises to 6. It would be useful to investigate how the number of majority gates implementable with n inputs grows relative to the number of available memristors (n + 1), and to compute a 'logic density' metric. We conjecture that higher n will lead to a higher 'logic density' burden, which eventually reaches the maximum number of resolvable states attainable by the memristor. How this limits practical performance remains to be revealed. Another avenue of investigation pertains to linking multiple TLGs together in larger combinatorial blocks and understanding, for example, how the memristor resistance-dependent delays may affect overall timing constraints (particularly in domino logic-type systems).
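One possible starting point for the 'logic density' question raised above is a brute-force count. The sketch below is an illustration under stated assumptions, not a method used in this work: it reuses an idealized current comparison, it assumes each of the n + 1 memristors can be set to one of a small set of hypothetical conductance levels (arbitrary units), and the function name `realizable_functions` is invented for this example.

```python
from itertools import product

def realizable_functions(n, levels):
    """Return the set of distinct truth tables an n-input gate can realize
    when each of its n + 1 conductances is drawn from `levels`.

    Idealized model: output is 1 when the summed conductance of the active
    inputs is strictly greater than the threshold conductance.
    """
    points = list(product((0, 1), repeat=n))  # all 2**n input patterns
    tables = set()
    for *g_in, g_th in product(levels, repeat=n + 1):
        table = tuple(
            int(sum(g for bit, g in zip(p, g_in) if bit) > g_th)
            for p in points
        )
        tables.add(table)
    return tables

# Hypothetical conductance level sets (arbitrary units).
LEVEL_SETS = ((1.0,), (1.0, 2.0), (1.0, 2.0, 3.0, 4.0))

for n in (2, 3, 4):
    for levels in LEVEL_SETS:
        count = len(realizable_functions(n, levels))
        print(f"n={n}: {2**n} input points, {n + 1} memristors, "
              f"{len(levels)} levels -> {count} distinct functions")
```

The counts depend entirely on the chosen level sets and on the idealized model, so they indicate a trend rather than the behaviour of real devices. A fuller study would need measured resolvable states and device variability.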
ACKNOWLEDGMENT
All supporting data are available through the University of Southampton repository with dataset DOI: https://doi.org/10.5258/SOTON/D0824
