The recently proposed probabilistic spin logic (PSL) offers promising solutions to novel computing applications, including some that have previously been covered by quantum computing. Several task implementations, including invertible logic gates, have been simulated numerically. Here, we report an experimental demonstration of a magnetic tunnel junction (MTJ) based hardware implementation of PSL. The probabilistic bit (p-bit) is the basic element of PSL. In our hardware implementation of a p-bit, two biasing methods, magnetic field and voltage, were used to better tune the characteristics of the MTJ random fluctuations. This addresses the potential system-wide speed limitations that result from the unavoidable device-to-device variation in MTJ fluctuation rates. With the p-bit hardware implementation demonstrated, we built three p-bits and connected them through a resistor network to implement an example PSL circuit, an invertible AND gate, which performs as expected.
I. INTRODUCTION
Multiple pathways have been explored in recent years to go beyond CMOS for memory and computing [Manipatruni 2018]. Besides the major progress on magnetic tunnel junctions (MTJs) as memory cells in nonvolatile magnetoresistive random-access memory, MTJs have also been used in experimental demonstrations of various computing schemes, such as reconfigurable and programmable computing [Wang 2005], neuromorphic computing [Torrejon 2017], stochastic computing [Lv 2017], and true in-memory computation [Lyle 2010, Chowdhury 2018, Zabihi 2019]. Recently, probabilistic spin logic (PSL) based on MTJs has been suggested as a promising solution in novel computing scenarios [Camsari 2017a]. The key building block of PSL is a probabilistic bit (p-bit). Its output fluctuates randomly in time between "0" and "1," but the mean of the output is input-dependent. In the originally proposed implementation of a p-bit, the MTJ is designed to be superparamagnetic with ultralow thermal stability so that the magnetization of its free layer spontaneously fluctuates over time. The fluctuation of magnetization is then translated into an electrical signal by tunneling magnetoresistance (TMR). This signal is influenced by an input signal either through spin-orbit torque on the magnet [Camsari 2017b] or by shifting the signal against a reference threshold voltage [Camsari 2017a]. When multiple p-bits are properly coupled, the PSL system can solve certain classes of problems or implement certain functionalities, such as invertible logic [Camsari 2017] and inference [Faria 2018]. Nevertheless, PSL schemes have so far been studied and evaluated primarily numerically; an experimental demonstration could provide more realistic insight for further development of PSL. We have been carrying out our independent experimental demonstration of PSL [Lv 2019] since 2018, and we note the recent experimental PSL demonstrations [Borders 2019, Nikonov 2019].
The original proposal for p-bit implementation uses superparamagnetic MTJs with an ultralow thermal stability factor. Effective tuning of the rate (or speed) of fluctuation has not previously been implemented. The device-to-device variation of thermal stability of MTJs is not negligible. This variation is contributed by many factors, including defects [Abeed 2019], variation of the MTJ thin-film stack, and variation of the multistep device fabrication processes. Furthermore, this device-to-device variation of thermal stability can lead to a large variation of the MTJ's random fluctuation rate. According to the Néel-Arrhenius equation [Brown 1963], the fluctuation rate depends exponentially on the thermal stability factor: a change of about 2.3 (ln 10) in the thermal stability factor changes the rate by one decade. From a system perspective, p-bits with MTJs of higher-than-intended thermal stability and slow fluctuation could appear "stuck in a state" or appear to lose their short-term randomness. In addition, these slow-fluctuating p-bits could limit the entire PSL system's speed, alter its randomness integrity, or even disturb the short-time statistical state distribution of the p-bits. Although the impact on the integrity or performance of PSL may vary with different tasks [Drobitch 2019], the PSL system may require a longer time to give correct solutions, may settle more easily at local metastable solutions rather than the best global solution within the expected time, or may give solution combinations with skewed or even incomplete distributions within the expected time. In this letter, we report an experimental demonstration of a p-bit building block using thermally stable MTJs that are biased by two means. We adopt this "dual-biasing" method (involving a bias field and a bias voltage applied to the MTJ) to operate the MTJ [Suh 2015, Zink 2018] and design a circuit accommodating this method to implement the p-bit building block functionality.
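The exponential sensitivity described above can be illustrated with a minimal sketch of the Néel-Arrhenius relation, written here as tau = tau_0 * exp(delta). The attempt time of 1 ns is a commonly assumed order of magnitude and is not a value measured in this work; the function names and the example thermal stability factors are hypothetical.

```python
import math

TAU_0 = 1e-9  # assumed attempt time in seconds (illustrative, not from this work)

def dwell_time(delta, tau_0=TAU_0):
    """Mean dwell time from the Neel-Arrhenius relation tau = tau_0 * exp(delta)."""
    return tau_0 * math.exp(delta)

def rate_ratio(delta_a, delta_b):
    """Ratio of the fluctuation rate of device a to that of device b (rate = 1 / dwell time)."""
    return math.exp(delta_b - delta_a)

# Two hypothetical devices whose thermal stability factors differ by ln(10) ~ 2.3:
# the device with the lower factor fluctuates one decade faster.
one_decade = rate_ratio(10.0, 10.0 + math.log(10))
```

Under these assumptions, even a modest spread in thermal stability factor across devices translates into order-of-magnitude differences in fluctuation rate, which is the variation the biasing scheme is meant to compensate.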
The "dual-biasing" method offers adequate tunability of both the mean (percentage of time the MTJ spends in a certain state) and the rate (the average frequency of the MTJ's fluctuations or switching) of the MTJ random fluctuation. This tunability is used to compensate MTJ device-to-device variations so that the MTJs generate random signals with a mean close to 50% and at similar (and faster) rates. Therefore, the overall PSL system speed is not limited by the slowest MTJ device, and a faster speed could be achieved or a better solution could be obtained within a certain amount of time. Although the MTJs used in this demonstration are thermally stable, the tunability of biasing is also expected to be effective for thermally unstable MTJs. By adopting MTJs with lower thermal stability, the static power consumption of biasing can be reduced.
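As a minimal sketch of how the mean and the rate relate to the two average dwell times, the MTJ can be idealized as a two-state random telegraph process. This idealization, the function name, and the example dwell times are assumptions made for illustration only.

```python
def mean_and_rate(tau_p, tau_ap):
    """Return (fraction of time in the P state, full P->AP->P cycles per second).

    tau_p and tau_ap are the average dwell times (seconds) in the P and AP states.
    """
    if tau_p <= 0 or tau_ap <= 0:
        raise ValueError("dwell times must be positive")
    period = tau_p + tau_ap
    return tau_p / period, 1.0 / period

# Hypothetical dwell times: a balanced device and a slower, skewed device.
mean_balanced, rate_balanced = mean_and_rate(0.5e-6, 0.5e-6)
mean_skewed, rate_skewed = mean_and_rate(0.5e-6, 4.5e-6)
```

In this picture, tuning both dwell times independently is what allows a skewed, slow device to be brought back to a 50% mean at a rate comparable to the other devices.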
II. DEMONSTRATION OF P-BIT BUILDING BLOCK

A. p-Bit Building Block Hardware Design and Implementation
We first introduce the operation scheme and physical mechanisms used to make the MTJ fluctuate randomly with the desired tunability. As shown in Fig. 1(a), an MTJ is biased by a constant voltage source and is under a static bias magnetic field. The field activates switching from the antiparallel (AP) to the parallel (P) state. When in the P state, the MTJ has lower resistance and draws more current. More current generates more spin-transfer torque (STT), which activates switching from P to AP. It has been shown that the average dwell times of an MTJ in the P and AP states can be tuned individually by combinations of bias voltage and field [Zink 2018]. This scheme of operation allows us to experimentally demonstrate a PSL consisting of three MTJs in this letter, despite their property variations. Fig. 1(b) shows the MTJ-based hardware implementation of a p-bit building block, which we will refer to as a "p-block" for short. The input signal is attenuated and shifted by resistors R_1, R_2, and R_3, and is then fed to the comparator as a threshold signal. The constant dc voltage source V_block drives the MTJ, via a small series resistor R_sense, to oscillate randomly. The small resistor R_sense allows the MTJ's oscillation to be converted into a random voltage signal. This signal is then low-pass filtered by R_filt and C_filt before entering the comparator. The filtering allows for optimization of the p-block transfer curve. Fig. 2(a) shows photographs and an illustration of the experimental setup. The entire setup consists of a main printed circuit board (PCB) demonstrating the PSL, a tester PCB that generates test input signals for the main PCB and analyzes its output statistics in real time, magnets for the bias field, and a digital oscilloscope for measuring and recording PSL outputs. Fig. 2(b) is a zoomed-in photograph of the main PSL board. It contains three identical p-blocks, the resistor network to couple the p-blocks, and an MTJ die.
Integrated semiconductor electronic components and passive components are also used on the backside of the main PCB to complete the p-block circuit as well as the necessary power conditioning.
B. Effects of Filtering on p-Block Response
Since the MTJs for a p-block are operated in the thermal activation regime, the magnetization mostly dwells in either the P or AP state and shows relatively sharp transitions between the two states at the MHz rate at which the MTJs fluctuate. This translates into sharp transitions of the MTJ electrical signal through the TMR effect. In our implementation, a resistor and a capacitor are used to low-pass filter the MTJ signal. The filtered signal has smoother, more gradual transitions, which eventually results in a more desirable p-bit transfer response. In the rest of this section, we study and optimize the filtering effects on the MTJ signal and the corresponding p-block response. Fig. 3(a) shows the waveform (left) and voltage level histogram (right) of V_MTJ, which is the voltage across R_sense reflecting the MTJ's random fluctuation. Without any low-pass filtering (black, t_C = 0 µs), sharp transitions in the MTJ's waveform result in a high distribution of V_MTJ at the two discrete voltage levels that correspond to the MTJ P and AP states, and a low distribution in between. With increasing low-pass filtering (yellow and then red), the MTJ's signal becomes smoother and dwells more often in between the two discrete voltage levels, so the distribution of V_MTJ becomes flatter (yellow) and eventually single-peaked (red). The input signal of the p-block acts as a threshold to the V_MTJ signal. When V_MTJ is unfiltered, the time average of the p-block output is mostly 0%, 50%, or 100%, and the p-block time-average transfer curve is two-step-like, as shown in Fig. 3(b) (black), because V_MTJ is heavily distributed at the aforementioned two discrete voltage levels and lightly distributed in between.
Only when V_MTJ is properly filtered and its mid-range distribution is sufficiently increased does thresholding the filtered V_MTJ with the p-block input signal yield a time-average transfer curve that ranges from 0% to 100% gradually and smoothly, as shown in Fig. 3(b) (red). This sigmoidal transfer curve is more desirable for the intended PSL operations.
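The effect of filtering on the transfer curve can be reproduced qualitatively with a minimal numerical sketch. It assumes an idealized two-level telegraph signal normalized to 0 and 1, a discrete-time first-order low-pass filter standing in for the RC filter, and an ideal comparator; the switching probability, the filter coefficient, and all function names are hypothetical and are not taken from the measured circuit.

```python
import random

def telegraph(n_samples, p_switch, rng):
    """Two-level random telegraph signal (0.0 or 1.0) with a fixed per-sample switching probability."""
    level, out = 0.0, []
    for _ in range(n_samples):
        if rng.random() < p_switch:
            level = 1.0 - level
        out.append(level)
    return out

def low_pass(signal, alpha):
    """First-order low-pass filter y[n] = y[n-1] + alpha * (x[n] - y[n-1]); alpha = 1 means no filtering."""
    y, out = signal[0], []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def transfer_curve(signal, thresholds):
    """Fraction of samples above each threshold, i.e., the time-averaged comparator output."""
    n = len(signal)
    return [sum(v > t for v in signal) / n for t in thresholds]

rng = random.Random(0)
raw = telegraph(20000, 0.05, rng)
thresholds = [0.1 * k for k in range(1, 10)]
curve_raw = transfer_curve(raw, thresholds)                     # flat plateau between the two levels
curve_filtered = transfer_curve(low_pass(raw, 0.05), thresholds)  # gradual, sigmoid-like fall
```

In this sketch the unfiltered curve takes a single value for every threshold between the two levels, which corresponds to the middle step of the two-step-like response, whereas the filtered curve decreases gradually as the threshold rises.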
C. p-Block Key Functionality and Response Time
With properly configured filtering, the desired response of the p-block is obtained. As shown in Fig. 4(a), the output of the p-block is a seemingly random digital signal (blue). However, its mean or time average (red) is a function of the input. A positive input pulls the output average away from 0.5 and toward 0, while a negative input pushes the output average toward 1.
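A minimal behavioral sketch of this input-dependent mean is given below. It assumes a tanh-shaped transfer curve with the polarity described above (positive input lowers the output average); the exact curve shape, the gain parameter, and the function names are assumptions for illustration and do not describe the measured circuit.

```python
import math
import random

def p_output_high(v_in, gain=1.0):
    """Probability that the p-block output is logic 1 for a given input (assumed tanh shape)."""
    return 0.5 * (1.0 - math.tanh(gain * v_in))

def sample_p_block(v_in, n, rng, gain=1.0):
    """Draw n independent output bits and return their time average."""
    p = p_output_high(v_in, gain)
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(1)
avg_zero = sample_p_block(0.0, 10000, rng)   # close to 0.5
avg_pos = sample_p_block(2.0, 10000, rng)    # pulled toward 0
avg_neg = sample_p_block(-2.0, 10000, rng)   # pushed toward 1
```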
The response time of the p-block is revealed by feeding a step signal to its input. Since the output is random, 100 output waveforms are captured and overlaid to detect the earliest possible response of the p-block at the output. As shown in Fig. 4(b), the response time of the p-block is well below 1 µs. This measurement is limited by the finite slew rate of the input step signal rather than by the p-block circuit. Additionally, since the MTJ oscillates randomly on the order of 1 MHz, a limited number of waveform captures may slightly underestimate the actual response time. Finally, the low-pass filtering of the V_MTJ signal does not affect the response time of the p-block.
III. DEMONSTRATION OF PSL IMPLEMENTING AN INVERTIBLE AND GATE
A. Invertible AND Gate Functionality and Design
With the single p-block demonstrated, we then build three p-blocks together with the other necessary circuitry and demonstrate a PSL implementing an invertible AND gate as an example. First, we introduce the concept of invertible logic and the expected functionality. As shown in Fig. 5(a), terminals A, B, and C satisfy the relationship C = AB. Unlike in a standard AND gate, all three terminals of the invertible AND gate can act as both inputs and outputs. When no information is given to the gate, A, B, and C fluctuate seemingly randomly, but they follow the constraint C = AB. When information is given to one or more terminals, the states of those terminals become certain. Each remaining terminal then either becomes certain, if only one state is legal according to the constraint, or otherwise fluctuates among all legal states allowed by the constraint. The right-hand side of Fig. 5(a) shows that the flow of information is bidirectional among all three terminals.
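The expected behavior under any combination of clamped terminals follows directly from the constraint C = AB and can be enumerated with a short sketch. The function name and the use of None to denote a free terminal are illustrative choices.

```python
from itertools import product

def legal_states(clamp_a=None, clamp_b=None, clamp_c=None):
    """List all (A, B, C) with C == A AND B that agree with the clamped terminals.

    A clamp of None means the terminal is left free.
    """
    clamps = (clamp_a, clamp_b, clamp_c)
    states = []
    for state in product((0, 1), repeat=3):
        a, b, c = state
        if c != (a & b):
            continue  # violates the AND constraint
        if all(k is None or k == s for k, s in zip(clamps, state)):
            states.append(state)
    return states

free_states = legal_states()            # four legal states
backward_one = legal_states(clamp_c=1)  # only A = B = 1 remains
backward_zero = legal_states(clamp_c=0) # three legal combinations of A and B
```

An empty result, for example when B is clamped to 0 and C to 1, indicates that the clamps themselves already violate the constraint.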
Second, to implement such an invertible AND gate by PSL, the three terminals of the gate are represented by three p-bits and implemented by three p-blocks. Schematics of the demonstration of this PSL are shown in Fig. 5(b). Three p-blocks are built using three MTJs. The outputs of the p-blocks are coupled to their inputs via a resistor network. The resistor values determine the intended functionality of the PSL, in this case an invertible AND gate. Biasing resistors to both power rails are also part of the resistor network. Information from outside the PSL (clamp signals at "clamp A, B, C") can enter via three clamp resistors, R_cA, R_cB, and R_cC. If a clamp terminal is driven to V_dd or V_ss, the corresponding p-bit is clamped to logic "0" or "1," respectively. If the clamp terminal is left floating, its p-bit is left free. The final states of the p-bits can be read at the p-block outputs. Note that other basic two-input logic gates, such as NAND, OR, and NOR, are possible by changing only the resistor positions and values within the 8-by-3 grid.
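How coupling turns three fluctuating p-bits into an invertible AND gate can be sketched with an idealized software model. The sketch assumes ideal tanh-shaped p-bits in bipolar (-1/+1) form that are updated sequentially; the coupling matrix J and bias vector H are one illustrative choice whose four most favorable states are exactly the legal AND states, and they are not the resistor values of this demonstration. The signal polarity of the hardware p-block is also not modeled, and all names are hypothetical.

```python
import math
import random
from collections import Counter

# Illustrative bipolar couplings for C = A AND B (not this work's resistor values).
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
H = [1, 1, -2]

def run_psl(n_sweeps, beta=1.0, clamps=None, seed=0):
    """Sequentially update three ideal p-bits and count the visited [ABC] states.

    clamps maps a terminal index (0 = A, 1 = B, 2 = C) to a fixed logic value 0 or 1.
    """
    rng = random.Random(seed)
    clamps = clamps or {}
    m = [rng.choice((-1, 1)) for _ in range(3)]
    for i, bit in clamps.items():
        m[i] = 1 if bit else -1
    counts = Counter()
    for _ in range(n_sweeps):
        for i in range(3):
            if i in clamps:
                continue  # clamped p-bits keep their state
            drive = H[i] + sum(J[i][j] * m[j] for j in range(3))
            m[i] = 1 if math.tanh(beta * drive) + rng.uniform(-1, 1) > 0 else -1
        counts["".join("1" if s > 0 else "0" for s in m)] += 1
    return counts

free_run = run_psl(20000)                 # no clamps: legal states dominate
backward = run_psl(20000, clamps={2: 1})  # C clamped to 1: [ABC] = 111 dominates
```

In this model, changing the entries of J and H plays the role that changing resistor positions and values plays in the hardware network.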
Last, before running the PSL, the three implemented p-blocks are tested simultaneously. Based on our previous study [Zink 2018], the bias field and bias voltage primarily change the average dwell time of the MTJ in the AP and P state, respectively. By adjusting the bias fields and voltages on the three MTJs, the fluctuation rate and mean are brought to about 1 MHz and 50%, respectively. Fig. 5(c) shows the waveforms of all three p-block outputs when their inputs are disconnected from the resistor network and connected to a ramp signal, "test input." This shows that all three p-blocks are working properly and simultaneously. Note that the actual voltage of the p-block outputs is between -1.65 and 1.65 V. We consider a voltage greater than 0 V as logic "1" and otherwise logic "0."
B. Invertible AND Gate "Forward" Operations
After all p-blocks are tested and confirmed, they are switched to the PSL circuitry (inputs and outputs switched to the resistor network). The states of the p-bits are read out by an oscilloscope at the outputs of the corresponding p-blocks. The combination of states of p-bits A, B, and C is coded as [ABC] and represented as a 3-bit binary value ranging from "000" to "111" for all histogram results. States that satisfy the AND constraint (legal) are plotted in green, while those that violate the constraint (illegal) are plotted in red. Also note that in this and the following sections, although for invertible logic there is no real difference between "input" and "output" terminals, we will refer to terminals A and B as "inputs" and C as the "output," as if they were in a conventional gate. Likewise, we will refer to the flow of information as "forward" or "backward" as if for a conventional gate. Fig. 6(a) shows such a relative histogram with A and B clamped to various definite states, as if information were flowing "forward" through the AND gate. The results reflect the PSL's AND gate behavior as expected. Note that "clamp = 0(1)" means that the clamp terminal is driven so that the corresponding p-bit is clamped to logic "0(1)"; "clamp = n" means the terminal is left floating and the p-bit is left free. Fig. 6(b) shows results when only one of A and B is clamped, so that information flows "forward" but is "incomplete" on the standard "input" side. When A is clamped to "0," B fluctuates between "0" and "1" and C always stays at "0," which is legal.
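The readout and coding steps described above can be sketched as follows: each output voltage is thresholded at 0 V, the three bits are coded as [ABC], and the relative frequency of each of the eight states is accumulated. The function names and the example samples are hypothetical; only the 0 V threshold, the output swing, and the [ABC] coding come from the text.

```python
from collections import Counter

def to_bit(voltage):
    """Logic 1 if the output voltage is above 0 V, otherwise logic 0."""
    return 1 if voltage > 0 else 0

def is_legal(state):
    """True if a 3-character [ABC] string satisfies C = A AND B."""
    a, b, c = (int(ch) for ch in state)
    return c == (a & b)

def relative_histogram(v_a, v_b, v_c):
    """Relative frequency of each [ABC] state from three equally sampled waveforms."""
    if not (len(v_a) == len(v_b) == len(v_c)) or not v_a:
        raise ValueError("waveforms must be non-empty and of equal length")
    counts = Counter(f"{to_bit(a)}{to_bit(b)}{to_bit(c)}" for a, b, c in zip(v_a, v_b, v_c))
    return {f"{k:03b}": counts.get(f"{k:03b}", 0) / len(v_a) for k in range(8)}

# Hypothetical samples in volts (the outputs swing between -1.65 and 1.65 V).
hist = relative_histogram([1.65, 1.65, -1.65, -1.65],
                          [1.65, -1.65, 1.65, -1.65],
                          [1.65, -1.65, -1.65, -1.65])
```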
C. Invertible AND Gate "Backward" Operations and Stabilization Time
For an invertible logic gate, information can also flow "backward" from the standard "output" C side to the standard "input" A and B side. As shown in Fig. 7(a), when C is clamped to "1," only [AB] = 11 is legal, and as a result, all other combinations of [AB] are eliminated or suppressed. When C is clamped to "0," the three legal combinations of [AB] are present while the illegal combination "11" is eliminated. Fig. 7(b) shows waveforms of C, A, and B when the clamp signal at C is switched from V_ss to V_dd and C transitions from "1" to "0." Before the transition, when C is "1," [AB] is mostly "11." After the transition of C to "0," A and B are more "uncertain" but also tend to exclude each other, avoiding the illegal "11" state. The effect of C on A and B is nearly instant (on the order of a few µs). Fig. 7(c) shows the time evolution of the [ABC] relative histogram for various sampling times after C's transition. The histogram becomes correct and stable after 100 µs. Since the p-block's response time is much shorter than 100 µs, this stabilization time is attributed mainly to the statistical accumulation process (uncertainty) of sampling A, B, and C, considering the limited rate of random change of the p-bits.
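The role of statistical accumulation can be illustrated with a rough estimate of how the uncertainty of a histogram bar shrinks with the observation window. The sketch assumes binomial statistics with roughly one statistically independent sample per microsecond (motivated by the approximately 1 MHz fluctuation rate) and a hypothetical state probability of 1/3; these are simplifying assumptions, not measured quantities.

```python
import math

def histogram_std_error(p_state, window_s, sample_rate_hz=1e6):
    """Binomial standard error of a state's relative frequency after one window.

    Assumes roughly one statistically independent sample per 1 / sample_rate_hz.
    """
    n_independent = window_s * sample_rate_hz
    if n_independent < 1:
        raise ValueError("window too short for a single independent sample")
    return math.sqrt(p_state * (1.0 - p_state) / n_independent)

# Hypothetical case: a state expected about one third of the time.
err_10us = histogram_std_error(1 / 3, 10e-6)
err_100us = histogram_std_error(1 / 3, 100e-6)
```

Under these assumptions the uncertainty falls as the inverse square root of the window length, so a tenfold longer window reduces it by a factor of about 3.2.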
D. Invertible AND Gate Other Operations and Free-Run Operation
One terminal from the standard "input" side and C from the standard "output" side can also be clamped at the same time. Fig. 8(a) shows the case where C and B are clamped while A is left free. A fluctuates among all legal states. When [BC] = 01, which is already illegal, A goes to "1" in an attempt to satisfy the AND constraint. The mirrored cases, where B is left free, yield similar results.
When all three p-bits are left free, they fluctuate among all legal states under the AND constraint. As shown in Fig. 8(b), all legal states are more likely than the illegal states by adequate margins.
