Monolithic three-dimensional integration of memory and logic circuits could dramatically improve performance and energy efficiency of computing systems.
Information. correspond to the bottom active interface, which is in agreement with the devices' asymmetric structure. For all devices, the set switching is rather sharp, while the reset process is gradual.
For example, for device B1 the reset transition starts at $V_{\mathrm{reset}}^{\mathrm{min}}$ ≈ −1.5 V; however, to avoid partial switching, a voltage exceeding $V_{\mathrm{reset}}^{\mathrm{max}}$ ≈ −2.2 V must be applied (Fig. 3b). A slightly thicker titanium dioxide layer for the bottom devices resulted in higher set threshold voltages than for the top devices.
Significant set threshold voltage variations (Fig. 3b) are a major challenge for implementing IMP logic. Therefore, it is natural to choose circuit parameters (i.e. GL, VL, VP) that maximize the range of variations, also referred to as margins, which can be tolerated without compromising the correctness of the logic operation. Some earlier works suggested choosing GL' = (GON·GOFF)^(1/2) as the optimal design; 31 however, our simple analysis of IMP logic operation (Sec. 3 of Supplementary Information) shows that set margins increase monotonically as the load conductance decreases (Fig. 1e, f). The largest margins are obtained for GL = 0, which cannot be implemented with the original circuit but can be easily realized by replacing the load resistance and voltage source with a current source (Fig. 1d). The transition from the original circuit with the earlier suggested GL' to the modified one with an optimized current source IL increased set margins by more than 20% (Fig. 1e). Such a boost in variation tolerance was critical for our experiment, allowing it to cope with virtually all experimentally observed variations (Fig. S5). It should be noted that, in principle, IMP logic can also be implemented using a memristor's reset transition, i.e. assuming that logic states "0" and "1" are represented by the ON and OFF states instead. However, this would not be helpful in our case because of the gradual reset transition; see Section 3 of Supplementary Information for more details.
Using the variation-tolerant design with optimal values of IL and VP, which were obtained from accurate numerical simulations based on experimental (nonlinear) I-V curves, we successfully demonstrated IMP logic with the fabricated memristor circuit (Figs. 4 and 5).
In the first set of experiments, a series of IMP operations was performed sequentially utilizing four different pairs of memristors (Figs. 4 and S7). Before each logic operation, the devices were always written to the specified initial states, so this experiment demonstrates memory and logic functionality implemented within the same circuit. Moreover, the considered pairs constitute all possible combinations of memristor polarities in the IMP circuit and hence are sufficient to compute and move information in any direction within the circuit.
In most cases of the first experiment, output conductances are close to the extreme ON and OFF values, so it should be possible to cascade IMP logic gates, i.e. use the output of one gate as an input for another. To confirm this, in the next series of experiments we implemented the NAND Boolean logic operation, for which the inputs were the states of the bottom level devices and the output was stored in one of the top level memristors (Fig. 5). The NAND gate was realized in three steps: an unconditional reset, followed by two sequential IMP operations. 11 The result of the first IMP operation was stored in the top level device, which was then used as one of the inputs to the second IMP gate. In some rare cases (~6.5% of total IMP operations), there is a visible reduction in the ON-to-OFF conductance ratio. This is not desirable because set margins decrease with decreasing ON-to-OFF ratio (Fig. 1e). One plausible solution to restore the ratio is to read the state and write it back, similar to what was implemented in the first experiment.
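The three-step NAND sequence can be sketched at the Boolean level as follows. This is a minimal sketch, not a model of the experiment: it ignores conductance values, margins and device variations, it assumes that logic "0" and "1" correspond to the OFF and ON states, and the function names `imp` and `nand_via_imp` are illustrative only.

```python
def imp(p: int, q: int) -> int:
    """One IMP step: Q is set (to 1) only when both P and Q are OFF (0);
    otherwise Q keeps its state. The result equals (NOT p) OR q."""
    return 1 if (p == 0 and q == 0) else q


def nand_via_imp(a: int, b: int) -> int:
    """NAND as an unconditional reset followed by two sequential IMP steps."""
    s = 0          # step 1: unconditional reset of the output device
    s = imp(a, s)  # step 2: s = NOT a
    s = imp(b, s)  # step 3: s = (NOT b) OR (NOT a) = NAND(a, b)
    return s


# Print the NAND truth table obtained from the three-step sequence.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand_via_imp(a, b))
```

The output of the first IMP step is reused as the Q input of the second one, which is why cascading requires output states close to the extreme ON and OFF values.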
Interestingly, three-dimensional IMP logic enables a practical solution for one of the Feynman Grand Challenges: the implementation of an 8-bit adder which fits in a cube no larger than 50 nanometres in any dimension. 21 The major building block, a full adder, which adds Boolean variables a, b, and cin to calculate the sum s and the carry-out cout, requires 6 memristors and consists of two monolithically stacked 2×2 crossbars sharing the middle electrodes (Fig. 6a). Two of the memristors in the crossbar are assumed to be either not formed or always kept in the OFF state (Fig. 6b), which eliminates leakage currents typical for crossbar circuits and makes IMP logic set margins similar to those of the demonstrated circuit. In particular, at the start of computation, a, b, and cin are written to the specific locations in the circuit (Fig. 6c). A sequence of NAND operations, each consisting of one unconditional reset step and two IMPs (Fig. 5), is then performed to compute cout and s according to the particular implementation of Fig. 6d. An occasional NOT operation is implemented with one unconditional reset step and one IMP step and is used to move variables within the circuit. In total, the full adder is implemented with 9 NAND gates and 4 NOT gates, i.e. 13 unconditional reset steps and 22 IMP steps. The simplest way to read the output of an adder is to measure electrically the state of memristors T2 and T3 (Fig. 6c). Alternatively, the output can be sensed as a mechanical deformation of the upper metal electrodes, which is often observed in metal-oxide memristors, 32 or using scanning Joule expansion microscopy. 33 Finally, a full 8-bit adder could be implemented in a ripple-carry style 34 by performing the full adder operation 8 times.
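The step count and the ripple-carry composition can be checked with a short Boolean-level sketch. It rests on several assumptions: the 9-NAND netlist below is the textbook full adder and not necessarily the arrangement of Fig. 6d, the 4 NOT gates used to move variables are accounted for only in the step budget, and all names (`imp`, `nand`, `full_adder`, `ripple_add8`) are illustrative.

```python
def imp(p: int, q: int) -> int:
    """IMP step: Q is set only when both P and Q are OFF (0); else unchanged."""
    return 1 if (p == 0 and q == 0) else q


def nand(a: int, b: int, steps: dict) -> int:
    """NAND as one unconditional reset followed by two IMP steps."""
    s = 0                      # unconditional reset of the output device
    steps["reset"] += 1
    s = imp(a, s)              # s = NOT a
    s = imp(b, s)              # s = (NOT a) OR (NOT b)
    steps["imp"] += 2
    return s


def full_adder(a: int, b: int, cin: int, steps: dict):
    """Textbook 9-NAND full adder (assumed netlist, see lead-in)."""
    n1 = nand(a, b, steps)
    n2 = nand(a, n1, steps)
    n3 = nand(b, n1, steps)
    x = nand(n2, n3, steps)     # a XOR b
    n5 = nand(x, cin, steps)
    n6 = nand(x, n5, steps)
    n7 = nand(cin, n5, steps)
    s = nand(n6, n7, steps)     # sum = a XOR b XOR cin
    cout = nand(n1, n5, steps)  # carry-out
    return s, cout


def ripple_add8(x: int, y: int):
    """8-bit ripple-carry addition: the full adder is applied 8 times, LSB first."""
    steps = {"reset": 0, "imp": 0}  # counts NAND steps only, not the NOT moves
    carry, total = 0, 0
    for i in range(8):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry, steps)
        total |= s << i
    return total, carry, steps


# Step budget per full adder as stated in the text: 9 NAND + 4 NOT gates,
# where a NOT gate is one unconditional reset plus one IMP step.
N_NAND, N_NOT = 9, 4
resets_per_adder = N_NAND * 1 + N_NOT * 1
imps_per_adder = N_NAND * 2 + N_NOT * 1
print(resets_per_adder, imps_per_adder)  # 13 22
print(ripple_add8(200, 100)[:2])         # (44, 1): 300 = 256 + 44
```

The budget of 13 resets and 22 IMP steps follows directly from the gate counts; the data movement performed by the NOT gates depends on the physical layout and is not modelled here.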
In summary, we have demonstrated logic-in-memory computing in three-dimensional monolithically integrated circuits. As memristor technology continues its rapid progress and eventually becomes sufficiently advanced to enable large-scale integration of memristive devices with sub-nanosecond, pico-Joule switching and >10^14 cycles of endurance, which has so far been demonstrated for discrete devices, [22] [23] [24] we expect that the presented approach will become attractive for high-throughput and memory-bound computing applications suffering from memory bottleneck problems. Furthermore, we showed how the presented approach establishes a realistic pathway towards resolving one of Feynman's Grand Challenges. The remaining challenge is to scale down the circuitry (Fig. 6a), which does not seem an unrealistic task given that discrete metal-oxide memristors with similar dimensions 24 and much more complex (but less dense) memristive circuits have already been demonstrated.
Supplementary information
1. Circuit fabrication
Devices were fabricated on a Si wafer coated with 200 nm of thermal SiO2. Circuit fabrication involved four lithography steps using an ASML S500 / 300 DUV stepper with a 248 nm laser. To guard against misalignment of the device layers, the bottom devices were made larger, with an active area of 500 nm × 500 nm, as compared to the 300 nm × 500 nm active area of the top devices.
In particular, in the first lithography step the bottom electrode was patterned using a developable antireflective coating (DSK-101-307 from Brewer Science, spin speed 2500 rpm, bake 185ºC, thickness ~50 nm) and positive photoresist (UV210-0.3 from Dow, spin speed 2500 rpm, bake 135ºC, thickness ~300 nm). 5 nm / 20 nm of Ta / Pt were evaporated at a deposition rate of 0.7 Å/s in a thin film metal e-beam evaporator. After liftoff, a "descum" by active oxygen dry etching at 200ºC for 5 minutes was performed to remove photoresist traces.
In the next lithography step, the middle electrode was patterned and the bottom device layer (6 nm / 45 nm of Al2O3 / TiO2-x bi-layer) and middle electrode (15 nm / 38 nm of Ti / Pt) were deposited using low temperature (< 300ºC) reactive sputtering in an AJA ATC 2200-V sputter system. To minimize sidewall redeposition on the photoresist, which was undercut during sputtering of the middle electrode and caused "bunny-ear" formation around the edges of the middle electrode (Fig. S1a), both metals were deposited at 0.9 mTorr, which is the minimum pressure needed to maintain plasma in the sputtering chamber. Also, the thickness of the photoresist undercut layer was optimized to provide more shadowing by using a liftoff layer of LOL2000 (from Shipley Microposit, spin speed 3500 rpm, bake 210ºC, thickness ~200 nm) followed by the same DSK101/UV210 stack as for the first lithography step mentioned above. Occasional lumps were reduced to a height of ~20-30 nm by swabbing in isopropanol (Fig. S1b).
Severe topography of the bottom level devices (Fig. 2e) may cause shorts and large variations in the top level devices. To overcome this problem, a planarization step was performed using chemical mechanical polishing and etch-back of 750 nm of sacrificial SiO2.
SiO2 served a dual purpose: as a sacrificial material for planarization and as insulation between devices. The best planarization was achieved by depositing SiO2 at 175ºC using PECVD. 2 Following the deposition, 400 nm of SiO2 was removed by chemical mechanical polishing for 3 min, achieving a surface roughness of less than 1 nm. The last step in the planarization procedure was to etch back ~250 nm of SiO2 until the middle electrodes were exposed (Fig. 2f). Several etch-back approaches were investigated, with the best results achieved using CHF3 at 50 W, which had an etch rate of 0.2 nm/s (Fig. S2). In particular, the dry etching with CHF3 was done in steps to ensure < 5 nm roughness in the exposed middle electrode. AFM scans were performed after each etching step to check the thickness of the exposed electrode (Fig. S3) and to confirm that the post-etch surface had no traces of bunny-ear formations. After planarization and partial middle electrode exposure, the top layer devices were completed by in-situ reactive sputtering of the switching layer, which consisted of 4 nm / 30 nm of Al2O3 / TiO2-x, and of the Ti (15 nm) / Pt (25 nm) top electrode over patterned photoresist (DSK101/UV210). No oxygen descum was performed before deposition in order to avoid potential oxidation of the bottom switching layer and to maintain TiO2-x stoichiometry.
Lastly, the pads of the bottom and middle electrodes were exposed through a CHF3 etch of the sacrificial SiO2 which was used for planarization.
In all lithography steps, the photoresist was stripped in the 1165 solvent (from Shipley Microposit) for 24 h at 80ºC. 
2. Electrical testing and device forming
All electrical testing was performed with an Agilent B1500 tool. The memristors were electroformed by grounding the device's bottom electrode and applying a current-controlled quasi-DC ramp-up to the device's top electrode, while keeping all other circuit terminals floating. For most of the devices, forming voltages were ~2-3 V, while device T1 did not require forming (Fig. S4). To minimize current leakage during the forming process, each memristor was switched to the OFF state immediately after forming. For all devices, the most severe variations are the cycle-to-cycle variations in the set transition (Fig. S5), which range from 0.7 V to 1.6 V for the top layer devices and from 1.1 V to 1.9 V for the bottom ones. However, because of gradual switching, the $|V^{\mathrm{max}} - V^{\mathrm{min}}|$ statistics are comparable or wider for the reset transition (Fig. S5).
3. Material implication logic
The optimal circuit parameters VP, VL and GL, which result in the largest set margins, can be derived analytically for memristors with linear I-V characteristics (Fig. 1b). Let us first consider an IMP circuit with a specific "parallel" configuration of memristors (Figs. 1c and S6a).
Assuming for convenience that VQ = 0, proper operation of the material implication logic circuit shown in Figs. 1a, c requires that device Q is set only when both P and Q are in the OFF state, i.e.
where
is a voltage on the common electrode. Device P should not be disturbed during the IMP operation, i.e. Therefore, the largest set margins and the corresponding optimal parameters can be found by solving the following equations:
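where $V_{\mathrm{set}}^{*} = (V_{\mathrm{set}}^{\mathrm{max}} + V_{\mathrm{set}}^{\mathrm{min}})/2$.
Here, $\Delta_{\mathrm{ideal}}$ is the set margin for binary, zero-variation memristors (i.e. ideal for the considered application), for which $V_{\mathrm{set}}^{*} = V_{\mathrm{set}}^{\mathrm{max}} = V_{\mathrm{set}}^{\mathrm{min}}$. Accounting for variations in the set switching threshold and for analog switching, the margin more relevant to our case is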
From Eqs. (7-9), VP, VL and $\Delta_{\mathrm{ideal}}$ are
According to Eq. 10, $\Delta_{\mathrm{ideal}}$ decreases monotonically with GL (Fig. 1e) and the maximum margins are achieved for GL = 0, i.e. for the circuit in Fig. 1d, for which
For devices with a large ON-to-OFF conductance ratio, Eq. 13 can be approximated by a very simple formula
It is instructive to compare IMP logic margins with those of passive crossbar memories. For example, let us consider the optimal V/3-biasing scheme, 1 and assume that voltages V and 0 are applied to the lines leading to the selected device, and V/3 and 2V/3 to the corresponding lines leading to the remaining devices. Assuming that the voltage across the selected device is $V = V_{\mathrm{set}}^{*} + \Delta_{\mathrm{memory}}$, while it is $V/3 = V_{\mathrm{set}}^{*} - \Delta_{\mathrm{memory}}$ across all other devices, it is straightforward to show that the margins for crossbar memory are
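The two bias conditions stated above are enough to work out the ideal crossbar-memory margin. The short derivation below is a sketch under those two conditions only; it assumes zero-variation devices and neglects line resistance and leakage, which the text does not discuss at this point.

```latex
\begin{align}
V &= V_{\mathrm{set}}^{*} + \Delta_{\mathrm{memory}}, &
\frac{V}{3} &= V_{\mathrm{set}}^{*} - \Delta_{\mathrm{memory}} \\
\text{adding:}\quad \frac{4V}{3} &= 2\,V_{\mathrm{set}}^{*}
\;\Rightarrow\; V = \frac{3}{2}\,V_{\mathrm{set}}^{*}, &
\Delta_{\mathrm{memory}} &= V - V_{\mathrm{set}}^{*} = \frac{1}{2}\,V_{\mathrm{set}}^{*}
\end{align}
```

Under these assumptions the tolerable deviation of the threshold is half of $V_{\mathrm{set}}^{*}$ in either direction.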
Thus, voltage margins for memory circuits are more relaxed than those of IMP logic. In principle, somewhat larger IMP logic set margins can be obtained by not enforcing full switching, e.g. by defining $V_{\mathrm{set}}^{\mathrm{max}}$ as the largest set threshold voltage due to cycle-to-cycle variations. However, in this case, the ON-to-OFF ratio would be reduced with every IMP logic operation, which is not desirable.
The analysis above is for a specific IMP logic based on memristors with identical linear static I-V characteristics. It is straightforward to extend it to a more general case by using parameters specific to memristors Q and P in Eqs. (S6-S8), such as different set and reset threshold voltages for the top and bottom devices, which is the case relevant to the implemented circuit. For example, a more general set of equations for the parallel configuration shown in Fig. S6a, which is more convenient to solve for Δ directly, is
For the anti-parallel configuration shown in Fig. S6b, the set of equations is
and the actual margin for GL= 0 is
Because $-V_{\mathrm{reset}}^{\mathrm{min}} > V_{\mathrm{set}}^{\mathrm{min}}$ typically holds for the considered devices (Fig. S5), it follows from Eqs. 19 and 21 that margins for the parallel case are smaller, which is why this case is considered in more detail. Margins and optimal parameters for the remaining parallel (Fig. 4a) and anti-parallel (Fig. 4d) configurations that were experimentally demonstrated are similar to those described above, the only difference being that the signs of VP and IL are negative.
The analytical approach can also be utilized for IMP logic based on memristors with more realistic nonlinear static I-V characteristics by using GON and GOFF measured at large (close to switching threshold) voltages. A more accurate approach, however, is to solve the inequalities of Eqs. (S1-S5) numerically. By fitting experimental I-V curves (Fig. S4b) and using Mathematica's Newton-Raphson-based solver, we obtained more accurate optimal values for VP and VL, which were used in the experimental work. The margins calculated from numerical simulations for a specific IMP logic are also shown in Fig. 1e and are in fairly good agreement with the simple analytical model for a system with an ON-to-OFF conductance ratio of ~10.
4. Material implication logic experiment
For IMP and NAND experiments presented in Figs. 4 and 5, the memristors were set to the initial states using the state tuning algorithm. 
