Abstract: In a previous paper, we described a new abstract circuit model for reversible computation called asynchronous ballistic reversible computing (ABRC), in which localized information-bearing pulses propagate ballistically along signal paths between stateful abstract devices and elastically scatter off those devices serially, while updating the device state in a logically reversible and deterministic fashion. The ABRC model has been shown to be capable of universal computation. In the research reported here, we begin exploring how the ABRC model might be realized in practice using single flux quantum solitons (fluxons) in superconducting Josephson junction (JJ) circuits. One natural family of realizations could utilize fluxon polarity to represent binary data in individual pulses propagating near-ballistically along discrete or continuous long Josephson junctions or microstrip passive transmission lines, and utilize the flux charge (−1, 0, +1) of a JJ-containing superconducting loop with $\Phi_0 < I_c L < 2\Phi_0$ to encode a ternary state variable internal to a device. A natural question then arises as to which of the definable abstract ABRC device functionalities using this data representation might be implementable using a JJ circuit that dissipates only a small fraction of the input fluxon energy. We discuss conservation rules and symmetries considered as constraints to be obeyed in these circuits, and begin the process of classifying the possible ABRC devices in this family having up to three bidirectional I/O terminals and up to three internal states.
I. INTRODUCTION

In order for any candidate technological basis for computing to be economically viable for high-performance computing (HPC) applications, it must be capable of attaining a high level of computational energy efficiency, a figure of merit that characterizes how many standard computational operations can be performed per unit energy dissipated to the environment. Maximizing this figure is essential for a technology to be cost-effective for typical HPC applications, since energy-related costs (including infrastructure for power delivery and cooling) often comprise a substantial part of the lifetime cost of ownership for typical HPC system deployments. The current efforts in places including the US [1], Europe [2] and China [3] to develop superconducting electronics (SCE) as a basis for HPC are thus motivated in large part by the hope that superconducting logic will in the end prove to be more energy-efficient than end-of-roadmap CMOS technology.
However, for HPC applications in typical environments such as datacenters, which are not already operating at cryogenic temperatures, a realistic accounting of the energy cost for SCE must take into account the overhead related to refrigeration, which can be on the order of 1000×, depending on the scale of the cryo-cooling apparatus [4, Sec. 4.2.5.1]. This puts SCE technology at a disadvantage, and, when cooling is accounted for, typical superconducting logic technologies seem to be no more energy efficient (when normalizing for speed) than leading-edge CMOS technology (Fig. 1).
Some authors have sought to identify alternative advantages for superconducting circuits in terms of interconnect energy efficiency or multi-layer logic, but, given that there are techniques such as adiabatic switching and SOI (silicon-on-insulator) fabrication that reduce wiring dissipation and allow multiple active logic layers in the CMOS world, a thorough analysis is needed to show clearly that a competitive advantage over CMOS nevertheless exists for SCE in these areas. In any case, further improving energy-delay performance of logic remains desirable. As awareness of the above issues has grown, we have been motivated, in the work reported herein, to attempt to develop a novel superconducting logic style aimed towards achieving an improved energy-delay product for logic compared to existing superconducting logic styles, in the hope that this then may be sufficient to overcome realistic refrigeration-related overheads, and outperform the energy-delay product of end-of-roadmap CMOS even in room-temperature environments. Recently, the RQFP (reversible quantum flux parametron) logic family did in fact achieve this milestone [5, slide 11], [6]. However, the desire remains to find a method that may perform even better.
1051-8223 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The most energy-efficient computing technologies consistent with fundamental physical limits require us to use the principles of logically and physically reversible computing [7]-[11], which avoids discarding known information and incurring the associated Landauer cost of kT ln 2 from the thermalization of each bit's worth of lost information to entropy in an environment at temperature T [12]-[15]. In principle, a technology that leverages the reversible computing paradigm can perform multiple useful computational operations per kT of dissipation, and in fact there is no known fundamental (technology-independent) upper limit on the energy efficiency of reversible machines.
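To give a sense of scale for the Landauer cost kT ln 2, the following minimal sketch evaluates it at two temperatures. The specific temperatures (300 K for a room-temperature environment, 4.2 K for a liquid-helium stage) are illustrative assumptions and are not taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_cost(temperature_kelvin: float) -> float:
    """Energy (J) dissipated per bit of information lost at temperature T: kT ln 2."""
    return K_B * temperature_kelvin * math.log(2)


# Illustrative temperatures only (assumptions, not values from the paper).
for label, temp in [("room temperature", 300.0), ("liquid helium", 4.2)]:
    print(f"{label:>16} ({temp:5.1f} K): {landauer_cost(temp):.3e} J per bit")
```

A reversible technology is not bound by this figure per operation, which is why the dissipation per operation can in principle be pushed below kT.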
A number of superconducting logic styles have been proposed that appear able to approach the ideal of reversible computation in the adiabatic limit, starting with Likharev's 1977 parametric quantron [16], continuing with the quantum flux parametron of Goto's group in Japan [17]-[19], and including the AQFP/RQFP logic family [6], [20], [21] under active development by Dr. Yoshikawa's group at Yokohama National University.
However, one drawback of all these adiabatic schemes for reversible computing in SCE is that they require substantial overhead, in terms of design complexity, to distribute AC power-clock signals to every logic gate in the circuit, to drive the adiabatic transitions. A similar complexity overhead is also required even in popular irreversible superconducting logic styles (such as RSFQ [22], eRSFQ [23], and RQL [24]).
These observations may prompt one to consider: Could we approach reversible operation in superconducting circuits without requiring the delivery of a clock signal to every logic gate to drive, and recover energy from, each gate transition? This requires first developing an abstract theoretical model of unclocked, asynchronous reversible computing (ARC).
One of us (M. Frank) had considered the ARC problem previously, during the period 2000-2004. At the time, it seemed intractable: What happens to the timing information contained in asynchronously arriving input signals when producing a single result? However, upon revisiting the problem in 2016-17, a solution was found: We simply require one output for each input, which carries the associated timing information. The resulting model, which was dubbed Asynchronous Ballistic Reversible Computing (ABRC), since it assumes near-ballistic transport of data signals, turns out to be uniquely determined, and was proved capable of universal computation, for both reversible and embedded irreversible computations [25] .
Conceptually, the ABRC model of computation is simple: A machine is a network of abstract devices with I/O ports connected by bidirectional, ballistic interconnects. Each device may contain a local, mutable stationary state. Signals propagate as localized pulses along interconnects; whenever a pulse arrives at a device, the device carries out a deterministic transformation of its local state and emits an output pulse. For energy efficiency, the local state transformations must be at least conditionally logically reversible (a concept defined in [26] ).
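The device abstraction just described can be sketched as a transition table from (input port, state) to (output port, new state). The example below is a minimal illustration only: the two-port "toggle" device and all names in it are hypothetical and are not devices defined in [25] or in this paper. It checks full logical reversibility, which is stricter than the conditional reversibility mentioned above.

```python
from typing import Dict, List, Tuple

Syndrome = Tuple[str, int]  # (port name, device state)

# Hypothetical 2-port, 2-state device: a pulse passes through to the
# opposite port and toggles the stored state. Illustrative only.
TOGGLE_TABLE: Dict[Syndrome, Syndrome] = {
    ("A", 0): ("B", 1),
    ("A", 1): ("B", 0),
    ("B", 0): ("A", 1),
    ("B", 1): ("A", 0),
}


def is_reversible(table: Dict[Syndrome, Syndrome]) -> bool:
    """A table is fully logically reversible iff no two inputs share an output."""
    return len(set(table.values())) == len(table)


def invert(table: Dict[Syndrome, Syndrome]) -> Dict[Syndrome, Syndrome]:
    """Return the inverse table (meaningful only if the table is reversible)."""
    return {out: inp for inp, out in table.items()}


def run(table: Dict[Syndrome, Syndrome], state: int,
        pulses: List[str]) -> Tuple[int, List[str]]:
    """Feed pulses one at a time (serially); return final state and output ports."""
    outputs = []
    for port in pulses:
        out_port, state = table[(port, state)]
        outputs.append(out_port)
    return state, outputs


print(is_reversible(TOGGLE_TABLE))
print(run(TOGGLE_TABLE, 0, ["A", "A", "B"]))
```

Because each arriving pulse produces exactly one departing pulse, the timing information carried by the input is never merged or discarded, which is the design choice that makes the asynchronous model reversible.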
In 2017, an internally-funded 3-year project began at Sandia National Laboratories to investigate whether we could implement this ABRC model in SCE, by designing a new superconducting logic family based on asynchronous, reversible operations on single flux quanta (SFQ) propagating near-ballistically along passive interconnects, and interacting via near-elastic scattering with unclocked devices (implemented as unbiased Josephson junction circuits) containing mutable stationary flux quanta, to produce reversible state changes sufficient for universal computation, while dissipating far less than a typical SFQ energy per local state transition. This paper reports some initial progress towards this goal from the first year of this project.
Some related work: Early in the present project, we learned that Osborn and Wustmann at LPS/JQI are developing a related style of synchronous ballistic reversible fluxon logic [27] - [30] ; however, their approach does not provide asynchrony. A group at Hokkaido University in Japan proposed an asynchronous ballistic computing scheme for SCE based on "fusion gates" a decade ago [31] - [32] , but that one did not attempt to approach physical reversibility.
The present effort may thus be interpreted as an attempt to combine the features of ballistic propagation of polarized pulses from [27] - [30] with concepts of stateful devices from [18] and asynchronous logic from [31] - [32] , although in fact the current project was conceived independently, without any specific or conscious influence from these or other sources.
In the next section, we study in some detail a particular technology base that is suitable for building ballistic interconnects.
II. TECHNOLOGY BASE
In our initial study, we are focusing on Nb-based processes, which are widely available; these include the well-known processes at Hypres and at MIT Lincoln Labs, as well as our own SNS process (with Ta-N barriers) which is currently under development at Sandia [33] , [34] .
For our ballistic interconnects, we are considering two general classes of structures: (1) microstrip passive transmission lines (PTLs) [35]-[37], and (2) long Josephson junctions (LJJs) [38]-[44], which may either be continuous, or approximated by discrete (segmented) structures.
Initially, we are focusing on LJJ interconnects, which support propagation of soliton solutions to the sine-Gordon equation [38] - [43] , with associated properties of low rate of pulse dispersion and long unattenuated transmission distance.
We should clarify why we refer above to soliton dispersion as being low but not zero. In fact, flux solitons in LJJs do have a fixed width (and thus, a rate of dispersion that is identically zero) in certain circumstances, such as at the zero-velocity limit, or given a fixed nonzero bias current density. However, in our scenario, we consider fluxons launched with nonzero velocity down an unbiased LJJ, i.e., one in which the bias current density quickly approaches zero as the fluxon propagates away from its injection point. This case is dispersive, in that the fluxon width scales with $\sqrt{1 - v^2/c_S^2}$ up to a maximum of order $\lambda_J$ as its velocity $v$ ($v < c_S = \lambda_J \omega_J$, the limiting Swihart velocity) declines due to various physical damping mechanisms [40]-[43].
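For reference, the velocity dependence of the fluxon width can be read off from the standard single-soliton solution of the unperturbed sine-Gordon equation. This is the textbook form written in the notation used here; it omits the damping terms that cause $v$ to decay, and the symbol $w(v)$ is introduced only for this illustration.

```latex
\phi(x,t) = 4\arctan\!\left[\exp\!\left(\pm\frac{x - vt}{w(v)}\right)\right],
\qquad
w(v) = \lambda_J \sqrt{1 - \frac{v^2}{c_S^2}},
\qquad
c_S = \lambda_J\,\omega_J .
```

As $v \to 0$ the width $w(v)$ approaches $\lambda_J$, so a fluxon that slows under damping broadens toward that limit.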
To facilitate modeling of LJJ structures in SPICE, we are using discretized LJJs (dLJJ) [44] , described as a sequence of identical lumped-element unit cells in a ladder configuration, i.e., a series of small parallel JJs. Table I gives example parameters for a dLJJ unit cell that can actually be constructed using available Nb fabrication processes, such as Hypres' S#45/100/200 process [45] . We captured a unit cell with these parameters as a schematic (Fig. 2) using the free XIC tool from Whiteley Research [46] . The JJ model was coded in XIC's model.lib file as:
.model jjk jj(rtype = 0, vg = 2.8m,
representing an unshunted junction with a gap voltage of $v_g = 2.8$ mV (appropriate for Nb), critical current $I_c = 1.5$ μA (achievable, e.g., via heavy oxidation of tunnel barriers [47]), and capacitance of $C_J = 60$ fF. We then strung together the unit cells to form longer LJJs for testing, as shown in Fig. 3. The limiting small-signal impedance $Z_{LJJ}$ of an unbounded-length string of dLJJ segments is derived by solving the circuit equivalence diagrammed in Fig. 4. The result is expressed in terms of $Z_L = j\omega L/4$, the impedance of each 7.845 pH inductor shown in Fig. 2, and $1/Z_{JJ} = j\omega C_J - j/(\omega L_J(0))$, the admittance $Y_J$ of the junction in the small-signal approximation, which applies when the phase difference $\phi$ across the junction approaches 0.
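As a numerical sanity check on the quantities defined above, the sketch below evaluates the impedance of an infinite ladder under an assumed topology: each unit cell is treated as a symmetric T-section whose series arm is four $L/4$ inductors and whose shunt arm is $Z_{JJ}$. That topology, the resulting standard iterative-impedance formula $\sqrt{Z_s^2/4 + Z_s Z_{JJ}}$, and the use of $L_J(0) = \Phi_0/(2\pi I_c)$ are assumptions made for this illustration, not a formula taken from the text or from Fig. 4.

```python
import cmath
import math

PHI_0 = 2.067833848e-15  # magnetic flux quantum, Wb

# Unit-cell values quoted in the text.
L_QUARTER = 7.845e-12    # each rail inductor, H (so L = 4 * L_QUARTER)
C_J = 60e-15             # junction capacitance, F
I_C = 1.5e-6             # junction critical current, A


def z_ljj(tau: float) -> complex:
    """Iterative impedance of an infinite ladder of symmetric T-sections.

    The T-section topology is an assumption of this sketch.
    """
    omega = math.pi / tau                        # tau = pi / omega
    l_j0 = PHI_0 / (2 * math.pi * I_C)           # small-signal Josephson inductance
    z_l = 1j * omega * L_QUARTER                 # Z_L = j*omega*L/4
    y_j = 1j * omega * C_J - 1j / (omega * l_j0)  # junction admittance Y_J
    z_jj = 1 / y_j
    z_series = 4 * z_l                           # assumed: four L/4 inductors per cell
    return cmath.sqrt(z_series**2 / 4 + z_series * z_jj)


print(f"|Z_LJJ| at tau = 20 ps: {abs(z_ljj(20e-12)):.1f} ohm")
```

Under these assumptions the magnitude for a 20 ps pulse comes out close to 16 Ω.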
In Fig. 5, we plot the value of $|Z_{LJJ}|$ as a function of $\tau = \pi/\omega$, taken to be the approximate duration of a soliton pulse whose dominant frequency component is $\omega$. We simulated the 100-segment LJJ from Fig. 3 in WRSPICE [48] with various values of the terminating resistor, to validate the impedance estimates. Fig. 6 shows some current traces. The top (red) trace is an asymmetric sawtooth-wave input to SUNY's DC-SFQ converter [49], which responds to the rising edge by producing an SFQ pulse that is extended by the 100 pH input inductor to about 20 ps duration at the input to the LJJ; the remaining traces show the currents at 20-segment intervals along the dLJJ. We can see a well-defined flux soliton propagating at a constant rate of 20 dLJJ cells per 38 ps. Spatially, the pulse is therefore spread over ∼10.5 dLJJ cells. Assuming the rails of the ladder are straight-line inductors with ∼1 pH/μm, the estimated velocity of soliton propagation comes out to ∼$8.258 \times 10^6$ m/s ≈ c/36.
In the left panel of Fig. 6, the termination is 0 Ω (closed circuit); the flux threading the ladder is conserved in this case, and so the fluxon reflects off the end of the dLJJ transmission line with no change in polarity. In the middle panel, we terminate with a 16 Ω resistor matching the predicted impedance from Fig. 5 for a 20 ps pulse; we see that in this case, the fluxon escapes across the terminating resistor, and its entire energy is dissipated. Finally, in the right panel, we terminate with an open circuit; in this case, there is no resistor to damp the fluxon energy, so it reflects back, but this time with its polarity inverted. Note that the flux polarity is not conserved in this case, since the end of the ladder is open.
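The pulse-width and velocity estimates quoted from the simulation follow from simple arithmetic, reproduced in the sketch below. It assumes that each unit cell contributes two 7.845 pH inductors in series along each rail (an assumption about the layout of Fig. 2), so that at ∼1 pH/μm a cell is about 15.7 μm long.

```python
CELLS = 20                     # cells between adjacent probe points
TRANSIT_TIME = 38e-12          # s, time to traverse those cells
PULSE_DURATION = 20e-12        # s, approximate pulse duration
RAIL_INDUCTORS_PER_CELL = 2    # assumption: two inductors in series per rail per cell
INDUCTOR = 7.845e-12           # H, each rail inductor
INDUCTANCE_PER_LENGTH = 1e-12 / 1e-6  # H/m, i.e. ~1 pH/um as assumed in the text
C_LIGHT = 299_792_458.0        # m/s

cells_per_second = CELLS / TRANSIT_TIME
width_in_cells = PULSE_DURATION * cells_per_second
cell_length = RAIL_INDUCTORS_PER_CELL * INDUCTOR / INDUCTANCE_PER_LENGTH  # m
velocity = cells_per_second * cell_length                                 # m/s

print(f"pulse width : {width_in_cells:.1f} cells")
print(f"cell length : {cell_length * 1e6:.2f} um")
print(f"velocity    : {velocity:.3e} m/s (c / {C_LIGHT / velocity:.1f})")
```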
For storing stationary state information internally to our devices, we are initially considering using JJ-containing superconducting loops in which the critical current $I_c$ and loop inductance $L$ satisfy $\Phi_0 < I_c L < 2\Phi_0$; such a loop can stably contain just 0 or 1 magnetic flux quanta $\Phi_0$ of either + or − polarity, and thus it naturally has 3 distinct internal states (−1, 0, +1).
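The loop condition translates directly into a window of allowed loop inductances for a given junction. The sketch below is illustrative only: the function names are hypothetical, and the example critical current simply reuses the 1.5 μA value of the dLJJ junctions, since the text does not specify the critical current of the storage-loop junction.

```python
PHI_0 = 2.067833848e-15  # magnetic flux quantum, Wb


def inductance_window(i_c: float) -> tuple:
    """Return (L_min, L_max) in henries such that PHI_0 < i_c * L < 2 * PHI_0."""
    return PHI_0 / i_c, 2 * PHI_0 / i_c


def is_ternary_loop(i_c: float, inductance: float) -> bool:
    """True if the loop satisfies the condition for three flux states (-1, 0, +1)."""
    return PHI_0 < i_c * inductance < 2 * PHI_0


# Example value is an assumption, not a design value from the paper.
l_min, l_max = inductance_window(1.5e-6)
print(f"L between {l_min * 1e9:.3f} nH and {l_max * 1e9:.3f} nH")
```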
In the next section, we move to a more abstract discussion of the symmetry rules and conservation laws that apply to all JJ circuits; understanding these constraints will be important to help guide us in our search for circuit designs that effectively implement nontrivial ABRC device functions.
Fig. 3. XIC schematic for a simple test bench for a discrete LJJ transmission line made of a string of 100 copies of the unit cell shown in Fig. 2. The asymmetric sawtooth current source triggers a DC-SFQ converter from the SUNY RSFQ cell library [49]; the 100 pH inductor spreads out the resulting SFQ pulse to approximately match the soliton mode of the LJJ. The value of the terminating resistor at right can be adjusted to check impedance matching with the LJJ.
III. CONSERVATION LAWS AND SYMMETRIES
In the first case in Fig. 6, we saw that flux in the dLJJ was conserved when it was terminated by a closed circuit. More generally, any planar circuit with a continuous superconducting boundary must conserve net flux threading the interior of the circuit; this is due simply to Meissner-effect trapping [50], which prevents flux from crossing the boundary; even in Type II materials, thermally-activated spontaneous formation of flux vortices is negligible. Non-planar circuits need not obey this constraint; consider, for example, a dLJJ ladder, but with a half-twist partway along its length; clearly, absolute fluxon polarity will be inverted in that case. Circuits whose boundary contains a resistor or a JJ also need not conserve flux, as illustrated in Fig. 6 by the loss of flux in the dLJJ terminated by a matching resistor, and by the flux inversion in the open circuit. Thus, for simplicity, we restrict our attention, for the time being, to planar circuits with no resistors or JJs on the boundary.
SCE circuits that include only inductors, capacitors and JJs furthermore obey strong constraints due to the symmetries respected by the underlying electrodynamic physics; in particular, T (time) symmetry, meaning time-reversal invariance, implies that when the direction of all currents and fields is reversed, the dynamical trajectory of the circuit configuration will remain the same (apart from the reversal of the currents and fields).
Attending to such conservation laws and symmetries can drastically simplify our search for SCE circuits that implement ABRC device functions, by allowing us to immediately eliminate candidate functions and circuit-design strategies that are inconsistent with these constraints. For example, consider a 1-port device that inverts the polarity of the input fluxon. This is impossible in a planar circuit with a superconducting boundary. Or, consider a 2-port device that allows positive-polarity fluxons to pass through, but causes negative-polarity ones to reflect. This is only possible if the device has an internal trapped flux, since otherwise T symmetry wouldn't be respected.
Without paying attention to such constraints, even just to classify the possible ABRC device functions operating on polarized pulses would be a daunting task. Consider, for example, devices that contain at most 1 trapped fluxon of internal state, which may be either polarity. There are then 6 combinations of I/O pulse types (−1 or +1) and internal states (−1, 0, or +1), which we call I/O syndromes, for each I/O port. Each of these syndromes may, in general, map to any other, in an arbitrary fully logically reversible ABRC function, so the number of such functions can be obtained by counting permutations; see Table II . We wrote a simple Python program to enumerate all possible such functions; it took several hours to complete for the case of 2 ports, and would be infeasible to run for 3 or 4 port devices.
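The size of the unconstrained search space can be estimated with a minimal sketch, assuming that every permutation of the 6 syndromes per port counts as a distinct function (the counting convention of Table II may differ, e.g., if equivalent functions are merged).

```python
import math

SYNDROMES_PER_PORT = 2 * 3  # pulse polarity (2 values) x internal state (3 values)


def count_reversible_functions(num_ports: int) -> int:
    """Number of bijections on the full set of I/O syndromes for num_ports ports."""
    return math.factorial(SYNDROMES_PER_PORT * num_ports)


for ports in (1, 2, 3):
    print(f"{ports} port(s): {count_reversible_functions(ports):,} functions")
```

The count grows from 6! = 720 for one port to 12! = 479,001,600 for two ports and 18! (about 6.4 × 10^15) for three, which is consistent with brute-force enumeration being slow for 2 ports and infeasible beyond that.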
However, if symmetry and conservation constraints are accounted for, the problem becomes much simpler. For a 1-terminal device, the only nontrivial function is one that swaps the internal fluxon with the I/O pulse. For 2-terminal devices, the only other nontrivial functions are ones that separate pulses by polarity, like the example mentioned two paragraphs prior. And for 3-terminal devices, there are only a limited number of useful functions, particularly if attention is restricted to functions that also exhibit symmetry between I/O ports.
In the following, we begin the process of systematically analyzing the possible ABRC functions acting on polarized pulses and using 0 or 1 polarized fluxon of internal state that obey the conservation rules and symmetry constraints discussed above. The I/O syndromes are listed in Table III, grouped by total flux charge; flux conservation requires each input syndrome to map to an output syndrome on the same row. Thus, if the stored flux in the device state is 0, or is the same as the pulse's polarity, then neither the stored state nor the pulse polarity can change. However, when there is a stored flux of opposite polarity to the I/O pulse, then their polarities may be exchanged, or not. Due to the T symmetry constraint, if one input polarity on a given port causes a flux exchange, then the other must as well (assuming the device has no other trapped flux or bias currents). Further, the question of which I/O port the output pulse emerges on can only (at most) depend on the absolute total charge (but not the sign) of the I/O syndrome, and on the port at which the input arrived; so, e.g., when the total charge is 0, both input syndromes +1(−1) and −1(+1) for a given input port must emit a pulse on the same output port (not necessarily the same as the input port).
For a 1-port device with 0 internal flux, the only possible reversible behaviors are the Reflector (R) behavior, or (if not flux-conserving) the Inverting Reflector (IR) behavior; in both cases, any internal device state is unchanged and unnecessary. Note, we already saw examples of both behaviors in Fig. 6. If there is one (non-zero) internal fluxon, then after accounting for symmetries and conservation rules, the only nontrivial (state-using) behavior is the Swap (S) behavior, in which the polarities of the moving and stored fluxons are exchanged (Table IV). Such a device works as a reversible memory cell. Due to its functional simplicity, it would be an appropriate target for detailed circuit design efforts. Additional analysis to classify the possible functional behaviors of ABRC devices for the cases of 2 or 3 I/O ports will be provided in later work.
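The 1-port classification can be checked by brute force. The sketch below enumerates all 720 permutations of the six syndromes and keeps those that conserve total flux (pulse plus stored). The tuple encoding of syndromes, and the modeling of the current-and-field reversal symmetry as a global sign flip, are assumptions made for this illustration.

```python
from itertools import permutations, product

# A syndrome is (pulse polarity, stored flux); with one port no port label is needed.
SYNDROMES = list(product((-1, +1), (-1, 0, +1)))


def total_flux(syndrome):
    pulse, stored = syndrome
    return pulse + stored


def flux_conserving_functions():
    """Yield each bijection on SYNDROMES that preserves total flux."""
    for image in permutations(SYNDROMES):
        mapping = dict(zip(SYNDROMES, image))
        if all(total_flux(a) == total_flux(b) for a, b in mapping.items()):
            yield mapping


def respects_polarity_flip(mapping):
    """True if negating all fluxes on input negates all fluxes on output."""
    def neg(s):
        return (-s[0], -s[1])
    return all(mapping[neg(s)] == neg(mapping[s]) for s in SYNDROMES)


survivors = list(flux_conserving_functions())
for mapping in survivors:
    changed = {a: b for a, b in mapping.items() if a != b}
    print("Reflector (identity)" if not changed else f"Swap: {changed}")
```

Only two functions survive: the identity, corresponding to the Reflector, and the exchange of the two zero-total-flux syndromes, corresponding to Swap. Both also satisfy the sign-flip check.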
IV. CHALLENGES FOR FURTHER DEVELOPMENT
At this point, the effort reported here is still very much at a preliminary stage, and much work remains in order to develop the ABRC concept into a viable technology for fast and energy-efficient superconducting logic. Some major challenges that remain at this point include:
1) Identify specific circuit topologies and device parameter settings that implement useful ABRC device functionalities, such as that illustrated in Table IV, with low signal degradation over reasonable margins. At this point, we still need more insight into the required design methodologies. As a contingency, we may automate a search through the space of possible circuits.
2) Even given working circuit designs for a useful suite of ABRC primitive functions, manufacturing variability may pose a substantial barrier to the workability of this approach to logic design in practice, compared to the case in typical irreversible logic styles such as RSFQ or RQL, in which logic signals are restored at each step. Over time, this problem may be alleviated through improvements in manufacturing processes, but in the meantime, it can be expected to remain a significant concern.
3) Further improvements in (and analysis of) the theoretical efficiency of general logic constructions based on ABRC are needed. At this time, it is still far from clear what exactly will be the overheads, in terms of circuit complexity, for implementing typical larger functions in terms of ABRC primitives, as opposed to traditional combinational and sequential Boolean logic.
V. CONCLUSION
In this paper, we reviewed some of the theoretical and simulation work done to date in our project at Sandia to implement the ABRC model [25] in SCE. We have modeled and simulated the discrete LJJ transmission lines that we intend to use for interdevice communication, and have begun to characterize how the symmetries and conservation laws that apply in JJ circuits can help narrow down the set of possible ABRC device functions that may be implementable in such circuits. Next steps include designing a 1-port JJ circuit that implements the Swap function, classifying the interesting 2- and 3-port functions, and better analyzing the margins and overheads of this approach.
