We introduce a novel all-optical logic architecture whereby the gates may be readily reconfigured to reprogram their logic to implement (N)AND/(N)OR/X(N)OR. A single gate structure may be used throughout the logic circuit to implement multiple truth tables. The reconfiguration is effected by an optical reference signal. The reference may also be adapted to an arbitrary Boolean complex alphabet at the gate logic inputs and calibrated to correct gate imperfections. The all-optical gate structure is partitioned into a linear interferometric front end and a nonlinear back end. In the linear section, two optical logic inputs, along with a reference signal, linearly interfere. The nonlinear back end realizes a phase-erasure (or phase-reset) function. The reconfiguration and recalibration capabilities, along with the functional decoupling between the linear and nonlinear sections of each gate, facilitate the potential aggregation of large gate counts into logic arrays. A fundamental lower bound for the expended energy per gate is derived as 3hν + kT ln 2 joules per bit.
INTRODUCTION
In the last twenty years considerable academic research has been conducted toward realizing digital logic functions (AND, OR, XOR, etc.) by means of all-optical (AO) devices. The research included proposals and demonstrations of all-optical logic gates [1,2], all-optical switching devices [3,4] and transistors [5], optically controlled photonic structures [6-8], and all-optical analog-to-digital converters [9]. A main target application would be the realization of the mythical AO computer, in which photons rather than electrons effect the interactions between the gates. In the last few years such efforts have been reignited by the motivation of better exploiting the high transmission capacity of optical communication networks. This is envisioned to be achieved by AO networking, wherein the optical packets are routed by ultrafast smart AO switches, interpreting the headers and performing the packet switching functions all in the optical domain, without involving conversion to electronics and back to optics [10]. A related direction is the usage of optical interconnects for optical networking on a chip [11].
In general, the main desirable properties against which to measure any proposed AO logic device family are the following [12]: (1) speed, desired to be orders of magnitude faster than today's electronic gates (e.g., 40 Gb/s up to >1 Tb/s); (2) small dimensions, i.e., efficient footprint for large-scale integration (LSI); (3) low power dissipation; (4) cascadability, the ability to interconnect and fan-in/out to form large logic arrays amenable to LSI (implying logic-level restoration); and (5) manufacturability: reliably and repeatably fabricated at low cost. Another desirable quality would be the logic devices' reconfigurability, or reprogrammability, i.e., the ability of the hardware architecture to rapidly alter the functionalities of its components and the interconnection between them as required, in effect enabling an AO field-programmable gate array (FPGA) architecture.
In this paper we introduce a new architecture for AO logic, potentially providing a better fit to the desirable attributes just listed. In particular, the new AO logic gates may be reconfigured at will, their logic reprogrammed to implement (N)AND/(N)OR/X(N)OR Boolean functions. The novel principle entails partitioning the AO gate structure into a linear interferometric front end, wherein two optical logic inputs, along with a reference signal, linearly interfere. The linear stage is followed by a nonlinear back end realizing a phase-erasure (PE) function, alternatively described as phase reset, and characterized as follows: the magnitude and phase at the PE optical one-port output are functions of the input magnitude, independent of the input phase. The logic calculation is essentially performed within the linear-optics stage, easing the requirements placed on the nonlinear section. However, the nonlinear PE is shown to be a necessary final ingredient of the gate operation, without which chaining of multiple gates would not be possible. As mentioned above, a key attribute of our novel AO logic family is its reconfigurability. The gate reprogramming is effected by an optical reference signal, which may also be adapted to an arbitrary Boolean complex amplitude alphabet at the gate logic inputs and may be further fine-tuned to compensate for small gate or input signal variations. These tunability/calibration features facilitate the potential aggregation of large gate counts into extended logic arrays. Even if the logic circuit is not designed to be reconfigured on the fly, it is highly advantageous (in terms of photonic circuit density and ease of manufacturing) to have a common compact gate structure. Under this approach the gates are all fabricated identically, while the truth table of each gate is fixed by the level set for its optical reference signal, which acts as a gate-type selector.
The gate linear front end, and to a lesser extent the nonlinear back end as well, are amenable to being realized as photonic integrated circuit (PIC) structures. The proposed gate architecture allows for closed-loop control and calibration procedures for the reference signals, maintaining each gate at its optimal operating point and mitigating the accumulation of errors.
All input/output (I/O) ports of the multiple gates may be taken to operate at the same wavelength, and the requirements for temporal and spatial coherence of the various optical signals at various ports in the logic array are quite modest. Hence cascading and fan-out of large numbers of gates is facilitated, and a single "optical power supply" is distributed over the chip.
We show that, in principle, an arbitrary PE characteristic following the linear part would be sufficient for realizing a single gate or an array of a few gates. However, in order to prevent accumulation of errors in large gate arrays, it is essential to restrict the PE designs to those displaying regenerative (limiter or thresholder-like) characteristics, enabling logic-level restoration. Realizations of regenerative PE should be facilitated by the decoupling between the PE nonlinearity and the linear-optics front end, as inherent in our architecture. The nonlinearity is freed from the burden of realizing the logic-related interactions, which are all relegated to the preceding linear-optics section. This enables separately implementing the nonlinear section by a variety of optical nonlinear effects. Any nonlinear process that has ever been considered for AO processing is relevant to regenerative PE realization in our context. In this paper we detail our preferred realization of regenerative PE, based on resonant gain saturation or saturable absorption mechanisms. However, we envision that a variety of additional nonlinear PE mechanisms will probably be further proposed and investigated once the currently proposed architecture is disseminated.
Finally, we evaluate a fundamental limit on the gate energy efficiency, deriving a lower bound on the expended energy per gate per bit of the order of 3hν + kT ln 2, consistent with the Landauer thermodynamic limit [13].
The paper is structured as follows: in Section 2 we introduce the novel reconfigurable all-optical gates in the simpler case of unipolar and bipolar logic alphabets, briefly presenting the basic photonic building blocks. In Section 3 the treatment is extended to general complex alphabets. In Section 4 we detail the photonic realizations of the linear-optics front and the PE, the gate impairments, and issues of integration and cascadability. Section 5 derives a fundamental lower limit of energy consumption per bit.
RECONFIGURABLE ALL-OPTICAL GATES OVER REAL-VALUED ALPHABETS
A. Theory of Operation
The novel all-optical gate operation may be described in the abstract as an analog mathematical transformation capable of realizing multiple Boolean operations [Fig. 1(a)]. A photonic structure physically realizing this transformation in the lightwave domain is shown in Fig. 1(b).
Logic Alphabet
In our application, a Boolean or logic alphabet is a pair of complex-valued (or real-valued) numbers denoted
flexibly allowing the logic polarity conventions to vary from one gate input or output port to the next one. Optically, the two values {A_L, A_H} represent the complex amplitudes of two possible light signals. In this section we restrict our attention to the simplest unipolar {0, A} and bipolar {±A} real-valued logic alphabets. The more general case of arbitrary complex-valued logic alphabets is treated in the next section.
Gates Structure
Our general approach is to realize each logic gate as the cascade of a linear stage and a nonlinear stage. The linear front-end stage implements either an adder-subtractor or just an adder. The terminating nonlinear stage is a phase eraser, an element that resets the signal's phase to that of a probe signal, with or without thresholding. The logic is "almost" realized in the linear part, with the nonlinear 
Linear Stage
The gate front end consists of a simple linear combiner (LC), adding/subtracting or, more generally, taking linear combinations with arbitrary coefficients of three input signals,
$$U = aX + bY + cR, \tag{1}$$
or, in particular, most simply (at least for the purpose of explaining the principle of operation), a = b = −c = 1; i.e., we use an adder-subtractor, called the standard LC:
$$U = X + Y - R. \tag{2}$$
All signals and coefficients are complex-valued scalars: the two signals X, Y are the "logic" inputs, while the third input R is a reference signal, to be tuned to predetermined values in order to modify the gate logic function, thereby selecting a particular Boolean function (AND, OR, XOR, etc.), and further fine-tuned to calibrate the gate to variations in the input Boolean alphabets. There are multiple alternatives to optically realize the LC module. Our preferred optical implementation simply consists of a pair of directional couplers (DCs) connected as in Fig. 1(b), terminating three of their four output ports and using the fourth port as output. A DC device acts as a linear two-port, described by a 2 × 2 transfer matrix transforming from its input to its output complex amplitudes. Using planar PIC technology, DCs are readily designed and tuned to an appropriate length such as to perform the Hadamard matrix function,
$$\begin{pmatrix} \Sigma \\ \Delta \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}, \tag{3}$$
with X, Y the complex amplitudes at the DC input waveguides and Σ, Δ the complex amplitudes at the DC output waveguides, such that (up to the 1/√2 factor) one output is the sum of the two inputs while the second output is their difference. Hence two DCs interconnected as in Fig. 1(b) implement (up to constant scale factors) the addition and subtraction,
$$\Sigma = X + Y, \qquad U = \Sigma - R, \tag{4}$$
compounding to the linear combination Eq. (2). As shown below, under a certain scenario, an even simpler LC may suffice, consisting of just a single addition of the logic inputs, U ≡ X + Y, realized by means of a single DC or Y-junction combiner.
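The two-coupler combiner described above can be checked numerically. The following is a minimal sketch, assuming ideal lossless couplers with the Hadamard transfer matrix; the function names and the pre-attenuation of the reference by 1/√2 are illustrative assumptions, not part of the original design description.

```python
import numpy as np

# Ideal directional coupler tuned to the Hadamard transfer matrix.
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def directional_coupler(in1, in2):
    """Return the (sum, difference) output amplitudes of an ideal DC."""
    sigma, delta = HADAMARD @ np.array([in1, in2], dtype=complex)
    return sigma, delta

def standard_lc(x, y, r):
    """Cascade two DCs to form X + Y - R, up to an overall factor of 1/2.

    Assumption of this sketch: R is pre-attenuated by 1/sqrt(2) so that it
    carries the same weight as X + Y, which has already crossed one coupler.
    Unused output ports are treated as terminated and simply dropped.
    """
    sigma, _ = directional_coupler(x, y)                # (X + Y) / sqrt(2)
    _, u = directional_coupler(sigma, r / np.sqrt(2))   # (X + Y - R) / 2
    return u

A = 1.0
for x, y in [(0, 0), (0, A), (A, 0), (A, A)]:
    u = 2 * standard_lc(x, y, 0.5 * A)   # undo the 1/2 scale factor
    print(x, y, np.round(u, 6))
```

The overall scale factor is inconsequential for the logic, since it rescales all output levels equally.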
Nonlinear Stage: a Phase Eraser with or without Thresholding
The linear combiner is terminated in a nonlinear stage, realizing a PE operation that amounts to either generating the absolute value, |·|, or the squared absolute value, |·|², of the input U, or any real-valued function thereof, M{|·|²}, or most generally a complex-valued function thereof:
$$V = M(|U|). \tag{5}$$
Phase erasure means discarding or resetting the input phase information (retaining just the absolute value or a function thereof), evidently a nonlinear operation. Actually the gates may become more robust and work better in combination, provided that the function M acting on the absolute value consists of a thresholding (regenerative) operation, i.e., is ideally a two-level piecewise constant function, amounting to a limiter or an ideal-switching transfer characteristic, either an identity (ID gate) or an inverter (NOT gate), in effect acting as a one-bit quantizer or slicer:
(6)
[Practically, an approximation of Eq. (6) may suffice, whereby the transition slopes up abruptly but is not ideally discontinuous and the two levels are not perfectly flat.] We refer to the resulting module as a phase-erasing slicing inverter (PESI) or phase-erasing NOT. We should keep in mind that M_NOT(|U|) is a sophisticated inverter that operates on a complex-valued input amplitude while discarding its phase; i.e., we require a regenerative characteristic further endowed with the PE function. For single-gate or few-gate operation, however, a full PESI characteristic is not strictly necessary; e.g., a simple |·| or |·|², providing the simplest PE functionality without regeneration, may suffice to terminate the LC and enable the gate to function.
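As an illustration of the distinction drawn above, the following sketch models the two nonregenerative phase erasers and ideal slicers of both polarities. The output levels {0, high}, the function names, and the sample amplitudes are assumptions chosen for illustration; they show only that the output depends on |U| and not on the input phase.

```python
import cmath

def pe_abs(u):
    """Nonregenerative PE: |U|."""
    return abs(u)

def pe_intensity(u):
    """Nonregenerative PE: |U|^2."""
    return abs(u) ** 2

def pesi_not(u, threshold, high):
    """Ideal inverting slicer: `high` if |U| is below threshold, else 0."""
    return high if abs(u) < threshold else 0.0

def slicer_id(u, threshold, high):
    """Ideal non-inverting slicer: `high` if |U| is above threshold, else 0."""
    return high if abs(u) > threshold else 0.0

A = 1.0
for phase in (0.0, 0.7, cmath.pi):
    u_low = 0.2 * A * cmath.exp(1j * phase)    # magnitude below threshold
    u_high = 1.1 * A * cmath.exp(1j * phase)   # magnitude above threshold
    # The outputs are the same for every phase: the phase has been erased.
    print(phase, pesi_not(u_low, 0.5 * A, A), pesi_not(u_high, 0.5 * A, A))
```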
Optical Amplifier/Saturable Absorber as Phase Eraser
An optical amplifying medium, e.g., a semiconductor optical amplifier (SOA) pumped just above transparency, may be used to realize the PESI functionality (PE with inverse thresholding), taking advantage of the cross-gain modulation (XGM) nonlinear effect. A detailed analysis will be carried out in Section 4, but to briefly introduce the concept, a two-level input U and a constant probe beam are passed through a gain block. If the input is at the LOW level, the probe is amplified and generates a logical HIGH. If the input is at the HIGH level, the gain saturates, and the probe is attenuated and generates a logical LOW. Notice that the population inversion is insensitive to the phase of the pump U but responds just to its power, |U|²; hence a PE characteristic is attained. The exponential gain and absorption attained in the two respective cases serve to separate the output logic levels, yielding the switching PESI characteristic. The PESI device operates as the terminating stage in the gate shown in Fig. 2 or in Fig. 1(b). A similar scheme based on a saturable absorber is also possible.
B. Structures Realizing Various Gate Types

Y-Junction Combiner + Thresholder Makes a NAND Gate
We now show that if the PE is regenerative (i.e., we have a PESI thresholder at our disposal), then the overall gate structure may be simplified: the linear stage may be reduced to a single addition, U ≡ X + Y (subtraction will also work), e.g., optically realized by a single directional coupler [Fig. 2]. Using the unipolar alphabet, X, Y ∈ {0, A}, the possible values of U in response to the FF, FT, TF, TT input combinations are U ∈ {0, A, A, 2A}. Set the threshold at 1.5A; i.e., generate zero when U > 1.5A, and generate A otherwise. Evidently zero output is obtained only when U = 2A, i.e., in the TT case, whereas an A output is generated for any of the FF, FT, TF input combinations. Hence, we have indeed realized an all-optical NAND gate.
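The NAND construction above can be tabulated in a few lines. This is a sketch with an ideal inverse thresholder and the unipolar alphabet {0, A}; the function names are hypothetical.

```python
A = 1.0  # HIGH amplitude of the unipolar alphabet {0, A}

def nand_gate(x, y, threshold=1.5 * A, high=A):
    """Single adder followed by an ideal inverse thresholder."""
    u = x + y                                  # linear stage: U = X + Y
    return 0.0 if abs(u) > threshold else high

def nand_truth_table():
    """Rows of (x_bit, y_bit, out_bit) under positive logic polarity."""
    rows = []
    for x_bit in (0, 1):
        for y_bit in (0, 1):
            out = nand_gate(x_bit * A, y_bit * A)
            rows.append((x_bit, y_bit, int(out == A)))
    return rows

for row in nand_truth_table():
    print(row)
```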
Pair of Directional Couplers + Phase Erasure |·| Makes a NOR Gate
Assume that a PESI device with sharp transition is not available, yet we have at our disposal a nonregenerative PE, e.g., |·| or |·|², which may be simpler to realize than the thresholder. Then we may still realize multiple AO logic gates provided that we precede the nonlinear PE with a pair of DCs rather than just a single combiner, generating the adder-subtractor LC described by Eq. (2) or Eq. (4) as shown in Fig. 1(b). Here we exemplify just a NOR gate with such structure, again using the unipolar alphabet. As X, Y ∈ {0, A}, the possible values of the first coupler output Σ ≡ X + Y in response to FF, FT, TF, TT are Σ ∈ {0, A, A, 2A} as above. Now subtract a reference R = 1.5A by means of the second coupler, U ≡ Σ − R, yielding three possible levels, U ∈ {−1.5A, −0.5A, −0.5A, 0.5A}. After taking the absolute value, the phase (sign) is erased, yielding just two levels, |U| ∈ {1.5A, 0.5A, 0.5A, 0.5A}; i.e., we obtain 1.5A for the input pair FF and 0.5A for FT, TF, TT. We have thus realized a NOR gate under the output alphabet assignment F_out ↔ 0.5A, T_out ↔ 1.5A. To restore the output alphabet to a unipolar one, {0, A}, we may simply subtract 0.5A from the output alphabet {0.5A, 1.5A}. Actually, this step may be saved by absorbing it within the reference subtraction occurring in the following gate driven by the current gate output.
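The closing remark about absorbing the 0.5A offset in the next gate's reference can be made concrete. The sketch below assumes that both inputs of the following NOR gate are driven by gates carrying the same offset alphabet {0.5A, 1.5A}, so that the sum is raised by 2 × 0.5A and the reference is raised by the same amount; this particular bookkeeping and the helper names are illustrative assumptions.

```python
A = 1.0
BIAS = 0.5 * A   # offset carried by the nonregenerative NOR output alphabet

def nor_gate(x, y, input_bias=0.0):
    """|X + Y - R| with R = 1.5A, raised by the bias carried by both inputs."""
    r = 1.5 * A + 2 * input_bias
    return abs(x + y - r)

def to_bit(level):
    """Map the output alphabet {0.5A, 1.5A} to logical 0/1."""
    return int(level > A)

# First stage: unipolar inputs {0, A}, outputs in {0.5A, 1.5A}.
first_stage = {(p, q): nor_gate(p * A, q * A) for p in (0, 1) for q in (0, 1)}

# Second stage: fed directly by two first-stage outputs, with no separate
# bias-removal step; only the reference has been shifted.
for a_level in (0.5 * A, 1.5 * A):
    for b_level in (0.5 * A, 1.5 * A):
        out = nor_gate(a_level, b_level, input_bias=BIAS)
        print(to_bit(a_level), to_bit(b_level), "->", to_bit(out))
```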
We note that either NAND or NOR is a universal gate; i.e., all other gates may be generated from either of these gates. However, the universal construction is not necessary with our proposed AO gate technology (nor is it efficient), as we show below that all gate types may be independently realized with the same structure, which is far more efficient than deriving gates from each other.
Alternative Gate Structures
There are multiple scenarios for which to consider combinations: (i) Unipolar or bipolar alphabet (or, most generally, complex-valued logic alphabet as treated in Section 3); (ii) either a Y-junction (or single DC) with a thresholder or a pair of DCs with nonthresholding PE; (iii) various gate types realizing each of the six truth tables (N)AND, (N)OR, X(N)OR.
Gates Reconfigurability
Remarkably, we show that under our optical logic family architecture, for an appropriate fixed logic polarity, we may readily switch the gate type within a subset of three out of the six gate types (N)AND, (N)OR, X(N)OR simply by changing the reference light signal, R [either one of (N)AND or one of (N)OR or one of X(N)OR, e.g., turn an AND into an OR or into a XOR]. For a given logic alphabet and logic polarity, the setting of the reference R will be seen to select the truth table, i.e., to set one of the three gate types. Beneficially, a common gate structure [consisting either of two DCs terminated in a simple PE or a Y-junction (or DC) terminated in a PESI] then suffices to implement any three out of the six types of gates at once. Such uniform gate structure may simplify the realization of gate arrays with high counts. In contrast, if hypothetically we were able to realize just a single type of universal gate, say, a NAND, then each of the other types of gates [(N)OR, X(N)OR] could still be obtained by multiple interconnected copies of the universal NAND. However, such "universal" construction would take far more "real estate" on the all-optical circuit than under our novel "reconfigurable" construction, whereby a single structure realizes multiple gate types simply by retuning the optical reference value R.
Beyond uniformity and efficiency of construction, an ultimate utilization of the gates' reconfigurability feature would evidently lead to the concept of an all-optical FPGA, a fully reconfigurable optical circuit. However, this would also require reconfiguring the interconnects between gates [which in turn might be realized by means of more logic gates aggregated as (de)multiplexers].
Principle of Operation Scenarios
In the remainder of this section we proceed to describe the principle of operation of the various gate types over the unipolar and bipolar alphabets. As an inverter may be realized either passively (by changing the logic polarity convention at a port) or actively as a physical NOT device, we do not have to cover all six gate types, (N)AND, (N)OR, X(N)OR, but rather just three representatives will do (one of the first, second, and third pairs). For simplicity, when considering a nonthresholding PE (in conjunction with the two-DC-based LC), we use the simplest PE model |·| (the further application of a real-valued function M would simply modify the real-valued output alphabet). We shall also consider PE with thresholders (PESI devices), which allows reduction of the LC to a Y-junction or single DC.
We find it convenient to use the algebraic notation A + B = {x + y | x ∈ A, y ∈ B} for the sum of two sets, e.g., {a, b} + {c, d} = {a + c, a + d, b + c, b + d}, which lists the possible outputs of an adder in the linear stage when the inputs are cycled to various combinations of logic values. We also define the sum of a set and a constant as A + c ≡ A + {c} = {x + c | x ∈ A}. In the statements below, positive logic polarity is assumed for both inputs and outputs.
Unipolar NOR with Two Couplers + |·| Phase Erasure: already covered. We note that the reference for this gate type was R_H = 1.5A (with the label H not implying a high logic value but rather signifying that this is the highest of the three possible reference values reconfiguring the gate to one of the three types NOR, AND, XNOR).
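Unipolar AND with Two Couplers + |·| Phase Erasure: By setting the reference to R_L = 0.5A, the gate turns into an AND. Indeed,
$$\left|\{0, A\} + \{0, A\} - 0.5A\right| = \left|\{0, A, A, 2A\} - 0.5A\right| = \left|\{-0.5A, 0.5A, 0.5A, 1.5A\}\right| = \{0.5A, 0.5A, 0.5A, 1.5A\}.$$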
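Unipolar XNOR with Two Couplers + |·| Phase Erasure: By setting the reference to R_M = A, the gate becomes a XNOR (coincidence gate). Indeed,
$$\left|\{0, A\} + \{0, A\} - A\right| = \left|\{0, A, A, 2A\} - A\right| = \left|\{-A, 0, 0, A\}\right| = \{A, 0, 0, A\}.$$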
Evidently, the three NOR, AND, XNOR gates above may be respectively converted into OR, NAND, XOR by applying inverters on their outputs.
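The reconfiguration by reference value can be summarized programmatically. The sketch below enumerates the four input combinations for each of the three reference settings quoted above and reads off the truth table under positive output polarity (the larger magnitude taken as T); the dictionary and function names are illustrative.

```python
from itertools import product

A = 1.0
# Reference values quoted in the text for the unipolar alphabet {0, A}.
REFERENCES = {"NOR": 1.5 * A, "AND": 0.5 * A, "XNOR": 1.0 * A}

def gate_levels(r):
    """Output magnitudes |X + Y - R| for inputs FF, FT, TF, TT."""
    return [abs(x + y - r) for x, y in product([0.0, A], repeat=2)]

def truth_table(r):
    """Positive output polarity: the larger of the two levels is T."""
    levels = gate_levels(r)
    hi = max(levels)
    return tuple(int(v == hi) for v in levels)

for name, r in REFERENCES.items():
    print(name, gate_levels(r), truth_table(r))
```

Reading the same levels with negative output polarity yields the complementary OR, NAND, and XOR tables.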
Inverter (NOT):
As is well known in Boolean theory [14] , the negation (N) of logic functions may be actively realized by physically inserting an inverter device or virtually realized by switching the logic assignment from positive to negative polarity. The switching of the logic polarity convention does not always require inserting a physical inverter. Rather, it may be virtually attained with no extra hardware by switching logic polarities of both the output of a gate and the input of the gate following it, but if just one of the ends of the interconnection between the two gates is polarity switched, a physical NOT device is actually necessary. Evidently, a NOT may be realized from a NOR gate by "wiring the inputs together," but such a realization would be wasteful.
PESI as NOT:
In fact, the PESI inverse thresholder device described above does function as a NOT gate over the unipolar alphabet {0, A}, provided that its threshold is set lower than the HIGH input value A, such that 0 (falling under the threshold) is mapped to HIGH, whereas A (above the threshold) is mapped to LOW. As already mentioned, the physical implementation of the PESI-based NOT consists of a saturable gain or absorption medium (for a gain medium, a HIGH pump saturates the output, setting it LOW; for an absorption medium, a HIGH input enhances upward transitions out of the lower, more populated level, increasing the absorption; hence the output goes LOW).
Directional Coupler + |·| Phase Erasure as NOT: An alternative implementation of the NOT over the unipolar alphabet {0, A} is based on subtracting a bias A from the input by means of a DC, yielding {−A, 0}, then taking the absolute value (applying a PE), yielding {A, 0}. The gate derivations over the bipolar alphabet follow:
Clearly the output alphabet for each of these gates is no longer bipolar, but it may be restored to bipolar by subtracting or adding a bias, as described above. We next consider gate versions based on Y-junction (or single DC) + inverse thresholder (PESI).
Unipolar NAND with Y-Junction + Inverse Thresholder: already covered. The threshold was seen to be H = 1.5A in this case.
Unipolar NOR with Y-Junction + Inverse Thresholder: Obtained by setting the threshold at L = 0.5A, partitioning the four values U = X + Y = {0, A} + {0, A} = {0, A, A, 2A} into two sets on either side of the threshold: {0} to the left of the threshold, corresponding to the FF input (and yielding T_out, as the thresholder is of the inverting type), versus {A, A, 2A} to the right of the threshold, corresponding to inputs FT, TF, TT and yielding F_out.
Unipolar XNOR with Y-Junction + Inverse Thresholder: Not practically realizable in this configuration (but the XNOR may be realized by means of the two-DC configuration, as seen above).
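One way to see why the single-adder configuration reaches NAND and NOR but not XNOR is to sweep the threshold over the summed levels {0, A, A, 2A}. The sketch below assumes an ideal single-threshold inverting slicer; the symbol theta and the sweep grid are illustrative choices.

```python
A = 1.0
SUMS = [0.0, A, A, 2 * A]   # U = X + Y for inputs FF, FT, TF, TT

def inverse_threshold_table(theta):
    """Ideal inverting slicer: output T (1) when U is below the threshold."""
    return tuple(int(u < theta) for u in SUMS)

# Sweep the threshold from -0.25A to 2.5A in steps of 0.25A.
candidates = [k * 0.25 * A for k in range(-1, 11)]
reachable = {inverse_threshold_table(theta) for theta in candidates}

NOR, NAND, XNOR = (1, 0, 0, 0), (1, 1, 1, 0), (1, 0, 0, 1)
print("NOR reachable: ", NOR in reachable)
print("NAND reachable:", NAND in reachable)
print("XNOR reachable:", XNOR in reachable)
```

Because the middle level A lies between 0 and 2A, no single threshold can place FF and TT on one side and FT, TF on the other, which is why the two-coupler structure with a reference is needed for X(N)OR.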
C. Computer Simulations of Unipolar Reconfigurable Gates
In this subsection we numerically demonstrate the principle of operation of the novel reconfigurable gates over the unipolar alphabet. A full modal and time domain solver software package is used for beam-propagation method (BPM) and finite-difference time-domain (FDTD) simulations of the integrated-optical realizations of the gates' LC stage.
It was seen that gate implementations lacking PESI, i.e., using PEs without regeneration, require an adder-subtractor [Eq. (2)] type of LC, henceforth referred to as the standard LC. In fact, the more general linear combination [Eq. (1)] may also be used to generate logic functions, by rescaling the logic inputs and the reference accordingly.
For example, rather than using a pair of DCs to generate Eq. (2), a standard LC may alternatively be implemented by a pair of Y-junction combiners, interconnected as shown in Fig. 3(a), essentially generating a linear combination of the form
$$U = a\,(X + Y + \gamma R'), \tag{12}$$
whereby the inputs X, Y experience the same loss, by symmetry. Notice that the factor γ satisfies |γ| > 1, since the loss experienced by the X, Y inputs, which traverse two Y-junctions, exceeds that of the R′ reference, which passes through only a single Y-junction. Discarding the inconsequential loss factor a and setting R′ ≡ −R/γ, expression (12) is seen to be equivalent to our standard LC [Eq. (2)]. For example, an AND gate nominally requiring R = 0.5A would use a scaled reference R′ ≡ −A/(2γ) when implemented in the structure of Fig. 3(a). Moreover, γ is complex-valued, with its phase determined by the optical length differences between the signal and reference paths. When calibrating actual or simulated devices, it is difficult to evaluate the magnitude and phase of the factor γ. To mitigate the calibration issues, it is useful to introduce a symmetrical balanced structure as shown in Fig. 3(b). This version of the LC has two reference inputs,
$$U = a\,(X + Y + R_1' + R_2').$$
For the purpose of simple and reliable simulations, it is most advantageous to use this structure owing to its "self-calibration" property: all four inputs (the two logic inputs X, Y and the two references R_1′, R_2′) experience the same attenuation, a, as they traverse identical paths to the output, owing to the symmetry of the device. Hence the structure of Fig. 3(b) circumvents special calibration of unknown attenuation and phase factors. Discarding the inconsequential common scale factor, a, and selecting R_1′ + R_2′ = −R, the standard LC [Eq. (2)] is retrieved. In particular, it is convenient to set the two references [...]
The BPM simulations in these figures were run in somewhat arbitrary units: the waveguide core and the cladding have refractive indices 3.1 and 3.0, respectively, and the waveguide has a height of 1 µm and a width of 2 µm. Using these values, long signal paths (>100 µm) were implemented to ensure that the outputs of the Y-junction combiners stabilize before reaching the next junction. Nevertheless, the general structures of Fig. 3 are applicable to any physically realizable PIC setup, and the waveguide material and dimensions can be optimized to minimize the device losses and PIC footprint.
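The equivalence between the two Y-junction combiner structures and the standard LC can be checked numerically. The sketch below assumes the forms U = a(X + Y + γR′) for the single-reference structure and U = a(X + Y + R_1′ + R_2′) for the balanced one, as suggested by the description of a and γ above; the numerical values of a and γ and the equal split R_1′ = R_2′ = −R/2 are hypothetical choices for illustration.

```python
import cmath

def yjunction_lc(x, y, r_prime, a, gamma):
    """Single-reference Y-junction combiner: U = a * (X + Y + gamma * R')."""
    return a * (x + y + gamma * r_prime)

def balanced_lc(x, y, r1_prime, r2_prime, a):
    """Balanced structure: all four inputs share the same attenuation a."""
    return a * (x + y + r1_prime + r2_prime)

A = 1.0
R = 0.5 * A                      # nominal AND reference
a = 0.4 * cmath.exp(0.2j)        # hypothetical common loss and phase
gamma = 1.4 * cmath.exp(0.9j)    # hypothetical relative factor, |gamma| > 1

def max_error():
    """Largest deviation of either structure from X + Y - R after removing a."""
    worst = 0.0
    for x in (0.0, A):
        for y in (0.0, A):
            target = x + y - R
            u1 = yjunction_lc(x, y, -R / gamma, a, gamma) / a
            u2 = balanced_lc(x, y, -R / 2, -R / 2, a) / a
            worst = max(worst, abs(u1 - target), abs(u2 - target))
    return worst

print(max_error())
```

Note that the balanced structure needs no knowledge of γ, which is the self-calibration property described in the text.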
Alternative 2-D FDTD simulations were run for more compact (~20 µm × 30 µm), higher-contrast devices as shown in Fig. 7, using the following parameters: waveguide index = 1.5; cladding index = 1; waveguide width = 0.25 µm; wavelength = 1.55 µm; single-mode-waveguide input launch: Gaussian of 1/e width = 2 µm.
Peculiarly, all the simulated gate structures of Figs. 4-7 seem to act perfectly well as logic gates of appropriate types, as configured by the references, without actually incorporating PEs. Although no physical PE devices are present, PE functionality is actually implied. It is the graphic representation of optical intensity |·|² in the output waveguide, as generated by the software program "power monitor," that acts as an effective PE, properly terminating the gate in a nonlinear mapping (albeit a computer-generated nonlinearity). This is the reason why graphic observation of the output waveguide in the upper rows in each of Figs. 4-6 provides proper indication of the two-level outputs. Similarly, the lower rows in each of Figs. 4-6 display magnitude plots (absolute value of the complex amplitude), implementing a |·| PE transformation, also displaying two-level outputs. For the NOR and AND gates in Figs. 4 and 5, we obtain one H and three L outputs, shown as four pulses, three of which have the same height (corresponding to L), equal to one third of the magnitude of the H pulse. If the simulation program were to show the actual signed amplitude (rather than the magnitude), some of the pulses labeled L would appear inverted. In the amplitude domain there are actually three levels, which may be denoted H, L, −L. The sign inversion must be discarded by the PE in order to obtain two output levels H, L. While the "effective PEs" implied in the software-generated intensity or field magnitude enable numerical demonstrations of single-gate operation, the gates cannot possibly be cascaded "as is." Actual PEs must be physically inserted into the interconnects between gates in order to enable cascading, providing the essential function of discarding the signs of the gate outputs prior to feeding the next gate in line.
[Figure caption fragment: In the lower row, the H output magnitude is 3× as high as that of each L output, as measured to very high accuracy. Note that despite the graphic visualization of optical intensity giving a semblance of automatic PE above, a PE module would still be required in a complete gate to allow for logic cascading.]
In this section we introduced and simulated the Y-junction-combiner-based alternative structures of Fig. 3 for implementing the standard LC [Eq. (2)]. However, in the rest of the paper we shall revert to mostly considering the "pair-of-directional-couplers" (DC-pair) structure whenever referring to the implementation of a standard LC (although any of the structures of Fig. 3 could be substituted for the DC pair). Despite the Y-junction-based structures being simpler, a unique feature of the DC-pair structure is the availability of additional "dangling ports," which might be utilized in certain cases to feed additional logic gates as described in Subsection 5.A. In contrast, when Y-junction combiners are used, the power of the antisymmetric modes of the double waveguide structure feeding the Y-junction gets dissipated in the substrate.
GENERALIZATION TO COMPLEX LOGIC ALPHABETS
When using a gain medium as a PE, as the amplitude of the probe emerging out of the optical gain or saturable loss medium is modulated, its phase is also inevitably modified (e.g., the charge-carrier density modulation in a SOA affects not only the gain but also the refractive index; or, more generally, the real and imaginary parts of the susceptibility are related by the Kramers-Kronig relations: e.g., it is only right at the center of a Lorentzian gain curve that the phase shift is precisely zero). Hence we may say that the complex gain of the probe, and subsequently its output complex amplitude, is modulated by the intensity of the pump. By complex gain we mean that both the amplitude (or power) gain and the phase of the probe are affected by the intensity of the pump. It follows that the gain/loss medium actually realizes the PE function V = M(|U|) e^{jΦ(|U|)} rather than the simpler characteristic V = M(|U|). The PE device is seen to be insensitive to the phase of the input, but it generates a two-level phase at its output (again modulated solely by the amplitude of the input, not its phase). This is a generally unavoidable parasitic effect, amounting to modifying the output alphabet from a real-valued to a complex-valued one. A second, more mundane, reason why a complex alphabet may appear is uncontrolled optical-path-length accumulation; e.g., a real-valued bipolar alphabet {±E} may be converted into the (antipodal) complex alphabet {E e^{j(θ+π)}, E e^{jθ}}, with θ the accumulated propagation phase, merely by propagation along an optical waveguide. This establishes the motivation for considering operation with complex-valued alphabets.
Fortunately, the appearance of complex-valued alphabets does not invalidate the proposed all-optical scheme, which will still work provided that (i) we devise a means to map one complex alphabet into another desired alphabet or, alternatively, (ii) we manage to endow our gates with the ability to operate with arbitrary complex-valued input alphabets.
Considering option (i), mappings between alphabets may be effected by means of additional linear optics: a directional coupler to realize subtraction or addition of an appropriate complex bias value and/or quasi-static optical phase and attenuation control (e.g., microheating the waveguide interconnects between the gates to tune the optical phase or using a variable optical gain or attenuation). For example, the complex alphabet {E_L, E_H} (consisting of two possible values of the optical electric field, denoted by E) may be converted into a bipolar one by first subtracting off the mean value (E_L + E_H)/2 (by means of a directional coupler), which generates antipodal output values, {±(E_H − E_L)/2}, followed by phase-derotating the two antipodal outputs to render them real-valued. As another example, a bipolar output alphabet {±E} may be converted to a unipolar one, {0, E}, simply by adding the bias E and scaling by half.
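The two alphabet mappings just described amount to simple complex arithmetic, sketched below. The function names and the sample alphabet are hypothetical; the derotation angle is taken as the phase of (E_H − E_L)/2, which is one natural choice for rendering the antipodal pair real-valued.

```python
import cmath

def to_bipolar(e_low, e_high):
    """Map a complex alphabet {E_L, E_H} to a real bipolar pair (-B, +B)."""
    mean = (e_low + e_high) / 2              # bias removed by a coupler
    half_diff = (e_high - e_low) / 2         # antipodal values are +/- this
    rotation = cmath.exp(-1j * cmath.phase(half_diff))   # phase derotation
    return (e_low - mean) * rotation, (e_high - mean) * rotation

def bipolar_to_unipolar(value, e):
    """Map {-E, +E} to {0, E}: add the bias E, then scale by one half."""
    return (value + e) / 2

low, high = to_bipolar(0.3 + 0.4j, -0.8 + 1.1j)
print(low, high)                              # real-valued, equal and opposite
print(bipolar_to_unipolar(-2.0, 2.0), bipolar_to_unipolar(2.0, 2.0))
```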
Instead of mapping the complex alphabet at the output of the gate back to a fixed one, as per option (i), it may be preferable to resort to option (ii): adapt the next gate in line to operate with an arbitrary complex-valued input alphabet. This may be attained in our architecture simply by modifying the value of the reference $R$ of the following gate (which requires quasi-static phase and amplitude control). Let us prove that the transfer characteristic
$$V = |X + Y - R|,$$
consisting of an adder-subtractor (realized by a pair of DCs) followed by a PE (the absolute-value operation), may be used to realize any one of the three gate types NOR, AND, XNOR for any complex alphabet $\{E_L, E_H\}$ simply by setting the complex amplitude of the reference $R$ to suitable complex values to be determined next. Remarkably, when the proper reference values are used, despite there being four possible logic input combinations, the gate output assumes just two distinct levels.
To show this we start by decomposing the LC function $U = X + Y - R$ into an adder of the two inputs $X$, $Y$ followed by a subtractor of the reference:
$$\Sigma = X + Y, \qquad U = \Sigma - R.$$
When $X = Y = E_L$, i.e., both logic inputs assume the L value, then $\Sigma$ assumes the value $\Sigma_{LL} \equiv 2E_L$. Similarly, when $X = Y = E_H$, then $\Sigma$ assumes the value $\Sigma_{HH} \equiv 2E_H$. Now, either when $X = E_H$ and $Y = E_L$ or when $X = E_L$ and $Y = E_H$, i.e., whenever the two logic inputs are different, $\Sigma$ assumes a common value $\Sigma_{HL/LH} \equiv E_H + E_L$; i.e., one cannot distinguish between the HL and LH input cases. At this point the four input entries of the gate truth table have been reduced to three complex values at the $\Sigma$-adder output:
$$\{\Sigma_{LL},\ \Sigma_{HL/LH},\ \Sigma_{HH}\} = \{2E_L,\ E_L + E_H,\ 2E_H\}.$$
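It is evident both analytically and graphically [using vector addition by the parallelogram or triangle rule in Fig. 8(a) and Fig. 8(b)] that these three points are collinear in the complex plane, with $\Sigma_{HL/LH}$ at the midpoint of the segment joining $\Sigma_{LL}$ and $\Sigma_{HH}$. Placing the reference $R$ at the midpoint of the appropriate segment (or at $\Sigma_{HL/LH}$ itself) then leaves just two distinct values of $|\Sigma - R|$ over the three points.
Antipodal Binary Alphabets
The gates' operation over real-valued unipolar or bipolar alphabets may be viewed as a special case of the general complex-valued phasor construction of Fig. 8. In this case the three collinear points $\{\Sigma_{LL}, \Sigma_{HL/LH}, \Sigma_{HH}\}$ align along the real axis, and the rule of having the reference $R$ at the midpoint of the appropriate segment still applies.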
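The midpoint rule for the reference can be illustrated with a short numerical sketch of the characteristic $|X + Y - R|$ for an arbitrary complex alphabet. The alphabet values and the names `gate_levels`, `r_low`, `r_high`, and `r_mid` are hypothetical choices made for this illustration only.

```python
def gate_levels(e_low, e_high, reference):
    """Return |X + Y - R| for the four input pairs LL, LH, HL, HH."""
    pairs = [(e_low, e_low), (e_low, e_high),
             (e_high, e_low), (e_high, e_high)]
    return [abs(x + y - reference) for x, y in pairs]

# Hypothetical complex alphabet (arbitrary field units)
E_L = 0.2 + 0.1j
E_H = 0.9 + 0.7j

# The three collinear adder outputs
sigma_ll, sigma_mid, sigma_hh = 2 * E_L, E_L + E_H, 2 * E_H

# Candidate reference settings (illustrative names)
r_low = (sigma_ll + sigma_mid) / 2    # midpoint of the LL .. HL/LH segment
r_high = (sigma_mid + sigma_hh) / 2   # midpoint of the HL/LH .. HH segment
r_mid = sigma_mid                     # the middle point itself

for name, ref in [("r_low", r_low), ("r_high", r_high), ("r_mid", r_mid)]:
    print(name, [round(v, 6) for v in gate_levels(E_L, E_H, ref)])
```

With $d = |E_H - E_L|$, the three settings give the level patterns $(d/2, d/2, d/2, 3d/2)$, $(3d/2, d/2, d/2, d/2)$, and $(d, 0, 0, d)$ for the inputs LL, LH, HL, HH. Reading the larger magnitude as logic H, these are the AND, NOR, and XNOR truth tables, each with only two output levels.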
For a fixed average power constraint of the logic inputs, $(P_L + P_H)/2 \le \langle P_{In} \rangle$, the optimal selection of input alphabet is the antipodal one, i.e., $|E_L| = |E_H|$ and $\angle E_L = \angle E_H + \pi$, as then the center of gravity of the two-point constellation has been brought to the origin, the distance between the two logic states is maximal, and best noise discrimination is attained. An equivalent argument is made in communication theory, where it is shown that the antipodal constellation leads to lowest error probability under an average power constraint. Without loss of generality, we may then select $\angle E_H = 0$, so that $\angle E_L = \pi$; i.e., both phasors $E_L$, $E_H$ are real-valued with $E_L < 0 < E_H$, retrieving the real-valued bipolar constellation.
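The optimality of the antipodal alphabet follows from the parallelogram identity. The short derivation below is a sketch under the assumption that powers are normalized so that $P = |E|^2$ (a normalization not stated in the text).

```latex
% Assumption: normalized powers, P_L = |E_L|^2 and P_H = |E_H|^2.
\begin{align*}
|E_H - E_L|^2 + |E_H + E_L|^2
  &= 2\left(|E_H|^2 + |E_L|^2\right) = 2\,(P_L + P_H)
   \le 4\,\langle P_{In} \rangle, \\
|E_H - E_L|^2
  &\le 4\,\langle P_{In} \rangle - |E_H + E_L|^2
   \le 4\,\langle P_{In} \rangle .
\end{align*}
```

Equality holds when the power constraint is met with equality and $E_H + E_L = 0$, i.e., for the antipodal alphabet, whose separation is then $|E_H - E_L| = 2\sqrt{\langle P_{In} \rangle}$.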
PHOTONIC CIRCUIT REALIZATIONS
In this section we elaborate on the photonic realizations of the LC and the PESI modules, which were briefly introduced in the last section, in particular considering the integration of multiple gates into photonic logic circuits, including imperfections and impairment-mitigation approaches.
A. Integrated Photonic Realizations of the Linear Combiner Front End
The linear front end is naturally amenable to a linear-optics implementation based on lightwave interference (LI), facilitating the photonic realization of the LC mathematical function (2). Our preferred implementations are based on photonic integrated circuit (PIC) platforms implementing multiple gates, each comprising either a DC-pair structure [Fig. 1(b)] or, more simply, a Y-junction or single DC (Fig. 2) (which in turn requires a higher-performance PE, the PESI), as briefly introduced in Section 2. We note that nowadays integrated optical directional couplers are manufacturable with high yield and excellent reproducibility, as was recently demonstrated in [15].
The sign reversal on the R-port in Eq. (2) may be obtained by taking the output of the second DC at its $\Delta$-port. Alternatively, the output may be taken at the $\Sigma$-port, with the optical length of the waveguide feeding the R signal tuned to impart an extra $\pi$ phase shift. In fact, all the optical interconnects between the ports must be maintained at or tuned to particular optical lengths with sub-wavelength precision, as is attainable in integrated optics, and all optical signals should be set to the desired magnitudes. This may necessitate temperature control to stabilize the PIC or quasi-static phase shifters and possibly also amplitude gain or loss control, realizable by various integrated-optical techniques, e.g., thermo-optic or electro-optic bias tuning. Such calibration & tuning (C&T) measures are further discussed in this section.
As an alternative to the DC-pair, we may use any symmetrically structured optical 3-port (O3P) device to perform the LC function, as detailed in the appendix. In particular, multimode interference (MMI) waveguide devices, amenable to photonic integration [16, 17], are good candidates to perform the LC function more compactly. It is remarkable that such simple linear optical structures as the Y-junction or DC pairs and the MMI-based O3P may essentially operate as reconfigurable optical gates (up to the requirement to PE/threshold their output).
B. Gate Impairments, Calibration & Control, Cascadability, and Fan-Out of a Few Gates
So far we have considered ideal gates, in the sense that the complex alphabets of the X and Y inputs were assumed identical, and the LC performed an ideal addition-subtraction [Eq. (2)]. In practice, the LC photonic circuit would generate $U = aX + bY - cR$ with the taps $a$, $b$, $c$ slightly different from unity. Moreover, the logic alphabets of the X and Y inputs may be slightly different, as generated by either the logic source or the previous gates feeding the current gate.
Mismatched Logic Alphabets and LC Tap Deviations
A mathematical analysis of the two impairments, mismatched logic alphabets and LC tap deviations, may be carried out similarly to that worked out in Section 3 for gate operation with complex alphabets, but it is omitted due to lack of space. The result is that in the wake of such impairments the gate output is no longer binary; rather, multiple (> 2) levels may appear at the output, with either of the ideal levels splitting up into multiple (2 or 3) sublevels. To the extent that the input binary alphabet imbalances and the LC tap deviations are small, and for suitable selection of the reference R, the new sublevels into which each ideal level splits remain close together and well separated from the (possibly split) levels associated with the complementary ideal level. We now consider design measures to enable or improve the cascadability of the gates despite the impairments. In light of the compounding of uncertainties upon cascading multiple gates, it is essential to compress the sizes of the H and L logic supports at the output of each gate and to increase their separation. This objective may be attained by two means:
(i) Introducing a C&T procedure in the linear module of each gate. The optical logic circuit may then operate reasonably well despite the imperfections provided that the gate count is not too large.
(ii) Endowing the nonlinear PE module with a regenerative characteristic (thresholding). Nominally any PE characteristic $V = e^{j\Phi(|U|)} M(|U|)$ may be used to terminate the LC, completing the linear module to a fully functioning logic gate. However, further imposing a regenerative characteristic as an additional constraint on the shape of the function $M(|U|)$ substantially improves cascadability. Using a high-quality PESI is then the best way to mitigate the splitting-of-levels impairment.
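The splitting of levels can be made concrete with a small numerical sketch of the impaired combiner $U = aX + bY - cR$. All numbers here (tap values, alphabets, reference) and the function name `lc_output_levels` are hypothetical and chosen only to illustrate the effect for a unipolar alphabet.

```python
from itertools import product

def lc_output_levels(alphabet_x, alphabet_y, reference, a=1.0, b=1.0, c=1.0):
    """Return |a*X + b*Y - c*R| for the input pairs LL, LH, HL, HH."""
    return [abs(a * x + b * y - c * reference)
            for x, y in product(alphabet_x, alphabet_y)]

# Ideal case: identical unipolar alphabets, unit taps, reference at the
# midpoint between the LL sum (0) and the HL/LH sum (1).
ideal = lc_output_levels((0.0, 1.0), (0.0, 1.0), reference=0.5)

# Impaired case (hypothetical deviations of a few percent): mismatched
# Y alphabet and non-unity taps.
impaired = lc_output_levels((0.0, 1.0), (0.0, 0.95), reference=0.5,
                            a=1.03, b=0.97, c=1.02)

print("ideal   :", [round(v, 4) for v in ideal])
print("impaired:", [round(v, 4) for v in impaired])
```

In the ideal case the LL, LH, and HL entries share one level and HH takes the other. With the assumed deviations the shared level splits into three nearby sublevels, which nevertheless remain well separated from the HH level, so a thresholding back end can still merge them.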
C&T Procedure
The idea is to optimally tune the amplitudes and phases at the input ports in the presence of fabrication imperfections, imperfect settings of the I/O signals, etc. The gates must then be endowed with phase and amplitude tunability on each of their inputs and possibly also on the outputs. This enables calibration at the level of a single gate: the signals in the LC stage are monitored, and active closed-loop control tunes the amplitudes and phases of the logic inputs X, Y and of the reference R, optimizing these values so that the sublevels within each logic support (i.e., those that should be assigned a common logic value) bunch together as closely as possible. For closed-loop control, taps must be provided on some of the I/O ports of the linear combiner to measure amplitudes by means of monitoring photodiodes. Interferometric procedures should be devised to tune the complex gains along each path, as well as the complex amplitude of the reference signal R, to the desired operating points for optimal performance of each individual gate.
In addition to improving the performance of individual gates, the C&T capability would endow our gates with some resilience to repeated cascading or fan-out, at least for short chains. How long a chain (or fan-out tree) of multiple gates is feasible when using C&T is to be determined by an error-propagation analysis, working out the error accumulation over a chain of gates. Such error analysis is outside the scope of this paper, which is concerned with the essential principles of operation; however, the C&T procedures will be further exemplified in Subsection 4.E below.
Extra resilience to fabrication imperfections and parameter variations is best attained by providing a combination of C&T in the linear stage and level restoration in the nonlinear stage (in addition to its PE function), designing its amplitude transfer characteristic to exhibit a PESI response approximating that of an ideal limiter.
C. Survey of Potential Physical Approaches to Phase-Erasure Photonic Realization
As for the nonlinear physics at our disposal upon approaching the task of designing an efficient regenerative PE, let us briefly summarize some of the intense research activity in AO processing, which has surfaced multiple nonlinear materials and platforms. There has been significant activity in semiconductor optical amplifiers (SOA) and in electroabsorption modulators (EAMs). Recent approaches exploit ultra-high-speed carrier dynamics to improve the nonlinear response speed [18] [19] [20] [21] [22] . There has been some progress at the device level using various approaches, most of which are based on one of the mechanisms of cross-gain modulation (XGM) [23] [24] [25] [26] [27] , cross-phase modulation (XPM) [28, 29] , four-wave mixing (FWM) [30, 31] and cross-polarization modulation (XPolM) [32, 33] . Some of these mechanisms inevitably limit the operating speed of such devices owing to the carrier recovery time of SOA. AO processing based on SOA nonlinearities typically involves manipulation of multiple wavelengths; however, this complication is relieved in our approach, which supports single-wavelength operation (although it is also compatible with multiple-wavelength operation, if so desired).
A second class of nonlinearities that may be candidates for regenerative PE realizations involves parametric, nonresonant optical processes, which have the advantage of being nearly instantaneous, relying on virtual electron or hole transitions rather than modifying the real carrier densities. In particular, parametric nonlinear processes could be utilized, such as degenerate FWM, in which two optical signals $U$, $W$ interact nonlinearly, generating the mixing product $V = \chi^{(3)} W U U^* = \chi^{(3)} W |U|^2$, which is evidently insensitive to the phase of $U$, being a function of the absolute value of the input $U$. Third-order $\chi^{(3)}$ nonlinearities, as reviewed in [34], may be further classified as phase-matched and non-phase-matched. Non-phase-matched processes include cross- and self-phase modulation (XPM, SPM) based on the Kerr effect, two-photon absorption (TPA), and Raman gain. Such processes have been exploited to demonstrate a wide range of AO functions such as optical logic [35, 36], optical performance monitoring [37, 38], 2R and 3R optical regeneration [29, 33, 39, 40], wavelength conversion [27, 41, 42], optical buffering and delay [25], and demultiplexing [43]. As for highly nonlinear fiber (HNLF)-based devices [34, 35, 36], their bulky size and poor power efficiency hinder their practicality; however, nonlinear waveguide-based structures bear the promise of drastically reducing both the footprint size and the requisite power levels, potentially enabling LSI PIC realizations.
Both resonant and nonresonant (e.g., $\chi^{(3)}$) processes are in principle candidates for realizing the PE functionality requisite in the implementation of our novel gate architecture. At least with existing approaches, the typical trade-off between resonant and nonresonant nonlinearities is that the speed of nonresonant Kerr-effect-based processes is higher, but so are the required optical powers and interaction lengths. The potential problem with parametric nonlinear processes is the requirement for high optical powers and long phase-matched interaction regions; hence nonlinear fiber-based devices tend to be very bulky and not amenable to large-scale integration. However, waveguide-based devices may still be candidates for realizing the PE function using parametric nonlinear processes, especially those that do not depend on phase matching, such as SPM and XPM. Nevertheless, our preferred PE/PESI realizations are based on resonant gain saturation, or saturable absorption, mechanisms, as analyzed next.
D. PE and PESI Devices Based on Gain Saturation/ Saturable Absorption
In Section 2 we proposed to exploit gain saturation of pumped optical gain or loss media for the nonlinear section of our gates, realizing either a nonregenerative PE characteristic $M(|U|)$ or preferably an inverted limiter-like phase-insensitive PESI characteristic, well approximating the function $M_{NOT}(|U|)$ with $M_{NOT}$ given by Eq. (6). Under this approach, the PE input is used as a pump to saturate the gain or loss of a probe signal propagating through the active medium.
Control/Probe Orthogonal Degrees of Freedom
The proposed configuration passes two beams through an optically pumped amplifying medium: the input into the device, called here the control signal, and a second probe beam, separated from the control signal by some optical degree of freedom (DOF), optical DOFs meaning angle, polarization, propagation mode, or wavelength. Unlike most SOA-based logic designs in the literature, our approach has both the probe and the control beams operating at the same wavelength; hence we rely on any one of the first three types of DOFs for orthogonal separation of the control and the probe. Single-wavelength operation of the optical logic circuit makes it more amenable to photonic integration.
In detail, the probe signal spectrally coincides with the control signal but is separated from it angularly, modally, or in polarization. "Angularly separated" means traveling at a different range of angles. "Modally separated" means that the two signals propagate as two different modes of a multimode guiding structure (e.g., the fundamental and the first-order mode). "Polarization separated" means that both signals are coherent and propagate collinearly but are launched in orthogonal polarizations, e.g., TE versus TM, by means of a polarization beam splitter (PBS) and are also separated at the output by a PBS. The advantage of polarization separation is that it avoids the spatial hole burning in the active medium that arises with angularly separated, mutually coherent beams.
XGM-Based Phase Erasure
The probe beam is amplified by the available gain in the medium, which is set by the input control beam via the gain-saturation effect. The propagated probe signal is taken as the output of the nonlinear PE device. The principle of operation is succinctly described as XGM between the pump (control) and the probe beam: a stronger pump signal "saturates" the gain seen by the probe by reducing the amount of population inversion, which is determined solely by the intensity of the pump (control) optical signal, while it is insensitive to its phase (hence we have PE). The gain-saturation effect is simply modeled as a reduction of the differential gain, $g$, with increasing intensity according to the well-known formula [44]
$$g(I) \equiv \frac{g_0}{1 + I/I_{sat}}, \qquad (14)$$
where $g_0$ is the small-signal differential gain and $I_{sat}$ is the intensity level reducing the differential gain to half its small-signal value.
Higher control-signal levels correspond to lower (saturated) population inversion, hence lower gain for the probe, whereas lower control-signal levels correspond to unsaturated, hence higher, population inversion levels, yielding more gain and thus a higher output level for the probe signal (it is assumed that the saturated gains corresponding to both the H and the L signals exceed the loss coefficient of the system such that in both cases there is net gain of the probe, though at two different levels). It follows that the gain of the probe and subsequently its output level are inversely modulated by the intensity or amplitude of the control signal (insensitive to its phase). The intensity modulation of the control signal is transferred to the probe signal, realizing a characteristic $M(|U|)$ with the function $M$ monotonically decreasing. This is then the principle of operation of the optical gain (or saturable absorber) module, which functions as a PE (not necessarily regenerative). Such a PE device may be further converted into a PESI as described next.
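The inverse relation between control intensity and probe gain can be sketched with the standard saturation law $g(I) = g_0/(1 + I/I_{sat})$, which matches the form used for $g(z)$ in the PESI model. The parameter values and the function name `saturated_gain` are hypothetical, in normalized units.

```python
def saturated_gain(intensity, g0, i_sat):
    """Differential gain g(I) = g0 / (1 + I / I_sat)."""
    return g0 / (1.0 + intensity / i_sat)

G0 = 10.0      # small-signal differential gain (hypothetical value)
I_SAT = 1.0    # saturation intensity (normalized units)

control_levels = [0.0, 0.5, 1.0, 2.0, 4.0]
gains = [saturated_gain(i, G0, I_SAT) for i in control_levels]

for i, g in zip(control_levels, gains):
    print(f"control intensity {i:4.1f} -> probe differential gain {g:6.3f}")
```

The gain falls monotonically with control intensity and equals $g_0/2$ at $I = I_{sat}$, which is the behavior required of a monotonically decreasing characteristic $M$.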
XGM-Based PESI Model
The PESI module is the terminating block following the LC stage of the gate, which in turn generates the two intensity level ranges $[I_L^-, I_L^+]$ and $[I_H^-, I_H^+]$.
The objective is to approximate an ideal PESI $M_{NOT}(|U|)$ with the threshold $I_{Th}$ of the $M_{NOT}$ function (6) situated in between the two ranges, i.e., $I_L^+ < I_{Th} < I_H^-$. Evidently, such a device would overcome small impairments of the input alphabet and linear combining, which convert the L and H output levels into the $[I_L^-, I_L^+]$, $[I_H^-, I_H^+]$ extended logic supports at the LC output. The proposed PESI device essentially consists of a gain medium pumped by any convenient means (optical, electrical, etc.), precisely as described above for the nonregenerative PE device. What turns the PE into a PESI is selecting a pumping level such that the device achieves transparency (differential gain = differential loss) at a control input intensity level $I_{TTh}$, referred to here as the transparency threshold, which satisfies the particular condition $I_L^+ < I_{TTh} < I_H^-$. That is, the pumping level must be selected such that the transparency threshold is set between the L and H input power ranges of the preceding linear stage of the gate. In the special case of a unipolar input alphabet, the L level is zero, while the H level should exceed the transparency threshold. The input to the PESI device (the control) then exceeds (falls under) the threshold when the output of the preceding linear portion of the gate is H (L). If the control signal were hypothetically set right at the intermediate level $I_{TTh}$ (rather than falling within the valid ranges $[I_L^-, I_L^+]$, $[I_H^-, I_H^+]$), then the net gain seen by the probe would null out; i.e., the probe beam would propagate at constant power:
$$g(I_{TTh}) - \alpha_0 = \frac{g_0}{1 + I_{TTh}/I_{sat}} - \alpha_0 = 0.$$
Solving this equation for $I_{TTh}$ yields $I_{TTh} = I_{sat}(g_0/\alpha_0 - 1)$ for the control input level that would achieve transparency. At any control power level $I_H$ exceeding the threshold, $I_{TTh} < I_H$, the medium experiences gain saturation; i.e., it supplies a lower gain than that provided at the intensity level $I_{TTh}$, which barely sufficed to balance the loss [this follows since the saturation function (14) monotonically decreases in $I$, and we have $I_H > I_{TTh}$, hence $g(I_H) < g(I_{TTh}) = \alpha_0$]:
$$g_{net}(I_H) = g(I_H) - \alpha_0 < 0.$$
Therefore, at any HIGH control intensity $I_H$, a weak probe optical signal would see a net loss. Conversely, at any intensity level $I_L$ lower than the transparency threshold, the medium would supply net gain:
$$g_{net}(I_L) = g(I_L) - \alpha_0 > 0.$$
Assume for ease of exposition that the probe is launched with very low power (though this is not strictly necessary, nor desirable, as it may result in weak SNR). When there is net gain (i.e., in the case in which the control is set to $I_L$), assuming that the amplifying medium is sufficiently long, the probe signal intensity level $I_p(z)$ along the medium initially grows exponentially, and then gain saturation sets in; i.e., the gain $g(z)$ gets saturated (reduced) with growing intensity according to $g(z) = g_0 / [1 + (I_L + I_p(z))/I_{sat}]$. Now the net gain coefficient seen by the probe is the difference of the gain and loss coefficients:
$$g_{net}(z) = g(z) - \alpha_0 = \frac{g_0}{1 + [I_L + I_p(z)]/I_{sat}} - \alpha_0.$$
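We may then formulate the differential evolution step
$$\frac{dI_p(z)}{dz} = \left[\frac{g_0}{1 + [I_L + I_p(z)]/I_{sat}} - \alpha_0\right] I_p(z), \qquad (20)$$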
indicating that the probe intensity level first grows along the $+z$ propagation axis, albeit at a lower and lower spatial rate of increase, as the increase in intensity reduces the differential gain; the total intensity $I_L + I_p(z)$ keeps increasing up to the level $I_{TTh}$, where the net gain is saturated down to zero, from which point the total intensity is clamped at the level $I_{TTh}$ and the net gain remains zero; i.e., we have reached a steady-state saturated transparency level. The gain $g(z)$ is now saturated down to the level of the loss $\alpha_0$; i.e., the term in square brackets in Eq. (20) nulls out. Solving for $I_p(\infty)$ we have
$$\frac{g_0}{1 + [I_L + I_p(\infty)]/I_{sat}} = \alpha_0 \;\Rightarrow\; I_L + I_p(\infty) = I_{sat}\left(\frac{g_0}{\alpha_0} - 1\right) = I_{TTh}; \qquad (21)$$
hence the steady-state probe intensity (H output due to L input) is $I_p(\infty) = I_{TTh} - I_L$. This probe output level is achieved for LOW control inputs $I_L$ (i.e., lower than the transparency threshold) independent of the initial value with which the probe signal was launched (provided that the medium is sufficiently long, i.e., when $z > 3/|g_{net}(I_L)|$). Now assume that the control optical level is high, $I_H$ (i.e., it exceeds the transparency threshold); then the probe experiences net loss as explained above, decaying to zero regardless of the initial value with which it was launched, provided that the medium is sufficiently long. If the medium is not sufficiently long, the range of H input values, $[I_H^-, I_H^+]$, is mapped into a tight range of slightly positive output values, which range is still more compressed than the H input-logic support. For example, for a NOR gate the linear combiner outputs LH, HL, HH correspond to three distances that are ideally equal but owing to imperfections may have some small spread. Once they propagate through the limiter, the output values all tend to bunch together in the vicinity of zero, ideally tending to zero. This indicates that a sufficiently long PESI device tends to well approximate the ideal switching characteristic with breakpoint at the transparency threshold intensity, $I_{TTh}$:
$$I_p(\infty) = \begin{cases} I_{TTh} - I, & I < I_{TTh}, \\ 0, & I > I_{TTh}, \end{cases} \qquad (22)$$
where $I$ denotes the control intensity.
Notice that the PESI output is unipolar, $[0, I_p(\infty)]$, rather than bipolar. For a system based on bipolar logic, a final unipolar-to-bipolar mapping would be required at the PESI output in order to condition the signal to be suitable as input for the next gate. Hence, a third DC is to be inserted at the gain-medium output (in addition to the two DCs in the LC stage). Alternatively, this DC may be "deferred" to the next gate, wherein it may be combined with the second DC performing the reference subtraction. Using this approach we may retain at most two couplers per gate.
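The switching behavior derived in this subsection can be reproduced by integrating the probe evolution equation numerically. The sketch below makes several assumptions that are not in the text: the parameter values are arbitrary normalized numbers, the control intensity is treated as constant along $z$ (as in the expression for $g(z)$ above), and a fixed-step fourth-order Runge-Kutta integrator is used. The name `probe_output` is illustrative.

```python
def probe_output(i_control, i_probe_in, length, g0=10.0, alpha0=2.0,
                 i_sat=1.0, steps=20000):
    """Integrate dI_p/dz = [g0/(1 + (I_c + I_p)/I_sat) - alpha0] * I_p
    with fixed-step RK4 and return the probe intensity at z = length."""
    def rate(i_p):
        return (g0 / (1.0 + (i_control + i_p) / i_sat) - alpha0) * i_p

    dz = length / steps
    i_p = i_probe_in
    for _ in range(steps):
        k1 = rate(i_p)
        k2 = rate(i_p + 0.5 * dz * k1)
        k3 = rate(i_p + 0.5 * dz * k2)
        k4 = rate(i_p + dz * k3)
        i_p += dz * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return i_p

# Hypothetical normalized parameters (defaults of probe_output)
G0, ALPHA0, I_SAT = 10.0, 2.0, 1.0
I_TTH = I_SAT * (G0 / ALPHA0 - 1.0)     # transparency threshold

# LOW control (below threshold) with two different probe launch levels
low_out_weak = probe_output(i_control=1.0, i_probe_in=1e-3, length=20.0)
low_out_strong = probe_output(i_control=1.0, i_probe_in=0.5, length=20.0)
# HIGH control (above threshold)
high_out = probe_output(i_control=6.0, i_probe_in=1e-3, length=20.0)

print(I_TTH, low_out_weak, low_out_strong, high_out)
```

For a control level below the threshold, both launch levels converge to $I_{TTh} - I_L$, whereas for a control level above the threshold the probe decays toward zero, in line with the steady-state analysis.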
E. Reconfigurable Logic Gate: Detailed Structure with C&T Ports
In this subsection we outline preferred realizations of the reconfigurable gates (Figs. 9 and 10). We also briefly describe the C&T procedure and specify its associated measurement and control ports. The proposed realizations are based on unipolar logic, which naturally arises at the output of PESI devices, as their lower output level was seen to be zero in Section 3. It is convenient to take the PESI input alphabet also as unipolar (as a nonzero L input level would cause some degree of saturation, detracting from the gain, relative to the case in which the L input is zero). Figure 9 describes a particular three-gate design, utilizing all three unused outputs of the first and second DCs to generate three reconfigurable logic outputs in parallel. In the case where just one or two outputs are required (rather than all three), the corresponding output PESI devices may be discarded and the optical ports terminated. The reconfiguration of truth tables at the V1, V2, V3 ports is attained by selecting the reference $R$ to be one of the three respective values $\{R_L, R_H, R_M\}$. Notice the insertion of two additional PESI devices at the inputs (further to the PESIs following the DCs as mandated in the designs of Section 2). The role of these two extra PESIs is to provide input-level restoration, allowing the gate to operate with a variety of input levels X, Y. The two inputs are calibrated to have their binary alphabets coinciding by scaling them relative to each other by means of the (electrically controlled) pump inputs P1 and P2. The phase bias 1 preceding the Y-controlled PESI is intended to calibrate out the relative phases of the PESI outputs, rotating the phasor of the Y-controlled input into the first coupler to be collinear with that of the X-controlled input. Ideally the outputs of these two PESIs (which act as NOT gates, providing the two inputs to the first coupler) should both be zero for X = Y = H and be identical in magnitude and phase for X = Y = L.
To attain this desired situation at calibration/setup time (or periodically during service cycles), the taps T1, T2 are photodetected, and the previous gates feeding the X and Y signals are required to cycle through all four combinations LL, LH, HL, HH for X and Y. When X = Y = H we should ideally get zero outputs at both T1, T2. The pumps P1, P2 may be adjusted if this is not the case. When X = Y = L, the PESI inverters ideally generate two high-output values, which are subtracted at the $\Delta$-port. A nonzero output at T2 in this case is indicative of imbalance between the two H values of the inputs into the first coupler. The signal processing may also use the photodetected output of the T1 tap to provide useful information, possibly involving applying low-frequency dithering tones to the pumps and the phase tuner 1 and lock-in detecting these tones or their harmonics in the taps T1, T2.
Once the logic inputs to the first coupler are calibrated, they are used as a reliable basis to calibrate the reference R2 input into the second coupler, setting it to the particular values $\{R_L, R_H, R_M\}$ requisite for the unipolar scheme. Here $R_L$ is halfway in amplitude between the zero corresponding to LL and the LH/HL values (or the average of HL and LH in the wake of imperfections), $R_H$ is halfway in amplitude between the LH/HL and the HH values, and $R_M$ coincides with the LH/HL values (or the average of HL and LH in the wake of imperfections); in any one of the three cases the phasor R2 is collinear with the HH phasor. The calibration of the R2 reference is effected by changing its amplitude by means of the pump P3, which controls the gain of the input PESI to the second coupler (which actually acts not as a PESI but simply as a tunable gain amplifier), whereas the phase tuner 2 is used to set the phase of R2 (possibly making up for the phase shift incurred in the amplifier with pump P3). The actuation of P3 and the phase tuner 2 is effected by means of a control loop acting on the output taps T3 and T4. As before, the control algorithm possibly involves applying low-frequency dithering tones to the pump P3 and the phase tuner 2 and lock-in detecting these tones or their harmonics in the taps T3, T4. Moreover, it is again possible to cycle the inputs X, Y through their (already calibrated) input values to aid in the calibration of the second coupler. Actually, the calibration of the second coupler need not be conducted with high precision, as the PESIs at its output may take up the slack, slicing away small variations. However, it is not desirable to deviate excessively from the ideal values, as the dynamic range (noise immunity of the system) may be reduced. The other evident function of the output PESIs is to erase the phases at the outputs of the second coupler.
In fact, as already seen in Section 2, the scheme may work even without full PESIs, using instead plain PEs (i.e., with a nonideal switching characteristic) to erase the phases of the outputs of the second coupler. In particular, if the gate in question is the last output stage and conversion to an electrical output is desired, these PEs (the output PESIs in Fig. 9) may simply be replaced by photodetectors (which are evidently sensitive to the intensity but not to the phase of the incident optical signals, hence providing the PE function).
In general, an additional factor potentially limiting the number of gates to be cascaded, even in the case where all signal settings are ideal, is the amplified spontaneous emission (ASE) noise, which keeps accumulating through the gates. Notice that the passive couplers generate no noise; however, the ASE at the input into an ideal PESI (due to upstream gates) combines with that additively generated by the PESI itself; hence we get noise accumulation, though the mechanism is not simple linear addition of the noise variances as in an optical amplifier chain, since the noise is not riding on the probe signal but rather is superposed on the control signal, which nonlinearly acts by reducing the gain seen by the probe through the mechanism of gain saturation. The ASE may set an ultimate limit on the total number of cascadable gates; however, an analysis of ASE accumulation is outside the scope of the current paper.
Using the methods of Section 2 and further applying De Morgan's rules to account for the effect of the PESI inverters (logic NOT) applied to the parallel gates' inputs and outputs in Fig. 9, we conclude that the logic functions generated at the respective ports V1, V2, V3 are as follows:
OR, NAND, XOR for the setting $R_L$ of the reference;
NAND, OR, XOR for the setting $R_H$ of the reference;
XOR, XNOR, XOR for the setting $R_M$ of the reference.
If nearly ideal PESIs are provided at the output, then we may actually do away with the second coupler as previously explained, reducing the system to the less complex design of Fig. 10, albeit at the expense of a somewhat reduced dynamic range and of giving up the additional XNOR output (though XOR and XNOR would still be available for the $R2 = R_M$ setting, but not in parallel with the other AND/NOR functions).
We again note that in the case where just one or two logic outputs out of the three outputs V1, V2, V3 are required, the output PESI device(s) may be discarded and the corresponding coupler output optical port(s) optically terminated. We further mention that using PESI devices with sufficient optical amplification gain in principle allows fan-out (having one logic output drive two or more gates) by means of optical splitters attached to the optical outputs.
It is finally noted that in this proposed system the light signals rattling through the all-optical logic circuit are all at a common wavelength, e.g., as conveniently derived from a single-optical-source power supply. In terms of the requisite light coherence properties, we note that within each individual gate we require high coherence (fixed phase relationships among the various points, which is nevertheless readily achieved given the small dimensions of each gate). Conveniently, however, there is no requirement of mutual coherence between different gates, because the phase is erased at each gate, considerably easing the design constraints.
FUNDAMENTAL LIMITS OF ENERGY CONSUMPTION PER BIT
In this final section we strive to formulate fundamental lower bounds on the energy consumption per bit for the proposed logic devices. We separately consider the linear and the nonlinear sections of the gate. We mention that these are ultimate lower bounds of theoretical interest, unlikely ever to be achieved in practice, much like Landauer's kT ln 2 limit [13] .
A. Three-Way Linear Combiner: Minimum Energy Expenditure
The gate's linear stage should produce at least one photon of optical energy to be transferred to the nonlinear PE stage. We model the dissipation only in the first option proposed for the linear section. Consider the LC structure of Fig. 1(b), consisting of a pair of DCs in tandem. The light at the dangling ports of the first and second couplers is lost, detracting from the gate efficiency. (In Fig. 9 we managed to reuse the dangling ports, generating two extra logic functions; however, depending on the specifications of the overall logic circuit design, these additional logic functions might not be useful.) On average, half the light is lost in the first coupler and half of the remainder in the second coupler, so a two-DC design attains an I/O energy efficiency of 25%. This means that out of every four photons input into the LC, just one photon on average makes it to the output, while three are lost. In principle, the PE may ideally be run with a single input photon. To get this photon at the LC output (PE input), we would waste three photons on average in the LC; i.e., the minimum (average) energy expended in the linear section is $3h\nu$, where $h\nu$ is the energy of one photon.
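The photon bookkeeping above can be checked with a short sketch. It assumes, as the text does, that each directional coupler passes on average half of the incident light and that the stages act independently; the function name `photons_wasted_per_output` is illustrative and does not come from the paper.

```python
from fractions import Fraction

def photons_wasted_per_output(stage_efficiencies):
    """Average number of photons lost per photon delivered to the output.

    stage_efficiencies: average power transmission of each cascaded stage
    (assumed independent, so the overall efficiency is their product).
    """
    overall = Fraction(1)
    for eta in stage_efficiencies:
        overall *= Fraction(eta)
    photons_in = 1 / overall      # average input photons per output photon
    return overall, photons_in - 1

# Two directional couplers in tandem, each assumed to pass half the light.
efficiency, wasted = photons_wasted_per_output([Fraction(1, 2), Fraction(1, 2)])
print(efficiency, wasted)  # 1/4 3
```

The same helper applied to a single stage of efficiency 1/3 returns two wasted photons, the figure quoted for the O3P in the Appendix.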
B. Phase-Erasure Energy Expenditure
We claim that the PE process may be viewed as suppressing one of the two quadratures of the input optical channel. Indeed, the PE of the input $U$ may in principle be implemented by counterrotating $U$ by the phase angle $\Phi = -\angle U$ by means of an electro-optic modulator, yielding $U e^{j\Phi} = |U| e^{j\angle U} e^{-j\angle U} = |U|$. The phase modulation, i.e., multiplication by $e^{j\Phi}$, is lossless (unitary); however, the daemon "knowing" the angle and applying it to the electro-optic modulator is actually dissipative. One must perform a measurement of the angle of $U$, which requires expending some energy. In fact, as the phase modulation is lossless, the minimum amount of energy possibly expended in this measurement ultimately equals the minimum amount of energy entailed in the PE process. In other words, we claim that measuring the phase and erasing it are energetically equivalent. However, rather than seeking the minimum energy entailed in the phase measurement process, we focus on the particular implementation of the PE process whereby $U$ is complex-rotated to get aligned with the I-quadrature, yielding $|U|$. This means that $U$ has been subjected to a process where it lost its Q-quadrature component; i.e., we start with $U$ having both quadratures, and we end up with $|U|$ having a single quadrature. This is reminiscent of Landauer's original analysis of the energy wasted in an irreversible logic gate [13], which has two input ports but a single output port (here the quadratures are analogous to Landauer's gate input ports).
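The counter-rotation described above can be illustrated numerically. The following is a minimal sketch of the idealized mathematical operation only, not of any physical device; the function name `erase_phase` and the example amplitude are assumptions made for illustration.

```python
import cmath

def erase_phase(u):
    """Counter-rotate the complex amplitude u by -arg(u), leaving |u| on the I axis."""
    phi = -cmath.phase(u)            # rotation angle: minus the angle of u
    return u * cmath.exp(1j * phi)   # lossless (unit-modulus) phase modulation

u = 0.6 + 0.8j                       # arbitrary example amplitude, |u| = 1
v = erase_phase(u)
print(abs(v.real - abs(u)) < 1e-12, abs(v.imag) < 1e-12)  # True True
```

The output retains the magnitude of the input while its Q-quadrature (imaginary part) vanishes to within floating-point precision. Note that `erase_phase` must first evaluate the angle of `u`, which mirrors the measurement step that the text identifies as the dissipative part of the process.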
Adapting the thermodynamic argument [13] to the current setup, the number of DOFs or microstates (referred to here as multiplicity) is halved, as the microstates associated with the suppressed quadrature are eliminated and the two quadratures are symmetric; hence each has the same number of microstates. In the process, the input entropy, given by $S_{in} = k \ln(\mathrm{Multiplicity})$, is reduced to $S_{out} = k \ln(\mathrm{Multiplicity}/2)$, yielding the following entropy change for the phase eraser: $\Delta S \equiv S_{out} - S_{in} = k \ln(1/2) = -k \ln 2$. The environment then gains at least as much entropy, $\Delta S_{env} \ge -\Delta S_{eraser} = k \ln 2$ (such that overall the entropy does not decrease), and since $\Delta S_{env} = \Delta Q / T$, where $\Delta Q$ is the energy flowing from the eraser to the environment, it follows that $\Delta Q = T \Delta S_{env} \ge kT \ln 2$. We conclude that the minimum energy per application of the PE is $kT \ln 2$.
We have seen above that the minimum energy expended in the linear combiner section is $3h\nu$. Adding up the two contributions, the fundamental lower bound on the total energy expended per bit in each gate is $3h\nu + kT \ln 2$. We may be certain that any gate structure comprising a DC pair and a PE, no matter what its nature, will never expend less than $3h\nu + kT \ln 2$; however, this is a very loose lower bound, as practical realizations will invariably expend many orders of magnitude more energy per gate (as do their microelectronic counterparts, relative to the minute $kT \ln 2$ Landauer limit). In particular, the ideally assumed single-photon PE and detection is unrealistic in the presence of device losses, amplified spontaneous emission, and other noise sources. Most of the extra power dissipation would be related to "optical power supply" losses, i.e., the power dissipated in optically pumping the media [12, 45].
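To give a feel for the magnitudes involved, the bound can be evaluated numerically, reading the linear-section term as three photon energies. The sketch below assumes an operating wavelength of 1550 nm and an ambient temperature of 300 K; neither value is specified in the text, and the function name `gate_energy_bound` is illustrative.

```python
import math

H = 6.62607015e-34      # Planck constant, J*s (exact SI value)
K_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
C = 299_792_458.0       # speed of light in vacuum, m/s

def gate_energy_bound(wavelength_m, temperature_k, photons_lost=3):
    """Return (linear term, phase-erasure term, total) in joules."""
    photon_energy = H * C / wavelength_m              # h*nu for one photon
    linear = photons_lost * photon_energy             # photons wasted in the combiner
    erasure = K_B * temperature_k * math.log(2)       # Landauer-type k*T*ln 2 term
    return linear, erasure, linear + erasure

# Assumed operating point: 1550 nm light, 300 K environment.
linear, erasure, total = gate_energy_bound(1550e-9, 300.0)
print(f"{linear:.3e} J + {erasure:.3e} J = {total:.3e} J")
```

Under these assumptions the photon term is roughly 3.8e-19 J and the erasure term roughly 2.9e-21 J, so the bound is dominated by the linear section by about two orders of magnitude.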
CONCLUSIONS
Succinctly described, our main gate architecture consists of a linear-optics two- or three-wave linear combiner. In the case of the three-wave combiner, two of the waves are taken as logic inputs while the third wave is a reference determining the gate truth table, and the gate is terminated in a phase-insensitive, possibly regenerative, nonlinearity. Notice that existing AO logic schemes place the full onus of the logic implementation on the nonlinear part, typically experiencing tough trade-offs among performance, energy efficiency, and sheer size. In contrast, in our "divide-and-conquer" approach between the linear and the nonlinear parts of the gate, it is the linear part that efficiently performs the truth-table-dependent logic by means of lightwave interference up to a wrong phase, which must be erased by the nonlinear part (which has a fixed structure, independent of the truth table).
In principle, the implementation of this requirement should be facilitated by the decoupling of the nonlinear phase-erasure function from the linear-optics front, which enables separate implementation of the nonlinear section by a variety of optical nonlinear effects without bearing the burden of the logic-related interactions, which are all performed in the linear-optics preceding section.
While the linear section of the new reconfigurable gate is simple to implement, the remaining challenge is to develop the most effective implementation of the phase-erasure transfer characteristics with regeneration (logic-level restoration). Here we outlined the usage of gain saturation, or saturable absorption, as a potential phase-erasure regenerative mechanism, which may be preferred relative to bulkier parametric nonlinear interactions. However, we envision that a variety of other mechanisms may be, and probably will be, further proposed and investigated once this architecture is disseminated.
Another key aspect to investigate further is the photonic integration of the linear and nonlinear sections of each gate, and of multiple gates, onto a single PIC substrate.
We have seen that the fundamental (very loose) lower limit for the expended energy per gate is of the order of $3h\nu + kT \ln 2$. An analysis similar to that in [12, 45] should be performed to determine the much higher realistic lower bounds on the energy consumption for each of the proposed optical implementations, e.g., as related to the pumping of the PESI gain media.
An interesting architectural challenge, to be further investigated, is to make the optical linear stage ideally lossless by porting quantum computing concepts into the current classical optical computing setting, performing linear logic by means of unitary transformations without energy expenditure.
APPENDIX: OPTICAL THREE-PORT AS LINEAR COMBINER
An O3P is a $3 \times 3$ optical multiport, i.e., a device with three input and three output ports. For our application we terminate two of the output ports. The complex amplitude at the retained output port is then a linear combination of the complex amplitudes of the three inputs. By "symmetrically structured" we mean that the O3P has threefold rotational symmetry. Either a fused-fiber O3P, fabricated by twisting and fusing three single-mode fibers, or a mixed-rod device, where a thin platelet of glass mixes light from three input fibers and divides it among three output fibers, may be constructed with threefold symmetry. Let $X, Y, R$ be the O3P inputs and $U, U', U''$ the outputs; then an ideal lossless symmetrically structured O3P device is described by the following unitary transfer matrix:
(A1)
Notice that all the matrix elements have identical magnitude, a consequence of the threefold structural symmetry; however, their phases depend on $\Psi_i$, $\Phi_k$, which in turn are affected by the selection of reference planes along the input and output waveguides. As in the case of the cascade of two DCs, an O3P-based implementation also requires the ability to tweak the complex amplitudes on the I/O ports, in effect tuning the $u_i$ and $v_k$ parameters. In our application, we adjust these parameters to satisfy $v_1 u_1 = v_2 u_2 = -v_3 u_3$ such that $U = (X + Y - R)/\sqrt{3}$, and we terminate the $U'$, $U''$ outputs, thus realizing the desired LC function. The O3P may be realized as a multimode interference (MMI) waveguide fabricated as part of a planar photonic circuit [16, 17].
O3P Fundamental Energy Dissipation: Since the O3P-based combiner terminates two of its three output ports and we use structures with threefold rotational symmetry, the input-to-output energy efficiency is 1/3, even for an ideally lossless device. Hence, on average, to obtain one output photon three photons must be input, two of which are lost. The minimum (average) energy expended in the linear section using an O3P is therefore $2h\nu$, where $h\nu$ is the energy of one photon.
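The sign choice $v_1 u_1 = v_2 u_2 = -v_3 u_3$ and the 1/3 efficiency can be illustrated with a small numerical sketch. Because the transfer matrix (A1) is not reproduced here, the code uses the $3 \times 3$ discrete Fourier transform matrix as an assumed stand-in: it is unitary and all its elements have magnitude $1/\sqrt{3}$, but it is not claimed to be the matrix of the paper. The names `M`, `o3p_output`, and `is_unitary` are hypothetical.

```python
import cmath
import math

# Assumed stand-in for the symmetric O3P: the 3x3 discrete Fourier transform matrix.
W = cmath.exp(2j * math.pi / 3)
M = [[W ** (i * k) / math.sqrt(3) for k in range(3)] for i in range(3)]

def o3p_output(x, y, r, u=(1, 1, -1)):
    """Retained output U for inputs (X, Y, R) with input phase factors u_k."""
    inputs = (x, y, r)
    return sum(M[0][k] * u[k] * inputs[k] for k in range(3))

def is_unitary(m, tol=1e-12):
    """Check that m times its conjugate transpose equals the identity."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            dot = sum(m[i][k] * m[j][k].conjugate() for k in range(n))
            if abs(dot - (1 if i == j else 0)) > tol:
                return False
    return True

print(is_unitary(M))
print(o3p_output(1, 1, 1))   # (X + Y - R)/sqrt(3) with X = Y = R = 1
```

With the reference input sign-flipped, the retained port delivers $(X + Y - R)/\sqrt{3}$. Each input contributes a power fraction $|M_{0k}|^2 = 1/3$ to that port, consistent with the 1/3 efficiency stated above.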
