Abstract: Recently, wireless (or contactless) 3D integration has been proposed as a low-cost method of stacking disparate processing and sensor dies into single, small form-factor ICs. Such devices would be ideally suited to the Internet of Things (IoT), where maintaining low power consumption is of paramount importance. However, contactless inter-tier links use significant energy when forming a magnetic field that can penetrate multiple silicon dies, and hence are often criticised for their poor power efficiency when compared to wired alternatives such as through-silicon vias (TSVs). To address this, we present a novel, neuro-inspired inductive transceiver (for transmitting data between tiers of a 3D-IC) that maintains low power consumption by encoding frames of data in terms of the latency between pulses, thereby reducing the number of transmit pulses and the energy required per bit. The proposed approach is validated using commercial electromagnetic and electrical circuit simulators in 65nm CMOS technology. Results demonstrate an energy consumption of 0.79pJ/bit, representing a reduction of 31% when compared to existing state-of-the-art transceivers, or an increased communication distance of up to 1.8× for the same energy budget.
I. INTRODUCTION
THE Internet of Things (IoT) is causing a shift in the semiconductor industry towards low-power, technologically diverse integrated circuits (ICs) for building the next generation of heterogeneous sensor nodes and edge-compute devices [1]. To realise such technologically diverse ICs, designers are exploring 3D integration, where multiple tiers, each of which may be fabricated in a different process technology, are stacked and interconnected in a single integrated circuit. These ICs must have small form factors, but they must also be low-cost and easy to manufacture, with short design cycles. Existing mainstream approaches to 3D integration presently struggle to deliver on the latter requirements: 3D integration using Through Silicon Vias (TSVs) is often expensive to manufacture, slow to develop, and low yielding [2]. The same is true of monolithic 3D integration (M3D), whose immaturity means that yields are presently prohibitively low [3]. To address these challenges, more recent research has looked to the use of Inductive Coupling Links (ICLs) to provide low-cost, highly reliable vertical integration in such IoT systems [4]. Fig. 1 illustrates one such 3D-IC using ICLs to interconnect two layers. Here, data is encoded in a series of current pulses that are fed through planar inductors fabricated in the upper Back-End-Of-Line (BEOL) interconnect layers of the upper transmitting (Tx) die. These current pulses create a magnetic field which is intersected by a similar planar inductor fabricated in the lower receiving (Rx) die, and hence induce corresponding voltage fluctuations which can be detected (and used to recover the transmitted data stream), as shown.
Experimental data used in this paper can be found at DOI:10.5258/SOTON/D0949 (https://doi.org/10.5258/SOTON/D0949).
This approach has been demonstrated to work well in several prior works [5]-[9]; however, one of the most significant drawbacks of ICLs is their inferior energy efficiency [5]. In this paper we address this challenge, presenting a low-energy ICL transceiver that uses a novel time-domain approach to encode data. Prior implementations of ICLs use coding schemes where one data bit is mapped to one or more transmit (Tx) current pulses, resulting in significant energy consumption when implemented on-chip. The approach proposed in this paper uses the latency between pulses to encode frames of data, thereby reducing the number of Tx current pulses and the overall energy consumption.
The main novel contributions of this work can be summarised as:
• Design of a low-energy inductive transceiver that uses a novel neuro-inspired coding scheme to encode data in the time domain using the latency between sequential pulses.
• Mathematical modelling of the proposed transceiver design, evaluating optimal parameters for a range of 3D integration scenarios (Section III-B).
• A hardware implementation of the proposed transceiver consisting of digital encoding/decoding logic, analog driver and analog sense-amplifier circuits (Section IV).
• Experimental validation of the proposed transceiver using commercial full-wave modelling (for the EM channel) and circuit simulation tools in 65nm technology, demonstrating a 31% energy reduction per transmitted bit, when compared to existing state-of-the-art transceivers for wireless 3D integration (Section V).
II. BACKGROUND AND RELATED WORK
When considering a typical ICL architecture (cf. Fig. 1), the current consumption can be categorised into three main parts:
1) The analog transmit current (I_Tx) through the driver circuits and Tx inductor to generate the magnetic field.
2) The analog receive current (I_Rx) consumed by the Rx amplifier detecting the induced Rx voltage.
3) The current consumed by the supporting digital logic (I_SL), including the data encoding/decoding circuits.
Existing works in the domain of contactless 3D integration typically use the inductive non-return-to-zero (NRZ) signalling scheme, proposed in [9] by Miura et al., which is illustrated by the waveforms in Fig. 2(a). Here, each rising/falling data edge is encoded as a current pulse with corresponding positive/negative polarity. These current pulses form corresponding magnetic fields and hence voltage fluctuations in neighbouring stacked dies, allowing the data to be recovered. This is a robust solution that allows data to be simply encoded using a delay buffer, and decoded using just a sense amplifier (SA) and set-reset (SR) latch, as shown in Fig. 4(a) [5]-[9]. When using this scheme, by far the largest contribution is the analog transmit power consumption (over 80%) as, on average, one pulse is required per transmitted bit. Each I_Tx pulse is very expensive in terms of energy consumption [9], especially when communicating over large distances (such as multiple stacked dies), meaning that these transceivers are power hungry.
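The edge-to-pulse mapping described above can be sketched behaviourally. The snippet below is only an illustration under stated assumptions: the function names are hypothetical, the link is assumed to start in the logic-0 state, and pulses are idealised as the values +1, -1 and 0 rather than analog currents.

```python
def nrz_encode(bits, initial=0):
    """Map each data edge to a signed pulse: +1 rising, -1 falling, 0 no edge."""
    pulses = []
    previous = initial  # assumed idle state of the link before the first bit
    for bit in bits:
        pulses.append(bit - previous)  # 0->1 gives +1, 1->0 gives -1
        previous = bit
    return pulses


def nrz_decode(pulses, initial=0):
    """Recover the bit stream with a set-reset latch model."""
    state = initial
    bits = []
    for pulse in pulses:
        if pulse > 0:      # positive pulse sets the latch
            state = 1
        elif pulse < 0:    # negative pulse resets the latch
            state = 0
        bits.append(state)  # no pulse: the latch holds its value
    return bits


data = [0, 1, 1, 0, 1, 0, 0, 1]
pulses = nrz_encode(data)
recovered = nrz_decode(pulses)
```

In this idealised model the receiver must distinguish pulse polarity, which is the property the single-phase scheme proposed later avoids.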
Whilst the NRZ inductive line code is by far the most common encoding scheme in the context of chip-to-chip inductive communication, other works have explored alternative encoding schemes, such as pulse-interval encoding (and similar), illustrated in Fig. 2(b), which is used in the RFID standard. These schemes, however, are designed for inductively coupled channels that deliver power between tiers (e.g. CoDAPT [10]) and as such, in terms of power efficiency (which is essential when considering IoT devices), they perform very poorly.
III. INDUCTIVE TRANSCEIVER USING SPIKE-LATENCY ENCODING
To address the significant driver circuit power consumption in existing ICLs, this paper proposes a novel Spike-latency Encoding Transceiver (SET), outlined in the following sections.
A. Overview
As shown in Fig. 1, the inter-tier Inductive Coupling Link (ICL) consists of four main parts: the digital data encoding/decoding logic, the analog Tx driver circuits, the analog Rx amplifier, and the inductive channel itself, consisting of two coupled inductors. A significant amount of prior research exists exploring optimisation of inductor geometries for use in the inductive channel [11], and the sense amplifier and H-bridge driver circuits are well understood. Additionally, fine-grained amplitude or phase modulation is difficult to achieve in ICLs (due to the low amplitude of Rx voltage pulses). In this paper, therefore, we adopt on-off modulation, but focus on optimising the data encoding scheme whilst considering the energy trade-offs that exist in the system as a whole, to realise a low-energy transceiver design.
In contrast to the standard inductive NRZ scheme, we propose the use of spike-latency encoding to encode data frames in the time domain. Under the proposed scheme, values are not represented directly by current pulse patterns, but by the latency between the start of the frame and the transmit current pulse (also known as Pulse Position Modulation). Fig. 2(c) illustrates this concept. Here, N bits (in this case N=4) are translated into a decimal value which is represented by a single I_Tx pulse. The value of these bits is encoded in the latency with which the pulse is transmitted. For example, 'b1011 is denoted by a pulse sent and received when the Rx/Tx counter (COUNT) is at time value 11, and 'b0010 is denoted by a pulse sent and received when COUNT is at time value 2. This has the effect of reducing the number of high-energy I_Tx pulses required to send a given bit stream.
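A minimal behavioural sketch of this frame encoding follows. It assumes a plain linear COUNT that runs from 0 to 2^N - 1 in each frame and an ideal, noise-free channel; the function names are hypothetical and are not taken from the implementation described in this paper.

```python
def set_encode(bits):
    """Return the COUNT value at which the single Tx pulse is sent.

    `bits` is a list of N bits, most significant bit first.
    """
    slot = 0
    for bit in bits:
        slot = (slot << 1) | bit
    return slot


def set_frame(bits):
    """Return the pulse train for one frame: 2**N slots containing one pulse."""
    slot = set_encode(bits)
    return [1 if count == slot else 0 for count in range(2 ** len(bits))]


def set_decode(frame, n):
    """Recover N bits from the position of the pulse within the frame."""
    slot = frame.index(1)  # COUNT value at which the pulse was detected
    return [(slot >> shift) & 1 for shift in range(n - 1, -1, -1)]


frame = set_frame([1, 0, 1, 1])   # 'b1011
decoded = set_decode(frame, 4)
```

Each 4-bit frame carries exactly one pulse, at the cost of occupying 16 COUNT slots.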
Using the proposed SET scheme, as N increases, the number of pulses per transmitted bit decreases by a factor of N, providing a significant analog transmit energy saving. However, as N increases, the COUNT frequency (and hence the supporting digital logic energy required to maintain the existing data rate) increases proportionally to 2^N. To find the most energy-efficient implementation of SET, the parameter N must be carefully selected to best exploit the trade-off between I_Tx and I_SL by considering the transceiver as a whole. Section III-B provides mathematical modelling to explore this trade-off in more detail.
B. Modelling
The energy per bit of the ICL (E_pb) can be considered as the summation of three basic elements (I_Tx, I_Rx, and I_SL), as discussed above. For the proposed spike-latency encoding scheme, E_pb is therefore given by:
where V is the supply voltage and f is the link operating frequency. Here, the first term represents the transmit pulse current, which is proportional to 1/N (as more bits are encoded using a single pulse). The second term represents the current consumed by the sense amplifier; as N increases, the number of sense operations increases by 2^N and hence this term is proportional to 2^N. The final component represents the supporting logic. The number of clock edges in the supporting logic to maintain a given data rate will also increase by 2^N and hence this term is also proportional to 2^N. Additionally, the number of gates depends on N and so I_SL is also a function of N (see below). These three elements (I_Tx, I_Rx, and I_SL) can be approximated as follows. The transmit pulse current (I_Tx) can be modelled mathematically by a Gaussian pulse [9]:
where I_p is the peak amplitude of the current pulse required to ensure error-free pulse detection in the receiver and δ is the duration, determined by a delay element (cf. Fig. 4). Given a wireless channel with coupling coefficient k, using inductors with inductance L_Tx and L_Rx, the voltage pulse amplitude induced in the Rx coil is given by:
For transmission to be robust, V_Rx must be greater than the minimum receiver sensitivity threshold V_St (for example, the minimum voltage fluctuation that can be detected by the sense amplifier). I_p can therefore be obtained using Eqn. 4 below:
Once I_p has been obtained, Eqn. 2 can be used to find I_Tx. The receiver current (I_Rx) consumed in the sense amplifier can be modelled statically, as the average current required for a single sense operation will remain constant. However, the amount of supporting digital logic (I_SL) in the data encoder/decoder depends on N. Approximately, I_SL(N) can be modelled by:
where I_DFF, I_XOR, and I_AND represent the dynamic current consumption of a flip-flop, XOR gate and AND gate respectively¹ (justification for this is provided later, in Section IV-A).
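As a worked illustration of how the peak current I_p follows from the receiver threshold V_St, the sketch below makes two assumptions that are not taken from this paper: the Gaussian pulse is written as I(t) = I_p·exp(-4t²/δ²), and the induced voltage follows the textbook mutual-inductance relation V_Rx = k·sqrt(L_Tx·L_Rx)·dI/dt. The pulse duration δ = 100 ps is a hypothetical value chosen only for the example; the 12 nH inductors, k = 0.1 and 30 mV threshold are the values quoted in Section III-C.

```python
import math

# For the assumed pulse I(t) = I_p * exp(-4 t^2 / delta^2), the steepest slope
# is |dI/dt|_max = I_p * SLOPE_FACTOR / delta.
SLOPE_FACTOR = 2.0 * math.sqrt(2.0) * math.exp(-0.5)


def peak_rx_voltage(i_p, k, l_tx, l_rx, delta):
    """Peak induced Rx voltage for a Gaussian Tx pulse of amplitude i_p."""
    mutual = k * math.sqrt(l_tx * l_rx)  # mutual inductance M
    return mutual * i_p * SLOPE_FACTOR / delta


def required_peak_current(v_st, k, l_tx, l_rx, delta):
    """Smallest I_p for which the peak Rx voltage reaches the threshold v_st."""
    mutual = k * math.sqrt(l_tx * l_rx)
    return v_st * delta / (mutual * SLOPE_FACTOR)


# delta = 100 ps is hypothetical; the other values come from Section III-C.
i_p = required_peak_current(v_st=30e-3, k=0.1, l_tx=12e-9, l_rx=12e-9, delta=100e-12)
```

Under these assumptions I_p scales as 1/k, which is why weaker coupling makes each transmit pulse more expensive.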
C. Evaluation of Modelling
Similar modelling can be applied to predict the performance of the existing state-of-the-art NRZ scheme. The bar chart in Fig. 3(a) shows the results of this modelling, comparing the proposed scheme and the existing NRZ scheme using databook logic-gate parameters, assuming 12nH Tx and Rx inductors with k = 0.1 and an Rx sensitivity threshold V_St of 30mV. As predicted, a trade-off between E_pb and N can be observed, but energy savings are projected when using the proposed scheme for every value of N between 2 and 10. An optimal point (N=4) exists where a good balance between I_Tx and I_SL is established. At this point, the modelling results predict a ≈48% improvement in energy consumption using the proposed transceiver by reducing the number of transmit pulses in favour of additional digital processing. Fig. 3(b) shows how the optimum value of N varies as the coupling strength k between dies changes. As the EM coupling deteriorates, k reduces, the Tx current required for robust operation increases, and hence the best-performing value of N increases. For typical integrations, k is in the order of 0.15 [11] and hence, the modelling predicts that the optimal value of N will be around N=4 for typical integration scenarios.
Fig. 3(b). The variation of total energy consumption as channel coupling k and N vary.
¹ For this basic mathematical modelling, static power consumption is considered negligible and hence ignored.
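The shape of this trade-off can be explored with a toy numerical sweep. The sketch below is not the model used to produce Fig. 3: it follows only the proportionalities stated in Section III-B (a transmit term scaling as 1/N, and sense-amplifier and logic terms scaling as 2^N), it assumes a gate count of N flip-flops, N XOR gates and N - 1 AND gates, it assumes transmit energy scales as 1/k, and every numeric value is an arbitrary placeholder in normalised units.

```python
def logic_cost(n, e_dff=0.001, e_xor=0.001, e_and=0.001):
    """Assumed per-clock logic cost: N flip-flops, N XORs, N - 1 ANDs."""
    return n * e_dff + n * e_xor + (n - 1) * e_and


def energy_per_bit(n, e_tx=1.0, e_rx=0.005):
    """Toy energy per bit: transmit term ~ 1/N, Rx and logic terms ~ 2**N."""
    return e_tx / n + (2 ** n) * (e_rx + logic_cost(n))


def best_n(k, k_ref=0.1, candidates=range(1, 11)):
    """Frame size N with the lowest toy energy for coupling coefficient k.

    The transmit energy is assumed to scale as 1/k, normalised to 1.0 at k_ref.
    """
    e_tx = 1.0 * (k_ref / k)
    return min(candidates, key=lambda n: energy_per_bit(n, e_tx=e_tx))


# Sweep the coupling coefficient from strong to weak coupling.
sweep = {k: best_n(k) for k in (0.2, 0.1, 0.05)}
```

With these placeholder values the minimum lies at an interior N and moves to larger N as k falls, which mirrors the qualitative trend described above; the specific optimum depends entirely on the assumed parameters.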
IV. HARDWARE IMPLEMENTATION
Motivated by the significant energy saving projected by these modelling results, the following sub-sections outline how the proposed transceiver can be implemented in hardware. Fig. 4(b) presents an overall schematic for the transceiver design.
A. Encoding/Decoding Logic
The most important element of the proposed transceiver design is the encoding/decoding logic. Fig. 4(b) illustrates a practical implementation of the en/decoding logic consisting of an N-bit counter (that generates the COUNT signal) and XOR-based match logic which compares the parallel Tx data bits with the incrementing COUNT signal. Here, the impact of increasing N on the logic size can be observed. Not only does a higher N require a faster clock frequency, but each increment of N also requires one additional flip-flop in the N-bit counter, in addition to extra match logic. To minimize the bit-error-rate (BER) of the proposed scheme, an N-bit Gray-coded counter is used instead of a linear N-bit counter. The use of the Gray-coded counter means that if a pulse is detected in the wrong sub-window, the effect of the incorrect detection on the data frame is minimised (e.g. incorrect detection of the Rx pulse in an adjacent (±1) sub-window translates to only a 1-bit error in the whole frame). To minimize the power consumption of the system, a separate supply domain is used for implementing the digital SET logic, where near-threshold voltage scaling is applied.
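The benefit of a Gray-coded COUNT can be checked with a short sketch. It uses the standard reflected binary Gray code, which is an assumption here since the exact counter sequence is not specified, and it compares the number of bits that change when a pulse is detected one sub-window late.

```python
def binary_to_gray(value):
    """Standard reflected binary Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Inverse of binary_to_gray."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bit_errors(a, b):
    """Number of bit positions in which two frame values differ."""
    return bin(a ^ b).count("1")


N = 4
slots = range(2 ** N - 1)
# Worst-case frame corruption when the pulse lands one sub-window late.
linear_worst = max(bit_errors(i, i + 1) for i in slots)
gray_worst = max(bit_errors(binary_to_gray(i), binary_to_gray(i + 1)) for i in slots)
```

For N=4 a linear counter can corrupt all four bits (for example 0111 read as 1000), whereas the Gray-coded counter limits an adjacent-slot error to one bit.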

B. Driver Circuits
The proposed transceiver design uses the simple single-ended driver shown in Fig. 4(a). Whilst the spike-latency encoding scheme could be combined with a differential driver to increase the data rate (as 1 additional bit can be encoded in the phase of the pulse), it has been suggested that single-ended operation offers greater energy efficiency (as the transmit power required to accurately detect a pulse is much less than that required to detect a pulse and determine its phase) [12]. Results pertaining to BER, validating this assertion, are presented in Section V-A(a). Because of this, the arrangement shown in Fig. 4(b) is adopted, where MN0 and MP1 are sized to meet the required I_p for successful pulse detection, as discussed in Section III-B. To minimise the power consumption of the system, the width of the I_Tx pulse is limited by a delay element with length δ, as shown. This is analogous to the δ delay used for modulation in the benchmark NRZ scheme. Assuming both schemes use the same δ, the upper bound on the achievable bandwidth of the proposed scheme will be 1/(Nδ), compared to 1/δ for the NRZ benchmark.
C. Sense Amplifier
Finally, Fig. 5 shows the sense amplifier adopted in the proposed transceiver. The design operates on the basis that, whilst SAMPLE is high, the Rx signal is amplified by the NMOS pair MN4. This causes a negative pulse based on the differential input, which is passed through buffers MP8 and MP9, latched to avoid glitching, and then used to copy the Rx COUNT value to the output, as shown in Fig. 4(b).
V. EXPERIMENTAL VALIDATION AND RESULTS
To validate the proposed Spike-latency Encoding Transceiver (SET), it was compared to the existing inductive NRZ design (discussed in the introduction) using commercial EM and circuit simulators. For each simulation, the 65nm technology stackup shown in Fig. 6 was assumed, where the Tx and Rx inductors exist in M9. The COIL-3D software tool [11] was used to generate optimal inductor layouts for this technology, which correspond to the parameters: outer-diameter=200 µm, number-of-turns=7, track-width=5 µm, and track-spacing=1 µm. For all experiments, four parallel inductive links are assumed; however, reported results are normalized to a single channel. The EM channel was modelled using Ansys HFSS with the simulation setup shown in Fig. 6. Each channel is assumed to transmit an equiprobable random bit stream to generate noise (crosstalk) between channels for accurate BER estimations.
A. Comparison with the state-of-the-art
A number of different comparisons were performed and the results are documented in the corresponding subsections:
(a) Bit Error Rate (BER) and latency comparisons of the existing state-of-the-art NRZ and proposed SET approaches.
Proposed SET Approach (Linear COUNT)        75    9.38E-7
Proposed SET Approach (Gray-coded COUNT)    72    9.00E-7 (22% Improvement)
(a) Bit Error Rate and Latency: Initially, the BER of the proposed scheme was evaluated. The results are presented in Table I, where three cases have been explored: the NRZ benchmark, SET with a linear COUNT, and SET with a Gray-coded COUNT. Across these simulations, the measured BER when using the proposed transceiver is lower than the NRZ benchmark. Whilst this may seem counter-intuitive (as a single bit error in the proposed scheme can affect a full 4-bit frame), the use of a single-phase scheme (on-off keying) means that the proposed transceiver is much more resilient to noise, as it does not rely upon the correct detection of phase. The proposed single-phase spike-encoding transceiver demonstrated a BER < 9.5E-7, representing a 22% improvement over the state-of-the-art. Interestingly, the Gray-coded counter had little effect on BER, meaning that errors are likely arising from pulses being completely missed, rather than decoded in the wrong time interval. The latency when using the SET approach is, however, greater than that when using the existing NRZ approach, as the full data frame must be present before transmission. When using SET, the latency is 4 (or N) clock cycles, rather than just a single cycle.
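The two SET rows of the table can be cross-checked with simple arithmetic. The sketch below assumes that the integer column (75 and 72) is a count of bit errors; the column header is not stated here, so this reading is an assumption, and the helper name is hypothetical.

```python
def implied_bits(errors, ber):
    """Number of simulated bits implied by an error count and a BER."""
    return errors / ber


# Values from the table; reading the integer column as error counts is assumed.
linear_bits = implied_bits(75, 9.38e-7)
gray_bits = implied_bits(72, 9.00e-7)
relative_gap = abs(linear_bits - gray_bits) / gray_bits
```

Under that assumption both rows imply roughly the same simulation length, on the order of 8×10^7 bits, which is consistent with the two runs sharing one test stream.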
(b) Area: Following this, the area overhead of the proposed SET approach was evaluated. Fig. 7 shows the layout of the proposed transceiver circuits using a TSMC 65nm CMOS process. The only additional overhead, when compared to the NRZ scheme, is the encoding/decoding logic highlighted in the figure. As shown, the additional SET control logic does not add significantly to the footprint of the transceiver, contributing only 1.7% of the overall area. The post-layout area breakdown of each component in both the proposed and existing transceivers is provided in Table II.
(c) Energy-per-Bit: The effectiveness of the proposed transceiver in reducing energy consumption (the primary motivation for this study) was then evaluated. The energy-per-bit of the proposed approach was measured for a range of N values and compared with the state-of-the-art NRZ transceiver (Miura et al.). Fig. 8 shows the energy-per-bit for both transceivers (for the case of SET, multiple values of N are considered). As can be observed from the figure, the proposed transceiver is successful in reducing the energy consumption by up to 31% when compared with the state of the art. Fig. 8 also validates the mathematical modelling in Section III-B, demonstrating that N=4 performs optimally for this case. Whilst the energy savings are not quite as large as those projected in Section III-C, the proposed scheme still outperforms the existing approach by almost one-third in terms of energy. Fig. 9 shows the energy breakdown of (b) the proposed transceiver compared to (a) the state-of-the-art. The area of each circle represents the total energy consumption. Here it can be observed that the driver circuit energy portion has been reduced significantly (from 82% to 46%) in favour of slightly more digital processing. It is also important to note that these energy savings will improve with technology scaling, as the energy cost of the SET digital processing will decrease as feature sizes reduce (whilst transmit energy remains constant, as a function of the communication distance).
(d) Iso-Energy Communication Distance: A further experiment was performed to compare, for the same energy budget, the communication distance that can be achieved when using each transceiver design. The same simulation setup (outlined above) was used, but this time the communication distance was varied between 10 µm and 250 µm and the Tx transistors M0 and M1 were sized to meet the minimum Rx sensitivity threshold for the new channel coupling coefficient k. Fig. 10 shows the results of these experiments for parameters N=3 and N=5. As can be observed, for very small communication distances (e.g. when extreme substrate thinning is possible and the die is communicating with its closest neighbours) the benchmark NRZ approach offers slightly better energy efficiency. However, as the communication distance increases beyond around 80 µm (e.g. in the case of communication through thicker substrates or multiple tiers) the proposed SET approach exhibits superior power efficiency. As an example, assuming an arbitrary energy budget of 0.8 pJ per bit, the proposed scheme achieves a much greater communication distance of up to 220 µm (N=5), compared to the 121 µm achieved using the NRZ approach. This represents a 1.8× improvement for the same energy consumption.
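The headline ratios can be reproduced from the quoted figures. In the sketch below, the 220 µm and 121 µm distances and the 0.79 pJ/bit and 31% figures are taken from this paper; the NRZ energy per bit is derived from them for illustration and is not a value reported in the text.

```python
# Iso-energy communication distance at the 0.8 pJ/bit budget.
set_distance_um = 220.0
nrz_distance_um = 121.0
distance_gain = set_distance_um / nrz_distance_um

# Energy per bit: a 31% reduction down to 0.79 pJ/bit implies the baseline.
set_energy_pj = 0.79
reduction = 0.31
implied_nrz_energy_pj = set_energy_pj / (1.0 - reduction)  # derived, not reported
```

The distance ratio rounds to the quoted 1.8×, and the implied NRZ baseline is a little under 1.15 pJ/bit.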
B. Overall Evaluation
Finally, Table III provides an overall summary of the results presented in this section when compared with existing prior works (NRZ encoding [5]-[9], the predominant ICL scheme discussed previously, and CoDAPT [10], an alternative ICL implementation that uses a high-frequency BPSK modulation scheme). Whilst the maximum data rate that can be achieved by the proposed transceiver is slightly less than that of existing methods, SET is successful in reducing the energy required per transmitted bit (by 31% compared to NRZ, and 79% compared to CoDAPT), which is essential in IoT devices and formed the primary motivation of this work.
VI. CONCLUSIONS
This paper presents a novel inductive transceiver (SET) for ICL-based 3D-ICs that encodes data in the latency between sequential pulses using spike-latency encoding. Experimental validation of the proposed design demonstrated a 31% reduction in energy per transmitted bit when compared with the state-of-the-art (with a low, 1.7% additional area overhead). In addition, in an iso-energy comparison, SET achieved a 1.8× improvement in communication distance when compared to existing transceivers. Whilst these gains come at the cost of a slight decrease in maximum data rate, the proposed transceiver shows strong promise for adoption in low-power, low-cost IoT devices which do not require gigabit operating bandwidths.
