Power-Proportional Optical Links
The continuous increase in data transfer rates over short-reach links, such as chip-to-chip connections and server-to-server links within a data center, demands high-speed serial links. As power efficiency becomes ever more important for these links, they must also be designed to be power efficient.
Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In a power-proportional link, the power dissipated is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data rate is not required all the time. If a link is not adapted to the data-rate demand, its fixed power dissipation degrades power efficiency during sub-maximal link utilization. Adapting links to real-time data-rate requirements therefore reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with link utilization; techniques such as variable data rate and burst mode can be adopted for this purpose. This work proposes links whose data rate (and hence power dissipation) can be varied in response to system demands.
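The power-proportionality argument above can be made concrete with a small model comparing energy per bit at partial utilization for a conventional (mostly fixed-power) link against an ideal power-proportional one. The power figures below are illustrative assumptions, not numbers from the work:

```python
# Sketch: energy per bit for a fixed-power link vs. a power-proportional link.
# All numbers are illustrative assumptions, not measurements from the work.

def energy_per_bit(utilization, p_fixed_w, p_active_w, peak_rate_bps):
    """Energy per bit (J/b) at a given link utilization (0 < utilization <= 1).

    p_fixed_w  : power burned regardless of traffic (always-on portion)
    p_active_w : power that scales with traffic (power-proportional portion)
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    rate = utilization * peak_rate_bps           # delivered data rate
    power = p_fixed_w + p_active_w * utilization
    return power / rate

# A conventional link dissipates most of its power regardless of load...
conventional = [energy_per_bit(u, p_fixed_w=0.09, p_active_w=0.01,
                               peak_rate_bps=10e9) for u in (1.0, 0.5, 0.1)]
# ...while an ideal power-proportional link keeps energy per bit flat.
proportional = [energy_per_bit(u, p_fixed_w=0.0, p_active_w=0.1,
                               peak_rate_bps=10e9) for u in (1.0, 0.5, 0.1)]
```

At 10% utilization the conventional link's energy per bit grows by roughly an order of magnitude, while the proportional link's stays constant, which is exactly the gap that variable data-rate and burst-mode techniques target.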
Past works have demonstrated rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation at lower data rates. However, maintaining synchronization during reconfiguration was not possible, because previous approaches introduced changes in front-end delay when reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver, suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2x variation in data rate while introducing less than 10% of a unit interval (UI) of delay variation.
With continuously increasing data communication in data centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links, since they do not transmit useful data all the time. A burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve the links' energy efficiency during low link activity; the burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity, as well as solutions for fast activation of the receivers. This work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) circuit for a multi-channel parallel link, exploiting opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique, which allows the CDR to lock within 26 UIs of power-on.
Clock Jitter in Communication Systems
For reliable digital communication between devices, the sources that contribute to data sampling errors must be properly modeled and understood. Clock jitter is one such error source occurring during data transfer between integrated circuits. Clock jitter is a noise source in a communication link similar to electrical noise, but it is a time-domain noise variable affecting many different parts of the sampling process. This dissertation models the effect of clock jitter on sampling for communication systems with the degree of accuracy needed for modern high-speed data communication. The models developed and presented here have been used to develop the clocking specifications and silicon budgets for industry standards such as PCI Express, USB 3.0, GDDR5 memory, and HBM memory interfaces.
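A standard result that jitter-on-sampling models of this kind build on is the jitter-limited SNR of a sampled full-scale sinusoid, SNR = -20*log10(2*pi*f*sigma_t). A minimal check of this textbook bound (a generic model, not the dissertation's own code) compares the closed form against a Monte Carlo simulation of jittered sampling:

```python
import math
import random

def jitter_snr_db(freq_hz, jitter_rms_s):
    """Theoretical SNR limit from sampling-clock jitter on a full-scale sine:
    SNR = -20*log10(2*pi*f*sigma_t)."""
    return -20 * math.log10(2 * math.pi * freq_hz * jitter_rms_s)

def simulated_jitter_snr_db(freq_hz, jitter_rms_s, n=200_000, seed=1):
    """Monte Carlo check: sample sin(2*pi*f*t) at jittered instants and
    compare the error power to the signal power."""
    rng = random.Random(seed)
    sig_pwr = err_pwr = 0.0
    for _ in range(n):
        t = rng.random() / freq_hz                # random sample instant
        dt = rng.gauss(0.0, jitter_rms_s)         # Gaussian clock jitter
        ideal = math.sin(2 * math.pi * freq_hz * t)
        actual = math.sin(2 * math.pi * freq_hz * (t + dt))
        sig_pwr += ideal * ideal
        err_pwr += (actual - ideal) ** 2
    return 10 * math.log10(sig_pwr / err_pwr)
```

For a 1 GHz tone and 1 ps rms jitter, both give roughly 44 dB; the voltage error at each sample is approximately the waveform slope times the timing error, which is why faster edges make the same jitter more costly.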
Design of High-Density, Low-Power Transceivers for Next-Generation HBM
Thesis (Ph.D.) -- Seoul National University Graduate School, Dept. of Electrical and Computer Engineering, August 2020. Advisor: Deog-Kyoon Jeong. This thesis presents design techniques for a high-density, power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike other memory interfaces, HBM uses a 3D-stacked package with through-silicon vias (TSVs) and a silicon interposer. A transceiver for HBM must solve the problems caused by the 3D-stacked package and TSVs.
First, a data (DQ) receiver for HBM is proposed with a self-tracking loop that tracks the phase skew between DQ and the data strobe (DQS) caused by voltage or thermal drift. The self-tracking loop achieves low power and small area by utilizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage difference and detects the phase skew from that voltage difference. An offset calibration scheme that can compensate for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing circuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm^2. The experimental results show that the DQ receiver operates without any performance degradation under a +/-10% supply variation.
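For reference, a baud-rate phase detector can be modeled behaviorally in the classic Mueller-Muller form, whose output averages to zero at the correct sampling phase. This is a generic sketch, not the pulse-to-charge circuit of the thesis, and the linear cursor model is an assumption:

```python
import random

def mm_phase_error(x_prev, d_prev, x_cur, d_cur):
    """Mueller-Muller timing function: e[n] = x[n-1]*d[n] - x[n]*d[n-1]."""
    return x_prev * d_cur - x_cur * d_prev

def avg_timing_error(tau_ui, n=50_000, seed=7):
    """Average detector output for a toy channel whose pre/post cursors
    shift linearly with the sampling offset tau (in UI). At tau = 0 the
    pulse is symmetric and the average output is zero (locked)."""
    rng = random.Random(seed)
    pre, post = 0.1 + tau_ui, 0.1 - tau_ui   # toy cursor model (assumption)
    d = [rng.choice((-1.0, 1.0)) for _ in range(n + 3)]
    acc = 0.0
    for k in range(2, n + 2):
        x_prev = d[k - 1] + pre * d[k] + post * d[k - 2]   # sample at n-1
        x_cur = d[k] + pre * d[k + 1] + post * d[k - 1]    # sample at n
        acc += mm_phase_error(x_prev, d[k - 1], x_cur, d[k])
    return acc / n
```

With the toy model above, the average output is proportional to the phase offset (about 2*tau here), which is the error signal a self-tracking loop integrates to follow slow voltage and thermal drift.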
In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is presented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, the additional circuits for the XTC are minimized. Thanks to the XTC scheme, the channel pitch can be significantly reduced, allowing for high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between vertically adjacent channels, which further reduces the cross-sectional area of the channel per lane. The test chip, including 6 data lanes, is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data over the 6 consecutive channels with 0.5-um pitch, and the XTC scheme reduces the XT-induced jitter by up to 78%. The measurement results show that the transceiver achieves a throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm^2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.
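The idea of reusing the transmit FFE for crosstalk cancellation can be sketched behaviorally: the per-UI output level is the usual pre/main/post FFE sum, minus a scaled copy of the aggressor lane's transitions so that the coupled crosstalk cancels at the receiver. The tap weights, the coupling gain `xtc_gain`, and the transition-coupling model are illustrative assumptions, not values from the chip:

```python
# Hedged sketch of an FFE-combined crosstalk-cancelling transmitter for one
# victim lane with a single aggressor lane (toy coupling model).

def ffe_xtc_output(own_bits, aggressor_bits, ffe_taps=(-0.1, 1.0, -0.15),
                   xtc_gain=0.2):
    """Per-UI transmit levels for one lane. ffe_taps = (pre, main, post).
    Bits are +/-1 (or 0 for idle); lists must have equal length."""
    pre, main, post = ffe_taps
    out = []
    for k in range(len(own_bits)):
        d_prev = own_bits[k - 1] if k >= 1 else 0
        d_next = own_bits[k + 1] if k + 1 < len(own_bits) else 0
        level = pre * d_next + main * own_bits[k] + post * d_prev
        # Crosstalk couples the aggressor's transitions onto the victim;
        # subtract a matching amount so it cancels at the receiver.
        a_prev = aggressor_bits[k - 1] if k >= 1 else 0
        level -= xtc_gain * (aggressor_bits[k] - a_prev)
        out.append(level)
    return out
```

When the aggressor is idle the output reduces to a plain 3-tap FFE, which is the point of the reuse: the cancellation costs only the pre-distortion term, not a second equalizer.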
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY
2.1 OVERVIEW
2.2 TRANSCEIVER ARCHITECTURE
2.3 READ/WRITE OPERATION
2.3.1 READ OPERATION
2.3.2 WRITE OPERATION
CHAPTER 3 BACKGROUND ON COUPLED WIRES
3.1 GENERALIZED MODEL
3.2 EFFECT OF CROSSTALK
CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP
4.1 OVERVIEW
4.2 FEATURES OF DQ RECEIVER FOR HBM
4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR
4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR
4.3.2 OFFSET CALIBRATION
4.3.3 OPERATION SEQUENCE
4.4 CIRCUIT IMPLEMENTATION
4.5 MEASUREMENT RESULT
CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME
5.1 OVERVIEW
5.2 PROPOSED 3D-STAGGERED CHANNEL
5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL
5.2.2 CHANNEL CHARACTERISTICS AND MODELING
5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME
5.4 CIRCUIT IMPLEMENTATION
5.4.1 OVERALL ARCHITECTURE
5.4.2 TRANSMITTER WITH FFE-COMBINED XTC
5.4.3 RECEIVER
5.5 MEASUREMENT RESULT
CHAPTER 6 CONCLUSION
BIBLIOGRAPHY
ABSTRACT (IN KOREAN)
Design of energy efficient high speed I/O interfaces
Energy efficiency has become a key performance metric for wireline high-speed I/O interfaces. Consequently, the design of low-power I/O interfaces has garnered broad interest, mostly focused on active power reduction techniques at peak data rate. In practice, most systems exhibit a wide range of data transfer patterns. As a result, low energy-per-bit operation at peak data rate does not necessarily translate to overall low-energy operation. Therefore, I/O interfaces that can scale their power consumption with the data-rate requirement are desirable. Rapid on-off I/O interfaces have the potential to scale power with data-rate requirements without severely affecting either the latency or the throughput of the I/O interface. In this work, we explore circuit techniques for designing rapid on-off high-speed wireline I/O interfaces and digital fractional-N PLLs.
A burst-mode transmitter suitable for rapid on-off I/O interfaces is presented that achieves 6 ns turn-on time by utilizing a fast frequency-settling ring oscillator in a digital multiplying delay-locked loop (MDLL) and a rapid on-off biasing scheme for the current-mode output driver. Fabricated in a 90 nm CMOS process, the prototype achieves 2.29 mW/Gb/s energy efficiency at a peak data rate of 8 Gb/s. A 125X (8 Gb/s to 64 Mb/s) change in effective data rate results in a 67X (18.29 mW to 0.27 mW) change in transmitter power consumption, corresponding to only a 2X (2.29 mW/Gb/s to 4.24 mW/Gb/s) degradation in energy efficiency for 32-byte long data bursts. We also present an analytical bit error rate (BER) computation technique for this transmitter under rapid on-off operation, which uses MDLL settling measurement data in conjunction with always-on transmitter measurements. This technique indicates that the BER bathtub width for 10^(-12) BER is 0.65 UI and 0.72 UI during rapid on-off operation and always-on operation, respectively.
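Bathtub-width figures of this kind follow the usual Gaussian-jitter arithmetic: each eye edge closes in by Qinv(BER) standard deviations of random jitter, where Q is the Gaussian tail probability. A minimal sketch of that calculation (the 1.0 UI eye and 0.02 UI rms jitter below are assumed inputs, not measurements from the work):

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bathtub_width_ui(eye_width_ui, jitter_rms_ui, target_ber=1e-12):
    """Width of the sampling-phase window (in UI) where BER < target,
    assuming Gaussian random jitter on both eye edges (toy model)."""
    # Invert Q numerically by bisection (target assumed in (0, 0.5)).
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if q_function(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    q_inv = (lo + hi) / 2          # ~7.03 sigma for BER = 1e-12
    return max(0.0, eye_width_ui - 2 * q_inv * jitter_rms_ui)
```

With a 1.0 UI eye and 0.02 UI rms jitter this yields roughly 0.72 UI of open bathtub, which shows how a small increase in rms jitter during rapid on-off settling narrows the measured bathtub.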
Next, a pulse response estimation-based technique is proposed enabling burst-mode operation for baud-rate sampling receivers that operate over high loss channels. Such receivers typically employ discrete time equalization to combat inter-symbol interference. Implementation details are provided for a receiver chip, fabricated in 65nm CMOS technology, that demonstrates efficacy of the proposed technique. A low complexity pulse response estimation technique is also presented for low power receivers that do not employ discrete time equalizers.
We also present techniques for the implementation of a highly digital fractional-N PLL employing a phase-interpolator-based fractional divider to improve the quantization noise shaping properties of a 1-bit delta-sigma frequency-to-digital converter. Fabricated in a 65 nm CMOS process, the prototype calibration-free fractional-N Type-II PLL employs the proposed frequency-to-digital converter in place of a high-resolution time-to-digital converter and achieves 848 fs rms integrated jitter (1 kHz-30 MHz) and -101 dBc/Hz in-band phase noise while generating a 5.054 GHz output from a 31.25 MHz input.
Digital signal processing optical receivers for the mitigation of physical layer impairments in dynamic optical networks
It is generally believed by the research community that the introduction of complex network functions, such as routing, in the optical domain will allow better network utilisation, lower cost and footprint, and more efficient energy usage. The new optical components and sub-systems intended for dynamic optical networking introduce new kinds of physical layer impairments in the optical signal, and it is of paramount importance to overcome this problem if dynamic optical networks are to become a reality. Thus, the aim of this thesis was first to identify and characterise the physical layer impairments of dynamic optical networks, and then to develop digital signal processing techniques to mitigate them.
The initial focus of this work was the design and characterisation of digital optical receivers for dynamic core optical networks. Digital receiver techniques allow complex algorithms to be implemented in the digital domain, which usually outperform their analogue counterparts in performance and flexibility. An AC-coupled digital receiver for core networks, consisting of a standard PIN photodiode and a digitiser that takes samples at twice the Nyquist rate, was characterised in terms of both bit-error rate and packet-error rate, and it is shown that the packet-error rate can be optimised by appropriately setting the preamble length. Also, a realistic model of a digital receiver that includes the quantisation impairments was developed. Finally, the influence of the network load and the traffic sparsity on the packet-error-rate performance of the receiver was investigated.
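The preamble-length trade-off can be illustrated with a toy packet-error-rate model: a packet is lost if the correlator misses the preamble or if any payload bit is in error. The detection model below (the correlator tolerates up to `max_errors` preamble bit errors) is an illustrative assumption, not the receiver model of the thesis:

```python
from math import comb

def preamble_miss_prob(n_pre, ber, max_errors):
    """Probability the correlator misses the preamble, assuming detection
    succeeds with up to max_errors bit errors in n_pre preamble bits."""
    p_detect = sum(comb(n_pre, k) * ber**k * (1 - ber)**(n_pre - k)
                   for k in range(max_errors + 1))
    return 1.0 - p_detect

def packet_error_rate(n_pre, n_payload, ber, max_errors):
    """Packet lost if preamble is missed OR any payload bit errors."""
    p_payload_err = 1.0 - (1.0 - ber)**n_payload
    p_miss = preamble_miss_prob(n_pre, ber, max_errors)
    return 1.0 - (1.0 - p_miss) * (1.0 - p_payload_err)
```

Under this model a longer, error-tolerant preamble drives the sync-failure term toward zero, so the packet-error rate becomes payload-dominated, which is the qualitative optimisation the characterisation above explores.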
Digital receiver technologies can be equally applied to optical access networks, which share many traits with dynamic core networks. A dual-rate digital receiver, capable of detecting optical packets at 10 and 1.25 Gb/s, was developed and characterised. The receiver dynamic range was extended by means of DC-coupling and non-linear signal clipping, and it is shown that the receiver performance is limited by digitiser noise at low received power and by non-linear clipping at high received power.
Clocking and Skew-Optimization For Source-Synchronous Simultaneous Bidirectional Links
There is a continuous expansion of computing capabilities in mobile devices, which demands higher I/O bandwidth and dense parallel links supporting higher data rates. High-speed signaling leverages technology advancements to achieve higher data rates but is limited by the bandwidth of electrical copper channels, which have not scaled accordingly. To meet the continuous data-rate demand, the Simultaneous Bidirectional (SBD) signaling technique is an attractive alternative to uni-directional signaling, as it can work at lower clock speeds, exhibits better spectral efficiency, and provides higher throughput on pad-limited PCBs.
For a low-power and more robust system, the SBD transceiver should utilize a forwarded-clock system and per-pin de-skew circuits to correct the phase difference that develops between the data and the clock. The system can be configured in two roles, master and slave. To save more power, the system should have only one clock generator. The master has its own clock source and shares its clock with the slave through the clock channel, and the slave uses this forwarded clock to deserialize the inbound data and serialize the outbound data. A clock-to-data skew exists, which can be corrected with a phase-tracking CDR. This thesis presents a low-power implementation of forwarded clocking and clock-to-data skew optimization for a 40 Gbps SBD transceiver. The design is implemented in 28 nm CMOS technology and consumes 8.8 mW of power for 20 Gbps NRZ data at a 0.9 V supply. The clocking circuitry occupies 0.018 mm^2 of area.
Modeling Performance of the Clock Phase Caching Approach to Clock and Data Recovery
Optical switching could enable data center networks to keep pace with the rapid growth of intra-data-center traffic; however, sub-nanosecond clock and data recovery time is crucial to enabling optically-switched data center networks to transport small-packet-dominated data center traffic with over 90% efficiency. We review the clock-synchronized approach to clock and data recovery, which enables sub-nanosecond switching time in optically switched networks. We then introduce an analytical model to mathematically explore the operation of clock phase caching, and use this model to explore the impact of factors such as fiber temperature, clock jitter, and symbol rate on the BER and CDR locking-time performance of the clock phase caching approach, as well as their impact on scalability. Using commercial data center parameters matching those used in our previous experimental research, we find that our analytical model provides estimates that closely match our previous experimental results, validating its use for predicting the performance of clock-phase-cached systems.
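The fiber-temperature dependence such a model explores reduces to simple arithmetic: the thermal delay drift of the fiber, divided by the symbol UI, tells you how many UIs a cached phase drifts per degree, and hence how often it must be refreshed. A back-of-envelope sketch, using a commonly quoted thermal delay coefficient for single-mode fibre (about 40 ps/km/K) as an assumption rather than a value from the paper:

```python
# Hedged sketch: cached-clock-phase drift (in UI) caused by fiber
# temperature change. The thermal coefficient is an assumed typical
# SMF figure, ~40 ps of delay change per km of fibre per kelvin.

def phase_drift_ui(fiber_len_m, delta_temp_c, symbol_rate_baud,
                   thermal_coeff_ps_per_m_k=0.04):
    """Clock-phase drift in UI for a temperature change delta_temp_c (K)
    over fiber_len_m metres of fibre at the given symbol rate."""
    drift_ps = thermal_coeff_ps_per_m_k * fiber_len_m * delta_temp_c
    ui_ps = 1e12 / symbol_rate_baud      # one unit interval in ps
    return drift_ps / ui_ps
```

For a 2 km intra-building link at 25 GBd, a single kelvin of drift already moves the phase by about 2 UI, which is why cached phases need regular updates at building scale, while a short hollow-core link with far lower thermal sensitivity can get away with a single calibration.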
Tradeoffs in Design of Low-Power Gated-Oscillator CDR Circuits
This article describes techniques for implementing low-power clock and data recovery (CDR) circuits based on the gated-oscillator (GO) topology for short-distance applications. The main tradeoffs in the design of a high-performance, power-efficient GO CDR are studied, and based on these a top-down design methodology is introduced such that the jitter tolerance (JTOL) and frequency tolerance (FTOL) requirements of the system are simultaneously satisfied. A test chip has been implemented in a standard digital 0.18 um CMOS process; the proposed CDR circuit consumes only 10.5 mW and occupies 0.045 mm^2 of silicon area at a 2.5 Gbps data rate. Measurement results show good agreement with the analyses and prove the capability of the proposed approach for implementing low-power GO CDRs.
Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching
In current `Cloud' data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to reduce the complexity of CDR versus previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated: 1. Clock phase caching, which uses clock phase storage and regular updates in a 2 km intra-building scale data centre network interconnected by single-mode optical fibre. 2. Single-calibration clock synchronisation assisted CDR, which leverages the 20 times lower thermal sensitivity of hollow-core optical fibre versus single-mode fibre to synchronise a 100 m cluster-scale data centre network, with a single initial phase calibration step. Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches.
PHY Link Design and Optimization For High-Speed Low-Power Communication Systems
The ever-growing demand for high-bandwidth data transfer has been pushing research efforts in the field of high-performing communication systems. Studies on single-chip performance, e.g. faster multi-core processors and higher system memory capacity, have been explored. To further enhance system performance, research has focused on improving the data-transfer bandwidth of chip-to-chip communication over high-speed serial links. Many solutions have been proposed to overcome the bottlenecks caused by non-idealities such as the bandwidth-limited electrical channel that connects two link devices and the various undesired noise sources in communication systems. Nevertheless, even with these solutions, designs run into timing-margin limitations for high-speed interfaces running at multi-gigabit-per-second data rates on low-cost Printed Circuit Board (PCB) material with a constrained power budget. Therefore, the challenge in designing a physical layer (PHY) link for high-speed communication systems is to make it power-efficient, reliable, and cost-effective. In this context, this dissertation focuses on the architectural design and the system- and circuit-level verification of a PHY link, as well as system performance optimization with respect to power, reliability, and adaptability in high-speed communication systems.
The PHY is mainly composed of clock and data recovery (CDR), equalizers (EQs), and high-speed I/O drivers. The symmetrical structure of the PHY link is usually duplicated in both link devices for bidirectional data transmission. By introducing training mechanisms into high-speed communication systems, the timing in one link device is adaptively aligned to the timing condition specified in the other link device, despite different skews or induced jitter resulting from process, voltage, and temperature (PVT) variations in the individual link. With reliable timing relationships among the interface signals, the total system bandwidth is dramatically improved. On the other hand, interface training offers high flexibility for reuse without further investment in highly demanding, costly components.
In the training mode, a CDR module is essential for reconstructing the transmitted bitstream to achieve the best data eye and to detect the edges of the data stream in asynchronous or source-synchronous systems. Generally, the CDR works as a feedback control system that aligns its output clock to the center of the received data. In systems that contain multiple data links, the overall CDR power consumption increases linearly with the number of links, as one CDR is required per link. Therefore, a power-efficient CDR plays a significant role in such systems with parallel links. Furthermore, a high-performance CDR requires low jitter generation despite high input jitter. To minimize the trade-off between power consumption and CDR jitter, a novel CDR architecture is proposed that utilizes a proportional-integral (PI) controller and a three-times-sampling scheme.
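A PI-controlled CDR of the kind described above can be sketched as a behavioral bang-bang loop: the phase detector only reports early/late, and the proportional-integral controller steers the sampling phase toward the data phase. The sampling scheme is abstracted into a single early/late decision, and all gains are illustrative assumptions:

```python
# Hedged behavioral sketch of a bang-bang CDR with a PI loop filter.
# kp is the proportional (phase) gain, ki the integral (frequency) gain.

def run_pi_cdr(data_phase_ui, n_steps=600, kp=0.02, ki=0.0005):
    """Return the recovered sampling-phase trajectory (in UI)."""
    phase, integral, history = 0.0, 0.0, []
    for _ in range(n_steps):
        err = 1.0 if data_phase_ui > phase else -1.0  # bang-bang early/late
        integral += ki * err            # integral path accumulates frequency
        phase += kp * err + integral    # proportional + integral correction
        history.append(phase)
    return history
```

Starting from zero phase against a 0.3 UI data phase, the loop pulls in within a few tens of steps and then dithers in a small limit cycle around the target; the kp/ki ratio sets the trade-off between tracking bandwidth and dither, which is the power-versus-jitter trade-off the abstract refers to.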
Meanwhile, signal integrity (SI) becomes critical as the data rate exceeds several gigabits per second. Data distorted by non-idealities in the system can degrade signal quality severely and, in worst-case scenarios, result in intolerable transmission errors, thus reducing the effective system bandwidth. Hence, additional trainings, such as transmitter (Tx) and receiver (Rx) EQ trainings for SI purposes, are inserted into the interface training. Furthermore, a simplified system architecture with unsymmetrical placement of adaptive Rx and Tx EQs in a single link device is proposed and analyzed using different coefficient adaptation algorithms. This architecture makes it possible to eliminate a large number of EQs through the training, especially in the case of parallel links, saving considerable power and chip area.
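One standard low-cost coefficient adaptation algorithm of the kind alluded to is sign-sign LMS, which needs only the signs of the error and the inputs and is therefore cheap in hardware. The abstract does not name the specific algorithms used, so the following is a generic sketch with an illustrative step size and toy channel:

```python
# Hedged sketch of sign-sign LMS tap adaptation (generic, not the
# dissertation's algorithm). The target tap setting is a toy assumption.
import random

def sign(x):
    return 1.0 if x >= 0 else -1.0

def sign_sign_lms(taps, inputs, error, mu=1.0 / 256):
    """One update: taps[i] -= mu * sign(error) * sign(inputs[i])."""
    return [t - mu * sign(error) * sign(x) for t, x in zip(taps, inputs)]

# Tiny demo: drive a 2-tap equalizer toward a known ideal setting.
rng = random.Random(3)
taps = [1.0, 0.0]            # main tap, post-cursor cancellation tap
target = [1.0, -0.25]        # ideal setting for this toy channel (assumed)
for _ in range(2000):
    x = [rng.choice((-1.0, 1.0)) for _ in taps]
    # Error = difference between current and ideal filter outputs.
    error = (sum(t * xi for t, xi in zip(taps, x))
             - sum(t * xi for t, xi in zip(target, x)))
    taps = sign_sign_lms(taps, x, error)
```

The taps converge to within roughly one step size of the target and then dither, which is the usual sign-sign trade: a simpler datapath in exchange for slower convergence and a small residual bounce in the coefficients.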
Finally, the robustness of high-speed I/O drivers against PVT variations is discussed. Critical issues such as overshoot and undershoot interfering with the data are primarily caused by impedance mismatch between the I/O driver and its transmission channel. By applying a PVT compensation technique, I/O driver impedances can be effectively calibrated close to the target value. Different digital impedance calibration algorithms against PVT variations are implemented and compared with the goals of fast calibration and low power.
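A common digital impedance-calibration algorithm of the kind compared here is an MSB-first binary search on the driver's leg-enable code, driven by a comparator against an off-chip precision resistor. The leg-resistance model below is a toy assumption (identical parallel legs), not a circuit from the dissertation:

```python
# Hedged sketch: binary-search calibration of a segmented output driver.
# Each enabled leg contributes r_leg ohms in parallel, so resistance(code)
# = r_leg / popcount-equivalent; here the code IS the number of legs.

def calibrate_code(target_ohms, bits=6, r_leg=1600.0):
    """Return the leg-enable count whose parallel resistance best matches
    target_ohms, found by an MSB-first binary search."""
    def resistance(code):
        return float("inf") if code == 0 else r_leg / code

    best = 0
    for bit in reversed(range(bits)):          # MSB-first trial of each bit
        trial = best | (1 << bit)
        # Comparator decision: if the trial impedance is still at or above
        # target, keep the extra legs enabled to lower it further.
        if resistance(trial) >= target_ohms:
            best = trial
    # Final LSB refinement: pick the nearer of best and best + 1.
    if abs(resistance(best + 1) - target_ohms) < abs(resistance(best) - target_ohms):
        best += 1
    return best
```

With 1600-ohm legs, a 50-ohm target lands on code 32 in `bits` comparator decisions, which is why binary search is favoured when calibration time matters; a linear (one-leg-at-a-time) search trades more steps for simpler control logic.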