66 research outputs found

    A PAM-4 VCSEL TRANSMITTER WITH 2.5 TAP NON-LINEAR EQUALIZER IN 65NM CMOS

    This thesis presents a Vertical-Cavity Surface-Emitting Laser (VCSEL) based transmitter that uses a nonlinear equalizer to compensate for the nonlinear and bandwidth-limited behavior of the VCSEL. The transmitter employs PAM4 modulation and a 2.5-tap nonlinear equalizer to maximize the vertical eye opening and reduce the skew in the PAM4 eyes caused by the nonlinear behavior. The equalizer can also compensate for the static nonlinearity caused by the finite output impedance of the tail current sources, and for the low bandwidth caused by the large capacitance (parasitic and pad) and large resistance (of the VCSEL) at the output node. The nonlinear equalizer reduces to a traditional linear equalizer when the VCSEL can be approximated as linear, e.g., at high bias currents. In such cases, the 2.5-tap equalizer still outperforms a traditional 2-tap equalizer because of its longer memory. The proposed architecture implements the 2.5-tap nonlinear equalizer with a look-up table (LUT) and can equalize each of the 32 (4^2.5) rising, falling, and non-transitioning edges separately. It also uses a nonuniform DAC in the current-mode output driver, which uses information about the unused levels to improve resolution compared with the traditionally used uniform DAC. The transmitter consumes 250 mW and achieves a data rate of 50 Gb/s at a power efficiency of 5 pJ/bit. The core transmitter area, including the PRBS, LUT, serializer, and output driver, is 375 µm × 500 µm, while the total chip area is 1.4 mm × 1.4 mm. The transmitter has been implemented in 65 nm CMOS technology.
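
    The look-up-table idea can be illustrated with a short Python sketch. It is not the thesis's implementation: splitting the 2.5 taps into the current symbol, the previous symbol, and the most significant bit (MSB) of the symbol two UI earlier is an assumption, as are the helper names and coefficient values. The table is first filled with a linear FFE response (the case to which the nonlinear equalizer reduces when the VCSEL is nearly linear), and then a single entry is adjusted to show how one edge type can be corrected independently of the other 31.

```python
from itertools import product

LEVELS = (0, 1, 2, 3)  # PAM4 symbols as 2-bit integers (illustrative encoding)


def msb(symbol: int) -> int:
    """Most significant bit of a 2-bit PAM4 symbol."""
    return symbol >> 1


def linear_lut(c0: float, c1: float, c2: float) -> dict:
    """Fill a 4 * 4 * 2 = 32-entry LUT with a purely linear FFE response.

    Key: (current symbol, previous symbol, MSB of the symbol two UI back).
    The half tap only sees an MSB, so it weights a coarse amplitude of 0 or 2.
    """
    lut = {}
    for cur, prev, half in product(LEVELS, LEVELS, (0, 1)):
        lut[(cur, prev, half)] = c0 * cur + c1 * prev + c2 * (2 * half)
    return lut


def equalize(symbols: list, lut: dict) -> list:
    """Map each PAM4 symbol to a drive level with a single table lookup."""
    out = []
    for n, cur in enumerate(symbols):
        prev = symbols[n - 1] if n >= 1 else 0
        older = symbols[n - 2] if n >= 2 else 0
        out.append(lut[(cur, prev, msb(older))])
    return out


lut = linear_lut(c0=1.0, c1=-0.25, c2=-0.125)
print(len(lut), equalize([0, 3, 3, 1], lut))

# Nonlinear correction: boost only the full-swing rising edge 0 -> 3.
lut[(3, 0, 0)] += 0.25
print(equalize([0, 3, 3, 1], lut))
```

    Because every entry is addressed independently, a linear equalizer is just the special case in which all 32 entries follow one weighted sum; a nonlinear equalizer is free to deviate entry by entry.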

    Receiver Design Using Adaptive Control of the Offset Canceller, an Equalizer, and a Baud-Rate Phase Detector

    Ph.D. dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2021. 염제완. In this thesis, designs of high-speed, low-power wireline receivers (RX) are described. Specifically, circuit techniques for DC offset cancellation, a merged-summer DFE, a stochastic baud-rate CDR, and a phase detector (PD) for multi-level signals are proposed. First, an RX with adaptive offset cancellation (AOC) and a merged-summer decision-feedback equalizer (DFE) is proposed. The proposed AOC engine removes the random DC offset of the data path by examining the sampled data and edge outputs of the random data stream. In addition, the proposed RX incorporates a shared-summer DFE in a half-rate structure to reduce the power dissipation and hardware complexity of the adaptive equalizer. A prototype chip fabricated in 40 nm CMOS technology occupies an active area of 0.083 mm². Thanks to the AOC engine, the proposed RX achieves a BER below 10^-12 over a wide range of data rates, from 1.62 to 10 Gb/s. The proposed RX consumes 18.6 mW at 10 Gb/s over a channel with 27 dB loss at 5 GHz, giving a figure of merit of 0.068 pJ/b/dB. Second, a 40 nm CMOS RX with a baud-rate phase detector (BRPD) is proposed. The RX includes two PDs: a BRPD employing a stochastic technique and a BRPD suitable for multi-level signals. Because a baud-rate CDR does not use an edge-sampling clock, the proposed CDR reduces power consumption by lowering hardware complexity. The proposed stochastic phase detector (SPD) also tracks the optimal phase-locking point that maximizes the vertical eye opening. Furthermore, despite residual inter-symbol interference, the proposed BRPD for multi-level signals secures vertical eye margin, which is especially vulnerable in multi-level signaling. In addition, unlike the conventional Mueller-Muller PD, the proposed BRPD has a unique lock point in the presence of an adaptive DFE. A prototype chip fabricated in 40 nm CMOS technology occupies an active area of 0.24 mm². The proposed PAM-4 RX achieves a bit-error rate below 10^-11 at 48 Gb/s and a power efficiency of 2.42 pJ/b.
    CHAPTER 1 INTRODUCTION
      1.1 MOTIVATION
      1.2 THESIS ORGANIZATION
    CHAPTER 2 BACKGROUNDS
      2.1 BASIC ARCHITECTURE IN SERIAL LINK
        2.1.1 SERIAL COMMUNICATION
        2.1.2 CLOCK AND DATA RECOVERY
        2.1.3 MULTI-LEVEL PULSE-AMPLITUDE MODULATION
      2.2 EQUALIZER
        2.2.1 EQUALIZER OVERVIEW
        2.2.2 DECISION-FEEDBACK EQUALIZER
        2.2.3 ADAPTIVE EQUALIZER
      2.3 CLOCK RECOVERY
        2.3.1 2X OVERSAMPLING PD (ALEXANDER PD)
        2.3.2 BAUD-RATE PD (MUELLER-MULLER PD)
    CHAPTER 3 AN ADAPTIVE OFFSET CANCELLATION SCHEME AND SHARED SUMMER ADAPTIVE DFE
      3.1 OVERVIEW
      3.2 AN ADAPTIVE OFFSET CANCELLATION SCHEME AND SHARED-SUMMER ADAPTIVE DFE FOR LOW POWER RECEIVER
      3.3 SHARED SUMMER DFE
      3.4 RECEIVER IMPLEMENTATION
      3.5 MEASUREMENT RESULTS
    CHAPTER 4 PAM-4 BAUD-RATE DIGITAL CDR
      4.1 OVERVIEW
      4.2 OVERALL ARCHITECTURE
        4.2.1 PROPOSED BAUD-RATE CDR ARCHITECTURE
        4.2.2 PROPOSED ANALOG FRONT-END STRUCTURE
      4.3 STOCHASTIC PHASE DETECTION PAM-4 CDR
        4.3.1 PROPOSED STOCHASTIC PHASE DETECTION
        4.3.2 COMPARISON OF THE STOCHASTIC PD WITH SS-MMPD
      4.4 PHASE DETECTION FOR MULTI-LEVEL SIGNALING
        4.4.1 PROPOSED BAUD-RATE PHASE DETECTOR FOR MULTI-LEVEL SIGNAL
        4.4.2 DATA LEVEL AND DFE COEFFICIENT ADAPTATION
        4.4.3 PROPOSED PHASE DETECTOR
      4.5 MEASUREMENT RESULT
        4.5.1 MEASUREMENT OF THE PROPOSED STOCHASTIC BAUD-RATE PHASE DETECTION
        4.5.2 MEASUREMENT OF THE PROPOSED BAUD-RATE PHASE DETECTION FOR MULTI-LEVEL SIGNAL
    CHAPTER 5 CONCLUSION
    BIBLIOGRAPHY
    ABSTRACT (IN KOREAN)
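
    For reference, the conventional Mueller-Muller timing function that the proposed BRPD is contrasted with can be sketched in Python. This is the textbook estimator on an NRZ stream, not the thesis's stochastic or multi-level detector; the three-tap channel, its coefficients, and the function names are illustrative assumptions, and decisions are assumed to be error-free. For uncorrelated ±1 data the average of z[n] = y[n]d[n-1] - y[n-1]d[n] approaches h1 - h-1, so the loop locks where the first post-cursor equals the first pre-cursor.

```python
import random


def channel(data, h_pre, h0, h_post):
    """Baud-rate samples y[n] = h_pre*d[n+1] + h0*d[n] + h_post*d[n-1]."""
    y = []
    for n in range(len(data)):
        nxt = data[n + 1] if n + 1 < len(data) else 0
        prv = data[n - 1] if n >= 1 else 0
        y.append(h_pre * nxt + h0 * data[n] + h_post * prv)
    return y


def mm_timing_error(y, d):
    """Average Mueller-Muller output; tends to h_post - h_pre for random data."""
    z = [y[n] * d[n - 1] - y[n - 1] * d[n] for n in range(1, len(d))]
    return sum(z) / len(z)


rng = random.Random(1)
bits = [rng.choice((-1, 1)) for _ in range(20000)]
post_heavy = channel(bits, h_pre=0.1, h0=1.0, h_post=0.3)  # h1 > h-1
balanced = channel(bits, h_pre=0.2, h0=1.0, h_post=0.2)    # MM lock condition
print(f"{mm_timing_error(post_heavy, bits):+.3f} {mm_timing_error(balanced, bits):+.3f}")
```

    The h0 terms cancel exactly in z[n], so the detector needs no edge samples; the sign of its average tells the loop which way to move the sampling phase.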

    Contactless Test Access Mechanism for 3D IC

    3D IC integration offers many advantages over current 2D IC integration. It can reduce power consumption and physical size while supporting higher bandwidth and processing speed. Through-silicon vias (TSVs) are vertical interconnects between the layers of a 3D IC, with a typical diameter of 5 μm and length of 50 μm. To test a 3D IC, an access mechanism is needed to apply test vectors to the TSVs and observe their responses. However, TSVs are too small to be accessed by current wafer probes, and direct TSV probing may damage their physical integrity. In addition, the probe needles used for direct TSV probing must be cleaned or replaced frequently. Contactless probing resolves most of these TSV probing problems and can be used for small-pitch TSVs. In this dissertation, contactless test access mechanisms for 3D ICs are explored using capacitive and inductive coupling techniques. Circuit models of the capacitive and inductive communication links are extracted from 3D full-wave simulations, and circuit-level simulations are then carried out in the Advanced Design System (ADS) design environment to verify the results. The effects of crosstalk and misalignment on the communication link have been investigated. A contactless TSV probing method using capacitive coupling is proposed and simulated, and a prototype was fabricated in TSMC 65 nm CMOS technology to verify it. Measurements on the fabricated prototype show that this TSV probing scheme has -55 dB insertion loss at 1 GHz and maintains a signal-to-noise ratio above 35 dB within a 5 µm distance. A microscale contactless probe based on resonant inductive coupling has also been designed and simulated. Measurements on a prototype fabricated in TSMC 65 nm CMOS technology indicate that the data signal on the TSV can be reconstructed when the distance between the TSV and the probe stays below 15 µm.
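
    Two quick calculations help put the capacitive-probe figures in perspective, sketched here in Python. The dB conversions apply directly to the reported -55 dB insertion loss and 35 dB SNR. The coupling-capacitance estimate is a crude parallel-plate model that assumes a 5 µm circular TSV face, an air gap, and no fringing fields, so it only indicates the trend that coupling falls roughly inversely with distance, not the prototype's actual values.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m


def db_to_amplitude(db: float) -> float:
    """Linear voltage ratio for a figure quoted in dB (20*log10 convention)."""
    return 10 ** (db / 20)


def plate_capacitance(diameter_m: float, gap_m: float, eps_r: float = 1.0) -> float:
    """Parallel-plate estimate C = eps0*eps_r*A/gap for a circular plate; no fringing."""
    area = math.pi * (diameter_m / 2) ** 2
    return EPS0 * eps_r * area / gap_m


print(f"-55 dB insertion loss -> {db_to_amplitude(-55):.2e} of the input amplitude")
print(f"35 dB SNR -> signal/noise amplitude ratio {db_to_amplitude(35):.1f}")
for gap_um in (1, 5):
    c_ff = plate_capacitance(5e-6, gap_um * 1e-6) * 1e15
    print(f"5 um plate, {gap_um} um gap: about {c_ff:.3f} fF")
```

    Sub-femtofarad coupling of this order is one reason the received signal is so strongly attenuated and the probe must stay within a few micrometres of the TSV.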

    High Speed Reconfigurable NRZ/PAM4 Transceiver Design Techniques

    While most wireline standards use simple binary non-return-to-zero (NRZ) signaling, four-level pulse-amplitude modulation (PAM4) standards are emerging to increase bandwidth density. This dissertation proposes efficient implementations of high-speed NRZ/PAM4 transceivers. The first prototype includes a dual-mode NRZ/PAM4 serial I/O transmitter that supports both modulations with minimal power and hardware overhead. A source-series-terminated (SST) transmitter achieves a 1.2 Vpp output swing and uses lookup-table (LUT) control of a 31-segment output digital-to-analog converter (DAC) to implement 4-tap and 2-tap feed-forward equalization (FFE) in the NRZ and PAM4 modes, respectively. Transmitter power is reduced with low-overhead analog impedance control in the DAC cells and a quarter-rate serializer based on a tri-state-inverter mux with dynamic pre-driver gates. The transmitter is designed to work with a receiver that implements an NRZ/PAM4 decision-feedback equalizer (DFE) with 1 finite impulse response (FIR) tap and 2 infinite impulse response (IIR) taps for first-post-cursor and long-tail ISI cancellation, respectively. Fabricated in GP 65 nm CMOS, the transmitter occupies 0.060 mm² and achieves 16 Gb/s NRZ and 32 Gb/s PAM4 operation at 10.4 and 4.9 mW/Gb/s over channels with 27.6 and 13.5 dB loss at Nyquist, respectively. The second prototype is a 56 Gb/s PAM4 quarter-rate wireline receiver implemented in a 65 nm CMOS process. The front end uses a single-stage continuous-time linear equalizer (CTLE) to boost the main cursor and relax the pre-cursor cancellation requirement, so that only a 2-tap pre-cursor FFE is needed on the transmitter side. A 2-tap DFE with one FIR tap and one IIR tap cancels the first post-cursor and long-tail inter-symbol interference (ISI). The direct feedback of the FIR tap is implemented inside the CML slicers to relax the critical DFE timing and maximize the achievable data rate. In addition to the three main data samplers per slice, an error sampler is used for background threshold control, and an edge sampler performs PLL-based CDR phase detection while also generating information for background DFE tap adaptation. The receiver consumes 4.63 mW/Gb/s and compensates for up to 20.8 dB loss when operated with a 2-tap FFE transmitter. The experimental results and comparison with the state of the art show superior power efficiency of the presented prototypes for similar data rates and channel loss. The proposed design techniques are not limited to these prototypes and can be applied to wireline transceivers with other modulations, data rates, and CMOS technologies.
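
    The receiver's combination of one FIR tap and one IIR tap can be illustrated with an NRZ Python sketch. It assumes, purely for illustration, a channel whose post-cursor ISI beyond h1 decays geometrically (h_k = a·r^(k-2) for k ≥ 2), which a first-order recursion can cancel exactly; the coefficient values are hypothetical and treated as already adapted, and the sketch says nothing about how the IIR tap is realized in the actual circuits.

```python
import random


def channel(bits, h0, h1, a, r, n_taps=40):
    """y[n] = h0*d[n] + h1*d[n-1] + sum_{k>=2} a*r**(k-2)*d[n-k], truncated."""
    taps = [h0, h1] + [a * r ** (k - 2) for k in range(2, n_taps)]
    return [sum(t * bits[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(bits))]


def dfe_fir_iir(y, h1, a, r):
    """Slice y[n] after removing an FIR h1 tap and a first-order IIR tail estimate."""
    decisions = []
    tail = 0.0  # running estimate of sum_{k>=2} a*r**(k-2)*d[n-k]
    for n, sample in enumerate(y):
        d1 = decisions[n - 1] if n >= 1 else 0
        d2 = decisions[n - 2] if n >= 2 else 0
        tail = a * d2 + r * tail      # IIR tap: one multiply-add per UI
        z = sample - h1 * d1 - tail   # equalized sample
        decisions.append(1 if z >= 0 else -1)
    return decisions


rng = random.Random(7)
bits = [rng.choice((-1, 1)) for _ in range(2000)]
y = channel(bits, h0=1.0, h1=0.6, a=0.3, r=0.7)
print(dfe_fir_iir(y, h1=0.6, a=0.3, r=0.7) == bits)
```

    With these numbers the worst-case ISI (0.6 + 0.3/(1 - 0.7) = 1.6) exceeds the main cursor, yet one FIR coefficient plus a two-parameter recursion removes it, whereas an FIR-only DFE would need many taps to cover the same tail.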

    PHY Link Design and Optimization For High-Speed Low-Power Communication Systems

    The ever-growing demand for high-bandwidth data transfer has been driving research on high-performance communication systems. Single-chip performance improvements, e.g., faster multi-core processors and larger system memory capacity, have been explored. To further enhance system performance, research has focused on improving the data-transfer bandwidth of chip-to-chip communication over high-speed serial links. Many solutions have been proposed to overcome the bottlenecks caused by non-idealities such as the bandwidth-limited electrical channel connecting two link devices and various kinds of undesired noise in the communication system. Nevertheless, even with these solutions, high-speed interfaces running at multi-gigabit-per-second data rates on low-cost printed circuit board (PCB) material with a constrained power budget run into timing-margin limitations. Therefore, the challenge is to design a physical layer (PHY) link for high-speed communication systems that is power-efficient, reliable, and cost-effective. In this context, this dissertation focuses on the architectural design and the system-level and circuit-level verification of a PHY link, as well as on optimizing system performance with respect to power, reliability, and adaptability in high-speed communication systems. The PHY mainly consists of clock and data recovery (CDR), equalizers (EQs), and high-speed I/O drivers. The symmetrical structure of the PHY link is usually duplicated in both link devices for bidirectional data transmission. By introducing training mechanisms into high-speed communication systems, the timing in one link device is adaptively aligned to the timing condition specified in the other link device, despite different skews or jitter induced by process, voltage, and temperature (PVT) variations in the individual link. With reliable timing relationships among the interface signals, the total system bandwidth is dramatically improved. In addition, interface training offers high flexibility for reuse without further investigation of demanding, high-cost components. In training mode, a CDR module is essential for reconstructing the transmitted bitstream with the best data eye and for detecting the edges of the data stream in asynchronous or source-synchronous systems. In general, the CDR works as a feedback control system that aligns its output clock to the center of the received data. In systems with multiple data links, the overall CDR power consumption increases linearly with the number of links, since each link requires its own CDR. Therefore, a power-efficient CDR plays a significant role in systems with parallel links. Furthermore, a high-performance CDR must generate low jitter despite high input jitter. To ease the trade-off between power consumption and CDR jitter, a novel CDR architecture is proposed that uses a proportional-integral (PI) controller and a three-times sampling scheme. Meanwhile, signal integrity (SI) becomes critical as the data rate exceeds several gigabits per second. Data distorted by system non-idealities can severely degrade signal quality and, in the worst case, cause intolerable transmission errors, reducing the effective system bandwidth. Hence, additional trainings for SI, such as transmitter (Tx) and receiver (Rx) EQ trainings, are inserted into the interface training. Besides, a simplified system architecture with an asymmetrical placement of adaptive Rx and Tx EQs in a single link device is proposed and analyzed using different coefficient adaptation algorithms. Through training, this architecture makes it possible to eliminate a large number of EQs, especially with parallel links, saving considerable power and chip area. Finally, a high-speed I/O driver that is robust against PVT variations is discussed. Critical issues such as overshoot and undershoot that interfere with the data are primarily caused by impedance mismatch between the I/O driver and the channel it drives. By applying a PVT compensation technique, the I/O driver impedance can be calibrated close to its target value. Different digital impedance calibration algorithms against PVT variations are implemented and compared, with the goals of fast calibration and low power.
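
    The contrast between a slow and a fast digital impedance calibration can be sketched in Python as a linear code sweep versus a successive-approximation (binary) search. Both are generic textbook algorithms chosen for illustration, not necessarily the ones compared in the dissertation; the parallel-leg driver model, its conductance values, the 6-bit code, the 50 Ω target, and the PVT scale factor are all hypothetical.

```python
G_BASE = 1 / 200.0   # always-on leg conductance in siemens (hypothetical)
G_STEP = 1 / 1600.0  # conductance added per code LSB in siemens (hypothetical)


def driver_impedance(code: int, pvt_scale: float) -> float:
    """Parallel legs, so resistance falls as the code rises.

    pvt_scale > 1 models a slow corner in which every leg is more resistive.
    """
    return pvt_scale / (G_BASE + code * G_STEP)


def calibrate_linear(target: float, pvt_scale: float, n_bits: int = 6):
    """Sweep the code upward until R <= target; returns (code, comparisons)."""
    for code in range(2 ** n_bits):
        if driver_impedance(code, pvt_scale) <= target:
            return code, code + 1
    return 2 ** n_bits - 1, 2 ** n_bits


def calibrate_sar(target: float, pvt_scale: float, n_bits: int = 6):
    """Successive approximation, MSB first; returns (code, comparisons)."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if driver_impedance(trial, pvt_scale) > target:  # still too resistive
            code = trial
    # code is now the largest setting still above target (or 0).
    if driver_impedance(code, pvt_scale) > target:
        code = min(code + 1, 2 ** n_bits - 1)
    return code, n_bits + 1


for scale in (0.9, 1.1):  # hypothetical fast and slow corners
    print(scale, calibrate_linear(50.0, scale), calibrate_sar(50.0, scale))
```

    Both searches land on the same code, but for a 6-bit code the sweep may need up to 64 comparisons, while the successive-approximation search always uses 7 here (6 bit decisions plus a final check).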

    Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

    Mobility is key to global business, which requires people to be always connected to a central server. With the exponential increase in smartphones, tablets, and laptops, mobile traffic will soon reach the range of exabytes per month by 2018. Applications such as video streaming, video on demand, online gaming, and social media will further increase the traffic load. Future application scenarios such as smart cities, Industry 4.0, and machine-to-machine (M2M) communication bring in the concept of the Internet of Things (IoT), which requires high-speed, low-power communication infrastructure. Scientific applications such as space and oil exploration also require computing speeds in the exaFLOPS range by 2018, which means TB/s of bandwidth at each memory node. To achieve such bandwidth, the input/output (I/O) link speed between two devices needs to be increased to GB/s. Data can be transferred between devices at high speed either serially, using complex clock-data-recovery (CDR) I/O links, or in parallel, using simple source-synchronous I/O links. Although CDR is more efficient than the source-synchronous method for a single I/O link, achieving TB/s of bandwidth from a single device requires additional I/O links, and there the source-synchronous method is more advantageous in area and power, because additional I/O links do not require extra hardware resources. At high speed, several non-idealities (supply noise, crosstalk, inter-symbol interference (ISI), etc.) cause unwanted skew among parallel source-synchronous I/O links. To solve these problems, adaptive training is used in the time domain to synchronize parallel source-synchronous I/O links regardless of these non-idealities. This thesis discusses two novel adaptive training architectures for source-synchronous I/O links that require significantly less silicon area and power than state-of-the-art architectures. The first is based on a unit-delay concept and synchronizes two parallel clocks by adjusting the phase of one clock in only one direction. The second consists of a phase interpolator (PI)-based phase-locked loop (PLL) that can adjust the phase in both directions and achieves faster synchronization at the expense of added complexity. As the number of parallel I/O links increases, clock skew generated by an improper clock tree also reduces the timing margin. An incorrect duty cycle further reduces the timing margin, mainly in double data rate (DDR) systems, which are generally used to increase the bandwidth of a high-speed communication system. To solve the clock-skew and duty-cycle problems, a novel clock-tree buffering algorithm and a novel duty-cycle corrector are described, which further reduce the power consumption of a source-synchronous system.
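
    The speed trade-off between the two training architectures can be illustrated with a Python sketch that counts correction steps. It is a behavioral toy model, not either architecture from the thesis: skew is given in integer picoseconds, the step size and clock period are hypothetical, and an ideal early/late decision is assumed. A delay-only (one-direction) adjustment must wrap through almost a full period when the clock being delayed is actually late, while a bidirectional phase adjustment never moves more than half a period.

```python
def align_one_direction(skew_ps, period_ps, unit_ps):
    """Delay clock A in unit steps until its edge lines up with clock B's.

    skew_ps is how far clock B trails clock A (negative: B leads A). Because A
    can only be delayed, a negative skew is fixed by wrapping to the next edge.
    Returns (steps taken, residual misalignment in ps).
    """
    remaining = skew_ps % period_ps  # delay A must still gain, in [0, period)
    steps = 0
    while remaining >= unit_ps / 2:  # ideal detector says "A is still early"
        remaining -= unit_ps
        steps += 1
    return steps, remaining


def align_two_directions(skew_ps, period_ps, unit_ps):
    """Steps needed when the phase can be moved either way (shortest path)."""
    remaining = skew_ps % period_ps
    if remaining > period_ps / 2:
        remaining -= period_ps  # advancing A is shorter than delaying it
    return round(abs(remaining) / unit_ps)


for skew in (23, -23):
    print(skew, align_one_direction(skew, 200, 5), align_two_directions(skew, 200, 5))
```

    For a +23 ps skew both approaches take 5 steps, but for -23 ps the delay-only adjustment takes 35 steps against 5, mirroring the abstract's point that bidirectional adjustment synchronizes faster at the cost of extra complexity.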

    Design of a High-Density, Low-Power Transceiver for Next-Generation HBM

    Ph.D. dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2020. 정덕균. This thesis presents design techniques for a high-density, power-efficient transceiver for next-generation high bandwidth memory (HBM). Unlike other memory interfaces, HBM uses a 3D-stacked package with through-silicon vias (TSVs) and a silicon interposer, so its transceiver must solve the problems caused by the 3D-stacked package and the TSVs. First, a data (DQ) receiver for HBM is proposed with a self-tracking loop that tracks the phase skew between DQ and the data strobe (DQS) caused by voltage or thermal drift. The self-tracking loop achieves low power and small area by utilizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage difference and detects the skew from that difference. An offset calibration scheme that compensates for PD mismatch is also proposed; it operates without any additional sensing circuits by taking advantage of the HBM write training. Fabricated in 65 nm CMOS, the DQ receiver achieves a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm². The experimental results show that the DQ receiver operates without any performance degradation under a ±10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is presented. To compensate for XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Because the proposed XT cancellation (XTC) scheme reuses the FFE already implemented to equalize the channel loss, the additional circuitry for XTC is minimized. Thanks to the XTC scheme, the channel pitch can be significantly reduced, allowing high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between vertically adjacent channels, which further reduces the cross-sectional area of the channel per lane. The test chip, including 6 data lanes, is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The XTC scheme is verified by simultaneously transmitting 4-Gb/s data over 6 adjacent channels with 0.5-µm pitch, and it reduces the XT-induced jitter by up to 78%. The measurement results show that the transceiver achieves a throughput of 8 Gb/s/µm. The transceiver occupies 0.05 mm² for 6 lanes and consumes 36.6 mW at 6 × 4 Gb/s.
    CHAPTER 1 INTRODUCTION
      1.1 MOTIVATION
      1.2 THESIS ORGANIZATION
    CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY
      2.1 OVERVIEW
      2.2 TRANSCEIVER ARCHITECTURE
      2.3 READ/WRITE OPERATION
        2.3.1 READ OPERATION
        2.3.2 WRITE OPERATION
    CHAPTER 3 BACKGROUNDS ON COUPLED WIRES
      3.1 GENERALIZED MODEL
      3.2 EFFECT OF CROSSTALK
    CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP
      4.1 OVERVIEW
      4.2 FEATURES OF DQ RECEIVER FOR HBM
      4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR
        4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR
        4.3.2 OFFSET CALIBRATION
        4.3.3 OPERATION SEQUENCE
      4.4 CIRCUIT IMPLEMENTATION
      4.5 MEASUREMENT RESULT
    CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME
      5.1 OVERVIEW
      5.2 PROPOSED 3D-STAGGERED CHANNEL
        5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL
        5.2.2 CHANNEL CHARACTERISTICS AND MODELING
      5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME
      5.4 CIRCUIT IMPLEMENTATION
        5.4.1 OVERALL ARCHITECTURE
        5.4.2 TRANSMITTER WITH FFE-COMBINED XTC
        5.4.3 RECEIVER
      5.5 MEASUREMENT RESULT
    CHAPTER 6 CONCLUSION
    BIBLIOGRAPHY
    ABSTRACT (IN KOREAN)
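
    The FFE-combined crosstalk cancellation can be sketched in Python with a deliberately simple model: each aggressor couples onto the victim in proportion to its own transition in the same UI, with a coupling coefficient k_xt, and the channel is otherwise ideal. The coupling model, the 2-tap FFE weights, k_xt, and the function names are hypothetical; the point is only that the XT term can be folded into the same weighted sum the FFE already computes, so no separate cancellation path is needed.

```python
def previous(seq, n):
    """Symbol one UI earlier, treating the line as idle (0) before the first UI."""
    return seq[n - 1] if n >= 1 else 0


def transitions(bits):
    """Per-UI change of a lane's level; the assumed XT source."""
    return [b - previous(bits, n) for n, b in enumerate(bits)]


def ffe(bits, c0, c1):
    """2-tap FFE: out[n] = c0*d[n] + c1*d[n-1]."""
    return [c0 * b + c1 * previous(bits, n) for n, b in enumerate(bits)]


def ffe_with_xtc(victim, aggressors, c0, c1, k_xt):
    """Pre-distort the FFE output by the XT the neighbors are expected to add."""
    out = ffe(victim, c0, c1)
    for agg in aggressors:
        for n, t in enumerate(transitions(agg)):
            out[n] -= k_xt * t
    return out


def coupled_channel(tx_victim, aggressors, k_xt):
    """Victim waveform at the far end: own signal plus transition-driven XT."""
    rx = list(tx_victim)
    for agg in aggressors:
        for n, t in enumerate(transitions(agg)):
            rx[n] += k_xt * t
    return rx


victim = [1, -1, -1, 1, 1, -1]
left, right = [1, 1, -1, -1, 1, 1], [-1, 1, 1, -1, -1, 1]
rx = coupled_channel(ffe_with_xtc(victim, [left, right], 0.75, -0.25, 0.125),
                     [left, right], 0.125)
print(rx == ffe(victim, 0.75, -0.25))
```

    As a consistency check on the reported figures, 6 lanes × 4 Gb/s = 24 Gb/s over 6 channels at 0.5 µm pitch (3 µm, assuming the bundle width is taken as lanes × pitch) gives the stated 8 Gb/s/µm, and 36.6 mW / 24 Gb/s is about 1.53 pJ/b.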

    Multichannel 25 Gb/s low-power driver and transimpedance amplifier integrated circuits for 100 Gb/s optical links

    Highly integrated electronic driver and receiver ICs with low power consumption are essential for developing cost-effective multichannel fiber-optic transceivers with a small form factor. This paper presents the latest results of a two-channel 28 Gb/s driver array for optical duobinary modulation and a four-channel 25 Gb/s TIA array suited for both NRZ and optical duobinary detection. It demonstrates that 28 Gb/s duobinary signals can be generated efficiently on chip with a delay-and-add digital filter, and that the driver power consumption can be significantly reduced by optimizing the drive impedance to a value well above 50 Ω without degrading the signal quality. To the best of our knowledge, this is the fastest modulator driver with on-chip duobinary encoding and precoding, consuming only 652 mW per channel at a differential output swing of 6 Vpp. The 4 × 25 Gb/s TIA shows a good sensitivity of -10.3 dBm average optical input power at 25 Gb/s for a PRBS 2^31 - 1 pattern and a low power consumption of 77 mW per channel. Both ICs were developed in a 130 nm SiGe BiCMOS process.
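
    The on-chip duobinary encoding with precoding can be illustrated with a short Python sketch on binary data. It shows the textbook scheme (a differential precoder followed by a delay-and-add filter) rather than the paper's circuit, and the function names are hypothetical. The precoder is what allows each bit to be recovered from a single three-level symbol without reference to earlier decisions, so a symbol error does not propagate.

```python
def precode(bits):
    """Differential precoder p[n] = d[n] XOR p[n-1], with p[-1] = 0."""
    out, prev = [], 0
    for b in bits:
        prev = b ^ prev
        out.append(prev)
    return out


def delay_and_add(p):
    """Duobinary encoder y[n] = p[n] + p[n-1]: three levels {0, 1, 2}."""
    return [x + (p[n - 1] if n else 0) for n, x in enumerate(p)]


def decode(y):
    """With precoding, each bit is recovered symbol by symbol as y mod 2."""
    return [v % 2 for v in y]


bits = [1, 0, 1, 1, 0, 0, 1]
y = delay_and_add(precode(bits))
print(y, decode(y) == bits)
```

    The decoder works because y[n] mod 2 equals p[n] XOR p[n-1], which the precoder made equal to d[n].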

    Wideband integrated circuits for optical communication systems

    The exponential growth of internet traffic drives datacenters to constantly improve their capacity. Several research and industrial organizations are aiming toward Tbps Ethernet and beyond, which brings new challenges to the field of high-speed broadband electronic circuit design. With datacenters rapidly becoming significant energy consumers on a global scale, the energy efficiency of optical interconnect transceivers plays a primary role in the development of novel systems. Furthermore, wideband optical links are finding application inside very high throughput satellite (V/HTS) payloads used in the ever-expanding cloud of telecommunication satellites, enabled by the maturity of existing fiber-based optical links and the high technology readiness level of radiation-hardened integrated circuit processes. There are several additional challenges unique to the design of a wideband optical system. The overall system noise must be optimized for the specific application, modulation scheme, and photodiode (PD) and laser characteristics. Most state-of-the-art wideband circuits are built on high-end SiGe and InP semiconductor technologies. However, each technology demands specific design decisions to obtain low noise, high energy efficiency, and adequate bandwidth. To overcome the frequency limitations of the optoelectronic components, bandwidth-enhancement and channel-equalization techniques are used. In this work, various blocks of optical communication systems are designed to tackle some of the aforementioned challenges. Two TIA front-end topologies with 133 GHz bandwidth, a common-base (CB) stage and a common-emitter (CE) stage with shunt-shunt feedback, are designed and measured in a state-of-the-art 130 nm InP DHBT technology. A modular equalizer block built in 130 nm SiGe HBT technology is presented. Three ultra-wideband traveling-wave amplifiers (a 4-cell, a single-cell, and a matrix single-stage design) are designed in a 250 nm InP DHBT process to test the limits of distributed amplification. A differential VCSEL driver circuit is designed and integrated in a 4 × 28 Gbps transceiver system for intra-satellite optical communications based on a rad-hard 130 nm SiGe process.