52 research outputs found

    High-Speed Clocking Deskewing Architecture

    Get PDF
    As the CMOS technology continues to scale into the deep sub-micron regime, the demand for higher frequencies and higher levels of integration poses a significant challenge for the clock generation and distribution design of microprocessors. Hence, skew optimization schemes are necessary to limit clock inaccuracies to a small fraction of the clock period. In this thesis, a crude deskew buffer (CDB) is designed to facilitate an adaptive deskewing scheme that reduces the clock skew in an ASIC clock network under manufacturing process, supply voltage, and temperature (PVT)variations. The crude deskew buffer adopts a DLL structure and functions on a 1GHz nominal clock frequency with an operating frequency range of 800MHz to 1.2GHz. An approximate 91.6ps phase resolution is achieved for all simulation conditions including various process corners and temperature variation. When the crude deskew buffer is applied to seven ASIC clock networks with each under various PVT variations, a maximum of 67.1% reduction in absolute maximum clock skew has been achieved. Furthermore, the maximum phase difference between all the clock signals in the seven networks have been reduced from 957.1ps to 311.9ps, a reduction of 67.4%. Overall, the CDB serves two important purposes in the proposed deskewing methodology: reducing the absolute maximum clock skew and synchronizes all the clock signals to a certain limit for the fine deskewing scheme. By generating various clock phases, the CDB can also be potentially useful in high speed debugging and testing where the clock duty cycle can be adjusted accordingly. Various positive and negative duty cycle values can be generated based on the phase resolution and the number of clock phases being “hot swapped”. For a 500ps duty cycle, the following values can be achieved for both the positive and negative duty cycle: 224ps, 316ps, 408ps, 592ps, 684ps, and 776ps

    Precise Timing of Digital Signals: Circuits and Applications

    Get PDF
    With the rapid advances in process technologies, the performance of state-of-the-art integrated circuits is improving steadily. The drive for higher performance is accompanied with increased emphasis on meeting timing constraints not only at the design phase but during device operation as well. Fortunately, technology advancements allow for even more precise control of the timing of digital signals, an advantage which can be used to provide solutions that can address some of the emerging timing issues. In this thesis, circuit and architectural techniques for the precise timing of digital signals are explored. These techniques are demonstrated in applications addressing timing issues in modern digital systems. A methodology for slow-speed timing characterization of high-speed pipelined datapaths is proposed. The technique uses a clock-timing circuit to create shifted versions of a slow-speed clock. These clocks control the data flow in the pipeline in the test mode. Test results show that the design provides an average timing resolution of 52.9ps in 0.18μm CMOS technology. Results also demonstrate the ability of the technique to track the performance of high-speed pipelines at a reduced clock frequency and to test the clock-timing circuit itself. In order to achieve higher resolutions than that of an inverter/buffer stage, a differential (vernier) delay line is commonly used. To allow for the design of differential delay lines with programmable delays, a digitally-controlled delay-element is proposed. The delay element is monotonic and achieves a high degree of transfer characteristics' (digital code vs. delay) linearity. Using the proposed delay element, a sub-1ps resolution is demonstrated experimentally in 0.18μm CMOS. The proposed delay element with a fixed delay step of 2ps is used to design a high-precision all-digital phase aligner. High-precision phase alignment has many applications in modern digital systems such as high-speed memory controllers, clock-deskew buffers, and delay and phase-locked loops. The design is based on a differential delay line and a variation tolerant phase detector using redundancy. Experimental results show that the phase aligner's range is from -264ps to +247ps which corresponds to an average delay step of approximately 2.43ps. For various input phase difference values, test results show that the difference is reduced to less than 2ps at the output of the phase aligner. On-chip time measurement is another application that requires precise timing. It has applications in modern automatic test equipment and on-chip characterization of jitter and skew. In order to achieve small conversion time, a flash time-to-digital converter is proposed. Mismatch between the various delay comparators limits the time measurement precision. This is demonstrated through an experiment in which a 6-bit, 2.5ps resolution flash time-to-digital converter provides an effective resolution of only 4-bits. The converter achieves a maximum conversion rate of 1.25GSa/s

    Architectural Solutions for NanoMagnet Logic

    Get PDF
    The successful era of CMOS technology is coming to an end. The limit on minimum fabrication dimensions of transistors and the increasing leakage power hinder the technological scaling that has characterized the last decades. In several different ways, this problem has been addressed changing the architectures implemented in CMOS, adopting parallel processors and thus increasing the throughput at the same operating frequency. However, architectural alternatives cannot be the definitive answer to a continuous increase in performance dictated by Moore’s law. This problem must be addressed from a technological point of view. Several alternative technologies that could substitute CMOS in next years are currently under study. Among them, magnetic technologies such as NanoMagnet Logic (NML) are interesting because they do not dissipate any leakage power. More- over, magnets have memory capability, so it is possible to merge logic and memory in the same device. However, magnetic circuits, and NML in this specific research, have also some important drawbacks that need to be addressed: first, the circuit clock frequency is limited to 100 MHz, to avoid errors in data propagation; second, there is a connection between circuit layout and timing, and in particular, longer wires will have longer latency. These drawbacks are intrinsic to the technology and for this reason they cannot be avoided. The only chance is to limit their impact from an architectural point of view. The first step followed in the research path of this thesis is indeed the choice and optimization of architectures able to deal with the problems of NML. Systolic Ar- rays are identified as an ideal solution for this technology, because they are regular structures with local interconnections that limit the long latency of wires; more- over they are composed of several Processing Elements that work in parallel, thus exploit parallelization to increase throughput (limiting the impact of the low clock frequency). Through the analysis of Systolic Arrays for NML, several possible im- provements have been identified and addressed: 1) it has been defined a rigorous way to increase throughput with interleaving, providing equations that allow to esti- mate the number of operations to be interleaved and the rules to provide inputs; 2) a latency insensitive circuit has been designed, that exploits a data communication protocol between processing elements to avoid data synchronization problems. This feature has been exploited to design a latency insensitive Systolic Array that is able to execute the Floyd-Steinberg dithering algorithm. All the improvements presented in this framework apply to Systolic Arrays implemented in any technology. So, they can also be exploited to increase performance of today’s CMOS parallel circuits. This research path is presented in Chapter 3. While Systolic Arrays are an interesting solution for NML, their usage could be quite limited because they are normally application-specific. The second re- search path addresses this problem. A Reconfigurable Systolic Array is presented, that can be programmed to execute several algorithms. This architecture has been tested implementing many algorithms, including FIR and IIR filters, Discrete Cosine Transform and Matrix Multiplication. This research path is presented in Chapter 4. In common Von Neumann architectures, the logic part of the circuit and the memory one are separated. Today bus communication between logic and memory represents the bottleneck of the system. This problem is addressed presenting Logic- In-Memory (LIM), an architecture where memory elements are merged in logic ones. This research path aims at defining a real LIM architectures. This has been done in two steps. The first step is represented by an architecture composed of three layers: memory, routing and logic. In the second step instead the routing plane is no more present, and its features are inherited by the memory plane. In this solution, a pyramidal memory model is used, where memories near logic elements contain the most probably used data, and other memory layers contain the remaining data and instruction set. This circuit has been tested with odd-even sort algorithms and it has been benchmarked against GPUs and ASIC. This research path is presented in Chapter 5. MagnetoElastic NML (ME-NML) is a technological improvement of the NML principle, proposed by researchers of Politecnico di Torino, where the clock system is based on the induced stretch of a piezoelectric substrate when a voltage is ap- plied to its boundaries. The main advantage of this solution is that it consumes much less power than the classic clock implementation. This technology has not yet been investigated from an architectural point of view and considering complex circuits. In this research field, a standard methodology for the design of ME-NML circuits has been proposed. It is based on a Standard Cell Library and an enhanced VHDL model. The effectiveness of this methodology has been proved designing a Galois Field Multiplier. Moreover the serial-parallel trade-off in ME-NML has been investigated, designing three different solutions for the Multiply and Accumulate structure. This research path is presented in Chapter 6. While ME-NML is an extremely interesting technology, it needs to be combined with other faster technologies to have a real competitive system. Signal interfaces between NML and other technologies (mainly CMOS) have been rarely presented in literature. A mixed-technology multiplexer is designed and presented as the basis for a CMOS to NML interface. The reverse interface (from ME-NML to CMOS) is instead based on a sensing circuit for the Faraday effect: a change in the polarization of a magnet induces an electric field that can be used to generate an input signal for a CMOS circuit. This research path is presented in Chapter 7. The research work presented in this thesis represents a fundamental milestone in the path towards nanotechnologies. The most important achievement is the de- sign and simulation of complex circuits with NML, benchmarking this technology with real application examples. The characterization of a technology considering complex functions is a major step to be performed and that has not yet been ad- dressed in literature for NML. Indeed, only in this way it is possible to intercept in advance any weakness of NanoMagnet Logic that cannot be discovered consid- ering only small circuits. Moreover, the architectural improvements introduced in this thesis, although technology-driven, can be actually applied to any technology. We have demonstrated the advantages that can derive applying them to CMOS cir- cuits. This thesis represents therefore a major step in two directions: the first is the enhancement of NML technology; the second is a general improvement of parallel architectures and the development of the new Logic-In-Memory paradigm

    Automatic calibration of modulated fractional-N frequency synthesizers

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 145-148).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.The focus of this research has been the development of a low power, radio frequency transmitter architecture. Specifically, a technique for in service automatic calibration of a modulated phase locked loop (PLL) frequency synthesizer has been developed. Phase/frequency modulation is accomplished by modulating the feedback divide value in a phase locked loop frequency synthesizer. A digital precompensation filter is used to extend the modulation bandwidth by canceling the low-pass transfer function of the PLL. The automatic calibration circuit maintains accurate matching between the digital precompensation filter and the analog PLL transfer function across process and temperature variations. The automatic calibration circuit, which is the main contribution of this thesis, operates while the transmitter is in service. This online calibration eliminates the need for production calibration and periodic down time for calibration cycles.(cont.) In addition the calibration circuitry provides greater accuracy in the modulation than what is possible via offline methods of calibration. The calibration circuit works with M-ary GFSK as well as 2 level GFSK. The automatic calibration circuit has been implemented in two forms to prove its operation. The first version is a circuit board level implementation with a center frequency of around 60 MHz. The second implementation of the system is in a full custom 0.6 ,Lm BiCMOS integrated circuit. The integrated circuit contains the complete synthesizer with automatic calibration and operates in the 1.88 GHz frequency band used by the Digital European Cordless Telephone (DECT) standard. A data rate of 2.5 Mbps using 2 level GFSK and 5.0 Mbps using 4 level GFSK has been achieved with a power consumption of 78 mW.by Daniel R. McMahill.Ph.D

    Control, Readout and Commissioning of the Ultra-High Speed 1 Megapixel DSSC X-Ray Camera for the European XFEL

    Get PDF
    The goal of this thesis was to develop the software and firmware basis to control and read out the 1-Megapixel DEPFET Sensor with Signal Compression (DSSC) detector, which is being built for the European XFEL (EuXFEL). The DSSC detector proposes single photon resolution at 0.5 keV photon energy, high dynamic range of 10000 photons at a very high frame rate of 4.5 MHz. During this thesis, the readout chain has been implemented, which receives the average data rate of 134 GBit/s from the detector and transfers sorted image data via four QSFP+ fiber cables to the DAQ system. Additional control software and firmware has been developed for commissioning of the first 1/16th megapixel prototypes. A multifunctional measurement and data analysis framework has been created which is used for characterization of the detector, particularly of properties of the complex readout ASICs. Additional parameter trimming routines to automatically adapt gain and offset parameters of a large pixel matrix have been developed in order to generate suitable configurations which have been applied during two measurement campaigns at the Petra III synchrotron. For integration into the new Karabo framework, which is provided by XFEL for beamline control, several software devices have been implemented

    Clock multiplication techniques for high-speed I/Os

    Get PDF
    Generation of a low-jitter, high-frequency clock from a low-frequency reference clock using classical analog phase-locked loops (PLLs) requires a large loop filter capacitor and power hungry oscillator. Digital PLLs can help reduce area but their jitter performance is severely degraded by quantization error. In this dissertation different clock multiplication techniques have been explored that can be suitable for high-speed wireline systems. With the emphasis on ring oscillator based architecture using cascaded stages, three possible architectures are explored. First, a scrambling TDC (STDC) is presented to improve deterministic jitter (DJ) performance when used with a low-frequency reference clock. A cascaded architecture with digital multiplying delay locked loop as the first stage and hybrid analog/digital PLL as the second stage is used to achieve low random jitter in a power efficient manner. Fabricated in a 90nm CMOS process, the prototype frequency synthesizer consumes 4.76mW power from a 1.0V supply and generates 160MHz and 2.56 GHz output clocks from a 1.25MHz crystal reference frequency. The long-term absolute jitter of the 60MHz digital MDLL and 2.56 GHz digital PLL outputs are 2.4 psrms and 4.18 psrms, while the peak-to-peak jitter is 22.1 ps and 35.2 ps, respectively. The proposed frequency synthesizer occupies an active die area of 0.16mm2 and achieves power efficiency of 1.86 mW/GHz. Second, a hybrid phase/current-mode phase interpolator (HPC-PI) is presented to improve phase noise performance of ring oscillator-based fractional-N PLLs. The proposed HPC-PI alleviates the bandwidth trade-off between VCO phase noise suppression and ΔΣ quantization noise suppression. By combining the phase detection and interpolation functions into an XOR phase detector/interpolator (XOR PD-PI) block, accurate quantization error cancellation is achieved without using calibration. Use of a digital MDLL in front of the fractional-N PLL helps in alleviating the bandwidth limitation due to reference frequency and enables bandwidth extension even further. The extended bandwidth helps in suppressing the ring-VCO phase noise and lowering the in-band noise floor. Fabricated in 65nm CMOS process, the prototype generates fractional frequencies from 4.25 to 4.75 GHz, with an in-band phase noise floor of -104 dBc/Hz and 1.5 psrms integrated jitter. The clock multiplier achieves power efficiency of 2.4mW/GHz and FoM of -225.8 dB. Finally, an efficient clock generation, recovery, and distribution techniques for flexible-rate transceivers are presented. Using a fixed-frequency low-jitter clock provided by an integer-N PLL, fractional frequencies are generated/recovered locally using multi-phase fractional clock multipliers. Fabricated in a 65nm CMOS, the prototype transceiver can be programmed to operate at any rate from 3-to-10 Gb/s. At 10 Gb/s, integrated jitter of the Tx output and recovered clock is 360 fsrms and 758 fsrms, respectively

    데이터 전송로 확장성과 루프 선형성을 향상시킨 다중채널 수신기들에 관한 연구

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 2. 정덕균.Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9− 28 inch Nelco 4000-6 microstrips at 4− 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

    Deliverable D4.1: VLC modulation schemes

    Get PDF
    This report presents the analysis of different modulation schemes D4.1 for VLC systems of the VIDAS project. Considering the final prototype design and application, the deliverable D4.1 was projected. The detail analysis of various modulation schemes are carried out and a robust technique based on direct sequence spread spectrum (DSSS) is followed. DSSS technique though necessitates use of high bandwidth while minimizing the effect of noise. Since the final application does not require very high dat a rate of transmission but robustness against the noise (external lights) becomes necessary. The analysis is followed by model development using Matlab/Simulink. The performance of both of these systems are compared and evaluated. Some of the simulation results are presented

    Sincronização em sistemas integrados a alta velocidade

    Get PDF
    Doutoramento em Engenharia ElectrotécnicaA distribui ção de um sinal relógio, com elevada precisão espacial (baixo skew) e temporal (baixo jitter ), em sistemas sí ncronos de alta velocidade tem-se revelado uma tarefa cada vez mais demorada e complexa devido ao escalonamento da tecnologia. Com a diminuição das dimensões dos dispositivos e a integração crescente de mais funcionalidades nos Circuitos Integrados (CIs), a precisão associada as transições do sinal de relógio tem sido cada vez mais afectada por varia ções de processo, tensão e temperatura. Esta tese aborda o problema da incerteza de rel ogio em CIs de alta velocidade, com o objetivo de determinar os limites do paradigma de desenho sí ncrono. Na prossecu ção deste objectivo principal, esta tese propõe quatro novos modelos de incerteza com âmbitos de aplicação diferentes. O primeiro modelo permite estimar a incerteza introduzida por um inversor est atico CMOS, com base em parâmetros simples e su cientemente gen éricos para que possa ser usado na previsão das limitações temporais de circuitos mais complexos, mesmo na fase inicial do projeto. O segundo modelo, permite estimar a incerteza em repetidores com liga ções RC e assim otimizar o dimensionamento da rede de distribui ção de relógio, com baixo esfor ço computacional. O terceiro modelo permite estimar a acumula ção de incerteza em cascatas de repetidores. Uma vez que este modelo tem em considera ção a correla ção entre fontes de ruí do, e especialmente util para promover t ecnicas de distribui ção de rel ogio e de alimentação que possam minimizar a acumulação de incerteza. O quarto modelo permite estimar a incerteza temporal em sistemas com m ultiplos dom ínios de sincronismo. Este modelo pode ser facilmente incorporado numa ferramenta autom atica para determinar a melhor topologia para uma determinada aplicação ou para avaliar a tolerância do sistema ao ru ído de alimentação. Finalmente, usando os modelos propostos, são discutidas as tendências da precisão de rel ogio. Conclui-se que os limites da precisão do rel ogio são, em ultima an alise, impostos por fontes de varia ção dinâmica que se preveem crescentes na actual l ogica de escalonamento dos dispositivos. Assim sendo, esta tese defende a procura de solu ções em outros ní veis de abstração, que não apenas o ní vel f sico, que possam contribuir para o aumento de desempenho dos CIs e que tenham um menor impacto nos pressupostos do paradigma de desenho sí ncrono.Distributing a the clock simultaneously everywhere (low skew) and periodically everywhere (low jitter) in high-performance Integrated Circuits (ICs) has become an increasingly di cult and time-consuming task, due to technology scaling. As transistor dimensions shrink and more functionality is packed into an IC, clock precision becomes increasingly a ected by Process, Voltage and Temperature (PVT) variations. This thesis addresses the problem of clock uncertainty in high-performance ICs, in order to determine the limits of the synchronous design paradigm. In pursuit of this main goal, this thesis proposes four new uncertainty models, with di erent underlying principles and scopes. The rst model targets uncertainty in static CMOS inverters. The main advantage of this model is that it depends only on parameters that can easily be obtained. Thus, it can provide information on upcoming constraints very early in the design stage. The second model addresses uncertainty in repeaters with RC interconnects, allowing the designer to optimise the repeater's size and spacing, for a given uncertainty budget, with low computational e ort. The third model, can be used to predict jitter accumulation in cascaded repeaters, like clock trees or delay lines. Because it takes into consideration correlations among variability sources, it can also be useful to promote oorplan-based power and clock distribution design in order to minimise jitter accumulation. A fourth model is proposed to analyse uncertainty in systems with multiple synchronous domains. It can be easily incorporated in an automatic tool to determine the best topology for a given application or to evaluate the system's tolerance to power-supply noise. Finally, using the proposed models, this thesis discusses clock precision trends. Results show that limits in clock precision are ultimately imposed by dynamic uncertainty, which is expected to continue increasing with technology scaling. Therefore, it advocates the search for solutions at other abstraction levels, and not only at the physical level, that may increase system performance with a smaller impact on the assumptions behind the synchronous design paradigm
    corecore