49 research outputs found

    A 0.45-0.7 V 1-6 Gb/s 0.29-0.58 pJ/b Source-Synchronous Transceiver Using Near-Threshold Operation

    No full text
    A low-power source-synchronous multi-Gb/s transceiver is presented. The supply voltage is aggressively scaled to reduce power, and the speed penalty of low-voltage operation is overcome by multiplexing the transmitter and receiver, which are synchronized by low-rate multi-phase clocks. Phase-spacing errors caused by device mismatches are corrected using a self-calibration scheme. The proposed phase calibration method uses a single digital DLL to calibrate all the phases, which makes it insensitive to offsets in the calibrating DLL. Fabricated in a 65-nm CMOS process, the energy efficiency and data rate of the prototype transceiver vary from 0.29 to 0.58 pJ/bit and 1 to 6 Gb/s, respectively, as the supply voltage changes from 0.45 to 0.7 V.
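    As a back-of-the-envelope illustration of what these figures imply for total link power (arithmetic only, using the corner values quoted in the abstract and assuming the low ends of the efficiency, data-rate, and voltage ranges coincide, as the phrasing suggests):

```python
# Illustrative sketch: estimate transceiver power from the quoted
# energy-efficiency and data-rate corners (not figures from the paper itself).
def link_power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Power (mW) = energy per bit (pJ/b) * data rate (Gb/s)."""
    return energy_pj_per_bit * data_rate_gbps  # pJ/b * Gb/s = mW

print(link_power_mw(0.29, 1.0))  # low-voltage corner (0.45 V): ~0.29 mW
print(link_power_mw(0.58, 6.0))  # high-voltage corner (0.7 V): ~3.48 mW
```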

    Toward realizing power scalable and energy proportional high-speed wireline links

    Get PDF
    Growing computational demand and the proliferation of cloud computing have placed high-speed serial links at center stage. Due to saturating energy-efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as the output driver, the receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing serial link power: instead of optimizing the power of individual building blocks, the power of the entire serial link is reduced by exploiting how applications use the link. It has been demonstrated that serial links in servers are underutilized; on average, they are used only 15% of the time, i.e., they are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering off the link when idle and powering it back on when needed, the power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power-state transitioning is known as the rapid-on/off approach. For rapid-on/off to be effective, the power-on time, off-state power, and power-state transition energy must all ideally be close to zero; in practice, these conditions are very difficult to achieve, and the work presented in this dissertation addresses these challenges. When this research was started (2011-12), only a couple of research papers were available in the area of rapid-on/off links; a systematic study or design of rapid power-state transitioning in serial links was not available in the literature. Since rapid-on/off with nanosecond granularity is not a standard feature of any wireline communication, even popular test equipment does not support testing such a feature, nor was any formal measurement methodology available. All these circumstances made the beginning difficult, but the challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs.

    The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy-efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce serial link power. The second contribution is identifying potential power-saving techniques and evaluating the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in the voltage-mode output driver by using a fast digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge-replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy-proportional operation over wide variations of link utilization and is, therefore, suitable for energy-efficient links. Fabricated in 90nm CMOS technology, the voltage-mode driver and the clock multiplier achieve power-on times of only 2ns and 10ns, respectively. The dissertation highlights a key trade-off in the clock multiplier architecture: fast power-on-lock capability is achieved at the cost of jitter performance.

    The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL-based clock multiplier. The phase-locked loop (PLL) based multiplier was developed to overcome the limitations of the MDLL-based approach. The proposed temperature-compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded-clock transceiver, which achieves rapid-on/off capability in the LC-PLL, the current-mode transmitter, and the receiver. It was the first reported design of a complete transceiver with an embedded-clock architecture having rapid-on/off capability. A background phase calibration technique in the PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability over a wide range of link utilization and, therefore, helps improve overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling of 44x (63.7mW to 1.43mW) with an energy-efficiency degradation of only 2.2x (9.1pJ/bit to 20.5pJ/bit) when the effective data rate (link utilization) changes by 100x (7Gb/s to 70Mb/s).

    The sixth and final contribution is the design of a temperature sensor to compensate for frequency drift caused by temperature variations during long power-off periods in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all-digital logic gates and achieves low supply sensitivity, making it suitable for integration in processor and DRAM environments. The sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold-voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in a 65nm CMOS process, the prototype operates with a supply ranging from 0.85V to 1.1V and achieves a supply sensitivity of 0.034°C/mV and an inaccuracy of ±0.9°C and ±2.3°C over 0-100°C after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3°C, a resolution FoM of 0.3 (nJ/conv)·res², and a measurement (conversion) time of 6.5μs.
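    The quoted scaling figures are internally consistent; a quick sketch (illustrative arithmetic only, using the numbers from the abstract) reproduces them:

```python
# Back-of-the-envelope check of the power-scaling figures quoted above.
full_rate_bps = 7e9      # 7 Gb/s, link fully utilized
low_rate_bps  = 70e6     # 70 Mb/s effective data rate (1% utilization)
power_full_w  = 63.7e-3  # 63.7 mW at full utilization
power_low_w   = 1.43e-3  # 1.43 mW at 1% utilization

print(power_full_w / power_low_w)               # ~44.5x power scaling
print(power_full_w / full_rate_bps * 1e12)      # ~9.1 pJ/bit at full rate
print(power_low_w / low_rate_bps * 1e12)        # ~20.4 pJ/bit at 70 Mb/s
print((power_low_w / low_rate_bps) /
      (power_full_w / full_rate_bps))           # ~2.2x efficiency degradation
```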

    Design of Energy-Efficient A/D Converters with Partial Embedded Equalization for High-Speed Wireline Receiver Applications

    Get PDF
    As the data rates of wireline communication links increase, channel impairments such as skin effect, dielectric loss, fiber dispersion, reflections, and cross-talk become more pronounced. This warrants more interest in analog-to-digital converter (ADC)-based serial link receivers, as they allow for more complex and flexible back-end digital signal processing (DSP) relative to binary or mixed-signal receivers. Utilizing this back-end DSP allows for complex digital equalization and more bandwidth-efficient modulation schemes, while also displaying reduced process/voltage/temperature (PVT) sensitivity. Furthermore, these architectures offer straightforward design translation and can directly leverage the area and power scaling offered by new CMOS technology nodes. However, the power consumption of the ADC front-end and the subsequent digital signal processing is a major issue. Embedding partial equalization inside the front-end ADC can potentially lower the complexity of the back-end DSP and/or decrease the ADC resolution requirement, resulting in a more energy-efficient receiver. This dissertation presents efficient implementations of multi-GS/s time-interleaved ADCs with partial embedded equalization. The first prototype details a 6b 1.6GS/s ADC with a novel embedded redundant-cycle 1-tap DFE structure in 90nm CMOS. The other two prototypes describe more complex 6b 10GS/s ADCs with efficiently embedded feed-forward equalization (FFE) and decision feedback equalization (DFE) in 65nm CMOS. Leveraging a time-interleaved successive-approximation ADC architecture, new structures for embedded DFE and FFE are proposed with low power/area overhead. Measurement results over FR4 channels verify the effectiveness of the proposed embedded equalization schemes. A comparison of the fabricated prototypes against state-of-the-art general-purpose ADCs in a similar speed/resolution range shows comparable performance, while the proposed architectures include embedded equalization as well.
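    As background on what a decision feedback equalizer embedded in such a receiver does, here is a minimal behavioral sketch of a generic 1-tap DFE (a textbook model, not the redundant-cycle structure proposed in the dissertation):

```python
import numpy as np

def dfe_1tap(samples, tap, threshold=0.0):
    """Behavioral 1-tap DFE: subtract the post-cursor ISI contributed by the
    previous decision before slicing the current sample."""
    decisions, prev = [], -1.0          # assume the symbol before the burst was -1
    for x in samples:
        y = x - tap * prev              # cancel the first post-cursor ISI term
        bit = 1.0 if y > threshold else -1.0
        decisions.append(bit)
        prev = bit                      # decision feeds back for the next sample
    return np.array(decisions)

# Example: NRZ symbols through a channel with 30% first post-cursor ISI.
sym = np.array([1, -1, -1, 1, 1, -1, 1, -1], dtype=float)
rx = sym + 0.3 * np.concatenate(([0.0], sym[:-1]))  # add post-cursor ISI
print(dfe_1tap(rx, tap=0.3))                         # recovers the transmitted symbols
```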

    Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

    Get PDF
    Mobility is key to global business, which requires people to be always connected to a central server. With the exponential increase in smartphones, tablets, and laptops, mobile traffic will soon reach the range of exabytes per month by 2018. Applications such as video streaming, video on demand, online gaming, and social media will further increase the traffic load. Future application scenarios, such as Smart Cities, Industry 4.0, and Machine-to-Machine (M2M) communication, bring in the concept of the Internet of Things (IoT), which requires high-speed, low-power communication infrastructure. Scientific applications, such as space exploration and oil exploration, also require computing speeds in the range of exaflops by 2018, which implies TB/s bandwidth at each memory node. To achieve such bandwidth, the Input/Output (I/O) link speed between two devices needs to be increased to GB/s. Data can be transferred between devices at high speed either serially, using complex Clock-Data-Recovery (CDR) I/O links, or in parallel, using simple source-synchronous I/O links. Even though CDR is more efficient than the source-synchronous method for a single I/O link, achieving TB/s bandwidth from a single device requires additional I/O links, and the source-synchronous method then becomes more advantageous in terms of area and power because the additional I/O links do not require extra hardware resources. At high speed, several non-idealities (supply noise, crosstalk, inter-symbol interference (ISI), etc.) create unwanted skew among parallel source-synchronous I/O links. To solve these problems, adaptive training is used in the time domain to synchronize parallel source-synchronous I/O links irrespective of these non-idealities. In this thesis, two novel adaptive training architectures for source-synchronous I/O links are discussed which require significantly less silicon area and power than state-of-the-art architectures. The first architecture is based on a unit-delay concept that synchronizes two parallel clocks by adjusting the phase of one clock in only one direction. The second architecture consists of a Phase Interpolator (PI)-based Phase-Locked Loop (PLL), which can adjust the phase in both directions and achieves faster synchronization at the expense of added complexity. As the number of parallel I/O links increases, clock skew generated by an improper clock tree also affects the timing margin. An incorrect duty cycle further reduces the timing margin, mainly in Double Data Rate (DDR) systems, which are generally used to increase the bandwidth of a high-speed communication system. To solve the clock skew and duty cycle problems, a novel clock tree buffering algorithm and a novel duty cycle corrector are described, which further reduce the power consumption of a source-synchronous system.
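    A minimal behavioral sketch of the one-directional, unit-delay style of training described above (a generic model with hypothetical helper names, not the circuit from the thesis):

```python
def train_unit_delay(sample_early, max_steps=128):
    """One-directional phase training: keep adding unit delays on the strobe
    path until the strobe no longer samples the data lane too early.

    sample_early(delay_steps) -> True while the strobe is still early; this
    callback is a stand-in for the link hardware or a testbench.
    """
    delay = 0
    while delay < max_steps and sample_early(delay):
        delay += 1            # phase can only move in one direction
    return delay              # number of unit delays locked in

# Toy usage: pretend the strobe needs 17 unit delays to land in the data eye.
print(train_unit_delay(lambda d: d < 17))  # -> 17
```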

    Design of clock and data recovery circuits for energy-efficient short-reach optical transceivers

    Get PDF
    Nowadays, the increasing demand for cloud-based computing and social media services mandates higher throughput (at least 56 Gb/s per data lane, with 400 Gb/s total capacity) for short-reach optical links (with a reach typically less than 2 km) inside data centres. The immediate consequence is huge and power-hungry data centers. To address these issues, intra-data-center connectivity by means of optical links needs continuous upgrading. In recent years, the trend in the industry has shifted toward more complex modulation formats like PAM4 due to their spectral efficiency over traditional NRZ. Another advantage is the reduced channel count, which is more cost-effective considering the required area and I/O density. However, employing PAM4 results in more complex transceiver circuitry due to the presence of multilevel transitions and a reduced noise budget. In addition, providing higher speed while accommodating the stringent requirements of higher density and energy efficiency (< 5 pJ/bit) makes the design of optical links more challenging and requires innovative design techniques at both the system and circuit level. This work presents the design of a Clock and Data Recovery circuit (CDR) as one of the key building blocks of the transceiver modules used in such fibre-optic links. Capable of working with the PAM4 signalling format, the proposed CDR architecture targets data rates of 50-56 Gb/s while achieving the required energy efficiency (< 5 pJ/bit). At the system level, the design proposes a new PAM4 phase detector (PD) which provides a better trade-off in terms of bandwidth and systematic jitter generation in the CDR. By using a digital loop controller (DLC), the CDR gains considerable area reduction along with the flexibility to adjust the loop dynamics. At the circuit level, it focuses on applying different circuit techniques to mitigate circuit imperfections. It presents a wideband analog front end (AFE), suitable for a 56 Gb/s, 28-Gbaud PAM-4 signal, using an 8x interleaved, master/slave-based sample-and-hold circuit. In addition, the AFE is equipped with a calibration scheme which corrects the errors associated with the sampling channels' offset voltage and gain mismatches. The presented digital-to-phase converter (DPC) features a modified phase interpolator (PI), a new quadrature phase corrector (QPC), and a multi-phase output with de-skewing capabilities. The DPC (as a standalone block) and the CDR (as the main focus of this work) were fabricated in 65-nm CMOS technology. Based on the measurements, the DPC achieves DNL/INL of 0.7/6 LSB, respectively, while consuming 40.5 mW from a 1.05 V supply. Although the CDR was not fully operational with a PAM4 input, results from a 25-Gbaud PAM2 (NRZ) test setup were used to estimate its performance. Under this scenario, the 1-UI JTOL bandwidth was measured to be 2 MHz with a BER threshold of 10^-4. The chip consumes 236 mW while operating on a 1-1.2 V supply range, achieving an energy efficiency of 4.27 pJ/bit.
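    As background on what the digital loop controller (DLC) in such a CDR does, here is a minimal behavioral sketch of a bang-bang proportional-plus-integral loop updating a phase-interpolator code; the gains and code width are illustrative assumptions, not values from this design:

```python
def dlc_step(pd_early, state, kp=4, ki=1, pi_bits=7):
    """One update of a digital CDR loop filter.

    pd_early: +1 if the phase detector flags the clock as early, -1 if late.
    state:    (integral_accumulator, pi_code) carried between updates.
    Returns the new state; pi_code addresses the phase interpolator.
    """
    integ, code = state
    integ += ki * pd_early          # integral path tracks frequency offset
    code += kp * pd_early + integ   # proportional + integral correction
    code %= 1 << pi_bits            # PI code wraps around one unit interval
    return integ, code

# Toy usage: a run of 'early' decisions walks the PI code forward.
state = (0, 0)
for _ in range(5):
    state = dlc_step(+1, state)
print(state)  # accumulator and PI code after five early decisions
```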

    Evaluating Techniques for Wireless Interconnected 3D Processor Arrays

    Get PDF
    In this thesis, the viability of a wireless interconnect network for a highly parallel computer is investigated. The main theme of this thesis is to project the performance of a wireless network used to connect the processors in a parallel machine of such a design, and to investigate the new design opportunities a wireless interconnect network can offer for parallel computing. A simulation environment is designed and implemented to carry out the tests. The results show that if the available radio spectrum is shared effectively between the building blocks of the parallel machine, there are substantial chances of achieving high processor utilisation. The results also show that some factors play a major role in the performance of such a machine; the size of the machine, the size of the problem, and the communication and computation capabilities of each element of the machine are among those factors. These factors set a limit on the number of nodes that can be engaged in some classes of tasks. The results show promising potential for further expansion and evolution of our idea toward new architectural opportunities, which is discussed at the end of this thesis. To build a real machine of this type, the architects would need to solve a number of challenging problems including heat dissipation, electric power delivery, and chip/board design; however, these issues are not part of this thesis and will be tackled in future work.

    Integrated Circuits and Systems for Smart Sensory Applications

    Get PDF
    Connected intelligent sensing reshapes our society by empowering people with an increasing number of new ways of mutual interaction. As integration technologies follow their scaling roadmap, the horizon of sensory applications is rapidly widening, thanks to myriad lightweight low-power, or in some cases even self-powered, smart devices with high connectivity capabilities. CMOS integrated circuit technology is the best candidate to supply the required smartness and to pioneer these emerging sensory systems. As a result, new challenges are arising around the design of these integrated circuits and systems for sensory applications, in terms of low-power edge computing, power management strategies, low-range wireless communications, and integration with sensing devices. This Special Issue collects recent advances in application-specific integrated circuits (ASICs) and systems for smart sensory applications in the following five emerging topics: (I) dedicated short-range communications transceivers; (II) digital smart sensors; (III) implantable neural interfaces; (IV) power management strategies in wireless sensor nodes; and (V) neuromorphic hardware.

    Algorithms for Energy Efficiency in Wireless Sensor Networks

    Get PDF
    The recent advances in microsensor and semiconductor technology have opened a new field within computer science: the networking of small-sized sensors that are capable of sensing, processing, and communicating. Such wireless sensor networks offer new applications in the areas of habitat and environment monitoring, disaster control and operation, military and intelligence control, object tracking, video surveillance, traffic control, as well as in health care and home automation. It is likely that the deployed sensors will be battery-powered, which will limit the energy capacity significantly. Thus, energy efficiency becomes one of the main challenges that need to be taken into account, and the design of energy-efficient algorithms is a major contribution of this thesis. As the wireless communication in the network is one of the main energy consumers, we first consider the characteristics of wireless communication in detail. By using the embedded sensor board (ESB) platform recently developed by the Free University of Berlin, we analyze the means of forward error correction and propose an appropriate resync mechanism, which improves the communication between two ESB nodes substantially. Afterwards, we focus on the forwarding of data packets through the network. We present the algorithms energy-efficient forwarding (EEF), lifetime-efficient forwarding (LEF), and energy-efficient aggregation forwarding (EEAF). While EEF is designed to maximize the number of data bytes delivered per energy unit, LEF additionally takes into account the residual energy of forwarding nodes; in so doing, LEF further prolongs the lifetime of the network. Energy savings due to data aggregation and in-network processing are exploited by EEAF. Besides single-link forwarding, in which data packets are sent to only one forwarding node, we also study the impact of multi-link forwarding, which exploits the broadcast characteristics of the wireless medium by sending packets to several (potential) forwarding nodes. By actively selecting a forwarder among all nodes that received a packet successfully, retransmissions can often be avoided; in the majority of cases, multi-link forwarding is thus more efficient and able to save energy. In the last part of this thesis, we present a topology and energy control algorithm (TECA) to turn off the nodes' radio transceivers completely in order to avoid idle listening. By means of TECA, a connected backbone of active nodes is established, while all other nodes may sleep and save energy by turning off their radios. All algorithms presented in this thesis have been fully analyzed, simulated, and implemented on the ESB platform. They are suitable for several application scenarios and can easily be adapted to other wireless sensor platforms.
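    A minimal sketch of the forwarder-selection idea behind the bytes-per-energy metric described above (a generic illustration with hypothetical neighbor attributes, not the EEF/LEF algorithms as specified in the thesis):

```python
def pick_forwarder(neighbors, residual_weight=0.0):
    """Pick the neighbor maximizing expected bytes delivered per unit of energy.

    neighbors: list of dicts with hypothetical fields
        'prr'      - packet reception ratio of the link (0..1)
        'tx_cost'  - energy per transmission attempt (arbitrary units)
        'residual' - normalized residual energy of the neighbor (0..1)
    residual_weight > 0 biases the choice toward nodes with more remaining
    energy, loosely mimicking a lifetime-aware variant such as LEF.
    """
    def score(n):
        bytes_per_energy = n['prr'] / n['tx_cost']   # EEF-style metric
        return bytes_per_energy * (1.0 + residual_weight * n['residual'])
    return max(neighbors, key=score)

nbrs = [
    {'id': 'A', 'prr': 0.9, 'tx_cost': 1.0, 'residual': 0.1},
    {'id': 'B', 'prr': 0.8, 'tx_cost': 1.0, 'residual': 0.9},
]
print(pick_forwarder(nbrs)['id'])                        # 'A' (better link)
print(pick_forwarder(nbrs, residual_weight=1.0)['id'])   # 'B' (more residual energy)
```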