
    On-chip signaling techniques for high-speed Serdes transceivers

    The general goal of VLSI technology is to produce very fast chips with very low power consumption. Technology scaling, together with increasing operating frequencies, was the solution that enabled the evolution of electronic devices in the 20th century. In deep sub-micron technologies, however, on-chip power density limited the continued increase in frequency, which led to a different approach for designing higher-performance chips without raising the operating speed. Parallelism was the natural solution, and VLSI manufacturers began the era of multi-core chips. These multi-core chips require a full inter-core network for communication. Such on-chip links were conventionally parallel, but due to reverse scaling in modern technologies, parallel signaling is becoming a burden: the interconnects occupy a very large area, the tremendous number of repeaters consumes very high power, and crosstalk becomes a serious issue. On-chip serial communication was suggested as a solution. It resolves these issues, but it requires very high-speed circuits to achieve the same data rates. This thesis presents two full SerDes transceiver designs for on-chip high-speed serial communication. Both designs use long, lossy, on-chip differential interconnects with capacitive termination. The first design uses a 3-level self-timed signaling technique. This signaling technique is completely jitter-insensitive, since both the data and the clock are extracted at the receiver from the same signal. A new encoding and driving technique enables the transmitter to operate at a frequency equal to the data rate, half the frequency of previous designs, while achieving the same data rate. This design also generates the third voltage level without the need for an external supply, and it is very tolerant of possible variations, such as PVT variations or variations in the input clock's duty cycle. This transceiver is prepared for tape-out in UMC 0.13 μm CMOS technology in June 2014. The second design uses a new 3-level signaling technique that operates at only half the data rate, which greatly relaxes the full transceiver design. The new technique is also self-timed, enabling the extraction of both the data and the clock from the same signal. New encoders and decoders are designed, and a new architecture for a 3-level inverter is presented. This transceiver achieves very high data rates. The design is expected to be taped out in GF 65 nm CMOS technology in August 2014.
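
    The key property of such self-timed line codes is that the signal carries its own timing: every transmitted symbol forces a level change, so the receiver recovers the clock from the transitions and the data from the chosen level. The sketch below is a minimal, hypothetical model of one such 3-level self-timed code (the level mapping and encoder are my own illustration, not the thesis's circuits); each data bit selects one of the two levels that differ from the previous symbol, so one symbol is sent per bit and the transmitter runs at the data rate.

```python
# Illustrative 3-level self-timed line code (assumed mapping, not the thesis's design):
# every symbol differs from the previous one, so each symbol boundary carries a clock
# edge, and the data bit selects which of the two remaining levels is driven.

LEVELS = (0, 1, 2)  # three voltage levels, e.g. 0 V, VDD/2, VDD

def encode(bits, start_level=0):
    """Encode a bit sequence into a 3-level symbol stream, one symbol per bit."""
    symbols = []
    prev = start_level
    for b in bits:
        candidates = [lv for lv in LEVELS if lv != prev]  # force a transition
        symbol = candidates[b]          # the data bit picks one of the two options
        symbols.append(symbol)
        prev = symbol
    return symbols

def decode(symbols, start_level=0):
    """Recover the bits; every level change also serves as the recovered clock."""
    bits = []
    prev = start_level
    for s in symbols:
        candidates = [lv for lv in LEVELS if lv != prev]
        bits.append(candidates.index(s))
        prev = s
    return bits

if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 0]
    line = encode(data)
    assert decode(line) == data
    print("symbols on the line:", line)
```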

    NDC: Analyzing the impact of 3D-stacked memory+logic devices on MapReduce workloads

    While Processing-in-Memory has been investigated for decades, it has not been embraced commercially. A number of emerging technologies have renewed interest in this topic. In particular, the emergence of 3D stacking and the imminent release of Micron's Hybrid Memory Cube device have made it more practical to move computation near memory. However, the literature lacks a detailed analysis of a killer application that can leverage a Near Data Computing (NDC) architecture. This paper focuses on in-memory MapReduce workloads that are commercially important and are especially suitable for NDC because of their embarrassing parallelism and largely localized memory accesses. The NDC architecture incorporates several simple processing cores on a separate, non-memory die in a 3D-stacked memory package; these cores can perform Map operations with efficient memory access and without hitting the bandwidth wall. This paper describes and evaluates a number of key elements necessary in realizing efficient NDC operation: (i) low-EPI cores, (ii) long daisy chains of memory devices, and (iii) the dynamic activation of cores and SerDes links. Compared to a baseline that is heavily optimized for MapReduce execution, the NDC design yields up to a 15X reduction in execution time and an 18X reduction in system energy.
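
    A rough way to see why near-data Map execution sidesteps the bandwidth wall is to compare the aggregate bandwidth visible to host cores with the internal bandwidth of the stacked memory vaults. The numbers below are purely illustrative assumptions, not figures from the paper.

```python
# Back-of-envelope sketch (assumed numbers, not the paper's results): a host-side
# baseline is limited by the aggregate bandwidth of its DDR channels, while NDC
# cores see the much larger internal bandwidth of the 3D-stacked memory vaults.

host_channels      = 4
bw_per_channel_gbs = 12.8      # GB/s per DDR channel (illustrative)
vaults             = 16
bw_per_vault_gbs   = 10.0      # GB/s of internal bandwidth per vault (illustrative)

map_bytes = 64 * 2**30         # 64 GiB of split data scanned by the Map phase

host_time = map_bytes / (host_channels * bw_per_channel_gbs * 1e9)
ndc_time  = map_bytes / (vaults * bw_per_vault_gbs * 1e9)

print(f"host-limited Map scan : {host_time:.2f} s")
print(f"NDC-limited Map scan  : {ndc_time:.2f} s")
```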

    Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

    Emulating large, complex designs requires multi-FPGA systems (MFS). However, inter-FPGA communication is constrained by a lack of interconnect capacity caused by the limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses this I/O pin limitation. Besides the multiplexing scheme and the multiplexing ratio (the number of inter-FPGA signals per trace), the choice of MFS routing architecture also affects the critical-path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires, and/or programmable interconnect chips. The performance of existing MFS routing architectures is also limited by the choice of off-chip interface. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used a rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides new insight into the encouraging effects of using an off-chip optical interface and three-dimensional MFS routing architectures. Vertical stacking results in shorter off-chip links, improving the overall system frequency with the additional advantage of a smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links were replaced by optical fibers, which improved latency and resulted in a faster MFS. Results indicated that exploiting the third dimension provided latency and area improvements compared to 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by an optical interface in the same spatial distribution. Performance evaluation and comparison showed that the proposed architectures reduce the critical-path delay and improve the system frequency compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes, namely logic multiplexing, SERDES, and MGT, in conjunction with two routing architectures, the completely connected graph (CCG) and TORUS. Experimental results showed that SERDES attained a higher maximum frequency than the other two schemes; however, for very high multiplexing ratios, the performance of SERDES and MGT became comparable.
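
    To see how the multiplexing ratio interacts with the choice of inter-FPGA interface, the sketch below uses a simple assumed per-hop latency model: a fixed interface overhead plus one time slot per multiplexed signal. The slot rates and overheads are illustrative placeholders, not the dissertation's measurements.

```python
# Rough per-hop latency model (assumed formula and numbers): sending m inter-FPGA
# signals over one time-multiplexed trace means a hop must wait for all m slots to
# cross the link, on top of a fixed per-interface overhead.

def hop_latency_ns(m, slot_rate_ghz, fixed_overhead_ns):
    """Latency of one inter-FPGA hop carrying m time-multiplexed signals."""
    return fixed_overhead_ns + m / slot_rate_ghz

for m in (4, 16, 64):                         # multiplexing ratio
    logic_mux = hop_latency_ns(m, slot_rate_ghz=0.4, fixed_overhead_ns=5.0)
    serdes    = hop_latency_ns(m, slot_rate_ghz=5.0, fixed_overhead_ns=20.0)
    print(f"m={m:3d}  logic-mux {logic_mux:6.1f} ns   SerDes {serdes:6.1f} ns")
```

    Even with made-up numbers, the model shows the qualitative trend the dissertation studies: a fast serial interface pays a larger fixed overhead but scales far better as the multiplexing ratio grows.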

    Doctor of Philosophy

    In-memory big data applications are growing in popularity, including in-memory versions of the MapReduce framework. The move away from disk-based datasets shifts the performance bottleneck from slow disk accesses to memory bandwidth. MapReduce is a data-parallel application and is therefore amenable to being executed on as many parallel processors as possible, with each processor requiring high memory bandwidth. We propose using Near Data Computing (NDC) as a means to develop systems that are optimized for in-memory MapReduce workloads, offering high compute parallelism and even higher memory bandwidth. This dissertation explores three different implementations and styles of NDC to improve MapReduce execution. First, we use 3D-stacked memory+logic devices to process the Map phase on compute elements in close proximity to database splits. Second, we attempt to replicate the performance characteristics of the 3D-stacked NDC using only commodity memory and inexpensive processors to improve the performance of both the Map and Reduce phases. Finally, we incorporate fixed-function hardware accelerators to improve sorting performance within the Map phase. This dissertation shows that it is possible to improve in-memory MapReduce performance by potentially two orders of magnitude by designing system and memory architectures that are specifically tailored to that end.
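
    For context, the sketch below shows where this work sits inside a generic Map task (standard MapReduce behavior, not the dissertation's implementation): the map function runs against a local data split, and the emitted key/value pairs are sorted before they reach Reduce, which is the step the third design point targets with a hardware sorter.

```python
# Tiny sketch of a generic Map task (illustrative, not the dissertation's code):
# records are mapped near their data split, then the emitted key/value pairs are
# sorted by key before being handed to Reduce.

def map_task(split, map_fn):
    emitted = []
    for record in split:
        emitted.extend(map_fn(record))        # runs on a core near the split
    emitted.sort(key=lambda kv: kv[0])        # candidate for a fixed-function sorter
    return emitted

def word_count_map(line):
    return [(word, 1) for word in line.split()]

# word-count style usage
split = ["the quick fox", "the lazy dog"]
print(map_task(split, word_count_map))
```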

    A high speed serializer/deserializer design

    A Serializer/Deserializer (SerDes) is a circuit that converts parallel data into a serial stream and vice versa. It helps solve clock/data skew problems, simplifies data transmission, lowers power consumption and reduces chip cost. The goal of this project was to solve the challenges in high-speed SerDes design, which include low-jitter design, wide-bandwidth design and low-power design. A quarter-rate multiplexer/demultiplexer (MUX/DEMUX) was implemented. This quarter-rate structure decreases the required clock frequency from one half to one quarter of the data rate. It is shown that this significantly relaxes the design of the VCO at high speed and achieves lower power consumption. A novel multi-phase LC-ring oscillator was developed to supply a low-noise clock to the SerDes. The proposed VCO combines an LC tank with a ring structure to achieve both a wide tuning range (11%) and low phase noise (-110 dBc/Hz at 1 MHz offset). With this structure, a data rate of 36 Gb/s was realized with a measured peak-to-peak jitter of 10 ps using 0.18 μm SiGe BiCMOS technology. The power consumption is 3.6 W with a 3.4 V supply voltage. At a 60 Gb/s data rate the simulated peak-to-peak jitter was 4.8 ps using 65 nm CMOS technology. The power consumption is 92 mW with a 2 V supply voltage. A time-to-digital converter (TDC) based calibration circuit was designed to compensate for the phase mismatches among the multiple phases of the PLL clock using a three-dimensional fully depleted silicon-on-insulator (3D FDSOI) CMOS process. The 3D process separates the analog PLL portion from the digital calibration portion into different tiers, eliminating the noise coupling through the common substrate that occurs in a 2D process. Mismatches caused by the vertical tier-to-tier interconnections and by temperature effects in the 3D process are attenuated by the proposed calibration circuit. The design strategies and circuits developed in this dissertation provide significant benefit to both wired and wireless applications.
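
    The benefit of the quarter-rate architecture can be seen from a simple behavioral model: four parallel lanes are interleaved by four clock phases, so the serializer's clock runs at one quarter of the output data rate. The sketch below is an illustrative bit-level model only, not the actual circuit.

```python
# Minimal behavioral sketch (not the actual circuit) of a 4:1 quarter-rate
# serializer/deserializer: four lanes are interleaved by four clock phases,
# so the clock runs at data_rate / 4 rather than data_rate / 2.

def quarter_rate_mux(lanes):
    """lanes: four equal-length bit lists; returns the serialized stream."""
    assert len(lanes) == 4 and len({len(l) for l in lanes}) == 1
    serial = []
    for t in range(len(lanes[0])):        # one clock cycle per 4 output bits
        for phase in range(4):            # phases 0/90/180/270 each select a lane
            serial.append(lanes[phase][t])
    return serial

def quarter_rate_demux(serial):
    """Inverse operation at the receiver."""
    return [serial[phase::4] for phase in range(4)]

if __name__ == "__main__":
    lanes = [[1, 0], [0, 0], [1, 1], [0, 1]]
    stream = quarter_rate_mux(lanes)
    assert quarter_rate_demux(stream) == lanes
    data_rate_gbs = 36
    print("required clock (GHz):", data_rate_gbs / 4)
```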

    DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling

    With the advent of many-core chips that place substantial demand on the NoC, photonics has been investigated as a promising alternative to electrical NoCs. While numerous opto-electronic NoCs have been proposed, their evaluations tend to be based on fixed numbers for both photonic and electrical components, making it difficult to co-optimize the two. Through our own forays into opto-electronic NoC design, we observe that photonics and electronics are deeply intertwined, reflecting a strong need for a NoC modeling tool that accurately models parameterized electronic and photonic components within a unified framework and captures their interactions faithfully. In this paper, we present DSENT, a tool for design space exploration of electrical and opto-electronic networks. We form a framework that constructs basic NoC building blocks from electrical and photonic technology parameters. To demonstrate potential use cases, we perform a network case study illustrating data-rate tradeoffs, a comparison with scaled electrical technology, and a sensitivity analysis of photonics parameters.
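
    As a flavor of what a parameterized opto-electronic link model composes, the sketch below contrasts a simple electrical-wire energy estimate with a photonic link whose static laser and ring-tuning power is amortized over the data rate, which is the kind of data-rate tradeoff the case study examines. The formulas and numbers are generic illustrations and deliberately not DSENT's actual models or API.

```python
# Generic sketch of parameterized link-energy models (illustrative formulas and
# numbers only; not DSENT's actual models or API).

def electrical_link_energy_pj_per_bit(wire_mm, cap_ff_per_mm, vdd):
    # dynamic energy ~ 1/2 C V^2, assuming roughly one transition per bit
    c_farads = wire_mm * cap_ff_per_mm * 1e-15
    return 0.5 * c_farads * vdd**2 * 1e12

def photonic_link_energy_pj_per_bit(laser_mw, ring_tuning_mw, data_rate_gbps,
                                    modulator_pj, receiver_pj):
    # static laser + thermal-tuning power is amortized over the data rate
    static_pj = (laser_mw + ring_tuning_mw) / data_rate_gbps
    return static_pj + modulator_pj + receiver_pj

print("electrical 5 mm link  :", electrical_link_energy_pj_per_bit(5, 200, 0.9), "pJ/bit")
print("photonic 10 Gb/s link :", photonic_link_energy_pj_per_bit(2.0, 1.0, 10, 0.05, 0.1), "pJ/bit")
```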