1,615 research outputs found

    Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

    Get PDF
    Emulating large complex designs require multi-FPGA systems (MFS). However, inter-FPGA communication is confronted by the challenge of lack of interconnect capacity due to limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses the limited I/O pin obstacle. Besides the multiplexing scheme and multiplexing ratio (number of inter-FPGA signals per trace), the choice of the MFS routing architecture also affect the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires and/or programmable interconnect chips. Performance of existing MFS routing architectures is also limited by off-chip interface selection. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides a new insight into the encouraging effects of using off-chip optical interface and three dimensional MFS routing architectures. The vertical stacking results in shorter off-chip links improving the overall system frequency with the additional advantage of smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links are replaced by optical fibers that exhibited latency improvement and resulted in faster MFS. Results indicated that exploiting third dimension provided latency and area improvements as compared to 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by optical interface in same spatial distribution. Performance evaluation and comparison showed that the proposed architectures have reduced critical path delay and system frequency improvement as compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes i.e. Logic Multiplexing, SERDES and MGT in conjunction with two routing architectures i.e. Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained maximum frequency than the other two schemes. However, for very high multiplexing ratios, the performance of SERDES & MGT became comparable

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

    Design Of Fpga Address Register In 28nm Process Technology Based On Standard Cell Based Approach

    Get PDF
    Secara tradisinya, “Field Programmable Gate Array” (FPGA) “Address Register” (AR) direka menggunakan “full custom”. Dengan keadaan geometri yang mengecut pada awal proses nod, maka keperluan untuk menimbang semula pendekatan reka bentuk yang digunakan untuk mereka bentuk FPGA AR diperlukan kerana kitaran reka bentuk meningkat dan merumitkan yang membawa kepada masa lelaran lanjut ke atas penutupan masa blok. Terdapat pelbagai jenis cabaran yang terpaksa dihadapi dalam proses 28nm dan seterusnya sekiranya pendekatan “full custom” masih digunakan untuk merekabentuk FPGA AR. Oleh itu, pendekatan berasaskan sel piawai digunakan untuk reka bentuk FPGA AR. Kitaran reka bentuk FPGA AR dapat dikurangkan dari bulan ke minggu dengan penggunaan kaedah sel piawai. Selain itu, penutupan masa dapat mengawal senario masa yang lebih. Keputusan menunjukkan bahawa FPGA AR menggunakan pendekatan berasaskan sel piawai adalah memenuhi spesifikasi reka bentuk yang diberikan. Di samping itu, jatuhan IR untuk kuasa dan bumi adalah di bawah 2mV, frekuensi adalah 330 MHz dan keluasan kawasan adalah 0.975mm2. Sebagai kesimpulan, pendekatan berasaskan sel piawai memberi pereka lebih banyak masa untuk menyelesaikan isu yang berkaitan dengan rekabentuk. Di samping itu, perubahan yang disebabkan oleh proses, voltan dan suhu dapat diperbaiki melalui kaedah pelbagai sudut dan senario ke atas FPGA AR. ________________________________________________________________________________________________________________________ Traditionally, Field Programmable Gate Array (FPGA) Address Register (AR) is designed using full custom approach. With geometries shrink on advance process node, there is a need to reconsider the design approach used to design FPGA AR because of increased design cycle and complexity that lead to more iteration time on closing block timing. Significant design effort and challenges are required in 28nm and beyond when using full custom approach. Therefore, standard cell based approach is used to design the FPGA AR. Design cycle of FPGA AR is reduced from months to weeks with the automated standard cell based approach. Besides that, timing closure is able to cover more timing scenarios. Results show that FPGA AR using standard cell based approach is meeting the given design specification. IR drop on both power and ground is achieving less than 2mV per rail, frequency of 330MHz is obtained on FPGA AR and area size is 0.975mm2. In summary, standard cell based approach gives designer more time to focus on resolving design issues, and close the design in more timing scenarios which cover more design corners to improve variation due to process, voltage and temperature

    Physical parameter-aware Networks-on-Chip design

    Get PDF
    PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable and power-efficient communication fabric for chip multiprocessors (CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine both the performance and the reliability of such systems, with a significant power demand that is expected to increase due to developments in both technology and architecture. In terms of architecture, an important trend in many-core systems architecture is to increase the number of cores on a chip while reducing their individual complexity. This trend increases communication power relative to computation power. Moreover, technology-wise, power-hungry wires are dominating logic as power consumers as technology scales down. For these reasons, the design of future very large scale integration (VLSI) systems is moving from being computation-centric to communication-centric. On the other hand, chip’s physical parameters integrity, especially power and thermal integrity, is crucial for reliable VLSI systems. However, guaranteeing this integrity is becoming increasingly difficult with the higher scale of integration due to increased power density and operating frequencies that result in continuously increasing temperature and voltage drops in the chip. This is a challenge that may prevent further shrinking of devices. Thus, tackling the challenge of power and thermal integrity of future many-core systems at only one level of abstraction, the chip and package design for example, is no longer sufficient to ensure the integrity of physical parameters. New designtime and run-time strategies may need to work together at different levels of abstraction, such as package, application, network, to provide the required physical parameter integrity for these large systems. This necessitates strategies that work at the level of the on-chip network with its rising power budget. This thesis proposes models, techniques and architectures to improve power and thermal integrity of Network-on-Chip (NoC)-based many-core systems. The thesis is composed of two major parts: i) minimization and modelling of power supply variations to improve power integrity; and ii) dynamic thermal adaptation to improve thermal integrity. This thesis makes four major contributions. The first is a computational model of on-chip power supply variations in NoCs. The proposed model embeds a power delivery model, an NoC activity simulator and a power model. The model is verified with SPICE simulation and employed to analyse power supply variations in synthetic and real NoC workloads. Novel observations regarding power supply noise correlation with different traffic patterns and routing algorithms are found. The second is a new application mapping strategy aiming vii to minimize power supply noise in NoCs. This is achieved by defining a new metric, switching activity density, and employing a force-based objective function that results in minimizing switching density. Significant reductions in power supply noise (PSN) are achieved with a low energy penalty. This reduction in PSN also results in a better link timing accuracy. The third contribution is a new dynamic thermal-adaptive routing strategy to effectively diffuse heat from the NoC-based threedimensional (3D) CMPs, using a dynamic programming (DP)-based distributed control architecture. Moreover, a new approach for efficient extension of two-dimensional (2D) partially-adaptive routing algorithms to 3D is presented. This approach improves three-dimensional networkon- chip (3D NoC) routing adaptivity while ensuring deadlock-freeness. Finally, the proposed thermal-adaptive routing is implemented in field-programmable gate array (FPGA), and implementation challenges, for both thermal sensing and the dynamic control architecture are addressed. The proposed routing implementation is evaluated in terms of both functionality and performance. The methodologies and architectures proposed in this thesis open a new direction for improving the power and thermal integrity of future NoC-based 2D and 3D many-core architectures

    High Speed Test Interface Module Using MEMS Technology

    Get PDF
    With the transient frequency of available CMOS technologies exceeding hundreds of gigahertz and the increasing complexity of Integrated Circuit (IC) designs, it is now apparent that the architecture of current testers needs to be greatly improved to keep up with the formidable challenges ahead. Test requirements for modern integrated circuits are becoming more stringent, complex and costly. These requirements include an increasing number of test channels, higher test-speeds and enhanced measurement accuracy and resolution. In a conventional test configuration, the signal path from Automatic Test Equipment (ATE) to the Device-Under-Test (DUT) includes long traces of wires. At frequencies above a few gigahertz, testing integrated circuits becomes a challenging task. The effects on transmission lines become critical requiring impedance matching to minimize signal reflection. AC resistance due to the skin effect and electromagnetic coupling caused by radiation can also become important factors affecting the test results. In the design of a Device Interface Board (DIB), the greater the physical separation of the DUT and the ATE pin electronics, the greater the distortion and signal degradation. In this work, a new Test Interface Module (TIM) based on MEMS technology is proposed to reduce the distance between the tester and device-under-test by orders of magnitude. The proposed solution increases the bandwidth of test channels and reduces the undesired effects of transmission lines on the test results. The MEMS test interface includes a fixed socket and a removable socket. The removable socket incorporates MEMS contact springs to provide temporary with the DUT pads and the fixed socket contains a bed of micro-pins to establish electrical connections with the ATE pin electronics. The MEMS based contact springs have been modified to implement a high-density wafer level test probes for Through Silicon Vias (TSVs) in three dimensional integrated circuits (3D-IC). Prototypes have been fabricated using Silicon On Insulator SOI wafer. Experimental results indicate that the proposed architectures can operate up to 50 GHz without much loss or distortion. The MEMS probes can also maintain a good elastic performance without any damage or deformation in the test phase

    Silicon Photonic Subsystems for Inter-Chip Optical Networks

    Get PDF
    The continuous growth of electronic compute and memory nodes in terms of the number of I/O pins, bandwidth, and areal throughput poses major integration and packaging challenges associated with offloading multi-Tbit/s data rates within the few pJ/bit targets. While integrated photonics are already deployed in long and short distances such as inter and intra data centers communications, the promising characteristics of the silicon photonic platform set it as the future technology for optical interconnects in ultra short inter-chip distances. The high index contrast between the waveguide and the cladding together with strong thermo-optic and carrier effects in silicon allows developing a wide range of micro-scale and low power optical devices compatible with the CMOS fabrication processes. Furthermore, the availability of photonic foundries and new electrical and optical co-packaging techniques further pushes this platform for the next steps of commercial deployment. The work in this dissertation presents the current trends in high-performance memory and processor nodes and gives motivation for disaggregated and reconfigurable inter-chip network enabled with the silicon photonic layer. A dense WDM transceiver and broadband switch architectures are discussed to support a bi-directional network of ten hybrid-memory cubes (HMC) interconnected to ten processor nodes with an overall aggregated bandwidth of 9.6Tbit/s. Latency and energy consumption are key performance parameters in a processor to primary memory nodes connectivity. The transceiver design is based on energy-efficient micro-ring resonators, and the broadband switch is constructed with 2x2 Mach-Zehnder elements for nano-second reconfiguration. Each transceiver is based on hundreds of micro-rings to convert the native HMC electrical protocol to the optical domain and the switch is based on tens of hundreds of 2x2 elements to achieve non-blocking all-to-all connectivity. The next chapters focus on developing methods for controlling and monitoring such complex and highly integrated silicon photonic subsystems. The thermo-optic effect is characterized and we show experimentally that the phase of the optical carrier can be reliably controlled with pulse-width modulation (PWM) signal, ultimately relaxing the need for hundreds of digital to analog converters (DACs). We further show that doped waveguide heaters can be utilized as \textit{in-line} optical power monitors by measuring photo-conductance current, which is an alternative for the conventional tapping and integration of photo-diodes. The next part concerned with a common cascaded micro-ring resonator in a WDM transceiver design. We develop on an FPGA control algorithm that abstracts the physical layer and takes user-defined inputs to set the resonances to the desired wavelength in a unicast and multicast transmission modes. The associated sensitivities of these silicon ring resonators are presented and addressed with three closed-loop solutions. We first show a closed-loop operation based on tapping the error signal from the drop port of the micro-ring. The second solution presents a resonance wavelength locking with a single digital I/O for control and feedback signals. Lastly, we leverage the photo-conductance effect and demonstrate the locking procedure using only the doped heater for both control and feedback purposes. To achieve the inter-chip reconfigurability we discuss recent advances of high-port-count SiP broadband switches for reconfigurable inter-chip networks. To ensure optimal operation in terms of low insertion loss, low cross-talk and high signal integrity per routing path, hundreds of 2x2 Mach-Zehnder elements need to be biased precisely for the cross and bar states. We address this challenge with a tapless and a design agnostic calibration approach based on the photo-conductance effect. The automated algorithm returns a look-up table for all for each 2x2 element and the associated calibrated biases. Each routing scenario is then tested for insertion loss, crosstalk and bit-error rate of 25Gbit/s 4-level pulse amplitude modulation signals. The last part utilizes the Mach-Zehnder interferometers in WDM transceiver applications. We demonstrate a polarization insensitive four-channel WDM receiver with 40Gbit/s per channel and a transmitter design generating 8-level pulse amplitude modulation signals at 30Gbit/s

    Energy Efficient Encoding Methods For Chip-to-Chip Communication

    Get PDF
    corecore