Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures

Author: Kashif Asmeen
Publication venue: 'University of Windsor Leddy Library'
Publication date: 05/10/2017

Emulating large, complex designs requires multi-FPGA systems (MFS). However, inter-FPGA communication is constrained by a lack of interconnect capacity due to the limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses the limited I/O pin obstacle. Besides the multiplexing scheme and multiplexing ratio (number of inter-FPGA signals per trace), the choice of the MFS routing architecture also affects the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires and/or programmable interconnect chips. Performance of existing MFS routing architectures is also limited by off-chip interface selection. In this dissertation we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used a rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides new insight into the encouraging effects of using an off-chip optical interface and three-dimensional MFS routing architectures. The vertical stacking results in shorter off-chip links, improving the overall system frequency with the additional advantage of a smaller footprint area. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links are replaced by optical fibers, which exhibited latency improvement and resulted in a faster MFS. Results indicated that exploiting the third dimension provided latency and area improvements as compared to 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by an optical interface in the same spatial distribution.
Performance evaluation and comparison showed that the proposed architectures have reduced critical path delay and improved system frequency as compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes, i.e. Logic Multiplexing, SERDES and MGT, in conjunction with two routing architectures, i.e. Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained a higher frequency than the other two schemes. However, for very high multiplexing ratios, the performance of SERDES and MGT became comparable.
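The abstract defines the multiplexing ratio as the number of inter-FPGA signals carried per trace. A minimal sketch of the resulting pin arithmetic follows. The function names and the example figures (1000 signals, 120 pins) are hypothetical and are not taken from the dissertation.

```python
import math


def traces_needed(num_signals: int, mux_ratio: int) -> int:
    """Physical traces required when mux_ratio signals share one trace."""
    return math.ceil(num_signals / mux_ratio)


def min_mux_ratio(num_signals: int, available_pins: int) -> int:
    """Smallest multiplexing ratio that fits the signals onto the available pins."""
    return math.ceil(num_signals / available_pins)


# Hypothetical example: 1000 cut signals between two FPGAs, 120 usable I/O pins.
ratio = min_mux_ratio(1000, 120)
print(ratio, traces_needed(1000, ratio))
```

A higher ratio saves pins but lengthens each serialized transfer, which is the latency trade-off the abstract attributes to the multiplexing scheme and routing architecture.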

Scholarship at UWindsor

Design techniques for low-power multi-GS/s analog-to-digital converters

Author: Jiang Tao
Publication venue: 'Oregon State University'

Ultra-high-speed (>10GS/s), medium-resolution (5~6bit), low-power (<50mW) analog-to-digital converters can find application in digital oscilloscopes and next-generation serial link receivers. There are several challenges to enable a successful design, however. First, a time-interleaved architecture is required in order to achieve a sampling rate over 10GS/s, with a trade-off between the number of channels and the sampling rate of each channel. Phase misalignment and channel mismatch must be considered too. Second, timing accuracy, especially dynamic jitter of the sampling clock, becomes a major concern at ultra-high frequency, and certain techniques must be applied to address it. Finally, to achieve low power consumption, the Flash architecture is not suitable to serve as the sub-ADC, and a low-power sub-ADC that can work at relatively high speed needs to be designed. A single-channel, asynchronous successive approximation (SA) ADC with improved feedback delay has been fabricated in 40nm CMOS. Compared with a conventional SA structure that employs a single quantizer controlled by a digital feedback logic loop, the proposed SA-ADC employs multiple quantizers for each conversion bit, clocked by an asynchronous ripple clock that is generated after each quantization. Hence, the sampling rate of the 6-bit ADC is limited only by the six delays of the Capacitive-DAC settling and each comparator’s quantization delay, as the digital logic delay is eliminated. Measurement results show that the 40nm-CMOS SA-ADC achieves a peak SNDR of 32.9dB at 1GS/s and 30.5dB at 1.25GS/s, consuming 5.28mW and 6.08mW respectively, leading to FoMs of 148fJ/conversion-step and 178fJ/conversion-step, in a core area less than 170µm by 85µm. Based on the previous sub-ADC work, a 12-GS/s 5-b 50-mW ADC is designed in 40nm CMOS with 8 time-interleaved channels of a Flash-SA hybrid structure, each running at 1.5GS/s.
A modified bootstrapped switch is used in the track-and-hold circuit, introducing a global clock signal to synchronize the sampling instants of each individual channel, thereby improving the phase alignment and reducing distortion. The global clock is provided by a CML buffer which is injected with an off-chip low-noise sine-wave signal, so that the RMS dynamic jitter is low for better ENOB performance. Measurement results show that the 12GS/s ADC can achieve an SNDR of 25.8dB with the input signal frequency around DC and 22.8dB around 2GHz, consuming 32.1mW, leading to an FoM of 237.3fJ/conversion-step, in a core area less than 800µm by 500µm.
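The figures of merit quoted in this abstract can be approximately reproduced with the commonly used Walden formula, FoM = P / (2^ENOB · fs), with ENOB = (SNDR − 1.76) / 6.02. The sketch below assumes that this is the definition the author used, which the abstract does not state, and that the 12GS/s figure is based on the SNDR near 2GHz. The function names are illustrative.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_fj(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (2 ** enob(sndr_db) * fs_hz) * 1e15


# Operating points quoted in the abstract: (power in W, SNDR in dB, sample rate in Hz).
points = {
    "SA-ADC @ 1 GS/s": (5.28e-3, 32.9, 1.0e9),
    "SA-ADC @ 1.25 GS/s": (6.08e-3, 30.5, 1.25e9),
    "TI ADC @ 12 GS/s (2 GHz input)": (32.1e-3, 22.8, 12e9),
}
for name, (p, sndr, fs) in points.items():
    print(f"{name}: {walden_fom_fj(p, sndr, fs):.1f} fJ/conversion-step")
```

Under these assumptions the 1.25GS/s and 12GS/s points land very close to the reported 178 and 237.3 fJ/conversion-step, while the 1GS/s point comes out slightly below the reported 148, which may reflect rounding in the original calculation.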

ScholarsArchive@OSU

A compact high-energy particle detector for low-cost deep space missions

Author: Kemp Dayne Hilton
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2015

Over the last few decades particle physics has led to many new discoveries, laying the foundation for modern science. However, there are still many unanswered questions which the next generation of particle detectors could address, potentially expanding our knowledge and understanding of the Universe. Owing to recent technological advancements, electronic sensors are now able to acquire measurements previously unobtainable, creating opportunities for new deep-space high-energy particle missions. Consequently, a new compact instrument was developed capable of detecting gamma rays, neutrons and charged particles. This instrument combines the latest in FPGA System-on-Chip technology as the central processor and a 3x3 array of silicon photomultipliers coupled with an organic plastic scintillator as the detector. Using modern digital pulse shape discrimination and signal processing techniques, the scintillator and photomultiplier combination has been shown to accurately discriminate between the different particle types and provide information such as total energy and incident direction. The instrument demonstrated the ability to capture 30,000 particle events per second across 9 channels, around 15 times that of the U.S.-based CLAS detector. Furthermore, the input signals are simultaneously sampled at a maximum rate of 5 GSPS across all channels with 14-bit resolution. Future developments will include FPGA-implemented digital signal processing as well as hardware design for small-satellite-based deep-space missions that can overcome radiation vulnerability.

Cape Town University OpenUCT

High-speed, low cost test platform using FPGA technology

Author: Chen Te-Hui
Publication venue: Georgia Institute of Technology
Publication date: 11/01/2017

The object of this research is to develop a low-cost, adaptable testing platform for multi-GHz digital applications, with a concentration on the test requirements of advanced devices. Since most advanced ATEs are very expensive, this equipment is not always available for testing cost-sensitive devices. The approach is to use recently introduced advanced FPGAs for the core logic of the testing platform, thereby allowing for a low-cost, low-power, high-performance, and adaptable test system. Furthermore, to customize the testing system for specific applications, we implemented multiple extension testing modules based on this platform. With these extension modules, new functions can be added easily and the test system can be upgraded with specific features required for other testing purposes. Applications of this platform can help such digital devices reach the market in less time and at lower cost, and support the development of the whole industry. Ph.D.

Scholarly Materials And Research @ Georgia Tech

A Sub-Centimeter Ranging Precision LIDAR Sensor Prototype Based on ILO-TDC

Author: Chen Chih-Yuan
Publication date: 16/09/2016

This thesis introduces a high-resolution light detection and ranging (LIDAR) sensor system-on-a-chip (SoC) that achieves sub-centimeter ranging precision and a maximum ranging distance of 124 meters. With off-chip connected avalanche photodiodes (APDs), the time-of-flight (ToF) is resolved through 31×1 time-correlated single photon counting (TCSPC) channels. Embedded time-to-digital converters (TDCs) support 52-ps time resolution and 14-bit dynamic range. A novel injection-locked oscillator (ILO) based TDC is proposed to minimize the power of fine TDC clock distribution and improve time precision. The global PVT variation among ILO clock distribution is calibrated by an on-chip phase-locked loop (PLL) that assures reliable counting performance over a wide operating range. The proposed LIDAR sensor is designed, fabricated, and tested in 65nm CMOS technology. The whole SoC consumes 37mW and each TDC channel consumes 788μW at nominal operation. The proposed TDC design achieved single-shot precision of 38.5 ps, channel uniformity of 14 ps, and DNL/INL of 0.56/1.56 LSB, respectively. The performance of the proposed ILO-TDC makes it an excellent candidate for global counting TCSPC in automotive LIDAR.
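The timing figures in this abstract map to distances through the usual round-trip time-of-flight relation d = c · t / 2. The sketch below applies that relation to the quoted 52-ps resolution, 14-bit range and 38.5-ps single-shot precision. The relation is standard physics and is not quoted from the thesis, and the variable names are illustrative.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_to_distance_m(t_seconds: float) -> float:
    """One-way distance for a round-trip time of flight."""
    return C_M_PER_S * t_seconds / 2.0


TDC_LSB_S = 52e-12        # time resolution quoted in the abstract
TDC_BITS = 14             # dynamic range quoted in the abstract
SINGLE_SHOT_S = 38.5e-12  # single-shot precision quoted in the abstract

range_per_lsb_m = tof_to_distance_m(TDC_LSB_S)
full_scale_m = tof_to_distance_m((2 ** TDC_BITS) * TDC_LSB_S)
single_shot_m = tof_to_distance_m(SINGLE_SHOT_S)

print(f"distance per LSB:      {range_per_lsb_m * 1e3:.2f} mm")
print(f"TDC full-scale range:  {full_scale_m:.1f} m")
print(f"single-shot precision: {single_shot_m * 1e3:.2f} mm")
```

Both per-sample figures fall below one centimeter, in line with the sub-centimeter claim. The TDC full-scale bound of roughly 128 m sits just above the reported 124 m; the abstract does not say how the 124 m figure was determined.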

Texas A&M Repository

CIRCUITS AND ARCHITECTURE FOR BIO-INSPIRED AI ACCELERATORS

Author: Tognetti Gaspar
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 16/02/2021

Technological advances in microelectronics envisioned through Moore’s law have led to powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant. Unconventional Compute-in-Memory architectures such as the analog winner-takes-all associative-memory and the Charge-Injection Device processor have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications, and in recent work, multi-bit vector-vector multiplications. In the latter, computation was carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with many elements, high energy efficiencies can be achieved. In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target computational technologies: (i) a multilevel non-volatile cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) bit-cell. 
At the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level. Finally, on the architectural level, two AI accelerators for data-center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and flexible compute vs. memory trade-off. General-purpose Arm/RISC-V co-processors provide adequate bootstrapping and system housekeeping, and a high-speed interface fabric facilitates Input/Output to main memory.

JScholarship

Self-Aligned 3D Chip Integration Technology and Through-Silicon Serial Data Transmission

Author: Sun Fengda
Publication venue: Lausanne, EPFL
Publication date: 12/08/2011

The emerging three-dimensional (3D) integration technology is expected to lead to an industry paradigm shift due to its tremendous benefits. Intense research activities are under way on technology, simulation, design, and product prototypes. This thesis work aims at fabricating through-silicon vias (TSVs) on diced processor chips, and later bonding them into a 3D-stacked chip. How to handle and process delicate processor chips with high alignment precision is a key issue. The TSV process to be developed also needs to adapt to this constraint. Four TSV processes have been studied. Among them, the ring-trench TSV process demonstrates the feasibility of fabricating TSVs with the prevailing dimensions, and the whole-through TSV process achieves the first dummy chip post-processed with TSVs at EPFL, although the dimension is rather large to keep a reasonable aspect ratio (AR). Four self-alignment (SA) techniques have been investigated, among which the gravitational SA and the hydrophobic SA are found to be quite promising. Using gravitational SA, we come to the conclusion that cavities in the silicon carrier wafer with a profile angle of 60° can align the chips with less than 20 µm inaccuracy. The alignment precision can be improved by adopting more advanced dicing tools in place of traditional dicing saws and by using a larger cavity profile angle. Such accuracy will be sufficient to align the relatively large TSVs for general products such as 3D image sensors. By fabricating bottom TSVs in the carrier wafer, a 3D silicon interposer idea has been proposed to stack another chip, e.g. a processor chip, on the other side of the carrier wafer. But stacking microprocessor chips fabricated with TSVs will require higher alignment precision. A hydrophobic SA technique using the surface tension force generated by the water-to-air interfaces around the pads can greatly reduce the alignment inaccuracy to less than 1 µm.
This low-cost and high-throughput SA procedure is processed in air, fully compatible with current fabrication technologies, and highly stable and repeatable. We present a theoretical meniscus model to predict SA results and to provide the design rules. This technique is quite promising for advanced 3D applications involving logic and heterogeneous stacking. As TSVs' dimensions in chip-level 3D integration are constrained by chip-level processes, such as bonding, the smallest TSVs might still be about 5 µm. Thus, the area occupied by the TSVs cannot be neglected. Fortunately, TSVs can withstand very high bandwidths, meaning that data can be serialized and transmitted using fewer TSVs. With 20 µm TSVs, the 2-Gb/s 8:1 serial link implemented saves 75% of the area of its 8-bit parallel counterpart. The quasi-serial link proposed can effectively balance the inter-layer bandwidth and the serial links' area consumption. The area model of the serial or quasi-serial links working under higher frequencies provides some guidelines to choose the proper serial link design, and it also predicts that when the TSV diameter shrinks to 5 µm, it will be difficult to keep this area benefit without some novel circuit design techniques. As the serial links can be implemented with less area, the bandwidth per unit area is increased. Two scenarios are studied: single-port memory access and multi-port memory access. In the former scenario, the inter-layer bandwidth expanded by serialization does not improve the system performance because of the bus-bottleneck problem. In the latter scenario, the inter-layer ultra-wide bandwidth can be exploited as each memory bank can be accessed randomly through the NoC. Thus, by further widening the inter-layer bandwidth through serialization, the system performance will be improved.

Infoscience - École polytechnique fédérale de Lausanne

Topical Workshop on Electronics for Particle Physics

Author: Dho Evelyne; Vasey François
Publication venue: CERN
Publication date: 01/01/2008

The purpose of the workshop was to present results and original concepts for electronics research and development relevant to particle physics experiments as well as accelerator and beam instrumentation at future facilities; to review the status of electronics for the LHC experiments; to identify and encourage common efforts for the development of electronics; and to promote information exchange and collaboration in the relevant engineering and physics communities

CERN Document Server

Applications in Electronics Pervading Industry, Environment and Society

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers come from a selection of the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, which was held in November 2019. All these papers have been significantly enhanced with novel experimental results. The papers give an overview of the trends in research and development activities concerning the pervasive application of electronics in industry, the environment, and society. The focus of these papers is on cyber physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog to digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing including emerging machine learning techniques. Physical implementation aspects are discussed as well as the trade-off found between functional performance and hardware/system costs

Directory of Open Access Books (DOAB)

Topical Workshop on Electronics for Particle Physics

Author: Claude Sandra
Publication venue: CERN
Publication date: 01/01/2007

CERN Document Server