Stability and efficiency of explicit integration in interconnect analysis on GPUs
This paper presents a technique to parallelise a numerical integration solver on general-purpose GPUs. The technique combines state-space modeling with an explicit integration method based on the second-order Adams-Bashforth formula. The paper studies the stability of the variable-step explicit method and proposes a technique to guarantee integration stability. Although explicit methods require smaller integration steps than traditional implicit techniques, they avoid the complex calculations on large matrices required by the latter. The technique is demonstrated by simulating an RC model of a VLSI interconnect. Results achieved by the proposed variable-step explicit method are compared with those achieved by a traditional implicit-integration simulator, Ngspice. The results show that the parallelised explicit solution is one order of magnitude faster than the implicit one for increasingly complex circuits.

This work has been partially funded by the Spanish government through project RTI2018-097088-B-C33 (MINECO/FEDER, UE) and by EPSRC (the UK Engineering and Physical Sciences Research Council) under grant EP/N0317681/1. The research stay at The University of Southampton has been supported by Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia, Programa Regional de Movilidad, Colaboración e Intercambio de Conocimiento Jiménez de la Espada, under grant 21187/EE/1
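A minimal sketch of the core numerical idea, assuming a simple grounded RC-ladder state-space model and a stability-driven step cap; the function names and the exact step-control rule are illustrative, not the paper's implementation:

```python
import numpy as np

# Sketch only: variable-step 2nd-order Adams-Bashforth (AB2) on x' = A x,
# with the step bounded by an explicit-method stability cap. The paper's
# actual step-control technique is not reproduced here.

def rc_ladder_matrix(n, R=10.0, C=1e-12):
    """State matrix A = -G/C of a grounded n-node RC ladder (assumed model)."""
    G = np.zeros((n, n))
    for i in range(n):
        G[i, i] = (2.0 if i < n - 1 else 1.0) / R
        if i > 0:
            G[i, i - 1] = G[i - 1, i] = -1.0 / R
    return -G / C

def ab2_transient(A, x0, t_end, safety=0.8):
    lam = np.abs(np.linalg.eigvals(A).real).max()  # fastest decaying mode
    h_max = safety / lam                           # AB2 real-axis stability cap
    f_prev, h_prev = A @ x0, h_max
    x, t = x0 + h_prev * f_prev, h_prev            # forward-Euler start-up step
    while t < t_end:
        h = min(h_max, t_end - t)                  # a real controller adapts h to error too
        f = A @ x
        r = h / (2.0 * h_prev)                     # variable-step AB2 coefficients
        x = x + h * ((1.0 + r) * f - r * f_prev)
        f_prev, h_prev, t = f, h, t + h
    return x

# Usage: decay of a unit initial charge on a 100-node RC line.
A = rc_ladder_matrix(100)
print("node-0 voltage after 0.5 ns:", ab2_transient(A, np.ones(100), 5e-10)[0])
```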
System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing
This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications.
Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance.
This work proposes a methodology for analyzing the effects of finite-precision arithmetic on error correction performance and hardware complexity. The methodology is employed throughout for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cell library. Results demonstrate an implementation loss below 0.2 dB down to a BER of 10^{-8} and complexity savings of up to 59% with respect to other works in the recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring as little as 20% of the memory of traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity is reduced by 12%. These advantages are traded off against error correction performance, making the solution attractive only for long codes, such as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to codes not specifically conceived for this technique. The proposed architectures exhibit complexity savings on the order of 40% in both area and power consumption, while the implementation loss is smaller than 0.05 dB.
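For concreteness, here is a hedged sketch of a finite-precision check-node update. It uses the standard min-sum rule with saturation rather than the P-output principle itself; the word length `bits` is the quantity such a methodology would sweep against BER and complexity:

```python
import numpy as np

# Sketch only: a saturated min-sum check-node update; the word length `bits`
# is what a finite-precision co-design methodology would sweep against BER.

def saturate(x, bits):
    """Clip to a signed fixed-point range with `bits` total bits."""
    lim = 2 ** (bits - 1) - 1
    return np.clip(np.round(x), -lim, lim)

def check_node_min_sum(llrs, bits=5):
    """Extrinsic output per edge: product of the other signs times the
    minimum of the other magnitudes, saturated to the word length."""
    q = saturate(np.asarray(llrs, dtype=float), bits)
    signs = np.sign(q) + (q == 0)          # treat quantized zeros as +1
    mags = np.abs(q)
    order = np.argsort(mags)
    m1, m2 = mags[order[0]], mags[order[1]]
    out = np.full_like(mags, m1)
    out[order[0]] = m2                     # each edge excludes its own input
    return saturate(np.prod(signs) * signs * out, bits)

print(check_node_min_sum([3.2, -1.4, 7.9, 0.6]))   # e.g. [-1.  1. -1. -1.]
```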
Most modern communication standards employ Orthogonal Frequency Division Multiplexing (OFDM) as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse, in charge of symbol (de)modulation. Requirements on throughput and energy efficiency call for hardware FFT implementation, while the ubiquity of the FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for the implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operand bit-widths and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communication standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementation results are presented for two deep sub-micron standard-cell libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and the same system-level performance (throughput, transform size and numerical accuracy).
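The accuracy-driven configuration loop can be pictured with a toy word-length search: grow the fixed-point operand width until a quantized transform meets the user's accuracy budget. This sketch covers only the fixed-point case, with invented helper names; the actual compiler also explores the block floating-point variants:

```python
import numpy as np

# Sketch only: an accuracy-driven word-length loop. Helper names are
# invented; the real engine also covers block floating-point arithmetic.

def quantize(x, bits):
    """Round to a fixed-point grid with `bits` fractional bits."""
    scale = 2.0 ** bits
    return np.round(x * scale) / scale

def fft_snr_db(x, bits):
    """SNR of a DFT whose input and twiddle factors are quantized."""
    n, ref = len(x), np.fft.fft(x)
    k = np.arange(n)
    tw = quantize(np.exp(-2j * np.pi * np.outer(k, k) / n), bits)
    err = tw @ quantize(x, bits) - ref
    return 10 * np.log10(np.sum(np.abs(ref) ** 2) / np.sum(np.abs(err) ** 2))

def minimum_bits(x, snr_budget_db, max_bits=24):
    """Smallest operand width meeting the user's accuracy budget."""
    for bits in range(4, max_bits + 1):
        if fft_snr_db(x, bits) >= snr_budget_db:
            return bits
    return max_bits

rng = np.random.default_rng(0)
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
print("fractional bits needed for a 60 dB budget:", minimum_bits(x, 60.0))
```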
The final part of this dissertation focuses on the Network-on-Chip design paradigm, whose goal is building scalable communication infrastructures connecting hundreds of cores. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables relaxed skew constraints in clock tree synthesis, frequency speed-up, reduced power consumption and faster back-end turnaround. The proposed architecture reaches a maximum clock frequency of 1 GHz on a 65 nm low-leakage CMOS standard-cell library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption.
Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction, technology-independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an object-oriented framework, integrated within a commercial tool for IP packaging (the Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the number of configurations to verify by up to three orders of magnitude.
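The metacoding idea, generating RTL from an object model instead of a directive-riddled codebase, can be hinted at with a small sketch; the class, module and port names below are invented, and the real framework targets Synopsys CoreTools packaging:

```python
# Sketch only: generate technology-independent RTL from an object model
# rather than a preprocessor-directive-riddled codebase. Class, module and
# port names are invented for illustration.

class FifoSpec:
    def __init__(self, name, width, depth):
        self.name, self.width, self.depth = name, width, depth

    def emit_verilog(self):
        aw = max(1, (self.depth - 1).bit_length())     # address width
        return f"""module {self.name} (
  input  wire clk, rst, push, pop,
  input  wire [{self.width - 1}:0] din,
  output wire [{self.width - 1}:0] dout
);
  reg [{self.width - 1}:0] mem [0:{self.depth - 1}];
  reg [{aw}:0] wr, rd;   // extra MSB distinguishes full from empty (flags omitted)
  always @(posedge clk) begin
    if (rst) begin wr <= 0; rd <= 0; end
    else begin
      if (push) begin mem[wr[{aw - 1}:0]] <= din; wr <= wr + 1; end
      if (pop)  rd <= rd + 1;
    end
  end
  assign dout = mem[rd[{aw - 1}:0]];
endmodule
"""

# One object model, many concrete codebases: each configuration is a
# constructor call, not a preprocessor branch.
print(FifoSpec("noc_vc_fifo", width=32, depth=8).emit_verilog())
```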
Delay Extraction based Macromodeling with Parallel Processing for Efficient Simulation of High Speed Distributed Networks
This thesis attempts to address the computational demands of accurately modeling high-speed distributed networks such as interconnect networks and power distribution networks. To do so, two different approaches to modeling high-speed distributed networks are considered. One approach deals with cases where the physical characteristics of the network are not known and the network is characterized by its frequency-domain tabulated data; examples include long interconnect networks described by their Y-parameter data. For this class of problems, a novel delay-extraction-based IFFT algorithm has been developed for accurate transient response simulation.
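A hedged sketch of the delay-extraction idea: factor the dominant propagation delay out of the tabulated data so the remainder is smooth enough for an accurate inverse transform. Estimating the delay from the unwrapped phase slope is an assumption of this sketch, not necessarily the thesis' estimator:

```python
import numpy as np

# Sketch only: delay extraction from tabulated frequency data. The phase-
# slope estimator and the test response are illustrative assumptions.

def extract_delay(freqs, H):
    """Split H(f) = exp(-j*2*pi*f*tau) * H_smooth(f) via the phase slope."""
    phase = np.unwrap(np.angle(H))
    tau = -np.polyfit(2.0 * np.pi * freqs, phase, 1)[0]   # least-squares slope
    return tau, H * np.exp(2j * np.pi * freqs * tau)

# Tabulated data for an ideal delayed RC response.
f = np.linspace(0.0, 5e9, 1024)
tau_true, rc = 1.0e-9, 5e-11
H = np.exp(-2j * np.pi * f * tau_true) / (1.0 + 2j * np.pi * f * rc)
tau, H_smooth = extract_delay(f, H)
print(f"extracted delay: {tau:.2e} s")   # close to the true 1e-9 s
# H_smooth now varies slowly, so an inverse transform of modest size
# resolves it accurately; tau is reapplied as an exact time shift.
```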
The other modeling approach is based on detailed knowledge of the physical and electrical characteristics of the network, assuming a quasi-transverse mode of propagation of the electromagnetic wave through the network. Such problems may include two-dimensional (2D) and three-dimensional (3D) power distribution networks with known geometry and materials. For this class of problems, a delay-extraction-based macromodeling approach is proposed which captures the distributed effects of the network, resulting in more compact and accurate simulation compared to state-of-the-art quasi-static lumped models. Furthermore, waveform-relaxation-based algorithms for parallel simulation of large interconnect networks and 2D power distribution networks are also presented. A key contribution of this body of work is the identification of naturally parallelizable and convergent iterative techniques that can divide the computational cost of solving such large macromodels across multi-core hardware.
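The naturally parallel iteration mentioned here is classical Gauss-Jacobi waveform relaxation: partition the system, solve each block over the whole time window using the neighbors' waveforms from the previous sweep, and iterate to convergence. A minimal sketch on a toy state-space model (the block partitioning and time stepping are illustrative choices):

```python
import numpy as np

# Sketch only: Gauss-Jacobi waveform relaxation on x' = A x. Each block is
# solved independently over the full window (hence parallelizable), coupled
# to the other blocks only through the previous iteration's waveforms.

def waveform_relaxation(A, x0, t, n_blocks=2, iters=10):
    n = len(x0)
    blocks = np.array_split(np.arange(n), n_blocks)
    X = np.tile(x0[:, None], (1, len(t)))     # initial guess: hold x0 flat
    dt = t[1] - t[0]
    for _ in range(iters):                    # relaxation sweeps
        X_new = X.copy()
        for blk in blocks:                    # blocks are independent -> parallel
            other = np.setdiff1d(np.arange(n), blk)
            x = x0[blk].astype(float)
            M = np.eye(len(blk)) - dt * A[np.ix_(blk, blk)]
            for k in range(1, len(t)):        # backward Euler within the block
                rhs = x + dt * (A[np.ix_(blk, other)] @ X[other, k])
                x = np.linalg.solve(M, rhs)
                X_new[blk, k] = x
        X = X_new
    return X

A = np.array([[-2.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -1.0]])
t = np.linspace(0.0, 2.0, 201)
X = waveform_relaxation(A, np.array([1.0, 0.0, 0.0]), t)
print("voltages at t = 2:", X[:, -1])
```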
Physics-Based Electromigration Modeling and Analysis and Optimization
Long-term reliability is a major concern in modern VLSI design. The literature has shown that reliability worsens as technology advances, and future VLSI systems are expected to have shorter reliability-induced lifetimes than previous generations. As one of the most serious reliability effects, electromigration (EM) is the physical phenomenon of metal atoms migrating due to momentum exchange between the atoms and the conducting electrons. It can cause wire resistance change or open circuits and result in functional failure of the circuit. Power-ground networks are the part most vulnerable to EM among all interconnect wires, since they carry the largest current flow on the chip. With new-generation technology nodes and aggressive design strategies, more accurate and efficient EM models are required; traditional EM approaches are very conservative and cannot meet current aggressive design strategies. Besides the circuit level, EM also needs to be studied thoroughly at the system level due to the limited power and temperature budgets among cores on a chip. This research focuses on developing physical-level EM models for VLSI circuits and system-level EM optimization for multi-core systems in order to overcome the aforementioned problems.

Specifically, at the physical level, we develop two EM immortality check methods and a power grid EM check method. First, a voltage-based EM immortality analysis has been developed: the immortality condition in the nucleation phase can be determined quickly and accurately for multi-segment interconnect wires. Second, a saturation-volume-based incubation-phase immortality check method has been proposed, which can further reduce redundancy in VLSI circuit design by checking immortality in multiple phases. Both immortality check methods are integrated into a new power grid EM check methodology (EMspice) as filters for EM analysis. These filters accelerate the simulation by screening out immortal trees, so that simulation is needed only for the fewer trees that are mortal. Coupled EM simulation, considering both the hydrostatic stress and the electric current/voltage in the power grid network, is then applied to these mortal trees. The tool works seamlessly with commercial synthesis flows.

Besides physical-level reliability models, system-level reliability optimization is also discussed in this research. A deep reinforcement learning based EM optimization has been proposed for multi-core systems, considering both the long-term reliability effect (hard errors) and transient soft errors. Energy can be optimized quickly and accurately under reliability and other constraints, compared to existing reliability management techniques. Last but not least, a scheduling-based reliability optimization method for multi-core systems has been proposed in which NBTI, HCI and EM are considered jointly. The lifetime of the system can be improved significantly compared to traditional methods, which mainly focus on utilization.
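As a rough illustration of the voltage-based immortality idea (published steady-state analyses show that nucleation-phase stress in an interconnect tree tracks the deviation of node voltage from a length-weighted tree average), here is a hedged sketch; the constants and the weighting scheme are assumptions, not EMspice's actual criterion:

```python
import numpy as np

# Sketch only. Steady-state nucleation-phase stress at a node of a tree is
# taken as proportional to (V_node - V_avg), with V_avg a length-weighted
# average over the tree. All constants below are assumed placeholders.

E_CHARGE, Z_EFF, OMEGA = 1.602e-19, 1.0, 1.66e-29  # C, eff. charge, atomic volume (m^3)

def tree_is_immortal(node_V, seg_len, seg_nodes, sigma_crit=4.1e8):
    """Immortality filter: True if peak steady-state stress < sigma_crit (Pa)."""
    w = np.asarray(seg_len, dtype=float)
    v_mid = np.array([(node_V[i] + node_V[j]) / 2.0 for i, j in seg_nodes])
    v_avg = np.sum(w * v_mid) / np.sum(w)          # length-weighted tree average
    sigma_peak = (E_CHARGE * Z_EFF / OMEGA) * np.max(np.abs(node_V - v_avg))
    return sigma_peak < sigma_crit

# Three-segment tree, nodes 0-1-2-3, 1 um segments, millivolt-scale IR drops.
V = np.array([0.0, 2e-3, 3e-3, 3.5e-3])
print(tree_is_immortal(V, [1e-6] * 3, [(0, 1), (1, 2), (2, 3)]))  # True -> skip simulation
```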
Custom Integrated Circuits
Contains table of contents for Part III, table of contents for Section 1, and reports on eleven research projects.

Sponsors: IBM Corporation; MIT School of Engineering; National Science Foundation (Grant MIP 94-23221); Defense Advanced Research Projects Agency / U.S. Army Intelligence Center (Contract DABT63-94-C-0053); Mitsubishi Corporation; National Science Foundation Young Investigator Award Fellowship (MIP 92-58376); Joint Industry Program on Offshore Structure Analysis; Analog Devices; Defense Advanced Research Projects Agency; Cadence Design Systems; MAFET Consortium; Consortium for Superconducting Electronics; National Defense Science and Engineering Graduate Fellowship; Digital Equipment Corporation; MIT Lincoln Laboratory; Semiconductor Research Corporation; Multiuniversity Research Initiative; National Science Foundation
Custom Integrated Circuits
Contains reports on six research projects.

Sponsors: U.S. Air Force - Office of Scientific Research (Contract F49620-84-C-0004); Analog Devices, Inc.; Defense Advanced Research Projects Agency (Contract N00014-80-C-0622); National Science Foundation (Grant ECS83-10941)
Automatic synthesis and optimization of chip multiprocessors
Microprocessor technology has experienced enormous growth during the last decades. Rapid downscaling of CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.

CMP architects are challenged to take many complex design decisions. Just a few of them are:
- What should be the ratio between the core and cache areas on a chip?
- Which core architectures should be selected?
- How many cache levels should the memory subsystem have?
- Which interconnect topologies provide efficient on-chip communication?

These and many other aspects create a complex multidimensional space for architectural exploration. Design automation tools become essential to make the architectural exploration feasible under hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generations of on-chip architectures with hundreds or thousands of cores.

Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee full usage of the benefits brought by many-core technology. These techniques have to consider the peculiarities of modern architectures, such as the availability of enhanced power-saving techniques and the presence of complex memory hierarchies.

This thesis has several objectives. The first objective is to elaborate methods for efficient analytical modeling and architectural design-space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and by replacing exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.

The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.

Finally, the third objective of this thesis is to address the issues of on-chip interconnect design and exploration by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.

The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document, several possible directions for future research are proposed.
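A toy illustration of analytical exploration in this spirit: an Amdahl-style throughput model stands in for simulation, and the area budget couples the core-count and cache-size decisions. All constants are invented, and the exhaustive scan below is affordable only because the toy space is tiny; the thesis replaces it with an intelligent search:

```python
import itertools

# Toy only: analytical model + constrained search over (cores, cache MB).
# Every constant is invented for illustration.

AREA_BUDGET = 400.0                  # mm^2
CORE_AREA, CACHE_AREA_PER_MB = 4.0, 2.0

def throughput(cores, cache_mb, parallel_frac=0.95):
    """Amdahl-style speedup discounted by a toy cache-miss factor."""
    speedup = 1.0 / ((1.0 - parallel_frac) + parallel_frac / cores)
    miss_factor = 1.0 / (1.0 + 0.3 * cache_mb ** 0.5)
    return speedup * (1.0 - 0.5 * miss_factor)

feasible = ((c, m) for c, m in itertools.product(range(1, 129), range(1, 129))
            if c * CORE_AREA + m * CACHE_AREA_PER_MB <= AREA_BUDGET)
best = max(feasible, key=lambda cm: throughput(*cm))
print("best (cores, cache MB):", best, "->", round(throughput(*best), 2))
```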
Thermal Aware Design Method for VCSEL-Based On-Chip Optical Interconnect
Optical Network-on-Chip (ONoC) is an emerging technology considered one of the key solutions for future-generation on-chip interconnects. However, silicon photonic devices in an ONoC are highly sensitive to temperature variation, which lowers the efficiency of Vertical-Cavity Surface-Emitting Lasers (VCSELs), shifts the resonant wavelength of Microring Resonators (MRs), and results in a lower Signal-to-Noise Ratio (SNR). In this paper, we propose a methodology enabling thermal-aware design for optical interconnects relying on CMOS-compatible VCSELs. Thermal simulations allow designing ONoC interfaces with a low temperature gradient, and analytical models allow evaluating the SNR.

Comment: IEEE International Conference on Design, Automation and Test in Europe (DATE 2015), Mar 2015, Grenoble, France. 201
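A hedged sketch of the kind of analytical SNR evaluation the abstract alludes to: VCSEL efficiency degrades with temperature and microring resonances drift with it, so the detuning-induced insertion loss sets the worst-case SNR. The drift figure of roughly 0.1 nm/K is commonly quoted for silicon microrings, but every other coefficient here is an invented placeholder:

```python
import numpy as np

# Sketch only: worst-case link SNR under a temperature gradient. The
# ~0.1 nm/K microring drift is a commonly quoted figure; the VCSEL
# penalty, linewidth and noise floor below are invented placeholders.

DRIFT_NM_PER_K = 0.1

def mr_transmission(detune_nm, fwhm_nm=0.4):
    """Lorentzian drop-port transmission of a microring detuned by detune_nm."""
    x = 2.0 * detune_nm / fwhm_nm
    return 1.0 / (1.0 + x * x)

def link_snr_db(p_laser_dbm, dT_vcsel, dT_mr, noise_dbm=-30.0):
    p_sig = p_laser_dbm - 0.03 * dT_vcsel      # VCSEL efficiency loss (dB/K, assumed)
    p_sig += 10.0 * np.log10(mr_transmission(DRIFT_NM_PER_K * dT_mr))
    return p_sig - noise_dbm

print("worst-case SNR at a 20 K gradient:",
      round(link_snr_db(0.0, 20.0, 20.0), 1), "dB")
```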