
    MS thesis

    As microelectronics continue to scale, transistor delay decreases while wire delay remains roughly constant or even increases. Wire (interconnect) delay is quickly becoming the key performance-limiting factor in integrated circuit design. This thesis determines the feasibility of replacing conventional diffusive wires with transmission lines and compares the tradeoffs of the two systems. Transmission lines propagate signals at the speed of light in the medium and depend far less on repeaters than comparable diffusive wires, so a transmission-line system has potentially large power and performance benefits. To compare the tradeoffs, five design metrics are used: propagation delay, power consumption, maximum throughput, area requirements, and noise tolerance. Transmission lines prove to be an excellent replacement for diffusive wires, especially for lengths beyond 500 µm. For a 1 cm interconnect, the transmission line shows more than a 90% improvement in delay and more than an 80% improvement in energy per bit transmitted. In practice, fabricating transmission lines on real integrated circuits is difficult because it requires precise extraction of resistance, inductance, and capacitance parameters. Using tools developed by Mentor Graphics specifically for this thesis, the wire dimensions needed to produce various transmission lines are calculated for IBM's 65 nm process.
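
    The delay argument can be made concrete with two textbook first-order models: the 50% delay of an unrepeated distributed RC line, roughly 0.38 r c L², grows quadratically with length L, while the time of flight on a lossless transmission line, L √ε_r / c₀, grows linearly. The sketch below compares them; the per-unit-length values are hypothetical, the transmission-line figure assumes a wide, low-loss line whose inductance dominates, and driver, termination, and repeater effects are ignored, so it is not the thesis's simulation setup and its crossover length depends entirely on the assumed parameters.

```python
import math

C0 = 2.998e8  # speed of light in vacuum, m/s

def rc_wire_delay(length_m: float, r_per_m: float, c_per_m: float) -> float:
    """50% delay of an unrepeated distributed RC line: 0.38 * (r L) * (c L)."""
    return 0.38 * r_per_m * c_per_m * length_m ** 2

def tline_delay(length_m: float, eps_r: float) -> float:
    """Time of flight on an ideal lossless line: L * sqrt(eps_r) / c0."""
    return length_m * math.sqrt(eps_r) / C0

def crossover_length(eps_r: float, r_per_m: float, c_per_m: float) -> float:
    """Length beyond which time of flight beats the RC delay in this toy model."""
    return math.sqrt(eps_r) / (C0 * 0.38 * r_per_m * c_per_m)

if __name__ == "__main__":
    # Hypothetical values: 0.1 ohm/um, 0.2 fF/um, SiO2-like dielectric.
    R, C, EPS = 1e5, 2e-10, 3.9
    print(f"crossover ~ {crossover_length(EPS, R, C) * 1e6:.0f} um")
    for L in (500e-6, 1e-3, 1e-2):
        print(f"L = {L * 1e3:5.1f} mm: RC {rc_wire_delay(L, R, C) * 1e12:7.1f} ps, "
              f"t-line {tline_delay(L, EPS) * 1e12:6.1f} ps")
```

    Doubling the length quadruples the RC delay but only doubles the time of flight, which is why the advantage of transmission lines grows with interconnect length.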

    Circuit design and analysis for on-FPGA communication systems

    On-chip communication has emerged as a critically important subject in Very-Large-Scale Integration (VLSI) design, as technology scaling favours logic over interconnect. Interconnect often dictates system performance, so research into new methodologies and system architectures that deliver high-performance communication across the chip is essential. The interconnect challenge is exacerbated in Field-Programmable Gate Arrays (FPGAs), integrated circuits whose hardware can be programmed after fabrication. Communication across an FPGA will deteriorate as a result of interconnect scaling, and the programmable fabric, switches, and routing architecture introduce additional latency and bandwidth degradation that further hinder intra-chip communication performance. Past research mainly focused on optimizing logic elements and functional units in FPGAs; communication over programmable interconnect received little attention and remains poorly understood. This thesis is among the first to study on-chip communication systems built on top of programmable fabrics, and it proposes methodologies to maximize interconnect throughput. It makes three major contributions: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestion in reconfigurable architectures; (ii) a new analogue wave-signalling scheme that significantly improves interconnect throughput by exploiting the fundamental electrical characteristics of reconfigurable interconnect structures, and which can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming network (DP-network) that provides adaptive routing in network-on-chip (NoC) systems.
The DP-network architecture performs runtime optimization for route planning and dynamic routing, which effectively utilizes the in-silicon bandwidth. This thesis explores a new direction in reconfigurable system design, proposing methodologies and concepts that enhance on-FPGA communication throughput, which is of vital importance in new technology processes.
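
    The abstract does not describe the DP-network's recurrence, but dynamic-programming route planning on a mesh is commonly expressed as a Bellman-style cost-to-go: each router's cost to a destination is the minimum, over its neighbours, of the link cost plus that neighbour's cost-to-go, and packets are forwarded to the minimizing neighbour. The sketch below assumes a 2D mesh and a hypothetical congestion-weighted link cost; it shows only that recurrence in software, not the thesis's hardware architecture.

```python
from typing import Callable, Dict, Iterator, Tuple

Node = Tuple[int, int]
LinkCost = Callable[[Node, Node], float]

def neighbors(node: Node, width: int, height: int) -> Iterator[Node]:
    """Yield the existing mesh neighbours (east, west, north, south)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def cost_to_go(dest: Node, width: int, height: int, link_cost: LinkCost) -> Dict[Node, float]:
    """Bellman-Ford-style DP: V(n) = min over neighbours m of link_cost(n, m) + V(m)."""
    value = {(x, y): float("inf") for x in range(width) for y in range(height)}
    value[dest] = 0.0
    for _ in range(width * height - 1):  # at most |nodes| - 1 relaxation sweeps
        changed = False
        for node in value:
            if node == dest:
                continue
            best = min(link_cost(node, m) + value[m] for m in neighbors(node, width, height))
            if best < value[node]:
                value[node] = best
                changed = True
        if not changed:
            break
    return value

def next_hop(node: Node, value: Dict[Node, float], width: int, height: int,
             link_cost: LinkCost) -> Node:
    """Forward to the neighbour with the lowest link cost plus remaining cost-to-go."""
    return min(neighbors(node, width, height), key=lambda m: link_cost(node, m) + value[m])

# Hypothetical congestion: every link entering router (1, 1) costs 5, all others cost 1.
def congested(a: Node, b: Node) -> float:
    return 5.0 if b == (1, 1) else 1.0

V = cost_to_go((2, 1), 3, 3, congested)
hop = next_hop((0, 1), V, 3, 3, congested)  # detours around the congested router
```

    Recomputing the cost-to-go when link costs change is what makes such routing adaptive at runtime.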

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10⁻⁶, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10⁻⁶. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current fault-tolerant (FT) designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing; it is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and to produce more accurate results.
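
    Why redundancy buys yield can be seen from a standard binomial spare-unit model: if devices fail independently with probability p, a unit of N devices works with probability (1 − p)^N, and a structure that needs k of n such units works with the probability that at least k succeed. This is a generic k-of-n model under an independence assumption, not the thesis's CAM-cache or NAND-multiplexing model, and the row sizes and counts below are hypothetical.

```python
from math import comb

def unit_yield(devices: int, p_fail: float) -> float:
    """Probability that all `devices` independent devices in a unit work."""
    return (1.0 - p_fail) ** devices

def k_of_n_yield(n: int, k: int, q: float) -> float:
    """Probability that at least k of n independent units work (each works with prob. q)."""
    return sum(comb(n, i) * q**i * (1.0 - q) ** (n - i) for i in range(k, n + 1))

# Hypothetical memory: 64 required rows of 10,000 devices each, p_fail = 3e-6.
q_row = unit_yield(10_000, 3e-6)
no_spares = k_of_n_yield(64, 64, q_row)
with_spares = k_of_n_yield(72, 64, q_row)  # 8 spare rows, i.e. 12.5% extra hardware

print(f"row yield {q_row:.4f}, no spares {no_spares:.4f}, 8 spares {with_spares:.6f}")
```

    The hardware cost of this scheme is the ratio n/k; the trade-off the abstract describes is between that overhead and the yield gained at a given device failure probability.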

    Worst-Case Analysis of Electrical and Electronic Equipment via Affine Arithmetic

    In the design and fabrication of electronic equipment, many unknown parameters significantly affect product performance. Some uncertainties stem from manufacturing process fluctuations, while others are due to the environment, such as operating temperature, voltage, and various ambient aging stressors. Considering these uncertainties is desirable to ensure product performance, improve yield, and reduce design cost. Since direct electromagnetic compatibility measurements affect both cost and time-to-market, there is a growing demand for tools that simulate electrical and electronic equipment including the effects of system uncertainties. In this framework, the device response is no longer regarded as deterministic but as a random process. It is traditionally analyzed using Monte Carlo or other sampling-based methods, whose drawback is the large number of samples required to converge, which makes them time-consuming for practical applications. As an alternative, inherently worst-case approaches such as interval analysis directly estimate the true bounds of the responses. However, such approaches may produce overly pessimistic margins that are very unlikely to occur in practice. A more recent technique, affine arithmetic, improves on interval-based methods by handling correlated intervals. However, it still leads to over-conservatism because it cannot take probability information into account. The objective of this thesis is to improve the accuracy of affine arithmetic and broaden its application to frequency-domain analysis. We first extend existing results in the literature to efficient time-domain analysis of lumped circuits with uncertainties. We then extend basic affine arithmetic to frequency-domain simulation of circuits.
Classical circuit-analysis tools are used within a modified affine framework that accounts for complex algebra and partitions the uncertainty intervals, enabling accurate and efficient computation of worst-case bounds on the responses of both lumped and distributed circuits. The performance of the proposed approach is investigated through extensive simulations in several case studies, and the results are compared with the Monte Carlo method in terms of both simulation time and accuracy.
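
    The key idea of affine arithmetic is to represent an uncertain quantity as x = x₀ + Σ xᵢ εᵢ, where each noise symbol εᵢ ∈ [−1, 1] is shared by every quantity that depends on the same source, so correlations cancel instead of widening the bounds. The sketch below implements only basic real-valued affine forms (addition, negation, and multiplication with the usual conservative rad(x)·rad(y) remainder on a fresh symbol); the thesis's extension to complex algebra and interval partitioning is not shown, and the class and method names are illustrative.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict

_next_symbol = count(1)  # global source of fresh noise-symbol ids

@dataclass
class Affine:
    center: float
    terms: Dict[int, float] = field(default_factory=dict)  # noise symbol id -> coefficient

    @staticmethod
    def from_interval(lo: float, hi: float) -> "Affine":
        """Independent uncertain parameter in [lo, hi], with its own noise symbol."""
        return Affine((lo + hi) / 2, {next(_next_symbol): (hi - lo) / 2})

    def radius(self) -> float:
        return sum(abs(v) for v in self.terms.values())

    def interval(self):
        r = self.radius()
        return (self.center - r, self.center + r)

    def __add__(self, other):
        if not isinstance(other, Affine):
            return Affine(self.center + other, dict(self.terms))
        terms = dict(self.terms)
        for k, v in other.terms.items():
            terms[k] = terms.get(k, 0.0) + v  # shared symbols combine (and may cancel)
        return Affine(self.center + other.center, terms)

    __radd__ = __add__

    def __neg__(self):
        return Affine(-self.center, {k: -v for k, v in self.terms.items()})

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        if not isinstance(other, Affine):
            return Affine(self.center * other, {k: v * other for k, v in self.terms.items()})
        terms: Dict[int, float] = {}
        for k, v in self.terms.items():
            terms[k] = terms.get(k, 0.0) + other.center * v
        for k, v in other.terms.items():
            terms[k] = terms.get(k, 0.0) + self.center * v
        # The nonlinear part is bounded by rad(x) * rad(y) and given a fresh symbol.
        terms[next(_next_symbol)] = self.radius() * other.radius()
        return Affine(self.center * other.center, terms)

    __rmul__ = __mul__

x = Affine.from_interval(1.0, 3.0)  # an uncertain parameter in [1, 3]
print((x - x).interval())           # (0.0, 0.0); plain interval arithmetic gives [-2, 2]
```

    The fresh symbol introduced by each multiplication is where conservatism creeps back in, which is the over-estimation the thesis sets out to reduce.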

    Advanced Integrated Power and Attitude Control System (IPACS) study

    Integrated Power and Attitude Control System (IPACS) studies performed over a decade ago established the feasibility of simultaneously meeting energy-storage and attitude-control demands with rotating flywheels. For a wide spectrum of applications, such a system was shown to have many advantages over contemporary energy storage and attitude control approaches. More recent advances in composite-material rotors, magnetic suspension systems, and power control electronics have renewed optimism about the applicability and merits of this concept. This study defines an advanced IPACS and evaluates its merits for a space station application. System and component designs are developed to establish the performance of the concept, and system trade studies are conducted to examine its viability relative to conventional candidate systems. The results clearly demonstrate that an advanced IPACS is not only feasible but also offers substantial savings in mass and life-cycle cost for the space station mission.
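
    The abstract does not give the governing relations, but the reason one set of flywheels can serve both functions follows from textbook rigid-body mechanics. The derivation below assumes an idealized coaxial pair of counter-rotating wheels with moments of inertia I₁, I₂ and spin-rate magnitudes ω₁, ω₂ (symbols introduced here, not taken from the study), stored energy E, net angular momentum H, bus power P = dE/dt, and attitude-control torque τ = dH/dt.

```latex
\begin{aligned}
E &= \tfrac{1}{2} I_1 \omega_1^2 + \tfrac{1}{2} I_2 \omega_2^2, &
H &= I_1 \omega_1 - I_2 \omega_2, \\
P &= \omega_1 \,(I_1 \dot{\omega}_1) + \omega_2 \,(I_2 \dot{\omega}_2), &
\tau &= I_1 \dot{\omega}_1 - I_2 \dot{\omega}_2, \\
I_1 \dot{\omega}_1 &= \frac{P + \omega_2 \tau}{\omega_1 + \omega_2}, &
I_2 \dot{\omega}_2 &= \frac{P - \omega_1 \tau}{\omega_1 + \omega_2}.
\end{aligned}
```

    Any commanded power P (charge or discharge) and torque τ can therefore be realized at the same time, within the wheels' speed limits; with τ = 0 both wheels change momentum equally, so the pair exchanges energy with the bus without disturbing the spacecraft's net angular momentum.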

    Electro-Thermal Codesign in Liquid Cooled 3D ICs: Pushing the Power-Performance Limits

    Performance improvements in today's computer systems are usually accompanied by increased chip power consumption and system temperature. Modern CPUs dissipate an average of 70-100 W, while spatial and temporal power variations create hotspots with even higher power density (up to 300 W/cm²). CPU power dissipation will continue to increase significantly in the coming years due to advanced multi-core architectures and 3D integration technologies. Increased chip power density, leakage power, and system temperature have already become major obstacles to further improvements in chip performance. Conventional air-cooled heat sinks have proven insufficient for three-dimensional integrated circuits (3D-ICs), so better cooling solutions are necessary. Micro-fluidic cooling, which integrates micro-channel heat sinks into the silicon substrates of the chip and uses liquid flow to remove heat from inside the chip, is an effective active cooling scheme for 3D-ICs. While it cools 3D-ICs very effectively, its overhead (the power consumed by the pump to drive coolant through the micro-channels) is significant. Moreover, the 3D-IC structure constrains micro-channel locations, mainly through resource conflicts with through-silicon vias (TSVs) and other structures. In this work, we investigate optimized micro-channel configurations that address these considerations. We develop three micro-channel structures (a hotspot-optimized cooling configuration, a bended micro-channel, and a hybrid cooling network) that provide sufficient cooling to 3D-ICs with minimal cooling power overhead while remaining compatible with existing electrical structures such as TSVs. These configurations achieve up to 70% cooling power savings compared with an unoptimized configuration.
Based on these configurations, we then develop a micro-fluidic-cooling-based dynamic thermal management approach that regulates chip temperature by controlling the fluid flow rate (pressure drop) through the micro-channels. These cooling configurations are designed after the electrical components and are therefore compatible with the current standard IC design flow. However, the electrical, thermal, cooling, and mechanical aspects of a 3D-IC are interdependent, so the conventional flow, which designs the cooling configuration only after the electrical design is finished, results in inefficiencies. To overcome this problem, we investigate an electrical-thermal co-design methodology for 3D-ICs. Two co-design problems are explored: TSV assignment with micro-channel placement, and gate sizing with fluidic cooling. Experimental results show that co-design enables a fundamental power-performance improvement over the conventional flow that separates electrical and cooling design. For example, gate sizing and fluidic cooling co-design achieves 12% power savings under the same circuit timing constraint and a 16% circuit speedup under the same power budget.
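
    The control idea behind flow-rate-based thermal management can be sketched with a deliberately simple lumped model: for laminar flow the coolant flow rate is Q = ΔP / R_hyd, so pump power ΔP · Q = ΔP² / R_hyd grows quadratically with pressure drop, while the peak temperature falls as the coolant's own temperature rise P_chip / (ρc_p Q) shrinks. Choosing the smallest ΔP that still meets the temperature limit therefore minimizes cooling power. All resistances and limits below are hypothetical; only water's volumetric heat capacity (about 4.18 × 10⁶ J/(m³·K)) is a physical constant, and the model is not the thesis's thermal simulator.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CoolingModel:
    t_inlet: float = 300.0   # K, coolant inlet temperature (hypothetical)
    t_limit: float = 358.0   # K, maximum allowed peak temperature (hypothetical)
    r_th: float = 0.2        # K/W, lumped conduction + convection resistance (hypothetical)
    rho_cp: float = 4.18e6   # J/(m^3 K), volumetric heat capacity of water
    r_hyd: float = 1e11      # Pa*s/m^3, hydraulic resistance of the channels (hypothetical)
    dp_max: float = 2e5      # Pa, pump pressure limit (hypothetical)

    def peak_temperature(self, p_chip: float, dp: float) -> float:
        """Peak temperature for chip power p_chip (W) and pressure drop dp > 0 (Pa)."""
        q = dp / self.r_hyd  # laminar flow: Q proportional to pressure drop
        return self.t_inlet + p_chip * self.r_th + p_chip / (self.rho_cp * q)

    def min_pressure_drop(self, p_chip: float) -> Optional[float]:
        """Smallest pressure drop meeting t_limit, or None if the pump cannot achieve it."""
        margin = self.t_limit - self.t_inlet - p_chip * self.r_th
        if margin <= 0:
            return None  # even infinite flow cannot meet the limit in this model
        q_min = p_chip / (self.rho_cp * margin)
        dp = q_min * self.r_hyd
        return dp if dp <= self.dp_max else None

    def pump_power(self, dp: float) -> float:
        """Hydraulic pumping power dP * Q = dP^2 / R_hyd (W)."""
        return dp * dp / self.r_hyd

if __name__ == "__main__":
    model = CoolingModel()
    for p_chip in (60.0, 100.0, 140.0):  # hypothetical power samples in watts
        dp = model.min_pressure_drop(p_chip)
        if dp is None:
            print(f"{p_chip:.0f} W: infeasible, throttle the chip")
        else:
            print(f"{p_chip:.0f} W: dp = {dp:.0f} Pa, pump power = {model.pump_power(dp):.4f} W")
```

    Re-evaluating the minimum pressure drop as the measured chip power changes is the essence of the dynamic approach: the pump works hard only when the workload demands it.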

    Algorithms for Circuit Sizing in VLSI Design

    One of the key problems in the physical design of computer chips, also known as integrated circuits, consists of choosing a physical layout for the logic gates and memory circuits (registers) on the chip. The layouts strongly influence the power consumption and area of the chip and the delay of signal paths. A library provides a discrete set of predefined layouts with different physical properties for each logic function and register type. One of the most influential characteristics of a circuit defined by its layout is its size. In this thesis we present new algorithms for the problem of choosing circuit sizes and for its continuous relaxation, and evaluate them in theory and practice. A popular approach is based on Lagrangian relaxation and projected subgradient methods. We show that seemingly heuristic modifications proposed for this approach can be theoretically justified by applying the well-known multiplicative weights algorithm. We then propose a new model of the sizing problem as a min-max resource sharing problem, in which power consumption and signal delays are represented by resources that are distributed to customers. Under certain assumptions we obtain a polynomial-time approximation for the continuous relaxation of the sizing problem that improves over the Lagrangian-relaxation-based approach. The new resource sharing algorithm has been implemented as part of the BonnTools software package, developed at the Research Institute for Discrete Mathematics at the University of Bonn in cooperation with IBM. Our experiments on the ISPD 2013 benchmarks and state-of-the-art microprocessor designs provided by IBM show that the new algorithm converges more stably than a Lagrangian-relaxation-based algorithm. Additionally, better timing and lower power consumption were achieved on almost all instances.
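
    The min-max resource sharing view can be illustrated with a generic multiplicative-weights loop: resources carry prices, each customer's block oracle picks its cheapest option under the current prices, prices of heavily loaded resources are raised multiplicatively, and the averaged choices form a fractional (continuous) solution. This is a simplified textbook scheme, not the BonnTools algorithm with its step sizes and guarantees; the toy instance, eps, and iteration count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Option = Sequence[float]  # resource usage vector of one candidate solution of a customer

def mwu_resource_sharing(customers: List[List[Option]], num_resources: int,
                         iterations: int = 200, eps: float = 0.1
                         ) -> Tuple[List[List[float]], float]:
    """Multiplicative-weights sketch for fractional min-max resource sharing.

    Returns each customer's fractional mix over its options and the resulting
    maximum resource load. Usages are assumed non-negative.
    """
    prices = [1.0] * num_resources
    # Width: an upper bound on any resource's load within a single iteration.
    width = (max(max(opt) for opts in customers for opt in opts) * len(customers)) or 1.0
    counts = [[0] * len(opts) for opts in customers]
    for _ in range(iterations):
        load = [0.0] * num_resources
        for c, opts in enumerate(customers):
            # Block oracle: cheapest option under the current resource prices.
            costs = [sum(p * g for p, g in zip(prices, opt)) for opt in opts]
            best = costs.index(min(costs))
            counts[c][best] += 1
            for r, g in enumerate(opts[best]):
                load[r] += g
        # Raise prices of heavily used resources multiplicatively, then renormalize.
        prices = [p * math.exp(eps * u / width) for p, u in zip(prices, load)]
        total = sum(prices)
        prices = [p / total for p in prices]
    mix = [[k / iterations for k in row] for row in counts]
    final_load = [0.0] * num_resources
    for c, opts in enumerate(customers):
        for frac, opt in zip(mix[c], opts):
            for r, g in enumerate(opt):
                final_load[r] += frac * g
    return mix, max(final_load)

# Toy instance: two customers, each may load either resource 0 or resource 1.
customers = [[(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (0.0, 1.0)]]
mix, max_load = mwu_resource_sharing(customers, num_resources=2)
```

    A price-blind choice would send both customers to resource 0 (maximum load 2); the price updates steer the averaged mix toward the optimum of 1.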
A subproblem of the new algorithm consists of finding sizes that minimize a weighted sum of power consumption and signal delays. We describe a method that approximates the continuous relaxation of this problem in polynomial time under certain assumptions. For the discrete problem we provide a fully polynomial approximation scheme under certain assumptions on the topology of the chip. Finally, we present a new algorithm for timing-driven optimization of registers. Register sizes and locations are usually determined during the clock network design phase and remain mostly unchanged afterwards, although the timing criticalities on which they were based can change. Our algorithm permutes register positions and sizes within so-called clusters without impairing the clock network, so it can be applied late in a design flow. Under mild assumptions, it finds an optimal solution that maximizes the worst cluster slack. It is implemented as part of the BonnTools and improves register timing on state-of-the-art microprocessor designs by up to 7.8% of the design cycle time.
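
    Maximizing the worst slack over permutations within a cluster has the shape of a bottleneck assignment problem. Assuming each register's slack in each candidate slot (position plus size) is known and independent of where the other registers go, a simplification introduced here and not necessarily the thesis's formulation, it can be solved by binary-searching a slack threshold and testing whether a perfect bipartite matching exists using only register-slot pairs that meet the threshold. The function names and slack values are hypothetical.

```python
from typing import List, Optional, Tuple

def _perfect_matching(allowed: List[List[bool]]) -> Optional[List[int]]:
    """Kuhn's augmenting-path matching; returns slot index per register, or None."""
    n = len(allowed)
    match_of_slot = [-1] * n

    def try_assign(r: int, seen: List[bool]) -> bool:
        for s in range(n):
            if allowed[r][s] and not seen[s]:
                seen[s] = True
                if match_of_slot[s] == -1 or try_assign(match_of_slot[s], seen):
                    match_of_slot[s] = r
                    return True
        return False

    for r in range(n):
        if not try_assign(r, [False] * n):
            return None
    assignment = [-1] * n
    for s, r in enumerate(match_of_slot):
        assignment[r] = s
    return assignment

def max_worst_slack(slack: List[List[float]]) -> Optional[Tuple[float, List[int]]]:
    """Bottleneck assignment on a non-empty square slack[register][slot] table."""
    n = len(slack)
    values = sorted({v for row in slack for v in row})
    lo, hi, best = 0, len(values) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        allowed = [[slack[r][s] >= values[mid] for s in range(n)] for r in range(n)]
        assignment = _perfect_matching(allowed)
        if assignment is not None:
            best = (values[mid], assignment)  # feasible: try a higher threshold
            lo = mid + 1
        else:
            hi = mid - 1
    return best

# Hypothetical cluster with two registers and two slots (slack in ps).
slack = [[-5.0, 2.0],
         [1.0, -3.0]]
worst, assignment = max_worst_slack(slack)  # swapping the registers lifts the worst slack
```

    The lowest threshold always admits a perfect matching in a square table, so the search returns the optimal worst slack together with one assignment that achieves it.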