436 research outputs found

    Optimal simultaneous mapping and clustering for FPGA delay optimization

    Get PDF

    An Ultra-Low-Energy, Variation-Tolerant FPGA Architecture Using Component-Specific Mapping

    Get PDF
    As feature sizes scale toward atomic limits, parameter variation continues to increase, leading to increased margins in both delay and energy. Parameter variation both slows down devices and causes devices to fail. For applications that require high performance, the possibility of very slow devices on critical paths forces designers to reduce clock speed in order to meet timing. For an important and emerging class of applications that target energy-minimal operation at the cost of delay, the impact of variation-induced defects at very low voltages mandates the sizing up of transistors and operation at higher voltages to maintain functionality. With post-fabrication configurability, FPGAs have the opportunity to self-measure the impact of variation, determining the speed and functionality of each individual resource. Given that information, a delay-aware router can use slow devices on non-critical paths, fast devices on critical paths, and avoid known defects. By mapping each component individually and customizing designs to a component's unique physical characteristics, we demonstrate that we can eliminate delay margins and reduce energy margins caused by variation. To quantify the potential benefit we might gain from component-specific mapping, we first measure the margins associated with parameter variation, and then focus primarily on the energy benefits of FPGA delay-aware routing over a wide range of predictive technologies (45 nm--12 nm) for the Toronto20 benchmark set. We show that relative to delay-oblivious routing, delay-aware routing without any significant optimizations can reduce minimum energy/operation by 1.72x at 22 nm. We demonstrate how to construct an FPGA architecture specifically tailored to further increase the minimum energy savings of component-specific mapping by using the following techniques: power gating, gate sizing, interconnect sparing, and LUT remapping. With all optimizations considered we show a minimum energy/operation savings of 2.66x at 22 nm, or 1.68--2.95x when considered across 45--12 nm. As there are many challenges to measuring resource delays and mapping per chip, we discuss methods that may make component-specific mapping more practical. We demonstrate that a simpler, defect-aware routing achieves 70% of the energy savings of delay-aware routing. Finally, we show that without variation tolerance, scaling from 16 nm to 12 nm results in a net increase in minimum energy/operation; component-specific mapping, however, can extend minimum energy/operation scaling to 12 nm and possibly beyond.</p

    CAD Techniques for Robust FPGA Design Under Variability

    Get PDF
    The imperfections in the semiconductor fabrication process and uncertainty in operating environment of VLSI circuits have emerged as critical challenges for the semiconductor industry. These are generally termed as process and environment variations, which lead to uncertainty in performance and unreliable operation of the circuits. These problems have been further aggravated in scaled nanometer technologies due to increased process variations and reduced operating voltage. Several techniques have been proposed recently for designing digital VLSI circuits under variability. However, most of them have targeted ASICs and custom designs. The flexibility of reconfiguration and unknown end application in FPGAs make design under variability different for FPGAs compared to ASICs and custom designs, and the techniques proposed for ASICs and custom designs cannot be directly applied to FPGAs. An important design consideration is to minimize the modifications in architecture and circuit to reduce the cost of changing the existing FPGA architecture and circuit. The focus of this work can be divided into three principal categories, which are, improving timing yield under process variations, improving power yield under process variations and improving the voltage profile in the FPGA power grid. The work on timing yield improvement proposes routing architecture enhancements along with CAD techniques to improve the timing yield of FPGA designs. The work on power yield improvement for FPGAs selects a low power dual-Vdd FPGA design as the baseline FPGA architecture for developing power yield enhancement techniques. It proposes CAD techniques to improve the power yield of FPGAs. A mathematical programming technique is proposed to determine the parameters of the buffers in the interconnect such as the sizes of the transistors and threshold voltage of the transistors, all within constraints, such that the leakage variability is minimized under delay constraints. Two CAD techniques are investigated and proposed to improve the supply voltage profile of the power grids in FPGAs. The first technique is a place and route technique and the second technique is a logic clustering technique to reduce IR-drops and spatial variation of supply voltage in the power grid

    Digital Circuit Design Using Floating Gate Transistors

    Get PDF
    Floating gate (flash) transistors are used exclusively for memory applications today. These applications include SD cards of various form factors, USB flash drives and SSDs. In this thesis, we explore the use of flash transistors to implement digital logic circuits. Since the threshold voltage of flash transistors can be modified at a fine granularity during programming, several advantages are obtained by our flash-based digital circuit design approach. For one, speed binning at the factory can be controlled with precision. Secondly, an IC can be re-programmed in the field, to negate effects such as aging, which has been a significant problem in recent times, particularly for mission-critical applications. Thirdly, unlike a regular MOSFET, which has one threshold voltage level, a flash transistor can have multiple threshold voltage levels. The benefit of having multiple threshold voltage levels in a flash transistor is that it allows the ability to encode more symbols in each device, unlike a regular MOSFET. This allows us to implement multi-valued logic functions natively. In this thesis, we evaluate different flash-based digital circuit design approaches and compare their performance with a traditional CMOS standard cell-based design approach. We begin by evaluating our design approach at the cell level to optimize the design’s delay, power energy and physical area characteristics. The flash-based approach is demonstrated to be better than the CMOS standard cell approach, for these performance metrics. Afterwards, we present the performance of our design approach at the block level. We describe a synthesis flow to decompose a circuit block into a network of interconnected flash-based circuit cells. We also describe techniques to optimize the resulting network of flash-based circuit cells using don’t cares. Our optimization approach distinguishes itself from other optimization techniques that use don’t cares, since it a) targets a flash-based design flow, b) optimizes clusters of logic nodes at once instead of one node at a time, c) attempts to reduce the number of cubes instead of reducing the number of literals in each cube and d) performs optimization on the post-technology mapped netlist which results in a direct improvement in result quality, as compared to pre-technology mapping logic optimization that is typically done in the literature. The resulting network characteristics (delay, power, energy and physical area) are presented. These results are compared with a standard cell-based realization of the same block (obtained using commercial tools) and we demonstrate significant improvements in all the design metrics. We also study flash-based FPGA designs (both static and dynamic), and present the tradeoff of delay, power dissipation and energy consumption of the various designs. Our work differs from previously proposed flash-based FPGAs, since we embed the flash transistors (which store the configuration bits) directly within the logic and interconnect fabrics. We also present a detailed description of how the programming of the configuration bits is accomplished, for all the proposed designs

    Digital Circuit Design Using Floating Gate Transistors

    Get PDF
    Floating gate (flash) transistors are used exclusively for memory applications today. These applications include SD cards of various form factors, USB flash drives and SSDs. In this thesis, we explore the use of flash transistors to implement digital logic circuits. Since the threshold voltage of flash transistors can be modified at a fine granularity during programming, several advantages are obtained by our flash-based digital circuit design approach. For one, speed binning at the factory can be controlled with precision. Secondly, an IC can be re-programmed in the field, to negate effects such as aging, which has been a significant problem in recent times, particularly for mission-critical applications. Thirdly, unlike a regular MOSFET, which has one threshold voltage level, a flash transistor can have multiple threshold voltage levels. The benefit of having multiple threshold voltage levels in a flash transistor is that it allows the ability to encode more symbols in each device, unlike a regular MOSFET. This allows us to implement multi-valued logic functions natively. In this thesis, we evaluate different flash-based digital circuit design approaches and compare their performance with a traditional CMOS standard cell-based design approach. We begin by evaluating our design approach at the cell level to optimize the design’s delay, power energy and physical area characteristics. The flash-based approach is demonstrated to be better than the CMOS standard cell approach, for these performance metrics. Afterwards, we present the performance of our design approach at the block level. We describe a synthesis flow to decompose a circuit block into a network of interconnected flash-based circuit cells. We also describe techniques to optimize the resulting network of flash-based circuit cells using don’t cares. Our optimization approach distinguishes itself from other optimization techniques that use don’t cares, since it a) targets a flash-based design flow, b) optimizes clusters of logic nodes at once instead of one node at a time, c) attempts to reduce the number of cubes instead of reducing the number of literals in each cube and d) performs optimization on the post-technology mapped netlist which results in a direct improvement in result quality, as compared to pre-technology mapping logic optimization that is typically done in the literature. The resulting network characteristics (delay, power, energy and physical area) are presented. These results are compared with a standard cell-based realization of the same block (obtained using commercial tools) and we demonstrate significant improvements in all the design metrics. We also study flash-based FPGA designs (both static and dynamic), and present the tradeoff of delay, power dissipation and energy consumption of the various designs. Our work differs from previously proposed flash-based FPGAs, since we embed the flash transistors (which store the configuration bits) directly within the logic and interconnect fabrics. We also present a detailed description of how the programming of the configuration bits is accomplished, for all the proposed designs

    POWER-AWARE TECHNOLOGY MAPPING AND ROUTING FOR DUAL-VT FPGAS

    Get PDF
    Master'sMASTER OF ENGINEERIN

    Aggressive undervolting of FPGAs : power & reliability trade-offs

    Get PDF
    In this work, we evaluate aggressive undervolting, i.e., voltage underscaling below the nominal level to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs). Usually, voltage guardbands are added by chip vendors to ensure the worst-case process and environmental scenarios. Through experimenting on several FPGA architectures, we con¿rm a large voltage guardband for several FPGA components, which in turn, delivers signi¿cant power savings. However, further undervolting below the voltage guardband may cause reliability issues as the result of the circuit delay increase, and faults might start to appear. We extensively characterize the behavior of these faults in terms of the rate, location, type, as well as sensitivity to environmental temperature, primarily focusing on FPGA on-chip memories, or Block RAMs (BRAMs). Understanding this behavior can allow to deploy ef¿cient mitigation techniques, and in turn, FPGA-based designs can be improved for better energy, reliability, and performance trade-offs. Finally, as a case study, we evaluate a typical FPGA-based Neural Network (NN) accelerator when the FPGA voltage is underscaled. In consequence, the substantial NN energy savings come with the cost of NN accuracy loss. To attain power savings without NN accuracy loss below the voltage guardband gap, we proposed an application-aware technique and we also, evaluated the built-in Error-Correcting Code (ECC) mechanism. Hence, First, we developed an application-dependent BRAMs placement technique that relies on the deterministic behavior of undervolting faults, and mitigates these faults by mapping the most reliability sensitive NN parameters to BRAM blocks that are relatively more resistant to undervolting faults. Second, as a more general technique, we applied the built-in ECC of BRAMs and observed a signi¿cant fault coverage capability thanks to the behavior of undervolting faults, with a negligible power consumption overhead.En este trabajo, evaluamos el reducir el voltaje en forma agresiva, es decir, bajar la tensión por debajo del nivel nominal para reducir el consumo de energía en Field Programmable Gate Arrays (FPGA). Por lo general, los vendedores de chips establecen margen de seguridad al voltaje para garantizar el funcionamiento de los mismos en el peor de los casos y en los peores escenarios ambientales. Mediante la experimentación en varias arquitecturas FPGA, confirmamos que hay un margen de seguridad de voltaje grande en varios de los componentes de la FPGA, que a su vez, nos ofrece ahorros de energía significativos. Sin embargo, un trabajar a un voltaje por debajo del margen de seguridad del voltaje puede causar problemas de confiabilidad a medida ya que aumenta el retardo del circuito y pueden comenzar a aparecer fallos. Caracterizamos ampliamente el comportamiento de estos fallos en términos de velocidad, ubicación, tipo, así como la sensibilidad a la temperatura ambiental, centrándonos principalmente en memorias internas de la FPGA, o Block RAM (BRAM). Comprender este comportamiento puede permitir el desarrollo de técnicas eficientes de mitigación y, a su vez, mejorar los diseños basados en FPGA para obtener ahorros en energía, una mayor confiabilidad y un mayor rendimiento. Finalmente, como caso de estudio, evaluamos un acelerador típico de Redes Neuronales basado en FPGA cuando el voltaje de la FPGA esta por debajo del nivel mínimo de seguridad. En consecuencia, los considerables ahorros de energía de la red neuronal vienen asociados con la pérdida de precisión de la red neuronal. Para obtener ahorros de energía sin una pérdida de precisión en la red neuronal por debajo del margen de seguridad del voltaje, proponemos una técnica que tiene en cuenta la aplicación, asi mismo, evaluamos el mecanismo integrado en las BRAMs de Error Correction Code (ECC). Por lo tanto, en primer lugar, desarrollamos una técnica de colocación de BRAM dependiente de la aplicación que se basa en el comportamiento determinista de las fallos cuando la FPGA funciona por debajo del margen de seguridad, y se mitigan estos fallos asignando los parámetros de la red neuronal más sensibles a producir fallos a los bloques BRAM que son relativamente más resistentes a los fallos. En segundo lugar, como técnica más general, aplicamos el ECC incorporado de los BRAM y observamos una capacidad de cobertura de fallos significativo gracias a las características de comportamiento de fallos, con una sobrecoste de consumo de energía insignificantePostprint (published version

    Towards Power- and Energy-Efficient Datacenters

    Full text link
    As the Internet evolves, cloud computing is now a dominant form of computation in modern lives. Warehouse-scale computers (WSCs), or datacenters, comprising the foundation of this cloud-centric web have been able to deliver satisfactory performance to both the Internet companies and the customers. With the increased focus and popularity of the cloud, however, datacenter loads rise and grow rapidly, and Internet companies are in need of boosted computing capacity to serve such demand. Unfortunately, power and energy are often the major limiting factors prohibiting datacenter growth: it is often the case that no more servers can be added to datacenters without surpassing the capacity of the existing power infrastructure. This dissertation aims to investigate the issues of power and energy usage in a modern datacenter environment. We identify the source of power and energy inefficiency at three levels in a modern datacenter environment and provides insights and solutions to address each of these problems, aiming to prepare datacenters for critical future growth. We start at the datacenter-level and find that the peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, degrading the compute capacity the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to address this issue and design systematic methods to reduce the fragmentation and improve the utilization of the power budget. This dissertation then narrow the focus to examine the energy usage of individual servers running cloud workloads. Especially, we examine the power management mechanisms employed in these servers and find that the coarse time granularity of these mechanisms is one critical factor that leads to excessive energy consumption. We propose an intelligent and low overhead solution on top of the emerging finer granularity voltage/frequency boosting circuit to effectively pinpoints and boosts queries that are likely to increase the tail distribution and can reap more benefit from the voltage/frequency boost, improving energy efficiency without sacrificing the quality of services. The final focus of this dissertation takes a further step to investigate how using a fundamentally more efficient computing substrate, field programmable gate arrays (FPGAs), benefit datacenter power and energy efficiency. Different from other types of hardware accelerations, FPGAs can be reconfigured on-the-fly to provide fine-grain control over hardware resource allocation and presents a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set coordinated algorithms to manage these two key factors simultaneously and fully explore the benefit of deploying FPGAs in the highly varying cloud environment.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd

    Leakage Power Modeling and Reduction Techniques for Field Programmable Gate Arrays

    Get PDF
    FPGAs have become quite popular for implementing digital circuits and systems because of reduced costs and fast design cycles. This has led to increased complexity of FPGAs, and with technology scaling, many new challenges have come up for the FPGA industry, leakage power being one of the key challenges. The current generation FPGAs are being implemented in 90nm technology, therefore, managing leakage power in deep-submicron FPGAs has become critical for the FPGA industry to remain competitive in the semiconductor market and to enter the mobile applications domain. In this work an analytical state dependent leakage power model for FPGAs is developed, followed by dual-Vt based designs of the FPGA architecture for reducing leakage power. The leakage power model computes subthreshold and gate leakage in FPGAs, since these are the two dominant components of total leakage power in the scaled nanometer technologies. The leakage power model takes into account the dependency of gate and subthreshold leakage on the state of the circuit inputs. The leakage power model has two main components, one which computes the probability of a state for a particular FPGA circuit element, and the other which computes the leakage of the FPGA circuit element for a given input using analytical equations. This FPGA power model is particularly important for rapidly analyzing various FPGA architectures across different technology nodes. Dual-Vt based designs of the FPGA architecture are proposed, developed, and evaluated, for reducing the leakage power using a CAD framework. The logic and the routing resources of the FPGA are considered for dual-Vt assignment. The number of the logic elements that can be assigned high-Vt in the ideal case by using a dual-Vt assignment algorithm in the CAD framework is estimated. Based upon this estimate two kinds of architectures are developed and evaluated, homogeneous and heterogeneous architectures. Results indicate that leakage power savings of up to 50% can be obtained from these architectures. The analytical state dependent leakage power model developed has been used for estimating the leakage power savings from the dual-Vt FPGA architectures. The CAD framework that has been developed can also be used for developing and evaluating different dual-Vt FPGA architectures, other than the ones proposed in this work

    Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems

    Get PDF
    corecore