29 research outputs found

    GROK-FPGA: Generating Real on-Chip Knowledge for FPGA Fine-Grain Delays Using Timing Extraction

    Circuit variation is one of the biggest problems to overcome if Moore's Law is to continue. It is no longer possible to maintain an abstraction of identical devices without huge yield losses, performance penalties, and energy costs. Current techniques such as margining and grade binning are used to deal with this problem. However, they tend to be conservative, offering limited solutions that will not scale as variation increases. Conventional circuits use limited tests and statistical models to determine the margining and binning required to counteract variation. If the limited tests fail, the whole chip is discarded. On the other hand, reconfigurable circuits, such as FPGAs, can use more fine-grained, aggressive techniques that carefully choose which resources to use in order to mitigate variation. Knowing which resources to use and avoid, however, requires measurement of underlying variation. We present Timing Extraction, a methodology that measures process variation without expensive testers or highly invasive techniques, relying only on resources already available on conventional FPGAs. It takes advantage of the fact that we can measure the delay of logic paths between any two registers. Measuring enough paths provides the information necessary to decompose the delay of each path into individual components, essentially forming a system of linear equations. Determining which paths to measure requires simple graph transformation algorithms applied to a representation of the FPGA circuit. Ultimately, this process decomposes the FPGA into individual components and identifies which paths to measure for computing the delay of individual components. We apply Timing Extraction to 18 commercially available Altera Cyclone III (65 nm) FPGAs. We measure 22×28 logic clusters and the interconnect within and between clusters. Timing Extraction decomposes this region into 1,356,182 components, classified into 10 categories, requiring 2,736,556 path measurements. With an accuracy of ±3.2 ps, our measurements reveal regional variation on the order of 50 ps, systematic variation from 30 ps to 70 ps, and random variation in the clusters with σ=15 ps and in the interconnect with σ=62 ps.
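The decomposition idea in the abstract above (each measured path delay is the sum of the delays of the components on that path, giving a system of linear equations) can be illustrated with a toy sketch. This is not the authors' implementation: the component names, the three example paths, the delay values, and the helper `solve_component_delays` are all hypothetical, and the sketch assumes exactly as many independent paths as components, whereas a real flow would likely measure many more paths and handle measurement noise.

```python
def solve_component_delays(components, paths, measured_ps):
    """Solve A x = d, where A[i][j] = 1 if path i traverses component j.

    Hypothetical helper: assumes a square, non-singular system
    (one independent measured path per unknown component delay).
    """
    n = len(components)
    if len(paths) != n or len(measured_ps) != n:
        raise ValueError("this sketch needs exactly one path per component")
    index = {name: j for j, name in enumerate(components)}

    # Build the augmented matrix [A | d].
    rows = []
    for path, delay in zip(paths, measured_ps):
        row = [0.0] * (n + 1)
        for name in path:
            row[index[name]] += 1.0
        row[n] = float(delay)
        rows.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("measured paths do not determine every component")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0.0:
                f = rows[r][col]
                rows[r] = [vr - f * vc for vr, vc in zip(rows[r], rows[col])]

    return {name: rows[index[name]][n] for name in components}


# Made-up example: three components, three register-to-register paths.
components = ["a", "b", "c"]
paths = [("a", "b"), ("b", "c"), ("a", "c")]
measured_ps = [500.0, 700.0, 600.0]
delays = solve_component_delays(components, paths, measured_ps)
```

With these made-up numbers the individual delays come out as 200 ps, 300 ps, and 400 ps, none of which was measured directly.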

    Algorithms and Techniques for Conquering Extreme Physical Variation in Bottom-Up Nanoscale Systems

    Nanowire building blocks provide a promising path to small feature size and thus the ability to more densely pack logic. However, the small feature size and novel, bottom-up manufacturing process will exhibit extreme variation and produce many devices that operate outside acceptable operating ranges. One-mapping-fits-all, prefabrication assignment of logical functions to physical transistors that exhibit high threshold variation will not work: combining the wide range of physical variation in transistor threshold voltage with the wide range of fanouts in the design produces an unworkably large composite range of possible delays. Nonetheless, by carefully matching the fanout of each net to the physical threshold voltages of devices after fabrication, it is possible to reduce the net range of path delays sufficiently to achieve high system yield. The complete threshold voltage distribution present in the system can be characterized at a rate of 108 resources per second by augmenting the system with voltage comparison mechanisms. By adding a modest amount of extra resources, we achieve 100% yield for systems built out of devices with 38% variation, the ITRS prediction for threshold variation in 5 nm transistors. Moreover, for these systems, we maintain delay, energy and area close to the variation-free nominal case. What’s more, there is only a 10% overhead when the measurement precision is limited to ten discrete threshold voltage values.
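A toy sketch can show why post-fabrication matching of fanout to threshold voltage narrows the delay range. Everything here is an assumption made for illustration: the delay model `toy_delay` (load proportional to fanout, drive proportional to the overdrive `vdd - vth`), the greedy sort-and-pair strategy, and the sample values. The abstract above does not specify the matching algorithm, so this stands in for it rather than reproducing it.

```python
def toy_delay(fanout, vth, vdd=1.0):
    # Hypothetical model: delay grows with fanout (load) and shrinks
    # with gate overdrive (vdd - vth). Not taken from the source.
    return fanout / (vdd - vth)


def match_by_sorting(fanouts, vths):
    # Greedy pairing: the highest-fanout net gets the lowest-threshold
    # (strongest) device, the next-highest the next-lowest, and so on.
    nets = sorted(range(len(fanouts)), key=lambda i: fanouts[i], reverse=True)
    devices = sorted(range(len(vths)), key=lambda j: vths[j])
    return dict(zip(nets, devices))


def delay_spread(fanouts, vths, assignment):
    # Range (max - min) of net delays under a net -> device assignment.
    delays = [toy_delay(fanouts[n], vths[d]) for n, d in assignment.items()]
    return max(delays) - min(delays)


# Made-up sample: three nets and three fabricated devices.
fanouts = [1, 2, 4]
vths = [0.2, 0.3, 0.5]

oblivious = {0: 0, 1: 1, 2: 2}          # fixed, prefabrication assignment
matched = match_by_sorting(fanouts, vths)

spread_oblivious = delay_spread(fanouts, vths, oblivious)
spread_matched = delay_spread(fanouts, vths, matched)
```

In this sample the fixed assignment happens to put the heaviest net on the weakest device, so the matched assignment yields a noticeably smaller spread under the toy model.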

    Crystals and Snowflakes: Building Computation from Stochastically-Assembled, Defect- and Variation-prone Nanowire Crossbars


    Evaluation of design strategies for stochastically assembled nanoarray memories

    A key challenge facing nanotechnologies is learning to control uncertainty introduced by stochastic self-assembly. In this article, we explore architectural and manufacturing strategies to cope with this uncertainty when assembling nanoarrays, crossbars composed of two orthogonal sets of parallel nanowires (NWs) that are differentiated at their time of manufacture. NW deposition is a stochastic process and the NW encodings present in an array cannot be known in advance. We explore the reliable construction of memories from stochastically assembled arrays. This is accomplished by describing several families of NW encodings and developing strategies to map external binary addresses onto internal NW encodings using programmable circuitry. We explore a variety of different mapping strategies and develop probabilistic methods of analysis. This is the first article that makes clear the wide range of choices that are available.

    Techniques for Fault Reduction in Out-of-Order Microprocessors

    This paper addresses the issue of reducing transient faults that affect instructions while they are in the instruction queue waiting to be executed. Previous work has shown that for an in-order processor, squashing instructions triggered by a cache miss can reduce the number of transient faults. This paper shows that for an out-of-order processor, reducing the size of the instruction queue can have a bigger impact than more adaptive techniques such as fetch halting. Ongoing work will explore more effective techniques for selective fetch halting to provide a reduction in faults committed while having a minimal impact on performance.

    GROK-LAB


    GROK-LAB: generating real on-chip knowledge for intra-cluster delays using timing extraction

    Timing Extraction identifies the delay of fine-grained components within an FPGA. From these computed delays, the delay of any path can be calculated. Moreover, a comparison of the fine-grained delays allows a detailed understanding of the amount and type of process variation that exists in the FPGA. To obtain these delays, Timing Extraction measures, using only resources already available in the FPGA, the delay of a small subset of the total paths in the FPGA. We apply Timing Extraction to the Logic Array Block (LAB) on an Altera Cyclone III FPGA to obtain a view of the delay down to near individual LUT granularity, characterizing components with delays on the order of a few hundred picoseconds with a resolution of ±3.2 ps. This information reveals that the 65 nm process used has, on average, random variation of σ/µ = 4.0% with components having an average maximum spread of 83 ps. Timing Extraction also shows that as VDD decreases from 1.2 V to 0.9 V in a Cyclone IV 60 nm FPGA, paths slow down and variation increases from σ/µ = 4.3% to σ/µ = 5.8%, a clear indication that lowering VDD magnifies the impact of random variation.
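The two statistics quoted in the abstract above, relative random variation σ/µ and maximum spread, are simple to compute once per-component delays are available. The sketch below uses Python's standard `statistics` module on made-up delay values. The function name `variation_summary` is hypothetical, and the choice of population standard deviation (`pstdev`) is an assumption, since the listing does not say which estimator the authors used.

```python
import statistics


def variation_summary(delays_ps):
    """Return (sigma_over_mu, max_spread_ps) for nominally identical components.

    Assumption: population standard deviation; the source does not
    state whether a population or sample estimator was used.
    """
    mu = statistics.mean(delays_ps)
    sigma = statistics.pstdev(delays_ps)
    spread = max(delays_ps) - min(delays_ps)
    return sigma / mu, spread


# Made-up delays (in ps) for four instances of the same component type.
sample_delays_ps = [100.0, 104.0, 96.0, 100.0]
sigma_over_mu, max_spread_ps = variation_summary(sample_delays_ps)
```

For this sample the mean is 100 ps and the spread is 8 ps, so σ/µ comes out a little under 3%.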

    3D Nanowire-Based Programmable Logic

    In nanowire-based logic, the semiconducting material (e.g., Si, GaN, SiGe) is grown into individual nanowires rather than being part of the substrate. This offers us the opportunity to stack multiple layers of nanowires to create a three-dimensional logic structure which has high quality semiconductors in all vertical layers. The authors detail a feasible three-dimensional programmable logic architecture which can plausibly be realized from layers of semiconducting nanowires, making only modest assumptions about the control and placement of individual nanowires in the assembly. This shows a natural path for continuing to scale areal logic density once nanowire pitches approach fundamental limits. The authors show that the three-dimensional systems are volumetrically efficient, with the surface area reducing roughly in proportion to the number of vertical layers. The authors further show that, on average, delay is reduced 18% from compact layout in three dimensions. For only a 20% area impact, the authors show how to avoid adding any manufacturing steps to physically isolate portions of nanowire layers.