210 research outputs found

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity without requiring special design skills or tools. In this paper, we first study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.

    Fast algorithms for retiming large digital circuits

    The increasing complexity of VLSI systems and shrinking time-to-market requirements demand good optimization tools capable of handling large circuits. Retiming is a powerful transformation that preserves functionality, and can be used to optimize sequential circuits for a wide range of objective functions by judiciously relocating the memory elements. Leiserson and Saxe, who introduced the concept, presented algorithms for period optimization (minperiod retiming) and area optimization (minarea retiming). The ASTRA algorithm proposed an alternative view of retiming using the equivalence between retiming and clock skew optimization. The first part of this thesis defines the relationship between the Leiserson-Saxe and the ASTRA approaches and utilizes it for efficient minarea retiming of large circuits. The new algorithm, Minaret, uses the same linear program formulation as the Leiserson-Saxe approach. The underlying philosophy of the ASTRA approach is incorporated to reduce the number of variables and constraints in this linear program. This allows minarea retiming of circuits with over 56,000 gates in under fifteen minutes. The movement of flip-flops in control logic changes the state encoding of finite state machines, requiring the preservation of initial (reset) states. In the next part of this work, the problem of minimizing the number of flip-flops in control logic, subject to a specified clock period and with the guarantee of an equivalent initial state, is formulated as a mixed integer linear program. Bounds on the retiming variables are used to guarantee an equivalent initial state in the retimed circuit. These bounds lead to a simple method for calculating an equivalent initial state for the retimed circuit. The transparent nature of level-sensitive latches enables level-clocked circuits to operate faster and require less area. However, this transparency makes the operation of level-clocked circuits very complex, and optimization of level-clocked circuits is a difficult task. This thesis also presents efficient algorithms for retiming large level-clocked circuits. The relationship between retiming and clock skew optimization for level-clocked circuits is defined and utilized to develop efficient retiming algorithms for period and area optimization. Using these algorithms, a circuit with 56,000 gates could be retimed for minimum period in under twenty seconds and for minimum area in under 1.5 hours.

    Video face replacement

    We present a method for replacing facial performances in video. Our approach accounts for differences in identity, visual appearance, speech, and timing between source and target videos. Unlike prior work, it does not require substantial manual operation or complex acquisition hardware, only single-camera video. We use a 3D multilinear model to track the facial performance in both videos. Using the corresponding 3D geometry, we warp the source to the target face and retime the source to match the target performance. We then compute an optimal seam through the video volume that maintains temporal consistency in the final composite. We showcase the use of our method on a variety of examples and present the result of a user study that suggests our results are difficult to distinguish from real video footage. Funding: National Science Foundation (U.S.) (Grant PHY-0835713); National Science Foundation (U.S.) (Grant DMS-0739255)

    Exploiting parallelism within multidimensional multirate digital signal processing systems

    The demand for high processing rates in practical applications of multidimensional Digital Signal Processing systems justifies Application-Specific Integrated Circuit designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies and architectures for designing high-performance VLSI implementations of general multidimensional multirate Digital Signal Processing systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. These transformations are applied to the algorithms, leading to systematic methodologies for high-performance architectural design. With the novel design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results have shown that those architectures are much more efficient in terms of execution time and/or hardware cost compared with existing hardware implementations.

    Design-for-Test and Test Optimization Techniques for TSV-based 3D Stacked ICs

    As integrated circuits (ICs) continue to scale to smaller dimensions, long interconnects have become the dominant contributor to circuit delay and a significant component of power consumption. In order to reduce the length of these interconnects, 3D integration and 3D stacked ICs (3D SICs) are active areas of research in both academia and industry. 3D SICs not only have the potential to reduce average interconnect length and alleviate many of the problems caused by long global interconnects, but they can offer greater design flexibility over 2D ICs, significant reductions in power consumption and footprint in an era of mobile applications, increased on-chip data bandwidth through delay reduction, and improved heterogeneous integration. Compared to 2D ICs, the manufacture and test of 3D ICs is significantly more complex. Through-silicon vias (TSVs), which constitute the dense vertical interconnects in a die stack, are a source of additional and unique defects not seen before in ICs. At the same time, testing these TSVs, especially before die stacking, is recognized as a major challenge. The testing of a 3D stack is constrained by limited test access, test pin availability, power, and thermal constraints. Therefore, efficient and optimized test architectures are needed to ensure that pre-bond, partial, and complete stack testing are not prohibitively expensive. Methods of testing TSVs prior to bonding continue to be a difficult problem due to test access and testability issues. Although some built-in self-test (BIST) techniques have been proposed, these techniques have numerous drawbacks that render them impractical. In this dissertation, a low-cost test architecture is introduced to enable pre-bond TSV test through TSV probing. This has the benefit of not needing large analog test components on the die, which is a significant drawback of many BIST architectures. Coupled with an optimization method described in this dissertation to create parallel test groups for TSVs, test time for pre-bond TSV tests can be significantly reduced. The pre-bond probing methodology is expanded upon to allow for pre-bond scan test as well, to enable both pre-bond TSV and structural test to bring pre-bond known-good-die (KGD) test under a single test paradigm. The addition of boundary registers on functional TSV paths required for pre-bond probing results in an increase in delay on inter-die functional paths. This cost of test architecture insertion can be a significant drawback, especially considering that one benefit of 3D integration is that critical paths can be partitioned between dies to reduce their delay. This dissertation derives a retiming flow that is used to recover the additional delay added to TSV paths by test cell insertion. Reducing the cost of test for 3D SICs is crucial considering that more tests are necessary during 3D SIC manufacturing. To reduce test cost, the test architecture and test scheduling for the stack must be optimized to reduce test time across all necessary test insertions. This dissertation examines three paradigms for 3D integration (hard dies, firm dies, and soft dies) that give varying degrees of control over 2D test architectures on each die while optimizing the 3D test architecture. Integer linear programming models are developed to provide an optimal 3D test architecture and test schedule for the dies in the 3D stack considering any or all post-bond test insertions. Results show that the ILP models outperform other optimization methods across a range of 3D benchmark circuits. In summary, this dissertation targets testing and design-for-test (DFT) of 3D SICs. The proposed techniques enable pre-bond TSV and structural test while maintaining a relatively low test cost. Future work will continue to enable testing of 3D SICs to move industry closer to realizing the true potential of 3D integration.

    Energy-Efficient Digital Circuit Design using Threshold Logic Gates

    Improving energy efficiency has always been the prime objective of custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices, and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second, a new approach called hybridization, which replaces flipflops and parts of their logic cones with PNAND cells, is described. The resulting hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND, is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to a conventional FPGA, using the well-known FPGA modeling tool VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violations on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits. Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201

    A low-power quadrature digital modulator in 0.18um CMOS

    Quadrature digital modulation techniques are widely used in modern communication systems because of their high performance and flexibility. However, these advantages come at the cost of high power consumption. As a result, power consumption has to be taken into account as a main design factor of the modulator. In this thesis, a low-power quadrature digital modulator in 0.18um CMOS is presented with a target system clock speed of 150 MHz. The quadrature digital modulator consists of several key blocks: quadrature direct digital synthesizer (QDDS), pulse shaping filter, interpolation filter and inverse sinc filter. The design strategy is to investigate different implementations for each block and compare the power consumption of these implementations. Based on the comparison results, the implementation that consumes the lowest power is chosen for each block. First of all, a novel low-power QDDS is proposed in the thesis. Power consumption estimation shows that it can save up to 60% of the power consumption at 150 MHz system clock frequency compared with one conventional design. Power consumption estimation results also show that using two pulse shaping blocks to process I/Q data, a cascaded integrator comb (CIC) interpolation structure, and an inverse sinc filter with modified canonic signed digit (MCSD) multiplication consumes less power than alternative design choices. These low-power blocks are integrated together to achieve a low-power modulator. The power consumption estimation after layout shows that it consumes only about 95 mW at a 150 MHz system clock rate, which is much lower than similar commercial products. The designed modulator can provide a low-power solution for various quadrature modulators. It also has an output bandwidth from 0 to 75 MHz, configurable pulse shaping filters and interpolation filters, and an internal sin(x)/x correction filter.

    Arbitrary Hardware/Software Trade Offs

    This paper discusses a novel transformation-based design methodology and its use in the design of complex programmable VLSI systems. During the life-cycle of a complex system, the optimal trade-off between implementing parts in hardware or in software changes. This is due to varying system requirements (short time-to-market, low cost, low power, etc.) and improving device technology. The proposed methodology allows such redesigns to be made using different hardware-software trade-offs, in a guaranteed correct way.

    A Retiming-Based Test Pattern Generator Design for Built-In Self Test of Data Path Architectures

    Recently, a new Built-In Self Test (BIST) methodology based on balanced bistable sequential kernels has been proposed that reduces the area overhead and performance degradation associated with the conventional BILBO-oriented BIST methodology. This new methodology guarantees high fault coverage but requires special test sequences and test pattern generator (TPG) designs. In this paper, we demonstrate the use of the retiming technique in designing TPGs for balanced bistable sequential kernels. Experimental results on ISCAS benchmark circuits demonstrate the effectiveness of the designed TPGs in achieving higher fault coverage than conventional maximal-length LFSR TPGs.