210 research outputs found

Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

Author: Alex Kondratyev
Christos Sotiriou
Jordi Cortadella
Lavagno Luciano
Publication date: 01/01/2006

Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture.

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Fast algorithms for retiming large digital circuits

Author: Maheshwari Naresh
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1998

The increasing complexity of VLSI systems and shrinking time to market requirements demand good optimization tools capable of handling large circuits. Retiming is a powerful transformation that preserves functionality, and can be used to optimize sequential circuits for a wide range of objective functions by judiciously relocating the memory elements. Leiserson and Saxe, who introduced the concept, presented algorithms for period optimization (minperiod retiming) and area optimization (minarea retiming). The ASTRA algorithm proposed an alternative view of retiming using the equivalence between retiming and clock skew optimization. The first part of this thesis defines the relationship between the Leiserson-Saxe and the ASTRA approaches and utilizes it for efficient minarea retiming of large circuits. The new algorithm, Minaret, uses the same linear program formulation as the Leiserson-Saxe approach. The underlying philosophy of the ASTRA approach is incorporated to reduce the number of variables and constraints in this linear program. This allows minarea retiming of circuits with over 56,000 gates in under fifteen minutes. The movement of flip-flops in control logic changes the state encoding of finite state machines, requiring the preservation of initial (reset) states. In the next part of this work the problem of minimizing the number of flip-flops in control logic subject to a specified clock period and with the guarantee of an equivalent initial state, is formulated as a mixed integer linear program. Bounds on the retiming variables are used to guarantee an equivalent initial state in the retimed circuit. These bounds lead to a simple method for calculating an equivalent initial state for the retimed circuit. The transparent nature of level sensitive latches enables level-clocked circuits to operate faster and require less area. However, this transparency makes the operation of level-clocked circuits very complex, and optimization of level-clocked circuits is a difficult task. This thesis also presents efficient algorithms for retiming large level-clocked circuits. The relationship between retiming and clock skew optimization for level-clocked circuits is defined and utilized to develop efficient retiming algorithms for period and area optimization. Using these algorithms a circuit with 56,000 gates could be retimed for minimum period in under twenty seconds and for minimum area in under 1.5 hours.
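The retiming transformation mentioned in this abstract can be illustrated with a minimal sketch of the standard Leiserson-Saxe rule, in which each edge's register count changes to w_r(u, v) = w(u, v) + r(v) - r(u). This is not the Minaret or ASTRA algorithm from the thesis; the function name `retime`, the three-node graph, and the lag values are hypothetical and chosen only for illustration.

```python
def retime(edges, lags):
    """Apply a retiming to a circuit graph.

    edges: dict mapping (u, v) -> number of registers on the edge u -> v
    lags:  dict mapping vertex -> integer retiming lag r(vertex)
    Returns the retimed register counts w_r(u, v) = w(u, v) + r(v) - r(u).
    Raises ValueError if any edge would carry a negative register count,
    which means the retiming is not legal.
    """
    retimed = {}
    for (u, v), w in edges.items():
        w_r = w + lags.get(v, 0) - lags.get(u, 0)
        if w_r < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} gets {w_r} registers")
        retimed[(u, v)] = w_r
    return retimed


# Hypothetical three-gate loop with two registers (illustrative values only).
edges = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
lags = {"a": 0, "b": -1, "c": 0}

retimed = retime(edges, lags)
# The register on a->b moves across gate b onto b->c; the number of
# registers around the cycle is unchanged, as retiming requires.
print(retimed)
```

The lags cancel around any cycle, so the total register count on the loop is preserved, which is why retiming keeps the circuit's functionality while relocating memory elements.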

Digital Repository @ Iowa State University (ISU)

CiteSeerX

Video face replacement

Author: Daniel Vlasic
DeCarlo D.
Essa I.
Everingham M.
Hanspeter Pfister
Jones A.
Kalyan Sunkavalli
Kemelmacher-Shlizerman I.
Kevin Dale
Micah K. Johnson
Pighin F. H.
Robertson B.
Viola P. A.
Wojciech Matusik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011

We present a method for replacing facial performances in video. Our approach accounts for differences in identity, visual appearance, speech, and timing between source and target videos. Unlike prior work, it does not require substantial manual operation or complex acquisition hardware, only single-camera video. We use a 3D multilinear model to track the facial performance in both videos. Using the corresponding 3D geometry, we warp the source to the target face and retime the source to match the target performance. We then compute an optimal seam through the video volume that maintains temporal consistency in the final composite. We showcase the use of our method on a variety of examples and present the result of a user study that suggests our results are difficult to distinguish from real video footage. Funding: National Science Foundation (U.S.) (Grant PHY-0835713); National Science Foundation (U.S.) (Grant DMS-0739255)

DSpace@MIT

Crossref

Exploiting parallelism within multidimensional multirate digital signal processing systems

Author: Peng Dongming
Publication venue: Texas A&M University
Publication date: 30/09/2004

The demand for high processing rates in practical multidimensional Digital Signal Processing (DSP) systems justifies Application-Specific Integrated Circuit (ASIC) designs and parallel processing implementations. In this dissertation, we propose novel theories, methodologies and architectures for designing high-performance VLSI implementations of general multidimensional multirate DSP systems by exploiting the parallelism within those applications. To systematically exploit the parallelism within multidimensional multirate DSP algorithms, we develop novel transformations including (1) nonlinear I/O data space transforms, (2) intercalation transforms, and (3) multidimensional multirate unfolding transforms. These transformations are applied to the algorithms, leading to systematic methodologies for high-performance architectural design. With the novel design methodologies, we develop several architectures with parallel and distributed processing features for implementing multidimensional multirate applications. Experimental results have shown that those architectures are much more efficient in terms of execution time and/or hardware cost compared with existing hardware implementations.

Texas A&M Repository

Design-for-Test and Test Optimization Techniques for TSV-based 3D Stacked ICs

Author: Noia Brandon Robert
Publication date: 01/01/2014

As integrated circuits (ICs) continue to scale to smaller dimensions, long interconnects have become the dominant contributor to circuit delay and a significant component of power consumption. In order to reduce the length of these interconnects, 3D integration and 3D stacked ICs (3D SICs) are active areas of research in both academia and industry. 3D SICs not only have the potential to reduce average interconnect length and alleviate many of the problems caused by long global interconnects, but they can offer greater design flexibility over 2D ICs, significant reductions in power consumption and footprint in an era of mobile applications, increased on-chip data bandwidth through delay reduction, and improved heterogeneous integration. Compared to 2D ICs, the manufacture and test of 3D ICs is significantly more complex. Through-silicon vias (TSVs), which constitute the dense vertical interconnects in a die stack, are a source of additional and unique defects not seen before in ICs. At the same time, testing these TSVs, especially before die stacking, is recognized as a major challenge. The testing of a 3D stack is constrained by limited test access, test pin availability, power, and thermal constraints. Therefore, efficient and optimized test architectures are needed to ensure that pre-bond, partial, and complete stack testing are not prohibitively expensive. Methods of testing TSVs prior to bonding continue to be a difficult problem due to test access and testability issues. Although some built-in self-test (BIST) techniques have been proposed, these techniques have numerous drawbacks that render them impractical. In this dissertation, a low-cost test architecture is introduced to enable pre-bond TSV test through TSV probing. This has the benefit of not needing large analog test components on the die, which is a significant drawback of many BIST architectures. Coupled with an optimization method described in this dissertation to create parallel test groups for TSVs, test time for pre-bond TSV tests can be significantly reduced. The pre-bond probing methodology is expanded upon to allow for pre-bond scan test as well, to enable both pre-bond TSV and structural test to bring pre-bond known-good-die (KGD) test under a single test paradigm. The addition of boundary registers on functional TSV paths required for pre-bond probing results in an increase in delay on inter-die functional paths. This cost of test architecture insertion can be a significant drawback, especially considering that one benefit of 3D integration is that critical paths can be partitioned between dies to reduce their delay. This dissertation derives a retiming flow that is used to recover the additional delay added to TSV paths by test cell insertion. Reducing the cost of test for 3D-SICs is crucial considering that more tests are necessary during 3D-SIC manufacturing. To reduce test cost, the test architecture and test scheduling for the stack must be optimized to reduce test time across all necessary test insertions. This dissertation examines three paradigms for 3D integration (hard dies, firm dies, and soft dies) that give varying degrees of control over 2D test architectures on each die while optimizing the 3D test architecture. Integer linear programming models are developed to provide an optimal 3D test architecture and test schedule for the dies in the 3D stack considering any or all post-bond test insertions. Results show that the ILP models outperform other optimization methods across a range of 3D benchmark circuits. In summary, this dissertation targets testing and design-for-test (DFT) of 3D SICs. The proposed techniques enable pre-bond TSV and structural test while maintaining a relatively low test cost. Future work will continue to enable testing of 3D SICs to move industry closer to realizing the true potential of 3D integration. Dissertation

DukeSpace

Energy-Efficient Digital Circuit Design using Threshold Logic Gates

Publication date: 01/01/2015

Improving energy efficiency has always been the prime objective of custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second, a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND, is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to a conventional FPGA, using a well-known FPGA modeling tool called VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violations on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits. Dissertation/Thesis. Doctoral Dissertation, Computer Science 201
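As a rough illustration of what "a threshold function of its inputs" means in this abstract, the sketch below evaluates the textbook definition: the output is 1 when the weighted sum of the binary inputs reaches a threshold. It models only the Boolean function, not the PNAND circuit or its edge-triggered behavior; the function name `threshold_gate` and the weights and thresholds are assumptions made for the example.

```python
def threshold_gate(inputs, weights, threshold):
    """Evaluate a threshold function on binary inputs.

    Returns 1 if sum(w_i * x_i) >= threshold, else 0.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# Illustrative choices: with unit weights on three inputs, a threshold of 2
# gives the majority function and a threshold of 3 gives a 3-input AND.
majority_101 = threshold_gate([1, 0, 1], [1, 1, 1], threshold=2)
majority_100 = threshold_gate([1, 0, 0], [1, 1, 1], threshold=2)
and_111 = threshold_gate([1, 1, 1], [1, 1, 1], threshold=3)
and_110 = threshold_gate([1, 1, 0], [1, 1, 1], threshold=3)
print(majority_101, majority_100, and_111, and_110)
```

One gate of this kind can stand in for several conventional logic cells, which is the intuition behind replacing parts of a flipflop's logic cone with a single threshold cell.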

ASU Digital Repository

A low-power quadrature digital modulator in 0.18um CMOS

Author: Hu Song
Publication venue: 'University of Saskatchewan Library'

Quadrature digital modulation techniques are widely used in modern communication systems because of their high performance and flexibility. However, these advantages come at the cost of high power consumption. As a result, power consumption has to be taken into account as a main design factor of the modulator. In this thesis, a low-power quadrature digital modulator in 0.18um CMOS is presented with the target system clock speed of 150 MHz. The quadrature digital modulator consists of several key blocks: quadrature direct digital synthesizer (QDDS), pulse shaping filter, interpolation filter and inverse sinc filter. The design strategy is to investigate different implementations for each block and compare the power consumption of these implementations. Based on the comparison results, the implementation that consumes the lowest power will be chosen for each block. First of all, a novel low-power QDDS is proposed in the thesis. Power consumption estimation shows that it can save up to 60% of the power consumption at 150 MHz system clock frequency compared with one conventional design. Power consumption estimation results also show that using two pulse shaping blocks to process I/Q data, cascaded integrator comb (CIC) interpolation structure, and inverse sinc filter with modified canonic signed digit (MCSD) multiplication consume less power than alternative design choices. These low-power blocks are integrated together to achieve a low-power modulator. The power consumption estimation after layout shows that it only consumes about 95 mW at 150 MHz system clock rate, which is much lower than similar commercial products. The designed modulator can provide a low-power solution for various quadrature modulators. It also has an output bandwidth from 0 to 75 MHz, configurable pulse shaping filters and interpolation filters, and an internal sin(x)/x correction filter.
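For readers unfamiliar with the QDDS block named in this abstract, the following is a minimal sketch of a textbook phase-accumulator quadrature synthesizer, where the output frequency is f_out = FTW × f_clk / 2^N. It is not the low-power QDDS proposed in the thesis. The 150 MHz clock comes from the abstract; the 32-bit accumulator width, the 37.5 MHz output frequency, and the function names are assumptions for illustration.

```python
import math


def tuning_word(f_out, f_clk, acc_bits=32):
    """Frequency tuning word so that f_out = ftw * f_clk / 2**acc_bits."""
    return round(f_out / f_clk * (1 << acc_bits))


def qdds_samples(ftw, n_samples, acc_bits=32):
    """Generate (I, Q) = (cos, sin) samples from a phase accumulator.

    The accumulator advances by ftw each clock and wraps modulo
    2**acc_bits; its value is mapped to a phase in [0, 2*pi).
    """
    modulus = 1 << acc_bits
    phase = 0
    samples = []
    for _ in range(n_samples):
        theta = 2.0 * math.pi * phase / modulus
        samples.append((math.cos(theta), math.sin(theta)))
        phase = (phase + ftw) % modulus
    return samples


# Assumed example: a 37.5 MHz tone from the 150 MHz system clock, which is
# one quarter of the clock rate, so the phase advances 90 degrees per sample.
ftw = tuning_word(37.5e6, 150e6)
iq = qdds_samples(ftw, 4)
for i_val, q_val in iq:
    print(f"I={i_val:+.3f} Q={q_val:+.3f}")
```

A hardware design would typically replace the `math.cos` and `math.sin` calls with a lookup table or another phase-to-amplitude converter, which is where much of the power trade-off in such a block usually lies.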

eCommons@USASK

University of Saskatchewan Research Archive

Arbitrary Hardware/Software Trade Offs

Author: Middelhoek P.F.A.
Middelhoek Peter F.A.
Publication venue: IEEE
Publication date: 01/06/1995

This paper discusses a novel transformation-based design methodology and its use in the design of complex programmable VLSI systems. During the life-cycle of a complex system, the optimal trade-off between implementing functionality in hardware or in software changes. This is due to varying system requirements (short time-to-market, low-cost, low-power, etc.) and improving device technology. The proposed methodology allows such redesigns to be made using different hardware-software trade-offs, in a guaranteed correct way.

University of Twente Research Information

A Retiming-Based Test Pattern Generator Design for Built-In Self Test of Data Path Architectures

Author: El-Maleh Aiman H.
Osais Yahya E.
Publication date: 01/01/2001

Recently, a new Built-In Self Test (BIST) methodology based on balanced bistable sequential kernels has been proposed that reduces the area overhead and performance degradation associated with the conventional BILBO-oriented BIST methodology. This new methodology guarantees high fault coverage but requires special test sequences and test pattern generator (TPG) designs. In this paper, we demonstrate the use of the retiming technique in designing TPGs for balanced bistable sequential kernels. Experimental results on ISCAS benchmark circuits demonstrate the effectiveness of the designed TPGs in achieving higher fault coverage than the conventional maximal-length LFSR TPGs
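The conventional maximal-length LFSR test pattern generator used as the baseline in this abstract can be sketched as follows. This is a generic 4-bit Fibonacci LFSR, not the retiming-based TPG design from the paper; the register width, seed, and tap positions (corresponding to the polynomial x^4 + x^3 + 1) are assumptions chosen to keep the example small.

```python
def lfsr_patterns(seed=0b0001, width=4, taps=(3, 2)):
    """Return one full period of states of a Fibonacci LFSR.

    taps lists the bit positions (0 = least significant) that are XORed
    to form the feedback bit, which is shifted in at the low end. The
    highest bit must be among the taps so the sequence returns to the seed.
    """
    mask = (1 << width) - 1
    state = seed & mask
    patterns = []
    while True:
        patterns.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == (seed & mask):
            return patterns


patterns = lfsr_patterns()
# A maximal-length 4-bit LFSR visits all 15 nonzero states before repeating.
print(len(patterns), [format(p, "04b") for p in patterns])
```

The all-zero state is never produced from a nonzero seed, which is one reason pseudo-random LFSR patterns can miss faults that need specific test sequences.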

KFUPM ePrints