592 research outputs found

    Elastic circuits

    Get PDF
    Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it was traditionally more often associated with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.Peer ReviewedPostprint (published version

    Digital Background Self-Calibration Technique for Compensating Transition Offsets in Reference-less Flash ADCs

    Get PDF
    This Dissertation focusses on proving that background calibration using adaptive algorithms are low-cost, stable and effective methods for obtaining high accuracy in flash A/D converters. An integrated reference-less 3-bit flash ADC circuit has been successfully designed and taped out in UMC 180 nm CMOS technology in order to prove the efficiency of our proposed background calibration. References for ADC transitions have been virtually implemented built-in in the comparators dynamic-latch topology by a controlled mismatch added to each comparator input front-end. An external very simple DAC block (calibration bank) allows control the quantity of mismatch added in each comparator front-end and, therefore, compensate the offset of its effective transition with respect to the nominal value. In order to assist to the estimation of the offset of the prototype comparators, an auxiliary A/D converter with higher resolution and lower conversion speed than the flash ADC is used: a 6-bit capacitive-DAC SAR type. Special care in synchronization of analogue sampling instant in both ADCs has been taken into account. In this thesis, a criterion to identify the optimum parameters of the flash ADC design with adaptive background calibration has been set. With this criterion, the best choice for dynamic latch architecture, calibration bank resolution and flash ADC resolution are selected. The performance of the calibration algorithm have been tested, providing great programmability to the digital processor that implements the algorithm, allowing to choose the algorithm limits, accuracy and quantization errors in the arithmetic. Further, systematic controlled offset can be forced in the comparators of the flash ADC in order to have a more exhaustive test of calibration

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Voltage stacking for near/sub-threshold operation

    Get PDF

    Doctor of Philosophy

    Get PDF
    dissertationElasticity is a design paradigm in which circuits can tolerate arbitrary latency/delay variations in their computation units as well as communication channels. Creating elastic (both synchronous and asynchronous) designs from clocked designs has potential benefits of increased modularity and robustness to variations. Several transformations have been suggested in the literature and each of these require a handshake control network (examples include synchronous elasticization and desynchronization). Elastic control network area and power overheads may become prohibitive. This dissertation investigates different optimization avenues to reduce these overheads without sacrificing the control network performance. First, an algorithm and a tool, CNG, is introduced that generates a control network with minimal total number of join and fork control steering units. Synchronous Elastic FLow (SELF) is a handshake protocol used over synchronous elastic designs. Comparing to its standard eager implementation (that uses eager forks - EForks), lazy SELF can consume less power and area. However, it typically suff ers from combinational cycles and can have inferior performance in some systems. Hence, lazy SELF has been rarely studied in the literature. This work formally and exhaustively investigates the specifi cations, diff erent implementations, and verifi cation of the lazy SELF protocol. Furthermore, several new and existing lazy designs are mapped to hybrid eager/lazy imple-mentations that retain the performance advantage of the eager design but have power and area advantages of lazy implementations, and are combinational-cycle free. This work also introduces a novel ultra simple fork (USFork) design. The USFork has two advantages over lazy forks: it is composed of simpler logic (just wires) and does not form combinational cycles. The conditions under which an EFork can be replaced by a USFork without any performance loss are formally derived. The last optimization avenue discussed in this dissertation is Elastic Bu er Controller (EBC) merging. In a typical synchronous elastic control network, some EBCs may activate their corresponding latches at similar schedules. This work provides a framework for fi nding and merging such controllers in any control network; including open networks (i.e., when the environment abstract is not available or required to be flexible) as well as networks incorporating variable latency units. Replacing EForks with USForks under some equivalence conditions as well as EBC merging have been fully automated in a tool, HGEN. The impact of this work will help achieve elasticity at a reduced cost. It will broaden the class of circuits that can be elasticized with acceptable overhead (circuits that designers would otherwise nd it too expensive to elasticize). In a MiniMIPS processor case study, comparing to a basic control network implementation, the optimization techniques of this dissertation accumulatively achieve reductions in the control network area, dynamic, and leakage power of 73.2%, 68.6%, and 69.1%, respectively

    Sub-threshold Operation of a Timing Error Detection Latch

    Get PDF
    Ajoitusvirheentunnistus (TED) mahdollistaa energian kulutuksen vähentämisen mikroprosessoreissa. Tässä diplomityössä on kaksi versiota ajoitusvirheentunnistavasta salvasta (esim. TDTBsubI ja TDTBsubII) ja systeemitason testipiiri (SystemTest), joka käyttää TDTBsub salpaa, mikä on suunniteltu toimimaan kynnysalueen alapuolella. Diplomityö esittelee ensin dynaamisen jännitteen skaalauksen (DVS), koska TED käytetään sellaisissa järjestelmissä. Seuraavaksi esitellään teoriaa kynnysalueen alapuolen suunnittelun haasteista. Sitten esitellään molempien TDTBsub salpojen ja SystemTest-lohkojen suunnittelu. Simulaatiotuloksia esitellään keskittyen operaatiotaajuuteen, energian kulutukseen ja toimintavarmuuteen variaatiot huomioon ottaen. Operoitaessa kynnysalueen alapuolella TDTB-piirillä keskityttiin koon mitoittamiseen ja suunnittelutyyliin. Ennen kaikkea kaikkien komponenttien mitoituksen piti olla suurempi kuin minimi CMOS-tekniikan leveydet. Vaikka mitoittamisella saavutettiin toimintavarmuutta kynnysalueen alapuolella toimittaessa myös energian kulutus kasvoi siellä toimittaessa. Perinteisiä vuotovirtojen vähentäviä mitoitustoimenpiteitä tehtiin suurimmalle osalle komponenteista. Logiikkatyyli on tärkeää kynnysalueen alapuolella operoitaessa. TDTBsubII salvassa uuden tekniikan näytetään antavan systeemitason suorituskykyä. Simulaatioilla näytettiin kuinka ajoitusvirheentunnistus kykeni toimimaan kynnystason alapuolella. TDTBsubI:n ja yhteenlaskun testipiirin piirinkuvio tehtiin 65nm CMOS-prosessilla. TDTBsubII salpaa ei tehty, koska se suunniteltiin piirin määräajan jälkeen. Piiriä tarkasteltaessa osoittautui, että piiri ei toiminut. Piirin toimimattomuus johtui tuotantovaiheessa tapahtuneesta virheestä eikä suunnittelusta.Timing error detection (TED) is used to enable the reduction the energy consumption of microprocessors. In this thesis work, two versions of TED latches (i.e. TDTBsubI and TDTBsubII) and a system-level test circuit (SystemTest) that utilizes the TDTBsubI latch have been designed to operate in sub-threshold. The thesis first introduces dynamic voltage scaling (DVS) since TED is utilized with such a system. Next, theory is given to highlight the challenges within sub-threshold. The design of the both TDTBsub latches and SystemTest are then given. Simulation results follow with a focus on operation frequency, energy consumption, and robustness in the presence of variations. To operate TDTBsub into sub-threshold, attention was given to sizing and logic style. In general, the sizing of all components was required to be larger than the minimum CMOS width. Although this provided robustness in sub-threshold, the energy consumption in above sub-threshold was much higher. General leakage reduction sizing techniques were also applied to the majority of components. The choice of logic style is important for sub-threshold operation. In the TDTBsubII latch, a new technique is shown to provide system-level capability. Simulations displayed the capability of TED in sub-threshold. The layout of TDTBsubI and an adder test circuit were constructed in 65 nm CMOS. The TDTBsubII latch was not built since it was designed after the chip deadline. Upon inspection of the chip, it was determined to be inoperative. This mistake was a result of the manufacturering process and not the design in this work

    Exploiting Fine-Grain Concurrency Analytical Insights in Superscalar Processor Design

    Get PDF
    This dissertation develops analytical models to provide insight into various design issues associated with superscalar-type processors, i.e., the processors capable of executing multiple instructions per cycle. A survey of the existing machines and literature has been completed with a proposed classification of various approaches for exploiting fine-grain concurrency. Optimization of a single pipeline is discussed based on an analytical model. The model-predicted performance curves are found to be in close proximity to published results using simulation techniques. A model is also developed for comparing different branch strategies for single-pipeline processors in terms of their effectiveness in reducing branch delay. The additional instruction fetch traffic generated by certain branch strategies is also studied and is shown to be a useful criterion for choosing between equally well performing strategies. Next, processors with multiple pipelines are modelled to study the tradeoffs associated with deeper pipelines versus multiple pipelines. The model developed can reveal the cause of performance bottleneck: insufficient resources to exploit discovered parallelism, insufficient instruction stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative (i.e., beyond basic block) execution is examined via probability distributions that characterize the inherent parallelism in the instruction stream. The throughput prediction of the analytic model is shown, using a variety of benchmarks, to be close to the measured static throughput of the compiler output, under resource and scope constraints. Further experiments provide misprediction delay estimates for these benchmarks under scope constraints, assuming beyond-basic-block, out-of-order execution and run-time scheduling. These results were derived using traces generated by the Multiflow TRACE SCHEDULING™(*) compacting C and FORTRAN 77 compilers. A simplified extension to the model to include multiprocessors is also proposed. The extended model is used to analyze combined systems, such as superpipelined multiprocessors and superscalar multiprocessors, both with shared memory. It is shown that the number of pipelines (or processors) at which the maximum throughput is obtained is increasingly sensitive to the ratio of memory access time to network access delay, as memory access time increases. Further, as a function of inter-iteration dependency distance, optimum throughput is shown to vary nonlinearly, whereas the corresponding Optimum number of processors varies linearly. The predictions from the analytical model agree with published results based on simulations. (*)TRACE SCHEDULING is a trademark of Multiflow Computer, Inc

    Design and modelling of variability tolerant on-chip communication structures for future high performance system on chip designs

    Get PDF
    The incessant technology scaling has enabled the integration of functionally complex System-on-Chip (SoC) designs with a large number of heterogeneous systems on a single chip. The processing elements on these chips are integrated through on-chip communication structures which provide the infrastructure necessary for the exchange of data and control signals, while meeting the strenuous physical and design constraints. The use of vast amounts of on chip communications will be central to future designs where variability is an inherent characteristic. For this reason, in this thesis we investigate the performance and variability tolerance of typical on-chip communication structures. Understanding of the relationship between variability and communication is paramount for the designers; i.e. to devise new methods and techniques for designing performance and power efficient communication circuits in the forefront of challenges presented by deep sub-micron (DSM) technologies. The initial part of this work investigates the impact of device variability due to Random Dopant Fluctuations (RDF) on the timing characteristics of basic communication elements. The characterization data so obtained can be used to estimate the performance and failure probability of simple links through the methodology proposed in this work. For the Statistical Static Timing Analysis (SSTA) of larger circuits, a method for accurate estimation of the probability density functions of different circuit parameters is proposed. Moreover, its significance on pipelined circuits is highlighted. Power and area are one of the most important design metrics for any integrated circuit (IC) design. This thesis emphasises the consideration of communication reliability while optimizing for power and area. A methodology has been proposed for the simultaneous optimization of performance, area, power and delay variability for a repeater inserted interconnect. Similarly for multi-bit parallel links, bandwidth driven optimizations have also been performed. Power and area efficient semi-serial links, less vulnerable to delay variations than the corresponding fully parallel links are introduced. Furthermore, due to technology scaling, the coupling noise between the link lines has become an important issue. With ever decreasing supply voltages, and the corresponding reduction in noise margins, severe challenges are introduced for performing timing verification in the presence of variability. For this reason an accurate model for crosstalk noise in an interconnection as a function of time and skew is introduced in this work. This model can be used for the identification of skew condition that gives maximum delay noise, and also for efficient design verification
    corecore