56 research outputs found

    Impacto da variabilidade PVT em somadores construídos com XORs

    Get PDF
    A operação de soma é a mais usada em Unidades Lógicas e Aritméticas (ULA). A ULA é a unidade mais importante no processamento de dados. Em sistemas digitais, é desejado um somador completo com baixo consumo de energia e um alto desempenho. O somador completo faz parte do caminho crítico em sistemas computacionais, ele pode ser implementado de diversas maneiras, a maioria delas tendo como seu principal sub-circuito a porta lógica OU-exclusivo (XOR). Consequentemente, o estudo de somadores completos compostos por combinações de portas lógicas XOR é de grande valia para pesquisas na literatura. Melhorias nos módulos aritméticos pode reduzir significativamente o consumo de potência dos sistemas, mas em tecnologias nanométricas é necessário considerar o impacto da variabilidade. Esse trabalho tem como objetivo analisar projetos de somadores completos que quando submetidos aos efeitos de variabilidade devem ser robustos, ter um bom desempenho e mostrar bons resultados em consumo de energia, quando estão operando em tensão nominal e em tensão de quase limiar. Além disso, foi utilizada uma técnica chamada de célula de desacoplamento (Dcell) visando uma alternativa para a redução da variabilidade de processo. Esse trabalho analisa e compara 4 somadores tradicionais e 9 somadores completos construídos através de 3 blocos lógicos, dos quais 2 deles são substituídos por portas lógicas XOR, em uma tecnologia FinFET de 7nm. Foi observado que circuitos somadores que foram construídos usando a XOR da família lógica CMOS, especialmente no segundo bloco, obtiveram piores resultados de desempenho e consumo energético. Somadores operando em tensão nominal são cerca de 80% mais robustos quanto ao impacto da variabilidade de processo no consumo máximo. A operação em quase limiar implica em uma alta sensibilidade no desempenho e consumo, alcançando mais de 300% nos piores casos. Em relação à variabilidade de processo, foi verificado um aumento de sensibilidade de cerca de 40% no desempenho quando foram utilizadas a XOR V5 e a XOR V8 no segundo bloco dos somadores quando operando em tensão nominal. Para a operação em tensão de quase limiar o uso da metodologia proposta nesse trabalho mostrou ser uma boa opção para alcançar uma maior robustez quanto ao consumo dos circuitos. Considerando o uso da Dcell, na operação em tensão nominal, foi verificado uma redução no desempenho juntamente com uma redução na variabilidade. O melhor caso foi o somador FAV5V8 que para um aumento de 20% no atraso, obteve uma redução de 20% na variabilidade. Em relação ao consumo, houve uma redução de 16% na potência dinâmica, juntamente com uma redução de quase 30% na variabilidade, como o que ocorreu com o somador FAV8V1. Foi possível observar casos de redução da variabilidade em mais de 40% com um pequeno aumento no consumo dinâmico. O uso dessa técnica teve um alto impacto nos resultados de circuitos que operavam em tensão de quase limiar, chegando em alguns casos a mais de 40% de redução do desempenho para uma pequena redução na variabilidade. Quanto ao consumo, nesse caso, os somadores tradicionais foram os menos afetados, e novamente o uso da XOR V8 no segundo bloco para construção dos somadores mostrou ser uma boa opção para aumento da robustez dos circuitos.The sum operation is the most used in the Arithmetic and Logic Units (ALU). In digital systems, a complete adder with low energy consumption and high performance is desired. The full adder is part of the critical path in computer systems. It can be implemented in several ways, most of them having the OR-exclusive logic gate (XOR) as its main sub-circuit. Consequently, the study of full adders composed of combinations of XOR logic gates has a great value in the literature. Improvements in arithmetic modules can significantly reduce the power consumption of systems, however, in nanometric technologies it is necessary to consider the impact of variability. This work aims to analyse designs of full adders considering variability effects, comparing performance and energy consumption when operating at nominal voltage and also at near threshold voltage. In addition, a technique called decoupling cell (Dcell) was used to provide an alternative for reducing process variability. This work analyses and compares four traditional adders and nine adders built using three logic blocks, where two of them are replaced by XOR logic gates, in a 7nm FinFET technology. It was observed that full adders that were built using the XOR of the CMOS logic family, especially in the second block, had worse results in performance and energy consumption. Full adders operating at nominal voltage regime are about 80% more robust in terms of the impact of process variability on maximum consumption. The near threshold operation implies a high sensitivity in performance and consumption, reaching more than 300% in the worst cases. Regarding the process variability, there was an increase in sensitivity of about 40% in performance when the XOR V5 and XOR V8 were used in the second block of the adder when operating at nominal voltage. For the voltage operation of near threshold, the use of the methodology proposed in this work demonstrate to be a good option to achieve greater robustness regarding the consumption of the circuits. Considering the use of Dcell, in the operation at nominal voltage, a reduction in performance was verified together with a reduction in variability. The best case was the adder FAV5V8 which for a 20% increase in delay, obtained a reduction of 20% in variability. In relation to dynamic consumption, there was a 16% reduction in power, together with a reduction of almost 30% in variability, as occurred with the FAV8V1 adder. It was possible to observe cases of reduced variability by more than 40% with a small increase in dynamic consumption. The use of this technique had a high impact on the results of circuits operating at near threshold voltage, in some cases reaching more than 40% reduction in performance for a small reduction in variability. For consumption, in this case, the traditional full adders were the least affected, and again the use of the XOR V8 in the second block for the construction of the adder proved to be a good option for increasing the robustness of the circuits

    Power efficient resilient microarchitectures for PVT variability mitigation

    Get PDF
    Nowadays, the high power density and the process, voltage, and temperature variations became the most critical issues that limit the performance of the digital integrated circuits because of the continuous scaling of the fabrication technology. Dynamic voltage and frequency scaling technique is used to reduce the power consumption while different time relaxation techniques and error recovery microarchitectures are used to tolerate the process, voltage, and temperature variations. These techniques reduce the throughput by scaling down the frequency or flushing and restarting the errant pipeline. This thesis presents a novel resilient microarchitecture which is called ERSUT-based resilient microarchitecture to tolerate the induced delays generated by the voltage scaling or the process, voltage, and temperature variations. The resilient microarchitecture detects and recovers the induced errors without flushing the pipeline and without scaling down the operating frequency. An ERSUT-based resilient 16 × 16 bit MAC unit, implemented using Global Foundries 65 nm technology and ARM standard cells library, is introduced as a case study with 18.26% area overhead and up to 1.5x speedup. At the typical conditions, the maximum frequency of the conventional MAC unit is about 375 MHz while the resilient MAC unit operates correctly at a frequency up to 565 MHz. In case of variations, the resilient MAC unit tolerates induced delays up to 50% of the clock period while keeping its throughput equal to the conventional MAC unit’s maximum throughput. At 375 MHz, the resilient MAC unit is able to scale down the supply voltage from 1.2 V to 1.0 V saving about 29% of the power consumed by the conventional MAC unit. A double-edge-triggered microarchitecture is also introduced to reduce the power consumption extremely by reducing the frequency of the clock tree to the half while preserving the same maximum throughput. This microarchitecture is applied to different ISCAS’89 benchmark circuits in addition to the 16x16 bit MAC unit and the average power reduction of all these circuits is 63.58% while the average area overhead is 31.02%. All these circuits are designed using Global Foundries 65nm technology and ARM standard cells library. Towards the full automation of the ERSUT-based resilient microarchitecture, an ERSUT-based algorithm is introduced in C++ to accelerate the design process of the ERSUT-based microarchitecture. The developed algorithm reduces the design-time efforts dramatically and allows the ERSUT-based microarchitecture to be adopted by larger industrial designs. Depending on the ERSUT-based algorithm, a validation study about applying the ERSUT-based microarchitecture on the MAC unit and different ISCAS’89 benchmark circuits with different complexity weights is introduced. This study shows that 72% of these circuits tolerates more than 14% of their clock periods and 54.5% of these circuits tolerates more than 20% while 27% of these circuits tolerates more than 30%. Consequently, the validation study proves that the ERSUT-based resilient microarchitecture is a valid applicable solution for different circuits with different complexity weights

    Variation-aware high-level DSP circuit design optimisation framework for FPGAs

    Get PDF
    The constant technology shrinking and the increasing demand for systems that operate under different power profiles with the maximum performance, have motivated the work in this thesis. Modern design tools that target FPGA devices take a conservative approach in the estimation of the maximum performance that can be achieved by a design when it is placed on a device, accounting for any variability in the fabrication process of the device. The work presented here takes a new view on the performance improvement of DSP designs by pushing them into the error-prone regime, as defined by the synthesis tools, and by investigating methodologies that reduce the impact of timing errors at the output of the system. In this work two novel error reduction techniques are proposed to address this problem. One is based on reduced-precision redundancy and the other on an error optimisation framework that uses information from a prior characterisation of the device. The first one is a generic architecture that is appended to existing arithmetic operators. The second defines the high-level parameters of the algorithm without using extra resources. Both of these methods allow to achieve graceful degradation whilst variation increases. A comparison of the new methods is laid against the existing methodologies, and conclusions drawn on the tradeoffs between their cost, in terms of resources and errors, and their benefits in terms of throughput. In some cases it is possible to double the performance of the design while still producing valid results.Open Acces

    Multi-Threshold Low Power-Delay Product Memory and Datapath Components Utilizing Advanced FinFET Technology Emphasizing the Reliability and Robustness

    Get PDF
    Indiana University-Purdue University Indianapolis (IUPUI)In this thesis, we investigated the 7 nm FinFET technology for its delay-power product performance. In our study, we explored the ASAP7 library from Arizona State University, developed in collaboration with ARM Holdings. The FinFET technology was chosen since it has a subthreshold slope of 60mV/decade that enables cells to function at 0.7V supply voltage at the nominal corner. An emphasis was focused on characterizing the Non-Ideal effects, delay variation, and power for the FinFET device. An exhaustive analysis of the INVx1 delay variation for different operating conditions was also included, to assess the robustness. The 7nm FinFET device was then employed into 6T SRAM cells and 16 function ALU. The SRAM cells were approached with advanced multi-corner stability evaluation. The system-level architecture of the ALU has demonstrated an ultra-low power system operating at 1 GHz clock frequency

    Electronic systems for the restoration of the sense of touch in upper limb prosthetics

    Get PDF
    In the last few years, research on active prosthetics for upper limbs focused on improving the human functionalities and the control. New methods have been proposed for measuring the user muscle activity and translating it into the prosthesis control commands. Developing the feed-forward interface so that the prosthesis better follows the intention of the user is an important step towards improving the quality of life of people with limb amputation. However, prosthesis users can neither feel if something or someone is touching them over the prosthesis and nor perceive the temperature or roughness of objects. Prosthesis users are helped by looking at an object, but they cannot detect anything otherwise. Their sight gives them most information. Therefore, to foster the prosthesis embodiment and utility, it is necessary to have a prosthetic system that not only responds to the control signals provided by the user, but also transmits back to the user the information about the current state of the prosthesis. This thesis presents an electronic skin system to close the loop in prostheses towards the restoration of the sense of touch in prosthesis users. The proposed electronic skin system inlcudes an advanced distributed sensing (electronic skin), a system for (i) signal conditioning, (ii) data acquisition, and (iii) data processing, and a stimulation system. The idea is to integrate all these components into a myoelectric prosthesis. Embedding the electronic system and the sensing materials is a critical issue on the way of development of new prostheses. In particular, processing the data, originated from the electronic skin, into low- or high-level information is the key issue to be addressed by the embedded electronic system. Recently, it has been proved that the Machine Learning is a promising approach in processing tactile sensors information. Many studies have been shown the Machine Learning eectiveness in the classication of input touch modalities.More specically, this thesis is focused on the stimulation system, allowing the communication of a mechanical interaction from the electronic skin to prosthesis users, and the dedicated implementation of algorithms for processing tactile data originating from the electronic skin. On system level, the thesis provides design of the experimental setup, experimental protocol, and of algorithms to process tactile data. On architectural level, the thesis proposes a design ow for the implementation of digital circuits for both FPGA and integrated circuits, and techniques for the power management of embedded systems for Machine Learning algorithms

    Multi-objective Digital VLSI Design Optimisation

    Get PDF
    Modern VLSI design's complexity and density has been exponentially increasing over the past 50 years and recently reached a stage within its development, allowing heterogeneous, many-core systems and numerous functions to be integrated into a tiny silicon die. These advancements have revealed intrinsic physical limits of process technologies in advanced silicon technology nodes. Designers and EDA vendors have to handle these challenges which may otherwise result in inferior design quality, even failures, and lower design yields under time-to-market pressure. Multiple or many design objectives and constraints are emerging during the design process and often need to be dealt with simultaneously. Multi-objective evolutionary algorithms show flexible capabilities in maintaining multiple variable components and factors in uncertain environments. The VLSI design process involves a large number of available parameters both from designs and EDA tools. This provides many potential optimisation avenues where evolutionary algorithms can excel. This PhD work investigates the application of evolutionary techniques for digital VLSI design optimisation. Automated multi-objective optimisation frameworks, compatible with industrial design flows and foundry technologies, are proposed to improve solution performance, expand feasible design space, and handle complex physical floorplan constraints through tuning designs at gate-level. Methodologies for enriching standard cell libraries regarding drive strength are also introduced to cooperate with multi-objective optimisation frameworks, e.g., subsequent hill-climbing, providing a richer pool of solutions optimised for different trade-offs. The experiments of this thesis demonstrate that multi-objective evolutionary algorithms, derived from biological inspirations, can assist the digital VLSI design process, in an industrial design context, to more efficiently search for well-balanced trade-off solutions as well as optimised design space coverage. The expanded drive granularity of standard cells can push the performance of silicon technologies with offering improved solutions regarding critical objectives. The achieved optimisation results can better deliver trade-off solutions regarding power, performance and area metrics than using standard EDA tools alone. This has been not only shown for a single circuit solution but also covered the entire standard-tool-produced design space

    Applications of MATLAB in Science and Engineering

    Get PDF
    The book consists of 24 chapters illustrating a wide range of areas where MATLAB tools are applied. These areas include mathematics, physics, chemistry and chemical engineering, mechanical engineering, biological (molecular biology) and medical sciences, communication and control systems, digital signal, image and video processing, system modeling and simulation. Many interesting problems have been included throughout the book, and its contents will be beneficial for students and professionals in wide areas of interest

    Adiabatic Circuits for Power-Constrained Cryptographic Computations

    Get PDF
    This thesis tackles the need for ultra-low power operation in power-constrained cryptographic computations. An example of such an application could be smartcards. One of the techniques which has proven to have the potential of rendering ultra-low power operation is ‘Adiabatic Logic Technique’. However, the adiabatic circuits has associated challenges due to high energy dissipation of the Power-Clock Generator (PCG) and complexity of the multi-phase power-clocking scheme. Energy efficiency of the adiabatic system is often degraded due to the high energy dissipation of the PCG. In this thesis, nstep charging strategy using tank capacitors is considered for the power-clock generation and several design rules and trade-offs between the circuit complexity and energy efficiency of the PCG using n-step charging circuits have been proposed. Since pipelining is inherent in adiabatic logic design, careful selection of architecture is essential, as otherwise overhead in terms of area and energy due to synchronization buffers is induced specifically, in the case of adiabatic designs using 4-phase power-clocking scheme. Several architectures for the Montgomery multiplier using adiabatic logic technique are implemented and compared. An architecture which constitutes an appropriate trade-off between energy efficiency and throughput is proposed along with its methodology. Also, a strategy to reduce the overhead due to synchronization buffers is proposed. A modification in the Montgomery multiplication algorithm is proposed. Furthermore, a problem due to the application of power-clock gating in cascade stages of adiabatic logic is identified. The problem degrades the energy savings that would otherwise be obtained by the application of power-clock gating. A solution to this problem is proposed. Cryptographic implementations also present an obvious target for Power Analysis Attacks (PAA). There are several existing secure adiabatic logic designs which are proposed as a countermeasure against PAA. Shortcomings of the existing logic designs are identified, and two novel secure adiabatic logic designs are proposed as the countermeasures against PAA and improvement over the existing logic designs

    Power Reduction Techniques in Clock Distribution Networks with Emphasis on LC Resonant Clocking

    Get PDF
    In this thesis we propose a set of independent techniques in the overall concept of LC resonant clocking where each technique reduces power consumption and improve system performance. Low-power design is becoming a crucial design objective due to the growing demand on portable applications and the increasing difficulties in cooling and heat removal. The clock distribution network delivers the clock signal which acts as a reference to all sequential elements in the synchronous system. The clock distribution network consumes a considerable amount of power in synchronous digital systems. Resonant clocking is an emerging promising technique to reduce the power of the clock network. The inductor used in resonant clocking enables the conversion of the electric energy stored on the clock capacitance to magnetic energy in the inductor and vice versa. In this thesis, the concept of the slack in the clock skew has been extended for an LC fully-resonant clock distribution network. This extra slack in comparison to standard clock distribution networks can be used to reduce routing complexity, achieve reduction in wire elongation, total wire length, and power consumption. Simulation results illustrate that by utilizing the proposed approach, an average reduction of 53% in the number of wire elongations and 11% reduction in total wire length can be achieved. A dual-edge clocking scheme introduced in the literature to enable the operation of the flip-flop at the rising- and falling edges of the clock has been modified. The interval by which the charging elements in the flip-flop are being switched-on was reduced causing a reduction in power consumption. Simulating the flip-flop in STMicroelectronics 90-nm technology shows correct functionality of the Sense Amplifier flip-flop with a resonant clock signal of 500 MHz and a throughput of 1 GHz under process, voltage, and temperature (PVT) variations. Modeling the resonant system with the proposed flip-flop illustrates that dual-edge compared to single-edge triggering can achieve up to 58% reduction in power consumption when the clock capacitance is the dominating factor. The application of low-swing clocking to LC resonant clock distribution network has been investigated on-chip. The proposed low-swing resonant clocking scheme operates with one voltage supply and does not require an additional supply voltage. The Differential Conditional Capturing flip-flop introduced in the literature was modified to operate with a low-swing sinusoidal clock. Low-swing resonant clocking achieved around 5.8% reduction in total power with 5.7% area overhead. Modeling the clock network with the proposed flip-flop illustrates that low-swing clocking can achieve up to 58% reduction in the power consumption of the resonant clock. An analytical approach was introduced to estimate the required driver strength in the clock generator. Using the proposed approach early in the design stage reduces area and power overhead by eliminating the need for programmable switches in the driving circuit
    • …
    corecore