14 research outputs found

    Impacto da variabilidade PVT em somadores construídos com XORs

    Get PDF
    A operação de soma é a mais usada em Unidades Lógicas e Aritméticas (ULA). A ULA é a unidade mais importante no processamento de dados. Em sistemas digitais, é desejado um somador completo com baixo consumo de energia e um alto desempenho. O somador completo faz parte do caminho crítico em sistemas computacionais, ele pode ser implementado de diversas maneiras, a maioria delas tendo como seu principal sub-circuito a porta lógica OU-exclusivo (XOR). Consequentemente, o estudo de somadores completos compostos por combinações de portas lógicas XOR é de grande valia para pesquisas na literatura. Melhorias nos módulos aritméticos pode reduzir significativamente o consumo de potência dos sistemas, mas em tecnologias nanométricas é necessário considerar o impacto da variabilidade. Esse trabalho tem como objetivo analisar projetos de somadores completos que quando submetidos aos efeitos de variabilidade devem ser robustos, ter um bom desempenho e mostrar bons resultados em consumo de energia, quando estão operando em tensão nominal e em tensão de quase limiar. Além disso, foi utilizada uma técnica chamada de célula de desacoplamento (Dcell) visando uma alternativa para a redução da variabilidade de processo. Esse trabalho analisa e compara 4 somadores tradicionais e 9 somadores completos construídos através de 3 blocos lógicos, dos quais 2 deles são substituídos por portas lógicas XOR, em uma tecnologia FinFET de 7nm. Foi observado que circuitos somadores que foram construídos usando a XOR da família lógica CMOS, especialmente no segundo bloco, obtiveram piores resultados de desempenho e consumo energético. Somadores operando em tensão nominal são cerca de 80% mais robustos quanto ao impacto da variabilidade de processo no consumo máximo. A operação em quase limiar implica em uma alta sensibilidade no desempenho e consumo, alcançando mais de 300% nos piores casos. Em relação à variabilidade de processo, foi verificado um aumento de sensibilidade de cerca de 40% no desempenho quando foram utilizadas a XOR V5 e a XOR V8 no segundo bloco dos somadores quando operando em tensão nominal. Para a operação em tensão de quase limiar o uso da metodologia proposta nesse trabalho mostrou ser uma boa opção para alcançar uma maior robustez quanto ao consumo dos circuitos. Considerando o uso da Dcell, na operação em tensão nominal, foi verificado uma redução no desempenho juntamente com uma redução na variabilidade. O melhor caso foi o somador FAV5V8 que para um aumento de 20% no atraso, obteve uma redução de 20% na variabilidade. Em relação ao consumo, houve uma redução de 16% na potência dinâmica, juntamente com uma redução de quase 30% na variabilidade, como o que ocorreu com o somador FAV8V1. Foi possível observar casos de redução da variabilidade em mais de 40% com um pequeno aumento no consumo dinâmico. O uso dessa técnica teve um alto impacto nos resultados de circuitos que operavam em tensão de quase limiar, chegando em alguns casos a mais de 40% de redução do desempenho para uma pequena redução na variabilidade. Quanto ao consumo, nesse caso, os somadores tradicionais foram os menos afetados, e novamente o uso da XOR V8 no segundo bloco para construção dos somadores mostrou ser uma boa opção para aumento da robustez dos circuitos.The sum operation is the most used in the Arithmetic and Logic Units (ALU). In digital systems, a complete adder with low energy consumption and high performance is desired. The full adder is part of the critical path in computer systems. It can be implemented in several ways, most of them having the OR-exclusive logic gate (XOR) as its main sub-circuit. Consequently, the study of full adders composed of combinations of XOR logic gates has a great value in the literature. Improvements in arithmetic modules can significantly reduce the power consumption of systems, however, in nanometric technologies it is necessary to consider the impact of variability. This work aims to analyse designs of full adders considering variability effects, comparing performance and energy consumption when operating at nominal voltage and also at near threshold voltage. In addition, a technique called decoupling cell (Dcell) was used to provide an alternative for reducing process variability. This work analyses and compares four traditional adders and nine adders built using three logic blocks, where two of them are replaced by XOR logic gates, in a 7nm FinFET technology. It was observed that full adders that were built using the XOR of the CMOS logic family, especially in the second block, had worse results in performance and energy consumption. Full adders operating at nominal voltage regime are about 80% more robust in terms of the impact of process variability on maximum consumption. The near threshold operation implies a high sensitivity in performance and consumption, reaching more than 300% in the worst cases. Regarding the process variability, there was an increase in sensitivity of about 40% in performance when the XOR V5 and XOR V8 were used in the second block of the adder when operating at nominal voltage. For the voltage operation of near threshold, the use of the methodology proposed in this work demonstrate to be a good option to achieve greater robustness regarding the consumption of the circuits. Considering the use of Dcell, in the operation at nominal voltage, a reduction in performance was verified together with a reduction in variability. The best case was the adder FAV5V8 which for a 20% increase in delay, obtained a reduction of 20% in variability. In relation to dynamic consumption, there was a 16% reduction in power, together with a reduction of almost 30% in variability, as occurred with the FAV8V1 adder. It was possible to observe cases of reduced variability by more than 40% with a small increase in dynamic consumption. The use of this technique had a high impact on the results of circuits operating at near threshold voltage, in some cases reaching more than 40% reduction in performance for a small reduction in variability. For consumption, in this case, the traditional full adders were the least affected, and again the use of the XOR V8 in the second block for the construction of the adder proved to be a good option for increasing the robustness of the circuits

    ENERGY-EFFICIENT AND SECURE HARDWARE FOR INTERNET OF THINGS (IoT) DEVICES

    Get PDF
    Internet of Things (IoT) is a network of devices that are connected through the Internet to exchange the data for intelligent applications. Though IoT devices provide several advantages to improve the quality of life, they also present challenges related to security. The security issues related to IoT devices include leakage of information through Differential Power Analysis (DPA) based side channel attacks, authentication, piracy, etc. DPA is a type of side-channel attack where the attacker monitors the power consumption of the device to guess the secret key stored in it. There are several countermeasures to overcome DPA attacks. However, most of the existing countermeasures consume high power which makes them not suitable to implement in power constraint devices. IoT devices are battery operated, hence it is important to investigate the methods to design energy-efficient and secure IoT devices not susceptible to DPA attacks. In this research, we have explored the usefulness of a novel computing platform called adiabatic logic, low-leakage FinFET devices and Magnetic Tunnel Junction (MTJ) Logic-in-Memory (LiM) architecture to design energy-efficient and DPA secure hardware. Further, we have also explored the usefulness of adiabatic logic in the design of energy-efficient and reliable Physically Unclonable Function (PUF) circuits to overcome the authentication and piracy issues in IoT devices. Adiabatic logic is a low-power circuit design technique to design energy-efficient hardware. Adiabatic logic has reduced dynamic switching energy loss due to the recycling of charge to the power clock. As the first contribution of this dissertation, we have proposed a novel DPA-resistant adiabatic logic family called Energy-Efficient Secure Positive Feedback Adiabatic Logic (EE-SPFAL). EE-SPFAL based circuits are energy-efficient compared to the conventional CMOS based design because of recycling the charge after every clock cycle. Further, EE-SPFAL based circuits consume uniform power irrespective of input data transition which makes them resilience against DPA attacks. Scaling of CMOS transistors have served the industry for more than 50 years in providing integrated circuits that are denser, and cheaper along with its high performance, and low power. However, scaling of the transistors leads to increase in leakage current. Increase in leakage current reduces the energy-efficiency of the computing circuits,and increases their vulnerability to DPA attack. Hence, it is important to investigate the crypto circuits in low leakage devices such as FinFET to make them energy-efficient and DPA resistant. In this dissertation, we have proposed a novel FinFET based Secure Adiabatic Logic (FinSAL) family. FinSAL based designs utilize the low-leakage FinFET device along with adiabatic logic principles to improve energy-efficiency along with its resistance against DPA attack. Recently, Magnetic Tunnel Junction (MTJ)/CMOS based Logic-in-Memory (LiM) circuits have been explored to design low-power non-volatile hardware. Some of the advantages of MTJ device include non-volatility, near-zero leakage power, high integration density and easy compatibility with CMOS devices. However, the differences in power consumption between the switching of MTJ devices increase the vulnerability of Differential Power Analysis (DPA) based side-channel attack. Further, the MTJ/CMOS hybrid logic circuits which require frequent switching of MTJs are not very energy-efficient due to the significant energy required to switch the MTJ devices. In the third contribution of this dissertation, we have investigated a novel approach of building cryptographic hardware in MTJ/CMOS circuits using Look-Up Table (LUT) based method where the data stored in MTJs are constant during the entire encryption/decryption operation. Currently, high supply voltage is required in both writing and sensing operations of hybrid MTJ/CMOS based LiM circuits which consumes a considerable amount of energy. In order to meet the power budget in low-power devices, it is important to investigate the novel design techniques to design ultra-low-power MTJ/CMOS circuits. In the fourth contribution of this dissertation, we have proposed a novel energy-efficient Secure MTJ/CMOS Logic (SMCL) family. The proposed SMCL logic family consumes uniform power irrespective of data transition in MTJ and more energy-efficient compared to the state-of-art MTJ/ CMOS designs by using charge sharing technique. The other important contribution of this dissertation is the design of reliable Physical Unclonable Function (PUF). Physically Unclonable Function (PUF) are circuits which are used to generate secret keys to avoid the piracy and device authentication problems. However, existing PUFs consume high power and they suffer from the problem of generating unreliable bits. This dissertation have addressed this issue in PUFs by designing a novel adiabatic logic based PUF. The time ramp voltages in adiabatic PUF is utilized to improve the reliability of the PUF along with its energy-efficiency. Reliability of the adiabatic logic based PUF proposed in this dissertation is tested through simulation based temperature variations and supply voltage variations

    Crosstalk computing: circuit techniques, implementation and potential applications

    Get PDF
    Title from PDF of title [age viewed January 32, 2022Dissertation advisor: Mostafizur RahmanVitaIncludes bibliographical references (page 117-136)Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2020This work presents a radically new computing concept for digital Integrated Circuits (ICs), called Crosstalk Computing. The conventional CMOS scaling trend is facing device scaling limitations and interconnect bottleneck. The other primary concern of miniaturization of ICs is the signal-integrity issue due to Crosstalk, which is the unwanted interference of signals between neighboring metal lines. The Crosstalk is becoming inexorable with advancing technology nodes. Traditional computing circuits always tries to reduce this Crosstalk by applying various circuit and layout techniques. In contrast, this research develops novel circuit techniques that can leverage this detrimental effect and convert it astutely to a useful feature. The Crosstalk is engineered into a logic computation principle by leveraging deterministic signal interference for innovative circuit implementation. This research work presents a comprehensive circuit framework for Crosstalk Computing and derives all the key circuit elements that can enable this computing model. Along with regular digital logic circuits, it also presents a novel Polymorphic circuit approach unique to Crosstalk Computing. In Polymorphic circuits, the functionality of a circuit can be altered using a control variable. Owing to the multi-functional embodiment in polymorphic-circuits, they find many useful applications such as reconfigurable system design, resource sharing, hardware security, and fault-tolerant circuit design, etc. This dissertation shows a comprehensive list of polymorphic logic gate implementations, which were not reported previously in any other work. It also performs a comparison study between Crosstalk polymorphic circuits and existing polymorphic approaches, which are either inefficient due to custom non-linear circuit styles or propose exotic devices. The ability to design a wide range of polymorphic logic circuits (basic and complex logics) compact in design and minimal in transistor count is unique to Crosstalk Computing, which leads to benefits in the circuit density, power, and performance. The circuit simulation and characterization results show a 6x improvement in transistor count, 2x improvement in switching energy, and 1.5x improvement in performance compared to counterpart implementation in CMOS circuit style. Nevertheless, the Crosstalk circuits also face issues while cascading the circuits; this research analyzes all the problems and develops auxiliary circuit techniques to fix the problems. Moreover, it shows a module-level cascaded polymorphic circuit example, which also employs the auxiliary circuit techniques developed. For the very first time, it implements a proof-of-concept prototype Chip for Crosstalk Computing at TSMC 65nm technology and demonstrates experimental evidence for runtime reconfiguration of the polymorphic circuit. The dissertation also explores the application potentials for Crosstalk Computing circuits. Finally, the future work section discusses the Electronic Design Automation (EDA) challenges and proposes an appropriate design flow; besides, it also discusses ideas for the efficient implementation of Crosstalk Computing structures. Thus, further research and development to realize efficient Crosstalk Computing structures can leverage the comprehensive circuit framework developed in this research and offer transformative benefits for the semiconductor industry.Introduction and Motivation -- More Moore and Relevant Beyond CMOS Research Directions -- Crosstalk Computing -- Crosstalk Circuits Based on Perception Model -- Crosstalk Circuit Types -- Cascading Circuit Issues and Sollutions -- Existing Polymorphic Circuit Approaches -- Crosstalk Polymorphic Circuits -- Comparison and Benchmarking of Crosstalk Gates -- Practical Realization of Crosstalk Gates -- Poential Applications -- Conclusion and Future Wor

    Near-Threshold Computing: Past, Present, and Future.

    Full text link
    Transistor threshold voltages have stagnated in recent years, deviating from constant-voltage scaling theory and directly limiting supply voltage scaling. To overcome the resulting energy and power dissipation barriers, energy efficiency can be improved through aggressive voltage scaling, and there has been increased interest in operating at near-threshold computing (NTC) supply voltages. In this region sizable energy gains are achieved with moderate performance loss, some of which can be regained through parallelism. This thesis first provides a methodical definition of how near to threshold is "near threshold" and continues with an in-depth examination of NTC across past, present, and future CMOS technologies. By systematically defining near-threshold, the trends and tradeoffs are analyzed, lending insight in how best to design and optimize near-threshold systems. NTC works best for technologies that feature good circuit delay scalability, therefore technologies without strong short-channel effects. Early planar technologies (prior to 90nm or so) featured good circuit scalability (8x energy gains), but lacked area in which to add cores for parallelization. Recent planar nodes (32nm – 20nm) feature more area for cores but suffer from poor delay scalability, and so are not well-suited for NTC (4x energy gains). The switch to FinFET CMOS technology allows for a return to strong voltage scalability (8x gain), reversing trends seen in planar technologies, while dark silicon has created an opportunity to add cores for parallelization. Improved FinFET voltage scalability even allows for latency reduction of a single task, as long as the task is sufficiently parallelizable (< 10% serial code). Finally, we will look at a technique for fast voltage boosting, called Shortstop, in which a core's operating voltage is raised in 10s of cycles. Shortstop can be used to quickly respond to single-threaded performance demands of a near-threshold system by leveraging the innate parasitic inductance of a dedicated dirty supply rail, further improving energy efficiency. The technique is demonstrated in a wirebond implementation and is able to boost a core up to 1.8x faster than a header-based approach, while reducing supply droop by 2-7x. An improved flip-chip architecture is also proposed.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113600/1/npfet_1.pd

    Bascules à impulsion robustes en technologie 28nm FDSOI pour circuits numériques basse consommation à très large gamme de tension d'alimentation

    Get PDF
    The explosion market of the mobile application and the paradigm of the Internet of Things lead to a huge demand for energy-efficient systems. To overcome the limit of Moore's law due to bulk technology, a new transistor technology has appeared recently in industrial process: the fully-depleted silicon on insulator, or FDSOI.In modern ASIC designs, a large portion of the total power consumption is due to the leaves of the clock tree: the flip-flops. Therefore, the appropriate flip-flop architecture is a major choice to reach the speed and energy constraints of mobile and ultra-low power applications. After a thorough overview of the literature, the explicit pulse-triggered flip-flop topology is pointed out as a very interesting flip-flop architecture for high-speed and low-power systems. However, it is today only used in high-performances circuits mainly because of its poor robustness at ultra-low voltage.In this work, explicit pulse-triggered flip-flops architecture design is developed and studied in order to improve their robustness and their energy-efficiency. A large comparison of resettable and scannable latch architecture is performed in the energy-delay domain by modifying the sizing of the transistors, both at nominal and ultra-low voltage. Then, it is shown that the back biasing technique allowed by the FDSOI technology provides better energy and delay performances than the sizing methodology. As the pulse generator is the main cause of functional failure, we proposed a new architecture which provides both a good robustness at ultra-low voltage and an energy efficiency. A selected topology of explicit pulse-triggered flip-flop was implemented in a 16x32b register file which exhibits better speed, energy consumption and area performances than a version with master-slave flip-flops, mainly thanks to the sharing of the pulse generator over several latches.Avec l'explosion du marché des applications portables et le paradigme de l'Internet des objets, la demande pour les circuits à très haute efficacité énergétique ne cesse de croître. Afin de repousser les limites de la loi de Moore, une nouvelle technologie est apparue très récemment dans les procédés industriels afin de remplacer la technologie en substrat massif ; elle est nommée fully-depleted silicon on insulator ou FDSOI. Dans les circuits numériques synchrones modernes, une grande portion de la consommation totale du circuit provient de l'arbre d'horloge, et en particulier son extrémité : les bascules. Dès lors, l'architecture adéquate de bascules est un choix crucial pour atteindre les contraintes de vitesse et d'énergie des applications basse-consommation. Après un large aperçu de l'état de l'art, les bascules à impulsion explicite sont reconnues les plus prometteuses pour les systèmes demandant une haute performance et une basse consommation. Cependant, cette architecture est pour l'instant fortement utilisée dans les circuits à haute performance et pratiquement absente des circuits à basse tension d'alimentation, principalement à cause de sa faible robustesse face aux variations.Dans ce travail, la conception d'architecture de bascule à impulsion explicite est étudiée dans le but d'améliorer la robustesse et l'efficacité énergétique. Un large panel d'architectures de bascule, avec les fonctions reset et scan, a été comparé dans le domaine énergie-délais, à haute et basse tension d'alimentation, grâce à une méthodologie de dimensionnement des transistors. Il a été montré que la technique dite de « back bias », l'un des principaux avantages de la technologie FDSOI, permettait des meilleures performances en énergie et délais que la méthodologie de dimensionnement. Ensuite, comme le générateur d'impulsion est la principale raison de dysfonctionnement, nous avons proposé une nouvelle architecture qui permet un très bon compromis entre robustesse à faible tension et consommation énergétique. Une topologie de bascule à impulsion explicite a été choisie pour être implémentée dans un banc de registres et, comparé aux bascules maître-esclave, elle présente une plus grande vitesse, une plus faible consommation énergétique et une plus petite surface

    Soft-Error Resilience Framework For Reliable and Energy-Efficient CMOS Logic and Spintronic Memory Architectures

    Get PDF
    The revolution in chip manufacturing processes spanning five decades has proliferated high performance and energy-efficient nano-electronic devices across all aspects of daily life. In recent years, CMOS technology scaling has realized billions of transistors within large-scale VLSI chips to elevate performance. However, these advancements have also continually augmented the impact of Single-Event Transient (SET) and Single-Event Upset (SEU) occurrences which precipitate a range of Soft-Error (SE) dependability issues. Consequently, soft-error mitigation techniques have become essential to improve systems\u27 reliability. Herein, first, we proposed optimized soft-error resilience designs to improve robustness of sub-micron computing systems. The proposed approaches were developed to deliver energy-efficiency and tolerate double/multiple errors simultaneously while incurring acceptable speed performance degradation compared to the prior work. Secondly, the impact of Process Variation (PV) at the Near-Threshold Voltage (NTV) region on redundancy-based SE-mitigation approaches for High-Performance Computing (HPC) systems was investigated to highlight the approach that can realize favorable attributes, such as reduced critical datapath delay variation and low speed degradation. Finally, recently, spin-based devices have been widely used to design Non-Volatile (NV) elements such as NV latches and flip-flops, which can be leveraged in normally-off computing architectures for Internet-of-Things (IoT) and energy-harvesting-powered applications. Thus, in the last portion of this dissertation, we design and evaluate for soft-error resilience NV-latching circuits that can achieve intriguing features, such as low energy consumption, high computing performance, and superior soft errors tolerance, i.e., concurrently able to tolerate Multiple Node Upset (MNU), to potentially become a mainstream solution for the aerospace and avionic nanoelectronics. Together, these objectives cooperate to increase energy-efficiency and soft errors mitigation resiliency of larger-scale emerging NV latching circuits within iso-energy constraints. In summary, addressing these reliability concerns is paramount to successful deployment of future reliable and energy-efficient CMOS logic and spintronic memory architectures with deeply-scaled devices operating at low-voltages

    Heterogeneous Chip Multiprocessor: Data Representation, Mixed-Signal Processing Tiles, and System Design

    Get PDF
    With the emergence of big data, the need for more computationally intensive processors that can handle the increased processing demand has risen. Conventional computing paradigms based on the Von Neumann model that separates computational and memory structures have become outdated and less efficient for this increased demand. As the speed and memory density of processors have increased significantly over the years, these models of computing, which rely on a constant stream of data between the processor and memory, see less gains due to finite bandwidth and latency. Moreover, in the presence of extreme scaling, these conventional systems, implemented in submicron integrated circuits, have become even more susceptible to process variability, static leakage current, and more. In this work, alternative paradigms, predicated on distributive processing with robust data representation and mixed-signal processing tiles, are explored for constructing more efficient and scalable computing systems in application specific integrated circuits (ASICs). The focus of this dissertation work has been on heterogeneous chip multi-processor (CMP) design and optimization across different levels of abstraction. On the level of data representation, a different modality of representation based on random pulse density modulation (RPDM) coding is explored for more efficient processing using stochastic computation. On the level of circuit description, mixed-signal integrated circuits that exploit charge-based computing for energy efficient fixed point arithmetic are designed. Consequently, 8 different chips that test and showcase these circuits were fabricated in submicron CMOS processes. Finally, on the architectural level of description, a compact instruction-set processor and controller that facilitates distributive computing on System-On-Chips (SoCs) is designed. In addition to this, a robust bufferless network architecture is designed with a network simulator, and I/O cells are designed for SoCs. The culmination of this thesis work has led to the design and fabrication of a heterogeneous chip multi- processor prototype comprised of over 12,000 VVM cores, warp/dewarp processors, cache, and additional processors, which can be applied towards energy efficient large-scale data processing

    Revamping Timing Error Resilience to Tackle Choke Points at NTC

    Get PDF
    The growing market of portable devices and smart wearables has contributed to innovation and development of systems with longer battery-life. While Near Threshold Computing (NTC) systems address the need for longer battery-life, they have certain limitations. NTC systems are prone to be significantly affected by variations in the fabrication process, commonly called process variation (PV). This dissertation explores an intriguing effect of PV, called choke points. Choke points are especially important due to their multifarious influence on the functional correctness of an NTC system. This work shows why novel research is required in this direction and proposes two techniques to resolve the problems created by choke points, while maintaining the reduced power needs
    corecore