1,342 research outputs found

    IDPAL – A Partially-Adiabatic Energy-Efficient Logic Family: Theory and Applications to Secure Computing

    Get PDF
    Low-power circuits and issues associated with them have gained a significant amount of attention in recent years due to the boom in portable electronic devices. Historically, low-power operation relied heavily on technology scaling and reduced operating voltage, however this trend has been slowing down recently due to the increased power density on chips. This dissertation introduces a new very-low power partially-adiabatic logic family called Input-Decoupled Partially-Adiabatic Logic (IDPAL) with applications in low-power circuits. Experimental results show that IDPAL reduces energy usage by 79% compared to equivalent CMOS implementations and by 25% when compared to the best adiabatic implementation. Experiments ranging from a simple buffer/inverter up to a 32-bit multiplier are explored and result in consistent energy savings, showing that IDPAL could be a viable candidate for a low-power circuit implementation. This work also shows an application of IDPAL to secure low-power circuits against power analysis attacks. It is often assumed that encryption algorithms are perfectly secure against attacks, however, most times attacks using side channels on the hardware implementation of an encryption operation are not investigated. Power analysis attacks are a subset of side channel attacks and can be implemented by measuring the power used by a circuit during an encryption operation in order to obtain secret information from the circuit under attack. Most of the previously proposed solutions for power analysis attacks use a large amount of power and are unsuitable for a low-power application. The almost-equal energy consumption for any given input in an IDPAL circuit suggests that this logic family is a good candidate for securing low-power circuits again power analysis attacks. Experimental results ranging from small circuits to large multipliers are performed and the power-analysis attack resistance of IDPAL is investigated. Results show that IDPAL circuits are not only low-power but also the most secure against power analysis attacks when compared to other adiabatic low-power circuits. Finally, a hybrid adiabatic-CMOS microprocessor design is presented. The proposed microprocessor uses IDPAL for the implementation of circuits with high switching activity (e.g. ALU) and CMOS logic for other circuits (e.g. memory, controller). An adiabatic-CMOS interface for transforming adiabatic signals to square-wave signals is presented and issues associated with a hybrid implementation and their solutions are also discussed

    NOVEL RESOURCE EFFICIENT CIRCUIT DESIGNS FOR REBOOTING COMPUTING

    Get PDF
    CMOS based computing is reaching its limits. To take computation beyond Moores law (the number of transistors and hence processing power on a chip doubles every 18 months to 3 years) requires research explorations in (i) new materials, devices, and processes, (ii) new architectures and algorithms, (iii) new paradigm of logic bit representation. The focus is on fundamental new ways to compute under the umbrella of rebooting computing such as spintronics, quantum computing, adiabatic and reversible computing. Therefore, this thesis highlights explicitly Quantum computing and Adiabatic logic, two new computing paradigms that come under the umbrella of rebooting computing. Quantum computing is investigated for its promising application in high-performance computing. The first contribution of this thesis is the design of two resource-efficient designs for quantum integer division. The first design is based on non-restoring division algorithm and the second one is based on restoring division algorithm. Both the designs are compared and shown to be superior to the existing work in terms of T-count and T-depth. The proliferation of IoT devices which work on low-power also has drawn interests to the rebooting computing. Hence, the second contribution of this thesis is proving that Adiabatic Logic is a promising candidate for implementation in IoT devices. The adiabatic logic family called Symmetric Pass Gate Adiabatic Logic (SPGAL) is implemented in PRESENT-80 lightweight algorithm. Adiabatic Logic is extended to emerging transistor devices

    Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

    Get PDF
    How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one

    Emerging Technologies - NanoMagnets Logic (NML)

    Get PDF
    In the last decades CMOS technology has ruled the electronic scenario thanks to the constant scaling of transistor sizes. With the reduction of transistor sizes circuit area decreases, clock frequency increases and power consumption decreases accordingly. However CMOS scaling is now approaching its physical limits and many believe that CMOS technology will not be able to reach the end of the Roadmap. This is mainly due to increasing difficulties in the fabrication process, that is becoming very expensive, and to the unavoidable impact of leakage losses, particularly thanks to gate tunnel current. In this scenario many alternative technologies are studied to overcome the limitations of CMOS transistors. Among these possibilities, magnetic based technologies, like NanoMagnet Logic (NML) are among the most interesting. The reason of this interest lies in their magnetic nature, that opens up entire new possibilities in the design of logic circuits, like the possibility to mix logic and memory in the same device. Moreover they have no standby power consumption and potentially a much lower power consumption of CMOS transistors. In literature NML logic is well studied and theoretical and experimental proofs of concept were already found. However two important points are not enough considered in the analysis approach followed by most of the work in literature. First of all, no complex circuits are analyzed. NML logic is very different from CMOS technologies, so to completely understand the potential of this technology it is mandatory to investigate complex architectures. Secondly, most of the solutions proposed do not take into account the constraints derived from fabrication process, making them unrealistic and difficult to be fabricated experimentally. This thesis focuses therefore on NML logic keeping into account these two important limitations in the research approach followed in literature. The aim is to obtain a complete and accurate overview of NML logic, finding realistic circuital solutions and trying to improve at the same time their performance. After a brief and complete introduction (Chapter 1), the thesis is divided in two parts, which cover the two fundamental points followed in this three years of research: A circuits architecture analysis and a technological analysis. In the architecture analysis first an innovative VHDL model is described in Chapter 2. This model is extensively used in the analysis because it allows fast simulation of complex circuits, with, at the same time, the possibility to estimate circuit per- formance, like area and power consumption. In Chapter 3 the problem of signals synchronization in complex NML circuits is analyzed and solved, using as benchmark a simple but complete NML microprocessor. Different solutions based on asynchronous logic are studied and a new asynchronous solution, specifically designed to exploit the potential of NML logic, is developed. In Chapter 4 the layout of NML circuits is studied on a more physical level, considering the limitations of fabrication processes. The layout of NML circuits is therefore changed accordingly to these constraints. Secondly CMOS circuits architectures are compared to more simple architectures, evaluating therefore which one is more suited for NML logic. Finally the problem of interconnections in NML technology is analyzed and solutions to improve it are found. In Chapter 5 the problem of feedback signals in heavy pipelined technologies, like NML, is studied. Solutions to improve performances and synchronize signals are developed. Systolic arrays are then analyzed as possible candidate to exploit NML potential. Finally in Chapter 6 ToPoliNano, a simulator dedicated to NML and other emerging technologies, that we are developing, is described. This simulator allows to follow the same top-down approach followed for CMOS technology. The layout generator and the simulation engine are detailed described. In the first chapter of the technological analysis (Chapter 7), the performance of NML logic is explored throughout low level simulations. The aim is to understand if these circuits can be fabricated with optical lithography, allowing therefore the commercial development of NML logic. Basic logic gates and the clock system are there analyzed from a low level perspective. In Chapter 8 an innovative electric clock system for NML technology is shown and the first experimental results are reported. This clock system allows to achieve true low power for NML technology, obtaining a reduction of power consumption of 20 times considering the best CMOS transistors available. This power consumption takes into account all the losses, also the clock system losses. Moreover the solution presented can be fabricated with current technological processes. The research work behind this thesis represents an important breakthrough in NML logic. The solutions here presented allow the design and fabrication of complex NML circuits, considering the particular characteristics of this technology and considerably improving the performance. Moreover the technological solutions here presented allow the design and fabrication of circuits with available fabrication process with a considerable advantage over CMOS in terms of power consumption. This thesis represents therefore a considerable step froward in the study and development of NML technolog

    Designing Novel Hardware Security Primitives for Smart Computing Devices

    Get PDF
    Smart computing devices are miniaturized electronics devices that can sense their surroundings, communicate, and share information autonomously with other devices to work cohesively. Smart devices have played a major role in improving quality of the life and boosting the global economy. They are ubiquitously present, smart home, smart city, smart girds, industry, healthcare, controlling the hazardous environment, and military, etc. However, we have witnessed an exponential rise in potential threat vectors and physical attacks in recent years. The conventional software-based security approaches are not suitable in the smart computing device, therefore, hardware-enabled security solutions have emerged as an attractive choice. Developing hardware security primitives, such as True Random Number Generator (TRNG) and Physically Unclonable Function (PUF) from electrical properties of the sensor could be a novel research direction. Secondly, the Lightweight Cryptographic (LWC) ciphers used in smart computing devices are found vulnerable against Correlation Power Analysis (CPA) attack. The CPA performs statistical analysis of the power consumption of the cryptographic core and reveals the encryption key. The countermeasure against CPA results in an increase in energy consumption, therefore, they are not suitable for battery operated smart computing devices. The primary goal of this dissertation is to develop novel hardware security primitives from existing sensors and energy-efficient LWC circuit implementation with CPA resilience. To achieve these. we focus on developing TRNG and PUF from existing photoresistor and photovoltaic solar cell sensors in smart devices Further, we explored energy recovery computing (also known as adiabatic computing) circuit design technique that reduces the energy consumption compared to baseline CMOS logic design and same time increasing CPA resilience in low-frequency applications, e.g. wearable fitness gadgets, hearing aid and biomedical instruments. The first contribution of this dissertation is to develop a TRNG prototype from the uncertainty present in photoresistor sensors. The existing sensor-based TRNGs suffer a low random bit generation rate, therefore, are not suitable in real-time applications. The proposed prototype has an average random bit generation rate of 8 kbps, 32 times higher than the existing sensor-based TRNG. The proposed lightweight scrambling method results in random bit entropy close to ideal value 1. The proposed TRNG prototype passes all 15 statistical tests of the National Institute of Standards and Technology (NIST) Statistical Test Suite with quality performance. The second contribution of this dissertation is to develop an integrated TRNG-PUF designed using photovoltaic solar cell sensors. The TRNG and PUF are mutually independent in the way they are designed, therefore, integrating them as one architecture can be beneficial in resource-constrained computing devices. We propose a novel histogram-based technique to segregate photovoltaic solar cell sensor response suitable for TRNG and PUF respectively. The proposed prototype archives approximately 34\% improvement in TRNG output. The proposed prototype achieves an average of 92.13\% reliability and 50.91\% uniformity performance in PUF response. The proposed sensor-based hardware security primitives do not require additional interfacing hardware. Therefore, they can be ported as a software update on existing photoresistor and photovoltaic sensor-based devices. Furthermore, the sensor-based design approach can identify physically tempered and faulty sensor nodes during authentication as their response bit differs. The third contribution is towards the development of a novel 2-phase sinusoidal clocking implementation, 2-SPGAL for existing Symmetric Pass Gate Adiabatic Logic (SPGAL). The proposed 2-SPGAL logic-based LWC cipher PRESENT shows an average of 49.34\% energy saving compared to baseline CMOS logic implementation. Furthermore, the 2-SPGAL prototype has an average of 22.76\% better energy saving compared to 2-EE-SPFAL (2-phase Energy-Efficient-Secure Positive Feedback Adiabatic Logic). The proposed 2-SPGAL was tested for energy-efficiency performance for the frequency range of 50 kHz to 250 kHz, used in healthcare gadgets and biomedical instruments. The proposed 2-SPGAL based design saves 16.78\% transistor count compared to 2-EE-SPFAL counterpart. The final contribution is to explore Clocked CMOS Adiabatic Logic (CCAL) to design a cryptographic circuit. Previously proposed 2-SPGAL and 2-EE-SPFAL uses two complementary pairs of the transistor evaluation network, thus resulting in a higher transistor count compared to the CMOS counterpart. The CCAL structure is very similar to CMOS and unlike 2-SPGAL and 2-EE-SPFAL, it does not require discharge circuitry to improve security performance. The case-study implementation LWC cipher PRESENT S-Box using CCAL results into 45.74\% and 34.88\% transistor count saving compared to 2-EE-SPFAL and 2-SPGAL counterpart. Furthermore, the case-study implementation using CCAL shows more than 95\% energy saving compared to CMOS logic at frequency range 50 kHz to 125 kHz, and approximately 60\% energy saving at frequency 250 kHz. The case study also shows 32.67\% and 11.21\% more energy saving compared to 2-EE-SPFAL and 2-SPGAL respectively at frequency 250 kHz. We also show that 200 fF of tank capacitor in the clock generator circuit results in optimum energy and security performance in CCAL

    Scalable Energy-Recovery Architectures.

    Full text link
    Energy efficiency is a critical challenge for today's integrated circuits, especially for high-end digital signal processing and communications that require both high throughput and low energy dissipation for extended battery life. Charge-recovery logic recovers and reuses charge using inductive elements and has the potential to achieve order-of-magnitude improvement in energy efficiency while maintaining high performance. However, the lack of large-scale high-speed silicon demonstrations and inductor area overheads are two major concerns. This dissertation focuses on scalable charge-recovery designs. We present a semi-automated design flow to enable the design of large-scale charge-recovery chips. We also present a new architecture that uses in-package inductors, eliminating the area overheads caused by the use of integrated inductors in high-performance charge-recovery chips. To demonstrate our semi-automated flow, which uses custom-designed standard-cell-like dynamic cells, we have designed a 576-bit charge-recovery low-density parity-check (LDPC) decoder chip. Functioning correctly at clock speeds above 1 GHz, this prototype is the first-ever demonstration of a GHz-speed charge-recovery chip of significant complexity. In terms of energy consumption, this chip improves over recent state-of-the-art LDPCs by at least 1.3 times with comparable or better area efficiency. To demonstrate our architecture for eliminating inductor overheads, we have designed a charge-recovery LDPC decoder chip with in-package inductors. This test-chip has been fabricated in a 65nm CMOS flip-chip process. A custom 6-layer FC-BGA package substrate has been designed with 16 inductors embedded in the fifth layer of the package substrate, yielding higher Q and significantly improving area efficiency and energy efficiency compared to their on-chip counterparts. From measurements, this chip achieves at least 2.3 times lower energy consumption with better area efficiency over state-of-the-art published designs.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116653/1/terryou_1.pd

    Energy-Efficient Neural Network Architectures

    Full text link
    Emerging systems for artificial intelligence (AI) are expected to rely on deep neural networks (DNNs) to achieve high accuracy for a broad variety of applications, including computer vision, robotics, and speech recognition. Due to the rapid growth of network size and depth, however, DNNs typically result in high computational costs and introduce considerable power and performance overheads. Dedicated chip architectures that implement DNNs with high energy efficiency are essential for adding intelligence to interactive edge devices, enabling them to complete increasingly sophisticated tasks by extending battery lie. They are also vital for improving performance in cloud servers that support demanding AI computations. This dissertation focuses on architectures and circuit technologies for designing energy-efficient neural network accelerators. First, a deep-learning processor is presented for achieving ultra-low power operation. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance back-end, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased energy efficiency. Featuring a reconfigurable datapath and a memory architecture optimized for energy efficiency, the processor supports multilevel dynamic activation of neural network segments, performing object detection tasks with 5.3x lower energy consumption in comparison with a static execution baseline. Fabricated in 40nm CMOS, the processor test-chip dissipates 0.23mW at 5.3 fps. It demonstrates energy scalability up to 28.6 TOPS/W and can be configured to run a variety of workloads, including severely power-constrained ones such as always-on monitoring in mobile applications. To further improve the energy efficiency of the proposed heterogeneous architecture, a new charge-recovery logic family, called zero-short-circuit current (ZSCC) logic, is proposed to decrease the power consumption of the always-on front-end. By relying on dedicated circuit topologies and a four-phase clocking scheme, ZSCC operates with significantly reduced short-circuit currents, realizing order-of-magnitude power savings at relatively low clock frequencies (in the order of a few MHz). The efficiency and applicability of ZSCC is demonstrated through an ANSI S1.11 1/3 octave filter bank chip for binaural hearing aids with two microphones per ear. Fabricated in a 65nm CMOS process, this charge-recovery chip consumes 13.8µW with a 1.75MHz clock frequency, achieving 9.7x power reduction per input in comparison with a 40nm monophonic single-input chip that represents the published state of the art. The ability of ZSCC to further increase the energy efficiency of the heterogeneous neural network architecture is demonstrated through the design and evaluation of a ZSCC-based front-end. Simulation results show 17x power reduction compared with a conventional static CMOS implementation of the same architecture.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147614/1/hsiwu_1.pd

    Average-case analysis of power consumption in embedded systems

    Get PDF
    Power efficiency is one of the most important constraints in the design of embedded systems since such systems are generally driven by batteries with limited energy budget or restricted power supply. In every embedded system, there are one or more processor cores to run the software and interact with the other hardware components of the system. The power consumption of the processor core(s) has an important impact on the total power dissipated in the system. Hence, the processor power optimization is crucial in satisfying the power consumption constraints, and developing low-power embedded systems. A key aspect of research in processor power optimization and management is “power estimation”. Having a fast and accurate method for processor power estimation at design time helps the designer to explore a large space of design possibilities, to make the optimal choices for developing a power efficient processor. Likewise, understanding the processor power dissipation behaviour of a specific software/application is the key for choosing appropriate algorithms in order to write power efficient software. Simulation-based methods for measuring the processor power achieve very high accuracy, but are available only late in the design process, and are often quite slow. Therefore, the need has arisen for faster, higher-level power prediction methods that allow the system designer to explore many alternatives for developing powerefficient hardware and software. The aim of this thesis is to present fast and high-level power models for the prediction of processor power consumption. Power predictability in this work is achieved in two ways: first, using a design method to develop power predictable circuits; second, analysing the power of the functions in the code which repeat during execution, then building the power model based on average number of repetitions. In the first case, a design method called Asynchronous Charge Sharing Logic (ACSL) is used to implement the Arithmetic Logic Unit (ALU) for the 8051 microcontroller. The ACSL circuits are power predictable due to the independency of their power consumption to the input data. Based on this property, a fast prediction method is presented to estimate the power of ALU by analysing the software program, and extracting the number of ALU-related instructions. This method achieves less than 1% error in power estimation and more than 100 times speedup in comparison to conventional simulation-based methods. In the second case, an average-case processor energy model is developed for the Insertion sort algorithm based on the number of comparisons that take place in the execution of the algorithm. The average number of comparisons is calculated using a high level methodology called MOdular Quantitative Analysis (MOQA). The parameters of the energy model are measured for the LEON3 processor core, but the model is general and can be used for any processor. The model has been validated through the power measurement experiments, and offers high accuracy and orders of magnitude speedup over the simulation-based method
    • …
    corecore