987 research outputs found

    Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

    Get PDF
    How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one

    Designing Novel Hardware Security Primitives for Smart Computing Devices

    Get PDF
    Smart computing devices are miniaturized electronics devices that can sense their surroundings, communicate, and share information autonomously with other devices to work cohesively. Smart devices have played a major role in improving quality of the life and boosting the global economy. They are ubiquitously present, smart home, smart city, smart girds, industry, healthcare, controlling the hazardous environment, and military, etc. However, we have witnessed an exponential rise in potential threat vectors and physical attacks in recent years. The conventional software-based security approaches are not suitable in the smart computing device, therefore, hardware-enabled security solutions have emerged as an attractive choice. Developing hardware security primitives, such as True Random Number Generator (TRNG) and Physically Unclonable Function (PUF) from electrical properties of the sensor could be a novel research direction. Secondly, the Lightweight Cryptographic (LWC) ciphers used in smart computing devices are found vulnerable against Correlation Power Analysis (CPA) attack. The CPA performs statistical analysis of the power consumption of the cryptographic core and reveals the encryption key. The countermeasure against CPA results in an increase in energy consumption, therefore, they are not suitable for battery operated smart computing devices. The primary goal of this dissertation is to develop novel hardware security primitives from existing sensors and energy-efficient LWC circuit implementation with CPA resilience. To achieve these. we focus on developing TRNG and PUF from existing photoresistor and photovoltaic solar cell sensors in smart devices Further, we explored energy recovery computing (also known as adiabatic computing) circuit design technique that reduces the energy consumption compared to baseline CMOS logic design and same time increasing CPA resilience in low-frequency applications, e.g. wearable fitness gadgets, hearing aid and biomedical instruments. The first contribution of this dissertation is to develop a TRNG prototype from the uncertainty present in photoresistor sensors. The existing sensor-based TRNGs suffer a low random bit generation rate, therefore, are not suitable in real-time applications. The proposed prototype has an average random bit generation rate of 8 kbps, 32 times higher than the existing sensor-based TRNG. The proposed lightweight scrambling method results in random bit entropy close to ideal value 1. The proposed TRNG prototype passes all 15 statistical tests of the National Institute of Standards and Technology (NIST) Statistical Test Suite with quality performance. The second contribution of this dissertation is to develop an integrated TRNG-PUF designed using photovoltaic solar cell sensors. The TRNG and PUF are mutually independent in the way they are designed, therefore, integrating them as one architecture can be beneficial in resource-constrained computing devices. We propose a novel histogram-based technique to segregate photovoltaic solar cell sensor response suitable for TRNG and PUF respectively. The proposed prototype archives approximately 34\% improvement in TRNG output. The proposed prototype achieves an average of 92.13\% reliability and 50.91\% uniformity performance in PUF response. The proposed sensor-based hardware security primitives do not require additional interfacing hardware. Therefore, they can be ported as a software update on existing photoresistor and photovoltaic sensor-based devices. Furthermore, the sensor-based design approach can identify physically tempered and faulty sensor nodes during authentication as their response bit differs. The third contribution is towards the development of a novel 2-phase sinusoidal clocking implementation, 2-SPGAL for existing Symmetric Pass Gate Adiabatic Logic (SPGAL). The proposed 2-SPGAL logic-based LWC cipher PRESENT shows an average of 49.34\% energy saving compared to baseline CMOS logic implementation. Furthermore, the 2-SPGAL prototype has an average of 22.76\% better energy saving compared to 2-EE-SPFAL (2-phase Energy-Efficient-Secure Positive Feedback Adiabatic Logic). The proposed 2-SPGAL was tested for energy-efficiency performance for the frequency range of 50 kHz to 250 kHz, used in healthcare gadgets and biomedical instruments. The proposed 2-SPGAL based design saves 16.78\% transistor count compared to 2-EE-SPFAL counterpart. The final contribution is to explore Clocked CMOS Adiabatic Logic (CCAL) to design a cryptographic circuit. Previously proposed 2-SPGAL and 2-EE-SPFAL uses two complementary pairs of the transistor evaluation network, thus resulting in a higher transistor count compared to the CMOS counterpart. The CCAL structure is very similar to CMOS and unlike 2-SPGAL and 2-EE-SPFAL, it does not require discharge circuitry to improve security performance. The case-study implementation LWC cipher PRESENT S-Box using CCAL results into 45.74\% and 34.88\% transistor count saving compared to 2-EE-SPFAL and 2-SPGAL counterpart. Furthermore, the case-study implementation using CCAL shows more than 95\% energy saving compared to CMOS logic at frequency range 50 kHz to 125 kHz, and approximately 60\% energy saving at frequency 250 kHz. The case study also shows 32.67\% and 11.21\% more energy saving compared to 2-EE-SPFAL and 2-SPGAL respectively at frequency 250 kHz. We also show that 200 fF of tank capacitor in the clock generator circuit results in optimum energy and security performance in CCAL

    The development of low temperature heat capacity results : a heat capacity study of some chloroammine cobalt(III) compounds

    Get PDF
    A laboratory for heat capacity measurements by adiabatic calorimetry in the range from 2 to 100 kelvin was developed. A complete vacuum and gas-handling system was designed and constructed to serve a refurbished pumped-helium cryostat. Automatic data-collection, real-time display of temperature and semi-automation of heating and adiabatic control were achieved with a low-cost interface and microcomputer. The apparatus was calibrated with a benzoic acid thermometric standard. The heat capacity of chloropentaammine cobalt(III) dichloride was measured from 3.8 to 100 kelvin; thermodynamic functions are quoted from 6 to 100 kelvin. The heat capacity of trans-dichloro-tetrammine cobaltdil) chloride was measured from 4.7 to 100 kelvin; thermodynamic functions are quoted from 10 to 100 kelvin. No thermal anomalies were observed in these temperature ranges

    Low power JPEG2000 5/3 discrete wavelet transform algorithm and architecture

    Get PDF

    CIRCUITS AND ARCHITECTURE FOR BIO-INSPIRED AI ACCELERATORS

    Get PDF
    Technological advances in microelectronics envisioned through Moore’s law have led to powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant. Unconventional Compute-in-Memory architectures such as the analog winner-takes-all associative-memory and the Charge-Injection Device processor have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications, and in recent work, multi-bit vector-vector multiplications. In the latter, computation was carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with many elements, high energy efficiencies can be achieved. In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target computational technologies: (i) a multilevel non-volatile cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) bit-cell. At the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level. Finally, on the architectural level, two AI accelerator for data-center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and flexible compute vs. memory trade-off. General purpose Arm/RISCV co-processors provide adequate bootstrapping and system-housekeeping and a high-speed interface fabric facilitates Input/Output to main memory

    HDL-based Synthesis of Reversible Circuits : A Scalable Design Approach

    Get PDF
    Reversible computing is a promising research field due to its applications in several emerging technologies. Accordingly, several approaches for the design of reversible circuits have been introduced. Hardware Description Languages approach scales better than other methodologies, however, its main drawback is substantial amounts of additional circuit lines. This dissertation is an important step towards an elaborated scalable design flow of reversible circuits. In which, HDL-based design of reversible circuit is optimised, with line-awareness considered as the main objective. A line-aware programming style for a dedicated reversible hardware description language SyReC is proposed. Another contribution is a line-aware computation of HDL expressions. Reversible circuits' synthesis from a conventional hardware description language (VHDL) is examined. Finally, syntactical extensions to the dedicated hardware description language SyReC are suggested

    Emerging Technologies - NanoMagnets Logic (NML)

    Get PDF
    In the last decades CMOS technology has ruled the electronic scenario thanks to the constant scaling of transistor sizes. With the reduction of transistor sizes circuit area decreases, clock frequency increases and power consumption decreases accordingly. However CMOS scaling is now approaching its physical limits and many believe that CMOS technology will not be able to reach the end of the Roadmap. This is mainly due to increasing difficulties in the fabrication process, that is becoming very expensive, and to the unavoidable impact of leakage losses, particularly thanks to gate tunnel current. In this scenario many alternative technologies are studied to overcome the limitations of CMOS transistors. Among these possibilities, magnetic based technologies, like NanoMagnet Logic (NML) are among the most interesting. The reason of this interest lies in their magnetic nature, that opens up entire new possibilities in the design of logic circuits, like the possibility to mix logic and memory in the same device. Moreover they have no standby power consumption and potentially a much lower power consumption of CMOS transistors. In literature NML logic is well studied and theoretical and experimental proofs of concept were already found. However two important points are not enough considered in the analysis approach followed by most of the work in literature. First of all, no complex circuits are analyzed. NML logic is very different from CMOS technologies, so to completely understand the potential of this technology it is mandatory to investigate complex architectures. Secondly, most of the solutions proposed do not take into account the constraints derived from fabrication process, making them unrealistic and difficult to be fabricated experimentally. This thesis focuses therefore on NML logic keeping into account these two important limitations in the research approach followed in literature. The aim is to obtain a complete and accurate overview of NML logic, finding realistic circuital solutions and trying to improve at the same time their performance. After a brief and complete introduction (Chapter 1), the thesis is divided in two parts, which cover the two fundamental points followed in this three years of research: A circuits architecture analysis and a technological analysis. In the architecture analysis first an innovative VHDL model is described in Chapter 2. This model is extensively used in the analysis because it allows fast simulation of complex circuits, with, at the same time, the possibility to estimate circuit per- formance, like area and power consumption. In Chapter 3 the problem of signals synchronization in complex NML circuits is analyzed and solved, using as benchmark a simple but complete NML microprocessor. Different solutions based on asynchronous logic are studied and a new asynchronous solution, specifically designed to exploit the potential of NML logic, is developed. In Chapter 4 the layout of NML circuits is studied on a more physical level, considering the limitations of fabrication processes. The layout of NML circuits is therefore changed accordingly to these constraints. Secondly CMOS circuits architectures are compared to more simple architectures, evaluating therefore which one is more suited for NML logic. Finally the problem of interconnections in NML technology is analyzed and solutions to improve it are found. In Chapter 5 the problem of feedback signals in heavy pipelined technologies, like NML, is studied. Solutions to improve performances and synchronize signals are developed. Systolic arrays are then analyzed as possible candidate to exploit NML potential. Finally in Chapter 6 ToPoliNano, a simulator dedicated to NML and other emerging technologies, that we are developing, is described. This simulator allows to follow the same top-down approach followed for CMOS technology. The layout generator and the simulation engine are detailed described. In the first chapter of the technological analysis (Chapter 7), the performance of NML logic is explored throughout low level simulations. The aim is to understand if these circuits can be fabricated with optical lithography, allowing therefore the commercial development of NML logic. Basic logic gates and the clock system are there analyzed from a low level perspective. In Chapter 8 an innovative electric clock system for NML technology is shown and the first experimental results are reported. This clock system allows to achieve true low power for NML technology, obtaining a reduction of power consumption of 20 times considering the best CMOS transistors available. This power consumption takes into account all the losses, also the clock system losses. Moreover the solution presented can be fabricated with current technological processes. The research work behind this thesis represents an important breakthrough in NML logic. The solutions here presented allow the design and fabrication of complex NML circuits, considering the particular characteristics of this technology and considerably improving the performance. Moreover the technological solutions here presented allow the design and fabrication of circuits with available fabrication process with a considerable advantage over CMOS in terms of power consumption. This thesis represents therefore a considerable step froward in the study and development of NML technolog

    Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations

    Full text link
    We propose Paraiso, a domain specific language embedded in functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE solving algorithms succinctly using tensor equations notation. Hydrodynamic properties, interpolation methods and other building blocks are described in abstract, modular, re-usable and combinable forms, which lets us generate versatile solvers from little set of Paraiso source codes. We demonstrate Paraiso by implementing a compressive hydrodynamics solver. A single source code less than 500 lines can be used to generate solvers of arbitrary dimensions, for both multicore CPUs and GPUs. We demonstrate both manual annotation based tuning and evolutionary computing based automated tuning of the program.Comment: 52 pages, 14 figures, accepted for publications in Computational Science and Discover
    • …
    corecore