424 research outputs found
A Comparative Analysis for Low-voltage, Low-power, and Low-energy Flip-flops
Recently, several flip-flops have been proposed to increase their speed while reducing their power and energy consumption. Flip-flop power is dependent on data activity and in many applications data activity is between 5-15%. In such cases, significantly large clock power and energy is wasted. This thesis explores performance of seven advanced D flip-flops in terms of power, delay, and energy using 65nm CMOS process technology. The main objective of this research is to compare and contrast recent flip-flops under different voltage and data activity conditions, and draw conclusions.
Transmission gate flip-flop (TGFF) is used as a reference flip-flop, and based on comparison result TGFF has shown power-performance trade-offs. 18-T single-phase clocked static flip-flops (18TSPC, TSPC18), and Low-power at low data activity flip-flop (LLFF) are the fastest alternatives suitable for higher performance. However, LLFF consumes more power on higher data activities, where as TSPC18 and 18TSPC consume more power on lower data activities. For lower data activities Topologically compressed flip-flop (TCFF) is power efficient amongst all, but poor in performance. Furthermore, post-layout simulation result illustrates that LLFF is the most energy efficient amongst all up to 20% of data activities, and 18TSPC is energy efficient for higher data activities
Energy-Efficient Digital Circuit Design using Threshold Logic Gates
abstract: Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical.
The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation.
Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR.
Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths.
Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.Dissertation/ThesisDoctoral Dissertation Computer Science 201
Weather satellite picture receiving stations, APT digital scan converter
The automatic picture transmission digital scan converter is used at ground stations to convert signals received from scanning radiometers to data compatible with ground equipment designed to receive signals from vidicons aboard operational meteorological satellites. Information necessary to understand the circuit theory, functional operation, general construction and calibration of the converter is provided. Brief and detailed descriptions of each of the individual circuits are included, accompanied by a schematic diagram contained at the end of each circuit description. Listings of integral parts and testing equipment required as well as an overall wiring diagram are included. This unit will enable the user to readily accept and process weather photographs from the operational meteorological satellites
Latch-based RISC-V core with popcount instruction for CNN acceleration
Energy-efficiency is essential for vast majority of mobile and embedded battery-powered systems. Internet-of-Things paradigm combines requirements for high computational capabilities, extreme energy-efficiency and low-cost. Increasing manufacturing process variations pose formidable challenges for deep-submicron integrated circuit designs. The effects of variation are further exacerbated by lowered voltages in energy-efficient designs. Compared to traditional flip-flop-based design, latch-based design offers area, energy-efficiency and variation tolerance benefits at the cost of increased timing behavior complexity. A method for converting flip-flop-based processor core to latch-based core at register-transfer-level is presented in this work.
Convolutional neural networks have enabled image recognition in the field of computer vision at unprecedented accuracy. Performance and memory requirements of canonical convolutional neural networks have been out of reach for low-cost IoT devices. In collaboration with Tampere University, a custom popcount instruction was added to the cores for accelerating IoT optimized vehicle classification convolutional neural network.
This work compares simulation results from synthesized flip-flop-based and latch-based versions of a SCR1 RISC-V processor core and the effects of custom instruction for CNN acceleration. The latch core achieved roughly 50\% smaller energy per operation than the flip-flop core and 2.1x speedup was observed in the execution of the CNN when using the custom instruction
Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency
abstract: Static CMOS logic has remained the dominant design style of digital systems for
more than four decades due to its robustness and near zero standby current. Static
CMOS logic circuits consist of a network of combinational logic cells and clocked sequential
elements, such as latches and flip-flops that are used for sequencing computations
over time. The majority of the digital design techniques to reduce power, area, and
leakage over the past four decades have focused almost entirely on optimizing the
combinational logic. This work explores alternate architectures for the flip-flops for
improving the overall circuit performance, power and area. It consists of three main
sections.
First, is the design of a multi-input configurable flip-flop structure with embedded
logic. A conventional D-type flip-flop may be viewed as realizing an identity function,
in which the output is simply the value of the input sampled at the clock edge. In
contrast, the proposed multi-input flip-flop, named PNAND, can be configured to
realize one of a family of Boolean functions called threshold functions. In essence,
the PNAND is a circuit implementation of the well-known binary perceptron. Unlike
other reconfigurable circuits, a PNAND can be configured by simply changing the
assignment of signals to its inputs. Using a standard cell library of such gates, a technology
mapping algorithm can be applied to transform a given netlist into one with
an optimal mixture of conventional logic gates and threshold gates. This approach
was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier
in 65nm LP technology. Simulation and chip measurements show more than 30%
improvement in dynamic power and more than 20% reduction in core area.
The functional yield of the PNAND reduces with geometry and voltage scaling.
The second part of this research investigates the use of two mechanisms to improve
the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM
devices for low voltage operation.
The third part of this research focused on the design of flip-flops with non-volatile
storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated
with both conventional D-flipflop and the PNAND circuits to implement non-volatile
logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of
system locally when a power interruption occurs. However, manufacturing variations
in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading
to an overly pessimistic design and consequently, higher energy consumption. A
detailed analysis of the design trade-offs in the driver circuitry for performing backup
and restore, and a novel method to design the energy optimal driver for a given yield is
presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,
in which the backup time is determined on a per-chip basis, resulting in minimizing
the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,
the conventional approach would have to expend nearly 5X more energy than the
minimum required, whereas the proposed tunable approach expends only 26% more
energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are
designed with the same backup and restore circuitry in 65nm technology. The embedded
logic in NV-TLFF compensates performance overhead of NVL. This leads to the
possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-
accumulate (MAC) unit is designed to demonstrate the performance benefits of the
proposed architecture. Based on the results of HSPICE simulations, the MAC circuit
with the proposed NV-TLFF cells is shown to consume at least 20% less power and
area as compared to the circuit designed with conventional DFFs, without sacrificing
any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
Robust low-power digital circuit design in nano-CMOS technologies
Device scaling has resulted in large scale integrated, high performance, low-power, and low cost systems. However the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications on digital circuit design by causing timing uncertainties in combinational circuits, degrading yield and reliability of memory elements, and increasing power density due to slow scaling of supply voltage. Conventional design methods add large pessimistic safety margins to mitigate increased variability, however, they incur large power and performance loss as the combination of worst cases occurs very rarely.
In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability that can significantly minimize power and performance losses. We demonstrated by simulations two delay sensor designs to detect timing failures in advance that can be coupled with different compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss.
SRAM cache is the first victim of increased process variations that requires handcrafted design to meet area, power, and performance requirements. We have proposed novel 6 transistors (6T), 7 transistors (7T), and 8 transistors (8T)-SRAM cells that enable variability tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage due to device mismatch arising from high variability increases delay and power consumption of SRAM design. We have proposed two novel design techniques to reduce offset voltage dependent delays providing a high speed low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to a low-power reliable design. We have investigated novel segmented supply voltage architecture to reduce leakage power of the SRAM caches since they occupy bulk of the total chip area and power. Future work involves developing leakage reduction methods for the combination logic designs including SRAM peripherals
Ultra Low Power Digital Circuit Design for Wireless Sensor Network Applications
Ny forskning innenfor feltet trĂ„dlĂžse sensornettverk Ă„pner for nye og innovative produkter og lĂžsninger. Biomedisinske anvendelser er blant omrĂ„dene med stĂžrst potensial og det investeres i dag betydelige belĂžp for Ă„ bruke denne teknologien for Ă„ gjĂžre medisinsk diagnostikk mer effektiv samtidig som man Ă„pner for fjerndiagnostikk basert pĂ„ trĂ„dlĂžse sensornoder integrert i et âhelsenettâ. MĂ„let er Ă„ forbedre tjenestekvalitet og redusere kostnader samtidig som brukerne skal oppleve forbedret livskvalitet som fĂžlge av Ăžkt trygghet og mulighet for Ă„ tilbringe mest mulig tid i eget hjem og unngĂ„ unĂždvendige sykehusbesĂžk og innleggelser. For Ă„ gjĂžre dette til en realitet er man avhengige av sensorelektronikk som bruker minst mulig energi slik at man oppnĂ„r tilstrekkelig batterilevetid selv med veldig smĂ„ batterier. I sin avhandling â Ultra Low power Digital Circuit Design for Wireless Sensor Network Applicationsâ har PhD-kandidat Farshad Moradi fokusert pĂ„ nye lĂžsninger innenfor konstruksjon av energigjerrig digital kretselektronikk. Avhandlingen presenterer nye lĂžsninger bĂ„de innenfor aritmetiske og kombinatoriske kretser, samtidig som den studerer nye statiske minneelementer (SRAM) og alternative minnearkitekturer. Den ser ogsĂ„ pĂ„ utfordringene som oppstĂ„r nĂ„r silisiumteknologien nedskaleres i takt med mikroprosessorutviklingen og foreslĂ„r lĂžsninger som bidrar til Ă„ gjĂžre kretslĂžsninger mer robuste og skalerbare i forhold til denne utviklingen. De viktigste konklusjonene av arbeidet er at man ved Ă„ introdusere nye konstruksjonsteknikker bĂ„de er i stand til Ă„ redusere energiforbruket samtidig som robusthet og teknologiskalerbarhet Ăžker. Forskningen har vĂŠrt utfĂžrt i samarbeid med Purdue University og vĂŠrt finansiert av Norges ForskningsrĂ„d gjennom FRINATprosjektet âMicropower Sensor Interface in Nanometer CMOS Technologyâ
Energy Efficient Hardware Design for Securing the Internet-of-Things
The Internet of Things (IoT) is a rapidly growing field that holds potential to transform our everyday lives by placing tiny devices and sensors everywhere. The ubiquity and scale of IoT devices require them to be extremely energy efficient. Given the physical exposure to malicious agents, security is a critical challenge within the constrained resources. This dissertation presents energy-efficient hardware designs for IoT security.
First, this dissertation presents a lightweight Advanced Encryption Standard (AES) accelerator design. By analyzing the algorithm, a novel method to manipulate two internal steps to eliminate storage registers and replace flip-flops with latches to save area is discovered. The proposed AES accelerator achieves state-of-art area and energy efficiency.
Second, the inflexibility and high Non-Recurring Engineering (NRE) costs of Application-Specific-Integrated-Circuits (ASICs) motivate a more flexible solution. This dissertation presents a reconfigurable cryptographic processor, called Recryptor, which achieves performance and energy improvements for a wide range of security algorithms across public key/secret key cryptography and hash functions. The proposed design employs circuit techniques in-memory and near-memory computing and is more resilient to power analysis attack. In addition, a simulator for in-memory computation is proposed. It is of high cost to design and evaluate new-architecture like in-memory computing in Register-transfer level (RTL). A C-based simulator is designed to enable fast design space exploration and large workload simulations. Elliptic curve arithmetic and Galois counter mode are evaluated in this work.
Lastly, an error resilient register circuit, called iRazor, is designed to tolerate unpredictable variations in manufacturing process operating temperature and voltage of VLSI systems. When integrated into an ARM processor, this adaptive approach outperforms competing industrial techniques such as frequency binning and canary circuits in performance and energy.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147546/1/zhyiqun_1.pd
A sub-mW IoT-endnode for always-on visual monitoring and smart triggering
This work presents a fully-programmable Internet of Things (IoT) visual
sensing node that targets sub-mW power consumption in always-on monitoring
scenarios. The system features a spatial-contrast binary
pixel imager with focal-plane processing. The sensor, when working at its
lowest power mode ( at 10 fps), provides as output the number of
changed pixels. Based on this information, a dedicated camera interface,
implemented on a low-power FPGA, wakes up an ultra-low-power parallel
processing unit to extract context-aware visual information. We evaluate the
smart sensor on three always-on visual triggering application scenarios.
Triggering accuracy comparable to RGB image sensors is achieved at nominal
lighting conditions, while consuming an average power between and
, depending on context activity. The digital sub-system is extremely
flexible, thanks to a fully-programmable digital signal processing engine, but
still achieves 19x lower power consumption compared to MCU-based cameras with
significantly lower on-board computing capabilities.Comment: 11 pages, 9 figures, submitteted to IEEE IoT Journa
- âŠ