8 research outputs found

    Energy Optimization of Unrolled Block Ciphers using Combinational Checkpointing

    Get PDF
    Energy consumption of block ciphers is critical in resource constrained devices. Unrolling has been explored in literature as a technique to increase efficiency by eliminating energy spent in loop control elements such as registers and multiplexers. However these savings are minimal and are offset by the increase in glitching power that comes with unrolling. We propose an efficient latch-based glitch filter for unrolled designs that reduces energy per encryption by an order of magnitude over a straightforward implementation, and by 28-32% over the best existing glitch filtering schemes. We explore the optimal number of glitch filters that should be used in order to minimize total energy, and provide estimates of the area cost. Partially unrolled designs also benefit from using our scheme with energies competitive to fully serialized implementations. We demonstrate our approach on the SIMON-128 and AES-256 block ciphers

    Power reduction techniques for memory elements

    Get PDF
    High performance and computational capability in the current generation processors are made possible by small feature sizes and high device density. To maintain the current drive strength and control the dynamic power in these processors, simultaneous scaling down of supply and threshold voltages is performed. High device density and low threshold voltages result in an increase in the leakage current dissipation. Large on chip caches are integrated onto the current generation processors which are becoming a major contributor to total leakage power. In this work, a novel methodology is proposed to minimize the leakage power and dynamic power. The proposed static power reduction technique, GALEOR (GAted LEakage transistOR), introduces stacks by placing high threshold voltage transistors and consists of inherent control logic. The proposed dynamic power reduction technique, adaptive phase tag cache, achieves power savings through varying tag size for a design window. Testing and verification of the proposed techniques is performed on a two level cache system. Power delay squared product is used as a metric to measure the effectiveness of the proposed techniques. The GALEOR technique achieves 30% reduction when implemented on CMOS benchmark circuits and an overall leakage savings of 9% when implemented on the two level cache systems. The proposed dynamic power reduction technique achieves 10% savings when implemented on individual modules of the two level cache and an overall savings of 3% when implemented on the entire two level cache system

    Gate-level timing analysis and waveform evaluation

    Get PDF
    Static timing analysis (STA) is an integral part of modern VLSI chip design. Table lookup based methods are widely used in current industry due to its fast runtime and mature algorithms. Conventional STA algorithms based on table-lookup methods are developed under many assumptions in timing analysis; however, most of those assumptions, such as that input signals and output signals can be accurately modeled as ramp waveforms, are no longer satisfactory to meet the increasing demand of accuracy for new technologies. In this dissertation, we discuss several crucial issues that conventional STA has not taken into consideration, and propose new methods to handle these issues and show that new methods produce accurate results. In logic circuits, gates may have multiple inputs and signals can arrive at these inputs at different times and with different waveforms. Different arrival times and waveforms of signals can cause very different responses. However, multiple-input transition effects are totally overlooked by current STA tools. Using a conventional single-input transition model when multiple-input transition happens can cause significant estimation errors in timing analysis. Previous works on this issue focus on developing a complicated gate model to simulate the behavior of logic gates. These methods have high computational cost and have to make significant changes to the prevailing STA tools, and are thus not feasible in practice. This dissertation proposes a simplified gate model, uses transistor connection structures to capture the behavior of multiple-input transitions and requires no change to the current STA tools. Another issue with table lookup based methods is that the load of each gate in technology libraries is modeled as a single lumped capacitor. But in the real circuit, the Abstract 2 gate connects to its subsequent gates via metal wires. As the feature size of integrated circuit scales down, the interconnection cannot be seen as a simple capacitor since the resistive shielding effect will largely affect the equivalent capacitance seen from the gate. As the interconnection has numerous structures, tabulating the timing data for various interconnection structures is not feasible. In this dissertation, by using the concept of equivalent admittance, we reduce an arbitrary interconnection structure into an equivalent π-model RC circuit. Many previous works have mapped the π-model to an effective capacitor, which makes the table lookup based methods useful again. However, a capacitor cannot be equivalent to a π-model circuit, and will thus result in significant inaccuracy in waveform evaluation. In order to obtain an accurate waveform at gate output, a piecewise waveform evaluation method is proposed in this dissertation. Each part of the piecewise waveform is evaluated according to the gate characteristic and load structures. Another contribution of this dissertation research is a proposed equivalent waveform search method. The signal waveforms can be very complicated in the real circuits because of noises, race hazards, etc. The conventional STA only uses one attribute (i.e., transition time) to describe the waveform shape which can cause significant estimation errors. Our approach is to develop heuristic search functions to find equivalent ramps to approximate input waveforms. Here the transition time of a final ramp can be completely different from that of the original waveform, but we can get higher accuracy on output arrival time and transition time. All of the methods mentioned in this dissertation require no changes to the prevailing STA tools, and have been verified across different process technologies

    Static Timing Analysis Based Transformations of Super-Complex Instruction Set Hardware Functions

    Get PDF
    Application specific hardware implementations are an increasingly popular way of reducing execution time and power consumption in embedded systems. This application specific hardware typically consumes a small fraction of the execution time and power consumption that the equivalent software code would require. Modern electronic design automation (EDA) tools can be used to apply a variety of transformations to hardware blocks in an effort to achieve additional performance and power savings. A number of such transformations require a tool with knowledge of the designs' timing characteristics. This thesis describes a static timing analyzer and two timing analysis based design automation tools. The static timing analyzer estimates the worst-case timing characteristics of a hardware data flow graph. These hardware data flow graphs are intermediate representations generated within a C to VHDL hardware acceleration compiler. Two EDA tools were then developed which utilize static timing analysis. An automated pipelining tool was developed to increase the throughput of large blocks of combinational logic generated by the hardware acceleration compiler. Another tool was designed in an attempt to mitigate power consumption resulting from extraneous combinational switching. By inserting special signal buffers, known as delay elements, with preselected propagation delays, combinational functional units can be kept inactive until their inputs have stabilized. The hardware descriptions generated by both tools were synthesized, simulated, and power profiled using existing commercial EDA tools. The results show that pipelining leads to an average performance increase of 3.3x, while delay elements saved between 25% and 33% of the power consumption when tested on a set of signal and image processing benchmarks

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

    Glitch power minimization by selective gate freezing

    No full text
    This paper presents a technique for glitch power minimization in combinational circuits. The total number of glitches is reduced by replacing some existing gates with functionally equivalent ones (called F-Gates) that can be 'frozen' by asserting a control signal. A frozen gate cannot propagate glitches to its output. Algorithms for gate selection and clustering that maximize the percentage of filtered glitches and reduce the overhead for generating the control signals are introduced. A power-efficient CMOS implementation of F-Gates is also described. An important feature of the proposed method is that it can be applied in place directly to layout-level descriptions; therefore, it guarantees very predictable results and minimizes the impact of the transformation on circuit size and speed

    Low power JPEG2000 5/3 discrete wavelet transform algorithm and architecture

    Get PDF

    LOW POWER CIRCUITS DESIGN USING RESISTIVE NON-VOLATILE MEMORIES

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH
    corecore