51 research outputs found

Efficient Superconductor Arithmetic Logic Unit for Ultra-Fast Computing

Author: Bozbey Ali
Razmkhah Sasan
Publication date: 14/12/2023

We present a 4-bit Arithmetic Logic Unit (ALU) utilizing superconductor technology. The ALU serves as the central processing unit of a processor, performing crucial arithmetic and logical operations. We have adopted a bit-parallel architecture to ensure an efficient and streamlined design with minimal fan-in/fan-out and optimal latency. The ALU has been fabricated using a standard commercial process. It operates at a clock frequency exceeding 30 GHz while consuming 4.75 mW of power, including applied reverse current, encompassing static and dynamic components. The ALU contains over 9000 Josephson junctions (JJs), with approximately 7000 JJs dedicated to wiring, delay lines, and path balancing, and it has over 18% bias margin. Designed as a co-processor, this arithmetic logic unit will work with external CMOS memory and processors via interface circuits. The ALU's functionality has been tested and validated with digital and analog simulations, and all the components were fabricated and measured within a 4 K pulse-tube cryocooler. Experimental verification has confirmed the successful operation of both the arithmetic and logic units. These results have been analyzed and are presented alongside the experimental data to provide comprehensive insights into the ALU's behavior and capabilities. Comment: 11 pages, 10 figures and 37 references

arXiv.org e-Print Archive

Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture

Author: Burg Andreas Peter
Constantin Jeremy Hugues-Felix
Gurkaynak Frank K.
Publication venue: Cryptology ePrint Archive
Publication date: 06/02/2012

In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified processor core. We assess the gains in throughput, memory consumption, and the area overhead. Our results show that with less than 10% additional area, it is possible to increase the execution speed on average by almost 40%, while reducing memory requirements on average by more than 40%. In particular, the Grostl algorithm, which was one of the slowest algorithms in previous reference implementations, ends up being the fastest implementation by some margin, once minor (but dedicated) instruction set extensions are taken into account

Infoscience - École polytechnique fédérale de Lausanne

Cryptology ePrint Archive

Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

Author: Constantin Jeremy Hugues-Felix
Publication venue: Lausanne, EPFL
Publication date: 09/11/2016

With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. 
We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided

Infoscience - École polytechnique fédérale de Lausanne

Superscalar RISC-V Processor with SIMD Vector Extension

Author: He Jiongrui
Publication venue: University of Saskatchewan Library
Publication date: 22/09/2020

With the increasing number of digital products in the market, the need for robust and highly configurable processors rises. The demand is met by the stable and extensible open-source RISC-V instruction set architecture. RISC-V processors are becoming popular in many fields of applications and research. This thesis presents a dual-issue superscalar RISC-V processor design with dynamic execution. The proposed design employs the global sharing scheme for branch prediction and the Tomasulo algorithm for out-of-order execution. The processor is capable of speculative execution with five checkpoints. Data flow in the instruction dispatch and commit stages is optimized to achieve higher instruction throughput. The superscalar processor is extended with a customized vector instruction set of single-instruction-multiple-data computations to specifically improve the performance on machine learning tasks. According to the definition of the proposed vector instruction set, the scratchpad memory and element-wise arithmetic units are implemented in the vector co-processor. Different test programs are evaluated on the fully-tested superscalar processor. Compared to the reference work, the proposed design improves average instruction throughput by 18.9% and average prediction hit rate by 4.92%, with a 16.9% higher operating clock frequency when synthesized on the Intel Arria 10 FPGA board. The forward propagation of a convolutional neural network model is evaluated on the standalone superscalar processor and on the integration with the vector co-processor. The vector program with software-level optimizations achieves a 9.53× improvement in instruction throughput and a 10.18× improvement in real-time throughput. Moreover, the integration also provides 2.22× the energy efficiency of the superscalar processor alone.
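As an illustration of the branch prediction idea mentioned in this abstract, here is a minimal Python sketch. It assumes that "global sharing scheme" refers to the predictor commonly called gshare, in which the branch address is XORed with a global history register to index a table of 2-bit saturating counters. The class name, table size, and the `pc >> 2` alignment shift are hypothetical choices for this sketch and are not taken from the thesis.

```python
class GsharePredictor:
    """Toy gshare-style branch predictor (illustrative assumptions only)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global branch history register
        self.counters = [1] * (1 << index_bits)   # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        # Assumed 4-byte instruction alignment: drop the two low PC bits,
        # then XOR with the global history to pick a counter.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Calling `update(pc, taken)` after each resolved branch trains the table; `predict(pc)` returns the current guess for that branch under the current history.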

University of Saskatchewan Research Archive

Microprocessor energy characterization and optimization through fast, accurate, and flexible simulation

Author: Krashinsky Ronny (Ronny Meir), 1978-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2001

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 99-102). Energy dissipation is emerging as a key constraint for both high-performance and embedded microprocessor designs, requiring computer architects to consider energy in addition to performance when evaluating design decisions. A major limitation is the general difficulty in analyzing the energy impact of architectural and microarchitectural features without constructing detailed implementations and running slow simulations. This thesis first describes the design of a fast, accurate, and flexible circuit simulation tool which enables transition-sensitive studies of microprocessor energy consumption that would otherwise be impossible or impractical. With a simulation infrastructure in place, various optimizations are implemented that target the entire datapath and cache energy consumption. The individual energy optimizations are analyzed in detail, and the microprocessor design is characterized using various energy breakdowns and studies of the bit correlation between data values. This work shows that a few relatively simple energy-saving techniques can have a large impact in the implementation of an energy-efficient microprocessor. By fully characterizing the energy usage, this thesis establishes a coherent vision of microprocessor energy consumption, and serves as a basis and motivation for further energy optimizations. by Ronny Krashinsky. S.M.

DSpace@MIT

A low power selective median filter design

Author: Dalai Radhamadhab
Publication date: 01/01/2008

A selective median filter that consumes less power has been designed, and different logic implementations for majority-bit evaluation have been applied and simulated in VHDL. It is called selective because an edge pixel detector [2] is used to select the pixels that are to be processed by the median filter. For the median value calculation, the pixel values of a 3 x 3 window are sorted using a majority-bit circuit [4]. Different majority-bit calculation methods have been implemented, and the resulting sorting circuit has been analyzed for power. In this work, a general median filter that uses a binary sorting method known as the Majority Voting Circuit (MVC) has been designed in VHDL and optimized using Synopsys tools in a 0.13 μm CMOS technology. The digital design of the sorting circuit saves approximately 60% of power, comprising cell leakage and dynamic power, compared with a mixed-signal design of a floating-gate-based majority-bit median filter [4]. Before the median filter operates on each pixel, a double derivative filter [2] is applied to check whether it is an edge pixel. Overall, this is a digital design of a mixed filter that preserves edges and removes noise. Low-power techniques at the logic level and the algorithmic level have been embedded in this work. We have also designed a small microprocessor in VHDL. In addition, a memory-based control unit (with the memory used for image storage) for single median value evaluation has been designed and simulated using Xilinx tools. For the sorting circuit, a common-logic-based circuit (component) has been put forward. The power, latency (delay), and area of the whole design have been compared with those of other designs.
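The majority-bit idea in this abstract can be sketched in software. The Python function below is a minimal illustration of a bit-serial median search based on majority voting, not the thesis's VHDL circuit. The function name and the `width` parameter are hypothetical, and the sketch assumes an odd number of unsigned samples that each fit in `width` bits.

```python
def majority_bit_median(values, width=8):
    """Median of an odd number of unsigned ints via bitwise majority voting.

    Bits are examined from MSB to LSB. At each position the majority bit
    becomes the next bit of the median. A sample whose bit disagrees with
    the majority is known to lie above or below the median, so it is
    "forced" to keep presenting that same bit at all lower positions.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("need an odd number of samples")
    forced = [None] * n  # None: still a candidate; 0 or 1: forced bit
    median = 0
    for pos in range(width - 1, -1, -1):
        bits = [
            f if f is not None else (v >> pos) & 1
            for v, f in zip(values, forced)
        ]
        maj = 1 if sum(bits) > n // 2 else 0
        median = (median << 1) | maj
        for i, b in enumerate(bits):
            if forced[i] is None and b != maj:
                forced[i] = b  # bit 1 -> above the median, bit 0 -> below
    return median
```

For a 3 x 3 window, the function would be called with the nine pixel values and `width` set to the pixel bit depth; each loop iteration corresponds to one majority vote over nine bits.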

ethesis@nitr

Estimation of power generation potential of nonwoody biomass species

Author: Mishra Shankar
Publication date: 01/01/2007

In view of the high energy potential of non-woody biomass species and an increasing interest in their utilization for power generation, an attempt has been made in this study to assess the proximate analysis and energy content of different components of Sida rhombifolia, Xanthium strumarium, Anisomeles lamiaceae and Eupatorium coelestinum biomass species (all non-woody), and their impact on power generation and land requirement for energy plantations. The net energy content in Sida is the highest. The Xanthium biomass species appears to have slightly higher calorific values in its components than those of Anisomeles and Eupatorium. The pattern of variation of calorific value in the components like stump, branch, leaf and bark is not identical for all the presently studied biomass species. In all these studied biomass species, the calorific values of leaves, in general, rank after those of stump and branch. The data for proximate and ultimate analysis of the components of these species are very close to each other and hence it is very difficult to draw a concrete conclusion. However, it appears from the present work that the Eupatorium and Xanthium biomass species have higher fixed carbon and lower volatile matter contents in their stumps than the others. As for ash fusion temperature, the Sida biomass species has the highest values of IDT, ST, HT and FT (7860-1490 ˚C) for its ash, followed by Anisomeles (740-1441 ˚C). Xanthium and Eupatorium have lower values of IDT, ST, HT and FT (670-1244 ˚C) for their ashes. The results have shown that approximately 4, 7, 6 and 2 hectares of land are required to generate 20,000 kWh/day of electricity from Sida rhombifolia, Eupatorium coelestinum, Xanthium strumarium and Anisomeles lamiaceae biomass species, respectively. Coal samples, obtained from six different local mines, were also examined for their qualities and the results were compared with those of the studied biomass materials.
This comparison reveals much higher power output with negligible emission of suspended particulate matter (SPM) from biomass materials.

ethesis@nitr

Dynamic instruction scheduling and data forwarding in asynchronous superscalar processors

Author: Mullins Robert D.
Publication venue: The University of Edinburgh
Publication date: 01/01/2001

Edinburgh Research Archive