Efficient modular arithmetic units for low power cryptographic applications
The demand for high security in energy-constrained devices such as mobile phones and PDAs is growing rapidly. This creates a need for efficient designs of cryptographic algorithms that offer data integrity, authentication, non-repudiation and confidentiality for the encrypted data and communication channels. Public key cryptography is an ideal choice for data integrity, authentication and non-repudiation, whereas private key cryptography ensures the confidentiality of the transmitted data. The latter has an extremely high encryption speed, but it has limitations that make it unsuitable for some applications. Numerous public key cryptographic algorithms in the literature are built from modular arithmetic modules such as modular addition, multiplication, inversion and exponentiation. Recently, numerous cryptographic algorithms based on modular arithmetic have been proposed that are scalable, operate on words and are efficient in various respects. The modular arithmetic modules play a crucial role in the overall performance of the cryptographic processor. Hence, better results can be obtained by designing efficient arithmetic modules such as modular addition, multiplication, exponentiation and squaring. This thesis, organized into three papers, describes the efficient implementation of modular arithmetic units and the application of these modules in the International Data Encryption Algorithm (IDEA). The second paper describes the implementation of the IDEA algorithm using existing techniques and using the proposed efficient modular units. The third paper describes the fault-tolerant design of a modular unit that has online self-checking capability. --Abstract, page iv
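For readers unfamiliar with the modular units mentioned above: IDEA combines 16-bit words with XOR, addition modulo 2^16, and multiplication modulo 2^16 + 1, where the all-zero word stands for 2^16. The sketch below shows that multiplication in two forms. It is an illustration only, not the thesis's design; the function names `idea_mul` and `idea_mul_lowhigh` are hypothetical.

```python
MASK16 = 0xFFFF
P = (1 << 16) + 1  # 65537, a prime, so every nonzero residue is invertible


def idea_mul(a: int, b: int) -> int:
    """Reference form: multiply modulo 2^16 + 1, with the word 0 standing for 2^16."""
    a = a or (1 << 16)
    b = b or (1 << 16)
    # The product is never 0 mod P; a result of 2^16 is stored as the word 0.
    return ((a * b) % P) & MASK16


def idea_mul_lowhigh(a: int, b: int) -> int:
    """Division-free form: since 2^16 = -1 (mod P), hi * 2^16 + lo = lo - hi (mod P)."""
    if a == 0:  # a represents 2^16, i.e. -1 mod P, so the result is -b mod P
        return (P - b) & MASK16
    if b == 0:
        return (P - a) & MASK16
    prod = a * b
    lo, hi = prod & MASK16, prod >> 16
    # When lo < hi the difference is negative and P = 2^16 + 1 must be added;
    # in 16-bit arithmetic that reduces to adding 1.
    return (lo - hi + (1 if lo < hi else 0)) & MASK16
```

The second form replaces the modular reduction with a subtraction and a conditional increment, which is why this operation is a common target for dedicated hardware units.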
CPU, GPU and FPGA implementation of the MALD algorithm for ceramic tile surface defect detection
This paper addresses the adjustment, implementation and performance comparison of the Moving Average with Local Difference (MALD) method for detecting surface defects on ceramic tiles. The ceramic tile production process is completely autonomous except for the final stage, where the human eye is still required for defect detection. Recent developments in computational platforms and advances in machine vision provide several options for implementing the MALD algorithm. To achieve the shortest execution time for the ceramic tile production process, the MALD method is implemented on three different platforms: CPU (central processing unit), GPU (graphics processing unit) and FPGA (field-programmable gate array), and on each platform in at least two ways. The implementations are written in MATLAB's MEX/C++, C++, CUDA/C++, VHDL and Assembly. Execution times are measured and compared for the different algorithms and their implementations on the different computational platforms.
Low-cost duplication for separable error detection in computer arithmetic
Low-cost arithmetic error detection will be necessary in the future to ensure correct and safe system operation. However, current error detection mechanisms for arithmetic either have high area and energy overheads or are complex and offer incomplete protection against errors. Full duplication is simple, strong, and separable, but often is prohibitively costly. Alternative techniques such as arithmetic error coding require lower hardware and energy overheads than full duplication, but they do so at the expense of high design effort and error coverage holes. The goal of this research is to mitigate the deficiencies of duplication and arithmetic error coding to form an error detection scheme that may be readily employed in future systems. The techniques described by this work use a general duplication technique that employs an alternate number system in the duplicate arithmetic unit. These novel dual modular redundancy organizations are referred to as low-cost duplication, and they provide compelling efficiency and coverage advantages over prior arithmetic error detection mechanisms.
Electrical and Computer Engineering
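To make the trade-off concrete, here is a minimal sketch of a residue check, a classic form of the arithmetic error coding that the abstract contrasts with duplication. It is not the low-cost duplication scheme itself, whose number system the abstract does not specify. The modulus 3, the unbounded-integer adder model, and the name `checked_add` are assumptions chosen for illustration.

```python
M = 3  # check modulus (an assumption; small odd moduli are typical for residue codes)


def checked_add(a: int, b: int, adder=lambda x, y: x + y):
    """Add with a separable residue check.

    The main adder works on the unmodified operands; a small side unit
    predicts the residue of the sum from the residues of the operands.
    Returns (sum, check_passed).
    """
    total = adder(a, b)
    predicted = (a % M + b % M) % M
    return total, total % M == predicted


# A fault-free adder passes the check.
print(checked_add(1234, 5678))

# A single flipped sum bit changes the result by +/- 2^k, which is never a
# multiple of 3, so the check catches it.
print(checked_add(1234, 5678, adder=lambda x, y: (x + y) ^ (1 << 7)))

# An error that is a multiple of 3 slips through: a coverage hole.
print(checked_add(1234, 5678, adder=lambda x, y: x + y + 3))
```

The last case is the kind of coverage hole the abstract attributes to arithmetic error coding: the check unit is cheap, but some error magnitudes are invisible to it.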
Design of Soft Error Robust High Speed 64-bit Logarithmic Adder
Continuous scaling of the transistor size and reduction of the operating voltage have led to a significant performance improvement of integrated circuits. However, the vulnerability of the scaled circuits to transient data upsets or soft errors, which are caused by alpha particles and cosmic neutrons, has emerged as a major reliability concern. In this thesis, we have investigated the effects of soft errors in combinational circuits and proposed soft error detection techniques for high speed adders. In particular, we have proposed an area-efficient 64-bit soft error robust logarithmic adder (SRA). The adder employs the carry merge Sklansky adder architecture in which carries are generated every 4 bits. Since the particle-induced transient, which is often referred to as a single event transient (SET), typically lasts for 100 to 200 ps, the adder uses time redundancy by sampling the sum outputs twice. The sampling instants are set 110 ps apart. In contrast to traditional time redundancy, which requires two clock cycles to generate a given output, the SRA generates an output in a single clock cycle. The sampled sum outputs are compared using a 64-bit XOR tree to detect any possible error. An energy-efficient 4-input transmission-gate-based XOR logic is implemented to reduce the delay and the power in this case. The pseudo-static logic (PSL), which has the ability to recover from a particle-induced transient, is used in the adder implementation. In comparison with the space-redundant approach, which requires hardware duplication for error detection, the SRA is 50% more area efficient. The proposed SRA is simulated for different operands with errors inserted at different nodes at the inputs, the carry merge tree, and the sum generation circuit. The simulation vectors are carefully chosen such that the SET is not masked by the error masking mechanisms that are inherently present in combinational circuits.
Simulation results show that the proposed SRA is capable of detecting 77% of the errors. The undetected errors primarily result when the SET causes an even number of errors and when errors occur outside the sampling window.
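The double-sampling idea, and the two sources of undetected errors named above, can be sketched in a few lines. This is a behavioural toy model, not the thesis's circuit. It assumes that the XOR tree reduces the bitwise differences between the two samples to a single flag (one reading of the abstract, consistent with even-numbered errors escaping), and that a transient simply inverts a set of sum bits for its duration. All names and timing values other than the 110 ps gap are hypothetical.

```python
WIDTH_MASK = (1 << 64) - 1


def sample(true_sum: int, flip_mask: int, set_start_ps: int, set_len_ps: int, t_ps: int) -> int:
    """Sum value seen at time t_ps; bits in flip_mask are inverted while the transient is active."""
    active = set_start_ps <= t_ps < set_start_ps + set_len_ps
    return (true_sum ^ flip_mask) if active else true_sum


def error_flag(first: int, second: int) -> int:
    """XOR the two samples bitwise, then XOR-reduce the 64 difference bits to one flag."""
    diff = (first ^ second) & WIDTH_MASK
    return bin(diff).count("1") % 2


def detected(true_sum, flip_mask, set_start_ps, set_len_ps, t1_ps, gap_ps=110) -> bool:
    """Take two samples gap_ps apart and report whether the flag is raised."""
    a = sample(true_sum, flip_mask, set_start_ps, set_len_ps, t1_ps)
    b = sample(true_sum, flip_mask, set_start_ps, set_len_ps, t1_ps + gap_ps)
    return error_flag(a, b) == 1


S = 0x0123456789ABCDEF
# One flipped bit, transient covers only the first sample: the flag is raised.
print(detected(S, 1 << 5, set_start_ps=0, set_len_ps=100, t1_ps=50))
# Two flipped bits cancel in the XOR reduction: the flag stays low.
print(detected(S, 0b11, set_start_ps=0, set_len_ps=100, t1_ps=50))
# Transient misses both sampling instants: nothing to compare against.
print(detected(S, 1 << 5, set_start_ps=300, set_len_ps=100, t1_ps=50))
```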
A Performance-Efficient and Practical Processor Error Recovery Framework
Continued reduction in the size of a transistor has affected the reliability of
processors built using them. This is primarily due to factors such as inaccuracies while
manufacturing, as well as non-ideal operating conditions, causing transistors to slow
down consistently, eventually leading to permanent breakdown and erroneous operation
of the processor. Permanent transistor breakdown, or faults, can occur at any point in
time in the processor’s lifetime. Errors are the discrepancies in the output of faulty
circuits. This dissertation shows that the components containing faults can continue
operating if the errors caused by them are within certain bounds. Further, the lifetime
of a processor can be increased by adding supportive structures that start working
once the processor develops these hard errors.
This dissertation has three major contributions, namely REPAIR, FaultSim and
PreFix. REPAIR is a fault tolerant system with minimal changes to the processor
design. It uses an external Instruction Re-execution Unit (IRU) to perform operations
that the faulty processor might have erroneously executed. Instructions that are
found to use faulty hardware are then re-executed on the IRU. REPAIR shows that
the performance overhead of such targeted re-execution is low for a limited number of
faults.
FaultSim is a fast fault-simulator capable of simulating large circuits at the transistor
level. It is developed in this dissertation to understand the effect of faults on different
circuits. It performs digital-logic-based simulations, trading off analogue accuracy for
speed, while still being able to support most fault models. A 32-bit addition takes
under 15 microseconds, while simulating more than 1500 transistors. It can also be
integrated into an architectural simulator, where it added a performance overhead of
10 to 26 percent to a simulation. The results obtained show that single faults cause an
error in an adder in less than 10 percent of the inputs.
PreFix brings together the fault models created using FaultSim and the design
directions found using REPAIR. PreFix performs re-execution of instructions on a
remote core, which picks up instructions to execute using a global instruction buffer.
Error prediction and detection are used to reduce the number of re-executed instructions.
PreFix has an area overhead of 3.5 percent in the setup used, and the performance
overhead is within 5 percent of a fault-free case. This dissertation shows that faults
in processors can be tolerated without explicitly switching off any component, and
minimal redundancy is sufficient to achieve the same
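The kind of experiment behind the FaultSim observation, that a single fault corrupts the adder output for only a fraction of inputs, can be sketched with a much coarser model. The code below injects one stuck-at fault into a gate-level ripple-carry adder and measures how often random inputs expose it. This is an assumption-laden stand-in: FaultSim works at the transistor level with its own fault models, so the numbers produced here are not comparable with the dissertation's. The signal names `p` and `g` and all function names are hypothetical.

```python
import random


def full_adder(a, b, cin, stuck=None):
    """One-bit full adder; `stuck` is an optional (signal, value) stuck-at fault."""
    p = a ^ b  # propagate
    g = a & b  # generate
    if stuck and stuck[0] == "p":
        p = stuck[1]
    if stuck and stuck[0] == "g":
        g = stuck[1]
    return p ^ cin, g | (p & cin)


def ripple_add(x, y, width=32, fault=None):
    """Ripple-carry addition; `fault` is an optional (bit_index, signal, value)."""
    carry, result = 0, 0
    for i in range(width):
        stuck = fault[1:] if fault and fault[0] == i else None
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry, stuck)
        result |= s << i
    return result  # final carry-out is dropped, as in a fixed-width adder


def error_rate(fault, trials=2000, width=32, seed=0):
    """Fraction of random operand pairs for which the faulty adder is wrong."""
    rng = random.Random(seed)
    mask = (1 << width) - 1
    wrong = 0
    for _ in range(trials):
        x, y = rng.getrandbits(width), rng.getrandbits(width)
        if ripple_add(x, y, width, fault) != ((x + y) & mask):
            wrong += 1
    return wrong / trials


# Generate signal of bit 3 stuck at 0: only inputs with both operand bits set expose it.
print(error_rate((3, "g", 0)))
```

The point of the sketch is the shape of the result rather than its value: a permanent fault is silent for every input that does not exercise the faulty signal, which is what makes targeted re-execution of only the affected instructions plausible.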
The 1991 3rd NASA Symposium on VLSI Design
Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2
A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits
Given the stringent requirements of energy efficiency for Internet-of-Things
edge devices, approximate multipliers, as a basic component of many processors
and accelerators, have been constantly proposed and studied for decades,
especially in error-resilient applications. The computation error and energy
efficiency largely depend on how and where the approximation is introduced into
a design. Thus, this article aims to provide a comprehensive review of the
approximation techniques in multiplier designs ranging from algorithms and
architectures to circuits. We have implemented representative approximate
multiplier designs in each category to understand the impact of the design
techniques on accuracy and efficiency. The designs can then be effectively
deployed in high-level applications, such as machine learning, to gain energy
efficiency at the cost of slight accuracy loss.
Comment: 38 pages, 37 figures
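As one concrete instance of approximation introduced at the algorithm level, the sketch below implements Mitchell's logarithmic multiplication, a well-known approximate multiplier in which each operand's base-2 logarithm is approximated piecewise linearly, the logarithms are added, and the antilogarithm is approximated the same way. Whether and how the survey evaluates this particular design is not stated in the abstract; the function name `mitchell_mul` is hypothetical and the code models unsigned integers only.

```python
def mitchell_mul(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers using Mitchell's method.

    Write a = 2^k1 * (1 + x1) and b = 2^k2 * (1 + x2) with 0 <= x1, x2 < 1.
    Then log2(a) is approximated by k1 + x1, and the product becomes
      2^(k1+k2)   * (1 + x1 + x2)   if x1 + x2 < 1
      2^(k1+k2+1) * (x1 + x2)       otherwise.
    Only shifts, additions and a comparison are needed.
    """
    if a == 0 or b == 0:
        return 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    # Fractions scaled to the common denominator 2^(k1 + k2).
    x1 = (a - (1 << k1)) << k2
    x2 = (b - (1 << k2)) << k1
    one = 1 << (k1 + k2)
    s = x1 + x2
    return one + s if s < one else 2 * s


for a, b in [(4, 8), (5, 6), (3, 3), (100, 200)]:
    approx, exact = mitchell_mul(a, b), a * b
    print(a, b, approx, exact, f"{(exact - approx) / exact:.1%}")
```

The result is exact when either operand is a power of two and otherwise underestimates the product, with a relative error of at most one ninth (about 11.1%). That bounded, one-sided error is typical of what error-resilient applications are expected to absorb in exchange for replacing the partial-product array with shifters and an adder.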
Design for pre-bond testability in 3D integrated circuits
In this dissertation we propose several DFT techniques specific to 3D
stacked IC systems. The goal has explicitly been to create techniques that
integrate easily with existing IC test systems. Specifically, this means
utilizing scan- and wrapper-based techniques, two foundations
of the digital IC test industry.
First, we describe a general test architecture for 3D ICs. In this
architecture, each tier of a 3D design is wrapped in test control logic that
both manages tier test
pre-bond and integrates the tier into the large test architecture post-bond.
We describe a new kind of boundary scan to provide the necessary test control
and observation of the partial circuits, and we propose
a new design methodology for test hardcore that ensures both pre-bond functionality
and post-bond optimality. We present the application of these techniques to
the 3D-MAPS test vehicle, which has proven their effectiveness.
Second, we extend these DFT techniques to circuit-partitioned designs. We find
that boundary scan design is generally sufficient, but that some 3D designs require
special DFT treatment. Most importantly, we demonstrate that the functional
partitioning inherent in 3D design can potentially decrease the total test cost
of verifying a circuit.
Third, we present a new CAD algorithm for designing 3D test wrappers. This algorithm
co-designs the pre-bond and post-bond wrappers to simultaneously minimize test
time and routing cost. On average, our algorithm utilizes over 90% of the wires
in both the pre-bond and post-bond wrappers.
Finally, we look at the 3D vias themselves to develop a low-cost, high-volume
pre-bond test methodology appropriate for production-level test. We describe
the shorting probes methodology, wherein large test probes are used to contact
multiple small 3D vias. This technique is an all-digital test method that
integrates seamlessly into existing test flows. Our
experimental results demonstrate two key facts: neither the large capacitance
of the probe tips nor the process variation in the 3D vias and the probe tips
significantly hinders the testability of the circuits.
Taken together, this body of work defines a complete test methodology for
testing 3D ICs pre-bond, eliminating one of the key hurdles to the
commercialization of 3D technology.
PhD
Committee Chair: Lee, Hsien-Hsin; Committee Member: Bakir, Muhannad; Committee Member: Lim, Sung Kyu; Committee Member: Vuduc, Richard; Committee Member: Yalamanchili, Sudhaka
The 1992 4th NASA SERC Symposium on VLSI Design
Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design