    Portable random number generators

    Computers are deterministic devices, and a computer-generated random number is a contradiction in terms. As a result, computer-generated pseudorandom numbers are fraught with peril for the unwary. We summarize much that is known about the best-known pseudorandom number generators: congruential generators. We also provide machine-independent programs to implement the generators in any language that has 32-bit signed integers, for example C, C++, and FORTRAN. Based on an extensive search, we provide parameter values better than those previously available.
    Keywords: Programming (Mathematics); Computers
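As a rough illustration of what a portable congruential generator can look like, the sketch below advances a multiplicative generator x -> A*x mod M with Schrage's decomposition, so that no intermediate value leaves the signed 32-bit range. The parameters (A = 16807, M = 2^31 - 1, the Park-Miller "minimal standard") and the names `schrage_step` and `uniform` are assumptions chosen for the example; they are not the improved parameter values the abstract refers to.

```python
# Assumed parameters: the Park-Miller "minimal standard" generator,
# NOT the improved values reported in the paper.
M = 2**31 - 1        # modulus, a Mersenne prime
A = 16807            # multiplier
Q = M // A           # 127773
R = M % A            # 2836
INT32_MAX = 2**31 - 1


def schrage_step(x: int) -> int:
    """Return (A * x) mod M for 0 < x < M using Schrage's decomposition.

    Python integers never overflow, so the assertion below stands in for
    the 32-bit limit that a C or Fortran port would have to respect.
    """
    hi, lo = divmod(x, Q)
    t1 = A * lo          # at most A * (Q - 1), which fits in 31 bits
    t2 = R * hi          # at most R * ((M - 1) // Q), far below 2**31
    assert 0 <= t1 <= INT32_MAX and 0 <= t2 <= INT32_MAX
    t = t1 - t2
    return t if t > 0 else t + M


def uniform(x: int) -> float:
    """Map a generator state to a float strictly between 0 and 1."""
    return x / M


if __name__ == "__main__":
    state = 12345        # hypothetical seed; any value in 1..M-1 works
    for _ in range(3):
        state = schrage_step(state)
        print(state, uniform(state))
```

The decomposition works because R < Q for this choice of A and M; a different multiplier would need that condition checked again.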

    Sequential Circuit Design for Embedded Cryptographic Applications Resilient to Adversarial Faults

    In the relatively young field of fault-tolerant cryptography, the main research effort has focused exclusively on the protection of the data path of cryptographic circuits. To date, however, we have not found any work that aims at protecting the control logic of these circuits against fault attacks, which thus remains the proverbial Achilles’ heel. Motivated by a hypothetical yet realistic fault analysis attack that, in principle, could be mounted against any modular exponentiation engine, even one with appropriate data path protection, we set out to close this remaining gap. In this paper, we present guidelines for the design of multifault-resilient sequential control logic based on standard Error-Detecting Codes (EDCs) with large minimum distance. We introduce a metric that measures the effectiveness of the error detection technique in terms of the effort the attacker has to make in relation to the area overhead spent in implementing the EDC. Our comparison shows that the proposed EDC-based technique provides superior performance when compared against regular N-modular redundancy techniques. Furthermore, our technique scales well and does not affect the critical path delay.

    Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

    I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption can be achieved. Furthermore, because the proposed structure involves real-time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters.
    My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic.
    My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance.
    Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications.
    In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier.
    I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications.
    Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved.
    Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput.
    This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area requirements required for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding.
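To make the CSD idea concrete, here is a minimal software sketch of the textbook digit-by-digit conversion from an integer to its canonical signed digit form, followed by a shift-add ("multiplierless") product. This is the ordinary iterative recoding, shown only to illustrate the representation; it is not the non-iterative FastCSD hardware algorithm described in the abstract, and the function names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(n: int) -> list[int]:
    """Return the CSD digits of n, least significant first, each in {-1, 0, 1}.

    No two adjacent digits are non-zero, which is what keeps the number of
    add/subtract operations in a shift-add multiplier low.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> emit +1; n mod 4 == 3 -> emit -1 (carry upward)
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1          # arithmetic shift, so negative n is handled too
    return digits


def csd_multiply(x: int, coeff: int) -> int:
    """Compute x * coeff using only shifts, additions and subtractions."""
    acc = 0
    for shift, d in enumerate(to_csd(coeff)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    # 7 = 8 - 1, so one subtraction replaces the two additions of binary 111.
    print(to_csd(7))             # digits, least significant first
    print(csd_multiply(13, 7))
```

The loop touches one bit per iteration, which is exactly the sequential dependency a fixed-depth hardware converter would aim to remove.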

    Characterization and Design of High-Level VHDL I/Q Frequency Downconverter via Special Sampling Scheme

    This study explores the characterization and implementation of a Special Sampling Scheme (SSS) for In-Phase and Quad-Phase (I/Q) down conversion utilizing top-level, portable design strategies. The SSS is an under-developed signal sampling methodology that can be used with military and industry receiver systems, specifically United States Air Force (USAF) video receiver systems. The SSS processes a digital input signal-stream sampled at a specified sampling frequency, and down converts it into In-Phase (I) and Quad-Phase (Q) output signal-streams. Using the theory and application of the SSS, three main objectives are pursued: characterization of the effects of input, output, and filter coefficient parameters on the I/Q imbalances using the SSS; development and verification of abstract, top-level VHDL code of the I/Q SSS for hardware implementation; and finally, development, verification, and analysis of variation between synthesizable pipelined and sequential VHDL implementations of the SSS for Field Programmable Gate Arrays (FPGA) and Application Specific Integrated Circuits (ASIC).

    Pond IDE: Machine level program development environment and register transfer level simulator for a massively parallel computer architecture

    As computing architectures are being implemented in late- and post-silicon technologies, fault tolerance and concurrent operation are becoming increasingly important. It is already common knowledge that manufacturers are putting two, four or even more cores on a single silicon die to improve computing performance. The proposed architecture far exceeds this number by grouping thousands or even millions of simple reduced instruction set computing (RISC) processors, each of which is capable of performing a single operation at a time and of communicating with its eight nearest neighbors. In this architecture, if a single core or cluster of cores has defects at the time of manufacture, or later in the life of the system, it is possible to test and disable them as necessary. A fine-grained architecture of this kind calls for a parallel programming style. One approach to this problem is the use of a parallelizing compiler. Another approach may be to use one of the several application programming interfaces (APIs) available for standard text-based programming languages, with some built-in features for parallel programming. This work has generated a solution for creating machine-level parallel programs for the massively parallel computer architecture described above using text and graphical means. To support this programming method, an integrated development environment (IDE) and a zero-communication-latency, register transfer level (RTL) simulator have been developed. Experimental results include the implementation of fundamental data processing algorithms and complex functions.

    High sample-rate Givens rotations for recursive least squares

    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single GP DSP chip.
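The recursive QR update at the heart of this kind of array can be sketched in a few lines: each new data row is rotated into an upper-triangular factor R, one Givens rotation per diagonal element. The sketch uses the conventional square-root form of the rotation with no forgetting factor; it is not the Squared Givens Rotation variant developed in the thesis, and the names `givens` and `qr_update` are hypothetical.

```python
import math


def givens(a: float, b: float) -> tuple[float, float]:
    """Return (c, s) such that the rotation maps the pair (a, b) to (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0          # nothing to rotate
    return a / r, b / r


def qr_update(R: list[list[float]], x: list[float]) -> list[list[float]]:
    """Rotate a new data row x into the upper-triangular matrix R, in place.

    Afterwards R^T R equals the old R^T R plus the outer product x^T x.
    """
    n = len(x)
    x = list(x)                  # work on a copy of the incoming row
    for i in range(n):
        c, s = givens(R[i][i], x[i])
        for j in range(i, n):
            rij, xj = R[i][j], x[j]
            R[i][j] = c * rij + s * xj
            x[j] = -s * rij + c * xj
    return R


if __name__ == "__main__":
    R = [[0.0, 0.0], [0.0, 0.0]]
    for row in ([3.0, 1.0], [4.0, 2.0]):
        qr_update(R, row)
    print(R)
```

The square root hidden in `math.hypot` and the two divisions per rotation are the operations that square-root-free variants try to avoid, which is why operator delay dominates the achievable sample rate.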

    Evolutionary design of digital VLSI hardware

    The Systolic/Cellular System Assembler: User's Guide

    As components are getting cheaper and smaller, computer systems are getting larger (in number of components) and more complex. With the new age of parallel computing come entirely new domains of problems to solve. There are two ways to parallelize a problem. One is to restructure a known algorithm so that independent parts run in parallel. The other is to restructure the problem so that it fits well onto parallel architectures. The Systolic/Cellular System is an array of processors which run in parallel. Its architecture was designed to implement a particular algorithm for matrix manipulation very well. This algorithm, called the Faddeev Algorithm, is well suited to a wide variety of operations such as matrix inverse, matrix multiplication, and matrix addition. It can also be used to solve more complex problems such as the least squares problem and the inverse Jacobian. To efficiently implement this and other algorithms, it is necessary to program as close as possible to the architecture. The obvious way to do this is in machine code, but machine code is hard to read, tedious to write, and almost impossible to debug. The next step is to write an Assembler that gives mnemonics to the various operations, making the system easier to program. This was the goal of my project. This document is a user's manual for an Assembler for the Systolic/Cellular System. In it, I describe the architecture, the issues involved in programming this machine, the input requirements of the Assembler, and how the architecture could be improved to make it an easier machine to program.
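For readers unfamiliar with the Faddeev Algorithm, the sketch below shows the usual textbook formulation: Gaussian elimination on the compound matrix [[A, B], [-C, D]] leaves D + C A^-1 B in the lower-right block, so inverse, product and sum all fall out of one procedure by choosing the blocks. This is a sequential software illustration under the assumption that A is square and non-singular; it says nothing about how the Systolic/Cellular System maps the computation onto its processor array, and the function name `faddeev` is hypothetical.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^-1 * B by eliminating -C in [[A, B], [-C, D]].

    Assumes A is square and non-singular. Exact rational arithmetic is used
    so the example is free of rounding noise.
    """
    n, m = len(A), len(C)
    rows = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]]
            for i in range(n)]
    rows += [[-Fraction(v) for v in C[i]] + [Fraction(v) for v in D[i]]
             for i in range(m)]
    for k in range(n):
        # Pick a non-zero pivot from the top block only; swapping rows of
        # [A B] does not change the value of D + C * A^-1 * B.
        piv = next(i for i in range(k, n) if rows[i][k] != 0)
        rows[k], rows[piv] = rows[piv], rows[k]
        for i in range(k + 1, n + m):
            f = rows[i][k] / rows[k][k]
            if f:
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[k])]
    # The bottom rows now read [0 ... 0 | D + C * A^-1 * B].
    return [row[n:] for row in rows[n:]]


if __name__ == "__main__":
    I2 = [[1, 0], [0, 1]]
    Z2 = [[0, 0], [0, 0]]
    A = [[2, 1], [1, 1]]
    print(faddeev(A, I2, I2, Z2))                       # inverse of A
    print(faddeev(I2, [[1, 2], [3, 4]], A, Z2))         # product A * B
```

Choosing B = C = I and D = 0 yields the inverse, A = I and D = 0 yields the product C B, and A = B = I yields the sum C + D.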