61 research outputs found

    Pipelined Two-Operand Modular Adders

    Get PDF
    Pipelined two-operand modular adder (TOMA) is one of basic components used in digital signal processing (DSP) systems that use the residue number system (RNS). Such modular adders are used in binary/residue and residue/binary converters, residue multipliers and scalers as well as within residue processing channels. The design of pipelined TOMAs is usually obtained by inserting an appriopriate number of latch layers inside a nonpipelined TOMA structure. Hence their area is also determined by the number of latches and the delay by the number of latch layers. In this paper we propose a new pipelined TOMA that is based on a new TOMA, that has the smaller area and smaller delay than other known structures. Comparisons are made using data from the very large scale of integration (VLSI) standard cell library

    Redundant residue number system code for fault-tolerant hybrid memories

    Get PDF
    Hybrid memories are envisioned as one of the alternatives to existing semiconductor memories. Although offering enormous data storage capacity, low power consumption, and reduced fabrication complexity (at least for the memory cell array), such memories are subject to a high degree of intermittent and transient faults leading to reliability issues. This article examines the use of Conventional Redundant Residue Number System (C-RRNS) error correction code, which has been extensively used in digital signal processing and communication, to detect and correct intermittent and transient cluster faults in hybrid memories. It introduces a modified version of C-RRNS, referred to as 6M-RRNS, to realize the aims at lower area overhead and performance penalty. The experimental results show that 6M-RRNS realizes a competitive error correction capability, provides larger data storage capacity, and offers higher decoding performance as compared to C-RRNS and Reed-Solomon (RS) codes. For instance, for 64-bit hybrid memories at 10% fault rate, 6M-RRNS has 98.95% error correction capability, which is 0.35% better than RS and 0.40% less than C-RRNS. Moreover, when considering 1Tbit memory, 6M-RRNS offers 4.35% more data storage capacity than RS and 11.41% more than C-RRNS. Additionally, it decodes up to 5.25 times faster than C-RRNS

    Algorithms and VLSI architectures for parametric additive synthesis

    Get PDF
    A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable

    Architectures and implementations for the Polynomial Ring Engine over small residue rings

    Get PDF
    This work considers VLSI implementations for the recently introduced Polynomial Ring Engine (PRE) using small residue rings. To allow for a comprehensive approach to the implementation of the PRE mappings for DSP algorithms, this dissertation introduces novel techniques ranging from system level architectures to transistor level considerations. The Polynomial Ring Engine combines both classical residue mappings and new polynomial mappings. This dissertation develops a systematic approach for generating pipelined systolic/ semi-systolic structures for the PRE mappings. An example architecture is constructed and simulated to illustrate the properties of the new architectures. To simultaneously achieve large computational dynamic range and high throughput rate the basic building blocks of the PRE architecture use transistor size profiling. Transistor sizing software is developed for profiling the Switching Tree dynamic logic used to build the basic modulo blocks. The software handles complex nFET structures using a simple iterative algorithm. Issues such as convergence of the iterative technique and validity of the sizing formulae have been treated with an appropriate mathematical analysis. As an illustration of the use of PRE architectures for modem DSP computational problems, a Wavelet Transform for HDTV image compression is implemented. An interesting use is made of the PRE technique of using polynomial indeterminates as \u27placeholders\u27 for components of the processed data. In this case we use an indeterminate to symbolically handle the irrational number [square root of 3] of the Daubechie mother wavelet for N = 4. Finally, a multi-level fault tolerant PRE architecture is developed by combining the classical redundant residue approach and the circuit parity check approach. The proposed architecture uses syndromes to correct faulty residue channels and an embedded parity check to correct faulty computational channels. The architecture offers superior fault detection and correction with online data interruption

    Techniques for the realization of ultra- reliable spaceborne computer Final report

    Get PDF
    Bibliography and new techniques for use of error correction and redundancy to improve reliability of spaceborne computer

    Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    Get PDF
    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable. contractible, and (4) The design is to appropriate to post 1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive

    CoFHEE: A Co-processor for Fully Homomorphic Encryption Execution

    Full text link
    The migration of computation to the cloud has raised privacy concerns as sensitive data becomes vulnerable to attacks since they need to be decrypted for processing. Fully Homomorphic Encryption (FHE) mitigates this issue as it enables meaningful computations to be performed directly on encrypted data. Nevertheless, FHE is orders of magnitude slower than unencrypted computation, which hinders its practicality and adoption. Therefore, improving FHE performance is essential for its real world deployment. In this paper, we present a year-long effort to design, implement, fabricate, and post-silicon validate a hardware accelerator for Fully Homomorphic Encryption dubbed CoFHEE. With a design area of 12mm212mm^2, CoFHEE aims to improve performance of ciphertext multiplications, the most demanding arithmetic FHE operation, by accelerating several primitive operations on polynomials, such as polynomial additions and subtractions, Hadamard product, and Number Theoretic Transform. CoFHEE supports polynomial degrees of up to n=214n = 2^{14} with a maximum coefficient sizes of 128 bits, while it is capable of performing ciphertext multiplications entirely on chip for n213n \leq 2^{13}. CoFHEE is fabricated in 55nm CMOS technology and achieves 250 MHz with our custom-built low-power digital PLL design. In addition, our chip includes two communication interfaces to the host machine: UART and SPI. This manuscript presents all steps and design techniques in the ASIC development process, ranging from RTL design to fabrication and validation. We evaluate our chip with performance and power experiments and compare it against state-of-the-art software implementations and other ASIC designs. Developed RTL files are available in an open-source repository

    Design of a reusable distributed arithmetic filter and its application to the affine projection algorithm

    Get PDF
    Digital signal processing (DSP) is widely used in many applications spanning the spectrum from audio processing to image and video processing to radar and sonar processing. At the core of digital signal processing applications is the digital filter which are implemented in two ways, using either finite impulse response (FIR) filters or infinite impulse response (IIR) filters. The primary difference between FIR and IIR is that for FIR filters, the output is dependent only on the inputs, while for IIR filters the output is dependent on the inputs and the previous outputs. FIR filters also do not sur from stability issues stemming from the feedback of the output to the input that aect IIR filters. In this thesis, an architecture for FIR filtering based on distributed arithmetic is presented. The proposed architecture has the ability to implement large FIR filters using minimal hardware and at the same time is able to complete the FIR filtering operation in minimal amount of time and delay when compared to typical FIR filter implementations. The proposed architecture is then used to implement the fast affine projection adaptive algorithm, an algorithm that is typically used with large filter sizes. The fast affine projection algorithm has a high computational burden that limits the throughput, which in turn restricts the number of applications. However, using the proposed FIR filtering architecture, the limitations on throughput are removed. The implementation of the fast affine projection adaptive algorithm using distributed arithmetic is unique to this thesis. The constructed adaptive filter shares all the benefits of the proposed FIR filter: low hardware requirements, high speed, and minimal delay.Ph.D.Committee Chair: Anderson, Dr. David V.; Committee Member: Hasler, Dr. Paul E.; Committee Member: Mooney, Dr. Vincent J.; Committee Member: Taylor, Dr. David G.; Committee Member: Vuduc, Dr. Richar

    A virtual pebble game to ensemble average graph rigidity

    Get PDF
    Previous works have demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the in- teger Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations, leading to fluctuations within the network. We have developed a mean field Virtual Pebble Game (VPG) that provides a probabilistic description of the interaction network, meaning that sampling is not required. We extensively test the VPG algorithm over a variety of body-bar networks created on disordered lattices, from these calculations we fully characterize the network conditions under which the performance of the VPG offers the best solution. The VPG provides a satisfactory description of the ensemble averaged PG properties, especially in regions removed from the rigidity transition where ensemble fluctuations are greatest. In further experiments, we characterized the VPG across a structurally nonredundant dataset of 272 proteins. Using quantitative and visual assessments of the rigidity characterizations, the VPG results are shown to accurately reflect the ensemble averaged PG properties. That is, the fluctuating interaction network is well represented by a single calculation that re- places density functions with average values, thus speeding up the desired calculation by several orders of magnitude. Finally, we propose a new algorithm that is based on the combination of PG and VPG to balance the amount of sampling and mean field treatment. While offering interesting results, this approach needs to be further optimized to fully lever- age its utility. All these results positions the VPG as an efficient alternative to understand the mechanical role that chemical interactions play in maintaining protein stability

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design
    corecore