15 research outputs found

    Gate and Throughput Optimizations for NULL Convention Self-timed Digital Circuits

    Get PDF
    NULL Convention Logic (NCL) provides an asynchronous design methodology employing dual-rail signals, quad-rail signals, or other Mutually Exclusive Assertion Groups (MEAGs) to incorporate data and control information into one mixed path. In NCL, the control is inherently present with each datum, so there is no need for worse-case delay analysis and control path delay matching. This dissertation focuses on optimization methods for NCL circuits, specifically addressing three related architectural areas of NCL design. First, a design method for optimizing NCL circuits is developed. The method utilizes conventional Boolean minimization followed by table-driven gate substitutions. It is applied to design time and space optimal fundamental logic functions, a time and space optimal full adder, and time, transistor count, and power optimal up-counter circuits. The method is applicable when composing logic functions where each gate is a state-holding element; and can produce delay-insensitive circuits requiring less area and fewer gate delays than alternative gate-level approaches requiring full minterm generation. Second, a pipelining method for producing throughput optimal NCL systems is developed. A relationship between the number of gate delays per stage and the worse-case throughput for a pipeline as a whole is derived. The method then uses this relationship to minimize a pipeline\u27s worse-case throughput by partitioning the NCL combinational circuitry through the addition of asynchronous registers. The method is applied to design a maximum throughput unsigned multiplier, which yields a speedup of 2.25 over the non-pipelined version, while maintaining delay-insensitivity. Third, a technique to mitigate the impact of the NULL cycle is developed. The technique further increases the maximum attainable throughput of a NCL system by reducing inherent overheads associated with an integrated data and control path. This technique is applied to a non-pipelined 4-bit by 4-bit unsigned multiplier to yield a speedup of 1.61 over the standalone version. Finally, these techniques are applied to design a 72+32x32 multiply and accumulate (MAC) unit, which outperforms other delay-insensitive/self-timed MACs in the literature. It also performs conditional rounding, scaling, and saturation of the output, whereas the others do not; thus further distinguishing it from the previous work. The methods developed facilitate speed, transistor count, and power tradeoffs using approaches that are readily automatable

    Gallium arsenide bit-serial integrated circuits

    Get PDF

    Design and implementation of high-radix arithmetic systems based on the SDNR/RNS data representation

    Get PDF
    This project involved the design and implementation of high-radix arithmetic systems based on the hybrid SDNRIRNS data representation. Some real-time applications require a real-time arithmetic system. An SDNR/RNS arithmetic system provides parallel, real-time processing. The advantages and disadvantages of high-radix SDNR/RNS arithmetic, and the feasibility of implementing SDNR/RNS arithmetic systems in CMOS VLSI technology, were investigated in this project. A common methodological model, which included the stages of analysis, design, implementation, testing, and simulation, was followed. The combination of the SDNR and RNS transforms potential complex logic networks into simpler logic blocks. It was found that when constructing a SDNRIRNS adder, factors such as the radix, digit set, and moduli must be taken into account. There are many avenues still to explore. For example, implementing other arithmetic systems in the same CMOS VLSI technology used in this project and comparing them to equivalent SDNR/RNS systems would provide a set of benchmarks. These benchmarks would be useful in addressing issues relating to relative performance

    Asynchronous techniques for new generation variation-tolerant FPGA

    Get PDF
    PhD ThesisThis thesis presents a practical scenario for asynchronous logic implementation that would benefit the modern Field-Programmable Gate Arrays (FPGAs) technology in improving reliability. A method based on Asynchronously-Assisted Logic (AAL) blocks is proposed here in order to provide the right degree of variation tolerance, preserve as much of the traditional FPGAs structure as possible, and make use of asynchrony only when necessary or beneficial for functionality. The newly proposed AAL introduces extra underlying hard-blocks that support asynchronous interaction only when needed and at minimum overhead. This has the potential to avoid the obstacles to the progress of asynchronous designs, particularly in terms of area and power overheads. The proposed approach provides a solution that is complementary to existing variation tolerance techniques such as the late-binding technique, but improves the reliability of the system as well as reducing the design’s margin headroom when implemented on programmable logic devices (PLDs) or FPGAs. The proposed method suggests the deployment of configurable AAL blocks to reinforce only the variation-critical paths (VCPs) with the help of variation maps, rather than re-mapping and re-routing. The layout level results for this method's worst case increase in the CLB’s overall size only of 6.3%. The proposed strategy retains the structure of the global interconnect resources that occupy the lion’s share of the modern FPGA’s soft fabric, and yet permits the dual-rail iv completion-detection (DR-CD) protocol without the need to globally double the interconnect resources. Simulation results of global and interconnect voltage variations demonstrate the robustness of the method

    Approximation and Optimization of an Auditory Model for Realization in VLSI Hardware

    Get PDF
    The Auditory Image Model (AIM) is a software tool set developed to functionally model the role of the ear in the human hearing process. AIM includes detailed filter equations for the major functional portions of the ear. Currently, AIM is run on a workstation and requires 10 to 100 times real-time to process audio information and produce an auditory image. An all-digital approximation of the AIM which is suitable for implementation in very large scale integrated circuits is presented. This document details the mathematical models of AIM and the approximations and optimizations used to simplify the filtering and signal processing accomplished by AIM. Included are the details of an efficient multi-rate architecture designed for sub-micron VLSI technology to carry out the approximated equations. Finally, simulation results which indicate that the architecture, when implemented in 0.8µm CMOS VLSI, will sustain real- time operation on a 32 channel system are included. The same tests also indicate that the chip will be approximately 3.3 mm2, and consume approximately 18 mW. The details of a new and efficient method for computing an approximate logarithm (base two) on binary integers is also presented. The approximate logarithm algorithm is used to convert sound energy into millibels quickly and with low power. Additionally, the algorithm, is easily extended to compute an approximate logarithm in base ten which broadens the class of problems to which it may be applied

    High sample-rate Givens rotations for recursive least squares

    Get PDF
    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using standard arithmetic operators, add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic considered. New redundant multiplier architectures are presented enabling reductions in area of up to 25%, whilst maintaining low delay. A technique is presented enabling the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and O.35pm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application specific chip could offer up to 1,500 times the computation obtained from a single OP DSP chip

    Public key cryptosystems : theory, application and implementation

    Get PDF
    The determination of an individual's right to privacy is mainly a nontechnical matter, but the pragmatics of providing it is the central concern of the cryptographer. This thesis has sought answers to some of the outstanding issues in cryptography. In particular, some of the theoretical, application and implementation problems associated with a Public Key Cryptosystem (PKC).The Trapdoor Knapsack (TK) PKC is capable of fast throughput, but suffers from serious disadvantages. In chapter two a more general approach to the TK-PKC is described, showing how the public key size can be significantly reduced. To overcome the security limitations a new trapdoor was described in chapter three. It is based on transformations between the radix and residue number systems.Chapter four considers how cryptography can best be applied to multi-addressed packets of information. We show how security or communication network structure can be used to advantage, then proposing a new broadcast cryptosystem, which is more generally applicable.Copyright is traditionally used to protect the publisher from the pirate. Chapter five shows how to protect information when in easily copyable digital format.Chapter six describes the potential and pitfalls of VLSI, followed in chapter seven by a model for comparing the cost and performance of VLSI architectures. Chapter eight deals with novel architectures for all the basic arithmetic operations. These architectures provide a basic vocabulary of low complexity VLSI arithmetic structures for a wide range of applications.The design of a VLSI device, the Advanced Cipher Processor (ACP), to implement the RSA algorithm is described in chapter nine. It's heart is the modular exponential unit, which is a synthesis of the architectures in chapter eight. The ACP is capable of a throughput of 50 000 bits per second

    Serial-data computation in VLSI

    Get PDF
    corecore