
    Reliable Low-Latency and Low-Complexity Viterbi Architectures Benchmarked on ASIC and FPGA

    The Viterbi algorithm is commonly applied in a number of sensitive usage models, including decoding the convolutional codes used in communications such as satellite communication, cellular relay, and wireless local area networks. Moreover, the algorithm has been applied to automatic speech recognition and storage devices. In this thesis, efficient error detection schemes for architectures based on low-latency, low-complexity Viterbi decoders are presented. The merit of the proposed schemes is that reliability requirements, overhead tolerance, and performance degradation limits are embedded in the structures and can be adapted accordingly. We also present three variants of recomputing with encoded operands, and their modifications, to detect both transient and permanent faults, coupled with signature-based schemes. The instrumented decoder architecture has been subjected to extensive error detection assessments through simulations, and it has been benchmarked through application-specific integrated circuit (ASIC) [32nm library] and field-programmable gate array (FPGA) [Xilinx Virtex-6 family] implementations. The proposed fine-grained approaches can be selected according to the reliability objectives and the tolerable degradation in performance and implementation metrics.
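
    To make "recomputing with encoded operands" concrete, here is a minimal sketch of one classic form of it, recomputing with shifted operands (RESO), guarding the additions inside the add-compare-select (ACS) step of a small hard-decision Viterbi decoder. Everything specific is an assumption for illustration: the rate-1/2, K = 3 code with generators 7 and 5 (octal), the shift of one bit, and the hypothetical `fault` / `fault_at` injection hooks. The thesis's three variants and its signature-based schemes are not reproduced here.

```python
# Hard-decision Viterbi decoder whose add-compare-select (ACS) additions are
# checked by recomputing with shifted operands (RESO). The code, the shift
# amount and the fault-injection hook are illustrative assumptions.

G = (0b111, 0b101)      # generator polynomials 7 and 5 (octal), assumed example
K = 3                   # constraint length
N_STATES = 1 << (K - 1)


class FaultDetected(Exception):
    """Raised when an addition and its shifted recomputation disagree."""


def reso_add(a: int, b: int, shift: int = 1, fault: int = 0) -> int:
    """Return a + b after checking it against (a << shift) + (b << shift).

    `fault` is XORed into the first result to mimic a transient fault; it is
    a hypothetical injection hook, not part of any real decoder.
    """
    first = (a + b) ^ fault
    second = (a << shift) + (b << shift)
    if second != first << shift:
        raise FaultDetected(f"{a} + {b}: got {first}, recomputed {second >> shift}")
    return first


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def encode(bits):
    """Rate-1/2 convolutional encoding, flushed with K - 1 zero tail bits."""
    state, out = 0, []
    for u in list(bits) + [0] * (K - 1):
        reg = (u << (K - 1)) | state    # newest input bit in the MSB
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def viterbi_decode(received, n_bits, fault_at=None):
    """Return the n_bits most likely inputs; fault_at=(step, state) injects a fault."""
    metrics = [0] + [None] * (N_STATES - 1)    # start in the all-zero state
    history = []                               # survivor (predecessor, bit) per step
    for t in range(len(received) // len(G)):
        r = received[len(G) * t: len(G) * (t + 1)]
        new_metrics = [None] * N_STATES
        preds = [None] * N_STATES
        for s, m in enumerate(metrics):
            if m is None:                      # state not reachable yet
                continue
            for u in (0, 1):
                reg = (u << (K - 1)) | s
                nxt = reg >> 1
                branch = sum(parity(reg & g) != x for g, x in zip(G, r))
                fault = 1 if fault_at == (t, s) else 0
                cand = reso_add(m, branch, fault=fault)                 # add (checked)
                if new_metrics[nxt] is None or cand < new_metrics[nxt]:  # compare
                    new_metrics[nxt], preds[nxt] = cand, (s, u)          # select
        metrics = new_metrics
        history.append(preds)
    state, decoded = 0, []                     # trace back from the flushed state 0
    for preds in reversed(history):
        state, u = preds[state]
        decoded.append(u)
    return decoded[::-1][:n_bits]


msg = [1, 0, 1, 1, 0, 0, 1]
rx = encode(msg)
rx[3] ^= 1                                     # flip one channel bit
print(viterbi_decode(rx, len(msg)) == msg)     # True: the (7, 5) code corrects it
```

    In software the shifted recomputation always agrees unless a fault is injected; in hardware its value comes from the operands passing through different bit slices of the adder, so a stuck-at or transient fault in one slice tends to corrupt the two results differently.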

    Arithmetic on a Distributed-Memory Quantum Multicomputer

    We evaluate the performance of quantum arithmetic algorithms run on a distributed quantum computer (a quantum multicomputer). We vary the node capacity, the I/O capabilities, and the network topology. We examine the tradeoff between executing gates remotely, through "teleported gates" on entangled pairs of qubits (telegate), and exchanging the relevant qubits via quantum teleportation, then executing the algorithm using local gates (teledata). We show that the teledata approach performs better, and that carry-ripple adders perform well when the teleportation block is decomposed so that the key quantum operations can be parallelized. A node size of only a few logical qubits performs adequately, provided that the nodes have two transceiver qubits. A linear network topology performs acceptably for a broad range of system sizes and performance parameters. We therefore recommend pursuing small nodes with high I/O bandwidth and a simple network. Such a machine will run Shor's algorithm for factoring large numbers efficiently.
    Comment: 24 pages, 10 figures, ACM transactions format. Extended version of Int. Symp. on Comp. Architecture (ISCA) paper; v2, correct one circuit error, numerous small changes for clarity, add reference
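
    The carry-ripple structure discussed above can be illustrated with a basis-state simulation of one well-known reversible carry-ripple adder, the MAJ/UMA design of Cuccaro et al. This is a stand-in chosen for illustration, not necessarily the adder circuit evaluated in the paper, and the wire layout is an assumption. On computational-basis states CNOT reduces to XOR and Toffoli to AND-then-XOR, so plain bits suffice; the point is the strictly sequential carry dependency from one MAJ block to the next, which a distributed machine must route between nodes.

```python
# Basis-state simulation of a reversible carry-ripple adder built from
# MAJ/UMA blocks in the style of Cuccaro et al. On computational-basis
# states, CNOT is XOR and Toffoli is AND-then-XOR, so plain bits suffice.


def cnot(q, control, target):
    q[target] ^= q[control]


def toffoli(q, c1, c2, target):
    q[target] ^= q[c1] & q[c2]


def maj(q, c, b, a):
    """Majority block: afterwards wire a holds the carry out of this position."""
    cnot(q, a, b)
    cnot(q, a, c)
    toffoli(q, c, b, a)


def uma(q, c, b, a):
    """UnMajority-and-Add: restores a and c, and leaves the sum bit on b."""
    toffoli(q, c, b, a)
    cnot(q, a, c)
    cnot(q, c, b)


def ripple_add(a_val, b_val, n):
    """Run the circuit on n-bit inputs; return (a, (a + b) mod 2**n, carry out)."""
    # Wire layout (an assumption of this sketch): 0 = carry-in ancilla,
    # 1..n = a, n+1..2n = b, 2n+1 = carry-out.
    a_w = list(range(1, n + 1))
    b_w = list(range(n + 1, 2 * n + 1))
    z = 2 * n + 1
    q = [0] * (2 * n + 2)
    for i in range(n):
        q[a_w[i]] = (a_val >> i) & 1
        q[b_w[i]] = (b_val >> i) & 1
    maj(q, 0, b_w[0], a_w[0])                  # forward sweep: carries ripple up
    for i in range(1, n):
        maj(q, a_w[i - 1], b_w[i], a_w[i])
    cnot(q, a_w[n - 1], z)                     # copy out the final carry
    for i in range(n - 1, 0, -1):              # backward sweep: uncompute, write sums
        uma(q, a_w[i - 1], b_w[i], a_w[i])
    uma(q, 0, b_w[0], a_w[0])
    a_out = sum(q[a_w[i]] << i for i in range(n))
    s_out = sum(q[b_w[i]] << i for i in range(n))
    return a_out, s_out, q[z]


print(ripple_add(11, 7, 4))   # (11, 2, 1): 11 + 7 = 18 = 16 + 2
```

    Because the sum overwrites b while a and the ancilla are restored, the circuit is reversible; in a telegate or teledata setting, each MAJ or UMA whose wires sit on different nodes would require inter-node communication.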

    Design of New High-Speed Multi-Output Carry Look-Ahead Adders

    Carry look-ahead adders have so far been designed using the standard 4-bit Manchester carry chain (MCC). Because of this limited carry chain length, the carries of the adders are computed using 4-bit carry chains, which slows down the operation. In this thesis, a high-speed 8-bit MCC adder in multi-output domino CMOS logic is designed. Because of the limited carry chain length, this high-speed MCC uses two separate 4-bit MCCs, an odd carry chain and an even carry chain, which are computed in parallel to increase the speed of the operation. This technique has been applied to the design of 8-bit adders in multi-output domino logic, and the simulation results are verified. The results show that the 8-bit MCC has less delay than the conventional 4-bit design, and this reduced delay gives better speed than the conventional designs. The present design and the previous designs were designed and simulated using Mentor Graphics, and their delays are compared for 8-bit inputs with a 50 nm technology file. Implementation results show that the high-speed comparator has 37.47% less delay than the conventional designs used for comparison when operated at 50 MHz.
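
    The idea of computing odd and even carries in two parallel chains can be sketched functionally. In this sketch, which assumes one plausible grouping rather than the thesis's domino circuit, bits are paired into 2-bit generate/propagate groups: the even chain combines pairs (0,1), (2,3), (4,5), (6,7) to produce c2, c4, c6, c8, and the odd chain starts from c1 and combines pairs (1,2), (3,4), (5,6) to produce c3, c5, c7. Each chain has four stages, and neither depends on the other.

```python
# Functional sketch of an 8-bit adder whose carries come from two independent
# 4-stage chains (even and odd). The 2-bit grouping is an assumption made for
# illustration; no circuit-level (domino, Manchester) behaviour is modelled.


def gen_prop(a, b, n):
    """Per-bit generate g_i = a_i AND b_i and propagate p_i = a_i XOR b_i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def group(g, p, lo):
    """Combine bits lo and lo + 1 into a 2-bit group (G, P)."""
    hi = lo + 1
    return g[hi] | (p[hi] & g[lo]), p[hi] & p[lo]


def add8_split_chains(a, b, cin=0):
    n = 8
    g, p = gen_prop(a, b, n)
    c = [0] * (n + 1)               # c[i] = carry into bit i; c[8] = carry out
    c[0] = cin
    # Even chain: groups (0,1), (2,3), (4,5), (6,7) give c2, c4, c6, c8.
    carry = cin
    for lo in range(0, n, 2):
        G, P = group(g, p, lo)
        carry = G | (P & carry)
        c[lo + 2] = carry
    # Odd chain: c1 from bit 0, then groups (1,2), (3,4), (5,6) give c3, c5, c7.
    carry = g[0] | (p[0] & cin)
    c[1] = carry
    for lo in range(1, n - 1, 2):
        G, P = group(g, p, lo)
        carry = G | (P & carry)
        c[lo + 2] = carry
    s = sum((p[i] ^ c[i]) << i for i in range(n))
    return s, c[n]


print(add8_split_chains(200, 100))   # (44, 1): 300 = 256 + 44
```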

    FPGA adders: performance evaluation and optimal design

    Delay models and cost analyses developed for ASIC technology are not useful in designing and implementing FPGA devices. The authors discuss costs and operational delays of fixed-point adders on Xilinx 4000 series devices and propose timing models and optimization schemes for carry-skip and carry-select adders.
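
    For readers unfamiliar with the two adder families, a minimal functional sketch follows. The block size `k` is left as a free parameter, whereas the paper's timing models exist precisely to choose such parameters for the FPGA fabric; no delay or cost model is reproduced here, and the function names are hypothetical.

```python
# Functional sketches of carry-select and carry-skip adders on Python ints.
# The block size k is a free parameter; no timing model is included.


def ripple_block(a, b, cin, k):
    """Ripple-carry add two k-bit values; return (sum, carry_out, all_propagate)."""
    s, c, all_p = 0, cin, 1
    for i in range(k):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi
        s |= (p ^ c) << i
        c = (ai & bi) | (p & c)
        all_p &= p
    return s, c, all_p


def carry_select_add(a, b, n, k, cin=0):
    """Each block precomputes both candidate results; the incoming carry selects one."""
    total, carry = 0, cin
    for lo in range(0, n, k):
        w = min(k, n - lo)
        m = (1 << w) - 1
        ab, bb = (a >> lo) & m, (b >> lo) & m
        s0, c0, _ = ripple_block(ab, bb, 0, w)      # evaluated in parallel in hardware
        s1, c1, _ = ripple_block(ab, bb, 1, w)
        s, carry = (s1, c1) if carry else (s0, c0)  # block multiplexer
        total |= s << lo
    return total, carry


def carry_skip_add(a, b, n, k, cin=0):
    """Ripple inside each block; if all bits propagate, the carry-in bypasses the block."""
    total, carry = 0, cin
    for lo in range(0, n, k):
        w = min(k, n - lo)
        m = (1 << w) - 1
        s, c_ripple, all_p = ripple_block((a >> lo) & m, (b >> lo) & m, carry, w)
        # Functionally c_ripple == carry whenever all_p is 1; the skip path only
        # shortens the worst-case delay in hardware.
        carry = carry if all_p else c_ripple
        total |= s << lo
    return total, carry


print(carry_select_add(200, 100, 8, 3), carry_skip_add(200, 100, 8, 3))   # (44, 1) (44, 1)
```

    Both functions return the same result as ordinary addition; they differ only in which signals sit on the critical path, which is what a technology-specific timing model has to capture.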

    Models of computation: A numeric analysis and performance evaluation

    This research seeks to better understand what drives performance in computation. To develop this understanding, the researcher surveys the literature on computational performance within the classical and quantum paradigms, for both binary and multi-valued logic. Based on these findings, the researcher uses an addition experiment to evaluate what drives performance and how it can be improved. The research adopts a realist paradigm and employs two research methods. The first is an automaton model of computation, used to model each of the computing paradigms and computational logics. The second is computational complexity theory, used to measure the performance of addition. Through this evaluation, the researcher seeks to better understand what drives computational performance and how addition can be performed more efficiently. The results lead the researcher to conclude that the modernisation of machinery gave birth to automated computing and to the binary number system in computers. The research also indicates that increasing the radix can improve the performance of addition. Based on reported findings in quantum mechanics research, it would be possible to implement a model of computation with an increased radix. By embracing state discrimination (distinguishability), this research calls for a review of the current quantum computing paradigm, which is based on state duality.
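
    One simple way to see why a larger radix can help addition: a digit-serial ripple-carry adder needs one step per digit position, and a value N has about log_r(N) digits in radix r. The sketch below counts those steps. It is an illustrative model assumed here, not the study's automaton model, and it deliberately ignores that each higher-radix digit operation is itself more complex to implement.

```python
# Digit-serial ripple-carry addition in an arbitrary radix, counting the
# number of digit positions the carry must ripple through.


def to_digits(x, radix):
    """Little-endian digits of a non-negative integer x in the given radix."""
    digits = []
    while True:
        x, d = divmod(x, radix)
        digits.append(d)
        if x == 0:
            return digits


def ripple_add_radix(x, y, radix):
    """Return (x + y, number of digit steps) using radix-`radix` ripple carry."""
    a, b = to_digits(x, radix), to_digits(y, radix)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    carry, out = 0, []
    for i in range(n):
        carry, d = divmod(a[i] + b[i] + carry, radix)
        out.append(d)
    out.append(carry)
    total = sum(d * radix ** i for i, d in enumerate(out))
    return total, n


for r in (2, 10, 16):
    print(r, ripple_add_radix(10**6, 10**6, r))   # 20, 7 and 5 digit steps
```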

    High sample-rate Givens rotations for recursive least squares

    This work considers the design of an application-specific integrated circuit (ASIC) implementing a parallel array processor for recursive least squares (RLS) by QR decomposition using Givens rotations, applicable to adaptive filtering and beamforming. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using the standard arithmetic operators add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic is considered. New redundant multiplier architectures are presented, enabling reductions in area of up to 25% whilst maintaining low delay. A technique is presented that enables the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented, showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm based on the Squared Givens Rotation (SGR) algorithm is developed, enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions of up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and by a factor of over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application-specific chip could offer up to 1,500 times the computation obtained from a single GP DSP chip.
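
    The recursive core of QR-based RLS can be sketched in a few lines: each new data row is rotated into an upper-triangular matrix R and a vector z with one Givens rotation per column, after which the weights follow from back-substitution on R w = z. This sketch uses conventional square-root Givens rotations in floating point; the thesis's Squared Givens Rotation variant, its redundant arithmetic and its array mapping are not reproduced, and the forgetting factor `lam` and the function names are assumptions for illustration.

```python
# QR-RLS sketch: fold each new row [x | y] into upper-triangular R and vector
# z using standard Givens rotations, then solve R w = z by back-substitution.
import math


def qr_rls_update(R, z, x, y, lam=1.0):
    """Rotate one new observation into (R, z) in place; lam is the forgetting factor."""
    n = len(z)
    s = math.sqrt(lam)
    for i in range(n):                     # exponentially weight down old data
        z[i] *= s
        for j in range(i, n):
            R[i][j] *= s
    x = list(x)
    for i in range(n):
        a, b = R[i][i], x[i]
        r = math.hypot(a, b)
        if r == 0.0:
            continue                       # nothing to annihilate in this column
        c, sn = a / r, b / r               # rotation chosen so that x[i] becomes 0
        for j in range(i, n):
            R[i][j], x[j] = c * R[i][j] + sn * x[j], -sn * R[i][j] + c * x[j]
        z[i], y = c * z[i] + sn * y, -sn * z[i] + c * y
    return R, z


def solve_upper(R, z):
    """Back-substitution for the upper-triangular system R w = z."""
    n = len(z)
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (z[i] - sum(R[i][j] * w[j] for j in range(i + 1, n))) / R[i][i]
    return w


n = 3
R = [[0.0] * n for _ in range(n)]
z = [0.0] * n
true_w = [2.0, -3.0, 0.5]
for k in range(10):                        # noise-free synthetic data
    x = [1.0, float(k), float(k * k % 7)]
    qr_rls_update(R, z, x, sum(wi * xi for wi, xi in zip(true_w, x)))
print(solve_upper(R, z))                   # approximately [2.0, -3.0, 0.5]
```

    The square roots in `math.hypot` and the divisions that form `c` and `sn` are the operations that dominate the recursion's critical path, which is why the choice of rotation algorithm and divider architecture matters so much for sample-rate.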

    High Performance and Optimal Configuration of Accurate Heterogeneous Block-Based Approximate Adder

    Approximate computing is an emerging paradigm for improving power and performance efficiency in error-resilient applications. Recent approximate adders have significantly extended the design space of accuracy-power configurable approximate adders, and optimal designs are found by exploring this design space. In this paper, a new energy-efficient heterogeneous block-based approximate adder (HBBA) is proposed, which is a generic, configurable model that can be transformed into a particular adder by defining some configurations. An HBBA, in general, is composed of heterogeneous sub-adders, where each sub-adder can have a different configuration. The set of configurations of all the sub-adders in an HBBA defines its configuration. The block-based adders are approximated through inexact logic configurations and truncated carry chains. HBBA increases the design space, providing additional design points that fall on the Pareto front and offer a better power-accuracy trade-off than other configurations. Furthermore, to avoid Monte Carlo simulations, we propose an analytical modelling technique to evaluate the probability of error and the Probability Mass Function (PMF) of the error value. Moreover, the estimation method estimates the delay, area and power of heterogeneous block-based approximate adders. Thus, based on the analytical model and estimation method, the optimal configuration under a given error constraint can be selected from the whole design space of the proposed adder model by exhaustive search. The simulation results show that our HBBA provides improved accuracy in terms of error metrics compared to some state-of-the-art approximate adders. A 32-bit HBBA achieves about a 15% reduction in area and up to a 17% reduction in energy compared to state-of-the-art approximate adders.
    Comment: Submitted to the IEEE-TCAD journal, 16 pages, 16 figures
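
    To show what "error probability" and "PMF of error value" mean for a block-based approximate adder with truncated carry chains, here is a small sketch that computes both by exhaustive enumeration over all 8-bit inputs. The adder is a hypothetical heterogeneous configuration: each sub-adder is exact internally, but its carry-in is speculated from only the top `window` bits of the preceding block, assuming a zero carry into that window. The inexact-logic sub-adder options of HBBA and the paper's analytical model are not reproduced; exhaustive enumeration is only feasible here because the width is tiny.

```python
# Hypothetical block-based approximate adder with truncated carry chains, plus
# exhaustive computation of its error probability and error PMF.
from collections import Counter
from itertools import product


def approx_add(a, b, config):
    """config: list of (width, window) per sub-adder, least significant block first.

    The carry into each block is computed from only the `window` most
    significant bits of the previous block, assuming zero carry into them.
    """
    result, lo, prev_width = 0, 0, 0
    for idx, (width, window) in enumerate(config):
        m = (1 << width) - 1
        w = min(window, prev_width)            # carry-speculation window
        wm, start = (1 << w) - 1, lo - w
        cin = (((a >> start) & wm) + ((b >> start) & wm)) >> w
        s = ((a >> lo) & m) + ((b >> lo) & m) + cin
        if idx < len(config) - 1:
            s &= m                             # inner block carry-outs are dropped
        result |= s << lo
        lo += width
        prev_width = width
    return result


def error_stats(config):
    """Return (probability of error, PMF of exact - approximate) over all inputs."""
    n = sum(width for width, _ in config)
    pmf = Counter()
    for a, b in product(range(1 << n), repeat=2):
        pmf[a + b - approx_add(a, b, config)] += 1
    total = 1 << (2 * n)
    return 1 - pmf[0] / total, {e: cnt / total for e, cnt in sorted(pmf.items())}


print(error_stats([(4, 0), (4, 4)]))   # full window: exact, error probability 0
print(error_stats([(4, 0), (4, 2)]))   # 2-bit window: errors of 16 with probability 3/32
```

    In the second configuration an error occurs only when the carry out of bits 0 and 1 is 1 and bits 2 and 3 both propagate, which happens with probability 3/8 × 1/4 = 3/32; closed-form reasoning of this kind is what makes an analytical model cheaper than simulation.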