Reliable Low-Latency and Low-Complexity Viterbi Architectures Benchmarked on ASIC and FPGA
The Viterbi algorithm is commonly applied in a number of sensitive usage models including decoding convolutional codes used in communications such as satellite communication, cellular relay, and wireless local area networks. Moreover, the algorithm has been applied to automatic speech recognition and storage devices. In this thesis, efficient error detection schemes for architectures based on low-latency, low-complexity Viterbi decoders are presented. The merit of the proposed schemes is that reliability requirements, overhead tolerance, and performance degradation limits are embedded in the structures and can be adapted accordingly. We also present three variants of recomputing with encoded operands and its modifications to detect both transient and permanent faults, coupled with signature-based schemes. The instrumented decoder architecture has been subjected to extensive error detection assessments through simulations, and through application-specific integrated circuit (ASIC) [32nm library] and field-programmable gate array (FPGA) [Xilinx Virtex-6 family] implementations for benchmarking. The proposed fine-grained approaches can be utilized based on reliability objectives and on the tolerated degradation in performance and implementation metrics.
Arithmetic on a Distributed-Memory Quantum Multicomputer
We evaluate the performance of quantum arithmetic algorithms run on a
distributed quantum computer (a quantum multicomputer). We vary the node
capacity and I/O capabilities, and the network topology. The tradeoff of
choosing between gates executed remotely, through ``teleported gates'' on
entangled pairs of qubits (telegate), versus exchanging the relevant qubits via
quantum teleportation, then executing the algorithm using local gates
(teledata), is examined. We show that the teledata approach performs better,
and that carry-ripple adders perform well when the teleportation block is
decomposed so that the key quantum operations can be parallelized. A node size
of only a few logical qubits performs adequately provided that the nodes have
two transceiver qubits. A linear network topology performs acceptably for a
broad range of system sizes and performance parameters. We therefore recommend
pursuing small, high-I/O bandwidth nodes and a simple network. Such a machine
will run Shor's algorithm for factoring large numbers efficiently.
Comment: 24 pages, 10 figures, ACM transactions format. Extended version of
Int. Symp. on Comp. Architecture (ISCA) paper; v2, correct one circuit error,
numerous small changes for clarity, add reference
DESIGN OF NEW HIGH SPEED MULTI OUTPUT CARRY LOOK-AHEAD ADDERS
Carry look-ahead adders have so far been designed using a standard 4-bit Manchester carry chain (MCC). Because the length of the carry chain is limited, the carries of the adders are computed using 4-bit carry chains, which slows down the operation. In this thesis, a high-speed 8-bit MCC adder is designed in multi-output domino CMOS logic. Because of the limited carry chain length, the high-speed MCC uses two separate 4-bit MCCs, an odd carry chain and an even carry chain, which are computed in parallel to increase the speed of the operation. The technique has been applied to the design of 8-bit adders in multi-output domino logic, and the simulation results have been verified. The results show that the 8-bit MCC has a lower delay, and therefore a higher speed, than the conventional 4-bit MCC designs. The existing design and the previous designs are designed and simulated using Mentor Graphics. The delays of these designs are compared for 8-bit inputs using a 50 nm technology file. Implementation results reveal that the high speed comparator has a delay 37.47% lower than that of the conventional designs used for comparison when operated at 50 MHz.
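As an illustration of the odd/even split described in this abstract, the following Python sketch computes the carries of an 8-bit adder with two independent recurrences, one for the even-indexed carries and one for the odd-indexed carries. It is a behavioural model under stated assumptions: the function name `split_chain_add` and the two-bit grouping are hypothetical choices made for this example, and nothing here models the domino CMOS circuit or its timing.

```python
def split_chain_add(a: int, b: int, cin: int = 0, width: int = 8):
    """Behavioural model of an adder whose carries come from two chains.

    Hypothetical helper for illustration only; it models the logic,
    not the multi-output domino CMOS implementation.
    """
    # Bitwise generate (g) and propagate (p) signals.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # carry[i] is the carry INTO bit i; carry[width] is the carry out.
    carry = [0] * (width + 1)
    carry[0] = cin
    # The odd chain starts from c1, which needs one single-bit step.
    carry[1] = g[0] | (p[0] & cin)

    def two_bit_step(i: int, c: int) -> int:
        """Carry into bit i+2, given carry c into bit i."""
        group_g = g[i + 1] | (p[i + 1] & g[i])
        group_p = p[i + 1] & p[i]
        return group_g | (group_p & c)

    # Even chain: c0 -> c2 -> c4 -> ...
    for i in range(0, width - 1, 2):
        carry[i + 2] = two_bit_step(i, carry[i])
    # Odd chain: c1 -> c3 -> c5 -> ... (never reads an even carry).
    for i in range(1, width - 1, 2):
        carry[i + 2] = two_bit_step(i, carry[i])

    # Sum bit i is propagate XOR incoming carry.
    total = sum((p[i] ^ carry[i]) << i for i in range(width))
    return total, carry[width]
```

Because neither loop reads a value written by the other, the two chains could be evaluated at the same time in hardware; in this software model they simply run one after the other.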
FPGA adders: performance evaluation and optimal design
Delay models and cost analyses developed for ASIC technology are not useful in designing and implementing FPGA devices. The authors discuss costs and operational delays of fixed-point adders on Xilinx 4000 series devices and propose timing models and optimization schemes for carry-skip and carry-select adders.
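For readers unfamiliar with the carry-select scheme named in this abstract, a minimal behavioural sketch in Python follows. The block size of 4, the default width of 16, and the function names are assumptions made for illustration; the sketch does not reproduce the authors' timing models or any Xilinx-specific mapping.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry add of two equal-length bit lists (LSB first)."""
    out, c = [], cin
    for x, y in zip(a_bits, b_bits):
        out.append(x ^ y ^ c)
        c = (x & y) | (c & (x ^ y))
    return out, c


def carry_select_add(a: int, b: int, width: int = 16, block: int = 4):
    """Carry-select addition; returns (sum mod 2**width, carry out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = [], 0
    for lo in range(0, width, block):
        xa, xb = a_bits[lo:lo + block], b_bits[lo:lo + block]
        # Both candidates are computed without waiting for the real carry.
        sum0, c0 = ripple_block(xa, xb, 0)
        sum1, c1 = ripple_block(xa, xb, 1)
        # The incoming carry only drives a 2:1 selection.
        s, carry = (sum1, c1) if carry else (sum0, c0)
        result.extend(s)
    value = sum(bit << i for i, bit in enumerate(result))
    return value, carry
```

The design trade is visible in the loop: each block does twice the addition work so that the critical path between blocks shrinks to a selection step.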
Models of computation: A numeric analysis and performance evaluation
This research seeks to better understand what drives performance in computation. To develop this understanding the researcher investigates the literature on computational performance within the classical and quantum paradigm for both binary and multi-value logic. Based on the findings of the literature the researcher evaluates through an experiment of addition what drives performance and how performance can be improved.
For the evaluation of this research, a realist research paradigm employs two research methods. The first is an automaton model of computation to model each of the computing paradigms and computational logic. The second is computational complexity theory for measuring the performance of addition. Through this evaluation the researcher seeks to gain a better understanding of what drives computational performance and how addition can be performed more efficiently.
The results of the research lead the researcher to conclude that the modernisation of machinery gave rise to automated computing and to the use of the binary number system in computers. The research indicates that increasing the radix can improve the performance of addition. Based on reported findings in quantum mechanics research, it would be possible to implement a model of computation with increased radix. By embracing state discrimination/distinguishability, this research calls for a review of the current quantum computing paradigm based on state duality.
High sample-rate Givens rotations for recursive least squares
The design of an application-specific integrated circuit of a parallel array processor is considered
for recursive least squares by QR decomposition using Givens rotations, applicable
in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation,
which, for this recursive algorithm, means that the time to perform arithmetic operations
is critical. The algorithm, architecture and arithmetic are considered in a single
integrated design procedure to achieve optimum results.
A realisation approach using standard arithmetic operators, add, multiply and divide is
adopted. The design of high-throughput operators with low delay is addressed for fixed- and
floating-point number formats, and the application of redundant arithmetic considered. New
redundant multiplier architectures are presented enabling reductions in area of up to 25%,
whilst maintaining low delay. A technique is presented enabling the use of a conventional
tree multiplier in recursive applications, allowing savings in area and delay. Two new divider
architectures are presented showing benefits compared with the radix-2 modified SRT algorithm.
Givens rotation algorithms are examined to determine their suitability for VLSI implementation.
A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed
enabling the sample-rate to be increased by a factor of approximately 6 and offering
area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of
136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology.
The enhanced SGR algorithm has been compared with a CORDIC approach and shown to
benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation
on a parallel array of general purpose (GP) DSP chips, it is estimated that a single
application specific chip could offer up to 1,500 times the computation obtained from a
single GP DSP chip.
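To make the QR-update step behind this abstract concrete, the sketch below rotates one new data row into an upper-triangular factor using conventional Givens rotations in plain Python. It is not the Squared Givens Rotation variant developed in the thesis, and it omits the forgetting factor and right-hand-side update that a full recursive least squares filter would need; the function name `givens_update` is an assumption made for this example.

```python
import math


def givens_update(R, x):
    """Return a copy of upper-triangular R with data row x rotated in.

    Conventional Givens rotations (hypothetical helper, illustration only).
    After the update, R^T R equals the old R^T R plus the outer product x x^T.
    """
    n = len(x)
    R = [row[:] for row in R]
    x = list(x)
    for i in range(n):
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue  # nothing to annihilate in this column
        c, s = R[i][i] / r, x[i] / r
        for j in range(i, n):
            rij, xj = R[i][j], x[j]
            R[i][j] = c * rij + s * xj   # rotated row of R
            x[j] = -s * rij + c * xj     # x[i] becomes zero
    return R
```

Each column requires a square root and two divisions before the multiply-add updates can start, which hints at why the arithmetic delay dominates the achievable sample rate in a recursive design of this kind.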
High Performance and Optimal Configuration of Accurate Heterogeneous Block-Based Approximate Adder
Approximate computing is an emerging paradigm to improve power and
performance efficiency for error-resilient applications. Recent approximate
adders have significantly extended the design space of accuracy-power
configurable approximate adders, and find optimal designs by exploring the
design space. In this paper, a new energy-efficient heterogeneous block-based
approximate adder (HBBA) is proposed, which is a generic/configurable model
that can be transformed to a particular adder by defining some configurations.
An HBBA, in general, is composed of heterogeneous sub-adders, where each
sub-adder can have a different configuration. A set of configurations of all
the sub-adders in an HBBA defines its configuration. The block-based adders are
approximated through inexact logic configuration and truncated carry chains.
HBBA increases design space providing additional design points that fall on the
Pareto-front and offer better power-accuracy trade-off compared to other
configurations. Furthermore, to avoid Monte Carlo simulations, we propose an
analytical modelling technique to evaluate the probability of error and
Probability Mass Function (PMF) of error value. Moreover, the estimation method
estimates delay, area and power of heterogeneous block-based approximate
adders. Thus, based on the analytical model and estimation method, the optimal
configuration under a given error constraint can be selected from the whole
design space of the proposed adder model by exhaustive search. The simulation
results show that our HBBA provides improved accuracy in terms of error metrics
compared to some state-of-the-art approximate adders. A 32-bit HBBA
achieves about a 15% reduction in area and up to a 17% reduction in energy compared
to state-of-the-art approximate adders.
Comment: Submitted to the IEEE-TCAD journal, 16 pages, 16 figures
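The idea of approximating a block-based adder by truncating its carry chains can be sketched in a few lines of Python. The model below is a simplified, homogeneous stand-in and not the HBBA itself: every block has the same size, and each block predicts its carry-in from the previous block alone, assuming no carry entered that block. The names `approx_block_add` and `error_rate` are hypothetical, and the exhaustive enumeration is only a small-width substitute for the analytical error model the paper proposes.

```python
def approx_block_add(a: int, b: int, width: int = 16, block: int = 4) -> int:
    """Approximate addition with a carry chain truncated to one block.

    Simplified homogeneous model (illustration only). Returns a
    width-bit result; the carry out of the top block is dropped.
    """
    mask = (1 << block) - 1
    result = 0
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        if lo == 0:
            cin = 0
        else:
            # Look only at the previous block, assuming no carry entered it.
            pa = (a >> (lo - block)) & mask
            pb = (b >> (lo - block)) & mask
            cin = (pa + pb) >> block
        result |= ((xa + xb + cin) & mask) << lo
    return result


def error_rate(width: int, block: int) -> float:
    """Fraction of input pairs whose approximate sum is wrong (exhaustive)."""
    n = 1 << width
    wrong = sum(
        approx_block_add(a, b, width, block) != ((a + b) & (n - 1))
        for a in range(n)
        for b in range(n)
    )
    return wrong / (n * n)
```

An error occurs only when a carry has to travel through an entire block, so larger blocks trade longer carry chains for fewer wrong results. Exhaustive enumeration is feasible only for small widths, which is the practical motivation for an analytical error model at 32 bits.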