Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials
We present a new type of bit-parallel non-recursive Karatsuba multiplier over GF(2^m) generated by an arbitrary irreducible trinomial. This design exploits the Mastrovito approach and a shifted polynomial basis (SPB) to reduce the time complexity, and the Karatsuba algorithm to reduce the space complexity.
We show that this type of multiplier is only one T_X slower than the fastest bit-parallel multiplier for all trinomials, where T_X is the delay of one 2-input XOR gate. Meanwhile, its space complexity is roughly 3/4 of that of those multipliers.
To the best of our knowledge, this is the first time a scheme of this type has reached such a time delay bound. This result outperforms previously proposed non-recursive Karatsuba multipliers.
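As a software illustration of the arithmetic only (not the Mastrovito/SPB hardware design of the abstract), the following minimal sketch applies a single, non-recursive Karatsuba split to two polynomials over GF(2) and then reduces the product modulo a trinomial x^m + x^k + 1. Polynomials are stored as Python integers used as bit vectors; the function names and the example trinomial x^7 + x + 1 are assumptions chosen for the sketch.

```python
def clmul(a: int, b: int) -> int:
    """Schoolbook carry-less product of two GF(2)[x] polynomials (bit vectors)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_once(a: int, b: int, m: int) -> int:
    """One Karatsuba split (no recursion) of two polynomials of degree < m."""
    h = (m + 1) // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = clmul(a0, b0)
    hi = clmul(a1, b1)
    # Over GF(2) addition and subtraction are both XOR.
    mid = clmul(a0 ^ a1, b0 ^ b1) ^ lo ^ hi
    return lo ^ (mid << h) ^ (hi << (2 * h))


def reduce_trinomial(c: int, m: int, k: int) -> int:
    """Reduce c modulo x^m + x^k + 1, with 0 < k < m."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            # x^i = x^(i-m+k) + x^(i-m) modulo the trinomial
            c ^= (1 << i) | (1 << (i - m + k)) | (1 << (i - m))
    return c


def gf2m_mul(a: int, b: int, m: int, k: int) -> int:
    """Field product in GF(2^m) defined by x^m + x^k + 1."""
    return reduce_trinomial(karatsuba_once(a, b, m), m, k)


# x * x^6 = x^7 = x + 1 in GF(2^7) with x^7 + x + 1
print(gf2m_mul(0b10, 0b1000000, 7, 1))
```

The split replaces four half-size products with three, which is the source of the roughly 3/4 space figure quoted above; the gate-level delay analysis is outside the scope of this sketch.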
High sample-rate Givens rotations for recursive least squares
The design of an application-specific integrated circuit of a parallel array processor is considered
for recursive least squares by QR decomposition using Givens rotations, applicable
in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation,
which, for this recursive algorithm, means that the time to perform arithmetic operations
is critical. The algorithm, architecture and arithmetic are considered in a single
integrated design procedure to achieve optimum results.
A realisation approach using standard arithmetic operators, add, multiply and divide is
adopted. The design of high-throughput operators with low delay is addressed for fixed- and
floating-point number formats, and the application of redundant arithmetic considered. New
redundant multiplier architectures are presented enabling reductions in area of up to 25%,
whilst maintaining low delay. A technique is presented enabling the use of a conventional
tree multiplier in recursive applications, allowing savings in area and delay. Two new divider
architectures are presented showing benefits compared with the radix-2 modified SRT algorithm.
Givens rotation algorithms are examined to determine their suitability for VLSI implementation.
A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed
enabling the sample-rate to be increased by a factor of approximately 6 and offering
area reductions up to a factor of 2 over previous approaches. An estimated sample-rate of
136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology.
The enhanced SGR algorithm has been compared with a CORDIC approach and shown to
benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation
on a parallel array of general purpose (GP) DSP chips, it is estimated that a single
application specific chip could offer up to 1,500 times the computation obtained from a
single GP DSP chip.
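For orientation, the recursive update at the heart of QR-based recursive least squares can be sketched in a few lines. This is the conventional Givens rotation (with a square root), not the enhanced SGR algorithm of the thesis, and it omits the forgetting factor; the function name and the list-of-lists matrix layout are assumptions made for the sketch.

```python
import math


def givens_update(R, x):
    """Rotate a new data row x into the upper-triangular matrix R, in place."""
    n = len(R)
    x = list(x)
    for i in range(n):
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue  # nothing to annihilate in this column
        c, s = R[i][i] / r, x[i] / r
        for j in range(i, n):
            rij, xj = R[i][j], x[j]
            R[i][j] = c * rij + s * xj   # updated row of R
            x[j] = -s * rij + c * xj     # x[i] becomes zero
    return R


R = [[0.0, 0.0], [0.0, 0.0]]
for row in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    givens_update(R, row)
print(R)
```

Each new sample depends on the R produced by the previous one, which is why the latency of the add, multiply and divide operators bounds the achievable sample rate.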
Bit-level pipelined digit-serial array processors
A new architecture for a high-performance digit-serial vector inner product (VIP) which can be pipelined to the bit level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2^n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This will give designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that a sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area consumption than an equivalent bit-parallel structure. A twin-pipe architecture to double the throughput rate of digit-serial multipliers, and consequently that of the digit-serial vector inner product, is also presented. The effects of the number of pipelining levels and of the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented.
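The digit-serial idea can be illustrated behaviourally: one operand of each product is consumed n bits (one radix-2^n digit) per step, least significant digit first, and the partial sums are accumulated with the matching weight. The sketch below assumes unsigned operands and models only the arithmetic, not the pipelining or the twin-pipe structure; the function name and the default widths are hypothetical.

```python
def digit_serial_inner_product(xs, ys, n=4, width=16):
    """Compute sum(x * y), feeding each y one radix-2^n digit per step.

    Assumes unsigned operands with every y < 2**width.
    """
    mask = (1 << n) - 1
    acc = 0
    for step in range(0, width, n):
        partial = 0
        for x, y in zip(xs, ys):
            digit = (y >> step) & mask   # current radix-2^n digit of y
            partial += x * digit         # digit-by-word product
        acc += partial << step           # weight the digit by 2^step
    return acc


xs = [3, 1000, 65535]
ys = [7, 12345, 65535]
print(digit_serial_inner_product(xs, ys) == sum(x * y for x, y in zip(xs, ys)))
```

Choosing n trades hardware per step against the number of steps: n = 1 corresponds to bit-serial operation and n = width to a bit-parallel one.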
A Parallel Dual Fast Gradient Method for MPC Applications
We propose a parallel adaptive constraint-tightening approach to solve a
linear model predictive control problem for discrete-time systems, based on
inexact numerical optimization algorithms and operator splitting methods. The
underlying algorithm first splits the original problem in as many independent
subproblems as the length of the prediction horizon. Then, our algorithm
computes a solution for these subproblems in parallel by exploiting auxiliary
tightened subproblems in order to certify the control law in terms of
suboptimality and recursive feasibility, along with closed-loop stability of
the controlled system. Compared to prior approaches based on constraint
tightening, our algorithm computes the tightening parameter for each subproblem
to handle the propagation of errors introduced by the parallelization of the
original problem. Our simulations show the computational benefits of the
parallelization with positive impacts on performance and numerical conditioning
when compared with a recent nonparallel adaptive tightening scheme.
Comment: This technical report is an extended version of the paper "A Parallel
Dual Fast Gradient Method for MPC Applications" by the same authors submitted
to the 54th IEEE Conference on Decision and Control.
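The title refers to a dual fast gradient method. As a rough sketch of that building block alone, the code below runs Nesterov's fast gradient iteration on the dual of a toy problem: minimise the sum of 0.5*h[i]*z[i]^2 + f[i]*z[i] subject to z[i] <= u[i]. The diagonal cost, the simple upper bounds and all names are assumptions made to keep the example short; the horizon splitting and the adaptive constraint tightening described in the abstract are not modelled.

```python
def dual_fast_gradient(h, f, u, iters=50):
    """Dual fast gradient for: min sum(0.5*h*z^2 + f*z) subject to z <= u."""
    n = len(h)
    L = max(1.0 / hi for hi in h)   # Lipschitz constant of the dual gradient
    lam = [0.0] * n                 # dual iterate (multipliers, kept >= 0)
    w = [0.0] * n                   # extrapolated point
    t = 1.0
    for _ in range(iters):
        # Inner minimiser of the Lagrangian at the extrapolated multipliers
        z = [-(f[i] + w[i]) / h[i] for i in range(n)]
        # Projected gradient ascent step on the dual (gradient is z - u)
        lam_next = [max(0.0, w[i] + (z[i] - u[i]) / L) for i in range(n)]
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        beta = (t - 1.0) / t_next
        w = [lam_next[i] + beta * (lam_next[i] - lam[i]) for i in range(n)]
        lam, t = lam_next, t_next
    z = [-(f[i] + lam[i]) / h[i] for i in range(n)]
    return z, lam


z, lam = dual_fast_gradient(h=[1.0, 2.0], f=[-2.0, -2.0], u=[1.0, 5.0])
print(z, lam)
```

In this toy case the first constraint is active and the second is not, so only the first multiplier ends up nonzero.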
Modeling Algorithms in SystemC and ACL2
We describe the formal language MASC, based on a subset of SystemC and
intended for modeling algorithms to be implemented in hardware. By means of a
special-purpose parser, an algorithm coded in SystemC is converted to a MASC
model for the purpose of documentation, which in turn is translated to ACL2 for
formal verification. The parser also generates a SystemC variant that is
suitable as input to a high-level synthesis tool. As an illustration of this
methodology, we describe a proof of correctness of a simple 32-bit radix-4
multiplier.
Comment: In Proceedings ACL2 2014, arXiv:1406.123
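To make the kind of algorithm being verified concrete, here is a small executable model of a 32-bit radix-4 multiplier. It assumes Booth recoding of the multiplier into digits from {-2, ..., 2}, which the abstract does not state; the model is written in Python rather than SystemC or MASC, and the function name is hypothetical.

```python
def booth_radix4_mul(x: int, y: int, bits: int = 32) -> int:
    """Unsigned x * y using radix-4 Booth recoding of y."""
    assert 0 <= x < (1 << bits) and 0 <= y < (1 << bits)
    acc = 0
    prev = 0  # implicit bit below the least significant bit of y
    # One extra digit covers an unsigned y whose top bit is set.
    for i in range(bits // 2 + 1):
        b0 = (y >> (2 * i)) & 1
        b1 = (y >> (2 * i + 1)) & 1
        digit = b0 + prev - 2 * b1        # recoded digit in {-2, ..., 2}
        acc += (digit * x) << (2 * i)     # partial product weighted by 4^i
        prev = b1
    return acc


print(booth_radix4_mul(7, 6))
```

A correctness statement of the form "the result equals x * y for all 32-bit inputs" is the sort of property a formal proof would establish for every input, whereas a test can only sample it.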
Design of doubly-complementary IIR digital filters using a single complex allpass filter, with multirate applications
It is shown that a large class of real-coefficient doubly-complementary IIR transfer function pairs can be implemented by means of a single complex allpass filter. For a real input sequence, the real part of the output sequence corresponds to the output of one of the transfer functions, G(z) (for example, lowpass), whereas the imaginary part of the output sequence corresponds to its "complementary" filter H(z) (for example, highpass). The resulting implementation is structurally lossless, and hence the implementations of G(z) and H(z) have very low passband sensitivity. Numerical design examples are included, and a typical numerical example shows that the new implementation with 4 bits per multiplier is considerably better than a direct form implementation with 9 bits per multiplier. Multirate filter bank applications (quadrature mirror filtering) are outlined.
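The complementary property can be checked numerically. The sketch below assumes a first-order complex allpass A(z) = (z^-1 - conj(a)) / (1 - a z^-1) with an arbitrary coefficient a (not a designed filter from the paper) and forms G and H as the responses of the real and imaginary parts of its impulse response, so that G + jH = A and |G|^2 + |H|^2 = 1 at every frequency.

```python
import cmath


def allpass(a: complex, w: float) -> complex:
    """Frequency response of A(z) = (z^-1 - conj(a)) / (1 - a z^-1) at z = e^{jw}."""
    zinv = cmath.exp(-1j * w)
    return (zinv - a.conjugate()) / (1 - a * zinv)


def complementary_pair(a: complex, w: float):
    """Responses of the real-part filter G and the imaginary-part filter H."""
    A = allpass(a, w)
    A_conj = allpass(a, -w).conjugate()  # filter with conjugated coefficients
    G = (A + A_conj) / 2
    H = (A - A_conj) / 2j
    return G, H


a = 0.3 + 0.5j  # hypothetical coefficient with |a| < 1
for w in (0.0, 0.7, 1.9, 3.0):
    G, H = complementary_pair(a, w)
    print(round(abs(G) ** 2 + abs(H) ** 2, 12))
```

Because the sum of squared magnitudes is fixed by the allpass structure rather than by exact coefficient values, coefficient quantisation cannot push either passband above unity, which is consistent with the low passband sensitivity the abstract reports.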
Architectures for block Toeplitz systems
In this paper, efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar-only operations (multiplications/divisions and additions) and local communication, which enables the development of a wavefront array architecture. Both the mean squared error and the total squared error formulations are described, and a variety of implementations are given.
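For readers unfamiliar with Toeplitz solvers, the scalar (non-block) Levinson-Durbin recursion gives a feel for the structure such architectures exploit: it solves the symmetric Toeplitz normal equations using only multiplications, divisions and additions. This is a textbook sketch under the assumption of a positive-definite autocorrelation sequence, not the block algorithm or the array mapping of the paper; the names are hypothetical.

```python
def levinson_durbin(r):
    """Solve sum_k a[k] * r[|i-k|] = r[i], i = 1..p, for a symmetric Toeplitz system.

    r holds autocorrelation values r[0..p]; returns (a, err) where a[k-1]
    is the k-th predictor coefficient and err is the final prediction error.
    """
    p = len(r) - 1
    a = []
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                                   # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err


a, err = levinson_durbin([1.0, 0.5, 0.25])
print(a, err)
```

The recursion needs O(p^2) operations instead of the O(p^3) of general elimination, and each order update only touches neighbouring coefficients.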