    Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials

    We present a new type of bit-parallel non-recursive Karatsuba multiplier over GF(2^m) generated by an arbitrary irreducible trinomial. This design exploits the Mastrovito approach and the shifted polynomial basis (SPB) to reduce the time complexity, and the Karatsuba algorithm to reduce the space complexity. We show that this multiplier is only one T_X slower than the fastest bit-parallel multiplier for all trinomials, where T_X is the delay of one 2-input XOR gate. Meanwhile, its space complexity is roughly 3/4 of those multipliers. To the best of our knowledge, this is the first time such a time-delay bound has been reached. This result outperforms previously proposed non-recursive Karatsuba multipliers.
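
The two ingredients named in the abstract can be modelled in software: a Karatsuba split that replaces one of the four half-size sub-products with a few extra XORs, and reduction modulo an irreducible trinomial x^m + x^k + 1. The sketch below is a minimal model of that arithmetic only, assuming polynomials are stored as Python ints (bit i is the coefficient of x^i); it is not the bit-parallel Mastrovito/SPB circuit from the paper, and all function names are hypothetical.

```python
def clmul(a, b):
    """Schoolbook carry-less product in GF(2)[x]; bit i of an int is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # coefficient addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a, b, m):
    """One Karatsuba level for operands of degree < m: three half-size products instead of four."""
    h = m // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = clmul(a0, b0)
    hi = clmul(a1, b1)
    mid = clmul(a0 ^ a1, b0 ^ b1) ^ lo ^ hi   # equals a0*b1 + a1*b0 over GF(2)
    return lo ^ (mid << h) ^ (hi << (2 * h))


def reduce_trinomial(c, m, k):
    """Reduce c modulo x^m + x^k + 1 (0 < k < m), using x^m = x^k + 1."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            # x^i = x^(i-m) * x^m  ->  x^(i-m+k) + x^(i-m); both exponents are below i
            c ^= (1 << i) ^ (1 << (i - m + k)) ^ (1 << (i - m))
    return c


def gf_mul(a, b, m, k):
    """Multiply two elements of GF(2^m) defined by the trinomial x^m + x^k + 1."""
    return reduce_trinomial(karatsuba_gf2(a, b, m), m, k)


def gf_pow(a, e, m, k):
    """Square-and-multiply exponentiation in the same field."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a, m, k)
        a = gf_mul(a, a, m, k)
        e >>= 1
    return result


# In GF(2^3) with x^3 + x + 1:  x * x^2 = x^3 = x + 1
print(bin(gf_mul(0b010, 0b100, 3, 1)))   # 0b11
```

A hardware design would unroll these loops into fixed XOR/AND networks. The software version only shows that the Karatsuba recombination and trinomial reduction give the same field product.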

    High sample-rate Givens rotations for recursive least squares

    The design of an application-specific integrated circuit of a parallel array processor is considered for recursive least squares by QR decomposition using Givens rotations, applicable in adaptive filtering and beamforming applications. Emphasis is on high sample-rate operation, which, for this recursive algorithm, means that the time to perform arithmetic operations is critical. The algorithm, architecture and arithmetic are considered in a single integrated design procedure to achieve optimum results. A realisation approach using the standard arithmetic operators add, multiply and divide is adopted. The design of high-throughput operators with low delay is addressed for fixed- and floating-point number formats, and the application of redundant arithmetic is considered. New redundant multiplier architectures are presented that reduce area by up to 25% while maintaining low delay. A technique is presented that enables the use of a conventional tree multiplier in recursive applications, allowing savings in area and delay. Two new divider architectures are presented, showing benefits compared with the radix-2 modified SRT algorithm. Givens rotation algorithms are examined to determine their suitability for VLSI implementation. A novel algorithm, based on the Squared Givens Rotation (SGR) algorithm, is developed, enabling the sample-rate to be increased by a factor of approximately 6 and offering area reductions of up to a factor of 2 over previous approaches. An estimated sample-rate of 136 MHz could be achieved using a standard cell approach and 0.35 µm CMOS technology. The enhanced SGR algorithm has been compared with a CORDIC approach and shown to benefit by a factor of 3 in area and over 11 in sample-rate. When compared with a recent implementation on a parallel array of general purpose (GP) DSP chips, it is estimated that a single application-specific chip could offer up to 1,500 times the computation obtained from a single GP DSP chip.
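
To make the QR-RLS recursion concrete, here is a minimal sketch of the conventional Givens-rotation update. Each new data row is rotated into an upper-triangular R, and the weights come from back-substitution. It uses ordinary square-root rotations, not the SGR or enhanced SGR algorithm from the abstract. The class name, the regularising start value `delta`, and the test signal are illustrative assumptions.

```python
import math


def givens(a, b):
    """Rotation (c, s) with [c s; -s c] @ [a, b] = [hypot(a, b), 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


class QRRLS:
    """Conventional Givens-rotation QR-RLS (hypothetical sketch, not the paper's SGR variant).

    Keeps an upper-triangular R and a vector z such that R w = z gives the
    exponentially weighted least-squares weights."""

    def __init__(self, p, lam=1.0, delta=1e-3):
        self.p = p
        self.sqrt_lam = math.sqrt(lam)
        # small diagonal start so R is invertible before p samples have arrived
        self.R = [[delta if i == j else 0.0 for j in range(p)] for i in range(p)]
        self.z = [0.0] * p

    def update(self, u, d):
        """Fold one input row u (length p) and desired output d into R and z."""
        u = list(u)
        R, z = self.R, self.z
        for i in range(self.p):                  # apply the forgetting factor
            for j in range(i, self.p):
                R[i][j] *= self.sqrt_lam
            z[i] *= self.sqrt_lam
        for i in range(self.p):                  # one rotation per row zeroes u[i]
            c, s = givens(R[i][i], u[i])
            for j in range(i, self.p):
                R[i][j], u[j] = c * R[i][j] + s * u[j], -s * R[i][j] + c * u[j]
            z[i], d = c * z[i] + s * d, -s * z[i] + c * d

    def weights(self):
        """Back-substitution for R w = z."""
        w = [0.0] * self.p
        for i in reversed(range(self.p)):
            acc = self.z[i] - sum(self.R[i][j] * w[j] for j in range(i + 1, self.p))
            w[i] = acc / self.R[i][i]
        return w


w_true = [0.5, -1.25, 2.0]                       # hypothetical system to identify
rls = QRRLS(3)
for t in range(60):
    u = [math.sin(0.7 * (j + 1) * t + j) for j in range(3)]
    rls.update(u, sum(ui * wi for ui, wi in zip(u, w_true)))
print(rls.weights())                             # close to w_true for noise-free data
```

The square root inside `math.hypot` and the divisions in `givens` are the operations whose latency the abstract identifies as critical. Square-root-free variants such as SGR are designed to remove the square root from this loop.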

    Bit-level pipelined digit-serial array processors

    A new architecture for high-performance digit-serial vector inner product (VIP) which can be pipelined to the bit level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2^n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This gives designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that a sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area than an equivalent bit-parallel structure. A twin-pipe architecture that doubles the throughput rate of digit-serial multipliers, and consequently that of the digit-serial vector inner product, is also presented. The effects of the number of pipelining levels and of the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented.
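
The trade-off between digit size and word latency can be illustrated with a purely behavioural model. One operand is held in parallel while the other arrives one radix-2^n digit per clock cycle, least significant digit first. This is a serial-parallel sketch assuming unsigned operands, with hypothetical function names. It does not model the paper's bit-level pipelining, twin-pipe scheme or two's complement handling.

```python
def to_digits(y, n, width):
    """Split an unsigned width-bit word into radix-2^n digits, least significant first."""
    count = -(-width // n)                  # ceil(width / n) digits, i.e. clock cycles per word
    mask = (1 << n) - 1
    return [(y >> (n * k)) & mask for k in range(count)]


def digit_serial_multiply(x, y, n, width):
    """x is held in parallel; y is consumed one n-bit digit per cycle."""
    acc = 0
    cycles = 0
    for k, digit in enumerate(to_digits(y, n, width)):
        acc += (x * digit) << (n * k)       # one word-by-digit partial product per cycle
        cycles += 1
    return acc, cycles


def digit_serial_vip(xs, ys, n, width):
    """Vector inner product with one digit-serial multiplier per element.

    The multipliers are assumed to run side by side, so the latency is one word time."""
    total = 0
    cycles = 0
    for x, y in zip(xs, ys):
        product, c = digit_serial_multiply(x, y, n, width)
        total += product
        cycles = max(cycles, c)
    return total, cycles


for n in (1, 4, 16):                        # bit-serial, digit-serial, bit-parallel
    print(n, digit_serial_multiply(0xBEEF, 0x1234, n, 16))
```

Setting n = 1 gives a bit-serial multiplier and n = width gives a bit-parallel one. Intermediate digit sizes divide the cycle count by n at the cost of an n-bit-wide partial-product array, which is the hardware-cost versus throughput trade-off discussed in the abstract.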

    A Parallel Dual Fast Gradient Method for MPC Applications

    We propose a parallel adaptive constraint-tightening approach to solve a linear model predictive control problem for discrete-time systems, based on inexact numerical optimization algorithms and operator splitting methods. The underlying algorithm first splits the original problem into as many independent subproblems as the length of the prediction horizon. Our algorithm then solves these subproblems in parallel, exploiting auxiliary tightened subproblems to certify the control law in terms of suboptimality and recursive feasibility, along with closed-loop stability of the controlled system. Compared to prior approaches based on constraint tightening, our algorithm computes the tightening parameter for each subproblem to handle the propagation of errors introduced by the parallelization of the original problem. Our simulations show the computational benefits of the parallelization, with positive impacts on performance and numerical conditioning, when compared with a recent nonparallel adaptive tightening scheme. Comment: This technical report is an extended version of the paper "A Parallel Dual Fast Gradient Method for MPC Applications" by the same authors, submitted to the 54th IEEE Conference on Decision and Control.
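
The solver in the title, a dual fast gradient method, can be sketched on a single small quadratic program: minimise 0.5·Σ h_i x_i² + f·x subject to A x ≤ b. The method applies Nesterov-accelerated projected gradient ascent to the Lagrange multipliers. This is a minimal, centralised sketch. It assumes a diagonal Hessian so the inner minimisation is closed-form, and it uses a fixed scalar `tighten` offset in place of the paper's adaptively computed, per-subproblem tightening. There is no splitting over the prediction horizon, and the function name is hypothetical.

```python
import math


def dual_fast_gradient(h, f, A, b, iters=200, tighten=0.0):
    """Accelerated projected gradient on the dual of
        min 0.5*sum(h_j x_j^2) + f.x   s.t.  A x <= b - tighten.
    h is the (assumed diagonal, positive) Hessian."""
    m, n = len(A), len(h)
    bt = [bi - tighten for bi in b]             # tightened right-hand side
    # Lipschitz bound for the dual gradient: ||A H^-1 A^T|| <= sum a_ij^2 / h_j
    L = sum(A[i][j] ** 2 / h[j] for i in range(m) for j in range(n))

    def primal(lam):
        # closed-form minimiser of the Lagrangian for fixed multipliers
        return [-(f[j] + sum(A[i][j] * lam[i] for i in range(m))) / h[j] for j in range(n)]

    lam = [0.0] * m
    y = lam[:]
    t = 1.0
    for _ in range(iters):
        x = primal(y)
        grad = [sum(A[i][j] * x[j] for j in range(n)) - bt[i] for i in range(m)]
        lam_next = [max(0.0, y[i] + grad[i] / L) for i in range(m)]   # ascent step + projection
        t_next = (1 + math.sqrt(1 + 4 * t * t)) / 2
        y = [lam_next[i] + (t - 1) / t_next * (lam_next[i] - lam[i]) for i in range(m)]
        lam, t = lam_next, t_next
    return primal(lam), lam


# min 0.5*(x1^2 + x2^2) - x1 - x2  s.t.  x1 + x2 <= 1
print(dual_fast_gradient([1.0, 1.0], [-1.0, -1.0], [[1.0, 1.0]], [1.0]))
print(dual_fast_gradient([1.0, 1.0], [-1.0, -1.0], [[1.0, 1.0]], [1.0], tighten=0.1))
```

The usual motivation for tightening is that an early-terminated dual method may slightly violate the constraints. Solving against b minus a margin leaves slack for that error. The abstract's contribution lies in choosing this margin per subproblem so that errors introduced by the parallel split are also covered.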

    Modeling Algorithms in SystemC and ACL2

    We describe the formal language MASC, based on a subset of SystemC and intended for modeling algorithms to be implemented in hardware. By means of a special-purpose parser, an algorithm coded in SystemC is converted to a MASC model for the purpose of documentation, which in turn is translated to ACL2 for formal verification. The parser also generates a SystemC variant that is suitable as input to a high-level synthesis tool. As an illustration of this methodology, we describe a proof of correctness of a simple 32-bit radix-4 multiplier.Comment: In Proceedings ACL2 2014, arXiv:1406.123
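
The ACL2 proof itself cannot be reproduced here. As a much weaker, executable stand-in, the sketch below models one common radix-4 scheme and checks it against Python's built-in product on sample values. The abstract says only "radix-4", so modified Booth recoding is an assumption, and the function names are hypothetical. Spot-checking values is testing, not the formal verification the paper describes.

```python
def booth_radix4_digits(y, width=32):
    """Modified-Booth (radix-4) recoding of a signed width-bit integer.

    Returns width/2 digits in {-2, -1, 0, 1, 2}, least significant first, such that
    y == sum(d * 4**i for i, d in enumerate(digits)) for -2**(width-1) <= y < 2**(width-1)."""
    assert width % 2 == 0
    u = y & ((1 << width) - 1)          # two's-complement bit pattern of y
    digits = []
    prev = 0                            # bit b_{2i-1}, with b_{-1} = 0
    for i in range(0, width, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_radix4_multiply(x, y, width=32):
    """Sum of width/2 partial products, each x times a digit in {-2..2} shifted by 2i bits."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, width)))


samples = [0, 1, -1, 12345, -67890, 2**31 - 1, -2**31]
print(all(booth_radix4_multiply(x, y) == x * y for x in samples for y in samples))
```

Radix-4 recoding halves the number of partial products (16 instead of 32 for a 32-bit multiplier), and each partial product needs only a shift and an optional negation. A proof of correctness would establish the recoding identity in the docstring for all inputs, not just a sample.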

    Design of doubly-complementary IIR digital filters using a single complex allpass filter, with multirate applications

    It is shown that a large class of real-coefficient doubly-complementary IIR transfer function pairs can be implemented by means of a single complex allpass filter. For a real input sequence, the real part of the output sequence corresponds to the output of one of the transfer functions G(z) (for example, lowpass), whereas the imaginary part of the output sequence corresponds to its "complementary" filter H(z) (for example, highpass). The resulting implementation is structurally lossless, and hence the implementations of G(z) and H(z) have very low passband sensitivity. Numerical design examples are included, and a typical example shows that the new implementation with 4 bits per multiplier is considerably better than a direct-form implementation with 9 bits per multiplier. Multirate filter bank applications (quadrature mirror filtering) are outlined.
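
The central claim can be checked numerically. Feed a real impulse into a complex allpass A(z), take G from the real part of the output and H from the imaginary part, and confirm that |G|² + |H|² = 1 and |G + jH| = 1 at every frequency. This is a minimal sketch using a single first-order complex allpass section with an arbitrarily chosen pole. It does not implement the paper's design procedure, and the function names and pole value are assumptions.

```python
import cmath


def complex_allpass_impulse(p, length):
    """Impulse response of the first-order complex allpass
    A(z) = (-conj(p) + z^-1) / (1 - p z^-1), with |p| < 1."""
    c = -p.conjugate()
    y_prev, x_prev = 0j, 0.0
    out = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0          # real input sequence: unit impulse
        y = p * y_prev + c * x + x_prev     # y[n] = p y[n-1] - conj(p) x[n] + x[n-1]
        out.append(y)
        y_prev, x_prev = y, x
    return out


def dtft(seq, w):
    """Discrete-time Fourier transform of a finite sequence at w rad/sample."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(seq))


p = 0.6j                                    # hypothetical pole, not a designed filter
a = complex_allpass_impulse(p, 200)         # 0.6**200 is negligible, so truncation is harmless
g = [v.real for v in a]                     # real part of the output: G(z)
h = [v.imag for v in a]                     # imaginary part of the output: H(z)

for w in (0.0, 0.5, 1.5, 3.0):
    G, H = dtft(g, w), dtft(h, w)
    print(f"w={w:.1f}  |G|^2={abs(G)**2:.4f}  |H|^2={abs(H)**2:.4f}  "
          f"|G|^2+|H|^2={abs(G)**2 + abs(H)**2:.6f}")
```

The identity holds because G = (A + A*)/2 and H = (A − A*)/(2j), where A* is A with conjugated coefficients. Both have unit magnitude on the unit circle, so the cross terms cancel in |G|² + |H|². The sequences g and h are real, which matches the real-coefficient G(z) and H(z) in the abstract.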

    Architectures for block Toeplitz systems

    In this paper, efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar-only operations (multiplications/divisions and additions) and local communication, which enables the development of a wavefront array architecture. Both the mean squared error and the total squared error formulations are described, and a variety of implementations are given.
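
The Levinson recursion is a classic example of a Toeplitz solver built only from scalar multiplications, divisions and additions. It takes O(n²) operations instead of the O(n³) of general elimination. The sketch below handles the scalar symmetric case only; it is not the block algorithm or the wavefront array from the paper. It assumes every leading principal submatrix is nonsingular, and the function name is hypothetical.

```python
def levinson_solve(r, y):
    """Solve T x = y for a symmetric Toeplitz T with first column r.

    Uses only scalar multiplications, divisions and additions; assumes every
    leading principal submatrix of T is nonsingular."""
    n = len(r)
    f = [1.0 / r[0]]            # forward vector: T_k f = e_1
    x = [y[0] / r[0]]           # solution of the leading k-by-k system
    for k in range(1, n):
        # residuals in the new last row when f and x are padded with a zero
        ef = sum(r[k - i] * f[i] for i in range(k))
        ex = sum(r[k - i] * x[i] for i in range(k))
        back = f[::-1]          # symmetric Toeplitz: backward vector is f reversed
        denom = 1.0 - ef * ef
        f = [(fi - ef * bi) / denom for fi, bi in zip(f + [0.0], [0.0] + back)]
        back = f[::-1]          # new backward vector: T_{k+1} back = e_{k+1}
        x = [xi + (y[k] - ex) * bi for xi, bi in zip(x + [0.0], back)]
    return x


r = [4.0, 1.0, 0.5, 0.25]
y = [1.0, 2.0, 3.0, 4.0]
x = levinson_solve(r, y)
T = [[r[abs(i - j)] for j in range(len(r))] for i in range(len(r))]
print([sum(T[i][j] * x[j] for j in range(len(r))) for i in range(len(r))])   # residual check: should match y
```

Each step's inner products and vector updates are independent across indices. That regularity is what makes Toeplitz recursions attractive for locally connected array processors.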