
11 research outputs found

Training a Linear Neural Network with a Stable LSP Solution for Jamming Cancellation

Author: Rachkovskij Dmitri
Revunova Elena
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2005

Two jamming cancellation algorithms are developed based on a stable solution of the least squares problem (LSP) obtained by regularization. They are based on the filtered singular value decomposition (SVD) and on modifications of the Greville formula. Both algorithms allow an efficient hardware implementation. Test results on artificial data modeling difficult real-world situations are also provided.

Bulgarian Digital Mathematics Library at IMI-BAS
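
The filtered-SVD idea mentioned in the abstract above can be sketched roughly as follows. This is only an illustration: the Tikhonov filter, the function name `filtered_svd_solve`, the parameter `lam`, and the small test problem are assumptions made here, not the authors' algorithm.

```python
import numpy as np

def filtered_svd_solve(A, b, lam):
    """Regularized least squares solution of min ||Ax - b|| via a filtered SVD.

    Assumption: Tikhonov filter factors s^2 / (s^2 + lam^2) are used.
    Requires lam > 0 or A of full column rank.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # filter / s simplifies to s / (s^2 + lam^2); small singular values are damped
    coeffs = s / (s**2 + lam**2) * (U.T @ b)
    return Vt.T @ coeffs

# Hypothetical ill-conditioned problem: the two columns are nearly parallel.
A = np.array([[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]])
b = A @ np.array([1.0, 1.0])

x_plain = filtered_svd_solve(A, b, 0.0)   # ordinary least squares
x_reg = filtered_svd_solve(A, b, 1e-2)    # regularized solution
```

With `lam = 0` the function reduces to the ordinary SVD-based least squares solution; a positive `lam` trades a slightly larger residual for a solution that is less sensitive to noise in `b`.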

A weakly stable algorithm for general Toeplitz systems

Author: A. Griewank
A.N. Kolmogorov
A.W. Bojanczyk
Adam W. Bojanczyk
B.R. Musicus
C.-T. Pan
C.C. Paige
C.H. Bischof
D.J. Higham
D.R. Sweet
E.H. Bareiss
F.R. Hoog de
F.T. Luk
Frank R. de Hoog
G. Cybenko
G. Heinig
G. Szegö
G.A. Watson
G.H. Golub
G.S. Ammar
G.W. Stewart
H. Sexton
I. Schur
J. Chun
J. Dongarra
J. Jankowski
J. Rissanen
J.G. Nagy
J.H. Wilkinson
J.M. Varah
J.R. Bunch
M. Gentleman
M.A. Saunders
M.H. Gutknecht
N. Gould
N. Levinson
N. Wiener
P.C. Hansen
P.E. Gill
R. Fletcher
R.D. Skeel
R.P. Brent
R.R. Bitmead
R.W. Freund
Richard P. Brent
S. Qiao
S. Zohar
T. Kailath
T.F. Chan
W. Miller
W.F. Trench
Å. Björck
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1995

We show that a fast algorithm for the QR factorization of a Toeplitz or Hankel matrix A is weakly stable in the sense that R^T.R is close to A^T.A. Thus, when the algorithm is used to solve the semi-normal equations R^T.Rx = A^Tb, we obtain a weakly stable method for the solution of a nonsingular Toeplitz or Hankel linear system Ax = b. The algorithm also applies to the solution of the full-rank Toeplitz or Hankel least squares problem. Comment: 17 pages. An old Technical Report with postscript added. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub143.htm

arXiv.org e-Print Archive

CiteSeerX

Crossref
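
The semi-normal equations R^T.Rx = A^Tb from the Toeplitz abstract above can be illustrated with a short sketch. Note the assumptions: `np.linalg.qr` stands in for the paper's fast Toeplitz QR algorithm (which is not reproduced here), and the function name `solve_seminormal` and the 4 x 4 test matrix are hypothetical.

```python
import numpy as np

def solve_seminormal(A, b):
    """Solve Ax = b through the semi-normal equations R^T R x = A^T b."""
    # Only the triangular factor R is computed; Q is never formed or used.
    # Assumption: a dense QR replaces the fast Toeplitz QR of the paper.
    R = np.linalg.qr(A, mode="r")
    # Two solves: R^T y = A^T b, then R x = y.
    # np.linalg.solve is used for brevity; a triangular solver would be cheaper.
    y = np.linalg.solve(R.T, A.T @ b)
    return np.linalg.solve(R, y)

# Hypothetical nonsingular Toeplitz matrix built from its first column and row.
col = [4.0, 1.0, 0.5, 0.25]
row = [4.0, -1.0, 0.5, -0.25]
n = len(col)
A = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
b = np.array([1.0, 2.0, 3.0, 4.0])

x = solve_seminormal(A, b)
```

The point of the formulation is that the orthogonal factor Q is not needed, which is what makes a fast algorithm that only delivers an accurate R^T.R usable for solving the system.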

On the Factor Refinement Principle and its Implementation on Multicore Architectures

Author: Ali Md. Mohsin
Publication venue: Scholarship@Western
Publication date: 01/01/2011

The factor refinement principle turns a partial factorization of integers (or polynomials) into a more complete factorization represented by basis elements and exponents, with basis elements that are pairwise coprime. This refinement technique has many applications, such as simplifying systems of polynomial inequations and, more generally, speeding up certain algebraic algorithms by eliminating redundant expressions that may occur during intermediate computations. Successive GCD computations and divisions are used to accomplish this task until all the basis elements are pairwise coprime. Moreover, square-free factorization (which is the first step of many factorization algorithms) is used to remove the repeated patterns from each input element. Differentiation, division and GCD calculation operations are required to complete this pre-processing step. Both factor refinement and square-free factorization often rely on plain (quadratic) algorithms for multiplication but can be substantially improved with asymptotically fast multiplication on sufficiently large input. In this work, we review the working principles and complexity estimates of factor refinement, for plain arithmetic as well as asymptotically fast arithmetic. Following this review, we design, analyze and implement parallel adaptations of these factor refinement algorithms. We consider several algorithm optimization techniques, such as data locality analysis and balancing subproblems, to fully exploit modern multicore architectures. The Cilk++ implementation of our parallel algorithm based on the augment refinement principle of Bach, Driscoll and Shallit achieves linear speedup for input data of sufficiently large size.

Scholarship@Western
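
The refinement step described in the abstract above (successive GCD computations and divisions until the basis is pairwise coprime) can be sketched for integers as follows. This is a plain serial, quadratic illustration, not the thesis's parallel Cilk++ algorithm; the function name `factor_refine` is hypothetical.

```python
from math import gcd

def factor_refine(values):
    """Return sorted (base, exponent) pairs with pairwise-coprime bases > 1
    whose product equals the product of the input integers."""
    work = [(v, 1) for v in values if v > 1]
    changed = True
    while changed:
        changed = False
        for i in range(len(work)):
            for j in range(i + 1, len(work)):
                (a, e), (b, f) = work[i], work[j]
                g = gcd(a, b)
                if g > 1:
                    # a^e * b^f = (a/g)^e * g^(e+f) * (b/g)^f
                    rest = [p for k, p in enumerate(work) if k not in (i, j)]
                    split = [(a // g, e), (g, e + f), (b // g, f)]
                    work = rest + [(m, x) for m, x in split if m > 1]
                    changed = True
                    break
            if changed:
                break
    return sorted(work)

print(factor_refine([12, 18]))  # [(2, 3), (3, 3)], since 12 * 18 = 2^3 * 3^3
```

Each split replaces the bases a and b by a/g, g and b/g, so the product of the bases shrinks by a factor g > 1 and the loop terminates.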

The instruction of systolic array (ISA) and simulation of parallel algorithms

Author: Ossama K. Muslih (7169165)
Publication date: 01/01/1989

Systolic arrays have proved to be well suited for Very Large Scale Integration (VLSI) technology since they consist of a regular network of simple processing cells, use only local communication between the processing cells, and exploit a maximal degree of parallelism. However, systolic arrays have one main disadvantage compared with other parallel computer architectures: they are special-purpose architectures capable of executing only one algorithm; e.g., a systolic array designed for sorting cannot be used to perform matrix multiplication. Several approaches have been made to make systolic arrays more flexible, so that different problems can be handled on a single systolic array. In this thesis an alternative concept to a VLSI architecture, the Soft-Systolic Simulation System (SSSS), is introduced and developed as a working model of a virtual machine with the power to simulate hard systolic arrays and more general forms of concurrency such as the SIMD and MIMD models of computation. The virtual machine includes a processing element consisting of a soft-systolic processor implemented in the virtual machine language. The processing element considered here is a very general element that allows the choice of a wide range of arithmetic and logical operators and the simulation of a wide class of algorithms; in principle, extra processing cells can be added to form a library, and this library can be tailored to individual needs. The virtual machine chosen for this implementation is the Instruction Systolic Array (ISA). The ISA has a number of interesting features: first, it has been used to simulate all SIMD algorithms and many MIMD algorithms by a simple program transformation technique; further, the ISA can also simulate the so-called wavefront processor algorithms, as well as many hard systolic algorithms.
The ISA removes the need for the broadcasting of data, which is a feature of SIMD algorithms (limiting the size of the machine and its cycle time), and also presents a fairly simple communication structure for MIMD algorithms. The model of systolic computation developed from the VLSI approach to systolic arrays is such that the processing surface is fixed, as are the processing elements or cells by virtue of their being embedded in the processing surface. The VLSI approach therefore freezes instructions and hardware relative to the movement of data, with the virtual machine and soft-systolic programming retaining the constructions of VLSI for array design features such as regularity, simplicity and local communication, while allowing the movement of instructions with respect to data. Data can be frozen into the structure with instructions moving systolically. Alternatively, both the data and instructions can move systolically around the virtual processors (which are deemed fixed relative to the underlying architecture). The ISA is implemented in OCCAM programs whose execution and output implicitly confirm the correctness of the design. The soft-systolic preparation comprises the usual operating system facilities for the creation and modification of files during the development of new programs and ISA processor elements. We allow any concurrent high-level language to be used to model the soft-systolic program. Consequently, the Replicating Instruction Systolic Array Language (RISAL) was devised to provide a very primitive program environment for the ISA, but one adequate for testing. RISAL accepts instructions in an assembler-like form, but is fairly permissive about the format of statements, subject of course to syntax. The RISAL compiler is adopted to transform the soft-systolic program description (RISAL) into a form suitable for the virtual machine (simulating the algorithm) to run.
Finally, we conclude that the principles mentioned here can form the basis for a soft-systolic simulator using an orthogonally connected mesh of processors. The wide range of algorithms that the ISA can simulate makes it suitable for a virtual simulating grid.

Loughborough University Institutional Repository

Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2^m)

Author: Daneshbeh Amir
Publication venue: University of Waterloo
Publication date: 01/01/2005

Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on the EEA (Extended Euclidean Algorithm) is particularly suitable for systolization. Implementations based on the EEA present a high degree of parallelism and pipelinability at bit level, which can be easily optimized to achieve local data flow and to eliminate the global interconnects that represent the most important bottleneck in today's sub-micron design process. The net result is a high clock rate and performance based on efficient systolic architectures. This thesis examines high-performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2^m) in the specific case of cryptographic applications where the field dimension m may be very large (greater than 400) and either m or the defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representations are studied and, most importantly, variants of the EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants is discussed. As a result, a generalized and optimized variant of the EEA is proposed which can compute division, and multiplicative inversion as its subset, with the divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion are provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of the EEA is presented. Its complexity measures are defined and compared against the best known architectures.
It is shown that, assuming the requirements specified above, the proposed architecture may achieve a higher clock rate performance with respect to other designs while being more flexible, more reliable, and with a minimum number of inter-cell interconnects. The main contribution at the system architecture level is the substitution of all counter or adder/subtractor elements with a simpler distributed structure that is free of carry propagation delays. Further, a novel restoring mechanism for result sequences of the EEA is proposed using a double delay element implementation. Finally, using this systolic architecture, a CMD (Combined Multiplier Divider) datapath is designed which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication, which results in a very small control unit that is negligible with respect to the datapath for all practical values of m. The throughput of this EC processor based on the bit serial systolic architecture is comparable with previously reported designs many times larger than itself.

University of Waterloo's Institutional Repository
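
For readers unfamiliar with EEA-based inversion over GF(2^m), the underlying arithmetic can be sketched in software as below. This is a generic polynomial-basis illustration and not the thesis's systolic variant: the helper names, the integer bit-vector encoding (bit i holds the coefficient of x^i), and the choice of the AES polynomial x^8 + x^4 + x^3 + x + 1 as an example modulus are all assumptions made here.

```python
def poly_divmod(n, d):
    """Quotient and remainder of binary polynomials n / d (d nonzero)."""
    q = 0
    d_len = d.bit_length()
    while n.bit_length() >= d_len:
        shift = n.bit_length() - d_len
        q ^= 1 << shift
        n ^= d << shift          # subtraction over GF(2) is XOR
    return q, n

def poly_mul(a, b):
    """Carry-less product of binary polynomials a and b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2m_inverse(a, modulus):
    """Inverse of a modulo an irreducible binary polynomial, via the EEA."""
    r0, r1 = modulus, a
    s0, s1 = 0, 1                # invariant: s_i * a = r_i (mod modulus)
    while r1 != 0:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 ^ poly_mul(q, s1)
    if r0 != 1:
        raise ValueError("element is not invertible")
    return poly_divmod(s0, modulus)[1]

AES_MODULUS = 0x11B              # example modulus, m = 8 (assumption)
inv = gf2m_inverse(0x53, AES_MODULUS)
check = poly_divmod(poly_mul(0x53, inv), AES_MODULUS)[1]  # should be 1
```

A hardware design such as the one described in the abstract replaces the variable-length division loop with a fixed sequence of bit-level steps; the sketch only shows the algebra that those steps compute.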

Some Linear-Time Algorithms for Systolic Arrays

Author: Brent Richard P.
Kung H. T.
Luk Franklin T.
Publication venue: SAGE Publications
Publication date: 01/01/1983

We survey some recent results on linear-time and almost linear-time algorithms for one and two-dimensional systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. Assuming that the systolic cells can perform floating-point arithmetic, we show how n × n Toeplitz systems of linear equations can be solved in time O(n) on a linear array of O(n) cells, each of which has constant memory size (independent of n). Finally, we outline how a two-dimensional array of O(n) × O(n) cells with nearest-neighbor interconnections can be used to solve (to working accuracy) the eigenvalue problem for a symmetric real n × n matrix in time O(nS(n)). Here S(n) is a slowly-growing function of n; for practical purposes S(n) can be regarded as a constant. In addition to their theoretical interest, these results can be implemented relatively easily and have potential applications in the areas of error-correcting codes, symbolic and algebraic computation, signal processing and image processing. For example, systolic GCD arrays for error correction have been implemented with the microprogrammable "PSC" chip.

eCommons@Cornell