Search CORE

12,905 research outputs found

A linear algebra processor using Monte Carlo methods

Author: Alexandrov Vassil Nikolov
Cadenas Medina Jose Oswaldo
Megson Graham M
Plaks T P
Publication venue
Publication date: 11/09/2003
Field of study

Central Archive at the University of Reading

Parallel algorithms and architecture for computation of manipulator forward dynamics

Author: Bejczy Antal K.
Fijany Amir
Publication venue
Publication date
Field of study

Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n exp 2), and the O(n exp 3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class of NC and that the time and processors bounds are of O(log2/2n) and O(n exp 4), respectively. However, the fastest stable parallel algorithms achieve the computation time of O(n) and can be derived by parallelization of the O(n exp 3) serial algorithms. Parallel computation of the O(n exp 3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders-of-magnitude in the computation the forward dynamics

NASA Technical Reports Server

A survey of the state of the art and focused research in range systems, task 2

Author: Yao K.
Publication venue
Publication date
Field of study

Contract generated publications are compiled which describe the research activities for the reporting period. Study topics include: equivalent configurations of systolic arrays; least squares estimation algorithms with systolic array architectures; modeling and equilization of nonlinear bandlimited satellite channels; and least squares estimation and Kalman filtering by systolic arrays

NASA Technical Reports Server

Parallel computation of optimized arrays for 2-D electrical imaging surveys

Author: Chambers J.E.
Loke M.H.
Wilkinson P.B.
Publication venue: 'Wiley'
Publication date: 01/12/2010
Field of study

Modern automatic multi-electrode survey instruments have made it possible to use non-traditional arrays to maximize the subsurface resolution from electrical imaging surveys. Previous studies have shown that one of the best methods for generating optimized arrays is to select the set of array configurations that maximizes the model resolution for a homogeneous earth model. The Sherman–Morrison Rank-1 update is used to calculate the change in the model resolution when a new array is added to a selected set of array configurations. This method had the disadvantage that it required several hours of computer time even for short 2-D survey lines. The algorithm was modified to calculate the change in the model resolution rather than the entire resolution matrix. This reduces the computer time and memory required as well as the computational round-off errors. The matrix–vector multiplications for a single add-on array were replaced with matrix–matrix multiplications for 28 add-on arrays to further reduce the computer time. The temporary variables were stored in the double-precision Single Instruction Multiple Data (SIMD) registers within the CPU to minimize computer memory access. A further reduction in the computer time is achieved by using the computer graphics card Graphics Processor Unit (GPU) as a highly parallel mathematical coprocessor. This makes it possible to carry out the calculations for 512 add-on arrays in parallel using the GPU. The changes reduce the computer time by more than two orders of magnitude. The algorithm used to generate an optimized data set adds a specified number of new array configurations after each iteration to the existing set. The resolution of the optimized data set can be increased by adding a smaller number of new array configurations after each iteration. Although this increases the computer time required to generate an optimized data set with the same number of data points, the new fast numerical routines has made this practical on commonly available microcomputers

NERC Open Research Archive

Architectures for block Toeplitz systems

Author: Bouras Ilias
Glentis George-Othon
Kalouptsidis Nicholas
Publication venue: Elsevier
Publication date: 01/01/1995
Field of study

In this paper efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar only operations, multiplications/divisions and additions, and the local communication which enables the development of wavefront array architecture. Both the mean squared error and the total squared error formulations are described and a variety of implementations are given

CiteSeerX

University of Twente Research Information

A bibliography on parallel and vector numerical algorithms

Author: Ortega J. M.
Voigt R. G.
Publication venue
Publication date
Field of study

This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

NASA Technical Reports Server

Systolic VLSI for Kalman filters

Author: Chang J. J.
Yeh H.-G.
Publication venue
Publication date
Field of study

A novel two-dimensional parallel computing method for real-time Kalman filtering is presented. The mathematical formulation of a Kalman filter algorithm is rearranged to be the type of Faddeev algorithm for generalizing signal processing. The data flow mapping from the Faddeev algorithm to a two-dimensional concurrent computing structure is developed. The architecture of the resulting processor cells is regular, simple, expandable, and therefore naturally suitable for VLSI chip implementation. The computing methodology and the two-dimensional systolic arrays are useful for Kalman filter applications as well as other matrix/vector based algebraic computations

NASA Technical Reports Server

Quantum Monte Carlo for large chemical systems: Implementing efficient strategies for petascale platforms and beyond

Author: Caffarel Michel
Jalby William
Oseret Emmanuel
Scemama Anthony
Publication venue: 'Wiley'
Publication date: 01/10/2012
Field of study

Various strategies to implement efficiently QMC simulations for large chemical systems are presented. These include: i.) the introduction of an efficient algorithm to calculate the computationally expensive Slater matrices. This novel scheme is based on the use of the highly localized character of atomic Gaussian basis functions (not the molecular orbitals as usually done), ii.) the possibility of keeping the memory footprint minimal, iii.) the important enhancement of single-core performance when efficient optimization tools are employed, and iv.) the definition of a universal, dynamic, fault-tolerant, and load-balanced computational framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and illustrated with numerical applications on small peptides of increasing sizes (158, 434, 1056 and 1731 electrons). Using 10k-80k computing cores of the Curie machine (GENCI-TGCC-CEA, France) QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that for this machine a large part of the peak performance can be achieved. Implementation of large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible

arXiv.org e-Print Archive

HAL-INSA Toulouse

HAL UVSQ

Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers

Author: Andersen J
Levkovitz R
Mitra G
Tamiz M
Publication venue: Brunel University
Publication date: 01/01/1990
Field of study

In this paper we describe a unified scheme for implementing an interior point algorithm (IPM) over a range of computer architectures. In the inner iteration of the IPM a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments as to why integration of the system within a sparse simplex solver is important and outline how the system is designed to achieve this integration

CiteSeerX

Brunel University Research Archive