957 research outputs found

Solving the Cauchy-Riemann equations on parallel computers

Author: Fatoohi Raad A.
Grosch Chester E.

Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.

NASA Technical Reports Server

Probabilistic structural mechanics research for parallel processing computers

Author: Chen Heh-Chyun
Martin William R.
Sues Robert H.
Twisdale Lawrence A.

Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods has been hampered by their computationally intensive nature. Solution of PSM problems requires repeated analyses of structures that are often large and that exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large-scale PSM problems practical.

NASA Technical Reports Server

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.

Southampton (e-Prints Soton)

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research

The Parallel Persistent Memory Model

Author: Berryhill R.
Blelloch G. E.
Buettner M.
Chauhan H.
Herlihy M.
JaJa J.
Lee S. K.
Meena J. S.
Nawab F.
Pelley S.
Woude J. Van Der
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/06/2018

We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of $O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil)$ in expectation, where $W$ and $D$ are the work and depth of the computation (in the absence of failures), $P_A$ is the average number of processors available during the computation, and $f \le 1/2$ is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives. Comment: This paper is the full version of a paper at SPAA 2018 with the same name.
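To get a feel for how the terms of the stated bound trade off, the expression inside the O(.) can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function name `expected_time_bound` and its parameter names are hypothetical, and the constant factor hidden by the O-notation is simply taken to be 1.

```python
def expected_time_bound(work, depth, procs, avg_procs, fail_prob):
    """Evaluate W/P_A + D*(P/P_A)*ceil(log_{1/f} W) with the hidden constant set to 1.

    All names are illustrative assumptions; only the formula comes from the abstract.
    """
    if work < 1 or depth < 0 or procs < 1 or avg_procs <= 0:
        raise ValueError("work, procs and avg_procs must be positive; depth non-negative")
    if not (0 < fail_prob <= 0.5):
        raise ValueError("the abstract assumes 0 < f <= 1/2")

    # ceil(log_{1/f} W): smallest k with (1/f)^k >= W, found by repeated
    # scaling to avoid rounding surprises from dividing two logarithms.
    log_term, power = 0, 1.0
    while power < work:
        power /= fail_prob
        log_term += 1

    work_term = work / avg_procs                        # W / P_A
    depth_term = depth * (procs / avg_procs) * log_term  # D (P/P_A) ceil(log_{1/f} W)
    return work_term + depth_term


if __name__ == "__main__":
    # Example: W = 1000, D = 10, P = 8, P_A = 4, f = 1/2
    print(expected_time_bound(1000, 10, 8, 4, 0.5))
```

With these example values the work term is 250 and the depth term is 10 × 2 × 10 = 200, so the depth term only dominates when the computation is deep relative to the available parallelism.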

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Theory and realization of novel algorithms for random sampling in digital signal processing

Author: Lo King Chuen
Publication date: 01/01/1996

Random sampling is a technique which overcomes the alias problem in regular sampling. The randomization, however, destroys the symmetry property of the transform kernel of the discrete Fourier transform. Hence, when transforming a randomly sampled sequence to its frequency spectrum, the fast Fourier transform cannot be applied and the computational complexity is $N^2$. The objectives of this research project are: (1) To devise sampling methods for random sampling such that computation may be reduced while the anti-alias property of random sampling is maintained: Two methods of inserting limited regularities into the randomized sampling grids are proposed. They are parallel additive random sampling and hybrid additive random sampling, both of which can save at least 75% of the multiplications required. The algorithms also lend themselves to implementation on a multiprocessor system, which will further enhance the speed of the evaluation. (2) To study the auto-correlation sequence of a randomly sampled sequence as an alternative means to confirm its anti-alias property: The anti-alias property of the two proposed methods can be confirmed by using convolution in the frequency domain. However, the same conclusion is also reached by analysing in the spatial domain the auto-correlation of such sample sequences. A technique to evaluate the auto-correlation sequence of a randomly sampled sequence with a regular step size is proposed. The technique may also serve as an algorithm to convert a randomly sampled sequence to a regularly spaced sequence having a desired Nyquist frequency. (3) To provide a rapid spectral estimation using a coarse kernel: The approximate method proposed by Mason in 1980, which trades accuracy for speed of computation, is introduced to make random sampling more attractive. (4) To suggest possible applications for random and pseudo-random sampling: To fully exploit its advantages, random sampling has been adopted in measurement instruments where computing a spectrum is either minimal or not required. Such applications in instrumentation are easily found in the literature. In this thesis, two applications in digital signal processing are introduced. (5) To suggest an inverse transformation for random sampling so as to complete a two-way process and to broaden its scope of application. Apart from the above, a case study of realizing in a transputer network the prime factor algorithm with regular sampling is given in Chapter 2, and a rough estimation of the signal-to-noise ratio for a spectrum obtained from random sampling is found in Chapter 3. Although random sampling is alias-free, problems in computational complexity and noise prevent it from being adopted widely in engineering applications. In the conclusions, the criteria for adopting random sampling are put forward and the directions for its development are discussed.
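The quadratic cost mentioned in this abstract can be illustrated with a direct evaluation of the spectrum at irregular sample instants. The sketch below is an assumption-laden illustration, not the thesis's method: it uses plain additive random sampling (each instant is the previous one plus a random step), not the parallel or hybrid variants the thesis proposes, and the function names, the uniform step distribution, and the parameters `mean_step` and `jitter` are all hypothetical.

```python
import cmath
import random


def additive_random_times(n, mean_step, jitter, seed=0):
    """Return n instants t_k = t_{k-1} + step, step ~ Uniform(mean_step - jitter, mean_step + jitter).

    The uniform distribution is an illustrative assumption.
    """
    if not (0 <= jitter < mean_step):
        raise ValueError("need 0 <= jitter < mean_step so that instants increase")
    rng = random.Random(seed)
    times, t = [], 0.0
    for _ in range(n):
        t += rng.uniform(mean_step - jitter, mean_step + jitter)
        times.append(t)
    return times


def direct_spectrum(samples, times, freqs):
    """Evaluate X(f) = sum_k x_k * exp(-2*pi*i*f*t_k) for every f in freqs.

    Because the t_k are irregular, the kernel has no FFT-style symmetry, so each
    frequency needs a full pass over the samples: len(freqs) * len(samples)
    complex multiplications, i.e. N^2 when both lengths equal N.
    """
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t) for x, t in zip(samples, times))
        for f in freqs
    ]


if __name__ == "__main__":
    n = 64
    times = additive_random_times(n, mean_step=1.0, jitter=0.4, seed=1)
    tone = 0.2  # cycles per unit time (illustrative value)
    samples = [cmath.cos(2 * cmath.pi * tone * t).real for t in times]
    freqs = [k / n for k in range(n)]
    spectrum = direct_spectrum(samples, times, freqs)
    print(max(range(n), key=lambda k: abs(spectrum[k])))  # index of the largest bin
```

The nested sum makes the cost explicit: nothing is shared between frequency bins, which is the computation the proposed sampling schemes aim to reduce.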

Durham e-Theses

Implementation and Performance Analysis of Numerical Algorithms on the MPP, Flex/32, and Cray/2

Author: Fatoohi Raad A.
Publication venue: ODU Digital Commons
Publication date: 01/01/1987

This dissertation presents the results of the implementation of a number of numerical algorithms on three parallel/vector computers. The object of this research is to determine how well, or poorly, a number of numerical algorithms would map onto three different architectures and to analyze the performance of these architectures using these algorithms. These algorithms are: a relaxation scheme for the solution of the Cauchy-Riemann equations, an ADI method for the solution of the diffusion equation, and a compact difference scheme for the solution of the two-dimensional Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and the Cray/2. The machine architectures are briefly described. The implementation of these algorithms is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for these algorithms on these architectures. Finally, conclusions are presented.

Old Dominion University

Implementation and analysis of a Navier-Stokes algorithm on parallel computers

Author: Fatoohi Raad A.
Grosch Chester E.

The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and the Cray/2. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

NASA Technical Reports Server

ASIP Design and Prototyping for Wireless Communication Applications

Author: Amer Baghdadi
Atif Raza Jafri
Michel Jezequel
Publication venue: 'IntechOpen'
Publication date: 01/01/2011

International audience

IntechOpen

HAL-Université de Bretagne Occidentale

HAL Descartes

Processor Enhancements for Media Streaming Applications

Author: Bilavarn S.
Debes E.
Diguet J.
Vandergheynst P.
Publication date: 18/06/2018

The development of more processing-demanding applications on the Internet (video broadcasting) on the one hand, and the popularity of recent devices at the user level (digital cameras, wireless videophones, ...) on the other, introduce challenges at several levels. Today, such devices offer processing capabilities and bandwidth settings that are ill-suited to managing scalable QoS requirements in a typical media delivery framework. In this paper, we present a study of the impact of such a scalable data representation optimized for QoS (Matching Pursuit 3D algorithms) on processor architectures, with the aim of achieving the best performance and power efficiency. A review of state-of-the-art techniques for processor architecture enhancement leads us to expect promising opportunities from the latest developments in the reconfigurable computing research field. We present here the first design steps of an efficient reconfigurable coprocessor especially designed to cope with future video delivery and multimedia processing requirements. Architecture perspectives are proposed with respect to low development cost constraints, backward compatibility, and easy coprocessor usage, using an original strategy based on a hardware/software codesign methodology.

RERO DOC Digital Library

Satellite on-board processing for earth resources data

Author: Bodenheimer R. E.
Gonzalez R. C.
Gupta J. N.
Hwang K.
Rochelle R. W.
Wilson J. B.
Wintz P. A.

Results of a survey of earth resources user applications and their data requirements, earth resources multispectral scanner sensor technology, and preprocessing algorithms for correcting the sensor outputs and for data bulk reduction are presented along with a candidate data format. The computational requirements for implementing the data analysis algorithms are included along with a review of computer architectures and organizations. Computer architectures capable of handling the algorithm computational requirements are suggested, and the environmental effects of an on-board processor are discussed. By relating performance parameters to the system requirements of each user, the feasibility of on-board processing is determined for each user. A tradeoff analysis is performed to determine the sensitivity of results to each of the system parameters. Significant results and conclusions are discussed, and recommendations are presented.

NASA Technical Reports Server