957 research outputs found
Solving the Cauchy-Riemann equations on parallel computers
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented
Probabilistic structural mechanics research for parallel processing computers
Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
The Parallel Persistent Memory Model
We consider a parallel computational model that consists of processors,
each with a fast local ephemeral memory of limited size, and sharing a large
persistent memory. The model allows for each processor to fault with bounded
probability, and possibly restart. On faulting all processor state and local
ephemeral memory are lost, but the persistent memory remains. This model is
motivated by upcoming non-volatile memories that are as fast as existing random
access memory, are accessible at the granularity of cache lines, and have the
capability of surviving power outages. It is further motivated by the
observation that in large parallel systems, failure of processors and their
caches is not unusual.
Within the model we develop a framework for developing locality efficient
parallel algorithms that are resilient to failures. There are several
challenges, including the need to recover from failures, the desire to do this
in an asynchronous setting (i.e., not blocking other processors when one
fails), and the need for synchronization primitives that are robust to
failures. We describe approaches to solve these challenges based on breaking
computations into what we call capsules, which have certain properties, and
developing a work-stealing scheduler that functions properly within the context
of failures. The scheduler guarantees a time bound of in expectation, where and are the work and
depth of the computation (in the absence of failures), is the average
number of processors available during the computation, and is the
probability that a capsule fails. Within the model and using the proposed
methods, we develop efficient algorithms for parallel sorting and other
primitives.Comment: This paper is the full version of a paper at SPAA 2018 with the same
nam
Theory and realization of novel algorithms for random sampling in digital signal processing
Random sampling is a technique which overcomes the alias problem in regular sampling. The randomization, however, destroys the symmetry property of the transform kernel of the discrete Fourier transform. Hence, when transforming a randomly sampled sequence to its frequency spectrum, the Fast Fourier transform cannot be applied and the computational complexity is N(^2). The objectives of this research project are (1) To devise sampling methods for random sampling such that computation may be reduced while the anti-alias property of random sampling is maintained : Two methods of inserting limited regularities into the randomized sampling grids are proposed. They are parallel additive random sampling and hybrid additive random sampling, both of which can save at least 75% of the multiplications required. The algorithms also lend themselves to the implementation by a multiprocessor system, which will further enhance the speed of the evaluation. (2) To study the auto-correlation sequence of a randomly sampled sequence as an alternative means to confirm its anti-alias property : The anti-alias property of the two proposed methods can be confirmed by using convolution in the frequency domain. However, the same conclusion is also reached by analysing in the spatial domain the auto-correlation of such sample sequences. A technique to evaluate the auto-correlation sequence of a randomly sampled sequence with a regular step size is proposed. The technique may also serve as an algorithm to convert a randomly sampled sequence to a regularly spaced sequence having a desired Nyquist frequency. (3) To provide a rapid spectral estimation using a coarse kernel : The approximate method proposed by Mason in 1980, which trades the accuracy for the speed of the computation, is introduced for making random sampling more attractive. (4) To suggest possible applications for random and pseudo-random sampling : To fully exploit its advantages, random sampling has been adopted in measurement Random sampling is a technique which overcomes the alias problem in regular sampling. The randomization, however, destroys the symmetry property of the transform kernel of the discrete Fourier transform. Hence, when transforming a randomly sampled sequence to its frequency spectrum, the Fast Fourier transform cannot be applied and the computational complexity is N"^. The objectives of this research project are (1) To devise sampling methods for random sampling such that computation may be reduced while the anti-alias property of random sampling is maintained : Two methods of inserting limited regularities into the randomized sampling grids are proposed. They are parallel additive random sampling and hybrid additive random sampling, both of which can save at least 75% , of the multiplications required. The algorithms also lend themselves to the implementation by a multiprocessor system, which will further enhance the speed of the evaluation. (2) To study the auto-correlation sequence of a randomly sampled sequence as an alternative means to confirm its anti-alias property : The anti-alias property of the two proposed methods can be confirmed by using convolution in the frequency domain. However, the same conclusion is also reached by analysing in the spatial domain the auto-correlation of such sample sequences. A technique to evaluate the auto-correlation sequence of a randomly sampled sequence with a regular step size is proposed. The technique may also serve as an algorithm to convert a randomly sampled sequence to a regularly spaced sequence having a desired Nyquist frequency. (3) To provide a rapid spectral estimation using a coarse kernel : The approximate method proposed by Mason in 1980, which trades the accuracy for the speed of the computation, is introduced for making random sampling more attractive. (4) To suggest possible applications for random and pseudo-random sampling : To fully exploit its advantages, random sampling has been adopted in measurement instruments where computing a spectrum is either minimal or not required. Such applications in instrumentation are easily found in the literature. In this thesis, two applications in digital signal processing are introduced. (5) To suggest an inverse transformation for random sampling so as to complete a two-way process and to broaden its scope of application. Apart from the above, a case study of realizing in a transputer network the prime factor algorithm with regular sampling is given in Chapter 2 and a rough estimation of the signal-to-noise ratio for a spectrum obtained from random sampling is found in Chapter 3. Although random sampling is alias-free, problems in computational complexity and noise prevent it from being adopted widely in engineering applications. In the conclusions, the criteria for adopting random sampling are put forward and the directions for its development are discussed
Implementation and Performance Analysis of Numerical Algorithms on the MPP, Flex/32, and Cray/2
This dissertation presents the results of the implementation of a number of numerical algorithms on three parallel/vector computers. The object of this research is to determine how well, or poorly, a number of numerical algorithms would map onto three different architectures and to analyze the performance of these architectures using these algorithms. These algorithms are: a relaxation scheme for the solution of the Cauchy-Riemann equations, an ADI method for the solution of the diffusion equation, and a compact difference scheme for the solution of two-dimensional Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and the Cray/2. The machine architectures are briefly described. The implementation of these algorithms is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for these algorithms on these architectures. Finally conclusions are presented
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented
ASIP Design and Prototyping for Wireless Communication Applications
International audienc
Processor Enhancements for Media Streaming Applications
The development of more processing demanding applications on the Internet (video broadcasting) on one hand and the popularity of recent devices at the user level (digital cameras, wireless videophones, ...) on the other hand introduce challenges at several levels. Today, such devices present processing capabilities and bandwidth settings that are inefficient to manage scalable QoS requirements in a typical media delivery framework. In this paper, we present an impact study of such a scalable data representation optimized for QoS (Matching Pursuit 3D algorithms) on processor architectures to achieve the best performance and power efficiency. A review of state of the art techniques for processor architecture enhancement let us expect promising opportunities from the latest developments in the reconfigurable computing research field. We present here the first design steps of an efficient reconfigurable coprocessor especially designed to cope with future video delivery and multimedia processing requirements. Architecture perspectives are proposed with respect to low development cost constraints, backward compatibilty and easy coprocessor usage using an original strategy based on a hardware/software codesign methodolog
Satellite on-board processing for earth resources data
Results of a survey of earth resources user applications and their data requirements, earth resources multispectral scanner sensor technology, and preprocessing algorithms for correcting the sensor outputs and for data bulk reduction are presented along with a candidate data format. Computational requirements required to implement the data analysis algorithms are included along with a review of computer architectures and organizations. Computer architectures capable of handling the algorithm computational requirements are suggested and the environmental effects of an on-board processor discussed. By relating performance parameters to the system requirements of each of the user requirements the feasibility of on-board processing is determined for each user. A tradeoff analysis is performed to determine the sensitivity of results to each of the system parameters. Significant results and conclusions are discussed, and recommendations are presented
- …