957 research outputs found

    Solving the Cauchy-Riemann equations on parallel computers

    The implementation of a single algorithm on three parallel-vector computers is discussed. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; the FLEX/32, an MIMD machine with 20 processors; and the CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.

    Probabilistic structural mechanics research for parallel processing computers

    Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intensive nature. Solution of PSM problems requires repeated analyses of structures that are often large and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large-scale PSM problems practical.
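    The "repeated analyses" mentioned above can be illustrated with a toy Monte Carlo reliability estimate. This is only a sketch under stated assumptions, not a method taken from the abstract: the limit state (failure when load exceeds resistance), the normal distributions, and every function and parameter name are hypothetical. Each chunk uses its own seeded random stream, so the chunks are independent and could be handed to separate processors.

```python
import math
import random


def count_failures(seed, n_trials, mu_r=10.0, sd_r=1.0, mu_s=7.0, sd_s=1.0):
    """Count trials in which the sampled load exceeds the sampled resistance."""
    rng = random.Random(seed)  # one independent stream per chunk (assumed setup)
    failures = 0
    for _ in range(n_trials):
        resistance = rng.gauss(mu_r, sd_r)
        load = rng.gauss(mu_s, sd_s)
        if load > resistance:
            failures += 1
    return failures


def estimate_failure_probability(n_chunks=8, trials_per_chunk=25_000):
    """Combine independent chunks; the built-in map could be swapped for a pool's map."""
    counts = map(lambda seed: count_failures(seed, trials_per_chunk), range(n_chunks))
    return sum(counts) / (n_chunks * trials_per_chunk)


def exact_failure_probability(mu_r=10.0, sd_r=1.0, mu_s=7.0, sd_s=1.0):
    """Closed form for this normal-normal toy case: Phi(-beta)."""
    beta = (mu_r - mu_s) / math.hypot(sd_r, sd_s)
    return 0.5 * math.erfc(beta / math.sqrt(2.0))
```

    Because the chunks share no state, replacing `map` with the `map` method of a process pool would not change the estimate; a real PSM analysis would replace the two `gauss` draws with a full structural analysis per sample.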

    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would heavily, even prohibitively, impact the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.

    The Parallel Persistent Memory Model

    We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solve these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of $O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil)$ in expectation, where $W$ and $D$ are the work and depth of the computation (in the absence of failures), $P_A$ is the average number of processors available during the computation, and $f \le 1/2$ is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives. Comment: This paper is the full version of a paper at SPAA 2018 with the same name.
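    To get a feel for the scheduler bound quoted above, the expression inside the O(...) can be evaluated for sample values. This is a rough illustration only: the function name and the sample numbers are hypothetical, and the constant factors hidden by the O-notation are ignored, so the result is a scale estimate rather than a running time.

```python
import math


def expected_time_bound(work, depth, procs, avg_procs, fail_prob):
    """Expression inside the O(...) of the stated bound; constant factors are ignored."""
    if not 0.0 < fail_prob <= 0.5:
        raise ValueError("the abstract assumes a capsule failure probability f <= 1/2")
    # ceil(log_{1/f} W), written with base-2 logarithms
    retries = math.ceil(math.log2(work) / math.log2(1.0 / fail_prob))
    return work / avg_procs + depth * (procs / avg_procs) * retries


# Hypothetical values: W = 2**20, D = 20, P = 64, P_A = 32, f = 1/2
print(expected_time_bound(2**20, 20, 64, 32, 0.5))  # 33568.0
```

    With these numbers the work term W/P_A = 32768 dominates the depth term of 800. Lowering f to 1/4 halves the logarithmic factor and the depth term with it.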

    Theory and realization of novel algorithms for random sampling in digital signal processing

    Random sampling is a technique which overcomes the alias problem in regular sampling. The randomization, however, destroys the symmetry property of the transform kernel of the discrete Fourier transform. Hence, when transforming a randomly sampled sequence to its frequency spectrum, the fast Fourier transform cannot be applied and the computational complexity is $N^2$. The objectives of this research project are (1) To devise sampling methods for random sampling such that computation may be reduced while the anti-alias property of random sampling is maintained: Two methods of inserting limited regularities into the randomized sampling grids are proposed. They are parallel additive random sampling and hybrid additive random sampling, both of which can save at least 75% of the multiplications required. The algorithms also lend themselves to implementation on a multiprocessor system, which will further enhance the speed of the evaluation. (2) To study the auto-correlation sequence of a randomly sampled sequence as an alternative means to confirm its anti-alias property: The anti-alias property of the two proposed methods can be confirmed by using convolution in the frequency domain. However, the same conclusion is also reached by analysing in the spatial domain the auto-correlation of such sample sequences. A technique to evaluate the auto-correlation sequence of a randomly sampled sequence with a regular step size is proposed. The technique may also serve as an algorithm to convert a randomly sampled sequence to a regularly spaced sequence having a desired Nyquist frequency. (3) To provide a rapid spectral estimation using a coarse kernel: The approximate method proposed by Mason in 1980, which trades the accuracy for the speed of the computation, is introduced for making random sampling more attractive.
(4) To suggest possible applications for random and pseudo-random sampling: To fully exploit its advantages, random sampling has been adopted in measurement instruments where computing a spectrum is either minimal or not required. Such applications in instrumentation are easily found in the literature. In this thesis, two applications in digital signal processing are introduced. (5) To suggest an inverse transformation for random sampling so as to complete a two-way process and to broaden its scope of application. Apart from the above, a case study of realizing the prime factor algorithm with regular sampling in a transputer network is given in Chapter 2, and a rough estimation of the signal-to-noise ratio for a spectrum obtained from random sampling is found in Chapter 3. Although random sampling is alias-free, problems in computational complexity and noise prevent it from being adopted widely in engineering applications. In the conclusions, the criteria for adopting random sampling are put forward and the directions for its development are discussed.
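    The quadratic cost mentioned in this abstract comes from evaluating the Fourier sum directly at every frequency once the sample instants are irregular. The sketch below shows that direct evaluation together with plain additive random sampling, where each instant is the previous one plus a random step. It is an illustration under assumptions, not the thesis's parallel or hybrid additive schemes: the function names, the uniform jitter, and the parameters are all hypothetical.

```python
import cmath
import math
import random


def direct_spectrum(times, samples, freqs):
    """Evaluate X(f) = sum_n x_n * exp(-2j*pi*f*t_n) for each f in freqs.

    Every frequency needs one pass over all samples, so computing N
    frequencies from N samples costs on the order of N**2 multiplications.
    """
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t) for t, x in zip(times, samples))
        for f in freqs
    ]


def additive_random_times(n, mean_step, jitter, seed=0):
    """Instants t_k = t_(k-1) + mean_step + u_k, with u_k uniform in [-jitter, jitter]."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        out.append(t)
        t += mean_step + rng.uniform(-jitter, jitter)
    return out
```

    As a usage example, sampling `cos(2*pi*0.1*t)` at `additive_random_times(256, 1.0, 0.3)` and calling `direct_spectrum` at frequencies 0.05, 0.1, and 0.2 should give the largest magnitude at 0.1. With regularly spaced instants the same function reduces to the ordinary discrete Fourier transform.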

    Implementation and Performance Analysis of Numerical Algorithms on the MPP, Flex/32, and Cray/2

    This dissertation presents the results of the implementation of a number of numerical algorithms on three parallel/vector computers. The object of this research is to determine how well, or poorly, a number of numerical algorithms would map onto three different architectures and to analyze the performance of these architectures using these algorithms. These algorithms are: a relaxation scheme for the solution of the Cauchy-Riemann equations, an ADI method for the solution of the diffusion equation, and a compact difference scheme for the solution of the two-dimensional Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2. The machine architectures are briefly described. The implementation of these algorithms is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for these algorithms on these architectures. Finally, conclusions are presented.

    Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

    Processor Enhancements for Media Streaming Applications

    The development of more processing-intensive applications on the Internet (video broadcasting) on the one hand, and the popularity of recent user-level devices (digital cameras, wireless videophones, ...) on the other, introduce challenges at several levels. Today, such devices offer processing capabilities and bandwidth settings that are ill-suited to managing scalable QoS requirements in a typical media delivery framework. In this paper, we present a study of the impact of such a scalable data representation optimized for QoS (Matching Pursuit 3D algorithms) on processor architectures, with the aim of achieving the best performance and power efficiency. A review of state-of-the-art techniques for processor architecture enhancement leads us to expect promising opportunities from the latest developments in the reconfigurable computing research field. We present here the first design steps of an efficient reconfigurable coprocessor especially designed to cope with future video delivery and multimedia processing requirements. Architecture perspectives are proposed with respect to low development cost constraints, backward compatibility, and easy coprocessor usage, using an original strategy based on a hardware/software codesign methodology.

    Satellite on-board processing for earth resources data

    Results of a survey of earth resources user applications and their data requirements, earth resources multispectral scanner sensor technology, and preprocessing algorithms for correcting the sensor outputs and for data bulk reduction are presented along with a candidate data format. The computational requirements for implementing the data analysis algorithms are included along with a review of computer architectures and organizations. Computer architectures capable of handling the algorithm computational requirements are suggested, and the environmental effects of an on-board processor are discussed. By relating performance parameters to the system requirements of each of the user requirements, the feasibility of on-board processing is determined for each user. A tradeoff analysis is performed to determine the sensitivity of results to each of the system parameters. Significant results and conclusions are discussed, and recommendations are presented.