1,995 research outputs found

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA

    Full text link
    This work presents an updated and extended guide on methods of a proper acceleration of the Monte Carlo integration of stochastic differential equations with the commonly available NVIDIA Graphics Processing Units using the CUDA programming environment. We outline the general aspects of the scientific computing on graphics cards and demonstrate them with two models of a well known phenomenon of the noise induced transport of Brownian motors in periodic structures. As a source of fluctuations in the considered systems we selected the three most commonly occurring noises: the Gaussian white noise, the white Poissonian noise and the dichotomous process also known as a random telegraph signal. The detailed discussion on various aspects of the applied numerical schemes is also presented. The measured speedup can be of the astonishing order of about 3000 when compared to a typical CPU. This number significantly expands the range of problems solvable by use of stochastic simulations, allowing even an interactive research in some cases.Comment: 21 pages, 5 figures; Comput. Phys. Commun., accepted, 201

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    Efficient parallel architecture for highly coupled real-time linear system applications

    Get PDF
    A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics

    Towards developing robust algorithms for solving partial differential equations on MIMD machines

    Get PDF
    Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system

    Parallel algorithm with spectral convergence for nonlinear integro-differential equations

    Get PDF
    We discuss a numerical algorithm for solving nonlinear integro-differential equations, and illustrate our findings for the particular case of Volterra type equations. The algorithm combines a perturbation approach meant to render a linearized version of the problem and a spectral method where unknown functions are expanded in terms of Chebyshev polynomials (El-gendi's method). This approach is shown to be suitable for the calculation of two-point Green functions required in next to leading order studies of time-dependent quantum field theory.Comment: 15 pages, 9 figure

    Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    Get PDF
    Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed
    corecore