67,491 research outputs found

    Teaching Parallel Programming Using Java

    Full text link
    This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principle programming language. The course was divided into three sections. The first section covered parallel programming techniques for shared memory systems that include multicore and Symmetric Multi-Processor (SMP) systems. In this section, Java threads was taught as a viable programming API for such systems. The second section was dedicated to parallel programming tools meant for distributed memory systems including clusters and network of computers. We used MPJ Express-a Java MPI library-for conducting programming assignments and lab work for this section. The third and the final section covered advanced topics including the MapReduce programming model using Hadoop and the General Purpose Computing on Graphics Processing Units (GPGPU).Comment: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel Programmin

    A Block Minorization--Maximization Algorithm for Heteroscedastic Regression

    Full text link
    The computation of the maximum likelihood (ML) estimator for heteroscedastic regression models is considered. The traditional Newton algorithms for the problem require matrix multiplications and inversions, which are bottlenecks in modern Big Data contexts. A new Big Data-appropriate minorization--maximization (MM) algorithm is considered for the computation of the ML estimator. The MM algorithm is proved to generate monotonically increasing sequences of likelihood values and to be convergent to a stationary point of the log-likelihood function. A distributed and parallel implementation of the MM algorithm is presented and the MM algorithm is shown to have differing time complexity to the Newton algorithm. Simulation studies demonstrate that the MM algorithm improves upon the computation time of the Newton algorithm in some practical scenarios where the number of observations is large

    Computing Room Acoustics Using 3D FDTD: A Cuda Approach.

    Get PDF

    A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    Full text link
    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within less than 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7. The runtime reaches a saturation with the addition of more processors beyond these limits which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement

    GADGET: A code for collisionless and gasdynamical cosmological simulations

    Full text link
    We describe the newly written code GADGET which is suitable both for cosmological simulations of structure formation and for the simulation of interacting galaxies. GADGET evolves self-gravitating collisionless fluids with the traditional N-body approach, and a collisional gas by smoothed particle hydrodynamics. Along with the serial version of the code, we discuss a parallel version that has been designed to run on massively parallel supercomputers with distributed memory. While both versions use a tree algorithm to compute gravitational forces, the serial version of GADGET can optionally employ the special-purpose hardware GRAPE instead of the tree. Periodic boundary conditions are supported by means of an Ewald summation technique. The code uses individual and adaptive timesteps for all particles, and it combines this with a scheme for dynamic tree updates. Due to its Lagrangian nature, GADGET thus allows a very large dynamic range to be bridged, both in space and time. So far, GADGET has been successfully used to run simulations with up to 7.5e7 particles, including cosmological studies of large-scale structure formation, high-resolution simulations of the formation of clusters of galaxies, as well as workstation-sized problems of interacting galaxies. In this study, we detail the numerical algorithms employed, and show various tests of the code. We publically release both the serial and the massively parallel version of the code.Comment: 32 pages, 14 figures, replaced to match published version in New Astronomy. For download of the code, see http://www.mpa-garching.mpg.de/gadget (new version 1.1 available
    corecore