Teaching Parallel Programming Using Java
This paper presents an overview of the "Applied Parallel Computing" course
taught to final year Software Engineering undergraduate students in Spring 2014
at NUST, Pakistan. The main objective of the course was to introduce practical
parallel programming tools and techniques for shared and distributed memory
concurrent systems. A unique aspect of the course was that Java was used as the
principal programming language. The course was divided into three sections. The
first section covered parallel programming techniques for shared memory systems
that include multicore and Symmetric Multi-Processor (SMP) systems. In this
section, Java threads were taught as a viable programming API for such systems.
The second section was dedicated to parallel programming tools meant for
distributed memory systems, including clusters and networks of computers. We used
MPJ Express, a Java MPI library, for programming assignments and lab
work for this section. The third and the final section covered advanced topics
including the MapReduce programming model using Hadoop and the General Purpose
Computing on Graphics Processing Units (GPGPU).
Comment: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel
Programmin
A Block Minorization--Maximization Algorithm for Heteroscedastic Regression
The computation of the maximum likelihood (ML) estimator for heteroscedastic
regression models is considered. The traditional Newton algorithms for the
problem require matrix multiplications and inversions, which are bottlenecks in
modern Big Data contexts. A new Big Data-appropriate minorization--maximization
(MM) algorithm is considered for the computation of the ML estimator. The MM
algorithm is proved to generate monotonically increasing sequences of
likelihood values and to be convergent to a stationary point of the
log-likelihood function. A distributed and parallel implementation of the MM
algorithm is presented, and the MM algorithm is shown to have a different time
complexity from the Newton algorithm. Simulation studies demonstrate that the MM
algorithm improves upon the computation time of the Newton algorithm in some
practical scenarios where the number of observations is large.
A Parallel Monte Carlo Code for Simulating Collisional N-body Systems
We present a new parallel code for computing the dynamical evolution of
collisional N-body systems with up to N~10^7 particles. Our code is based on
the Henon Monte Carlo method for solving the Fokker-Planck equation, and
makes assumptions of spherical symmetry and dynamical equilibrium. The
principal algorithmic developments involve optimizing data structures, and the
introduction of a parallel random number generation scheme, as well as a
parallel sorting algorithm, required to find nearest neighbors for interactions
and to compute the gravitational potential. The new algorithms we introduce
along with our choice of decomposition scheme minimize communication costs and
ensure optimal distribution of data and workload among the processing units.
The implementation uses the Message Passing Interface (MPI) library for
communication, which makes it portable to many different supercomputing
architectures. We validate the code by calculating the evolution of clusters
with initial Plummer distribution functions up to core collapse with the number
of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find
that our results are in good agreement with self-similar core-collapse
solutions, and the core collapse times generally agree with expectations from
the literature. Also, we observe good total energy conservation, within less
than 0.04% throughout all simulations. We analyze the performance of the code,
and demonstrate near-linear scaling of the runtime with the number of
processors up to 64 processors for N=10^5, 128 for N=10^6 and 256 for N=10^7.
The runtime saturates when more processors are added beyond these limits,
which is characteristic of the parallel sorting algorithm. The
resulting maximum speedups we achieve are approximately 60x, 100x, and 220x,
respectively.
Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement
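The scaling figures quoted in this abstract can be turned into parallel efficiencies (speedup divided by processor count). The sketch below is only illustrative arithmetic: it assumes that each maximum speedup was reached at the processor count where near-linear scaling ends (64, 128, and 256), which the abstract does not state explicitly. The names `runs` and `parallel_efficiency` are hypothetical and do not come from the paper's code.

```python
# Illustrative arithmetic only. ASSUMPTION: each quoted maximum speedup is
# paired with the processor count at which near-linear scaling ends.
runs = [
    # (N particles, processors, approximate maximum speedup)
    (10**5, 64, 60.0),
    (10**6, 128, 100.0),
    (10**7, 256, 220.0),
]


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup achieved on `processors` units."""
    return speedup / processors


# Map each problem size N to its parallel efficiency.
efficiencies = {n: parallel_efficiency(s, p) for n, p, s in runs}

for n, p, s in runs:
    print(f"N={n:.0e}  processors={p:3d}  speedup={s:.0f}x  "
          f"efficiency={efficiencies[n]:.2f}")
```

Under that assumption the efficiencies all lie between roughly 0.78 and 0.94, in line with the "near-linear" description.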
GADGET: A code for collisionless and gasdynamical cosmological simulations
We describe the newly written code GADGET which is suitable both for
cosmological simulations of structure formation and for the simulation of
interacting galaxies. GADGET evolves self-gravitating collisionless fluids with
the traditional N-body approach, and a collisional gas by smoothed particle
hydrodynamics. Along with the serial version of the code, we discuss a parallel
version that has been designed to run on massively parallel supercomputers with
distributed memory. While both versions use a tree algorithm to compute
gravitational forces, the serial version of GADGET can optionally employ the
special-purpose hardware GRAPE instead of the tree. Periodic boundary
conditions are supported by means of an Ewald summation technique. The code
uses individual and adaptive timesteps for all particles, and it combines this
with a scheme for dynamic tree updates. Due to its Lagrangian nature, GADGET
thus allows a very large dynamic range to be bridged, both in space and time.
So far, GADGET has been successfully used to run simulations with up to 7.5e7
particles, including cosmological studies of large-scale structure formation,
high-resolution simulations of the formation of clusters of galaxies, as well
as workstation-sized problems of interacting galaxies. In this study, we detail
the numerical algorithms employed, and show various tests of the code. We
publicly release both the serial and the massively parallel versions of the
code.
Comment: 32 pages, 14 figures, replaced to match published version in New
Astronomy. For download of the code, see
http://www.mpa-garching.mpg.de/gadget (new version 1.1 available
- …