3,666 research outputs found
Teaching Parallel Programming Using Java
This paper presents an overview of the "Applied Parallel Computing" course
taught to final year Software Engineering undergraduate students in Spring 2014
at NUST, Pakistan. The main objective of the course was to introduce practical
parallel programming tools and techniques for shared and distributed memory
concurrent systems. A unique aspect of the course was that Java was used as the
principle programming language. The course was divided into three sections. The
first section covered parallel programming techniques for shared memory systems
that include multicore and Symmetric Multi-Processor (SMP) systems. In this
section, Java threads was taught as a viable programming API for such systems.
The second section was dedicated to parallel programming tools meant for
distributed memory systems including clusters and network of computers. We used
MPJ Express-a Java MPI library-for conducting programming assignments and lab
work for this section. The third and the final section covered advanced topics
including the MapReduce programming model using Hadoop and the General Purpose
Computing on Graphics Processing Units (GPGPU).Comment: 8 Pages, 6 figures, MPJ Express, MPI Java, Teaching Parallel
Programmin
rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Scientific applications often contain large and computationally intensive
parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a
balanced load execution of such applications on high performance computing
(HPC) systems. Large HPC systems are vulnerable to processors or node failures
and perturbations in the availability of resources. Most self-scheduling
approaches do not consider fault-tolerant scheduling or depend on failure or
perturbation detection and react by rescheduling failed tasks. In this work, a
robust dynamic load balancing (rDLB) approach is proposed for the robust self
scheduling of independent tasks. The proposed approach is proactive and does
not depend on failure or perturbation detection. The theoretical analysis of
the proposed approach shows that it is linearly scalable and its cost decrease
quadratically by increasing the system size. rDLB is integrated into an MPI DLS
library to evaluate its performance experimentally with two computationally
intensive scientific applications. Results show that rDLB enables the tolerance
of up to (P minus one) processor failures, where P is the number of processors
executing an application. In the presence of perturbations, rDLB boosted the
robustness of DLS techniques up to 30 times and decreased application execution
time up to 7 times compared to their counterparts without rDLB
Pervasive Parallel And Distributed Computing In A Liberal Arts College Curriculum
We present a model for incorporating parallel and distributed computing (PDC) throughout an undergraduate CS curriculum. Our curriculum is designed to introduce students early to parallel and distributed computing topics and to expose students to these topics repeatedly in the context of a wide variety of CS courses. The key to our approach is the development of a required intermediate-level course that serves as a introduction to computer systems and parallel computing. It serves as a requirement for every CS major and minor and is a prerequisite to upper-level courses that expand on parallel and distributed computing topics in different contexts. With the addition of this new course, we are able to easily make room in upper-level courses to add and expand parallel and distributed computing topics. The goal of our curricular design is to ensure that every graduating CS major has exposure to parallel and distributed computing, with both a breadth and depth of coverage. Our curriculum is particularly designed for the constraints of a small liberal arts college, however, much of its ideas and its design are applicable to any undergraduate CS curriculum
ParMooN - a modernized program package based on mapped finite elements
{\sc ParMooN} is a program package for the numerical solution of elliptic and
parabolic partial differential equations. It inherits the distinct features of
its predecessor {\sc MooNMD} \cite{JM04}: strict decoupling of geometry and
finite element spaces, implementation of mapped finite elements as their
definition can be found in textbooks, and a geometric multigrid preconditioner
with the option to use different finite element spaces on different levels of
the multigrid hierarchy. After having presented some thoughts about in-house
research codes, this paper focuses on aspects of the parallelization for a
distributed memory environment, which is the main novelty of {\sc ParMooN}.
Numerical studies, performed on compute servers, assess the efficiency of the
parallelized geometric multigrid preconditioner in comparison with some
parallel solvers that are available in the library {\sc PETSc}. The results of
these studies give a first indication whether the cumbersome implementation of
the parallelized geometric multigrid method was worthwhile or not.Comment: partly supported by European Union (EU), Horizon 2020, Marie
Sk{\l}odowska-Curie Innovative Training Networks (ITN-EID), MIMESIS, grant
number 67571
Going Stupid with EcoLab
In 2005, Railsback et al. proposed a very simple model ({\em Stupid
Model}) that could be implemented within a couple of hours, and later
extended to demonstrate the use of common ABM platform functionality. They
provided implementations of the model in several agent based modelling
platforms, and compared the platforms for ease of implementation of this simple
model, and performance. In this paper, I implement Railsback et al's Stupid
Model in the EcoLab simulation platform, a C++ based modelling platform,
demonstrating that it is a feasible platform for these sorts of models, and
compare the performance of the implementation with Repast, Mason and Swarm
versions
Parallel Implementation of Lossy Data Compression for Temporal Data Sets
Many scientific data sets contain temporal dimensions. These are the data
storing information at the same spatial location but different time stamps.
Some of the biggest temporal datasets are produced by parallel computing
applications such as simulations of climate change and fluid dynamics. Temporal
datasets can be very large and cost a huge amount of time to transfer among
storage locations. Using data compression techniques, files can be transferred
faster and save storage space. NUMARCK is a lossy data compression algorithm
for temporal data sets that can learn emerging distributions of element-wise
change ratios along the temporal dimension and encodes them into an index table
to be concisely represented. This paper presents a parallel implementation of
NUMARCK. Evaluated with six data sets obtained from climate and astrophysics
simulations, parallel NUMARCK achieved scalable speedups of up to 8788 when
running 12800 MPI processes on a parallel computer. We also compare the
compression ratios against two lossy data compression algorithms, ISABELA and
ZFP. The results show that NUMARCK achieved higher compression ratio than
ISABELA and ZFP.Comment: 10 pages, HiPC 201
- …