Teaching Concurrent Software Design: A Case Study Using Android
In this article, we explore various parallel and distributed computing topics
from a user-centric software engineering perspective. Specifically, in the
context of mobile application development, we study the basic building blocks
of interactive applications in the form of events, timers, and asynchronous
activities, along with related software modeling, architecture, and design
topics.
Comment: Submitted to CDER NSF/IEEE-TCPP Curriculum Initiative on Parallel and
Distributed Computing - Core Topics for Undergraduate
A Formal Model For Real-Time Parallel Computation
The imposition of real-time constraints on a parallel computing environment,
specifically high-performance cluster-computing systems, introduces a variety
of challenges with respect to the formal verification of the system's timing
properties. In this paper, we briefly motivate the need for such a system, and
we introduce an automaton-based method for performing such formal verification.
We define the concept of a consistent parallel timing system: a hybrid system
consisting of a set of timed automata (specifically, timed Büchi automata as
well as a timed variant of standard finite automata), intended to model the
timing properties of a well-behaved real-time parallel system. Finally, we give
a brief case study to demonstrate the concepts in the paper: a parallel matrix
multiplication kernel which operates within provable upper time bounds. We give
the algorithm used, a corresponding consistent parallel timing system, and
empirical results showing that the system operates under the specified timing
constraints.
Comment: In Proceedings FTSCS 2012, arXiv:1212.657
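To make the case study concrete, here is a minimal sketch of a row-partitioned matrix multiplication whose wall-clock time is checked against a deadline at run time. It is not the paper's algorithm or its automaton-based verification: the names `multiply_rows`, `parallel_matmul`, `timed_matmul`, and the `deadline_s` parameter are hypothetical. A run-time check like this only observes one execution, whereas the abstract describes provable upper bounds.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def parallel_matmul(a, b, workers=2):
    """Split A into row blocks and multiply each block in its own task."""
    chunk = max(1, (len(a) + workers - 1) // workers)
    blocks = [a[i:i + chunk] for i in range(0, len(a), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda blk: multiply_rows(blk, b), blocks))
    # pool.map preserves block order, so rows come back in order.
    return [row for part in parts for row in part]


def timed_matmul(a, b, deadline_s, workers=2):
    """Return (product, elapsed seconds, whether the deadline was met)."""
    start = time.perf_counter()
    c = parallel_matmul(a, b, workers)
    elapsed = time.perf_counter() - start
    return c, elapsed, elapsed <= deadline_s


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    product, elapsed, on_time = timed_matmul(a, b, deadline_s=1.0)
    print(product, on_time)
```

Threads are used here only to show the decomposition. In CPython they do not give a true parallel speedup for this kind of arithmetic, so a real kernel would use processes or a compiled language.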
General Algorithm For Improved Lattice Actions on Parallel Computing Architectures
Quantum field theories underlie all of our understanding of the fundamental
forces of nature. There are relatively few first-principles approaches to the
study of quantum field theories [such as quantum chromodynamics (QCD) relevant
to the strong interaction] away from the perturbative (i.e., weak-coupling)
regime. Currently the most common method is the use of Monte Carlo methods on a
hypercubic space-time lattice. These methods consume enormous computing power
for large lattices and it is essential that increasingly efficient algorithms
be developed to perform standard tasks in these lattice calculations. Here we
present a general algorithm for QCD that allows one to put any planar improved
gluonic lattice action onto a parallel computing architecture. High performance
masks for specific actions (including non-planar actions) are also presented.
These algorithms have been successfully employed by us in a variety of lattice
QCD calculations using improved lattice actions on a 128 node Thinking Machines
CM-5.
Keywords: quantum field theory; quantum chromodynamics;
improved actions; parallel computing algorithms
A Lanczos eigenvalue method on a parallel computer
Eigenvalue analysis of complex structures is a computationally intensive task which can benefit significantly from new and impending parallel computers. This study reports on a parallel computer implementation of the Lanczos method for free vibration analysis. The approach used here subdivides the major Lanczos calculation tasks into subtasks and introduces parallelism down to the subtask level, in operations such as matrix decomposition and forward/backward substitution. The method was implemented on a commercial parallel computer and results were obtained for a long flexible space structure. While parallel computing efficiency for the Lanczos method was good for a moderate number of processors for the test problem, the greatest reduction in time was realized for the decomposition of the stiffness matrix, a calculation which took 70 percent of the time in the sequential program and which took 25 percent of the time on eight processors. For a sample calculation of the twenty lowest frequencies of a 486-degree-of-freedom problem, the total sequential computing time was reduced by almost a factor of ten using 16 processors.
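For readers unfamiliar with the method, the following is a minimal sequential sketch of the basic Lanczos recurrence for a symmetric matrix, which reduces it to a tridiagonal matrix T whose eigenvalues approximate those of the original. It is a simplification under stated assumptions, not the implementation reported above: it omits the stiffness-matrix decomposition, the forward/backward substitution, any reorthogonalization, and all parallelism. The helper names `matvec`, `dot`, and `lanczos` are hypothetical.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def lanczos(a, v0, m):
    """Run up to m Lanczos steps on symmetric matrix a.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix T. No reorthogonalization is performed.
    """
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]          # current Lanczos vector
    q_prev = [0.0] * len(v0)            # previous Lanczos vector
    beta = 0.0
    alphas, betas = [], []
    for j in range(m):
        w = matvec(a, q)
        alpha = dot(w, q)
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - alpha * qi - beta * pi
             for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if j == m - 1 or beta < 1e-12:  # finished, or invariant subspace found
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas


if __name__ == "__main__":
    a = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    print(lanczos(a, [1.0, 0.0, 0.0], 3))
```

In this sketch the matrix-vector product dominates the cost of each step, so it is the natural place to parallelize. In the vibration setting described above, that product is replaced by a solve with the factored stiffness matrix.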