Search CORE

6 research outputs found

Recommended from our members

Cluster partitioning approaches to parallel Monte Carlo simulation on multiprocessors

Author: Ranawake Udaya A.
Publication venue: 'Oregon State University'
Publication date
Field of study

We consider the parallelization of Monte Carlo algorithms for analyzing numerical models of charge transport used in semiconductor device physics. Parallel algorithms for the standard k-space Monte Carlo simulation of a three band model of bulk GaAs on hypercube multicomputers are first presented. This Monte Carlo model includes scattering due to polar-optical, intervalley, and acoustic phonons, as well as electron-electron scattering. The k-space Monte Carlo program, excluding electron-electron scattering, is then extended to simulate a semiconductor device by the addition of the real space position of each simulated particle and the assignment of particle charge, using a cloud in cell scheme, to solve the Poisson's equation with particle dynamics. Techniques for effectively partitioning this device so as to balance the computational load while minimizing the communication overhead are discussed. Approaches for improving the efficiency of the parallel algorithm, either by dynamically balancing of load or by employing the usual techniques for enhancing rare events in Monte Carlo simulations are also considered. The parallel algorithms were implemented on a 64-node NCUBE multiprocessor and test results were generated to validate the parallel k-space, as well as the device simulation programs. Timing measurements were also made to study the variation of speedups as both the problem size and number of processors are varied. The effective exploitation of the computational power of message passing multiprocessors requires the efficient mapping of parallel programs onto processors so as to balance the computational load while minimizing the communication overhead between processors. A lower bound for this communication volume when mapping arbitrary task graphs onto distributed processor systems is derived. For a K processor system this lower bound can be computed from the K (possibly) largest eigenvalues of the adjacency matrix of the task graph and the eigenvalues of the adjacency matrix of the processor graph. We also derive the eigenvalues of the adjacency matrix of the processor graph for a hypercube and give test results comparing the lower bound for the communication volume with the values given by a heuristic algorithm for a number of task graphs

ScholarsArchive@OSU

Galley: A New Parallel File System for Parallel Applications

Author: Nieuwejaar Nils
Publication venue: Dartmouth Digital Commons
Publication date: 01/11/1996
Field of study

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Most multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access those multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated application and library programmers to use knowledge about their I/O to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. In this work we examine current multiprocessor file systems, as well as how those file systems are used by scientific applications. Contrary to the expectations of the designers of current parallel file systems, the workloads on those systems are dominated by requests to read and write small pieces of data. Furthermore, rather than being accessed sequentially and contiguously, as in uniprocessor and supercomputer workloads, files in multiprocessor file systems are accessed in regular, structured, but non-contiguous patterns. Based on our observations of multiprocessor workloads, we have designed Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. In this work, we introduce Galley and discuss its design and implementation. We describe Galley\u27s new three-dimensional file structure and discuss how that structure can be used by parallel applications to achieve higher performance. We introduce several new data-access interfaces, which allow applications to explicitly describe the regular access patterns we found to be common in parallel file system workloads. We show how these new interfaces allow parallel applications to achieve tremendous increases in I/O performance. Finally, we discuss how Galley\u27s new file structure and data-access interfaces can be useful in practice

Dartmouth Digital Commons (Dartmouth College)

A parallel functional language compiler for message-passing multicomputers

Author: Junaidu Sahalu B.
Publication venue: The University of St Andrews
Publication date: 22/05/2018
Field of study

The research presented in this thesis is about the design and implementation of Naira, a parallel, parallelising compiler for a rich, purely functional programming language. The source language of the compiler is a subset of Haskell 1.2. The front end of Naira is written entirely in the Haskell subset being compiled. Naira has been successfully parallelised and it is the largest successfully parallelised Haskell program having achieved good absolute speedups on a network of SUN workstations. Having the same basic structure as other production compilers of functional languages, Naira's parallelisation technology should carry forward to other functional language compilers. The back end of Naira is written in C and generates parallel code in the C language which is envisioned to be run on distributed-memory machines. The code generator is based on a novel compilation scheme specified using a restricted form of Milner's 7r-calculus which achieves asynchronous communication. We present the first working implementation of this scheme on distributed-memory message-passing multicomputers with split-phase transactions. Simulated assessment of the generated parallel code indicates good parallel behaviour. Parallelism is introduced using explicit, advisory user annotations in the source' program and there are two major aspects of the use of annotations in the compiler. First, the front end of the compiler is parallelised so as to improve its efficiency at compilation time when it is compiling input programs. Secondly, the input programs to the compiler can themselves contain annotations based on which the compiler generates the multi-threaded parallel code. These, therefore, make Naira, unusually and uniquely, both a parallel and a parallelising compiler. We adopt a medium-grained approach to granularity where function applications form the unit of parallelism and load distribution. We have experimented with two different task distribution strategies, deterministic and random, and have also experimented with thread-based and quantum- based scheduling policies. Our experiments show that there is little efficiency difference for regular programs but the quantum-based scheduler is the best in programs with irregular parallelism. The compiler has been successfully built, parallelised and assessed using both idealised and realistic measurement tools: we obtained significant compilation speed-ups on a variety of simulated parallel architectures. The simulated results are supported by the best results obtained on real hardware for such a large program: we measured an absolute speedup of 2.5 on a network of 5 SUN workstations. The compiler has also been shown to have good parallelising potential, based on popular test programs. Results of assessing Naira's generated unoptimised parallel code are comparable to those produced by other successful parallel implementation projects

St Andrews Research Repository

Algorithm Libraries for Multi-Core Processors

Author: Singler Johannes
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2010
Field of study

By providing parallelized versions of established algorithm libraries, we ease the exploitation of the multiple cores on modern processors for the programmer. The Multi-Core STL provides basic algorithms for internal memory, while the parallelized STXXL enables multi-core acceleration for algorithms on large data sets stored on disk. Some parallelized geometric algorithms are introduced into CGAL. Further, we design and implement sorting algorithms for huge data in distributed external memory

KITopen

Cognitive Foundations for Visual Analytics

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref