141 research outputs found

    User-Defined Data Distributions in High-Level Programming Languages

    Get PDF
    One of the characteristic features of today’s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade

    The exploitation of parallelism on shared memory multiprocessors

    Get PDF
    PhD ThesisWith the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980's, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted, and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever widening hardware to software gap. The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council

    Computer algebra and transputers applied to the finite element method

    Get PDF
    Recent developments in computing technology have opened new prospects for computationally intensive numerical methods such as the finite element method. More complex and refined problems can be solved, for example increased number and order of the elements improving accuracy. The power of Computer Algebra systems and parallel processing techniques is expected to bring significant improvement in such methods. The main objective of this work has been to assess the use of these techniques in the finite element method. The generation of interpolation functions and element matrices has been investigated using Computer Algebra. Symbolic expressions were obtained automatically and efficiently converted into FORTRAN routines. Shape functions based on Lagrange polynomials and mapping functions for infinite elements were considered. One and two dimensional element matrices for bending problems based on Hermite polynomials were also derived. Parallel solvers for systems of linear equations have been developed since such systems often arise in numerical methods. Both symmetric and asymmetric solvers have been considered. The implementation was on Transputer-based machines. The speed-ups obtained are good. An analysis by finite element method of a free surface flow over a spillway has been carried out. Computer Algebra was used to derive the integrand of the element matrices and their numerical evaluation was done in parallel on a Transputer-based machine. A graphical interface was developed to enable the visualisation of the free surface and the influence of the parameters. The speed- ups obtained were good. Convergence of the iterative solution method used was good for gated spillways. Some problems experienced with the non-gated spillways have lead to a discussion and tests of the potential factors of instability

    Compilation techniques for multicomputers

    No full text
    This thesis considers problems in process and data partitioning when compiling programs for distributed-memory parallel computers (or multicomputers). These partitions may be specified by the user through the use of language constructs, or automatically determined by the compiler. Data and process partitioning techniques are developed for two models of compilation. The first compilation model focusses on the loop nests present in a serial program. Executing the iterations of these loop nests in parallel accounts for a significant amount of the parallelism which can be exploited in these programs. The parallelism is exploited by applying a set of transformations to the loop nests. The iterations of the transformed loop nests are in a form which can be readily distributed amongst the processors of a multicomputer. The manner in which the arrays, referenced within these loop nests, are partitioned between the processors is determined by the distribution of the loop iterations. The second compilation model is based on the data parallel paradigm, in which operations are applied to many different data items collectively. High Performance Fortran is used as an example of this paradigm. Novel collective communication routines are developed, and are applied to provide the communication associated with the data partitions for both compilation models. Furthermore, it is shown that by using these routines the communication associated with partitioning data on a multicomputer is greatly simplified. These routines are developed as part of this thesis. The experimental context for this thesis is the development of a compiler for the Fujitsu AP1000 multicomputer. A prototype compiler is presented. Experimental results for a variety of applications are included

    Building a High Performance Cluster through Computer Reuse

    Get PDF
    The goal of this MQP was to use outdated commodity PCs to build a cluster computer capable of being used for computationally intensive research. Traditionally, PCs are recycled after being taken out of use during a refresh cycle. By using open source software and minimal hardware modifications these PCs can be configured into a cluster that approaches the performance of state-of-the-art cluster computers. In this paper, we present the system setup, configuration, and validation of the WOPPR cluster computer

    Doctor of Philosophy

    Get PDF
    dissertationRecent trends in high performance computing present larger and more diverse computers using multicore nodes possibly with accelerators and/or coprocessors and reduced memory. These changes pose formidable challenges for applications code to attain scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities oer a portable solution for easy programming. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. However, the original Uintah code had limited scalability as tasks were run in a predefined order based solely on static analysis of the task graph and used only message passing interface (MPI) for parallelism. By using a new hybrid multithread and MPI runtime system, this research has made it possible for Uintah to scale to 700K central processing unit (CPU) cores when solving challenging fluid-structure interaction problems. Those problems often involve moving objects with adaptive mesh refinement and thus with highly variable and unpredictable work patterns. This research has also demonstrated an ability to run capability jobs on the heterogeneous systems with Nvidia graphics processing unit (GPU) accelerators or Intel Xeon Phi coprocessors. The new runtime system for Uintah executes directed acyclic graphs of computational tasks with a scalable asynchronous and dynamic runtime system for multicore CPUs and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases without significant changes to application code. This research concludes that the adaptive directed acyclic graph (DAG)-based approach provides a very powerful abstraction for solving challenging multiscale multiphysics engineering problems. Excellent scalability with regard to the different processors and communications performance are achieved on some of the largest and most powerful computers available today

    Finite elements software and applications

    Get PDF
    The contents of this thesis are a detailed study of the software for the finite element method. In the text, the finite element method is introduced from both the engineering and mathematical points of view. The computer implementation of the method is explained with samples of mainframe, mini- and micro-computer implementations. A solution is presented for the problem of limited stack size for both mini- and micro-computers which possess stack architecture. Several finite element programs are presented. Special purpose programs to solve problems in structural analysis and groundwater flow are discussed. However, an efficient easy-to-use finite element program for general two-dimensional problems is presented. Several problems in groundwater flow are considered that include steady, unsteady flows in different types of aquifers. Different cases of sinks and sources in the flow domain are also considered. The performance of finite element methods is studied for the chosen problems by comparing the numerical solutions of test problems with analytical solutions (if they exist) or with solutions obtained by other numerical methods. The polynomial refinement of the finite elements is studied for the presented problems in order to offer some evidence as to which finite element simulation is best to use under a variety of circumstances
    corecore