279 research outputs found

    Reliable massively parallel symbolic computing : fault tolerance for a distributed Haskell

    Get PDF
    As the number of cores in manycore systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. Symbolic computation has underpinned key advances in Mathematics and Computer Science, for example in number theory, cryptography, and coding theory. Computer algebra software systems facilitate symbolic mathematics. Developing these at scale has its own distinctive set of challenges, as symbolic algorithms tend to employ complex irregular data and control structures. SymGridParII is a middleware for parallel symbolic computing on massively parallel High Performance Computing platforms. A key element of SymGridParII is a domain specific language (DSL) called Haskell Distributed Parallel Haskell (HdpH). It is explicitly designed for scalable distributed-memory parallelism, and employs work stealing to load balance dynamically generated irregular task sizes. To investigate providing scalable fault tolerant symbolic computation we design, implement and evaluate a reliable version of HdpH, HdpH-RS. Its reliable scheduler detects and handles faults, using task replication as a key recovery strategy. The scheduler supports load balancing with a fault tolerant work stealing protocol. The reliable scheduler is invoked with two fault tolerance primitives for implicit and explicit work placement, and 10 fault tolerant parallel skeletons that encapsulate common parallel programming patterns. The user is oblivious to many failures, they are instead handled by the scheduler. An operational semantics describes small-step reductions on states. A simple abstract machine for scheduling transitions and task evaluation is presented. It defines the semantics of supervised futures, and the transition rules for recovering tasks in the presence of failure. The transition rules are demonstrated with a fault-free execution, and three executions that recover from faults. The fault tolerant work stealing has been abstracted in to a Promela model. The SPIN model checker is used to exhaustively search the intersection of states in this automaton to validate a key resiliency property of the protocol. It asserts that an initially empty supervised future on the supervisor node will eventually be full in the presence of all possible combinations of failures. The performance of HdpH-RS is measured using five benchmarks. Supervised scheduling achieves a speedup of 757 with explicit task placement and 340 with lazy work stealing when executing Summatory Liouville up to 1400 cores of a HPC architecture. Moreover, supervision overheads are consistently low scaling up to 1400 cores. Low recovery overheads are observed in the presence of frequent failure when lazy on-demand work stealing is used. A Chaos Monkey mechanism has been developed for stress testing resiliency with random failure combinations. All unit tests pass in the presence of random failure, terminating with the expected results

    Research in Applied Mathematics, Fluid Mechanics and Computer Science

    Get PDF
    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period October 1, 1998 through March 31, 1999

    The development of a node for a hardware reconfigurable parallel processor

    Get PDF
    This dissertation concerns the design and implementation of a node for a hardware reconfigurable parallel processor. The hardware that was developed allows for the further development of a parallel processor with configurable hardware acceleration. Each node in the system has a standard microprocessor and reconfigurable logic device and has high speed communications channels for inter-node communication. The design of the node provided high-speed serial communications channels allowing the implementation of various network topographies. The node also provided a PCI master interface to provide an external interface and communicate with local nodes on the bus. A high speed RlSC processor provided communication and system control functions and the reconfigurable logic device provided communication interfaces and data processing functions. The node was designed and implemented as a PCI card that interfaced a standard PCI bus. VHDL designs for logic devices that provided system support were developed, VHDL designs for the reconfigurable logic FPGA and software including drivers and system software were written for the node. The 64-bit version Linux operating system was then ported to the processor providing a UNIX environment for the system. The node functioned as specified and parallel and hardware accelerated processing was demonstrated. The hardware acceleration was shown to provide substantial performance benefits for the system

    Performance modelling, analysis and prediction of Spark jobs in Hadoop cluster : a thesis by publications presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Mathematical & Computational Sciences, Massey University, Auckland, New Zealand

    Get PDF
    Big Data frameworks have received tremendous attention from the industry and from academic research over the past decade. The advent of distributed computing frameworks such as Hadoop MapReduce and Spark are powerful frameworks that offer an efficient solution for analysing large-scale datasets running under the Hadoop cluster. Spark has been established as one of the most popular large-scale data processing engines because of its speed, low latency in-memory computation, and advanced analytics. Spark computational performance heavily depends on the selection of suitable parameters, and the configuration of these parameters is a challenging task. Although Spark has default parameters and can deploy applications without much effort, a significant drawback of default parameter selection is that it is not always the best for cluster performance. A major limitation for Spark performance prediction using existing models is that it requires either large input data or system configuration that is time-consuming. Therefore, an analytical model could be a better solution for performance prediction and for establishing appropriate job configurations. This thesis proposes two distinct parallelisation models for performance prediction: the 2D-Plate model and the Fully-Connected Node model. Both models were constructed based on serial boundaries for a certain arrangement of executors and size of the data. In order to evaluate the cluster performance, various HiBench workloads were used, and workload’s empirical data were fitted with the models for performance prediction analysis. The developed models were benchmarked with the existing models such as Amdahl’s, Gustafson, ERNEST, and machine learning. Our experimental results show that the two proposed models can quickly and accurately predict performance in terms of runtime, and they can outperform the accuracy of machine learning models when extrapolating predictions

    Designing a scalable dynamic load -balancing algorithm for pipelined single program multiple data applications on a non-dedicated heterogeneous network of workstations

    Get PDF
    Dynamic load balancing strategies have been shown to be the most critical part of an efficient implementation of various applications on large distributed computing systems. The need for dynamic load balancing strategies increases when the underlying hardware is a non-dedicated heterogeneous network of workstations (HNOW). This research focuses on the single program multiple data (SPMD) programming model as it has been extensively used in parallel programming for its simplicity and scalability in terms of computational power and memory size.;This dissertation formally defines and addresses the problem of designing a scalable dynamic load-balancing algorithm for pipelined SPMD applications on non-dedicated HNOW. During this process, the HNOW parameters, SPMD application characteristics, and load-balancing performance parameters are identified.;The dissertation presents a taxonomy that categorizes general load balancing algorithms and a methodology that facilitates creating new algorithms that can harness the HNOW computing power and still preserve the scalability of the SPMD application.;The dissertation devises a new algorithm, DLAH (Dynamic Load-balancing Algorithm for HNOW). DLAH is based on a modified diffusion technique, which incorporates the HNOW parameters. Analytical performance bound for the worst-case scenario of the diffusion technique has been derived.;The dissertation develops and utilizes an HNOW simulation model to conduct extensive simulations. These simulations were used to validate DLAH and compare its performance to related dynamic algorithms. The simulations results show that DLAH algorithm is scalable and performs well for both homogeneous and heterogeneous networks. Detailed sensitivity analysis was conducted to study the effects of key parameters on performance

    HPCCP/CAS Workshop Proceedings 1998

    Get PDF
    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey

    Transparent resilience for Chapel

    Get PDF
    High-performance systems pose a number of challenges to traditional fault tolerance approaches. The exponential increase of core numbers in large-scale distributed systems exposes the growth of permanent, intermittent, and transient faults. The redundancy schemes in use increase the number of system resources dedicated to recovery, while the extensive use of silent-failure mode inhibits systems’ capability to detect faults that hinder application progress. As parallel computation strives to survive the high failure rates, software shifts focus towards the support of resilience. The thesis proposes a mechanism for resilience support for Chapel, the high performance language developed by Cray. We investigate the potential for embedded transparent resilience, to assist uninterrupted program completion on distributed hardware, in the event of component failures. Our goal is to achieve graceful degradation; continued application execution when nodes in the system suffer fatal failures. We aim to provide a resilience-enabled version of the language, without application code modifications. We focus on Chapel’s task- and data-parallel constructs, and enhance their functionality with mechanisms to support resilience. In particular, we build on existing language constructs that facilitate parallel execution in Chapel. We focus on constructs that introduce unstructured and structured parallelism and constructs that introduce locality, as derived by the Partitioned Global Address Space programming model. Furthermore, we expand the resilient support to cover data distributions on library-level. The core implementation is on the runtime level, primarily on Chapels tasking and communication layers; we introduce mechanisms to support automatic task adoption and recovery by guiding the control to perform task re-execution. On the data-parallel track, we propose a resilience enabled version of the Block data distribution module. We develop an in-memory data redundancy mechanism, exploiting Chapel’s concept of locales. We apply the concept of buddy locales, as the primary means to store data redundantly and adopt remote workload from failed locales. We evaluate our resilient task-parallel mechanism with respect to the overheads introduced by embedded resilience. We use a set of constructed micro-benchmarks to evaluate the resilient task-parallel implementation, while for the evaluation of resilient data-parallelism we demonstrate results on the STREAM triad benchmark and the N-body all-pairs algorithm, on a 32-node Beowulf cluster. In order to assist the evaluation, we develop an error injection interface to simulate node failures.Heriot-Watt University James Watt Scholarshi

    Adaptive structured parallelism

    Get PDF
    Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and interaction. Parallel programs are expressed by interweaving parameterised skeletons analogously to the way in which structured sequential programs are developed, using well-defined constructs. Skeletons provide top-down design composition and control inheritance throughout the program structure. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique which allows the conceptual description of parallel programs whilst fostering platform independence and algorithm abstraction. By decoupling the algorithm specification from machine-dependent structural considerations, structured parallelism allows programmers to code programs regardless of how the computation and communications will be executed in the system platform.Meanwhile, large non-dedicated multiprocessing systems have long posed a challenge to known distributed systems programming techniques as a result of the inherent heterogeneity and dynamism of their resources. Scant research has been devoted to the use of structural information provided by skeletons in adaptively improving program performance, based on resource utilisation. This thesis presents a methodology to improve skeletal parallel programming in heterogeneous distributed systems by introducing adaptivity through resource awareness. As we hypothesise that a skeletal program should be able to adapt to the dynamic resource conditions over time using its structural forecasting information, we have developed ASPara: Adaptive Structured Parallelism. ASPara is a generic methodology to incorporate structural information at compilation into a parallel program, which will help it to adapt at execution

    CSP for Executable Scientific Workflows

    Get PDF

    Analytical Modeling of High Performance Reconfigurable Computers: Prediction and Analysis of System Performance.

    Get PDF
    The use of a network of shared, heterogeneous workstations each harboring a Reconfigurable Computing (RC) system offers high performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without the knowledge of the system’s performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far account for the addition of Reconfigurable Computing systems. This dissertation develops and validates an analytic performance modeling methodology for a class of fork-join algorithms executing on a High Performance Reconfigurable Computing (HPRC) platform. The model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message passing communication, and processor heterogeneity. Three fork-join class of applications, a Boolean Satisfiability Solver, a Matrix-Vector Multiplication algorithm, and an Advanced Encryption Standard algorithm are used to validate the model with homogeneous and simulated heterogeneous workstations. A synthetic load is used to validate the model under various loading conditions including simulating heterogeneity by making some workstations appear slower than others by the use of background loading. The performance modeling methodology proves to be accurate in characterizing the effects of reconfigurable devices, application load imbalance, background user load and heterogeneity for applications running on shared, homogeneous and heterogeneous HPRC resources. The model error in all cases was found to be less than five percent for application runtimes greater than thirty seconds and less than fifteen percent for runtimes less than thirty seconds. The performance modeling methodology enables us to characterize applications running on shared HPRC resources. Cost functions are used to impose system usage policies and the results of vii the modeling methodology are utilized to find the optimal (or near-optimal) set of workstations to use for a given application. The usage policies investigated include determining the computational costs for the workstations and balancing the priority of the background user load with the parallel application. The applications studied fall within the Master-Worker paradigm and are well suited for a grid computing approach. A method for using NetSolve, a grid middleware, with the model and cost functions is introduced whereby users can produce optimal workstation sets and schedules for Master-Worker applications running on shared HPRC resources
    corecore