8,047 research outputs found

    Deterministic Consistency: A Programming Model for Shared Memory Parallelism

    Full text link
    The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "deterministic consistency", a parallel programming model as easy to understand as the "parallel assignment" construct in sequential languages such as Perl and JavaScript, where concurrent threads always read their inputs before writing shared outputs. DC supports common data- and task-parallel synchronization abstractions such as fork/join and barriers, as well as non-hierarchical structures such as producer/consumer pipelines and futures. A preliminary prototype suggests that software-only implementations of DC can run applications written for popular parallel environments such as OpenMP with low (<10%) overhead for some applications.Comment: 7 pages, 3 figure

    Development and implementation of a LabVIEW based SCADA system for a meshed multi-terminal VSC-HVDC grid scaled platform

    Get PDF
    This project is oriented to the development of a Supervisory, Control and Data Acquisition (SCADA) software to control and supervise electrical variables from a scaled platform that represents a meshed HVDC grid employing National Instruments hardware and LabVIEW logic environment. The objective is to obtain real time visualization of DC and AC electrical variables and a lossless data stream acquisition. The acquisition system hardware elements have been configured, tested and installed on the grid platform. The system is composed of three chassis, each inside of a VSC terminal cabinet, with integrated Field-Programmable Gate Arrays (FPGAs), one of them connected via PCI bus to a local processor and the rest too via Ethernet through a switch. Analogical acquisition modules were A/D conversion takes place are inserted into the chassis. A personal computer is used as host, screen terminal and storing space. There are two main access modes to the FPGAs through the real time system. It has been implemented a Scan mode VI to monitor all the grid DC signals and a faster FPGA access mode VI to monitor one converter AC and DC values. The FPGA application consists of two tasks running at different rates and a FIFO has been implemented to communicate between them without data loss. Multiple structures have been tested on the grid platform and evaluated, ensuring the compliance of previously established specifications, such as sampling and scanning rate, screen refreshment or possible data loss. Additionally a turbine emulator was implemented and tested in Labview for further testing

    Hardware Synchronization for Embedded Multi-Core Processors

    Get PDF
    Abstract — Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establishing coherence and consistency for different types of shared memory by hardware means. Also support for point-to-point synchronization between the processor cores is realized implementing different hardware barriers. The practical examinations focus on the logical first step from single- to dual-core systems, using an FPGA-development board with two hard PowerPC processor cores. Best- and worst-case results, together with intensive benchmarking of all synchronization primitives implemented, show the expected superiority of the hardware solutions. It is also shown that dual-ported memory outperforms single-ported memory if the multiple cores use inherent parallelism by locking shared memory more intelligently using an address-sensitive method. I

    Parallelization of a Dynamic Monte Carlo Algorithm: a Partially Rejection-Free Conservative Approach

    Full text link
    We experiment with a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models. The parallel scheme is directly applicable to a wide range of stochastic cellular automata where the discrete events (updates) are Poisson arrivals. For high performance, we utilize a continuous-time, asynchronous parallel version of the n-fold way rejection-free algorithm. Each processing element carries an lxl block of spins, and we employ the fast SHMEM-library routines on the Cray T3E distributed-memory parallel architecture. Different processing elements have different local simulated times. To ensure causality, the algorithm handles the asynchrony in a conservative fashion. Despite relatively low utilization and an intricate relationship between the average time increment and the size of the spin blocks, we find that for sufficiently large l the algorithm outperforms its corresponding parallel Metropolis (non-rejection-free) counterpart. As an example application, we present results for metastable decay in a model ferromagnetic or ferroelectric film, observed with a probe of area smaller than the total system.Comment: 17 pages, 7 figures, RevTex; submitted to the Journal of Computational Physic

    A communication model of broadcast in wormhole-routed networks on-chip

    Get PDF
    This paper presents a novel analytical model to compute communication latency of broadcast as the most fundamental collective communication operation. The novelty of the model lies in its ability to predict the broadcast communication latency in wormhole-routed architectures employing asynchronous multi-port routers scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++

    Survivable algorithms and redundancy management in NASA's distributed computing systems

    Get PDF
    The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given
    corecore