8,047 research outputs found

Deterministic Consistency: A Programming Model for Shared Memory Parallelism

Author: Aviram Amittai
Ford Bryan
Publication date: 01/01/2010

The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "deterministic consistency" (DC), a parallel programming model as easy to understand as the "parallel assignment" construct in sequential languages such as Perl and JavaScript, where concurrent threads always read their inputs before writing shared outputs. DC supports common data- and task-parallel synchronization abstractions such as fork/join and barriers, as well as non-hierarchical structures such as producer/consumer pipelines and futures. A preliminary prototype suggests that software-only implementations of DC can run applications written for popular parallel environments such as OpenMP with low (<10%) overhead for some applications. Comment: 7 pages, 3 figures
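The "read inputs before writing shared outputs" idea can be illustrated with a small Python sketch. This is not the paper's implementation or API: `run_deterministic` is a hypothetical helper, the shared state is assumed to be a plain dict, and treating two tasks that change the same key as an error is an assumption of this sketch. Each task works on a private copy of a snapshot taken at the fork, and its writes are merged only at the join, so the result does not depend on thread timing.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_deterministic(shared, tasks):
    """Fork/join sketch: every task sees the same snapshot, writes merge at join.

    Hypothetical helper for illustration only. A "write" is detected as a key
    whose value differs from the snapshot; key deletions are not handled.
    """
    snapshot = copy.deepcopy(shared)

    def run(task):
        private = copy.deepcopy(snapshot)  # the task never sees other tasks' writes
        task(private)
        return private

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, tasks))  # map() keeps the task order

    merged = dict(snapshot)
    writers = {}
    for index, private in enumerate(results):
        for key, value in private.items():
            if key not in snapshot or snapshot[key] != value:
                if key in writers:
                    # Assumption of this sketch: conflicting writes are an error,
                    # never resolved by whichever thread happened to finish last.
                    raise RuntimeError(
                        f"conflicting writes to {key!r} by tasks "
                        f"{writers[key]} and {index}"
                    )
                writers[key] = index
                merged[key] = value
    return merged


def set_x_from_y(state):
    state["x"] = state["y"]


def set_y_from_x(state):
    state["y"] = state["x"]


# Behaves like the parallel assignment `x, y = y, x`: both tasks read the old values.
result = run_deterministic({"x": 1, "y": 2}, [set_x_from_y, set_y_from_x])
```

With ordinary shared-memory threads the two tasks above would race, and the outcome could be a swap or two equal values depending on the schedule. Here both tasks read the pre-fork values, so the outcome is always the swap.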

arXiv.org e-Print Archive

CiteSeerX

Development and implementation of a LabVIEW based SCADA system for a meshed multi-terminal VSC-HVDC grid scaled platform

Author: Mora Comas Gisela
Publication venue: Universitat Politècnica de Catalunya
Publication date: 05/10/2016

This project develops Supervisory Control and Data Acquisition (SCADA) software to control and supervise the electrical variables of a scaled platform that represents a meshed HVDC grid, using National Instruments hardware and the LabVIEW environment. The objective is real-time visualization of DC and AC electrical variables and lossless acquisition of the data stream. The acquisition hardware has been configured, tested and installed on the grid platform. The system is composed of three chassis, each inside a VSC terminal cabinet, with integrated Field-Programmable Gate Arrays (FPGAs); one is connected via PCI bus to a local processor and the other two via Ethernet through a switch. Analog acquisition modules, where A/D conversion takes place, are inserted into the chassis. A personal computer is used as host, display terminal and storage. There are two main modes of access to the FPGAs through the real-time system: a Scan mode VI has been implemented to monitor all the grid DC signals, and a faster FPGA access mode VI to monitor the AC and DC values of one converter. The FPGA application consists of two tasks running at different rates, and a FIFO has been implemented to communicate between them without data loss. Multiple structures have been tested and evaluated on the grid platform to verify compliance with previously established specifications, such as sampling and scanning rate, screen refresh rate and possible data loss. Additionally, a turbine emulator was implemented and tested in LabVIEW for further testing.
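The two-rate FIFO arrangement described above can be sketched in Python as a software analogy. The project itself uses LabVIEW FPGA VIs, not this code, and every name here (`acquire`, `log_batches`, `run_pipeline`, the depth and batch sizes) is hypothetical. The sketch assumes a bounded queue whose `put()` blocks when full, so the fast task is held back rather than dropping samples.

```python
import queue
import threading


def acquire(fifo, n_samples):
    """Fast task: push one sample per iteration; put() blocks while the FIFO is full."""
    for sample in range(n_samples):
        fifo.put(sample)
    fifo.put(None)  # sentinel marking the end of the stream


def log_batches(fifo, batch_size, out):
    """Slow task: each iteration drains up to batch_size samples into one batch."""
    done = False
    while not done:
        batch = []
        while len(batch) < batch_size:
            item = fifo.get()
            if item is None:
                done = True
                break
            batch.append(item)
        if batch:
            out.append(batch)


def run_pipeline(n_samples=1000, depth=64, batch_size=50):
    """Connect the two tasks through a bounded FIFO and return the logged batches."""
    fifo = queue.Queue(maxsize=depth)
    batches = []
    producer = threading.Thread(target=acquire, args=(fifo, n_samples))
    consumer = threading.Thread(target=log_batches, args=(fifo, batch_size, batches))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return batches


batches = run_pipeline()
samples = [s for batch in batches for s in batch]
```

Flattening the batches and comparing them with the produced sequence is one simple way to check for data loss. On real acquisition hardware the producer cannot be paused this way, so the FIFO depth has to be sized for the worst-case gap between consumer iterations.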

UPCommons. Portal del coneixement obert de la UPC

Hardware Synchronization for Embedded Multi-Core Processors

Author: Haase Jan
Liccardi Benito
Schoeberl Martin
Stoif Christian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Multi-core processors are about to conquer embedded systems: the question is not whether they are coming but how the architectures of the microcontrollers should look given the strict requirements of the field. We present the step from one to multiple cores in this paper, establishing coherence and consistency for different types of shared memory by hardware means. Support for point-to-point synchronization between the processor cores is also realized by implementing different hardware barriers. The practical examinations focus on the logical first step from single- to dual-core systems, using an FPGA development board with two hard PowerPC processor cores. Best- and worst-case results, together with intensive benchmarking of all synchronization primitives implemented, show the expected superiority of the hardware solutions. It is also shown that dual-ported memory outperforms single-ported memory if the multiple cores use inherent parallelism by locking shared memory more intelligently using an address-sensitive method.

CiteSeerX

Crossref

Online Research Database In Technology

Parallelization of a Dynamic Monte Carlo Algorithm: a Partially Rejection-Free Conservative Approach

Author: Korniss G.
Novotny M.A.
Rikvold P.A.
Publication venue: Elsevier BV
Publication date: 21/12/1998

We experiment with a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models. The parallel scheme is directly applicable to a wide range of stochastic cellular automata where the discrete events (updates) are Poisson arrivals. For high performance, we utilize a continuous-time, asynchronous parallel version of the n-fold way rejection-free algorithm. Each processing element carries an l × l block of spins, and we employ the fast SHMEM-library routines on the Cray T3E distributed-memory parallel architecture. Different processing elements have different local simulated times. To ensure causality, the algorithm handles the asynchrony in a conservative fashion. Despite relatively low utilization and an intricate relationship between the average time increment and the size of the spin blocks, we find that for sufficiently large l the algorithm outperforms its corresponding parallel Metropolis (non-rejection-free) counterpart. As an example application, we present results for metastable decay in a model ferromagnetic or ferroelectric film, observed with a probe of area smaller than the total system. Comment: 17 pages, 7 figures, RevTex; submitted to the Journal of Computational Physics
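For readers unfamiliar with rejection-free dynamics, the following Python sketch shows one serial, continuous-time update step. It is a simplification and not the paper's algorithm: it assumes a periodic one-dimensional chain with Glauber flip rates and no external field, it recomputes all rates at every step instead of keeping the spin-class bookkeeping of the n-fold way, and it has no parallel or conservative-synchronization part. The function names and parameters are hypothetical.

```python
import math
import random


def flip_rate(spins, i, beta, coupling=1.0):
    """Glauber flip rate of spin i in a periodic 1D Ising chain (assumed model)."""
    n = len(spins)
    # Energy change if spin i is flipped, for H = -coupling * sum of s_i * s_(i+1).
    delta_e = 2.0 * coupling * spins[i] * (spins[(i - 1) % n] + spins[(i + 1) % n])
    return 1.0 / (1.0 + math.exp(beta * delta_e))


def rejection_free_step(spins, t, beta, rng):
    """Flip exactly one spin and return the advanced simulated time.

    Every step performs a flip (no rejected proposals). The flipped spin is
    chosen with probability proportional to its rate, and the waiting time is
    exponential with mean 1 / (total rate), as for Poisson arrivals.
    """
    rates = [flip_rate(spins, i, beta) for i in range(len(spins))]
    total = sum(rates)
    i = rng.choices(range(len(spins)), weights=rates)[0]
    spins[i] = -spins[i]
    return t + rng.expovariate(total)


rng = random.Random(0)
spins = [1] * 16
before = list(spins)
t_after = rejection_free_step(spins, 0.0, beta=1.0, rng=rng)
```

At low temperature most spins have tiny flip rates, so a Metropolis-style scheme would reject most proposals. The rejection-free step instead spends one update per actual flip and accounts for the waiting by a larger time increment.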

arXiv.org e-Print Archive

Crossref

A communication model of broadcast in wormhole-routed networks on-chip

Author: Moadeli M.
Vanderbauwhede W.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

This paper presents a novel analytical model for computing the communication latency of broadcast, the most fundamental collective communication operation. The novelty of the model lies in its ability to predict broadcast latency in wormhole-routed architectures that employ an asynchronous multi-port router scheme. The model is applied to the Quarc NoC, and its validity is verified by comparing the model's predictions against results obtained from a discrete-event simulator developed using OMNeT++.

CiteSeerX

Crossref

Enlighten

Survivable algorithms and redundancy management in NASA's distributed computing systems

Author: Malek Miroslaw

The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given

NASA Technical Reports Server