8,047 research outputs found
Deterministic Consistency: A Programming Model for Shared Memory Parallelism
The difficulty of developing reliable parallel software is generating
interest in deterministic environments, where a given program and input can
yield only one possible result. Languages or type systems can enforce
determinism in new code, and runtime systems can impose synthetic schedules on
legacy parallel code. To parallelize existing serial code, however, we would
like a programming model that is naturally deterministic without language
restrictions or artificial scheduling. We propose "deterministic consistency",
a parallel programming model as easy to understand as the "parallel assignment"
construct in sequential languages such as Perl and JavaScript, where concurrent
threads always read their inputs before writing shared outputs. DC supports
common data- and task-parallel synchronization abstractions such as fork/join
and barriers, as well as non-hierarchical structures such as producer/consumer
pipelines and futures. A preliminary prototype suggests that software-only
implementations of DC can run applications written for popular parallel
environments such as OpenMP with low (<10%) overhead for some applications.Comment: 7 pages, 3 figure
Development and implementation of a LabVIEW based SCADA system for a meshed multi-terminal VSC-HVDC grid scaled platform
This project is oriented to the development of a Supervisory, Control and Data Acquisition
(SCADA) software to control and supervise electrical variables from a scaled platform that
represents a meshed HVDC grid employing National Instruments hardware and LabVIEW logic
environment. The objective is to obtain real time visualization of DC and AC electrical variables
and a lossless data stream acquisition.
The acquisition system hardware elements have been configured, tested and installed on the
grid platform. The system is composed of three chassis, each inside of a VSC terminal cabinet,
with integrated Field-Programmable Gate Arrays (FPGAs), one of them connected via PCI bus
to a local processor and the rest too via Ethernet through a switch. Analogical acquisition
modules were A/D conversion takes place are inserted into the chassis. A personal computer is
used as host, screen terminal and storing space.
There are two main access modes to the FPGAs through the real time system. It has been
implemented a Scan mode VI to monitor all the grid DC signals and a faster FPGA access mode
VI to monitor one converter AC and DC values. The FPGA application consists of two tasks
running at different rates and a FIFO has been implemented to communicate between them
without data loss.
Multiple structures have been tested on the grid platform and evaluated, ensuring the
compliance of previously established specifications, such as sampling and scanning rate, screen
refreshment or possible data loss.
Additionally a turbine emulator was implemented and tested in Labview for further testing
Hardware Synchronization for Embedded Multi-Core Processors
Abstract — Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establishing coherence and consistency for different types of shared memory by hardware means. Also support for point-to-point synchronization between the processor cores is realized implementing different hardware barriers. The practical examinations focus on the logical first step from single- to dual-core systems, using an FPGA-development board with two hard PowerPC processor cores. Best- and worst-case results, together with intensive benchmarking of all synchronization primitives implemented, show the expected superiority of the hardware solutions. It is also shown that dual-ported memory outperforms single-ported memory if the multiple cores use inherent parallelism by locking shared memory more intelligently using an address-sensitive method. I
Parallelization of a Dynamic Monte Carlo Algorithm: a Partially Rejection-Free Conservative Approach
We experiment with a massively parallel implementation of an algorithm for
simulating the dynamics of metastable decay in kinetic Ising models. The
parallel scheme is directly applicable to a wide range of stochastic cellular
automata where the discrete events (updates) are Poisson arrivals. For high
performance, we utilize a continuous-time, asynchronous parallel version of the
n-fold way rejection-free algorithm. Each processing element carries an lxl
block of spins, and we employ the fast SHMEM-library routines on the Cray T3E
distributed-memory parallel architecture. Different processing elements have
different local simulated times. To ensure causality, the algorithm handles the
asynchrony in a conservative fashion. Despite relatively low utilization and an
intricate relationship between the average time increment and the size of the
spin blocks, we find that for sufficiently large l the algorithm outperforms
its corresponding parallel Metropolis (non-rejection-free) counterpart. As an
example application, we present results for metastable decay in a model
ferromagnetic or ferroelectric film, observed with a probe of area smaller than
the total system.Comment: 17 pages, 7 figures, RevTex; submitted to the Journal of
Computational Physic
A communication model of broadcast in wormhole-routed networks on-chip
This paper presents a novel analytical model to compute communication latency of broadcast as the most fundamental collective communication operation. The novelty of the model lies in its ability to predict the broadcast communication latency in wormhole-routed architectures employing asynchronous multi-port routers scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++
Survivable algorithms and redundancy management in NASA's distributed computing systems
The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given
- …