
    A Simple Snapshot Algorithm for Multicore Systems

    An atomic snapshot object is an object that can be concurrently accessed by n asynchronous processes prone to crash. It is made of m components (base atomic registers) and is defined by two operations: an update operation that allows a process to atomically assign a new value to a component, and a snapshot operation that atomically reads and returns the values of all the components. To cope with the combined effects of concurrency, asynchrony, and failures, the algorithm implementing the update operation has to help concurrent snapshot operations so that they can always terminate. This paper presents a new and particularly simple construction of a snapshot object. This construction relies on a new principle, a strategy we call "write first, help later". This strategy directs an update operation to first write its value and only then compute a helping snapshot value that a snapshot operation can use in order to terminate. Interestingly, not only are the algorithms implementing the snapshot and update operations simple and easy to prove, but they are also efficient in terms of the number of accesses to the underlying atomic registers shared by the processes: an operation costs O(m) in the best case and O(nm) in the worst case.
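The update/snapshot interface and the classic double-collect retry pattern can be sketched as follows. This is an illustrative, sequential Python sketch under assumed names, not the paper's wait-free construction; in particular, the "write first, help later" helping snapshot is only indicated by a comment, not implemented.

```python
class SnapshotObject:
    """Sketch of an m-component snapshot object interface.

    Illustrative only: a real wait-free implementation attaches helping
    information to each register so that snapshots always terminate."""

    def __init__(self, m):
        # each component holds (value, sequence_number)
        self.regs = [(None, 0) for _ in range(m)]

    def update(self, i, value):
        # "write first": publish the new value immediately; the paper's
        # algorithm would only afterwards compute a helping snapshot
        _, seq = self.regs[i]
        self.regs[i] = (value, seq + 1)

    def _collect(self):
        return list(self.regs)

    def snapshot(self):
        # double collect: two identical successive collects prove that
        # the returned values all coexisted at some instant in between
        while True:
            c1 = self._collect()
            c2 = self._collect()
            if c1 == c2:
                return [value for value, _ in c1]
```

The sequence numbers are what make the double collect sound: a register can return to an old value, but never to an old (value, sequence) pair.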

    On the Consistency Conditions of Transactional Memories

    The aim of a Software Transactional Memory (STM) is to relieve programmers of the management of synchronization in multiprocess programs that access concurrent objects. To that end, an STM system provides the programmer with the concept of a transaction: each sequential process is decomposed into transactions, where a transaction encapsulates a piece of sequential code accessing concurrent objects. A transaction contains no explicit synchronization statement and appears to have been executed atomically. Due to the underlying concurrency management, a transaction either commits or aborts. Up to now, few papers have focused on the definition of consistency conditions suited to STM systems. One of them has recently proposed the opacity consistency condition. Opacity involves all the transactions (i.e., the committed plus the aborted transactions). It requires that (1) until it aborts (if ever it does) a transaction sees a consistent global state of the concurrent objects, and (2) the execution is linearizable (i.e., it could have been produced by a sequential execution, of the same transactions, that respects the real-time order on the non-concurrent transactions). This paper is on consistency conditions for transactional memories. It first presents a framework for defining a space of consistency conditions whose extreme endpoints are serializability and opacity. It then extracts from this framework a new consistency condition that we call virtual world consistency. This condition ensures that (1) each transaction (committed or aborted) reads values from a consistent global state, (2) the consistent global states read by committed transactions are mutually consistent, but (3) the consistent global states read by aborted transactions are not required to be mutually consistent. Interestingly enough, this consistency condition can benefit many STM applications since, from its local point of view, a transaction cannot distinguish it from opacity.
Finally, the paper presents and proves correct an STM algorithm that implements the virtual world consistency condition. Interestingly, this algorithm distinguishes the serialization date of a transaction from its commit date (thereby allowing more transactions to commit).
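The requirement that every transaction, even one that later aborts, only ever observes a consistent global state can be illustrated with a toy version-validation scheme. This is a generic sketch with hypothetical names, in the spirit of incremental read validation, and not the algorithm proposed in the paper.

```python
class Aborted(Exception):
    """Raised when a transaction would observe an inconsistent state."""


class TMObject:
    """A shared object whose version number is bumped on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.read_set = {}  # object -> version observed by this transaction

    def read(self, obj):
        # validate earlier reads: if any object read so far has been
        # overwritten in the meantime, the snapshot of the world this
        # transaction has seen is no longer consistent, so it aborts
        # rather than return a value from a different global state
        for o, seen in self.read_set.items():
            if o.version != seen:
                raise Aborted()
        self.read_set[obj] = obj.version
        return obj.value
```

Under such a scheme a transaction, committed or aborted, can never tell that anything weaker than opacity is in force, which is the observation the paper exploits.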

    Factores de rendimiento en entornos multicore

    This work presents a research study on the detection of factors that affect performance in multicore environments. Due to the wide variety of multicore architectures, we have defined a bounded framework, consisting of a specific architecture, a programming model based on data parallelism, and Single Program Multiple Data applications. Having defined the framework, we evaluate the performance factors, with special attention to the programming model. For this reason, we have analyzed the thread library and the OpenMP API to detect those functions that are candidates for tuning, allowing the application to adapt its behavior to the computing environment; used properly, they improve application performance.

    Help when needed, but no more: Efficient Read/Write Partial Snapshot


    The Investigation of Efficiency of Physical Phenomena Modelling Using Differential Equations on Distributed Systems

    This work is dedicated to the development of mathematical modelling software. In this dissertation, numerical methods and algorithms are investigated in the context of software construction. When applying a numerical method, it is important to take into account the limited computer resources, the architecture of those resources, and how the methods affect software robustness. The three main aspects of this investigation are that a software implementation must be efficient, must be robust, and must be able to exploit specific hardware resources. The hardware specificity in this work relates to distributed computations of different types: a single CPU with multiple cores, multiple CPUs with multiple cores, and a highly parallel multithreaded GPU device. The investigation proceeds in three directions: GPU usage for 3D FDTD calculations, use of the FVM method to implement efficient calculations for a very specific heat-transfer problem, and development of special software techniques for a specific bacteria self-organization problem whose results are sensitive to the numerical methods, the initial data, and even computer round-off errors. All these directions are dedicated to creating correct technological components that make a software implementation robust and efficient. A time-prediction model for 3D FDTD calculations is proposed, which makes it possible to evaluate the efficiency of different GPUs; a reasonable speedup of the GPU over the CPU is obtained. For the FVM implementation, the OpenFOAM open-source software is selected as the basis for the calculations, and several algorithms and their modifications are proposed to solve efficiency issues. The FVM parallel solver is implemented, analyzed, and adapted to the heterogeneous cluster Vilkas. To create robust software for the simulation of bacteria self-organization, mathematically robust methods are applied, the results are analyzed, and the algorithm is modified for parallel computations.
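For the FDTD direction, the computational core is a leapfrog update of interleaved field arrays. A minimal 1D version, in normalized units with fixed boundary values and hypothetical names, conveys the stencil that the thesis applies over large 3D arrays on GPUs:

```python
def fdtd_1d_step(ez, hy, ce=1.0, ch=1.0):
    """One leapfrog step of a 1D FDTD (Yee) update in normalized units.

    ez has n nodes and hy has n-1 cells between them; the outermost ez
    values are held fixed (a crude perfectly-conducting boundary).
    Illustrative sketch only, not the thesis's 3D GPU implementation."""
    # half step: update the magnetic field from the electric field
    for i in range(len(hy)):
        hy[i] += ch * (ez[i + 1] - ez[i])
    # half step: update the interior electric field from the magnetic field
    for i in range(1, len(ez) - 1):
        ez[i] += ce * (hy[i] - hy[i - 1])
    return ez, hy
```

Because every array element depends only on its immediate neighbors, each of the two loops parallelizes trivially, which is what makes FDTD a natural fit for GPUs.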

    Heterogeneous multicore systems for signal processing

    This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk-side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing "supercomputing to the masses", this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for electroencephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds.
In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to GPU acceleration, it can meet real-time requirements at unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.
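Compressed triangular storage in general keeps only the nonzero triangle in a flat array instead of the full square matrix. The following generic row-major packed layout, a textbook scheme and not necessarily the GPU-friendly one devised in the thesis, shows the idea:

```python
def packed_index(i, j):
    """Flat index of entry (i, j), with j <= i, of a lower-triangular
    matrix stored row by row: rows 0..i-1 occupy 1 + 2 + ... + i =
    i*(i+1)/2 slots, so row i starts at that offset."""
    assert 0 <= j <= i
    return i * (i + 1) // 2 + j


def pack_lower(a):
    """Pack the lower triangle of a square matrix into a flat list,
    roughly halving storage compared to the full n*n layout."""
    return [a[i][j] for i in range(len(a)) for j in range(i + 1)]
```

GPU-friendly schemes typically depart from this simple layout to keep memory accesses coalesced across threads, but the space saving is the same.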