17 research outputs found

    Efficient Multi-Word Compare and Swap

    Get PDF
    Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes required to implement MCAS. We first prove in this paper that it is impossible to "pack" the information required to perform a k-word CAS (k-CAS) in less than k locations to be CASed. Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case. We implement our algorithm and show that it outperforms a state-of-the-art baseline in a variety of benchmarks in most considered workloads. We also present a durably linearizable (persistent memory friendly) version of our MCAS algorithm using only 2 persistence fences per call, while still only requiring k+1 CASes per k-CAS

    CheckFence: Checking Consistency of Concurrent Data Types on Relaxed Memory Models

    Get PDF
    Concurrency libraries can facilitate the development of multithreaded programs by providing concurrent implementations of familiar data types such as queues or sets. There exist many optimized algorithms that can achieve superior performance on multiprocessors by allowing concurrent data accesses without using locks. Unfortunately, such algorithms can harbor subtle concurrency bugs. Moreover, they require memory ordering fences to function correctly on relaxed memory models. To address these difficulties, we propose a verification approach that can exhaustively check all concurrent executions of a given test program on a relaxed memory model and can verify that they are observationally equivalent to a sequential execution. Our Check- Fence prototype automatically translates the C implementation code and the test program into a SAT formula, hands the latter to a standard SAT solver, and constructs counterexample traces if there exist incorrect executions. Applying CheckFence to five previously published algorithms, we were able to (1) find several bugs (some not previously known), and (2) determine how to place memory ordering fences for relaxed memory models

    Automatic contention detection and amelioration for data-intensive operations

    Full text link

    Formal verification of a concurrent binary search tree

    Get PDF
    In this thesis, we formally verify a simplified version of the non-blocking linearizable binary search tree of Ellen et al., which appeared in the Proceedings of the 29th Annual ACM Symposium on Principles of Distributed Computing (pages 131-140), using the PVS specification and verification system. The algorithm and its specification are both modelled as I/O automata. In order to formally verify that the algorithm implements the specification, we show that the algorithm's I/O automaton simulates the specification's. An intermediate I/O automaton is constructed to simplify the simulation proof of linearizability. By showing there is a forward simulation from the algorithm's I/O automaton to the intermediate automaton and there is a backward simulation from the intermediate automaton to the specification's automaton, we formally verify that the algorithm implements its specification. While formalizing the proof, we found small errors in the original proof

    High Level Concurrency in C∀

    Get PDF
    Concurrent programs are notoriously hard to write and even harder to debug. Furthermore concurrent programs must be performant, as the introduction of concurrency into a program is often done to achieve some form of speedup. This thesis presents a suite of high-level concurrent-language features in the new programming language C∀, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs. C∀ is a non-object-oriented programming language that extends C. The foundation for concurrency in C∀ was laid by Thierry Delisle [15], who implemented coroutines, user-level threads, and monitors. This thesis builds upon that work and introduces a suite of new concurrent features as its main contribution. The features include a mutex statement (similar to a C++ scoped lock or Java synchronized statement), Go-like channels and select statement, and an actor system. The root ideas behind these features are not new, but the C∀ implementations extends the original ideas in performance, productivity, and safety

    Software Transactional Memory Building Blocks

    Get PDF
    Exploiting thread-level parallelism has become a part of mainstream programming in recent years. Many approaches to parallelization require threads executing in parallel to also synchronize occassionally (i.e., coordinate concurrent accesses to shared state). Transactional Memory (TM) is a programming abstraction that provides the concept of database transactions in the context of programming languages such as C/C++. This allows programmers to only declare which pieces of a program synchronize without requiring them to actually implement synchronization and tune its performance, which in turn makes TM typically easier to use than other abstractions such as locks. I have investigated and implemented the building blocks that are required for a high-performance, practical, and realistic TM. They host several novel algorithms and optimizations for TM implementations, both for current hardware and future hardware extensions for TM, and are being used in or have influenced commercial TM implementations such as the TM support in GCC

    Categorization And Visualization Of Parallel Programming Systems

    Get PDF
    Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2005Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2005Yükesek kazanımlı programlama olarak da bilinen paralel programlama, bir problemi daha hızlı çözmek için aynı anda birden çok işlemci kullanılmasına denir. Günümüzde, ağır işlemler içeren birçok problem paralel olarak uygulanmaya çalışılmaktadır, buna örnek olarak nehir sularının simüle edilmesi, fizik veya kimya problemleri, astrolojik simülasyonlar verilebilir. Bu tezin amacı, bilimsel hesaplama veya mühendislik amaçlı kullanılan yüksek kazanımlı yazılımları tartışmaktır. Paralel programlama sistemleri ile kastedilen kütüphaneler, diller, derleyiciler, derleyici yönlendiricileri veya bunun dışında kalan, programcının paralel algoritmasını ifade edebileceği yapılardır. Yükesek kazanımlı program tasarımı için programcının dikkat etmesi gereken iki önemli nokta vardır: problemi iyi kavrayıp uygun bir çözüm önermek, doğru sisteme karar verebilmek. Doğru karar verebilmek için kullanıcının sistemler hakkında oldukça iyi bilgiye sahip olması gerekir. Bazen, birden çok yazılım ve donanımı bir arada kullanmak da gerekebilir. Bu tezde var olan paralel programlama sistemleri tanımlanır ve sınıflandırılır, bunun için güncel bildiriler esas alınmıştır. Özellikle algoritmik taslaklar ve fonsiyonel paralel programlama üzerinde durulmuştur.Ayrica güncel bilgileri depolamak ve bir kaynak yaratmak için wiki temelli bir web kaynağı oluşturulmuştur. Sistemlerin grafik gösterimini sağlayıp daha anlaşılır bir sınıflandırma yapabilmek için yeni bir sözdizimi tasarlanıp dinamik ağ çizebilecek webdot aracı ile bir araya getirilerek sistemleri temsil edecek ağı çizecek araç geliştirilmiştir. Bu sözdiziminin öğrenilmesi ve kullanılması son derece kolaydır. Son olarak iki temel paralel programlama tipi, paylaşılan bellek ve mesajlaşma, iki farklı tipte algoritma kullanılarak karşılaştırılmıştır. Programlar OpenMP ve MPI ile gerçeklenmiştir, farklı paralel makinelerde koşturulup sonuçları karşılaştırılmıştır. Paralel makineler için Almanya nın Aachen Üniversitesi nin SMP ağı ve Ulakbim in dağıtık bellekli paralel makineleri kullanılmıştır.Parallel computing, also called high-performance computing, refers to solving problems faster by using multiple processors simultaneously. Nowadays, almost every computationally-intensive problem that one could imagine is tried to be implemented in parallel. This thesis is aimed at discussing high-performance software for scientific or engineering applications. The term parallel programming systems here means libraries, languages, compiler directives or other means through which a programmer can express a parallel algorithm. To design high performance programs, there are two keys for the programmer: to understand the problem and find a solution for parallelization, and to decide on the right system for the implementation, which requires a good knowledge about existing parallel programming systems. The programmer, after having understood the problem, has to choose between many systems, some of which are closely related, whereas others have big differences. This thesis describes and classifies existing parallel programming systems, thus bringing existing surveys up to date. It describes a wiki-based web portal for collecting information about most recent systems, which has been developed as part of the thesis. A special syntax and a visualization tool has been developed. This syntax and tool allow users to have their own categorization scheme. Fourth, it compares two major programming styles message passing and shared memory with two different algorithms in order show performance differences of these styles. Algorithms are implemented in OpenMP and MPI, performance of both programs are measured on the SMP Cluster of Aachen University, Germany and on the Beowulf Cluster of Ulakbim, Ankara.Yüksek LisansM.Sc
    corecore