29 research outputs found

    LIPIcs

    Union-Find (or Disjoint-Set Union) is one of the fundamental problems in computer science; it has been well studied from both theoretical and practical perspectives in the sequential case. Recently, there has been mounting interest in analyzing this problem in the concurrent setting, and several asymptotically efficient algorithms have been proposed. Yet, to date, very little is known about the practical performance of concurrent Union-Find. This work addresses this gap. We evaluate and analyze the performance of several concurrent Union-Find algorithms and optimization strategies across a wide range of platforms (Intel, AMD, and ARM) and workloads (social, random, and road networks, as well as integrations into more complex algorithms). We first observe that, due to the limited computational cost, the number of induced cache misses is the critical factor determining the performance of existing algorithms. We introduce new techniques to reduce this cost by storing node priorities implicitly and by using plain reads and writes in a way that does not affect the correctness of the algorithms. Finally, we show that Union-Find implementations are an interesting application for Transactional Memory (TM): one of the fastest algorithm variants we discovered is a sequential one that uses coarse-grained locking with the lock elision optimization to reduce synchronization cost and increase scalability.
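    As a reference point, the classic sequential union-find with union by rank and path halving can be sketched as follows. This is an illustrative baseline only; the paper's concurrent variants, implicit priority storage, and lock-elision optimizations are not reproduced here.

```python
class UnionFind:
    """Sequential disjoint-set union: union by rank + path halving."""

    def __init__(self, n):
        self.parent = list(range(n))  # each node starts as its own root
        self.rank = [0] * n           # rank bounds the tree height

    def find(self, x):
        # Path halving: each visited node is pointed at its grandparent,
        # flattening the tree and reducing cache misses on later finds.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False              # already in the same set
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # attach the shorter tree under the taller
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

    The concurrent algorithms the paper studies replace these plain reads and writes with carefully placed atomic operations; the data layout above is the part they share.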

    Kinetics of substrate recognition and cleavage by human 8-oxoguanine-DNA glycosylase

    Human 8-oxoguanine-DNA glycosylase (hOgg1) excises 8-oxo-7,8-dihydroguanine (8-oxoG) from damaged DNA. We report a pre-steady-state kinetic analysis of the hOgg1 mechanism using stopped-flow and enzyme fluorescence monitoring. The kinetic scheme for hOgg1 processing an 8-oxoG:C-containing substrate was found to include at least three fast equilibrium steps followed by two slow, irreversible steps and another equilibrium step. The second irreversible step was rate-limiting overall. By comparing the enzyme's intrinsic fluorescence traces with the accumulation of different product types, the irreversible steps were attributed to the two main chemical steps of the Ogg1-catalyzed reaction: cleavage of the N-glycosidic bond of the damaged nucleotide and β-elimination of its 3′-phosphate. The fast equilibrium steps were attributed to enzyme conformational changes during the recognition of 8-oxoG, and the final equilibrium to binding of the reaction product by the enzyme. hOgg1 interacted with a substrate containing an aldehydic AP site very slowly, but the addition of 8-bromoguanine (8-BrG) greatly accelerated the reaction, which was best described by two initial equilibrium steps followed by one irreversible chemical step and a final product-release equilibrium step. The irreversible step may correspond to β-elimination, since that is precisely the step facilitated by 8-BrG.
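    The kinetic scheme described above can be rendered schematically as follows; the intermediate labels and rate constants are generic placeholders, not the paper's fitted parameters. Three fast equilibria precede two irreversible chemical steps, followed by a product-binding equilibrium:

```latex
\mathrm{E} + \mathrm{S}
  \rightleftharpoons (\mathrm{ES})_1
  \rightleftharpoons (\mathrm{ES})_2
  \rightleftharpoons (\mathrm{ES})_3
  \xrightarrow{\;k_4\;} (\mathrm{ES})_4
  \xrightarrow{\;k_5\;} \mathrm{E}{\cdot}\mathrm{P}
  \rightleftharpoons \mathrm{E} + \mathrm{P}
```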

    Experimental Study and Mathematical Modeling of the Processes Occurring in ZrN Coating/Silumin Substrate Systems under Pulsed Electron Beam Irradiation

    This paper presents a study of a combined modification of silumin, which included deposition of a ZrN coating on a silumin substrate and subsequent treatment of the coating/substrate system with a submillisecond pulsed electron beam. The local temperature on the samples in the electron-beam-affected zone and the thickness of the melt zone were measured experimentally and calculated using a theoretical model. The Stefan problem was solved numerically for the fast heating of bare and ZrN-coated silumin under intense electron beam irradiation. Time variations of the temperature field, the position of the crystallization front, and the speed of the front movement have been calculated. It was found that when the coating thickness was increased from 0.5 to 2 μm, the surface temperature of the samples increased from 760 to 1070 °C, the rise rate of the surface temperature increased from 6×10⁷ to 9×10⁷ K/s, and the melt depth was no more than 57 μm. The speed of the melt front during the pulse was 3×10⁵ μm/s. Good agreement was observed between the experimental and theoretical values of the temperature characteristics and melt zone thickness.
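    The kind of calculation described, fast surface heating with a moving melt front, can be illustrated with a toy 1D enthalpy-method model. All material and beam parameters below are rough, assumed values chosen for illustration only; they are not the paper's model, geometry, or data.

```python
# Toy 1D Stefan-type problem: pulsed surface heat flux into a slab, solved
# with an explicit enthalpy method (temperature plateaus at Tm while a cell
# absorbs latent heat). All constants are rough assumed Al-Si-like values.
N, dx = 50, 2e-6             # 50 cells of 2 um (100 um slab)
rho_c = 2.43e6               # volumetric heat capacity, J/(m^3 K)  (assumed)
k = 150.0                    # thermal conductivity, W/(m K)        (assumed)
L = 1.08e9                   # volumetric latent heat, J/m^3        (assumed)
Tm, T0 = 850.0, 300.0        # melting and initial temperature, K
q = 2e9                      # assumed beam heat flux, W/m^2
pulse = 1e-4                 # 100 us pulse duration
dt = 0.4 * dx * dx * rho_c / k   # explicit-scheme stability limit

def temp(H):
    """Enthalpy -> temperature: sensible heat with a plateau at Tm."""
    if H < rho_c * Tm:
        return H / rho_c
    if H <= rho_c * Tm + L:
        return Tm                # cell is partially molten
    return (H - L) / rho_c

H = [rho_c * T0] * N
t = 0.0
while t < pulse:
    T = [temp(h) for h in H]
    Hn = H[:]
    for i in range(N):
        Tl = T[i - 1] if i > 0 else T[i]      # zero-flux at both faces;
        Tr = T[i + 1] if i < N - 1 else T[i]  # beam flux added separately
        Hn[i] += dt * k * (Tl - 2 * T[i] + Tr) / (dx * dx)
    Hn[0] += dt * q / dx                      # beam energy enters surface cell
    H, t = Hn, t + dt

melt_depth = sum(1 for h in H if h > rho_c * Tm) * dx  # cells at least partly molten
```

    With these assumed numbers the surface superheats past Tm and the melt front stops well inside the slab, qualitatively matching the regime the paper reports; the real model also includes the ZrN coating layer and crystallization on cooling, which this sketch omits.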

    Efficiency guarantees for parallel incremental algorithms under relaxed schedulers

    Several classic problems in graph processing and computational geometry are solved via incremental algorithms, which split the computation into a series of small tasks acting on shared state that gets updated progressively. While the sequential variant of such algorithms usually specifies a fixed (but sometimes random) order in which the tasks should be performed, a standard approach to parallelizing such algorithms is to relax this constraint and allow out-of-order parallel execution. This is the case for parallel implementations of Dijkstra's single-source shortest-paths (SSSP) algorithm, and for parallel Delaunay mesh triangulation. While many software frameworks parallelize incremental computation in this way, it is still not well understood whether this relaxed ordering approach can provide any complexity guarantees. In this paper, we address this problem, and analyze the efficiency guarantees provided by a range of incremental algorithms when parallelized via relaxed schedulers. We show that, for algorithms such as Delaunay mesh triangulation and sorting by insertion, a scheduler with relaxation factor k, measured in terms of the maximum priority inversion allowed, introduces at most O(log n · poly(k)) wasted work, where n is the number of tasks to be executed. For SSSP, we show that the additional work is O(poly(k) · dmax/wmin), where dmax is the maximum distance between two nodes, and wmin is the minimum such distance. In practical settings where n ≫ k, this suggests that the overheads of relaxation will be outweighed by the improved scalability of the relaxed scheduler. On the negative side, we provide lower bounds showing that certain algorithms will inherently incur a non-trivial amount of wasted work due to scheduler relaxation, even for relatively benign relaxed schedulers.
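    The relaxed-scheduling setting can be illustrated with a toy simulation: a scheduler that pops uniformly among the k smallest queued tasks, counting stale pops as wasted work. This is only a sketch of the setting, not the paper's formal model; the function name and counting convention are invented for illustration.

```python
import random

def relaxed_sssp(graph, src, k, rng):
    """Label-correcting SSSP under a k-relaxed scheduler.

    The scheduler pops uniformly among the k smallest queued tasks
    instead of the strict minimum (k = 1 is plain Dijkstra with lazy
    deletion). Pops of stale entries are counted as wasted work.
    """
    dist = {v: float("inf") for v in graph}
    dist[src] = 0
    queue = [(0, src)]
    wasted = 0
    while queue:
        queue.sort()  # toy-sized instances: re-sorting stands in for a real queue
        d, u = queue.pop(rng.randrange(min(k, len(queue))))
        if d > dist[u]:
            wasted += 1          # stale entry: this pop was wasted work
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:  # relax edge (u, v)
                dist[v] = d + w
                queue.append((d + w, v))
    return dist, wasted
```

    Because only improving relaxations are enqueued, the computed distances are correct for any relaxation factor; only the amount of wasted work varies, which is exactly the quantity the paper bounds.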

    Lock-free channels for programming via communicating sequential processes

    Traditional concurrent programming involves manipulating shared mutable state. Alternatives to this programming style are the communicating sequential processes (CSP) [1] and actor [2] models, which share data via explicit communication. The rendezvous channel is the common abstraction for communication between several processes, where senders and receivers perform a rendezvous handshake as a part of their protocol (senders wait for receivers and vice versa). In addition, channels support the select expression. In this work, we present the first efficient lock-free channel algorithm, and compare it against the Go [3] and Kotlin [4] baseline implementations.
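    The rendezvous handshake contract can be sketched with a deliberately simple lock-based toy. The paper's contribution is a lock-free algorithm, which this sketch does not attempt to reproduce; it only shows the semantics that any implementation must provide: a send completes only once a receive takes its element, and vice versa.

```python
import threading

class RendezvousChannel:
    """Lock-based sketch of rendezvous semantics (not lock-free)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._has_item = False   # a sender's element is waiting
        self._taken = False      # a receiver has consumed it

    def send(self, item):
        with self._cond:
            while self._has_item:       # at most one pending sender here
                self._cond.wait()
            self._item, self._has_item, self._taken = item, True, False
            self._cond.notify_all()
            while not self._taken:      # rendezvous: block until received
                self._cond.wait()

    def receive(self):
        with self._cond:
            while not self._has_item:   # rendezvous: block until sent
                self._cond.wait()
            item = self._item
            self._has_item, self._taken = False, True
            self._cond.notify_all()     # wake the waiting sender
            return item
```

    A lock-free version replaces the single condition variable with per-waiter descriptors updated by atomic operations, so that neither side ever holds a lock while the other makes progress.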

    Fast and scalable channels in Kotlin Coroutines

    Asynchronous programming has gained significant popularity over the last decade: support for this programming pattern is available in many popular languages via libraries and native language implementations, typically in the form of coroutines or the async/await construct. Instead of programming via shared memory, this concept assumes implicit synchronization through message passing. The key data structure enabling such communication is the rendezvous channel. Roughly, a rendezvous channel is a blocking queue of size zero, so both send(e) and receive() operations wait for each other, performing a rendezvous when they meet. To optimize the message passing pattern, channels are usually equipped with a fixed-size buffer, so that senders put elements into the buffer without suspending, until its capacity is exceeded. This primitive is known as a buffered channel. This paper presents a fast and scalable algorithm for both rendezvous and buffered channels. Similarly to modern queues, our solution is based on an infinite array with two positional counters for send(e) and receive() operations, leveraging the unconditional Fetch-And-Add instruction to update them. Yet, the algorithm requires non-trivial modifications of this classic pattern in order to support the full channel semantics, such as buffering and cancellation of waiting requests. We compare the performance of our solution to that of the Kotlin implementation, as well as against other academic proposals, showing up to 9.8× speedup. To showcase its expressiveness and performance, we also integrated the proposed algorithm into the standard Kotlin Coroutines library, replacing the previous channel implementations.
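    The infinite-array-plus-two-counters layout can be sketched single-threadedly. In this toy, a dict stands in for the infinite array and plain increments for Fetch-And-Add, and suspension and cancellation are omitted entirely: the hypothetical try_send/try_receive operations simply fail where the real algorithm would park a coroutine.

```python
class BufferedChannel:
    """Single-threaded sketch of the counter-based buffered-channel layout."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cells = {}       # "infinite array": cell index -> element
        self.send_idx = 0     # next cell a sender claims (FAA in the real algorithm)
        self.recv_idx = 0     # next cell a receiver claims

    def try_send(self, element):
        # A send may proceed while it stays at most `capacity` cells
        # ahead of the receivers; otherwise a real send would suspend.
        if self.send_idx >= self.recv_idx + self.capacity:
            return False
        self.cells[self.send_idx] = element  # claim the cell, store element
        self.send_idx += 1
        return True

    def try_receive(self):
        # Empty channel: a real receive would suspend instead of failing.
        if self.recv_idx >= self.send_idx:
            return None
        element = self.cells.pop(self.recv_idx)
        self.recv_idx += 1
        return element
```

    The hard part of the actual algorithm, absent here, is making the counter increment and the cell update appear atomic to concurrent senders and receivers, and cleanly cancelling a request that has already claimed a cell.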