2,155 research outputs found

    Lock-free Concurrent Data Structures

    Concurrent data structures are the data sharing side of parallel programming. Data structures give the program the means to store data, and also provide operations to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, they become even more important because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity with the subject that will allow her to use truly concurrent methods. Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computin

    On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs

    A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T∞. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT∞) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T∞). In contrast, SP-hybrid obtains linear speed-up when P = O(√(T₁/T∞)), but the work is increased by a factor of O(lg n). Singapore-MIT Alliance (SMA
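The English-Hebrew labeling idea mentioned in this abstract can be illustrated with a small static sketch. It assumes the whole SP parse tree is available up front, which is not how the paper's on-the-fly SP-order algorithm works (that relies on an order-maintenance data structure). The names `Leaf`, `Node`, `label`, `precedes`, and `parallel` are hypothetical and chosen only for this illustration. The English order visits children left to right everywhere; the Hebrew order visits the right child first at P-nodes. Two threads are in series when both orders agree, and logically parallel when the orders disagree.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Leaf:
    name: str  # a thread in the SP parse tree


@dataclass
class Node:
    kind: str  # "S" for series composition, "P" for parallel composition
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def label(tree: Tree) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (English, Hebrew) positions for every leaf of the tree."""
    eng: Dict[str, int] = {}
    heb: Dict[str, int] = {}

    def walk(t: Tree, order: Dict[str, int], swap_parallel: bool) -> None:
        if isinstance(t, Leaf):
            order[t.name] = len(order)  # next free position in this order
            return
        first, second = t.left, t.right
        if swap_parallel and t.kind == "P":
            first, second = second, first  # Hebrew order: right child first
        walk(first, order, swap_parallel)
        walk(second, order, swap_parallel)

    walk(tree, eng, False)
    walk(tree, heb, True)
    return eng, heb


def precedes(eng: Dict[str, int], heb: Dict[str, int], u: str, v: str) -> bool:
    """u runs in series before v iff u comes first in both orders."""
    return eng[u] < eng[v] and heb[u] < heb[v]


def parallel(eng: Dict[str, int], heb: Dict[str, int], u: str, v: str) -> bool:
    """u and v are logically parallel iff the two orders disagree."""
    return u != v and not precedes(eng, heb, u, v) and not precedes(eng, heb, v, u)


# Thread a, then (b in parallel with c), then d.
example = Node("S", Leaf("a"), Node("S", Node("P", Leaf("b"), Leaf("c")), Leaf("d")))
eng, heb = label(example)
print(parallel(eng, heb, "b", "c"))  # True
print(precedes(eng, heb, "a", "d"))  # True
```

This sketch recomputes both orders from scratch; the abstract's O(1) amortized bound comes from maintaining the two orders incrementally as the program forks and joins.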

    IST Austria Technical Report

    We study algorithmic questions for concurrent systems where the transitions are labeled from a complete, closed semiring, and path properties are algebraic with semiring operations. The algebraic path properties can model dataflow analysis problems, the shortest path problem, and many other natural problems that arise in program analysis. We consider that each component of the concurrent system is a graph with constant treewidth, a property satisfied by the control-flow graphs of most programs. We allow for multiple possible queries, which arise naturally in demand-driven dataflow analysis. The study of multiple queries allows us to consider the tradeoff between the resource usage of the one-time preprocessing and that of each individual query. The traditional approach constructs the product graph of all components and applies the best-known graph algorithm on the product. In this approach, even the answer to a single query requires the transitive closure (i.e., the results of all possible queries), which leaves no room for a tradeoff between preprocessing and query time. Our main contributions are algorithms that significantly improve the worst-case running time of the traditional approach, and provide various tradeoffs depending on the number of queries. For example, in a concurrent system of two components, the traditional approach requires hexic time in the worst case for answering one query as well as computing the transitive closure, whereas we show that with one-time preprocessing in almost cubic time, each subsequent query can be answered in at most linear time, and even the transitive closure can be computed in almost quartic time. Furthermore, we establish conditional optimality results showing that the worst-case running time of our algorithms cannot be improved without achieving major breakthroughs in graph algorithms (i.e., improving the worst-case bound for the shortest path problem in general graphs). Preliminary experimental results show that our algorithms perform favorably on several benchmarks.
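To make the notion of an algebraic path property concrete, here is a minimal sketch of the traditional all-pairs approach on a single graph, not the treewidth-based algorithms the report contributes. The `Semiring` class, the `algebraic_paths` function, and the two instances are hypothetical names for this illustration. The sketch assumes the closure of every cycle weight equals the semiring's `one` (true for the Boolean semiring, and for the tropical semiring when there are no negative cycles), so no explicit star operation is needed.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # combines alternative paths
    times: Callable[[Any, Any], Any]  # extends a path by another path
    zero: Any                         # identity of plus: "no path"
    one: Any                          # identity of times: "empty path"


# Shortest paths: plus = min, times = +.
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0)
# Reachability: plus = or, times = and.
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)


def algebraic_paths(n: int, edges: Dict[Tuple[int, int], Any], sr: Semiring) -> List[List[Any]]:
    """All-pairs algebraic path values via a Floyd-Warshall style closure.

    Assumes cycle closures equal sr.one (see the lead-in), so the in-place
    update over intermediate node k is sufficient.
    """
    d = [[sr.zero] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = sr.one
    for (u, v), w in edges.items():
        d[u][v] = sr.plus(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = sr.plus(d[i][j], sr.times(d[i][k], d[k][j]))
    return d


weights = {(0, 1): 4, (1, 2): 1, (0, 2): 7}
shortest = algebraic_paths(3, weights, TROPICAL)
reachable = algebraic_paths(3, {e: True for e in weights}, BOOLEAN)
print(shortest[0][2])   # 5
print(reachable[2][0])  # False
```

Swapping the semiring changes the analysis without touching the algorithm, which is the sense in which one algebraic formulation covers shortest paths, reachability, and dataflow-style problems. The cubic closure above computes every answer at once; the report's point is that constant treewidth allows cheaper preprocessing with fast individual queries.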

    IST Austria Technical Report

    We study algorithmic questions for concurrent systems where the transitions are labeled from a complete, closed semiring, and path properties are algebraic with semiring operations. The algebraic path properties can model dataflow analysis problems, the shortest path problem, and many other natural properties that arise in program analysis. We consider that each component of the concurrent system is a graph with constant treewidth, and it is known that the control-flow graphs of most programs have constant treewidth. We allow for multiple possible queries, which arise naturally in demand-driven dataflow analysis problems (e.g., alias analysis). The study of multiple queries allows us to consider the tradeoff between the resource usage of the one-time preprocessing and that of each individual query. The traditional approaches construct the product graph of all components and apply the best-known graph algorithm on the product. In the traditional approach, even the answer to a single query requires the transitive closure computation (i.e., the results of all possible queries), which leaves no room for a tradeoff between preprocessing and query time. Our main contributions are algorithms that significantly improve the worst-case running time of the traditional approach, and provide various tradeoffs depending on the number of queries. For example, in a concurrent system of two components, the traditional approach requires hexic time in the worst case for answering one query as well as computing the transitive closure, whereas we show that with one-time preprocessing in almost cubic time, each subsequent query can be answered in at most linear time, and even the transitive closure can be computed in almost quartic time. Furthermore, we establish conditional optimality results that show that the worst-case running times of our algorithms cannot be improved without achieving major breakthroughs in graph algorithms (such as improving the worst-case bounds for the shortest path problem in general graphs, whose current best-known bound has not been improved in five decades). Finally, we provide a prototype implementation of our algorithms which significantly outperforms the existing algorithmic methods on several benchmarks.

    Permission-Based Separation Logic for Multithreaded Java Programs

    This paper presents a program logic for reasoning about multithreaded Java-like programs with dynamic thread creation, thread joining and reentrant object monitors. The logic is based on concurrent separation logic. It is the first detailed adaptation of concurrent separation logic to a multithreaded Java-like language. The program logic associates a unique static access permission with each heap location, ensuring exclusive write accesses and ruling out data races. Concurrent reads are supported through fractional permissions. Permissions can be transferred between threads upon thread starting, thread joining, initial monitor entrancies and final monitor exits. In order to distinguish between initial monitor entrancies and monitor reentrancies, auxiliary variables keep track of multisets of currently held monitors. Data abstraction and behavioral subtyping are facilitated through abstract predicates, which are also used to represent monitor invariants, preconditions for thread starting and postconditions for thread joining. Value-parametrized types make it convenient to capture common strong global invariants, like static object ownership relations. The program logic is presented for a model language with Java-like classes and interfaces. The soundness of the program logic is proven, and a number of illustrative examples are presented.

    IST Austria Thesis

    This dissertation focuses on algorithmic aspects of program verification, and presents modeling and complexity advances on several problems related to the static analysis of programs, the stateless model checking of concurrent programs, and the competitive analysis of real-time scheduling algorithms. Our contributions can be broadly grouped into five categories. Our first contribution is a set of new algorithms and data structures for the quantitative and data-flow analysis of programs, based on the graph-theoretic notion of treewidth. It has been observed that the control-flow graphs of typical programs have special structure, and are characterized as graphs of small treewidth. We utilize this structural property to provide faster algorithms for the quantitative and data-flow analysis of recursive and concurrent programs. In most cases we make an algebraic treatment of the considered problem, where several interesting analyses, such as reachability, shortest path, and certain kinds of data-flow analysis problems, follow as special cases. We exploit the constant-treewidth property to obtain algorithmic improvements for on-demand versions of the problems, and provide data structures with various tradeoffs between the resources spent in the preprocessing and querying phases. We also improve on the algorithmic complexity of quantitative problems outside the algebraic path framework, namely the minimum mean-payoff, minimum ratio, and minimum initial credit for energy problems. Our second contribution is a set of algorithms for Dyck reachability with applications to data-dependence analysis and alias analysis. In particular, we develop an optimal algorithm for Dyck reachability on bidirected graphs, which are ubiquitous in context-insensitive, field-sensitive points-to analysis. Additionally, we develop an efficient algorithm for context-sensitive data-dependence analysis via Dyck reachability, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is (i) linear in the number of call sites and (ii) only logarithmic in the size of the whole library, as opposed to linear in the size of the whole library. Finally, we prove that Dyck reachability is Boolean Matrix Multiplication-hard in general, and the hardness also holds for graphs of constant treewidth. This hardness result strongly indicates that there exist no combinatorial algorithms for Dyck reachability with truly subcubic complexity. Our third contribution is the formalization and algorithmic treatment of the Quantitative Interprocedural Analysis framework. In this framework, the transitions of a recursive program are annotated as good, bad or neutral, and receive a weight which measures the magnitude of their respective effect. The Quantitative Interprocedural Analysis problem asks to determine whether there exists an infinite run of the program where the long-run ratio of the bad weights over the good weights is above a given threshold. We illustrate how several quantitative problems related to static analysis of recursive programs can be instantiated in this framework, and present some case studies in this direction. Our fourth contribution is a new dynamic partial-order reduction for the stateless model checking of concurrent programs. Traditional approaches rely on the standard Mazurkiewicz equivalence between traces, by means of partitioning the trace space into equivalence classes, and attempting to explore a few representatives from each class. We present a new dynamic partial-order reduction method called the Data-centric Partial Order Reduction (DC-DPOR). Our algorithm is based on a new equivalence between traces, called the observation equivalence. DC-DPOR explores a coarser partitioning of the trace space than any exploration method based on the standard Mazurkiewicz equivalence. Depending on the program, the new partitioning can be even exponentially coarser. Additionally, DC-DPOR spends only polynomial time in each explored class. Our fifth contribution is the use of automata and game-theoretic verification techniques in the competitive analysis and synthesis of real-time scheduling algorithms for firm-deadline tasks. On the analysis side, we leverage automata on infinite words to compute the competitive ratio of real-time schedulers subject to various environmental constraints. On the synthesis side, we introduce a new instance of two-player mean-payoff partial-information games, and show how the synthesis of an optimal real-time scheduler can be reduced to computing winning strategies in this new type of game.
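For readers unfamiliar with the Dyck reachability problem named in the second contribution, the following is a naive fixpoint sketch of what the problem asks. It is not the thesis's optimal bidirected algorithm or its library-summarization algorithm; the function name `dyck_reachable` and the edge encoding are assumptions made for this illustration. A node v is Dyck-reachable from u when some path from u to v spells a string of properly matched, properly nested parentheses.

```python
from typing import Hashable, Iterable, List, Set, Tuple

# An edge is (source, (bracket, kind), target), where bracket is "(" or ")"
# and kind names the parenthesis type (e.g. a call site or a field).
Edge = Tuple[Hashable, Tuple[str, Hashable], Hashable]


def dyck_reachable(nodes: Iterable[Hashable], edges: List[Edge]) -> Set[Tuple[Hashable, Hashable]]:
    """Return all pairs (u, v) joined by a path with balanced parentheses.

    Naive saturation of the grammar  D -> empty | D D | (k D )k.
    """
    reach = {(v, v) for v in nodes}  # the empty path is balanced
    opens = [(u, k, x) for u, (b, k), x in edges if b == "("]
    closes = [(y, k, v) for y, (b, k), v in edges if b == ")"]

    changed = True
    while changed:
        new = set()
        # Rule D -> (k D )k : wrap a balanced path in a matching pair.
        for u, k, x in opens:
            for y, k2, v in closes:
                if k == k2 and (x, y) in reach:
                    new.add((u, v))
        # Rule D -> D D : concatenate two balanced paths.
        for a, b in reach:
            for c, d in reach:
                if b == c:
                    new.add((a, d))
        changed = not new <= reach
        reach |= new
    return reach


# 0 -(a-> 1 -(b-> 2 -)b-> 3 -)a-> 4 is balanced; 0 -(a-> 5 -)b-> 6 is not.
example_edges = [
    (0, ("(", "a"), 1), (1, ("(", "b"), 2), (2, (")", "b"), 3), (3, (")", "a"), 4),
    (0, ("(", "a"), 5), (5, (")", "b"), 6),
]
pairs = dyck_reachable(range(7), example_edges)
print((0, 4) in pairs)  # True
print((0, 6) in pairs)  # False
```

The saturation loop is deliberately simple and far from efficient; it only fixes the semantics that the faster algorithms described in the abstract must preserve.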