6 research outputs found

    Composable abstractions for synchronization in dynamic threading platforms

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 259-269).High-level abstractions for parallel programming simplify the development of efficient parallel applications. In particular, composable abstractions allow programmers to construct a complex parallel application out of multiple components, where each component itself may be designed to exploit parallelism. This dissertation presents the design of three composable abstractions for synchronization in dynamic-threading platforms, based on ideas of task-graph execution, helper locks, and transactional memory. These designs demonstrate provably efficient runtime scheduling for programs with synchronization. For applications that use task-graph synchronization, I demonstrate provably efficient execution of task graphs with arbitrary dependencies as a library in a fork-join platform. Conventional wisdom suggests that a fork-join platform can execute an arbitrary task graph only with special runtime support or by converting the graph into a series-parallel computation which has less parallelism. By implementing Nabbit, a Cilk++ library for arbitrary task-graph execution, I show that one can in fact avoid introducing runtime modifications or additional constraints on parallelism. Nabbit achieves an asymptotically optimal completion-time bound for task graphs with constant degree. For applications that use lock-based synchronization, I introduce helper locks, a new synchronization abstraction that enables programmers to exploit asynchronous task parallelism inside locked critical regions. When a processor fails to acquire a helper lock, it can help complete the parallel critical region protected by the lock instead of simply waiting for the lock to be released. I also present HELPER, a runtime for supporting helper locks, and prove theoretical performance bounds which imply that HELPER achieves linear speedup on programs with a small number of highly parallel critical regions. For applications that use transaction-based synchronization, I present CWSTM, the first design of a transactional memory (TM) system that supports transactions with nested parallelism and nested parallel transactions of unbounded nesting depth. CWSTM demonstrates that one can provide theoretical bounds on the overhead of transaction conflict detection which are independent of nesting depth. I also introduce the concept of ownership-aware TM, the idea of using information about which memory locations a software module owns to provide provable guarantees of safety and correctness for open-nested transactions.by Jim Sukha.Ph.D

    Program Model Checking: A Practitioner's Guide

    Get PDF
    Program model checking is a verification technology that uses state-space exploration to evaluate large numbers of potential program executions. Program model checking provides improved coverage over testing by systematically evaluating all possible test inputs and all possible interleavings of threads in a multithreaded system. Model-checking algorithms use several classes of optimizations to reduce the time and memory requirements for analysis, as well as heuristics for meaningful analysis of partial areas of the state space Our goal in this guidebook is to assemble, distill, and demonstrate emerging best practices for applying program model checking. We offer it as a starting point and introduction for those who want to apply model checking to software verification and validation. The guidebook will not discuss any specific tool in great detail, but we provide references for specific tools

    Algorithms incorporating concurrency and caching

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 189-203).This thesis describes provably good algorithms for modern large-scale computer systems, including today's multicores. Designing efficient algorithms for these systems involves overcoming many challenges, including concurrency (dealing with parallel accesses to the same data) and caching (achieving good memory performance.) This thesis includes two parallel algorithms that focus on testing for atomicity violations in a parallel fork-join program. These algorithms augment a parallel program with a data structure that answers queries about the program's structure, on the fly. Specifically, one data structure, called SP-ordered-bags, maintains the series-parallel relationships among threads, which is vital for uncovering race conditions (bugs) in the program. Another data structure, called XConflict, aids in detecting conflicts in a transactional-memory system with nested parallel transactions. For a program with work T and span To, maintaining either data structure adds an overhead of PT, to the running time of the parallel program when executed on P processors using an efficient scheduler, yielding a total runtime of O(T1/P + PTo). For each of these data structures, queries can be answered in 0(1) time. This thesis also introduces the compressed sparse rows (CSB) storage format for sparse matrices, which allows both Ax and ATx to be computed efficiently in parallel, where A is an n x n sparse matrix with nnz > n nonzeros and x is a dense n-vector. The parallel multiplication algorithm uses e(nnz) work and ... span, yielding a parallelism of ... , which is amply high for virtually any large matrix.(cont.) Also addressing concurrency, this thesis considers two scheduling problems. The first scheduling problem, motivated by transactional memory, considers randomized backoff when jobs have different lengths. I give an analysis showing that binary exponential backoff achieves makespan V2e(6v 1- i ) with high probability, where V is the total length of all n contending jobs. This bound is significantly larger than when jobs are all the same size. A variant of exponential backoff, however, achieves makespan of ... with high probability. I also present the size-hashed backoff protocol, specifically designed for jobs having different lengths, that achieves makespan ... with high probability. The second scheduling problem considers scheduling n unit-length jobs on m unrelated machines, where each job may fail probabilistically. Specifically, an input consists of a set of n jobs, a directed acyclic graph G describing the precedence constraints among jobs, and a failure probability qij for each job j and machine i. The goal is to find a schedule that minimizes the expected makespan. I give an O(log log(min {m, n}))-approximation for the case of independent jobs (when there are no precedence constraints) and an O(log(n + m) log log(min {m, n}))-approximation algorithm when precedence constraints form disjoint chains. This chain algorithm can be extended into one that supports precedence constraints that are trees, which worsens the approximation by another log(n) factor. To address caching, this thesis includes several new variants of cache-oblivious dynamic dictionaries.(cont.) A cache-oblivious dictionary fills the same niche as a classic B-tree, but it does so without tuning for particular memory parameters. Thus, cache-oblivious dictionaries optimize for all levels of a multilevel hierarchy and are more portable than traditional B-trees. I describe how to add concurrency to several previously existing cache-oblivious dictionaries. I also describe two new data structures that achieve significantly cheaper insertions with a small overhead on searches. The cache-oblivious lookahead array (COLA) supports insertions/deletions and searches in O((1/B) log N) and O(log N) memory transfers, respectively, where B is the block size, M is the memory size, and N is the number of elements in the data structure. The xDict supports these operations in O((1/1B E1-) logB(N/M)) and O((1/)0logB(N/M)) memory transfers, respectively, where 0 < E < 1 is a tunable parameter. Also on caching, this thesis answers the question: what is the worst possible page-replacement strategy? The goal of this whimsical chapter is to devise an online strategy that achieves the highest possible fraction of page faults / cache misses as compared to the worst offline strategy. I show that there is no deterministic strategy that is competitive with the worst offline. I also give a randomized strategy based on the most recently used heuristic and show that it is the worst possible pagereplacement policy. On a more serious note, I also show that direct mapping is, in some sense, a worst possible page-replacement policy. Finally, this thesis includes a new algorithm, following a new approach, for the problem of maintaining a topological ordering of a dag as edges are dynamically inserted.(cont.) The main result included here is an O(n2 log n) algorithm for maintaining a topological ordering in the presence of up to m < n(n - 1)/2 edge insertions. In contrast, the previously best algorithm has a total running time of O(min { m3/ 2, n5/2 }). Although these algorithms are not parallel and do not exhibit particularly good locality, some of the data structural techniques employed in my solution are similar to others in this thesis.by Jeremy T. Fineman.Ph.D

    Cooperative Communications inWireless Local Area Networks: MAC Protocol Design and Multi-layer Solutions

    Get PDF
    This dissertation addresses cooperative communications and proposes multi-layer solu- tions for wireless local area networks, focusing on cooperative MAC design. The coop- erative MAC design starts from CSMA/CA based wireless networks. Three key issues of cooperation from the MAC layer are dealt with: i.e., when to cooperate (opportunistic cooperation), whom to cooperate with (relay selection), and how to protect cooperative transmissions (message procedure design). In addition, a cooperative MAC protocol that addresses these three issues is proposed. The relay selection scheme is further optimized in a clustered network to solve the problem of high collision probability in a dense network. The performance of the proposed schemes is evaluated in terms of through- put, packet delivery rate and energy efficiency. Furthermore, the proposed protocol is verified through formal model checking using SPIN. Moreover, a cooperative code allo- cation scheme is proposed targeting at a clustered network where multiple relay nodes can transmit simultaneously. The cooperative communication design is then extended to the routing layer through cross layer routing metrics. Another part of the work aims at enabling concurrent transmissions using cooperative carrier sensing to improve the per- formance in a WLAN network with multiple access points sharing the same channel

    Executable Model Synthesis and Property Validation for Message Sequence Chart Specifications

    Get PDF
    Message sequence charts (MSCā€™s) are a formal language for the speciļ¬cation of scenarios in concurrent real-time systems. The thesis addresses the synthesis of executable object-oriented design-time models from MSC speciļ¬cations. The synthesis integrates with the software development process, its purpose being to automatically create working prototypes from speciļ¬cations without error and create executable models on which properties may be validated. The usefulness of existing algorithms for the synthesis of ROOM (Real-Time Object Oriented Modeling) models from MSCā€™s has been evaluated from the perspective of an applications programmer ac-cording to various criteria. A number of new synthesis features have been proposed to address them, and applied to a telephony call management system for illustration. These include the speciļ¬cation and construction of hierarchical structure and behavior of ROOM actors, views, multiple containment, replication, resolution of non-determinism and automatic coordination. Generalizations and algorithms have been provided. The hierarchical actor structure, replication, FSM merging, and global coordinator algorithms have been implemented in the Mesa CASE tool. A comparison is made to other speciļ¬cation and modeling languages and their synthesis, such as SDL, LSCā€™s, and statecharts. Another application of synthesis is to generate a model with support for the automated validation of safety and liveness properties. The Mobility Management services of the GSM digital mobile telecommunications system were speciļ¬ed in MSCā€™s. A Promela model of the system was then synthesized. A number of optimizations have been proposed to reduce the complexity of the model in order to successfully perform a validation of it. Properties of the system were encoded in Linear Temporal Logic, and the Promela model was used to automatically validate a number of identiļ¬ed properties using the model checker Spin. A ROOM model was then synthesized from the validated MSC speciļ¬cation using the proposed reļ¬nement features
    corecore