
    Optimizing Transactions for Captured Memory

    In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction that cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (i.e., STM barriers), but a compiler unaware of this may translate them into expensive STM barriers, which presents an opportunity to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler access captured memory, including 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory; these techniques can also elide barriers for accesses to thread-local and read-only data. We implemented these optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on an Intel Dunnington system (24 cores in a 4-node SMP system) show that these optimizations can improve performance by up to 18% at 16 threads.
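The runtime side of barrier elision for captured memory can be sketched as follows. This is a minimal, hypothetical undo-log STM in Python, for illustration only: the class `ToyTransaction` and its methods are assumptions, not the Intel C++ STM compiler's interface, and conflict detection and read barriers are omitted. Objects allocated inside the transaction are recorded, and the write barrier consults that record to skip undo logging for captured memory. A compiler-based scheme would instead prove capture statically and emit no barrier at all.

```python
class ToyTransaction:
    """Hypothetical undo-log STM transaction, for illustration only."""

    def __init__(self):
        self.captured = {}      # id(obj) -> obj for memory allocated in this transaction
        self.undo_log = []      # (obj, index, old_value) for writes to shared memory
        self.full_barriers = 0  # writes that went through the logging barrier
        self.elided = 0         # writes to captured memory that skipped logging

    def alloc(self, size):
        # Memory allocated inside the transaction is invisible to other threads
        # until commit, so it is captured by this transaction.
        obj = [0] * size
        self.captured[id(obj)] = obj  # keep a reference so the id stays unique
        return obj

    def write(self, obj, index, value):
        if id(obj) in self.captured:
            # Captured memory: no undo logging needed, store directly.
            self.elided += 1
        else:
            # Full write barrier: log the old value so abort can roll back.
            self.undo_log.append((obj, index, obj[index]))
            self.full_barriers += 1
        obj[index] = value

    def commit(self):
        # After commit, captured objects become ordinary (possibly shared) memory.
        self.undo_log.clear()
        self.captured.clear()

    def abort(self):
        # Undo shared writes in reverse order. Captured objects need no undo:
        # once shared writes are rolled back, nothing outside refers to them.
        for obj, index, old in reversed(self.undo_log):
            obj[index] = old
        self.undo_log.clear()
        self.captured.clear()
```

For example, two writes into `tx.alloc(2)` are counted as elided, one write to a pre-existing list goes through the logging barrier, and `tx.abort()` restores that pre-existing list.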

    Abstract

    Instruction scheduling and register allocation/assignment are two optimizations that are commonly used in the code generation phase of modern compilers. These optimizations are important for processors with exposed instruction-level parallelism and large register files. They, however, impact the task of the symbolic debugger, which attempts to present to the user a source-level view of program execution. Debuggers for most systems today usually punt on the issue of optimized code, either by turning optimizations off whenever the user asks for source-level debugging, or by not detecting the effects of the optimizations on the source-level state. To avoid misleading the user, the debugger must provide feedback about the effects of optimizations. In this paper, we investigate the effects of instruction scheduling and global register allocation/assignment on symbolic debugging and present approaches that a debugger can take..
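One effect of register allocation on debugging, register reuse, can be sketched for straight-line code. The helpers below are hypothetical and not taken from the paper: given which variable each instruction leaves in which register, `is_resident` checks whether a variable's register still holds that variable at a breakpoint, or has been reused for another variable so the debugger cannot read it from that location. Control flow, spilling, and instruction scheduling are ignored.

```python
def is_resident(writes, var, reg, breakpoint):
    """Return True if `reg` still holds `var` when execution halts before
    instruction index `breakpoint`.

    writes: straight-line list of (register, variable) pairs; entry i records
            that instruction i leaves the value of `variable` in `register`.
    """
    # Scan backwards for the most recent instruction that wrote `reg`.
    for r, v in reversed(writes[:breakpoint]):
        if r == reg:
            return v == var  # reg reused for another variable -> not resident
    return False  # `var` had not been placed in `reg` before the breakpoint


def displayable(writes, home_regs, breakpoint):
    """Split variables (mapped to their registers) into readable and lost."""
    ok = {v for v, r in home_regs.items() if is_resident(writes, v, r, breakpoint)}
    return ok, set(home_regs) - ok
```

With `writes = [("r1", "x"), ("r2", "y"), ("r1", "z")]`, `x` is readable from `r1` before instruction 2 but not before instruction 3, because `z` has taken over that register.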

    Detection and Recovery of Endangered Variables Caused by Instruction Scheduling

    Instruction scheduling reorders and interleaves instruction sequences from different source statements. This impacts the task of a symbolic debugger, which attempts to present to the user a picture of program execution that matches the source program. At a breakpoint B, if the value in the run-time location of a variable V may not correspond to the value the user expects V to have, then V is endangered at B. This paper describes an approach to detecting and recovering endangered variables caused by instruction scheduling, and measures the effects of instruction scheduling on a symbolic debugger's ability to recover source values at a breakpoint. It reports measurements for three C programs from the SPEC suite and a collection of programs from Numerical Recipes, compiled with a variant of a commercial C compiler.
    1 Introduction. A debugger allows a user to control the execution of a program (e.g., to set breakpoints) and to inspect the state o..
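The definition of an endangered variable can be made concrete with a small model. This is a hypothetical sketch, not the paper's method: it assumes each source statement is a single assignment that compiles to one instruction, and that the debugger halts just before the breakpoint statement's instruction. A variable is reported as endangered when the statement that last assigned it in the scheduled code differs from the one the user expects from source order.

```python
def endangered_at(source_order, schedule, breakpoint):
    """Return the variables whose run-time value may differ from the value the
    user expects when execution halts at `breakpoint`.

    source_order: list of (stmt_id, variable) assignments in source order.
    schedule:     the same stmt_ids in the order the scheduled code executes them.
    breakpoint:   stmt_id of the breakpoint; we assume the debugger halts just
                  before the code for that statement runs.
    """
    target = dict(source_order)                 # stmt_id -> assigned variable
    src_ids = [sid for sid, _ in source_order]

    def last_defs(executed):
        # Map each variable to the statement that most recently assigned it.
        defs = {}
        for sid in executed:
            defs[target[sid]] = sid
        return defs

    expected = last_defs(src_ids[:src_ids.index(breakpoint)])
    actual = last_defs(schedule[:schedule.index(breakpoint)])
    return {v for v in set(expected) | set(actual)
            if expected.get(v) != actual.get(v)}
```

If `s3` (assigning `a`) is scheduled above `s2` (assigning `b`), then `a` is endangered at a breakpoint on `s2` because it already holds the later value, and `b` is endangered at a breakpoint on `s3` because its assignment has not executed yet.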

    Unlocking Concurrency

    Multicore architectures are an inflection point in mainstream software development because they force developers to write parallel programs. In a previous article in Queue, Herb Sutter and James Larus pointed out, “The concurrency revolution is primarily a software revolution. The difficult problem is not building multicore hardware, but programming it in a way that lets mainstream applications benefit from the continued exponential growth in CPU performance.” In this new multicore world, developers must write explicitly parallel applications that can take advantage of the increasing number of cores that each successive multicore generation will provide. Parallel programming poses many new challenges to the developer, one of which is synchronizing concurrent access to shared memory by multiple threads. Programmers have traditionally used locks for synchronization, but lock-based synchronization has well-known pitfalls. Simplistic coarse-grained locking does not scale well, while more sophisticated fine-grained locking risks introducing deadlocks and data races. Furthermore, scalable libraries written using fine-grained locks cannot be easily composed in a way that retains scalability and avoids deadlock and data races. TM (transactional memory) provides a new concurrency-control construct that avoids the pitfalls of locks and significantly eases concurrent programming. It brings to mainstream parallel programming proven concur..
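The deadlock and composition pitfalls of fine-grained locking can be illustrated with a bank-transfer example, a common teaching case rather than one taken from the article. Python has no transactional memory, so this sketch shows only the lock-based side: each account has its own lock, and correctness depends on every caller acquiring locks in the same global order (here, by `id()`, an arbitrary choice). That whole-program convention is what makes lock-based code hard to compose; under TM, `transfer` would instead be one atomic block with no ordering rule.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # fine-grained: one lock per account


def transfer(src, dst, amount):
    """Move `amount` from src to dst while holding both accounts' locks.

    Taking src.lock and then dst.lock would let concurrent transfers a->b and
    b->a each hold one lock and wait forever for the other. Acquiring the locks
    in a global order (here, by id()) rules out that cycle.
    """
    if src is dst:
        return  # threading.Lock is not reentrant; avoid locking it twice
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```

Two threads running 1,000 transfers in opposite directions between the same pair of accounts then finish without deadlock, and the total balance is preserved.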

    Software Engineering with Transactional Memory Versus Locks in Practice

    Transactional Memory (TM) promises to simplify parallel programming by replacing locks with atomic transactions. Despite much recent progress in TM research, there is very little experience using TM to develop realistic parallel programs from scratch. In this article, we present the results of a detailed case study comparing teams of programmers developing a parallel program from scratch using transactional memory and locks. We analyze and quantify in a realistic environment the development time, programming progress, code metrics, programming patterns, and ease of code understanding for six teams who each wrote a parallel desktop search engine over a fifteen-week period. Three randomly chosen teams used Intel’s Software Transactional Memory compiler and Pthreads, while the other three used only Pthreads. Our analysis is exploratory: given the same requirements, how far did each team get? The TM teams were among the first to have a prototype parallel search engine. Compared to the locks teams, the TM teams spent less than half the time debugging segmentation faults, but had more problems tuning performance and implementing queries. Code inspections with industry experts revealed that TM code was easier to understand than locks code, because the locks teams used many locks (up to thousands) to improve performance. Learning from each team’s individual success and failure story, this article provides valuable lessons for improving TM.
    Karlsruhe Institute of Technology (Excellence Initiative)

    Source-Level Debugging of Scalar Optimized Code

    Although compiler optimizations play a crucial role in the performance of modern computer systems, debugger technology has lagged behind in supporting them. Yet debugging the unoptimized translation is often impossible or futile, so the debugger must handle code optimizations. But compiler optimizations make it difficult to provide source-level debugger functionality: global optimizations can cause the run-time value of a variable to be inconsistent with the source-level value expected at a breakpoint; such variables are called endangered variables. A debugger must detect endangered variables and warn the user about them; otherwise, the user may draw incorrect conclusions about the program. This paper presents a new algorithm for detecting variables that are endangered due to global scalar optimizations. Our approach provides more precise classifications of variables and is still simpler than past approaches. We have implemented and evaluated our techniques in the con..

    Symbolic Debugging of Globally Optimized Code: Data Value Problems and Their Solutions

    Symbolic debuggers are program development tools that allow a user to interact with an executing process at the source level. In response to a user query, the debugger must be able to retrieve and display the value of a source variable in a manner consistent with what the user expects at the source statement where execution has halted. However, when a program has been compiled with optimizations, values of variables may be either inaccessible in the run-time state or inconsistent with what the user expects. Problems that pertain to the retrieval of source values are called data value problems. In this paper we address the data value problems caused by global scalar optimizations. We describe in detail how global optimizations cause data value problems and what information a symbolic debugger can provide to the user when they occur. We provide a data flow algorithm that detects the impact of two global transformations: code hoisting and dead code elimination. ..
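The data-flow view of dead code elimination can be sketched as a standard iterative forward analysis. This is an illustrative formulation, not the paper's algorithm: each control-flow-graph node lists the variables whose assignment there was eliminated (gen) and the variables it still assigns (kill). A union meet gives the variables that may be stale on entry to a node (stale along some path), and an intersection meet gives those stale along every path. The CFG encoding and the function name are hypothetical, and code hoisting is not modeled.

```python
def dce_endangered(nodes, preds, entry, elim, defs):
    """Forward data-flow for variables made stale by dead code elimination.

    nodes: list of CFG node names; preds: node -> list of predecessor nodes.
    elim[n]: variables whose assignment at n was removed by the optimizer (gen).
    defs[n]: variables that n still assigns in the optimized code (kill).
    Returns (may, must): per node, variables stale on entry along some path /
    along every path.
    """
    universe = set()
    for n in nodes:
        universe |= elim.get(n, set())

    def union(sets):
        return set().union(*sets)

    def intersect(sets):
        return set.intersection(*sets)

    def entry_set(n, out, meet):
        ps = preds.get(n, [])
        if n == entry or not ps:
            return set()  # nothing is stale at the entry (or an unreachable node)
        return meet([out[p] for p in ps])

    def solve(meet, init):
        out = {n: set(init) for n in nodes}
        changed = True
        while changed:  # iterate until the OUT sets reach a fixed point
            changed = False
            for n in nodes:
                inn = entry_set(n, out, meet)
                new_out = elim.get(n, set()) | (inn - defs.get(n, set()))
                if new_out != out[n]:
                    out[n] = new_out
                    changed = True
        return {n: entry_set(n, out, meet) for n in nodes}

    # May: start empty and grow; must: start from every candidate and shrink.
    return solve(union, set()), solve(intersect, universe)
```

In a diamond where only the `then` branch's assignment to `x` was removed, `x` is in the may-set but not the must-set at the join, so a debugger would flag it as possibly, not definitely, stale there.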

    Fundamentals of Multicore Software Development

    With multicore processors now in every computer, server, and embedded device, the need for cost-effective, reliable parallel software has never been greater. By explaining key aspects of multicore programming, Fundamentals of Multicore Software Development helps software engineers understand parallel programming and master the multicore challenge. Accessible to newcomers to the field, the book captures the state of the art of multicore programming in computer science. It covers the fundamentals of multicore hardware, parallel design patterns, and parallel programming in C++, .NET, and Java. I