2,270 research outputs found

    Performance Reproduction and Prediction of Selected Dynamic Loop Scheduling Experiments

    Full text link
    Scientific applications are complex, large, and often exhibit irregular and stochastic behavior. The use of efficient loop scheduling techniques in computationally-intensive applications is crucial for improving their performance on high-performance computing (HPC) platforms. A number of dynamic loop scheduling (DLS) techniques have been proposed between the late 1980s and early 2000s, and efficiently used in scientific applications. In most cases, the computing systems on which they have been tested and validated are no longer available. This work is concerned with the minimization of the sources of uncertainty in the implementation of DLS techniques to avoid unnecessary influences on the performance of scientific applications. Therefore, it is important to ensure that the DLS techniques employed in scientific applications today adhere to their original design goals and specifications. The goal of this work is to attain and increase the trust in the implementation of DLS techniques in present studies. To achieve this goal, the performance of a selection of scheduling experiments from the 1992 original work that introduced factoring is reproduced and predicted via both, simulative and native experimentation. The experiments show that the simulation reproduces the performance achieved on the past computing platform and accurately predicts the performance achieved on the present computing platform. The performance reproduction and prediction confirm that the present implementation of the DLS techniques considered both, in simulation and natively, adheres to their original description. The results confirm the hypothesis that reproducing experiments of identical scheduling scenarios on past and modern hardware leads to an entirely different behavior from expected

    A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

    Get PDF
    We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable. We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms

    A Template for Implementing Fast Lock-free Trees Using HTM

    Full text link
    Algorithms that use hardware transactional memory (HTM) must provide a software-only fallback path to guarantee progress. The design of the fallback path can have a profound impact on performance. If the fallback path is allowed to run concurrently with hardware transactions, then hardware transactions must be instrumented, adding significant overhead. Otherwise, hardware transactions must wait for any processes on the fallback path, causing concurrency bottlenecks, or move to the fallback path. We introduce an approach that combines the best of both worlds. The key idea is to use three execution paths: an HTM fast path, an HTM middle path, and a software fallback path, such that the middle path can run concurrently with each of the other two. The fast path and fallback path do not run concurrently, so the fast path incurs no instrumentation overhead. Furthermore, fast path transactions can move to the middle path instead of waiting or moving to the software path. We demonstrate our approach by producing an accelerated version of the tree update template of Brown et al., which can be used to implement fast lock-free data structures based on down-trees. We used the accelerated template to implement two lock-free trees: a binary search tree (BST), and an (a,b)-tree (a generalization of a B-tree). Experiments show that, with 72 concurrent processes, our accelerated (a,b)-tree performs between 4.0x and 4.2x as many operations per second as an implementation obtained using the original tree update template

    Advanced transport operating system software upgrade: Flight management/flight controls software description

    Get PDF
    The Flight Management/Flight Controls (FM/FC) software for the Norden 2 (PDP-11/70M) computer installed on the NASA 737 aircraft is described. The software computes the navigation position estimates, guidance commands, those commands to be issued to the control surfaces to direct the aircraft in flight based on the modes selected on the Advanced Guidance Control System (AGSC) mode panel, and the flight path selected via the Navigation Control/Display Unit (NCDU)
    • …
    corecore