5,467 research outputs found

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    Get PDF
    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

    Introduction to the ISO specification language LOTOS

    Get PDF
    LOTOS is a specification language that has been specifically developed for the formal description of the OSI (Open Systems Interconnection) architecture, although it is applicable to distributed, concurrent systems in general. In LOTOS a system is seen as a set of processes which interact and exchange data with each other and with their environment. LOTOS is expected to become an ISO international standard by 1988

    Intensional Updates

    Get PDF

    Group Communication Patterns for High Performance Computing in Scala

    Full text link
    We developed a Functional object-oriented Parallel framework (FooPar) for high-level high-performance computing in Scala. Central to this framework are Distributed Memory Parallel Data structures (DPDs), i.e., collections of data distributed in a shared nothing system together with parallel operations on these data. In this paper, we first present FooPar's architecture and the idea of DPDs and group communications. Then, we show how DPDs can be implemented elegantly and efficiently in Scala based on the Traversable/Builder pattern, unifying Functional and Object-Oriented Programming. We prove the correctness and safety of one communication algorithm and show how specification testing (via ScalaCheck) can be used to bridge the gap between proof and implementation. Furthermore, we show that the group communication operations of FooPar outperform those of the MPJ Express open source MPI-bindings for Java, both asymptotically and empirically. FooPar has already been shown to be capable of achieving close-to-optimal performance for dense matrix-matrix multiplication via JNI. In this article, we present results on a parallel implementation of the Floyd-Warshall algorithm in FooPar, achieving more than 94 % efficiency compared to the serial version on a cluster using 100 cores for matrices of dimension 38000 x 38000

    ChimpCheck: Property-Based Randomized Test Generation for Interactive Apps

    Full text link
    We consider the problem of generating relevant execution traces to test rich interactive applications. Rich interactive applications, such as apps on mobile platforms, are complex stateful and often distributed systems where sufficiently exercising the app with user-interaction (UI) event sequences to expose defects is both hard and time-consuming. In particular, there is a fundamental tension between brute-force random UI exercising tools, which are fully-automated but offer low relevance, and UI test scripts, which are manual but offer high relevance. In this paper, we consider a middle way---enabling a seamless fusion of scripted and randomized UI testing. This fusion is prototyped in a testing tool called ChimpCheck for programming, generating, and executing property-based randomized test cases for Android apps. Our approach realizes this fusion by offering a high-level, embedded domain-specific language for defining custom generators of simulated user-interaction event sequences. What follows is a combinator library built on industrial strength frameworks for property-based testing (ScalaCheck) and Android testing (Android JUnit and Espresso) to implement property-based randomized testing for Android development. Driven by real, reported issues in open source Android apps, we show, through case studies, how ChimpCheck enables expressing effective testing patterns in a compact manner.Comment: 20 pages, 21 figures, Symposium on New ideas, New Paradigms, and Reflections on Programming and Software (Onward!2017

    Automatic matching of legacy code to heterogeneous APIs: An idiomatic approach

    Get PDF
    Heterogeneous accelerators often disappoint. They provide the prospect of great performance, but only deliver it when using vendor specific optimized libraries or domain specific languages. This requires considerable legacy code modifications, hindering the adoption of heterogeneous computing. This paper develops a novel approach to automatically detect opportunities for accelerator exploitation. We focus on calculations that are well supported by established APIs: sparse and dense linear algebra, stencil codes and generalized reductions and histograms. We call them idioms and use a custom constraint-based Idiom Description Language (IDL) to discover them within user code. Detected idioms are then mapped to BLAS libraries, cuSPARSE and clSPARSE and two DSLs: Halide and Lift. We implemented the approach in LLVM and evaluated it on the NAS and Parboil sequential C/C++ benchmarks, where we detect 60 idiom instances. In those cases where idioms are a significant part of the sequential execution time, we generate code that achieves 1.26× to over 20× speedup on integrated and external GPUs
    • …
    corecore