
    Contract-Based Resource Verification for Higher-Order Functions with Memoization

    We present a new approach for specifying and verifying resource utilization of higher-order functional programs that use lazy evaluation and memoization. In our approach, users can specify the desired resource bound as templates with numerical holes, e.g. steps ≤ ? ∗ size(l) + ?, in the contracts of functions. They can also express invariants necessary for establishing the bounds that may depend on the state of memoization. Our approach operates in two phases: first generating an instrumented first-order program that accurately models the higher-order control flow and the effects of memoization on resources using sets, algebraic datatypes and mutual recursion, and then verifying the contracts of the first-order program by producing verification conditions of the form ∃∀ using an extended assume/guarantee reasoning. We use our approach to verify precise bounds on resources such as evaluation steps and number of heap-allocated objects on 17 challenging data structures and algorithms. Our benchmarks, comprising 5K lines of functional Scala code, include lazy mergesort, Okasaki's real-time queue and deque data structures that rely on aliasing of references to first-class functions; lazy data structures based on numerical representations, such as the conqueue data structure of Scala's data-parallel library and cyclic streams; as well as dynamic programming algorithms such as knapsack and Viterbi. Our evaluations show that, averaged over all benchmarks, the actual runtime resource consumption is 80% of the value inferred by our tool when estimating the number of evaluation steps, and 88% for the number of heap-allocated objects.
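    The contract-template idea can be made concrete with a small, self-contained sketch. The Scala program below is not the verifier's actual API: it counts evaluation steps of a memoized function at runtime and checks a candidate linear bound, with the concrete constants a and b standing in for the numerical holes (?) that the tool would infer and verify statically.

        // Runtime sketch of a resource-contract template (illustrative only).
        object StepContracts {
          private var steps: Long = 0
          private val memo = scala.collection.mutable.Map.empty[Int, BigInt]

          // Memoized Fibonacci, instrumented to count one "step" per call.
          // Thanks to memoization, fib(n) takes O(n) steps, not O(phi^n).
          def fib(n: Int): BigInt = {
            steps += 1
            memo.get(n) match {
              case Some(v) => v
              case None =>
                val v = if (n < 2) BigInt(n) else fib(n - 1) + fib(n - 2)
                memo(n) = v
                v
            }
          }

          def main(args: Array[String]): Unit = {
            val n = 30
            fib(n)
            // Candidate bound steps <= a*n + b; a and b play the role of the holes.
            val (a, b) = (2, 1)
            assert(steps <= a * n + b, s"bound violated: steps = $steps")
            println(s"fib($n) took $steps steps; candidate bound ${a * n + b}")
          }
        }

    Here the memoized fib makes 2n − 1 instrumented calls, so the candidate bound 2n + 1 holds; the paper's contribution is proving such bounds statically, for all inputs and all memoization states, rather than checking them on one run as this sketch does.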

    Group Communication Patterns for High Performance Computing in Scala

    We developed a Functional object-oriented Parallel framework (FooPar) for high-level high-performance computing in Scala. Central to this framework are Distributed Memory Parallel Data structures (DPDs), i.e., collections of data distributed in a shared-nothing system together with parallel operations on these data. In this paper, we first present FooPar's architecture and the idea of DPDs and group communications. Then, we show how DPDs can be implemented elegantly and efficiently in Scala based on the Traversable/Builder pattern, unifying Functional and Object-Oriented Programming. We prove the correctness and safety of one communication algorithm and show how specification testing (via ScalaCheck) can be used to bridge the gap between proof and implementation. Furthermore, we show that the group communication operations of FooPar outperform those of the MPJ Express open source MPI-bindings for Java, both asymptotically and empirically. FooPar has already been shown to be capable of achieving close-to-optimal performance for dense matrix-matrix multiplication via JNI. In this article, we present results on a parallel implementation of the Floyd-Warshall algorithm in FooPar, achieving more than 94% efficiency compared to the serial version on a cluster using 100 cores for matrices of dimension 38000 × 38000.
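    To give the flavor of a DPD, here is a toy, single-JVM sketch; FooPar itself runs on shared-nothing clusters and its real API differs, so all names below are illustrative. Each partition plays the role of one rank's local data, and a map operation traverses partitions in parallel and rebuilds a collection of the same shape, in the spirit of the Traversable/Builder pattern.

        // A toy stand-in for a Distributed Memory Parallel Data structure (DPD).
        // Ranks are simulated with threads in one JVM; names are illustrative.
        final class ToyDPD[A](private val partitions: Vector[Vector[A]]) {
          // Parallel map: one worker per partition, mimicking one process per rank.
          def mapDPD[B](f: A => B): ToyDPD[B] = {
            val results = new Array[Vector[B]](partitions.length)
            val workers = partitions.zipWithIndex.map { case (part, rank) =>
              new Thread(() => results(rank) = part.map(f))
            }
            workers.foreach(_.start())
            workers.foreach(_.join())
            new ToyDPD(results.toVector)
          }

          // Gather all partitions back into one local collection.
          def gather: Vector[A] = partitions.flatten
        }

        object ToyDPDDemo {
          def main(args: Array[String]): Unit = {
            val dpd = new ToyDPD(Vector(Vector(1, 2), Vector(3, 4), Vector(5, 6)))
            println(dpd.mapDPD(_ * 10).gather) // Vector(10, 20, 30, 40, 50, 60)
          }
        }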

    Design and optimization of a portable LQCD Monte Carlo code using OpenACC

    The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance portability can be reached.

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism.
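    The thread and synchronization model can be sketched in ordinary JVM Scala; the actual model targets PIM arrays, and every name below is illustrative rather than part of the macroserver specification. A macroserver-like object encapsulates its state, activating a method spawns an asynchronous task over that state, callers synchronize on futures, and a single-threaded executor makes each activation behave like an atomic method.

        import java.util.concurrent.Executors
        import scala.concurrent.{Await, ExecutionContext, Future}
        import scala.concurrent.duration._

        // A macroserver-flavored object: encapsulated state, asynchronous
        // method activations, futures for synchronization (sketch only).
        final class CounterServer {
          private var count: Long = 0 // the state space owned by this server
          private val exec = Executors.newSingleThreadExecutor()
          // One thread serializes activations, so each method runs atomically.
          private implicit val ec: ExecutionContext =
            ExecutionContext.fromExecutor(exec)

          def add(n: Long): Future[Long] = Future { count += n; count }
          def get(): Future[Long] = Future { count }
          def shutdown(): Unit = exec.shutdown()
        }

        object MacroserverDemo {
          def main(args: Array[String]): Unit = {
            val server = new CounterServer
            val activations = (1L to 100L).map(i => server.add(i))
            activations.foreach(a => Await.ready(a, 5.seconds))
            println(Await.result(server.get(), 5.seconds)) // 5050
            server.shutdown()
          }
        }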

    Julia: A Fresh Approach to Numerical Computing

    Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast. Julia questions notions generally held as "laws of nature" by practitioners of numerical computing: 1. high-level dynamic programs have to be slow; 2. one must prototype in one language and then rewrite in another language for speed or deployment; and 3. there are parts of a system for the programmer, and other parts best left untouched as they are built by the experts. We introduce the Julia programming language and its design: a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming. Julia shows that one can have machine performance without sacrificing human convenience.
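    Julia's multiple dispatch selects a method from the runtime types of all arguments. To keep this document's examples in one language, here is only a rough compile-time analogue in Scala using type classes, which likewise lets a single generic algorithm pick a specialized implementation per argument type; it is a different mechanism from Julia's, and all names are illustrative.

        // Generic programming with per-type specialization via type classes.
        trait Dot[T] {
          def dot(xs: Vector[T], ys: Vector[T]): T
        }

        object Dot {
          // Specialized instance for Double vectors.
          implicit val doubleDot: Dot[Double] = new Dot[Double] {
            def dot(xs: Vector[Double], ys: Vector[Double]): Double =
              xs.iterator.zip(ys.iterator).map { case (a, b) => a * b }.sum
          }
          // Specialized instance for Int vectors.
          implicit val intDot: Dot[Int] = new Dot[Int] {
            def dot(xs: Vector[Int], ys: Vector[Int]): Int =
              xs.iterator.zip(ys.iterator).map { case (a, b) => a * b }.sum
          }
        }

        object GenericDemo {
          // One generic algorithm; the compiler picks the right instance per call.
          def norm2[T](xs: Vector[T])(implicit d: Dot[T]): T = d.dot(xs, xs)

          def main(args: Array[String]): Unit = {
            println(norm2(Vector(3.0, 4.0))) // 25.0
            println(norm2(Vector(1, 2, 3))) // 14
          }
        }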

    How functional programming mattered

    In 1989, when functional programming was still considered a niche topic, Hughes wrote a visionary paper arguing convincingly ‘why functional programming matters’. More than two decades have passed. Has functional programming really mattered? Our answer is a resounding ‘Yes!’. Functional programming is now at the forefront of a new generation of programming technologies, and enjoying increasing popularity and influence. In this paper, we review the impact of functional programming, focusing on how it has changed the way we may construct programs, the way we may verify programs, and fundamentally the way we may think about programs.

    Verifying Resource Bounds of Programs with Lazy Evaluation and Memoization

    We present a new approach for specifying and verifying resource utilization of higher-order functional programs that use lazy evaluation and memoization. In our approach, users can specify the desired resource bound as templates with numerical holes, e.g. steps ≤ ? ∗ size(l) + ?, in the contracts of functions. They can also express invariants necessary for establishing the bounds that may depend on the state of memoization. Our approach operates in two phases: first generating an instrumented first-order program that accurately models the higher-order control flow and the effects of memoization on resources using sets, algebraic datatypes and mutual recursion, and then verifying the contracts of the first-order program by producing verification conditions of the form ∃∀ using an extended assume/guarantee reasoning. We use our approach to verify precise bounds on resources such as evaluation steps and number of heap-allocated objects on 17 challenging data structures and algorithms. Our benchmarks, comprising 5K lines of functional Scala code, include lazy mergesort, Okasaki's real-time queue and deque data structures that rely on aliasing of references to first-class functions; lazy data structures based on numerical representations, such as the conqueue data structure of Scala's data-parallel library and cyclic streams; as well as dynamic programming algorithms such as knapsack and Viterbi. Our evaluations show that, averaged over all benchmarks, the actual runtime resource consumption is 80% of the value inferred by our tool when estimating the number of evaluation steps, and 88% for the number of heap-allocated objects.

    The role of concurrency in an evolutionary view of programming abstractions

    In this paper we examine how concurrency has been embodied in mainstream programming languages. In particular, we rely on an evolutionary metaphor borrowed from biology to discuss major historical landmarks and crucial concepts that shaped the development of programming languages. We examine the general development process, occasionally delving into particular languages, trying to uncover evolutionary lineages related to specific programming traits. We mainly focus on concurrency, discussing the different abstraction levels involved in present-day concurrent programming and emphasizing the fact that they correspond to different levels of explanation. We then comment on the role of theoretical research in the quest for suitable programming abstractions, recalling the importance of changing the working framework and the way of looking at things every so often. This paper is not meant to be a survey of modern mainstream programming languages: it would be very incomplete in that sense. It aims instead at making a number of observations and connecting them under an evolutionary perspective, in order to grasp a unifying, but not simplistic, view of the programming language development process.