    Feedback Driven Annotation and Refactoring of Parallel Programs

    Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

    MapReduce is a popular programming paradigm for developing large-scale, data-intensive computations. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed in a high-level intermediate language resembling the MapReduce paradigm and is verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster than the originals. Comment: 12 pages, plus an additional 4 pages of references and appendix.
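
    The kind of rewrite described above can be illustrated with a minimal sketch: a sequential Java summation loop and an equivalent version targeting the Spark Java API, written by hand as a stand-in for the tool's output. The summary here would be roughly "result = reduce(+, data)"; the class and data are illustrative, not Casper's actual generated code.

        import java.util.Arrays;
        import java.util.List;
        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;

        public class SumExample {
            // Sequential fragment a Casper-style tool would identify for rewriting.
            static int sumSequential(List<Integer> data) {
                int sum = 0;
                for (int x : data) {
                    sum += x;
                }
                return sum;
            }

            // Equivalent MapReduce-style version targeting the Spark API.
            // Integer::sum is associative and commutative, which reduce requires.
            static int sumSpark(JavaSparkContext sc, List<Integer> data) {
                JavaRDD<Integer> rdd = sc.parallelize(data);
                return rdd.reduce(Integer::sum);
            }

            public static void main(String[] args) {
                SparkConf conf = new SparkConf().setAppName("SumExample").setMaster("local[*]");
                try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                    List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
                    System.out.println(sumSequential(data)); // 15
                    System.out.println(sumSpark(sc, data));  // 15
                }
            }
        }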

    Semantic-Preserving Transformations for Stream Program Orchestration on Multicore Architectures

    As the demand for high performance in big-data processing and distributed computing increases, the stream programming paradigm has been revisited for its abundant parallelism, which stems from independent actors that communicate via data channels. The synchronous data-flow (SDF) programming model is frequently adopted by stream programming languages because it conveniently expresses a stream program as a set of nodes connected by data channels. The static data rates of the SDF programming model enable program transformations that greatly improve the performance of SDF programs on multicore architectures. The major application domains of SDF programs are digital signal processing, audio, video, graphics kernels, networking, and security. This thesis makes three contributions that improve the performance of SDF programs. First, a new intermediate representation (IR) called LaminarIR is introduced. LaminarIR replaces FIFO queues with direct memory accesses to reduce data-communication overhead and explicates the data dependencies between producer and consumer nodes. We provide transformations and their formal semantics to convert conventional, FIFO-queue-based program representations to LaminarIR. Second, a compiler framework performs sound, semantics-preserving program transformations from FIFO semantics to LaminarIR. We employ static program analysis to resolve token positions in FIFO queues and replace them with direct memory accesses. Third, a communication-cost-aware program orchestration method establishes a foundation for parallelizing LaminarIR on multicore architectures. The LaminarIR framework, consisting of the aforementioned contributions together with the benchmarks used in the experimental evaluation, has been open-sourced to encourage further research on improving the performance of stream programming languages.
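
    The core idea of the FIFO-to-direct-access transformation can be sketched as follows; this is an illustrative toy, not the LaminarIR implementation. With static SDF rates, the position of every token in a queue is known at compile time, so the queue operations in one firing of an actor can be lowered to reads and writes of fixed locations.

        import java.util.ArrayDeque;
        import java.util.Deque;

        public class FifoVsLaminar {
            // Conventional firing of a 2-in/1-out adder actor: every token
            // passes through a FIFO queue, paying enqueue/dequeue overhead.
            static void fireWithFifo(Deque<Integer> in, Deque<Integer> out) {
                int a = in.removeFirst();
                int b = in.removeFirst();
                out.addLast(a + b);
            }

            // LaminarIR-style firing: because SDF rates are static (2 in, 1 out),
            // token positions resolve to fixed offsets at compile time and the
            // queue becomes plain storage accessed directly.
            static void fireDirect(int[] inBuf, int inBase, int[] outBuf, int outBase) {
                outBuf[outBase] = inBuf[inBase] + inBuf[inBase + 1];
            }

            public static void main(String[] args) {
                Deque<Integer> in = new ArrayDeque<>(java.util.List.of(3, 4));
                Deque<Integer> out = new ArrayDeque<>();
                fireWithFifo(in, out);
                System.out.println(out.peekFirst()); // 7

                int[] inBuf = {3, 4};
                int[] outBuf = new int[1];
                fireDirect(inBuf, 0, outBuf, 0);
                System.out.println(outBuf[0]);       // 7
            }
        }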

    An Investigation into Partitioning Algorithms for Automatic Heterogeneous Compilers

    Automatic heterogeneous compilers (AHCs) allow blended hardware-software solutions to be explored without the cost of a full-fledged design team, but limited research exists on the partitioning algorithms responsible for separating hardware from software. The purpose of this thesis is to implement various partitioning algorithms on the same AHC platform to create an apples-to-apples comparison of AHC partitioning algorithms. Both the estimated and the actual outcomes of the generated solutions are studied and scored. The platform used to implement the algorithms is Cal Poly’s own Twill compiler, created by Doug Gallatin last year. Twill’s original partitioning algorithm is chosen along with two other partitioning algorithms: Tabu Search + Simulated Annealing (TSSA) and Genetic Search (GS). These algorithms are implemented inside Twill, and test-bench input code from the CHStone HLS benchmark suite is used as stimulus. Along with the algorithms’ cost models, one key attribute of interest is the number of queues generated, since each cut between hardware and software requires a queue to pass data across the partition crossing. These high communication costs can end up damaging the heterogeneous solution’s performance. The GS, TSSA, and original Twill partitioning algorithms are also scored against each other’s cost models, combining the fitness and performance cost models with queue counts to evaluate each partitioning algorithm. The solutions generated by TSSA are rated as better by both the TSSA cost model and the GS cost model, while producing low queue counts.
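
    As a rough illustration of one of the compared algorithm families, the sketch below shows a simulated-annealing partitioner over a boolean hardware/software assignment whose cost model charges a penalty for every edge crossing the partition, mirroring the queue-count concern above. The cost model and parameters are invented for illustration and are not Twill's.

        import java.util.Random;

        public class PartitionSA {
            // Toy cost model: per-node hardware/software costs plus a penalty
            // for each edge crossing the partition (each crossing needs a queue).
            static double cost(boolean[] inHw, double[] hwCost, double[] swCost,
                               int[][] edges, double queuePenalty) {
                double total = 0;
                for (int i = 0; i < inHw.length; i++) {
                    total += inHw[i] ? hwCost[i] : swCost[i];
                }
                for (int[] e : edges) {
                    if (inHw[e[0]] != inHw[e[1]]) total += queuePenalty;
                }
                return total;
            }

            // Simulated annealing: flip one node's side per step and accept
            // worse moves with probability exp(-delta / temperature).
            static boolean[] anneal(int n, double[] hw, double[] sw, int[][] edges,
                                    double queuePenalty, long seed) {
                Random rnd = new Random(seed);
                boolean[] part = new boolean[n];   // start all-software
                double cur = cost(part, hw, sw, edges, queuePenalty);
                for (double t = 10.0; t > 1e-3; t *= 0.995) {
                    int i = rnd.nextInt(n);
                    part[i] = !part[i];
                    double next = cost(part, hw, sw, edges, queuePenalty);
                    double delta = next - cur;
                    if (delta <= 0 || rnd.nextDouble() < Math.exp(-delta / t)) {
                        cur = next;             // accept move
                    } else {
                        part[i] = !part[i];     // revert
                    }
                }
                return part;
            }
        }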

    Type-Directed Program Synthesis and Constraint Generation for Library Portability

    Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor-specific software ecosystems, resulting in polluted, non-portable code. As we enter an era of heterogeneous computing, there is an explosion in the number of accelerator libraries required to harness specialized hardware. We need a system that allows developers to exploit ever-changing accelerator libraries without over-specializing their code. As we cannot know the behavior of future libraries ahead of time, this paper develops a scheme that assists developers in matching their code to new libraries, without requiring the source code of those libraries. Furthermore, it can recover equivalent code from programs that use existing libraries and automatically port them to new interfaces. It first uses program synthesis to determine the meaning of a library, then maps the synthesized description into generalized constraints, which are used to search the program for replacement opportunities to present to the developer. We applied this approach to existing large applications from the scientific computing and deep learning domains. Using our approach, we show speedups ranging from 1.1× to over 10× in end-to-end performance when using accelerator libraries. Comment: Accepted to PACT 201
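
    The end result of this approach can be sketched as follows: suppose synthesis determines that a library routine computes a dot product; generalized constraints derived from that summary then match a loop in the application, and a call is offered as a replacement. The AccelBlas interface below is hypothetical, standing in for a discovered accelerator library; only the shape of the rewrite is illustrated.

        public class DotReplace {
            // Application loop as written by the developer.
            static double dotLoop(double[] x, double[] y) {
                double acc = 0.0;
                for (int i = 0; i < x.length; i++) {
                    acc += x[i] * y[i];
                }
                return acc;
            }

            // Hypothetical accelerator library interface. The synthesized
            // summary of its 'dot' routine is: result = sum over i of x[i] * y[i].
            interface AccelBlas {
                double dot(double[] x, double[] y);
            }

            // After the loop matches the summary's generalized constraints,
            // the tool proposes this replacement to the developer.
            static double dotPorted(AccelBlas lib, double[] x, double[] y) {
                return lib.dot(x, y);
            }
        }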