
    pocl: A Performance-Portable OpenCL Implementation

    OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear: multiple vendors can support application descriptions written according to the standard, which reduces the effort of porting programs. While the standard brings the obvious benefit of platform portability, performance portability is largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations, each with different characteristics and therefore requiring different optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can exploit the data parallelism of OpenCL programs on multiple platforms with different styles of parallel hardware. The kernel compiler is modularized so that target-independent parallel region formation is performed separately from the target-specific mapping of those regions to the hardware. This separation enables support for various styles of fine-grained parallel resources, such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work at the source level, parallel region formation retains information about the data parallelism in the LLVM IR and its metadata infrastructure. Later generic compiler passes can exploit this information for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both commercial ones and those still under research. The paper describes how this portability is achieved. Our results show that most of the benchmarked applications, when compiled using pocl, were faster than or nearly as fast as with the best proprietary OpenCL implementation for the platform at hand.
    Comment: This article was published in 2015; it is now openly accessible via arxi
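
    The idea of parallel region formation can be pictured with a small emulation. This Python sketch is conceptual and is not pocl's LLVM-based implementation. It assumes a kernel has already been split at its barriers into parallel regions, and it runs each region for every work-item of a work-group in a loop before the next region starts. All names here (`Region`, `run_work_group`, `make_reverse_kernel`) are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model: one "parallel region" is the code a single work-item runs
# between two barriers. It receives its local id and the work-group's local memory.
Region = Callable[[int, List[float]], None]


def run_work_group(regions: Sequence[Region], local_size: int,
                   local_mem: List[float]) -> None:
    """Run a kernel split at its barriers, using one work-item loop per region.

    All work-items finish region k before any starts region k + 1, which is the
    guarantee an OpenCL work-group barrier provides. The iterations of each
    inner loop are independent, so a compiler could vectorize or unroll them.
    """
    for region in regions:
        for lid in range(local_size):
            region(lid, local_mem)


def make_reverse_kernel(inp: List[float], out: List[float],
                        local_size: int) -> List[Region]:
    """Hypothetical kernel that reverses a work-group's data through local memory."""

    def load(lid: int, lmem: List[float]) -> None:   # region before the barrier
        lmem[lid] = inp[lid]

    def store(lid: int, lmem: List[float]) -> None:  # region after the barrier
        out[lid] = lmem[local_size - 1 - lid]

    return [load, store]


if __name__ == "__main__":
    inp = [1.0, 2.0, 3.0, 4.0]
    out = [0.0] * 4
    run_work_group(make_reverse_kernel(inp, out, 4), 4, [0.0] * 4)
    print(out)  # [4.0, 3.0, 2.0, 1.0]
```

    If both regions were fused into a single loop, work-item 0 would read `lmem[3]` before work-item 3 had written it. Splitting at the barrier preserves OpenCL semantics. The independent iterations inside each loop are the kind of parallelism that later passes could map onto SIMD lanes or multi-issue slots.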

    Analysis of a parallelized nonlinear elliptic boundary value problem solver with application to reacting flows

    A parallelized finite difference code based on the Newton method for systems of nonlinear elliptic boundary value problems in two dimensions is analyzed in terms of computational complexity and parallel efficiency. An approximate cost function depending on 15 dimensionless parameters is derived for algorithms based on stripwise and boxwise decompositions of the domain, with a one-to-one assignment of the strip or box subdomains to processors. The sensitivity of the cost functions to the parameters is explored in regions of parameter space corresponding to small-order model systems with inexpensive function evaluations, and also to a coupled system of nineteen equations with very expensive function evaluations. The algorithm was implemented on the Intel Hypercube, and experimental results for the model problems with stripwise decompositions are presented and compared with the theory. In the context of computational combustion problems, multiprocessors of either message-passing or shared-memory type may be employed with stripwise decompositions to realize a speedup of O(n), where n is the mesh resolution in one direction, for reasonable n.
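
    The strip-versus-box trade-off can be made concrete with a deliberately simplified surface-to-volume model. This model is an assumption for illustration and is not the paper's 15-parameter cost function. For an n x n mesh on p processors, it counts the grid points an interior subdomain updates and the boundary points it exchanges with its neighbours. The function names are hypothetical.

```python
import math
from typing import Tuple


def strip_costs(n: int, p: int) -> Tuple[int, int]:
    """(points updated, boundary points exchanged) for one interior strip.

    Simplified model: the n x n mesh is cut into p strips of n / p full rows,
    and an interior strip trades one row of n points with each of its two neighbours.
    """
    if p > n or n % p:
        raise ValueError("need p <= n and p dividing n")
    return n * n // p, 2 * n


def box_costs(n: int, p: int) -> Tuple[int, int]:
    """(points updated, boundary points exchanged) for one interior box.

    The mesh is cut into a sqrt(p) x sqrt(p) grid of square boxes; an interior
    box trades one edge of n / sqrt(p) points with each of its four neighbours.
    """
    q = math.isqrt(p)
    if q * q != p or n % q:
        raise ValueError("need p to be a perfect square whose root divides n")
    side = n // q
    return side * side, 4 * side


if __name__ == "__main__":
    print(strip_costs(64, 16))  # (256, 128)
    print(box_costs(64, 16))    # (256, 64)
    # With p = n strips, each processor owns one row: ideal speedup n^2 / n = n.
    print(strip_costs(64, 64))  # (64, 128)
```

    With n = 64 and p = 16, both decompositions give each processor 256 points, but a box exchanges 64 boundary points versus 128 for a strip. Strips are simpler and still allow up to p = n processors, each owning a single row. That limit is where an ideal speedup of order n comes from in this toy model, ignoring communication time.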

    Autonomous monitoring framework for resource-constrained environments

    Acknowledgments: The research described here is supported by the award made by the RCUK Digital Economy programme to the dot.rural Digital Economy Hub, reference: EP/G066051/1. URL: http://www.dotrural.ac.uk/RemoteStream/
    Peer reviewed. Publisher PD

    Estimating the effects of water-induced shallow landslides on soil erosion

    Rainfall-induced landslides and soil erosion are part of a complex system of multiple interacting processes, and both can significantly affect sediment budgets. These sediment mass movements also have the potential to significantly affect the health and functionality of a broad network of ecosystems and the services they provide. Supporting an integrated assessment of these processes requires reliable modelling architectures. This paper proposes a semi-quantitative integrated methodology for a robust assessment of soil erosion rates in data-poor regions affected by landslide activity. It combines heuristic, empirical and probabilistic approaches. The methodology is based on the geospatial semantic array programming paradigm and has been implemented at catchment scale using Geographic Information Systems (GIS) spatial analysis tools and GNU Octave. The integrated data-transformation model relies on a modular architecture in which the information flow among modules is constrained by semantic checks. To improve computational reproducibility, the geospatial data transformations implemented in ESRI ArcGIS are also made available in the free software GRASS GIS. The proposed modelling architecture is flexible enough to make future transdisciplinary scenario analyses easier to design. In particular, the architecture might serve as a novel component that simplifies future integrated analyses of how wildfires or vegetation types and distributions affect sediment transport from water-induced landslides and erosion.
    Comment: 14 pages, 4 figures, 1 table, published in IEEE Earthzine 2014 Vol. 7 Issue 2, 910137+ 2nd quarter theme. Geospatial Semantic Array Programming. Available: http://www.earthzine.org/?p=91013
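
    A modular transformation chain whose information flow is constrained by semantic checks can be sketched as a pipeline in which every module validates its input before transforming it. This is a minimal Python illustration with assumed names (`Module`, `run_pipeline`, `SemanticError`) and made-up thresholds. The paper's implementation uses GIS tools and GNU Octave, and its actual checks are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

Array = List[float]


class SemanticError(ValueError):
    """Raised when data reaching a module violates that module's semantic check."""


@dataclass
class Module:
    name: str
    precondition: Callable[[Array], bool]  # semantic check on the incoming array
    transform: Callable[[Array], Array]    # the module's data transformation


def run_pipeline(data: Array, modules: List[Module]) -> Array:
    """Pass data through the modules in order, checking each module's input first."""
    for module in modules:
        if not module.precondition(data):
            raise SemanticError(f"input to {module.name!r} failed its semantic check")
        data = module.transform(data)
    return data


# Toy modules with hypothetical thresholds: slope in degrees -> relative
# susceptibility in [0, 1] -> binary "landslide-prone" flag.
MODULES = [
    Module("slope_to_susceptibility",
           lambda a: all(0.0 <= x <= 90.0 for x in a),
           lambda a: [x / 90.0 for x in a]),
    Module("susceptibility_to_flag",
           lambda a: all(0.0 <= x <= 1.0 for x in a),
           lambda a: [1.0 if x >= 0.5 else 0.0 for x in a]),
]

if __name__ == "__main__":
    print(run_pipeline([10.0, 45.0, 80.0], MODULES))  # [0.0, 1.0, 1.0]
```

    A slope value of 120 degrees would be rejected at the first module instead of silently propagating a meaningless susceptibility downstream. Stopping invalid data at module boundaries is the purpose of constraining the information flow.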

    A wireless sensor and actuator network for improving the electrical power grid dependability

    This paper presents an overview of a Wireless Sensor and Actuator Network (WSAN) used to monitor the distribution infrastructure of an electrical power grid. The WSAN employs appropriate sensors to monitor key grid components and integrates both safety and security services, which improves the dependability of grid distribution. The supported applications include, among others, video surveillance of remote secondary substations, which imposes special quality-of-service and reliability requirements. The paper presents the hardware and software architecture of the system together with performance results.

    Design and synthesis of a high-performance, hyper-programmable DSP on an FPGA

    In the field of high-performance digital signal processing, DSPs and FPGAs provide the most flexibility. Because of the extensive customization available on FPGAs, implementing a DSP algorithm on an FPGA takes longer than programming a processor. As a result, traditional DSPs typically yield a faster time to market than an FPGA design. However, it is often desirable to have the ASIC-like performance attainable through the additional customization and parallel computation that an FPGA offers. This can be achieved through the class of processors known as hyper-programmable DSPs, that is, DSPs in which multiple aspects of the architecture are programmable. This thesis contributes such a DSP, targeted at high performance and realized in hardware on an FPGA. The design consists of a scalar datapath and a vector datapath capable of parallel operations, both of which are extensively customizable. To aid in designing the datapaths, graphical tools are introduced as an efficient way to modify the design. A further tool provides a graphical interface to help write instructions for the vector datapath. Additionally, an adaptive assembler was created to convert assembly programs to machine code for any datapath design. The resulting design was synthesized for a Cyclone III FPGA and runs at 135 MHz, with 61% of the logic used by processing elements. Benchmarks run on the design showed performance similar to commercial DSPs for the simple benchmarks but significant improvement for the more complex ones.
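
    An assembler that adapts to any datapath design can be sketched as a table-driven encoder. In this sketch the field widths are derived from the datapath configuration instead of being hard-coded. The three-register format `opcode|rd|rs1|rs2`, the `DatapathConfig` class and the `assemble` function are hypothetical and do not describe the thesis's actual instruction encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatapathConfig:
    """Hypothetical description of one generated datapath."""
    opcodes: List[str]  # supported mnemonics; list position = opcode value
    num_registers: int

    @property
    def op_bits(self) -> int:
        return max(1, (len(self.opcodes) - 1).bit_length())

    @property
    def reg_bits(self) -> int:
        return max(1, (self.num_registers - 1).bit_length())

    @property
    def word_bits(self) -> int:
        return self.op_bits + 3 * self.reg_bits  # opcode | rd | rs1 | rs2


def assemble(line: str, cfg: DatapathConfig) -> int:
    """Encode 'op rd, rs1, rs2' using field widths derived from cfg."""
    mnemonic, operands = line.split(maxsplit=1)
    if mnemonic not in cfg.opcodes:
        raise ValueError(f"unknown mnemonic {mnemonic!r}")
    regs = [int(tok.strip().lstrip("r")) for tok in operands.split(",")]
    if len(regs) != 3 or not all(0 <= r < cfg.num_registers for r in regs):
        raise ValueError(f"bad operands in {line!r}")
    word = cfg.opcodes.index(mnemonic)
    for reg in regs:
        word = (word << cfg.reg_bits) | reg  # append each register field
    return word


if __name__ == "__main__":
    small = DatapathConfig(["add", "sub", "mul", "mac"], num_registers=8)
    large = DatapathConfig(["add", "sub", "mul", "mac"], num_registers=16)
    print(small.word_bits, bin(assemble("mul r1, r2, r3", small)))  # 11 0b10001010011
    print(large.word_bits, bin(assemble("mul r1, r2, r3", large)))  # 14 0b10000100100011
```

    Changing `num_registers` from 8 to 16 widens each register field from 3 to 4 bits. The same source line then yields a different machine word without any change to the assembler code.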

    Coarse-grained reconfigurable array architectures

    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. Because non-loop code executes on other cores, however, CGRAs can focus on executing such loops more efficiently. This chapter discusses the basic principles of CGRAs and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance and power efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for manual fine-tuning of source code.
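
    Compiler support for CGRAs is commonly built around modulo scheduling of inner loops. This generic sketch does not describe the ADRES tool flow. It computes the minimum initiation interval (II), which is the larger of a resource bound (operations divided by functional units) and a recurrence bound (latency divided by iteration distance around each dependence cycle). It assumes homogeneous functional units that can each execute any operation, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def res_mii(num_ops: int, num_fus: int) -> int:
    """Resource bound: each operation occupies one FU slot per loop iteration."""
    return -(-num_ops // num_fus)  # integer ceiling division


def rec_mii(cycles: Iterable[Tuple[int, int]]) -> int:
    """Recurrence bound; each dependence cycle is (total latency, total iteration distance)."""
    return max((-(-lat // dist) for lat, dist in cycles), default=1)


def min_ii(num_ops: int, num_fus: int, cycles: Iterable[Tuple[int, int]]) -> int:
    """Lower bound on the initiation interval a modulo scheduler can achieve."""
    return max(res_mii(num_ops, num_fus), rec_mii(cycles))


if __name__ == "__main__":
    accumulate = [(2, 1)]  # hypothetical: one recurrence, latency 2, distance 1
    print(min_ii(40, 16, accumulate))  # 3  (4 x 4 array)
    print(min_ii(40, 4, accumulate))   # 10 (4-issue VLIW)
```

    Consider a hypothetical 40-operation loop body with one accumulation recurrence of latency 2. The bound is II = 3 on a 16-unit array but II = 10 on a 4-issue VLIW. This gap illustrates why dedicating a wide array to such loops can pay off.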