26 research outputs found

    Adaptive structured parallelism for computational grids

    Algorithmic skeletons abstract commonly used patterns of parallel computation, communication, and interaction. They provide top-down design composition and control inheritance throughout the program structure. Parallel programs are expressed by interweaving parameterised skeletons, analogously to the way sequential structured programs are constructed. This design paradigm, known as structured parallelism, provides a high-level parallel programming method that allows the abstract description of programs and fosters portability. That is, structured parallelism requires a description of the algorithm rather than of its implementation, giving programs a clear and consistent meaning across platforms, while the associated structure depends on the particular implementation. By decoupling the structure from the meaning of a parallel program, the program benefits fully from any performance improvements in the underlying systems infrastructure.
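    The idea above can be made concrete with a minimal sketch (illustrative only, not code from the paper): two classic skeletons, a task farm and a pipeline, written as higher-order functions. The programmer supplies only the sequential pieces; the coordination is hidden inside the skeleton.

    ```python
    # Illustrative sketch of two algorithmic skeletons (hypothetical names,
    # not any particular library's API): "farm" applies a worker to many
    # independent tasks in parallel; "pipe" composes stages into a pipeline.
    from concurrent.futures import ThreadPoolExecutor

    def farm(worker, tasks, nworkers=4):
        """Apply `worker` to each task in parallel, preserving input order."""
        with ThreadPoolExecutor(max_workers=nworkers) as pool:
            return list(pool.map(worker, tasks))

    def pipe(*stages):
        """Compose functions/skeletons into a pipeline skeleton."""
        def run(x):
            for stage in stages:
                x = stage(x)
            return x
        return run

    if __name__ == "__main__":
        square = lambda x: x * x
        print(farm(square, range(5)))            # [0, 1, 4, 9, 16]
        print(pipe(lambda x: x + 1, square)(3))  # 16
    ```

    Because the program states only *what* pattern is used, the same source can be retargeted to a different `farm` implementation (threads, processes, MPI ranks) without changing its meaning, which is exactly the portability argument made in the abstract.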

    Using eSkel to Implement the Multiple Baseline Stereo Application

    We give an overview of eSkel, the Edinburgh Skeleton Library, a structured parallel programming library which offers a range of skeletal parallel programming constructs to the C/MPI programmer. We then illustrate the efficacy of such a high-level approach through the multiple baseline stereo application. We describe the application and show different ways to introduce parallelism using algorithmic skeletons. Some performance results are reported.

    Two Fundamental Concepts in Skeletal Parallel Programming

    We define the concepts of nesting mode and interaction mode as they arise in the description of skeletal parallel programming systems. We suggest…

    Automatic Optimization of Python Skeletal Parallel Programs

    Skeletal parallelism is a model of parallelism in which parallel constructs are provided to the programmer as common patterns of parallel algorithms. High-level skeleton libraries often offer a global view of programs instead of the Single Program Multiple Data view common in parallel programming: a program is written as a sequential program but operates on parallel data structures. Most of the time, skeletons on a parallel data structure have counterparts on a sequential data structure. For example, the map function that applies a given function to all the elements of a sequential collection (e.g., a list) has a map skeleton counterpart that applies a sequential function to all the elements of a distributed collection. Two of the challenges a programmer faces when using a skeleton library that provides a wide variety of skeletons are: which skeletons to use, and how to compose them? These design decisions may have a large impact on the performance of the parallel programs. However, skeletons, especially when they do not mutate the data structure they operate on but are instead implemented as pure functions, possess algebraic properties that allow compositions of skeletons to be transformed into more efficient compositions of skeletons. In this paper, we present such an automatic transformation framework for the Python skeleton library PySke and evaluate it on several example applications.
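    The algebraic rewriting the abstract describes can be sketched with the classic map-fusion law, map g ∘ map f = map (g ∘ f). The mini-framework below is hypothetical (it does not reproduce PySke's actual API); it only shows how a composition of two pure map skeletons can be rewritten into one, halving the number of traversals of the distributed structure.

    ```python
    # Hypothetical mini-framework (not PySke's API) illustrating map fusion:
    # a program is a list of skeleton stages; `fuse` rewrites adjacent maps
    # "map f ; map g" into the single, equivalent stage "map (g . f)".
    class Map:
        """A map skeleton (here run over a plain list for illustration)."""
        def __init__(self, f):
            self.f = f
        def run(self, xs):
            return [self.f(x) for x in xs]

    def fuse(stages):
        """Fuse adjacent Map stages: one traversal instead of two."""
        fused = []
        for s in stages:
            if fused and isinstance(fused[-1], Map) and isinstance(s, Map):
                f, g = fused[-1].f, s.f
                fused[-1] = Map(lambda x, f=f, g=g: g(f(x)))
            else:
                fused.append(s)
        return fused

    def run(stages, xs):
        for s in stages:
            xs = s.run(xs)
        return xs

    prog = [Map(lambda x: x + 1), Map(lambda x: x * 2)]
    opt = fuse(prog)
    assert len(opt) == 1                # two stages fused into one
    print(run(opt, [1, 2, 3]))          # [4, 6, 8], same result as prog
    ```

    The rewrite is only valid because the maps are pure, which is exactly the precondition the abstract singles out.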

    Una solución paralela simple al método de difusión de error (A simple parallel solution to the error diffusion method)

    Digital halftoning is a well-known technique for electronic devices that can reproduce only black and white pixels: it converts greyscale images into binary ones, creating the illusion of continuous tones. Despite the large number of existing halftoning methods, all of them share the same problem: processing time depends on the characteristics of the image and on the chosen algorithm. It is therefore of interest to develop techniques that accelerate processing independently of method and data. This paper presents two parallel implementations of Floyd and Steinberg's error diffusion method, a method considered inherently sequential. The first implements the design proposed by Metaxas; the second, proposed by the authors, applies a simple segmentation of the input image together with coarse-grained parallel techniques. In addition to presenting both systems, experimental results on both performance and the quality of the resulting images are shown.
    IV Workshop de Computación Gráfica, Imágenes y Visualización (WCGIV). Red de Universidades con Carreras en Informática (RedUNCI).
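    For reference, a minimal sequential sketch of Floyd-Steinberg error diffusion (using the classic 7/16, 3/16, 5/16, 1/16 coefficients; this is not the parallel code from the paper). Each pixel's quantisation error feeds its right and lower neighbours, which is precisely the data dependency that makes the method inherently sequential.

    ```python
    # Minimal sequential Floyd-Steinberg error diffusion (reference sketch).
    # The error of pixel (y, x) is propagated to four not-yet-visited
    # neighbours, so each pixel depends on earlier pixels in scan order.
    def floyd_steinberg(image, threshold=128):
        """image: list of rows of grey values 0..255; returns a 0/255 halftone."""
        h, w = len(image), len(image[0])
        img = [list(map(float, row)) for row in image]  # working copy
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                old = img[y][x]
                new = 255 if old >= threshold else 0
                out[y][x] = new
                err = old - new
                if x + 1 < w:
                    img[y][x + 1] += err * 7 / 16
                if y + 1 < h:
                    if x > 0:
                        img[y + 1][x - 1] += err * 3 / 16
                    img[y + 1][x] += err * 5 / 16
                    if x + 1 < w:
                        img[y + 1][x + 1] += err * 1 / 16
        return out

    print(floyd_steinberg([[255, 0], [0, 255]]))  # [[255, 0], [0, 255]]
    ```

    Parallel schemes such as Metaxas's exploit the fact that once a prefix of a row has been processed, the row below can start, giving a wavefront of concurrently processable pixels; the paper's second approach instead segments the image and accepts coarse-grained boundaries.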


    High Performance Web Servers: A Study In Concurrent Programming Models

    With the advent of commodity large-scale multi-core computers, the performance of software running on these computers has become a challenge to researchers and enterprise developers. While academic research and industrial products have moved towards writing scalable and highly available services using distributed computing, single-machine performance remains an active domain, one which is far from saturated. This thesis selects an archetypal software example and workload in this domain, and describes the software characteristics affecting performance. The example is highly parallel web servers processing a static workload. Specifically, this work examines concurrent programming models in the context of high-performance web servers across different architectures — threaded (Apache, Go and μKnot), event-driven (Nginx, μServer) and staged (WatPipe) — compared using two static workloads in two different domains. The two workloads are a Zipf distribution of file sizes, representing a user session pulling an assortment of many small and a few large files, and a 50KB file, representing chunked streaming of a large audio or video file. Significant effort is made to compare the eight web servers fairly by carefully tuning each via its adjustment parameters; tuning plays a significant role in workload-specific performance. The two domains are no disk I/O (an in-memory file set) and medium disk I/O, created by lowering the amount of RAM available to the web server from 4GB to 2GB, forcing files to be evicted from the file-system cache. Both domains are also restricted to 4 CPUs. The primary goal of this thesis is to examine fundamental performance differences between threaded and event-driven concurrency models, with particular emphasis on user-level threading models. A secondary goal is to examine high-performance software under restricted hardware environments.
    Over-provisioned hardware environments can mask architectural and implementation shortcomings in software; the hypothesis in this work is that restricting resources stresses the application, bringing out important performance characteristics and properties. Experimental results for the given workloads show that memory pressure is one of the most significant factors in the degradation of web-server performance, because it forces both the onset and the amount of disk I/O. With an ever-increasing need to serve more content at faster rates, a web server relies heavily on in-memory caching of files and related content. In fact, personal and small-business web servers are even run on minimal hardware, like the Raspberry Pi, with only 1GB of RAM and a small SD card for the file system. Therefore, understanding behaviour and performance in restricted contexts should be a normal aspect of testing a web server (and other software systems).
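    The thread-per-connection model examined above can be sketched with Python's standard library. This toy server (an illustration only, not code from any of the benchmarked servers) dedicates one thread to each accepted connection, so a slow client occupies only its own thread rather than blocking the accept loop; the event-driven servers in the study instead multiplex many connections onto few threads.

    ```python
    # Toy thread-per-connection HTTP-ish server (illustrative sketch only):
    # ThreadingTCPServer spawns one handler thread per accepted connection.
    import socket
    import threading
    import socketserver

    STATIC_BODY = b"hello"  # stands in for an in-memory cached static file

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            self.rfile.readline()  # read (and ignore) the request line
            header = b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" \
                     % len(STATIC_BODY)
            self.wfile.write(header + STATIC_BODY)

    def serve_once():
        """Start a threading server on an ephemeral port; return (server, port)."""
        srv = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
        threading.Thread(target=srv.serve_forever, daemon=True).start()
        return srv, srv.server_address[1]

    if __name__ == "__main__":
        srv, port = serve_once()
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.sendall(b"GET / HTTP/1.0\r\n\r\n")
            reply = s.makefile("rb").read()
        srv.shutdown()
        print(reply.endswith(STATIC_BODY))
    ```

    Under memory pressure of the kind the thesis studies, `STATIC_BODY` would no longer sit in memory and each handler thread would block on disk I/O, which is where the architectural differences between the models begin to matter.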