
    Optimization of Patterned Surfaces for Improved Superhydrophobicity Through Cost-Effective Large-Scale Computations

    The growing need for surfaces with specific wetting properties, such as superhydrophobic behavior, calls for novel methods for their efficient design. In this work, a fast computational method for the evaluation of patterned superhydrophobic surfaces is introduced. The hydrophobicity of a surface is quantified in energy terms through an objective function. The high computational cost motivated parallelizing the method with the Message Passing Interface (MPI) communication protocol, which enables calculations on distributed-memory systems and allows parametric investigations within acceptable time frames. The method is demonstrated for a surface consisting of an array of pillars with inverted conical (frustum) geometry. The parallel speedup achieved allows for low-cost parametric investigations on the effect of the fine features (curvature and slopes) of the pillars on the superhydrophobicity of the surface, and consequently for the optimization of superhydrophobic surfaces. Comment: 18 pages, 18 figures
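    The abstract includes no code; the sketch below only illustrates how such a parametric sweep could be distributed over MPI ranks. The parameter ranges and the evaluate_surface objective are hypothetical stand-ins for the paper's energy-based objective function, not the authors' method.

```c
/* Minimal sketch: distributing a parametric sweep over pillar geometries
 * across MPI ranks. evaluate_surface() is a dummy stand-in for the
 * paper's energy-based hydrophobicity objective. */
#include <mpi.h>
#include <stdio.h>

#define N_PARAMS 1024  /* number of candidate geometries (assumed) */

/* Dummy objective; a real version would compute the surface energy
 * of the wetting state for the given pillar geometry. */
double evaluate_surface(double cone_angle, double curvature) {
    return (cone_angle - 30.0) * (cone_angle - 30.0) + curvature;
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Static cyclic distribution of candidate geometries over ranks. */
    double local_best = 1e300;
    for (int i = rank; i < N_PARAMS; i += size) {
        double angle = 5.0 + 0.05 * i;   /* assumed parameter range */
        double curv  = 0.01 * (i % 32);  /* assumed parameter range */
        double e = evaluate_surface(angle, curv);
        if (e < local_best) local_best = e;
    }

    /* Reduce to the globally best (lowest-energy) design on rank 0. */
    double global_best;
    MPI_Reduce(&local_best, &global_best, 1, MPI_DOUBLE,
               MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("best objective: %g\n", global_best);

    MPI_Finalize();
    return 0;
}
```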

    A heterogeneous computer vision architecture: implementation issues

    The prototype of a heterogeneous architecture is currently being built. The architecture is aimed at video-rate computing and is based on a message-passing MIMD topology at the top level (transputer based) and on VLSI associative processor arrays (APA, SIMD structure) for low-level image processing tasks. The APA structure is implemented through a set of 4 VLSI chips (GLiTCH), each containing 64 1-bit processing elements. This communication addresses some issues concerning the implementation of the first prototype, namely those related to:
    • the design and integration of the APA controller unit, which provides the required interface between the APA, the MIMD topology, and the video image interface;
    • the evaluation of the GLiTCH chip through an emulator based on transputers and fast programmable devices; the emulator was designed to be flexible enough to evaluate later modifications to the GLiTCH design;
    • the design of an integrated set of software development tools containing a structured editor (syntax oriented, with a visual interface/programming interface), a cross compiler, and a debugger.

    Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach

    Computationally-intensive loops are the primary source of parallelism in scientific applications. Such loops are often irregular, and a balanced execution of their iterations is critical for achieving high performance. However, several factors may lead to an imbalanced load execution, such as problem characteristics and algorithmic and systemic variations. Dynamic loop self-scheduling (DLS) techniques are devised to mitigate these factors and, consequently, improve application performance. On distributed-memory systems, DLS techniques can be implemented using a hierarchical master-worker execution model and are therefore called hierarchical DLS techniques. These techniques self-schedule loop iterations at two levels of hardware parallelism: across and within compute nodes. Hybrid programming approaches that combine the Message Passing Interface (MPI) with Open Multi-Processing (OpenMP) dominate the implementation of hierarchical DLS techniques. The MPI-3 standard includes the feature of sharing memory regions among MPI processes; this feature introduced the MPI+MPI approach, which simplifies the implementation of parallel scientific applications. The present work designs and implements hierarchical DLS techniques by exploiting the MPI+MPI approach. Four well-known DLS techniques are considered in the evaluation. The results indicate certain performance advantages of the proposed approach compared to the hybrid MPI+OpenMP approach.
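    As a concrete illustration of the MPI-3 feature the abstract refers to, the sketch below shares a loop counter among the MPI processes of one node via MPI_Comm_split_type and MPI_Win_allocate_shared, and self-schedules fixed-size chunks of iterations from it with an atomic fetch-and-add. This is a minimal sketch, not the authors' implementation; real DLS techniques vary the chunk size, and the loop body here is left empty.

```c
/* Minimal sketch of node-level self-scheduling with MPI-3 shared memory:
 * processes on one node share a counter and grab chunks of iterations. */
#include <mpi.h>
#include <stdio.h>

#define N_ITERS 1000
#define CHUNK   8L   /* fixed chunk size; real DLS would vary this */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Split COMM_WORLD into per-node (shared-memory) communicators. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Rank 0 on each node allocates the shared counter; others attach. */
    MPI_Win win;
    long *counter;
    MPI_Aint sz = (node_rank == 0) ? sizeof(long) : 0;
    MPI_Win_allocate_shared(sz, sizeof(long), MPI_INFO_NULL,
                            node_comm, &counter, &win);
    if (node_rank != 0) {
        MPI_Aint qsz; int disp;
        MPI_Win_shared_query(win, 0, &qsz, &disp, &counter);
    }
    if (node_rank == 0) *counter = 0;
    MPI_Barrier(node_comm);  /* counter initialized before anyone reads */

    /* Self-schedule: atomically fetch-and-add the next chunk. */
    long chunk = CHUNK, start;
    MPI_Win_lock_all(0, win);
    for (;;) {
        MPI_Fetch_and_op(&chunk, &start, MPI_LONG, 0, 0, MPI_SUM, win);
        MPI_Win_flush(0, win);
        if (start >= N_ITERS) break;
        long end = start + chunk < N_ITERS ? start + chunk : N_ITERS;
        for (long i = start; i < end; ++i)
            ; /* execute loop iteration i here */
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```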

    RITSim: distributed SystemC simulation

    Parallel or distributed simulation is becoming more than a novel way to speed up design evaluation; it is becoming necessary for simulating modern processors in a reasonable timeframe. As architectural features become faster, smaller, and more complex, designers are interested in obtaining detailed and accurate performance and power estimations. Uniprocessor simulators may not be able to meet such demands. The RITSim project uses SystemC to model a processor microarchitecture and memory subsystem in great detail. SystemC is a C++ library built on a discrete-event simulation kernel. Many projects have successfully implemented parallel discrete-event simulation (PDES) frameworks to distribute simulation among several hosts. The field promises significant simulation speedup, possibly leading to faster turnaround time in design space exploration and commercial production. However, parallel implementation of such simulators is not an easy task: it requires modification of the simulation kernel for effective partitioning and synchronization. This thesis explores PDES techniques and presents a distributed version of the SystemC simulation environment. With minimal user interaction, SystemC models can be executed on a cluster of workstations using a message-passing library such as the Message Passing Interface (MPI). The implementation is designed for transparency: distribution and synchronization happen with little intervention by the model author. Modifications to SystemC are structured to remain maintainable across future releases. Furthermore, only freely available libraries are used, for maximum flexibility and portability.
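    RITSim's own kernel changes are not shown here; the sketch below only illustrates the conservative PDES synchronization pattern such a distributed kernel rests on, with hypothetical hooks (next_local_event_time, simulate_until) standing in for the partitioned simulator: partitions agree on a global lower bound on the next event time via MPI_Allreduce before advancing, so no partition processes an event out of causal order.

```c
/* Illustrative sketch of conservative parallel discrete-event simulation:
 * each partition advances only to the global minimum next-event time. */
#include <mpi.h>
#include <stdio.h>

/* Stub hooks standing in for a partitioned simulation kernel. */
static double local_clock = 0.0;
static double next_local_event_time(void) { return local_clock + 1.0; }
static void   simulate_until(double t)    { local_clock = t; }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    const double t_end = 100.0;  /* simulated time horizon (assumed) */
    double now = 0.0;
    while (now < t_end) {
        /* Agree on the earliest pending event across all partitions;
         * advancing past it could violate causality. */
        double local_next = next_local_event_time();
        double global_next;
        MPI_Allreduce(&local_next, &global_next, 1, MPI_DOUBLE,
                      MPI_MIN, MPI_COMM_WORLD);
        if (global_next >= t_end) break;

        simulate_until(global_next);  /* safe up to the global bound */
        now = global_next;
    }

    MPI_Finalize();
    return 0;
}
```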

    Learning from the Success of MPI

    The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other approaches, including automatic parallelization and directive-based parallelism, are easier to use. This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model. Comment: 12 pages, 1 figure
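    For readers unfamiliar with the programming model the paper defends, a minimal two-rank exchange shows the explicit send/receive style that critics call difficult; the value exchanged is arbitrary.

```c
/* Minimal example of the message-passing model: two ranks exchange a
 * value with explicit send and receive calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value;
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```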