
    Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs

    Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces several techniques that aim to reduce the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs) [S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, 2003.], the inlining of data locality checks, and the use of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase the performance of UPC programs by up to 1.8x over their hand-optimized UPC counterparts for applications with regular accesses and by up to 6.3x for applications with irregular accesses. Peer reviewed. Postprint (author's final draft).
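The inspector-executor idea mentioned in this abstract can be illustrated with a toy model. The sketch below is not the paper's implementation and does not use UPC or CSLMADs; `SharedArray`, `get_bulk`, and the message counter are hypothetical names introduced only for illustration. It assumes a block-distributed array and counts one message per remote fetch, so the effect of coalescing is visible.

```python
from collections import defaultdict


class SharedArray:
    """Toy block-distributed shared array (hypothetical; not a UPC API)."""

    def __init__(self, data, num_threads):
        self.data = list(data)
        # Block size = ceil(len / num_threads); element i lives on thread i // block.
        self.block = -(-len(self.data) // num_threads)
        self.messages = 0  # number of simulated remote messages

    def owner(self, i):
        return i // self.block

    def get(self, i, me):
        """Fine-grained access: one message per remote element."""
        if self.owner(i) != me:
            self.messages += 1
        return self.data[i]

    def get_bulk(self, owner, indices, me):
        """Coalesced access: one message for all elements held by `owner`."""
        if owner != me:
            self.messages += 1
        return [self.data[i] for i in indices]


def naive_sum(shared, idx, me):
    """Baseline loop: every remote element costs its own message."""
    return sum(shared.get(i, me) for i in idx)


def inspector(shared, idx):
    """Inspector: scan the index vector and group needed elements by owner."""
    by_owner = defaultdict(set)
    for i in idx:
        by_owner[shared.owner(i)].add(i)
    return {owner: sorted(needed) for owner, needed in by_owner.items()}


def executor(shared, idx, schedule, me):
    """Executor: fetch each owner's elements in bulk, then run the loop locally."""
    local = {}
    for owner, indices in schedule.items():
        local.update(zip(indices, shared.get_bulk(owner, indices, me)))
    return sum(local[i] for i in idx)
```

With 16 elements spread over 4 threads and the index vector `[0, 5, 6, 13, 5, 9, 14, 1]` read from thread 0, the naive loop issues one message per remote element, while the inspector-executor version issues one per remote owner. The inspector pass itself is extra work, which is the instrumentation overhead the abstract says the paper tries to reduce.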

    Construction and Application of an AMR Algorithm for Distributed Memory Computers

    While the parallelization of block-structured adaptive mesh refinement (AMR) techniques is relatively straightforward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed memory machines are a topic of ongoing research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the communication costs and simplifies the implementation. Emphasis is put on the effective parallelization of the flux correction procedure at coarse-fine boundaries, which is indispensable for conservative finite volume schemes. An easily reproducible standard benchmark and a highly resolved parallel AMR simulation of a diffracting hydrogen-oxygen detonation demonstrate the proposed strategy in practice.

    High Fidelity Tape Transfer Printing Based On Chemically Induced Adhesive Strength Modulation

    Transfer printing, a two-step process (i.e. picking up and printing) for heterogeneous integration, has been widely exploited for the fabrication of functional electronic systems. To ensure a reliable process, strong adhesion for picking up and weak or no adhesion for printing are required. However, it is challenging to meet the requirements of switchable stamp adhesion. Here we introduce a simple, high-fidelity process, namely tape transfer printing (TTP), enabled by a chemically induced, dramatic modulation of the tape's adhesive strength. We describe the working mechanism of the adhesion modulation that governs this process and demonstrate the method by high-fidelity tape transfer printing of several types of materials and devices, including Si pellet arrays, photodetector arrays, and electromyography (EMG) sensors, from their preparation substrates to various alien substrates. High-fidelity tape transfer printing of components onto curvilinear surfaces is also illustrated.

    Antenna Gain and Link Budget for Waves Carrying Orbital Angular Momentum (OAM)

    This paper addresses the RF link budget of a communication system using unusual waves that carry orbital angular momentum (OAM), in order to clearly analyse the fundamental changes for telecommunication applications. The study is based on a typical configuration using circular array antennas to transmit and receive OAM waves. For any value of the OAM mode order, an original asymptotic formulation of the link budget is proposed in which equivalent antenna gains and free-space losses appear. The formulations are then validated against the results of commercial electromagnetic simulation software. In this way, we also show how our formula can help to design a system capable of superimposing several channels on the same bandwidth and the same polarisation, based on the orthogonality of the OAM modes. In particular, the additional losses incurred by using this degree of freedom are calculated explicitly, to quantify the benefits and drawbacks case by case. Comment: 33 pages, 11 figures
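For orientation only: the abstract does not state the paper's OAM formulation, so the block below gives the conventional free-space (Friis) link budget that such a formulation is normally compared against. It is the standard far-field baseline, not the paper's result. Here $P_t$ and $P_r$ are the transmitted and received powers, $G_t$ and $G_r$ the antenna gains, $\lambda$ the wavelength, and $d$ the link distance.

```latex
% Conventional Friis transmission equation (linear form)
P_r = P_t \, G_t \, G_r \left( \frac{\lambda}{4 \pi d} \right)^{2}

% Same relation in decibels; the last term is the free-space path loss
P_r^{\mathrm{dB}} = P_t^{\mathrm{dB}} + G_t^{\mathrm{dB}} + G_r^{\mathrm{dB}}
                  - 20 \log_{10}\!\left( \frac{4 \pi d}{\lambda} \right)
```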