1,063 research outputs found

    Programmable remapper for image processing

    A video-rate coordinate remapper includes a memory for storing a plurality of transformations as look-up tables for remapping input images from one coordinate system to another. The transformations are operator-selectable. The remapper includes a collective processor by which certain input pixels of an input image are transformed to a portion of the output image in a many-to-one relationship. It also includes an interpolative processor by which the remaining input pixels of the input image are transformed to another portion of the output image in a one-to-many relationship. The invention includes specific transforms for creating output images that compensate for certain visual defects of visually impaired people. The invention also includes means for shifting input pixels and means for scrolling the output matrix.
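
The many-to-one and one-to-many relationships described in this abstract can be illustrated with a small look-up-table remap. This is only a sketch under stated assumptions, not the patented design: the functions `build_scale_table` and `remap` are hypothetical names, the table stores a list of source pixels for each output pixel, many-to-one regions are averaged, and one-to-many regions simply replicate the nearest source pixel instead of interpolating.

```python
def build_scale_table(in_h, in_w, out_h, out_w):
    """Precompute, for every output pixel, the input pixels that feed it.

    Hypothetical helper: a plain rescaling stands in for an arbitrary
    operator-selected transformation.
    """
    table = []
    for oy in range(out_h):
        y0 = oy * in_h // out_h
        y1 = max(y0 + 1, (oy + 1) * in_h // out_h)
        row = []
        for ox in range(out_w):
            x0 = ox * in_w // out_w
            x1 = max(x0 + 1, (ox + 1) * in_w // out_w)
            # Several sources per output pixel: many-to-one (collective).
            # One source shared by several outputs: one-to-many.
            row.append([(y, x) for y in range(y0, y1) for x in range(x0, x1)])
        table.append(row)
    return table


def remap(image, table):
    """Apply a precomputed table; each output is the mean of its sources."""
    return [
        [sum(image[y][x] for y, x in sources) / len(sources) for sources in row]
        for row in table
    ]


# Shrinking: four input pixels collapse into each output pixel.
small = remap([[0, 2, 4, 6], [8, 10, 12, 14]], build_scale_table(2, 4, 1, 2))
# Enlarging: each input pixel is spread over two output pixels.
large = remap([[1, 3]], build_scale_table(1, 2, 1, 4))
```

Because the table is built once and only looked up per frame, switching transformations amounts to switching tables, which is the property the abstract attributes to the stored look-up tables.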

    MUSIC - Multisimulation Coordinator: Request For Comments

    MUSIC is an API that allows large-scale neuron simulators that use MPI internally to exchange data at runtime. MUSIC provides mechanisms to transfer massive amounts of event information and continuous values from one parallel application to another. Special care has been taken to ensure that existing simulators can be adapted to MUSIC. In particular, MUSIC handles data transfer between applications that use different time steps and different data allocation strategies. This RFC (Request For Comments) document invites comments on the proposed design and prototype specifications.

    Optimal Compilation of HPF Remappings

    Applications with varying array access patterns require array mappings to be changed dynamically on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that can be replicated, explicitly through the realign and redistribute directives and implicitly at procedure calls and returns. However, such features are left out of the HPF subset and of the currently discussed HPF kernel for efficiency reasons. This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replications to shorten the remapping time. It is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a DEC Alpha farm.
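
To make the idea of sending "only the required data" concrete, the sketch below computes which array elements must travel between processors when a one-dimensional array is remapped from a block distribution to a cyclic one. It is an illustration under stated assumptions, not the HPFC algorithm: the function names are hypothetical, the array is not replicated, and the communication sets are enumerated element by element rather than derived symbolically by a compiler.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling division: size of each block
    return i // block


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin."""
    return i % p


def communication_sets(n, p):
    """Group the elements that change owner by (source, destination) pair.

    Elements that stay on the same processor are skipped, so each pair
    of processors exchanges at most one message holding only the data
    the destination actually needs.
    """
    messages = {}
    for i in range(n):
        src = block_owner(i, n, p)
        dst = cyclic_owner(i, p)
        if src != dst:
            messages.setdefault((src, dst), []).append(i)
    return messages


# Eight elements on two processors: half of the elements stay in place.
msgs = communication_sets(8, 2)
```

In this example only two messages are needed, one in each direction, each carrying two elements.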

    Numerical simulation of friction welding processes: An arbitrary Lagrangian-Eulerian approach

    The development and implementation of a finite strain thermo-viscoplasticity solver with thermomechanical friction contact for numerical simulation of friction welding processes are described. A finite strain associative coupled thermoplasticity model is used, which is suited for the large deformations characteristic of friction welding processes, and which resolves the viscoplastic deformations in the thermomechanically affected zone as well as the elastic stresses in the parent material. To prevent the large deformations from causing large distortions and degrading the simulation accuracy, an arbitrary Lagrangian-Eulerian (ALE) formulation for coupled finite strain thermoplasticity is developed and incorporated into the solver, in which the motion of the reference configuration is represented incrementally in terms of a reference velocity field. Thus, the deformation from the material configuration is required neither explicitly in terms of a deformation field, nor implicitly in terms of the deformation gradient. The solver is implemented using the deal.II library and programmed for distributed memory parallel computing architectures, which reduces simulation run times and enables simulations with larger meshes than would fit on a single computer. The interprocess communications required in such a distributed memory parallel implementation of the ALE formulation and the thermomechanical friction contact are described and implemented. The axisymmetric solver implementation is validated with benchmark problems and used to simulate a direct drive friction welding process.

    Layered architecture for quantum computing

    We develop a layered quantum computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface code quantum error correction. In doing so, we propose a new quantum computer architecture based on optical control of quantum dots. The timescales of physical hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum dot architecture we study could solve such problems on the timescale of days. Comment: 27 pages, 20 figures

    Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

    This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination. Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00 and 2.01× that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. And by emitting code that avoids materializing temporaries, which both libraries need for many combinations of source and target formats, our technique outperforms those libraries by 1.78 to 4.01× for CSC/COO to DIA/ELL conversion. Comment: Presented at PLDI 202
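
The three phases named in this abstract (coordinate remapping, analysis, assembly) can be sketched by hand for one conversion, COO to a diagonal (DIA) layout. This is a hand-written illustration, not code emitted by the paper's compiler: the function name `coo_to_dia` is hypothetical, the remapping (i, j) -> (j - i, i) is an assumption chosen because it groups nonzeros by diagonal, and the diagonals here are indexed by row, which differs from some libraries' DIA conventions.

```python
def coo_to_dia(nrows, coords, vals):
    """Convert COO triples to a row-indexed diagonal layout.

    coords: list of (row, col) pairs; vals: matching nonzero values.
    Returns (offsets, data), where data[d][i] holds the entry at
    row i, column i + offsets[d], or 0.0 if that entry is not stored.
    """
    # Phase 1, coordinate remapping: express each nonzero by the diagonal
    # it lies on (offset k = j - i) and its position along it (row i).
    remapped = [(j - i, i) for (i, j) in coords]

    # Phase 2, analysis: a statistic about the tensor, here the set of
    # diagonals that contain at least one nonzero.
    offsets = sorted({k for k, _ in remapped})
    slot = {k: d for d, k in enumerate(offsets)}

    # Phase 3, assembly: allocate storage sized by the analysis result
    # and scatter the values into place.
    data = [[0.0] * nrows for _ in offsets]
    for (k, i), v in zip(remapped, vals):
        data[slot[k]][i] = v
    return offsets, data


# A 3x3 matrix with a full main diagonal plus two off-diagonal entries.
coords = [(0, 0), (1, 1), (2, 2), (0, 1), (2, 0)]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
offsets, data = coo_to_dia(3, coords, vals)
```

Keeping the analysis result (the list of occupied diagonals) separate from assembly is what lets the output be allocated exactly once, without an intermediate copy of the tensor.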