15,919 research outputs found

    CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

    Get PDF
    The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendorneutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to- OpenCL source-to-source translator that possesses a novel design and clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and Rodinia benchmark suite. The performance of our automatically translated applications via CU2CL is on par with their manually ported countparts

    Experiences in porting mini-applications to OpenACC and OpenMP on heterogeneous systems

    Get PDF
    This article studies mini-applications—Minisweep, GenASiS, GPP, and FF—that use computational methods commonly encountered in HPC. We have ported these applications to develop OpenACC and OpenMP versions, and evaluated their performance on Titan (Cray XK7 with K20x GPUs), Cori (Cray XC40 with Intel KNL), Summit (IBM AC922 with Volta GPUs), and Cori-GPU (Cray CS-Storm 500NX with Intel Skylake and Volta GPUs). Our goals are for these new ports to be useful to both application and compiler developers, to document and describe the lessons learned and the methodology to create optimized OpenMP and OpenACC versions, and to provide a description of possible migration paths between the two specifications. Cases where specific directives or code patterns result in improved performance for a given architecture are highlighted. We also include discussions of the functionality and maturity of the latest compilers available on the above platforms with respect to OpenACC or OpenMP implementations

    Revisiting Actor Programming in C++

    Full text link
    The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that subsumes full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additives to the original model. In this paper, we report on designing and building CAF, the "C++ Actor Framework". CAF targets at providing a concurrent and distributed native environment for scaling up to very large, high-performance applications, and equally well down to small constrained systems. We present the key specifications and design concepts---in particular a message-transparent architecture, type-safe message interfaces, and pattern matching facilities---that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first for elastic, upscaling environments, second for including heterogeneous hardware like GPGPUs, and third for distributed runtime systems. Extensive performance evaluations indicate ideal runtime behaviour for up to 64 cores at very low memory footprint, or in the presence of GPUs. In these tests, CAF continuously outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the OpenMPI.Comment: 33 page

    Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

    Full text link
    Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but their implementation in real code is tedious, especially when optimized for real-time performance on modern GPUs in interactive applications. In this work, we propose a new language, Opt (available under http://optlang.org), for writing these objective functions over image- or graph-structured unknowns concisely and at a high level. Our compiler automatically transforms these specifications into state-of-the-art GPU solvers based on Gauss-Newton or Levenberg-Marquardt methods. Opt can generate different variations of the solver, so users can easily explore tradeoffs in numerical precision, matrix-free methods, and solver approaches. In our results, we implement a variety of real-world graphics and vision applications. Their energy functions are expressible in tens of lines of code, and produce highly-optimized GPU solver implementations. These solver have performance competitive with the best published hand-tuned, application-specific GPU solvers, and orders of magnitude beyond a general-purpose auto-generated solver

    Nmag micromagnetic simulation tool - software engineering lessons learned

    Full text link
    We review design and development decisions and their impact for the open source code Nmag from a software engineering in computational science point of view. We summarise lessons learned and recommendations for future computational science projects. Key lessons include that encapsulating the simulation functionality in a library of a general purpose language, here Python, provides great flexibility in using the software. The choice of Python for the top-level user interface was very well received by users from the science and engineering community. The from-source installation in which required external libraries and dependencies are compiled from a tarball was remarkably robust. In places, the code is a lot more ambitious than necessary, which introduces unnecessary complexity and reduces main- tainability. Tests distributed with the package are useful, although more unit tests and continuous integration would have been desirable. The detailed documentation, together with a tutorial for the usage of the system, was perceived as one of its main strengths by the community.Comment: 7 pages, 5 figures, Software Engineering for Science, ICSE201
    • …
    corecore