46,678 research outputs found

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Full text link
    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific Programmin

    Developing efficient web-based GIS applications

    Get PDF
    There is an increase in the number of web-based GIS applications over the recent years. This paper describes different mapping technologies, database standards, and web application development standards that are relevant to the development of web-based GIS applications. Different mapping technologies for displaying geo-referenced data are available and can be used in different situations. This paper also explains why Oracle is the system of choice for geospatial applications that need to handle large amounts of data. Wireframing and design patterns have been shown to be useful in making GIS web applications efficient, scalable and usable, and should be an important part of every web-based GIS application. A range of different development technologies are available, and their use in different operating environments has been discussed here in some detail

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    Full text link
    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie

    A terahertz grid frequency doubler

    Get PDF
    We present a 144-element terahertz quasi-optical grid frequency doubler. The grid is a planar structure with bow-tie antennas as a unit cell, each loaded with a planar Schottky diode. The maximum output power measured for this grid is 24 mW at 1 THz for 3.1-ÎŒs 500-GHz input pulses with a peak input power of 47 W. An efficiency of 0.17% for an input power of 6.3 W and output power of 10.8 mW is measured. To date, this is the largest recorded output power for a multiplier at terahertz frequencies. Input and output tuning curves are presented and an output pattern is measured and compared to theory

    Devito: Towards a generic Finite Difference DSL using Symbolic Python

    Full text link
    Domain specific languages (DSL) have been used in a variety of fields to express complex scientific problems in a concise manner and provide automated performance optimization for a range of computational architectures. As such DSLs provide a powerful mechanism to speed up scientific Python computation that goes beyond traditional vectorization and pre-compilation approaches, while allowing domain scientists to build applications within the comforts of the Python software ecosystem. In this paper we present Devito, a new finite difference DSL that provides optimized stencil computation from high-level problem specifications based on symbolic Python expressions. We demonstrate Devito's symbolic API and performance advantages over traditional Python acceleration methods before highlighting its use in the scientific context of seismic inversion problems.Comment: pyHPC 2016 conference submissio

    Dynamics estimation and generalized tuning of stationary frame current controller for grid-tied power converters

    Get PDF
    The integration of AC-DC power converters to manage the connection of generation to the grid has increased exponentially over the last years. PV or wind generation plants are one of the main applications showing this trend. High power converters are increasingly installed for integrating the renewables in a larger scale. The control design for these converters becomes more challenging due to the reduced control bandwidth and increased complexity in the grid connection filter. A generalized and optimized control tuning approach for converters becomes more favored. This paper proposes an algorithm for estimating the dynamic performance of the stationary frame current controllers, and based on it a generalized and optimized tuning approach is developed. The experience-based specifications of the tuning inputs are not necessary through the tuning approach. Simulation and experimental results in different scenarios are shown to evaluate the proposal.Peer ReviewedPostprint (published version

    An Extensible Timing Infrastructure for Adaptive Large-scale Applications

    Full text link
    Real-time access to accurate and reliable timing information is necessary to profile scientific applications, and crucial as simulations become increasingly complex, adaptive, and large-scale. The Cactus Framework provides flexible and extensible capabilities for timing information through a well designed infrastructure and timing API. Applications built with Cactus automatically gain access to built-in timers, such as gettimeofday and getrusage, system-specific hardware clocks, and high-level interfaces such as PAPI. We describe the Cactus timer interface, its motivation, and its implementation. We then demonstrate how this timing information can be used by an example scientific application to profile itself, and to dynamically adapt itself to a changing environment at run time
    • 

    corecore