171,446 research outputs found

    Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures

    Feltor is a modular and free scientific software package. It allows developing platform-independent code that runs on a variety of parallel computer architectures, ranging from laptop CPUs to multi-GPU distributed-memory systems. Feltor consists of both a numerical library and a collection of application codes built on top of the library. Its main targets are two- and three-dimensional drift- and gyro-fluid simulations, with discontinuous Galerkin methods as the main numerical discretization technique. We observe that numerical simulations of a recently developed gyro-fluid model produce non-deterministic results in parallel computations. First, we show how we restore accuracy and bitwise reproducibility algorithmically and programmatically. In particular, we adopt an implementation of the exactly rounded dot product based on long accumulators, which avoids accuracy losses, especially in parallel applications. However, reproducibility and accuracy alone fail to indicate correct simulation behaviour. In fact, in the physical model slightly different initial conditions lead to vastly different end states. This behaviour translates to its numerical representation. Pointwise convergence, even in principle, becomes impossible for long simulation times. In a second part, we explore important performance-tuning considerations. We identify latency and memory bandwidth as the main performance indicators of our routines. Based on these, we propose a parallel performance model that predicts the execution time of algorithms implemented in Feltor, and we test our model on a selection of parallel hardware architectures. We are able to predict the execution time with a relative error of less than 25% for problem sizes between 0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth gives a minimum array size per compute node needed to achieve a scaling efficiency above 50% (both strong and weak).
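The long-accumulator dot product mentioned above makes the result independent of summation order, which is what restores bitwise reproducibility across parallel reduction schedules. As a minimal illustration of the underlying idea (accumulate exactly, round once at the end), the sketch below uses Python's exact rational arithmetic rather than Feltor's actual long-accumulator implementation; `exact_dot` is an illustrative name:

```python
from fractions import Fraction

def exact_dot(xs, ys):
    # Accumulate all products exactly in rational arithmetic; the result
    # does not depend on the order of accumulation, unlike float sums.
    acc = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    # A single rounding to double precision happens only here.
    return float(acc)

# Cancellation that a naive, order-dependent float dot product gets wrong:
xs, ys = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
naive = sum(x * y for x, y in zip(xs, ys))  # 0.0: the 1.0 is absorbed
exact = exact_dot(xs, ys)                   # 1.0: exactly rounded
```

Because the exact accumulation commutes, any parallel partitioning of the sum yields bit-identical results; a real long accumulator achieves the same property with fixed-point hardware-friendly arithmetic instead of rationals.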

    Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs

    In this paper we present a novel approach to program optimisation based on compiler-based, type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs in the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages, and the cost model, and we demonstrate the effectiveness of our approach by comparison with a commercial toolchain.

    MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor

    This paper presents an architecture and implementation details for MORA, a novel coarse-grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors to deliver low-cost, high-throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of the hardware architecture and a low-level programming language throughout the design cycle. Implementation details for the single MORA processor and a benchmark evaluation using a cycle-accurate simulator are presented.

    GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

    We present the newly developed code GAMER (GPU-accelerated Adaptive MEsh Refinement code), which adopts a novel approach to improving the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor through the use of the graphics processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented on the GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with data transfer between CPU and GPU is carefully reduced by utilizing the GPU's capability for asynchronous memory copies, and the computing time of the ghost-zone values for each patch is hidden by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run on a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations on different hardware configurations, in which detailed timing analyses provide comparisons between computations with and without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively.
    Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included. Accepted for publication in ApJ.
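The oct-tree patch hierarchy described above means that refining a patch replaces it with eight child patches, each covering one octant at half the edge length. A toy sketch of that bookkeeping (the `Patch` class and its fields are illustrative, not GAMER's actual data structure):

```python
class Patch:
    """One node of an oct-tree grid-patch hierarchy (toy version)."""
    def __init__(self, center, size, level=0):
        self.center = center   # (x, y, z) of the patch centre
        self.size = size       # edge length of the cubic patch
        self.level = level     # refinement level (root = 0)
        self.children = []     # empty, or exactly 8 child patches

    def refine(self):
        # Split the cube into 8 octants, each with half the edge length.
        cx, cy, cz = self.center
        h = self.size / 4      # offset from centre to each octant centre
        self.children = [
            Patch((cx + dx, cy + dy, cz + dz), self.size / 2, self.level + 1)
            for dx in (-h, h) for dy in (-h, h) for dz in (-h, h)
        ]

root = Patch((0.0, 0.0, 0.0), 1.0)
root.refine()  # 8 children of size 0.5 at level 1
```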

    Design and implementation of a multi-octave-band audio camera for realtime diagnosis

    Noise pollution investigation takes advantage of two common methods of diagnosis: measurement using a sound level meter, and acoustical imaging. The former enables a detailed analysis of the surrounding noise spectrum, whereas the latter is used rather for source localization. The two approaches complement each other, and merging them into a single system working in real time would offer new possibilities for dynamic diagnosis. This paper describes the design of a complete system for this purpose: imaging the acoustic field in real time at different octave bands, with a convenient device. The acoustic field is sampled in time and space using an array of MEMS microphones. This recent technology enables a compact and fully digital design of the system. However, performing real-time imaging with resource-intensive algorithms on a large amount of measured data poses a technical challenge. This is overcome by executing the whole process on a Graphics Processing Unit, which has recently become an attractive device for parallel computing.
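Acoustic imaging with a microphone array typically localizes sources by beamforming: each channel is delayed to compensate for its propagation time from a candidate direction, and the channels are summed so that a source in that direction adds coherently. A minimal pure-Python sketch with integer-sample delays (the paper's GPU pipeline is far more elaborate; `delay_and_sum` is an illustrative name):

```python
def delay_and_sum(signals, delays_samples):
    """Steer the array by delaying each channel, then averaging.

    signals:        list of equal-length channel sample lists
    delays_samples: integer delay per channel (negative = advance)
    """
    n_mics = len(signals)
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays_samples):
        for i in range(n):
            j = i - d
            if 0 <= j < n:          # samples shifted out of range are dropped
                out[i] += sig[j]
    return [v / n_mics for v in out]

# Channel 1 hears the same impulse 2 samples later than channel 0;
# advancing it by 2 samples realigns the two channels coherently.
aligned = delay_and_sum([[0, 0, 1, 0, 0, 0],
                         [0, 0, 0, 0, 1, 0]], [0, -2])
```

Scanning a grid of candidate directions and mapping the output power of each steered beam is what produces the acoustic image; that scan over beams and frequency bands is the part that benefits from GPU parallelism.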

    A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

    Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations for generating correct-by-construction design variants, and an associated lightweight cost model for evaluating these variants for implementation on FPGAs. In this paper we present a key enabler of our approach: the cost model. We discuss how we are able to quickly derive accurate estimates of performance and resource utilization from the design’s representation in our intermediate language. We show results confirming the accuracy of our cost model by testing it on three different scientific kernels. We conclude with a case study that compares a solution generated by our framework with one from a conventional high-level synthesis tool, showing better performance and power efficiency using our cost-model-based approach.
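A lightweight cost model of the kind described can be as simple as attaching a cycle cost to each intermediate-representation operation and combining the costs according to how the design executes: for a fully pipelined streaming kernel, the total latency is roughly the pipeline depth plus one cycle per data item. The cycle figures and function names below are illustrative assumptions, not the paper's calibrated model:

```python
# Hypothetical per-operation pipeline latencies, in clock cycles.
OP_CYCLES = {"load": 2, "store": 2, "add": 1, "mul": 3, "div": 10}

def estimate_depth(ir_ops):
    """Pipeline depth of a linear chain of IR operations (in cycles)."""
    return sum(OP_CYCLES[op] for op in ir_ops)

def estimate_runtime(ir_ops, n_items, clock_hz):
    """Runtime of a fully pipelined streaming kernel: fill the pipeline
    once (depth cycles), then retire one item per cycle."""
    return (estimate_depth(ir_ops) + n_items) / clock_hz

# A toy multiply-accumulate-style kernel over 100 items at 100 MHz:
t = estimate_runtime(["load", "mul", "add", "store"], 100, 100e6)
```

Because the estimate is a closed-form function of the IR, thousands of transformation-generated design variants can be ranked in seconds, without running synthesis on each one.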

    Improving the Accuracy and Scope of Control-Oriented Vapor Compression Cycle System Models

    The benefits of applying advanced control techniques to vapor compression cycle systems are well known. The main advantages are improved performance and efficiency, the achievement of which brings both economic and environmental gains. One of the most significant hurdles to the practical application of advanced control techniques is the development of a dynamic, system-level model that is both accurate and mathematically tractable. Previous efforts in control-oriented modeling have produced a class of heat exchanger models known as moving-boundary models. When combined with mass flow device models, these moving-boundary models provide an excellent framework for both dynamic analysis and control design. This thesis contains the results of research carried out to increase both the accuracy and the scope of these system-level models. The improvements to the existing vapor compression cycle models are carried out through the application of various modeling techniques, some static and some dynamic, some data-based and some physics-based. Semiempirical static modeling techniques are used to increase the accuracy of both heat exchangers and mass flow devices over a wide range of operating conditions. Dynamic modeling techniques are used both to derive new component models that are essential to the simulation of very common vapor compression cycle systems and to improve the accuracy of the existing compressor model. A new heat exchanger model that accounts for the effects of moisture in the air is presented. All of these model improvements and additions are unified to create a simple but accurate system-level model with a wide range of application. Extensive model validation results are presented, providing both qualitative and quantitative evaluation of the new models and model improvements.
    Air Conditioning and Refrigeration Project 17

    Adaptive computational methods for aerothermal heating analysis

    The development of adaptive gridding techniques for finite-element analysis of fluid dynamics equations is described. The developmental work was done with the Euler equations, with concentration on capturing shocks and inviscid flow fields. Ultimately this methodology is to be applied to a viscous analysis for the purpose of predicting accurate aerothermal loads on complex shapes subjected to high-speed flow environments. The development of local error estimate strategies as a basis for refinement strategies is discussed, as well as the refinement strategies themselves. The application of the strategies to triangular elements and a finite-element flux-corrected-transport numerical scheme is presented. The implementation of these strategies in the GIM/PAGE code for 2-D and 3-D applications is documented and demonstrated.
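A local error-estimate-driven refinement strategy of the kind described reduces, in its simplest 1-D analogue, to flagging elements where an error indicator (here, the solution jump between neighbouring cells, which is large near shocks) exceeds a threshold; flagged elements are then subdivided. This sketch is a generic illustration, not the GIM/PAGE implementation:

```python
def flag_for_refinement(cell_values, threshold):
    """Flag each inter-cell interface whose solution jump exceeds the
    threshold; elements adjacent to flagged interfaces would be refined."""
    return [abs(cell_values[i + 1] - cell_values[i]) > threshold
            for i in range(len(cell_values) - 1)]

# A discrete shock between cells 1 and 2 triggers refinement only there:
flags = flag_for_refinement([0.0, 0.0, 1.0, 1.0], 0.5)
```

In the adaptive loop, the solver alternates between advancing the solution and re-evaluating the indicator, so the fine grid tracks the shock as it moves.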

    Link-wise Artificial Compressibility Method

    The Artificial Compressibility Method (ACM) for the incompressible Navier-Stokes equations is (link-wise) reformulated (referred to as LW-ACM) by a finite set of discrete directions (links) on a regular Cartesian mesh, in analogy with the Lattice Boltzmann Method (LBM). The main advantage is the possibility of exploiting well-established technologies originally developed for LBM and classical computational fluid dynamics, with special emphasis on finite differences (at least in the present paper), at the cost of minor changes. For instance, wall boundaries not aligned with the background Cartesian mesh can be taken into account by tracing the intersections of each link with the wall (analogously to LBM technology). LW-ACM requires no high-order moments beyond hydrodynamics (often referred to as ghost moments) and no kinetic expansion. As with finite difference schemes, only a standard Taylor expansion is needed for analyzing consistency. Preliminary efforts towards optimal implementations have shown that LW-ACM is capable of computational speed similar to that of optimized (BGK-) LBM. In addition, the memory demand is significantly smaller than that of (BGK-) LBM. Importantly, with an efficient implementation, this algorithm may be one of the few that are compute-bound rather than memory-bound. Two- and three-dimensional benchmarks are investigated, and an extensive comparative study between the present approach and state-of-the-art methods from the literature is carried out. Numerical evidence suggests that LW-ACM represents an excellent alternative in terms of simplicity, stability and accuracy.
    Comment: 62 pages, 20 figures
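In the artificial compressibility method, the incompressibility constraint is relaxed by evolving pressure in pseudo-time according to dp/dτ = -c² ∇·u, so the velocity divergence is driven toward zero as the pseudo-time iteration converges. A minimal central-difference sketch of that pressure update on a uniform 2-D grid (illustrative of classical ACM only; LW-ACM itself works with a finite set of links, in analogy with LBM):

```python
def acm_pressure_step(p, u, v, dx, dtau, c2):
    """One artificial-compressibility pseudo-time update of pressure:
    dp/dtau = -c2 * div(u), central differences, interior nodes only."""
    ny, nx = len(p), len(p[0])
    p_new = [row[:] for row in p]  # boundary values are left unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            div = ((u[j][i + 1] - u[j][i - 1]) +
                   (v[j + 1][i] - v[j - 1][i])) / (2.0 * dx)
            p_new[j][i] = p[j][i] - dtau * c2 * div
    return p_new

# A divergence-free field (uniform flow) leaves the pressure unchanged:
p0 = [[0.0] * 4 for _ in range(4)]
u0 = [[1.0] * 4 for _ in range(4)]
v0 = [[0.0] * 4 for _ in range(4)]
p1 = acm_pressure_step(p0, u0, v0, 1.0, 0.1, 1.0)
```

At steady state of the pseudo-time iteration, the update is zero everywhere, which is exactly the discrete statement that ∇·u = 0.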