Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform-independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main targets are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (for both strong and weak scaling).
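The latency-bandwidth argument in the abstract can be sketched with a toy model (an illustration under stated assumptions, not Feltor's actual fitted model; the latency and bandwidth figures below are invented):

```python
# Illustrative latency-bandwidth performance model: T(s) = t_lat + s / B,
# where s is the array size in bytes, t_lat the latency and B the
# sustained bandwidth. Efficiency exceeds 50% exactly when the transfer
# time exceeds the latency, i.e. when s > t_lat * B.

def predicted_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Predicted execution time of a bandwidth-bound routine."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

def parallel_efficiency(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of time spent on useful data movement rather than latency."""
    transfer = size_bytes / bandwidth_bytes_per_s
    return transfer / (latency_s + transfer)

latency = 1e-5               # hypothetical 10 us latency
bandwidth = 100e9            # hypothetical 100 GB/s bandwidth
threshold = latency * bandwidth   # 1 MB per node in this example

print(parallel_efficiency(threshold, latency, bandwidth))        # 0.5 at the threshold
print(parallel_efficiency(10 * threshold, latency, bandwidth))   # well above 0.5
```

The threshold array size is exactly the product of latency and bandwidth, matching the rule of thumb stated in the abstract.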
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based, type-driven program transformations and a fast, accurate cost/performance model for the target architecture. We target streaming programs in the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages, and the cost model, and we demonstrate the effectiveness of our approach by comparison with a commercial toolchain.
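The kind of correctness-preserving, type-driven reshaping the abstract alludes to can be illustrated with a hypothetical example (not the authors' actual framework or notation): retyping a flat stream of length n*m as n chunks of m and distributing the map over the chunks exposes parallelism without changing the program's meaning.

```python
# Hypothetical sketch of a type-driven transformation: a value of type
# Vec (n*m) a is retyped as Vec n (Vec m a); mapping over the chunks is
# provably equivalent to mapping over the flat stream, so each inner map
# could be assigned to an independent hardware lane.

def reshape(xs, m):
    """Vec (n*m) a  ->  Vec n (Vec m a)"""
    assert len(xs) % m == 0
    return [xs[i:i + m] for i in range(0, len(xs), m)]

def flatten(chunks):
    """Vec n (Vec m a)  ->  Vec (n*m) a"""
    return [x for chunk in chunks for x in chunk]

def map_stream(f, xs):
    return [f(x) for x in xs]

def map_chunked(f, xs, m):
    # Distribute the map over the reshaped chunks, then flatten back.
    return flatten([map_stream(f, c) for c in reshape(xs, m)])

f = lambda x: 2 * x + 1
xs = list(range(12))
print(map_chunked(f, xs, 4) == map_stream(f, xs))  # True: semantics preserved
```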
MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor
This paper presents the architecture and implementation details of MORA, a novel coarse-grained reconfigurable processor for accelerating media processing applications. The MORA architecture comprises a 2-D array of such processors, delivering low-cost, high-throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of the hardware architecture and the low-level programming language throughout the design cycle. Implementation details for a single MORA processor and a benchmark evaluation using a cycle-accurate simulator are presented.
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphic processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented on the GPU, so that hundreds of patches can be advanced in
parallel. The computational overhead of data transfer between CPU and GPU is
carefully reduced by exploiting asynchronous memory copies, and the time spent
computing ghost-zone values for each patch is hidden by overlapping it with
the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.
Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ
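The overlap GAMER achieves with asynchronous CPU-GPU memory copies can be sketched with a host-side analogy in plain Python threads (an illustration of the double-buffering pattern, not GAMER's CUDA implementation; the timings are stand-ins):

```python
# Double buffering: while the "device" computes on one patch, the
# transfer of the next patch proceeds concurrently, so copy time is
# largely hidden behind compute time.

import time
from concurrent.futures import ThreadPoolExecutor

COPY_TIME = 0.02     # stand-in for a host-to-device transfer
COMPUTE_TIME = 0.05  # stand-in for a GPU kernel on one patch

def copy(patch):
    time.sleep(COPY_TIME)
    return patch

def compute(patch):
    time.sleep(COMPUTE_TIME)
    return patch * 2

def serial(patches):
    return [compute(copy(p)) for p in patches]

def overlapped(patches):
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(copy, patches[0])
        for nxt in patches[1:]:
            ready = pending.result()
            pending = copier.submit(copy, nxt)  # next transfer in flight
            results.append(compute(ready))      # overlaps with the copy
        results.append(compute(pending.result()))
    return results

patches = list(range(8))
t0 = time.perf_counter(); r1 = serial(patches);     t_serial = time.perf_counter() - t0
t0 = time.perf_counter(); r2 = overlapped(patches); t_overlap = time.perf_counter() - t0
print(r1 == r2, t_overlap < t_serial)  # same results, less wall time
```

With real CUDA code the same pattern uses pinned host memory, `cudaMemcpyAsync`, and multiple streams, but the scheduling idea is identical.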
Design and implementation of a multi-octave-band audio camera for realtime diagnosis
Noise pollution investigation takes advantage of two common methods of
diagnosis: measurement using a Sound Level Meter and acoustical imaging. The
former enables a detailed analysis of the surrounding noise spectrum whereas
the latter is mainly used for source localization. The two approaches
complement each other, and merging them into a single system working in
realtime would
offer new possibilities of dynamic diagnosis. This paper describes the design
of a complete system for this purpose: imaging in realtime the acoustic field
at different octave bands, with a convenient device. The acoustic field is
sampled in time and space using an array of MEMS microphones. This recent
technology enables a compact and fully digital design of the system. However,
performing realtime imaging with resource-intensive algorithms on a large
amount of measured data poses a technical challenge. It is overcome by
executing the whole process on a Graphics Processing Unit (GPU), which has
recently become an attractive device for parallel computing.
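The kind of array processing such an acoustic camera parallelizes can be sketched with minimal delay-and-sum beamforming (an illustration only; the paper's actual imaging algorithm, array geometry, and parameters are not specified here and all numbers below are invented):

```python
# A linear array of microphones records a plane wave; steering the array
# by compensating per-microphone delays maximizes output power in the
# true arrival direction. Each candidate angle is independent, which is
# what makes the computation GPU-friendly.

import numpy as np

fs = 50_000.0          # sample rate [Hz] (hypothetical)
c = 343.0              # speed of sound [m/s]
spacing = 0.02         # microphone spacing [m] (hypothetical)
n_mics = 8
f0 = 4_000.0           # source frequency [Hz]
true_angle = np.deg2rad(20.0)

t = np.arange(1024) / fs
mic_x = np.arange(n_mics) * spacing
delays = mic_x * np.sin(true_angle) / c          # per-mic arrival delays
signals = np.sin(2 * np.pi * f0 * (t[None, :] - delays[:, None]))

def beam_power(angle):
    """Delay-and-sum output power when steering towards `angle`."""
    steer = mic_x * np.sin(angle) / c
    shifts = np.round(steer * fs).astype(int)
    summed = sum(np.roll(signals[m], -shifts[m]) for m in range(n_mics))
    return float(np.mean(summed ** 2))

angles = np.deg2rad(np.linspace(-60, 60, 121))
best = angles[int(np.argmax([beam_power(a) for a in angles]))]
print(np.rad2deg(best))  # should peak near the true 20 degree direction
```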
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the design’s representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power efficiency using our cost-model-based approach.
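A lightweight cost model over an intermediate representation can be illustrated with a toy example (hypothetical; the per-operation numbers and the IR shape are invented, not the paper's actual model or language):

```python
# Each operation in a small expression IR carries an assumed latency
# (cycles) and resource cost (DSP slices). For a fully pipelined
# datapath, resources sum over the tree while pipeline depth follows
# the critical path.

OP_COSTS = {            # op: (latency_cycles, dsp_slices) -- invented
    "add": (1, 0),
    "mul": (3, 1),
    "div": (20, 4),
}

def cost(node):
    """Return (pipeline_depth_cycles, dsp_slices) for an IR tree.

    A node is either a leaf (variable name) or a tuple (op, lhs, rhs).
    Operands evaluate in parallel, so depth is the max of the operand
    depths plus the operation latency; resources simply add up.
    """
    if isinstance(node, str):
        return 0, 0
    op, lhs, rhs = node
    d1, r1 = cost(lhs)
    d2, r2 = cost(rhs)
    lat, dsp = OP_COSTS[op]
    return max(d1, d2) + lat, r1 + r2 + dsp

# a*b + c*d, e.g. one term of a stencil kernel
ir = ("add", ("mul", "a", "b"), ("mul", "c", "d"))
print(cost(ir))  # (4, 2): 4 cycles deep, 2 DSP slices
```

Traversing the IR once makes such estimates essentially free compared with running synthesis, which is the point of a design-space-exploration cost model.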
Improving the Accuracy and Scope of Control-Oriented Vapor Compression Cycle System Models
The benefits of applying advanced control techniques to vapor compression cycle systems are well known.
The main advantages are improved performance and efficiency, the achievement of which brings both economic and
environmental gains. One of the most significant hurdles to the practical application of advanced control techniques
is the development of a dynamic system level model that is both accurate and mathematically tractable. Previous
efforts in control-oriented modeling have produced a class of heat exchanger models known as moving-boundary
models. When combined with mass flow device models, these moving-boundary models provide an excellent
framework for both dynamic analysis and control design. This thesis contains the results of research carried out to
increase both the accuracy and scope of these system level models.
The improvements to the existing vapor compression cycle models are carried out through the application
of various modeling techniques, some static and some dynamic, some data-based and some physics-based. Semiempirical
static modeling techniques are used to increase the accuracy of both heat exchangers and mass flow
devices over a wide range of operating conditions. Dynamic modeling techniques are used both to derive new
component models that are essential to the simulation of very common vapor compression cycle systems and to
improve the accuracy of the existing compressor model. A new heat exchanger model that accounts for the effects
of moisture in the air is presented. All of these model improvements and additions are unified to create a simple but
accurate system level model with a wide range of application. Extensive model validation results are presented,
providing both qualitative and quantitative evaluation of the new models and model improvements.
Air Conditioning and Refrigeration Project 17
Adaptive computational methods for aerothermal heating analysis
The development of adaptive gridding techniques for finite-element analysis of fluid dynamics equations is described. The developmental work was done with the Euler equations, with a focus on capturing shocks and inviscid flow fields. Ultimately this methodology is to be applied to a viscous analysis for the purpose of predicting accurate aerothermal loads on complex shapes subjected to high-speed flow environments. The development of local error estimates as a basis for refinement strategies is discussed, as are the refinement strategies themselves. The application of the strategies to triangular elements and a finite-element flux-corrected-transport numerical scheme is presented. The implementation of these strategies in the GIM/PAGE code for 2-D and 3-D applications is documented and demonstrated.
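Error-indicator-driven refinement of the kind described above can be sketched in one dimension (an illustration, not the GIM/PAGE implementation; the indicator, tolerance, and test profile are invented):

```python
# Cells whose local solution jump exceeds a tolerance are split in half,
# concentrating resolution near a sharp feature such as a captured shock.

import math

def refine(cells, u, tol):
    """Split every cell (a, b) whose endpoint jump |u(b) - u(a)| > tol."""
    out = []
    for a, b in cells:
        if abs(u(b) - u(a)) > tol:
            m = 0.5 * (a + b)
            out.extend([(a, m), (m, b)])
        else:
            out.append((a, b))
    return out

# A steep tanh profile as a stand-in for a captured shock at x = 0.5.
u = lambda x: math.tanh(50.0 * (x - 0.5))

cells = [(i / 8, (i + 1) / 8) for i in range(8)]   # uniform initial mesh
for _ in range(4):                                  # a few adapt cycles
    cells = refine(cells, u, tol=0.2)

widths = [b - a for a, b in cells]
finest = min(cells, key=lambda c: c[1] - c[0])
print(len(cells), finest)  # the smallest cells cluster around x = 0.5
```

Cells away from the shock keep their original width, so the total cell count grows far more slowly than with uniform refinement.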
Link-wise Artificial Compressibility Method
The Artificial Compressibility Method (ACM) for the incompressible
Navier-Stokes equations is (link-wise) reformulated (referred to as LW-ACM) by
a finite set of discrete directions (links) on a regular Cartesian mesh, in
analogy with the Lattice Boltzmann Method (LBM). The main advantage is the
possibility of exploiting well established technologies originally developed
for LBM and classical computational fluid dynamics, with special emphasis on
finite differences (at least in the present paper), at the cost of minor
changes. For instance, wall boundaries not aligned with the background
Cartesian mesh can be taken into account by tracing the intersections of each
link with the wall (analogously to LBM technology). LW-ACM requires no
high-order moments beyond hydrodynamics (often referred to as ghost moments)
and no kinetic expansion. Like finite difference schemes, only standard Taylor
expansion is needed for analyzing consistency. Preliminary efforts towards
optimal implementations have shown that LW-ACM is capable of similar
computational speed as optimized (BGK-) LBM. In addition, the memory demand is
significantly smaller than (BGK-) LBM. Importantly, with an efficient
implementation, this algorithm may be one of the few that are compute-bound
rather than memory-bound. Two- and three-dimensional benchmarks are
investigated, and an extensive comparative study between the present approach
and state-of-the-art methods from the literature is carried out. Numerical
evidence suggests that LW-ACM represents an excellent alternative in terms of
simplicity, stability and accuracy.
Comment: 62 pages, 20 figures
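The link-wall intersection tracing mentioned above can be sketched as follows (an illustration of the general idea, not the authors' code; the D2Q9 link set is standard, while the wall shape and function names are invented):

```python
# On a D2Q9-like set of discrete directions (links), find the fraction
# q in [0, 1] along each link from a fluid node at which the wall is
# crossed, as used by LBM-style interpolated bounce-back boundaries.

# The nine discrete directions (links) of a D2Q9 stencil.
LINKS = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
         (1, 1), (-1, 1), (-1, -1), (1, -1)]

def link_wall_fraction(node, link, inside):
    """Fraction q along `link` where the fluid/wall interface is crossed.

    `inside(x, y)` is True in the fluid. Returns None if the whole link
    stays in the fluid; otherwise bisects the crossing point.
    """
    x0, y0 = node
    dx, dy = link
    if inside(x0 + dx, y0 + dy):
        return None                       # link endpoint still in fluid
    lo, hi = 0.0, 1.0
    for _ in range(60):                   # bisection to machine precision
        mid = 0.5 * (lo + hi)
        if inside(x0 + mid * dx, y0 + mid * dy):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Circular wall of radius 2 centred at the origin; fluid is the interior.
inside = lambda x, y: x * x + y * y < 4.0
q = link_wall_fraction((1.5, 0.0), (1, 0), inside)
print(q)  # 0.5: the wall at x = 2 lies halfway along the link from x = 1.5
```

The fractions q feed the boundary rule for each cut link, which is how curved walls not aligned with the Cartesian mesh are handled without body-fitted grids.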