Radiative transfer on hierarchial grids
We present new methods for radiative transfer on hierarchical grids. We
develop a new method for calculating the scattered flux that employs the grid
structure to speed up the computation. We describe a novel subiteration
algorithm that can be used to accelerate calculations with strong dust
temperature self-coupling. We compute two test models, a molecular cloud and a
circumstellar disc, and compare the accuracy and speed of the new algorithms
against existing methods. An adaptive model of the molecular cloud with less
than 8 % of the cells in the uniform grid produced results in good agreement
with the full resolution model. The relative RMS error of the surface
brightness was <4 % at all wavelengths, and in regions of high column density
the relative RMS error was only 10^{-4}. Computation with the adaptive model was
faster by a factor of ~5. The new method for calculating the scattered flux is
faster by a factor of ~4 in large models with a deep hierarchy structure, when
images of the scattered light are computed towards several observing
directions. The efficiency of the subiteration algorithm is highly dependent on
the details of the model. In the circumstellar disc test the speed-up was a
factor of two, but much larger gains are possible. The algorithm is expected to
be most beneficial in models where a large number of small, dense regions are
embedded in an environment with a lower mean density.
Comment: Accepted to A&A; 13 pages, 8 figures; (v2: minor typos corrected)
Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification
Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and
contain specialized semi-programmable accelerators in addition to programmable
processors. In contrast to the pre-accelerator era, when the ISA played an
important role in verification by enabling a clean separation of concerns
between software and hardware, verification of these "accelerator-rich" SoCs
presents new challenges. From the perspective of hardware designers, there is a
lack of a common framework for the formal functional specification of
accelerator behavior. From the perspective of software developers, there exists
no unified framework for reasoning about software/hardware interactions of
programs that interact with accelerators. This paper addresses these challenges
by providing a formal specification and high-level abstraction for accelerator
functional behavior. It formalizes the concept of an Instruction Level
Abstraction (ILA), developed informally in our previous work, and shows its
application in modeling and verification of accelerators. This formal ILA
extends the familiar notion of instructions to accelerators and provides a
uniform, modular, and hierarchical abstraction for modeling software-visible
behavior of both accelerators and programmable processors. We demonstrate the
applicability of the ILA through several case studies of accelerators (for
image processing, machine learning, and cryptography), and a general-purpose
processor (RISC-V). We show how the ILA model facilitates equivalence checking
between two ILAs, and between an ILA and its hardware finite-state machine
(FSM) implementation. Further, this equivalence checking supports accelerator
upgrades using the notion of ILA compatibility, similar to processor upgrades
using ISA compatibility.
Comment: 24 pages, 3 figures, 3 tables
Adaptive Mesh Fluid Simulations on GPU
We describe an implementation of compressible inviscid fluid solvers with
block-structured adaptive mesh refinement on Graphics Processing Units using
NVIDIA's CUDA. We show that a class of high resolution shock capturing schemes
can be mapped naturally on this architecture. Using the method of lines
approach with the second order total variation diminishing Runge-Kutta time
integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer
Riemann solver, we achieve an overall speedup of approximately a factor of 10
on one graphics card compared to a single core on the host
computer. We attain this speedup in uniform grid runs as well as in problems
with deep AMR hierarchies. Our framework can readily be applied to more general
systems of conservation laws and extended to higher order shock capturing
schemes. We show this directly by implementing a magneto-hydrodynamic
solver and comparing its performance to the pure hydrodynamic case. Finally, we
also combined our CUDA parallel scheme with MPI to make the code run on GPU
clusters. Close to ideal speedup is observed on up to four GPUs.
Comment: Submitted to New Astronomy
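As a rough illustration of the scheme named in the abstract above (method of lines, second-order TVD Runge-Kutta time integration, piecewise linear reconstruction, HLL Riemann solver), the following is a minimal serial 1D sketch in plain Python. It is not the authors' CUDA code. The periodic boundary, the minmod limiter, reconstruction on conserved variables, the simple min/max wave-speed estimates, the adiabatic index of 1.4, and all function names are assumptions made for illustration only.

```python
GAMMA = 1.4  # assumed ideal-gas adiabatic index (not taken from the paper)


def minmod(a, b):
    """Slope limiter: zero at local extrema, else the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def flux_and_speeds(u):
    """Physical Euler flux of state u = (rho, momentum, energy), plus v-c and v+c."""
    rho, mom, ene = u
    v = mom / rho
    p = (GAMMA - 1.0) * (ene - 0.5 * rho * v * v)
    c = (GAMMA * p / rho) ** 0.5
    return (mom, mom * v + p, (ene + p) * v), v - c, v + c


def hll_flux(ul, ur):
    """HLL approximate Riemann flux between left and right interface states."""
    fl, l_minus, l_plus = flux_and_speeds(ul)
    fr, r_minus, r_plus = flux_and_speeds(ur)
    sl = min(l_minus, r_minus)  # assumed simple estimate of the slowest wave
    sr = max(l_plus, r_plus)    # assumed simple estimate of the fastest wave
    if sl >= 0.0:
        return fl
    if sr <= 0.0:
        return fr
    return tuple(
        (sr * a - sl * b + sl * sr * (qr - ql)) / (sr - sl)
        for a, b, ql, qr in zip(fl, fr, ul, ur)
    )


def rhs(u, dx):
    """Semi-discrete dU/dt on a periodic 1D grid (the method-of-lines step)."""
    n = len(u)
    # Limited slope per cell and per conserved variable (piecewise linear).
    slopes = [
        tuple(minmod(u[i][k] - u[i - 1][k], u[(i + 1) % n][k] - u[i][k])
              for k in range(3))
        for i in range(n)
    ]
    # fluxes[i] is the HLL flux through the interface between cells i and i+1.
    fluxes = []
    for i in range(n):
        j = (i + 1) % n
        ul = tuple(u[i][k] + 0.5 * slopes[i][k] for k in range(3))
        ur = tuple(u[j][k] - 0.5 * slopes[j][k] for k in range(3))
        fluxes.append(hll_flux(ul, ur))
    return [
        tuple(-(fluxes[i][k] - fluxes[i - 1][k]) / dx for k in range(3))
        for i in range(n)
    ]


def tvd_rk2_step(u, dx, dt):
    """One second-order TVD Runge-Kutta step: Euler predictor, then averaging."""
    k1 = rhs(u, dx)
    u1 = [tuple(a + dt * b for a, b in zip(ui, ki)) for ui, ki in zip(u, k1)]
    k2 = rhs(u1, dx)
    return [
        tuple(0.5 * a + 0.5 * (b + dt * c) for a, b, c in zip(ui, u1i, k2i))
        for ui, u1i, k2i in zip(u, u1, k2)
    ]
```

Each cell update depends only on a small fixed stencil of neighbours, which is the property that lets schemes of this kind map onto one GPU thread per cell. The time step dt must be chosen by the caller to satisfy a CFL condition.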