
    Radiative transfer on hierarchical grids

    We present new methods for radiative transfer on hierarchical grids. We develop a new method for calculating the scattered flux that employs the grid structure to speed up the computation. We describe a novel subiteration algorithm that can be used to accelerate calculations with strong dust temperature self-coupling. We compute two test models, a molecular cloud and a circumstellar disc, and compare the accuracy and speed of the new algorithms against existing methods. An adaptive model of the molecular cloud with less than 8 % of the cells in the uniform grid produced results in good agreement with the full-resolution model. The relative RMS error of the surface brightness was <4 % at all wavelengths, and in regions of high column density the relative RMS error was only 10^{-4}. Computation with the adaptive model was faster by a factor of ~5. The new method for calculating the scattered flux is faster by a factor of ~4 in large models with a deep hierarchy structure, when images of the scattered light are computed towards several observing directions. The efficiency of the subiteration algorithm is highly dependent on the details of the model. In the circumstellar disc test the speed-up was a factor of two, but much larger gains are possible. The algorithm is expected to be most beneficial in models where a large number of small, dense regions are embedded in an environment with a lower mean density.
    Comment: Accepted to A&A; 13 pages, 8 figures (v2: minor typos corrected)
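    As a concrete illustration of the kind of hierarchical grid such adaptive models are built on, the Python sketch below refines an octree-like grid wherever a simple density criterion is exceeded, showing why an adaptive model can hold only a small fraction of the cells of the corresponding uniform grid. The class layout, refinement criterion, and parameters are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of density-threshold refinement on a hierarchical (octree-like)
# grid. The refinement criterion and class layout are illustrative assumptions.

import numpy as np

class Cell:
    def __init__(self, center, size, density):
        self.center = np.asarray(center, dtype=float)
        self.size = size              # edge length of the cubic cell
        self.density = density        # mean dust density in the cell
        self.children = []            # empty for leaf cells

def refine(cell, density_field, threshold, max_depth, depth=0):
    """Split a cell into 8 children while its density exceeds the threshold."""
    if depth >= max_depth or cell.density < threshold:
        return
    half = cell.size / 2.0
    for dx in (-0.25, 0.25):
        for dy in (-0.25, 0.25):
            for dz in (-0.25, 0.25):
                c = cell.center + cell.size * np.array([dx, dy, dz])
                child = Cell(c, half, density_field(c))
                cell.children.append(child)
                refine(child, density_field, threshold, max_depth, depth + 1)

def leaf_count(cell):
    if not cell.children:
        return 1
    return sum(leaf_count(ch) for ch in cell.children)

# Example: a centrally condensed "cloud"; only the dense central region is refined,
# so the leaf count stays far below that of a uniform grid at the finest level.
density_field = lambda p: 1.0 / (0.01 + np.dot(p, p))
root = Cell(center=(0, 0, 0), size=1.0, density=density_field(np.zeros(3)))
refine(root, density_field, threshold=4.0, max_depth=5)
print("leaf cells in adaptive grid:", leaf_count(root))
```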

    Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification

    Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and contain specialized semi-programmable accelerators in addition to programmable processors. In contrast to the pre-accelerator era, when the ISA played an important role in verification by enabling a clean separation of concerns between software and hardware, verification of these "accelerator-rich" SoCs presents new challenges. From the perspective of hardware designers, there is no common framework for the formal functional specification of accelerator behavior. From the perspective of software developers, there is no unified framework for reasoning about the software/hardware interactions of programs that interact with accelerators. This paper addresses these challenges by providing a formal specification and high-level abstraction for accelerator functional behavior. It formalizes the concept of an Instruction-Level Abstraction (ILA), developed informally in our previous work, and shows its application in modeling and verification of accelerators. This formal ILA extends the familiar notion of instructions to accelerators and provides a uniform, modular, and hierarchical abstraction for modeling the software-visible behavior of both accelerators and programmable processors. We demonstrate the applicability of the ILA through several case studies of accelerators (for image processing, machine learning, and cryptography) and a general-purpose processor (RISC-V). We show how the ILA model facilitates equivalence checking between two ILAs, and between an ILA and its hardware finite-state machine (FSM) implementation. Further, this equivalence checking supports accelerator upgrades using the notion of ILA compatibility, similar to processor upgrades using ISA compatibility.
    Comment: 24 pages, 3 figures, 3 tables
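    A minimal sketch of the idea, under the assumption that an ILA can be read as a set of software-visible state variables plus per-instruction state-update functions: the Python below defines two toy models of the same accumulator "accelerator" and checks that they agree on the visible state along an instruction trace. The real ILA framework gives instructions formal semantics and proves equivalence with model checking rather than simulation; all names and the trace-based check here are illustrative assumptions.

```python
# Hypothetical sketch of an ILA-style model: software-visible state plus
# per-"instruction" update functions, with a toy trace-based equivalence check.

class ILAModel:
    def __init__(self, name, state, instructions):
        self.name = name
        self.state = dict(state)          # architectural (software-visible) state
        self.instructions = instructions  # opcode -> function(state, operand) -> new state

    def step(self, opcode, operand):
        """Apply one instruction: the unit of software-visible state change."""
        self.state = self.instructions[opcode](dict(self.state), operand)
        return self.state

# Two models of the same toy accumulator "accelerator": a specification and an
# "implementation" that also keeps an internal flag the software never sees.
def spec_add(s, x): s["acc"] = (s["acc"] + x) & 0xFF; return s
def spec_clr(s, _): s["acc"] = 0; return s
def impl_add(s, x): s["acc"] = (s["acc"] + x) % 256; s["busy"] = 0; return s
def impl_clr(s, _): s["acc"] = 0; s["busy"] = 0; return s

spec = ILAModel("spec", {"acc": 0}, {"ADD": spec_add, "CLR": spec_clr})
impl = ILAModel("impl", {"acc": 0, "busy": 0}, {"ADD": impl_add, "CLR": impl_clr})

def equivalent_on(m1, m2, trace, visible=("acc",)):
    """Check both models agree on the software-visible state after every instruction."""
    for opcode, operand in trace:
        s1, s2 = m1.step(opcode, operand), m2.step(opcode, operand)
        if any(s1[v] != s2[v] for v in visible):
            return False
    return True

print(equivalent_on(spec, impl, [("ADD", 200), ("ADD", 100), ("CLR", 0), ("ADD", 7)]))  # True
```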

    Adaptive Mesh Fluid Simulations on GPU

    We describe an implementation of compressible inviscid fluid solvers with block-structured adaptive mesh refinement (AMR) on Graphics Processing Units using NVIDIA's CUDA. We show that a class of high-resolution shock-capturing schemes maps naturally onto this architecture. Using the method-of-lines approach with a second-order total variation diminishing (TVD) Runge-Kutta time integration scheme, piecewise linear reconstruction, and a Harten-Lax-van Leer (HLL) Riemann solver, we achieve an overall speedup of approximately 10 times on one graphics card compared to a single core of the host computer. We attain this speedup in uniform-grid runs as well as in problems with deep AMR hierarchies. Our framework can readily be applied to more general systems of conservation laws and extended to higher-order shock-capturing schemes. We demonstrate this directly with an implementation of a magnetohydrodynamic solver, comparing its performance to the pure hydrodynamic case. Finally, we combined our CUDA parallel scheme with MPI so that the code runs on GPU clusters. Close to ideal speedup is observed on up to four GPUs.
    Comment: Submitted to New Astronomy
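    The numerical building blocks named above can be sketched independently of the GPU mapping. The following Python code (CPU, not CUDA) applies minmod-limited piecewise linear reconstruction, an HLL Riemann solver, and second-order TVD Runge-Kutta time stepping to the 1-D inviscid Burgers equation; the test problem and parameters are illustrative assumptions, whereas the paper applies the same ingredients to the Euler and MHD equations on block-structured AMR grids.

```python
# Sketch of the scheme for a scalar 1-D conservation law (inviscid Burgers):
# method of lines, minmod piecewise-linear reconstruction, HLL flux, TVD RK2.

import numpy as np

def minmod(a, b):
    """Slope limiter: zero at extrema, the smaller slope elsewhere (TVD)."""
    return np.where(a * b > 0.0, np.where(np.abs(a) < np.abs(b), a, b), 0.0)

def flux(u):                      # Burgers flux f(u) = u^2 / 2
    return 0.5 * u * u

def hll_flux(uL, uR):
    """HLL approximate Riemann solver for a scalar conservation law."""
    sL = np.minimum(uL, uR)       # leftmost wave-speed estimate
    sR = np.maximum(uL, uR)       # rightmost wave-speed estimate
    f = (sR * flux(uL) - sL * flux(uR) + sL * sR * (uR - uL)) / \
        np.where(sR - sL == 0.0, 1.0, sR - sL)
    return np.where(sL >= 0.0, flux(uL), np.where(sR <= 0.0, flux(uR), f))

def rhs(u, dx):
    """Spatial operator L(u): limited reconstruction + HLL fluxes, periodic BCs."""
    slope = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)
    uL = u + 0.5 * slope                      # left state at interface i+1/2
    uR = np.roll(u - 0.5 * slope, -1)         # right state at interface i+1/2
    F = hll_flux(uL, uR)                      # flux through interface i+1/2
    return -(F - np.roll(F, 1)) / dx

# Smooth initial condition that steepens into a shock.
n, dx = 400, 1.0 / 400
x = (np.arange(n) + 0.5) * dx
u = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
t, t_end, cfl = 0.0, 0.3, 0.4

while t < t_end:
    dt = min(cfl * dx / np.max(np.abs(u)), t_end - t)
    u1 = u + dt * rhs(u, dx)                  # TVD RK2 stage 1 (forward Euler)
    u = 0.5 * (u + u1 + dt * rhs(u1, dx))     # TVD RK2 stage 2 (averaging)
    t += dt

print("final min/max:", u.min(), u.max())
```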