3,440 research outputs found
High Lundquist Number Simulations of Parker's Model of Coronal Heating: Scaling and Current Sheet Statistics Using Heterogeneous Computing Architectures
Parker's model [Parker, Astrophys. J., 174, 499 (1972)] is one of the most discussed mechanisms for coronal heating and has generated much debate. We have recently obtained new scaling results for a 2D version of this problem suggesting that the heating rate becomes independent of resistivity in a statistical steady state [Ng and Bhattacharjee, Astrophys. J., 675, 899 (2008)]. Our numerical work has now been extended to 3D using high-resolution MHD numerical simulations. Random photospheric footpoint motion is applied for a time much longer than the correlation time of the motion to obtain converged average coronal heating rates. Simulations are done for different values of the Lundquist number to determine scaling. In the high-Lundquist number limit (S > 1000), the coronal heating rate obtained is consistent with a trend that is independent of the Lundquist number, as predicted by previous analysis and 2D simulations. We will present scaling analysis showing that when the dissipation time is comparable to or larger than the correlation time of the random footpoint motion, the heating rate tends to become independent of Lundquist number, and that the magnetic energy production is also reduced significantly. We also present a comprehensive reprogramming of our simulation code to run on NVIDIA graphics processing units using the Compute Unified Device Architecture (CUDA) and report code performance on several large-scale heterogeneous machines.
Vanishing Point Detection with Direct and Transposed Fast Hough Transform inside the neural network
In this paper, we suggest a new neural network architecture for vanishing
point detection in images. The key element is the use of the direct and
transposed Fast Hough Transforms separated by convolutional layer blocks with
standard activation functions. It allows us to get the answer in the
coordinates of the input image at the output of the network and thus to
calculate the coordinates of the vanishing point by simply selecting the
maximum. Besides, it was proved that calculation of the transposed Fast Hough
Transform can be performed using the direct one. The use of integral operators
enables the neural network to rely on global rectilinear features in the image,
and so it is ideal for detecting vanishing points. To demonstrate the
effectiveness of the proposed architecture, we use a set of images from a DVR
and show its superiority over existing methods. Note, in addition, that the
proposed neural network architecture essentially repeats the process of direct
and back projection used, for example, in computed tomography.
Comment: 9 pages, 9 figures, submitted to "Computer Optics"; extra experiment
added, new theorem proof added, references added; typos corrected
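The final step described in this abstract, reading the vanishing point off the network output by selecting the maximum, can be sketched as follows. This is only an illustration: the function name `argmax_2d` and the toy `heatmap` values are assumptions made here, and the network itself (including the Fast Hough Transform layers) is not reproduced.

```python
def argmax_2d(heatmap):
    """Return (row, col) of the largest value in a 2D list of scores."""
    best_row, best_col = 0, 0
    best = heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best:
                best, best_row, best_col = value, r, c
    return best_row, best_col


# Toy stand-in for a network output given in input-image coordinates
# (values are made up for illustration only).
heatmap = [
    [0.1, 0.2, 0.1],
    [0.0, 0.9, 0.3],
    [0.2, 0.1, 0.0],
]
vanishing_point = argmax_2d(heatmap)
print(vanishing_point)
```

Because the output map is expressed in the coordinates of the input image, the index of the maximum can be used directly as the estimated vanishing point.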
The GENGA Code: Gravitational Encounters in N-body simulations with GPU Acceleration
We describe an open source GPU implementation of a hybrid symplectic N-body
integrator, GENGA (Gravitational ENcounters with Gpu Acceleration), designed to
integrate planet and planetesimal dynamics in the late stage of planet
formation and stability analyses of planetary systems. GENGA uses a hybrid
symplectic integrator to handle close encounters with very good energy
conservation, which is essential in long-term planetary system integration. We
extended the second order hybrid integration scheme to higher orders. The GENGA
code supports three simulation modes: Integration of up to 2048 massive bodies,
integration with up to a million test particles, or parallel integration of a
large number of individual planetary systems. We compare the results of GENGA
to Mercury and pkdgrav2 in respect of energy conservation and performance, and
find that the energy conservation of GENGA is comparable to Mercury and around
two orders of magnitude better than pkdgrav2. GENGA runs up to 30 times faster
than Mercury and up to eight times faster than pkdgrav2. GENGA is written in
CUDA C and runs on all NVIDIA GPUs with compute capability of at least 2.0.
Comment: Accepted by ApJ. 18 pages, 17 figures, 4 tables
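To illustrate why symplectic schemes are valued for long-term energy conservation, here is a minimal sketch of a second-order kick-drift-kick leapfrog step for a single test body around a fixed central mass. It is not GENGA's hybrid integrator and has no close-encounter handling; the function names, the units (GM = 1), and the step size are assumptions chosen for illustration.

```python
import math


def accel(x, y, gm=1.0):
    """Acceleration from a point mass at the origin (GM = 1 by assumption)."""
    r3 = (x * x + y * y) ** 1.5
    return -gm * x / r3, -gm * y / r3


def energy(x, y, vx, vy, gm=1.0):
    """Specific orbital energy: kinetic plus potential."""
    return 0.5 * (vx * vx + vy * vy) - gm / math.hypot(x, y)


def leapfrog(x, y, vx, vy, dt, steps):
    """Advance the orbit with a kick-drift-kick leapfrog (second order, symplectic)."""
    ax, ay = accel(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # full drift
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax          # half kick with updated acceleration
        vy += 0.5 * dt * ay
    return x, y, vx, vy


# Circular orbit of radius 1, integrated for roughly 16 orbital periods.
state0 = (1.0, 0.0, 0.0, 1.0)
e0 = energy(*state0)
state1 = leapfrog(*state0, dt=0.01, steps=10000)
e1 = energy(*state1)
print(abs(e1 - e0))
```

In this sketch the energy error stays bounded instead of growing steadily with time, which is the property the abstract refers to as essential for long-term integrations.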
BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images
In cryo-electron microscopy (EM), molecular structures are determined from
large numbers of projection images of individual particles. To harness the full
power of this single-molecule information, we use the Bayesian inference of EM
(BioEM) formalism. By ranking structural models using posterior probabilities
calculated for individual images, BioEM in principle addresses the challenge of
working with highly dynamic or heterogeneous systems not easily handled in
traditional EM reconstruction. However, the calculation of these posteriors for
large numbers of particles and models is computationally demanding. Here we
present highly parallelized, GPU-accelerated computer software that performs
this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI
parallelization combined with both CPU and GPU computing. The resulting BioEM
software scales nearly ideally both on pure CPU and on CPU+GPU architectures,
thus enabling Bayesian analysis of tens of thousands of images in a reasonable
time. The general mathematical framework and robust algorithms are not limited
to cryo-electron microscopy but can be generalized for electron tomography and
other imaging experiments.
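The idea of ranking structural models by probabilities computed for individual images can be sketched in a heavily simplified form. The example below assumes a Gaussian noise model, a single fixed projection per model, and a flat prior over models, so that ranking by posterior reduces to ranking by summed log-likelihood. None of this is the BioEM software's API, the real formalism integrates over orientations and other imaging parameters, and the names `log_likelihood` and `rank_models` as well as the toy data are hypothetical.

```python
import math


def log_likelihood(image, projection, sigma=1.0):
    """Gaussian log-likelihood of an image given a model projection (assumed noise model)."""
    n = len(image)
    squared_error = sum((a - b) ** 2 for a, b in zip(image, projection))
    return -0.5 * squared_error / sigma ** 2 - n * math.log(sigma * math.sqrt(2.0 * math.pi))


def rank_models(images, models):
    """Rank model names by the sum of per-image log-likelihoods, best first."""
    scores = {
        name: sum(log_likelihood(img, proj) for img in images)
        for name, proj in models.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy 1D "images" and two candidate models (illustrative values only).
images = [[0.1, 0.9, 0.0], [0.0, 1.1, 0.1]]
models = {"A": [0.0, 1.0, 0.0], "B": [1.0, 0.0, 1.0]}
print(rank_models(images, models))
```

Each image contributes an independent term to a model's score, which is why the computation parallelizes naturally over images and models.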
CampProf: A Visual Performance Analysis Tool for Memory Bound GPU Kernels
Current GPU tools and performance models provide some common architectural insights that guide programmers to write optimal code. We challenge these performance models by modeling and analyzing a lesser-known but very severe performance pitfall, called 'Partition Camping', in NVIDIA GPUs. Partition Camping is caused by memory accesses that are skewed towards a subset of the available memory partitions, which may degrade the performance of memory-bound CUDA kernels by up to a factor of seven. No existing tool can detect the partition camping effect in CUDA kernels.
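The skew described above can be illustrated with a small model of how addresses map to memory partitions. The sketch assumes 8 partitions, each 256 bytes wide and interleaved round-robin; these figures and the function names are assumptions for illustration, not values taken from the abstract.

```python
from collections import Counter

NUM_PARTITIONS = 8      # assumed number of memory partitions
PARTITION_WIDTH = 256   # assumed partition width in bytes


def partition_of(address):
    """Map a byte address to a partition under round-robin interleaving."""
    return (address // PARTITION_WIDTH) % NUM_PARTITIONS


def partition_histogram(num_blocks, stride_bytes):
    """Count how many blocks start their accesses in each partition."""
    return Counter(partition_of(b * stride_bytes) for b in range(num_blocks))


# A stride equal to the partition width spreads blocks over all partitions.
spread = partition_histogram(num_blocks=32, stride_bytes=256)
# A stride equal to NUM_PARTITIONS * PARTITION_WIDTH sends every block to partition 0.
camped = partition_histogram(num_blocks=32, stride_bytes=2048)
print(dict(spread))
print(dict(camped))
```

In the second case all concurrently active blocks queue on one partition while the others sit idle, which is the access pattern the abstract calls partition camping.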
We complement the existing tools by developing 'CampProf', a spreadsheet-based visual analysis tool that detects the degree to which any memory-bound kernel suffers from partition camping. In addition, CampProf predicts the kernel's performance at all execution configurations if its performance parameters are known at any one of them. To demonstrate the utility of CampProf, we analyze three different applications using our tool and demonstrate how it can be used to discover partition camping. We also demonstrate how CampProf can be used to monitor the performance improvements in the kernels as the partition camping effect is being removed.
The performance model that drives CampProf was developed by applying multiple linear regression techniques over a set of specific micro-benchmarks that simulated the partition camping behavior. Our results show that the geometric mean of errors in our prediction model is within 12% of the actual execution times. In summary, CampProf is a new, accurate, and easy-to-use tool that can be used in conjunction with the existing tools to analyze and improve the overall performance of memory-bound CUDA kernels.
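The accuracy metric quoted above, a geometric mean of prediction errors, can be computed as in the following sketch. The function name and the timing values are hypothetical and serve only to show the arithmetic; they are not results from the paper.

```python
import math


def geometric_mean_relative_error(actual, predicted):
    """Geometric mean of |predicted - actual| / actual over all samples.

    Every relative error must be strictly positive, since the geometric
    mean is computed through logarithms.
    """
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return math.exp(sum(math.log(e) for e in errors) / len(errors))


# Made-up execution times (arbitrary units), for illustration only.
actual_times = [10.0, 20.0]
predicted_times = [11.0, 28.0]
gm_error = geometric_mean_relative_error(actual_times, predicted_times)
print(gm_error)
```

With relative errors of 0.1 and 0.4, the geometric mean is the square root of their product, about 0.2, which is less sensitive to a single large outlier than the arithmetic mean of 0.25.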