Parallel Algorithm for Solving Kepler's Equation on Graphics Processing Units: Application to Analysis of Doppler Exoplanet Searches
[Abridged] We present the results of a highly parallel Kepler equation solver
using the Graphics Processing Unit (GPU) on a commercial nVidia GeForce 280GTX
and the "Compute Unified Device Architecture" programming environment. We apply
this to evaluate a goodness-of-fit statistic (e.g., chi^2) for Doppler
observations of stars potentially harboring multiple planetary companions
(assuming negligible planet-planet interactions). We tested multiple
implementations using single precision, double precision, pairs of single
precision, and mixed precision arithmetic. We find that the vast majority of
computations can be performed using single precision arithmetic, with selective
use of compensated summation for increased precision. However, standard single
precision is not adequate for calculating the mean anomaly from the time of
observation and orbital period when evaluating the goodness-of-fit for real
planetary systems and observational data sets. Using all double precision, our
GPU code outperforms a similar code using a modern CPU by a factor of over 60.
Using mixed-precision, our GPU code provides a speed-up factor of over 600,
when evaluating N_sys > 1024 model planetary systems, each containing N_pl = 4
planets and assuming N_obs = 256 observations of each system. We conclude that
modern GPUs also offer a powerful tool for repeatedly evaluating Kepler's
equation and a goodness-of-fit statistic for orbital models when presented with
a large parameter space. Comment: 19 pages, to appear in New Astronom
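
As a rough illustration of the per-observation work described in this abstract, the sketch below solves Kepler's equation, E - e sin(E) = M, with Newton's method and reduces the mean anomaly in double precision. The abstract does not state which iteration scheme, starting guess, or tolerance the authors use, so the function names, the starting-guess rule, and the default parameters here are all assumptions chosen for illustration.

```python
import math


def mean_anomaly(t, period, t0=0.0):
    """Mean anomaly in [0, 2*pi) from observation time and orbital period.

    Python floats are double precision; the abstract notes that standard
    single precision is not adequate for this particular step.
    """
    phase = ((t - t0) / period) % 1.0
    return 2.0 * math.pi * phase


def solve_kepler(m, ecc, tol=1e-12, max_iter=50):
    """Solve E - ecc*sin(E) = m for the eccentric anomaly E (Newton's method).

    Assumed, not taken from the paper: the starting guess (m for modest
    eccentricity, pi for high eccentricity) and the stopping tolerance.
    """
    m = m % (2.0 * math.pi)
    e_anom = m if ecc < 0.8 else math.pi
    for _ in range(max_iter):
        f = e_anom - ecc * math.sin(e_anom) - m
        f_prime = 1.0 - ecc * math.cos(e_anom)
        step = f / f_prime
        e_anom -= step
        if abs(step) < tol:
            break
    return e_anom
```

On a GPU, each thread would typically run an independent copy of such a loop for one (model, observation) pair, which is why the problem parallelises well.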
Enhanced molecular dynamics performance with a programmable graphics processor
Design considerations for molecular dynamics algorithms capable of taking
advantage of the computational power of a graphics processing unit (GPU) are
described. Accommodating the constraints of scalable streaming-multiprocessor
hardware necessitates a reformulation of the underlying algorithm. Performance
measurements demonstrate the considerable benefit and cost-effectiveness of
such an approach, which produces a factor of 2.5 speed improvement over
previous work for the case of the soft-sphere potential. Comment: 20 pages (v2: minor additions and changes; v3: corrected typos
Multi-view passive 3D face acquisition device
Approaches to acquisition of 3D facial data include laser scanners, structured
light devices and (passive) stereo vision. The laser scanner and structured light
methods allow accurate reconstruction of the 3D surface but strong light is projected
on the faces of subjects. Passive stereo vision based approaches do not require strong
light to be projected; however, it is hard to obtain comparable accuracy and robustness
of the surface reconstruction. In this paper a passive multiple view approach using
5 cameras in a ’+’ configuration is proposed that significantly increases robustness
and accuracy relative to traditional stereo vision approaches. The normalised cross
correlations of all 5 views are combined using direct projection of points instead of
the traditionally used rectified images. Also, errors caused by different perspective
deformation of the surface in the different views are reduced by using an iterative reconstruction
technique where the depth estimation of the previous iteration is used to
warp the windows of the normalised cross correlation for the different views
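
For readers unfamiliar with the matching score named in this abstract, the following is a minimal sketch of the normalised cross correlation between two equally sized image windows, given here as flat lists of intensities. How the paper combines the scores of all 5 views, and how it warps the windows between iterations, is not specified in the abstract and is not reproduced; the function name is hypothetical.

```python
import math


def normalised_cross_correlation(window_a, window_b):
    """Zero-mean normalised cross correlation of two equal-length windows.

    Returns a value in [-1, 1]; returns 0.0 if either window is constant,
    which is a convention assumed here to avoid division by zero.
    """
    n = len(window_a)
    mean_a = sum(window_a) / n
    mean_b = sum(window_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(window_a, window_b))
    var_a = sum((a - mean_a) ** 2 for a in window_a)
    var_b = sum((b - mean_b) ** 2 for b in window_b)
    den = math.sqrt(var_a * var_b)
    return num / den if den > 0.0 else 0.0
```

Because the score subtracts each window's mean and divides by its spread, it is insensitive to brightness and contrast differences between cameras, which is one reason it is popular for passive stereo matching.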
Accelerating incoherent dedispersion
Incoherent dedispersion is a computationally intensive problem that appears
frequently in pulsar and transient astronomy. For current and future transient
pipelines, dedispersion can dominate the total execution time, meaning its
computational speed acts as a constraint on the quality and quantity of science
results. It is thus critical that the algorithm be able to take advantage of
trends in commodity computing hardware. With this goal in mind, we present
analysis of the 'direct', 'tree' and 'sub-band' dedispersion algorithms with
respect to their potential for efficient execution on modern graphics
processing units (GPUs). We find all three to be excellent candidates, and
proceed to describe implementations in C for CUDA using insight gained from the
analysis. Using recent CPU and GPU hardware, the transition to the GPU provides
a speed-up of 9x for the direct algorithm when compared to an optimised
quad-core CPU code. For realistic recent survey parameters, these speeds are
high enough that further optimisation is unnecessary to achieve real-time
processing. Where further speed-ups are desirable, we find that the tree and
sub-band algorithms are able to provide 3-7x better performance at the cost of
certain smearing, memory consumption and development time trade-offs. We finish
with a discussion of the implications of these results for future transient
surveys. Our GPU dedispersion code is publicly available as a C library at:
http://dedisp.googlecode.com/ Comment: 15 pages, 4 figures, 2 tables, accepted for publication in MNRA
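
The 'direct' algorithm mentioned in this abstract can be sketched as a brute-force sum over frequency channels, with each channel shifted by its dispersion delay for every trial dispersion measure (DM). The sketch below is plain Python, not the authors' C for CUDA code and not the dedisp library API. The delay constant is the standard cold-plasma dispersion relation, which the abstract itself does not quote, and the function names and data layout are assumptions.

```python
def dispersion_delay_s(dm, freq_mhz, ref_freq_mhz):
    """Delay in seconds of freq_mhz relative to ref_freq_mhz.

    Uses the standard cold-plasma relation with DM in pc cm^-3 and
    frequencies in MHz (an assumption; the abstract does not state it).
    """
    return 4.148808e3 * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)


def dedisperse_direct(data, freqs_mhz, dm_trials, dt):
    """Direct incoherent dedispersion.

    data[c][t] is the power in channel c at time sample t, and dt is the
    sampling time in seconds. Returns one time series per trial DM.
    """
    n_chan = len(data)
    n_samp = len(data[0])
    ref = max(freqs_mhz)  # delays are measured against the top channel
    out = []
    for dm in dm_trials:
        delays = [int(round(dispersion_delay_s(dm, f, ref) / dt))
                  for f in freqs_mhz]
        n_out = n_samp - max(delays)
        series = [sum(data[c][t + delays[c]] for c in range(n_chan))
                  for t in range(n_out)]
        out.append(series)
    return out
```

Every output sample for every trial DM is an independent sum, so the three nested loops map naturally onto GPU threads; the tree and sub-band variants reduce the work by sharing partial sums between trials.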
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some application paradigms. Software languages and systems such as NVIDIA's
CUDA and Khronos consortium's open compute language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many core GPUs for data parallelism is necessary. We describe our use of hybrid
applications using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and offer some suggested areas for language development and integration
between coarse-grained and fine-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas including: partial differential equations; graph cluster
metric calculations and random number generation. We report on programming experiences
and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs;
a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scientific applications developers
Using image morphing for memory-efficient impostor rendering on GPU
Real-time rendering of large animated crowds consisting of thousands of virtual humans is important for several applications, including simulations, games and interactive walkthroughs, but cannot be performed using complex polygonal models at interactive frame rates. For that reason, several methods using large numbers of pre-computed image-based representations, called impostors, have been proposed. These methods take advantage of existing programmable graphics hardware to compensate for the computational expense while maintaining visual fidelity. As a result, the number of different virtual humans that can be rendered in real time is no longer restricted by the required computational power but by the texture memory consumed for the variety and discretization of their animations. In this work, we propose an alternative method that reduces the memory consumption by generating compelling intermediate textures using image-morphing techniques. To demonstrate the preserved perceptual quality of animations in which half of the key-frames were rendered using the proposed methodology, we have implemented the system on the graphical processing unit and obtained promising results at interactive frame rates
A survey of real-time crowd rendering
In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first overview character animation techniques, as they are highly tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting performance and realism of crowds such as lighting, shadowing, clothing and variability. Finally we provide an exhaustive comparison of the most relevant approaches in the field. Peer Reviewed. Postprint (author's final draft