4,634 research outputs found
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, we here discuss the first fully GPU-based
distributed-memory parallel hierarchical matrix Open Source library using the
traditional H-matrix format and adaptive cross approximation with an
application to BEM problems
Massively parallel implicit equal-weights particle filter for ocean drift trajectory forecasting
Forecasting of ocean drift trajectories are important for many applications, including search and rescue operations, oil spill cleanup and iceberg risk mitigation. In an operational setting, forecasts of drift trajectories are produced based on computationally demanding forecasts of three-dimensional ocean currents. Herein, we investigate a complementary approach for shorter time scales by using the recently proposed two-stage implicit equal-weights particle filter applied to a simplified ocean model. To achieve this, we present a new algorithmic design for a data-assimilation system in which all components – including the model, model errors, and particle filter – take advantage of massively parallel compute architectures, such as graphical processing units. Faster computations can enable in-situ and ad-hoc model runs for emergency management, and larger ensembles for better uncertainty quantification. Using a challenging test case with near-realistic chaotic instabilities, we run data-assimilation experiments based on synthetic observations from drifting and moored buoys, and analyze the trajectory forecasts for the drifters. Our results show that even sparse drifter observations are sufficient to significantly improve short-term drift forecasts up to twelve hours. With equidistant moored buoys observing only 0.1% of the state space, the ensemble gives an accurate description of the true state after data assimilation followed by a high-quality probabilistic forecast
GNA: new framework for statistical data analysis
We report on the status of GNA --- a new framework for fitting large-scale
physical models. GNA utilizes the data flow concept within which a model is
represented by a directed acyclic graph. Each node is an operation on an array
(matrix multiplication, derivative or cross section calculation, etc). The
framework enables the user to create flexible and efficient large-scale lazily
evaluated models, handle large numbers of parameters, propagate parameters'
uncertainties while taking into account possible correlations between them, fit
models, and perform statistical analysis. The main goal of the paper is to give
an overview of the main concepts and methods as well as reasons behind their
design. Detailed technical information is to be published in further works.Comment: 9 pages, 3 figures, CHEP 2018, submitted to EPJ Web of Conference
Analysing Astronomy Algorithms for GPUs and Beyond
Astronomy depends on ever increasing computing power. Processor clock-rates
have plateaued, and increased performance is now appearing in the form of
additional processor cores on a single chip. This poses significant challenges
to the astronomy software community. Graphics Processing Units (GPUs), now
capable of general-purpose computation, exemplify both the difficult
learning-curve and the significant speedups exhibited by massively-parallel
hardware architectures. We present a generalised approach to tackling this
paradigm shift, based on the analysis of algorithms. We describe a small
collection of foundation algorithms relevant to astronomy and explain how they
may be used to ease the transition to massively-parallel computing
architectures. We demonstrate the effectiveness of our approach by applying it
to four well-known astronomy problems: Hogbom CLEAN, inverse ray-shooting for
gravitational lensing, pulsar dedispersion and volume rendering. Algorithms
with well-defined memory access patterns and high arithmetic intensity stand to
receive the greatest performance boost from massively-parallel architectures,
while those that involve a significant amount of decision-making may struggle
to take advantage of the available processing power.Comment: 10 pages, 3 figures, accepted for publication in MNRA
Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery
A robust and fast automatic moving object detection and tracking system is
essential to characterize target object and extract spatial and temporal
information for different functionalities including video surveillance systems,
urban traffic monitoring and navigation, robotic. In this dissertation, I
present a collaborative Spatial Pyramid Context-aware moving object detection
and Tracking system. The proposed visual tracker is composed of one master
tracker that usually relies on visual object features and two auxiliary
trackers based on object temporal motion information that will be called
dynamically to assist master tracker. SPCT utilizes image spatial context at
different level to make the video tracking system resistant to occlusion,
background noise and improve target localization accuracy and robustness. We
chose a pre-selected seven-channel complementary features including RGB color,
intensity and spatial pyramid of HoG to encode object color, shape and spatial
layout information. We exploit integral histogram as building block to meet the
demands of real-time performance. A novel fast algorithm is presented to
accurately evaluate spatially weighted local histograms in constant time
complexity using an extension of the integral histogram method. Different
techniques are explored to efficiently compute integral histogram on GPU
architecture and applied for fast spatio-temporal median computations and 3D
face reconstruction texturing. We proposed a multi-component framework based on
semantic fusion of motion information with projected building footprint map to
significantly reduce the false alarm rate in urban scenes with many tall
structures. The experiments on extensive VOTC2016 benchmark dataset and aerial
video confirm that combining complementary tracking cues in an intelligent
fusion framework enables persistent tracking for Full Motion Video and Wide
Aerial Motion Imagery.Comment: PhD Dissertation (162 pages
- …