Fully automatic extraction of salient objects from videos in near real-time
Automatic video segmentation plays an important role in a wide range of
computer vision and image processing applications. Recently, various methods
have been proposed for this purpose. However, most of them are far from
real-time processing, even for low-resolution videos, because of their complex
procedures. To address this, we propose a new and very fast method for
automatic video segmentation that combines 1) efficient optimization of Markov
random fields, polynomial in the number of pixels, via graph cuts, 2)
automatic, computationally efficient yet stable derivation of segmentation
priors using visual saliency and a sequential update mechanism, and 3) an
implementation strategy based on stream processing with graphics processing
units (GPUs). Test results indicate that our method extracts appropriate
regions from videos as precisely as, and much faster than, previous
semi-automatic methods, even though no supervision is incorporated.
Comment: submitted to the Special Issue on High Performance Computation on Hardware Accelerators, The Computer Journal
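The graph-cut step above maps a binary labeling energy (saliency-derived unary terms plus a smoothness penalty) onto an s-t min-cut problem. Below is a minimal CPU sketch of that construction on a 1-D "image", using Edmonds-Karp max-flow; the saliency values and the smoothness weight `lam` are made up for illustration, and this is not the paper's GPU implementation.

```python
from collections import deque

def min_cut_segment(saliency, lam=0.1):
    """Binary MRF segmentation via s-t min cut (Edmonds-Karp max-flow).
    Unary costs come from a saliency prior in [0, 1]; pairwise terms
    penalise label changes between neighbouring pixels."""
    n = len(saliency)
    S, T = n, n + 1                      # source (foreground) and sink (background)
    cap = {}                             # residual capacities

    def add_edge(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0.0) + c
        cap.setdefault((v, u), 0.0)

    for p, s in enumerate(saliency):
        add_edge(S, p, s)                # cut if p labelled bg -> pay D_p(bg) = s
        add_edge(p, T, 1.0 - s)          # cut if p labelled fg -> pay D_p(fg) = 1 - s
    for p in range(n - 1):               # 1-D neighbourhood smoothness
        add_edge(p, p + 1, lam)
        add_edge(p + 1, p, lam)

    def bfs_path():
        parent = {S: None}
        q = deque([S])
        while q:
            u = q.popleft()
            if u == T:
                break
            for v in range(n + 2):
                if v not in parent and cap.get((u, v), 0.0) > 1e-12:
                    parent[v] = u
                    q.append(v)
        return parent if T in parent else None

    while True:                          # augment until no s-t path remains
        parent = bfs_path()
        if parent is None:
            break
        path, v = [], T
        while v != S:
            path.append((parent[v], v)); v = parent[v]
        f = min(cap[e] for e in path)    # bottleneck capacity
        for (u, v) in path:
            cap[(u, v)] -= f
            cap[(v, u)] += f

    # pixels still reachable from the source lie on the foreground side
    seen, q = {S}, deque([S])
    while q:
        u = q.popleft()
        for v in range(n + 2):
            if v not in seen and cap.get((u, v), 0.0) > 1e-12:
                seen.add(v); q.append(v)
    return [1 if p in seen else 0 for p in range(n)]
```

Because the pairwise terms are submodular, the min cut gives the exact global minimum of the energy, which is what makes graph cuts attractive compared with iterative approximations.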
\texttt{GooStats}: A GPU-based framework for multi-variate analysis in particle physics
\texttt{GooStats} is a software framework that provides a flexible
environment and common tools to implement multi-variate statistical analysis.
The framework is built upon the \texttt{CERN ROOT}, \texttt{MINUIT} and
\texttt{GooFit} packages. Running a multi-variate analysis in parallel on
graphics processing units yields a huge boost in performance and opens new
possibilities. The design and benchmark of \texttt{GooStats} are presented in
this article, along with an illustration of its application to statistical
problems.
Comment: 16 pages, 10 figures
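The core task such a framework parallelises is evaluating a binned likelihood over many bins at once. As a toy stand-in (not the GooStats/MINUIT API; the flat-spectrum model and grid scan are illustrative assumptions), here is a binned Poisson negative log-likelihood fit of a single rate parameter:

```python
import numpy as np

def poisson_nll(counts, expected):
    # binned Poisson negative log-likelihood, constant terms dropped;
    # the sum over bins is the data-parallel part a GPU would accelerate
    expected = np.clip(expected, 1e-12, None)
    return float(np.sum(expected - counts * np.log(expected)))

rng = np.random.default_rng(0)
counts = rng.poisson(4.0, size=50)            # toy "spectrum": flat true rate 4.0
rates = np.linspace(1.0, 8.0, 701)            # brute-force parameter grid scan
nll = [poisson_nll(counts, np.full(50, r)) for r in rates]
best = rates[int(np.argmin(nll))]
```

For a flat rate the maximum-likelihood estimate is simply the sample mean, which makes the scan easy to sanity-check; a real fit would hand `poisson_nll` to a minimiser such as MINUIT instead.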
Kinematic Modelling of Disc Galaxies using Graphics Processing Units
With large-scale Integral Field Spectroscopy (IFS) surveys of thousands of
galaxies currently underway or planned, the astronomical community is in need
of methods, techniques and tools that will allow the analysis of huge amounts
of data. We focus on the kinematic modelling of disc galaxies and investigate
the potential use of massively parallel architectures, such as the Graphics
Processing Unit (GPU), as an accelerator for the computationally expensive
model-fitting procedure. We review the algorithms involved in model-fitting and
evaluate their suitability for GPU implementation. We employ different
optimization techniques, including the Levenberg-Marquardt and Nested Sampling
algorithms, but also a naive brute-force approach based on Nested Grids. We
find that the GPU can accelerate the model-fitting procedure up to a factor of
~100 when compared to a single-threaded CPU, and up to a factor of ~10 when
compared to a multi-threaded dual CPU configuration. Our method's accuracy,
precision and robustness are assessed by successfully recovering the kinematic
properties of simulated data, and also by verifying the kinematic modelling
results of galaxies from the GHASP and DYNAMO surveys as found in the
literature. The resulting GBKFIT code is available for download from:
http://supercomputing.swin.edu.au/gbkfit
Comment: 34 pages, 16 figures, 8 tables. Accepted for publication in MNRAS
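The Levenberg-Marquardt optimiser mentioned above alternates between Gauss-Newton steps and damped gradient steps. The following is a compact sketch of that loop fitting a toy arctan rotation-curve model; the model form, starting point, and damping schedule are illustrative assumptions, not GBKFIT's implementation.

```python
import numpy as np

def model(r, p):
    v0, rt = p
    return v0 * (2 / np.pi) * np.arctan(r / rt)    # toy arctan rotation curve

def levenberg_marquardt(r, v, p, iters=50, lam=1e-3):
    for _ in range(iters):
        res = v - model(r, p)
        # forward-difference Jacobian of the residuals, column per parameter
        J = np.empty((r.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p); dp[j] = 1e-6 * max(1.0, abs(p[j]))
            J[:, j] = -(model(r, p + dp) - model(r, p)) / dp[j]
        JtJ = J.T @ J
        step = np.linalg.solve(JtJ + lam * np.diag(np.diag(JtJ)), -J.T @ res)
        if np.sum((v - model(r, p + step)) ** 2) < np.sum(res ** 2):
            p = p + step; lam *= 0.5               # accept: trust the model more
        else:
            lam *= 10.0                            # reject: damp harder
    return p

r = np.linspace(0.1, 10, 80)
p_true = np.array([200.0, 1.5])
v_obs = model(r, p_true)                            # noise-free synthetic data
p_fit = levenberg_marquardt(r, v_obs, np.array([100.0, 3.0]))
```

The residual and Jacobian evaluations over all data points are independent per point, which is exactly the structure a GPU exploits when the model evaluation is expensive.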
Vector operations for accelerating expensive Bayesian computations -- a tutorial guide
Many applications in Bayesian statistics are extremely computationally
intensive. However, they are often inherently parallel, making them prime
targets for modern massively parallel processors. Multi-core and distributed
computing is widely applied in the Bayesian community; however, very little
attention has been given to fine-grained parallelisation using the single
instruction, multiple data (SIMD) operations that are available on most modern
commodity CPUs and are the basis of GPGPU computing. In this work, we
practically demonstrate, using standard programming libraries, the utility of
the SIMD approach for several topical Bayesian applications. We show that SIMD
can substantially improve floating-point arithmetic performance in serial
algorithms. Importantly, these improvements are multiplicative with any gains
achieved through multi-core processing. We illustrate the potential of SIMD
for accelerating Bayesian computations and provide the reader with techniques
for exploiting modern massively parallel processing environments using
standard tools.
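A minimal illustration of the idea, under the assumption that the hot loop is a Gaussian log-likelihood: writing it as whole-array operations lets the library dispatch to vectorised (SIMD) kernels, whereas the element-by-element loop cannot be.

```python
import math
import numpy as np

# scalar loop: one multiply-add at a time, no data parallelism
def loglik_scalar(x, mu, sigma):
    c = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(c - (xi - mu) ** 2 / (2 * sigma ** 2) for xi in x)

# vectorised form: NumPy evaluates the same expression over the whole
# array, lowering to SIMD instructions where the hardware supports them
def loglik_vector(x, mu, sigma):
    x = np.asarray(x, dtype=float)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                        - (x - mu) ** 2 / (2 * sigma ** 2)))
```

Both functions compute the same quantity, so the rewrite is a pure performance transformation, and it composes with multi-core parallelism over chains or datasets.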
On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing
Compute-mode rendering is becoming more and more attractive for non-standard
rendering applications, due to the high flexibility of compute-mode execution.
These newly designed pipelines often include streaming vertex and geometry
processing stages. In typical triangle meshes, the same transformed vertex is
on average required six times during rendering. To avoid redundant computation,
a post-transform cache is traditionally suggested to enable reuse of vertex
processing results. However, traditional caching neither scales well as the
hardware becomes more parallel, nor can it be efficiently implemented in a
software design. We investigate alternative strategies for reusing vertex
shading results on-the-fly for massively parallel software geometry processing.
Forming static and dynamic batching on the data input stream, we analyze the
effectiveness of identifying potential local reuse based on sorting, hashing,
and efficient intra-thread-group communication. Altogether, we present four
vertex reuse strategies, tailored to modern parallel architectures. Our
simulations showcase that our batch-based strategies significantly outperform
parallel caches in terms of reuse. On actual GPU hardware, our evaluation shows
that our strategies not only lead to good reuse of processing results, but also
boost performance compared to naively ignoring reuse in a variety of practical
applications.
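The effect of batch-based reuse is easy to quantify offline: within each batch of the index stream, every unique vertex needs shading only once. A small sketch of that accounting (the index buffer and batch sizes are made-up examples, not the paper's benchmarks):

```python
def shading_invocations(index_buffer, batch_size):
    """Count vertex-shader invocations when reuse is exploited only
    inside fixed-size batches of the index stream (static batching)."""
    total = 0
    for i in range(0, len(index_buffer), batch_size):
        batch = index_buffer[i:i + batch_size]
        total += len(set(batch))        # one invocation per unique vertex in batch
    return total

# strip-like mesh: 4 triangles, 12 indices, 6 unique vertices
indices = [0, 1, 2,  1, 2, 3,  2, 3, 4,  3, 4, 5]
```

Larger batches expose more duplicates to the deduplication step, which is why the batch-based strategies approach the ideal of one invocation per unique vertex while per-triangle processing hits the worst case of three per triangle.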
SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis
Synthesizing realistic images from human drawn sketches is a challenging
problem in computer graphics and vision. Existing approaches either need exact
edge maps, or rely on retrieval of existing photographs. In this work, we
propose a novel Generative Adversarial Network (GAN) approach that synthesizes
plausible images from 50 categories including motorcycles, horses and couches.
We demonstrate a data augmentation technique for sketches which is fully
automatic, and we show that the augmented data is helpful to our task. We
introduce a new network building block suitable for both the generator and
discriminator which improves the information flow by injecting the input image
at multiple scales. Compared to state-of-the-art image translation methods, our
approach generates more realistic images and achieves significantly higher
Inception Scores.
Comment: Accepted to CVPR 2018
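The "injecting the input image at multiple scales" idea can be sketched independently of any deep-learning framework: resize the conditioning sketch to each feature map's resolution and concatenate it along the channel axis. This toy NumPy version (shapes and the average-pooling resize are illustrative assumptions, not the paper's learned block):

```python
import numpy as np

def downsample(img, factor):
    # naive average pooling; stands in for a learned or bilinear resize
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

def inject_multiscale(sketch, feature_maps):
    """Concatenate a resized copy of the input sketch onto each feature map,
    so every scale of the network sees the conditioning image directly."""
    out = []
    for feats in feature_maps:                     # feats: (H, W, C)
        factor = sketch.shape[0] // feats.shape[0]
        s = downsample(sketch, factor)[..., None]  # (H, W, 1) extra channel
        out.append(np.concatenate([feats, s], axis=-1))
    return out

sketch = np.linspace(0, 1, 64 * 64).reshape(64, 64)
feats = [np.zeros((64, 64, 8)), np.zeros((32, 32, 16)), np.zeros((16, 16, 32))]
injected = inject_multiscale(sketch, feats)
```

The point of the construction is information flow: deeper layers do not have to preserve the sketch through many convolutions, because each scale receives it afresh.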
Convex Cauchy Schwarz Independent Component Analysis for Blind Source Separation
We present a new high performance Convex Cauchy Schwarz Divergence (CCS DIV)
measure for Independent Component Analysis (ICA) and Blind Source Separation
(BSS). The CCS DIV measure is developed by integrating convex functions into
the Cauchy Schwarz inequality. By including a convexity quality parameter, the
measure has a broad control range of its convexity curvature. With this
measure, a new CCS ICA algorithm is structured, and a non-parametric form is
developed incorporating a Parzen-window-based distribution estimate.
Furthermore, pairwise iterative schemes are employed to tackle the
high-dimensional problem in BSS. We present two pairwise non-parametric ICA
schemes, one based on gradient descent and the other on the Jacobi iterative
method. Several case studies are carried out on noise-free and noisy mixtures
of speech and music signals. Finally, the superiority of the proposed CCS ICA
algorithm is demonstrated through metric comparisons with FastICA, RobustICA,
convex ICA (C ICA), and other leading algorithms.
Comment: 13 pages
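The pairwise scheme reduces BSS to a sequence of two-channel problems: whiten the pair, then search over plane rotations for the angle that optimises the chosen contrast. The sketch below uses a simple kurtosis contrast in place of the CCS divergence (an assumption for brevity; the sources and mixing matrix are synthetic):

```python
import numpy as np

def whiten(x):
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T @ x

def separate_pair(mix, n_angles=360):
    """Pairwise ICA: whiten, then scan plane rotations for the angle that
    maximises non-Gaussianity (|excess kurtosis|) of both outputs."""
    z = whiten(mix)
    best, best_score = None, -np.inf
    for theta in np.linspace(0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, s], [-s, c]]) @ z
        k = np.mean(y ** 4, axis=1) - 3.0          # excess kurtosis per channel
        if np.sum(np.abs(k)) > best_score:
            best_score, best = np.sum(np.abs(k)), y
    return best

rng = np.random.default_rng(1)
s1 = np.sign(rng.standard_normal(4000))            # sub-Gaussian source
s2 = rng.laplace(size=4000)                        # super-Gaussian source
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.6], [0.4, 1.0]])             # toy mixing matrix
Y = separate_pair(A @ S)
```

A gradient or Jacobi update would replace the brute-force angle scan, but the structure is the same: the contrast function is the only piece that changes when swapping in the CCS DIV measure.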
Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has led to its adoption in many non-graphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are starved for the computing power needed to overcome the dual curses of complexity and dimensionality. We investigate whether computational economics may benefit from these new tools through a case study of an imperfect-information dynamic programming problem with a learning-experimentation trade-off, that is, a choice between controlling the policy target and learning the system parameters. Specifically, we use a model of active learning and control of a linear autoregression with unknown slope, which has appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult: the value function need not be convex, and the policy function need not be continuous. This complication makes the problem a suitable target for massively parallel computation using graphics processors. Our findings are cautiously optimistic: the new tools let us easily achieve a factor-of-15 performance gain relative to an implementation targeting single-core processors, and thus establish a better reference point on the computational speed versus coding complexity trade-off frontier. While further gains and wider applicability may lie behind a steep learning curve, we argue that the future of many computations belongs to parallel algorithms in any case.
Keywords: Graphics Processing Units, CUDA programming, Dynamic programming, Learning, Experimentation
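Dynamic programming is a natural GPU target because each Bellman update evaluates every state independently. As a hedged illustration (a generic consumption-savings toy problem, not the paper's learning-and-control model; grid, discount factor, and return rate are made up), here the per-state maximisation is expressed as one array operation:

```python
import numpy as np

# toy problem: V(k) = max_{k'} log(1.04*k - k') + beta * V(k')
beta = 0.95
grid = np.linspace(0.1, 10.0, 200)                  # capital grid (states)
# consumption implied by each (state k, choice k') pair; -inf marks infeasible
C = grid[:, None] * 1.04 - grid[None, :]
U = np.where(C > 0, np.log(np.clip(C, 1e-12, None)), -np.inf)

V = np.zeros(len(grid))
for _ in range(1000):
    # one Bellman sweep: the max over choices runs for all states at once,
    # which is exactly the per-thread work in a GPU implementation
    V_new = np.max(U + beta * V[None, :], axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```

The contraction property of the Bellman operator guarantees convergence regardless of the starting guess, which is what makes the brute-force parallel sweep safe even when, as in the paper's problem, the value function is not convex.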
Computational Strategies for Scalable Genomics Analysis.
The revolution in next-generation DNA sequencing technologies is leading to explosive data growth in genomics, posing a significant challenge to the computing infrastructure and software algorithms for genomics analysis. Various big data technologies have been explored to scale up or scale out current bioinformatics solutions to mine big genomics data. In this review, we survey some of these exciting developments in the application of parallel and distributed computing and special hardware to genomics. We comment on the pros and cons of each strategy in the context of ease of development, robustness, scalability, and efficiency. Although this review is written for an audience from the genomics and bioinformatics fields, it may also be informative for computer scientists with an interest in genomics applications.
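Much of the scale-out described above follows a map-reduce shape: independent per-read work, then an associative merge. A toy sketch with k-mer counting (the reads, k=4, and the sequential driver are illustrative; the same structure drops into multiprocessing, Spark, or a scatter/gather cluster job):

```python
from collections import Counter
from functools import reduce

def kmer_counts(read, k=4):
    # per-read counting: the embarrassingly parallel "map" step
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def merge(a, b):
    # associative, commutative "reduce" step: partial counts combine freely
    a.update(b)
    return a

reads = ["ACGTACGT", "CGTACGTA", "TTTTACGT"]
total = reduce(merge, (kmer_counts(r) for r in reads), Counter())
```

Because `merge` is associative, partial results can be combined in any order, which is what allows the reduce step to be distributed across nodes without coordination.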
Broad Neural Network for Change Detection in Aerial Images
A change detection system takes as input two images of a region captured at
two different times, and predicts which pixels in the region have undergone
change over the time period. Since pixel-based analysis can be erroneous due to
noise, illumination difference and other factors, contextual information is
usually used to determine the class of a pixel (changed or not). This
contextual information is taken into account by considering a pixel of the
difference image along with its neighborhood. With the help of ground truth
information, labeled patterns are generated. Finally, a Broad Learning
classifier is used to predict the class of each pixel. Results show that Broad
Learning classifies the data set with a significantly higher F-score than the
Multilayer Perceptron; performance comparisons have also been made with other
popular classifiers, namely the Multilayer Perceptron and Random Forest.
Comment: IEMGraph (International Conference on Emerging Technology in
Modelling and Graphics) 2018, 6-7 September 2018, Kolkata, India
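The patch-based contextual representation described above is straightforward to sketch: each pixel of the difference image is described by its flattened neighbourhood, and any classifier can then score those feature vectors. In this toy version the difference image, the 3x3 patch size, and the mean-threshold stand-in classifier are all illustrative assumptions, not the paper's Broad Learning setup:

```python
import numpy as np

def patch_features(diff, size=3):
    """Flatten each pixel's size x size neighbourhood of the difference
    image into a feature vector (the contextual information above)."""
    pad = size // 2
    padded = np.pad(diff, pad, mode="edge")
    h, w = diff.shape
    feats = np.empty((h * w, size * size))
    for i in range(h):
        for j in range(w):
            feats[i * w + j] = padded[i:i + size, j:j + size].ravel()
    return feats

def f_score(pred, truth):
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    p, r = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
    return 2 * p * r / max(p + r, 1e-12)

diff = np.zeros((8, 8)); diff[2:5, 2:5] = 1.0      # toy 3x3 change blob
truth = (diff > 0).astype(int).ravel()
X = patch_features(diff)
pred = (X.mean(axis=1) > 0.5).astype(int)          # stand-in classifier
```

Replacing the threshold with a trained classifier changes only the last line; the feature extraction, which carries the contextual information, stays the same.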