
    Fully automatic extraction of salient objects from videos in near real-time

    Automatic video segmentation plays an important role in a wide range of computer vision and image processing applications. Recently, various methods have been proposed for this purpose. The problem is that most of these methods are far from real-time processing, even for low-resolution videos, due to their complex procedures. To this end, we propose a new and quite fast method for automatic video segmentation based on 1) efficient optimization of Markov random fields, polynomial in the number of pixels, by introducing graph cuts, 2) automatic, computationally efficient, yet stable derivation of segmentation priors using visual saliency and a sequential update mechanism, and 3) an implementation strategy following the principle of stream processing with graphics processing units (GPUs). Test results indicate that our method extracts appropriate regions from videos as precisely as, and much faster than, previous semi-automatic methods, even though no supervision is incorporated.
    Comment: submitted to the Special Issue on High Performance Computation on Hardware Accelerators, The Computer Journal
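    The core of step 1) can be previewed with an off-the-shelf graph-cut solver. The snippet below is a minimal sketch assuming the PyMaxflow package and a saliency map already normalised to [0, 1]; the actual unary/pairwise weights and the sequential prior update are the paper's own and are not reproduced here.

        import numpy as np
        import maxflow  # PyMaxflow (assumed available); wraps a max-flow/min-cut solver

        def segment_frame(saliency, smoothness=2.0):
            """Binary labelling of one frame from a saliency prior via graph cuts."""
            g = maxflow.Graph[float]()
            nodes = g.add_grid_nodes(saliency.shape)
            g.add_grid_edges(nodes, smoothness)      # pairwise smoothness on the 4-connected grid
            eps = 1e-6
            fg_cost = -np.log(saliency + eps)        # unary cost of the foreground label
            bg_cost = -np.log(1.0 - saliency + eps)  # unary cost of the background label
            g.add_grid_tedges(nodes, fg_cost, bg_cost)
            g.maxflow()                              # exact optimum of this binary MRF
            return g.get_grid_segments(nodes)        # per-pixel segment (polarity follows the t-edge convention)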

    \texttt{GooStats}: A GPU-based framework for multi-variate analysis in particle physics

    \texttt{GooStats} is a software framework that provides a flexible environment and common tools for implementing multi-variate statistical analyses. The framework is built upon the \texttt{CERN ROOT}, \texttt{MINUIT} and \texttt{GooFit} packages. Running a multi-variate analysis in parallel on graphics processing units yields a huge boost in performance and opens new possibilities. The design and benchmarks of \texttt{GooStats} are presented in this article, along with an illustration of its application to statistical problems.
    Comment: 16 pages, 10 figures
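    The core task such frameworks parallelise is a negative log-likelihood minimisation over binned spectra. The sketch below shows that pattern generically with NumPy/SciPy on a toy exponential-plus-background spectrum; it uses neither the GooStats API nor GPU offloading, and all model and parameter names are illustrative only.

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import poisson

        # Toy binned spectrum: expected counts = amplitude * exp(-x / slope) + background
        edges = np.linspace(0.0, 10.0, 51)
        x = 0.5 * (edges[:-1] + edges[1:])
        rng = np.random.default_rng(0)
        truth = 200.0 * np.exp(-x / 3.0) + 5.0
        observed = rng.poisson(truth)

        def nll(params):
            amplitude, slope, background = params
            expected = amplitude * np.exp(-x / slope) + background
            return -poisson.logpmf(observed, expected).sum()

        fit = minimize(nll, x0=[150.0, 2.0, 1.0], method="L-BFGS-B",
                       bounds=[(1.0, None), (0.1, None), (0.0, None)])
        print(fit.x)  # best-fit (amplitude, slope, background)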

    Kinematic Modelling of Disc Galaxies using Graphics Processing Units

    With large-scale Integral Field Spectroscopy (IFS) surveys of thousands of galaxies currently underway or planned, the astronomical community is in need of methods, techniques and tools that will allow the analysis of huge amounts of data. We focus on the kinematic modelling of disc galaxies and investigate the potential use of massively parallel architectures, such as the Graphics Processing Unit (GPU), as an accelerator for the computationally expensive model-fitting procedure. We review the algorithms involved in model-fitting and evaluate their suitability for GPU implementation. We employ different optimization techniques, including the Levenberg-Marquardt and Nested Sampling algorithms, but also a naive brute-force approach based on Nested Grids. We find that the GPU can accelerate the model-fitting procedure up to a factor of ~100 when compared to a single-threaded CPU, and up to a factor of ~10 when compared to a multi-threaded dual-CPU configuration. Our method's accuracy, precision and robustness are assessed by successfully recovering the kinematic properties of simulated data, and also by verifying the kinematic modelling results of galaxies from the GHASP and DYNAMO surveys as found in the literature. The resulting GBKFIT code is available for download from: http://supercomputing.swin.edu.au/gbkfit.
    Comment: 34 pages, 16 figures, 8 tables. Accepted for publication in MNRAS
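    As a minimal illustration of the Levenberg-Marquardt step at the heart of such model-fitting, the sketch below fits a simple arctan rotation-curve model to synthetic major-axis velocities with scipy.optimize.curve_fit; the model choice and parameter values are illustrative assumptions, not GBKFIT's actual parameterisation or its GPU implementation.

        import numpy as np
        from scipy.optimize import curve_fit

        def arctan_rotation_curve(r, v_max, r_turn):
            """Common empirical arctan rotation-curve model (illustrative only)."""
            return (2.0 / np.pi) * v_max * np.arctan(r / r_turn)

        # Synthetic line-of-sight velocities along the major axis (km/s).
        radius = np.linspace(0.5, 20.0, 40)  # kpc
        rng = np.random.default_rng(1)
        velocity = arctan_rotation_curve(radius, 220.0, 2.5) + rng.normal(0, 5, radius.size)

        # curve_fit defaults to Levenberg-Marquardt for unconstrained problems.
        popt, pcov = curve_fit(arctan_rotation_curve, radius, velocity, p0=[150.0, 1.0])
        print(popt)  # recovered (v_max, r_turn)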

    Vector operations for accelerating expensive Bayesian computations -- a tutorial guide

    Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing are widely applied in the Bayesian community; however, very little attention has been given to fine-grained parallelisation using single instruction, multiple data (SIMD) operations, which are available on most modern commodity CPUs and are the basis of GPGPU computing. In this work, we practically demonstrate, using standard programming libraries, the utility of the SIMD approach for several topical Bayesian applications. We show that SIMD can improve floating-point arithmetic performance, resulting in up to a 6$\times$ improvement in serial algorithm performance. Importantly, these improvements are multiplicative to any gains achieved through multi-core processing. We illustrate the potential of SIMD for accelerating Bayesian computations and provide the reader with techniques for exploiting modern massively parallel processing environments using standard tools.
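    The underlying pattern can be previewed in a high-level language: per-observation likelihood terms are independent, so they map naturally onto SIMD lanes. The sketch below contrasts a scalar loop with whole-array arithmetic in NumPy (whose kernels are vectorised); it illustrates the data-parallel pattern only, not the authors' actual SIMD implementations.

        import numpy as np

        rng = np.random.default_rng(2)
        data = rng.normal(1.0, 2.0, size=100_000)

        def loglik_scalar(data, mu, sigma):
            """One observation at a time: no data-level parallelism."""
            total = 0.0
            for x in data:
                total += -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
            return total

        def loglik_vector(data, mu, sigma):
            """Whole-array arithmetic: dispatches to vectorised (SIMD) kernels."""
            return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (data - mu) ** 2 / (2 * sigma**2))

        # Both give the same answer; the vectorised form is dramatically faster.
        print(loglik_scalar(data[:1000], 1.0, 2.0), loglik_vector(data[:1000], 1.0, 2.0))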

    On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing

    Compute-mode rendering is becoming more and more attractive for non-standard rendering applications, due to the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is on average required six times during rendering. To avoid redundant computation, a post-transform cache is traditionally suggested to enable reuse of vertex processing results. However, traditional caching neither scales well as the hardware becomes more parallel, nor can it be efficiently implemented in a software design. We investigate alternative strategies for reusing vertex shading results on-the-fly for massively parallel software geometry processing. Forming static and dynamic batches on the data input stream, we analyze the effectiveness of identifying potential local reuse based on sorting, hashing, and efficient intra-thread-group communication. Altogether, we present four vertex reuse strategies, tailored to modern parallel architectures. Our simulations show that our batch-based strategies significantly outperform parallel caches in terms of reuse. On actual GPU hardware, our evaluation shows that our strategies not only lead to good reuse of processing results, but also boost performance by 2-3$\times$ compared to naively ignoring reuse in a variety of practical applications.
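    The essence of batch-based reuse is that, within one batch of the index stream, each referenced vertex is shaded exactly once and looked up thereafter. The sketch below is a CPU stand-in for that idea using a hash map; the names are illustrative, and the paper's GPU variants instead rely on sorting, hashing, and intra-thread-group communication.

        def shade_batch(index_batch, vertices, shade_vertex):
            """Deduplicate vertex references inside one batch so each unique vertex
            is shaded only once (a CPU stand-in for intra-batch reuse)."""
            local_slot = {}   # global vertex index -> slot in the shaded batch
            shaded = []       # shaded results, one per unique vertex
            remapped = []     # index batch rewritten against the local slots
            for v in index_batch:
                if v not in local_slot:
                    local_slot[v] = len(shaded)
                    shaded.append(shade_vertex(vertices[v]))  # shade once per unique vertex
                remapped.append(local_slot[v])
            return shaded, remapped

        # Example: two triangles sharing an edge reuse two shaded vertices.
        vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
        indices = [0, 1, 2, 0, 2, 3]
        shaded, remapped = shade_batch(indices, vertices, lambda v: v)  # identity "shader"
        print(len(shaded), remapped)  # 4 unique vertices, [0, 1, 2, 0, 2, 3]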

    SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis

    Synthesizing realistic images from human-drawn sketches is a challenging problem in computer graphics and vision. Existing approaches either need exact edge maps, or rely on retrieval of existing photographs. In this work, we propose a novel Generative Adversarial Network (GAN) approach that synthesizes plausible images from 50 categories, including motorcycles, horses and couches. We demonstrate a fully automatic data augmentation technique for sketches, and we show that the augmented data is helpful to our task. We introduce a new network building block, suitable for both the generator and the discriminator, which improves the information flow by injecting the input image at multiple scales. Compared to state-of-the-art image translation methods, our approach generates more realistic images and achieves significantly higher Inception Scores.
    Comment: Accepted to CVPR 2018
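    The simplest reading of "injecting the input image at multiple scales" is a block that resizes the sketch to each feature resolution and feeds it back in alongside the features. The PyTorch sketch below shows that idea in its most basic concatenate-and-convolve form; it is a simplified stand-in, not the paper's actual building block, and all names and channel counts are illustrative.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class InjectInputBlock(nn.Module):
            """Conv block that re-injects a resized copy of the input sketch at its own scale."""
            def __init__(self, feat_channels, image_channels, out_channels):
                super().__init__()
                self.conv = nn.Conv2d(feat_channels + image_channels, out_channels,
                                      kernel_size=3, padding=1)

            def forward(self, features, image):
                # Resize the input image to the current feature resolution, then concatenate.
                resized = F.interpolate(image, size=features.shape[-2:], mode="bilinear",
                                        align_corners=False)
                return F.relu(self.conv(torch.cat([features, resized], dim=1)))

        # Example: 64-channel features at 32x32, 1-channel sketch at 256x256.
        block = InjectInputBlock(64, 1, 128)
        out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 1, 256, 256))
        print(out.shape)  # torch.Size([2, 128, 32, 32])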

    Convex Cauchy Schwarz Independent Component Analysis for Blind Source Separation

    We present a new high-performance Convex Cauchy-Schwarz Divergence (CCS-DIV) measure for Independent Component Analysis (ICA) and Blind Source Separation (BSS). The CCS-DIV measure is developed by integrating convex functions into the Cauchy-Schwarz inequality. By including a convexity quality parameter, the measure has a broad control range of its convexity curvature. With this measure, a new CCS-ICA algorithm is structured, and a non-parametric form is developed incorporating the Parzen-window-based distribution. Furthermore, pairwise iterative schemes are employed to tackle the high-dimensional problem in BSS. We present two schemes of pairwise non-parametric ICA algorithms: one based on gradient descent and the second on the Jacobi iterative method. Several case-study scenarios are carried out on noise-free and noisy mixtures of speech and music signals. Finally, the superiority of the proposed CCS-ICA algorithm is demonstrated in performance comparisons with FastICA, RobustICA, convex ICA (C-ICA), and other leading existing algorithms.
    Comment: 13 pages
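    For context, the classical (non-convex) Cauchy-Schwarz divergence between densities $p$ and $q$, on which the CCS-DIV measure builds, follows directly from applying the Cauchy-Schwarz inequality to the inner product of densities; the paper's convexity parameter is not shown here:

        $$ D_{CS}(p, q) \;=\; -\log \frac{\left(\int p(x)\, q(x)\, dx\right)^{2}}{\int p^{2}(x)\, dx \,\int q^{2}(x)\, dx} $$

    This quantity is non-negative and equals zero only when $p = q$ almost everywhere, which is what makes it usable as a contrast function for ICA.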

    Massively Parallel Computation Using Graphics Processors with Application to Optimal Experimentation in Dynamic Control

    The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has led to its adoption in many non-graphics applications, including a wide variety of scientific computing fields. At the same time, a number of important dynamic optimal policy problems in economics are starved of computing power to help overcome the dual curses of complexity and dimensionality. We investigate whether computational economics may benefit from these new tools through a case study of an imperfect-information dynamic programming problem with a learning and experimentation trade-off, that is, a choice between controlling the policy target and learning system parameters. Specifically, we use a model of active learning and control of a linear autoregression with unknown slope that has appeared in a variety of macroeconomic policy and other contexts. The endogeneity of posterior beliefs makes the problem difficult in that the value function need not be convex and the policy function need not be continuous. This complication makes the problem a suitable target for massively parallel computation using graphics processors. Our findings are cautiously optimistic in that the new tools let us easily achieve a factor-of-15 performance gain relative to an implementation targeting single-core processors, and thus establish a better reference point on the computational speed versus coding complexity trade-off frontier. While further gains and wider applicability may lie behind a steep learning barrier, we argue that the future of many computations belongs to parallel algorithms anyway.
    Keywords: Graphics Processing Units, CUDA programming, Dynamic programming, Learning, Experimentation
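    The data-parallel structure being exploited is that the Bellman update can be applied to every discretised state and candidate action independently within a sweep. The sketch below shows that pattern for a generic value-iteration problem in NumPy; the loss, transition, and parameters are toy assumptions, not the paper's learning-and-experimentation model, and a GPU version would map each state/action pair to its own thread.

        import numpy as np

        n_states, n_actions, beta = 200, 50, 0.95
        states = np.linspace(-1.0, 1.0, n_states)
        actions = np.linspace(-1.0, 1.0, n_actions)

        # Quadratic period loss and a simple deterministic transition (illustrative only).
        loss = (states[:, None] - actions[None, :]) ** 2                     # (n_states, n_actions)
        next_state = np.clip(0.9 * states[:, None] + 0.1 * actions[None, :], -1.0, 1.0)
        next_idx = np.abs(next_state[..., None] - states).argmin(axis=-1)    # nearest grid point

        V = np.zeros(n_states)
        for _ in range(500):
            Q = loss + beta * V[next_idx]   # Bellman operator for every (state, action) at once
            V_new = Q.min(axis=1)           # minimise loss over actions
            if np.max(np.abs(V_new - V)) < 1e-8:
                break
            V = V_new
        policy = actions[Q.argmin(axis=1)]
        print(V[:3], policy[:3])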

    Broad Neural Network for Change Detection in Aerial Images

    A change detection system takes as input two images of a region captured at two different times, and predicts which pixels in the region have undergone change over the time period. Since pixel-based analysis can be erroneous due to noise, illumination differences and other factors, contextual information is usually used to determine the class of a pixel (changed or not). This contextual information is taken into account by considering a pixel of the difference image along with its neighborhood. Labeled patterns are generated with the help of ground-truth information. Finally, a Broad Learning classifier is used to predict the class of each pixel. Results show that Broad Learning classifies the data set with a significantly higher F-score than the Multilayer Perceptron. Performance comparisons have also been made with other popular classifiers, namely the Multilayer Perceptron and Random Forest.
    Comment: Accepted at: IEMGraph (International Conference on Emerging Technology in Modelling and Graphics) 2018. Date of Conference: 6-7 September 2018. Location of Conference: Kolkata, India
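    The patch-based pipeline described here (difference image, neighbourhood features, supervised classifier) can be sketched end to end in a few lines. The example below uses a Random Forest from scikit-learn as a stand-in classifier, since Broad Learning has no standard scikit-learn implementation; the images, labels, and patch size are toy assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier  # stand-in for a Broad Learning classifier

        def extract_patches(diff, half=1):
            """Flatten each pixel's (2*half+1)^2 neighbourhood of the difference image
            into a feature vector, so context (not just the pixel) drives the decision."""
            padded = np.pad(diff, half, mode="edge")
            h, w = diff.shape
            feats = [padded[i:i + h, j:j + w].ravel()
                     for i in range(2 * half + 1) for j in range(2 * half + 1)]
            return np.stack(feats, axis=1)  # shape (h*w, patch_size)

        # Toy pair of co-registered images and ground truth for the changed block.
        rng = np.random.default_rng(3)
        before = rng.normal(0.0, 0.1, (64, 64))
        after = before.copy()
        after[20:40, 20:40] += 1.0              # simulated change
        truth = np.zeros((64, 64), dtype=int)
        truth[20:40, 20:40] = 1

        X = extract_patches(np.abs(after - before))
        clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, truth.ravel())
        pred = clf.predict(X).reshape(64, 64)   # per-pixel change map (toy: trained and tested on same scene)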