Search CORE

3,790 research outputs found

GPU Acceleration of Image Convolution using Spatially-varying Kernel

Author: Hartung Steven
Miller J. Patrick
Pennypacker Carlton
Shukla Hemant
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/09/2012
Field of study

Image subtraction in astronomy is a tool for transient object discovery such as asteroids, extra-solar planets and supernovae. To match point spread functions (PSFs) between images of the same field taken at different times a convolution technique is used. Particularly suitable for large-scale images is a computationally intensive spatially-varying kernel. The underlying algorithm is inherently massively parallel due to unique kernel generation at every pixel location. The spatially-varying kernel cannot be efficiently computed through the Convolution Theorem, and thus does not lend itself to acceleration by Fast Fourier Transform (FFT). This work presents results of accelerated implementation of the spatially-varying kernel image convolution in multi-cores with OpenMP and graphic processing units (GPUs). Typical speedups over ANSI-C were a factor of 50 and a factor of 1000 over the initial IDL implementation, demonstrating that the techniques are a practical and high impact path to terabyte-per-night image pipelines and petascale processing.Comment: 4 pages. Accepted to IEEE-ICIP 201

arXiv.org e-Print Archive

Crossref

GPU acceleration of time-domain fluorescence lifetime imaging

Author: Chen Yu
Li David Day-Uei
Nowotny Thomas
Wu Gang
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 06/01/2016
Field of study

Fluorescence lifetime imaging microscopy (FLIM) plays a significant role in biological sciences, chemistry, and medical research. We propose a Graphic Processing Units (GPUs) based FLIM analysis tool suitable for high-speed and flexible time-domain FLIM applications. With a large number of parallel processors, GPUs can significantly speed up lifetime calculations compared to CPU-OpenMP (parallel computing with multiple CPU cores) based analysis. We demonstrate how to implement and optimize FLIM algorithms on GPUs for both iterative and non-iterative FLIM analysis algorithms. The implemented algorithms have been tested on both synthesized and experimental FLIM data. The results show that at the same precision the GPU analysis can be up to 24-fold faster than its CPU-OpenMP counterpart. This means that even for high precision but time-consuming iterative FLIM algorithms, GPUs enable fast or even real-time analysis

Crossref

University of Strathclyde Institutional Repository

Sussex Research Online

LiveCap: Real-time Human Performance Capture from Monocular Video

Author: Habermann Marc
Pons-Moll Gerard
Theobalt Christian
Xu Weipeng
Zollhoefer Michael
Publication venue
Publication date: 01/01/2019
Field of study

We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per-frame are solved with specially-tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields comparable accuracy with off-line performance capture techniques, while being orders of magnitude faster

arXiv.org e-Print Archive

MPG.PuRe

Analysis of Parallel Montgomery Multiplication in CUDA

Author: Liu Yuheng
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2013
Field of study

For a given level of security, elliptic curve cryptography (ECC) offers improved efficiency over classic public key implementations. Point multiplication is the most common operation in ECC and, consequently, any significant improvement in perfor- mance will likely require accelerating point multiplication. In ECC, the Montgomery algorithm is widely used for point multiplication. The primary purpose of this project is to implement and analyze a parallel implementation of the Montgomery algorithm as it is used in ECC. Specifically, the performance of CPU-based Montgomery multiplication and a GPU-based implementation in CUDA are compared

SJSU ScholarWorks

Real time video pipeline for computer vision using embedded GPUs, A

Author: Patil Rutuja
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2016
Field of study

2016 Fall.Includes bibliographical references.This thesis presents case study confirming the feasibility of real time Computer Vision applications on embedded GPUs. Applications that depend on video processing, such as security surveillance, can benefit from applying optimizations common in scientific computing. This thesis demonstrates the benefit of applying such optimizations to real time Computer Vision applications on embedded GPUs. The primary contribution of this thesis is an optimized implementation of ViBe targeting NVIDIA's Jetson TK1. ViBe is a commonly used background subtraction algorithm. Optimizing a background subtraction algorithm accelerates the task of reducing the field of view to only interesting patches of the frames of the video. Placing portable hardware close to capturing devices in the surveillance system reduces bandwidth requirements and cost. The goals of the optimizations proposed for this algorithm are to 1) reduce memory traffic 2) overlap CPU and GPU usage 3) reduce kernel overhead. The optimized implementation of ViBe achieves a frame rate of almost 55 FPS beating the real time goal standard of 30 FPS for real time video. This is a small portion of the real-time window leaving processing time for additional algorithms like object recognition

Mountain Scholar (Digital Collections of Colorado and Wyoming)