    Parallel one-versus-rest SVM training on the GPU

    Linear SVMs are a popular choice of binary classifier. On a multiclass dataset it is often necessary to train many different classifiers in a one-versus-rest fashion, and to repeat this for several values of the regularization constant. We propose to harness GPU parallelism by training as many classifiers as possible at the same time. We optimize the primal L2-loss SVM objective using the conjugate gradient method with an adapted backtracking line search strategy. Compared to liblinear, our approach achieves speedups of up to 17 times on our available hardware.
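
    As a hedged illustration of this batched training idea (not the paper's code), the JAX sketch below updates a whole stack of one-versus-rest classifiers, each with its own regularization constant, in lockstep on the GPU. For brevity it uses plain gradient descent with a vectorized backtracking (Armijo) line search over a fixed grid of step sizes in place of the paper's conjugate gradient method; all names and shapes are illustrative assumptions.

        import jax
        import jax.numpy as jnp

        def obj(w, X, y, C):
            # primal L2-loss SVM: 0.5*||w||^2 + C * sum(max(0, 1 - y*Xw)^2)
            m = jnp.maximum(0.0, 1.0 - y * (X @ w))
            return 0.5 * (w @ w) + C * jnp.sum(m * m)

        # map one objective/gradient over a whole stack of classifiers at once
        batch_obj = jax.vmap(obj, in_axes=(0, None, 0, 0))
        batch_grad = jax.vmap(jax.grad(obj), in_axes=(0, None, 0, 0))

        def train_ovr(X, Y, Cs, iters=200, num_steps=12):
            # X: (n, d) data; Y: (k, n) one-vs-rest labels in {-1, +1}; Cs: (k,)
            W = jnp.zeros((Y.shape[0], X.shape[1]))
            steps = 0.5 ** jnp.arange(num_steps)           # candidate step sizes
            for _ in range(iters):
                f = batch_obj(W, X, Y, Cs)                 # (k,)
                G = batch_grad(W, X, Y, Cs)                # (k, d)
                gg = jnp.sum(G * G, axis=1)                # (k,)
                # try every candidate step for every classifier in one shot
                cand = W[None] - steps[:, None, None] * G[None]        # (s, k, d)
                fc = jax.vmap(batch_obj, in_axes=(0, None, None, None))(cand, X, Y, Cs)
                ok = fc <= f[None] - 0.5 * steps[:, None] * gg[None]   # Armijo test
                # first passing step per classifier; smallest step if none pass
                j = jnp.where(ok.any(axis=0), jnp.argmax(ok, axis=0), num_steps - 1)
                W = cand[j, jnp.arange(W.shape[0])]
            return W

    The vmap over (w, y, C) is what turns k independent training problems into one batched GPU computation; evaluating every candidate step for every classifier at once keeps the line search batched as well.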

    GPU acceleration of object classification algorithms using NVIDIA CUDA

    The field of computer vision has become an important part of today's society, supporting crucial applications in the medical, manufacturing, military intelligence, and surveillance domains. Many computer vision tasks can be divided into fundamental steps: image acquisition, pre-processing, feature extraction, detection or segmentation, and high-level processing. This work focuses on classification and object detection, specifically k-Nearest Neighbors, Support Vector Machine classification, and Viola-Jones object detection. Object detection and classification algorithms are computationally intensive, which makes it difficult to perform classification tasks in real time. This thesis aims to overcome the processing limitations of these classification algorithms by offloading computation to the graphics processing unit (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). The primary focus of this work is the implementation of the Viola-Jones object detector in CUDA. A multi-GPU implementation provides speedups ranging from 1x to 6.5x over optimized OpenCV code for image sizes from 300 x 300 pixels up to 2900 x 1600 pixels, with comparable detection results. The second part of this thesis is the implementation of a multi-GPU multi-class SVM classifier, which matches the accuracy of an identical LIBSVM implementation with speedups ranging from 89x to 263x on the tested datasets. The final part of this thesis extends a previous CUDA k-Nearest Neighbor implementation by exploiting additional levels of parallelism, providing speedups of 1.24x and 2.35x over the previous CUDA implementation. As an end result of this work, a library of these three CUDA classifiers has been compiled for use by future researchers.
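
    The thesis's CUDA kernels are not reproduced here, but the flavor of the parallelism is easy to show for the k-Nearest Neighbors part: a brute-force classifier in which every test-to-train distance is computed in a single GPU matrix product. This is a hedged JAX sketch under assumed shapes, not the thesis implementation; num_classes and all names are illustrative.

        import jax
        import jax.numpy as jnp

        def knn_predict(train_X, train_y, test_X, k=5, num_classes=10):
            # all pairwise squared distances in one matmul: (m_test, n_train)
            d2 = (jnp.sum(test_X ** 2, axis=1)[:, None]
                  - 2.0 * test_X @ train_X.T
                  + jnp.sum(train_X ** 2, axis=1)[None, :])
            # k smallest distances per test point (top_k of the negated rows)
            _, idx = jax.lax.top_k(-d2, k)
            votes = jax.nn.one_hot(train_y[idx], num_classes)  # (m_test, k, c)
            return jnp.argmax(votes.sum(axis=1), axis=1)       # majority vote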

    Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

    Large-scale machine learning requires tradeoffs. Commonly, this tradeoff has led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger datasets and significantly more training parallelism by strategically introducing approximation in certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and can consequently process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training time budget.
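
    As a concrete illustration of the split-selection tradeoff, the JAX sketch below scores only quantile-histogram bin edges as candidate thresholds for a squared-error regression split, instead of every distinct feature value. It is a hedged, single-feature sketch under assumed shapes, not the dissertation's distributed implementation; all names are illustrative.

        import jax
        import jax.numpy as jnp

        def best_approx_split(x, residuals, num_bins=32):
            # coarse-grained candidates: interior quantile edges of one feature
            edges = jnp.quantile(x, jnp.linspace(0.0, 1.0, num_bins + 1)[1:-1])
            bins = jnp.searchsorted(edges, x)                      # (n,) bin per sample
            g = jax.ops.segment_sum(residuals, bins, num_bins)     # residual sum per bin
            c = jax.ops.segment_sum(jnp.ones_like(residuals), bins, num_bins)
            gl, cl = jnp.cumsum(g)[:-1], jnp.cumsum(c)[:-1]        # stats left of each edge
            gr, cr = g.sum() - gl, c.sum() - cl
            # squared-error reduction of splitting at each edge (guard empty sides)
            gain = (gl ** 2 / jnp.maximum(cl, 1.0)
                    + gr ** 2 / jnp.maximum(cr, 1.0)
                    - g.sum() ** 2 / c.sum())
            j = jnp.argmax(gain)
            return edges[j], gain[j]

    Because every sample only contributes to a histogram bin, the per-bin sums can be accumulated independently per worker and merged, which is where the additional distributed parallelism comes from.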

    Efficient multitemporal change detection techniques for hyperspectral images on GPU

    Hyperspectral images contain hundreds of reflectance values for each pixel. Detecting regions of change in multiple hyperspectral images of the same scene taken at different times is of widespread interest for a large number of applications; for remote sensing in particular, a very common application is land-cover analysis. The high dimensionality of hyperspectral images makes the development of computationally efficient processing schemes critical. This thesis focuses on the development of object-level change detection approaches, based on supervised direct multidate classification, for hyperspectral datasets. The proposed approaches improve the accuracy of current state-of-the-art algorithms, and their projection onto Graphics Processing Units (GPUs) allows their execution in real-time scenarios.
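
    A hedged JAX sketch of the core of direct multidate classification: the spectra from the two acquisition dates are stacked per pixel and the stacked vectors are classified in one batched GPU operation. The pre-trained linear classifier W_cls is an illustrative assumption; the thesis works at object level with more elaborate supervised classifiers.

        import jax.numpy as jnp

        def direct_multidate_classify(img_t1, img_t2, W_cls):
            # img_t1, img_t2: (H, W, B) hyperspectral cubes of the same scene
            # W_cls: (num_classes, 2*B) weights of a trained linear classifier
            H, Wd, B = img_t1.shape
            feats = jnp.concatenate([img_t1, img_t2], axis=-1).reshape(H * Wd, 2 * B)
            scores = feats @ W_cls.T                 # every pixel scored in one matmul
            return jnp.argmax(scores, axis=1).reshape(H, Wd)   # per-pixel class map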