1,732 research outputs found

    GPU acceleration of object classification algorithms using NVIDIA CUDA

    Get PDF
    The field of computer vision has become an important part of today\u27s society, supporting crucial applications in the medical, manufacturing, military intelligence and surveillance domains. Many computer vision tasks can be divided into fundamental steps: image acquisition, pre-processing, feature extraction, detection or segmentation, and high-level processing. This work focuses on classification and object detection, specifically k-Nearest Neighbors, Support Vector Machine classification, and Viola & Jones object detection. Object detection and classification algorithms are computationally intensive, which makes it difficult to perform classification tasks in real-time. This thesis aims in overcoming the processing limitations of the above classification algorithms by offloading computation to the graphics processing unit (GPU) using NVIDIA\u27s Compute Unified Device Architecture (CUDA). The primary focus of this work is the implementation of the Viola and Jones object detector in CUDA. A multi-GPU implementation provides a speedup ranging from 1x to 6.5x over optimized OpenCV code for image sizes of 300 x 300 pixels up to 2900 x 1600 pixels while having comparable detection results. The second part of this thesis is the implementation of a multi-GPU multi-class SVM classifier. The classifier had the same accuracy as an identical implementation using LIBSVM with a speedup ranging from 89x to 263x on the tested datasets. The final part of this thesis was the extension of a previous CUDA k-Nearest Neighbor implementation by exploiting additional levels of parallelism. These extensions provided a speedup of 1.24x and 2.35x over the previous CUDA implementation. As an end result of this work, a library of these three CUDA classifiers has been compiled for use by future researchers

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Get PDF
    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may disperse directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance based facial attributes are studied leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where for example variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produce a relation that is neither convex, nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, human activity analysis, and with the addition of a concept called active difference signatures, delivers robust gesture recognition from Kinect or similar depth cameras
    • …
    corecore