
    Highly Scalable, Parallel and Distributed AdaBoost Algorithm Using Light Weight Threads and Web Services on a Network of Multi-Core Machines

    AdaBoost is an important algorithm in machine learning and is widely used in object detection. AdaBoost works by iteratively selecting the best among weak classifiers and then combining several weak classifiers to obtain a strong classifier. Even though AdaBoost has proven to be very effective, its learning execution time can be quite large depending upon the application; in face detection, for example, the learning time can be several days. Due to its increasing use in computer vision applications, the learning time needs to be drastically reduced so that an adaptive, near real-time object detection system can be built. In this paper, we develop a hybrid parallel and distributed AdaBoost algorithm that exploits the multiple cores in a CPU via lightweight threads and also uses multiple machines via a web-service software architecture to achieve high scalability. We present a novel hierarchical web-services-based distributed architecture and achieve nearly linear speedup up to the number of processors available to us. In comparison with previously published work, which used a single-level master-slave parallel and distributed implementation [1] and achieved a speedup of only 2.66 on four nodes, we achieve a speedup of 95.1 on 31 workstations, each having a quad-core processor, resulting in a learning time of only 4.8 seconds per feature.
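    The core loop the abstract describes, selecting the lowest-weighted-error weak classifier each round and reweighting the training samples, can be pictured with the short sketch below. It is a minimal illustration using decision stumps and a thread pool; the Haar-like features, web-service layer, and distributed scheduling of the actual system are not modeled, and all function names are illustrative.

```python
# Minimal sketch of one AdaBoost round with the weak-classifier search spread
# over threads. Decision stumps over feature columns stand in for the Haar-like
# features of a real detector; the paper's distributed web-service layer is not
# modeled here.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def stump_error(X, y, w, feature, threshold):
    """Weighted error of the stump 'predict +1 if X[:, feature] > threshold'."""
    pred = np.where(X[:, feature] > threshold, 1, -1)
    return np.sum(w[pred != y]), pred

def adaboost_round(X, y, w, candidates, n_threads=4):
    """Select the lowest-weighted-error stump and reweight the samples."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(lambda c: stump_error(X, y, w, *c), candidates))
    best = int(np.argmin([err for err, _ in results]))
    err, pred = results[best]
    err = max(err, 1e-12)                      # guard against a perfect stump
    alpha = 0.5 * np.log((1.0 - err) / err)    # weight of the selected classifier
    w = w * np.exp(-alpha * y * pred)          # boost the misclassified samples
    return candidates[best], alpha, w / w.sum()

# Toy usage: 200 samples, 8 features, stumps thresholded at each feature's median.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = np.where(X[:, 0] + 0.5 * X[:, 3] > 0, 1, -1)
w = np.full(len(y), 1.0 / len(y))
candidates = [(f, float(np.median(X[:, f]))) for f in range(X.shape[1])]
(best_feature, best_thr), alpha, w = adaboost_round(X, y, w, candidates)
```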

    Parallel error-correcting output codes classification in volume visualization

    In volume visualization, the definition of the regions of interest is inherently an iterative trial-and-error process of finding the best parameters to classify and render the final image. Generally, the user needs considerable expertise to analyze and edit these parameters through multi-dimensional transfer functions. In this thesis, we present a framework of methods to label multiple regions of interest on demand. The selected methods are a combination of 1-vs-1 AdaBoost binary classifiers and an ECOC framework that combines the binary results to generate a multi-class result. In a first step, AdaBoost is used to train a set of 1-vs-1 binary classifiers with a labeled subset of points of the target volume. In a second step, an ECOC framework is used to combine the AdaBoost classifiers and classify the rest of the volume, assigning to each point one label among the multiple possible labels. The labels have to be introduced by an expert on the target volume, and these labels need cover only a small subset of all the points of the volume we want to classify. That way, only a small effort is required from the expert, but it demands an interactive process in which the classification results are obtained in real or near real time. That is why, in this master's thesis, we implemented the classification step in OpenCL, to exploit the parallelism of modern GPUs. We provide experimental results for both classification accuracy and execution-time speedup, comparing the GPU to single-core and multi-core CPUs. Along with this work, we present tools derived from the use of OpenCL for the experiments, which we shared as open source through Google Code, as well as an abstraction of the parallelization process that applies to other algorithms. Finally, we comment on future work and present conclusions in the closing sections of this document.
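    The combination step, turning the outputs of the 1-vs-1 AdaBoost classifiers into a single multi-class label per voxel via ECOC decoding, can be illustrated with the sketch below. It uses a standard one-vs-one coding matrix and Hamming-style decoding in plain NumPy; the thesis's OpenCL implementation and its exact decoding rule are not modeled, and the function names are illustrative.

```python
# Minimal sketch of ECOC decoding for one-vs-one binary classifiers, assuming
# each binary classifier outputs +1/-1 per sample (voxel). Plain NumPy for
# clarity; the GPU port described in the thesis is not modeled.
import itertools
import numpy as np

def one_vs_one_coding_matrix(n_classes):
    """Rows = classes, columns = classifiers; entries in {+1, -1, 0}."""
    pairs = list(itertools.combinations(range(n_classes), 2))
    M = np.zeros((n_classes, len(pairs)), dtype=np.int8)
    for j, (a, b) in enumerate(pairs):
        M[a, j], M[b, j] = 1, -1
    return M, pairs

def ecoc_decode(binary_outputs, M):
    """Assign each sample the class whose code word disagrees least.
    binary_outputs: (n_samples, n_classifiers) array of +1/-1 predictions;
    zero entries of M (classes a classifier ignores) contribute no distance."""
    mismatch = (binary_outputs[:, None, :] * M[None, :, :]) < 0
    return np.argmin(mismatch.sum(axis=2), axis=1)

# Toy usage: 4 tissue labels, 6 pairwise classifiers, random binary outputs.
M, pairs = one_vs_one_coding_matrix(4)
outputs = np.random.default_rng(1).choice([-1, 1], size=(10, M.shape[1]))
labels = ecoc_decode(outputs, M)
```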

    Application-level Performance Optimization: A Computer Vision Case Study on STHORM

    Computer vision applications constitute one of the key drivers for embedded many-core architectures. In order to exploit the full potential of such systems, a balance between computation and communication is critical, but many computer vision algorithms exhibit highly data-dependent behavior that complicates this task. To enable application performance optimization, the development environment must provide the developer with tools for fast and precise application-level performance analysis. We describe the process of porting and optimizing a face detection application onto the STHORM many-core accelerator using the STHORM OpenCL SDK. We identify the main factors that limit performance and discern the contributions arising from the application itself, the OpenCL programming model, and the STHORM OpenCL SDK. Finally, we show how these issues can be addressed in the future to enable developers to further improve application performance.
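    As a generic illustration of the kind of kernel-level timing such an application-level analysis relies on, the sketch below measures a single OpenCL kernel launch with event profiling through pyopencl. It targets whatever OpenCL device is available, not the STHORM accelerator or its SDK, and the kernel is a placeholder.

```python
# Generic OpenCL kernel timing via event profiling (pyopencl). This is not the
# STHORM SDK; it only shows how device-side kernel time can be measured.
import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void scale(__global const float *in, __global float *out, const float k) {
    int i = get_global_id(0);
    out[i] = k * in[i];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
prg = cl.Program(ctx, KERNEL_SRC).build()

a = np.random.rand(1 << 20).astype(np.float32)
out = np.empty_like(a)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

evt = prg.scale(queue, a.shape, None, a_buf, out_buf, np.float32(2.0))
evt.wait()
cl.enqueue_copy(queue, out, out_buf)
# Event timestamps are in nanoseconds; the difference is the kernel's device time.
print("kernel time: %.3f ms" % ((evt.profile.end - evt.profile.start) * 1e-6))
```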

    Approximation and Relaxation Approaches for Parallel and Distributed Machine Learning

    Large-scale machine learning requires tradeoffs. Commonly this tradeoff has led practitioners to choose simpler, less powerful models, e.g. linear models, in order to process more training examples in a limited time. In this work, we introduce parallelism to the training of non-linear models by leveraging a different tradeoff: approximation. We demonstrate various techniques by which non-linear models can be made amenable to larger data sets and significantly more training parallelism by strategically introducing approximation in certain optimization steps. For gradient boosted regression tree ensembles, we replace precise selection of tree splits with coarse-grained, approximate split selection, yielding both faster sequential training and a significant increase in parallelism, particularly in the distributed setting. For metric learning with nearest-neighbor classification, rather than explicitly training a neighborhood structure, we leverage the implicit neighborhood structure induced by task-specific random forest classifiers, yielding a highly parallel method for metric learning. For support vector machines, we follow existing work to learn a reduced basis set with extremely high parallelism, particularly on GPUs, via existing linear algebra libraries. We believe these optimization tradeoffs are widely applicable wherever machine learning is put into practice in large-scale settings. By carefully introducing approximation, we also introduce significantly higher parallelism and consequently can process more training examples for more iterations than competing exact methods. While seemingly learning the model with less precision, this tradeoff often yields noticeably higher accuracy under a restricted training-time budget.
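    One common instance of coarse-grained split selection, evaluating only a small set of quantile-based candidate thresholds instead of every sorted feature value, is sketched below. It is a hedged illustration of the general idea, not a claim about the dissertation's exact algorithm, and the function name and bin count are illustrative.

```python
# Sketch of approximate split selection for regression trees: candidate
# thresholds come from a fixed number of quantile bins rather than from a full
# scan of sorted feature values. One common approximation, not the
# dissertation's exact method.
import numpy as np

def approximate_best_split(x, residuals, n_bins=32):
    """Return (threshold, gain) minimizing squared error over binned candidates."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])  # interior bin edges
    parent_sse = np.sum((residuals - residuals.mean()) ** 2)
    best_thr, best_gain = None, 0.0
    for thr in np.unique(edges):
        left, right = residuals[x <= thr], residuals[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        gain = parent_sse - sse
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain

# Toy usage: one feature with a noisy step response.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 5000)
residuals = np.where(x > 0.6, 2.0, -1.0) + rng.normal(scale=0.1, size=x.size)
print(approximate_best_split(x, residuals))
```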

    Fast Face Detector Training Using Tailored Views

    Face detection is an important task in computer vision and often serves as the first step for a variety of applications. State-of-the-art approaches use efficient learning algorithms and train on large amounts of manually labeled imagery. Acquiring appropriate training images, however, is very time-consuming and does not guarantee that the collected training data is representative in terms of data variability. Moreover, available data sets are often acquired under controlled settings, restricting, for example, scene illumination or 3D head pose to a narrow range. This paper looks into the automated generation of adaptive training samples from a 3D morphable face model. Using statistical insights, the tailored training data guarantees full data variability and is enriched with arbitrary facial attributes such as age or body weight. Moreover, it can automatically adapt to environmental constraints, such as the illumination or viewing angle of recorded video footage from surveillance cameras. We use the tailored imagery to train a new many-core implementation of the Viola-Jones AdaBoost object detection framework. The new implementation is not only faster but also enables the use of multiple feature channels, such as color features, at training time. In our experiments we trained seven view-dependent face detectors and evaluated them on the Face Detection Data Set and Benchmark (FDDB). Our experiments show that the use of tailored training imagery outperforms state-of-the-art approaches on this challenging dataset.
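    The multi-channel feature idea can be pictured with a short sketch of Haar-like rectangle features evaluated on per-channel integral images. This is a generic Viola-Jones-style illustration under assumed rectangle coordinates and channel layout, not the paper's many-core implementation.

```python
# Sketch of multi-channel Haar-like feature evaluation via integral images, in
# the spirit of a Viola-Jones detector extended to color channels. Rectangle
# coordinates and channel layout are illustrative.
import numpy as np

def integral_images(img):
    """img: (H, W, C) array -> per-channel summed-area tables of shape (H+1, W+1, C)."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0), (0, 0)))

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of pixels in [y0, y1) x [x0, x1) for every channel, in O(1)."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def two_rect_feature(ii, y0, x0, y1, x1):
    """Horizontal two-rectangle Haar feature per channel: left half minus right half."""
    xm = (x0 + x1) // 2
    return rect_sum(ii, y0, x0, y1, xm) - rect_sum(ii, y0, xm, y1, x1)

# Toy usage on a random 24x24 RGB patch.
patch = np.random.default_rng(3).random((24, 24, 3))
ii = integral_images(patch)
print(two_rect_feature(ii, 4, 4, 20, 20))   # one feature value per channel
```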

    A Near Real-Time, Highly Scalable, Parallel and Distributed Adaptive Object Detection and Re-Training Framework Based on the Adaboost Algorithm

    Object detection, such as face detection using supervised learning, often requires extensive training for the computer, which results in high execution times. If the trained system needs re-training in order to accommodate a missed detection, waiting several hours or days before the system is ready may be unacceptable in practical implementations. This dissertation presents a generalized object detection framework whereby the system can efficiently adapt to misclassified data and be re-trained within a few minutes. Our methodology is based on the popular AdaBoost algorithm for object detection. AdaBoost functions by iteratively selecting the best among weak classifiers and then combining several weak classifiers in order to obtain a stronger classifier. Even though AdaBoost has proven to be very effective, its learning execution time can be high depending upon the application. For example, in face detection, learning can take several days. In this dissertation, we present two techniques that reduce the learning execution time of the AdaBoost algorithm. Our first technique utilizes a highly parallel and distributed AdaBoost algorithm that exploits the multiple cores in a CPU via lightweight threads. In addition, our technique uses multiple machines in a web service similar to a map-reduce architecture in order to achieve high scalability, which results in a training execution time of a few minutes rather than several days. Our second technique is a methodology to create an optimal training subset to further reduce the training execution time. We obtained this subset through a novel score-keeping of the weight distribution within the AdaBoost algorithm, and then removed the images that had a minimal effect on the overall trained classifier. Finally, we incorporated our parallel and distributed AdaBoost algorithm, along with the optimized training subset, into a generalized object detection framework that efficiently adapts and makes corrections when it encounters misclassified data. We demonstrate the usefulness of our adaptive framework with detailed testing on face and car detection, and explain how the framework applies to developing any other object detection task.
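    One plausible reading of the training-subset idea, accumulating each sample's AdaBoost weight across rounds as a score and dropping the lowest-scoring images, is sketched below. The scoring and pruning rule here are assumptions made for illustration, not the dissertation's exact procedure.

```python
# Hedged sketch of weight-based training-set pruning: accumulate each sample's
# AdaBoost weight across boosting rounds as a "score", then drop the samples
# whose scores stay smallest, on the assumption that they barely influence the
# final classifier. One plausible reading, not the dissertation's exact rule.
import numpy as np

def prune_training_set(weight_history, keep_fraction=0.7):
    """weight_history: (n_rounds, n_samples) array of per-round sample weights.
    Returns the indices of the samples to keep."""
    scores = weight_history.sum(axis=0)                  # cumulative influence
    n_keep = max(1, int(keep_fraction * scores.size))
    return np.argsort(scores)[::-1][:n_keep]             # highest-scoring samples

# Toy usage: 5 boosting rounds over 1000 samples.
rng = np.random.default_rng(4)
history = rng.random((5, 1000))
history /= history.sum(axis=1, keepdims=True)            # each round's weights sum to 1
keep_idx = prune_training_set(history, keep_fraction=0.7)
print(len(keep_idx), "of", history.shape[1], "samples retained")
```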

    A Study on Efficient Application Mapping onto Parallel Computing Accelerators

    Nagasaki University doctoral dissertation. Degree certificate number: Doctor of Engineering (Kō) No. 3. Date of degree conferral: March 20, 2014 (Heisei 26). Nagasaki University (長崎大学), course-based doctorate.