205 research outputs found

    Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

    Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train traditional as well as generative adversarial networks built on LBANN, a scalable deep learning framework optimized for HPC systems. LBANN combines multiple levels of parallelism and exploits some of the world's largest supercomputers. We demonstrate our framework by creating a complex predictive model based on multi-variate data from high-energy-density physics, containing hundreds of millions of images and hundreds of millions of scalar values derived from tens of millions of simulations of inertial confinement fusion. Our approach combines an HPC workflow and extends LBANN with optimized data ingestion and the new tournament-style training algorithm to produce a scalable neural network architecture using a CORAL-class supercomputer. Experimental results show that 64 trainers (1024 GPUs) achieve a speedup of 70.2× over a single-trainer (16 GPUs) baseline, an effective parallel efficiency of 109%.
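    The abstract does not spell out the tournament algorithm itself. As a rough illustration of the idea — independently trained model replicas periodically compete, with winners overwriting losers — here is a minimal Python sketch; every name in it is hypothetical, not LBANN's actual interface:

```python
import copy
import random

def tournament_train(trainers, steps, rounds, evaluate):
    """Train several model replicas; after each round, pair trainers at
    random and copy the winner's weights over the loser's.

    trainers: list of objects with .train(steps) and a .model attribute
    evaluate: callback returning a validation score (higher is better)
    A hypothetical sketch of tournament-style training, not LBANN's API.
    """
    for _ in range(rounds):
        for t in trainers:
            t.train(steps)                    # independent training phase
        random.shuffle(trainers)
        for a, b in zip(trainers[::2], trainers[1::2]):
            # tournament: the replica with the better score wins the pair
            if evaluate(a.model) >= evaluate(b.model):
                b.model = copy.deepcopy(a.model)
            else:
                a.model = copy.deepcopy(b.model)
    return max(trainers, key=lambda t: evaluate(t.model)).model
```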

    Robust and Parallel Segmentation Model (RPSM) for Early Detection of Skin Cancer Disease Using Heterogeneous Distributions

    Melanoma is the most dangerous common type of skin cancer; however, it is treatable if diagnosed early. Diagnosis of melanoma would be improved if an accurate skin image segmentation model were available. Many computer vision methods have been investigated, yet the problem of finding a consistent and robust model that extracts the best threshold values persists. This paper suggests a novel image segmentation approach using a multilevel cross-entropy thresholding algorithm based on heterogeneous distributions. The proposed strategy searches the problem space by segmenting the image into several levels and applying to each level one of three benchmark distributions (Gaussian, Lognormal, or Gamma), which are combined to estimate the thresholds that optimally extract the segmented regions. The classical Minimum Cross Entropy Thresholding (MCET) technique serves as the objective function of the applied method. Furthermore, a parallel processing algorithm is suggested to minimize the computational time of the proposed segmentation model and thereby boost its performance. The efficiency of the proposed RPSM model is evaluated on two skin cancer image datasets: the International Skin Imaging Collaboration (ISIC) and PH2. In conclusion, the proposed RPSM model shows significantly reduced processing time and delivers better accuracy and more stable results than other segmentation models.
    Design/methodology – The proposed model estimates two optimum threshold values that optimally extract three segmented regions by combining the three benchmark statistical distributions: Gamma, Gaussian, and Lognormal.
    Outcomes – Based on the experimental results, the suggested segmentation methodology using MCET can be nominated as a robust, precise, and highly reliable model with high efficiency.
    Novelty/utility – A novel multilevel segmentation model is developed using the MCET technique and a combination of three statistical distributions: Gamma, Gaussian, and Lognormal. Moreover, the model is accelerated by a parallelized method that reduces the processing time of the segmentation. The suggested model should therefore be considered a valuable mechanism for skin cancer detection.
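    For concreteness, here is a minimal sketch of the two-threshold MCET objective at the heart of the model, in a plain region-mean formulation; the paper's RPSM additionally fits Gaussian, Lognormal, and Gamma models per region and parallelizes the search, both of which are omitted here:

```python
import numpy as np

def mcet_two_thresholds(hist):
    """Exhaustive two-threshold Minimum Cross Entropy Thresholding (MCET).

    hist: 256-bin grayscale histogram, e.g. from
    np.histogram(img, bins=256, range=(0, 256))[0].
    Returns the pair (t1, t2) minimizing the cross entropy between the
    image and its three-region segmentation. A minimal serial sketch,
    not the paper's heterogeneous-distribution variant.
    """
    hist = np.asarray(hist, dtype=np.float64)
    levels = np.arange(1, 257, dtype=np.float64)   # shift by 1 to avoid log(0)
    w = hist * levels                              # i * h(i)
    best, best_t = np.inf, (0, 0)
    for t1 in range(1, 255):
        for t2 in range(t1 + 1, 256):
            eta = 0.0
            for lo, hi in ((0, t1), (t1, t2), (t2, 256)):
                seg = w[lo:hi]
                if seg.sum() == 0:                 # empty region: skip
                    continue
                mu = seg.sum() / hist[lo:hi].sum() # mean intensity of region
                eta += np.sum(seg * np.log(levels[lo:hi] / mu))
            if eta < best:
                best, best_t = eta, (t1, t2)
    return best_t
```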

    Investigating the latency cost of statistical learning of a Gaussian mixture simulating on a convolutional density network with adaptive batch size technique for background modeling

    Background modeling is a promising field of study in video analysis, with a wide range of applications in video surveillance. Deep neural networks have proliferated in recent years as a result of effective learning-based approaches to motion analysis. However, these strategies provide only a partial description of the observed scenes, because they use a single-valued mapping to estimate the temporal conditional average of the target background. On the other hand, statistical learning in the imagery domain has become one of the most widely used approaches owing to its high adaptability to dynamic context transformation, Gaussian Mixture Models in particular. These probabilistic models adjust latent parameters to maximize the expectation of the realistically observed data, but they capture contextual dynamics only in short-term analysis; over a prolonged investigation, statistical methods cannot preserve the generalization of long-term variation in the image data. Balancing the trade-off between traditional machine learning models and deep neural networks requires an integrated approach that ensures accuracy while maintaining a high speed of execution. In this research, we present a novel two-stage approach that detects changes using two convolutional neural networks. The first architecture is based on unsupervised statistical learning of a Gaussian mixture and is used to classify the salient features of scenes. The second implements a lightweight foreground detection pipeline. Our two-stage system has a total of approximately 3.5K parameters yet converges quickly to complex motion patterns. Our experiments on publicly accessible datasets demonstrate that the proposed networks not only generalize regions of moving objects with promising results in unseen scenarios, but are also competitive in the quality and effectiveness of foreground segmentation. Apart from modeling the data's underlying generator as a non-convex optimization problem, we briefly examine the communication cost of network training by using a distributed data-parallel scheme to simulate a stochastic gradient descent algorithm with communication avoidance for parallel machine learning.
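    The statistical baseline this work builds on is the classic per-pixel Gaussian mixture background model. As a minimal runnable illustration, the sketch below uses OpenCV's stock MOG2 implementation; the paper's own convolutional density network is not reproduced, and the video path is hypothetical:

```python
import cv2

# Classical per-pixel Gaussian Mixture background subtraction (MOG2),
# shown as a baseline illustration only -- not the paper's network.
cap = cv2.VideoCapture("surveillance.mp4")   # hypothetical input video
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog2.apply(frame)              # 255 = foreground, 0 = background
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(1) == 27:                 # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```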

    Bio-Inspired Optimization of Ultra-Wideband Patch Antennas Using Graphics Processing Unit Acceleration

    Ultra-wideband (UWB) wireless systems have recently gained considerable attention as effective communications platforms with the properties of low power and high data rates. Applications of UWB such as wireless USB put size constraints on the antenna, however, which can be very difficult to meet using typical narrowband antenna designs. The aim of this thesis is to show how bio-inspired evolutionary optimization algorithms, in particular the genetic algorithm (GA), particle swarm optimization (PSO), and biogeography-based optimization (BBO), can produce novel UWB planar patch antenna designs that meet a size constraint of a 10 mm × 10 mm patch. Each potential antenna design is evaluated with the finite-difference time-domain (FDTD) technique, which is accurate but time-consuming. Another aspect of this thesis is the modification of FDTD to run on a graphics processing unit (GPU), which yields nearly a 20× speedup. With the combination of GA, PSO, BBO, and GPU-accelerated FDTD, three novel antenna designs are produced that meet the size and bandwidth requirements applicable to UWB wireless USB systems.
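    As an illustration of one of the three optimizers, here is a generic textbook particle swarm optimization loop; the fitness callback is assumed to wrap an FDTD evaluation of a candidate patch geometry (e.g. a return-loss cost), and none of the parameter values below are taken from the thesis:

```python
import numpy as np

def pso(fitness, bounds, n_particles=24, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Plain particle swarm optimization over box-constrained parameters.

    fitness: callable mapping a parameter vector (e.g. patch-geometry
    variables fed to an FDTD solver) to a scalar cost to minimize.
    A generic sketch, not the thesis's exact configuration.
    """
    lo, hi = bounds
    dim = lo.size
    x = lo + np.random.rand(n_particles, dim) * (hi - lo)   # positions
    v = np.zeros_like(x)                                    # velocities
    pbest = x.copy()                                        # personal bests
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                    # global best
    for _ in range(iters):
        r1, r2 = np.random.rand(2, n_particles, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                          # stay in bounds
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()
```

    A hypothetical call would be `g, cost = pso(fdtd_cost, (np.zeros(8), np.ones(8)))`, where `fdtd_cost` maps eight normalized geometry variables to a bandwidth penalty computed by the (much slower) FDTD simulation.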

    Parallel error-correcting output codes classification in volume visualization

    In volume visualization, defining the regions of interest is inherently an iterative trial-and-error process of finding the best parameters to classify and render the final image. Generally, the user requires a lot of expertise to analyze and edit these parameters through multi-dimensional transfer functions. In this thesis, we present a framework of methods to label multiple regions of interest on demand. The selected methods combine 1-vs-1 Adaboost binary classifiers with an ECOC framework that merges the binary results into a multi-class result. In the first step, Adaboost is used to train a set of 1-vs-1 binary classifiers on a labeled subset of points of the target volume. In the second step, an ECOC framework combines the Adaboost classifiers and classifies the rest of the volume, assigning to each point one label among several possible labels. The labels have to be introduced by an expert on the target volume, and they only need to cover a small subset of the points of the volume we want to classify; in this way, we require only a small effort from the expert. But this calls for an interactive process in which classification results are obtained in real or near-real time. That is why, in this master's thesis, we implemented the classification step in OpenCL, to exploit the parallelism of modern GPUs. We provide experimental results for both classification accuracy and execution time speedup, comparing the GPU against single- and multi-core CPUs. Along with this work we present some tooling derived from the use of OpenCL for the experiments, which we shared as open source through Google Code, together with an abstraction of the parallelization process for any algorithm. Finally, we comment on future work and present conclusions in the final sections of this document.
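    The decoding half of the pipeline — mapping the outputs of the 1-vs-1 Adaboost classifiers back to a class label through the ECOC coding matrix — can be sketched as follows. This is a minimal serial version with Hamming decoding, whereas the thesis runs the step in OpenCL on the GPU:

```python
import numpy as np

def ecoc_decode(preds, codebook):
    """Hamming decoding for an ECOC ensemble.

    preds:    (n_samples, n_classifiers) matrix of binary outputs in {-1, +1},
              one column per trained 1-vs-1 Adaboost classifier.
    codebook: (n_classes, n_classifiers) coding matrix in {-1, 0, +1},
              with 0 marking classes a given classifier ignores.
    Returns the index of the class whose codeword is closest to each row.
    A minimal serial sketch of the decoding step only.
    """
    dist = np.zeros((preds.shape[0], codebook.shape[0]))
    for c, codeword in enumerate(codebook):
        mask = codeword != 0                 # zero entries do not count
        dist[:, c] = np.sum(preds[:, mask] != codeword[mask], axis=1)
    return np.argmin(dist, axis=1)
```

    For three classes, for example, the 1-vs-1 scheme yields three classifiers and a codebook such as [[+1, +1, 0], [-1, 0, +1], [0, -1, -1]], one row per class and one column per pairwise classifier.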