1,003 research outputs found

    Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures

    Get PDF
    The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció

    Improvement of Fuzzy Geographically Weighted Clustering-Ant Colony Optimization Performance using Context-Based Clustering and CUDA Parallel Programming

    Get PDF
    Geo-demographic analysis (GDA) is the study of population characteristics by geographical area. Fuzzy Geographically Weighted Clustering (FGWC) is an effective algorithm used in GDA. Improvement of FGWC has been done by integrating a metaheuristic algorithm, Ant Colony Optimization (ACO), as a global optimization tool to increase the clustering accuracy in the initial stage of the FGWC algorithm. However, using ACO in FGWC increases the time to run the algorithm compared to the standard FGWC algorithm. In this paper, context-based clustering and CUDA parallel programming are proposed to improve the performance of the improved algorithm (FGWC-ACO). Context-based clustering is a method that focuses on the grouping of data based on certain conditions, while CUDA parallel programming is a method that uses the graphical processing unit (GPU) as a parallel processing tool. The Indonesian Population Census 2010 was used as the experimental dataset. It was shown that the proposed methods were able to improve the performance of FGWC-ACO without reducing the clustering quality of the original method. The clustering quality was evaluated using the clustering validity index

    What does fault tolerant Deep Learning need from MPI?

    Full text link
    Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive - even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults - requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for de- signing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by ex- tending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet, and GoogLeNet neural network topologies demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM

    Parallel evaluation of Pittsburgh rule-based classifiers on GPUs

    Get PDF
    Individuals from Pittsburgh rule-based classifiers represent a complete solution to the classification problem and each individual is a variable-length set of rules. Therefore, these systems usually demand a high level of computational resources and run-time, which increases as the complexity and the size of the data sets. It is known that this computational cost is mainly due to the recurring evaluation process of the rules and the individuals as rule sets. In this paper we propose a parallel evaluation model of rules and rule sets on GPUs based on the NVIDIA CUDA programming model which significantly allows reducing the run-time and speeding up the algorithm. The results obtained from the experimental study support the great efficiency and high performance of the GPU model, which is scalable to multiple GPU devices. The GPU model achieves a rule interpreter performance of up to 64 billion operations per second and the evaluation of the individuals is speeded up of up to 3.461× when compared to the CPU model. This provides a significant advantage of the GPU model, especially addressing large and complex problems within reasonable time, where the CPU run-time is not acceptabl

    Parallel Ant Colony Optimization: Algorithmic Models and Hardware Implementations

    Get PDF

    THROUGHPUT OPTIMIZATION AND RESOURCE ALLOCATION ON GPUS UNDER MULTI-APPLICATION EXECUTION

    Get PDF
    Platform heterogeneity prevails as a solution to the throughput and computational chal- lenges imposed by parallel applications and technology scaling. Specifically, Graphics Processing Units (GPUs) are based on the Single Instruction Multiple Thread (SIMT) paradigm and they can offer tremendous speed-up for parallel applications. However, GPUs were designed to execute a single application at a time. In case of simultaneous multi-application execution, due to the GPUs’ massive multi-threading paradigm, ap- plications compete against each other using destructively the shared resources (caches and memory controllers) resulting in significant throughput degradation. In this thesis, a methodology for minimizing interference in shared resources and provide efficient con- current execution of multiple applications on GPUs is presented. Particularly, the pro- posed methodology (i) performs application classification; (ii) analyzes the per-class in- terference; (iii) finds the best matching between classes; and (iv) employs an efficient re- source allocation. Experimental results showed that the proposed approach increases the throughput of the system for two concurrent applications by an average of 36% compared to other optimization techniques, while for three concurrent applications the proposed approach achieved an average gain of 23%
    • …
    corecore