290 research outputs found

    Parallel Genetic Algorithms with GPU Computing

    Get PDF
    Genetic algorithms (GAs) are powerful solutions to optimization problems arising from manufacturing and logistic fields. It helps to find better solutions for complex and difficult cases, which are hard to be solved by using strict optimization methods. Accelerating parallel GAs with GPU computing have received significant attention from both practitioners and researchers, ever since the emergence of GPU-CPU heterogeneous architectures. Designing a parallel algorithm on GPU is different fundamentally from designing one on CPU. On CPU architecture, typically data or tasks are distributed across tens of threads or processes, while on GPU architecture, more than hundreds of thousands of threads run. In order to fully utilize the computing power of GPUs, the design approaches and implementation strategies of parallel GAs should be re-probed. In the chapter, a concise overview of parallel GAs on GPU is given from the perspective of GPU architecture. The concept of parallelism granularity is redefined, the aspect of data layout is discussed on how it will affect the kernel performance, and the hierarchy of threads is examined on how threads are organized in the grid and blocks to expose sufficient parallelism to GPU. Some future research is discussed. A hybrid parallel model, based on the feature of GPU architecture, is suggested to build up efficient parallel GAs for hyper-scale problems

    Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations

    Full text link
    We propose Paraiso, a domain specific language embedded in functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE solving algorithms succinctly using tensor equations notation. Hydrodynamic properties, interpolation methods and other building blocks are described in abstract, modular, re-usable and combinable forms, which lets us generate versatile solvers from little set of Paraiso source codes. We demonstrate Paraiso by implementing a compressive hydrodynamics solver. A single source code less than 500 lines can be used to generate solvers of arbitrary dimensions, for both multicore CPUs and GPUs. We demonstrate both manual annotation based tuning and evolutionary computing based automated tuning of the program.Comment: 52 pages, 14 figures, accepted for publications in Computational Science and Discover

    Autotuning for Automatic Parallelization on Heterogeneous Systems

    Get PDF

    Distributed evolutionary algorithms and their models: A survey of the state-of-the-art

    Get PDF
    The increasing complexity of real-world optimization problems raises new challenges to evolutionary computation. Responding to these challenges, distributed evolutionary computation has received considerable attention over the past decade. This article provides a comprehensive survey of the state-of-the-art distributed evolutionary algorithms and models, which have been classified into two groups according to their task division mechanism. Population-distributed models are presented with master-slave, island, cellular, hierarchical, and pool architectures, which parallelize an evolution task at population, individual, or operation levels. Dimension-distributed models include coevolution and multi-agent models, which focus on dimension reduction. Insights into the models, such as synchronization, homogeneity, communication, topology, speedup, advantages and disadvantages are also presented and discussed. The study of these models helps guide future development of different and/or improved algorithms. Also highlighted are recent hotspots in this area, including the cloud and MapReduce-based implementations, GPU and CUDA-based implementations, distributed evolutionary multiobjective optimization, and real-world applications. Further, a number of future research directions have been discussed, with a conclusion that the development of distributed evolutionary computation will continue to flourish

    GPU-accelerated parallel gene-pool optimal mixing in a gray-box optimization setting

    Get PDF
    In a Gray-Box Optimization (GBO) setting that allows for partial evaluations, the fitness of an individual can be updated efficiently after a subset of its variables has been modified. This enables more efficient evolutionary optimization with the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) due to its key strength: Gene-pool Optimal Mixing (GOM). For each solution, GOM performs variation for many (small) sets of variables. To improve efficiency even further, parallel computing can be leveraged. For EAs, typically, this comprises population-wise parallelization. However, unless population sizes are large, this offers limited gains. For large GBO problems, parallelizing GOM-based variation holds greater speed-up potential, regardless of population size. However, this potential cannot be directly exploited because of dependencies between variables. We show how graph coloring can be used to group sets of variables that can undergo variation in parallel without violating dependencies. We test the performance of a CUDA implementation of parallel GOM on a Graphics Processing Unit (GPU) for the Max-Cut problem, a well-known problem for which the dependency structure can be controlled. We find that, for sufficiently large graphs with limited connectivity, finding high-quality solutions can be achieved up to 100 times faster, showcasing the great potential of our approach

    Solving the DNA fragment assembly problem with a parallel discrete firefly algorithm implemented on GPU

    Get PDF
    The Deoxyribonucleic Acid Fragment Assembly Problem (DNA-FAP) consists in reconstructing a DNA chain from a set of fragments taken randomly. This problem represents an important step in the genome project. Several authors are proposed different approaches to solve the DNA-FAP. In particular, nature-inspired algorithms have been used for its resolution. Even they were obtaining good results; its computational time associated is high. The bio-inspired algorithms are iterative search processes that can explore and exploit efficiently the solution space. Firefly Algorithm is one of the recent evolutionary computing models which is inspired by the flashing light behaviour of fireflies. Recently, the Graphics Processing Units (GPUs) technology are emerge as a novel environment for a parallel implementation and execution of bio-inspired algorithms. Therefore, the use of GPU-based parallel computing it is possible as a complementary tool to speed-up the search. In this work, we design and implement a Discrete Firefly Algorithm (DFA) on a GPU architecture in order to speed-up the search process for solving the DNA Fragment Assembly Problem. Through several experiments, the efficiency of the algorithm and the quality of the results are demonstrated with the potential to applied for longer sequences or sequences of unknown length as well.Fil: Vidal, Pablo Javier. Universidad Nacional de la Patagonia Austral. Unidad Académica Caleta Olivia. Departamento de Ciencias Exactas y Naturales; Argentina. Universidad Nacional de la Patagonia Austral. Centro de Investigaciones y Transferencia Golfo San Jorge. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro de Investigaciones y Transferencia Golfo San Jorge. Universidad Nacional de la Patagonia "San Juan Bosco". Centro de Investigaciones y Transferencia Golfo San Jorge; ArgentinaFil: Olivera, Ana Carolina. Universidad Nacional de la Patagonia Austral. Centro de Investigaciones y Transferencia Golfo San Jorge. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro de Investigaciones y Transferencia Golfo San Jorge. Universidad Nacional de la Patagonia "San Juan Bosco". Centro de Investigaciones y Transferencia Golfo San Jorge; Argentina. Universidad Nacional de la Patagonia Austral. Unidad Académica Caleta Olivia. Departamento de Ciencias Exactas y Naturales; Argentin

    Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

    Get PDF
    Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) have proposed promising solutions of extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator. Traditional CPUs can handle those workloads not well suited for accelerators. Combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPU, Xeon Phi massively parallel processor, and NVIDIA GPU; common tuning methods and platform-specific tuning techniques are presented. Then, analysis is done to demonstrate the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-art bioinformatics algorithms, and delivered a fast molecular docking program. The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work presents an extraction of features from the input data set and the target systems, and then uses various regression models to calculate the perspective computation time. This helps understand why a certain processor is faster for certain sets of tasks. It also provides the essential information for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which, the pros and cons of using different heterogeneous processors can complement each other. Thus a higher performance can be achieve on heterogeneous computing systems. A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping feature. It demonstrates that a power-aware scheduler significantly improves the power efficiencies and saves the energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages on overall power efficiency

    Efficient Knowledge Extraction from Structured Data

    Get PDF
    Knowledge extraction from structured data aims for identifying valid, novel, potentially useful, and ultimately understandable patterns in the data. The core step of this process is the application of a data mining algorithm in order to produce an enumeration of particular patterns and relationships in large databases. Clustering is one of the major data mining tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized, and the similarity of objects from different clusters is minimized. In this thesis, we advance the state-of-the-art data mining algorithms for analyzing structured data types. We describe the development of innovative solutions for hierarchical data mining. The EM-based hierarchical clustering method ITCH (Information-Theoretic Cluster Hierarchies) is designed to propose solid solutions for four different challenges. (1) to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters. (2) to represent each cluster content in the hierarchy by an intuitive description with e.g. a probability density function. (3) to consistently handle outliers. (4) to avoid difficult parameter settings. ITCH is built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL). Interpreting the hierarchical cluster structure as a statistical model of the dataset, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four above mentioned goals. The genetic-based hierarchical clustering algorithm GACH (Genetic Algorithm for finding Cluster Hierarchies) overcomes the problem of getting stuck in a local optimum by a beneficial combination of genetic algorithms, information theory and model-based clustering. Besides hierarchical data mining, we also made contributions to more complex data structures, namely objects that consist of mixed type attributes and skyline objects. The algorithm INTEGRATE performs integrative mining of heterogeneous data, which is one of the major challenges in the next decade, by a unified view on numerical and categorical information in clustering. Once more, supported by the MDL principle, INTEGRATE guarantees the usability on real world data. For skyline objects we developed SkyDist, a similarity measure for comparing different skyline objects, which is therefore a first step towards performing data mining on this kind of data structure. Applied in a recommender system, for example SkyDist can be used for pointing the user to alternative car types, exhibiting a similar price/mileage behavior like in his original query. For mining graph-structured data, we developed different approaches that have the ability to detect patterns in static as well as in dynamic networks. We confirmed the practical feasibility of our novel approaches on large real-world case studies ranging from medical brain data to biological yeast networks. In the second part of this thesis, we focused on boosting the knowledge extraction process. We achieved this objective by an intelligent adoption of Graphics Processing Units (GPUs). The GPUs have evolved from simple devices for the display signal preparation into powerful coprocessors that do not only support typical computer graphics tasks but can also be used for general numeric and symbolic computations. As major advantage, GPUs provide extreme parallelism combined with a high bandwidth in memory transfer at low cost. In this thesis, we propose algorithms for computationally expensive data mining tasks like similarity search and different clustering paradigms which are designed for the highly parallel environment of a GPU, called CUDA-DClust and CUDA-k-means. We define a multi-dimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU. We demonstrate the superiority of our algorithms running on GPU over their conventional counterparts on CPU in terms of efficiency

    Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures

    Get PDF
    The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció
    • …
    corecore