375 research outputs found
Parallel Genetic Algorithms with GPU Computing
Genetic algorithms (GAs) are powerful tools for solving optimization problems arising in manufacturing and logistics. They help find better solutions for complex and difficult cases that are hard to solve with strict optimization methods. Accelerating parallel GAs with GPU computing has received significant attention from both practitioners and researchers since the emergence of GPU-CPU heterogeneous architectures. Designing a parallel algorithm for a GPU is fundamentally different from designing one for a CPU. On a CPU architecture, data or tasks are typically distributed across tens of threads or processes, while on a GPU architecture hundreds of thousands of threads run. To fully utilize the computing power of GPUs, the design approaches and implementation strategies of parallel GAs should be re-examined. This chapter gives a concise overview of parallel GAs on GPUs from the perspective of GPU architecture. The concept of parallelism granularity is redefined, the effect of data layout on kernel performance is discussed, and the thread hierarchy is examined to show how threads are organized into grids and blocks to expose sufficient parallelism to the GPU. Some directions for future research are discussed. A hybrid parallel model, based on the features of GPU architecture, is suggested for building efficient parallel GAs for hyper-scale problems.
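To make the data-layout point concrete, the sketch below compares two ways of storing a GA population in one flat array. It is an illustration only, not code from the chapter: the names `POP_SIZE`, `NUM_GENES`, and the two offset functions are hypothetical, and it assumes one GPU thread per individual. On GPUs, adjacent threads that read adjacent memory locations can generally be served with fewer memory transactions, so the stride between neighbouring threads is the quantity of interest.

```python
# Hypothetical sketch: flat-array offsets for a GA population under two layouts.
POP_SIZE = 4    # number of individuals (assumed value)
NUM_GENES = 3   # genes per individual (assumed value)


def offset_individual_major(individual, gene):
    """All genes of one individual are contiguous (array-of-structures style)."""
    return individual * NUM_GENES + gene


def offset_gene_major(individual, gene):
    """Gene `gene` of all individuals is contiguous (structure-of-arrays style)."""
    return gene * POP_SIZE + individual


def stride_between_adjacent_threads(offset_fn, gene=0):
    """Distance, in elements, between the reads of thread t and thread t+1,
    assuming thread t handles individual t and both read the same gene."""
    return offset_fn(1, gene) - offset_fn(0, gene)


stride_individual_major = stride_between_adjacent_threads(offset_individual_major)
stride_gene_major = stride_between_adjacent_threads(offset_gene_major)

# A stride of 1 means neighbouring threads touch neighbouring elements.
print("individual-major stride:", stride_individual_major)
print("gene-major stride:", stride_gene_major)
```

Under these assumptions the gene-major layout gives a stride of 1, while the individual-major layout gives a stride equal to the chromosome length, which is why layout choice can affect kernel performance.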
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some application paradigms. Software languages and systems such as NVIDIA's
CUDA and the Khronos consortium's Open Computing Language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many core GPUs for data parallelism is necessary. We describe our use of hybrid
applications using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and offer some suggested areas for language development and integration
between coarse-grained and fine-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas: partial differential equations, graph cluster
metric calculations, and random number generation. We report on programming experiences
and selected performance for these algorithms on single and multiple GPUs, multi-core CPUs,
a CellBE, and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scientific applications developers.
Implementing genetic algorithms to CUDA environment using data parallelization
Computation methods for parallel problem solving using graphics processing units (GPUs) have attracted much research interest in recent years. Parallel computation can be applied to genetic algorithms (GAs) in the evaluation of individuals in a population. This paper describes another method of implementing GAs in the CUDA environment, where CUDA is a general-purpose computation environment for GPUs provided by NVIDIA. The main characteristic of this study is that parallel processing is applied not only to individuals but also to the genes within an individual. The proposed implementation is evaluated on eight test functions. We found that the proposed implementation method runs 7.6 to 18.4 times faster than a CPU implementation.
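The idea of parallelizing over genes as well as individuals can be sketched as follows. This is a sequential Python simulation, not the paper's CUDA code: each (individual, gene) pair is treated as one independent work item, and the per-gene terms are then combined with a pairwise reduction of the kind commonly used inside a GPU thread block. The sphere function is used here as an assumed example objective; the abstract does not name the eight test functions, and `sphere_term` and `evaluate_population` are hypothetical names.

```python
# Sequential simulation of a 2-D decomposition: one logical work item
# per (individual, gene) pair, followed by a per-individual reduction.

def sphere_term(x):
    """Per-gene contribution to the sphere function f(x) = sum(x_g ** 2)."""
    return x * x


def evaluate_population(population):
    num_genes = len(population[0])

    # Step 1: every (individual, gene) pair is independent, so on a GPU each
    # could be handled by its own thread.
    partial = [[sphere_term(population[i][g]) for g in range(num_genes)]
               for i in range(len(population))]

    # Step 2: pairwise (tree) reduction of the per-gene terms of each individual.
    fitness = []
    for terms in partial:
        terms = list(terms)
        while len(terms) > 1:
            if len(terms) % 2:
                terms.append(0.0)  # pad so the list splits into two equal halves
            half = len(terms) // 2
            terms = [terms[k] + terms[k + half] for k in range(half)]
        fitness.append(terms[0])
    return fitness


population = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, 1.0, 2.0]]
fitness = evaluate_population(population)
print(fitness)
```

With only individual-level parallelism, the inner sum over genes would run serially in each thread; splitting it across work items exposes more parallelism when chromosomes are long.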
Agent-based modelling and Swarm Intelligence in systems engineering
The aim of this doctoral thesis is to evaluate the usefulness of agent-based modelling, Swarm Intelligence optimization algorithms, and parallel programming on graphics cards in the field of systems engineering and automation.
A literature review was carried out and a development framework for the agent-based modelling technique was built. The technique was used to model an activated sludge reactor (part of the wastewater treatment process).
A complementary notation was developed for describing agent-based models from a systems engineering point of view.
An agent-based optimization algorithm following the Swarm Intelligence philosophy is also presented.
Parallelization techniques on graphics cards were used to reduce the simulation times of models and algorithms.
The thesis therefore integrates several technologies. Departamento de Ingeniería de Sistemas y Automátic
A Performance/Cost Model for a CUDA Drug Discovery Application on Physical and Public Cloud Infrastructures
Virtual Screening (VS) methods can considerably aid drug discovery research by predicting how ligands interact with drug targets. BINDSURF is an efficient and fast blind VS methodology for determining protein binding sites, depending on the ligand, that uses the massively parallel architecture of graphics processing units (GPUs) for fast, unbiased prescreening of large ligand databases. In this contribution, we provide a performance/cost model for the execution of this application on both local systems and public cloud infrastructures. With our model, it is possible to determine the best infrastructure to use, in terms of execution time and cost, for any given problem to be solved by BINDSURF. Conclusions obtained from our study can be extrapolated to other GPU-based VS methodologies. Ingeniería, Industria y Construcció
HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU
The end of Dennard scaling and the slowdown of Moore's law led to a shift in
technology trends toward parallel architectures, particularly in HPC systems.
To continue providing performance benefits, HPC should embrace Approximate
Computing (AC), which trades application quality loss for improved performance.
However, existing AC techniques have not been extensively applied and evaluated
in state-of-the-art hardware architectures such as GPUs, the primary execution
vehicle for HPC applications today.
This paper presents HPAC-Offload, a pragma-based programming model that
extends OpenMP offload applications to support AC techniques, allowing portable
approximations across different GPU architectures. We conduct a comprehensive
performance analysis of HPAC-Offload across GPU-accelerated HPC applications,
revealing that AC techniques can significantly accelerate HPC applications
(1.64x LULESH on AMD, 1.57x NVIDIA) with minimal quality loss (0.1%). Our
analysis offers deep insights into the performance of GPU-based AC that guide
the future development of AC algorithms and systems for these architectures.
Comment: 12 pages, 12 pages. Accepted at SC2
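The trade-off behind approximate computing can be illustrated with loop perforation, a well-known AC technique in which some loop iterations are skipped. The abstract does not say which techniques HPAC-Offload implements, so the sketch below is a generic illustration in plain Python rather than the paper's pragma-based model; `mean_exact`, `mean_perforated`, and the input data are hypothetical, and the resulting error has no relation to the figures reported above.

```python
# Loop perforation on a toy reduction: do a fraction of the work,
# accept a small loss of accuracy.

def mean_exact(values):
    """Reference result: visits every element."""
    return sum(values) / len(values)


def mean_perforated(values, skip=2):
    """Approximate result: visits only every `skip`-th element."""
    sampled = values[::skip]
    return sum(sampled) / len(sampled)


data = [float(i) for i in range(1000)]  # assumed toy input

exact = mean_exact(data)
approx = mean_perforated(data, skip=2)
relative_error = abs(approx - exact) / abs(exact)

print("exact:", exact)
print("approx:", approx)
print("relative error:", relative_error)
```

Here half of the iterations are skipped. How much quality loss is acceptable depends on the application, which is why such techniques need per-application evaluation.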
Natural ventilation design attributes application effect on, indoor natural ventilation performance of a double storey, single unit residential building
In establishing good indoor thermal conditions, air movement is one of the important parameters to consider in providing fresh indoor air for occupants. Owing to public awareness of environmental impact, people have become increasingly attentive to passive design as a way of achieving good indoor ventilation. Across the case studies, several building attributes were found to have a significant effect on indoor natural ventilation performance. The studies were categorized as vernacular houses, contemporary houses with vernacular elements, and contemporary houses. The indoor air movement of each space in the houses was compared with the outdoor air movement surrounding the houses to indicate each space's indoor natural ventilation performance. The analysis found that the wind catcher is the attribute that contributes most to indoor natural ventilation. Wide openings were also found to be significant, especially those with louvers. Indoor layout design was also found to have a significant impact on performance. The findings indicate that good indoor natural ventilation depends not only on having proper openings at proper locations in a building, but also on how the incoming air movement is managed throughout the interior spaces by a proper layout. Understanding the air pressure distribution caused by the indoor windward and leeward sides is important in directing the air flow to the desired spaces and producing good overall indoor natural ventilation performance.
Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms
Background: Simulation of sophisticated biological models requires considerable computational power. These models typically integrate numerous biological phenomena such as spatially-explicit heterogeneous cells, cell-cell interactions, cell-environment interactions and intracellular gene networks. The recent advent of programming for graphical processing units (GPU) opens up the possibility of developing more integrative, detailed and predictive biological models while at the same time decreasing the computational cost to simulate those models.
Results: We construct a 3D model of epidermal development and provide a set of GPU algorithms that executes significantly faster than sequential central processing unit (CPU) code. We provide a parallel implementation of the subcellular element method for individual cells residing in a lattice-free spatial environment. Each cell in our epidermal model includes an internal gene network, which integrates cellular interaction of Notch signaling together with environmental interaction of basement membrane adhesion, to specify cellular state and behaviors such as growth and division. We take a pedagogical approach to describing how modeling methods are efficiently implemented on the GPU including memory layout of data structures and functional decomposition. We discuss various programmatic issues and provide a set of design guidelines for GPU programming that are instructive to avoid common pitfalls as well as to extract performance from the GPU architecture.
Conclusions: We demonstrate that GPU algorithms represent a significant technological advance for the simulation of complex biological models. We further demonstrate with our epidermal model that the integration of multiple complex modeling methods for heterogeneous multicellular biological processes is both feasible and computationally tractable using this new technology. We hope that the provided algorithms and source code will be a starting point for modelers to develop their own GPU implementations, and encourage others to implement their modeling methods on the GPU and to make that code available to the wider community.