
    Accelerating Scientific Computing Models Using GPU Processing

    GPGPUs offer significant computational power for programmers to leverage, and this power is especially useful for accelerating scientific models. This thesis analyzes the use of GPGPU programming to accelerate scientific computing models. First, the construction of hardware for the visualization and computation of scientific models is discussed, with attention to the factors that affect performance in scientific modeling. Image processing is an embarrassingly parallel problem well suited to GPGPU acceleration; an image processing library was developed to illustrate the process of recognizing embarrassingly parallel problems, and it serves as an example of converting a serial CPU implementation to a GPU-accelerated one. Genetic algorithms are biologically inspired heuristic search algorithms based on natural selection. A Tetris genetic algorithm with A* pathfinding illustrates the memory-bound limitations that can prevent direct algorithm conversion from the CPU to the GPU. An analysis of an existing landscape evolution model, CHILD, shows that even when a model appears promising for GPU acceleration, the underlying data structures can significantly affect the ability to move it to a GPU implementation; CHILD also offers an example of creating tighter MATLAB integration for existing models. Lastly, a parallel spatial sorting algorithm is discussed as a possible replacement for the spatial sorting algorithms currently implemented in models such as smoothed particle hydrodynamics.
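
    The thesis's image processing library is not reproduced in the abstract. As a minimal sketch of the serial-to-GPU conversion it describes, the CUDA C++ below moves an illustrative per-pixel brightness adjustment (assuming a non-negative delta; not the library's actual API) from a CPU loop to a kernel: the loop body becomes the kernel body, with one thread per pixel.

        #include <cuda_runtime.h>

        // Serial CPU reference: one loop iteration per pixel.
        void brighten_cpu(unsigned char* img, int n, int delta) {
            for (int i = 0; i < n; ++i) {
                int v = img[i] + delta;
                img[i] = v > 255 ? 255 : (unsigned char)v;
            }
        }

        // GPU version: the loop body becomes the kernel; one thread per pixel.
        __global__ void brighten_gpu(unsigned char* img, int n, int delta) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                int v = img[i] + delta;
                img[i] = v > 255 ? 255 : (unsigned char)v;
            }
        }

        void brighten(unsigned char* host, int n, int delta) {
            unsigned char* dev = nullptr;
            cudaMalloc(&dev, n);
            cudaMemcpy(dev, host, n, cudaMemcpyHostToDevice);
            brighten_gpu<<<(n + 255) / 256, 256>>>(dev, n, delta);
            cudaMemcpy(host, dev, n, cudaMemcpyDeviceToHost);
            cudaFree(dev);
        }

    Because every pixel is independent, no synchronization between threads is needed, which is what makes the problem embarrassingly parallel.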

    Efficient Implementation of Parallel Path Planning Algorithms on GPUs

    In robot systems, several computationally intensive tasks can be found, path planning being one of them. Especially in dynamically changing environments, it is difficult to meet real-time constraints with a serial processing approach. For those systems using standard computers, a promising option is to employ a GPGPU as a coprocessor in order to offload those tasks which can be efficiently parallelized. We implemented selected parallel path planning algorithms on NVIDIA's CUDA platform and were able to accelerate all of these algorithms efficiently compared to a multi-core implementation. We present the results and more detailed information about the implementation of these algorithms.
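
    The abstract does not name the selected algorithms, so the CUDA sketch below shows only a common building block of GPU path planning: a parallel edge-relaxation sweep (Bellman-Ford style) repeated from the host until no distance improves. All names are illustrative; integer costs are assumed so that atomicMin can be used directly.

        #include <climits>
        #include <cuda_runtime.h>

        // One relaxation sweep: one thread per edge. Assumes edge costs are
        // small enough that du + w[e] does not overflow.
        __global__ void relax_edges(const int* src, const int* dst,
                                    const unsigned int* w, unsigned int* dist,
                                    int numEdges, int* changed) {
            int e = blockIdx.x * blockDim.x + threadIdx.x;
            if (e >= numEdges) return;
            unsigned int du = dist[src[e]];
            if (du == UINT_MAX) return;              // source end not reached yet
            unsigned int cand = du + w[e];
            if (cand < atomicMin(&dist[dst[e]], cand))
                *changed = 1;                        // another sweep is needed
        }

        // Host driver: all pointers are device memory; dist is pre-seeded with
        // 0 at the start node and UINT_MAX elsewhere.
        void shortest_paths(const int* dSrc, const int* dDst,
                            const unsigned int* dW, unsigned int* dDist,
                            int numEdges, int* dChanged) {
            int hChanged = 1;
            while (hChanged) {
                hChanged = 0;
                cudaMemcpy(dChanged, &hChanged, sizeof(int), cudaMemcpyHostToDevice);
                relax_edges<<<(numEdges + 255) / 256, 256>>>(dSrc, dDst, dW,
                                                             dDist, numEdges,
                                                             dChanged);
                cudaMemcpy(&hChanged, dChanged, sizeof(int), cudaMemcpyDeviceToHost);
            }
        }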

    Evaluation of Directive-Based GPU Programming Models on a Block Eigensolver with Consideration of Large Sparse Matrices

    Achieving high performance and performance portability for large-scale scientific applications is a major challenge on heterogeneous computing systems such as many-core CPUs and accelerators like GPUs. In this work, we implement a widely used block eigensolver, Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG), using two popular directive-based programming models (OpenMP and OpenACC) for GPU-accelerated systems. Our work differs from existing work in that it adopts a holistic approach that optimizes the full solver performance rather than narrowing the problem into small kernels (e.g., SpMM, SpMV). Our LOBPCG GPU implementation achieves a 2.8×–4.3× speedup over an optimized CPU implementation when tested with four different input matrices. The evaluated configuration compared one Skylake CPU to one Skylake CPU and one NVIDIA V100 GPU. Our OpenMP and OpenACC LOBPCG GPU implementations gave nearly identical performance. We also consider how to create an efficient LOBPCG solver that can solve problems larger than GPU memory capacity. To this end, we create microbenchmarks representing the two dominant kernels (inner product and SpMM) in LOBPCG and then evaluate performance when using two different programming approaches: tiling the kernels, and using Unified Memory with the original kernels. Our tiled SpMM implementation achieves a 2.9× and 48.2× speedup over the Unified Memory implementation on supercomputers with PCIe Gen3 and NVLink 2.0 CPU-to-GPU interconnects, respectively.
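
    The paper's kernels are not reproduced here. The CUDA sketch below shows the general idea behind tiling an SpMM for matrices larger than GPU memory: only one block of CSR rows resides on the device at a time, while the dense block vectors B stay resident. The one-thread-per-row kernel and all names are illustrative simplifications, not the paper's implementation.

        #include <vector>
        #include <cuda_runtime.h>

        // C[r][j] = sum_p val[p] * B[colInd[p]][j] for the rows of one tile.
        // One thread per tile row; B and C are row-major with k columns.
        __global__ void spmm_csr_tile(const int* rowPtr, const int* colInd,
                                      const double* val, const double* B,
                                      double* C, int tileRows, int k) {
            int r = blockIdx.x * blockDim.x + threadIdx.x;
            if (r >= tileRows) return;
            for (int j = 0; j < k; ++j) {
                double acc = 0.0;
                for (int p = rowPtr[r]; p < rowPtr[r + 1]; ++p)
                    acc += val[p] * B[colInd[p] * k + j];
                C[r * k + j] = acc;
            }
        }

        // Host driver: stream one tile of the sparse matrix at a time. dB
        // (n x k, dense) stays on the device; dRowPtr/dColInd/dVal/dC are
        // device scratch buffers sized for the largest tile.
        void spmm_tiled(const int* rowPtr, const int* colInd, const double* val,
                        const double* dB, double* C, int n, int k, int tile,
                        int* dRowPtr, int* dColInd, double* dVal, double* dC) {
            std::vector<int> rebased(tile + 1);
            for (int r0 = 0; r0 < n; r0 += tile) {
                int rows = (r0 + tile <= n) ? tile : n - r0;
                int base = rowPtr[r0], nnz = rowPtr[r0 + rows] - base;
                for (int i = 0; i <= rows; ++i) rebased[i] = rowPtr[r0 + i] - base;
                cudaMemcpy(dRowPtr, rebased.data(), (rows + 1) * sizeof(int),
                           cudaMemcpyHostToDevice);
                cudaMemcpy(dColInd, colInd + base, nnz * sizeof(int),
                           cudaMemcpyHostToDevice);
                cudaMemcpy(dVal, val + base, nnz * sizeof(double),
                           cudaMemcpyHostToDevice);
                spmm_csr_tile<<<(rows + 255) / 256, 256>>>(dRowPtr, dColInd,
                                                           dVal, dB, dC, rows, k);
                cudaMemcpy(C + (size_t)r0 * k, dC,
                           (size_t)rows * k * sizeof(double),
                           cudaMemcpyDeviceToHost);
            }
        }

    Explicit tiling like this keeps every transfer a large contiguous copy, which is why it can outperform Unified Memory page migration on slow CPU-to-GPU interconnects.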

    Dynamic Search on the GPU

    Path finding is a fundamental, yet computationally expensive problem in robotics navigation. Oftentimes, it is necessary to sacrifice optimality to find a feasible plan within a time constraint, due to the search complexity. Dynamic environments may further invalidate computed plans, requiring an efficient planning strategy that can repair existing solutions. This paper presents a massively parallelized wavefront-based approach to path planning, running on the GPU, that can efficiently repair plans to accommodate world changes and agent movement, without having to restart the wavefront propagation process. In addition, we introduce a termination condition which ensures the minimum number of GPU iterations while maintaining strict optimality constraints on search graphs with non-uniform costs.
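
    The paper's repair scheme and exact termination condition are not given in the abstract. The CUDA sketch below shows a generic double-buffered wavefront sweep over a grid with non-uniform cell costs; a simple global change flag stands in for the paper's termination test, and after a world change only the affected cells would be re-seeded instead of restarting propagation from scratch.

        #include <cuda_runtime.h>

        // One synchronous sweep over a W x H grid: each cell takes the best
        // 4-neighbour distance plus its own traversal cost. Double buffering
        // (distIn/distOut) avoids atomics; launch with a 2D grid of 2D blocks.
        __global__ void wavefront_step(const float* cellCost, const float* distIn,
                                       float* distOut, int W, int H, int* changed) {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= W || y >= H) return;
            int i = y * W + x;
            float best = distIn[i];
            if (x > 0)     best = fminf(best, distIn[i - 1] + cellCost[i]);
            if (x < W - 1) best = fminf(best, distIn[i + 1] + cellCost[i]);
            if (y > 0)     best = fminf(best, distIn[i - W] + cellCost[i]);
            if (y < H - 1) best = fminf(best, distIn[i + W] + cellCost[i]);
            if (best < distIn[i]) *changed = 1;   // at least one cell improved
            distOut[i] = best;
        }

        // Host side: seed the goal cell with 0 and every other cell with a
        // large value, then sweep (swapping distIn/distOut each iteration)
        // until the change flag stays 0.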

    Toward human-like pathfinding with hierarchical approaches and the GPS of the brain theory

    Pathfinding for autonomous agents and robots has traditionally been driven by finding optimal paths, where optimality typically means the shortest path between two points in a given environment. However, optimality is not always strictly necessary. In video games, for example, paths for non-player characters (NPCs) must often be computed under strict time constraints to guarantee real-time simulation. In those cases performance matters more than finding the shortest path, especially since a sub-optimal path can be just as convincing from the player's point of view. When simulating virtual humanoids, pathfinding has been used with the same goal of finding the shortest path. However, humans very rarely follow precise shortest paths, so there are other aspects of human decision making and path planning strategies that should be incorporated into current simulation models. In this thesis we first focus on improving performance in order to handle as many virtual agents as possible, and then draw on neuroscience research to propose pathfinding algorithms that attempt to mimic humans more realistically.

    In the case of simulating NPCs for video games, one of the main challenges is to compute paths as efficiently as possible for groups of agents. As both the size of the environments and the number of autonomous agents increase, it becomes harder to obtain results in real time under the constraints of memory and computing resources. For this purpose we explored hierarchical approaches, for two reasons: (1) they have shown important performance improvements for regular grids and other abstract problems, and (2) humans tend to plan trajectories following a top-down abstraction, focusing first on high-level locations and then refining the path as they move between those locations. We therefore believe that hierarchical approaches combine the best of our two goals: improving performance for multi-agent pathfinding and achieving more human-like pathfinding. Hierarchical approaches such as HNA* (Hierarchical A* for Navigation Meshes) can compute paths more efficiently, although only for certain configurations of the hierarchy. For other configurations, the method suffers from a bottleneck in the step that connects the start and goal positions with the hierarchy, which can drop performance drastically.

    In this thesis we present different approaches to remove the HNA* bottleneck and thus obtain a performance boost for all hierarchical configurations. The first method relies on additional memory storage, while the second uses parallelism on the GPU. Our comparative evaluation shows that both approaches offer speed-ups as high as 9x over A*, with no limitations tied to the hierarchical configuration. We then further exploit CUDA parallelism to extend our HNA* implementation to multi-agent pathfinding; our method can compute paths for over 500K agents simultaneously in real time, with speed-ups above 15x over a parallel multi-agent implementation using A*. Finally, we turn to neuroscience research on the way humans build mental maps in order to propose novel algorithms that take those findings into account when simulating virtual humans. We propose a novel pathfinding algorithm inspired by neuroscience research on how the brain learns and builds cognitive maps.
    Our method represents the space as a hexagonal grid, based on the GPS of the brain theory, and fires memory cells as counters. Our pathfinder then combines a method for exploring unknown environments while building such a cognitive map with an A* search whose modified heuristic takes that cognitive map into account.
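
    The thesis's heuristic is not specified in the abstract. Below is a small CUDA C++ sketch of the standard axial-coordinate distance on a hexagonal grid, together with a hypothetical counter-weighted variant in the spirit described; the weighting form and the alpha parameter are illustrative assumptions, not the thesis's formula.

        #include <cstdlib>

        // Distance between two hexes in axial coordinates (the standard
        // cube-distance formula); admissible as an A* heuristic when each
        // step between adjacent hexes costs at least 1.
        __host__ __device__ inline int hex_distance(int q1, int r1,
                                                    int q2, int r2) {
            int dq = q1 - q2, dr = r1 - r2;
            return (abs(dq) + abs(dr) + abs(dq + dr)) / 2;
        }

        // Hypothetical counter-weighted variant: hexes whose "memory cell"
        // counter fired more often during exploration get a smaller estimate
        // and are therefore preferred. Illustrative assumption, not the
        // thesis's formula.
        __host__ __device__ inline float cognitive_heuristic(int q1, int r1,
                                                             int q2, int r2,
                                                             int visits,
                                                             float alpha) {
            return hex_distance(q1, r1, q2, r2) * (1.0f + alpha / (1.0f + visits));
        }

    Note that the weighted variant deliberately trades strict admissibility for a familiarity bias, which fits the thesis's premise that humans rarely follow exact shortest paths.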

    Crowd simulation and visualization

    Large-scale simulation and visualization are essential topics in areas as different as sociology, physics, urbanism, training, and entertainment, among others. This kind of system requires vast computational power and memory resources, commonly available in High Performance Computing (HPC) platforms. Currently, the most powerful clusters have heterogeneous architectures with hundreds of thousands and even millions of cores, and industry trends suggest that exascale clusters will have thousands of millions. The technical challenges for simulation and visualization in the exascale era are intertwined with difficulties in other areas of research, including storage, communication, programming models, and hardware. For this reason, it is necessary to prototype, test, and deploy a variety of approaches to address the identified technical challenges and to evaluate the advantages and disadvantages of each proposed solution.

    The focus of this research is interactive large-scale crowd simulation and visualization, exploiting the capacity of the current HPC infrastructure to the maximum while being prepared to take advantage of the next generation. The project develops a new approach to scale crowd simulation and visualization on heterogeneous computing clusters using a task-based technique. Its main characteristic is that it is hardware agnostic: it abstracts the difficulties implied by the use of heterogeneous architectures, such as memory management, scheduling, communications, and synchronization, facilitating development, maintenance, and scalability. With the goal of flexibility and making the best possible use of computing resources, the project explores different configurations to connect the simulation with the visualization engine. This kind of system has an essential use in emergencies; therefore, urban scenes were implemented as realistically as possible, so that users will be ready to face real events.

    Path planning for large-scale crowds is a challenge, due to the inherent dynamism of the scenes and the vast search space. A new path-finding algorithm was developed, with a hierarchical approach that offers several advantages: it divides the search space, reducing the problem complexity; it can obtain a partial path instead of waiting for the complete one, which allows a character to start moving while the rest is computed asynchronously; and it can reprocess only part of the path if necessary, at different levels of abstraction (see the sketch after this abstract).

    A case study is presented for crowd simulation in urban scenarios. Geolocated data produced by mobile devices are used to predict individual and crowd behavior and to detect abnormal situations in the presence of specific events. The challenge of combining all these individual locations with a 3D rendering of the urban environment is also addressed. The data processing and simulation approach is computationally expensive and time-critical, and it thus relies on a hybrid Cloud-HPC architecture to produce an efficient solution. Within the project, new models of behavior based on data analytics were developed, together with the infrastructure to query various data sources such as social networks, government agencies, or transport companies such as Uber. Ever more geolocation data is available, along with better computational resources that allow deeper analysis; this lays the foundations for improving current crowd simulation models.
    The use of simulations and their visualization makes it possible to observe and organize crowds in real time, and analysis before, during, and after daily mass events can reduce the risks and the associated logistics costs.
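
    The thesis's hierarchical planner is not shown in the abstract. The host-side C++ sketch below (consistent with the CUDA examples elsewhere on this page) illustrates only the partial-path idea it describes: refine the first leg so the character can start moving, then refine the remaining legs asynchronously. findCoarsePath and refineLeg are hypothetical stand-ins with trivial stub bodies.

        #include <future>
        #include <vector>

        struct Path { std::vector<int> waypoints; };

        // Hypothetical stand-ins for the planner's two hierarchy levels.
        Path findCoarsePath(int startRegion, int goalRegion) {
            return { { startRegion, goalRegion } };   // trivial stub
        }
        Path refineLeg(int regionA, int regionB) {
            return { { regionA, regionB } };          // trivial stub
        }

        // The character starts moving on the first refined leg while the
        // remaining legs are refined in the background and spliced in later.
        void planAndGo(int start, int goal) {
            Path coarse = findCoarsePath(start, goal);
            Path firstLeg = refineLeg(coarse.waypoints[0], coarse.waypoints[1]);
            // ... hand firstLeg to the character controller here ...
            auto rest = std::async(std::launch::async, [coarse] {
                std::vector<Path> legs;
                for (size_t i = 1; i + 1 < coarse.waypoints.size(); ++i)
                    legs.push_back(refineLeg(coarse.waypoints[i],
                                             coarse.waypoints[i + 1]));
                return legs;
            });
            std::vector<Path> remaining = rest.get(); // splice in when ready
            (void)firstLeg; (void)remaining;
        }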

    Survey Report on Multi GPU Implementation in Open Source CFD Solver

    A CFD (Computational Fluid Dynamics) solver is used for the simulation of fluid flow and heat transfer, and it helps in analyzing the various parameters of fluid flow in scientific applications. OpenFOAM (Open Field Operation and Manipulation) [14] is open source software which can be used to solve CFD problems. OpenFOAM solves various equations, e.g. mass conservation and conjugate heat transfer, and it can run CFD problems serially as well as in parallel. OpenFOAM has an extensive range of features, covering anything from complex fluid flows involving chemical reactions, turbulence, and heat transfer, to solid dynamics and electromagnetics. The motivation of the project is to reduce the time needed to solve CFD problems by implementing them on the GPU: where the existing OpenFOAM setup executes serially, introducing parallelism can reduce the execution time. Introducing this parallelism requires a CUDA (Compute Unified Device Architecture) enabled GPU, and CUDA support in OpenFOAM is achieved by means of a GPU library. The library introduces parallelism for a single GPU, and multi-GPU support can further improve the performance of OpenFOAM.
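
    The GPU library surveyed is not detailed in the abstract. As an illustration of the kind of work such libraries offload, the CUDA sketch below shows a minimal CSR sparse matrix-vector product, the kernel that dominates the Krylov-type linear solvers (e.g., conjugate gradient) used by CFD codes; it is not the library's actual code.

        // y = A * x for a CSR matrix; one thread per row. Launch with
        // (nRows + 255) / 256 blocks of 256 threads.
        __global__ void spmv_csr(const int* rowPtr, const int* colInd,
                                 const double* val, const double* x,
                                 double* y, int nRows) {
            int r = blockIdx.x * blockDim.x + threadIdx.x;
            if (r >= nRows) return;
            double acc = 0.0;
            for (int p = rowPtr[r]; p < rowPtr[r + 1]; ++p)
                acc += val[p] * x[colInd[p]];
            y[r] = acc;
        }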

    Device specialization in heterogeneous multi-GPU environments

    In the last few years there have been many activities toward coupling CPUs and GPUs in order to get the most from CPU-GPU heterogeneous systems. One of the main problems that prevents these systems from being exploited in a device-aware manner is the CPU-GPU communication bottleneck, which often doesn't allow producing code more efficient than the GPU-only and CPU-only counterparts. As a consequence, most heterogeneous scheduling systems treat CPUs and GPUs as homogeneous nodes, electing map-like data partitioning to employ both of these processing resources. We propose to study how the radical change in the connection between GPU, CPU, and memory that characterizes APUs (Accelerated Processing Units) affects the architecture of a compiler, and whether it is possible to use all these computing resources in a device-aware manner. We investigate a methodology to analyze the devices that populate heterogeneous multi-GPU systems and to classify general-purpose algorithms in order to perform near-optimal control-flow and data partitioning.
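
    The map-like data partitioning the abstract refers to can be pictured with the CUDA sketch below: a fixed fraction of a data-parallel map is sent to the GPU while the CPU processes the remainder concurrently. The 0.7 split is an illustrative assumption; a device-aware scheduler of the kind proposed would choose it per device and per algorithm, and on an APU the explicit copies would largely disappear.

        #include <cuda_runtime.h>

        __global__ void scale_gpu(float* a, int n, float s) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) a[i] *= s;
        }

        // Map-like partitioning: the first `ratio` fraction of the array is
        // processed on the GPU while the CPU handles the rest. The final
        // cudaMemcpy synchronizes with the asynchronously launched kernel.
        void scale_hetero(float* host, int n, float s, float ratio = 0.7f) {
            int nGpu = (int)(n * ratio);
            float* dev = nullptr;
            cudaMalloc(&dev, nGpu * sizeof(float));
            cudaMemcpyAsync(dev, host, nGpu * sizeof(float),
                            cudaMemcpyHostToDevice);
            scale_gpu<<<(nGpu + 255) / 256, 256>>>(dev, nGpu, s);
            for (int i = nGpu; i < n; ++i) host[i] *= s;  // CPU share, overlapped
            cudaMemcpy(host, dev, nGpu * sizeof(float), cudaMemcpyDeviceToHost);
            cudaFree(dev);
        }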