1,446 research outputs found

    Crowd simulation and visualization

    Get PDF
    Large-scale simulation and visualization are essential topics in areas as different as sociology, physics, urbanism, training, entertainment among others. This kind of systems requires a vast computational power and memory resources commonly available in High Performance Computing HPC platforms. Currently, the most potent clusters have heterogeneous architectures with hundreds of thousands and even millions of cores. The industry trends inferred that exascale clusters would have thousands of millions. The technical challenges for simulation and visualization process in the exascale era are intertwined with difficulties in other areas of research, including storage, communication, programming models and hardware. For this reason, it is necessary prototyping, testing, and deployment a variety of approaches to address the technical challenges identified and evaluate the advantages and disadvantages of each proposed solution. The focus of this research is interactive large-scale crowd simulation and visualization. To exploit to the maximum the capacity of the current HPC infrastructure and be prepared to take advantage of the next generation. The project develops a new approach to scale crowd simulation and visualization on heterogeneous computing cluster using a task-based technique. Its main characteristic is hardware agnostic. It abstracts the difficulties that imply the use of heterogeneous architectures like memory management, scheduling, communications, and synchronization — facilitating development, maintenance, and scalability. With the goal of flexibility and take advantage of computing resources as best as possible, the project explores different configurations to connect the simulation with the visualization engine. This kind of system has an essential use in emergencies. Therefore, urban scenes were implemented as realistic as possible; in this way, users will be ready to face real events. Path planning for large-scale crowds is a challenge to solve, due to the inherent dynamism in the scenes and vast search space. A new path-finding algorithm was developed. It has a hierarchical approach which offers different advantages: it divides the search space reducing the problem complexity, it can obtain a partial path instead of wait for the complete one, which allows a character to start moving and compute the rest asynchronously. It can reprocess only a part if necessary with different levels of abstraction. A case study is presented for a crowd simulation in urban scenarios. Geolocated data are used, they were produced by mobile devices to predict individual and crowd behavior and detect abnormal situations in the presence of specific events. It was also address the challenge of combining all these individual’s location with a 3D rendering of the urban environment. The data processing and simulation approach are computationally expensive and time-critical, it relies thus on a hybrid Cloud-HPC architecture to produce an efficient solution. Within the project, new models of behavior based on data analytics were developed. It was developed the infrastructure to be able to consult various data sources such as social networks, government agencies or transport companies such as Uber. Every time there is more geolocation data available and better computation resources which allow performing analysis of greater depth, this lays the foundations to improve the simulation models of current crowds. The use of simulations and their visualization allows to observe and organize the crowds in real time. The analysis before, during and after daily mass events can reduce the risks and associated logistics costs.La simulación y visualización a gran escala son temas esenciales en áreas tan diferentes como la sociología, la física, el urbanismo, la capacitación, el entretenimiento, entre otros. Este tipo de sistemas requiere una gran capacidad de cómputo y recursos de memoria comúnmente disponibles en las plataformas de computo de alto rendimiento. Actualmente, los equipos más potentes tienen arquitecturas heterogéneas con cientos de miles e incluso millones de núcleos. Las tendencias de la industria infieren que los equipos en la era exascale tendran miles de millones. Los desafíos técnicos en el proceso de simulación y visualización en la era exascale se entrelazan con dificultades en otras áreas de investigación, incluidos almacenamiento, comunicación, modelos de programación y hardware. Por esta razón, es necesario crear prototipos, probar y desplegar una variedad de enfoques para abordar los desafíos técnicos identificados y evaluar las ventajas y desventajas de cada solución propuesta. El foco de esta investigación es la visualización y simulación interactiva de multitudes a gran escala. Aprovechar al máximo la capacidad de la infraestructura actual y estar preparado para aprovechar la próxima generación. El proyecto desarrolla un nuevo enfoque para escalar la simulación y visualización de multitudes en un clúster de computo heterogéneo utilizando una técnica basada en tareas. Su principal característica es que es hardware agnóstico. Abstrae las dificultades que implican el uso de arquitecturas heterogéneas como la administración de memoria, las comunicaciones y la sincronización, lo que facilita el desarrollo, el mantenimiento y la escalabilidad. Con el objetivo de flexibilizar y aprovechar los recursos informáticos lo mejor posible, el proyecto explora diferentes configuraciones para conectar la simulación con el motor de visualización. Este tipo de sistemas tienen un uso esencial en emergencias. Por lo tanto, se implementaron escenas urbanas lo más realistas posible, de esta manera los usuarios estarán listos para enfrentar eventos reales. La planificación de caminos para multitudes a gran escala es un desafío a resolver, debido al dinamismo inherente en las escenas y el vasto espacio de búsqueda. Se desarrolló un nuevo algoritmo de búsqueda de caminos. Tiene un enfoque jerárquico que ofrece diferentes ventajas: divide el espacio de búsqueda reduciendo la complejidad del problema, puede obtener una ruta parcial en lugar de esperar a la completa, lo que permite que un personaje comience a moverse y calcule el resto de forma asíncrona, puede reprocesar solo una parte si es necesario con diferentes niveles de abstracción. Se presenta un caso de estudio para una simulación de multitud en escenarios urbanos. Se utilizan datos geolocalizados producidos por dispositivos móviles para predecir el comportamiento individual y público y detectar situaciones anormales en presencia de eventos específicos. También se aborda el desafío de combinar la ubicación de todos estos individuos con una representación 3D del entorno urbano. Dentro del proyecto, se desarrollaron nuevos modelos de comportamiento basados ¿¿en el análisis de datos. Se creo la infraestructura para poder consultar varias fuentes de datos como redes sociales, agencias gubernamentales o empresas de transporte como Uber. Cada vez hay más datos de geolocalización disponibles y mejores recursos de cómputo que permiten realizar un análisis de mayor profundidad, esto sienta las bases para mejorar los modelos de simulación de las multitudes actuales. El uso de simulaciones y su visualización permite observar y organizar las multitudes en tiempo real. El análisis antes, durante y después de eventos multitudinarios diarios puede reducir los riesgos y los costos logísticos asociadosPostprint (published version

    OpenACC Based GPU Parallelization of Plane Sweep Algorithm for Geometric Intersection

    Get PDF
    Line segment intersection is one of the elementary operations in computational geometry. Complex problems in Geographic Information Systems (GIS) like finding map overlays or spatial joins using polygonal data require solving segment intersections. Plane sweep paradigm is used for finding geometric intersection in an efficient manner. However, it is difficult to parallelize due to its in-order processing of spatial events. We present a new fine-grained parallel algorithm for geometric intersection and its CPU and GPU implementation using OpenMP and OpenACC. To the best of our knowledge, this is the first work demonstrating an effective parallelization of plane sweep on GPUs. We chose compiler directive based approach for implementation because of its simplicity to parallelize sequential code. Using Nvidia Tesla P100 GPU, our implementation achieves around 40X speedup for line segment intersection problem on 40K and 80K data sets compared to sequential CGAL library

    Toward human-like pathfinding with hierarchical approaches and the GPS of the brain theory

    Get PDF
    Pathfinding for autonomous agents and robots has been traditionally driven by finding optimal paths. Where typically optimality means finding the shortest path between two points in a given environment. However, optimality may not always be strictly necessary. For example, in the case of video games, often computing the paths for non-player characters (NPC) must be done under strict time constraints to guarantee real time simulation. In those cases, performance is more important than finding the shortest path, specially because often a sub-optimal path can be just as convincing from the point of view of the player. When simulating virtual humanoids, pathfinding has also been used with the same goal: finding the shortest path. However, humans very rarely follow precise shortest paths, and thus there are other aspects of human decision making and path planning strategies that should be incorporated in current simulation models. In this thesis we first focus on improving performance optimallity to handle as many virtual agents as possible, and then introduce neuroscience research to propose pathfinding algorithms that attempt to mimic humans in a more realistic manner.In the case of simulating NPCs for video games, one of the main challenges is to compute paths as efficiently as possible for groups of agents. As both the size of the environments and the number of autonomous agents increase, it becomes harder to obtain results in real time under the constraints of memory and computing resources. For this purpose we explored hierarchical approaches for two reasons: (1) they have shown important performance improvements for regular grids and other abstract problems, and (2) humans tend to plan trajectories also following an topbottom abstraction, focusing first on high level location and then refining the path as they move between those high level locations. Therefore, we believe that hierarchical approaches combine the best of our two goals: improving performance for multi-agent pathfinding and achieving more human-like pathfinding. Hierarchical approaches, such as HNA* (Hierarchical A* for Navigation Meshes) can compute paths more efficiently, although only for certain configurations of the hierarchy. For other configurations, the method suffers from a bottleneck in the step that connects the Start and Goal positions with the hierarchy. This bottleneck can drop performance drastically.In this thesis we present different approaches to solve the HNA* bottleneck and thus obtain a performance boost for all hierarchical configurations. The first method relies on further memory storage, and the second one uses parallelism on the GPU. Our comparative evaluation shows that both approaches offer speed-ups as high as 9x faster than A*, and show no limitations based on hierarchical configuration. Then we further exploit the potential of CUDA parallelism, to extend our implementation to HNA* for multi-agent path finding. Our method can now compute paths for over 500K agents simultaneously in real-time, with speed-ups above 15x faster than a parallel multi-agent implementation using A*. We then focus on studying neurosience research to learn about the way that humans build mental maps, in order to propose novel algorithms that take those finding into account when simulating virtual humans. We propose a novel algorithm for path finding that is inspired by neuroscience research on how the brain learns and builds cognitive maps. Our method represents the space as a hexagonal grid, based on the GPS of the brain theory, and fires memory cells as counters. Our path finder then combines a method for exploring unknown environments while building such a cognitive map, with an A* search using a modified heuristic that takes into account the GPS of the brain cognitive map.El problema de Pathfinding para agentes autónomos o robots, ha consistido tradicionalmente en encontrar un camino óptimo, donde por óptimo se entiende el camino de distancia mínima entre dos posiciones de un entorno. Sin embargo, en ocasiones puede que no sea estrictamente necesario encontrar una solución óptima. Para ofrecer la simulación de multitudes de agentes autónomos moviéndose en tiempo real, es necesario calcular sus trayectorias bajo condiciones estrictas de tiempo de computación, pero no es fundamental que las soluciones sean las de distancia mínima ya que, con frecuencia, el observador no apreciará la diferencia entre un camino óptimo y un sub-óptimo. Por tanto, suele ser suficiente con que la solución encontrada sea visualmente creíble para el observado. Cuando se simulan humanoides virtuales en aplicaciones de realidad virtual que requieren avatares que simulen el comportamiento de los humanos, se tiende a emplear los mismos algoritmos que en video juegos, con el objetivo de encontrar caminos de distancia mínima. Pero si realmente queremos que los avatares imiten el comportamiento humano, tenemos que tener en cuenta que, en el mundo real, los humanos rara vez seguimos precisamente el camino más corto, y por tanto se deben considerar otros aspectos que influyen en la toma de decisiones de los humanos y la selección de rutas en el mundo real. En esta tesis nos centraremos primero en mejorar el rendimiento de la búsqueda de caminos para poder simular grandes números de humanoides virtuales autónomos, y a continuación introduciremos conceptos de investigación con mamíferos en neurociencia, para proponer soluciones al problema de pathfinding que intenten imitar con mayor realismo, el modo en el que los humanos navegan el entorno que les rodea. A medida que aumentan tanto el tamaño de los entornos virtuales como el número de agentes autónomos, resulta más difícil obtener soluciones en tiempo real, debido a las limitaciones de memoria y recursos informáticos. Para resolver este problema, en esta tesis exploramos enfoques jerárquicos porque consideramos que combinan dos objetivos fundamentales: mejorar el rendimiento en la búsqueda de caminos para multitudes de agentes y lograr una búsqueda de caminos similar a la de los humanos. El primer método presentado en esta tesis se basa en mejorar el rendimiento del algoritmo HNA* (Hierarchical A* for Navigation Meshes) incrementando almacenamiento de datos en memoria, y el segundo utiliza el paralelismo para mejorar drásticamente el rendimiento. La evaluación cuantitativa realizada en esta tesis, muestra que ambos enfoques ofrecen aceleraciones que pueden llegar a ser hasta 9 veces más rápidas que el algoritmo A* y no presentan limitaciones debidas a la configuración jerárquica. A continuación, aprovechamos aún más el potencial del paralelismo ofrecido por CUDA para extender nuestra implementación de HNA* a sistemas multi-agentes. Nuestro método permite calcular caminos simultáneamente y en tiempo real para más de 500.000 agentes, con una aceleración superior a 15 veces la obtenida por una implementación paralela del algoritmo A*. Por último, en esta tesis nos hemos centrado en estudiar los últimos avances realizados en el ámbito de la neurociencia, para comprender la manera en la que los humanos construyen mapas mentales y poder así proponer nuevos algoritmos que imiten de forma más real el modo en el que navegamos los humanos. Nuestro método representa el espacio como una red hexagonal, basada en la distribución de ¿place cells¿ existente en el cerebro, e imita las activaciones neuronales como contadores en dichas celdas. Nuestro buscador de rutas combina un método para explorar entornos desconocidos mientras construye un mapa cognitivo hexagonal, utilizando una búsqueda A* con una nueva heurística adaptada al mapa cognitivo del cerebro y sus contadores

    QuickCSG: Fast Arbitrary Boolean Combinations of N Solids

    Get PDF
    QuickCSG computes the result for general N-polyhedron boolean expressions without an intermediate tree of solids. We propose a vertex-centric view of the problem, which simplifies the identification of final geometric contributions, and facilitates its spatial decomposition. The problem is then cast in a single KD-tree exploration, geared toward the result by early pruning of any region of space not contributing to the final surface. We assume strong regularity properties on the input meshes and that they are in general position. This simplifying assumption, in combination with our vertex-centric approach, improves the speed of the approach. Complemented with a task-stealing parallelization, the algorithm achieves breakthrough performance, one to two orders of magnitude speedups with respect to state-of-the-art CPU algorithms, on boolean operations over two to dozens of polyhedra. The algorithm also outperforms GPU implementations with approximate discretizations, while producing an output without redundant facets. Despite the restrictive assumptions on the input, we show the usefulness of QuickCSG for applications with large CSG problems and strong temporal constraints, e.g. modeling for 3D printers, reconstruction from visual hulls and collision detection

    Autotuning for Automatic Parallelization on Heterogeneous Systems

    Get PDF

    Scalable and distributed constrained low rank approximations

    Get PDF
    Low rank approximation is the problem of finding two low rank factors W and H such that the rank(WH) << rank(A) and A ≈ WH. These low rank factors W and H can be constrained for meaningful physical interpretation and referred as Constrained Low Rank Approximation (CLRA). Like most of the constrained optimization problem, performing CLRA can be computationally expensive than its unconstrained counterpart. A widely used CLRA is the Non-negative Matrix Factorization (NMF) which enforces non-negativity constraints in each of its low rank factors W and H. In this thesis, I focus on scalable/distributed CLRA algorithms for constraints such as boundedness and non-negativity for large real world matrices that includes text, High Definition (HD) video, social networks and recommender systems. First, I begin with the Bounded Matrix Low Rank Approximation (BMA) which imposes a lower and an upper bound on every element of the lower rank matrix. BMA is more challenging than NMF as it imposes bounds on the product WH rather than on each of the low rank factors W and H. For very large input matrices, we extend our BMA algorithm to Block BMA that can scale to a large number of processors. In applications, such as HD video, where the input matrix to be factored is extremely large, distributed computation is inevitable and the network communication becomes a major performance bottleneck. Towards this end, we propose a novel distributed Communication Avoiding NMF (CANMF) algorithm that communicates only the right low rank factor to its neighboring machine. Finally, a general distributed HPC- NMF framework that uses HPC techniques in communication intensive NMF operations and suitable for broader class of NMF algorithms.Ph.D

    QuickCSG: Fast Arbitrary Boolean Combinations of N Solids

    Full text link
    QuickCSG computes the result for general N-polyhedron boolean expressions without an intermediate tree of solids. We propose a vertex-centric view of the problem, which simplifies the identification of final geometric contributions, and facilitates its spatial decomposition. The problem is then cast in a single KD-tree exploration, geared toward the result by early pruning of any region of space not contributing to the final surface. We assume strong regularity properties on the input meshes and that they are in general position. This simplifying assumption, in combination with our vertex-centric approach, improves the speed of the approach. Complemented with a task-stealing parallelization, the algorithm achieves breakthrough performance, one to two orders of magnitude speedups with respect to state-of-the-art CPU algorithms, on boolean operations over two to dozens of polyhedra. The algorithm also outperforms GPU implementations with approximate discretizations, while producing an output without redundant facets. Despite the restrictive assumptions on the input, we show the usefulness of QuickCSG for applications with large CSG problems and strong temporal constraints, e.g. modeling for 3D printers, reconstruction from visual hulls and collision detection
    • …
    corecore