358 research outputs found

    Accelerating kd-tree Searches for all k-nearest Neighbours

    Finding the k nearest neighbours of each point in a point cloud forms an integral part of many point-cloud processing tasks. One common approach is to build a kd-tree over the points and then iteratively query the k nearest neighbours of each point. We introduce a simple modification to these queries to exploit the coherence between successive points; no changes are required to the kd-tree data structure. The path from the root to the appropriate leaf is updated incrementally, and backtracking is done bottom-up. We show that this can reduce the time to compute the neighbourhood graph of a 3D point cloud by over 10%, and by up to 24% when k = 1. The gains scale with the depth of the kd-tree, and the method is suitable for parallel implementation.
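The baseline that the abstract starts from (build a kd-tree, then query the k nearest neighbours of every point in turn) can be sketched as follows. This is a minimal illustration in pure Python, not the authors' incremental method: each query restarts from the root and backtracks top-down. The names `build_kdtree`, `knn` and `all_knn` are hypothetical and chosen only for this example.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, indices=None, depth=0):
    """Build a kd-tree as nested tuples (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])              # cycle through the dimensions
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2                    # median point becomes the node
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))

def knn(points, node, query_idx, k, heap=None):
    """Collect the k nearest neighbours of points[query_idx] in a max-heap."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    idx, axis, left, right = node
    q = points[query_idx]
    if idx != query_idx:                       # a point is not its own neighbour
        d = sq_dist(q, points[idx])
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))    # negate to get a max-heap
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))
    diff = q[axis] - points[idx][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(points, near, query_idx, k, heap)
    # Backtrack into the far side only if the splitting plane is closer
    # than the current k-th nearest neighbour (or the heap is not full).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(points, far, query_idx, k, heap)
    return heap

def all_knn(points, k):
    """Neighbourhood graph: for every point, its k nearest, closest first."""
    tree = build_kdtree(points)
    graph = []
    for i in range(len(points)):
        heap = knn(points, tree, i, k)
        graph.append([idx for _, idx in sorted(heap, reverse=True)])
    return graph
```

Because every call to `knn` descends from the root independently, successive queries for nearby points repeat much of the same traversal; that repeated work is the coherence the abstract proposes to exploit.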

    Accelerating Nearest Neighbor Search on Manycore Systems

    We develop methods for accelerating metric similarity search that are effective on modern hardware. Our algorithms factor into easily parallelizable components, making them simple to deploy and efficient on multicore CPUs and GPUs. Despite the simple structure of our algorithms, their search performance is provably sublinear in the size of the database, with a factor dependent only on its intrinsic dimensionality. We demonstrate that our methods provide substantial speedups on a range of datasets and hardware platforms. In particular, we present results on a 48-core server machine, on graphics hardware, and on a multicore desktop.

    Hardware acceleration of photon mapping

    PhD Thesis. The quest for realism in computer-generated graphics has yielded a range of algorithmic techniques, the most advanced of which are capable of rendering images at close to photorealistic quality. Due to the realism available, it is now commonplace that computer graphics are used in the creation of movie sequences, architectural renderings, medical imagery and product visualisations. This work concentrates on the photon mapping algorithm [1, 2], a physically based global illumination rendering algorithm. Photon mapping excels in producing highly realistic, physically accurate images. A drawback to photon mapping, however, is its rendering times, which can be significantly longer than those of other, albeit less realistic, algorithms. Not surprisingly, this increase in execution time is associated with a high computational cost. This computation is usually performed using the general purpose central processing unit (CPU) of a personal computer (PC), with the algorithm implemented as a software routine. Other options available for processing these algorithms include desktop PC graphics processing units (GPUs) and custom designed acceleration hardware devices. GPUs tend to be efficient when dealing with less realistic rendering solutions such as rasterisation; however, with their recent drive towards increased programmability they can also be used to process more realistic algorithms. A drawback to the use of GPUs is that these algorithms often have to be reworked to make optimal use of the limited resources available. There are very few custom hardware devices available for acceleration of the photon mapping algorithm. Ray-tracing is the predecessor to photon mapping, and although not capable of producing the same physical accuracy and therefore realism, there are similarities between the algorithms. There have been several hardware prototypes, and at least one commercial offering, created with the goal of accelerating ray-trace rendering [3]. However, properties making many of these proposals suitable for the acceleration of ray-tracing are not shared by photon mapping. There are even fewer proposals for acceleration of the additional functions found only in photon mapping. All of these approaches to algorithm acceleration offer limited scalability. GPUs are inherently difficult to scale, while many of the custom hardware devices available thus far make use of large processing elements and complex acceleration data structures. In this work we make use of three novel approaches in the design of highly scalable specialised hardware structures for the acceleration of the photon mapping algorithm. Increased scalability is gained through:
    • The use of a brute-force approach in place of the commonly used smart approach, thus eliminating much data pre-processing, complex data structures and large processing units often required.
    • The use of Logarithmic Number System (LNS) arithmetic computation, which facilitates a reduction in processing area requirement.
    • A novel redesign of the photon inclusion test, used within the photon search method of the photon mapping algorithm. This allows an intelligent memory structure to be used for the search.
    The design uses two hardware structures, both of which accelerate one core rendering function. Renderings produced using field programmable gate array (FPGA) based prototypes are presented, along with details of 90nm synthesised versions of the designs which show that close to an order-of-magnitude speedup over a software implementation is possible. Due to the scalable nature of the design, it is likely that any advantage can be maintained in the face of improving processor speeds. Significantly, due to the brute-force approach adopted, it is possible to eliminate an often-used software acceleration method. This means that the device can interface almost directly to a frontend modelling package, minimising much of the pre-processing required by most other proposals.
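To illustrate why Logarithmic Number System (LNS) arithmetic can reduce processing area, the sketch below stores a positive value as its base-2 logarithm, so multiplication, division and square root reduce to an addition, a subtraction and a halving. It is a software illustration under simplifying assumptions (positive values only, floating-point logarithms), not the thesis's hardware design; all function names are hypothetical.

```python
import math

def to_lns(x):
    """Encode a positive real number as its base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return math.log2(x)

def from_lns(e):
    """Decode an LNS value back to an ordinary real number."""
    return 2.0 ** e

def lns_mul(a, b):
    """log2(x * y) = log2(x) + log2(y): multiplication is one addition."""
    return a + b

def lns_div(a, b):
    """log2(x / y) = log2(x) - log2(y): division is one subtraction."""
    return a - b

def lns_sqrt(a):
    """log2(sqrt(x)) = log2(x) / 2: a halving (a shift in fixed point)."""
    return a / 2.0

def lns_add(a, b):
    """log2(x + y) = hi + log2(1 + 2^(lo - hi)).

    Addition is the costly operation in LNS; here the correction term is
    simply computed with math.log2.
    """
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))
```

For example, `from_lns(lns_mul(to_lns(3.0), to_lns(4.0)))` recovers 12.0 up to floating-point rounding. The trade-off is that addition needs the nonlinear correction term, which is why LNS suits workloads dominated by multiplications, divisions and roots.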

    Large scale geostatistics with locally varying anisotropy

    Classical geostatistical methods are based on the hypothesis of stationarity, which allows repeated sampling at different locations of the spatial domain in order to obtain enough information to infer cumulative distributions. In the non-stationary case, anisotropy is observed in the underlying physical phenomena. This feature manifests itself as preferential directions of continuity in the phenomena, i.e. properties are more continuous in one orientation than in another. In the case of local anisotropy, each location of the domain under study presents different preferential directions of continuity. The locally varying anisotropy (LVA) approach in geostatistics incorporates a field of local anisotropy parameters defined for each domain point. With this additional input, more realistic spatial simulations can be generated, adding geological features such as folds, veins and faults, among others, to the computational model. Since the seminal article published by Boisvert and Deutsch (2011), to the best of the author's knowledge, no further analysis or public code improvements have been developed. This is in part because acceleration and parallelization techniques must be applied to the inner kernels of the baseline LVA codes. Long execution times are needed to generate small-scale domain simulations, making large-scale domain simulations a prohibitive task. The contributions of this thesis are the acceleration and parallelization of classical and LVA-based geostatistical simulation methods, particularly sequential simulation, which is one of the most common and computationally intensive methods in the field. This fact was recently remarked on by some of the main authors in the field, Gómez-Hernández and Srivastava (2021), which shows the relevance of this work today. Two main parallel algorithms and an optimized version of a kd-tree search implementation are presented, all of them applied to both classical and LVA-based sequential simulation implementations.
    The first parallel algorithm simulates different domain points in parallel, after rearranging the order of simulation but preserving the exact results of a single-thread execution. The second parallel algorithm searches for neighbour points in the domain in parallel; its results are used to build the data dependencies for the parallel simulation of points. The optimized kd-tree search was used in each test case in order to reduce the computational complexity of neighbour search tasks. Its modified implementation reduces the number of branching instructions and introduces specialized code sections to accelerate the execution. The main focus is on multi-core architectures using OpenMP and optimization techniques applied to Fortran and C++ codes. Additionally, acceleration and parallelization techniques were applied to auxiliary applications, such as shortest path and variogram calculation on hybrid CPU/GPU architectures using Fortran, C++ and CUDA codes. In the last application, an analytical and heuristic model was developed to estimate the optimal workload distribution between CPU and GPU in the hybrid context. The overall result of this work is a set of applications that will allow researchers and practitioners to dramatically accelerate the execution of their experiments and simulations; the accelerated codes presented are sgsim, sisim, sgs-lva and sisim-lva. Final speedups of 11x and 50x are obtained for non-LVA codes using 16 threads, and 56x and 1822x for LVA codes using 20 threads. These tools can be combined with other geostatistical tools, in order to improve the existing landscape of open source codes that can be used in practical scenarios.
    Postprint (published version)

    A Standardised Benchmark for Assessing the Performance of Fixed Radius Near Neighbours

    Many agent based models require agents to have an awareness of their local peers. The handling of these fixed radius near neighbours (FRNNs) is often a limiting factor of performance. However, without a standardised metric to assess the handling of FRNNs, contributions to the field lack the rigorous appraisal necessary to expose their relative benefits. This paper presents a standardised specification of a multi agent based benchmark model. The benchmark model provides a means for the objective assessment of FRNNs performance, through the comparison of implementations. Results collected from implementations of the benchmark model under three agent based modelling frameworks show the 64-bit floating point performance of each framework to scale linearly with agent population. In contrast, the GPU accelerated framework's 32-bit floating point performance only became linear after maximal device utilisation at around 100,000 agents.

    Working With Incremental Spatial Data During Parallel (GPU) Computation

    Central to many complex systems, spatial actors require an awareness of their local environment to enable behaviours such as communication and navigation. Complex system simulations represent this behaviour with Fixed Radius Near Neighbours (FRNN) search. This algorithm allows actors to store data at spatial locations and then query the data structure to find all data stored within a fixed radius of the search origin. The work within this thesis answers the question: What techniques can be used for improving the performance of FRNN searches during complex system simulations on Graphics Processing Units (GPUs)? It is generally agreed that Uniform Spatial Partitioning (USP) is the most suitable data structure for providing FRNN search on GPUs. However, due to the architectural complexities of GPUs, the performance is constrained such that FRNN search remains one of the most expensive stages common to complex system models. Existing innovations to USP highlight a need to take advantage of recent GPU advances, reducing the levels of divergence and limiting redundant memory accesses as viable routes to improve the performance of FRNN search. This thesis addresses these with three separate optimisations that can be used simultaneously. Experiments have assessed the impact of optimisations to the general case of FRNN search found within complex system simulations and demonstrated their impact in practice when applied to full complex system models. Results presented show the performance of the construction and query stages of FRNN search can be improved by over 2x and 1.3x respectively. These improvements allow complex system simulations to be executed faster, enabling increases in scale and model complexity.
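The store-then-query pattern of FRNN search over a uniform spatial partitioning can be sketched on the CPU as follows. This is a minimal serial illustration, assuming a cell side equal to the search radius; it does not reproduce the GPU data layout or the thesis's three optimisations. The names `build_grid` and `frnn_query` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(points, radius):
    """Construction stage: bin each point index into a uniform grid cell.

    The cell side equals the search radius, so every neighbour of a query
    lies in the query's own cell or in one of the directly adjacent cells.
    """
    grid = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(c / radius) for c in p)
        grid[cell].append(i)
    return grid

def frnn_query(points, grid, radius, origin):
    """Query stage: indices of all points within `radius` of `origin`."""
    centre = tuple(math.floor(c / radius) for c in origin)
    r2 = radius * radius
    found = []
    # Visit the 3^d block of cells around the origin (9 in 2D, 27 in 3D).
    for offset in product((-1, 0, 1), repeat=len(origin)):
        cell = tuple(c + o for c, o in zip(centre, offset))
        for i in grid.get(cell, ()):           # .get avoids creating empty cells
            if sum((a - b) ** 2 for a, b in zip(points[i], origin)) <= r2:
                found.append(i)
    return found
```

A typical use is to rebuild the grid once per simulation step with `build_grid(points, radius)` and then call `frnn_query` once per actor, which mirrors the separate construction and query stages whose performance the abstract reports.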

    Personalizing web search and crawling from clickstream data

    Our aim is to improve web search engines by approaching the search problem with the user in mind: his/her topics of interest and navigation context. In a personalized search engine, two different users get different results for the same query, because the system considers the interests of each user separately. To personalize search, many sources of information can be used: the user's bookmarks, geographical location, navigation history, etc. Web search engines have, broadly speaking, three basic phases: crawling, indexing and searching. The information available about the user's interests can be considered in some of those three phases, depending on its nature. Work on search personalization already exists; it is reviewed in Chapter 3. To address the lack of knowledge about the user and his/her interests, we have developed a system that keeps track of the web pages that the user visits (the clickstream). Our system analyzes the clickstream and focuses the crawling on pages related to the user's topics of interest. Furthermore, each time the user executes a query, the system considers his/her navigation context, and pages related to the navigation context get better scores. The clickstream also contains patterns: our system retrieves navigation patterns from it, tries to predict the next pages that are going to be visited, and uses those patterns to give navigation tips to the user based on his/her navigation context.