    GPU optimization of bounding volume hierarchies for ray tracing (original title: Otimização em GPU de bounding volume hierarchies para ray tracing)

    Advisor: Hélio Pedrini. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Ray tracing methods are well known for producing very realistic images at the expense of a high computational effort. Most of the cost associated with those methods comes from finding the intersections between the massive number of rays that need to be traced and the scene geometry.
Special data structures were proposed to speed up those calculations by indexing and organizing the geometry so that only a subset of it has to be checked for intersections. One such structure is the Bounding Volume Hierarchy (BVH), a tree-like structure used to group 3D objects hierarchically. Recently, a significant amount of effort has been put into accelerating the construction of those structures and increasing their quality. We present a new method for building high-quality BVHs on manycore systems. Our method is an extension of the current state of the art in GPU BVH construction, Treelet Restructuring Bounding Volume Hierarchy (TRBVH), and consists of optimizing an already existing tree by rearranging subsets of its nodes using an agglomerative clustering approach. We implemented our solution for the NVIDIA Kepler architecture using CUDA and tested it on sixteen distinct scenes that are commonly used to evaluate the performance of acceleration structures. We show that our implementation is capable of producing trees whose quality is equivalent to that of the trees generated by TRBVH for those scenes, while being about 30% faster.
Degree: Master's in Computer Science (Mestre em Ciência da Computação).
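To make the idea of agglomerative clustering over bounding volumes concrete, the following is a minimal CPU sketch in Python. It is not the dissertation's GPU method: the abstract does not state the merge criterion, so the sketch assumes a common choice (greedily merging the two clusters whose combined axis-aligned bounding box has the smallest surface area). The names `AABB` and `agglomerate` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box given by its min and max corners."""
    lo: tuple
    hi: tuple

    def union(self, other):
        # Smallest box that encloses both boxes.
        return AABB(tuple(map(min, self.lo, other.lo)),
                    tuple(map(max, self.hi, other.hi)))

    def surface_area(self):
        dx, dy, dz = (h - l for l, h in zip(self.lo, self.hi))
        return 2.0 * (dx * dy + dy * dz + dz * dx)


def agglomerate(boxes):
    """Greedy bottom-up clustering (assumed criterion: smallest merged area).

    Returns (tree, box): the tree is a nested tuple whose leaves are indices
    into `boxes`, and the box bounds the whole tree.
    """
    clusters = [(i, b) for i, b in enumerate(boxes)]
    while len(clusters) > 1:
        # Pick the pair whose union has the smallest surface area.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: clusters[p[0]][1].union(clusters[p[1]][1]).surface_area(),
        )
        (tree_a, box_a), (tree_b, box_b) = clusters[i], clusters[j]
        merged = ((tree_a, tree_b), box_a.union(box_b))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


boxes = [
    AABB((0, 0, 0), (1, 1, 1)),
    AABB((1, 0, 0), (2, 1, 1)),
    AABB((10, 0, 0), (11, 1, 1)),
]
tree, root_box = agglomerate(boxes)
```

In this example the two adjacent boxes are merged first and the distant box joins at the root. The exhaustive pair search is quadratic per merge, which is acceptable only for small node subsets such as treelets.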

    Efficient data mining algorithms for time series and complex medical data

    Efficient Knowledge Extraction from Structured Data

    Knowledge extraction from structured data aims at identifying valid, novel, potentially useful, and ultimately understandable patterns in the data. The core step of this process is the application of a data mining algorithm to enumerate particular patterns and relationships in large databases. Clustering is one of the major data mining tasks and aims at grouping the data objects into meaningful classes (clusters) such that the similarity of objects within clusters is maximized and the similarity of objects from different clusters is minimized. In this thesis, we advance state-of-the-art data mining algorithms for analyzing structured data types. We describe the development of innovative solutions for hierarchical data mining. The EM-based hierarchical clustering method ITCH (Information-Theoretic Cluster Hierarchies) is designed to address four challenges: (1) guiding the hierarchical clustering algorithm to identify only meaningful and valid clusters; (2) representing the content of each cluster in the hierarchy by an intuitive description, e.g. a probability density function; (3) handling outliers consistently; and (4) avoiding difficult parameter settings. ITCH is built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL). Interpreting the hierarchical cluster structure as a statistical model of the dataset, it can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four of the above-mentioned goals. The genetic-based hierarchical clustering algorithm GACH (Genetic Algorithm for finding Cluster Hierarchies) overcomes the problem of getting stuck in a local optimum through a beneficial combination of genetic algorithms, information theory and model-based clustering.
Besides hierarchical data mining, we also made contributions to more complex data structures, namely objects that consist of mixed-type attributes and skyline objects. The algorithm INTEGRATE performs integrative mining of heterogeneous data, one of the major challenges of the next decade, through a unified view of numerical and categorical information in clustering. Again supported by the MDL principle, INTEGRATE guarantees usability on real-world data. For skyline objects we developed SkyDist, a similarity measure for comparing different skyline objects, which is a first step towards performing data mining on this kind of data structure. Applied in a recommender system, for example, SkyDist can be used to point the user to alternative car types that exhibit a price/mileage behavior similar to that of the original query. For mining graph-structured data, we developed different approaches that can detect patterns in static as well as in dynamic networks. We confirmed the practical feasibility of our novel approaches on large real-world case studies ranging from medical brain data to biological yeast networks. In the second part of this thesis, we focused on boosting the knowledge extraction process. We achieved this objective through an intelligent adoption of Graphics Processing Units (GPUs). GPUs have evolved from simple devices for display signal preparation into powerful coprocessors that not only support typical computer graphics tasks but can also be used for general numeric and symbolic computations. As a major advantage, GPUs provide extreme parallelism combined with a high memory bandwidth at low cost. In this thesis, we propose algorithms for computationally expensive data mining tasks such as similarity search and different clustering paradigms, designed for the highly parallel environment of a GPU; the clustering algorithms are called CUDA-DClust and CUDA-k-means.
We define a multi-dimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU. We demonstrate that, in terms of efficiency, our algorithms running on the GPU are superior to their conventional counterparts on the CPU.

    Scalable data clustering using GPUs

    The computational demands of multivariate clustering grow rapidly, and therefore processing large data sets, like those found in flow cytometry, is very time consuming on a single CPU. Fortunately, these techniques lend themselves naturally to large-scale parallel processing. To address the computational demands, graphics processing units, specifically NVIDIA's CUDA framework and Tesla architecture, were investigated as a low-cost, high-performance solution for a number of clustering algorithms. C-means and Expectation Maximization with Gaussian mixture models were implemented using the CUDA framework. The algorithm implementations use a hybrid of CUDA, OpenMP, and MPI to scale to many GPUs on multiple nodes in a high performance computing environment. This framework is envisioned as part of a larger cloud-based workflow service where biologists can apply multiple algorithms and parameter sweeps to their data sets and quickly receive a thorough set of results that can be further analyzed by experts. Improvements over previous GPU-accelerated implementations range from 1.42x to 21x for C-means and 3.72x to 5.65x for the Gaussian mixture model on non-trivial data sets. Using a single NVIDIA GTX 260, speedups are on average 90x for C-means and 74x for Gaussians with flow cytometry files, compared to optimized C code running on a single core of a modern Intel CPU. Using the TeraGrid Lincoln high performance cluster at NCSA, C-means achieves 42% parallel efficiency and a CPU speedup of 4794x with 128 Tesla C1060 GPUs. The Gaussian mixture model achieves 72% parallel efficiency and a CPU speedup of 6286x.
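The relation between the reported multi-GPU speedups and parallel efficiencies can be checked with a small calculation. The abstract does not state the baseline used for parallel efficiency, so this sketch assumes the usual definition relative to a single GPU of the same type: efficiency = speedup on N devices / (N x speedup on one device). The function names are hypothetical.

```python
def parallel_efficiency(speedup_n, n_devices, speedup_1):
    """Efficiency of N devices relative to one device (assumed definition)."""
    return speedup_n / (n_devices * speedup_1)


def implied_single_device_speedup(speedup_n, n_devices, efficiency):
    """Invert the definition: the single-device speedup the figures imply."""
    return speedup_n / (n_devices * efficiency)


# Figures quoted in the abstract for 128 Tesla C1060 GPUs.
cmeans_single = implied_single_device_speedup(4794, 128, 0.42)
gmm_single = implied_single_device_speedup(6286, 128, 0.72)

print(f"C-means: implied single-GPU speedup ~{cmeans_single:.1f}x")
print(f"Gaussian mixture: implied single-GPU speedup ~{gmm_single:.1f}x")
```

Under that assumption the figures imply roughly 89x (C-means) and 68x (Gaussian mixture) per Tesla C1060, the same order as the single-GPU speedups quoted for the GTX 260. The two are not directly comparable because the GPU models differ.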

    Clustering: Methodology, hybrid systems, visualization, validation and implementation

    Unsupervised learning is one of the most important steps in machine learning applications. Besides providing insight into the data distribution, unsupervised learning is used as a preprocessing step for other machine learning algorithms. This dissertation investigates the application of unsupervised learning to various types of data for machine learning tasks such as clustering, regression and classification. The dissertation is organized into three papers. In the first paper, unsupervised learning is applied to data with mixed categorical and numerical features to transform the data objects from the mixed-type feature domain into a new, sparser numerical domain. By making use of the data fusion capacity of adaptive resonance theory clustering, the approach is able to reduce the distinction between the numerical and categorical features. The second paper presents a novel method to improve wind forecasting by clustering the time series of the surrounding wind mills into similar groups using hidden Markov model clustering and using the clustering information to enhance the forecast. A fast forecast method is also introduced, using an extreme learning machine that can be trained in analytic form to choose the optimal number of past samples for prediction and an appropriate size of the neural network. In the third paper, unsupervised learning is used to automatically learn features from the dataset itself, without human design of sophisticated feature extractors. The paper shows that, with unsupervised feature learning and a multi-quadric radial basis function extreme learning machine, the classifier performs better than several other supervised learning methods. The paper further improves the training speed of the neural network by presenting an algorithm that runs in parallel on a GPU. --Abstract, page iv

    High Performance Computing Techniques to Better Understand Protein Conformational Space

    This thesis presents an amalgamation of high performance computing techniques to gain better insight into protein molecular dynamics. Key aspects of protein function and dynamics can be learned from their conformational space. Datasets that represent the complex nuances of a protein molecule are high dimensional. Efficient dimensionality reduction becomes indispensable for the analysis of such large datasets. Dimensionality reduction forms a substantial portion of this work, and its application has been explored for other datasets as well. The work begins with the parallelization of a known non-linear feature reduction algorithm called Isomap. The code for the algorithm was rewritten in C, with portions of it parallelized using OpenMP. Next, a novel data instance reduction method was devised which evaluates the information content offered by each data point and thereby allows the dataset to be truncated to far fewer data points. Once a framework has been established to reduce the number of variables representing a dataset, the work is extended to explore algebraic topology techniques to extract meaningful information from these datasets. This step helps in sampling the conformations of interest of a protein molecule. The method employs hierarchical clustering to identify classes within a molecule; algebraic topology is then used to analyze these classes. Finally, the work concludes by presenting an approach to the open problem of protein folding. A Monte Carlo based tree search algorithm is put forth to simulate the pathway that a certain protein conformation takes to reach another conformation. The dissertation, in its entirety, offers solutions to a few problems that hinder progress on the vast problem of understanding protein dynamics. The motion of a protein molecule is guided by changes in its energy profile.
In this course the molecule gradually slips from one energy class to another. Structurally, this switch is transient, spanning milliseconds or less, and hence is difficult to capture solely by work in wet laboratories.

    Faster data structures and graphics hardware techniques for high performance rendering

    Computer generated imagery is used in a wide range of disciplines, each with different requirements. As an example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements but must still render efficiently, utilizing huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce the highest rendering quality possible within the given constraints. The first paper in this thesis presents an analytical hardware model together with a feedback system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware are required to minimize energy consumption so as not to overheat. The second paper presents experiments and analysis which consider power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflection and refraction effects has long been considered feasible only in offline rendering, where time isn’t a constraint. The third paper presents a hybrid approach, combining the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high quality reflections and refractions in real time. The fourth and fifth papers present improvements in construction time and quality of Bounding Volume Hierarchies (BVH).
Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer to being a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher quality BVHs. The triangle-splitting construction approach introduced in the fifth paper builds BVHs with quality on a par with an earlier high quality splitting algorithm, while being several times faster in construction time.

    GPU-accelerated Chemical Similarity Assessment for Large Scale Databases

    The assessment of chemical similarity between molecules is a basic operation in chemoinformatics, a computational area concerned with the manipulation of chemical structural information. Comparing molecules is the basis for a wide range of applications such as searching in chemical databases, training prediction models for virtual screening or aggregating clusters of similar compounds. However, currently available multimillion-compound databases represent a challenge for conventional chemoinformatics algorithms, raising the need for faster similarity methods. In this paper, we extensively analyze the advantages of using many-core architectures for calculating some commonly used chemical similarity coefficients such as Tanimoto, Dice or Cosine. Our aim is to provide a wide-breadth proof of concept regarding the usefulness of GPU architectures to chemoinformatics, a class of computing problems still uncovered. In our work, we present a general GPU algorithm for all-to-all chemical comparisons considering both binary fingerprints and floating point descriptors as molecule representations. Subsequently, we adopt optimization techniques to minimize global memory accesses and to further improve efficiency. We test the proposed algorithm on different experimental setups: a laptop with a low-end GPU and a desktop with a more performant GPU. In the former case, we obtain a 4-to-6-fold speed-up over a single-core implementation for fingerprints and a 4-to-7-fold speed-up for descriptors. In the latter case, we obtain a 195-to-206-fold speed-up and a 100-to-328-fold speed-up, respectively.
Funding: National Institutes of Health (U.S.) (grant GM079804); National Institutes of Health (U.S.) (grant GM086145)
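As an illustration of the similarity coefficients named above, the following is a minimal single-threaded Python sketch for binary fingerprints, using the standard bit-count definitions of Tanimoto, Dice and Cosine. It is not the paper's GPU algorithm. The fingerprint encoding (Python integers used as bit sets), the function names, and the convention of returning 0.0 when a fingerprint has no bits set are assumptions made for the example.

```python
import math


def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(bits).count("1")


def similarity(fp_a: int, fp_b: int) -> dict:
    """Tanimoto, Dice and Cosine coefficients for two binary fingerprints."""
    a = popcount(fp_a)          # bits set in A
    b = popcount(fp_b)          # bits set in B
    c = popcount(fp_a & fp_b)   # bits set in both
    if a == 0 or b == 0:
        # Assumed convention: an empty fingerprint is similar to nothing.
        return {"tanimoto": 0.0, "dice": 0.0, "cosine": 0.0}
    return {
        "tanimoto": c / (a + b - c),
        "dice": 2 * c / (a + b),
        "cosine": c / math.sqrt(a * b),
    }


def all_to_all(fingerprints, coefficient="tanimoto"):
    """Dense similarity matrix over every pair of fingerprints."""
    return [
        [similarity(x, y)[coefficient] for y in fingerprints]
        for x in fingerprints
    ]


pair = similarity(0b1111, 0b0110)
matrix = all_to_all([0b1111, 0b0110, 0b0001])
```

Each matrix entry depends only on its own pair of fingerprints, so the entries can be computed independently of one another.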