The Euclidean distance is a measure frequently used in numerous applications, including data-analysis algorithms, to determine the similarity between objects as a function of the distance between them. Given a set of objects, performing a distance similarity search consists of finding the objects that are considered similar, i.e., the objects within a threshold search distance of a given object, where the distance measure often employs the Euclidean distance formula. Distance similarity searches can be used as a building block for other algorithms, including the distance similarity join, k-Nearest Neighbors (kNN), k-Means, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). As such, optimizing the computation of Euclidean distances improves the performance of distance similarity searches, which in turn improves the performance of numerous other algorithms. Consequently, optimizing both Euclidean distance calculations and distance similarity searches is critical to improving the performance of many data-analysis algorithms, including those mentioned above, as well as applications in other domains, such as modeling and simulation.
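To illustrate the core computation, consider the following minimal brute-force CUDA sketch of a distance similarity search, in which each thread compares one query point against every point in the set and counts those within the search distance epsilon. The kernel name, data layout, and parameters are illustrative assumptions, not the optimized designs proposed in this dissertation.

// Brute-force distance similarity search sketch (illustrative, unoptimized).
// Each thread handles one query point i and counts the points that lie
// within the search distance epsilon of point i.
__global__ void distanceSimilaritySearch(const float* points, int numPoints,
                                         int dim, float epsilon,
                                         unsigned int* neighborCounts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPoints) return;
    const float eps2 = epsilon * epsilon;  // compare squared distances to avoid sqrt
    unsigned int count = 0;
    for (int j = 0; j < numPoints; ++j) {
        float dist2 = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = points[i * dim + d] - points[j * dim + d];
            dist2 += diff * diff;  // accumulate the squared Euclidean distance
        }
        if (dist2 <= eps2) count++;  // point j is within epsilon of point i
    }
    neighborCounts[i] = count;
}

A host would launch this kernel with one thread per point, e.g., distanceSimilaritySearch<<<(numPoints + 255) / 256, 256>>>(...); comparing squared distances avoids computing a square root per candidate pair without changing the result.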
The literature is rich with methods to improve the performance of Euclidean distance calculations and distance similarity searches, particularly using Central Processing Units (CPUs). While multicore CPUs can offer great parallel performance, they are outclassed by the higher computational throughput of Graphics Processing Units (GPUs). The use of GPUs for these problems is relatively recent, and there are consequently significantly fewer proposed works that use the GPU instead of the CPU. However, the design space for GPU algorithms is large, and some algorithm designs have thus been neglected, including those that carefully exploit GPU resources. Furthermore, while both CPUs and GPUs have been extensively studied on their own, very little work has leveraged both architectures concurrently.
Tensor Cores (TCs) are a recent addition to certain GPU architectures. As an Application-Specific Integrated Circuit (ASIC), TCs are designed to compute Matrix Multiply-Accumulate (MMA) operations at a higher throughput than the general-purpose GPU cores. In the literature, TCs are primarily used for machine learning and related fields involving linear algebra, where they yield substantial performance improvements. Despite their specificity, TCs can be leveraged by any algorithm whose computation can be expressed using MMA operations. Nevertheless, leveraging TCs for general-purpose scientific algorithms remains an open problem.
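To make the MMA connection concrete, a standard algebraic identity (well known in the literature, not specific to this dissertation) rewrites the squared Euclidean distance between points $a$ and $b$ as

$\lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b.$

Consequently, for point sets stored as matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{k \times n}$, the dominant term of the pairwise squared-distance matrix is the matrix product $A B^{\top}$, which maps directly onto MMA operations, while the squared norms can be precomputed once per point.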
We propose in this dissertation to optimize the performance of Euclidean distance calculations and, more generally, distance similarity searches, by examining: (i) GPU resource utilization; (ii) the joint use of both CPUs and GPUs for computation; (iii) the use of TCs to compute Euclidean distances; and (iv) the joint use of general-purpose GPU cores and TCs to compute Euclidean distances.