
4 research outputs found

Exploring the Design Space of Static and Incremental Graph Connectivity Algorithms on GPUs

Author: Acar Umut A.
Bader David A.
Banerjee Dip Sankar
Beamer Scott
Blelloch Guy E.
Chakrabarti Deepayan
Chitnis Laukik
Dhulipala Laxman
Ediger D.
Ester Martin
Green O.
Hambrusch S.
Holm J.
Hsu Tsan-Sheng
Jayanti Siddhartha V.
Karger David R.
Kiveris Raimondas
Liu Sixue
Madduri Kamesh
McColl R.
Merrill Duane
Patwary M. M. A.
Phillips C. A.
Shun Julian
Siddhartha
Slota George M.
Soman J.
Stergiou Stergios
Sutton M.
Wang Yangzihao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/08/2020

Connected components and spanning forest are fundamental graph algorithms due to their use in many important applications, such as graph clustering and image segmentation. GPUs are an ideal platform for graph algorithms due to their high peak performance and memory bandwidth. While there exist several GPU connectivity algorithms in the literature, many design choices have not yet been explored. In this paper, we explore various design choices in GPU connectivity algorithms, including sampling, linking, and tree compression, for both the static and the incremental setting. Our various design choices lead to over 300 new GPU implementations of connectivity, many of which outperform the state of the art. We present an experimental evaluation and show that we achieve an average speedup of 2.47x over existing static algorithms. In the incremental setting, we achieve a throughput of up to 48.23 billion edges per second. Compared to state-of-the-art CPU implementations on a 72-core machine, we achieve a speedup of 8.26-14.51x for static connectivity and 1.85-13.36x for incremental connectivity using a Tesla V100 GPU.
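
The three design choices named in this abstract (sampling, linking, and tree compression) can be illustrated with a small sequential sketch. This is not the paper's GPU code. It assumes a symmetric adjacency list, and it picks one hypothetical variant of each choice: sampling takes the first k neighbors of every vertex, linking places the larger root under the smaller one, and compression uses path halving. The names `find`, `link`, and `connected_components` are illustrative only.

```python
from collections import Counter


def find(parent, v):
    # Tree compression by path halving: each visited node is re-pointed
    # at its grandparent while walking up to the root.
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def link(parent, u, v):
    # Linking rule (one of many possible): the root with the larger id
    # is attached under the root with the smaller id.
    ru, rv = find(parent, u), find(parent, v)
    if ru == rv:
        return
    if ru < rv:
        parent[rv] = ru
    else:
        parent[ru] = rv


def connected_components(adj, k=2):
    # adj must be symmetric: v in adj[u] implies u in adj[v].
    n = len(adj)
    parent = list(range(n))

    # Sampling phase: link only the first k neighbors of every vertex.
    for u in range(n):
        for v in adj[u][:k]:
            link(parent, u, v)

    # Identify the most frequent label after sampling.
    labels = [find(parent, v) for v in range(n)]
    largest = Counter(labels).most_common(1)[0][0] if n else None

    # Finish phase: vertices already in the largest sampled component are
    # skipped. Any edge leaving that component is still processed from
    # its other endpoint because adj is symmetric.
    for u in range(n):
        if labels[u] == largest:
            continue
        for v in adj[u][k:]:
            link(parent, u, v)

    return [find(parent, v) for v in range(n)]
```

With this linking rule, every vertex ends up labeled with the smallest vertex id in its component. The finish phase can skip a vertex's remaining edges only when that vertex is already in the largest sampled component, which is where sampling saves work on graphs with one dominant component.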

arXiv.org e-Print Archive

Crossref

DSpace@MIT

ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms

Author: Dhulipala Laxman
Hong Changwan
Shun Julian
Publication date: 10/08/2020

Connected components is a fundamental kernel in graph applications due to its usefulness in measuring how well-connected a graph is, as well as its use as a subroutine in many other graph algorithms. The fastest existing parallel multicore algorithms for connectivity are based on some form of edge sampling and/or linking and compressing trees. However, many combinations of these design choices have been left unexplored. In this paper, we design the ConnectIt framework, which provides different sampling strategies as well as various tree linking and compression schemes. ConnectIt enables us to obtain several hundred new variants of connectivity algorithms, most of which extend to computing spanning forest. In addition to static graphs, we also extend ConnectIt to support mixes of insertions and connectivity queries in the concurrent setting. We present an experimental evaluation of ConnectIt on a 72-core machine, which we believe is the most comprehensive evaluation of parallel connectivity algorithms to date. Compared to a collection of state-of-the-art static multicore algorithms, we obtain an average speedup of 37.4x (2.36x average speedup over the fastest existing implementation for each graph). Using ConnectIt, we are able to compute connectivity on the largest publicly available graph (with over 3.5 billion vertices and 128 billion edges) in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the fastest existing connectivity result for this graph, in any computational setting. For our incremental algorithms, we show that our algorithms can ingest graph updates at up to several billion edges per second. Finally, to guide the user in selecting the best variants in ConnectIt for different situations, we provide a detailed analysis of the different strategies in terms of their work and locality.
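
The incremental setting described in this abstract, a mix of edge insertions and connectivity queries, can be sketched with a union-find structure. The sketch below is sequential, whereas the abstract describes a concurrent setting, and it does not use ConnectIt's API. The class name `IncrementalConnectivity`, the `process` method, and the `("insert", u, v)` / `("query", u, v)` operation format are assumptions made for illustration. It uses union by size and full path compression, one possible pairing of linking and compression rules.

```python
class IncrementalConnectivity:
    """Sequential sketch: edge insertions mixed with connectivity queries."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, v):
        # First pass: locate the root.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (full path compression): point every node on the
        # path directly at the root.
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def insert(self, u, v):
        # Union by size: attach the smaller tree under the larger one.
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]

    def connected(self, u, v):
        return self.find(u) == self.find(v)

    def process(self, ops):
        # ops is a list of ("insert", u, v) or ("query", u, v) tuples.
        # Returns the answers to the queries, in order.
        answers = []
        for kind, u, v in ops:
            if kind == "insert":
                self.insert(u, v)
            elif kind == "query":
                answers.append(self.connected(u, v))
            else:
                raise ValueError(f"unknown operation: {kind}")
        return answers
```

Each query reflects exactly the insertions that precede it in the stream. A concurrent version would additionally need atomic updates to the parent array, which this sketch does not attempt.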

arXiv.org e-Print Archive

DSpace@MIT

High-Performance Computing Algorithms for Constructing Inverted Files on Emerging Multicore Processors

Author: Wei Zheng
Publication date: 01/01/2012

Current trends in processor architectures increasingly include more cores on a single chip and more complex memory hierarchies, and such a trend is likely to continue in the foreseeable future. These processors offer unprecedented opportunities for speeding up demanding computations if the available resources can be effectively utilized. Simultaneously, parallel programming languages such as OpenMP and MPI have been commonly used on clusters of multicore CPUs while newer programming languages such as OpenCL and CUDA have been widely adopted on recent heterogeneous systems and GPUs respectively. The main goal of this dissertation is to develop techniques and methodologies for exploiting these emerging parallel architectures and parallel programming languages to solve large scale irregular applications such as the construction of inverted files. The extraction of inverted files from large collections of documents forms a critical component of all information retrieval systems including web search engines. In this problem, the disk I/O throughput is the major performance bottleneck especially when intermediate results are written onto disks. In addition to the I/O bottleneck, a number of synchronization and consistency issues must be resolved in order to build the dictionary and postings lists efficiently. To address these issues, we introduce a dictionary data structure using a hybrid of trie and B-trees and a high-throughput pipeline strategy that completely avoids the use of disks as temporary storage for intermediate results, while ensuring the consumption of the input data at a high rate. The high-throughput pipelined strategy produces parallel parsed streams that are consumed at the same rate by parallel indexers. The pipelined strategy is implemented on a single multicore CPU as well as on a cluster of such nodes. We were able to achieve a throughput of more than 262MB/s on the ClueWeb09 dataset on a single node. 
On a cluster of 32 nodes, our experimental results show scalable performance using different metrics, significantly improving on prior published results. In addition, we develop a new approach for handling time-evolving documents using additional small temporal indexing structures. The lifetime of the collection is partitioned into multiple time windows, which guarantees a very fast temporal query response time at a small space overhead relative to the non-temporal case. Extensive experimental results indicate that the overhead in both indexing and querying is small in this more complicated case, and the query performance can indeed be improved using finer temporal partitioning of the collection. Finally, we employ GPUs to accelerate the indexing process for building inverted files and to develop a very fast algorithm for the highly irregular list ranking problem. For the indexing problem, the workload is split between CPUs and GPUs in such a way that the strengths of both architectures are exploited. For the list ranking problem involved in the decompression of inverted files, an optimized GPU algorithm is introduced by reducing the problem to a large number of fine-grained computations in such a way that the processing cost per element is shown to be close to the best possible.
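
For readers unfamiliar with the terms, the dictionary and postings lists mentioned in this abstract can be shown in a minimal sketch. It is not the dissertation's method: a plain Python dict stands in for the hybrid trie and B-tree dictionary, tokenization is simple whitespace splitting, and no pipelining or parallelism is modeled. The function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    # Maps each term to its postings list: the ids of the documents that
    # contain the term, in increasing order.
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        seen = set()
        for term in text.lower().split():
            # Record each document at most once per term.
            if term not in seen:
                seen.add(term)
                index[term].append(doc_id)
    return dict(index)
```

Because documents are visited in id order, every postings list is already sorted, which is the property that list intersection at query time typically relies on.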

Digital Repository at the University of Maryland

Image Re-ranking Acceleration On Gpus

Author: Borin E.
Breternitz M.
Da Torres R.S.
Pedronette D.C.G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/11/2015

Huge image collections are becoming available lately. In this scenario, the use of Content-Based Image Retrieval (CBIR) systems has emerged as a promising approach to support image searches. The objective of CBIR systems is to retrieve the most similar images in a collection, given a query image, by taking into account image visual properties such as texture, color, and shape. In these systems, the effectiveness of the retrieval process depends heavily on the accuracy of ranking approaches. Recently, re-ranking approaches have been proposed to improve the effectiveness of CBIR systems by taking into account the relationships among images. The re-ranking approaches consider the relationships among all images in a given dataset. These approaches typically demand a huge amount of computational power, which hampers their use in practical situations. On the other hand, these methods can be massively parallelized. In this paper, we propose to speed up the computation of the RL-Sim algorithm, a recently proposed image re-ranking approach, by using the computational power of Graphics Processing Units (GPUs). GPUs are emerging as relatively inexpensive parallel processors that are becoming available on a wide range of computer systems. We address the image re-ranking performance challenges by proposing a parallel solution designed to fit the computational model of GPUs. We conducted an experimental evaluation considering different implementations and devices. Experimental results demonstrate that significant performance gains can be obtained. Our approach achieves speedups of 7× over the serial implementation considering the overall algorithm and up to 36× on its core steps.
© 2013 IEEE.

Repositorio da Producao Cientifica e Intelectual da Unicamp