3 research outputs found

    Graphics processing unit utilization in circuit simulation

    Today's graphics processing units (GPUs) comprise hundreds of multi-threaded, multicore processors and a complex, high-bandwidth memory architecture, which makes them a good option for accelerating general-purpose parallel computation in which large quantities of data are processed with the same functions. Some successful applications of GPU computation have also been introduced in the field of circuit simulation. The objective of this thesis is to examine the GPU's computing potential in the APLAC circuit simulation software. An implementation of a diode model on a GPU device is also presented. The nonlinear diode model was implemented on NVIDIA's Compute Unified Device Architecture (CUDA), a single-instruction, multiple-thread (SIMT) architecture in which one instruction is executed for several threads at a time. The CUDA device was programmed using the CUDA C application programming interface, which is an extension of the standard C language. The test results revealed that, because of the diode's simple nonlinearity, its evaluation is computationally too light to gain any speed benefit from the GPU's computing power. The required modifications to the circuit analysis structure and data handling resulted in a marginally longer total simulation time than the original. However, when the diode model is made more complex by multiplying its evaluation, the CUDA implementation is faster than the original model. This gives a rough estimate of how complex a model must be to benefit from GPU computation. Although the diode model evaluation was not faster on the GPU, this implementation is a good foundation for future CUDA applications in APLAC. The next of these applications will be the computationally more complex BSIM3 transistor model, which will most likely benefit from the computing power of GPU devices.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Image Re-ranking Acceleration On Gpus

    Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
    Huge image collections have recently become available. In this scenario, Content-Based Image Retrieval (CBIR) systems have emerged as a promising approach to support image searches. The objective of a CBIR system is to retrieve the images in a collection that are most similar to a given query image, taking into account visual properties such as texture, color, and shape. In these systems, the effectiveness of the retrieval process depends heavily on the accuracy of the ranking approach. Recently, re-ranking approaches have been proposed to improve the effectiveness of CBIR systems by taking into account the relationships among all images in a given dataset. These approaches typically demand a huge amount of computational power, which hampers their use in practical situations. On the other hand, these methods can be massively parallelized. In this paper, we propose to speed up the computation of the RL-Sim algorithm, a recently proposed image re-ranking approach, by using the computational power of Graphics Processing Units (GPUs). GPUs are emerging as relatively inexpensive parallel processors that are becoming available on a wide range of computer systems. We address the image re-ranking performance challenges by proposing a parallel solution designed to fit the computational model of GPUs. We conducted an experimental evaluation considering different implementations and devices. Experimental results demonstrate that significant performance gains can be obtained. Our approach achieves speedups of 7× over the serial implementation for the overall algorithm and up to 36× on its core steps.
    © 2013 IEEE. Pages 176-183. Sponsors: Brazilian Computer Society (SBC); Brazilian funding agencies CAPES and CNPq; et al.; IEEE Computer Society through the Technical Committees on Computer Architecture (TCCA) and TCSC.