Performance Comparison of GPU-Based Jacobi Solvers Using CUDA Provided Synchronization Methods

Author: Asif Ali Daniyal
Aslam Maria
Mumtaz Shahzad
Riaz Omer
Publication date: 24/02/2020

Peer reviewed

Aberdeen University Research

PRISM-PSY: Precise GPU-Accelerated Parameter Synthesis for Stochastic Systems

Author: Brim L
Ceska M
Kwiatkowska MZ
Paoletti N
Pilar P
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

In this paper we present PRISM-PSY, a novel tool that performs precise GPU-accelerated parameter synthesis for continuous-time Markov chains and time-bounded temporal logic specifications. We redesign, in terms of matrix-vector operations, the recently formulated algorithms for precise parameter synthesis in order to enable effective data-parallel processing, which results in significant acceleration on many-core architectures. High hardware utilisation, essential for performance and scalability, is achieved by state space and parameter space parallelisation: the former leverages a compact sparse-matrix representation, and the latter is based on an iterative decomposition of the parameter space. Our experiments on several biological and engineering case studies demonstrate an overall speedup of up to 31-fold on a single GPU compared to the sequential implementation.

Crossref

Royal Holloway - Pure

Oxford University Research Archive

Tools and Algorithms for the Construction and Analysis of Systems

Publication venue: 'Springer Science and Business Media LLC'

This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 41 full papers presented in the proceedings were carefully reviewed and selected from 141 submissions. The set also contains 7 tool papers, 6 tool demo papers, and 9 SV-Comp competition papers. The papers are organized in topical sections as follows. Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers.

OAPEN Library

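PROGRAMAÇÃO PARALELA DE UM MÉTODO ITERATIVO PARA SOLUÇÃO DE GRANDES SISTEMAS DE EQUAÇÕES LINEARES USANDO A INTEGRAÇÃO CUDA-MATLAB [Parallel Programming of an Iterative Method for Solving Large Systems of Linear Equations Using CUDA-Matlab Integration]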

Author: Martins de Paula Lauro Cássio
Publication venue: 'Revista de Sistemas e Computacao - RSC'
Publication date: 04/10/2014

Systems of linear equations arise when modeling many problems in mathematics, engineering, and computer science. The Bi-Conjugate Gradient Stabilized (BiCGStab) method is an iterative method for solving linear systems, especially large, sparse ones. This paper proposes a parallel implementation of the BiCGStab method for solving large linear systems. The implementation runs on a Graphics Processing Unit (GPU) through the CUDA-Matlab integration, in which the operations of the method are executed on the processing cores of the GPU by Matlab built-in functions. It aims to deliver higher computational performance than the sequential implementation. In addition, we compare the computational performance of BiCGStab with an implementation of the Hybrid Bi-Conjugate Gradient Stabilized (BiCGStab(2)) method, recently proposed by the author, on random linear systems of varying sizes. The results showed that the parallelized BiCGStab is more efficient in solving the systems tested. Gains in computational efficiency of approximately 5x were obtained relative to the sequential implementation of BiCGStab. Compared with BiCGStab(2), the parallelized BiCGStab was on average 2x faster.
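For readers unfamiliar with the method named in this abstract, the following is a minimal sketch of the standard unpreconditioned BiCGStab iteration. It is not the paper's implementation: the paper runs Matlab built-in functions on a GPU, whereas this sketch is sequential pure Python on small dense matrices. The names `bicgstab`, `matvec`, and `dot`, the zero initial guess, and the tolerance are all assumptions made for illustration, and breakdown cases (a zero denominator) are not handled.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Illustrative BiCGStab solve of A x = b, starting from x = 0.

    Assumption: no breakdown occurs, so no denominator is ever zero.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the zero initial guess
    r_hat = list(r)      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        # Update the search direction.
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        # Intermediate residual after the BiCG half-step.
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = s
            break
        t = matvec(A, s)
        # Stabilizing step: omega minimizes the norm of s - omega * t.
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(bicgstab(A, b))
```

Each iteration costs two matrix-vector products plus a handful of inner products and vector updates. These are the kinds of operations that a GPU implementation would parallelize.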

Universidade Salvador: Portal de Periódicos UNIFACS

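IMPLEMENTAÇÃO PARALELA DO MÉTODO BICGSTAB(2) EM GPU USANDO CUDA E MATLAB PARA SOLUÇÃO DE SISTEMAS LINEARES [Parallel Implementation of the BiCGStab(2) Method on GPU Using CUDA and Matlab for Solving Linear Systems]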

Author: Martins de Paula Lauro Cássio
Publication venue: 'Revista de Sistemas e Computacao - RSC'
Publication date: 24/12/2013

This paper presents a parallel implementation of the Hybrid Bi-Conjugate Gradient Stabilized (BiCGStab(2)) iterative method on a Graphics Processing Unit (GPU) for solving large, sparse linear systems. The implementation uses the CUDA-Matlab integration, in which the operations of the method are executed on the GPU cores by Matlab built-in functions. The goal is to show that exploiting parallelism with this technology can provide significant computational performance. To validate the work, we compared the proposed implementation with a sequential implementation of BiCGStab(2) in C and a parallelized implementation in CUDA-C. The results showed that the proposed implementation is more efficient and can be indispensable for carrying out simulations with quality and in a timely manner. The gains in computational efficiency were 76x and 6x relative to the C and CUDA-C implementations, respectively.

Universidade Salvador: Portal de Periódicos UNIFACS

Tools and Algorithms for the Construction and Analysis of Systems

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/04/2022

This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems.

Directory of Open Access Books (DOAB)

The Modest State of Learning, Sampling, and Verifying Strategies

Author: Hartmanns Arnd
Klauck Michaela
Publication venue: Springer
Publication date: 17/10/2022

University of Twente Research Information

Neuromorphic Learning Systems for Supervised and Unsupervised Applications

Author: Chen Qiuwen
Publication venue: SURFACE at Syracuse University
Publication date: 01/12/2016

Advances in high performance computing (HPC) have enabled the large-scale implementation of neuromorphic learning models and pushed research on computational intelligence into a new era. These bio-inspired models are constructed on top of unified building blocks, i.e. neurons, and have shown potential for learning complex information. Two major challenges remain in neuromorphic computing. Firstly, sophisticated structuring methods are needed to determine the connectivity of the neurons in order to model various problems accurately. Secondly, the models need to adapt to non-traditional architectures for improved computation speed and energy efficiency. In this thesis, we address these two problems and apply our techniques to different cognitive applications. This thesis first presents the self-structured confabulation network for anomaly detection. Among machine learning applications, unsupervised detection of anomalous streams is especially challenging because it requires both detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of multicore systems while maintaining high sensitivity and specificity to anomalies is an urgent research need. We present AnRAD (Anomaly Recognition And Detection), a bio-inspired detection framework that performs probabilistic inferences. We leverage the mutual information between the features and develop a self-structuring procedure that learns a succinct confabulation network from unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base from the data streams. Compared to several existing anomaly detection methods, the proposed approach provides competitive detection accuracy as well as insight for reasoning about the decision making. Furthermore, we exploit the massively parallel structure of the AnRAD framework. 
Our implementations of the recall algorithms on the graphics processing unit (GPU) and the Xeon Phi co-processor both obtain substantial speedups over the sequential implementation on a general-purpose microprocessor (GPP). The implementation enables real-time service to concurrent data streams with diversified contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle abnormal behavior detection, the framework is able to monitor up to 16000 vehicles and their interactions in real time with a single commodity co-processor, and uses less than 0.2 ms for each testing subject. When adapting our streaming anomaly detection model to mobile devices or unmanned systems, the key challenge is to deliver the required performance under a stringent power constraint. To address the tension between performance and power consumption, brain-inspired hardware, such as the IBM Neurosynaptic System, has been developed to enable low-power implementation of neural models. As a follow-up to the AnRAD framework, we propose to port the detection network to the TrueNorth architecture. Implementing inference-based anomaly detection on a neurosynaptic processor is not straightforward due to hardware limitations. A design flow and the supporting component library are developed to flexibly map the learned detection networks to the neurosynaptic cores. Instead of the popular rate code, burst code is adopted in the design, which represents a numerical value using the phase of a burst of spike trains. This not only reduces the hardware complexity, but also increases the result's accuracy. A Corelet library, NeoInfer-TN, is implemented for basic operations in burst code, and two-phase pipelines are constructed based on the library components. The design can be configured for different tradeoffs between detection accuracy, hardware resource consumption, throughput and energy. 
We evaluate the system using network intrusion detection data streams. The results show a higher detection rate than some conventional approaches and real-time performance, with only 50 mW power consumption. Overall, it achieves 10^8 operations per Joule. In addition to the modeling and implementation of unsupervised anomaly detection, we also investigate a supervised learning model based on neural networks and deep fragment embedding and apply it to text-image retrieval. The study aims at bridging the gap between image and natural language. It continues to improve the bidirectional retrieval performance across the modalities. Unlike existing works that target a single sentence densely describing the image objects, we elevate the topic to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages. The first stage learns the relevancy of the text fragments, and the second stage uses the filtered output from the first one to improve the matching results. The model also integrates multiple convolutional neural networks (CNN) to construct the image fragments, in which rich context information such as human faces can be extracted to increase the alignment accuracy. The proposed method is evaluated with both a synthetic dataset and a real-world dataset collected from a picture news website. The results show up to 50% ranking performance improvement over the comparison models.

Syracuse University Research Facility and Collaborative Environment