Search CORE

20 research outputs found

Exploiting hybrid parallelism in the kinematic analysis of multibody systems based on group equations

Author: Bernabé García Gregorio
Cano Lorente José Carlos
Cuenca Muñoz Antonio Javier
Flores Gil Antonio
Giménez Cánovas Domingo
Saura Sánchez Mariano
Segado Cabezos Pablo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Computational kinematics is a fundamental tool for the design, simulation, control, optimization and dynamic analysis of multibody systems. The analysis of complex multibody systems and the need for real-time solutions require kinematic and dynamic formulations that reduce computational cost, the selection and efficient use of the most appropriate solvers, and the exploitation of all available computing resources through parallel computing techniques. The topological approach based on group equations and natural coordinates reduces computation time compared with well-known global formulations, and it enables parallel techniques that can be applied at different levels: simultaneous solution of equations, use of multithreaded routines, or a combination of both. This paper studies and compares this topological formulation and these parallel techniques to ascertain which combination performs better in two applications. The first application uses dedicated systems for the real-time control of small multibody systems, defined by a small number of equations and small linear systems, so shared-memory parallelism combined with linear algebra routines is analyzed on a small multicore and on a Raspberry Pi. The control of a Stewart platform is used as a case study. The second application studies large multibody systems, for which the kinematic analysis must be performed several times during the design process. A simulator that allows control over the formulation, the solver, the parallel techniques and the problem size has been developed and tested on more powerful computational systems with larger multicores and GPUs. This work was supported by the Spanish MINECO, as well as European Commission FEDER funds, under grant TIN2015-66972-C5-3-

Repositorio Digital de la Universidad Politécnica de Cartagena

Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

Author: Anam Mohammad Ashraful
Andreopoulos Yiannis
Publication date: 16/04/2016

A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on $M$ integer data streams ($M \geq 3$), such as scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the $M$ input integer data streams are linearly superimposed to form $M$ numerically-entangled integer data streams that are stored in place of the original inputs. A series of LSB operations can then be performed directly on these entangled data streams. The results are extracted from the $M$ entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when a separate processor core is used for each of the $M$ streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction and validation of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor (Haswell architecture with AVX2 support) via fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between 0.03% and 7% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy. Comment: to appear in IEEE Trans. on Signal Processing, 201

arXiv.org e-Print Archive

UCL Discovery
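The numerical-entanglement abstract rests on one property: once integer streams are linearly superimposed, a linear operation applied to the superimposed data acts on every stream at once, and the individual results come back out with shifts and additions. A toy two-stream packing illustrates that property. This is a simplified stand-in, not the authors' entanglement scheme (which uses $M \geq 3$ streams and adds the reliability check); the guard width `L_BITS` and the helpers `pack`, `unpack` and `convolve` are hypothetical names chosen for the sketch.

```python
# Toy two-stream packing: NOT the authors' entanglement scheme (no error
# detection, only two streams). L_BITS is a hypothetical guard width.
L_BITS = 32


def pack(a: int, b: int) -> int:
    """Linearly superimpose two integers: a * 2**L_BITS + b."""
    return (a << L_BITS) + b


def unpack(p: int) -> tuple[int, int]:
    """Recover (a, b) with an arithmetic shift and a subtraction.

    Valid while -2**(L_BITS-1) <= b < 2**(L_BITS-1). Adding half the
    range before shifting rounds to the nearest multiple of 2**L_BITS,
    so a negative b does not corrupt a.
    """
    half = 1 << (L_BITS - 1)
    a = (p + half) >> L_BITS  # Python's >> on ints is an arithmetic (floor) shift
    b = p - (a << L_BITS)
    return a, b


def convolve(x: list[int], h: list[int]) -> list[int]:
    """Full-length integer linear convolution (a linear operation)."""
    out = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


if __name__ == "__main__":
    a = [1, -2, 3, 4]
    b = [5, 6, -7, 8]
    h = [2, -1, 3]
    # One convolution over the packed stream processes both inputs at once.
    packed = [pack(x, y) for x, y in zip(a, b)]
    results = [unpack(v) for v in convolve(packed, h)]
    print([r[0] for r in results])  # [2, -5, 11, -1, 5, 12], equals convolve(a, h)
    print([r[1] for r in results])  # [10, 7, -5, 41, -29, 24], equals convolve(b, h)
```

Because convolution is linear, convolving `2**L_BITS * a + b` with `h` gives `2**L_BITS * (a conv h) + (b conv h)`, so both outputs are recovered from a single shift and subtraction as long as every output of the low stream stays inside the guard range.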

A Combined MPI-CUDA Parallel Solution of Linear and Nonlinear Poisson-Boltzmann Equation

Publication venue: 'Hindawi Limited'
Publication date: 01/01/2014

Crossref

A Hybrid Multi-GPU Implementation of Simplex Algorithm with CPU Collaboration

Author: Mamalis Basilis
Perlitis Marios
Publication date: 20/11/2022

The simplex algorithm has been used successfully for many years to solve linear programming (LP) problems. Because of the intensive computation required (especially for large LP problems), parallel approaches have also been studied extensively. The computational power of modern GPUs, together with the rapid development of multicore CPU systems, has made the OpenMP and CUDA programming models top choices in recent years. However, achieving efficient collaboration between CPU and GPU through the combined use of these programming models is still considered a hard research problem. In this context, we present a highly efficient implementation of the standard simplex method that aims at the best possible exploitation of all computing resources used concurrently on a multicore platform with multiple CUDA-enabled GPUs. More concretely, we present a novel hybrid collaboration scheme based on the concurrent execution of suitably distributed CPU-assigned (via multithreading) and GPU-offloaded computations. The experimental results, obtained through the cooperative use of OpenMP and CUDA on a notably powerful modern hybrid platform (32 cores and two high-end GPUs, a Titan RTX and an RTX 2080 Ti), show that the hybrid GPU/CPU collaboration scheme presented here clearly outperforms the GPU-only implementation under almost all conditions. The corresponding measurements validate the value of using all resources concurrently, even on a multi-GPU platform. Furthermore, the presented implementations are fully comparable (and slightly superior in most cases) to other related attempts in the literature, and clearly superior to the native CPU implementation with 32 cores. Comment: 12 page

arXiv.org e-Print Archive
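The hybrid simplex abstract describes dividing the work of each simplex iteration between CPU threads and GPUs. A rough CPU-only sketch of that decomposition, assuming a textbook tableau simplex for maximizing $c^T x$ subject to $Ax \le b$, $x \ge 0$, $b \ge 0$: after the pivot row is normalized, the remaining row eliminations are split into chunks that a thread pool runs concurrently, standing in (hypothetically) for the CPU-assigned and GPU-offloaded shares. This is not the authors' OpenMP/CUDA implementation; `simplex_max`, `pivot_rows`, `workers` and `EPS` are illustrative names, and Python threads here only demonstrate the split, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

EPS = 1e-12  # hypothetical tolerance for "zero" in pivoting decisions


def pivot_rows(T, rows, pr, pc):
    """Eliminate column pc from the listed rows using pivot row pr, in place."""
    for r in rows:
        if r == pr:
            continue
        f = T[r][pc]
        if f != 0.0:
            T[r] = [v - f * p for v, p in zip(T[r], T[pr])]


def simplex_max(c, A, b, workers=2):
    """Maximize c^T x subject to A x <= b, x >= 0 (assumes b >= 0)."""
    m, n = len(A), len(c)
    # Constraint rows: [A | I | b]; last row is the objective [-c | 0 | 0].
    T = [[float(v) for v in A[i]]
         + [1.0 if j == i else 0.0 for j in range(m)]
         + [float(b[i])] for i in range(m)]
    T.append([-float(v) for v in c] + [0.0] * (m + 1))
    basis = [n + i for i in range(m)]  # slack variables start in the basis
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            obj = T[-1]
            pc = min(range(n + m), key=lambda j: obj[j])  # Dantzig's rule
            if obj[pc] >= -EPS:
                break  # no improving column: current basis is optimal
            ratios = [(T[i][-1] / T[i][pc], i) for i in range(m) if T[i][pc] > EPS]
            if not ratios:
                raise ValueError("LP is unbounded")
            _, pr = min(ratios)  # minimum-ratio test, ties broken by row index
            piv = T[pr][pc]
            T[pr] = [v / piv for v in T[pr]]
            basis[pr] = pc
            # Split the remaining row eliminations into interleaved chunks
            # and run them concurrently (stand-in for a CPU/GPU split).
            rows = list(range(m + 1))
            chunks = [rows[k::workers] for k in range(workers)]
            list(pool.map(lambda ch: pivot_rows(T, ch, pr, pc), chunks))
    x = [0.0] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return T[-1][-1], x


if __name__ == "__main__":
    # max 3x1 + 5x2  s.t.  x1 <= 4,  2x2 <= 12,  3x1 + 2x2 <= 18
    z, x = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
    print(z, x)  # 36.0 [2.0, 6.0]
```

Each chunk writes only its own rows and reads the already-normalized pivot row, so the chunks are independent within an iteration. Interleaved chunks keep the partitions balanced; a real hybrid scheme would presumably size the CPU and GPU shares by their throughput, which this sketch does not attempt, and it includes no anti-cycling rule beyond the index tie-break.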

Reliable Linear, Sesquilinear, and Bijective Operations on Integer Data Streams Via Numerical Entanglement

Author: Anam MA
Andreopoulos Y
Publication date: 29/04/2016

A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on $M$ integer data streams ($M \geq 3$), such as scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, $M$ input integer data streams are linearly superimposed to form $M$ numerically-entangled integer data streams that are stored in place of the original inputs. LSB operations can then be performed directly on these entangled data streams. The results are extracted from the $M$ entangled output streams by additions and arithmetic shifts. Any soft errors affecting one disentangled output stream are guaranteed to be detectable via a post-computation reliability check. Additionally, when a separate processor core is used for each stream, our approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entire process is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor via several types of operations: fast Fourier transforms, convolutions, and matrix multiplication. Our analysis and experiments reveal that the proposed approach incurs between 0.03% and 7% reduction in processing throughput for numerous LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.

UCL Discovery

Towards dense linear algebra for hybrid GPU accelerated manycore systems

Author: Buttari
Dongarra
Higham
Hruska
Jack Dongarra
Kågström
Marc Baboulin
Owens
Owens
Pharr
Seiler
Stanimire Tomov
Publication venue: 'Elsevier BV'

Crossref

Parallel Model Counting with CUDA: Algorithm Engineering for Efficient Hardware Utilization

Author: Fichte Johannes K.
Hecher Markus
Roland Valentin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th International Conference on Principles and Practice of Constraint Programming (CP 2021)
Publication date: 01/01/2021

Propositional model counting (MC), its extensions, and its applications in probabilistic reasoning have received renewed attention in recent years. As a result, solving counting-based problems quickly with automated solvers has become critical in certain areas. In this paper, we present experiments evaluating various techniques to improve the performance of parallel model counting on general-purpose graphics processing units (GPGPUs). In particular, we focus on engineering efficient algorithms for model counting on GPGPUs that exploit the treewidth of a propositional formula by means of dynamic programming. Combining our techniques yields the solver GPUSAT3, built on CUDA, a programming framework that, compared with other frameworks, offers superior extensibility and driver support. Taking all findings of this work together, we show that GPUSAT3 not only solves more instances of the recent Model Counting Competition 2020 (MCC 2020) than existing GPGPU-based systems, but also solves them significantly faster. A portfolio combining one of the best solvers of MCC 2020 with GPUSAT3 solves 19% more instances than the former alone, in less than half the runtime.

Dagstuhl Research Online Publication Server

Performance evaluation of clusters using rCUDA by means of the HPL application

Author: Castelló Adrián
Publication venue: 'Universitat Jaume I'
Publication date: 01/01/2014

Master's Thesis in Intelligent Systems. Code: SIU043. Academic year 2013-2014. This document describes the project carried out in the course SIU043-Master's Thesis. The work was carried out in the High Performance Computing and Architectures research group of the Department of Computer Engineering and Science at the Universitat Jaume I, under the supervision of Rafael Mayo Gual. The project focused on evaluating the performance of the rCUDA software by means of the Linpack Benchmark application. rCUDA allows a CUDA application to run on a node that has no GPU installed by using, over the interconnection network, a GPU installed in another node as if it were local. The goal of this work is to provide rCUDA with the functionality needed to run this test and then to analyze the resulting performance, which is to be compared with that of the same test run on a node using CUDA.

Repositori Institucional de la Universitat Jaume I