
    A minimalistic approach for fast computation of geodesic distances on triangular meshes

    The computation of geodesic distances is an important research topic in Geometry Processing and 3D Shape Analysis, as it is a basic component of many methods used in these areas. In this work, we present a minimalistic parallel algorithm based on front propagation to compute approximate geodesic distances on meshes. Our method is practical and simple to implement and does not require any heavy pre-processing. The convergence of our algorithm depends on the number of discrete level sets around the source points from which distance information propagates. To implement our method appropriately on GPUs, taking memory-coalescence issues into account, we take advantage of a graph representation based on a breadth-first search traversal that works harmoniously with our parallel front propagation approach. We report experiments that show how our method scales with the size of the problem, and we compare the mean error and processing time of our method against those of other methods. Our method produces results in competitive times with almost the same accuracy, especially for large meshes. We also demonstrate its use in solving two classical geometry processing problems: regular sampling and Voronoi tessellation on meshes. Comment: Preprint submitted to Computers & Graphics.
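
    As a rough illustration of the front-propagation idea (distance values expanding outward from the sources over the mesh connectivity), the sketch below computes shortest-path distances along mesh edges with a sequential Dijkstra front. The mesh representation and the sequential update rule are assumptions made for illustration; the paper's contribution is a parallel, level-set-based GPU propagation, which this sketch does not reproduce.

```python
import heapq
from collections import defaultdict

def approx_geodesic_distances(vertices, edges, sources):
    """Approximate geodesic distances as shortest-path distances over mesh edges.

    vertices: list of (x, y, z) tuples
    edges:    iterable of (i, j) vertex-index pairs (mesh edges)
    sources:  iterable of source vertex indices

    Sequential Dijkstra front propagation, used only to illustrate how distance
    information spreads from the sources; not the paper's parallel GPU method.
    """
    # Build an adjacency list weighted by Euclidean edge length.
    adj = defaultdict(list)
    for i, j in edges:
        diff = [a - b for a, b in zip(vertices[i], vertices[j])]
        w = sum(d * d for d in diff) ** 0.5
        adj[i].append((j, w))
        adj[j].append((i, w))

    dist = [float("inf")] * len(vertices)
    heap = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))

    # Propagate the front: pop the closest unfinished vertex, relax its neighbors.
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```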

    Harvesting graphics power for MD simulations

    We discuss an implementation of molecular dynamics (MD) simulations on a graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results are presented for two MD algorithms, suitable for short-ranged and long-ranged interactions respectively, and for a congruential shift random number generator. The performance of the GPU is compared with that of its main-processor counterpart, and we achieve speedups of up to 80-, 40-, and 150-fold, respectively. With the newest generation of GPUs, one can run standard MD simulations at 10^7 flops/$. Comment: 12 pages, 5 figures. Submitted to Mol. Sim.
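
    To make the kind of computation concrete, here is a minimal NumPy sketch of one short-ranged MD step: an all-pairs truncated Lennard-Jones force evaluation followed by a velocity-Verlet update. The parameter values and the O(N^2) force loop are illustrative assumptions; the paper's CUDA kernels (and its long-ranged and random-number components) are not reproduced here.

```python
import numpy as np

def lj_forces(pos, box, rcut=2.5, eps=1.0, sigma=1.0):
    """All-pairs truncated Lennard-Jones forces with the minimum-image convention.

    Illustrative O(N^2) NumPy version of a short-ranged force evaluation; a real
    GPU code would use cell lists and per-particle kernels instead.
    """
    n = len(pos)
    f = np.zeros_like(pos)
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]                 # vectors from particle i to j > i
        d -= box * np.round(d / box)             # minimum image
        r2 = np.sum(d * d, axis=1)
        mask = r2 < rcut * rcut
        inv_r2 = np.where(mask, sigma ** 2 / np.maximum(r2, 1e-12), 0.0)
        inv_r6 = inv_r2 ** 3                     # (sigma/r)^6, zero beyond cutoff
        coef = 24.0 * eps * inv_r6 * (2.0 * inv_r6 - 1.0) / np.maximum(r2, 1e-12)
        fij = coef[:, None] * d                  # force on j due to i
        f[i] -= np.sum(fij, axis=0)              # Newton's third law
        f[i + 1:] += fij
    return f

def velocity_verlet(pos, vel, box, dt=0.005, mass=1.0):
    """One velocity-Verlet integration step; positions are wrapped into the box."""
    f = lj_forces(pos, box)
    vel += 0.5 * dt * f / mass
    pos = (pos + dt * vel) % box
    f = lj_forces(pos, box)
    vel += 0.5 * dt * f / mass
    return pos, vel
```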

    QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Previous studies have reported that common dense linear algebra operations do not achieve a speedup when using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed-memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent by the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication-Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010, Atlanta, GA, USA).
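
    The communication-avoiding idea can be sketched in a few lines of NumPy: factor each row block of the tall-and-skinny matrix independently, then factor the stacked R factors once, so that only small n-by-n blocks ever need to be exchanged between sites. This is a one-level, in-memory illustration under assumed block sizes, not the QCG-OMPI/Grid'5000 implementation evaluated in the paper.

```python
import numpy as np

def tsqr(A, block_rows):
    """One-level TSQR (Communication-Avoiding QR) for a tall-and-skinny matrix.

    Each row block is factored independently (in the grid setting, on a different
    node or site); only the small R factors are gathered and re-factored, which
    is what trades extra flops for far fewer messages.
    """
    m, n = A.shape
    blocks = [A[i:i + block_rows] for i in range(0, m, block_rows)]

    # Stage 1: independent local QR factorizations of each row block.
    local = [np.linalg.qr(B) for B in blocks]                # list of (Q1, R1)

    # Stage 2: a single small QR of the stacked R factors.
    Q2, R = np.linalg.qr(np.vstack([R1 for _, R1 in local]))

    # Recombine: each block's slice of Q2 updates its local Q factor.
    Q_blocks, offset = [], 0
    for Q1, R1 in local:
        rows = R1.shape[0]
        Q_blocks.append(Q1 @ Q2[offset:offset + rows])
        offset += rows
    return np.vstack(Q_blocks), R

# Sanity check on a random tall-and-skinny matrix.
A = np.random.rand(1000, 8)
Q, R = tsqr(A, block_rows=250)
assert np.allclose(Q @ R, A)
```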

    RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures

    Chiplet architectures are a promising paradigm for overcoming the scaling challenges of monolithic chips, offering heterogeneity, modularity, and cost-effectiveness. The design space of chiplet architectures is huge, as there are many degrees of freedom, such as the number, size, and placement of chiplets, the topology of the inter-chiplet interconnect, and many more. Existing tools for cost and performance prediction are often too slow to explore this design space. We present RapidChiplet, a fast, open-source toolchain to predict the latency and throughput of the inter-chiplet interconnect, as well as a chip's manufacturing cost and thermal stability.
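
    As an illustration of the kind of fast analytical model such a toolchain relies on, the sketch below estimates manufacturing cost with a generic negative-binomial yield model and compares a monolithic die against a chiplet-based design. The formula, parameter values, and function names are illustrative assumptions, not RapidChiplet's actual cost model.

```python
def die_yield(area_mm2, defect_density=0.001, clustering=3.0):
    """Negative-binomial yield model: fraction of dies of a given area that work.

    defect_density is in defects per mm^2; both parameter values are assumed
    for illustration only.
    """
    return (1.0 + area_mm2 * defect_density / clustering) ** (-clustering)

def cost_per_good_die(area_mm2, wafer_cost_per_mm2=0.1, assembly_overhead=0.0):
    """Silicon cost of one working die, amortizing the dies lost to defects."""
    return area_mm2 * wafer_cost_per_mm2 / die_yield(area_mm2) + assembly_overhead

# Compare a monolithic 800 mm^2 chip with 8 chiplets of 100 mm^2 each.
# Smaller dies yield much better, which is one of the cost arguments for
# chiplet architectures; assembly_overhead stands in for extra packaging cost.
monolithic = cost_per_good_die(800.0)
chiplet_based = 8 * cost_per_good_die(100.0, assembly_overhead=1.0)
print(f"monolithic: {monolithic:.1f}  chiplet-based: {chiplet_based:.1f}  (arbitrary units)")
```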

    Degree-Driven Design of Geometric Algorithms for Point Location, Proximity, and Volume Calculation

    Correct implementation of published geometric algorithms is surprisingly difficult. Geometric algorithms are often designed for the Real-RAM, a computational model that provides arbitrary-precision arithmetic operations at unit cost. Actual commodity hardware provides only finite precision, which may result in arithmetic errors. While the errors may seem small, if ignored they may cause incorrect branching, which may cause an implementation to reach an undefined state, produce erroneous output, or crash. In 1999, Liotta, Preparata, and Tamassia proposed that, in addition to considering the resources of time and space, an algorithm designer should also consider the arithmetic precision necessary to guarantee a correct implementation. They called this design technique degree-driven algorithm design. Designers who consider the time, space, and precision for a problem up front arrive at new solutions, gain further insight, and find simpler representations. In this thesis, I show that degree-driven design supports the development of new and robust geometric algorithms. I demonstrate this claim via several new algorithms. For n point sites on a UxU grid, I consider three problems. First, I show how to compute the nearest-neighbor transform in O(U^2) expected time, O(U^2) space, and double precision. Second, I show how to create a data structure in O(n log(Un)) expected time, O(n) expected space, and triple precision that supports O(log n)-time, double-precision post-office queries. Third, I show how to compute the Gabriel graph in O(n^2) time, O(n^2) space, and double precision. For computing volumes of CSG models, I describe a framework that uses a minimal set of predicates requiring at most five-fold precision. The framework is over 500x faster and two orders of magnitude more accurate than a Monte Carlo volume calculation algorithm.
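
    The Gabriel graph result gives a feel for what "double precision" means in the degree-driven sense: with integer site coordinates, deciding whether a third site lies inside the disk whose diameter is a candidate edge needs only products of two coordinate differences, i.e. a degree-2 predicate. The naive O(n^3) construction below is an assumed illustration of that predicate, not the thesis's O(n^2) algorithm.

```python
from itertools import combinations

def in_diametral_disk(p, q, r):
    """True if site r lies strictly inside the open disk with diameter pq.

    With integer input coordinates this multiplies only pairs of coordinate
    differences, so it is a degree-2 ("double precision") predicate in the
    degree-driven sense and is exact with Python integers.
    """
    return ((p[0] - r[0]) * (q[0] - r[0]) + (p[1] - r[1]) * (q[1] - r[1])) < 0

def gabriel_graph(sites):
    """Naive O(n^3) Gabriel graph built from the degree-2 predicate above.

    An edge (i, j) is kept when no third site lies strictly inside its
    diametral disk (boundary ties are kept here for simplicity).
    """
    edges = []
    for i, j in combinations(range(len(sites)), 2):
        p, q = sites[i], sites[j]
        if not any(in_diametral_disk(p, q, sites[k])
                   for k in range(len(sites)) if k not in (i, j)):
            edges.append((i, j))
    return edges

# Example on a small set of integer-coordinate sites.
print(gabriel_graph([(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]))
```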