A minimalistic approach for fast computation of geodesic distances on triangular meshes
The computation of geodesic distances is an important research topic in
Geometry Processing and 3D Shape Analysis as it is a basic component of many
methods used in these areas. In this work, we present a minimalistic parallel
algorithm based on front propagation to compute approximate geodesic distances
on meshes. Our method is practical and simple to implement and does not require
any heavy pre-processing. The convergence of our algorithm depends on the
number of discrete level sets around the source points from which distance
information propagates. To appropriately implement our method on GPUs taking
into account memory coalescence problems, we take advantage of a graph
representation based on a breadth-first search traversal that works
harmoniously with our parallel front propagation approach. We report
experiments that show how our method scales with the size of the problem. We
compare the mean error and processing time obtained by our method with such
measures computed using other methods. Our method produces results in
competitive times with almost the same accuracy, especially for large meshes.
We also demonstrate its use for solving two classical geometry processing
problems: the regular sampling problem and the Voronoi tessellation on meshes.
Comment: Preprint submitted to Computers & Graphic
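To make the front propagation idea concrete, here is a minimal sketch, not the authors' algorithm. It assumes distances travel along mesh edges only (the paper updates across triangles), so the results are upper bounds on the true geodesic distances. The function name `propagate_distances` and its arguments are hypothetical. Each sweep relaxes every front vertex from the same snapshot of distances, which is the property that lets a GPU version process a front in parallel.

```python
import math

def propagate_distances(vertices, edges, sources):
    """Approximate distances from `sources` by sweeping edge relaxations.

    Assumption: distance travels along edges only, so the values are
    upper bounds on geodesic distances across triangle faces.
    """
    # Adjacency list with Euclidean edge lengths.
    neighbors = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        length = math.dist(vertices[a], vertices[b])
        neighbors[a].append((b, length))
        neighbors[b].append((a, length))

    dist = [math.inf] * len(vertices)
    for s in sources:
        dist[s] = 0.0

    front = set(sources)
    while front:
        # All front vertices read from one snapshot, so the relaxations
        # within a sweep do not depend on each other.
        snapshot = list(dist)
        next_front = set()
        for u in front:
            for v, length in neighbors[u]:
                candidate = snapshot[u] + length
                if candidate < dist[v]:
                    dist[v] = candidate
                    next_front.add(v)
        front = next_front
    return dist
```

The loop ends when a sweep improves no vertex, so the number of sweeps grows with the number of fronts around the sources, in line with the convergence remark in the abstract.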
Harvesting graphics power for MD simulations
We discuss an implementation of molecular dynamics (MD) simulations on a
graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code
on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms
suitable for short-ranged and long-ranged interactions, and a congruential
shift random number generator are presented. The performance of the GPUs is
compared to that of their main processor counterparts. We achieve speedups of up
to 80, 40 and 150 fold, respectively. With the newest generation of GPUs one can
run standard MD simulations at 10^7 flops/$.
Comment: 12 pages, 5 figures. Submitted to Mol. Si
QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
Previous studies have reported that common dense linear algebra operations do
not achieve speedups when using multiple geographical sites of a computational
grid. Because such operations are the building blocks of most scientific
applications, conventional supercomputers are still strongly predominant in
high-performance computing and the use of grids for speeding up large-scale
scientific problems is limited to applications exhibiting parallelism at a
higher level. We have identified two performance bottlenecks in the distributed
memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear
algebra library. First, because ScaLAPACK assumes a homogeneous communication
network, the implementations of ScaLAPACK algorithms lack locality in their
communication pattern. Second, the number of messages sent in the ScaLAPACK
algorithms is significantly greater than in other algorithms that trade flops for
communication. In this paper, we present a new approach for computing a QR
factorization -- one of the main dense linear algebra kernels -- of tall and
skinny matrices in a grid computing environment that overcomes these two
bottlenecks. Our contribution is to combine a recently proposed algorithm
(Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in
order to confine intensive communications (ScaLAPACK calls) within the
different geographical sites. An experimental study conducted on the Grid'5000
platform shows that the resulting performance increases linearly with the
number of geographical sites on large-scale problems (and is in particular
consistently higher than ScaLAPACK's).
Comment: Accepted at IPDPS10. (IEEE International Parallel & Distributed
Processing Symposium 2010 in Atlanta, GA, USA.)
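The reduction behind a communication-avoiding QR of a tall and skinny matrix can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation. Each row block stands in for the data held at one site, a plain modified Gram-Schmidt routine stands in for the ScaLAPACK calls, and every block is assumed to have full column rank and at least as many rows as columns. The names `qr_r_factor` and `tsqr_r` are hypothetical.

```python
def qr_r_factor(rows):
    """Upper-triangular R of a QR factorization (modified Gram-Schmidt).

    Assumes `rows` has full column rank and at least as many rows as columns.
    """
    m, n = len(rows), len(rows[0])
    # Work column by column; each column is orthogonalized against earlier ones.
    cols = [[rows[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in cols[j]) ** 0.5
        q = [x / r[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r

def tsqr_r(matrix, num_blocks):
    """R factor of a tall and skinny matrix via a one-level TSQR reduction."""
    block_size = len(matrix) // num_blocks
    stacked = []
    for b in range(num_blocks):
        start = b * block_size
        # The last block absorbs any leftover rows.
        end = len(matrix) if b == num_blocks - 1 else start + block_size
        # Local factorization: only the small n x n factor leaves the block.
        stacked.extend(qr_r_factor(matrix[start:end]))
    # A second factorization of the stacked local factors gives the final R.
    return qr_r_factor(stacked)
```

In this sketch only the n x n local factors cross block boundaries, which is the kind of locality the abstract describes when it confines intensive communication within each geographical site.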
RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures
Chiplet architectures are a promising paradigm to overcome the scaling
challenges of monolithic chips. Chiplets offer heterogeneity, modularity, and
cost-effectiveness. The design space of chiplet architectures is huge as there
are many degrees of freedom such as the number, size and placement of chiplets,
the topology of the inter-chiplet interconnect and many more. Existing tools
for cost and performance prediction are often too slow to explore this design
space. We present RapidChiplet, a fast, open-source toolchain to predict
latency and throughput of the inter-chiplet interconnect, as well as a chip's
manufacturing cost and thermal stability
Degree-Driven Design of Geometric Algorithms for Point Location, Proximity, and Volume Calculation
Correct implementation of published geometric algorithms is surprisingly difficult. Geometric algorithms are often designed for Real-RAM, a computational model that provides arbitrary precision arithmetic operations at unit cost. Actual commodity hardware provides only finite precision and may result in arithmetic errors. While the errors may seem small, if ignored, they may cause incorrect branching, which may cause an implementation to reach an undefined state, produce erroneous output, or crash. In 1999, Liotta, Preparata and Tamassia proposed that in addition to considering the resources of time and space, an algorithm designer should also consider the arithmetic precision necessary to guarantee a correct implementation. They called this design technique degree-driven algorithm design. Designers who consider the time, space, and precision for a problem up-front arrive at new solutions, gain further insight, and find simpler representations. In this thesis, I show that degree-driven design supports the development of new and robust geometric algorithms. I demonstrate this claim via several new algorithms. For n point sites on a UxU grid I consider three problems. First, I show how to compute the nearest neighbor transform in O(U^2) expected time, O(U^2) space, and double precision. Second, I show how to create a data structure in O(n log Un) expected time, O(n) expected space, and triple precision that supports O(log n) time and double precision post-office queries. Third, I show how to compute the Gabriel graph in O(n^2) time, O(n^2) space and double precision. For computing volumes of CSG models, I describe a framework that uses a minimal set of predicates that use at most five-fold precision. The framework is over 500x faster and two orders of magnitude more accurate than a Monte Carlo volume calculation algorithm.
Doctor of Philosoph
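As a small illustration of the precision argument, the sketch below computes a nearest neighbor transform by brute force. It is not the thesis's O(U^2) algorithm, and the function name `nearest_neighbor_transform` is hypothetical. It assumes "double precision" means arithmetic with roughly twice the bits of the input coordinates. Under that reading, comparing squared distances is a degree-2 computation, and doing it in integers gives exact comparisons with no rounding error.

```python
def nearest_neighbor_transform(sites, U):
    """Label every cell of a U x U grid with the index of its closest site.

    Brute force, O(U^2 * n) time; ties go to the lowest site index.
    """
    labels = [[0] * U for _ in range(U)]
    for y in range(U):
        for x in range(U):
            # Squared distance is a degree-2 polynomial in the integer
            # coordinates, so the comparison is exact.
            labels[y][x] = min(
                range(len(sites)),
                key=lambda i: (sites[i][0] - x) ** 2 + (sites[i][1] - y) ** 2,
            )
    return labels
```

Taking a square root before comparing would bring in floating-point rounding and could flip the outcome for near-ties, which is the kind of incorrect branching the abstract warns about.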