
    A minimalistic approach for fast computation of geodesic distances on triangular meshes

    The computation of geodesic distances is an important research topic in Geometry Processing and 3D Shape Analysis, as it is a basic component of many methods used in these areas. In this work, we present a minimalistic parallel algorithm based on front propagation to compute approximate geodesic distances on meshes. Our method is practical and simple to implement and does not require any heavy pre-processing. The convergence of our algorithm depends on the number of discrete level sets around the source points from which distance information propagates. To implement our method appropriately on GPUs, taking memory-coalescence issues into account, we take advantage of a graph representation based on a breadth-first search traversal that works harmoniously with our parallel front propagation approach. We report experiments that show how our method scales with the size of the problem, and we compare the mean error and processing time of our method against those of other methods. Our method produces results in competitive times with almost the same accuracy, especially for large meshes. We also demonstrate its use in solving two classical geometry processing problems: regular sampling and Voronoi tessellation on meshes. Comment: Preprint submitted to Computers & Graphics.
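
    As a rough illustration of the front-propagation idea (distance values expanding outward from the sources over the mesh connectivity), the sketch below computes shortest-path distances along mesh edges with a sequential Dijkstra front. The mesh representation and the sequential update rule are assumptions made for illustration; the paper's contribution is a parallel, level-set-based GPU propagation, which this sketch does not reproduce.

```python
import heapq
from collections import defaultdict

def approx_geodesic_distances(vertices, edges, sources):
    """Approximate geodesic distances as shortest-path distances over mesh edges.

    vertices: list of (x, y, z) tuples
    edges:    iterable of (i, j) vertex-index pairs (mesh edges)
    sources:  iterable of source vertex indices

    Sequential Dijkstra front propagation, used only to illustrate how distance
    information spreads from the sources; not the paper's parallel GPU method.
    """
    # Build an adjacency list weighted by Euclidean edge length.
    adj = defaultdict(list)
    for i, j in edges:
        diff = [a - b for a, b in zip(vertices[i], vertices[j])]
        w = sum(d * d for d in diff) ** 0.5
        adj[i].append((j, w))
        adj[j].append((i, w))

    dist = [float("inf")] * len(vertices)
    heap = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))

    # Propagate the front: pop the closest unfinished vertex, relax its neighbors.
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```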

    Harvesting graphics power for MD simulations

    We discuss an implementation of molecular dynamics (MD) simulations on a graphics processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results are presented for two MD algorithms, suitable for short-ranged and long-ranged interactions respectively, and for a congruential shift random number generator. The performance of the GPU is compared with that of its main-processor counterpart, and we achieve speedups of up to 80-, 40-, and 150-fold, respectively. With the newest generation of GPUs, one can run standard MD simulations at 10^7 flops/$. Comment: 12 pages, 5 figures. Submitted to Mol. Sim.
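
    To make the kind of computation concrete, here is a minimal NumPy sketch of one short-ranged MD step: an all-pairs truncated Lennard-Jones force evaluation followed by a velocity-Verlet update. The parameter values and the O(N^2) force loop are illustrative assumptions; the paper's CUDA kernels (and its long-ranged and random-number components) are not reproduced here.

```python
import numpy as np

def lj_forces(pos, box, rcut=2.5, eps=1.0, sigma=1.0):
    """All-pairs truncated Lennard-Jones forces with the minimum-image convention.

    Illustrative O(N^2) NumPy version of a short-ranged force evaluation; a real
    GPU code would use cell lists and per-particle kernels instead.
    """
    n = len(pos)
    f = np.zeros_like(pos)
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]                 # vectors from particle i to j > i
        d -= box * np.round(d / box)             # minimum image
        r2 = np.sum(d * d, axis=1)
        mask = r2 < rcut * rcut
        inv_r2 = np.where(mask, sigma ** 2 / np.maximum(r2, 1e-12), 0.0)
        inv_r6 = inv_r2 ** 3                     # (sigma/r)^6, zero beyond cutoff
        coef = 24.0 * eps * inv_r6 * (2.0 * inv_r6 - 1.0) / np.maximum(r2, 1e-12)
        fij = coef[:, None] * d                  # force on j due to i
        f[i] -= np.sum(fij, axis=0)              # Newton's third law
        f[i + 1:] += fij
    return f

def velocity_verlet(pos, vel, box, dt=0.005, mass=1.0):
    """One velocity-Verlet integration step; positions are wrapped into the box."""
    f = lj_forces(pos, box)
    vel += 0.5 * dt * f / mass
    pos = (pos + dt * vel) % box
    f = lj_forces(pos, box)
    vel += 0.5 * dt * f / mass
    return pos, vel
```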

    QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

    Previous studies have reported that common dense linear algebra operations do not achieve a speedup when using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing, and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed-memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent by the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication-Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's). Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010, Atlanta, GA, USA).
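
    The communication-avoiding idea can be sketched in a few lines of NumPy: factor each row block of the tall-and-skinny matrix independently, then factor the stacked R factors once, so that only small n-by-n blocks ever need to be exchanged between sites. This is a one-level, in-memory illustration under assumed block sizes, not the QCG-OMPI/Grid'5000 implementation evaluated in the paper.

```python
import numpy as np

def tsqr(A, block_rows):
    """One-level TSQR (Communication-Avoiding QR) for a tall-and-skinny matrix.

    Each row block is factored independently (in the grid setting, on a different
    node or site); only the small R factors are gathered and re-factored, which
    is what trades extra flops for far fewer messages.
    """
    m, n = A.shape
    blocks = [A[i:i + block_rows] for i in range(0, m, block_rows)]

    # Stage 1: independent local QR factorizations of each row block.
    local = [np.linalg.qr(B) for B in blocks]                # list of (Q1, R1)

    # Stage 2: a single small QR of the stacked R factors.
    Q2, R = np.linalg.qr(np.vstack([R1 for _, R1 in local]))

    # Recombine: each block's slice of Q2 updates its local Q factor.
    Q_blocks, offset = [], 0
    for Q1, R1 in local:
        rows = R1.shape[0]
        Q_blocks.append(Q1 @ Q2[offset:offset + rows])
        offset += rows
    return np.vstack(Q_blocks), R

# Sanity check on a random tall-and-skinny matrix.
A = np.random.rand(1000, 8)
Q, R = tsqr(A, block_rows=250)
assert np.allclose(Q @ R, A)
```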

    RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures

    Chiplet architectures are a promising paradigm for overcoming the scaling challenges of monolithic chips, offering heterogeneity, modularity, and cost-effectiveness. The design space of chiplet architectures is huge, as there are many degrees of freedom, such as the number, size, and placement of chiplets, the topology of the inter-chiplet interconnect, and many more. Existing tools for cost and performance prediction are often too slow to explore this design space. We present RapidChiplet, a fast, open-source toolchain to predict the latency and throughput of the inter-chiplet interconnect, as well as a chip's manufacturing cost and thermal stability.
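
    As an illustration of the kind of fast analytical model such a toolchain relies on, the sketch below estimates manufacturing cost with a generic negative-binomial yield model and compares a monolithic die against a chiplet-based design. The formula, parameter values, and function names are illustrative assumptions, not RapidChiplet's actual cost model.

```python
def die_yield(area_mm2, defect_density=0.001, clustering=3.0):
    """Negative-binomial yield model: fraction of dies of a given area that work.

    defect_density is in defects per mm^2; both parameter values are assumed
    for illustration only.
    """
    return (1.0 + area_mm2 * defect_density / clustering) ** (-clustering)

def cost_per_good_die(area_mm2, wafer_cost_per_mm2=0.1, assembly_overhead=0.0):
    """Silicon cost of one working die, amortizing the dies lost to defects."""
    return area_mm2 * wafer_cost_per_mm2 / die_yield(area_mm2) + assembly_overhead

# Compare a monolithic 800 mm^2 chip with 8 chiplets of 100 mm^2 each.
# Smaller dies yield much better, which is one of the cost arguments for
# chiplet architectures; assembly_overhead stands in for extra packaging cost.
monolithic = cost_per_good_die(800.0)
chiplet_based = 8 * cost_per_good_die(100.0, assembly_overhead=1.0)
print(f"monolithic: {monolithic:.1f}  chiplet-based: {chiplet_based:.1f}  (arbitrary units)")
```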

    Degree-Driven Design of Geometric Algorithms for Point Location, Proximity, and Volume Calculation

    Correct implementation of published geometric algorithms is surprisingly difficult. Geometric algorithms are often designed for the Real-RAM, a computational model that provides arbitrary-precision arithmetic operations at unit cost. Actual commodity hardware provides only finite precision, which may result in arithmetic errors. While the errors may seem small, if ignored they may cause incorrect branching, which may cause an implementation to reach an undefined state, produce erroneous output, or crash. In 1999, Liotta, Preparata, and Tamassia proposed that, in addition to considering the resources of time and space, an algorithm designer should also consider the arithmetic precision necessary to guarantee a correct implementation. They called this design technique degree-driven algorithm design. Designers who consider the time, space, and precision for a problem up front arrive at new solutions, gain further insight, and find simpler representations. In this thesis, I show that degree-driven design supports the development of new and robust geometric algorithms. I demonstrate this claim via several new algorithms. For n point sites on a UxU grid, I consider three problems. First, I show how to compute the nearest-neighbor transform in O(U^2) expected time, O(U^2) space, and double precision. Second, I show how to create a data structure in O(n log(Un)) expected time, O(n) expected space, and triple precision that supports O(log n)-time, double-precision post-office queries. Third, I show how to compute the Gabriel graph in O(n^2) time, O(n^2) space, and double precision. For computing volumes of CSG models, I describe a framework that uses a minimal set of predicates requiring at most five-fold precision. The framework is over 500x faster and two orders of magnitude more accurate than a Monte Carlo volume calculation algorithm.
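
    The Gabriel graph result gives a feel for what "double precision" means in the degree-driven sense: with integer site coordinates, deciding whether a third site lies inside the disk whose diameter is a candidate edge needs only products of two coordinate differences, i.e. a degree-2 predicate. The naive O(n^3) construction below is an assumed illustration of that predicate, not the thesis's O(n^2) algorithm.

```python
from itertools import combinations

def in_diametral_disk(p, q, r):
    """True if site r lies strictly inside the open disk with diameter pq.

    With integer input coordinates this multiplies only pairs of coordinate
    differences, so it is a degree-2 ("double precision") predicate in the
    degree-driven sense and is exact with Python integers.
    """
    return ((p[0] - r[0]) * (q[0] - r[0]) + (p[1] - r[1]) * (q[1] - r[1])) < 0

def gabriel_graph(sites):
    """Naive O(n^3) Gabriel graph built from the degree-2 predicate above.

    An edge (i, j) is kept when no third site lies strictly inside its
    diametral disk (boundary ties are kept here for simplicity).
    """
    edges = []
    for i, j in combinations(range(len(sites)), 2):
        p, q = sites[i], sites[j]
        if not any(in_diametral_disk(p, q, sites[k])
                   for k in range(len(sites)) if k not in (i, j)):
            edges.append((i, j))
    return edges

# Example on a small set of integer-coordinate sites.
print(gabriel_graph([(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]))
```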