70,423 research outputs found

    Pipelining the Fast Multipole Method over a Runtime System

    Get PDF
    Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, in order to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160 cores SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12 cores Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs.Comment: No. RR-7981 (2012

    Efficient Parallel and Adaptive Partitioning for Load-balancing in Spatial Join

    Get PDF
    Due to the developments of topographic techniques, clear satellite imagery, and various means for collecting information, geospatial datasets are growing in volume, complexity, and heterogeneity. For efficient execution of spatial computations and analytics on large spatial data sets, parallel processing is required. To exploit fine-grained parallel processing in large scale compute clusters, partitioning in a load-balanced way is necessary for skewed datasets. In this work, we focus on spatial join operation where the inputs are two layers of geospatial data. Our partitioning method for spatial join uses Adaptive Partitioning (ADP) technique, which is based on Quadtree partitioning. Unlike existing partitioning techniques, ADP partitions the spatial join workload instead of partitioning the individual datasets separately to provide better load-balancing. Based on our experimental evaluation, ADP partitions spatial data in a more balanced way than Quadtree partitioning and Uniform grid partitioning. ADP uses an output-sensitive duplication avoidance technique which minimizes duplication of geometries that are not part of spatial join output. In a distributed memory environment, this technique can reduce data communication and storage requirements compared to traditional methods.To improve the performance of ADP, an MPI+Threads based parallelization is presented. With ParADP, a pair of real world datasets, one with 717 million polylines and another with 10 million polygons, is partitioned into 65,536 grid cells within 7 seconds. ParADP performs well with both good weak scaling up to 4,032 CPU cores and good strong scaling up to 4,032 CPU cores

    X-ray Synthesis Based on Triangular Mesh Models Using GPU-Accelerated Ray Tracing for Multi-modal Breast Image Registration

    Get PDF
    For image registration of breast MRI and X-ray mammography we apply detailed biomechanical models. Synthesizing X-ray mammograms from these models is an important processing step for optimizing registration parameters and deriving images for multi-modal diagnosis. A fast computation time for creating synthetic images is essential to enable a clinically relevant application. In this paper we present a method to create synthetic X-ray attenuation images with an hardware-optimized ray tracing algorithm on recent graphics processing units’ (GPU) ray tracing (RT) cores. The ray tracing algorithm is able to calculate the attenuation of the X-rays by tracing through a triangular polygon-mesh. We use the Vulkan API, which enables access to RT cores. One frame for a triangle mesh with over 5 million triangles in the mesh and a detector resolution of 1080×1080 can be calculated and transferred to and from the GPU in about 0.76 s on NVidia RTX 2070 Super GPU. Calculation duration of an interactive application without the transfer overhead allows real time application with more than 30 frames per second (fps) even for very large polygon models. The presented method is able to calculate synthetic X-ray images in a short time and has the potential for real-time applications. Also it is the very first implementation using RT cores for this purpose. The toolbox will be available as an open source

    SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Unsupervised Learning

    Get PDF
    We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers in the ACL Anthology. The model with 19 features achieves F1=85.6%. SCC supports PDF, XML, and JSON files out-of-box, provided that they are conformed to certain schemas. The API supports single document processing and batch processing in parallel. It takes about 12–45 seconds on average depending on the format to process a document on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ∼33.3k PMC papers in the CORD19 dataset, released on June 13, 2020. The source code is released at https://gitee.com/irlab/SmartCiteCon

    Parallel Astronomical Data Processing with Python: Recipes for multicore machines

    Full text link
    High performance computing has been used in various fields of astrophysical research. But most of it is implemented on massively parallel systems (supercomputers) or graphical processing unit clusters. With the advent of multicore processors in the last decade, many serial software codes have been re-implemented in parallel mode to utilize the full potential of these processors. In this paper, we propose parallel processing recipes for multicore machines for astronomical data processing. The target audience are astronomers who are using Python as their preferred scripting language and who may be using PyRAF/IRAF for data processing. Three problems of varied complexity were benchmarked on three different types of multicore processors to demonstrate the benefits, in terms of execution time, of parallelizing data processing tasks. The native multiprocessing module available in Python makes it a relatively trivial task to implement the parallel code. We have also compared the three multiprocessing approaches - Pool/Map, Process/Queue, and Parallel Python. Our test codes are freely available and can be downloaded from our website.Comment: 15 pages, 7 figures, 1 table, "for associated test code, see http://astro.nuigalway.ie/staff/navtejs", Accepted for publication in Astronomy and Computin
    • …
    corecore