
    Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure

    Large datasets require efficient processing, storage, and management in order to extract useful information for innovation and decision-making. This dissertation demonstrates novel approaches and algorithms using a virtual memory approach, parallel computing, and cyberinfrastructure. First, we introduce a tailored user-level virtual memory system for parallel algorithms that can process large raster data files in a desktop computer environment with limited memory. The application area for this portion of the study is parallel terrain analysis: the algorithms use multi-threading to take advantage of common multi-core processors for greater efficiency. Second, we present two novel parallel WaveCluster algorithms that perform cluster analysis by using the discrete wavelet transform to reduce large data to coarser representations that are smaller and more easily managed than the original data in both size and complexity. Finally, this dissertation demonstrates an HPC gateway service that abstracts away many of the details and complexities involved in the use of HPC systems, including authentication, authorization, and data and job management.
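
    As an illustration of the wavelet-based reduction idea (not the dissertation's actual implementation), the sketch below downsamples a raster with a 2D discrete wavelet transform and clusters the dense cells of the coarse approximation; the function names, threshold, and use of pywt/scipy are assumptions.

```python
# Minimal sketch of wavelet-based data reduction prior to clustering,
# in the spirit of WaveCluster (illustrative only).
import numpy as np
import pywt
from scipy import ndimage

def wavelet_cluster(raster, level=2, density_threshold=0.5):
    """Cluster a 2D raster by working on its coarse wavelet approximation."""
    # Multi-level 2D DWT; keep only the approximation coefficients,
    # which form a smaller, smoother version of the original data.
    coeffs = pywt.wavedec2(raster, wavelet='haar', level=level)
    approx = coeffs[0]

    # Mark "dense" cells in the reduced representation.
    dense = approx > density_threshold * approx.max()

    # Connected components of dense cells are the clusters.
    labels, n_clusters = ndimage.label(dense)
    return labels, n_clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((256, 256))
    data[40:80, 40:80] += 2.0        # synthetic dense region
    labels, n = wavelet_cluster(data)
    print("clusters found on the coarse grid:", n)
```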

    A GPU-Accelerated Shallow-Water Scheme for Surface Runoff Simulations

    The capability of a GPU-parallelized numerical scheme to perform accurate and fast simulations of surface runoff in watersheds, exploiting high-resolution digital elevation models (DEMs), was investigated. The numerical computations were carried out by using an explicit finite volume numerical scheme and adopting a recent type of grid called Block-Uniform Quadtree (BUQ), capable of exploiting the computational power of GPUs with negligible overhead. Moreover, stability and zero mass error were ensured, even in the presence of very shallow water depth, by introducing a proper reconstruction of conserved variables at cell interfaces, a specific formulation of the slope source term and an explicit discretization of the friction source term. The 2D shallow water model was tested against two different literature tests and a real event that recently occurred in Italy for which field data is available. The influence of the spatial resolution adopted in different portions of the domain was also investigated for the last test. The achieved low ratio of simulation to physical times, in some cases less than 1:20, opens new perspectives for flood management strategies. Based on the results of such models, emergency plans can be designed in order to achieve a significant reduction in the economic losses generated by flood events.
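
    For orientation, here is a minimal 1D sketch of the kind of explicit finite-volume update such a scheme performs, using a Lax-Friedrichs flux and an explicit Manning friction term; the BUQ grid, wet/dry reconstruction, slope source term and GPU kernels of the actual model are not reproduced, and all names and parameters are illustrative.

```python
# Minimal 1D explicit finite-volume shallow-water step (Lax-Friedrichs flux),
# illustrating the per-cell update structure only.
import numpy as np

g = 9.81  # gravity

def flux(h, hu):
    """Physical flux of the 1D shallow-water equations."""
    u = np.where(h > 1e-10, hu / np.maximum(h, 1e-10), 0.0)
    return np.array([hu, hu * u + 0.5 * g * h**2])

def step(h, hu, dx, dt, n_manning=0.03):
    """One explicit update of (h, hu) on a uniform 1D grid."""
    U = np.array([h, hu])
    F = flux(h, hu)
    # Lax-Friedrichs numerical flux at interfaces i+1/2
    F_iface = 0.5 * (F[:, 1:] + F[:, :-1]) - 0.5 * dx / dt * (U[:, 1:] - U[:, :-1])
    # Finite-volume update for interior cells
    U_new = U.copy()
    U_new[:, 1:-1] -= dt / dx * (F_iface[:, 1:] - F_iface[:, :-1])
    h_new, hu_new = U_new
    # A common explicit Manning friction discretization (not necessarily
    # the paper's formulation): d(hu)/dt = -g * h * Sf
    u = np.where(h_new > 1e-10, hu_new / np.maximum(h_new, 1e-10), 0.0)
    sf = n_manning**2 * np.abs(u) * u / np.maximum(h_new, 1e-10)**(4.0 / 3.0)
    hu_new = hu_new - dt * g * h_new * sf
    return h_new, hu_new
```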

    Elasto-plastic deformations within a material point framework on modern GPU architectures

    Plastic strain localization is an important process on Earth. It strongly influences the mechanical behaviour of natural processes, such as fault mechanics, earthquakes or orogeny. At a smaller scale, a landslide is a fantastic example of elasto-plastic deformation. Such behaviour spans from pre-failure mechanisms to post-failure propagation of the unstable material. To fully resolve the landslide mechanics, the selected numerical methods should be able to efficiently address a wide range of deformation magnitudes. Accurate and performant numerical modelling requires substantial computational resources. Mesh-free numerical methods such as the material point method (MPM) or smoothed-particle hydrodynamics (SPH) are particularly computationally expensive when compared with mesh-based methods, such as the finite element method (FEM) or the finite difference method (FDM). Still, mesh-free methods are particularly well suited to numerical problems involving large elasto-plastic deformations, but their computational efficiency must first be improved in order to tackle complex three-dimensional problems, i.e., landslides. As such, this research work attempts to alleviate the computational cost of the material point method by using the most recent graphics processing unit (GPU) architectures available. GPUs are many-core processors originally designed to refresh screen pixels (e.g., for computer games) independently, which allows them to deliver massive parallelism when compared to central processing units (CPUs). To do so, this research work first investigates code prototyping in a high-level language, e.g., MATLAB. This makes it possible to implement vectorized algorithms and to benchmark two-dimensional numerical results against analytical solutions and/or experimental results in an affordable amount of time. Afterwards, a low-level language, CUDA C, is used to efficiently implement a GPU-based solver, ep2-3De v1.0, that can resolve three-dimensional problems in a reasonable amount of time. This part takes advantage of the massive parallelism of modern GPU architectures. In addition, a first attempt at multi-GPU parallel computing is made to further increase performance and to address the on-chip memory limitation. Finally, this GPU-based solver is used to investigate three-dimensional granular collapses and is compared with experimental evidence obtained in the laboratory. This research work demonstrates that the material point method is well suited to resolving small to large elasto-plastic deformations. Moreover, the computational efficiency of the method can be dramatically increased using modern GPU architectures, which allow fast, performant and accurate three-dimensional modelling of landslides, provided that the on-chip memory limitation is alleviated with an appropriate parallel strategy.
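
    To make the method concrete, here is a minimal sketch of the particle-to-grid (P2G) transfer that sits at the heart of the material point method, in 1D with linear shape functions; it is not the ep2-3De implementation, and all names and values are illustrative.

```python
# Minimal sketch of the particle-to-grid (P2G) transfer of the material
# point method, in 1D with linear shape functions (illustrative only).
import numpy as np

def p2g(x_p, m_p, v_p, n_nodes, dx):
    """Scatter particle mass and momentum to grid nodes."""
    mass = np.zeros(n_nodes)
    momentum = np.zeros(n_nodes)
    for xp, mp, vp in zip(x_p, m_p, v_p):
        i = int(xp // dx)            # left node of the cell containing xp
        w_right = xp / dx - i        # linear shape-function weights
        w_left = 1.0 - w_right
        mass[i] += w_left * mp
        mass[i + 1] += w_right * mp
        momentum[i] += w_left * mp * vp
        momentum[i + 1] += w_right * mp * vp
    # Nodal velocities where mass is present
    v_nodes = np.where(mass > 0, momentum / np.maximum(mass, 1e-12), 0.0)
    return mass, v_nodes

if __name__ == "__main__":
    x_p = np.array([0.25, 0.6, 1.4])   # particle positions
    m_p = np.ones(3)                    # particle masses
    v_p = np.array([1.0, 0.5, -0.2])    # particle velocities
    print(p2g(x_p, m_p, v_p, n_nodes=4, dx=1.0))
```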

    A review of synthetic-aperture radar image formation algorithms and implementations: a computational perspective

    Designing synthetic-aperture radar (SAR) image formation systems can be challenging due to the numerous options of algorithms and devices that can be used. There are many SAR image formation algorithms, such as backprojection, matched filter, polar format, Range-Doppler, and chirp scaling. Each algorithm presents its own advantages and disadvantages in terms of efficiency and image quality; thus, we aim to introduce some of the most common SAR image formation algorithms and compare them based on these two aspects. Depending on the requisites of each individual system and implementation, there are many device options to choose from, for instance, FPGAs, GPUs, CPUs, many-core CPUs, and microcontrollers. We present a review of the state of the art of SAR imaging system implementations, and we compare such implementations in terms of power consumption, execution time, and image quality for the different algorithms used.
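
    As one concrete example, the sketch below shows a minimal time-domain backprojection kernel of the sort compared in such reviews; the data layout, nearest-neighbour range interpolation and variable names are assumptions rather than details taken from the paper.

```python
# Minimal sketch of time-domain backprojection for SAR image formation
# (illustrative only; real systems use interpolation, windowing, etc.).
import numpy as np

def backprojection(pulses, platform_pos, t0, dt, pixels, wavelength):
    """Backproject range-compressed pulses onto a grid of pixel positions.

    pulses       : (n_pulses, n_samples) complex range profiles
    platform_pos : (n_pulses, 3) antenna position per pulse
    t0, dt       : fast-time origin and sample spacing (two-way, seconds)
    pixels       : (n_pixels, 3) scene coordinates
    """
    c = 299792458.0
    image = np.zeros(len(pixels), dtype=complex)
    for pulse, pos in zip(pulses, platform_pos):
        r = np.linalg.norm(pixels - pos, axis=1)           # slant range per pixel
        tau = 2.0 * r / c                                   # two-way delay
        idx = np.clip(((tau - t0) / dt).astype(int), 0, pulse.size - 1)
        # Nearest-neighbour sample plus phase correction for each pixel
        image += pulse[idx] * np.exp(4j * np.pi * r / wavelength)
    return image
```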

    Inference of Many-Taxon Phylogenies

    Phylogenetic trees are tree topologies that represent the evolutionary history of a set of organisms. In this thesis, we address computational challenges related to the analysis of large-scale datasets with Maximum Likelihood based phylogenetic inference. We have approached this using different strategies: reduction of memory requirements, reduction of running time, and reduction of man-hours.

    Exploiting BSP Abstractions for Compiler Based Optimizations of GPU Applications on multi-GPU Systems

    Graphics Processing Units (GPUs) are accelerators for computers and provide massive amounts of computational power and bandwidth for amenable applications. While effectively utilizing an individual GPU already requires a high level of skill, effectively utilizing multiple GPUs introduces completely new types of challenges. This work sets out to investigate how the hierarchical execution model of GPUs can be exploited to simplify the utilization of such multi-GPU systems. The investigation starts with an analysis of the memory access patterns exhibited by applications from common GPU benchmark suites. Memory access patterns are collected using custom instrumentation, and a simple simulation then analyzes the patterns and identifies implicit communication across the different levels of the execution hierarchy. The analysis reveals that, for most GPU applications, memory accesses are highly localized and the workload can be partitioned so that the communication volume grows more slowly than the aggregate bandwidth as the number of GPUs increases. Next, an application model based on Z-polyhedra is derived that formalizes the distribution of work across multiple GPUs and allows the identification of data dependencies. The model is then used to implement a prototype compiler that consumes single-GPU programs and produces executables that distribute GPU workloads across all available GPUs in a system. It uses static analysis to identify memory access patterns, and polyhedral code generation in combination with a dynamic tracking system to efficiently resolve data dependencies. The prototype is implemented as an extension to the LLVM/Clang compiler and published in full source. The prototype compiler is then evaluated using a set of benchmark applications. While the prototype is limited in its applicability by technical issues, it provides impressive speedups of up to 12.4x on 16 GPUs for amenable applications. An in-depth analysis of the application runtime reveals that dependency resolution takes up less than 10% of the runtime, often significantly less. A discussion follows that puts the work into context by presenting and differentiating related work, reflecting critically on the work itself, and giving an outlook on aspects that could be explored further. The work concludes with a summary and a closing opinion.
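
    The locality observation can be illustrated with a toy calculation: for a slab-partitioned stencil workload, halo traffic grows with the number of partition boundaries while aggregate bandwidth grows with the number of GPUs. The figures below are illustrative, not measurements from the work.

```python
# Toy sketch: halo communication volume vs. aggregate bandwidth for a
# 1D slab partition of a 2D stencil workload across several GPUs.
def stencil_partition_stats(n=8192, halo=1, bytes_per_cell=4, gpu_bandwidth_gb=900):
    for n_gpus in (1, 2, 4, 8, 16):
        # Each internal boundary exchanges 'halo' rows in both directions.
        internal_boundaries = n_gpus - 1
        comm_bytes = 2 * internal_boundaries * halo * n * bytes_per_cell
        aggregate_bw = n_gpus * gpu_bandwidth_gb
        print(f"{n_gpus:2d} GPUs: halo traffic {comm_bytes / 1e6:7.2f} MB/step, "
              f"aggregate bandwidth {aggregate_bw:5d} GB/s")

stencil_partition_stats()
```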

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision-making. The definition of real time depends on the application under study, with response times ranging from a few μs up to several hours in the case of very compute-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and on specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH detectors when it is not possible to obtain reconstruction seeds from external trackers.
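
    For context, a generic seedless ring fit on detector hits can be performed with an algebraic least-squares circle fit (Kåsa method), sketched below; this is not the GAP trigger algorithm itself, and the data are synthetic.

```python
# Minimal seedless ring fit on hit coordinates using an algebraic (Kasa)
# least-squares circle fit (generic technique, illustrative only).
import numpy as np

def fit_ring(x, y):
    """Fit a circle x^2 + y^2 + D*x + E*y + F = 0 to hit coordinates."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    xc, yc = -D / 2.0, -E / 2.0
    radius = np.sqrt(xc**2 + yc**2 - F)
    return xc, yc, radius

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    phi = rng.uniform(0, 2 * np.pi, 32)
    x = 3.0 + 11.0 * np.cos(phi) + 0.05 * rng.normal(size=32)
    y = -1.0 + 11.0 * np.sin(phi) + 0.05 * rng.normal(size=32)
    print(fit_ring(x, y))   # roughly (3.0, -1.0, 11.0)
```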

    Interstitial-Scale Modeling of Packed-Bed Reactors

    Packed beds are common to adsorption scrubbers, packed-bed reactors, and trickle-bed reactors widely used across the petroleum, petrochemical, and chemical industries. The microstructure of these packed beds is generally very complex and has a tremendous influence on heat, mass, and momentum transport phenomena at the micro and macro length scales within the bed. On the reactor scale, bed geometry strongly influences overall pressure drop, residence time distribution, and conversion of species through domain-fluid interactions. On the interstitial scale, particle boundary layer formation, fluid-to-particle mass transfer, and local mixing are controlled by the turbulence and dissipation existing around packed particles. In the present research, a CFD model is developed using OpenFOAM (www.openfoam.org) to directly resolve momentum and scalar transport in both laminar and turbulent flow fields, where the interstitial velocity field is resolved using the Navier-Stokes equations (i.e., no pseudo-continuum-based assumptions). A discussion detailing the process of generating the complex domain using a Monte-Carlo packing algorithm is provided, along with the relevant details required to generate an arbitrary polyhedral mesh describing the packed bed. Lastly, an algorithm coupling OpenFOAM with a linear system solver using the graphics processing unit (GPU) computing paradigm was developed and will be discussed in detail.
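
    Below is a minimal sketch of a Monte-Carlo packing step of the random-sequential-addition type often used to build such bed geometries; the actual domain generation and OpenFOAM meshing pipeline are more elaborate, and all parameters here are illustrative.

```python
# Minimal random-sequential-addition sphere packing in a rectangular box
# (illustrative only; real packing algorithms handle settling, walls, etc.).
import numpy as np

def pack_spheres(n_target, radius, box, max_tries=100000, seed=0):
    """Place non-overlapping spheres of equal radius inside a rectangular box."""
    rng = np.random.default_rng(seed)
    centers = []
    tries = 0
    while len(centers) < n_target and tries < max_tries:
        tries += 1
        # Propose a random center that keeps the sphere fully inside the box
        c = rng.uniform(radius, np.array(box) - radius)
        # Accept only if it does not overlap any previously placed sphere
        if all(np.linalg.norm(c - p) >= 2 * radius for p in centers):
            centers.append(c)
    return np.array(centers)

if __name__ == "__main__":
    beads = pack_spheres(n_target=200, radius=0.05, box=(1.0, 1.0, 2.0))
    print(f"placed {len(beads)} spheres")
```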