
    Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure

    Large datasets require efficient processing, storage, and management in order to extract useful information for innovation and decision-making. This dissertation demonstrates novel approaches and algorithms using a virtual memory approach, parallel computing, and cyberinfrastructure. First, we introduce a tailored user-level virtual memory system for parallel algorithms that can process large raster data files in a desktop computer environment with limited memory. The application area for this portion of the study is parallel terrain analysis: the algorithms use multi-threading to take advantage of common multi-core processors for greater efficiency. Second, we present two novel parallel WaveCluster algorithms that perform cluster analysis by using the discrete wavelet transform to reduce large data to coarser representations that are smaller and more easily managed than the original data in both size and complexity. Finally, this dissertation demonstrates an HPC gateway service that abstracts away many of the details and complexities involved in the use of HPC systems, including authentication, authorization, and data and job management.
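
    As an illustration of the wavelet-based reduction idea (not the dissertation's actual implementation), the sketch below downsamples a raster with a 2D discrete wavelet transform and clusters the dense cells of the coarse approximation; the function names, threshold, and use of pywt/scipy are assumptions.

```python
# Minimal sketch of wavelet-based data reduction prior to clustering,
# in the spirit of WaveCluster (illustrative only).
import numpy as np
import pywt
from scipy import ndimage

def wavelet_cluster(raster, level=2, density_threshold=0.5):
    """Cluster a 2D raster by working on its coarse wavelet approximation."""
    # Multi-level 2D DWT; keep only the approximation coefficients,
    # which form a smaller, smoother version of the original data.
    coeffs = pywt.wavedec2(raster, wavelet='haar', level=level)
    approx = coeffs[0]

    # Mark "dense" cells in the reduced representation.
    dense = approx > density_threshold * approx.max()

    # Connected components of dense cells are the clusters.
    labels, n_clusters = ndimage.label(dense)
    return labels, n_clusters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((256, 256))
    data[40:80, 40:80] += 2.0        # synthetic dense region
    labels, n = wavelet_cluster(data)
    print("clusters found on the coarse grid:", n)
```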

    A GPU-Accelerated Shallow-Water Scheme for Surface Runoff Simulations

    The capability of a GPU-parallelized numerical scheme to perform accurate and fast simulations of surface runoff in watersheds, exploiting high-resolution digital elevation models (DEMs), was investigated. The numerical computations were carried out by using an explicit finite volume numerical scheme and adopting a recent type of grid called Block-Uniform Quadtree (BUQ), capable of exploiting the computational power of GPUs with negligible overhead. Moreover, stability and zero mass error were ensured, even in the presence of very shallow water depth, by introducing a proper reconstruction of conserved variables at cell interfaces, a specific formulation of the slope source term and an explicit discretization of the friction source term. The 2D shallow water model was tested against two different literature tests and a real event that recently occurred in Italy for which field data is available. The influence of the spatial resolution adopted in different portions of the domain was also investigated for the last test. The achieved low ratio of simulation to physical times, in some cases less than 1:20, opens new perspectives for flood management strategies. Based on the results of such models, emergency plans can be designed in order to achieve a significant reduction in the economic losses generated by flood events.
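
    For orientation, here is a minimal 1D sketch of the kind of explicit finite-volume update such a scheme performs, using a Lax-Friedrichs flux and an explicit Manning friction term; the BUQ grid, wet/dry reconstruction, slope source term and GPU kernels of the actual model are not reproduced, and all names and parameters are illustrative.

```python
# Minimal 1D explicit finite-volume shallow-water step (Lax-Friedrichs flux),
# illustrating the per-cell update structure only.
import numpy as np

g = 9.81  # gravity

def flux(h, hu):
    """Physical flux of the 1D shallow-water equations."""
    u = np.where(h > 1e-10, hu / np.maximum(h, 1e-10), 0.0)
    return np.array([hu, hu * u + 0.5 * g * h**2])

def step(h, hu, dx, dt, n_manning=0.03):
    """One explicit update of (h, hu) on a uniform 1D grid."""
    U = np.array([h, hu])
    F = flux(h, hu)
    # Lax-Friedrichs numerical flux at interfaces i+1/2
    F_iface = 0.5 * (F[:, 1:] + F[:, :-1]) - 0.5 * dx / dt * (U[:, 1:] - U[:, :-1])
    # Finite-volume update for interior cells
    U_new = U.copy()
    U_new[:, 1:-1] -= dt / dx * (F_iface[:, 1:] - F_iface[:, :-1])
    h_new, hu_new = U_new
    # A common explicit Manning friction discretization (not necessarily
    # the paper's formulation): d(hu)/dt = -g * h * Sf
    u = np.where(h_new > 1e-10, hu_new / np.maximum(h_new, 1e-10), 0.0)
    sf = n_manning**2 * np.abs(u) * u / np.maximum(h_new, 1e-10)**(4.0 / 3.0)
    hu_new = hu_new - dt * g * h_new * sf
    return h_new, hu_new
```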

    Elasto-plastic deformations within a material point framework on modern GPU architectures

    Plastic strain localization is an important process on Earth. It strongly influences the mechanical behaviour of natural processes, such as fault mechanics, earthquakes or orogeny. At a smaller scale, a landslide is a fantastic example of elasto-plastic deformation. Such behaviour spans from pre-failure mechanisms to post-failure propagation of the unstable material. To fully resolve the landslide mechanics, the selected numerical methods should be able to efficiently address a wide range of deformation magnitudes. Accurate and performant numerical modelling requires substantial computational resources. Mesh-free numerical methods such as the material point method (MPM) or smoothed-particle hydrodynamics (SPH) are particularly computationally expensive when compared with mesh-based methods, such as the finite element method (FEM) or the finite difference method (FDM). Still, mesh-free methods are particularly well suited to numerical problems involving large elasto-plastic deformations, but their computational efficiency must first be improved in order to tackle complex three-dimensional problems, i.e., landslides. As such, this research work attempts to alleviate the computational cost of the material point method by using the most recent graphics processing unit (GPU) architectures available. GPUs are many-core processors originally designed to refresh screen pixels (e.g., for computer games) independently, which allows them to deliver massive parallelism when compared to central processing units (CPUs). To do so, this research work first investigates code prototyping in a high-level language, e.g., MATLAB. This makes it possible to implement vectorized algorithms and to benchmark two-dimensional numerical results against analytical solutions and/or experimental results in an affordable amount of time. Afterwards, a low-level language, CUDA C, is used to efficiently implement a GPU-based solver, ep2-3De v1.0, that can resolve three-dimensional problems in a reasonable amount of time. This part takes advantage of the massive parallelism of modern GPU architectures. In addition, a first attempt at multi-GPU parallel computing is made to further increase performance and to address the on-chip memory limitation. Finally, this GPU-based solver is used to investigate three-dimensional granular collapses and is compared with experimental evidence obtained in the laboratory. This research work demonstrates that the material point method is well suited to resolving small to large elasto-plastic deformations. Moreover, the computational efficiency of the method can be dramatically increased using modern GPU architectures, which allow fast, performant and accurate three-dimensional modelling of landslides, provided that the on-chip memory limitation is alleviated with an appropriate parallel strategy.
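
    To make the method concrete, here is a minimal sketch of the particle-to-grid (P2G) transfer that sits at the heart of the material point method, in 1D with linear shape functions; it is not the ep2-3De implementation, and all names and values are illustrative.

```python
# Minimal sketch of the particle-to-grid (P2G) transfer of the material
# point method, in 1D with linear shape functions (illustrative only).
import numpy as np

def p2g(x_p, m_p, v_p, n_nodes, dx):
    """Scatter particle mass and momentum to grid nodes."""
    mass = np.zeros(n_nodes)
    momentum = np.zeros(n_nodes)
    for xp, mp, vp in zip(x_p, m_p, v_p):
        i = int(xp // dx)            # left node of the cell containing xp
        w_right = xp / dx - i        # linear shape-function weights
        w_left = 1.0 - w_right
        mass[i] += w_left * mp
        mass[i + 1] += w_right * mp
        momentum[i] += w_left * mp * vp
        momentum[i + 1] += w_right * mp * vp
    # Nodal velocities where mass is present
    v_nodes = np.where(mass > 0, momentum / np.maximum(mass, 1e-12), 0.0)
    return mass, v_nodes

if __name__ == "__main__":
    x_p = np.array([0.25, 0.6, 1.4])   # particle positions
    m_p = np.ones(3)                    # particle masses
    v_p = np.array([1.0, 0.5, -0.2])    # particle velocities
    print(p2g(x_p, m_p, v_p, n_nodes=4, dx=1.0))
```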

    A review of synthetic-aperture radar image formation algorithms and implementations: a computational perspective

    Designing synthetic-aperture radar (SAR) image formation systems can be challenging due to the numerous options of algorithms and devices that can be used. There are many SAR image formation algorithms, such as backprojection, matched filter, polar format, Range-Doppler, and chirp scaling. Each algorithm presents its own advantages and disadvantages in terms of efficiency and image quality; thus, we aim to introduce some of the most common SAR image formation algorithms and compare them based on these two aspects. Depending on the requisites of each individual system and implementation, there are many device options to choose from, for instance, FPGAs, GPUs, CPUs, many-core CPUs, and microcontrollers. We present a review of the state of the art of SAR imaging system implementations, and we compare such implementations in terms of power consumption, execution time, and image quality for the different algorithms used.
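
    As one concrete example, the sketch below shows a minimal time-domain backprojection kernel of the sort compared in such reviews; the data layout, nearest-neighbour range interpolation and variable names are assumptions rather than details taken from the paper.

```python
# Minimal sketch of time-domain backprojection for SAR image formation
# (illustrative only; real systems use interpolation, windowing, etc.).
import numpy as np

def backprojection(pulses, platform_pos, t0, dt, pixels, wavelength):
    """Backproject range-compressed pulses onto a grid of pixel positions.

    pulses       : (n_pulses, n_samples) complex range profiles
    platform_pos : (n_pulses, 3) antenna position per pulse
    t0, dt       : fast-time origin and sample spacing (two-way, seconds)
    pixels       : (n_pixels, 3) scene coordinates
    """
    c = 299792458.0
    image = np.zeros(len(pixels), dtype=complex)
    for pulse, pos in zip(pulses, platform_pos):
        r = np.linalg.norm(pixels - pos, axis=1)           # slant range per pixel
        tau = 2.0 * r / c                                   # two-way delay
        idx = np.clip(((tau - t0) / dt).astype(int), 0, pulse.size - 1)
        # Nearest-neighbour sample plus phase correction for each pixel
        image += pulse[idx] * np.exp(4j * np.pi * r / wavelength)
    return image
```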

    Inference of Many-Taxon Phylogenies

    Phylogenetic trees are tree topologies that represent the evolutionary history of a set of organisms. In this thesis, we address computational challenges related to the analysis of large-scale datasets with Maximum Likelihood based phylogenetic inference. We have approached this using different strategies: reduction of memory requirements, reduction of running time, and reduction of man-hours.

    Exploiting BSP Abstractions for Compiler Based Optimizations of GPU Applications on multi-GPU Systems

    Graphics Processing Units (GPUs) are accelerators for computers and provide massive amounts of computational power and bandwidth for amenable applications. While effectively utilizing an individual GPU already requires a high level of skill, effectively utilizing multiple GPUs introduces completely new types of challenges. This work sets out to investigate how the hierarchical execution model of GPUs can be exploited to simplify the utilization of such multi-GPU systems. The investigation starts with an analysis of the memory access patterns exhibited by applications from common GPU benchmark suites. Memory access patterns are collected using custom instrumentation, and a simple simulation then analyzes the patterns and identifies implicit communication across the different levels of the execution hierarchy. The analysis reveals that, for most GPU applications, memory accesses are highly localized and the workload can be partitioned so that the communication volume grows more slowly than the aggregate bandwidth as the number of GPUs increases. Next, an application model based on Z-polyhedra is derived that formalizes the distribution of work across multiple GPUs and allows the identification of data dependencies. The model is then used to implement a prototype compiler that consumes single-GPU programs and produces executables that distribute GPU workloads across all available GPUs in a system. It uses static analysis to identify memory access patterns, and polyhedral code generation in combination with a dynamic tracking system to efficiently resolve data dependencies. The prototype is implemented as an extension to the LLVM/Clang compiler and published in full source. The prototype compiler is then evaluated using a set of benchmark applications. While the prototype is limited in its applicability by technical issues, it provides impressive speedups of up to 12.4x on 16 GPUs for amenable applications. An in-depth analysis of the application runtime reveals that dependency resolution takes up less than 10% of the runtime, often significantly less. A discussion follows that puts the work into context by presenting and differentiating related work, reflecting critically on the work itself, and giving an outlook on aspects that could be explored further. The work concludes with a summary and a closing opinion.
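
    The locality observation can be illustrated with a toy calculation: for a slab-partitioned stencil workload, halo traffic grows with the number of partition boundaries while aggregate bandwidth grows with the number of GPUs. The figures below are illustrative, not measurements from the work.

```python
# Toy sketch: halo communication volume vs. aggregate bandwidth for a
# 1D slab partition of a 2D stencil workload across several GPUs.
def stencil_partition_stats(n=8192, halo=1, bytes_per_cell=4, gpu_bandwidth_gb=900):
    for n_gpus in (1, 2, 4, 8, 16):
        # Each internal boundary exchanges 'halo' rows in both directions.
        internal_boundaries = n_gpus - 1
        comm_bytes = 2 * internal_boundaries * halo * n * bytes_per_cell
        aggregate_bw = n_gpus * gpu_bandwidth_gb
        print(f"{n_gpus:2d} GPUs: halo traffic {comm_bytes / 1e6:7.2f} MB/step, "
              f"aggregate bandwidth {aggregate_bw:5d} GB/s")

stencil_partition_stats()
```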

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision-making. The definition of real time depends on the application under study, with response times ranging from a few μs up to several hours in the case of very compute-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and on specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH detectors when it is not possible to obtain reconstruction seeds from external trackers.
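
    For context, a generic seedless ring fit on detector hits can be performed with an algebraic least-squares circle fit (Kåsa method), sketched below; this is not the GAP trigger algorithm itself, and the data are synthetic.

```python
# Minimal seedless ring fit on hit coordinates using an algebraic (Kasa)
# least-squares circle fit (generic technique, illustrative only).
import numpy as np

def fit_ring(x, y):
    """Fit a circle x^2 + y^2 + D*x + E*y + F = 0 to hit coordinates."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    xc, yc = -D / 2.0, -E / 2.0
    radius = np.sqrt(xc**2 + yc**2 - F)
    return xc, yc, radius

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    phi = rng.uniform(0, 2 * np.pi, 32)
    x = 3.0 + 11.0 * np.cos(phi) + 0.05 * rng.normal(size=32)
    y = -1.0 + 11.0 * np.sin(phi) + 0.05 * rng.normal(size=32)
    print(fit_ring(x, y))   # roughly (3.0, -1.0, 11.0)
```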

    Interstitial-Scale Modeling of Packed-Bed Reactors

    Packed beds are common to adsorption scrubbers, packed-bed reactors, and trickle-bed reactors widely used across the petroleum, petrochemical, and chemical industries. The microstructure of these packed beds is generally very complex and has a tremendous influence on heat, mass, and momentum transport phenomena at the micro and macro length scales within the bed. On the reactor scale, bed geometry strongly influences overall pressure drop, residence time distribution, and conversion of species through domain-fluid interactions. On the interstitial scale, particle boundary layer formation, fluid-to-particle mass transfer, and local mixing are controlled by the turbulence and dissipation existing around packed particles. In the present research, a CFD model is developed using OpenFOAM (www.openfoam.org) to directly resolve momentum and scalar transport in both laminar and turbulent flow fields, where the interstitial velocity field is resolved using the Navier-Stokes equations (i.e., no pseudo-continuum-based assumptions). A discussion detailing the process of generating the complex domain using a Monte-Carlo packing algorithm is provided, along with the relevant details required to generate an arbitrary polyhedral mesh describing the packed bed. Lastly, an algorithm coupling OpenFOAM with a linear system solver using the graphics processing unit (GPU) computing paradigm was developed and will be discussed in detail.
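
    Below is a minimal sketch of a Monte-Carlo packing step of the random-sequential-addition type often used to build such bed geometries; the actual domain generation and OpenFOAM meshing pipeline are more elaborate, and all parameters here are illustrative.

```python
# Minimal random-sequential-addition sphere packing in a rectangular box
# (illustrative only; real packing algorithms handle settling, walls, etc.).
import numpy as np

def pack_spheres(n_target, radius, box, max_tries=100000, seed=0):
    """Place non-overlapping spheres of equal radius inside a rectangular box."""
    rng = np.random.default_rng(seed)
    centers = []
    tries = 0
    while len(centers) < n_target and tries < max_tries:
        tries += 1
        # Propose a random center that keeps the sphere fully inside the box
        c = rng.uniform(radius, np.array(box) - radius)
        # Accept only if it does not overlap any previously placed sphere
        if all(np.linalg.norm(c - p) >= 2 * radius for p in centers):
            centers.append(c)
    return np.array(centers)

if __name__ == "__main__":
    beads = pack_spheres(n_target=200, radius=0.05, box=(1.0, 1.0, 2.0))
    print(f"placed {len(beads)} spheres")
```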