Search CORE

1,385 research outputs found

Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

Author: Amr H. Hassan
Barsdell
Benjamin R. Barsdell
Christopher J. Fluke
David G. Barnes
Harris
Kirk
Larus
Nyland
Schaaf
Wayth
Publication venue: 'CSIRO Publishing'
Publication date: 26/08/2010
Field of study

General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplyfing the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.Comment: 13 pages, 5 figures, accepted for publication in PAS

arXiv.org e-Print Archive

Crossref

Swinburne Research Bank

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

BioEM: GPU-accelerated computing of Bayesian inference of electron microscopy images

Author: Baruffa Fabio
Cossio Pilar
Hummer Gerhard
Lindenstruth Volker
Rampp Markus
Rohr David
Publication venue: 'Elsevier BV'
Publication date: 21/09/2016
Field of study

In cryo-electron microscopy (EM), molecular structures are determined from large numbers of projection images of individual particles. To harness the full power of this single-molecule information, we use the Bayesian inference of EM (BioEM) formalism. By ranking structural models using posterior probabilities calculated for individual images, BioEM in principle addresses the challenge of working with highly dynamic or heterogeneous systems not easily handled in traditional EM reconstruction. However, the calculation of these posteriors for large numbers of particles and models is computationally demanding. Here we present highly parallelized, GPU-accelerated computer software that performs this task efficiently. Our flexible formulation employs CUDA, OpenMP, and MPI parallelization combined with both CPU and GPU computing. The resulting BioEM software scales nearly ideally both on pure CPU and on CPU+GPU architectures, thus enabling Bayesian analysis of tens of thousands of images in a reasonable time. The general mathematical framework and robust algorithms are not limited to cryo-electron microscopy but can be generalized for electron tomography and other imaging experiments

arXiv.org e-Print Archive

MPG.PuRe

Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance

Author: Cigale Matej
Jones Andrew
Knight Louise
Stefanic Polona
Taylor Ian
Publication venue: 'Elsevier BV'
Publication date: 01/11/2019
Field of study

SWITCH (Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) allows for the development and deployment of real-time applications in the cloud, but it does not yet support instances backed by Graphics Processing Units (GPUs). Wanting to explore how SWITCH might support CUDA (a GPU architecture) in the future, we have undertaken a review of time-critical CUDA applications, discovering that run-time requirements (which we call ‘wall time’) are in many cases regarded as the most important. We have performed experiments to investigate which parameters have the greatest impact on wall time when running multiple Amazon Web Services GPU-backed instances. Although a maximum of 8 single-GPU instances can be launched in a single Amazon Region, launching just 2 instances rather than 1 gives a 42% decrease in wall time. Also, instances are often wasted doing nothing, and there is a moderately-strong relationship between how problems are distributed across instances and wall time. These findings can be used to enhance the SWITCH provision for specifying Non-Functional Requirements (NFRs); in the future, GPU-backed instances could be supported. These findings can also be used more generally, to optimise the balance between the computational resources needed and the resulting wall time to obtain results

Online Research @ Cardiff

Efficient transfer entropy analysis of non-stationary neural time series

Author: Díaz-Pernas Francisco J.
Martínez-Zarzuela Mario
Vicente Raul
Wibral Michael
Wollstadt Patricia
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/07/2014
Field of study

Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. Especially the measure of information transfer, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy from two processes requires the observation of multiple realizations of these processes to estimate associated probability density functions. To obtain these observations, available estimators assume stationarity of processes to allow pooling of observations over time. This assumption however, is a major obstacle to the application of these estimators in neuroscience as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues theoretically showed that the stationarity assumption may be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble is often readily available in neuroscience experiments in the form of experimental trials. Thus, in this work we combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that deals with the increased computational demand of the ensemble method's practical application. In particular, we use a massively parallel implementation for a graphics processing unit to handle the computationally most heavy aspects of the ensemble method. We test the performance and robustness of our implementation on data from simulated stochastic processes and demonstrate the method's applicability to magnetoencephalographic data. While we mainly evaluate the proposed method for neuroscientific data, we expect it to be applicable in a variety of fields that are concerned with the analysis of information transfer in complex biological, social, and artificial systems.Comment: 27 pages, 7 figures, submitted to PLOS ON

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

Author: Bernstein Gilbert
DeVito Zachary
Fisher Matthew
Hanrahan Pat
Mara Michael
Nießner Matthias
Ragan-Kelley Jonathan
Theobalt Christian
Zollhöfer Michael
Publication venue
Publication date: 01/01/2016
Field of study

Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but their implementation in real code is tedious, especially when optimized for real-time performance on modern GPUs in interactive applications. In this work, we propose a new language, Opt (available under http://optlang.org), for writing these objective functions over image- or graph-structured unknowns concisely and at a high level. Our compiler automatically transforms these specifications into state-of-the-art GPU solvers based on Gauss-Newton or Levenberg-Marquardt methods. Opt can generate different variations of the solver, so users can easily explore tradeoffs in numerical precision, matrix-free methods, and solver approaches. In our results, we implement a variety of real-world graphics and vision applications. Their energy functions are expressible in tens of lines of code, and produce highly-optimized GPU solver implementations. These solver have performance competitive with the best published hand-tuned, application-specific GPU solvers, and orders of magnitude beyond a general-purpose auto-generated solver

arXiv.org e-Print Archive

MPG.PuRe

Image Stream Similarity Search in GPU Clusters

Author: Azevedo José Maria Pantoja Mata Vale e
Publication venue
Publication date: 01/01/2018
Field of study

Images are an important part of today’s society. They are everywhere on the internet and computing, from news articles to diverse areas such as medicine, autonomous vehicles and social media. This enormous amount of images requires massive amounts of processing power to process, upload, download and search for images. The ability to search an image, and find similar images in a library of millions of others empowers users with great advantages. Different fields have different constraints, but all benefit from the quick processing that can be achieved. Problems arise when creating a solution for this. The similarity calculation between several images, performing thousands of comparisons every second, is a challenge. The results of such computations are very large, and pose a challenge when attempting to process. Solutions for these problems often take advantage of graphs in order to index images and their similarity. The graph can then be used for the querying process. Creating and processing such a graph in an acceptable time frame poses yet another challenge. In order to tackle these challenges, we take advantage of a cluster of machines equipped with Graphics Processing Units (GPUs), enabling us to parallelize the process of describing an image visually and finding other images similar to it in an acceptable time frame. GPUs are incredibly efficient at processing data such as images and graphs, through algorithms that are heavily parallelizable. We propose a scalable and modular system that takes advantage of GPUs, distributed computing and fine-grained parallellism to detect image features, index images in a graph and allow users to search for similar images. The solution we propose is able to compare up to 5000 images every second. It is also able to query a graph with thousands of nodes and millions of edges in a matter of milliseconds, achieving a very efficient query speed. The modularity of our solution allows the interchangeability of algorithms and different steps in the solution, which provides great adaptability to any needs

Repositório da Universidade Nova de Lisboa