
    Applications in GNSS water vapor tomography

    Algebraic reconstruction algorithms are iterative algorithms used in many areas, including medicine, seismology and meteorology. These algorithms are known to be highly computationally intensive, which can be especially troublesome for real-time applications or when they run on conventional low-cost personal computers. One such real-time application is the reconstruction of water vapor images from Global Navigation Satellite System (GNSS) observations. Parallelizing algebraic reconstruction algorithms has the potential to significantly reduce the required resources, making it possible to obtain valid solutions in time for use in weather nowcasting and forecasting models. The main objective of this dissertation was to present and analyse several shared-memory libraries and techniques, on both CPU and GPU, for algebraic reconstruction algorithms. It was concluded that parallelization pays off compared with sequential implementations. Overall, the GPU implementations were found to be only slightly faster than the CPU implementations, depending on the size of the problem being studied. A secondary objective was to develop software that performs GNSS water vapor reconstruction using the implemented parallel algorithms. This software was developed successfully and tested with both synthetic and real data; the preliminary results were satisfactory. This dissertation was written in the Space & Earth Geodetic Analysis Laboratory (SEGAL) and was carried out in the framework of the Structure of Moist convection in high-resolution GNSS observations and models (SMOG) project (PTDC/CTE-ATM/119922/2010), funded by Fundação para a Ciência e a Tecnologia (FCT).
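
    As a rough illustration of what an algebraic reconstruction algorithm does, the sketch below implements the classic row-action ART (Kaczmarz) update for a linear system A x = b. In a tomography setting, each row of A would hold the lengths of one ray path through the voxels and b the integrated observations along those rays. This is a minimal pure-Python sketch under that assumption, not the dissertation's implementation; the function name `art`, the relaxation parameter `lam` and the toy system are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def art(a: Matrix, b: Vector, lam: float = 1.0, sweeps: int = 50) -> Vector:
    """Kaczmarz-style ART: project the estimate onto each ray's hyperplane in turn."""
    n = len(a[0])
    x = [0.0] * n  # start from an empty (zero) field
    for _ in range(sweeps):
        for row, bi in zip(a, b):
            norm2 = sum(v * v for v in row)
            if norm2 == 0.0:
                continue  # a ray that crosses no voxel carries no information
            residual = bi - sum(r * xj for r, xj in zip(row, x))
            step = lam * residual / norm2
            x = [xj + step * r for xj, r in zip(x, row)]
    return x


if __name__ == "__main__":
    # Toy system: two "rays" through two "voxels"; the exact solution is [1, 2].
    A = [[1.0, 1.0], [1.0, -1.0]]
    b = [3.0, -1.0]
    print([round(v, 6) for v in art(A, b)])
```

    Because each update uses the estimate produced by the previous ray, plain ART is sequential across rays; parallel versions typically parallelize the work within each row, or switch to simultaneous variants (such as SIRT) that process all rays of an iteration independently.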

    Development of a low-level, algebra-based library to provide platform portability on hybrid supercomputers

    Continuous enhancement in hardware technologies enables scientific computing to advance incessantly and reach further aims. Since the start of the global race for exascale high-performance computing, massively parallel devices of various architectures have been incorporated into the newest supercomputers, leading to an increasing hybridization of compute nodes. In this context of accelerated innovation, software portability and efficiency become crucial. Traditionally, scientific computing software based on mesh methods relies on calculations in iterative stencil loops over a discretized geometry (the mesh). Despite being intuitive and versatile, the interdependency between algorithms and their computational implementations in stencil applications usually results in a large number of subroutines and introduces unavoidable complexity when it comes to portability and sustainability. An alternative is to break the interdependency between the algorithm and its implementation, and to cast the calculations into a minimalist set of kernels. Algebra-based implementations rely on a reduced set of basic linear algebra subroutines, which simplifies the deployment of software in hybrid computing systems. In this work, we tackle the development of a fully portable algebraic library that can be coupled beneath other high-level, algebra-oriented frameworks. In particular, this library provides platform portability in the simplest possible manner (i.e., the user develops applications in a purely sequential style). Internally, algebraic objects are distributed among computing devices using a multilevel decomposition approach. Data exchanges between computing units or between nodes are hidden by a multithreaded overlapping scheme.
    The work of X.A.F., A.A.B., A.O., and F.X.T. has been financially supported by the following R+D projects: RETOtwin (PDC2021-120970-I00), given by MCIN/AEI/10.13039/501100011033 and European Union Next Generation EU/PRTR; and FusionCAT (001-P-001722), given by Generalitat de Catalunya RIS3CAT-FEDER. X.A.F. has also been supported by a predoctoral contract (2019FI B2-00076) from the Government of Catalonia. A.A.B. has also been supported by the predoctoral grants DIN2018-010061 and 2019-DI-90, given by MCIN/AEI/10.13039/501100011033 and the Catalan Agency for Management of University and Research Grants (AGAUR), respectively. The work of A.G. has been funded by the Russian Science Foundation, project 19-11-00299. The studies in this work were carried out using computational resources of the Barcelona Supercomputing Center (IM-2020-3-0030 and IM-2022-1-0015). The authors thankfully acknowledge these institutions. Peer reviewed. Postprint (published version).
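
    To illustrate what casting a stencil loop into a minimal algebraic kernel means, the sketch below applies a 1D three-point Laplacian twice: once as a hand-written loop over the mesh, and once as a generic sparse matrix-vector product (SpMV) on a matrix assembled in CSR format. It is a hypothetical toy, not the library's API; the CSR layout, the Dirichlet boundary choice and the names `stencil_loop`, `laplacian_csr` and `spmv` are assumptions made for illustration.

```python
from typing import List, Tuple

CSR = Tuple[List[float], List[int], List[int]]  # (values, column indices, row pointers)


def stencil_loop(u: List[float]) -> List[float]:
    """Mesh-style loop: 1D Laplacian with homogeneous Dirichlet boundaries (u = 0 outside)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(left - 2.0 * u[i] + right)
    return out


def laplacian_csr(n: int) -> CSR:
    """Assemble the same operator once, as a sparse matrix in CSR format."""
    values, cols, rowptr = [], [], [0]
    for i in range(n):
        for j, w in ((i - 1, 1.0), (i, -2.0), (i + 1, 1.0)):
            if 0 <= j < n:
                values.append(w)
                cols.append(j)
        rowptr.append(len(values))
    return values, cols, rowptr


def spmv(a: CSR, x: List[float]) -> List[float]:
    """Generic sparse matrix-vector product: the only kernel a backend must port."""
    values, cols, rowptr = a
    return [
        sum(values[k] * x[cols[k]] for k in range(rowptr[i], rowptr[i + 1]))
        for i in range(len(rowptr) - 1)
    ]


if __name__ == "__main__":
    u = [0.0, 1.0, 4.0, 9.0, 16.0]
    print(stencil_loop(u))
    print(spmv(laplacian_csr(len(u)), u))
```

    Both calls produce the same values; the difference is that the second version keeps all mesh-specific logic in the one-off assembly step, so only `spmv` (and a few vector kernels) would need device-specific implementations.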

    Towards extending the SWITCH platform for time-critical, cloud-based CUDA applications: Job scheduling parameters influencing performance

    SWITCH (Software Workbench for Interactive, Time Critical and Highly self-adaptive cloud applications) allows for the development and deployment of real-time applications in the cloud, but it does not yet support instances backed by Graphics Processing Units (GPUs). To explore how SWITCH might support CUDA (NVIDIA's GPU computing platform) in the future, we have undertaken a review of time-critical CUDA applications and found that run-time requirements (which we call 'wall time') are in many cases regarded as the most important. We have performed experiments to investigate which parameters have the greatest impact on wall time when running multiple GPU-backed Amazon Web Services instances. Although up to 8 single-GPU instances can be launched in a single Amazon Region, launching just 2 instances rather than 1 gives a 42% decrease in wall time. In addition, instances often sit idle, and there is a moderately strong relationship between how problems are distributed across instances and wall time. These findings can be used to enhance SWITCH's provision for specifying Non-Functional Requirements (NFRs), so that GPU-backed instances could be supported in the future. They can also be used more generally, to optimise the balance between the computational resources needed and the resulting wall time.
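
    The link between how problems are distributed across instances and wall time can be illustrated with a simple makespan model: if independent jobs are split across instances, wall time is set by the most heavily loaded instance. The sketch below is a hypothetical illustration, not the experimental setup of the study: job durations are made-up numbers, start-up and data-transfer costs are ignored, and it compares a naive round-robin split with the well-known longest-processing-time-first (LPT) greedy heuristic.

```python
import heapq
from typing import List


def round_robin(jobs: List[float], instances: int) -> float:
    """Assign job i to instance i mod k; return the resulting wall time (makespan)."""
    loads = [0.0] * instances
    for i, t in enumerate(jobs):
        loads[i % instances] += t
    return max(loads)


def lpt(jobs: List[float], instances: int) -> float:
    """Greedy LPT: give the next-longest job to the currently least-loaded instance."""
    heap = [(0.0, k) for k in range(instances)]  # (current load, instance id)
    heapq.heapify(heap)
    for t in sorted(jobs, reverse=True):
        load, k = heapq.heappop(heap)
        heapq.heappush(heap, (load + t, k))
    return max(load for load, _ in heap)


if __name__ == "__main__":
    jobs = [8.0, 1.0, 7.0, 1.0, 6.0, 1.0]  # hypothetical job durations (minutes)
    for k in (1, 2, 3):
        print(k, round_robin(jobs, k), lpt(jobs, k))
```

    With these numbers, two instances under round-robin still take 21 minutes (one instance gets all the long jobs while the other mostly idles), whereas LPT brings the makespan down to 13; adding instances only helps if the work is actually rebalanced.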

    Novel high performance techniques for high definition computer aided tomography

    Mención Internacional en el título de doctor (International Doctorate Mention). Medical image processing is an interdisciplinary field in which multiple research areas are involved: image acquisition, scanner design, image reconstruction algorithms, visualization, etc. X-ray Computed Tomography (CT) is a medical imaging modality based on the attenuation that X-rays undergo as they pass through the body. Intrinsic differences in the attenuation properties of bone, air, and soft tissue result in high-contrast images of anatomical structures. The main objective of CT is to obtain tomographic images from radiographs acquired using X-ray scanners. The process of building a 3D image or volume from the 2D radiographs is known as reconstruction. One of the latest trends in CT is reducing the radiation dose delivered to patients by decreasing the amount of acquired data. This reduction results in artefacts in the final images if conventional reconstruction methods are used, making it advisable to employ iterative reconstruction algorithms. Of the numerous reconstruction algorithms available, two types stand out: traditional algorithms, which are fast but cannot produce high-quality images when data are limited; and iterative algorithms, which are slower but more reliable when traditional methods do not meet image-quality requirements. One priority in reconstruction is obtaining the final images in near real time, in order to reduce the time spent on diagnosis. To accomplish this objective, new high-performance techniques and methods for accelerating these types of algorithms are needed. This thesis addresses the challenges of both traditional and iterative reconstruction algorithms, regarding acceleration and image quality. One common approach for accelerating these algorithms is the use of shared-memory and heterogeneous architectures.
In this thesis, we propose a novel simulation/reconstruction framework, namely FUX-Sim. This framework follows the hypothesis that the development of new flexible X-ray systems can benefit from computer simulations, which may also enable performance to be checked before expensive real systems are implemented. Its modular design abstracts the complexities of programming for accelerated devices to facilitate the development and evaluation of the different configurations and geometries available. In order to obtain near-real-time execution, low-level optimizations for the main components of the framework are provided for Graphics Processing Unit (GPU) architectures. Another alternative tackled in this thesis is the acceleration of iterative reconstruction algorithms using distributed-memory architectures. We present a novel architecture that unifies the two most important computing paradigms for scientific computing nowadays: High Performance Computing (HPC) and Big Data. The proposed architecture combines Big Data frameworks with the advantages of accelerated computing. The methods presented in this thesis provide more flexible scanner configurations, as they offer an accelerated solution. Regarding performance, our approach is as competitive as the solutions found in the literature. Additionally, we demonstrate that our solution scales with the size of the problem, enabling the reconstruction of high-resolution images.
This work was mainly funded by an FPU fellowship (FPU14/03875) from the Spanish Ministry of Education. It was also partially supported by other grants:
• DPI2016-79075-R, "Nuevos escenarios de tomografía por rayos X" (New scenarios for X-ray tomography), from the Spanish Ministry of Economy and Competitiveness.
• TIN2016-79637-P, Towards unification of HPC and Big Data Paradigms, from the Spanish Ministry of Economy and Competitiveness.
• Short-term scientific missions (STSM) grant from NESUS COST Action IC1305.
• TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems, from the Spanish Ministry of Economy and Competitiveness.
• RTC-2014-3028-1 NECRA, Nuevos escenarios clinicos con radiología avanzada (New clinical scenarios with advanced radiology), from the Spanish Ministry of Economy and Competitiveness.
Programa Oficial de Doctorado en Ciencia y Tecnología Informática (Official Doctoral Programme in Computer Science and Technology). Committee: Chair, José Daniel García Sánchez; Secretary, Katzlin Olcoz Herrero; Member, Domenico Tali
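
    One common way to distribute iterative reconstruction over many workers is to use a simultaneous update such as SIRT, in which each worker handles a block of rays and the partial backprojections are summed, a map-reduce pattern that fits Big Data frameworks. The abstract does not say which iterative algorithm the thesis distributes, so SIRT is a swapped-in textbook example here. The sketch below is a single-process, pure-Python illustration of that decomposition under the assumption of a non-negative ray/voxel matrix; the names `sirt`, `partial_backprojection`, `add` and `blocks` are hypothetical.

```python
from functools import reduce
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def partial_backprojection(rows: Matrix, b: Vector, x: Vector) -> Vector:
    """'Map' step for one block of rays: row-normalised residuals, back-projected.

    Assumes non-negative entries (ray lengths), so row sums are positive.
    """
    acc = [0.0] * len(x)
    for row, bi in zip(rows, b):
        row_sum = sum(row)
        if row_sum == 0.0:
            continue  # ray misses the grid entirely
        r = (bi - sum(a * xj for a, xj in zip(row, x))) / row_sum
        for j, a in enumerate(row):
            acc[j] += a * r
    return acc


def add(u: Vector, v: Vector) -> Vector:
    return [ui + vi for ui, vi in zip(u, v)]


def sirt(a: Matrix, b: Vector, blocks: Sequence[range],
         lam: float = 1.0, iters: int = 200) -> Vector:
    """SIRT: x <- x + lam * C A^T R (b - A x), with the rays split into blocks."""
    n = len(a[0])
    col_sums = [sum(row[j] for row in a) for j in range(n)]
    x = [0.0] * n
    for _ in range(iters):
        # Blocks are independent given x, so each could run on a separate worker.
        parts = [partial_backprojection([a[i] for i in blk], [b[i] for i in blk], x)
                 for blk in blocks]
        total = reduce(add, parts)  # 'reduce' step: sum the partial backprojections
        x = [xj + lam * t / c if c else xj for xj, t, c in zip(x, total, col_sums)]
    return x


if __name__ == "__main__":
    A = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # toy ray/voxel matrix
    b = [3.0, 1.0, 2.0]                         # consistent with x = [1, 2]
    print(sirt(A, b, blocks=[range(0, 2), range(2, 3)]))
```

    Because the per-iteration result is just a sum over blocks, the partitioning of rays does not change the answer (up to floating-point rounding), which is what makes this kind of update convenient to scale out.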

    Towards a methodology for creating time-critical, cloud-based CUDA applications

    CUDA has been used in many different application domains, not all of which are related to image processing. There is an opportunity to use multiple and/or distributed CUDA resources in cloud facilities such as Amazon Web Services (AWS) in order to obtain enhanced processing power and to satisfy time-critical requirements that cannot be met using a single CUDA resource. In particular, this would provide an enhanced ability to process Big Data, especially in conjunction with distributed file systems, for example. In this paper, we present a survey of time-critical CUDA applications, identifying requirements and concepts that they tend to have in common. In particular, we investigate the terminology used for Quality of Service metrics, and present a taxonomy that summarises the underlying concepts and maps these terms to the diverse terminology used. We also survey typical requirements for developing, deploying and managing such applications. Given these requirements, we consider how the SWITCH platform can, in principle, support the entire life-cycle of time-critical CUDA application development and cloud deployment, and identify specific extensions that would be needed to fully support this particular class of time-critical cloud applications.

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
    Comment: Submitted to Parallel Computing, Elsevier.
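
    To make GPU run-time code generation concrete, the following sketch generates CUDA C source as a Python string, specialised at run time for a given block size and scale factor, and then optionally compiles it with PyCUDA. The kernel `scale`, the template and the helper names are hypothetical; only `pycuda.autoinit`, `pycuda.compiler.SourceModule` and `get_function` are PyCUDA's own, and the compile step assumes PyCUDA and an NVIDIA GPU are available.

```python
from string import Template

# Hypothetical kernel template: the block size and the scale factor are
# substituted into the CUDA source as literal constants at run time.
KERNEL_TEMPLATE = Template("""
__global__ void scale(float *out, const float *in, int n)
{
    int i = blockIdx.x * ${block_size} + threadIdx.x;
    if (i < n) out[i] = ${factor}f * in[i];
}
""")


def generate_scale_kernel(block_size: int, factor: float) -> str:
    """Generate CUDA C source specialised for one block size and scale factor."""
    return KERNEL_TEMPLATE.substitute(block_size=int(block_size), factor=repr(float(factor)))


def compile_scale_kernel(block_size: int, factor: float):
    """Compile the generated source with PyCUDA (needs PyCUDA and an NVIDIA GPU).

    The imports are deferred so that source generation works without either.
    """
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
    from pycuda.compiler import SourceModule

    module = SourceModule(generate_scale_kernel(block_size, factor))
    return module.get_function("scale")


if __name__ == "__main__":
    print(generate_scale_kernel(256, 2.0))
```

    With PyCUDA available, the compiled function would be launched with `block=(block_size, 1, 1)`, a `grid` large enough to cover `n` elements, and `n` passed as `numpy.int32`. Because the block size is baked into the index computation, the launch configuration must match the value used at generation time: the script, not the compiler, is responsible for that consistency.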

    PC-grade parallel processing and hardware acceleration for large-scale data analysis

    Arguably, modern graphics processing units (GPUs) are the first commodity desktop parallel processors. Although GPU programming originated in interactive rendering for graphical applications such as computer games, researchers in the field of general-purpose computation on GPUs (GPGPU) are showing that the power, ubiquity and low cost of GPUs make them an ideal alternative platform for high-performance computing. This has led to extensive exploration of GPUs for accelerating general-purpose computations in many engineering and mathematical domains outside graphics. However, because of the development complexity caused by graphics-oriented concepts and development tools for GPU programming, GPGPU has so far mainly been discussed in the academic domain and has not yet fully delivered on its promise in the real world. This thesis aims to exploit GPGPU in the practical engineering domain and presents a novel contribution to GPGPU-driven linear time-invariant (LTI) systems, which are employed by signal processing techniques in stylus-based or optical surface metrology and data processing. The core contributions of this project can be summarized as follows. Firstly, a thorough survey of the state of the art of GPGPU applications and their development approaches has been carried out. In addition, the category of parallel architecture pattern to which GPGPU belongs has been specified, which formed the foundation of the GPGPU programming framework design in the thesis. Following this specification, a GPGPU programming framework is derived as a general guideline for the various GPGPU programming models that are applied to a large diversity of algorithms in scientific computing and engineering applications.
Considering the evolution of GPU hardware architecture, the proposed frameworks cover the transition from graphics-originated concepts for GPGPU programming on legacy GPUs to the abstraction of the stream processing pattern represented by the Compute Unified Device Architecture (CUDA), in which the GPU is considered not only a graphics device but also a streaming coprocessor of the CPU. Secondly, the proposed GPGPU programming framework is applied to practical engineering applications, namely surface metrological data processing and image processing, to generate programming models that carry out parallel computing for the corresponding algorithms. The acceleration performance of these models is evaluated in terms of speed-up factor and data accuracy, which enabled the generation of quantifiable benchmarks for evaluating consumer-grade parallel processors. The results show that the GPGPU applications outperform the CPU solutions by up to 20 times without significant loss of data accuracy or any noticeable increase in source code complexity, which further validates the effectiveness of the proposed general GPGPU programming framework. Thirdly, this thesis devised methods for carrying out result visualization directly on the GPU, storing processed data in local GPU memory and making use of the GPU's rendering features to achieve real-time interaction. The algorithms employed in this thesis include various filtering techniques, the discrete wavelet transform and the fast Fourier transform, which cover the common operations implemented in most LTI systems in the spatial and frequency domains. Considering the hardware designs of the GPUs employed, especially the structure of their rendering pipelines, and the characteristics of the algorithms, the proposed series of GPGPU programming models has proven its feasibility, practicality and robustness in real engineering applications.
The developed GPGPU programming framework and programming models are anticipated to be adaptable to future consumer-level computing devices and other computationally demanding applications. In addition, it is envisaged that the principles and methods devised in the framework design are likely to have significant benefits outside the sphere of surface metrology. EThOS - Electronic Theses Online Service (GB, United Kingdom).
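
    The abstract relies on the fact that LTI filtering can be carried out either in the spatial domain (convolution) or in the frequency domain (pointwise multiplication of Fourier transforms). The following is a small, deliberately naive sketch of that equivalence for circular convolution in plain Python; it uses an O(N^2) DFT rather than an FFT for readability, and the signal and kernel values are arbitrary illustrative choices, not data from the thesis.

```python
import cmath
from typing import List, Sequence


def circular_convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Spatial domain: y[n] = sum_k x[k] * h[(n - k) mod N]."""
    n = len(x)
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (an FFT computes the same values faster)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for k in range(n))
           for i in range(n)]
    return [v / n for v in out] if inverse else out


def fourier_convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Frequency domain: multiply the spectra pointwise, then transform back."""
    spectrum = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(spectrum, inverse=True)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h = [1.0, 1.0, 0.0, 0.0]  # two-tap moving-sum kernel, zero-padded to len(x)
    print(circular_convolve(x, h))
    print([round(v, 9) for v in fourier_convolve(x, h)])
```

    Both functions assume `x` and `h` have the same length; for ordinary (non-circular) filtering, both signals would first be zero-padded to at least `len(x) + len(h) - 1` samples.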