
    GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems

    While many of the architectural details of future exascale-class high performance computer systems are still a matter of intense research, there appears to be a general consensus that they will be strongly heterogeneous, featuring "standard" as well as "accelerated" resources. Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi. Any software infrastructure that claims usefulness for such environments must be able to meet their inherent challenges: massive multi-level parallelism, topology, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a collection of building blocks that targets algorithms dealing with sparse matrix representations on current and future large-scale systems. It implements the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel numerical kernels, intelligent resource management, and truly heterogeneous parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We describe the details of its design with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out, and their necessity is justified by performance measurements or predictions based on performance models. The library code and several applications are available as open source. We also provide instructions on how to make use of GHOST in existing software packages, together with a case study which demonstrates the applicability and performance of GHOST as a component within a larger software stack. (Comment: 32 pages, 11 figures)
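
    The "MPI+X" paradigm named above pairs message passing between processes with a second, node-local level of parallelism. As a rough, generic C++ sketch of that pattern (not GHOST's actual API; all names below are hypothetical), the following performs a sparse matrix-vector multiply on a row-distributed CSR matrix with MPI across ranks and OpenMP as the "X":

    // Generic MPI+X illustration (MPI across processes, OpenMP within each):
    // sparse matrix-vector multiply on a row-distributed CSR matrix.
    // Hypothetical sketch only; this is not GHOST's interface.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    // Local CSR block: each rank owns a contiguous range of matrix rows.
    struct CsrBlock {
        std::vector<int>    rowPtr;  // size = localRows + 1
        std::vector<int>    col;     // global column indices
        std::vector<double> val;
    };

    // y_local = A_local * x. For brevity x is fully replicated on every rank;
    // a GHOST-like library would exchange only the needed remote entries and
    // overlap that communication with the local part of the computation.
    void spmv(const CsrBlock& A, const std::vector<double>& xLocal,
              std::vector<double>& yLocal, const std::vector<int>& counts,
              const std::vector<int>& displs, MPI_Comm comm) {
        std::vector<double> xGlobal(displs.back() + counts.back());
        MPI_Allgatherv(xLocal.data(), (int)xLocal.size(), MPI_DOUBLE,
                       xGlobal.data(), counts.data(), displs.data(),
                       MPI_DOUBLE, comm);
        #pragma omp parallel for schedule(static)  // the "X" in MPI+X
        for (int i = 0; i < (int)yLocal.size(); ++i) {
            double sum = 0.0;
            for (int k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
                sum += A.val[k] * xGlobal[A.col[k]];
            yLocal[i] = sum;
        }
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, nRanks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nRanks);
        const int localRows = 4;  // toy problem: each rank owns 4 rows
        CsrBlock A;               // diagonal block, A(i,i) = 2 on owned rows
        A.rowPtr.push_back(0);
        for (int i = 0; i < localRows; ++i) {
            A.col.push_back(rank * localRows + i);
            A.val.push_back(2.0);
            A.rowPtr.push_back((int)A.col.size());
        }
        std::vector<double> x(localRows, 1.0), y(localRows, 0.0);
        std::vector<int> counts(nRanks, localRows), displs(nRanks);
        for (int r = 0; r < nRanks; ++r) displs[r] = r * localRows;
        spmv(A, x, y, counts, displs, MPI_COMM_WORLD);
        if (rank == 0) std::printf("y[0] = %.1f\n", y[0]);  // expect 2.0
        MPI_Finalize();
    }

    Compiled with, e.g., mpic++ -fopenmp, this runs on any number of ranks; the point is only the two-level MPI+X structure the abstract refers to.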

    Appraisal of Ancient Quarries and WWII Air Raids as Factors of Subsidence in Rome: A Geomatic Approach

    Ancient mining and quarrying activities left anthropogenic geomorphologies that have shaped the natural landscape and affected environmental equilibria. The artificial structures and their related effects on the surrounding environment are analyzed here to characterize the quarrying landscape in the southeast area of Rome in terms of its dimensions, typology, state of preservation and interface with the urban environment. The increased occurrence of sinkhole events in urban areas has already been scientifically correlated with ancient cavities under increasing urban pressure. In this scenario, additional interacting anthropogenic factors, such as the aerial bombardments perpetrated during the Second World War, are considered here. These three factors have been investigated by employing a combined geomatic methodology. Information on air raids has been organized in vector archives. A dataset of historical aerial photographs has been processed into Digital Surface Models and orthomosaics to reconstruct the quarry landscape and its evolution, identify typologies of exploitation and forms of collapse, and corroborate the discussion concerning the induced historical and recent subsidence phenomena, comparing these outputs with photogrammetric products obtained from recent satellite data. Geological and urbanistic characterization of the study area allowed a better connection between these historical and environmental factors. In light of the information gathered so far, SAR interferometric products allowed a preliminary interpretation of ground instabilities surrounding historical quarries, air raids and recent subsidence events. Various sub-areas of the area of interest (AOI) where the presence of the considered factors also corresponds to areas of slight subsidence in the SAR velocity maps have been highlighted. Bivariate hotspot analysis allowed us to substantiate the hypothesis of a spatial correlation between these multiple aspects.

    Improving the efficiency of the Energy-Split tool to compute the energy of very large molecular systems

    Master's dissertation in Informatics Engineering. The Energy-Split tool receives as input fragments of a very large molecular system and computes all intra- and inter-molecular energies, separately calculating the energies of each fragment and then the total energy of the molecule. It takes into account the connectivity information among atoms in a molecule to compute (i) the energy of all terms involving covalently bonded atoms, namely bonds, angles, dihedral angles, and improper angles, and (ii) the Coulomb and van der Waals energies, which are independent of the atoms' connections and have to be computed for every atom in the system. For each atom, Energy-Split computes the interaction energy with every other atom in the system, taking into account the partition of the molecule into fragments produced by the open-source program Visual Molecular Dynamics. The operations required to obtain the total energy of a large molecule are computationally intensive and call for an efficient high-performance computing approach to obtain results in an acceptable time. The original Energy-Split Tcl code was thoroughly analyzed and ported to a more efficient C++ version. New data structures with good data locality were defined to take advantage of the advanced features present in current laptop and server systems, including the vector extensions of the scalar processors, an efficient on-chip memory hierarchy, and the inherent parallelism of multicore devices. To go beyond the sequential variant, a parallel version was developed using auxiliary libraries. Both implementations were tested on different multicore devices and optimized to take full advantage of high-performance computing. Significant results were obtained by applying professional performance engineering approaches, namely (i) identifying the data values that can be represented as Boolean variables (such as the variables used in the auxiliary data structures of the traversal algorithm that computes the Euclidean distance between atoms), which led to significant performance improvements due to the reduced memory bottleneck (over 10 times faster), and (ii) using an adequate compressed format (CSR) to represent and operate on sparse matrices (namely the matrices of Euclidean distances between atom pairs, since all distances beyond the user-defined cut-off distance are considered zero, and these are the majority of the values). After the first code optimizations, the performance of the sequential version improved by around 100 times compared to the original version on a dual-socket server. The parallel version brought further improvements of up to 24 times, depending on the molecules tested, on the same server. The overall picture shows that the Energy-Split code is highly scalable, obtaining better results with larger molecule files, even though the arrangement of the atoms also influences the algorithm's performance. This work was supported by FCT (Fundação para a Ciência e Tecnologia) within project RDB-TS: Uma base de dados de reações químicas baseadas em informação de estados de transição derivados de cálculos quânticos (Refª BI2-2019_NORTE-01-0145-FEDER-031689_UMINHO), co-funded by the North Portugal Regional Operational Programme through the European Regional Development Fund.
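
    As a rough illustration of the cut-off and CSR ideas mentioned above, the C++ sketch below stores only the atom pairs whose Euclidean distance falls below a user-defined cut-off in a CSR-style neighbour structure, and then accumulates a Coulomb term over the stored pairs. This is a hypothetical simplification with invented names, not the Energy-Split code; the real tool also evaluates the van der Waals and bonded terms:

    // Cut-off/CSR sketch: distances beyond the cut-off are treated as zero,
    // so only near pairs are stored. Hypothetical names throughout.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Atom { double x, y, z, charge; };

    // CSR-style neighbour structure: the near neighbours of atom i are
    // nbr[ptr[i]..ptr[i+1]) with the matching distances in dist[...].
    struct NeighbourCsr {
        std::vector<int>    ptr, nbr;
        std::vector<double> dist;
    };

    NeighbourCsr buildNeighbours(const std::vector<Atom>& atoms, double cutoff) {
        NeighbourCsr n;
        n.ptr.push_back(0);
        for (size_t i = 0; i < atoms.size(); ++i) {
            for (size_t j = i + 1; j < atoms.size(); ++j) {
                double dx = atoms[i].x - atoms[j].x;
                double dy = atoms[i].y - atoms[j].y;
                double dz = atoms[i].z - atoms[j].z;
                double r = std::sqrt(dx*dx + dy*dy + dz*dz);
                if (r < cutoff) {              // beyond cutoff: implicit zero
                    n.nbr.push_back((int)j);
                    n.dist.push_back(r);
                }
            }
            n.ptr.push_back((int)n.nbr.size());
        }
        return n;
    }

    // Coulomb energy over the stored (near) pairs, in arbitrary units.
    double coulomb(const std::vector<Atom>& a, const NeighbourCsr& n) {
        double e = 0.0;
        #pragma omp parallel for reduction(+:e)  // parallel variant, as in the text
        for (int i = 0; i < (int)a.size(); ++i)
            for (int k = n.ptr[i]; k < n.ptr[i + 1]; ++k)
                e += a[i].charge * a[n.nbr[k]].charge / n.dist[k];
        return e;
    }

    int main() {
        std::vector<Atom> atoms = {{0,0,0,+1}, {1,0,0,-1}, {9,9,9,+1}};
        NeighbourCsr n = buildNeighbours(atoms, 2.5);
        std::printf("Coulomb energy: %f\n", coulomb(atoms, n));  // pair (0,1) only
    }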

    Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures

    We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow, and its implementation on heterogeneous platforms based on data-parallel accelerators such as NVIDIA GPUs and the Intel Xeon Phi. We explain in detail the parallelization of these methods and describe a number of optimizations, mainly focusing on improving memory management and reducing the cost of host-accelerator communication. As previous research has consistently shown, pure LBM simulations are able to achieve good performance results on heterogeneous systems thanks to the high parallel efficiency of this method. Unfortunately, when coupling the LBM and IB methods, the overheads of IB degrade the overall performance. As an alternative, we have explored different hybrid implementations that effectively hide such overheads and allow us to exploit both the multicore CPU and the hardware accelerator in a cooperative way, with excellent performance results.
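
    The cooperative hybrid scheme described above can be pictured with the following C++ sketch, in which an asynchronous worker (standing in for the accelerator) advances the bulk of the LBM lattice while the host concurrently updates the region affected by the Immersed Boundary, hiding the IB overhead behind the LBM computation. Plain threads and placeholder kernels stand in for a real host-device pipeline; this is an illustrative assumption, not the paper's implementation:

    // Host/"device" overlap sketch: disjoint lattice regions are updated
    // concurrently, so the IB work is hidden behind the bulk LBM work.
    #include <cstdio>
    #include <functional>
    #include <future>
    #include <vector>

    void lbmStreamCollide(std::vector<double>& f, size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i) f[i] *= 0.99;  // placeholder kernel
    }

    void ibForceSpreading(std::vector<double>& f, size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i) f[i] += 0.01;  // placeholder kernel
    }

    int main() {
        std::vector<double> lattice(1 << 20, 1.0);
        size_t split = lattice.size() * 9 / 10;  // ~90% bulk fluid, 10% near solid

        for (int step = 0; step < 10; ++step) {
            // The "device" works on the bulk region asynchronously...
            auto device = std::async(std::launch::async, lbmStreamCollide,
                                     std::ref(lattice), size_t{0}, split);
            // ...while the host updates the IB region, hiding its cost.
            lbmStreamCollide(lattice, split, lattice.size());
            ibForceSpreading(lattice, split, lattice.size());
            device.wait();                       // synchronize before next step
        }
        std::printf("lattice[0] = %f\n", lattice[0]);
    }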

    SusTrainable: Promoting Sustainability as a Fundamental Driver in Software Development Training and Education. 2nd Teacher Training, January 23-27, 2023, Pula, Croatia. Revised lecture notes

    This volume exhibits the revised lecture notes of the 2nd teacher training organized as part of the project Promoting Sustainability as a Fundamental Driver in Software Development Training and Education, held at the Juraj Dobrila University of Pula, Croatia, in the week of January 23-27, 2023. It is the Erasmus+ project No. 2020-1-PT01-KA203-078646 - SusTrainable. More details can be found at the project web site https://sustrainable.github.io/ Among the most important contributions of the project are two summer schools. The 2nd SusTrainable Summer School (SusTrainable-23) will be organized at the University of Coimbra, Portugal, in the week of July 10-14, 2023. The summer school will consist of lectures and practical work for master and PhD students in computing science and closely related fields. There will be contributions from Babeş-Bolyai University, Eötvös Loránd University, Juraj Dobrila University of Pula, Radboud University Nijmegen, Roskilde University, Technical University of Košice, University of Amsterdam, University of Coimbra, University of Minho, University of Plovdiv, University of Porto, and University of Rijeka. To prepare and streamline the summer school, the consortium organized a teacher training in Pula, Croatia. This was an event of five full days, organized by Tihana Galinac Grbac and Neven Grbac. The Juraj Dobrila University of Pula is very concerned with sustainability issues; its education, research and management are conducted with sustainability goals in mind. The contributions in the proceedings were reviewed and provide a good overview of the range of topics that will be covered at the summer school. The papers in the proceedings, as well as the very constructive and cooperative teacher training, guarantee the highest quality and a beneficial summer school for all participants. (Comment: 85 pages, 8 figures, 3 code listings and 1 table; editors: Tihana Galinac Grbac, Csaba Szabó, João Paulo Fernandes)

    Towards Cyberbullying-free social media in smart cities: a unified multi-modal approach

    Smart cities are shifting the presence of people from the physical world to the cyber world (cyberspace). Along with its benefits for societies, the troubles of the physical world, such as bullying, aggression and hate speech, are also taking their place emphatically in cyberspace. This paper aims to mine social media posts to identify bullying comments containing text as well as images. We propose a unified representation of text and image together that eliminates the need for separate learning modules for image and text. A single-layer Convolutional Neural Network model is used with this unified representation. A major finding of this research is that text represented as an image is a better model for encoding the information. We also found that a single-layer Convolutional Neural Network gives better results with the two-dimensional representation. In the current scenario, we used three layers of text and three layers of a colour image to represent the input, which gives a recall of 74% on the bullying class with one layer of the Convolutional Neural Network. Ministry of Electronics and Information Technology (MeitY), Government of India.

    A pilgrimage to gravity on GPUs

    In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is so popular these days that almost all papers about high-precision N-body simulations use methods that are accelerated by GPUs. With GPU hardware becoming more advanced and being used for more advanced algorithms such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics. (Comment: To appear in European Physical Journal "Special Topics": "Computer Simulations on Graphics Processing Units". 18 pages, 8 figures)

    Doctor of Philosophy

    Memory access irregularities are a major bottleneck for bandwidth-limited problems on Graphics Processing Unit (GPU) architectures. GPU memory systems are designed to allow consecutive memory accesses to be coalesced into a single memory access. Noncontiguous accesses within a parallel group of threads working in lock step may cause serialized memory transfers. Irregular algorithms may have data-dependent control flow and memory access, which require runtime information to be evaluated. Compile-time methods for evaluating parallelism, such as static dependence graphs, are not capable of evaluating irregular algorithms. The goals of this dissertation are to study irregularities within the context of unstructured mesh and sparse matrix problems, analyze the impact of vectorization widths on irregularities, and present data-centric methods that improve control flow and memory access irregularity within those contexts. Reordering associative operations has often been exploited for performance gains in parallel algorithms. This dissertation presents a method for associative reordering of stencil computations over unstructured meshes that increases data reuse through caching. This novel parallelization scheme offers considerable speedups over standard methods. Vectorization widths can have a significant impact on performance in vectorized computations. Although the hardware vector width is generally fixed, the logical vector width used within a computation can range from one up to the width of the computation. Significant performance differences can occur due to thread scheduling and resource limitations. This dissertation analyzes the impact of vectorization widths on dense numerical computations such as 3D dG postprocessing. It is difficult to efficiently perform dynamic updates on traditional sparse matrix formats. Explicitly controlling memory segmentation allows for in-place dynamic updates in sparse matrices. Dynamically updating the matrix without rebuilding or sorting greatly improves processing time and overall throughput. This dissertation presents a new sparse matrix format, dynamic compressed sparse row (DCSR), which allows for dynamic streaming updates to a sparse matrix. A new method for parallel sparse matrix-matrix multiplication (SpMM) that uses dynamic updates is also presented.
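
    The dynamic-update idea behind a DCSR-style format can be pictured as rows backed by fixed-capacity memory segments with explicit slack, so that new nonzeros are appended in place instead of triggering a rebuild or sort. The C++ fragment below is a hypothetical single-segment-per-row simplification, not the dissertation's exact DCSR layout:

    // Simplified dynamic sparse matrix: each row owns a preallocated segment
    // with explicit slack, so inserts are in-place appends. Hypothetical sketch.
    #include <cstdio>
    #include <vector>

    struct DynSparseRow {
        std::vector<int>    col;   // segment storage: capacity > current size
        std::vector<double> val;
        int used = 0;              // nonzeros currently stored in this row
    };

    struct DynSparseMatrix {
        std::vector<DynSparseRow> rows;
        DynSparseMatrix(int nRows, int segmentCap) : rows(nRows) {
            for (auto& r : rows) { r.col.resize(segmentCap); r.val.resize(segmentCap); }
        }
        // Streaming in-place update: O(1) append while slack remains.
        bool insert(int i, int j, double v) {
            DynSparseRow& r = rows[i];
            if (r.used == (int)r.col.size()) return false;  // segment full
            r.col[r.used] = j;
            r.val[r.used] = v;
            ++r.used;
            return true;
        }
    };

    int main() {
        DynSparseMatrix A(4, 8);   // 4 rows, 8-slot segment per row
        A.insert(0, 3, 1.5);       // dynamic updates, no rebuild or sort
        A.insert(0, 1, -2.0);
        std::printf("row 0 holds %d nonzeros\n", A.rows[0].used);
    }

    A full implementation would chain in additional segments when a row overflows and compact occasionally; the point here is only that streaming updates avoid rebuilding the whole matrix.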