27 research outputs found

    Seismic Wave Propagation Simulations on Low-power and Performance-centric Manycores

    The large processing requirements of seismic wave propagation simulations make High Performance Computing (HPC) architectures a natural choice for their execution. However, to keep both the current pace of performance improvements and the power consumption under a strict power budget, HPC systems must be more energy efficient than ever. As a response to this need, energy-efficient and low-power processors began to make their way into the market. In this paper we employ a novel low-power processor, the MPPA-256 manycore, to perform seismic wave propagation simulations. It has 256 cores connected by a NoC, no cache coherence and only a limited amount of on-chip memory. We describe how its particular architectural characteristics influenced our solution for an energy-efficient implementation. As a counterpoint to the low-power MPPA-256 architecture, we employ the Xeon Phi, a performance-centric manycore. Although both processors share some architectural similarities, the challenges of implementing an efficient seismic wave propagation kernel on these platforms are very different. In this work we compare the performance and energy efficiency of our implementations for these processors to proven and optimized solutions for other hardware platforms such as general-purpose processors and a GPU. Our experimental results show that the MPPA-256 has the best energy efficiency, consuming at least 77% less energy than the other evaluated platforms, whereas the performance of our solution for the Xeon Phi is on par with a state-of-the-art solution for GPUs.
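    For readers who want to see the computational core such simulations optimize, the sketch below shows one time step of a second-order acoustic wave-equation update on a regular 3D grid. It is an illustrative example only, not the paper's MPPA-256 or Xeon Phi kernel; the grid dimensions, array layout and function names are assumptions. On a memory-constrained manycore such as the MPPA-256, a kernel like this would additionally be tiled so that each sub-domain fits into the limited on-chip memory.

```c
/* Illustrative second-order acoustic wave update (a sketch, not the paper's code):
 *   p_next = 2*p - p_prev + (v*dt)^2 * laplacian(p)
 * computed with a 7-point stencil on a regular 3D grid. */
#include <stddef.h>

#define NX 128          /* assumed grid dimensions, chosen arbitrarily */
#define NY 128
#define NZ 128
#define IDX(i, j, k) ((size_t)(i) * NY * NZ + (size_t)(j) * NZ + (size_t)(k))

void wave_step(const float *restrict p, const float *restrict p_prev,
               float *restrict p_next, const float *restrict vel,
               float dt, float dx)
{
    const float c = (dt * dt) / (dx * dx);
    for (int i = 1; i < NX - 1; ++i)
        for (int j = 1; j < NY - 1; ++j)
            for (int k = 1; k < NZ - 1; ++k) {
                /* 7-point discrete Laplacian of the current pressure field */
                float lap = p[IDX(i + 1, j, k)] + p[IDX(i - 1, j, k)]
                          + p[IDX(i, j + 1, k)] + p[IDX(i, j - 1, k)]
                          + p[IDX(i, j, k + 1)] + p[IDX(i, j, k - 1)]
                          - 6.0f * p[IDX(i, j, k)];
                float v2 = vel[IDX(i, j, k)] * vel[IDX(i, j, k)];
                p_next[IDX(i, j, k)] = 2.0f * p[IDX(i, j, k)]
                                     - p_prev[IDX(i, j, k)] + c * v2 * lap;
            }
}
```

    Rotating the p_prev/p/p_next buffers between calls advances the simulation in time.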

    Leveraging performance of 3D finite difference schemes in large scientific computing simulations

    Gone are the days when engineers and scientists conducted most of their experiments empirically. For decades, actual tests were carried out in order to assess the robustness and reliability of forthcoming product designs and to prove theoretical models. With the advent of the computational era, scientific computing has definitely become a feasible alternative to empirical methods in terms of effort, cost and reliability. Large and massively parallel computational resources have reduced simulation execution times and improved their numerical results thanks to the refinement of the sampled domain. Several numerical methods coexist for solving Partial Differential Equations (PDEs). Methods such as Finite Elements (FE) and Finite Volumes (FV) are especially well suited for problems where unstructured meshes are frequent. Unfortunately, this flexibility is not bestowed for free: these schemes entail higher memory latencies due to the handling of irregular data accesses. Conversely, the Finite Difference (FD) scheme has proven to be an efficient solution for problems where structured meshes suit the domain requirements, and many scientific areas use it because of its higher performance.

    This thesis focuses on improving FD schemes to leverage the performance of large scientific computing simulations. Different techniques are proposed, such as the Semi-stencil, a novel algorithm that increases the FLOP/byte ratio for medium- and high-order stencil operators by reducing memory accesses and promoting data reuse. The algorithm is orthogonal and can be combined with techniques such as spatial or time blocking, adding further improvement. New trends in Symmetric Multi-Processing (SMP) systems, where tens of cores are replicated on the same die, pose new challenges due to the exacerbation of the memory wall problem. To alleviate this issue, our research focuses on different strategies to reduce pressure on the cache hierarchy, particularly when different threads share resources due to Simultaneous Multi-Threading (SMT). Several domain decomposition schedulers for load balancing are introduced, ensuring quasi-optimal results without jeopardizing the overall performance. We combine these schedulers with spatial-blocking and auto-tuning techniques, exploring the parametric space and reducing misses in the last-level cache.

    As an alternative to the brute-force methods used in auto-tuning, where a huge parametric space must be traversed to find a near-optimal candidate, performance models are a feasible solution. Performance models can predict performance on different architectures, selecting near-optimal parameters almost instantly. In this thesis, we devise a flexible and extensible performance model for stencils. The proposed model supports multi- and many-core architectures, including complex features such as hardware prefetchers, SMT contexts and algorithmic optimizations. Our model can be used not only to forecast execution time, but also to make decisions about the best algorithmic parameters; moreover, it can be included in run-time optimizers to decide the best SMT configuration based on the execution environment.

    Some industries rely heavily on FD-based techniques for their codes. Nevertheless, many cumbersome aspects arising in industry are still scarcely considered in academic research. In this regard, we have collaborated in the implementation of an FD framework which covers the most important features that an industrial HPC application must include. Some of the node-level optimization techniques devised in this thesis have been incorporated into the framework to contribute to the overall application performance. We show results for a couple of strategic industrial applications: an atmospheric transport model that simulates the dispersal of volcanic ash, and a seismic imaging model used in the Oil & Gas industry to identify hydrocarbon-rich reservoirs.
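    As a rough illustration of the kind of kernel the thesis targets, the sketch below applies a high-order, cross-shaped 3D stencil with simple spatial blocking over the two outer loop dimensions, of the sort used to keep the working set in the last-level cache. It is not the Semi-stencil algorithm itself, whose details go beyond this abstract; the block sizes and the 8th-order coefficients are assumptions of the kind an auto-tuner or a performance model would select and validate.

```c
/* Illustrative spatially-blocked 3D stencil sweep (a sketch, not the thesis code). */
#include <stddef.h>

#define R  4            /* stencil radius (8th-order central differences) */
#define BX 16           /* assumed block sizes: tunable parameters */
#define BY 32

/* assumed 8th-order second-derivative coefficients, center and +/-1..4 */
static const float w[R + 1] = {
    -2.847222e0f, 1.6e0f, -2.0e-1f, 2.539683e-2f, -1.785714e-3f };

static inline int min_int(int a, int b) { return a < b ? a : b; }

void stencil_blocked(const float *restrict in, float *restrict out,
                     int nx, int ny, int nz)
{
    const size_t sx = (size_t)ny * nz, sy = (size_t)nz, sz = 1;  /* strides */
    for (int ii = R; ii < nx - R; ii += BX)
      for (int jj = R; jj < ny - R; jj += BY)           /* block over i and j */
        for (int i = ii; i < min_int(ii + BX, nx - R); ++i)
          for (int j = jj; j < min_int(jj + BY, ny - R); ++j)
            for (int k = R; k < nz - R; ++k) {          /* stream along k */
                size_t c = (size_t)i * sx + (size_t)j * sy + (size_t)k * sz;
                float acc = 3.0f * w[0] * in[c];        /* center, once per axis */
                for (int r = 1; r <= R; ++r)
                    acc += w[r] * (in[c + r * sx] + in[c - r * sx]
                                 + in[c + r * sy] + in[c - r * sy]
                                 + in[c + r * sz] + in[c - r * sz]);
                out[c] = acc;
            }
}
```

    Time blocking, domain-decomposition scheduling and SMT-aware tuning, as discussed in the thesis, would be layered on top of a baseline sweep like this one.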

    Development, production and performance testing of a three axes CNC router

    Nowadays there are Computer Numerical Control (CNC) machines with up to 9 axes, which allow the fabrication of highly complex parts in a single machining operation. However, considering their cost and difficulty of operation, such machines are inadequate for most users. Even though there are very affordable options on the market, these are very limited in usable work volume and cutting capabilities. The present work focuses on the development and fabrication of a three-axis CNC router with a good performance/cost relation, where performance means the combination of a large work volume with good cutting precision and accuracy. The cutting precision and accuracy are verified through a set of experiments designed for that purpose. The results obtained show that users with experience in building equipment can, with relative ease, create CNC routers such as the one developed in this work. Additionally, a fourth axis was also considered for a future iteration of the machine.

    Low-power System-on-Chip Processors for Energy Efficient High Performance Computing: The Texas Instruments Keystone II

    The High Performance Computing (HPC) community recognizes energy consumption as a major problem. Extensive research is underway to identify means to increase the energy efficiency of HPC systems, including consideration of alternative building blocks for future systems. This thesis considers one such system, the Texas Instruments Keystone II, a heterogeneous Low-Power System-on-Chip (LPSoC) processor that combines a quad-core ARM CPU with an octa-core Digital Signal Processor (DSP). It was first released in 2012. Four issues are considered: i) maximizing the Keystone II ARM CPU performance; ii) implementation and extension of the OpenMP programming model for the Keystone II; iii) simultaneous use of ARM and DSP cores across multiple Keystone SoCs; and iv) an energy model for applications running on LPSoCs like the Keystone II and heterogeneous systems in general.

    Maximizing the performance of the ARM CPU on the Keystone II system is fundamental to adoption of this system by the HPC community and, more broadly, of the ARM architecture. Key to achieving good performance is exploitation of the ARM vector instructions. This thesis presents the first detailed comparison of the use of ARM compiler intrinsic functions with automatic compiler vectorization across four generations of ARM processors. Comparisons are also made with x86-based platforms and the use of equivalent Intel vector instructions.

    Implementation of the OpenMP programming model on the Keystone II system presents both challenges and opportunities: challenges in that the OpenMP model was originally developed for a homogeneous programming environment with a common instruction set architecture, and in 2012 work had only just begun to consider how OpenMP might work with accelerators; opportunities in that shared memory is accessible to all processing elements on the LPSoC, offering performance advantages over what typically exists with attached accelerators. This thesis presents an analysis of a prototype version of OpenMP implemented as a bare-metal runtime on the DSP of a Keystone I system. An implementation for the Keystone II that maps OpenMP 4.0 accelerator directives to OpenCL runtime library operations is presented and evaluated. Exploitation of some of the underlying hardware features of the Keystone II is also discussed.

    Simultaneous use of the ARM and DSP cores across multiple Keystone II boards is fundamental to the creation of commercially viable HPC offerings based on Keystone technology. The nCore BrownDwarf and HPE Moonshot systems represent two such systems. This thesis presents a proof-of-concept implementation of matrix multiplication (GEMM) for the BrownDwarf system. The BrownDwarf utilizes both Keystone II and Keystone I SoCs through a point-to-point interconnect called Hyperlink. Details are provided of how a novel message passing communication framework across Hyperlink was implemented to support this complex environment.

    An energy model that can predict energy usage as a function of what fraction of a particular computation is performed on each of the available compute devices offers the opportunity for making runtime decisions on how best to minimize energy usage. This thesis presents a basic energy usage model that considers rates of execution on each device and their active and idle power usages. Using this model, it is shown that only under certain conditions does there exist an energy-optimal work partition that uses multiple compute devices. To validate the model, a high-resolution energy measurement environment is developed and used to gather energy measurements for a matrix multiplication benchmark running on a variety of systems. The results presented support the model.

    Drawing on the four issues noted above and other developments that have occurred since the Keystone II system was first announced, the thesis concludes by making comments regarding the future of LPSoCs as building blocks for HPC systems.
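    The energy model mentioned above is only described qualitatively in this abstract, but its basic shape can be sketched as follows: a fraction alpha of the work runs on one device and the remainder runs concurrently on the other, each device being characterized by an execution rate and by active and idle power, with a device idling while it waits for the slower one to finish. The C sketch below uses assumed names and example numbers, not the thesis's actual equations or measurements; scanning the partition fraction shows whether an energy-optimal split strictly between 0 and 1 exists for a given pair of devices, which is the condition the thesis analyzes.

```c
/* Sketch of a two-device energy model under the assumptions stated above. */
#include <stdio.h>

typedef struct { double rate, p_active, p_idle; } device_t;  /* work/s, W, W */

/* Energy (J) to complete `work` units with fraction alpha on a, rest on b. */
static double energy(device_t a, device_t b, double work, double alpha)
{
    double ta = alpha * work / a.rate;          /* busy time on device A */
    double tb = (1.0 - alpha) * work / b.rate;  /* busy time on device B */
    double t  = ta > tb ? ta : tb;              /* makespan */
    return a.p_active * ta + a.p_idle * (t - ta)
         + b.p_active * tb + b.p_idle * (t - tb);
}

int main(void)
{
    device_t cpu = { 10.0, 8.0, 2.0 };   /* assumed example figures only */
    device_t dsp = { 25.0, 6.0, 1.5 };
    double best_alpha = 0.0, best_e = 1e300;
    for (double alpha = 0.0; alpha <= 1.0 + 1e-9; alpha += 0.01) {
        double e = energy(cpu, dsp, 1000.0, alpha);
        if (e < best_e) { best_e = e; best_alpha = alpha; }
    }
    printf("best split: %.0f%% of the work on the CPU, energy %.1f J\n",
           100.0 * best_alpha, best_e);
    return 0;
}
```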

    Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016) Sofia, Bulgaria

    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October 6-7, 2016.

    Sustainable 3D printing with copolyester-based polymers

    Integrated master's dissertation in Polymer Engineering. 3D printing has been growing over the last few years and is a recent method for developing plastic products. It provides several sustainability advantages: less waste during manufacturing, as it is an additive process; the ability to optimise geometries and create lightweight components that reduce material and energy consumption; and reduced waste thanks to the ability to produce parts from recycled material. The present research studies the influence of printing temperature, printing speed and percentage of recyclate on the mechanical properties, flow behaviour and morphology of virgin and recycled parts of copolyester-based polymers. The tests performed in this research covered the mechanical, morphological and thermal characterization of the materials used: low-, medium- and high-viscosity polyethylene terephthalate glycol (PETG). From the mechanical tests it was possible to conclude that increasing the printing temperature improves the mechanical properties, since there is better adhesion between layers than at low printing temperatures; optical microscopy of the parts subjected to the impact test confirmed this phenomenon. Regarding printing speed, although the results are very close, the parts printed at 10 mm/s showed better quality, probably because the layers solidified adequately and provided good adhesion between layers. Furthermore, regardless of the variation in printing temperature or printing speed, the measured density remained constant at 1.26 g/cm³. Regarding the percentage of recyclate, increasing the regrind fraction increases the melt flow index (MFI) and tends to decrease the mechanical properties, probably because irreversible changes were caused to the material. According to the thermal characterization, the glass transition temperature remained constant, and the degradation temperature of PETG increased slightly with the regrind fraction, although the difference was not significant.

    Software for Exascale Computing - SPPEXA 2016-2019

    This open access book summarizes the research done and the results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.

    Construction and testing of a metal additive manufacturing machine

    Additive Manufacturing is revolutionizing product development and production, as this technology offers numerous advantages: compared with conventional production processes, it can achieve highly complex, organic and optimized geometries thanks to its great geometric freedom. This work includes a brief description of Additive Manufacturing technology focused on Direct Metal Laser Sintering (DMLS), presenting the process, the materials used and the machine manufacturers available on the market. The goal of the project is to complete a metal additive manufacturing machine developed at the Departamento de Engenharia Mecânica of the University of Aveiro. With the machine's initial design previously defined and partially built, alterations and new additions were made in order to make it functional. At the control level, the initial electrical schematic was modified and some of the integrated components were parametrized, so that the machine could be controlled and experimental tests carried out. The results of these tests showed that all components integrated in the machine are functional. Master's in Mechanical Engineering.

    International student projects in a blended setting: How to facilitate problem-based project work
