Guided rewriting and constraint satisfaction for parallel GPU code generation
Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators either depend on user input to choose among a subset of hard-coded optimisations, or rely on automated exploration of the implementation search space. The former lacks extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise.
This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique that finds a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler infers tuning constraints and limits the search to valid tuning combinations only.
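As a toy illustration of the idea (the pattern, constraints, and memory budget below are invented for the sketch, not LIFT's actual API), constraints inferred from the patterns let the compiler prune a tuning space down to the valid combinations before any costly exploration begins:

```python
from itertools import product

LOCAL_MEM_BYTES = 32 * 1024   # assumed per-workgroup local memory budget
ELEM_BYTES = 4                # float32

def fits_in_local_memory(tile_m, tile_n):
    # Inferred constraint for a hypothetical tiled pattern: an input tile
    # plus one column of partial results must fit in local memory.
    return (tile_m * tile_n + tile_m) * ELEM_BYTES <= LOCAL_MEM_BYTES

def divides_problem(tile, size):
    # Tile sizes must evenly divide the problem dimension (a common validity rule).
    return size % tile == 0

M = N = 1024
candidates = list(product([8, 16, 32, 64, 128], repeat=2))
valid = [(m, n) for m, n in candidates
         if fits_in_local_memory(m, n)
         and divides_problem(m, M) and divides_problem(n, N)]
print(f"kept {len(valid)} of {len(candidates)} tuning points")
```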
Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewrites. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings.
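A minimal sketch of this reframing, under simplified assumptions (three loop dimensions, one invented reduction constraint, and brute-force enumeration standing in for a real constraint solver):

```python
from itertools import product

loop_dims = ["i", "j", "k"]
levels = ["workgroup", "thread", "sequential"]   # GPU scheduling levels

def valid(assignment):
    # Invented constraint: dimension "k" carries a reduction, so it cannot be
    # mapped to a parallel level without an extra combining step.
    if assignment["k"] != "sequential":
        return False
    # At least one dimension must actually be parallel.
    return any(lvl != "sequential" for lvl in assignment.values())

assignments = (dict(zip(loop_dims, choice))
               for choice in product(levels, repeat=len(loop_dims)))
for a in filter(valid, assignments):
    print(a)   # prints the 8 valid parallel mappings out of 27 candidates
```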
The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation.
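The flavour of a rewrite point can be sketched as follows (the node and parameter names are illustrative, not LIFT's real IR): a strongly typed node marks where a structural rewrite such as tiling may apply and exposes its design choices as explorable parameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TilingPoint:
    dim: str                     # loop dimension the rewrite applies to
    tile_sizes: tuple[int, ...]  # explorable parameter: candidate tile sizes

    def instances(self):
        # Each candidate tile size is one concrete rewrite the explorer may
        # pick; the node's type ensures only well-formed choices are expressible.
        return [(self.dim, t) for t in self.tile_sizes]

point = TilingPoint(dim="i", tile_sizes=(16, 32, 64))
print(point.instances())   # three concrete tiling rewrites to explore
```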
A comparison with the vendor-provided handwritten kernels of the ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis's contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with the state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
Machine Learning and Its Application to Reacting Flows
This open access book introduces and explains machine learning (ML) algorithms and techniques developed for statistical inference on complex processes or systems, and their application to simulations of chemically reacting turbulent flows. These two fields, ML and turbulent combustion, each have a large body of work and knowledge of their own, and this book brings them together and explains the complexities and challenges involved in applying ML techniques to simulate and study reacting flows. This matters for the world's total primary energy supply (TPES): more than 90% of this supply comes from combustion technologies, and combustion has non-negligible effects on the environment. Although alternative technologies based on renewable energies are coming up, their share of the TPES is currently less than 5%, and a complete paradigm shift would be needed to replace combustion sources. Whether this is practical or not is entirely a different question, and an answer to this question depends on the respondent. However, a pragmatic analysis suggests that the combustion share of TPES is likely to be more than 70% even by 2070. Hence, it is prudent to take advantage of ML techniques to improve combustion sciences and technologies so that efficient and "greener" combustion systems that are friendlier to the environment can be designed. The book covers the current state of the art in these two topics and outlines the challenges involved, as well as the merits and drawbacks of using ML for turbulent combustion simulations, including avenues that can be explored to overcome the challenges. The required mathematical equations and background are discussed, with ample references for readers who wish to find further detail. This book is unique in its coverage of topics, ranging from big data analysis and machine learning algorithms to their applications in combustion science and system design for energy generation.
Co-designing reliability and performance for datacenter memory
Memory is one of the key components that affects the reliability and performance of datacenter servers. Memory in today's servers is organized and shared in several ways to provide the most performant and efficient access to data. For example, cache hierarchies in multi-core chips reduce access latency, non-uniform memory access (NUMA) in multi-socket servers improves scalability, and disaggregation increases memory capacity. In all these organizations, hardware coherence protocols are used to maintain the consistency of this shared memory and implicitly move data to the requesting cores.
This thesis aims to provide fault-tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, this thesis explores solutions that can also enhance the performance of applications. The solutions build on modern coherence protocols to achieve these properties.
First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware-driven replication mechanism where data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error-detection capabilities so that when an error is detected, correction is performed using the replica. Dvé's organization offers two independent points of access to data, which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory, and (b) higher performance by providing another, nearer point of memory access. Dvé's coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on-demand at runtime.
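The detect-then-correct flow can be pictured with a minimal software sketch (purely illustrative: Dvé is a hardware mechanism, and the CRC below merely stands in for its strong error-detection code):

```python
import zlib

def write(block_id, data, primary, replica):
    # Coherent replication keeps both copies and their check codes in sync.
    record = (data, zlib.crc32(data))
    primary[block_id] = record
    replica[block_id] = record

def read(block_id, primary, replica):
    data, checksum = primary[block_id]
    if zlib.crc32(data) != checksum:          # error detected in primary copy
        data, checksum = replica[block_id]    # correct using the replica
        assert zlib.crc32(data) == checksum, "both copies corrupted"
        primary[block_id] = (data, checksum)  # repair the primary copy
    return data

primary, replica = {}, {}
write(0, b"cacheline", primary, replica)
primary[0] = (b"cacheXine", primary[0][1])    # simulate a fault in the primary
print(read(0, primary, replica))              # b'cacheline', recovered
```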
Next, we observe that the coherence protocol itself must be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the level of fault-tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers can fail or become unresponsive in the datacenter, and it is therefore important that the coherence protocol remain available in the presence of such failures.
The thesis proposes Āpta, a CXL-based, shared disaggregated memory system for keeping cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance, fault-tolerant, object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications.
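A minimal sketch of the availability problem and one way around it (illustrative only; the directory, membership view, and method names are invented for this example and are not Āpta's actual protocol): an invalidation-based design must not block forever waiting for acknowledgements from failed compute servers, so a server declared failed is purged from the sharer directory and writes proceed.

```python
class MemoryServer:
    def __init__(self):
        self.directory = {}   # object id -> set of sharer server ids
        self.live = set()     # membership view of healthy compute servers

    def mark_failed(self, server):
        # A membership service declares a compute server failed; purge it from
        # all directory entries so no future operation waits on its ack.
        self.live.discard(server)
        for sharers in self.directory.values():
            sharers.discard(server)

    def acquire_write(self, obj, requester):
        sharers = self.directory.setdefault(obj, set())
        # Only live sharers are sent invalidations and awaited (elided here);
        # a non-fault-tolerant protocol would wait forever on failed sharers.
        pending = (sharers & self.live) - {requester}
        sharers.clear()
        sharers.add(requester)
        return f"invalidated {sorted(pending)}; write granted to {requester}"

ms = MemoryServer()
ms.live = {"A", "B"}
ms.directory["obj1"] = {"A", "B"}
ms.mark_failed("B")                    # B crashes while caching obj1
print(ms.acquire_write("obj1", "A"))   # proceeds without waiting on B
```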
Cyber-Human Systems, Space Technologies, and Threats
CYBER-HUMAN SYSTEMS, SPACE TECHNOLOGIES, AND THREATS is our eighth textbook in a series covering the world of UASs / CUAS / UUVs / SPACE. Other textbooks in our series are Space Systems Emerging Technologies and Operations; Drone Delivery of CBNRECy – DEW Weapons: Emerging Threats of Mini-Weapons of Mass Destruction and Disruption (WMDD); Disruptive Technologies with applications in Airline, Marine, Defense Industries; Unmanned Vehicle Systems & Operations On Air, Sea, Land; Counter Unmanned Aircraft Systems Technologies and Operations; Unmanned Aircraft Systems in the Cyber Domain: Protecting USA's Advanced Air Assets, 2nd edition; and Unmanned Aircraft Systems (UAS) in the Cyber Domain Protecting USA's Advanced Air Assets, 1st edition. Our previous seven titles have received considerable global recognition in the field. (Nichols & Carter, 2022) (Nichols, et al., 2021) (Nichols R. K., et al., 2020) (Nichols R., et al., 2020) (Nichols R., et al., 2019) (Nichols R. K., 2018) (Nichols R. K., et al., 2022)
Temporal contrast-dependent modeling of laser-driven solids - studying femtosecond-nanometer interactions and probing
Establishing precise control over the unique beam parameters of laser-accelerated ions from relativistic ultra-short pulse laser-solid interactions has been a major goal for the past 20 years. While the spatio-temporal coupling of laser-pulse and target parameters creates transient phenomena at femtosecond-nanometer scales that are decisive for the acceleration performance, these scales have also largely been inaccessible to experimental observation. Computer simulations of laser-driven plasmas provide valuable insight into the physics at play. Nevertheless, predictive capabilities are still lacking due to the massive computational cost of performing these simulations in 3D at high resolution for extended simulation times. This thesis investigates the optimal acceleration of protons from ultra-thin foils following the interaction with an ultra-short ultra-high intensity laser pulse, including realistic contrast conditions up to a picosecond before the main pulse. Advanced ionization methods implemented into the highly scalable, open-source particle-in-cell code PIConGPU enabled this study. Supporting two experimental campaigns, the new methods led to a deeper understanding of the physics of laser-wakefield acceleration and colloidal crystal melting, respectively, as they made it possible to explain experimental observations with simulated ionization and plasma dynamics. Subsequently, explorative 3D3V simulations of enhanced laser-ion acceleration were performed on the Swiss supercomputer Piz Daint. There, the inclusion of realistic laser contrast conditions altered the intra-pulse dynamics of the acceleration process significantly. Contrary to a perfect Gaussian pulse, a better spatio-temporal overlap of the protons with the electron sheath origin allowed for full exploitation of the accelerating potential, leading to higher maximum energies. Adapting well-known analytic models made it possible to match the results qualitatively and, in chosen cases, quantitatively. Despite complex 3D plasma dynamics not being reflected within the 1D models, the upper limit of ion acceleration performance within the TNSA scenario can be predicted remarkably well. Radiation signatures obtained from synthetic diagnostics of electrons, protons, and bremsstrahlung photons show that the target state at maximum laser intensity is encoded in them, previewing how experiments may gain insight into this previously unobservable time frame.
Furthermore, as X-ray Free Electron Laser facilities have only recently begun to allow observations at femtosecond-nanometer scales, benchmarking the physics models for solid-density plasma simulations is now within reach. Finally, this thesis presents the first start-to-end simulations of optical-pump, X-ray-probe laser-solid interactions with the photon scattering code ParaTAXIS. The associated PIC simulations guided the planning and execution of an LCLS experiment, demonstrating the first observation of a solid-density plasma distribution driven by near-relativistic short-pulse laser pulses at femtosecond-nanometer resolution.
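One widely used analytic model of the kind referred to above is Mora's (2003) isothermal plasma expansion model, which bounds the maximum TNSA proton energy as E_max = 2 T_hot [ln(tau + sqrt(tau^2 + 1))]^2, with tau = omega_pi t / sqrt(2e). Whether this is among the thesis's chosen models is an assumption, and the parameter values in the sketch below are invented purely to make it run:

```python
import math

# Assumed parameters (illustrative only, not values from the thesis):
T_hot_MeV = 1.0      # hot-electron temperature in MeV
n_e = 1e26           # hot-electron density in m^-3
t_acc = 100e-15      # effective acceleration time in s (~laser duration)

eps0, q, m_p = 8.854e-12, 1.602e-19, 1.673e-27
omega_pi = math.sqrt(n_e * q**2 / (eps0 * m_p))  # ion (proton) plasma frequency
tau = omega_pi * t_acc / math.sqrt(2 * math.e)
E_max = 2 * T_hot_MeV * math.log(tau + math.sqrt(tau**2 + 1))**2
print(f"predicted TNSA cutoff energy: {E_max:.2f} MeV")
```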
Checkpointing strategies to protect parallel jobs against errors with general distributions
This paper studies checkpointing strategies for parallel jobs subject to fail-stop errors. The optimal strategy is well known when failure inter-arrival times obey an Exponential law, but it is unknown for non-memoryless failure distributions. We explain why the latter fact is misunderstood in recent literature. We propose a general strategy that maximizes the expected efficiency until the next failure, and we show that this strategy is asymptotically optimal for very long jobs. Through extensive simulations, we show that the new strategy is always at least as good as the Young/Daly strategy for various failure distributions. For distributions with high infant mortality (such as LogNormal 2.51 or Weibull 0.5), the execution time is divided by a factor of 1.9 on average, and by up to a factor of 4.2 for recently deployed platforms.
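For reference, the Young/Daly baseline checkpoints after a fixed amount of work W_YD = sqrt(2 C mu), where C is the checkpoint cost and mu the mean time between failures; it is first-order optimal precisely in the Exponential (memoryless) case discussed above. A minimal worked example, with assumed values for C and mu:

```python
import math

C = 60.0          # checkpoint cost in seconds (assumed)
mu = 24 * 3600.0  # platform MTBF in seconds (assumed: one failure per day)

W_yd = math.sqrt(2 * C * mu)  # first-order optimal work between checkpoints
print(f"Young/Daly period: {W_yd:.0f} s "
      f"(~{W_yd / 3600:.2f} h of work per checkpoint)")
```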
Roadmap on Electronic Structure Codes in the Exascale Era
Electronic structure calculations have been instrumental in providing many important insights into a range of physical and chemical properties of various molecular and solid-state systems. Their importance to various fields, including materials science, chemical sciences, computational chemistry and device physics, is underscored by the large fraction of available public supercomputing resources devoted to these calculations. As we enter the exascale era, exciting new opportunities to increase simulation numbers, sizes, and accuracies present themselves. To realize these promises, however, the community of electronic structure software developers will first have to tackle a number of challenges pertaining to the efficient use of new architectures that will rely heavily on massive parallelism and hardware accelerators. This roadmap provides a broad overview of the state of the art in electronic structure calculations and of the various new directions being pursued by the community. It covers 14 electronic structure codes, presenting their current status, their development priorities over the next five years, and their plans for tackling the challenges and leveraging the opportunities presented by the advent of exascale computing.
HPC-enabling technologies for high-fidelity combustion simulations
With the increase in computational power in the last decade and the forthcoming exascale supercomputers, a new horizon in computational modelling and simulation is envisioned in combustion science. Considering the multiscale and multiphysics characteristics of turbulent reacting flows, combustion simulations are among the most computationally demanding applications running on cutting-edge supercomputers. Exascale computing opens new frontiers for the simulation of combustion systems, as more realistic conditions can be achieved with high-fidelity methods. However, efficient use of these computing architectures requires methodologies that can exploit all levels of parallelism. The efficient utilization of the next generation of supercomputers needs to be considered from a global perspective, that is, involving physical modelling and numerical methods together with methodologies based on High-Performance Computing (HPC) and hardware architectures. This review introduces recent developments in numerical methods for large-eddy simulations (LES) and direct numerical simulations (DNS) of combustion systems, with a focus on computational performance and algorithmic capabilities. Due to the broad scope, a first section describes the fundamentals of turbulent combustion, followed by a general description of state-of-the-art computational strategies for solving these problems. These applications require advanced HPC approaches to exploit modern supercomputers, which is addressed in the third section. The increasing complexity of new computing architectures, with tightly coupled CPUs and GPUs as well as high levels of parallelism, requires new parallel models and algorithms exposing the required level of concurrency. Advances in dynamic load balancing, vectorization, GPU acceleration and mesh adaptation have made it possible to achieve highly efficient combustion simulations with data-driven methods in HPC environments. Dedicated sections therefore cover the use of high-order methods for reacting flows, the integration of detailed chemistry, and two-phase flows. Final remarks and directions for future work are given at the end.
The research leading to these results has received funding from the European Union's Horizon 2020 Programme under the CoEC project, grant agreement No. 952181, and the CoE RAISE project, grant agreement No. 951733.
Research and innovation 2019
Research and innovation are two pillars that come together when it comes to universities. The expansion of the frontiers of human knowledge, in all areas and disciplines, is an irrefutable commitment of higher education institutions. Together with public and private entities, they are also committed to promoting knowledge transfer to society and the economy, in the form of new ideas, new products and new processes. Universities are supposed to transform ideas into value for society.
To achieve these goals, higher education institutions have to ensure that their human resources are highly qualified, that they provide an adequate atmosphere, that their research is of high quality, and that adequate interactions take place.
At UMinho we have a clear strategy to be an open and permanent space for knowledge production and furtherance of nationally and internationally relevant innovation across different social and economic sectors.
For many years, UMinho has adopted the principles of open access and open science. We aim to carry out our scientific activity and the dissemination of the corresponding results transparently and collaboratively; this implies that researchers, citizens, policymakers, state agencies, companies, and third sector organizations work in close cooperation throughout the research and innovation process. We believe this is the shortest way to trigger smart and sustainable growth and qualified job creation.
At UMinho, we encourage the coupling between research and education.
Our goal is to expand research opportunities and to give our students occasions to experience vibrant research environments, ensuring that learning goes beyond the "common" routines.
Joining research and learning processes provides both undergraduate and postgraduate students with opportunities to own their learning process. We believe that research experience has a role to play in improving students' motivation for learning, in the pursuit of their interests.
We do better science when we make it both more sensitive to the needs of society and more efficient in the use of allocated resources. It is also a question of accountability. This is fundamental for reinforcing society's awareness of our contributions to human and social development.
Following the 2018 publication, we present here the 2019 edition of Research and Innovation, a series that draws on the outcomes of the activity of the UMinho research and innovation ecosystem. This comprehensive volume gives particular emphasis to the outcomes of the Research Units, namely in terms of funding, research projects, papers, and the most important achievements. The activity of the Interface Units and Collaborative Laboratories in which UMinho participates is also reported, through their activities and institutional projects, making evident their importance for the continuous growth of our Institution, our region, and our country.
Rui Vieira de Castro
Rector