swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture
The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor renders itself as a competitive candidate owing to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architectural features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves 6.71x and 2.45x average speedups over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architectures with both productivity and efficiency in mind. We would like to open-source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.
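The ahead-of-time flow described in this abstract can be illustrated with a toy code generator: rather than JIT-compiling kernels on the target device, the compiler emits complete C source on the host, which a cross-compiler for the target toolchain then builds offline. The sketch below is a minimal, hypothetical illustration of that idea and does not reproduce swTVM's actual code generator or TVM's APIs.

```python
# Toy ahead-of-time (AOT) code generation: emit a complete, fixed-shape C
# kernel on the host so it can be cross-compiled for a target we cannot
# JIT on. Illustrative only; not swTVM's real code generator.

def emit_matmul_c(m: int, n: int, k: int) -> str:
    """Emit C source for a fixed-shape matrix multiply C = A * B."""
    return f"""
void matmul(const float *A, const float *B, float *C) {{
    for (int i = 0; i < {m}; ++i)
        for (int j = 0; j < {n}; ++j) {{
            float acc = 0.0f;
            for (int p = 0; p < {k}; ++p)
                acc += A[i * {k} + p] * B[p * {n} + j];
            C[i * {n} + j] = acc;
        }}
}}
"""

if __name__ == "__main__":
    src = emit_matmul_c(64, 64, 64)
    # The host writes this source to disk; a cross-compiler (e.g. the
    # Sunway toolchain's C compiler) would then build it offline.
    print(src)
```

Because all shapes are baked in at compile time, the emitted source needs no runtime code generation on the device, which is the property that makes cross-compilation for architectures like Sunway possible.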
Simulation of coarsening in two-phase systems with dissimilar mobilities
In this work, we apply phase-field simulations to examine the coarsening behavior of morphologically complex two-phase microstructures in which the phases have highly dissimilar mobilities, a condition approaching that found in experimental solid-liquid systems. Specifically, we consider a two-phase system at the critical composition (0.5 volume fraction) in which the mobilities of the two phases differ by a factor of 100. This system is simulated in two and three dimensions using the Cahn-Hilliard model with a concentration-dependent mobility, and results are compared to simulations with a constant mobility. A morphological transition occurs during coarsening of the two-dimensional system (corresponding to a thin-film geometry) with dissimilar mobilities, resulting in a system of nearly circular particles of the high-mobility phase embedded in a low-mobility matrix. This morphological transition causes the coarsening rate constant to decrease over time, which explains why a previous study found a lack of agreement with the theoretical t^(1/3) power law. Three-dimensional systems with dissimilar mobilities resulted in bicontinuous microstructures that evolve self-similarly, as determined by quantitative analysis of the interfacial shape distribution. Coarsening kinetics in three dimensions agreed closely with the t^(1/3) power law after the initial transient stage. A model is derived to explain a nearly linear relationship between the coarsening rate constant and the variance of scaled mean curvature that is observed during this transient stage.
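For reference, the Cahn-Hilliard model with a concentration-dependent mobility takes the standard form (generic notation; the paper's specific free energy and mobility function are not reproduced here):

```latex
\frac{\partial c}{\partial t} = \nabla \cdot \big[ M(c)\, \nabla \mu \big],
\qquad
\mu = f'(c) - \kappa \nabla^{2} c ,
```

where $c$ is the concentration, $M(c)$ the concentration-dependent mobility, $f(c)$ the bulk free-energy density, and $\kappa$ the gradient-energy coefficient. Self-similar coarsening of a characteristic length $L(t)$ then obeys $L^{3} - L_{0}^{3} = k t$, i.e. the $t^{1/3}$ power law against which the kinetics are compared.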
Numerical simulation of dual-phase steel based on real and virtual three-dimensional microstructures
Dual-phase steel shows a strong connection between its microstructure and its mechanical properties. This structure–property correlation is caused by the composition of the microstructure: a soft ferritic matrix with embedded hard martensite regions, leading to a simultaneous increase in strength and ductility. As a result, dual-phase steels are widely used, especially for strength-relevant and energy-absorbing sheet-metal structures. However, their use as heavy-plate steel is also desirable, so a better understanding of the structure–property correlation is of great interest. Microstructure-based simulation is essential for a realistic simulation of the mechanical properties of dual-phase steel. This paper describes the entire process route of such a simulation, from the extraction of the microstructure by 3D tomography and the determination of the properties of the individual phases by nanoindentation, to the implementation of a simulation model and its validation by experiments. In addition to simulations based on real microstructures, simulations based on virtual microstructures are also of great importance. Thus, a model for the generation of virtual microstructures is presented that reproduces the statistical properties of real microstructures. With the help of these structures and the aforementioned simulation model, it is then possible to predict, with high accuracy, the mechanical properties of a dual-phase steel whose three-dimensional (3D) microstructure is not yet known. This will enable future investigations of new dual-phase steel microstructures within a virtual laboratory, even before their production.
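A common generic approach to virtual two-phase microstructures is to threshold a smoothed random field at a quantile chosen to hit a prescribed phase fraction. The sketch below illustrates only that principle; the paper's actual model matches further statistical descriptors of real dual-phase microstructures, and all parameter names here are illustrative.

```python
# Minimal sketch: generate a virtual two-phase microstructure with a
# prescribed phase fraction by low-pass filtering white noise and
# thresholding at a quantile. Generic illustration, not the paper's model.
import numpy as np

def virtual_microstructure(shape=(64, 64, 64), martensite_fraction=0.2,
                           correlation_length=4.0, seed=0):
    """Return a boolean array: True = martensite, False = ferrite matrix."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    # Low-pass filter the noise in Fourier space (Gaussian kernel) to
    # introduce a spatial correlation length, i.e. a feature size.
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    k2 = sum(f ** 2 for f in freqs)
    kernel = np.exp(-2.0 * (np.pi * correlation_length) ** 2 * k2)
    field = np.fft.ifftn(np.fft.fftn(noise) * kernel).real
    # Threshold at the quantile that yields the target martensite fraction.
    threshold = np.quantile(field, 1.0 - martensite_fraction)
    return field > threshold

if __name__ == "__main__":
    micro = virtual_microstructure()
    print(micro.mean())  # close to the target fraction of 0.2
```

The quantile threshold guarantees the target volume fraction up to voxel discreteness; matching higher-order statistics (e.g. two-point correlations of a real tomogram) requires the more elaborate model described in the paper.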
Scaling and Resilience in Numerical Algorithms for Exascale Computing
The first Petascale supercomputer, the IBM Roadrunner, went online in 2008. Ten years later, the community is now looking ahead to a new generation of Exascale machines. During the decade that has passed, several hundred Petascale-capable machines have been installed worldwide, yet despite the abundance of machines, applications that scale to their full size remain rare. Large clusters now routinely have 50,000+ cores; some have several million. This extreme level of parallelism, which has enabled a theoretical compute capacity in excess of a million billion operations per second, turns out to be difficult to exploit in many applications of practical interest. Processors often end up spending more time waiting for synchronization, communication, and other coordinating operations to complete than actually computing. Component reliability is another challenge facing HPC developers. If even a single processor fails, among many thousands, the user is forced to restart traditional applications, wasting valuable compute time. These issues collectively manifest themselves as low parallel efficiency, resulting in wasted energy and computational resources. Future performance improvements are expected to continue to come largely from increased parallelism. One may therefore speculate that the difficulties currently faced when scaling applications to Petascale machines will progressively worsen, making it difficult for scientists to harness the full potential of Exascale computing.
The thesis comprises two parts, each consisting of several chapters that discuss modifications of numerical algorithms to make them better suited for future Exascale machines. The first part considers the Parareal method for parallel-in-time integration, a technique for the scalable numerical solution of partial differential equations. We propose a new adaptive scheduler that optimizes parallel efficiency by minimizing the time-subdomain length without making the communication of time subdomains too costly. In conjunction with an appropriate preconditioner, we demonstrate that it is possible to obtain time-parallel speedup on the nonlinear shallow-water equation beyond what is possible using conventional spatial domain-decomposition techniques alone. The part concludes with the proposal of a new method for constructing parallel-in-time integration schemes better suited for convection-dominated problems.
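The core Parareal iteration combines a cheap coarse propagator G with an accurate fine propagator F, correcting a coarse sweep with fine solves that can run in parallel across time subdomains. The following minimal, serial Python sketch applies it to the scalar test problem y' = -y; the thesis's adaptive scheduler, preconditioner, and PDE setting are not reproduced, and all names are illustrative.

```python
# Minimal serial sketch of the Parareal iteration for y' = -y on [0, T].
# In practice the fine propagations run in parallel across the time
# subdomains; here they run in a loop for clarity.
import math

def coarse(y, dt):
    """Cheap propagator G: one explicit Euler step over the subdomain."""
    return y * (1.0 - dt)

def fine(y, dt, substeps=100):
    """Accurate propagator F: many small explicit Euler steps."""
    h = dt / substeps
    for _ in range(substeps):
        y = y * (1.0 - h)
    return y

def parareal(y0, T, n_subdomains=10, iterations=5):
    dt = T / n_subdomains
    # Initial guess: a serial coarse sweep over all subdomains.
    u = [y0]
    for _ in range(n_subdomains):
        u.append(coarse(u[-1], dt))
    for _ in range(iterations):
        f_prev = [fine(u[n], dt) for n in range(n_subdomains)]   # parallelizable
        g_prev = [coarse(u[n], dt) for n in range(n_subdomains)]
        new = [y0]
        for n in range(n_subdomains):
            # Parareal update: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])
            new.append(coarse(new[-1], dt) + f_prev[n] - g_prev[n])
        u = new
    return u

if __name__ == "__main__":
    u = parareal(1.0, 1.0)
    print(u[-1], math.exp(-1.0))  # end value approaches exp(-T)
```

After k iterations the first k subdomains carry the serial fine solution exactly, so the method trades redundant coarse work for wall-clock speedup whenever G is much cheaper than F.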
In the second part, new ways of mitigating the impact of hardware failures are developed and presented. The topic is introduced with the creation of a new fault-tolerant variant of Parareal. In the chapter that follows, a C++ library for multi-level checkpointing is presented. The library uses lightweight in-memory checkpoints, protected through the use of erasure codes, to mitigate the impact of failures by decreasing the overhead of checkpointing and minimizing the compute work lost. Erasure codes have the unfortunate property that if more data blocks are lost than parity blocks were created, the data is effectively considered unrecoverable. The final chapter contains a preliminary study on partial information recovery for incomplete checksums. Under the assumption that some meta-knowledge exists on the structure of the encoded data, we show that the lost data may be recovered, at least partially. This result is of interest not only in HPC but also in data centers, where erasure codes are widely used to protect data efficiently.
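The recovery principle behind erasure-coded checkpoints can be shown with the simplest possible code, a single XOR parity block: any one lost data block is reconstructible, but losing two exceeds the parity budget and the data is unrecoverable, which is exactly the limitation the final chapter's partial-recovery study addresses. Real checkpointing libraries use stronger codes (e.g. Reed-Solomon) tolerating several losses; this sketch is a generic illustration.

```python
# Toy single-parity erasure code over equally sized byte blocks:
# parity = b0 ^ b1 ^ ... ^ bn, so any ONE missing block equals the XOR
# of the survivors with the parity. Illustrative only.
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def make_parity(data_blocks):
    return xor_blocks(data_blocks)

def recover(surviving_blocks, parity):
    """Reconstruct the single missing data block from the survivors."""
    return xor_blocks(list(surviving_blocks) + [parity])

if __name__ == "__main__":
    blocks = [b"checkpnt", b"block_01", b"block_02"]
    parity = make_parity(blocks)
    # Simulate losing the middle block and rebuilding it from the rest.
    restored = recover([blocks[0], blocks[2]], parity)
    print(restored)
```

With two or more blocks missing, the XOR of the survivors and the parity no longer isolates a single unknown, so nothing can be reconstructed without extra structure, hence the thesis's interest in exploiting meta-knowledge of the encoded data.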
Massiv-parallele und großskalige Phasenfeldsimulationen zur Untersuchung der Mikrostrukturentwicklung
The development of tailored materials with defined properties requires a deep understanding of microstructure evolution. In the first part, the microstructure evolution during the directional solidification of ternary eutectics is studied with a highly optimized phase-field solver in the waLBerla framework. In the second part, the microstructure evolution under the influence of pores at the grain boundaries in the final sintering stage is analyzed with the PACE3D solver.
Massiv-parallele und großskalige Phasenfeldsimulationen zur Untersuchung der Mikrostrukturentwicklung
A detailed understanding of microstructure evolution is necessary for tailored components with defined properties. In the first part, the microstructure evolution during ternary eutectic directional solidification is investigated with an optimized phase-field solver in the massively parallel waLBerla framework. In the second part, the microstructure evolution under the influence of pores at grain boundaries during the final stage of the sintering process is analyzed with the PACE3D solver.