swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture
The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor renders itself as a competitive candidate owing to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architectural features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves 6.71x and 2.45x average speedups over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architectures with both productivity and efficiency in mind. We would like to open-source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.
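The ahead-of-time flow described in this abstract can be illustrated with a toy code generator: rather than JIT-compiling kernels on the target device, the compiler emits complete C source on the host, which a cross-compiler for the target toolchain then builds offline. The sketch below is a minimal, hypothetical illustration of that idea and does not reproduce swTVM's actual code generator or TVM's APIs.

```python
# Toy ahead-of-time (AOT) code generation: emit a complete, fixed-shape C
# kernel on the host so it can be cross-compiled for a target we cannot
# JIT on. Illustrative only; not swTVM's real code generator.

def emit_matmul_c(m: int, n: int, k: int) -> str:
    """Emit C source for a fixed-shape matrix multiply C = A * B."""
    return f"""
void matmul(const float *A, const float *B, float *C) {{
    for (int i = 0; i < {m}; ++i)
        for (int j = 0; j < {n}; ++j) {{
            float acc = 0.0f;
            for (int p = 0; p < {k}; ++p)
                acc += A[i * {k} + p] * B[p * {n} + j];
            C[i * {n} + j] = acc;
        }}
}}
"""

if __name__ == "__main__":
    src = emit_matmul_c(64, 64, 64)
    # The host writes this source to disk; a cross-compiler (e.g. the
    # Sunway toolchain's C compiler) would then build it offline.
    print(src)
```

Because all shapes are baked in at compile time, the emitted source needs no runtime code generation on the device, which is the property that makes cross-compilation for architectures like Sunway possible.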
Simulation of coarsening in two-phase systems with dissimilar mobilities
In this work, we apply phase-field simulations to examine the coarsening behavior of morphologically complex two-phase microstructures in which the phases have highly dissimilar mobilities, a condition approaching that found in experimental solid-liquid systems. Specifically, we consider a two-phase system at the critical composition (0.5 volume fraction) in which the mobilities of the two phases differ by a factor of 100. This system is simulated in two and three dimensions using the Cahn-Hilliard model with a concentration-dependent mobility, and results are compared to simulations with a constant mobility. A morphological transition occurs during coarsening of the two-dimensional system (corresponding to a thin-film geometry) with dissimilar mobilities, resulting in a system of nearly circular particles of the high-mobility phase embedded in a low-mobility matrix. This morphological transition causes the coarsening rate constant to decrease over time, which explains why a previous study found a lack of agreement with the theoretical t^(1/3) power law. Three-dimensional systems with dissimilar mobilities resulted in bicontinuous microstructures that evolve self-similarly, as determined by quantitative analysis of the interfacial shape distribution. Coarsening kinetics in three dimensions agreed closely with the t^(1/3) power law after the initial transient stage. A model is derived to explain a nearly linear relationship between the coarsening rate constant and the variance of scaled mean curvature that is observed during this transient stage.
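For reference, the Cahn-Hilliard model with a concentration-dependent mobility takes the standard form (generic notation; the paper's specific free energy and mobility function are not reproduced here):

```latex
\frac{\partial c}{\partial t} = \nabla \cdot \big[ M(c)\, \nabla \mu \big],
\qquad
\mu = f'(c) - \kappa \nabla^{2} c ,
```

where $c$ is the concentration, $M(c)$ the concentration-dependent mobility, $f(c)$ the bulk free-energy density, and $\kappa$ the gradient-energy coefficient. Self-similar coarsening of a characteristic length $L(t)$ then obeys $L^{3} - L_{0}^{3} = k t$, i.e. the $t^{1/3}$ power law against which the kinetics are compared.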
Numerical simulation of dual-phase steel based on real and virtual three-dimensional microstructures
Dual-phase steel shows a strong connection between its microstructure and its mechanical properties. This structure–property correlation is caused by the composition of the microstructure: a soft ferritic matrix with embedded hard martensite regions, leading to a simultaneous increase in strength and ductility. As a result, dual-phase steels are widely used, especially for strength-relevant and energy-absorbing sheet-metal structures. However, their use as heavy-plate steel is also desirable, so a better understanding of the structure–property correlation is of great interest. Microstructure-based simulation is essential for a realistic simulation of the mechanical properties of dual-phase steel. This paper describes the entire process route of such a simulation, from the extraction of the microstructure by 3D tomography and the determination of the properties of the individual phases by nanoindentation, to the implementation of a simulation model and its validation by experiments. In addition to simulations based on real microstructures, simulations based on virtual microstructures are also of great importance. Thus, a model for the generation of virtual microstructures is presented that reproduces the statistical properties of real microstructures. With the help of these structures and the aforementioned simulation model, it is then possible to predict, with high accuracy, the mechanical properties of a dual-phase steel whose three-dimensional (3D) microstructure is not yet known. This will enable future investigations of new dual-phase steel microstructures within a virtual laboratory, even before their production.
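A common generic approach to virtual two-phase microstructures is to threshold a smoothed random field at a quantile chosen to hit a prescribed phase fraction. The sketch below illustrates only that principle; the paper's actual model matches further statistical descriptors of real dual-phase microstructures, and all parameter names here are illustrative.

```python
# Minimal sketch: generate a virtual two-phase microstructure with a
# prescribed phase fraction by low-pass filtering white noise and
# thresholding at a quantile. Generic illustration, not the paper's model.
import numpy as np

def virtual_microstructure(shape=(64, 64, 64), martensite_fraction=0.2,
                           correlation_length=4.0, seed=0):
    """Return a boolean array: True = martensite, False = ferrite matrix."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    # Low-pass filter the noise in Fourier space (Gaussian kernel) to
    # introduce a spatial correlation length, i.e. a feature size.
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    k2 = sum(f ** 2 for f in freqs)
    kernel = np.exp(-2.0 * (np.pi * correlation_length) ** 2 * k2)
    field = np.fft.ifftn(np.fft.fftn(noise) * kernel).real
    # Threshold at the quantile that yields the target martensite fraction.
    threshold = np.quantile(field, 1.0 - martensite_fraction)
    return field > threshold

if __name__ == "__main__":
    micro = virtual_microstructure()
    print(micro.mean())  # close to the target fraction of 0.2
```

The quantile threshold guarantees the target volume fraction up to voxel discreteness; matching higher-order statistics (e.g. two-point correlations of a real tomogram) requires the more elaborate model described in the paper.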
Scaling and Resilience in Numerical Algorithms for Exascale Computing
The first Petascale supercomputer, the IBM Roadrunner, went online in 2008. Ten years later, the community is now looking ahead to a new generation of Exascale machines. During the decade that has passed, several hundred Petascale-capable machines have been installed worldwide, yet despite the abundance of machines, applications that scale to their full size remain rare. Large clusters now routinely have 50,000+ cores; some have several million. This extreme level of parallelism, which has enabled a theoretical compute capacity in excess of a million billion operations per second, turns out to be difficult to exploit in many applications of practical interest. Processors often end up spending more time waiting for synchronization, communication, and other coordinating operations to complete than actually computing. Component reliability is another challenge facing HPC developers. If even a single processor fails, among many thousands, the user is forced to restart traditional applications, wasting valuable compute time. These issues collectively manifest themselves as low parallel efficiency, resulting in wasted energy and computational resources. Future performance improvements are expected to continue to come largely from increased parallelism. One may therefore speculate that the difficulties currently faced when scaling applications to Petascale machines will progressively worsen, making it difficult for scientists to harness the full potential of Exascale computing.
The thesis comprises two parts, each consisting of several chapters that discuss modifications of numerical algorithms to make them better suited for future Exascale machines. The first part considers the Parareal method for parallel-in-time integration, a technique for the scalable numerical solution of partial differential equations. We propose a new adaptive scheduler that optimizes parallel efficiency by minimizing the time-subdomain length without making the communication of time subdomains too costly. In conjunction with an appropriate preconditioner, we demonstrate that it is possible to obtain time-parallel speedup on the nonlinear shallow-water equation beyond what is possible using conventional spatial domain-decomposition techniques alone. The part concludes with the proposal of a new method for constructing parallel-in-time integration schemes better suited for convection-dominated problems.
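The core Parareal iteration combines a cheap coarse propagator G with an accurate fine propagator F, correcting a coarse sweep with fine solves that can run in parallel across time subdomains. The following minimal, serial Python sketch applies it to the scalar test problem y' = -y; the thesis's adaptive scheduler, preconditioner, and PDE setting are not reproduced, and all names are illustrative.

```python
# Minimal serial sketch of the Parareal iteration for y' = -y on [0, T].
# In practice the fine propagations run in parallel across the time
# subdomains; here they run in a loop for clarity.
import math

def coarse(y, dt):
    """Cheap propagator G: one explicit Euler step over the subdomain."""
    return y * (1.0 - dt)

def fine(y, dt, substeps=100):
    """Accurate propagator F: many small explicit Euler steps."""
    h = dt / substeps
    for _ in range(substeps):
        y = y * (1.0 - h)
    return y

def parareal(y0, T, n_subdomains=10, iterations=5):
    dt = T / n_subdomains
    # Initial guess: a serial coarse sweep over all subdomains.
    u = [y0]
    for _ in range(n_subdomains):
        u.append(coarse(u[-1], dt))
    for _ in range(iterations):
        f_prev = [fine(u[n], dt) for n in range(n_subdomains)]   # parallelizable
        g_prev = [coarse(u[n], dt) for n in range(n_subdomains)]
        new = [y0]
        for n in range(n_subdomains):
            # Parareal update: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])
            new.append(coarse(new[-1], dt) + f_prev[n] - g_prev[n])
        u = new
    return u

if __name__ == "__main__":
    u = parareal(1.0, 1.0)
    print(u[-1], math.exp(-1.0))  # end value approaches exp(-T)
```

After k iterations the first k subdomains carry the serial fine solution exactly, so the method trades redundant coarse work for wall-clock speedup whenever G is much cheaper than F.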
In the second part, new ways of mitigating the impact of hardware failures are developed and presented. The topic is introduced with the creation of a new fault-tolerant variant of Parareal. In the chapter that follows, a C++ library for multi-level checkpointing is presented. The library uses lightweight in-memory checkpoints, protected through the use of erasure codes, to mitigate the impact of failures by decreasing the overhead of checkpointing and minimizing the compute work lost. Erasure codes have the unfortunate property that if more data blocks are lost than parity blocks were created, the data is effectively considered unrecoverable. The final chapter contains a preliminary study on partial information recovery for incomplete checksums. Under the assumption that some meta-knowledge exists on the structure of the encoded data, we show that the lost data may be recovered, at least partially. This result is of interest not only in HPC but also in data centers, where erasure codes are widely used to protect data efficiently.
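The recovery principle behind erasure-coded checkpoints can be shown with the simplest possible code, a single XOR parity block: any one lost data block is reconstructible, but losing two exceeds the parity budget and the data is unrecoverable, which is exactly the limitation the final chapter's partial-recovery study addresses. Real checkpointing libraries use stronger codes (e.g. Reed-Solomon) tolerating several losses; this sketch is a generic illustration.

```python
# Toy single-parity erasure code over equally sized byte blocks:
# parity = b0 ^ b1 ^ ... ^ bn, so any ONE missing block equals the XOR
# of the survivors with the parity. Illustrative only.
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def make_parity(data_blocks):
    return xor_blocks(data_blocks)

def recover(surviving_blocks, parity):
    """Reconstruct the single missing data block from the survivors."""
    return xor_blocks(list(surviving_blocks) + [parity])

if __name__ == "__main__":
    blocks = [b"checkpnt", b"block_01", b"block_02"]
    parity = make_parity(blocks)
    # Simulate losing the middle block and rebuilding it from the rest.
    restored = recover([blocks[0], blocks[2]], parity)
    print(restored)
```

With two or more blocks missing, the XOR of the survivors and the parity no longer isolates a single unknown, so nothing can be reconstructed without extra structure, hence the thesis's interest in exploiting meta-knowledge of the encoded data.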
Massiv-parallele und großskalige Phasenfeldsimulationen zur Untersuchung der Mikrostrukturentwicklung
The development of tailored materials with defined properties requires a deep understanding of microstructure evolution. In the first part, the microstructure evolution during the directional solidification of ternary eutectics is studied with a highly optimized phase-field solver in the waLBerla framework. In the second part, the microstructure evolution under the influence of pores at the grain boundaries in the final sintering stage is analyzed with the PACE3D solver.
Massiv-parallele und großskalige Phasenfeldsimulationen zur Untersuchung der Mikrostrukturentwicklung
A detailed understanding of microstructure evolution is necessary for tailored components with defined properties. In the first part, the microstructure evolution during ternary eutectic directional solidification is investigated with an optimized phase-field solver in the massively parallel waLBerla framework. In the second part, the microstructure evolution under the influence of pores at grain boundaries during the final stage of the sintering process is analyzed with the PACE3D solver.