
    swTVM: Exploring the Automated Compilation for Deep Learning on Sunway Architecture

    The flourishing of deep learning frameworks and hardware platforms has created demand for an efficient compiler that can shield the diversity of both software and hardware in order to provide application portability. Among existing deep learning compilers, TVM is well known for its efficient code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor renders itself a competitive candidate thanks to its attractive computational power in both scientific and deep learning applications. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architectural features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfers, and local device memory for data locality, in order to generate efficient code for deep learning applications on Sunway. The experimental results show the ability of swTVM to automatically generate code for various deep neural network models on Sunway. The code automatically generated by swTVM for AlexNet and VGG-19 achieves 6.71x and 2.45x average speedup over hand-optimized OpenACC implementations on convolution and fully connected layers, respectively. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and high-performance architecture, particularly with productivity and efficiency in mind. We would like to open-source the implementation so that more people can embrace the power of deep learning compilers and the Sunway many-core processor.
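The core idea of ahead-of-time compilation described above — emitting source code on the host that is then cross-compiled for the target, rather than JIT-compiling on the device — can be illustrated with a minimal sketch. This is a toy code generator for a fully connected layer, not the actual swTVM or TVM API; all function names here are hypothetical.

```python
# Illustrative AOT codegen sketch: given layer shapes, emit C source that a
# host toolchain could cross-compile for a target such as Sunway.
# emit_dense_layer_c is a hypothetical helper, not part of TVM/swTVM.

def emit_dense_layer_c(m: int, n: int, k: int, func_name: str = "dense") -> str:
    """Emit C source for a fully connected layer C[m][n] = A[m][k] * B[k][n]."""
    return f"""
void {func_name}(const float *A, const float *B, float *C) {{
    for (int i = 0; i < {m}; ++i)
        for (int j = 0; j < {n}; ++j) {{
            float acc = 0.0f;
            for (int p = 0; p < {k}; ++p)
                acc += A[i * {k} + p] * B[p * {n} + j];
            C[i * {n} + j] = acc;
        }}
}}
"""

src = emit_dense_layer_c(64, 32, 128)
print(src)
```

In a real compiler the emitted loops would additionally be tiled across core groups and staged through local device memory via DMA, as the abstract describes.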

    Simulation of coarsening in two-phase systems with dissimilar mobilities

    In this work, we apply phase-field simulations to examine the coarsening behavior of morphologically complex two-phase microstructures in which the phases have highly dissimilar mobilities, a condition approaching that found in experimental solid-liquid systems. Specifically, we consider a two-phase system at the critical composition (50% volume fraction) in which the mobilities of the two phases differ by a factor of 100. This system is simulated in two and three dimensions using the Cahn-Hilliard model with a concentration-dependent mobility, and results are compared to simulations with a constant mobility. A morphological transition occurs during coarsening of the two-dimensional system (corresponding to a thin-film geometry) with dissimilar mobilities, resulting in a system of nearly circular particles of the high-mobility phase embedded in a low-mobility matrix. This morphological transition causes the coarsening rate constant to decrease over time, which explains why a previous study found a lack of agreement with the theoretical t^{1/3} power law. Three-dimensional systems with dissimilar mobilities resulted in bicontinuous microstructures that evolve self-similarly, as determined by quantitative analysis of the interfacial shape distribution. Coarsening kinetics in three dimensions agreed closely with the t^{1/3} power law after the initial transient stage. A model is derived to explain a nearly linear relationship between the coarsening rate constant and the variance of scaled mean curvature that is observed during this transient stage.
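The model used above can be sketched in a few lines: the Cahn-Hilliard equation evolves a concentration field c via the flux of a chemical potential, ∂c/∂t = ∇·(M(c)∇μ) with μ = c³ − c − κ∇²c, where the mobility M depends on concentration. The following is a minimal 2D explicit finite-difference sketch on a periodic grid; grid size, time step, and the mobility interpolation are illustrative choices, not the paper's actual numerical scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dx, dt, kappa = 32, 1.0, 0.005, 1.0
M_hi, M_lo = 1.0, 0.01            # mobility contrast of 100, as in the abstract

# critical (50/50) composition: small random fluctuations about c = 0
c = 0.1 * rng.standard_normal((N, N))
mass0 = c.mean()                  # total mass, which the scheme must conserve

def lap(f):
    """5-point Laplacian with periodic boundaries."""
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4 * f) / dx**2

def mobility(c):
    """Interpolate between phase mobilities; phases sit near c = -1 and +1."""
    frac = np.clip((c + 1) / 2, 0, 1)
    return M_lo + (M_hi - M_lo) * frac

for _ in range(400):
    mu = c**3 - c - kappa * lap(c)        # chemical potential
    M = mobility(c)
    # conservative divergence of M grad(mu) using face-centred mobilities
    Mxp = 0.5 * (M + np.roll(M, -1, 0)); Mxm = 0.5 * (M + np.roll(M, 1, 0))
    Myp = 0.5 * (M + np.roll(M, -1, 1)); Mym = 0.5 * (M + np.roll(M, 1, 1))
    div = (Mxp * (np.roll(mu, -1, 0) - mu) - Mxm * (mu - np.roll(mu, 1, 0)) +
           Myp * (np.roll(mu, -1, 1) - mu) - Mym * (mu - np.roll(mu, 1, 1))) / dx**2
    c = c + dt * div

print(f"mass drift: {abs(c.mean() - mass0):.2e}")
```

Because the flux form telescopes over the periodic grid, total mass is conserved to rounding error — the property that makes coarsening (rather than net phase growth) the observed dynamics.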

    Numerical simulation of dual-phase steel based on real and virtual three-dimensional microstructures

    Dual-phase steel shows a strong connection between its microstructure and its mechanical properties. This structure–property correlation is caused by the composition of the microstructure: a soft ferritic matrix with embedded hard martensite areas, leading to a simultaneous increase in strength and ductility. As a result, dual-phase steels are widely used, especially for strength-relevant and energy-absorbing sheet metal structures. However, their use as heavy plate steel is also desirable. Therefore, a better understanding of the structure–property correlation is of great interest. Microstructure-based simulation is essential for a realistic simulation of the mechanical properties of dual-phase steel. This paper describes the entire process route of such a simulation, from the extraction of the microstructure by 3D tomography and the determination of the properties of the individual phases by nanoindentation, to the implementation of a simulation model and its validation by experiments. In addition to simulations based on real microstructures, simulations based on virtual microstructures are also of great importance. Thus, a model for the generation of virtual microstructures is presented, reproducing the same statistical properties as real microstructures. With the help of these structures and the aforementioned simulation model, it is then possible to predict with high accuracy the mechanical properties of a dual-phase steel whose three-dimensional (3D) microstructure is not yet known. This will enable future investigations of new dual-phase steel microstructures within a virtual laboratory even before their production.
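The idea of generating a virtual two-phase microstructure can be illustrated with a toy sketch: a periodic Voronoi tessellation assigns pixels to random grain seeds, and whole grains are then labeled as martensite until a target volume fraction is reached. This is a deliberately simple stand-in for the paper's statistically matched generation model; all parameters and the tessellation approach are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
N, n_grains, target_martensite = 128, 40, 0.25   # illustrative parameters

# random seed points and periodic nearest-seed (Voronoi) assignment
seeds = rng.uniform(0, N, size=(n_grains, 2))
ys, xs = np.mgrid[0:N, 0:N]
pix = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
d = np.abs(pix[:, None, :] - seeds[None, :, :])
d = np.minimum(d, N - d)                          # periodic distance
grain = np.argmin((d**2).sum(-1), axis=1).reshape(N, N)

# mark whole grains as martensite until the target volume fraction is reached
martensite = np.zeros((N, N), dtype=bool)
for g in rng.permutation(n_grains):
    if martensite.mean() >= target_martensite:
        break
    martensite |= (grain == g)

print(f"martensite fraction: {martensite.mean():.3f}")
```

A realistic generator would additionally match grain size distributions, phase morphology, and spatial correlations measured from tomography, which is what makes the virtual-laboratory prediction credible.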

    Scaling and Resilience in Numerical Algorithms for Exascale Computing

    The first Petascale supercomputer, the IBM Roadrunner, went online in 2008. Ten years later, the community is now looking ahead to a new generation of Exascale machines. During the decade that has passed, several hundred Petascale-capable machines have been installed worldwide, yet despite the abundance of machines, applications that scale to their full size remain rare. Large clusters now routinely have 50,000+ cores, and some have several million. This extreme level of parallelism, which allows a theoretical compute capacity in excess of a million billion operations per second, turns out to be difficult to use in many applications of practical interest. Processors often end up spending more time waiting for synchronization, communication, and other coordinating operations to complete than actually computing. Component reliability is another challenge facing HPC developers. If even a single processor fails, among many thousands, the user is forced to restart traditional applications, wasting valuable compute time. These issues collectively manifest themselves as low parallel efficiency, resulting in wasted energy and computational resources. Future performance improvements are expected to continue to come in large part from increased parallelism. One may therefore speculate that the difficulties currently faced when scaling applications to Petascale machines will progressively worsen, making it difficult for scientists to harness the full potential of Exascale computing. The thesis comprises two parts. Each part consists of several chapters discussing modifications of numerical algorithms to make them better suited for future Exascale machines. In the first part, the use of Parareal parallel-in-time integration techniques for scalable numerical solution of partial differential equations is considered.
    We propose a new adaptive scheduler that optimizes parallel efficiency by minimizing the time-subdomain length without making communication of time-subdomains too costly. In conjunction with an appropriate preconditioner, we demonstrate that it is possible to obtain time-parallel speedup on the nonlinear shallow water equation, beyond what is possible using conventional spatial domain-decomposition techniques alone. The part is concluded with the proposal of a new method for constructing parallel-in-time integration schemes better suited for convection-dominated problems. In the second part, new ways of mitigating the impact of hardware failures are developed and presented. The topic is introduced with the creation of a new fault-tolerant variant of Parareal. In the chapter that follows, a C++ library for multi-level checkpointing is presented. The library uses lightweight in-memory checkpoints, protected through the use of erasure codes, to mitigate the impact of failures by decreasing the overhead of checkpointing and minimizing the compute work lost. Erasure codes have the unfortunate property that if more data blocks are lost than parity blocks were created, the data is effectively considered unrecoverable. The final chapter contains a preliminary study on partial information recovery for incomplete checksums. Under the assumption that some meta-knowledge exists on the structure of the encoded data, we show that the lost data may be recovered, at least partially. This result is of interest not only in HPC but also in data centers, where erasure codes are widely used to protect data efficiently.
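The Parareal method mentioned above iterates a cheap coarse propagator G sequentially and an expensive fine propagator F in parallel, with the correction U^{k+1}_{n+1} = G(U^{k+1}_n) + F(U^k_n) − G(U^k_n). A minimal serial sketch on the scalar model problem y' = λy follows; the propagators and parameters are illustrative, not the thesis's adaptive scheduler, and the F evaluations marked below would run concurrently in a real implementation.

```python
import numpy as np

# Model problem y' = lam * y on [0, T], split into P time subdomains.
lam, y0, T, P = -1.0, 1.0, 2.0, 8
dT = T / P

def coarse(y, dt):                 # cheap propagator G: one explicit Euler step
    return y + dt * lam * y

def fine(y, dt, m=50):             # expensive propagator F: m Euler substeps
    h = dt / m
    for _ in range(m):
        y = y + h * lam * y
    return y

# initial serial coarse sweep
U = np.empty(P + 1); U[0] = y0
for n in range(P):
    U[n + 1] = coarse(U[n], dT)

# Parareal iterations: U_{n+1} <- G(U_n^new) + F(U_n^old) - G(U_n^old)
for k in range(P):                 # converges exactly in at most P iterations
    F = np.array([fine(U[n], dT) for n in range(P)])     # parallel in practice
    G_old = np.array([coarse(U[n], dT) for n in range(P)])
    Unew = U.copy()
    for n in range(P):
        Unew[n + 1] = coarse(Unew[n], dT) + F[n] - G_old[n]
    if np.max(np.abs(Unew - U)) < 1e-12:
        U = Unew
        break
    U = Unew

# serial fine reference for comparison
fine_serial = y0
for n in range(P):
    fine_serial = fine(fine_serial, dT)
print(U[-1], fine_serial)
```

Speedup comes from the fine solves being independent across subdomains; the scheduler design problem the thesis addresses is balancing subdomain length against the communication those corrections require.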

    New approaches for efficient on-the-fly FE operator assembly in a high-performance mantle convection framework


    Massiv-parallele und großskalige Phasenfeldsimulationen zur Untersuchung der Mikrostrukturentwicklung

    The development of tailored materials with defined properties requires a deep understanding of the microstructure evolution. In the first part, the microstructure evolution during the directional solidification of ternary eutectics with a highly optimized phase-field solver in the waLBerla-framework is studied. In the second part, the microstructure evolution under the influence of pores at the grain boundaries in the final sintering stage is analyzed with the PACE3D solver
