29 research outputs found

    UMDA/S: An Effective Iterative Compilation Algorithm for Parameter Search

    The search process is critical for iterative compilation because the large size of the search space and the cost of evaluating candidate implementations make it infeasible to find the true optimal values of the optimization parameters by brute force. Treating the search as a nonlinear global optimization problem, this paper introduces a new hybrid algorithm, UMDA/S (Univariate Marginal Distribution Algorithm with Nelder-Mead Simplex Search), which exploits the structure of the optimization space and the dependencies between parameters to find near-optimal parameters. Elitist preservation, weighted estimation, and mutation are proposed to improve the performance of UMDA/S. Experimental results show that UMDA/S locates better parameters than existing static methods and search algorithms.
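
As an illustration of the univariate-marginal idea named in this abstract, here is a minimal Python sketch. It is not the paper's UMDA/S: the simplex step is omitted, plain smoothed counts stand in for the weighted estimation, and the names `umda`, `toy_cost`, and `DOMAINS`, as well as the tile-size and unroll-factor parameters, are hypothetical choices made only for this example.

```python
import random

def umda(cost, domains, pop_size=40, n_select=10, generations=30,
         mutation=0.05, seed=0):
    """Minimize cost over discrete parameters with a basic UMDA loop."""
    rng = random.Random(seed)
    # One independent categorical distribution per parameter
    # (the "univariate marginals"), initially uniform.
    probs = [[1.0 / len(d)] * len(d) for d in domains]
    best = None
    for _ in range(generations):
        pop = []
        for _ in range(pop_size):
            ind = []
            for d, p in zip(domains, probs):
                if rng.random() < mutation:
                    ind.append(rng.choice(d))                 # mutation: uniform resample
                else:
                    ind.append(rng.choices(d, weights=p)[0])  # sample from the marginal
            pop.append(tuple(ind))
        if best is not None:
            pop.append(best)          # elitist preservation: keep the best so far
        pop.sort(key=cost)
        best = pop[0]
        selected = pop[:n_select]
        # Re-estimate each marginal from the selected individuals.
        # Counts start at 1 (add-one smoothing) so no value's probability hits zero.
        for i, d in enumerate(domains):
            counts = [1.0] * len(d)
            for ind in selected:
                counts[d.index(ind[i])] += 1.0
            total = sum(counts)
            probs[i] = [c / total for c in counts]
    return best, cost(best)

# Hypothetical search space: (tile size, unroll factor).
DOMAINS = [[8, 16, 32, 64, 128], [1, 2, 4, 8]]

def toy_cost(params):
    """Stand-in for a measured runtime; a real search would compile and time the code."""
    tile, unroll = params
    return (tile - 32) ** 2 + (unroll - 4) ** 2
```

Calling `umda(toy_cost, DOMAINS)` returns the best parameter tuple found together with its cost. In an actual iterative-compilation setting each cost evaluation is a compile-and-run cycle, which is why the population size and generation count would need to be kept small.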

    Multi-scale Recurrent LSTM and Transformer Network for Depth Completion

    Lidar depth completion is a new and active topic in depth estimation. The key difficulty in this task is fusing features from the color space and the depth space. In this paper, we adapt the classic LSTM and Transformer modules from NLP to depth completion and redesign them appropriately. Specifically, we use a Forget gate, an Update gate, an Output gate, and a Skip gate to fuse color and depth features efficiently and to perform recurrent optimization at multiple scales. Finally, we further fuse the deep features through the Transformer multi-head attention mechanism. Experimental results show that, without repetitive network structures or post-processing steps, our method achieves state-of-the-art performance by adding our modules to a simple encoder-decoder network. Our method ranks first on the current mainstream KITTI autonomous driving benchmark. It can also serve as a backbone network for other methods, which likewise achieve state-of-the-art performance.

    Adaptive mesh refinement based simulations of three-dimensional detonation combustion in supersonic combustible mixtures with a detailed reaction model

    Detonation combustion initiated with a hot jet in supersonic H2-O2-Ar mixtures is investigated by large-scale three-dimensional (3D) simulations on the Tianhe-2 computing system with an adaptive mesh refinement method. The reactive Euler equations are used as the governing equations with a detailed reaction model, where the molar ratio of the combustible mixture is 2:1:7 at a pressure of 10 kPa and a temperature of 298 K. Results show that the Mach stem surface formed after the shock surface reflects on the upper wall is actually a local overdriven detonation. The side walls in 3D simulations can play an important role in detonation initiation in supersonic combustible mixtures, because they help realize triple-line collisions and reflections during the initiation process. The width of the channel has an important influence on the strength of side-wall reflections, and under certain conditions there may exist a critical width between the front and back sides of the channel for successful initiation. Both the two-dimensional (2D) and the 3D detonations are overdriven, each with a constant but different degree of overdrive after complete initiation. Although the overdrive degree of the 3D detonation is smaller than that of the 2D case, more complex and irregular detonation fronts are observed in the 3D case, likely because transverse waves propagate and triple lines collide in multiple directions in 3D detonations. After the hot jet is shut down, the newly formed 2D Chapman-Jouguet (CJ) detonation has almost the same characteristic parameters as the corresponding 3D case, indicating that the 2D instabilities can be perfectly preserved in 3D simulations. However, the slapping wave reflections on the side walls in the 3D detonation produce a second oscillation alongside the main one, which exhibits stronger instabilities than the 2D case. The inherently stronger 3D instabilities are also verified through a quantitative comparison between the 2D and 3D cases, in which the 3D result always shows stronger fluctuations than the 2D case.

    Three-dimensional simulation of detonation initiation and propagation in supersonic combustible mixtures

    Detonation initiation and propagation in supersonic combustible mixtures using a hot jet have been investigated in three-dimensional numerical simulations with a detailed reaction model on the Tianhe-2 system. Results indicate that the side walls help realize triple-line collisions and triple-line reflections, which play an important role in detonation initiation. There should exist a critical width between the front and back sides of the three-dimensional channel for successful initiation, which is totally different from the two-dimensional case. When the width exceeds the critical value, the bow shock surface no longer reflects effectively on the side walls, resulting in the failure of detonation initiation. For detonation propagation, none of the standard detonation modes (rectangular mode, diagonal mode, and spinning mode) is observed in the three-dimensional case. The initiated detonation is actually in an overdriven state because of the presence of the hot jet in the supersonic flow field, resulting in more complex detonation fronts than those in the CJ detonation. Because the three-dimensional detonation develops in both directions, whereas in the two-dimensional case transverse wave propagation and triple-point collisions can occur in only one direction, the detonation fronts in the three-dimensional simulation show significantly larger irregularities and variations.

    Optimization of a Parallel CFD Code and Its Performance Evaluation on Tianhe-1A

    This paper describes performance tuning experiences with a parallel CFD code to enhance its performance and flexibility on large-scale parallel computers. The code solves the incompressible Navier-Stokes equations based on the novel Slightly Compressible Model on three-dimensional structured grids. High-level loop transformations and argument-based code specialization are used to optimize its uniprocessor performance. Static arrays are converted into dynamically allocated arrays to improve flexibility. The grid generator is coupled with the flow solver so that they can exchange grid data in memory. A detailed performance evaluation is performed. The results show that our uniprocessor optimizations speed up the flow solver by a factor of 1.38 to 3.93 on the Tianhe-1A supercomputer. The in-memory grid data exchange optimization reduces application startup time by nearly two orders of magnitude. The optimized code exhibits excellent parallel scalability when running realistic test cases. On 4,096 CPU cores, it achieves a strong-scaling parallel efficiency of 77.39% and a maximum performance of 4.01 Tflops.
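
For readers unfamiliar with the metric, the sketch below shows the usual definition of strong-scaling efficiency and the per-core rate implied by the quoted figures. The baseline timings are hypothetical, since the abstract does not state the baseline run; only the 4.01 Tflops and 4,096-core values come from the text above.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Efficiency of a run on n cores relative to a baseline on n_base cores,
    for a fixed total problem size (strong scaling)."""
    speedup = t_base / t_n        # how much faster the larger run is
    ideal_speedup = n / n_base    # speedup if scaling were perfect
    return speedup / ideal_speedup

# Hypothetical timings (not from the paper): 100 s on 64 cores, 2 s on 4096 cores.
example_efficiency = strong_scaling_efficiency(100.0, 64, 2.0, 4096)

# From the abstract: 4.01 Tflops sustained on 4096 cores.
gflops_per_core = 4.01e12 / 4096 / 1e9
```

With the hypothetical timings the efficiency comes out at 0.78125, and the quoted peak corresponds to roughly 0.98 Gflops per core.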

    PIT: A Framework for Effectively Composing High-Level Loop Transformations

    The increasing complexity of modern architectures and memory models challenges the design of optimizing compilers. Several optimizing transformations must be applied to the original program to exploit the machine fully, especially for scientific, computation-intensive codes. To investigate the best transformation sequence and the best transformation parameters simultaneously, this paper presents a novel loop transformation framework that combines the advantages of the polyhedral model and model-guided iterative compilation, and is capable of fully automated non-parametric transformations, model-guided parametric transformations, and automatic parameter search. The framework employs the polyhedral model to facilitate the search for compositions of non-parametric code transformations, designs a transformation model based on hardware performance counters to guide when, where, and in what order to apply transformations for the most benefit, and finally uses the Nelder-Mead simplex algorithm to find the optimal parameters. The framework is demonstrated on three typical computational kernels, where it generates efficient code for complex loop nests, achieving performance that greatly exceeds the native compiler and is significantly better than state-of-the-art polyhedral-model-based loop transformations and iterative compilation.
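
The final parameter-search step can be pictured with a small Nelder-Mead sketch in Python. This is a simplified variant (reflection, expansion, inside contraction, and shrink only), not the framework's implementation, and `toy_runtime` with its two continuous parameters is a hypothetical stand-in for a measured execution time.

```python
def nelder_mead(f, x0, step=1.0, max_iter=200, tol=1e-8):
    """Minimize f starting from x0 with a simplified Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each axis.
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid;
            # t > 0 moves away from the worst vertex, t < 0 moves toward it.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
    simplex.sort(key=f)
    return simplex[0], f(simplex[0])

def toy_runtime(params):
    """Hypothetical smooth cost surface with its minimum at (3, -1)."""
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2
```

Calling `nelder_mead(toy_runtime, [0.0, 0.0])` returns a point close to the minimum of the toy surface. Real transformation parameters such as tile sizes are integers, so a practical search would have to round or otherwise map the simplex vertices to valid values before each measurement.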