
    Transformer-based NMT: modeling, training and implementation

    International trade and industrial collaborations enable countries and regions to concentrate their development on specific industries while making the most of other countries' specializations, which significantly accelerates global development. However, globalization also increases the demand for cross-region communication. Language barriers between the many languages spoken worldwide hinder deep collaboration between groups speaking different languages, increasing the need for translation. Language technology, specifically Machine Translation (MT), holds the promise of enabling efficient, real-time communication between languages at minimal cost. Modern hardware can perform computation in parallel very quickly, giving machine translation users translations with very low latency, and the evolution from Statistical Machine Translation (SMT) to Neural Machine Translation (NMT), powered by advanced deep learning algorithms, has significantly boosted translation quality. Nevertheless, current machine translation algorithms are still far from translating all input accurately, so further improving the performance of state-of-the-art NMT algorithms remains a valuable open research question that has received wide attention. In the research presented in this thesis, we first investigate the long-distance relation modeling ability of the state-of-the-art NMT model, the Transformer. We propose to learn source phrase representations and incorporate them into the Transformer translation model, aiming to enhance its ability to capture long-distance dependencies. Second, although previous work (Bapna et al., 2018) suggests that deep Transformers have difficulty converging, we empirically find that the convergence of deep Transformers depends on the interaction between the layer normalization and the residual connections employed to stabilize training.
We conduct a theoretical study of how to ensure the convergence of Transformers, especially deep Transformers, and propose to guarantee the convergence of deep Transformers by placing a Lipschitz constraint on their parameter initialization. Finally, we investigate how to dynamically determine proper and efficient batch sizes during the training of the Transformer model. We find that the gradient direction stabilizes with increasing batch size during gradient accumulation. We therefore propose to adjust batch sizes dynamically during training by monitoring the change in gradient direction within gradient accumulation, and to achieve a proper and efficient batch size by stopping the accumulation when the gradient direction starts to fluctuate. For the research in this thesis, we also implement our own NMT toolkit, Neutron, an implementation of the Transformer and its variants. In addition to the fundamental features that form the basis of our implementations of the approaches presented in this thesis, it supports many advanced features from recent cutting-edge research. Implementations of all our approaches in this thesis are also included and open-sourced in the toolkit. To compare with previous approaches, we mainly conducted our experiments on data from the WMT 14 English-to-German (En-De) and English-to-French (En-Fr) news translation tasks, except when studying the convergence of deep Transformers, where we replaced the WMT 14 En-Fr task with the WMT 15 Czech-to-English (Cs-En) news translation task to compare with Bapna et al. (2018). The sizes of these datasets range from medium (WMT 14 En-De, ~4.5M sentence pairs) to very large (WMT 14 En-Fr, ~36M sentence pairs), so we expect our approaches to help improve translation quality between popular language pairs that are widely used and have sufficient data.
    Funded by the China Scholarship Council.
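    The batch-size criterion described in this abstract can be illustrated with a toy numeric sketch. This is a minimal illustration, not the thesis's Neutron implementation: the concrete stopping rule (cosine similarity between successive accumulated gradients exceeding a threshold) and the threshold value are assumptions made for demonstration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two gradient vectors (plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def dynamic_accumulation(micro_grads, sim_threshold=0.999):
    """Accumulate micro-batch gradients one at a time; stop once the
    accumulated gradient direction stabilizes, i.e. adding another
    micro-batch barely rotates the accumulated gradient.
    Returns (accumulated_gradient, micro_batches_used).
    Assumes at least one micro-batch gradient is provided."""
    acc = None
    for used, g in enumerate(micro_grads, start=1):
        new_acc = g if acc is None else [a + b for a, b in zip(acc, g)]
        if acc is not None and cosine(new_acc, acc) > sim_threshold:
            return new_acc, used  # direction stable: effective batch found
        acc = new_acc
    return acc, used  # ran out of micro-batches before stabilizing
```

    With noisy early gradients that later align, accumulation stops as soon as one more micro-batch no longer changes the accumulated direction, yielding a data-dependent effective batch size.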

    Proving Expected Sensitivity of Probabilistic Programs with Randomized Variable-Dependent Termination Time

    The notion of program sensitivity (aka Lipschitz continuity) specifies that changes in the program input result in proportional changes to the program output. For probabilistic programs the notion is naturally extended to expected sensitivity. A previous approach develops a relational program-logic framework for proving expected sensitivity of probabilistic while loops where the number of iterations is fixed and bounded. In this work, we consider probabilistic while loops where the number of iterations is not fixed, but randomized and dependent on the initial input values. We present a sound approach for proving expected sensitivity of such programs. Our approach is martingale-based and can be automated through existing martingale-synthesis algorithms. Furthermore, it is compositional for sequential composition of while loops under a mild side condition. We demonstrate the effectiveness of our approach on several classical examples: Gambler's Ruin, stochastic hybrid systems, and stochastic gradient descent. We also present experimental results showing that our automated approach can handle various probabilistic programs from the literature.
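    The Gambler's Ruin example mentioned above is easy to simulate. The following sketch (a Monte Carlo illustration, not the paper's martingale-based proof method) shows a probabilistic loop whose termination time is randomized and depends on the initial input, and estimates the expected output so that its sensitivity to the initial capital can be checked empirically.

```python
import random

def gamblers_ruin(x, n, p=0.5, rng=random):
    """Symmetric Gambler's Ruin: start with capital x, bet one unit per
    round, stop at ruin (0) or at the target n; return terminal capital.
    The number of loop iterations is randomized and input-dependent."""
    while 0 < x < n:
        x += 1 if rng.random() < p else -1
    return x

def expected_output(x, n, trials=20000, seed=0):
    """Monte Carlo estimate of the expected terminal capital."""
    rng = random.Random(seed)
    return sum(gamblers_ruin(x, n, rng=rng) for _ in range(trials)) / trials
```

    For the fair game the expected terminal capital equals the initial capital x, so perturbing the input by one unit changes the expected output by about one unit, matching the intuition behind expected sensitivity.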

    Kinetics of dyeing in continuous circulation with direct dyes: tencel case

    Due to the special characteristics of Tencel fibres, it is important to gather new data and information in order to improve our knowledge of their performance during dyeing. Kinetic equations are used to describe the behaviour of the heterogeneous dye-fibre system under isothermal conditions in order to determine the evolution of dye exhaustion versus dyeing time. Direct dyes are particularly suitable because they are physically absorbed and because they exhibit outstanding substantivity to cellulose. In addition, some of these dyes have a linear structure which ensures good correlation with structural differences in the fibres. The aim of this study is to quantify the kinetic behaviour of the Tencel-C.I. Direct Blue 1 system (one of the most common dyes in dyeing studies) by using three bi-parametric empirical dyeing-rate equations and a continuous-flow dyeing cell to obtain experimental data at six different temperatures: 30, 40, 50, 60, 70 and 80 °C. In order to check the goodness of fit of the equations, we record the dyeing time at three exhaustion levels: 50%, 80%, and final exhaustion.
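    The abstract does not give the three bi-parametric equations used, but the general shape of such empirical exhaustion models can be sketched with an assumed two-parameter saturating form, E(t) = E_inf * (1 - exp(-k * t^a)); this is an illustrative stand-in, not necessarily one of the equations from the study.

```python
import math

def exhaustion(t, k, a, e_inf=1.0):
    """Illustrative bi-parametric exhaustion model:
    E(t) = e_inf * (1 - exp(-k * t**a)), with rate constant k and
    shape exponent a as the two fitted parameters."""
    return e_inf * (1.0 - math.exp(-k * t ** a))

def time_to_exhaustion(level, k, a, e_inf=1.0):
    """Invert E(t) = level for the model above (requires level < e_inf),
    e.g. to read off the time to reach 50% or 80% exhaustion."""
    return (-math.log(1.0 - level / e_inf) / k) ** (1.0 / a)
```

    Fitting k and a at each bath temperature and comparing the predicted times at the 50% and 80% exhaustion levels against the recorded ones is one way to quantify how well such an equation tracks the measured kinetics.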

    Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation

    In this paper, we study the use of a deep Transformer translation model for the CCMT 2022 Chinese-Thai low-resource machine translation task. We first explore the experiment settings (including the number of BPE merge operations, dropout probability, embedding size, etc.) for the low-resource scenario with a 6-layer Transformer. Considering that increasing the number of layers also increases the regularization on new model parameters (dropout modules are also introduced when using more layers), we adopt the best-performing setting but increase the depth of the Transformer to 24 layers to obtain improved translation quality. Our work achieves state-of-the-art performance on Chinese-to-Thai translation in the constrained evaluation.

    Decline curve analysis for multiple-fractured horizontal wells in tight oil reservoirs

    The production of multiple-fractured horizontal wells (MFHW) in tight oil reservoirs decreases rapidly in the initial period. It then enters a stable state of low production with a slowly varying loss-ratio. Because of these complicated production dynamics, the conventional rate-decline analysis method is no longer applicable. We adopt the semi-analytical solution of the MFHW five-region linear flow model to obtain the multiple-fractured dimensionless rate-decline analysis curves and continuously calculate the loss-ratio of the decline curves. Based on the characteristics of the loss-ratio in the log-log plot, a decline curve analysis (DCA) model for tight oil MFHW is established. From the DCA model, dimensionless production decline curves are obtained, and the sensitivity of the model parameters is analyzed. Finally, the DCA model is used to analyze the production rate decline of tight oil MFHW, and good matching results and production forecasts are achieved. This method provides a scientific basis for the rate-decline analysis of tight oil MFHW.
    Cited as: Xu, G., Yin, H., Yuan, H., Xing, C. Decline curve analysis for multiple-fractured horizontal wells in tight oil reservoirs. Advances in Geo-Energy Research, 2020, 4(3): 296-304, doi: 10.46690/ager.2020.03.0
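    The loss-ratio at the heart of decline-curve analysis can be illustrated on the classical Arps hyperbolic decline, a standard DCA building block; the paper's five-region linear-flow model itself is more involved and is not reproduced here.

```python
def arps_hyperbolic(qi, di, b, t):
    """Arps hyperbolic decline: q(t) = qi / (1 + b*di*t)**(1/b),
    with initial rate qi, initial decline di, and exponent b."""
    return qi / (1.0 + b * di * t) ** (1.0 / b)

def loss_ratio(q, t):
    """Numerical Arps loss-ratio a = -q / (dq/dt), estimated with
    central differences at the interior sample points."""
    a = []
    for i in range(1, len(t) - 1):
        dqdt = (q[i + 1] - q[i - 1]) / (t[i + 1] - t[i - 1])
        a.append(-q[i] / dqdt)
    return a
```

    For hyperbolic decline the loss-ratio is linear in time, a(t) = (1 + b*di*t)/di, so its slope recovers the decline exponent b; inspecting how the computed loss-ratio behaves on a log-log plot is what the DCA model above builds on.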