60 research outputs found

    Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC

    Non-autoregressive approaches aim to improve the inference speed of translation models, particularly those that generate output in a one-pass forward manner. However, these approaches often suffer from a significant drop in translation quality compared to autoregressive models. This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models while maintaining a substantial acceleration in inference speed. We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively. Furthermore, we adopt the MASK insertion scheme for up-sampling instead of token duplication, and we present an embedding distillation method to further enhance performance. In our experiments, our model outperforms the baseline autoregressive model (Transformer base) on multiple datasets, including WMT'14 DE↔EN, WMT'16 RO↔EN, and IWSLT'14 DE↔EN. Notably, our model achieves better performance than the baseline autoregressive model on the IWSLT'14 En↔De and WMT'16 En↔Ro datasets, even without using distillation data during training. It is worth highlighting that on the IWSLT'14 DE→EN dataset, our model achieves an impressive BLEU score of 39.59, setting a new state-of-the-art performance. Additionally, our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model. Comment: 12 pages, 6 figures
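
The contrast between token duplication and MASK insertion for up-sampling can be illustrated with a small sketch. CTC needs an input sequence at least as long as the target, so the source is lengthened before decoding, and the CTC output path is then collapsed by merging repeats and dropping blanks. The function names, the `[MASK]` string, the blank symbol `_`, the placement of the inserted masks after each token, and the up-sampling ratio are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def upsample_duplicate(tokens, ratio):
    """Token duplication: repeat every source token `ratio` times."""
    return [t for t in tokens for _ in range(ratio)]


def upsample_mask_insert(tokens, ratio, mask="[MASK]"):
    """MASK insertion (assumed layout): keep each token once and
    pad it with `ratio - 1` mask placeholders."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend([mask] * (ratio - 1))
    return out


def ctc_collapse(path, blank="_"):
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for tok in path:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out


src = ["guten", "Morgen"]
dup = upsample_duplicate(src, 2)
ins = upsample_mask_insert(src, 2)
hyp = ctc_collapse(["good", "good", "_", "morning"])
```

Both schemes produce an input of the same length; they differ only in what fills the extra positions.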

    Second Triangular Hermite Spline Curves and Its Application

    Abstract: A class of rational square trigonometric splines is presented that shares the properties of the ordinary cubic Hermite interpolation spline. The given spline approximates the interpolated curve more closely than the ordinary polynomial cubic spline. Key words: Hermite spline curve; C2 continuous; Faultage area; Precision

    Frustratingly Easy Model Generalization by Dummy Risk Minimization

    Empirical risk minimization (ERM) is a fundamental machine learning paradigm. However, its generalization ability is limited in various tasks. In this paper, we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general technique to improve the generalization of ERM. DuRM is extremely simple to implement: just enlarge the dimension of the output logits and then optimize using standard gradient descent. Moreover, we validate the efficacy of DuRM through both theoretical and empirical analysis. Theoretically, we show that DuRM yields greater gradient variance, which facilitates model generalization by finding flatter local minima. Empirically, we conduct evaluations of DuRM across different datasets, modalities, and network architectures on diverse tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adversarial training, and long-tailed recognition. Results demonstrate that DuRM consistently improves performance on all tasks in an almost free-lunch manner. Furthermore, we show that DuRM is compatible with existing generalization techniques and we discuss possible limitations. We hope that DuRM could trigger new interest in the fundamental research on risk minimization. Comment: Technical report; 22 pages
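
The idea of enlarging the output logits can be sketched in a few lines. The version below is a minimal illustration, assuming a softmax cross-entropy loss in which the extra "dummy" logits take part in the normalization but are never used as labels, and assuming that prediction at inference time looks only at the original classes. The function names and both assumptions are hypothetical; the abstract does not give these details.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` for the true class `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def durm_loss(real_logits, dummy_logits, label):
    """Loss over the enlarged output: real classes followed by dummy classes.
    `label` always indexes a real class (assumption)."""
    return softmax_cross_entropy(list(real_logits) + list(dummy_logits), label)


def predict(real_logits):
    """Prediction over the original classes only (assumption)."""
    return max(range(len(real_logits)), key=lambda i: real_logits[i])


real = [1.0, 2.0]
erm = softmax_cross_entropy(real, 1)
durm = durm_loss(real, [0.0], 1)
```

Under these assumptions the dummy logits only add mass to the softmax denominator, so the enlarged loss is never smaller than the plain ERM loss on the same real logits.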

    Social exclusion and suicide intention in Chinese college students: a moderated mediation model

    Given the growing incidence rates of suicide among college students and the potential lifelong consequences of suicide, it is imperative to better understand the factors that reduce the rates at which college students in a clinical sample engage in suicide. This study examines the relationship between social exclusion and suicide intention, the mediating effect of depression, and the moderating effect of meaning in life. Two hundred and ninety-nine Chinese college students, aged from 18 to 22 years (56.86% female, M age = 20.14, SD = 1.27), completed questionnaires assessing their social exclusion, suicide intention, depression, and meaning in life. The results revealed that social exclusion was positively associated with suicide intention, and depression mediated this relationship. In addition, this mediating effect of depression was moderated by meaning in life. That is, the mediation effect was stronger for students with a higher level of meaning in life. These findings provide educational suggestions for preventing and intervening in suicide intention among college students.