    Minimum Error Rate Training Based on Ensemble Learning

    Minimum error rate training (MERT) is the standard parameter-tuning method in statistical machine translation and plays an important role in building translation models. However, MERT is prone to overfitting: weights tuned on the development set often do not transfer well to the test set. To address this problem, this paper applies ensemble learning to the tuning process. We first select different feature subsets and run MERT on each to obtain several groups of feature weights, then compute the spatial distance between the weight vectors to filter out unreasonable ones, and finally take a weighted average of the remaining groups, weighted by their BLEU (bilingual evaluation understudy) scores on the development set, to obtain the final feature weights. Experiments on NIST and IWSLT show that the method is effective for translation tasks in which the training and test data come from different domains.
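
    The filtering and averaging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how spatial distance is measured or thresholded, so the Euclidean distance to the centroid, the `max_distance` cutoff, the function names, and the convention that a feature left out of a subset gets weight 0.0 are all assumptions made for illustration. The MERT runs themselves are not shown; their outputs are taken as given.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combine_weights(weight_sets, bleu_scores, max_distance):
    """Filter outlying weight vectors, then average the rest weighted by BLEU.

    weight_sets  -- one full-length weight vector per MERT run (assumption:
                    features outside a run's subset are stored as 0.0)
    bleu_scores  -- development-set BLEU of each run, in the same order
    max_distance -- hypothetical cutoff on distance to the centroid
    """
    n = len(weight_sets)
    dim = len(weight_sets[0])

    # Centroid of all candidate weight vectors.
    centroid = [sum(w[i] for w in weight_sets) / n for i in range(dim)]

    # Assumed filtering rule: keep vectors close enough to the centroid.
    kept = [
        (w, b)
        for w, b in zip(weight_sets, bleu_scores)
        if euclidean(w, centroid) <= max_distance
    ]
    if not kept:
        raise ValueError("every weight vector was filtered out")

    # BLEU-weighted average of the surviving vectors.
    total_bleu = sum(b for _, b in kept)
    return [sum(w[i] * b for w, b in kept) / total_bleu for i in range(dim)]


# Toy inputs, made up for illustration only.
example = combine_weights(
    weight_sets=[[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]],
    bleu_scores=[0.3, 0.1, 0.5],
    max_distance=5.0,
)
```

    With these toy inputs the third vector lies far from the centroid and is dropped, so `example` is the average of the first two vectors weighted by their scores, roughly `[0.75, 0.25]`.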