15 research outputs found

    Findings of the 2017 Conference on Machine Translation

    This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.

    Results of the WMT17 metrics shared task

    This paper presents the results of the WMT17 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT17 news translation task and the neural MT training task. We collected scores of 14 metrics from 8 research groups. In addition, we computed scores of 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system-level correlation (how well each metric's scores correlate with the WMT17 official manual ranking of systems) and in terms of segment-level correlation (how often a metric agrees with humans in judging the quality of a particular sentence). This year, we built upon two types of manual judgements: direct assessment (DA) and HUME manual semantic judgements.
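
    To make the system-level evaluation concrete: it is the Pearson correlation between a metric's per-system scores and the corresponding human scores. A minimal sketch with hypothetical numbers (not WMT17 data) follows:

    from scipy.stats import pearsonr

    # Hypothetical per-system scores for one language pair (not WMT17 data).
    metric_scores = {"sysA": 28.4, "sysB": 31.2, "sysC": 25.9, "sysD": 33.0}  # e.g. BLEU
    human_scores  = {"sysA": 61.0, "sysB": 68.5, "sysC": 55.2, "sysD": 71.3}  # e.g. DA, 0-100

    systems = sorted(metric_scores)
    r, p = pearsonr([metric_scores[s] for s in systems],
                    [human_scores[s] for s in systems])
    print(f"system-level Pearson r = {r:.3f} (p = {p:.3f})")

    Segment-level correlation is analogous, but is computed over per-sentence judgements rather than per-system averages.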

    Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model

    Sentence embedding is one of the most fundamental tasks in Natural Language Processing and plays an important role in various downstream tasks. The recent breakthrough in sentence embedding was achieved by pre-trained language models (PLMs). Despite this success, an embedded vector (Sen2Vec) representing a point estimate does not naturally express uncertainty in a task-agnostic way. This paper therefore proposes an efficient framework for probabilistic sentence embedding (Sen2Pro) from PLMs, which represents a sentence as a probability density distribution in an embedding space, reflecting both model uncertainty and data uncertainty (i.e., the many-to-one nature) in the sentence representation. The proposed framework works in a plug-and-play way without retraining the PLMs, is easy to implement, and is generally applicable on top of any PLM. The superiority of Sen2Pro over Sen2Vec has been theoretically verified and practically illustrated on different NLP tasks.
    Comment: Accepted to the ACL 2023 workshop Rep4NL
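
    The abstract does not spell out how the density is estimated. As a rough illustration of the general idea only (not the paper's actual procedure), one can expose model uncertainty by keeping dropout active at inference and aggregating several stochastic forward passes into a diagonal Gaussian; the model name and pooling choice below are illustrative assumptions:

    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL_NAME = "bert-base-uncased"  # illustrative choice; any PLM works
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.train()  # keep dropout stochastic at inference time

    def probabilistic_embedding(sentence: str, n_samples: int = 32):
        """Return (mean, std) of mean-pooled embeddings over stochastic passes."""
        inputs = tokenizer(sentence, return_tensors="pt")
        mask = inputs["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
        samples = []
        with torch.no_grad():
            for _ in range(n_samples):
                hidden = model(**inputs).last_hidden_state     # (1, seq_len, dim)
                pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean pooling
                samples.append(pooled.squeeze(0))
        stacked = torch.stack(samples)           # (n_samples, dim)
        return stacked.mean(0), stacked.std(0)   # diagonal-Gaussian parameters

    mu, sigma = probabilistic_embedding("Sentence embeddings can carry uncertainty.")
    print(mu.shape, float(sigma.mean()))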