109 research outputs found
Non-linear Learning for Statistical Machine Translation
Modern statistical machine translation (SMT) systems usually use a linear
combination of features to model the quality of each translation hypothesis.
The linear combination assumes that all the features are in a linear
relationship and constrains that each feature interacts with the rest features
in an linear manner, which might limit the expressive power of the model and
lead to a under-fit model on the current data. In this paper, we propose a
non-linear modeling for the quality of translation hypotheses based on neural
networks, which allows more complex interaction between features. A learning
framework is presented for training the non-linear models. We also discuss
possible heuristics in designing the network structure which may improve the
non-linear learning performance. Experimental results show that with the basic
features of a hierarchical phrase-based machine translation system, our method
produce translations that are better than a linear model.Comment: submitted to a conferenc
Top-Rank Enhanced Listwise Optimization for Statistical Machine Translation
Pairwise ranking methods are the basis of many widely used discriminative
training approaches for structure prediction problems in natural language
processing(NLP). Decomposing the problem of ranking hypotheses into pairwise
comparisons enables simple and efficient solutions. However, neglecting the
global ordering of the hypothesis list may hinder learning. We propose a
listwise learning framework for structure prediction problems such as machine
translation. Our framework directly models the entire translation list's
ordering to learn parameters which may better fit the given listwise samples.
Furthermore, we propose top-rank enhanced loss functions, which are more
sensitive to ranking errors at higher positions. Experiments on a large-scale
Chinese-English translation task show that both our listwise learning framework
and top-rank enhanced listwise losses lead to significant improvements in
translation quality.Comment: Accepted to CONLL 201
Neural Machine Translation with Word Predictions
In the encoder-decoder architecture for neural machine translation (NMT), the
hidden states of the recurrent structures in the encoder and decoder carry the
crucial information about the sentence.These vectors are generated by
parameters which are updated by back-propagation of translation errors through
time. We argue that propagating errors through the end-to-end recurrent
structures are not a direct way of control the hidden vectors. In this paper,
we propose to use word predictions as a mechanism for direct supervision. More
specifically, we require these vectors to be able to predict the vocabulary in
target sentence. Our simple mechanism ensures better representations in the
encoder and decoder without using any extra data or annotation. It is also
helpful in reducing the target side vocabulary and improving the decoding
efficiency. Experiments on Chinese-English and German-English machine
translation tasks show BLEU improvements by 4.53 and 1.3, respectivelyComment: Accepted at EMNLP201
Chunk-Based Bi-Scale Decoder for Neural Machine Translation
In typical neural machine translation~(NMT), the decoder generates a sentence
word by word, packing all linguistic granularities in the same time-scale of
RNN. In this paper, we propose a new type of decoder for NMT, which splits the
decode state into two parts and updates them in two different time-scales.
Specifically, we first predict a chunk time-scale state for phrasal modeling,
on top of which multiple word time-scale states are generated. In this way, the
target sentence is translated hierarchically from chunks to words, with
information in different granularities being leveraged. Experiments show that
our proposed model significantly improves the translation performance over the
state-of-the-art NMT model.Comment: Accepted as a short paper by ACL 201
EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling
Recently, Large Language Models (LLMs) have demonstrated outstanding
performance across a wide range of downstream language tasks. Temperature
sampling is a commonly used decoding strategy for LLMs' generation process.
However, a fixed temperature parameter is used in most cases, which may not
always be an optimal choice for balancing generation quality and diversity. In
this paper, we propose an effective Entropy-based Dynamic Temperature (EDT)
Sampling method, to achieve a more balanced performance in terms of both
generation quality and diversity by dynamically selecting the temperature
parameter. Additionally, we also show model performance and comprehensive
analyses for 4 different generation benchmarks. Our experiments show that EDT
significantly outperforms the existing strategies across different tasks
- …