Neural Machine Translation with Word Predictions
In the encoder-decoder architecture for neural machine translation (NMT), the
hidden states of the recurrent structures in the encoder and decoder carry the
crucial information about the sentence. These vectors are generated by
parameters which are updated by back-propagation of translation errors through
time. We argue that propagating errors through the end-to-end recurrent
structures is not a direct way of controlling the hidden vectors. In this paper,
we propose to use word predictions as a mechanism for direct supervision. More
specifically, we require these vectors to be able to predict the vocabulary of
the target sentence. Our simple mechanism ensures better representations in the
encoder and decoder without using any extra data or annotation. It is also
helpful in reducing the target-side vocabulary and improving the decoding
efficiency. Experiments on Chinese-English and German-English machine
translation tasks show BLEU improvements of 4.53 and 1.3 points, respectively.
Comment: Accepted at EMNLP 2017
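A minimal sketch of the word-prediction supervision described above, assuming a PyTorch-style encoder-decoder; the module names, the bag-of-words target construction, and the auxiliary loss weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordPredictionHead(nn.Module):
    """Predicts the set of target-side words from a hidden state
    (a sketch of the word-prediction auxiliary supervision)."""
    def __init__(self, hidden_size, tgt_vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, tgt_vocab_size)

    def forward(self, hidden):
        # hidden: (batch, hidden_size), e.g. the encoder's final state
        return self.proj(hidden)  # logits over the target vocabulary

def word_prediction_loss(logits, tgt_ids, pad_id=0):
    """Multi-label loss: the hidden state should assign probability
    mass to every word that appears in the target sentence."""
    batch, vocab = logits.shape
    bow = torch.zeros(batch, vocab, device=logits.device)
    bow.scatter_(1, tgt_ids, 1.0)   # bag-of-words targets from token ids
    bow[:, pad_id] = 0.0            # ignore padding
    return F.binary_cross_entropy_with_logits(logits, bow)

# Hypothetical training step: the auxiliary loss is added to the usual
# cross-entropy translation loss (the 0.5 weight is an assumption).
# total_loss = nmt_loss + 0.5 * word_prediction_loss(head(enc_state), tgt_ids)
```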
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
The recent surge of generative AI has been fueled by the generative power of
diffusion probabilistic models and the scalable capabilities of large language
models. Despite their potential, it remains elusive whether diffusion language
models can solve general language tasks comparable to their autoregressive
counterparts. This paper demonstrates that scaling diffusion models with respect
to data, model size, and tasks can effectively make them strong language
learners. We build competent diffusion language models at scale by first
acquiring knowledge from massive data via masked language modeling pretraining,
exploiting the intrinsic connection between masked and diffusion language
models. We then reprogram pretrained masked language models into
diffusion language models via diffusive adaptation, wherein task-specific
finetuning and instruction finetuning are explored to unlock their versatility
in solving general language tasks. Experiments show that scaling diffusion
language models consistently improves performance across downstream language
tasks. We further discover that instruction finetuning can elicit zero-shot and
few-shot in-context learning abilities that help tackle many unseen tasks by
following natural language instructions, and show promise in advanced and
challenging abilities such as reasoning.
Comment: added reference
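A rough sketch of what a diffusive-adaptation training step could look like, assuming a masked-diffusion-style objective on top of a pretrained masked language model; `model`, `mask_id`, `pad_mask`, and the 1/t loss reweighting are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def diffusion_lm_step(model, input_ids, mask_id, pad_mask):
    """One training step of a masked-diffusion-style objective:
    sample a corruption level t, mask roughly a fraction t of the tokens,
    and train the model to recover them. `model` is any callable that
    returns per-token vocabulary logits of shape (batch, seq_len, vocab)."""
    batch, seq_len = input_ids.shape
    # Sample a per-sequence corruption level t in (0, 1].
    t = torch.rand(batch, 1, device=input_ids.device).clamp(min=1e-3)
    # Mask each non-pad token independently with probability t.
    masked = (torch.rand(batch, seq_len, device=input_ids.device) < t) & pad_mask
    corrupted = torch.where(masked, torch.full_like(input_ids, mask_id), input_ids)
    logits = model(corrupted)
    loss = F.cross_entropy(logits[masked], input_ids[masked], reduction="none")
    # Reweighting by 1/t is one common choice in discrete diffusion;
    # the paper's actual weighting may differ.
    weights = (1.0 / t).expand_as(input_ids)[masked]
    return (weights * loss).mean()
```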
Classification Based on both Attribute Value Weight and Tuple Weight under the Cloud Computing
In recent years, more and more people have paid attention to cloud computing. Users need to deal with massive amounts of data in the cloud computing environment, and classification can predict the needs of users from such large datasets. Traditional classification methods frequently adopt one of two strategies: one is to remove an instance once it is covered by a rule; the other is to decrease the tuple weight of an instance once it is covered by a rule. The quality of such traditional classifiers may not be high, and as a result they cannot achieve high classification accuracy on some datasets. In this paper, we present a new classification approach, called classification based on both attribute value weight and tuple weight (CATW). CATW is distinguished from traditional classifiers in two respects. First, CATW uses both attribute value weights and tuple weights. Second, CATW introduces a new measure to select the best attribute values and generate a high-quality classification rule set. Our experimental results indicate that CATW can achieve higher classification accuracy than some traditional classifiers.
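A toy sketch of the tuple-weighting idea contrasted with instance removal; the weighted-frequency scoring and the decay factor are illustrative stand-ins, since the paper's actual attribute value weights and selection measure are not specified here.

```python
from collections import defaultdict

def learn_rules(data, labels, decay=0.5, rounds=5):
    """Toy tuple-weighted rule learning: instead of removing covered
    instances, their tuple weights are decayed, and candidate attribute
    values are scored by the total weight of the instances they cover.
    The scoring measure (weighted per-class frequency) is an assumed
    stand-in for the measure proposed in the paper."""
    weights = [1.0] * len(data)      # tuple weights, one per instance
    rules = []
    for _ in range(rounds):
        # Score each (attribute, value, class) by summed tuple weight.
        score = defaultdict(float)
        for w, row, y in zip(weights, data, labels):
            for attr, val in row.items():
                score[(attr, val, y)] += w
        if not score:
            break
        (attr, val, y), _ = max(score.items(), key=lambda kv: kv[1])
        rules.append(({attr: val}, y))
        # Decay (rather than remove) the weights of covered tuples.
        for i, row in enumerate(data):
            if row.get(attr) == val:
                weights[i] *= decay
    return rules

# Example usage with hypothetical data:
data = [{"outlook": "sunny", "wind": "weak"},
        {"outlook": "rain",  "wind": "strong"},
        {"outlook": "sunny", "wind": "strong"}]
labels = ["play", "stay", "play"]
print(learn_rules(data, labels))
```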