End-to-End Multi-View Networks for Text Classification
We propose a multi-view network for text classification. Our method
automatically creates various views of its input text, each taking the form of
soft attention weights that distribute the classifier's focus among a set of
base features. For a bag-of-words representation, each view focuses on a
different subset of the text's words. Aggregating many such views results in a
more discriminative and robust representation. Through a novel architecture
that both stacks and concatenates views, we produce a network that emphasizes
both depth and width, allowing training to converge quickly. Using our
multi-view architecture, we establish new state-of-the-art accuracies on two
benchmark tasks. Comment: 6 pages.
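A minimal sketch of the view mechanism described above, assuming bag-of-words embeddings as the base features; the class name, the per-view linear scorers, and the exact way later views condition on earlier ones are illustrative choices rather than the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_views, num_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)      # bag-of-words base features
        # View i scores each word from its embedding concatenated with the i earlier views.
        self.scorers = nn.ModuleList(
            nn.Linear(embed_dim * (i + 1), 1) for i in range(num_views)
        )
        self.classifier = nn.Linear(embed_dim * num_views, num_classes)

    def forward(self, token_ids):                              # (batch, seq_len)
        feats = self.embed(token_ids)                          # (batch, seq_len, dim)
        batch, seq_len, _ = feats.shape
        views = []
        for scorer in self.scorers:
            if views:                                          # "stacking": condition on earlier views
                prev = torch.cat(views, dim=-1).unsqueeze(1).expand(batch, seq_len, -1)
                scorer_in = torch.cat([feats, prev], dim=-1)
            else:
                scorer_in = feats
            attn = F.softmax(scorer(scorer_in).squeeze(-1), dim=-1)  # soft attention over words
            views.append((attn.unsqueeze(-1) * feats).sum(dim=1))    # one weighted view of the text
        return self.classifier(torch.cat(views, dim=-1))             # "width": concatenate all views
```

In this reading, the conditioning loop plays the role of stacking (depth) and the final concatenation the role of width.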
A Challenge Set Approach to Evaluating Machine Translation
Neural machine translation represents an exciting leap forward in translation
quality. But what longstanding weaknesses does it resolve, and which remain? We
address these questions with a challenge set approach to translation evaluation
and error analysis. A challenge set consists of a small set of sentences, each
hand-designed to probe a system's capacity to bridge a particular structural
divergence between languages. To exemplify this approach, we present an
English-French challenge set, and use it to analyze phrase-based and neural
systems. The resulting analysis provides not only a more fine-grained picture
of the strengths of neural systems, but also insight into which linguistic
phenomena remain out of reach.Comment: EMNLP 2017. 28 pages, including appendix. Machine readable data
included in a separate file. This version corrects typos in the challenge se
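A minimal sketch of how a challenge set of this kind could be represented and scored, assuming one human yes/no judgment per system output; the field names and the example phenomenon labels are illustrative, not the schema of the released data file.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class ChallengeItem:
    source: str          # hand-designed English sentence
    phenomenon: str      # structural divergence probed, e.g. "subjunctive mood" (illustrative label)
    reference: str       # one acceptable French rendering, shown to the human judge

def score_by_phenomenon(items, judgments):
    """judgments[i] is True if a judge deemed the system output for items[i] acceptable."""
    correct, total = defaultdict(int), defaultdict(int)
    for item, ok in zip(items, judgments):
        total[item.phenomenon] += 1
        correct[item.phenomenon] += int(ok)
    return {p: correct[p] / total[p] for p in total}   # per-phenomenon success rate
```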
Cohesive Constraints in A Beam Search Phrase-based Decoder
Cohesive constraints allow the phrase-based decoder to employ arbitrary, non-syntactic phrases, and encourage it to translate those phrases in an order that respects the source dependency tree structure. We present extensions of the cohesive constraints, such as an exhaustive interruption count and a rich interruption check. We show that the cohesion-enhanced decoder significantly outperforms the standard phrase-based decoder on English→Spanish. Improvements of between 0.5 and 1.2 BLEU points are obtained on the English→Iraqi system.
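A minimal sketch of the basic cohesion check that such constraints build on, assuming the decoder tracks the set of covered source positions and has a dependency tree over the source; the function names are illustrative, and the exhaustive-interruption-count and rich-interruption-check extensions themselves are not reproduced here.

```python
def subtree_span(children, root):
    """Source positions in the dependency subtree rooted at `root`."""
    span, stack = set(), [root]
    while stack:
        node = stack.pop()
        span.add(node)
        stack.extend(children.get(node, []))
    return span

def count_interruptions(children, nodes, covered, new_span):
    """Count subtrees that are partially translated and that the new phrase jumps out of."""
    new = set(new_span)
    violations = 0
    for node in nodes:
        span = subtree_span(children, node)
        opened = bool(covered & span) and not span <= covered   # started but unfinished
        if opened and not (new & span):                         # new phrase lies entirely outside it
            violations += 1
    return violations
```

A count of this kind can either gate hypothesis expansion outright or be exposed to the decoder as a feature that discourages, rather than forbids, interruptions.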
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
While large language models (LLMs) often adopt finetuning to unlock their
capabilities for downstream applications, our understanding of the inductive
biases (especially the scaling properties) of different finetuning methods is
still limited. To fill this gap, we conduct systematic experiments studying
whether and how different scaling factors, including LLM model size,
pretraining data size, new finetuning parameter size and finetuning data size,
affect the finetuning performance. We consider two types of finetuning --
full-model tuning (FMT) and parameter-efficient tuning (PET, including prompt
tuning and LoRA), and explore their scaling behaviors in the data-limited
regime where the LLM model size substantially outweighs the finetuning data
size. Based on two sets of pretrained bilingual LLMs from 1B to 16B and
experiments on bilingual machine translation and multilingual summarization
benchmarks, we find that 1) LLM finetuning follows a power-based multiplicative
joint scaling law between finetuning data size and each other scaling factor;
2) LLM finetuning benefits more from LLM model scaling than pretraining data
scaling, and PET parameter scaling is generally ineffective; and 3) the optimal
finetuning method is highly task- and finetuning data-dependent. We hope our
findings could shed light on understanding, selecting and developing LLM
finetuning methods. Comment: ICLR 2024.
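Finding 1) describes a power-based multiplicative joint scaling law; a functional form along the following lines would match that description, where the symbols are illustrative fit constants rather than values reported in the paper.

```latex
% X: one scaling factor (LLM model size, pretraining data size, or PET parameter size)
% D_f: finetuning data size; A, E, \alpha, \beta: fitted, task-specific constants
\hat{\mathcal{L}}(X, D_f) = A \cdot \frac{1}{X^{\alpha}} \cdot \frac{1}{D_f^{\beta}} + E
```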
Reinforcement Learning based Curriculum Optimization for Neural Machine Translation
We consider the problem of making efficient use of heterogeneous training
data in neural machine translation (NMT). Specifically, given a training
dataset with a sentence-level feature such as noise, we seek an optimal
curriculum, or order for presenting examples to the system during training. Our
curriculum framework allows examples to appear an arbitrary number of times,
and thus generalizes data weighting, filtering, and fine-tuning schemes. Rather
than relying on prior knowledge to design a curriculum, we use reinforcement
learning to learn one automatically, jointly with the NMT system, in the course
of a single training run. We show that this approach can beat uniform and
filtering baselines on Paracrawl and WMT English-to-French datasets by up to
+3.4 BLEU, and match the performance of a hand-designed, state-of-the-art
curriculum. Comment: NAACL 2019 short paper. Reviewer comments not yet addressed.
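A minimal sketch of the kind of training loop this implies, with an epsilon-greedy bandit standing in for the paper's reinforcement-learning agent; `bins`, `nmt_step`, and `dev_score` are assumed interfaces, not the released implementation.

```python
import random

def train_with_learned_curriculum(bins, nmt_step, dev_score, steps, eps=0.1, lr=0.1):
    """bins: lists of training batches grouped by a sentence-level feature (e.g. noise);
    nmt_step(batch) applies one NMT update; dev_score() evaluates the current model."""
    q = [0.0] * len(bins)                       # estimated value of drawing from each bin
    prev = dev_score()
    for _ in range(steps):
        if random.random() < eps:               # explore an arbitrary bin
            a = random.randrange(len(bins))
        else:                                   # exploit the current value estimates
            a = max(range(len(bins)), key=q.__getitem__)
        nmt_step(random.choice(bins[a]))        # one NMT update on a batch from the chosen bin
        cur = dev_score()
        q[a] += lr * ((cur - prev) - q[a])      # reward = change in the dev-set metric
        prev = cur
    return q
```

Because a bin can be chosen any number of times, the learned curriculum subsumes weighting, filtering, and fine-tuning as special cases, as the abstract notes.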