97 research outputs found
Supervised Attentions for Neural Machine Translation
In this paper, we improve the attention or alignment accuracy of neural
machine translation by utilizing the alignments of training sentence pairs. We
simply compute the distance between the machine attentions and the "true"
alignments, and minimize this cost in the training procedure. Our experiments
on a large-scale Chinese-to-English task show that our model improves both
translation and alignment qualities significantly over the large-vocabulary
neural machine translation system, and even beats a state-of-the-art
traditional syntax-based system.
Comment: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap with arXiv:1605.0314
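The training objective described in this abstract can be illustrated with a small sketch. The abstract only says that a "distance" between machine attention and reference alignments is minimized; the mean squared distance, the `weight` parameter, and the function names below are assumptions made for illustration, not the paper's stated formulation.

```python
def attention_distance(attention, alignment):
    """Mean squared distance between two target-by-source matrices.

    Both arguments are lists of rows; each row holds one target word's
    weights over the source words. The squared distance is an assumed
    choice of distance, not necessarily the one used in the paper.
    """
    total = 0.0
    count = 0
    for att_row, ali_row in zip(attention, alignment):
        for a, g in zip(att_row, ali_row):
            total += (a - g) ** 2
            count += 1
    return total / count


def joint_loss(translation_loss, attention, alignment, weight=1.0):
    """Translation loss plus a weighted attention-supervision penalty."""
    return translation_loss + weight * attention_distance(attention, alignment)
```

With this shape, attention that already matches the reference alignment adds no penalty, so the joint loss reduces to the plain translation loss.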
Research on Feature Extraction of Indicator Card Data for Sucker-Rod Pump Working Condition Diagnosis
This paper studies, simulates, and compares three feature extraction methods for sucker-rod pump indicator card data, based on Fourier Descriptors (FD), Geometric Moment Vector (GMV), and Gray Level Matrix Statistics (GLMX), respectively. Numerical experiments show that the Fourier Descriptors algorithm requires less running time and less memory, but may lose information when the number of Fourier Descriptors is not optimal; the Geometric Moment Vector algorithm is more time-consuming and requires more memory; and the Gray Level Matrix Statistics algorithm provides low-dimensional feature vectors at the cost of more time and memory. Furthermore, the rotational invariance of both the Fourier Descriptors algorithm and the Geometric Moment Vector algorithm may result in improper pattern recognition of indicator card data when used for sucker-rod pump working condition diagnosis.
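To make the rotational-invariance remark concrete, here is a minimal sketch of one common Fourier Descriptor construction for a closed contour. It is a textbook variant, assumed for illustration; the paper's exact normalization and number of descriptors are not given in the abstract, and the function name and the parameter `k` are hypothetical.

```python
import cmath


def fourier_descriptors(points, k):
    """Return k normalized Fourier Descriptor magnitudes for a closed contour.

    points: list of (x, y) contour samples.
    Skipping the zero-frequency term removes translation, taking
    magnitudes removes rotation, and dividing by the first magnitude
    removes scale (an assumed, commonly used normalization).
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = [
        sum(z[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n)) / n
        for u in range(1, k + 1)
    ]
    mags = [abs(c) for c in coeffs]
    scale = mags[0] or 1.0  # avoid dividing by zero for degenerate contours
    return [m / scale for m in mags]
```

Because only magnitudes are kept, a contour and its rotated copy map to the same feature vector, which is the property the abstract warns can confuse indicator cards whose orientation carries diagnostic meaning.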
Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation
Recently, CKY-based models have shown great potential in unsupervised grammar
induction thanks to their human-like encoding paradigm, which runs recursively
and hierarchically but requires O(n^3) time complexity. Recursive
Transformer based on Differentiable Trees (R2D2) makes it possible to scale to
large language model pre-training even with complex tree encoder by introducing
a heuristic pruning method. However, the rule-based pruning approach suffers
from local optima and slow inference. In this paper, we fix those
issues in a unified method. We propose to use a top-down parser as a
model-based pruning method, which also enables parallel encoding during
inference. Specifically, our parser casts parsing as a split point scoring task,
which first scores all split points for a given sentence, and then recursively
splits a span into two by picking a split point with the highest score in the
current span. The reverse order of the splits is considered as the order of
pruning in R2D2 encoder. Besides the bi-directional language model loss, we also
optimize the parser by minimizing the KL distance between tree probabilities
from parser and R2D2. Our experiments show that our Fast-R2D2 improves
performance significantly in grammar induction and achieves competitive results
in downstream classification tasks.
Comment: EMNLP 202
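The split-point procedure described in this abstract can be sketched as a short recursive function. This is an illustrative reading under stated assumptions: the scores are taken as given (in the paper they come from a learned parser), the splits are listed in depth-first order, and the names `top_down_splits` and `scores` are hypothetical.

```python
def top_down_splits(scores, left=0, right=None):
    """List split points in the order a greedy top-down parser picks them.

    scores[i] is the score of splitting between token i and token i + 1,
    so a sentence of n tokens has n - 1 scores. The span covers tokens
    left..right inclusive. Depth-first ordering is an assumption.
    """
    if right is None:
        right = len(scores)  # index of the last token
    if right - left < 1:
        return []  # a single token cannot be split further
    # Pick the highest-scoring split point inside the current span.
    best = max(range(left, right), key=lambda i: scores[i])
    return (
        [best]
        + top_down_splits(scores, left, best)
        + top_down_splits(scores, best + 1, right)
    )


def pruning_order(scores):
    """Reverse the split order to obtain a bottom-up merge order."""
    return list(reversed(top_down_splits(scores)))
```

Reversing the split list places every sub-span's split before the split of the span that contains it, which is why it can serve as a bottom-up merge order.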
Inconsistent dialogue responses and how to recover from them
One critical issue for chat systems is staying consistent about their own
preferences, opinions, beliefs and facts, which has been shown to be a difficult
problem. In this work, we study methods to assess and bolster utterance
consistency of chat systems. A dataset is first developed for studying the
inconsistencies, where inconsistent dialogue responses, explanations of the
inconsistencies, and recovery utterances are authored by annotators. This
covers the life span of inconsistencies, namely introduction, understanding,
and resolution. Building on this, we introduce a set of tasks centered on
dialogue consistency, specifically focused on its detection and resolution. Our
experimental findings indicate that our dataset significantly helps the
progress in identifying and resolving conversational inconsistencies, and
that current popular large language models like ChatGPT, although good at
resolving inconsistencies, still struggle with detection.
Comment: Accepted in EACL 2024. Code and dataset available at https://github.com/mianzhang/CIDE
Discover, Explanation, Improvement: Automatic Slice Detection Framework for Natural Language Processing
Current natural language processing (NLP) models such as BERT and RoBERTa
have achieved high overall performance, but they often make systematic errors
due to bias or certain features that are difficult to learn. Research on slice
detection models (SDM), which automatically identify underperforming groups of
datapoints, has therefore gradually attracted more attention; it aims at both
understanding model behaviors and providing insights for future model training
and design. However, there is little systematic research on SDM and
quantitative evaluation of its assessment for NLP models. Our paper fills this
gap by proposing the "Discover, Explanation, Improvement" framework that discovers
coherent and underperforming groups of datapoints and unites datapoints of each
slice under human-understandable concepts; it also provides comprehensive
evaluation tasks and the corresponding quantitative metrics, which enable
convenient comparison for future works. Results show that our framework can
accurately select error-prone datapoints with informative semantic features
that summarize error patterns. Building on these, it directly boosts the
performance of trained models by an average of 2.85 points across multiple
datasets without tuning any parameters.
Comment: 15 pages, 5 figures
Collaborative decoding of critical tokens for boosting factuality of large language models
The most common training pipeline for large language models includes
pretraining, finetuning and aligning phases, with their respective resulting
models, such as the pretrained model and the finetuned model. Finetuned and
aligned models show improved instruction following and safe
generation; however, their ability to stay factual about the world is
impacted by the finetuning process. Furthermore, the common practice of using
sampling during generation also increases chances of hallucination. In this
work, we introduce a collaborative decoding framework to harness the high
factuality within pretrained models through the concept of critical tokens. We
first design a critical token classifier to decide which model to use for the
next token, and subsequently generate the next token using different decoding
strategies. Experiments with different models and datasets show that our
decoding framework is able to reduce model hallucination significantly,
showcasing the importance of the collaborative decoding framework.
Comment: work in progress
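The routing idea in this abstract can be sketched as a simple decoding loop. Everything named here is a hypothetical stand-in: `is_critical` plays the role of the critical token classifier, while `pretrained_next` and `aligned_next` are callables that each return one next token using whatever decoding strategy they implement. The abstract does not specify these interfaces, so this is a sketch under those assumptions.

```python
def collaborative_decode(prompt_tokens, is_critical, pretrained_next,
                         aligned_next, max_new_tokens, eos=None):
    """Generate tokens, choosing a model for each step.

    When the (assumed) classifier flags the next token as critical, the
    pretrained model produces it; otherwise the aligned model does.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        if is_critical(tokens):
            token = pretrained_next(tokens)
        else:
            token = aligned_next(tokens)
        if token == eos:
            break  # stop at the end-of-sequence marker
        tokens.append(token)
    return tokens
```

A toy call with stub callables, for example `collaborative_decode(["x"], lambda t: len(t) % 2 == 0, lambda t: "P", lambda t: "A", 3)`, alternates between the two stand-in models depending on the classifier's decision at each step.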