
    Supervised Attentions for Neural Machine Translation

    In this paper, we improve the attention or alignment accuracy of neural machine translation by utilizing the alignments of training sentence pairs. We simply compute the distance between the machine attentions and the "true" alignments, and minimize this cost in the training procedure. Our experiments on a large-scale Chinese-to-English task show that our model improves both translation and alignment quality significantly over a large-vocabulary neural machine translation system, and even beats a state-of-the-art traditional syntax-based system. Comment: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap with arXiv:1605.0314
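    As a rough illustration of the supervision term this abstract describes (not the authors' code), the sketch below adds an alignment loss on top of the usual translation loss. The distance function (mean squared error), the per-word normalization, and the weight lambda_align are assumptions for illustration only.

        import numpy as np

        def alignment_loss(attn: np.ndarray, align: np.ndarray) -> float:
            # attn: model attention weights, shape (target_len, source_len)
            # align: 0/1 "true" alignment matrix of the same shape (assumed format)
            # Normalize the reference alignment into a distribution per target word,
            # then measure the squared distance to the attention weights.
            ref = align / np.maximum(align.sum(axis=1, keepdims=True), 1e-9)
            return float(np.mean((attn - ref) ** 2))

        # Hypothetical combined objective:
        # total_loss = translation_loss + lambda_align * alignment_loss(attn, align)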

    Statistical Translation Model Based On Source Syntax Structure


    Research on Feature Extraction of Indicator Card Data for Sucker-Rod Pump Working Condition Diagnosis

    Three feature extraction methods for sucker-rod pump indicator card data are studied, simulated, and compared in this paper, based on Fourier Descriptors (FD), Geometric Moment Vector (GMV), and Gray Level Matrix Statistics (GLMX), respectively. Numerical experiments show that the Fourier Descriptors algorithm requires less running time and memory, with possible loss of information when a nonoptimal number of descriptors is used; the Geometric Moment Vector algorithm is more time-consuming and requires more memory; and the Gray Level Matrix Statistics algorithm provides low-dimensional feature vectors at the cost of more time and memory. Furthermore, the rotational invariance of both the Fourier Descriptors and the Geometric Moment Vector algorithms may lead to improper pattern recognition of indicator card data when used for sucker-rod pump working condition diagnosis.
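    A minimal sketch of Fourier Descriptor extraction for a closed indicator-card curve, assuming the card is given as displacement/load samples; the specific normalization (dropping the DC term, dividing by the first harmonic, taking magnitudes) is one common choice and is not taken from the paper.

        import numpy as np

        def fourier_descriptors(displacement, load, k=10):
            # Treat the indicator-card contour as points in the complex plane.
            z = np.asarray(displacement, dtype=float) + 1j * np.asarray(load, dtype=float)
            coeffs = np.fft.fft(z)
            # Drop the DC term for translation invariance and divide by the first
            # harmonic's magnitude for scale invariance; taking magnitudes discards
            # phase, which is one way the rotational invariance noted above arises.
            desc = np.abs(coeffs[1:k + 1]) / (np.abs(coeffs[1]) + 1e-12)
            return desc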

    Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation

    Recently, CKY-based models have shown great potential in unsupervised grammar induction thanks to their human-like encoding paradigm, which runs recursively and hierarchically but requires O(n^3) time complexity. Recursive Transformer based on Differentiable Trees (R2D2) makes it possible to scale to large language model pre-training even with a complex tree encoder by introducing a heuristic pruning method. However, the rule-based pruning approach suffers from local optima and slow inference. In this paper, we fix those issues in a unified method. We propose to use a top-down parser as a model-based pruning method, which also enables parallel encoding during inference. Specifically, our parser casts parsing as a split point scoring task: it first scores all split points for a given sentence and then recursively splits a span into two by picking the split point with the highest score in the current span. The reverse order of the splits is taken as the pruning order in the R2D2 encoder. Besides the bi-directional language model loss, we also optimize the parser by minimizing the KL divergence between tree probabilities from the parser and R2D2. Our experiments show that Fast-R2D2 improves performance significantly in grammar induction and achieves competitive results in downstream classification tasks. Comment: EMNLP 202
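    The top-down split-point procedure described above can be sketched as follows; the interface (a precomputed list of split scores and a simple greedy recursion) is an assumption for illustration and not the authors' implementation.

        def topdown_parse(scores, left, right, order=None):
            # scores[s] is a hypothetical score for splitting between tokens s-1 and s
            # (index 0 unused). Recursively pick the best split inside [left, right);
            # the reversed list of chosen splits would then drive pruning in the encoder.
            if order is None:
                order = []
            if right - left <= 1:
                return order
            split = max(range(left + 1, right), key=lambda s: scores[s])
            order.append(split)
            topdown_parse(scores, left, split, order)
            topdown_parse(scores, split, right, order)
            return order

        # pruning_order = list(reversed(topdown_parse(split_scores, 0, sentence_len)))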

    Inconsistent dialogue responses and how to recover from them

    One critical issue for chat systems is to stay consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster the utterance consistency of chat systems. A dataset is first developed for studying inconsistencies, in which inconsistent dialogue responses, explanations of the inconsistencies, and recovery utterances are authored by annotators. This covers the life span of inconsistencies, namely introduction, understanding, and resolution. Building on this, we introduce a set of tasks centered on dialogue consistency, specifically focused on its detection and resolution. Our experimental findings indicate that our dataset significantly helps progress in identifying and resolving conversational inconsistencies, and that current popular large language models such as ChatGPT, while good at resolving inconsistencies, still struggle with detection. Comment: Accepted in EACL 2024. Code and dataset available at https://github.com/mianzhang/CIDE

    Discover, Explanation, Improvement: Automatic Slice Detection Framework for Natural Language Processing

    Current natural language processing (NLP) models such as BERT and RoBERTa have achieved high overall performance, but they often make systematic errors due to bias or features that are difficult to learn. Thus, research on slice detection models (SDM), which automatically identify underperforming groups of datapoints, has gradually attracted more attention, aiming both at understanding model behaviors and at providing insights for future model training and design. However, there is little systematic research on SDM and quantitative evaluation of its assessment of NLP models. Our paper fills this gap by proposing the "Discover, Explanation, Improvement" framework, which discovers coherent and underperforming groups of datapoints and unites the datapoints of each slice under human-understandable concepts; it also provides comprehensive evaluation tasks and the corresponding quantitative metrics, which enable convenient comparison for future work. Results show that our framework can accurately select error-prone datapoints with informative semantic features that summarize error patterns, based on which it directly boosts model performance by an average of 2.85 points across multiple datasets without tuning any parameters of the trained models. Comment: 15 pages, 5 figure
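    The core "discover" step (surfacing underperforming groups) can be illustrated with a small sketch. Here slice_fn, which maps an example to a human-readable concept, stands in for the framework's actual slice assignment and is purely a placeholder; the ranking by per-slice accuracy is an assumption about how error-prone slices would be surfaced.

        from collections import defaultdict

        def find_slices(examples, predictions, labels, slice_fn, min_size=20):
            # Group examples by a hypothetical concept label and record correctness.
            groups = defaultdict(list)
            for ex, pred, gold in zip(examples, predictions, labels):
                groups[slice_fn(ex)].append(pred == gold)
            # Keep slices that are large enough, then rank by accuracy so the most
            # error-prone slices come first.
            slices = {c: sum(hits) / len(hits) for c, hits in groups.items() if len(hits) >= min_size}
            return sorted(slices.items(), key=lambda kv: kv[1])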

    Collaborative decoding of critical tokens for boosting factuality of large language models

    The most common training pipeline for large language models includes pretraining, finetuning, and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model. Finetuned and aligned models show improved instruction following and safer generation; however, their ability to stay factual about the world is impacted by the finetuning process. Furthermore, the common practice of sampling during generation also increases the chance of hallucination. In this work, we introduce a collaborative decoding framework that harnesses the high factuality of pretrained models through the concept of critical tokens. We first design a critical token classifier to decide which model to use for the next token, and then generate the next token using different decoding strategies. Experiments with different models and datasets show that our decoding framework is able to reduce model hallucination significantly, showcasing the importance of the collaborative decoding framework. Comment: work in progress
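    A minimal sketch of the decoding loop this abstract describes; the two model objects, their greedy_next/sample_next methods, and the classifier interface are all hypothetical names for illustration, not the paper's API.

        def collaborative_decode(pretrained_lm, aligned_lm, classifier, prompt, max_new_tokens=64):
            # classifier(tokens) is assumed to return True when the next token is
            # "critical" and should come from the pretrained model (decoded greedily
            # here), and False when the aligned model may sample freely.
            tokens = list(prompt)
            for _ in range(max_new_tokens):
                if classifier(tokens):
                    next_token = pretrained_lm.greedy_next(tokens)
                else:
                    next_token = aligned_lm.sample_next(tokens)
                tokens.append(next_token)
            return tokens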