
    Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

    Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high-quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-shelf attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% $F_{0.5}$ score. When attempting to determine if a given sentence is synthetic, a human annotator at best achieves 39.39 $F_1$ score, indicating that our model generates mostly human-like instances. Comment: Accepted as a short paper at EMNLP 201

    Mask-Predict: Parallel Decoding of Conditional Masked Language Models

    Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively, and then repeatedly mask out and regenerate the subset of words that the model is least confident about. By applying this strategy for a constant number of iterations, our model improves state-of-the-art performance levels for non-autoregressive and parallel decoding translation models by over 4 BLEU on average. It is also able to reach within about 1 BLEU point of a typical left-to-right transformer model, while decoding significantly faster. Comment: EMNLP 201
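The iterative decoding loop described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `predict` callable, the `toy_predict` stand-in, and the linearly decaying re-masking schedule are all assumptions made here for the example, since the abstract only states that the least confident words are re-masked over a constant number of iterations.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a masked target position

# Hypothetical model interface: given the source and a partially masked target,
# return one (token, confidence) pair per target position.
Predictor = Callable[[List[str], List[Optional[str]]], List[Tuple[str, float]]]


def mask_predict(source: List[str], length: int, predict: Predictor,
                 iterations: int = 4) -> List[Optional[str]]:
    tokens: List[Optional[str]] = [MASK] * length  # start fully masked
    scores = [0.0] * length
    for t in range(iterations):
        # Predict every position in parallel, but only fill the masked ones.
        proposals = predict(source, tokens)
        for i, (tok, conf) in enumerate(proposals):
            if tokens[i] is MASK:
                tokens[i], scores[i] = tok, conf
        # Assumed schedule: the number of re-masked positions decays linearly.
        n_mask = length * (iterations - 1 - t) // iterations
        if n_mask == 0:
            break
        # Re-mask the positions the model is least confident about.
        worst = sorted(range(length), key=lambda i: scores[i])[:n_mask]
        for i in worst:
            tokens[i] = MASK
    return tokens


def toy_predict(source, partial):
    # Stand-in for a trained model: "translates" by upper-casing the source and
    # grows more confident as more target context becomes visible.
    visible = sum(tok is not MASK for tok in partial)
    conf = (visible + 1) / (len(partial) + 1)
    return [(word.upper(), conf) for word in source]


if __name__ == "__main__":
    print(mask_predict(["a", "b", "c", "d"], 4, toy_predict, iterations=4))
```

Because the number of iterations is fixed, the cost of decoding does not grow with the target length in the way left-to-right generation does; each iteration is one parallel prediction over all positions.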

    Symbolic stochastic dynamical systems viewed as binary N-step Markov chains

    A theory of systems with long-range correlations based on the consideration of binary N-step Markov chains is developed. In the model, the conditional probability that the i-th symbol in the chain equals zero (or unity) is a linear function of the number of unities among the preceding N symbols. The correlation and distribution functions, as well as the variance of the number of symbols in words of arbitrary length L, are obtained analytically and numerically. A self-similarity of the studied stochastic process is revealed and the similarity group transformation of the chain parameters is presented. The diffusion Fokker-Planck equation governing the distribution function of the L-words is explored. If the persistent correlations are not extremely strong, the distribution function is shown to be Gaussian, with a variance that depends nonlinearly on L. The applicability of the developed theory to coarse-grained written and DNA texts is discussed. Comment: 14 pages, 13 figures
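The chain described in this abstract is straightforward to simulate. The sketch below is only an illustration under stated assumptions: the abstract says the conditional probability is linear in the number of unities among the preceding N symbols, but does not give the coefficients, so the parametrization P(1 | k) = 1/2 + mu (2k/N - 1) with a correlation strength `mu` in (-1/2, 1/2) is an assumption, as are the function names and the random initial window.

```python
import random
from collections import deque
from typing import List


def prob_one(k: int, n: int, mu: float) -> float:
    # Assumed linear form: probability of a 1 given k ones among the last n symbols.
    return 0.5 + mu * (2.0 * k / n - 1.0)


def generate_chain(length: int, n: int, mu: float, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Assumption: the first n symbols of memory are drawn uniformly at random.
    window = deque((rng.randint(0, 1) for _ in range(n)), maxlen=n)
    ones = sum(window)
    chain = []
    for _ in range(length):
        symbol = 1 if rng.random() < prob_one(ones, n, mu) else 0
        ones += symbol - window[0]  # oldest symbol leaves the memory window
        window.append(symbol)       # maxlen drops the oldest symbol automatically
        chain.append(symbol)
    return chain


def word_count_variance(chain: List[int], word_len: int) -> float:
    # Variance of the number of ones over non-overlapping words of length word_len.
    counts = [sum(chain[i:i + word_len])
              for i in range(0, len(chain) - word_len + 1, word_len)]
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)


if __name__ == "__main__":
    chain = generate_chain(100_000, n=20, mu=0.4)
    for word_len in (10, 100, 1000):
        print(word_len, word_count_variance(chain, word_len))
```

With `mu = 0` the symbols are independent and the variance grows linearly with the word length; a positive `mu` makes the chain persistent, which is the regime in which the abstract reports a nonlinear dependence of the variance on L.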