
    TeaForN: Teacher-Forcing with N-grams

    Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword. Comment: to be published in EMNLP 202

    CausalLM is not optimal for in-context learning

    Recent empirical evidence indicates that transformer-based in-context learning performs better when using a prefix language model (prefixLM), in which in-context samples can all attend to each other, compared to causal language models (causalLM), which use auto-regressive attention that prohibits in-context samples from attending to future samples. While this result is intuitive, it is not understood from a theoretical perspective. In this paper we take a theoretical approach and analyze the convergence behavior of prefixLM and causalLM under a certain parameter construction. Our analysis shows that both LM types converge to their stationary points at a linear rate, but that while prefixLM converges to the optimal solution of linear regression, causalLM convergence dynamics follow those of an online gradient descent algorithm, which is not guaranteed to be optimal even as the number of samples grows infinitely. We supplement our theoretical claims with empirical experiments over synthetic and real tasks and using various types of transformers. Our experiments verify that causalLM consistently underperforms prefixLM in all settings.
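
The attention difference described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `causal_mask` and `prefix_mask` and the parameter `prefix_len` are hypothetical, and the masks are plain nested lists rather than tensors from any particular library.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    Causal (auto-regressive) attention: each position sees itself and
    earlier positions only.
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def prefix_mask(n, prefix_len):
    """Prefix-LM attention (illustrative assumption, not the paper's code).

    The first `prefix_len` positions (the in-context samples) can all
    attend to each other; positions after the prefix remain causal.
    """
    return [[(j < prefix_len) or (j <= i) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Print both masks for a length-4 sequence with a 3-position prefix.
    for name, mask in (("causal", causal_mask(4)), ("prefix", prefix_mask(4, 3))):
        print(name)
        for row in mask:
            print(" ".join("1" if allowed else "0" for allowed in row))
```

Under the causal mask the first in-context sample never sees later ones, whereas under the prefix mask every sample inside the prefix sees the full prefix.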

    Antioxidant generation during coffee roasting : a comparison and interpretation from three complementary assays

    Coffee is a major source of dietary antioxidants; some are present in the green bean, whereas others are generated during roasting. However, there is no single accepted analytical method for their routine determination. This paper describes the adaptation of three complementary assays (Folin-Ciocalteu (FC), ABTS and ORAC) for the routine assessment of antioxidant capacity of beverages, their validation, and use for determining the antioxidant capacities of extracts from coffee beans at different stages in the roasting process. All assays showed a progressive increase in antioxidant capacity during roasting to a light roast state, consistent with the production of melanoidins having a higher antioxidant effect than the degradation of CGAs. However, the three assays gave different numbers for the total antioxidant capacity of green beans relative to gallic acid (GA), although the range of values was much smaller when chlorogenic acid (CGA) was used as reference. Therefore, although all three assays indicated that there was an increase in antioxidant activity during coffee roasting, the large differences in responses to GA and CGA illustrate their different sensitivities to different types of antioxidant molecule.

    PreSTU: Pre-Training for Scene-Text Understanding

    The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability in their training objective. In this paper, we propose PreSTU, a novel pre-training recipe dedicated to scene-text understanding (STU). PreSTU introduces OCR-aware pre-training objectives that encourage the model to recognize text from an image and connect it to the rest of the image content. We implement PreSTU using a simple transformer-based encoder-decoder architecture, combined with large-scale image-text datasets with scene text obtained from an off-the-shelf OCR system. We empirically demonstrate the effectiveness of this pre-training approach on eight visual question answering and four image captioning benchmarks. Comment: Accepted to ICCV 202