TeaForN: Teacher-Forcing with N-grams
Sequence generation models trained with teacher-forcing suffer from issues
related to exposure bias and lack of differentiability across timesteps. Our
proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these
problems directly, through the use of a stack of N decoders trained to decode
along a secondary time axis that allows model parameter updates based on N
prediction steps. TeaForN can be used with a wide class of decoder
architectures and requires minimal modifications from a standard
teacher-forcing setup. Empirically, we show that TeaForN boosts generation
quality on one Machine Translation benchmark, WMT 2014 English-French, and two
News Summarization benchmarks, CNN/Dailymail and Gigaword.
Comment: to be published in EMNLP 2020
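The N-step idea in the abstract can be made concrete with a small sketch. This is a minimal NumPy illustration under an assumption, not the authors' implementation: each of the N stacked decoders emits logits for the token one additional step ahead, so decoder k is scored against targets shifted k steps into the future.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean softmax cross-entropy over positions."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def teaforn_loss(logits_stack, targets):
    """Average the per-decoder losses, where decoder k (0-based)
    predicts k steps further ahead than standard teacher forcing,
    so a parameter update reflects N prediction steps at once."""
    total = 0.0
    for k, logits in enumerate(logits_stack):
        shifted = targets[k:]                    # targets for offset k
        total += cross_entropy(logits[:len(shifted)], shifted)
    return total / len(logits_stack)

# Toy example: 6 target tokens, vocabulary of 10, N = 3 stacked decoders.
rng = np.random.default_rng(0)
targets = rng.integers(0, 10, size=6)
logits_stack = [rng.standard_normal((6, 10)) for _ in range(3)]
loss = teaforn_loss(logits_stack, targets)
```

With N = 1 this reduces to ordinary teacher forcing, which is consistent with the claim that only minimal modifications to a standard setup are needed.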
CausalLM is not optimal for in-context learning
Recent empirical evidence indicates that transformer-based in-context
learning performs better with a prefix language model (prefixLM), in which
in-context samples can all attend to each other, than with a causal language
model (causalLM), which uses auto-regressive attention that prevents
in-context samples from attending to future samples. While this result is intuitive,
it is not understood from a theoretical perspective. In this paper we take a
theoretical approach and analyze the convergence behavior of prefixLM and
causalLM under a certain parameter construction. Our analysis shows that both
LM types converge to their stationary points at a linear rate, but that while
prefixLM converges to the optimal solution of linear regression, the
convergence dynamics of causalLM follow those of an online gradient descent
algorithm, which is not guaranteed to be optimal even as the number of
samples grows to infinity. We supplement our theoretical claims with empirical experiments
over synthetic and real tasks and using various types of transformers. Our
experiments verify that causalLM consistently underperforms prefixLM in all
settings.
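The attention-pattern difference at stake can be shown with a small mask construction. The following is a minimal NumPy sketch, not tied to any particular transformer library: a boolean matrix where entry (i, j) says whether position i may attend to position j.

```python
import numpy as np

def causal_mask(seq_len):
    """causalLM: position i may attend only to positions j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_mask(seq_len, prefix_len):
    """prefixLM: the prefix (holding the in-context samples) attends
    bidirectionally; positions after the prefix remain causal."""
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True
    return mask

# 6 positions, the first 4 holding in-context samples.
causal = causal_mask(6)
prefix = prefix_mask(6, 4)
```

Under the causal mask an early in-context sample never sees later samples, which is exactly the restriction the analysis links to online-gradient-descent-like dynamics.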
Antioxidant generation during coffee roasting: a comparison and interpretation from three complementary assays
Coffee is a major source of dietary antioxidants; some are present in the green bean, whereas others are generated during roasting. However, there is no single accepted analytical method for their routine determination. This paper describes the adaptation of three complementary assays (Folin-Ciocalteu (FC), ABTS and ORAC) for the routine assessment of the antioxidant capacity of beverages, their validation, and their use for determining the antioxidant capacities of extracts from coffee beans at different stages of the roasting process. All assays showed a progressive increase in antioxidant capacity during roasting up to a light roast, consistent with the antioxidant contribution of newly formed melanoidins outweighing the degradation of chlorogenic acids (CGAs). However, the three assays gave different values for the total antioxidant capacity of green beans relative to gallic acid (GA), although the spread was much smaller when CGA was used as the reference. Therefore, all three assays indicated an increase in antioxidant activity during coffee roasting, and their markedly different responses to GA and CGA illustrate their different sensitivities to different types of antioxidant molecule.
PreSTU: Pre-Training for Scene-Text Understanding
The ability to recognize and reason about text embedded in visual inputs is
often lacking in vision-and-language (V&L) models, perhaps because V&L
pre-training methods have rarely included such an ability in their
training objectives. In this paper, we propose PreSTU, a novel pre-training
recipe dedicated to scene-text understanding (STU). PreSTU introduces OCR-aware
pre-training objectives that encourage the model to recognize text from an
image and connect it to the rest of the image content. We implement PreSTU
using a simple transformer-based encoder-decoder architecture, combined with
large-scale image-text datasets with scene text obtained from an off-the-shelf
OCR system. We empirically demonstrate the effectiveness of this pre-training
approach on eight visual question answering and four image captioning
benchmarks.
Comment: Accepted to ICCV 2023
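One plausible shape for an OCR-aware objective of this kind is to condition on part of the scene text and train the decoder to generate the rest. The sketch below is illustrative only; the split-point heuristic and the helper name are assumptions, not the paper's exact recipe.

```python
import random

def make_ocr_split_example(ocr_tokens, rng):
    """Hypothetical target construction: a random-length chunk of the
    OCR'd scene text is appended to the encoder input, and the decoder
    is trained to generate the remaining OCR tokens, encouraging the
    model to read text out of the image itself."""
    split = rng.randrange(len(ocr_tokens))   # leaves at least one token to predict
    encoder_suffix = ocr_tokens[:split]      # given to the model alongside the image
    decoder_target = ocr_tokens[split:]      # to be generated
    return encoder_suffix, decoder_target

rng = random.Random(0)
tokens = ["OPEN", "24", "HOURS", "PHARMACY"]
given, target = make_ocr_split_example(tokens, rng)
```

Because the given chunk and the target always reassemble into the full OCR string, every training example ties generated text back to text actually present in the image.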