A Theme-Rewriting Approach for Generating Algebra Word Problems
Texts present coherent stories that have a particular theme or overall
setting, for example, science fiction or western. In this paper, we present a
text generation method called rewriting that edits existing
human-authored narratives to change their theme without changing the underlying
story. We apply the approach to math word problems, where it might help
students stay more engaged by quickly transforming all of their homework
assignments to the theme of their favorite movie without changing the math
concepts that are being taught. Our rewriting method uses a two-stage decoding
process, which proposes new words from the target theme and scores the
resulting stories according to a number of factors defining aspects of
syntactic, semantic, and thematic coherence. Experiments demonstrate that the
final stories typically represent the new theme well while still testing the
original math concepts, outperforming a number of baselines. We also release a
new dataset of human-authored rewrites of math word problems in several themes.
Comment: To appear EMNLP 201
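As a concrete illustration of the two-stage decoding described above, here is a minimal, hypothetical propose-and-score sketch in Python. The theme lexicon, scoring terms, and weights are placeholders for illustration, not the authors' implementation.

```python
from itertools import product

# Toy theme lexicon mapping original words to themed substitutes (illustrative only).
THEME_VOCAB = {"train": "starship", "apples": "crystals"}

def propose(tokens):
    """Stage 1: for each token, propose themed substitutes (keep the original as a fallback)."""
    return [[tok] + ([THEME_VOCAB[tok]] if tok in THEME_VOCAB else []) for tok in tokens]

def score(candidate, weights=(1.0, 1.0, 1.0)):
    """Stage 2: combine coherence signals; a real system would use an LM, embeddings, etc."""
    syntactic = 1.0                                        # placeholder for fluency under an LM
    semantic = 1.0                                         # placeholder for similarity to the source story
    thematic = sum(tok in THEME_VOCAB.values() for tok in candidate)  # theme-word coverage
    w_syn, w_sem, w_thm = weights
    return w_syn * syntactic + w_sem * semantic + w_thm * thematic

def rewrite(sentence):
    tokens = sentence.split()
    candidates = (list(c) for c in product(*propose(tokens)))
    return " ".join(max(candidates, key=score))

print(rewrite("the train carries 12 apples"))  # -> "the starship carries 12 crystals"
```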
Learning Answer Generation using Supervision from Automatic Question Answering Evaluators
Recent studies show that sentence-level extractive QA, i.e., based on Answer
Sentence Selection (AS2), is outperformed by Generation-based QA (GenQA)
models, which generate answers using the top-k answer sentences ranked by AS2
models (in the style of retrieval-augmented generation). In this paper, we propose a
novel training paradigm for GenQA using supervision from automatic QA
evaluation models (GAVA). Specifically, we propose three strategies to transfer
knowledge from these QA evaluation models to a GenQA model: (i) augmenting
training data with answers generated by the GenQA model and labelled by GAVA
statically, before training; (ii) doing the same dynamically, at every training
epoch; and (iii) using the GAVA score to weight the generator loss during
the learning of the GenQA model. We evaluate our proposed methods on two
academic and one industrial dataset, obtaining a significant improvement in
answering accuracy over the previous state of the art.
Comment: Accepted at ACL 202
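A minimal sketch of strategy (iii), weighting the per-example generation loss by an automatic QA-evaluator score, is shown below. Tensor shapes and the function name are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def gava_weighted_loss(logits, target_ids, gava_scores, pad_id=0):
    """Weight each example's generation loss by its QA-evaluator (GAVA) score in [0, 1].

    logits: (batch, seq_len, vocab); target_ids: (batch, seq_len); gava_scores: (batch,).
    """
    token_loss = F.cross_entropy(
        logits.transpose(1, 2),          # cross_entropy expects (batch, vocab, seq_len)
        target_ids,
        ignore_index=pad_id,
        reduction="none",
    )                                    # (batch, seq_len) per-token loss
    mask = (target_ids != pad_id).float()
    per_example = (token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    # Down-weight examples whose generated answers the evaluator considers poor.
    return (gava_scores * per_example).mean()

# Example call with random tensors, just to show the expected shapes.
loss = gava_weighted_loss(torch.randn(4, 10, 32000), torch.randint(1, 32000, (4, 10)), torch.rand(4))
```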
Pyramidal Recurrent Unit for Language Modeling
LSTMs are powerful tools for modeling contextual information, as evidenced by
their success at the task of language modeling. However, modeling contexts in
very high-dimensional spaces can lead to poor generalizability. We introduce the
Pyramidal Recurrent Unit (PRU), which enables learning representations in
high-dimensional space with more generalization power and fewer parameters. PRUs
replace the linear transformation in LSTMs with more sophisticated interactions
including pyramidal and grouped linear transformations. This architecture gives
strong results on word-level language modeling while reducing the number of
parameters significantly. In particular, PRU improves the perplexity of a
recent state-of-the-art language model (Merity et al., 2018) by up to 1.3 points
while learning 15-20% fewer parameters. For a similar number of model parameters,
PRU outperforms all previous RNN models that exploit different gating
mechanisms and transformations. We provide a detailed examination of the PRU
and its behavior on language modeling tasks. Our code is open-source and
available at https://sacmehta.github.io/PRU/
Comment: Accepted as a long paper in EMNLP 201
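One of the building blocks the abstract mentions is a grouped linear transformation. The sketch below is a hypothetical PyTorch version (module name and dimensions assumed), not the released PRU code; see the project page above for the authors' implementation.

```python
import torch
import torch.nn as nn

class GroupedLinear(nn.Module):
    """Split features into `groups` and apply an independent linear map to each group,
    using roughly 1/groups as many parameters as a dense linear layer."""
    def __init__(self, in_features, out_features, groups=4):
        super().__init__()
        assert in_features % groups == 0 and out_features % groups == 0
        self.groups = groups
        self.weight = nn.Parameter(
            torch.randn(groups, in_features // groups, out_features // groups) * 0.02
        )

    def forward(self, x):                       # x: (batch, in_features)
        b = x.size(0)
        x = x.view(b, self.groups, -1)          # (batch, groups, in_features // groups)
        y = torch.einsum("bgi,gio->bgo", x, self.weight)
        return y.reshape(b, -1)                 # (batch, out_features)

x = torch.randn(8, 256)
print(GroupedLinear(256, 256)(x).shape)         # torch.Size([8, 256])
```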
BizBench: A Quantitative Reasoning Benchmark for Business and Finance
As large language models (LLMs) impact a growing number of complex domains,
it is becoming increasingly important to have fair, accurate, and rigorous
evaluation benchmarks. Evaluating the reasoning skills required for business
and financial NLP stands out as a particularly difficult challenge. We
introduce BizBench, a new benchmark for evaluating models' ability to reason
about realistic financial problems. BizBench comprises 8 quantitative reasoning
tasks. Notably, BizBench targets the complex task of question-answering (QA)
for structured and unstructured financial data via program synthesis (i.e.,
code generation). We introduce three diverse financially themed code-generation
tasks from newly collected and augmented QA data. Additionally, we isolate
distinct financial reasoning capabilities required to solve these QA tasks:
reading comprehension of financial text and tables, which is required to
extract correct intermediate values; and understanding domain knowledge (e.g.,
financial formulas) needed to calculate complex solutions. Collectively, these
tasks evaluate a model's financial background knowledge, ability to extract
numeric entities from financial documents, and capacity to solve problems with
code. We conduct an in-depth evaluation of open-source and commercial LLMs,
illustrating that BizBench is a challenging benchmark for quantitative
reasoning in the finance and business domain.
Comment: Work in progress
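To illustrate program-synthesis QA as described above, here is a hypothetical scoring sketch that executes a model-generated snippet and checks its numeric answer. The question, generated program, and tolerance are invented examples, not BizBench's evaluation harness.

```python
import math

# Invented example: a question, a (supposedly model-generated) program, and a gold answer.
question = "Revenue was $120M in 2021 and $150M in 2022. What was the year-over-year growth rate?"
generated_program = """
revenue_2021 = 120.0
revenue_2022 = 150.0
answer = (revenue_2022 - revenue_2021) / revenue_2021
"""
gold_answer = 0.25

namespace = {}
exec(generated_program, namespace)   # run the generated code; a real harness would sandbox this
predicted = namespace["answer"]
correct = math.isclose(predicted, gold_answer, rel_tol=1e-4)
print(f"predicted={predicted:.4f} correct={correct}")
```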