Transformers as Soft Reasoners over Language
Beginning with McCarthy's Advice Taker (1959), AI has pursued the goal of
providing a system with explicit, general knowledge and having the system
reason over that knowledge. However, expressing the knowledge in a formal
(logical or probabilistic) representation has been a major obstacle to this
research. This paper investigates a modern approach to this problem where the
facts and rules are provided as natural language sentences, thus bypassing a
formal representation. We train transformers to reason (or emulate reasoning)
over these sentences using synthetically generated data. Our models, which we
call RuleTakers, provide the first empirical demonstration that this kind of
soft reasoning over language is learnable, can achieve high (99%) accuracy, and
generalizes to test data requiring substantially deeper chaining than seen
during training (95%+ scores). We also demonstrate that the models transfer
well to two hand-authored rulebases, and to rulebases paraphrased into more
natural language. These findings are significant as they suggest a new role for
transformers, namely as limited "soft theorem provers" operating over explicit
theories in language. This in turn opens new possibilities for explainability,
correctability, and counterfactual reasoning in question-answering.
Comment: IJCAI 2020
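To make the input/output format concrete, here is a minimal sketch of RuleTaker-style soft reasoning with the Hugging Face transformers API. The checkpoint name and label mapping are assumptions for illustration, not the authors' released model:

```python
# Minimal sketch of RuleTaker-style soft reasoning: the theory (facts and
# rules) is plain English, and a transformer classifies whether a query
# follows from it. Checkpoint name and label mapping are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

theory = (
    "Harry is cold. Harry is young. "
    "If someone is cold then they are sad. "
    "If someone is sad and young then they are quiet."
)
query = "Harry is quiet."  # requires two steps of rule chaining

model_name = "your-org/ruletaker-classifier"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer(theory, query, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Assumed mapping: label 1 = "query follows from the theory".
print("entailed" if logits.argmax(-1).item() == 1 else "not entailed")
```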
Reasoning over Description Logic-based Contexts with Transformers
One way that the current state of the art measures the reasoning ability of
transformer-based models is by evaluating accuracy in downstream tasks like
logical question answering or proof generation over synthetic contexts
expressed in natural language. However, most of the contexts used are in
practice very simple; in most cases, they are generated from short first-order
logic sentences with only a few logical operators and quantifiers. In this
work, we seek to answer the question of how well a transformer-based model
performs reasoning over expressive contexts. For this purpose, we construct a
synthetic natural language question-answering dataset, generated by description
logic knowledge bases. For the generation of the knowledge bases, we use the
expressive language ALCQ. The resulting dataset contains 384K examples and
scales along two dimensions: i) reasoning depth, and ii) sentence length. We
show that the performance of our DeBERTa-based model, DELTA, is only marginally
affected when reasoning depth increases, and is not affected at all by
increasing sentence length. We also
evaluate the generalization ability of the model on reasoning depths unseen
during training, both deeper and shallower than those trained on, revealing
interesting insights into the model's adaptive generalization abilities.
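To make the generation setup concrete, here is a toy sketch of how description-logic-style axioms can be verbalized into natural language context sentences. The axiom encoding and templates are illustrative assumptions, not the paper's generator, which covers a much richer fragment:

```python
# Toy verbalizer: turn simple description-logic-style axioms into English
# context sentences. Axiom encoding and templates are illustrative only;
# a real generator covers a far richer fragment (e.g. qualified number
# restrictions, negation, nested concepts).
def verbalize(axiom: tuple) -> str:
    kind = axiom[0]
    if kind == "subclass":            # C is subsumed by D
        _, c, d = axiom
        return f"Every {c} is a {d}."
    if kind == "exists":              # existential restriction on role r
        _, c, r, d = axiom
        return f"Every {c} {r} some {d}."
    if kind == "min":                 # qualified number restriction
        _, c, n, r, d = axiom
        return f"Every {c} {r} at least {n} {d}s."
    raise ValueError(f"unknown axiom kind: {kind}")

kb = [
    ("subclass", "cat", "mammal"),
    ("exists", "mammal", "eats", "food"),
    ("min", "cat", 2, "likes", "toy"),
]
context = " ".join(verbalize(a) for a in kb)
print(context)
```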
Braid: Weaving Symbolic and Neural Knowledge into Coherent Logical Explanations
Traditional symbolic reasoning engines, while attractive for their precision
and explicability, have a few major drawbacks: the use of brittle inference
procedures that rely on exact matching (unification) of logical terms, an
inability to deal with uncertainty, and the need for a precompiled rule-base of
knowledge (the "knowledge acquisition" problem). To address these issues, we
devise a novel logical reasoner called Braid, which supports probabilistic
rules and uses custom unification functions and dynamic rule generation to
overcome the brittle-matching and knowledge-gap problems prevalent in
traditional reasoners. In this paper, we describe the reasoning algorithms
used in Braid, and their implementation in a distributed task-based framework
that builds proof/explanation graphs for an input query. We use a simple QA
example from a children's story to motivate Braid's design and explain how the
various components work together to produce a coherent logical explanation.
Finally, we evaluate Braid on the ROC Story Cloze test and achieve close to
state-of-the-art results while providing frame-based explanations.
Comment: Accepted at AAAI-2022
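A rough sketch of the custom-unification idea: exact term matching is replaced by a pluggable similarity test. The string-similarity matcher below is a stand-in assumption; the abstract does not specify Braid's actual unification functions, which could equally be embedding-based:

```python
# Sketch of soft unification: exact matching is replaced by a pluggable
# similarity test, so near-synonymous terms can still unify. difflib is
# a stand-in for whatever matcher (embeddings, KB lookups, generated
# rules) a real system would plug in here.
from difflib import SequenceMatcher

def soft_unify(term_a: str, term_b: str, threshold: float = 0.8) -> bool:
    if term_a == term_b:                     # classical unification case
        return True
    score = SequenceMatcher(None, term_a, term_b).ratio()
    return score >= threshold                # soft match above threshold

print(soft_unify("purchase", "purchases"))   # True: near-identical forms
print(soft_unify("purchase", "sell"))        # False: dissimilar terms
```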
Measuring Systematic Generalization in Neural Proof Generation with Transformers
We are interested in understanding how well Transformer language models
(TLMs) can perform reasoning tasks when trained on knowledge encoded in the
form of natural language. We investigate systematic generalization abilities on
an inductive logical reasoning task in natural language, which involves
reasoning over relationships between entities grounded in first-order logical
proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to
generate logical proofs represented in natural language. We systematically test
proof generation capabilities, along with inference capabilities leveraging the
generated proofs. We observe length-generalization issues in proof generation
and inference when evaluated on longer-than-trained sequences. However, we
observe TLMs improve their generalization performance after being exposed to
longer, exhaustive proofs. In addition, we discover that TLMs are able to
generalize better using backward-chaining proofs compared to their
forward-chaining counterparts, while they find it easier to generate forward
chaining proofs. We observe that models that are not trained to generate proofs
are better at generalizing to problems based on longer proofs. This result
suggests that Transformers have efficient, yet uninterpretable, internal
reasoning strategies. These results also highlight the systematic
generalization issues in TLMs in the context of logical reasoning, and we
believe this work will motivate deeper inspection of their underlying reasoning
strategies.
Comment: NeurIPS 2020; 17 pages; 9 figures; 6 tables
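For intuition about the backward-chaining proof strategy the paper finds generalizes better, here is a minimal backward chainer over relational triples. The fact/rule encoding is an illustrative assumption, not the paper's natural-language dataset format:

```python
# Minimal backward chainer: to prove a goal, match it against facts, or
# against a rule head and then recursively prove the rule's body.
# Toy encoding (uppercase strings are variables); no occurs-check or
# per-use variable renaming, which a full prover would need.
facts = {("alice", "parent", "bob"), ("bob", "parent", "carol")}
rules = [
    # (head, body): head holds if every body goal can be proven.
    (("X", "grandparent", "Z"), [("X", "parent", "Y"), ("Y", "parent", "Z")]),
]

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(a, b, env):
    """Extend env so triples a and b agree, or return None."""
    env = dict(env)
    for x, y in zip(a, b):
        x, y = env.get(x, x), env.get(y, y)
        if x == y:
            continue
        if is_var(x):
            env[x] = y
        elif is_var(y):
            env[y] = x
        else:
            return None
    return env

def prove(goal, env=None):
    """Yield variable bindings under which goal is provable."""
    env = {} if env is None else env
    for fact in facts:                 # base case: goal matches a fact
        e = unify(goal, fact, env)
        if e is not None:
            yield e
    for head, body in rules:           # recursive case: backward-chain
        e = unify(goal, head, env)
        if e is not None:
            yield from prove_all(body, e)

def prove_all(goals, env):
    if not goals:
        yield env
        return
    for e in prove(goals[0], env):
        yield from prove_all(goals[1:], e)

print(next(prove(("alice", "grandparent", "carol")), None) is not None)  # True
```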
OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models
In this paper, we conduct a thorough investigation into the reasoning
capabilities of Large Language Models (LLMs), focusing specifically on the Open
Pretrained Transformers (OPT) models as a representative of such models. Our
study entails finetuning three different sizes of OPT on a carefully curated
reasoning corpus, resulting in two sets of finetuned models: OPT-R, finetuned
without explanations, and OPT-RE, finetuned with explanations. We then evaluate
all models on 57 out-of-domain tasks drawn from the SUPER-NATURALINSTRUCTIONS
benchmark, covering 26 distinct reasoning skills, utilizing three prompting
techniques. Through a comprehensive grid of 27 configurations and 6,156 test
evaluations, we investigate the dimensions of finetuning, prompting, and scale
to understand the role of explanations on different reasoning skills. Our
findings reveal that including explanations in the few-shot exemplars has no
significant impact on performance when the model is finetuned,
while positively affecting the non-finetuned counterpart. Moreover, we observe
a slight yet consistent increase in classification accuracy as we incorporate
explanations during prompting and finetuning. Finally, we offer
insights on which skills benefit the most from incorporating explanations
during finetuning and prompting, such as Numerical (+20.4%) and Analogical
(+13.9%) reasoning, as well as skills that exhibit negligible or negative
effects.
Comment: Proceedings of the 1st Workshop on Natural Language Reasoning and
Structured Explanations (NLRSE) at ACL 2023
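To illustrate the prompting dimension being varied, here is a sketch of few-shot exemplars built with and without an explanation field. The exemplar wording is invented for illustration, not drawn from the paper's corpus:

```python
# Sketch of the two prompting conditions the paper compares: few-shot
# exemplars with and without an explanation field. Exemplar content is
# invented for illustration, not taken from the paper's corpus.
exemplar = {
    "question": "If a train leaves at 3pm and the trip takes 2 hours, "
                "when does it arrive?",
    "explanation": "Arrival time is departure plus duration: 3pm + 2h = 5pm.",
    "answer": "5pm",
}

def format_exemplar(ex: dict, with_explanation: bool) -> str:
    parts = [f"Question: {ex['question']}"]
    if with_explanation:
        parts.append(f"Explanation: {ex['explanation']}")
    parts.append(f"Answer: {ex['answer']}")
    return "\n".join(parts)

print(format_exemplar(exemplar, with_explanation=True))   # OPT-RE condition
print()
print(format_exemplar(exemplar, with_explanation=False))  # OPT-R condition
```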
DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models
In this paper, we present and implement a multi-dimensional, modular
framework for performing deep argument analysis (DeepA2) using current
pre-trained language models (PTLMs). ArgumentAnalyst -- a T5 model (Raffel et
al. 2020) set up and trained within DeepA2 -- reconstructs argumentative texts,
which advance an informal argumentation, as valid arguments: It inserts, e.g.,
missing premises and conclusions, formalizes inferences, and coherently links
the logical reconstruction to the source text. We create a synthetic corpus for
deep argument analysis, and evaluate ArgumentAnalyst on this new dataset as
well as on existing data, specifically EntailmentBank (Dalvi et al. 2021). Our
empirical findings vindicate the overall framework and highlight the advantages
of a modular design, in particular its ability to emulate established
heuristics (such as hermeneutic cycles), to explore the model's uncertainty, to
cope with the plurality of correct solutions (underdetermination), and to
exploit higher-order evidence.
Comment: A demo is available at
https://huggingface.co/spaces/debatelab/deepa2-demo, the model can be
downloaded from https://huggingface.co/debatelab/argument-analyst, and the
datasets can be accessed at https://huggingface.co/datasets/debatelab/aaa
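Since the comment links the released checkpoint, here is a minimal sketch of loading it with the standard transformers seq2seq API. The input text and generation settings are assumptions for illustration; the actual DeepA2 input keys and task prefixes are documented on the model card linked above:

```python
# Minimal sketch of loading the linked ArgumentAnalyst checkpoint (a T5
# model) with the Hugging Face transformers API. The raw-text input below
# is a simplifying assumption; consult the model card for the actual
# DeepA2 input format and task keys.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("debatelab/argument-analyst")
model = AutoModelForSeq2SeqLM.from_pretrained("debatelab/argument-analyst")

source_text = ("Socrates is human, and all humans are mortal. "
               "So Socrates is mortal.")
inputs = tokenizer(source_text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```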