Discourse-Aware Graph Networks for Textual Logical Reasoning
Textual logical reasoning, especially question-answering (QA) tasks with
logical reasoning, requires awareness of particular logical structures. The
passage-level logical relations represent entailment or contradiction between
propositional units (e.g., a concluding sentence). However, such structures are
unexplored as current QA systems focus on entity-based relations. In this work,
we propose logic structural-constraint modeling to solve logical reasoning
QA and introduce discourse-aware graph networks (DAGNs). The networks first
construct logic graphs leveraging in-line discourse connectives and generic
logic theories, then learn logic representations by end-to-end evolving the
logic relations with an edge-reasoning mechanism and updating the graph
features. This pipeline is applied to a general encoder, whose fundamental
features are joined with the high-level logic features for answer prediction.
Experiments on three textual logical reasoning datasets demonstrate the
reasonability of the logical structures built in DAGNs and the effectiveness of
the learned logic features. Moreover, zero-shot transfer results show the
features' generality to unseen logical texts.
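The graph-construction step described in the abstract above can be illustrated with a minimal sketch. It assumes a tiny hand-picked connective list and a simple chain graph, neither of which comes from the paper; `CONNECTIVES` and `build_logic_graph` are hypothetical names, and the learned edge-reasoning mechanism is not modeled here.

```python
import re

# Illustrative connective list (an assumption, not the paper's inventory).
CONNECTIVES = ["because", "therefore", "however", "so", "but"]


def build_logic_graph(passage):
    """Split a passage into text units and link adjacent units.

    Units are delimited by discourse connectives or by '.', ',' and ';'.
    Each edge is (source_index, target_index, label), where the label is
    the connective found between the two units, or "punct" if there is none.
    """
    pattern = re.compile(
        r"\b(" + "|".join(CONNECTIVES) + r")\b|[.,;]", re.IGNORECASE
    )
    nodes, edges = [], []
    start, label = 0, None
    for match in pattern.finditer(passage):
        chunk = passage[start:match.start()].strip()
        if chunk:
            if nodes:
                edges.append((len(nodes) - 1, len(nodes), label or "punct"))
            nodes.append(chunk)
            label = None
        if match.group(1):
            # Remember the connective so it can label the next edge.
            label = match.group(1).lower()
        start = match.end()
    tail = passage[start:].strip()
    if tail:
        if nodes:
            edges.append((len(nodes) - 1, len(nodes), label or "punct"))
        nodes.append(tail)
    return nodes, edges


nodes, edges = build_logic_graph(
    "The road is wet because it rained, so drivers slow down."
)
print(nodes)
print(edges)
```

In this toy version the edge labels are fixed by the surface connective; the abstract describes the relations as being evolved end-to-end during training, which a rule-based sketch cannot capture.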
REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement
When answering a question, people often draw upon their rich world knowledge
in addition to the particular context. While recent works retrieve supporting
facts/evidence from commonsense knowledge bases to supply additional
information to each question, there is still ample opportunity to improve
the quality of that evidence. This matters because evidence quality is
the key to answering commonsense questions and even determines the upper bound
on a QA system's performance. In this paper, we propose a recursive erasure
memory network (REM-Net) to improve evidence quality.
REM-Net is equipped with a module that refines the evidence by
recursively erasing the low-quality evidence that does not explain the question
answering. Besides, instead of retrieving evidence from existing knowledge
bases, REM-Net leverages a pre-trained generative model to generate candidate
evidence customized for the question. We conduct experiments on two commonsense
question answering datasets, WIQA and CosmosQA. The results demonstrate the
performance of REM-Net and show that the refined evidence is explainable.
Comment: Accepted by AAAI 202
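The idea of recursively erasing low-quality evidence can be sketched as a simple loop. This is only an illustration under stated assumptions: `overlap_score` is a hypothetical word-overlap stand-in for whatever learned scoring the model uses, and `refine_evidence`, `rounds`, and `erase_per_round` are names invented for this example.

```python
def overlap_score(question, evidence):
    """Hypothetical quality score: fraction of evidence words found in the question."""
    question_words = set(question.lower().split())
    evidence_words = set(evidence.lower().split())
    return len(question_words & evidence_words) / max(len(evidence_words), 1)


def refine_evidence(question, evidence, rounds=2, erase_per_round=1):
    """Drop the lowest-scoring evidence items over several rounds."""
    kept = list(evidence)
    for _ in range(rounds):
        if len(kept) <= erase_per_round:
            break  # never erase everything
        ranked = sorted(kept, key=lambda ev: overlap_score(question, ev))
        erased = set(ranked[:erase_per_round])
        kept = [ev for ev in kept if ev not in erased]
    return kept


candidates = [
    "plants need water to grow",
    "the moon orbits the earth",
    "without water plants wilt",
    "cars use fuel",
]
print(refine_evidence("what happens to plants without water", candidates))
```

Because the scorer here is fixed, the loop is equivalent to keeping the top-scoring items in one pass. The recursion would only matter in a setting where scores are re-estimated after each erasure, which this sketch does not attempt.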
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models
Automated theorem proving (ATP) has become an appealing domain for exploring
the reasoning ability of the recent successful generative language models.
However, current ATP benchmarks mainly focus on symbolic inference, but rarely
involve the understanding of complex number combination reasoning. In this
work, we propose TRIGO, an ATP benchmark that not only requires a model to
reduce a trigonometric expression with step-by-step proofs but also evaluates a
generative LM's reasoning ability on formulas and its capability to manipulate,
group, and factor number terms. We gather trigonometric expressions and their
reduced forms from the web, annotate the simplification process manually, and
translate it into the Lean formal language system. We then automatically
generate additional examples from the annotated samples to expand the dataset.
Furthermore, we develop an automatic generator based on Lean-Gym to create
dataset splits of varying difficulties and distributions in order to thoroughly
analyze the model's generalization ability. Our extensive experiments show that
TRIGO poses a new challenge for advanced generative LMs, including
GPT-4, which is pre-trained on a considerable amount of open-source formal
theorem-proving language data, and provides a new tool to study generative
LMs' ability in both formal and mathematical reasoning.
Comment: Accepted by EMNLP 2023. Code is available at
https://github.com/menik1126/TRIG
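To make "reducing a trigonometric expression with step-by-step proofs" concrete, here is a small worked reduction. The expression is an illustrative one chosen for this note and is not taken from the TRIGO dataset; it uses only the double-angle identity $\sin 2x = 2\sin x\cos x$ and the Pythagorean identity.

```latex
\begin{align*}
(\sin x + \cos x)^2 - \sin 2x
  &= \sin^2 x + 2\sin x\cos x + \cos^2 x - \sin 2x \\
  &= \sin^2 x + \cos^2 x + 2\sin x\cos x - 2\sin x\cos x \\
  &= \sin^2 x + \cos^2 x \\
  &= 1
\end{align*}
```

Each line corresponds to one kind of manipulation named in the abstract: expanding, substituting an identity, then grouping and cancelling terms.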
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Despite the success of large language models (LLMs), the task of theorem
proving still remains one of the hardest reasoning tasks that is far from being
fully solved. Prior methods using language models have demonstrated promising
results, but they still struggle to prove even middle school level theorems.
One common limitation of these methods is that they assume a fixed theorem
library during the whole theorem proving process. However, as we all know,
creating new useful theorems or even new theories is not only helpful but
crucial and necessary for advancing mathematics and proving harder and deeper
results. In this work, we present LEGO-Prover, which employs a growing skill
library containing verified lemmas as skills to augment the capability of LLMs
used in theorem proving. By constructing the proof modularly, LEGO-Prover
enables LLMs to utilize existing skills retrieved from the library and to
create new skills during the proving process. These skills are further evolved
(by prompting an LLM) to enrich the library on another scale. Modular and
reusable skills are constantly added to the library to enable tackling
increasingly intricate mathematical problems. Moreover, the learned library
further bridges the gap between human proofs and formal proofs by making it
easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass
rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%).
During the proving process, LEGO-Prover also manages to generate over 20,000
skills (theorems/lemmas) and adds them to the growing library. Our ablation
study indicates that these newly added skills are indeed helpful for proving
theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We
also release our code and all the generated skills.
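The notion of a growing skill library can be illustrated with a toy data structure. This is a sketch under loud assumptions: `SkillLibrary`, its methods, and the token-overlap retrieval are hypothetical and are not LEGO-Prover's actual code, and formal verification is represented only by a boolean flag supplied by the caller.

```python
class SkillLibrary:
    """Toy library that stores only verified lemmas and retrieves them by word overlap."""

    def __init__(self):
        self.skills = {}  # lemma name -> statement text

    def add(self, name, statement, verified):
        """Add a lemma if it was verified and is not already stored."""
        if not verified or name in self.skills:
            return False
        self.skills[name] = statement
        return True

    def retrieve(self, goal, k=2):
        """Return up to k lemma names whose statements share tokens with the goal."""
        goal_tokens = set(goal.lower().split())
        scored = []
        for name, statement in self.skills.items():
            overlap = len(goal_tokens & set(statement.lower().split()))
            if overlap:
                scored.append((overlap, name))
        # Highest overlap first; ties broken alphabetically by name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:k]]


library = SkillLibrary()
library.add("sq_nonneg", "x * x >= 0 for real x", verified=True)
library.add("bad_lemma", "1 = 2", verified=False)  # rejected: not verified
library.add("add_comm", "a + b = b + a", verified=True)
print(library.retrieve("show x * x + 1 > 0 for real x", k=1))
```

The point of the sketch is the lifecycle: lemmas enter the library only after passing a check, and later goals can draw on everything added so far.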