58 research outputs found
Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks
Textual entailment is a fundamental task in natural language processing. Most
approaches for solving the problem use only the textual content present in
training data. A few approaches have shown that information from external
knowledge sources like knowledge graphs (KGs) can add value, in addition to the
textual content, by providing background knowledge that may be critical for a
task. However, the proposed models do not fully exploit the information in the
usually large and noisy KGs, and it is not clear how it can be effectively
encoded to be useful for entailment. We present an approach that complements
text-based entailment models with information from KGs by (1) using
Personalized PageRank to generate contextual subgraphs with reduced noise and
(2) encoding these subgraphs using graph convolutional networks to capture KG
structure. Our technique extends the capability of text models by exploiting
structural and semantic information found in KGs. We evaluate our approach on
multiple textual entailment datasets and show that the use of external
knowledge helps improve prediction accuracy. This is particularly evident in
the challenging BreakingNLI dataset, where we see an absolute improvement of
5-20% over multiple text-based entailment models.
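As a rough illustration of step (1), the Personalized PageRank subgraph extraction could look like the following sketch. The toy KG, seed concepts, and parameters are illustrative assumptions, not the paper's actual data or code:

```python
# Hedged sketch: Personalized PageRank over a toy KG to pick a small,
# low-noise contextual subgraph. Graph and seeds are illustrative only.

def personalized_pagerank(adj, seeds, alpha=0.85, iters=50):
    """Power iteration for PPR; restart mass goes only to seed concepts."""
    nodes = list(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if out:
                share = alpha * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
        rank = nxt
    return rank

def contextual_subgraph(adj, seeds, top_k=4):
    """Keep only the top_k highest-ranked nodes as the contextual subgraph."""
    rank = personalized_pagerank(adj, seeds)
    return set(sorted(rank, key=rank.get, reverse=True)[:top_k])

# Toy undirected KG as adjacency lists; "dog" comes from the premise text.
kg = {
    "dog": ["animal", "pet"], "animal": ["dog", "living_thing"],
    "pet": ["dog"], "living_thing": ["animal"],
    "car": ["vehicle"], "vehicle": ["car", "machine"], "machine": ["vehicle"],
}
print(contextual_subgraph(kg, seeds={"dog"}))
```

Seeding the restart distribution at the mentioned concepts concentrates rank mass in their neighborhood, so unrelated regions of a large, noisy KG fall out of the top-k cut.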
ExBERT: An External Knowledge Enhanced BERT for Natural Language Inference
Neural language representation models such as BERT, pretrained on large-scale unstructured corpora, lack explicit grounding to real-world commonsense knowledge and are often unable to remember facts required for reasoning and inference. Natural Language Inference (NLI) is a challenging reasoning task that relies on common human understanding of language and real-world commonsense knowledge. We introduce a new model for NLI called External Knowledge Enhanced BERT (ExBERT), to enrich the contextual representation with real-world commonsense knowledge from external knowledge sources and enhance BERT’s language understanding and reasoning capabilities. ExBERT takes full advantage of contextual word representations obtained from BERT and employs them to retrieve relevant external knowledge from knowledge graphs and to encode the retrieved external knowledge. Our model adaptively incorporates the external knowledge context required for reasoning over the inputs. Extensive experiments on the challenging SciTail and SNLI benchmarks demonstrate the effectiveness of ExBERT: in comparison to the previous state-of-the-art, we obtain an accuracy of 95.9% on SciTail and 91.5% on SNLI.
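The retrieval step described above can be sketched as a simple similarity search. Everything below (the 3-d vectors, the toy facts, the cosine scoring) is an illustrative assumption; ExBERT's actual retrieval operates on BERT's contextual representations:

```python
# Hedged sketch: rank external KG facts by cosine similarity to a context
# vector and keep the most relevant ones. Vectors and facts are toy values.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_knowledge(context_vec, kg_facts, top_k=2):
    """Rank (fact, embedding) pairs by similarity to the context vector."""
    scored = sorted(kg_facts, key=lambda f: cosine(context_vec, f[1]),
                    reverse=True)
    return [fact for fact, _ in scored[:top_k]]

# Toy 3-d embeddings standing in for encoded premise/hypothesis context.
context = [0.9, 0.1, 0.0]
facts = [
    ("dog IsA animal", [1.0, 0.0, 0.0]),
    ("car IsA vehicle", [0.0, 1.0, 0.1]),
    ("dog HasA tail",   [0.8, 0.2, 0.0]),
]
print(retrieve_knowledge(context, facts))
```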
IERL: Interpretable Ensemble Representation Learning - Combining CrowdSourced Knowledge and Distributed Semantic Representations
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Distributed semantics capture common statistical patterns among language tokens (words, phrases, and sentences) from large amounts of data. LLMs perform exceedingly well across General Language Understanding Evaluation (GLUE) tasks designed to test a model’s understanding of the meanings of the input tokens. However, recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs when processing inputs that were seen rarely during training, or inputs that are associated with diverse contexts (e.g., the well-known hallucination phenomenon in language generation tasks). Crowdsourced and expert-curated knowledge graphs such as ConceptNet are designed to capture the meaning of words from a compact set of well-defined contexts. Thus, LLMs may benefit from leveraging such knowledge contexts to reduce inconsistencies in outputs. We propose a novel ensemble learning method, the Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations of input tokens. IERL has the distinct advantage of being interpretable by design (when was the LLM context used vs. when was the knowledge context used?) over state-of-the-art (SOTA) methods, allowing scrutiny of the inputs in conjunction with the parameters of the model, facilitating the analysis of models’ inconsistent or irrelevant outputs. Although IERL is agnostic to the choice of LLM and crowdsourced knowledge, we demonstrate our approach using BERT and ConceptNet. We report improved or competitive results with IERL across GLUE tasks over current SOTA methods.
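A minimal sketch of the kind of interpretable combination IERL describes, assuming a single per-token mixing weight (the vectors and the weight are toy values, and IERL's actual learning procedure is more involved):

```python
# Hedged sketch: combine an LLM token vector with a knowledge-graph token
# vector via an explicit, inspectable mixing weight. Toy values throughout.

def combine(llm_vec, kg_vec, alpha):
    """alpha is the interpretable knob: 1.0 = pure LLM distributed
    semantics, 0.0 = pure crowdsourced-knowledge context."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(llm_vec, kg_vec)]

# A rarely seen token might lean on ConceptNet-style knowledge (low alpha);
# a frequent token would lean on the LLM's statistics (high alpha).
rare_token = combine(llm_vec=[0.2, 0.8], kg_vec=[1.0, 0.0], alpha=0.3)
print(rare_token)
```

Because the mixing weight is explicit per token, one can inspect exactly when the knowledge context dominated, which is the interpretability-by-design property the abstract highlights.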
Enhancing the Reasoning Capabilities of Natural Language Inference Models with Attention Mechanisms and External Knowledge
Natural Language Inference (NLI) is fundamental to natural language understanding. The task summarises the natural language understanding capabilities within a simple formulation of determining whether a natural language hypothesis can be inferred from a given natural language premise. NLI requires an inference system to address the full complexity of linguistic as well as real-world commonsense knowledge and, hence, the inferencing and reasoning capabilities of an NLI system are utilised in other complex language applications such as summarisation and machine comprehension. Consequently, NLI has received significant recent attention from both academia and industry. Despite extensive research, contemporary neural NLI models face challenges arising from the sole reliance on training data to comprehend all the linguistic and real-world commonsense knowledge. Further, different attention mechanisms, crucial to the success of neural NLI models, present the prospects of better utilisation when employed in combination. In addition, the NLI research field lacks a coherent set of guidelines for the application of one of the most crucial regularisation hyper-parameters in the RNN-based NLI models -- dropout.
In this thesis, we present neural models capable of leveraging attention mechanisms and models that utilise external knowledge to reason about inference. First, a combined attention model to leverage different attention mechanisms is proposed. Experimentation demonstrates that the proposed model is capable of better modelling the semantics of long and complex sentences. Second, to address the limitation of the sole reliance on the training data, two novel neural frameworks utilising real-world commonsense and domain-specific external knowledge are introduced. Employing rule-based external knowledge retrieval from knowledge graphs, the first model takes advantage of convolutional encoders and factorised bilinear pooling to augment the reasoning capabilities of state-of-the-art NLI models. Utilising the significant advances in the research of contextual word representations, the second model addresses the existing crucial challenges of external knowledge retrieval, learning the encoding of the retrieved knowledge, and fusing the learned encodings into the NLI representations, in unique ways. Experimentation demonstrates the efficacy and superiority of the proposed models over previous state-of-the-art approaches. Third, to address the limitation on dropout investigations, a coherent set of guidelines, formulated through exhaustive evaluation, analysis, and validation of the proposed RNN-based NLI models, is introduced.
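For concreteness, the scaled dot-product attention that such neural NLI models build on can be sketched as follows; the thesis's combined attention model mixes several such mechanisms, which is not reproduced here, and all values below are illustrative:

```python
# Hedged sketch of scaled dot-product attention: one hypothesis-token
# query attends over premise-token key/value pairs. Toy 2-d vectors only.
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Weight each value by the scaled similarity of its key to the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[1.0, 2.0], [3.0, 4.0]])
print(out)
```

The output is pulled toward the value whose key best matches the query, which is how such models align hypothesis tokens with relevant premise tokens.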
A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to
combine various modalities in a single joint representation. Especially in the
area of visiolinguistic (VL) learning multiple models and techniques have been
developed, targeting a variety of tasks that involve images and text. VL models
have reached unprecedented performance by extending the idea of Transformers,
so that both modalities can learn from each other. Massive pre-training
procedures enable VL models to acquire a certain level of real-world
understanding, although many gaps can be identified: the limited comprehension
of commonsense, factual, temporal and other everyday knowledge aspects
calls into question the extensibility of VL tasks. Knowledge graphs and other knowledge
sources can fill those gaps by explicitly providing missing information,
unlocking novel capabilities of VL models. At the same time, knowledge graphs
enhance the explainability, fairness and validity of decision making, issues of
utmost importance for such complex implementations. The current survey aims
to unify the fields of VL representation learning and knowledge graphs, and
provides a taxonomy and analysis of knowledge-enhanced VL models.
Biomedical Question Answering: A Survey of Approaches and Challenges
Automatic Question Answering (QA) has been successfully applied in various
domains such as search engines and chatbots. Biomedical QA (BQA), as an
emerging QA task, enables innovative applications to effectively perceive,
access and understand complex biomedical knowledge. There have been tremendous
developments of BQA in the past two decades, which we classify into 5
distinctive approaches: classic, information retrieval, machine reading
comprehension, knowledge base and question entailment approaches. In this
survey, we introduce available datasets and representative methods of each BQA
approach in detail. Despite the developments, BQA systems are still immature
and rarely used in real-life settings. We identify and characterize several key
challenges in BQA that might lead to this issue, and discuss some potential
future directions to explore.
Comment: In submission to ACM Computing Surveys
Discourse-Aware Graph Networks for Textual Logical Reasoning
Textual logical reasoning, especially question-answering (QA) tasks with
logical reasoning, requires awareness of particular logical structures. The
passage-level logical relations represent entailment or contradiction between
propositional units (e.g., a concluding sentence). However, such structures are
unexplored as current QA systems focus on entity-based relations. In this work,
we propose logic structural-constraint modeling to solve the logical reasoning
QA and introduce discourse-aware graph networks (DAGNs). The networks first
construct logic graphs leveraging in-line discourse connectives and generic
logic theories, then learn logic representations by evolving the logic
relations end-to-end with an edge-reasoning mechanism and updating the graph
features. This pipeline is applied to a general encoder, whose fundamental
features are joined with the high-level logic features for answer prediction.
Experiments on three textual logical reasoning datasets demonstrate the
validity of the logical structures built in DAGNs and the effectiveness of
the learned logic features. Moreover, zero-shot transfer results show the
features' generality to unseen logical texts.
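The logic-graph construction DAGNs begin with can be sketched as splitting a passage at in-line discourse connectives and linking the resulting units. The connective list and the passage below are illustrative assumptions; the paper's construction is more elaborate:

```python
# Hedged sketch: split a passage at discourse connectives and link the
# resulting propositional units with edges labeled by the connective.
import re

# Illustrative connective inventory; the actual list would be larger.
CONNECTIVES = ["because", "therefore", "however", "so"]

def logic_graph(passage):
    """Return (units, edges) with edges = (src_unit, connective, dst_unit)."""
    pattern = r"\b(" + "|".join(CONNECTIVES) + r")\b"
    parts = [p.strip(" ,.") for p in re.split(pattern, passage)]
    units = parts[0::2]   # text spans between connectives
    links = parts[1::2]   # the connectives themselves
    edges = [(i, links[i], i + 1) for i in range(len(links))]
    return units, edges

units, edges = logic_graph(
    "It rained because a front moved in, so the match was delayed.")
print(units)
print(edges)
```

Each edge carries the connective as its label, giving the edge-reasoning mechanism an explicit, passage-level logical relation to evolve rather than an entity-based one.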
- …