Logic Constrained Pointer Networks for Interpretable Textual Similarity
Systematically discovering semantic relationships in text is an important and
extensively studied area in Natural Language Processing, spanning tasks such
as entailment and semantic textual similarity. Decomposability of sentence-level
scores via subsequence alignments has been proposed as a way to make models
more interpretable. We study the problem of aligning components of sentences,
leading to an interpretable model for semantic textual similarity. In this
paper, we introduce a novel pointer network based model with a sentinel gating
function to align constituent chunks, which are represented using BERT. We
improve this base model with a loss function to equally penalize misalignments
in both sentences, ensuring the alignments are bidirectional. Finally, to guide
the network with structured external knowledge, we introduce first-order logic
constraints based on ConceptNet and syntactic knowledge. The model achieves
F1 scores of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk
alignment task, showing large improvements over existing solutions. Source
code is available at
https://github.com/manishb89/interpretable_sentence_similarity
Comment: Accepted at IJCAI 2020 Main Track. Sole copyright holder is IJCAI,
all rights reserved. Available at https://www.ijcai.org/Proceedings/2020/33
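To make the architecture concrete, here is a minimal sketch of pointer attention with a sentinel "no alignment" option, loosely following the abstract's description; the class name, shapes, and scoring scheme are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of pointer attention with a sentinel gate; all names and
# shapes are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentinelPointer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)                 # projects a source chunk
        self.sentinel = nn.Parameter(torch.zeros(dim))   # learned "aligns to nothing" option

    def forward(self, src_chunk, tgt_chunks):
        # src_chunk: (dim,) embedding of one source-sentence chunk (e.g. from BERT)
        # tgt_chunks: (n, dim) embeddings of the target-sentence chunks
        q = self.query(src_chunk)                                      # (dim,)
        options = torch.cat([tgt_chunks, self.sentinel.unsqueeze(0)])  # (n+1, dim)
        scores = options @ q                                           # (n+1,)
        return F.softmax(scores, dim=-1)  # index n means "no alignment"
```

A symmetric pass in the other direction, with a loss that penalizes source-to-target and target-to-source misalignments equally, would give the bidirectional behaviour the abstract describes.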
On Differentiable Interpreters
Neural networks have transformed the fields of Machine Learning and Artificial Intelligence with the ability to model complex features and behaviours from raw data. They quickly became instrumental models, achieving state-of-the-art results across many tasks and domains. Yet the successes of these models often rely on large amounts of data. When data is scarce, resourceful ways of using background knowledge often help. However, though different types of background knowledge can be used to bias the model, it is not clear how one can use algorithmic knowledge to that end. In this thesis, we present differentiable interpreters as an effective framework for utilising algorithmic background knowledge as architectural inductive biases of neural networks. By continuously approximating discrete elements of traditional program interpreters, we create differentiable interpreters that, due to the continuous nature of their execution, are amenable to optimisation with gradient descent methods. This enables us to write code mixed with parametric functions, where the code strongly biases the behaviour of the model while enabling the training of parameters and/or input representations from data. We investigate two such differentiable interpreters and their use cases in this thesis. First, we present a detailed construction of ∂4, a differentiable interpreter for the programming language FORTH. We demonstrate the ability of ∂4 to strongly bias neural models with incomplete programs of variable complexity while learning missing pieces of the program with parametrised neural networks. Such models can learn to solve tasks and strongly generalise to out-of-distribution data from small datasets. Second, we present greedy Neural Theorem Provers (gNTPs), a significant improvement of NTP, a differentiable Datalog interpreter. gNTPs ameliorate the large computational cost of recursive differentiable interpretation, achieving drastic time and memory speedups while introducing soft reasoning over logic knowledge and natural language.
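The core trick the thesis builds on can be illustrated in a few lines: a hard interpreter indexes exactly one memory cell, while a differentiable one reads an expectation over all cells, which gradient descent can optimise. This is a generic sketch of that idea, not the ∂4 implementation.

```python
# Generic sketch of a differentiable ("soft") memory read: instead of a
# discrete lookup, return the expectation over cells under an attention
# distribution, which is differentiable end to end.
import torch

def soft_read(memory, address_dist):
    # memory: (n_cells, dim) tensor; address_dist: (n_cells,) softmax weights
    return address_dist @ memory  # weighted average stands in for a hard index
```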
Complex Knowledge Base Question Answering: A Survey
Knowledge base question answering (KBQA) aims to answer a question over a
knowledge base (KB). Early studies mainly focused on answering simple questions
over KBs and achieved great success. However, their performance on complex
questions is still far from satisfactory. Therefore, in recent years,
researchers have proposed a large number of novel methods that address the
challenges of answering complex questions. In this survey, we review recent
advances in KBQA with a focus on solving complex questions, which usually
contain multiple subjects, express compound relations, or involve numerical
operations. In detail, we begin by introducing the complex KBQA task and
relevant background. Then, we describe benchmark datasets for the complex KBQA
task and how these datasets were constructed. Next, we present two
mainstream categories of methods for complex KBQA, namely semantic
parsing-based (SP-based) methods and information retrieval-based (IR-based)
methods. Specifically, we illustrate their procedures with flow designs and
discuss their major differences and similarities. After that, we summarize the
challenges that these two categories of methods encounter when answering
complex questions, and explicate advanced solutions and techniques used in
existing work. Finally, we conclude and discuss several promising directions
related to complex KBQA for future research.
Comment: 20 pages, 4 tables, 7 figures. arXiv admin note: text overlap with arXiv:2105.1164
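As a toy illustration of the SP-based pipeline the survey covers, a question is first parsed into a logical form, which is then executed against the KB; the KB contents, the query pattern, and the execution function below are invented for illustration.

```python
# Toy sketch of the SP-based KBQA pipeline: logical form -> execution over
# a triple store. Real systems learn the parser; here it is assumed given.
kb = {("Inception", "directed_by", "Christopher Nolan"),
      ("Inception", "released_in", "2010")}

def execute(pattern, kb):
    # pattern: (subject, relation, "?x") with a single variable in tail position
    s, r, _ = pattern
    return {t for (s2, r2, t) in kb if (s2, r2) == (s, r)}

print(execute(("Inception", "directed_by", "?x"), kb))  # {'Christopher Nolan'}
```

IR-based methods, by contrast, retrieve a question-specific subgraph and rank candidate answers directly, without an explicit logical form.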
Combining Representation Learning with Logic for Language Processing
The current state-of-the-art in many natural language processing and
automated knowledge base completion tasks is held by representation learning
methods which learn distributed vector representations of symbols via
gradient-based optimization. They require little or no hand-crafted features,
thus avoiding the need for most preprocessing steps and task-specific
assumptions. However, in many cases representation learning requires a large
amount of annotated training data to generalize well to unseen data. Such
labeled training data is provided by human annotators who often use formal
logic as the language for specifying annotations. This thesis investigates
different combinations of representation learning methods with logic for
reducing the need for annotated training data, and for improving
generalization.
Comment: PhD Thesis, University College London. Submitted and accepted in 201
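One way such a combination can look, sketched under stated assumptions: a first-order implication such as bornIn(x, y) => livedIn(x, y) is turned into a differentiable penalty that pushes the head's score above the body's score for the same entity pairs, so the rule shapes the learned representations without extra labeled examples. The loss below is an illustrative choice, not the thesis's exact formulation.

```python
# Hedged sketch: injecting an implication rule body => head as a
# differentiable regularizer over model scores for shared entity pairs.
import torch
import torch.nn.functional as F

def implication_loss(score_body, score_head):
    # penalize pairs where the body is scored as more plausible than the head
    return F.relu(score_body - score_head).mean()
```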
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Human knowledge provides a formal understanding of the world. Knowledge
graphs that represent structural relations between entities have become an
increasingly popular research direction towards cognition and human-level
intelligence. In this survey, we provide a comprehensive review of knowledge
graphs, covering research topics on 1) knowledge graph representation
learning, 2) knowledge acquisition and completion, 3) temporal knowledge
graphs, and 4) knowledge-aware applications, and we summarize recent
breakthroughs and prospective directions to facilitate future research. We propose a full-view
categorization and new taxonomies on these topics. Knowledge graph embedding is
organized along four aspects: representation space, scoring function, encoding
models, and auxiliary information. For knowledge acquisition, especially
knowledge graph completion, we review embedding methods, path inference, and
logical rule reasoning. We further explore several emerging topics, including
meta relational learning, commonsense reasoning, and temporal knowledge graphs.
To facilitate future research on knowledge graphs, we also provide a curated
collection of datasets and open-source libraries for different tasks. Finally,
we offer a thorough outlook on several promising research directions.
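For instance, one well-known entry in the survey's scoring-function taxonomy is TransE, whose scoring function treats a relation as a translation in the representation space:

```python
# TransE scoring function: a triple (h, r, t) is plausible when the
# translated head h + r lies close to the tail t in embedding space.
import torch

def transe_score(h, r, t):
    # h, r, t: embedding vectors; higher (less negative) means more plausible
    return -torch.norm(h + r - t, p=2)
```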
Explainable Neural Attention Recommender Systems
Recommender systems, predictive models that provide lists of personalized suggestions, have become increasingly popular in many web-based businesses. By presenting potential items that may interest a user, these systems are better able to monetize their offerings and improve users’ satisfaction. In recent years, the most successful approaches have relied on capturing what best defines users and items in the form of latent vectors, a numeric representation that assumes all instances can be described by their respective affinity towards a set of hidden features. However, recommendation methods based on latent features still face some real-world limitations. The data sparsity problem originates from the unprecedented variety of available items, making generated suggestions irrelevant to many users. Furthermore, many systems are now expected to accompany their suggestions with corresponding reasoning: users who receive unjustified recommendations they do not agree with are likely to stop using the system or to ignore its suggestions. In this work, we investigate current trends in the field of recommender systems and focus on two rising areas: deep recommendation and explainable recommender systems. First, we present the Textual and Contextual Embedding-based Neural Recommender (TCENR), a model that mitigates the data sparsity problem in point-of-interest (POI) recommendation. This method employs different types of deep neural networks to learn varied perspectives of the same user-location interaction, using textual reviews, geographical data, and social networks.
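The latent-vector formulation the abstract refers to reduces, in its simplest matrix-factorization form, to a dot product between a user's and an item's hidden-feature vectors; the sketch below shows that baseline, not TCENR itself.

```python
# Minimal latent-factor prediction: users and items live in a shared
# hidden-feature space, and preference is their dot product plus biases.
import numpy as np

def predict(user_vec, item_vec, user_bias=0.0, item_bias=0.0, mu=0.0):
    # mu: global average rating; biases capture per-user/per-item offsets
    return mu + user_bias + item_bias + user_vec @ item_vec

print(predict(np.array([0.9, -0.2]), np.array([1.1, 0.3]), mu=3.5))
```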
Attention in Natural Language Processing
Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of the textual data. We propose a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.
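A compact instance of the unified model's ingredients, with dot-product compatibility and softmax as the distribution function; shapes and names are illustrative.

```python
# One instantiation of the unified attention model: compatibility function
# (dot product), distribution function (softmax), then a weighted summary.
import numpy as np

def attention(query, keys, values):
    # query: (d,), keys: (n, d), values: (n, d_v)
    energies = keys @ query                      # compatibility function
    weights = np.exp(energies - energies.max())  # numerically stable softmax
    weights /= weights.sum()                     # distribution function
    return weights @ values                      # attended summary of the values
```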