
    Knowledge Reasoning with Graph Neural Networks

    Knowledge reasoning is the process of drawing conclusions from existing facts and rules, which requires a range of capabilities including, but not limited to, understanding concepts, applying logic, and calibrating or validating an architecture based on existing knowledge. With the explosive growth of communication technologies and mobile devices, much of collective human knowledge now resides on the Internet in unstructured and semi-structured forms such as text, tables, images, and videos. It is overwhelmingly difficult for humans to navigate this gigantic body of knowledge without the help of intelligent systems such as search engines and question answering systems. To serve various information needs, in this thesis we develop methods to perform knowledge reasoning over both structured and unstructured data. The thesis attempts to answer the following research questions: (1) How can we perform multi-hop reasoning over knowledge graphs? How should we leverage graph neural networks to learn graph-aware representations efficiently? And how can we systematically handle the noise in human questions? (2) How can we combine deep learning and symbolic reasoning in a consistent probabilistic framework? How can we make inference efficient and scalable for large-scale knowledge graphs? Can we strike a balance between the representational power and the simplicity of the model? (3) What is the reasoning pattern of graph neural networks in knowledge-aware QA tasks? Do elaborately designed GNN modules really perform a complex reasoning process? Are they under- or over-complicated? Can we design a much simpler yet effective model that achieves comparable performance? (4) How can we build an open-domain question answering system that reasons over multiple retrieved documents? How can we efficiently rank and filter the retrieved documents to reduce noise for the downstream answer prediction module? How can we propagate and assemble information among multiple retrieved documents? (5) How can we answer questions that require numerical reasoning over textual passages? How can we enable pre-trained language models to perform numerical reasoning? Exploring these questions, we find that graph neural networks are a powerful tool for a variety of knowledge reasoning tasks over both structured and unstructured knowledge sources. On structured, graph-based knowledge sources, we build graph neural networks on top of the graph structure to capture topology information for downstream reasoning tasks. On unstructured, text-based knowledge sources, we first identify graph-structured information such as entity co-occurrence and entity-number binding, and then employ graph neural networks to reason over the constructed graphs, working together with pre-trained language models to handle the unstructured part of the knowledge source. (Ph.D. thesis)
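    As a rough illustration of the core mechanism this abstract describes, the sketch below runs two rounds of graph-neural-network message passing over a toy knowledge graph, so that each node's representation becomes topology-aware. The graph, dimensions, and random weights are hypothetical stand-ins, not the thesis's actual models.

```python
# A minimal sketch of GNN message passing over a tiny knowledge graph.
# Everything here (graph, feature size, weights) is illustrative only.
import numpy as np

# Toy KG: node -> list of neighbor nodes (edges treated as undirected).
graph = {
    "paris": ["france", "eiffel_tower"],
    "france": ["paris", "europe"],
    "eiffel_tower": ["paris"],
    "europe": ["france"],
}
nodes = list(graph)
dim = 4
rng = np.random.default_rng(0)
h = {n: rng.normal(size=dim) for n in nodes}   # initial node embeddings
W_self = rng.normal(size=(dim, dim)) * 0.1     # transform for the node itself
W_nbr = rng.normal(size=(dim, dim)) * 0.1      # transform for aggregated neighbors

def message_passing_step(h):
    """One GNN layer: each node mixes its own state with the mean of its
    neighbors' states, making representations graph-aware."""
    new_h = {}
    for n in nodes:
        nbr_mean = np.mean([h[m] for m in graph[n]], axis=0)
        new_h[n] = np.tanh(W_self @ h[n] + W_nbr @ nbr_mean)
    return new_h

# Two layers let information travel two hops (e.g., eiffel_tower -> europe),
# which is the intuition behind multi-hop reasoning over the graph structure.
for _ in range(2):
    h = message_passing_step(h)
print(h["eiffel_tower"])
```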

    Multimedia question answering

    Ph.D. (Doctor of Philosophy)

    Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

    Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, differ from those in these other application areas. A common form of IR involves ranking documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms, such as a person's name or a product model number, not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives, by deciding whether it should be displayed and where it should be positioned among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. (PhD thesis, University College London, 2020)
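    To make the data-structure point concrete, here is a minimal sketch of the inverted index the abstract mentions: the index maps each term to the documents containing it, so answering a query only touches the postings of the query's own terms instead of scanning the whole collection. The documents, tokenization, and term-count scoring are illustrative simplifications, not the thesis's methods.

```python
# A minimal inverted index: term -> set of doc ids (a postings list).
from collections import defaultdict

docs = {
    0: "neural networks for information retrieval",
    1: "efficient retrieval from large collections",
    2: "speech recognition with deep networks",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():          # toy whitespace tokenizer
        index[term].add(doc_id)

def search(query):
    """Rank docs by how many query terms they contain; a crude stand-in
    for the relevance models discussed above."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(search("neural retrieval"))      # doc 0 matches both terms
```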

    Advanced models of supervised structural clustering

    The strength and power of structured prediction approaches in machine learning originate from a proper recognition and exploitation of the inherent structural dependencies within the complex objects that structural models are trained to output. Among the complex tasks that have benefited from structured prediction approaches, clustering is of special interest. Structured output models that represent clusters by latent graph structures made the task of supervised clustering tractable. While in practice these models proved effective in solving the complex NLP task of coreference resolution, in this thesis we explore their capacity to be extended to other tasks and domains, as well as methods for performing such adaptation and for improvement in general, which, as a result, go beyond clustering and are broadly applicable in structured prediction. Studying the extensibility of structural approaches to supervised clustering, we apply them to two different domains in two different ways. First, in the networking domain, we cluster network traffic by adapting the model to take into account the continuity of incoming data. Our experiments demonstrate that the structural clustering approach is not only effective in such a scenario but also, from a different perspective, provides a novel and potentially useful tool for detecting anomalies. The other part of our work is dedicated to assessing the amenability of the structural clustering model to joint learning with another structural model, for ranking. Our preliminary analysis in the context of answer-passage reranking in question answering reveals a potential benefit of incorporating auxiliary clustering structures. The intrinsic complexity of the clustering task, and hence of its evaluation scenarios, gave us grounds to study the possibility, and the effect, of optimizing task-specific complex measures in structured prediction algorithms. It is common for structured prediction approaches to optimize surrogate loss functions rather than the actual task-specific ones, in order to facilitate inference and preserve efficiency. In this thesis we first study when surrogate losses are sufficient and, second, make a step towards enabling direct optimization of complex structural loss functions. We propose to learn an approximation of a complex loss by a regressor from data. We formulate a general structural framework for learning with a learned loss which, applied to a particular case of a clustering problem, coreference resolution, (i) enables the optimization of a coreference metric that by itself has high computational complexity, and (ii) delivers an improvement over standard structural models optimizing simple surrogate objectives. We foresee this idea being helpful in many structured prediction applications, also as a means of adaptation to specific evaluation scenarios, especially when a good loss approximation is found by a regressor from an induced feature space allowing good factorization over the underlying structure.
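    The latent-graph idea behind tractable supervised clustering can be sketched in a few lines: each item links to at most one earlier item (or starts a new cluster), and the predicted clusters are the connected components of the resulting link forest, much as in latent-antecedent models for coreference. The scoring function and greedy decoding below are toy stand-ins for the learned structural model.

```python
# A minimal sketch of clustering via a latent link graph.
def cluster_from_links(n_items, score):
    """score(i, j) rates linking item i to earlier item j; j == -1 means
    'start a new cluster'. Greedy best-link decoding over the latent graph."""
    parent = {}
    for i in range(n_items):
        candidates = [(-1, score(i, -1))] + [(j, score(i, j)) for j in range(i)]
        parent[i] = max(candidates, key=lambda c: c[1])[0]

    def root(i):
        # Follow links back to the first item of the cluster.
        return i if parent[i] == -1 else root(parent[i])

    # Connected components of the link forest = predicted clusters.
    clusters = {}
    for i in range(n_items):
        clusters.setdefault(root(i), []).append(i)
    return list(clusters.values())

# Toy scorer: items with the same parity belong together.
items = [0, 1, 2, 3, 4, 5]
sim = lambda i, j: 1.0 if (j >= 0 and items[i] % 2 == items[j] % 2) else 0.5
print(cluster_from_links(len(items), sim))   # -> [[0, 2, 4], [1, 3, 5]]
```

    In a trained model, the link scores would come from a structural learner (e.g., a structured SVM), and decoding could be exact over the latent trees; greedy linking is used here only to keep the sketch short.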

    Dense Text Retrieval based on Pretrained Language Models: A Survey

    Text retrieval is a long-standing research topic in information seeking, where a system is required to return relevant information resources in response to users' queries in natural language. From classic retrieval methods to learning-based ranking functions, the underlying retrieval models have continually evolved alongside ongoing technical innovation. In designing effective retrieval models, a key point lies in how to learn the text representation and how to model relevance matching. The recent success of pretrained language models (PLMs) sheds light on developing more capable text retrieval approaches that leverage the excellent modeling capacity of PLMs. With powerful PLMs, we can effectively learn the representations of queries and texts in a latent representation space, and further construct a semantic matching function between the dense vectors for relevance modeling. Such a retrieval approach is referred to as dense retrieval, since it employs dense vectors (a.k.a. embeddings) to represent the texts. Considering the rapid progress of dense retrieval, in this survey we systematically review the recent advances in PLM-based dense retrieval. Different from previous surveys on dense retrieval, we take a new perspective to organize the related work along four major aspects, namely architecture, training, indexing, and integration, and summarize the mainstream techniques for each aspect. We thoroughly survey the literature and include more than 300 related reference papers on dense retrieval. To support the survey, we create a website providing useful resources, and release a code repository and toolkit for implementing dense retrieval models. This survey aims to provide a comprehensive, practical reference focused on the major progress in dense text retrieval.
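    A minimal sketch of the dense retrieval paradigm the survey covers: embed documents into a vector space offline, embed the query at search time, and score by inner product. The hashing "encoder" below is a toy substitute for a real PLM encoder, and production systems would add an approximate-nearest-neighbor index (e.g., FAISS) rather than a brute-force matrix product.

```python
# A minimal dense-retrieval sketch: encode, index offline, score by dot product.
import numpy as np

DIM = 64

def encode(text):
    """Toy dense encoder: hash each token into a bucket and L2-normalize.
    A stand-in for a PLM sentence embedding, not a real representation."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[hash(tok) % DIM] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

corpus = [
    "pretrained language models for text retrieval",
    "dense vectors represent queries and documents",
    "classic ranking functions use lexical matching",
]
doc_matrix = np.stack([encode(d) for d in corpus])   # offline indexing

def retrieve(query, k=2):
    scores = doc_matrix @ encode(query)              # inner product = relevance
    top = np.argsort(-scores)[:k]
    return [(corpus[i], float(scores[i])) for i in top]

print(retrieve("dense retrieval with language models"))
```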

    Modelling input texts: from Tree Kernels to Deep Learning

    One of the core questions when designing modern Natural Language Processing (NLP) systems is how to model input textual data such that the learning algorithm is provided with enough information to estimate accurate decision functions. The mainstream approach is to represent input objects as feature vectors where each value encodes some aspect of the object, e.g., its syntax or semantics. Feature-based methods have demonstrated state-of-the-art results on various NLP tasks. However, designing good features is a highly empirically driven process; it greatly depends on the task and requires a significant amount of domain expertise. Moreover, extracting features for complex NLP tasks often requires expensive pre-processing steps, running a large number of linguistic tools and relying on external knowledge sources that are often unavailable or hard to obtain. Hence, this process is not cheap and often constitutes one of the major challenges when attempting a new task or adapting to a different language or domain. The problem of modelling input objects is even more acute when the input examples are not single objects but pairs of objects, as in various learning-to-rank problems in Information Retrieval and Natural Language Processing. An alternative to feature-based methods is to use kernels, which are essentially non-linear functions mapping input examples into some high-dimensional space, thus allowing for learning decision functions with higher discriminative power. Kernels implicitly generate a very large number of features by computing similarity between input examples in that implicit space. A well-designed kernel function can greatly reduce the effort of designing a large set of features manually, often leading to superior results. However, in recent years the use of kernel methods in NLP has been greatly underestimated, primarily for the following reasons: (i) learning with kernels is slow, as it requires carrying out optimization in the dual space, leading to quadratic complexity; (ii) applying kernels to input objects encoded with vanilla structures, e.g., those generated by syntactic parsers, often yields only minor improvements over carefully designed feature-based methods. In this thesis, we adopt the kernel learning approach for solving complex NLP tasks and primarily focus on solutions to the aforementioned problems posed by the use of kernels. In particular, we design novel learning algorithms for training Support Vector Machines with structural kernels, e.g., tree kernels, considerably speeding up training over conventional SVM training methods. We show that the training algorithms developed in this thesis allow for training tree kernel models on large-scale datasets containing millions of instances, which was not possible before. Next, we focus on the problem of designing the input structures that are fed to tree kernel functions to automatically generate a large set of tree-fragment features. We demonstrate that the plain structures used previously, e.g., syntactic or dependency trees produced by parsers, are often a poor choice, compromising the expressivity offered by a tree kernel learning framework. We propose several effective design patterns for the input tree structures for various NLP tasks, ranging from sentiment analysis to answer passage reranking. The central idea is to inject additional task-relevant semantic information directly into the tree nodes and let the expressive kernels generate rich feature spaces.

    For opinion mining tasks, the additional semantic information injected into tree nodes can be word polarity labels, while for the more complex task of modelling text pairs, relational information about overlapping words in a pair appears to significantly improve the accuracy of the resulting models. Finally, we observe that both feature-based and kernel methods typically treat words as atomic units, making it problematic to match different yet semantically similar words. Conversely, the idea behind distributional approaches, modelling words as vectors, is much more effective in establishing a semantic match between words and phrases. While tree kernel functions do allow for a more flexible matching between phrases and sentences through their syntactic contexts, their representation cannot be tuned on the training set, as is possible with distributional approaches. Recently, deep learning approaches have generalized the distributional word matching problem to matching sentences, taking it one step further by learning the optimal sentence representations for a given task. Deep neural networks have already claimed state-of-the-art performance in many computer vision, speech recognition, and natural language tasks. Following this trend, this thesis also explores the virtue of deep learning architectures for modelling input texts and text pairs, building on some of the ideas for modelling input objects proposed within the tree kernel learning framework. In particular, we explore the idea of relational linking (proposed in the preceding chapters to encode text pairs using linguistic tree structures) to design a state-of-the-art deep learning architecture for modelling text pairs. The proposed deep learning models require even less manual intervention in the feature design process than the previously described tree kernel methods, which already offer a very good trade-off between feature-engineering effort and the expressivity of the resulting representation. Our deep learning models demonstrate state-of-the-art performance on recent benchmarks for Twitter Sentiment Analysis, Answer Sentence Selection, and Microblog retrieval.
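    A minimal sketch of the tree kernel machinery this abstract builds on: the similarity of two parse trees is a count of matching tree fragments, computed by a recursive dynamic program without ever enumerating the implicit feature space. The toy trees and the simplified subset-tree-style recursion below are illustrative only, not the thesis's optimized SVM training algorithms.

```python
# A simplified tree kernel: count matching fragments across two toy parses.
def subtrees(t):
    """Yield every subtree of a tree encoded as (label, children...) tuples."""
    yield t
    for child in t[1:]:
        yield from subtrees(child)

def tree_kernel(t1, t2):
    """Sum, over all node pairs, the number of matching fragments rooted
    there; a simplified variant of the classic recursive computation."""
    def delta(a, b):
        # Nodes match only if they share a label and child-label sequence.
        if a[0] != b[0] or [c[0] for c in a[1:]] != [c[0] for c in b[1:]]:
            return 0
        if len(a) == 1:                # matching leaves
            return 1
        prod = 1
        for ca, cb in zip(a[1:], b[1:]):
            prod *= 1 + delta(ca, cb)  # include or expand each child
        return prod
    return sum(delta(a, b) for a in subtrees(t1) for b in subtrees(t2))

t1 = ("S", ("NP", ("D", ("a",)), ("N", ("dog",))), ("VP", ("V", ("runs",))))
t2 = ("S", ("NP", ("D", ("a",)), ("N", ("cat",))), ("VP", ("V", ("runs",))))
print(tree_kernel(t1, t2))   # count of fragments shared by the two parses
```

    The design patterns the thesis proposes would enrich such tree nodes with extra labels (e.g., word polarity, or relational links marking word overlap in a text pair) so that the kernel's fragment features carry task-relevant semantics.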