140,414 research outputs found
Compositional Semantic Parsing on Semi-Structured Tables
Two important aspects of semantic parsing for question answering are the
breadth of the knowledge source and the depth of logical compositionality.
While existing work trades off one aspect for another, this paper
simultaneously makes progress on both fronts through a new task: answering
complex questions on semi-structured tables using question-answer pairs as
supervision. The central challenge arises from two compounding factors: the
broader domain results in an open-ended set of relations, and the deeper
compositionality results in a combinatorial explosion in the space of logical
forms. We propose a logical-form driven parsing algorithm guided by strong
typing constraints and show that it obtains significant improvements over
natural baselines. For evaluation, we created a new dataset of 22,033 complex
questions on Wikipedia tables, which is made publicly available
Transformer-based Subject Entity Detection in Wikipedia Listings
In tasks like question answering or text summarisation, it is essential to
have background knowledge about the relevant entities. The information about
entities - in particular, about long-tail or emerging entities - in publicly
available knowledge graphs like DBpedia or CaLiGraph is far from complete. In
this paper, we present an approach that exploits the semi-structured nature of
listings (like enumerations and tables) to identify the main entities of the
listing items (i.e., of entries and rows). These entities, which we call
subject entities, can be used to increase the coverage of knowledge graphs. Our
approach uses a transformer network to identify subject entities at the
token-level and surpasses an existing approach in terms of performance while
being bound by fewer limitations. Due to a flexible input format, it is
applicable to any kind of listing and is, unlike prior work, not dependent on
entity boundaries as input. We demonstrate our approach by applying it to the
complete Wikipedia corpus and extracting 40 million mentions of subject
entities with an estimated precision of 71% and recall of 77%. The results are
incorporated in the most recent version of CaLiGraph.Comment: Published at Deep Learning for Knowledge Graphs workshop (DL4KG) at
International Semantic Web Conference 2022 (ISWC 2022
Answering Complex Questions Using Open Information Extraction
While there has been substantial progress in factoid question-answering (QA),
answering complex questions remains challenging, typically requiring both a
large body of knowledge and inference techniques. Open Information Extraction
(Open IE) provides a way to generate semi-structured knowledge for QA, but to
date such knowledge has only been used to answer simple questions with
retrieval-based methods. We overcome this limitation by presenting a method for
reasoning with Open IE knowledge, allowing more complex questions to be
handled. Using a recently proposed support graph optimization framework for QA,
we develop a new inference model for Open IE, in particular one that can work
effectively with multiple short facts, noise, and the relational structure of
tuples. Our model significantly outperforms a state-of-the-art structured
solver on complex questions of varying difficulty, while also removing the
reliance on manually curated knowledge.Comment: Accepted as short paper at ACL 201
Facilitating Information Access for Heterogeneous Data Across Many Languages
Information access, which enables people to identify, retrieve, and use information freely and effectively, has attracted interest from academia and industry. Systems for document retrieval and question answering have helped people access information in powerful and useful ways. Recently, natural language technologies based on neural network have been applied to various tasks for information access. Specifically, transformer-based pre-trained models have pushed tasks such as document and passage retrieval to new state-of-the-art effectiveness. (1) Most of the research has focused on helping people access passages and documents on the web. However, there is abundant information stored in other formats such as semi-structured tables and domain-specific relational databases in companies. Development of the models and frameworks that support access information from these data formats is also essential. (2) Moreover, most of the advances in information access research are based on English, leaving other languages less explored. It is insufficient and inequitable in our globalized and connected world to serve only speakers of English.
In this thesis, we explore and develop models and frameworks that could alleviate the aforementioned challenges. This dissertation consists of three parts. We begin with a discussion on developing models designed for accessing data in formats other than passages and documents. We mainly focus on two data formats, namely semi-structured tables and relational databases. In the second part, we discuss methods that can enhance the user experience for non-English speakers when using information access systems. Specifically, we first introduce model development for multilingual knowledge graph integration, which can benefit many information access applications such as cross-lingual question answering systems and other knowledge-driven cross-lingual NLP applications. We further focus on multilingual document dense retrieval and reranking that boost the effectiveness of search engines for non-English information access. Last but not least, we take a step further based on the aforementioned two parts by investigating models and frameworks that can facilitate non-English speakers to access structured data. In detail, we present cross-lingual Text-to-SQL semantic parsing systems that enable non-English speakers to query relational databases with queries in their languages
Ontology-based Information Extraction with SOBA
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontologybased information extraction from soccer web pages for automatic population of a knowledge base that can be used for domainspecific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities
- …