Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia
We study the task of generating from Wikipedia articles question-answer pairs
that cover content beyond a single sentence. We propose a neural network
approach that incorporates coreference knowledge via a novel gating mechanism.
Compared to models that only take into account sentence-level information
(Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the
linguistic knowledge introduced by the coreference representation aids question
generation significantly, producing models that outperform the current
state-of-the-art. We apply our system (composed of an answer span extraction
system and the passage-level QG system) to the 10,000 top-ranking Wikipedia
articles and create a corpus of over one million question-answer pairs. We also
provide a qualitative analysis for this large-scale generated corpus from
Wikipedia. Comment: Accepted to ACL 2018 (long paper)
A Dataset and Baselines for Visual Question Answering on Art
Answering questions related to art pieces (paintings) is a difficult task, as
it implies the understanding of not only the visual information that is shown
in the picture, but also the contextual knowledge that is acquired through the
study of the history of art. In this work, we introduce our first attempt
towards building a new dataset, coined AQUA (Art QUestion Answering). The
question-answer (QA) pairs are automatically generated using state-of-the-art
question generation methods based on paintings and comments provided in an
existing art understanding dataset. The QA pairs are cleansed by crowdsourcing
workers with respect to their grammatical correctness, answerability, and
answers' correctness. Our dataset inherently consists of visual
(painting-based) and knowledge (comment-based) questions. We also present a
two-branch model as baseline, where the visual and knowledge questions are
handled independently. We extensively compare our baseline model against the
state-of-the-art models for question answering, and we provide a comprehensive
study about the challenges and potential future directions for visual question
answering on art.
Linking Textual Resources to Support Information Discovery
A vast amount of information is today stored in the form of textual documents, many of which are available online. These documents come from different sources and are of different types. They include newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge, which was created by explicitly connecting information and expressing it in the form of a natural language. However, a significant amount of knowledge is not explicitly stated in a single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious due to information overload and the difficulty of formulating queries that would help us to discover information we are not aware of.
In order to support this exploratory process, we need to be able to effectively navigate between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach not only does not scale, but cannot systematically assist us in the discovery of sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery.
This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery and designs a system in which link discovery is applied on a collection of millions of documents to improve access to public knowledge.
ANSWERING TOPICAL INFORMATION NEEDS USING NEURAL ENTITY-ORIENTED INFORMATION RETRIEVAL AND EXTRACTION
In the modern world, search engines are an integral part of human lives. The field of Information Retrieval (IR) is concerned with finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need (query) from within large collections (usually stored on computers). The search engine then displays a ranked list of results relevant to our query. Traditional document retrieval algorithms match a query to a document using the overlap of words in both. However, the last decade has seen the focus shifting to leveraging the rich semantic information available in the form of entities. Entities are uniquely identifiable objects or things such as places, events, diseases, etc. that exist in the real or fictional world. Entity-oriented search systems leverage the semantic information associated with entities (e.g., names, types, etc.) to better match documents to queries. Web search engines would provide better search results if they understood the meaning of a query.
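The word-overlap matching mentioned above can be illustrated with a minimal sketch. This is not the dissertation's method; the function names (`tokenize`, `overlap_score`, `rank`), the scoring rule (fraction of query words found in the document), and the sample documents are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query, document):
    """Fraction of distinct query words that also occur in the document."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    return len(query_words & tokenize(document)) / len(query_words)

def rank(query, documents):
    """Return the documents sorted by descending overlap with the query."""
    return sorted(documents, key=lambda d: overlap_score(query, d), reverse=True)

# Hypothetical mini-collection, not taken from the source.
docs = [
    "Oysters filter water and build reef ecosystems.",
    "Dark chocolate contains flavanols that may benefit heart health.",
    "Chocolate cake recipe with dark cocoa.",
]
ranked = rank("dark chocolate health benefits", docs)
```

In this sketch the second document ranks first, yet "benefit" does not match the query word "benefits" because only exact word forms are compared. That kind of surface mismatch is one reason purely lexical matching may miss the meaning of a query.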
This dissertation advances the state-of-the-art in IR by developing novel algorithms that understand text (query, document, question, sentence, etc.) at the semantic level. To this end, this dissertation aims to understand the fine-grained meaning of entities from the context in which the entities have been mentioned, for example, “oysters” in the context of food versus ecosystems. Further, we aim to automatically learn (vector) representations of entities that incorporate this fine-grained knowledge and knowledge about the query. This work refines the automatic understanding of text passages using deep learning, a modern artificial intelligence paradigm.
This dissertation utilizes the semantic information extracted from entities to retrieve materials (text and entities) relevant to a query. The interplay between text and entities in the text is studied by addressing three related prediction problems: (1) Identify entities that are relevant for the query, (2) Understand an entity’s meaning in the context of the query, and (3) Identify text passages that elaborate the connection between the query and an entity.
The research presented in this dissertation may be integrated into a larger system designed for answering complex topical queries such as “dark chocolate health benefits”, which require the search engine to automatically understand the connections between the query and the relevant material, thus transforming the search engine into an answering engine.
Improving Question Generation with Sentence-level Semantic Matching and Answer Position Inferring
Taking an answer and its context as input, sequence-to-sequence models have
made considerable progress on question generation. However, we observe that
these approaches often generate wrong question words or keywords and copy
answer-irrelevant words from the input. We believe that the lack of global
question semantics and the poor exploitation of answer position-awareness are
the key root causes. In this paper, we propose a neural question generation model with two
concrete modules: sentence-level semantic matching and answer position
inferring. Further, we enhance the initial state of the decoder by leveraging
the answer-aware gated fusion mechanism. Experimental results demonstrate that
our model outperforms the state-of-the-art (SOTA) models on SQuAD and MARCO
datasets. Owing to its generality, our work also improves the existing models
significantly. Comment: Revised version of paper accepted to Thirty-fourth AAAI Conference on
Artificial Intelligence
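The abstract above does not specify how its answer-aware gated fusion is computed. As a rough illustration only, the following sketch shows one common, generic form of gated fusion, in which a sigmoid gate mixes two vectors element by element. The formula, the function `gated_fusion`, and the parameters `W_h`, `W_a`, and `b` are assumptions and should not be read as the paper's actual mechanism.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gated_fusion(h, a, W_h, W_a, b):
    """Generic gate (assumed form): g = sigmoid(W_h h + W_a a + b),
    output = g * h + (1 - g) * a, applied element-wise."""
    pre = [p + q + r for p, q, r in zip(matvec(W_h, h), matvec(W_a, a), b)]
    gate = [sigmoid(x) for x in pre]
    return [g * hi + (1.0 - g) * ai for g, hi, ai in zip(gate, h, a)]

# Hypothetical inputs: h stands in for a passage encoding and a for an
# answer encoding. With all-zero weights every gate value is 0.5, so the
# output is the plain average of the two vectors.
h = [1.0, -2.0]
a = [0.0, 4.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
fused = gated_fusion(h, a, zero, zero, [0.0, 0.0])
```

With a large positive bias the gate saturates near 1 and the output approaches `h`; with a large negative bias it approaches `a`. In a trained model the gate parameters would be learned rather than fixed by hand.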