2,636 research outputs found

    Mining question-answer pairs from web forum: a survey of challenges and resolutions

    Get PDF
    Internet forums, which are also known as discussion boards, are popular web applications. Members of the board discuss issues and share ideas to form a community within the board, and as a result generate huge amount of content on different topics on daily basis. Interest in information extraction and knowledge discovery from such sources has been on the increase in the research community. A number of factors are limiting the potentiality of mining knowledge from forums. Lexical chasm or lexical gap that renders some Natural Language Processing techniques (NLP) less effective, Informal tone that creates noisy data, drifting of discussion topic that prevents focused mining and asynchronous issue that makes it difficult to establish post-reply relationship are some of the problems that need to be addressed. This survey introduces these challenges within the framework of question answering. The survey provides description of the problems; cites and explores useful publications to the reader for further examination; provides an overview of resolution strategies and findings relevant to the challenges

    A Critical Look at the Evaluation of Knowledge Graph Question Answering

    Get PDF
    PhD thesis in Information technologyThe field of information retrieval (IR) is concerned with systems that “make a given stored collection of information items available to a user population” [111]. The way in which information is made available to the user depends on the formulation of this broad concern of IR into specific tasks by which a system should address a user’s information need [85]. The specific IR task also dictates how the user may express their information need. The classic IR task is ad hoc retrieval, where the user issues a query to the system and gets in return a list of documents ranked by estimated relevance of each document to the query [85]. However, it has long been acknowledged that users are often looking for answers to questions, rather than an entire document or ranked list of documents [17, 141]. Question answering (QA) is thus another IR task; it comes in many flavors, but overall consists of taking in a user’s natural language (NL) question and returning an answer. This thesis describes work done within the scope of the QA task. The flavor of QA called knowledge graph question answering (KGQA) is taken as the primary focus, which enables QA with factual questions against structured data in the form of a knowledge graph (KG). This means the KGQA system addresses a structured representation of knowledge rather than—as in other QA flavors—an unstructured prose context. KGs have the benefit that given some identified entities or predicates, all associated properties are available and relationships can be utilized. KGQA then enables users to access structured data using only NL questions and without requiring formal query language expertise. Even so, the construction of satisfactory KGQA systems remains a challenge. Machine learning with deep neural networks (DNNs) is a far more promising approach than manually engineering retrieval models [29, 56, 130]. The current era dominated by DNNs began with seminal work on computer vision, where the deep learning paradigm demonstrated its first cases of “superhuman” performance [32, 71]. Subsequent work in other applications has also demonstrated “superhuman” performance with DNNs [58, 87]. As a result of its early position and hence longer history as a leading application of deep learning, computer vision with DNNs has been bolstered with much work on different approaches towards augmenting [120] or synthesizing [94] additional training data. The difficulty with machine learning approaches to KGQA appears to rest in large part with the limited volume, quality, and variety of available datasets for this task. Compared to labeled image data for computer vision, the problems of data collection, augmentation, and synthesis are only to a limited extent solved for QA, and especially for KGQA. There are few datasets for KGQA overall, and little previous work that has found unsupervised or semi-supervised learning approaches to address the sparsity of data. Instead, neural network approaches to KGQA rely on either fully or weakly supervised learning [29]. We are thus concerned with neural models trained in a supervised setting to perform QA tasks, especially of the KGQA flavor. Given a clear task to delegate to a computational system, it seems clear that we want the task performed as well as possible. However, what methodological elements are important to ensure good system performance within the chosen scope? How should the quality of system performance be assessed? This thesis describes work done to address these overarching questions through a number of more specific research questions. Altogether, we designate the topic of this thesis as KGQA evaluation, which we address in a broad sense, encompassing four subtopics from (1) the impact on performance due to volume of training data provided and (2) the information leakage between training and test splits due to unhygienic data partitioning, through (3) the naturalness of NL questions resulting from a common approach for generating KGQA datasets, to (4) the axiomatic analysis and development of evaluation measures for a specific flavor of the KGQA task. Each of the four subtopics is informed by previous work, but we aim in this thesis to critically examine the assumptions of previous work to uncover, verify, or address weaknesses in current practices surrounding KGQA evaluation

    Generating Synthetic Data for Neural Keyword-to-Question Models

    Full text link
    Search typically relies on keyword queries, but these are often semantically ambiguous. We propose to overcome this by offering users natural language questions, based on their keyword queries, to disambiguate their intent. This keyword-to-question task may be addressed using neural machine translation techniques. Neural translation models, however, require massive amounts of training data (keyword-question pairs), which is unavailable for this task. The main idea of this paper is to generate large amounts of synthetic training data from a small seed set of hand-labeled keyword-question pairs. Since natural language questions are available in large quantities, we develop models to automatically generate the corresponding keyword queries. Further, we introduce various filtering mechanisms to ensure that synthetic training data is of high quality. We demonstrate the feasibility of our approach using both automatic and manual evaluation. This is an extended version of the article published with the same title in the Proceedings of ICTIR'18.Comment: Extended version of ICTIR'18 full paper, 11 page

    Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion

    No full text
    Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies

    Natural language processing meets business:algorithms for mining meaning from corporate texts

    Get PDF

    EQG-RACE: Examination-Type Question Generation

    Full text link
    Question Generation (QG) is an essential component of the automatic intelligent tutoring systems, which aims to generate high-quality questions for facilitating the reading practice and assessments. However, existing QG technologies encounter several key issues concerning the biased and unnatural language sources of datasets which are mainly obtained from the Web (e.g. SQuAD). In this paper, we propose an innovative Examination-type Question Generation approach (EQG-RACE) to generate exam-like questions based on a dataset extracted from RACE. Two main strategies are employed in EQG-RACE for dealing with discrete answer information and reasoning among long contexts. A Rough Answer and Key Sentence Tagging scheme is utilized to enhance the representations of input. An Answer-guided Graph Convolutional Network (AG-GCN) is designed to capture structure information in revealing the inter-sentences and intra-sentence relations. Experimental results show a state-of-the-art performance of EQG-RACE, which is apparently superior to the baselines. In addition, our work has established a new QG prototype with a reshaped dataset and QG method, which provides an important benchmark for related research in future work. We will make our data and code publicly available for further research.Comment: Accepted by AAAI-202
    • …
    corecore