
    Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

    In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question into a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, from well-formed questions to short 'telegraphic' keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see 5-16% absolute improvement in mean average precision (MAP), compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries. Comment: Accepted to Information Retrieval Journal
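    As a rough illustration of the latent-interpretation idea, the Python sketch below ranks entities by aggregating pair scores over all candidate interpretations instead of committing to a single parse. It is not AQQUCN's implementation: pair_score stands in for the paper's combined convolutional-network scorer, and both it and the interpretation generator are hypothetical placeholders.

        def rank_entities(query, interpretations, candidates, pair_score):
            """Rank candidate entities by aggregating (interpretation, entity)
            pair scores over the latent interpretation variable."""
            scores = {}
            for entity in candidates:
                # Treat the interpretation as latent: score the entity by its
                # best-scoring interpretation rather than a single fixed parse.
                scores[entity] = max(
                    pair_score(query, interp, entity) for interp in interpretations
                )
            # Entities are ranked directly, which is what MAP evaluates.
            return sorted(candidates, key=scores.get, reverse=True)

    Taking the max over interpretations is one plausible aggregation; a sum or softmax-weighted mixture would marginalize over interpretations instead.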

    Cross-lingual question answering

    Question Answering has become an intensively researched area in the last decade, seen as the next step beyond Information Retrieval in the attempt to provide more concise and better access to large volumes of available information. Question Answering builds on Information Retrieval technology for a first pass over possibly relevant data, and uses further natural language processing techniques to search for candidate answers and to look for clues that accept or invalidate the candidates as right answers to the question. Though most research has been carried out in monolingual settings, where the question and the answer-bearing documents share the same natural language, current approaches concentrate on cross-language scenarios, where the question and the documents are in different languages. Known in this context, and shared with Information Retrieval research, are three methods of crossing the language barrier: translating the question, translating the documents, or mapping both the question and the documents to a common interlingual representation. We present a cross-lingual English-to-German Question Answering system, for both factoid and definition questions, using a German monolingual system and translating the questions from English to German. Two different translation techniques are evaluated:
    • direct translation of the English input question into German, and
    • transfer-based translation, using an intermediate representation that captures the "meaning" of the original question and is translated into the target language.
    For both translation techniques two types of translation tools are used: bilingual dictionaries and machine translation. The intermediate representation captures the semantic meaning of the question in terms of Question Type (QType), Expected Answer Type (EAType) and Focus, information that steers the workflow of the question answering process. The German monolingual Question Answering system can answer both factoid and definition questions and is based on several premises:
    • facts and definitions are usually expressed locally, at the level of a sentence and its surroundings;
    • proximity of concepts within a sentence can be related to their semantic dependency;
    • for factoid questions, redundancy of candidate answers is a good indicator of their suitability;
    • definitions of concepts are expressed using fixed linguistic structures such as appositions, modifiers, and abbreviation extensions.
    Extensive evaluations of the monolingual system, sketched below, have shown that the above premises hold true in most cases when dealing with a fairly large collection of documents, like the one used in the CLEF evaluation forum.
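    As a minimal illustration of the transfer-based route, the Python sketch below translates the intermediate representation rather than the surface question. The helpers analyze, translate_term, and monolingual_qa are hypothetical placeholders; the thesis's actual components are more elaborate.

        from dataclasses import dataclass

        @dataclass
        class QuestionRepresentation:
            """Interlingual representation capturing the question's 'meaning'."""
            qtype: str       # e.g. "FACTOID" or "DEFINITION"
            eatype: str      # expected answer type, e.g. "PERSON", "DATE"
            focus: str       # the concept the question is about
            keywords: list   # remaining content terms

        def transfer_based_qa(english_question, analyze, translate_term,
                              monolingual_qa):
            """Cross the language barrier at the level of the representation."""
            rep = analyze(english_question)          # English-side analysis
            german_rep = QuestionRepresentation(
                qtype=rep.qtype,                     # language-independent
                eatype=rep.eatype,                   # language-independent
                focus=translate_term(rep.focus),     # dictionary or MT lookup
                keywords=[translate_term(k) for k in rep.keywords],
            )
            # QType and EAType steer the German QA workflow; Focus and
            # keywords drive retrieval of answer-bearing German sentences.
            return monolingual_qa(german_rep)

    The design point this makes concrete: QType and EAType survive translation unchanged, so only the lexical material (Focus, keywords) passes through the bilingual dictionary or machine translation.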

    Question Answering Systems

    An NLP system capable of understanding questions and answering them using the DBpedia ontology
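    As a generic illustration of answering over the DBpedia ontology (not this system's code; the example question and its mapping to dbo:author are arbitrary), a question such as "Who wrote Don Quixote?" can be compiled into a SPARQL query against the public endpoint:

        from SPARQLWrapper import SPARQLWrapper, JSON

        # Map the parsed question to an ontology property (here dbo:author)
        # and the recognized entity to its DBpedia resource URI.
        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery("""
            PREFIX dbo: <http://dbpedia.org/ontology/>
            SELECT ?author WHERE {
                <http://dbpedia.org/resource/Don_Quixote> dbo:author ?author .
            }
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        for row in results["results"]["bindings"]:
            print(row["author"]["value"])   # e.g. the Cervantes resource URI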

    Arabic Question Answering on the Holy Qur'an

    In this dissertation, we address the need for an intelligent machine reading at scale (MRS) Question Answering (QA) system on the Holy Qur'an, given the permanent interest of inquisitors and knowledge seekers in this sacred and fertile knowledge resource. We adopt a pipelined Retriever-Reader architecture for our system to constitute (to the best of our knowledge) the first extractive MRS QA system on the Holy Qur'an. We also construct QRCD as the first extractive Qur'anic Reading Comprehension Dataset, composed of 1,337 question-passage-answer triplets for 1,093 question-passage pairs that comprise single-answer and multi-answer questions in modern standard Arabic (MSA). We then develop a sparse bag-of-words passage retriever over an index of Qur'anic passages expanded with Qur'an-related MSA resources, to help bridge the gap between questions posed in MSA and their answers in Qur'anic Classical Arabic (CA). Next, we introduce CLassical AraBERT (CL-AraBERT for short), a new AraBERT-based pre-trained model that is further pre-trained on a Classical Arabic dataset of about 1.05B words (after being initially pre-trained on MSA datasets), to make it a better fit for NLP tasks on CA text such as the Holy Qur'an. We leverage cross-lingual transfer learning from MSA to CA, and fine-tune CL-AraBERT as a reader using a couple of MSA-based MRC datasets followed by fine-tuning on our QRCD dataset, to bridge the above MSA-to-CA gap and circumvent the lack of MRC datasets in CA. Finally, we integrate the retriever and reader components of the end-to-end QA system such that the top k retrieved answer-bearing passages for a given question are fed to the fine-tuned CL-AraBERT reader for answer extraction. We first evaluate the retriever and the reader components independently, before evaluating the end-to-end QA system using Partial Average Precision (pAP). We introduce pAP as an adapted version of the traditional rank-based Average Precision measure that integrates partial matching in the evaluation over multi-answer and single-answer questions. Our experiments show that a passage retriever over a BM25 index of Qur'anic passages expanded with two MSA resources significantly outperformed a baseline retriever over an index of Qur'anic passages only. Moreover, we empirically show that the fine-tuned CL-AraBERT reader model significantly outperformed the similarly fine-tuned AraBERT model, which is the baseline. In general, the CL-AraBERT reader performed better on single-answer questions than on multi-answer questions, and it also outperformed the baseline on both types of questions. Furthermore, despite the integral contribution of fine-tuning with the MSA datasets to the performance of the readers, relying exclusively on those datasets (without MRC datasets in CA, e.g., QRCD) may not be sufficient for our reader models. This finding demonstrates the relatively high impact of the QRCD dataset, despite its modest size. As for the QA system, it consistently performed better on single-answer questions than on multi-answer questions. However, our experiments provide enough evidence to suggest that a native BERT-based model architecture fine-tuned on the MRC task may not be intrinsically optimal for multi-answer questions.
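    As a rough illustration of the pipelined Retriever-Reader integration described above, the Python sketch below feeds the top-k retrieved passages to the reader and returns a ranked answer list. The interfaces bm25_index.top_k and reader.extract are assumed placeholders (not the dissertation's code), and fusing scores by a simple product is likewise an assumption.

        def answer_question(question, bm25_index, reader, k=10):
            """End-to-end QA: retrieve top-k answer-bearing passages, then
            let the fine-tuned reader extract and score answer spans."""
            # Sparse bag-of-words retrieval over the expanded passage index.
            passages = bm25_index.top_k(question, k)
            spans = []
            for passage in passages:
                # The reader yields (answer_span, score) pairs per passage.
                for answer, reader_score in reader.extract(question, passage.text):
                    # Combine retrieval and reading evidence; a product is
                    # one plausible fusion (an assumption, not the thesis's).
                    spans.append((answer, reader_score * passage.score))
            # Answers ranked by combined score, as evaluated with pAP.
            return sorted(spans, key=lambda s: s[1], reverse=True)

    Returning a ranked list rather than a single span matters here, since pAP is a rank-based measure and multi-answer questions credit several correct spans.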