12 research outputs found
Ontology-Based Sentence Extraction for Answering Why-Question
Most studies on why-question answering systems use keyword-based approaches. They rarely involve a domain ontology to capture the semantics of document contents, especially for detecting the presence of causal relations. Consequently, word-mismatch problems occur and the system often retrieves irrelevant answers. To solve this problem, we propose an answer extraction method that combines a semantic similarity measure with selective causality detection. Selective causality detection is applied because not all sentences belonging to an answer contain causality. Moreover, the motivation for using a semantic similarity measure in the scoring function is to obtain graded scores for the presence of semantic annotations in a sentence, instead of binary 0/1 values. The semantic similarity measure employed is based on the shortest path and the maximum depth of the ontology graph. The evaluation is conducted by comparing the proposed method against comparable ontology-based methods, i.e., sentence extraction using Monge-Elkan with a 0/1 internal similarity function. The proposed method shows improvements in terms of MRR (16%, 0.79 vs. 0.68), P@1 (15%, 0.76 vs. 0.66), P@5 (14%, 0.80 vs. 0.70), and Recall (19%, 0.86 vs. 0.72).
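The scoring idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy ontology distances, the exact normalization of the path-based score, and the token sets are all assumptions; only the contrast between a 0/1 internal similarity and a graded, shortest-path/maximum-depth-based one comes from the abstract.

```python
import math

# Hypothetical shortest-path distances between concept pairs; in the
# paper these would come from a domain ontology graph.
PATH = {("cause", "reason"): 1, ("effect", "result"): 1}
MAX_DEPTH = 6  # assumed maximum depth of the ontology graph

def shortest_path(a, b):
    if a == b:
        return 0
    # Unrelated pairs default to the longest possible path.
    return PATH.get((a, b), PATH.get((b, a), 2 * MAX_DEPTH - 1))

def sem_sim(a, b):
    # A Leacock-Chodorow-style score built from shortest path and
    # maximum depth, normalized to [0, 1] (an assumed formulation).
    raw = -math.log((shortest_path(a, b) + 1) / (2.0 * MAX_DEPTH))
    return raw / -math.log(1.0 / (2.0 * MAX_DEPTH))

def binary_sim(a, b):
    # The 0/1 internal similarity of the baseline method.
    return 1.0 if a == b else 0.0

def monge_elkan(tokens_a, tokens_b, inner):
    # Monge-Elkan: average, over tokens of A, of the best inner
    # similarity achieved against any token of B.
    return sum(max(inner(a, b) for b in tokens_b)
               for a in tokens_a) / len(tokens_a)

q = ["cause", "effect"]     # hypothetical question concepts
s = ["reason", "result"]    # hypothetical sentence concepts
print(monge_elkan(q, s, binary_sim))   # 0.0 -- pure word mismatch
print(monge_elkan(q, s, sem_sim) > 0)  # True -- graded score survives
```

The point of the sketch is the word-mismatch failure mode: with 0/1 internal similarity the score collapses to zero, while the graded path-based measure still rewards ontologically close concepts.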
MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale
We study the zero-shot transfer capabilities of text matching models on a
massive scale, by self-supervised training on 140 source domains from community
question answering forums in English. We investigate the model performances on
nine benchmarks of answer selection and question similarity tasks, and show
that all 140 models transfer surprisingly well, where the large majority of
models substantially outperform common IR baselines. We also demonstrate that
considering a broad selection of source domains is crucial for obtaining the
best zero-shot transfer performance, which contrasts with the standard
procedure of relying merely on the largest and most similar domains. In
addition, we
extensively study how best to combine multiple source domains. We propose to
combine self-supervised and supervised multi-task learning on all available
source domains. Our best zero-shot transfer model considerably outperforms
in-domain BERT and the previous state of the art on six benchmarks.
Fine-tuning our model with in-domain data yields additional large gains and
achieves a new state of the art on all nine benchmarks.
Comment: EMNLP-202
What is not in the Bag of Words for Why-QA?
Answer Re-ranking with bilingual LDA and social QA forum corpus
One of the most important tasks for AI is finding valuable information on the Web. In this research, we develop a question answering system that retrieves answers based on a topic model, bilingual latent Dirichlet allocation (Bi-LDA), and knowledge from a social question answering (SQA) forum such as Yahoo! Answers. Treating question and answer pairs from an SQA forum as a bilingual corpus, a topic shared across question and answer documents is assigned to each term, so that the answer re-ranking system can infer correlations between question terms and answer terms. A query expansion approach based on the topic model obtains a 9% higher top-150 mean reciprocal rank (MRR@150) and a 16% better geometric mean rank compared to a simple matching system using Okapi BM25. In addition, this thesis compares performance across several experimental settings to clarify the factors behind the results.
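The headline metric above, MRR@150, can be computed as follows. This is a generic sketch of the standard metric, not code from the thesis; the rank list is hypothetical.

```python
def mrr_at_k(first_correct_ranks, k=150):
    """Mean reciprocal rank at k: average of 1/rank of the first
    correct answer per question, counting only answers that appear
    within the top k of the ranked list."""
    total = 0.0
    for rank in first_correct_ranks:
        if rank is not None and rank <= k:
            total += 1.0 / rank
    return total / len(first_correct_ranks)

# Hypothetical 1-based ranks of the first correct answer for four
# questions (None = no correct answer retrieved at all).
print(mrr_at_k([1, 2, 4, None]))  # (1 + 0.5 + 0.25 + 0) / 4 = 0.4375
```

A re-ranker that moves correct answers toward rank 1 directly raises this average, which is why the 9% MRR@150 gain is the paper's primary comparison against the BM25 baseline.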
A text mining approach for Arabic question answering systems
As most of the electronic information available nowadays on the web is stored as text, developing Question Answering Systems (QAS) has been the focus of many individual researchers and organizations. Relatively few studies have been produced for extracting answers to "why" and "how to" questions. One reason for this neglect is that, when going beyond sentence boundaries, deriving text structure is a very time-consuming and complex process. This thesis explores a new strategy for dealing with the exponentially large space issue associated with the text derivation task. To our knowledge, to date no systems have attempted to address this type of question for the Arabic language. We propose two analytical models. The first is the Pattern Recognizer, which employs a set of approximately 900 linguistic patterns targeting relationships that hold within sentences. This model is enhanced with three independent algorithms to discover the causal/explanatory role indicated by justification particles. The second model is the Text Parser, which approaches text from a discourse perspective in the framework of Rhetorical Structure Theory (RST). This model is meant to break away from the sentence limit. The Text Parser model is built on top of the output produced by the Pattern Recognizer and incorporates a set of heuristic scores to produce the most suitable structure representing the whole text. The two models are combined in a way that allows the development of an Arabic QAS to deal with "why" and "how to" questions. The Pattern Recognizer model achieved an overall recall of 81% and a precision of 78%. Our question answering system was able to find the correct answer for 68% of the test questions. Our results reveal that justification particles play a key role in indicating intra-sentential relations.
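The pattern-based causal cue detection at the heart of the Pattern Recognizer can be illustrated in miniature. The thesis's roughly 900 patterns are not available here; the two particles below (li'anna, "because", and bisabab, "because of") are common Arabic justification particles used purely as stand-ins, and the matching scheme is an assumption.

```python
import re

# Stand-in patterns for the Pattern Recognizer's justification
# particles (the real system uses ~900 linguistic patterns).
CAUSAL_PATTERNS = [
    re.compile("لأن"),   # li'anna: "because"
    re.compile("بسبب"),  # bisabab: "because of"
]

def has_causal_cue(sentence):
    # Flag a sentence as potentially causal/explanatory if any
    # justification-particle pattern matches it.
    return any(p.search(sentence) for p in CAUSAL_PATTERNS)

# "The train was late because of the storm" -- contains bisabab.
print(has_causal_cue("تأخر القطار بسبب العاصفة"))  # True
```

The real system goes well beyond this sketch: its three algorithms disambiguate whether a matched particle actually signals causality, and the Text Parser then composes sentence-level matches into a discourse structure under RST.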