17,081 research outputs found
Fact Checking in Community Forums
Community Question Answering (cQA) forums are very popular nowadays, as they
represent effective means for communities around particular topics to share
information. Unfortunately, this information is not always factual. Thus, here
we explore a new dimension in the context of cQA, which has been ignored so
far: checking the veracity of answers to particular questions in cQA forums. As
this is a new problem, we create a specialized dataset for it. We further
propose a novel multi-faceted model, which captures information from the answer
content (what is said and how), from the author profile (who says it), from the
rest of the community forum (where it is said), and from external authoritative
sources of information (external support). Evaluation results show a MAP value
of 86.54, which is 21 points absolute above the baseline.Comment: AAAI-2018; Fact-Checking; Veracity; Community-Question Answering;
Neural Networks; Distributed Representation
A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from reliable
amounts of text. Opinion mining holder recognition is a task that has not been
considered yet in Arabic Language. This task essentially requires deep
understanding of clauses structures. Unfortunately, the lack of a robust,
publicly available, Arabic parser further complicates the research. This paper
presents a leading research for the opinion holder extraction in Arabic news
independent from any lexical parsers. We investigate constructing a
comprehensive feature set to compensate the lack of parsing structural
outcomes. The proposed feature set is tuned from English previous works coupled
with our proposed semantic field and named entities features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments achieving 54.03 F-measure. We publicly release our
own research outcome corpus and lexicon for opinion mining community to
encourage further research
Large-Scale Goodness Polarity Lexicons for Community Question Answering
We transfer a key idea from the field of sentiment analysis to a new domain:
community question answering (cQA). The cQA task we are interested in is the
following: given a question and a thread of comments, we want to re-rank the
comments so that the ones that are good answers to the question would be ranked
higher than the bad ones. We notice that good vs. bad comments use specific
vocabulary and that one can often predict the goodness/badness of a comment
even ignoring the question, based on the comment contents only. This leads us
to the idea to build a good/bad polarity lexicon as an analogy to the
positive/negative sentiment polarity lexicons, commonly used in sentiment
analysis. In particular, we use pointwise mutual information in order to build
large-scale goodness polarity lexicons in a semi-supervised manner starting
with a small number of initial seeds. The evaluation results show an
improvement of 0.7 MAP points absolute over a very strong baseline and
state-of-the art performance on SemEval-2016 Task 3.Comment: SIGIR '17, August 07-11, 2017, Shinjuku, Tokyo, Japan; Community
Question Answering; Goodness polarity lexicons; Sentiment Analysi
Adversarial Domain Adaptation for Duplicate Question Detection
We address the problem of detecting duplicate questions in forums, which is
an important step towards automating the process of answering new questions. As
finding and annotating such potential duplicates manually is very tedious and
costly, automatic methods based on machine learning are a viable alternative.
However, many forums do not have annotated data, i.e., questions labeled by
experts as duplicates, and thus a promising solution is to use domain
adaptation from another forum that has such annotations. Here we focus on
adversarial domain adaptation, deriving important findings about when it
performs well and what properties of the domains are important in this regard.
Our experiments with StackExchange data show an average improvement of 5.6%
over the best baseline across multiple pairs of domains.Comment: EMNLP 2018 short paper - camera ready. 8 page
- …