SubjQA: A Dataset for Subjectivity and Review Comprehension
Subjectivity is the expression of internal opinions or beliefs which cannot
be objectively observed or verified, and has been shown to be important for
sentiment analysis and word-sense disambiguation. Furthermore, subjectivity is
an important aspect of user-generated data. In spite of this, subjectivity has
not been investigated in contexts where such data is widespread, such as in
question answering (QA). We therefore investigate the relationship between
subjectivity and QA, while developing a new dataset. We compare and contrast
with analyses from previous work, and verify that findings regarding
subjectivity still hold when using recently developed NLP architectures. We
find that subjectivity is also an important feature in the case of QA, albeit
with more intricate interactions between subjectivity and QA performance. For
instance, a subjective question may or may not be associated with a subjective
answer. We release an English QA dataset (SubjQA) based on customer reviews,
containing subjectivity annotations for questions and answer spans across 6
distinct domains.
Comment: EMNLP 2020 Long Paper - Camera Read
Knowledge Base Completion: Baseline strikes back (Again)
Knowledge Base Completion has been a very active area recently, where
multiplicative models have generally outperformed additive and other deep
learning methods such as GNNs, CNNs, and path-based models. Several recent KBC papers
propose architectural changes, new training methods, or even a new problem
reformulation. They evaluate their methods on standard benchmark datasets -
FB15k, FB15k-237, WN18, WN18RR, and Yago3-10. Recently, some papers discussed
how 1-N scoring can speed up training and evaluation. In this paper, we discuss
how simply applying this training regime to a basic model like ComplEx gives
near-SOTA performance on all the datasets -- we call this model COMPLEX-V2. We
also highlight how various multiplicative methods recently proposed in
literature benefit from this trick and become indistinguishable in terms of
performance on most datasets. This paper calls for a reassessment of their
individual value, in light of these findings.
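
As a rough illustration of the 1-N scoring idea mentioned in the abstract above, the sketch below scores one (subject, relation) query against every candidate object in a single pass, using the standard ComplEx trilinear score. The abstract itself gives no formulas or code, so the function names, the toy embeddings, and the pure-Python formulation are all assumptions made for illustration; this is not the COMPLEX-V2 implementation.

```python
def complex_score(s, r, o):
    """ComplEx triple score: Re(sum_k s_k * r_k * conj(o_k))."""
    return sum((sk * rk * ok.conjugate()).real for sk, rk, ok in zip(s, r, o))


def score_1_to_n(s, r, entities):
    """Score one (subject, relation) query against every candidate object."""
    # The subject-relation product is computed once and reused for all N candidates.
    query = [sk * rk for sk, rk in zip(s, r)]
    return [
        sum((qk * ok.conjugate()).real for qk, ok in zip(query, o))
        for o in entities
    ]


# Toy 2-dimensional complex embeddings (illustrative values only).
entities = [[1 + 1j, 0 + 2j], [2 + 0j, 1 - 1j], [0 + 1j, 1 + 0j]]
relation = [1 + 0j, 0 + 1j]

# One call yields a score for every entity as the candidate object.
scores = score_1_to_n(entities[0], relation, entities)
best = max(range(len(entities)), key=lambda i: scores[i])
```

In a real system the list of scores would typically feed a softmax or sigmoid loss over all entities, and the loop would be a single matrix product on a GPU; the point here is only that the query side is shared across all N candidates.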
PCoQA: Persian Conversational Question Answering Dataset
People often seek information on a specific topic through a conversation
consisting of a series of questions and answers. To advance conversational
question answering research, we introduce PCoQA, the first
Persian Conversational Question Answering
dataset, a resource comprising information-seeking dialogs encompassing a total
of 9,026 contextually-driven questions. Each dialog involves a questioner, a
responder, and a document from Wikipedia. The questioner asks several
interconnected questions about the text, and the responder provides a span of
the document as the answer for each question. PCoQA is designed to present
novel challenges compared to previous question answering datasets including
having more open-ended non-factual answers, longer answers, and fewer lexical
overlaps. This paper not only presents the comprehensive PCoQA dataset but also
reports the performance of various benchmark models. Our models include
baseline models and pre-trained models, the latter leveraged to boost
performance. The dataset and benchmarks are available at our
GitHub page.
Effects of Layer Freezing when Transferring DeepSpeech to New Languages
In this paper, we train Mozilla's DeepSpeech architecture on German and Swiss
German speech datasets and compare the results of different training methods.
We first train the models from scratch on both languages and then improve upon
the results by using an English pretrained version of DeepSpeech for weight
initialization. We also experiment with the effects of freezing different layers
during training. We see that freezing even a single layer already improves the
results dramatically.
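
The layer-freezing setup described in the abstract above can be pictured with a minimal sketch: weights are initialized from a pretrained model, the first few layers are marked as frozen, and the optimizer step skips them. Everything here is hypothetical (the `Layer` class, the layer names, the plain SGD update, the toy numbers); it does not use DeepSpeech's actual code or API, and the abstract does not say which layers were frozen.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def freeze_first(layers, n):
    """Mark the first n layers as frozen so training leaves them unchanged."""
    for layer in layers[:n]:
        layer.trainable = False


def sgd_step(layers, grads, lr=0.1):
    """Apply one plain SGD update, skipping frozen layers."""
    for layer in layers:
        if not layer.trainable:
            continue
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


# Pretend these weights were loaded from a pretrained English model.
layers = [
    Layer("layer_1", [1.0, 2.0]),
    Layer("layer_2", [3.0]),
    Layer("output", [4.0]),
]
grads = {"layer_1": [1.0, 1.0], "layer_2": [1.0], "output": [1.0]}

freeze_first(layers, 1)  # freeze only the first layer
sgd_step(layers, grads)  # layer_1 keeps its pretrained weights
```

In common deep learning frameworks the same effect is usually achieved by excluding the frozen parameters from the optimizer or disabling their gradients, rather than by an explicit flag check as in this sketch.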
Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations
Learning good representations on multi-relational graphs is essential to knowledge base completion (KBC). In this paper, we propose a new self-supervised training objective for multi-relational
graph representation learning, via simply incorporating relation prediction into the commonly used
1vsAll objective. The new training objective contains not only terms for predicting the subject
and object of a given triple, but also a term for predicting the relation type. We analyse how this
new objective impacts multi-relational learning in KBC: experiments on a variety of datasets and
models show that relation prediction can significantly improve entity ranking, the most widely
used evaluation task for KBC, yielding a 6.1% increase in MRR and 9.9% increase in Hits@1
on FB15k-237 as well as a 3.1% increase in MRR and 3.4% in Hits@1 on Aristo-v4. Moreover,
we observe that the proposed objective is especially effective on highly multi-relational datasets,
i.e. datasets with a large number of predicates, and generates better representations when larger
embedding sizes are used.
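
The objective described in the abstract above, with terms for predicting the subject, the object, and the relation of a triple, might be sketched as follows. This is a hedged illustration only: the DistMult-style trilinear score, the `rel_weight` parameter, and the toy embeddings are assumptions chosen for brevity, since the abstract specifies neither the scoring function nor how the relation term is weighted.

```python
import math


def trilinear(s, r, o):
    """DistMult-style score, used here only as a stand-in scoring function."""
    return sum(a * b * c for a, b, c in zip(s, r, o))


def neg_log_softmax(scores, target):
    """Cross-entropy of the target index under a softmax over all scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(x - m) for x in scores))
    return log_z - scores[target]


def joint_loss(ents, rels, triple, rel_weight=1.0):
    """1vsAll terms for object and subject, plus a relation prediction term."""
    s, r, o = triple
    obj_scores = [trilinear(ents[s], rels[r], e) for e in ents]   # predict o given (s, r)
    subj_scores = [trilinear(e, rels[r], ents[o]) for e in ents]  # predict s given (r, o)
    rel_scores = [trilinear(ents[s], w, ents[o]) for w in rels]   # predict r given (s, o)
    entity_terms = neg_log_softmax(obj_scores, o) + neg_log_softmax(subj_scores, s)
    return entity_terms + rel_weight * neg_log_softmax(rel_scores, r)


# Toy embeddings: three entities, two relations (illustrative values only).
ents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rels = [[1.0, 0.5], [0.5, 1.0]]

with_relation = joint_loss(ents, rels, (0, 0, 2), rel_weight=1.0)
without_relation = joint_loss(ents, rels, (0, 0, 2), rel_weight=0.0)
```

Setting `rel_weight=0.0` recovers the plain 1vsAll objective over subjects and objects, which makes it easy to compare the two objectives on the same triple.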