6 research outputs found
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points
for a given document. While sequence-to-sequence (seq2seq) models have achieved
remarkable performance on this task (Meng et al., 2017), model training often
relies on large amounts of labeled data, which is only applicable to
resource-rich domains. In this paper, we propose semi-supervised keyphrase
generation methods by leveraging both labeled data and large-scale unlabeled
samples for learning. Two strategies are proposed. First, unlabeled documents
are tagged with synthetic keyphrases obtained from unsupervised keyphrase
extraction methods or a self-learning algorithm, and then combined with labeled
samples for training. Second, we investigate a multi-task learning
framework to jointly learn to generate keyphrases as well as the titles of the
articles. Experimental results show that our semi-supervised learning-based
methods outperform a state-of-the-art model trained with labeled data only.
Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables)
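The first strategy in the abstract above (tagging unlabeled documents with synthetic keyphrases, then mixing them with labeled samples) can be sketched roughly as follows. This is only an illustration under stated assumptions: the frequency-based `extract_keyphrases` is a toy stand-in for the unsupervised extraction methods the abstract refers to, and the function names, the stopword list and the `"gold"`/`"synthetic"` tags are hypothetical, not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the toy extractor.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with", "that", "we"}

def extract_keyphrases(text, top_k=3):
    """Toy unsupervised extractor: rank non-stopword tokens by frequency."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]

def build_training_set(labeled, unlabeled, top_k=3):
    """Combine gold-labeled pairs with synthetically tagged unlabeled documents."""
    synthetic = [(doc, extract_keyphrases(doc, top_k)) for doc in unlabeled]
    gold_rows = [(doc, keys, "gold") for doc, keys in labeled]
    synthetic_rows = [(doc, keys, "synthetic") for doc, keys in synthetic]
    return gold_rows + synthetic_rows
```

Keeping a source tag on each row is one possible design choice: it would let a training loop weight or schedule synthetic samples differently from gold ones, although the abstract does not say whether the authors do this.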
Concept learning consistency under three‑way decision paradigm
Concept Mining is one of the main challenges both in Cognitive Computing and in Machine Learning. The ongoing improvement of solutions to address this issue raises the need to analyze whether the consistency of the learning process is preserved. This paper addresses a particular problem, namely, how the concept mining capability changes under the reconsideration of the hypothesis class. The issue will be raised from the point of view of the so-called Three-Way Decision (3WD) paradigm. The paradigm provides a sound framework to reconsider decision-making processes, including those assisted by Machine Learning. Thus, the paper aims to analyze the influence of 3WD techniques in the Concept Learning Process itself. For this purpose, we introduce new versions of the Vapnik-Chervonenkis dimension. Likewise, to illustrate how the formal approach can be instantiated in a particular model, the case of concept learning in (Fuzzy) Formal Concept Analysis is considered.
This work is supported by State Investigation Agency (Agencia Estatal de Investigación), project PID2019-109152GB-100/AEI/10.13039/501100011033. We acknowledge the reviewers for their suggestions and guidance on additional references that have enriched our paper. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
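For orientation, the classical Vapnik-Chervonenkis dimension that the abstract above builds on can be computed by brute force for a finite hypothesis class over a finite domain. The sketch below covers only the standard definition, not the new 3WD versions introduced in the paper. Representing hypotheses as sets of positive points, and the names `shatters` and `vc_dimension`, are assumptions made for illustration.

```python
from itertools import combinations

def shatters(hypotheses, points):
    """True if every True/False labeling of `points` is realized by some hypothesis."""
    points = tuple(points)
    # Each hypothesis is a set of the points it labels positive.
    realized = {tuple(p in h for p in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

def vc_dimension(hypotheses, domain):
    """Size of the largest subset of `domain` shattered by `hypotheses` (brute force)."""
    best = 0
    for size in range(1, len(domain) + 1):
        if any(shatters(hypotheses, subset) for subset in combinations(domain, size)):
            best = size
        else:
            # Subsets of shattered sets are shattered, so no larger set can succeed.
            break
    return best

# Example: threshold concepts {x : x >= t} on a four-point domain.
domain = [1, 2, 3, 4]
thresholds = [frozenset(x for x in domain if x >= t) for t in range(1, 6)]
```

The search is exponential in the domain size, so this is only practical for small toy classes such as the thresholds above.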
A Probabilistic Framework for Information Modelling and Retrieval Based on User Annotations on Digital Objects
Annotations are a means to make critical remarks, to explain and
comment on things, to add notes and give opinions, and to relate objects.
Nowadays, they can be found in digital libraries and collaboratories,
for example as a building block for scientific discussion on the one
hand or as private notes on the other. We further find them in product
reviews, scientific databases and many "Web 2.0" applications; even
well-established concepts like emails can be regarded as annotations
in a certain sense. Digital annotations can be (textual) comments,
markings (i.e. highlighted parts) and references to other documents
or document parts. Since annotations convey information which is
potentially important to satisfy a user's information need, this
thesis tries to answer the question of how to exploit annotations for
information retrieval. It gives a first answer to the question of whether
retrieval effectiveness can be improved with annotations.
A survey of the "annotation universe" reveals some facets of
annotations; for example, they can be content level annotations
(extending the content of the annotation object) or meta level ones
(saying something about the annotated object). Besides the annotations
themselves, the annotated fragments, which are created during the
process of annotation, can also be interesting for retrieval.
These objects are integrated into an object-oriented model comprising
digital objects such as structured documents and annotations as well
as fragments. In this model, the different relationships among the
various objects are reflected. From this model, the basic data
structure for annotation-based retrieval, the structured annotation
hypertext, is derived.
In order to thoroughly exploit the information contained in structured
annotation hypertexts, a probabilistic, object-oriented logical
framework called POLAR is introduced. In POLAR, structured annotation
hypertexts can be modelled by means of probabilistic propositions and
four-valued logics. POLAR allows for specifying several relationships
among annotations and annotated (sub)parts or fragments. Queries can
be posed to extract the knowledge contained in structured annotation
hypertexts. POLAR supports annotation-based retrieval, i.e. document
and discussion search, by applying an augmentation strategy (knowledge
augmentation, propagating propositions from subcontexts like annotations,
or relevance augmentation, where retrieval status values are propagated)
in conjunction with probabilistic inference, where P(d -> q), the probability
that a document d implies a query q, is estimated.
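To give a rough intuition for knowledge augmentation as described above, the following sketch propagates probabilistic term propositions from annotations into a document context and then estimates P(d -> q). It is not POLAR's actual semantics or implementation. The noisy-OR combination, the single `access_prob` parameter, the weighted-sum estimate of P(d -> q) and all function names are assumptions chosen for illustration.

```python
def augment(doc_terms, annotations, access_prob=0.5):
    """Propagate term propositions from annotations into the document context.

    Evidence for a term is combined with a noisy-OR, which assumes the
    sources are independent (an illustrative assumption, not POLAR's rule).
    """
    augmented = dict(doc_terms)
    for ann_terms in annotations:
        for term, p in ann_terms.items():
            prior = augmented.get(term, 0.0)
            # The annotation's proposition is discounted by the access probability.
            augmented[term] = 1.0 - (1.0 - prior) * (1.0 - access_prob * p)
    return augmented

def implication_prob(context_terms, query_weights):
    """Estimate P(d -> q) as a weight-normalized sum of term probabilities."""
    total = sum(query_weights.values())
    if total == 0:
        return 0.0
    score = sum(w * context_terms.get(t, 0.0) for t, w in query_weights.items())
    return score / total

# Hypothetical toy data: one document with one annotation.
doc = {"retrieval": 0.8}
notes = [{"annotation": 0.6, "retrieval": 0.5}]
query = {"retrieval": 1.0, "annotation": 1.0}
```

With this toy data the augmented context scores higher for the query than the document alone, because the annotation contributes a term the document lacks. That is the effect the augmentation strategy aims for.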
POLAR's semantics is based on possible worlds and accessibility
relations. It is implemented on top of four-valued probabilistic Datalog.
POLAR's core retrieval functionality, knowledge augmentation with
probabilistic inference, is evaluated for discussion and document
search. The experiments show that all relevant POLAR objects, merged
annotation targets, fragments and content annotations, are able to
increase retrieval effectiveness when used as a context for discussion
or document search. Additional experiments reveal that we can determine
the polarity of annotations with an accuracy of around 80%.