A Large-Scale Dataset for Biomedical Keyphrase Generation
Keyphrase generation is the task of generating a set of words or phrases that highlight the main topics of a document. Few datasets exist for keyphrase generation in the biomedical domain, and those that do are too small for training generative models. In this paper, we
introduce kp-biomed, the first large-scale biomedical keyphrase generation
dataset with more than 5M documents collected from PubMed abstracts. We train
and release several generative models and conduct a series of experiments
showing that using large-scale datasets significantly improves performance for both present and absent keyphrase generation. The dataset is available under a CC-BY-NC v4.0 license at https://huggingface.co/datasets/taln-ls2n/kpbiomed.
Comment: Accepted at the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022).
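The present/absent distinction evaluated in the abstract can be made concrete: a gold keyphrase is "present" if it appears verbatim in the source text and "absent" otherwise. A minimal sketch (the example abstract and keyphrases below are illustrative, not taken from kp-biomed):

```python
def split_present_absent(abstract, keyphrases):
    """Partition gold keyphrases into 'present' (appearing verbatim in the
    abstract) and 'absent' (requiring generation beyond extraction)."""
    text = abstract.lower()
    present = [kp for kp in keyphrases if kp.lower() in text]
    absent = [kp for kp in keyphrases if kp.lower() not in text]
    return present, absent

# Illustrative example (not real kp-biomed data):
abstract = "We study keyphrase generation for PubMed abstracts."
gold = ["keyphrase generation", "PubMed", "biomedical text mining"]
present, absent = split_present_absent(abstract, gold)
# present keyphrases can be copied from the text; absent ones must be generated.
```

Real evaluation pipelines typically stem or lemmatize both sides before matching; the exact-substring check here is the simplest possible variant.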
Combining compound and single terms under language model framework
Most existing Information Retrieval models, including probabilistic and vector space models, are based on the term independence hypothesis. To go beyond this assumption, and thereby capture the semantics of documents and queries more accurately, several works have incorporated phrases or other syntactic information into IR; such attempts have shown slight benefit at best. In language modeling approaches in particular, this extension is achieved through the use of bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper we introduce a new approach to select and weight the relevant n-grams associated with a document. Experimental results on three TREC test collections show an improvement over three strong state-of-the-art baselines: the original unigram language model, the Markov Random Field model, and the positional language model.
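The core idea, non-uniform weighting of bigrams inside a language modeling retrieval score, can be sketched as an interpolated query-likelihood model where each bigram carries its own relevance weight. This is an illustrative simplification, not the paper's exact model; the function and parameter names are assumptions:

```python
import math
from collections import Counter

def score(query_terms, doc_terms, bigram_weight, lam=0.5):
    """Query log-likelihood under an interpolation of an add-one-smoothed
    unigram model and a per-bigram-weighted bigram model.
    bigram_weight maps (prev, term) pairs to a relevance weight in [0, 1]."""
    n = len(doc_terms)
    uni = Counter(doc_terms)
    bi = Counter(zip(doc_terms, doc_terms[1:]))
    logp = 0.0
    prev = None
    for t in query_terms:
        p_uni = (uni[t] + 1) / (n + len(uni))        # smoothed unigram probability
        if prev is not None and bi[(prev, t)] > 0:
            w = bigram_weight.get((prev, t), 1.0)    # per-bigram relevance weight
            p_bi = w * bi[(prev, t)] / max(uni[prev], 1)
            p = lam * p_uni + (1 - lam) * min(p_bi, 1.0)
        else:
            p = p_uni                                # back off to the unigram model
        logp += math.log(p)
        prev = t
    return logp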
TemaTres: software to implement thesauri
Thesauri, as standardized documentary languages, have been an important development for document management, supported by software applications such as TemaTres, a controlled vocabulary server. This free web application provides continuous access to sets of documents on specific topics for study, research, and decision making. This paper presents the installation, functionality, and interface of the application, showing its structure and ways of exploiting TemaTres for the visual representation and management of controlled vocabularies.
Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis
Video sentiment analysis as a decision-making process is inherently complex,
involving the fusion of decisions from multiple modalities and the cognitive
biases this fusion induces. Inspired by recent advances in quantum cognition, we show
that the sentiment judgment from one modality could be incompatible with the
judgment from another, i.e., the order matters and they cannot be jointly
measured to produce a final decision. Thus the cognitive process exhibits
"quantum-like" biases that cannot be captured by classical probability
theories. Accordingly, we propose a fundamentally new, quantum cognitively
motivated fusion strategy for predicting sentiment judgments. In particular, we
formulate utterances as quantum superposition states of positive and negative
sentiment judgments, and uni-modal classifiers as mutually incompatible
observables, on a complex-valued Hilbert space with positive-operator valued
measures. Experiments on two benchmarking datasets illustrate that our model
significantly outperforms various existing decision-level and a range of state-of-the-art content-level fusion approaches. The results also show that the concept of incompatibility allows effective handling of all combination patterns, including extreme cases that are wrongly predicted by all uni-modal classifiers.
Comment: The uploaded version is a preprint of the accepted AAAI-21 paper.
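The incompatibility the abstract describes has a direct quantum-mechanical reading: two measurements whose projectors do not commute give order-dependent outcomes. A minimal numpy sketch of this order effect (the state, the modality labels, and the measurement bases are illustrative assumptions, not the paper's learned parameters):

```python
import numpy as np

# Utterance as a superposition of positive/negative sentiment in C^2.
psi = np.array([0.8, 0.6 + 0.0j])      # amplitudes for |positive>, |negative>
psi = psi / np.linalg.norm(psi)

# Two uni-modal "classifiers" as projective measurements in different bases;
# their projectors do not commute, so they are mutually incompatible.
P_text = np.outer([1, 0], [1, 0]).astype(complex)   # text modality: "positive"
v = np.array([1, 1]) / np.sqrt(2)
P_audio = np.outer(v, v.conj()).astype(complex)     # audio modality: "positive"

def seq_prob(state, first, second):
    """P(first says positive, then second says positive) under sequential
    projective measurement with collapse in between."""
    collapsed = first @ state
    p1 = np.linalg.norm(collapsed) ** 2
    if p1 == 0:
        return 0.0
    collapsed = collapsed / np.linalg.norm(collapsed)  # post-measurement state
    p2 = np.linalg.norm(second @ collapsed) ** 2
    return p1 * p2

p_text_audio = seq_prob(psi, P_text, P_audio)   # text judged first
p_audio_text = seq_prob(psi, P_audio, P_text)   # audio judged first
# The two orders give different joint probabilities: the order matters,
# which classical (commutative) probability cannot reproduce.
```

In the paper the measurements are positive-operator valued measures learned on a complex-valued Hilbert space; rank-one projectors are the simplest special case that already exhibits the order effect.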
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations
As transparency becomes key for robotics and AI, it will be necessary to
evaluate the methods through which transparency is provided, including
automatically generated natural language (NL) explanations. Here, we explore
parallels between the generation of such explanations and the much-studied
field of evaluation of Natural Language Generation (NLG). Specifically, we
investigate which of the NLG evaluation measures map well to explanations. We
present the ExBAN corpus: a crowd-sourced corpus of NL explanations for
Bayesian Networks. We compute correlations between human subjective ratings
and automatic NLG metrics. We find that embedding-based NLG evaluation
methods, such as BERTScore and BLEURT, have a higher correlation with human
ratings, compared to word-overlap metrics, such as BLEU and ROUGE. This work
has implications for Explainable AI and transparent robotic and autonomous
systems.
Comment: Accepted at EACL 2021.
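The study's core measurement is a rank correlation between automatic metric scores and human ratings over a set of explanations. A self-contained sketch using Spearman's correlation (the five scores below are invented for illustration, not ExBAN data):

```python
def rankdata(xs):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = rankdata(a), rankdata(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical scores for five explanations (illustrative only):
human     = [4.5, 3.0, 2.0, 4.0, 1.5]         # subjective ratings
bertscore = [0.92, 0.81, 0.70, 0.88, 0.65]    # embedding-based metric
bleu      = [0.40, 0.45, 0.20, 0.30, 0.25]    # word-overlap metric
# Here bertscore ranks the explanations exactly as the humans do,
# while bleu agrees only partially: the paper's finding in miniature.
```

In practice one would use scipy.stats.spearmanr, which also reports a p-value; the manual version above keeps the example dependency-free.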
Is Meta-Learning the Right Approach for the Cold-Start Problem in Recommender Systems?
Recommender systems have become fundamental building blocks of modern online
products and services, and have a substantial impact on user experience. In the
past few years, deep learning methods have attracted a great deal of research attention, and are
now heavily used in modern real-world recommender systems. Nevertheless,
dealing with recommendations in the cold-start setting, e.g., when a user has
had only limited interactions with the system, is a problem that remains far from
solved. Meta-learning techniques, and in particular optimization-based
meta-learning, have recently become the most popular approaches in the academic
research literature for tackling the cold-start problem in deep learning models
for recommender systems. However, current meta-learning approaches are not
practical for real-world recommender systems, which have billions of users and
items, and strict latency requirements. In this paper we show that it is
possible to obtain similar, or higher, performance on commonly used
benchmarks for the cold-start problem without using meta-learning techniques.
In more detail, we show that, when tuned correctly, standard and widely adopted
deep learning models perform just as well as newer meta-learning models. We
further show that an extremely simple modular approach using common
representation learning techniques, can perform comparably to meta-learning
techniques specifically designed for the cold-start setting while being much
more easily deployable in real-world applications.
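One way to picture the "extremely simple modular approach using common representation learning techniques" is a cold-start baseline that builds a user vector from the few items already seen and ranks by dot product. This is a generic sketch under that reading, not the paper's actual architecture; all names and the random embeddings are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 32
item_emb = rng.normal(size=(n_items, dim))   # stand-in for pretrained item embeddings

def cold_user_vector(interacted_items):
    """Represent a cold-start user as the mean embedding of the few items
    they have interacted with: a modular baseline, no meta-learning involved."""
    return item_emb[interacted_items].mean(axis=0)

def recommend(user_vec, k=5, exclude=()):
    """Rank all items by dot-product score, masking out already-seen items."""
    scores = item_emb @ user_vec
    scores[list(exclude)] = -np.inf
    return np.argsort(-scores)[:k]

seen = [3, 17, 42]                            # the user's only interactions so far
top = recommend(cold_user_vector(seen), k=5, exclude=seen)
```

Because the item encoder and the scoring head are separate modules, the item representations can be precomputed and served at low latency, which is the deployability argument the abstract makes against per-user meta-learned adaptation.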