A Large-Scale Dataset for Biomedical Keyphrase Generation
Keyphrase generation is the task of generating a set of words or phrases that highlight the main topics of a document. Few datasets exist for keyphrase generation in the biomedical domain, and those that do are too small for training generative models. In this paper, we
introduce kp-biomed, the first large-scale biomedical keyphrase generation
dataset with more than 5M documents collected from PubMed abstracts. We train
and release several generative models and conduct a series of experiments
showing that using large-scale datasets significantly improves performance for both present and absent keyphrase generation. The dataset is available under a CC-BY-NC v4.0 license at https://huggingface.co/datasets/taln-ls2n/kpbiomed.
Comment: Accepted at the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI 2022).
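The present/absent distinction evaluated in the abstract can be made concrete: a gold keyphrase is "present" if it appears verbatim in the source text and "absent" otherwise. A minimal sketch (the example abstract and keyphrases below are illustrative, not taken from kp-biomed):

```python
def split_present_absent(abstract, keyphrases):
    """Partition gold keyphrases into 'present' (appearing verbatim in the
    abstract) and 'absent' (requiring generation beyond extraction)."""
    text = abstract.lower()
    present = [kp for kp in keyphrases if kp.lower() in text]
    absent = [kp for kp in keyphrases if kp.lower() not in text]
    return present, absent

# Illustrative example (not real kp-biomed data):
abstract = "We study keyphrase generation for PubMed abstracts."
gold = ["keyphrase generation", "PubMed", "biomedical text mining"]
present, absent = split_present_absent(abstract, gold)
# present keyphrases can be copied from the text; absent ones must be generated.
```

Real evaluation pipelines typically stem or lemmatize both sides before matching; the exact-substring check here is the simplest possible variant.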
Combining compound and single terms under language model framework
Most existing Information Retrieval models, including probabilistic and vector space models, are based on the term independence hypothesis. To go beyond this assumption, and thereby capture the semantics of documents and queries more accurately, several works have incorporated phrases or other syntactic information into IR; such attempts have shown slight benefit at best. In language modeling approaches in particular, this extension is achieved through the use of bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper we introduce a new approach to select and weight the relevant n-grams associated with a document. Experimental results on three TREC test collections show an improvement over three strong state-of-the-art baselines: the original unigram language model, the Markov Random Field model, and the positional language model.
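The core idea, non-uniform weighting of bigrams inside a language modeling retrieval score, can be sketched as an interpolated query-likelihood model where each bigram carries its own relevance weight. This is an illustrative simplification, not the paper's exact model; the function and parameter names are assumptions:

```python
import math
from collections import Counter

def score(query_terms, doc_terms, bigram_weight, lam=0.5):
    """Query log-likelihood under an interpolation of an add-one-smoothed
    unigram model and a per-bigram-weighted bigram model.
    bigram_weight maps (prev, term) pairs to a relevance weight in [0, 1]."""
    n = len(doc_terms)
    uni = Counter(doc_terms)
    bi = Counter(zip(doc_terms, doc_terms[1:]))
    logp = 0.0
    prev = None
    for t in query_terms:
        p_uni = (uni[t] + 1) / (n + len(uni))        # smoothed unigram probability
        if prev is not None and bi[(prev, t)] > 0:
            w = bigram_weight.get((prev, t), 1.0)    # per-bigram relevance weight
            p_bi = w * bi[(prev, t)] / max(uni[prev], 1)
            p = lam * p_uni + (1 - lam) * min(p_bi, 1.0)
        else:
            p = p_uni                                # back off to the unigram model
        logp += math.log(p)
        prev = t
    return logp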
TemaTres: software to implement thesauri
Thesauri, as standardized documentary languages, have been an important development for document management, supported by software applications such as TemaTres, a controlled vocabulary server. This free web application provides continuous access to sets of documents on specific topics for study, research, and decision making. This paper presents the installation, functionality, and interface of the application, showing its structure and ways of exploiting TemaTres for the visual representation and management of controlled vocabularies.
Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis
Video sentiment analysis as a decision-making process is inherently complex,
involving the fusion of decisions from multiple modalities and the cognitive
biases this fusion induces. Inspired by recent advances in quantum cognition, we show
that the sentiment judgment from one modality could be incompatible with the
judgment from another, i.e., the order matters and they cannot be jointly
measured to produce a final decision. Thus the cognitive process exhibits
"quantum-like" biases that cannot be captured by classical probability
theories. Accordingly, we propose a fundamentally new, quantum cognitively
motivated fusion strategy for predicting sentiment judgments. In particular, we
formulate utterances as quantum superposition states of positive and negative
sentiment judgments, and uni-modal classifiers as mutually incompatible
observables, on a complex-valued Hilbert space with positive-operator valued
measures. Experiments on two benchmarking datasets illustrate that our model
significantly outperforms various existing decision-level and a range of state-of-the-art content-level fusion approaches. The results also show that the concept of incompatibility allows effective handling of all combination patterns, including extreme cases that are wrongly predicted by all uni-modal classifiers.
Comment: The uploaded version is a preprint of the accepted AAAI-21 paper.
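The incompatibility the abstract describes has a direct quantum-mechanical reading: two measurements whose projectors do not commute give order-dependent outcomes. A minimal numpy sketch of this order effect (the state, the modality labels, and the measurement bases are illustrative assumptions, not the paper's learned parameters):

```python
import numpy as np

# Utterance as a superposition of positive/negative sentiment in C^2.
psi = np.array([0.8, 0.6 + 0.0j])      # amplitudes for |positive>, |negative>
psi = psi / np.linalg.norm(psi)

# Two uni-modal "classifiers" as projective measurements in different bases;
# their projectors do not commute, so they are mutually incompatible.
P_text = np.outer([1, 0], [1, 0]).astype(complex)   # text modality: "positive"
v = np.array([1, 1]) / np.sqrt(2)
P_audio = np.outer(v, v.conj()).astype(complex)     # audio modality: "positive"

def seq_prob(state, first, second):
    """P(first says positive, then second says positive) under sequential
    projective measurement with collapse in between."""
    collapsed = first @ state
    p1 = np.linalg.norm(collapsed) ** 2
    if p1 == 0:
        return 0.0
    collapsed = collapsed / np.linalg.norm(collapsed)  # post-measurement state
    p2 = np.linalg.norm(second @ collapsed) ** 2
    return p1 * p2

p_text_audio = seq_prob(psi, P_text, P_audio)   # text judged first
p_audio_text = seq_prob(psi, P_audio, P_text)   # audio judged first
# The two orders give different joint probabilities: the order matters,
# which classical (commutative) probability cannot reproduce.
```

In the paper the measurements are positive-operator valued measures learned on a complex-valued Hilbert space; rank-one projectors are the simplest special case that already exhibits the order effect.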
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations
As transparency becomes key for robotics and AI, it will be necessary to
evaluate the methods through which transparency is provided, including
automatically generated natural language (NL) explanations. Here, we explore
parallels between the generation of such explanations and the much-studied
field of evaluation of Natural Language Generation (NLG). Specifically, we
investigate which of the NLG evaluation measures map well to explanations. We
present the ExBAN corpus: a crowd-sourced corpus of NL explanations for
Bayesian Networks. We compute correlations between human subjective ratings
and automatic NLG metrics. We find that embedding-based NLG evaluation
methods, such as BERTScore and BLEURT, have a higher correlation with human
ratings, compared to word-overlap metrics, such as BLEU and ROUGE. This work
has implications for Explainable AI and transparent robotic and autonomous
systems.
Comment: Accepted at EACL 2021.
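The study's core measurement is a rank correlation between automatic metric scores and human ratings over a set of explanations. A self-contained sketch using Spearman's correlation (the five scores below are invented for illustration, not ExBAN data):

```python
def rankdata(xs):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = rankdata(a), rankdata(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical scores for five explanations (illustrative only):
human     = [4.5, 3.0, 2.0, 4.0, 1.5]         # subjective ratings
bertscore = [0.92, 0.81, 0.70, 0.88, 0.65]    # embedding-based metric
bleu      = [0.40, 0.45, 0.20, 0.30, 0.25]    # word-overlap metric
# Here bertscore ranks the explanations exactly as the humans do,
# while bleu agrees only partially: the paper's finding in miniature.
```

In practice one would use scipy.stats.spearmanr, which also reports a p-value; the manual version above keeps the example dependency-free.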
Is Meta-Learning the Right Approach for the Cold-Start Problem in Recommender Systems?
Recommender systems have become fundamental building blocks of modern online
products and services, and have a substantial impact on user experience. In the
past few years, deep learning methods have attracted a great deal of research attention, and are
now heavily used in modern real-world recommender systems. Nevertheless,
dealing with recommendations in the cold-start setting, e.g., when a user has
had only limited interactions with the system, is a problem that remains far from
solved. Meta-learning techniques, and in particular optimization-based
meta-learning, have recently become the most popular approaches in the academic
research literature for tackling the cold-start problem in deep learning models
for recommender systems. However, current meta-learning approaches are not
practical for real-world recommender systems, which have billions of users and
items, and strict latency requirements. In this paper we show that it is
possible to obtain similar, or higher, performance on commonly used
benchmarks for the cold-start problem without using meta-learning techniques.
In more detail, we show that, when tuned correctly, standard and widely adopted
deep learning models perform just as well as newer meta-learning models. We
further show that an extremely simple modular approach using common
representation learning techniques, can perform comparably to meta-learning
techniques specifically designed for the cold-start setting while being much
more easily deployable in real-world applications.
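One way to picture the "extremely simple modular approach using common representation learning techniques" is a cold-start baseline that builds a user vector from the few items already seen and ranks by dot product. This is a generic sketch under that reading, not the paper's actual architecture; all names and the random embeddings are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 1000, 32
item_emb = rng.normal(size=(n_items, dim))   # stand-in for pretrained item embeddings

def cold_user_vector(interacted_items):
    """Represent a cold-start user as the mean embedding of the few items
    they have interacted with: a modular baseline, no meta-learning involved."""
    return item_emb[interacted_items].mean(axis=0)

def recommend(user_vec, k=5, exclude=()):
    """Rank all items by dot-product score, masking out already-seen items."""
    scores = item_emb @ user_vec
    scores[list(exclude)] = -np.inf
    return np.argsort(-scores)[:k]

seen = [3, 17, 42]                            # the user's only interactions so far
top = recommend(cold_user_vector(seen), k=5, exclude=seen)
```

Because the item encoder and the scoring head are separate modules, the item representations can be precomputed and served at low latency, which is the deployability argument the abstract makes against per-user meta-learned adaptation.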