A Method for Short Message Contextualization: Experiments at CLEF/INEX
This paper presents the approach we developed for automatic multi-document summarization applied to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method has been evaluated at the INEX/CLEF Tweet Contextualization track, and we provide the evaluation results over the four years of the track. The method was also adapted to snippet retrieval and query expansion. The evaluation results indicate good performance of the approach.
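The general scoring scheme described above (a product of measures over named entities, part-of-speech weights and sentence quality) can be illustrated with a minimal sketch. The weights, the entity bonus and the quality factor below are hypothetical placeholders for illustration, not the authors' tuned values:

```python
# Hypothetical weights: content words (nouns, verbs) carry more mass
# than function words; proper nouns slightly more than common nouns.
POS_WEIGHTS = {"NOUN": 1.0, "PROPN": 1.2, "VERB": 0.8, "ADJ": 0.6, "OTHER": 0.2}

def sentence_score(tokens, named_entity_count, quality):
    """Score a sentence as a product of measures:
    POS-weighted token mass x named-entity bonus x sentence-quality factor."""
    pos_mass = sum(POS_WEIGHTS.get(pos, POS_WEIGHTS["OTHER"]) for _, pos in tokens)
    ne_bonus = 1.0 + 0.5 * named_entity_count   # hypothetical per-entity bonus
    return pos_mass * ne_bonus * quality

tokens = [("CLEF", "PROPN"), ("organises", "VERB"), ("evaluation", "NOUN"),
          ("tracks", "NOUN"), ("yearly", "OTHER")]
score = sentence_score(tokens, named_entity_count=1, quality=0.9)
```

Because the measures are multiplied rather than summed, a sentence that fails badly on any single dimension (e.g. very low quality) is suppressed even if it scores well elsewhere.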
Automatic summary evaluation: ROUGE modifications
Nowadays there is no common approach to summary evaluation. Manual evaluation is expensive and subjective, and it is not applicable in real time or on a large corpus. Widely used approaches involve little human effort and assume comparison with a set of reference summaries. We tried to overcome the drawbacks of existing metrics, such as ignoring redundant information, synonyms and sentence ordering. Our method combines edit distance, ROUGE-SU and a trigram similarity measure enriched by weights for different parts of speech and by synonyms. Since nouns provide the most valuable information, each sentence is mapped into a set of nouns. If the normalized intersection of any pair is greater than a predefined threshold, the sentences are penalized. When producing extracts there is no need to analyze sentence structure, but sentence ordering is crucial. Sometimes it is impossible to compare sentence order with a gold standard; therefore, similarity between adjacent sentences may be used as a measure of text coherence. Chronological constraint violations should be penalized. The relevance score and the readability assessment may be combined in an F-measure. In order to choose the best parameter values, machine learning can be applied.
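Two of the ingredients described above, the noun-overlap redundancy penalty and the F-measure combination of relevance and readability, can be sketched as follows. The threshold and the example noun sets are illustrative assumptions, not the paper's actual parameter values:

```python
def redundancy_penalty(sentences_nouns, threshold=0.5):
    """Penalise pairs of sentences whose noun sets overlap too much:
    if the normalised intersection of any pair exceeds the threshold,
    the pair counts as redundant."""
    penalty = 0
    for i in range(len(sentences_nouns)):
        for j in range(i + 1, len(sentences_nouns)):
            a, b = sentences_nouns[i], sentences_nouns[j]
            if not a or not b:
                continue
            overlap = len(a & b) / min(len(a), len(b))
            if overlap > threshold:
                penalty += 1
    return penalty

def f_measure(relevance, readability, beta=1.0):
    """Combine a relevance score and a readability score, as suggested above."""
    if relevance + readability == 0:
        return 0.0
    return (1 + beta**2) * relevance * readability / (beta**2 * relevance + readability)

nouns = [{"rouge", "metric", "summary"}, {"summary", "metric", "rouge"}, {"graph"}]
p = redundancy_penalty(nouns)   # the first two sentences overlap fully
```

The `beta` parameter lets one weight readability against relevance, mirroring the standard F-measure trade-off between precision and recall.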
IRIT at INEX 2014: Tweet Contextualization Track
The paper presents IRIT's approach used at the INEX Tweet Contextualization Track 2014. Systems had to provide a context to a tweet from the perspective of an entity. This year we further modified our approach presented at INEX 2011, 2012 and 2013, which is based on the product of different measures drawing on smoothing from the local context, named entity recognition, part-of-speech weighting and sentence quality analysis. We introduced two ways to link an entity and a tweet, namely (1) concatenation of the entity and the tweet, and (2) use of the results obtained for the entity as a restriction to filter the results retrieved for the tweet. Besides, we examined the influence of the topic-comment relationship on contextualization.
Query expansion in information retrieval: What can we learn from a deep analysis of queries?
Information retrieval aims at retrieving relevant documents that answer a user's need expressed through a query. Users' queries are generally shorter than three words, which makes them challenging to answer correctly. Automatic query expansion (QE) improves precision on average, even if it can degrade the results for some queries. In this paper, we propose a new automatic QE method that estimates the importance of expansion candidate terms by the strength of their relation to the query terms. The method combines local and global analysis of texts. We evaluate the method using international benchmark collections and measures, and found results comparable on average to those of the Bo2 method. However, we show that a deep analysis of initial and expanded queries brings interesting insights that could help future research in the domain.
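The idea of ranking expansion candidates by the strength of their relation to the query, combining local and global evidence, can be sketched as below. This is an illustrative scheme (co-occurrence with query terms in feedback documents, damped by an IDF factor), not the exact formula of the paper:

```python
from collections import Counter
from math import log

def expansion_scores(query_terms, feedback_docs, corpus_df, n_docs):
    """Rank candidate expansion terms: local evidence is co-occurrence with
    query terms in top-retrieved (feedback) documents; global evidence is an
    IDF factor computed from corpus document frequencies."""
    query = set(query_terms)
    scores = Counter()
    for doc in feedback_docs:
        terms = set(doc)
        if terms & query:                # the document mentions the query
            for t in terms - query:      # candidate expansion terms
                idf = log(n_docs / (1 + corpus_df.get(t, 0)))
                scores[t] += idf
    return scores.most_common()

docs = [["neural", "retrieval", "ranking"], ["retrieval", "index"], ["cooking"]]
ranked = expansion_scores(["retrieval"], docs,
                          {"ranking": 10, "index": 100, "neural": 5}, n_docs=1000)
```

Here the rarer co-occurring term ("neural", low document frequency) outranks the more common ones, which is the usual effect of the IDF damping.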
Multilingual Fake News Detection with Satire
The information spread through the Web influences politics, stock markets, public health, people's reputations and brands. For these reasons, it is crucial to filter out false information. In this paper, we compare different automatic approaches for fake news detection based on statistical text analysis, using the vaccination fake news dataset provided by the Storyzy company. Our CNN works better for discriminating the larger classes (fake vs. trusted), while the gradient boosting decision tree with a feature stacking approach obtained better results for satire detection. We contribute by showing that efficient satire detection can be achieved using merged embeddings and a specific model, at the cost of performance on the larger classes. We also contribute by deliberately merging redundant information in order to better separate satire news from fake news and trusted news.
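One common way to realise feature stacking for a meta-classifier such as gradient boosting is to concatenate the class probabilities of base models with hand-crafted features. The sketch below uses random placeholder data and hypothetical feature names; it only shows the shape of the stacking step, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: per-document class probabilities from two base text
# classifiers, plus a couple of hand-crafted surface features.
n_docs = 8
cnn_probs = rng.random((n_docs, 3))   # P(fake), P(trusted), P(satire)
bow_probs = rng.random((n_docs, 3))   # same classes from a bag-of-words model
surface = rng.random((n_docs, 2))     # e.g. punctuation ratio, avg word length

# Feature stacking: concatenate base-model outputs and extra features into
# one matrix that a meta-classifier (e.g. gradient boosting) would consume.
stacked = np.hstack([cnn_probs, bow_probs, surface])
```

The meta-classifier then learns when to trust which base model, which is what allows a satire-specialised combination even if the base models favour the larger classes.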
Overview of the CLEF 2022 JOKER Task 1: Classify and Explain Instances of Wordplay
As a multidisciplinary field of study, humour remains one of the most difficult aspects of intercultural communication. Understanding humour often involves understanding implicit cultural references and/or double meanings, which raises the question of how to detect and classify instances of this complex phenomenon. This paper provides an overview of Pilot Task 1 of the CLEF 2022 JOKER track, where participants had to classify and explain instances of wordplay. We introduce a new classification of wordplay and a new annotation scheme for wordplay interpretation suitable both for phrase-based wordplay and wordplay in named entities. We describe the collection of our data, our task setup, and the evaluation procedure, and we give a brief overview of the participating teams' approaches and results.
Overview of the CLEF 2022 JOKER Task 3: Pun Translation from English into French
The translation of puns is one of the most challenging issues for translators and for this reason has become an intensively studied phenomenon in the field of translation studies. Translation technology aims to partially or even totally automate the translation process, but relatively little attention has been paid to the use of computers for the translation of wordplay. The CLEF 2022 JOKER track aims to build a multilingual corpus of wordplay and evaluation metrics in order to advance the automation of creative-language translation. This paper provides an overview of the track's Pilot Task 3, where the goal is to translate entire phrases containing wordplay (particularly puns). We describe the data collection, the task setup, the evaluation procedure, and the participants' results. We also cover a side product of our project, a homogeneous monolingual corpus for wordplay detection in French.
Design and baseline characteristics of the finerenone in reducing cardiovascular mortality and morbidity in diabetic kidney disease trial
Background: Among people with diabetes, those with kidney disease have exceptionally high rates of cardiovascular (CV) morbidity and mortality and of progression of their underlying kidney disease. Finerenone is a novel, nonsteroidal, selective mineralocorticoid receptor antagonist that has been shown to reduce albuminuria in type 2 diabetes (T2D) patients with chronic kidney disease (CKD) while carrying only a low risk of hyperkalemia. However, the effect of finerenone on CV and renal outcomes has not yet been investigated in long-term trials.
Patients and Methods: The Finerenone in Reducing CV Mortality and Morbidity in Diabetic Kidney Disease (FIGARO-DKD) trial aims to assess the efficacy and safety of finerenone compared to placebo at reducing clinically important CV and renal outcomes in T2D patients with CKD. FIGARO-DKD is a randomized, double-blind, placebo-controlled, parallel-group, event-driven trial running in 47 countries with an expected duration of approximately 6 years. FIGARO-DKD randomized 7,437 patients with an estimated glomerular filtration rate ≥ 25 mL/min/1.73 m² and albuminuria (urinary albumin-to-creatinine ratio ≥ 30 to ≤ 5,000 mg/g). The study has at least 90% power to detect a 20% reduction in the risk of the primary outcome (overall two-sided significance level α = 0.05), the composite of time to first occurrence of CV death, nonfatal myocardial infarction, nonfatal stroke, or hospitalization for heart failure.
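The power statement above can be sanity-checked with Schoenfeld's classic approximation for event-driven survival trials. This is a back-of-envelope sketch only, not the trial's actual statistical analysis plan, and it assumes 1:1 allocation and proportional hazards:

```python
from statistics import NormalDist
from math import log

def required_events(hazard_ratio, alpha=0.05, power=0.90):
    """Schoenfeld's approximation for the number of primary-outcome events
    needed in a 1:1 randomised survival trial:
    d = 4 * (z_{1-alpha/2} + z_{power})^2 / (ln HR)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return 4 * (z_a + z_b) ** 2 / log(hazard_ratio) ** 2

# A 20% risk reduction corresponds to a hazard ratio of 0.8.
events = required_events(0.8)   # roughly 840-850 events
```

The required number of events, not the number of patients, drives the duration of an event-driven trial, which is why the protocol fixes an expected duration rather than a hard stop date.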
Conclusions: FIGARO-DKD will determine whether an optimally treated cohort of T2D patients with CKD at high risk of CV and renal events will experience cardiorenal benefits with the addition of finerenone to their treatment regimen.
Trial Registration: EudraCT number: 2015-000950-39; ClinicalTrials.gov identifier: NCT02545049
Contextualization of short texts for information retrieval: application to tweet contextualization and to automatic query expansion
Efficient communication tends to follow the principle of least effort: using a given language, interlocutors do not want to work any harder than necessary to reach understanding. This leads to the extreme compression of texts, especially in electronic communication, e.g. microblogs, SMS and search queries. However, these texts are often not self-contained and need to be explained, since understanding them requires knowledge of terminology, named entities or related facts. The main goal of this research is to provide context from a textual resource to a user or to a system, such as a search engine.

The first aim of this work is to help a user better understand a short message by extracting a context from an external source, such as a text collection, the Web or Wikipedia, by means of text summarization. To this end we developed an approach for automatic multi-document summarization and applied it to short message contextualization, in particular to tweet contextualization. The proposed method is based on named entity recognition, part-of-speech weighting and sentence quality measuring. In contrast to previous research, we introduced an algorithm for smoothing from the local context. Our approach exploits the topic-comment structure of a text. Moreover, we developed a graph-based algorithm for sentence reordering. The method has been evaluated at the INEX/CLEF Tweet Contextualization track, and we provide the evaluation results over the four years of the track. The method was also adapted to snippet retrieval. The evaluation results indicate good performance of the approach.
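The graph-based sentence-reordering step can be sketched as a greedy walk over a sentence-similarity graph, always appending the remaining sentence most similar to the last one so that adjacent sentences stay cohesive. This is an illustrative variant with a simple Jaccard similarity, not the thesis's exact algorithm:

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (punctuation stripped)."""
    a = set(a.lower().replace(".", "").split())
    b = set(b.lower().replace(".", "").split())
    return len(a & b) / len(a | b) if a | b else 0.0

def reorder(sentences):
    """Greedy path through the complete similarity graph: start from the
    first sentence and repeatedly append the most similar unused sentence."""
    if not sentences:
        return []
    remaining = list(sentences[1:])
    ordered = [sentences[0]]
    while remaining:
        best = max(remaining, key=lambda s: jaccard(ordered[-1], s))
        ordered.append(best)
        remaining.remove(best)
    return ordered

sents = ["Tweets are short.", "Short texts lack context.", "Context helps retrieval."]
result = reorder(sents)
```

Here the word chains "short" and "context" link consecutive sentences, so the greedy walk reproduces the coherent order; a summarizer would apply the same idea to extracted sentences whose original order is unknown.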