
    LIMSI@CLEF eHealth 2017 Task 2: Logistic Regression for Automatic Article Ranking

    This paper describes the participation of the LIMSI-MIROR team in CLEF eHealth 2017, Task 2. The task addresses the automatic ranking of articles in order to assist with the screening process of Diagnostic Test Accuracy (DTA) systematic reviews. We used a logistic regression classifier and handled class imbalance with a combination of class reweighting and undersampling. We also experimented with two strategies for relevance feedback. Our best run obtained an overall Average Precision of 0.179 and a Work Saved over Sampling at 95% Recall (WSS@95%) of 0.650. This run used stochastic gradient descent for training but no feature selection or relevance feedback. We observed high performance variation across the queries in the test set. Nonetheless, our results suggest that automatic assistance is promising for ranking the DTA literature, as it could reduce the screening workload for review writers by 65% on average.
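
    The pipeline lends itself to a compact implementation. Below is a minimal sketch, assuming scikit-learn, a TF-IDF bag-of-words representation, and an illustrative undersampling ratio; none of these choices is claimed to be the exact LIMSI-MIROR configuration.

```python
# Sketch of a logistic-regression ranker with class reweighting and
# undersampling. Feature choice, ratio, and hyperparameters are
# illustrative assumptions, not the published configuration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

def rank_articles(train_texts, train_labels, test_texts, neg_ratio=5):
    """Train logistic regression via SGD, then rank test articles
    from most to least likely relevant."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(train_texts)
    y = np.asarray(train_labels)

    # Undersample the majority (irrelevant) class to at most
    # `neg_ratio` negatives per positive example.
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    rng = np.random.default_rng(0)
    neg = rng.choice(neg, size=min(len(neg), neg_ratio * len(pos)),
                     replace=False)
    keep = np.concatenate([pos, neg])

    # Class reweighting handles the remaining imbalance.
    clf = SGDClassifier(loss="log_loss", class_weight="balanced",
                        max_iter=1000)
    clf.fit(X[keep], y[keep])

    scores = clf.decision_function(vec.transform(test_texts))
    return np.argsort(-scores)  # test indices, most relevant first
```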

    NLP Community Perspectives on Replicability

    With recent efforts in drawing attention to the task of replicating and/or reproducing results, for example in the context of COLING 2018 and various LREC workshops, the question arises as to how the NLP community views the topic of replicability in general. Using a survey involving members of the NLP community, we investigate how our community perceives this topic, its relevance, and options for improvement. Based on responses from over two hundred participants, the survey results confirm earlier observations that successful reproducibility requires more than access to code and data. Additionally, the results show that the topic has to be tackled from the side of authors, reviewers, and the community as a whole.

    Automatic classification of registered clinical trials towards the Global Burden of Diseases taxonomy of diseases and injuries

    Includes details on the implementation of MetaMap and IntraMap, the prioritization rules, the test set of clinical trials, and the classification of the external test set according to the 171 GBD categories.
    Dataset S1: Expert-based enrichment database for the classification according to the 28 GBD categories. Manual classification of 503 UMLS concepts that could not be mapped to any of the 28 GBD categories.
    Dataset S2: Expert-based enrichment database for the classification according to the 171 GBD categories. Manual classification of 655 UMLS concepts that could not be mapped to any of the 171 GBD categories, among which 108 could be projected to candidate GBD categories.
    Table S1: Excluded residual GBD categories for the grouping of the GBD cause list into 171 GBD categories. A grouping of 193 GBD categories was defined during the GBD 2010 study to inform policy makers about the main health problems per country. From these 193 GBD categories, we excluded the 22 residual categories listed in the table and developed a classifier for the remaining 171 GBD categories. Among these residual categories, the only categories also excluded in the grouping of 28 GBD categories were “Other infectious diseases” and “Other endocrine, nutritional, blood, and immune disorders”.
    Table S2: Per-category evaluation of the performance of the classifier for the 171 GBD categories plus the “No GBD” category. Number of trials per GBD category from the test set of 2,763 clinical trials; sensitivities, specificities (in %), and likelihood ratios for each of the 171 GBD categories plus the “No GBD” category, for the classifier using the Word Sense Disambiguation server, the expert-based enrichment database, and the priority to the health condition field.
    Table S3: Performance of the 8 versions of the classifier for the 171 GBD categories. Exact-matching and weighted average sensitivities and specificities for 8 versions of the classifier. Exact-matching corresponds to the proportion (in %) of trials for which the automatic GBD classification is correct; it was estimated over all trials (N = 2,763), trials concerning a single GBD category (N = 2,092), trials concerning 2 or more GBD categories (N = 187), and trials not relevant for the GBD (N = 484). The weighted average sensitivity and specificity correspond to the weighted average, across GBD categories, of the sensitivities and specificities for each GBD category plus the “No GBD” category (in %). The 8 versions correspond to the combinations of using or not the Word Sense Disambiguation server during text annotation, the expert-based enrichment database, and the priority to the health condition field as a prioritization rule.
    Table S4: Per-category evaluation of the performance of the baseline for the 28 GBD categories plus the “No GBD” category. Number of trials per GBD category from the test set of 2,763 clinical trials; sensitivities and specificities (in %) for the 28 GBD categories plus the “No GBD” category, for a classification of clinical trial records into GBD categories that does not use the UMLS knowledge source but relies only on the recognition in free text of the disease names defining each GBD category. For this baseline, a clinical trial record was classified with a GBD category if at least one of the 291 disease names from the GBD cause list defining that category appeared verbatim in the condition field, the public title, or the scientific title (each field considered separately), or in at least one of these three text fields. A code sketch of this baseline follows below.
    (DOCX 84 kb)
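
    The verbatim string-matching baseline of Table S4 is simple enough to sketch directly. In the following illustration, the trial record is assumed to be a dict of free-text fields and the GBD cause list a mapping from category to its defining disease names; the field names are assumptions, not the actual data schema.

```python
# Minimal sketch of the verbatim string-matching baseline described above.
# Field names ('condition', 'public_title', 'scientific_title') and the
# data structures are illustrative assumptions.

def classify_trial_baseline(trial, gbd_disease_names):
    """trial: dict of free-text fields.
    gbd_disease_names: dict mapping a GBD category to the list of
    disease names from the GBD cause list that define it."""
    text = " ".join(trial.get(f, "") for f in
                    ("condition", "public_title", "scientific_title")).lower()
    # Assign every category for which at least one defining disease
    # name appears verbatim in the concatenated text fields.
    matched = {category
               for category, names in gbd_disease_names.items()
               if any(name.lower() in text for name in names)}
    return matched or {"No GBD"}
```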

    CLISTER: A Corpus for Semantic Textual Similarity in French Clinical Cases

    Natural Language Processing relies on the availability of annotated corpora for training and evaluating models. Very few resources exist for semantic similarity in the clinical domain in French. Herein, we introduce a definition of similarity guided by clinical facts and apply it to the development of a new shared corpus of 1,000 sentence pairs manually annotated with similarity scores. We evaluate the corpus through experiments of automatic similarity measurement and show that a model of sentence embeddings can capture similarity with state-of-the-art performance on the DEFT STS shared task data set (Spearman = 0.8343). We also show that the content of the CLISTER corpus is complementary to that of DEFT STS.
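
    As an illustration of the kind of evaluation reported above, here is a minimal sketch that scores sentence pairs with a generic multilingual sentence-embedding model and correlates cosine similarities with gold annotations; the model name is an assumption, not necessarily the one used in the paper.

```python
# Sketch of an STS evaluation: embed both sides of each pair, take the
# cosine similarity, and report Spearman correlation with gold scores.
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

def evaluate_sts(sentence_pairs, gold_scores):
    """sentence_pairs: list of (s1, s2) tuples.
    gold_scores: list of human-annotated similarity scores."""
    # Illustrative model choice; any sentence-embedding model works here.
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    s1, s2 = zip(*sentence_pairs)
    e1 = model.encode(list(s1), convert_to_tensor=True)
    e2 = model.encode(list(s2), convert_to_tensor=True)
    # Diagonal of the pairwise matrix = similarity of aligned pairs.
    sims = util.cos_sim(e1, e2).diagonal().cpu().numpy()
    return spearmanr(sims, gold_scores).correlation
```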

    Use of a Citizen Science Platform for the Creation of a Language Resource to Study Bias in Language Models for French: a case study

    There is growing interest in the evaluation of bias, fairness, and social impact of Natural Language Processing models and tools. However, few resources are available for this task in languages other than English. Translation of resources originally developed for English is a promising research direction; however, there is also a need to complement translated resources with newly sourced resources in the original languages and social contexts studied. In order to collect a language resource for the study of biases in Language Models for French, we decided to resort to citizen science. We created three tasks on the LanguageARC citizen science platform to assist with the translation of an existing resource from English into French, as well as the collection of complementary resources in native French. We successfully collected data for all three tasks from a total of 102 volunteer participants. Participants from different parts of the world contributed, and we noted that although calls sent to mailing lists had a positive impact on participation, some participants pointed out barriers to contribution related to the collection platform.

    French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English

    Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. Much work on biases in natural language processing has addressed biases linked to the social and cultural experience of English-speaking individuals in the United States. We seek to widen the scope of bias studies by creating material to measure social bias in language models (LMs) against specific demographic groups in France. We build on the US-centered CrowS-pairs dataset to create a multilingual stereotypes dataset that allows for comparability across languages while also characterizing biases that are specific to each country and language. We introduce 1,677 sentence pairs in French that cover stereotypes in ten types of bias, such as gender and age. 1,467 sentence pairs are translated from CrowS-pairs and 210 are newly crowdsourced and translated back into English. The sentence pairs contrast stereotypes concerning disadvantaged groups with the same sentence concerning advantaged groups. We find that four widely used language models (three French, one multilingual) favor sentences that express stereotypes in most bias categories. We report on the translation process, which led to a characterization of stereotypes in CrowS-pairs, including the identification of US-centric cultural traits. We offer guidelines to further extend the dataset to other languages and cultural environments.
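
    The standard CrowS-Pairs measure compares masked-language-model pseudo-log-likelihoods of the two sentences in each pair. The sketch below is a simplified version of that idea (the published metric masks only the tokens shared by both sentences); the choice of camembert-base is illustrative, not a claim about the models evaluated in the paper.

```python
# Simplified pair scoring in the CrowS-Pairs style: score each sentence
# by masked-token pseudo-log-likelihood, masking one token at a time;
# the model "favors" the stereotyped sentence when it scores higher.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base").eval()

def pseudo_log_likelihood(sentence):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):        # skip special tokens
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id  # mask one token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

def favors_stereotype(stereo_sent, antistereo_sent):
    return (pseudo_log_likelihood(stereo_sent)
            > pseudo_log_likelihood(antistereo_sent))
```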

    French CrowS-Pairs: Extending a Corpus for Measuring Societal Bias in Masked Language Models to a Language Other than English

    Warning: This paper contains statements of stereotypes which may be upsetting. To widen the scope of bias studies in natural language processing beyond American English, we introduce material for measuring social bias in language models against demographic groups in France. We extend the CrowS-pairs dataset with 1,677 sentence pairs in French that cover stereotypes in ten types of bias. 1,467 sentence pairs are translated from CrowS-pairs and 210 are newly crowdsourced and translated back into English. Following the minimal-pair principle, the sentence pairs contrast a stereotyped statement concerning a disadvantaged group with its equivalent for an advantaged group. We find that four widely used language models favor sentences that express stereotypes in most bias categories. We report on the translation process and offer guidelines to further extend the dataset to other languages.

    Reviewing Natural Language Processing Research

    This tutorial will cover the goals, processes, and evaluation of reviewing research in natural language processing. As leading figures in our community have pointed out for years (Webber, 2007), researchers in the ACL community face a heavy, and growing, reviewing burden. Initiatives to lower this burden were discussed at the recent ACL general assembly in Florence (ACL 2019). At the same time, notable "false negatives", i.e., rejection by our conferences of work that was later shown to be tremendously important after acceptance by other conferences (Church, 2005), have raised awareness that our reviewing practices leave something to be desired. We do not often talk about "false positives" with respect to conference papers, but conversations in the hallways at *ACL meetings suggest that we have a publication bias towards papers that report high performance, with perhaps not much else of interest in them (Manning, 2015). It need not be this way. There is good reason to think that reviewing is a learnable (and teachable) skill.

    Automatic inference of indexing rules for MEDLINE

    This paper describes the use and customization of Inductive Logic Programming (ILP) to infer indexing rules from MEDLINE citations. Preliminary results suggest this method may enhance the subheading attachment module of the Medical Text Indexer, a system for assisting MEDLINE indexers.
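
    For illustration only, a rule inferred by such a system might, once applied, look like the sketch below; the rule itself and the field names are hypothetical, invented for this example rather than taken from the paper.

```python
# Hypothetical example of applying one learned indexing rule. The actual
# system infers first-order rules with ILP over MEDLINE citations; the
# specific rule and field names below are invented for illustration.

def attach_subheading(citation):
    """citation: dict with 'mesh_headings' (set of str) and 'title' (str).
    Returns MeSH subheadings suggested by a single example rule."""
    suggestions = set()
    # Example learned rule (hypothetical): if a drug-related heading is
    # present and the title mentions toxicity, attach 'adverse effects'.
    if ("Pharmaceutical Preparations" in citation["mesh_headings"]
            and "toxicity" in citation["title"].lower()):
        suggestions.add("adverse effects")
    return suggestions
```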