468 research outputs found

Processing Metonymy: a Domain-Model Heuristic Graph Traversal Approach

Author: Bachimont Bruno
Bouaud Jacques
Zweigenbaum Pierre
Publication date: 01/01/1996

We address here the treatment of metonymic expressions from a knowledge representation perspective, that is, in the context of a text understanding system which aims to build a conceptual representation from texts according to a domain model expressed in a knowledge representation formalism. We focus in this paper on the part of the semantic analyser which deals with semantic composition. We explain how we use the domain model to handle metonymy dynamically and, more generally, to underlie semantic composition, using the knowledge descriptions attached to each concept of our ontology as a kind of concept-level, multiple-role qualia structure. We rely for this on a heuristic path search algorithm that exploits the graphic aspects of the conceptual graphs formalism. The methods described have been implemented and applied to French texts in the medical domain.

Comment: 6 pages, LaTeX, one encapsulated PostScript figure, uses colap.sty (included) and epsf.sty (available from the cmp-lg macro library). To appear in Coling-9

arXiv.org e-Print Archive

CiteSeerX

HAL-Paris 13

Automatic extraction of semantic relations between medical entities: a rule based approach

Author: Ben Abacha Asma
Zweigenbaum Pierre
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

LIMSI@ CLEF eHealth 2015-task 2.

Author: D'hondt Eva
Grau Brigitte
Zweigenbaum Pierre
Publication venue: HAL CCSD
Publication date: 01/09/2015

This paper presents LIMSI's participation in the User-Centered Health Information Retrieval task (task 2) at the CLEF eHealth 2015 workshop. In our contribution we explored two different strategies for query expansion: one based on entity recognition using MetaMap and the UMLS, and a second based on disease hypothesis generation using self-constructed external resources, such as a corpus of Wikipedia pages describing diseases and conditions and web pages from the Medline Plus health portal. Our best-scoring run was a weighted UMLS-based run which put emphasis on incorporating signs and symptoms recognized in the topic text by MetaMap. This run achieved a P@10 score of 0.262 and an nDCG@10 of 0.196.
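For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how P@10 and nDCG@10 are commonly computed from a ranked list of relevance judgments. The function names are hypothetical, the sketch assumes linear gain with a log2 rank discount, and it normalises against the ideal ordering of the supplied list only; the official evaluation tooling for the task may use a different variant.

```python
import math

def precision_at_k(relevances, k=10):
    """Fraction of the top-k ranked results judged relevant (relevance > 0)."""
    top = relevances[:k]
    return sum(1 for r in top if r > 0) / k

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: linear gain, log2 discount by rank."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the ideal (descending) ordering.

    Assumption: the ideal ordering is taken over the supplied list only,
    not over every judged document for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for one query, in ranked order (1 = relevant).
ranked = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(precision_at_k(ranked), ndcg_at_k(ranked))
```

Scores reported for a run are typically these per-query values averaged over all topics.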

Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation

Author: A Rios
J Fürnkranz
M Zeng
P Zweigenbaum
SJ Pan
Publication date: 01/01/2020

Biomedical text tagging systems are plagued by the dearth of labeled training data. There have been recent attempts at using pre-trained encoders to deal with this issue. A pre-trained encoder provides a representation of the input text, which is then fed to task-specific layers for classification. The entire network is fine-tuned on the labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Also, if the label space is large, it contains few or no labeled instances for the majority of the labels. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language, e.g. 'Appearance of lesion on brain imaging'. To address these issues, we propose constructing extra labeled instances using label-text (i.e. the label's name) as input for the corresponding label-index (i.e. the label's index). In fact, we propose a number of strategies for manufacturing multiple artificial labeled instances from a single label. The network is then fine-tuned on a combination of real and these newly constructed artificial labeled instances. We evaluate the proposed approach on an important low-resource biomedical task called PICO annotation, which requires tagging raw text describing clinical trials with labels corresponding to different aspects of the trial, i.e. the PICO (Population, Intervention/Control, Outcome) characteristics of the trial. Our empirical results show that the proposed method achieves a new state-of-the-art performance for PICO annotation with very significant improvements over competitive baselines.

Comment: International Workshop on Health Intelligence (W3PHIAI-20); AAAI-2
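As an illustration of the label-text idea in this abstract, here is a minimal sketch that turns each label name into (text, label-index) training pairs. The abstract does not say which strategies the authors use to manufacture several instances per label, so the word-dropout variant below is purely a hypothetical stand-in, as are the function name and parameters.

```python
import random

def artificial_instances(label_names, n_variants=2, seed=0):
    """Build (text, label_index) pairs from label names alone.

    Strategy 1: use the full label name as the input text.
    Strategy 2 (hypothetical, not from the paper): drop one random
    word from the label name to create extra variants.
    """
    rng = random.Random(seed)
    instances = []
    for index, name in enumerate(label_names):
        instances.append((name, index))
        tokens = name.split()
        if len(tokens) < 2:
            continue  # single-word labels yield no dropout variants
        for _ in range(n_variants):
            drop = rng.randrange(len(tokens))
            variant = " ".join(t for i, t in enumerate(tokens) if i != drop)
            instances.append((variant, index))
    return instances

labels = ["Appearance of lesion on brain imaging", "Mortality"]
for text, idx in artificial_instances(labels):
    print(idx, text)
```

The resulting pairs would be mixed with the real labeled instances before fine-tuning, as the abstract describes.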

arXiv.org e-Print Archive

Crossref

UCL Discovery

Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers

Author: Fort Karen
Galibert Olivier
Grouin Cyril
Kahn Juliette
Rosset Sophie
Zweigenbaum Pierre
Publication venue: HAL CCSD
Publication date: 12/07/2012

This paper compares the reference annotation of structured named entities in two corpora with different origins and properties. It addresses two questions linked to such a comparison. On the one hand, what specific issues were raised by reusing the same annotation scheme on a corpus that differs from the first in terms of media and that predates it by more than a century? On the other hand, what contrasts were observed in the resulting annotations across the two corpora?

HAL-Paris 13

Non-Targeted Analyses for Pesticides Using Deconvolution, Accurate Masses, and Databases – Screening and Confirmation

Author: Chin-Kai Meng
Eva Blanke
Jerry Zweigenbaum
Mike Szelewski
Peter Fürst
Publication venue: IntechOpen
Publication date: 21/10/2011

IntechOpen

Crossref

Accès mesurés aux sens

Author: Habert Benoît
Zweigenbaum Pierre
Publication venue: OpenEdition
Publication date: 24/04/2008

There is a growing need for robust semantic access to large, heterogeneous textual data. We present here, under three categories, the methods which help to achieve such access and which apply both to words and to texts: segmenting into meaning-bearing units, partitioning to obtain thematic or semantic categories, and distributing into predefined classes.

OpenEdition

Proposal for an Extension of Traditional Named Entities: from Guidelines to Evaluation, an Overview

Author: Fort Karen
Galibert Olivier
Grouin Cyril
Quintard Ludovic
Rosset Sophie
Zweigenbaum Pierre
Publication venue: HAL CCSD
Publication date: 23/06/2011

Within the framework of the construction of a fact database, we defined guidelines to extract named entities, using a taxonomy based on an extension of the usual named entities definition. We thus defined new types of entities with broader coverage including substantive-based expressions. These extended named entities are hierarchical (with types and components) and compositional (with recursive type inclusion and metonymy annotation). Human annotators used these guidelines to annotate a 1.3M word broadcast news corpus in French. This article presents the definition and novelty of extended named entity annotation guidelines, the human annotation of a global corpus and of a mini reference corpus, and the evaluation of annotations through the computation of inter-annotator agreement. Finally, we discuss our approach and the computed results, and outline further work.

HAL-Paris 13

Hal-Diderot

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

Author: Möller Sebastian
Raithel Lisa
Roller Roland
Sapina Oliver
Thomas Philippe
Zweigenbaum Pierre
Publication date: 20/06/2022

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary-annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common in social media data in this domain, the class labels of the corpus are very imbalanced. This and a high topic imbalance make it a very challenging dataset, since the same symptom can often have several causes and is not always related to medication intake. We aim to encourage further multi-lingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multi-lingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available for the community.

Comment: Accepted at LREC 202

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1