Processing Metonymy: a Domain-Model Heuristic Graph Traversal Approach
We address here the treatment of metonymic expressions from a knowledge
representation perspective, that is, in the context of a text understanding
system which aims to build a conceptual representation from texts according to
a domain model expressed in a knowledge representation formalism.
We focus in this paper on the part of the semantic analyser which deals with
semantic composition. We explain how we use the domain model to handle metonymy
dynamically, and more generally, to underlie semantic composition, using the
knowledge descriptions attached to each concept of our ontology as a kind of
concept-level, multiple-role qualia structure.
We rely for this on a heuristic path search algorithm that exploits the
graphic aspects of the conceptual graphs formalism. The methods described have
been implemented and applied to French texts in the medical domain.
Comment: 6 pages, LaTeX, one encapsulated PostScript figure, uses colap.sty
(included) and epsf.sty (available from the cmp-lg macro library). To appear
in Coling-9
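To make the idea of a path search over a domain model concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a toy graph of concepts linked by typed relations, with invented costs standing in for the heuristic, and uses plain uniform-cost search. The concept names, relation names, and the function `cheapest_path` are all hypothetical.

```python
import heapq

# Toy domain model (hypothetical): concept -> [(relation, neighbour, cost)].
# The costs are invented; they stand in for whatever heuristic ranks relations.
DOMAIN_MODEL = {
    "Angioplasty": [("acts_on", "ArterySegment", 1.0)],
    "ArterySegment": [("part_of", "Artery", 1.0), ("has_state", "Stenosis", 1.0)],
    "Artery": [("part_of", "Heart", 2.0)],
    "Stenosis": [],
    "Heart": [],
}


def cheapest_path(graph, source, target):
    """Uniform-cost search from source to target.

    Returns (total_cost, [concept, relation, concept, ...]) or None
    when the target cannot be reached.
    """
    frontier = [(0.0, source, [source])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for relation, neighbour, step_cost in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(
                    frontier,
                    (cost + step_cost, neighbour, path + [relation, neighbour]),
                )
    return None


if __name__ == "__main__":
    # An expression such as "angioplasty of the stenosis" links two concepts
    # that are not directly related; the path supplies the missing concept.
    print(cheapest_path(DOMAIN_MODEL, "Angioplasty", "Stenosis"))
```

In this toy setting the intermediate concept on the returned path (`ArterySegment`) plays the role of the implicit referent that a metonymic expression leaves out.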
LIMSI@ CLEF eHealth 2015-task 2.
This paper presents LIMSI's participation in the User-Centered Health Information Retrieval task (task 2) at the CLEF eHealth 2015 workshop. In our contribution we explored two different strategies for query expansion: one based on entity recognition using MetaMap and the UMLS, and a second based on disease hypothesis generation using self-constructed external resources, such as a corpus of Wikipedia pages describing diseases and conditions and web pages from the Medline Plus health portal. Our best-scoring run was a weighted UMLS-based run which put emphasis on incorporating signs and symptoms recognized in the topic text by MetaMap. This run achieved a P@10 score of 0.262 and an nDCG@10 of 0.196.
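For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute P@10 and nDCG@10 from a ranked list of relevance grades. It is an illustration only, not the official CLEF evaluation tooling; the function names are hypothetical, and the nDCG variant shown (gain equal to the grade, log2 discount) is an assumption, since the abstract does not say which formulation was used.

```python
import math


def precision_at_k(relevances, k=10):
    """Fraction of the top-k ranks holding a result with a non-zero grade."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: grade / log2(rank + 1), ranks from 1."""
    return sum(
        r / math.log2(rank + 1)
        for rank, r in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, judged_pool=None, k=10):
    """DCG of the ranking divided by the DCG of an ideal ordering.

    judged_pool holds the grades of all judged documents for the query;
    when omitted, the ideal ordering is built from the ranking itself,
    which is a simplification.
    """
    pool = relevances if judged_pool is None else judged_pool
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # made-up grades for one query
    print(precision_at_k(ranking), ndcg_at_k(ranking))
```

Scores such as those quoted above are typically averages of these per-query values over all topics.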
Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation
Biomedical text tagging systems are plagued by the dearth of labeled training
data. There have been recent attempts at using pre-trained encoders to deal
with this issue. A pre-trained encoder provides a representation of the input text
which is then fed to task-specific layers for classification. The entire
network is fine-tuned on the labeled data from the target task. Unfortunately,
a low-resource biomedical task often has too few labeled instances for
satisfactory fine-tuning. Also, if the label space is large, the data contains few or
no labeled instances for the majority of the labels. Most biomedical tagging
systems treat labels as indexes, ignoring the fact that these labels are often
concepts expressed in natural language, e.g. 'Appearance of lesion on brain
imaging'. To address these issues, we propose constructing extra labeled
instances using label-text (i.e. label's name) as input for the corresponding
label-index (i.e. label's index). In fact, we propose a number of strategies
for manufacturing multiple artificial labeled instances from a single label.
The network is then fine-tuned on a combination of real and these newly
constructed artificial labeled instances. We evaluate the proposed approach on
an important low-resource biomedical task called PICO annotation,
which requires tagging raw text describing clinical trials with labels
corresponding to different aspects of the trial, i.e. its PICO (Population,
Intervention/Control, Outcome) characteristics. Our empirical
results show that the proposed method achieves new state-of-the-art
performance for PICO annotation, with very significant improvements over
competitive baselines.
Comment: International Workshop on Health Intelligence (W3PHIAI-20); AAAI-2
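The core idea, using a label's name as input text for that label's index, can be sketched as follows. The abstract does not list the actual strategies for producing several instances per label, so the one used here (the full name plus every contiguous sub-span of at least `min_words` words) is purely an assumption for illustration, as are the function name and the second example label.

```python
def artificial_instances(label_names, min_words=2):
    """Build (text, label_index) pairs from each label's own name.

    Hypothetical strategy: emit the full label name, then every shorter
    contiguous span of words down to min_words words. Labels shorter
    than min_words contribute only their full name.
    """
    instances = []
    for index, name in enumerate(label_names):
        words = name.split()
        if not words:
            continue  # skip empty label names
        seen = set()
        shortest = min(min_words, len(words))
        for length in range(len(words), shortest - 1, -1):
            for start in range(len(words) - length + 1):
                text = " ".join(words[start:start + length])
                if text not in seen:
                    seen.add(text)
                    instances.append((text, index))
    return instances


if __name__ == "__main__":
    labels = ["Appearance of lesion on brain imaging", "Mortality"]
    for text, index in artificial_instances(labels):
        print(index, text)
```

The resulting pairs would then be mixed with the real labeled instances before fine-tuning; how the two sets are weighted is left open here.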
Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers
This paper compares the reference annotation of structured named entities in two corpora with different origins and properties. It addresses two questions linked to such a comparison. On the one hand, what specific issues were raised by reusing the same annotation scheme on a corpus that differs from the first in terms of media and that predates it by more than a century? On the other hand, what contrasts were observed in the resulting annotations across the two corpora?
Accès mesurés aux sens
There is a growing need for robust semantic access to large, heterogeneous textual data. We present here, under three categories, the methods that help to achieve such access and that apply both to words and to texts: segmenting into meaning-bearing units, partitioning to obtain thematic or semantic categories, and distributing into predefined classes.
Proposal for an Extension of Traditional Named Entities: from Guidelines to Evaluation, an Overview
Within the framework of the construction of a fact database, we defined guidelines to extract named entities, using a taxonomy based on an extension of the usual named entities definition. We thus defined new types of entities with broader coverage including substantive-based expressions. These extended named entities are hierarchical (with types and components) and compositional (with recursive type inclusion and metonymy annotation). Human annotators used these guidelines to annotate a 1.3M word broadcast news corpus in French. This article presents the definition and novelty of extended named entity annotation guidelines, the human annotation of a global corpus and of a mini reference corpus, and the evaluation of annotations through the computation of inter-annotator agreement. Finally, we discuss our approach and the computed results, and outline further work.
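As a reminder of what an inter-annotator agreement computation involves, here is a minimal sketch of Cohen's kappa for two annotators labeling the same items. The abstract does not say which agreement measure was used, so kappa is an assumption chosen because it is widely known; the function name and the entity type labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected).

    observed is the share of items on which both annotators agree;
    expected is the agreement predicted from each annotator's own
    label frequencies alone.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    annotator_1 = ["pers", "loc", "org", "pers"]
    annotator_2 = ["pers", "loc", "pers", "pers"]
    print(cohen_kappa(annotator_1, annotator_2))
```

Structured, hierarchical entities usually need a measure that also accounts for span boundaries and nesting, which this item-level sketch does not attempt.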
Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective
In this work, we present the first corpus for German Adverse Drug Reaction
(ADR) detection in patient-generated content. The data consists of 4,169 binary
annotated documents from a German patient forum, where users talk about health
issues and get advice from medical doctors. As is common in social media data
in this domain, the class labels of the corpus are very imbalanced. This and a
high topic imbalance make it a very challenging dataset, since often, the same
symptom can have several causes and is not always related to medication
intake. We aim to encourage further multi-lingual efforts in the domain of ADR
detection and provide preliminary experiments for binary classification using
different methods of zero- and few-shot learning based on a multi-lingual
model. When fine-tuning XLM-RoBERTa first on English patient forum data and
then on the new German data, we achieve an F1-score of 37.52 for the positive
class. We make the dataset and models publicly available for the community.
Comment: Accepted at LREC 202
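Because the classes are heavily imbalanced, the score is reported for the positive class only. The sketch below shows how such a positive-class F1 can be computed and why plain accuracy would be misleading; the function name and the toy labels are made up for illustration and are unrelated to the corpus.

```python
def positive_f1(gold, predicted, positive=1):
    """F1 of the positive class: harmonic mean of precision and recall.

    Returns a value between 0 and 1, and 0.0 when there are no true
    positives.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: 2 positive documents out of 10.
    gold = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_negative = [0] * 10
    accuracy = sum(g == p for g, p in zip(gold, always_negative)) / len(gold)
    # High accuracy, yet the positive class is never found.
    print(accuracy, positive_f1(gold, always_negative))
```

A classifier that never predicts the positive class reaches high accuracy on such data while its positive-class F1 is zero, which is why the latter is the more informative figure here.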