61,676 research outputs found
Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach
This paper defines a method for lexicon in the biomedical domain from
comparable corpora. The method is based on compositional translation and
exploits morpheme-level translation equivalences. It can generate translations
for a large variety of morphologically constructed words and can also generate
'fertile' translations. We show that fertile translations increase the overall
quality of the extracted lexicon for English to French translation
Building a semantically annotated corpus of clinical texts
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains
Automatic case acquisition from texts for process-oriented case-based reasoning
This paper introduces a method for the automatic acquisition of a rich case
representation from free text for process-oriented case-based reasoning. Case
engineering is among the most complicated and costly tasks in implementing a
case-based reasoning system. This is especially so for process-oriented
case-based reasoning, where more expressive case representations are generally
used and, in our opinion, actually required for satisfactory case adaptation.
In this context, the ability to acquire cases automatically from procedural
texts is a major step forward in order to reason on processes. We therefore
detail a methodology that makes case acquisition from processes described as
free text possible, with special attention given to assembly instruction texts.
This methodology extends the techniques we used to extract actions from cooking
recipes. We argue that techniques taken from natural language processing are
required for this task, and that they give satisfactory results. An evaluation
based on our implemented prototype extracting workflows from recipe texts is
provided.Comment: Sous presse, publication pr\'evue en 201
TermEval 2020 : shared task on automatic term extraction using the Annotated Corpora for term Extraction Research (ACTER) dataset
The TermEval 2020 shared task provided a platform for researchers to work on automatic term extraction (ATE) with the same dataset: the Annotated Corpora for Term Extraction Research (ACTER). The dataset covers three languages (English, French, and Dutch) and four domains, of which the domain of heart failure was kept as a held-out test set on which final f1-scores were calculated. The aim was to provide a large, transparent, qualitatively annotated, and diverse dataset to the ATE research community, with the goal of promoting comparative research and thus identifying strengths and weaknesses of various state-of-the-art methodologies. The results show a lot of variation between different systems and illustrate how some methodologies reach higher precision or recall, how different systems extract different types of terms, how some are exceptionally good at finding rare terms, or are less impacted by term length. The current contribution offers an overview of the shared task with a comparative evaluation, which complements the individual papers by all participants
- …