Improving Term Extraction with Terminological Resources
Studies of different term extractors on a biomedical corpus revealed
decreasing performance when they were applied to highly technical texts. The
difficulty or impossibility of customising them to new domains is an additional
limitation. In this paper, we propose to use external terminologies to
influence generic linguistic data in order to improve the quality of the
extraction. The tool we implemented exploits attested terms at different steps
of the process: chunking, parsing and extraction of term candidates.
Experiments reported here show that, using this method, more term candidates
can be acquired with a higher level of reliability. We further describe the
extraction process involving endogenous disambiguation implemented in the term
extractor YaTeA.
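
As a rough illustration of how an external terminology could guide the chunking step, the sketch below groups tokens that match a known (attested) term into a single protected chunk, using greedy longest-match lookup. This is not YaTeA's implementation: the function name, the data layout and the sample terminology are all assumptions made for this example.

```python
def mark_attested_terms(tokens, terminology):
    """Group tokens that form a known term into one chunk (greedy longest match).

    Hypothetical helper for illustration only; not part of YaTeA.
    Returns a list of (text, is_attested_term) pairs.
    """
    # Index the terminology as lowercase token tuples for exact lookup.
    terms = {tuple(term.lower().split()) for term in terminology}
    max_len = max((len(term) for term in terms), default=0)

    chunks = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest possible span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tok.lower() for tok in tokens[i:i + n])
            if candidate in terms:
                match_len = n
                break
        if match_len:
            chunks.append((" ".join(tokens[i:i + match_len]), True))
            i += match_len
        else:
            chunks.append((tokens[i], False))
            i += 1
    return chunks


# Sample terminology and sentence, invented for the example.
sample_terms = ["gene expression", "T cell", "T cell receptor"]
sample_tokens = "the gene expression of T cell receptor".split()
sample_chunks = mark_attested_terms(sample_tokens, sample_terms)
```

Preferring the longest match keeps "T cell receptor" whole instead of splitting it at the shorter term "T cell", which is one simple way a terminology can constrain later parsing.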
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. To retrieve documents relevant to a query, the keywords
describing each document have to be carefully chosen. Archivists have to
identify the topic of a document before starting to extract the keywords. For an
archivist indexing specialized documents, experience plays an important role.
But indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource
- …
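
To make the input/output contract of such an indexing support system concrete, here is a deliberately simplified sketch: the ontology is reduced to a mapping from concept to single-word terms, the context is chosen as the concept with the most matching terms, and the keywords returned are the matches for that concept. This stands in for, and is much cruder than, the graph representation named in the title; every name and the sample data are assumptions for illustration.

```python
import re


def contextual_keywords(text, ontology):
    """Pick the best-matching concept and return its keywords found in the text.

    Hypothetical simplification: `ontology` maps a concept name to a set of
    single-word terms. Returns (concept, sorted keywords), or (None, []) when
    the ontology is empty.
    """
    if not ontology:
        return None, []
    words = set(re.findall(r"[a-z]+", text.lower()))
    # For each concept, collect the ontology terms that occur in the text.
    hits = {
        concept: sorted(words & {term.lower() for term in terms})
        for concept, terms in ontology.items()
    }
    # The context is the concept with the most matches (first one wins ties).
    best = max(hits, key=lambda concept: len(hits[concept]))
    return best, hits[best]


# Sample ontology and document, invented for the example.
sample_ontology = {
    "medicine": {"gene", "protein", "cell"},
    "finance": {"bank", "loan"},
}
sample_text = "The protein binds the gene inside the cell near the bank."
sample_context, sample_keywords = contextual_keywords(sample_text, sample_ontology)
```

Restricting the output to the winning concept is what makes the keywords contextualized here: "bank" appears in the text but is dropped because the document's dominant context is "medicine".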