Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report draws its examples mainly from biology, a domain with a critical
need for content-based exploration of the scientific literature and one that
has become a major application domain for IE.
Adapting a general parser to a sublanguage
In this paper, we propose a method to adapt a general parser (Link Parser) to
sublanguages, focusing on the parsing of texts in biology. Our main proposal is
the use of terminology (identification and analysis of terms) in order to reduce
the complexity of the text to be parsed. Several other strategies are explored
and finally combined, including text normalization, extensions to the lexicon
and morpho-guessing module, and adaptation of the grammar rules. We compare the
parsing results before and after these adaptations.
Using NLP to build the hypertextual network of a back-of-the-book index
Relying on the idea that back-of-the-book indexes are traditional devices for
navigation through large documents, we have developed a method to build a
hypertextual network that helps readers navigate a document. Building such a
hypertextual network requires selecting a list of descriptors, identifying the
relevant text segments to associate with each descriptor and finally ranking
the descriptors and reference segments by relevance order. We propose a
specific document segmentation method and a relevance measure for information
ranking. The algorithms are tested on four corpora (of different types and
domains) without human intervention or any semantic knowledge.
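The three steps named in this abstract (selecting descriptors, attaching text segments to them, ranking by relevance) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the authors' segmentation method or relevance measure, so the blank-line paragraph segmentation, the single-word descriptors, and the frequency-based `relevance` score below are hypothetical stand-ins.

```python
import re

def split_segments(text):
    """Assumed segmentation: paragraphs separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def relevance(descriptor, segment):
    """Assumed relevance: occurrences of a single-word descriptor per word."""
    words = re.findall(r"\w+", segment.lower())
    if not words:
        return 0.0
    return words.count(descriptor.lower()) / len(words)

def build_index(text, descriptors):
    """Map each descriptor to the indices of its segments, most relevant first."""
    segments = split_segments(text)
    index = {}
    for descriptor in descriptors:
        scored = [(relevance(descriptor, seg), i) for i, seg in enumerate(segments)]
        # Keep only segments that mention the descriptor; break ties by position.
        ranked = sorted((pair for pair in scored if pair[0] > 0),
                        key=lambda pair: (-pair[0], pair[1]))
        index[descriptor] = [i for _, i in ranked]
    return index

if __name__ == "__main__":
    doc = ("Parsing is hard. Parsing biology texts is harder.\n\n"
           "Terms help.\n\n"
           "A parser needs a lexicon for parsing.")
    print(build_index(doc, ["parsing", "terms"]))
```

Each descriptor ends up with an ordered list of segment positions, which is the raw material for hyperlinks from an index entry into the document.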
Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches
We study the adaptation of Link Grammar Parser to the biomedical sublanguage
with a focus on domain terms not found in a general parser lexicon. Using two
biomedical corpora, we implement and evaluate three approaches to addressing
unknown words: automatic lexicon expansion, the use of morphological clues, and
disambiguation using a part-of-speech tagger. We evaluate each approach
separately for its effect on parsing performance and consider combinations of
these approaches. In addition to a 45% increase in parsing efficiency, we find
that the best approach, incorporating information from a domain part-of-speech
tagger, offers a statistically significant 10% relative decrease in error. The
adapted parser is available under an open-source license at
http://www.it.utu.fi/biolg
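One of the three approaches above, the use of morphological clues for unknown words, might look roughly like the sketch below. The suffix table and category names are hypothetical examples chosen for illustration; they are not taken from the adapted parser or from Link Grammar's lexicon format.

```python
# Hypothetical suffix-to-category rules for biomedical vocabulary.
SUFFIX_RULES = [
    ("ylation", "noun"),    # e.g. phosphorylation
    ("ase", "noun"),        # e.g. kinase
    ("ated", "verb"),       # e.g. phosphorylated
    ("ly", "adverb"),
    ("ic", "adjective"),    # e.g. cytoplasmic
]

def guess_category(word, lexicon):
    """Return the lexicon category if known, else a suffix-based guess, else None."""
    lower = word.lower()
    if lower in lexicon:
        return lexicon[lower]
    # Try longer suffixes first so the most specific rule wins.
    for suffix, category in sorted(SUFFIX_RULES, key=lambda rule: -len(rule[0])):
        if lower.endswith(suffix) and len(lower) > len(suffix):
            return category
    return None

if __name__ == "__main__":
    general_lexicon = {"the": "determiner", "binds": "verb"}
    for token in ["the", "kinase", "phosphorylated", "cytoplasmic", "xyz"]:
        print(token, guess_category(token, general_lexicon))
```

Words already in the general lexicon keep their category; only out-of-lexicon tokens fall through to the suffix rules.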
Can tools for knowledge acquisition from text be evaluated? (Peut-on évaluer les outils d'acquisition de connaissances à partir de textes ?)
Despite years of hindsight and accumulated experience, it is difficult to form a clear picture of how far research on knowledge acquisition from text has progressed. The lack of evaluation protocols makes it hard to compare results. In this article, we examine the evaluation of terminology and ontology acquisition tools, highlighting the main difficulties and describing our first proposals in this area.
Identifying anaphoric pronouns and finding their antecedents: the value of Bayesian classification (Identifier les pronoms anaphoriques et trouver leurs antécédents : l'intérêt de la classification bayésienne)
In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which rely mainly on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new approach based on Bayesian networks that makes it possible to combine both types of information. As a case study, we focus on anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than a state-of-the-art system on this task.
A bayesian classifier for the recognition of the impersonal occurrences of the 'it' pronoun
This paper presents a new system that distinguishes between the impersonal and anaphoric occurrences of the 'it' pronoun. Compared with state-of-the-art methods, our system relies on the same types of linguistic knowledge but performs better. We argue that this is due to the Bayesian model on which it is based: it makes it possible to combine various pieces of knowledge and to exploit even unreliable ones when classifying pronoun occurrences.
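The idea of combining several clues, including unreliable ones, in a probabilistic classifier can be illustrated with a small naive Bayes model. This is a sketch under assumptions, not the system described in the abstract: the abstract does not give the model structure, so naive Bayes is substituted here, and the three binary clues and the toy training examples are invented for illustration.

```python
import math
from collections import defaultdict

class NaiveBayes:
    """Bernoulli naive Bayes over binary clues, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = defaultdict(int)
        # label -> clue name -> number of training examples where the clue is True
        self.true_counts = defaultdict(lambda: defaultdict(int))
        self.clues = set()

    def fit(self, examples):
        for clues, label in examples:
            self.class_counts[label] += 1
            for name, value in clues.items():
                self.clues.add(name)
                if value:
                    self.true_counts[label][name] += 1
        return self

    def log_scores(self, clues):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for name in sorted(self.clues):
                p_true = (self.true_counts[label][name] + self.alpha) / (n + 2 * self.alpha)
                score += math.log(p_true if clues.get(name, False) else 1.0 - p_true)
            scores[label] = score
        return scores

    def predict(self, clues):
        scores = self.log_scores(clues)
        return max(scores, key=scores.get)

# Invented toy data: each clue is weak on its own and may be noisy.
TRAIN = [
    ({"weather_verb": True,  "adj_that": False, "np_before": False}, "impersonal"),
    ({"weather_verb": False, "adj_that": True,  "np_before": True},  "impersonal"),
    ({"weather_verb": False, "adj_that": True,  "np_before": False}, "impersonal"),
    ({"weather_verb": False, "adj_that": False, "np_before": True},  "anaphoric"),
    ({"weather_verb": False, "adj_that": False, "np_before": True},  "anaphoric"),
    ({"weather_verb": False, "adj_that": True,  "np_before": True},  "anaphoric"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN)
    print(model.predict({"weather_verb": False, "adj_that": True, "np_before": False}))
```

Because every clue contributes a smoothed likelihood rather than a hard rule, a clue that is often wrong shifts the decision only slightly instead of overriding the others.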
A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem
In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which rely mainly on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new method based on Bayesian networks that makes it possible to combine both types of information. As a case study, we focus on the specific task of pronominal anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than state-of-the-art anaphora resolution systems.
Towards a semantic model to enhance knowledge sharing and discovery in organic chemistry
This paper presents a project to build an electronic encyclopaedia of organic chemistry. The goal of the EnCOrE (Encyclopédie de la Chimie Organique Electronique) encyclopaedia is twofold: first, it aims to enhance knowledge sharing in organic chemistry by providing unified access to the growing number of domain resources; second, it aims to improve knowledge discovery in the field by revealing unknown connections between those resources. The EnCOrE encyclopaedia is designed for a broad range of users, such as students, researchers, and teachers. The paper introduces our vision of the EnCOrE encyclopaedia as an information system. It presents its architecture and describes the semantic model underlying this architecture. Critical points and challenges related to the EnCOrE project are also discussed.
Named entities: linguistic keys to conceptualization (Les entités nommées : des clés linguistiques pour la conceptualisation)
Building ontologies from text has many advantages. In particular, it produces ontologies enriched with lexical information that is valuable for all content-access applications. Ontology construction from text is a constantly evolving field. Although the process is not fully automatic, the knowledge engineer can be guided through it. In this article, we show that named entity detection can be used to enrich an existing ontology or to start a conceptualization, and not only to populate an ontology. We illustrate this with two use cases involving regulatory documents, and we evaluate our approach by comparing the ontologies built against reference ontologies. Keywords: ontology construction, named entity, conceptualization