165 research outputs found

    Ontologies and Information Extraction

    This report argues that, even in the simplest cases, information extraction (IE) is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. The report shows that the amount of knowledge required depends on the nature and depth of the interpretation needed to extract the information. The argument is mainly illustrated in biology, a domain with critical needs for content-based exploration of the scientific literature and one that is becoming a major application domain for IE.

    Adapting a general parser to a sublanguage

    In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is to use terminology (the identification and analysis of terms) to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined: text normalization, extensions to the lexicon and the morpho-guessing module, and adaptation of grammar rules. We compare the parsing results before and after these adaptations.

    Using NLP to build the hypertextual network of a back-of-the-book index

    Relying on the idea that back-of-the-book indexes are traditional devices for navigating large documents, we have developed a method to build a hypertextual network that supports navigation within a document. Building such a hypertextual network requires selecting a list of descriptors, identifying the relevant text segments to associate with each descriptor, and finally ranking the descriptors and reference segments by relevance. We propose a specific document segmentation method and a relevance measure for information ranking. The algorithms are tested on four corpora of different types and domains, without human intervention or any semantic knowledge.

    Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

    We study the adaptation of the Link Grammar Parser to the biomedical sublanguage, with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to handling unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and also consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, which incorporates information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. The adapted parser, available at http://www.it.utu.fi/biolg, is released under an open-source license.

    Peut-on évaluer les outils d'acquisition de connaissances à partir de textes ? [Can tools for knowledge acquisition from texts be evaluated?]

    Despite years of accumulated hindsight and experience, it is difficult to form a clear picture of the state of research on knowledge acquisition from texts. The lack of evaluation protocols makes it hard to compare results. In this article, we examine the question of evaluating terminology and ontology acquisition tools, highlighting the main difficulties and describing our first proposals in this area.

    A bayesian classifier for the recognition of the impersonal occurrences of the 'it' pronoun

    This paper presents a new system that distinguishes between the impersonal and anaphoric occurrences of the pronoun 'it'. Compared with state-of-the-art methods, our system relies on the same types of linguistic knowledge but performs better. We argue that this is due to the Bayesian model on which it is based: the model makes it possible to combine various pieces of knowledge and to exploit even unreliable ones when classifying pronoun occurrences.

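    Identifier les pronoms anaphoriques et trouver leurs antécédents : l'intérêt de la classification bayésienne [Identifying anaphoric pronouns and finding their antecedents: the value of Bayesian classification]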

    In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which mainly rely on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new approach based on Bayesian networks that makes it possible to combine both types of information. As a case study, we focus on anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than a state-of-the-art one on this task.

    A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem

    In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which mainly rely on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new method, based on Bayesian networks, that makes it possible to combine both types of information. As a case study, we focus on the specific task of pronominal anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than state-of-the-art anaphora resolution systems.

    Towards a semantic model to enhance knowledge sharing and discovery in organic chemistry

    This paper presents a project to build an electronic encyclopaedia of organic chemistry. The goal of the EnCOrE (Encyclopédie de la Chimie Organique Electronique) encyclopaedia is twofold: first, it aims to enhance knowledge sharing in organic chemistry by providing unified access to the increasing amount of domain resources; second, it aims to improve knowledge discovery in the field by revealing unknown connections between those resources. The EnCOrE encyclopaedia is designed for a broad range of users, such as students, researchers, and teachers. The paper introduces our vision of the EnCOrE encyclopaedia as an information system. It presents its architecture and describes the semantic model underlying this architecture. Critical points and challenges related to the EnCOrE project are also discussed.

    Les entités nommées : des clés linguistiques pour la conceptualisation [Named entities: linguistic keys for conceptualization]

    Building ontologies from texts has many advantages. In particular, it yields ontologies enriched with lexical information, which is valuable for all content-access applications. Ontology construction from texts is a constantly evolving field. Even though the process is not fully automatic, the knowledge engineer can be guided during construction. In this article, we show that named entity detection can be used to enrich an existing ontology or to bootstrap a conceptualization, and not only to populate an ontology. This point is illustrated by two use cases involving regulatory documents, and we evaluate our approach by comparing the constructed ontologies against reference ontologies. Keywords: ontology construction, named entity, conceptualization.