165 research outputs found

Ontologies and Information Extraction

Authors: Nazarenko Adeline, Nédellec Claire
Publication date: 01/01/2005

This report argues that, even in the simplest cases, information extraction (IE) is an ontology-driven process. It is not a mere text-filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. The report shows that the amount of knowledge involved depends on the nature and depth of the interpretation needed to extract the information. The discussion is mainly illustrated with examples from biology, a domain with critical needs for content-based exploration of the scientific literature and one that is becoming a major application domain for IE.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Adapting a general parser to a sublanguage

Authors: Aubin Sophie, Nazarenko Adeline, Nédellec Claire
Publication date: 01/01/2005

In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is to use terminology (identification and analysis of terms) to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined, including text normalization, extensions to the lexicon and the morpho-guessing module, and adaptation of grammar rules. We compare the parsing results before and after these adaptations.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Using NLP to build the hypertextuel network of a back-of-the-book index

Authors: Mekki Touria Aït El, Nazarenko Adeline
Publication date: 01/01/2005

Relying on the idea that back-of-the-book indexes are traditional devices for navigating large documents, we have developed a method to build a hypertextual network that helps navigation within a document. Building such a hypertextual network requires selecting a list of descriptors, identifying the relevant text segments to associate with each descriptor, and finally ranking the descriptors and reference segments by relevance. We propose a specific document segmentation method and a relevance measure for information ranking. The algorithms are tested on four corpora (of different types and domains) without human intervention or any semantic knowledge.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Hal-Diderot

Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

Authors: Aubin Sophie, Nazarenko Adeline, Pyysalo Sampo, Salakoski Tapio
Publication date: 01/01/2006

We study the adaptation of the Link Grammar Parser to the biomedical sublanguage, with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. The adapted parser is available under an open-source license (http://www.it.utu.fi/biolg).

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Peut-on évaluer les outils d'acquisition de connaissances à partir de textes ? [Can tools for acquiring knowledge from texts be evaluated?]

Authors: Nazarenko Adeline, Zargayouna Haifa
Publication venue: HAL CCSD
Publication date: 27/01/2009

Despite years of hindsight and accumulated experience, it is difficult to get a clear picture of how far research on knowledge acquisition from texts has progressed. The lack of evaluation protocols makes it harder to compare results. In this article, we examine the question of evaluating terminology and ontology acquisition tools, highlighting the main difficulties and describing our first proposals in this area.

HAL Descartes

HAL-Paris 13

Hal-Diderot

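Identifier les pronoms anaphoriques et trouver leurs antécédents : l'intérêt de la classification bayésienne [Identifying anaphoric pronouns and finding their antecedents: the value of Bayesian classification]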

Authors: Nazarenko Adeline, Weissenbacher Davy
Publication venue: 'Associacio catalana de Salut Laboral'
Publication date: 08/06/2007

In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which mainly rely on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new approach based on Bayesian networks that makes it possible to combine both types of information. As a case study, we focus on anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than a state-of-the-art system on this task.

HAL Descartes

HAL-Paris 13

Hal-Diderot

A bayesian classifier for the recognition of the impersonal occurrences of the 'it' pronoun

Authors: Nazarenko Adeline, Weissenbacher Davy
Publication venue: Discourse Anaphora and Anaphor Resolution Colloquium
Publication date: 29/05/2007

This paper presents a new system that distinguishes between the impersonal and anaphoric occurrences of the 'it' pronoun. Compared with state-of-the-art methods, our system relies on the same types of linguistic knowledge but performs better. We argue that this is due to the Bayesian model on which it is based: it makes it possible to combine various pieces of knowledge and to exploit even unreliable ones when classifying pronoun occurrences.

HAL Descartes

HAL-Paris 13

Hal-Diderot

A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem

Authors: Nazarenko Adeline, Weissenbacher Davy
Publication venue: HAL CCSD
Publication date: 29/09/2007

In NLP, a traditional distinction opposes linguistically based systems and knowledge-poor ones, which mainly rely on surface clues. Each approach has its drawbacks and its advantages. In this paper, we propose a new method, based on Bayesian networks, that makes it possible to combine both types of information. As a case study, we focus on the specific task of pronominal anaphora resolution, which is known to be a difficult NLP problem. We show that our Bayesian system performs better than state-of-the-art anaphora resolution systems.

HAL Descartes

HAL-Paris 13

Hal-Diderot

Towards a semantic model to enhance knowledge sharing and discovery in organic chemistry

Authors: Dragos Valentina, Nazarenko Adeline
Publication venue: HAL CCSD
Publication date: 01/02/2009

This paper presents the project of an electronic encyclopaedia of organic chemistry. The goal of the EnCOrE (Encyclopédie de la Chimie Organique Electronique) encyclopaedia is twofold: first, it aims to enhance knowledge sharing in organic chemistry by providing unified access to the increasing amount of domain resources; second, it aims to improve knowledge discovery in the field by revealing unknown connections between those resources. The EnCOrE encyclopaedia is designed for a broad range of users, such as students, researchers, and teachers. The paper introduces our vision of the EnCOrE encyclopaedia as an information system. It presents its architecture and describes the semantic model underlying this architecture. Critical points and challenges related to the EnCOrE project are also discussed.

HAL Descartes

HAL-Paris 13

Hal-Diderot

Les entités nommées : des clés linguistiques pour la conceptualisation [Named entities: linguistic keys for conceptualization]

Authors: Nazarenko Adeline, Omrane Nouha, Szulman Sylvie
Publication venue: HAL CCSD
Publication date: 16/05/2011

Starting from texts to build ontologies has many advantages. In particular, it yields ontologies enriched with lexical information, which is valuable for all content-access applications. Ontology construction from texts is a constantly evolving field. Even though the process of building ontologies from texts is not fully automatic, the knowledge engineer can be guided during construction. In this article, we show that named entity detection can be used to enrich an existing ontology or to start a conceptualization, and not only to populate an ontology. This point is illustrated by two use cases involving regulatory documents, and we evaluate our approach by comparing the constructed ontologies against reference ontologies. Keywords: ontology construction, named entity, conceptualization.

HAL Descartes

HAL-Paris 13

Hal-Diderot