16 research outputs found

    Using NLP to build the hypertextual network of a back-of-the-book index

    Relying on the idea that back-of-the-book indexes are traditional devices for navigating large documents, we have developed a method for building a hypertextual network that helps readers navigate a document. Building such a network requires selecting a list of descriptors, identifying the relevant text segments to associate with each descriptor, and finally ranking the descriptors and reference segments in order of relevance. We propose a specific document segmentation method and a relevance measure for information ranking. The algorithms are tested on four corpora (of different types and domains) without human intervention or any semantic knowledge.

    An application-oriented terminology evaluation: the case of back-of-the-book indexes

    No full text
    4 pages. This paper addresses the problem of computational terminology evaluation not per se but in a specific application context. It describes the evaluation procedure used to assess the validity of our overall indexing approach and the quality of the IndDoc indexing tool. Although extended user-oriented evaluation is irreplaceable, we argue that early evaluations are possible and useful for guiding development.

    Cederilic : constitution d'un livret d'un index numérique [Cederilic: building a booklet of a digital index]

    We describe a real-scale experiment in building a thematic index for a scientific book. The book consists of a selection of twenty-one articles from three editions of the Ingénierie des connaissances conference (1999-2001). This corpus was processed by the SYNTEX parser and then by the INDDOC system, software dedicated to index building. The work was carried out in a fully digital setting: it started from digital files and aimed to turn the book's collection of articles into a set of HTML files that the user navigates with a browser. We present the main problems encountered and the solutions adopted.
    Keywords: knowledge engineering; digital book; indexing; knowledge acquisition from texts; terminology structuring; XML; DocBook DTD

    Le réseau terminologique, un élément central pour la construction d'index de document [The terminological network, a central element in building document indexes]

    No full text
    This article shows how terminology structuring tools, which establish semantic links between terms (hypernymy, synonymy, ...), contribute to the construction of document indexes.

    Semi-automatic method for the construction of a document index

    No full text
    This paper deals with document indexes, which sometimes appear at the back of books and list their main subjects. It shows how recent advances in natural language processing may help the indexing process.

    Building back-of-the-book indexes

    No full text
    24 pages. This paper presents an original natural language processing (NLP) approach to building back-of-the-book indexes. Our indexing system, IndDoc, exploits terminological tools and automatically builds a draft index from an analysis of the document text. The indexer then validates that draft through a dedicated interface. This approach has been tested on several documents, with promising results. Drawing on our experience in developing and testing IndDoc, we aim to assess the contribution of terminological analysis, as well as the level of maturity that computational terminology has reached, from an indexing perspective.

    Evaluer le réseau terminologique d'un index de document [Evaluating the terminological network of a document index]

    No full text
    This paper shows how terminological tools are concretely evaluated through the specific task of building back-of-the-book indexes.

    L'index de fin de livre, une forme de résumé indicatif ? [The back-of-the-book index, a form of indicative summary?]

    No full text
    Back-of-the-book indexes are traditional devices that help readers access document content. Such indexes present the book's topics in a different form and order than the document itself. In this paper, we show that back-of-the-book indexes, which give a synthetic view of the document content, are quite similar to indicative summaries. The new methods developed for the fine-grained indexing of documents can be compared with summarization techniques. Despite their similarities, the two types of tools belong to different traditions, and the underlying methods are quite different. Comparing them helps clarify what is specific to each approach.

    Une mesure de pertinence pour le tri de l'information dans un index de fin de livre [A relevance measure for sorting information in a back-of-the-book index]

    No full text
    This paper presents the IndDoc system, which assists back-of-the-book indexing. One of the indexing challenges is information selection: which headings and references are the most relevant? This paper shows how this selection is handled in IndDoc. It presents a metric for ranking the headings and, for each heading, its references.