9 research outputs found

    Searching and Visualization of References in Research Documents

    This research aims to develop an information retrieval module that can trace references from the bibliography entries of research documents, specifically those that follow the writing guidelines of Bogor Agricultural University (IPB). A total of 242 research documents in PDF from the Department of Computer Science IPB were used to generate parsing patterns for extracting the bibliography entries. With a modified ParaTools, bibliography entries were extracted automatically from text files generated from the PDF files. The entries are stored in a database that is used to visualize author relationships as graphs. The module is supplemented by an information retrieval system based on the Sphinx search system and also provides information on authors' publications and citations. Evaluation showed that (1) bibliography entry extraction missed only 5.37% of bibliography entries, owing to incorrect bibliography formatting, (2) 91.54% of bibliography entry attributes were identified correctly, and (3) 90.31% of entries were successfully connected to other documents.

    Extraction de citations contenues dans des documents brevet (Extraction of citations contained in patent documents)

    This article is part of a broader effort to develop analysis tools and methods for characterizing scientific and technical activities. The number of digital scientific publications keeps growing. Here we focus on the automatic detection and extraction of citations and references contained in English-language patent documents. The method relies on a symbolic approach that combines the creation and use of electronic dictionaries and local grammars. The Unitex corpus-processing tool is used to build these linguistic resources and apply them to a study corpus.

    Automatic construction of a TMF Terminological Database using a transducer cascade

    The automatic development of terminological databases, especially in a standardized format, is crucial for many applications related to technical and scientific knowledge that require semantic and terminological descriptions covering multiple domains. In this context, we face two challenges: the first is the automatic extraction of terms in order to build a terminological database, and the second is their normalization into a standardized format. To address these challenges, we propose an approach based on a cascade of transducers, implemented with the CasSys tool of the Unitex platform, that benefits from both the success of the rule-based approach for term extraction and the performance of the TMF standard for term representation. We tested and evaluated our approach on Arabic scientific and technical documents from the elevator domain, and the results are very encouraging.

    Improved bibliographic reference parsing based on repeated patterns


    Meta-Metadata: An Information Semantic Language and Software Architecture for Collection Visualization Application

    Information collection and discovery tasks involve aggregation and manipulation of information resources. An information resource is a location from which a human gathers data to contribute to his/her understanding of something significant. Repositories of information resources include the Google search engine, the ACM Digital Library, Wikipedia, Flickr, and IMDB. Information discovery tasks involve having new ideas in the context of information collecting. The information one needs to collect is large, diverse, and hard to keep track of. This heterogeneity and scale also make it difficult to write software that supports information collection and discovery tasks. Metadata is a structured means for describing information resources. It forms the basis of digital libraries and search engines. Since metadata is often called "data about data," we define meta-metadata as a formal means of describing metadata, expressed as an XML-based language. We consider the lifecycle of metadata in information collection and discovery tasks and develop a meta-metadata architecture that deals with the data structures for representing metadata inside programs, extraction from information resources, rules for presentation to users, and logic that defines how an application needs to operate on metadata. Semantic actions for an information resource collection are steps taken to generate representative objects, including formation of iconographic image and text surrogates, associated with metadata. The meta-metadata language serves as a layer of abstraction between information resources, power users, and application developers. A power user can enhance an existing collection visualization application by authoring meta-metadata for a new information resource without modifying the application source code. The architecture provides a set of interfaces for semantic actions, which different information discovery and visualization applications can implement according to their own custom requirements. Application developers can modify the implementation of these semantic actions to change the behavior of their application, regardless of the information resource. We have used our architecture in combinFormation, an information discovery and collection visualization application, and validated it through a user study.