Searching and Visualization of References in Research Documents

Author: Annisa Annisa
Nadirman Firnas
Ridha Ahmad
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/06/2014

This research aims to develop an information retrieval module that can trace references from the bibliography entries of research documents, specifically documents that follow the writing guidelines of Bogor Agricultural University (IPB). A total of 242 research documents in PDF format from the Department of Computer Science at IPB were used to generate parsing patterns for extracting the bibliography entries. Using a modified version of ParaTools, bibliography entries were extracted automatically from text files generated from the PDF files. The entries are stored in a database that is used to visualize author relationships as graphs. The module is supplemented by an information retrieval system based on the Sphinx search system and also provides information on authors' publications and citations. Evaluation showed that (1) bibliography entry extraction missed only 5.37% of bibliography entries, owing to incorrect bibliography formatting, (2) 91.54% of bibliography entry attributes were identified correctly, and (3) 90.31% of entries were successfully connected to other documents.

TELKOMNIKA (Telecommunication Computing Electronics and Control)
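
The pipeline described in the abstract above (pattern-based parsing of bibliography entries, then author relationships drawn as a graph) can be illustrated with a minimal sketch. This is not the authors' modified ParaTools code: the regular expression, the assumed "Authors. Year. Title. Rest" entry layout, the function names `parse_entry` and `coauthor_edges`, and the sample entry are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations

# Hypothetical pattern for an author-year entry laid out as
# "Authors. Year. Title. Rest" (an assumption, not the real IPB rules).
ENTRY_RE = re.compile(
    r"^(?P<authors>.+?)\.\s+(?P<year>\d{4})\.\s+(?P<title>.+?)\.\s+(?P<rest>.+)$"
)


def parse_entry(entry):
    """Split one bibliography entry into attributes, or return None."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        return None  # entry does not follow the assumed layout
    authors = [a.strip() for a in match.group("authors").split(",") if a.strip()]
    return {
        "authors": authors,
        "year": int(match.group("year")),
        "title": match.group("title"),
        "rest": match.group("rest"),
    }


def coauthor_edges(parsed_entries):
    """Return the set of undirected co-author pairs across all entries."""
    edges = set()
    for entry in parsed_entries:
        for a, b in combinations(sorted(entry["authors"]), 2):
            edges.add((a, b))
    return edges


# Illustrative entry only; it is not taken from the study corpus.
sample = ("Manning CD, Raghavan P, Schutze H. 2008. "
          "Introduction to Information Retrieval. "
          "Cambridge (GB): Cambridge Univ Pr.")
parsed = parse_entry(sample)
print(parsed["year"], parsed["title"])
print(sorted(coauthor_edges([parsed])))
```

Entries that do not match the assumed layout return `None`, which loosely mirrors the abstract's observation that extraction misses are caused by incorrectly formatted entries.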

Extraction de citations contenues dans des documents brevet [Extraction of citations contained in patent documents]

Author: Kim A-Young
Kogkitsidou Eleni
Kyriacopoulou Tita
Martineau Claude
Martinez Cristian
Schoen Antoine
Publication venue: HAL CCSD
Publication date: 10/09/2013

This article is part of a broader effort to develop tools and analytical methods for characterizing scientific and technical activities. The number of digital scientific publications keeps growing. Here we focus on the automatic identification and extraction of citations and references contained in English-language patent documents. The method relies on a symbolic approach that combines the creation and use of electronic dictionaries and local grammars. The Unitex corpus-processing tool is used to build these linguistic resources and apply them to a study corpus.

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Automatic construction of a TMF Terminological Database using a transducer cascade

Author: Ammar Chihebeddine
Haddar Kais
Romary Laurent
Publication venue: HAL CCSD
Publication date: 07/09/2015

The automatic development of terminological databases, especially in a standardized format, is crucial for many applications related to technical and scientific knowledge that require semantic and terminological descriptions covering multiple domains. In this context, we face two challenges: the first is the automatic extraction of terms in order to build a terminological database, and the second is their normalization into a standardized format. To address these challenges, we propose an approach based on a cascade of transducers run with the CasSys tool of the Unitex platform, which benefits from both the success of the rule-based approach to term extraction and the performance of the TMF standard for term representation. We tested and evaluated our approach on a corpus of Arabic scientific and technical documents for the elevator domain, and the results are very encouraging.

INRIA a CCSD electronic archive server

Hal-Diderot
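
The idea of a transducer cascade mentioned in the abstract above can be sketched as a sequence of rewriting passes in which each stage consumes the output of the previous one. The sketch below is only an analogy: it uses Python regular-expression substitutions in place of finite-state transducers, does not call Unitex or CasSys, and does not produce TMF output. The tiny English lexicon, the `<N>` and `<TERM>` markers, and the function names are hypothetical.

```python
import re

# Each stage is a (pattern, replacement) pair; a regex substitution stands in
# for one transducer. The lexicon entries and tag names are hypothetical.
STAGES = [
    # Stage 1: mark head nouns found in a small lexicon.
    (re.compile(r"\b(cable|door|motor)\b"), r"<N>\1</N>"),
    # Stage 2: attach a known modifier to the marked head that follows it.
    (re.compile(r"\b(traction|landing) <N>(\w+)</N>"), r"<TERM>\1 \2</TERM>"),
    # Stage 3: promote any remaining marked head to a single-word term.
    (re.compile(r"<N>(\w+)</N>"), r"<TERM>\1</TERM>"),
]


def run_cascade(text):
    """Apply every stage in order; later stages see earlier stages' markup."""
    for pattern, replacement in STAGES:
        text = pattern.sub(replacement, text)
    return text


def extract_terms(text):
    """Return the terms marked by the cascade, in order of appearance."""
    return re.findall(r"<TERM>(.+?)</TERM>", run_cascade(text))


print(run_cascade("the traction motor drives the cable"))
print(extract_terms("the traction motor drives the cable"))
```

The ordering matters: stage 2 can only recognize a multi-word term because stage 1 has already marked its head noun, which is the property that makes a cascade more expressive than a single pass.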

Improved bibliographic reference parsing based on repeated patterns

Author: Böhm Klemens
Sautter Guido
Publication date: 30/04/2014

uploaded by Plaz

ZENODO

Meta-Metadata: An Information Semantic Language and Software Architecture for Collection Visualization Application

Author: Mathur Abhinav

Information collection and discovery tasks involve the aggregation and manipulation of information resources. An information resource is a location from which a human gathers data to contribute to his or her understanding of something significant. Repositories of information resources include the Google search engine, the ACM Digital Library, Wikipedia, Flickr, and IMDB. Information discovery tasks involve having new ideas in the context of information collecting. The information one needs to collect is large, diverse, and hard to keep track of. This heterogeneity and scale also make it difficult to write software that supports information collection and discovery tasks. Metadata is a structured means of describing information resources. It forms the basis of digital libraries and search engines. As metadata is often called "data about data," we define meta-metadata as a formal means of describing metadata, expressed as an XML-based language. We consider the lifecycle of metadata in information collection and discovery tasks and develop a meta-metadata architecture that deals with the data structures for representing metadata inside programs, extraction from information resources, rules for presentation to users, and the logic that defines how an application needs to operate on metadata. Semantic actions for an information resource collection are steps taken to generate representative objects, including the formation of iconographic image and text surrogates, associated with metadata. The meta-metadata language serves as a layer of abstraction between information resources, power users, and application developers. A power user can enhance an existing collection visualization application by authoring meta-metadata for a new information resource without modifying the application source code. The architecture provides a set of interfaces for semantic actions, which different information discovery and visualization applications can implement according to their own requirements. Application developers can modify the implementation of these semantic actions to change the behavior of their application, regardless of the information resource. We have used our architecture in combinFormation, an information discovery and collection visualization application, and validated it through a user study.

Texas A&M Repository