Searching and Visualization of References in Research Documents
This research aims to develop an information retrieval module that can trace references from the bibliography entries of research documents, specifically those that follow the writing guidelines of Bogor Agricultural University (IPB). A total of 242 research documents in PDF format from the Department of Computer Science, IPB, were used to generate parsing patterns for extracting bibliography entries. Using a modified version of ParaTools, bibliography entries were extracted automatically from text files generated from the PDF files. The entries are stored in a database that is used to visualize author relationships as graphs. The module is supplemented by an information retrieval system based on the Sphinx search system and also provides information on authors’ publications and citations. Evaluation showed that (1) bibliography entry extraction missed only 5.37% of bibliography entries, owing to incorrect bibliography formatting, (2) 91.54% of bibliography entry attributes were identified correctly, and (3) 90.31% of entries were successfully connected to other documents.
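As a rough illustration of what a parsing pattern for bibliography entries might look like, the sketch below splits one entry into attributes with a regular expression. It is not the modified ParaTools pipeline described in the abstract. The entry layout ("Authors. Year. Title. Rest."), the pattern, and the function name `parse_entry` are all assumptions made for this example.

```python
import re

# Hypothetical entry layout: "Authors. Year. Title. Rest."
# The real IPB patterns are not given in the abstract; this is an assumption.
ENTRY_PATTERN = re.compile(
    r"^(?P<authors>.+?)\.\s+"      # author list, up to the period before the year
    r"(?P<year>(?:19|20)\d{2})\.\s+"  # four-digit publication year
    r"(?P<title>.+?)\.\s*"         # title, up to the next period
    r"(?P<rest>.*)$"               # publisher, journal, pages, and so on
)


def parse_entry(line):
    """Return a dict of entry attributes, or None if the line does not fit the pattern."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


entry = parse_entry(
    "Manning CD, Raghavan P. 2008. Introduction to Information Retrieval. "
    "Cambridge: Cambridge University Press."
)
```

An entry that does not match the assumed layout returns `None`, which mirrors the abstract's observation that incorrectly formatted entries are the ones the extraction misses.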
Extraction of Citations Contained in Patent Documents
This article is part of a broader effort to develop analysis tools and methods for characterizing scientific and technical activity. The number of digital scientific publications keeps growing. We focus here on the automatic identification and extraction of citations and references contained in English-language patent documents. The method relies on a symbolic approach that combines the creation and use of electronic dictionaries and local grammars. The corpus-processing tool Unitex is used to build these linguistic resources and apply them to a study corpus.
Automatic construction of a TMF Terminological Database using a transducer cascade
The automatic development of terminological databases, especially in a standardized format, is crucial for many applications related to technical and scientific knowledge that require semantic and terminological descriptions covering multiple domains. In this context, we face two challenges: the first is the automatic extraction of terms in order to build a terminological database, and the second is their normalization into a standardized format. To address these challenges, we propose an approach based on a cascade of transducers run with the CasSys tool of the Unitex platform, which benefits both from the success of the rule-based approach to term extraction and from the performance of the TMF standard for term representation. We tested and evaluated our approach on Arabic scientific and technical documents from the elevator domain, and the results are very encouraging.
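The idea of a cascade, in which each stage rewrites the text and the next stage works on the annotated output, can be sketched in a few lines. This is only an analogy built on regular expressions. It does not use Unitex or CasSys, it works on English rather than Arabic, and the word lists, tag names, and function names are invented for this example. Normalization into TMF is not shown.

```python
import re

# Hypothetical mini-lexicon; a real system would rely on Unitex dictionaries.
MODIFIERS = ["traction", "landing", "hoist"]
HEAD_NOUNS = ["cable", "door", "motor"]


def tag_words(text):
    """Stage 1: wrap known modifiers and head nouns in tags."""
    text = re.sub(r"\b(%s)\b" % "|".join(MODIFIERS), r"<MOD>\1</MOD>", text)
    text = re.sub(r"\b(%s)\b" % "|".join(HEAD_NOUNS), r"<N>\1</N>", text)
    return text


def group_terms(text):
    """Stage 2: a tagged modifier followed by a tagged noun becomes a candidate term."""
    return re.sub(r"<MOD>(\w+)</MOD> <N>(\w+)</N>", r"<TERM>\1 \2</TERM>", text)


def run_cascade(text, stages=(tag_words, group_terms)):
    """Apply each stage in order; every stage sees the previous stage's output."""
    for stage in stages:
        text = stage(text)
    return text


def extract_terms(text):
    """Run the cascade and collect the candidate terms it marked."""
    return re.findall(r"<TERM>(.*?)</TERM>", run_cascade(text))


terms = extract_terms("the traction motor drives the hoist cable")
```

The second stage only fires on tags produced by the first, which is the property that makes the order of stages in a cascade matter.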
Meta-Metadata: An Information Semantic Language and Software Architecture for Collection Visualization Application
Information collection and discovery tasks involve aggregation and manipulation
of information resources. An information resource is a location from which a human
gathers data to contribute to his/her understanding of something significant. Repositories
of information resources include the Google search engine, the ACM Digital Library,
Wikipedia, Flickr, and IMDB. Information discovery tasks involve having new ideas in
contexts of information collecting.
The information one needs to collect is large and diverse and hard to keep track
of. The heterogeneity and scale also make it difficult to write software to support
information collection and discovery tasks. Metadata is a structured means for
describing information resources. It forms the basis of digital libraries and search
engines.
Metadata is often called "data about data"; we define meta-metadata as a
formal, XML-based language for describing metadata. We consider the
lifecycle of metadata in information collection and discovery tasks and develop a
meta-metadata architecture that deals with the data structures for representing metadata
inside programs, extraction from information resources, rules for presentation to users,
and logic that defines how an application needs to operate on metadata. Semantic
actions for an information resource collection are steps taken to generate representative
objects, including the formation of iconographic image and text surrogates, associated with
metadata.
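To make the idea of describing metadata declaratively more concrete, here is a minimal sketch in which a small XML declaration drives both extraction and presentation. It is not the actual meta-metadata language. The element names, attribute names, sample page, and function names are all hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration: each field names a metadata attribute, a path for
# extracting it from a resource, and a label for presenting it to users.
META_METADATA = """
<meta_metadata name="photo_page">
  <field name="title" path=".//h1" label="Title"/>
  <field name="author" path=".//span[@class='owner']" label="Photographer"/>
</meta_metadata>
"""

# Hypothetical information resource (a well-formed page).
PAGE = """
<html><body>
  <h1>Sunset over Bogor</h1>
  <span class="owner">A. Example</span>
</body></html>
"""


def extract_metadata(declaration_xml, resource_xml):
    """Extract the declared fields from a resource; missing fields map to None."""
    declaration = ET.fromstring(declaration_xml.strip())
    resource = ET.fromstring(resource_xml.strip())
    metadata = {}
    for field in declaration.findall("field"):
        node = resource.find(field.get("path"))
        has_text = node is not None and node.text is not None
        metadata[field.get("name")] = node.text.strip() if has_text else None
    return metadata


def present(declaration_xml, metadata):
    """Render the metadata with the labels given in the declaration."""
    declaration = ET.fromstring(declaration_xml.strip())
    return [
        "%s: %s" % (field.get("label"), metadata[field.get("name")])
        for field in declaration.findall("field")
    ]


metadata = extract_metadata(META_METADATA, PAGE)
lines = present(META_METADATA, metadata)
```

In this sketch, supporting a new kind of page means writing a new declaration string; `extract_metadata` and `present` stay unchanged.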
The meta-metadata language serves as a layer of abstraction between information
resources, power users, and application developers. A power user can enhance an
existing collection visualization application by authoring meta-metadata for a new
information resource without modifying the application source code. The architecture
provides a set of interfaces for semantic actions which different information discovery
and visualization applications can implement according to their own custom
requirements. Application developers can modify the implementation of these semantic
actions to change the behavior of their application, regardless of the information
resource.
We have used our architecture in combinFormation, an information discovery
and collection visualization application, and validated it through a user study.