50,558 research outputs found

    A elaboração e uso de vocabulário controlado em sistema de gerenciamento eletrônico de informações arquivísticas: a experiência da Embrapa Soja.

    This study describes the methodology used to develop a controlled vocabulary for content representation in an electronic records management system at Embrapa Soja, from the perspective of integrated archival science. It reports the benefits of using the controlled vocabulary to organize, process, and retrieve information in the system, and it covers the conceptual issues surrounding controlled vocabularies and GED (Gerenciamento Eletrônico de Informações, electronic information management) systems. Information search improved significantly after the controlled vocabulary was adopted, because it standardizes the terms that represent the informational content of the documents.

    The SpatialCIM methodology for spatial document coverage disambiguation and the entity recognition process aided by linguistic techniques.

    Abstract. Users increasingly take the geographical location of documents into account during information retrieval. However, conventional information retrieval systems based on keyword matching do not consider which words can represent geographical entities that are spatially related to other entities in the document. This paper presents the SpatialCIM methodology, which consists of three steps: pre-processing, data expansion, and disambiguation. In the pre-processing step, entity recognition is carried out with the support of the Rembrandt tool. The paper also compares how well the Rembrandt tool and a controlled vocabulary of Brazilian geographic locations discover location entities in text. The comparison uses a set of geographically labeled Portuguese-language news items about sugar cane cultivation. The F-measure for the Rembrandt tool rose from 45% without disambiguation to 50% after disambiguation, and from 35% to 38% for the controlled vocabulary. The Rembrandt tool also shows only a minimal difference between precision and recall, whereas the controlled vocabulary always yields the highest recall values. GeoDoc 2012, PAKDD 2012
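    For readers unfamiliar with the metrics reported above, the following minimal sketch shows how precision, recall, and the F-measure relate for an entity recognition task. It is not the SpatialCIM evaluation code; the function names and the place names in the example are illustrative assumptions.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of predicted entities against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical example: locations annotated in a news item vs. locations found by a tool.
gold = {"Piracicaba", "Ribeirão Preto", "São Paulo", "Sertãozinho"}
predicted = {"Piracicaba", "São Paulo", "Campinas"}

p, r = precision_recall(predicted, gold)
f1 = f_measure(p, r)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

    Because the F-measure is a harmonic mean, a system with high recall but low precision (as the abstract suggests for the controlled vocabulary) can still score below a system whose precision and recall are balanced.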

    Converting a Controlled Vocabulary into an Ontology: the Case of GEM

    The prevalence of digital information has raised questions about the suitability of conventional library tools for organizing information. The multi-dimensionality of digital resources requires a more versatile and flexible representation to accommodate intelligent information representation and retrieval. Ontologies are used to address such issues in many application domains, mainly because they can explicitly specify semantics and relations and express them in a computer-understandable language. Conventional knowledge organization tools such as classifications and thesauri resemble ontologies in that they define concepts and relationships systematically, but they are less expressive than ontologies when it comes to machine language. This paper uses the controlled vocabulary of the Gateway to Educational Materials (GEM) as an example to address the issues in representing digital resources. The theoretical and methodological framework in this paper serves as the rationale and guideline for converting the GEM controlled vocabulary into an ontology. Compared with the original semantic model of the GEM controlled vocabulary, the major difference between the two models lies in the value added through deeper semantics in describing digital objects, both conceptually and relationally.

    CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT

    Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency on the indexing process, it has been shown that using MeSH indices yields only a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that results in a 37.5% increase in precision when compared to free-text indexing systems. The presented system uses the ontology to provide an alternative to text representation for medical articles, to find relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and free-text information retrieval systems. This dissertation provides a proof of concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE.

    Enhancing a Taxonomy for Health Information Technology: An Exploratory Study of User Input Towards Folksonomy

    The U.S. Agency for Healthcare Research and Quality (AHRQ) has created a public website to disseminate critical information regarding its health information technology initiative. The website is maintained by AHRQ's National Resource Center (NRC) for Health Information Technology. In its latest continuous quality improvement project, the NRC used the site's search logs to extract user-generated search phrases. The phrases were then compared to the site's controlled vocabulary with respect to language, grammar, and search precision. The results of the comparison demonstrate that search log data can be a cost-effective way to improve controlled vocabularies as well as information retrieval. User-entered search phrases were also found to share many similarities with folksonomy tags.

    Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?

    The popularity of social tagging has sparked a great deal of debate on whether tags could replace or improve upon professional metadata as descriptors of books and other information objects. In this paper we present a large-scale empirical comparison of the contributions of individual information elements, such as core bibliographic data, controlled vocabulary terms, reviews, and tags, to retrieval performance. Our comparison uses a test collection of over 2 million book records with information elements from Amazon, the British Library, the Library of Congress, and LibraryThing. We find that tags and controlled vocabulary terms do not consistently outperform each other, but seem to provide complementary contributions: some information needs are best addressed using controlled vocabulary terms, whereas others are best addressed using tags.

    Cross-concordances: terminology mapping and its effectiveness for information retrieval

    The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which concluded in 2007. The task of this initiative was to organize, create, and manage 'cross-concordances' between controlled vocabularies (thesauri, classification systems, subject heading lists), centred on the social sciences but quickly extending to other subject areas. In total, 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation effort was conducted to test and measure the effectiveness of the vocabulary mappings in an information system environment. The paper reports on the cross-concordance work and the evaluation results. Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200

    Thesaurus in an Automated Information Retrieval System

    The world of information retrieval has changed dramatically, with an immense increase in the availability of searchable full text and of powerful search engines. Still, search engines retrieve more junk than pinpointed information, which calls for more efficient retrieval tools. Hence it is reasonable to ask whether there is any place left for thesauri in the new information retrieval environment. Meanwhile, the use of machine-aided or even automatic indexing has also been raising demand for thesauri. A thesaurus is a tool designed to help users find their way around the vocabulary of a database. In addition to its traditional use as an authority for the terms used in indexing the database, it offers reminders of terms the user might not even have considered. In the modern context a thesaurus could be extremely useful for providing controlled access to large collections of text and unstructured information, and for helping search engines achieve greater precision. Intelligent retrieval systems, which integrate statistical and semantic information to retrieve more useful results, could make use of an extensive thesaurus of word types and relationships. Producers who are concerned with providing standardized subject access to their resources can also use thesauri to determine the content of the element(s) allocated to subject metadata. Certain fundamental problems in the basic design of thesauri make them less than optimally useful for more powerful retrieval scenarios. Still, there are both pragmatic and philosophical reasons to be positive about the need for continued use of thesauri, and it is hoped that controlled vocabularies will be the foundation of next-generation web sites and intranets.
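    The two thesaurus roles described above, acting as an authority for indexing terms and reminding users of terms they might not have considered, can be illustrated with a minimal sketch. It assumes a simple in-memory thesaurus with preferred, non-preferred ("use for"), narrower, and related terms; the class, function names, and sample entry are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    preferred: str                                   # authorized indexing term
    use_for: set[str] = field(default_factory=set)   # non-preferred synonyms
    narrower: set[str] = field(default_factory=set)  # more specific terms
    related: set[str] = field(default_factory=set)   # associative links


def build_lookup(entries: list[ThesaurusEntry]) -> dict[str, ThesaurusEntry]:
    """Map every preferred and non-preferred term (lowercased) to its entry."""
    lookup: dict[str, ThesaurusEntry] = {}
    for entry in entries:
        for term in {entry.preferred, *entry.use_for}:
            lookup[term.lower()] = entry
    return lookup


def expand_query(term: str, lookup: dict[str, ThesaurusEntry]) -> dict[str, list[str]]:
    """Normalize a user term to its preferred form and collect expansion terms.

    'search' holds the preferred term plus narrower terms; 'suggest' holds
    related terms offered as reminders. Unknown terms pass through unchanged.
    """
    entry = lookup.get(term.strip().lower())
    if entry is None:
        return {"search": [term], "suggest": []}
    return {
        "search": [entry.preferred, *sorted(entry.narrower)],
        "suggest": sorted(entry.related),
    }


# Hypothetical sample entry for illustration only.
entries = [
    ThesaurusEntry(
        preferred="soybeans",
        use_for={"soya beans", "soja"},
        narrower={"soybean meal", "soybean oil"},
        related={"legumes"},
    )
]
lookup = build_lookup(entries)
result = expand_query("Soja", lookup)
print(result)
```

    Searching on the preferred term rather than the user's wording favors precision, while adding narrower terms favors recall; a real system would tune that trade-off.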