
    Controlled vocabulary and social indexing of architecture images: a knowledge organization system in a collaborative environment

    This paper reports research on the development of a controlled vocabulary for a collaborative web environment that supports social indexing, in which both personal and institutional users tag the images they post. Created for the preservation and dissemination of images of Brazilian architecture, Arquigrafia is also a social network of students, teachers, researchers, professionals, and others interested in photographs of architecture and urban spaces. It was therefore necessary to analyze the list of tags in order to improve indexing consistency, seeking a conceptual organization of domain terms through the application of terminological methodology, and aiming to align the terms of the vocabulary under construction with other knowledge organization systems.
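The abstract does not describe how the vocabulary is applied to incoming tags. As a rough illustration only, the sketch below assumes the vocabulary records equivalence (USE/UF-style) relations as a plain mapping from non-preferred forms to preferred terms, so that variant tags collapse onto one term. The `USE` entries and the function names are hypothetical and are not taken from Arquigrafia.

```python
from collections import Counter

# Hypothetical controlled-vocabulary fragment: non-preferred form -> preferred term.
# These entries are invented for illustration; they are not Arquigrafia's vocabulary.
USE = {
    "modernism": "modern architecture",
    "arquitetura moderna": "modern architecture",
    "staircase": "stairs",
    "stair": "stairs",
}


def normalize_tag(tag: str, use: dict[str, str]) -> str:
    """Case-fold a tag, collapse whitespace, and map it to its preferred term if one exists."""
    key = " ".join(tag.lower().split())
    return use.get(key, key)


def normalize_all(tags: list[str], use: dict[str, str]) -> Counter:
    """Count tag occurrences after normalization to preferred terms."""
    return Counter(normalize_tag(t, use) for t in tags)


if __name__ == "__main__":
    raw_tags = ["Modernism", "modern architecture", "Staircase", "stairs ", "concrete"]
    print(normalize_all(raw_tags, USE))
```

Here five raw tags reduce to three index terms, which is the kind of consistency gain a controlled vocabulary aims for; tags with no vocabulary entry simply pass through in normalized form.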

    Crowdsourcing for image metadata: a comparison between game-generated tags and professional descriptors

    One way to address the challenge of creating metadata for digitized image collections is to rely on user-created index terms, typically by harvesting tags from the collaborative information services known as folksonomies or by allowing users to tag directly in the catalog. An alternative method, only recently applied in cultural heritage institutions, is Human Computation Games, a crowdsourcing tool that relies on user agreement to create valid tags. This study contributes to this line of research by investigating tags (at various degrees of validation) generated by a Human Computation Game and comparing them to descriptors assigned to the same images by professional indexers. The analysis classifies tags and descriptors by term category and measures the overlap between tags and descriptors at both the syntactic level (matching on terms) and the semantic level (matching on meaning). The findings show that validated tags tend to describe ‘artifacts/objects’ and that game-generated tags typically represent what is in the picture rather than what it is about. Descriptors also belonged primarily to this term category but included a substantial number of ‘Proper nouns’, mainly named locations. Tags generated by the game but not validated by player agreement had a higher frequency of ‘subjective/narrative’ tags, but also more errors. The exact (character-for-character) overlap, i.e. the number of common terms relative to the entire pool of tags and descriptors, was slightly less than 5% for all types of tags. Extending the analysis to fuzzy (word-stem) matching more than doubled the overlap. The semantic overlap was established through thesaurus relations between a sample of tags and descriptors, and adopting this more inclusive view of overlap increased the percentage of tags matched to descriptors.
More than half of the validated tags had some thesaurus relation to a descriptor added by a professional indexer. Approximately 60% of the thesaurus relations between descriptors and valid tags were either ‘same’ or ‘equivalent’, roughly 20% were associative, and roughly 20% were hierarchical. For the hierarchical relations, tags typically described images at a less specific level than descriptors.
Joint Master Degree in Digital Library Learning (DILL)
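The exact and fuzzy overlap measures described in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the "entire pool" is taken to be the union of distinct tags and descriptors, and `naive_stem` is a hypothetical crude suffix stripper standing in for whatever stemmer the study used, which the abstract does not name. The sample terms are invented.

```python
def naive_stem(term: str) -> str:
    """Crude suffix stripper (hypothetical stand-in for a real stemmer); also lowercases."""
    term = term.lower().strip()
    for suffix in ("ings", "ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def overlap(tags, descriptors, key=lambda term: term):
    """Common terms as a share of the pool of distinct terms (assumed: the union).

    With the default identity key this is exact, character-for-character matching;
    passing key=naive_stem gives a rough fuzzy (word-stem) match.
    """
    t = {key(x) for x in tags}
    d = {key(x) for x in descriptors}
    pool = t | d
    return len(t & d) / len(pool) if pool else 0.0


if __name__ == "__main__":
    tags = ["church", "tower", "boats", "sky"]             # invented game tags
    descriptors = ["churches", "towers", "boats", "Oslo"]  # invented descriptors
    print(round(overlap(tags, descriptors), 3))            # exact: 1 of 7 terms -> 0.143
    print(overlap(tags, descriptors, key=naive_stem))      # fuzzy: 3 of 5 stems -> 0.6
```

In this toy example stemming raises the overlap sharply, which matches the direction (not the magnitude) of the effect reported above. A real analysis would use an established stemmer such as the Porter stemmer rather than this suffix list.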

    Organizing scientific data sets: studying similarities and differences in metadata and subject term creation

    According to Salo, the metadata entered into repositories are disorganized and the metadata schemes underlying repositories are arcane. This creates a challenging repository environment with regard to personal information management (PIM) and knowledge organization systems (KOSs). This dissertation research is a step toward addressing the need to study the information organization of scientific data in more detail. METHODS: A concurrent triangulation mixed methods approach was used to study how information professionals and scientists create descriptive metadata and apply subject terms when working with two data sets (the bird data set and the hunting data set). Quantitative and qualitative methods were combined during study design, data collection, and analysis. RESULTS: A total of 27 participants (11 information professionals and 16 scientists) took part in this study. The descriptive metadata results indicate that information professionals were more likely to use standardized metadata schemes. Scientists did not use library-based standards to organize data in their own collections, and nearly all of them mentioned how central software was to their overall data organization processes. The subject term results suggest that the Integrated Taxonomic Information System (ITIS) was the best vocabulary for describing scientific names, while Library of Congress Subject Headings (LCSH) was best for describing topical terms. The two groups applied 45 topical terms to the bird data set and 49 topical terms to the hunting data set. Term overlap, meaning the same terms were applied by both groups, was close to 25% for each data set (27% for the bird data set and 24% for the hunting data set). Unique terms, those applied by only one of the two groups, were more widely dispersed. CONCLUSIONS: While there were similarities between the two groups, the differences were the most apparent.
Based on this research, it is recommended that general repositories use metadata created by information professionals, while domain-specific repositories use metadata created by scientists.