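Vocabulário controlado e indexação social de imagens de arquitetura: um sistema de organização do conhecimento em ambiente colaborativo [Controlled vocabulary and social indexing of architecture images: a knowledge organization system in a collaborative environment]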
This paper reports on research carried out to develop a controlled vocabulary in a collaborative web environment that allows social indexing through the tagging of images posted by both personal and institutional users. Created for the preservation and dissemination of images of Brazilian architecture, Arquigrafia is also a social network formed by students, teachers, researchers, professionals and others interested in photography of architecture and urban spaces. It was therefore necessary to analyze the list of tags in order to improve indexing consistency, seeking a conceptual organization of the domain terms through the application of terminological methodology, and also aiming to align the terms of the vocabulary under construction with other knowledge organization systems.
Ontological realism, concepts and classification in molecular biology: Development and application of the gene ontology
Purpose – The purpose of this article is to evaluate the development and use of the gene ontology (GO), a scientific vocabulary widely used in molecular biology databases, with particular reference to the relation between the theoretical basis of the GO, and the pragmatics of its application.
Design/methodology/approach – The study uses a combination of bibliometric analysis, content analysis and discourse analysis. These analyses focus on details of the ways in which the terms of the ontology are amended and deleted, and in which they are applied by users.
Findings – Although the GO is explicitly based on an objective realist epistemology, considerable subjectivity and the influence of social factors are evident in its development and use. It is concluded that bio-ontologies could beneficially be extended to be pluralist, while remaining objective, taking a view of concepts closer to that of more traditional controlled vocabularies.
Originality/value – This is one of very few studies which evaluate the development of a formal ontology in relation to its conceptual foundations, and the first to consider the GO in this way.
Crowdsourcing for image metadata: a comparison between game-generated tags and professional descriptors
One way to address the challenge of creating metadata for digitized image collections is to rely on user-created index terms, typically by harvesting tags from the collaborative information services known as folksonomies or by allowing the users to tag directly in the catalog. An alternative method, only recently applied in cultural heritage institutions, is Human Computation Games, a crowdsourcing tool that relies on user-agreement to create valid tags.
This study contributes to the research by investigating tags (at various degrees of validation) generated by a Human Computation Game and comparing them to descriptors assigned to the same images by professional indexers. The analysis classifies tags and descriptors by term-category and measures the overlap between tags and descriptors at both the syntactic (matching on terms) and semantic (matching on meaning) levels.
The findings show that validated tags tend to describe ‘artifacts/objects’ and that game-generated tags typically represent what is in the picture, rather than what it is about. Descriptors also primarily belonged to this term-category but included a substantial number of ‘Proper nouns’, mainly named locations. Tags generated by the game but not validated by player-agreement had a higher frequency of ‘subjective/narrative’ tags, but also more errors.
The exact (character-for-character) overlap, i.e. the number of common terms relative to the entire pool of tags and descriptors, was slightly less than 5% for all types of tags. Extending the analysis to include fuzzy (word-stem) matching more than doubled the overlap.
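As a rough illustration of the overlap measure described above (common terms divided by the entire pool of tags and descriptors), the following Python sketch computes an exact and a fuzzy overlap. It is not the study's implementation: the function names, the sample terms, and the crude suffix-stripping `naive_stem` (a stand-in for a real word stemmer) are all assumptions made for this example.

```python
def naive_stem(term: str) -> str:
    """Crude suffix stripping per word; a hypothetical stand-in for a real stemmer."""
    words = []
    for word in term.lower().split():
        for suffix in ("ing", "es", "s"):
            # Strip at most one suffix and keep at least three characters.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        words.append(word)
    return " ".join(words)


def overlap(tags, descriptors, key=str) -> float:
    """Share of common terms in the combined pool of tags and descriptors.

    With the default key, terms must match character for character.
    Passing key=naive_stem compares word stems instead (fuzzy matching).
    """
    tag_keys = {key(t) for t in tags}
    descriptor_keys = {key(d) for d in descriptors}
    pool = tag_keys | descriptor_keys
    if not pool:
        return 0.0
    return len(tag_keys & descriptor_keys) / len(pool)


# Invented sample data, for illustration only.
tags = ["boats", "harbour", "men", "fishing"]
descriptors = ["boat", "harbour", "fishermen", "Bergen"]

exact = overlap(tags, descriptors)                  # only "harbour" matches
fuzzy = overlap(tags, descriptors, key=naive_stem)  # "boats" now matches "boat"
print(f"exact overlap: {exact:.1%}, fuzzy overlap: {fuzzy:.1%}")
```

In this invented sample, stemming lets "boats" match "boat", so the fuzzy overlap is higher than the exact one; the sample is far too small to say anything about the percentages reported in the study.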
The semantic overlap was established using thesaurus relations between a sample of tags and descriptors; adopting this more inclusive view of overlap increased the percentage of tags that were matched to descriptors. More than half of the validated tags had some thesaurus relation to a descriptor added by a professional indexer. Approximately 60% of the thesaurus relations between descriptors and valid tags were either ‘same’ or ‘equivalent’, roughly 20% were associative and 20% were hierarchical. For the hierarchical relations, it was found that tags typically describe images at a less specific level than descriptors.
Joint Master Degree in Digital Library Learning (DILL)
The classification of gene products in the molecular biology domain: Realism, objectivity, and the limitations of the Gene Ontology
Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology.
Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented.
Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology.
Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. Authors exhibited poor compliance with GO data citation policies, and in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve.
Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations of the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled ‘open ontologies’ for gene products that can work alongside more traditional, ‘top-down’ developed vocabularies.
Organizing scientific data sets: studying similarities and differences in metadata and subject term creation
According to Salo, the metadata entered into repositories are disorganized and the metadata schemes underlying repositories are arcane. This creates a challenging repository environment with regard to personal information management (PIM) and knowledge organization systems (KOSs). This dissertation research is a step towards addressing the need to study information organization of scientific data in more detail.
METHODS: A concurrent triangulation mixed methods approach was used to study the descriptive metadata and subject term application of information professionals and scientists when working with two data sets (the bird data set and the hunting data set). Quantitative and qualitative methods were used in combination during study design, data collection, and analysis.
RESULTS: A total of 27 participants (11 information professionals and 16 scientists) took part in this study. Descriptive metadata results indicate that information professionals were more likely to use standardized metadata schemes. Scientists did not use library-based standards to organize data in their own collections. Nearly all scientists mentioned how central software was to their overall data organization processes. Subject term application results suggest that the Integrated Taxonomic Information System (ITIS) was the best vocabulary for describing scientific names, while Library of Congress Subject Headings (LCSH) was best for describing topical terms. The two groups applied 45 topical terms to the bird data set and 49 topical terms to the hunting data set. Term overlap, meaning the same terms were applied by both groups, was close to 25% for each data set (27% for the bird data set and 24% for the hunting data set). Unique terms, those applied by only one group, were more widely dispersed.
CONCLUSIONS: While there were similarities between the two groups, it is the differences that were the most apparent. Based on this research, it is recommended that general repositories use metadata created by information professionals, while domain-specific repositories use metadata created by scientists.