6 research outputs found

    Creating a Canonical Scientific and Technical Information Classification System for NCSTRL+

    The purpose of this paper is to describe the new subject classification system for the NCSTRL+ project. NCSTRL+ is a canonical digital library (DL) based on the Networked Computer Science Technical Report Library (NCSTRL). The current NCSTRL+ classification system uses the NASA Scientific and Technical Information (STI) subject classifications, which are biased toward the aerospace, aeronautics, and engineering disciplines. Examination of other scientific and technical information classification systems showed similar discipline-centric weaknesses. Traditional, library-oriented classification systems represented all disciplines, but were too generalized to serve the needs of a scientifically and technically oriented digital library. The lack of a suitable existing classification system led to the creation of a lightweight, balanced, general classification system that allows more specialized classification schemes to be mapped into the new framework. We have developed the following classification system to give equal weight to all STI disciplines while remaining compact and lightweight.

    Augmenting Dublin Core digital library metadata with Dewey Decimal Classification

    Purpose – The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query. Design/methodology/approach – The approach involves creating new Dewey Decimal Classification (DDC) terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records. Findings – The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies. Research limitations/implications – The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity. Practical implications – The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, to enhance recall or precision. Social implications – The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries. Originality/value – The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents, and thus approximates the practices of a human cataloger.
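
The abstract above mentions two steps that a small example may make more concrete: aggregating lower-level DDC matches to broader parents, and scoring ranked classes by mean reciprocal rank. The following is only a minimal sketch, not the paper's pipeline. It assumes three-digit DDC notation and a simple additive roll-up, and the function names `aggregate_to_parents` and `mean_reciprocal_rank` are hypothetical.

```python
from collections import defaultdict


def aggregate_to_parents(scores):
    """Roll each DDC class score up to its broader parents.

    Assumption: three-digit DDC notation, so '512' has the parents
    '510' (division) and '500' (main class). Summing the scores is
    an illustrative choice, not the method used in the paper.
    """
    totals = defaultdict(float)
    for ddc, score in scores.items():
        # A set avoids counting a class twice when it is its own parent.
        for level in {ddc, ddc[:2] + "0", ddc[0] + "00"}:
            totals[level] += score
    return dict(totals)


def mean_reciprocal_rank(ranked_lists, gold):
    """Average 1/rank of the correct class; a miss contributes 0."""
    total = 0.0
    for ranking, answer in zip(ranked_lists, gold):
        if answer in ranking:
            total += 1.0 / (ranking.index(answer) + 1)
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical key-term scores for one record.
    print(aggregate_to_parents({"512": 2.0, "516": 1.0, "530": 0.5}))
    # Hypothetical generated rankings versus a librarian's classes.
    print(mean_reciprocal_rank(
        [["510", "500"], ["004", "005"], ["100"]],
        ["510", "005", "200"],
    ))
```

A correct class at rank 1 contributes 1, at rank 2 contributes 1/2, and a missing class contributes 0, which is why deeper (more specific) DDC levels tend to pull the mean down when they are harder to match.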

    AN EVOLUTIONARY APPROACH TO BIBLIOGRAPHIC CLASSIFICATION

    This dissertation is research in the domain of information science and, specifically, the organization and representation of information. The research has implications for the classification of scientific books, especially as dissemination of information becomes more rapid and science becomes more diverse due to increases in multi-, inter-, and trans-disciplinary research, which focuses on phenomena, in contrast to traditional library classification schemes based on disciplines. The literature review indicates 1) human socio-cultural groups have many of the same properties as biological species, 2) output from human socio-cultural groups can be and has been the subject of evolutionary relationship analyses (i.e., phylogenetics), 3) library and information science theorists believe the most favorable and scientific classification for information packages is one based on common origin, but 4) library and information science classification researchers have not demonstrated a book classification based on evolutionary relationships of common origin. The research project supports the assertion that a sensible book classification method can be developed using a contemporary biological classification approach based on common origin, which has not been applied to a collection of books until now. Using a sample from a collection of digitized earth-science books, the method developed includes a text-mining step to extract important terms, which were converted into a dataset for input into the second step, the phylogenetic analysis. Three classification trees were produced and are discussed. Parsimony analysis, in contrast to distance and likelihood analyses, produced a sensible book classification tree. Also included is a comparison with a classification tree based on a well-known contemporary library classification scheme (the Library of Congress Classification). Final discussions connect this research with knowledge organization and information retrieval, information needs beyond science, and this type of research in the context of a unified science of cultural evolution.
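
The abstract above says extracted terms were converted into a dataset for phylogenetic analysis but does not say how. One common input for such analyses is a presence/absence character matrix, so the sketch below assumes that encoding. The function names and the sample books and terms are hypothetical, and the pairwise count of differing characters is shown only as a quick sanity check, not as the dissertation's parsimony method.

```python
def build_character_matrix(book_terms):
    """Encode each book as a 0/1 row over the sorted set of all terms.

    Assumption: presence/absence coding; the source does not state
    which encoding the dissertation actually used.
    """
    terms = sorted(set().union(*book_terms.values()))
    matrix = {
        book: [1 if term in present else 0 for term in terms]
        for book, present in book_terms.items()
    }
    return terms, matrix


def differing_characters(row_a, row_b):
    """Count the positions where two character rows disagree."""
    return sum(x != y for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    # Hypothetical terms extracted from three earth-science books.
    books = {
        "A": {"basalt", "fault"},
        "B": {"fault", "glacier"},
        "C": {"glacier"},
    }
    terms, matrix = build_character_matrix(books)
    print(terms)
    for name, row in matrix.items():
        print(name, row)
    print(differing_characters(matrix["A"], matrix["B"]))
```

A matrix of this shape could then be exported to whatever format a phylogenetics package expects; that export step is left out here because the source does not name the software used.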

    Perspectivas do uso do aprendizado de máquina em bibliotecas: uma revisão sistemática de literatura [Perspectives on the use of machine learning in libraries: a systematic literature review]

    Dissertation (master's), Universidade de Brasília, Faculdade de Estudos Sociais Aplicados, Departamento de Ciência da Informação e Documentação, 2022. This work presents applications of Artificial Intelligence, with emphasis on machine learning, in libraries. Its main objective is to map the benefits and impacts that machine learning can offer for the development of products and services in libraries. To meet this objective, the study is based on qualitative and quantitative research with an exploratory approach, of a pure nature, using bibliographic research. To carry out the investigation, a systematic literature review is conducted through a research protocol based on the guidelines proposed by Galvão and Ricarte (2020) for the field of Information Science, complemented by the studies of Kitchenham (2004) and Felizardo et al. (2017) for the field of Computer Science. Finally, it is concluded that this study allows the researcher to reflect on and identify new phenomena in the interdisciplinary relationships between Information Science and Artificial Intelligence.

    Classification management and use in a networked environment : the case of the Universal Decimal Classification

    In the Internet information space, advanced information retrieval (IR) methods and automatic text processing are used in conjunction with traditional knowledge organization systems (KOS). New information technology provides a platform for better KOS publishing, exploitation and sharing, both for human and machine use. Networked KOS services are now being planned and developed as powerful tools for resource discovery. They will enable automatic contextualisation, interpretation and query matching to different indexing languages. The Semantic Web promises to be an environment in which the quality of semantic relationships in bibliographic classification systems can be fully exploited. Their use in the networked environment is, however, limited by the fact that they are not prepared or made available for advanced machine processing. The UDC was chosen for this research because of its widespread use and its long-term presence in online information retrieval systems. It was also the first system to be used for the automatic classification of Internet resources, and the first to be made available as a classification tool on the Web. The objective of this research is to establish the advantages of using UDC for information retrieval in a networked environment, to highlight the problems of automation and classification exchange, and to offer possible solutions. The first research question is whether there is enough evidence of the use of classification on the Internet to justify further development with this particular environment in mind. The second is what the automation requirements are for the full exploitation of UDC and its exchange. The third is which areas need improvement and what specific recommendations can be made for implementing the UDC in a networked environment. A summary of changes required in the management and development of the UDC to facilitate its full adaptation for future use is drawn from this analysis. EThOS - Electronic Theses Online Service, United Kingdom.