Search CORE

10 research outputs found

Building a DDC Annotated Corpus from OAI Metadata

Author: Horstmann Wolfram
Lösch Mathias
Mehler Alexander
Waltinger Ulli
Publication venue: International Conference on Open Repositories : Proceedings
Publication date: 31/12/2010

A frequently overlooked benefit of open access publications is that they are an easily accessible and cost-effective data source for research disciplines such as text mining, natural language processing, or computational linguistics. In those fields, linguistic data is usually managed in the form of corpora, i.e. machine-readable bodies of text that represent a particular variety of language

BieColl - Bielefeld eCollections

The Role of Classification Information in Open Access Repositories. Current status and future directions

Author: Pieper Dirk
Summann Friedrich
Publication date: 11/09/2015

KITopen

Minería de texto en la determinación automática de código Dewey : Una primera aproximación

Author: Araya Jorge Matías
Klenzi Raúl O.
Publication date: 01/05/2014

This paper proposes a first approach to automating the assignment of Dewey codes to bibliographic material, using machine learning and text mining techniques. At the Emiliano Pedro Aparicio library of the Facultad de Ciencias Exactas, Físicas y Naturales of the Universidad Nacional de San Juan (FCEFN-UNSJ), classification under the Dewey Decimal Classification system (CDD in Spanish) is carried out manually. The purpose of this work is to present a first stage of automating that process, by means of segmentation tasks and syntactic similarity measures, so that an appropriate code can be assigned to material recently acquired by the library. The application is built with the free software tool RapidMiner (RM) 5.3.015, under the AGPL version 3.0 license. Track: Databases and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI

Servicio de Difusión de la Creación Intelectual

Minería de texto en la determinación automática de código Dewey : Una primera aproximación

Author: Araya Jorge Matías
Klenzi Raúl O.
Publication date: 09/10/2014

Minería de texto en la determinación automática de código Dewey : Una primera aproximación

Author: Araya Jorge Matías
Klenzi Raúl O.
Publication date: 01/05/2014

Augmenting Dublin Core digital library metadata with Dewey Decimal Classification

Author: Ahn Jae-Wook
Binding Ceri
Jones Hilary
Khoo Michael
Lin Xia
Massam Diana
Tudhope Douglas
Publication venue: Emerald
Publication date: 14/09/2015

Purpose – The purpose of this paper is to describe a new approach to a well-known problem for digital libraries, how to search across multiple unrelated libraries with a single query. Design/methodology/approach – The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records. Findings – The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies. Research limitations/implications – The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity. Practical implications – The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, recall or precision enhancing. Social implications – The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries. Originality/value – The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.
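The mean reciprocal ranking evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the exact-match comparison, and the sample DDC numbers below are all assumptions made for illustration, and the paper's own matching rules (for example, credit for matches at broader hierarchy levels) may differ.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """Return 1/rank of the first ranked class equal to `gold`, or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results: Sequence[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over (ranked classes, librarian class) pairs."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in results) / len(results)


# Hypothetical sample: generated DDC rankings paired with a librarian-assigned class.
sample: List[Tuple[List[str], str]] = [
    (["020", "004", "025"], "020"),  # gold class ranked first
    (["510", "004", "006"], "004"),  # gold class ranked second
    (["330", "650", "658"], "025"),  # gold class not in the ranking
]

mrr = mean_reciprocal_rank(sample)
print(f"MRR = {mrr:.3f}")
```

A score near 1 means the librarian's class tends to appear at the top of the generated ranking, while a score near 0 means it appears far down or not at all.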

Crossref

University of South Wales Research Explorer

Building a DDC-annotated Corpus from OAI Metadata

Author: Horstmann Wolfram
Lösch Mathias
Mehler Alexander
Waltinger Ulli
Publication venue: University of Southampton, Multimedia Research Group
Publication date: 31/12/2010

Lösch M, Waltinger U, Horstmann W, Mehler A. Building a DDC-annotated Corpus from OAI Metadata. Journal of Digital Information. 2011;12(2)

BieColl - Bielefeld Electronic Collections

Publications at Bielefeld University

BieColl - Bielefeld eCollections

Journal of Digital Information (Texas Digital Library - TDL E-Journals)

Sjednocování věcného popisu agregovaných záznamů v repozitáři NUŠL: Unification of Subject Description of Aggregated Records in National Repository of Grey Literature

Publication date: 15/11/2016

The diploma thesis focuses on methods for unifying subject description in records aggregated from different sources in digital repositories, using the example of the National Repository of Grey Literature (NRGL). After presenting experience from the foreign systems BASE and LASSO, I describe current practice in NRGL, where automatic indexing is used to assign each record unified subject headings from the Polythematic Structured Subject Heading System (PSSHS). The thesis then presents how the MeSH thesaurus and the Conspectus categorization scheme were mapped to PSSHS. These mappings were applied to records harvested into NRGL from the National Medical Library. The aim of the experiment was to compare the subject description consisting of PSSHS subject headings assigned through the mappings with the subject description created by automatic indexing. In addition, I explore the possibility of mapping author keywords that describe academic theses in records from the repositories of cooperating universities.

NTK Institutional Digital Repository