    Addressing the tacit knowledge of a digital library system

    Recent surveys about Linked Data initiatives in library organizations report the experimental nature of related projects and the difficulty of re-using the data to improve library services. This paper presents an approach for managing data together with its "tacit" organizational knowledge, that is, the context in which the data originated, in order to improve the interpretation of the data's meaning. By analyzing a Digital Library system, we prototyped a method for turning data management into "semantic data management", where local system knowledge is managed as data and natively modelled as Linked Data. Semantic data management aims to curate consumers' correct understanding of Linked Datasets, leading to proper re-use.

    Expressing the tacit knowledge of a digital library system as linked data

    Library organizations have enthusiastically undertaken semantic web initiatives, in particular publishing data as linked data. Nevertheless, several surveys report the experimental nature of these initiatives and consumers' difficulty in re-using the data. These barriers hinder the use of linked datasets as an infrastructure that enhances the library and related information services. This paper presents an approach for encoding, as a Linked Vocabulary, the "tacit" knowledge of the information system that manages the data source. The objective is to improve the process of interpreting the meaning of published linked datasets. We analyzed a digital library system as a case study for prototyping the "semantic data management" method, in which data and its knowledge are natively managed according to the linked data pillars. The ultimate objective of semantic data management is to curate consumers' correct interpretation of the data and to facilitate its proper re-use. The prototype defines the ontological entities representing the knowledge of the digital library system that is stored neither in the data source nor in the existing ontologies related to the system's semantics. We present the local ontology and its matching with the existing ontologies Preservation Metadata Implementation Strategies (PREMIS) and Metadata Objects Description Schema (MODS), and we discuss linked data triples prototyped from the legacy relational database using the local ontology. We show how semantic data management can deal with inconsistency in system data, and we conclude that a specific change in the system developers' mindset is necessary for extracting and "codifying" the tacit knowledge needed to improve the data interpretation process.
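
    As a rough illustration of the triple-prototyping step this abstract describes, the sketch below lifts rows of a legacy relational database into linked data triples that mix a PREMIS class with a local ontology term. The table layout, the URIs, and the local:ingestNote property are hypothetical stand-ins, not the paper's actual mapping:

```python
# Illustrative sketch only: the table, URIs, and local ontology term are
# invented; PREMIS supplies the entity class matched by the local ontology.
import sqlite3
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, DCTERMS

LOCAL = Namespace("http://example.org/dl-ontology#")   # hypothetical local ontology
PREMIS = Namespace("http://www.loc.gov/premis/rdf/v1#")

conn = sqlite3.connect(":memory:")                     # stand-in for the legacy database
conn.execute("CREATE TABLE items (id INTEGER, title TEXT, ingest_note TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'Herbarium scans', 'batch-loaded 2009, OCR unchecked')")

g = Graph()
g.bind("local", LOCAL)
g.bind("premis", PREMIS)

for item_id, title, note in conn.execute("SELECT id, title, ingest_note FROM items"):
    subject = URIRef(f"http://example.org/item/{item_id}")
    g.add((subject, RDF.type, PREMIS.IntellectualEntity))
    g.add((subject, DCTERMS.title, Literal(title)))
    # The "tacit" organizational knowledge (here, a local ingest convention)
    # is made explicit through the local ontology rather than staying
    # implicit in the system.
    g.add((subject, LOCAL.ingestNote, Literal(note)))

print(g.serialize(format="turtle"))
```

    Encoding the ingest convention as a first-class property is what would let a consumer of the dataset recover context that otherwise lives only in the system developers' heads.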

    Provenance: from long-term preservation to query federation and grid reasoning


    Generic and adaptive metadata management framework for scientific data repositories

    Rapid technological progress has led to manifold advances in data acquisition and processing across research disciplines. This, in turn, has resulted in an immense growth of data and metadata generated by scientific experiments. Regardless of the specific field of research, scientific practice is increasingly characterized by data and metadata. As a consequence, universities, research communities, and funding agencies are intensifying their efforts to sift, store, and analyze scientific data efficiently. The essential goals of scientific data repositories are the establishment of long-term storage, access to data, the provision of data for re-use and citation, the capture of data provenance for reproducibility, and the provision of metadata, annotations, or references that convey the domain-specific knowledge necessary for interpreting the data. Scientific data repositories are highly complex systems composed of elements from different research fields, such as algorithms for data compression and long-term data archiving, frameworks for metadata and annotation management, workflow provenance and provenance interoperability between heterogeneous workflow systems, authorization and authentication infrastructures, and visualization tools for data interpretation. This thesis describes a modular architecture for a scientific data repository that supports research communities in orchestrating their data and metadata across the entire life cycle. The architecture consists of components representing four research fields. The first component is a data transfer client; it offers a generic interface for capturing data from, and accessing data in, scientific data acquisition systems. The second component is the MetaStore framework, an adaptive metadata management framework that can handle both static and dynamic metadata models. In order to handle arbitrary metadata schemas, the development of the MetaStore framework is based on the component-based dynamic composition design pattern. MetaStore is also equipped with an annotation framework for handling dynamic metadata. The third component is an extension of the MetaStore framework for the automated handling of provenance metadata for BPEL-based workflow management systems. To this end, the Prov2ONE algorithm we designed and implemented automatically translates the structure and execution traces of BPEL workflow definitions into the ProvONE provenance model. The availability of the complete BPEL provenance data in ProvONE not only enables an aggregated analysis of a workflow definition together with its execution trace, but also guarantees the compatibility of provenance data across different specification languages. The fourth component of our scientific data repository is the ProvONE Provenance Interoperability Framework (P-PIF), which guarantees the interoperability of provenance data of heterogeneous provenance models from different workflow management systems.
    P-PIF consists of two components: the Prov2ONE algorithm for SCUFL and MoML workflow specifications, and workflow-management-system-specific adapters for extracting, translating, and modelling retrospective provenance data into the ProvONE provenance model. P-PIF can translate both control flow and data flow into ProvONE. The availability of heterogeneous provenance traces in ProvONE makes it possible to compare, analyze, and query provenance data from different workflow systems. We evaluated the components of the scientific data repository presented in this thesis as follows. For the data transfer client, we examined data transfer performance with the standard protocol for nanoscopy datasets. We evaluated the MetaStore framework with regard to two aspects: first, we tested metadata ingestion and full-text search performance under different database configurations; second, we show the comprehensive coverage of MetaStore's functionality through a feature-based comparison with existing metadata management systems. For the evaluation of P-PIF, we first proved the correctness and completeness of our Prov2ONE algorithm and then evaluated the prospective ProvONE graph patterns generated by the Prov2ONE BPEL algorithm against existing BPEL control-flow patterns. To show that P-PIF is a sustainable framework that adheres to standards, we also compare the features of P-PIF with those of existing provenance interoperability frameworks. These evaluations demonstrate the superiority and the advantages of the individual components developed in this thesis over existing systems.
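
    To make the structural half of this translation concrete, the sketch below maps BPEL activities to ProvONE Programs nested in a Workflow, which is the flavor of mapping Prov2ONE automates. The BPEL snippet is a toy, the URIs are invented, and the mapping is reduced to a single construct; it is a sketch of the idea, not the published algorithm:

```python
# Toy structural translation in the spirit of Prov2ONE: every BPEL <invoke>
# becomes a ProvONE Program attached to the enclosing Workflow.
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

PROVONE = Namespace("http://purl.dataone.org/provone/2015/01/15/ontology#")
BPEL = "{http://docs.oasis-open.org/wsbpel/2.0/process/executable}"

bpel_source = """<process name="Analysis"
    xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable">
  <sequence>
    <invoke name="FetchData"/>
    <invoke name="RunModel"/>
  </sequence>
</process>"""

g = Graph()
g.bind("provone", PROVONE)

root = ET.fromstring(bpel_source)
workflow = URIRef(f"http://example.org/wf/{root.get('name')}")  # invented URI scheme
g.add((workflow, RDF.type, PROVONE.Workflow))

# Each BPEL <invoke> activity is modelled as a sub-program of the workflow.
for invoke in root.iter(f"{BPEL}invoke"):
    step = URIRef(f"http://example.org/wf/{invoke.get('name')}")
    g.add((step, RDF.type, PROVONE.Program))
    g.add((workflow, PROVONE.hasSubProgram, step))

print(g.serialize(format="turtle"))
```

    A full translation would also have to cover ports, channels, control-flow constructs, and the retrospective execution traces discussed above.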

    The Red Queen in the Repository

    One of the grand curation challenges is to secure metadata quality in the ever-changing environment of metadata standards and file formats. As the Red Queen tells Alice in Through the Looking-Glass: "Now, here, you see, it takes all the running you can do, to keep in the same place." That is, some "running" is needed to keep the metadata records in a research data repository in place and fit for long-term use. One of the main tools for adapting and keeping pace with the evolution of new standards, formats, and versions of standards in this ever-changing environment is the validation schema. Validation schemas are mainly seen as methods for checking data quality and fitness for use, but they are also important for long-term preservation. We might like to think that our present (meta)data standards and formats are made for eternity, but in reality we know that standards evolve, formats change (some even become obsolete with time), and so do our needs for storage, searching, and future dissemination for re-use. Eventually we reach a point where transformation of our archival records and migration to other formats become necessary. This also means that even if the AIPs, the Archival Information Packages, stay the same in storage, the DIPs, the Dissemination Information Packages that we want to extract from the archive, are subject to changes of format. Further, for archival information packages to be self-sustainable, as required by the OAIS model, it is important to take into account the interdependencies between individual files in the information packages. This should be done as early as the ingest and validation of the SIPs, the Submission Information Packages, and again at the points of necessary transformation/migration (from SIP to AIP, from AIP to DIP, etc.) in order to counter obsolescence. This paper investigates possible validation errors and missing elements in metadata records from three general-purpose, multidisciplinary research data repositories, Figshare, Harvard's Dataverse, and Zenodo, and explores the potential effects of these errors on future transformation to AIPs and migration to other formats within a digital archive.
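
    As a minimal sketch of the kind of schema validation the paper builds on, the example below checks a metadata record against a small XSD and reports the errors; the toy schema and record stand in for real repository schemas such as DataCite or MODS:

```python
# Validate a metadata record against an XML schema and report errors,
# the basic quality check applied at SIP ingest. Schema and record are toys.
from lxml import etree

xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="creator" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(xsd.encode()))

# A record missing the mandatory <creator>: the kind of validation error
# the paper looks for in repository metadata exports.
record = etree.fromstring(b"<record><title>Sea ice extent 1979-2020</title></record>")

if not schema.validate(record):
    for error in schema.error_log:
        print(f"line {error.line}: {error.message}")
```

    Running the same check at each transformation point (SIP to AIP, AIP to DIP) is what keeps format migrations from silently breaking records.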

    Preservation and Access in an Age of E-Science and Electronic Records: Sharing the Problem and Discovering Common Solutions

    As academic libraries grapple with the challenge of preserving their own digitized special collections, the intensification of interest in preserving other electronic content may present opportunities to collaborate with organizations on campus. This article offers a brief introduction to some of the core issues in digital preservation and suggests an orientation to the problems that can be helpful in thinking about how to join forces with others on campus.

    Provenance and digital context: contributions from information science

    The objective of this article is to discuss the concept of provenance and to highlight its importance in the digital environment from the perspective of Information Science. Methodologically, it is a qualitative and exploratory study based on a literature review of provenance in different domains. The article discusses the term provenance in several contexts, such as Archival Science, Museology, Digital Preservation, and Computing, highlighting its importance in the digital sphere. Having discussed the relevance of provenance in these different contexts, the need for more in-depth studies is highlighted, analyzing whether the instruments used to represent provenance are adequate to guarantee the veracity and inalterability of the information. The discussions presented in this article thus reveal possibilities for identifying metadata for each characteristic that provenance may present, as well as the feasibility of extending the analysis to other contexts.

    DePICT: a conceptual model for digital preservation

    Digital preservation addresses a significant threat to our cultural and economic foundation: the loss, through obsolescence, deterioration, or loss of information about how to access the contents, of access to valuable and sometimes unique information captured in digital form. Digital preservation has been defined as "the series of managed activities necessary to ensure continued access to digital materials for as long as necessary" (Jones, Beagrie, 2001/2008). This thesis develops DePICT (Digital PreservatIon ConceptualisaTion), a conceptual model of the core concepts and constraints that appear in digital preservation. This includes a conceptual model of the digital preservation domain, a top-level vocabulary for the concepts in the model, an in-depth analysis of the role of digital object properties, characteristics, and the constraints that guide digital preservation processes, and an analysis of how properties, characteristics, and constraints affect the interaction of digital preservation services. In addition, it presents a machine-interpretable XML representation of this conceptual model to support automated digital preservation tools. Previous preservation models have focused on preserving the technical properties of digital files. Such an approach limits the choice of preservation actions and does not fully reflect preservation activities in practice. Organisations consider properties that go beyond technical aspects and encompass a wide range of factors that influence and guide preservation processes, including organisational, legal, and financial ones. Consequently, it is necessary to be able to handle 'digital' objects in a very wide sense, including abstract objects such as intellectual entities and collections, in addition to the files and sets of files that create renditions of logical objects and that are normally considered. In addition, we find that not only the properties of digital objects but also the properties of the environments in which they exist guide digital preservation processes. Furthermore, organisations use risk-based analysis for their preservation strategies, policies, and preservation planning. They combine information about risks with an understanding of the actions that are expected to mitigate those risks. Risk and action specifications can depend on properties of the actions, as well as on properties of the objects or environments that form the input and output of those actions. The model presented here supports this view explicitly. It links risks with the actions that mitigate them and expresses them in stakeholder-specific constraints. Risks, actions, and constraints are top-level entities in this model, and digital objects and environments are top-level entities on an equal level. Models that lack this property limit the choice of preservation actions to ones that transform a file in order to mitigate a risk; establishing environments as top-level entities enables us to treat risks to objects, to environments, or to a combination of both. The DePICT model is the first conceptual model in the digital preservation domain that supports a comprehensive, whole-life-cycle approach for dynamic, interacting preservation processes, rather than taking the customary, more limited view concerned with the management of digital objects once they are stored in a long-term repository.
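
    To make the entity relationships concrete, here is a schematic sketch, reflecting only a reading of this abstract rather than the thesis's actual XML representation, of DePICT's top-level entities: digital objects and environments on an equal footing, with risks linked to the actions that mitigate them and to stakeholder-specific constraints:

```python
# Schematic rendering of DePICT's top-level entities as Python dataclasses.
# Names and fields are inferred from the abstract, not taken from the thesis.
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    name: str                     # may be a file, a collection, or an abstract entity
    properties: dict = field(default_factory=dict)

@dataclass
class Environment:
    name: str                     # e.g. the rendering stack an object depends on
    properties: dict = field(default_factory=dict)

@dataclass
class Constraint:
    stakeholder: str              # constraints are stakeholder-specific
    rule: str

@dataclass
class Action:
    name: str
    constraints: list = field(default_factory=list)

@dataclass
class Risk:
    description: str
    applies_to: list = field(default_factory=list)    # objects and/or environments
    mitigated_by: list = field(default_factory=list)  # actions linked to this risk

# A risk can target an environment alone, not just a file to be transformed.
viewer = Environment("TIFF viewer 3.x", {"platform": "legacy desktop"})
risk = Risk(
    "rendering environment obsolescence",
    applies_to=[viewer],
    mitigated_by=[Action("migrate environment",
                         [Constraint("archivist", "colour fidelity must be preserved")])],
)
print(risk)
```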