5 research outputs found

    Planning for the Lifecycle Management and Long-Term Preservation of Research Data: A Federated Approach

    Outcomes of the grant are archived here. The “data deluge” is a recent but increasingly well-understood phenomenon of scientific and social inquiry. Large-scale research instruments extend our observational power by many orders of magnitude but at the same time generate massive amounts of data. Researchers work feverishly to document and preserve changing or disappearing habitats, cultures, languages, and artifacts, resulting in volumes of media in various formats. New software tools mine a growing universe of historical and modern texts and connect the dots in our semantic environment. Libraries, archives, and museums undertake digitization programs that create broad access to unique cultural heritage resources for research. Global-scale research collaborations with hundreds or thousands of participants drive the creation of massive amounts of data, most of which cannot be recreated if lost. The University of Kansas (KU) Libraries, in collaboration with two partners, the Greater Western Library Alliance (GWLA) and the Great Plains Network (GPN), received an IMLS National Leadership Grant designed to leverage collective strengths and create a proposal for a scalable and federated approach to the lifecycle management of research data based on the needs of GPN and GWLA member institutions. Institute of Museum and Library Services LG-51-12-0695-1

    Building blocks for semantic data organization on the desktop

    The organization of (multimedia) data on current desktop systems is done largely by arranging files in hierarchical file systems, but also by specialized applications (e.g., music or photo organizing software) that use file-related metadata for this task. These metadata are predominantly stored in embedded file headers, in a multitude of mostly proprietary formats. Generally, metadata and links play the key roles in advanced data organization concepts.
Their limited support in prevalent file system implementations, however, hinders the adoption of such concepts on the desktop: first, non-uniform access interfaces require metadata-consuming applications to understand both a file’s format and its metadata schema; second, data and metadata cannot be accessed separately; and third, metadata cannot be attached to multiple files or to file folders, although folders are the primary constructs for file organization. As a consequence, current desktops suffer, inter alia, from (i) limited data organization possibilities, (ii) limited navigability, (iii) limited data findability, and (iv) metadata fragmentation. Although there have been attempts to improve this situation, e.g., by introducing semantic file systems, most of these issues have been successfully addressed and solved mainly on the Web, and in particular in the Semantic Web. Reusing these solutions on the desktop, a central hub of data and metadata manipulation, is clearly desirable. In this thesis, a novel, backwards-compatible metadata model that addresses the above-mentioned issues is introduced. This model is based on stable file identifiers and external, file-related, semantic metadata descriptions that are represented using the generic RDF graph model. Descriptions are accessible via a uniform Linked Data interface and can be linked with other descriptions and resources. In particular, this model enables semantic linking between local file system objects and remote resources on the Web or the emerging Web of Data, thereby enabling the integration of these data spaces. As the model crucially relies on the stability of these links, we contribute two algorithms that preserve their integrity in local and in remote environments. This means that links between file system objects, metadata descriptions and remote resources do not break even if their addresses change, e.g., when files are moved or Linked Data resources are re-published under different URIs.
Finally, we contribute a prototypical implementation of the proposed metadata model that demonstrates how these building blocks together constitute a metadata layer that may act as a foundation for semantic data organization on the desktop.
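The core idea of the model above, metadata kept outside the file and keyed by a stable identifier rather than by a path, can be illustrated with a minimal sketch. Everything here is hypothetical and not the thesis's implementation: the `MetadataLayer` class, the `urn:uuid:` identifiers and the example paths are assumptions, and a plain set of (subject, predicate, object) tuples stands in for a real RDF store. The Dublin Core and RDF Schema property URIs are real vocabulary terms used only for illustration.

```python
import uuid


class MetadataLayer:
    """Hypothetical sketch: external metadata keyed by stable file identifiers."""

    def __init__(self):
        self.paths = {}        # stable identifier URI -> current file system path
        self.triples = set()   # (subject, predicate, object) statements, RDF-style

    def register(self, path):
        """Assign a stable identifier to a file or folder and return it."""
        file_uri = f"urn:uuid:{uuid.uuid4()}"
        self.paths[file_uri] = path
        return file_uri

    def add(self, subject, predicate, obj):
        """Record one statement; subjects and objects may be local IDs or Web URIs."""
        self.triples.add((subject, predicate, obj))

    def move(self, file_uri, new_path):
        """A move updates only the identifier-to-path mapping; no triple changes."""
        self.paths[file_uri] = new_path

    def resolve(self, file_uri):
        """Return the current path of a registered file or folder."""
        return self.paths[file_uri]

    def describe(self, subject):
        """Return all (predicate, object) pairs stated about a subject."""
        return {(p, o) for s, p, o in self.triples if s == subject}


DCT = "http://purl.org/dc/terms/"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

layer = MetadataLayer()
photo = layer.register("/home/alice/photos/img_001.jpg")
folder = layer.register("/home/alice/photos")
layer.add(folder, DCT + "title", "Holiday photos")    # metadata on a folder
layer.add(photo, DCT + "creator", "Alice")
layer.add(photo, DCT + "isPartOf", folder)             # file-to-folder link
layer.add(photo, RDFS_SEE_ALSO, "http://example.org/photos/1")  # link to the Web
layer.move(photo, "/home/alice/archive/img_001.jpg")
print(layer.resolve(photo))  # /home/alice/archive/img_001.jpg
```

Because every statement refers to the identifier rather than the path, moving the file leaves its description and its links intact. Preserving links when remote Linked Data resources change their URIs, which the thesis also addresses, is not covered by this sketch.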

    Semantic personal digital repository based on the “cloud”

    Doctorate in Informatics. Throughout time, individuals have always sought ways to preserve their knowledge, memories and life experiences. Physical objects provide a medium onto which individuals can project their memories, in the hope that these will endure on a stable support better able to withstand the passage of time. Physical objects eventually coalesce into a collection that comes to represent part of its owners’ lives and that can eventually be passed on to others. The widespread use of information technologies, coupled with their perceived ease of use, has shifted many interactions that would once have produced external memory objects from the physical to the digital realm. As with their physical counterparts, digital objects can also be used by individuals to project their memories.
Due to their digital nature, these objects are simple to create, manipulate, duplicate and share. These traits mean that they can be generated with little effort to convey seemingly trivial content that is readily shared and then forgotten. Although memory projection could make them part of their creators’ legacy, overconfidence in their reproducibility and the ease with which they are forgotten can prevent this. This deprives their creators of part of their lives that, despite appearing trivial at first, might acquire a deeper meaning with the passage of time. The work done throughout this thesis addresses this issue by proposing personal digital repositories that collect information about personal content. Instead of focusing on collecting the content itself, the personal digital repository prioritises gathering metadata about the content (when the content is not at immediate risk) in order to lead its owner back to the “cloud” applications where the content can still be found in its original context. In personal scenarios it is not feasible to rely on trained personnel for content gathering and organisation. To mitigate this issue, content is collected by collection agents as early as possible and as close to its source as possible. These agents are designed to be as unobtrusive as possible and also offer additional services that can be used even without the personal digital repository, in order to encourage their adoption. This creates an intertwined ecosystem in which the collection agents feed the personal digital repository and can in turn use previously collected content to support their additional services. This work also describes a proposed extension to the CIDOC/CRM model, which is used to classify and organise the collected information. The extension was created because of a perceived gap in the CIDOC/CRM model when dealing with digital objects and personal scenarios.
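The metadata-first policy described above (always record where an item lives, copy the content only when it is at risk) can be sketched as follows. This is a hypothetical illustration, not the thesis's repository: the `ItemRecord` and `PersonalRepository` classes, their fields, the service names and the example URLs are all assumptions, and the CIDOC/CRM extension is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ItemRecord:
    """Hypothetical metadata record kept by a personal digital repository."""
    title: str
    source_service: str                 # cloud service where the content lives
    source_url: str                     # link back to the original context
    created: str                        # ISO 8601 date string
    local_copy: Optional[bytes] = None  # filled in only when content is at risk


@dataclass
class PersonalRepository:
    records: List[ItemRecord] = field(default_factory=list)

    def collect(self, record, content=None, at_risk=False):
        """Called by a collection agent: always keep the metadata,
        and keep the content itself only if it is at risk."""
        if at_risk and content is not None:
            record.local_copy = content
        self.records.append(record)

    def find_way_back(self, keyword):
        """Return source URLs of matching items, leading the owner back to the cloud."""
        return [r.source_url for r in self.records
                if keyword.lower() in r.title.lower()]


repo = PersonalRepository()
repo.collect(ItemRecord("Beach trip photo", "photo-service",
                        "https://photos.example.com/p/123", "2014-07-02"))
repo.collect(ItemRecord("Post about graduation", "social-service",
                        "https://social.example.com/s/456", "2015-06-20"),
             content=b"graduation post text", at_risk=True)
print(repo.find_way_back("beach"))  # ['https://photos.example.com/p/123']
```

Storing only a pointer for content that is still safely hosted keeps the repository small. The `at_risk` flag stands in for whatever risk assessment a real collection agent would perform.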