Planning for the Lifecycle Management and Long-Term Preservation of Research Data: A Federated Approach
Outcomes of the grant are archived here. The “data deluge” is a recent but increasingly well-understood phenomenon of scientific and social inquiry. Large-scale research instruments extend our observational power by many orders of magnitude but at the same time generate massive amounts of data. Researchers work feverishly to document and preserve changing or disappearing habitats, cultures, languages, and artifacts, resulting in volumes of media in various formats. New software tools mine a growing universe of historical and modern texts and connect the dots in our semantic environment. Libraries, archives, and museums undertake digitization programs, creating broad access to unique cultural heritage resources for research. Global-scale research collaborations with hundreds or thousands of participants drive the creation of massive amounts of data, most of which cannot be recreated if lost. The University of Kansas (KU) Libraries, in collaboration with two partners, the Greater Western Library Alliance (GWLA) and the Great Plains Network (GPN), received an IMLS National Leadership Grant designed to leverage collective strengths and create a proposal for a scalable and federated approach to the lifecycle management of research data based on the needs of GPN and GWLA member institutions.
Institute of Museum and Library Services, LG-51-12-0695-1
B!SON: A Tool for Open Access Journal Recommendation
Finding a suitable open access journal to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.
Building blocks for semantic data organization on the desktop
The organization of (multimedia) data on current desktop systems is done to a
large part by arranging files in hierarchical file systems, but also by
specialized applications (e.g., music or photo organizing software) that make
use of file-related metadata for this task. These metadata are predominantly
stored in embedded file headers, using a multitude of mainly proprietary
formats. Generally, metadata and links play the key roles in advanced data
organization concepts. Their limited support in prevalent file system
implementations, however, hinders the adoption of such concepts on the desktop:
First, non-uniform access interfaces require metadata-consuming applications to
understand both a file’s format and its metadata scheme; second, separate
data/metadata access is not possible, and third, metadata cannot be attached to
multiple files or to file folders although the latter are the primary constructs
for file organization. As a consequence of this, current desktops suffer, inter
alia, from (i) limited data organization possibilities, (ii) limited
navigability, (iii) limited data findability, and (iv) metadata fragmentation.
Although there were attempts to improve this situation, e.g., by introducing
semantic file systems, most of these issues were successfully addressed and
solved on the Web, and in particular on the Semantic Web. Reusing these
solutions on the desktop, a central hub of data and metadata manipulation, is
clearly desirable.
In this thesis a novel, backwards-compatible metadata model that addresses the
above-mentioned issues is introduced. This model is based on stable file
identifiers and external, file-related, semantic metadata descriptions that are
represented using the generic RDF graph model. Descriptions are accessible via a
uniform Linked Data interface and can be linked with other descriptions and
resources. In particular, this model enables semantic linking between local file
system objects and remote resources on the Web or the emerging Web of Data,
thereby enabling the integration of these data spaces. As the model crucially
relies on the stability of these links, we contribute two algorithms that
preserve their integrity in local and in remote environments. This means that
links between file system objects, metadata descriptions and remote resources do
not break even if their addresses change, e.g., when files are moved or Linked
Data resources are re-published using different URIs. Finally, we contribute a
prototypical implementation of the proposed metadata model that demonstrates how
these building blocks sum up to constitute a metadata layer that may act as a
foundation for semantic data organization on the desktop.
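The idea of external descriptions attached to stable file identifiers can be illustrated with a minimal Python sketch. It is not the thesis's implementation or either of its two link-integrity algorithms: the `MetadataLayer` class, its method names, the use of `urn:uuid:` identifiers, and the example paths and URIs are all assumptions made for illustration, and statements are held as plain tuples rather than in a real RDF store.

```python
import uuid
from dataclasses import dataclass, field

# Dublin Core "subject" property, used here only as an example predicate.
DC_SUBJECT = "http://purl.org/dc/terms/subject"


@dataclass
class MetadataLayer:
    """Toy external metadata layer keyed by stable file identifiers (hypothetical)."""

    paths: dict = field(default_factory=dict)   # stable URI -> current path
    triples: set = field(default_factory=set)   # (subject, predicate, object)

    def register(self, path: str) -> str:
        """Assign a stable identifier to a file and remember its current path."""
        file_uri = f"urn:uuid:{uuid.uuid4()}"
        self.paths[file_uri] = path
        return file_uri

    def describe(self, subject: str, predicate: str, obj: str) -> None:
        """Store an RDF-style statement outside the file itself."""
        self.triples.add((subject, predicate, obj))

    def move(self, file_uri: str, new_path: str) -> None:
        """Record a move: only the path changes, never the identifier."""
        self.paths[file_uri] = new_path

    def objects(self, subject: str, predicate: str) -> set:
        """Return every object linked from subject via predicate."""
        return {o for s, p, o in self.triples if s == subject and p == predicate}


layer = MetadataLayer()
photo = layer.register("/home/alice/photos/img_001.jpg")
# Link the local file to a remote Linked Data resource.
layer.describe(photo, DC_SUBJECT, "http://dbpedia.org/resource/Vienna")
# Moving the file leaves the statement intact, because it refers to the
# stable identifier rather than to the path.
layer.move(photo, "/home/alice/archive/2009/img_001.jpg")
print(layer.paths[photo])
print(layer.objects(photo, DC_SUBJECT))
```

Because every statement uses the stable identifier as its subject, a move touches one entry in the path table and no metadata. Detecting moves made outside such a layer, or following remote resources republished under new URIs, is the harder problem the abstract attributes to its two algorithms and is not covered here.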
Repositório digital pessoal semântico baseado na “cloud” (Cloud-based semantic personal digital repository)
Doctorate in Informatics
Throughout time individuals have always sought ways to preserve their
knowledge, memories and life experiences. Physical objects provide a
medium upon which individuals are able to project their memories, in an
attempt to fix them on a stable support better able to cope with the
passage of time. Physical objects eventually coalesce into a collection that
comes to represent part of its owners’ lives and that can eventually be passed
on to others. Widespread use of information technologies, coupled with their
perceived ease of use, has shifted many interactions that would end up producing
external memory objects from the physical to the digital realm. As
with their physical counterparts, digital objects can also be used by individuals
to project their memories. Due to their digital nature, these objects
are simple to create, produce, manipulate, duplicate and share. These traits
place them into a position where they can be generated without too much
effort to convey what might appear to be trivial content, readily shared and
forgotten afterwards. Although memory projection could make them
part of their creator’s legacy, overconfidence in their reproducibility and
the ease with which they are forgotten can prevent this. This deprives their creators
of part of their lives that, in spite of appearing trivial at first, might acquire
a deeper meaning with the passage of time. The work done throughout
this thesis addresses this issue by proposing the creation of personal digital
repositories to collect information regarding personal content. Instead
of focusing on collecting the content itself, the personal digital repository
prioritises gathering metadata about the content (when not immediately at
risk) in order to lead its owner back to the “cloud” applications where the
content can still be found in its original context. In personal scenarios it is
not feasible to rely on trained personnel to help with content gathering and
organisation. To mitigate this issue, content is collected as soon as possible
by collection agents. These are designed to be as unobtrusive as possible,
also offering additional services that can be used even without the personal
digital repository in order to encourage their adoption. This creates an intertwined
ecosystem where the content collection agents feed the personal
digital repository and can in turn use previously collected content to support
their additional services. This work also describes a proposed extension to
the CIDOC/CRM model, used to classify and organise the collected information.
The extension was created due to a perceived gap in the CIDOC/CRM
model when it came to dealing with digital objects.
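The metadata-first collection policy described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's design: the `ContentRecord` type, the `collect` function, the `at_risk` flag, the `fetch` callback and the example URLs are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ContentRecord:
    """Hypothetical repository entry: metadata first, content only if needed."""

    title: str
    source_url: str                      # where the content lives in the cloud
    at_risk: bool = False
    local_copy: Optional[bytes] = None   # filled in only for at-risk content


def collect(title: str, source_url: str, at_risk: bool,
            fetch: Callable[[str], bytes]) -> ContentRecord:
    """Record metadata for an item; copy the content itself only when at risk."""
    record = ContentRecord(title, source_url, at_risk)
    if at_risk:
        record.local_copy = fetch(source_url)
    return record


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real download, so the sketch runs offline."""
    return b"content of " + url.encode()


safe = collect("Holiday album", "https://photos.example.org/albums/42",
               at_risk=False, fetch=fake_fetch)
risky = collect("Old blog post", "https://blog.example.org/posts/7",
                at_risk=True, fetch=fake_fetch)
print(safe.local_copy is None)   # metadata only, content stays at its source
print(risky.local_copy)          # content copied because it was flagged at risk
```

Keeping only a pointer for content that is not at risk is what lets the repository lead its owner back to the original context, at the cost of depending on the cloud service remaining available.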