46 research outputs found
Sustaining Collection Value: Managing Collection/Item Metadata Relationships
Many aspects of managing collection/item metadata relationships are critical to sustaining collection value over time. Metadata at the collection level not only provides context for finding, understanding, and using the items in the collection, but is often essential to the particular research and scholarly activities the collection is designed to support. Contemporary retrieval systems, which search across collections, usually ignore collection-level metadata. Alternative approaches, informed by collection-level information, will require an understanding of the various kinds of relationships that can obtain between collection-level and item-level metadata. This paper outlines the problem and describes a project that is developing a logic-based framework for classifying collection-level/item-level metadata relationships. This framework will support (i) metadata specification developers defining metadata elements, (ii) metadata librarians describing objects, and (iii) system designers implementing systems that help users take advantage of collection-level metadata. Institute for Museum and Library Services (Grant #LG06070020). Published or submitted for publication; peer reviewed.
Recommended from our members
Will Formal Preservation Models Require Relative Identity?
The problem of identifying and re-identifying data puts the notion of “same data” at the very heart of preservation, integration and interoperability, and many other fundamental data curation activities. However, it is also a profoundly challenging notion because the concept of data itself clearly lacks a precise and univocal definition. When science is conducted in small communicating groups with homogeneous data, these ambiguities seldom create problems, and solutions can be negotiated in casual real-time conversations. However, when the data is heterogeneous in encoding, content and management practices, these problems can produce costly inefficiencies and lost opportunities. We consider here the relative identity view, which apparently provides the most natural interpretation of common identity statements about digitally encoded data. We show how this view conflicts with the curatorial and management practice of “data” objects, in terms of their modeling and common knowledge representation strategies.
Definitions of Dataset in the Scientific and Technical Literature
The integration of heterogeneous data in varying formats and from diverse communities requires an improved understanding of the concept of a dataset, and of key related concepts, such as format, encoding, and version. Ultimately, a normative formal framework of such concepts will be needed to support the effective curation, integration, and use of shared multi-disciplinary scientific data. To prepare for the development of this framework, we reviewed the definitions of dataset found in technical documentation and the scientific literature. Four basic features can be identified as common to most definitions: grouping, content, relatedness, and purpose. In this summary of our results we describe each of these features, indicating the directions a more formal analysis might take.
A Framework for Applying the Concept of Significant Properties to Datasets
The concept of significant properties, properties that must be identified and preserved in any successful digital object preservation, is now common in data curation. Although this notion has clearly demonstrated its usefulness in cultural heritage domains, its application to the preservation of scientific datasets is not as well developed. One obstacle to this application is that the familiar preservation models are not sufficiently explicit to identify the relevant entities, properties, and relationships involved in dataset preservation. We present a logic-based formal framework of dataset concepts that provides the levels of abstraction necessary to identify and correctly assign significant properties to their appropriate entities. A distinctive feature of this model is that it recognizes a typed symbol structure as a requirement specific to datasets, not shared by other information objects.
One Thing is Missing or Two Things are Confused: An Analysis of OAIS Representation Information.
We describe two alternative interpretations of OAIS Representation Information (CCSDS, 2002), and show that both are flawed. The first is insufficient to formalize a model of preservation, and the second leads to category mistakes in conceptualizing the nature of digital artifacts. This analysis is based on earlier work developing a framework for the application of significant properties to datasets (Sacchi et al., 2011).
Fully Digital: Policy and Process Implications for the AAS
Over the past two decades, every scholarly publisher has migrated at least
the mechanical aspects of their journal publishing so that they utilize digital
means. The academy was comfortable with that for a while, but publishers are
under increasing pressure to adapt further. At the American Astronomical
Society (AAS), we think that means bringing our publishing program to the point
of being fully digital, by establishing procedures and policies that treat the
digital objects of publication as primary. We have always thought about our
electronic journals as databases of digital articles, from which we can publish
and syndicate articles one at a time, and we must now put flesh on those bones
by developing practices that are consistent with the realities of
article-at-a-time publication online. As a learned society that holds the long-term rights
to the literature, we have actively taken responsibility for the preservation
of the digital assets that constitute our journals, and in so doing we have not
forsaken the legacy pre-digital assets. All of us who serve as the long-term
stewards of scholarship must begin to evolve into fully digital publishers.
Foundations of Data Curation: The Pedagogy and Practice of "Purposeful Work" with Research Data
Increased interest in large-scale, publicly accessible data collections has made data curation critical to the management, preservation, and improvement of research data in the social and natural sciences, as well as the humanities. This paper explicates an approach to data curation education that integrates traditional notions of curation with principles and expertise from library, archival, and computer science. We begin by tracing the emergence of data curation as both a concept and a field of practice related to, but distinct from, both digital curation and data stewardship. This historical account, while far from definitive, considers perspectives from both the sciences and the humanities. Alongside traditional LIS and archival science practices, unique aspects of curation have informed our concept of “purposeful work” with data and, in turn, our pedagogical approach to data curation for the sciences and the humanities.
A Vision for User-Defined Semantic Markup
Typesetting systems, such as LaTeX, permit users to define custom markup and corresponding formatting to simplify authoring, ensure the consistent presentation of domain-specific recurring elements and, potentially, enable further processing, such as the generation of an index of such elements. In XML-based and similar systems, the separation of content and form is also reflected in the processing pipeline: while document authors can define custom markup, they cannot define its semantics. This could be said to be intentional to ensure structural integrity of documents, but at the same time it limits the expressivity of markup. The latter is particularly true for so-called lightweight markup languages like Markdown, which only define very limited sets of generic elements. This vision paper sketches an approach for user-defined semantic markup that could permit authors to define the semantics of elements by formally describing the relations between their constituent parts and to other elements, and to define a formatting intent that would ensure that a default presentation is always available.
Ontologies in Quantitative Biology: A Basis for Comparison, Integration, and Discovery
As biology becomes a data-driven discipline, ontologies are increasingly important for systematically capturing existing knowledge. This essay discusses current trends and how ontologies can also be used for discovery.
Theoretical and technological building blocks for an innovation accelerator
The scientific system that we use today was devised centuries ago and is
inadequate for our current ICT-based society: the peer review system encourages
conservatism, journal publications are monolithic and slow, data is often not
available to other scientists, and the independent validation of results is
limited. Building on the Innovation Accelerator paper by Helbing and Balietti
(2011), this paper takes the initial global vision and reviews the theoretical
and technological building blocks that can be used for implementing an
innovation (in the first place: science) accelerator platform driven by
re-imagining the science system. The envisioned platform would rest on four
pillars: (i) Redesign the incentive scheme to reduce behavior such as
conservatism, herding and hyping; (ii) Advance scientific publications by
breaking up the monolithic paper unit and introducing other building blocks
such as data, tools, experiment workflows, resources; (iii) Use machine
readable semantics for publications, debate structures, provenance etc. in
order to include the computer as a partner in the scientific process, and (iv)
Build an online platform for collaboration, including a network of trust and
reputation among the different types of stakeholders in the scientific system:
scientists, educators, funding agencies, policy makers, students and industrial
innovators among others. Any such improvements to the scientific system must
support the entire scientific process (unlike current tools that chop up the
scientific process into disconnected pieces), must facilitate and encourage
collaboration and interdisciplinarity (again unlike current tools), must
facilitate the inclusion of intelligent computing in the scientific process,
must facilitate not only the core scientific process, but also accommodate
other stakeholders such as science policy makers, industrial innovators, and the
general public.