SCOPE - A Scientific Compound Object Publishing and Editing System
This paper presents the SCOPE (Scientific Compound Object Publishing and Editing) system, which is designed to enable scientists to easily author, publish and edit scientific compound objects. Scientific compound objects enable scientists to encapsulate the various datasets and resources generated or utilized during a scientific experiment or discovery process within a single compound object, for publishing and exchange. The adoption of "named graphs" to represent these compound objects enables provenance information to be captured via the typed relationships between the components. This approach is also endorsed by the OAI-ORE initiative and hence ensures that we generate OAI-ORE-compliant scientific compound objects. The SCOPE system is an extension of the Provenance Explorer tool, which enables access-controlled viewing of scientific provenance trails. Provenance Explorer provides dynamic rendering of RDF graphs of scientific discovery processes, showing the lineage from raw data to publication. Views of different granularity can be inferred automatically using SWRL (Semantic Web Rule Language) rules and an inferencing engine. SCOPE extends the Provenance Explorer tool and GUI by: 1) adding an embedded web browser that can be used for incorporating objects discoverable via the Web; 2) representing compound objects as named graphs that can be saved in RDF, TriX, TriG or as an Atom syndication feed; 3) enabling scientists to attach Creative Commons licenses to the compound objects to specify how they may be re-used; and 4) enabling compound objects to be published as Fedora Object XML (FOXML) files within a Fedora digital library.
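A compound object of this kind can be sketched as a named graph in TriG, where the graph URI identifies the package, typed properties inside the graph capture provenance relationships between components, and statements about the graph itself attach a Creative Commons license. All URIs and property names below are illustrative, not SCOPE's actual vocabulary:

```trig
@prefix ex: <http://example.org/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix cc: <http://creativecommons.org/ns#> .

# The named graph identifies the compound object; its contents are the
# typed provenance relationships between the aggregated components.
ex:compound-42 {
    ex:rawData        ex:wasProcessedInto ex:derivedDataset .
    ex:derivedDataset ex:isReportedIn     ex:article .
    ex:article        dc:creator          "A. Scientist" .
}

# Statements in the default graph describe the package as a whole.
ex:compound-42 cc:license <http://creativecommons.org/licenses/by/4.0/> ;
               dc:created "2008-05-01"^^<http://www.w3.org/2001/XMLSchema#date> .
```

Because the graph itself has a URI, the license and creation metadata attach to the whole package rather than to any single component.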
From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web
In the process of scientific research, many information objects are
generated, all of which may remain valuable indefinitely. However, artifacts
such as instrument data and associated calibration information may have little
value in isolation; their meaning is derived from their relationships to each
other. Individual artifacts are best represented as components of a life cycle
that is specific to a scientific research domain or project. Current cataloging
practices do not describe objects at a sufficient level of granularity nor do
they offer the globally persistent identifiers necessary to discover and manage
scholarly products with World Wide Web standards. The Open Archives
Initiative's Object Reuse and Exchange data model (OAI-ORE) meets these
requirements. We demonstrate a conceptual implementation of OAI-ORE to
represent the scientific life cycles of embedded networked sensor applications
in seismology and environmental sciences. By establishing relationships between
publications, data, and contextual research information, we illustrate how to
obtain a richer and more realistic view of scientific practices. That view can
facilitate new forms of scientific research and learning. Our analysis is
framed by studies of scientific practices in a large, multi-disciplinary,
multi-university science and engineering research center, the Center for
Embedded Networked Sensing (CENS).Comment: 28 pages. To appear in the Journal of the American Society for
Information Science and Technology (JASIST
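The OAI-ORE model the abstract refers to ties such life-cycle artifacts together with a Resource Map that describes an Aggregation of globally identified resources. A minimal sketch in Turtle/TriG syntax, with illustrative URIs standing in for the CENS artifacts:

```trig
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix dc:  <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/cens/> .

# A Resource Map is the document that describes the Aggregation.
ex:deployment-7.rdf a ore:ResourceMap ;
    ore:describes ex:deployment-7 .

# The Aggregation is the identified compound scholarly product.
ex:deployment-7 a ore:Aggregation ;
    ore:aggregates ex:sensor-readings ,
                   ex:calibration-log ,
                   ex:article ;
    dc:title "Seismic deployment: data, calibration, and publication" .

# Typed relationships give otherwise low-value artifacts their context.
ex:sensor-readings dc:requires   ex:calibration-log .
ex:article         dc:references ex:sensor-readings .
```

`ore:ResourceMap`, `ore:describes`, and `ore:aggregates` are actual ORE vocabulary terms; the domain-level relationships shown with Dublin Core terms are one possible choice, not the paper's prescribed set.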
Provenance Explorer: A graphical interface for constructing scientific publication packages from provenance trails
Scientific communities are under increasing pressure from funding organizations to publish their raw data, in addition to their traditional publications, in open archives. Many scientists would be willing to do this if they had tools that streamlined the process and exposed simple provenance information, i.e., enough to explain the methodology and validate the results without compromising the author's intellectual property or competitive advantage. This paper presents Provenance Explorer, a tool that enables the provenance trail associated with a scientific discovery process to be visualized and explored through a graphical user interface (GUI). Based on RDF graphs, it displays the sequence of data, states and events associated with a scientific workflow, illustrating the methodology that led to the published results. The GUI also allows permitted users to expand selected links between nodes to reveal more fine-grained information and sub-workflows. More importantly, the system enables scientists to selectively construct "scientific publication packages" by choosing particular nodes from the visual provenance trail and dragging and dropping them into an RDF package, which can be uploaded to an archive or repository for publication or e-learning. The provenance relationships between the individual components in the package are automatically inferred using a rules-based inferencing engine.
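The inference step described above can be sketched in plain Python: when only some nodes of a provenance trail are selected for a package, direct relationships between the selected nodes are derived by following chains through the unselected intermediates. The triple format and the `derivedFrom` relation are illustrative; the actual tool applies SWRL rules over RDF with an inferencing engine.

```python
# Sketch: collapse provenance chains between selected nodes into
# direct links, skipping over unselected intermediate nodes.
def infer_package_links(triples, selected):
    """Return derivedFrom links between selected nodes, following
    chains through nodes that were not chosen for the package."""
    # adjacency: node -> set of direct successors in the trail
    succ = {}
    for s, _p, o in triples:
        succ.setdefault(s, set()).add(o)

    links = set()
    for start in selected:
        frontier = list(succ.get(start, ()))
        seen = set(frontier)
        while frontier:
            node = frontier.pop()
            if node in selected:
                # reached another selected node: emit one direct link
                links.add((node, "derivedFrom", start))
                continue
            for nxt in succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return links

# Example trail: raw -> processed -> analysis -> article
trail = [
    ("raw", "produced", "processed"),
    ("processed", "produced", "analysis"),
    ("analysis", "reportedIn", "article"),
]
# The scientist packages only the raw data and the final article:
print(infer_package_links(trail, {"raw", "article"}))
# {('article', 'derivedFrom', 'raw')}
```

The two intermediate nodes are traversed but not exposed, which mirrors how a package can document lineage without revealing every fine-grained step.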
Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier
As universities recognize the inherent value in the data they collect and
hold, they encounter unforeseen challenges in stewarding those data in ways
that balance accountability, transparency, and protection of privacy, academic
freedom, and intellectual property. Two parallel developments in academic data
collection are converging: (1) open access requirements, whereby researchers
must provide access to their data as a condition of obtaining grant funding or
publishing results in journals; and (2) the vast accumulation of 'grey data'
about individuals in their daily activities of research, teaching, learning,
services, and administration. The boundaries between research and grey data are
blurring, making it more difficult to assess the risks and responsibilities
associated with any data collection. Many sets of data, both research and grey,
fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities
are exploiting these data for research, learning analytics, faculty evaluation,
strategic decisions, and other sensitive matters. Commercial entities are
besieging universities with requests for access to data or for partnerships to
mine them. The privacy frontier facing research universities spans open access
practices, uses and misuses of data, public records requests, cyber risk, and
curating data for privacy protection. This paper explores the competing values
inherent in data stewardship and makes recommendations for practice, drawing on
the pioneering work of the University of California in privacy and information
security, data governance, and cyber risk. Comment: Final published version, Sept 30, 201
Provenance-aware knowledge representation: A survey of data models and contextualized knowledge graphs
Expressing machine-interpretable statements in the form of subject-predicate-object triples is a well-established practice for capturing the semantics of structured data. However, RDF, the standard used for representing these triples, inherently lacks a mechanism to attach provenance data, which would be crucial for making automatically generated and/or processed data authoritative. This paper is a critical review of data models, annotation frameworks, knowledge organization systems, serialization syntaxes, and algebras that enable provenance-aware RDF statements. The various approaches are assessed in terms of standards compliance, formal semantics, tuple type, vocabulary term usage, blank nodes, provenance granularity, and scalability. This assessment can be used to advance existing solutions and to help implementers select the most suitable approach (or combination of approaches) for their applications. Moreover, the analysis of the mechanisms and their limitations highlighted in this paper can serve as the basis for novel approaches in RDF-powered applications with increasing provenance needs.
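Two of the mechanisms such a survey compares can be sketched side by side: standard RDF reification, which spends four extra triples per annotated statement, and named graphs, which attach provenance to a whole graph at once. The example vocabulary below (PROV-O plus illustrative `ex:` terms) is one common choice, not the paper's prescribed notation:

```trig
@prefix ex:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# 1. Standard RDF reification: four triples to describe one statement.
ex:stmt1 a rdf:Statement ;
    rdf:subject   ex:sensor ;
    rdf:predicate ex:recorded ;
    rdf:object    ex:reading ;
    prov:wasAttributedTo ex:pipeline .

# 2. Named graph (TriG): the provenance attaches to the graph URI,
#    so many triples can share one provenance assertion.
ex:g1 { ex:sensor ex:recorded ex:reading . }
ex:g1 prov:wasAttributedTo ex:pipeline .
```

The contrast illustrates two of the assessment dimensions named in the abstract: tuple type (triples about a statement vs. quads) and provenance granularity (per statement vs. per graph).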
Enhanced Publication Management System: a systemic approach towards modern scientific communication.
The impact of the digital revolution and the mass adoption of ICT have only partially affected the scientific communication workflow. Scientists today are accustomed to scientific workflows, electronic data, software, and e-science infrastructures for carrying out their daily research activities, but the dissemination of research results still relies on the bare scientific article, which has simply shifted from print to digital. The scientific article alone, however, cannot support an effective assessment of research results or enable the reproducibility of science: to achieve this goal, all products related to a research activity should be shared and disseminated.
In the last decades, on the wave of Open Science, the scientific community has approached the problem of publishing research products other than the scientific article. One of the solutions is the paradigm of enhanced publications (EPs). EPs are digital objects that aggregate a digital scientific article with the other research products that have been used and produced during the research investigation described by the article, and are useful to: (i) better interpret the article, (ii) enable more effective peer review, and (iii) facilitate or support the reproducibility of science. The theory and practice of EPs are still not advanced, and most Enhanced Publication Information Systems (EPISs) are custom implementations serving community-specific needs. EPIS designers and developers have little or no technological support oriented to EPs. In fact, they realize EP-oriented software with a "from scratch" approach, addressing the peculiarities of the community to be served.
Approach: The aim of this thesis is to propose a systemic approach to the realization of EPISs, inspired by lessons learned from the database domain. The state of the art of information systems and data models for EPs has been analyzed to identify the common features across different domains. Those common features have served as building blocks for the definition of a data model and of functionalities for the representation and manipulation of EPs. The notion of an Enhanced Publication Management System (EPMS) is introduced to denote information systems that provide EPIS designers and developers with EP-oriented tools for the setup, operation and maintenance of EPISs. The requirements of EPMSs have been identified, and a reference software architecture that satisfies them is presented.
Contributions: The main contributions of this thesis relate to the fields of information science and scientific communication. The analysis of the state of the art of EPISs results in a terminology and a classification that can serve as a reference for the comparison and discussion of such systems. A systemic approach, based on the novel notion of the Enhanced Publication Management System (EPMS), is proposed as a more cost-effective solution to the realization of EPISs than the current "from scratch" strategy. A reference architecture for EPMSs and a general-purpose data model for EPs are proposed with the intent of contributing to building structured foundations for what is today becoming an area of research in its own right.
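The general-purpose data model the thesis proposes can be gestured at with a minimal sketch: an EP aggregates an article with other research products, each carrying a persistent identifier, linked by typed relationships. Class names, fields, and relation labels below are illustrative assumptions, not the thesis's actual model.

```python
# Minimal EP data-model sketch: an article plus aggregated research
# products, with typed relations recorded between them.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Product:
    identifier: str   # globally persistent identifier, e.g. a DOI
    kind: str         # "article", "dataset", "software", ...

@dataclass
class EnhancedPublication:
    article: Product
    parts: list = field(default_factory=list)      # aggregated products
    relations: list = field(default_factory=list)  # (source, type, target)

    def add_part(self, product, relation):
        """Aggregate a product and record how it relates to the article."""
        self.parts.append(product)
        self.relations.append(
            (product.identifier, relation, self.article.identifier))

ep = EnhancedPublication(Product("doi:10.1000/paper", "article"))
ep.add_part(Product("doi:10.1000/data", "dataset"), "isSupplementTo")
ep.add_part(Product("doi:10.1000/code", "software"), "isSupplementTo")
print([p.kind for p in ep.parts])   # ['dataset', 'software']
```

An EPMS in the thesis's sense would supply such a model, plus setup, operation and maintenance tooling, so that EPIS builders do not reimplement these structures from scratch.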