80,267 research outputs found

    Towards structured sharing of raw and derived neuroimaging data across existing resources

    Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases, and there is currently neither an integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.

    A linked data approach to publishing complex scientific workflows

    Past data management practices in many fields of natural science, including climate research, have focused primarily on the final research output - the research publication - with less attention paid to the chain of intermediate data results and their associated metadata, including provenance. Data were often regarded merely as an adjunct to the publication, rather than a scientific resource in their own right. In this paper, we attempt to address the issues of capturing and publishing detailed workflows associated with the climate research datasets held by the Climatic Research Unit (CRU) at the University of East Anglia. To this end, we present a customisable approach to exposing climate research workflows for the effective re-use of the associated data, through the adoption of linked-data principles, existing widely adopted citation techniques (Digital Object Identifier) and data exchange mechanisms (Open Archives Initiative Object Reuse and Exchange).

    Templates as a method for implementing data provenance in decision support systems

    Decision support systems are used as a method of promoting consistent guideline-based diagnosis supporting clinical reasoning at point of care. However, despite the availability of numerous commercial products, the wider acceptance of these systems has been hampered by concerns about diagnostic performance and a perceived lack of transparency in the process of generating clinical recommendations. This resonates with the Learning Health System paradigm that promotes data-driven medicine relying on routine data capture and transformation, which also stresses the need for trust in an evidence-based system. Data provenance is a way of automatically capturing the trace of a research task and its resulting data, thereby facilitating trust and the principles of reproducible research. While computational domains have started to embrace this technology through provenance-enabled execution middlewares, traditionally non-computational disciplines, such as medical research, that do not rely on a single software platform, are still struggling with its adoption. In order to address these issues, we introduce provenance templates: abstract provenance fragments representing meaningful domain actions. Templates can be used to generate a model-driven service interface for domain software tools to routinely capture the provenance of their data and tasks. This paper specifies the requirements for a Decision Support tool based on the Learning Health System, introduces the theoretical model for provenance templates and demonstrates the resulting architecture. Our methods were tested and validated on the provenance infrastructure for a Diagnostic Decision Support System that was developed as part of the EU FP7 TRANSFoRm project.

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms.
    Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US.

    Provenance-Aware Sensor Data Storage

    Sensor network data has both historical and realtime value. Making historical sensor data useful, in particular, requires storage, naming, and indexing. Sensor data presents new challenges in these areas. Such data is location-specific but also distributed; it is collected in a particular physical location and may be most useful there, but it has additional value when combined with other sensor data collections in a larger distributed system. Thus, arranging location-sensitive peer-to-peer storage is one challenge. Sensor data sets do not have obvious names, so naming them in a globally useful fashion is another challenge. The last challenge arises from the need to index these sensor data sets to make them searchable. The key to sensor data identity is provenance, the full history or lineage of the data. We show how provenance addresses the naming and indexing issues and then present a research agenda for constructing distributed, indexed repositories of sensor data.
    Engineering and Applied Science

    Document DNA: Distributed Content-Centered Provenance Data Tracking

    This thesis presents a new content-centered approach to provenance data tracking: the Document DNA. Knowledge workers are overwhelmed as they find it hard to structure, maintain, and find re-used content within their digital workspace. This issue is aggravated by the growing amount of digital data knowledge workers need to maintain. This thesis introduces a concept for tracing the evolution of text-based content across documents in the digital workspace, without the need for a centralized tracking system. Our concept is inspired by the DNA common to life forms. We present an analysis and comparison of research undertaken to support knowledge workers and review provenance data tracking systems. Provenance data has been used for data security, databases and to track knowledge workers' interactions with digital content. However, very little research is available on the usefulness of provenance data for knowledge workers. Furthermore, current provenance data research is based on central systems and tracks provenance at the file level. We conducted three user studies to explore current issues knowledge workers face when working with digital content. The first study examined current knowledge workers' problems when re-using digital content. The second study examined to what extent the issues detected in our first study are addressed by document management systems. We found that document management systems do not fully address these issues, and that not all knowledge workers make use of the document management system available to them. The third study examined reasons for low user saturation of available document management systems. As a result of these three studies we identified task categories and a variety of related issues. Driven by these findings, we developed a conceptual model for Document DNA, which tracks the provenance of data used in the identified tasks.
    To show the effectiveness of our approach, we created a software prototype and conducted a realistic user study. Our software prototype is a Microsoft Word Add-In that tracks the evolution of content included in Microsoft Word documents. In our final user study, participants executed example tasks gathered from real knowledge workers with and without the support of our software prototype. The results of our study confirm that the Document DNA successfully addresses the issues identified. The participants were significantly faster when performing the tasks using the software prototype; most participants using traditional methods failed to identify the provenance of the data, whereas the majority of participants using the software prototype succeeded.

    Research note – barriers and solutions to linking and using health and social care data in Scotland

    Integration of health and social care will require integrated data to drive service evaluation, design, joint working and research. We describe the results of a Scottish meeting of key stakeholders in this area. Potential uses for linked data included understanding client populations, mapping trajectories of dependency, identifying at-risk groups, predicting required capacity for future service provision, and research to better understand the reciprocal interactions between health, social circumstances and care. Barriers to progress included lack of analytical capacity, incomplete understanding of data provenance and quality, intersystem incompatibility and issues of consent for data sharing. Potential solutions included better understanding the content, quality and provenance of social care data; investment in analytical capacity; improving communication between data providers and users in health and social care; clear guidance to systems developers and procurers; and enhanced engagement with the public. We plan a website for communication across Scotland on health and social care data linkage, educational resources for front line staff and researchers, plus further events for training and information dissemination. We believe that these processes hold lessons for other countries with an interest in linking health and social care data, as well as for cross-sector data linkage initiatives in general.

    Provenance on the Web, Leaving the Walled Garden Behind ...

    Preprint