819 research outputs found

    Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges: management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu

    Towards structured sharing of raw and derived neuroimaging data across existing resources

    Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases, and there is currently neither an integrated access mechanism nor an accepted format for the critically important metadata needed to make use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroinformatics Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging metadata and is making progress towards a unified set of tools for such data and metadata exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated metadata and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery.

    brainlife.io: A decentralized and open source cloud platform to support neuroscience research

    Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR data analysis for portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research.

    Recording provenance of workflow runs with RO-Crate

    Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain. A corresponding RO-Crate for this article is at https://w3id.org/ro/doi/10.5281/zenodo.1036898

    A Semantic Framework to Support AI System Accountability and Audit

    The Semantic Web - 18th International Conference, ESWC 2021, Proceedings. Springer Science and Business Media Deutschland GmbH. ISBN: 9783030773847. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN (Print): 0302-9743, ISSN (Electronic): 1611-3349, Volume: 12731 LNCS. Postprint.

    NeuroProv - A visualisation system to enhance the utility of provenance Data for neuroimaging analysis

    E-Science platforms such as myGRID and NeuGRID for Users are growing at an amazing rate. One of the key barriers to their widespread use in practice is the lack of provenance data to support the reasoning and verification of experimental or analysis results. Clinical researchers use workflows to orchestrate the data present in e-science platforms in order to facilitate processing. Even though most systems capture and store provenance data, they rarely make use of it, which limits the exploitation of its true potential. This thesis investigates mechanisms to visualise provenance data for neuroimaging analysis and to provide means to exploit that potential. To achieve this, a visualisation system has been implemented based on use-cases designed from requirements elicited for neuroimaging analysis. In this research a technique has been used to address the requirements of provenance visualisation for neuroimaging analysis. The prototype system has been tested against the provenance generated by NeuGRID for Users (N4U) as a proof of concept for our research. Different workflows have been visualised to study the efficacy of the proposed solution. Furthermore, evaluation metrics have been defined to determine whether the proposed solution is suitable for the purpose of the research conducted. The results show that the proposed visualisation system enhances the utility of provenance data for neuroimaging analysis, and the proposed research can therefore be used to add value to provenance data for neuroimaging analyses.

    A provenance-based semantic approach to support understandability, reproducibility, and reuse of scientific experiments

    Understandability and reproducibility of scientific results are vital in every field of science. Several reproducibility measures are being taken to make the data used in publications findable and accessible. However, scientists face many challenges from the beginning of an experiment to the end, in particular for data management. The explosive growth of heterogeneous research data, and understanding how this data has been derived, is one of the research problems faced in this context. Interlinking the data, the steps and the results from the computational and non-computational processes of a scientific experiment is important for reproducibility. We introduce the notion of "end-to-end provenance management" of scientific experiments to help scientists understand and reproduce the experimental results. The main contributions of this thesis are: (1) We propose a provenance model, "REPRODUCE-ME", to describe scientific experiments using semantic web technologies by extending existing standards. (2) We study computational reproducibility and important aspects required to achieve it. (3) Taking into account the REPRODUCE-ME provenance model and the study on computational reproducibility, we introduce our tool, ProvBook, which is designed and developed to demonstrate computational reproducibility. It provides features to capture and store the provenance of Jupyter notebooks and helps scientists compare and track the results of different executions. (4) We provide a framework, CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), for end-to-end provenance management. This collaborative framework allows scientists to capture, manage, query and visualize the complete path of a scientific experiment consisting of computational and non-computational steps in an interoperable way. We apply our contributions to a set of scientific experiments in microscopy research projects.

    GENERIC AND ADAPTIVE METADATA MANAGEMENT FRAMEWORK FOR SCIENTIFIC DATA REPOSITORIES

    Rapid technological progress has led to wide-ranging advances in data acquisition and processing across research disciplines. This in turn has resulted in an immense growth of data and metadata generated by scientific experiments. Regardless of the specific research field, scientific practice is increasingly characterized by data and metadata. As a consequence, universities, research communities and funding agencies are intensifying their efforts to efficiently curate, store and analyze scientific data. The main goals of scientific data repositories are to establish long-term storage, provide access to data, make data available for reuse and referencing, capture data provenance for reproducibility, and provide metadata, annotations or references that convey the domain-specific knowledge needed to interpret the data. Scientific data repositories are highly complex systems composed of elements from different research fields, such as algorithms for data compression and long-term data archiving, frameworks for metadata and annotation management, workflow provenance and provenance interoperability between heterogeneous workflow systems, authorization and authentication infrastructures, and visualization tools for data interpretation. This thesis describes a modular architecture for a scientific data repository that helps research communities orchestrate their data and metadata across the respective life cycle. The architecture consists of components representing four research fields. The first component is a data transfer client. It provides a generic interface for capturing and accessing data from scientific data acquisition systems. The second component is the MetaStore framework, an adaptive metadata management framework that handles both static and dynamic metadata models. To handle arbitrary metadata schemas, the development of the MetaStore framework is based on the component-based dynamic composition design pattern. MetaStore is also equipped with an annotation framework for handling dynamic metadata. The third component is an extension of the MetaStore framework for the automated handling of provenance metadata for BPEL-based workflow management systems. The Prov2ONE algorithm, which we designed and implemented, automatically translates the structure and execution traces of BPEL workflow definitions into the ProvONE provenance model. The availability of the complete BPEL provenance data in ProvONE not only enables an aggregated analysis of the workflow definition together with its execution trace, but also ensures the compatibility of provenance data from different specification languages. The fourth component of our scientific data repository is the provenance interoperability framework ProvONE - Provenance Interoperability Framework (P-PIF). It ensures the interoperability of provenance data of heterogeneous provenance models from different workflow management systems. P-PIF consists of two components: the Prov2ONE algorithm for SCUFL and MoML workflow specifications, and workflow-management-system-specific adapters for extracting, translating and modeling retrospective provenance data into the ProvONE provenance model. P-PIF can translate both control flow and data flow into ProvONE. The availability of heterogeneous provenance traces in ProvONE makes it possible to compare, analyze and query provenance data from different workflow systems. We evaluated the components of the scientific data repository presented in this thesis as follows. For the data transfer client, we examined data transfer performance with the standard protocol for nanoscopy datasets. We evaluated the MetaStore framework with respect to two aspects. First, we tested metadata ingestion and full-text search performance under different database configurations. Second, we show the comprehensive coverage of MetaStore's functionality through a feature-based comparison of MetaStore with existing metadata management systems. For the evaluation of P-PIF, we first proved the correctness and completeness of our Prov2ONE algorithm and then evaluated the ProvONE prospective graph patterns generated by the Prov2ONE BPEL algorithm against existing BPEL control-flow patterns. To show that P-PIF is a sustainable framework that adheres to standards, we also compare the features of P-PIF with those of existing provenance interoperability frameworks. These evaluations show the superiority and the advantages of the individual components developed in this thesis over existing systems.

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.