    A lightweight approach to research object data packaging

    A Research Object (RO) provides a machine-readable mechanism to communicate the diverse set of digital and real-world resources that contribute to an item of research. The aim of an RO is to move beyond the traditional academic publication as a static PDF, and instead provide a complete and structured archive of the items (such as people, organisations, funding, equipment and software) that contributed to the research outcome, including their identifiers, provenance, relations and annotations. This is of particular importance as all domains of research and science increasingly rely on computational analysis, yet we face a reproducibility crisis because key components are often not sufficiently tracked, archived or reported. Here we propose Research Object Crate (RO-Crate for short), an emerging lightweight approach to packaging research data with their structured metadata, rephrasing the Research Object model as schema.org annotations to formalize a JSON-LD format that can be used independently of infrastructure, e.g. in GitHub or Zenodo archives. RO-Crate can be extended for domain-specific descriptions, aiming at a wide variety of applications and repositories to encourage FAIR sharing of reproducible datasets and analytical methods. Abstract accepted for a talk at the Bioinformatics Open Source Conference (BOSC2019). Slides: https://doi.org/10.7490/f1000research.1117129.1 Poster: https://doi.org/10.7490/f1000research.1117130.1 Video recording: https://www.youtube.com/watch?v=AociW94muL
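
    To make this concrete, the sketch below (in Python; the dataset name and file entries are invented for illustration) writes the kind of minimal ro-crate-metadata.json file that this approach defines: a JSON-LD document whose @graph describes the metadata file itself, the root dataset, and the files it packages.

        import json

        # A minimal RO-Crate metadata document: a self-describing JSON-LD graph.
        # Dataset name and file entries below are hypothetical examples.
        crate = {
            "@context": "https://w3id.org/ro/crate/1.1/context",
            "@graph": [
                {
                    # The metadata file itself, declaring conformance and
                    # pointing at the root dataset it is about.
                    "@id": "ro-crate-metadata.json",
                    "@type": "CreativeWork",
                    "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                    "about": {"@id": "./"},
                },
                {
                    # The root dataset: the directory being packaged.
                    "@id": "./",
                    "@type": "Dataset",
                    "name": "Example analysis outputs",
                    "hasPart": [{"@id": "results.csv"}],
                },
                {
                    # A data file in the crate, described with schema.org terms.
                    "@id": "results.csv",
                    "@type": "File",
                    "encodingFormat": "text/csv",
                },
            ],
        }

        with open("ro-crate-metadata.json", "w") as f:
            json.dump(crate, f, indent=2)

    Because the result is a plain JSON file stored alongside the data, such a crate can be committed to GitHub or deposited in Zenodo without any special infrastructure.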

    Functional units: Abstractions for Web service annotations

    Computational and data-intensive science increasingly depends on a large Web Service infrastructure, as services that provide a broad array of functionality can be composed into workflows to address complex research questions. In this context, the goal of service registries is to offer accurate search and discovery functions to scientists. Their effectiveness, however, depends not only on the model chosen to annotate the services, but also on the level of abstraction chosen for the annotations. The work presented in this paper stems from the observation that current annotation models force users to think in terms of service interfaces, rather than of high-level functionality, thus reducing their effectiveness. To alleviate this problem, we introduce Functional Units (FU) as the elementary units of information used to describe a service. Using popular examples of services for the Life Sciences, we define FUs as configurations and compositions of underlying service operations, and show how functional-style service annotations can be easily realised using the OWL Semantic Web language. Finally, we suggest techniques for automating the service annotation process, by analysing collections of workflows that use those services.
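
    As a minimal sketch of the idea (in Python, with all names invented for illustration; the paper itself expresses FUs in OWL), a Functional Unit pairs an underlying service operation with a fixed configuration of some of its parameters, and FUs can be chained so that users reason about high-level functionality rather than raw service interfaces:

        from dataclasses import dataclass, field
        from typing import Any, Callable, Dict, List

        @dataclass
        class Operation:
            """A raw operation exposed by a Web Service interface."""
            name: str
            call: Callable[..., Any]

        @dataclass
        class FunctionalUnit:
            """A high-level unit of functionality: an operation plus a
            fixed configuration of some of its parameters."""
            label: str
            operation: Operation
            configuration: Dict[str, Any] = field(default_factory=dict)

            def invoke(self, **inputs: Any) -> Any:
                # Merge the fixed configuration with the caller's inputs.
                return self.operation.call(**self.configuration, **inputs)

        def compose(units: List[FunctionalUnit], data: Any) -> Any:
            """Chain FUs so the output of one feeds the next, as in a workflow."""
            for unit in units:
                data = unit.invoke(data=data)
            return data

        # Hypothetical Life Sciences example: one sequence-search operation
        # annotated as two distinct FUs through different configurations.
        search = Operation("sequence_search",
                           lambda data, database: f"hits({data!r}, {database})")
        search_proteins = FunctionalUnit("Search protein database", search,
                                         {"database": "uniprot"})
        search_dna = FunctionalUnit("Search DNA database", search,
                                    {"database": "embl"})
        print(search_proteins.invoke(data="MSTNPKPQR"))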

    Health related quality of life trajectories and predictors following coronary artery bypass surgery

    BACKGROUND: Many studies have demonstrated that health related quality of life (HRQoL) improves, on average, after coronary artery bypass graft surgery (CABGS). However, this average improvement may not be realized for all patients, and it is possible that there are two or more distinct groups with different, possibly non-linear, trajectories of change over time. Furthermore, little is known about the predictors that are associated with these possible HRQoL trajectories after CABGS. METHODS: 182 patients listed for elective CABGS at The Royal Melbourne Hospital completed a postal battery of questionnaires, which included the Short-Form-36 (SF-36), Profile of Mood States (POMS) and the Everyday Functioning Questionnaire (EFQ). These data were collected on average a month before surgery, and at two months and six months after surgery. Socio-demographic and medical characteristics prior to surgery, as well as surgical and post-surgical complications and symptoms, were also assessed. Growth curve and growth mixture modelling were used to identify trajectories of HRQoL. RESULTS: For both the physical component summary scale (PCS) and the mental component summary scale (MCS) of the SF-36, two groups of patients with distinct trajectories of HRQoL following surgery could be identified (improvers and non-improvers). A series of logistic regression analyses identified different predictors of group membership for PCS and MCS trajectories. For the PCS, the most significant predictors of non-improver membership were lower scores on POMS vigor-activity and higher New York Heart Association dyspnoea class; for the MCS, the most significant predictors of non-improver membership were higher scores on POMS depression-dejection and manual occupation. CONCLUSION: It is incorrect to assume that HRQoL will improve in a linear fashion for all patients following CABGS. Nor was there support for a single response trajectory. It is important to identify characteristics of each patient, and those post-operative symptoms that could be possible targets for intervention to improve HRQoL outcomes.
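
    A simplified analogue of this two-step analysis, written in Python on synthetic data (the study itself used growth curve and growth mixture modelling; the variables below are invented stand-ins), first recovers two latent trajectory groups from the repeated measures and then regresses group membership on a baseline predictor:

        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        # Synthetic PCS scores at baseline, 2 and 6 months for 182 patients:
        # roughly half improve after surgery, half do not (labels are latent).
        n = 182
        improver = rng.integers(0, 2, n)
        baseline = rng.normal(40, 8, (n, 1))
        change = np.where(improver[:, None] == 1, [[0, 8, 12]], [[0, 0, -1]])
        scores = baseline + change + rng.normal(0, 3, (n, 3))

        # Step 1: recover two latent trajectory groups from the repeated
        # measures (a crude stand-in for growth mixture modelling).
        groups = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)

        # Step 2: predict group membership from a pre-surgery measure
        # (a synthetic stand-in for POMS vigor-activity). Note the mixture's
        # group labels are arbitrary, so interpret the coefficient's sign
        # against the fitted group means.
        vigor = rng.normal(15, 5, (n, 1)) + 2 * improver[:, None]
        model = LogisticRegression().fit(vigor, groups)
        print("odds ratio per unit of vigor:", float(np.exp(model.coef_[0][0])))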

    myExperiment: a repository and social network for the sharing of bioinformatics workflows

    myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis through to the visualization of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment has grown to over 3500 registered users and now contains more than 1000 workflows. The social aspect of the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment, including its REST web service, is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to [email protected].
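
    For example, the listing below sketches how a client might query the REST web service for public workflows; it assumes the workflows.xml listing endpoint and its num result-limit parameter as described in the wiki documentation, so check there for the current interface:

        # Query the myExperiment REST API for public workflows (sketch only;
        # endpoint and parameter names assumed from the wiki documentation).
        import xml.etree.ElementTree as ET
        import requests

        resp = requests.get("http://www.myexperiment.org/workflows.xml",
                            params={"num": 5})
        resp.raise_for_status()

        for wf in ET.fromstring(resp.content).findall("workflow"):
            # Each element carries the workflow URI as an attribute and
            # its title as text content.
            print(wf.get("uri"), "-", wf.text)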

    Methods Included: Standardizing Computational Reuse and Portability with the Common Workflow Language

    A widely used standard for portable multilingual data analysis pipelines would bring considerable benefits to scholarly publication reuse, research/industry collaboration, regulatory cost control, and the environment. Published research that used multiple computer languages for its analysis pipelines would include a complete and reusable description of that analysis, runnable on a diverse set of computing environments. Researchers would be able to collaborate more easily and to reuse these pipelines, adding or exchanging components regardless of the programming language used; collaborations with and within industry would be easier; and approval of new medical interventions that rely on such pipelines would be faster. Time would be saved and environmental impact reduced, as these descriptions contain enough information for advanced optimization without user intervention. Workflows are widely used in data analysis pipelines, enabling innovation and decision-making for modern society. In many domains the analysis components are numerous and written in multiple different computer languages by third parties. Without a standard for reusable and portable multilingual workflows, however, reusing published workflows, collaborating on open problems, and optimizing their execution are severely hampered. Prior to the start of the CWL project, there was no standard for describing multilingual analysis pipelines in a portable and reusable manner. Even today, although there exist hundreds of single-vendor and other single-source systems that run workflows, none is a general, community-driven, and consensus-built standard.
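
    To make the contrast concrete, below is a minimal CWL tool description (the file and input names are invented for this sketch): a declarative, language-neutral wrapper that any conforming workflow engine can execute unchanged, whatever language the wrapped tool itself is written in.

        # count-lines.cwl -- a minimal, illustrative CWL tool description
        cwlVersion: v1.2
        class: CommandLineTool
        baseCommand: [wc, -l]
        inputs:
          input_file:
            type: File
            inputBinding:
              position: 1
        outputs:
          line_count:
            type: stdout
        stdout: line_count.txt

    A conformant runner such as the cwltool reference implementation can then execute it with, e.g., cwltool count-lines.cwl --input_file data.txt, and the same description runs on any other CWL-aware engine or environment.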

    Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data.

    BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic, since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools. RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data, followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, which allow Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes, using the Taverna RShell processor, which has been developed for invoking R when it has been deployed as a service using the Rserve library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services, as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV-formatted data within the Taverna workbench. CONCLUSION: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data, combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R, without having to learn the corresponding programming language, to analyse their own data.
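
    The statistical core of such a workflow can be sketched as follows; this is a simplified Python analogue on synthetic data of the differential-expression step that the actual workflow delegates to R via Rserve, not the workflow itself:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        # Synthetic log-expression matrix: 1000 genes measured on 4 control
        # and 4 treated arrays; the first 50 genes are truly up-regulated.
        control = rng.normal(8.0, 1.0, (1000, 4))
        treated = rng.normal(8.0, 1.0, (1000, 4))
        treated[:50] += 2.0

        # Per-gene two-sample t-test, standing in for the R analysis step.
        t, p = stats.ttest_ind(treated, control, axis=1)

        # Bonferroni correction across all genes tested.
        significant = np.where(p < 0.05 / len(p))[0]
        print(f"{len(significant)} differentially-expressed genes found")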