
    The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

    Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption.
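As a rough illustration of the "structured container" idea, a minimal Research Object manifest can be sketched as JSON-LD. The `ro:`/`ore:` terms follow the wf4ever RO vocabulary the paper describes, but the resource identifiers and file names below are invented for the example:

```python
import json

# Hypothetical Research Object manifest sketched as JSON-LD.
# The ro:/ore: property names follow the wf4ever RO vocabulary;
# the RO URI and aggregated resource names are invented.
manifest = {
    "@context": {
        "ro": "http://purl.org/wf4ever/ro#",
        "ore": "http://www.openarchives.org/ore/terms/",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/ro/1",
    "@type": "ro:ResearchObject",
    "dct:creator": "A. Researcher",  # who assembled the RO
    "ore:aggregates": [              # the bundled data, method, and result
        {"@id": "data/input.csv", "@type": "ro:Resource"},
        {"@id": "workflow/analysis.t2flow", "@type": "ro:Resource"},
        {"@id": "results/figure1.png", "@type": "ro:Resource"},
    ],
}

def aggregated_ids(ro):
    """Return the identifiers of all resources aggregated by the RO."""
    return [r["@id"] for r in ro["ore:aggregates"]]

print(json.dumps(aggregated_ids(manifest)))
```

The single manifest acts as the "single entry point" the abstract calls for: one document that points at the data, the method, and the results together.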

    Workflow reuse in practice: a study of neuroimaging pipeline users

    Workflow reuse is a major benefit of workflow systems and shared workflow repositories, but there are barely any studies that quantify the degree of reuse of workflows or the practical barriers that may stand in the way of successful reuse. In our own work, we hypothesize that defining workflow fragments improves reuse, since end-to-end workflows may be very specific and only partially reusable by others. This paper reports on a study of the current use of workflows and workflow fragments in labs that use the LONI Pipeline, a popular workflow system used mainly for neuroimaging research that enables users to define and reuse workflow fragments. We present an overview of the benefits of workflows and workflow fragments reported by users in informal discussions. We also report on a survey of researchers in a lab that has the LONI Pipeline installed, asking them about their experiences with reuse of workflow fragments and the actual benefits they perceive. This leads to quantifiable indicators of the reuse of workflows and workflow fragments in practice. Finally, we discuss barriers to further adoption of workflow fragments and workflow reuse that motivate further work.
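The degree of fragment reuse such a study quantifies can be approximated with a simple metric: treat each workflow as a sequence of components and count which contiguous fragments of a given length occur in more than one workflow. A minimal sketch, with invented pipeline contents (this is not the paper's actual methodology):

```python
from collections import Counter

def fragments(workflow, n=2):
    """All contiguous component sub-sequences (fragments) of length n."""
    return [tuple(workflow[i:i + n]) for i in range(len(workflow) - n + 1)]

def reused_fragments(workflows, n=2):
    """Fragments of length n that occur in more than one workflow."""
    counts = Counter()
    for wf in workflows:
        for frag in set(fragments(wf, n)):  # count each fragment once per workflow
            counts[frag] += 1
    return {frag for frag, c in counts.items() if c > 1}

# Invented neuroimaging-style pipelines sharing preprocessing steps.
wfs = [
    ["align", "skull_strip", "segment", "stats"],
    ["align", "skull_strip", "smooth", "stats"],
    ["denoise", "align", "skull_strip", "segment"],
]
print(reused_fragments(wfs))  # ('align', 'skull_strip') appears in all three
```

The ratio of reused fragments to all fragments gives one quantifiable indicator of reuse of the kind the abstract mentions.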

    APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools

    Full text link
    Background: Scientific publications are meant to exchange knowledge among researchers, but the inability to properly reproduce computational experiments limits the quality of scientific research. Moreover, the literature indicates that more than 50% of preclinical research is irreproducible, which wastes substantial resources in the life sciences. As a consequence, scientific reproducibility is being fostered to promote Open Science through open databases and software tools that are typically deployed on existing computational resources. However, some computational experiments require complex virtual infrastructures, such as elastic clusters of PCs, that can be dynamically provisioned from multiple clouds. Obtaining these infrastructures requires not only an infrastructure provider but also advanced knowledge of cloud computing. Objectives: The main aim of this paper is to improve reproducibility in the life sciences to produce better and more cost-effective research. To that end, we aim to simplify infrastructure deployment and usage for researchers. Methods: This paper introduces the Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools (APRICOT), an open source extension for Jupyter to deploy deterministic virtual infrastructures across multiple clouds for reproducible scientific computational experiments. To exemplify its use and how APRICOT can improve the reproduction of experiments with complex computational requirements, two examples from the life sciences are provided. All requirements to reproduce both experiments are disclosed within APRICOT and can therefore be reproduced by users. Results: To show the capabilities of APRICOT, we processed a real magnetic resonance image to accurately characterize a prostate cancer using a Message Passing Interface cluster deployed automatically with APRICOT. The second example, a multiparametric study of positron emission tomography image reconstruction, shows how APRICOT scales the deployed infrastructure with the workload using a batch cluster. Conclusion: APRICOT integrates the deployment, management, and use of specific computational infrastructures for Open Science, making experiments that involve such infrastructures reproducible. All experiment steps and details can be documented in a single Jupyter notebook, including infrastructure specification, data storage, experiment execution, result gathering, and infrastructure termination. Thus, distributing the experiment notebook and the needed data should be enough to reproduce the experiment.
    This study was supported by the programme "Ayudas para la contratación de personal investigador en formación de carácter predoctoral, programa VALi+d" under grant number ACIF/2018/148 from the Conselleria d'Educació of the Generalitat Valenciana and the "Fondo Social Europeo" (FSE). The authors thank the Spanish "Ministerio de Economía, Industria y Competitividad" for the project "BigCLOE" (TIN2016-79951-R) and the European Commission, Horizon 2020 grant agreement No 826494 (PRIMAGE). The MRI prostate study case used in this article was retrospectively collected from a project on prostate MRI biomarker validation.
    Giménez-Alventosa, V.; Segrelles Quilis, J. D.; Moltó, G.; Roca-Sogorb, M. (2020). APRICOT: Advanced Platform for Reproducible Infrastructures in the Cloud via Open Tools. Methods of Information in Medicine, 59(S02), e33-e45. https://doi.org/10.1055/s-0040-1712460
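The multiparametric PET-reconstruction study amounts to a parameter sweep dispatched to the batch cluster. APRICOT's actual notebook interface is not reproduced here; the sketch below only illustrates the sweep pattern, with an invented `submit_job` stand-in for job submission to the deployed infrastructure:

```python
from itertools import product

def submit_job(params):
    """Stand-in for submitting one reconstruction job to the batch cluster.
    In APRICOT this would go through the deployed infrastructure; here we
    just build the job's command line. Parameter names are invented."""
    return f"recon --iterations {params['iterations']} --subsets {params['subsets']}"

# Invented reconstruction parameters for the multiparametric study.
grid = {
    "iterations": [1, 2, 4, 8],
    "subsets": [4, 8, 16],
}

# One job per combination; a batch cluster can scale workers to this load.
jobs = [submit_job(dict(zip(grid, values))) for values in product(*grid.values())]
print(len(jobs))  # 12 jobs: one per parameter combination
```

Because every parameter combination is enumerated in the notebook itself, the sweep is part of the documented, redistributable experiment.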

    Automatic deployment and reproducibility of workflow on the Cloud using container virtualization

    PhD Thesis. Cloud computing is a service-oriented approach to distributed computing that has many attractive features, including on-demand access to large compute resources. One type of cloud application is the scientific workflow, which is playing an increasingly important role in building applications from heterogeneous components. Workflows are increasingly used in science as a means to capture, share, and publish computational analyses. Clouds can offer a number of benefits to workflow systems, including the dynamic provisioning of the resources needed for computation and storage, which has the potential to dramatically increase the ability to quickly extract new results from the huge amounts of data now being collected. However, there is an increasing number of cloud computing platforms, each with different functionality and interfaces. It therefore becomes increasingly challenging to define workflows in a portable way so that they can be run reliably on different clouds. As a consequence, workflow developers face the problem of deciding which cloud to select and, more importantly for the long term, how to avoid vendor lock-in. A further issue that has arisen with workflows is that it is common for them to stop being executable a relatively short time after they were created. This can be due to the external resources required to execute a workflow, such as data and services, becoming unavailable. It can also be caused by changes in the execution environment on which the workflow depends, such as changes to a library causing an error when a workflow service is executed. This "workflow decay" issue is recognised as an impediment to the reuse of workflows and the reproducibility of their results. It is becoming a major problem, as the reproducibility of science is increasingly dependent on the reproducibility of scientific workflows. In this thesis we present new solutions to address these challenges. We propose a new approach to workflow modelling that offers a portable and reusable description of the workflow using the TOSCA specification language. Our approach addresses portability by allowing workflow components to be systematically specified and automatically deployed on a range of clouds, or in local computing environments, using container virtualisation techniques. To address the issues of reproducibility and workflow decay, our modelling and deployment approach has also been integrated with source control and container management techniques to create a new framework that efficiently supports dynamic workflow deployment, (re-)execution, and reproducibility. To improve deployment performance, we extend the framework with a number of new optimisation techniques and evaluate their effect on a range of real and synthetic workflows.
    Ministry of Higher Education and Scientific Research in Iraq and Mosul University
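The portability idea, where each workflow component carries an explicit, pinned container image so it can be redeployed identically anywhere, can be sketched as a plain dependency graph. The component names and image tags below are invented; the thesis itself expresses this in TOSCA rather than Python:

```python
# Each workflow component pins an exact container image, so the same
# workflow can be redeployed reproducibly on any cloud that runs
# containers. Names and image tags are invented for illustration.
components = {
    "fetch":   {"image": "example/fetch:1.2.0",   "needs": []},
    "clean":   {"image": "example/clean:0.9.1",   "needs": ["fetch"]},
    "analyse": {"image": "example/analyse:2.0.3", "needs": ["clean"]},
    "report":  {"image": "example/report:1.0.0",  "needs": ["analyse", "clean"]},
}

def execution_order(comps):
    """Topologically sort components so each runs after its dependencies."""
    order, done = [], set()
    def visit(name):
        if name in done:
            return
        for dep in comps[name]["needs"]:
            visit(dep)
        done.add(name)
        order.append(name)
    for name in comps:
        visit(name)
    return order

print(execution_order(components))  # ['fetch', 'clean', 'analyse', 'report']
```

Pinning the image versions in the description is what guards against "workflow decay": a later library change cannot silently alter a component's environment.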

    Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository

    The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) has adopted the FAIR Guiding Principles. We present the Atlas chapter of Working Group I (WGI) as a test case. We describe the application of the FAIR principles in the Atlas, the challenges faced during its implementation, and those that remain for the future. We introduce the open source repository resulting from this process, including code (e.g., annotated Jupyter notebooks), data provenance, and aggregated datasets used in some figures of the Atlas chapter and its interactive companion (the Interactive Atlas), open to scrutiny by the scientific community and the general public. We describe the informal pilot review conducted on this repository to gather recommendations that led to significant improvements. Finally, a working example illustrates the re-use of the repository resources to produce customized regional information, extending the Interactive Atlas products and running the code interactively in a web browser using Jupyter notebooks. Peer reviewed.
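The "customized regional information" in the working example ultimately comes down to aggregating gridded climate values over a region with latitude-dependent area weights (the Atlas notebooks use climate-data libraries for this). A dependency-free sketch of the weighting itself, with invented grid values:

```python
import math

def regional_mean(values_by_lat):
    """Area-weighted mean of zonal values, weighting each latitude band
    by the cosine of its latitude (grid cells shrink towards the poles).
    Input maps latitude in degrees to a value; the data are invented."""
    num = sum(v * math.cos(math.radians(lat)) for lat, v in values_by_lat.items())
    den = sum(math.cos(math.radians(lat)) for lat in values_by_lat)
    return num / den

# Invented temperature-anomaly values for three latitude bands.
data = {0.0: 1.0, 45.0: 2.0, 60.0: 3.0}
print(round(regional_mean(data), 3))
```

An unweighted mean of the same values would be 2.0; the cosine weighting pulls the result towards the equatorial band, which covers more area.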