
    Containers for Portable, Productive, and Performant Scientific Computing

    Containers are an emerging technology that holds promise for improving productivity and code portability in scientific computing. The authors examine Linux container technology for distributing a nontrivial scientific computing software stack and executing it on a spectrum of platforms, from laptop computers to high-performance computing (HPC) systems. For Python code run on large parallel computers, runtime is reduced inside a container because library imports are faster. The software distribution approach and data that the authors present will help developers and users decide whether container technology is appropriate for them. The article also advises vendors of HPC systems that rely on proprietary libraries for performance on how to make containers work seamlessly and without a performance penalty.
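    The abstract attributes the in-container speedup for Python to faster library imports. As a minimal sketch, and not the authors' benchmark, the cost of a first import can be timed from inside one interpreter using only the standard library; the helper `time_import` and the module names are illustrative assumptions.

```python
import importlib
import sys
import time


def time_import(module_name: str) -> float:
    """Return wall-clock seconds spent importing `module_name`.

    The module is removed from sys.modules first so that its code is
    executed again; its own dependencies may still be cached.
    """
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Arbitrary standard-library modules, used only as examples.
    for name in ("json", "decimal", "xml.dom.minidom"):
        print(f"{name:20s} {time_import(name) * 1000:8.2f} ms")
```

    CPython's `-X importtime` command-line option (Python 3.7 and later) prints a per-module breakdown of import time to stderr, which may be more convenient than a hand-written timer when comparing runs inside and outside a container.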

    Leveraging Container Technologies in a GIScience Project: A Perspective from Open Reproducible Research

    Scientific reproducibility is essential for the advancement of science. It allows the results of previous studies to be reproduced and their conclusions validated, and it allows new contributions to build on previous research. Nowadays, more and more authors consider that the ultimate product of academic research is the scientific manuscript together with all the elements (i.e., code and data) needed for others to reproduce the results. However, numerous obstacles (e.g., biased results, the pressure to publish, and proprietary data) prevent some studies from being easily reproduced. In this context, we describe our experience in attempting to improve the reproducibility of a GIScience project. Based on our project's needs, we evaluated a list of practices, standards and tools that may facilitate open and reproducible research in the geospatial domain, contextualising them on Peng’s reproducibility spectrum. Among these resources, we focused on containerisation technologies and performed a shallow review to gauge the level of adoption of these technologies in combination with OSGeo software. Containerisation technologies proved to enhance reproducibility, and we used UML diagrams to describe representative workflows deployed in our GIScience project.

    This work has been funded by the Generalitat Valenciana through the “Subvenciones para la realización de proyectos de I+D+i desarrollados por grupos de investigación emergentes” programme (GV/2019/016) and by the Spanish Ministry of Economy and Competitiveness under the subprogrammes Challenges-Collaboration 2014 (RTC-2014-1863-8) and Challenges R+D+I 2016 (CSO2016-79420-R AEI/FEDER, EU). Sergio Trilles has been funded by the postdoctoral programme PINV2018 - Universitat Jaume I (POSDOC-B/2018/12) and the stays programme PINV2018 - Universitat Jaume I (E/2019/031).

    The rockerverse : packages and applications for containerisation with R

    The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker Project images and presents the current state of R packages for managing Docker images and controlling containers. These use cases cover diverse topics such as package development, reproducible research, collaborative work, cloud-based data processing, and production deployment of services. The variety of applications demonstrates the power of the Rocker Project specifically and of containerisation in general. Across the diverse ways to use containers, we identified common themes: reproducible environments, scalability and efficiency, and portability across clouds. We conclude that the current growth and diversification of use cases is likely to continue having a positive impact, but we see the need for consolidating the Rockerverse ecosystem of packages, developing common practices for applications, and exploring alternative containerisation software.

    Towards Modern, Accessible and Dynamic HPC Using Container-based Virtual Clusters

    In this thesis, a novel Virtual Container Cluster (VCC) framework is presented. Despite the growing popularity of container virtualisation as a way to increase the flexibility of the software stack, runtime environment virtualisation still poses significant portability challenges: because such containers depend on the underlying cluster execution paradigm, a niche class of HPC-only containers has emerged. This trend is detrimental to reusability, to reproducibility, and to attracting new communities to HPC. Traditional virtualisation techniques have a rich history within HPC and have been shown to offer much more than software flexibility. A Virtual Machine by nature requires an OS and a full-stack environment akin to a physical machine, which allows it to be instantiated regardless of the underlying machine and the services it provides. This capability is essential for implementing job forwarding and spanning, where the burden of an entire job can be transferred to or shared between heterogeneous cluster systems, with a high level of confidence that the environments will be compatible. In turn, this improves global resource performance, reducing job turnaround time and increasing cluster utilisation. The VCC is an innovative solution that combines the full-stack and container virtualisation approaches. It therefore offers the flexibility of containers together with the improved portability, performance and scalability of the full-stack approach. To maintain the same accessibility and low barrier to entry as the runtime environment approach, the design incorporates an autonomous configuration and contextualisation mechanism, along with a Software Defined Networking technology, to ensure that the full-stack container does not place an additional burden on the user. Its usefulness and performance are validated through benchmarking and two case studies: virtual clusters in the classroom and inter-institutional spanning.

    Addendum to Informatics for Health 2017: Advancing both science and practice

    This article contains the presentation and poster abstracts that were mistakenly omitted from the original publication.

    VCC: A framework for building containerized reproducible cluster software environments

    The problem of portability and reproducibility of the software used to conduct computational experiments has recently come to the fore. Container virtualisation has proved to be a powerful tool for achieving portability of code and its execution environment, through runtimes such as Docker, LXC, Singularity and others, without the performance cost of traditional Virtual Machines (Chamberlain, Invenshure, and Schommer 2014; Felter et al. 2014). However, scientific software often depends on a system foundation that provides middleware, libraries, and other supporting software in order for the code to execute as intended. Typically, container virtualisation addresses only the portability of the code itself, which does not make it inherently reproducible. For example, a containerized MPI application may offer binary compatibility between different systems, but to execute as intended it must be run on an existing cluster that provides the correct interfaces for parallel MPI execution. As high-performance and cluster resources face growing demand to accommodate a diverse range of disciplines, the ability to quickly create and tear down reproducible, transitory virtual environments tailored to an individual task or experiment will be essential. The Virtual Container Cluster (VCC) is a framework for building containers that achieve this goal by encapsulating a parallel application along with an execution model, through a set of dependency-linked services and built-in process orchestration. This promotes a high degree of portability and makes reproducibility easier by shipping the application along with the foundation required to execute it, whether that be an MPI cluster, a big data processing framework, a bioinformatics pipeline, or any other execution model (Higgins, Holmes, and Venters 2017).
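    The idea of dependency-linked services with built-in orchestration can be illustrated with a topological sort, in which each service starts only after the services it depends on. This is a hypothetical illustration, not VCC's implementation: the service names and the `start_order` helper are assumptions, and the code relies on `graphlib.TopologicalSorter` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical services for a containerised MPI cluster. Each key maps to
# the set of services it depends on. These names are illustrative only.
SERVICES: dict[str, set[str]] = {
    "network": set(),
    "sshd": {"network"},
    "shared_fs": {"network"},
    "discovery": {"network"},
    "mpi_hosts": {"discovery", "sshd"},
    "mpi_app": {"mpi_hosts", "shared_fs"},
}


def start_order(services: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service follows all of its dependencies.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(services).static_order())


if __name__ == "__main__":
    for name in start_order(SERVICES):
        # A real orchestrator would launch the process and wait for it to
        # report readiness before continuing; here we only print the plan.
        print("start", name)
```

    A production orchestrator would also need readiness checks and failure handling; `static_order()` only yields one valid ordering of the dependency graph.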

    Enabling the processing of bioinformatics workflows where data is located through the use of cloud and container technologies

    Magister Scientiae (MSc). The growing size of raw data, and the inability of internet communication technology to keep up with that growth, are introducing unique challenges for academic researchers. This is especially true for those residing in rural areas or in countries with sub-par telecommunication infrastructure. In this project I investigate the usefulness of cloud computing technology, data analysis workflow languages and portable computation for institutions that generate data. I introduce the concept of a software solution that could simplify the way researchers execute their analyses on data sets at remote sources, rather than having to move the data. The scope of this project involved conceptualising and designing a software system to simplify the use of a cloud environment, as well as implementing a working prototype of that software for the OpenStack cloud computing platform. I conclude that it is possible to improve the performance of research pipelines by removing the need for researchers to have operating system or cloud computing knowledge, and that utilising technologies such as this can ease the burden of moving data.