
    Deploying Jupyter Notebooks at scale on XSEDE resources for Science Gateways and workshops

    Jupyter Notebooks have become a mainstream tool for interactive computing in every field of science. Jupyter Notebooks are suitable as companion applications for Science Gateways, providing more flexibility and post-processing capability to users. Moreover, they are often used in training events and workshops to provide immediate access to a pre-configured interactive computing environment. The Jupyter team released the JupyterHub web application to provide a platform where multiple users can log in and access a Jupyter Notebook environment. When the number of users and their memory requirements are low, it is easy to set up JupyterHub on a single server. However, the setup becomes more complicated when Jupyter Notebooks must be served at scale to tens or hundreds of users. In this paper we present three strategies for deploying JupyterHub at scale on XSEDE resources. All options share the deployment of JupyterHub on a virtual machine on XSEDE Jetstream. In the first scenario, JupyterHub connects to a supercomputer, launches a single-node job on behalf of each user, and proxies the Notebook from the compute node back to the user's browser. In the second scenario, implemented in the context of an XSEDE consultation for the IRIS consortium for Seismology, we deploy Docker in Swarm mode to coordinate many XSEDE Jetstream virtual machines and provide Notebooks with persistent storage and quotas. In the last scenario, we install the Kubernetes container orchestration framework on Jetstream to provide a fault-tolerant JupyterHub deployment with a distributed filesystem and the capability to scale to thousands of users. In the conclusion we provide a link to step-by-step tutorials, complete with all the necessary commands and configuration files, to replicate these deployments.
    Comment: 7 pages, 3 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, US
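    The abstract points to external tutorials rather than inlining configuration, so the following is only a minimal sketch of the first scenario: a jupyterhub_config.py that uses the batchspawner package's SlurmSpawner so the hub on a Jetstream VM launches one single-node batch job per user on a remote Slurm-managed system. The partition name, core count, walltime, and batch script are placeholders and need not match the paper's deployments.

        # jupyterhub_config.py -- illustrative sketch only, not the configuration from the paper.
        # Assumes the batchspawner package and a Slurm-managed remote system; the partition,
        # core count, walltime, and batch script below are placeholders.
        import batchspawner  # noqa: F401  (registers the spawner classes with JupyterHub)

        c = get_config()  # provided by JupyterHub when it loads this file

        # Scenario 1: launch a single-node batch job per user and proxy the notebook back.
        c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'
        c.SlurmSpawner.req_partition = 'compute'     # placeholder queue/partition name
        c.SlurmSpawner.req_nprocs = '24'             # cores on one node (placeholder)
        c.SlurmSpawner.req_runtime = '02:00:00'
        c.SlurmSpawner.batch_script = """#!/bin/bash
        #SBATCH --partition={partition}
        #SBATCH --ntasks={nprocs}
        #SBATCH --time={runtime}
        {cmd}
        """

        # The hub runs on a Jetstream VM while the single-user servers run on compute
        # nodes, so the hub must be reachable from those nodes.
        c.JupyterHub.hub_ip = '0.0.0.0'

    The second and third scenarios would typically swap this spawner for dockerspawner's SwarmSpawner or kubespawner's KubeSpawner, respectively, while the rest of the hub configuration stays largely the same.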

    Workshop Report: Container Based Analysis Environments for Research Data Access and Computing

    Report of the first workshop on Container Based Analysis Environments for Research Data Access and Computing, supported by the National Data Service and the Data Exploration Lab and held at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

    Enabling Interactive Analytics of Secure Data using Cloud Kotta

    Research, especially in the social sciences and humanities, is increasingly reliant on the application of data science methods to analyze large amounts of (often private) data. Secure data enclaves provide a solution for managing and analyzing private data. However, such enclaves do not readily support discovery science: a form of exploratory or interactive analysis in which researchers execute a range of (sometimes large) analyses in an iterative and collaborative manner. The batch computing model offered by many data enclaves is well suited to executing large compute tasks; however, it is far from ideal for day-to-day discovery science. Because researchers must submit jobs to queues and wait for results, the high latencies inherent in queue-based batch computing systems hinder interactive analysis. In this paper we describe how we have augmented the Cloud Kotta secure data enclave to support collaborative and interactive analysis of sensitive data. Our model uses Jupyter notebooks as a flexible analysis environment and Python language constructs to support the execution of arbitrary functions on private data within this secure framework.
    Comment: To appear in Proceedings of the Workshop on Scientific Cloud Computing, Washington, DC, USA, June 2017 (ScienceCloud 2017), 7 pages
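    The abstract does not reproduce Cloud Kotta's API, so the sketch below only illustrates the general pattern it describes: a Python decorator that marks an analysis function for execution inside the enclave, shipping arguments in and returning only results to the notebook. The names enclave_task and submit_to_enclave are hypothetical, and the "enclave" here is a local stand-in so the sketch is self-contained.

        # Illustrative sketch of the decorator pattern described above; not Cloud Kotta's API.
        import functools
        import pickle

        # Hypothetical stand-in for the enclave side: a registry of approved analysis
        # functions plus a submission endpoint. A real deployment would send the request
        # to the secure service and return only the (policy-checked) results.
        _ENCLAVE_REGISTRY = {}

        def submit_to_enclave(payload: bytes) -> bytes:
            name, args, kwargs = pickle.loads(payload)
            result = _ENCLAVE_REGISTRY[name](*args, **kwargs)
            return pickle.dumps(result)

        def enclave_task(func):
            """Register `func` for execution on private data inside the enclave and
            return a proxy that ships only arguments and receives only results."""
            _ENCLAVE_REGISTRY[func.__name__] = func

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                payload = pickle.dumps((func.__name__, args, kwargs))
                return pickle.loads(submit_to_enclave(payload))
            return wrapper

        @enclave_task
        def mean_income(records):
            # Inside the enclave `records` would come from the protected data store;
            # here it is passed in directly so the sketch runs anywhere.
            return sum(r["income"] for r in records) / len(records)

        print(mean_income([{"income": 40000}, {"income": 60000}]))  # -> 50000.0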

    Toward Open and Reproducible Environmental Modeling by Integrating Online Data Repositories, Computational Environments, and Model Application Programming Interfaces

    Cyberinfrastructure needs to be advanced to enable open and reproducible environmental modeling research. Recent efforts toward this goal have focused on advancing online repositories for data and model sharing, online computational environments along with containerization technology and notebooks for capturing reproducible computational studies, and Application Programming Interfaces (APIs) for simulation models to foster intuitive programmatic control. The objective of this research is to show how these efforts can be integrated to support reproducible environmental modeling. We first present the high-level concept and general approach for integrating these three components. We then present one possible implementation that integrates HydroShare (an online repository), CUAHSI JupyterHub and CyberGIS-Jupyter for Water (computational environments), and pySUMMA (a model API) to support open and reproducible hydrologic modeling. We apply the example implementation to a hydrologic modeling use case to demonstrate how the approach can advance reproducible environmental modeling through the seamless integration of cyberinfrastructure services.
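    As a rough illustration of how the three components fit together inside a notebook, the sketch below pulls a shared model instance from HydroShare, runs it through pySUMMA, and inspects the output. The resource id and file names are placeholders, and the specific client and method calls (HydroShare(), resource.download(), ps.Simulation(), sim.run()) are assumptions for illustration rather than the paper's exact code.

        # Minimal sketch of the integration pattern, intended for a CUAHSI JupyterHub or
        # CyberGIS-Jupyter for Water notebook. All ids, paths, and method names below are
        # assumptions for illustration.
        from hsclient import HydroShare   # HydroShare Python client (assumed available)
        import pysumma as ps              # SUMMA model API (assumed available)

        # 1. Pull a shared model instance (data + configuration) from the online repository.
        hs = HydroShare()                                            # prompts for credentials
        resource = hs.resource("0123456789abcdef0123456789abcdef")   # placeholder resource id
        resource.download("./model_instance")                        # assumed download helper

        # 2. Drive the model programmatically through its API instead of hand-edited files.
        sim = ps.Simulation("summa.exe", "./model_instance/fileManager.txt")
        sim.run("local")   # run on the notebook's compute back end

        # 3. Inspect results in the notebook, then share them back for others to reproduce.
        print(sim.output)  # model output (assumed to be an xarray Dataset)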

    A Vision for Science Gateways: Bridging the Gap and Broadening the Outreach

    The future of science gateways warrants exploration as we consider possibilities that extend well beyond science and high performance computing into new interfaces, applications, and user communities. In this paper, we look retrospectively at the successes of representative gateways thus far. This serves to highlight existing gaps that gateways need to overcome in areas such as accessibility, usability, and interoperability, as well as the need for broader outreach, drawing insights from technology adoption research. We explore two particularly promising opportunities for gateways, computational social sciences and virtual reality, and make the case for the gateway community to be more intentional in engaging with users to encourage adoption and implementation, especially in the area of educational usage. We conclude with a call for focused attention on legal hurdles in order to realize the full future potential of science gateways. This paper serves as a roadmap for a vision of science gateways in the next ten years.

    Evaluating and Enabling Scalable High Performance Computing Workloads on Commercial Clouds

    Performance, usability, and accessibility are critical components of high performance computing (HPC). Usability and performance are especially important to academic researchers, as they generally have little time to learn a new technology and demand a certain level of performance in order to ensure the quality and quantity of their research results. We have observed that, while not all workloads run well in the cloud, some workloads perform well. We have also observed that, although commercial cloud adoption by industry has been growing at a rapid pace, its use by academic researchers has not grown as quickly. We aim to help close this gap and enable researchers to utilize the commercial cloud more efficiently and effectively. We present our results on architecting and benchmarking an HPC environment on Amazon Web Services (AWS), where we observe that particular types of applications are and are not suited for the commercial cloud. Next, we present our results on architecting and building a provisioning and workflow management tool (PAW), an application that enables a user to launch an HPC environment in the cloud, execute a customizable workflow, and automatically delete the HPC environment after the workflow has completed. We then present our results on the scalability of PAW and the commercial cloud for compute-intensive workloads by deploying a 1.1 million vCPU cluster. We then discuss our research into the feasibility of utilizing commercial cloud infrastructure to help tackle the large spikes and data-intensive characteristics of Transportation Cyberphysical Systems (TCPS) workloads. We next present our research on utilizing the commercial cloud for urgent HPC applications by deploying a 1.5 million vCPU cluster to process 211 TB of traffic video data for use by first responders during an evacuation. Lastly, we present the contributions and conclusions drawn from this work.
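    The abstract describes PAW's provision / run / tear-down lifecycle without showing its implementation, so the sketch below expresses that lifecycle with boto3 and AWS CloudFormation. This is not PAW's code: the stack name, template path, and the run_workflow step are placeholders for illustration.

        # Sketch of the provision / run / tear-down lifecycle that PAW automates; not PAW itself.
        import boto3

        cf = boto3.client("cloudformation", region_name="us-east-1")
        stack_name = "paw-hpc-demo"  # placeholder

        # 1. Provision the HPC environment (scheduler, compute fleet, shared storage)
        #    from a CloudFormation template.
        with open("hpc_environment.yaml") as f:   # placeholder template
            cf.create_stack(StackName=stack_name, TemplateBody=f.read(),
                            Capabilities=["CAPABILITY_IAM"])
        cf.get_waiter("stack_create_complete").wait(StackName=stack_name)

        # 2. Execute the user's customizable workflow against the new cluster.
        def run_workflow():
            # Placeholder: a real tool would stage data and submit jobs to the
            # cluster's scheduler here.
            pass

        try:
            run_workflow()
        finally:
            # 3. Delete the environment automatically once the workflow has completed,
            #    so idle instances do not keep accruing cost.
            cf.delete_stack(StackName=stack_name)
            cf.get_waiter("stack_delete_complete").wait(StackName=stack_name)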

    Computing environments for reproducibility: Capturing the 'Whole Tale'

    The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations toward the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process, transforming knowledge discovery and dissemination into a process in which data products are united with research articles to create “living publications” or tales. Whole Tale addresses the full spectrum of science, empowering both users in the long tail of science and power users who demand access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment.