    Development of Authenticated Clients and Applications for ICICLE CI Services -- Final Report for the REHS Program, June-August, 2022

    The Artificial Intelligence (AI) institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) is funded by the NSF to build the next generation of cyberinfrastructure that renders AI more accessible to everyone and drives its further democratization in the larger society. We describe our efforts to develop Jupyter Notebooks and Python command-line clients that access these ICICLE resources and services using ICICLE authentication mechanisms. To connect our clients, we used Tapis, a framework that supports computational research by enabling scientists to access, utilize, and manage multi-institution resources and services. We used Neo4j to organize data into a knowledge graph (KG). We then hosted the KG on a Tapis Pod, which offers persistent data storage with a template made specifically for Neo4j KGs. To demonstrate the capabilities of our software, we developed several clients: a Jupyter notebook for authentication, a Neural Networks (NN) notebook, and command-line applications that provide a convenient frontend to the Tapis API. In addition, we developed a data processing notebook that can manipulate KGs on the Tapis servers, including creation of a KG, data upload, and modification. In this report we present the software architecture, design, and approach, the success of our client software, and future work.
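
    As an illustration of the client pattern described above, the following minimal Python sketch authenticates against a Tapis tenant with the tapipy SDK and then queries a Neo4j knowledge graph exposed by a Tapis Pod; the tenant URL, account credentials, Pod hostname and Cypher query are placeholders, not values from the report.

        from tapipy.tapis import Tapis        # Tapis v3 Python SDK
        from neo4j import GraphDatabase       # official Neo4j Python driver

        # Authenticate against a Tapis tenant (placeholder URL and credentials).
        client = Tapis(base_url="https://tacc.tapis.io",
                       username="my_user",
                       password="my_password")
        client.get_tokens()  # obtain a JWT used for subsequent Tapis API calls

        # Connect to the Neo4j knowledge graph hosted on a Tapis Pod
        # (hostname and credentials are hypothetical).
        driver = GraphDatabase.driver("bolt://my-kg-pod.pods.tacc.tapis.io:443",
                                      auth=("podsservice", "my_pod_password"))
        with driver.session() as session:
            count = session.run("MATCH (n) RETURN count(n) AS n").single()["n"]
            print(f"Knowledge graph currently holds {count} nodes")
        driver.close()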

    Distributed workflows with Jupyter

    The designers of a new coordination interface enacting complex workflows have to tackle a dichotomy: choosing a language-independent or language-dependent approach. Language-independent approaches decouple workflow models from the host code's business logic and advocate portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a unique format, allow taking the best of both approaches, maintaining a clear separation between application and coordination layers while still providing a unified interface to both aspects. We advocate the Jupyter Notebooks' potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations.
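
    The abstract does not detail the Jupyter-workflow interface itself, so the sketch below only illustrates the underlying idea in plain Python: notebook cells become workflow steps with explicit data dependencies, and steps whose inputs are ready can run concurrently. The step functions and the dependency structure are invented for illustration and do not reflect the actual Jw API.

        from concurrent.futures import ThreadPoolExecutor

        # Hypothetical steps extracted from notebook cells.
        def preprocess():
            return "clean-data"

        def train(data):
            return f"model({data})"

        def visualise(data):
            return f"plots({data})"

        with ThreadPoolExecutor() as pool:
            data = pool.submit(preprocess).result()   # step 1
            # Steps 2 and 3 depend only on step 1, so a workflow-aware kernel
            # can dispatch them in parallel, e.g. to Cloud and HPC resources.
            model_future = pool.submit(train, data)
            plots_future = pool.submit(visualise, data)
            print(model_future.result(), plots_future.result())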

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming more and more crucial. The unprecedented amount of data available in the digital era, combined with the recent advancements in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is crucial for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of this methodology. Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on a total of five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executing them on large-scale mixed Cloud-HPC infrastructures.

    Workshop Report: Container Based Analysis Environments for Research Data Access and Computing

    Report of the first workshop on Container Based Analysis Environments for Research Data Access and Computing, supported by the National Data Service and Data Exploration Lab and held at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

    HPC-oriented Canonical Workflows for Machine Learning Applications in Climate and Weather Prediction

    Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in High-Performance Computing (HPC) power are paving the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, research communities are slow to adopt them. The Canonical Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we discuss the FAIR Digital Object (FDO) and Research Object (RO) in the DeepRain project to achieve granular reproducibility. DeepRain is a project that aims to improve precipitation forecasts in Germany by using ML. Our concept envisages the raster datacube to provide data harmonization and fast, scalable data access. We suggest the Jupyter notebook as a single reproducible experiment. In addition, we envision JupyterHub as a scalable and distributed central platform that connects all these elements and the HPC resources to the researchers via an easy-to-use graphical interface.
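
    To make the datacube idea concrete, the hedged sketch below uses xarray to open a hypothetical Zarr-backed raster datacube of precipitation fields and lazily select a spatio-temporal slice; the store location, variable name and coordinate names are assumptions for illustration, not artifacts of the DeepRain project.

        import xarray as xr

        # Open a chunked, cloud-friendly raster datacube (hypothetical Zarr store).
        cube = xr.open_zarr("s3://deeprain-example/precipitation.zarr")

        # Lazily select one month over a bounding box; only the needed chunks are
        # read, which is what makes datacube access fast and scalable for ML.
        subset = cube["precip"].sel(
            time=slice("2020-06-01", "2020-06-30"),
            lat=slice(47.0, 55.0),
            lon=slice(6.0, 15.0),
        )
        monthly_mean = subset.mean(dim="time").compute()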

    Streamlined HPC Environments with CVMFS and CyberGIS-Compute

    High-Performance Computing (HPC) resources provide the potential for complex, large-scale modeling and analysis, fueling scientific progress over the last few decades, but these advances are not equally distributed across disciplines. Those in computational disciplines are often trained to have the necessary technical skills to utilize HPC (e.g. familiarity with the terminal), but many disciplines face technical hurdles when trying to apply HPC resources to their work. This unequal familiarity with HPC is increasingly a problem as cross-discipline teams work to tackle critical interdisciplinary issues like climate change and sustainability. CyberGIS-Compute is middleware designed to democratize access to HPC services with the goal of empowering domain scientists, but a key challenge facing model developers on CyberGIS-Compute is creating a containerized software environment for their models. In this paper, we discuss our work to integrate the CERN Virtual Machine File System (CVMFS) into CyberGIS-Compute to provide consistent software environments across science gateways and HPC resources.
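
    As a hedged sketch of what a consistent CVMFS-backed environment can look like from the model developer's side, the snippet below checks that a hypothetical CyberGIS software repository is mounted under /cvmfs and runs the model inside a conda environment published there; the repository name, environment path and script name are illustrative assumptions, not the actual CyberGIS-Compute layout.

        import os
        import subprocess

        # Hypothetical CVMFS repository and conda environment shipped by the gateway.
        CVMFS_ENV = "/cvmfs/cybergis.example.org/envs/hydrology-2023"

        if not os.path.isdir(CVMFS_ENV):
            raise RuntimeError("CVMFS environment is not mounted on this compute node")

        # Run the model with the exact same interpreter and packages on every
        # science gateway and HPC resource that mounts the repository.
        subprocess.run(
            ["conda", "run", "-p", CVMFS_ENV, "python", "run_model.py"],
            check=True,
        )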

    Hybrid Workflows for Large-Scale Scientific Applications

    A Vision for Science Gateways: Bridging the Gap and Broadening the Outreach

    The future for science gateways warrants exploration as we consider the possibilities that extend well beyond science and high performance computing into new interfaces, applications and user communities. In this paper, we look retrospectively at the successes of representative gateways thus far. This serves to highlight existing gaps gateways need to overcome in areas such as accessibility, usability and interoperability, and the need for broader outreach, drawing insights from technology adoption research. We explore two particularly promising opportunities for gateways, computational social sciences and virtual reality, and make the case for the gateway community to be more intentional in engaging with users to encourage adoption and implementation, especially in the area of educational usage. We conclude with a call for focused attention on legal hurdles in order to realize the full future potential of science gateways. This paper serves as a roadmap for a vision of science gateways in the next ten years.